Checkpoint 12.3.1.
Use findall to find the lines with email addresses in them and print them.
We can use the findall() method to extract all of the substrings which match a regular expression. Let's use the example of wanting to extract anything that looks like an email address from any line regardless of format. For example, we want to pull the email addresses from each of the following lines:

    From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
    Return-Path: <postmaster@collab.sakaiproject.org>
    for <source@collab.sakaiproject.org>;
    Received: (from apache@localhost)
    Author: stephen.marquard@uct.ac.za
We can use findall() to find the lines with email addresses in them and extract one or more addresses from each of those lines.

The findall() method searches the string in the second argument and returns a list of all of the strings that look like email addresses. We are using a two-character sequence that matches a non-whitespace character (\S).

For a string that contains the two addresses csev@umich.edu and cwen@iupui.edu, the returned list would be:

    ['csev@umich.edu', 'cwen@iupui.edu']
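The call described above can be sketched as follows. The sample string s is made up for illustration, and the pattern \S+@\S+ (a run of non-whitespace characters, an at-sign, and another run of non-whitespace characters) is assumed from the surrounding description rather than quoted from it.

```python
import re

# Hypothetical sample text. "@2PM" has no non-whitespace character
# before the at-sign, so it is not reported as a match.
s = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

# First argument: the pattern. Second argument: the string to search.
addresses = re.findall(r'\S+@\S+', s)
print(addresses)
```

Writing the pattern as a raw string (the r prefix) keeps Python from interpreting the backslash before the regular expression engine sees it.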
In a pattern such as \S+@\S+, each \S+ matches as many non-whitespace characters as possible.

Since findall() returns a list, we simply check if the number of elements in our returned list is more than zero to print only lines where we found at least one substring that looks like an email address.

Square brackets indicate a set of acceptable characters. In a sense, \S is asking to match the set of “non-whitespace characters”. Now we will be a little more explicit in terms of the characters we will match:

    [a-zA-Z0-9]\S*@\S*[a-zA-Z]
Translating this regular expression, we are looking for substrings that start with a single lowercase letter, uppercase letter, or number ([a-zA-Z0-9]), followed by zero or more non-blank characters (\S*), followed by an at-sign, followed by zero or more non-blank characters (\S*), followed by an uppercase or lowercase letter. Note that we switched from + to * to indicate zero or more non-blank characters since [a-zA-Z0-9] is already one non-blank character. Remember that the * or + applies to the single element (a character, an escape such as \S, or a bracketed set) immediately to the left of the plus or asterisk.

Notice that on the source@collab.sakaiproject.org lines, our regular expression eliminated two characters at the end of the string (“>;”). This is because when we append [a-zA-Z] to the end of our regular expression, we are demanding that whatever string the regular expression parser finds must end with a letter. So when it sees the “>” at the end of “sakaiproject.org>;” it simply stops at the last “matching” letter it found (i.e., the “g” was the last good match).
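To make the trimming concrete, here is a minimal sketch that runs both patterns over the sample lines quoted earlier and keeps only the lines with at least one match. The names loose, strict, and extract, and the final line with no address, are hypothetical and added only for illustration.

```python
import re

# The sample lines from this section, plus one hypothetical line
# with no address to show that it is filtered out.
lines = [
    'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008',
    'Return-Path: <postmaster@collab.sakaiproject.org>',
    'for <source@collab.sakaiproject.org>;',
    'Received: (from apache@localhost)',
    'Author: stephen.marquard@uct.ac.za',
    'X-Note: this line has no address',
]

loose = r'\S+@\S+'
strict = r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]'

def extract(pattern, lines):
    """Return one list of matches for each line that has at least one match."""
    results = []
    for line in lines:
        found = re.findall(pattern, line.rstrip())
        # findall() returns a list, so an empty list means "no address here".
        if len(found) > 0:
            results.append(found)
    return results

loose_hits = extract(loose, lines)
strict_hits = extract(strict, lines)

# Show what each pattern captured, line by line.
for before, after in zip(loose_hits, strict_hits):
    print(before, '->', after)
```

On the third sample line the loose pattern keeps the surrounding "<" and ">;", while the stricter pattern starts at the "s" and stops at the final "g".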