
Section 12.13 Group Work: Regular Expressions (Regex)

This activity works best with a POGIL approach. In POGIL, students work in groups on activities, and each member has an assigned role. For more information, see https://cspogil.org/Home.

Note 12.13.1.

If you work in a group, have only one member of the group fill in the answers on this page. You will be able to share your answers with the group at the bottom of the page.
Learning Objectives
Students will know and be able to do the following.
Content Objectives:
  • Learn about search and findall and what they return
  • Learn about some common quantifiers (*, +, ?, {n})
  • Learn about character classes (\d, \w, \s, .)
  • Learn about character sets ([an]) and ranges ([0-9])
  • Learn how to negate a character set using [^0-9]
  • Learn how to escape a special character to match it
  • Learn about greedy matching and how to make it not greedy
  • Learn how to return just part of a match using parentheses

Subsection 12.13.1 Regex Methods

Two of the methods that you can use with regular expressions are search and findall. Note that you must import re to use these.
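
As a rough illustration of the difference between the two, here is a minimal sketch. The sample string and variable names are made up for this example and are not part of the original activity.

```python
import re

# Hypothetical sample string for illustration
text = "the cat sat near another cat"

# search looks for the first match and returns a match object
first = re.search("cat", text)
print(first.group())  # the text that matched
print(first.span())   # the start and end positions of the match

# search returns None when there is no match
missing = re.search("dog", text)
print(missing)

# findall returns a list of strings, one per non-overlapping match
all_cats = re.findall("cat", text)
print(all_cats)
```

Because search can return None, it is usually safest to check the result before calling group() on it.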

Checkpoint 12.13.2.

Run the code below to see what it prints.

Checkpoint 12.13.3.

11-9-2: What type of thing does findall return?

Checkpoint 12.13.4.

11-9-3: What does search return if no match is found?

Checkpoint 12.13.5.

11-9-4: Which method returns information about the first match as an object?

Subsection 12.13.2 Quantifiers

You can specify how many items to match using quantifiers. They refer to the item to their left. The quantifiers are ?, +, *, {n}, and {n,m}.
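
The sketch below shows one way to compare the quantifiers side by side. The sample string is an assumption chosen so that each quantifier gives a different result. In every pattern the quantifier applies only to the c on its left, not to the b.

```python
import re

# Hypothetical sample string: b followed by zero, one, two, and three c's
words = "b bc bcc bccc"

zero_or_one = re.findall(r"bc?", words)       # c zero or one time
one_or_more = re.findall(r"bc+", words)       # c one or more times
zero_or_more = re.findall(r"bc*", words)      # c zero or more times
exactly_two = re.findall(r"bc{2}", words)     # c exactly two times
two_to_three = re.findall(r"bc{2,3}", words)  # c two to three times

print(zero_or_one)    # ['b', 'bc', 'bc', 'bc']
print(one_or_more)    # ['bc', 'bcc', 'bccc']
print(zero_or_more)   # ['b', 'bc', 'bcc', 'bccc']
print(exactly_two)    # ['bcc', 'bcc']
print(two_to_three)   # ['bcc', 'bccc']
```

Note that bc{2} still finds a match inside "bccc": it matches the first two c's and leaves the third one unmatched.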

Checkpoint 12.13.6.

Run the code below to see what it prints.

Checkpoint 12.13.7.

    11-9-6: How many c's must there be in a row for c{2} to match at least part of the string?
  • 0 to many
  • No, this would be 'c*'
  • 0 to 2
  • No, this would be 'c{0,2}'
  • exactly 2
  • No, it will match strings that have more than 2 c's in a row.
  • 2 or more
  • This will match 2 c's but there can be more in the string.

Checkpoint 12.13.8.

11-9-7: What characters are used to match a digit?

Checkpoint 12.13.9.

Subsection 12.13.3 Character Sets

You can use [] to define a character set, which matches any one of the characters listed inside the [].
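
A small sketch of a character set in use follows. The sample string is hypothetical. Each set matches exactly one character from the string, so b[ea]d matches "bed" and "bad" but not "bid".

```python
import re

# Hypothetical sample string for illustration
text = "bed bad bid bud"

# [ea] matches a single character that is either e or a
e_or_a = re.findall(r"b[ea]d", text)
print(e_or_a)      # ['bed', 'bad']

# A larger set allows any one of the listed vowels
any_vowel = re.findall(r"b[aeiou]d", text)
print(any_vowel)   # ['bed', 'bad', 'bid', 'bud']
```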

Checkpoint 12.13.10.

Run the code below to see what it prints.

Checkpoint 12.13.11.

    11-9-10: What does [ea] mean?
  • Match either an 'e' or 'a' one time
  • It will match one of the items listed in []
  • Match 'ae' one time
  • It will match one of the items listed in []
  • Match either an 'e' or 'a' one to many times
  • This would be true if it was [ae]+
  • Match 'ae' one to many times
  • This would be true if it was (ae)+

Subsection 12.13.4 Character Ranges

You can specify a range of characters to match inside [] by using a hyphen, such as [0-9] for any digit.
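
The following sketch, using a made-up sample string, shows ranges inside a character set, a period inside a set, and a negated set that starts with ^.

```python
import re

# Hypothetical sample string for illustration
text = "Room 42B costs 19.99 dollars"

# [0-9]+ matches one or more digits in a row
digits = re.findall(r"[0-9]+", text)
print(digits)       # ['42', '19', '99']

# Inside [], a period is just a period, so this keeps 19.99 together
numbers = re.findall(r"[0-9.]+", text)
print(numbers)      # ['42', '19.99']

# [A-Z] matches a single uppercase letter
capitals = re.findall(r"[A-Z]", text)
print(capitals)     # ['R', 'B']

# A leading ^ negates the set: anything other than a digit or a period
not_numbers = re.findall(r"[^0-9.]+", text)
print(not_numbers)  # ['Room ', 'B costs ', ' dollars']
```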

Checkpoint 12.13.12.

Run the code below to see what it prints.

Checkpoint 12.13.13.

    11-9-12: What does [0-9.]+ mean?
  • Match any digit or period one or more times
  • Items in the [] match themselves and are not treated as special characters other than '-'
  • Match any digit or anything that isn't a new line one or more times
  • The period in a [] just means match a period
  • Match any digit or period zero to many times
  • The + outside of the [] means match one or more
  • Match any digit or anything that isn't a new line zero to many times
  • The period in a [] just means match a period and the + means match one or more times

Checkpoint 12.13.14.

    11-9-13: What does [^0-9.]+ mean?
  • Match anything other than 0-9 and a period zero to one times
  • The + means one to many times
  • Match anything other than 0-9 and a period one to many times
  • Correct!
  • Match ^ or 0-9 or a period zero to one times
  • The ^ negates the items
  • Match ^ or 0-9 or a period one to many times
  • The ^ negates the items

Subsection 12.13.5 Character Classes
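
The character classes named in the learning objectives (\d, \w, \s, and .) can be tried out with a sketch like the one below. The sample strings are assumptions made for this example.

```python
import re

# Hypothetical sample string for illustration
text = "a1 b_2"

# \d matches any digit
digit_chars = re.findall(r"\d", text)
print(digit_chars)   # ['1', '2']

# \w matches a letter, digit, or underscore
word_chars = re.findall(r"\w", text)
print(word_chars)    # ['a', '1', 'b', '_', '2']

# \s matches a whitespace character such as a space
space_chars = re.findall(r"\s", text)
print(space_chars)   # [' ']

# . matches any single character except a newline
any_chars = re.findall(r".", "a1\nb")
print(any_chars)     # ['a', '1', 'b']
```

The patterns are written as raw strings (with an r prefix) so that Python passes the backslashes to the regex engine unchanged.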

Checkpoint 12.13.15.

Run the code below to see what it prints.

Checkpoint 12.13.16.

Checkpoint 12.13.17.

Run the code below to see what it prints.

Checkpoint 12.13.18.

Subsection 12.13.6 Escaping Special Characters

If you want to match something that is normally a special character in regex you must escape it by adding a \ in front of it.
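
For instance, a period normally matches any character, so it has to be escaped to match only a literal period. A minimal sketch with a hypothetical sample string:

```python
import re

# Hypothetical sample string for illustration
text = "Total: 3.50 or 3x50"

# An unescaped period matches any character, including the x
unescaped = re.findall(r"3.50", text)
print(unescaped)   # ['3.50', '3x50']

# An escaped period matches only a literal period
escaped = re.findall(r"3\.50", text)
print(escaped)     # ['3.50']
```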

Checkpoint 12.13.19.

Run the code below to see what it prints.

Checkpoint 12.13.20.

    11-9-19: How many items will be in the list that the following code prints?
    import re
    text = "302.33 64.52 204.24 532.2 1.23 323.320"
    # Raw string so the backslashes reach the regex engine unchanged
    res = re.findall(r"\d{3}\.\d{2}", text)
    print(res)
    
  • 1
  • It will match three digits followed by a period and then 2 digits
  • 2
  • It will match three digits followed by a period and then 2 digits
  • 3
  • It will match three digits followed by a period and then 2 digits
  • 4
  • It will match three digits followed by a period and then 2 digits

Subsection 12.13.7 Greedy and Non-Greedy Matching

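Quantifiers are greedy by default: they match as many characters as they can while still letting the rest of the pattern match.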
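
One way to see the difference is to match text between angle brackets. The sketch below uses a made-up string. Adding ? after the quantifier makes it match as few characters as possible.

```python
import re

# Hypothetical sample string for illustration
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, so the match runs to the last >
greedy = re.findall(r"<.+>", text)
print(greedy)       # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the first > it can
non_greedy = re.findall(r"<.+?>", text)
print(non_greedy)   # ['<b>', '</b>', '<i>', '</i>']
```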

Checkpoint 12.13.21.

Run the code below to see what it prints.

Checkpoint 12.13.22.

11-9-21: What character can you add after a quantifier like '+' or '*' to make it not greedy?

If you worked in a group, you can copy the answers from this page to the other group members. Select the group members below and click the button to share answers.
The Submit Group button will submit the answer for each question on this page for each member of your group. It also logs you as the official group submitter.