
Section 12.14 Group Work: More Regular Expressions (Regex)

It is best to use a POGIL approach with the following. In POGIL, students work in groups on activities and each member has an assigned role. For more information see https://cspogil.org/Home.

Note 12.14.1.

If you work in a group, have only one member of the group fill in the answers on this page. You will be able to share your answers with the group at the bottom of the page.
Learning Objectives
Students will know and be able to do the following.
Content Objectives:
  • Learn about using | as a logical or
  • Learn about matching groups and non-matching groups
  • Learn about anchor characters (^, $, and \b)
  • Learn about raw strings
  • Learn how to negate a character set

Subsection 12.14.1 Using a logical “or”

What if you want to match a month from 1 to 12 in MM/DD/YYYY? You can't use [1-12] because a character set matches one character at a time, so [1-12] matches only a single 1 or 2. You have to match either a digit from 1 to 9 or a 1 followed by 0, 1, or 2. To match one of two expressions with a logical or, use (left|right). This matches either the expression on the left or the one on the right.
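As a rough illustration of the logical or described above, the sketch below checks a few date strings against the month pattern. The helper name `is_date` and the sample strings are made up for this example, and `re.fullmatch` is used here as an assumption (the page's own code may use a different function) so that the whole string has to fit the pattern. The pattern is written as a raw string (the `r` prefix), which is covered later in this section.

```python
import re

# Month: a single digit 1-9, OR a 1 followed by 0, 1, or 2.
DATE_PATTERN = r"([1-9]|1[0-2])/\d{2}/\d{4}"


def is_date(text):
    """Hypothetical helper: True if all of text fits the date pattern."""
    return re.fullmatch(DATE_PATTERN, text) is not None


print(is_date("9/11/2022"))   # True  (left side of the or)
print(is_date("12/25/2022"))  # True  (right side of the or)
print(is_date("13/25/2022"))  # False (13 is not a valid month)
```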

Checkpoint 12.14.2.

Run the code below to see what it prints.

Checkpoint 12.14.3.

    11-9-2: Sometimes dates have a leading zero if the month is from 1 to 9. Which of the following would match that case as well but still match if there isn't a 0?
  • "0([1-9]|1[0-2])/\d{2}/\d{4}"
  • This requires a 0 before the month
  • "0*([1-9]|1[0-2])/\d{2}/\d{4}"
  • This matches zero or more 0's
  • "0+([1-9]|1[0-2])/\d{2}/\d{4}"
  • This requires at least one 0
  • "0?([1-9]|1[0-2])/\d{2}/\d{4}"
  • This matches zero or one 0
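One way to compare the four candidates is to try each prefix against the same sample strings. This is only a sketch: the sample dates, the variable names, and the use of `re.fullmatch` (so that the whole string must fit the pattern) are assumptions made for the illustration.

```python
import re

# The part of the pattern shared by all four answer choices.
REST = r"([1-9]|1[0-2])/\d{2}/\d{4}"

prefixes = ["0", "0*", "0+", "0?"]
samples = ["9/11/2022", "09/11/2022", "009/11/2022"]

results = {}
for prefix in prefixes:
    pattern = prefix + REST
    # Keep only the samples that the full pattern accepts.
    results[prefix] = [s for s in samples if re.fullmatch(pattern, s)]
    print(prefix, results[prefix])
```

Only the `0?` version accepts both the month with no leading zero and the month with exactly one leading zero while rejecting the one with two.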

Subsection 12.14.2 Specifying What to Extract - Matching Groups

There are times when you want to return just part of what was matched.

Checkpoint 12.14.4.

Run the code below to see what it prints.

Checkpoint 12.14.5.

11-9-4: Which symbols are used to specify the part of the match to return?

Note 12.14.6.

Parentheses define a capture group. When a pattern contains one capture group, re.findall returns only the text that matched inside the parentheses.
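A small sketch of this behavior follows. The sample text and the choice to capture the year are assumptions made for the illustration, not the code from the checkpoint above.

```python
import re

text = "The dates were 9/11/2022 and 10/15/2023"

# No parentheses: findall returns each whole match.
whole = re.findall(r"\d{1,2}/\d{2}/\d{4}", text)
print(whole)  # ['9/11/2022', '10/15/2023']

# Parentheses around the year: findall returns only that group.
years = re.findall(r"\d{1,2}/\d{2}/(\d{4})", text)
print(years)  # ['2022', '2023']
```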

Subsection 12.14.3 Specifying What to Extract - Non-Matching Groups

What if we need the parentheses because we are using a logical or, but want the whole match to be returned? We can add a “?:” after the opening parenthesis. This makes a non-matching group (often called a non-capturing group): it groups items for the logical or, but the entire match is returned.
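The difference can be sketched with `re.findall` on a sentence containing several dates. The variable names are chosen for this example only.

```python
import re

text = "The dates were 9/11/2022, 10/15/2022, 11/20/2022, and 12/01/2022"

# Plain parentheses: only the month inside the group is returned.
months = re.findall(r"([1-9]|1[0-2])/\d{2}/\d{4}", text)
print(months)  # ['9', '10', '11', '12']

# (?: ... ) still groups the or, but the whole match is returned.
dates = re.findall(r"(?:[1-9]|1[0-2])/\d{2}/\d{4}", text)
print(dates)  # ['9/11/2022', '10/15/2022', '11/20/2022', '12/01/2022']
```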

Checkpoint 12.14.7.

Run the code below to see what it prints.
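Another approach, when the pattern has inner parentheses, is to enclose the whole pattern in an outer set of parentheses. With re.findall, each result is then a tuple whose first element is the entire match.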
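A minimal sketch of the outer-parentheses approach is shown here. The names `pairs` and `dates` are assumptions for the illustration.

```python
import re

text = "The dates were 9/11/2022, 10/15/2022, 11/20/2022, and 12/01/2022"

# Two groups (outer and inner), so findall returns a tuple per match:
# (text matched by the outer group, text matched by the inner group).
pairs = re.findall(r"(([1-9]|1[0-2])/\d{2}/\d{4})", text)
print(pairs[0])  # ('9/11/2022', '9')

# The first element of each tuple is the full date.
dates = [pair[0] for pair in pairs]
print(dates)  # ['9/11/2022', '10/15/2022', '11/20/2022', '12/01/2022']
```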

Checkpoint 12.14.8.

Run the code below to see what it prints.

Checkpoint 12.14.9.

    11-9-7: Given the following code, which of the following would you use to get the current date and add it to the list?
    import re
    str = "The dates were 9/11/2022, 10/15/2022, 11/20/2022, and 12/01/2022"
    
    # get the dates
    l = []
    matches = re.findall(r"(([1-9]|1[0-2])/\d{2}/\d{4})", str)
    for match in matches:
        # line to get current date and add to the list
    
  • l.append(match)
  • This would add the tuple, not the date
  • l.extend(match)
  • extend would add every element of the tuple to the list, not just the date
  • l.append(match[0])
  • This will add the date to the list (the first element in the tuple)
  • l.extend(match[0])
  • extend would add each character of the date string as a separate element

Subsection 12.14.4 Boundary or Anchor Characters

Checkpoint 12.14.10.

Run the code below to see what it prints.

Checkpoint 12.14.11.

    11-9-9: What does the '^' do?
  • Return the first match that it finds.
  • It does not do this.
  • Return a match if it is at the beginning of the string.
  • Correct. It returns a match only if it is at the beginning of a string.
  • Return a match if it is at the end of the string.
  • It does not do this, but the '$' does.
  • Return a match if it is a whole word, not just part of a word.
  • It does not do this.
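As a sketch of the '^' anchor outside a character set, the example below looks for a number only at the start of a string. The pattern name and sample strings are assumptions for the illustration.

```python
import re

starts_with_number = r"^\d+"

# The number is at the beginning of the string, so it matches.
at_start = re.findall(starts_with_number, "42 apples and 7 pears")
print(at_start)  # ['42']

# The number is not at the beginning, so nothing matches.
not_at_start = re.findall(starts_with_number, "apples: 42")
print(not_at_start)  # []

# Without the anchor, every number is found.
unanchored = re.findall(r"\d+", "42 apples and 7 pears")
print(unanchored)  # ['42', '7']
```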

Checkpoint 12.14.12.

Run the code below to see what it prints.

Checkpoint 12.14.13.

    11-9-11: What does the '$' do?
  • Return the first match that it finds.
  • It does not do this.
  • Return a match if it is at the beginning of the string.
  • It does not do this, but the '^' does.
  • Return a match if it is at the end of the string.
  • Correct! It matches only at the end of the string.
  • Return a match if it is a whole word, not just part of a word.
  • It does not do this.

Note 12.14.14.

Since '$' is an anchor character, use '\$' if you want to match a literal '$'.
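The two uses of the dollar sign can be sketched side by side. The sample strings and variable names are made up for this example.

```python
import re

# '$' as an anchor: the digits must be at the end of the string.
at_end = re.findall(r"\d+$", "room 101")
print(at_end)  # ['101']

not_at_end = re.findall(r"\d+$", "101 rooms")
print(not_at_end)  # []

# '\$' matches a literal dollar sign.
prices = re.findall(r"\$\d+", "It costs $25 or $30")
print(prices)  # ['$25', '$30']
```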

Checkpoint 12.14.15.

Run the code below to see what it prints.

Checkpoint 12.14.16.

    11-9-13: What does the '\b' do?
  • Return the first match that it finds.
  • It does not do this.
  • Return a match if it is at the beginning of the string.
  • It does not do this, but the '^' does.
  • Return a match if it is at the end of the string.
  • It does not do this, but the '$' does.
  • Return a match if it is a whole word, not just part of a word.
  • Correct! It matches if it is a whole word, not just part of a word.

Note 12.14.17.

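Since '\b' usually represents a backspace in a Python string, you must put 'r' before the string so that it is treated as a raw string. The r is strictly required only when the expression has a '\b' in it, but recent versions of Python warn about sequences such as '\d' in ordinary strings, so using a raw string for every regular expression is a safe habit.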
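The effect of the raw string can be sketched as follows. The sample sentence and variable names are assumptions for the illustration.

```python
import re

text = "The cat sat near the catalog; concatenate cat."

# Raw string: \b is a word boundary, so only whole words match.
whole_words = re.findall(r"\bcat\b", text)
print(whole_words)  # ['cat', 'cat']

# No boundaries: 'cat' inside longer words matches too.
anywhere = re.findall(r"cat", text)
print(len(anywhere))  # 4

# Ordinary string: \b becomes a backspace character, so nothing matches.
no_raw = re.findall("\bcat\b", text)
print(no_raw)  # []

# In an ordinary string "\b" is one character; in a raw string it is two.
print(len("\b"), len(r"\b"))  # 1 2
```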

Subsection 12.14.5 Negating a Character Set

You can negate a character set by putting a '^' right after the '['.
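A short sketch of a negated character set follows. The helper `is_alphanumeric` is hypothetical and is not the page's passwordChecker; it only shows one way a negated set could be used to test that a string has no characters outside a chosen set.

```python
import re

# Match every character that is NOT a lowercase vowel or a space.
consonants = re.findall(r"[^aeiou ]", "hello world")
print(consonants)  # ['h', 'l', 'l', 'w', 'r', 'l', 'd']


def is_alphanumeric(text):
    """Hypothetical helper: True if text has no character other than
    a letter or a digit (an empty string also returns True)."""
    return re.search(r"[^a-zA-Z0-9]", text) is None


print(is_alphanumeric("abc123"))  # True
print(is_alphanumeric("abc 123!"))  # False
```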

Checkpoint 12.14.18.

Run the code below to see what it prints.

Checkpoint 12.14.19.

    11-9-15: Which of the following best describes when passwordChecker returns true?
  • If the string has only uppercase and lowercase alphabetic characters.
  • It also allows digits.
  • If the string has only uppercase and lowercase alphabetic characters or numeric digits.
  • Correct! It returns true if the string only has alphabetic characters or numeric digits.
  • If the string has only numeric digits.
  • It also allows alphabetic characters.
  • If the string has only uppercase and lowercase alphabetic characters, numeric digits, or special characters like '!{}[]'.
  • It does not do this.

Checkpoint 12.14.20.

If you worked in a group, you can copy the answers from this page to the other group members. Select the group members below and click the button to share answers.
The Submit Group button will submit the answer for each question on this page for each member of your group. It also logs you as the official group submitter.