I am following https://automatetheboringstuff.com/2e/chapter7 

CHAPTER 7. Pattern matching with regular expressions. 

import re 
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') 
mo = phoneNumRegex.search('My number is 415-555-4242.') 
print('Phone number found: ' + mo.group()) 
## search() returns a Match object (or None if nothing matches), and group() returns the matched text. 
## Parentheses organize the match into groups. group(0), or group() with no argument, is the whole match. 
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') 
mo = phoneNumRegex.search('My number is 415-555-4242.') 
mo.group(1) ## '415' only 
mo.group(0) ## '415-555-4242'. Same as mo.group(). 
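
A small sketch of grabbing every group in one go. groups() is a standard Match method; the names area_code and main_number are just my own labels for illustration.

```python
import re

phone_num_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_num_regex.search('My number is 415-555-4242.')

# groups() returns all parenthesized groups as a tuple of strings,
# so it can be unpacked straight into variables.
area_code, main_number = mo.groups()
print(area_code)     # 415
print(main_number)   # 555-4242
print(mo.group(0))   # 415-555-4242
```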

In regular expressions, the following characters have special meanings:
.  ^  $  *  +  ?  {  }  [  ]  \  |  (  )
If you want to match these characters literally as part of your text pattern, you need to escape them with a backslash. | is called a pipe and means "or": it matches either of the expressions on its two sides. 
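
A quick sketch of the escaping rule. The sample strings and the variable literal are made up for illustration; re.escape() is a standard helper in the re module.

```python
import re

# \( and \) match literal parentheses; the unescaped pairs still form groups.
paren_regex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = paren_regex.search('My phone number is (415) 555-4242.')
print(mo.group(1))    # (415)
print(mo.group(2))    # 555-4242

# re.escape() inserts the backslashes for you, handy when the text
# to match comes from a variable.
literal = '1+1=2?'    # hypothetical text full of special characters
found = re.search(re.escape(literal), 'Is 1+1=2? Yes.')
print(found.group())  # 1+1=2?
```

Without re.escape(), the + and ? in that text would be read as quantifiers and the search would not find the literal string.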
heroRegex = re.compile(r"Batman|Tina Fey") 
mo1 = heroRegex.search("Batman and Tina Fey") 
mo1.group() ## 'Batman'. search() finds only the first occurrence. To find all, use findall(), which returns a list of strings. 
If the regex has groups, findall() returns a list of tuples of strings instead, one string per group. 
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)') # has groups
>>> phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')
## [('415', '555', '9999'), ('212', '555', '0000')]
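
For contrast, a sketch of findall() when the regex has no groups, plus the pipe regex from above run through findall(). Variable names are my own.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: findall() returns a list of the full matched strings.
no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
numbers = no_groups.findall(text)
print(numbers)   # ['415-555-9999', '212-555-0000']

# With the pipe, every alternative that occurs is returned, in order.
hero_regex = re.compile(r'Batman|Tina Fey')
heroes = hero_regex.findall('Batman and Tina Fey')
print(heroes)    # ['Batman', 'Tina Fey']
```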

Optional matching. 
batRegex = re.compile(r'Bat(wo)?man') ## The group preceding the ? is optional, 0 or 1 times.  
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d') 
## * is for 0 or more times. + is for 1 or more. 
batRegex = re.compile(r'Bat(wo)+man') 
mo3 = batRegex.search('The Adventures of Batman')
>>> mo3 == None ## True. 
## (45){3} matches 454545. {3,5} means 3 to 5 repetitions. {,5} means 0 to 5, and {3,} means 3 or more. 
## Matching is greedy by default: it takes the longest match possible. For non-greedy (shortest) matching, put a ? after the quantifier, e.g. {3,5}? 
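
A sketch comparing the quantifiers and greedy vs. non-greedy matching. The 'HaHaHaHaHa' test string is just an illustrative input.

```python
import re

# ? = 0 or 1, * = 0 or more, + = 1 or more
optional = re.search(r'Bat(wo)?man', 'Batwoman').group()
star = re.search(r'Bat(wo)*man', 'Batman').group()
plus = re.search(r'Bat(wo)+man', 'Batman')
print(optional)   # Batwoman
print(star)       # Batman
print(plus)       # None, because + needs at least one 'wo'

# Greedy takes as many repetitions as allowed, non-greedy as few.
text = 'HaHaHaHaHa'
greedy_match = re.search(r'(Ha){3,5}', text).group()
lazy_match = re.search(r'(Ha){3,5}?', text).group()
print(greedy_match)   # HaHaHaHaHa
print(lazy_match)     # HaHaHa
```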

Shorthand characters. Cool. 
\d = digit. 
\D = nondigit character. 
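\w = any letter or digit or _ (a "word" character). 
\W = any character that is not a letter, digit or _. 
\s = space, tab or newline (a "space" character). 
\S = any character that is not a space, tab or newline. 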

\d+ means one or more digits. \w+ means one or more word characters (letters, digits or _). 
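
A sketch of the shorthand classes in action. The sample strings are made up.

```python
import re

# One or more digits, one whitespace character, one or more word characters.
count_regex = re.compile(r'\d+\s\w+')
counts = count_regex.findall('12 drummers, 11 pipers, 10 lords, 9 ladies')
print(counts)   # ['12 drummers', '11 pipers', '10 lords', '9 ladies']

# \w covers digits and underscore too, not just letters.
words = re.findall(r'\w+', 'user_42 logged in')
print(words)    # ['user_42', 'logged', 'in']
```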

Use [] for your own class. 
vowelRegex = re.compile(r'[aeiouAEIOU]') ## You can also do ranges with -. 
vowelRegex.findall('RoboCop eats baby food. BABY FOOD.') 
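
What the vowel class returns, plus a sketch of a range. The digit string is just an illustrative input.

```python
import re

vowel_regex = re.compile(r'[aeiouAEIOU]')
vowels = vowel_regex.findall('RoboCop eats baby food. BABY FOOD.')
print(vowels)
# ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']

# A range with -: [0-5] is the same as [012345].
low_digits = re.findall(r'[0-5]', '0123456789')
print(low_digits)   # ['0', '1', '2', '3', '4', '5']
```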

^ = the match must start at the beginning of the text. $ = the match must end at the end of the text. 
. = any character except \n. 
.* = any run of zero or more characters except newline. 
Passing re.DOTALL as the second argument, as in re.compile('.*', re.DOTALL), makes the . match anything, even \n. 
[^abc] matches any character except a, b or c. 
Passing re.I (short for re.IGNORECASE) as the second argument makes it match regardless of uppercase or lowercase! Cool. 
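
A sketch that runs each of these rules once. All sample strings and variable names are my own.

```python
import re

# ^ and $ together: the whole string must match.
whole_num = re.compile(r'^\d+$')
all_digits = whole_num.search('1234567890') is not None
mixed = whole_num.search('12345xyz67890') is not None
print(all_digits, mixed)   # True False

# . stops at a newline unless re.DOTALL is passed.
text = 'Serve the public trust.\nProtect the innocent.'
first_line = re.compile(r'.*').search(text).group()
everything = re.compile(r'.*', re.DOTALL).search(text).group()
print(first_line)          # Serve the public trust.
print(everything == text)  # True

# [^...] negates the class: here, anything that is not a vowel or a space.
consonants = re.findall(r'[^aeiouAEIOU ]', 'Robo Cop')
print(consonants)          # ['R', 'b', 'C', 'p']

# re.I ignores case.
robocop = re.compile(r'robocop', re.I).search('ROBOCOP is part man').group()
print(robocop)             # ROBOCOP
```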

regex.sub(x, y) returns a copy of the string y with every match replaced by x. Cool. 
namesRegex = re.compile(r'Agent \w+')
namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.') 
## 'CENSORED gave the secret documents to CENSORED.' 

\1 in the first argument to sub() means the text of group 1. Use a raw string (r'...') so the backslash is kept. 

agentNamesRegex = re.compile(r'Agent (\w)\w*')
agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.') 
## 'A**** told C**** that E**** knew B**** was a double agent.' 

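re.VERBOSE as the second argument to re.compile() makes it ignore whitespace and # comments inside the regex pattern itself (not in the text being searched), so the pattern can be spread over several lines. 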

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})? # extension
    )''', re.VERBOSE) 
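
A smaller sketch to check that a verbose pattern behaves like its one-line version. The short phone pattern and sample text are my own, not from the chapter.

```python
import re

compact = re.compile(r'(\d{3})-(\d{4})')
verbose = re.compile(r'''
    (\d{3})   # first 3 digits
    -         # separator
    (\d{4})   # last 4 digits
    ''', re.VERBOSE)

text = 'Call 555-4242 now'
compact_groups = compact.search(text).groups()
verbose_groups = verbose.search(text).groups()
print(compact_groups)   # ('555', '4242')
print(verbose_groups)   # ('555', '4242')

# Whitespace in the pattern is dropped, so 'a b' matches 'ab' here.
# To match a real space in verbose mode, use \s or an escaped space.
squeezed = re.compile(r'a b', re.VERBOSE).search('ab').group()
print(squeezed)         # ab
```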

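re.compile() takes only one value as its second argument, but the | (pipe, here the bitwise OR operator) lets you combine several flags into one. 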
someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL) 
## This ignores case and lets the dot include \n.
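
A sketch showing that both flags really are active when combined. The pattern and text are made up for illustration.

```python
import re

text = 'First line\nand the last line'

# Both flags: 'first' matches 'First', and .* is allowed to cross the newline.
both = re.compile(r'first.*LAST', re.IGNORECASE | re.DOTALL)
both_match = both.search(text).group()
print(repr(both_match))   # 'First line\nand the last'

# With only re.IGNORECASE, .* stops at the newline and nothing matches.
case_only = re.compile(r'first.*LAST', re.IGNORECASE)
case_only_match = case_only.search(text)
print(case_only_match)    # None
```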