I am following https://automatetheboringstuff.com/2e/chapter7 

CHAPTER 7. Pattern matching with regular expressions. 

import re 
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') 
mo = phoneNumRegex.search('My number is 415-555-4242.') 
print('Phone number found: ' + mo.group()) 
## search() returns a Match object (or None if nothing matches); group() returns the matched text. 
## Parentheses organize the match into groups. Group 0 (or no argument) is the whole match. 
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') 
mo = phoneNumRegex.search('My number is 415-555-4242.') 
print(mo.group(1)) ## 415 only 
print(mo.group(0)) ## 415-555-4242. Same with mo.group(). 
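
	Since search() returns None when nothing matches, calling group() on the result
	can fail. A minimal sketch of a guard, assuming a made-up helper name split_phone;
	groups() returns all the groups at once as a tuple.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

def split_phone(text):
    """Return (area_code, main_number), or None when no phone number is found."""
    mo = phone_regex.search(text)
    if mo is None:       # search() found nothing
        return None
    return mo.groups()   # every group at once, as a tuple

print(split_phone('My number is 415-555-4242.'))  # ('415', '555-4242')
print(split_phone('No number here.'))             # None
```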

	In regular expressions, the following characters have special meanings:
	.  ^  $  *  +  ?  {  }  [  ]  \  |  (  )
	If you want to match these characters literally as part of your text pattern, 
	you need to escape them with a backslash. | is called a pipe and means "or": 
	the regex matches either the expression on its left or the one on its right. 
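
	A small sketch of escaping, with sample strings I made up. The first regex escapes
	the parentheses by hand; the second uses re.escape(), which adds the backslashes
	for you.

```python
import re

# \( and \) match literal parentheses; the unescaped ( ) still create groups.
area_regex = re.compile(r'\((\d\d\d)\) (\d\d\d-\d\d\d\d)')
mo = area_regex.search('My phone number is (415) 555-4242.')
area_code = mo.group(1)
print(area_code)  # 415

# re.escape() backslash-escapes every special character in a string.
price_regex = re.compile(re.escape('$5.00'))
found = price_regex.search('Total: $5.00 (tax included)').group()
missed = price_regex.search('Total: $5x00')  # the escaped . no longer matches x
print(found)   # $5.00
print(missed)  # None
```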
	

import re 
heroRegex = re.compile(r"Batman|Tina Fey") 
mo1 = heroRegex.search("Batman and Tina Fey") 
print(mo1.group()) ## search() finds the first occurrence. 
## To find all, use findall(). It returns a list. 
## If the regex has groups, it returns a list of tuples of strings, one tuple per match 
## (with a single group, just a list of that group's strings). 
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)') # has groups
print(phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000'))
## [('415', '555', '9999'), ('212', '555', '0000')]

	Optional matching. 

import re 
batRegex = re.compile(r'Bat(wo)?man') ## The group preceding the ? is optional, 0 or 1 times.  
mo3 = batRegex.search('The Adventures of Batman')
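print(mo3 is None) ## False: the (wo) group is optional, so Batman matches. 
## * is for 0 or more times. + is for 1 or more. 
batRegex = re.compile(r'Bat(wo)+man') 
mo3 = batRegex.search('The Adventures of Batman')
print(mo3 is None) ## True: + needs at least one wo, so Batman does not match. 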

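	(45){3} matches the group exactly 3 times: 454545. 
	{3,5} means between 3 and 5 repetitions. 
	{,5} means 0 to 5 repetitions. 
	Quantifiers are greedy by default: they take the longest match possible. 
	For the shortest (non-greedy) match, put a ? after the quantifier, as in {3,5}? 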
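
	A quick check of greedy versus non-greedy matching. The test string is just an
	example: five repetitions of Ha.

```python
import re

text = 'HaHaHaHaHa'  # five repetitions of 'Ha'

greedy = re.compile(r'(Ha){3,5}').search(text).group()   # takes as many as allowed
lazy = re.compile(r'(Ha){3,5}?').search(text).group()    # stops at the minimum
exact = re.compile(r'(45){3}').search('x454545y').group()

print(greedy)  # HaHaHaHaHa
print(lazy)    # HaHaHa
print(exact)   # 454545
```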

	Shorthand characters. 
	\d = digit. 
	\D = nondigit character. 
	\w = any letter or digit or _. 
	\W = any character that is not a letter, digit or _. 
	\s = space, tab or newline. 
	\S = any character that is not a space, tab or newline. 

	\d+ means one or more digits. \w+ means one or more word characters (letters, digits or _). 
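
	The shorthand classes can be tried out with findall(). The sample strings are
	only for illustration.

```python
import re

text = '12 drummers, 11 pipers, 10 lords'

# One or more digits, one whitespace character, one or more word characters.
pairs = re.compile(r'\d+\s\w+').findall(text)
digits = re.compile(r'\d+').findall(text)

# \w also covers digits and the underscore, so user_42 stays in one piece.
words = re.compile(r'\w+').findall('user_42 ok')
non_word = re.compile(r'\W+').findall('a_b c!d')

print(pairs)     # ['12 drummers', '11 pipers', '10 lords']
print(digits)    # ['12', '11', '10']
print(words)     # ['user_42', 'ok']
print(non_word)  # [' ', '!']
```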

	Use [] to define your own character class. 
	
import re 
vowelRegex = re.compile(r'[aeiouAEIOU]') ## You can also do ranges with -. 
print(vowelRegex.findall('RoboCop eats baby food. BABY FOOD.')) 
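
	A sketch of the range syntax, plus a negated class (a ^ right after the opening
	bracket matches every character that is NOT in the class).

```python
import re

text = 'RoboCop eats baby food. BABY FOOD.'

# [a-z] is a range: any lowercase letter. + collects runs of them.
lowercase = re.compile(r'[a-z]+').findall(text)

# [^aeiouAEIOU] matches any single character that is not a vowel.
consonants = re.compile(r'[^aeiouAEIOU]').findall('RoboCop')

print(lowercase)   # ['obo', 'op', 'eats', 'baby', 'food']
print(consonants)  # ['R', 'b', 'C', 'p']
```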

	^ = must occur at beginning of text. $ = must occur at end. 
	. = any character except \n. 
	.* = any run of zero or more characters except newline. 
	Passing re.DOTALL as the second argument, as in re.compile('.*', re.DOTALL), makes the . match anything, even \n. 
	[^abc] matches any character except a, b or c. 
	Passing re.I (short for re.IGNORECASE) as the second argument makes it match regardless of uppercase or lowercase! 
	sub(x, y) returns a copy of y in which every match is replaced with x. 

import re 
namesRegex = re.compile(r'Agent \w+')
print(namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')) 
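
	Going back to ^, $, .* and the flags noted before sub(): a minimal sketch to
	check each claim. The sample strings are assumptions chosen for illustration.

```python
import re

begins_hello = re.compile(r'^Hello')   # must start the text
ends_digit = re.compile(r'\d$')        # must end the text
whole_number = re.compile(r'^\d+$')    # the entire text must be digits

print(begins_hello.search('He said Hello.'))              # None
print(ends_digit.search('Your number is 42').group())     # 2
print(whole_number.search('12345xyz'))                    # None

text = 'Serve the public trust.\nProtect the innocent.'
first_line = re.compile(r'.*').search(text).group()             # stops at \n
everything = re.compile(r'.*', re.DOTALL).search(text).group()  # crosses \n
robocop = re.compile(r'robocop', re.I).search('ROBOCOP protects').group()

print(first_line)  # Serve the public trust.
print(robocop)     # ROBOCOP
```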

	In the first argument to sub(), \1 means the text of group 1 (write it as a raw string, r'\1'). 

import re 
agentNamesRegex = re.compile(r'Agent (\w)\w*')
print(agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')) 

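	re.VERBOSE as the second argument to re.compile() 
	makes it ignore whitespace and # comments inside the regex pattern itself 
	(not in the text being searched), so a long pattern can be spread over several lines. 

import re 
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})? # extension
    )''', re.VERBOSE) 

	The | (pipe, here Python's bitwise OR operator) lets you combine flags, 
	e.g. re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE). 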
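
	A short sketch of re.VERBOSE combined with another flag using |. The names
	agent_regex and spaced are made up for this example. Note that whitespace is
	ignored in the pattern only, never in the text being searched.

```python
import re

# Spaces, newlines and # comments in the pattern are ignored under re.VERBOSE,
# so whitespace that should be matched has to be written as \s.
agent_regex = re.compile(r'''
    agent \s+   # the word agent, then whitespace
    (\w+)       # group 1: the name
    ''', re.VERBOSE | re.IGNORECASE)

names = agent_regex.findall('AGENT Alice met agent Bob.')
print(names)  # ['Alice', 'Bob']

# The space in the pattern is dropped, so this regex is really just 'ab'.
spaced = re.compile(r'a b', re.VERBOSE)
print(spaced.search('ab').group())  # ab
print(spaced.search('a b'))         # None
```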