I am following https://automatetheboringstuff.com/2e/chapter7 (Chapter 7, Pattern Matching with Regular Expressions).

    import re
    phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
    mo = phoneNumRegex.search('My number is 415-555-4242.')
    print('Phone number found: ' + mo.group())

search() returns a Match object for the first match (or None if there is none), and group() returns the matched text.

Parentheses organize the match into groups. Group 0 is the whole match.

    phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
    mo = phoneNumRegex.search('My number is 415-555-4242.')
    mo.group(1) ## '415' only
    mo.group(0) ## '415-555-4242'. Same as mo.group().

In regular expressions, the following characters have special meanings: . ^ $ * + ? { } [ ] \ | ( )
If you want to match these characters literally as part of your text pattern, you need to escape them with a backslash.

| is called a pipe and means "or".

    heroRegex = re.compile(r"Batman|Tina Fey")
    mo1 = heroRegex.search("Batman and Tina Fey")
    mo1.group() ## 'Batman'

search() finds only the first occurrence. To find all, use findall(). It returns a list of strings. If the regex has two or more groups, it returns a list of tuples of strings instead, one tuple per match. With exactly one group, it returns a list of that group's strings.

    phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)') # has groups
    phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')
    ## [('415', '555', '9999'), ('212', '555', '0000')]

Optional matching.

    batRegex = re.compile(r'Bat(wo)?man') ## The group preceding the ? is optional: it matches 0 or 1 times.
    phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

* is for 0 or more times. + is for 1 or more.

    batRegex = re.compile(r'Bat(wo)+man')
    mo3 = batRegex.search('The Adventures of Batman')
    mo3 == None ## True, because + needs at least one 'wo'.

(45){3} matches 454545. {3,5} means between 3 and 5 instances. {,5} means 0 to 5 instances.

Matching is greedy by default: the longest match possible. For non-greedy matching (the shortest match possible), put a ? after the quantifier, as in {3,5}?

Shorthand character classes. Cool.
\d = digit. \D = any non-digit character. \w = any letter or digit or _. \W = any character that is not a letter, digit or _. \s = space, tab or newline. \S = any character that is not a space, tab or newline.
\d+ means one or more digits.
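To check my understanding of the notes up to this point, here is a small sketch that exercises groups, findall() with and without groups, greedy versus non-greedy matching, and \d+. The 'HaHaHaHaHa' and 'drummers' strings and the snake_case variable names are my own choices for illustration, not part of the notes above.

```python
import re

# Groups: group(0) is the whole match, group(1) the first parenthesised part.
phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')
whole, area_code = mo.group(0), mo.group(1)

# findall(): plain strings without groups, tuples with two or more groups.
text = 'Cell: 415-555-9999 Work: 212-555-0000'
no_groups = re.findall(r'\d\d\d-\d\d\d-\d\d\d\d', text)
with_groups = re.findall(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', text)

# Greedy takes as many repetitions as allowed, non-greedy as few as allowed.
greedy = re.search(r'(Ha){3,5}', 'HaHaHaHaHa').group()
lazy = re.search(r'(Ha){3,5}?', 'HaHaHaHaHa').group()

# \d+ grabs each run of one or more digits (sample sentence is made up).
digit_runs = re.findall(r'\d+', '12 drummers, 11 pipers')

print(whole, area_code)   # 415-555-4242 415
print(no_groups)          # ['415-555-9999', '212-555-0000']
print(with_groups)        # [('415', '555', '9999'), ('212', '555', '0000')]
print(greedy, lazy)       # HaHaHaHaHa HaHaHa
print(digit_runs)         # ['12', '11']
```

The two findall() calls run on the same text, so the only difference in the output comes from whether the pattern has groups.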
\w+ means one or more word characters (letters, digits or _).

Use [] for your own class.

    vowelRegex = re.compile(r'[aeiouAEIOU]') ## You can also do ranges with -, e.g. [a-zA-Z0-9].
    vowelRegex.findall('RoboCop eats baby food. BABY FOOD.')

^ = the match must occur at the beginning of the text. $ = it must occur at the end. . = any character except \n. .* = any run of characters (including none) except newline.

Passing re.DOTALL as the second argument, as in re.compile('.*', re.DOTALL), makes the . match anything, even \n.

[^abc] matches any character except a, b or c.

The argument re.I (short for re.IGNORECASE) makes it match regardless of uppercase or lowercase! Cool.

sub(x, y) replaces every match in y with x. Cool.

    namesRegex = re.compile(r'Agent \w+')
    namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
    ## 'CENSORED gave the secret documents to CENSORED.'

\1 in the replacement string of sub() means the text of group 1.

    agentNamesRegex = re.compile(r'Agent (\w)\w*')
    agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
    ## 'A**** told C**** that E**** knew B**** was a double agent.'

re.VERBOSE as the second argument to re.compile() makes it ignore whitespace and # comments inside the regex pattern itself (not in the text being searched), so the pattern can be spread over several lines.

    phoneRegex = re.compile(r'''(
        (\d{3}|\(\d{3}\))?            # area code
        (\s|-|\.)?                    # separator
        \d{3}                         # first 3 digits
        (\s|-|\.)                     # separator
        \d{4}                         # last 4 digits
        (\s*(ext|x|ext.)\s*\d{2,5})?  # extension
        )''', re.VERBOSE)

The | (pipe, here the bitwise or operator) lets you combine these flag arguments.

    someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL) ## This ignores case and lets the dot match \n.
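A quick sketch to try out sub() with \1, re.VERBOSE, and combined flags together. The sample strings and variable names are made up for illustration. One deliberate difference from the pattern in my notes: I escaped the dot in `ext\.` here so it only matches a literal period; that is my own tweak, not something the chapter does.

```python
import re

# sub() with a backreference: \1 is the text captured by group 1.
censored = re.sub(r'Agent (\w)\w*', r'\1****', 'Agent Alice told Agent Bob.')

# re.VERBOSE: whitespace and # comments inside the pattern are ignored.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code, optional
    (\s|-|\.)?                      # separator, optional
    \d{3}                           # first 3 digits
    (\s|-|\.)                       # separator
    \d{4}                           # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})?   # extension, optional (dot escaped here)
    )''', re.VERBOSE)

sample = 'Call (415) 555-1234 ext 22 or 212.555.0000 today.'
# finditer() yields Match objects, so group(0) is the full phone number.
numbers = [m.group(0) for m in phone_regex.finditer(sample)]

# Flags combined with |: ignore case, and let . match newlines too.
both_flags = re.compile(r'first.*second', re.IGNORECASE | re.DOTALL)
spanning = both_flags.search('FIRST line\nSECOND line').group()

print(censored)   # A**** told B****.
print(numbers)    # ['(415) 555-1234 ext 22', '212.555.0000']
print(repr(spanning))
```

I used finditer() instead of findall() for the phone numbers because the pattern has several groups, so findall() would return tuples of all the groups and I only want the full match.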