I am following https://automatetheboringstuff.com/2e/chapter7

CHAPTER 7. Pattern matching with regular expressions.

    import re
    phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
    mo = phoneNumRegex.search('My number is 415-555-4242.')
    print('Phone number found: ' + mo.group())
    ## search() returns a Match object for the first match (or None if there is none).
    ## group() returns the matched text.

    ## Parentheses organize the match into groups. Group 0 is the whole match.
    phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
    mo = phoneNumRegex.search('My number is 415-555-4242.')
    print(mo.group(1))  ## 415 only
    print(mo.group(0))  ## 415-555-4242. Same as mo.group().

In regular expressions, the following characters have special meanings: . ^ $ * + ? { } [ ] \ | ( )
If you want to match these characters literally as part of your text pattern, you need to escape them with a backslash.

| is called a pipe and means "or".

    import re
    heroRegex = re.compile(r"Batman|Tina Fey")
    mo1 = heroRegex.search("Batman and Tina Fey")
    print(mo1.group())  ## Batman
    ## search() finds only the first occurrence.
    ## To find all, use findall(). It returns a list of strings.
    ## If the regex has groups, it returns a list of tuples of strings.
    phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')  # has groups
    print(phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000'))
    ## [('415', '555', '9999'), ('212', '555', '0000')]

Optional matching.

    import re
    batRegex = re.compile(r'Bat(wo)?man')
    ## The group preceding the ? is optional: it matches 0 or 1 times.
    mo3 = batRegex.search('The Adventures of Batman')
    print(mo3 is None)  ## False, because 'Batman' matches
    ## * is for 0 or more times. + is for 1 or more.
    batRegex = re.compile(r'Bat(wo)+man')
    mo3 = batRegex.search('The Adventures of Batman')
    print(mo3 is None)  ## True, because + requires at least one 'wo'

(45){3} matches 454545. {3,5} means between 3 and 5 repetitions. {,5} means 0 to 5 repetitions.

Greedy. Quantifiers match the longest string possible by default. For the shortest match, add a ? after the quantifier, as in {3,5}?

Shorthand characters.
\d = digit. \D = any non-digit character.
\w = any letter, digit or _. \W = any character that is not a letter, digit or _.
\s = any whitespace character such as space, tab or newline. \S = any character that is not whitespace.
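To make the greedy versus non-greedy distinction and the shorthand classes concrete, here is a small sketch. The sample strings and variable names are my own choices for illustration, not something the chapter requires.

```python
import re

text = 'HaHaHaHaHa'

# Greedy: {3,5} takes as many repetitions as it can (here, all five).
greedy_regex = re.compile(r'(Ha){3,5}')
greedy_match = greedy_regex.search(text).group()

# Non-greedy: the trailing ? makes {3,5} stop at the minimum of three.
lazy_regex = re.compile(r'(Ha){3,5}?')
lazy_match = lazy_regex.search(text).group()

print(greedy_match)  # HaHaHaHaHa
print(lazy_match)    # HaHaHa

# Shorthand classes combined: one or more digits, one whitespace
# character, then one or more word characters.
count_regex = re.compile(r'\d+\s\w+')
found = count_regex.findall('12 drummers, 11 pipers, 10 lords')
print(found)  # ['12 drummers', '11 pipers', '10 lords']
```

Both patterns start matching at the same position; the only difference is how many repetitions the quantifier consumes before the regex is satisfied.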
\d+ means one or more digits. \w+ means one or more word characters (letters, digits or _).

Use [] for your own character class.

    import re
    vowelRegex = re.compile(r'[aeiouAEIOU]')  ## You can also do ranges with -, such as [a-zA-Z0-9].
    print(vowelRegex.findall('RoboCop eats baby food. BABY FOOD.'))
    ## ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']

^ = the match must occur at the beginning of the text. $ = the match must occur at the end.
. = any character except \n. .* = any run of characters (including none) except newline.
Passing re.DOTALL as the second argument, as in re.compile('.*', re.DOTALL), makes the . match anything, even \n.
[^abc] matches any character except a, b or c.
The argument re.I (short for re.IGNORECASE) makes it match regardless of uppercase or lowercase!

sub(x, y) replaces every match in y with x.

    import re
    namesRegex = re.compile(r'Agent \w+')
    print(namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.'))
    ## CENSORED gave the secret documents to CENSORED.

\1 in the replacement string of sub() means the text of group 1.

    import re
    agentNamesRegex = re.compile(r'Agent (\w)\w*')
    print(agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'))
    ## A**** told C**** that E**** knew B**** was a double agent.

re.VERBOSE as the second argument to re.compile() makes it ignore whitespace and # comments inside the regex pattern itself (not in the text being searched), so the pattern can be spread over several lines.

    import re
    phoneRegex = re.compile(r'''(
        (\d{3}|\(\d{3}\))?              # area code
        (\s|-|\.)?                      # separator
        \d{3}                           # first 3 digits
        (\s|-|\.)                       # separator
        \d{4}                           # last 4 digits
        (\s*(ext|x|ext.)\s*\d{2,5})?    # extension
        )''', re.VERBOSE)

The | or pipe (here the bitwise or operator, not the regex one) lets you combine these flags in the second argument, for example re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE).
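As a rough sketch of how the flags behave in practice, the snippet below combines re.IGNORECASE and re.DOTALL with |, then uses a shortened version of the verbose phone pattern with findall(). The sample text and the names span_regex, phone_regex and numbers are made up for illustration.

```python
import re

sample = 'FIRST line\nsecond line\nLAST'

# Without re.DOTALL the . cannot cross the newlines, so nothing matches.
no_dotall = re.compile(r'first.*last', re.IGNORECASE).search(sample)

# Flags are combined with the bitwise or operator |.
span_regex = re.compile(r'first.*last', re.IGNORECASE | re.DOTALL)
with_dotall = span_regex.search(sample)

print(no_dotall is None)           # True
print(with_dotall.group() == sample)  # True

# A trimmed-down verbose phone pattern (no extension part).
# Whitespace and # comments inside the pattern are ignored.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # area code, with or without parentheses
    (\s|-|\.)?           # separator
    \d{3}                # first 3 digits
    (\s|-|\.)            # separator
    \d{4}                # last 4 digits
    )''', re.VERBOSE)

text = 'Call 415-555-1011 or (212) 555.0000 today.'

# findall() returns one tuple per match because the pattern has groups;
# index 0 is the outer group, which holds the whole phone number.
numbers = [groups[0] for groups in phone_regex.findall(text)]
print(numbers)  # ['415-555-1011', '(212) 555.0000']
```

Wrapping the whole pattern in an outer group is what makes the full number available at index 0 of each tuple that findall() returns.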