Mastering Python Regular Expressions

Overview of this book

Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based on patterns. They are considered the Swiss army knife of text processing: powerful search, replacement, extraction, and validation of strings, along with other repetitive and complex tasks, are reduced to a simple pattern. Mastering Python Regular Expressions will teach you about regular expressions, starting from the basics that apply irrespective of the language being used, and then show you how to use them in Python. You will learn the finer details of what Python supports and how to use it, as well as the differences between Python 2.x and Python 3.x. The book starts with a general review of the theory behind regular expressions, follows with an overview of Python's regular expression module implementation, and then moves on to advanced topics such as grouping, look around, and performance. You will explore how to leverage regular expressions in Python, along with some of their advanced aspects, and learn how to measure and improve their performance. You will gain a better understanding of how alternation and quantifiers work, and of why grouping matters, before finally moving on to performance topics such as the RegexBuddy tool and backtracking. Mastering Python Regular Expressions provides all the information essential for a better understanding of regular expressions in Python.
Table of Contents (12 chapters)

Index

A

  • alternation / Alternation
    • common parts, extracting / Extract common parts in alternation
  • atomic groups / Atomic groups

B

  • backreferences
    • about / Backreferences
  • backslash character
    • used, in string literals / Backslash in string literals, String Python 2.x
  • backtracking
    • about / Understanding the Python regex engine, Backtracking
  • boundary matchers / Boundary Matchers
  • building blocks, for Python regex
    • RegexObject / RegexObject
    • MatchObject / MatchObject
    • module operations / Module operations

C

  • character classes / Character classes
  • common parts
    • extracting, in alternation / Extract common parts in alternation
  • compilation flags
    • about / Compilation flags
    • re.IGNORECASE / re.IGNORECASE or re.I
    • re.I / re.IGNORECASE or re.I
    • re.MULTILINE / re.MULTILINE or re.M
    • re.M / re.MULTILINE or re.M
    • re.DOTALL / re.DOTALL or re.S
    • re.S / re.DOTALL or re.S
    • re.LOCALE / re.LOCALE or re.L
    • re.L / re.LOCALE or re.L
    • re.UNICODE / re.UNICODE or re.U
    • re.U / re.UNICODE or re.U
    • re.VERBOSE / re.VERBOSE or re.X
    • re.X / re.VERBOSE or re.X
  • compiled patterns
    • reusing / Reuse compiled patterns
  • count argument / sub(repl, string, count=0)

D

  • DOTALL flag / re.DOTALL or re.S

E

  • end([group]) operation / end([group])
  • endpos parameter / search(string[, pos[, endpos]])
  • escape() operation / escape()
  • expand(template) operation / expand(template)

F

  • findall(string[, pos[, endpos]]) operation / findall(string[, pos[, endpos]])
  • findall operation / findall(string[, pos[, endpos]])
  • finditer(string[, pos[, endpos]]) operation / finditer(string[, pos[, endpos]])
  • flags per group
    • about / Flags per group

G

  • greedy behavior
    • about / Greedy and reluctant quantifiers
  • group([group1, …]) operation / group([group1, …])
  • groupdict([default]) operation / groupdict([default])
  • groupdict method / groupdict([default])
  • grouping
    • parentheses () / Introduction
    • capturing / Introduction
  • groups([default]) operation / groups([default])

L

  • literals / Literals
  • look ahead
    • about / Look ahead
  • look ahead and substitutions
    • about / Look around and substitutions
  • look around
    • used, in groups / Look around and groups
  • look behind
    • about / Look behind
    • negative look behind / Negative look behind

M

  • match(string[, pos[, endpos]]) method / match(string[, pos[, endpos]])
  • MatchObject
    • about / MatchObject
    • group([group1, …]) operation / group([group1, …])
    • groups([default]) operation / groups([default])
    • groupdict([default]) operation / groupdict([default])
    • start([group]) operation / start([group])
    • end([group]) operation / end([group])
    • span([group]) operation / span([group])
    • expand(template) operation / expand(template)
  • maxsplit parameter / split(string, maxsplit=0)
  • module operations
    • escape() operation / escape()
    • purge() operation / purge()

N

  • named groups
    • about / Named groups
  • negative look ahead
    • about / Negative look ahead
  • negative look behind / Negative look behind
  • non-BMP
    • URL / What's new in Python 3
  • non-capturing groups
    • about / Non-capturing groups
    • atomic groups / Atomic groups
    • using / Use non-capturing groups when appropriate
  • non-greedy behavior
    • about / Greedy and reluctant quantifiers
  • Nondeterministic Finite Automata (NFA) / Understanding the Python regex engine
  • normalize_orders function / sub(repl, string, count=0)

O

  • overlapping groups
    • about / Overlapping groups

P

  • parentheses () / Introduction
  • POSIX style support
    • URL / History, relevance, and purpose
  • pos parameter / search(string[, pos[, endpos]])
  • possessive quantifier
    • about / Greedy and reluctant quantifiers
  • predefined character classes / Predefined character classes
  • purge() operation / purge()
  • Python
    • and other flavors, difference between / Differences between Python and other flavors
    • regular expression, benchmarking with / Benchmarking regular expressions with Python
  • Python 3
    • about / What's new in Python 3
  • Python 3.3
    • URL / What's new in Python 3

Q

  • quantifiers
    • about / Quantifiers

R

  • re.DOTALL / re.DOTALL or re.S
  • re.escape method / Literals
  • re.I / re.IGNORECASE or re.I
  • re.IGNORECASE / re.IGNORECASE or re.I
  • re.L / re.LOCALE or re.L
  • re.LOCALE / re.LOCALE or re.L
  • re.M / re.MULTILINE or re.M
  • re.MULTILINE / re.MULTILINE or re.M
  • re.S / re.DOTALL or re.S
  • re.U / re.UNICODE or re.U
  • re.UNICODE / re.UNICODE or re.U
  • re.VERBOSE / re.VERBOSE or re.X
  • re.X / re.VERBOSE or re.X
  • recommendations, regular expression
    • compiled patterns, reusing / Reuse compiled patterns
    • common parts, extracting in alternation / Extract common parts in alternation
    • non-capturing groups, using / Use non-capturing groups when appropriate
  • RegexBuddy
    • about / The RegexBuddy tool
    • URL / The RegexBuddy tool
  • regex module
    • URL / Atomic groups
  • RegexObject
    • about / RegexObject
    • searching / Searching
    • match(string[, pos[, endpos]]) method / match(string[, pos[, endpos]])
    • search(string[, pos[, endpos]]) operation / search(string[, pos[, endpos]])
    • findall(string[, pos[, endpos]]) operation / findall(string[, pos[, endpos]])
    • finditer(string[, pos[, endpos]]) operation / finditer(string[, pos[, endpos]])
    • string, modifying / Modifying a string
    • split(string, maxsplit=0) operation / split(string, maxsplit=0)
    • sub(repl, string, count=0) operation / sub(repl, string, count=0)
    • subn(repl, string, count=0) operation / subn(repl, string, count=0)
  • Regular-Expressions.info
    • URL / The RegexBuddy tool
  • regular expression
    • history / History, relevance, and purpose
    • literals / Literals
    • character classes / Character classes
    • predefined character classes / Predefined character classes
    • alternation / Alternation
    • quantifiers / Quantifiers
    • boundary matchers / Boundary Matchers
    • benchmarking, with Python / Benchmarking regular expressions with Python
  • regular expression syntax
    • about / The regular expression syntax
  • repl argument / sub(repl, string, count=0)

S

  • search(string[, pos[, endpos]]) operation / search(string[, pos[, endpos]])
  • span([group]) operation / span([group])
  • split(string, maxsplit=0) operation / split(string, maxsplit=0)
  • split operation / split(string, maxsplit=0)
  • start([group]) operation / start([group])
  • String Python 2.x
    • about / String Python 2.x
  • sub(repl, string, count=0) operation / sub(repl, string, count=0)
  • subn(repl, string, count=0) operation / subn(repl, string, count=0)

U

  • Unicode
    • about / Unicode

Y

  • yes-pattern|no-pattern
    • about / yes-pattern|no-pattern