Mastering Python Regular Expressions

Overview of this book

Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based on patterns. They are considered the Swiss Army knife of text processing: with regular expressions, powerful search, replacement, extraction, and validation of strings, as well as repetitive and complex tasks, are reduced to a simple pattern. Mastering Python Regular Expressions will teach you about regular expressions, starting from the basics irrespective of the language being used, and then show you how to use them in Python. You will learn the finer details of what Python supports and how to use it, and the differences between Python 2.x and Python 3.x. The book starts with a general review of the theory behind regular expressions, follows with an overview of Python's regular expression module, and then moves on to advanced topics such as grouping, look-around, and performance. You will explore how to leverage regular expressions in Python, some of their more advanced aspects, and how to measure and improve their performance. You will get a better understanding of how alternation and quantifiers work, and you will come to appreciate the importance of grouping before finally moving on to performance optimization, covering the RegexBuddy tool and backtracking. Mastering Python Regular Expressions provides all the information essential for a better understanding of regular expressions in Python.
Table of Contents (12 chapters)

Special cases with groups


Python provides some special forms of groups that can help us modify regular expressions, or even match a pattern only when a previous group exists in the match, much like an if statement.

Flags per group

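There is a way to apply the flags we saw in Chapter 2, Regular Expressions with Python, using a special form of grouping: (?iLmsux). Each letter inside the group enables the corresponding flag, and several letters can be combined in a single group: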

Letter    Flag
i         re.IGNORECASE
L         re.LOCALE
m         re.MULTILINE
s         re.DOTALL
u         re.UNICODE
x         re.VERBOSE
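As a minimal sketch, written for Python 3 and using an invented sample string, the following shows two letters combined in one group. Writing (?im) at the start of a pattern should behave the same as passing re.IGNORECASE | re.MULTILINE as the flags argument:

```python
import re

text = "First line\nsecond LINE"

# Inline form: (?im) turns on IGNORECASE and MULTILINE for the whole pattern.
inline = re.findall(r"(?im)^\w+ line$", text)

# Equivalent form: the same flags passed as an argument, OR-ed together.
passed = re.findall(r"^\w+ line$", text, re.IGNORECASE | re.MULTILINE)

# Without flags, ^ and $ only match at the ends of the string, so nothing matches.
plain = re.findall(r"^\w+ line$", text)

print(inline)  # ['First line', 'second LINE']
print(passed)  # ['First line', 'second LINE']
print(plain)   # []
```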

For example:

>>> re.findall(r"(?u)\w+", ur"ñ")
[u'\xf1']

The above example is the same as:

>>> re.findall(r"\w+", ur"ñ", re.U)
[u'\xf1']
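The session above is Python 2.x syntax. The ur"" prefix is a syntax error in Python 3.x, where plain string literals are already Unicode. A minimal Python 3 sketch of the same comparison follows. In Python 3, (?u) is accepted but redundant for str patterns. For contrast, it also uses the Python 3 ASCII flag, (?a), which does not appear in the table above:

```python
import re

# Python 3: a plain str literal is Unicode, so no u/ur prefix is needed.
word = "ñ"

# (?u) is accepted but redundant: str patterns are Unicode-aware by default.
with_inline = re.findall(r"(?u)\w+", word)
with_arg = re.findall(r"\w+", word, re.U)

# Python 3 only: (?a) (re.ASCII) restricts \w to ASCII, so "ñ" no longer matches.
ascii_only = re.findall(r"(?a)\w+", word)

print(with_inline)  # ['ñ']
print(with_arg)     # ['ñ']
print(ascii_only)   # []
```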

We've seen what these examples do several times in the previous chapter.

Remember that a flag set this way is applied to the whole expression, not only to the part of the pattern that follows the group.
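A small illustration, assuming Python 3, follows. The (?i) group placed at the start still affects both branches of the alternation. It is also reflected in the compiled pattern's flags attribute. The flag sits at the very beginning because Python 3.6 to 3.10 warn about global flags placed elsewhere, and Python 3.11 and later reject them:

```python
import re

# The inline flag is written once, at the start, yet it applies to
# every branch of the alternation, not just to "cat".
pattern = r"(?i)cat|dog"
found = re.findall(pattern, "CAT Dog cat DOG")
print(found)  # ['CAT', 'Dog', 'cat', 'DOG']

# The compiled pattern records the inline flag in its flags attribute.
compiled = re.compile(pattern)
print(bool(compiled.flags & re.IGNORECASE))  # True
```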

yes-pattern|no-pattern

This is a very useful case of groups. It tries to match one pattern (the yes-pattern) if a previous group was found, and a different one (the no-pattern) if that group was not found. In short, it's like an if-else statement...
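A minimal Python 3 sketch of this conditional group follows. Python writes it as (?(id/name)yes-pattern|no-pattern). The "phone number" format used here is hypothetical. A 3-digit prefix is either wrapped in parentheses, as in (555)1234, or followed by a hyphen, as in 555-1234. Whether group 1 matched decides which separator is required:

```python
import re

# (\()?          group 1: an optional opening parenthesis
# \d{3}          the 3-digit prefix
# (?(1)\)|-)     if group 1 matched, require ")"; otherwise require "-"
# \d{4}          the 4-digit remainder
phone = re.compile(r"^(\()?\d{3}(?(1)\)|-)\d{4}$")

for candidate in ["(555)1234", "555-1234", "(555-1234", "555)1234"]:
    print(candidate, bool(phone.match(candidate)))
# (555)1234 True
# 555-1234 True
# (555-1234 False
# 555)1234 False
```

The group can also be referenced by name, for example (?P<paren>\()? followed by (?(paren)\)|-). The no-pattern is optional: (?(1)\)) simply requires nothing when group 1 did not match.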