Mastering Python Regular Expressions
Throughout Chapter 2, Regular Expressions with Python, we've seen several operations where there was a warning about overlapping groups: for example, the findall operation. This is something that seems to confuse a lot of people. So, let's try to bring some clarity with a simple example:
>>> re.findall(r'(a|b)+', 'abaca')
['a', 'a']
What's happening here? Why does the preceding expression give us 'a' and 'a' instead of 'aba' and 'a'?
Let's look at it step by step to understand the solution:

[Figure: Overlapping groups matching process]
As we can see in the preceding figure, the characters aba are matched, but the captured group contains only a. Even though the regex groups every character, the group keeps only the last one it matched, the final a. Keep this in mind because it's the key to understanding how it works. Stop for a moment and think about it: we're asking the regex engine to capture every group made up of a or b, but the group itself covers just one character, and that's the key...
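To make the behaviour described above concrete, here is a minimal sketch that compares the whole match with the captured group for the same input. The variable names (`text`, `pairs`, `last_only`, `whole`, `outer`) are illustrative assumptions, and the two alternative patterns are only one possible way to recover 'aba' and 'a', not necessarily the approach the chapter goes on to take.

```python
import re

text = 'abaca'

# finditer exposes both the whole match (group 0) and the capture (group 1).
# The group (a|b) is repeated by +, so each repetition overwrites the
# previous capture and only the last matched character survives.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r'(a|b)+', text)]
print(pairs)        # [('aba', 'a'), ('a', 'a')]

# findall returns the group when the pattern has exactly one group,
# which is why the original call yields only the last character.
last_only = re.findall(r'(a|b)+', text)
print(last_only)    # ['a', 'a']

# Assumption: a non-capturing group (?:...) makes findall return whole matches.
whole = re.findall(r'(?:a|b)+', text)
print(whole)        # ['aba', 'a']

# Assumption: wrapping the repetition in an outer group captures the full run.
outer = re.findall(r'((?:a|b)+)', text)
print(outer)        # ['aba', 'a']
```

In all four calls the engine matches the same two substrings. What changes is which part of each match `findall` reports.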