Lookarounds are patterns that allow for conditional matching, based on what is around (before or after) the pattern we are looking for.
They are divided into two main categories:
- Lookaheads: Check if a pattern follows another pattern.
- Lookbehinds: Check if a pattern precedes another pattern.
Neither type of lookaround consumes characters in the text string. That is, the lookaround only checks a condition at the current position, and the text it inspects is not part of the final match.
To use lookarounds, we employ the following syntax:
| Type | Syntax |
|---|---|
| Positive Lookahead | (?=pattern) |
| Negative Lookahead | (?!pattern) |
| Positive Lookbehind | (?<=pattern) |
| Negative Lookbehind | (?<!pattern) |
Could they have made it more complicated and less intuitive? Possibly not 😆
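To see what "not consuming characters" means in practice, here is a minimal sketch using Python's `re` module. The sample string and variable names are illustrative assumptions. It compares a pattern that includes the exclamation mark in the match with one that only looks ahead for it.

```python
import re

sample = "Hello!"  # illustrative string, chosen only for this sketch

# Consuming version: the "!" is part of the match.
consuming = re.search(r'\w+!', sample)
print(consuming.group())  # Hello!

# Lookahead version: the "!" is checked but not included in the match.
lookahead = re.search(r'\w+(?=!)', sample)
print(lookahead.group())  # Hello
print(lookahead.end())    # 5, the match stops right before the "!"
```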
How to use lookahead
Positive Lookahead
The positive lookahead (?=pattern) checks if a specific pattern immediately follows another pattern. If the condition is met, the match is performed.
For example, imagine we want to find all words that are followed by an exclamation mark.
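A minimal sketch of this in Python, assuming the `re` module and an illustrative sample sentence:

```python
import re

# Illustrative sentence, chosen only for this sketch.
text = "Hello! How are you? Great!"
pattern = r'\w+(?=!)'  # a word, only if "!" comes right after it
matches = re.findall(pattern, text)
print(matches)  # ['Hello', 'Great']
```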
In this case,
- `\w+` matches the words
- `(?=!)` ensures they are followed by an exclamation mark
Negative Lookahead
The negative lookahead (?!pattern) checks that a specific pattern does not follow another pattern. If the condition is met (i.e., the pattern is not found), the match is performed.
For example, suppose we want to find words that are not followed by a question mark.
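A minimal sketch of this in Python, assuming the `re` module and an illustrative sample sentence:

```python
import re

# Illustrative sentence, chosen only for this sketch.
text = "Hello! How are you? Great!"
pattern = r'\w+\b(?!\?)'  # a whole word, only if "?" does not come right after it
matches = re.findall(pattern, text)
print(matches)  # ['Hello', 'How', 'are', 'Great']
```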
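Here,
- `\w+\b` matches the words
- `(?!\?)` ensures they are not followed by a question mark

The `\b` matters: without it, the engine could backtrack and match part of a word (for example, `yo` from `you?`, because `yo` is followed by `u` rather than `?`).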
How to use lookbehind
Positive Lookbehind
The positive lookbehind (?<=pattern) checks that a specific pattern precedes another pattern. If the condition is met, the match is performed.
Imagine we want to find all numbers that are preceded by a dollar sign.
```python
import re

text = "The price is $10 and the discount is $2."
pattern = r'(?<=\$)\d+'  # digits immediately preceded by a literal dollar sign
matches = re.findall(pattern, text)
print(matches)  # ['10', '2']
```
In this case,
- `(?<=\$)` ensures the number is preceded by a dollar sign
- Here, `10` and `2` meet the condition
Negative Lookbehind
The negative lookbehind (?<!pattern) checks that a specific pattern does not precede another pattern. If the condition is met (i.e., the pattern is not found), the match is performed.
Suppose we want to find numbers that are not preceded by a dollar sign.
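A minimal sketch in Python, assuming the `re` module and an illustrative sentence containing the amounts `$10` and `$2`:

```python
import re

# Illustrative sentence, chosen only for this sketch.
text = "The price is $10 and the discount is $2."
pattern = r'(?<!\$)\d+'  # digits whose first digit is not right after "$"
matches = re.findall(pattern, text)
print(matches)  # ['0']
```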
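Here,
- `(?<!\$)` ensures the number is not preceded by a dollar sign
- In this case, with the same text as in the previous example, only the `0` from the first `$10` meets the condition: the `1` and the `2` are each preceded by `$`, but the `0` is preceded by `1`, so a match can start there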
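If the goal is to match only whole numbers that have no dollar sign, one common workaround is to also forbid a digit right before the match, so the engine cannot start in the middle of a number. This is a sketch under that assumption, with an illustrative sentence, and not the only way to do it:

```python
import re

# Illustrative sentence, chosen only for this sketch.
text = "The price is $10, the discount is $2, and there are 35 units."
# The character class rejects both "$" and any digit before the match.
pattern = r'(?<![$\d])\d+'
matches = re.findall(pattern, text)
print(matches)  # ['35']
```

The lookbehind still has a fixed width of one character, which Python's `re` module requires.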
