regex-lookarounds

Lookarounds in Regex

  • 3 min

Lookarounds are patterns that allow for conditional matching, based on what is around (before or after) the pattern we are looking for.

They are divided into two main categories:

  • Lookaheads: Check if a pattern follows another pattern.
  • Lookbehinds: Check if a pattern precedes another pattern.

Both types of lookarounds do not consume characters in the text string. That is, they are not part of the final match.

To use lookarounds, we employ the following syntax:

TypeSyntax
Positive Lookahead(?=pattern)
Negative Lookahead(?!pattern)
Positive Lookbehind(?<=pattern)
Negative Lookbehind(?<!pattern)

Could they have made it more complicated and less intuitive? Possibly not 😆

How to use lookahead

Positive Lookahead

The positive lookahead (?=pattern) checks if a specific pattern immediately follows another pattern. If the condition is met, the match is performed.

For example, imagine we want to find all words that are followed by an exclamation mark.

¡Hola! ¿Cómo estas? ¡Esto es genial!

In this case,

  • \w+ matches the words
  • (?=!) ensures they are followed by an exclamation mark.

Negative Lookahead

The negative lookahead (?!pattern) checks that a specific pattern does not follow another pattern. If the condition is met (i.e., the pattern is not found), the match is performed.

For example, suppose we want to find words that are not followed by a question mark.

¡Hola! ¿Como estas? ¡Esto es increible!

Here,

  • \w+\b matches the words
  • (?!\?) ensures they are not followed by a question mark.

How to use lookbehind

Positive Lookbehind

The positive lookbehind (?<=pattern) checks that a specific pattern precedes another pattern. If the condition is met, the match is performed.

Imagine we want to find all numbers that are preceded by a dollar sign.

text = "El precio es $10 y el descuento es $2."
pattern = r'(?<=\$)\d+'

matches = re.findall(pattern, text)
print(matches)  # ['10', '2']
Copied!
El precio es $10 y el descuento es $2.

In this case,

  • (?<=\$) ensures the number is preceded by a dollar sign
  • In this case, 10 and 2 meet the condition.

Negative Lookbehind

The negative lookbehind (?<!pattern) checks that a specific pattern does not precede another pattern. If the condition is met (i.e., the pattern is not found), the match is performed.

Suppose we want to find numbers that are not preceded by a dollar sign.

El precio es $10 y el descuento es $2.

Here,

  • (?<!\$) ensures the number is not preceded by a dollar sign
  • In this case, only the 0 from the first $10 meets the condition.