Regular Expressions Cheatsheet

Regular expressions are text patterns used to search and manipulate strings of characters. They are widely used in programming for tasks such as searching, validation, and text substitution.

Syntax

| Alternation, matches one of several patterns.

g|h #will match "g" or "h".

() Grouping of patterns.

(hi)+ #will match "hi", "hihi", "hihihi", etc.

\ Escape character; makes the special character that follows it literal.

a\.c #will match "a.c", but not "abc".
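
As a quick check of alternation, grouping, and escaping, here is a minimal Python sketch using the standard `re` module. The sample strings and variable names are illustrative assumptions, not part of the cheatsheet.

```python
import re

# Alternation: either "g" or "h".
alternation = re.findall(r"g|h", "ghost")

# Grouping: the + quantifier applies to the whole group "hi".
grouping = re.fullmatch(r"(hi)+", "hihihi") is not None

# Escaping: "\." is a literal dot, while a bare "." matches any character.
escaped = re.findall(r"a\.c", "a.c abc")
unescaped = re.findall(r"a.c", "a.c abc")

print(alternation)  # ['g', 'h']
print(grouping)     # True
print(escaped)      # ['a.c']
print(unescaped)    # ['a.c', 'abc']
```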

Characters

. Matches any character except newline.

a.c #will match "abc", "a+c", "a!c", etc.

\w Matches any word character (letter, number, underscore).

\w+ #will match "word123", "hello_world", "123", etc.

\W Matches any non-word character.

\W+ #will match "!@#", "-$%", etc.

\d Matches any digit.

\d{3} #will match "123", "456", etc.

\D Matches any non-digit character.

\D+ #will match "abc", "_!@", etc.
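
The character shorthands can be compared side by side on one string. This is a small Python sketch; the `sample` text is made up for illustration. Note that the underscore counts as a word character.

```python
import re

sample = "order_42: total = 137 EUR!"

words = re.findall(r"\w+", sample)           # runs of letters, digits, underscores
non_words = re.findall(r"\W+", sample)       # runs of everything else
digits = re.findall(r"\d+", sample)          # runs of digits
three_digits = re.findall(r"\d{3}", sample)  # exactly three digits in a row

print(words)         # ['order_42', 'total', '137', 'EUR']
print(non_words)     # [': ', ' = ', ' ', '!']
print(digits)        # ['42', '137']
print(three_digits)  # ['137']
```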

Anchors and Limits

^ Matches the start of the line.

^start  #will match "start of line", "start_here", etc.

$ Matches the end of the line.

end$  #will match "the end", "goes to end", etc., but not "end of line".

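\b Matches a word boundary (the position between a word character and a non-word character).

\bword\b #will match "word" in "a word here" or "word!", but not in "wording", "sword" or "my_word".

\B Matches any position that is not a word boundary.

\Bnon\B #will match "non" in "anonymous" or "cannons", but not in "non-stop" or "nonprofit".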
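
Anchors and boundaries match positions rather than characters, which is easy to get wrong. The following Python sketch tests a few candidate strings; the candidate lists are assumptions chosen for illustration.

```python
import re

# ^ and $ anchor to the start and end of the string
# (or of each line when the multiline modifier is on).
starts = re.search(r"^start", "start of line") is not None
ends = re.search(r"end$", "goes to end") is not None
not_at_end = re.search(r"end$", "end of line") is not None

# \b needs a word/non-word transition on each side of "word".
whole_word = [s for s in ["word", "a word.", "wording", "sword", "my_word"]
              if re.search(r"\bword\b", s)]

# \B needs the absence of such a transition, so "non" must sit inside a word.
inside_word = [s for s in ["anonymous", "cannons", "non-stop", "nonprofit"]
               if re.search(r"\Bnon\B", s)]

print(starts, ends, not_at_end)  # True True False
print(whole_word)                # ['word', 'a word.']
print(inside_word)               # ['anonymous', 'cannons']
```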

Character Classes

[] Defines a character class.

[^] Defines a negated character class.

[-] Defines a range of characters.

[a-zA-Z] Combines several ranges in one class. No separator is needed; a comma inside the brackets is matched literally.

[abc] Matches any one of the characters listed between the brackets.

[abc]+ #will match "a", "abc", "caba", etc.

[^abc] Matches any single character that is not listed between the brackets.

[^abc]+ #will match "123", "_!@", etc.

[a-z] Matches any character in the range from a to z.

[a-z]+ #will match "hello", "example", etc.

[A-Z] Matches any character in the range from A to Z.

[A-Z]+ #will match "UPPER", "CASE", etc.

[a-zA-Z] Matches any character in the range from a to z or from A to Z.

[a-zA-Z]+ #will match "Upper", "CASE", "lower", etc.

[0-9] Matches any digit.

[0-9]+ #will match "123", "4567", etc.
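
A short Python sketch of the character classes above, run against an invented `sample` string. The last line shows, as a cautionary case, that a comma placed between ranges is treated as one more literal character in the class.

```python
import re

sample = "Hello, World 2024"

lower = re.findall(r"[a-z]+", sample)            # ['ello', 'orld']
upper = re.findall(r"[A-Z]+", sample)            # ['H', 'W']
letters = re.findall(r"[a-zA-Z]+", sample)       # ['Hello', 'World']
digits = re.findall(r"[0-9]+", sample)           # ['2024']
not_letters = re.findall(r"[^a-zA-Z]+", sample)  # [', ', ' 2024']

# A comma inside the brackets is a literal comma, not a separator.
with_comma = re.findall(r"[a-z,A-Z]+", sample)   # ['Hello,', 'World']

print(letters, digits, with_comma)
```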

Whitespace and Line Breaks

\n Matches a newline.

\t Matches a tab.

\s Matches any whitespace.

\s+ #will match " ", "    ", "\t\t", etc.

\S Matches any non-whitespace character.

\S+ #will match "word", "123", "_!@", etc.
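
One common use of the whitespace classes is tokenizing and normalizing text. Here is a minimal Python sketch; the `raw` string is an assumed example containing tabs, repeated spaces, and a newline.

```python
import re

raw = "name:\tAda  Lovelace\nrole:\t analyst"

tokens = re.findall(r"\S+", raw)        # runs of non-whitespace characters
normalized = re.sub(r"\s+", " ", raw)   # collapse every whitespace run to one space
lines = re.split(r"\n", raw)            # split on newlines only

print(tokens)      # ['name:', 'Ada', 'Lovelace', 'role:', 'analyst']
print(normalized)  # name: Ada Lovelace role: analyst
print(len(lines))  # 2
```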

Quantifiers

* Matches 0 or more occurrences of the previous pattern.

a* #will match "", "a", "aa", "aaa", etc.

+ Matches 1 or more occurrences of the previous pattern.

b+ #will match "b", "bb", "bbb", etc.

? Matches 0 or 1 occurrence of the previous pattern.

c? #will match "" or "c".

{n} Matches exactly n occurrences of the previous pattern.

d{3} #will match "ddd".

{n,} Matches at least n occurrences of the previous pattern.

e{2,} #will match "ee", "eee", "eeee", etc.

{n,m} Matches between n and m occurrences of the previous pattern.

f{1,3} #will match "f", "ff" or "fff".
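
To see exactly which strings each quantifier accepts, the sketch below uses Python's `re.fullmatch`, which succeeds only when the whole string matches. The helper name `matches_whole` is hypothetical, introduced here for illustration.

```python
import re

def matches_whole(pattern, text):
    """Return True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# {n,m} is bounded on both sides: four "f" characters are one too many.
bounded = [s for s in ["", "f", "ff", "fff", "ffff"]
           if matches_whole(r"f{1,3}", s)]

# * and ? accept the empty string; + does not.
accepts_empty = [p for p in [r"a*", r"b+", r"c?"] if matches_whole(p, "")]

# Quantifiers are greedy by default: they take as many characters as they can.
greedy = re.search(r"e{2,}", "eeee").group()

# {n} consumes exactly n characters per match, so seven d's give two matches.
exact = re.findall(r"d{3}", "ddddddd")

print(bounded)        # ['f', 'ff', 'fff']
print(accepts_empty)  # ['a*', 'c?']
print(greedy)         # eeee
print(exact)          # ['ddd', 'ddd']
```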

Inline Modifiers

(?i) Case-insensitive modifier.

(?i)hello #will match "hello", "HELLO", "hElLo", etc.

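(?m) Multiline modifier; ^ and $ match at the start and end of every line.

(?m)^start #will match "start" at the beginning of any line, not only the first.

(?s) Dotall modifier; . also matches newlines.

(?s)start.*end #will match "start\nmiddle\nend".

(?x) Verbose modifier; whitespace and # comments inside the pattern are ignored.

(?x)  a b  c #will match "abc", because the spaces in the pattern are ignored.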
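
The effect of each inline modifier is easiest to see by running the same pattern with and without it. This is a Python sketch; Python accepts these modifiers at the start of a pattern, and the `text` value is an assumption for illustration.

```python
import re

text = "Start here\nstart again\nthe END"

# (?i): letter case is ignored.
insensitive = re.findall(r"(?i)start", text)

# (?m): ^ matches at the start of every line, not only at the start of the string.
per_line = re.findall(r"(?m)^start", text)
without_m = re.findall(r"^start", text)

# (?s): the dot is allowed to cross the newline.
dotall = re.search(r"(?s)here.*again", text) is not None
without_s = re.search(r"here.*again", text) is not None

# (?x): spaces and the trailing comment in the pattern are ignored.
verbose = re.fullmatch(r"(?x) a b  c  # matches abc", "abc") is not None
spaced = re.fullmatch(r"(?x) a b  c", "a b c") is not None

print(insensitive)          # ['Start', 'start']
print(per_line, without_m)  # ['start'] []
print(dotall, without_s)    # True False
print(verbose, spaced)      # True False
```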

Lookarounds

(?=pattern) Positive lookahead, matches if the next text matches pattern.

(?!pattern) Negative lookahead, matches if the next text does NOT match pattern.

(?<=pattern) Positive lookbehind, matches if the previous text matches pattern.

(?<!pattern) Negative lookbehind, matches if the previous text does NOT match pattern.
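
Lookarounds check the surrounding text without including it in the match. The Python sketch below illustrates all four forms; the sample strings and the currency markers in them are assumptions for illustration.

```python
import re

prices = "cost: 100USD, 250EUR, 75USD"

# Positive lookahead: digits followed by "USD"; "USD" itself is not captured.
usd = re.findall(r"\d+(?=USD)", prices)

# Negative lookahead: digits not followed by "USD". The extra \d alternative
# stops the engine from backtracking to a shorter number such as "10".
not_usd = re.findall(r"\d+(?!\d|USD)", prices)

tagged = "#40 and $55 and 60"

# Positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", tagged)

# Negative lookbehind: digits not preceded by a dollar sign or another digit.
not_after_dollar = re.findall(r"(?<![$\d])\d+", tagged)

print(usd)               # ['100', '75']
print(not_usd)           # ['250']
print(after_dollar)      # ['55']
print(not_after_dollar)  # ['40', '60']
```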

Regular Expressions Examples

Find all words Finds all words in a text.

\b\w+\b

Find all words that start with a capital letter Finds all words that start with a capital letter.

\b[A-Z]\w*\b

Find all numbers in a text Finds all sequences of one or more digits in a text.

\b\d+\b

Find words of a specific length Finds words that are exactly 5 word characters long.

\b\w{5}\b

Find an HTML <img> tag Finds all HTML <img> tags, including any attributes they may have.

<img\b[^>]*>

Find Markdown headers Finds Markdown headers from # to ######, followed by zero or more spaces, and captures the header text. Use it with the multiline modifier so that ^ matches at the start of each line.

^#{1,6}(?!#)\s*(.*)

Match Phone Numbers in a specific format Finds phone numbers in the format +34 600-00-00-00, where each separator may be a space, a hyphen, an underscore, or absent.

\+\d{2}[ _-]?\d{3}([ _-]?\d{2}){3}

Match Email Addresses Finds most common email addresses (it does not cover every address the email standards allow).

[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}

Match Dates in MM/DD/YYYY Format Finds dates in MM/DD/YYYY format.

(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/(19|20)\d{2}
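
Several of the example patterns above can be tried together on one string. Here is a Python sketch; the `text` value is invented for illustration and the patterns are copied from the examples.

```python
import re

text = 'On 03/15/2024 Alice sent 3 files to Bob: <img src="a.png" alt="logo">.'

capitalized = re.findall(r"\b[A-Z]\w*\b", text)  # words starting with a capital
numbers = re.findall(r"\b\d+\b", text)           # standalone digit sequences
five_chars = re.findall(r"\b\w{5}\b", text)      # words of exactly 5 word characters
images = re.findall(r"<img\b[^>]*>", text)       # <img> tags with their attributes

# The date pattern contains groups, so re.search and group(0) are used
# to get the whole match rather than the captured pieces.
date = re.search(r"(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/(19|20)\d{2}", text)

print(capitalized)    # ['On', 'Alice', 'Bob']
print(numbers)        # ['03', '15', '2024', '3']
print(five_chars)     # ['Alice', 'files']
print(images)         # ['<img src="a.png" alt="logo">']
print(date.group(0))  # 03/15/2024
```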

Usage Examples in Different Languages

In these examples, the regular expression used is \broja\b, which searches for the word “roja” as a whole word. The examples search for this word in the text “La casa es roja y azul.” and display the found matches.

C#

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string text = "La casa es roja y azul.";

        // Pattern to search for the word "roja"
        string pattern = @"\broja\b";

        // Create the Regex object
        Regex regex = new Regex(pattern);

        // Search for matches
        MatchCollection matches = regex.Matches(text);

        // Print the matches
        foreach (Match match in matches)
        {
            Console.WriteLine($"Match found: {match.Value}");
        }
    }
}

JavaScript

const text = "La casa es roja y azul.";

// Pattern to search for the word "roja"
const pattern = /\broja\b/g;

// Search for matches; match() returns null when nothing matches
const matches = text.match(pattern) ?? [];

// Print the matches
matches.forEach(match => {
    console.log(`Match found: ${match}`);
});

Python

import re

text = "La casa es roja y azul."

# Pattern to search for the word "roja"
pattern = r'\broja\b'

# Search for matches
matches = re.findall(pattern, text)

# Print the matches
for match in matches:
    print(f"Match found: {match}")

Flags

Flags are used with regular expressions to modify their behavior when searching for matches in a text string.

In C#, flags can be specified as an additional argument to the Regex constructor, using the RegexOptions enumeration. For example:

new Regex("pattern", RegexOptions.IgnoreCase) for the IgnoreCase flag
new Regex("pattern", RegexOptions.Multiline) for the Multiline flag
new Regex("pattern", RegexOptions.Singleline) for the Singleline flag
new Regex("pattern", RegexOptions.IgnorePatternWhitespace) for the IgnorePatternWhitespace flag

In JavaScript, flags are written after the closing slash of the regular expression literal. For example:

/pattern/i for the i (insensitive) flag
/pattern/g for the g (global) flag
/pattern/m for the m (multiline) flag
/pattern/s for the s (dotall) flag
/pattern/u for the u (unicode) flag

In Python, flags can be specified as additional arguments when compiling the regular expression with re.compile(). For example:

re.compile(r'pattern', re.I) for the I (insensitive) flag
re.compile(r'pattern', re.M) for the M (multiline) flag
re.compile(r'pattern', re.S) for the S (dotall) flag
re.compile(r'pattern', re.U) for the U (unicode) flag, which is already the default for str patterns in Python 3
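
In Python, several flags can be combined with the `|` operator. The sketch below reuses the word "roja" from the usage examples; the two-line `text` value is an assumption for illustration.

```python
import re

text = "Roja\nroja y azul"

# re.I: letter case is ignored.
ignore_case = re.compile(r"\broja\b", re.I).findall(text)

# re.M: ^ matches at the start of every line.
multiline = re.compile(r"^roja", re.M).findall(text)

# re.S: the dot also matches the newline between the two lines.
dotall = re.compile(r"Roja.roja", re.S).search(text) is not None

# Flags combined with |: case-insensitive and multiline at once.
combined = re.compile(r"^roja$", re.I | re.M).findall(text)

print(ignore_case)  # ['Roja', 'roja']
print(multiline)    # ['roja']
print(dotall)       # True
print(combined)     # ['Roja']
```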