
What Regular Expressions (Regex) Are and How to Use Them in C#


A regular expression is a sequence of characters that defines a search pattern. The pattern can be used to find text that matches it or to replace text within a string.

Imagine you need to search a text for “any date in the format DD/MM/YYYY”. You could write a for loop with 20 if/else statements checking for numbers and slashes… or you could use a single line of Regex.
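As a rough sketch of that "single line", assuming a made-up sample sentence and using the Regex class covered in the sections below, the date search could look like this. Note that the pattern only checks the shape of the text (two digits, slash, two digits, slash, four digits), not whether the date really exists.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text, used only for illustration.
string sampleText = "Invoice issued on 05/11/2023, due on 20/12/2023.";

// \b marks a word boundary so the digits are not part of a longer number.
string datePattern = @"\b\d{2}/\d{2}/\d{4}\b";

bool containsDate = Regex.IsMatch(sampleText, datePattern);
string firstDate = Regex.Match(sampleText, datePattern).Value;

Console.WriteLine(containsDate); // True
Console.WriteLine(firstDate);    // 05/11/2023
```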

In C#, regular expressions are handled by the Regex class, which is included in the System.Text.RegularExpressions namespace.

Basic Syntax

Before diving into C# code, we need to understand the pattern language. At first, it looks like someone smashed the keyboard, but it has logic (more or less).

| Symbol | Meaning | Example | Matches |
|--------|---------|---------|---------|
| . | Any character (except newline) | a.o | “aro”, “a1o”, “a-o” |
| \d | A digit (0-9) | \d\d | “05”, “99” |
| \w | Alphanumeric (letters, numbers, _) | \w+ | “Usuario1”, “Hola” |
| ^ | Start of line (anchor) | ^Hola | Starts with “Hola” |
| $ | End of line (anchor) | fin$ | Ends with “fin” |
| * | 0 or more times | a* | "", “a”, “aaa” |
| + | 1 or more times | a+ | “a”, “aaa” (but not "") |
| ? | 0 or 1 time (optional) | colou?r | “color”, “colour” |
| [] | Range or set | [A-Z] | Any uppercase letter |
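To see a few of these symbols in action, here is a small sketch that feeds sample strings to Regex.IsMatch (explained in the next sections). The variable names are invented for this example, and the ^ and $ anchors are added so that the whole string has to match.

```csharp
using System;
using System.Text.RegularExpressions;

// "." accepts any single character between 'a' and 'o'.
bool dotMatches = Regex.IsMatch("a1o", @"^a.o$");

// "?" makes the 'u' optional, so both spellings are accepted.
bool optionalU = Regex.IsMatch("color", @"^colou?r$")
              && Regex.IsMatch("colour", @"^colou?r$");

// "^" anchors the match to the start of the text.
bool startsWithHola = Regex.IsMatch("Hola mundo", @"^Hola");
bool holaNotAtStart = Regex.IsMatch("mundo Hola", @"^Hola");

// "+" needs at least one 'a'; "*" also accepts the empty string.
bool plusOnEmpty = Regex.IsMatch("", @"^a+$");
bool starOnEmpty = Regex.IsMatch("", @"^a*$");

Console.WriteLine(dotMatches);     // True
Console.WriteLine(optionalU);      // True
Console.WriteLine(startsWithHola); // True
Console.WriteLine(holaNotAtStart); // False
Console.WriteLine(plusOnEmpty);    // False
Console.WriteLine(starOnEmpty);    // True
```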

The Regex Class

There are two ways to use the Regex class:

  1. Static Methods: For quick, one-off uses (Regex.IsMatch(...)).
  2. Instance: To reuse the same pattern many times (new Regex(...)).
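A minimal sketch of both styles side by side, assuming a hypothetical list of candidate strings and a digits-only pattern:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical inputs, used only for illustration.
string[] candidates = { "2024", "20x4", "7" };

// 1. Static method: the pattern is passed on every call.
bool staticResult = Regex.IsMatch(candidates[0], @"^\d+$");

// 2. Instance: the pattern is parsed once in the constructor and then reused.
Regex onlyDigits = new Regex(@"^\d+$");
int validCount = 0;
foreach (string candidate in candidates)
{
    if (onlyDigits.IsMatch(candidate))
    {
        validCount++;
    }
}

Console.WriteLine(staticResult); // True
Console.WriteLine(validCount);   // 2
```

The static methods keep a small internal cache of recently used patterns, so for occasional calls the difference is usually negligible. The instance form mainly pays off when the same pattern is applied many times.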

Validation (IsMatch)

The most common use: Does the text match the pattern? (e.g., validating an email, ID number, phone).

using System.Text.RegularExpressions;

string input = "1234A";
// Pattern: 4 digits followed by an uppercase letter
string patron = @"^\d{4}[A-Z]$"; 

bool esValido = Regex.IsMatch(input, patron);

Console.WriteLine(esValido); // True


Notice the @ before the pattern string (@"..."). This is a verbatim string literal. It is almost mandatory when writing Regex patterns, because in a regular string every backslash has to be escaped: the pattern \d would have to be written as "\\d".
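A quick sketch to convince yourself that both spellings produce exactly the same pattern (the variable names are made up for the example):

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string: every backslash must be doubled.
string regularLiteral = "^\\d{4}[A-Z]$";

// Verbatim string: the pattern is written exactly as the Regex engine reads it.
string verbatimLiteral = @"^\d{4}[A-Z]$";

bool samePattern = regularLiteral == verbatimLiteral;
bool stillMatches = Regex.IsMatch("1234A", regularLiteral);

Console.WriteLine(samePattern);  // True
Console.WriteLine(stillMatches); // True
```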

Data Extraction (Match and Groups)

This is where Regex is most useful. We don’t just validate; we extract information from the text.

Imagine we have a log: "Error en el sistema: [Código 500] - Timeout". We want to extract the error number.

string texto = "Error en el sistema: [Código 500] - Timeout";
string patron = @"\[Código (\d+)\]"; // Parentheses () create a GROUP

Match coincidencia = Regex.Match(texto, patron);

if (coincidencia.Success)
{
    // Groups[0] is the entire match: "[Código 500]"
    // Groups[1] is what's inside the parentheses: "500"
    string codigo = coincidencia.Groups[1].Value;
    Console.WriteLine($"El error es: {codigo}");
}
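When a pattern has several groups, counting parentheses gets error-prone. .NET also lets you give groups a name with the (?<name>...) syntax. The following is a sketch of the same log example using named groups; the group names "code" and "reason" are chosen here just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string logLine = "Error en el sistema: [Código 500] - Timeout";

// (?<code>...) and (?<reason>...) are named groups.
string namedPattern = @"\[Código (?<code>\d+)\] - (?<reason>\w+)";

Match logMatch = Regex.Match(logLine, namedPattern);

string errorCode = "";
string errorReason = "";
if (logMatch.Success)
{
    // Groups can be read by name instead of by position.
    errorCode = logMatch.Groups["code"].Value;
    errorReason = logMatch.Groups["reason"].Value;
}

Console.WriteLine(errorCode);   // 500
Console.WriteLine(errorReason); // Timeout
```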


Multiple Search (Matches)

If we expect to find the pattern multiple times in the same text (e.g., find all emails in a document), we use Regex.Matches, which returns a collection with every match.

// Sample addresses for illustration
string texto = "Contacta con soporte@example.com o con ventas@example.com";
string patron = @"\w+@\w+\.com";

MatchCollection resultados = Regex.Matches(texto, patron);

foreach (Match m in resultados)
{
    Console.WriteLine($"Email encontrado: {m.Value}");
}
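Each Match in the collection also knows where it was found. As a small sketch, assuming an invented sentence with a few quantities in it, we can count the matches, read their positions, and add up the numbers:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text, used only for illustration.
string orderText = "Order: 15 apples, 3 pears and 120 grapes";

MatchCollection numbers = Regex.Matches(orderText, @"\d+");

int total = 0;
foreach (Match m in numbers)
{
    // Index is the zero-based position of the match inside the text.
    Console.WriteLine($"Found {m.Value} at position {m.Index}");
    total += int.Parse(m.Value);
}

Console.WriteLine(numbers.Count); // 3
Console.WriteLine(total);         // 138
```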


Replacement and Cleaning (Replace)

We can use Regex for intelligent “Find and Replace”.

string telefonoSucio = "(+34) 666-55-44";
// We want to keep only the numbers.
// \D means "Anything that is NOT a digit"
string soloNumeros = Regex.Replace(telefonoSucio, @"\D", ""); 

Console.WriteLine(soloNumeros); // "346665544"


We can also use it to hide sensitive data:

string tarjeta = "1234-5678-9012-3456";
// Replace the first 12 digits with X
string oculta = Regex.Replace(tarjeta, @"^\d{4}-\d{4}-\d{4}", "XXXX-XXXX-XXXX");
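A possible variation, sketched here under the assumption that the card always has the form of four groups of four digits: capture the last group and put it back with $1 in the replacement string. If the input does not have the expected shape, Replace returns it unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// The parentheses capture the last four digits; $1 reinserts them.
string maskPattern = @"^\d{4}-\d{4}-\d{4}-(\d{4})$";
string maskReplacement = "XXXX-XXXX-XXXX-$1";

string masked = Regex.Replace("1234-5678-9012-3456", maskPattern, maskReplacement);

// Input that does not fit the pattern is left untouched.
string untouched = Regex.Replace("1234-5678", maskPattern, maskReplacement);

Console.WriteLine(masked);    // XXXX-XXXX-XXXX-3456
Console.WriteLine(untouched); // 1234-5678
```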


Performance and Optimization

Regex is powerful, but it can be slow if used carelessly. Before a pattern can run, the engine has to parse it and convert it into an internal representation, which by default is interpreted on every match.

If you have a complex pattern that you will use millions of times inside a loop, you can compile it with RegexOptions.Compiled.

// Takes longer to create (new), but is faster to execute (IsMatch)
Regex regexCompilada = new Regex(@"^\d+$", RegexOptions.Compiled);


Don’t use Compiled for everything. Compilation consumes initial CPU and memory. Use it only for very frequently reused static expressions.
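The typical way to apply this advice is to create the compiled instance once and reuse it, rather than building it inside the loop. A minimal sketch, with a made-up loop standing in for the "millions of times" scenario:

```csharp
using System;
using System.Text.RegularExpressions;

// Created once, outside the loop. In a class this would typically be
// a static readonly field.
Regex digitsOnly = new Regex(@"^\d+$", RegexOptions.Compiled);

int matchCount = 0;
for (int i = 0; i < 100_000; i++)
{
    // The same compiled instance is reused on every iteration.
    if (digitsOnly.IsMatch(i.ToString()))
    {
        matchCount++;
    }
}

Console.WriteLine(matchCount); // 100000
```

Whether Compiled actually pays off depends on the pattern and the workload, so it is worth measuring in your own scenario before adopting it.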