ZetCode

C# Regular Expressions tutorial

last modified July 5, 2020

C# Regular Expressions tutorial shows how to parse text in C# using regular expressions.

Regular expressions

Regular expressions are used for text searching and more advanced text manipulation. They are built into tools such as grep and sed, text editors such as vi and emacs, and programming languages such as C#, Java, and Python.

C# has a built-in API for working with regular expressions; it is located in the System.Text.RegularExpressions namespace.

A regular expression defines a search pattern for strings. The Regex class represents an immutable regular expression. It contains methods to match text, replace text, or split text.

Regex examples

The following table shows some common regular expression constructs.

Regex     Meaning
.         Matches any single character except a newline.
?         Matches the preceding element once or not at all.
+         Matches the preceding element one or more times.
*         Matches the preceding element zero or more times.
^         Matches the starting position within the string.
$         Matches the ending position within the string.
|         Alternation operator.
[abc]     Matches a, b, or c.
[a-c]     Range; matches a, b, or c.
[^abc]    Negation; matches any character except a, b, or c.
\s        Matches a white space character.
\w        Matches a word character (letters, digits, and the underscore); equivalent to [a-zA-Z_0-9] when RegexOptions.ECMAScript is set.
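
As a quick illustration of a few constructs from the table, the following sketch checks several patterns against sample strings with the static Regex.IsMatch() method. The sample strings and the FullMatch() helper are made up for this illustration; the helper wraps the pattern in ^ and $ so that the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

namespace TableEx
{
    public class Program
    {
        // Hypothetical helper: true if the whole input matches the pattern.
        public static bool FullMatch(string input, string pattern)
        {
            return Regex.IsMatch(input, "^(?:" + pattern + ")$");
        }

        static void Main(string[] args)
        {
            // ? makes the preceding element optional
            Console.WriteLine(FullMatch("color", "colou?r"));   // True
            Console.WriteLine(FullMatch("colour", "colou?r"));  // True

            // + requires at least one occurrence, * allows none
            Console.WriteLine(FullMatch("ac", "ab+c"));         // False
            Console.WriteLine(FullMatch("ac", "ab*c"));         // True

            // [a-c] is a range, [^abc] is a negated set
            Console.WriteLine(FullMatch("b", "[a-c]"));         // True
            Console.WriteLine(FullMatch("b", "[^abc]"));        // False
        }
    }
}
```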

C# regex IsMatch

The IsMatch() method indicates whether the regular expression finds a match in the input string.

Program.cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Simple
{
    class Program
    {
        static void Main(string[] args)
        {
            var words = new List<string>() { "Seven", "even",
                    "Maven", "Amen", "eleven" };

            var rx = new Regex(@".even", RegexOptions.Compiled);

            foreach (string word in words)
            {
                if (rx.IsMatch(word))
                {
                    Console.WriteLine($"{word} does match");
                }
                else
                {
                    Console.WriteLine($"{word} does not match");
                }
            }
        }
    }
}

In the example, we have five words in a list. We check which words match the .even regular expression.

var words = new List<string>() { "Seven", "even",
    "Maven", "Amen", "eleven" };

We have a list of words.

var rx = new Regex(@".even", RegexOptions.Compiled);

We define the .even regular expression. The RegexOptions.Compiled option specifies that the regular expression is compiled to MSIL code instead of being interpreted. This yields faster matching but increases startup time. The dot (.) metacharacter stands for any single character in the text.

foreach (string word in words)
{
    if (rx.IsMatch(word))
    {
        Console.WriteLine($"{word} does match");
    }
    else
    {
        Console.WriteLine($"{word} does not match");
    }
}

We go through the list of words. The IsMatch() method returns true if the word matches the regular expression.

$ dotnet run
Seven does match
even does not match
Maven does not match
Amen does not match
eleven does match

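This is the output. The words Seven and eleven match because they contain a character followed by even; the word even itself does not match because the dot requires one character before it.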

C# regex Match index

The Match's Success property returns a boolean value indicating whether the match is successful. The NextMatch() method returns a new Match object with the results for the next match, starting at the position at which the last match ended.

We can find out the position of the matches in the string with the Index property of the Match.

Program.cs
using System;
using System.Text.RegularExpressions;

namespace MatchEx
{
    class Program
    {
        static void Main(string[] args)
        {
            var content = @"Foxes are omnivorous mammals belonging to several genera 
of the family Canidae. Foxes have a flattened skull, upright triangular ears, 
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every 
continent except Antarctica. By far the most common and widespread species of 
fox is the red fox.";

            var rx = new Regex("fox(es)?", RegexOptions.Compiled |
                RegexOptions.IgnoreCase);

            Match match = rx.Match(content);

            while (match.Success)
            {
                Console.WriteLine($"{match.Value} at index {match.Index}");
                match = match.NextMatch();
            }
        }
    }
}

In the example, we look for all occurrences of the fox word.

var rx = new Regex("fox(es)?", RegexOptions.Compiled |
    RegexOptions.IgnoreCase);

We add the (es)? expression to include the plural form of the word. The RegexOptions.IgnoreCase option enables case-insensitive matching.

Match match = rx.Match(content);

while (match.Success)
{
    Console.WriteLine($"{match.Value} at index {match.Index}");
    match = match.NextMatch();
}

The match.Value returns the matched string and the match.Index returns its index in the text. We find the next occurrence of a match with the match.NextMatch() method.

$ dotnet run
Foxes at index 0
Foxes at index 82
Foxes at index 198
fox at index 300
fox at index 315

This is the output.

C# regex Matches

The Matches() method searches an input string for all occurrences of a regular expression and returns all the matches.

Program.cs
using System;
using System.Text.RegularExpressions;

namespace MatchesEx
{
    class Program
    {
        static void Main(string[] args)
        {
            String content = @"<p>The <code>Regex</code> is a compiled 
                representation of a regular expression.</p>";

            var rx = new Regex(@"</?[a-z]+>", RegexOptions.Compiled);
            var matches = rx.Matches(content);

            foreach (Match match in matches)
            {
                Console.WriteLine(match);
            }
        }
    }
}

The example retrieves all HTML tags from a string.

var rx = new Regex(@"</?[a-z]+>", RegexOptions.Compiled);

In the regular expression, we search for tags; both starting and ending.

var matches = rx.Matches(content);

The Matches() method returns a collection of the Match objects found by the search. If no matches are found, the method returns an empty collection object.

foreach (Match match in matches)
{
    Console.WriteLine(match);
}

We go through the collection and print all matched strings.

$ dotnet run
<p>
<code>
</code>
</p>

This is the output.

C# regex word boundaries

The metacharacter \b is an anchor which matches at a position that is called a word boundary. It allows us to search for whole words.

Program.cs
using System;
using System.Text.RegularExpressions;

namespace WordBoundaries
{
    class Program
    {
        static void Main(string[] args)
        {
            var text = "This island is beautiful";

            var rx = new Regex(@"\bis\b", RegexOptions.Compiled);
            var matches = rx.Matches(text);

            foreach (Match match in matches)
            {
                Console.WriteLine($"{match.Value} at {match.Index}");
            }
        }
    }
}

In the example, we look for the is word. We do not want to include the This and the island words.

var rx = new Regex(@"\bis\b", RegexOptions.Compiled);

With two \b metacharacters, we search for the is whole word.

$ dotnet run
is at 12

This is the output.
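
To see what the boundaries change, the following minimal sketch counts the matches of the plain is pattern and of \bis\b in the same text. The Count() helper is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace BoundaryContrast
{
    public class Program
    {
        // Hypothetical helper: number of matches of pattern in text.
        public static int Count(string text, string pattern)
        {
            return Regex.Matches(text, pattern).Count;
        }

        static void Main(string[] args)
        {
            var text = "This island is beautiful";

            // without boundaries, "is" is also found inside This and island
            Console.WriteLine(Count(text, "is"));       // 3

            // with boundaries, only the whole word is counted
            Console.WriteLine(Count(text, @"\bis\b"));  // 1
        }
    }
}
```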

C# regex implicit word boundaries

The \w is a character class used for a character allowed in a word. For the \w+ regular expression, which denotes a word, the leading and trailing word boundary metacharacters are implicit; i.e. \w+ is equal to \b\w+\b.

Program.cs
using System;
using System.Text.RegularExpressions;

namespace WordsEx
{
    class Program
    {
        static void Main(string[] args)
        {
            var content = @"Foxes are omnivorous mammals belonging to several genera 
of the family Canidae. Foxes have a flattened skull, upright triangular ears, 
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every 
continent except Antarctica. By far the most common and widespread species of 
fox is the red fox.";

            var rx = new Regex(@"\w+", RegexOptions.Compiled |
                RegexOptions.IgnoreCase);

            var matches = rx.Matches(content);
            Console.WriteLine(matches.Count);

            foreach (var match in matches)
            {
                Console.WriteLine(match);
            }
        }
    }
}

In the example, we search for all words in the text.

Console.WriteLine(matches.Count);

The Count property returns the number of matches.
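
The equivalence of \w+ and \b\w+\b can be checked with a short sketch. The sample sentence and the Words() helper are assumptions made for this illustration; the helper collects the matched strings so that both results can be compared.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

namespace ImplicitBoundaries
{
    public class Program
    {
        // Hypothetical helper: returns the matched strings for a pattern.
        public static string[] Words(string text, string pattern)
        {
            return Regex.Matches(text, pattern)
                .Cast<Match>()
                .Select(m => m.Value)
                .ToArray();
        }

        static void Main(string[] args)
        {
            var text = "The red fox, the grey fox.";

            var plain = Words(text, @"\w+");
            var bounded = Words(text, @"\b\w+\b");

            Console.WriteLine(string.Join("|", plain));      // The|red|fox|the|grey|fox
            Console.WriteLine(plain.SequenceEqual(bounded)); // True
        }
    }
}
```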

C# regex currency symbols

The \p{Sc} regular expression matches any character in the Unicode currency symbol category, so it can be used to look for currency symbols.

Program.cs
using System;
using System.Text.RegularExpressions;

namespace CurrencySymbols
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.OutputEncoding = System.Text.Encoding.UTF8;

            string content = @"Currency symbols: ฿ Thailand baht, ₹ Indian rupee, ₾ Georgian lari, $ Dollar,
€ Euro, ¥ Yen, £ Pound Sterling";

            string pattern = @"\p{Sc}";

            var rx = new Regex(pattern, RegexOptions.Compiled);
            var matches = rx.Matches(content);

            foreach (Match match in matches)
            {
                Console.WriteLine($"{match.Value} is at {match.Index}");
            }
        }
    }
}

In the example, we look for currency symbols.

string content = @"Currency symbols: ฿ Thailand baht, ₹ Indian rupee, ₾ Georgian lari, $ Dollar,
    € Euro, ¥ Yen, £ Pound Sterling";

We have a couple of currency symbols in the text.

string pattern = @"\p{Sc}";

We define the regular expression for the currency symbols.

foreach (Match match in matches)
{
    Console.WriteLine($"{match.Value} is at {match.Index}");
}

We find all the symbols and their indexes.

$ dotnet run
฿ is at 18
₹ is at 35
₾ is at 51
$ is at 68
€ is at 79
¥ is at 87
£ is at 94

This is the output.

C# regex anchors

Anchors match positions of characters inside a given text. In the next example, we check whether a string is located at the beginning of a sentence.

Program.cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Anchors
{
    class Program
    {
        static void Main(string[] args)
        {
            var sentences = new List<string>() {
                "I am looking for Jane.",
                "Jane was walking along the river.",
                "Kate and Jane are close friends." 
            };

            var rx = new Regex(@"^Jane", RegexOptions.Compiled);

            foreach (string sentence in sentences)
            {
                if (rx.IsMatch(sentence))
                {
                    Console.WriteLine($"{sentence} does match");
                }
                else
                {
                    Console.WriteLine($"{sentence} does not match");
                }
            }
        }
    }
}

We have three sentences. The search pattern is ^Jane. The pattern checks if the "Jane" string is located at the beginning of the text. Jane\.$ would look for "Jane" at the end of the sentence.
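
The Jane\.$ variant mentioned above can be tried on the same three sentences. This is a minimal sketch; the EndsWithJane() helper is a hypothetical name introduced for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace AnchorsEnd
{
    public class Program
    {
        // Hypothetical helper: true if the sentence ends with "Jane."
        public static bool EndsWithJane(string sentence)
        {
            // \. is a literal dot, $ anchors the match at the end
            return Regex.IsMatch(sentence, @"Jane\.$");
        }

        static void Main(string[] args)
        {
            var sentences = new List<string>() {
                "I am looking for Jane.",
                "Jane was walking along the river.",
                "Kate and Jane are close friends."
            };

            // only the first sentence ends with "Jane."
            foreach (string sentence in sentences)
            {
                Console.WriteLine($"{sentence} -> {EndsWithJane(sentence)}");
            }
        }
    }
}
```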

C# regex alternations

The alternation operator | enables us to create a regular expression with several choices.

Program.cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Alternations
{
    class Program
    {
        static void Main(string[] args)
        {
            var users = new List<string>() {"Jane", "Thomas", "Robert",
                "Lucy", "Beky", "John", "Peter", "Andy"};

            var rx = new Regex("Jane|Beky|Robert", RegexOptions.Compiled);

            foreach (string user in users)
            {
                if (rx.IsMatch(user))
                {
                    Console.WriteLine($"{user} does match");
                }
                else
                {
                    Console.WriteLine($"{user} does not match");
                }
            }
        }
    }
}

We have eight names in the list.

var rx = new Regex("Jane|Beky|Robert", RegexOptions.Compiled);

This regular expression looks for "Jane", "Beky", or "Robert" strings.
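
Note that IsMatch() succeeds when one of the alternatives occurs anywhere in the input. If whole names are required, the alternation can be placed in a group between the ^ and $ anchors. The following is a minimal sketch of that idea; the extra names Roberta and Janet and both helper methods are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace AlternationAnchored
{
    public class Program
    {
        // Hypothetical helper: one of the names occurs somewhere in the input.
        public static bool ContainsName(string input)
        {
            return Regex.IsMatch(input, "Jane|Beky|Robert");
        }

        // Hypothetical helper: the input is exactly one of the names.
        public static bool IsName(string input)
        {
            return Regex.IsMatch(input, "^(Jane|Beky|Robert)$");
        }

        static void Main(string[] args)
        {
            string[] users = { "Robert", "Roberta", "Janet" };

            foreach (string user in users)
            {
                // Robert:  True True
                // Roberta: True False
                // Janet:   True False
                Console.WriteLine($"{user}: {ContainsName(user)} {IsName(user)}");
            }
        }
    }
}
```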

C# regex capturing groups

Round brackets () are used to create capturing groups. This allows us to apply a quantifier to the entire group or to restrict alternation to a part of the regular expression.

Program.cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace SimpleGroup
{
    class Program
    {
        static void Main(string[] args)
        {
            var sites = new List<string>() {"webcode.me",
                "zetcode.com", "freebsd.org", "netbsd.org"};

            var rx = new Regex(@"(\w+)\.(\w+)", RegexOptions.Compiled);

            foreach (var site in sites) 
            {
                Match match = rx.Match(site);

                if (match.Success)
                {
                    Console.WriteLine(match.Value);
                    Console.WriteLine(match.Groups[1]);
                    Console.WriteLine(match.Groups[2]);
                }

                Console.WriteLine("*****************");
            }
        }
    }
}

In the example, we divide the domain names into two parts by using groups.

var rx = new Regex(@"(\w+)\.(\w+)", RegexOptions.Compiled);

We define two groups with parentheses.

if (match.Success)
{
    Console.WriteLine(match.Value);
    Console.WriteLine(match.Groups[1]);
    Console.WriteLine(match.Groups[2]);
}

The match.Value returns the whole matched string; it is equal to match.Groups[0].Value. The individual groups are accessed via the Groups property, with the numbering of the capturing groups starting at 1.

$ dotnet run
webcode.me
webcode
me
*****************
zetcode.com
zetcode
com
*****************
freebsd.org
freebsd
org
*****************
netbsd.org
netbsd
org
*****************

This is the output.

In the following example, we use groups to work with expressions.

Program.cs
using System;
using System.Text.RegularExpressions;

namespace Expressions
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] expressions = { "16 + 11", "12 * 5", "27 / 3", "2 - 8" };
            string pattern = @"(\d+)\s+([-+*/])\s+(\d+)";

            // create the regex once and reuse it for all expressions
            var rx = new Regex(pattern, RegexOptions.Compiled);

            foreach (var expression in expressions)
            {
                var matches = rx.Matches(expression);

                foreach (Match match in matches)
                {
                    int val1 = Int32.Parse(match.Groups[1].Value);
                    int val2 = Int32.Parse(match.Groups[3].Value);

                    var oper = match.Groups[2].Value;

                    string result = oper switch
                    {
                        "+" => $"{match.Value} = {val1 + val2}",
                        "-" => $"{match.Value} = {val1 - val2}",
                        "*" => $"{match.Value} = {val1 * val2}",
                        "/" => $"{match.Value} = {val1 / val2}",
                        _ => "unknown operator"
                    };

                    Console.WriteLine(result);
                }
            }
        }
    }
}

The example parses four simple mathematical expressions and computes them.

string[] expressions = { "16 + 11", "12 * 5", "27 / 3", "2 - 8" };

We have an array of four expressions.

string pattern = @"(\d+)\s+([-+*/])\s+(\d+)";

In the regex pattern, we have three groups: two groups for the values, one for the operator.

int val1 = Int32.Parse(match.Groups[1].Value);
int val2 = Int32.Parse(match.Groups[3].Value);

We get the values and transform them into integers.

var oper = match.Groups[2].Value;

We get the operator.

string result = oper switch
{
    "+" => $"{match.Value} = {val1 + val2}",
    "-" => $"{match.Value} = {val1 - val2}",
    "*" => $"{match.Value} = {val1 * val2}",
    "/" => $"{match.Value} = {val1 / val2}",
    _ => "unknown operator"
};

With the switch expression, we compute the expressions.

$ dotnet run
16 + 11 = 27
12 * 5 = 60
27 / 3 = 9
2 - 8 = -6

This is the output.

C# regex captures

When we use quantifiers, the group can capture zero, one, or more strings in a single match. All the substrings matched by a single capturing group are available from the Group.Captures property. In such a case, the Group object itself contains information only about the last captured substring.

Program.cs
using System;
using System.Text.RegularExpressions;

namespace Captures
{
    class Program
    {
        static void Main(string[] args)
        {
            string text = "Today is a beautiful day. The sun is shining.";
            string pattern = @"\b(\w+\s*)+\.";

            MatchCollection matches = Regex.Matches(text, pattern);

            foreach (Match match in matches)
            {
                Console.WriteLine("Matched sentence: {0}", match.Value);

                for (int i = 0; i < match.Groups.Count; i++)
                {
                    Console.WriteLine("\tGroup {0}:  {1}", i, match.Groups[i].Value);

                    int captures = 0;

                    foreach (Capture capture in match.Groups[i].Captures)
                    {
                        Console.WriteLine("\t\tCapture {0}: {1}", captures, capture.Value);
                        captures++;
                    }
                }
            }
        }
    }
}

In the example, we have two sentences. With a regular expression, we capture all words from a sentence.

string pattern = @"\b(\w+\s*)+\.";

We use the + quantifier for the (\w+\s*) group. The group then contains all captures: words of the sentence.

foreach (Capture capture in match.Groups[i].Captures)
{
    Console.WriteLine("\t\tCapture {0}: {1}", captures, capture.Value);
    captures++;
}

We go through the captures of the group and print them to the console.

$ dotnet run
Matched sentence: Today is a beautiful day.
        Group 0:  Today is a beautiful day.
                Capture 0: Today is a beautiful day.
        Group 1:  day
                Capture 0: Today
                Capture 1: is
                Capture 2: a
                Capture 3: beautiful
                Capture 4: day
Matched sentence: The sun is shining.
        Group 0:  The sun is shining.
                Capture 0: The sun is shining.
        Group 1:  shining
                Capture 0: The
                Capture 1: sun
                Capture 2: is
                Capture 3: shining

This is the output. Remember that match.Groups[0].Value is equal to match.Value.

C# regex replacing strings

It is possible to replace strings with Replace(). The method returns the modified string.

Program.cs
using System;
using System.Text.RegularExpressions;
using System.Net.Http;
using System.Threading.Tasks;

namespace Replace
{
    class Program
    {
        static async Task Main(string[] args)
        {
            using var client = new HttpClient();
            var content = await client.GetStringAsync("http://webcode.me");

            var rx = new Regex(@"<[^>]*>", RegexOptions.Compiled |
                RegexOptions.IgnoreCase);

            var modified = rx.Replace(content, String.Empty);

            Console.WriteLine(modified.Trim());
        }
    }
}

The example reads HTML data of a web page and strips its HTML tags using a regular expression.

using var client = new HttpClient();
var content = await client.GetStringAsync("http://webcode.me");

We create a GET request with HttpClient and retrieve the HTML code.

var rx = new Regex(@"<[^>]*>", RegexOptions.Compiled |
    RegexOptions.IgnoreCase);

This pattern defines a regular expression that matches HTML tags.

var modified = rx.Replace(content, String.Empty);

We remove all the tags with the Replace() method.
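
The example above needs network access. The same replacement can be tried on a string literal; the following minimal sketch does that. The sample HTML fragment and the StripTags() helper are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ReplaceLocal
{
    public class Program
    {
        // Hypothetical helper: removes everything between < and >.
        public static string StripTags(string html)
        {
            var rx = new Regex(@"<[^>]*>");

            return rx.Replace(html, String.Empty);
        }

        static void Main(string[] args)
        {
            var html = "<p>The <code>Regex</code> class.</p>";

            Console.WriteLine(StripTags(html)); // The Regex class.
        }
    }
}
```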

C# regex splitting text

Text can be split with the Split() method of Regex.

data.csv
22, 1, 3, 4, 5, 17, 18
2, 13, 4, 1, 8, 4
3, 21, 4, 5, 1, 48, 9, 42

We read from the data.csv file.

Program.cs
using System;
using System.IO;
using System.Text.RegularExpressions;

namespace SplitText
{
    class Program
    {
        static void Main(string[] args)
        {
            string content = File.ReadAllText("data.csv");

            var rx = new Regex(@",\s*", RegexOptions.Compiled);
            var data = rx.Split(content);

            Console.WriteLine("[{0}]", string.Join(", ", data));

            int sum = 0;
            Array.ForEach(data, e => {

                var e2 = e.Trim();

                sum += Int32.Parse(e2);
            });

            Console.WriteLine(sum);
        }
    }
}

The example reads values from a CSV file and computes their sum. It uses a regular expression to process the data.

string content = File.ReadAllText("data.csv");

In one shot, we read the whole file into a single string with File.ReadAllText().

var rx = new Regex(@",\s*", RegexOptions.Compiled);

The regular expression is a comma character followed by zero or more white space characters.

var data = rx.Split(content);

The Split() method splits an input string into an array of substrings.

int sum = 0;
Array.ForEach(data, e => {

    var e2 = e.Trim();

    sum += Int32.Parse(e2);
});

We go through the values, cut off surrounding white space with Trim(), and compute the sum.

$ dotnet run
[22, 1, 3, 4, 5, 17, 18, 2, 13, 4, 1, 8, 4, 3, 21, 4, 5, 1, 48, 9, 42]
235

This is the output.
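
If data.csv is stored exactly as listed, the last value of one line and the first value of the next line are separated only by a line break, which the ,\s* pattern does not match; Split() would then return an element such as "18\n2", which Int32.Parse() cannot parse. The following is a minimal sketch of one way to handle this, assuming that values may be separated by commas, spaces, or line breaks. The Sum() helper is a hypothetical name, and the file content is inlined as a string to keep the sketch self-contained.

```csharp
using System;
using System.Text.RegularExpressions;

namespace SplitLines
{
    public class Program
    {
        // Hypothetical helper: sums integers separated by commas,
        // spaces, or line breaks.
        public static int Sum(string content)
        {
            // Trim() drops the trailing line break so that Split()
            // does not return an empty last element.
            string[] data = Regex.Split(content.Trim(), @"[,\s]+");

            int sum = 0;

            foreach (string e in data)
            {
                sum += Int32.Parse(e);
            }

            return sum;
        }

        static void Main(string[] args)
        {
            // same values as in data.csv
            string content = "22, 1, 3, 4, 5, 17, 18\n" +
                "2, 13, 4, 1, 8, 4\n" +
                "3, 21, 4, 5, 1, 48, 9, 42\n";

            Console.WriteLine(Sum(content)); // 235
        }
    }
}
```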

C# case-insensitive regular expression

By setting the RegexOptions.IgnoreCase flag, we can have case-insensitive matching.

Program.cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace CaseInsensitive
{
    class Program
    {
        static void Main(string[] args)
        {
            var words = new List<string>() { "dog", "Dog", "DOG", "Doggy" };

            var rx = new Regex(@"\bdog\b", RegexOptions.Compiled | 
                RegexOptions.IgnoreCase);

            foreach (string word in words)
            {
                if (rx.IsMatch(word))
                {
                    Console.WriteLine($"{word} does match");
                }
                else
                {
                    Console.WriteLine($"{word} does not match");
                }
            }
        }
    }
}

The example performs case-insensitive matching of the regular expression.

var rx = new Regex(@"\bdog\b", RegexOptions.Compiled | 
    RegexOptions.IgnoreCase);

Case-insensitive matching is enabled by including RegexOptions.IgnoreCase in the options passed as the second parameter to Regex().
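
To see the effect of the flag, the following minimal sketch runs the same pattern on the same words with and without RegexOptions.IgnoreCase. The IsDog() helper is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace CaseContrast
{
    public class Program
    {
        // Hypothetical helper: matches the whole word dog,
        // optionally ignoring case.
        public static bool IsDog(string word, bool ignoreCase)
        {
            var options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;

            return Regex.IsMatch(word, @"\bdog\b", options);
        }

        static void Main(string[] args)
        {
            string[] words = { "dog", "Dog", "DOG", "Doggy" };

            foreach (string word in words)
            {
                // dog:   True True
                // Dog:   True False
                // DOG:   True False
                // Doggy: False False
                Console.WriteLine($"{word}: {IsDog(word, true)} {IsDog(word, false)}");
            }
        }
    }
}
```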

C# regex subpatterns

Subpatterns are patterns within patterns. Subpatterns are created with () characters.

Program.cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Subpatterns
{
    class Program
    {
        static void Main(string[] args)
        {
            var words = new List<string>() {"book", "bookshelf", "bookworm",
                "bookcase", "bookish", "bookkeeper", "booklet", "bookmark"};

            var rx = new Regex("^book(worm|mark|keeper)?$", RegexOptions.Compiled);

            foreach (string word in words)
            {
                if (rx.IsMatch(word))
                {
                    Console.WriteLine($"{word} does match");
                }
                else
                {
                    Console.WriteLine($"{word} does not match");
                }
            }
        }
    }
}

The example creates a subpattern.

var rx = new Regex("^book(worm|mark|keeper)?$", RegexOptions.Compiled); 

The regular expression uses a subpattern. It matches bookworm, bookmark, bookkeeper, and book words.

C# regex email example

In the following example, we create a regex pattern for checking email addresses.

Program.cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Emails
{
    class Program
    {
        static void Main(string[] args)
        {
            var emails = new List<string>() {"luke@gmail.com",
                "andy@yahoocom", "34234sdfa#2345", "f344@gmail.com"};

            var pattern = @"[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}";

            var rx = new Regex(pattern, RegexOptions.Compiled);

            foreach (string email in emails)
            {
                if (rx.IsMatch(email))
                {
                    Console.WriteLine($"{email} does match");
                }
                else
                {
                    Console.WriteLine($"{email} does not match");
                }
            }
        }
    }
}

This example provides only one possible solution.

var pattern = @"[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}";

The email is divided into five parts. The first part is the local part. This is usually the name of a company, an individual, or a nickname. The [a-zA-Z0-9._-]+ lists all possible characters we can use in the local part. They can be used one or more times.

The second part consists of the literal @ character. The third part is the domain part. It is usually the domain name of the email provider, like Yahoo or Gmail. The [a-zA-Z0-9-]+ is a character set providing all characters that can be used in the domain name. The + quantifier allows one or more of these characters.

The fourth part is the dot character. It is preceded by the escape character (\). This is because the dot character is a metacharacter and has a special meaning. By escaping it, we get a literal dot.

The final part is the top level domain: [a-zA-Z.]{2,18}. Top level domains can have from 2 to 18 characters, such as sk, net, info, travel, cleaning, travelinsurance. The maximum length can be 63 characters, but most domains are shorter than 18 characters today. There is also a dot character. This is because some top level domains have two parts; for example co.uk.
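
Because the pattern has no anchors, IsMatch() also succeeds when an email address is only a part of the input. If the whole input should be an address, the pattern can be wrapped in ^ and $. The following is a minimal sketch of that variation; the sample inputs and both helper methods are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace EmailsAnchored
{
    public class Program
    {
        const string Pattern = @"[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}";

        // Hypothetical helper: the input contains an email-like substring.
        public static bool ContainsEmail(string input)
        {
            return Regex.IsMatch(input, Pattern);
        }

        // Hypothetical helper: the whole input is an email-like string.
        public static bool IsEmail(string input)
        {
            return Regex.IsMatch(input, "^" + Pattern + "$");
        }

        static void Main(string[] args)
        {
            string[] inputs = { "luke@gmail.com", "luke@gmail.com!!!",
                "write to luke@gmail.com" };

            foreach (string input in inputs)
            {
                // luke@gmail.com:          True True
                // luke@gmail.com!!!:       True False
                // write to luke@gmail.com: True False
                Console.WriteLine($"{input}: {ContainsEmail(input)} {IsEmail(input)}");
            }
        }
    }
}
```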

In this tutorial, we have worked with regular expressions in C#.
