ZetCode

F# regular expressions

last modified September 11, 2022

In this article, we show how to work with regular expressions in F#.

Regular expressions

Regular expressions are used for text searching and more advanced text manipulation. They are built into tools such as grep and sed, text editors such as vi and Emacs, and programming languages.

.NET has a built-in API for working with regular expressions; it is located in the System.Text.RegularExpressions namespace.

A regular expression defines a search pattern for strings. Regex represents an immutable regular expression. It contains methods to match text, replace text, or split text.
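
As a quick illustration of the replace and split operations mentioned above, here is a minimal sketch that uses the static Regex.Split and Regex.Replace methods. The sample strings and patterns are made up for this illustration and do not appear elsewhere in the article.

```fsharp
open System.Text.RegularExpressions

// Hypothetical sample data, chosen only for this sketch.
let line = "sky,  blue;cloud ,rain"

// Split on a comma or semicolon surrounded by optional whitespace.
let parts = Regex.Split(line, @"\s*[,;]\s*")

// Replace every run of digits with a '#' placeholder.
let masked = Regex.Replace("order 123, item 45", @"\d+", "#")

printfn "%A" parts
printfn "%s" masked
```

The static methods are convenient for one-off calls; for a pattern that is used repeatedly, creating a Regex instance once is usually the better fit.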

F# IsMatch

The IsMatch method indicates whether the regular expression finds a match in the input string.

main.fsx
open System.Text.RegularExpressions

let words =
    [ "Seven"
      "even"
      "Maven"
      "Amen"
      "eleven" ]

let rx = Regex(@".even", RegexOptions.Compiled)

words
|> List.iter
    (fun e ->
        if rx.IsMatch(e) then
            printfn $"{e} matches"
        else
            printfn $"{e} does not match")

In the example, we have five words in a list. We check which words match the .even regular expression.

let words =
    [ "Seven"
      "even"
      "Maven"
      "Amen"
      "eleven" ]

We define a list of words.

let rx = Regex(@".even", RegexOptions.Compiled)

We create the .even regular expression. The RegexOptions.Compiled option specifies that the regular expression is compiled to MSIL code instead of being interpreted. This yields faster matching but increases startup time. The dot (.) metacharacter stands for any single character except a newline.

words
|> List.iter
    (fun e ->
        if rx.IsMatch(e) then
            printfn $"{e} matches"
        else
            printfn $"{e} does not match")

We apply the given lambda function to the elements of the list. The IsMatch method returns true if the regular expression finds a match anywhere in the word.

λ dotnet fsi main.fsx
Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches
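
Note that eleven matches because IsMatch succeeds when the pattern is found anywhere in the input. If the whole word has to match, one option is to add the ^ and $ anchors. The following is a minimal sketch of that idea; the anchored pattern is an addition for illustration and is not part of the example above.

```fsharp
open System.Text.RegularExpressions

let words = [ "Seven"; "even"; "Maven"; "Amen"; "eleven" ]

// ^ and $ anchor the pattern to the start and end of the input,
// so the whole word must be one character followed by "even".
let anchored = Regex(@"^.even$")

let exact =
    words |> List.filter (fun w -> anchored.IsMatch(w))

printfn "%A" exact
```

With the anchors in place, only Seven remains in the filtered list.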

F# regex alternations

The alternation operator | lets us create a regular expression with several choices.

main.fsx
open System.Text.RegularExpressions

let users = ["Jane"; "Thomas"; "Robert";
    "Lucy"; "Beky"; "John"; "Peter"; "Andy"]

let rx = Regex("Jane|Beky|Robert", RegexOptions.Compiled)
users |> List.filter rx.IsMatch |> List.iter (printfn "%s")

There are eight names in the list.

let rx = Regex("Jane|Beky|Robert", RegexOptions.Compiled)

The regular expression looks for the strings "Jane", "Beky", or "Robert".

users |> List.filter rx.IsMatch |> List.iter (printfn "%s")

We filter the list by applying IsMatch to each element and then print all the matched values.

λ dotnet fsi main.fsx
Jane
Robert
Beky

F# regex Matches

The Matches method searches an input string for all occurrences of a regular expression and returns all the matches.

main.fsx
open System.Text.RegularExpressions

let content =
    @"Foxes are omnivorous mammals belonging to several genera
of the family Canidae. Foxes have a flattened skull, upright triangular ears,
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
continent except Antarctica. By far the most common and widespread species of
fox is the red fox."


let found =
    seq {
        for m in Regex.Matches(content, "(?i)fox(es)?") do
            yield m.Value, m.Index
    }

found
|> Seq.iter (fun (e, idx) -> printfn "%s at %d" e idx)

In the example, we look for all occurrences of the word fox.

let found =
    seq {
        for m in Regex.Matches(content, "(?i)fox(es)?") do
            yield m.Value, m.Index
    }

We match the content string against the specified regular expression. The (?i) inline option makes the match case insensitive, and the optional (es)? group lets the pattern match the plural form as well. A sequence of matched values and their indexes is created.

found
|> Seq.iter (fun (e, idx) -> printfn "%s at %d" e idx)

We iterate over the sequence and print the found matches and their indexes.

λ dotnet fsi main.fsx
Foxes at 0
Foxes at 80
Foxes at 194
fox at 292
fox at 307
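
Instead of the inline (?i) option, case-insensitive matching can also be requested with the RegexOptions.IgnoreCase flag. The following is a minimal sketch; the shortened sample text is made up for this illustration.

```fsharp
open System.Text.RegularExpressions

// Shortened sample text, not the article's full paragraph.
let sample = "Foxes have a bushy tail. The red fox is common."

// RegexOptions.IgnoreCase plays the role of the inline (?i) option.
let rx = Regex("fox(es)?", RegexOptions.IgnoreCase)

let found =
    rx.Matches(sample)
    |> Seq.map (fun m -> m.Value, m.Index)
    |> Seq.toList

printfn "%A" found
```

The flag keeps the pattern itself free of option syntax, which can be easier to read when several options are combined.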

F# regex word boundaries

The metacharacter \b is an anchor that matches at a position called a word boundary. It allows us to search for whole words.

main.fsx
open System.Text.RegularExpressions

let text = "This island is beautiful"

let rx = Regex(@"\bis\b", RegexOptions.Compiled)

let matches =
    rx.Matches(text)
    |> Seq.map (fun m -> m.Value, m.Index)

matches
|> Seq.iter (fun (e, idx) -> printfn "%s at %d" e idx)

We look for the word is, but not for the is inside This and island.

let rx = Regex(@"\bis\b", RegexOptions.Compiled)

With two \b metacharacters, we search for is as a whole word.

λ dotnet fsi main.fsx
is at 12
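
To see what the word boundaries change, the same text can be searched with and without them. This is a minimal sketch; the indexes helper is a name introduced only for this illustration.

```fsharp
open System.Text.RegularExpressions

let text = "This island is beautiful"

// Hypothetical helper: collect the start index of every match of the pattern.
let indexes (pattern: string) =
    Regex.Matches(text, pattern)
    |> Seq.map (fun m -> m.Index)
    |> Seq.toList

printfn "without \\b: %A" (indexes "is")
printfn "with \\b:    %A" (indexes @"\bis\b")
```

Without the boundaries, the pattern also matches inside This (index 2) and island (index 5); with them, only the standalone word at index 12 is found.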

F# regex currency symbols

The \p{Sc} Unicode category can be used in a regular expression to look for currency symbols.

main.fsx
open System
open System.Text.RegularExpressions

Console.OutputEncoding <- Text.Encoding.UTF8

let content = @"Currency symbols: ฿ Thailand baht, ₹  Indian rupee,
    ₾ Georgian lari, $ Dollar, € Euro, ¥ Yen, £ Pound Sterling"

let pattern = @"\p{Sc}"

let rx = Regex(pattern, RegexOptions.Compiled)
let matches = rx.Matches(content)
              |> Seq.map (fun m -> m.Value, m.Index)

matches
|> Seq.iter (fun (e, idx) -> printfn "%s at %d" e idx)

In the program, we look for currency symbols.

λ dotnet fsi main.fsx
฿ at 18
₹ at 35
₾ at 57
$ at 74
€ at 84
¥ at 92
£ at 99

F# regex capturing groups

Round brackets are used to create capturing groups. This allows us to apply a quantifier to the entire group or to restrict alternation to a part of the regular expression.

main.fsx
open System.Text.RegularExpressions

let sites =
    [ "webcode.me"
      "zetcode.com"
      "spoznaj"
      "freebsd.org"
      "netbsd.org" ]

let rx =
    Regex(@"(\w+)\.(\w+)", RegexOptions.Compiled)

let check e =
    let m = rx.Match(e)
    (m.Value, m.Groups[1], m.Groups[2])

let found = sites |> List.map check

printfn "%A" found

In the program, we divide the domain names into two parts by using groups.

let rx =
    Regex(@"(\w+)\.(\w+)", RegexOptions.Compiled)

We define two groups with parentheses.

let check e =
    let m = rx.Match(e)
    (m.Value, m.Groups[1], m.Groups[2])

The Value property returns the whole matched string; it is equal to m.Groups[0].Value. The captured groups are accessed via the Groups property. For a string that does not match, such as spoznaj, the value and both groups are empty.

λ dotnet fsi main.fsx
[("webcode.me", webcode, me); ("zetcode.com", zetcode, com); ("", , );
 ("freebsd.org", freebsd, org); ("netbsd.org", netbsd, org)]
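
The empty tuple for spoznaj comes from a failed match. One way to skip such inputs is to check the Success property of the match; groups can also be given names with the (?<name>...) syntax. The following is a minimal sketch of both ideas. The group names name and tld and the tryParse helper are chosen for this illustration and are not part of the example above.

```fsharp
open System.Text.RegularExpressions

let sites = [ "webcode.me"; "zetcode.com"; "spoznaj" ]

// (?<name>...) gives a group a name; 'name' and 'tld' are labels chosen for this sketch.
let rx = Regex(@"(?<name>\w+)\.(?<tld>\w+)")

// Hypothetical helper: Some (name, tld) on a successful match, None otherwise.
let tryParse (site: string) =
    let m = rx.Match(site)

    if m.Success then
        Some(m.Groups["name"].Value, m.Groups["tld"].Value)
    else
        None

let parsed = sites |> List.choose tryParse

printfn "%A" parsed
```

List.choose drops the None results, so spoznaj does not appear in the output at all.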

In this article, we have worked with regular expressions in F#.