F# regular expressions
last modified September 21, 2023
In this article we show how to work with regular expressions in F#.
Regular expressions
Regular expressions are used for text searching and more advanced text manipulation. They are built into tools such as grep and sed, text editors such as vi and Emacs, and programming languages.
.NET has a built-in API for working with regular expressions; it is located in the System.Text.RegularExpressions namespace.
A regular expression defines a search pattern for strings. Regex represents an immutable regular expression. It contains methods to match text, replace text, or split text.
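The replace and split operations mentioned above can be sketched as follows. This is a minimal illustration using the static Regex.Split and Regex.Replace methods; the sample strings are hypothetical and do not come from the article.

```fsharp
open System.Text.RegularExpressions

// Hypothetical sample data, used only for illustration.
let line = "apples,  pears ;plums"

// Split on a comma or semicolon surrounded by optional whitespace.
let parts = Regex.Split(line, @"\s*[,;]\s*")

// Replace every run of digits with a '#' placeholder.
let masked = Regex.Replace("order 66 shipped in 2023", @"\d+", "#")

printfn "%A" parts
printfn "%s" masked
```

Both methods also exist as instance methods on a Regex object, which is convenient when the same pattern is reused many times.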
F# IsMatch
The IsMatch method indicates whether the regular expression finds a match in the input string.
open System.Text.RegularExpressions

let words = [ "Seven"; "even"; "Maven"; "Amen"; "eleven" ]

let rx = Regex(@".even", RegexOptions.Compiled)

// List.iter is used because the lambda only prints and returns unit
words
|> List.iter (fun e ->
    if rx.IsMatch(e) then
        printfn $"{e} matches"
    else
        printfn $"{e} does not match")

In the example, we have five words in a list. We check which words match the .even regular expression.
let words = [ "Seven"; "even"; "Maven"; "Amen"; "eleven" ]
We define a list of words.
let rx = Regex(@".even", RegexOptions.Compiled)
We create the .even regular expression. The RegexOptions.Compiled option specifies that the regular expression is compiled to MSIL code instead of being interpreted. This yields faster execution but increases startup time. The dot (.) metacharacter stands for any single character in the text (except a newline).
words |> List.iter (fun e -> if rx.IsMatch(e) then printfn $"{e} matches" else printfn $"{e} does not match")
We apply the given lambda function to each element of the list. The IsMatch method returns true if the word matches the regular expression.
λ dotnet fsi main.fsx
Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches
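IsMatch succeeds when the pattern is found anywhere in the input, which is why eleven matches. As a small sketch, assuming we want only words that consist of exactly one character followed by even, the pattern can be anchored with ^ and $. The names anchored and wholeWordMatches are made up for this illustration.

```fsharp
open System.Text.RegularExpressions

let words = [ "Seven"; "even"; "Maven"; "Amen"; "eleven" ]

// ^ and $ anchor the pattern to the start and the end of the input.
let anchored = Regex(@"^.even$")

// Keep only the words that match in their entirety.
let wholeWordMatches =
    words |> List.filter (fun w -> anchored.IsMatch(w))

printfn "%A" wholeWordMatches
```

With the anchors in place, only Seven remains; eleven is rejected because it has an extra leading character.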
F# regex alternations
The alternation operator | creates a regular expression with several choices.
open System.Text.RegularExpressions

let users = ["Jane"; "Thomas"; "Robert"; "Lucy"; "Beky"; "John"; "Peter"; "Andy"]

let rx = Regex("Jane|Beky|Robert", RegexOptions.Compiled)

users |> List.filter rx.IsMatch |> List.iter (printfn "%s")

There are eight names in the list.
let rx = Regex("Jane|Beky|Robert", RegexOptions.Compiled)
The regular expression looks for "Jane", "Beky", or "Robert" strings.
users |> List.filter rx.IsMatch |> List.iter (printfn "%s")
We filter the list by applying IsMatch to each element of the list and then print all the matched values.
λ dotnet fsi main.fsx
Jane
Robert
Beky
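An unanchored alternation such as Jane|Beky|Robert also matches longer strings that merely contain one of the names. The following sketch, with a hypothetical wanted list, builds the alternation from data and wraps it in a non-capturing group so that the ^ and $ anchors apply to every alternative. Regex.Escape is used so that any metacharacters in the names are matched literally.

```fsharp
open System.Text.RegularExpressions

// Hypothetical list of names to look for.
let wanted = [ "Jane"; "Beky"; "Robert" ]

// Escape each name, join the alternatives with |, and anchor the group.
let pattern =
    wanted
    |> List.map Regex.Escape
    |> String.concat "|"
    |> sprintf "^(?:%s)$"

let rx = Regex(pattern)

printfn "%s" pattern
printfn "%b" (rx.IsMatch("Jane"))
printfn "%b" (rx.IsMatch("Janet"))
```

Without the group, the pattern ^Jane|Beky|Robert$ would anchor only the first and the last alternative.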
F# regex Matches
The Matches method searches an input string for all occurrences of a regular expression and returns all the matches.
open System.Text.RegularExpressions

let content = @"Foxes are omnivorous mammals belonging to several genera of the family Canidae. Foxes have a flattened skull, upright triangular ears, a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every continent except Antarctica. By far the most common and widespread species of fox is the red fox."

let found =
    seq {
        for m in Regex.Matches(content, "(?i)fox(es)?") do
            yield m.Value, m.Index
    }

found |> Seq.iter (fun (e, idx) -> printfn "%s at %d" e idx)

In the example, we look for all occurrences of the word fox.
let found = seq { for m in Regex.Matches(content, "(?i)fox(es)?") do yield m.Value, m.Index }
We match the content string against the specified regular expression. The (?i) inline option makes the match case insensitive, and the optional (es)? group also accepts the plural form. A sequence of matched values and their indexes is created.
found |> Seq.iter (fun (e, idx) -> printfn "%s at %d" e idx)
We iterate over the sequence and print the found matches and their indexes.
λ dotnet fsi main.fsx
Foxes at 0
Foxes at 80
Foxes at 194
fox at 292
fox at 307
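Case-insensitive matching can also be requested with RegexOptions.IgnoreCase instead of the inline (?i) option. The sketch below uses a shortened, hypothetical sample text and collects the matched values into a list.

```fsharp
open System.Text.RegularExpressions

// Shortened, hypothetical sample text.
let text = "Foxes live on every continent. The red fox is the most common fox."

// RegexOptions.IgnoreCase plays the role of the inline (?i) option.
let rx = Regex(@"fox(es)?", RegexOptions.IgnoreCase)

let values =
    rx.Matches(text)
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

printfn "found %d matches: %A" values.Length values
```

The option form is handy when the pattern itself comes from elsewhere and should not be modified.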
F# regex word boundaries
The metacharacter \b is an anchor which matches at a position that is called a word boundary. It allows us to search for whole words.
open System.Text.RegularExpressions

let text = "This island is beautiful"

let rx = Regex(@"\bis\b", RegexOptions.Compiled)

let matches = rx.Matches(text) |> Seq.map (fun m -> m.Value, m.Index)

matches |> Seq.iter (fun (e, idx) -> printfn "%s at %d" e idx)

We look for the word is, but not for the is contained in This and island.
let rx = Regex(@"\bis\b", RegexOptions.Compiled)
With two \b metacharacters, we search for is as a whole word.
λ dotnet fsi main.fsx
is at 12
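To see what the word boundaries contribute, the following sketch compares the match positions of the plain is pattern with those of \bis\b on the same text. The indexesOf helper is a hypothetical name introduced for this comparison.

```fsharp
open System.Text.RegularExpressions

let text = "This island is beautiful"

// Collect the start index of every match of the given pattern.
let indexesOf (pattern: string) =
    Regex.Matches(text, pattern)
    |> Seq.map (fun m -> m.Index)
    |> List.ofSeq

printfn "%A" (indexesOf "is")       // [2; 5; 12]
printfn "%A" (indexesOf @"\bis\b")  // [12]
```

The plain pattern also finds the is inside This and island; the anchored one keeps only the standalone word.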
F# regex currency symbols
The \p{Sc} construct, which matches any character in the Unicode Symbol, Currency category, can be used to look for currency symbols.
open System
open System.Text.RegularExpressions

// <- assigns the property; = would only compare the two values
Console.OutputEncoding <- Text.Encoding.UTF8

let content = @"Currency symbols: ฿ Thailand baht, ₹ Indian rupee, ₾ Georgian lari, $ Dollar, € Euro, ¥ Yen, £ Pound Sterling"

let pattern = @"\p{Sc}"

let rx = Regex(pattern, RegexOptions.Compiled)

let matches = rx.Matches(content) |> Seq.map (fun m -> m.Value, m.Index)

matches |> Seq.iter (fun (e, idx) -> printfn "%s at %d" e idx)
In the program, we look for currency symbols.
λ dotnet fsi main.fsx
฿ at 18
₹ at 35
₾ at 51
$ at 68
€ at 78
¥ at 86
£ at 93
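The \p{Sc} category combines with other constructs like any character class. As a minimal sketch with hypothetical sample text, the pattern below looks for a currency symbol immediately followed by an amount with an optional fractional part.

```fsharp
open System.Text.RegularExpressions

// Hypothetical sample text with prices.
let text = "Coffee €3.50, tea £2, cake $4.25"

// A currency symbol followed by digits and an optional fraction.
let rx = Regex(@"\p{Sc}\d+(\.\d+)?")

let prices =
    rx.Matches(text)
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

printfn "%A" prices
```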
F# regex capturing groups
Round brackets are used to create capturing groups. This allows us to apply a quantifier to the entire group or to restrict alternation to a part of the regular expression.
open System.Text.RegularExpressions

let sites =
    [ "webcode.me"
      "zetcode.com"
      "spoznaj"
      "freebsd.org"
      "netbsd.org" ]

let rx = Regex(@"(\w+)\.(\w+)", RegexOptions.Compiled)

let check e =
    let m = rx.Match(e)
    (m.Value, m.Groups[1], m.Groups[2])

let found = sites |> List.map check
printfn "%A" found
In the program, we divide the domain names into two parts by using groups.
let rx = Regex(@"(\w+)\.(\w+)", RegexOptions.Compiled)
We define two groups with parentheses.
let check e =
    let m = rx.Match(e)
    (m.Value, m.Groups[1], m.Groups[2])

The Value property returns the whole matched string; it is equal to m.Groups[0].Value. The groups are accessed via the Groups property.
λ dotnet fsi main.fsx
[("webcode.me", webcode, me); ("zetcode.com", zetcode, com); ("", , ); ("freebsd.org", freebsd, org); ("netbsd.org", netbsd, org)]
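Groups can also be given names with the (?<name>...) syntax and then looked up by name instead of by number. The sketch below is one possible variation of the example; the group names name and tld and the split function are hypothetical choices, and a failed match is reported as None rather than as empty groups.

```fsharp
open System.Text.RegularExpressions

// Named groups: (?<name>...) labels each part of the match.
let rx = Regex(@"(?<name>\w+)\.(?<tld>\w+)")

// Return both parts as strings, or None when the input does not match.
let split (site: string) =
    let m = rx.Match(site)

    if m.Success then
        Some(m.Groups["name"].Value, m.Groups["tld"].Value)
    else
        None

printfn "%A" (split "zetcode.com")
printfn "%A" (split "spoznaj")
```

Checking m.Success first avoids the empty tuple that the numbered version produces for spoznaj.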
Active patterns
We can use regular expressions with active patterns and match expressions. Active patterns are named partitions of input data which can be used later in a pattern matching expression.

open System.Text.RegularExpressions

let (|RegEx|_|) p i =
    let m = Regex.Match(i, p)

    if m.Success then
        Some m.Groups
    else
        None

let checkrgx (msg) =
    match msg with
    | RegEx @"\d+" g -> printfn "Digit: %A" g
    | RegEx @"\w+" g -> printfn "Word : %A" g
    | _ -> printfn "Not recognized"

checkrgx "an old falcon"
checkrgx "1984"
checkrgx "3 hawks"

In the example, we use active patterns and regular expressions in pattern matching.
let (|RegEx|_|) p i =
    let m = Regex.Match(i, p)

    if m.Success then
        Some m.Groups
    else
        None

We define the RegEx partial active pattern, which matches the pattern against the given input.
let checkrgx (msg) =
    match msg with
    | RegEx @"\d+" g -> printfn "Digit: %A" g
    | RegEx @"\w+" g -> printfn "Word : %A" g
    | _ -> printfn "Not recognized"
The defined name is used in the match expression.
λ dotnet fsi main.fsx
Word : seq [an]
Digit: seq [1984]
Digit: seq [3]
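An active pattern can also hand back the captured groups directly, so that the match expression binds them to names. The following is a sketch under that assumption; the Captures pattern, the describe function, and the key=value inputs are hypothetical and only illustrate the idea.

```fsharp
open System.Text.RegularExpressions

// Partial active pattern that yields the captured groups as a list of
// strings, skipping group 0 (the whole match).
let (|Captures|_|) (pattern: string) (input: string) =
    let m = Regex.Match(input, pattern)

    if m.Success then
        Some [ for i in 1 .. m.Groups.Count - 1 -> m.Groups[i].Value ]
    else
        None

let describe (line: string) =
    match line with
    | Captures @"^(\w+)=(\d+)$" [ key; value ] -> sprintf "%s is the number %s" key value
    | Captures @"^(\w+)=(.*)$" [ key; value ] -> sprintf "%s is the text %s" key value
    | _ -> "not a key=value pair"

printfn "%s" (describe "port=8080")
printfn "%s" (describe "host=localhost")
printfn "%s" (describe "no pair here")
```

The cases are tried from top to bottom, so the more specific numeric pattern is listed before the general one.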
In this article we have worked with regular expressions in F#.