JavaScript regular expressions
last modified July 7, 2020
JavaScript regular expressions tutorial shows how to use regular expressions in JavaScript.
Regular expressions are used for text searching and more advanced text manipulation. They are built into tools such as grep and sed, text editors such as vi and emacs, and programming languages such as JavaScript, Perl, and Python.
JavaScript regular expression
In JavaScript, we build regular expressions either with slashes // or with the RegExp object.
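The two forms can be compared side by side. The following is a minimal sketch; the sample text and variable names are made up for illustration. Note that a pattern passed to RegExp as a string needs its backslashes doubled.

```javascript
// A regex literal and the RegExp constructor describing the same pattern.
const literal = /\d+ apples/;

// In a string, the backslash itself must be escaped, hence '\\d'.
const constructed = new RegExp('\\d+ apples');

// Hypothetical sample text, used only for illustration.
const sample = 'I bought 12 apples today.';

console.log(literal.test(sample));                   // true
console.log(constructed.test(sample));               // true
console.log(literal.source === constructed.source);  // true
```

The constructor form is mostly useful when the pattern is assembled at run time; for fixed patterns the literal is easier to read.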
A pattern is a regular expression that defines the text we are searching for or manipulating. It consists of text literals and metacharacters. Metacharacters are special characters that control how the regular expression is going to be evaluated. For instance, with \s we search for white spaces.
After we have created a pattern, we can use one of the functions to apply the pattern to a text string. The functions include test(), exec(), match(), search(), and replace().
The following table shows some regular expressions:
Regex | Meaning |
---|---|
. | Matches any single character. |
? | Matches the preceding element once or not at all. |
+ | Matches the preceding element once or more times. |
* | Matches the preceding element zero or more times. |
^ | Matches the starting position within the string. |
$ | Matches the ending position within the string. |
\| | Alternation operator. |
[abc] | Matches a, b, or c. |
[a-c] | Range; matches a, b, or c. |
[^abc] | Negation; matches any character except a, b, or c. |
\s | Matches a white space character. |
\w | Matches a word character; equivalent to [a-zA-Z_0-9]. |
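Several rows of the table can be checked with a small script. This is only a sketch: the sample inputs and the `checks` array are made up for illustration, and every pattern is anchored with ^ and $ so that the whole input has to match.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)].
const checks = [
    [/^b.g$/, 'bug', true],          // . stands for exactly one character
    [/^b.g$/, 'bg', false],
    [/^colou?r$/, 'color', true],    // ? makes the 'u' optional
    [/^colou?r$/, 'colour', true],
    [/^go+gle$/, 'ggle', false],     // + needs at least one 'o'
    [/^go*gle$/, 'ggle', true],      // * accepts zero 'o' characters
    [/^(cat|dog)$/, 'dog', true],    // | chooses between alternatives
    [/^(cat|dog)$/, 'cow', false],
    [/^[^abc]$/, 'd', true],         // [^abc] excludes a, b, and c
    [/^[^abc]$/, 'a', false],
    [/^\w\s\w$/, 'a b', true],       // word char, white space, word char
];

checks.forEach(([pattern, input, expected]) => {
    const actual = pattern.test(input);
    console.log(`${pattern} on '${input}': ${actual} (expected ${expected})`);
});
```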
The test function
The test() method executes a search for a match between a regular expression and a specified string. It returns true or false.
let words = ['book', 'bookworm', 'Bible',
    'bookish', 'cookbook', 'bookstore', 'pocketbook'];

let pattern = /book/;

words.forEach(word => {
    if (pattern.test(word)) {
        console.log(`the ${word} matches`);
    }
});
In the example, we have an array of words. The pattern will look for a 'book' string in each of the words.
let pattern = /book/;
We create a pattern using slashes. The regular expression consists of four normal characters.
words.forEach(word => {
    if (pattern.test(word)) {
        console.log(`the ${word} matches`);
    }
});
We go through the array and call the test() function. It returns true if the pattern matches the word.
$ node test_fun.js
the book matches
the bookworm matches
the bookish matches
the cookbook matches
the bookstore matches
the pocketbook matches
These words match the pattern.
The search function
The search() function returns the index of the first match between the regular expression and the given string. It returns -1 if no match is found.
let text = 'I saw a fox in the wood. The fox had red fur.';

let pattern = /fox/;

let idx = text.search(pattern);
console.log(`the term was found at index: ${idx}`);
In the example, we find out the index of the first match of the 'fox' term.
$ node search_fun.js
the term was found at index: 8
This is the output.
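The -1 return value is worth checking explicitly before the index is used. A minimal sketch, in which the 'wolf' term is chosen only because it does not occur in the text:

```javascript
const sentence = 'I saw a fox in the wood. The fox had red fur.';

const foundAt = sentence.search(/fox/);     // index of the first match
const missingAt = sentence.search(/wolf/);  // -1, the term is absent

console.log(foundAt);    // 8
console.log(missingAt);  // -1

if (missingAt === -1) {
    console.log('the term was not found');
}
```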
The exec function
The exec() method executes a search for a match in a specified string. It returns an array with information about the match, or null if there is no match.
let words = ['book', 'bookworm', 'Bible',
    'bookish', 'cookbook', 'bookstore', 'pocketbook'];

let pattern = /book/;

words.forEach(word => {
    let res = pattern.exec(word);

    if (res) {
        console.log(`${res} matches ${res.input} at index: ${res.index}`);
    }
});
In the example, we apply the pattern to the input strings with exec().
if (res) {
    console.log(`${res} matches ${res.input} at index: ${res.index}`);
}
We print the information about the match. It includes the index where the match begins.
$ node exec_fun.js
book matches book at index: 0
book matches bookworm at index: 0
book matches bookish at index: 0
book matches cookbook at index: 4
book matches bookstore at index: 0
book matches pocketbook at index: 6
This is the output.
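The value returned by exec() can also be inspected field by field. The sketch below reuses two words from the example; the variable names are illustrative.

```javascript
const bookPattern = /book/;

// A successful match: an array with extra index and input properties.
const hit = bookPattern.exec('cookbook');
console.log(hit[0]);     // book
console.log(hit.index);  // 4
console.log(hit.input);  // cookbook

// No match: exec() returns null, so the result must be checked first.
const miss = bookPattern.exec('Bible');
console.log(miss);       // null
```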
The match function
The match() function retrieves the matches when matching a pattern against an input string.
let text = 'I saw a fox in the wood. The fox had red fur.';

let pattern = /fox/g;

let found = text.match(pattern);

console.log(`There are ${found.length} matches`);
In the example, we find out the number of occurrences of the 'fox' term.
let pattern = /fox/g;
The g character is a flag that finds all occurrences of a term. Normally, the search ends when the first occurrence is found.
$ node match_fun.js
There are 2 matches
We have found two 'fox' terms in the string.
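When nothing matches, match() returns null rather than an empty array, so reading found.length would throw. One possible guard is sketched below; countMatches is a hypothetical helper name, not part of JavaScript.

```javascript
// Hypothetical helper: counts matches and treats null as zero.
function countMatches(text, pattern) {
    const found = text.match(pattern);
    return found === null ? 0 : found.length;
}

const story = 'I saw a fox in the wood. The fox had red fur.';

console.log(countMatches(story, /fox/g));   // 2
console.log(countMatches(story, /wolf/g));  // 0
```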
The replace function
The replace() function returns a new string with some or all matches of a pattern replaced by a replacement string.
let text = 'He has gray hair; gray clouds gathered above us.';

let pattern = /gray/g;

let new_text = text.replace(pattern, 'grey');

console.log(new_text);
In the example, we create a new string from an input string, where we replace 'gray' words with 'grey'.
let pattern = /gray/g;
The g character is a flag that finds all occurrences of a term.
$ node replacing.js
He has grey hair; grey clouds gathered above us.
This is the output.
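The effect of the g flag is easiest to see by running the same replacement with and without it. A small sketch using the sentence from the example:

```javascript
const original = 'He has gray hair; gray clouds gathered above us.';

// Without the g flag only the first occurrence is replaced.
const firstOnly = original.replace(/gray/, 'grey');

// With the g flag every occurrence is replaced.
const allOccurrences = original.replace(/gray/g, 'grey');

console.log(firstOnly);       // He has grey hair; gray clouds gathered above us.
console.log(allOccurrences);  // He has grey hair; grey clouds gathered above us.
```

In both cases the original string is left unchanged; replace() always returns a new string.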
Case insensitive match
To enable case insensitive search, we use the i flag.
let words = ['dog', 'Dog', 'DOG', 'Doggy'];

let pattern = /dog/i;

words.forEach(word => {
    if (pattern.test(word)) {
        console.log(`the ${word} matches`);
    }
});
In the example, we apply the pattern on words regardless of the case.
let pattern = /dog/i;
By appending the i flag, we perform a case insensitive search.
$ node case_insensitive.js
the dog matches
the Dog matches
the DOG matches
the Doggy matches
All four words match the pattern when doing case insensitive search.
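Flags can be combined. As a short sketch, the i and g flags together collect every occurrence regardless of case; the sample sentence is made up for illustration.

```javascript
// Hypothetical sample sentence.
const line = 'Dog chased dog; DOG barked.';

// g: find all occurrences, i: ignore case.
const dogs = line.match(/dog/gi);

console.log(dogs);         // [ 'Dog', 'dog', 'DOG' ]
console.log(dogs.length);  // 3
```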
The dot metacharacter
The dot (.) metacharacter stands for any single character in the text.
let words = ['seven', 'even', 'prevent', 'revenge', 'maven',
    'eleven', 'amen', 'event'];

let pattern = /..even/;

words.forEach(word => {
    if (pattern.test(word)) {
        console.log(`the ${word} matches`);
    }
});
In the example, we have eight words in an array. We apply a pattern containing two dot metacharacters on each of the words.
$ node dot_meta.js
the prevent matches
the eleven matches
There are two words that match the pattern.
Question mark metacharacter
The question mark (?) metacharacter is a quantifier that matches the previous element zero or one time.
let words = ['seven', 'even', 'prevent', 'revenge', 'maven',
    'eleven', 'amen', 'event'];

let pattern = /.?even/;

words.forEach(word => {
    if (pattern.test(word)) {
        console.log(`the ${word} matches`);
    }
});
In the example, we add a question mark after the dot character. This means that in the pattern we can have one arbitrary character or we can have no character there.
$ node question_mark_meta.js
the seven matches
the even matches
the prevent matches
the revenge matches
the eleven matches
the event matches
This time the even and event words, which do not have a preceding character, match as well.
Anchors
Anchors match positions of characters inside a given text. When using the ^ anchor the match must occur at the beginning of the string and when using the $ anchor the match must occur at the end of the string.
let sentences = ['I am looking for Jane.',
    'Jane was walking along the river.',
    'Kate and Jane are close friends.'];

let pattern = /^Jane/;

sentences.forEach(sentence => {
    if (pattern.test(sentence)) {
        console.log(`${sentence}`);
    }
});
In the example, we have three sentences. The search pattern is ^Jane. The pattern checks if the "Jane" string is located at the beginning of the text. The Jane\.$ pattern would look for "Jane." at the end of the sentence.
Exact match
An exact match can be performed by placing the term between the anchors: ^ and $.
let words = ['seven', 'even', 'prevent', 'revenge', 'maven',
    'eleven', 'amen', 'event'];

let pattern = /^even$/;

words.forEach(word => {
    if (pattern.test(word)) {
        console.log(`the ${word} matches`);
    }
});
In the example, we look for an exact match for the 'even' term.
$ node exact_match.js
the even matches
This is the output.
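To see what each anchor contributes, the same word list can be filtered with only ^, only $, and both. This is a sketch; the variable names are illustrative.

```javascript
const candidates = ['seven', 'even', 'prevent', 'revenge', 'maven',
    'eleven', 'amen', 'event'];

// ^ only: the word must start with 'even'.
const startsWith = candidates.filter(word => /^even/.test(word));

// $ only: the word must end with 'even'.
const endsWith = candidates.filter(word => /even$/.test(word));

// Both anchors: the word must be exactly 'even'.
const exact = candidates.filter(word => /^even$/.test(word));

console.log(startsWith);  // [ 'even', 'event' ]
console.log(endsWith);    // [ 'seven', 'even', 'eleven' ]
console.log(exact);       // [ 'even' ]
```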
Character classes
A character class defines a set of characters, any one of which can occur in an input string for a match to succeed.
let words = ['a gray bird', 'grey hair', 'great look'];

let pattern = /gr[ea]y/;

words.forEach(word => {
    if (pattern.test(word)) {
        console.log(`${word}`);
    }
});
In the example, we use a character class to include both gray and grey words.
let pattern = /gr[ea]y/;
The [ea] class allows either the 'e' or the 'a' character in the pattern.
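Character classes also support ranges and negation, as listed in the table earlier. The sketch below uses made-up seat codes (a letter from a to c followed by one digit) purely as an illustration.

```javascript
// Hypothetical seat codes: one letter a-c followed by one digit.
const codes = ['a1', 'c9', 'd4', 'b10'];

// [a-c] and [0-9] are ranges; the anchors require the whole code to fit.
const validCodes = codes.filter(code => /^[a-c][0-9]$/.test(code));
console.log(validCodes);  // [ 'a1', 'c9' ]

// [^0-9] is a negated class: it matches any character that is not a digit.
console.log(/[^0-9]/.test('2013'));  // false
console.log(/[^0-9]/.test('20a3'));  // true
```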
Named character classes
There are some predefined character classes. The \s matches a whitespace character such as [ \t\n\r\f\v], the \d a digit [0-9], and the \w a word character [a-zA-Z0-9_].
let text = 'We met in 2013. She must be now about 27 years old.';

let pattern = /\d+/g;
let found;

while ((found = pattern.exec(text)) !== null) {
    console.log(`found ${found} at index ${found.index}`);
}
In the example, we search for numbers in the text.
let pattern = /\d+/g;
The \d+ pattern matches one or more consecutive digits. The g flag makes the search continue after the first occurrence.
while ((found = pattern.exec(text)) !== null) {
    console.log(`found ${found} at index ${found.index}`);
}
To find all the matches, we use the exec() function in a while loop.
$ node named_character_class.js
found 2013 at index 10
found 27 at index 38
This is the output.
In the following example, we have an alternative solution using the match() function.
let text = 'I met her in 2012. She must be now about 27 years old.';

let pattern = /\d+/g;

let found = text.match(pattern);

console.log(`There are ${found.length} numbers`);

found.forEach((num, i) => {
    console.log(`match ${++i}: ${num}`);
});
To count numbers, we use the \d named class.
$ node count_numbers.js
There are 2 numbers
match 1: 2012
match 2: 27
This is the output.
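A third option, not used in the examples above, is the matchAll() string method (available since ES2020). It returns an iterator of match objects, so both the matched text and its index are available without a manual loop. A minimal sketch:

```javascript
const note = 'We met in 2013. She must be now about 27 years old.';

// matchAll() requires the g flag; spread the iterator into an array.
const numbers = [...note.matchAll(/\d+/g)].map(m => ({
    value: m[0],
    index: m.index,
}));

numbers.forEach(({ value, index }) => {
    console.log(`found ${value} at index ${index}`);
});
// found 2013 at index 10
// found 27 at index 38
```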
Counting words
In the next example, we count words in the text.
let text = 'The Sun was shining; I went for a walk.';

let pattern = /\w+/g;

let found = text.match(pattern);

console.log(`There are ${found.length} words`);
The \w named class stands for a word character.
let pattern = /\w+/g;
The pattern uses a quantifier (+) to search for one or more word characters. The global flag makes the search look for all words in the string.
console.log(`There are ${found.length} words`);
We print the number of words to the console.
$ node count_words.js
There are 9 words
This is the output.
Alternations
The alternation operator | creates a regular expression with several choices.
let words = ["Jane", "Thomas", "Robert",
    "Lucy", "Beky", "John", "Peter", "Andy"];

let pattern = /Jane|Beky|Robert/;

words.forEach(word => {
    if (pattern.test(word)) {
        console.log(`the ${word} matches`);
    }
});
We have eight names in the list.
let pattern = /Jane|Beky|Robert/;
This regular expression looks for "Jane", "Beky", or "Robert" strings.
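Because the pattern has no anchors, it also matches longer strings that merely contain one of the names. If whole-name matching is wanted, one possible approach is to group the alternatives and anchor the group. The extra names in this sketch are made up for illustration.

```javascript
// Hypothetical list with names that contain the searched ones.
const names = ['Jane', 'Janet', 'Beky', 'Robert', 'Roberta'];

// Unanchored: any string containing one of the alternatives matches.
const loose = names.filter(name => /Jane|Beky|Robert/.test(name));

// Grouped and anchored: the whole string must be one of the alternatives.
const strict = names.filter(name => /^(Jane|Beky|Robert)$/.test(name));

console.log(loose);   // [ 'Jane', 'Janet', 'Beky', 'Robert', 'Roberta' ]
console.log(strict);  // [ 'Jane', 'Beky', 'Robert' ]
```

The round brackets matter here: without them, ^Jane|Beky|Robert$ would anchor only the first and the last alternative.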
Capturing groups
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing characters inside a set of round brackets. For instance, (book) is a single group containing the 'b', 'o', 'o', and 'k' characters.
The capturing groups technique allows us to find out those parts of a string that match the regular expression pattern.
let content = `<p>The <code>Pattern</code> is a compiled
representation of a regular expression.</p>`;

let pattern = /(<\/?[a-z]*>)/g;

let found = content.match(pattern);

found.forEach(tag => {
    console.log(tag);
});
The code example prints all HTML tags from the supplied string by capturing a group of characters.
let found = content.match(pattern);
In order to find all tags, we use the match() method.
$ ./capturing_groups.js
<p>
<code>
</code>
</p>
We have found four HTML tags.
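With the g flag, match() returns only the full matches, so the individual groups are not visible in that output. To read the groups themselves, exec() (or match() without the g flag) can be used. The sketch below assumes a date written as year-month-day; the sample string and variable names are illustrative.

```javascript
// Hypothetical input containing a date in YYYY-MM-DD form.
const stamp = 'last modified 2020-07-07';

// Three capturing groups: year, month, day.
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

const res = datePattern.exec(stamp);

if (res !== null) {
    // res[0] is the whole match; res[1]..res[3] are the groups.
    const [whole, year, month, day] = res;
    console.log(whole);  // 2020-07-07
    console.log(year);   // 2020
    console.log(month);  // 07
    console.log(day);    // 07
}
```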
JavaScript regex email example
In the following example, we create a regex pattern for checking email addresses.
let emails = ["luke@gmail.com", "andy@yahoocom",
    "34234sdfa#2345", "f344@gmail.com"];

let pattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}$/;

emails.forEach(email => {
    if (pattern.test(email)) {
        console.log(`${email} matches`);
    } else {
        console.log(`${email} does not match`);
    }
});
This example provides one possible solution.
let pattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}$/;
The first ^ and the last $ characters provide an exact pattern match. No characters before and after the pattern are allowed.
The email is divided into five parts. The first part is the local part. This is usually the name of a company, an individual, or a nickname. The [a-zA-Z0-9._-]+ lists all possible characters that we can use in the local part. They can be used one or more times.
The second part consists of the literal @ character. The third part is the domain part. It is usually the domain name of the email provider such as yahoo, or gmail. The [a-zA-Z0-9-]+ is a character class providing all characters that can be used in the domain name. The + quantifier allows the use of one or more of these characters.
The fourth part is the dot character; it is preceded by the escape character (\) to get a literal dot.
The final part is the top level domain name: [a-zA-Z.]{2,18}.
Top level domains can have from 2 to 18 characters, such as sk, net, info, travel, cleaning, travelersinsurance. The maximum length can be 63 characters, but most domains are shorter than 18 characters today. There is also a dot character. This is because some top level domains have two parts; for instance co.uk.
$ node emails.js
luke@gmail.com matches
andy@yahoocom does not match
34234sdfa#2345 does not match
f344@gmail.com matches
This is the output.
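The pattern can be wrapped in a small helper for reuse. In the sketch below, isValidEmail is a hypothetical function name and the extra addresses are made up. The last call shows one limitation of this particular pattern: since the dot is allowed in the final part, a run of dots is accepted as well.

```javascript
const emailPattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}$/;

// Hypothetical helper: true if the address fits the pattern above.
function isValidEmail(email) {
    return emailPattern.test(email);
}

console.log(isValidEmail('luke@gmail.com'));      // true
console.log(isValidEmail('andy@yahoocom'));       // false, no dot after the domain
console.log(isValidEmail('kate@example.co.uk'));  // true, two-part top level domain
console.log(isValidEmail('luke@gmail...'));       // true, although it is not a real address
```

For that reason a pattern like this is best treated as a first plausibility check, not as proof that an address exists.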
In this tutorial, we have covered regular expressions in JavaScript.