PHP regular expressions
last modified January 10, 2023
In this part of the PHP tutorial, we cover regular expressions in PHP.
$ php -v
PHP 8.1.2 (cli) (built: Aug 8 2022 07:28:23) (NTS)
...
We use PHP version 8.1.2.
Regular expressions are used for text searching and more advanced text manipulation. Regular expressions are built into tools like grep and sed, text editors like vi and emacs, and programming languages like Tcl, Perl, and Python. PHP has built-in support for regular expressions too.
In PHP, there have been two modules for regular expressions: the POSIX Regex and the PCRE. The POSIX Regex is deprecated; it was removed from the language in PHP 7.0. In this chapter, we will use the PCRE examples. PCRE stands for Perl compatible regular expressions.
Two things are needed when we work with regular expressions: Regex functions and the pattern.
A pattern is a regular expression that defines the text we are searching for or manipulating. It consists of text literals and metacharacters. The pattern is placed inside two delimiters. These are usually //, ##, or @@ characters. They inform the regex function where the pattern starts and ends.
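The choice of delimiter mostly affects readability: a delimiter character that also occurs inside the pattern has to be escaped. The following is a minimal sketch of that trade-off; the $url value is a made-up sample, not part of the tutorial's examples.

```php
<?php

// Hypothetical sample text; any string containing slashes would do.
$url = "https://example.com/docs/index.html";

// With / as the delimiter, every literal slash must be escaped.
$withSlashes = preg_match("/^https:\/\/example\.com\//", $url);

// With # as the delimiter, the slashes can stay unescaped.
$withHashes = preg_match("#^https://example\.com/#", $url);

echo "$withSlashes $withHashes\n";
```

Both calls test the same regular expression; only the delimiters differ.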
Here is a partial list of metacharacters used in PCRE.
. | Matches any single character. |
* | Matches the preceding element zero or more times. |
[ ] | Bracket expression. Matches a character within the brackets. |
[^ ] | Matches a single character that is not contained within the brackets. |
^ | Matches the starting position within the string. |
$ | Matches the ending position within the string. |
| | Alternation operator. |
PHP PCRE functions
We list some PCRE regex functions. They all have a preg prefix.
preg_split - splits a string by regex pattern
preg_match - performs a regex match
preg_replace - search and replace string by regex pattern
preg_grep - returns array entries that match the regex pattern
Next we will have an example for each function.
php> print_r(preg_split("@\s@", "Jane\tKate\nLucy Marion"));
Array
(
    [0] => Jane
    [1] => Kate
    [2] => Lucy
    [3] => Marion
)
We have four names divided by whitespace characters: a tab, a newline, and a space. The \s is a character class which stands for whitespace. The preg_split function returns the split strings in an array.
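When the separators come in runs, a single \s produces empty strings between them. The sketch below, which uses a made-up input string, compares that with the \s+ pattern and the PREG_SPLIT_NO_EMPTY flag of preg_split.

```php
<?php

// Hypothetical input: two spaces, then a tab followed by a newline.
$text = "Jane  Kate\t\nLucy";

// A single \s splits at every whitespace character, so runs of
// whitespace leave empty strings in the result.
$single = preg_split("@\s@", $text);

// \s+ treats each run as one separator; PREG_SPLIT_NO_EMPTY
// additionally drops empty pieces (for example from leading whitespace).
$runs = preg_split("@\s+@", $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($single);
print_r($runs);
```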
php> echo preg_match("#[a-z]#", "s");
1
The preg_match function checks whether the 's' character is in the character class [a-z]. The class stands for all characters from a to z. It returns 1 for success.
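Besides 1, preg_match can return 0 (no match) or false (the call failed, for instance because the pattern does not compile). A small sketch of the three outcomes follows; the broken pattern is deliberately invalid and is only here for illustration.

```php
<?php

// 1: the subject matches the pattern.
$found = preg_match("#[a-z]#", "s");

// 0: the pattern is valid, but the subject does not match.
$notFound = preg_match("#[a-z]#", "5");

// false: the pattern itself is broken (the closing ] is missing).
// The @ operator only silences the warning in this demonstration.
$failed = @preg_match("#[a-z#", "s");

var_dump($found, $notFound, $failed);

// Strict comparison keeps "no match" (0) apart from "error" (false).
if ($failed === false) {
    echo "the pattern could not be used\n";
}
```

Because 0 and false are both falsy, a plain if (preg_match(...)) cannot tell them apart; the strict === comparison can.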
php> echo preg_replace("/Jane/","Beky","I saw Jane. Jane was beautiful.");
I saw Beky. Beky was beautiful.
The preg_replace function replaces all occurrences of the word 'Jane' with the word 'Beky'.
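preg_replace also accepts an optional limit and a by-reference counter as its fourth and fifth arguments. The sketch below reuses the sentence from the example above; the variable names are only illustrative.

```php
<?php

$text = "I saw Jane. Jane was beautiful.";

// Replace at most one occurrence; $count receives the number of
// replacements that were actually made.
$first = preg_replace("/Jane/", "Beky", $text, 1, $count);

echo "$first\n";
echo "Replacements: $count\n";
```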
php> print_r(preg_grep("#Jane#", ["Jane", "jane", "Joan", "JANE"]));
Array
(
    [0] => Jane
)
The preg_grep function returns an array of words that match the given pattern. In this example, only one word is returned in the array. This is because by default, the search is case sensitive.
php> print_r(preg_grep("#Jane#i", ["Jane", "jane", "Joan", "JANE"]));
Array
(
    [0] => Jane
    [1] => jane
    [3] => JANE
)
In this example, we perform a case insensitive grep. We put the i modifier after the right delimiter. The returned array now has three words.
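Note that the keys in the output jump from 1 to 3: preg_grep preserves the keys of the input array. A short sketch of two common follow-ups, reindexing with array_values and inverting the match with PREG_GREP_INVERT, might look like this.

```php
<?php

$names = ["Jane", "jane", "Joan", "JANE"];

// preg_grep keeps the original keys: 0, 1, and 3 here.
$matched = preg_grep("#Jane#i", $names);

// array_values renumbers the result from zero.
$reindexed = array_values($matched);

// PREG_GREP_INVERT returns the entries that do NOT match.
$others = preg_grep("#Jane#i", $names, PREG_GREP_INVERT);

print_r($reindexed);
print_r($others);
```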
PHP regex dot metacharacter
The . (dot) metacharacter stands for any single character in the text.
<?php

$words = [ "Seven", "even", "Maven", "Amen", "Leven" ];
$pattern = "/.even/";

foreach ($words as $word) {

    if (preg_match($pattern, $word)) {
        echo "$word matches the pattern\n";
    } else {
        echo "$word does not match the pattern\n";
    }
}
In the $words array, we have five words.
$pattern = "/.even/";
Here we define the search pattern. The pattern is a string. The regular expression is placed within delimiters. The delimiters are mandatory. In our case, we use forward slashes / / as delimiters. Note that we can use different delimiters if we want. The dot character stands for any single character.
if (preg_match($pattern, $word)) {
    echo "$word matches the pattern\n";
} else {
    echo "$word does not match the pattern\n";
}
We test whether each of the five words matches the pattern.
$ php single.php
Seven matches the pattern
even does not match the pattern
Maven does not match the pattern
Amen does not match the pattern
Leven matches the pattern
The Seven and Leven words match our search pattern.
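One detail worth knowing: by default the dot does not match a newline character. The sketch below illustrates this with a made-up subject string and the s modifier, which lifts the restriction.

```php
<?php

// The only character before "even" is a newline.
$subject = "\neven";

// By default the dot does not match a newline character.
$default = preg_match("/.even/", $subject);

// With the s modifier the dot matches newlines as well.
$dotAll = preg_match("/.even/s", $subject);

echo "$default $dotAll\n";
```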
PHP regex anchors
Anchors match positions of characters inside a given text.
In the next example, we check whether a string is located at the beginning of a sentence.
<?php

$sentence1 = "Everywhere I look I see Jane";
$sentence2 = "Jane is the best thing that happened to me";

if (preg_match("/^Jane/", $sentence1)) {
    echo "Jane is at the beginning of the \$sentence1\n";
} else {
    echo "Jane is not at the beginning of the \$sentence1\n";
}

if (preg_match("/^Jane/", $sentence2)) {
    echo "Jane is at the beginning of the \$sentence2\n";
} else {
    echo "Jane is not at the beginning of the \$sentence2\n";
}
We have two sentences. The pattern is ^Jane. The pattern checks whether the 'Jane' string is located at the beginning of the text.
$ php anchors.php
Jane is not at the beginning of the $sentence1
Jane is at the beginning of the $sentence2
php> echo preg_match("#Jane$#", "I love Jane");
1
php> echo preg_match("#Jane$#", "Jane does not love me");
0
The Jane$ pattern matches a string in which the word Jane is at the end.
PHP regex exact word match
In the following examples we show how to look for exact word matches.
php> echo preg_match("/mother/", "mother");
1
php> echo preg_match("/mother/", "motherboard");
1
php> echo preg_match("/mother/", "motherland");
1
The mother pattern fits the words mother, motherboard and motherland. Say we want to look just for exact word matches. We will use the aforementioned anchor ^ and $ characters.
php> echo preg_match("/^mother$/", "motherland");
0
php> echo preg_match("/^mother$/", "Who is your mother?");
0
php> echo preg_match("/^mother$/", "mother");
1
Using the anchor characters, the pattern matches only when the whole string is exactly the word mother.
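The anchored pattern rejects "Who is your mother?" because the anchors apply to the whole string. If the goal is to find the word inside a longer text, one option is the \b word-boundary assertion, which the tutorial does not cover; the following is only a sketch of that alternative.

```php
<?php

// Single quotes keep the backslashes exactly as written.
$pattern = '/\bmother\b/';

$subjects = ["mother", "Who is your mother?", "motherboard", "motherland"];

$results = [];
foreach ($subjects as $subject) {
    // \b matches the position between a word character and a non-word
    // character (or the start or end of the string).
    $results[$subject] = preg_match($pattern, $subject);
    echo "$subject: {$results[$subject]}\n";
}
```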
PHP regex quantifiers
A quantifier after a token or a group specifies how often that preceding element is allowed to occur.
? - 0 or 1 match
* - 0 or more
+ - 1 or more
{n} - exactly n
{n,} - n or more
{0,n} - n or fewer (older PCRE versions treat {,n} as literal text)
{n,m} - range n to m
The above is a list of common quantifiers.
The question mark ? indicates there is zero or one of the preceding element.
<?php

$words = [ "color", "colour", "comic", "colourful", "colored",
    "cosmos", "coloseum", "coloured", "colourful" ];
$pattern = "/colou?r/";

foreach ($words as $word) {

    if (preg_match($pattern, $word)) {
        echo "$word matches the pattern\n";
    } else {
        echo "$word does not match the pattern\n";
    }
}
We have nine words in the $words array.
$pattern = "/colou?r/";
Color is used in American English, colour in British English. This pattern matches both cases.
$ php zeroorone.php
color matches the pattern
colour matches the pattern
comic does not match the pattern
colourful matches the pattern
colored matches the pattern
cosmos does not match the pattern
coloseum does not match the pattern
coloured matches the pattern
colourful matches the pattern
This is the output of the zeroorone.php script.
The * metacharacter matches the preceding element zero or more times.
<?php

$words = [ "Seven", "even", "Maven", "Amen", "Leven" ];
$pattern = "/.*even/";

foreach ($words as $word) {

    if (preg_match($pattern, $word)) {
        echo "$word matches the pattern\n";
    } else {
        echo "$word does not match the pattern\n";
    }
}
In the above script, we have added the * metacharacter. The .* combination means zero or more occurrences of any single character.
$ php zeroormore.php
Seven matches the pattern
even matches the pattern
Maven does not match the pattern
Amen does not match the pattern
Leven matches the pattern
Now the pattern matches three words: Seven, even and Leven.
php> print_r(preg_grep("#o{2}#", ["gool", "root", "foot", "dog"]));
Array
(
    [0] => gool
    [1] => root
    [2] => foot
)
The o{2} pattern matches strings that contain two consecutive 'o' characters.
php> print_r(preg_grep("#^\d{2,4}$#", ["1", "12", "123", "1234", "12345"]));
Array
(
    [1] => 12
    [2] => 123
    [3] => 1234
)
We have this ^\d{2,4}$ pattern. The \d is a character set; it stands for digits. The pattern matches numbers that have 2, 3, or 4 digits.
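To see the quantifiers side by side, the following sketch runs several anchored patterns against the same small word list. The words and the $results variable are made up for this illustration.

```php
<?php

// Hypothetical test strings: "c", a varying number of "a", then "t".
$words = ["ct", "cat", "caat", "caaat"];

// One pattern per quantifier; the anchors force a whole-string match.
$patterns = [
    '/^ca?t$/',
    '/^ca*t$/',
    '/^ca+t$/',
    '/^ca{2}t$/',
    '/^ca{2,}t$/',
    '/^ca{1,2}t$/',
];

$results = [];
foreach ($patterns as $pattern) {
    // array_values drops the original keys that preg_grep preserves.
    $results[$pattern] = array_values(preg_grep($pattern, $words));
    echo $pattern, " => ", implode(", ", $results[$pattern]), "\n";
}
```

Without the ^ and $ anchors, a pattern such as a{2} would also match "caaat", because two consecutive a characters occur inside it.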
PHP regex alternation
The next example explains the alternation operator |. This operator lets us create a regular expression with several choices.
<?php

$names = [ "Jane", "Thomas", "Robert", "Lucy", "Beky",
    "John", "Peter", "Andy" ];
$pattern = "/Jane|Beky|Robert/";

foreach ($names as $name) {

    if (preg_match($pattern, $name)) {
        echo "$name is my friend\n";
    } else {
        echo "$name is not my friend\n";
    }
}
We have eight names in the $names array.
$pattern = "/Jane|Beky|Robert/";
This is the search pattern. The pattern looks for 'Jane', 'Beky', or 'Robert' strings.
$ php alternation.php
Jane is my friend
Thomas is not my friend
Robert is my friend
Lucy is not my friend
Beky is my friend
John is not my friend
Peter is not my friend
Andy is not my friend
This is the output of the script.
PHP regex subpatterns
We can use parentheses to create subpatterns inside patterns.
php> echo preg_match("/book(worm)?$/", "bookworm");
1
php> echo preg_match("/book(worm)?$/", "book");
1
php> echo preg_match("/book(worm)?$/", "worm");
0
We have the following regex pattern: book(worm)?$. The (worm) is a subpattern. The ? character follows the subpattern, which means that the subpattern might appear 0 or 1 times in the final pattern. The $ character is here for the exact end match of the string. Without it, words like bookstore, bookmania would match too.
php> echo preg_match("/book(shelf|worm)?$/", "book");
1
php> echo preg_match("/book(shelf|worm)?$/", "bookshelf");
1
php> echo preg_match("/book(shelf|worm)?$/", "bookworm");
1
php> echo preg_match("/book(shelf|worm)?$/", "bookstore");
0
Subpatterns are often used with alternation. The (shelf|worm) subpattern lets us create several word combinations.
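Parentheses do two things at once: they group and they capture the matched text. When only grouping is needed, PCRE offers the (?: ) form. The sketch below uses the optional third argument of preg_match to show the difference; the variable names are illustrative.

```php
<?php

// Parentheses group AND capture: the matched alternative lands in index 1.
preg_match('/book(shelf|worm)?$/', 'bookworm', $capturing);

// (?: ... ) groups without capturing: only the full match is stored.
preg_match('/book(?:shelf|worm)?$/', 'bookworm', $grouping);

print_r($capturing);
print_r($grouping);
```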
PHP regex character classes
We can combine characters into character classes with the square brackets. A character class matches any character that is specified in the brackets.
<?php

$words = [ "sit", "MIT", "fit", "fat", "lot" ];
$pattern = "/[fs]it/";

foreach ($words as $word) {

    if (preg_match($pattern, $word)) {
        echo "$word matches the pattern\n";
    } else {
        echo "$word does not match the pattern\n";
    }
}
We define a character set with two characters.
$pattern = "/[fs]it/";
This is our pattern. The [fs] is the character class. Note that we work only with one character at a time. We either consider f, or s, but not both.
$ php characterclass.php
sit matches the pattern
MIT does not match the pattern
fit matches the pattern
fat does not match the pattern
lot does not match the pattern
This is the outcome of the script.
We can also use shorthand metacharacters for character classes. The \w stands for word characters (letters, digits, and the underscore), \d for digits, and \s for whitespace characters.
<?php

$words = [ "Prague", "111978", "terry2", "mitt##" ];
$pattern = "/\w{6}/";

foreach ($words as $word) {

    if (preg_match($pattern, $word)) {
        echo "$word matches the pattern\n";
    } else {
        echo "$word does not match the pattern\n";
    }
}
In the above script, we test for words containing six consecutive word characters. The \w{6} stands for six word characters. Only the word mitt## does not match, because it has only four word characters in a row.
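Two properties of \w are easy to overlook: it accepts the underscore, and an unanchored \w{6} only needs six word characters somewhere in the subject. The following sketch checks both with made-up subject strings.

```php
<?php

// \w also accepts the underscore, but not the hyphen.
$underscore = preg_match('/^\w+$/', 'snake_case');
$hyphen = preg_match('/^\w+$/', 'kebab-case');

// Unanchored, \w{6} only needs six word characters somewhere in the text.
$unanchored = preg_match('/\w{6}/', 'terry2##');

// Anchored, the whole string must be exactly six word characters.
$anchored = preg_match('/^\w{6}$/', 'terry2##');

echo "$underscore $hyphen $unanchored $anchored\n";
```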
php> echo preg_match("#[^a-z]{3}#", "ABC");
1
The #[^a-z]{3}# pattern stands for three characters that are not in the class a-z. The "ABC" characters match the condition.
php> print_r(preg_grep("#\d{2,4}#", [ "32", "234", "2345", "3d3", "2"]));
Array
(
    [0] => 32
    [1] => 234
    [2] => 2345
)
In the above example, we have a pattern that matches strings containing two to four consecutive digits. Because the pattern is not anchored, a longer number would match as well.
PHP regex extracting matches
The preg_match takes an optional third parameter. If it is provided, it is filled with the results of the search. The variable is an array whose first element contains the text that matched the full pattern, the second element contains the first captured parenthesized subpattern, and so on.
<?php

$times = [ "10:10:22", "23:23:11", "09:06:56" ];
$pattern = "/(\d\d):(\d\d):(\d\d)/";

foreach ($times as $time) {

    $r = preg_match($pattern, $time, $match);

    if ($r) {
        echo "The $match[0] is split into:\n";
        echo "Hour: $match[1]\n";
        echo "Minute: $match[2]\n";
        echo "Second: $match[3]\n";
    }
}
In the example, we extract parts of a time string.
$times = [ "10:10:22", "23:23:11", "09:06:56" ];
We have three time strings in the English locale.
$pattern = "/(\d\d):(\d\d):(\d\d)/";
The pattern is divided into three subpatterns using parentheses. We want to refer to each of these parts separately.
$r = preg_match($pattern, $time, $match);
We pass a third parameter to the preg_match function. In case of a match, it contains text parts of the matched string.
if ($r) {
    echo "The $match[0] is split into:\n";
    echo "Hour: $match[1]\n";
    echo "Minute: $match[2]\n";
    echo "Second: $match[3]\n";
}
The $match[0] contains the text that matched the full pattern, $match[1] contains text that matched the first subpattern, $match[2] the second, and $match[3] the third.
$ php extract_matches.php
The 10:10:22 is split into:
Hour: 10
Minute: 10
Second: 22
The 23:23:11 is split into:
Hour: 23
Minute: 23
Second: 11
The 09:06:56 is split into:
Hour: 09
Minute: 06
Second: 56
This is the output of the example.
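Numeric indexes become hard to read as patterns grow. PCRE also supports named subpatterns with the (?<name>...) syntax; the sketch below is a variation of the example above, with group names chosen purely for illustration.

```php
<?php

// (?<name>...) is a named capturing subpattern.
$pattern = '/(?<hour>\d\d):(?<minute>\d\d):(?<second>\d\d)/';

if (preg_match($pattern, "23:23:11", $match)) {
    // Each capture is available under its name and its numeric index.
    echo "Hour: {$match['hour']}\n";
    echo "Minute: {$match['minute']}\n";
    echo "Second: {$match['second']}\n";
}
```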
PHP regex email example
Next we have a practical example. We create a regex pattern for checking email addresses.
<?php

$emails = [ "luke@gmail.com", "andy@yahoocom", "34234sdfa#2345", "f344@gmail.com"];

# regular expression for emails
$pattern = "/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}$/";

foreach ($emails as $email) {

    if (preg_match($pattern, $email)) {
        echo "$email matches \n";
    } else {
        echo "$email does not match\n";
    }
}

?>
Note that this example provides only one solution. It does not have to be the best one.
$pattern = "/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}$/";
This is the pattern. The first ^ and the last $ characters are here to get an exact pattern match. No characters before and after the pattern are allowed. The email is divided into five parts. The first part is the local part. This is usually a name of a company, individual, or a nickname. The [a-zA-Z0-9._-]+ lists all possible characters we can use in the local part. They can be used one or more times.
The second part is the literal @ character. The third part is the domain part. It is usually the domain name of the email provider, like yahoo, or gmail. The [a-zA-Z0-9-]+ is a character set providing all characters that can be used in the domain name. The + quantifier allows one or more of these characters. The fourth part is the dot character. It is preceded by the escape character (\). This is because the dot character is a metacharacter and has a special meaning. By escaping it, we get a literal dot. The final part is the top level domain. The pattern is as follows: [a-zA-Z.]{2,18}
Top level domains can have from 2 to 18 characters, like sk, net, info, travel, cleaning, travelinsurance. The maximum length can be 63 characters, but most domains are shorter than 18 characters today. There is also a dot character. This is because some top level domains have two parts; for example co.uk.
$ php emails.php
luke@gmail.com matches
andy@yahoocom does not match
34234sdfa#2345 does not match
f344@gmail.com matches
This is the output of the emails.php example.
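As an alternative to a hand-written pattern, PHP ships the filter_var function with the FILTER_VALIDATE_EMAIL filter. The sketch below is not part of the original example; it only shows the call on two of the addresses from above, and its rules are not identical to the pattern used in this section.

```php
<?php

$emails = ["luke@gmail.com", "34234sdfa#2345"];

$results = [];
foreach ($emails as $email) {
    // filter_var returns the address itself when it passes the filter
    // and false when it does not.
    $results[$email] = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    echo $email, $results[$email] ? " is valid\n" : " is not valid\n";
}
```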
Recap
Finally, we provide a quick recap of the regex patterns.
Jane            the 'Jane' string
^Jane           'Jane' at the start of a string
Jane$           'Jane' at the end of a string
^Jane$          exact match of the string 'Jane'
[abc]           a, b, or c
[a-z]           any lowercase letter
[^A-Z]          any character that is not an uppercase letter
(Jane|Becky)    matches either 'Jane' or 'Becky'
[a-z]+          one or more lowercase letters
^[98]?$         digits 9, 8 or empty string
([wx])([yz])    wy, wz, xy, or xz
[0-9]           any digit
[^A-Za-z0-9]    any symbol (not a number or a letter)
In this chapter, we have covered regular expressions in PHP.