Short for regular expression, a regex is a string of characters that
defines a pattern used to match, locate, and manage text. Perl is a
well-known example of a programming language that makes heavy use of
regular expressions. However, it's only one of the many places you
can find them. Regular expressions can also be used from the command
line and in text editors to find text within a file.
When you first try to understand regular expressions, they can seem
like a different language. However, mastering them can save you a
great deal of time if you work with text or need to parse large
amounts of data. Below is an example of a regular expression with
each of its components labeled. This regular expression also appears
in the Perl programming examples later on this page.
What is regex?
Regex stands for 'regular expression' and is a method used by
programmers to define search patterns. Regex is useful for extracting
information from large blocks of data. Data can take many forms,
whether that be plain text, files, or code. A regex search pattern is
much more powerful and flexible than simple string searches, such as
the search queries typically used with search engines.
For example, software that enforces a password policy can use a
regular expression to specify which character combinations a password
must contain. For such a password rule, the expression could look as
follows:
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
This rule contains numerous specifications, such as the minimum length
of 8 characters and the use of upper and lower case letters. For
example, the expression .{8,} means that any character (symbolized by
the dot) should occur eight times or more ({8,}).
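As a minimal sketch, the password rule above could be checked in Python with the standard re module. The function name is_valid_password and the sample passwords are assumptions made for illustration; they are not part of the original rule.

```python
import re

# The password rule from the text, compiled once for reuse.
PASSWORD_RULE = re.compile(
    r"(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$"
)


def is_valid_password(candidate: str) -> bool:
    """Return True if the candidate satisfies every lookahead in the rule."""
    return PASSWORD_RULE.match(candidate) is not None


# Hypothetical sample passwords, one per requirement being exercised.
for candidate in ["Passw0rd", "password", "Pw0rd", "PASSWORD1"]:
    print(candidate, is_valid_password(candidate))
```

Of these samples, only "Passw0rd" passes: "password" has no digit, symbol, or uppercase letter, "Pw0rd" is shorter than eight characters, and "PASSWORD1" has no lowercase letter.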
Components of regular expressions
Regular expressions are found in many different programming
languages, but their exact implementation can differ. This means that
some characters may behave differently from one implementation to
another. Many characters, however, have a relatively universal use.
Below are some common regular expression components.
Anchors
Anchors are characters that specify where within a string a match
must occur. Regex was originally developed for line-based systems, so
much of it is built around searching within lines. To find the
character "A" at a particular position in a line, you can use the
following anchors:
^A - Match at the beginning of a line.
A$ - Match at the end of a line.
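The two anchors can be tried out with a short Python sketch. The sample text is an assumption for illustration. Note that in Python's re module, ^ and $ only match at every line when the re.MULTILINE flag is set; without it they match at the start and end of the whole string.

```python
import re

# Hypothetical four-line sample text.
text = "Apple pie\nbanana\nAnt farm\nDNA"

# re.MULTILINE makes ^ and $ match at the start and end of every line,
# not only at the start and end of the whole string.
starts_with_a = re.findall(r"^A.*", text, flags=re.MULTILINE)
ends_with_a = re.findall(r".*A$", text, flags=re.MULTILINE)

print(starts_with_a)  # lines that begin with "A"
print(ends_with_a)    # lines that end with "A"
```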
Character sets
Character sets allow you to define explicitly which characters are
acceptable at a given position. As an example, any digit can be
matched using [0-9]. Character sets work for letters as well, so they
are useful for matching letter ranges or for supporting alternate
spellings. For example, gr[ae]y will match both 'gray' and 'grey'.
[0-9] - Match any single digit from 0 to 9.
[a-z] - Match any single lowercase letter from a to z.
[A-Z] - Match any single uppercase letter from A to Z.
. - Match any character except line break characters. Written inside
brackets as [.], the dot matches only a literal period.
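The bracketed sets can be illustrated with a small Python sketch. The sample sentence and variable names are assumptions chosen for the example.

```python
import re

# Hypothetical sample sentence containing both spellings and some digits.
text = "The gray cat and the grey dog live at 42 Elm Street."

spellings = re.findall(r"gr[ae]y", text)   # 'a' or 'e' in the third position
digits = re.findall(r"[0-9]", text)        # one digit per match
uppercase = re.findall(r"[A-Z]", text)     # one uppercase letter per match
lowercase = re.findall(r"[a-z]", "aBc")    # one lowercase letter per match

print(spellings, digits, uppercase, lowercase)
```

Each set matches exactly one character, which is why the digits of 42 come back as two separate matches.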
Modifiers
Modifiers can be used to alter the behavior of regex strings. They are
typically wrapped in parentheses and start with a question mark. Many
modifiers are implementation-dependent, but below are some examples
that are widely supported.
(?i) - Turns off case sensitivity.
(?s) - Makes the dot character also match line break characters.
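Because modifiers vary between implementations, the following sketch applies only to Python's re module, where the inline modifier (?i) makes matching case-insensitive and (?s) lets the dot match line breaks. The sample strings are assumptions for illustration.

```python
import re

# (?i) at the start of the pattern turns on case-insensitive matching.
case_sensitive = re.search(r"hello", "HELLO world")
case_insensitive = re.search(r"(?i)hello", "HELLO world")

# (?s) lets the dot match line break characters as well.
dot_default = re.search(r"first.second", "first\nsecond")
dot_matches_newline = re.search(r"(?s)first.second", "first\nsecond")

print(case_sensitive is None, case_insensitive is not None)
print(dot_default is None, dot_matches_newline is not None)
```

Python also accepts the same settings as function arguments (re.IGNORECASE and re.DOTALL), which can be easier to read than inline modifiers.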
Chaining regular expressions
Regular expressions can be chained together using the pipe character
(|), a technique known as alternation. This allows any one of several
alternatives to be accepted by a single regex string. For example, the
regex string '(string1|string2|string3)' will match 'string1',
'string2', or 'string3' within the same query, rather than having to
run three separate queries. Alternatives can be combined with any
other regex character, with virtually no limit on how many are used.
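A minimal Python sketch of the pipe character, using the regex string from the paragraph above. The sample text is an assumption for illustration.

```python
import re

# Hypothetical text containing two of the three alternatives.
text = "log: string2 found, then string3, but never string4"

# One pass over the text finds every occurrence of any alternative.
matches = re.findall(r"(string1|string2|string3)", text)

print(matches)
```

Here 'string4' is not reported because it is not one of the listed alternatives.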
Quantifiers in regular expressions
Quantifiers allow you to specify how many times you want a particular
part of a regex string to match. Quantifiers usually come in two
variants: lazy and greedy. By default, quantifiers are greedy and will
match as much as possible, which is not always the desired behavior.
Lazy quantifiers, usually written by adding a question mark after the
quantifier, match as little as possible instead. The most common
quantifiers are:
* - Match 0 or more times.
+ - Match 1 or more times.
{n} - Match exactly n times.
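The difference between greedy and lazy matching, and the three quantifiers listed above, can be sketched in Python as follows. The sample strings are assumptions for illustration; in Python, as in most implementations, a quantifier becomes lazy when a question mark is appended to it.

```python
import re

html = "<b>bold</b>"

# Greedy: .+ grabs as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", html).group()
# Lazy: .+? grabs as little as possible, so the match stops at the first ">".
lazy = re.search(r"<.+?>", html).group()

zero_or_more = re.findall(r"ab*", "a ab abb")   # "a" followed by 0+ "b"
one_or_more = re.findall(r"ab+", "a ab abb")    # "a" followed by 1+ "b"
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None
not_three = re.fullmatch(r"a{3}", "aa") is not None

print(greedy, lazy)
print(zero_or_more, one_or_more, exactly_three, not_three)
```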
Advanced regular expressions
Depending on the implementation, regular expressions support advanced
concepts, such as recursion, backreferencing, grouping, subroutines,
conditionals, and more. These features allow you to find very specific
information within large data sets, and you can even create regex
strings to find results within results.
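Two of these concepts, grouping and backreferencing, can be sketched with Python's standard re module, which does not support recursion or subroutine calls. The date format, group names, and sample sentences are assumptions for illustration.

```python
import re

# Grouping with named groups: each part of the date can be read by name.
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = date_pattern.search("Released on 2021-03-15.")
date_parts = match.groupdict()

# Backreference: \1 must repeat exactly what group 1 captured,
# which makes it possible to spot an accidentally doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
repeated_word = doubled.group(1)

print(date_parts)
print(repeated_word)
```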