1 files changed, 3219 insertions, 0 deletions
diff --git a/upstream/archlinux/man1/perlretut.1perl b/upstream/archlinux/man1/perlretut.1perl
new file mode 100644
index 00000000..2967ebd2
--- /dev/null
+++ b/upstream/archlinux/man1/perlretut.1perl
@@ -0,0 +1,3219 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds C`
+.    ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+.    if \nF \{\
+.        de IX
+.        tm Index:\\$1\t\\n%\t"\\$2"
+..
+.        if !\nF==2 \{\
+.            nr % 0
+.            nr F 2
+.        \}
+.    \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLRETUT 1perl"
+.TH PERLRETUT 1perl 2024-02-11 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlretut \- Perl regular expressions tutorial
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This page provides a basic tutorial on understanding, creating and
+using regular expressions in Perl.  It serves as a complement to the
+reference page on regular expressions perlre.  Regular expressions
+are an integral part of the \f(CW\*(C`m//\*(C'\fR, \f(CW\*(C`s///\*(C'\fR, \f(CW\*(C`qr//\*(C'\fR and \f(CW\*(C`split\*(C'\fR
+operators and so this tutorial also overlaps with
+"Regexp Quote-Like Operators" in perlop and "split" in perlfunc.
+.PP
+Perl is widely renowned for excellence in text processing, and regular
+expressions are one of the big factors behind this fame.  Perl regular
+expressions display an efficiency and flexibility unknown in most
+other computer languages.  Mastering even the basics of regular
+expressions will allow you to manipulate text with surprising ease.
+.PP
+What is a regular expression?  At its most basic, a regular expression
+is a template that is used to determine if a string has certain
+characteristics.  The string is most often some text, such as a line,
+sentence, web page, or even a whole book, but it doesn't have to be.  It
+could be binary data, for example.  Biologists often use Perl to look
+for patterns in long DNA sequences.
+.PP
+Suppose we want to determine if the text in the variable \f(CW$var\fR contains
+the sequence of characters \f(CW\*(C`m\ u\ s\ h\ r\ o\ o\ m\*(C'\fR
+(blanks added for legibility).  We can write in Perl:
+.PP
+.Vb 1
+\& $var =~ m/mushroom/
+.Ve
+.PP
+The value of this expression will be TRUE if \f(CW$var\fR contains that
+sequence of characters anywhere within it, and FALSE otherwise.  The
+portion enclosed in \f(CW\*(Aq/\*(Aq\fR characters denotes the characteristic we
+are looking for.
+We use the term \fIpattern\fR for it.  The process of looking to see if the
+pattern occurs in the string is called \fImatching\fR, and the \f(CW\*(C`=~\*(C'\fR
+operator, along with the \f(CW\*(C`m//\*(C'\fR, tells Perl to try to match the pattern
+against the string.  Note that the pattern is also a string, but a very
+special kind of one, as we will see.  Patterns are in common use these
+days; examples are the patterns typed into a search engine to find web pages
+and the patterns used to list files in a directory, \fIe.g.\fR, "\f(CW\*(C`ls *.txt\*(C'\fR"
+or "\f(CW\*(C`dir *.*\*(C'\fR".  In Perl, the patterns described by regular expressions
+are used not only to search strings, but also to extract desired parts
+of strings, and to do search and replace operations.
+.PP
+Regular expressions have the undeserved reputation of being abstract
+and difficult to understand.  That reputation stems from the terse,
+dense notation used to express them, not from any inherent complexity.
+We recommend using the \f(CW\*(C`/x\*(C'\fR regular
+expression modifier (described below) along with plenty of white space
+to make them less dense and easier to read.  Regular expressions are
+constructed using
+simple concepts like conditionals and loops and are no more difficult
+to understand than the corresponding \f(CW\*(C`if\*(C'\fR conditionals and \f(CW\*(C`while\*(C'\fR
+loops in the Perl language itself.
+.PP
+This tutorial flattens the learning curve by discussing regular
+expression concepts, along with their notation, one at a time and with
+many examples.  The first part of the tutorial will progress from the
+simplest word searches to the basic regular expression concepts.  If
+you master the first part, you will have all the tools needed to solve
+about 98% of your needs.  The second part of the tutorial is for those
+comfortable with the basics, and hungry for more power tools.  It
+discusses the more advanced regular expression operators and
+introduces the latest cutting-edge innovations.
+.PP
+A note: to save time, "regular expression" is often abbreviated as
+regexp or regex.  Regexp is a more natural abbreviation than regex, but
+is harder to pronounce.  The Perl pod documentation is evenly split on
+regexp vs regex; in Perl, there is more than one way to abbreviate it.
+We'll use regexp in this tutorial.
+.PP
+New in v5.22, \f(CW\*(C`use re \*(Aqstrict\*(Aq\*(C'\fR applies stricter
+rules than otherwise when compiling regular expression patterns.  It can
+find things that, while legal, may not be what you intended.
+.SH "Part 1: The basics"
+.IX Header "Part 1: The basics"
+.SS "Simple word matching"
+.IX Subsection "Simple word matching"
+The simplest regexp is simply a word, or more generally, a string of
+characters.  A regexp consisting of just a word matches any string that
+contains that word:
+.PP
+.Vb 1
+\&    "Hello World" =~ /World/;  # matches
+.Ve
+.PP
+What is this Perl statement all about? \f(CW"Hello World"\fR is a simple
+double-quoted string.  \f(CW\*(C`World\*(C'\fR is the regular expression and the
+\&\f(CW\*(C`//\*(C'\fR enclosing \f(CW\*(C`/World/\*(C'\fR tells Perl to search a string for a match.
+The operator \f(CW\*(C`=~\*(C'\fR associates the string with the regexp match and
+produces a true value if the regexp matched, or false if the regexp
+did not match.  In our case, \f(CW\*(C`World\*(C'\fR matches the second word in
+\&\f(CW"Hello World"\fR, so the expression is true.  Expressions like this
+are useful in conditionals:
+.PP
+.Vb 6
+\&    if ("Hello World" =~ /World/) {
+\&        print "It matches\en";
+\&    }
+\&    else {
+\&        print "It doesn\*(Aqt match\en";
+\&    }
+.Ve
+.PP
+There are useful variations on this theme.  The sense of the match can
+be reversed by using the \f(CW\*(C`!~\*(C'\fR operator:
+.PP
+.Vb 6
+\&    if ("Hello World" !~ /World/) {
+\&        print "It doesn\*(Aqt match\en";
+\&    }
+\&    else {
+\&        print "It matches\en";
+\&    }
+.Ve
+.PP
+The literal string in the regexp can be replaced by a variable:
+.PP
+.Vb 7
+\&    my $greeting = "World";
+\&    if ("Hello World" =~ /$greeting/) {
+\&        print "It matches\en";
+\&    }
+\&    else {
+\&        print "It doesn\*(Aqt match\en";
+\&    }
+.Ve
+.PP
+If you're matching against the special default variable \f(CW$_\fR, the
+\&\f(CW\*(C`$_ =~\*(C'\fR part can be omitted:
+.PP
+.Vb 7
+\&    $_ = "Hello World";
+\&    if (/World/) {
+\&        print "It matches\en";
+\&    }
+\&    else {
+\&        print "It doesn\*(Aqt match\en";
+\&    }
+.Ve
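+.PP
+The implicit \f(CW$_\fR is most often seen in a \f(CW\*(C`foreach\*(C'\fR loop that
+has no explicit loop variable, because such a loop aliases \f(CW$_\fR to each
+element in turn.  A minimal sketch (the word list is only an illustration):
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    for (qw(Hello World Worldwide Word)) {
+\&        print "$_ matches\en" if /World/;   # same as $_ =~ /World/
+\&    }
+.Ve
+.PP
+This prints "World matches" and "Worldwide matches"; "Hello" and "Word"
+do not contain the sequence \f(CW\*(C`World\*(C'\fR.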
+.PP
+And finally, the \f(CW\*(C`//\*(C'\fR default delimiters for a match can be changed
+to arbitrary delimiters by putting an \f(CW\*(Aqm\*(Aq\fR out front:
+.PP
+.Vb 4
+\&    "Hello World" =~ m!World!;   # matches, delimited by \*(Aq!\*(Aq
+\&    "Hello World" =~ m{World};   # matches, note the paired \*(Aq{}\*(Aq
+\&    "/usr/bin/perl" =~ m"/perl"; # matches after \*(Aq/usr/bin\*(Aq,
+\&                                 # \*(Aq/\*(Aq becomes an ordinary char
+.Ve
+.PP
+\&\f(CW\*(C`/World/\*(C'\fR, \f(CW\*(C`m!World!\*(C'\fR, and \f(CW\*(C`m{World}\*(C'\fR all represent the
+same thing.  When, \fIe.g.\fR, the quote (\f(CW\*(Aq"\*(Aq\fR) is used as a delimiter, the forward
+slash \f(CW\*(Aq/\*(Aq\fR becomes an ordinary character and can be used in this regexp
+without trouble.
+.PP
+Let's consider how different regexps would match \f(CW"Hello World"\fR:
+.PP
+.Vb 4
+\&    "Hello World" =~ /world/;  # doesn\*(Aqt match
+\&    "Hello World" =~ /o W/;    # matches
+\&    "Hello World" =~ /oW/;     # doesn\*(Aqt match
+\&    "Hello World" =~ /World /; # doesn\*(Aqt match
+.Ve
+.PP
+The first regexp \f(CW\*(C`world\*(C'\fR doesn't match because regexps are by default
+case-sensitive.  The second regexp matches because the substring
+\&\f(CW\*(Aqo\ W\*(Aq\fR occurs in the string \f(CW"Hello\ World"\fR.  The space
+character \f(CW\*(Aq \*(Aq\fR is treated like any other character in a regexp and is
+needed to match in this case.  The lack of a space character is the
+reason the third regexp \f(CW\*(AqoW\*(Aq\fR doesn't match.  The fourth regexp
+"\f(CW\*(C`World \*(C'\fR" doesn't match because there is a space at the end of the
+regexp, but not at the end of the string.  The lesson here is that
+regexps must match a part of the string \fIexactly\fR in order for the
+statement to be true.
+.PP
+If a regexp matches in more than one place in the string, Perl will
+always match at the earliest possible point in the string:
+.PP
+.Vb 2
+\&    "Hello World" =~ /o/;       # matches \*(Aqo\*(Aq in \*(AqHello\*(Aq
+\&    "That hat is red" =~ /hat/; # matches \*(Aqhat\*(Aq in \*(AqThat\*(Aq
+.Ve
+.PP
+With respect to character matching, there are a few more points you
+need to know about.  First of all, not all characters can be used
+"as-is" in a match.  Some characters, called \fImetacharacters\fR, are
+generally reserved for use in regexp notation.  The metacharacters are
+.PP
+.Vb 1
+\&    {}[]()^$.|*+?\-#\e
+.Ve
+.PP
+This list is not as definitive as it may appear (or be claimed to be in
+other documentation).  For example, \f(CW"#"\fR is a metacharacter only when
+the \f(CW\*(C`/x\*(C'\fR pattern modifier (described below) is used, and both \f(CW"}"\fR
+and \f(CW"]"\fR are metacharacters only when paired with opening \f(CW"{"\fR or
+\&\f(CW"["\fR respectively; other gotchas apply.
+.PP
+The significance of each of these will be explained
+in the rest of the tutorial, but for now, it is important only to know
+that a metacharacter can be matched as-is by putting a backslash before
+it:
+.PP
+.Vb 5
+\&    "2+2=4" =~ /2+2/;    # doesn\*(Aqt match, + is a metacharacter
+\&    "2+2=4" =~ /2\e+2/;   # matches, \e+ is treated like an ordinary +
+\&    "The interval is [0,1)." =~ /[0,1)./;     # is a syntax error!
+\&    "The interval is [0,1)." =~ /\e[0,1\e)\e./;  # matches
+\&    "#!/usr/bin/perl" =~ /#!\e/usr\e/bin\e/perl/;  # matches
+.Ve
+.PP
+In the last regexp, the forward slash \f(CW\*(Aq/\*(Aq\fR is also backslashed,
+because it is used to delimit the regexp.  This can lead to LTS
+(leaning toothpick syndrome), however, and it is often more readable
+to change delimiters.
+.PP
+.Vb 1
+\&    "#!/usr/bin/perl" =~ m!#\e!/usr/bin/perl!;  # easier to read
+.Ve
+.PP
+The backslash character \f(CW\*(Aq\e\*(Aq\fR is a metacharacter itself and needs to
+be backslashed:
+.PP
+.Vb 1
+\&    \*(AqC:\eWIN32\*(Aq =~ /C:\e\eWIN/;   # matches
+.Ve
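+.PP
+A short script can confirm the behavior of the escaped and unescaped
+forms above.  This is only a sketch; the strings are taken from the
+preceding examples:
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    # Unescaped, + is a quantifier ("one or more 2s"), so the pattern
+\&    # needs at least two 2s in a row, which "2+2=4" does not contain
+\&    print "2+2=4" =~ /2+2/  ? "match\en" : "no match\en";   # no match
+\&    # Escaped, \e+ is an ordinary plus sign
+\&    print "2+2=4" =~ /2\e+2/ ? "match\en" : "no match\en";   # match
+\&    # \e\e in the pattern matches a single literal backslash
+\&    print \*(AqC:\eWIN32\*(Aq =~ /C:\e\eWIN/ ? "match\en" : "no match\en";  # match
+.Ve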
+.PP
+In situations where it doesn't make sense for a particular metacharacter
+to mean what it normally does, it automatically loses its
+metacharacter-ness and becomes an ordinary character that is to be
+matched literally.  For example, the \f(CW\*(Aq}\*(Aq\fR is a metacharacter only when
+it is the mate of a \f(CW\*(Aq{\*(Aq\fR metacharacter.  Otherwise it is treated as a
+literal RIGHT CURLY BRACKET.  This may lead to unexpected results.
+\&\f(CW\*(C`use re \*(Aqstrict\*(Aq\*(C'\fR can catch some of these.
+.PP
+In addition to the metacharacters, there are some ASCII characters
+which don't have printable character equivalents and are instead
+represented by \fIescape sequences\fR.  Common examples are \f(CW\*(C`\et\*(C'\fR for a
+tab, \f(CW\*(C`\en\*(C'\fR for a newline, \f(CW\*(C`\er\*(C'\fR for a carriage return and \f(CW\*(C`\ea\*(C'\fR for a
+bell (or alert).  If your string is better thought of as a sequence of arbitrary
+bytes, the octal escape sequence, \fIe.g.\fR, \f(CW\*(C`\e033\*(C'\fR, or hexadecimal escape
+sequence, \fIe.g.\fR, \f(CW\*(C`\ex1B\*(C'\fR may be a more natural representation for your
+bytes.  Here are some examples of escapes:
+.PP
+.Vb 5
+\&    "1000\et2000" =~ m(0\et2);   # matches
+\&    "1000\en2000" =~ /0\en20/;   # matches
+\&    "1000\et2000" =~ /\e000\et2/; # doesn\*(Aqt match, "0" ne "\e000"
+\&    "cat"   =~ /\eo{143}\ex61\ex74/; # matches in ASCII, but a weird way
+\&                                 # to spell cat
+.Ve
+.PP
+If you've been around Perl a while, all this talk of escape sequences
+may seem familiar.  Similar escape sequences are used in double-quoted
+strings and in fact the regexps in Perl are mostly treated as
+double-quoted strings.  This means that variables can be used in
+regexps as well.  Just like double-quoted strings, the values of the
+variables in the regexp will be substituted in before the regexp is
+evaluated for matching purposes.  So we have:
+.PP
+.Vb 4
+\&    $foo = \*(Aqhouse\*(Aq;
+\&    \*(Aqhousecat\*(Aq =~ /$foo/;      # matches
+\&    \*(Aqcathouse\*(Aq =~ /cat$foo/;   # matches
+\&    \*(Aqhousecat\*(Aq =~ /${foo}cat/; # matches
+.Ve
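+.PP
+Because the value is substituted before the regexp is compiled, any
+metacharacters in the variable keep their special meaning.  In this small
+sketch, \f(CW$pat\fR is a made-up name, and \f(CW\*(Aq.\*(Aq\fR is the metacharacter that
+matches almost any single character (explained later):
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $pat = \*(Aqa.c\*(Aq;   # the \*(Aq.\*(Aq is a metacharacter once interpolated
+\&    print "abc" =~ /$pat/ ? "match\en" : "no match\en";   # match: \*(Aq.\*(Aq matched \*(Aqb\*(Aq
+\&    print "a.c" =~ /$pat/ ? "match\en" : "no match\en";   # match: \*(Aq.\*(Aq matched \*(Aq.\*(Aq
+.Ve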
+.PP
+So far, so good.  With the knowledge above you can already perform
+searches with just about any literal string regexp you can dream up.
+Here is a \fIvery simple\fR emulation of the Unix grep program:
+.PP
+.Vb 7
+\&    % cat > simple_grep
+\&    #!/usr/bin/perl
+\&    $regexp = shift;
+\&    while (<>) {
+\&        print if /$regexp/;
+\&    }
+\&    ^D
+\&
+\&    % chmod +x simple_grep
+\&
+\&    % simple_grep abba /usr/dict/words
+\&    Babbage
+\&    cabbage
+\&    cabbages
+\&    sabbath
+\&    Sabbathize
+\&    Sabbathizes
+\&    sabbatical
+\&    scabbard
+\&    scabbards
+.Ve
+.PP
+This program is easy to understand.  \f(CW\*(C`#!/usr/bin/perl\*(C'\fR is the standard
+way to invoke a perl program from the shell.
+\&\f(CW\*(C`$regexp\ =\ shift;\*(C'\fR saves the first command line argument as the
+regexp to be used, leaving the rest of the command line arguments to
+be treated as files.  \f(CW\*(C`while\ (<>)\*(C'\fR loops over all the lines in
+all the files.  For each line, \f(CW\*(C`print\ if\ /$regexp/;\*(C'\fR prints the
+line if the regexp matches the line.  In this line, both \f(CW\*(C`print\*(C'\fR and
+\&\f(CW\*(C`/$regexp/\*(C'\fR use the default variable \f(CW$_\fR implicitly.
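+.PP
+Written in the \f(CW\*(C`use strict\*(C'\fR style that most modern Perl code
+follows, the same program might look like the sketch below.  It names
+the line variable instead of relying on \f(CW$_\fR, and prints a usage
+message if no pattern is given:
+.PP
+.Vb 3
+\&    #!/usr/bin/perl
+\&    use strict;
+\&    use warnings;
+\&
+\&    # First argument is the pattern; any remaining arguments are the
+\&    # files that <> reads (standard input if there are none)
+\&    my $regexp = shift @ARGV;
+\&    die "usage: simple_grep PATTERN [FILE ...]\en" unless defined $regexp;
+\&
+\&    while (my $line = <>) {
+\&        print $line if $line =~ /$regexp/;
+\&    }
+.Ve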
+.PP
+With all of the regexps above, if the regexp matched anywhere in the
+string, it was considered a match.  Sometimes, however, we'd like to
+specify \fIwhere\fR in the string the regexp should try to match.  To do
+this, we would use the \fIanchor\fR metacharacters \f(CW\*(Aq^\*(Aq\fR and \f(CW\*(Aq$\*(Aq\fR.  The
+anchor \f(CW\*(Aq^\*(Aq\fR means match at the beginning of the string and the anchor
+\&\f(CW\*(Aq$\*(Aq\fR means match at the end of the string, or before a newline at the
+end of the string.  Here is how they are used:
+.PP
+.Vb 4
+\&    "housekeeper" =~ /keeper/;    # matches
+\&    "housekeeper" =~ /^keeper/;   # doesn\*(Aqt match
+\&    "housekeeper" =~ /keeper$/;   # matches
+\&    "housekeeper\en" =~ /keeper$/; # matches
+.Ve
+.PP
+The second regexp doesn't match because \f(CW\*(Aq^\*(Aq\fR constrains \f(CW\*(C`keeper\*(C'\fR to
+match only at the beginning of the string, but \f(CW"housekeeper"\fR has
+keeper starting in the middle.  The third regexp does match, since the
+\&\f(CW\*(Aq$\*(Aq\fR constrains \f(CW\*(C`keeper\*(C'\fR to match only at the end of the string.
+.PP
+When both \f(CW\*(Aq^\*(Aq\fR and \f(CW\*(Aq$\*(Aq\fR are used at the same time, the regexp has to
+match both the beginning and the end of the string, \fIi.e.\fR, the regexp
+matches the whole string.  Consider
+.PP
+.Vb 3
+\&    "keeper" =~ /^keep$/;      # doesn\*(Aqt match
+\&    "keeper" =~ /^keeper$/;    # matches
+\&    ""       =~ /^$/;          # ^$ matches an empty string
+.Ve
+.PP
+The first regexp doesn't match because the string has more to it than
+\&\f(CW\*(C`keep\*(C'\fR.  Since the second regexp is exactly the string, it
+matches.  Using both \f(CW\*(Aq^\*(Aq\fR and \f(CW\*(Aq$\*(Aq\fR in a regexp forces the complete
+string to match, so it gives you complete control over which strings
+match and which don't.  Suppose you are looking for a fellow named
+bert, off in a string by himself:
+.PP
+.Vb 1
+\&    "dogbert" =~ /bert/;   # matches, but not what you want
+\&
+\&    "dilbert" =~ /^bert/;  # doesn\*(Aqt match, but ..
+\&    "bertram" =~ /^bert/;  # matches, so still not good enough
+\&
+\&    "bertram" =~ /^bert$/; # doesn\*(Aqt match, good
+\&    "dilbert" =~ /^bert$/; # doesn\*(Aqt match, good
+\&    "bert"    =~ /^bert$/; # matches, perfect
+.Ve
+.PP
+Of course, in the case of a literal string, one could just as easily
+use the string comparison \f(CW\*(C`$string\ eq\ \*(Aqbert\*(Aq\*(C'\fR and it would be
+more efficient.  The \f(CW\*(C`^...$\*(C'\fR regexp really becomes useful when we
+add in the more powerful regexp tools below.
+.SS "Using character classes"
+.IX Subsection "Using character classes"
+Although one can already do quite a lot with the literal string
+regexps above, we've only scratched the surface of regular expression
+technology.  In this and subsequent sections we will introduce regexp
+concepts (and associated metacharacter notations) that will allow a
+regexp to represent not just a single character sequence, but a \fIwhole
+class\fR of them.
+.PP
+One such concept is that of a \fIcharacter class\fR.  A character class
+allows a set of possible characters, rather than just a single
+character, to match at a particular point in a regexp.  You can define
+your own custom character classes.  These
+are denoted by brackets \f(CW\*(C`[...]\*(C'\fR, with the set of characters
+to be possibly matched inside.  Here are some examples:
+.PP
+.Vb 4
+\&    /cat/;       # matches \*(Aqcat\*(Aq
+\&    /[bcr]at/;   # matches \*(Aqbat\*(Aq, \*(Aqcat\*(Aq, or \*(Aqrat\*(Aq
+\&    /item[0123456789]/;  # matches \*(Aqitem0\*(Aq or ... or \*(Aqitem9\*(Aq
+\&    "abc" =~ /[cab]/;    # matches \*(Aqa\*(Aq
+.Ve
+.PP
+In the last statement, even though \f(CW\*(Aqc\*(Aq\fR is the first character in
+the class, \f(CW\*(Aqa\*(Aq\fR matches because the first character position in the
+string is the earliest point at which the regexp can match.
+.PP
+.Vb 2
+\&    /[yY][eE][sS]/;      # match \*(Aqyes\*(Aq in a case\-insensitive way
+\&                         # \*(Aqyes\*(Aq, \*(AqYes\*(Aq, \*(AqYES\*(Aq, etc.
+.Ve
+.PP
+This regexp displays a common task: perform a case-insensitive
+match.  Perl provides a way of avoiding all those brackets by simply
+appending an \f(CW\*(Aqi\*(Aq\fR to the end of the match.  Then \f(CW\*(C`/[yY][eE][sS]/;\*(C'\fR
+can be rewritten as \f(CW\*(C`/yes/i;\*(C'\fR.  The \f(CW\*(Aqi\*(Aq\fR stands for
+case-insensitive and is an example of a \fImodifier\fR of the matching
+operation.  We will meet other modifiers later in the tutorial.
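+.PP
+The following sketch (the word lists are arbitrary) runs a few strings
+through a character class, then compares the bracketed case-insensitive
+pattern with its \f(CW\*(C`/i\*(C'\fR equivalent:
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    for my $word (qw(bat cat rat hat)) {
+\&        # [bcr] allows exactly one of b, c or r before "at"
+\&        print "$word: ", ($word =~ /[bcr]at/ ? "match" : "no match"), "\en";
+\&    }
+\&
+\&    for my $answer (qw(yes Yes YES yEs no)) {
+\&        my $by_class = $answer =~ /[yY][eE][sS]/ ? "match" : "no match";
+\&        my $by_flag  = $answer =~ /yes/i         ? "match" : "no match";
+\&        print "$answer: $by_class / $by_flag\en";
+\&    }
+.Ve
+.PP
+For these ASCII words both patterns agree; only "hat" and "no" fail.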
+.PP
+We saw in the section above that there were ordinary characters, which
+represented themselves, and special characters, which needed a
+backslash \f(CW\*(Aq\e\*(Aq\fR to represent themselves.  The same is true in a
+character class, but the sets of ordinary and special characters
+inside a character class are different from those outside a character
+class.  The special characters for a character class are \f(CW\*(C`\-]\e^$\*(C'\fR (and
+the pattern delimiter, whatever it is).
+\&\f(CW\*(Aq]\*(Aq\fR is special because it denotes the end of a character class.  \f(CW\*(Aq$\*(Aq\fR is
+special because it denotes a scalar variable.  \f(CW\*(Aq\e\*(Aq\fR is special because
+it is used in escape sequences, just like above.  Here is how the
+special characters \f(CW\*(C`]$\e\*(C'\fR are handled:
+.PP
+.Vb 5
+\&   /[\e]c]def/; # matches \*(Aq]def\*(Aq or \*(Aqcdef\*(Aq
+\&   $x = \*(Aqbcr\*(Aq;
+\&   /[$x]at/;   # matches \*(Aqbat\*(Aq, \*(Aqcat\*(Aq, or \*(Aqrat\*(Aq
+\&   /[\e$x]at/;  # matches \*(Aq$at\*(Aq or \*(Aqxat\*(Aq
+\&   /[\e\e$x]at/; # matches \*(Aq\eat\*(Aq, \*(Aqbat\*(Aq, \*(Aqcat\*(Aq, or \*(Aqrat\*(Aq
+.Ve
+.PP
+The last two are a little tricky.  In \f(CW\*(C`[\e$x]\*(C'\fR, the backslash protects
+the dollar sign, so the character class has two members \f(CW\*(Aq$\*(Aq\fR and \f(CW\*(Aqx\*(Aq\fR.
+In \f(CW\*(C`[\e\e$x]\*(C'\fR, the backslash is protected, so \f(CW$x\fR is treated as a
+variable and substituted in double quote fashion.
+.PP
+The special character \f(CW\*(Aq\-\*(Aq\fR acts as a range operator within character
+classes, so that a contiguous set of characters can be written as a
+range.  With ranges, the unwieldy \f(CW\*(C`[0123456789]\*(C'\fR and \f(CW\*(C`[abc...xyz]\*(C'\fR
+become the svelte \f(CW\*(C`[0\-9]\*(C'\fR and \f(CW\*(C`[a\-z]\*(C'\fR.  Some examples are
+.PP
+.Vb 6
+\&    /item[0\-9]/;  # matches \*(Aqitem0\*(Aq or ... or \*(Aqitem9\*(Aq
+\&    /[0\-9bx\-z]aa/;  # matches \*(Aq0aa\*(Aq, ..., \*(Aq9aa\*(Aq,
+\&                    # \*(Aqbaa\*(Aq, \*(Aqxaa\*(Aq, \*(Aqyaa\*(Aq, or \*(Aqzaa\*(Aq
+\&    /[0\-9a\-fA\-F]/;  # matches a hexadecimal digit
+\&    /[0\-9a\-zA\-Z_]/; # matches a "word" character,
+\&                    # like those in a Perl variable name
+.Ve
+.PP
+If \f(CW\*(Aq\-\*(Aq\fR is the first or last character in a character class, it is
+treated as an ordinary character; \f(CW\*(C`[\-ab]\*(C'\fR, \f(CW\*(C`[ab\-]\*(C'\fR and \f(CW\*(C`[a\e\-b]\*(C'\fR are
+all equivalent.
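+.PP
+To see ranges and a literal \f(CW\*(Aq\-\*(Aq\fR side by side, the sketch below
+classifies each character of an arbitrary sample string:
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    for my $c (split //, "9fG\-b") {
+\&        my @kinds;
+\&        push @kinds, "hex digit" if $c =~ /[0\-9a\-fA\-F]/;
+\&        push @kinds, "in [\-ab]"  if $c =~ /[\-ab]/;   # leading \*(Aq\-\*(Aq is literal
+\&        print "$c: ", (@kinds ? join(", ", @kinds) : "neither"), "\en";
+\&    }
+.Ve
+.PP
+Here \f(CW\*(C`9\*(C'\fR and \f(CW\*(C`f\*(C'\fR are hex digits, \f(CW\*(C`G\*(C'\fR is neither, \f(CW\*(C`\-\*(C'\fR
+is matched by \f(CW\*(C`[\-ab]\*(C'\fR, and \f(CW\*(C`b\*(C'\fR is both.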
+.PP
+The special character \f(CW\*(Aq^\*(Aq\fR in the first position of a character class
+denotes a \fInegated character class\fR, which matches any character but
+those in the brackets.  Both \f(CW\*(C`[...]\*(C'\fR and \f(CW\*(C`[^...]\*(C'\fR must match a
+character, or the match fails.  Then
+.PP
+.Vb 4
+\&    /[^a]at/;  # doesn\*(Aqt match \*(Aqaat\*(Aq or \*(Aqat\*(Aq, but matches
+\&               # all other \*(Aqbat\*(Aq, \*(Aqcat\*(Aq, \*(Aq0at\*(Aq, \*(Aq%at\*(Aq, etc.
+\&    /[^0\-9]/;  # matches a non\-numeric character
+\&    /[a^]at/;  # matches \*(Aqaat\*(Aq or \*(Aq^at\*(Aq; here \*(Aq^\*(Aq is ordinary
+.Ve
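+.PP
+Because a negated class must still consume one character, a string such
+as "at" fails even though the "at" part is present.  A quick sketch with
+arbitrary test strings:
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    for my $s (qw(aat bat at %at)) {
+\&        # [^a] needs one character that is not \*(Aqa\*(Aq, followed by "at"
+\&        print "$s: ", ($s =~ /[^a]at/ ? "match" : "no match"), "\en";
+\&    }
+.Ve
+.PP
+Only "bat" and "%at" match; "at" fails because there is no character
+left in front of "at" for \f(CW\*(C`[^a]\*(C'\fR to consume.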
+.PP
+Now, even \f(CW\*(C`[0\-9]\*(C'\fR can be a bother to write multiple times, so in the
+interest of saving keystrokes and making regexps more readable, Perl
+has several abbreviations for common character classes, as shown below.
+Since the introduction of Unicode, unless the \f(CW\*(C`/a\*(C'\fR modifier is in
+effect, these character classes match more than just a few characters in
+the ASCII range.
+.IP \(bu 4
+\&\f(CW\*(C`\ed\*(C'\fR matches a digit, not just \f(CW\*(C`[0\-9]\*(C'\fR but also digits from non-roman scripts
+.IP \(bu 4
+\&\f(CW\*(C`\es\*(C'\fR matches a whitespace character, the set \f(CW\*(C`[\e \et\er\en\ef]\*(C'\fR and others
+.IP \(bu 4
+\&\f(CW\*(C`\ew\*(C'\fR matches a word character (alphanumeric or \f(CW\*(Aq_\*(Aq\fR), not just \f(CW\*(C`[0\-9a\-zA\-Z_]\*(C'\fR
+but also digits and characters from non-roman scripts
+.IP \(bu 4
+\&\f(CW\*(C`\eD\*(C'\fR is a negated \f(CW\*(C`\ed\*(C'\fR; it represents any character other than a digit, or \f(CW\*(C`[^\ed]\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`\eS\*(C'\fR is a negated \f(CW\*(C`\es\*(C'\fR; it represents any non-whitespace character \f(CW\*(C`[^\es]\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`\eW\*(C'\fR is a negated \f(CW\*(C`\ew\*(C'\fR; it represents any non-word character \f(CW\*(C`[^\ew]\*(C'\fR
+.IP \(bu 4
+The period \f(CW\*(Aq.\*(Aq\fR matches any character but \f(CW"\en"\fR (unless the modifier \f(CW\*(C`/s\*(C'\fR is
+in effect, as explained below).
+.IP \(bu 4
+\&\f(CW\*(C`\eN\*(C'\fR, like the period, matches any character but \f(CW"\en"\fR, but it does so
+regardless of whether the modifier \f(CW\*(C`/s\*(C'\fR is in effect.
+.PP
+The \f(CW\*(C`/a\*(C'\fR modifier, available starting in Perl 5.14, is used to
+restrict the matches of \f(CW\*(C`\ed\*(C'\fR, \f(CW\*(C`\es\*(C'\fR, and \f(CW\*(C`\ew\*(C'\fR to just those in the ASCII range.
+It is useful to keep your program from being needlessly exposed to full
+Unicode (and its accompanying security considerations) when all you want
+is to process English-like text.  (The "a" may be doubled, \f(CW\*(C`/aa\*(C'\fR, to
+provide even more restrictions, preventing case-insensitive matching of
+ASCII with non-ASCII characters; otherwise a Unicode "Kelvin Sign"
+would caselessly match a "k" or "K".)
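+.PP
+A minimal sketch of the difference, assuming Perl 5.14 or later.  The
+two code points used are ARABIC-INDIC DIGIT THREE (U+0663) and KELVIN SIGN
+(U+212A):
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $three  = "\ex{0663}";   # ARABIC\-INDIC DIGIT THREE
+\&    my $kelvin = "\ex{212A}";   # KELVIN SIGN
+\&
+\&    print "digit:    ", ($three  =~ /\ed/   ? "match" : "no match"), "\en"; # match
+\&    print "digit /a: ", ($three  =~ /\ed/a  ? "match" : "no match"), "\en"; # no match
+\&    print "k /i:     ", ($kelvin =~ /k/i   ? "match" : "no match"), "\en"; # match
+\&    print "k /iaa:   ", ($kelvin =~ /k/iaa ? "match" : "no match"), "\en"; # no match
+.Ve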
+.PP
+The \f(CW\*(C`\ed\es\ew\eD\eS\eW\*(C'\fR abbreviations can be used both inside and outside
+of bracketed character classes.  Here are some in use:
+.PP
+.Vb 7
+\&    /\ed\ed:\ed\ed:\ed\ed/; # matches a hh:mm:ss time format
+\&    /[\ed\es]/;         # matches any digit or whitespace character
+\&    /\ew\eW\ew/;         # matches a word char, followed by a
+\&                      # non\-word char, followed by a word char
+\&    /..rt/;           # matches any two chars, followed by \*(Aqrt\*(Aq
+\&    /end\e./;          # matches \*(Aqend.\*(Aq
+\&    /end[.]/;         # same thing, matches \*(Aqend.\*(Aq
+.Ve
+.PP
+Because a period is a metacharacter, it needs to be escaped to match
+as an ordinary period. Because, for example, \f(CW\*(C`\ed\*(C'\fR and \f(CW\*(C`\ew\*(C'\fR are sets
+of characters, it is incorrect to think of \f(CW\*(C`[^\ed\ew]\*(C'\fR as \f(CW\*(C`[\eD\eW]\*(C'\fR; in
+fact \f(CW\*(C`[^\ed\ew]\*(C'\fR is the same as \f(CW\*(C`[^\ew]\*(C'\fR, which is the same as
+\&\f(CW\*(C`[\eW]\*(C'\fR. Think De Morgan's laws.
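+.PP
+A quick check with three sample characters makes the difference
+concrete:
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    for my $s ("a", "7", "%") {
+\&        my $neg   = $s =~ /[^\ed\ew]/ ? "match" : "no match";  # neither digit nor word char
+\&        my $union = $s =~ /[\eD\eW]/  ? "match" : "no match";  # non\-digit or non\-word char
+\&        print "$s: negated class $neg, union class $union\en";
+\&    }
+.Ve
+.PP
+The letter "a" is rejected by \f(CW\*(C`[^\ed\ew]\*(C'\fR but accepted by
+\&\f(CW\*(C`[\eD\eW]\*(C'\fR, because it is not a digit; "7" matches neither, and "%"
+matches both.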
+.PP
+In actuality, the period and \f(CW\*(C`\ed\es\ew\eD\eS\eW\*(C'\fR abbreviations are
+themselves types of character classes, so the ones surrounded by
+brackets are just one type of character class.  When we need to make a
+distinction, we refer to them as "bracketed character classes."
+.PP
+An anchor useful in basic regexps is the \fIword anchor\fR
+\&\f(CW\*(C`\eb\*(C'\fR.  This matches a boundary between a word character and a non-word
+character \f(CW\*(C`\ew\eW\*(C'\fR or \f(CW\*(C`\eW\ew\*(C'\fR:
+.PP
+.Vb 5
+\&    $x = "Housecat catenates house and cat";
+\&    $x =~ /cat/;    # matches cat in \*(AqHousecat\*(Aq
+\&    $x =~ /\ebcat/;  # matches cat in \*(Aqcatenates\*(Aq
+\&    $x =~ /cat\eb/;  # matches cat in \*(AqHousecat\*(Aq
+\&    $x =~ /\ebcat\eb/;  # matches \*(Aqcat\*(Aq at end of string
+.Ve
+.PP
+Note in the last example, the end of the string is considered a word
+boundary.
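+.PP
+Putting \f(CW\*(C`\eb\*(C'\fR on both sides of a word is a common way to find it
+only as a standalone word.  A sketch with made-up strings:
+.PP
+.Vb 2
+\&    use strict;
+\&    use warnings;
+\&
+\&    for my $s ("cat", "bobcat", "concatenate", "cat food", "the cat.") {
+\&        print "$s: ", ($s =~ /\ebcat\eb/ ? "match" : "no match"), "\en";
+\&    }
+.Ve
+.PP
+"cat", "cat food" and "the cat." match; in "bobcat" and "concatenate"
+the \f(CW\*(C`cat\*(C'\fR is adjacent to a word character on at least one side.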
+.PP
+For natural language processing (so that, for example, apostrophes are
+included in words), use \f(CW\*(C`\eb{wb}\*(C'\fR instead:
+.PP
+.Vb 1
+\&    "don\*(Aqt" =~ / .+? \eb{wb} /x;  # matches the whole string
+.Ve
+.PP
+You might wonder why \f(CW\*(Aq.\*(Aq\fR matches everything but \f(CW"\en"\fR \- why not
+every character? The reason is that often one is matching against
+lines and would like to ignore the newline characters.  For instance,
+while the string \f(CW"\en"\fR represents one line, we would like to think
+of it as empty.  Then
+.PP
+.Vb 2
+\&    ""   =~ /^$/;    # matches
+\&    "\en" =~ /^$/;    # matches, $ anchors before "\en"
+\&
+\&    ""   =~ /./;      # doesn\*(Aqt match; it needs a char
+\&    ""   =~ /^.$/;    # doesn\*(Aqt match; it needs a char
+\&    "\en" =~ /^.$/;    # doesn\*(Aqt match; it needs a char other than "\en"
+\&    "a"  =~ /^.$/;    # matches
+\&    "a\en"  =~ /^.$/;  # matches, $ anchors before "\en"
+.Ve
+.PP
+This behavior is convenient, because we usually want to ignore
+newlines when we count and match characters in a line.  Sometimes,
+however, we want to keep track of newlines.  We might even want \f(CW\*(Aq^\*(Aq\fR
+and \f(CW\*(Aq$\*(Aq\fR to anchor at the beginning and end of lines within the
+string, rather than just the beginning and end of the string.  Perl
+allows us to choose between ignoring and paying attention to newlines
+by using the \f(CW\*(C`/s\*(C'\fR and \f(CW\*(C`/m\*(C'\fR modifiers.  \f(CW\*(C`/s\*(C'\fR and \f(CW\*(C`/m\*(C'\fR stand for
+single line and multi-line and they determine whether a string is to
+be treated as one continuous string, or as a set of lines.  The two
+modifiers affect two aspects of how the regexp is interpreted: 1) how
+the \f(CW\*(Aq.\*(Aq\fR character class is defined, and 2) where the anchors \f(CW\*(Aq^\*(Aq\fR
+and \f(CW\*(Aq$\*(Aq\fR are able to match.  Here are the four possible combinations:
+.IP \(bu 4
+no modifiers: Default behavior.  \f(CW\*(Aq.\*(Aq\fR matches any character
+except \f(CW"\en"\fR.  \f(CW\*(Aq^\*(Aq\fR matches only at the beginning of the string and
+\&\f(CW\*(Aq$\*(Aq\fR matches only at the end or before a newline at the end.
+.IP \(bu 4
+s modifier (\f(CW\*(C`/s\*(C'\fR): Treat string as a single long line.  \f(CW\*(Aq.\*(Aq\fR matches
+any character, even \f(CW"\en"\fR.  \f(CW\*(Aq^\*(Aq\fR matches only at the beginning of
+the string and \f(CW\*(Aq$\*(Aq\fR matches only at the end or before a newline at the
+end.
+.IP \(bu 4
+m modifier (\f(CW\*(C`/m\*(C'\fR): Treat string as a set of multiple lines.  \f(CW\*(Aq.\*(Aq\fR
+matches any character except \f(CW"\en"\fR.  \f(CW\*(Aq^\*(Aq\fR and \f(CW\*(Aq$\*(Aq\fR are able to match
+at the start or end of \fIany\fR line within the string.
+.IP \(bu 4
+both s and m modifiers (\f(CW\*(C`/sm\*(C'\fR): Treat string as a single long line, but
+detect multiple lines.  \f(CW\*(Aq.\*(Aq\fR matches any character, even
+\&\f(CW"\en"\fR.  \f(CW\*(Aq^\*(Aq\fR and \f(CW\*(Aq$\*(Aq\fR, however, are able to match at the start or end
+of \fIany\fR line within the string.
+.PP
+Here are examples of \f(CW\*(C`/s\*(C'\fR and \f(CW\*(C`/m\*(C'\fR in action:
+.PP
+.Vb 1
+\&    $x = "There once was a girl\enWho programmed in Perl\en";
+\&
+\&    $x =~ /^Who/;   # doesn\*(Aqt match, "Who" not at start of string
+\&    $x =~ /^Who/s;  # doesn\*(Aqt match, "Who" not at start of string
+\&    $x =~ /^Who/m;  # matches, "Who" at start of second line
+\&    $x =~ /^Who/sm; # matches, "Who" at start of second line
+\&
+\&    $x =~ /girl.Who/;   # doesn\*(Aqt match, "." doesn\*(Aqt match "\en"
+\&    $x =~ /girl.Who/s;  # matches, "." matches "\en"
+\&    $x =~ /girl.Who/m;  # doesn\*(Aqt match, "." doesn\*(Aqt match "\en"
+\&    $x =~ /girl.Who/sm; # matches, "." matches "\en"
+.Ve
+.PP
+Most of the time, the default behavior is what is wanted, but \f(CW\*(C`/s\*(C'\fR and
+\&\f(CW\*(C`/m\*(C'\fR are occasionally very useful.  If \f(CW\*(C`/m\*(C'\fR is being used, the start
+of the string can still be matched with \f(CW\*(C`\eA\*(C'\fR and the end of the string
+can still be matched with the anchors \f(CW\*(C`\eZ\*(C'\fR (matches both the end and
+the newline before, like \f(CW\*(Aq$\*(Aq\fR), and \f(CW\*(C`\ez\*(C'\fR (matches only the end):
+.PP
+.Vb 2
+\&    $x =~ /^Who/m;   # matches, "Who" at start of second line
+\&    $x =~ /\eAWho/m;  # doesn\*(Aqt match, "Who" is not at start of string
+\&
+\&    $x =~ /girl$/m;  # matches, "girl" at end of first line
+\&    $x =~ /girl\eZ/m; # doesn\*(Aqt match, "girl" is not at end of string
+\&
+\&    $x =~ /Perl\eZ/m; # matches, "Perl" is at newline before end
+\&    $x =~ /Perl\ez/m; # doesn\*(Aqt match, "Perl" is not at end of string
+.Ve
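+.PP
+The following sketch checks these claims by running each pattern against
+the same two\-line string.  The \f(CW@tests\fR table and its labels are
+just for illustration:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $x = "There once was a girl\enWho programmed in Perl\en";
+\&
+\&    # Each entry pairs a label with a compiled pattern (qr//).
+\&    my @tests = (
+\&        [ \*(Aq/^Who/\*(Aq    => qr/^Who/    ],
+\&        [ \*(Aq/^Who/m\*(Aq   => qr/^Who/m   ],
+\&        [ \*(Aq/\eAWho/m\*(Aq  => qr/\eAWho/m  ],
+\&        [ \*(Aq/Perl\eZ/m\*(Aq => qr/Perl\eZ/m ],
+\&        [ \*(Aq/Perl\ez/m\*(Aq => qr/Perl\ez/m ],
+\&    );
+\&    for my $t (@tests) {
+\&        my ($label, $re) = @$t;
+\&        printf "%\-10s %s\en", $label, ($x =~ $re ? "matches" : "no match");
+\&    }
+.Ve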
+.PP
+We now know how to create choices among classes of characters in a
+regexp.  What about choices among words or character strings? Such
+choices are described in the next section.
+.SS "Matching this or that"
+.IX Subsection "Matching this or that"
+Sometimes we would like our regexp to be able to match different
+possible words or character strings.  This is accomplished by using
+the \fIalternation\fR metacharacter \f(CW\*(Aq|\*(Aq\fR.  To match \f(CW\*(C`dog\*(C'\fR or \f(CW\*(C`cat\*(C'\fR, we
+form the regexp \f(CW\*(C`dog|cat\*(C'\fR.  As before, Perl will try to match the
+regexp at the earliest possible point in the string.  At each
+character position, Perl will first try to match the first
+alternative, \f(CW\*(C`dog\*(C'\fR.  If \f(CW\*(C`dog\*(C'\fR doesn't match, Perl will then try the
+next alternative, \f(CW\*(C`cat\*(C'\fR.  If \f(CW\*(C`cat\*(C'\fR doesn't match either, then the
+match fails and Perl moves to the next position in the string.  Some
+examples:
+.PP
+.Vb 2
+\&    "cats and dogs" =~ /cat|dog|bird/;  # matches "cat"
+\&    "cats and dogs" =~ /dog|cat|bird/;  # matches "cat"
+.Ve
+.PP
+Even though \f(CW\*(C`dog\*(C'\fR is the first alternative in the second regexp,
+\&\f(CW\*(C`cat\*(C'\fR is able to match earlier in the string.
+.PP
+.Vb 2
+\&    "cats"          =~ /c|ca|cat|cats/; # matches "c"
+\&    "cats"          =~ /cats|cat|ca|c/; # matches "cats"
+.Ve
+.PP
+Here, all the alternatives match at the first string position, so the
+first alternative is the one that matches.  If some of the
+alternatives are truncations of the others, put the longest ones first
+to give them a chance to match.
+.PP
+.Vb 2
+\&    "cab" =~ /a|b|c/; # matches "c"
+\&                      # /a|b|c/ == /[abc]/
+.Ve
+.PP
+The last example points out that character classes are like
+alternations of characters.  At a given character position, the first
+alternative that allows the regexp match to succeed will be the one
+that matches.
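+.PP
+A list\-context match with a capturing group returns the text that
+matched, which makes it easy to see which alternative won.  In this
+sketch the parentheses are added only to return that text; capturing is
+explained later in this tutorial:
+.PP
+.Vb 7
+\&    use strict;
+\&    use warnings;
+\&
+\&    my ($short) = "cats" =~ /(c|ca|cat|cats)/;  # \*(Aqc\*(Aq
+\&    my ($long)  = "cats" =~ /(cats|cat|ca|c)/;  # \*(Aqcats\*(Aq
+\&    my ($class) = "cab"  =~ /([abc])/;          # \*(Aqc\*(Aq, like /(a|b|c)/
+\&    print "$short $long $class\en";             # prints "c cats c"
+.Ve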
+.SS "Grouping things and hierarchical matching"
+.IX Subsection "Grouping things and hierarchical matching"
+Alternation allows a regexp to choose among alternatives, but by
+itself it is unsatisfying.  The reason is that each alternative is a whole
+regexp, but sometimes we want alternatives for just part of a
+regexp.  For instance, suppose we want to search for housecats or
+housekeepers.  The regexp \f(CW\*(C`housecat|housekeeper\*(C'\fR fits the bill, but is
+repetitive because we had to type \f(CW\*(C`house\*(C'\fR twice.  It would be nice to
+have parts of the regexp be constant, like \f(CW\*(C`house\*(C'\fR, and some
+parts have alternatives, like \f(CW\*(C`cat|keeper\*(C'\fR.
+.PP
+The \fIgrouping\fR metacharacters \f(CW\*(C`()\*(C'\fR solve this problem.  Grouping
+allows parts of a regexp to be treated as a single unit.  Parts of a
+regexp are grouped by enclosing them in parentheses.  Thus we could solve
+the \f(CW\*(C`housecat|housekeeper\*(C'\fR problem by forming the regexp as
+\&\f(CWhouse(cat|keeper)\fR.  The regexp \f(CWhouse(cat|keeper)\fR means match
+\&\f(CW\*(C`house\*(C'\fR followed by either \f(CW\*(C`cat\*(C'\fR or \f(CW\*(C`keeper\*(C'\fR.  Some more examples
+follow:
+.PP
+.Vb 4
+\&    /(a|b)b/;    # matches \*(Aqab\*(Aq or \*(Aqbb\*(Aq
+\&    /(ac|b)b/;   # matches \*(Aqacb\*(Aq or \*(Aqbb\*(Aq
+\&    /(^a|b)c/;   # matches \*(Aqac\*(Aq at start of string or \*(Aqbc\*(Aq anywhere
+\&    /(a|[bc])d/; # matches \*(Aqad\*(Aq, \*(Aqbd\*(Aq, or \*(Aqcd\*(Aq
+\&
+\&    /house(cat|)/;  # matches either \*(Aqhousecat\*(Aq or \*(Aqhouse\*(Aq
+\&    /house(cat(s|)|)/;  # matches either \*(Aqhousecats\*(Aq or \*(Aqhousecat\*(Aq or
+\&                        # \*(Aqhouse\*(Aq.  Note groups can be nested.
+\&
+\&    /(19|20|)\ed\ed/;  # match years 19xx, 20xx, or the Y2K problem, xx
+\&    "20" =~ /(19|20|)\ed\ed/;  # matches the null alternative \*(Aq()\ed\ed\*(Aq,
+\&                             # because \*(Aq20\ed\ed\*(Aq can\*(Aqt match
+.Ve
+.PP
+Alternations behave the same way in groups as out of them: at a given
+string position, the leftmost alternative that allows the regexp to
+match is taken.  So in the last example at the first string position,
+\&\f(CW"20"\fR matches the second alternative, but there is nothing left over
+to match the next two digits \f(CW\*(C`\ed\ed\*(C'\fR.  So Perl moves on to the next
+alternative, which is the null alternative and that works, since
+\&\f(CW"20"\fR is two digits.
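+.PP
+To see the null alternative at work, the sketch below adds \f(CW\*(Aq^\*(Aq\fR and
+\&\f(CW\*(Aq$\*(Aq\fR anchors (not present in the examples above) so that the whole
+string must be a year, and records what the group captured for each
+sample input:
+.PP
+.Vb 9
+\&    use strict;
+\&    use warnings;
+\&
+\&    my %century;
+\&    for my $s ("1999", "2024", "20") {
+\&        $century{$s} = $1 if $s =~ /^(19|20|)\ed\ed$/;
+\&    }
+\&    # %century is ( 1999 => \*(Aq19\*(Aq, 2024 => \*(Aq20\*(Aq, 20 => \*(Aq\*(Aq )
+\&    print "$_: \*(Aq$century{$_}\*(Aq\en" for sort keys %century;
+.Ve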
+.PP
+The process of trying one alternative, seeing if it matches, and, if it
+doesn't, going back in the string to where that alternative was tried
+and moving on to the next alternative, is called
+\&\fIbacktracking\fR.  The term "backtracking" comes from the idea that
+matching a regexp is like a walk in the woods.  Successfully matching
+a regexp is like arriving at a destination.  There are many possible
+trailheads, one for each string position, and each one is tried in
+order, left to right.  From each trailhead there may be many paths,
+some of which get you there, and some of which are dead ends.  When you
+walk along a trail and hit a dead end, you have to backtrack along the
+trail to an earlier point to try another trail.  If you hit your
+destination, you stop immediately and forget about trying all the
+other trails.  You are persistent, and only if you have tried all the
+trails from all the trailheads and not arrived at your destination, do
+you declare failure.  To be concrete, here is a step-by-step analysis
+of what Perl does when it tries to match the regexp
+.PP
+.Vb 1
+\&    "abcde" =~ /(abd|abc)(df|d|de)/;
+.Ve
+.IP 1. 4
+Start with the first letter in the string \f(CW\*(Aqa\*(Aq\fR.
+.IP 2. 4
+Try the first alternative in the first group \f(CW\*(Aqabd\*(Aq\fR.
+.IP 3. 4
+Match \f(CW\*(Aqa\*(Aq\fR followed by \f(CW\*(Aqb\*(Aq\fR. So far so good.
+.IP 4. 4
+\&\f(CW\*(Aqd\*(Aq\fR in the regexp doesn't match \f(CW\*(Aqc\*(Aq\fR in the string \- a
+dead end.  So backtrack two characters and pick the second alternative
+in the first group \f(CW\*(Aqabc\*(Aq\fR.
+.IP 5. 4
+Match \f(CW\*(Aqa\*(Aq\fR followed by \f(CW\*(Aqb\*(Aq\fR followed by \f(CW\*(Aqc\*(Aq\fR.  We are on a roll
+and have satisfied the first group. Set \f(CW$1\fR to \f(CW\*(Aqabc\*(Aq\fR.
+.IP 6. 4
+Move on to the second group and pick the first alternative \f(CW\*(Aqdf\*(Aq\fR.
+.IP 7. 4
+Match the \f(CW\*(Aqd\*(Aq\fR.
+.IP 8. 4
+\&\f(CW\*(Aqf\*(Aq\fR in the regexp doesn't match \f(CW\*(Aqe\*(Aq\fR in the string, so a dead
+end.  Backtrack one character and pick the second alternative in the
+second group \f(CW\*(Aqd\*(Aq\fR.
+.IP 9. 4
+\&\f(CW\*(Aqd\*(Aq\fR matches. The second grouping is satisfied, so set
+\&\f(CW$2\fR to \f(CW\*(Aqd\*(Aq\fR.
+.IP 10. 4
+We are at the end of the regexp, so we are done! We have
+matched \f(CW\*(Aqabcd\*(Aq\fR out of the string \f(CW"abcde"\fR.
+.PP
+There are a couple of things to note about this analysis.  First, the
+third alternative in the second group \f(CW\*(Aqde\*(Aq\fR also allows a match, but we
+stopped before we got to it \- at a given character position, leftmost
+wins.  Second, we were able to get a match at the first character
+position of the string \f(CW\*(Aqa\*(Aq\fR.  If there were no matches at the first
+position, Perl would move to the second character position \f(CW\*(Aqb\*(Aq\fR and
+attempt the match all over again.  Only when all possible paths at all
+possible character positions have been exhausted does Perl give
+up and declare \f(CW\*(C`$string\ =~\ /(abd|abc)(df|d|de)/;\*(C'\fR to be false.
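+.PP
+The outcome of the walk\-through can be confirmed with a list\-context
+match, which returns the values of \f(CW$1\fR and \f(CW$2\fR:
+.PP
+.Vb 5
+\&    use strict;
+\&    use warnings;
+\&
+\&    my ($g1, $g2) = "abcde" =~ /(abd|abc)(df|d|de)/;
+\&    print "\e$1 = \*(Aq$g1\*(Aq, \e$2 = \*(Aq$g2\*(Aq\en";  # prints "$1 = \*(Aqabc\*(Aq, $2 = \*(Aqd\*(Aq"
+.Ve
+.PP
+To watch the backtracking itself, the standard \f(CW\*(C`re\*(C'\fR pragma can trace the
+engine: adding \f(CW\*(C`use\ re\ \*(Aqdebug\*(Aq;\*(C'\fR before the match prints the compiled
+program and each matching step to standard error.  The exact output
+format varies between Perl versions.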
+.PP
+Even with all this work, regexp matching happens remarkably fast.  To
+speed things up, Perl compiles the regexp into a compact sequence of
+opcodes that can often fit inside a processor cache.  When the code is
+executed, these opcodes can then run at full throttle and search very
+quickly.
+.SS "Extracting matches"
+.IX Subsection "Extracting matches"
+The grouping metacharacters \f(CW\*(C`()\*(C'\fR also serve another completely
+different function: they allow the extraction of the parts of a string
+that matched.  This is very useful to find out what matched and for
+text processing in general.  For each grouping, the part that matched
+inside goes into the special variables \f(CW$1\fR, \f(CW$2\fR, \fIetc\fR.  They can be
+used just like ordinary variables:
+.PP
+.Vb 6
+\&    # extract hours, minutes, seconds
+\&    if ($time =~ /(\ed\ed):(\ed\ed):(\ed\ed)/) {    # match hh:mm:ss format
+\&        $hours = $1;
+\&        $minutes = $2;
+\&        $seconds = $3;
+\&    }
+.Ve
+.PP
+Now, we know that in scalar context,
+\&\f(CW\*(C`$time\ =~\ /(\ed\ed):(\ed\ed):(\ed\ed)/\*(C'\fR returns a true or false
+value.  In list context, however, it returns the list of matched values
+\&\f(CW\*(C`($1,$2,$3)\*(C'\fR.  So we could write the code more compactly as
+.PP
+.Vb 2
+\&    # extract hours, minutes, seconds
+\&    ($hours, $minutes, $seconds) = ($time =~ /(\ed\ed):(\ed\ed):(\ed\ed)/);
+.Ve
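+.PP
+A list assignment in scalar context returns the number of elements on
+its right\-hand side, which is 0 when the match fails, so the assignment
+can double as the test.  A sketch with a made\-up input string:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $time = "Alarm set for 07:45:30";   # sample input
+\&    my ($hours, $minutes, $seconds);
+\&    if (($hours, $minutes, $seconds) = $time =~ /(\ed\ed):(\ed\ed):(\ed\ed)/) {
+\&        print "h=$hours m=$minutes s=$seconds\en";   # prints "h=07 m=45 s=30"
+\&    }
+\&    else {
+\&        print "no time found\en";
+\&    }
+.Ve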
+.PP
+If the groupings in a regexp are nested, \f(CW$1\fR gets the group with the
+leftmost opening parenthesis, \f(CW$2\fR the next opening parenthesis,
+\&\fIetc\fR.  Here is a regexp with nested groups:
+.PP
+.Vb 2
+\&    /(ab(cd|ef)((gi)|j))/;
+\&     1  2      34
+.Ve
+.PP
+If this regexp matches, \f(CW$1\fR contains a string starting with
+\&\f(CW\*(Aqab\*(Aq\fR, \f(CW$2\fR is either set to \f(CW\*(Aqcd\*(Aq\fR or \f(CW\*(Aqef\*(Aq\fR, \f(CW$3\fR equals either
+\&\f(CW\*(Aqgi\*(Aq\fR or \f(CW\*(Aqj\*(Aq\fR, and \f(CW$4\fR is either set to \f(CW\*(Aqgi\*(Aq\fR, just like \f(CW$3\fR,
+or it remains undefined.
+.PP
+For convenience, Perl sets \f(CW$+\fR to the string held by the highest numbered
+\&\f(CW$1\fR, \f(CW$2\fR,... that got assigned (and, somewhat related, \f(CW$^N\fR to the
+value of the \f(CW$1\fR, \f(CW$2\fR,... most-recently assigned; \fIi.e.\fR the \f(CW$1\fR,
+\&\f(CW$2\fR,... associated with the rightmost closing parenthesis used in the
+match).
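+.PP
+The difference between \f(CW$+\fR and \f(CW$^N\fR shows up with nested groups,
+and \f(CW$+\fR is handy when only one of several alternatives can capture.
+A short sketch (the version strings are made up):
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    # Group 3 is the highest numbered; group 1 closes last.
+\&    "ab" =~ /((a)(b))/ or die;
+\&    my ($highest, $last_closed) = ($+, $^N);    # (\*(Aqb\*(Aq, \*(Aqab\*(Aq)
+\&
+\&    # Only one of the two groups takes part, and $+ picks it up.
+\&    my @revs;
+\&    for my $line ("Version: 1.2", "Revision: 3.4") {
+\&        push @revs, $+ if $line =~ /Version: (.*)|Revision: (.*)/;
+\&    }
+\&    print "$highest $last_closed @revs\en";       # prints "b ab 1.2 3.4"
+.Ve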
+.SS Backreferences
+.IX Subsection "Backreferences"
+Closely associated with the matching variables \f(CW$1\fR, \f(CW$2\fR, ... are
+the \fIbackreferences\fR \f(CW\*(C`\eg1\*(C'\fR, \f(CW\*(C`\eg2\*(C'\fR,...  Backreferences are simply
+matching variables that can be used \fIinside\fR a regexp.  This is a
+really nice feature; what matches later in a regexp is made to depend on
+what matched earlier in the regexp.  Suppose we wanted to look
+for doubled words in a text, like "the the".  The following regexp finds
+all 3\-letter doubles with a space in between:
+.PP
+.Vb 1
+\&    /\eb(\ew\ew\ew)\es\eg1\eb/;
+.Ve
+.PP
+The grouping assigns a value to \f(CW\*(C`\eg1\*(C'\fR, so that the same 3\-letter sequence
+is used for both parts.
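+.PP
+In list context, adding the \f(CW\*(C`//g\*(C'\fR modifier (covered later in this
+tutorial) returns the captured text of every successful match, so the
+same regexp can collect all doubled 3\-letter words in a string.  A
+sketch with an invented sentence:
+.PP
+.Vb 6
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $text = "Paris in the the spring, the cat cat sat";
+\&    my @doubles = $text =~ /\eb(\ew\ew\ew)\es\eg1\eb/g;
+\&    print "@doubles\en";    # prints "the cat"
+.Ve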
+.PP
+A similar task is to find words consisting of two identical parts:
+.PP
+.Vb 7
+\&    % simple_grep \*(Aq^(\ew\ew\ew\ew|\ew\ew\ew|\ew\ew|\ew)\eg1$\*(Aq /usr/dict/words
+\&    beriberi
+\&    booboo
+\&    coco
+\&    mama
+\&    murmur
+\&    papa
+.Ve
+.PP
+The regexp has a single grouping which considers 4\-letter
+combinations, then 3\-letter combinations, \fIetc\fR., and uses \f(CW\*(C`\eg1\*(C'\fR to look for
+a repeat.  Although \f(CW$1\fR and \f(CW\*(C`\eg1\*(C'\fR represent the same thing, care should be
+taken to use matched variables \f(CW$1\fR, \f(CW$2\fR,... only \fIoutside\fR a regexp
+and backreferences \f(CW\*(C`\eg1\*(C'\fR, \f(CW\*(C`\eg2\*(C'\fR,... only \fIinside\fR a regexp; not doing
+so may lead to surprising and unsatisfactory results.
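+.PP
+The \f(CW\*(C`simple_grep\*(C'\fR command above searches a file, but the same filter
+can be sketched with Perl's built\-in \f(CW\*(C`grep\*(C'\fR over an in\-memory list
+(the words here are chosen for illustration):
+.PP
+.Vb 6
+\&    use strict;
+\&    use warnings;
+\&
+\&    my @words   = qw(beriberi booboo coco cocoa mama murmur papa perl);
+\&    my @doubled = grep { /^(\ew\ew\ew\ew|\ew\ew\ew|\ew\ew|\ew)\eg1$/ } @words;
+\&    print "@doubled\en";   # prints "beriberi booboo coco mama murmur papa"
+.Ve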
+.SS "Relative backreferences"
+.IX Subsection "Relative backreferences"
+Counting the opening parentheses to get the correct number for a
+backreference is error-prone as soon as there is more than one
+capturing group.  A more convenient technique became available
+with Perl 5.10: relative backreferences. To refer to the immediately
+preceding capture group one may now write \f(CW\*(C`\eg\-1\*(C'\fR or \f(CW\*(C`\eg{\-1}\*(C'\fR, the group
+before that is available via \f(CW\*(C`\eg\-2\*(C'\fR or \f(CW\*(C`\eg{\-2}\*(C'\fR, and so on.
+.PP
+Another good reason in addition to readability and maintainability
+for using relative backreferences is illustrated by the following example,
+where a simple pattern for matching peculiar strings is used:
+.PP
+.Vb 1
+\&    $a99a = \*(Aq([a\-z])(\ed)\eg2\eg1\*(Aq;   # matches a11a, g22g, x33x, etc.
+.Ve
+.PP
+Now that we have this pattern stored as a handy string, we might feel
+tempted to use it as a part of some other pattern:
+.PP
+.Vb 6
+\&    $line = "code=e99e";
+\&    if ($line =~ /^(\ew+)=$a99a$/){   # unexpected behavior!
+\&        print "$1 is valid\en";
+\&    } else {
+\&        print "bad line: \*(Aq$line\*(Aq\en";
+\&    }
+.Ve
+.PP
+But this doesn't match, at least not the way one might expect. Only
+after inserting the interpolated \f(CW$a99a\fR and looking at the resulting
+full text of the regexp is it obvious that the backreferences have
+backfired. The subexpression \f(CW\*(C`(\ew+)\*(C'\fR has snatched number 1 and
+demoted the groups in \f(CW$a99a\fR by one rank. This can be avoided by
+using relative backreferences:
+.PP
+.Vb 1
+\&    $a99a = \*(Aq([a\-z])(\ed)\eg{\-1}\eg{\-2}\*(Aq;  # safe for being interpolated
+.Ve
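+.PP
+A sketch comparing the two versions of \f(CW$a99a\fR side by side; the
+variable names \f(CW$abs\fR and \f(CW$rel\fR are chosen here for illustration:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $line = "code=e99e";
+\&    my $abs  = \*(Aq([a\-z])(\ed)\eg2\eg1\*(Aq;        # absolute backreferences
+\&    my $rel  = \*(Aq([a\-z])(\ed)\eg{\-1}\eg{\-2}\*(Aq;  # relative backreferences
+\&
+\&    # (\ew+) is group 1, so \eg2 and \eg1 in $abs now refer to ([a\-z]) and (\ew+)
+\&    my $abs_ok = $line =~ /^(\ew+)=$abs$/ ? 1 : 0;   # 0
+\&    my $rel_ok = $line =~ /^(\ew+)=$rel$/ ? 1 : 0;   # 1
+\&    print "absolute: $abs_ok, relative: $rel_ok\en";
+.Ve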
+.SS "Named backreferences"
+.IX Subsection "Named backreferences"
+Perl 5.10 also introduced named capture groups and named backreferences.
+To attach a name to a capturing group, you write either
+\&\f(CW\*(C`(?<name>...)\*(C'\fR or \f(CW\*(C`(?\*(Aqname\*(Aq...)\*(C'\fR.  The backreference may
+then be written as \f(CW\*(C`\eg{name}\*(C'\fR.  It is permissible to attach the
+same name to more than one group, but then only the leftmost one of the
+eponymous set can be referenced.  Outside of the pattern a named
+capture group is accessible through the \f(CW\*(C`%+\*(C'\fR hash.
+.PP
+Assuming that we have to match calendar dates which may be given in one
+of the three formats yyyy-mm-dd, mm/dd/yyyy or dd.mm.yyyy, we can write
+three suitable patterns where we use \f(CW\*(Aqd\*(Aq\fR, \f(CW\*(Aqm\*(Aq\fR and \f(CW\*(Aqy\*(Aq\fR respectively as the
+names of the groups capturing the pertaining components of a date. The
+matching operation combines the three patterns as alternatives:
+.PP
+.Vb 8
+\&    $fmt1 = \*(Aq(?<y>\ed\ed\ed\ed)\-(?<m>\ed\ed)\-(?<d>\ed\ed)\*(Aq;
+\&    $fmt2 = \*(Aq(?<m>\ed\ed)/(?<d>\ed\ed)/(?<y>\ed\ed\ed\ed)\*(Aq;
+\&    $fmt3 = \*(Aq(?<d>\ed\ed)\e.(?<m>\ed\ed)\e.(?<y>\ed\ed\ed\ed)\*(Aq;
+\&    for my $d (qw(2006\-10\-21 15.01.2007 10/31/2005)) {
+\&        if ( $d =~ m{$fmt1|$fmt2|$fmt3} ){
+\&            print "day=$+{d} month=$+{m} year=$+{y}\en";
+\&        }
+\&    }
+.Ve
+.PP
+If any of the alternatives matches, the hash \f(CW\*(C`%+\*(C'\fR is bound to contain the
+three key-value pairs.
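+.PP
+The 3\-letter doubled\-word search from the backreferences section can be
+written with a named group, using \f(CW\*(C`\eg{name}\*(C'\fR inside the pattern and
+\&\f(CW\*(C`%+\*(C'\fR outside it.  A sketch; the group name \f(CW\*(C`word\*(C'\fR is arbitrary:
+.PP
+.Vb 9
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $text = "It was the the best of times";
+\&    my $dup;
+\&    if ($text =~ /\eb(?<word>\ew\ew\ew)\es\eg{word}\eb/) {
+\&        $dup = $+{word};
+\&    }
+\&    print "doubled word: $dup\en";   # prints "doubled word: the"
+.Ve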
+.SS "Alternative capture group numbering"
+.IX Subsection "Alternative capture group numbering"
+Yet another capturing group numbering technique (also available as of Perl 5.10)
+deals with the problem of referring to groups within a set of alternatives.
+Consider a pattern for matching a time of the day, civil or military style:
+.PP
+.Vb 3
+\&    if ( $time =~ /(\ed\ed|\ed):(\ed\ed)|(\ed\ed)(\ed\ed)/ ){
+\&        # process hour and minute
+\&    }
+.Ve
+.PP
+Processing the results requires an additional if statement to determine
+whether \f(CW$1\fR and \f(CW$2\fR or \f(CW$3\fR and \f(CW$4\fR contain the goodies. It would
+be easier if we could use group numbers 1 and 2 in the second alternative as
+well, and this is exactly what the parenthesized construct \f(CW\*(C`(?|...)\*(C'\fR,
+set around an alternation, achieves. Here is an extended version of the
+previous pattern:
+.PP
+.Vb 3
+\&  if($time =~ /(?|(\ed\ed|\ed):(\ed\ed)|(\ed\ed)(\ed\ed))\es+([A\-Z][A\-Z][A\-Z])/){
+\&      print "hour=$1 minute=$2 zone=$3\en";
+\&  }
+.Ve
+.PP
+Within the alternative numbering group, group numbers start at the same
+position for each alternative. After the group, numbering continues
+with one higher than the maximum reached across all the alternatives.
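+.PP
+A sketch that runs the extended pattern on one input of each style (the
+sample times are invented):
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my @found;
+\&    for my $time ("9:45 CET", "2215 UTC") {
+\&        # Either alternative fills groups 1 and 2; the zone is always group 3.
+\&        if ($time =~ /(?|(\ed\ed|\ed):(\ed\ed)|(\ed\ed)(\ed\ed))\es+([A\-Z][A\-Z][A\-Z])/) {
+\&            push @found, "$1:$2 $3";
+\&        }
+\&    }
+\&    print join(", ", @found), "\en";   # prints "9:45 CET, 22:15 UTC"
+.Ve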
+.SS "Position information"
+.IX Subsection "Position information"
+In addition to what was matched, Perl also provides the
+positions of what was matched as contents of the \f(CW\*(C`@\-\*(C'\fR and \f(CW\*(C`@+\*(C'\fR
+arrays. \f(CW\*(C`$\-[0]\*(C'\fR is the position of the start of the entire match and
+\&\f(CW$+[0]\fR is the position of the end. Similarly, \f(CW\*(C`$\-[n]\*(C'\fR is the
+position of the start of the \f(CW$n\fR match and \f(CW$+[n]\fR is the position
+of the end. If \f(CW$n\fR is undefined, so are \f(CW\*(C`$\-[n]\*(C'\fR and \f(CW$+[n]\fR. Then
+this code
+.PP
+.Vb 6
+\&    $x = "Mmm...donut, thought Homer";
+\&    $x =~ /^(Mmm|Yech)\e.\e.\e.(donut|peas)/; # matches
+\&    foreach $exp (1..$#\-) {
+\&        no strict \*(Aqrefs\*(Aq;
+\&        print "Match $exp: \*(Aq$$exp\*(Aq at position ($\-[$exp],$+[$exp])\en";
+\&    }
+.Ve
+.PP
+prints
+.PP
+.Vb 2
+\&    Match 1: \*(AqMmm\*(Aq at position (0,3)
+\&    Match 2: \*(Aqdonut\*(Aq at position (6,11)
+.Ve
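+.PP
+The same offsets also let \f(CW\*(C`substr\*(C'\fR extract each group, which avoids
+the symbolic reference \f(CW$$exp\fR and the \f(CW\*(C`no\ strict\ \*(Aqrefs\*(Aq\*(C'\fR it needs.
+A sketch that prints the same two lines:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $x = "Mmm...donut, thought Homer";
+\&    my @parts;
+\&    if ($x =~ /^(Mmm|Yech)\e.\e.\e.(donut|peas)/) {
+\&        for my $n (1 .. $#\-) {
+\&            my $text = substr($x, $\-[$n], $+[$n] \- $\-[$n]);
+\&            push @parts, $text;
+\&            print "Match $n: \*(Aq$text\*(Aq at position ($\-[$n],$+[$n])\en";
+\&        }
+\&    }
+.Ve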
+.PP
+Even if there are no groupings in a regexp, it is still possible to
+find out what exactly matched in a string.  If you use them, Perl
+will set \f(CW\*(C`$\`\*(C'\fR to the part of the string before the match, will set \f(CW$&\fR
+to the part of the string that matched, and will set \f(CW\*(C`$\*(Aq\*(C'\fR to the part
+of the string after the match.  An example:
+.PP
+.Vb 3
+\&    $x = "the cat caught the mouse";
+\&    $x =~ /cat/;  # $\` = \*(Aqthe \*(Aq, $& = \*(Aqcat\*(Aq, $\*(Aq = \*(Aq caught the mouse\*(Aq
+\&    $x =~ /the/;  # $\` = \*(Aq\*(Aq, $& = \*(Aqthe\*(Aq, $\*(Aq = \*(Aq cat caught the mouse\*(Aq
+.Ve
+.PP
+In the second match, \f(CW\*(C`$\`\*(C'\fR equals \f(CW\*(Aq\*(Aq\fR because the regexp matched at the
+first character position in the string and stopped; it never saw the
+second "the".
+.PP
+If your code is to run on Perl versions earlier than
+5.20, it is worthwhile to note that using \f(CW\*(C`$\`\*(C'\fR and \f(CW\*(C`$\*(Aq\*(C'\fR
+slows down regexp matching quite a bit, while \f(CW$&\fR slows it down to a
+lesser extent, because if they appear anywhere in a program, they are
+generated for \fIall\fR regexp matches in the program.  So if raw
+performance is a goal of your application, they should be avoided.
+If you need to extract the corresponding substrings, use \f(CW\*(C`@\-\*(C'\fR and
+\&\f(CW\*(C`@+\*(C'\fR instead:
+.PP
+.Vb 3
+\&    $\` is the same as substr( $x, 0, $\-[0] )
+\&    $& is the same as substr( $x, $\-[0], $+[0]\-$\-[0] )
+\&    $\*(Aq is the same as substr( $x, $+[0] )
+.Ve
+.PP
+As of Perl 5.10, the \f(CW\*(C`${^PREMATCH}\*(C'\fR, \f(CW\*(C`${^MATCH}\*(C'\fR and \f(CW\*(C`${^POSTMATCH}\*(C'\fR
+variables may be used.  These are only set if the \f(CW\*(C`/p\*(C'\fR modifier is
+present.  Consequently they do not penalize the rest of the program.  In
+Perl 5.20, \f(CW\*(C`${^PREMATCH}\*(C'\fR, \f(CW\*(C`${^MATCH}\*(C'\fR and \f(CW\*(C`${^POSTMATCH}\*(C'\fR are available
+whether or not \f(CW\*(C`/p\*(C'\fR has been used (the modifier is ignored), and
+\&\f(CW\*(C`$\`\*(C'\fR, \f(CW\*(C`$\*(Aq\*(C'\fR and \f(CW$&\fR do not cause any speed difference.
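+.PP
+A sketch that computes the three pieces from \f(CW\*(C`@\-\*(C'\fR and \f(CW\*(C`@+\*(C'\fR and compares
+them with the \f(CW\*(C`/p\*(C'\fR variables (so it needs Perl 5.10 or later):
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $x = "the cat caught the mouse";
+\&    my ($pre, $match, $post, $same);
+\&    if ($x =~ /cat/p) {
+\&        $pre   = substr($x, 0, $\-[0]);              # \*(Aqthe \*(Aq
+\&        $match = substr($x, $\-[0], $+[0] \- $\-[0]);  # \*(Aqcat\*(Aq
+\&        $post  = substr($x, $+[0]);                 # \*(Aq caught the mouse\*(Aq
+\&        $same  = $pre eq ${^PREMATCH} && $match eq ${^MATCH}
+\&                 && $post eq ${^POSTMATCH};
+\&    }
+\&    print $same ? "identical\en" : "different\en";   # prints "identical"
+.Ve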
+.SS "Non-capturing groupings"
+.IX Subsection "Non-capturing groupings"
+A group that is required to bundle a set of alternatives may or may not be
+useful as a capturing group.  If it isn't, it just creates a superfluous
+addition to the set of available capture group values, inside as well as
+outside the regexp.  Non-capturing groupings, denoted by \f(CW\*(C`(?:regexp)\*(C'\fR,
+still allow the regexp to be treated as a single unit, but don't establish
+a capturing group at the same time.  Both capturing and non-capturing
+groupings are allowed to co-exist in the same regexp.  Because there is
+no extraction, non-capturing groupings are faster than capturing
+groupings.  Non-capturing groupings are also handy for choosing exactly
+which parts of a regexp are to be extracted to matching variables:
+.PP
+.Vb 2
+\&    # match a number, $1\-$4 are set, but we only want $1
+\&    /([+\-]?\e *(\ed+(\e.\ed*)?|\e.\ed+)([eE][+\-]?\ed+)?)/;
+\&
+\&    # match a number faster, only $1 is set
+\&    /([+\-]?\e *(?:\ed+(?:\e.\ed*)?|\e.\ed+)(?:[eE][+\-]?\ed+)?)/;
+\&
+\&    # match a number, get $1 = whole number, $2 = exponent
+\&    /([+\-]?\e *(?:\ed+(?:\e.\ed*)?|\e.\ed+)(?:[eE]([+\-]?\ed+))?)/;
+.Ve
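+.PP
+For instance, applying the last regexp in list context returns exactly
+the two captures it defines (the input string is just an example):
+.PP
+.Vb 7
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $re = qr/([+\e\-]?\e *(?:\ed+(?:\e.\ed*)?|\e.\ed+)(?:[eE]([+\e\-]?\ed+))?)/;
+\&    my ($number, $exponent) = "\-1.5e10" =~ $re;
+\&    print "number=$number exponent=$exponent\en";
+\&    # prints "number=\-1.5e10 exponent=10"
+.Ve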
+.PP
+Non-capturing groupings are also useful for removing nuisance
+elements gathered from a split operation where parentheses are
+required for some reason:
+.PP
+.Vb 3
+\&    $x = \*(Aq12aba34ba5\*(Aq;
+\&    @num = split /(a|b)+/, $x;    # @num = (\*(Aq12\*(Aq,\*(Aqa\*(Aq,\*(Aq34\*(Aq,\*(Aqa\*(Aq,\*(Aq5\*(Aq)
+\&    @num = split /(?:a|b)+/, $x;  # @num = (\*(Aq12\*(Aq,\*(Aq34\*(Aq,\*(Aq5\*(Aq)
+.Ve
+.PP
+In Perl 5.22 and later, all groups within a regexp can be set to
+non-capturing by using the new \f(CW\*(C`/n\*(C'\fR flag:
+.PP
+.Vb 1
+\&    "hello" =~ /(hi|hello)/n; # $1 is not set!
+.Ve
+.PP
+See "n" in perlre for more information.
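+.PP
+A sketch contrasting the same match with and without \f(CW\*(C`/n\*(C'\fR; it
+declares \f(CW\*(C`use\ v5.22\*(C'\fR because the flag is not available earlier:
+.PP
+.Vb 8
+\&    use v5.22;      # also enables strict and the say feature
+\&    use warnings;
+\&
+\&    my ($without, $with);
+\&    $without = $1 if "hello" =~ /(hi|hello)/;    # \*(Aqhello\*(Aq
+\&    $with    = $1 if "hello" =~ /(hi|hello)/n;   # undef: nothing captured
+\&    say "without /n: $without";
+\&    say "with /n: ", defined $with ? $with : "undef";
+.Ve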
+.SS "Matching repetitions"
+.IX Subsection "Matching repetitions"
+Several examples in the earlier sections display an annoying weakness.  We
+were only matching 3\-letter words, or chunks of words of 4 letters or
+less.  We'd like to be able to match words or, more generally, strings
+of any length, without writing out tedious alternatives like
+\&\f(CW\*(C`\ew\ew\ew\ew|\ew\ew\ew|\ew\ew|\ew\*(C'\fR.
+.PP
+This is exactly the problem the \fIquantifier\fR metacharacters \f(CW\*(Aq?\*(Aq\fR,
+\&\f(CW\*(Aq*\*(Aq\fR, \f(CW\*(Aq+\*(Aq\fR, and \f(CW\*(C`{}\*(C'\fR were created for.  They allow us to delimit the
+number of repeats for a portion of a regexp we consider to be a
+match.  Quantifiers are put immediately after the character, character
+class, or grouping that we want to specify.  They have the following
+meanings:
+.IP \(bu 4
+\&\f(CW\*(C`a?\*(C'\fR means: match \f(CW\*(Aqa\*(Aq\fR 1 or 0 times
+.IP \(bu 4
+\&\f(CW\*(C`a*\*(C'\fR means: match \f(CW\*(Aqa\*(Aq\fR 0 or more times, \fIi.e.\fR, any number of times
+.IP \(bu 4
+\&\f(CW\*(C`a+\*(C'\fR means: match \f(CW\*(Aqa\*(Aq\fR 1 or more times, \fIi.e.\fR, at least once
+.IP \(bu 4
+\&\f(CW\*(C`a{n,m}\*(C'\fR means: match at least \f(CW\*(C`n\*(C'\fR times, but not more than \f(CW\*(C`m\*(C'\fR
+times.
+.IP \(bu 4
+\&\f(CW\*(C`a{n,}\*(C'\fR means: match at least \f(CW\*(C`n\*(C'\fR times
+.IP \(bu 4
+\&\f(CW\*(C`a{,n}\*(C'\fR means: match at most \f(CW\*(C`n\*(C'\fR times
+.IP \(bu 4
+\&\f(CW\*(C`a{n}\*(C'\fR means: match exactly \f(CW\*(C`n\*(C'\fR times
+.PP
+If you like, you can add blanks (tab or space characters) within the
+braces, as long as they are adjacent to a brace or to the comma (if any).
+.PP
+Here are some examples:
+.PP
+.Vb 10
+\&    /[a\-z]+\es+\ed*/;  # match a lowercase word, at least one space, and
+\&                     # any number of digits
+\&    /(\ew+)\es+\eg1/;    # match doubled words of arbitrary length
+\&    /y(es)?/i;       # matches \*(Aqy\*(Aq, \*(AqY\*(Aq, or a case\-insensitive \*(Aqyes\*(Aq
+\&    $year =~ /^\ed{2,4}$/;  # make sure year is at least 2 but not more
+\&                           # than 4 digits
+\&    $year =~ /^\ed{ 2, 4 }$/;    # Same; for those who like wide open
+\&                                # spaces.
+\&    $year =~ /^\ed{2, 4}$/;      # Same.
+\&    $year =~ /^\ed{4}$|^\ed{2}$/; # better match; throw out 3\-digit dates
+\&    $year =~ /^\ed{2}(\ed{2})?$/; # same thing written differently.
+\&                                # However, this captures the last two
+\&                                # digits in $1 and the other does not.
+\&
+\&    % simple_grep \*(Aq^(\ew+)\eg1$\*(Aq /usr/dict/words   # isn\*(Aqt this easier?
+\&    beriberi
+\&    booboo
+\&    coco
+\&    mama
+\&    murmur
+\&    papa
+.Ve
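+.PP
+As a quick check of the last year pattern, this sketch (with invented
+inputs) accepts only 2\- and 4\-digit strings:
+.PP
+.Vb 9
+\&    use strict;
+\&    use warnings;
+\&
+\&    my %ok;
+\&    for my $year ("99", "1999", "123", "20245") {
+\&        $ok{$year} = $year =~ /^\ed{2}(\ed{2})?$/ ? 1 : 0;
+\&    }
+\&    # %ok is ( 99 => 1, 1999 => 1, 123 => 0, 20245 => 0 )
+\&    print "$_ => $ok{$_}\en" for sort { $a <=> $b } keys %ok;
+.Ve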
+.PP
+For all of these quantifiers, Perl will try to match as much of the
+string as possible, while still allowing the regexp to succeed.  Thus
+with \f(CW\*(C`/a?.../\*(C'\fR, Perl will first try to match the regexp with the \f(CW\*(Aqa\*(Aq\fR
+present; if that fails, Perl will try to match the regexp without the
+\&\f(CW\*(Aqa\*(Aq\fR present.  For the quantifier \f(CW\*(Aq*\*(Aq\fR, we get the following:
+.PP
+.Vb 5
+\&    $x = "the cat in the hat";
+\&    $x =~ /^(.*)(cat)(.*)$/; # matches,
+\&                             # $1 = \*(Aqthe \*(Aq
+\&                             # $2 = \*(Aqcat\*(Aq
+\&                             # $3 = \*(Aq in the hat\*(Aq
+.Ve
+.PP
+This is what we might expect: the match finds the only \f(CW\*(C`cat\*(C'\fR in the
+string and locks onto it.  Consider, however, this regexp:
+.PP
+.Vb 4
+\&    $x =~ /^(.*)(at)(.*)$/; # matches,
+\&                            # $1 = \*(Aqthe cat in the h\*(Aq
+\&                            # $2 = \*(Aqat\*(Aq
+\&                            # $3 = \*(Aq\*(Aq   (0 characters match)
+.Ve
+.PP
+One might initially guess that Perl would find the \f(CW\*(C`at\*(C'\fR in \f(CW\*(C`cat\*(C'\fR and
+stop there, but that wouldn't give the longest possible string to the
+first quantifier \f(CW\*(C`.*\*(C'\fR.  Instead, the first quantifier \f(CW\*(C`.*\*(C'\fR grabs as
+much of the string as possible while still having the regexp match.  In
+this example, that means having the \f(CW\*(C`at\*(C'\fR sequence with the final \f(CW\*(C`at\*(C'\fR
+in the string.  The other important principle illustrated here is that,
+when there are two or more elements in a regexp, the \fIleftmost\fR
+quantifier, if there is one, gets to grab as much of the string as
+possible, leaving the rest of the regexp to fight over scraps.  Thus in
+our example, the first quantifier \f(CW\*(C`.*\*(C'\fR grabs most of the string, while
+the second quantifier \f(CW\*(C`.*\*(C'\fR gets the empty string.   Quantifiers that
+grab as much of the string as possible are called \fImaximal match\fR or
+\&\fIgreedy\fR quantifiers.
+.PP
+When a regexp can match a string in several different ways, we can use
+the principles above to predict which way the regexp will match:
+.IP \(bu 4
+Principle 0: Taken as a whole, any regexp will be matched at the
+earliest possible position in the string.
+.IP \(bu 4
+Principle 1: In an alternation \f(CW\*(C`a|b|c...\*(C'\fR, the leftmost alternative
+that allows a match for the whole regexp will be the one used.
+.IP \(bu 4
+Principle 2: The maximal matching quantifiers \f(CW\*(Aq?\*(Aq\fR, \f(CW\*(Aq*\*(Aq\fR, \f(CW\*(Aq+\*(Aq\fR and
+\&\f(CW\*(C`{n,m}\*(C'\fR will in general match as much of the string as possible while
+still allowing the whole regexp to match.
+.IP \(bu 4
+Principle 3: If there are two or more elements in a regexp, the
+leftmost greedy quantifier, if any, will match as much of the string
+as possible while still allowing the whole regexp to match.  The next
+leftmost greedy quantifier, if any, will try to match as much of the
+string remaining available to it as possible, while still allowing the
+whole regexp to match.  And so on, until all the regexp elements are
+satisfied.
+.PP
+As we have seen above, Principle 0 overrides the others. The regexp
+will be matched as early as possible, with the other principles
+determining how the regexp matches at that earliest character
+position.
+.PP
+Here is an example of these principles in action:
+.PP
+.Vb 5
+\&    $x = "The programming republic of Perl";
+\&    $x =~ /^(.+)(e|r)(.*)$/;  # matches,
+\&                              # $1 = \*(AqThe programming republic of Pe\*(Aq
+\&                              # $2 = \*(Aqr\*(Aq
+\&                              # $3 = \*(Aql\*(Aq
+.Ve
+.PP
+This regexp matches at the earliest string position, \f(CW\*(AqT\*(Aq\fR.  One
+might think that \f(CW\*(Aqe\*(Aq\fR, being leftmost in the alternation, would be
+matched, but \f(CW\*(Aqr\*(Aq\fR produces the longest string in the first quantifier.
+.PP
+.Vb 3
+\&    $x =~ /(m{1,2})(.*)$/;  # matches,
+\&                            # $1 = \*(Aqmm\*(Aq
+\&                            # $2 = \*(Aqing republic of Perl\*(Aq
+.Ve
+.PP
+Here, the earliest possible match is at the first \f(CW\*(Aqm\*(Aq\fR in
+\&\f(CW\*(C`programming\*(C'\fR. \f(CW\*(C`m{1,2}\*(C'\fR is the first quantifier, so it gets to match
+a maximal \f(CW\*(C`mm\*(C'\fR.
+.PP
+.Vb 3
+\&    $x =~ /.*(m{1,2})(.*)$/;  # matches,
+\&                              # $1 = \*(Aqm\*(Aq
+\&                              # $2 = \*(Aqing republic of Perl\*(Aq
+.Ve
+.PP
+Here, the regexp matches at the start of the string. The first
+quantifier \f(CW\*(C`.*\*(C'\fR grabs as much as possible, leaving just a single
+\&\f(CW\*(Aqm\*(Aq\fR for the second quantifier \f(CW\*(C`m{1,2}\*(C'\fR.
+.PP
+.Vb 4
+\&    $x =~ /(.?)(m{1,2})(.*)$/;  # matches,
+\&                                # $1 = \*(Aqa\*(Aq
+\&                                # $2 = \*(Aqmm\*(Aq
+\&                                # $3 = \*(Aqing republic of Perl\*(Aq
+.Ve
+.PP
+Here, \f(CW\*(C`.?\*(C'\fR eats its maximal one character at the earliest possible
+position in the string, \f(CW\*(Aqa\*(Aq\fR in \f(CW\*(C`programming\*(C'\fR, leaving \f(CW\*(C`m{1,2}\*(C'\fR
+the opportunity to match both \f(CW\*(Aqm\*(Aq\fR's. Finally,
+.PP
+.Vb 1
+\&    "aXXXb" =~ /(X*)/; # matches with $1 = \*(Aq\*(Aq
+.Ve
+.PP
+because it can match zero copies of \f(CW\*(AqX\*(Aq\fR at the beginning of the
+string.  If you definitely want to match at least one \f(CW\*(AqX\*(Aq\fR, use
+\&\f(CW\*(C`X+\*(C'\fR, not \f(CW\*(C`X*\*(C'\fR.
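+.PP
+A side\-by\-side sketch of the two quantifiers on the same string:
+.PP
+.Vb 6
+\&    use strict;
+\&    use warnings;
+\&
+\&    my ($star) = "aXXXb" =~ /(X*)/;   # \*(Aq\*(Aq: zero X\*(Aqs already match at position 0
+\&    my ($plus) = "aXXXb" =~ /(X+)/;   # \*(AqXXX\*(Aq: needs at least one X
+\&    print "X* got \*(Aq$star\*(Aq, X+ got \*(Aq$plus\*(Aq\en";
+.Ve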
+.PP
+Sometimes greed is not good.  At times, we would like quantifiers to
+match a \fIminimal\fR piece of string, rather than a maximal piece.  For
+this purpose, Larry Wall created the \fIminimal match\fR or
+\&\fInon-greedy\fR quantifiers \f(CW\*(C`??\*(C'\fR, \f(CW\*(C`*?\*(C'\fR, \f(CW\*(C`+?\*(C'\fR, and \f(CW\*(C`{}?\*(C'\fR.  These are
+the usual quantifiers with a \f(CW\*(Aq?\*(Aq\fR appended to them.  They have the
+following meanings:
+.IP \(bu 4
+\&\f(CW\*(C`a??\*(C'\fR means: match \f(CW\*(Aqa\*(Aq\fR 0 or 1 times. Try 0 first, then 1.
+.IP \(bu 4
+\&\f(CW\*(C`a*?\*(C'\fR means: match \f(CW\*(Aqa\*(Aq\fR 0 or more times, \fIi.e.\fR, any number of times,
+but as few times as possible
+.IP \(bu 4
+\&\f(CW\*(C`a+?\*(C'\fR means: match \f(CW\*(Aqa\*(Aq\fR 1 or more times, \fIi.e.\fR, at least once, but
+as few times as possible
+.IP \(bu 4
+\&\f(CW\*(C`a{n,m}?\*(C'\fR means: match at least \f(CW\*(C`n\*(C'\fR times, not more than \f(CW\*(C`m\*(C'\fR
+times, as few times as possible
+.IP \(bu 4
+\&\f(CW\*(C`a{n,}?\*(C'\fR means: match at least \f(CW\*(C`n\*(C'\fR times, but as few times as
+possible
+.IP \(bu 4
+\&\f(CW\*(C`a{,n}?\*(C'\fR means: match at most \f(CW\*(C`n\*(C'\fR times, but as few times as
+possible
+.IP \(bu 4
+\&\f(CW\*(C`a{n}?\*(C'\fR means: match exactly \f(CW\*(C`n\*(C'\fR times.  Because we match exactly
+\&\f(CW\*(C`n\*(C'\fR times, \f(CW\*(C`a{n}?\*(C'\fR is equivalent to \f(CW\*(C`a{n}\*(C'\fR and is just there for
+notational consistency.
+.PP
+Let's look at the example above, but with minimal quantifiers:
+.PP
+.Vb 5
+\&    $x = "The programming republic of Perl";
+\&    $x =~ /^(.+?)(e|r)(.*)$/; # matches,
+\&                              # $1 = \*(AqTh\*(Aq
+\&                              # $2 = \*(Aqe\*(Aq
+\&                              # $3 = \*(Aq programming republic of Perl\*(Aq
+.Ve
+.PP
+The minimal string that will allow both the start of the string \f(CW\*(Aq^\*(Aq\fR
+and the alternation to match is \f(CW\*(C`Th\*(C'\fR, with the alternation \f(CW\*(C`e|r\*(C'\fR
+matching \f(CW\*(Aqe\*(Aq\fR.  The second quantifier \f(CW\*(C`.*\*(C'\fR is free to gobble up the
+rest of the string.
+.PP
+.Vb 3
+\&    $x =~ /(m{1,2}?)(.*?)$/;  # matches,
+\&                              # $1 = \*(Aqm\*(Aq
+\&                              # $2 = \*(Aqming republic of Perl\*(Aq
+.Ve
+.PP
+The first string position that this regexp can match is at the first
+\&\f(CW\*(Aqm\*(Aq\fR in \f(CW\*(C`programming\*(C'\fR. At this position, the minimal \f(CW\*(C`m{1,2}?\*(C'\fR
+matches just one \f(CW\*(Aqm\*(Aq\fR.  Although the second quantifier \f(CW\*(C`.*?\*(C'\fR would
+prefer to match no characters, it is constrained by the end-of-string
+anchor \f(CW\*(Aq$\*(Aq\fR to match the rest of the string.
+.PP
+.Vb 4
+\&    $x =~ /(.*?)(m{1,2}?)(.*)$/;  # matches,
+\&                                  # $1 = \*(AqThe progra\*(Aq
+\&                                  # $2 = \*(Aqm\*(Aq
+\&                                  # $3 = \*(Aqming republic of Perl\*(Aq
+.Ve
+.PP
+In this regexp, you might expect the first minimal quantifier \f(CW\*(C`.*?\*(C'\fR
+to match the empty string, because it is not constrained by a \f(CW\*(Aq^\*(Aq\fR
+anchor to match the beginning of the string.  Principle 0 applies here,
+however.  Because it is possible for the whole regexp to match at the
+start of the string, it \fIwill\fR match at the start of the string.  Thus
+the first quantifier has to match everything up to the first \f(CW\*(Aqm\*(Aq\fR.  The
+second minimal quantifier matches just one \f(CW\*(Aqm\*(Aq\fR and the third
+quantifier matches the rest of the string.
+.PP
+.Vb 4
+\&    $x =~ /(.??)(m{1,2})(.*)$/;  # matches,
+\&                                 # $1 = \*(Aqa\*(Aq
+\&                                 # $2 = \*(Aqmm\*(Aq
+\&                                 # $3 = \*(Aqing republic of Perl\*(Aq
+.Ve
+.PP
+Just as in the previous regexp, the first quantifier \f(CW\*(C`.??\*(C'\fR can match
+earliest at position \f(CW\*(Aqa\*(Aq\fR, so it does.  The second quantifier is
+greedy, so it matches \f(CW\*(C`mm\*(C'\fR, and the third matches the rest of the
+string.
+.PP
+We can modify principle 3 above to take into account non-greedy
+quantifiers:
+.IP \(bu 4
+Principle 3: If there are two or more elements in a regexp, the
+leftmost greedy (non-greedy) quantifier, if any, will match as much
+(little) of the string as possible while still allowing the whole
+regexp to match.  The next leftmost greedy (non-greedy) quantifier, if
+any, will try to match as much (little) of the string remaining
+available to it as possible, while still allowing the whole regexp to
+match.  And so on, until all the regexp elements are satisfied.
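+.PP
+A small sketch shows the rule in action.  Making only the leftmost
+quantifier non-greedy moves the match of \f(CW\*(Aqat\*(Aq\fR from its last
+occurrence to its first one.  The sample string is arbitrary:
+.PP
+.Vb 7
+\&    my $x = "the cat in the hat";
+\&    my @greedy  = $x =~ /^(.*)(at)(.*)$/;
+\&    my @minimal = $x =~ /^(.*?)(at)(.*)$/;
+\&    # @greedy  = (\*(Aqthe cat in the h\*(Aq, \*(Aqat\*(Aq, \*(Aq\*(Aq)
+\&    # @minimal = (\*(Aqthe c\*(Aq, \*(Aqat\*(Aq, \*(Aq in the hat\*(Aq)
+\&    print join("|", @greedy), "\en";   # prints "the cat in the h|at|"
+\&    print join("|", @minimal), "\en";  # prints "the c|at| in the hat"
+.Ve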
+.PP
+Just like alternation, quantifiers are also susceptible to
+backtracking.  Here is a step-by-step analysis of the example
+.PP
+.Vb 5
+\&    $x = "the cat in the hat";
+\&    $x =~ /^(.*)(at)(.*)$/; # matches,
+\&                            # $1 = \*(Aqthe cat in the h\*(Aq
+\&                            # $2 = \*(Aqat\*(Aq
+\&                            # $3 = \*(Aq\*(Aq   (0 matches)
+.Ve
+.IP 1. 4
+Start with the first letter in the string \f(CW\*(Aqt\*(Aq\fR.
+.IP 2. 4
+The first quantifier \f(CW\*(Aq.*\*(Aq\fR starts out by matching the whole
+string \f(CW"the cat in the hat"\fR.
+.IP 3. 4
+\&\f(CW\*(Aqa\*(Aq\fR in the regexp element \f(CW\*(Aqat\*(Aq\fR doesn't match the end
+of the string.  Backtrack one character.
+.IP 4. 4
+\&\f(CW\*(Aqa\*(Aq\fR in the regexp element \f(CW\*(Aqat\*(Aq\fR still doesn't match
+the last letter of the string \f(CW\*(Aqt\*(Aq\fR, so backtrack one more character.
+.IP 5. 4
+Now we can match the \f(CW\*(Aqa\*(Aq\fR and the \f(CW\*(Aqt\*(Aq\fR.
+.IP 6. 4
+Move on to the third element \f(CW\*(Aq.*\*(Aq\fR.  Since we are at the
+end of the string and \f(CW\*(Aq.*\*(Aq\fR can match 0 times, assign it the empty
+string.
+.IP 7. 4
+We are done!
+.PP
+Most of the time, all this moving forward and backtracking happens
+quickly and searching is fast. There are some pathological regexps,
+however, whose execution time grows exponentially with the size of the
+string.  A typical structure that blows up in your face is of the form
+.PP
+.Vb 1
+\&    /(a|b+)*/;
+.Ve
+.PP
+The problem is the nested indeterminate quantifiers.  There are many
+different ways of partitioning a string of length n between the \f(CW\*(Aq+\*(Aq\fR
+and \f(CW\*(Aq*\*(Aq\fR: one repetition with \f(CW\*(C`b+\*(C'\fR of length n, two repetitions with
+the first \f(CW\*(C`b+\*(C'\fR length k and the second with length n\-k, m repetitions
+whose bits add up to length n, \fIetc\fR.  In fact there are an exponential
+number of ways to partition a string as a function of its length.  A
+regexp may get lucky and match early in the process, but if there is
+no match, Perl will try \fIevery\fR possibility before giving up.  So be
+careful with nested \f(CW\*(Aq*\*(Aq\fR's, \f(CW\*(C`{n,m}\*(C'\fR's, and \f(CW\*(Aq+\*(Aq\fR's.  The book
+\&\fIMastering Regular Expressions\fR by Jeffrey Friedl gives a wonderful
+discussion of this and other efficiency issues.
+.SS "Possessive quantifiers"
+.IX Subsection "Possessive quantifiers"
+Backtracking during the relentless search for a match may be a waste
+of time, particularly when the match is bound to fail.  Consider
+the simple pattern
+.PP
+.Vb 1
+\&    /^\ew+\es+\ew+$/; # a word, spaces, a word
+.Ve
+.PP
+Whenever this is applied to a string which doesn't quite meet the
+pattern's expectations such as \f(CW"abc\ \ "\fR or \f(CW"abc\ \ def\ "\fR,
+the regexp engine will backtrack, approximately once for each character
+in the string.  But we know that there is no way around taking \fIall\fR
+of the initial word characters to match the first repetition, that \fIall\fR
+spaces must be eaten by the middle part, and the same goes for the second
+word.
+.PP
+Perl 5.10 introduced the \fIpossessive quantifiers\fR, which instruct the
+regexp engine not to backtrack into whatever they have matched.  They are
+written as the usual quantifiers with a \f(CW\*(Aq+\*(Aq\fR appended.  This makes them
+greedy as well as stingy: once they succeed, they won't give anything back
+to permit another solution.  They have the following meanings:
+.IP \(bu 4
+\&\f(CW\*(C`a{n,m}+\*(C'\fR means: match at least \f(CW\*(C`n\*(C'\fR times, not more than \f(CW\*(C`m\*(C'\fR times,
+as many times as possible, and don't give anything up.  \f(CW\*(C`a?+\*(C'\fR is short
+for \f(CW\*(C`a{0,1}+\*(C'\fR.
+.IP \(bu 4
+\&\f(CW\*(C`a{n,}+\*(C'\fR means: match at least \f(CW\*(C`n\*(C'\fR times, but as many times as possible,
+and don't give anything up.  \f(CW\*(C`a++\*(C'\fR is short for \f(CW\*(C`a{1,}+\*(C'\fR and
+\&\f(CW\*(C`a*+\*(C'\fR is short for \f(CW\*(C`a{0,}+\*(C'\fR.
+.IP \(bu 4
+\&\f(CW\*(C`a{,n}+\*(C'\fR means: match as many times as possible up to at most \f(CW\*(C`n\*(C'\fR
+times, and don't give anything up.
+.IP \(bu 4
+\&\f(CW\*(C`a{n}+\*(C'\fR means: match exactly \f(CW\*(C`n\*(C'\fR times.  It is just there for
+notational consistency.
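+.PP
+The difference from an ordinary greedy quantifier appears as soon as the
+rest of the pattern needs a character back.  In the sketch below, which uses
+an arbitrary string, \f(CW\*(C`a*\*(C'\fR gives up its last \f(CW\*(Aqa\*(Aq\fR so that the following
+\&\f(CW\*(Aqa\*(Aq\fR can match, while \f(CW\*(C`a*+\*(C'\fR keeps all of them and the match fails:
+.PP
+.Vb 4
+\&    my $s = "aaa";
+\&    my $greedy     = $s =~ /^a*a$/  ? 1 : 0;  # 1: a* backtracks one \*(Aqa\*(Aq
+\&    my $possessive = $s =~ /^a*+a$/ ? 1 : 0;  # 0: a*+ never gives one back
+\&    print "$greedy $possessive\en";            # prints "1 0"
+.Ve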
+.PP
+These possessive quantifiers represent a special case of a more general
+concept, the \fIindependent subexpression\fR, see below.
+.PP
+As an example where a possessive quantifier is suitable, consider
+matching a quoted string as it appears in several programming languages.
+The backslash is used as an escape character: it indicates that the
+next character is to be taken literally, as an ordinary character of the
+string.  Therefore, after the opening quote, we expect a (possibly
+empty) sequence of alternatives: either a run of characters that are
+neither a quote nor a backslash, or a backslash followed by the character
+it escapes.  A closing quote then ends the string:
+.PP
+.Vb 1
+\&    /"(?:[^"\e\e]++|\e\e.)*+"/;
+.Ve
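+.PP
+Here is a minimal sketch of that pattern in use, with a capture group added
+around the contents of each string.  The sample input is invented for
+illustration.  Because every quantifier is possessive, the unterminated
+string at the end is rejected without the engine retrying shorter matches
+inside the loop:
+.PP
+.Vb 7
+\&    my $src = q{say "a \e"quoted\e" word" and "another"};
+\&    my @strings = $src =~ /"((?:[^"\e\e]++|\e\e.)*+)"/g;
+\&    # @strings = (\*(Aqa \e"quoted\e" word\*(Aq, \*(Aqanother\*(Aq)
+\&    print "$_\en" for @strings;
+\&
+\&    my $bad = q{"no closing quote \e"};
+\&    print "no match\en" unless $bad =~ /"(?:[^"\e\e]++|\e\e.)*+"/;
+.Ve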
+.SS "Building a regexp"
+.IX Subsection "Building a regexp"
+At this point, we have all the basic regexp concepts covered, so let's
+give a more involved example of a regular expression.  We will build a
+regexp that matches numbers.
+.PP
+The first task in building a regexp is to decide what we want to match
+and what we want to exclude.  In our case, we want to match both
+integers and floating point numbers and we want to reject any string
+that isn't a number.
+.PP
+The next task is to break the problem down into smaller problems that
+are easily converted into a regexp.
+.PP
+The simplest case is integers.  These consist of a sequence of digits,
+with an optional sign in front.  The digits we can represent with
+\&\f(CW\*(C`\ed+\*(C'\fR and the sign can be matched with \f(CW\*(C`[+\-]\*(C'\fR.  Thus the integer
+regexp is
+.PP
+.Vb 1
+\&    /[+\-]?\ed+/;  # matches integers
+.Ve
+.PP
+A floating point number potentially has a sign, an integral part, a
+decimal point, a fractional part, and an exponent.  One or more of these
+parts is optional, so we need to check out the different
+possibilities.  Floating point numbers which are in proper form include
+123., 0.345, .34, \-1e6, and 25.4E\-72.  As with integers, the sign out
+front is completely optional and can be matched by \f(CW\*(C`[+\-]?\*(C'\fR.  We can
+see that if there is no exponent, floating point numbers must have a
+decimal point, otherwise they are integers.  We might be tempted to
+model these with \f(CW\*(C`\ed*\e.\ed*\*(C'\fR, but this would also match just a single
+decimal point, which is not a number.  So the three cases of floating
+point number without exponent are
+.PP
+.Vb 3
+\&   /[+\-]?\ed+\e./;  # 1., 321., etc.
+\&   /[+\-]?\e.\ed+/;  # .1, .234, etc.
+\&   /[+\-]?\ed+\e.\ed+/;  # 1.0, 30.56, etc.
+.Ve
+.PP
+These can be combined into a single regexp with a three-way alternation:
+.PP
+.Vb 1
+\&   /[+\-]?(\ed+\e.\ed+|\ed+\e.|\e.\ed+)/;  # floating point, no exponent
+.Ve
+.PP
+In this alternation, it is important to put \f(CW\*(Aq\ed+\e.\ed+\*(Aq\fR before
+\&\f(CW\*(Aq\ed+\e.\*(Aq\fR.  If \f(CW\*(Aq\ed+\e.\*(Aq\fR were first, the regexp would happily match that
+and ignore the fractional part of the number.
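+.PP
+A quick way to see this is to leave off the anchors, as in the sketch
+below, which uses an arbitrary test string.  With \f(CW\*(Aq\ed+\e.\*(Aq\fR first, the first
+alternative that works wins and the fraction is lost:
+.PP
+.Vb 4
+\&    my $num = "3.14";
+\&    my ($short) = $num =~ /(\ed+\e.|\ed+\e.\ed+)/;  # \*(Aq3.\*(Aq
+\&    my ($full)  = $num =~ /(\ed+\e.\ed+|\ed+\e.)/;  # \*(Aq3.14\*(Aq
+\&    print "$short $full\en";                     # prints "3. 3.14"
+.Ve
+.PP
+When the whole pattern is anchored with \f(CW\*(Aq^\*(Aq\fR and \f(CW\*(Aq$\*(Aq\fR, backtracking
+will eventually try the longer alternative anyway, so there the order
+mostly affects how much work is done rather than the result.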
+.PP
+Now consider floating point numbers with exponents.  The key
+observation here is that \fIboth\fR integers and numbers with decimal
+points are allowed in front of an exponent.  Then exponents, like the
+overall sign, are independent of whether we are matching numbers with
+or without decimal points, and can be "decoupled" from the
+mantissa.  The overall form of the regexp now becomes clear:
+.PP
+.Vb 1
+\&    /^(optional sign)(integer | f.p. mantissa)(optional exponent)$/;
+.Ve
+.PP
+The exponent is an \f(CW\*(Aqe\*(Aq\fR or \f(CW\*(AqE\*(Aq\fR, followed by an integer.  So the
+exponent regexp is
+.PP
+.Vb 1
+\&   /[eE][+\-]?\ed+/;  # exponent
+.Ve
+.PP
+Putting all the parts together, we get a regexp that matches numbers:
+.PP
+.Vb 1
+\&   /^[+\-]?(\ed+\e.\ed+|\ed+\e.|\e.\ed+|\ed+)([eE][+\-]?\ed+)?$/;  # Ta da!
+.Ve
+.PP
+Long regexps like this may impress your friends, but can be hard to
+decipher.  In complex situations like this, the \f(CW\*(C`/x\*(C'\fR modifier for a
+match is invaluable.  It allows one to put nearly arbitrary whitespace
+and comments into a regexp without affecting its meaning.  Using it,
+we can rewrite our "extended" regexp in the more pleasing form
+.PP
+.Vb 10
+\&   /^
+\&      [+\-]?         # first, match an optional sign
+\&      (             # then match integers or f.p. mantissas:
+\&          \ed+\e.\ed+  # mantissa of the form a.b
+\&         |\ed+\e.     # mantissa of the form a.
+\&         |\e.\ed+     # mantissa of the form .b
+\&         |\ed+       # integer of the form a
+\&      )
+\&      ( [eE] [+\-]? \ed+ )?  # finally, optionally match an exponent
+\&   $/x;
+.Ve
+.PP
+If whitespace is mostly irrelevant, how does one include space
+characters in an extended regexp? The answer is to backslash it
+\&\f(CW\*(Aq\e\ \*(Aq\fR or put it in a character class \f(CW\*(C`[\ ]\*(C'\fR.  The same thing
+goes for pound signs: use \f(CW\*(C`\e#\*(C'\fR or \f(CW\*(C`[#]\*(C'\fR.  For instance, Perl allows
+a space between the sign and the mantissa or integer, and we could add
+this to our regexp as follows:
+.PP
+.Vb 10
+\&   /^
+\&      [+\-]?\e *      # first, match an optional sign *and space*
+\&      (             # then match integers or f.p. mantissas:
+\&          \ed+\e.\ed+  # mantissa of the form a.b
+\&         |\ed+\e.     # mantissa of the form a.
+\&         |\e.\ed+     # mantissa of the form .b
+\&         |\ed+       # integer of the form a
+\&      )
+\&      ( [eE] [+\-]? \ed+ )?  # finally, optionally match an exponent
+\&   $/x;
+.Ve
+.PP
+In this form, it is easier to see a way to simplify the
+alternation.  Alternatives 1, 2, and 4 all start with \f(CW\*(C`\ed+\*(C'\fR, so it
+could be factored out:
+.PP
+.Vb 11
+\&   /^
+\&      [+\-]?\e *      # first, match an optional sign
+\&      (             # then match integers or f.p. mantissas:
+\&          \ed+       # start out with a ...
+\&          (
+\&              \e.\ed* # mantissa of the form a.b or a.
+\&          )?        # ? takes care of integers of the form a
+\&         |\e.\ed+     # mantissa of the form .b
+\&      )
+\&      ( [eE] [+\-]? \ed+ )?  # finally, optionally match an exponent
+\&   $/x;
+.Ve
+.PP
+Starting in Perl v5.26, specifying \f(CW\*(C`/xx\*(C'\fR changes the square-bracketed
+portions of a pattern to ignore tabs and space characters unless they
+are escaped by preceding them with a backslash.  So, we could write
+.PP
+.Vb 11
+\&   /^
+\&      [ + \- ]?\e *   # first, match an optional sign
+\&      (             # then match integers or f.p. mantissas:
+\&          \ed+       # start out with a ...
+\&          (
+\&              \e.\ed* # mantissa of the form a.b or a.
+\&          )?        # ? takes care of integers of the form a
+\&         |\e.\ed+     # mantissa of the form .b
+\&      )
+\&      ( [ e E ] [ + \- ]? \ed+ )?  # finally, optionally match an exponent
+\&   $/xx;
+.Ve
+.PP
+This doesn't really improve the legibility of this example, but it's
+available in case you want it.  Squashing the pattern down to the
+compact form, we have
+.PP
+.Vb 1
+\&    /^[+\-]?\e *(\ed+(\e.\ed*)?|\e.\ed+)([eE][+\-]?\ed+)?$/;
+.Ve
+.PP
+This is our final regexp.  To recap, we built a regexp by
+.IP \(bu 4
+specifying the task in detail,
+.IP \(bu 4
+breaking down the problem into smaller parts,
+.IP \(bu 4
+translating the small parts into regexps,
+.IP \(bu 4
+combining the regexps,
+.IP \(bu 4
+and optimizing the final combined regexp.
+.PP
+These are also the typical steps involved in writing a computer
+program.  This makes perfect sense, because regular expressions are
+essentially programs written in a little computer language that specifies
+patterns.
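+.PP
+As a sanity check, the final regexp can be run over the sample numbers
+given earlier in this section, plus a few strings that should be rejected.
+This is only a quick sketch, and the lists of test strings are our own choice:
+.PP
+.Vb 5
+\&    my $number = qr/^[+\-]?\e *(\ed+(\e.\ed*)?|\e.\ed+)([eE][+\-]?\ed+)?$/;
+\&    my @good = ("123.", "0.345", ".34", "\-1e6", "25.4E\-72", "+ 7");
+\&    my @bad  = (".", "1e", "e5", "1.2.3", "\-");
+\&    my @accepted = grep { $_ =~ $number } @good, @bad;
+\&    print "accepted: $_\en" for @accepted;  # only the six strings in @good
+.Ve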
+.SS "Using regular expressions in Perl"
+.IX Subsection "Using regular expressions in Perl"
+The last topic of Part 1 briefly covers how regexps are used in Perl
+programs.  Where do they fit into Perl syntax?
+.PP
+We have already introduced the matching operator in its default
+\&\f(CW\*(C`/regexp/\*(C'\fR and arbitrary delimiter \f(CW\*(C`m!regexp!\*(C'\fR forms.  We have used
+the binding operator \f(CW\*(C`=~\*(C'\fR and its negation \f(CW\*(C`!~\*(C'\fR to test for string
+matches.  Associated with the matching operator, we have discussed the
+single line \f(CW\*(C`/s\*(C'\fR, multi-line \f(CW\*(C`/m\*(C'\fR, case-insensitive \f(CW\*(C`/i\*(C'\fR and
+extended \f(CW\*(C`/x\*(C'\fR modifiers.  There are a few more things you might
+want to know about matching operators.
+.PP
+\fIProhibiting substitution\fR
+.IX Subsection "Prohibiting substitution"
+.PP
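+Normally, variables in a pattern are interpolated every time the match
+is executed.  With the \f(CW\*(C`/o\*(C'\fR modifier, however, the pattern is
+interpolated and compiled only once, so if you change \f(CW$pattern\fR after
+the first substitution happens, Perl will ignore it.  If you don't want
+any substitutions at all, use the special delimiter \f(CW\*(C`m\*(Aq\*(Aq\*(C'\fR: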
+.PP
+.Vb 4
+\&    @pattern = (\*(AqSeuss\*(Aq);
+\&    while (<>) {
+\&        print if m\*(Aq@pattern\*(Aq;  # matches literal \*(Aq@pattern\*(Aq, not \*(AqSeuss\*(Aq
+\&    }
+.Ve
+.PP
+Just as with strings, \f(CW\*(C`m\*(Aq\*(Aq\*(C'\fR makes the regexp behave like a
+single-quoted string; all other \f(CW\*(Aqm\*(Aq\fR delimiters make it behave like a
+double-quoted string.  If the regexp evaluates to the empty string,
+the regexp in the \fIlast successful match\fR is used instead.  So we have
+.PP
+.Vb 2
+\&    "dog" =~ /d/;  # \*(Aqd\*(Aq matches
+\&    "dogbert" =~ //;  # this matches the \*(Aqd\*(Aq regexp used before
+.Ve
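+.PP
+The difference between the two kinds of delimiter is easy to check
+directly.  In this sketch, where the strings are invented, the
+slash-delimited match interpolates \f(CW@pattern\fR, while the single-quoted one
+looks for the literal text \f(CW\*(Aq@pattern\*(Aq\fR:
+.PP
+.Vb 5
+\&    my @pattern = (\*(AqSeuss\*(Aq);
+\&    my $line = \*(AqDr. Seuss\*(Aq;
+\&    my $interp  = $line =~ m/@pattern/  ? 1 : 0;  # 1: the regexp is \*(AqSeuss\*(Aq
+\&    my $literal = $line =~ m\*(Aq@pattern\*(Aq ? 1 : 0;  # 0: the regexp is \*(Aq@pattern\*(Aq
+\&    print "$interp $literal\en";                  # prints "1 0"
+.Ve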
+.PP
+\fIGlobal matching\fR
+.IX Subsection "Global matching"
+.PP
+The final two modifiers we will discuss here,
+\&\f(CW\*(C`/g\*(C'\fR and \f(CW\*(C`/c\*(C'\fR, concern multiple matches.
+The modifier \f(CW\*(C`/g\*(C'\fR stands for global matching and allows the
+matching operator to match within a string as many times as possible.
+In scalar context, successive invocations against a string will have
+\&\f(CW\*(C`/g\*(C'\fR jump from match to match, keeping track of position in the
+string as it goes along.  You can get or set the position with the
+\&\f(CWpos()\fR function.
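+.PP
+For example, in the sketch below, which uses an arbitrary string, assigning to
+\&\f(CWpos()\fR makes the next scalar-context \f(CW\*(C`/g\*(C'\fR match start partway
+through the string:
+.PP
+.Vb 4
+\&    my $x = "cat dog house";
+\&    pos($x) = 4;                                # skip past "cat "
+\&    my $word = $x =~ /(\ew+)/g ? $1 : "";         # matches "dog"
+\&    print "$word ends at ", pos($x), "\en";       # prints "dog ends at 7"
+.Ve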
+.PP
+The use of \f(CW\*(C`/g\*(C'\fR is shown in the following example.  Suppose we have
+a string that consists of words separated by spaces.  If we know how
+many words there are in advance, we could extract the words using
+groupings:
+.PP
+.Vb 5
+\&    $x = "cat dog house"; # 3 words
+\&    $x =~ /^\es*(\ew+)\es+(\ew+)\es+(\ew+)\es*$/; # matches,
+\&                                           # $1 = \*(Aqcat\*(Aq
+\&                                           # $2 = \*(Aqdog\*(Aq
+\&                                           # $3 = \*(Aqhouse\*(Aq
+.Ve
+.PP
+But what if we had an indeterminate number of words? This is the sort
+of task \f(CW\*(C`/g\*(C'\fR was made for.  To extract all words, form the simple
+regexp \f(CW\*(C`(\ew+)\*(C'\fR and loop over all matches with \f(CW\*(C`/(\ew+)/g\*(C'\fR:
+.PP
+.Vb 3
+\&    while ($x =~ /(\ew+)/g) {
+\&        print "Word is $1, ends at position ", pos $x, "\en";
+\&    }
+.Ve
+.PP
+prints
+.PP
+.Vb 3
+\&    Word is cat, ends at position 3
+\&    Word is dog, ends at position 7
+\&    Word is house, ends at position 13
+.Ve
+.PP
+A failed match or changing the target string resets the position.  If
+you don't want the position reset after failure to match, add the
+\&\f(CW\*(C`/c\*(C'\fR, as in \f(CW\*(C`/regexp/gc\*(C'\fR.  The current position in the string is
+associated with the string, not the regexp.  This means that different
+strings have different positions and their respective positions can be
+set or read independently.
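+.PP
+Here is a small sketch of the difference, using an arbitrary string and
+patterns.  After a failed \f(CW\*(C`/g\*(C'\fR match the position is gone, but after a failed
+\&\f(CW\*(C`/gc\*(C'\fR match it stays where it was:
+.PP
+.Vb 8
+\&    my $x = "cat dog house";
+\&    $x =~ /\ew+/g;                  # matches "cat"; pos($x) is now 3
+\&    $x =~ /zebra/g;                # fails, so pos($x) is reset
+\&    my $after_g = pos($x);         # undef
+\&    $x =~ /\ew+/g;                  # matches "cat" again; pos($x) is 3
+\&    $x =~ /zebra/gc;               # fails, but /c keeps the position
+\&    my $after_gc = pos($x);        # 3
+\&    print defined $after_g ? "kept" : "reset", " $after_gc\en";  # prints "reset 3"
+.Ve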
+.PP
+In list context, \f(CW\*(C`/g\*(C'\fR returns a list of matched groupings, or if
+there are no groupings, a list of matches to the whole regexp.  So if
+we wanted just the words, we could use
+.PP
+.Vb 4
+\&    @words = ($x =~ /(\ew+)/g);  # matches,
+\&                                # $words[0] = \*(Aqcat\*(Aq
+\&                                # $words[1] = \*(Aqdog\*(Aq
+\&                                # $words[2] = \*(Aqhouse\*(Aq
+.Ve
+.PP
+Closely associated with the \f(CW\*(C`/g\*(C'\fR modifier is the \f(CW\*(C`\eG\*(C'\fR anchor.  The
+\&\f(CW\*(C`\eG\*(C'\fR anchor matches at the point where the previous \f(CW\*(C`/g\*(C'\fR match left
+off.  \f(CW\*(C`\eG\*(C'\fR allows us to easily do context-sensitive matching:
+.PP
+.Vb 12
+\&    $metric = 1;  # use metric units
+\&    ...
+\&    $x = <FILE>;  # read in measurement
+\&    $x =~ /^([+\-]?\ed+)\es*/g;  # get magnitude
+\&    $weight = $1;
+\&    if ($metric) { # error checking
+\&        print "Units error!" unless $x =~ /\eGkg\e./g;
+\&    }
+\&    else {
+\&        print "Units error!" unless $x =~ /\eGlbs\e./g;
+\&    }
+\&    $x =~ /\eG\es+(widget|sprocket)/g;  # continue processing
+.Ve
+.PP
+The combination of \f(CW\*(C`/g\*(C'\fR and \f(CW\*(C`\eG\*(C'\fR allows us to process the string a
+bit at a time and use arbitrary Perl logic to decide what to do next.
+Currently, the \f(CW\*(C`\eG\*(C'\fR anchor is only fully supported when used to anchor
+to the start of the pattern.
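+.PP
+A common way to put this to work is a small tokenizer that tries several
+\&\f(CW\*(C`\eG\*(C'\fR\-anchored patterns in turn.  It uses \f(CW\*(C`/gc\*(C'\fR so that a failed
+alternative does not lose the position.  This is only a sketch, and the token
+names and the input are invented for the example:
+.PP
+.Vb 11
+\&    my $input = "x = 42 + y1";
+\&    my @tokens;
+\&    pos($input) = 0;
+\&    while (pos($input) < length $input) {
+\&        if    ($input =~ /\eG\es+/gc)    { next }    # skip blanks
+\&        elsif ($input =~ /\eG(\ed+)/gc)  { push @tokens, "NUM($1)" }
+\&        elsif ($input =~ /\eG(\ew+)/gc)  { push @tokens, "ID($1)" }
+\&        elsif ($input =~ /\eG([=+])/gc) { push @tokens, "OP($1)" }
+\&        else  { die "unexpected character at ", pos($input), "\en" }
+\&    }
+\&    print "@tokens\en";  # prints "ID(x) OP(=) NUM(42) OP(+) ID(y1)"
+.Ve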
+.PP
+\&\f(CW\*(C`\eG\*(C'\fR is also invaluable in processing fixed-length records with
+regexps.  Suppose we have a snippet of coding region DNA, encoded as
+base pair letters \f(CW\*(C`ATCGTTGAAT...\*(C'\fR and we want to find all the stop
+codons \f(CW\*(C`TGA\*(C'\fR.  In a coding region, codons are 3\-letter sequences, so
+we can think of the DNA snippet as a sequence of 3\-letter records.  The
+naive regexp
+.PP
+.Vb 3
+\&    # expanded, this is "ATC GTT GAA TGC AAA TGA CAT GAC"
+\&    $dna = "ATCGTTGAATGCAAATGACATGAC";
+\&    $dna =~ /TGA/;
+.Ve
+.PP
+doesn't work; it may match a \f(CW\*(C`TGA\*(C'\fR, but there is no guarantee that
+the match is aligned with codon boundaries, \fIe.g.\fR, the substring
+\&\f(CW\*(C`GTT\ GAA\*(C'\fR gives a match.  A better solution is
+.PP
+.Vb 3
+\&    while ($dna =~ /(\ew\ew\ew)*?TGA/g) {  # note the minimal *?
+\&        print "Got a TGA stop codon at position ", pos $dna, "\en";
+\&    }
+.Ve
+.PP
+which prints
+.PP
+.Vb 2
+\&    Got a TGA stop codon at position 18
+\&    Got a TGA stop codon at position 23
+.Ve
+.PP
+Position 18 is good, but position 23 is bogus.  What happened?
+.PP
+The answer is that our regexp works well until we get past the last
+real match.  Then the regexp will fail to match a synchronized \f(CW\*(C`TGA\*(C'\fR
+and start stepping ahead one character position at a time, not what we
+want.  The solution is to use \f(CW\*(C`\eG\*(C'\fR to anchor the match to the codon
+alignment:
+.PP
+.Vb 3
+\&    while ($dna =~ /\eG(\ew\ew\ew)*?TGA/g) {
+\&        print "Got a TGA stop codon at position ", pos $dna, "\en";
+\&    }
+.Ve
+.PP
+This prints
+.PP
+.Vb 1
+\&    Got a TGA stop codon at position 18
+.Ve
+.PP
+which is the correct answer.  This example illustrates that it is
+important not only to match what is desired, but to reject what is not
+desired.
+.PP
+(There are other regexp modifiers that are available, such as
+\&\f(CW\*(C`/o\*(C'\fR, but their specialized uses are beyond the
+scope of this introduction.)
+.PP
+\fISearch and replace\fR
+.IX Subsection "Search and replace"
+.PP
+Regular expressions also play a big role in \fIsearch and replace\fR
+operations in Perl.  Search and replace is accomplished with the
+\&\f(CW\*(C`s///\*(C'\fR operator.  The general form is
+\&\f(CW\*(C`s/regexp/replacement/modifiers\*(C'\fR, with everything we know about
+regexps and modifiers applying in this case as well.  The
+\&\fIreplacement\fR is a Perl double-quoted string that replaces whatever
+the \f(CW\*(C`regexp\*(C'\fR matched in the string.  The operator \f(CW\*(C`=~\*(C'\fR is
+also used here to associate a string with \f(CW\*(C`s///\*(C'\fR.  If matching
+against \f(CW$_\fR, the \f(CW\*(C`$_\ =~\*(C'\fR can be dropped.  If there is a match,
+\&\f(CW\*(C`s///\*(C'\fR returns the number of substitutions made; otherwise it returns
+false.  Here are a few examples:
+.PP
+.Vb 8
+\&    $x = "Time to feed the cat!";
+\&    $x =~ s/cat/hacker/;   # $x contains "Time to feed the hacker!"
+\&    if ($x =~ s/^(Time.*hacker)!$/$1 now!/) {
+\&        $more_insistent = 1;
+\&    }
+\&    $y = "\*(Aqquoted words\*(Aq";
+\&    $y =~ s/^\*(Aq(.*)\*(Aq$/$1/;  # strip single quotes,
+\&                           # $y contains "quoted words"
+.Ve
+.PP
+In the last example, the whole string was matched, but only the part
+inside the single quotes was grouped.  With the \f(CW\*(C`s///\*(C'\fR operator, the
+matched variables \f(CW$1\fR, \f(CW$2\fR, \fIetc\fR. are immediately available for use
+in the replacement expression, so we use \f(CW$1\fR to replace the quoted
+string with just what was quoted.  With the global modifier, \f(CW\*(C`s///g\*(C'\fR
+will search and replace all occurrences of the regexp in the string:
+.PP
+.Vb 6
+\&    $x = "I batted 4 for 4";
+\&    $x =~ s/4/four/;   # doesn\*(Aqt do it all:
+\&                       # $x contains "I batted four for 4"
+\&    $x = "I batted 4 for 4";
+\&    $x =~ s/4/four/g;  # does it all:
+\&                       # $x contains "I batted four for four"
+.Ve
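+.PP
+Because \f(CW\*(C`s///\*(C'\fR returns the number of substitutions it made, you also
+get a count with no extra work.  Here is a small sketch that reuses the string above:
+.PP
+.Vb 4
+\&    my $x = "I batted 4 for 4";
+\&    my $count = ($x =~ s/4/four/g);   # 2
+\&    my $none  = ($x =~ s/5/five/g);   # false: nothing to replace
+\&    print "$count substitutions: $x\en"; # prints "2 substitutions: I batted four for four"
+.Ve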
+.PP
+If you prefer "regex" over "regexp" in this tutorial, you could use
+the following program to replace it:
+.PP
+.Vb 9
+\&    % cat > simple_replace
+\&    #!/usr/bin/perl
+\&    $regexp = shift;
+\&    $replacement = shift;
+\&    while (<>) {
+\&        s/$regexp/$replacement/g;
+\&        print;
+\&    }
+\&    ^D
+\&
+\&    % simple_replace regexp regex perlretut.pod
+.Ve
+.PP
+In \f(CW\*(C`simple_replace\*(C'\fR we used the \f(CW\*(C`s///g\*(C'\fR modifier to replace all
+occurrences of the regexp on each line.  (Even though the regular
+expression appears in a loop, Perl is smart enough to compile it
+only once.)  As with \f(CW\*(C`simple_grep\*(C'\fR, both the
+\&\f(CW\*(C`print\*(C'\fR and the \f(CW\*(C`s/$regexp/$replacement/g\*(C'\fR use \f(CW$_\fR implicitly.
+.PP
+If you don't want \f(CW\*(C`s///\*(C'\fR to change your original variable you can use
+the non-destructive substitute modifier, \f(CW\*(C`s///r\*(C'\fR.  This changes the
+behavior so that \f(CW\*(C`s///r\*(C'\fR returns the final substituted string
+(instead of the number of substitutions):
+.PP
+.Vb 3
+\&    $x = "I like dogs.";
+\&    $y = $x =~ s/dogs/cats/r;
+\&    print "$x $y\en";
+.Ve
+.PP
+That example will print "I like dogs. I like cats."  Notice the original
+\&\f(CW$x\fR variable has not been affected. The overall
+result of the substitution is instead stored in \f(CW$y\fR. If the
+substitution doesn't affect anything then the original string is
+returned:
+.PP
+.Vb 3
+\&    $x = "I like dogs.";
+\&    $y = $x =~ s/elephants/cougars/r;
+\&    print "$x $y\en"; # prints "I like dogs. I like dogs."
+.Ve
+.PP
+One other interesting thing that the \f(CW\*(C`s///r\*(C'\fR flag allows is chaining
+substitutions:
+.PP
+.Vb 4
+\&    $x = "Cats are great.";
+\&    print $x =~ s/Cats/Dogs/r =~ s/Dogs/Frogs/r =~
+\&        s/Frogs/Hedgehogs/r, "\en";
+\&    # prints "Hedgehogs are great."
+.Ve
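+.PP
+\&\f(CW\*(C`s///r\*(C'\fR is also convenient inside \f(CWmap\fR, where a plain \f(CW\*(C`s///\*(C'\fR on
+\&\f(CW$_\fR would modify the original array through aliasing.  Here is a short sketch with
+made-up data:
+.PP
+.Vb 4
+\&    my @pets  = ("dogs", "hotdogs", "cats");
+\&    my @moved = map { s/dogs/frogs/r } @pets;
+\&    print "@pets\en";    # prints "dogs hotdogs cats"
+\&    print "@moved\en";   # prints "frogs hotfrogs cats"
+.Ve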
+.PP
+A modifier available specifically to search and replace is the
+\&\f(CW\*(C`s///e\*(C'\fR evaluation modifier.  \f(CW\*(C`s///e\*(C'\fR treats the
+replacement text as Perl code, rather than a double-quoted
+string.  The value that the code returns is substituted for the
+matched substring.  \f(CW\*(C`s///e\*(C'\fR is useful if you need to do a bit of
+computation in the process of replacing text.  This example counts
+character frequencies in a line:
+.PP
+.Vb 4
+\&    $x = "Bill the cat";
+\&    $x =~ s/(.)/$chars{$1}++;$1/eg; # final $1 replaces char with itself
+\&    print "frequency of \*(Aq$_\*(Aq is $chars{$_}\en"
+\&        foreach (sort {$chars{$b} <=> $chars{$a}} keys %chars);
+.Ve
+.PP
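+This prints something like the following (characters with the same count
+may come out in a different order, because the keys of a hash are
+returned in no particular order):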
+.PP
+.Vb 9
+\&    frequency of \*(Aq \*(Aq is 2
+\&    frequency of \*(Aqt\*(Aq is 2
+\&    frequency of \*(Aql\*(Aq is 2
+\&    frequency of \*(AqB\*(Aq is 1
+\&    frequency of \*(Aqc\*(Aq is 1
+\&    frequency of \*(Aqe\*(Aq is 1
+\&    frequency of \*(Aqh\*(Aq is 1
+\&    frequency of \*(Aqi\*(Aq is 1
+\&    frequency of \*(Aqa\*(Aq is 1
+.Ve
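+.PP
+\&\f(CW\*(C`s///e\*(C'\fR is also handy for simple arithmetic on the matched text.  The
+sketch below doubles every number in an invented string.  It combines
+\&\f(CW\*(C`/e\*(C'\fR with \f(CW\*(C`/g\*(C'\fR and \f(CW\*(C`/r\*(C'\fR so that the original string is left alone:
+.PP
+.Vb 3
+\&    my $order   = "3 apples and 12 pears";
+\&    my $doubled = $order =~ s/(\ed+)/$1 * 2/egr;
+\&    print "$doubled\en";   # prints "6 apples and 24 pears"
+.Ve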
+.PP
+As with the match \f(CW\*(C`m//\*(C'\fR operator, \f(CW\*(C`s///\*(C'\fR can use other delimiters,
+such as \f(CW\*(C`s!!!\*(C'\fR and \f(CW\*(C`s{}{}\*(C'\fR, and even \f(CW\*(C`s{}//\*(C'\fR.  If single quotes are
+used \f(CW\*(C`s\*(Aq\*(Aq\*(Aq\*(C'\fR, then the regexp and replacement are
+treated as single-quoted strings and there are no
+variable substitutions.  \f(CW\*(C`s///\*(C'\fR in list context
+returns the same thing as in scalar context, \fIi.e.\fR, the number of
+matches.
+.PP
+\fIThe split function\fR
+.IX Subsection "The split function"
+.PP
+The \f(CWsplit()\fR function is another place where a regexp is used.
+\&\f(CW\*(C`split /regexp/, string, limit\*(C'\fR separates the \f(CW\*(C`string\*(C'\fR operand into
+a list of substrings and returns that list.  The regexp must be designed
+to match whatever constitutes the separators for the desired substrings.
+The \f(CW\*(C`limit\*(C'\fR, if present, constrains splitting into no more than \f(CW\*(C`limit\*(C'\fR
+number of strings.  For example, to split a string into words, use
+.PP
+.Vb 4
+\&    $x = "Calvin and Hobbes";
+\&    @words = split /\es+/, $x;  # $words[0] = \*(AqCalvin\*(Aq
+\&                               # $words[1] = \*(Aqand\*(Aq
+\&                               # $words[2] = \*(AqHobbes\*(Aq
+.Ve
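+.PP
+The effect of \f(CW\*(C`limit\*(C'\fR is easiest to see in a short sketch with made-up
+data.  Once the limit is reached, the last field keeps any remaining separators:
+.PP
+.Vb 4
+\&    my $line = "name:Calvin:age:6";
+\&    my @all  = split /:/, $line;     # (\*(Aqname\*(Aq, \*(AqCalvin\*(Aq, \*(Aqage\*(Aq, \*(Aq6\*(Aq)
+\&    my @two  = split /:/, $line, 2;  # (\*(Aqname\*(Aq, \*(AqCalvin:age:6\*(Aq)
+\&    print scalar(@all), " ", scalar(@two), "\en";  # prints "4 2"
+.Ve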
+.PP
+If the empty regexp \f(CW\*(C`//\*(C'\fR is used, the regexp always matches and
+the string is split into individual characters.  If the regexp has
+groupings, then the resulting list contains the matched substrings from the
+groupings as well.  For instance,
+.PP
+.Vb 12
+\&    $x = "/usr/bin/perl";
+\&    @dirs = split m!/!, $x;  # $dirs[0] = \*(Aq\*(Aq
+\&                             # $dirs[1] = \*(Aqusr\*(Aq
+\&                             # $dirs[2] = \*(Aqbin\*(Aq
+\&                             # $dirs[3] = \*(Aqperl\*(Aq
+\&    @parts = split m!(/)!, $x;  # $parts[0] = \*(Aq\*(Aq
+\&                                # $parts[1] = \*(Aq/\*(Aq
+\&                                # $parts[2] = \*(Aqusr\*(Aq
+\&                                # $parts[3] = \*(Aq/\*(Aq
+\&                                # $parts[4] = \*(Aqbin\*(Aq
+\&                                # $parts[5] = \*(Aq/\*(Aq
+\&                                # $parts[6] = \*(Aqperl\*(Aq
+.Ve
+.PP
+Since the first character of \f(CW$x\fR matched the regexp, \f(CW\*(C`split\*(C'\fR prepended
+an empty initial element to the list.
+.PP
+If you have read this far, congratulations! You now have all the basic
+tools needed to use regular expressions to solve a wide range of text
+processing problems.  If this is your first time through the tutorial,
+why not stop here and play around with regexps a while....  Part\ 2
+concerns the more esoteric aspects of regular expressions and those
+concepts certainly aren't needed right at the start.
+.SH "Part 2: Power tools"
+.IX Header "Part 2: Power tools"
+OK, you know the basics of regexps and you want to know more.  If
+matching regular expressions is analogous to a walk in the woods, then
+the tools discussed in Part 1 are analogous to topo maps and a
+compass, basic tools we use all the time.  Most of the tools in Part 2
+are analogous to flare guns and satellite phones.  They aren't used
+too often on a hike, but when we are stuck, they can be invaluable.
+.PP
+What follows are the more advanced, less used, or sometimes esoteric
+capabilities of Perl regexps.  In Part 2, we will assume you are
+comfortable with the basics and concentrate on the advanced features.
+.SS "More on characters, strings, and character classes"
+.IX Subsection "More on characters, strings, and character classes"
+There are a number of escape sequences and character classes that we
+haven't covered yet.
+.PP
+There are several escape sequences that convert characters or strings
+between upper and lower case, and they are also available within
+patterns.  \f(CW\*(C`\el\*(C'\fR and \f(CW\*(C`\eu\*(C'\fR convert the next character to lower or
+upper case, respectively:
+.PP
+.Vb 4
+\&    $x = "perl";
+\&    $string =~ /\eu$x/;  # matches \*(AqPerl\*(Aq in $string
+\&    $x = "M(rs?|s)\e\e."; # note the double backslash
+\&    $string =~ /\el$x/;  # matches \*(Aqmr.\*(Aq, \*(Aqmrs.\*(Aq, and \*(Aqms.\*(Aq
+.Ve
+.PP
+A \f(CW\*(C`\eL\*(C'\fR or \f(CW\*(C`\eU\*(C'\fR indicates a lasting conversion of case, until
+terminated by \f(CW\*(C`\eE\*(C'\fR or thrown over by another \f(CW\*(C`\eU\*(C'\fR or \f(CW\*(C`\eL\*(C'\fR:
+.PP
+.Vb 4
+\&    $x = "This word is in lower case:\eL SHOUT\eE";
+\&    $x =~ /shout/;       # matches
+\&    $x = "I STILL KEYPUNCH CARDS FOR MY 360";
+\&    $x =~ /\eUkeypunch/;  # matches punch card string
+.Ve
+.PP
+If there is no \f(CW\*(C`\eE\*(C'\fR, case is converted until the end of the
+string. The regexps \f(CW\*(C`\eL\eu$word\*(C'\fR or \f(CW\*(C`\eu\eL$word\*(C'\fR convert the first
+character of \f(CW$word\fR to uppercase and the rest of the characters to
+lowercase.  (Beyond ASCII characters, it gets somewhat more complicated;
+\&\f(CW\*(C`\eu\*(C'\fR actually performs \fItitlecase\fR mapping, which for most characters
+is the same as uppercase, but not for all; see
+<https://unicode.org/faq/casemap_charprop.html#4>.)
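+.PP
+Here is a quick sketch of both orders, in a string and in a pattern.  The
+word is arbitrary:
+.PP
+.Vb 5
+\&    my $word   = "hELLO";
+\&    my $first  = "\eL\eu$word";   # "Hello"
+\&    my $second = "\eu\eL$word";   # "Hello"
+\&    my $match  = "Hello there" =~ /^\eu\eL$word\eE there/ ? 1 : 0;  # 1
+\&    print "$first $second $match\en";  # prints "Hello Hello 1"
+.Ve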
+.PP
+Control characters can be escaped with \f(CW\*(C`\ec\*(C'\fR, so that a control-Z
+character would be matched with \f(CW\*(C`\ecZ\*(C'\fR.  The escape sequence
+\&\f(CW\*(C`\eQ\*(C'\fR...\f(CW\*(C`\eE\*(C'\fR quotes, or protects most non-alphabetic characters.   For
+instance,
+.PP
+.Vb 2
+\&    $x = "That !^*&%~& cat!";
+\&    $x =~ /\eQ!^*&%~&\eE/;  # check for rough language
+.Ve
+.PP
+It does not protect \f(CW\*(Aq$\*(Aq\fR or \f(CW\*(Aq@\*(Aq\fR, so that variables can still be
+substituted.
+.PP
+\&\f(CW\*(C`\eQ\*(C'\fR, \f(CW\*(C`\eL\*(C'\fR, \f(CW\*(C`\el\*(C'\fR, \f(CW\*(C`\eU\*(C'\fR, \f(CW\*(C`\eu\*(C'\fR and \f(CW\*(C`\eE\*(C'\fR are actually part of
+double-quotish syntax, and not part of regexp syntax proper.  They will
+work if they appear in a regular expression embedded directly in a
+program, but not when contained in a string that is interpolated in a
+pattern.
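+.PP
+Here is a short sketch of both points, where \f(CW$lit\fR is just an example value.
+The variable is interpolated first and its contents are then quoted, so
+\&\f(CW\*(C`\eQ\*(C'\fR is a common way to match user\-supplied text literally.  When the
+pattern text is built in a string beforehand, the built\-in \f(CW\*(C`quotemeta\*(C'\fR
+function does the same quoting:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $lit = "a.b";                      # example value
+\&    print "1\en" if "a.b" =~ /\eQ$lit\eE/;   # prints: the dot is matched literally
+\&    print "2\en" if "axb" =~ /\eQ$lit\eE/;   # prints nothing
+\&    print "3\en" if "axb" =~ /$lit/;       # prints: an unquoted dot matches any char
+\&    my $quoted = quotemeta $lit;          # "a\e.b", quoted before interpolation
+\&    print "4\en" if "axb" =~ /$quoted/;    # prints nothing
+.Ve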
+.PP
+Perl regexps can handle more than just the
+standard ASCII character set.  Perl supports \fIUnicode\fR, a standard
+for representing the alphabets from virtually all of the world's written
+languages, and a host of symbols.  Perl's text strings are Unicode strings, so
+they can contain characters with a value (codepoint or character number) higher
+than 255.
+.PP
+What does this mean for regexps? Well, regexp users don't need to know
+much about Perl's internal representation of strings.  But they do need
+to know 1) how to represent Unicode characters in a regexp and 2) that
+a matching operation will treat the string to be searched as a sequence
+of characters, not bytes.  The answer to 1) is that Unicode characters
+greater than \f(CWchr(255)\fR are represented using the \f(CW\*(C`\ex{hex}\*(C'\fR notation, because
+\&\f(CW\*(C`\ex\*(C'\fR\fIXY\fR (without curly braces, where \fIXY\fR is two hex digits) doesn't
+go further than 255.  (Starting in Perl 5.14, if you're an octal fan,
+you can also use \f(CW\*(C`\eo{oct}\*(C'\fR.)
+.PP
+.Vb 2
+\&    /\ex{263a}/;   # match a Unicode smiley face :)
+\&    /\ex{ 263a }/; # Same
+.Ve
+.PP
+\&\fBNOTE\fR: In Perl 5.6.0 it used to be that one needed to say \f(CW\*(C`use
+utf8\*(C'\fR to use any Unicode features.  This is no longer the case: for
+almost all Unicode processing, the explicit \f(CW\*(C`utf8\*(C'\fR pragma is not
+needed.  (The only case where it matters is when your Perl source code itself
+contains non\-ASCII characters encoded in UTF\-8; then an explicit \f(CW\*(C`use utf8\*(C'\fR is needed.)
+.PP
+Figuring out the hexadecimal sequence of a Unicode character you want
+or deciphering someone else's hexadecimal Unicode regexp is about as
+much fun as programming in machine code.  So another way to specify
+Unicode characters is to use the \fInamed character\fR escape
+sequence \f(CW\*(C`\eN{\fR\f(CIname\fR\f(CW}\*(C'\fR.  \fIname\fR is a name for the Unicode character, as
+specified in the Unicode standard.  For instance, if we wanted to
+represent or match the astrological sign for the planet Mercury, we
+could use
+.PP
+.Vb 3
+\&    $x = "abc\eN{MERCURY}def";
+\&    $x =~ /\eN{MERCURY}/;   # matches
+\&    $x =~ /\eN{ MERCURY }/; # Also matches
+.Ve
+.PP
+One can also use "short" names:
+.PP
+.Vb 2
+\&    print "\eN{GREEK SMALL LETTER SIGMA} is called sigma.\en";
+\&    print "\eN{greek:Sigma} is an upper\-case sigma.\en";
+.Ve
+.PP
+You can also restrict names to a certain alphabet by specifying the
+charnames pragma:
+.PP
+.Vb 2
+\&    use charnames qw(greek);
+\&    print "\eN{sigma} is Greek sigma\en";
+.Ve
+.PP
+An index of character names is available on-line from the Unicode
+Consortium, <https://www.unicode.org/charts/charindex.html>; explanatory
+material with links to other resources at
+<https://www.unicode.org/standard/where>.
+.PP
+Starting in Perl v5.32, an alternative to \f(CW\*(C`\eN{...}\*(C'\fR for full names is
+available, and that is to say
+.PP
+.Vb 1
+\& /\ep{Name=greek small letter sigma}/
+.Ve
+.PP
+The casing of the character name is irrelevant when used in \f(CW\*(C`\ep{}\*(C'\fR, as
+are most spaces, underscores and hyphens.  (For a few outlier characters,
+these cannot all be ignored; the details, which you can look up when you get
+more proficient and if ever needed, are in
+<https://www.unicode.org/reports/tr44/tr44\-24.html#UAX44\-LM2>.)
+.PP
+The answer to requirement 2) is that a regexp (mostly)
+uses Unicode characters.  The "mostly" is for messy backward
+compatibility reasons, but starting in Perl 5.14, any regexp compiled in
+the scope of a \f(CW\*(C`use feature \*(Aqunicode_strings\*(Aq\*(C'\fR (which is automatically
+turned on within the scope of a \f(CW\*(C`use v5.12\*(C'\fR or higher) will turn that
+"mostly" into "always".  If you want to handle Unicode properly, you
+should ensure that \f(CW\*(Aqunicode_strings\*(Aq\fR is turned on.
+Internally, a string is stored as bytes using either UTF\-8 or a native 8
+bit encoding, depending on the history of the string, but conceptually
+it is a sequence of characters, not bytes. See perlunitut for a
+tutorial about that.
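+.PP
+The difference shows up with characters in the 128\-255 range that are stored
+without UTF\-8 encoding.  The following is a minimal sketch that assumes no
+\&\f(CW\*(C`use locale\*(C'\fR is in effect.  The same \f(CW\*(C`\ew\*(C'\fR test gives different
+answers depending on whether the pattern was compiled inside the feature's scope:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $e_acute = "\exE9";    # LATIN SMALL LETTER E WITH ACUTE
+\&
+\&    # compiled without the feature: old rules, not a word character
+\&    my $old = $e_acute =~ /\ew/ ? "word" : "not word";
+\&    my $new;
+\&    {
+\&        use feature \*(Aqunicode_strings\*(Aq;
+\&        # compiled inside its scope: Unicode rules, a word character
+\&        $new = $e_acute =~ /\ew/ ? "word" : "not word";
+\&    }
+\&    print "$old / $new\en";   # not word / word
+.Ve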
+.PP
+Let us now discuss Unicode character classes, most usually called
+"character properties".  These are represented by the \f(CW\*(C`\ep{\fR\f(CIname\fR\f(CW}\*(C'\fR
+escape sequence.  The negation of this is \f(CW\*(C`\eP{\fR\f(CIname\fR\f(CW}\*(C'\fR.  For example,
+to match lower and uppercase characters,
+.PP
+.Vb 5
+\&    $x = "BOB";
+\&    $x =~ /^\ep{IsUpper}/;   # matches, uppercase char class
+\&    $x =~ /^\eP{IsUpper}/;   # doesn\*(Aqt match, char class sans uppercase
+\&    $x =~ /^\ep{IsLower}/;   # doesn\*(Aqt match, lowercase char class
+\&    $x =~ /^\eP{IsLower}/;   # matches, char class sans lowercase
+.Ve
+.PP
+(The "\f(CW\*(C`Is\*(C'\fR" is optional.)
+.PP
+There are many, many Unicode character properties.  For the full list
+see perluniprops.  Most of them have synonyms with shorter names,
+also listed there.  Some synonyms are a single character.  For these,
+you can drop the braces.  For instance, \f(CW\*(C`\epM\*(C'\fR is the same thing as
+\&\f(CW\*(C`\ep{Mark}\*(C'\fR, meaning things like accent marks.
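+.PP
+For example, with a decomposed "e" plus accent (a made\-up test string), the
+one\-letter synonym and the full name find the same mark:
+.PP
+.Vb 6
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $s = "e\ex{301}";      # e followed by COMBINING ACUTE ACCENT
+\&    print "mark\en"      if $s =~ /\epM/;
+\&    print "also mark\en" if $s =~ /\ep{Mark}/;
+.Ve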
+.PP
+The Unicode \f(CW\*(C`\ep{Script}\*(C'\fR and \f(CW\*(C`\ep{Script_Extensions}\*(C'\fR properties are
+used to categorize every Unicode character into the language script it
+is written in.  For example,
+English, French, and a bunch of other European languages are written in
+the Latin script.  But there is also the Greek script, the Thai script,
+the Katakana script, \fIetc\fR.  (\f(CW\*(C`Script\*(C'\fR is an older, less advanced,
+form of \f(CW\*(C`Script_Extensions\*(C'\fR, retained only for backwards
+compatibility.)  You can test whether a character is in a particular
+script with, for example, \f(CW\*(C`\ep{Latin}\*(C'\fR, \f(CW\*(C`\ep{Greek}\*(C'\fR, or
+\&\f(CW\*(C`\ep{Katakana}\*(C'\fR.  To test if it isn't in the Balinese script, you would
+use \f(CW\*(C`\eP{Balinese}\*(C'\fR.  (These all use \f(CW\*(C`Script_Extensions\*(C'\fR under the
+hood, as that gives better results.)
+.PP
+What we have described so far is the single form of the \f(CW\*(C`\ep{...}\*(C'\fR character
+classes.  There is also a compound form which you may run into.  These
+look like \f(CW\*(C`\ep{\fR\f(CIname\fR\f(CW=\fR\f(CIvalue\fR\f(CW}\*(C'\fR or \f(CW\*(C`\ep{\fR\f(CIname\fR\f(CW:\fR\f(CIvalue\fR\f(CW}\*(C'\fR (the equals sign and colon
+can be used interchangeably).  These are more general than the single form,
+and in fact most of the single forms are just Perl-defined shortcuts for common
+compound forms.  For example, the script examples in the previous paragraph
+could be written equivalently as \f(CW\*(C`\ep{Script_Extensions=Latin}\*(C'\fR, \f(CW\*(C`\ep{Script_Extensions:Greek}\*(C'\fR,
+\&\f(CW\*(C`\ep{script_extensions=katakana}\*(C'\fR, and \f(CW\*(C`\eP{script_extensions=balinese}\*(C'\fR (case is irrelevant
+between the \f(CW\*(C`{}\*(C'\fR braces).  You may
+never have to use the compound forms, but sometimes it is necessary, and their
+use can make your code easier to understand.
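+.PP
+A small sketch that applies the single and compound forms to one Greek letter
+(the variable name is arbitrary).  Both forms give the same answer:
+.PP
+.Vb 8
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $alpha = "\ex{3B1}";    # GREEK SMALL LETTER ALPHA
+\&    print "single\en"    if $alpha =~ /\ep{Greek}/;
+\&    print "compound\en"  if $alpha =~ /\ep{Script_Extensions=Greek}/;
+\&    print "not Latin\en" if $alpha =~ /\eP{Latin}/;
+.Ve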
+.PP
+\&\f(CW\*(C`\eX\*(C'\fR matches a Unicode \fIextended grapheme cluster\fR.  This represents a
+"logical character": what appears to be a single character, but may be
+represented internally by more than one.  As an example, using the Unicode
+full names, "A\ +\ COMBINING\ RING\ ABOVE" is a grapheme cluster with base
+character "A" and combining character "COMBINING\ RING\ ABOVE".  Together they
+display as "Å", a letter used in Danish and other Nordic languages, as in the word Ångström.
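+.PP
+The "A" plus ring example can be checked directly.  This sketch counts code
+points with \f(CW\*(C`length\*(C'\fR and counts logical characters by collecting the
+\&\f(CW\*(C`\eX\*(C'\fR matches:
+.PP
+.Vb 8
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $s = "A\ex{30A}";                   # A + COMBINING RING ABOVE
+\&    my @clusters = $s =~ /(\eX)/g;
+\&    printf "%d code points, %d grapheme cluster(s)\en",
+\&        length $s, scalar @clusters;       # 2 code points, 1 grapheme cluster(s)
+.Ve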
+.PP
+For the full and latest information about Unicode see the latest
+Unicode standard, or the Unicode Consortium's website <https://www.unicode.org>.
+.PP
+As if all those classes weren't enough, Perl also defines POSIX-style
+character classes.  These have the form \f(CW\*(C`[:\fR\f(CIname\fR\f(CW:]\*(C'\fR, with \fIname\fR the
+name of the POSIX class.  The POSIX classes are \f(CW\*(C`alpha\*(C'\fR, \f(CW\*(C`alnum\*(C'\fR,
+\&\f(CW\*(C`ascii\*(C'\fR, \f(CW\*(C`cntrl\*(C'\fR, \f(CW\*(C`digit\*(C'\fR, \f(CW\*(C`graph\*(C'\fR, \f(CW\*(C`lower\*(C'\fR, \f(CW\*(C`print\*(C'\fR, \f(CW\*(C`punct\*(C'\fR,
+\&\f(CW\*(C`space\*(C'\fR, \f(CW\*(C`upper\*(C'\fR, and \f(CW\*(C`xdigit\*(C'\fR, and two extensions, \f(CW\*(C`word\*(C'\fR (a Perl
+extension to match \f(CW\*(C`\ew\*(C'\fR), and \f(CW\*(C`blank\*(C'\fR (a GNU extension).  The \f(CW\*(C`/a\*(C'\fR
+modifier restricts these to matching just in the ASCII range; otherwise
+they can match the same as their corresponding Perl Unicode classes:
+\&\f(CW\*(C`[:upper:]\*(C'\fR is the same as \f(CW\*(C`\ep{IsUpper}\*(C'\fR, \fIetc\fR.  (There are some
+exceptions and gotchas with this; see perlrecharclass for a full
+discussion.) The \f(CW\*(C`[:digit:]\*(C'\fR, \f(CW\*(C`[:word:]\*(C'\fR, and
+\&\f(CW\*(C`[:space:]\*(C'\fR correspond to the familiar \f(CW\*(C`\ed\*(C'\fR, \f(CW\*(C`\ew\*(C'\fR, and \f(CW\*(C`\es\*(C'\fR
+character classes.  To negate a POSIX class, put a \f(CW\*(Aq^\*(Aq\fR in front of
+the name, so that, \fIe.g.\fR, \f(CW\*(C`[:^digit:]\*(C'\fR corresponds to \f(CW\*(C`\eD\*(C'\fR and, under
+Unicode, \f(CW\*(C`\eP{IsDigit}\*(C'\fR.  The Unicode and POSIX character classes can
+be used just like \f(CW\*(C`\ed\*(C'\fR, with the exception that POSIX character
+classes can only be used inside of a character class:
+.PP
+.Vb 6
+\&    /\es+[abc[:digit:]xyz]\es*/;  # match a,b,c,x,y,z, or a digit
+\&    /^=item\es[[:digit:]]/;      # match \*(Aq=item\*(Aq,
+\&                                # followed by a space and a digit
+\&    /\es+[abc\ep{IsDigit}xyz]\es+/;  # match a,b,c,x,y,z, or a digit
+\&    /^=item\es\ep{IsDigit}/;        # match \*(Aq=item\*(Aq,
+\&                                  # followed by a space and a digit
+.Ve
+.PP
+Whew! That is all the rest of the characters and character classes.
+.SS "Compiling and saving regular expressions"
+.IX Subsection "Compiling and saving regular expressions"
+In Part 1 we mentioned that Perl compiles a regexp into a compact
+sequence of opcodes.  Thus, a compiled regexp is a data structure
+that can be stored once and used again and again.  The regexp quote
+\&\f(CW\*(C`qr//\*(C'\fR does exactly that: \f(CW\*(C`qr/string/\*(C'\fR compiles the \f(CW\*(C`string\*(C'\fR as a
+regexp and transforms the result into a form that can be assigned to a
+variable:
+.PP
+.Vb 1
+\&    $reg = qr/foo+bar?/;  # reg contains a compiled regexp
+.Ve
+.PP
+Then \f(CW$reg\fR can be used as a regexp:
+.PP
+.Vb 3
+\&    $x = "fooooba";
+\&    $x =~ $reg;     # matches, just like /foo+bar?/
+\&    $x =~ /$reg/;   # same thing, alternate form
+.Ve
+.PP
+\&\f(CW$reg\fR can also be interpolated into a larger regexp:
+.PP
+.Vb 1
+\&    $x =~ /(abc)?$reg/;  # still matches
+.Ve
+.PP
+As with the matching operator, the regexp quote can use different
+delimiters, \fIe.g.\fR, \f(CW\*(C`qr!!\*(C'\fR, \f(CW\*(C`qr{}\*(C'\fR or \f(CW\*(C`qr~~\*(C'\fR.  Apostrophes
+as delimiters (\f(CW\*(C`qr\*(Aq\*(Aq\*(C'\fR) inhibit any interpolation.
+.PP
+Pre-compiled regexps are useful for creating dynamic matches that
+don't need to be recompiled each time they are encountered.  Using
+pre-compiled regexps, we write a \f(CW\*(C`grep_step\*(C'\fR program which greps
+for a sequence of patterns, advancing to the next pattern as soon
+as one has been satisfied.
+.PP
+.Vb 4
+\&    % cat > grep_step
+\&    #!/usr/bin/perl
+\&    # grep_step \- match <number> regexps, one after the other
+\&    # usage: grep_step <number> regexp1 regexp2 ... file1 file2 ...
+\&
+\&    $number = shift;
+\&    $regexp[$_] = shift foreach (0..$number\-1);
+\&    @compiled = map qr/$_/, @regexp;
+\&    while ($line = <>) {
+\&        if ($line =~ /$compiled[0]/) {
+\&            print $line;
+\&            shift @compiled;
+\&            last unless @compiled;
+\&        }
+\&    }
+\&    ^D
+\&
+\&    % grep_step 3 shift print last grep_step
+\&    $number = shift;
+\&            print $line;
+\&            last unless @compiled;
+.Ve
+.PP
+Storing pre-compiled regexps in an array \f(CW@compiled\fR allows us to
+simply loop through the regexps without any recompilation, thus gaining
+flexibility without sacrificing speed.
+.SS "Composing regular expressions at runtime"
+.IX Subsection "Composing regular expressions at runtime"
+Backtracking is more efficient than repeated tries with different regular
+expressions.  If there are several regular expressions and a match with
+any of them is acceptable, then it is possible to combine them into a set
+of alternatives.  If the individual expressions are input data, this
+can be done by programming a join operation.  We'll exploit this idea in
+an improved version of the \f(CW\*(C`simple_grep\*(C'\fR program: a program that matches
+multiple patterns:
+.PP
+.Vb 4
+\&    % cat > multi_grep
+\&    #!/usr/bin/perl
+\&    # multi_grep \- match any of <number> regexps
+\&    # usage: multi_grep <number> regexp1 regexp2 ... file1 file2 ...
+\&
+\&    $number = shift;
+\&    $regexp[$_] = shift foreach (0..$number\-1);
+\&    $pattern = join \*(Aq|\*(Aq, @regexp;
+\&
+\&    while ($line = <>) {
+\&        print $line if $line =~ /$pattern/;
+\&    }
+\&    ^D
+\&
+\&    % multi_grep 2 shift for multi_grep
+\&    $number = shift;
+\&    $regexp[$_] = shift foreach (0..$number\-1);
+.Ve
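+.PP
+The same idea also works with patterns that are already compiled.  In this
+sketch (the patterns are only examples), \f(CW\*(C`join\*(C'\fR stringifies each
+\&\f(CW\*(C`qr//\*(C'\fR object as a self\-contained group such as \f(CW\*(C`(?^i:ba[rz])\*(C'\fR, so
+every alternative keeps its own modifiers:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my @compiled = (qr/foo+/, qr/ba[rz]/i);
+\&    my $any = join "|", @compiled;    # "(?^:foo+)|(?^i:ba[rz])"
+\&    my $re  = qr/$any/;
+\&    for my $s ("fooo", "BAZ", "qux") {
+\&        print "$s: ", ($s =~ $re ? "match" : "no match"), "\en";
+\&    }
+.Ve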
+.PP
+Sometimes it is advantageous to construct a pattern from the \fIinput\fR
+that is to be analyzed and use the permissible values on the left
+hand side of the matching operations.  As an example for this somewhat
+paradoxical situation, let's assume that our input contains a command
+verb which should match one out of a set of available command verbs,
+with the additional twist that commands may be abbreviated as long as
+the given string is unique. The program below demonstrates the basic
+algorithm.
+.PP
+.Vb 10
+\&    % cat > keymatch
+\&    #!/usr/bin/perl
+\&    $kwds = \*(Aqcopy compare list print\*(Aq;
+\&    while( $cmd = <> ){
+\&        $cmd =~ s/^\es+|\es+$//g;  # trim leading and trailing spaces
+\&        if( ( @matches = $kwds =~ /\eb$cmd\ew*/g ) == 1 ){
+\&            print "command: \*(Aq@matches\*(Aq\en";
+\&        } elsif( @matches == 0 ){
+\&            print "no such command: \*(Aq$cmd\*(Aq\en";
+\&        } else {
+\&            print "not unique: \*(Aq$cmd\*(Aq (could be one of: @matches)\en";
+\&        }
+\&    }
+\&    ^D
+\&
+\&    % keymatch
+\&    li
+\&    command: \*(Aqlist\*(Aq
+\&    co
+\&    not unique: \*(Aqco\*(Aq (could be one of: copy compare)
+\&    printer
+\&    no such command: \*(Aqprinter\*(Aq
+.Ve
+.PP
+Rather than trying to match the input against the keywords, we match the
+combined set of keywords against the input.  The pattern matching
+operation \f(CW\*(C`$kwds\ =~\ /\eb$cmd\ew*/g\*(C'\fR does several things at the
+same time. It makes sure that the given command begins where a keyword
+begins (\f(CW\*(C`\eb\*(C'\fR). It tolerates abbreviations due to the added \f(CW\*(C`\ew*\*(C'\fR. It
+tells us the number of matches (\f(CW\*(C`scalar @matches\*(C'\fR) and all the keywords
+that were actually matched.  You could hardly ask for more.
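+.PP
+One caveat: \f(CW$cmd\fR is user input interpolated into the pattern, so an entry
+such as \f(CW\*(C`c.\*(C'\fR would be treated as a regexp.  Below is a minimal sketch of
+the same lookup with the input quoted by \f(CW\*(C`\eQ\*(C'\fR...\f(CW\*(C`\eE\*(C'\fR.  It is wrapped in
+a hypothetical helper, \f(CW\*(C`expand_command\*(C'\fR, which is not part of any library:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $kwds = "copy compare list print";
+\&
+\&    # hypothetical helper: the keywords that $cmd abbreviates
+\&    sub expand_command {
+\&        my ($cmd) = @_;
+\&        my @matches = $kwds =~ /\eb\eQ$cmd\eE\ew*/g;
+\&        return @matches;
+\&    }
+\&
+\&    print join(",", expand_command("co")), "\en";   # copy,compare
+\&    print join(",", expand_command("li")), "\en";   # list
+\&    my @none = expand_command("c.");               # empty: the dot is literal
+.Ve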
+.SS "Embedding comments and modifiers in a regular expression"
+.IX Subsection "Embedding comments and modifiers in a regular expression"
+Starting with this section, we will be discussing Perl's set of
+\&\fIextended patterns\fR.  These are extensions to the traditional regular
+expression syntax that provide powerful new tools for pattern
+matching.  We have already seen extensions in the form of the minimal
+matching constructs \f(CW\*(C`??\*(C'\fR, \f(CW\*(C`*?\*(C'\fR, \f(CW\*(C`+?\*(C'\fR, \f(CW\*(C`{n,m}?\*(C'\fR, \f(CW\*(C`{n,}?\*(C'\fR, and
+\&\f(CW\*(C`{,n}?\*(C'\fR.  Most of the extensions below have the form \f(CW\*(C`(?char...)\*(C'\fR,
+where the \f(CW\*(C`char\*(C'\fR is a character that determines the type of extension.
+.PP
+The first extension is an embedded comment \f(CW\*(C`(?#text)\*(C'\fR.  This embeds a
+comment into the regular expression without affecting its meaning.  The
+comment should not have any closing parentheses in the text.  An
+example is
+.PP
+.Vb 1
+\&    /(?# Match an integer:)[+\-]?\ed+/;
+.Ve
+.PP
+This style of commenting has been largely superseded by the raw,
+freeform commenting that is allowed with the \f(CW\*(C`/x\*(C'\fR modifier.
+.PP
+Most modifiers, such as \f(CW\*(C`/i\*(C'\fR, \f(CW\*(C`/m\*(C'\fR, \f(CW\*(C`/s\*(C'\fR and \f(CW\*(C`/x\*(C'\fR (or any
+combination thereof) can also be embedded in
+a regexp using \f(CW\*(C`(?i)\*(C'\fR, \f(CW\*(C`(?m)\*(C'\fR, \f(CW\*(C`(?s)\*(C'\fR, and \f(CW\*(C`(?x)\*(C'\fR.  For instance,
+.PP
+.Vb 7
+\&    /(?i)yes/;  # match \*(Aqyes\*(Aq case insensitively
+\&    /yes/i;     # same thing
+\&    /(?x)(          # freeform version of an integer regexp
+\&             [+\-]?  # match an optional sign
+\&             \ed+    # match a sequence of digits
+\&         )
+\&    /x;
+.Ve
+.PP
+Embedded modifiers can have two important advantages over the usual
+modifiers.  Embedded modifiers allow a custom set of modifiers for
+\&\fIeach\fR regexp pattern.  This is great for matching an array of regexps
+that must have different modifiers:
+.PP
+.Vb 8
+\&    $pattern[0] = \*(Aq(?i)doctor\*(Aq;
+\&    $pattern[1] = \*(AqJohnson\*(Aq;
+\&    ...
+\&    while (<>) {
+\&        foreach $patt (@pattern) {
+\&            print if /$patt/;
+\&        }
+\&    }
+.Ve
+.PP
+The second advantage is that embedded modifiers (except \f(CW\*(C`/p\*(C'\fR, which
+modifies the entire regexp) only affect the regexp
+inside the group the embedded modifier is contained in.  So grouping
+can be used to localize the modifier's effects:
+.PP
+.Vb 1
+\&    /Answer: ((?i)yes)/;  # matches \*(AqAnswer: yes\*(Aq, \*(AqAnswer: YES\*(Aq, etc.
+.Ve
+.PP
+Embedded modifiers can also turn off any modifiers already present
+by using, \fIe.g.\fR, \f(CW\*(C`(?\-i)\*(C'\fR.  Modifiers can also be combined into
+a single expression, \fIe.g.\fR, \f(CW\*(C`(?s\-i)\*(C'\fR turns on single line mode and
+turns off case insensitivity.
+.PP
+Embedded modifiers may also be added to a non-capturing grouping.
+\&\f(CW\*(C`(?i\-m:regexp)\*(C'\fR is a non-capturing grouping that matches \f(CW\*(C`regexp\*(C'\fR
+case insensitively and turns off multi-line mode.
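+.PP
+A minimal sketch of turning modifiers on and off inside one pattern (the test
+strings are made up):
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    # case\-insensitive overall, but "yes" itself must be lowercase
+\&    my $re = qr/(?i)answer: (?\-i:yes)/;
+\&    for my $s ("ANSWER: yes", "Answer: YES") {
+\&        print "$s: ", ($s =~ $re ? "match" : "no match"), "\en";
+\&    }
+\&    # (?s\-i) lets "." match the newline and makes "b" case\-sensitive again
+\&    print "crossed the newline\en" if "A\enb" =~ /(?i)a(?s\-i).b/;
+.Ve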
+.SS "Looking ahead and looking behind"
+.IX Subsection "Looking ahead and looking behind"
+This section concerns the lookahead and lookbehind assertions.  First,
+a little background.
+.PP
+In Perl regular expressions, most regexp elements "eat up" a certain
+amount of string when they match.  For instance, the regexp element
+\&\f(CW\*(C`[abc]\*(C'\fR eats up one character of the string when it matches, in the
+sense that Perl moves to the next character position in the string
+after the match.  There are some elements, however, that don't eat up
+characters (advance the character position) if they match.  The examples
+we have seen so far are the anchors.  The anchor \f(CW\*(Aq^\*(Aq\fR matches the
+beginning of the line, but doesn't eat any characters.  Similarly, the
+word boundary anchor \f(CW\*(C`\eb\*(C'\fR matches wherever a character matching \f(CW\*(C`\ew\*(C'\fR
+is next to a character that doesn't, but it doesn't eat up any
+characters itself.  Anchors are examples of \fIzero-width assertions\fR:
+zero-width, because they consume
+no characters, and assertions, because they test some property of the
+string.  In the context of our walk in the woods analogy to regexp
+matching, most regexp elements move us along a trail, but anchors have
+us stop a moment and check our surroundings.  If the local environment
+checks out, we can proceed forward.  But if the local environment
+doesn't satisfy us, we must backtrack.
+.PP
+Checking the environment entails either looking ahead on the trail,
+looking behind, or both.  \f(CW\*(Aq^\*(Aq\fR looks behind, to see that there are no
+characters before.  \f(CW\*(Aq$\*(Aq\fR looks ahead, to see that there are no
+characters after.  \f(CW\*(C`\eb\*(C'\fR looks both ahead and behind, to see if the
+characters on either side differ in their "word-ness".
+.PP
+The lookahead and lookbehind assertions are generalizations of the
+anchor concept.  Lookahead and lookbehind are zero-width assertions
+that let us specify which characters we want to test for.  The
+lookahead assertion is denoted by \f(CW\*(C`(?=regexp)\*(C'\fR or (starting in 5.32,
+experimentally in 5.28) \f(CW\*(C`(*pla:regexp)\*(C'\fR or
+\&\f(CW\*(C`(*positive_lookahead:regexp)\*(C'\fR; and the lookbehind assertion is denoted
+by \f(CW\*(C`(?<=fixed\-regexp)\*(C'\fR or (starting in 5.32, experimentally in
+5.28) \f(CW\*(C`(*plb:fixed\-regexp)\*(C'\fR or \f(CW\*(C`(*positive_lookbehind:fixed\-regexp)\*(C'\fR.
+Some examples are
+.PP
+.Vb 8
+\&    $x = "I catch the housecat \*(AqTom\-cat\*(Aq with catnip";
+\&    $x =~ /cat(*pla:\es)/;   # matches \*(Aqcat\*(Aq in \*(Aqhousecat\*(Aq
+\&    @catwords = ($x =~ /(?<=\es)cat\ew+/g);  # matches,
+\&                                           # $catwords[0] = \*(Aqcatch\*(Aq
+\&                                           # $catwords[1] = \*(Aqcatnip\*(Aq
+\&    $x =~ /\ebcat\eb/;  # matches \*(Aqcat\*(Aq in \*(AqTom\-cat\*(Aq
+\&    $x =~ /(?<=\es)cat(?=\es)/; # doesn\*(Aqt match; no isolated \*(Aqcat\*(Aq in
+\&                              # middle of $x
+.Ve
+.PP
+Note that the parentheses in these are
+non-capturing, since these are zero-width assertions.  Thus in the
+second regexp, the substrings captured are those of the whole regexp
+itself.  Lookahead can match arbitrary regexps, but
+lookbehind prior to 5.30 \f(CW\*(C`(?<=fixed\-regexp)\*(C'\fR only works for regexps
+of fixed width, \fIi.e.\fR, a fixed number of characters long.  Thus
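+\&\f(CW\*(C`(?<=(ab|bc))\*(C'\fR is fine, but \f(CW\*(C`(?<=(ab)*)\*(C'\fR is not.  (Starting in 5.30,
+variable\-length lookbehind is accepted, at first as an experimental feature,
+but only if it can match at most 255 characters, so the unbounded
+\&\f(CW\*(C`(?<=(ab)*)\*(C'\fR is still rejected.)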
+.PP
+The negated versions of the lookahead and lookbehind assertions are
+denoted by \f(CW\*(C`(?!regexp)\*(C'\fR and \f(CW\*(C`(?<!fixed\-regexp)\*(C'\fR respectively.
+Or, starting in 5.32 (experimentally in 5.28), \f(CW\*(C`(*nla:regexp)\*(C'\fR,
+\&\f(CW\*(C`(*negative_lookahead:regexp)\*(C'\fR, \f(CW\*(C`(*nlb:regexp)\*(C'\fR, or
+\&\f(CW\*(C`(*negative_lookbehind:regexp)\*(C'\fR.
+They evaluate true if the regexps do \fInot\fR match:
+.PP
+.Vb 4
+\&    $x = "foobar";
+\&    $x =~ /foo(?!bar)/;  # doesn\*(Aqt match, \*(Aqbar\*(Aq follows \*(Aqfoo\*(Aq
+\&    $x =~ /foo(?!baz)/;  # matches, \*(Aqbaz\*(Aq doesn\*(Aqt follow \*(Aqfoo\*(Aq
+\&    $x =~ /(?<!\es)foo/;  # matches, there is no \es before \*(Aqfoo\*(Aq
+.Ve
+.PP
+Here is an example where a string containing blank-separated words,
+numbers and single dashes is to be split into its components.
+Using \f(CW\*(C`/\es+/\*(C'\fR alone won't work, because spaces are not required between
+dashes, or between a word and a dash.  Additional places for a split are established
+by looking ahead and behind:
+.PP
+.Vb 5
+\&    $str = "one two \- \-\-6\-8";
+\&    @toks = split / \es+              # a run of spaces
+\&                  | (?<=\eS) (?=\-)    # any non\-space followed by \*(Aq\-\*(Aq
+\&                  | (?<=\-)  (?=\eS)   # a \*(Aq\-\*(Aq followed by any non\-space
+\&                  /x, $str;          # @toks = qw(one two \- \- \- 6 \- 8)
+.Ve
+.SS "Using independent subexpressions to prevent backtracking"
+.IX Subsection "Using independent subexpressions to prevent backtracking"
+\&\fIIndependent subexpressions\fR (or atomic subexpressions) are regular
+expressions, in the context of a larger regular expression, that
+function independently of the larger regular expression.  That is, they
+consume as much or as little of the string as they wish without regard
+for the ability of the larger regexp to match.  Independent
+subexpressions are represented by
+\&\f(CW\*(C`(?>regexp)\*(C'\fR or (starting in 5.32, experimentally in 5.28)
+\&\f(CW\*(C`(*atomic:regexp)\*(C'\fR.  We can illustrate their behavior by first
+considering an ordinary regexp:
+.PP
+.Vb 2
+\&    $x = "ab";
+\&    $x =~ /a*ab/;  # matches
+.Ve
+.PP
+This obviously matches, but in the process of matching, the
+subexpression \f(CW\*(C`a*\*(C'\fR first grabbed the \f(CW\*(Aqa\*(Aq\fR.  Doing so, however,
+wouldn't allow the whole regexp to match, so after backtracking, \f(CW\*(C`a*\*(C'\fR
+eventually gave back the \f(CW\*(Aqa\*(Aq\fR and matched the empty string.  Here, what
+\&\f(CW\*(C`a*\*(C'\fR matched was \fIdependent\fR on what the rest of the regexp matched.
+.PP
+Contrast that with an independent subexpression:
+.PP
+.Vb 1
+\&    $x =~ /(?>a*)ab/;  # doesn\*(Aqt match!
+.Ve
+.PP
+The independent subexpression \f(CW\*(C`(?>a*)\*(C'\fR doesn't care about the rest
+of the regexp, so it sees an \f(CW\*(Aqa\*(Aq\fR and grabs it.  Then the rest of the
+regexp \f(CW\*(C`ab\*(C'\fR cannot match.  Because \f(CW\*(C`(?>a*)\*(C'\fR is independent, there
+is no backtracking and the independent subexpression does not give
+up its \f(CW\*(Aqa\*(Aq\fR.  Thus the match of the regexp as a whole fails.  A similar
+behavior occurs with completely independent regexps:
+.PP
+.Vb 3
+\&    $x = "ab";
+\&    $x =~ /a*/g;   # matches, eats an \*(Aqa\*(Aq
+\&    $x =~ /\eGab/g; # doesn\*(Aqt match, no \*(Aqa\*(Aq available
+.Ve
+.PP
+Here \f(CW\*(C`/g\*(C'\fR and \f(CW\*(C`\eG\*(C'\fR create a "tag team" handoff of the string from
+one regexp to the other.  Regexps with an independent subexpression are
+much like this, with a handoff of the string to the independent
+subexpression, and a handoff of the string back to the enclosing
+regexp.
+.PP
+The ability of an independent subexpression to prevent backtracking
+can be quite useful.  Suppose we want to match a non-empty string
+enclosed in parentheses up to two levels deep.  Then the following
+regexp matches:
+.PP
+.Vb 2
+\&    $x = "abc(de(fg)h";  # unbalanced parentheses
+\&    $x =~ /\e( ( [ ^ () ]+ | \e( [ ^ () ]* \e) )+ \e)/xx;
+.Ve
+.PP
+The regexp matches an open parenthesis, one or more copies of an
+alternation, and a close parenthesis.  The alternation is two-way, with
+the first alternative \f(CW\*(C`[^()]+\*(C'\fR matching a substring with no
+parentheses and the second alternative \f(CW\*(C`\e([^()]*\e)\*(C'\fR matching a
+substring delimited by parentheses.  The problem with this regexp is
+that it is pathological: it has nested indeterminate quantifiers
+of the form \f(CW\*(C`(a+|b)+\*(C'\fR.  We discussed in Part 1 how nested quantifiers
+like this could take an exponentially long time to execute if there
+is no match possible.  To prevent the exponential blowup, we need to
+prevent useless backtracking at some point.  This can be done by
+enclosing the inner quantifier as an independent subexpression:
+.PP
+.Vb 1
+\&    $x =~ /\e( ( (?> [ ^ () ]+ ) | \e([ ^ () ]* \e) )+ \e)/xx;
+.Ve
+.PP
+Here, \f(CW\*(C`(?>[^()]+)\*(C'\fR breaks the degeneracy of string partitioning
+by gobbling up as much of the string as possible and keeping it.  Then
+a failing match fails much more quickly.
+.SS "Conditional expressions"
+.IX Subsection "Conditional expressions"
+A \fIconditional expression\fR is a form of if-then-else statement
+that allows one to choose which patterns are to be matched, based on
+some condition.  There are two types of conditional expression:
+\&\f(CW\*(C`(?(\fR\f(CIcondition\fR\f(CW)\fR\f(CIyes\-regexp\fR\f(CW)\*(C'\fR and
+\&\f(CW\*(C`(?(\fR\f(CIcondition\fR\f(CW)\fR\f(CIyes\-regexp\fR\f(CW|\fR\f(CIno\-regexp\fR\f(CW)\*(C'\fR.
+\&\f(CW\*(C`(?(\fR\f(CIcondition\fR\f(CW)\fR\f(CIyes\-regexp\fR\f(CW)\*(C'\fR is
+like an \f(CW\*(Aqif\ ()\ {}\*(Aq\fR statement in Perl.  If the \fIcondition\fR is true,
+the \fIyes-regexp\fR will be matched.  If the \fIcondition\fR is false, the
+\&\fIyes-regexp\fR will be skipped and Perl will move onto the next regexp
+element.  The second form is like an \f(CW\*(Aqif\ ()\ {}\ else\ {}\*(Aq\fR statement
+in Perl.  If the \fIcondition\fR is true, the \fIyes-regexp\fR will be
+matched, otherwise the \fIno-regexp\fR will be matched.
+.PP
+The \fIcondition\fR can have several forms.  The first form is simply an
+integer in parentheses \f(CW\*(C`(\fR\f(CIinteger\fR\f(CW)\*(C'\fR.  It is true if the corresponding
+backreference \f(CW\*(C`\e\fR\f(CIinteger\fR\f(CW\*(C'\fR matched earlier in the regexp.  The same
+thing can be done with a name associated with a capture group, written
+as \f(CW\*(C`(<\fR\f(CIname\fR\f(CW>)\*(C'\fR or \f(CW\*(C`(\*(Aq\fR\f(CIname\fR\f(CW\*(Aq)\*(C'\fR.  The second form is a bare
+zero-width assertion \f(CW\*(C`(?...)\*(C'\fR, either a lookahead, a lookbehind, or a
+code assertion (discussed in the next section).  The third set of forms
+provides tests that return true if the expression is executed within
+a recursion (\f(CW\*(C`(R)\*(C'\fR) or is being called from some capturing group,
+referenced either by number (\f(CW\*(C`(R1)\*(C'\fR, \f(CW\*(C`(R2)\*(C'\fR,...) or by name
+(\f(CW\*(C`(R&\fR\f(CIname\fR\f(CW)\*(C'\fR).
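+.PP
+As a sketch of the name form (the group name \f(CW\*(C`q\*(C'\fR is arbitrary), this
+pattern accepts a word that is either bare or wrapped in double quotes.  It
+requires the closing quote only when the opening one was captured:
+.PP
+.Vb 8
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $re = qr/^(?<q>")?\ew+(?(<q>)")$/;
+\&    for my $s (\*(Aq"abc"\*(Aq, \*(Aqabc\*(Aq, \*(Aq"abc\*(Aq, \*(Aqabc"\*(Aq) {
+\&        printf "%\-6s %s\en", $s, ($s =~ $re ? "match" : "no match");
+\&    }
+\&    # the first two match; the two with a single quote mark do not
+.Ve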
+.PP
+The integer or name form of the \f(CW\*(C`condition\*(C'\fR allows us to choose,
+with more flexibility, what to match based on what matched earlier in the
+regexp. This searches for words of the form \f(CW"$x$x"\fR or \f(CW"$x$y$y$x"\fR:
+.PP
+.Vb 9
+\&    % simple_grep \*(Aq^(\ew+)(\ew+)?(?(2)\eg2\eg1|\eg1)$\*(Aq /usr/dict/words
+\&    beriberi
+\&    coco
+\&    couscous
+\&    deed
+\&    ...
+\&    toot
+\&    toto
+\&    tutu
+.Ve
+.PP
+The lookbehind \f(CW\*(C`condition\*(C'\fR allows, along with backreferences,
+an earlier part of the match to influence a later part of the
+match.  For instance,
+.PP
+.Vb 1
+\&    /[ATGC]+(?(?<=AA)G|C)$/;
+.Ve
+.PP
+matches a DNA sequence such that it either ends in \f(CW\*(C`AAG\*(C'\fR, or some
+other base pair combination and \f(CW\*(AqC\*(Aq\fR.  Note that the form is
+\&\f(CW\*(C`(?(?<=AA)G|C)\*(C'\fR and not \f(CW\*(C`(?((?<=AA))G|C)\*(C'\fR; for the
+lookahead, lookbehind or code assertions, the parentheses around the
+conditional are not needed.
+.SS "Defining named patterns"
+.IX Subsection "Defining named patterns"
+Some regular expressions use identical subpatterns in several places.
+Starting with Perl 5.10, it is possible to define named subpatterns in
+a section of the pattern so that they can be called up by name
+anywhere in the pattern.  The syntax for this definition
+group is \f(CW\*(C`(?(DEFINE)(?<\fR\f(CIname\fR\f(CW>\fR\f(CIpattern\fR\f(CW)...)\*(C'\fR.  An insertion
+of a named pattern is written as \f(CW\*(C`(?&\fR\f(CIname\fR\f(CW)\*(C'\fR.
+.PP
+The example below illustrates this feature using the pattern for
+floating point numbers that was presented earlier on.  The three
+subpatterns that are used more than once are the optional sign, the
+digit sequence for an integer and the decimal fraction.  The \f(CW\*(C`DEFINE\*(C'\fR
+group at the end of the pattern contains their definition.  Notice
+that the decimal fraction pattern is the first place where we can
+reuse the integer pattern.
+.PP
+.Vb 8
+\&   /^ (?&osg)\e * ( (?&int)(?&dec)? | (?&dec) )
+\&      (?: [eE](?&osg)(?&int) )?
+\&    $
+\&    (?(DEFINE)
+\&      (?<osg>[\-+]?)         # optional sign
+\&      (?<int>\ed++)          # integer
+\&      (?<dec>\e.(?&int))     # decimal fraction
+\&    )/x
+.Ve
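+.PP
+To try the pattern out, it can be wrapped in \f(CW\*(C`qr//\*(C'\fR and run over a few
+sample strings (the samples are only illustrations).  Note that \f(CW\*(C`3.\*(C'\fR is
+rejected, because the decimal fraction requires at least one digit after the point:
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&
+\&    my $float = qr/^ (?&osg)\e * ( (?&int)(?&dec)? | (?&dec) )
+\&                     (?: [eE](?&osg)(?&int) )?
+\&                   $
+\&                   (?(DEFINE)
+\&                     (?<osg>[\-+]?)         # optional sign
+\&                     (?<int>\ed++)          # integer
+\&                     (?<dec>\e.(?&int))     # decimal fraction
+\&                   )/x;
+\&
+\&    for my $s ("1.5", "\-.5e\-3", "1e5", "3.", "abc") {
+\&        printf "%\-7s %s\en", $s, ($s =~ $float ? "number" : "not a number");
+\&    }
+.Ve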
+.SS "Recursive patterns"
+.IX Subsection "Recursive patterns"
+This feature (introduced in Perl 5.10) significantly extends the
+power of Perl's pattern matching.  By referring to some other
+capture group anywhere in the pattern with the construct
+\&\f(CW\*(C`(?\fR\f(CIgroup\-ref\fR\f(CW)\*(C'\fR, the \fIpattern\fR within the referenced group is used
+as an independent subpattern in place of the group reference itself.
+Because the group reference may be contained \fIwithin\fR the group it
+refers to, it is now possible to apply pattern matching to tasks that
+hitherto required a recursive parser.
+.PP
+To illustrate this feature, we'll design a pattern that matches if
+a string is a palindrome. (This is a word or a sentence that,
+ignoring spaces, punctuation and case, reads the same backwards
+as forwards.) We begin by observing that the empty string or a string
+containing just one word character is a palindrome. Otherwise it must
+have a word character up front and the same at its end, with another
+palindrome in between.
+.PP
+.Vb 1
+\& /(?: (\ew) (?...Here be a palindrome...) \eg{ \-1 } | \ew? )/x
+.Ve
+.PP
+Adding \f(CW\*(C`\eW*\*(C'\fR at either end to skip what is to be ignored, and
+wrapping everything in group 1 so that \f(CW\*(C`(?1)\*(C'\fR can recurse into it, we
+have the full pattern:
+.PP
+.Vb 4
+\&    my $pp = qr/^(\eW* (?: (\ew) (?1) \eg{\-1} | \ew? ) \eW*)$/ix;
+\&    for $s ( "saippuakauppias", "A man, a plan, a canal: Panama!" ){
+\&        print "\*(Aq$s\*(Aq is a palindrome\en" if $s =~ /$pp/;
+\&    }
+.Ve
+.PP
+In \f(CW\*(C`(?...)\*(C'\fR both absolute and relative group numbers may be used,
+as in \f(CW\*(C`(?1)\*(C'\fR or \f(CW\*(C`(?\-1)\*(C'\fR.
+The entire pattern can be reinserted with \f(CW\*(C`(?R)\*(C'\fR or \f(CW\*(C`(?0)\*(C'\fR.
+If you prefer to name your groups, you can use \f(CW\*(C`(?&\fR\f(CIname\fR\f(CW)\*(C'\fR to
+recurse into that group.
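+.PP
+The same mechanism handles nested structures. The sketch below is a hypothetical
+example rather than one from the original text. It names a group \f(CW\*(C`paren\*(C'\fR
+and recurses into it with \f(CW\*(C`(?&paren)\*(C'\fR, so the whole string must be a
+single pair of parentheses whose contents are properly nested.

```perl
use strict;
use warnings;

# Hypothetical example: the named group "paren" recurses into itself
# with (?&paren) to match one pair of properly nested parentheses.
my $balanced = qr/^(?<paren>\((?:[^()]++|(?&paren))*\))$/;

for my $s ('()', '(a(b)c)', '((x)', '(a)(b)') {
    printf "%-8s %s\n", $s, ($s =~ $balanced ? 'balanced' : 'no match');
}
```

+The possessive \f(CW\*(C`[^()]++\*(C'\fR keeps the engine from retrying different
+splits of a run of non-parenthesis characters when a match fails.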
+.SS "A bit of magic: executing Perl code in a regular expression"
+.IX Subsection "A bit of magic: executing Perl code in a regular expression"
+Normally, regexps are a part of Perl expressions.
+\&\fICode evaluation\fR expressions turn that around by allowing
+arbitrary Perl code to be a part of a regexp.  A code evaluation
+expression is denoted \f(CW\*(C`(?{\fR\f(CIcode\fR\f(CW})\*(C'\fR, with \fIcode\fR a string of Perl
+statements.
+.PP
+Code expressions are zero-width assertions, and the value they return
+depends on their environment.  There are two possibilities: either the
+code expression is used as a conditional in a conditional expression
+\&\f(CW\*(C`(?(\fR\f(CIcondition\fR\f(CW)...)\*(C'\fR, or it is not.  If the code expression is a
+conditional, the code is evaluated and the result (\fIi.e.\fR, the result of
+the last statement) is used to determine truth or falsehood.  If the
+code expression is not used as a conditional, the assertion always
+evaluates true and the result is put into the special variable
+\&\f(CW$^R\fR.  The variable \f(CW$^R\fR can then be used in code expressions later
+in the regexp.  Here are some silly examples:
+.PP
+.Vb 5
+\&    $x = "abcdef";
+\&    $x =~ /abc(?{print "Hi Mom!";})def/; # matches,
+\&                                         # prints \*(AqHi Mom!\*(Aq
+\&    $x =~ /aaa(?{print "Hi Mom!";})def/; # doesn\*(Aqt match,
+\&                                         # no \*(AqHi Mom!\*(Aq
+.Ve
+.PP
+Pay careful attention to the next example:
+.PP
+.Vb 3
+\&    $x =~ /abc(?{print "Hi Mom!";})ddd/; # doesn\*(Aqt match,
+\&                                         # no \*(AqHi Mom!\*(Aq
+\&                                         # but why not?
+.Ve
+.PP
+At first glance, you'd think that it doesn't print simply because
+the \f(CW\*(C`ddd\*(C'\fR obviously isn't going to match the target string. But look at this
+example:
+.PP
+.Vb 2
+\&    $x =~ /abc(?{print "Hi Mom!";})[dD]dd/; # doesn\*(Aqt match,
+\&                                            # but _does_ print
+.Ve
+.PP
+Hmm. What happened here? If you've been following along, you know that
+the above pattern should be effectively (almost) the same as the last one;
+enclosing the \f(CW\*(Aqd\*(Aq\fR in a character class isn't going to change what it
+matches. So why does the first not print while the second one does?
+.PP
+The answer lies in the optimizations the regexp engine makes. In the first
+case, all the engine sees are plain old characters (aside from the
+\&\f(CW\*(C`?{}\*(C'\fR construct). It's smart enough to realize that the string \f(CW\*(Aqddd\*(Aq\fR
+doesn't occur in our target string before actually running the pattern
+through. But in the second case, we've tricked it into thinking that our
+pattern is more complicated. It takes a look, sees our
+character class, and decides that it will have to actually run the
+pattern to determine whether or not it matches, and in the process of
+running it hits the print statement before it discovers that we don't
+have a match.
+.PP
+To take a closer look at how the engine does optimizations, see the
+section "Pragmas and debugging" below.
+.PP
+More fun with \f(CW\*(C`?{}\*(C'\fR:
+.PP
+.Vb 6
+\&    $x =~ /(?{print "Hi Mom!";})/;         # matches,
+\&                                           # prints \*(AqHi Mom!\*(Aq
+\&    $x =~ /(?{$c = 1;})(?{print "$c";})/;  # matches,
+\&                                           # prints \*(Aq1\*(Aq
+\&    $x =~ /(?{$c = 1;})(?{print "$^R";})/; # matches,
+\&                                           # prints \*(Aq1\*(Aq
+.Ve
+.PP
+The bit of magic mentioned in the section title occurs when the regexp
+backtracks in the process of searching for a match.  If the regexp
+backtracks over a code expression and if the variables used within are
+localized using \f(CW\*(C`local\*(C'\fR, the changes in the variables produced by the
+code expression are undone! Thus, if we wanted to count how many times
+a character got matched inside a group, we could use, \fIe.g.\fR,
+.PP
+.Vb 11
+\&    $x = "aaaa";
+\&    $count = 0;  # initialize \*(Aqa\*(Aq count
+\&    $c = "bob";  # test if $c gets clobbered
+\&    $x =~ /(?{local $c = 0;})         # initialize count
+\&           ( a                        # match \*(Aqa\*(Aq
+\&             (?{local $c = $c + 1;})  # increment count
+\&           )*                         # do this any number of times,
+\&           aa                         # but match \*(Aqaa\*(Aq at the end
+\&           (?{$count = $c;})          # copy local $c var into $count
+\&          /x;
+\&    print "\*(Aqa\*(Aq count is $count, \e$c variable is \*(Aq$c\*(Aq\en";
+.Ve
+.PP
+This prints
+.PP
+.Vb 1
+\&    \*(Aqa\*(Aq count is 2, $c variable is \*(Aqbob\*(Aq
+.Ve
+.PP
+If we replace the \f(CW\*(C`\ (?{local\ $c\ =\ $c\ +\ 1;})\*(C'\fR with
+\&\f(CW\*(C`\ (?{$c\ =\ $c\ +\ 1;})\*(C'\fR, the variable changes are \fInot\fR undone
+during backtracking, and we get
+.PP
+.Vb 1
+\&    \*(Aqa\*(Aq count is 4, $c variable is \*(Aqbob\*(Aq
+.Ve
+.PP
+Note that only localized variable changes are undone.  Other side
+effects of code expression execution are permanent.  Thus
+.PP
+.Vb 2
+\&    $x = "aaaa";
+\&    $x =~ /(a(?{print "Yow\en";}))*aa/;
+.Ve
+.PP
+produces
+.PP
+.Vb 4
+\&   Yow
+\&   Yow
+\&   Yow
+\&   Yow
+.Ve
+.PP
+The result \f(CW$^R\fR is automatically localized, so that it will behave
+properly in the presence of backtracking.
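+.PP
+Because \f(CW$^R\fR is restored on backtracking, the \f(CW\*(C`local $c\*(C'\fR counting
+example above can be rewritten to carry the count in \f(CW$^R\fR alone. This is
+a sketch, and the \f(CW$result\fR name is ours. As in the \f(CW\*(C`local\*(C'\fR version,
+the increments for the two \f(CW\*(Aqa\*(Aq\fR characters given back to \f(CW\*(C`aa\*(C'\fR are
+undone, so it reports 2.

```perl
use strict;
use warnings;

my $result;
"aaaa" =~ /^ (?{ 0 })                # start the running count at zero
            (?: a (?{ $^R + 1 }) )*  # add one for every 'a' matched
            aa                       # but still require 'aa' at the end
            (?{ $result = $^R })     # copy the count out of the regexp
           /x;
print "count is $result\n";          # count is 2
```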
+.PP
+This example uses a code expression in a conditional to match a
+definite article, either \f(CW\*(Aqthe\*(Aq\fR in English or \f(CW\*(Aqder|die|das\*(Aq\fR in
+German:
+.PP
+.Vb 11
+\&    $lang = \*(AqDE\*(Aq;  # use German
+\&    ...
+\&    $text = "das";
+\&    print "matched\en"
+\&        if $text =~ /(?(?{
+\&                          $lang eq \*(AqEN\*(Aq; # is the language English?
+\&                         })
+\&                       the |             # if so, then match \*(Aqthe\*(Aq
+\&                       (der|die|das)     # else, match \*(Aqder|die|das\*(Aq
+\&                     )
+\&                    /xi;
+.Ve
+.PP
+Note that the syntax here is \f(CW\*(C`(?(?{...})\fR\f(CIyes\-regexp\fR\f(CW|\fR\f(CIno\-regexp\fR\f(CW)\*(C'\fR, not
+\&\f(CW\*(C`(?((?{...}))\fR\f(CIyes\-regexp\fR\f(CW|\fR\f(CIno\-regexp\fR\f(CW)\*(C'\fR.  In other words, in the case of a
+code expression, we don't need the extra parentheses around the
+conditional.
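+.PP
+The same conditional can be wrapped in a small function. In this sketch, the
+sub \f(CW\*(C`is_article\*(C'\fR is our own invention, and the pattern is anchored so the
+whole string must be an article. The sketch assumes Perl 5.18 or later, where a
+literal code block sees the current values of lexicals such as \f(CW$lang\fR.

```perl
use strict;
use warnings;

# Hypothetical helper: is $text a definite article in language $lang?
# 'EN' selects English; any other value selects German.
sub is_article {
    my ($lang, $text) = @_;
    return scalar $text =~ /^ (?(?{ $lang eq 'EN' })
                                 the                # English article
                               | (?:der|die|das)    # German articles
                              ) $/xi;
}

for my $case (['EN', 'The'], ['DE', 'das'], ['EN', 'das'], ['DE', 'the']) {
    my ($lang, $word) = @$case;
    printf "%s %-3s %s\n", $lang, $word,
        is_article($lang, $word) ? 'article' : 'no';
}
```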
+.PP
+If you try to use code expressions where the code text is contained within
+an interpolated variable, rather than appearing literally in the pattern,
+Perl may surprise you:
+.PP
+.Vb 5
+\&    $bar = 5;
+\&    $pat = \*(Aq(?{ 1 })\*(Aq;
+\&    /foo(?{ $bar })bar/; # compiles ok, $bar not interpolated
+\&    /foo(?{ 1 })$bar/;   # compiles ok, $bar interpolated
+\&    /foo${pat}bar/;      # compile error!
+\&
+\&    $pat = qr/(?{ $foo = 1 })/;  # precompile code regexp
+\&    /foo${pat}bar/;      # compiles ok
+.Ve
+.PP
+If a regexp has a variable that interpolates a code expression, Perl
+treats the regexp as an error. If the code expression is precompiled into
+a variable, however, interpolating is ok. The question is, why is this an
+error?
+.PP
+The reason is that variable interpolation and code expressions
+together pose a security risk.  The combination is dangerous because
+programmers who write search engines often take user input and
+plug it directly into a regexp:
+.PP
+.Vb 3
+\&    $regexp = <>;       # read user\-supplied regexp
+\&    chomp $regexp;      # get rid of possible newline
+\&    $text =~ /$regexp/; # search $text for the $regexp
+.Ve
+.PP
+If the \f(CW$regexp\fR variable contains a code expression, the user could
+then execute arbitrary Perl code.  For instance, some joker could
+search for \f(CW\*(C`system(\*(Aqrm\ \-rf\ *\*(Aq);\*(C'\fR to erase your files.  In this
+sense, the combination of interpolation and code expressions \fItaints\fR
+your regexp.  So by default, using both interpolation and code
+expressions in the same regexp is not allowed.  If you're not
+concerned about malicious users, it is possible to bypass this
+security check by invoking \f(CW\*(C`use\ re\ \*(Aqeval\*(Aq\*(C'\fR:
+.PP
+.Vb 4
+\&    use re \*(Aqeval\*(Aq;       # throw caution out the door
+\&    $bar = 5;
+\&    $pat = \*(Aq(?{ 1 })\*(Aq;
+\&    /foo${pat}bar/;      # compiles ok
+.Ve
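+.PP
+The difference is easy to observe. In the following sketch, \f(CW\*(C`eval\*(C'\fR traps
+the error: the interpolated string is rejected at run time with an
+\&\f(CW\*(C`Eval\-group not allowed at runtime\*(C'\fR error unless
+\&\f(CW\*(C`use\ re\ \*(Aqeval\*(Aq\*(C'\fR is in effect. The pragma is confined to a bare block
+so that the rest of the file keeps the safety check.

```perl
use strict;
use warnings;

my $pat = '(?{ 1 })';    # code expression held in a plain string

# Without "use re 'eval'", interpolating it is a fatal runtime error.
my $ok = eval { 'foobar' =~ /foo${pat}bar/; 1 };
print "blocked: $@" unless $ok;

# With the pragma in a narrow block, the same match compiles and runs.
my $matched;
{
    use re 'eval';
    $matched = 'foobar' =~ /foo${pat}bar/ ? 1 : 0;
}
print "matched with use re 'eval': $matched\n";
```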
+.PP
+Another form of code expression is the \fIpattern code expression\fR.
+The pattern code expression is like a regular code expression, except
+that the result of the code evaluation is treated as a regular
+expression and matched immediately.  A simple example is
+.PP
+.Vb 4
+\&    $length = 5;
+\&    $char = \*(Aqa\*(Aq;
+\&    $x = \*(Aqaaaaabb\*(Aq;
+\&    $x =~ /(??{$char x $length})/x; # matches, there are 5 of \*(Aqa\*(Aq
+.Ve
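+.PP
+Because the regexp returned by \f(CW\*(C`(??{...})\*(C'\fR is built during matching, it
+can depend on what has already been captured. The sketch below checks a
+made-up \f(CW\*(C`N:text\*(C'\fR format in which the digits say how many characters
+follow. Inside the code block, \f(CW$1\fR holds the digits captured so far.

```perl
use strict;
use warnings;

# Made-up format: "N:" followed by exactly N characters.
# (??{ ... }) builds the subpattern ".{N}" from the digits in $1.
my @inputs = ('3:abc', '2:abc', '5:hello');
my @valid  = grep { /^(\d+):(??{ ".{$1}" })$/ } @inputs;
print "$_\n" for @valid;   # 3:abc and 5:hello
```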
+.PP
+This final example contains both ordinary and pattern code
+expressions.  It detects whether a binary string \f(CW1101010010001...\fR has a
+Fibonacci spacing 0,1,1,2,3,5,...  of the \f(CW\*(Aq1\*(Aq\fR's:
+.PP
+.Vb 12
+\&    $x = "1101010010001000001";
+\&    $z0 = \*(Aq\*(Aq; $z1 = \*(Aq0\*(Aq;   # initial conditions
+\&    print "It is a Fibonacci sequence\en"
+\&        if $x =~ /^1         # match an initial \*(Aq1\*(Aq
+\&                    (?:
+\&                       ((??{ $z0 })) # match some \*(Aq0\*(Aq
+\&                       1             # and then a \*(Aq1\*(Aq
+\&                       (?{ $z0 = $z1; $z1 .= $^N; })
+\&                    )+   # repeat as needed
+\&                  $      # that is all there is
+\&                 /x;
+\&    printf "Largest sequence matched was %d\en", length($z1)\-length($z0);
+.Ve
+.PP
+Remember that \f(CW$^N\fR is set to whatever was matched by the last
+completed capture group. This prints
+.PP
+.Vb 2
+\&    It is a Fibonacci sequence
+\&    Largest sequence matched was 5
+.Ve
+.PP
+Ha! Try that with your garden variety regexp package...
+.PP
+Note that the variables \f(CW$z0\fR and \f(CW$z1\fR are not substituted when the
+regexp is compiled, as happens for ordinary variables outside a code
+expression.  Rather, the whole code block is parsed as Perl code at the
+same time as perl is compiling the code containing the literal regexp
+pattern.
+.PP
+This regexp without the \f(CW\*(C`/x\*(C'\fR modifier is
+.PP
+.Vb 1
+\&    /^1(?:((??{ $z0 }))1(?{ $z0 = $z1; $z1 .= $^N; }))+$/
+.Ve
+.PP
+which shows that spaces are still allowed in the code parts. Nevertheless,
+when working with code and conditional expressions, the extended \f(CW\*(C`/x\*(C'\fR
+form is almost a necessity for creating and debugging regexps.
+.SS "Backtracking control verbs"
+.IX Subsection "Backtracking control verbs"
+Perl 5.10 introduced a number of control verbs intended to provide
+detailed control over the backtracking process, by directly influencing
+the regexp engine and by providing monitoring techniques.  See
+"Special Backtracking Control Verbs" in perlre for a detailed
+description.
+.PP
+Below is just one example, illustrating the control verb \f(CW\*(C`(*FAIL)\*(C'\fR,
+which may be abbreviated as \f(CW\*(C`(*F)\*(C'\fR. When the engine reaches it, the
+current match attempt fails, just as it would at an ordinary
+mismatch between the pattern and the string. Processing
+of the regexp continues as it would after any "normal"
+failure, so that, for instance, the next position in the string or another
+alternative will be tried. As failing to match doesn't preserve capture
+groups or produce results, it may be necessary to use this in
+combination with embedded code.
+.PP
+.Vb 4
+\&   %count = ();
+\&   "supercalifragilisticexpialidocious" =~
+\&       /([aeiou])(?{ $count{$1}++; })(*FAIL)/i;
+\&   printf "%3d \*(Aq%s\*(Aq\en", $count{$_}, $_ for (sort keys %count);
+.Ve
+.PP
+The pattern begins with a class matching a subset of letters.  Whenever
+this matches, a statement like \f(CW\*(C`$count{\*(Aqa\*(Aq}++;\*(C'\fR is executed, incrementing
+the letter's counter. Then \f(CW\*(C`(*FAIL)\*(C'\fR does what it says, and
+the regexp engine proceeds according to the book: as long as the end of
+the string hasn't been reached, the position is advanced before looking
+for another vowel. Thus, match or no match makes no difference, and the
+regexp engine proceeds until the entire string has been inspected.
+(It's remarkable that an alternative solution using something like
+.PP
+.Vb 2
+\&   $count2{lc($_)}++ for split(\*(Aq\*(Aq, "supercalifragilisticexpialidocious");
+\&   printf "%3d \*(Aq%s\*(Aq\en", $count2{$_}, $_ for ( qw{ a e i o u } );
+.Ve
+.PP
+is considerably slower.)
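+.PP
+Since \f(CW\*(C`(*FAIL)\*(C'\fR makes the engine try every alternative at every
+position, it can also enumerate all the ways a subpattern matches. Here is a
+small sketch. The \f(CW@substrings\fR name is ours.

```perl
use strict;
use warnings;

my @substrings;
# Record every substring \w+ can match, at every start position, by
# forcing failure after each success so the engine keeps backtracking.
"abc" =~ /(\w+)(?{ push @substrings, $1 })(*FAIL)/;
print "@substrings\n";   # abc ab a bc b c
```

+Each start position is tried in turn, and at each one the greedy
+\&\f(CW\*(C`\ew+\*(C'\fR gives back one character at a time.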
+.SS "Pragmas and debugging"
+.IX Subsection "Pragmas and debugging"
+There are several pragmas available to control
+and debug regexps in Perl.  We have already encountered one pragma in
+an earlier section, \f(CW\*(C`use\ re\ \*(Aqeval\*(Aq;\*(C'\fR, that allows variable
+interpolation and code expressions to coexist in a regexp.  The other
+pragmas are
+.PP
+.Vb 3
+\&    use re \*(Aqtaint\*(Aq;
+\&    $tainted = <>;
+\&    @parts = ($tainted =~ /(\ew+)\es+(\ew+)/); # @parts is now tainted
+.Ve
+.PP
+The \f(CW\*(C`taint\*(C'\fR pragma causes any substrings from a match with a tainted
+variable to be tainted as well, if your perl supports tainting
+(see perlsec).  Without the pragma, such substrings are not tainted, as
+regexps are often used to extract the safe bits from a tainted
+variable.  Use \f(CW\*(C`taint\*(C'\fR when you are not extracting safe bits, but are
+performing some other processing.  Both \f(CW\*(C`taint\*(C'\fR and \f(CW\*(C`eval\*(C'\fR pragmas
+are lexically scoped, which means they are in effect only until
+the end of the block enclosing the pragmas.
+.PP
+.Vb 2
+\&    use re \*(Aq/m\*(Aq;  # or any other flags
+\&    $multiline_string =~ /^foo/; # /m is implied
+.Ve
+.PP
+The \f(CW\*(C`re \*(Aq/flags\*(Aq\*(C'\fR pragma (introduced in Perl
+5.14) turns on the given regular expression flags
+until the end of the lexical scope.  See
+"'/flags' mode" in re for more
+detail.
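+.PP
+This short sketch shows the lexical scoping. It uses \f(CW\*(C`/i\*(C'\fR instead of
+\&\f(CW\*(C`/m\*(C'\fR, and the variable names are ours. The default flag applies only to
+patterns compiled inside the block.

```perl
use strict;
use warnings;

my ($inside, $outside);
{
    use re '/i';                               # default /i for this block only
    $inside = ('HELLO' =~ /hello/) ? 1 : 0;    # compiled as /hello/i
}
$outside = ('HELLO' =~ /hello/) ? 1 : 0;       # pragma no longer in scope
print "inside=$inside outside=$outside\n";     # inside=1 outside=0
```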
+.PP
+.Vb 2
+\&    use re \*(Aqdebug\*(Aq;
+\&    /^(.*)$/s;       # output debugging info
+\&
+\&    use re \*(Aqdebugcolor\*(Aq;
+\&    /^(.*)$/s;       # output debugging info in living color
+.Ve
+.PP
+The global \f(CW\*(C`debug\*(C'\fR and \f(CW\*(C`debugcolor\*(C'\fR pragmas allow one to get
+detailed debugging info about regexp compilation and
+execution.  \f(CW\*(C`debugcolor\*(C'\fR is the same as \f(CW\*(C`debug\*(C'\fR, except the debugging
+information is displayed in color on terminals that can display
+termcap color sequences.  Here is example output:
+.PP
+.Vb 10
+\&    % perl \-e \*(Aquse re "debug"; "abc" =~ /a*b+c/;\*(Aq
+\&    Compiling REx \*(Aqa*b+c\*(Aq
+\&    size 9 first at 1
+\&       1: STAR(4)
+\&       2:   EXACT <a>(0)
+\&       4: PLUS(7)
+\&       5:   EXACT <b>(0)
+\&       7: EXACT <c>(9)
+\&       9: END(0)
+\&    floating \*(Aqbc\*(Aq at 0..2147483647 (checking floating) minlen 2
+\&    Guessing start of match, REx \*(Aqa*b+c\*(Aq against \*(Aqabc\*(Aq...
+\&    Found floating substr \*(Aqbc\*(Aq at offset 1...
+\&    Guessed: match at offset 0
+\&    Matching REx \*(Aqa*b+c\*(Aq against \*(Aqabc\*(Aq
+\&      Setting an EVAL scope, savestack=3
+\&       0 <> <abc>           |  1:  STAR
+\&                             EXACT <a> can match 1 times out of 32767...
+\&      Setting an EVAL scope, savestack=3
+\&       1 <a> <bc>           |  4:    PLUS
+\&                             EXACT <b> can match 1 times out of 32767...
+\&      Setting an EVAL scope, savestack=3
+\&       2 <ab> <c>           |  7:      EXACT <c>
+\&       3 <abc> <>           |  9:      END
+\&    Match successful!
+\&    Freeing REx: \*(Aqa*b+c\*(Aq
+.Ve
+.PP
+If you have gotten this far into the tutorial, you can probably guess
+what the different parts of the debugging output tell you.  The first
+part
+.PP
+.Vb 8
+\&    Compiling REx \*(Aqa*b+c\*(Aq
+\&    size 9 first at 1
+\&       1: STAR(4)
+\&       2:   EXACT <a>(0)
+\&       4: PLUS(7)
+\&       5:   EXACT <b>(0)
+\&       7: EXACT <c>(9)
+\&       9: END(0)
+.Ve
+.PP
+describes the compilation stage.  \f(CWSTAR(4)\fR means that there is a
+starred object, in this case \f(CW\*(Aqa\*(Aq\fR, and if it matches, goto line 4,
+\&\fIi.e.\fR, \f(CWPLUS(7)\fR.  The middle lines describe some heuristics and
+optimizations performed before a match:
+.PP
+.Vb 4
+\&    floating \*(Aqbc\*(Aq at 0..2147483647 (checking floating) minlen 2
+\&    Guessing start of match, REx \*(Aqa*b+c\*(Aq against \*(Aqabc\*(Aq...
+\&    Found floating substr \*(Aqbc\*(Aq at offset 1...
+\&    Guessed: match at offset 0
+.Ve
+.PP
+Then the match is executed and the remaining lines describe the
+process:
+.PP
+.Vb 12
+\&    Matching REx \*(Aqa*b+c\*(Aq against \*(Aqabc\*(Aq
+\&      Setting an EVAL scope, savestack=3
+\&       0 <> <abc>           |  1:  STAR
+\&                             EXACT <a> can match 1 times out of 32767...
+\&      Setting an EVAL scope, savestack=3
+\&       1 <a> <bc>           |  4:    PLUS
+\&                             EXACT <b> can match 1 times out of 32767...
+\&      Setting an EVAL scope, savestack=3
+\&       2 <ab> <c>           |  7:      EXACT <c>
+\&       3 <abc> <>           |  9:      END
+\&    Match successful!
+\&    Freeing REx: \*(Aqa*b+c\*(Aq
+.Ve
+.PP
+Each step is of the form \f(CW\*(C`n\ <x>\ <y>\*(C'\fR, with \f(CW\*(C`<x>\*(C'\fR the
+part of the string matched and \f(CW\*(C`<y>\*(C'\fR the part not yet
+matched.  The \f(CW\*(C`|\ \ 1:\ \ STAR\*(C'\fR says that Perl is at line number 1
+in the compilation list above.  See
+"Debugging Regular Expressions" in perldebguts for much more detail.
+.PP
+An alternative method of debugging regexps is to embed \f(CW\*(C`print\*(C'\fR
+statements within the regexp.  This provides a blow-by-blow account of
+the backtracking in an alternation:
+.PP
+.Vb 12
+\&    "that this" =~ m@(?{print "Start at position ", pos, "\en";})
+\&                     t(?{print "t1\en";})
+\&                     h(?{print "h1\en";})
+\&                     i(?{print "i1\en";})
+\&                     s(?{print "s1\en";})
+\&                         |
+\&                     t(?{print "t2\en";})
+\&                     h(?{print "h2\en";})
+\&                     a(?{print "a2\en";})
+\&                     t(?{print "t2\en";})
+\&                     (?{print "Done at position ", pos, "\en";})
+\&                    @x;
+.Ve
+.PP
+prints
+.PP
+.Vb 8
+\&    Start at position 0
+\&    t1
+\&    h1
+\&    t2
+\&    h2
+\&    a2
+\&    t2
+\&    Done at position 4
+.Ve
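+.PP
+For automated checks, recording the trace can be handier than printing it.
+This variant of the example above is a sketch that pushes each step onto our
+own \f(CW@trace\fR array. It uses slashes as delimiters, because an \f(CW\*(C`@\*(C'\fR
+delimiter would clash with the array name.

```perl
use strict;
use warnings;

my @trace;    # steps recorded by the code blocks
"that this" =~ /(?{ push @trace, 'start ' . pos($_) })
                t(?{ push @trace, 't1' })
                h(?{ push @trace, 'h1' })
                i(?{ push @trace, 'i1' })
                s(?{ push @trace, 's1' })
                    |
                t(?{ push @trace, 't2' })
                h(?{ push @trace, 'h2' })
                a(?{ push @trace, 'a2' })
                t(?{ push @trace, 't2' })
                (?{ push @trace, 'done ' . pos($_) })
               /x;
print join(', ', @trace), "\n";    # start 0, t1, h1, t2, h2, a2, t2, done 4
```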
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+This is just a tutorial.  For the full story on Perl regular
+expressions, see the perlre regular expressions reference page.
+.PP
+For more information on the matching \f(CW\*(C`m//\*(C'\fR and substitution \f(CW\*(C`s///\*(C'\fR
+operators, see "Regexp Quote-Like Operators" in perlop.  For
+information on the \f(CW\*(C`split\*(C'\fR operation, see "split" in perlfunc.
+.PP
+For an excellent all-around resource on the care and feeding of
+regular expressions, see the book \fIMastering Regular Expressions\fR by
+Jeffrey Friedl (published by O'Reilly, ISBN 1\-56592\-257\-3).
+.SH "AUTHOR AND COPYRIGHT"
+.IX Header "AUTHOR AND COPYRIGHT"
+Copyright (c) 2000 Mark Kvale.
+All rights reserved.
+Now maintained by Perl porters.
+.PP
+This document may be distributed under the same terms as Perl itself.
+.SS Acknowledgments
+.IX Subsection "Acknowledgments"
+The inspiration for the stop codon DNA example came from the ZIP
+code example in chapter 7 of \fIMastering Regular Expressions\fR.
+.PP
+The author would like to thank Jeff Pinyan, Andrew Johnson, Peter
+Haworth, Ronald J Kimball, and Joe Smith for all their helpful
+comments.