Diffstat (limited to 'upstream/archlinux/man1/perlre.1perl')
-rw-r--r-- upstream/archlinux/man1/perlre.1perl | 3711
1 files changed, 3711 insertions, 0 deletions
diff --git a/upstream/archlinux/man1/perlre.1perl b/upstream/archlinux/man1/perlre.1perl
new file mode 100644
index 00000000..5380b09b
--- /dev/null
+++ b/upstream/archlinux/man1/perlre.1perl
@@ -0,0 +1,3711 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLRE 1perl"
+.TH PERLRE 1perl 2024-02-11 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlre \- Perl regular expressions
+.IX Xref "regular expression regex regexp"
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This page describes the syntax of regular expressions in Perl.
+.PP
+If you haven't used regular expressions before, a tutorial introduction
+is available in perlretut. If you know just a little about them,
+a quick-start introduction is available in perlrequick.
+.PP
+Except for "The Basics" section, this page assumes you are familiar
+with regular expression basics, such as what a "pattern" is, what it
+looks like, and how it is basically used. For a reference on how they
+are used, plus various examples of the same, see discussions of \f(CW\*(C`m//\*(C'\fR,
+\&\f(CW\*(C`s///\*(C'\fR, \f(CW\*(C`qr//\*(C'\fR and \f(CW"??"\fR in "Regexp Quote-Like Operators" in perlop.
+.PP
+New in v5.22, \f(CW\*(C`use re \*(Aqstrict\*(Aq\*(C'\fR applies stricter
+rules than otherwise when compiling regular expression patterns. It can
+find things that, while legal, may not be what you intended.
+.SS "The Basics"
+.IX Xref "regular expression, version 8 regex, version 8 regexp, version 8"
+.IX Subsection "The Basics"
+Regular expressions are strings with the very particular syntax and
+meaning described in this document and auxiliary documents referred to
+by this one. The strings are called "patterns". Patterns are used to
+determine if some other string, called the "target", has (or doesn't
+have) the characteristics specified by the pattern. We call this
+"matching" the target string against the pattern. Usually the match is
+done by having the target be the first operand, and the pattern be the
+second operand, of one of the two binary operators \f(CW\*(C`=~\*(C'\fR and \f(CW\*(C`!~\*(C'\fR,
+listed in "Binding Operators" in perlop; and the pattern will have been
+converted from an ordinary string by one of the operators in
+"Regexp Quote-Like Operators" in perlop, like so:
+.PP
+.Vb 1
+\& $foo =~ m/abc/
+.Ve
+.PP
+This evaluates to true if and only if the string in the variable \f(CW$foo\fR
+contains, somewhere in it, the sequence of characters "a", "b", then "c".
+(The \f(CW\*(C`=~ m\*(C'\fR, or match operator, is described in
+"m/PATTERN/msixpodualngc" in perlop.)
+.PP
+Patterns that aren't already stored in some variable must be delimited,
+at both ends, by delimiter characters. These are often, as in the
+example above, forward slashes, and the typical way a pattern is written
+in documentation is with those slashes. In most cases, the delimiter
+is the same character, fore and aft, but there are a few cases where a
+character looks like it has a mirror-image mate, where the opening
+version is the beginning delimiter, and the closing one is the ending
+delimiter, like
+.PP
+.Vb 1
+\& $foo =~ m<abc>
+.Ve
+.PP
+Most times, the pattern is evaluated in double-quotish context, but it
+is possible to choose delimiters to force single-quotish, like
+.PP
+.Vb 1
+\& $foo =~ m\*(Aqabc\*(Aq
+.Ve
+.PP
+If the pattern contains its delimiter within it, that delimiter must be
+escaped. Prefixing it with a backslash (\fIe.g.\fR, \f(CW"/foo\e/bar/"\fR)
+serves this purpose.
+.PP
+Any single character in a pattern matches that same character in the
+target string, unless the character is a \fImetacharacter\fR with a special
+meaning described in this document. A sequence of non-metacharacters
+matches the same sequence in the target string, as we saw above with
+\&\f(CW\*(C`m/abc/\*(C'\fR.
+.PP
+Only a few characters (all of them being ASCII punctuation characters)
+are metacharacters. The most commonly used one is a dot \f(CW"."\fR, which
+normally matches almost any character (including a dot itself).
+.PP
+You can cause characters that normally function as metacharacters to be
+interpreted literally by prefixing them with a \f(CW"\e"\fR, just like the
+pattern's delimiter must be escaped if it also occurs within the
+pattern. Thus, \f(CW"\e."\fR matches just a literal dot, \f(CW"."\fR instead of
+its normal meaning. This means that the backslash is also a
+metacharacter, so \f(CW"\e\e"\fR matches a single \f(CW"\e"\fR. And a sequence that
+contains an escaped metacharacter matches the same sequence (but without
+the escape) in the target string. So, the pattern \f(CW\*(C`/blur\e\efl/\*(C'\fR would
+match any target string that contains the sequence \f(CW"blur\efl"\fR.
+.PP
+The metacharacter \f(CW"|"\fR is used to match one thing or another. Thus
+.PP
+.Vb 1
+\& $foo =~ m/this|that/
+.Ve
+.PP
+is TRUE if and only if \f(CW$foo\fR contains either the sequence \f(CW"this"\fR or
+the sequence \f(CW"that"\fR. Like all metacharacters, prefixing the \f(CW"|"\fR
+with a backslash makes it match the plain punctuation character; in its
+case, the VERTICAL LINE.
+.PP
+.Vb 1
+\& $foo =~ m/this\e|that/
+.Ve
+.PP
+is TRUE if and only if \f(CW$foo\fR contains the sequence \f(CW"this|that"\fR.
+.PP
+You aren't limited to just a single \f(CW"|"\fR.
+.PP
+.Vb 1
+\& $foo =~ m/fee|fie|foe|fum/
+.Ve
+.PP
+is TRUE if and only if \f(CW$foo\fR contains any of those 4 sequences from
+the children's story "Jack and the Beanstalk".
+.PP
+As you can see, the \f(CW"|"\fR binds less tightly than a sequence of
+ordinary characters. We can override this by using the grouping
+metacharacters, the parentheses \f(CW"("\fR and \f(CW")"\fR.
+.PP
+.Vb 1
+\& $foo =~ m/th(is|at) thing/
+.Ve
+.PP
+is TRUE if and only if \f(CW$foo\fR contains either the sequence \f(CW"this\ thing"\fR or the sequence \f(CW"that\ thing"\fR. The portions of the string
+that match the portions of the pattern enclosed in parentheses are
+normally made available separately for use later in the pattern,
+substitution, or program. This is called "capturing", and it can get
+complicated. See "Capture groups".
+.PP
+The first alternative includes everything from the last pattern
+delimiter (\f(CW"("\fR, \f(CW"(?:"\fR (described later), \fIetc\fR. or the beginning
+of the pattern) up to the first \f(CW"|"\fR, and the last alternative
+contains everything from the last \f(CW"|"\fR to the next closing pattern
+delimiter. That's why it's common practice to include alternatives in
+parentheses: to minimize confusion about where they start and end.
+.PP
+Alternatives are tried from left to right, so the first alternative
+for which the entire expression matches is the one that is chosen.
+This means that alternatives are not necessarily greedy. For
+example: when matching \f(CW\*(C`foo|foot\*(C'\fR against \f(CW"barefoot"\fR, only the \f(CW"foo"\fR
+part will match, as that is the first alternative tried, and it successfully
+matches the target string. (This might not seem important, but it is
+important when you are capturing matched text using parentheses.)
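+.PP
+A minimal sketch of this ordering effect, using the same target string;
+\&\f(CW$&\fR holds the text that was matched:
+.PP
+.Vb 2
+\&    "barefoot" =~ /foo|foot/;   # $& is "foo"
+\&    "barefoot" =~ /foot|foo/;   # $& is "foot"
+.Ve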
+.PP
+Besides taking away the special meaning of a metacharacter, a prefixed
+backslash changes some letter and digit characters away from matching
+just themselves to instead have special meaning. These are called
+"escape sequences", and all such are described in perlrebackslash. A
+backslash sequence (of a letter or digit) that doesn't currently have
+special meaning to Perl will raise a warning if warnings are enabled,
+as those are reserved for potential future use.
+.PP
+One such sequence is \f(CW\*(C`\eb\*(C'\fR, which matches a boundary of some sort.
+\&\f(CW\*(C`\eb{wb}\*(C'\fR and a few others give specialized types of boundaries.
+(They are all described in detail starting at
+"\eb{}, \eb, \eB{}, \eB" in perlrebackslash.) Note that these don't match
+characters, but the zero-width spaces between characters. They are an
+example of a zero-width assertion. Consider again,
+.PP
+.Vb 1
+\& $foo =~ m/fee|fie|foe|fum/
+.Ve
+.PP
+It evaluates to TRUE if, besides those 4 words, any of the sequences
+"feed", "field", "Defoe", "fume", and many others are in \f(CW$foo\fR. By
+judicious use of \f(CW\*(C`\eb\*(C'\fR (or better (because it is designed to handle
+natural language) \f(CW\*(C`\eb{wb}\*(C'\fR), we can make sure that only the Giant's
+words are matched:
+.PP
+.Vb 2
+\& $foo =~ m/\eb(fee|fie|foe|fum)\eb/
+\& $foo =~ m/\eb{wb}(fee|fie|foe|fum)\eb{wb}/
+.Ve
+.PP
+The final example shows that the characters \f(CW"{"\fR and \f(CW"}"\fR are
+metacharacters.
+.PP
+Another use for escape sequences is to specify characters that cannot
+(or which you prefer not to) be written literally. These are described
+in detail in "Character Escapes" in perlrebackslash, but the next three
+paragraphs briefly describe some of them.
+.PP
+Various control characters can be written in C language style: \f(CW"\en"\fR
+matches a newline, \f(CW"\et"\fR a tab, \f(CW"\er"\fR a carriage return, \f(CW"\ef"\fR a
+form feed, \fIetc\fR.
+.PP
+More generally, \f(CW\*(C`\e\fR\f(CInnn\fR\f(CW\*(C'\fR, where \fInnn\fR is a string of three octal
+digits, matches the character whose native code point is \fInnn\fR. You
+can easily run into trouble if you don't have exactly three digits. So
+always use three, or since Perl 5.14, you can use \f(CW\*(C`\eo{...}\*(C'\fR to specify
+any number of octal digits.
+.PP
+Similarly, \f(CW\*(C`\ex\fR\f(CInn\fR\f(CW\*(C'\fR, where \fInn\fR are hexadecimal digits, matches the
+character whose native ordinal is \fInn\fR. Again, not using exactly two
+digits is a recipe for disaster, but you can use \f(CW\*(C`\ex{...}\*(C'\fR to specify
+any number of hex digits.
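+.PP
+A minimal sketch of these forms, assuming an ASCII platform where "A"
+has code point 65 (octal 101, hex 41); each match below succeeds:
+.PP
+.Vb 4
+\&    "A" =~ /\e101/;      # exactly three octal digits
+\&    "A" =~ /\eo{101}/;   # braced octal, Perl 5.14 and later
+\&    "A" =~ /\ex41/;      # exactly two hex digits
+\&    "A" =~ /\ex{41}/;    # braced hex
+.Ve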
+.PP
+Besides being a metacharacter, the \f(CW"."\fR is an example of a "character
+class", something that can match any single character of a given set of
+them. In its case, the set is just about all possible characters. Perl
+predefines several character classes besides the \f(CW"."\fR; there is a
+separate reference page about just these, perlrecharclass.
+.PP
+You can define your own custom character classes, by putting into your
+pattern in the appropriate place(s), a list of all the characters you
+want in the set. You do this by enclosing the list within \f(CW\*(C`[]\*(C'\fR bracket
+characters. These are called "bracketed character classes" when we are
+being precise, but often the word "bracketed" is dropped. (Dropping it
+usually doesn't cause confusion.) This means that the \f(CW"["\fR character
+is another metacharacter. It doesn't match anything just by itself; it
+is used only to tell Perl that what follows it is a bracketed character
+class. If you want to match a literal left square bracket, you must
+escape it, like \f(CW"\e["\fR. The matching \f(CW"]"\fR is also a metacharacter;
+again it doesn't match anything by itself, but just marks the end of
+your custom class to Perl. It is an example of a "sometimes
+metacharacter". It isn't a metacharacter if there is no corresponding
+\&\f(CW"["\fR, and matches its literal self:
+.PP
+.Vb 1
+\& print "]" =~ /]/; # prints 1
+.Ve
+.PP
+The list of characters within the character class gives the set of
+characters matched by the class. \f(CW"[abc]"\fR matches a single "a" or "b"
+or "c". But if the first character after the \f(CW"["\fR is \f(CW"^"\fR, the
+class instead matches any character not in the list. Within a list, the
+\&\f(CW"\-"\fR character specifies a range of characters, so that \f(CW\*(C`a\-z\*(C'\fR
+represents all characters between "a" and "z", inclusive. If you want
+either \f(CW"\-"\fR or \f(CW"]"\fR itself to be a member of a class, put it at the
+start of the list (possibly after a \f(CW"^"\fR), or escape it with a
+backslash. \f(CW"\-"\fR is also taken literally when it is at the end of the
+list, just before the closing \f(CW"]"\fR. (The following all specify the
+same class of three characters: \f(CW\*(C`[\-az]\*(C'\fR, \f(CW\*(C`[az\-]\*(C'\fR, and \f(CW\*(C`[a\e\-z]\*(C'\fR. All
+are different from \f(CW\*(C`[a\-z]\*(C'\fR, which specifies a class containing
+twenty-six characters, even on EBCDIC-based character sets.)
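+.PP
+A minimal sketch of how the position of \f(CW"\-"\fR changes the meaning of a
+class; the target strings are illustrative only:
+.PP
+.Vb 4
+\&    "m" =~ /[a\-z]/;     # true: range "a" through "z"
+\&    "m" =~ /[\-az]/;     # false: class is just "\-", "a", and "z"
+\&    "\-" =~ /[az\-]/;     # true: a trailing "\-" is literal
+\&    "m" =~ /[^a\-z]/;    # false: "^" negates the range
+.Ve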
+.PP
+There is lots more to bracketed character classes; full details are in
+"Bracketed Character Classes" in perlrecharclass.
+.PP
+\fIMetacharacters\fR
+.IX Xref "metacharacter \\ ^ . $ | ( () [ []"
+.IX Subsection "Metacharacters"
+.PP
+"The Basics" introduced some of the metacharacters. This section
+gives them all. Most of them have the same meaning as in the \fIegrep\fR
+command.
+.PP
+Only the \f(CW"\e"\fR is always a metacharacter. The others are metacharacters
+just sometimes. The following table lists all of them, summarizes
+their use, and gives the contexts where they are metacharacters.
+Outside those contexts or if prefixed by a \f(CW"\e"\fR, they match their
+corresponding punctuation character. In some cases, their meaning
+varies depending on various pattern modifiers that alter the default
+behaviors. See "Modifiers".
+.PP
+.Vb 10
+\&            PURPOSE                                WHERE
+\&  \e   Escape the next character                    Always, except when
+\&                                                   escaped by another \e
+\&  ^   Match the beginning of the string            Not in []
+\&        (or line, if /m is used)
+\&  ^   Complement the [] class                      At the beginning of []
+\&  .   Match any single character except newline    Not in []
+\&        (under /s, includes newline)
+\&  $   Match the end of the string                  Not in [], but can
+\&        (or before newline at the end of the       mean interpolate a
+\&        string; or before any newline if /m is     scalar
+\&        used)
+\&  |   Alternation                                  Not in []
+\&  ()  Grouping                                     Not in []
+\&  [   Start Bracketed Character class              Not in []
+\&  ]   End Bracketed Character class                Only in [], and
+\&                                                     not first
+\&  *   Matches the preceding element 0 or more      Not in []
+\&        times
+\&  +   Matches the preceding element 1 or more      Not in []
+\&        times
+\&  ?   Matches the preceding element 0 or 1         Not in []
+\&        times
+\&  {   Starts a sequence that gives number(s)       Not in []
+\&        of times the preceding element can be
+\&        matched
+\&  {   when following certain escape sequences
+\&        starts a modifier to the meaning of the
+\&        sequence
+\&  }   End sequence started by {
+\&  \-   Indicates a range                            Only in [] interior
+\&  #   Beginning of comment, extends to line end    Only with /x modifier
+.Ve
+.PP
+Notice that most of the metacharacters lose their special meaning when
+they occur in a bracketed character class, except \f(CW"^"\fR has a different
+meaning when it is at the beginning of such a class. And \f(CW"\-"\fR and \f(CW"]"\fR
+are metacharacters only at restricted positions within bracketed
+character classes; while \f(CW"}"\fR is a metacharacter only when closing a
+special construct started by \f(CW"{"\fR.
+.PP
+In double-quotish context, as is usually the case, you need to be
+careful about \f(CW"$"\fR and the non-metacharacter \f(CW"@"\fR. Those could
+interpolate variables, which may or may not be what you intended.
+.PP
+These rules were designed for compactness of expression, rather than
+legibility and maintainability. The "/x and /xx" pattern
+modifiers allow you to insert white space to improve readability. And
+use of \f(CW\*(C`re\ \*(Aqstrict\*(Aq\*(C'\fR adds extra checking to
+catch some typos that might silently compile into something unintended.
+.PP
+By default, the \f(CW"^"\fR character is guaranteed to match only the
+beginning of the string, the \f(CW"$"\fR character only the end (or before the
+newline at the end), and Perl does certain optimizations with the
+assumption that the string contains only one line. Embedded newlines
+will not be matched by \f(CW"^"\fR or \f(CW"$"\fR. You may, however, wish to treat a
+string as a multi-line buffer, such that the \f(CW"^"\fR will match after any
+newline within the string (except if the newline is the last character in
+the string), and \f(CW"$"\fR will match before any newline. At the
+cost of a little more overhead, you can do this by using the
+\&\f(CW"/m"\fR modifier on the pattern match operator. (Older programs
+did this by setting \f(CW$*\fR, but this option was removed in perl 5.10.)
+.IX Xref "^ $ m"
+.PP
+To simplify multi-line substitutions, the \f(CW"."\fR character never matches a
+newline unless you use the \f(CW\*(C`/s\*(C'\fR modifier, which in effect tells
+Perl to pretend the string is a single line\-\-even if it isn't.
+.IX Xref ". s"
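+.PP
+A minimal sketch of both behaviors; the variable names are illustrative
+only:
+.PP
+.Vb 5
+\&    my $text  = "one\entwo\en";
+\&    my @none  = $text =~ /^(\ew+)$/g;    # empty list
+\&    my @lines = $text =~ /^(\ew+)$/mg;   # ("one", "two")
+\&    "a\enb" =~ /a.b/;                    # false: "." does not match newline
+\&    "a\enb" =~ /a.b/s;                   # true
+.Ve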
+.SS Modifiers
+.IX Subsection "Modifiers"
+\fIOverview\fR
+.IX Subsection "Overview"
+.PP
+The default behavior for matching can be changed, using various
+modifiers. Modifiers that relate to the interpretation of the pattern
+are listed just below. Modifiers that alter the way a pattern is used
+by Perl are detailed in "Regexp Quote-Like Operators" in perlop and
+"Gory details of parsing quoted constructs" in perlop. Modifiers can be added
+dynamically; see "Extended Patterns" below.
+.ie n .IP "\fR\fB""m""\fR\fB\fR" 4
+.el .IP \fR\f(CBm\fR\fB\fR 4
+.IX Xref " m regex, multiline regexp, multiline regular expression, multiline"
+.IX Item "m"
+Treat the string being matched against as multiple lines. That is, change \f(CW"^"\fR and \f(CW"$"\fR from matching
+the start of the string's first line and the end of its last line to
+matching the start and end of each line within the string.
+.ie n .IP "\fR\fB""s""\fR\fB\fR" 4
+.el .IP \fR\f(CBs\fR\fB\fR 4
+.IX Xref " s regex, single-line regexp, single-line regular expression, single-line"
+.IX Item "s"
+Treat the string as single line. That is, change \f(CW"."\fR to match any character
+whatsoever, even a newline, which normally it would not match.
+.Sp
+Used together, as \f(CW\*(C`/ms\*(C'\fR, they let the \f(CW"."\fR match any character whatsoever,
+while still allowing \f(CW"^"\fR and \f(CW"$"\fR to match, respectively, just after
+and just before newlines within the string.
+.ie n .IP "\fR\fB""i""\fR\fB\fR" 4
+.el .IP \fR\f(CBi\fR\fB\fR 4
+.IX Xref " i regex, case-insensitive regexp, case-insensitive regular expression, case-insensitive"
+.IX Item "i"
+Do case-insensitive pattern matching. For example, "A" will match "a"
+under \f(CW\*(C`/i\*(C'\fR.
+.Sp
+If locale matching rules are in effect, the case map is taken from the
+current locale for code points less than 256, and from Unicode rules for
+larger code points. However, matches that would cross the Unicode
+rules/non\-Unicode rules boundary (ords 255/256) will not succeed, unless
+the locale is a UTF\-8 one. See perllocale.
+.Sp
+There are a number of Unicode characters that match a sequence of
+multiple characters under \f(CW\*(C`/i\*(C'\fR. For example,
+\&\f(CW\*(C`LATIN SMALL LIGATURE FI\*(C'\fR should match the sequence \f(CW\*(C`fi\*(C'\fR. Perl is not
+currently able to do this when the multiple characters are in the pattern and
+are split between groupings, or when one or more are quantified. Thus
+.Sp
+.Vb 3
+\& "\eN{LATIN SMALL LIGATURE FI}" =~ /fi/i; # Matches
+\& "\eN{LATIN SMALL LIGATURE FI}" =~ /[fi][fi]/i; # Doesn\*(Aqt match!
+\& "\eN{LATIN SMALL LIGATURE FI}" =~ /fi*/i; # Doesn\*(Aqt match!
+\&
+\& # The below doesn\*(Aqt match, and it isn\*(Aqt clear what $1 and $2 would
+\& # be even if it did!!
+\& "\eN{LATIN SMALL LIGATURE FI}" =~ /(f)(i)/i; # Doesn\*(Aqt match!
+.Ve
+.Sp
+Perl doesn't match multiple characters in a bracketed
+character class unless the character that maps to them is explicitly
+mentioned, and it doesn't match them at all if the character class is
+inverted, which otherwise could be highly confusing. See
+"Bracketed Character Classes" in perlrecharclass, and
+"Negation" in perlrecharclass.
+.ie n .IP "\fR\fB""x""\fR\fB\fR and \fB\fR\fB""xx""\fR\fB\fR" 4
+.el .IP "\fR\f(CBx\fR\fB\fR and \fB\fR\f(CBxx\fR\fB\fR" 4
+.IX Xref " x"
+.IX Item "x and xx"
+Extend your pattern's legibility by permitting whitespace and comments.
+Details are in "/x and /xx".
+.ie n .IP "\fR\fB""p""\fR\fB\fR" 4
+.el .IP \fR\f(CBp\fR\fB\fR 4
+.IX Xref " p regex, preserve regexp, preserve"
+.IX Item "p"
+Preserve the string matched such that \f(CW\*(C`${^PREMATCH}\*(C'\fR, \f(CW\*(C`${^MATCH}\*(C'\fR, and
+\&\f(CW\*(C`${^POSTMATCH}\*(C'\fR are available for use after matching.
+.Sp
+In Perl 5.20 and higher this is ignored. Due to a new copy-on-write
+mechanism, \f(CW\*(C`${^PREMATCH}\*(C'\fR, \f(CW\*(C`${^MATCH}\*(C'\fR, and \f(CW\*(C`${^POSTMATCH}\*(C'\fR will be available
+after the match regardless of the modifier.
+.ie n .IP "\fR\fB""a""\fR\fB\fR, \fB\fR\fB""d""\fR\fB\fR, \fB\fR\fB""l""\fR\fB\fR, and \fB\fR\fB""u""\fR\fB\fR" 4
+.el .IP "\fR\f(CBa\fR\fB\fR, \fB\fR\f(CBd\fR\fB\fR, \fB\fR\f(CBl\fR\fB\fR, and \fB\fR\f(CBu\fR\fB\fR" 4
+.IX Xref " a d l u"
+.IX Item "a, d, l, and u"
+These modifiers, all new in 5.14, affect which character-set rules
+(Unicode, \fIetc\fR.) are used, as described below in
+"Character set modifiers".
+.ie n .IP "\fR\fB""n""\fR\fB\fR" 4
+.el .IP \fR\f(CBn\fR\fB\fR 4
+.IX Xref " n regex, non-capture regexp, non-capture regular expression, non-capture"
+.IX Item "n"
+Prevent the grouping metacharacters \f(CW\*(C`()\*(C'\fR from capturing. This modifier,
+new in 5.22, will stop \f(CW$1\fR, \f(CW$2\fR, \fIetc\fR... from being filled in.
+.Sp
+.Vb 2
+\& "hello" =~ /(hi|hello)/; # $1 is "hello"
+\& "hello" =~ /(hi|hello)/n; # $1 is undef
+.Ve
+.Sp
+This is equivalent to putting \f(CW\*(C`?:\*(C'\fR at the beginning of every capturing group:
+.Sp
+.Vb 1
+\& "hello" =~ /(?:hi|hello)/; # $1 is undef
+.Ve
+.Sp
+\&\f(CW\*(C`/n\*(C'\fR can be negated on a per-group basis. Alternatively, named captures
+may still be used.
+.Sp
+.Vb 3
+\& "hello" =~ /(?\-n:(hi|hello))/n; # $1 is "hello"
+\& "hello" =~ /(?<greet>hi|hello)/n; # $1 is "hello", $+{greet} is
+\& # "hello"
+.Ve
+.IP "Other Modifiers" 4
+.IX Item "Other Modifiers"
+There are a number of flags that can be found at the end of regular
+expression constructs that are \fInot\fR generic regular expression flags, but
+apply to the operation being performed, like matching or substitution (\f(CW\*(C`m//\*(C'\fR
+or \f(CW\*(C`s///\*(C'\fR respectively).
+.Sp
+Flags described further in
+"Using regular expressions in Perl" in perlretut are:
+.Sp
+.Vb 2
+\& c \- keep the current position during repeated matching
+\& g \- globally match the pattern repeatedly in the string
+.Ve
+.Sp
+Substitution-specific modifiers described in
+"s/PATTERN/REPLACEMENT/msixpodualngcer" in perlop are:
+.Sp
+.Vb 4
+\& e \- evaluate the right\-hand side as an expression
+\& ee \- evaluate the right side as a string then eval the result
+\& o \- pretend to optimize your code, but actually introduce bugs
+\& r \- perform non\-destructive substitution and return the new value
+.Ve
+.PP
+Regular expression modifiers are usually written in documentation
+as \fIe.g.\fR, "the \f(CW\*(C`/x\*(C'\fR modifier", even though the delimiter
+in question might not really be a slash. The modifiers \f(CW\*(C`/imnsxadlup\*(C'\fR
+may also be embedded within the regular expression itself using
+the \f(CW\*(C`(?...)\*(C'\fR construct, see "Extended Patterns" below.
+.PP
+\fIDetails on some modifiers\fR
+.IX Subsection "Details on some modifiers"
+.PP
+Some of the modifiers require more explanation than given in the
+"Overview" above.
+.PP
+\f(CW\*(C`/x\*(C'\fR and \f(CW\*(C`/xx\*(C'\fR
+.IX Subsection "/x and /xx"
+.PP
+A single \f(CW\*(C`/x\*(C'\fR tells
+the regular expression parser to ignore most whitespace that is neither
+backslashed nor within a bracketed character class, nor within the characters
+of a multi-character metapattern like \f(CW\*(C`(?i: ... )\*(C'\fR. You can use this to
+break up your regular expression into more readable parts.
+Also, the \f(CW"#"\fR character is treated as a metacharacter introducing a
+comment that runs up to the pattern's closing delimiter, or to the end
+of the current line if the pattern extends onto the next line. Hence,
+this is very much like an ordinary Perl code comment. (You can include
+the closing delimiter within the comment only if you precede it with a
+backslash, so be careful!)
+.PP
+Use of \f(CW\*(C`/x\*(C'\fR means that if you want real
+whitespace or \f(CW"#"\fR characters in the pattern (outside a bracketed character
+class, which is unaffected by \f(CW\*(C`/x\*(C'\fR), then you'll either have to
+escape them (using backslashes or \f(CW\*(C`\eQ...\eE\*(C'\fR) or encode them using octal,
+hex, or \f(CW\*(C`\eN{}\*(C'\fR or \f(CW\*(C`\ep{name=...}\*(C'\fR escapes.
+It is ineffective to try to continue a comment onto the next line by
+escaping the \f(CW\*(C`\en\*(C'\fR with a backslash or \f(CW\*(C`\eQ\*(C'\fR.
+.PP
+You can use "(?#text)" to create a comment that ends earlier than the
+end of the current line, but \f(CW\*(C`text\*(C'\fR also can't contain the closing
+delimiter unless escaped with a backslash.
+.PP
+A common pitfall is to forget that \f(CW"#"\fR characters (outside a
+bracketed character class) begin a comment under \f(CW\*(C`/x\*(C'\fR and are not
+matched literally. Just keep that in mind when trying to puzzle out why
+a particular \f(CW\*(C`/x\*(C'\fR pattern isn't working as expected.
+Inside a bracketed character class, \f(CW"#"\fR retains its non-special,
+literal meaning.
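+.PP
+A minimal sketch of the pitfall and two ways around it; the target
+strings are illustrative only:
+.PP
+.Vb 4
+\&    "ac"  =~ /a#b/x;     # true: the pattern is just "a"; "#b" is a comment
+\&    "a#b" =~ /a\e#b/x;    # true: an escaped "#" is literal
+\&    "a#b" =~ /a[#]b/x;   # true: "#" inside a class is literal
+\&    "ac"  =~ /a\e#b/x;    # false
+.Ve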
+.PP
+Starting in Perl v5.26, if the modifier has a second \f(CW"x"\fR within it,
+the effect of a single \f(CW\*(C`/x\*(C'\fR is increased. The only difference is that
+inside bracketed character classes, non-escaped (by a backslash) SPACE
+and TAB characters are not added to the class, and hence can be inserted
+to make the classes more readable:
+.PP
+.Vb 2
+\& / [d\-e g\-i 3\-7]/xx
+\& /[ ! @ " # $ % ^ & * () = ? <> \*(Aq ]/xx
+.Ve
+.PP
+may be easier to grasp than the squashed equivalents
+.PP
+.Vb 2
+\& /[d\-eg\-i3\-7]/
+\& /[!@"#$%^&*()=?<>\*(Aq]/
+.Ve
+.PP
+Note that this unfortunately doesn't mean that your bracketed classes
+can contain comments or extend over multiple lines. A \f(CW\*(C`#\*(C'\fR inside a
+character class is still just a literal \f(CW\*(C`#\*(C'\fR, and doesn't introduce a
+comment. And, unless the closing bracket is on the same line as the
+opening one, the newline character (and everything on the next line(s)
+until terminated by a \f(CW\*(C`]\*(C'\fR) will be part of the class, just as if you'd
+written \f(CW\*(C`\en\*(C'\fR.
+.PP
+Taken together, these features go a long way towards
+making Perl's regular expressions more readable. Here's an example:
+.PP
+.Vb 6
+\&    # Delete (most) C comments.
+\&    $program =~ s {
+\&        /\e*     # Match the opening delimiter.
+\&        .*?     # Match a minimal number of characters.
+\&        \e*/     # Match the closing delimiter.
+\&    } []gsx;
+.Ve
+.PP
+Note that anything inside
+a \f(CW\*(C`\eQ...\eE\*(C'\fR stays unaffected by \f(CW\*(C`/x\*(C'\fR. And note that \f(CW\*(C`/x\*(C'\fR doesn't affect
+space interpretation within a single multi-character construct. For
+example \f(CW\*(C`(?:...)\*(C'\fR can't have a space between the \f(CW"("\fR,
+\&\f(CW"?"\fR, and \f(CW":"\fR. Within any delimiters for such a construct, allowed
+spaces are not affected by \f(CW\*(C`/x\*(C'\fR, and depend on the construct. For
+example, all constructs using curly braces as delimiters, such as
+\&\f(CW\*(C`\ex{...}\*(C'\fR, can have blanks inside the braces only where they are
+adjacent to a brace, and cannot contain any other kind of white space.
+Unicode properties are an exception: they follow Unicode rules, for which see
+"Properties accessible through \ep{} and \eP{}" in perluniprops.
+.IX Xref " x"
+.PP
+The set of characters that are deemed whitespace are those that Unicode
+calls "Pattern White Space", namely:
+.PP
+.Vb 11
+\& U+0009 CHARACTER TABULATION
+\& U+000A LINE FEED
+\& U+000B LINE TABULATION
+\& U+000C FORM FEED
+\& U+000D CARRIAGE RETURN
+\& U+0020 SPACE
+\& U+0085 NEXT LINE
+\& U+200E LEFT\-TO\-RIGHT MARK
+\& U+200F RIGHT\-TO\-LEFT MARK
+\& U+2028 LINE SEPARATOR
+\& U+2029 PARAGRAPH SEPARATOR
+.Ve
+.PP
+Character set modifiers
+.IX Subsection "Character set modifiers"
+.PP
+\&\f(CW\*(C`/d\*(C'\fR, \f(CW\*(C`/u\*(C'\fR, \f(CW\*(C`/a\*(C'\fR, and \f(CW\*(C`/l\*(C'\fR, available starting in 5.14, are called
+the character set modifiers; they affect the character set rules
+used for the regular expression.
+.PP
+The \f(CW\*(C`/d\*(C'\fR, \f(CW\*(C`/u\*(C'\fR, and \f(CW\*(C`/l\*(C'\fR modifiers are not likely to be of much use
+to you, and so you need not worry about them very much. They exist for
+Perl's internal use, so that complex regular expression data structures
+can be automatically serialized and later exactly reconstituted,
+including all their nuances. But, since Perl can't keep a secret, and
+there may be rare instances where they are useful, they are documented
+here.
+.PP
+The \f(CW\*(C`/a\*(C'\fR modifier, on the other hand, may be useful. Its purpose is to
+allow code that is to work mostly on ASCII data to not have to concern
+itself with Unicode.
+.PP
+Briefly, \f(CW\*(C`/l\*(C'\fR sets the character set to that of whatever \fBL\fRocale is in
+effect at the time of the execution of the pattern match.
+.PP
+\&\f(CW\*(C`/u\*(C'\fR sets the character set to \fBU\fRnicode.
+.PP
+\&\f(CW\*(C`/a\*(C'\fR also sets the character set to Unicode, BUT adds several
+restrictions for \fBA\fRSCII-safe matching.
+.PP
+\&\f(CW\*(C`/d\*(C'\fR is the old, problematic, pre\-5.14 \fBD\fRefault character set
+behavior. Its only use is to force that old behavior.
+.PP
+At any given time, exactly one of these modifiers is in effect. Their
+existence allows Perl to keep the originally compiled behavior of a
+regular expression, regardless of what rules are in effect when it is
+actually executed. And if it is interpolated into a larger regex, the
+original's rules continue to apply to it, and don't affect the other
+parts.
+.PP
+The \f(CW\*(C`/l\*(C'\fR and \f(CW\*(C`/u\*(C'\fR modifiers are automatically selected for
+regular expressions compiled within the scope of various pragmas,
+and we recommend that in general, you use those pragmas instead of
+specifying these modifiers explicitly. For one thing, the modifiers
+affect only pattern matching, and do not extend to even any replacement
+done, whereas using the pragmas gives consistent results for all
+appropriate operations within their scopes. For example,
+.PP
+.Vb 1
+\& s/foo/\eUbar/il
+.Ve
+.PP
+will match "foo" using the locale's rules for case-insensitive matching,
+but the \f(CW\*(C`/l\*(C'\fR does not affect how the \f(CW\*(C`\eU\*(C'\fR operates. Most likely you
+want both of them to use locale rules. To do this, instead compile the
+regular expression within the scope of \f(CW\*(C`use locale\*(C'\fR. This both
+implicitly adds the \f(CW\*(C`/l\*(C'\fR, and applies locale rules to the \f(CW\*(C`\eU\*(C'\fR. The
+lesson is to \f(CW\*(C`use locale\*(C'\fR, and not \f(CW\*(C`/l\*(C'\fR explicitly.
+.PP
+Similarly, it would be better to use \f(CW\*(C`use feature \*(Aqunicode_strings\*(Aq\*(C'\fR
+instead of,
+.PP
+.Vb 1
+\& s/foo/\eLbar/iu
+.Ve
+.PP
+to get Unicode rules, as the \f(CW\*(C`\eL\*(C'\fR in the former (but not necessarily
+the latter) would also use Unicode rules.
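+.PP
+As a minimal sketch of that difference, assuming an ASCII platform (the
+sample text and variable names are illustrative only), the following
+runs the same substitution once with an explicit \f(CW\*(C`/u\*(C'\fR and once inside the
+scope of the feature, and records what \f(CW\*(C`\eL\*(C'\fR does to the code point 0xC0:

```perl
use strict;
use warnings;

my $text = "foo";    # hypothetical sample input

# Outside the feature's scope: /u governs only the match, so \L uses
# native rules and leaves the non-ASCII code point 0xC0 unchanged.
(my $modifier_only = $text) =~ s/foo/\L\xC0/iu;

my $with_feature;
{
    use feature 'unicode_strings';

    # Inside the scope the pattern gets /u by default, and \L also
    # follows Unicode rules, so 0xC0 is lowercased to 0xE0.
    ($with_feature = $text) =~ s/foo/\L\xC0/i;
}

printf "modifier only: 0x%02X\n", ord $modifier_only;
printf "with feature:  0x%02X\n", ord $with_feature;
```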
+.PP
+More detail on each of the modifiers follows. Most likely you don't
+need to know this detail for \f(CW\*(C`/l\*(C'\fR, \f(CW\*(C`/u\*(C'\fR, and \f(CW\*(C`/d\*(C'\fR, and can skip ahead
+to /a.
+.PP
+/l
+.IX Subsection "/l"
+.PP
+means to use the current locale's rules (see perllocale) when pattern
+matching. For example, \f(CW\*(C`\ew\*(C'\fR will match the "word" characters of that
+locale, and \f(CW"/i"\fR case-insensitive matching will match according to
+the locale's case folding rules. The locale used will be the one in
+effect at the time of execution of the pattern match. This may not be
+the same as the compilation-time locale, and can differ from one match
+to another if there is an intervening call of the
+\&\fBsetlocale()\fR function.
+.PP
+Prior to v5.20, Perl did not support multi-byte locales. Starting then,
+UTF\-8 locales are supported. No other multi-byte locales are ever
+likely to be supported. However, in all locales, one can have code
+points above 255 and these will always be treated as Unicode no matter
+what locale is in effect.
+.PP
+Under Unicode rules, there are a few case-insensitive matches that cross
+the 255/256 boundary. Except for UTF\-8 locales in Perls v5.20 and
+later, these are disallowed under \f(CW\*(C`/l\*(C'\fR. For example, 0xFF (on ASCII
+platforms) does not caselessly match the character at 0x178, \f(CW\*(C`LATIN
+CAPITAL LETTER Y WITH DIAERESIS\*(C'\fR, because 0xFF may not be \f(CW\*(C`LATIN SMALL
+LETTER Y WITH DIAERESIS\*(C'\fR in the current locale, and Perl has no way of
+knowing if that character even exists in the locale, much less what code
+point it is.
+.PP
+In a UTF\-8 locale in v5.20 and later, the only visible difference
+between locale and non-locale in regular expressions should be tainting,
+if your perl supports taint checking (see perlsec).
+.PP
+This modifier may be specified to be the default by \f(CW\*(C`use locale\*(C'\fR, but
+see "Which character set modifier is in effect?".
+.IX Xref " l"
+.PP
+/u
+.IX Subsection "/u"
+.PP
+means to use Unicode rules when pattern matching. On ASCII platforms,
+this means that the code points between 128 and 255 take on their
+Latin\-1 (ISO\-8859\-1) meanings (which are the same as Unicode's).
+(Otherwise Perl considers their meanings to be undefined.) Thus,
+under this modifier, the ASCII platform effectively becomes a Unicode
+platform; and hence, for example, \f(CW\*(C`\ew\*(C'\fR will match any of the more than
+100_000 word characters in Unicode.
+.PP
+Unlike most locales, which are specific to a language and country pair,
+Unicode classifies all the characters that are letters \fIsomewhere\fR in
+the world as
+\&\f(CW\*(C`\ew\*(C'\fR. For example, your locale might not think that \f(CW\*(C`LATIN SMALL
+LETTER ETH\*(C'\fR is a letter (unless you happen to speak Icelandic), but
+Unicode does. Similarly, all the characters that are decimal digits
+somewhere in the world will match \f(CW\*(C`\ed\*(C'\fR; this is hundreds, not 10,
+possible matches. And some of those digits look like some of the 10
+ASCII digits, but mean a different number, so a human could easily think
+a number is a different quantity than it really is. For example,
+\&\f(CW\*(C`BENGALI DIGIT FOUR\*(C'\fR (U+09EA) looks very much like an
+\&\f(CW\*(C`ASCII DIGIT EIGHT\*(C'\fR (U+0038), and \f(CW\*(C`LEPCHA DIGIT SIX\*(C'\fR (U+1C46) looks
+very much like an \f(CW\*(C`ASCII DIGIT FIVE\*(C'\fR (U+0035). And \f(CW\*(C`\ed+\*(C'\fR may match
+strings of digits that are a mixture from different writing systems,
+creating a security issue. A fraudulent website, for example, could
+display the price of something using U+1C46, and it would appear to the
+user that something cost 500 units, but it really costs 600. A browser
+that enforced script runs ("Script Runs") would prevent that
+fraudulent display. "\fBnum()\fR" in Unicode::UCD can also be used to sort this
+out. Or the \f(CW\*(C`/a\*(C'\fR modifier can be used to force \f(CW\*(C`\ed\*(C'\fR to match just the
+ASCII 0 through 9.
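+.PP
+The mixed-script risk described above can be sketched as follows. The
+price string is a made-up example, and the variable names are
+hypothetical; the only library call used is \f(CW\*(C`num()\*(C'\fR from Unicode::UCD.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Hypothetical spoofed price: LEPCHA DIGIT SIX followed by two ASCII zeros.
my $price = "\x{1C46}00";

# Under /u every Unicode decimal digit satisfies \d, so the mix is accepted.
my $digits_under_u = $price =~ /\A\d+\z/u ? 1 : 0;

# Under /a only ASCII 0 through 9 satisfy \d, so the string is rejected.
my $digits_under_a = $price =~ /\A\d+\z/a ? 1 : 0;

# num() gives a value only when the whole string is a safe number,
# which requires all digits to come from the same script.
my $value = num($price);

print "accepted under /u: $digits_under_u\n";
print "accepted under /a: $digits_under_a\n";
print "num() value: ", (defined $value ? $value : "undef"), "\n";
```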
+.PP
+Also, under this modifier, case-insensitive matching works on the full
+set of Unicode
+characters. The \f(CW\*(C`KELVIN SIGN\*(C'\fR, for example, matches the letters "k" and
+"K"; and \f(CW\*(C`LATIN SMALL LIGATURE FF\*(C'\fR matches the sequence "ff", which,
+if you're not prepared, might make it look like a hexadecimal constant,
+presenting another potential security issue. See
+<https://unicode.org/reports/tr36> for a detailed discussion of Unicode
+security issues.
+.PP
+This modifier may be specified to be the default by \f(CW\*(C`use feature
+\&\*(Aqunicode_strings\*(Aq\*(C'\fR, \f(CW\*(C`use locale \*(Aq:not_characters\*(Aq\*(C'\fR, or
+\&\f(CW\*(C`use v5.12\*(C'\fR (or higher),
+but see "Which character set modifier is in effect?".
+.IX Xref " u"
+.PP
+/d
+.IX Subsection "/d"
+.PP
+\&\fBIMPORTANT:\fR Because of the unpredictable behaviors this
+modifier causes, only use it to maintain weird backward compatibilities.
+Use the
+\&\f(CW\*(C`unicode_strings\*(C'\fR
+feature
+in new code to avoid inadvertently enabling this modifier by default.
+.PP
+What does this modifier do? It "Depends"!
+.PP
+This modifier means to use platform-native matching rules
+except when there is cause to use Unicode rules instead, as follows:
+.IP 1. 4
+the target string's UTF8 flag
+(see below) is set; or
+.IP 2. 4
+the pattern's UTF8 flag
+(see below) is set; or
+.IP 3. 4
+the pattern explicitly mentions a code point that is above 255 (say by
+\&\f(CW\*(C`\ex{100}\*(C'\fR); or
+.IP 4. 4
+the pattern uses a Unicode name (\f(CW\*(C`\eN{...}\*(C'\fR); or
+.IP 5. 4
+the pattern uses a Unicode property (\f(CW\*(C`\ep{...}\*(C'\fR or \f(CW\*(C`\eP{...}\*(C'\fR); or
+.IP 6. 4
+the pattern uses a Unicode break (\f(CW\*(C`\eb{...}\*(C'\fR or \f(CW\*(C`\eB{...}\*(C'\fR); or
+.IP 7. 4
+the pattern uses \f(CW"(?[ ])"\fR; or
+.IP 8. 4
+the pattern uses \f(CW\*(C`(*script_run: ...)\*(C'\fR
+.PP
+Regarding the "UTF8 flag" references above: normally Perl applications
+shouldn't think about that flag. It's part of Perl's internals,
+so it can change whenever Perl wants. \f(CW\*(C`/d\*(C'\fR may thus cause unpredictable
+results. See "The "Unicode Bug"" in perlunicode. This bug
+has become rather infamous, leading to yet other (without swearing) names
+for this modifier like "Dicey" and "Dodgy".
+.PP
+Here are some examples of how that works on an ASCII platform:
+.PP
+.Vb 3
+\& $str = "\exDF"; #
+\& utf8::downgrade($str); # $str is not UTF8\-flagged.
+\& $str =~ /^\ew/; # No match, since no UTF8 flag.
+\&
+\& $str .= "\ex{0e0b}"; # Now $str is UTF8\-flagged.
+\& $str =~ /^\ew/; # Match! $str is now UTF8\-flagged.
+\& chop $str;
+\& $str =~ /^\ew/; # Still a match! $str retains its UTF8 flag.
+.Ve
+.PP
+Under Perl's default configuration this modifier is automatically
+selected by default when none of the others are, so yet another name
+for it (unfortunately) is "Default".
+.PP
+Whenever you can, use the
+\&\f(CW\*(C`unicode_strings\*(C'\fR
+feature to cause \f(CW\*(C`/u\*(C'\fR to be the default instead.
+.IX Xref " u"
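+.PP
+A small sketch of the alternatives, assuming an ASCII platform (the
+variable names are illustrative): the same non-UTF8\-flagged string is
+tested with an explicit \f(CW\*(C`/d\*(C'\fR, an explicit \f(CW\*(C`/u\*(C'\fR, and with no modifier
+inside the scope of the feature.

```perl
use strict;
use warnings;

my $str = "\xDF";          # LATIN SMALL LETTER SHARP S as a native byte
utf8::downgrade($str);     # make sure the string is not UTF8-flagged

# /d: native rules apply because nothing forces Unicode rules.
my $under_d = $str =~ /^\w/d ? 1 : 0;

# /u: Unicode rules apply regardless of the string's internal flag.
my $under_u = $str =~ /^\w/u ? 1 : 0;

# The feature makes /u the default for patterns compiled in its scope.
my $under_feature;
{
    use feature 'unicode_strings';
    $under_feature = $str =~ /^\w/ ? 1 : 0;
}

print "/d: $under_d  /u: $under_u  feature: $under_feature\n";
```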
+.PP
+/a (and /aa)
+.IX Subsection "/a (and /aa)"
+.PP
+This modifier stands for ASCII-restrict (or ASCII-safe). This modifier
+may be doubled-up to increase its effect.
+.PP
+When it appears singly, it causes the sequences \f(CW\*(C`\ed\*(C'\fR, \f(CW\*(C`\es\*(C'\fR, \f(CW\*(C`\ew\*(C'\fR, and
+the Posix character classes to match only in the ASCII range. They thus
+revert to their pre\-5.6, pre-Unicode meanings. Under \f(CW\*(C`/a\*(C'\fR, \f(CW\*(C`\ed\*(C'\fR
+always means precisely the digits \f(CW"0"\fR to \f(CW"9"\fR; \f(CW\*(C`\es\*(C'\fR means the five
+characters \f(CW\*(C`[ \ef\en\er\et]\*(C'\fR, and starting in Perl v5.18, the vertical tab;
+\&\f(CW\*(C`\ew\*(C'\fR means the 63 characters
+\&\f(CW\*(C`[A\-Za\-z0\-9_]\*(C'\fR; and likewise, all the Posix classes such as
+\&\f(CW\*(C`[[:print:]]\*(C'\fR match only the appropriate ASCII-range characters.
+.PP
+This modifier is useful for people who only incidentally use Unicode,
+and who do not wish to be burdened with its complexities and security
+concerns.
+.PP
+With \f(CW\*(C`/a\*(C'\fR, one can write \f(CW\*(C`\ed\*(C'\fR with confidence that it will only match
+ASCII characters, and should the need arise to match beyond ASCII, you
+can instead use \f(CW\*(C`\ep{Digit}\*(C'\fR (or \f(CW\*(C`\ep{Word}\*(C'\fR for \f(CW\*(C`\ew\*(C'\fR). There are
+similar \f(CW\*(C`\ep{...}\*(C'\fR constructs that can match beyond ASCII both white
+space (see "Whitespace" in perlrecharclass), and Posix classes (see
+"POSIX Character Classes" in perlrecharclass). Thus, this modifier
+doesn't mean you can't use Unicode, it means that to get Unicode
+matching you must explicitly use a construct (\f(CW\*(C`\ep{}\*(C'\fR, \f(CW\*(C`\eP{}\*(C'\fR) that
+signals Unicode.
+.PP
+As you would expect, this modifier causes, for example, \f(CW\*(C`\eD\*(C'\fR to mean
+the same thing as \f(CW\*(C`[^0\-9]\*(C'\fR; in fact, all non-ASCII characters match
+\&\f(CW\*(C`\eD\*(C'\fR, \f(CW\*(C`\eS\*(C'\fR, and \f(CW\*(C`\eW\*(C'\fR. \f(CW\*(C`\eb\*(C'\fR still means to match at the boundary
+between \f(CW\*(C`\ew\*(C'\fR and \f(CW\*(C`\eW\*(C'\fR, using the \f(CW\*(C`/a\*(C'\fR definitions of them (similarly
+for \f(CW\*(C`\eB\*(C'\fR).
+.PP
+Otherwise, \f(CW\*(C`/a\*(C'\fR behaves like the \f(CW\*(C`/u\*(C'\fR modifier, in that
+case-insensitive matching uses Unicode rules; for example, "k" will
+match the Unicode \f(CW\*(C`\eN{KELVIN SIGN}\*(C'\fR under \f(CW\*(C`/i\*(C'\fR matching, and code
+points in the Latin1 range, above ASCII, will have Unicode rules when it
+comes to case-insensitive matching.
+.PP
+To forbid ASCII/non\-ASCII matches (like "k" with \f(CW\*(C`\eN{KELVIN SIGN}\*(C'\fR),
+specify the \f(CW"a"\fR twice, for example \f(CW\*(C`/aai\*(C'\fR or \f(CW\*(C`/aia\*(C'\fR. (The first
+occurrence of \f(CW"a"\fR restricts the \f(CW\*(C`\ed\*(C'\fR, \fIetc\fR., and the second occurrence
+adds the \f(CW\*(C`/i\*(C'\fR restrictions.) But, note that code points outside the
+ASCII range will use Unicode rules for \f(CW\*(C`/i\*(C'\fR matching, so the modifier
+doesn't really restrict things to just ASCII; it just forbids the
+intermixing of ASCII and non-ASCII.
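+.PP
+The following sketch (variable names are illustrative) contrasts the
+single and doubled forms for case-insensitive matching, and also shows
+that a pair of non-ASCII characters still folds together under \f(CW\*(C`/aa\*(C'\fR:

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";    # KELVIN SIGN

# /u and /a both allow the ASCII "k" to match the non-ASCII KELVIN SIGN.
my $with_u  = $kelvin =~ /\Ak\z/ui  ? 1 : 0;
my $with_a  = $kelvin =~ /\Ak\z/ai  ? 1 : 0;

# /aa forbids an ASCII character from matching a non-ASCII one.
my $with_aa = $kelvin =~ /\Ak\z/aai ? 1 : 0;

# Two non-ASCII characters (0xC9 and 0xE9, E WITH ACUTE in upper and
# lower case) still match each other under /aa via Unicode rules.
my $non_ascii_pair = "\xC9" =~ /\A\xE9\z/aai ? 1 : 0;

print "u=$with_u a=$with_a aa=$with_aa non-ASCII pair=$non_ascii_pair\n";
```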
+.PP
+To summarize, this modifier provides protection for applications that
+don't wish to be exposed to all of Unicode. Specifying it twice
+gives added protection.
+.PP
+This modifier may be specified to be the default by \f(CW\*(C`use re \*(Aq/a\*(Aq\*(C'\fR
+or \f(CW\*(C`use re \*(Aq/aa\*(Aq\*(C'\fR. If you do so, you may actually have occasion to use
+the \f(CW\*(C`/u\*(C'\fR modifier explicitly if there are a few regular expressions
+where you do want full Unicode rules (but even here, it's best if
+everything were under feature \f(CW"unicode_strings"\fR, along with the
+\&\f(CW\*(C`use re \*(Aq/aa\*(Aq\*(C'\fR). Also see "Which character set modifier is in
+effect?".
+.IX Xref " a aa"
+.PP
+Which character set modifier is in effect?
+.IX Subsection "Which character set modifier is in effect?"
+.PP
+Which of these modifiers is in effect at any given point in a regular
+expression depends on a fairly complex set of interactions. These have
+been designed so that in general you don't have to worry about it, but
+this section gives the gory details. As
+explained below in "Extended Patterns" it is possible to explicitly
+specify modifiers that apply only to portions of a regular expression.
+The innermost always has priority over any outer ones, and one applying
+to the whole expression has priority over any of the default settings that are
+described in the remainder of this section.
+.PP
+The \f(CW\*(C`use re \*(Aq/foo\*(Aq\*(C'\fR pragma can be used to set
+default modifiers (including these) for regular expressions compiled
+within its scope. This pragma has precedence over the other pragmas
+listed below that also change the defaults.
+.PP
+Otherwise, \f(CW\*(C`use locale\*(C'\fR sets the default modifier to \f(CW\*(C`/l\*(C'\fR;
+and \f(CW\*(C`use feature \*(Aqunicode_strings\*(Aq\*(C'\fR, or
+\&\f(CW\*(C`use v5.12\*(C'\fR (or higher) set the default to
+\&\f(CW\*(C`/u\*(C'\fR when not in the same scope as either \f(CW\*(C`use locale\*(C'\fR
+or \f(CW\*(C`use bytes\*(C'\fR.
+(\f(CW\*(C`use locale \*(Aq:not_characters\*(Aq\*(C'\fR also
+sets the default to \f(CW\*(C`/u\*(C'\fR, overriding any plain \f(CW\*(C`use locale\*(C'\fR.)
+Unlike the mechanisms mentioned above, these
+affect operations besides regular expression pattern matching, and so
+give more consistent results with other operators, including using
+\&\f(CW\*(C`\eU\*(C'\fR, \f(CW\*(C`\el\*(C'\fR, \fIetc\fR. in substitution replacements.
+.PP
+If none of the above apply, for backwards compatibility reasons, the
+\&\f(CW\*(C`/d\*(C'\fR modifier is the one in effect by default. As this can lead to
+unexpected results, it is best to specify which other rule set should be
+used.
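+.PP
+One way to check which modifier a compiled pattern ended up with is to
+look at how a \f(CW\*(C`qr//\*(C'\fR object stringifies. The sketch below is illustrative
+only (the hash and its keys are made-up names); it assumes no other
+pragmas are in scope.

```perl
use strict;
use warnings;

my %stringified;

# No pragma in scope: /d is in effect and is not shown in the string.
$stringified{default} = "" . qr/x/;

{
    use feature 'unicode_strings';
    $stringified{feature} = "" . qr/x/;          # default becomes /u

    use re '/aa';
    $stringified{re_pragma} = "" . qr/x/;        # use re wins over the feature

    $stringified{explicit} = "" . qr/x/u;        # explicit modifier wins over both
}

print "$_: $stringified{$_}\n" for sort keys %stringified;
```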
+.PP
+Character set modifier behavior prior to Perl 5.14
+.IX Subsection "Character set modifier behavior prior to Perl 5.14"
+.PP
+Prior to 5.14, there were no explicit modifiers, but \f(CW\*(C`/l\*(C'\fR was implied
+for regexes compiled within the scope of \f(CW\*(C`use locale\*(C'\fR, and \f(CW\*(C`/d\*(C'\fR was
+implied otherwise. However, interpolating a regex into a larger regex
+would ignore the original compilation in favor of whatever was in effect
+at the time of the second compilation. There were a number of
+inconsistencies (bugs) with the \f(CW\*(C`/d\*(C'\fR modifier, where Unicode rules
+would be used when inappropriate, and vice versa. \f(CW\*(C`\ep{}\*(C'\fR did not imply
+Unicode rules, and neither did all occurrences of \f(CW\*(C`\eN{}\*(C'\fR, until 5.12.
+.SS "Regular Expressions"
+.IX Subsection "Regular Expressions"
+\fIQuantifiers\fR
+.IX Subsection "Quantifiers"
+.PP
+Quantifiers are used when a particular portion of a pattern needs to
+match a certain number (or numbers) of times. If there isn't a
+quantifier the number of times to match is exactly one. The following
+standard quantifiers are recognized:
+.IX Xref "metacharacter quantifier * + ? {n} {n,} {n,m}"
+.PP
+.Vb 7
+\& * Match 0 or more times
+\& + Match 1 or more times
+\& ? Match 1 or 0 times
+\& {n} Match exactly n times
+\& {n,} Match at least n times
+\& {,n} Match at most n times
+\& {n,m} Match at least n but not more than m times
+.Ve
+.PP
+(If a non-escaped curly bracket occurs in a context other than one of
+the quantifiers listed above, where it does not form part of a
+backslashed sequence like \f(CW\*(C`\ex{...}\*(C'\fR, it is either a fatal syntax error,
+or treated as a regular character, generally with a deprecation warning
+raised. To escape it, you can precede it with a backslash (\f(CW"\e{"\fR) or
+enclose it within square brackets (\f(CW"[{]"\fR).
+This change will allow for future syntax extensions (like making the
+lower bound of a quantifier optional), and better error checking of
+quantifiers.)
+.PP
+The \f(CW"*"\fR quantifier is equivalent to \f(CW\*(C`{0,}\*(C'\fR, the \f(CW"+"\fR
+quantifier to \f(CW\*(C`{1,}\*(C'\fR, and the \f(CW"?"\fR quantifier to \f(CW\*(C`{0,1}\*(C'\fR. \fIn\fR and \fIm\fR are limited
+to non-negative integral values less than a preset limit defined when perl is built.
+This is usually 65534 on the most common platforms. The actual limit can
+be seen in the error message generated by code such as this:
+.PP
+.Vb 1
+\& $_ **= $_ , / {$_} / for 2 .. 42;
+.Ve
+.PP
+By default, a quantified subpattern is "greedy", that is, it will match as
+many times as possible (given a particular starting location) while still
+allowing the rest of the pattern to match. If you want it to match the
+minimum number of times possible, follow the quantifier with a \f(CW"?"\fR. Note
+that the meanings don't change, just the "greediness":
+.IX Xref "metacharacter greedy greediness ? *? +? ?? {n}? {n,}? {,n}? {n,m}?"
+.PP
+.Vb 7
+\& *? Match 0 or more times, not greedily
+\& +? Match 1 or more times, not greedily
+\& ?? Match 0 or 1 time, not greedily
+\& {n}? Match exactly n times, not greedily (redundant)
+\& {n,}? Match at least n times, not greedily
+\& {,n}? Match at most n times, not greedily
+\& {n,m}? Match at least n but not more than m times, not greedily
+.Ve
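+.PP
+A short illustration of greedy versus non-greedy matching; the sample
+string and variable names are invented for the example.

```perl
use strict;
use warnings;

my $markup = "<b>bold</b>";    # hypothetical sample input

# Greedy: .+ takes as much as it can while still letting ">" match,
# so the capture runs up to the last ">" in the string.
my ($greedy) = $markup =~ /<(.+)>/;

# Non-greedy: .+? takes as little as it can, stopping at the first ">".
my ($lazy) = $markup =~ /<(.+?)>/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```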
+.PP
+Normally when a quantified subpattern does not allow the rest of the
+overall pattern to match, Perl will backtrack. However, this behaviour is
+sometimes undesirable. Thus Perl provides the "possessive" quantifier form
+as well.
+.PP
+.Vb 7
+\& *+ Match 0 or more times and give nothing back
+\& ++ Match 1 or more times and give nothing back
+\& ?+ Match 0 or 1 time and give nothing back
+\& {n}+ Match exactly n times and give nothing back (redundant)
+\& {n,}+ Match at least n times and give nothing back
+\& {,n}+ Match at most n times and give nothing back
+\& {n,m}+ Match at least n but not more than m times and give nothing back
+.Ve
+.PP
+For instance,
+.PP
+.Vb 1
+\& \*(Aqaaaa\*(Aq =~ /a++a/
+.Ve
+.PP
+will never match, as the \f(CW\*(C`a++\*(C'\fR will gobble up all the \f(CW"a"\fR's in the
+string and won't leave any for the remaining part of the pattern. This
+feature can be extremely useful to give perl hints about where it
+shouldn't backtrack. For instance, the typical "match a double-quoted
+string" problem can be most efficiently performed when written as:
+.PP
+.Vb 1
+\& /"(?:[^"\e\e]++|\e\e.)*+"/
+.Ve
+.PP
+as we know that if the final quote does not match, backtracking will not
+help. See the independent subexpression
+\&\f(CW"(?>\fR\f(CIpattern\fR\f(CW)"\fR for more details;
+possessive quantifiers are just syntactic sugar for that construct. For
+instance the above example could also be written as follows:
+.PP
+.Vb 1
+\& /"(?>(?:(?>[^"\e\e]+)|\e\e.)*)"/
+.Ve
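+.PP
+The behavior described above can be tried out with a sketch like the
+following; the test strings and variable names are illustrative.

```perl
use strict;
use warnings;

# With ordinary greedy matching, a+ gives back one "a" so the final
# "a" in the pattern can match.
my $greedy_matches = 'aaaa' =~ /a+a/ ? 1 : 0;

# The possessive form and the equivalent independent subexpression
# both refuse to give anything back, so the match fails.
my $possessive_matches = 'aaaa' =~ /a++a/    ? 1 : 0;
my $atomic_matches     = 'aaaa' =~ /(?>a+)a/ ? 1 : 0;

# The double-quoted-string pattern from the text, applied to a sample
# string that contains backslash-escaped quotes.
my $quoted_string = qr/"(?:[^"\\]++|\\.)*+"/;
my ($found) = 'say "hi \"there\"" now' =~ /($quoted_string)/;

print "greedy=$greedy_matches possessive=$possessive_matches ",
      "atomic=$atomic_matches\n";
print "found: $found\n";
```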
+.PP
+Note that the possessive quantifier modifier cannot be combined
+with the non-greedy modifier. This is because it would make no sense.
+Consider the following equivalency table:
+.PP
+.Vb 5
+\& Illegal Legal
+\& \-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-
+\& X??+ X{0}
+\& X+?+ X{1}
+\& X{min,max}?+ X{min}
+.Ve
+.PP
+\fIEscape sequences\fR
+.IX Subsection "Escape sequences"
+.PP
+Because patterns are processed as double-quoted strings, the following
+also work:
+.PP
+.Vb 10
+\& \et tab (HT, TAB)
+\& \en newline (LF, NL)
+\& \er return (CR)
+\& \ef form feed (FF)
+\& \ea alarm (bell) (BEL)
+\& \ee escape (think troff) (ESC)
+\& \ecK control char (example: VT)
+\& \ex{}, \ex00 character whose ordinal is the given hexadecimal number
+\& \eN{name} named Unicode character or character sequence
+\& \eN{U+263D} Unicode character (example: FIRST QUARTER MOON)
+\& \eo{}, \e000 character whose ordinal is the given octal number
+\& \el lowercase next char (think vi)
+\& \eu uppercase next char (think vi)
+\& \eL lowercase until \eE (think vi)
+\& \eU uppercase until \eE (think vi)
+\& \eQ quote (disable) pattern metacharacters until \eE
+\& \eE end either case modification or quoted section, think vi
+.Ve
+.PP
+Details are in "Quote and Quote-like Operators" in perlop.
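+.PP
+As a brief sketch of \f(CW\*(C`\eQ\*(C'\fR and \f(CW\*(C`\eE\*(C'\fR (the strings and variable names are
+illustrative), the following interpolates the same text into a pattern
+with and without quoting:

```perl
use strict;
use warnings;

my $literal = "a.b*c";    # hypothetical text containing metacharacters

# Interpolated as-is, "." and "*" act as metacharacters, so an
# unrelated string can match.
my $unquoted_matches = "aXbbc" =~ /\A$literal\z/ ? 1 : 0;

# Between \Q and \E the metacharacters are disabled, so only the
# literal text matches.
my $quoted_matches_other = "aXbbc" =~ /\A\Q$literal\E\z/ ? 1 : 0;
my $quoted_matches_exact = "a.b*c" =~ /\A\Q$literal\E\z/ ? 1 : 0;

# quotemeta() is the function form of \Q.
my $escaped = quotemeta $literal;

print "unquoted=$unquoted_matches quoted(other)=$quoted_matches_other ",
      "quoted(exact)=$quoted_matches_exact escaped=$escaped\n";
```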
+.PP
+\fICharacter Classes and other Special Escapes\fR
+.IX Subsection "Character Classes and other Special Escapes"
+.PP
+In addition, Perl defines the following:
+.IX Xref "\\g \\k \\K backreference"
+.PP
+.Vb 10
+\& Sequence Note Description
+\& [...] [1] Match a character according to the rules of the
+\& bracketed character class defined by the "...".
+\& Example: [a\-z] matches "a" or "b" or "c" ... or "z"
+\& [[:...:]] [2] Match a character according to the rules of the POSIX
+\& character class "..." within the outer bracketed
+\& character class. Example: [[:upper:]] matches any
+\& uppercase character.
+\& (?[...]) [8] Extended bracketed character class
+\& \ew [3] Match a "word" character (alphanumeric plus "_", plus
+\& other connector punctuation chars plus Unicode
+\& marks)
+\& \eW [3] Match a non\-"word" character
+\& \es [3] Match a whitespace character
+\& \eS [3] Match a non\-whitespace character
+\& \ed [3] Match a decimal digit character
+\& \eD [3] Match a non\-digit character
+\& \epP [3] Match P, named property. Use \ep{Prop} for longer names
+\& \ePP [3] Match non\-P
+\& \eX [4] Match Unicode "eXtended grapheme cluster"
+\& \e1 [5] Backreference to a specific capture group or buffer.
+\& \*(Aq1\*(Aq may actually be any positive integer.
+\& \eg1 [5] Backreference to a specific or previous group,
+\& \eg{\-1} [5] The number may be negative indicating a relative
+\& previous group and may optionally be wrapped in
+\& curly brackets for safer parsing.
+\& \eg{name} [5] Named backreference
+\& \ek<name> [5] Named backreference
+\& \ek\*(Aqname\*(Aq [5] Named backreference
+\& \ek{name} [5] Named backreference
+\& \eK [6] Keep the stuff left of the \eK, don\*(Aqt include it in $&
+\& \eN [7] Any character but \en. Not affected by /s modifier
+\& \ev [3] Vertical whitespace
+\& \eV [3] Not vertical whitespace
+\& \eh [3] Horizontal whitespace
+\& \eH [3] Not horizontal whitespace
+\& \eR [4] Linebreak
+.Ve
+.IP [1] 4
+.IX Item "[1]"
+See "Bracketed Character Classes" in perlrecharclass for details.
+.IP [2] 4
+.IX Item "[2]"
+See "POSIX Character Classes" in perlrecharclass for details.
+.IP [3] 4
+.IX Item "[3]"
+See "Unicode Character Properties" in perlunicode for details.
+.IP [4] 4
+.IX Item "[4]"
+See "Misc" in perlrebackslash for details.
+.IP [5] 4
+.IX Item "[5]"
+See "Capture groups" below for details.
+.IP [6] 4
+.IX Item "[6]"
+See "Extended Patterns" below for details.
+.IP [7] 4
+.IX Item "[7]"
+Note that \f(CW\*(C`\eN\*(C'\fR has two meanings. When of the form \f(CW\*(C`\eN{\fR\f(CINAME\fR\f(CW}\*(C'\fR, it
+matches the character or character sequence whose name is \fINAME\fR; and
+similarly
+when of the form \f(CW\*(C`\eN{U+\fR\f(CIhex\fR\f(CW}\*(C'\fR, it matches the character whose Unicode
+code point is \fIhex\fR. Otherwise it matches any character but \f(CW\*(C`\en\*(C'\fR.
+.IP [8] 4
+.IX Item "[8]"
+See "Extended Bracketed Character Classes" in perlrecharclass for details.
+.PP
+\fIAssertions\fR
+.IX Subsection "Assertions"
+.PP
+Besides \f(CW"^"\fR and \f(CW"$"\fR, Perl defines the following
+zero-width assertions:
+.IX Xref "zero-width assertion assertion regex, zero-width assertion regexp, zero-width assertion regular expression, zero-width assertion \\b \\B \\A \\Z \\z \\G"
+.PP
+.Vb 9
+\& \eb{} Match at Unicode boundary of specified type
+\& \eB{} Match where corresponding \eb{} doesn\*(Aqt match
+\& \eb Match a \ew\eW or \eW\ew boundary
+\& \eB Match except at a \ew\eW or \eW\ew boundary
+\& \eA Match only at beginning of string
+\& \eZ Match only at end of string, or before newline at the end
+\& \ez Match only at end of string
+\& \eG Match only at pos() (e.g. at the end\-of\-match position
+\& of prior m//g)
+.Ve
+.PP
+A Unicode boundary (\f(CW\*(C`\eb{}\*(C'\fR), available starting in v5.22, is a spot
+between two characters, or before the first character in the string, or
+after the final character in the string where certain criteria defined
+by Unicode are met. See "\eb{}, \eb, \eB{}, \eB" in perlrebackslash for
+details.
+.PP
+A word boundary (\f(CW\*(C`\eb\*(C'\fR) is a spot between two characters
+that has a \f(CW\*(C`\ew\*(C'\fR on one side of it and a \f(CW\*(C`\eW\*(C'\fR on the other side
+of it (in either order), counting the imaginary characters off the
+beginning and end of the string as matching a \f(CW\*(C`\eW\*(C'\fR. (Within
+character classes \f(CW\*(C`\eb\*(C'\fR represents backspace rather than a word
+boundary, just as it normally does in any double-quoted string.)
+The \f(CW\*(C`\eA\*(C'\fR and \f(CW\*(C`\eZ\*(C'\fR are just like \f(CW"^"\fR and \f(CW"$"\fR, except that they
+won't match multiple times when the \f(CW\*(C`/m\*(C'\fR modifier is used, while
+\&\f(CW"^"\fR and \f(CW"$"\fR will match at every internal line boundary. To match
+the actual end of the string and not ignore an optional trailing
+newline, use \f(CW\*(C`\ez\*(C'\fR.
+.IX Xref "\\b \\A \\Z \\z m"
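+.PP
+The differences between these anchors can be seen in a sketch such as
+this one, where the sample strings and variable names are made up for
+illustration:

```perl
use strict;
use warnings;

my $line = "foo\n";

# \Z tolerates one trailing newline; \z insists on the true end.
my $upper_Z = $line =~ /foo\Z/ ? 1 : 0;
my $lower_z = $line =~ /foo\z/ ? 1 : 0;

my $text = "first\nsecond";

# Without /m, ^ matches only at the start of the string.
my $caret_plain = $text =~ /^second/ ? 1 : 0;

# With /m, ^ also matches after an internal newline, but \A does not.
my $caret_multiline = $text =~ /^second/m  ? 1 : 0;
my $A_multiline     = $text =~ /\Asecond/m ? 1 : 0;

# \b marks the transitions between \w and \W characters.
my @words = "hello, world" =~ /\b(\w+)\b/g;

print "\\Z=$upper_Z \\z=$lower_z ^=$caret_plain ^/m=$caret_multiline ",
      "\\A/m=$A_multiline words=@words\n";
```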
+.PP
+The \f(CW\*(C`\eG\*(C'\fR assertion can be used to chain global matches (using
+\&\f(CW\*(C`m//g\*(C'\fR), as described in "Regexp Quote-Like Operators" in perlop.
+It is also useful when writing \f(CW\*(C`lex\*(C'\fR\-like scanners, when you have
+several patterns that you want to match against consecutive substrings
+of your string; see the previous reference. The actual location
+where \f(CW\*(C`\eG\*(C'\fR will match can also be influenced by using \f(CWpos()\fR as
+an lvalue: see "pos" in perlfunc. Note that the rule for zero-length
+matches (see "Repeated Patterns Matching a Zero-length Substring")
+is modified somewhat, in that contents to the left of \f(CW\*(C`\eG\*(C'\fR are
+not counted when determining the length of the match. Thus the following
+will not match forever:
+.IX Xref "\\G"
+.PP
+.Vb 5
+\& my $string = \*(AqABC\*(Aq;
+\& pos($string) = 1;
+\& while ($string =~ /(.\eG)/g) {
+\& print $1;
+\& }
+.Ve
+.PP
+It will print 'A' and then terminate, as it considers the match to
+be zero-width, and thus will not match at the same position twice in a
+row.
+.PP
+It is worth noting that \f(CW\*(C`\eG\*(C'\fR improperly used can result in an infinite
+loop. Take care when using patterns that include \f(CW\*(C`\eG\*(C'\fR in an alternation.
+.PP
+Note also that \f(CW\*(C`s///\*(C'\fR will refuse to overwrite part of a substitution
+that has already been replaced; so for example this will stop after the
+first iteration, rather than iterating its way backwards through the
+string:
+.PP
+.Vb 4
+\& $_ = "123456789";
+\& pos = 6;
+\& s/.(?=.\eG)/X/g;
+\& print; # prints 1234X6789, not XXXXX6789
+.Ve
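+.PP
+A \f(CW\*(C`lex\*(C'\fR\-like scanner of the kind mentioned above might be sketched as
+follows. The token names, the input, and the tiny grammar are all
+hypothetical; the technique is anchoring each alternative with \f(CW\*(C`\eG\*(C'\fR
+and using \f(CW\*(C`/gc\*(C'\fR so that a failed attempt leaves \f(CW\*(C`pos()\*(C'\fR untouched.

```perl
use strict;
use warnings;

my $input = "x = 42";    # hypothetical source text to tokenize
my @tokens;

pos($input) = 0;
while (pos($input) < length $input) {
    # Each pattern must match exactly where the previous one stopped.
    if    ($input =~ /\G\s+/gc)             { next }
    elsif ($input =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, "IDENT:$1" }
    elsif ($input =~ /\G(\d+)/gc)           { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G(=)/gc)             { push @tokens, "OP:$1" }
    else {
        die "unexpected character at offset " . pos($input) . "\n";
    }
}

print "$_\n" for @tokens;
```
+.PP
+Every branch either consumes at least one character or dies, so the
+loop always makes progress.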
+.PP
+\fICapture groups\fR
+.IX Subsection "Capture groups"
+.PP
+The grouping construct \f(CW\*(C`( ... )\*(C'\fR creates capture groups (also referred to as
+capture buffers). To refer to the current contents of a group later on, within
+the same pattern, use \f(CW\*(C`\eg1\*(C'\fR (or \f(CW\*(C`\eg{1}\*(C'\fR) for the first, \f(CW\*(C`\eg2\*(C'\fR (or \f(CW\*(C`\eg{2}\*(C'\fR)
+for the second, and so on.
+This is called a \fIbackreference\fR.
+
+
+
+
+
+
+
+
+There is no limit to the number of captured substrings that you may use.
+Groups are numbered with the leftmost open parenthesis being number 1, \fIetc\fR. If
+a group did not match, the associated backreference won't match either. (This
+can happen if the group is optional, or in a different branch of an
+alternation.)
+You can omit the \f(CW"g"\fR, and write \f(CW"\e1"\fR, \fIetc\fR, but there are some issues with
+this form, described below.
+.IX Xref "regex, capture buffer regexp, capture buffer regex, capture group regexp, capture group regular expression, capture buffer backreference regular expression, capture group backreference \\g{1} \\g{-1} \\g{name} relative backreference named backreference named capture buffer regular expression, named capture buffer named capture group regular expression, named capture group %+ $+{name} \\k<name>"
+.PP
+You can also refer to capture groups relatively, by using a negative number, so
+that \f(CW\*(C`\eg\-1\*(C'\fR and \f(CW\*(C`\eg{\-1}\*(C'\fR both refer to the immediately preceding capture
+group, and \f(CW\*(C`\eg\-2\*(C'\fR and \f(CW\*(C`\eg{\-2}\*(C'\fR both refer to the group before it. For
+example:
+.PP
+.Vb 8
+\& /
+\& (Y) # group 1
+\& ( # group 2
+\& (X) # group 3
+\& \eg{\-1} # backref to group 3
+\& \eg{\-3} # backref to group 1
+\& )
+\& /x
+.Ve
+.PP
+would match the same as \f(CW\*(C`/(Y) ( (X) \eg3 \eg1 )/x\*(C'\fR. This allows you to
+interpolate regexes into larger regexes and not have to worry about the
+capture groups being renumbered.
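+.PP
+To illustrate the point about interpolation, here is a minimal sketch
+(the pattern and variable names are invented for the example) that
+embeds the same "doubled character" sub-pattern, written once with a
+relative and once with an absolute backreference, into a larger regex
+that has its own leading group:

```perl
use strict;
use warnings;

my $doubled_relative = qr/(\w)\g{-1}/;   # refers to "the group just before"
my $doubled_absolute = qr/(\w)\g{1}/;    # refers to "group number 1"

# Once embedded after (\d+), the sub-pattern's own group becomes group 2.
# The relative form still points at it; the absolute form now points
# at the digits instead.
my $relative_ok = "42-oo" =~ /(\d+)-$doubled_relative/ ? 1 : 0;
my $absolute_ok = "42-oo" =~ /(\d+)-$doubled_absolute/ ? 1 : 0;

print "relative: $relative_ok, absolute: $absolute_ok\n";
```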
+.PP
+You can dispense with numbers altogether and create named capture groups.
+The notation is \f(CW\*(C`(?<\fR\f(CIname\fR\f(CW>...)\*(C'\fR to declare and \f(CW\*(C`\eg{\fR\f(CIname\fR\f(CW}\*(C'\fR to
+reference. (To be compatible with .Net regular expressions, \f(CW\*(C`\eg{\fR\f(CIname\fR\f(CW}\*(C'\fR may
+also be written as \f(CW\*(C`\ek{\fR\f(CIname\fR\f(CW}\*(C'\fR, \f(CW\*(C`\ek<\fR\f(CIname\fR\f(CW>\*(C'\fR or \f(CW\*(C`\ek\*(Aq\fR\f(CIname\fR\f(CW\*(Aq\*(C'\fR.)
+\&\fIname\fR must not begin with a number, nor contain hyphens.
+When different groups within the same pattern have the same name, any reference
+to that name assumes the leftmost defined group. Named groups count in
+absolute and relative numbering, and so can also be referred to by those
+numbers.
+(It's possible to do things with named capture groups that would otherwise
+require \f(CW\*(C`(??{})\*(C'\fR.)
+.PP
+Capture group contents are dynamically scoped and available to you outside the
+pattern until the end of the enclosing block or until the next successful
+match in the same scope, whichever comes first.
+See "Compound Statements" in perlsyn and
+"Scoping Rules of Regex Variables" in perlvar for more details.
+.PP
+You can access the contents of a capture group by absolute number (using
+\&\f(CW"$1"\fR instead of \f(CW"\eg1"\fR, \fIetc\fR); or by name via the \f(CW\*(C`%+\*(C'\fR hash,
+using \f(CW"$+{\fR\f(CIname\fR\f(CW}"\fR.
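+.PP
+For instance, a date could be picked apart like this (a sketch only;
+the group names and the sample date are illustrative). The values are
+copied into ordinary variables right away, because the capture
+variables are reset by the next successful match.

```perl
use strict;
use warnings;

my ($year, $month, $day, $same_group);

if ("2024-02-11" =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    # Named access through the %+ hash.
    ($year, $month, $day) = ($+{year}, $+{month}, $+{day});

    # Named groups are also numbered, so $1 is the same as $+{year}.
    $same_group = $1 eq $+{year} ? 1 : 0;
}

print "year=$year month=$month day=$day\n";
```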
+.PP
+Braces are required in referring to named capture groups, but are optional for
+absolute or relative numbered ones. Braces are safer when creating a regex by
+concatenating smaller strings. For example if you have \f(CW\*(C`qr/$a$b/\*(C'\fR, and \f(CW$a\fR
+contained \f(CW"\eg1"\fR, and \f(CW$b\fR contained \f(CW"37"\fR, you would get \f(CW\*(C`/\eg137/\*(C'\fR which
+is probably not what you intended.
+.PP
+If you use braces, you may also optionally add any number of blank
+(space or tab) characters within but adjacent to the braces, like
+\&\f(CW\*(C`\eg{\ \-1\ }\*(C'\fR, or \f(CW\*(C`\ek{\ \fR\f(CIname\fR\f(CW\ }\*(C'\fR.
+.PP
+The \f(CW\*(C`\eg\*(C'\fR and \f(CW\*(C`\ek\*(C'\fR notations were introduced in Perl 5.10.0. Prior to that
+there were neither named nor relative numbered capture groups. Absolute numbered
+groups were referred to using \f(CW\*(C`\e1\*(C'\fR,
+\&\f(CW\*(C`\e2\*(C'\fR, \fIetc\fR., and this notation is still
+accepted (and likely always will be). But it leads to some ambiguities if
+there are more than 9 capture groups, as \f(CW\*(C`\e10\*(C'\fR could mean either the tenth
+capture group, or the character whose ordinal in octal is 010 (a backspace in
+ASCII). Perl resolves this ambiguity by interpreting \f(CW\*(C`\e10\*(C'\fR as a backreference
+only if at least 10 left parentheses have opened before it. Likewise \f(CW\*(C`\e11\*(C'\fR is
+a backreference only if at least 11 left parentheses have opened before it.
+And so on. \f(CW\*(C`\e1\*(C'\fR through \f(CW\*(C`\e9\*(C'\fR are always interpreted as backreferences.
+There are several examples below that illustrate these perils. You can avoid
+the ambiguity by always using \f(CW\*(C`\eg{}\*(C'\fR or \f(CW\*(C`\eg\*(C'\fR if you mean capturing groups;
+and for octal constants always using \f(CW\*(C`\eo{}\*(C'\fR, or for \f(CW\*(C`\e077\*(C'\fR and below, using 3
+digits padded with leading zeros, since a leading zero implies an octal
+constant.
+.PP
+The \f(CW\*(C`\e\fR\f(CIdigit\fR\f(CW\*(C'\fR notation also works in certain circumstances outside
+the pattern. See "Warning on \e1 Instead of \f(CW$1\fR" below for details.
+.PP
+Examples:
+.PP
+.Vb 1
+\& s/^([^ ]*) *([^ ]*)/$2 $1/; # swap first two words
+\&
+\& /(.)\eg1/ # find first doubled char
+\& and print "\*(Aq$1\*(Aq is the first doubled character\en";
+\&
+\& /(?<char>.)\ek<char>/ # ... a different way
+\& and print "\*(Aq$+{char}\*(Aq is the first doubled character\en";
+\&
+\& /(?\*(Aqchar\*(Aq.)\eg1/ # ... mix and match
+\& and print "\*(Aq$1\*(Aq is the first doubled character\en";
+\&
+\& if (/Time: (..):(..):(..)/) { # parse out values
+\& $hours = $1;
+\& $minutes = $2;
+\& $seconds = $3;
+\& }
+\&
+\& /(.)(.)(.)(.)(.)(.)(.)(.)(.)\eg10/ # \eg10 is a backreference
+\& /(.)(.)(.)(.)(.)(.)(.)(.)(.)\e10/ # \e10 is octal
+\& /((.)(.)(.)(.)(.)(.)(.)(.)(.))\e10/ # \e10 is a backreference
+\& /((.)(.)(.)(.)(.)(.)(.)(.)(.))\e010/ # \e010 is octal
+\&
+\& $a = \*(Aq(.)\e1\*(Aq; # Creates problems when concatenated.
+\& $b = \*(Aq(.)\eg{1}\*(Aq; # Avoids the problems.
+\& "aa" =~ /${a}/; # True
+\& "aa" =~ /${b}/; # True
+\& "aa0" =~ /${a}0/; # False!
+\& "aa0" =~ /${b}0/; # True
+\& "aa\ex08" =~ /${a}0/; # True!
+\& "aa\ex08" =~ /${b}0/; # False
+.Ve
+.PP
+Several special variables also refer back to portions of the previous
+match. \f(CW$+\fR returns whatever the last bracket match matched.
+\&\f(CW$&\fR returns the entire matched string. (At one point \f(CW$0\fR did
+also, but now it returns the name of the program.) \f(CW\*(C`$\`\*(C'\fR returns
+everything before the matched string. \f(CW\*(C`$\*(Aq\*(C'\fR returns everything
+after the matched string. And \f(CW$^N\fR contains whatever was matched by
+the most-recently closed group (submatch). \f(CW$^N\fR can be used in
+extended patterns (see below), for example to assign a submatch to a
+variable.
+.IX Xref "$+ $^N $& $` $'"
+.PP
+These special variables, like the \f(CW\*(C`%+\*(C'\fR hash and the numbered match variables
+(\f(CW$1\fR, \f(CW$2\fR, \f(CW$3\fR, \fIetc\fR.) are dynamically scoped
+until the end of the enclosing block or until the next successful
+match, whichever comes first. (See "Compound Statements" in perlsyn.)
+.IX Xref "$+ $^N $& $` $' $1 $2 $3 $4 $5 $6 $7 $8 $9 @{^CAPTURE}"
+.PP
+The \f(CW\*(C`@{^CAPTURE}\*(C'\fR array may be used to access ALL of the capture buffers
+as an array without needing to know how many there are. For instance
+.PP
+.Vb 1
+\& $string=~/$pattern/ and @captured = @{^CAPTURE};
+.Ve
+.PP
+will place a copy of each capture variable, \f(CW$1\fR, \f(CW$2\fR, \fIetc\fR., into the
+\&\f(CW@captured\fR array.
+.PP
+Be aware that when interpolating a subscript of the \f(CW\*(C`@{^CAPTURE}\*(C'\fR
+array you must use demarcated curly brace notation:
+.PP
+.Vb 1
+\& print "${^CAPTURE[0]}";
+.Ve
+.PP
+See "Demarcated variable names using braces" in perldata for more on
+this notation.
+.PP
+\&\fBNOTE\fR: Failed matches in Perl do not reset the match variables,
+which makes it easier to write code that tests for a series of more
+specific cases and remembers the best match.
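+.PP
+As a minimal sketch of this behaviour (the string and patterns are
+illustrative only), the second, failing match below leaves \f(CW$1\fR
+as it was set by the first:
+.PP
+.Vb 4
+\& my $str = "key=value";
+\& $str =~ /^(\ew+)=/; # succeeds; $1 is "key"
+\& $str =~ /^(\ed+):/; # fails; $1 is left untouched
+\& print "$1\en"; # prints "key"
+.Ve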
+.PP
+\&\fBWARNING\fR: If your code is to run on Perl 5.16 or earlier,
+beware that once Perl sees that you need one of \f(CW$&\fR, \f(CW\*(C`$\`\*(C'\fR, or
+\&\f(CW\*(C`$\*(Aq\*(C'\fR anywhere in the program, it has to provide them for every
+pattern match. This may substantially slow your program.
+.PP
+Perl uses the same mechanism to produce \f(CW$1\fR, \f(CW$2\fR, \fIetc\fR, so you also
+pay a price for each pattern that contains capturing parentheses.
+(To avoid this cost while retaining the grouping behaviour, use the
+extended regular expression \f(CW\*(C`(?: ... )\*(C'\fR instead.) But if you never
+use \f(CW$&\fR, \f(CW\*(C`$\`\*(C'\fR or \f(CW\*(C`$\*(Aq\*(C'\fR, then patterns \fIwithout\fR capturing
+parentheses will not be penalized. So avoid \f(CW$&\fR, \f(CW\*(C`$\*(Aq\*(C'\fR, and \f(CW\*(C`$\`\*(C'\fR
+if you can, but if you can't (and some algorithms really appreciate
+them), once you've used them once, use them at will, because you've
+already paid the price.
+.IX Xref "$& $` $'"
+.PP
+Perl 5.16 introduced a slightly more efficient mechanism that notes
+separately whether each of \f(CW\*(C`$\`\*(C'\fR, \f(CW$&\fR, and \f(CW\*(C`$\*(Aq\*(C'\fR has been seen, and
+thus may only need to copy part of the string. Perl 5.20 introduced a
+much more efficient copy-on-write mechanism which eliminates any slowdown.
+.PP
+As another workaround for this problem, Perl 5.10.0 introduced \f(CW\*(C`${^PREMATCH}\*(C'\fR,
+\&\f(CW\*(C`${^MATCH}\*(C'\fR and \f(CW\*(C`${^POSTMATCH}\*(C'\fR, which are equivalent to \f(CW\*(C`$\`\*(C'\fR, \f(CW$&\fR
+and \f(CW\*(C`$\*(Aq\*(C'\fR, \fBexcept\fR that they are only guaranteed to be defined after a
+successful match that was executed with the \f(CW\*(C`/p\*(C'\fR (preserve) modifier.
+The use of these variables incurs no global performance penalty, unlike
+their punctuation character equivalents; the trade-off is that you
+have to tell perl when you want to use them. As of Perl 5.20, these three
+variables are equivalent to \f(CW\*(C`$\`\*(C'\fR, \f(CW$&\fR and \f(CW\*(C`$\*(Aq\*(C'\fR, and \f(CW\*(C`/p\*(C'\fR is ignored.
+.IX Xref " p p modifier"
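+.PP
+A short sketch of the \f(CW\*(C`/p\*(C'\fR variables, using an arbitrary example string:
+.PP
+.Vb 3
+\& if ("hello world" =~ /o w/p) {
+\&     print "${^PREMATCH}|${^MATCH}|${^POSTMATCH}\en"; # hell|o w|orld
+\& }
+.Ve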
+.SS "Quoting metacharacters"
+.IX Subsection "Quoting metacharacters"
+Backslashed metacharacters in Perl are alphanumeric, such as \f(CW\*(C`\eb\*(C'\fR,
+\&\f(CW\*(C`\ew\*(C'\fR, \f(CW\*(C`\en\*(C'\fR. Unlike some other regular expression languages, there
+are no backslashed symbols that aren't alphanumeric. So anything
+that looks like \f(CW\*(C`\e\e\*(C'\fR, \f(CW\*(C`\e(\*(C'\fR, \f(CW\*(C`\e)\*(C'\fR, \f(CW\*(C`\e[\*(C'\fR, \f(CW\*(C`\e]\*(C'\fR, \f(CW\*(C`\e{\*(C'\fR, or \f(CW\*(C`\e}\*(C'\fR is
+always
+interpreted as a literal character, not a metacharacter. This was
+once used in a common idiom to disable or quote the special meanings
+of regular expression metacharacters in a string that you want to
+use for a pattern. Simply quote all non\-"word" characters:
+.PP
+.Vb 1
+\& $pattern =~ s/(\eW)/\e\e$1/g;
+.Ve
+.PP
+(If \f(CW\*(C`use locale\*(C'\fR is set, then this depends on the current locale.)
+Today it is more common to use the \f(CWquotemeta()\fR
+function or the \f(CW\*(C`\eQ\*(C'\fR metaquoting escape sequence to disable all
+metacharacters' special meanings like this:
+.PP
+.Vb 1
+\& /$unquoted\eQ$quoted\eE$unquoted/
+.Ve
+.PP
+Beware that if you put literal backslashes (those not inside
+interpolated variables) between \f(CW\*(C`\eQ\*(C'\fR and \f(CW\*(C`\eE\*(C'\fR, double-quotish
+backslash interpolation may lead to confusing results. If you
+\&\fIneed\fR to use literal backslashes within \f(CW\*(C`\eQ...\eE\*(C'\fR,
+consult "Gory details of parsing quoted constructs" in perlop.
+.PP
+\&\f(CWquotemeta()\fR and \f(CW\*(C`\eQ\*(C'\fR are fully described in "quotemeta" in perlfunc.
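+.PP
+The following sketch shows both approaches on a made-up string
+(\f(CW$literal\fR is a hypothetical stand-in for user-supplied text):
+.PP
+.Vb 4
+\& my $literal = "1.5*(a+b)";
+\& print quotemeta($literal), "\en"; # 1\e.5\e*\e(a\e+b\e)
+\& print "found\en"
+\&     if "x = 1.5*(a+b);" =~ /= \eQ$literal\eE;/;
+.Ve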
+.SS "Extended Patterns"
+.IX Subsection "Extended Patterns"
+Perl also defines a consistent extension syntax for features not
+found in standard tools like \fBawk\fR and
+\&\fBlex\fR. The syntax for most of these is a
+pair of parentheses with a question mark as the first thing within
+the parentheses. The character after the question mark indicates
+the extension.
+.PP
+A question mark was chosen for this and for the minimal-matching
+construct because 1) question marks are rare in older regular
+expressions, and 2) whenever you see one, you should stop and
+"question" exactly what is going on. That's psychology....
+.ie n .IP """(?#\fItext\fR)""" 4
+.el .IP \f(CW(?#\fR\f(CItext\fR\f(CW)\fR 4
+.IX Xref "(?#)"
+.IX Item "(?#text)"
+A comment. The \fItext\fR is ignored.
+Note that Perl closes
+the comment as soon as it sees a \f(CW")"\fR, so there is no way to put a literal
+\&\f(CW")"\fR in the comment. The pattern's closing delimiter must be escaped by
+a backslash if it appears in the comment.
+.Sp
+See "/x" for another way to have comments in patterns.
+.Sp
+Note that a comment can go just about anywhere, except in the middle of
+an escape sequence. Examples:
+.Sp
+.Vb 1
+\& qr/foo(?#comment)bar/ # Matches \*(Aqfoobar\*(Aq
+\&
+\& # The pattern below matches \*(Aqabcd\*(Aq, \*(Aqabccd\*(Aq, or \*(Aqabcccd\*(Aq
+\& qr/abc(?#comment between literal and its quantifier){1,3}d/
+\&
+\& # The pattern below generates a syntax error, because the \*(Aq\ep\*(Aq must
+\& # be followed immediately by a \*(Aq{\*(Aq.
+\& qr/\ep(?#comment between \ep and its property name){Any}/
+\&
+\& # The pattern below generates a syntax error, because the initial
+\& # \*(Aq\e(\*(Aq is a literal opening parenthesis, and so there is nothing
+\& # for the closing \*(Aq)\*(Aq to match
+\& qr/\e(?#the backslash means this isn\*(Aqt a comment)p{Any}/
+\&
+\& # Comments can be used to fold long patterns into multiple lines
+\& qr/First part of a long regex(?#
+\& )remaining part/
+.Ve
+.ie n .IP """(?adlupimnsx\-imnsx)""" 4
+.el .IP \f(CW(?adlupimnsx\-imnsx)\fR 4
+.IX Item "(?adlupimnsx-imnsx)"
+.PD 0
+.ie n .IP """(?^alupimnsx)""" 4
+.el .IP \f(CW(?^alupimnsx)\fR 4
+.IX Xref "(?) (?^)"
+.IX Item "(?^alupimnsx)"
+.PD
+Zero or more embedded pattern-match modifiers, to be turned on (or
+turned off if preceded by \f(CW"\-"\fR) for the remainder of the pattern or
+the remainder of the enclosing pattern group (if any).
+.Sp
+This is particularly useful for dynamically-generated patterns,
+such as those read in from a
+configuration file, taken from an argument, or specified in a table
+somewhere. Consider the case where some patterns want to be
+case-sensitive and some do not: The case-insensitive ones merely need to
+include \f(CW\*(C`(?i)\*(C'\fR at the front of the pattern. For example:
+.Sp
+.Vb 2
+\& $pattern = "foobar";
+\& if ( /$pattern/i ) { }
+\&
+\& # more flexible:
+\&
+\& $pattern = "(?i)foobar";
+\& if ( /$pattern/ ) { }
+.Ve
+.Sp
+These modifiers are restored at the end of the enclosing group. For example,
+.Sp
+.Vb 1
+\& ( (?i) blah ) \es+ \eg1
+.Ve
+.Sp
+will match \f(CW\*(C`blah\*(C'\fR in any case, some spaces, and an exact (\fIincluding the case\fR!)
+repetition of the previous word, assuming the \f(CW\*(C`/x\*(C'\fR modifier, and no \f(CW\*(C`/i\*(C'\fR
+modifier outside this group.
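+.Sp
+A sketch of this scoping, written without \f(CW\*(C`/x\*(C'\fR and using
+arbitrary test strings:
+.Sp
+.Vb 3
+\& my $re = qr/((?i)blah)\es+\eg1/;
+\& print "Blah Blah" =~ $re ? "match" : "no match", "\en"; # match
+\& print "Blah blah" =~ $re ? "match" : "no match", "\en"; # no match
+.Ve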
+.Sp
+These modifiers do not carry over into named subpatterns called in the
+enclosing group. In other words, a pattern such as \f(CW\*(C`((?i)(?&\fR\f(CINAME\fR\f(CW))\*(C'\fR does not
+change the case-sensitivity of the \fINAME\fR pattern.
+.Sp
+A modifier is overridden by later occurrences of this construct in the
+same scope containing the same modifier, so that
+.Sp
+.Vb 1
+\& /((?im)foo(?\-m)bar)/
+.Ve
+.Sp
+matches all of \f(CW\*(C`foobar\*(C'\fR case insensitively, but uses \f(CW\*(C`/m\*(C'\fR rules for
+only the \f(CW\*(C`foo\*(C'\fR portion. The \f(CW"a"\fR flag overrides \f(CW\*(C`aa\*(C'\fR as well;
+likewise \f(CW\*(C`aa\*(C'\fR overrides \f(CW"a"\fR. The same goes for \f(CW"x"\fR and \f(CW\*(C`xx\*(C'\fR.
+Hence, in
+.Sp
+.Vb 1
+\& /(?\-x)foo/xx
+.Ve
+.Sp
+both \f(CW\*(C`/x\*(C'\fR and \f(CW\*(C`/xx\*(C'\fR are turned off during matching \f(CW\*(C`foo\*(C'\fR. And in
+.Sp
+.Vb 1
+\& /(?x)foo/x
+.Ve
+.Sp
+\&\f(CW\*(C`/x\*(C'\fR but NOT \f(CW\*(C`/xx\*(C'\fR is turned on for matching \f(CW\*(C`foo\*(C'\fR. (One might
+mistakenly think that since the inner \f(CW\*(C`(?x)\*(C'\fR is already in the scope of
+\&\f(CW\*(C`/x\*(C'\fR, that the result would effectively be the sum of them, yielding
+\&\f(CW\*(C`/xx\*(C'\fR. It doesn't work that way.) Similarly, doing something like
+\&\f(CW\*(C`(?xx\-x)foo\*(C'\fR turns off all \f(CW"x"\fR behavior for matching \f(CW\*(C`foo\*(C'\fR; it is not
+that you subtract 1 \f(CW"x"\fR from 2 to get 1 \f(CW"x"\fR remaining.
+.Sp
+Any of these modifiers can be set to apply globally to all regular
+expressions compiled within the scope of a \f(CW\*(C`use re\*(C'\fR. See
+"'/flags' mode" in re.
+.Sp
+Starting in Perl 5.14, a \f(CW"^"\fR (caret or circumflex accent) immediately
+after the \f(CW"?"\fR is a shorthand equivalent to \f(CW\*(C`d\-imnsx\*(C'\fR. Flags (except
+\&\f(CW"d"\fR) may follow the caret to override it.
+But a minus sign is not legal with it.
+.Sp
+Note that the \f(CW"a"\fR, \f(CW"d"\fR, \f(CW"l"\fR, \f(CW"p"\fR, and \f(CW"u"\fR modifiers are special in
+that they can only be enabled, not disabled, and the \f(CW"a"\fR, \f(CW"d"\fR, \f(CW"l"\fR, and
+\&\f(CW"u"\fR modifiers are mutually exclusive: specifying one de-specifies the
+others, and a maximum of one (or two \f(CW"a"\fR's) may appear in the
+construct. Thus, for
+example, \f(CW\*(C`(?\-p)\*(C'\fR will warn when compiled under \f(CW\*(C`use warnings\*(C'\fR;
+\&\f(CW\*(C`(?\-d:...)\*(C'\fR and \f(CW\*(C`(?dl:...)\*(C'\fR are fatal errors.
+.Sp
+Note also that the \f(CW"p"\fR modifier is special in that its presence
+anywhere in a pattern has a global effect.
+.Sp
+Having zero modifiers makes this a no-op (rarely useful outside of
+generated code) and, starting in v5.30, it warns under \f(CW\*(C`use
+re \*(Aqstrict\*(Aq\*(C'\fR.
+.ie n .IP """(?:\fIpattern\fR)""" 4
+.el .IP \f(CW(?:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?:)"
+.IX Item "(?:pattern)"
+.PD 0
+.ie n .IP """(?adluimnsx\-imnsx:\fIpattern\fR)""" 4
+.el .IP \f(CW(?adluimnsx\-imnsx:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(?adluimnsx-imnsx:pattern)"
+.ie n .IP """(?^aluimnsx:\fIpattern\fR)""" 4
+.el .IP \f(CW(?^aluimnsx:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?^:)"
+.IX Item "(?^aluimnsx:pattern)"
+.PD
+This is for clustering, not capturing; it groups subexpressions like
+\&\f(CW"()"\fR, but doesn't make backreferences as \f(CW"()"\fR does. So
+.Sp
+.Vb 1
+\& @fields = split(/\eb(?:a|b|c)\eb/)
+.Ve
+.Sp
+matches the same field delimiters as
+.Sp
+.Vb 1
+\& @fields = split(/\eb(a|b|c)\eb/)
+.Ve
+.Sp
+but doesn't spit out the delimiters themselves as extra fields (even though
+that's the behaviour of "split" in perlfunc when its pattern contains capturing
+groups). It's also cheaper not to capture
+characters if you don't need to.
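+.Sp
+For instance, with an arbitrary input string (a sketch using
+\&\f(CW\*(C`\es\*(C'\fR rather than \f(CW\*(C`\eb\*(C'\fR so that the surrounding spaces are consumed):
+.Sp
+.Vb 3
+\& my $s = "1 a 2 b 3";
+\& print join("|", split(/\es(?:a|b)\es/, $s)), "\en"; # 1|2|3
+\& print join("|", split(/\es(a|b)\es/, $s)), "\en"; # 1|a|2|b|3
+.Ve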
+.Sp
+Any letters between \f(CW"?"\fR and \f(CW":"\fR act as flags modifiers as with
+\&\f(CW\*(C`(?adluimnsx\-imnsx)\*(C'\fR. For example,
+.Sp
+.Vb 1
+\& /(?s\-i:more.*than).*million/i
+.Ve
+.Sp
+is equivalent to the more verbose
+.Sp
+.Vb 1
+\& /(?:(?s\-i)more.*than).*million/i
+.Ve
+.Sp
+Note that any \f(CW\*(C`()\*(C'\fR constructs enclosed within this one will still
+capture unless the \f(CW\*(C`/n\*(C'\fR modifier is in effect.
+.Sp
+Like the "(?adlupimnsx\-imnsx)" construct, \f(CW\*(C`aa\*(C'\fR and \f(CW"a"\fR override each
+other, as do \f(CW\*(C`xx\*(C'\fR and \f(CW"x"\fR. They are not additive. So, doing
+something like \f(CW\*(C`(?xx\-x:foo)\*(C'\fR turns off all \f(CW"x"\fR behavior for matching
+\&\f(CW\*(C`foo\*(C'\fR.
+.Sp
+Starting in Perl 5.14, a \f(CW"^"\fR (caret or circumflex accent) immediately
+after the \f(CW"?"\fR is a shorthand equivalent to \f(CW\*(C`d\-imnsx\*(C'\fR. Any positive
+flags (except \f(CW"d"\fR) may follow the caret, so
+.Sp
+.Vb 1
+\& (?^x:foo)
+.Ve
+.Sp
+is equivalent to
+.Sp
+.Vb 1
+\& (?x\-imns:foo)
+.Ve
+.Sp
+The caret tells Perl that this cluster doesn't inherit the flags of any
+surrounding pattern, but uses the system defaults (\f(CW\*(C`d\-imnsx\*(C'\fR),
+modified by any flags specified.
+.Sp
+The caret allows for simpler stringification of compiled regular
+expressions. These look like
+.Sp
+.Vb 1
+\& (?^:pattern)
+.Ve
+.Sp
+with any non-default flags appearing between the caret and the colon.
+A test that looks at such stringification thus doesn't need to have the
+system default flags hard-coded in it, just the caret. If new flags are
+added to Perl, the meaning of the caret's expansion will change to include
+the default for those flags, so the test will still work, unchanged.
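+.Sp
+A sketch of what this stringification looks like, assuming Perl 5.14 or
+later and that the \f(CW\*(C`unicode_strings\*(C'\fR feature is not in effect (it
+would add a \f(CW"u"\fR flag):
+.Sp
+.Vb 2
+\& print qr/foo/, "\en"; # (?^:foo)
+\& print qr/foo/i, "\en"; # (?^i:foo)
+.Ve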
+.Sp
+Specifying a negative flag after the caret is an error, as the flag is
+redundant.
+.Sp
+Mnemonic for \f(CW\*(C`(?^...)\*(C'\fR: A fresh beginning since the usual use of a caret is
+to match at the beginning.
+.ie n .IP """(?|\fIpattern\fR)""" 4
+.el .IP \f(CW(?|\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?|) Branch reset"
+.IX Item "(?|pattern)"
+This is the "branch reset" pattern, which has the special property
+that the capture groups are numbered from the same starting point
+in each alternation branch. It is available starting from perl 5.10.0.
+.Sp
+Capture groups are numbered from left to right, but inside this
+construct the numbering is restarted for each branch.
+.Sp
+The numbering within each branch will be as normal, and any groups
+following this construct will be numbered as though the construct
+contained only one branch, that being the one with the most capture
+groups in it.
+.Sp
+This construct is useful when you want to capture one of a
+number of alternative matches.
+.Sp
+Consider the following pattern. The numbers underneath show in
+which group the captured content will be stored.
+.Sp
+.Vb 3
+\& # before \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-branch\-reset\-\-\-\-\-\-\-\-\-\-\- after
+\& / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
+\& # 1 2 2 3 2 3 4
+.Ve
+.Sp
+Be careful when using the branch reset pattern in combination with
+named captures. Named captures are implemented as being aliases to
+numbered groups holding the captures, and that interferes with the
+implementation of the branch reset pattern. If you are using named
+captures in a branch reset pattern, it's best to use the same names,
+in the same order, in each of the alternations:
+.Sp
+.Vb 2
+\& /(?| (?<a> x ) (?<b> y )
+\& | (?<a> z ) (?<b> w )) /x
+.Ve
+.Sp
+Not doing so may lead to surprises:
+.Sp
+.Vb 3
+\& "12" =~ /(?| (?<a> \ed+ ) | (?<b> \eD+))/x;
+\& say $+{a}; # Prints \*(Aq12\*(Aq
+\& say $+{b}; # *Also* prints \*(Aq12\*(Aq.
+.Ve
+.Sp
+The problem here is that both the group named \f(CW\*(C`a\*(C'\fR and the group
+named \f(CW\*(C`b\*(C'\fR are aliases for the group belonging to \f(CW$1\fR.
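+.Sp
+As a brief sketch of the basic use case (the input strings are made up),
+either alternative below stores its number in \f(CW$1\fR:
+.Sp
+.Vb 4
+\& for my $s ("id=42", "num:42") {
+\&     $s =~ /(?| id=(\ed+) | num:(\ed+) )/x
+\&         and print "$1\en"; # prints "42" both times
+\& }
+.Ve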
+.IP "Lookaround Assertions" 4
+.IX Xref "look-around assertion lookaround assertion look-around lookaround"
+.IX Item "Lookaround Assertions"
+Lookaround assertions are zero-width patterns which match a specific
+pattern without including it in \f(CW$&\fR. Positive assertions match when
+their subpattern matches, negative assertions match when their subpattern
+fails. Lookbehind matches text up to the current match position,
+lookahead matches text following the current match position.
+.RS 4
+.ie n .IP """(?=\fIpattern\fR)""" 4
+.el .IP \f(CW(?=\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(?=pattern)"
+.PD 0
+.ie n .IP """(*pla:\fIpattern\fR)""" 4
+.el .IP \f(CW(*pla:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(*pla:pattern)"
+.ie n .IP """(*positive_lookahead:\fIpattern\fR)""" 4
+.el .IP \f(CW(*positive_lookahead:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?=) (*pla (*positive_lookahead look-ahead, positive lookahead, positive"
+.IX Item "(*positive_lookahead:pattern)"
+.PD
+A zero-width positive lookahead assertion. For example, \f(CW\*(C`/\ew+(?=\et)/\*(C'\fR
+matches a word followed by a tab, without including the tab in \f(CW$&\fR.
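+.Sp
+For example, with an arbitrary tab-separated string:
+.Sp
+.Vb 1
+\& "foo\etbar" =~ /\ew+(?=\et)/ and print "$&\en"; # prints "foo"
+.Ve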
+.ie n .IP """(?!\fIpattern\fR)""" 4
+.el .IP \f(CW(?!\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(?!pattern)"
+.PD 0
+.ie n .IP """(*nla:\fIpattern\fR)""" 4
+.el .IP \f(CW(*nla:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(*nla:pattern)"
+.ie n .IP """(*negative_lookahead:\fIpattern\fR)""" 4
+.el .IP \f(CW(*negative_lookahead:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?!) (*nla (*negative_lookahead look-ahead, negative lookahead, negative"
+.IX Item "(*negative_lookahead:pattern)"
+.PD
+A zero-width negative lookahead assertion. For example \f(CW\*(C`/foo(?!bar)/\*(C'\fR
+matches any occurrence of "foo" that isn't followed by "bar". Note
+however that lookahead and lookbehind are NOT the same thing. You cannot
+use this for lookbehind.
+.Sp
+If you are looking for a "bar" that isn't preceded by a "foo", \f(CW\*(C`/(?!foo)bar/\*(C'\fR
+will not do what you want. That's because the \f(CW\*(C`(?!foo)\*(C'\fR is just saying that
+the next thing cannot be "foo"\-\-and it's not, it's a "bar", so "foobar" will
+match. Use lookbehind instead (see below).
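+.Sp
+The difference can be seen in this sketch (the test strings are
+arbitrary):
+.Sp
+.Vb 7
+\& for my $s ("foobar", "bazbar") {
+\&     my $ahead = $s =~ /(?!foo)bar/ ? "yes" : "no";
+\&     my $behind = $s =~ /(?<!foo)bar/ ? "yes" : "no";
+\&     print "$s: lookahead=$ahead lookbehind=$behind\en";
+\& }
+\& # foobar: lookahead=yes lookbehind=no
+\& # bazbar: lookahead=yes lookbehind=yes
+.Ve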
+.ie n .IP """(?<=\fIpattern\fR)""" 4
+.el .IP \f(CW(?<=\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(?<=pattern)"
+.PD 0
+.ie n .IP """\eK""" 4
+.el .IP \f(CW\eK\fR 4
+.IX Item "K"
+.ie n .IP """(*plb:\fIpattern\fR)""" 4
+.el .IP \f(CW(*plb:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(*plb:pattern)"
+.ie n .IP """(*positive_lookbehind:\fIpattern\fR)""" 4
+.el .IP \f(CW(*positive_lookbehind:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?<=) (*plb (*positive_lookbehind look-behind, positive lookbehind, positive \\K"
+.IX Item "(*positive_lookbehind:pattern)"
+.PD
+A zero-width positive lookbehind assertion. For example, \f(CW\*(C`/(?<=\et)\ew+/\*(C'\fR
+matches a word that follows a tab, without including the tab in \f(CW$&\fR.
+.Sp
+Prior to Perl 5.30, it worked only for fixed-width lookbehind, but
+starting in that release, it can handle variable lengths from 1 to 255
+characters as an experimental feature. The feature is enabled
+automatically if you use a variable length positive lookbehind assertion.
+.Sp
+In Perl 5.35.10 the scope of the experimental nature of this construct
+has been reduced, and experimental warnings will only be produced when
+the construct contains capturing parentheses. The warnings will be
+raised at pattern compilation time, unless turned off, in the
+\&\f(CW\*(C`experimental::vlb\*(C'\fR category. This is to warn you that the exact
+contents of capturing buffers in a variable length positive lookbehind
+is not well defined and is subject to change in a future release of perl.
+.Sp
+Currently if you use capture buffers inside of a positive variable length
+lookbehind the result will be the longest and thus leftmost match possible.
+This means that
+.Sp
+.Vb 4
+\& "aax" =~ /(?=x)(?<=(a|aa))/
+\& "aax" =~ /(?=x)(?<=(aa|a))/
+\& "aax" =~ /(?=x)(?<=(a{1,2}?))/
+\& "aax" =~ /(?=x)(?<=(a{1,2}))/
+.Ve
+.Sp
+will all result in \f(CW$1\fR containing \f(CW"aa"\fR. It is possible in a future
+release of perl we will change this behavior.
+.Sp
+There is a special form of this construct, called \f(CW\*(C`\eK\*(C'\fR
+(available since Perl 5.10.0), which causes the
+regex engine to "keep" everything it had matched prior to the \f(CW\*(C`\eK\*(C'\fR and
+not include it in \f(CW$&\fR. This effectively provides non-experimental
+variable-length lookbehind of any length.
+.Sp
+There is also a technique that can be used to handle variable length
+lookbehinds on earlier releases, or ones longer than 255 characters. It is
+described in
+<http://www.drregex.com/2019/02/variable\-length\-lookbehinds\-actually.html>.
+.Sp
+Note that under \f(CW\*(C`/i\*(C'\fR, a few single characters match two or three other
+characters. This makes them variable length, and the 255-character limit applies
+to the maximum number of characters in the match. For
+example \f(CW\*(C`qr/\eN{LATIN SMALL LETTER SHARP S}/i\*(C'\fR matches the sequence
+\&\f(CW"ss"\fR. Your lookbehind assertion could contain 127 Sharp S
+characters under \f(CW\*(C`/i\*(C'\fR, but adding a 128th would generate a compilation
+error, as that could match 256 \f(CW"s"\fR characters in a row.
+.Sp
+The use of \f(CW\*(C`\eK\*(C'\fR inside of another lookaround assertion
+is allowed, but the behaviour is currently not well defined.
+.Sp
+For various reasons \f(CW\*(C`\eK\*(C'\fR may be significantly more efficient than the
+equivalent \f(CW\*(C`(?<=...)\*(C'\fR construct, and it is especially useful in
+situations where you want to efficiently remove something following
+something else in a string. For instance
+.Sp
+.Vb 1
+\& s/(foo)bar/$1/g;
+.Ve
+.Sp
+can be rewritten as the much more efficient
+.Sp
+.Vb 1
+\& s/foo\eKbar//g;
+.Ve
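+.Sp
+A sketch of the effect on an arbitrary string:
+.Sp
+.Vb 4
+\& my $s = "foobar foobaz foobar";
+\& $s =~ s/foo\eKbar//g;
+\& print "$s\en"; # foo foobaz foo
+\& "foobar" =~ /foo\eKbar/ and print "$&\en"; # prints "bar"
+.Ve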
+.Sp
+Use of the non-greedy modifier \f(CW"?"\fR may not give you the expected
+results if it is within a capturing group within the construct.
+.ie n .IP """(?<!\fIpattern\fR)""" 4
+.el .IP \f(CW(?<!\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(?<!pattern)"
+.PD 0
+.ie n .IP """(*nlb:\fIpattern\fR)""" 4
+.el .IP \f(CW(*nlb:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(*nlb:pattern)"
+.ie n .IP """(*negative_lookbehind:\fIpattern\fR)""" 4
+.el .IP \f(CW(*negative_lookbehind:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?<!) (*nlb (*negative_lookbehind look-behind, negative lookbehind, negative"
+.IX Item "(*negative_lookbehind:pattern)"
+.PD
+A zero-width negative lookbehind assertion. For example \f(CW\*(C`/(?<!bar)foo/\*(C'\fR
+matches any occurrence of "foo" that does not follow "bar".
+.Sp
+Prior to Perl 5.30, it worked only for fixed-width lookbehind, but
+starting in that release, it can handle variable lengths from 1 to 255
+characters as an experimental feature. The feature is enabled
+automatically if you use a variable length negative lookbehind assertion.
+.Sp
+In Perl 5.35.10 the scope of the experimental nature of this construct
+has been reduced, and experimental warnings will only be produced when
+the construct contains capturing parentheses. The warnings will be
+raised at pattern compilation time, unless turned off, in the
+\&\f(CW\*(C`experimental::vlb\*(C'\fR category. This is to warn you that the exact
+contents of capturing buffers in a variable length negative lookbehind
+is not well defined and is subject to change in a future release of perl.
+.Sp
+Currently if you use capture buffers inside of a negative variable length
+lookbehind the result may not be what you expect, for instance:
+.Sp
+.Vb 1
+\& say "axfoo"=~/(?=foo)(?<!(a|ax)(?{ say $1 }))/ ? "y" : "n";
+.Ve
+.Sp
+will output the following:
+.Sp
+.Vb 2
+\& a
+\& n
+.Ve
+.Sp
+which does not make sense as this should print out "ax" as the "a" does
+not line up at the correct place. Another example would be:
+.Sp
+.Vb 1
+\& say "yes: \*(Aq$1\-$2\*(Aq" if "aayfoo"=~/(?=foo)(?<!(a|aa)(a|aa)x)/;
+.Ve
+.Sp
+will output the following:
+.Sp
+.Vb 1
+\& yes: \*(Aqaa\-a\*(Aq
+.Ve
+.Sp
+It is possible in a future release of perl we will change this behavior
+so that both of these examples produce more reasonable output.
+.Sp
+Note that we are confident that the construct will match and reject
+patterns appropriately; the undefined behavior strictly relates to the
+value of the capture buffer during or after matching.
+.Sp
+There is a technique that can be used to handle variable length
+lookbehind on earlier releases, or lookbehind longer than 255 characters. It is
+described in
+<http://www.drregex.com/2019/02/variable\-length\-lookbehinds\-actually.html>.
+.Sp
+Note that under \f(CW\*(C`/i\*(C'\fR, a few single characters match two or three other
+characters. This makes them variable length, and the 255-character limit applies
+to the maximum number of characters in the match. For
+example \f(CW\*(C`qr/\eN{LATIN SMALL LETTER SHARP S}/i\*(C'\fR matches the sequence
+\&\f(CW"ss"\fR. Your lookbehind assertion could contain 127 Sharp S
+characters under \f(CW\*(C`/i\*(C'\fR, but adding a 128th would generate a compilation
+error, as that could match 256 \f(CW"s"\fR characters in a row.
+.Sp
+Use of the non-greedy modifier \f(CW"?"\fR may not give you the expected
+results if it is within a capturing group within the construct.
+.RE
+.RS 4
+.RE
+.ie n .IP """(?<\fINAME\fR>\fIpattern\fR)""" 4
+.el .IP \f(CW(?<\fR\f(CINAME\fR\f(CW>\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(?<NAME>pattern)"
+.PD 0
+.ie n .IP """(?\*(Aq\fINAME\fR\*(Aq\fIpattern\fR)""" 4
+.el .IP \f(CW(?\*(Aq\fR\f(CINAME\fR\f(CW\*(Aq\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?<NAME>) (?'NAME') named capture capture"
+.IX Item "(?NAMEpattern)"
+.PD
+A named capture group. Identical in every respect to normal capturing
+parentheses \f(CW\*(C`()\*(C'\fR but for the additional fact that the group
+can be referred to by name in various regular expression
+constructs (like \f(CW\*(C`\eg{\fR\f(CINAME\fR\f(CW}\*(C'\fR) and can be accessed by name
+after a successful match via \f(CW\*(C`%+\*(C'\fR or \f(CW\*(C`%\-\*(C'\fR. See perlvar
+for more details on the \f(CW\*(C`%+\*(C'\fR and \f(CW\*(C`%\-\*(C'\fR hashes.
+.Sp
+If multiple distinct capture groups have the same name, then
+\&\f(CW$+{\fR\f(CINAME\fR\f(CW}\fR will refer to the leftmost defined group in the match.
+.Sp
+The forms \f(CW\*(C`(?\*(Aq\fR\f(CINAME\fR\f(CW\*(Aq\fR\f(CIpattern\fR\f(CW)\*(C'\fR and \f(CW\*(C`(?<\fR\f(CINAME\fR\f(CW>\fR\f(CIpattern\fR\f(CW)\*(C'\fR
+are equivalent.
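+.Sp
+A short sketch with a made-up date string:
+.Sp
+.Vb 4
+\& if ("2024\-02\-11" =~ /(?<year>\ed{4})\-(?<month>\ed\ed)\-(?<day>\ed\ed)/) {
+\&     print "$+{year}/$+{month}/$+{day}\en"; # 2024/02/11
+\&     print "$1\en"; # 2024; named groups are numbered too
+\& }
+.Ve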
+.Sp
+\&\fBNOTE:\fR While the notation of this construct is the same as the similar
+function in .NET regexes, the behavior is not. In Perl the groups are
+numbered sequentially regardless of being named or not. Thus in the
+pattern
+.Sp
+.Vb 1
+\& /(x)(?<foo>y)(z)/
+.Ve
+.Sp
+\&\f(CW$+{foo}\fR will be the same as \f(CW$2\fR, and \f(CW$3\fR will contain 'z' instead of
+the opposite, which is what a .NET regex hacker might expect.
+.Sp
+Currently \fINAME\fR is restricted to simple identifiers only.
+In other words, it must match \f(CW\*(C`/^[_A\-Za\-z][_A\-Za\-z0\-9]*\ez/\*(C'\fR or
+its Unicode extension (see utf8),
+though it isn't extended by the locale (see perllocale).
+.Sp
+\&\fBNOTE:\fR In order to make things easier for programmers with experience
+with the Python or PCRE regex engines, the pattern \f(CW\*(C`(?P<\fR\f(CINAME\fR\f(CW>\fR\f(CIpattern\fR\f(CW)\*(C'\fR
+may be used instead of \f(CW\*(C`(?<\fR\f(CINAME\fR\f(CW>\fR\f(CIpattern\fR\f(CW)\*(C'\fR; however this form does not
+support the use of single quotes as a delimiter for the name.
+.ie n .IP """\ek<\fINAME\fR>""" 4
+.el .IP \f(CW\ek<\fR\f(CINAME\fR\f(CW>\fR 4
+.IX Item "k<NAME>"
+.PD 0
+.ie n .IP """\ek\*(Aq\fINAME\fR\*(Aq""" 4
+.el .IP \f(CW\ek\*(Aq\fR\f(CINAME\fR\f(CW\*(Aq\fR 4
+.IX Item "kNAME"
+.ie n .IP """\ek{\fINAME\fR}""" 4
+.el .IP \f(CW\ek{\fR\f(CINAME\fR\f(CW}\fR 4
+.IX Item "k{NAME}"
+.PD
+Named backreference. Similar to numeric backreferences, except that
+the group is designated by name and not number. If multiple groups
+have the same name then it refers to the leftmost defined group in
+the current match.
+.Sp
+It is an error to refer to a name not defined by a \f(CW\*(C`(?<\fR\f(CINAME\fR\f(CW>)\*(C'\fR
+earlier in the pattern.
+.Sp
+All three forms are equivalent, although with \f(CW\*(C`\ek{ \fR\f(CINAME\fR\f(CW }\*(C'\fR,
+you may optionally have blanks within but adjacent to the braces, as
+shown.
+.Sp
+\&\fBNOTE:\fR In order to make things easier for programmers with experience
+with the Python or PCRE regex engines, the pattern \f(CW\*(C`(?P=\fR\f(CINAME\fR\f(CW)\*(C'\fR
+may be used instead of \f(CW\*(C`\ek<\fR\f(CINAME\fR\f(CW>\*(C'\fR.
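+.Sp
+For instance, this sketch uses both Python-style forms on an arbitrary
+word:
+.Sp
+.Vb 2
+\& "hello" =~ /(?P<char>\ew)(?P=char)/
+\&     and print "$+{char}\en"; # prints "l"
+.Ve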
+.ie n .IP """(?{ \fIcode\fR })""" 4
+.el .IP "\f(CW(?{ \fR\f(CIcode\fR\f(CW })\fR" 4
+.IX Xref "(?{}) regex, code in regexp, code in regular expression, code in"
+.IX Item "(?{ code })"
+\&\fBWARNING\fR: Using this feature safely requires that you understand its
+limitations. Code executed that has side effects may not perform identically
+from version to version due to the effect of future optimisations in the regex
+engine. For more information on this, see "Embedded Code Execution
+Frequency".
+.Sp
+This zero-width assertion executes any embedded Perl code. It always
+succeeds, and its return value is set as \f(CW$^R\fR.
+.Sp
+In literal patterns, the code is parsed at the same time as the
+surrounding code. While within the pattern, control is passed temporarily
+back to the perl parser, until the logically-balancing closing brace is
+encountered. This is similar to the way that an array index expression in
+a literal string is handled, for example
+.Sp
+.Vb 1
+\& "abc$array[ 1 + f(\*(Aq[\*(Aq) + g()]def"
+.Ve
+.Sp
+In particular, braces do not need to be balanced:
+.Sp
+.Vb 1
+\& s/abc(?{ f(\*(Aq{\*(Aq); })/def/
+.Ve
+.Sp
+Even in a pattern that is interpolated and compiled at run-time, literal
+code blocks will be compiled once, at perl compile time; the following
+prints "ABCD":
+.Sp
+.Vb 5
+\& print "D";
+\& my $qr = qr/(?{ BEGIN { print "A" } })/;
+\& my $foo = "foo";
+\& /$foo$qr(?{ BEGIN { print "B" } })/;
+\& BEGIN { print "C" }
+.Ve
+.Sp
+In patterns where the text of the code is derived from run-time
+information rather than appearing literally in a source code /pattern/,
+the code is compiled at the same time that the pattern is compiled, and
+for reasons of security, \f(CW\*(C`use re \*(Aqeval\*(Aq\*(C'\fR must be in scope. This is to
+stop user-supplied patterns containing code snippets from being
+executable.
+.Sp
+In situations where you need to enable this with \f(CW\*(C`use re \*(Aqeval\*(Aq\*(C'\fR, you should
+also have taint checking enabled, if your perl supports it.
+Better yet, use the carefully constrained evaluation within a Safe compartment.
+See perlsec for details about both these mechanisms.
+.Sp
+From the viewpoint of parsing, lexical variable scope and closures,
+.Sp
+.Vb 1
+\& /AAA(?{ BBB })CCC/
+.Ve
+.Sp
+behaves approximately like
+.Sp
+.Vb 1
+\& /AAA/ && do { BBB } && /CCC/
+.Ve
+.Sp
+Similarly,
+.Sp
+.Vb 1
+\& qr/AAA(?{ BBB })CCC/
+.Ve
+.Sp
+behaves approximately like
+.Sp
+.Vb 1
+\& sub { /AAA/ && do { BBB } && /CCC/ }
+.Ve
+.Sp
+In particular:
+.Sp
+.Vb 3
+\& { my $i = 1; $r = qr/(?{ print $i })/ }
+\& my $i = 2;
+\& /$r/; # prints "1"
+.Ve
+.Sp
+Inside a \f(CW\*(C`(?{...})\*(C'\fR block, \f(CW$_\fR refers to the string the regular
+expression is matching against. You can also use \f(CWpos()\fR to find
+the current match position within this string.
+.Sp
+The code block introduces a new scope from the perspective of lexical
+variable declarations, but \fBnot\fR from the perspective of \f(CW\*(C`local\*(C'\fR and
+similar localizing behaviours. So later code blocks within the same
+pattern will still see the values which were localized in earlier blocks.
+These accumulated localizations are undone either at the end of a
+successful match, or if the assertion is backtracked (compare
+"Backtracking"). For example,
+.Sp
+.Vb 10
+\& $_ = \*(Aqa\*(Aq x 8;
+\& m<
+\& (?{ $cnt = 0 }) # Initialize $cnt.
+\& (
+\& a
+\& (?{
+\& local $cnt = $cnt + 1; # Update $cnt,
+\& # backtracking\-safe.
+\& })
+\& )*
+\& aaaa
+\& (?{ $res = $cnt }) # On success copy to
+\& # non\-localized location.
+\& >x;
+.Ve
+.Sp
+will initially increment \f(CW$cnt\fR up to 8; then during backtracking, its
+value will be unwound back to 4, which is the value assigned to \f(CW$res\fR.
+At the end of the regex execution, \f(CW$cnt\fR will be wound back to its initial
+value of 0.
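+.Sp
+A minimal, self-contained sketch of the counting example above. It assumes
+that \f(CW$cnt\fR and \f(CW$res\fR are package variables, since \f(CW\*(C`local\*(C'\fR cannot be
+applied to lexical variables; the name \f(CW$matched\fR is introduced here purely
+for illustration. The comments inside the pattern deliberately avoid
+sigils, because variables are interpolated even inside \f(CW\*(C`/x\*(C'\fR comments.

```perl
use strict;
use warnings;

# Package variables, because local() cannot localize lexical (my) variables.
our ( $cnt, $res );

$_ = 'a' x 8;

my $matched = m<
    (?{ $cnt = 0 })                    # Initialize the counter.
    (?:
        a
        (?{ local $cnt = $cnt + 1 })   # Backtracking-safe increment.
    )*
    aaaa
    (?{ $res = $cnt })                 # Copy to a non-localized variable.
>x ? 1 : 0;

print "matched=$matched res=$res\n";   # prints "matched=1 res=4"
```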
+.Sp
+This assertion may be used as the condition in a
+.Sp
+.Vb 1
+\& (?(condition)yes\-pattern|no\-pattern)
+.Ve
+.Sp
+switch. If \fInot\fR used in this way, the result of evaluation of \fIcode\fR
+is put into the special variable \f(CW$^R\fR. This happens immediately, so
+\&\f(CW$^R\fR can be used from other \f(CW\*(C`(?{ \fR\f(CIcode\fR\f(CW })\*(C'\fR assertions inside the same
+regular expression.
+.Sp
+The assignment to \f(CW$^R\fR above is properly localized, so the old
+value of \f(CW$^R\fR is restored if the assertion is backtracked; compare
+"Backtracking".
+.Sp
+Note that the special variable \f(CW$^N\fR is particularly useful with code
+blocks to capture the results of submatches in variables without having to
+keep track of the number of nested parentheses. For example:
+.Sp
+.Vb 3
+\& $_ = "The brown fox jumps over the lazy dog";
+\& /the (\eS+)(?{ $color = $^N }) (\eS+)(?{ $animal = $^N })/i;
+\& print "color = $color, animal = $animal\en";
+.Ve
+.Sp
+The use of this construct disables some optimisations globally in the
+pattern, and the pattern may execute much slower as a consequence.
+Use a \f(CW\*(C`*\*(C'\fR instead of the \f(CW\*(C`?\*(C'\fR block to create an optimistic form of
+this construct. \f(CW\*(C`(*{ ... })\*(C'\fR should not disable any optimisations.
+.ie n .IP """(*{ \fIcode\fR })""" 4
+.el .IP "\f(CW(*{ \fR\f(CIcode\fR\f(CW })\fR" 4
+.IX Xref "(*{}) regex, optimistic code"
+.IX Item "(*{ code })"
+This is \fIexactly\fR the same as \f(CW\*(C`(?{ \fR\f(CIcode\fR\f(CW })\*(C'\fR with the exception
+that it does not disable \fBany\fR optimisations at all in the regex engine.
+How often it is executed may vary from perl release to perl release.
+In a failing match it may not even be executed at all.
+.ie n .IP """(??{ \fIcode\fR })""" 4
+.el .IP "\f(CW(??{ \fR\f(CIcode\fR\f(CW })\fR" 4
+.IX Xref "(??{}) regex, postponed regexp, postponed regular expression, postponed"
+.IX Item "(??{ code })"
+\&\fBWARNING\fR: Using this feature safely requires that you understand its
+limitations. Code executed that has side effects may not perform
+identically from version to version due to the effect of future
+optimisations in the regex engine. For more information on this, see
+"Embedded Code Execution Frequency".
+.Sp
+This is a "postponed" regular subexpression. It behaves in \fIexactly\fR the
+same way as a \f(CW\*(C`(?{ \fR\f(CIcode\fR\f(CW })\*(C'\fR code block as described above, except that
+its return value, rather than being assigned to \f(CW$^R\fR, is treated as a
+pattern, compiled if it's a string (or used as-is if it's a qr// object),
+then matched as if it were inserted instead of this construct.
+.Sp
+During the matching of this sub-pattern, it has its own set of
+captures which are valid during the sub-match, but are discarded once
+control returns to the main pattern. For example, the following matches,
+with the inner pattern capturing "B" and matching "BB", while the outer
+pattern captures "A":
+.Sp
+.Vb 3
+\& my $inner = \*(Aq(.)\e1\*(Aq;
+\& "ABBA" =~ /^(.)(??{ $inner })\e1/;
+\& print $1; # prints "A";
+.Ve
+.Sp
+Note that this means that there is no way for the inner pattern to refer
+to a capture group defined outside. (The code block itself can use \f(CW$1\fR,
+\&\fIetc\fR., to refer to the enclosing pattern's capture groups.) Thus, although
+.Sp
+.Vb 1
+\& (\*(Aqa\*(Aq x 100)=~/(??{\*(Aq(.)\*(Aq x 100})/
+.Ve
+.Sp
+\&\fIwill\fR match, it will \fInot\fR set \f(CW$1\fR on exit.
+.Sp
+The following pattern matches a parenthesized group:
+.Sp
+.Vb 9
+\& $re = qr{
+\& \e(
+\& (?:
+\& (?> [^()]+ ) # Non\-parens without backtracking
+\& |
+\& (??{ $re }) # Group with matching parens
+\& )*
+\& \e)
+\& }x;
+.Ve
+.Sp
+See also
+\&\f(CW\*(C`(?\fR\f(CIPARNO\fR\f(CW)\*(C'\fR
+for a different, more efficient way to accomplish
+the same task.
+.Sp
+Executing a postponed regular expression too many times without
+consuming any input string will also result in a fatal error. The depth
+at which that happens is compiled into perl, so it can be changed with a
+custom build.
+.Sp
+The use of this construct disables some optimisations globally in the pattern,
+and the pattern may execute much slower as a consequence.
+.ie n .IP """(?\fIPARNO\fR)"" ""(?\-\fIPARNO\fR)"" ""(?+\fIPARNO\fR)"" ""(?R)"" ""(?0)""" 4
+.el .IP "\f(CW(?\fR\f(CIPARNO\fR\f(CW)\fR \f(CW(?\-\fR\f(CIPARNO\fR\f(CW)\fR \f(CW(?+\fR\f(CIPARNO\fR\f(CW)\fR \f(CW(?R)\fR \f(CW(?0)\fR" 4
+.IX Xref "(?PARNO) (?1) (?R) (?0) (?-1) (?+1) (?-PARNO) (?+PARNO) regex, recursive regexp, recursive regular expression, recursive regex, relative recursion GOSUB GOSTART"
+.IX Item "(?PARNO) (?-PARNO) (?+PARNO) (?R) (?0)"
+Recursive subpattern. Treat the contents of a given capture buffer in the
+current pattern as an independent subpattern and attempt to match it at
+the current position in the string. Information about capture state from
+the caller for things like backreferences is available to the subpattern,
+but capture buffers set by the subpattern are not visible to the caller.
+.Sp
+Similar to \f(CW\*(C`(??{ \fR\f(CIcode\fR\f(CW })\*(C'\fR except that it does not involve executing any
+code or potentially compiling a returned pattern string; instead it treats
+the part of the current pattern contained within a specified capture group
+as an independent pattern that must match at the current position. The
+treatment of capture buffers also differs: unlike \f(CW\*(C`(??{ \fR\f(CIcode\fR\f(CW })\*(C'\fR,
+recursive patterns have access to their caller's match state, so one can
+use backreferences safely.
+.Sp
+\&\fIPARNO\fR is a sequence of digits (not starting with 0) whose value reflects
+the paren-number of the capture group to recurse to. \f(CW\*(C`(?R)\*(C'\fR recurses to
+the beginning of the whole pattern. \f(CW\*(C`(?0)\*(C'\fR is an alternate syntax for
+\&\f(CW\*(C`(?R)\*(C'\fR. If \fIPARNO\fR is preceded by a plus or minus sign then it is assumed
+to be relative, with negative numbers indicating preceding capture groups
+and positive ones following. Thus \f(CW\*(C`(?\-1)\*(C'\fR refers to the most recently
+declared group, and \f(CW\*(C`(?+1)\*(C'\fR indicates the next group to be declared.
+Note that the counting for relative recursion differs from that of
+relative backreferences, in that with recursion unclosed groups \fBare\fR
+included.
+.Sp
+The following pattern matches a function \f(CWfoo()\fR which may contain
+balanced parentheses as the argument.
+.Sp
+.Vb 10
+\& $re = qr{ ( # paren group 1 (full function)
+\& foo
+\& ( # paren group 2 (parens)
+\& \e(
+\& ( # paren group 3 (contents of parens)
+\& (?:
+\& (?> [^()]+ ) # Non\-parens without backtracking
+\& |
+\& (?2) # Recurse to start of paren group 2
+\& )*
+\& )
+\& \e)
+\& )
+\& )
+\& }x;
+.Ve
+.Sp
+If the pattern was used as follows
+.Sp
+.Vb 4
+\& \*(Aqfoo(bar(baz)+baz(bop))\*(Aq=~/$re/
+\& and print "\e$1 = $1\en",
+\& "\e$2 = $2\en",
+\& "\e$3 = $3\en";
+.Ve
+.Sp
+the output produced should be the following:
+.Sp
+.Vb 3
+\& $1 = foo(bar(baz)+baz(bop))
+\& $2 = (bar(baz)+baz(bop))
+\& $3 = bar(baz)+baz(bop)
+.Ve
+.Sp
+If there is no corresponding capture group defined, then it is a
+fatal error. Recursing deeply without consuming any input string will
+also result in a fatal error. The depth at which that happens is
+compiled into perl, so it can be changed with a custom build.
+.Sp
+The following shows how using negative indexing can make it
+easier to embed recursive patterns inside of a \f(CW\*(C`qr//\*(C'\fR construct
+for later use:
+.Sp
+.Vb 4
+\& my $parens = qr/(\e((?:[^()]++|(?\-1))*+\e))/;
+\& if (/foo $parens \es+ \e+ \es+ bar $parens/x) {
+\& # do something here...
+\& }
+.Ve
+.Sp
+\&\fBNote\fR that this pattern does not behave the same way as the equivalent
+PCRE or Python construct of the same form. In Perl you can backtrack into
+a recursed group; in PCRE and Python the recursed-into group is treated
+as atomic. Also, modifiers are resolved at compile time, so constructs
+like \f(CW\*(C`(?i:(?1))\*(C'\fR or \f(CW\*(C`(?:(?i)(?1))\*(C'\fR do not affect how the sub-pattern will
+be processed.
+.ie n .IP """(?&\fINAME\fR)""" 4
+.el .IP \f(CW(?&\fR\f(CINAME\fR\f(CW)\fR 4
+.IX Xref "(?&NAME)"
+.IX Item "(?&NAME)"
+Recurse to a named subpattern. Identical to \f(CW\*(C`(?\fR\f(CIPARNO\fR\f(CW)\*(C'\fR except that the
+parenthesis to recurse to is determined by name. If multiple parentheses have
+the same name, then it recurses to the leftmost.
+.Sp
+It is an error to refer to a name that is not declared somewhere in the
+pattern.
+.Sp
+\&\fBNOTE:\fR In order to make things easier for programmers with experience
+with the Python or PCRE regex engines the pattern \f(CW\*(C`(?P>\fR\f(CINAME\fR\f(CW)\*(C'\fR
+may be used instead of \f(CW\*(C`(?&\fR\f(CINAME\fR\f(CW)\*(C'\fR.
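+.Sp
+As a rough illustration of recursing by name, the following sketch matches
+a string consisting of one balanced parenthesized unit. The group name
+\f(CWgroup\fR, the variable names, and the sample inputs are assumptions made
+for this example only.

```perl
use strict;
use warnings;

# Hypothetical group name "group"; it matches one balanced (...) unit.
my $balanced = qr{
    \A
    (?<group>
        \(
        (?: [^()]++        # Non-parens, possessively
          | (?&group)      # or a nested balanced unit, by name
        )*+
        \)
    )
    \z
}x;

my @inputs  = ( '(a(b)c)', '(a(b)c', '()' );
my @verdict = map { $_ =~ $balanced ? 'balanced' : 'unbalanced' } @inputs;

print "$inputs[$_]: $verdict[$_]\n" for 0 .. $#inputs;
```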
+.ie n .IP """(?(\fIcondition\fR)\fIyes\-pattern\fR|\fIno\-pattern\fR)""" 4
+.el .IP \f(CW(?(\fR\f(CIcondition\fR\f(CW)\fR\f(CIyes\-pattern\fR\f(CW|\fR\f(CIno\-pattern\fR\f(CW)\fR 4
+.IX Xref "(?()"
+.IX Item "(?(condition)yes-pattern|no-pattern)"
+.PD 0
+.ie n .IP """(?(\fIcondition\fR)\fIyes\-pattern\fR)""" 4
+.el .IP \f(CW(?(\fR\f(CIcondition\fR\f(CW)\fR\f(CIyes\-pattern\fR\f(CW)\fR 4
+.IX Item "(?(condition)yes-pattern)"
+.PD
+Conditional expression. Matches \fIyes-pattern\fR if \fIcondition\fR yields
+a true value, matches \fIno-pattern\fR otherwise. A missing pattern always
+matches.
+.Sp
+\&\f(CW\*(C`(\fR\f(CIcondition\fR\f(CW)\*(C'\fR should be one of:
+.RS 4
+.IP "an integer in parentheses" 4
+.IX Item "an integer in parentheses"
+(which is valid if the corresponding pair of parentheses
+matched);
+.IP "a lookahead/lookbehind/evaluate zero-width assertion;" 4
+.IX Item "a lookahead/lookbehind/evaluate zero-width assertion;"
+.PD 0
+.IP "a name in angle brackets or single quotes" 4
+.IX Item "a name in angle brackets or single quotes"
+.PD
+(which is valid if a group with the given name matched);
+.ie n .IP "the special symbol ""(R)""" 4
+.el .IP "the special symbol \f(CW(R)\fR" 4
+.IX Item "the special symbol (R)"
+(true when evaluated inside of recursion or eval). Additionally, the
+\&\f(CW"R"\fR may be
+followed by a number (which will be true when evaluated when recursing
+inside of the appropriate group), or by \f(CW\*(C`&\fR\f(CINAME\fR\f(CW\*(C'\fR, in which case it will
+be true only when evaluated during recursion in the named group.
+.RE
+.RS 4
+.Sp
+Here's a summary of the possible predicates:
+.ie n .IP """(1)"" ""(2)"" ..." 4
+.el .IP "\f(CW(1)\fR \f(CW(2)\fR ..." 4
+.IX Item "(1) (2) ..."
+Checks if the numbered capturing group has matched something.
+Full syntax: \f(CW\*(C`(?(1)then|else)\*(C'\fR
+.ie n .IP """(<\fINAME\fR>)"" ""(\*(Aq\fINAME\fR\*(Aq)""" 4
+.el .IP "\f(CW(<\fR\f(CINAME\fR\f(CW>)\fR \f(CW(\*(Aq\fR\f(CINAME\fR\f(CW\*(Aq)\fR" 4
+.IX Item "(<NAME>) (NAME)"
+Checks if a group with the given name has matched something.
+Full syntax: \f(CW\*(C`(?(<name>)then|else)\*(C'\fR
+.ie n .IP """(?=...)"" ""(?!...)"" ""(?<=...)"" ""(?<!...)""" 4
+.el .IP "\f(CW(?=...)\fR \f(CW(?!...)\fR \f(CW(?<=...)\fR \f(CW(?<!...)\fR" 4
+.IX Item "(?=...) (?!...) (?<=...) (?<!...)"
+Checks whether the pattern matches (or does not match, for the \f(CW"!"\fR
+variants).
+Full syntax: \f(CW\*(C`(?(?=\fR\f(CIlookahead\fR\f(CW)\fR\f(CIthen\fR\f(CW|\fR\f(CIelse\fR\f(CW)\*(C'\fR
+.ie n .IP """(?{ \fICODE\fR })""" 4
+.el .IP "\f(CW(?{ \fR\f(CICODE\fR\f(CW })\fR" 4
+.IX Item "(?{ CODE })"
+Treats the return value of the code block as the condition.
+Full syntax: \f(CW\*(C`(?(?{ \fR\f(CICODE\fR\f(CW })\fR\f(CIthen\fR\f(CW|\fR\f(CIelse\fR\f(CW)\*(C'\fR
+.Sp
+Note that use of this construct may globally affect the performance
+of the pattern. Consider using \f(CW\*(C`(*{ \fR\f(CICODE\fR\f(CW })\*(C'\fR instead.
+.ie n .IP """(*{ \fICODE\fR })""" 4
+.el .IP "\f(CW(*{ \fR\f(CICODE\fR\f(CW })\fR" 4
+.IX Item "(*{ CODE })"
+Treats the return value of the code block as the condition.
+Full syntax: \f(CW\*(C`(?(*{ \fR\f(CICODE\fR\f(CW })\fR\f(CIthen\fR\f(CW|\fR\f(CIelse\fR\f(CW)\*(C'\fR
+.ie n .IP """(R)""" 4
+.el .IP \f(CW(R)\fR 4
+.IX Item "(R)"
+Checks if the expression has been evaluated inside of recursion.
+Full syntax: \f(CW\*(C`(?(R)\fR\f(CIthen\fR\f(CW|\fR\f(CIelse\fR\f(CW)\*(C'\fR
+.ie n .IP """(R1)"" ""(R2)"" ..." 4
+.el .IP "\f(CW(R1)\fR \f(CW(R2)\fR ..." 4
+.IX Item "(R1) (R2) ..."
+Checks if the expression has been evaluated while executing directly
+inside of the n\-th capture group. This check is the regex equivalent of
+.Sp
+.Vb 1
+\& if ((caller(0))[3] eq \*(Aqsubname\*(Aq) { ... }
+.Ve
+.Sp
+In other words, it does not check the full recursion stack.
+.Sp
+Full syntax: \f(CW\*(C`(?(R1)\fR\f(CIthen\fR\f(CW|\fR\f(CIelse\fR\f(CW)\*(C'\fR
+.ie n .IP """(R&\fINAME\fR)""" 4
+.el .IP \f(CW(R&\fR\f(CINAME\fR\f(CW)\fR 4
+.IX Item "(R&NAME)"
+Similar to \f(CW\*(C`(R1)\*(C'\fR, this predicate checks to see if we're executing
+directly inside of the leftmost group with a given name (this is the same
+logic used by \f(CW\*(C`(?&\fR\f(CINAME\fR\f(CW)\*(C'\fR to disambiguate). It does not check the full
+stack, but only the name of the innermost active recursion.
+Full syntax: \f(CW\*(C`(?(R&\fR\f(CIname\fR\f(CW)\fR\f(CIthen\fR\f(CW|\fR\f(CIelse\fR\f(CW)\*(C'\fR
+.ie n .IP """(DEFINE)""" 4
+.el .IP \f(CW(DEFINE)\fR 4
+.IX Item "(DEFINE)"
+In this case, the yes-pattern is never directly executed, and no
+no-pattern is allowed. Similar in spirit to \f(CW\*(C`(?{0})\*(C'\fR but more efficient.
+See below for details.
+Full syntax: \f(CW\*(C`(?(DEFINE)\fR\f(CIdefinitions\fR\f(CW...)\*(C'\fR
+.RE
+.RS 4
+.Sp
+For example:
+.Sp
+.Vb 4
+\& m{ ( \e( )?
+\& [^()]+
+\& (?(1) \e) )
+\& }x
+.Ve
+.Sp
+matches a chunk of non-parentheses, possibly included in parentheses
+themselves.
+.Sp
+A special form is the \f(CW\*(C`(DEFINE)\*(C'\fR predicate, which never executes its
+yes-pattern directly, and does not allow a no-pattern. This allows one to
+define subpatterns which will be executed only by the recursion mechanism.
+This way, you can define a set of regular expression rules that can be
+bundled into any pattern you choose.
+.Sp
+It is recommended that for this usage you put the DEFINE block at the
+end of the pattern, and that you name any subpatterns defined within it.
+.Sp
+Also, it's worth noting that patterns defined this way probably will
+not be as efficient, as the optimizer is not very clever about
+handling them.
+.Sp
+An example of how this might be used is as follows:
+.Sp
+.Vb 5
+\& /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
+\& (?(DEFINE)
+\& (?<NAME_PAT>....)
+\& (?<ADDRESS_PAT>....)
+\& )/x
+.Ve
+.Sp
+Note that capture groups matched inside of recursion are not accessible
+after the recursion returns, so the extra layer of capturing groups is
+necessary. Thus \f(CW$+{NAME_PAT}\fR would not be defined even though
+\&\f(CW$+{NAME}\fR would be.
+.Sp
+Finally, keep in mind that subpatterns created inside a DEFINE block
+count towards the absolute and relative number of captures, so this:
+.Sp
+.Vb 5
+\& my @captures = "a" =~ /(.) # First capture
+\& (?(DEFINE)
+\& (?<EXAMPLE> 1 ) # Second capture
+\& )/x;
+\& say scalar @captures;
+.Ve
+.Sp
+will output 2, not 1. This is particularly important if you intend to
+compile the definitions with the \f(CW\*(C`qr//\*(C'\fR operator, and later
+interpolate them in another pattern.
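+.Sp
+Returning to the named-definition usage described above, here is a small
+runnable sketch. The rule names \f(CWWORD\fR and \f(CWNUMBER\fR, the outer group
+names \f(CWkey\fR and \f(CWvalue\fR, and the sample input are all hypothetical.
+It also shows that only the outer capturing groups are set after the match.

```perl
use strict;
use warnings;

# WORD and NUMBER are hypothetical rule names, reached only via recursion.
my $pair = qr{
    \A (?<key> (?&WORD) ) = (?<value> (?&NUMBER) ) \z
    (?(DEFINE)
        (?<WORD>   [A-Za-z_] \w* )
        (?<NUMBER> \d+ (?: \. \d+ )? )
    )
}x;

my %got;
if ( 'timeout=2.5' =~ $pair ) {
    %got = %+;    # Copy the named captures that are set.
}

# prints "key=timeout, value=2.5"
print join( ', ', map { "$_=$got{$_}" } sort keys %got ), "\n";
```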
+.RE
+.ie n .IP """(?>\fIpattern\fR)""" 4
+.el .IP \f(CW(?>\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(?>pattern)"
+.PD 0
+.ie n .IP """(*atomic:\fIpattern\fR)""" 4
+.el .IP \f(CW(*atomic:\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Xref "(?>pattern) (*atomic backtrack backtracking atomic possessive"
+.IX Item "(*atomic:pattern)"
+.PD
+An "independent" subexpression, one which matches the substring
+that a standalone \fIpattern\fR would match if anchored at the given
+position, and it matches \fInothing other than this substring\fR. This
+construct is useful for optimizations of what would otherwise be
+"eternal" matches, because it will not backtrack (see "Backtracking").
+It may also be useful in places where the "grab all you can, and do not
+give anything back" semantic is desirable.
+.Sp
+For example: \f(CW\*(C`^(?>a*)ab\*(C'\fR will never match, since \f(CW\*(C`(?>a*)\*(C'\fR
+(anchored at the beginning of string, as above) will match \fIall\fR
+characters \f(CW"a"\fR at the beginning of string, leaving no \f(CW"a"\fR for
+\&\f(CW\*(C`ab\*(C'\fR to match. In contrast, \f(CW\*(C`a*ab\*(C'\fR will match the same as \f(CW\*(C`a+b\*(C'\fR,
+since the match of the subgroup \f(CW\*(C`a*\*(C'\fR is influenced by the following
+group \f(CW\*(C`ab\*(C'\fR (see "Backtracking"). In particular, \f(CW\*(C`a*\*(C'\fR inside
+\&\f(CW\*(C`a*ab\*(C'\fR will match fewer characters than a standalone \f(CW\*(C`a*\*(C'\fR, since
+this makes the tail match.
+.Sp
+\&\f(CW\*(C`(?>\fR\f(CIpattern\fR\f(CW)\*(C'\fR does not disable backtracking altogether once it has
+matched. It is still possible to backtrack past the construct, but not
+into it. So \f(CW\*(C`((?>a*)|(?>b*))ar\*(C'\fR will still match "bar".
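+.Sp
+The three claims above can be checked with a short sketch; the variable
+names are chosen here for illustration only.

```perl
use strict;
use warnings;

# (?>a*) grabs every leading "a" and gives none back, so "ab" cannot match.
my $atomic = 'aaab' =~ /^(?>a*)ab/ ? 1 : 0;

# A plain a* backtracks and releases one "a" for the following "ab".
my $plain = 'aaab' =~ /^a*ab/ ? 1 : 0;

# Backtracking past an atomic group into the next alternative is still allowed.
my $past = 'bar' =~ /((?>a*)|(?>b*))ar/ ? 1 : 0;

print "atomic=$atomic plain=$plain past=$past\n";   # prints "atomic=0 plain=1 past=1"
```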
+.Sp
+An effect similar to \f(CW\*(C`(?>\fR\f(CIpattern\fR\f(CW)\*(C'\fR may be achieved by writing
+\&\f(CW\*(C`(?=(\fR\f(CIpattern\fR\f(CW))\eg{\-1}\*(C'\fR. This matches the same substring as a standalone
+\&\fIpattern\fR, and the following \f(CW\*(C`\eg{\-1}\*(C'\fR eats the matched string; it therefore
+makes a zero-length assertion into an analogue of \f(CW\*(C`(?>...)\*(C'\fR.
+(The difference between these two constructs is that the second one
+uses a capturing group, thus shifting ordinals of backreferences
+in the rest of a regular expression.)
+.Sp
+Consider this pattern:
+.Sp
+.Vb 8
+\& m{ \e(
+\& (
+\& [^()]+ # x+
+\& |
+\& \e( [^()]* \e)
+\& )+
+\& \e)
+\& }x
+.Ve
+.Sp
+That will efficiently match a nonempty group with matching parentheses
+two levels deep or less. However, if there is no such group, it
+will take virtually forever on a long string. That's because there
+are so many different ways to split a long string into several
+substrings. This is what \f(CW\*(C`(.+)+\*(C'\fR is doing, and \f(CW\*(C`(.+)+\*(C'\fR is similar
+to a subpattern of the above pattern. Consider how the pattern
+above detects no-match on \f(CW\*(C`((()aaaaaaaaaaaaaaaaaa\*(C'\fR in several
+seconds, but that each extra letter doubles this time. This
+exponential performance will make it appear that your program has
+hung. However, a tiny change to this pattern
+.Sp
+.Vb 8
+\& m{ \e(
+\& (
+\& (?> [^()]+ ) # change x+ above to (?> x+ )
+\& |
+\& \e( [^()]* \e)
+\& )+
+\& \e)
+\& }x
+.Ve
+.Sp
+which uses \f(CW\*(C`(?>...)\*(C'\fR matches exactly when the one above does (verifying
+this yourself would be a productive exercise), but finishes in a fourth of
+the time when used on a similar string with 1000000 \f(CW"a"\fRs. Be aware,
+however, that, when this construct is followed by a
+quantifier, it currently triggers a warning message under
+the \f(CW\*(C`use warnings\*(C'\fR pragma or \fB\-w\fR switch saying it
+\&\f(CW"matches null string many times in regex"\fR.
+.Sp
+On simple groups, such as the pattern \f(CW\*(C`(?> [^()]+ )\*(C'\fR, a comparable
+effect may be achieved by negative lookahead, as in \f(CW\*(C`[^()]+ (?! [^()] )\*(C'\fR.
+This was only 4 times slower on a string with 1000000 \f(CW"a"\fRs.
+.Sp
+The "grab all you can, and do not give anything back" semantic is desirable
+in many situations where at first sight a simple \f(CW\*(C`()*\*(C'\fR looks like
+the correct solution. Suppose we parse text with comments being delimited
+by \f(CW"#"\fR followed by some optional (horizontal) whitespace. Contrary to
+its appearance, \f(CW\*(C`#[ \et]*\*(C'\fR \fIis not\fR the correct subexpression to match
+the comment delimiter, because it may "give up" some whitespace if
+the remainder of the pattern can be made to match that way. The correct
+answer is either one of these:
+.Sp
+.Vb 2
+\& (?>#[ \et]*)
+\& #[ \et]*(?![ \et])
+.Ve
+.Sp
+For example, to grab non-empty comments into \f(CW$1\fR, one should use either
+one of these:
+.Sp
+.Vb 2
+\& / (?> \e# [ \et]* ) ( .+ ) /x;
+\& / \e# [ \et]* ( [^ \et] .* ) /x;
+.Ve
+.Sp
+Which one you pick depends on which of these expressions better reflects
+the above specification of comments.
+.Sp
+In some literature this construct is called "atomic matching" or
+"possessive matching".
+.Sp
+Possessive quantifiers are equivalent to putting the item they are applied
+to inside of one of these constructs. The following equivalences apply:
+.Sp
+.Vb 6
+\& Quantifier Form Bracketing Form
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& PAT*+ (?>PAT*)
+\& PAT++ (?>PAT+)
+\& PAT?+ (?>PAT?)
+\& PAT{min,max}+ (?>PAT{min,max})
+.Ve
+.Sp
+Nested \f(CW\*(C`(?>...)\*(C'\fR constructs are not no-ops, even if at first glance
+they might seem to be. This is because the nested \f(CW\*(C`(?>...)\*(C'\fR can
+restrict internal backtracking that otherwise might occur. For example,
+.Sp
+.Vb 1
+\& "abc" =~ /(?>a[bc]*c)/
+.Ve
+.Sp
+matches, but
+.Sp
+.Vb 1
+\& "abc" =~ /(?>a(?>[bc]*)c)/
+.Ve
+.Sp
+does not.
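+.Sp
+Combining the equivalence table with the nested example, a possessive
+quantifier on the inner item should behave like the nested \f(CW\*(C`(?>...)\*(C'\fR.
+The following sketch, with variable names invented for illustration,
+compares the three forms:

```perl
use strict;
use warnings;

my $subject = 'abc';

# Only the outer group is atomic: [bc]* may still give back "c" internally.
my $outer_only = $subject =~ /(?>a[bc]*c)/     ? 1 : 0;

# The inner atomic group keeps "bc", so the final "c" has nothing to match.
my $nested     = $subject =~ /(?>a(?>[bc]*)c)/ ? 1 : 0;

# [bc]*+ is the quantifier form of (?>[bc]*), so this fails the same way.
my $possessive = $subject =~ /(?>a[bc]*+c)/    ? 1 : 0;

# prints "outer_only=1 nested=0 possessive=0"
print "outer_only=$outer_only nested=$nested possessive=$possessive\n";
```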
+.ie n .IP """(?[ ])""" 4
+.el .IP "\f(CW(?[ ])\fR" 4
+.IX Item "(?[ ])"
+See "Extended Bracketed Character Classes" in perlrecharclass.
+.SS Backtracking
+.IX Xref "backtrack backtracking"
+.IX Subsection "Backtracking"
+NOTE: This section presents an abstract approximation of regular
+expression behavior. For a more rigorous (and complicated) view of
+the rules involved in selecting a match among possible alternatives,
+see "Combining RE Pieces".
+.PP
+A fundamental feature of regular expression matching involves the
+notion called \fIbacktracking\fR, which is currently used (when needed)
+by all non-possessive regular expression quantifiers, namely \f(CW"*"\fR,
+\&\f(CW\*(C`*?\*(C'\fR, \f(CW"+"\fR, \f(CW\*(C`+?\*(C'\fR, \f(CW\*(C`{n,m}\*(C'\fR, and \f(CW\*(C`{n,m}?\*(C'\fR. Backtracking is often
+optimized internally, but the general principle outlined here is valid.
+.PP
+For a regular expression to match, the \fIentire\fR regular expression must
+match, not just part of it. So if the beginning of a pattern containing a
+quantifier succeeds in a way that causes later parts in the pattern to
+fail, the matching engine backs up and recalculates the beginning
+part\-\-that's why it's called backtracking.
+.PP
+Here is an example of backtracking: Let's say you want to find the
+word following "foo" in the string "Food is on the foo table.":
+.PP
+.Vb 4
+\& $_ = "Food is on the foo table.";
+\& if ( /\eb(foo)\es+(\ew+)/i ) {
+\& print "$2 follows $1.\en";
+\& }
+.Ve
+.PP
+When the match runs, the first part of the regular expression (\f(CW\*(C`\eb(foo)\*(C'\fR)
+finds a possible match right at the beginning of the string, and loads up
+\&\f(CW$1\fR with "Foo". However, as soon as the matching engine sees that there's
+no whitespace following the "Foo" that it had saved in \f(CW$1\fR, it realizes its
+mistake and starts over again one character after where it had the
+tentative match. This time it goes all the way until the next occurrence
+of "foo". The complete regular expression matches this time, and you get
+the expected output of "table follows foo."
+.PP
+Sometimes minimal matching can help a lot. Imagine you'd like to match
+everything between "foo" and "bar". Initially, you write something
+like this:
+.PP
+.Vb 4
+\& $_ = "The food is under the bar in the barn.";
+\& if ( /foo(.*)bar/ ) {
+\& print "got <$1>\en";
+\& }
+.Ve
+.PP
+Which perhaps unexpectedly yields:
+.PP
+.Vb 1
+\& got <d is under the bar in the >
+.Ve
+.PP
+That's because \f(CW\*(C`.*\*(C'\fR was greedy, so you get everything between the
+\&\fIfirst\fR "foo" and the \fIlast\fR "bar". Here it's more effective
+to use minimal matching to make sure you get the text between a "foo"
+and the first "bar" thereafter.
+.PP
+.Vb 2
+\& if ( /foo(.*?)bar/ ) { print "got <$1>\en" }
+\& got <d is under the >
+.Ve
+.PP
+Here's another example. Let's say you'd like to match a number at the end
+of a string, and you also want to keep the preceding part of the match.
+So you write this:
+.PP
+.Vb 4
+\& $_ = "I have 2 numbers: 53147";
+\& if ( /(.*)(\ed*)/ ) { # Wrong!
+\& print "Beginning is <$1>, number is <$2>.\en";
+\& }
+.Ve
+.PP
+That won't work at all, because \f(CW\*(C`.*\*(C'\fR was greedy and gobbled up the
+whole string. As \f(CW\*(C`\ed*\*(C'\fR can match on an empty string the complete
+regular expression matched successfully.
+.PP
+.Vb 1
+\& Beginning is <I have 2 numbers: 53147>, number is <>.
+.Ve
+.PP
+Here are some variants, most of which don't work:
+.PP
+.Vb 11
+\& $_ = "I have 2 numbers: 53147";
+\& @pats = qw{
+\& (.*)(\ed*)
+\& (.*)(\ed+)
+\& (.*?)(\ed*)
+\& (.*?)(\ed+)
+\& (.*)(\ed+)$
+\& (.*?)(\ed+)$
+\& (.*)\eb(\ed+)$
+\& (.*\eD)(\ed+)$
+\& };
+\&
+\& for $pat (@pats) {
+\& printf "%\-12s ", $pat;
+\& if ( /$pat/ ) {
+\& print "<$1> <$2>\en";
+\& } else {
+\& print "FAIL\en";
+\& }
+\& }
+.Ve
+.PP
+That will print out:
+.PP
+.Vb 8
+\& (.*)(\ed*) <I have 2 numbers: 53147> <>
+\& (.*)(\ed+) <I have 2 numbers: 5314> <7>
+\& (.*?)(\ed*) <> <>
+\& (.*?)(\ed+) <I have > <2>
+\& (.*)(\ed+)$ <I have 2 numbers: 5314> <7>
+\& (.*?)(\ed+)$ <I have 2 numbers: > <53147>
+\& (.*)\eb(\ed+)$ <I have 2 numbers: > <53147>
+\& (.*\eD)(\ed+)$ <I have 2 numbers: > <53147>
+.Ve
+.PP
+As you see, this can be a bit tricky. It's important to realize that a
+regular expression is merely a set of assertions that gives a definition
+of success. There may be 0, 1, or several different ways that the
+definition might succeed against a particular string. And if there are
+multiple ways it might succeed, you need to understand backtracking to
+know which variety of success you will achieve.
+.PP
+When using lookahead assertions and negations, this can all get even
+trickier. Imagine you'd like to find a sequence of non-digits not
+followed by "123". You might try to write that as
+.PP
+.Vb 4
+\& $_ = "ABC123";
+\& if ( /^\eD*(?!123)/ ) { # Wrong!
+\& print "Yup, no 123 in $_\en";
+\& }
+.Ve
+.PP
+But that isn't going to match; at least, not the way you're hoping. It
+claims that there is no 123 in the string. Here's a clearer picture of
+why that pattern matches, contrary to popular expectations:
+.PP
+.Vb 2
+\& $x = \*(AqABC123\*(Aq;
+\& $y = \*(AqABC445\*(Aq;
+\&
+\& print "1: got $1\en" if $x =~ /^(ABC)(?!123)/;
+\& print "2: got $1\en" if $y =~ /^(ABC)(?!123)/;
+\&
+\& print "3: got $1\en" if $x =~ /^(\eD*)(?!123)/;
+\& print "4: got $1\en" if $y =~ /^(\eD*)(?!123)/;
+.Ve
+.PP
+This prints
+.PP
+.Vb 3
+\& 2: got ABC
+\& 3: got AB
+\& 4: got ABC
+.Ve
+.PP
+You might have expected test 3 to fail because it seems to be a more
+general purpose version of test 1. The important difference between
+them is that test 3 contains a quantifier (\f(CW\*(C`\eD*\*(C'\fR) and so can use
+backtracking, whereas test 1 will not. What's happening is
+that you've asked "Is it true that at the start of \f(CW$x\fR, following 0 or more
+non-digits, you have something that's not 123?" If the pattern matcher had
+let \f(CW\*(C`\eD*\*(C'\fR expand to "ABC", this would have caused the whole pattern to
+fail.
+.PP
+The search engine will initially match \f(CW\*(C`\eD*\*(C'\fR with "ABC". Then it will
+try to match \f(CW\*(C`(?!123)\*(C'\fR with "123", which fails. But because
+a quantifier (\f(CW\*(C`\eD*\*(C'\fR) has been used in the regular expression, the
+search engine can backtrack and retry the match differently
+in the hope of matching the complete regular expression.
+.PP
+The pattern really, \fIreally\fR wants to succeed, so it uses the
+standard pattern back-off-and-retry and lets \f(CW\*(C`\eD*\*(C'\fR expand to just "AB" this
+time. Now there's indeed something following "AB" that is not
+"123". It's "C123", which suffices.
+.PP
+We can deal with this by using both an assertion and a negation.
+We'll say that the first part in \f(CW$1\fR must be followed both by a digit
+and by something that's not "123". Remember that the lookaheads
+are zero-width expressions\-\-they only look, but don't consume any
+of the string in their match. So rewriting this way produces what
+you'd expect; that is, case 5 will fail, but case 6 succeeds:
+.PP
+.Vb 2
+\& print "5: got $1\en" if $x =~ /^(\eD*)(?=\ed)(?!123)/;
+\& print "6: got $1\en" if $y =~ /^(\eD*)(?=\ed)(?!123)/;
+\&
+\& 6: got ABC
+.Ve
+.PP
+In other words, the two zero-width assertions next to each other work as though
+they're ANDed together, just as you'd use any built-in assertions: \f(CW\*(C`/^$/\*(C'\fR
+matches only if you're at the beginning of the line AND the end of the
+line simultaneously. The deeper underlying truth is that juxtaposition in
+regular expressions always means AND, except when you write an explicit OR
+using the vertical bar. \f(CW\*(C`/ab/\*(C'\fR means match "a" AND (then) match "b",
+although the attempted matches are made at different positions because "a"
+is not a zero-width assertion, but a one-width assertion.
+.PP
+\&\fBWARNING\fR: Particularly complicated regular expressions can take
+exponential time to solve because of the immense number of possible
+ways they can use backtracking to try for a match. For example, without
+internal optimizations done by the regular expression engine, this will
+take a painfully long time to run:
+.PP
+.Vb 1
+\& \*(Aqaaaaaaaaaaaa\*(Aq =~ /((a{0,5}){0,5})*[c]/
+.Ve
+.PP
+And if you used \f(CW"*"\fR's in the internal groups instead of limiting them
+to 0 through 5 matches, then it would take forever\-\-or until you ran
+out of stack space. Moreover, these internal optimizations are not
+always applicable. For example, if you put \f(CW\*(C`{0,5}\*(C'\fR instead of \f(CW"*"\fR
+on the external group, no current optimization is applicable, and the
+match takes a long time to finish.
+.PP
+A powerful tool for optimizing such beasts is what is known as an
+"independent group",
+which does not backtrack (see \f(CW"(?>pattern)"\fR). Note also that
+zero-length lookahead/lookbehind assertions will not backtrack to make
+the tail match, since they are in "logical" context: only
+whether they match is considered relevant. For an example
+where side-effects of lookahead \fImight\fR have influenced the
+following match, see \f(CW"(?>pattern)"\fR.
+.SS "Script Runs"
+.IX Xref "(*script_run:...) (sr:...) (*atomic_script_run:...) (asr:...)"
+.IX Subsection "Script Runs"
+A script run is basically a sequence of characters, all from the same
+Unicode script (see "Scripts" in perlunicode), such as Latin or Greek. In
+most places a single word would never be written in multiple scripts,
+unless it is a spoofing attack. An infamous example is
+.PP
+.Vb 1
+\& paypal.com
+.Ve
+.PP
+Those letters could all be Latin (as in the example just above), or they
+could be all Cyrillic (except for the dot), or they could be a mixture
+of the two. In the case of an internet address the \f(CW\*(C`.com\*(C'\fR would be in
+Latin, and any Cyrillic ones would cause it to be a mixture, not a
+script run. Someone clicking on such a link would not be directed to
+the real Paypal website, but an attacker would craft a look-alike one to
+attempt to gather sensitive information from the person.
+.PP
+Starting in Perl 5.28, it is easy to detect strings that aren't
+script runs. Simply enclose just about any pattern like either of
+these:
+.PP
+.Vb 2
+\& (*script_run:pattern)
+\& (*sr:pattern)
+.Ve
+.PP
+What happens is that after \fIpattern\fR succeeds in matching, it is
+subjected to the additional criterion that every character in it must be
+from the same script (see exceptions below). If this isn't true,
+backtracking occurs until something all in the same script is found that
+matches, or all possibilities are exhausted. This can cause a lot of
+backtracking, but generally, only malicious input will result in this,
+though the slowdown could amount to a denial-of-service attack. If your
+needs permit, it is best to make the pattern atomic to cut down on the
+amount of backtracking. This is so likely to be what you want, that
+instead of writing this:
+.PP
+.Vb 1
+\& (*script_run:(?>pattern))
+.Ve
+.PP
+you can write either of these:
+.PP
+.Vb 2
+\& (*atomic_script_run:pattern)
+\& (*asr:pattern)
+.Ve
+.PP
+(See \f(CW"(?>\fR\f(CIpattern\fR\f(CW)"\fR.)
+.PP
+In Taiwan, Japan, and Korea, it is common for text to have a mixture of
+characters from their native scripts and base Chinese. Perl follows
+Unicode's UTS 39 (<https://unicode.org/reports/tr39/>) Unicode Security
+Mechanisms in allowing such mixtures. For example, the Japanese scripts
+Katakana and Hiragana are commonly mixed together in practice, along
+with some Chinese characters, and hence are treated as being in a single
+script run by Perl.
+.PP
+The rules used for matching decimal digits are slightly stricter. Many
+scripts have their own sets of digits equivalent to the Western \f(CW0\fR
+through \f(CW9\fR ones. A few, such as Arabic, have more than one set. For
+a string to be considered a script run, all digits in it must come from
+the same set of ten, as determined by the first digit encountered.
+As an example,
+.PP
+.Vb 1
+\& qr/(*script_run: \ed+ \eb )/x
+.Ve
+.PP
+guarantees that the digits matched will all be from the same set of 10.
+You won't get a look-alike digit from a different script that has a
+different value than what it appears to be.
+.PP
+Unicode has three pseudo scripts that are handled specially.
+.PP
+"Unknown" is applied to code points whose meaning has yet to be
+determined. Perl currently will match as a script run, any single
+character string consisting of one of these code points. But any string
+longer than one code point containing one of these will not be
+considered a script run.
+.PP
+"Inherited" is applied to characters that modify another, such as an
+accent of some type. These are considered to be in the script of the
+master character, and so never cause a script run to not match.
+.PP
+The other one is "Common". This consists of mostly punctuation, emoji,
+characters used in mathematics and music, the ASCII digits \f(CW0\fR
+through \f(CW9\fR, and full-width forms of these digits. These characters
+can appear intermixed in text in many of the world's scripts. These
+also don't cause a script run to not match. But like other scripts, all
+digits in a run must come from the same set of 10.
+.PP
+This construct is non-capturing. You can add parentheses to \fIpattern\fR
+to capture, if desired. You will have to do this if you plan to use
+"(*ACCEPT) (*ACCEPT:arg)" and not have it bypass the script run
+checking.
+.PP
+The \f(CW\*(C`Script_Extensions\*(C'\fR property as modified by UTS 39
+(<https://unicode.org/reports/tr39/>) is used as the basis for this
+feature.
+.PP
+To summarize,
+.IP \(bu 4
+All length 0 or length 1 sequences are script runs.
+.IP \(bu 4
+A longer sequence is a script run if and only if \fBall\fR of the following
+conditions are met:
+.Sp
+
+.RS 4
+.IP 1. 4
+No code point in the sequence has the \f(CW\*(C`Script_Extensions\*(C'\fR property of
+\&\f(CW\*(C`Unknown\*(C'\fR.
+.Sp
+This currently means that all code points in the sequence have been
+assigned by Unicode to be characters that are neither private-use nor
+surrogate code points.
+.IP 2. 4
+All characters in the sequence come from the Common script and/or the
+Inherited script and/or a single other script.
+.Sp
+The script of a character is determined by the \f(CW\*(C`Script_Extensions\*(C'\fR
+property as modified by UTS 39 (<https://unicode.org/reports/tr39/>), as
+described above.
+.IP 3. 4
+All decimal digits in the sequence come from the same block of 10
+consecutive digits.
+.RE
+.RS 4
+.RE
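+.PP
+As a rough illustration of the rules above, the following sketch checks
+whether a whole word forms a single script run. The sample strings and
+variable names are invented for this example, and it assumes a Perl new
+enough that the construct no longer warns as experimental.
+.PP
+.Vb 7
+\&    my $latin = "paypal";                  # every letter is Latin
+\&    my $mixed = "p\ex{430}yp\ex{430}l";      # U+0430 is CYRILLIC SMALL LETTER A
+\&
+\&    for my $word ($latin, $mixed) {
+\&        my $ok = $word =~ /^(*sr:\ew+)$/;   # anchored, so the whole word is tested
+\&        print $ok ? "script run\en" : "mixed scripts\en";
+\&    }
+.Ve
+.PP
+Because the pattern is anchored at both ends, backtracking cannot settle
+for a shorter single-script prefix, so the second string should be
+reported as mixed.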
+.SS "Special Backtracking Control Verbs"
+.IX Subsection "Special Backtracking Control Verbs"
+These special patterns are generally of the form \f(CW\*(C`(*\fR\f(CIVERB\fR\f(CW:\fR\f(CIarg\fR\f(CW)\*(C'\fR. Unless
+otherwise stated the \fIarg\fR argument is optional; in some cases, it is
+mandatory.
+.PP
+Any pattern containing a special backtracking verb that allows an argument
+has the special behaviour that when executed it sets the current package's
+\&\f(CW$REGERROR\fR and \f(CW$REGMARK\fR variables. When doing so the following
+rules apply:
+.PP
+On failure, the \f(CW$REGERROR\fR variable will be set to the \fIarg\fR value of the
+verb pattern, if the verb was involved in the failure of the match. If the
+\&\fIarg\fR part of the pattern was omitted, then \f(CW$REGERROR\fR will be set to the
+name of the last \f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR pattern executed, or to TRUE if there was
+none. Also, the \f(CW$REGMARK\fR variable will be set to FALSE.
+.PP
+On a successful match, the \f(CW$REGERROR\fR variable will be set to FALSE, and
+the \f(CW$REGMARK\fR variable will be set to the name of the last
+\&\f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR pattern executed. See the explanation for the
+\&\f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR verb below for more details.
+.PP
+\&\fBNOTE:\fR \f(CW$REGERROR\fR and \f(CW$REGMARK\fR are not magic variables like \f(CW$1\fR
+and most other regex-related variables. They are not local to a scope, nor
+readonly, but instead are volatile package variables similar to \f(CW$AUTOLOAD\fR.
+They are set in the package containing the code that \fIexecuted\fR the regex
+(rather than the one that compiled it, where those differ). If necessary, you
+can use \f(CW\*(C`local\*(C'\fR to localize changes to these variables to a specific scope
+before executing a regex.
+.PP
+If a pattern does not contain a special backtracking verb that allows an
+argument, then \f(CW$REGERROR\fR and \f(CW$REGMARK\fR are not touched at all.
+.IP Verbs 3
+.IX Item "Verbs"
+.RS 3
+.PD 0
+.ie n .IP """(*PRUNE)"" ""(*PRUNE:\fINAME\fR)""" 4
+.el .IP "\f(CW(*PRUNE)\fR \f(CW(*PRUNE:\fR\f(CINAME\fR\f(CW)\fR" 4
+.IX Xref "(*PRUNE) (*PRUNE:NAME)"
+.IX Item "(*PRUNE) (*PRUNE:NAME)"
+.PD
+This zero-width pattern prunes the backtracking tree at the current point
+when backtracked into on failure. Consider the pattern \f(CW\*(C`/\fR\f(CIA\fR\f(CW (*PRUNE) \fR\f(CIB\fR\f(CW/\*(C'\fR,
+where \fIA\fR and \fIB\fR are complex patterns. Until the \f(CW\*(C`(*PRUNE)\*(C'\fR verb is reached,
+\&\fIA\fR may backtrack as necessary to match. Once it is reached, matching
+continues in \fIB\fR, which may also backtrack as necessary; however, should \fIB\fR
+not match, then no further backtracking will take place, and the pattern
+will fail outright at the current starting position.
+.Sp
+The following example counts all the possible matching strings in a
+pattern (without actually matching any of them).
+.Sp
+.Vb 2
+\& \*(Aqaaab\*(Aq =~ /a+b?(?{print "$&\en"; $count++})(*FAIL)/;
+\& print "Count=$count\en";
+.Ve
+.Sp
+which produces:
+.Sp
+.Vb 10
+\& aaab
+\& aaa
+\& aa
+\& a
+\& aab
+\& aa
+\& a
+\& ab
+\& a
+\& Count=9
+.Ve
+.Sp
+If we add a \f(CW\*(C`(*PRUNE)\*(C'\fR before the count like the following
+.Sp
+.Vb 2
+\& \*(Aqaaab\*(Aq =~ /a+b?(*PRUNE)(?{print "$&\en"; $count++})(*FAIL)/;
+\& print "Count=$count\en";
+.Ve
+.Sp
+we prevent backtracking and find the count of the longest matching string
+at each matching starting point like so:
+.Sp
+.Vb 4
+\& aaab
+\& aab
+\& ab
+\& Count=3
+.Ve
+.Sp
+Any number of \f(CW\*(C`(*PRUNE)\*(C'\fR assertions may be used in a pattern.
+.Sp
+See also \f(CW"(?>\fR\f(CIpattern\fR\f(CW)"\fR and possessive quantifiers for
+other ways to
+control backtracking. In some cases, the use of \f(CW\*(C`(*PRUNE)\*(C'\fR can be
+replaced with a \f(CW\*(C`(?>pattern)\*(C'\fR with no functional difference; however,
+\&\f(CW\*(C`(*PRUNE)\*(C'\fR can be used to handle cases that cannot be expressed using a
+\&\f(CW\*(C`(?>pattern)\*(C'\fR alone.
+.ie n .IP """(*SKIP)"" ""(*SKIP:\fINAME\fR)""" 4
+.el .IP "\f(CW(*SKIP)\fR \f(CW(*SKIP:\fR\f(CINAME\fR\f(CW)\fR" 4
+.IX Xref "(*SKIP)"
+.IX Item "(*SKIP) (*SKIP:NAME)"
+This zero-width pattern is similar to \f(CW\*(C`(*PRUNE)\*(C'\fR, except that on
+failure it also signifies that whatever text that was matched leading up
+to the \f(CW\*(C`(*SKIP)\*(C'\fR pattern being executed cannot be part of \fIany\fR match
+of this pattern. This effectively means that the regex engine "skips" forward
+to this position on failure and tries to match again (assuming that
+there is sufficient room to match).
+.Sp
+The name of the \f(CW\*(C`(*SKIP:\fR\f(CINAME\fR\f(CW)\*(C'\fR pattern has special significance. If a
+\&\f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR was encountered while matching, then it is that position
+which is used as the "skip point". If no \f(CW\*(C`(*MARK)\*(C'\fR of that name was
+encountered, then the \f(CW\*(C`(*SKIP)\*(C'\fR operator has no effect. When used
+without a name the "skip point" is where the match point was when
+executing the \f(CW\*(C`(*SKIP)\*(C'\fR pattern.
+.Sp
+Compare the following to the examples in \f(CW\*(C`(*PRUNE)\*(C'\fR; note the string
+is twice as long:
+.Sp
+.Vb 2
+\& \*(Aqaaabaaab\*(Aq =~ /a+b?(*SKIP)(?{print "$&\en"; $count++})(*FAIL)/;
+\& print "Count=$count\en";
+.Ve
+.Sp
+outputs
+.Sp
+.Vb 3
+\& aaab
+\& aaab
+\& Count=2
+.Ve
+.Sp
+Once the 'aaab' at the start of the string has matched, and the \f(CW\*(C`(*SKIP)\*(C'\fR
+executed, the next starting point will be where the cursor was when the
+\&\f(CW\*(C`(*SKIP)\*(C'\fR was executed.
+.ie n .IP """(*MARK:\fINAME\fR)"" ""(*:\fINAME\fR)""" 4
+.el .IP "\f(CW(*MARK:\fR\f(CINAME\fR\f(CW)\fR \f(CW(*:\fR\f(CINAME\fR\f(CW)\fR" 4
+.IX Xref "(*MARK) (*MARK:NAME) (*:NAME)"
+.IX Item "(*MARK:NAME) (*:NAME)"
+This zero-width pattern can be used to mark the point reached in a string
+when a certain part of the pattern has been successfully matched. This
+mark may be given a name. A later \f(CW\*(C`(*SKIP)\*(C'\fR pattern will then skip
+forward to that point if backtracked into on failure. Any number of
+\&\f(CW\*(C`(*MARK)\*(C'\fR patterns are allowed, and the \fINAME\fR portion may be duplicated.
+.Sp
+In addition to interacting with the \f(CW\*(C`(*SKIP)\*(C'\fR pattern, \f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR
+can be used to "label" a pattern branch, so that after matching, the
+program can determine which branches of the pattern were involved in the
+match.
+.Sp
+When a match is successful, the \f(CW$REGMARK\fR variable will be set to the
+name of the most recently executed \f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR that was involved
+in the match.
+.Sp
+This can be used to determine which branch of a pattern was matched
+without using a separate capture group for each branch, which in turn
+can result in a performance improvement, as perl cannot optimize
+\&\f(CW\*(C`/(?:(x)|(y)|(z))/\*(C'\fR as efficiently as something like
+\&\f(CW\*(C`/(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/\*(C'\fR.
+.Sp
+When a match has failed, and unless another verb has been involved in
+failing the match and has provided its own name to use, the \f(CW$REGERROR\fR
+variable will be set to the name of the most recently executed
+\&\f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR.
+.Sp
+See "(*SKIP)" for more details.
+.Sp
+As a shortcut \f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR can be written \f(CW\*(C`(*:\fR\f(CINAME\fR\f(CW)\*(C'\fR.
+.ie n .IP """(*THEN)"" ""(*THEN:\fINAME\fR)""" 4
+.el .IP "\f(CW(*THEN)\fR \f(CW(*THEN:\fR\f(CINAME\fR\f(CW)\fR" 4
+.IX Item "(*THEN) (*THEN:NAME)"
+This is similar to the "cut group" operator \f(CW\*(C`::\*(C'\fR from Raku. Like
+\&\f(CW\*(C`(*PRUNE)\*(C'\fR, this verb always matches, and when backtracked into on
+failure, it causes the regex engine to try the next alternation in the
+innermost enclosing group (capturing or otherwise) that has alternations.
+The two branches of a \f(CW\*(C`(?(\fR\f(CIcondition\fR\f(CW)\fR\f(CIyes\-pattern\fR\f(CW|\fR\f(CIno\-pattern\fR\f(CW)\*(C'\fR do not
+count as an alternation, as far as \f(CW\*(C`(*THEN)\*(C'\fR is concerned.
+.Sp
+Its name comes from the observation that this operation combined with the
+alternation operator (\f(CW"|"\fR) can be used to create what is essentially a
+pattern-based if/then/else block:
+.Sp
+.Vb 1
+\& ( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
+.Ve
+.Sp
+Note that if this operator is used and NOT inside of an alternation then
+it acts exactly like the \f(CW\*(C`(*PRUNE)\*(C'\fR operator.
+.Sp
+.Vb 1
+\& / A (*PRUNE) B /
+.Ve
+.Sp
+is the same as
+.Sp
+.Vb 1
+\& / A (*THEN) B /
+.Ve
+.Sp
+but
+.Sp
+.Vb 1
+\& / ( A (*THEN) B | C ) /
+.Ve
+.Sp
+is not the same as
+.Sp
+.Vb 1
+\& / ( A (*PRUNE) B | C ) /
+.Ve
+.Sp
+as after matching the \fIA\fR but failing on the \fIB\fR the \f(CW\*(C`(*THEN)\*(C'\fR verb will
+backtrack and try \fIC\fR; but the \f(CW\*(C`(*PRUNE)\*(C'\fR verb will simply fail.
+.ie n .IP """(*COMMIT)"" ""(*COMMIT:\fIarg\fR)""" 4
+.el .IP "\f(CW(*COMMIT)\fR \f(CW(*COMMIT:\fR\f(CIarg\fR\f(CW)\fR" 4
+.IX Xref "(*COMMIT)"
+.IX Item "(*COMMIT) (*COMMIT:arg)"
+This is the Raku "commit pattern" \f(CW\*(C`<commit>\*(C'\fR or \f(CW\*(C`:::\*(C'\fR. It's a
+zero-width pattern similar to \f(CW\*(C`(*SKIP)\*(C'\fR, except that when backtracked
+into on failure it causes the match to fail outright. No further attempts
+to find a valid match by advancing the start pointer will occur.
+For example,
+.Sp
+.Vb 2
+\& \*(Aqaaabaaab\*(Aq =~ /a+b?(*COMMIT)(?{print "$&\en"; $count++})(*FAIL)/;
+\& print "Count=$count\en";
+.Ve
+.Sp
+outputs
+.Sp
+.Vb 2
+\& aaab
+\& Count=1
+.Ve
+.Sp
+In other words, once the \f(CW\*(C`(*COMMIT)\*(C'\fR has been entered, and if the pattern
+does not match, the regex engine will not try any further matching on the
+rest of the string.
+.ie n .IP """(*FAIL)"" ""(*F)"" ""(*FAIL:\fIarg\fR)""" 4
+.el .IP "\f(CW(*FAIL)\fR \f(CW(*F)\fR \f(CW(*FAIL:\fR\f(CIarg\fR\f(CW)\fR" 4
+.IX Xref "(*FAIL) (*F)"
+.IX Item "(*FAIL) (*F) (*FAIL:arg)"
+This pattern matches nothing and always fails. It can be used to force the
+engine to backtrack. It is equivalent to \f(CW\*(C`(?!)\*(C'\fR, but easier to read. In
+fact, \f(CW\*(C`(?!)\*(C'\fR gets optimised into \f(CW\*(C`(*FAIL)\*(C'\fR internally. You can provide
+an argument so that if the match fails because of this \f(CW\*(C`FAIL\*(C'\fR directive
+the argument can be obtained from \f(CW$REGERROR\fR.
+.Sp
+It is probably useful only when combined with \f(CW\*(C`(?{})\*(C'\fR or \f(CW\*(C`(??{})\*(C'\fR.
+.ie n .IP """(*ACCEPT)"" ""(*ACCEPT:\fIarg\fR)""" 4
+.el .IP "\f(CW(*ACCEPT)\fR \f(CW(*ACCEPT:\fR\f(CIarg\fR\f(CW)\fR" 4
+.IX Xref "(*ACCEPT)"
+.IX Item "(*ACCEPT) (*ACCEPT:arg)"
+This pattern matches nothing and causes the end of successful matching at
+the point at which the \f(CW\*(C`(*ACCEPT)\*(C'\fR pattern was encountered, regardless of
+whether there is actually more to match in the string. When inside of a
+nested pattern, such as recursion, or in a subpattern dynamically generated
+via \f(CW\*(C`(??{})\*(C'\fR, only the innermost pattern is ended immediately.
+.Sp
+If the \f(CW\*(C`(*ACCEPT)\*(C'\fR is inside of capturing groups then the groups are
+marked as ended at the point at which the \f(CW\*(C`(*ACCEPT)\*(C'\fR was encountered.
+For instance:
+.Sp
+.Vb 1
+\& \*(AqAB\*(Aq =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
+.Ve
+.Sp
+will match, and \f(CW$1\fR will be \f(CW\*(C`AB\*(C'\fR and \f(CW$2\fR will be \f(CW"B"\fR; \f(CW$3\fR will not
+be set. If another branch in the inner parentheses was matched, such as in the
+string 'ACDE', then the \f(CW"D"\fR and \f(CW"E"\fR would have to be matched as well.
+.Sp
+You can provide an argument, which will be available in the variable
+\&\f(CW$REGMARK\fR after the match completes.
+.RE
+.RS 3
+.RE
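+.PP
+The branch-labelling use of \f(CW\*(C`(*MARK:\fR\f(CINAME\fR\f(CW)\*(C'\fR described above can be
+sketched as follows. The loop and the sample words are illustrative only;
+\&\f(CW\*(C`our\*(C'\fR is used so that the package variable is visible under \f(CW\*(C`use strict\*(C'\fR.
+.PP
+.Vb 6
+\&    our $REGMARK;    # package variable set by the regex engine
+\&
+\&    for my $word (qw(x y z)) {
+\&        print "$word: branch $REGMARK\en"
+\&            if $word =~ /(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/;
+\&    }
+.Ve
+.PP
+Each iteration should report the name of the mark in the branch that
+matched, without needing a separate capture group per branch.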
+.ie n .SS "Warning on ""\e1"" Instead of $1"
+.el .SS "Warning on \f(CW\e1\fP Instead of \f(CW$1\fP"
+.IX Subsection "Warning on 1 Instead of $1"
+Some people get too used to writing things like:
+.PP
+.Vb 1
+\& $pattern =~ s/(\eW)/\e\e\e1/g;
+.Ve
+.PP
+This is grandfathered (for \e1 to \e9) for the RHS of a substitute to avoid
+shocking the
+\&\fBsed\fR addicts, but it's a dirty habit to get into. That's because in
+PerlThink, the righthand side of an \f(CW\*(C`s///\*(C'\fR is a double-quoted string. \f(CW\*(C`\e1\*(C'\fR in
+the usual double-quoted string means a control-A. The customary Unix
+meaning of \f(CW\*(C`\e1\*(C'\fR is kludged in for \f(CW\*(C`s///\*(C'\fR. However, if you get into the habit
+of doing that, you get yourself into trouble if you then add an \f(CW\*(C`/e\*(C'\fR
+modifier.
+.PP
+.Vb 1
+\& s/(\ed+)/ \e1 + 1 /eg; # causes warning under \-w
+.Ve
+.PP
+Or if you try to do
+.PP
+.Vb 1
+\& s/(\ed+)/\e1000/;
+.Ve
+.PP
+You can't disambiguate that by saying \f(CW\*(C`\e{1}000\*(C'\fR, whereas you can fix it with
+\&\f(CW\*(C`${1}000\*(C'\fR. The operation of interpolation should not be confused
+with the operation of matching a backreference. Certainly they mean two
+different things on the \fIleft\fR side of the \f(CW\*(C`s///\*(C'\fR.
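+.PP
+A short sketch of the safer spellings, using an invented sample string:
+\&\f(CW$1\fR works inside an \f(CW\*(C`/e\*(C'\fR expression, and \f(CW\*(C`${1}\*(C'\fR separates the
+variable from the digits that follow it.
+.PP
+.Vb 6
+\&    my $text = "price 42";
+\&
+\&    (my $bumped = $text) =~ s/(\ed+)/ $1 + 1 /e;    # "price 43"
+\&    (my $scaled = $text) =~ s/(\ed+)/${1}000/;      # "price 42000"
+\&
+\&    print "$bumped\en$scaled\en";
+.Ve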
+.SS "Repeated Patterns Matching a Zero-length Substring"
+.IX Subsection "Repeated Patterns Matching a Zero-length Substring"
+\&\fBWARNING\fR: Difficult material (and prose) ahead. This section needs a rewrite.
+.PP
+Regular expressions provide a terse and powerful programming language. As
+with most other power tools, power comes together with the ability
+to wreak havoc.
+.PP
+A common abuse of this power stems from the ability to make infinite
+loops using regular expressions, with something as innocuous as:
+.PP
+.Vb 1
+\& \*(Aqfoo\*(Aq =~ m{ ( o? )* }x;
+.Ve
+.PP
+The \f(CW\*(C`o?\*(C'\fR matches at the beginning of "\f(CW\*(C`foo\*(C'\fR", and since the position
+in the string is not moved by the match, \f(CW\*(C`o?\*(C'\fR would match again and again
+because of the \f(CW"*"\fR quantifier. Another common way to create a similar cycle
+is with the looping modifier \f(CW\*(C`/g\*(C'\fR:
+.PP
+.Vb 1
+\& @matches = ( \*(Aqfoo\*(Aq =~ m{ o? }xg );
+.Ve
+.PP
+or
+.PP
+.Vb 1
+\& print "match: <$&>\en" while \*(Aqfoo\*(Aq =~ m{ o? }xg;
+.Ve
+.PP
+or the loop implied by \f(CWsplit()\fR.
+.PP
+However, long experience has shown that many programming tasks may
+be significantly simplified by using repeated subexpressions that
+may match zero-length substrings. Here's a simple example:
+.PP
+.Vb 2
+\& @chars = split //, $string; # // is not magic in split
+\& ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
+.Ve
+.PP
+Thus Perl allows such constructs, by \fIforcefully breaking
+the infinite loop\fR. The rules for this are different for lower-level
+loops given by the greedy quantifiers \f(CW\*(C`*+{}\*(C'\fR, and for higher-level
+ones like the \f(CW\*(C`/g\*(C'\fR modifier or \f(CWsplit()\fR operator.
+.PP
+The lower-level loops are \fIinterrupted\fR (that is, the loop is
+broken) when Perl detects that a repeated expression matched a
+zero-length substring. Thus
+.PP
+.Vb 1
+\& m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;
+.Ve
+.PP
+is made equivalent to
+.PP
+.Vb 1
+\& m{ (?: NON_ZERO_LENGTH )* (?: ZERO_LENGTH )? }x;
+.Ve
+.PP
+For example, this program
+.PP
+.Vb 12
+\& #!perl \-l
+\& "aaaaab" =~ /
+\& (?:
+\& a # non\-zero
+\& | # or
+\& (?{print "hello"}) # print hello whenever this
+\& # branch is tried
+\& (?=(b)) # zero\-width assertion
+\& )* # any number of times
+\& /x;
+\& print $&;
+\& print $1;
+.Ve
+.PP
+prints
+.PP
+.Vb 3
+\& hello
+\& aaaaa
+\& b
+.Ve
+.PP
+Notice that "hello" is printed only once: when Perl sees that the sixth
+iteration of the outermost \f(CW\*(C`(?:)*\*(C'\fR matches a zero-length string, it stops
+the \f(CW"*"\fR.
+.PP
+The higher-level loops preserve an additional state between iterations:
+whether the last match was zero-length. To break the loop, the following
+match after a zero-length match is prohibited from having a length of zero.
+This prohibition interacts with backtracking (see "Backtracking"),
+and so the \fIsecond best\fR match is chosen if the \fIbest\fR match is of
+zero length.
+.PP
+For example:
+.PP
+.Vb 2
+\& $_ = \*(Aqbar\*(Aq;
+\& s/\ew??/<$&>/g;
+.Ve
+.PP
+results in \f(CW\*(C`<><b><><a><><r><>\*(C'\fR. At each position of the string the best
+match given by non-greedy \f(CW\*(C`??\*(C'\fR is the zero-length match, and the \fIsecond
+best\fR match is what is matched by \f(CW\*(C`\ew\*(C'\fR. Thus zero-length matches
+alternate with one-character-long matches.
+.PP
+Similarly, for repeated \f(CW\*(C`m/()/g\*(C'\fR the second-best match is the match at the
+position one notch further in the string.
+.PP
+The additional state of being \fImatched with zero-length\fR is associated with
+the matched string, and is reset by each assignment to \f(CWpos()\fR.
+Zero-length matches at the end of the previous match are ignored
+during \f(CW\*(C`split\*(C'\fR.
+.SS "Combining RE Pieces"
+.IX Subsection "Combining RE Pieces"
+Each of the elementary pieces of regular expressions which were described
+before (such as \f(CW\*(C`ab\*(C'\fR or \f(CW\*(C`\eZ\*(C'\fR) could match at most one substring
+at the given position of the input string. However, in a typical regular
+expression these elementary pieces are combined into more complicated
+patterns using combining operators \f(CW\*(C`ST\*(C'\fR, \f(CW\*(C`S|T\*(C'\fR, \f(CW\*(C`S*\*(C'\fR \fIetc\fR.
+(in these examples \f(CW"S"\fR and \f(CW"T"\fR are regular subexpressions).
+.PP
+Such combinations can include alternatives, leading to a problem of choice:
+if we match a regular expression \f(CW\*(C`a|ab\*(C'\fR against \f(CW"abc"\fR, will it match
+substring \f(CW"a"\fR or \f(CW"ab"\fR? One way to describe which substring is
+actually matched is the concept of backtracking (see "Backtracking").
+However, this description is too low-level and makes you think
+in terms of a particular implementation.
+.PP
+Another description starts with notions of "better"/"worse". All the
+substrings which may be matched by the given regular expression can be
+sorted from the "best" match to the "worst" match, and it is the "best"
+match which is chosen. This substitutes the question of "what is chosen?"
+by the question of "which matches are better, and which are worse?".
+.PP
+Again, for elementary pieces there is no such question, since at most
+one match at a given position is possible. This section describes the
+notion of better/worse for combining operators. In the description
+below \f(CW"S"\fR and \f(CW"T"\fR are regular subexpressions.
+.ie n .IP """ST""" 4
+.el .IP \f(CWST\fR 4
+.IX Item "ST"
+Consider two possible matches, \f(CW\*(C`AB\*(C'\fR and \f(CW\*(C`A\*(AqB\*(Aq\*(C'\fR, where \f(CW"A"\fR and \f(CW\*(C`A\*(Aq\*(C'\fR are
+substrings which can be matched by \f(CW"S"\fR, and \f(CW"B"\fR and \f(CW\*(C`B\*(Aq\*(C'\fR are substrings
+which can be matched by \f(CW"T"\fR.
+.Sp
+If \f(CW"A"\fR is a better match for \f(CW"S"\fR than \f(CW\*(C`A\*(Aq\*(C'\fR, \f(CW\*(C`AB\*(C'\fR is a better
+match than \f(CW\*(C`A\*(AqB\*(Aq\*(C'\fR.
+.Sp
+If \f(CW"A"\fR and \f(CW\*(C`A\*(Aq\*(C'\fR coincide: \f(CW\*(C`AB\*(C'\fR is a better match than \f(CW\*(C`AB\*(Aq\*(C'\fR if
+\&\f(CW"B"\fR is a better match for \f(CW"T"\fR than \f(CW\*(C`B\*(Aq\*(C'\fR.
+.ie n .IP """S|T""" 4
+.el .IP \f(CWS|T\fR 4
+.IX Item "S|T"
+When \f(CW"S"\fR can match, it is a better match than when only \f(CW"T"\fR can match.
+.Sp
+Ordering of two matches for \f(CW"S"\fR is the same as for \f(CW"S"\fR. Similarly for
+two matches for \f(CW"T"\fR.
+.ie n .IP """S{REPEAT_COUNT}""" 4
+.el .IP \f(CWS{REPEAT_COUNT}\fR 4
+.IX Item "S{REPEAT_COUNT}"
+Matches as \f(CW\*(C`SSS...S\*(C'\fR (repeated as many times as necessary).
+.ie n .IP """S{min,max}""" 4
+.el .IP \f(CWS{min,max}\fR 4
+.IX Item "S{min,max}"
+Matches as \f(CW\*(C`S{max}|S{max\-1}|...|S{min+1}|S{min}\*(C'\fR.
+.ie n .IP """S{min,max}?""" 4
+.el .IP \f(CWS{min,max}?\fR 4
+.IX Item "S{min,max}?"
+Matches as \f(CW\*(C`S{min}|S{min+1}|...|S{max\-1}|S{max}\*(C'\fR.
+.ie n .IP """S?"", ""S*"", ""S+""" 4
+.el .IP "\f(CWS?\fR, \f(CWS*\fR, \f(CWS+\fR" 4
+.IX Item "S?, S*, S+"
+Same as \f(CW\*(C`S{0,1}\*(C'\fR, \f(CW\*(C`S{0,BIG_NUMBER}\*(C'\fR, \f(CW\*(C`S{1,BIG_NUMBER}\*(C'\fR respectively.
+.ie n .IP """S??"", ""S*?"", ""S+?""" 4
+.el .IP "\f(CWS??\fR, \f(CWS*?\fR, \f(CWS+?\fR" 4
+.IX Item "S??, S*?, S+?"
+Same as \f(CW\*(C`S{0,1}?\*(C'\fR, \f(CW\*(C`S{0,BIG_NUMBER}?\*(C'\fR, \f(CW\*(C`S{1,BIG_NUMBER}?\*(C'\fR respectively.
+.ie n .IP """(?>S)""" 4
+.el .IP \f(CW(?>S)\fR 4
+.IX Item "(?>S)"
+Matches the best match for \f(CW"S"\fR and only that.
+.ie n .IP """(?=S)"", ""(?<=S)""" 4
+.el .IP "\f(CW(?=S)\fR, \f(CW(?<=S)\fR" 4
+.IX Item "(?=S), (?<=S)"
+Only the best match for \f(CW"S"\fR is considered. (This is important only if
+\&\f(CW"S"\fR has capturing parentheses, and backreferences are used somewhere
+else in the whole regular expression.)
+.ie n .IP """(?!S)"", ""(?<!S)""" 4
+.el .IP "\f(CW(?!S)\fR, \f(CW(?<!S)\fR" 4
+.IX Item "(?!S), (?<!S)"
+For this grouping operator there is no need to describe the ordering, since
+only whether or not \f(CW"S"\fR can match is important.
+.ie n .IP """(??{ \fIEXPR\fR })"", ""(?\fIPARNO\fR)""" 4
+.el .IP "\f(CW(??{ \fR\f(CIEXPR\fR\f(CW })\fR, \f(CW(?\fR\f(CIPARNO\fR\f(CW)\fR" 4
+.IX Item "(??{ EXPR }), (?PARNO)"
+The ordering is the same as for the regular expression which is
+the result of \fIEXPR\fR, or the pattern contained by capture group \fIPARNO\fR.
+.ie n .IP """(?(\fIcondition\fR)\fIyes\-pattern\fR|\fIno\-pattern\fR)""" 4
+.el .IP \f(CW(?(\fR\f(CIcondition\fR\f(CW)\fR\f(CIyes\-pattern\fR\f(CW|\fR\f(CIno\-pattern\fR\f(CW)\fR 4
+.IX Item "(?(condition)yes-pattern|no-pattern)"
+Recall that which of \fIyes-pattern\fR or \fIno-pattern\fR actually matches is
+already determined. The ordering of the matches is the same as for the
+chosen subexpression.
+.PP
+The above recipes describe the ordering of matches \fIat a given position\fR.
+One more rule is needed to understand how a match is determined for the
+whole regular expression: a match at an earlier position is always better
+than a match at a later position.
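+.PP
+The ordering rules can be observed directly. The following sketch uses
+invented sample strings; the comments show the substring each pattern is
+expected to select.
+.PP
+.Vb 5
+\&    print "$&\en" if "abc" =~ /a|ab/;        # a   (S is preferred over T)
+\&    print "$&\en" if "abc" =~ /ab|a/;        # ab
+\&    print "$&\en" if "abc" =~ /\ew{1,2}/;     # ab  (greedy: highest count first)
+\&    print "$&\en" if "abc" =~ /\ew{1,2}?/;    # a   (non\-greedy: lowest count first)
+\&    print "$&\en" if "xyz" =~ /z|xy/;        # xy  (earlier position wins)
+.Ve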
+.SS "Creating Custom RE Engines"
+.IX Subsection "Creating Custom RE Engines"
+As of Perl 5.10.0, one can create custom regular expression engines. This
+is not for the faint of heart, as they have to plug in at the C level. See
+perlreapi for more details.
+.PP
+As an alternative, overloaded constants (see overload) provide a simple
+way to extend the functionality of the RE engine, by substituting one
+pattern for another.
+.PP
+Suppose that we want to enable a new RE escape-sequence \f(CW\*(C`\eY|\*(C'\fR which
+matches at a boundary between whitespace characters and non-whitespace
+characters. Note that \f(CW\*(C`(?=\eS)(?<!\eS)|(?!\eS)(?<=\eS)\*(C'\fR matches exactly
+at these positions, so we want to have each \f(CW\*(C`\eY|\*(C'\fR in the place of the
+more complicated version. We can create a module \f(CW\*(C`customre\*(C'\fR to do
+this:
+.PP
+.Vb 2
+\& package customre;
+\& use overload;
+\&
+\& sub import {
+\& shift;
+\& die "No argument to customre::import allowed" if @_;
+\& overload::constant \*(Aqqr\*(Aq => \e&convert;
+\& }
+\&
+\& sub invalid { die "/$_[0]/: invalid escape \*(Aq\e\e$_[1]\*(Aq"}
+\&
+\& # We must also take care of not escaping the legitimate \e\eY|
+\& # sequence, hence the presence of \*(Aq\e\e\*(Aq in the conversion rules.
+\& my %rules = ( \*(Aq\e\e\*(Aq => \*(Aq\e\e\e\e\*(Aq,
+\& \*(AqY|\*(Aq => qr/(?=\eS)(?<!\eS)|(?!\eS)(?<=\eS)/ );
+\& sub convert {
+\& my $re = shift;
+\& $re =~ s{
+\& \e\e ( \e\e | Y . )
+\& }
+\& { $rules{$1} or invalid($re,$1) }sgex;
+\& return $re;
+\& }
+.Ve
+.PP
+Now \f(CW\*(C`use customre\*(C'\fR enables the new escape in constant regular
+expressions, \fIi.e.\fR, those without any runtime variable interpolations.
+As documented in overload, this conversion will work only over
+literal parts of regular expressions. For \f(CW\*(C`\eY|$re\eY|\*(C'\fR the variable
+part of this regular expression needs to be converted explicitly
+(but only if the special meaning of \f(CW\*(C`\eY|\*(C'\fR should be enabled inside \f(CW$re\fR):
+.PP
+.Vb 5
+\& use customre;
+\& $re = <>;
+\& chomp $re;
+\& $re = customre::convert $re;
+\& /\eY|$re\eY|/;
+.Ve
+.SS "Embedded Code Execution Frequency"
+.IX Subsection "Embedded Code Execution Frequency"
+The exact rules for how often \f(CW\*(C`(?{})\*(C'\fR and \f(CW\*(C`(??{})\*(C'\fR are executed in a pattern
+are unspecified, and this is even more true of \f(CW\*(C`(*{})\*(C'\fR.
+In the case of a successful match you can assume that they DWIM and
+will be executed in left to right order the appropriate number of times in the
+accepting path of the pattern as would any other meta-pattern. How
+non-accepting pathways and match failures affect the number of times a pattern is
+executed is specifically unspecified and may vary depending on what
+optimizations can be applied to the pattern and is likely to change from
+version to version.
+.PP
+For instance in
+.PP
+.Vb 1
+\& "aaabcdeeeee"=~/a(?{print "a"})b(?{print "b"})cde/;
+.Ve
+.PP
+the exact number of times "a" or "b" are printed out is unspecified for
+failure, but you may assume they will be printed at least once during
+a successful match. Additionally, you may assume that if "b" is printed,
+it will be preceded by at least one "a".
+.PP
+In the case of branching constructs like the following:
+.PP
+.Vb 1
+\& /a(b|(?{ print "a" }))c(?{ print "c" })/;
+.Ve
+.PP
+you can assume that the input "ac" will output "ac", and that "abc"
+will output only "c".
+.PP
+When embedded code is quantified, successful matches will call the
+code once for each matched iteration of the quantifier. For
+example:
+.PP
+.Vb 1
+\& "good" =~ /g(?:o(?{print "o"}))*d/;
+.Ve
+.PP
+will output "o" twice.
+.PP
+For historical and consistency reasons the use of normal code blocks
+anywhere in a pattern will disable certain optimisations. As of 5.37.7
+you can use an "optimistic" codeblock, \f(CW\*(C`(*{ ... })\*(C'\fR as a replacement
+for \f(CW\*(C`(?{ ... })\*(C'\fR, if you do \fInot\fR wish to disable these optimisations.
+This may result in the code block being called less often than it might
+have been had they not been optimistic.
+.SS "PCRE/Python Support"
+.IX Subsection "PCRE/Python Support"
+As of Perl 5.10.0, Perl supports several Python/PCRE\-specific extensions
+to the regex syntax. While Perl programmers are encouraged to use the
+Perl-specific syntax, the following are also accepted:
+.ie n .IP """(?P<\fINAME\fR>\fIpattern\fR)""" 4
+.el .IP \f(CW(?P<\fR\f(CINAME\fR\f(CW>\fR\f(CIpattern\fR\f(CW)\fR 4
+.IX Item "(?P<NAME>pattern)"
+Define a named capture group. Equivalent to \f(CW\*(C`(?<\fR\f(CINAME\fR\f(CW>\fR\f(CIpattern\fR\f(CW)\*(C'\fR.
+.ie n .IP """(?P=\fINAME\fR)""" 4
+.el .IP \f(CW(?P=\fR\f(CINAME\fR\f(CW)\fR 4
+.IX Item "(?P=NAME)"
+Backreference to a named capture group. Equivalent to \f(CW\*(C`\eg{\fR\f(CINAME\fR\f(CW}\*(C'\fR.
+.ie n .IP """(?P>\fINAME\fR)""" 4
+.el .IP \f(CW(?P>\fR\f(CINAME\fR\f(CW)\fR 4
+.IX Item "(?P>NAME)"
+Subroutine call to a named capture group. Equivalent to \f(CW\*(C`(?&\fR\f(CINAME\fR\f(CW)\*(C'\fR.
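+.PP
+A brief sketch of the three forms in use. The group names \f(CW\*(C`half\*(C'\fR and
+\&\f(CW\*(C`paren\*(C'\fR and the sample strings are invented for this illustration.
+.PP
+.Vb 7
+\&    # (?P<NAME>...) defines the group; (?P=NAME) refers back to its text
+\&    if ("abcabc" =~ /^(?P<half>\ew+)(?P=half)$/) {
+\&        print "repeated half: $+{half}\en";    # repeated half: abc
+\&    }
+\&
+\&    # (?P>NAME) calls the group\*(Aqs pattern again, here recursively
+\&    print "balanced\en" if "((()))" =~ /^(?P<paren>\e((?P>paren)*\e))$/;
+.Ve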
+.SH BUGS
+.IX Header "BUGS"
+There are a number of issues with regard to case-insensitive matching
+in Unicode rules. See \f(CW"i"\fR under "Modifiers" above.
+.PP
+This document varies from difficult to understand to completely
+and utterly opaque. The wandering prose riddled with jargon is
+hard to fathom in several places.
+.PP
+This document needs a rewrite that separates the tutorial content
+from the reference content.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+The syntax of patterns used in Perl pattern matching evolved from those
+supplied in the Bell Labs Research Unix 8th Edition (Version 8) regex
+routines. (The code is actually derived (distantly) from Henry
+Spencer's freely redistributable reimplementation of those V8 routines.)
+.PP
+perlrequick.
+.PP
+perlretut.
+.PP
+"Regexp Quote-Like Operators" in perlop.
+.PP
+"Gory details of parsing quoted constructs" in perlop.
+.PP
+perlfaq6.
+.PP
+"pos" in perlfunc.
+.PP
+perllocale.
+.PP
+perlebcdic.
+.PP
+\&\fIMastering Regular Expressions\fR by Jeffrey Friedl, published
+by O'Reilly and Associates.