Diffstat (limited to 'upstream/mageia-cauldron/man1/perlrebackslash.1')
-rw-r--r-- upstream/mageia-cauldron/man1/perlrebackslash.1 | 859
1 files changed, 859 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man1/perlrebackslash.1 b/upstream/mageia-cauldron/man1/perlrebackslash.1
new file mode 100644
index 00000000..c109cd0f
--- /dev/null
+++ b/upstream/mageia-cauldron/man1/perlrebackslash.1
@@ -0,0 +1,859 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLREBACKSLASH 1"
+.TH PERLREBACKSLASH 1 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlrebackslash \- Perl Regular Expression Backslash Sequences and Escapes
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+The top level documentation about Perl regular expressions
+is found in perlre.
+.PP
+This document describes all backslash and escape sequences. After
+explaining the role of the backslash, it lists all the sequences that have
+a special meaning in Perl regular expressions (in alphabetical order),
+then describes each of them.
+.PP
+Most sequences are described in detail in different documents; the primary
+purpose of this document is to have a quick reference guide describing all
+backslash and escape sequences.
+.SS "The backslash"
+.IX Subsection "The backslash"
+In a regular expression, the backslash can perform one of two tasks:
+it either takes away the special meaning of the character following it
+(for instance, \f(CW\*(C`\e|\*(C'\fR matches a vertical bar, it's not an alternation),
+or it is the start of a backslash or escape sequence.
+.PP
+The rules determining what it is are quite simple: if the character
+following the backslash is an ASCII punctuation (non-word) character (that is,
+anything that is not a letter, digit, or underscore), then the backslash just
+takes away any special meaning of the character following it.
+.PP
+If the character following the backslash is an ASCII letter or an ASCII digit,
+then the sequence may be special; if so, it's listed below. A few letters have
+not been used yet, so escaping them with a backslash doesn't change them to be
+special. A future version of Perl may assign a special meaning to them, so if
+you have warnings turned on, Perl issues a warning if you use such a
+sequence [1].
+.PP
+It is however guaranteed that backslash or escape sequences never have a
+punctuation character following the backslash, not now, and not in a future
+version of Perl 5. So it is safe to put a backslash in front of a non-word
+character.
+.PP
+Note that the backslash itself is special; if you want to match a backslash,
+you have to escape the backslash with a backslash: \f(CW\*(C`/\e\e/\*(C'\fR matches a single
+backslash.
+.IP [1] 4
+.IX Item "[1]"
+There is one exception. If you use an alphanumeric character as the
+delimiter of your pattern (which you probably shouldn't do for readability
+reasons), you have to escape the delimiter if you want to match
+it. Perl won't warn then. See also "Gory details of parsing
+quoted constructs" in perlop.
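+.PP
+As a minimal sketch of the rules above (the sample strings and variable
+names are illustrative assumptions, not part of the Perl documentation),
+the following shows a backslash removing the special meaning of a
+punctuation character, and a doubled backslash matching a single backslash:

```perl
use strict;
use warnings;

# Unescaped "|" is alternation; "\|" matches a literal vertical bar.
my $alternation = ("b"   =~ /a|b/)  ? 1 : 0;  # "b" matches the second alternative
my $literal_bar = ("b"   =~ /a\|b/) ? 1 : 0;  # "b" does not contain the text "a|b"
my $bar_found   = ("a|b" =~ /a\|b/) ? 1 : 0;  # the literal text "a|b" is present

# The backslash is itself special, so it must be escaped to be matched.
my $path          = 'C:\temp';                # single quotes keep the backslash
my $has_backslash = ($path =~ /\\/) ? 1 : 0;

print "$alternation $literal_bar $bar_found $has_backslash\n";
```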
+.SS "All the sequences and escapes"
+.IX Subsection "All the sequences and escapes"
+Those not usable within a bracketed character class (like \f(CW\*(C`[\eda\-z]\*(C'\fR) are marked
+as \f(CW\*(C`Not in [].\*(C'\fR
+.PP
+.Vb 10
+\& \e000 Octal escape sequence. See also \eo{}.
+\& \e1 Absolute backreference. Not in [].
+\& \ea Alarm or bell.
+\& \eA Beginning of string. Not in [].
+\& \eb{}, \eb Boundary. (\eb is a backspace in []).
+\& \eB{}, \eB Not a boundary. Not in [].
+\& \ecX Control\-X.
+\& \ed Match any digit character.
+\& \eD Match any character that isn\*(Aqt a digit.
+\& \ee Escape character.
+\& \eE Turn off \eQ, \eL and \eU processing. Not in [].
+\& \ef Form feed.
+\& \eF Foldcase till \eE. Not in [].
+\& \eg{}, \eg1 Named, absolute or relative backreference.
+\& Not in [].
+\& \eG Pos assertion. Not in [].
+\& \eh Match any horizontal whitespace character.
+\& \eH Match any character that isn\*(Aqt horizontal whitespace.
+\& \ek{}, \ek<>, \ek\*(Aq\*(Aq Named backreference. Not in [].
+\& \eK Keep the stuff left of \eK. Not in [].
+\& \el Lowercase next character. Not in [].
+\& \eL Lowercase till \eE. Not in [].
+\& \en (Logical) newline character.
+\& \eN Match any character but newline. Not in [].
+\& \eN{} Named or numbered (Unicode) character or sequence.
+\& \eo{} Octal escape sequence.
+\& \ep{}, \epP Match any character with the given Unicode property.
+\& \eP{}, \ePP Match any character without the given property.
+\& \eQ Quote (disable) pattern metacharacters till \eE. Not
+\& in [].
+\& \er Return character.
+\& \eR Generic new line. Not in [].
+\& \es Match any whitespace character.
+\& \eS Match any character that isn\*(Aqt a whitespace.
+\& \et Tab character.
+\& \eu Titlecase next character. Not in [].
+\& \eU Uppercase till \eE. Not in [].
+\& \ev Match any vertical whitespace character.
+\& \eV Match any character that isn\*(Aqt vertical whitespace.
+\& \ew Match any word character.
+\& \eW Match any character that isn\*(Aqt a word character.
+\& \ex{}, \ex00 Hexadecimal escape sequence.
+\& \eX Unicode "extended grapheme cluster". Not in [].
+\& \ez End of string. Not in [].
+\& \eZ End of string. Not in [].
+.Ve
+.SS "Character Escapes"
+.IX Subsection "Character Escapes"
+\fIFixed characters\fR
+.IX Subsection "Fixed characters"
+.PP
+A handful of characters have a dedicated \fIcharacter escape\fR. The following
+table shows them, along with their ASCII code points (in decimal and hex),
+their ASCII name, the control escape on ASCII platforms and a short
+description. (For EBCDIC platforms, see "OPERATOR DIFFERENCES" in perlebcdic.)
+.PP
+.Vb 9
+\& Seq. Code Point ASCII Cntrl Description.
+\& Dec Hex
+\& \ea 7 07 BEL \ecG alarm or bell
+\& \eb 8 08 BS \ecH backspace [1]
+\& \ee 27 1B ESC \ec[ escape character
+\& \ef 12 0C FF \ecL form feed
+\& \en 10 0A LF \ecJ line feed [2]
+\& \er 13 0D CR \ecM carriage return
+\& \et 9 09 TAB \ecI tab
+.Ve
+.IP [1] 4
+.IX Item "[1]"
+\&\f(CW\*(C`\eb\*(C'\fR is the backspace character only inside a character class. Outside a
+character class, \f(CW\*(C`\eb\*(C'\fR alone is a word\-character/non\-word\-character
+boundary, and \f(CW\*(C`\eb{}\*(C'\fR is some other type of boundary.
+.IP [2] 4
+.IX Item "[2]"
+\&\f(CW\*(C`\en\*(C'\fR matches a logical newline. Perl converts between \f(CW\*(C`\en\*(C'\fR and your
+OS's native newline character when reading from or writing to text files.
+.PP
+Example
+.IX Subsection "Example"
+.PP
+.Vb 1
+\& $str =~ /\et/; # Matches if $str contains a (horizontal) tab.
+.Ve
+.PP
+\fIControl characters\fR
+.IX Subsection "Control characters"
+.PP
+\&\f(CW\*(C`\ec\*(C'\fR is used to denote a control character; the character following \f(CW\*(C`\ec\*(C'\fR
+determines the value of the construct. For example the value of \f(CW\*(C`\ecA\*(C'\fR is
+\&\f(CWchr(1)\fR, and the value of \f(CW\*(C`\ecb\*(C'\fR is \f(CWchr(2)\fR, etc.
+The gory details are in "Regexp Quote-Like Operators" in perlop. A complete
+list of what \f(CWchr(1)\fR, etc. means for ASCII and EBCDIC platforms is in
+"OPERATOR DIFFERENCES" in perlebcdic.
+.PP
+Note that \f(CW\*(C`\ec\e\*(C'\fR alone at the end of a regular expression (or double-quoted
+string) is not valid. The backslash must be followed by another character.
+That is, \f(CW\*(C`\ec\e\fR\f(CIX\fR\f(CW\*(C'\fR means \f(CW\*(C`chr(28) . \*(Aq\fR\f(CIX\fR\f(CW\*(Aq\*(C'\fR for all characters \fIX\fR.
+.PP
+To write platform-independent code, you must use \f(CW\*(C`\eN{\fR\f(CINAME\fR\f(CW}\*(C'\fR instead, like
+\&\f(CW\*(C`\eN{ESCAPE}\*(C'\fR or \f(CW\*(C`\eN{U+001B}\*(C'\fR, see charnames.
+.PP
+Mnemonic: \fIc\fRontrol character.
+.PP
+Example
+.IX Subsection "Example"
+.PP
+.Vb 1
+\& $str =~ /\ecK/; # Matches if $str contains a vertical tab (control\-K).
+.Ve
+.PP
+\fINamed or numbered characters and character sequences\fR
+.IX Subsection "Named or numbered characters and character sequences"
+.PP
+Unicode characters have a Unicode name and numeric code point (ordinal)
+value. Use the
+\&\f(CW\*(C`\eN{}\*(C'\fR construct to specify a character by either of these values.
+Certain sequences of characters also have names.
+.PP
+To specify by name, the name of the character or character sequence goes
+between the curly braces.
+.PP
+To specify a character by Unicode code point, use the form \f(CW\*(C`\eN{U+\fR\f(CIcode
+point\fR\f(CW}\*(C'\fR, where \fIcode point\fR is a number in hexadecimal that gives the
+code point that Unicode has assigned to the desired character. It is
+customary but not required to use leading zeros to pad the number to 4
+digits. Thus \f(CW\*(C`\eN{U+0041}\*(C'\fR means \f(CW\*(C`LATIN CAPITAL LETTER A\*(C'\fR, and you will
+rarely see it written without the two leading zeros. \f(CW\*(C`\eN{U+0041}\*(C'\fR means
+"A" even on EBCDIC machines (where the ordinal value of "A" is not 0x41).
+.PP
+Blanks may freely be inserted adjacent to but within the braces
+enclosing the name or code point. So \f(CW\*(C`\eN{\ U+0041\ }\*(C'\fR is perfectly
+legal.
+.PP
+It is even possible to give your own names to characters and character
+sequences by using the charnames module. These custom names are
+lexically scoped, and so a given code point may have different names
+in different scopes. The name used is what is in effect at the time the
+\&\f(CW\*(C`\eN{}\*(C'\fR is expanded. For patterns in double-quotish context, that means
+at the time the pattern is parsed. But for patterns that are delimited
+by single quotes, the expansion is deferred until pattern compilation
+time, which may very well have a different \f(CW\*(C`charnames\*(C'\fR translator in
+effect.
+.PP
+(There is an expanded internal form that you may see in debug output:
+\&\f(CW\*(C`\eN{U+\fR\f(CIcode point\fR\f(CW.\fR\f(CIcode point\fR\f(CW...}\*(C'\fR.
+The \f(CW\*(C`...\*(C'\fR means any number of these \fIcode point\fRs separated by dots.
+This represents the sequence formed by the characters. This is an internal
+form only, subject to change, and you should not try to use it yourself.)
+.PP
+Mnemonic: \fIN\fRamed character.
+.PP
+Note that a character or character sequence expressed as a named
+or numbered character is considered a character without special
+meaning by the regex engine, and will match "as is".
+.PP
+Example
+.IX Subsection "Example"
+.PP
+.Vb 1
+\& $str =~ /\eN{THAI CHARACTER SO SO}/; # Matches the Thai SO SO character
+\&
+\& use charnames \*(AqCyrillic\*(Aq; # Loads Cyrillic names.
+\& $str =~ /\eN{ZHE}\eN{KA}/; # Match "ZHE" followed by "KA".
+.Ve
+.PP
+\fIOctal escapes\fR
+.IX Subsection "Octal escapes"
+.PP
+There are two forms of octal escapes. Each is used to specify a character by
+its code point specified in base 8.
+.PP
+One form, available starting in Perl 5.14, looks like \f(CW\*(C`\eo{...}\*(C'\fR, where the dots
+represent one or more octal digits. It can be used for any Unicode character.
+.PP
+It was introduced to avoid the potential problems with the other form,
+available in all Perls. That form consists of a backslash followed by three
+octal digits. One problem with this form is that it can look exactly like an
+old-style backreference (see
+"Disambiguation rules between old-style octal escapes and backreferences"
+below). You can avoid this by making the first of the three digits always a
+zero, but that makes \e077 the largest code point specifiable.
+.PP
+In some contexts, a backslash followed by two or even one octal digits may be
+interpreted as an octal escape, sometimes with a warning, and because of some
+bugs, sometimes with surprising results. Also, if you are creating a regex
+out of smaller snippets concatenated together, and you use fewer than three
+digits, the beginning of one snippet may be interpreted as adding digits to the
+ending of the snippet before it. See "Absolute referencing" for more
+discussion and examples of the snippet problem.
+.PP
+Note that a character expressed as an octal escape is considered
+a character without special meaning by the regex engine, and will match
+"as is".
+.PP
+To summarize, the \f(CW\*(C`\eo{}\*(C'\fR form is always safe to use, and the other form is
+safe to use for code points through \e077 when you use exactly three digits to
+specify them.
+.PP
+Mnemonic: \fI0\fRctal or \fIo\fRctal.
+.PP
+Examples (assuming an ASCII platform)
+.IX Subsection "Examples (assuming an ASCII platform)"
+.PP
+.Vb 12
+\& $str = "Perl";
+\& $str =~ /\eo{120}/; # Match, "\e120" is "P".
+\& $str =~ /\e120/; # Same.
+\& $str =~ /\eo{120}+/; # Match, "\e120" is "P",
+\& # it\*(Aqs repeated at least once.
+\& $str =~ /\e120+/; # Same.
+\& $str =~ /P\e053/; # No match, "\e053" is "+" and taken literally.
+\& /\eo{23073}/ # Black foreground, white background smiling face.
+\& /\eo{4801234567}/ # Raises a warning, and yields chr(4).
+\& /\eo{ 400}/ # LATIN CAPITAL LETTER A WITH MACRON
+\& /\eo{ 400 }/ # Same. These show blanks are allowed adjacent to
+\& # the braces
+.Ve
+.PP
+Disambiguation rules between old-style octal escapes and backreferences
+.IX Subsection "Disambiguation rules between old-style octal escapes and backreferences"
+.PP
+Octal escapes of the \f(CW\*(C`\e000\*(C'\fR form outside of bracketed character classes
+potentially clash with old-style backreferences (see "Absolute referencing"
+below). They both consist of a backslash followed by numbers. So Perl has to
+use heuristics to determine whether it is a backreference or an octal escape.
+Perl uses the following rules to disambiguate:
+.IP 1. 4
+If the backslash is followed by a single digit, it's a backreference.
+.IP 2. 4
+If the first digit following the backslash is a 0, it's an octal escape.
+.IP 3. 4
+If the number following the backslash is N (in decimal), and Perl already
+has seen N capture groups, Perl considers this a backreference. Otherwise,
+it considers it an octal escape. If N has more than three digits, Perl
+takes only the first three for the octal escape; the rest are matched as is.
+.Sp
+.Vb 6
+\& my $pat = "(" x 999;
+\& $pat .= "a";
+\& $pat .= ")" x 999;
+\& /^($pat)\e1000$/; # Matches \*(Aqaa\*(Aq; there are 1000 capture groups.
+\& /^$pat\e1000$/; # Matches \*(Aqa@0\*(Aq; there are 999 capture groups
+\& # and \e1000 is seen as \e100 (a \*(Aq@\*(Aq) and a \*(Aq0\*(Aq.
+.Ve
+.PP
+You can force a backreference interpretation always by using the \f(CW\*(C`\eg{...}\*(C'\fR
+form. You can force an octal interpretation always by using the \f(CW\*(C`\eo{...}\*(C'\fR
+form, or for numbers up through \e077 (= 63 decimal), by using three digits,
+beginning with a "0".
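+.PP
+The two unambiguous forms can be sketched as follows. This is an
+illustration under stated assumptions: an ASCII platform, and sample
+strings and variable names chosen only for this example.

```perl
use strict;
use warnings;

# \g{1} is always a backreference to capture group 1.
my $backref = ("aa" =~ /^(a)\g{1}$/) ? 1 : 0;

# \o{101} is always an octal escape: chr(65), which is "A" on ASCII
# platforms, no matter how many capture groups precede it.
my $octal = ("aA" =~ /^(a)\o{101}$/) ? 1 : 0;

# Three digits beginning with 0 are always an octal escape: \053 is "+",
# and it is matched literally rather than acting as a quantifier.
my $plus = ("a+" =~ /^(a)\053$/) ? 1 : 0;

print "$backref $octal $plus\n";
```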
+.PP
+\fIHexadecimal escapes\fR
+.IX Subsection "Hexadecimal escapes"
+.PP
+Like octal escapes, there are two forms of hexadecimal escapes, but both start
+with the sequence \f(CW\*(C`\ex\*(C'\fR. This is followed by either exactly two hexadecimal
+digits forming a number, or a hexadecimal number of arbitrary length surrounded
+by curly braces. The hexadecimal number is the code point of the character you
+want to express.
+.PP
+Note that a character expressed as one of these escapes is considered a
+character without special meaning by the regex engine, and will match
+"as is".
+.PP
+Mnemonic: he\fIx\fRadecimal.
+.PP
+Examples (assuming an ASCII platform)
+.IX Subsection "Examples (assuming an ASCII platform)"
+.PP
+.Vb 4
+\& $str = "Perl";
+\& $str =~ /\ex50/; # Match, "\ex50" is "P".
+\& $str =~ /\ex50+/; # Match, "\ex50" is "P", it is repeated at least once
+\& $str =~ /P\ex2B/; # No match, "\ex2B" is "+" and taken literally.
+\&
+\& /\ex{2603}\ex{2602}/ # Snowman with an umbrella.
+\& # The Unicode character 2603 is a snowman,
+\& # the Unicode character 2602 is an umbrella.
+\& /\ex{263B}/ # Black smiling face.
+\& /\ex{263b}/ # Same, the hex digits A \- F are case insensitive.
+\& /\ex{ 263b }/ # Same, showing optional blanks adjacent to the
+\& # braces
+.Ve
+.SS Modifiers
+.IX Subsection "Modifiers"
+A number of backslash sequences change the character or characters
+that follow them. \f(CW\*(C`\el\*(C'\fR will lowercase the character following
+it, while \f(CW\*(C`\eu\*(C'\fR will uppercase (or, more accurately, titlecase) the
+character following it. They provide functionality similar to the
+functions \f(CW\*(C`lcfirst\*(C'\fR and \f(CW\*(C`ucfirst\*(C'\fR.
+.PP
+To uppercase or lowercase several characters, one might want to use
+\&\f(CW\*(C`\eL\*(C'\fR or \f(CW\*(C`\eU\*(C'\fR, which will lowercase/uppercase all characters following
+them, until either the end of the pattern or the next occurrence of
+\&\f(CW\*(C`\eE\*(C'\fR, whichever comes first. They provide functionality similar to what
+the functions \f(CW\*(C`lc\*(C'\fR and \f(CW\*(C`uc\*(C'\fR provide.
+.PP
+\&\f(CW\*(C`\eQ\*(C'\fR is used to quote (disable) pattern metacharacters, up to the next
+\&\f(CW\*(C`\eE\*(C'\fR or the end of the pattern. \f(CW\*(C`\eQ\*(C'\fR adds a backslash to any character
+that could have special meaning to Perl. In the ASCII range, it quotes
+every character that isn't a letter, digit, or underscore. See
+"quotemeta" in perlfunc for details on what gets quoted for non-ASCII
+code points. Using this ensures that any character between \f(CW\*(C`\eQ\*(C'\fR and
+\&\f(CW\*(C`\eE\*(C'\fR will be matched literally, not interpreted as a metacharacter by
+the regex engine.
+.PP
+\&\f(CW\*(C`\eF\*(C'\fR can be used to casefold all characters following, up to the next \f(CW\*(C`\eE\*(C'\fR
+or the end of the pattern. It provides functionality similar to
+the \f(CW\*(C`fc\*(C'\fR function.
+.PP
+Mnemonic: \fIL\fRowercase, \fIU\fRppercase, \fIF\fRold-case, \fIQ\fRuotemeta, \fIE\fRnd.
+.PP
+Examples
+.IX Subsection "Examples"
+.PP
+.Vb 7
+\& $sid = "sid";
+\& $greg = "GrEg";
+\& $miranda = "(Miranda)";
+\& $str =~ /\eu$sid/; # Matches \*(AqSid\*(Aq
+\& $str =~ /\eL$greg/; # Matches \*(Aqgreg\*(Aq
+\& $str =~ /\eQ$miranda\eE/; # Matches \*(Aq(Miranda)\*(Aq, as if the pattern
+\& # had been written as /\e(Miranda\e)/
+.Ve
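+.PP
+A short sketch of why \f(CW\*(C`\eQ\*(C'\fR matters when interpolating text that may
+contain metacharacters. The variable names and sample strings here are
+hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

my $user_input = "1+1=2";   # hypothetical text containing a metacharacter

# Without \Q, "+" acts as a quantifier, so the pattern means
# "one or more 1s, then 1=2" and does not match the literal text.
my $unquoted = ($user_input =~ /^$user_input$/) ? 1 : 0;

# With \Q...\E every non-word ASCII character is backslashed first,
# so the interpolated text is matched literally.
my $quoted = ($user_input =~ /^\Q$user_input\E$/) ? 1 : 0;

# \L lowercases the interpolated text up to \E.
my $name  = "GrEg";
my $lower = ("greg" =~ /^\L$name\E$/) ? 1 : 0;

print "$unquoted $quoted $lower\n";
```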
+.SS "Character classes"
+.IX Subsection "Character classes"
+Perl regular expressions have a large range of character classes. Some of
+the character classes are written as a backslash sequence. We will briefly
+discuss those here; full details of character classes can be found in
+perlrecharclass.
+.PP
+\&\f(CW\*(C`\ew\*(C'\fR is a character class that matches any single \fIword\fR character
+(letters, digits, Unicode marks, and connector punctuation (like the
+underscore)). \f(CW\*(C`\ed\*(C'\fR is a character class that matches any decimal
+digit, while the character class \f(CW\*(C`\es\*(C'\fR matches any whitespace character.
+New in perl 5.10.0 are the classes \f(CW\*(C`\eh\*(C'\fR and \f(CW\*(C`\ev\*(C'\fR which match horizontal
+and vertical whitespace characters.
+.PP
+The exact set of characters matched by \f(CW\*(C`\ed\*(C'\fR, \f(CW\*(C`\es\*(C'\fR, and \f(CW\*(C`\ew\*(C'\fR varies
+depending on various pragmas and regular expression modifiers. It is
+possible to restrict the match to the ASCII range by using the \f(CW\*(C`/a\*(C'\fR
+regular expression modifier. See perlrecharclass.
+.PP
+The uppercase variants (\f(CW\*(C`\eW\*(C'\fR, \f(CW\*(C`\eD\*(C'\fR, \f(CW\*(C`\eS\*(C'\fR, \f(CW\*(C`\eH\*(C'\fR, and \f(CW\*(C`\eV\*(C'\fR) are
+character classes that match, respectively, any character that isn't a
+word character, digit, whitespace, horizontal whitespace, or vertical
+whitespace.
+.PP
+Mnemonics: \fIw\fRord, \fId\fRigit, \fIs\fRpace, \fIh\fRorizontal, \fIv\fRertical.
+.PP
+\fIUnicode classes\fR
+.IX Subsection "Unicode classes"
+.PP
+\&\f(CW\*(C`\epP\*(C'\fR (where \f(CW\*(C`P\*(C'\fR is a single letter) and \f(CW\*(C`\ep{Property}\*(C'\fR are used to
+match a character that matches the given Unicode property; properties
+include things like "letter", or "thai character". Capitalizing the
+sequence to \f(CW\*(C`\ePP\*(C'\fR and \f(CW\*(C`\eP{Property}\*(C'\fR makes the sequence match a character
+that doesn't match the given Unicode property. For more details, see
+"Backslash sequences" in perlrecharclass and
+"Unicode Character Properties" in perlunicode.
+.PP
+Mnemonic: \fIp\fRroperty.
+.SS Referencing
+.IX Subsection "Referencing"
+If capturing parentheses are used in a regular expression, we can refer
+to the part of the source string that was matched, and match exactly the
+same thing. There are three ways of referring to such a \fIbackreference\fR:
+absolutely, relatively, and by name.
+.PP
+\fIAbsolute referencing\fR
+.IX Subsection "Absolute referencing"
+.PP
+Either \f(CW\*(C`\eg\fR\f(CIN\fR\f(CW\*(C'\fR (starting in Perl 5.10.0), or \f(CW\*(C`\e\fR\f(CIN\fR\f(CW\*(C'\fR (old-style) where \fIN\fR
+is a positive (unsigned) decimal number of any length is an absolute reference
+to a capturing group.
+.PP
+\&\fIN\fR refers to the Nth set of parentheses, so \f(CW\*(C`\eg\fR\f(CIN\fR\f(CW\*(C'\fR refers to whatever has
+been matched by that set of parentheses. Thus \f(CW\*(C`\eg1\*(C'\fR refers to the first
+capture group in the regex.
+.PP
+The \f(CW\*(C`\eg\fR\f(CIN\fR\f(CW\*(C'\fR form can be equivalently written as \f(CW\*(C`\eg{\fR\f(CIN\fR\f(CW}\*(C'\fR
+which avoids ambiguity when building a regex by concatenating shorter
+strings. Otherwise if you had a regex \f(CW\*(C`qr/$a$b/\*(C'\fR, and \f(CW$a\fR contained
+\&\f(CW"\eg1"\fR, and \f(CW$b\fR contained \f(CW"37"\fR, you would get \f(CW\*(C`/\eg137/\*(C'\fR which is
+probably not what you intended.
+.PP
+In the \f(CW\*(C`\e\fR\f(CIN\fR\f(CW\*(C'\fR form, \fIN\fR must not begin with a "0", and there must be at
+least \fIN\fR capturing groups, or else \fIN\fR is considered an octal escape
+(but something like \f(CW\*(C`\e18\*(C'\fR is the same as \f(CW\*(C`\e0018\*(C'\fR; that is, the octal escape
+\&\f(CW"\e001"\fR followed by a literal digit \f(CW"8"\fR).
+.PP
+Mnemonic: \fIg\fRroup.
+.PP
+Examples
+.IX Subsection "Examples"
+.PP
+.Vb 5
+\& /(\ew+) \eg1/; # Finds a duplicated word, (e.g. "cat cat").
+\& /(\ew+) \e1/; # Same thing; written old\-style.
+\& /(\ew+) \eg{1}/; # Same, using the safer braced notation
+\& /(\ew+) \eg{ 1 }/;# Same, showing optional blanks adjacent to the braces
+\& /(.)(.)\eg2\eg1/; # Match a four letter palindrome (e.g. "ABBA").
+.Ve
+.PP
+\fIRelative referencing\fR
+.IX Subsection "Relative referencing"
+.PP
+\&\f(CW\*(C`\eg\-\fR\f(CIN\fR\f(CW\*(C'\fR (starting in Perl 5.10.0) is used for relative addressing. (It can
+be written as \f(CW\*(C`\eg{\-\fR\f(CIN\fR\f(CW}\*(C'\fR.) It refers to the \fIN\fRth group before the
+\&\f(CW\*(C`\eg{\-\fR\f(CIN\fR\f(CW}\*(C'\fR.
+.PP
+The big advantage of this form is that it makes it much easier to write
+patterns with references that can be interpolated in larger patterns,
+even if the larger pattern also contains capture groups.
+.PP
+Examples
+.IX Subsection "Examples"
+.PP
+.Vb 8
+\& /(A) # Group 1
+\& ( # Group 2
+\& (B) # Group 3
+\& \eg{\-1} # Refers to group 3 (B)
+\& \eg{\-3} # Refers to group 1 (A)
+\& \eg{ \-3 } # Same, showing optional blanks adjacent to the braces
+\& )
+\& /x; # Matches "ABBA".
+\&
+\& my $qr = qr /(.)(.)\eg{\-2}\eg{\-1}/; # Matches \*(Aqabab\*(Aq, \*(Aqcdcd\*(Aq, etc.
+\& /$qr$qr/ # Matches \*(Aqababcdcd\*(Aq.
+.Ve
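+.PP
+The advantage for interpolated patterns can be sketched by comparing a
+relative and an absolute reference inside the same reusable fragment.
+The fragment names and test strings are assumptions made for this
+example only.

```perl
use strict;
use warnings;

# A reusable fragment: "any character, then the same character again".
my $doubled_rel = qr/(.)\g{-1}/;   # relative: the group just before it
my $doubled_abs = qr/(.)\g{1}/;    # absolute: always group 1

# Embedded after another capture group, the relative form still refers
# to the fragment's own group, while the absolute form now refers to the
# outer group 1, which captured "x".
my $rel_ok  = ("x-aa" =~ /^(x)-$doubled_rel$/) ? 1 : 0;
my $abs_ok  = ("x-aa" =~ /^(x)-$doubled_abs$/) ? 1 : 0;
my $abs_alt = ("x-ax" =~ /^(x)-$doubled_abs$/) ? 1 : 0;

print "$rel_ok $abs_ok $abs_alt\n";
```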
+.PP
+\fINamed referencing\fR
+.IX Subsection "Named referencing"
+.PP
+\&\f(CW\*(C`\eg{\fR\f(CIname\fR\f(CW}\*(C'\fR (starting in Perl 5.10.0) can be used to back refer to a
+named capture group, dispensing completely with having to think about capture
+buffer positions.
+.PP
+To be compatible with .Net regular expressions, \f(CW\*(C`\eg{name}\*(C'\fR may also be
+written as \f(CW\*(C`\ek{name}\*(C'\fR, \f(CW\*(C`\ek<name>\*(C'\fR or \f(CW\*(C`\ek\*(Aqname\*(Aq\*(C'\fR.
+.PP
+To prevent any ambiguity, \fIname\fR must not start with a digit nor contain a
+hyphen.
+.PP
+Examples
+.IX Subsection "Examples"
+.PP
+.Vb 10
+\& /(?<word>\ew+) \eg{word}/ # Finds duplicated word, (e.g. "cat cat")
+\& /(?<word>\ew+) \ek{word}/ # Same.
+\& /(?<word>\ew+) \eg{ word }/ # Same, showing optional blanks adjacent to
+\& # the braces
+\& /(?<word>\ew+) \ek{ word }/ # Same.
+\& /(?<word>\ew+) \ek<word>/ # Same. There are no braces, so no blanks
+\& # are permitted
+\& /(?<letter1>.)(?<letter2>.)\eg{letter2}\eg{letter1}/
+\& # Match a four letter palindrome (e.g.
+\& # "ABBA")
+.Ve
+.SS Assertions
+.IX Subsection "Assertions"
+Assertions are conditions that have to be true; they don't actually
+match parts of the substring. There are six assertions that are written as
+backslash sequences.
+.IP \eA 4
+.IX Item "A"
+\&\f(CW\*(C`\eA\*(C'\fR only matches at the beginning of the string. If the \f(CW\*(C`/m\*(C'\fR modifier
+isn't used, then \f(CW\*(C`/\eA/\*(C'\fR is equivalent to \f(CW\*(C`/^/\*(C'\fR. However, if the \f(CW\*(C`/m\*(C'\fR
+modifier is used, then \f(CW\*(C`/^/\*(C'\fR matches internal newlines, but the meaning
+of \f(CW\*(C`/\eA/\*(C'\fR isn't changed by the \f(CW\*(C`/m\*(C'\fR modifier. \f(CW\*(C`\eA\*(C'\fR matches at the beginning
+of the string regardless of whether the \f(CW\*(C`/m\*(C'\fR modifier is used.
+.IP "\ez, \eZ" 4
+.IX Item "z, Z"
+\&\f(CW\*(C`\ez\*(C'\fR and \f(CW\*(C`\eZ\*(C'\fR match at the end of the string. If the \f(CW\*(C`/m\*(C'\fR modifier isn't
+used, then \f(CW\*(C`/\eZ/\*(C'\fR is equivalent to \f(CW\*(C`/$/\*(C'\fR; that is, it matches at the
+end of the string, or one before the newline at the end of the string. If the
+\&\f(CW\*(C`/m\*(C'\fR modifier is used, then \f(CW\*(C`/$/\*(C'\fR matches at internal newlines, but the
+meaning of \f(CW\*(C`/\eZ/\*(C'\fR isn't changed by the \f(CW\*(C`/m\*(C'\fR modifier. \f(CW\*(C`\eZ\*(C'\fR matches at
+the end of the string (or just before a trailing newline) regardless of whether
+the \f(CW\*(C`/m\*(C'\fR modifier is used.
+.Sp
+\&\f(CW\*(C`\ez\*(C'\fR is just like \f(CW\*(C`\eZ\*(C'\fR, except that it does not match before a trailing
+newline. \f(CW\*(C`\ez\*(C'\fR matches at the end of the string only, regardless of the
+modifiers used, and not just before a newline. Use it to anchor the
+match to the true end of the string under all conditions.
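+.Sp
+The difference between \f(CW\*(C`$\*(C'\fR, \f(CW\*(C`\eZ\*(C'\fR and \f(CW\*(C`\ez\*(C'\fR can be sketched with a string that
+has a trailing newline. The input strings and variable names are
+hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

my $line = "end\n";   # hypothetical input with a trailing newline

my $dollar  = ($line =~ /end$/)    ? 1 : 0;  # matches before the trailing newline
my $big_z   = ($line =~ /end\Z/)   ? 1 : 0;  # same as $ when /m is not used
my $small_z = ($line =~ /end\z/)   ? 1 : 0;  # fails: "\n" still follows "end"
my $strict  = ($line =~ /end\n\z/) ? 1 : 0;  # anchored at the true end of string

# Under /m, $ also matches before internal newlines; \Z is unchanged.
my $text     = "one\ntwo\n";
my $m_dollar = ($text =~ /one$/m)  ? 1 : 0;
my $m_big_z  = ($text =~ /one\Z/m) ? 1 : 0;

print "$dollar $big_z $small_z $strict $m_dollar $m_big_z\n";
```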
+.IP \eG 4
+.IX Item "G"
+\&\f(CW\*(C`\eG\*(C'\fR is usually used only in combination with the \f(CW\*(C`/g\*(C'\fR modifier. If the
+\&\f(CW\*(C`/g\*(C'\fR modifier is used and the match is done in scalar context, Perl
+remembers where in the source string the last match ended, and the next time,
+it will start the match from where it ended the previous time.
+.Sp
+\&\f(CW\*(C`\eG\*(C'\fR matches the point where the previous match on that string ended,
+or the beginning of that string if there was no previous match.
+.Sp
+Mnemonic: \fIG\fRlobal.
+.IP "\eb{}, \eb, \eB{}, \eB" 4
+.IX Item "b{}, b, B{}, B"
+\&\f(CW\*(C`\eb{...}\*(C'\fR, available starting in v5.22, matches a boundary (between two
+characters, or before the first character of the string, or after the
+final character of the string) based on the Unicode rules for the
+boundary type specified inside the braces. The boundary
+types are given a few paragraphs below. \f(CW\*(C`\eB{...}\*(C'\fR matches at any place
+between characters where \f(CW\*(C`\eb{...}\*(C'\fR of the same type doesn't match.
+.Sp
+\&\f(CW\*(C`\eb\*(C'\fR when not immediately followed by a \f(CW"{"\fR is available in all
+Perls. It matches at any place
+between a word (something matched by \f(CW\*(C`\ew\*(C'\fR) and a non-word character
+(\f(CW\*(C`\eW\*(C'\fR); \f(CW\*(C`\eB\*(C'\fR when not immediately followed by a \f(CW"{"\fR matches at any
+place between characters where \f(CW\*(C`\eb\*(C'\fR doesn't match. To get better
+word matching of natural language text, see "\eb{wb}" below.
+.Sp
+\&\f(CW\*(C`\eb\*(C'\fR
+and \f(CW\*(C`\eB\*(C'\fR assume there's a non-word character before the beginning and after
+the end of the source string; so \f(CW\*(C`\eb\*(C'\fR will match at the beginning (or end)
+of the source string if the source string begins (or ends) with a word
+character. Otherwise, \f(CW\*(C`\eB\*(C'\fR will match.
+.Sp
+Do not use something like \f(CW\*(C`\eb=head\ed\eb\*(C'\fR and expect it to match the
+beginning of a line. It can't, because for there to be a boundary before
+the non-word "=", there must be a word character immediately previous.
+All plain \f(CW\*(C`\eb\*(C'\fR and \f(CW\*(C`\eB\*(C'\fR boundary determinations look for word
+characters alone, not for
+non-word characters nor for string ends. It may help to understand how
+\&\f(CW\*(C`\eb\*(C'\fR and \f(CW\*(C`\eB\*(C'\fR work by equating them as follows:
+.Sp
+.Vb 2
+\& \eb really means (?:(?<=\ew)(?!\ew)|(?<!\ew)(?=\ew))
+\& \eB really means (?:(?<=\ew)(?=\ew)|(?<!\ew)(?!\ew))
+.Ve
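+.Sp
+One way to check this equivalence is to collect the offsets at which each
+pattern matches and compare them. This is only a sketch: the helper
+\f(CW\*(C`offsets\*(C'\fR and the sample text are assumptions made for this example.

```perl
use strict;
use warnings;

my $text = "=head1 NAME";   # hypothetical sample text

# Return the offsets at which a zero-width pattern matches, as a string.
sub offsets {
    my ($re, $str) = @_;
    my @at;
    push @at, pos($str) while $str =~ /$re/g;
    return "@at";
}

my $b_plain    = offsets(qr/\b/, $text);
my $b_expanded = offsets(qr/(?:(?<=\w)(?!\w)|(?<!\w)(?=\w))/, $text);

# There is no boundary before the leading "=", a non-word character.
my $starts_with_boundary = ($text =~ /^\b/) ? 1 : 0;

print "plain:    $b_plain\n";
print "expanded: $b_expanded\n";
```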
+.Sp
+In contrast, \f(CW\*(C`\eb{...}\*(C'\fR and \f(CW\*(C`\eB{...}\*(C'\fR may or may not match at the
+beginning and end of the line, depending on the boundary type. These
+implement the Unicode default boundaries, specified in
+<https://www.unicode.org/reports/tr14/> and
+<https://www.unicode.org/reports/tr29/>.
+The boundary types are:
+.RS 4
+.ie n .IP """\eb{gcb}"" or ""\eb{g}""" 4
+.el .IP "\f(CW\eb{gcb}\fR or \f(CW\eb{g}\fR" 4
+.IX Item "b{gcb} or b{g}"
+This matches a Unicode "Grapheme Cluster Boundary". (Actually Perl
+always uses the improved "extended grapheme cluster".) These are
+explained below under \f(CW"\eX"\fR. In fact, \f(CW\*(C`\eX\*(C'\fR is another way to get
+the same functionality. It is equivalent to \f(CW\*(C`/.+?\eb{gcb}/\*(C'\fR. Use
+whichever is most convenient for your situation.
+.ie n .IP """\eb{lb}""" 4
+.el .IP \f(CW\eb{lb}\fR 4
+.IX Item "b{lb}"
+This matches according to the default Unicode Line Breaking Algorithm
+(<https://www.unicode.org/reports/tr14/>), as customized in that
+document
+(Example 7 of revision 35 <https://www.unicode.org/reports/tr14/tr14-35.html#Example7>)
+for better handling of numeric expressions.
+.Sp
+This is suitable for many purposes, but the Unicode::LineBreak module,
+available on CPAN, provides many more features, including
+customization.
+.ie n .IP """\eb{sb}""" 4
+.el .IP \f(CW\eb{sb}\fR 4
+.IX Item "b{sb}"
+This matches a Unicode "Sentence Boundary". This is an aid to parsing
+natural language sentences. It gives good, but imperfect results. For
+example, it thinks that "Mr. Smith" is two sentences. More details are
+at <https://www.unicode.org/reports/tr29/>. Note also that it thinks
+that anything matching "\eR" (except form feed and vertical tab) is a
+sentence boundary. \f(CW\*(C`\eb{sb}\*(C'\fR works with text designed for
+word-processors which wrap lines
+automatically for display, but hard-coded line boundaries are considered
+to be essentially the ends of text blocks (paragraphs really), and hence
+the ends of sentences. \f(CW\*(C`\eb{sb}\*(C'\fR doesn't do well with text containing
+embedded newlines, like the source text of the document you are reading.
+Such text needs to be preprocessed to get rid of the line separators
+before looking for sentence boundaries. Some people view this as a bug
+in the Unicode standard, and this behavior is quite subject to change in
+future Perl versions.
+.ie n .IP """\eb{wb}""" 4
+.el .IP \f(CW\eb{wb}\fR 4
+.IX Item "b{wb}"
+This matches a Unicode "Word Boundary", but tailored to Perl
+expectations. This gives better (though not
+perfect) results for natural language processing than plain \f(CW\*(C`\eb\*(C'\fR
+(without braces) does. For example, it understands that apostrophes can
+be in the middle of words and that parentheses aren't (see the examples
+below). More details are at <https://www.unicode.org/reports/tr29/>.
+.Sp
+The current Unicode definition of a Word Boundary matches between every
+pair of adjacent white space characters. Perl tailors this, starting in version 5.24, to
+generally not break up spans of white space, just as plain \f(CW\*(C`\eb\*(C'\fR has
+always functioned. This allows \f(CW\*(C`\eb{wb}\*(C'\fR to be a drop-in replacement for
+\&\f(CW\*(C`\eb\*(C'\fR, but with generally better results for natural language
+processing. (The exception to this tailoring is when a span of white
+space is immediately followed by something like U+0303, COMBINING TILDE.
+If the final space character in the span is a horizontal white space, it
+is broken out so that it attaches instead to the combining character.
+To be precise, if a span of white space ends in a horizontal space, and
+the character immediately following the span has any of the Word
+Boundary property values "Extend", "Format" or "ZWJ", then the boundary
+between the final horizontal space character and the rest of the span matches
+\&\f(CW\*(C`\eb{wb}\*(C'\fR. In all other cases the boundary between two white space
+characters matches \f(CW\*(C`\eB{wb}\*(C'\fR.)
+.RE
+.RS 4
+.Sp
+It is important to realize when you use these Unicode boundaries,
+that you are taking a risk that a future version of Perl which contains
+a later version of the Unicode Standard will not work precisely the same
+way as it did when your code was written. These rules are not
+considered stable and have been somewhat more subject to change than the
+rest of the Standard. Unicode reserves the right to change them at
+will, and Perl reserves the right to update its implementation to
+Unicode's new rules. In the past, some changes have been because new
+characters have been added to the Standard which have different
+characteristics than all previous characters, so new rules are
+formulated for handling them. These should not cause any backward
+compatibility issues. But some changes have changed the treatment of
+existing characters because the Unicode Technical Committee has decided
+that the change is warranted for whatever reason. This could be to fix
+a bug, or because they think better results are obtained with the new
+rule.
+.Sp
+It is also important to realize that these are default boundary
+definitions, and that implementations may wish to tailor the results for
+particular purposes and locales. For example, some languages, such as
+Japanese and Thai, require dictionary lookup to accurately determine
+word boundaries.
+.Sp
+Mnemonic: \fIb\fRoundary.
+.RE
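+.PP
+The preprocessing that \f(CW\*(C`\eb{sb}\*(C'\fR needs for text with embedded newlines
+can be sketched as follows. This is only an illustration: the sample
+text and the variable names are assumptions made for the example, and
+\&\f(CW\*(C`\eb{sb}\*(C'\fR requires Perl 5.22 or later.
+.PP
+.Vb 11
+\& use strict;
+\& use warnings;
+\&
+\& # Hypothetical sample: one sentence wrapped across two lines.
+\& my $text = "This sentence is\enwrapped. Here is another.";
+\&
+\& # Replace every generic newline with a space, then split the
+\& # flattened text at sentence boundaries.
+\& (my $flat = $text) =~ s/\eR/ /g;
+\& my @sentences = $flat =~ m/ ( .+? \eb{sb} ) /xsg;
+\& print scalar(@sentences), " sentences\en";
+.Ve
+.PP
+Replacing each line separator with a space, rather than deleting it,
+keeps the words on either side of the separator apart.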
+.PP
+Examples
+.IX Subsection "Examples"
+.PP
+.Vb 4
+\& "cat" =~ /\eAcat/; # Match.
+\& "cat" =~ /cat\eZ/; # Match.
+\& "cat\en" =~ /cat\eZ/; # Match.
+\& "cat\en" =~ /cat\ez/; # No match.
+\&
+\& "cat" =~ /\ebcat\eb/; # Match.
+\& "cats" =~ /\ebcat\eb/; # No match.
+\& "cat" =~ /\ebcat\eB/; # No match.
+\& "cats" =~ /\ebcat\eB/; # Match.
+\&
+\& while ("cat dog" =~ /(\ew+)/g) {
+\& print $1; # Prints \*(Aqcatdog\*(Aq
+\& }
+\& while ("cat dog" =~ /\eG(\ew+)/g) {
+\& print $1; # Prints \*(Aqcat\*(Aq
+\& }
+\&
+\& my $s = "He said, \e"Is pi 3.14? (I\*(Aqm not sure).\e"";
+\& print join("|", $s =~ m/ ( .+? \eb ) /xg), "\en";
+\& print join("|", $s =~ m/ ( .+? \eb{wb} ) /xg), "\en";
+\& prints
+\& He| |said|, "|Is| |pi| |3|.|14|? (|I|\*(Aq|m| |not| |sure
+\& He| |said|,| |"|Is| |pi| |3.14|?| |(|I\*(Aqm| |not| |sure|)|.|"
+.Ve
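+.PP
+The equivalence between \f(CW\*(C`\eX\*(C'\fR and \f(CW\*(C`/.+?\eb{gcb}/\*(C'\fR described under
+\&\f(CW\*(C`\eb{gcb}\*(C'\fR can be checked with a short sketch. The sample string and
+the variable names are assumptions chosen for illustration, and
+\&\f(CW\*(C`\eb{gcb}\*(C'\fR requires Perl 5.22 or later.
+.PP
+.Vb 11
+\& use strict;
+\& use warnings;
+\&
+\& # "e" followed by U+0301 COMBINING ACUTE ACCENT, then a plain "a".
+\& my $s = "e\ex{301}a";
+\&
+\& my @by_x = $s =~ m/ ( \eX ) /xg;
+\& my @by_gcb = $s =~ m/ ( .+? \eb{gcb} ) /xg;
+\&
+\& # Both lists hold the same two clusters, although length($s) is 3.
+\& print scalar(@by_x), " ", scalar(@by_gcb), " ", length($s), "\en";
+.Ve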
+.SS Misc
+.IX Subsection "Misc"
+Here we document the backslash sequences that don't fall in one of the
+categories above. These are:
+.IP \eK 4
+.IX Item "K"
+This appeared in perl 5.10.0. Anything matched left of \f(CW\*(C`\eK\*(C'\fR is
+not included in \f(CW$&\fR, and will not be replaced if the pattern is
+used in a substitution. This lets you write \f(CW\*(C`s/PAT1 \eK PAT2/REPL/x\*(C'\fR
+instead of \f(CW\*(C`s/(PAT1) PAT2/${1}REPL/x\*(C'\fR or \f(CW\*(C`s/(?<=PAT1) PAT2/REPL/x\*(C'\fR.
+.Sp
+Mnemonic: \fIK\fReep.
+.IP \eN 4
+.IX Item "N"
+This feature, available starting in v5.12, matches any character
+that is \fBnot\fR a newline. It is a short-hand for writing \f(CW\*(C`[^\en]\*(C'\fR, and is
+identical to the \f(CW\*(C`.\*(C'\fR metasymbol, except under the \f(CW\*(C`/s\*(C'\fR flag, which changes
+the meaning of \f(CW\*(C`.\*(C'\fR, but not \f(CW\*(C`\eN\*(C'\fR.
+.Sp
+Note that \f(CW\*(C`\eN{...}\*(C'\fR can mean a
+named or numbered character.
+.Sp
+Mnemonic: Complement of \fI\en\fR.
+.IP \eR 4
+.IX Xref "\\R"
+.IX Item "R"
+\&\f(CW\*(C`\eR\*(C'\fR matches a \fIgeneric newline\fR; that is, anything considered a
+linebreak sequence by Unicode. This includes all characters matched by
+\&\f(CW\*(C`\ev\*(C'\fR (vertical whitespace), and the multi character sequence \f(CW"\ex0D\ex0A"\fR
+(carriage return followed by a line feed, sometimes called the network
+newline; it's the end of line sequence used in Microsoft text files opened
+in binary mode). \f(CW\*(C`\eR\*(C'\fR is equivalent to \f(CW\*(C`(?>\ex0D\ex0A|\ev)\*(C'\fR. (The
+reason it doesn't backtrack is that the sequence is considered
+inseparable. That means that
+.Sp
+.Vb 1
+\& "\ex0D\ex0A" =~ /^\eR\ex0A$/ # No match
+.Ve
+.Sp
+fails, because the \f(CW\*(C`\eR\*(C'\fR matches the entire string, and won't backtrack
+to match just the \f(CW"\ex0D"\fR.) Since
+\&\f(CW\*(C`\eR\*(C'\fR can match a sequence of more than one character, it cannot be put
+inside a bracketed character class; \f(CW\*(C`/[\eR]/\*(C'\fR is an error; use \f(CW\*(C`\ev\*(C'\fR
+instead. \f(CW\*(C`\eR\*(C'\fR was introduced in perl 5.10.0.
+.Sp
+Note that this does not respect any locale that might be in effect; it
+matches according to the platform's native character set.
+.Sp
+Mnemonic: none really. \f(CW\*(C`\eR\*(C'\fR was picked because PCRE already uses \f(CW\*(C`\eR\*(C'\fR,
+and more importantly because Unicode recommends such a regular expression
+metacharacter, and suggests \f(CW\*(C`\eR\*(C'\fR as its notation.
+.IP \eX 4
+.IX Xref "\\X"
+.IX Item "X"
+This matches a Unicode \fIextended grapheme cluster\fR.
+.Sp
+\&\f(CW\*(C`\eX\*(C'\fR matches quite well what normal (non-Unicode-programmer) usage
+would consider a single character. As an example, consider a G with some sort
+of diacritic mark, such as an arrow. There is no such single character in
+Unicode, but one can be composed by using a G followed by a Unicode "COMBINING
+UPWARDS ARROW BELOW"; Unicode-aware software would display the pair as if it
+were a single character.
+.Sp
+The match is greedy and non-backtracking, so that the cluster is never
+broken up into smaller components.
+.Sp
+See also \f(CW\*(C`\eb{gcb}\*(C'\fR.
+.Sp
+Mnemonic: e\fIX\fRtended Unicode character.
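+.PP
+As a hedged sketch of two points made above, the following compares
+the three substitution forms listed under \f(CW\*(C`\eK\*(C'\fR and shows how \f(CW\*(C`\eN\*(C'\fR
+differs from \f(CW\*(C`.\*(C'\fR under \f(CW\*(C`/s\*(C'\fR. The strings and the variable names
+are assumptions made for the example.
+.PP
+.Vb 18
+\& use strict;
+\& use warnings;
+\&
+\& # Three ways to replace the digits that follow "price: ".
+\& my $with_k = "price: 42 USD";
+\& my $with_group = $with_k;
+\& my $with_behind = $with_k;
+\& $with_k =~ s/price: \eK\ed+/100/;
+\& $with_group =~ s/(price: )\ed+/${1}100/;
+\& $with_behind =~ s/(?<=price: )\ed+/100/;
+\& print "$with_k\en"; # Same text in all three variables.
+\&
+\& # Under /s, "." also matches the newline; \eN still does not.
+\& my @dot = "a\enb" =~ m/(.)/sg; # Three characters.
+\& my @not_n = "a\enb" =~ m/(\eN)/sg; # Two characters.
+\& print scalar(@dot), " ", scalar(@not_n), "\en";
+.Ve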
+.PP
+Examples
+.IX Subsection "Examples"
+.PP
+.Vb 2
+\& $str =~ s/foo\eKbar/baz/g; # Change any \*(Aqbar\*(Aq following a \*(Aqfoo\*(Aq to \*(Aqbaz\*(Aq
+\& $str =~ s/(.)\eK\eg1//g; # Delete duplicated characters.
+\&
+\& "\en" =~ /^\eR$/; # Match, \en is a generic newline.
+\& "\er" =~ /^\eR$/; # Match, \er is a generic newline.
+\& "\er\en" =~ /^\eR$/; # Match, \er\en is a generic newline.
+\&
+\& "P\ex{307}" =~ /^\eX$/; # \eX matches a P with a dot above.
+.Ve
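+.PP
+Two common uses of \f(CW\*(C`\eR\*(C'\fR and \f(CW\*(C`\eX\*(C'\fR can be sketched as follows:
+normalizing mixed line endings, and reversing a string without
+separating a combining mark from its base character. The sample
+strings and the variable names are assumptions made for the example.
+.PP
+.Vb 14
+\& use strict;
+\& use warnings;
+\&
+\& # Mixed line endings: CRLF, bare CR, and LF.
+\& my $mixed = "one\er\entwo\erthree\en";
+\& (my $unix = $mixed) =~ s/\eR/\en/g;
+\& # Each CRLF pair is matched as a unit and becomes a single "\en".
+\&
+\& # G, U+034E COMBINING UPWARDS ARROW BELOW, then "o".
+\& my $s = "G\ex{34E}o";
+\& my $by_cluster = join "", reverse $s =~ m/\eX/g; # "oG\ex{34E}"
+\& my $by_char = scalar reverse $s; # "o\ex{34E}G"
+.Ve
+.PP
+Reversing by cluster keeps the arrow attached to the G, whereas the
+plain character reversal moves it onto the "o".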