author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
commit | fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch) | |
tree | ce1e3bce06471410239a6f41282e328770aa404a /upstream/fedora-40/man1/perlrequick.1 | |
parent | Initial commit. (diff) | |
Adding upstream version 4.22.0. (upstream/4.22.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/fedora-40/man1/perlrequick.1')
-rw-r--r-- | upstream/fedora-40/man1/perlrequick.1 | 651 |
1 files changed, 651 insertions, 0 deletions
diff --git a/upstream/fedora-40/man1/perlrequick.1 b/upstream/fedora-40/man1/perlrequick.1
new file mode 100644
index 00000000..83b0c982
--- /dev/null
+++ b/upstream/fedora-40/man1/perlrequick.1
@@ -0,0 +1,651 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLREQUICK 1"
+.TH PERLREQUICK 1 2024-01-25 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlrequick \- Perl regular expressions quick start
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This page covers the very basics of understanding, creating and
+using regular expressions ('regexes') in Perl.
+.SH "The Guide"
+.IX Header "The Guide"
+This page assumes you already know things, like what a "pattern" is, and
+the basic syntax of using them. If you don't, see perlretut.
+.SS "Simple word matching"
+.IX Subsection "Simple word matching"
+The simplest regex is simply a word, or more generally, a string of
+characters. A regex consisting of a word matches any string that
+contains that word:
+.PP
+.Vb 1
+\& "Hello World" =~ /World/; # matches
+.Ve
+.PP
+In this statement, \f(CW\*(C`World\*(C'\fR is a regex and the \f(CW\*(C`//\*(C'\fR enclosing
+\&\f(CW\*(C`/World/\*(C'\fR tells Perl to search a string for a match. The operator
+\&\f(CW\*(C`=~\*(C'\fR associates the string with the regex match and produces a true
+value if the regex matched, or false if the regex did not match. In
+our case, \f(CW\*(C`World\*(C'\fR matches the second word in \f(CW"Hello World"\fR, so the
+expression is true. This idea has several variations.
+.PP
+Expressions like this are useful in conditionals:
+.PP
+.Vb 1
+\& print "It matches\en" if "Hello World" =~ /World/;
+.Ve
+.PP
+The sense of the match can be reversed by using \f(CW\*(C`!~\*(C'\fR operator:
+.PP
+.Vb 1
+\& print "It doesn\*(Aqt match\en" if "Hello World" !~ /World/;
+.Ve
+.PP
+The literal string in the regex can be replaced by a variable:
+.PP
+.Vb 2
+\& $greeting = "World";
+\& print "It matches\en" if "Hello World" =~ /$greeting/;
+.Ve
+.PP
+If you're matching against \f(CW$_\fR, the \f(CW\*(C`$_ =~\*(C'\fR part can be omitted:
+.PP
+.Vb 2
+\& $_ = "Hello World";
+\& print "It matches\en" if /World/;
+.Ve
+.PP
+Finally, the \f(CW\*(C`//\*(C'\fR default delimiters for a match can be changed to
+arbitrary delimiters by putting an \f(CW\*(Aqm\*(Aq\fR out front:
+.PP
+.Vb 4
+\& "Hello World" =~ m!World!; # matches, delimited by \*(Aq!\*(Aq
+\& "Hello World" =~ m{World}; # matches, note the matching \*(Aq{}\*(Aq
+\& "/usr/bin/perl" =~ m"/perl"; # matches after \*(Aq/usr/bin\*(Aq,
+\& # \*(Aq/\*(Aq becomes an ordinary char
+.Ve
+.PP
+Regexes must match a part of the string \fIexactly\fR in order for the
+statement to be true:
+.PP
+.Vb 3
+\& "Hello World" =~ /world/; # doesn\*(Aqt match, case sensitive
+\& "Hello World" =~ /o W/; # matches, \*(Aq \*(Aq is an ordinary char
+\& "Hello World" =~ /World /; # doesn\*(Aqt match, no \*(Aq \*(Aq at end
+.Ve
+.PP
+Perl will always match at the earliest possible point in the string:
+.PP
+.Vb 2
+\& "Hello World" =~ /o/; # matches \*(Aqo\*(Aq in \*(AqHello\*(Aq
+\& "That hat is red" =~ /hat/; # matches \*(Aqhat\*(Aq in \*(AqThat\*(Aq
+.Ve
+.PP
+Not all characters can be used 'as is' in a match. Some characters,
+called \fBmetacharacters\fR, are considered special, and reserved for use
+in regex notation. The metacharacters are
+.PP
+.Vb 1
+\& {}[]()^$.|*+?\e
+.Ve
+.PP
+A metacharacter can be matched literally by putting a backslash before
+it:
+.PP
+.Vb 4
+\& "2+2=4" =~ /2+2/; # doesn\*(Aqt match, + is a metacharacter
+\& "2+2=4" =~ /2\e+2/; # matches, \e+ is treated like an ordinary +
+\& \*(AqC:\eWIN32\*(Aq =~ /C:\e\eWIN/; # matches
+\& "/usr/bin/perl" =~ /\e/usr\e/bin\e/perl/; # matches
+.Ve
+.PP
+In the last regex, the forward slash \f(CW\*(Aq/\*(Aq\fR is also backslashed,
+because it is used to delimit the regex.
+.PP
+Most of the metacharacters aren't always special, and other characters
+(such as the ones delimiting the pattern) become special under various
+circumstances. This can be confusing and lead to unexpected results.
+\&\f(CW\*(C`use\ re\ \*(Aqstrict\*(Aq\*(C'\fR can notify you of potential
+pitfalls.
+.PP
+Non-printable ASCII characters are represented by \fBescape sequences\fR.
+Common examples are \f(CW\*(C`\et\*(C'\fR for a tab, \f(CW\*(C`\en\*(C'\fR for a newline, and \f(CW\*(C`\er\*(C'\fR
+for a carriage return. Arbitrary bytes are represented by octal
+escape sequences, e.g., \f(CW\*(C`\e033\*(C'\fR, or hexadecimal escape sequences,
+e.g., \f(CW\*(C`\ex1B\*(C'\fR:
+.PP
+.Vb 3
+\& "1000\et2000" =~ m(0\et2) # matches
+\& "cat" =~ /\e143\ex61\ex74/ # matches in ASCII, but
+\& # a weird way to spell cat
+.Ve
+.PP
+Regexes are treated mostly as double-quoted strings, so variable
+substitution works:
+.PP
+.Vb 3
+\& $foo = \*(Aqhouse\*(Aq;
+\& \*(Aqcathouse\*(Aq =~ /cat$foo/; # matches
+\& \*(Aqhousecat\*(Aq =~ /${foo}cat/; # matches
+.Ve
+.PP
+With all of the regexes above, if the regex matched anywhere in the
+string, it was considered a match. To specify \fIwhere\fR it should
+match, we would use the \fBanchor\fR metacharacters \f(CW\*(C`^\*(C'\fR and \f(CW\*(C`$\*(C'\fR. The
+anchor \f(CW\*(C`^\*(C'\fR means match at the beginning of the string and the anchor
+\&\f(CW\*(C`$\*(C'\fR means match at the end of the string, or before a newline at the
+end of the string. Some examples:
+.PP
+.Vb 5
+\& "housekeeper" =~ /keeper/; # matches
+\& "housekeeper" =~ /^keeper/; # doesn\*(Aqt match
+\& "housekeeper" =~ /keeper$/; # matches
+\& "housekeeper\en" =~ /keeper$/; # matches
+\& "housekeeper" =~ /^housekeeper$/; # matches
+.Ve
+.SS "Using character classes"
+.IX Subsection "Using character classes"
+A \fBcharacter class\fR allows a set of possible characters, rather than
+just a single character, to match at a particular point in a regex.
+There are a number of different types of character classes, but usually
+when people use this term, they are referring to the type described in
+this section, which are technically called "Bracketed character
+classes", because they are denoted by brackets \f(CW\*(C`[...]\*(C'\fR, with the set of
+characters to be possibly matched inside. But we'll drop the "bracketed"
+below to correspond with common usage. Here are some examples of
+(bracketed) character classes:
+.PP
+.Vb 3
+\& /cat/; # matches \*(Aqcat\*(Aq
+\& /[bcr]at/; # matches \*(Aqbat\*(Aq, \*(Aqcat\*(Aq, or \*(Aqrat\*(Aq
+\& "abc" =~ /[cab]/; # matches \*(Aqa\*(Aq
+.Ve
+.PP
+In the last statement, even though \f(CW\*(Aqc\*(Aq\fR is the first character in
+the class, the earliest point at which the regex can match is \f(CW\*(Aqa\*(Aq\fR.
+.PP
+.Vb 3
+\& /[yY][eE][sS]/; # match \*(Aqyes\*(Aq in a case\-insensitive way
+\& # \*(Aqyes\*(Aq, \*(AqYes\*(Aq, \*(AqYES\*(Aq, etc.
+\& /yes/i; # also match \*(Aqyes\*(Aq in a case\-insensitive way
+.Ve
+.PP
+The last example shows a match with an \f(CW\*(Aqi\*(Aq\fR \fBmodifier\fR, which makes
+the match case-insensitive.
+.PP
+Character classes also have ordinary and special characters, but the
+sets of ordinary and special characters inside a character class are
+different than those outside a character class. The special
+characters for a character class are \f(CW\*(C`\-]\e^$\*(C'\fR and are matched using an
+escape:
+.PP
+.Vb 5
+\& /[\e]c]def/; # matches \*(Aq]def\*(Aq or \*(Aqcdef\*(Aq
+\& $x = \*(Aqbcr\*(Aq;
+\& /[$x]at/; # matches \*(Aqbat, \*(Aqcat\*(Aq, or \*(Aqrat\*(Aq
+\& /[\e$x]at/; # matches \*(Aq$at\*(Aq or \*(Aqxat\*(Aq
+\& /[\e\e$x]at/; # matches \*(Aq\eat\*(Aq, \*(Aqbat, \*(Aqcat\*(Aq, or \*(Aqrat\*(Aq
+.Ve
+.PP
+The special character \f(CW\*(Aq\-\*(Aq\fR acts as a range operator within character
+classes, so that the unwieldy \f(CW\*(C`[0123456789]\*(C'\fR and \f(CW\*(C`[abc...xyz]\*(C'\fR
+become the svelte \f(CW\*(C`[0\-9]\*(C'\fR and \f(CW\*(C`[a\-z]\*(C'\fR:
+.PP
+.Vb 2
+\& /item[0\-9]/; # matches \*(Aqitem0\*(Aq or ... or \*(Aqitem9\*(Aq
+\& /[0\-9a\-fA\-F]/; # matches a hexadecimal digit
+.Ve
+.PP
+If \f(CW\*(Aq\-\*(Aq\fR is the first or last character in a character class, it is
+treated as an ordinary character.
+.PP
+The special character \f(CW\*(C`^\*(C'\fR in the first position of a character class
+denotes a \fBnegated character class\fR, which matches any character but
+those in the brackets. Both \f(CW\*(C`[...]\*(C'\fR and \f(CW\*(C`[^...]\*(C'\fR must match a
+character, or the match fails. Then
+.PP
+.Vb 4
+\& /[^a]at/; # doesn\*(Aqt match \*(Aqaat\*(Aq or \*(Aqat\*(Aq, but matches
+\& # all other \*(Aqbat\*(Aq, \*(Aqcat, \*(Aq0at\*(Aq, \*(Aq%at\*(Aq, etc.
+\& /[^0\-9]/; # matches a non\-numeric character
+\& /[a^]at/; # matches \*(Aqaat\*(Aq or \*(Aq^at\*(Aq; here \*(Aq^\*(Aq is ordinary
+.Ve
+.PP
+Perl has several abbreviations for common character classes. (These
+definitions are those that Perl uses in ASCII-safe mode with the \f(CW\*(C`/a\*(C'\fR modifier.
+Otherwise they could match many more non-ASCII Unicode characters as
+well. See "Backslash sequences" in perlrecharclass for details.)
+.IP \(bu 4
+\&\ed is a digit and represents
+.Sp
+.Vb 1
+\& [0\-9]
+.Ve
+.IP \(bu 4
+\&\es is a whitespace character and represents
+.Sp
+.Vb 1
+\& [\e \et\er\en\ef]
+.Ve
+.IP \(bu 4
+\&\ew is a word character (alphanumeric or _) and represents
+.Sp
+.Vb 1
+\& [0\-9a\-zA\-Z_]
+.Ve
+.IP \(bu 4
+\&\eD is a negated \ed; it represents any character but a digit
+.Sp
+.Vb 1
+\& [^0\-9]
+.Ve
+.IP \(bu 4
+\&\eS is a negated \es; it represents any non-whitespace character
+.Sp
+.Vb 1
+\& [^\es]
+.Ve
+.IP \(bu 4
+\&\eW is a negated \ew; it represents any non-word character
+.Sp
+.Vb 1
+\& [^\ew]
+.Ve
+.IP \(bu 4
+The period '.' matches any character but "\en"
+.PP
+The \f(CW\*(C`\ed\es\ew\eD\eS\eW\*(C'\fR abbreviations can be used both inside and outside
+of character classes. Here are some in use:
+.PP
+.Vb 7
+\& /\ed\ed:\ed\ed:\ed\ed/; # matches a hh:mm:ss time format
+\& /[\ed\es]/; # matches any digit or whitespace character
+\& /\ew\eW\ew/; # matches a word char, followed by a
+\& # non\-word char, followed by a word char
+\& /..rt/; # matches any two chars, followed by \*(Aqrt\*(Aq
+\& /end\e./; # matches \*(Aqend.\*(Aq
+\& /end[.]/; # same thing, matches \*(Aqend.\*(Aq
+.Ve
+.PP
+The \fBword\ anchor\fR\ \f(CW\*(C`\eb\*(C'\fR matches a boundary between a word
+character and a non-word character \f(CW\*(C`\ew\eW\*(C'\fR or \f(CW\*(C`\eW\ew\*(C'\fR:
+.PP
+.Vb 4
+\& $x = "Housecat catenates house and cat";
+\& $x =~ /\ebcat/; # matches cat in \*(Aqcatenates\*(Aq
+\& $x =~ /cat\eb/; # matches cat in \*(Aqhousecat\*(Aq
+\& $x =~ /\ebcat\eb/; # matches \*(Aqcat\*(Aq at end of string
+.Ve
+.PP
+In the last example, the end of the string is considered a word
+boundary.
+.PP
+For natural language processing (so that, for example, apostrophes are
+included in words), use instead \f(CW\*(C`\eb{wb}\*(C'\fR
+.PP
+.Vb 1
+\& "don\*(Aqt" =~ / .+? \eb{wb} /x; # matches the whole string
+.Ve
+.SS "Matching this or that"
+.IX Subsection "Matching this or that"
+We can match different character strings with the \fBalternation\fR
+metacharacter \f(CW\*(Aq|\*(Aq\fR. To match \f(CW\*(C`dog\*(C'\fR or \f(CW\*(C`cat\*(C'\fR, we form the regex
+\&\f(CW\*(C`dog|cat\*(C'\fR. As before, Perl will try to match the regex at the
+earliest possible point in the string. At each character position,
+Perl will first try to match the first alternative, \f(CW\*(C`dog\*(C'\fR. If
+\&\f(CW\*(C`dog\*(C'\fR doesn't match, Perl will then try the next alternative, \f(CW\*(C`cat\*(C'\fR.
+If \f(CW\*(C`cat\*(C'\fR doesn't match either, then the match fails and Perl moves to
+the next position in the string. Some examples:
+.PP
+.Vb 2
+\& "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
+\& "cats and dogs" =~ /dog|cat|bird/; # matches "cat"
+.Ve
+.PP
+Even though \f(CW\*(C`dog\*(C'\fR is the first alternative in the second regex,
+\&\f(CW\*(C`cat\*(C'\fR is able to match earlier in the string.
+.PP
+.Vb 2
+\& "cats" =~ /c|ca|cat|cats/; # matches "c"
+\& "cats" =~ /cats|cat|ca|c/; # matches "cats"
+.Ve
+.PP
+At a given character position, the first alternative that allows the
+regex match to succeed will be the one that matches. Here, all the
+alternatives match at the first string position, so the first matches.
+.SS "Grouping things and hierarchical matching"
+.IX Subsection "Grouping things and hierarchical matching"
+The \fBgrouping\fR metacharacters \f(CW\*(C`()\*(C'\fR allow a part of a regex to be
+treated as a single unit. Parts of a regex are grouped by enclosing
+them in parentheses. The regex \f(CWhouse(cat|keeper)\fR means match
+\&\f(CW\*(C`house\*(C'\fR followed by either \f(CW\*(C`cat\*(C'\fR or \f(CW\*(C`keeper\*(C'\fR. Some more examples
+are
+.PP
+.Vb 2
+\& /(a|b)b/; # matches \*(Aqab\*(Aq or \*(Aqbb\*(Aq
+\& /(^a|b)c/; # matches \*(Aqac\*(Aq at start of string or \*(Aqbc\*(Aq anywhere
+\&
+\& /house(cat|)/; # matches either \*(Aqhousecat\*(Aq or \*(Aqhouse\*(Aq
+\& /house(cat(s|)|)/; # matches either \*(Aqhousecats\*(Aq or \*(Aqhousecat\*(Aq or
+\& # \*(Aqhouse\*(Aq. Note groups can be nested.
+\&
+\& "20" =~ /(19|20|)\ed\ed/; # matches the null alternative \*(Aq()\ed\ed\*(Aq,
+\& # because \*(Aq20\ed\ed\*(Aq can\*(Aqt match
+.Ve
+.SS "Extracting matches"
+.IX Subsection "Extracting matches"
+The grouping metacharacters \f(CW\*(C`()\*(C'\fR also allow the extraction of the
+parts of a string that matched. For each grouping, the part that
+matched inside goes into the special variables \f(CW$1\fR, \f(CW$2\fR, etc.
+They can be used just as ordinary variables:
+.PP
+.Vb 5
+\& # extract hours, minutes, seconds
+\& $time =~ /(\ed\ed):(\ed\ed):(\ed\ed)/; # match hh:mm:ss format
+\& $hours = $1;
+\& $minutes = $2;
+\& $seconds = $3;
+.Ve
+.PP
+In list context, a match \f(CW\*(C`/regex/\*(C'\fR with groupings will return the
+list of matched values \f(CW\*(C`($1,$2,...)\*(C'\fR. So we could rewrite it as
+.PP
+.Vb 1
+\& ($hours, $minutes, $second) = ($time =~ /(\ed\ed):(\ed\ed):(\ed\ed)/);
+.Ve
+.PP
+If the groupings in a regex are nested, \f(CW$1\fR gets the group with the
+leftmost opening parenthesis, \f(CW$2\fR the next opening parenthesis,
+etc. For example, here is a complex regex and the matching variables
+indicated below it:
+.PP
+.Vb 2
+\& /(ab(cd|ef)((gi)|j))/;
+\& 1 2 34
+.Ve
+.PP
+Associated with the matching variables \f(CW$1\fR, \f(CW$2\fR, ... are
+the \fBbackreferences\fR \f(CW\*(C`\eg1\*(C'\fR, \f(CW\*(C`\eg2\*(C'\fR, ... Backreferences are
+matching variables that can be used \fIinside\fR a regex:
+.PP
+.Vb 1
+\& /(\ew\ew\ew)\es\eg1/; # find sequences like \*(Aqthe the\*(Aq in string
+.Ve
+.PP
+\&\f(CW$1\fR, \f(CW$2\fR, ... should only be used outside of a regex, and \f(CW\*(C`\eg1\*(C'\fR,
+\&\f(CW\*(C`\eg2\*(C'\fR, ... only inside a regex.
+.SS "Matching repetitions"
+.IX Subsection "Matching repetitions"
+The \fBquantifier\fR metacharacters \f(CW\*(C`?\*(C'\fR, \f(CW\*(C`*\*(C'\fR, \f(CW\*(C`+\*(C'\fR, and \f(CW\*(C`{}\*(C'\fR allow us
+to determine the number of repeats of a portion of a regex we
+consider to be a match. Quantifiers are put immediately after the
+character, character class, or grouping that we want to specify. They
+have the following meanings:
+.IP \(bu 4
+\&\f(CW\*(C`a?\*(C'\fR = match 'a' 1 or 0 times
+.IP \(bu 4
+\&\f(CW\*(C`a*\*(C'\fR = match 'a' 0 or more times, i.e., any number of times
+.IP \(bu 4
+\&\f(CW\*(C`a+\*(C'\fR = match 'a' 1 or more times, i.e., at least once
+.IP \(bu 4
+\&\f(CW\*(C`a{n,m}\*(C'\fR = match at least \f(CW\*(C`n\*(C'\fR times, but not more than \f(CW\*(C`m\*(C'\fR
+times.
+.IP \(bu 4
+\&\f(CW\*(C`a{n,}\*(C'\fR = match at least \f(CW\*(C`n\*(C'\fR or more times
+.IP \(bu 4
+\&\f(CW\*(C`a{,n}\*(C'\fR = match \f(CW\*(C`n\*(C'\fR times or fewer
+.IP \(bu 4
+\&\f(CW\*(C`a{n}\*(C'\fR = match exactly \f(CW\*(C`n\*(C'\fR times
+.PP
+Here are some examples:
+.PP
+.Vb 6
+\& /[a\-z]+\es+\ed*/; # match a lowercase word, at least some space, and
+\& # any number of digits
+\& /(\ew+)\es+\eg1/; # match doubled words of arbitrary length
+\& $year =~ /^\ed{2,4}$/; # make sure year is at least 2 but not more
+\& # than 4 digits
+\& $year =~ /^\ed{ 4 }$|^\ed{2}$/; # better match; throw out 3 digit dates
+.Ve
+.PP
+These quantifiers will try to match as much of the string as possible,
+while still allowing the regex to match. So we have
+.PP
+.Vb 5
+\& $x = \*(Aqthe cat in the hat\*(Aq;
+\& $x =~ /^(.*)(at)(.*)$/; # matches,
+\& # $1 = \*(Aqthe cat in the h\*(Aq
+\& # $2 = \*(Aqat\*(Aq
+\& # $3 = \*(Aq\*(Aq (0 matches)
+.Ve
+.PP
+The first quantifier \f(CW\*(C`.*\*(C'\fR grabs as much of the string as possible
+while still having the regex match. The second quantifier \f(CW\*(C`.*\*(C'\fR has
+no string left to it, so it matches 0 times.
+.SS "More matching"
+.IX Subsection "More matching"
+There are a few more things you might want to know about matching
+operators.
+The global modifier \f(CW\*(C`/g\*(C'\fR allows the matching operator to match
+within a string as many times as possible. In scalar context,
+successive matches against a string will have \f(CW\*(C`/g\*(C'\fR jump from match
+to match, keeping track of position in the string as it goes along.
+You can get or set the position with the \f(CWpos()\fR function.
+For example,
+.PP
+.Vb 4
+\& $x = "cat dog house"; # 3 words
+\& while ($x =~ /(\ew+)/g) {
+\& print "Word is $1, ends at position ", pos $x, "\en";
+\& }
+.Ve
+.PP
+prints
+.PP
+.Vb 3
+\& Word is cat, ends at position 3
+\& Word is dog, ends at position 7
+\& Word is house, ends at position 13
+.Ve
+.PP
+A failed match or changing the target string resets the position. If
+you don't want the position reset after failure to match, add the
+\&\f(CW\*(C`/c\*(C'\fR, as in \f(CW\*(C`/regex/gc\*(C'\fR.
+.PP
+In list context, \f(CW\*(C`/g\*(C'\fR returns a list of matched groupings, or if
+there are no groupings, a list of matches to the whole regex. So
+.PP
+.Vb 4
+\& @words = ($x =~ /(\ew+)/g); # matches,
+\& # $word[0] = \*(Aqcat\*(Aq
+\& # $word[1] = \*(Aqdog\*(Aq
+\& # $word[2] = \*(Aqhouse\*(Aq
+.Ve
+.SS "Search and replace"
+.IX Subsection "Search and replace"
+Search and replace is performed using \f(CW\*(C`s/regex/replacement/modifiers\*(C'\fR.
+The \f(CW\*(C`replacement\*(C'\fR is a Perl double-quoted string that replaces in the
+string whatever is matched with the \f(CW\*(C`regex\*(C'\fR. The operator \f(CW\*(C`=~\*(C'\fR is
+also used here to associate a string with \f(CW\*(C`s///\*(C'\fR. If matching
+against \f(CW$_\fR, the \f(CW\*(C`$_\ =~\*(C'\fR can be dropped. If there is a match,
+\&\f(CW\*(C`s///\*(C'\fR returns the number of substitutions made; otherwise it returns
+false. Here are a few examples:
+.PP
+.Vb 5
+\& $x = "Time to feed the cat!";
+\& $x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
+\& $y = "\*(Aqquoted words\*(Aq";
+\& $y =~ s/^\*(Aq(.*)\*(Aq$/$1/; # strip single quotes,
+\& # $y contains "quoted words"
+.Ve
+.PP
+With the \f(CW\*(C`s///\*(C'\fR operator, the matched variables \f(CW$1\fR, \f(CW$2\fR, etc.
+are immediately available for use in the replacement expression. With
+the global modifier, \f(CW\*(C`s///g\*(C'\fR will search and replace all occurrences
+of the regex in the string:
+.PP
+.Vb 4
+\& $x = "I batted 4 for 4";
+\& $x =~ s/4/four/; # $x contains "I batted four for 4"
+\& $x = "I batted 4 for 4";
+\& $x =~ s/4/four/g; # $x contains "I batted four for four"
+.Ve
+.PP
+The non-destructive modifier \f(CW\*(C`s///r\*(C'\fR causes the result of the substitution
+to be returned instead of modifying \f(CW$_\fR (or whatever variable the
+substitute was bound to with \f(CW\*(C`=~\*(C'\fR):
+.PP
+.Vb 3
+\& $x = "I like dogs.";
+\& $y = $x =~ s/dogs/cats/r;
+\& print "$x $y\en"; # prints "I like dogs. I like cats."
+\&
+\& $x = "Cats are great.";
+\& print $x =~ s/Cats/Dogs/r =~ s/Dogs/Frogs/r =~
+\& s/Frogs/Hedgehogs/r, "\en";
+\& # prints "Hedgehogs are great."
+\&
+\& @foo = map { s/[a\-z]/X/r } qw(a b c 1 2 3);
+\& # @foo is now qw(X X X 1 2 3)
+.Ve
+.PP
+The evaluation modifier \f(CW\*(C`s///e\*(C'\fR wraps an \f(CW\*(C`eval{...}\*(C'\fR around the
+replacement string and the evaluated result is substituted for the
+matched substring. Some examples:
+.PP
+.Vb 3
+\& # reverse all the words in a string
+\& $x = "the cat in the hat";
+\& $x =~ s/(\ew+)/reverse $1/ge; # $x contains "eht tac ni eht tah"
+\&
+\& # convert percentage to decimal
+\& $x = "A 39% hit rate";
+\& $x =~ s!(\ed+)%!$1/100!e; # $x contains "A 0.39 hit rate"
+.Ve
+.PP
+The last example shows that \f(CW\*(C`s///\*(C'\fR can use other delimiters, such as
+\&\f(CW\*(C`s!!!\*(C'\fR and \f(CW\*(C`s{}{}\*(C'\fR, and even \f(CW\*(C`s{}//\*(C'\fR. If single quotes are used
+\&\f(CW\*(C`s\*(Aq\*(Aq\*(Aq\*(C'\fR, then the regex and replacement are treated as single-quoted
+strings.
+.SS "The split operator"
+.IX Subsection "The split operator"
+\&\f(CW\*(C`split /regex/, string\*(C'\fR splits \f(CW\*(C`string\*(C'\fR into a list of substrings
+and returns that list. The regex determines the character sequence
+that \f(CW\*(C`string\*(C'\fR is split with respect to. For example, to split a
+string into words, use
+.PP
+.Vb 4
+\& $x = "Calvin and Hobbes";
+\& @word = split /\es+/, $x; # $word[0] = \*(AqCalvin\*(Aq
+\& # $word[1] = \*(Aqand\*(Aq
+\& # $word[2] = \*(AqHobbes\*(Aq
+.Ve
+.PP
+To extract a comma-delimited list of numbers, use
+.PP
+.Vb 4
+\& $x = "1.618,2.718, 3.142";
+\& @const = split /,\es*/, $x; # $const[0] = \*(Aq1.618\*(Aq
+\& # $const[1] = \*(Aq2.718\*(Aq
+\& # $const[2] = \*(Aq3.142\*(Aq
+.Ve
+.PP
+If the empty regex \f(CW\*(C`//\*(C'\fR is used, the string is split into individual
+characters. If the regex has groupings, then the list produced contains
+the matched substrings from the groupings as well:
+.PP
+.Vb 6
+\& $x = "/usr/bin";
+\& @parts = split m!(/)!, $x; # $parts[0] = \*(Aq\*(Aq
+\& # $parts[1] = \*(Aq/\*(Aq
+\& # $parts[2] = \*(Aqusr\*(Aq
+\& # $parts[3] = \*(Aq/\*(Aq
+\& # $parts[4] = \*(Aqbin\*(Aq
+.Ve
+.PP
+Since the first character of \f(CW$x\fR matched the regex, \f(CW\*(C`split\*(C'\fR prepended
+an empty initial element to the list.
+.ie n .SS """use re \*(Aqstrict\*(Aq"""
+.el .SS "\f(CWuse re \*(Aqstrict\*(Aq\fP"
+.IX Subsection "use re strict"
+New in v5.22, this applies stricter rules than otherwise when compiling
+regular expression patterns. It can find things that, while legal, may
+not be what you intended.
+.PP
+See 'strict' in re.
+.SH BUGS
+.IX Header "BUGS"
+None.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+This is just a quick start guide. For a more in-depth tutorial on
+regexes, see perlretut and for the reference page, see perlre.
+.SH "AUTHOR AND COPYRIGHT"
+.IX Header "AUTHOR AND COPYRIGHT"
+Copyright (c) 2000 Mark Kvale
+All rights reserved.
+.PP
+This document may be distributed under the same terms as Perl itself.
+.SS Acknowledgments
+.IX Subsection "Acknowledgments"
+The author would like to thank Mark-Jason Dominus, Tom Christiansen,
+Ilya Zakharevich, Brad Hughes, and Mike Giroux for all their helpful
+comments.