Diffstat (limited to 'upstream/mageia-cauldron/man1/perlop.1')
-rw-r--r-- | upstream/mageia-cauldron/man1/perlop.1 | 4136 |
1 files changed, 4136 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man1/perlop.1 b/upstream/mageia-cauldron/man1/perlop.1 new file mode 100644 index 00000000..c850b418 --- /dev/null +++ b/upstream/mageia-cauldron/man1/perlop.1 @@ -0,0 +1,4136 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLOP 1" +.TH PERLOP 1 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +perlop \- Perl operators and precedence +.IX Xref "operator" +.SH DESCRIPTION +.IX Header "DESCRIPTION" +In Perl, the operator determines what operation is performed, +independent of the type of the operands. 
For example \f(CW\*(C`$x\ +\ $y\*(C'\fR +is always a numeric addition, and if \f(CW$x\fR or \f(CW$y\fR do not contain +numbers, an attempt is made to convert them to numbers first. +.PP +This is in contrast to many other dynamic languages, where the +operation is determined by the type of the first argument. It also +means that Perl has two versions of some operators, one for numeric +and one for string comparison. For example \f(CW\*(C`$x\ ==\ $y\*(C'\fR compares +two numbers for equality, and \f(CW\*(C`$x\ eq\ $y\*(C'\fR compares two strings. +.PP +There are a few exceptions though: \f(CW\*(C`x\*(C'\fR can be either string +repetition or list repetition, depending on the type of the left +operand, and \f(CW\*(C`&\*(C'\fR, \f(CW\*(C`|\*(C'\fR, \f(CW\*(C`^\*(C'\fR and \f(CW\*(C`~\*(C'\fR can be either string or numeric bit +operations. +.SS "Operator Precedence and Associativity" +.IX Xref "operator, precedence precedence associativity" +.IX Subsection "Operator Precedence and Associativity" +Operator precedence and associativity work in Perl more or less like +they do in mathematics. +.PP +\&\fIOperator precedence\fR means some operators group more tightly than others. +For example, in \f(CW\*(C`2 + 4 * 5\*(C'\fR, the multiplication has higher precedence, so \f(CW\*(C`4 +* 5\*(C'\fR is grouped together as the right-hand operand of the addition, rather +than \f(CW\*(C`2 + 4\*(C'\fR being grouped together as the left-hand operand of the +multiplication. It is as if the expression were written \f(CW\*(C`2 + (4 * 5)\*(C'\fR, not +\&\f(CW\*(C`(2 + 4) * 5\*(C'\fR. So the expression yields \f(CW\*(C`2 + 20 == 22\*(C'\fR, rather than +\&\f(CW\*(C`6 * 5 == 30\*(C'\fR. +.PP +\&\fIOperator associativity\fR defines what happens if a sequence of the same +operators is used one after another: +usually that they will be grouped at the left +or the right. 
For example, in \f(CW\*(C`9 \- 3 \- 2\*(C'\fR, subtraction is left associative, +so \f(CW\*(C`9 \- 3\*(C'\fR is grouped together as the left-hand operand of the second +subtraction, rather than \f(CW\*(C`3 \- 2\*(C'\fR being grouped together as the right-hand +operand of the first subtraction. It is as if the expression were written +\&\f(CW\*(C`(9 \- 3) \- 2\*(C'\fR, not \f(CW\*(C`9 \- (3 \- 2)\*(C'\fR. So the expression yields \f(CW\*(C`6 \- 2 == 4\*(C'\fR, +rather than \f(CW\*(C`9 \- 1 == 8\*(C'\fR. +.PP +For simple operators that evaluate all their operands and then combine the +values in some way, precedence and associativity (and parentheses) imply some +ordering requirements on those combining operations. For example, in \f(CW2 + 4 * +5\fR, the grouping implied by precedence means that the multiplication of 4 and +5 must be performed before the addition of 2 and 20, simply because the result +of that multiplication is required as one of the operands of the addition. But +the order of operations is not fully determined by this: in \f(CW\*(C`2 * 2 + 4 * 5\*(C'\fR +both multiplications must be performed before the addition, but the grouping +does not say anything about the order in which the two multiplications are +performed. In fact Perl has a general rule that the operands of an operator +are evaluated in left-to-right order. A few operators such as \f(CW\*(C`&&=\*(C'\fR have +special evaluation rules that can result in an operand not being evaluated at +all; in general, the top-level operator in an expression has control of +operand evaluation. +.PP +Some comparison operators, as their associativity, \fIchain\fR with some +operators of the same precedence (but never with operators of different +precedence). This chaining means that each comparison is performed +on the two arguments surrounding it, with each interior argument taking +part in two comparisons, and the comparison results are implicitly ANDed. 
+Thus \f(CW"$x\ <\ $y\ <=\ $z"\fR behaves exactly like \f(CW"$x\ <\ $y\ &&\ $y\ <=\ $z"\fR, assuming that \f(CW"$y"\fR is as simple a scalar as +it looks. The ANDing short-circuits just like \f(CW"&&"\fR does, stopping +the sequence of comparisons as soon as one yields false. +.PP +In a chained comparison, each argument expression is evaluated at most +once, even if it takes part in two comparisons, but the result of the +evaluation is fetched for each comparison. (It is not evaluated +at all if the short-circuiting means that it's not required for any +comparisons.) This matters if the computation of an interior argument +is expensive or non-deterministic. For example, +.PP +.Vb 1 +\& if($x < expensive_sub() <= $z) { ... +.Ve +.PP +is not entirely like +.PP +.Vb 1 +\& if($x < expensive_sub() && expensive_sub() <= $z) { ... +.Ve +.PP +but instead closer to +.PP +.Vb 2 +\& my $tmp = expensive_sub(); +\& if($x < $tmp && $tmp <= $z) { ... +.Ve +.PP +in that the subroutine is only called once. However, it's not exactly +like this latter code either, because the chained comparison doesn't +actually involve any temporary variable (named or otherwise): there is +no assignment. This doesn't make much difference where the expression +is a call to an ordinary subroutine, but matters more with an lvalue +subroutine, or if the argument expression yields some unusual kind of +scalar by other means. For example, if the argument expression yields +a tied scalar, then the expression is evaluated to produce that scalar +at most once, but the value of that scalar may be fetched up to twice, +once for each comparison in which it is actually used. +.PP +In this example, the expression is evaluated only once, and the tied +scalar (the result of the expression) is fetched for each comparison that +uses it. +.PP +.Vb 1 +\& if ($x < $tied_scalar < $z) { ... 
+.Ve +.PP +In the next example, the expression is evaluated only once, and the tied +scalar is fetched once as part of the operation within the expression. +The result of that operation is fetched for each comparison, which +normally doesn't matter unless that expression result is also magical due +to operator overloading. +.PP +.Vb 1 +\& if ($x < $tied_scalar + 42 < $z) { ... +.Ve +.PP +Some operators are instead non-associative, meaning that it is a syntax +error to use a sequence of those operators of the same precedence. +For example, \f(CW"$x\ ..\ $y\ ..\ $z"\fR is an error. +.PP +Perl operators have the following associativity and precedence, +listed from highest precedence to lowest. Operators borrowed from +C keep the same precedence relationship with each other, even where +C's precedence is slightly screwy. (This makes learning Perl easier +for C folks.) With very few exceptions, these all operate on scalar +values only, not array values. +.PP +.Vb 10 +\& left terms and list operators (leftward) +\& left \-> +\& nonassoc ++ \-\- +\& right ** +\& right ! ~ ~. \e and unary + and \- +\& left =~ !~ +\& left * / % x +\& left + \- . +\& left << >> +\& nonassoc named unary operators +\& nonassoc isa +\& chained < > <= >= lt gt le ge +\& chain/na == != eq ne <=> cmp ~~ +\& left & &. +\& left | |. ^ ^. +\& left && +\& left || // +\& nonassoc .. ... +\& right ?: +\& right = += \-= *= etc. goto last next redo dump +\& left , => +\& nonassoc list operators (rightward) +\& right not +\& left and +\& left or xor +.Ve +.PP +In the following sections, these operators are covered in detail, in the +same order in which they appear in the table above. +.PP +Many operators can be overloaded for objects. See overload. +.SS "Terms and List Operators (Leftward)" +.IX Xref "list operator operator, list term" +.IX Subsection "Terms and List Operators (Leftward)" +A TERM has the highest precedence in Perl. 
They include variables, +quote and quote-like operators, any expression in parentheses, +and any function whose arguments are parenthesized. Actually, there +aren't really functions in this sense, just list operators and unary +operators behaving as functions because you put parentheses around +the arguments. These are all documented in perlfunc. +.PP +If any list operator (\f(CWprint()\fR, etc.) or any unary operator (\f(CWchdir()\fR, etc.) +is followed by a left parenthesis as the next token, the operator and +arguments within parentheses are taken to be of highest precedence, +just like a normal function call. +.PP +In the absence of parentheses, the precedence of list operators such as +\&\f(CW\*(C`print\*(C'\fR, \f(CW\*(C`sort\*(C'\fR, or \f(CW\*(C`chmod\*(C'\fR is either very high or very low depending on +whether you are looking at the left side or the right side of the operator. +For example, in +.PP +.Vb 2 +\& @ary = (1, 3, sort 4, 2); +\& print @ary; # prints 1324 +.Ve +.PP +the commas on the right of the \f(CW\*(C`sort\*(C'\fR are evaluated before the \f(CW\*(C`sort\*(C'\fR, +but the commas on the left are evaluated after. In other words, +list operators tend to gobble up all arguments that follow, and +then act like a simple TERM with regard to the preceding expression. +Be careful with parentheses: +.PP +.Vb 3 +\& # These evaluate exit before doing the print: +\& print($foo, exit); # Obviously not what you want. +\& print $foo, exit; # Nor is this. +\& +\& # These do the print before evaluating exit: +\& (print $foo), exit; # This is what you want. +\& print($foo), exit; # Or this. +\& print ($foo), exit; # Or even this. +.Ve +.PP +Also note that +.PP +.Vb 1 +\& print ($foo & 255) + 1, "\en"; +.Ve +.PP +probably doesn't do what you expect at first glance. The parentheses +enclose the argument list for \f(CW\*(C`print\*(C'\fR which is evaluated (printing +the result of \f(CW\*(C`$foo\ &\ 255\*(C'\fR). 
Then one is added to the return value +of \f(CW\*(C`print\*(C'\fR (usually 1). The result is something like this: +.PP +.Vb 1 +\& 1 + 1, "\en"; # Obviously not what you meant. +.Ve +.PP +To do what you meant properly, you must write: +.PP +.Vb 1 +\& print(($foo & 255) + 1, "\en"); +.Ve +.PP +See "Named Unary Operators" for more discussion of this. +.PP +Also parsed as terms are the \f(CW\*(C`do\ {}\*(C'\fR and \f(CW\*(C`eval\ {}\*(C'\fR constructs, as +well as subroutine and method calls, and the anonymous +constructors \f(CW\*(C`[]\*(C'\fR and \f(CW\*(C`{}\*(C'\fR. +.PP +See also "Quote and Quote-like Operators" toward the end of this section, +as well as "I/O Operators". +.SS "The Arrow Operator" +.IX Xref "arrow dereference ->" +.IX Subsection "The Arrow Operator" +"\f(CW\*(C`\->\*(C'\fR" is an infix dereference operator, just as it is in C +and C++. If the right side is either a \f(CW\*(C`[...]\*(C'\fR, \f(CW\*(C`{...}\*(C'\fR, or a +\&\f(CW\*(C`(...)\*(C'\fR subscript, then the left side must be either a hard or +symbolic reference to an array, a hash, or a subroutine respectively. +(Or technically speaking, a location capable of holding a hard +reference, if it's an array or hash reference being used for +assignment.) See perlreftut and perlref. +.PP +Otherwise, the right side is a method name or a simple scalar +variable containing either the method name or a subroutine reference, +and (if it is a method name) the left side must be either an object (a +blessed reference) or a class name (that is, a package name). See +perlobj. +.PP +The dereferencing cases (as opposed to method-calling cases) are +somewhat extended by the \f(CW\*(C`postderef\*(C'\fR feature. For the +details of that feature, consult "Postfix Dereference Syntax" in perlref. +.SS "Auto-increment and Auto-decrement" +.IX Xref "increment auto-increment ++ decrement auto-decrement --" +.IX Subsection "Auto-increment and Auto-decrement" +\&\f(CW"++"\fR and \f(CW"\-\-"\fR work as in C. 
That is, if placed before a variable, +they increment or decrement the variable by one before returning the +value, and if placed after, increment or decrement after returning the +value. +.PP +.Vb 3 +\& $i = 0; $j = 0; +\& print $i++; # prints 0 +\& print ++$j; # prints 1 +.Ve +.PP +Note that just as in C, Perl doesn't define \fBwhen\fR the variable is +incremented or decremented. You just know it will be done sometime +before or after the value is returned. This also means that modifying +a variable twice in the same statement will lead to undefined behavior. +Avoid statements like: +.PP +.Vb 2 +\& $i = $i ++; +\& print ++ $i + $i ++; +.Ve +.PP +Perl will not guarantee what the result of the above statements is. +.PP +The auto-increment operator has a little extra builtin magic to it. If +you increment a variable that is numeric, or that has ever been used in +a numeric context, you get a normal increment. If, however, the +variable has been used in only string contexts since it was set, and +has a value that is not the empty string and matches the pattern +\&\f(CW\*(C`/^[a\-zA\-Z]*[0\-9]*\ez/\*(C'\fR, the increment is done as a string, preserving each +character within its range, with carry: +.PP +.Vb 4 +\& print ++($foo = "99"); # prints "100" +\& print ++($foo = "a0"); # prints "a1" +\& print ++($foo = "Az"); # prints "Ba" +\& print ++($foo = "zz"); # prints "aaa" +.Ve +.PP +\&\f(CW\*(C`undef\*(C'\fR is always treated as numeric, and in particular is changed +to \f(CW0\fR before incrementing (so that a post-increment of an undef value +will return \f(CW0\fR rather than \f(CW\*(C`undef\*(C'\fR). +.PP +The auto-decrement operator is not magical. +.SS Exponentiation +.IX Xref "** exponentiation power" +.IX Subsection "Exponentiation" +Binary \f(CW"**"\fR is the exponentiation operator. It binds even more +tightly than unary minus, so \f(CW\*(C`\-2**4\*(C'\fR is \f(CW\*(C`\-(2**4)\*(C'\fR, not \f(CW\*(C`(\-2)**4\*(C'\fR. 
+(This is +implemented using C's \f(CWpow(3)\fR function, which actually works on doubles +internally.) +.PP +Note that certain exponentiation expressions are ill-defined: +these include \f(CW\*(C`0**0\*(C'\fR, \f(CW\*(C`1**Inf\*(C'\fR, and \f(CW\*(C`Inf**0\*(C'\fR. Do not expect +any particular results from these special cases, the results +are platform-dependent. +.SS "Symbolic Unary Operators" +.IX Xref "unary operator operator, unary" +.IX Subsection "Symbolic Unary Operators" +Unary \f(CW"!"\fR performs logical negation, that is, "not". See also +\&\f(CW\*(C`not\*(C'\fR for a lower precedence version of this. +.IX Xref "!" +.PP +Unary \f(CW"\-"\fR performs arithmetic negation if the operand is numeric, +including any string that looks like a number. If the operand is +an identifier, a string consisting of a minus sign concatenated +with the identifier is returned. Otherwise, if the string starts +with a plus or minus, a string starting with the opposite sign is +returned. One effect of these rules is that \f(CW\*(C`\-bareword\*(C'\fR is equivalent +to the string \f(CW"\-bareword"\fR. If, however, the string begins with a +non-alphabetic character (excluding \f(CW"+"\fR or \f(CW"\-"\fR), Perl will attempt +to convert +the string to a numeric, and the arithmetic negation is performed. If the +string cannot be cleanly converted to a numeric, Perl will give the warning +\&\fBArgument "the string" isn't numeric in negation (\-) at ...\fR. +.IX Xref "- negation, arithmetic" +.PP +Unary \f(CW"~"\fR performs bitwise negation, that is, 1's complement. For +example, \f(CW\*(C`0666\ &\ ~027\*(C'\fR is 0640. (See also "Integer Arithmetic" and +"Bitwise String Operators".) Note that the width of the result is +platform-dependent: \f(CW\*(C`~0\*(C'\fR is 32 bits wide on a 32\-bit platform, but 64 +bits wide on a 64\-bit platform, so if you are expecting a certain bit +width, remember to use the \f(CW"&"\fR operator to mask off the excess bits. 
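The masking advice above can be sketched in Perl as follows. This is a minimal illustration and not part of the upstream manual page: the variable names are invented for the example, and the 32-bit mask is only one possible width.

```perl
use strict;
use warnings;

# The manual's own example: clear the bits of 027 from 0666.
my $mode = 0666 & ~027;          # octal 0640

# ~0 is as wide as the platform's integer type, so mask it
# when a fixed width (here 32 bits) is expected.
my $low32 = ~0 & 0xFFFF_FFFF;

printf "%04o\n", $mode;          # 0640
printf "%#x\n", $low32;          # 0xffffffff
```

Masking makes the result the same on 32-bit and 64-bit builds, which matters when the value is compared against a constant or written to a fixed-width field.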
+.IX Xref "~ negation, binary" +.PP +Starting in Perl 5.28, it is a fatal error to try to complement a string +containing a character with an ordinal value above 255. +.PP +If the "bitwise" feature is enabled via \f(CW\*(C`use\ feature\ \*(Aqbitwise\*(Aq\*(C'\fR or \f(CW\*(C`use v5.28\*(C'\fR, then unary +\&\f(CW"~"\fR always treats its argument as a number, and an +alternate form of the operator, \f(CW"~."\fR, always treats its argument as a +string. So \f(CW\*(C`~0\*(C'\fR and \f(CW\*(C`~"0"\*(C'\fR will both give 2**32\-1 on 32\-bit platforms, +whereas \f(CW\*(C`~.0\*(C'\fR and \f(CW\*(C`~."0"\*(C'\fR will both yield \f(CW"\exff"\fR. Until Perl 5.28, +this feature produced a warning in the \f(CW"experimental::bitwise"\fR category. +.PP +Unary \f(CW"+"\fR has no effect whatsoever, even on strings. It is useful +syntactically for separating a function name from a parenthesized expression +that would otherwise be interpreted as the complete list of function +arguments. (See examples above under "Terms and List Operators (Leftward)".) +.IX Xref "+" +.PP +Unary \f(CW"\e"\fR creates references. If its operand is a single sigilled +thing, it creates a reference to that object. If its operand is a +parenthesised list, then it creates references to the things mentioned +in the list. Otherwise it puts its operand in list context, and creates +a list of references to the scalars in the list provided by the operand. +See perlreftut +and perlref. Do not confuse this behavior with the behavior of +backslash within a string, although both forms do convey the notion +of protecting the next thing from interpolation. +.IX Xref "\\ reference backslash" +.SS "Binding Operators" +.IX Xref "binding operator, binding =~ !~" +.IX Subsection "Binding Operators" +Binary \f(CW"=~"\fR binds a scalar expression to a pattern match. Certain operations +search or modify the string \f(CW$_\fR by default. This operator makes that kind +of operation work on some other string. 
The right argument is a search +pattern, substitution, or transliteration. The left argument is what is +supposed to be searched, substituted, or transliterated instead of the default +\&\f(CW$_\fR. When used in scalar context, the return value generally indicates the +success of the operation. The exceptions are substitution (\f(CW\*(C`s///\*(C'\fR) +and transliteration (\f(CW\*(C`y///\*(C'\fR) with the \f(CW\*(C`/r\*(C'\fR (non-destructive) option, +which cause the \fBr\fReturn value to be the result of the substitution. +Behavior in list context depends on the particular operator. +See "Regexp Quote-Like Operators" for details and perlretut for +examples using these operators. +.PP +If the right argument is an expression rather than a search pattern, +substitution, or transliteration, it is interpreted as a search pattern at run +time. Note that this means that its +contents will be interpolated twice, so +.PP +.Vb 1 +\& \*(Aq\e\e\*(Aq =~ q\*(Aq\e\e\*(Aq; +.Ve +.PP +is not ok, as the regex engine will end up trying to compile the +pattern \f(CW\*(C`\e\*(C'\fR, which it will consider a syntax error. +.PP +Binary \f(CW"!~"\fR is just like \f(CW"=~"\fR except the return value is negated in +the logical sense. +.PP +Binary \f(CW"!~"\fR with a non-destructive substitution (\f(CW\*(C`s///r\*(C'\fR) or transliteration +(\f(CW\*(C`y///r\*(C'\fR) is a syntax error. +.SS "Multiplicative Operators" +.IX Xref "operator, multiplicative" +.IX Subsection "Multiplicative Operators" +Binary \f(CW"*"\fR multiplies two numbers. +.IX Xref "*" +.PP +Binary \f(CW"/"\fR divides two numbers. +.IX Xref "slash" +.PP +Binary \f(CW"%"\fR is the modulo operator, which computes the division +remainder of its first argument with respect to its second argument. +Given integer +operands \f(CW$m\fR and \f(CW$n\fR: If \f(CW$n\fR is positive, then \f(CW\*(C`$m\ %\ $n\*(C'\fR is +\&\f(CW$m\fR minus the largest multiple of \f(CW$n\fR less than or equal to +\&\f(CW$m\fR. 
If \f(CW$n\fR is negative, then \f(CW\*(C`$m\ %\ $n\*(C'\fR is \f(CW$m\fR minus the +smallest multiple of \f(CW$n\fR that is not less than \f(CW$m\fR (that is, the +result will be less than or equal to zero). If the operands +\&\f(CW$m\fR and \f(CW$n\fR are floating point values and the absolute value of +\&\f(CW$n\fR (that is \f(CWabs($n)\fR) is less than \f(CW\*(C`(UV_MAX\ +\ 1)\*(C'\fR, only +the integer portion of \f(CW$m\fR and \f(CW$n\fR will be used in the operation +(Note: here \f(CW\*(C`UV_MAX\*(C'\fR means the maximum of the unsigned integer type). +If the absolute value of the right operand (\f(CWabs($n)\fR) is greater than +or equal to \f(CW\*(C`(UV_MAX\ +\ 1)\*(C'\fR, \f(CW"%"\fR computes the floating-point remainder +\&\f(CW$r\fR in the equation \f(CW\*(C`($r\ =\ $m\ \-\ $i*$n)\*(C'\fR where \f(CW$i\fR is a certain +integer that makes \f(CW$r\fR have the same sign as the right operand +\&\f(CW$n\fR (\fBnot\fR as the left operand \f(CW$m\fR like C function \f(CWfmod()\fR) +and the absolute value less than that of \f(CW$n\fR. +Note that when \f(CW\*(C`use\ integer\*(C'\fR is in scope, \f(CW"%"\fR gives you direct access +to the modulo operator as implemented by your C compiler. This +operator is not as well defined for negative operands, but it will +execute faster. +.IX Xref "% remainder modulo mod" +.PP +Binary \f(CW\*(C`x\*(C'\fR is the repetition operator. In scalar context, or if the +left operand is neither enclosed in parentheses nor a \f(CW\*(C`qw//\*(C'\fR list, +it performs a string repetition. In that case it supplies scalar +context to the left operand, and returns a string consisting of the +left operand string repeated the number of times specified by the right +operand. If the \f(CW\*(C`x\*(C'\fR is in list context, and the left operand is either +enclosed in parentheses or a \f(CW\*(C`qw//\*(C'\fR list, it performs a list repetition. 
+In that case it supplies list context to the left operand, and returns +a list consisting of the left operand list repeated the number of times +specified by the right operand. +If the right operand is zero or negative (raising a warning on +negative), it returns an empty string +or an empty list, depending on the context. +.IX Xref "x" +.PP +.Vb 1 +\& print \*(Aq\-\*(Aq x 80; # print row of dashes +\& +\& print "\et" x ($tab/8), \*(Aq \*(Aq x ($tab%8); # tab over +\& +\& @ones = (1) x 80; # a list of 80 1\*(Aqs +\& @ones = (5) x @ones; # set all elements to 5 +.Ve +.SS "Additive Operators" +.IX Xref "operator, additive" +.IX Subsection "Additive Operators" +Binary \f(CW"+"\fR returns the sum of two numbers. +.IX Xref "+" +.PP +Binary \f(CW"\-"\fR returns the difference of two numbers. +.IX Xref "-" +.PP +Binary \f(CW"."\fR concatenates two strings. +.IX Xref "string, concatenation concatenation cat concat concatenate ." +.SS "Shift Operators" +.IX Xref "shift operator operator, shift << >> right shift left shift bitwise shift shl shr shift, right shift, left" +.IX Subsection "Shift Operators" +Binary \f(CW"<<"\fR returns the value of its left argument shifted left by the +number of bits specified by the right argument. Arguments should be +integers. (See also "Integer Arithmetic".) +.PP +Binary \f(CW">>"\fR returns the value of its left argument shifted right by +the number of bits specified by the right argument. Arguments should +be integers. (See also "Integer Arithmetic".) +.PP +If \f(CW\*(C`use\ integer\*(C'\fR (see "Integer Arithmetic") is in force then +signed C integers are used (\fIarithmetic shift\fR), otherwise unsigned C +integers are used (\fIlogical shift\fR), even for negative shiftees. +In arithmetic right shift the sign bit is replicated on the left, +in logical shift zero bits come in from the left. 
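The difference between logical and arithmetic right shift can be sketched as follows. This is an illustrative example rather than text from the manual page; the variable names are invented, and the exact value of the logical result depends on the integer width Perl was built with.

```perl
use strict;
use warnings;

# Without "use integer" the shiftee is treated as unsigned,
# so zero bits come in from the left (logical shift).
my $logical = -8 >> 1;

# Under "use integer" the sign bit is replicated (arithmetic shift).
my $arithmetic = do { use integer; -8 >> 1 };

print "logical:    $logical\n";      # large positive, width-dependent
print "arithmetic: $arithmetic\n";   # -4
```

The `do` block keeps the `use integer` pragma lexically scoped, so the rest of the program continues to use the default unsigned shift.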
+.PP +Either way, the implementation isn't going to generate results larger +than the size of the integer type Perl was built with (32 bits or 64 bits). +.PP +Shifting by a negative number of bits means the reverse shift: left +shift becomes right shift, right shift becomes left shift. This is +unlike in C, where negative shift is undefined. +.PP +Shifting by more bits than the size of the integers means most of the +time zero (all bits fall off), except that under \f(CW\*(C`use\ integer\*(C'\fR +right overshifting a negative shiftee results in \-1. This is unlike +in C, where shifting by too many bits is undefined. A common C +behavior is "shift by modulo wordbits", so that for example +.PP +.Vb 1 +\& 1 >> 64 == 1 >> (64 % 64) == 1 >> 0 == 1 # Common C behavior. +.Ve +.PP +but that is completely accidental. +.PP +If you get tired of being subject to your platform's native integers, +the \f(CW\*(C`use\ bigint\*(C'\fR pragma neatly sidesteps the issue altogether: +.PP +.Vb 5 +\& print 20 << 20; # 20971520 +\& print 20 << 40; # 5120 on 32\-bit machines, +\& # 21990232555520 on 64\-bit machines +\& use bigint; +\& print 20 << 100; # 25353012004564588029934064107520 +.Ve +.SS "Named Unary Operators" +.IX Xref "operator, named unary" +.IX Subsection "Named Unary Operators" +The various named unary operators are treated as functions with one +argument, with optional parentheses. +.PP +If any list operator (\f(CWprint()\fR, etc.) or any unary operator (\f(CWchdir()\fR, etc.) +is followed by a left parenthesis as the next token, the operator and +arguments within parentheses are taken to be of highest precedence, +just like a normal function call. 
For example, +because named unary operators are higher precedence than \f(CW\*(C`||\*(C'\fR: +.PP +.Vb 4 +\& chdir $foo || die; # (chdir $foo) || die +\& chdir($foo) || die; # (chdir $foo) || die +\& chdir ($foo) || die; # (chdir $foo) || die +\& chdir +($foo) || die; # (chdir $foo) || die +.Ve +.PP +but, because \f(CW"*"\fR is higher precedence than named operators: +.PP +.Vb 4 +\& chdir $foo * 20; # chdir ($foo * 20) +\& chdir($foo) * 20; # (chdir $foo) * 20 +\& chdir ($foo) * 20; # (chdir $foo) * 20 +\& chdir +($foo) * 20; # chdir ($foo * 20) +\& +\& rand 10 * 20; # rand (10 * 20) +\& rand(10) * 20; # (rand 10) * 20 +\& rand (10) * 20; # (rand 10) * 20 +\& rand +(10) * 20; # rand (10 * 20) +.Ve +.PP +Regarding precedence, the filetest operators, like \f(CW\*(C`\-f\*(C'\fR, \f(CW\*(C`\-M\*(C'\fR, etc. are +treated like named unary operators, but they don't follow this functional +parenthesis rule. That means, for example, that \f(CW\*(C`\-f($file).".bak"\*(C'\fR is +equivalent to \f(CW\*(C`\-f\ "$file.bak"\*(C'\fR. +.IX Xref "-X filetest operator, filetest" +.PP +See also "Terms and List Operators (Leftward)". +.SS "Relational Operators" +.IX Xref "relational operator operator, relational" +.IX Subsection "Relational Operators" +Perl operators that return true or false generally return values +that can be safely used as numbers. For example, the relational +operators in this section and the equality operators in the next +one return \f(CW1\fR for true and a special version of the defined empty +string, \f(CW""\fR, which counts as a zero but is exempt from warnings +about improper numeric conversions, just as \f(CW"0\ but\ true"\fR is. +.PP +Binary \f(CW"<"\fR returns true if the left argument is numerically less than +the right argument. +.IX Xref "<" +.PP +Binary \f(CW">"\fR returns true if the left argument is numerically greater +than the right argument. 
+.IX Xref ">" +.PP +Binary \f(CW"<="\fR returns true if the left argument is numerically less than +or equal to the right argument. +.IX Xref "<=" +.PP +Binary \f(CW">="\fR returns true if the left argument is numerically greater +than or equal to the right argument. +.IX Xref ">=" +.PP +Binary \f(CW"lt"\fR returns true if the left argument is stringwise less than +the right argument. +.IX Xref "lt" +.PP +Binary \f(CW"gt"\fR returns true if the left argument is stringwise greater +than the right argument. +.IX Xref "gt" +.PP +Binary \f(CW"le"\fR returns true if the left argument is stringwise less than +or equal to the right argument. +.IX Xref "le" +.PP +Binary \f(CW"ge"\fR returns true if the left argument is stringwise greater +than or equal to the right argument. +.IX Xref "ge" +.PP +A sequence of relational operators, such as \f(CW"$x\ <\ $y\ <=\ $z"\fR, performs chained comparisons, in the manner described above in +the section "Operator Precedence and Associativity". +Beware that they do not chain with equality operators, which have lower +precedence. +.SS "Equality Operators" +.IX Xref "equality equal equals operator, equality" +.IX Subsection "Equality Operators" +Binary \f(CW"=="\fR returns true if the left argument is numerically equal to +the right argument. +.IX Xref "==" +.PP +Binary \f(CW"!="\fR returns true if the left argument is numerically not equal +to the right argument. +.IX Xref "!=" +.PP +Binary \f(CW"eq"\fR returns true if the left argument is stringwise equal to +the right argument. +.IX Xref "eq" +.PP +Binary \f(CW"ne"\fR returns true if the left argument is stringwise not equal +to the right argument. +.IX Xref "ne" +.PP +A sequence of the above equality operators, such as \f(CW"$x\ ==\ $y\ ==\ $z"\fR, performs chained comparisons, in the manner described above in +the section "Operator Precedence and Associativity". +Beware that they do not chain with relational operators, which have +higher precedence. 
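The chaining rules for equality operators can be sketched as follows. This is an illustrative example rather than text from the manual page; it assumes Perl 5.32 or later, where chained comparisons are available, and the variable names are invented.

```perl
use v5.32;        # chained comparisons assumed available from 5.32
use warnings;

my ($x, $y, $z) = (5, 5, 5);

# Chained equality: true only when all three values are numerically equal.
my $all_equal = ($x == $y == $z) ? 1 : 0;

# Relational and equality operators do not chain with each other:
# this groups as (1 < 2) == 1 because "<" binds more tightly than "==".
my $mixed = (1 < 2 == 1) ? 1 : 0;

print "all_equal=$all_equal mixed=$mixed\n";
```

The second expression is true only because the relational test yields 1, which then happens to equal the right-hand operand; it does not test that 2 equals 1.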
+.PP +Binary \f(CW"<=>"\fR returns \-1, 0, or 1 depending on whether the left +argument is numerically less than, equal to, or greater than the right +argument. If your platform supports \f(CW\*(C`NaN\*(C'\fR's (not-a-numbers) as numeric +values, using them with \f(CW"<=>"\fR returns undef. \f(CW\*(C`NaN\*(C'\fR is not +\&\f(CW"<"\fR, \f(CW"=="\fR, \f(CW">"\fR, \f(CW"<="\fR or \f(CW">="\fR anything +(even \f(CW\*(C`NaN\*(C'\fR), so those 5 return false. \f(CW\*(C`NaN\ !=\ NaN\*(C'\fR returns +true, as does \f(CW\*(C`NaN\ !=\*(C'\fR\ \fIanything\ else\fR. If your platform doesn't +support \f(CW\*(C`NaN\*(C'\fR's then \f(CW\*(C`NaN\*(C'\fR is just a string with numeric value 0. +.IX Xref "<=> spaceship" +.PP +.Vb 2 +\& $ perl \-le \*(Aq$x = "NaN"; print "No NaN support here" if $x == $x\*(Aq +\& $ perl \-le \*(Aq$x = "NaN"; print "NaN support here" if $x != $x\*(Aq +.Ve +.PP +(Note that the bigint, bigrat, and bignum pragmas all +support \f(CW"NaN"\fR.) +.PP +Binary \f(CW"cmp"\fR returns \-1, 0, or 1 depending on whether the left +argument is stringwise less than, equal to, or greater than the right +argument. +.PP +Here we can see the difference between <=> and cmp, +.PP +.Vb 2 +\& print 10 <=> 2 #prints 1 +\& print 10 cmp 2 #prints \-1 +.Ve +.PP +(likewise between gt and >, lt and <, etc.) +.IX Xref "cmp" +.PP +Binary \f(CW"~~"\fR does a smartmatch between its arguments. Smart matching +is described in the next section. +.IX Xref "~~" +.PP +The two-sided ordering operators \f(CW"<=>"\fR and \f(CW"cmp"\fR, and the +smartmatch operator \f(CW"~~"\fR, are non-associative with respect to each +other and with respect to the equality operators of the same precedence. +.PP +\&\f(CW"lt"\fR, \f(CW"le"\fR, \f(CW"ge"\fR, \f(CW"gt"\fR and \f(CW"cmp"\fR use the collation (sort) +order specified by the current \f(CW\*(C`LC_COLLATE\*(C'\fR locale if a \f(CW\*(C`use\ locale\*(C'\fR form that includes collation is in effect. See perllocale. 
+Do not mix these with Unicode; +use them only with legacy 8\-bit locale encodings. +The standard \f(CW\*(C`Unicode::Collate\*(C'\fR and +\&\f(CW\*(C`Unicode::Collate::Locale\*(C'\fR modules offer much more powerful +solutions to collation issues. +.PP +For case-insensitive comparisons, look at the "fc" in perlfunc case-folding +function, available in Perl v5.16 or later: +.PP +.Vb 1 +\& if ( fc($x) eq fc($y) ) { ... } +.Ve +.SS "Class Instance Operator" +.IX Xref "isa operator" +.IX Subsection "Class Instance Operator" +Binary \f(CW\*(C`isa\*(C'\fR evaluates to true when the left argument is an object instance of +the class (or a subclass derived from that class) given by the right argument. +If the left argument is not defined, is not a blessed object instance, or does +not derive from the class given by the right argument, the operator evaluates +as false. The right argument may give the class either as a bareword or a +scalar expression that yields a string class name: +.PP +.Vb 1 +\& if( $obj isa Some::Class ) { ... } +\& +\& if( $obj isa "Different::Class" ) { ... } +\& if( $obj isa $name_of_class ) { ... } +.Ve +.PP +This feature is available from Perl 5.31.6 onwards when enabled by +\&\f(CW\*(C`use feature \*(Aqisa\*(Aq\*(C'\fR. This feature is enabled automatically by a +\&\f(CW\*(C`use v5.36\*(C'\fR (or higher) declaration in the current scope. +.SS "Smartmatch Operator" +.IX Subsection "Smartmatch Operator" +First available in Perl 5.10.1 (the 5.10.0 version behaved differently), +binary \f(CW\*(C`~~\*(C'\fR does a "smartmatch" between its arguments. This is mostly +used implicitly in the \f(CW\*(C`when\*(C'\fR construct described in perlsyn, although +not all \f(CW\*(C`when\*(C'\fR clauses call the smartmatch operator. Unique among all of +Perl's operators, the smartmatch operator can recurse. The smartmatch +operator is experimental and its behavior is +subject to change.
+.PP +It is also unique in that all other Perl operators impose a context +(usually string or numeric context) on their operands, autoconverting +those operands to those imposed contexts. In contrast, smartmatch +\&\fIinfers\fR contexts from the actual types of its operands and uses that +type information to select a suitable comparison mechanism. +.PP +The \f(CW\*(C`~~\*(C'\fR operator compares its operands "polymorphically", determining how +to compare them according to their actual types (numeric, string, array, +hash, etc.). Like the equality operators with which it shares the same +precedence, \f(CW\*(C`~~\*(C'\fR returns 1 for true and \f(CW""\fR for false. It is often best +read aloud as "in", "inside of", or "is contained in", because the left +operand is often looked for \fIinside\fR the right operand. That makes the +order of the operands to the smartmatch operator often opposite that of +the regular match operator. In other words, the "smaller" thing is usually +placed in the left operand and the larger one in the right. +.PP +The behavior of a smartmatch depends on what type of things its arguments +are, as determined by the following table. The first row of the table +whose types apply determines the smartmatch behavior. Because what +actually happens is mostly determined by the type of the second operand, +the table is sorted on the right operand instead of on the left. +.PP +.Vb 4 +\& Left Right Description and pseudocode +\& =============================================================== +\& Any undef check whether Any is undefined +\& like: !defined Any +\& +\& Any Object invoke ~~ overloading on Object, or die +\& +\& Right operand is an ARRAY: +\& +\& Left Right Description and pseudocode +\& =============================================================== +\& ARRAY1 ARRAY2 recurse on paired elements of ARRAY1 and ARRAY2[2] +\& like: (ARRAY1[0] ~~ ARRAY2[0]) +\& && (ARRAY1[1] ~~ ARRAY2[1]) && ...
+\& HASH ARRAY any ARRAY elements exist as HASH keys +\& like: grep { exists HASH\->{$_} } ARRAY +\& Regexp ARRAY any ARRAY elements pattern match Regexp +\& like: grep { /Regexp/ } ARRAY +\& undef ARRAY undef in ARRAY +\& like: grep { !defined } ARRAY +\& Any ARRAY smartmatch each ARRAY element[3] +\& like: grep { Any ~~ $_ } ARRAY +\& +\& Right operand is a HASH: +\& +\& Left Right Description and pseudocode +\& =============================================================== +\& HASH1 HASH2 all same keys in both HASHes +\& like: keys HASH1 == +\& grep { exists HASH2\->{$_} } keys HASH1 +\& ARRAY HASH any ARRAY elements exist as HASH keys +\& like: grep { exists HASH\->{$_} } ARRAY +\& Regexp HASH any HASH keys pattern match Regexp +\& like: grep { /Regexp/ } keys HASH +\& undef HASH always false (undef cannot be a key) +\& like: 0 == 1 +\& Any HASH HASH key existence +\& like: exists HASH\->{Any} +\& +\& Right operand is CODE: +\& +\& Left Right Description and pseudocode +\& =============================================================== +\& ARRAY CODE sub returns true on all ARRAY elements[1] +\& like: !grep { !CODE\->($_) } ARRAY +\& HASH CODE sub returns true on all HASH keys[1] +\& like: !grep { !CODE\->($_) } keys HASH +\& Any CODE sub passed Any returns true +\& like: CODE\->(Any) +\& +\& Right operand is a Regexp: +\& +\& Left Right Description and pseudocode +\& =============================================================== +\& ARRAY Regexp any ARRAY elements match Regexp +\& like: grep { /Regexp/ } ARRAY +\& HASH Regexp any HASH keys match Regexp +\& like: grep { /Regexp/ } keys HASH +\& Any Regexp pattern match +\& like: Any =~ /Regexp/ +\& +\& Other: +\& +\& Left Right Description and pseudocode +\& =============================================================== +\& Object Any invoke ~~ overloading on Object, +\& or fall back to... 
+\& +\& Any Num numeric equality +\& like: Any == Num +\& Num nummy[4] numeric equality +\& like: Num == nummy +\& undef Any check whether undefined +\& like: !defined(Any) +\& Any Any string equality +\& like: Any eq Any +.Ve +.PP +Notes: +.IP "1. Empty hashes or arrays match." 4 +.IX Item "1. Empty hashes or arrays match." +.PD 0 +.IP "2. That is, each element smartmatches the element of the same index in the other array.[3]" 4 +.IX Item "2. That is, each element smartmatches the element of the same index in the other array.[3]" +.IP "3. If a circular reference is found, fall back to referential equality." 4 +.IX Item "3. If a circular reference is found, fall back to referential equality." +.IP "4. Either an actual number, or a string that looks like one." 4 +.IX Item "4. Either an actual number, or a string that looks like one." +.PD +.PP +The smartmatch implicitly dereferences any non-blessed hash or array +reference, so the \f(CW\*(C`\fR\f(CIHASH\fR\f(CW\*(C'\fR and \f(CW\*(C`\fR\f(CIARRAY\fR\f(CW\*(C'\fR entries apply in those cases. +For blessed references, the \f(CW\*(C`\fR\f(CIObject\fR\f(CW\*(C'\fR entries apply. Smartmatches +involving hashes only consider hash keys, never hash values. +.PP +The "like" code entry is not always an exact rendition. For example, the +smartmatch operator short-circuits whenever possible, but \f(CW\*(C`grep\*(C'\fR does +not. Also, \f(CW\*(C`grep\*(C'\fR in scalar context returns the number of matches, but +\&\f(CW\*(C`~~\*(C'\fR returns only true or false. +.PP +Unlike most operators, the smartmatch operator knows to treat \f(CW\*(C`undef\*(C'\fR +specially: +.PP +.Vb 3 +\& use v5.10.1; +\& @array = (1, 2, 3, undef, 4, 5); +\& say "some elements undefined" if undef ~~ @array; +.Ve +.PP +Each operand is considered in a modified scalar context, the modification +being that array and hash variables are passed by reference to the +operator, which implicitly dereferences them. 
Both elements +of each pair are the same: +.PP +.Vb 1 +\& use v5.10.1; +\& +\& my %hash = (red => 1, blue => 2, green => 3, +\& orange => 4, yellow => 5, purple => 6, +\& black => 7, grey => 8, white => 9); +\& +\& my @array = qw(red blue green); +\& +\& say "some array elements in hash keys" if @array ~~ %hash; +\& say "some array elements in hash keys" if \e@array ~~ \e%hash; +\& +\& say "red in array" if "red" ~~ @array; +\& say "red in array" if "red" ~~ \e@array; +\& +\& say "some keys end in e" if /e$/ ~~ %hash; +\& say "some keys end in e" if /e$/ ~~ \e%hash; +.Ve +.PP +Two arrays smartmatch if each element in the first array smartmatches +(that is, is "in") the corresponding element in the second array, +recursively. +.PP +.Vb 6 +\& use v5.10.1; +\& my @little = qw(red blue green); +\& my @bigger = ("red", "blue", [ "orange", "green" ] ); +\& if (@little ~~ @bigger) { # true! +\& say "little is contained in bigger"; +\& } +.Ve +.PP +Because the smartmatch operator recurses on nested arrays, this +will still report that "red" is in the array. +.PP +.Vb 4 +\& use v5.10.1; +\& my @array = qw(red blue green); +\& my $nested_array = [[[[[[[ @array ]]]]]]]; +\& say "red in array" if "red" ~~ $nested_array; +.Ve +.PP +If two arrays smartmatch each other, then they are deep +copies of each others' values, as this example reports: +.PP +.Vb 3 +\& use v5.12.0; +\& my @a = (0, 1, 2, [3, [4, 5], 6], 7); +\& my @b = (0, 1, 2, [3, [4, 5], 6], 7); +\& +\& if (@a ~~ @b && @b ~~ @a) { +\& say "a and b are deep copies of each other"; +\& } +\& elsif (@a ~~ @b) { +\& say "a smartmatches in b"; +\& } +\& elsif (@b ~~ @a) { +\& say "b smartmatches in a"; +\& } +\& else { +\& say "a and b don\*(Aqt smartmatch each other at all"; +\& } +.Ve +.PP +If you were to set \f(CW\*(C`$b[3]\ =\ 4\*(C'\fR, then instead of reporting that "a and b +are deep copies of each other", it now reports that \f(CW"b smartmatches in a"\fR. 
+That's because the corresponding position in \f(CW@a\fR contains an array that +(eventually) has a 4 in it. +.PP +Smartmatching one hash against another reports whether both contain the +same keys, no more and no less. This could be used to see whether two +records have the same field names, without caring what values those fields +might have. For example: +.PP +.Vb 3 +\& use v5.10.1; +\& sub make_dogtag { +\& state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 }; +\& +\& my ($class, $init_fields) = @_; +\& +\& die "Must supply (only) name, rank, and serial number" +\& unless $init_fields ~~ $REQUIRED_FIELDS; +\& +\& ... +\& } +.Ve +.PP +However, this only does what you mean if \f(CW$init_fields\fR is indeed a hash +reference. The condition \f(CW\*(C`$init_fields ~~ $REQUIRED_FIELDS\*(C'\fR also allows the +strings \f(CW"name"\fR, \f(CW"rank"\fR, \f(CW"serial_num"\fR as well as any array reference +that contains \f(CW"name"\fR or \f(CW"rank"\fR or \f(CW"serial_num"\fR anywhere to pass +through. +.PP +The smartmatch operator is most often used as the implicit operator of a +\&\f(CW\*(C`when\*(C'\fR clause. See the section on "Switch Statements" in perlsyn. +.PP +\fISmartmatching of Objects\fR +.IX Subsection "Smartmatching of Objects" +.PP +To avoid relying on an object's underlying representation, if the +smartmatch's right operand is an object that doesn't overload \f(CW\*(C`~~\*(C'\fR, +it raises the exception "\f(CW\*(C`Smartmatching a non\-overloaded object +breaks encapsulation\*(C'\fR". That's because one has no business digging +around to see whether something is "in" an object. These are all +illegal on objects without a \f(CW\*(C`~~\*(C'\fR overload: +.PP +.Vb 3 +\& %hash ~~ $object +\& 42 ~~ $object +\& "fred" ~~ $object +.Ve +.PP +However, you can change the way an object is smartmatched by overloading +the \f(CW\*(C`~~\*(C'\fR operator. This is allowed to +extend the usual smartmatch semantics. 
+For objects that do have an \f(CW\*(C`~~\*(C'\fR overload, see overload. +.PP +Using an object as the left operand is allowed, although not very useful. +Smartmatching rules take precedence over overloading, so even if the +object in the left operand has smartmatch overloading, this will be +ignored. A left operand that is a non-overloaded object falls back on a +string or numeric comparison of whatever the \f(CW\*(C`ref\*(C'\fR operator returns. That +means that +.PP +.Vb 1 +\& $object ~~ X +.Ve +.PP +does \fInot\fR invoke the overload method with \f(CW\*(C`\fR\f(CIX\fR\f(CW\*(C'\fR as an argument. +Instead the above table is consulted as normal, and based on the type of +\&\f(CW\*(C`\fR\f(CIX\fR\f(CW\*(C'\fR, overloading may or may not be invoked. For simple strings or +numbers, "in" becomes equivalent to this: +.PP +.Vb 2 +\& $object ~~ $number ref($object) == $number +\& $object ~~ $string ref($object) eq $string +.Ve +.PP +For example, this reports that the handle smells IOish +(but please don't really do this!): +.PP +.Vb 5 +\& use IO::Handle; +\& my $fh = IO::Handle\->new(); +\& if ($fh ~~ /\ebIO\eb/) { +\& say "handle smells IOish"; +\& } +.Ve +.PP +That's because it treats \f(CW$fh\fR as a string like +\&\f(CW"IO::Handle=GLOB(0x8039e0)"\fR, then pattern matches against that. +.SS "Bitwise And" +.IX Xref "operator, bitwise, and bitwise and &" +.IX Subsection "Bitwise And" +Binary \f(CW"&"\fR returns its operands ANDed together bit by bit. Although no +warning is currently raised, the result is not well defined when this operation +is performed on operands that aren't either numbers (see +"Integer Arithmetic") nor bitstrings (see "Bitwise String Operators"). 
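+.PP
+A minimal sketch of the difference between numeric and bitstring operands, assuming the "bitwise" feature is not enabled (so neither \f(CW\*(C`use v5.28\*(C'\fR nor \f(CW\*(C`use feature \*(Aqbitwise\*(Aq\*(C'\fR is in scope); the variable names are illustrative only:

```perl
use strict;
use warnings;
# Deliberately no "use v5.28" and no "use feature 'bitwise'": this sketch
# shows the default behavior, where the operand types select the operation.

my $numeric = 12 & 10;        # 0b1100 & 0b1010 = 0b1000, i.e. 8
my $string  = "12" & "10";    # bytewise AND of the characters:
                              # "1" & "1" is "1", "2" & "0" is "0"

print "numeric: $numeric\n";
print "string:  $string\n";
```

+.PP
+The same digits give different answers depending on whether both operands are strings, which is why mixed or ambiguous operands are best avoided.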
+.PP +Note that \f(CW"&"\fR has lower priority than relational operators, so for example +the parentheses are essential in a test like +.PP +.Vb 1 +\& print "Even\en" if ($x & 1) == 0; +.Ve +.PP +If the "bitwise" feature is enabled via \f(CW\*(C`use\ feature\ \*(Aqbitwise\*(Aq\*(C'\fR or +\&\f(CW\*(C`use v5.28\*(C'\fR, then this operator always treats its operands as numbers. +Before Perl 5.28 this feature produced a warning in the +\&\f(CW"experimental::bitwise"\fR category. +.SS "Bitwise Or and Exclusive Or" +.IX Xref "operator, bitwise, or bitwise or | operator, bitwise, xor bitwise xor ^" +.IX Subsection "Bitwise Or and Exclusive Or" +Binary \f(CW"|"\fR returns its operands ORed together bit by bit. +.PP +Binary \f(CW"^"\fR returns its operands XORed together bit by bit. +.PP +Although no warning is currently raised, the results are not well +defined when these operations are performed on operands that are neither +numbers (see "Integer Arithmetic") nor bitstrings (see "Bitwise String +Operators"). +.PP +Note that \f(CW"|"\fR and \f(CW"^"\fR have lower priority than relational operators, so +for example the parentheses are essential in a test like +.PP +.Vb 1 +\& print "false\en" if (8 | 2) != 10; +.Ve +.PP +If the "bitwise" feature is enabled via \f(CW\*(C`use\ feature\ \*(Aqbitwise\*(Aq\*(C'\fR or +\&\f(CW\*(C`use v5.28\*(C'\fR, then these operators always treat their operands as numbers. +Before Perl 5.28 this feature produced a warning in the +\&\f(CW"experimental::bitwise"\fR category. +.SS "C\-style Logical And" +.IX Xref "&& logical and operator, logical, and" +.IX Subsection "C-style Logical And" +Binary \f(CW"&&"\fR performs a short-circuit logical AND operation. That is, +if the left operand is false, the right operand is not even evaluated. +Scalar or list context propagates down to the right operand if it +is evaluated.
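+.PP
+The short-circuit and context rules can be sketched as follows; \f(CW\*(C`right_side\*(C'\fR is a hypothetical helper that only counts how often it is called:

```perl
use strict;
use warnings;

my $calls = 0;
sub right_side { $calls++; return "right value" }   # hypothetical helper

my ($false, $true) = (0, 1);

my $skipped = $false && right_side();   # left is false: right_side() is not called
my $taken   = $true  && right_side();   # left is true: right_side() runs once

# List context propagates to the right operand when it is evaluated.
my @list = $true && (4, 5, 6);

print "calls: $calls\n";
print "skipped: $skipped\n";   # the false left operand itself
print "taken: $taken\n";
print "list size: ", scalar(@list), "\n";
```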
+.SS "C\-style Logical Or" +.IX Xref "|| operator, logical, or" +.IX Subsection "C-style Logical Or" +Binary \f(CW"||"\fR performs a short-circuit logical OR operation. That is, +if the left operand is true, the right operand is not even evaluated. +Scalar or list context propagates down to the right operand if it +is evaluated. +.SS "Logical Defined-Or" +.IX Xref "operator, logical, defined-or" +.IX Subsection "Logical Defined-Or" +Although it has no direct equivalent in C, Perl's \f(CW\*(C`//\*(C'\fR operator is related +to its C\-style "or". In fact, it's exactly the same as \f(CW\*(C`||\*(C'\fR, except that it +tests the left hand side's definedness instead of its truth. Thus, +\&\f(CW\*(C`EXPR1\ //\ EXPR2\*(C'\fR returns the value of \f(CW\*(C`EXPR1\*(C'\fR if it's defined, +otherwise, the value of \f(CW\*(C`EXPR2\*(C'\fR is returned. +(\f(CW\*(C`EXPR1\*(C'\fR is evaluated in scalar context, \f(CW\*(C`EXPR2\*(C'\fR +in the context of \f(CW\*(C`//\*(C'\fR itself). Usually, +this is the same result as \f(CW\*(C`defined(EXPR1)\ ?\ EXPR1\ :\ EXPR2\*(C'\fR (except that +the ternary-operator form can be used as an lvalue, while \f(CW\*(C`EXPR1\ //\ EXPR2\*(C'\fR +cannot). This is very useful for +providing default values for variables. If you actually want to test if +at least one of \f(CW$x\fR and \f(CW$y\fR is defined, use \f(CW\*(C`defined($x\ //\ $y)\*(C'\fR. +.PP +The \f(CW\*(C`||\*(C'\fR, \f(CW\*(C`//\*(C'\fR and \f(CW\*(C`&&\*(C'\fR operators return the last value evaluated +(unlike C's \f(CW\*(C`||\*(C'\fR and \f(CW\*(C`&&\*(C'\fR, which return 0 or 1).
Thus, a reasonably +portable way to find out the home directory might be: +.PP +.Vb 4 +\& $home = $ENV{HOME} +\& // $ENV{LOGDIR} +\& // (getpwuid($<))[7] +\& // die "You\*(Aqre homeless!\en"; +.Ve +.PP +In particular, this means that you shouldn't use this +for selecting between two aggregates for assignment: +.PP +.Vb 3 +\& @a = @b || @c; # This doesn\*(Aqt do the right thing +\& @a = scalar(@b) || @c; # because it really means this. +\& @a = @b ? @b : @c; # This works fine, though. +.Ve +.PP +As alternatives to \f(CW\*(C`&&\*(C'\fR and \f(CW\*(C`||\*(C'\fR when used for +control flow, Perl provides the \f(CW\*(C`and\*(C'\fR and \f(CW\*(C`or\*(C'\fR operators (see below). +The short-circuit behavior is identical. The precedence of \f(CW"and"\fR +and \f(CW"or"\fR is much lower, however, so that you can safely use them after a +list operator without the need for parentheses: +.PP +.Vb 2 +\& unlink "alpha", "beta", "gamma" +\& or gripe(), next LINE; +.Ve +.PP +With the C\-style operators that would have been written like this: +.PP +.Vb 2 +\& unlink("alpha", "beta", "gamma") +\& || (gripe(), next LINE); +.Ve +.PP +It would be even more readable to write that this way: +.PP +.Vb 4 +\& unless(unlink("alpha", "beta", "gamma")) { +\& gripe(); +\& next LINE; +\& } +.Ve +.PP +Using \f(CW"or"\fR for assignment is unlikely to do what you want; see below. +.SS "Range Operators" +.IX Xref "operator, range range .. ..." +.IX Subsection "Range Operators" +Binary \f(CW".."\fR is the range operator, which is really two different +operators depending on the context. In list context, it returns a +list of values counting (up by ones) from the left value to the right +value. If the left value is greater than the right value then it +returns the empty list. The range operator is useful for writing +\&\f(CW\*(C`foreach\ (1..10)\*(C'\fR loops and for doing slice operations on arrays. 
In +the current implementation, no temporary array is created when the +range operator is used as the expression in \f(CW\*(C`foreach\*(C'\fR loops, but older +versions of Perl might burn a lot of memory when you write something +like this: +.PP +.Vb 3 +\& for (1 .. 1_000_000) { +\& # code +\& } +.Ve +.PP +The range operator also works on strings, using the magical +auto-increment, see below. +.PP +In scalar context, \f(CW".."\fR returns a boolean value. The operator is +bistable, like a flip-flop, and emulates the line-range (comma) +operator of \fBsed\fR, \fBawk\fR, and various editors. Each \f(CW".."\fR operator +maintains its own boolean state, even across calls to a subroutine +that contains it. It is false as long as its left operand is false. +Once the left operand is true, the range operator stays true until the +right operand is true, \fIAFTER\fR which the range operator becomes false +again. It doesn't become false till the next time the range operator +is evaluated. It can test the right operand and become false on the +same evaluation it became true (as in \fBawk\fR), but it still returns +true once. If you don't want it to test the right operand until the +next evaluation, as in \fBsed\fR, just use three dots (\f(CW"..."\fR) instead of +two. In all other regards, \f(CW"..."\fR behaves just like \f(CW".."\fR does. +.PP +The right operand is not evaluated while the operator is in the +"false" state, and the left operand is not evaluated while the +operator is in the "true" state. The precedence is a little lower +than || and &&. The value returned is either the empty string for +false, or a sequence number (beginning with 1) for true. The sequence +number is reset for each range encountered. The final sequence number +in a range has the string \f(CW"E0"\fR appended to it, which doesn't affect +its numeric value, but gives you something to search for if you want +to exclude the endpoint. 
You can exclude the beginning point by +waiting for the sequence number to be greater than 1. +.PP +If either operand of scalar \f(CW".."\fR is a constant expression, +that operand is considered true if it is equal (\f(CW\*(C`==\*(C'\fR) to the current +input line number (the \f(CW$.\fR variable). +.PP +To be pedantic, the comparison is actually \f(CW\*(C`int(EXPR)\ ==\ int(EXPR)\*(C'\fR, +but that is only an issue if you use a floating point expression; when +implicitly using \f(CW$.\fR as described in the previous paragraph, the +comparison is \f(CW\*(C`int(EXPR)\ ==\ int($.)\*(C'\fR which is only an issue when \f(CW$.\fR +is set to a floating point value and you are not reading from a file. +Furthermore, \f(CW"span"\ ..\ "spat"\fR or \f(CW\*(C`2.18\ ..\ 3.14\*(C'\fR will not do what +you want in scalar context because each of the operands are evaluated +using their integer representation. +.PP +Examples: +.PP +As a scalar operator: +.PP +.Vb 2 +\& if (101 .. 200) { print; } # print 2nd hundred lines, short for +\& # if ($. == 101 .. $. == 200) { print; } +\& +\& next LINE if (1 .. /^$/); # skip header lines, short for +\& # next LINE if ($. == 1 .. /^$/); +\& # (typically in a loop labeled LINE) +\& +\& s/^/> / if (/^$/ .. eof()); # quote body +\& +\& # parse mail messages +\& while (<>) { +\& $in_header = 1 .. /^$/; +\& $in_body = /^$/ .. eof; +\& if ($in_header) { +\& # do something +\& } else { # in body +\& # do something else +\& } +\& } continue { +\& close ARGV if eof; # reset $. each file +\& } +.Ve +.PP +Here's a simple example to illustrate the difference between +the two range operators: +.PP +.Vb 4 +\& @lines = (" \- Foo", +\& "01 \- Bar", +\& "1 \- Baz", +\& " \- Quux"); +\& +\& foreach (@lines) { +\& if (/0/ .. /1/) { +\& print "$_\en"; +\& } +\& } +.Ve +.PP +This program will print only the line containing "Bar". If +the range operator is changed to \f(CW\*(C`...\*(C'\fR, it will also print the +"Baz" line. 
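+.PP
+The sequence numbers and the \f(CW"E0"\fR suffix returned by the scalar range operator can be observed with a small sketch. The loop counter stands in for input lines here, and the operands are ordinary expressions rather than constants, so \f(CW$.\fR is not involved:

```perl
use strict;
use warnings;

my @seen;
for my $n (1 .. 6) {
    # Scalar-context flip-flop: turns on when $n == 2 and
    # turns off after the evaluation where $n == 4.
    my $state = ($n == 2) .. ($n == 4);
    push @seen, "$n:$state";
}

print join(" ", @seen), "\n";
```

+.PP
+While the operator is false it returns the empty string, so those entries show only the counter; the last true value carries the \f(CW"E0"\fR marker.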
+.PP +And now some examples as a list operator: +.PP +.Vb 3 +\& for (101 .. 200) { print } # print $_ 100 times +\& @foo = @foo[0 .. $#foo]; # an expensive no\-op +\& @foo = @foo[$#foo\-4 .. $#foo]; # slice last 5 items +.Ve +.PP +Because each operand is evaluated in integer form, \f(CW\*(C`2.18\ ..\ 3.14\*(C'\fR will +return two elements in list context. +.PP +.Vb 1 +\& @list = (2.18 .. 3.14); # same as @list = (2 .. 3); +.Ve +.PP +The range operator in list context can make use of the magical +auto-increment algorithm if both operands are strings, subject to the +following rules: +.IP \(bu 4 +With one exception (below), if both strings look like numbers to Perl, +the magic increment will not be applied, and the strings will be treated +as numbers (more specifically, integers) instead. +.Sp +For example, \f(CW"\-2".."2"\fR is the same as \f(CW\-2..2\fR, and +\&\f(CW"2.18".."3.14"\fR produces \f(CW\*(C`2, 3\*(C'\fR. +.IP \(bu 4 +The exception to the above rule is when the left-hand string begins with +\&\f(CW0\fR and is longer than one character, in this case the magic increment +\&\fIwill\fR be applied, even though strings like \f(CW"01"\fR would normally look +like a number to Perl. +.Sp +For example, \f(CW"01".."04"\fR produces \f(CW"01", "02", "03", "04"\fR, and +\&\f(CW"00".."\-1"\fR produces \f(CW"00"\fR through \f(CW"99"\fR \- this may seem +surprising, but see the following rules for why it works this way. +To get dates with leading zeros, you can say: +.Sp +.Vb 2 +\& @z2 = ("01" .. "31"); +\& print $z2[$mday]; +.Ve +.Sp +If you want to force strings to be interpreted as numbers, you could say +.Sp +.Vb 1 +\& @numbers = ( 0+$first .. 0+$last ); +.Ve +.Sp +\&\fBNote:\fR In Perl versions 5.30 and below, \fIany\fR string on the left-hand +side beginning with \f(CW"0"\fR, including the string \f(CW"0"\fR itself, would +cause the magic string increment behavior. 
This means that on these Perl +versions, \f(CW"0".."\-1"\fR would produce \f(CW"0"\fR through \f(CW"99"\fR, which was +inconsistent with \f(CW\*(C`0..\-1\*(C'\fR, which produces the empty list. This also means +that \f(CW"0".."9"\fR now produces a list of integers instead of a list of +strings. +.IP \(bu 4 +If the initial value specified isn't part of a magical increment +sequence (that is, a non-empty string matching \f(CW\*(C`/^[a\-zA\-Z]*[0\-9]*\ez/\*(C'\fR), +only the initial value will be returned. +.Sp +For example, \f(CW"ax".."az"\fR produces \f(CW"ax", "ay", "az"\fR, but +\&\f(CW"*x".."az"\fR produces only \f(CW"*x"\fR. +.IP \(bu 4 +For other initial values that are strings that do follow the rules of the +magical increment, the corresponding sequence will be returned. +.Sp +For example, you can say +.Sp +.Vb 1 +\& @alphabet = ("A" .. "Z"); +.Ve +.Sp +to get all normal letters of the English alphabet, or +.Sp +.Vb 1 +\& $hexdigit = (0 .. 9, "a" .. "f")[$num & 15]; +.Ve +.Sp +to get a hexadecimal digit. +.IP \(bu 4 +If the final value specified is not in the sequence that the magical +increment would produce, the sequence goes until the next value would +be longer than the final value specified. If the length of the final +string is shorter than the first, the empty list is returned. +.Sp +For example, \f(CW"a".."\-\-"\fR is the same as \f(CW"a".."zz"\fR, \f(CW"0".."xx"\fR +produces \f(CW"0"\fR through \f(CW"99"\fR, and \f(CW"aaa".."\-\-"\fR returns the empty +list. +.PP +As of Perl 5.26, the list-context range operator on strings works as expected +in the scope of \f(CW"use\ feature\ \*(Aqunicode_strings\*(Aq"\fR. In previous versions, and outside the scope of +that feature, it exhibits "The "Unicode Bug"" in perlunicode: its behavior +depends on the internal encoding of the range endpoint.
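+.PP
+The string-range rules listed above can be collected into one short sketch. The array names are illustrative only, and the results assume a Perl newer than 5.30, where the leading-zero rule is as described:

```perl
use strict;
use warnings;

my @padded  = ("01" .. "04");      # leading zero: magic string increment
my @numeric = ("2.18" .. "3.14");  # both look like numbers: treated as 2 .. 3
my @letters = ("ax" .. "az");      # ordinary magical increment
my @stuck   = ("*x" .. "az");      # "*x" does not fit the pattern: only itself
my @empty   = ("aaa" .. "--");     # final string is shorter than the first

print "padded:  @padded\n";
print "numeric: @numeric\n";
print "letters: @letters\n";
print "stuck:   @stuck\n";
print "empty has ", scalar(@empty), " elements\n";
```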
+.PP +Because the magical increment only works on non-empty strings matching +\&\f(CW\*(C`/^[a\-zA\-Z]*[0\-9]*\ez/\*(C'\fR, the following will only return an alpha: +.PP +.Vb 2 +\& use charnames "greek"; +\& my @greek_small = ("\eN{alpha}" .. "\eN{omega}"); +.Ve +.PP +To get the 25 traditional lowercase Greek letters, including both sigmas, +you could use this instead: +.PP +.Vb 5 +\& use charnames "greek"; +\& my @greek_small = map { chr } ( ord("\eN{alpha}") +\& .. +\& ord("\eN{omega}") +\& ); +.Ve +.PP +However, because there are \fImany\fR other lowercase Greek characters than +just those, to match lowercase Greek characters in a regular expression, +you could use the pattern \f(CW\*(C`/(?:(?=\ep{Greek})\ep{Lower})+/\*(C'\fR (or the +experimental feature \f(CW\*(C`/(?[\ \ep{Greek}\ &\ \ep{Lower}\ ])+/\*(C'\fR). +.SS "Conditional Operator" +.IX Xref "operator, conditional operator, ternary ternary ?:" +.IX Subsection "Conditional Operator" +Ternary \f(CW"?:"\fR is the conditional operator, just as in C. It works much +like an if-then-else. If the argument before the \f(CW\*(C`?\*(C'\fR is true, the +argument before the \f(CW\*(C`:\*(C'\fR is returned, otherwise the argument after the +\&\f(CW\*(C`:\*(C'\fR is returned. For example: +.PP +.Vb 2 +\& printf "I have %d dog%s.\en", $n, +\& ($n == 1) ? "" : "s"; +.Ve +.PP +Scalar or list context propagates downward into the 2nd +or 3rd argument, whichever is selected. +.PP +.Vb 3 +\& $x = $ok ? $y : $z; # get a scalar +\& @x = $ok ? @y : @z; # get an array +\& $x = $ok ? @y : @z; # oops, that\*(Aqs just a count! +.Ve +.PP +The operator may be assigned to if both the 2nd and 3rd arguments are +legal lvalues (meaning that you can assign to them): +.PP +.Vb 1 +\& ($x_or_y ? $x : $y) = $z; +.Ve +.PP +Because this operator produces an assignable result, using assignments +without parentheses will get you in trouble. For example, this: +.PP +.Vb 1 +\& $x % 2 ? 
$x += 10 : $x += 2 +.Ve +.PP +Really means this: +.PP +.Vb 1 +\& (($x % 2) ? ($x += 10) : $x) += 2 +.Ve +.PP +Rather than this: +.PP +.Vb 1 +\& ($x % 2) ? ($x += 10) : ($x += 2) +.Ve +.PP +That should probably be written more simply as: +.PP +.Vb 1 +\& $x += ($x % 2) ? 10 : 2; +.Ve +.SS "Assignment Operators" +.IX Xref "assignment operator, assignment = **= += *= &= <<= &&= -= = |= >>= ||= = .= %= ^= x= &.= |.= ^.=" +.IX Subsection "Assignment Operators" +\&\f(CW"="\fR is the ordinary assignment operator. +.PP +Assignment operators work as in C. That is, +.PP +.Vb 1 +\& $x += 2; +.Ve +.PP +is equivalent to +.PP +.Vb 1 +\& $x = $x + 2; +.Ve +.PP +although without duplicating any side effects that dereferencing the lvalue +might trigger, such as from \f(CWtie()\fR. Other assignment operators work similarly. +The following are recognized: +.PP +.Vb 4 +\& **= += *= &= &.= <<= &&= +\& \-= /= |= |.= >>= ||= +\& .= %= ^= ^.= //= +\& x= +.Ve +.PP +Although these are grouped by family, they all have the precedence +of assignment. These combined assignment operators can only operate on +scalars, whereas the ordinary assignment operator can assign to arrays, +hashes, lists and even references. (See "Context" +and "List value constructors" in perldata, and "Assigning to +References" in perlref.) +.PP +Unlike in C, the scalar assignment operator produces a valid lvalue. +Modifying an assignment is equivalent to doing the assignment and +then modifying the variable that was assigned to. 
This is useful +for modifying a copy of something, like this: +.PP +.Vb 1 +\& ($tmp = $global) =~ tr/13579/24680/; +.Ve +.PP +Although as of 5.14, that can also be accomplished this way: +.PP +.Vb 2 +\& use v5.14; +\& $tmp = ($global =~ tr/13579/24680/r); +.Ve +.PP +Likewise, +.PP +.Vb 1 +\& ($x += 2) *= 3; +.Ve +.PP +is equivalent to +.PP +.Vb 2 +\& $x += 2; +\& $x *= 3; +.Ve +.PP +Similarly, a list assignment in list context produces the list of +lvalues assigned to, and a list assignment in scalar context returns +the number of elements produced by the expression on the right hand +side of the assignment. +.PP +The three dotted bitwise assignment operators (\f(CW\*(C`&.=\*(C'\fR \f(CW\*(C`|.=\*(C'\fR \f(CW\*(C`^.=\*(C'\fR) are new in +Perl 5.22. See "Bitwise String Operators". +.SS "Comma Operator" +.IX Xref "comma operator, comma ," +.IX Subsection "Comma Operator" +Binary \f(CW","\fR is the comma operator. In scalar context it evaluates +its left argument, throws that value away, then evaluates its right +argument and returns that value. This is just like C's comma operator. +.PP +In list context, it's just the list argument separator, and inserts +both its arguments into the list. These arguments are also evaluated +from left to right. +.PP +The \f(CW\*(C`=>\*(C'\fR operator (sometimes pronounced "fat comma") is a synonym +for the comma except that it causes a +word on its left to be interpreted as a string if it begins with a letter +or underscore and is composed only of letters, digits and underscores. +This includes operands that might otherwise be interpreted as operators, +constants, single number v\-strings or function calls. If in doubt about +this behavior, the left operand can be quoted explicitly. +.PP +Otherwise, the \f(CW\*(C`=>\*(C'\fR operator behaves exactly as the comma operator +or list argument separator, according to context.
+.PP +For example: +.PP +.Vb 1 +\& use constant FOO => "something"; +\& +\& my %h = ( FOO => 23 ); +.Ve +.PP +is equivalent to: +.PP +.Vb 1 +\& my %h = ("FOO", 23); +.Ve +.PP +It is \fINOT\fR: +.PP +.Vb 1 +\& my %h = ("something", 23); +.Ve +.PP +The \f(CW\*(C`=>\*(C'\fR operator is helpful in documenting the correspondence +between keys and values in hashes, and other paired elements in lists. +.PP +.Vb 2 +\& %hash = ( $key => $value ); +\& login( $username => $password ); +.Ve +.PP +The special quoting behavior ignores precedence, and hence may apply to +\&\fIpart\fR of the left operand: +.PP +.Vb 1 +\& print time.shift => "bbb"; +.Ve +.PP +That example prints something like \f(CW"1314363215shiftbbb"\fR, because the +\&\f(CW\*(C`=>\*(C'\fR implicitly quotes the \f(CW\*(C`shift\*(C'\fR immediately on its left, ignoring +the fact that \f(CW\*(C`time.shift\*(C'\fR is the entire left operand. +.SS "List Operators (Rightward)" +.IX Xref "operator, list, rightward list operator" +.IX Subsection "List Operators (Rightward)" +On the right side of a list operator, the comma has very low precedence, +such that it controls all comma-separated expressions found there. +The only operators with lower precedence are the logical operators +\&\f(CW"and"\fR, \f(CW"or"\fR, and \f(CW"not"\fR, which may be used to evaluate calls to list +operators without the need for parentheses: +.PP +.Vb 2 +\& open HANDLE, "< :encoding(UTF\-8)", "filename" +\& or die "Can\*(Aqt open: $!\en"; +.Ve +.PP +However, some people find that code harder to read than writing +it with parentheses: +.PP +.Vb 2 +\& open(HANDLE, "< :encoding(UTF\-8)", "filename") +\& or die "Can\*(Aqt open: $!\en"; +.Ve +.PP +in which case you might as well just use the more customary \f(CW"||"\fR operator: +.PP +.Vb 2 +\& open(HANDLE, "< :encoding(UTF\-8)", "filename") +\& || die "Can\*(Aqt open: $!\en"; +.Ve +.PP +See also discussion of list operators in "Terms and List Operators (Leftward)". 
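+.PP
+To see why the low precedence of \f(CW"or"\fR matters after a list operator called without parentheses, consider this sketch. \f(CW\*(C`first_arg\*(C'\fR is a hypothetical helper, declared before use so that it parses like a list operator:

```perl
use strict;
use warnings;

# Hypothetical helper, declared before use so it can be called
# without parentheses like a list operator.
sub first_arg { return $_[0] }

# "or" binds more loosely than the comma, so the whole call is its
# left operand: (first_arg 0, 7) or "fallback"
my $with_or = (first_arg 0, 7 or "fallback");

# "||" binds more tightly than the comma, so it attaches to the last
# argument only: first_arg(0, (7 || "fallback"))
my $with_bars = (first_arg 0, 7 || "fallback");

print "with or: $with_or\n";
print "with ||: $with_bars\n";
```

+.PP
+With \f(CW"||"\fR the fallback never applies to the call's result, which is the situation the parenthesized forms above avoid.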
+.SS "Logical Not" +.IX Xref "operator, logical, not not" +.IX Subsection "Logical Not" +Unary \f(CW"not"\fR returns the logical negation of the expression to its right. +It's the equivalent of \f(CW"!"\fR except for the very low precedence. +.SS "Logical And" +.IX Xref "operator, logical, and and" +.IX Subsection "Logical And" +Binary \f(CW"and"\fR returns the logical conjunction of the two surrounding +expressions. It's equivalent to \f(CW\*(C`&&\*(C'\fR except for the very low +precedence. This means that it short-circuits: the right +expression is evaluated only if the left expression is true. +.SS "Logical or and Exclusive Or" +.IX Xref "operator, logical, or operator, logical, xor operator, logical, exclusive or or xor" +.IX Subsection "Logical or and Exclusive Or" +Binary \f(CW"or"\fR returns the logical disjunction of the two surrounding +expressions. It's equivalent to \f(CW\*(C`||\*(C'\fR except for the very low precedence. +This makes it useful for control flow: +.PP +.Vb 1 +\& print FH $data or die "Can\*(Aqt write to FH: $!"; +.Ve +.PP +This means that it short-circuits: the right expression is evaluated +only if the left expression is false. Due to its precedence, you must +be careful to avoid using it as replacement for the \f(CW\*(C`||\*(C'\fR operator. +It usually works out better for flow control than in assignments: +.PP +.Vb 3 +\& $x = $y or $z; # bug: this is wrong +\& ($x = $y) or $z; # really means this +\& $x = $y || $z; # better written this way +.Ve +.PP +However, when it's a list-context assignment and you're trying to use +\&\f(CW\*(C`||\*(C'\fR for control flow, you probably need \f(CW"or"\fR so that the assignment +takes higher precedence. +.PP +.Vb 2 +\& @info = stat($file) || die; # oops, scalar sense of stat! +\& @info = stat($file) or die; # better, now @info gets its due +.Ve +.PP +Then again, you could always use parentheses. +.PP +Binary \f(CW"xor"\fR returns the exclusive-OR of the two surrounding expressions. 
+It cannot short-circuit (of course). +.PP +There is no low precedence operator for defined-OR. +.SS "C Operators Missing From Perl" +.IX Xref "operator, missing from perl & * typecasting (TYPE)" +.IX Subsection "C Operators Missing From Perl" +Here is what C has that Perl doesn't: +.IP "unary &" 8 +.IX Item "unary &" +Address-of operator. (But see the \f(CW"\e"\fR operator for taking a reference.) +.IP "unary *" 8 +.IX Item "unary *" +Dereference-address operator. (Perl's prefix dereferencing +operators are typed: \f(CW\*(C`$\*(C'\fR, \f(CW\*(C`@\*(C'\fR, \f(CW\*(C`%\*(C'\fR, and \f(CW\*(C`&\*(C'\fR.) +.IP (TYPE) 8 +.IX Item "(TYPE)" +Type-casting operator. +.SS "Quote and Quote-like Operators" +.IX Xref "operator, quote operator, quote-like q qq qx qw m qr s tr ' '' "" """" ` `` << escape sequence escape" +.IX Subsection "Quote and Quote-like Operators" +While we usually think of quotes as literal values, in Perl they +function as operators, providing various kinds of interpolating and +pattern matching capabilities. Perl provides customary quote characters +for these behaviors, but also provides a way for you to choose your +quote character for any of them. In the following table, a \f(CW\*(C`{}\*(C'\fR represents +any pair of delimiters you choose. +.PP +.Vb 11 +\& Customary Generic Meaning Interpolates +\& \*(Aq\*(Aq q{} Literal no +\& "" qq{} Literal yes +\& \`\` qx{} Command yes* +\& qw{} Word list no +\& // m{} Pattern match yes* +\& qr{} Pattern yes* +\& s{}{} Substitution yes* +\& tr{}{} Transliteration no (but see below) +\& y{}{} Transliteration no (but see below) +\& <<EOF here\-doc yes* +\& +\& * unless the delimiter is \*(Aq\*(Aq. 
+.Ve +.PP +Non-bracketing delimiters use the same character fore and aft, but the four +sorts of ASCII brackets (round, angle, square, curly) all nest, which means +that +.PP +.Vb 1 +\& q{foo{bar}baz} +.Ve +.PP +is the same as +.PP +.Vb 1 +\& \*(Aqfoo{bar}baz\*(Aq +.Ve +.PP +Note, however, that this does not always work for quoting Perl code: +.PP +.Vb 1 +\& $s = q{ if($x eq "}") ... }; # WRONG +.Ve +.PP +is a syntax error. The \f(CW\*(C`Text::Balanced\*(C'\fR module (standard as of v5.8, +and from CPAN before then) is able to do this properly. +.PP +There can (and in some cases, must) be whitespace between the operator +and the quoting +characters, except when \f(CW\*(C`#\*(C'\fR is being used as the quoting character. +\&\f(CW\*(C`q#foo#\*(C'\fR is parsed as the string \f(CW\*(C`foo\*(C'\fR, while \f(CW\*(C`q\ #foo#\*(C'\fR is the +operator \f(CW\*(C`q\*(C'\fR followed by a comment. Its argument will be taken +from the next line. This allows you to write: +.PP +.Vb 2 +\& s {foo} # Replace foo +\& {bar} # with bar. +.Ve +.PP +The cases where whitespace must be used are when the quoting character +is a word character (meaning it matches \f(CW\*(C`/\ew/\*(C'\fR): +.PP +.Vb 2 +\& q XfooX # Works: means the string \*(Aqfoo\*(Aq +\& qXfooX # WRONG! +.Ve +.PP +The following escape sequences are available in constructs that interpolate, +and in transliterations whose delimiters aren't single quotes (\f(CW"\*(Aq"\fR). +In all the ones with braces, any number of blanks and/or tabs adjoining +and within the braces are allowed (and ignored). 
+.IX Xref "\\t \\n \\r \\f \\b \\a \\e \\x \\0 \\c \\N \\N{} \\o{}" +.PP +.Vb 10 +\& Sequence Note Description +\& \et tab (HT, TAB) +\& \en newline (NL) +\& \er return (CR) +\& \ef form feed (FF) +\& \eb backspace (BS) +\& \ea alarm (bell) (BEL) +\& \ee escape (ESC) +\& \ex{263A} [1,8] hex char (example shown: SMILEY) +\& \ex{ 263A } Same, but shows optional blanks inside and +\& adjoining the braces +\& \ex1b [2,8] restricted range hex char (example: ESC) +\& \eN{name} [3] named Unicode character or character sequence +\& \eN{U+263D} [4,8] Unicode character (example: FIRST QUARTER MOON) +\& \ec[ [5] control char (example: chr(27)) +\& \eo{23072} [6,8] octal char (example: SMILEY) +\& \e033 [7,8] restricted range octal char (example: ESC) +.Ve +.PP +Note that any escape sequence using braces inside interpolated +constructs may have optional blanks (tab or space characters) adjoining +with and inside of the braces, as illustrated above by the second +\&\f(CW\*(C`\ex{\ }\*(C'\fR example. +.IP [1] 4 +.IX Item "[1]" +The result is the character specified by the hexadecimal number between +the braces. See "[8]" below for details on which character. +.Sp +Blanks (tab or space characters) may separate the number from either or +both of the braces. +.Sp +Otherwise, only hexadecimal digits are valid between the braces. If an +invalid character is encountered, a warning will be issued and the +invalid character and all subsequent characters (valid or invalid) +within the braces will be discarded. +.Sp +If there are no valid digits between the braces, the generated character is +the NULL character (\f(CW\*(C`\ex{00}\*(C'\fR). However, an explicit empty brace (\f(CW\*(C`\ex{}\*(C'\fR) +will not cause a warning (currently). +.IP [2] 4 +.IX Item "[2]" +The result is the character specified by the hexadecimal number in the range +0x00 to 0xFF. See "[8]" below for details on which character. +.Sp +Only hexadecimal digits are valid following \f(CW\*(C`\ex\*(C'\fR. 
When \f(CW\*(C`\ex\*(C'\fR is followed +by fewer than two valid digits, any valid digits will be zero-padded. This +means that \f(CW\*(C`\ex7\*(C'\fR will be interpreted as \f(CW\*(C`\ex07\*(C'\fR, and a lone \f(CW"\ex"\fR will be +interpreted as \f(CW\*(C`\ex00\*(C'\fR. Except at the end of a string, having fewer than +two valid digits will result in a warning. Note that although the warning +says the illegal character is ignored, it is only ignored as part of the +escape and will still be used as the subsequent character in the string. +For example: +.Sp +.Vb 5 +\& Original Result Warns? +\& "\ex7" "\ex07" no +\& "\ex" "\ex00" no +\& "\ex7q" "\ex07q" yes +\& "\exq" "\ex00q" yes +.Ve +.IP [3] 4 +.IX Item "[3]" +The result is the Unicode character or character sequence given by \fIname\fR. +See charnames. +.IP [4] 4 +.IX Item "[4]" +\&\f(CW\*(C`\eN{U+\fR\f(CIhexadecimal\ number\fR\f(CW}\*(C'\fR means the Unicode character whose Unicode code +point is \fIhexadecimal number\fR. +.IP [5] 4 +.IX Item "[5]" +The character following \f(CW\*(C`\ec\*(C'\fR is mapped to some other character as shown in the +table: +.Sp +.Vb 10 +\& Sequence Value +\& \ec@ chr(0) +\& \ecA chr(1) +\& \eca chr(1) +\& \ecB chr(2) +\& \ecb chr(2) +\& ... +\& \ecZ chr(26) +\& \ecz chr(26) +\& \ec[ chr(27) +\& # See below for chr(28) +\& \ec] chr(29) +\& \ec^ chr(30) +\& \ec_ chr(31) +\& \ec? chr(127) # (on ASCII platforms; see below for link to +\& # EBCDIC discussion) +.Ve +.Sp +In other words, it's the character whose code point has had 64 xor'd with +its uppercase. \f(CW\*(C`\ec?\*(C'\fR is DELETE on ASCII platforms because +\&\f(CW\*(C`ord("?")\ ^\ 64\*(C'\fR is 127, and +\&\f(CW\*(C`\ec@\*(C'\fR is NULL because the ord of \f(CW"@"\fR is 64, so xor'ing 64 itself produces 0. 
+.Sp +Also, \f(CW\*(C`\ec\e\fR\f(CIX\fR\f(CW\*(C'\fR yields \f(CW\*(C`\ chr(28)\ .\ "\fR\f(CIX\fR\f(CW"\*(C'\fR for any \fIX\fR, but cannot come at the +end of a string, because the backslash would be parsed as escaping the end +quote. +.Sp +On ASCII platforms, the resulting characters from the list above are the +complete set of ASCII controls. This isn't the case on EBCDIC platforms; see +"OPERATOR DIFFERENCES" in perlebcdic for a full discussion of the +differences between these for ASCII versus EBCDIC platforms. +.Sp +Use of any other character following the \f(CW"c"\fR besides those listed above is +discouraged, and as of Perl v5.20, the only characters actually allowed +are the printable ASCII ones, minus the left brace \f(CW"{"\fR. What happens +for any of the allowed other characters is that the value is derived by +xor'ing with the seventh bit, which is 64, and a warning raised if +enabled. Using the non-allowed characters generates a fatal error. +.Sp +To get platform independent controls, you can use \f(CW\*(C`\eN{...}\*(C'\fR. +.IP [6] 4 +.IX Item "[6]" +The result is the character specified by the octal number between the braces. +See "[8]" below for details on which character. +.Sp +Blanks (tab or space characters) may separate the number from either or +both of the braces. +.Sp +Otherwise, if a character that isn't an octal digit is encountered, a +warning is raised, and the value is based on the octal digits before it, +discarding it and all following characters up to the closing brace. It +is a fatal error if there are no octal digits at all. +.IP [7] 4 +.IX Item "[7]" +The result is the character specified by the three-digit octal number in the +range 000 to 777 (but best to not use above 077, see next paragraph). See +"[8]" below for details on which character. +.Sp +Some contexts allow 2 or even 1 digit, but any usage without exactly +three digits, the first being a zero, may give unintended results. 
(For +example, in a regular expression it may be confused with a backreference; +see "Octal escapes" in perlrebackslash.) Starting in Perl 5.14, you may +use \f(CW\*(C`\eo{}\*(C'\fR instead, which avoids all these problems. Otherwise, it is best to +use this construct only for ordinals \f(CW\*(C`\e077\*(C'\fR and below, remembering to pad to +the left with zeros to make three digits. For larger ordinals, either use +\&\f(CW\*(C`\eo{}\*(C'\fR, or convert to something else, such as to hex and use \f(CW\*(C`\eN{U+}\*(C'\fR +(which is portable between platforms with different character sets) or +\&\f(CW\*(C`\ex{}\*(C'\fR instead. +.IP [8] 4 +.IX Item "[8]" +Several constructs above specify a character by a number. That number +gives the character's position in the character set encoding (indexed from 0). +This number is variously called the character's ordinal, code position, or code point. Perl +works on platforms that currently have a native encoding of either ASCII/Latin1 +or EBCDIC, each of which allows specification of 256 characters. In general, if +the number is 255 (0xFF, 0377) or below, Perl interprets this in the platform's +native encoding. If the number is 256 (0x100, 0400) or above, Perl interprets +it as a Unicode code point and the result is the corresponding Unicode +character. For example \f(CW\*(C`\ex{50}\*(C'\fR and \f(CW\*(C`\eo{120}\*(C'\fR are both the number 80 in +decimal, which is less than 256, so the number is interpreted in the native +character set encoding. In ASCII the character in the 80th position (indexed +from 0) is the letter \f(CW"P"\fR, and in EBCDIC it is the ampersand symbol \f(CW"&"\fR. +\&\f(CW\*(C`\ex{100}\*(C'\fR and \f(CW\*(C`\eo{400}\*(C'\fR are both 256 in decimal, so the number is interpreted +as a Unicode code point no matter what the native encoding is. The name of the +character in the 256th position (indexed from 0) in Unicode is +\&\f(CW\*(C`LATIN CAPITAL LETTER A WITH MACRON\*(C'\fR. 
+.Sp +An exception to the above rule is that \f(CW\*(C`\eN{U+\fR\f(CIhex\ number\fR\f(CW}\*(C'\fR is +always interpreted as a Unicode code point, so that \f(CW\*(C`\eN{U+0050}\*(C'\fR is \f(CW"P"\fR even +on EBCDIC platforms. +.PP +\&\fBNOTE\fR: Unlike C and other languages, Perl has no \f(CW\*(C`\ev\*(C'\fR escape sequence for +the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but you may +use \f(CW\*(C`\eN{VT}\*(C'\fR, \f(CW\*(C`\eck\*(C'\fR, \f(CW\*(C`\eN{U+0b}\*(C'\fR, or \f(CW\*(C`\ex0b\*(C'\fR. (\f(CW\*(C`\ev\*(C'\fR +does have meaning in regular expression patterns in Perl, see perlre.) +.PP +The following escape sequences are available in constructs that interpolate, +but not in transliterations. +.IX Xref "\\l \\u \\L \\U \\E \\Q \\F" +.PP +.Vb 9 +\& \el lowercase next character only +\& \eu titlecase (not uppercase!) next character only +\& \eL lowercase all characters till \eE or end of string +\& \eU uppercase all characters till \eE or end of string +\& \eF foldcase all characters till \eE or end of string +\& \eQ quote (disable) pattern metacharacters till \eE or +\& end of string +\& \eE end either case modification or quoted section +\& (whichever was last seen) +.Ve +.PP +See "quotemeta" in perlfunc for the exact definition of characters that +are quoted by \f(CW\*(C`\eQ\*(C'\fR. +.PP +\&\f(CW\*(C`\eL\*(C'\fR, \f(CW\*(C`\eU\*(C'\fR, \f(CW\*(C`\eF\*(C'\fR, and \f(CW\*(C`\eQ\*(C'\fR can stack, in which case you need one +\&\f(CW\*(C`\eE\*(C'\fR for each. For example: +.PP +.Vb 2 +\& say "This \eQquoting \eubusiness \eUhere isn\*(Aqt quite\eE done yet,\eE is it?"; +\& This quoting\e Business\e HERE\e ISN\e\*(AqT\e QUITE\e done\e yet\e, is it? +.Ve +.PP +If a \f(CW\*(C`use\ locale\*(C'\fR form that includes \f(CW\*(C`LC_CTYPE\*(C'\fR is in effect (see +perllocale), the case map used by \f(CW\*(C`\el\*(C'\fR, \f(CW\*(C`\eL\*(C'\fR, \f(CW\*(C`\eu\*(C'\fR, and \f(CW\*(C`\eU\*(C'\fR is +taken from the current locale. 
If Unicode (for example, \f(CW\*(C`\eN{}\*(C'\fR or code +points of 0x100 or beyond) is being used, the case map used by \f(CW\*(C`\el\*(C'\fR, +\&\f(CW\*(C`\eL\*(C'\fR, \f(CW\*(C`\eu\*(C'\fR, and \f(CW\*(C`\eU\*(C'\fR is as defined by Unicode. That means that +case-mapping a single character can sometimes produce a sequence of +several characters. +Under \f(CW\*(C`use\ locale\*(C'\fR, \f(CW\*(C`\eF\*(C'\fR produces the same results as \f(CW\*(C`\eL\*(C'\fR +for all locales but a UTF\-8 one, where it instead uses the Unicode +definition. +.PP +All systems use the virtual \f(CW"\en"\fR to represent a line terminator, +called a "newline". There is no such thing as an unvarying, physical +newline character. It is only an illusion that the operating system, +device drivers, C libraries, and Perl all conspire to preserve. Not all +systems read \f(CW"\er"\fR as ASCII CR and \f(CW"\en"\fR as ASCII LF. For example, +on the ancient Macs (pre-MacOS X) of yesteryear, these used to be reversed, +and on systems without a line terminator, +printing \f(CW"\en"\fR might emit no actual data. In general, use \f(CW"\en"\fR when +you mean a "newline" for your system, but use the literal ASCII when you +need an exact character. For example, most networking protocols expect +and prefer a CR+LF (\f(CW"\e015\e012"\fR or \f(CW"\ecM\ecJ"\fR) for line terminators, +and although they often accept just \f(CW"\e012"\fR, they seldom tolerate just +\&\f(CW"\e015"\fR. If you get in the habit of using \f(CW"\en"\fR for networking, +you may be burned some day. +.IX Xref "newline line terminator eol end of line \\n \\r \\r\\n" +.PP +For constructs that do interpolate, variables beginning with "\f(CW\*(C`$\*(C'\fR" +or "\f(CW\*(C`@\*(C'\fR" are interpolated. Subscripted variables such as \f(CW$a[3]\fR or +\&\f(CW\*(C`$href\->{key}[0]\*(C'\fR are also interpolated, as are array and hash slices. +But method calls such as \f(CW\*(C`$obj\->meth\*(C'\fR are not. 
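The interpolation rules just described can be sketched as below. The variable names and the small `Demo` class are hypothetical, invented only for this illustration. The `@{[ ... ]}` construct is a common idiom for embedding an arbitrary expression in a string, not something this section defines.

```perl
use strict;
use warnings;

my @nums = (10, 20, 30, 40);
my $href = { key => ['first', 'second'] };
my %h    = (x => 1, y => 2);

# Subscripted variables and slices are interpolated.
my $element = "elem: $nums[3]";           # array element
my $nested  = "nested: $href->{key}[0]";  # chained subscripts
my $aslice  = "slice: @nums[0,1]";        # array slice, joined with $"
my $hslice  = "hslice: @h{'x','y'}";      # hash slice, joined with $"

# Hypothetical class, used only to show that method calls are not interpolated.
{
    package Demo;
    sub new  { return bless {}, shift }
    sub name { return "demo" }
}
my $obj = Demo->new;

# Only $obj is interpolated here; "->name" stays as literal text.
my $plain = "obj: $obj->name";

# One workaround: interpolate an anonymous array holding the call's result.
my $call = "obj: @{[ $obj->name ]}";

print "$_\n" for $element, $nested, $aslice, $hslice, $call;
```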
+.PP +Interpolating an array or slice interpolates the elements in order, +separated by the value of \f(CW$"\fR, so is equivalent to interpolating +\&\f(CW\*(C`join\ $",\ @array\*(C'\fR. "Punctuation" arrays such as \f(CW\*(C`@*\*(C'\fR are usually +interpolated only if the name is enclosed in braces \f(CW\*(C`@{*}\*(C'\fR, but the +arrays \f(CW@_\fR, \f(CW\*(C`@+\*(C'\fR, and \f(CW\*(C`@\-\*(C'\fR are interpolated even without braces. +.PP +For double-quoted strings, the quoting from \f(CW\*(C`\eQ\*(C'\fR is applied after +interpolation and escapes are processed. +.PP +.Vb 1 +\& "abc\eQfoo\etbar$s\eExyz" +.Ve +.PP +is equivalent to +.PP +.Vb 1 +\& "abc" . quotemeta("foo\etbar$s") . "xyz" +.Ve +.PP +For the pattern of regex operators (\f(CW\*(C`qr//\*(C'\fR, \f(CW\*(C`m//\*(C'\fR and \f(CW\*(C`s///\*(C'\fR), +the quoting from \f(CW\*(C`\eQ\*(C'\fR is applied after interpolation is processed, +but before escapes are processed. This allows the pattern to match +literally (except for \f(CW\*(C`$\*(C'\fR and \f(CW\*(C`@\*(C'\fR). For example, the following matches: +.PP +.Vb 1 +\& \*(Aq\es\et\*(Aq =~ /\eQ\es\et/ +.Ve +.PP +Because \f(CW\*(C`$\*(C'\fR or \f(CW\*(C`@\*(C'\fR trigger interpolation, you'll need to use something +like \f(CW\*(C`/\eQuser\eE\e@\eQhost/\*(C'\fR to match them literally. +.PP +Patterns are subject to an additional level of interpretation as a +regular expression. This is done as a second pass, after variables are +interpolated, so that regular expressions may be incorporated into the +pattern from the variables. If this is not what you want, use \f(CW\*(C`\eQ\*(C'\fR to +interpolate a variable literally. +.PP +Apart from the behavior described above, Perl does not expand +multiple levels of interpolation. In particular, contrary to the +expectations of shell programmers, back-quotes do \fINOT\fR interpolate +within double quotes, nor do single quotes impede evaluation of +variables when used within double quotes. 
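As a rough sketch of the `\Q` behavior described above, the following compares a double-quoted string with a pattern. All variable names are assumptions made for the example. `\A` and `\z` are used as anchors to keep the patterns unambiguous.

```perl
use strict;
use warnings;

my $s = "a.b";

# In a double-quoted string, \Q is applied after interpolation and
# escapes, so these two expressions build the same string.
my $quoted = "abc\Qfoo\tbar$s\Exyz";
my $same   = "abc" . quotemeta("foo\tbar$s") . "xyz";

# In a pattern, \Q makes the interpolated value match literally,
# so the "." in $s matches only a dot.
my $literal_hit  = ("a.b" =~ /\A\Q$s\E\z/) ? 1 : 0;
my $literal_miss = ("axb" =~ /\A\Q$s\E\z/) ? 1 : 0;

# Without \Q the interpolated "." is a regex metacharacter.
my $regex_hit = ("axb" =~ /\A$s\z/) ? 1 : 0;

# Single quotes inside double quotes do not prevent interpolation.
my $name = "world";
my $msg  = "hello '$name'";

print "same\n" if $quoted eq $same;
print "$literal_hit $literal_miss $regex_hit\n";
print "$msg\n";
```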
+.SS "Regexp Quote-Like Operators" +.IX Xref "operator, regexp" +.IX Subsection "Regexp Quote-Like Operators" +Here are the quote-like operators that apply to pattern +matching and related activities. +.ie n .IP """qr/\fISTRING\fR/msixpodualn""" 8 +.el .IP \f(CWqr/\fR\f(CISTRING\fR\f(CW/msixpodualn\fR 8 +.IX Xref "qr i m o s x p" +.IX Item "qr/STRING/msixpodualn" +This operator quotes (and possibly compiles) its \fISTRING\fR as a regular +expression. \fISTRING\fR is interpolated the same way as \fIPATTERN\fR +in \f(CW\*(C`m/\fR\f(CIPATTERN\fR\f(CW/\*(C'\fR. If \f(CW"\*(Aq"\fR is used as the delimiter, no variable +interpolation is done. Returns a Perl value which may be used instead of the +corresponding \f(CW\*(C`/\fR\f(CISTRING\fR\f(CW/msixpodualn\*(C'\fR expression. The returned value is a +normalized version of the original pattern. It magically differs from +a string containing the same characters: \f(CWref(qr/x/)\fR returns "Regexp"; +however, dereferencing it is not well defined (you currently get the +normalized version of the original pattern, but this may change). 
+.Sp +For example, +.Sp +.Vb 3 +\& $rex = qr/my.STRING/is; +\& print $rex; # prints (?^si:my.STRING) +\& s/$rex/foo/; +.Ve +.Sp +is equivalent to +.Sp +.Vb 1 +\& s/my.STRING/foo/is; +.Ve +.Sp +The result may be used as a subpattern in a match: +.Sp +.Vb 5 +\& $re = qr/$pattern/; +\& $string =~ /foo${re}bar/; # can be interpolated in other +\& # patterns +\& $string =~ $re; # or used standalone +\& $string =~ /$re/; # or this way +.Ve +.Sp +Since Perl may compile the pattern at the moment of execution of the \f(CWqr()\fR +operator, using \f(CWqr()\fR may have speed advantages in some situations, +notably if the result of \f(CWqr()\fR is used standalone: +.Sp +.Vb 11 +\& sub match { +\& my $patterns = shift; +\& my @compiled = map qr/$_/i, @$patterns; +\& grep { +\& my $success = 0; +\& foreach my $pat (@compiled) { +\& $success = 1, last if /$pat/; +\& } +\& $success; +\& } @_; +\& } +.Ve +.Sp +Precompilation of the pattern into an internal representation at +the moment of \f(CWqr()\fR avoids the need to recompile the pattern every +time a match \f(CW\*(C`/$pat/\*(C'\fR is attempted. (Perl has many other internal +optimizations, but none would be triggered in the above example if +we did not use the \f(CWqr()\fR operator.) +.Sp +Options (specified by the following modifiers) are: +.Sp +.Vb 10 +\& m Treat string as multiple lines. +\& s Treat string as single line. (Make . match a newline) +\& i Do case\-insensitive pattern matching. +\& x Use extended regular expressions; specifying two +\& x\*(Aqs means \et and the SPACE character are ignored within +\& square\-bracketed character classes +\& p When matching preserve a copy of the matched string so +\& that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be +\& defined (ignored starting in v5.20 as these are always +\& defined starting in that release) +\& o Compile pattern only once. 
+\& a ASCII\-restrict: Use ASCII for \ed, \es, \ew and [[:posix:]] +\& character classes; specifying two a\*(Aqs adds the further +\& restriction that no ASCII character will match a +\& non\-ASCII one under /i. +\& l Use the current run\-time locale\*(Aqs rules. +\& u Use Unicode rules. +\& d Use Unicode or native charset, as in 5.12 and earlier. +\& n Non\-capture mode. Don\*(Aqt let () fill in $1, $2, etc... +.Ve +.Sp +If a precompiled pattern is embedded in a larger pattern then the effect +of \f(CW"msixpluadn"\fR will be propagated appropriately. The effect that the +\&\f(CW\*(C`/o\*(C'\fR modifier has is not propagated, being restricted to those patterns +explicitly using it. +.Sp +The \f(CW\*(C`/a\*(C'\fR, \f(CW\*(C`/d\*(C'\fR, \f(CW\*(C`/l\*(C'\fR, and \f(CW\*(C`/u\*(C'\fR modifiers (added in Perl 5.14) +control the character set rules, but \f(CW\*(C`/a\*(C'\fR is the only one you are likely +to want to specify explicitly; the other three are selected +automatically by various pragmas. +.Sp +See perlre for additional information on valid syntax for \fISTRING\fR, and +for a detailed look at the semantics of regular expressions. In +particular, all modifiers except the largely obsolete \f(CW\*(C`/o\*(C'\fR are further +explained in "Modifiers" in perlre. \f(CW\*(C`/o\*(C'\fR is described in the next section. +.ie n .IP """m/\fIPATTERN\fR/msixpodualngc""" 8 +.el .IP \f(CWm/\fR\f(CIPATTERN\fR\f(CW/msixpodualngc\fR 8 +.IX Xref "m operator, match regexp, options regexp regex, options regex m s i x p o g c" +.IX Item "m/PATTERN/msixpodualngc" +.PD 0 +.ie n .IP """/\fIPATTERN\fR/msixpodualngc""" 8 +.el .IP \f(CW/\fR\f(CIPATTERN\fR\f(CW/msixpodualngc\fR 8 +.IX Item "/PATTERN/msixpodualngc" +.PD +Searches a string for a pattern match, and in scalar context returns +true if it succeeds, false if it fails. If no string is specified +via the \f(CW\*(C`=~\*(C'\fR or \f(CW\*(C`!~\*(C'\fR operator, the \f(CW$_\fR string is searched. 
(The +string specified with \f(CW\*(C`=~\*(C'\fR need not be an lvalue\-\-it may be the +result of an expression evaluation, but remember the \f(CW\*(C`=~\*(C'\fR binds +rather tightly.) See also perlre. +.Sp +Options are as described in \f(CW\*(C`qr//\*(C'\fR above; in addition, the following match +process modifiers are available: +.Sp +.Vb 3 +\& g Match globally, i.e., find all occurrences. +\& c Do not reset search position on a failed match when /g is +\& in effect. +.Ve +.Sp +If \f(CW"/"\fR is the delimiter then the initial \f(CW\*(C`m\*(C'\fR is optional. With the \f(CW\*(C`m\*(C'\fR +you can use any pair of non-whitespace (ASCII) characters +as delimiters. This is particularly useful for matching path names +that contain \f(CW"/"\fR, to avoid LTS (leaning toothpick syndrome). If \f(CW"?"\fR is +the delimiter, then a match-only-once rule applies, +described in \f(CW\*(C`m?\fR\f(CIPATTERN\fR\f(CW?\*(C'\fR below. If \f(CW"\*(Aq"\fR (single quote) is the delimiter, +no variable interpolation is performed on the \fIPATTERN\fR. +When using a delimiter character valid in an identifier, whitespace is required +after the \f(CW\*(C`m\*(C'\fR. +.Sp +\&\fIPATTERN\fR may contain variables, which will be interpolated +every time the pattern search is evaluated, except +for when the delimiter is a single quote. (Note that \f(CW$(\fR, \f(CW$)\fR, and +\&\f(CW$|\fR are not interpolated because they look like end-of-string tests.) +Perl will not recompile the pattern unless an interpolated +variable that it contains changes. You can force Perl to skip the +test and never recompile by adding a \f(CW\*(C`/o\*(C'\fR (which stands for "once") +after the trailing delimiter. +Once upon a time, Perl would recompile regular expressions +unnecessarily, and this modifier was useful to tell it not to do so, in the +interests of speed. But now, the only reasons to use \f(CW\*(C`/o\*(C'\fR are one of: +.RS 8 +.IP 1. 
4 +The variables are thousands of characters long and you know that they +don't change, and you need to wring out the last little bit of speed by +having Perl skip testing for that. (There is a maintenance penalty for +doing this, as mentioning \f(CW\*(C`/o\*(C'\fR constitutes a promise that you won't +change the variables in the pattern. If you do change them, Perl won't +even notice.) +.IP 2. 4 +You want the pattern to use the initial values of the variables +regardless of whether they change or not. (But there are saner ways +of accomplishing this than using \f(CW\*(C`/o\*(C'\fR.) +.IP 3. 4 +If the pattern contains embedded code, such as +.Sp +.Vb 3 +\& use re \*(Aqeval\*(Aq; +\& $code = \*(Aqfoo(?{ $x })\*(Aq; +\& /$code/ +.Ve +.Sp +then Perl will recompile each time, even though the pattern string hasn't +changed, to ensure that the current value of \f(CW$x\fR is seen each time. +Use \f(CW\*(C`/o\*(C'\fR if you want to avoid this. +.RE +.RS 8 +.Sp +The bottom line is that using \f(CW\*(C`/o\*(C'\fR is almost never a good idea. +.RE +.ie n .IP "The empty pattern ""//""" 8 +.el .IP "The empty pattern \f(CW//\fR" 8 +.IX Item "The empty pattern //" +If the \fIPATTERN\fR evaluates to the empty string, the last +\&\fIsuccessfully\fR matched regular expression is used instead. In this +case, only the \f(CW\*(C`g\*(C'\fR and \f(CW\*(C`c\*(C'\fR flags on the empty pattern are honored; the +other flags are taken from the original pattern. If no match has +previously succeeded, this will (silently) act instead as a genuine +empty pattern (which will always match). Using a user\-supplied string as +a pattern carries the risk that, if the string is empty, it triggers the +"last successful match" behavior, which can be very confusing. In such +cases, it is recommended that you replace \f(CW\*(C`m/$pattern/\*(C'\fR with +\&\f(CW\*(C`m/(?:$pattern)/\*(C'\fR to avoid this behavior. 
+.Sp +The last successful pattern may be accessed as a variable via +\&\f(CW\*(C`${^LAST_SUCCESSFUL_PATTERN}\*(C'\fR. Matching against it, or the empty +pattern should have the same effect, with the exception that when there +is no last successful pattern the empty pattern will silently match, +whereas using the \f(CW\*(C`${^LAST_SUCCESSFUL_PATTERN}\*(C'\fR variable will produce +undefined warnings (if warnings are enabled). You can check +\&\f(CWdefined(${^LAST_SUCCESSFUL_PATTERN})\fR to test if there is a "last +successful match" in the current scope. +.Sp +Note that it's possible to confuse Perl into thinking \f(CW\*(C`//\*(C'\fR (the empty +regex) is really \f(CW\*(C`//\*(C'\fR (the defined-or operator). Perl is usually pretty +good about this, but some pathological cases might trigger this, such as +\&\f(CW\*(C`$x///\*(C'\fR (is that \f(CW\*(C`($x)\ /\ (//)\*(C'\fR or \f(CW\*(C`$x\ //\ /\*(C'\fR?) and \f(CW\*(C`print\ $fh\ //\*(C'\fR +(\f(CW\*(C`print\ $fh(//\*(C'\fR or \f(CW\*(C`print($fh\ //\*(C'\fR?). In all of these examples, Perl +will assume you meant defined-or. If you meant the empty regex, just +use parentheses or spaces to disambiguate, or even prefix the empty +regex with an \f(CW\*(C`m\*(C'\fR (so \f(CW\*(C`//\*(C'\fR becomes \f(CW\*(C`m//\*(C'\fR). +.IP "Matching in list context" 8 +.IX Item "Matching in list context" +If the \f(CW\*(C`/g\*(C'\fR option is not used, \f(CW\*(C`m//\*(C'\fR in list context returns a +list consisting of the subexpressions matched by the parentheses in the +pattern, that is, (\f(CW$1\fR, \f(CW$2\fR, \f(CW$3\fR...) (Note that here \f(CW$1\fR etc. are +also set). When there are no parentheses in the pattern, the return +value is the list \f(CW\*(C`(1)\*(C'\fR for success. +With or without parentheses, an empty list is returned upon failure. 
+.Sp +Examples: +.Sp +.Vb 2 +\& open(TTY, "+</dev/tty") +\& || die "can\*(Aqt access /dev/tty: $!"; +\& +\& <TTY> =~ /^y/i && foo(); # do foo if desired +\& +\& if (/Version: *([0\-9.]*)/) { $version = $1; } +\& +\& next if m#^/usr/spool/uucp#; +\& +\& # poor man\*(Aqs grep +\& $arg = shift; +\& while (<>) { +\& print if /$arg/o; # compile only once (no longer needed!) +\& } +\& +\& if (($F1, $F2, $Etc) = ($foo =~ /^(\eS+)\es+(\eS+)\es*(.*)/)) +.Ve +.Sp +This last example splits \f(CW$foo\fR into the first two words and the +remainder of the line, and assigns those three fields to \f(CW$F1\fR, \f(CW$F2\fR, and +\&\f(CW$Etc\fR. The conditional is true if any variables were assigned; that is, +if the pattern matched. +.Sp +The \f(CW\*(C`/g\*(C'\fR modifier specifies global pattern matching\-\-that is, +matching as many times as possible within the string. How it behaves +depends on the context. In list context, it returns a list of the +substrings matched by any capturing parentheses in the regular +expression. If there are no parentheses, it returns a list of all +the matched strings, as if there were parentheses around the whole +pattern. +.Sp +In scalar context, each execution of \f(CW\*(C`m//g\*(C'\fR finds the next match, +returning true if it matches, and false if there is no further match. +The position after the last match can be read or set using the \f(CWpos()\fR +function; see "pos" in perlfunc. A failed match normally resets the +search position to the beginning of the string, but you can avoid that +by adding the \f(CW\*(C`/c\*(C'\fR modifier (for example, \f(CW\*(C`m//gc\*(C'\fR). Modifying the target +string also resets the search position. 
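The scalar-context and list-context behavior of `/g`, and the effect of `/c` on the search position, can be sketched as follows. The sample strings and variable names are assumptions chosen for the illustration.

```perl
use strict;
use warnings;

my $text = "aXbXc";
my @positions;

# Scalar-context m//g: each evaluation resumes from pos($text).
while ($text =~ /X/g) {
    push @positions, pos($text);   # offset just past each match
}

# The loop ended on a failed match, which reset the search position,
# so pos() is undefined again.
my $after_plain = pos($text);

# With /c, a failed match leaves the search position where it was.
$text =~ /X/gc;                    # matches the first X
$text =~ /Z/gc;                    # fails, but the position is kept
my $after_c = pos($text);

# List-context m//g returns the captures from every match at once.
my @pairs = ("k1=v1 k2=v2" =~ /(\w+)=(\w+)/g);

print "positions: @positions\n";
print "pairs: @pairs\n";
```

Keeping the position after a failure is what allows several `/gc` patterns to be tried in turn at the same point in the string.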
+.ie n .IP """\eG \fIassertion\fR""" 8 +.el .IP "\f(CW\eG \fR\f(CIassertion\fR\f(CW\fR" 8 +.IX Item "G assertion" +You can intermix \f(CW\*(C`m//g\*(C'\fR matches with \f(CW\*(C`m/\eG.../g\*(C'\fR, where \f(CW\*(C`\eG\*(C'\fR is a +zero-width assertion that matches the exact position where the +previous \f(CW\*(C`m//g\*(C'\fR, if any, left off. Without the \f(CW\*(C`/g\*(C'\fR modifier, the +\&\f(CW\*(C`\eG\*(C'\fR assertion still anchors at \f(CWpos()\fR as it was at the start of +the operation (see "pos" in perlfunc), but the match is of course only +attempted once. Using \f(CW\*(C`\eG\*(C'\fR without \f(CW\*(C`/g\*(C'\fR on a target string that has +not previously had a \f(CW\*(C`/g\*(C'\fR match applied to it is the same as using +the \f(CW\*(C`\eA\*(C'\fR assertion to match the beginning of the string. Note also +that, currently, \f(CW\*(C`\eG\*(C'\fR is only properly supported when anchored at the +very beginning of the pattern. +.Sp +Examples: +.Sp +.Vb 2 +\& # list context +\& ($one,$five,$fifteen) = (\`uptime\` =~ /(\ed+\e.\ed+)/g); +\& +\& # scalar context +\& local $/ = ""; +\& while ($paragraph = <>) { +\& while ($paragraph =~ /\ep{Ll}[\*(Aq")]*[.!?]+[\*(Aq")]*\es/g) { +\& $sentences++; +\& } +\& } +\& say $sentences; +.Ve +.Sp +Here's another way to check for sentences in a paragraph: +.Sp +.Vb 10 +\& my $sentence_rx = qr{ +\& (?: (?<= ^ ) | (?<= \es ) ) # after start\-of\-string or +\& # whitespace +\& \ep{Lu} # capital letter +\& .*? # a bunch of anything +\& (?<= \eS ) # that ends in non\- +\& # whitespace +\& (?<! \eb [DMS]r ) # but isn\*(Aqt a common abbr. +\& (?<! \eb Mrs ) +\& (?<! \eb Sra ) +\& (?<! \eb St ) +\& [.?!] 
# followed by a sentence +\& # ender +\& (?= $ | \es ) # in front of end\-of\-string +\& # or whitespace +\& }sx; +\& local $/ = ""; +\& while (my $paragraph = <>) { +\& say "NEW PARAGRAPH"; +\& my $count = 0; +\& while ($paragraph =~ /($sentence_rx)/g) { +\& printf "\etgot sentence %d: <%s>\en", ++$count, $1; +\& } +\& } +.Ve +.Sp +Here's how to use \f(CW\*(C`m//gc\*(C'\fR with \f(CW\*(C`\eG\*(C'\fR: +.Sp +.Vb 10 +\& $_ = "ppooqppqq"; +\& while ($i++ < 2) { +\& print "1: \*(Aq"; +\& print $1 while /(o)/gc; print "\*(Aq, pos=", pos, "\en"; +\& print "2: \*(Aq"; +\& print $1 if /\eG(q)/gc; print "\*(Aq, pos=", pos, "\en"; +\& print "3: \*(Aq"; +\& print $1 while /(p)/gc; print "\*(Aq, pos=", pos, "\en"; +\& } +\& print "Final: \*(Aq$1\*(Aq, pos=",pos,"\en" if /\eG(.)/; +.Ve +.Sp +The last example should print: +.Sp +.Vb 7 +\& 1: \*(Aqoo\*(Aq, pos=4 +\& 2: \*(Aqq\*(Aq, pos=5 +\& 3: \*(Aqpp\*(Aq, pos=7 +\& 1: \*(Aq\*(Aq, pos=7 +\& 2: \*(Aqq\*(Aq, pos=8 +\& 3: \*(Aq\*(Aq, pos=8 +\& Final: \*(Aqq\*(Aq, pos=8 +.Ve +.Sp +Notice that the final match matched \f(CW\*(C`q\*(C'\fR instead of \f(CW\*(C`p\*(C'\fR, which a match +without the \f(CW\*(C`\eG\*(C'\fR anchor would have done. Also note that the final match +did not update \f(CW\*(C`pos\*(C'\fR. \f(CW\*(C`pos\*(C'\fR is only updated on a \f(CW\*(C`/g\*(C'\fR match. If the +final match did indeed match \f(CW\*(C`p\*(C'\fR, it's a good bet that you're running an +ancient (pre\-5.6.0) version of Perl. +.Sp +A useful idiom for \f(CW\*(C`lex\*(C'\fR\-like scanners is \f(CW\*(C`/\eG.../gc\*(C'\fR. You can +combine several regexps like this to process a string part-by-part, +doing different actions depending on which regexp matched. Each +regexp tries to match where the previous one leaves off. 
+.Sp +.Vb 4 +\& $_ = <<\*(AqEOL\*(Aq; +\& $url = URI::URL\->new( "http://example.com/" ); +\& die if $url eq "xXx"; +\& EOL +\& +\& LOOP: { +\& print(" digits"), redo LOOP if /\eG\ed+\eb[,.;]?\es*/gc; +\& print(" lowercase"), redo LOOP +\& if /\eG\ep{Ll}+\eb[,.;]?\es*/gc; +\& print(" UPPERCASE"), redo LOOP +\& if /\eG\ep{Lu}+\eb[,.;]?\es*/gc; +\& print(" Capitalized"), redo LOOP +\& if /\eG\ep{Lu}\ep{Ll}+\eb[,.;]?\es*/gc; +\& print(" MiXeD"), redo LOOP if /\eG\epL+\eb[,.;]?\es*/gc; +\& print(" alphanumeric"), redo LOOP +\& if /\eG[\ep{Alpha}\epN]+\eb[,.;]?\es*/gc; +\& print(" line\-noise"), redo LOOP if /\eG\eW+/gc; +\& print ". That\*(Aqs all!\en"; +\& } +.Ve +.Sp +Here is the output (split into several lines): +.Sp +.Vb 4 +\& line\-noise lowercase line\-noise UPPERCASE line\-noise UPPERCASE +\& line\-noise lowercase line\-noise lowercase line\-noise lowercase +\& lowercase line\-noise lowercase lowercase line\-noise lowercase +\& lowercase line\-noise MiXeD line\-noise. That\*(Aqs all! +.Ve +.ie n .IP """m?\fIPATTERN\fR?msixpodualngc""" 8 +.el .IP \f(CWm?\fR\f(CIPATTERN\fR\f(CW?msixpodualngc\fR 8 +.IX Xref "? operator, match-once" +.IX Item "m?PATTERN?msixpodualngc" +This is just like the \f(CW\*(C`m/\fR\f(CIPATTERN\fR\f(CW/\*(C'\fR search, except that it matches +only once between calls to the \f(CWreset()\fR operator. This is a useful +optimization when you want to see only the first occurrence of +something in each file of a set of files, for instance. Only \f(CW\*(C`m??\*(C'\fR +patterns local to the current package are reset. +.Sp +.Vb 7 +\& while (<>) { +\& if (m?^$?) { +\& # blank line between header and body +\& } +\& } continue { +\& reset if eof; # clear m?? status for next file +\& } +.Ve +.Sp +Another example switched the first "latin1" encoding it finds +to "utf8" in a pod file: +.Sp +.Vb 1 +\& s//utf8/ if m? 
^ =encoding \eh+ \eK latin1 ?x; +.Ve +.Sp +The match-once behavior is controlled by the match delimiter being +\&\f(CW\*(C`?\*(C'\fR; with any other delimiter this is the normal \f(CW\*(C`m//\*(C'\fR operator. +.Sp +In the past, the leading \f(CW\*(C`m\*(C'\fR in \f(CW\*(C`m?\fR\f(CIPATTERN\fR\f(CW?\*(C'\fR was optional, but omitting it +would produce a deprecation warning. As of v5.22.0, omitting it produces a +syntax error. If you encounter this construct in older code, you can just add +\&\f(CW\*(C`m\*(C'\fR. +.ie n .IP """s/\fIPATTERN\fR/\fIREPLACEMENT\fR/msixpodualngcer""" 8 +.el .IP \f(CWs/\fR\f(CIPATTERN\fR\f(CW/\fR\f(CIREPLACEMENT\fR\f(CW/msixpodualngcer\fR 8 +.IX Xref "s substitute substitution replace regexp, replace regexp, substitute m s i x p o g c e r" +.IX Item "s/PATTERN/REPLACEMENT/msixpodualngcer" +Searches a string for a pattern, and if found, replaces that pattern +with the replacement text and returns the number of substitutions +made. Otherwise it returns false (a value that is both an empty string (\f(CW""\fR) +and numeric zero (\f(CW0\fR) as described in "Relational Operators"). +.Sp +If the \f(CW\*(C`/r\*(C'\fR (non-destructive) option is used then it runs the +substitution on a copy of the string and instead of returning the +number of substitutions, it returns the copy whether or not a +substitution occurred. The original string is never changed when +\&\f(CW\*(C`/r\*(C'\fR is used. The copy will always be a plain string, even if the +input is an object or a tied variable. +.Sp +If no string is specified via the \f(CW\*(C`=~\*(C'\fR or \f(CW\*(C`!~\*(C'\fR operator, the \f(CW$_\fR +variable is searched and modified. Unless the \f(CW\*(C`/r\*(C'\fR option is used, +the string specified must be a scalar variable, an array element, a +hash element, or an assignment to one of those; that is, some sort of +scalar lvalue. 
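As a rough sketch of the return values described above (the strings and variable names are invented for illustration), the following contrasts plain s///, which edits its target and returns a count, with s///r, which returns a modified copy and leaves the target alone.

```perl
use strict;
use warnings;

my $original = "green wintergreen green";

# Plain s///g modifies the target in place and returns the count.
my $copy  = $original;
my $count = ($copy =~ s/\bgreen\b/mauve/g);

# With /r the target is untouched and the modified copy is returned.
my $changed = $original =~ s/\bgreen\b/mauve/gr;

# When nothing matches, /r still returns a copy of the string,
# while plain s/// returns a false value.
my $unmatched = $original =~ s/purple/mauve/r;
my $no_match  = ($copy =~ s/purple/mauve/);

print "count: $count\n";
print "copy: $copy\n";
print "changed: $changed\n";
print "original: $original\n";
print "unmatched /r result equals original\n" if $unmatched eq $original;
print "plain s/// without a match is false\n" unless $no_match;
```

Because /r always yields a string, it can be used on values that are not lvalues, such as the result of another s///r.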
+.Sp +If the delimiter chosen is a single quote, no variable interpolation is +done on either the \fIPATTERN\fR or the \fIREPLACEMENT\fR. Otherwise, if the +\&\fIPATTERN\fR contains a \f(CW\*(C`$\*(C'\fR that looks like a variable rather than an +end-of-string test, the variable will be interpolated into the pattern +at run-time. If you want the pattern compiled only once the first time +the variable is interpolated, use the \f(CW\*(C`/o\*(C'\fR option. If the pattern +evaluates to the empty string, the last successfully executed regular +expression is used instead. See perlre for further explanation on these. +.Sp +Options are as with \f(CW\*(C`m//\*(C'\fR with the addition of the following replacement +specific options: +.Sp +.Vb 5 +\& e Evaluate the right side as an expression. +\& ee Evaluate the right side as a string then eval the +\& result. +\& r Return substitution and leave the original string +\& untouched. +.Ve +.Sp +Any non-whitespace delimiter may replace the slashes. Add space after +the \f(CW\*(C`s\*(C'\fR when using a character allowed in identifiers. If single quotes +are used, no interpretation is done on the replacement string (the \f(CW\*(C`/e\*(C'\fR +modifier overrides this, however). Note that Perl treats backticks +as normal delimiters; the replacement text is not evaluated as a command. +If the \fIPATTERN\fR is delimited by bracketing quotes, the \fIREPLACEMENT\fR has +its own pair of quotes, which may or may not be bracketing quotes, for example, +\&\f(CW\*(C`s(foo)(bar)\*(C'\fR or \f(CW\*(C`s<foo>/bar/\*(C'\fR. A \f(CW\*(C`/e\*(C'\fR will cause the +replacement portion to be treated as a full-fledged Perl expression +and evaluated right then and there. It is, however, syntax checked at +compile-time. A second \f(CW\*(C`e\*(C'\fR modifier will cause the replacement portion +to be \f(CW\*(C`eval\*(C'\fRed before being run as a Perl expression. 
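The delimiter and modifier rules above can be illustrated with a short sketch. The sample strings and the variables $name and $greeting are assumptions made up for this example; they do not come from the manual.

```perl
use strict;
use warnings;

# Bracketing delimiters: PATTERN and REPLACEMENT each get their own pair.
(my $path = "/usr/bin/perl") =~ s{/usr/bin}{/usr/local/bin};

# /e: the replacement is a Perl expression evaluated for each match.
(my $doubled = "3 apples, 12 pears") =~ s/(\d+)/$1 * 2/ge;

# Single-quote delimiters: no interpolation, so $name stays literal.
(my $literal = "Hello NAME") =~ s'NAME'$name';

# /ee: the replacement expression yields the string '$greeting',
# which is then evaluated a second time as Perl code.
my $greeting = "hi";
(my $expanded = 'say $greeting') =~ s/(\$\w+)/$1/ee;

print "$path\n";
print "$doubled\n";
print "$literal\n";
print "$expanded\n";
```

The /ee form runs a string eval on data taken from the target string, so it is only appropriate when that string is trusted.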
+.Sp +Examples: +.Sp +.Vb 1 +\& s/\ebgreen\eb/mauve/g; # don\*(Aqt change wintergreen +\& +\& $path =~ s|/usr/bin|/usr/local/bin|; +\& +\& s/Login: $foo/Login: $bar/; # run\-time pattern +\& +\& ($foo = $bar) =~ s/this/that/; # copy first, then +\& # change +\& ($foo = "$bar") =~ s/this/that/; # convert to string, +\& # copy, then change +\& $foo = $bar =~ s/this/that/r; # Same as above using /r +\& $foo = $bar =~ s/this/that/r +\& =~ s/that/the other/r; # Chained substitutes +\& # using /r +\& @foo = map { s/this/that/r } @bar # /r is very useful in +\& # maps +\& +\& $count = ($paragraph =~ s/Mister\eb/Mr./g); # get change\-cnt +\& +\& $_ = \*(Aqabc123xyz\*(Aq; +\& s/\ed+/$&*2/e; # yields \*(Aqabc246xyz\*(Aq +\& s/\ed+/sprintf("%5d",$&)/e; # yields \*(Aqabc 246xyz\*(Aq +\& s/\ew/$& x 2/eg; # yields \*(Aqaabbcc 224466xxyyzz\*(Aq +\& +\& s/%(.)/$percent{$1}/g; # change percent escapes; no /e +\& s/%(.)/$percent{$1} || $&/ge; # expr now, so /e +\& s/^=(\ew+)/pod($1)/ge; # use function call +\& +\& $_ = \*(Aqabc123xyz\*(Aq; +\& $x = s/abc/def/r; # $x is \*(Aqdef123xyz\*(Aq and +\& # $_ remains \*(Aqabc123xyz\*(Aq. +\& +\& # expand variables in $_, but dynamics only, using +\& # symbolic dereferencing +\& s/\e$(\ew+)/${$1}/g; +\& +\& # Add one to the value of any numbers in the string +\& s/(\ed+)/1 + $1/eg; +\& +\& # Titlecase words in the last 30 characters only (presuming +\& # that the substring doesn\*(Aqt start in the middle of a word) +\& substr($str, \-30) =~ s/\eb(\ep{Alpha})(\ep{Alpha}*)\eb/\eu$1\eL$2/g; +\& +\& # This will expand any embedded scalar variable +\& # (including lexicals) in $_ : First $1 is interpolated +\& # to the variable name, and then evaluated +\& s/(\e$\ew+)/$1/eeg; +\& +\& # Delete (most) C comments. +\& $program =~ s { +\& /\e* # Match the opening delimiter. +\& .*? # Match a minimal number of characters. +\& \e*/ # Match the closing delimiter. 
+\& } []gsx; +\& +\& s/^\es*(.*?)\es*$/$1/; # trim whitespace in $_, +\& # expensively +\& +\& for ($variable) { # trim whitespace in $variable, +\& # cheap +\& s/^\es+//; +\& s/\es+$//; +\& } +\& +\& s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields +\& +\& $foo !~ s/A/a/g; # Lowercase all A\*(Aqs in $foo; return +\& # 0 if any were found and changed; +\& # otherwise return 1 +.Ve +.Sp +Note the use of \f(CW\*(C`$\*(C'\fR instead of \f(CW\*(C`\e\*(C'\fR in the last example. Unlike +\&\fBsed\fR, we use the \e<\fIdigit\fR> form only in the left hand side. +Anywhere else it's $<\fIdigit\fR>. +.Sp +Occasionally, you can't use just a \f(CW\*(C`/g\*(C'\fR to get all the changes +to occur that you might want. Here are two common cases: +.Sp +.Vb 2 +\& # put commas in the right places in an integer +\& 1 while s/(\ed)(\ed\ed\ed)(?!\ed)/$1,$2/g; +\& +\& # expand tabs to 8\-column spacing +\& 1 while s/\et+/\*(Aq \*(Aq x (length($&)*8 \- length($\`)%8)/e; +.Ve +.Sp +While \f(CW\*(C`s///\*(C'\fR accepts the \f(CW\*(C`/c\*(C'\fR flag, it has no effect beyond +producing a warning if warnings are enabled. +.IX Xref " c" +.SS "Quote-Like Operators" +.IX Xref "operator, quote-like" +.IX Subsection "Quote-Like Operators" +.ie n .IP """q/\fISTRING\fR/""" 4 +.el .IP \f(CWq/\fR\f(CISTRING\fR\f(CW/\fR 4 +.IX Xref "q quote, single ' ''" +.IX Item "q/STRING/" +.PD 0 +.ie n .IP \*(Aq\fISTRING\fR\*(Aq 4 +.el .IP \f(CW\*(Aq\fR\f(CISTRING\fR\f(CW\*(Aq\fR 4 +.IX Item "STRING" +.PD +A single-quoted, literal string. A backslash represents a backslash +unless followed by the delimiter or another backslash, in which case +the delimiter or backslash is interpolated. 
+.Sp +.Vb 3 +\& $foo = q!I said, "You said, \*(AqShe said it.\*(Aq"!; +\& $bar = q(\*(AqThis is it.\*(Aq); +\& $baz = \*(Aq\en\*(Aq; # a two\-character string +.Ve +.ie n .IP """qq/\fISTRING\fR/""" 4 +.el .IP \f(CWqq/\fR\f(CISTRING\fR\f(CW/\fR 4 +.IX Xref "qq quote, double "" """"" +.IX Item "qq/STRING/" +.PD 0 +.ie n .IP """\fISTRING\fR""" 4 +.el .IP "\f(CW""\fR\f(CISTRING\fR\f(CW""\fR" 4 +.IX Item """STRING""" +.PD +A double-quoted, interpolated string. +.Sp +.Vb 4 +\& $_ .= qq +\& (*** The previous line contains the naughty word "$1".\en) +\& if /\eb(tcl|java|python)\eb/i; # :\-) +\& $baz = "\en"; # a one\-character string +.Ve +.ie n .IP """qx/\fISTRING\fR/""" 4 +.el .IP \f(CWqx/\fR\f(CISTRING\fR\f(CW/\fR 4 +.IX Xref "qx ` `` backtick" +.IX Item "qx/STRING/" +.PD 0 +.ie n .IP \`\fISTRING\fR\` 4 +.el .IP \f(CW\`\fR\f(CISTRING\fR\f(CW\`\fR 4 +.IX Item "STRING" +.PD +A string which is (possibly) interpolated and then executed as a +system command, via \fI/bin/sh\fR or its equivalent if required. Shell +wildcards, pipes, and redirections will be honored. Similarly to +\&\f(CW\*(C`system\*(C'\fR, if the string contains no shell metacharacters then it will +be executed directly. The collected standard output of the command is +returned; standard error is unaffected. In scalar context, it comes +back as a single (potentially multi-line) string, or \f(CW\*(C`undef\*(C'\fR if the +shell (or command) could not be started. In list context, it returns a +list of lines (however you've defined lines with \f(CW$/\fR or +\&\f(CW$INPUT_RECORD_SEPARATOR\fR), or an empty list if the shell (or command) +could not be started. +.Sp +Because backticks do not affect standard error, use shell file descriptor +syntax (assuming the shell supports this) if you care to address this.
+To capture a command's STDERR and STDOUT together: +.Sp +.Vb 1 +\& $output = \`cmd 2>&1\`; +.Ve +.Sp +To capture a command's STDOUT but discard its STDERR: +.Sp +.Vb 1 +\& $output = \`cmd 2>/dev/null\`; +.Ve +.Sp +To capture a command's STDERR but discard its STDOUT (ordering is +important here): +.Sp +.Vb 1 +\& $output = \`cmd 2>&1 1>/dev/null\`; +.Ve +.Sp +To exchange a command's STDOUT and STDERR in order to capture the STDERR +but leave its STDOUT to come out the old STDERR: +.Sp +.Vb 1 +\& $output = \`cmd 3>&1 1>&2 2>&3 3>&\-\`; +.Ve +.Sp +To read both a command's STDOUT and its STDERR separately, it's easiest +to redirect them separately to files, and then read from those files +when the program is done: +.Sp +.Vb 1 +\& system("program args 1>program.stdout 2>program.stderr"); +.Ve +.Sp +The STDIN filehandle used by the command is inherited from Perl's STDIN. +For example: +.Sp +.Vb 3 +\& open(SPLAT, "stuff") || die "can\*(Aqt open stuff: $!"; +\& open(STDIN, "<&SPLAT") || die "can\*(Aqt dupe SPLAT: $!"; +\& print STDOUT \`sort\`; +.Ve +.Sp +will print the sorted contents of the file named \fI"stuff"\fR. +.Sp +Using single-quote as a delimiter protects the command from Perl's +double-quote interpolation, passing it on to the shell instead: +.Sp +.Vb 2 +\& $perl_info = qx(ps $$); # that\*(Aqs Perl\*(Aqs $$ +\& $shell_info = qx\*(Aqps $$\*(Aq; # that\*(Aqs the new shell\*(Aqs $$ +.Ve +.Sp +How that string gets evaluated is entirely subject to the command +interpreter on your system. On most platforms, you will have to protect +shell metacharacters if you want them treated literally. This is in +practice difficult to do, as it's unclear how to escape which characters. +See perlsec for a clean and safe example of a manual \f(CWfork()\fR and \f(CWexec()\fR +to emulate backticks safely. +.Sp +On some platforms (notably DOS-like ones), the shell may not be +capable of dealing with multiline commands, so putting newlines in +the string may not get you what you want. 
You may be able to evaluate +multiple commands in a single line by separating them with the command +separator character, if your shell supports that (for example, \f(CW\*(C`;\*(C'\fR on +many Unix shells and \f(CW\*(C`&\*(C'\fR on the Windows NT \f(CW\*(C`cmd\*(C'\fR shell). +.Sp +Perl will attempt to flush all files opened for +output before starting the child process, but this may not be supported +on some platforms (see perlport). To be safe, you may need to set +\&\f(CW$|\fR (\f(CW$AUTOFLUSH\fR in \f(CW\*(C`English\*(C'\fR) or call the \f(CWautoflush()\fR method of +\&\f(CW\*(C`IO::Handle\*(C'\fR on any open handles. +.Sp +Beware that some command shells may place restrictions on the length +of the command line. You must ensure your strings don't exceed this +limit after any necessary interpolations. See the platform-specific +release notes for more details about your particular environment. +.Sp +Using this operator can lead to programs that are difficult to port, +because the shell commands called vary between systems, and may in +fact not be present at all. As one example, the \f(CW\*(C`type\*(C'\fR command under +the POSIX shell is very different from the \f(CW\*(C`type\*(C'\fR command under DOS. +That doesn't mean you should go out of your way to avoid backticks +when they're the right way to get something done. Perl was made to be +a glue language, and one of the things it glues together is commands. +Just understand what you're getting yourself into. +.Sp +Like \f(CW\*(C`system\*(C'\fR, backticks put the child process exit code in \f(CW$?\fR. +If you'd like to manually inspect failure, you can check all possible +failure modes by inspecting \f(CW$?\fR like this: +.Sp +.Vb 10 +\& if ($? == \-1) { +\& print "failed to execute: $!\en"; +\& } +\& elsif ($? & 127) { +\& printf "child died with signal %d, %s coredump\en", +\& ($? & 127), ($? & 128) ? \*(Aqwith\*(Aq : \*(Aqwithout\*(Aq; +\& } +\& else { +\& printf "child exited with value %d\en", $? 
>> 8; +\& } +.Ve +.Sp +Use the open pragma to control the I/O layers used when reading the +output of the command, for example: +.Sp +.Vb 2 +\& use open IN => ":encoding(UTF\-8)"; +\& my $x = \`cmd\-producing\-utf\-8\`; +.Ve +.Sp +\&\f(CW\*(C`qx//\*(C'\fR can also be called like a function with "readpipe" in perlfunc. +.Sp +See "I/O Operators" for more discussion. +.ie n .IP """qw/\fISTRING\fR/""" 4 +.el .IP \f(CWqw/\fR\f(CISTRING\fR\f(CW/\fR 4 +.IX Xref "qw quote, list quote, words" +.IX Item "qw/STRING/" +Evaluates to a list of the words extracted out of \fISTRING\fR, using embedded +whitespace as the word delimiters. It can be understood as being roughly +equivalent to: +.Sp +.Vb 1 +\& split(" ", q/STRING/); +.Ve +.Sp +the differences being that it only splits on ASCII whitespace, +generates a real list at compile time, and +in scalar context it returns the last element in the list. So +this expression: +.Sp +.Vb 1 +\& qw(foo bar baz) +.Ve +.Sp +is semantically equivalent to the list: +.Sp +.Vb 1 +\& "foo", "bar", "baz" +.Ve +.Sp +Some frequently seen examples: +.Sp +.Vb 2 +\& use POSIX qw( setlocale localeconv ); +\& @EXPORT = qw( foo bar baz ); +.Ve +.Sp +A common mistake is to try to separate the words with commas or to +put comments into a multi-line \f(CW\*(C`qw\*(C'\fR\-string. For this reason, the +\&\f(CW\*(C`use\ warnings\*(C'\fR pragma and the \fB\-w\fR switch (that is, the \f(CW$^W\fR variable) +produce warnings if the \fISTRING\fR contains the \f(CW","\fR or the \f(CW"#"\fR character.
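A small sketch of the qw behavior described above, using invented word lists: any run of whitespace, including line breaks, separates the words, and the result matches what split(" ", ...) would produce on the same single-quoted text.

```perl
use strict;
use warnings;

my @words = qw( foo bar baz );

# Line breaks and runs of whitespace all act as separators.
my @multi = qw(
    alpha   beta
    gamma
);

# Roughly the same list, built with split and a single-quoted string.
my @via_split = split(" ", q/ foo bar baz /);

print scalar(@words), " words: @words\n";
print scalar(@multi), " words: @multi\n";
print "same as split\n" if "@words" eq "@via_split";
```

No commas or quotes are needed between the words; adding them would make them part of the words themselves.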
+.ie n .IP """tr/\fISEARCHLIST\fR/\fIREPLACEMENTLIST\fR/cdsr""" 4 +.el .IP \f(CWtr/\fR\f(CISEARCHLIST\fR\f(CW/\fR\f(CIREPLACEMENTLIST\fR\f(CW/cdsr\fR 4 +.IX Xref "tr y transliterate c d s" +.IX Item "tr/SEARCHLIST/REPLACEMENTLIST/cdsr" +.PD 0 +.ie n .IP """y/\fISEARCHLIST\fR/\fIREPLACEMENTLIST\fR/cdsr""" 4 +.el .IP \f(CWy/\fR\f(CISEARCHLIST\fR\f(CW/\fR\f(CIREPLACEMENTLIST\fR\f(CW/cdsr\fR 4 +.IX Item "y/SEARCHLIST/REPLACEMENTLIST/cdsr" +.PD +Transliterates all occurrences of the characters found (or not found +if the \f(CW\*(C`/c\*(C'\fR modifier is specified) in the search list with the +positionally corresponding character in the replacement list, possibly +deleting some, depending on the modifiers specified. It returns the +number of characters replaced or deleted. If no string is specified via +the \f(CW\*(C`=~\*(C'\fR or \f(CW\*(C`!~\*(C'\fR operator, the \f(CW$_\fR string is transliterated. +.Sp +For \fBsed\fR devotees, \f(CW\*(C`y\*(C'\fR is provided as a synonym for \f(CW\*(C`tr\*(C'\fR. +.Sp +If the \f(CW\*(C`/r\*(C'\fR (non-destructive) option is present, a new copy of the string +is made and its characters transliterated, and this copy is returned no +matter whether it was modified or not: the original string is always +left unchanged. The new copy is always a plain string, even if the input +string is an object or a tied variable. +.Sp +Unless the \f(CW\*(C`/r\*(C'\fR option is used, the string specified with \f(CW\*(C`=~\*(C'\fR must be a +scalar variable, an array element, a hash element, or an assignment to one +of those; in other words, an lvalue. +.Sp +The characters delimiting \fISEARCHLIST\fR and \fIREPLACEMENTLIST\fR +can be any printable character, not just forward slashes.
If they +are single quotes (\f(CW\*(C`tr\*(Aq\fR\f(CISEARCHLIST\fR\f(CW\*(Aq\fR\f(CIREPLACEMENTLIST\fR\f(CW\*(Aq\*(C'\fR), the only +interpolation is removal of \f(CW\*(C`\e\*(C'\fR from pairs of \f(CW\*(C`\e\e\*(C'\fR; so hyphens are +interpreted literally rather than specifying a character range. +.Sp +Otherwise, a character range may be specified with a hyphen, so +\&\f(CW\*(C`tr/A\-J/0\-9/\*(C'\fR does the same replacement as +\&\f(CW\*(C`tr/ACEGIBDFHJ/0246813579/\*(C'\fR. +.Sp +If the \fISEARCHLIST\fR is delimited by bracketing quotes, the +\&\fIREPLACEMENTLIST\fR must have its own pair of quotes, which may or may +not be bracketing quotes; for example, \f(CW\*(C`tr(aeiouy)(yuoiea)\*(C'\fR or +\&\f(CW\*(C`tr[+\e\-*/]"ABCD"\*(C'\fR. This final example shows a way to visually clarify +what is going on for people who are more familiar with regular +expression patterns than with \f(CW\*(C`tr\*(C'\fR, and who may think forward slash +delimiters imply that \f(CW\*(C`tr\*(C'\fR is more like a regular expression pattern +than it actually is. (Another option might be to use \f(CW\*(C`tr[...][...]\*(C'\fR.) +.Sp +\&\f(CW\*(C`tr\*(C'\fR isn't fully like bracketed character classes, just +(significantly) more like them than it is to full patterns. For +example, characters appearing more than once in either list behave +differently here than in patterns, and \f(CW\*(C`tr\*(C'\fR lists do not allow +backslashed character classes such as \f(CW\*(C`\ed\*(C'\fR or \f(CW\*(C`\epL\*(C'\fR, nor variable +interpolation, so \f(CW"$"\fR and \f(CW"@"\fR are always treated as literals. +.Sp +The allowed elements are literals plus \f(CW\*(C`\e\*(Aq\*(C'\fR (meaning a single quote). +If the delimiters aren't single quotes, also allowed are any of the +escape sequences accepted in double-quoted strings. Escape sequence +details are in the table near the beginning of this section. +.Sp +A hyphen at the beginning or end, or preceded by a backslash is also +always considered a literal. 
Precede a delimiter character with a +backslash to allow it. +.Sp +The \f(CW\*(C`tr\*(C'\fR operator is not equivalent to the \f(CWtr(1)\fR utility. +\&\f(CW\*(C`tr[a\-z][A\-Z]\*(C'\fR will uppercase the 26 letters "a" through "z", but for +case changing not confined to ASCII, use \f(CW\*(C`lc\*(C'\fR, +\&\f(CW\*(C`uc\*(C'\fR, \f(CW\*(C`lcfirst\*(C'\fR, +\&\f(CW\*(C`ucfirst\*(C'\fR (all documented in perlfunc), or the +substitution operator +\&\f(CW\*(C`s/\fR\f(CIPATTERN\fR\f(CW/\fR\f(CIREPLACEMENT\fR\f(CW/\*(C'\fR +(with \f(CW\*(C`\eU\*(C'\fR, \f(CW\*(C`\eu\*(C'\fR, \f(CW\*(C`\eL\*(C'\fR, and \f(CW\*(C`\el\*(C'\fR string-interpolation escapes in the +\&\fIREPLACEMENT\fR portion). +.Sp +Most ranges are unportable between character sets, but certain ones +signal Perl to do special handling to make them portable. There are two +classes of portable ranges. The first are any subsets of the ranges +\&\f(CW\*(C`A\-Z\*(C'\fR, \f(CW\*(C`a\-z\*(C'\fR, and \f(CW\*(C`0\-9\*(C'\fR, when expressed as literal characters. +.Sp +.Vb 1 +\& tr/h\-k/H\-K/ +.Ve +.Sp +capitalizes the letters \f(CW"h"\fR, \f(CW"i"\fR, \f(CW"j"\fR, and \f(CW"k"\fR and nothing +else, no matter what the platform's character set is. In contrast, all +of +.Sp +.Vb 3 +\& tr/\ex68\-\ex6B/\ex48\-\ex4B/ +\& tr/h\-\ex6B/H\-\ex4B/ +\& tr/\ex68\-k/\ex48\-K/ +.Ve +.Sp +do the same capitalizations as the previous example when run on ASCII +platforms, but something completely different on EBCDIC ones. +.Sp +The second class of portable ranges is invoked when one or both of the +range's end points are expressed as \f(CW\*(C`\eN{...}\*(C'\fR +.Sp +.Vb 1 +\& $string =~ tr/\eN{U+20}\-\eN{U+7E}//d; +.Ve +.Sp +removes from \f(CW$string\fR all the platform's characters which are +equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E. This +is a portable range, and has the same effect on every platform it is +run on. In this example, these are the ASCII +printable characters. 
So after this is run, \f(CW$string\fR has only +controls and characters which have no ASCII equivalents. +.Sp +But, even for portable ranges, it is not generally obvious what is +included without having to look things up in the manual. A sound +principle is to use only ranges that both begin from, and end at, either +ASCII alphabetics of equal case (\f(CW\*(C`b\-e\*(C'\fR, \f(CW\*(C`B\-E\*(C'\fR), or digits (\f(CW\*(C`1\-4\*(C'\fR). +Anything else is unclear (and unportable unless \f(CW\*(C`\eN{...}\*(C'\fR is used). If +in doubt, spell out the character sets in full. +.Sp +Options: +.Sp +.Vb 5 +\& c Complement the SEARCHLIST. +\& d Delete found but unreplaced characters. +\& r Return the modified string and leave the original string +\& untouched. +\& s Squash duplicate replaced characters. +.Ve +.Sp +If the \f(CW\*(C`/d\*(C'\fR modifier is specified, any characters specified by +\&\fISEARCHLIST\fR not found in \fIREPLACEMENTLIST\fR are deleted. (Note that +this is slightly more flexible than the behavior of some \fBtr\fR programs, +which delete anything they find in the \fISEARCHLIST\fR, period.) +.Sp +If the \f(CW\*(C`/s\*(C'\fR modifier is specified, sequences of characters, all in a +row, that were transliterated to the same character are squashed down to +a single instance of that character. +.Sp +.Vb 2 +\& my $a = "aaabbbca"; +\& $a =~ tr/ab/dd/s; # $a now is "dcd" +.Ve +.Sp +If the \f(CW\*(C`/d\*(C'\fR modifier is used, the \fIREPLACEMENTLIST\fR is always interpreted +exactly as specified. Otherwise, if the \fIREPLACEMENTLIST\fR is shorter +than the \fISEARCHLIST\fR, the final character, if any, is replicated until +it is long enough. There won't be a final character if and only if the +\&\fIREPLACEMENTLIST\fR is empty, in which case \fIREPLACEMENTLIST\fR is +copied from \fISEARCHLIST\fR. An empty \fIREPLACEMENTLIST\fR is useful +for counting characters in a class, or for squashing character sequences +in a class. 
+.Sp +.Vb 4 +\& tr/abcd// tr/abcd/abcd/ +\& tr/abcd/AB/ tr/abcd/ABBB/ +\& tr/abcd//d s/[abcd]//g +\& tr/abcd/AB/d (tr/ab/AB/ + s/[cd]//g) \- but run together +.Ve +.Sp +If the \f(CW\*(C`/c\*(C'\fR modifier is specified, the characters to be transliterated +are the ones NOT in \fISEARCHLIST\fR, that is, it is complemented. If +\&\f(CW\*(C`/d\*(C'\fR and/or \f(CW\*(C`/s\*(C'\fR are also specified, they apply to the complemented +\&\fISEARCHLIST\fR. Recall, that if \fIREPLACEMENTLIST\fR is empty (except +under \f(CW\*(C`/d\*(C'\fR) a copy of \fISEARCHLIST\fR is used instead. That copy is made +after complementing under \f(CW\*(C`/c\*(C'\fR. \fISEARCHLIST\fR is sorted by code point +order after complementing, and any \fIREPLACEMENTLIST\fR is applied to +that sorted result. This means that under \f(CW\*(C`/c\*(C'\fR, the order of the +characters specified in \fISEARCHLIST\fR is irrelevant. This can +lead to different results on EBCDIC systems if \fIREPLACEMENTLIST\fR +contains more than one character, hence it is generally non-portable to +use \f(CW\*(C`/c\*(C'\fR with such a \fIREPLACEMENTLIST\fR. +.Sp +Another way of describing the operation is this: +If \f(CW\*(C`/c\*(C'\fR is specified, the \fISEARCHLIST\fR is sorted by code point order, +then complemented. If \fIREPLACEMENTLIST\fR is empty and \f(CW\*(C`/d\*(C'\fR is not +specified, \fIREPLACEMENTLIST\fR is replaced by a copy of \fISEARCHLIST\fR (as +modified under \f(CW\*(C`/c\*(C'\fR), and these potentially modified lists are used as +the basis for what follows. Any character in the target string that +isn't in \fISEARCHLIST\fR is passed through unchanged. Every other +character in the target string is replaced by the character in +\&\fIREPLACEMENTLIST\fR that positionally corresponds to its mate in +\&\fISEARCHLIST\fR, except that under \f(CW\*(C`/s\*(C'\fR, the 2nd and following characters +are squeezed out in a sequence of characters in a row that all translate +to the same character. 
If \fISEARCHLIST\fR is longer than +\&\fIREPLACEMENTLIST\fR, characters in the target string that match a +character in \fISEARCHLIST\fR that doesn't have a correspondence in +\&\fIREPLACEMENTLIST\fR are either deleted from the target string if \f(CW\*(C`/d\*(C'\fR is +specified; or replaced by the final character in \fIREPLACEMENTLIST\fR if +\&\f(CW\*(C`/d\*(C'\fR isn't specified. +.Sp +Some examples: +.Sp +.Vb 1 +\& $ARGV[1] =~ tr/A\-Z/a\-z/; # canonicalize to lower case ASCII +\& +\& $cnt = tr/*/*/; # count the stars in $_ +\& $cnt = tr/*//; # same thing +\& +\& $cnt = $sky =~ tr/*/*/; # count the stars in $sky +\& $cnt = $sky =~ tr/*//; # same thing +\& +\& $cnt = $sky =~ tr/*//c; # count all the non\-stars in $sky +\& $cnt = $sky =~ tr/*/*/c; # same, but transliterate each non\-star +\& # into a star, leaving the already\-stars +\& # alone. Afterwards, everything in $sky +\& # is a star. +\& +\& $cnt = tr/0\-9//; # count the ASCII digits in $_ +\& +\& tr/a\-zA\-Z//s; # bookkeeper \-> bokeper +\& tr/o/o/s; # bookkeeper \-> bokkeeper +\& tr/oe/oe/s; # bookkeeper \-> bokkeper +\& tr/oe//s; # bookkeeper \-> bokkeper +\& tr/oe/o/s; # bookkeeper \-> bokkopor +\& +\& ($HOST = $host) =~ tr/a\-z/A\-Z/; +\& $HOST = $host =~ tr/a\-z/A\-Z/r; # same thing +\& +\& $HOST = $host =~ tr/a\-z/A\-Z/r # chained with s///r +\& =~ s/:/ \-p/r; +\& +\& tr/a\-zA\-Z/ /cs; # change non\-alphas to single space +\& +\& @stripped = map tr/a\-zA\-Z/ /csr, @original; +\& # /r with map +\& +\& tr [\e200\-\e377] +\& [\e000\-\e177]; # wickedly delete 8th bit +\& +\& $foo !~ tr/A/a/ # transliterate all the A\*(Aqs in $foo to \*(Aqa\*(Aq, +\& # return 0 if any were found and changed. +\& # Otherwise return 1 +.Ve +.Sp +If multiple transliterations are given for a character, only the +first one is used: +.Sp +.Vb 1 +\& tr/AAA/XYZ/ +.Ve +.Sp +will transliterate any A to X. 
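The modifier rules above can be checked with a short sketch. The input strings are made up for illustration, apart from "aaabbbca", which reuses the /s example given earlier in this section.

```perl
use strict;
use warnings;

my $sky = "*.*..*";

# Empty REPLACEMENTLIST without /d: nothing changes, matches are counted.
my $stars = ($sky =~ tr/*//);

# /r returns a transliterated copy and needs no lvalue.
my $upper = "hostname" =~ tr/a-z/A-Z/r;

# /d deletes SEARCHLIST characters that have no replacement.
(my $deleted = "abcdabcd") =~ tr/abcd/AB/d;

# Short REPLACEMENTLIST without /d: its last character is repeated.
(my $padded = "abcdabcd") =~ tr/abcd/AB/;

# /s squashes runs that were transliterated to the same character.
(my $squashed = "aaabbbca") =~ tr/ab/dd/s;

# /c complements SEARCHLIST; with /s, each run of non-letters
# collapses to a single space.
(my $words = "one, two;; three") =~ tr/a-zA-Z/ /cs;

print "stars: $stars (sky is still $sky)\n";
print "upper: $upper\n";
print "deleted: $deleted\n";
print "padded: $padded\n";
print "squashed: $squashed\n";
print "words: $words\n";
```

Comparing $deleted with $padded shows the one difference /d makes when REPLACEMENTLIST is shorter than SEARCHLIST.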
+.Sp +Because the transliteration table is built at compile time, neither +the \fISEARCHLIST\fR nor the \fIREPLACEMENTLIST\fR is subjected to double quote +interpolation. That means that if you want to use variables, you +must use an \f(CWeval()\fR: +.Sp +.Vb 2 +\& eval "tr/$oldlist/$newlist/"; +\& die $@ if $@; +\& +\& eval "tr/$oldlist/$newlist/, 1" or die $@; +.Ve +.ie n .IP """<<\fIEOF\fR""" 4 +.el .IP \f(CW<<\fR\f(CIEOF\fR\f(CW\fR 4 +.IX Xref "here-doc heredoc here-document <<" +.IX Item "<<EOF" +A line-oriented form of quoting is based on the shell "here-document" +syntax. Following a \f(CW\*(C`<<\*(C'\fR you specify a string to terminate +the quoted material, and all lines following the current line down to +the terminating string are the value of the item. +.Sp +Prefixing the terminating string with a \f(CW\*(C`~\*(C'\fR specifies that you +want to use "Indented Here-docs" (see below). +.Sp +The terminating string may be either an identifier (a word), or some +quoted text. An unquoted identifier works like double quotes. +There may not be a space between the \f(CW\*(C`<<\*(C'\fR and the identifier, +unless the identifier is explicitly quoted. The terminating string +must appear by itself (unquoted and with no surrounding whitespace) +on the terminating line. +.Sp +If the terminating string is quoted, the type of quotes used determines +the treatment of the text. +.RS 4 +.IP "Double Quotes" 4 +.IX Item "Double Quotes" +Double quotes indicate that the text will be interpolated using exactly +the same rules as normal double quoted strings. +.Sp +.Vb 3 +\& print <<EOF; +\& The price is $Price. +\& EOF +\& +\& print << "EOF"; # same as above +\& The price is $Price. +\& EOF +.Ve +.IP "Single Quotes" 4 +.IX Item "Single Quotes" +Single quotes indicate the text is to be treated literally with no +interpolation of its content.
This is similar to single quoted +strings except that backslashes have no special meaning, with \f(CW\*(C`\e\e\*(C'\fR +being treated as two backslashes and not one as they would in every +other quoting construct. +.Sp +Just as in the shell, a backslashed bareword following the \f(CW\*(C`<<\*(C'\fR +means the same thing as a single-quoted string does: +.Sp +.Vb 3 +\& $cost = <<\*(AqVISTA\*(Aq; # hasta la ... +\& That\*(Aqll be $10 please, ma\*(Aqam. +\& VISTA +\& +\& $cost = <<\eVISTA; # Same thing! +\& That\*(Aqll be $10 please, ma\*(Aqam. +\& VISTA +.Ve +.Sp +This is the only form of quoting in perl where there is no need +to worry about escaping content, something that code generators +can and do make good use of. +.IP Backticks 4 +.IX Item "Backticks" +The content of the here doc is treated just as it would be if the +string were embedded in backticks. Thus the content is interpolated +as though it were double quoted and then executed via the shell, with +the results of the execution returned. +.Sp +.Vb 3 +\& print << \`EOC\`; # execute command and get results +\& echo hi there +\& EOC +.Ve +.RE +.RS 4 +.IP "Indented Here-docs" 4 +.IX Item "Indented Here-docs" +The here-doc modifier \f(CW\*(C`~\*(C'\fR allows you to indent your here-docs to make +the code more readable: +.Sp +.Vb 5 +\& if ($some_var) { +\& print <<~EOF; +\& This is a here\-doc +\& EOF +\& } +.Ve +.Sp +This will print... +.Sp +.Vb 1 +\& This is a here\-doc +.Ve +.Sp +\&...with no leading whitespace. +.Sp +The line containing the delimiter that marks the end of the here-doc +determines the indentation template for the whole thing. Compilation +croaks if any non-empty line inside the here-doc does not begin with the +precise indentation of the terminating line. (An empty line consists of +the single character "\en".) For example, suppose the terminating line +begins with a tab character followed by 4 space characters. 
Every +non-empty line in the here-doc must begin with a tab followed by 4 +spaces. They are stripped from each line, and any leading white space +remaining on a line serves as the indentation for that line. Currently, +only the TAB and SPACE characters are treated as whitespace for this +purpose. Tabs and spaces may be mixed, but are matched exactly; tabs +remain tabs and are not expanded. +.Sp +Additional beginning whitespace (beyond what preceded the +delimiter) will be preserved: +.Sp +.Vb 5 +\& print <<~EOF; +\& This text is not indented +\& This text is indented with two spaces +\& This text is indented with two tabs +\& EOF +.Ve +.Sp +Finally, the modifier may be used with all of the forms +mentioned above: +.Sp +.Vb 4 +\& <<~\eEOF; +\& <<~\*(AqEOF\*(Aq +\& <<~"EOF" +\& <<~\`EOF\` +.Ve +.Sp +And whitespace may be used between the \f(CW\*(C`~\*(C'\fR and quoted delimiters: +.Sp +.Vb 1 +\& <<~ \*(AqEOF\*(Aq; # ... "EOF", \`EOF\` +.Ve +.RE +.RS 4 +.Sp +It is possible to stack multiple here-docs in a row: +.Sp +.Vb 5 +\& print <<"foo", <<"bar"; # you can stack them +\& I said foo. +\& foo +\& I said bar. +\& bar +\& +\& myfunc(<< "THIS", 23, <<\*(AqTHAT\*(Aq); +\& Here\*(Aqs a line +\& or two. +\& THIS +\& and here\*(Aqs another. +\& THAT +.Ve +.Sp +Just don't forget that you have to put a semicolon on the end +to finish the statement, as Perl doesn't know you're not going to +try to do this: +.Sp +.Vb 4 +\& print <<ABC +\& 179231 +\& ABC +\& + 20; +.Ve +.Sp +If you want to remove the line terminator from your here-docs, +use \f(CWchomp()\fR. +.Sp +.Vb 3 +\& chomp($string = <<\*(AqEND\*(Aq); +\& This is a string. +\& END +.Ve +.Sp +If you want your here-docs to be indented with the rest of the code, +use the \f(CW\*(C`<<~FOO\*(C'\fR construct described under "Indented Here-docs": +.Sp +.Vb 4 +\& $quote = <<~\*(AqFINIS\*(Aq; +\& The Road goes ever on and on, +\& down from the door where it began. 
+\& FINIS +.Ve +.Sp +If you use a here-doc within a delimited construct, such as in \f(CW\*(C`s///eg\*(C'\fR, +the quoted material must still come on the line following the +\&\f(CW\*(C`<<FOO\*(C'\fR marker, which means it may be inside the delimited +construct: +.Sp +.Vb 4 +\& s/this/<<E . \*(Aqthat\*(Aq +\& the other +\& E +\& . \*(Aqmore \*(Aq/eg; +.Ve +.Sp +It works this way as of Perl 5.18. Historically, it was inconsistent, and +you would have to write +.Sp +.Vb 4 +\& s/this/<<E . \*(Aqthat\*(Aq +\& . \*(Aqmore \*(Aq/eg; +\& the other +\& E +.Ve +.Sp +outside of string evals. +.Sp +Additionally, quoting rules for the end-of-string identifier are +unrelated to Perl's quoting rules. \f(CWq()\fR, \f(CWqq()\fR, and the like are not +supported in place of \f(CW\*(Aq\*(Aq\fR and \f(CW""\fR, and the only interpolation is for +backslashing the quoting character: +.Sp +.Vb 3 +\& print << "abc\e"def"; +\& testing... +\& abc"def +.Ve +.Sp +Finally, quoted strings cannot span multiple lines. The general rule is +that the identifier must be a string literal. Stick with that, and you +should be safe. +.RE +.SS "Gory details of parsing quoted constructs" +.IX Xref "quote, gory details" +.IX Subsection "Gory details of parsing quoted constructs" +When presented with something that might have several different +interpretations, Perl uses the \fBDWIM\fR (that's "Do What I Mean") +principle to pick the most probable interpretation. This strategy +is so successful that Perl programmers often do not suspect the +ambivalence of what they write. But from time to time, Perl's +notions differ substantially from what the author honestly meant. +.PP +This section hopes to clarify how Perl handles quoted constructs. +Although the most common reason to learn this is to unravel labyrinthine +regular expressions, because the initial steps of parsing are the +same for all quoting operators, they are all discussed together. 
+.PP +The most important Perl parsing rule is the first one discussed +below: when processing a quoted construct, Perl first finds the end +of that construct, then interprets its contents. If you understand +this rule, you may skip the rest of this section on the first +reading. The other rules are likely to contradict the user's +expectations much less frequently than this first one. +.PP +Some passes discussed below are performed concurrently, but because +their results are the same, we consider them individually. For different +quoting constructs, Perl performs different numbers of passes, from +one to four, but these passes are always performed in the same order. +.IP "Finding the end" 4 +.IX Item "Finding the end" +The first pass is finding the end of the quoted construct. This results +in saving to a safe location a copy of the text (between the starting +and ending delimiters), normalized as necessary to avoid needing to know +what the original delimiters were. +.Sp +If the construct is a here-doc, the ending delimiter is a line +that has a terminating string as the content. Therefore \f(CW\*(C`<<EOF\*(C'\fR is +terminated by \f(CW\*(C`EOF\*(C'\fR immediately followed by \f(CW"\en"\fR and starting +from the first column of the terminating line. +When searching for the terminating line of a here-doc, nothing +is skipped. In other words, lines after the here-doc syntax +are compared with the terminating string line by line. +.Sp +For the constructs except here-docs, single characters are used as starting +and ending delimiters. If the starting delimiter is an opening punctuation +(that is \f(CW\*(C`(\*(C'\fR, \f(CW\*(C`[\*(C'\fR, \f(CW\*(C`{\*(C'\fR, or \f(CW\*(C`<\*(C'\fR), the ending delimiter is the +corresponding closing punctuation (that is \f(CW\*(C`)\*(C'\fR, \f(CW\*(C`]\*(C'\fR, \f(CW\*(C`}\*(C'\fR, or \f(CW\*(C`>\*(C'\fR). 
+If the starting delimiter is an unpaired character like \f(CW\*(C`/\*(C'\fR or a closing +punctuation, the ending delimiter is the same as the starting delimiter. +Therefore a \f(CW\*(C`/\*(C'\fR terminates a \f(CW\*(C`qq//\*(C'\fR construct, while a \f(CW\*(C`]\*(C'\fR terminates +both \f(CW\*(C`qq[]\*(C'\fR and \f(CW\*(C`qq]]\*(C'\fR constructs. +.Sp +When searching for single-character delimiters, escaped delimiters +and \f(CW\*(C`\e\e\*(C'\fR are skipped. For example, while searching for terminating \f(CW\*(C`/\*(C'\fR, +combinations of \f(CW\*(C`\e\e\*(C'\fR and \f(CW\*(C`\e/\*(C'\fR are skipped. If the delimiters are +bracketing, nested pairs are also skipped. For example, while searching +for a closing \f(CW\*(C`]\*(C'\fR paired with the opening \f(CW\*(C`[\*(C'\fR, combinations of \f(CW\*(C`\e\e\*(C'\fR, \f(CW\*(C`\e]\*(C'\fR, +and \f(CW\*(C`\e[\*(C'\fR are all skipped, and nested \f(CW\*(C`[\*(C'\fR and \f(CW\*(C`]\*(C'\fR are skipped as well. +However, when backslashes are used as the delimiters (like \f(CW\*(C`qq\e\e\*(C'\fR and +\&\f(CW\*(C`tr\e\e\e\*(C'\fR), nothing is skipped. +During the search for the end, backslashes that escape delimiters or +other backslashes are removed (exactly speaking, they are not copied to the +safe location). +.Sp +For constructs with three-part delimiters (\f(CW\*(C`s///\*(C'\fR, \f(CW\*(C`y///\*(C'\fR, and +\&\f(CW\*(C`tr///\*(C'\fR), the search is repeated once more. +If the first delimiter is not an opening punctuation, the three delimiters must +be the same, such as \f(CW\*(C`s!!!\*(C'\fR and \f(CW\*(C`tr)))\*(C'\fR, +in which case the second delimiter +terminates the left part and starts the right part at once. +If the left part is delimited by bracketing punctuation (that is \f(CW\*(C`()\*(C'\fR, +\&\f(CW\*(C`[]\*(C'\fR, \f(CW\*(C`{}\*(C'\fR, or \f(CW\*(C`<>\*(C'\fR), the right part needs another pair of +delimiters such as \f(CW\*(C`s(){}\*(C'\fR and \f(CW\*(C`tr[]//\*(C'\fR. 
In these cases, whitespace +and comments are allowed between the two parts, although the comment must follow +at least one whitespace character; otherwise a character expected as the +start of the comment may be regarded as the starting delimiter of the right part. +.Sp +During this search no attention is paid to the semantics of the construct. +Thus: +.Sp +.Vb 1 +\& "$hash{"$foo/$bar"}" +.Ve +.Sp +or: +.Sp +.Vb 3 +\& m/ +\& bar # NOT a comment, this slash / terminated m//! +\& /x +.Ve +.Sp +do not form legal quoted expressions. The quoted part ends on the +first \f(CW\*(C`"\*(C'\fR and \f(CW\*(C`/\*(C'\fR, and the rest happens to be a syntax error. +Because the slash that terminated \f(CW\*(C`m//\*(C'\fR was followed by a \f(CW\*(C`SPACE\*(C'\fR, +the example above is not \f(CW\*(C`m//x\*(C'\fR, but rather \f(CW\*(C`m//\*(C'\fR with no \f(CW\*(C`/x\*(C'\fR +modifier. So the embedded \f(CW\*(C`#\*(C'\fR is interpreted as a literal \f(CW\*(C`#\*(C'\fR. +.Sp +Also no attention is paid to \f(CW\*(C`\ec\e\*(C'\fR (multichar control char syntax) during +this search. Thus the second \f(CW\*(C`\e\*(C'\fR in \f(CW\*(C`qq/\ec\e/\*(C'\fR is interpreted as a part +of \f(CW\*(C`\e/\*(C'\fR, and the following \f(CW\*(C`/\*(C'\fR is not recognized as a delimiter. +Instead, use \f(CW\*(C`\e034\*(C'\fR or \f(CW\*(C`\ex1c\*(C'\fR at the end of quoted constructs. +.IP Interpolation 4 +.IX Xref "interpolation" +.IX Item "Interpolation" +The next step is interpolation in the text obtained, which is now +delimiter-independent. There are multiple cases. +.RS 4 +.ie n .IP """<<\*(AqEOF\*(Aq""" 4 +.el .IP \f(CW<<\*(AqEOF\*(Aq\fR 4 +.IX Item "<<EOF" +No interpolation is performed. +Note that the combination \f(CW\*(C`\e\e\*(C'\fR is left intact, since escaped delimiters +are not available for here-docs. 
+.ie n .IP """m\*(Aq\*(Aq"", the pattern of ""s\*(Aq\*(Aq\*(Aq""" 4 +.el .IP "\f(CWm\*(Aq\*(Aq\fR, the pattern of \f(CWs\*(Aq\*(Aq\*(Aq\fR" 4 +.IX Item "m, the pattern of s" +No interpolation is performed at this stage. +Any backslashed sequences including \f(CW\*(C`\e\e\*(C'\fR are treated at the stage +of "Parsing regular expressions". +.ie n .IP "\*(Aq\*(Aq, ""q//"", ""tr\*(Aq\*(Aq\*(Aq"", ""y\*(Aq\*(Aq\*(Aq"", the replacement of ""s\*(Aq\*(Aq\*(Aq""" 4 +.el .IP "\f(CW\*(Aq\*(Aq\fR, \f(CWq//\fR, \f(CWtr\*(Aq\*(Aq\*(Aq\fR, \f(CWy\*(Aq\*(Aq\*(Aq\fR, the replacement of \f(CWs\*(Aq\*(Aq\*(Aq\fR" 4 +.IX Item ", q//, tr, y, the replacement of s" +The only interpolation is removal of \f(CW\*(C`\e\*(C'\fR from pairs of \f(CW\*(C`\e\e\*(C'\fR. +Therefore \f(CW"\-"\fR in \f(CW\*(C`tr\*(Aq\*(Aq\*(Aq\*(C'\fR and \f(CW\*(C`y\*(Aq\*(Aq\*(Aq\*(C'\fR is treated literally +as a hyphen and no character range is available. +\&\f(CW\*(C`\e1\*(C'\fR in the replacement of \f(CW\*(C`s\*(Aq\*(Aq\*(Aq\*(C'\fR does not work as \f(CW$1\fR. +.ie n .IP """tr///"", ""y///""" 4 +.el .IP "\f(CWtr///\fR, \f(CWy///\fR" 4 +.IX Item "tr///, y///" +No variable interpolation occurs. String modifying combinations for +case and quoting such as \f(CW\*(C`\eQ\*(C'\fR, \f(CW\*(C`\eU\*(C'\fR, and \f(CW\*(C`\eE\*(C'\fR are not recognized. +The other escape sequences such as \f(CW\*(C`\e200\*(C'\fR and \f(CW\*(C`\et\*(C'\fR and backslashed +characters such as \f(CW\*(C`\e\e\*(C'\fR and \f(CW\*(C`\e\-\*(C'\fR are converted to appropriate literals. +The character \f(CW"\-"\fR is treated specially and therefore \f(CW\*(C`\e\-\*(C'\fR is treated +as a literal \f(CW"\-"\fR. 
+.ie n .IP """"", \`\`, ""qq//"", ""qx//"", ""<file*glob>"", ""<<""EOF""""" 4 +.el .IP "\f(CW""""\fR, \f(CW\`\`\fR, \f(CWqq//\fR, \f(CWqx//\fR, \f(CW<file*glob>\fR, \f(CW<<""EOF""\fR" 4 +.IX Item """"", , qq//, qx//, <file*glob>, <<""EOF""" +\&\f(CW\*(C`\eQ\*(C'\fR, \f(CW\*(C`\eU\*(C'\fR, \f(CW\*(C`\eu\*(C'\fR, \f(CW\*(C`\eL\*(C'\fR, \f(CW\*(C`\el\*(C'\fR, \f(CW\*(C`\eF\*(C'\fR (possibly paired with \f(CW\*(C`\eE\*(C'\fR) are +converted to corresponding Perl constructs. Thus, \f(CW"$foo\eQbaz$bar"\fR +is converted to \f(CW\*(C`$foo\ .\ (quotemeta("baz"\ .\ $bar))\*(C'\fR internally. +The other escape sequences such as \f(CW\*(C`\e200\*(C'\fR and \f(CW\*(C`\et\*(C'\fR and backslashed +characters such as \f(CW\*(C`\e\e\*(C'\fR and \f(CW\*(C`\e\-\*(C'\fR are replaced with appropriate +expansions. +.Sp +Let it be stressed that \fIwhatever falls between \fR\f(CI\*(C`\eQ\*(C'\fR\fI and \fR\f(CI\*(C`\eE\*(C'\fR +is interpolated in the usual way. Something like \f(CW"\eQ\e\eE"\fR has +no \f(CW\*(C`\eE\*(C'\fR inside. Instead, it has \f(CW\*(C`\eQ\*(C'\fR, \f(CW\*(C`\e\e\*(C'\fR, and \f(CW\*(C`E\*(C'\fR, so the +result is the same as for \f(CW"\e\e\e\eE"\fR. As a general rule, backslashes +between \f(CW\*(C`\eQ\*(C'\fR and \f(CW\*(C`\eE\*(C'\fR may lead to counterintuitive results. So, +\&\f(CW"\eQ\et\eE"\fR is converted to \f(CWquotemeta("\et")\fR, which is the same +as \f(CW"\e\e\et"\fR (since TAB is not alphanumeric). Note also that: +.Sp +.Vb 2 +\& $str = \*(Aq\et\*(Aq; +\& return "\eQ$str"; +.Ve +.Sp +may be closer to the conjectural \fIintention\fR of the writer of \f(CW"\eQ\et\eE"\fR. +.Sp +Interpolated scalars and arrays are converted internally to the \f(CW\*(C`join\*(C'\fR and +\&\f(CW"."\fR catenation operations. Thus, \f(CW"$foo\ XXX\ \*(Aq@arr\*(Aq"\fR becomes: +.Sp +.Vb 1 +\& $foo . " XXX \*(Aq" . (join $", @arr) . "\*(Aq"; +.Ve +.Sp +All operations above are performed simultaneously, left to right. 
+.Sp +Because the result of \f(CW"\eQ\ \fR\f(CISTRING\fR\f(CW\ \eE"\fR has all metacharacters +quoted, there is no way to insert a literal \f(CW\*(C`$\*(C'\fR or \f(CW\*(C`@\*(C'\fR inside a +\&\f(CW\*(C`\eQ\eE\*(C'\fR pair. If protected by \f(CW\*(C`\e\*(C'\fR, \f(CW\*(C`$\*(C'\fR will be quoted to become +\&\f(CW"\e\e\e$"\fR; if not, it is interpreted as the start of an interpolated +scalar. +.Sp +Note also that the interpolation code needs to make a decision on +where the interpolated scalar ends. For instance, whether +\&\f(CW"a\ $x\ \->\ {c}"\fR really means: +.Sp +.Vb 1 +\& "a " . $x . " \-> {c}"; +.Ve +.Sp +or: +.Sp +.Vb 1 +\& "a " . $x \-> {c}; +.Ve +.Sp +Most of the time, Perl picks the longest possible text that does not include +spaces between components and which contains matching braces or +brackets. Because the outcome may be determined by voting based +on heuristic estimators, the result is not strictly predictable. +Fortunately, it's usually correct for ambiguous cases. +.ie n .IP "The replacement of ""s///""" 4 +.el .IP "The replacement of \f(CWs///\fR" 4 +.IX Item "The replacement of s///" +Processing of \f(CW\*(C`\eQ\*(C'\fR, \f(CW\*(C`\eU\*(C'\fR, \f(CW\*(C`\eu\*(C'\fR, \f(CW\*(C`\eL\*(C'\fR, \f(CW\*(C`\el\*(C'\fR, \f(CW\*(C`\eF\*(C'\fR and interpolation +happens as with \f(CW\*(C`qq//\*(C'\fR constructs. +.Sp +It is at this step that \f(CW\*(C`\e1\*(C'\fR is begrudgingly converted to \f(CW$1\fR in +the replacement text of \f(CW\*(C`s///\*(C'\fR, in order to correct the incorrigible +\&\fIsed\fR hackers who haven't picked up the saner idiom yet. A warning +is emitted if the \f(CW\*(C`use\ warnings\*(C'\fR pragma or the \fB\-w\fR command-line flag +(that is, the \f(CW$^W\fR variable) was set. 
+.ie n .IP """RE"" in ""m?RE?"", ""/RE/"", ""m/RE/"", ""s/RE/foo/""," 4 +.el .IP "\f(CWRE\fR in \f(CWm?RE?\fR, \f(CW/RE/\fR, \f(CWm/RE/\fR, \f(CWs/RE/foo/\fR," 4 +.IX Item "RE in m?RE?, /RE/, m/RE/, s/RE/foo/," +Processing of \f(CW\*(C`\eQ\*(C'\fR, \f(CW\*(C`\eU\*(C'\fR, \f(CW\*(C`\eu\*(C'\fR, \f(CW\*(C`\eL\*(C'\fR, \f(CW\*(C`\el\*(C'\fR, \f(CW\*(C`\eF\*(C'\fR, \f(CW\*(C`\eE\*(C'\fR, +and interpolation happens (almost) as with \f(CW\*(C`qq//\*(C'\fR constructs. +.Sp +Processing of \f(CW\*(C`\eN{...}\*(C'\fR is also done here, and compiled into an intermediate +form for the regex compiler. (This is because, as mentioned below, the regex +compilation may be done at execution time, and \f(CW\*(C`\eN{...}\*(C'\fR is a compile-time +construct.) +.Sp +However any other combinations of \f(CW\*(C`\e\*(C'\fR followed by a character +are not substituted but only skipped, in order to parse them +as regular expressions at the following step. +As \f(CW\*(C`\ec\*(C'\fR is skipped at this step, \f(CW\*(C`@\*(C'\fR of \f(CW\*(C`\ec@\*(C'\fR in RE is possibly +treated as an array symbol (for example \f(CW@foo\fR), +even though the same text in \f(CW\*(C`qq//\*(C'\fR gives interpolation of \f(CW\*(C`\ec@\*(C'\fR. +.Sp +Code blocks such as \f(CW\*(C`(?{BLOCK})\*(C'\fR are handled by temporarily passing control +back to the perl parser, in a similar way that an interpolated array +subscript expression such as \f(CW"foo$array[1+f("[xyz")]bar"\fR would be. +.Sp +Moreover, inside \f(CW\*(C`(?{BLOCK})\*(C'\fR, \f(CW\*(C`(?#\ comment\ )\*(C'\fR, and +a \f(CW\*(C`#\*(C'\fR\-comment in a \f(CW\*(C`/x\*(C'\fR\-regular expression, no processing is +performed whatsoever. This is the first step at which the presence +of the \f(CW\*(C`/x\*(C'\fR modifier is relevant. 
+.Sp +Interpolation in patterns has several quirks: \f(CW$|\fR, \f(CW$(\fR, \f(CW$)\fR, \f(CW\*(C`@+\*(C'\fR +and \f(CW\*(C`@\-\*(C'\fR are not interpolated, and constructs \f(CW$var[SOMETHING]\fR are +voted (by several different estimators) to be either an array element +or \f(CW$var\fR followed by an RE alternative. This is where the notation +\&\f(CW\*(C`${arr[$bar]}\*(C'\fR comes handy: \f(CW\*(C`/${arr[0\-9]}/\*(C'\fR is interpreted as +array element \f(CW\-9\fR, not as a regular expression from the variable +\&\f(CW$arr\fR followed by a digit, which would be the interpretation of +\&\f(CW\*(C`/$arr[0\-9]/\*(C'\fR. Since voting among different estimators may occur, +the result is not predictable. +.Sp +The lack of processing of \f(CW\*(C`\e\e\*(C'\fR creates specific restrictions on +the post-processed text. If the delimiter is \f(CW\*(C`/\*(C'\fR, one cannot get +the combination \f(CW\*(C`\e/\*(C'\fR into the result of this step. \f(CW\*(C`/\*(C'\fR will +finish the regular expression, \f(CW\*(C`\e/\*(C'\fR will be stripped to \f(CW\*(C`/\*(C'\fR on +the previous step, and \f(CW\*(C`\e\e/\*(C'\fR will be left as is. Because \f(CW\*(C`/\*(C'\fR is +equivalent to \f(CW\*(C`\e/\*(C'\fR inside a regular expression, this does not +matter unless the delimiter happens to be character special to the +RE engine, such as in \f(CW\*(C`s*foo*bar*\*(C'\fR, \f(CW\*(C`m[foo]\*(C'\fR, or \f(CW\*(C`m?foo?\*(C'\fR; or an +alphanumeric char, as in: +.Sp +.Vb 1 +\& m m ^ a \es* b mmx; +.Ve +.Sp +In the RE above, which is intentionally obfuscated for illustration, the +delimiter is \f(CW\*(C`m\*(C'\fR, the modifier is \f(CW\*(C`mx\*(C'\fR, and after delimiter-removal the +RE is the same as for \f(CW\*(C`m/\ ^\ a\ \es*\ b\ /mx\*(C'\fR. There's more than one +reason you're encouraged to restrict your delimiters to non-alphanumeric, +non-whitespace choices. +.RE +.RS 4 +.Sp +This step is the last one for all constructs except regular expressions, +which are processed further. 
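A few of the interpolation cases described in this step can be checked directly. The following is a minimal sketch, not part of the original manual; the variable names are made up for illustration. It compares the lengths of the resulting strings so that the effect of each rule is visible without relying on how a terminal renders backslashes and tabs.

```perl
use strict;
use warnings;

# "\Q\t\E": the \t escape becomes a TAB, and quotemeta() then puts a
# backslash in front of it because TAB is not a word character.
my $quoted_tab = "\Q\t\E";

# "\Q$str\E": $str holds two characters (backslash, 't'); quotemeta()
# escapes the backslash, giving three characters.
my $str = '\t';
my $quoted_var = "\Q$str\E";

# Array interpolation behaves like join($", @arr) glued in with ".".
my $foo = 'head';
my @arr = (1, 2, 3);
my $interpolated = "$foo XXX '@arr'";
my $spelled_out  = $foo . " XXX '" . join($", @arr) . "'";

# A single-quoted here-doc does no processing at all: the two
# backslashes stay two, whereas '' collapses \\ to one backslash.
my $heredoc = <<'EOT';
a\\b
EOT
my $single = 'a\\b';

printf "%d %d %d %d\n",
    length($quoted_tab), length($quoted_var),
    length($heredoc), length($single);
print $interpolated eq $spelled_out ? "same\n" : "different\n";
```

The here-doc value is two characters longer than the single-quoted string: one for the backslash that was kept, one for the trailing newline.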
+.RE +.IP "Parsing regular expressions" 4 +.IX Xref "regexp, parse" +.IX Item "Parsing regular expressions" +Previous steps were performed during the compilation of Perl code, +but this one happens at run time, although it may be optimized to +be calculated at compile time if appropriate. After preprocessing +described above, and possibly after evaluation if concatenation, +joining, casing translation, or metaquoting are involved, the +resulting \fIstring\fR is passed to the RE engine for compilation. +.Sp +Whatever happens in the RE engine might be better discussed in perlre, +but for the sake of continuity, we shall do so here. +.Sp +This is another step where the presence of the \f(CW\*(C`/x\*(C'\fR modifier is +relevant. The RE engine scans the string from left to right and +converts it into a finite automaton. +.Sp +Backslashed characters are either replaced with corresponding +literal strings (as with \f(CW\*(C`\e{\*(C'\fR), or else they generate special nodes +in the finite automaton (as with \f(CW\*(C`\eb\*(C'\fR). Characters special to the +RE engine (such as \f(CW\*(C`|\*(C'\fR) generate corresponding nodes or groups of +nodes. \f(CW\*(C`(?#...)\*(C'\fR comments are ignored. All the rest is either +converted to literal strings to match, or else is ignored (as is +whitespace and \f(CW\*(C`#\*(C'\fR\-style comments if \f(CW\*(C`/x\*(C'\fR is present). +.Sp +Parsing of the bracketed character class construct, \f(CW\*(C`[...]\*(C'\fR, is +rather different than the rule used for the rest of the pattern. +The terminator of this construct is found using the same rules as +for finding the terminator of a \f(CW\*(C`{}\*(C'\fR\-delimited construct, the only +exception being that \f(CW\*(C`]\*(C'\fR immediately following \f(CW\*(C`[\*(C'\fR is treated as +though preceded by a backslash. 
+.Sp +The terminator of runtime \f(CW\*(C`(?{...})\*(C'\fR is found by temporarily switching +control to the perl parser, which should stop at the point where the +logically balancing terminating \f(CW\*(C`}\*(C'\fR is found. +.Sp +It is possible to inspect both the string given to RE engine and the +resulting finite automaton. See the arguments \f(CW\*(C`debug\*(C'\fR/\f(CW\*(C`debugcolor\*(C'\fR +in the \f(CW\*(C`use\ re\*(C'\fR pragma, as well as Perl's \fB\-Dr\fR command-line +switch documented in "Command Switches" in perlrun. +.IP "Optimization of regular expressions" 4 +.IX Xref "regexp, optimization" +.IX Item "Optimization of regular expressions" +This step is listed for completeness only. Since it does not change +semantics, details of this step are not documented and are subject +to change without notice. This step is performed over the finite +automaton that was generated during the previous pass. +.Sp +It is at this stage that \f(CWsplit()\fR silently optimizes \f(CW\*(C`/^/\*(C'\fR to +mean \f(CW\*(C`/^/m\*(C'\fR. +.SS "I/O Operators" +.IX Xref "operator, i o operator, io io while filehandle <> <<>> @ARGV" +.IX Subsection "I/O Operators" +There are several I/O operators you should know about. +.PP +A string enclosed by backticks (grave accents) first undergoes +double-quote interpolation. It is then interpreted as an external +command, and the output of that command is the value of the +backtick string, like in a shell. In scalar context, a single string +consisting of all output is returned. In list context, a list of +values is returned, one per line of output. (You can set \f(CW$/\fR to use +a different line terminator.) The command is executed each time the +pseudo-literal is evaluated. The status value of the command is +returned in \f(CW$?\fR (see perlvar for the interpretation of \f(CW$?\fR). +Unlike in \fBcsh\fR, no translation is done on the return data\-\-newlines +remain newlines. 
Unlike in any of the shells, single quotes do not +hide variable names in the command from interpretation. To pass a +literal dollar-sign through to the shell you need to hide it with a +backslash. The generalized form of backticks is \f(CW\*(C`qx//\*(C'\fR, or you can +call the "readpipe" in perlfunc function. (Because +backticks always undergo shell expansion as well, see perlsec for +security concerns.) +.IX Xref "qx ` `` backtick glob" +.PP +In scalar context, evaluating a filehandle in angle brackets yields +the next line from that file (the newline, if any, included), or +\&\f(CW\*(C`undef\*(C'\fR at end-of-file or on error. When \f(CW$/\fR is set to \f(CW\*(C`undef\*(C'\fR +(sometimes known as file-slurp mode) and the file is empty, it +returns \f(CW\*(Aq\*(Aq\fR the first time, followed by \f(CW\*(C`undef\*(C'\fR subsequently. +.PP +Ordinarily you must assign the returned value to a variable, but +there is one situation where an automatic assignment happens. If +and only if the input symbol is the only thing inside the conditional +of a \f(CW\*(C`while\*(C'\fR statement (even if disguised as a \f(CWfor(;;)\fR loop), +the value is automatically assigned to the global variable \f(CW$_\fR, +destroying whatever was there previously. (This may seem like an +odd thing to you, but you'll use the construct in almost every Perl +script you write.) The \f(CW$_\fR variable is not implicitly localized. +You'll have to put a \f(CW\*(C`local\ $_;\*(C'\fR before the loop if you want that +to happen. Furthermore, if the input symbol or an explicit assignment +of the input symbol to a scalar is used as a \f(CW\*(C`while\*(C'\fR/\f(CW\*(C`for\*(C'\fR condition, +then the condition actually tests for definedness of the expression's +value, not for its regular truth value. 
+.PP +Thus the following lines are equivalent: +.PP +.Vb 7 +\& while (defined($_ = <STDIN>)) { print; } +\& while ($_ = <STDIN>) { print; } +\& while (<STDIN>) { print; } +\& for (;<STDIN>;) { print; } +\& print while defined($_ = <STDIN>); +\& print while ($_ = <STDIN>); +\& print while <STDIN>; +.Ve +.PP +This also behaves similarly, but assigns to a lexical variable +instead of to \f(CW$_\fR: +.PP +.Vb 1 +\& while (my $line = <STDIN>) { print $line } +.Ve +.PP +In these loop constructs, the assigned value (whether assignment +is automatic or explicit) is then tested to see whether it is +defined. The defined test avoids problems where the line has a string +value that would be treated as false by Perl; for example a "" or +a \f(CW"0"\fR with no trailing newline. If you really mean for such values +to terminate the loop, they should be tested for explicitly: +.PP +.Vb 2 +\& while (($_ = <STDIN>) ne \*(Aq0\*(Aq) { ... } +\& while (<STDIN>) { last unless $_; ... } +.Ve +.PP +In other boolean contexts, \f(CW\*(C`<\fR\f(CIFILEHANDLE\fR\f(CW>\*(C'\fR without an +explicit \f(CW\*(C`defined\*(C'\fR test or comparison elicits a warning if the +\&\f(CW\*(C`use\ warnings\*(C'\fR pragma or the \fB\-w\fR +command-line switch (the \f(CW$^W\fR variable) is in effect. +.PP +The filehandles STDIN, STDOUT, and STDERR are predefined. (The +filehandles \f(CW\*(C`stdin\*(C'\fR, \f(CW\*(C`stdout\*(C'\fR, and \f(CW\*(C`stderr\*(C'\fR will also work except +in packages, where they would be interpreted as local identifiers +rather than global.) Additional filehandles may be created with +the \f(CWopen()\fR function, amongst others. See perlopentut and +"open" in perlfunc for details on this. +.IX Xref "stdin stdout sterr" +.PP +If a \f(CW\*(C`<\fR\f(CIFILEHANDLE\fR\f(CW>\*(C'\fR is used in a context that is looking for +a list, a list comprising all input lines is returned, one line per +list element. It's easy to grow to a rather large data space this +way, so use with care. 
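The difference between scalar-context and list-context reads, and the implicit definedness test in a while condition, can be sketched as follows. This example is an illustration rather than text from the original manual: it reads from an in-memory filehandle (a reference to a scalar passed to open) so that it needs no external file, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# In-memory data; the last "line" is a bare "0" with no trailing newline,
# which is false as a plain Perl value.
my $data = "first\nsecond\n0";

open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

# Scalar context: one line per read. The while condition is implicitly
# tested with defined(), so the final "0" is still processed.
my @one_by_one;
while (my $line = <$fh>) {
    push @one_by_one, $line;
}
close($fh);

# List context: all remaining lines are returned at once.
open($fh, '<', \$data) or die "cannot reopen in-memory handle: $!";
my @all_at_once = <$fh>;
close($fh);

print scalar(@one_by_one), " lines read one by one\n";
print scalar(@all_at_once), " lines read in list context\n";
```

Both counts are the same here. On a large input the list-context form holds every line in memory at once, which is the cost the paragraph above warns about.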
+.PP +\&\f(CW\*(C`<\fR\f(CIFILEHANDLE\fR\f(CW>\*(C'\fR may also be spelled \f(CWreadline(*\fR\f(CIFILEHANDLE\fR\f(CW)\fR. +See "readline" in perlfunc. +.PP +The null filehandle \f(CW\*(C`<>\*(C'\fR (sometimes called the diamond operator) is +special: it can be used to emulate the +behavior of \fBsed\fR and \fBawk\fR, and any other Unix filter program +that takes a list of filenames, doing the same to each line +of input from all of them. Input from \f(CW\*(C`<>\*(C'\fR comes either from +standard input, or from each file listed on the command line. Here's +how it works: the first time \f(CW\*(C`<>\*(C'\fR is evaluated, the \f(CW@ARGV\fR array is +checked, and if it is empty, \f(CW$ARGV[0]\fR is set to \f(CW"\-"\fR, which when opened +gives you standard input. The \f(CW@ARGV\fR array is then processed as a list +of filenames. The loop +.PP +.Vb 3 +\& while (<>) { +\& ... # code for each line +\& } +.Ve +.PP +is equivalent to the following Perl-like pseudo code: +.PP +.Vb 7 +\& unshift(@ARGV, \*(Aq\-\*(Aq) unless @ARGV; +\& while ($ARGV = shift) { +\& open(ARGV, $ARGV); +\& while (<ARGV>) { +\& ... # code for each line +\& } +\& } +.Ve +.PP +except that it isn't so cumbersome to say, and will actually work. +It really does shift the \f(CW@ARGV\fR array and put the current filename +into the \f(CW$ARGV\fR variable. It also uses filehandle \fIARGV\fR +internally. \f(CW\*(C`<>\*(C'\fR is just a synonym for \f(CW\*(C`<ARGV>\*(C'\fR, which +is magical. (The pseudo code above doesn't work because it treats +\&\f(CW\*(C`<ARGV>\*(C'\fR as non-magical.) +.PP +Since the null filehandle uses the two argument form of "open" in perlfunc +it interprets special characters, so if you have a script like this: +.PP +.Vb 3 +\& while (<>) { +\& print; +\& } +.Ve +.PP +and call it with \f(CW\*(C`perl\ dangerous.pl\ \*(Aqrm\ \-rfv\ *|\*(Aq\*(C'\fR, it actually opens a +pipe, executes the \f(CW\*(C`rm\*(C'\fR command and reads \f(CW\*(C`rm\*(C'\fR's output from that pipe. 
+If you want all items in \f(CW@ARGV\fR to be interpreted as file names, you +can use the module \f(CW\*(C`ARGV::readonly\*(C'\fR from CPAN, or use the double +diamond bracket: +.PP +.Vb 3 +\& while (<<>>) { +\& print; +\& } +.Ve +.PP +Using double angle brackets inside of a while causes the open to use the +three argument form (with the second argument being \f(CW\*(C`<\*(C'\fR), so all +arguments in \f(CW\*(C`ARGV\*(C'\fR are treated as literal filenames (including \f(CW"\-"\fR). +(Note that for convenience, if you use \f(CW\*(C`<<>>\*(C'\fR and if \f(CW@ARGV\fR is +empty, it will still read from the standard input.) +.PP +You can modify \f(CW@ARGV\fR before the first \f(CW\*(C`<>\*(C'\fR as long as the array ends up +containing the list of filenames you really want. Line numbers (\f(CW$.\fR) +continue as though the input were one big happy file. See the example +in "eof" in perlfunc for how to reset line numbers on each file. +.PP +If you want to set \f(CW@ARGV\fR to your own list of files, go right ahead. +This sets \f(CW@ARGV\fR to all plain text files if no \f(CW@ARGV\fR was given: +.PP +.Vb 1 +\& @ARGV = grep { \-f && \-T } glob(\*(Aq*\*(Aq) unless @ARGV; +.Ve +.PP +You can even set them to pipe commands. For example, this automatically +filters compressed arguments through \fBgzip\fR: +.PP +.Vb 1 +\& @ARGV = map { /\e.(gz|Z)$/ ? "gzip \-dc < $_ |" : $_ } @ARGV; +.Ve +.PP +If you want to pass switches into your script, you can use one of the +\&\f(CW\*(C`Getopts\*(C'\fR modules or put a loop on the front like this: +.PP +.Vb 7 +\& while ($_ = $ARGV[0], /^\-/) { +\& shift; +\& last if /^\-\-$/; +\& if (/^\-D(.*)/) { $debug = $1 } +\& if (/^\-v/) { $verbose++ } +\& # ... # other switches +\& } +\& +\& while (<>) { +\& # ... # code for each line +\& } +.Ve +.PP +The \f(CW\*(C`<>\*(C'\fR symbol will return \f(CW\*(C`undef\*(C'\fR for end-of-file only once. 
+If you call it again after this, it will assume you are processing another +\&\f(CW@ARGV\fR list, and if you haven't set \f(CW@ARGV\fR, will read input from STDIN. +.PP +If what the angle brackets contain is a simple scalar variable (for example, +\&\f(CW$foo\fR), then that variable contains the name of the +filehandle to input from, or its typeglob, or a reference to the +same. For example: +.PP +.Vb 2 +\& $fh = \e*STDIN; +\& $line = <$fh>; +.Ve +.PP +If what's within the angle brackets is neither a filehandle nor a simple +scalar variable containing a filehandle name, typeglob, or typeglob +reference, it is interpreted as a filename pattern to be globbed, and +either a list of filenames or the next filename in the list is returned, +depending on context. This distinction is determined on syntactic +grounds alone. That means \f(CW\*(C`<$x>\*(C'\fR is always a \f(CWreadline()\fR from +an indirect handle, but \f(CW\*(C`<$hash{key}>\*(C'\fR is always a \f(CWglob()\fR. +That's because \f(CW$x\fR is a simple scalar variable, but \f(CW$hash{key}\fR is +not\-\-it's a hash element. Even \f(CW\*(C`<$x >\*(C'\fR (note the extra space) +is treated as \f(CW\*(C`glob("$x ")\*(C'\fR, not \f(CWreadline($x)\fR. +.PP +One level of double-quote interpretation is done first, but you can't +say \f(CW\*(C`<$foo>\*(C'\fR because that's an indirect filehandle as explained +in the previous paragraph. (In older versions of Perl, programmers +would insert curly brackets to force interpretation as a filename glob: +\&\f(CW\*(C`<${foo}>\*(C'\fR. These days, it's considered cleaner to call the +internal function directly as \f(CWglob($foo)\fR, which is probably the right +way to have done it in the first place.) 
For example: +.PP +.Vb 3 +\& while (<*.c>) { +\& chmod 0644, $_; +\& } +.Ve +.PP +is roughly equivalent to: +.PP +.Vb 5 +\& open(FOO, "echo *.c | tr \-s \*(Aq \et\er\ef\*(Aq \*(Aq\e\e012\e\e012\e\e012\e\e012\*(Aq|"); +\& while (<FOO>) { +\& chomp; +\& chmod 0644, $_; +\& } +.Ve +.PP +except that the globbing is actually done internally using the standard +\&\f(CW\*(C`File::Glob\*(C'\fR extension. Of course, the shortest way to do the above is: +.PP +.Vb 1 +\& chmod 0644, <*.c>; +.Ve +.PP +A (file)glob evaluates its (embedded) argument only when it is +starting a new list. All values must be read before it will start +over. In list context, this isn't important because you automatically +get them all anyway. However, in scalar context the operator returns +the next value each time it's called, or \f(CW\*(C`undef\*(C'\fR when the list has +run out. As with filehandle reads, an automatic \f(CW\*(C`defined\*(C'\fR is +generated when the glob occurs in the test part of a \f(CW\*(C`while\*(C'\fR, +because legal glob returns (for example, +a file called \fI0\fR) would otherwise +terminate the loop. Again, \f(CW\*(C`undef\*(C'\fR is returned only once. So if +you're expecting a single value from a glob, it is much better to +say +.PP +.Vb 1 +\& ($file) = <blurch*>; +.Ve +.PP +than +.PP +.Vb 1 +\& $file = <blurch*>; +.Ve +.PP +because the latter will alternate between returning a filename and +returning false. +.PP +If you're trying to do variable interpolation, it's definitely better +to use the \f(CWglob()\fR function, because the older notation can cause people +to become confused with the indirect filehandle notation. +.PP +.Vb 2 +\& @files = glob("$dir/*.[ch]"); +\& @files = glob($files[$i]); +.Ve +.PP +If an angle-bracket-based globbing expression is used as the condition of +a \f(CW\*(C`while\*(C'\fR or \f(CW\*(C`for\*(C'\fR loop, then it will be implicitly assigned to \f(CW$_\fR. 
+If either a globbing expression or an explicit assignment of a globbing +expression to a scalar is used as a \f(CW\*(C`while\*(C'\fR/\f(CW\*(C`for\*(C'\fR condition, then +the condition actually tests for definedness of the expression's value, +not for its regular truth value. +.SS "Constant Folding" +.IX Xref "constant folding folding" +.IX Subsection "Constant Folding" +Like C, Perl does a certain amount of expression evaluation at +compile time whenever it determines that all arguments to an +operator are static and have no side effects. In particular, string +concatenation happens at compile time between literals that don't do +variable substitution. Backslash interpolation also happens at +compile time. You can say +.PP +.Vb 3 +\& \*(AqNow is the time for all\*(Aq +\& . "\en" +\& . \*(Aqgood men to come to.\*(Aq +.Ve +.PP +and this all reduces to one string internally. Likewise, if +you say +.PP +.Vb 3 +\& foreach $file (@filenames) { +\& if (\-s $file > 5 + 100 * 2**16) { } +\& } +.Ve +.PP +the compiler precomputes the number which that expression +represents so that the interpreter won't have to. +.SS No-ops +.IX Xref "no-op nop" +.IX Subsection "No-ops" +Perl doesn't officially have a no-op operator, but the bare constants +\&\f(CW0\fR and \f(CW1\fR are special-cased not to produce a warning in void +context, so you can for example safely do +.PP +.Vb 1 +\& 1 while foo(); +.Ve +.SS "Bitwise String Operators" +.IX Xref "operator, bitwise, string &. |. ^. ~." +.IX Subsection "Bitwise String Operators" +Bitstrings of any size may be manipulated by the bitwise operators +(\f(CW\*(C`~ | & ^\*(C'\fR). +.PP +If the operands to a binary bitwise op are strings of different +sizes, \fB|\fR and \fB^\fR ops act as though the shorter operand had +additional zero bits on the right, while the \fB&\fR op acts as though +the longer operand were truncated to the length of the shorter. +The granularity for such extension or truncation is one or more +bytes. 
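The length rule for operands of different sizes can be sketched as follows. This is a minimal illustration, assuming ASCII and that the "bitwise" feature is not enabled, so that two string operands select the string form of the operators. The variable names are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

# Both operands hold strings that were never used as numbers, so Perl
# applies the string (bytewise) form of each operator.
# Assumptions: ASCII platform, "bitwise" feature not enabled.
my $long  = "AB";
my $short = "a";

my $or  = $long | $short;   # shorter operand acts as if padded with "\0"
my $xor = $long ^ $short;   # same padding rule as |
my $and = $long & $short;   # longer operand acts as if truncated

printf "|  gives %d byte(s)\n", length $or;
printf "^  gives %d byte(s)\n", length $xor;
printf "&  gives %d byte(s)\n", length $and;
```

The results of | and ^ are as long as the longer operand, while the result of & is as long as the shorter one.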
+.PP +.Vb 5 +\& # ASCII\-based examples +\& print "j p \en" ^ " a h"; # prints "JAPH\en" +\& print "JA" | " ph\en"; # prints "japh\en" +\& print "japh\enJunk" & \*(Aq_\|_\|_\|_\|_\*(Aq; # prints "JAPH\en"; +\& print \*(Aqp N$\*(Aq ^ " E<H\en"; # prints "Perl\en"; +.Ve +.PP +If you are intending to manipulate bitstrings, be certain that +you're supplying bitstrings: If an operand is a number, that will imply +a \fBnumeric\fR bitwise operation. You may explicitly show which type of +operation you intend by using \f(CW""\fR or \f(CW\*(C`0+\*(C'\fR, as in the examples below. +.PP +.Vb 4 +\& $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF) +\& $foo = \*(Aq150\*(Aq | 105; # yields 255 +\& $foo = 150 | \*(Aq105\*(Aq; # yields 255 +\& $foo = \*(Aq150\*(Aq | \*(Aq105\*(Aq; # yields string \*(Aq155\*(Aq (under ASCII) +\& +\& $baz = 0+$foo & 0+$bar; # both ops explicitly numeric +\& $biz = "$foo" ^ "$bar"; # both ops explicitly stringy +.Ve +.PP +This somewhat unpredictable behavior can be avoided with the "bitwise" +feature, new in Perl 5.22. You can enable it via \f(CWuse\ feature\ \*(Aqbitwise\*(Aq\fR or \f(CW\*(C`use v5.28\*(C'\fR. Before Perl 5.28, it used to emit a warning +in the \f(CW"experimental::bitwise"\fR category. Under this feature, the four +standard bitwise operators (\f(CW\*(C`~ | & ^\*(C'\fR) are always numeric. Adding a dot +after each operator (\f(CW\*(C`~. |. &. ^.\*(C'\fR) forces it to treat its operands as +strings: +.PP +.Vb 9 +\& use feature "bitwise"; +\& $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF) +\& $foo = \*(Aq150\*(Aq | 105; # yields 255 +\& $foo = 150 | \*(Aq105\*(Aq; # yields 255 +\& $foo = \*(Aq150\*(Aq | \*(Aq105\*(Aq; # yields 255 +\& $foo = 150 |. 105; # yields string \*(Aq155\*(Aq +\& $foo = \*(Aq150\*(Aq |. 
105; # yields string \*(Aq155\*(Aq +\& $foo = 150 |.\*(Aq105\*(Aq; # yields string \*(Aq155\*(Aq +\& $foo = \*(Aq150\*(Aq |.\*(Aq105\*(Aq; # yields string \*(Aq155\*(Aq +\& +\& $baz = $foo & $bar; # both operands numeric +\& $biz = $foo ^. $bar; # both operands stringy +.Ve +.PP +The assignment variants of these operators (\f(CW\*(C`&= |= ^= &.= |.= ^.=\*(C'\fR) +behave likewise under the feature. +.PP +It is a fatal error if an operand contains a character whose ordinal +value is above 0xFF, and hence not expressible except in UTF\-8. The +operation is performed on a non\-UTF\-8 copy for other operands encoded in +UTF\-8. See "Byte and Character Semantics" in perlunicode. +.PP +See "vec" in perlfunc for information on how to manipulate individual bits +in a bit vector. +.SS "Integer Arithmetic" +.IX Xref "integer" +.IX Subsection "Integer Arithmetic" +By default, Perl assumes that it must do most of its arithmetic in +floating point. But by saying +.PP +.Vb 1 +\& use integer; +.Ve +.PP +you may tell the compiler to use integer operations +(see integer for a detailed explanation) from here to the end of +the enclosing BLOCK. An inner BLOCK may countermand this by saying +.PP +.Vb 1 +\& no integer; +.Ve +.PP +which lasts until the end of that BLOCK. Note that this doesn't +mean everything is an integer, merely that Perl will use integer +operations for arithmetic, comparison, and bitwise operators. For +example, even under \f(CW\*(C`use\ integer\*(C'\fR, if you take the \f(CWsqrt(2)\fR, you'll +still get \f(CW1.4142135623731\fR or so. +.PP +Used on numbers, the bitwise operators (\f(CW\*(C`&\*(C'\fR \f(CW\*(C`|\*(C'\fR \f(CW\*(C`^\*(C'\fR \f(CW\*(C`~\*(C'\fR \f(CW\*(C`<<\*(C'\fR +\&\f(CW\*(C`>>\*(C'\fR) always produce integral results. (But see also +"Bitwise String Operators".) However, \f(CW\*(C`use\ integer\*(C'\fR still has meaning for +them. 
By default, their results are interpreted as unsigned integers, but +if \f(CW\*(C`use\ integer\*(C'\fR is in effect, their results are interpreted +as signed integers. For example, \f(CW\*(C`~0\*(C'\fR usually evaluates to a large +integral value. However, \f(CW\*(C`use\ integer;\ ~0\*(C'\fR is \f(CW\-1\fR on two's-complement +machines. +.SS "Floating-point Arithmetic" +.IX Subsection "Floating-point Arithmetic" + +.IX Xref "floating-point floating point float real" +.PP +While \f(CW\*(C`use\ integer\*(C'\fR provides integer-only arithmetic, there is no +analogous mechanism to provide automatic rounding or truncation to a +certain number of decimal places. For rounding to a certain number +of digits, \f(CWsprintf()\fR or \f(CWprintf()\fR is usually the easiest route. +See perlfaq4. +.PP +Floating-point numbers are only approximations to what a mathematician +would call real numbers. There are infinitely more reals than floats, +so some corners must be cut. For example: +.PP +.Vb 2 +\& printf "%.20g\en", 123456789123456789; +\& # produces 123456789123456784 +.Ve +.PP +Testing for exact floating-point equality or inequality is not a +good idea. Here's a (relatively expensive) work-around to compare +whether two floating-point numbers are equal to a particular number of +significant digits. See Knuth, volume II, for a more robust treatment of +this topic. +.PP +.Vb 7 +\& sub fp_equal { +\& my ($X, $Y, $POINTS) = @_; +\& my ($tX, $tY); +\& $tX = sprintf("%.${POINTS}g", $X); +\& $tY = sprintf("%.${POINTS}g", $Y); +\& return $tX eq $tY; +\& } +.Ve +.PP +The POSIX module (part of the standard perl distribution) implements +\&\f(CWceil()\fR, \f(CWfloor()\fR, and other mathematical and trigonometric functions. +The \f(CW\*(C`Math::Complex\*(C'\fR module (part of the standard perl distribution) +defines mathematical functions that work on both the reals and the +imaginary numbers. 
\f(CW\*(C`Math::Complex\*(C'\fR is not as efficient as POSIX, but +POSIX can't work with complex numbers. +.PP +Rounding in financial applications can have serious implications, and +the rounding method used should be specified precisely. In these +cases, it probably pays not to trust whichever system rounding is +being used by Perl, but to instead implement the rounding function you +need yourself. +.SS "Bigger Numbers" +.IX Xref "number, arbitrary precision" +.IX Subsection "Bigger Numbers" +The standard \f(CW\*(C`Math::BigInt\*(C'\fR, \f(CW\*(C`Math::BigRat\*(C'\fR, and +\&\f(CW\*(C`Math::BigFloat\*(C'\fR modules, +along with the \f(CW\*(C`bignum\*(C'\fR, \f(CW\*(C`bigint\*(C'\fR, and \f(CW\*(C`bigrat\*(C'\fR pragmas, provide +variable-precision arithmetic and overloaded operators, although +they're currently pretty slow. At the cost of some space and +considerable speed, they avoid the normal pitfalls associated with +limited-precision representations. +.PP +.Vb 5 +\& use 5.010; +\& use bigint; # easy interface to Math::BigInt +\& $x = 123456789123456789; +\& say $x * $x; +\& +15241578780673678515622620750190521 +.Ve +.PP +Or with rationals: +.PP +.Vb 8 +\& use 5.010; +\& use bigrat; +\& $x = 3/22; +\& $y = 4/6; +\& say "x/y is ", $x/$y; +\& say "x*y is ", $x*$y; +\& x/y is 9/44 +\& x*y is 1/11 +.Ve +.PP +Several modules let you calculate with unlimited or fixed precision +(bound only by memory and CPU time). There +are also some non-standard modules that +provide faster implementations via external C libraries. 
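The pragmas shown above change how numeric literals behave throughout their scope. Where that is too broad, the underlying classes can be used directly. The following is a minimal sketch, assuming only the core Math::BigInt and Math::BigRat modules, and it reuses the values from the examples above. The variable names are illustrative.

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigRat;

# Pass large values as strings so they are never rounded to a float first.
my $n  = Math::BigInt->new("123456789123456789");
my $sq = $n * $n;                  # overloaded *, result is a Math::BigInt

# Exact fractions; results are kept in lowest terms.
my $x = Math::BigRat->new("3/22");
my $y = Math::BigRat->new("4/6");
my $quotient = $x / $y;
my $product  = $x * $y;

print "n*n is $sq\n";
print "x/y is $quotient\n";
print "x*y is $product\n";
```

Only values created through these constructors use arbitrary precision here; ordinary numbers elsewhere in the program keep their usual speed and limits.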
+.PP
+Here is a short, but incomplete summary:
+.PP
+.Vb 10
+\&  Math::String           treat string sequences like numbers
+\&  Math::FixedPrecision   calculate with a fixed precision
+\&  Math::Currency         for currency calculations
+\&  Bit::Vector            manipulate bit vectors fast (uses C)
+\&  Math::BigIntFast       Bit::Vector wrapper for big numbers
+\&  Math::Pari             provides access to the Pari C library
+\&  Math::Cephes           uses the external Cephes C library (no
+\&                         big numbers)
+\&  Math::Cephes::Fraction fractions via the Cephes library
+\&  Math::GMP              another one using an external C library
+\&  Math::GMPz             an alternative interface to libgmp\*(Aqs big ints
+\&  Math::GMPq             an interface to libgmp\*(Aqs fraction numbers
+\&  Math::GMPf             an interface to libgmp\*(Aqs floating point numbers
+.Ve
+.PP
+Choose wisely.
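One way to hedge the choice is to ask Math::BigInt for a faster backend and let it fall back when that backend is missing. This is a sketch only. It assumes the "try" import option of Math::BigInt and names Math::BigInt::GMP as the optional backend, which is a separate CPAN module not listed in the summary above and may not be installed on a given system.

```perl
use strict;
use warnings;

# "try" requests the named backend but quietly falls back to the
# default pure-Perl library if it cannot be loaded.
# Assumption: Math::BigInt::GMP may or may not be installed.
use Math::BigInt try => 'GMP';

my $backend = Math::BigInt->config()->{lib};   # name of the library in use
my $big     = Math::BigInt->new(2)->bpow(100);

print "backend in use: $backend\n";
print "2**100 = $big\n";
```

The calling code is the same whichever backend is loaded, so the faster library can be treated as an optional dependency.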