Diffstat (limited to 'upstream/archlinux/man1/perldata.1perl')
-rw-r--r-- | upstream/archlinux/man1/perldata.1perl | 1482 |
1 files changed, 1482 insertions, 0 deletions
diff --git a/upstream/archlinux/man1/perldata.1perl b/upstream/archlinux/man1/perldata.1perl new file mode 100644 index 00000000..d8433020 --- /dev/null +++ b/upstream/archlinux/man1/perldata.1perl @@ -0,0 +1,1482 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLDATA 1perl" +.TH PERLDATA 1perl 2024-02-11 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. 
+.if n .ad l +.nh +.SH NAME +perldata \- Perl data types +.SH DESCRIPTION +.IX Header "DESCRIPTION" +.SS "Variable names" +.IX Xref "variable, name variable name data type type" +.IX Subsection "Variable names" +Perl has three built-in data types: scalars, arrays of scalars, and +associative arrays of scalars, known as "hashes". A scalar is a +single string (of any size, limited only by the available memory), +number, or a reference to something (which will be discussed +in perlref). Normal arrays are ordered lists of scalars indexed +by number, starting with 0. Hashes are unordered collections of scalar +values indexed by their associated string key. +.PP +Values are usually referred to by name, or through a named reference. +The first character of the name tells you to what sort of data +structure it refers. The rest of the name tells you the particular +value to which it refers. Usually this name is a single \fIidentifier\fR, +that is, a string beginning with a letter or underscore, and +containing letters, underscores, and digits. In some cases, it may +be a chain of identifiers, separated by \f(CW\*(C`::\*(C'\fR (or by the deprecated \f(CW\*(C`\*(Aq\*(C'\fR); +all but the last are interpreted as names of packages, +to locate the namespace in which to look up the final identifier +(see "Packages" in perlmod for details). For a more in-depth discussion +on identifiers, see "Identifier parsing". It's possible to +substitute for a simple identifier, an expression that produces a reference +to the value at runtime. This is described in more detail below +and in perlref. +.IX Xref "identifier" +.PP +Perl also has its own built-in variables whose names don't follow +these rules. They have strange names so they don't accidentally +collide with one of your normal variables. Strings that match +parenthesized parts of a regular expression are saved under names +containing only digits after the \f(CW\*(C`$\*(C'\fR (see perlop and perlre). 
+In addition, several special variables that provide windows into +the inner working of Perl have names containing punctuation characters. +These are documented in perlvar. +.IX Xref "variable, built-in" +.PP +Scalar values are always named with '$', even when referring to a +scalar that is part of an array or a hash. The '$' symbol works +semantically like the English word "the" in that it indicates a +single value is expected. +.IX Xref "scalar" +.PP +.Vb 4 +\& $days # the simple scalar value "days" +\& $days[28] # the 29th element of array @days +\& $days{\*(AqFeb\*(Aq} # the \*(AqFeb\*(Aq value from hash %days +\& $#days # the last index of array @days +.Ve +.PP +Entire arrays (and slices of arrays and hashes) are denoted by '@', +which works much as the word "these" or "those" does in English, +in that it indicates multiple values are expected. +.IX Xref "array" +.PP +.Vb 3 +\& @days # ($days[0], $days[1],... $days[n]) +\& @days[3,4,5] # same as ($days[3],$days[4],$days[5]) +\& @days{\*(Aqa\*(Aq,\*(Aqc\*(Aq} # same as ($days{\*(Aqa\*(Aq},$days{\*(Aqc\*(Aq}) +.Ve +.PP +Entire hashes are denoted by '%': +.IX Xref "hash" +.PP +.Vb 1 +\& %days # (key1, val1, key2, val2 ...) +.Ve +.PP +In addition, subroutines are named with an initial '&', though this +is optional when unambiguous, just as the word "do" is often redundant +in English. Symbol table entries can be named with an initial '*', +but you don't really care about that yet (if ever :\-). +.PP +Every variable type has its own namespace, as do several +non-variable identifiers. This means that you can, without fear +of conflict, use the same name for a scalar variable, an array, or +a hash\-\-or, for that matter, for a filehandle, a directory handle, a +subroutine name, a format name, or a label. This means that \f(CW$foo\fR +and \f(CW@foo\fR are two different variables. It also means that \f(CW$foo[1]\fR +is a part of \f(CW@foo\fR, not a part of \f(CW$foo\fR. 
This may seem a bit weird, +but that's okay, because it is weird. +.IX Xref "namespace" +.PP +Because variable references always start with '$', '@', or '%', the +"reserved" words aren't in fact reserved with respect to variable +names. They \fIare\fR reserved with respect to labels and filehandles, +however, which don't have an initial special character. You can't +have a filehandle named "log", for instance. Hint: you could say +\&\f(CW\*(C`open(LOG,\*(Aqlogfile\*(Aq)\*(C'\fR rather than \f(CW\*(C`open(log,\*(Aqlogfile\*(Aq)\*(C'\fR. Using +uppercase filehandles also improves readability and protects you +from conflict with future reserved words. Case \fIis\fR significant\-\-"FOO", +"Foo", and "foo" are all different names. Names that start with a +letter or underscore may also contain digits and underscores. +.IX Xref "identifier, case sensitivity case" +.PP +It is possible to replace such an alphanumeric name with an expression +that returns a reference to the appropriate type. For a description +of this, see perlref. +.PP +Names that start with a digit may contain only more digits. Names +that do not start with a letter, underscore, digit or a caret are +limited to one character, e.g., \f(CW$%\fR or +\&\f(CW$$\fR. (Most of these one character names have a predefined +significance to Perl. For instance, \f(CW$$\fR is the current process +id. And all such names are reserved for Perl's possible use.) +.SS "Identifier parsing" +.IX Xref "identifiers" +.IX Subsection "Identifier parsing" +Up until Perl 5.18, the actual rules of what a valid identifier +was were a bit fuzzy. However, in general, anything defined here should +work on previous versions of Perl, while the opposite \-\- edge cases +that work in previous versions, but aren't defined here \-\- probably +won't work on newer versions. 
+As an important side note, please note that the following only applies +to bareword identifiers as found in Perl source code, not identifiers +introduced through symbolic references, which have much fewer +restrictions. +If working under the effect of the \f(CW\*(C`use utf8;\*(C'\fR pragma, the following +rules apply: +.PP +.Vb 2 +\& / (?[ ( \ep{Word} & \ep{XID_Start} ) + [_] ]) +\& (?[ ( \ep{Word} & \ep{XID_Continue} ) ]) * /x +.Ve +.PP +That is, a "start" character followed by any number of "continue" +characters. Perl requires every character in an identifier to also +match \f(CW\*(C`\ew\*(C'\fR (this prevents some problematic cases); and Perl +additionally accepts identifier names beginning with an underscore. +.PP +If not under \f(CW\*(C`use utf8\*(C'\fR, the source is treated as ASCII + 128 extra +generic characters, and identifiers should match +.PP +.Vb 1 +\& / (?aa) (?!\ed) \ew+ /x +.Ve +.PP +That is, any word character in the ASCII range, as long as the first +character is not a digit. +.PP +There are two package separators in Perl: A double colon (\f(CW\*(C`::\*(C'\fR) and a single +quote (\f(CW\*(C`\*(Aq\*(C'\fR). Use of \f(CW\*(C`\*(Aq\*(C'\fR as the package separator is deprecated and will be +removed in Perl 5.40. Normal identifiers can start or end with a double +colon, and can contain several parts delimited by double colons. Single +quotes have similar rules, but with the exception that they are not legal at +the end of an identifier: That is, \f(CW\*(C`$\*(Aqfoo\*(C'\fR and \f(CW\*(C`$foo\*(Aqbar\*(C'\fR are legal, but +\&\f(CW\*(C`$foo\*(Aqbar\*(Aq\*(C'\fR is not. +.PP +Additionally, if the identifier is preceded by a sigil \-\- +that is, if the identifier is part of a variable name \-\- it +may optionally be enclosed in braces. 
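A minimal Perl sketch of the braced form described above. The variable names `$name` and `@parts` are illustrative assumptions, not taken from the manual. It only shows that a sigil followed by a braced identifier names the same variable as the unbraced form.

```perl
use strict;
use warnings;

# Hypothetical variables, used only for illustration.
my $name  = "Larry";
my @parts = (1, 2, 3);

my $plain  = $name;            # ordinary form
my $braced = ${name};          # braced form, refers to the same $name
my $count  = scalar @{parts};  # braced form of @parts, evaluated in scalar context
my $first  = ${parts}[0];      # element 0 of @parts via the braced name

print "$plain / $braced / $count / $first\n";
```

The braced spelling is mostly useful inside interpolated strings, where it separates the variable name from the text that follows it.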
+.PP +While you can mix double colons with single quotes, the quotes must come +after the colons: \f(CW\*(C`$::::\*(Aqfoo\*(C'\fR and \f(CW\*(C`$foo::\*(Aqbar\*(C'\fR are legal, but \f(CW\*(C`$::\*(Aq::foo\*(C'\fR +and \f(CW\*(C`$foo\*(Aq::bar\*(C'\fR are not. +.PP +Put together, a grammar to match a basic identifier becomes +.PP +.Vb 10 +\& / +\& (?(DEFINE) +\& (?<variable> +\& (?&sigil) +\& (?: +\& (?&normal_identifier) +\& | \e{ \es* (?&normal_identifier) \es* \e} +\& ) +\& ) +\& (?<normal_identifier> +\& (?: :: )* \*(Aq? +\& (?&basic_identifier) +\& (?: (?= (?: :: )+ \*(Aq? | (?: :: )* \*(Aq ) (?&normal_identifier) )? +\& (?: :: )* +\& ) +\& (?<basic_identifier> +\& # is use utf8 on? +\& (?(?{ (caller(0))[8] & $utf8::hint_bits }) +\& (?&Perl_XIDS) (?&Perl_XIDC)* +\& | (?aa) (?!\ed) \ew+ +\& ) +\& ) +\& (?<sigil> [&*\e$\e@\e%]) +\& (?<Perl_XIDS> (?[ ( \ep{Word} & \ep{XID_Start} ) + [_] ]) ) +\& (?<Perl_XIDC> (?[ \ep{Word} & \ep{XID_Continue} ]) ) +\& ) +\& /x +.Ve +.PP +Meanwhile, special identifiers don't follow the above rules. For the most +part, all of the identifiers in this category have a special meaning given +by Perl. Because they have special parsing rules, these generally can't be +fully-qualified. They come in six forms (but don't use forms 5 and 6): +.IP 1. 4 +A sigil, followed solely by digits matching \f(CW\*(C`\ep{POSIX_Digit}\*(C'\fR, like +\&\f(CW$0\fR, \f(CW$1\fR, or \f(CW$10000\fR. +.IP 2. 4 +A sigil followed by a single character matching the \f(CW\*(C`\ep{POSIX_Punct}\*(C'\fR +property, like \f(CW$!\fR or \f(CW\*(C`%+\*(C'\fR, except the character \f(CW"{"\fR doesn't work. +.IP 3. 4 +A sigil, followed by a caret and any one of the characters +\&\f(CW\*(C`[][A\-Z^_?\e]\*(C'\fR, like \f(CW$^V\fR or \f(CW$^]\fR. +.IP 4. 4 +Similar to the above, a sigil, followed by bareword text in braces, +where the first character is a caret. The next character is any one of +the characters \f(CW\*(C`[][A\-Z^_?\e]\*(C'\fR, followed by ASCII word characters. 
An +example is \f(CW\*(C`${^GLOBAL_PHASE}\*(C'\fR. +.IP 5. 4 +A sigil, followed by any single character in the range \f(CW\*(C`[\exA1\-\exAC\exAE\-\exFF]\*(C'\fR +when not under \f(CW"use\ utf8"\fR. (Under \f(CW"use\ utf8"\fR, the normal +identifier rules given earlier in this section apply.) Use of +non-graphic characters (the C1 controls, the NO-BREAK SPACE, and the +SOFT HYPHEN) has been disallowed since v5.26.0. +The use of the other characters is unwise, as these are all +reserved to have special meaning to Perl, and none of them currently +do have special meaning, though this could change without notice. +.Sp +Note that an implication of this form is that there are identifiers only +legal under \f(CW"use\ utf8"\fR, and vice-versa, for example the identifier +\&\f(CW\*(C`$état\*(C'\fR is legal under \f(CW"use\ utf8"\fR, but is otherwise +considered to be the single character variable \f(CW$é\fR followed by +the bareword \f(CW"tat"\fR, the combination of which is a syntax error. +.IP 6. 4 +This is a combination of the previous two forms. It is valid only when +not under \f(CW"use\ utf8"\fR (normal identifier rules apply when under +\&\f(CW"use\ utf8"\fR). The form is a sigil, followed by text in braces, +where the first character is any one of the characters in the range +\&\f(CW\*(C`[\ex80\-\exFF]\*(C'\fR followed by ASCII word characters up to the trailing +brace. +.Sp +The same caveats as the previous form apply: The non-graphic +characters are no longer allowed with "use\ utf8", it is unwise +to use this form at all, and utf8ness makes a big difference. +.PP +Prior to Perl v5.24, non-graphical ASCII control characters were also +allowed in some situations; this had been deprecated since v5.20. +.SS Context +.IX Xref "context scalar context list context" +.IX Subsection "Context" +The interpretation of operations and values in Perl sometimes depends +on the requirements of the context around the operation or value. 
+There are two major contexts: list and scalar. Certain operations +return list values in contexts wanting a list, and scalar values +otherwise. If this is true of an operation it will be mentioned in +the documentation for that operation. In other words, Perl overloads +certain operations based on whether the expected return value is +singular or plural. Some words in English work this way, like "fish" +and "sheep". +.PP +In a reciprocal fashion, an operation provides either a scalar or a +list context to each of its arguments. For example, if you say +.PP +.Vb 1 +\& int( <STDIN> ) +.Ve +.PP +the integer operation provides scalar context for the <> +operator, which responds by reading one line from STDIN and passing it +back to the integer operation, which will then find the integer value +of that line and return that. If, on the other hand, you say +.PP +.Vb 1 +\& sort( <STDIN> ) +.Ve +.PP +then the sort operation provides list context for <>, which +will proceed to read every line available up to the end of file, and +pass that list of lines back to the sort routine, which will then +sort those lines and return them as a list to whatever the context +of the sort was. +.PP +Assignment is a little bit special in that it uses its left argument +to determine the context for the right argument. Assignment to a +scalar evaluates the right-hand side in scalar context, while +assignment to an array or hash evaluates the righthand side in list +context. Assignment to a list (or slice, which is just a list +anyway) also evaluates the right-hand side in list context. +.PP +When you use the \f(CW\*(C`use warnings\*(C'\fR pragma or Perl's \fB\-w\fR command-line +option, you may see warnings +about useless uses of constants or functions in "void context". +Void context just means the value has been discarded, such as a +statement containing only \f(CW\*(C`"fred";\*(C'\fR or \f(CW\*(C`getpwuid(0);\*(C'\fR. 
It still +counts as scalar context for functions that care whether or not +they're being called in list context. +.PP +User-defined subroutines may choose to care whether they are being +called in a void, scalar, or list context. Most subroutines do not +need to bother, though. That's because both scalars and lists are +automatically interpolated into lists. See "wantarray" in perlfunc +for how you would dynamically discern your function's calling +context. +.SS "Scalar values" +.IX Xref "scalar number string reference" +.IX Subsection "Scalar values" +All data in Perl is a scalar, an array of scalars, or a hash of +scalars. A scalar may contain one single value in any of three +different flavors: a number, a string, or a reference. In general, +conversion from one form to another is transparent. Although a +scalar may not directly hold multiple values, it may contain a +reference to an array or hash which in turn contains multiple values. +.PP +Scalars aren't necessarily one thing or another. There's no place +to declare a scalar variable to be of type "string", type "number", +type "reference", or anything else. Because of the automatic +conversion of scalars, operations that return scalars don't need +to care (and in fact, cannot care) whether their caller is looking +for a string, a number, or a reference. Perl is a contextually +polymorphic language whose scalars can be strings, numbers, or +references (which includes objects). Although strings and numbers +are considered pretty much the same thing for nearly all purposes, +references are strongly-typed, uncastable pointers with builtin +reference-counting and destructor invocation. +.PP + + +A scalar value is interpreted as FALSE in the Boolean sense +if it is undefined, the null string or the number 0 (or its +string equivalent, "0"), and TRUE if it is anything else. The +Boolean context is just a special kind of scalar context where no +conversion to a string or a number is ever performed. 
+Negation of a true value by \f(CW\*(C`!\*(C'\fR or \f(CW\*(C`not\*(C'\fR returns a special false value. +When evaluated as a string it is treated as \f(CW""\fR, but as a number, it +is treated as 0. Most Perl operators +that return true or false behave this way. +.IX Xref "truth falsehood true false ! not negation 0 boolean bool" +.PP +There are actually two varieties of null strings (sometimes referred +to as "empty" strings), a defined one and an undefined one. The +defined version is just a string of length zero, such as \f(CW""\fR. +The undefined version is the value that indicates that there is +no real value for something, such as when there was an error, or +at end of file, or when you refer to an uninitialized variable or +element of an array or hash. Although in early versions of Perl, +an undefined scalar could become defined when first used in a +place expecting a defined value, this no longer happens except for +rare cases of autovivification as explained in perlref. You can +use the \fBdefined()\fR operator to determine whether a scalar value is +defined (this has no meaning on arrays or hashes), and the \fBundef()\fR +operator to produce an undefined value. +.IX Xref "defined undefined undef null string, null" +.PP +To find out whether a given string is a valid non-zero number, it's +sometimes enough to test it against both numeric 0 and also lexical +"0" (although this will cause noises if warnings are on). That's +because strings that aren't numbers count as 0, just as they do in \fBawk\fR: +.PP +.Vb 3 +\& if ($str == 0 && $str ne "0") { +\& warn "That doesn\*(Aqt look like a number"; +\& } +.Ve +.PP +That method may be best because otherwise you won't treat IEEE +notations like \f(CW\*(C`NaN\*(C'\fR or \f(CW\*(C`Infinity\*(C'\fR properly. 
At other times, you +might prefer to determine whether string data can be used numerically +by calling the \fBPOSIX::strtod()\fR function or by inspecting your string +with a regular expression (as documented in perlre). +.PP +.Vb 8 +\& warn "has nondigits" if /\eD/; +\& warn "not a natural number" unless /^\ed+$/; # rejects \-3 +\& warn "not an integer" unless /^\-?\ed+$/; # rejects +3 +\& warn "not an integer" unless /^[+\-]?\ed+$/; +\& warn "not a decimal number" unless /^\-?\ed+\e.?\ed*$/; # rejects .2 +\& warn "not a decimal number" unless /^\-?(?:\ed+(?:\e.\ed*)?|\e.\ed+)$/; +\& warn "not a C float" +\& unless /^([+\-]?)(?=\ed|\e.\ed)\ed*(\e.\ed*)?([Ee]([+\-]?\ed+))?$/; +.Ve +.PP +The length of an array is a scalar value. You may find the length +of array \f(CW@days\fR by evaluating \f(CW$#days\fR, as in \fBcsh\fR. However, this +isn't the length of the array; it's the subscript of the last element, +which is a different value since there is ordinarily a 0th element. +Assigning to \f(CW$#days\fR actually changes the length of the array. +Shortening an array this way destroys intervening values. Lengthening +an array that was previously shortened does not recover values +that were in those elements. +.IX Xref "$# array, length" +.PP +You can also gain some minuscule measure of efficiency by pre-extending +an array that is going to get big. You can also extend an array +by assigning to an element that is off the end of the array. You +can truncate an array down to nothing by assigning the null list +() to it. The following are equivalent: +.PP +.Vb 2 +\& @whatever = (); +\& $#whatever = \-1; +.Ve +.PP +If you evaluate an array in scalar context, it returns the length +of the array. (Note that this is not true of lists, which return +the last value, like the C comma operator, nor of built-in functions, +which return whatever they feel like returning.) 
The following is +always true: +.IX Xref "array, length" +.PP +.Vb 1 +\& scalar(@whatever) == $#whatever + 1; +.Ve +.PP +Some programmers choose to use an explicit conversion so as to +leave nothing to doubt: +.PP +.Vb 1 +\& $element_count = scalar(@whatever); +.Ve +.PP +If you evaluate a hash in scalar context, it returns a false value if +the hash is empty. If there are any key/value pairs, it returns a +true value. A more precise definition is version dependent. +.PP +Prior to Perl 5.25 the value returned was a string consisting of the +number of used buckets and the number of allocated buckets, separated +by a slash. This is pretty much useful only to find out whether +Perl's internal hashing algorithm is performing poorly on your data +set. For example, you stick 10,000 things in a hash, but evaluating +\&\f(CW%HASH\fR in scalar context reveals \f(CW"1/16"\fR, which means only one out +of sixteen buckets has been touched, and presumably contains all +10,000 of your items. This isn't supposed to happen. +.PP +As of Perl 5.25 the return was changed to be the count of keys in the +hash. If you need access to the old behavior you can use +\&\f(CWHash::Util::bucket_ratio()\fR instead. +.PP +If a tied hash is evaluated in scalar context, the \f(CW\*(C`SCALAR\*(C'\fR method is +called (with a fallback to \f(CW\*(C`FIRSTKEY\*(C'\fR). +.IX Xref "hash, scalar context hash, bucket bucket" +.PP +You can preallocate space for a hash by assigning to the \fBkeys()\fR function. 
+This rounds up the allocated buckets to the next power of two: +.PP +.Vb 1 +\& keys(%users) = 1000; # allocate 1024 buckets +.Ve +.SS "Scalar value constructors" +.IX Xref "scalar, literal scalar, constant" +.IX Subsection "Scalar value constructors" +Numeric literals are specified in any of the following floating point or +integer formats: +.PP +.Vb 11 +\& 12345 +\& 12345.67 +\& .23E\-10 # a very small number +\& 3.14_15_92 # a very important number +\& 4_294_967_296 # underscore for legibility +\& 0xff # hex +\& 0xdead_beef # more hex +\& 0377 # octal (only numbers, begins with 0) +\& 0o12_345 # alternative octal (introduced in Perl 5.33.5) +\& 0b011011 # binary +\& 0x1.999ap\-4 # hexadecimal floating point (the \*(Aqp\*(Aq is required) +.Ve +.PP +You are allowed to use underscores (underbars) in numeric literals +between digits for legibility (but not multiple underscores in a row: +\&\f(CW\*(C`23_\|_500\*(C'\fR is not legal; \f(CW\*(C`23_500\*(C'\fR is). +You could, for example, group binary +digits by threes (as for a Unix-style mode argument such as 0b110_100_100) +or by fours (to represent nibbles, as in 0b1010_0110) or in other groups. +.IX Xref "number, literal" +.PP +String literals are usually delimited by either single or double +quotes. They work much like quotes in the standard Unix shells: +double-quoted string literals are subject to backslash and variable +substitution; single-quoted strings are not (except for \f(CW\*(C`\e\*(Aq\*(C'\fR and +\&\f(CW\*(C`\e\e\*(C'\fR). The usual C\-style backslash rules apply for making +characters such as newline, tab, etc., as well as some more exotic +forms. See "Quote and Quote-like Operators" in perlop for a list. +.IX Xref "string, literal" +.PP +Hexadecimal, octal, or binary, representations in string literals +(e.g. '0xff') are not automatically converted to their integer +representation. The \fBhex()\fR and \fBoct()\fR functions make these conversions +for you. 
See "hex" in perlfunc and "oct" in perlfunc for more details. +.PP +Hexadecimal floating point can start just like a hexadecimal literal, +and it can be followed by an optional fractional hexadecimal part, +but it must be followed by \f(CW\*(C`p\*(C'\fR, an optional sign, and a power of two. +The format is useful for accurately presenting floating point values, +avoiding conversions to or from decimal floating point, and therefore +avoiding possible loss in precision. Notice that while most current +platforms use the 64\-bit IEEE 754 floating point, not all do. Another +potential source of (low-order) differences are the floating point +rounding modes, which can differ between CPUs, operating systems, +and compilers, and which Perl doesn't control. +.PP +You can also embed newlines directly in your strings, i.e., they can end +on a different line than they begin. This is nice, but if you forget +your trailing quote, the error will not be reported until Perl finds +another line containing the quote character, which may be much further +on in the script. Variable substitution inside strings is limited to +scalar variables, arrays, and array or hash slices. (In other words, +names beginning with $ or @, followed by an optional bracketed +expression as a subscript.) The following code segment prints out "The +price is \f(CW$100\fR." +.IX Xref "interpolation" +.PP +.Vb 2 +\& $Price = \*(Aq$100\*(Aq; # not interpolated +\& print "The price is $Price.\en"; # interpolated +.Ve +.PP +There is no double interpolation in Perl, so the \f(CW$100\fR is left as is. +.PP +By default floating point numbers substituted inside strings use the +dot (".") as the decimal separator. If \f(CW\*(C`use locale\*(C'\fR is in effect, +and \fBPOSIX::setlocale()\fR has been called, the character used for the +decimal separator is affected by the LC_NUMERIC locale. +See perllocale and POSIX. 
+.PP +\fIDemarcated variable names using braces\fR +.IX Subsection "Demarcated variable names using braces" +.PP +As in some shells, you can enclose the variable name in braces as a +demarcator to disambiguate it from following alphanumerics and +underscores or other text. You must also do this when interpolating a +variable into a string to separate the variable name from a following +double-colon or an apostrophe since these would be otherwise treated as +a package separator: +.IX Xref "interpolation" +.PP +.Vb 3 +\& $who = "Larry"; +\& print PASSWD "${who}::0:0:Superuser:/:/bin/perl\en"; +\& print "We use ${who}speak when ${who}\*(Aqs here.\en"; +.Ve +.PP +Without the braces, Perl would have looked for a \f(CW$whospeak\fR, a +\&\f(CW$who::0\fR, and a \f(CW\*(C`$who\*(Aqs\*(C'\fR variable. The last two would be the +\&\f(CW$0\fR and the \f(CW$s\fR variables in the (presumably) non-existent package +\&\f(CW\*(C`who\*(C'\fR. +.PP +In fact, a simple identifier within such curly braces is forced to be a +string, and likewise within a hash subscript. Neither need quoting. Our +earlier example, \f(CW$days{\*(AqFeb\*(Aq}\fR can be written as \f(CW$days{Feb}\fR and the +quotes will be assumed automatically. But anything more complicated in +the subscript will be interpreted as an expression. This means for +example that \f(CW\*(C`$version{2.0}++\*(C'\fR is equivalent to \f(CW\*(C`$version{2}++\*(C'\fR, not +to \f(CW\*(C`$version{\*(Aq2.0\*(Aq}++\*(C'\fR. +.PP +There is a similar problem with interpolation with text that looks like +array or hash access notation. Placing a simple variable like \f(CW$who\fR +immediately in front of text like \f(CW"[1]"\fR or \f(CW"{foo}"\fR would cause the +variable to be interpolated as accessing an element of \f(CW@who\fR or a +value stored in \f(CW%who\fR: +.PP +.Vb 2 +\& $who = "Larry Wall"; +\& print "$who[1] is the father of Perl.\en"; +.Ve +.PP +would attempt to access index 1 of an array named \f(CW@who\fR. 
Again, using +braces will prevent this from happening: +.PP +.Vb 2 +\& $who = "Larry Wall"; +\& print "${who}[1] is the father of Perl.\en"; +.Ve +.PP +will be treated the same as +.PP +.Vb 2 +\& $who = "Larry Wall"; +\& print $who . "[1] is the father of Perl.\en"; +.Ve +.PP +This notation also applies to more complex variable descriptions, +such as array or hash access with subscripts. For instance +.PP +.Vb 2 +\& @name = qw(Larry Curly Moe); +\& print "Also ${name[0]}[1] was a member\en"; +.Ve +.PP +Without the braces the above example would be parsed as a two level +array subscript in the \f(CW@name\fR array, and under \f(CW\*(C`use strict\*(C'\fR would +likely produce a fatal exception, as it would be parsed like this: +.PP +.Vb 1 +\& print "Also " . $name[0][1] . " was a member\en"; +.Ve +.PP +and not as the intended: +.PP +.Vb 1 +\& print "Also " . $name[0] . "[1] was a member\en"; +.Ve +.PP +A similar result may be derived by using a backslash on the first +character of the subscript or package notation that is not part of +the variable you want to access. Thus the above example could also +be written: +.PP +.Vb 2 +\& @name = qw(Larry Curly Moe); +\& print "Also $name[0]\e[1] was a member\en"; +.Ve +.PP +however for some special variables (multi character caret variables) the +demarcated form using curly braces is the \fBonly\fR way you can reference +the variable at all, and the only way you can access a subscript of the +variable via interpolation. +.PP +Consider the magic array \f(CW\*(C`@{^CAPTURE}\*(C'\fR which is populated by the +regex engine with the contents of all of the capture buffers in a +pattern (see perlvar and perlre). The \fBonly\fR way you can +access one of these members inside of a string is via the braced +(demarcated) form: +.PP +.Vb 2 +\& "abc"=~/(.)(.)(.)/ +\& and print "Second buffer is ${^CAPTURE[1]}"; +.Ve +.PP +is equivalent to +.PP +.Vb 2 +\& "abc"=~/(.)(.)(.)/ +\& and print "Second buffer is " . 
${^CAPTURE}[1]; +.Ve +.PP +Saying \f(CW\*(C`@^CAPTURE\*(C'\fR is a syntax error, so it \fBmust\fR be referenced as +\&\f(CW\*(C`@{^CAPTURE}\*(C'\fR, and to access one of its elements in normal code you +would write \f(CW\*(C` ${^CAPTURE}[1] \*(C'\fR. However when interpolating in a string +\&\f(CW"${^CAPTURE}[1]"\fR would be equivalent to \f(CW\*(C`${^CAPTURE} . "[1]"\*(C'\fR, +which does not even refer to the same variable! Thus the subscripts must +\&\fBalso\fR be placed \fBinside\fR of the braces: \f(CW"${^CAPTURE[1]}"\fR. +.PP +The demarcated form using curly braces can be used with all the +different types of variable access, including array and hash slices. For +instance code like the following: +.PP +.Vb 3 +\& @name = qw(Larry Curly Moe); +\& local $" = " and "; +\& print "My favorites were @{name[1,2]}.\en"; +.Ve +.PP +would output +.PP +.Vb 1 +\& My favorites were Curly and Moe. +.Ve +.PP +\fISpecial floating point: infinity (Inf) and not-a-number (NaN)\fR +.IX Subsection "Special floating point: infinity (Inf) and not-a-number (NaN)" +.PP +Floating point values include the special values \f(CW\*(C`Inf\*(C'\fR and \f(CW\*(C`NaN\*(C'\fR, +for infinity and not-a-number. The infinity can be also negative. +.PP +The infinity is the result of certain math operations that overflow +the floating point range, like 9**9**9. The not-a-number is the +result when the result is undefined or unrepresentable. Though note +that you cannot get \f(CW\*(C`NaN\*(C'\fR from some common "undefined" or +"out-of-range" operations like dividing by zero, or square root of +a negative number, since Perl generates fatal errors for those. +.PP +The infinity and not-a-number have their own special arithmetic rules. +The general rule is that they are "contagious": \f(CW\*(C`Inf\*(C'\fR plus one is +\&\f(CW\*(C`Inf\*(C'\fR, and \f(CW\*(C`NaN\*(C'\fR plus one is \f(CW\*(C`NaN\*(C'\fR. 
Where things get interesting +is when you combine infinities and not-a-numbers: \f(CW\*(C`Inf\*(C'\fR minus \f(CW\*(C`Inf\*(C'\fR +and \f(CW\*(C`Inf\*(C'\fR divided by \f(CW\*(C`Inf\*(C'\fR are \f(CW\*(C`NaN\*(C'\fR (while \f(CW\*(C`Inf\*(C'\fR plus \f(CW\*(C`Inf\*(C'\fR is +\&\f(CW\*(C`Inf\*(C'\fR and \f(CW\*(C`Inf\*(C'\fR times \f(CW\*(C`Inf\*(C'\fR is \f(CW\*(C`Inf\*(C'\fR). \f(CW\*(C`NaN\*(C'\fR is also curious +in that it does not equal any number, \fIincluding\fR itself: +\&\f(CW\*(C`NaN\*(C'\fR != \f(CW\*(C`NaN\*(C'\fR. +.PP +Perl doesn't understand \f(CW\*(C`Inf\*(C'\fR and \f(CW\*(C`NaN\*(C'\fR as numeric literals, but +you can have them as strings, and Perl will convert them as needed: +"Inf" + 1. (You can, however, import them from the POSIX extension; +\&\f(CW\*(C`use POSIX qw(Inf NaN);\*(C'\fR and then use them as literals.) +.PP +Note that on input (string to number) Perl accepts \f(CW\*(C`Inf\*(C'\fR and \f(CW\*(C`NaN\*(C'\fR +in many forms. Case is ignored, and the Win32\-specific forms like +\&\f(CW\*(C`1.#INF\*(C'\fR are understood, but on output the values are normalized to +\&\f(CW\*(C`Inf\*(C'\fR and \f(CW\*(C`NaN\*(C'\fR. +.PP +\fIVersion Strings\fR +.IX Xref "version string vstring v-string" +.IX Subsection "Version Strings" +.PP +A literal of the form \f(CW\*(C`v1.20.300.4000\*(C'\fR is parsed as a string composed +of characters with the specified ordinals. This form, known as +v\-strings, provides an alternative, more readable way to construct +strings, rather than use the somewhat less readable interpolation form +\&\f(CW"\ex{1}\ex{14}\ex{12c}\ex{fa0}"\fR. This is useful for representing +Unicode strings, and for comparing version "numbers" using the string +comparison operators, \f(CW\*(C`cmp\*(C'\fR, \f(CW\*(C`gt\*(C'\fR, \f(CW\*(C`lt\*(C'\fR etc. If there are two or +more dots in the literal, the leading \f(CW\*(C`v\*(C'\fR may be omitted. 
+.PP +.Vb 3 +\& print v9786; # prints SMILEY, "\ex{263a}" +\& print v102.111.111; # prints "foo" +\& print 102.111.111; # same +.Ve +.PP +Such literals are accepted by both \f(CW\*(C`require\*(C'\fR and \f(CW\*(C`use\*(C'\fR for +doing a version check. Note that using the v\-strings for IPv4 +addresses is not portable unless you also use the +\&\fBinet_aton()\fR/\fBinet_ntoa()\fR routines of the Socket package. +.PP +Note that since Perl 5.8.1 the single-number v\-strings (like \f(CW\*(C`v65\*(C'\fR) +are not v\-strings before the \f(CW\*(C`=>\*(C'\fR operator (which is usually used +to separate a hash key from a hash value); instead they are interpreted +as literal strings ('v65'). They were v\-strings from Perl 5.6.0 to +Perl 5.8.0, but that caused more confusion and breakage than good. +Multi-number v\-strings like \f(CW\*(C`v65.66\*(C'\fR and \f(CW65.66.67\fR continue to +be v\-strings always. +.PP +\fISpecial Literals\fR +.IX Xref "special literal __END__ __DATA__ END DATA end data ^D ^Z" +.IX Subsection "Special Literals" +.PP +The special literals _\|_FILE_\|_, _\|_LINE_\|_, and _\|_PACKAGE_\|_ +represent the current filename, line number, and package name at that +point in your program. _\|_SUB_\|_ gives a reference to the current +subroutine. They may be used only as separate tokens; they +will not be interpolated into strings. If there is no current package +(due to an empty \f(CW\*(C`package;\*(C'\fR directive), _\|_PACKAGE_\|_ is the undefined +value. (But the empty \f(CW\*(C`package;\*(C'\fR is no longer supported, as of version +5.10.) Outside of a subroutine, _\|_SUB_\|_ is the undefined value. _\|_SUB_\|_ +is only available in 5.16 or higher, and only with a \f(CW\*(C`use v5.16\*(C'\fR or +\&\f(CW\*(C`use feature "current_sub"\*(C'\fR declaration. 
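As a minimal sketch of the __SUB__ behaviour described above, an anonymous subroutine can call itself without being stored under a name. The variable name $factorial is purely illustrative, and the example assumes Perl 5.16 or later so that "use v5.16" turns on the current_sub feature.

```perl
use v5.16;      # enables strict, say, and the current_sub feature
use warnings;

# __SUB__ yields a reference to the subroutine currently running,
# so the anonymous sub can recurse without referring to $factorial.
my $factorial = sub {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * __SUB__->($n - 1);
};

say $factorial->(5);    # 120
say __PACKAGE__;        # main
```

Because the closure does not capture $factorial, this form also avoids the reference cycle that a self-referencing named variable would create.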
+.IX Xref "__FILE__ __LINE__ __PACKAGE__ __SUB__ line file package" +.PP +The two control characters ^D and ^Z, and the tokens _\|_END_\|_ and _\|_DATA_\|_ +may be used to indicate the logical end of the script before the actual +end of file. Any following text is ignored by the interpreter unless +read by the program as described below. +.PP +Text after _\|_DATA_\|_ may be read via the filehandle \f(CW\*(C`PACKNAME::DATA\*(C'\fR, +where \f(CW\*(C`PACKNAME\*(C'\fR is the package that was current when the _\|_DATA_\|_ +token was encountered. The filehandle is left open pointing to the +line after _\|_DATA_\|_. The program should \f(CW\*(C`close DATA\*(C'\fR when it is done +reading from it. (Leaving it open leaks filehandles if the module is +reloaded for any reason, so it's a safer practice to close it.) For +compatibility with older scripts written before _\|_DATA_\|_ was +introduced, _\|_END_\|_ behaves like _\|_DATA_\|_ in the top level script (but +not in files loaded with \f(CW\*(C`require\*(C'\fR or \f(CW\*(C`do\*(C'\fR) and leaves the remaining +contents of the file accessible via \f(CW\*(C`main::DATA\*(C'\fR. +.PP +.Vb 4 +\& while (my $line = <DATA>) { print $line; } +\& close DATA; +\& _\|_DATA_\|_ +\& Hello world. +.Ve +.PP +The \f(CW\*(C`DATA\*(C'\fR file handle by default has whatever PerlIO layers were +in place when Perl read the file to parse the source. Normally that +means that the file is being read bytewise, as if it were encoded in +Latin\-1, but there are two major ways for it to be otherwise. Firstly, +if the \f(CW\*(C`_\|_END_\|_\*(C'\fR/\f(CW\*(C`_\|_DATA_\|_\*(C'\fR token is in the scope of a \f(CW\*(C`use utf8\*(C'\fR +pragma then the \f(CW\*(C`DATA\*(C'\fR handle will be in UTF\-8 mode. 
And secondly, +if the source is being read from perl's standard input then the \f(CW\*(C`DATA\*(C'\fR +file handle is actually aliased to the \f(CW\*(C`STDIN\*(C'\fR file handle, and may +be in UTF\-8 mode because of the \f(CW\*(C`PERL_UNICODE\*(C'\fR environment variable or +perl's command-line switches. +.PP +See SelfLoader for more description of _\|_DATA_\|_, and +an example of its use. Note that you cannot read from the DATA +filehandle in a BEGIN block: the BEGIN block is executed as soon +as it is seen (during compilation), at which point the corresponding +_\|_DATA_\|_ (or _\|_END_\|_) token has not yet been seen. +.PP +\fIBarewords\fR +.IX Xref "bareword" +.IX Subsection "Barewords" +.PP +A word that has no other interpretation in the grammar will +be treated as if it were a quoted string. These are known as +"barewords". As with filehandles and labels, a bareword that consists +entirely of lowercase letters risks conflict with future reserved +words, and if you use the \f(CW\*(C`use warnings\*(C'\fR pragma or the \fB\-w\fR switch, +Perl will warn you about any such words. Perl limits barewords (like +identifiers) to about 250 characters. Future versions of Perl are likely +to eliminate these arbitrary limitations. +.PP +Some people may wish to outlaw barewords entirely. If you +say +.PP +.Vb 1 +\& use strict \*(Aqsubs\*(Aq; +.Ve +.PP +then any bareword that would NOT be interpreted as a subroutine call +produces a compile-time error instead. The restriction lasts to the +end of the enclosing block. An inner block may countermand this +by saying \f(CW\*(C`no strict \*(Aqsubs\*(Aq\*(C'\fR. +.PP +\fIArray Interpolation\fR +.IX Xref "array, interpolation interpolation, array $""" +.IX Subsection "Array Interpolation" +.PP +Arrays and slices are interpolated into double-quoted strings +by joining the elements with the delimiter specified in the \f(CW$"\fR +variable (\f(CW$LIST_SEPARATOR\fR if "use English;" is specified), +space by default. 
The following are equivalent: +.PP +.Vb 2 +\& $temp = join($", @ARGV); +\& system "echo $temp"; +\& +\& system "echo @ARGV"; +.Ve +.PP +Within search patterns (which also undergo double-quotish substitution) +there is an unfortunate ambiguity: Is \f(CW\*(C`/$foo[bar]/\*(C'\fR to be interpreted as +\&\f(CW\*(C`/${foo}[bar]/\*(C'\fR (where \f(CW\*(C`[bar]\*(C'\fR is a character class for the regular +expression) or as \f(CW\*(C`/${foo[bar]}/\*(C'\fR (where \f(CW\*(C`[bar]\*(C'\fR is the subscript to array +\&\f(CW@foo\fR)? If \f(CW@foo\fR doesn't otherwise exist, then it's obviously a +character class. If \f(CW@foo\fR exists, Perl takes a good guess about \f(CW\*(C`[bar]\*(C'\fR, +and is almost always right. If it does guess wrong, or if you're just +plain paranoid, you can force the correct interpretation with curly +braces as above. +.PP +If you're looking for the information on how to use here-documents, +which used to be here, that's been moved to +"Quote and Quote-like Operators" in perlop. +.SS "List value constructors" +.IX Xref "list" +.IX Subsection "List value constructors" +List values are denoted by separating individual values by commas +(and enclosing the list in parentheses where precedence requires it): +.PP +.Vb 1 +\& (LIST) +.Ve +.PP +In a context not requiring a list value, the value of what appears +to be a list literal is simply the value of the final element, as +with the C comma operator. For example, +.PP +.Vb 1 +\& @foo = (\*(Aqcc\*(Aq, \*(Aq\-E\*(Aq, $bar); +.Ve +.PP +assigns the entire list value to array \f(CW@foo\fR, but +.PP +.Vb 1 +\& $foo = (\*(Aqcc\*(Aq, \*(Aq\-E\*(Aq, $bar); +.Ve +.PP +assigns the value of variable \f(CW$bar\fR to the scalar variable \f(CW$foo\fR. 
+Note that the value of an actual array in scalar context is the +length of the array; the following assigns the value 3 to \f(CW$foo:\fR +.PP +.Vb 2 +\& @foo = (\*(Aqcc\*(Aq, \*(Aq\-E\*(Aq, $bar); +\& $foo = @foo; # $foo gets 3 +.Ve +.PP +You may have an optional comma before the closing parenthesis of a +list literal, so that you can say: +.PP +.Vb 5 +\& @foo = ( +\& 1, +\& 2, +\& 3, +\& ); +.Ve +.PP +To use a here-document to assign an array, one line per element, +you might use an approach like this: +.PP +.Vb 7 +\& @sauces = <<End_Lines =~ m/(\eS.*\eS)/g; +\& normal tomato +\& spicy tomato +\& green chile +\& pesto +\& white wine +\& End_Lines +.Ve +.PP +LISTs do automatic interpolation of sublists. That is, when a LIST is +evaluated, each element of the list is evaluated in list context, and +the resulting list value is interpolated into LIST just as if each +individual element were a member of LIST. Thus arrays and hashes lose their +identity in a LIST\-\-the list +.PP +.Vb 1 +\& (@foo,@bar,&SomeSub,%glarch) +.Ve +.PP +contains all the elements of \f(CW@foo\fR followed by all the elements of \f(CW@bar\fR, +followed by all the elements returned by the subroutine named SomeSub +called in list context, followed by the key/value pairs of \f(CW%glarch\fR. +To make a list reference that does \fINOT\fR interpolate, see perlref. +.PP +The null list is represented by (). Interpolating it in a list +has no effect. Thus ((),(),()) is equivalent to (). Similarly, +interpolating an array with no elements is the same as if no +array had been interpolated at that point. +.PP +This interpolation combines with the facts that the opening +and closing parentheses are optional (except when necessary for +precedence) and lists may end with an optional comma to mean that +multiple commas within lists are legal syntax. The list \f(CW\*(C`1,,3\*(C'\fR is a +concatenation of two lists, \f(CW\*(C`1,\*(C'\fR and \f(CW3\fR, the first of which ends +with that optional comma. 
\f(CW\*(C`1,,3\*(C'\fR is \f(CW\*(C`(1,),(3)\*(C'\fR is \f(CW\*(C`1,3\*(C'\fR (And +similarly for \f(CW\*(C`1,,,3\*(C'\fR is \f(CW\*(C`(1,),(,),3\*(C'\fR is \f(CW\*(C`1,3\*(C'\fR and so on.) Not that +we'd advise you to use this obfuscation. +.PP +A list value may also be subscripted like a normal array. You must +put the list in parentheses to avoid ambiguity. For example: +.PP +.Vb 2 +\& # Stat returns list value. +\& $time = (stat($file))[8]; +\& +\& # SYNTAX ERROR HERE. +\& $time = stat($file)[8]; # OOPS, FORGOT PARENTHESES +\& +\& # Find a hex digit. +\& $hexdigit = (\*(Aqa\*(Aq,\*(Aqb\*(Aq,\*(Aqc\*(Aq,\*(Aqd\*(Aq,\*(Aqe\*(Aq,\*(Aqf\*(Aq)[$digit\-10]; +\& +\& # A "reverse comma operator". +\& return (pop(@foo),pop(@foo))[0]; +.Ve +.PP +Lists may be assigned to only when each element of the list +is itself legal to assign to: +.PP +.Vb 1 +\& ($x, $y, $z) = (1, 2, 3); +\& +\& ($map{\*(Aqred\*(Aq}, $map{\*(Aqblue\*(Aq}, $map{\*(Aqgreen\*(Aq}) = (0x00f, 0x0f0, 0xf00); +.Ve +.PP +An exception to this is that you may assign to \f(CW\*(C`undef\*(C'\fR in a list. +This is useful for throwing away some of the return values of a +function: +.PP +.Vb 1 +\& ($dev, $ino, undef, undef, $uid, $gid) = stat($file); +.Ve +.PP +As of Perl 5.22, you can also use \f(CW\*(C`(undef)x2\*(C'\fR instead of \f(CW\*(C`undef, undef\*(C'\fR. +(You can also do \f(CW\*(C`($x) x 2\*(C'\fR, which is less useful, because it assigns to +the same variable twice, clobbering the first value assigned.) +.PP +When you assign a list of scalars to an array, all previous values in that +array are wiped out and the number of elements in the array will now be equal to +the number of elements in the right-hand list \-\- the list from which +assignment was made. The array will automatically resize itself to precisely +accommodate each element in the right-hand list. 
+.PP +.Vb 2 +\& use warnings; +\& my (@xyz, $x, $y, $z); +\& +\& @xyz = (1, 2, 3); +\& print "@xyz\en"; # 1 2 3 +\& +\& @xyz = (\*(Aqal\*(Aq, \*(Aqbe\*(Aq, \*(Aqga\*(Aq, \*(Aqde\*(Aq); +\& print "@xyz\en"; # al be ga de +\& +\& @xyz = (101, 102); +\& print "@xyz\en"; # 101 102 +.Ve +.PP +When, however, you assign a list of scalars to another list of scalars, the +results differ according to whether the left-hand list \-\- the list being +assigned to \-\- has the same, more or fewer elements than the right-hand list. +.PP +.Vb 2 +\& ($x, $y, $z) = (1, 2, 3); +\& print "$x $y $z\en"; # 1 2 3 +\& +\& ($x, $y, $z) = (\*(Aqal\*(Aq, \*(Aqbe\*(Aq, \*(Aqga\*(Aq, \*(Aqde\*(Aq); +\& print "$x $y $z\en"; # al be ga +\& +\& ($x, $y, $z) = (101, 102); +\& print "$x $y $z\en"; # 101 102 +\& # Use of uninitialized value $z in concatenation (.) +\& # or string at [program] line [line number]. +.Ve +.PP +If the number of scalars in the left-hand list is less than that in the +right-hand list, the "extra" scalars in the right-hand list will simply not be +assigned. +.PP +If the number of scalars in the left-hand list is greater than that in the +right-hand list, the "missing" scalars will become undefined. +.PP +.Vb 6 +\& ($x, $y, $z) = (101, 102); +\& for my $el ($x, $y, $z) { +\& (defined $el) ? print "$el " : print "<undef>"; +\& } +\& print "\en"; +\& # 101 102 <undef> +.Ve +.PP +List assignment in scalar context returns the number of elements +produced by the expression on the right side of the assignment: +.PP +.Vb 2 +\& $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2 +\& $x = (($foo,$bar) = f()); # set $x to f()\*(Aqs return count +.Ve +.PP +This is handy when you want to do a list assignment in a Boolean +context, because most list functions return a null list when finished, +which when assigned produces a 0, which is interpreted as FALSE.
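The Boolean-context point above can be sketched as a loop that stops once a function returns the null list. The helper next_pair and the @queue array are hypothetical names invented for this illustration; they are not part of Perl.

```perl
use strict;
use warnings;

my @queue = (1, 2, 3, 4, 5);

# Hypothetical helper: hands back up to two items per call,
# and the empty list once @queue is exhausted.
sub next_pair {
    return splice(@queue, 0, 2);
}

my @seen;
# The list assignment is evaluated in scalar (Boolean) context by while;
# it yields the number of right-hand elements, so 0 ends the loop.
while (my ($first, $second) = next_pair()) {
    push @seen, defined $second ? "$first,$second" : "$first,-";
}

print "@seen\n";    # 1,2 3,4 5,-
```

The last iteration still runs with only one element on the right-hand side, because the count there is 1 (true) even though $second is left undefined.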
+.PP +It's also the source of a useful idiom for executing a function or +performing an operation in list context and then counting the number of +return values, by assigning to an empty list and then using that +assignment in scalar context. For example, this code: +.PP +.Vb 1 +\& $count = () = $string =~ /\ed+/g; +.Ve +.PP +will place into \f(CW$count\fR the number of digit groups found in \f(CW$string\fR. +This happens because the pattern match is in list context (since it +is being assigned to the empty list), and will therefore return a list +of all matching parts of the string. The list assignment in scalar +context will translate that into the number of elements (here, the +number of times the pattern matched) and assign that to \f(CW$count\fR. Note +that simply using +.PP +.Vb 1 +\& $count = $string =~ /\ed+/g; +.Ve +.PP +would not have worked, since a pattern match in scalar context will +only return true or false, rather than a count of matches. +.PP +The final element of a list assignment may be an array or a hash: +.PP +.Vb 2 +\& ($x, $y, @rest) = split; +\& my($x, $y, %rest) = @_; +.Ve +.PP +You can actually put an array or hash anywhere in the list, but the first one +in the list will soak up all the values, and anything after it will become +undefined. This may be useful in a \fBmy()\fR or \fBlocal()\fR. +.PP +A hash can be initialized using a literal list holding pairs of +items to be interpreted as a key and a value: +.PP +.Vb 2 +\& # same as map assignment above +\& %map = (\*(Aqred\*(Aq,0x00f,\*(Aqblue\*(Aq,0x0f0,\*(Aqgreen\*(Aq,0xf00); +.Ve +.PP +While literal lists and named arrays are often interchangeable, that's +not the case for hashes. Just because you can subscript a list value like +a normal array does not mean that you can subscript a list value as a +hash. Likewise, hashes included as parts of other lists (including +parameters lists and return lists from functions) always flatten out into +key/value pairs. 
That's why it's good to use references sometimes. +.PP +It is often more readable to use the \f(CW\*(C`=>\*(C'\fR operator between key/value +pairs. The \f(CW\*(C`=>\*(C'\fR operator is mostly just a more visually distinctive +synonym for a comma, but it also arranges for its left-hand operand to be +interpreted as a string if it's a bareword that would be a legal simple +identifier. \f(CW\*(C`=>\*(C'\fR doesn't quote compound identifiers, that contain +double colons. This makes it nice for initializing hashes: +.PP +.Vb 5 +\& %map = ( +\& red => 0x00f, +\& blue => 0x0f0, +\& green => 0xf00, +\& ); +.Ve +.PP +or for initializing hash references to be used as records: +.PP +.Vb 5 +\& $rec = { +\& witch => \*(AqMable the Merciless\*(Aq, +\& cat => \*(AqFluffy the Ferocious\*(Aq, +\& date => \*(Aq10/31/1776\*(Aq, +\& }; +.Ve +.PP +or for using call-by-named-parameter to complicated functions: +.PP +.Vb 7 +\& $field = $query\->radio_group( +\& name => \*(Aqgroup_name\*(Aq, +\& values => [\*(Aqeenie\*(Aq,\*(Aqmeenie\*(Aq,\*(Aqminie\*(Aq], +\& default => \*(Aqmeenie\*(Aq, +\& linebreak => \*(Aqtrue\*(Aq, +\& labels => \e%labels +\& ); +.Ve +.PP +Note that just because a hash is initialized in that order doesn't +mean that it comes out in that order. See "sort" in perlfunc for examples +of how to arrange for an output ordering. 
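A minimal sketch of imposing an output order on a hash, reusing the %map values from the example above. The array names @by_name and @by_value are illustrative only.

```perl
use strict;
use warnings;

my %map = (
    red   => 0x00f,
    blue  => 0x0f0,
    green => 0xf00,
);

# keys() returns the keys in no particular order, so sort explicitly.
my @by_name  = sort keys %map;                            # alphabetical
my @by_value = sort { $map{$a} <=> $map{$b} } keys %map;  # numeric, ascending

print "@by_name\n";     # blue green red
print "@by_value\n";    # red blue green
```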
+.PP +If a key appears more than once in the initializer list of a hash, the last +occurrence wins: +.PP +.Vb 7 +\& %circle = ( +\& center => [5, 10], +\& center => [27, 9], +\& radius => 100, +\& color => [0xDF, 0xFF, 0x00], +\& radius => 54, +\& ); +\& +\& # same as +\& %circle = ( +\& center => [27, 9], +\& color => [0xDF, 0xFF, 0x00], +\& radius => 54, +\& ); +.Ve +.PP +This can be used to provide overridable configuration defaults: +.PP +.Vb 2 +\& # values in %args take priority over %config_defaults +\& %config = (%config_defaults, %args); +.Ve +.SS Subscripts +.IX Subsection "Subscripts" +An array can be accessed one scalar at a +time by specifying a dollar sign (\f(CW\*(C`$\*(C'\fR), then the +name of the array (without the leading \f(CW\*(C`@\*(C'\fR), then the subscript inside +square brackets. For example: +.PP +.Vb 2 +\& @myarray = (5, 50, 500, 5000); +\& print "The Third Element is", $myarray[2], "\en"; +.Ve +.PP +The array indices start with 0. A negative subscript retrieves its +value from the end. In our example, \f(CW$myarray[\-1]\fR would have been +5000, and \f(CW$myarray[\-2]\fR would have been 500. +.PP +Hash subscripts are similar, only instead of square brackets curly brackets +are used. For example: +.PP +.Vb 7 +\& %scientists = +\& ( +\& "Newton" => "Isaac", +\& "Einstein" => "Albert", +\& "Darwin" => "Charles", +\& "Feynman" => "Richard", +\& ); +\& +\& print "Darwin\*(Aqs First Name is ", $scientists{"Darwin"}, "\en"; +.Ve +.PP +You can also subscript a list to get a single element from it: +.PP +.Vb 1 +\& $dir = (getpwnam("daemon"))[7]; +.Ve +.SS "Multi-dimensional array emulation" +.IX Subsection "Multi-dimensional array emulation" +Multidimensional arrays may be emulated by subscripting a hash with a +list. The elements of the list are joined with the subscript separator +(see "$;" in perlvar). 
+.PP +.Vb 1 +\& $foo{$x,$y,$z} +.Ve +.PP +is equivalent to +.PP +.Vb 1 +\& $foo{join($;, $x, $y, $z)} +.Ve +.PP +The default subscript separator is "\e034", the same as SUBSEP in \fBawk\fR. +.SS Slices +.IX Xref "slice array, slice hash, slice" +.IX Subsection "Slices" +A slice accesses several elements of a list, an array, or a hash +simultaneously using a list of subscripts. It's more convenient +than writing out the individual elements as a list of separate +scalar values. +.PP +.Vb 4 +\& ($him, $her) = @folks[0,\-1]; # array slice +\& @them = @folks[0 .. 3]; # array slice +\& ($who, $home) = @ENV{"USER", "HOME"}; # hash slice +\& ($uid, $dir) = (getpwnam("daemon"))[2,7]; # list slice +.Ve +.PP +Since you can assign to a list of variables, you can also assign to +an array or hash slice. +.PP +.Vb 4 +\& @days[3..5] = qw/Wed Thu Fri/; +\& @colors{\*(Aqred\*(Aq,\*(Aqblue\*(Aq,\*(Aqgreen\*(Aq} +\& = (0xff0000, 0x0000ff, 0x00ff00); +\& @folks[0, \-1] = @folks[\-1, 0]; +.Ve +.PP +The previous assignments are exactly equivalent to +.PP +.Vb 4 +\& ($days[3], $days[4], $days[5]) = qw/Wed Thu Fri/; +\& ($colors{\*(Aqred\*(Aq}, $colors{\*(Aqblue\*(Aq}, $colors{\*(Aqgreen\*(Aq}) +\& = (0xff0000, 0x0000ff, 0x00ff00); +\& ($folks[0], $folks[\-1]) = ($folks[\-1], $folks[0]); +.Ve +.PP +Since changing a slice changes the original array or hash that it's +slicing, a \f(CW\*(C`foreach\*(C'\fR construct will alter some\-\-or even all\-\-of the +values of the array or hash. +.PP +.Vb 1 +\& foreach (@array[ 4 .. 10 ]) { s/peter/paul/ } +\& +\& foreach (@hash{qw[key1 key2]}) { +\& s/^\es+//; # trim leading whitespace +\& s/\es+$//; # trim trailing whitespace +\& s/\eb(\ew)(\ew*)\eb/\eu$1\eL$2/g; # "titlecase" words +\& } +.Ve +.PP +As a special exception, when you slice a list (but not an array or a hash), +if the list evaluates to empty, then taking a slice of that empty list will +always yield the empty list in turn. 
Thus: +.PP +.Vb 6 +\& @a = ()[0,1]; # @a has no elements +\& @b = (@a)[0,1]; # @b has no elements +\& @c = (sub{}\->())[0,1]; # @c has no elements +\& @d = (\*(Aqa\*(Aq,\*(Aqb\*(Aq)[0,1]; # @d has two elements +\& @e = (@d)[0,1,8,9]; # @e has four elements +\& @f = (@d)[8,9]; # @f has two elements +.Ve +.PP +This makes it easy to write loops that terminate when a null list +is returned: +.PP +.Vb 3 +\& while ( ($home, $user) = (getpwent)[7,0] ) { +\& printf "%\-8s %s\en", $user, $home; +\& } +.Ve +.PP +As noted earlier in this document, the scalar sense of list assignment +is the number of elements on the right-hand side of the assignment. +The null list contains no elements, so when the password file is +exhausted, the result is 0, not 2. +.PP +Slices in scalar context return the last item of the slice. +.PP +.Vb 4 +\& @a = qw/first second third/; +\& %h = (first => \*(AqA\*(Aq, second => \*(AqB\*(Aq); +\& $t = @a[0, 1]; # $t is now \*(Aqsecond\*(Aq +\& $u = @h{\*(Aqfirst\*(Aq, \*(Aqsecond\*(Aq}; # $u is now \*(AqB\*(Aq +.Ve +.PP +If you're confused about why you use an '@' there on a hash slice +instead of a '%', think of it like this. The type of bracket (square +or curly) governs whether it's an array or a hash being looked at. +On the other hand, the leading symbol ('$' or '@') on the array or +hash indicates whether you are getting back a singular value (a +scalar) or a plural one (a list). 
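The sigil and bracket rule above can be laid out side by side in a short sketch. The variable names are illustrative.

```perl
use strict;
use warnings;

my @a = qw(first second third);
my %h = (first => 'A', second => 'B', third => 'C');

my $elem   = $a[1];                 # '$' with []: one array element
my @some   = @a[0, 2];              # '@' with []: array slice
my $value  = $h{second};            # '$' with {}: one hash value
my @values = @h{qw(first third)};   # '@' with {}: hash slice

print "$elem | @some | $value | @values\n";
# prints: second | first third | B | A C
```

In each case the bracket type selects the container being indexed, and the sigil states whether one value or a list comes back.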
+.PP +\fIKey/Value Hash Slices\fR +.IX Subsection "Key/Value Hash Slices" +.PP +Starting in Perl 5.20, a hash slice operation +with the % symbol is a variant of slice operation +returning a list of key/value pairs rather than just values: +.PP +.Vb 6 +\& %h = (blonk => 2, foo => 3, squink => 5, bar => 8); +\& %subset = %h{\*(Aqfoo\*(Aq, \*(Aqbar\*(Aq}; # key/value hash slice +\& # %subset is now (foo => 3, bar => 8) +\& %removed = delete %h{\*(Aqfoo\*(Aq, \*(Aqbar\*(Aq}; +\& # %removed is now (foo => 3, bar => 8) +\& # %h is now (blonk => 2, squink => 5) +.Ve +.PP +However, the result of such a slice cannot be localized or assigned to. +These are otherwise very much consistent with hash slices +using the @ symbol. +.PP +\fIIndex/Value Array Slices\fR +.IX Subsection "Index/Value Array Slices" +.PP +Similar to key/value hash slices (and also introduced +in Perl 5.20), the % array slice syntax returns a list +of index/value pairs: +.PP +.Vb 6 +\& @a = "a".."z"; +\& @list = %a[3,4,6]; +\& # @list is now (3, "d", 4, "e", 6, "g") +\& @removed = delete %a[3,4,6]; +\& # @removed is now (3, "d", 4, "e", 6, "g") +\& # @a[3,4,6] are now undef +.Ve +.PP +Note that calling \f(CW\*(C`delete\*(C'\fR on array values is +strongly discouraged. +.SS "Typeglobs and Filehandles" +.IX Xref "typeglob filehandle *" +.IX Subsection "Typeglobs and Filehandles" +Perl uses an internal type called a \fItypeglob\fR to hold an entire +symbol table entry. The type prefix of a typeglob is a \f(CW\*(C`*\*(C'\fR, because +it represents all types. This used to be the preferred way to +pass arrays and hashes by reference into a function, but now that +we have real references, this is seldom needed. +.PP +The main use of typeglobs in modern Perl is to create symbol table aliases. +This assignment: +.PP +.Vb 1 +\& *this = *that; +.Ve +.PP +makes \f(CW$this\fR an alias for \f(CW$that\fR, \f(CW@this\fR an alias for \f(CW@that\fR, \f(CW%this\fR an alias +for \f(CW%that\fR, &this an alias for &that, etc.
Much safer is to use a reference. +This: +.PP +.Vb 1 +\& local *Here::blue = \e$There::green; +.Ve +.PP +temporarily makes \f(CW$Here::blue\fR an alias for \f(CW$There::green\fR, but doesn't +make \f(CW@Here::blue\fR an alias for \f(CW@There::green\fR, or \f(CW%Here::blue\fR an alias for +\&\f(CW%There::green\fR, etc. See "Symbol Tables" in perlmod for more examples +of this. Strange though this may seem, this is the basis for the whole +module import/export system. +.PP +Another use for typeglobs is to pass filehandles into a function or +to create new filehandles. If you need to use a typeglob to save away +a filehandle, do it this way: +.PP +.Vb 1 +\& $fh = *STDOUT; +.Ve +.PP +or perhaps as a real reference, like this: +.PP +.Vb 1 +\& $fh = \e*STDOUT; +.Ve +.PP +See perlsub for examples of using these as indirect filehandles +in functions. +.PP +Typeglobs are also a way to create a local filehandle using the \fBlocal()\fR +operator. These last until their block is exited, but may be passed back. +For example: +.PP +.Vb 7 +\& sub newopen { +\& my $path = shift; +\& local *FH; # not my! +\& open (FH, $path) or return undef; +\& return *FH; +\& } +\& $fh = newopen(\*(Aq/etc/passwd\*(Aq); +.Ve +.PP +Now that we have the \f(CW*foo{THING}\fR notation, typeglobs aren't used as much +for filehandle manipulations, although they're still needed to pass brand +new file and directory handles into or out of functions. That's because +\&\f(CW*HANDLE{IO}\fR only works if HANDLE has already been used as a handle. +In other words, \f(CW*FH\fR must be used to create new symbol table entries; +\&\f(CW*foo{THING}\fR cannot. When in doubt, use \f(CW*FH\fR. +.PP +All functions that are capable of creating filehandles (\fBopen()\fR, +\&\fBopendir()\fR, \fBpipe()\fR, \fBsocketpair()\fR, \fBsysopen()\fR, \fBsocket()\fR, and \fBaccept()\fR) +automatically create an anonymous filehandle if the handle passed to +them is an uninitialized scalar variable. 
This allows constructs +such as \f(CW\*(C`open(my $fh, ...)\*(C'\fR and \f(CW\*(C`open(local $fh,...)\*(C'\fR to be used to +create filehandles that will conveniently be closed automatically when +the scope ends, provided there are no other references to them. This +largely eliminates the need for typeglobs when opening filehandles +that must be passed around, as in the following example: +.PP +.Vb 5 +\& sub myopen { +\& open my $fh, "@_" +\& or die "Can\*(Aqt open \*(Aq@_\*(Aq: $!"; +\& return $fh; +\& } +\& +\& { +\& my $f = myopen("</etc/motd"); +\& print <$f>; +\& # $f implicitly closed here +\& } +.Ve +.PP +Note that if an initialized scalar variable is used instead the +result is different: \f(CW\*(C`my $fh=\*(Aqzzz\*(Aq; open($fh, ...)\*(C'\fR is equivalent +to \f(CW\*(C`open( *{\*(Aqzzz\*(Aq}, ...)\*(C'\fR. +\&\f(CW\*(C`use strict \*(Aqrefs\*(Aq\*(C'\fR forbids such practice. +.PP +Another way to create anonymous filehandles is with the Symbol +module or with the IO::Handle module and its ilk. These modules +have the advantage of not hiding different types of the same name +during the \fBlocal()\fR. See the bottom of "open" in perlfunc for an +example. +.SH "SEE ALSO" +.IX Header "SEE ALSO" +See perlvar for a description of Perl's built-in variables and +a discussion of legal variable names. See perlref, perlsub, +and "Symbol Tables" in perlmod for more discussion on typeglobs and +the \f(CW*foo{THING}\fR syntax.