Diffstat (limited to 'upstream/archlinux/man1/perlfaq4.1perl')
-rw-r--r-- | upstream/archlinux/man1/perlfaq4.1perl | 3085 |
1 files changed, 3085 insertions, 0 deletions
diff --git a/upstream/archlinux/man1/perlfaq4.1perl b/upstream/archlinux/man1/perlfaq4.1perl new file mode 100644 index 00000000..c1d85d61 --- /dev/null +++ b/upstream/archlinux/man1/perlfaq4.1perl @@ -0,0 +1,3085 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLFAQ4 1perl" +.TH PERLFAQ4 1perl 2024-02-11 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. 
+.if n .ad l +.nh +.SH NAME +perlfaq4 \- Data Manipulation +.SH VERSION +.IX Header "VERSION" +version 5.20210520 +.SH DESCRIPTION +.IX Header "DESCRIPTION" +This section of the FAQ answers questions related to manipulating +numbers, dates, strings, arrays, hashes, and miscellaneous data issues. +.SH "Data: Numbers" +.IX Header "Data: Numbers" +.SS "Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?" +.IX Subsection "Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?" +For the long explanation, see David Goldberg's "What Every Computer +Scientist Should Know About Floating-Point Arithmetic" +(<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>). +.PP +Internally, your computer represents floating-point numbers in binary. +Digital (as in powers of two) computers cannot store all numbers +exactly. Some real numbers lose precision in the process. This is a +problem with how computers store numbers and affects all computer +languages, not just Perl. +.PP +perlnumber shows the gory details of number representations and +conversions. +.PP +To limit the number of decimal places in your numbers, you can use the +\&\f(CW\*(C`printf\*(C'\fR or \f(CW\*(C`sprintf\*(C'\fR function. See +"Floating-point Arithmetic" in perlop for more details. +.PP +.Vb 1 +\& printf "%.2f", 10/3; +\& +\& my $number = sprintf "%.2f", 10/3; +.Ve +.SS "Why is \fBint()\fP broken?" +.IX Subsection "Why is int() broken?" +Your \f(CWint()\fR is most probably working just fine. It's the numbers that +aren't quite what you think. +.PP +First, see the answer to "Why am I getting long decimals +(eg, 19.9499999999999) instead of the numbers I should be getting +(eg, 19.95)?". 
+.PP +For example, this +.PP +.Vb 1 +\& print int(0.6/0.2\-2), "\en"; +.Ve +.PP +will on most computers print 0, not 1, because even such simple +numbers as 0.6 and 0.2 cannot be represented exactly by floating-point +numbers. What you think of above as 'three' is really more like +2.9999999999999995559. +.SS "Why isn't my octal data interpreted correctly?" +.IX Subsection "Why isn't my octal data interpreted correctly?" +(contributed by brian d foy) +.PP +You're probably trying to convert a string to a number, which Perl only +converts as a decimal number. When Perl converts a string to a number, it +ignores leading spaces and zeroes, then assumes the rest of the digits +are in base 10: +.PP +.Vb 1 +\& my $string = \*(Aq0644\*(Aq; +\& +\& print $string + 0; # prints 644 +\& +\& print $string + 44; # prints 688, certainly not octal! +.Ve +.PP +This problem usually involves one of the Perl built-ins that has the +same name as a Unix command that uses octal numbers as arguments on the +command line. In this example, \f(CW\*(C`chmod\*(C'\fR on the command line knows that +its first argument is octal because that's what it does: +.PP +.Vb 1 +\& %prompt> chmod 644 file +.Ve +.PP +If you want to use the same literal digits (644) in Perl, you have to tell +Perl to treat them as octal numbers either by prefixing the digits with +a \f(CW0\fR or using \f(CW\*(C`oct\*(C'\fR: +.PP +.Vb 2 +\& chmod( 0644, $filename ); # right, has leading zero +\& chmod( oct(644), $filename ); # also correct +.Ve +.PP +The problem comes in when you take your numbers from something that Perl +thinks is a string, such as a command line argument in \f(CW@ARGV\fR: +.PP +.Vb 1 +\& chmod( $ARGV[0], $filename ); # wrong, even if "0644" +\& +\& chmod( oct($ARGV[0]), $filename ); # correct, treat string as octal +.Ve +.PP +You can always check the value you're using by printing it in octal +notation to ensure it matches what you think it should be. 
Print it +in octal and decimal format: +.PP +.Vb 1 +\& printf "0%o %d", $number, $number; +.Ve +.SS "Does Perl have a \fBround()\fP function? What about \fBceil()\fP and \fBfloor()\fP? Trig functions?" +.IX Subsection "Does Perl have a round() function? What about ceil() and floor()? Trig functions?" +Remember that \f(CWint()\fR merely truncates toward 0. For rounding to a +certain number of digits, \f(CWsprintf()\fR or \f(CWprintf()\fR is usually the +easiest route. +.PP +.Vb 1 +\& printf("%.3f", 3.1415926535); # prints 3.142 +.Ve +.PP +The POSIX module (part of the standard Perl distribution) +implements \f(CWceil()\fR, \f(CWfloor()\fR, and a number of other mathematical +and trigonometric functions. +.PP +.Vb 3 +\& use POSIX; +\& my $ceil = ceil(3.5); # 4 +\& my $floor = floor(3.5); # 3 +.Ve +.PP +In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex +module. With 5.004, the Math::Trig module (part of the standard Perl +distribution) implements the trigonometric functions. Internally it +uses the Math::Complex module and some functions can break out from +the real axis into the complex plane, for example the inverse sine of +2. +.PP +Rounding in financial applications can have serious implications, and +the rounding method used should be specified precisely. In these +cases, it probably pays not to trust whichever system of rounding is +being used by Perl, but instead to implement the rounding function you +need yourself. +.PP +To see why, notice how you'll still have an issue on half-way-point +alternation: +.PP +.Vb 1 +\& for (my $i = \-5; $i <= 5; $i += 0.5) { printf "%.0f ",$i } +\& +\& \-5 \-4 \-4 \-4 \-3 \-2 \-2 \-2 \-1 \-0 0 0 1 2 2 2 3 4 4 4 5 +.Ve +.PP +Don't blame Perl. It's the same as in C. IEEE says we have to do +this. Perl numbers whose absolute values are integers under 2**31 (on +32\-bit machines) will work pretty much like mathematical integers. +Other numbers are not guaranteed. 
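As a rough illustration of "implement the rounding function you need yourself", here is a minimal sketch of one possible rule, round half away from zero. The name round_half_away and its optional second argument are made up for this example and are not part of Perl or POSIX. It only gives the expected answer for exact halves when the input is exactly representable in binary (for example 0.5 or 2.5); for money, working in integer cents is usually the safer assumption.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: round $value half away from zero, keeping
# $places digits after the decimal point (default 0).
sub round_half_away {
    my ($value, $places) = @_;
    my $factor  = 10 ** ($places // 0);
    # Work on the magnitude so that -2.5 and 2.5 round symmetrically.
    my $rounded = floor(abs($value) * $factor + 0.5);
    return ($value < 0 ? -1 : 1) * $rounded / $factor;
}

# Unlike printf "%.0f", the half-way points no longer alternate.
print join(" ", map { round_half_away($_) } (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)), "\n";
# prints: -3 -2 -1 1 2 3
```

Whether half-away-from-zero, half-even, or some other rule is right depends on the application; the point is only that the rule is written down explicitly rather than inherited from the C library.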
+.SS "How do I convert between numeric representations/bases/radixes?" +.IX Subsection "How do I convert between numeric representations/bases/radixes?" +As always with Perl there is more than one way to do it. Below are a +few examples of approaches to making common conversions between number +representations. This is intended to be representative rather than +exhaustive. +.PP +Some of the examples later in perlfaq4 use the Bit::Vector +module from CPAN. The reason you might choose Bit::Vector over the +perl built-in functions is that it works with numbers of ANY size, +that it is optimized for speed on some operations, and for at least +some programmers the notation might be familiar. +.IP "How do I convert hexadecimal into decimal" 4 +.IX Item "How do I convert hexadecimal into decimal" +Using perl's built-in conversion of \f(CW\*(C`0x\*(C'\fR notation: +.Sp +.Vb 1 +\& my $dec = 0xDEADBEEF; +.Ve +.Sp +Using the \f(CW\*(C`hex\*(C'\fR function: +.Sp +.Vb 1 +\& my $dec = hex("DEADBEEF"); +.Ve +.Sp +Using \f(CW\*(C`pack\*(C'\fR: +.Sp +.Vb 1 +\& my $dec = unpack("N", pack("H8", substr("0" x 8 . 
"DEADBEEF", \-8))); +.Ve +.Sp +Using the CPAN module \f(CW\*(C`Bit::Vector\*(C'\fR: +.Sp +.Vb 3 +\& use Bit::Vector; +\& my $vec = Bit::Vector\->new_Hex(32, "DEADBEEF"); +\& my $dec = $vec\->to_Dec(); +.Ve +.IP "How do I convert from decimal to hexadecimal" 4 +.IX Item "How do I convert from decimal to hexadecimal" +Using \f(CW\*(C`sprintf\*(C'\fR: +.Sp +.Vb 2 +\& my $hex = sprintf("%X", 3735928559); # upper case A\-F +\& my $hex = sprintf("%x", 3735928559); # lower case a\-f +.Ve +.Sp +Using \f(CW\*(C`unpack\*(C'\fR: +.Sp +.Vb 1 +\& my $hex = unpack("H*", pack("N", 3735928559)); +.Ve +.Sp +Using Bit::Vector: +.Sp +.Vb 3 +\& use Bit::Vector; +\& my $vec = Bit::Vector\->new_Dec(32, \-559038737); +\& my $hex = $vec\->to_Hex(); +.Ve +.Sp +And Bit::Vector supports odd bit counts: +.Sp +.Vb 4 +\& use Bit::Vector; +\& my $vec = Bit::Vector\->new_Dec(33, 3735928559); +\& $vec\->Resize(32); # suppress leading 0 if unwanted +\& my $hex = $vec\->to_Hex(); +.Ve +.IP "How do I convert from octal to decimal" 4 +.IX Item "How do I convert from octal to decimal" +Using Perl's built in conversion of numbers with leading zeros: +.Sp +.Vb 1 +\& my $dec = 033653337357; # note the leading 0! 
+.Ve +.Sp +Using the \f(CW\*(C`oct\*(C'\fR function: +.Sp +.Vb 1 +\& my $dec = oct("33653337357"); +.Ve +.Sp +Using Bit::Vector: +.Sp +.Vb 4 +\& use Bit::Vector; +\& my $vec = Bit::Vector\->new(32); +\& $vec\->Chunk_List_Store(3, split(//, reverse "33653337357")); +\& my $dec = $vec\->to_Dec(); +.Ve +.IP "How do I convert from decimal to octal" 4 +.IX Item "How do I convert from decimal to octal" +Using \f(CW\*(C`sprintf\*(C'\fR: +.Sp +.Vb 1 +\& my $oct = sprintf("%o", 3735928559); +.Ve +.Sp +Using Bit::Vector: +.Sp +.Vb 3 +\& use Bit::Vector; +\& my $vec = Bit::Vector\->new_Dec(32, \-559038737); +\& my $oct = reverse join(\*(Aq\*(Aq, $vec\->Chunk_List_Read(3)); +.Ve +.IP "How do I convert from binary to decimal" 4 +.IX Item "How do I convert from binary to decimal" +Perl 5.6 lets you write binary numbers directly with +the \f(CW\*(C`0b\*(C'\fR notation: +.Sp +.Vb 1 +\& my $number = 0b10110110; +.Ve +.Sp +Using \f(CW\*(C`oct\*(C'\fR: +.Sp +.Vb 2 +\& my $input = "10110110"; +\& my $decimal = oct( "0b$input" ); +.Ve +.Sp +Using \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`ord\*(C'\fR: +.Sp +.Vb 1 +\& my $decimal = ord(pack(\*(AqB8\*(Aq, \*(Aq10110110\*(Aq)); +.Ve +.Sp +Using \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR for larger strings: +.Sp +.Vb 3 +\& my $int = unpack("N", pack("B32", +\& substr("0" x 32 . "11110101011011011111011101111", \-32))); +\& my $dec = sprintf("%d", $int); +\& +\& # substr() is used to left\-pad a 32\-character string with zeros. 
+.Ve +.Sp +Using Bit::Vector: +.Sp +.Vb 2 +\& my $vec = Bit::Vector\->new_Bin(32, "11011110101011011011111011101111"); +\& my $dec = $vec\->to_Dec(); +.Ve +.IP "How do I convert from decimal to binary" 4 +.IX Item "How do I convert from decimal to binary" +Using \f(CW\*(C`sprintf\*(C'\fR (perl 5.6+): +.Sp +.Vb 1 +\& my $bin = sprintf("%b", 3735928559); +.Ve +.Sp +Using \f(CW\*(C`unpack\*(C'\fR: +.Sp +.Vb 1 +\& my $bin = unpack("B*", pack("N", 3735928559)); +.Ve +.Sp +Using Bit::Vector: +.Sp +.Vb 3 +\& use Bit::Vector; +\& my $vec = Bit::Vector\->new_Dec(32, \-559038737); +\& my $bin = $vec\->to_Bin(); +.Ve +.Sp +The remaining transformations (e.g. hex \-> oct, bin \-> hex, etc.) +are left as an exercise to the inclined reader. +.SS "Why doesn't & work the way I want it to?" +.IX Subsection "Why doesn't & work the way I want it to?" +The behavior of binary arithmetic operators depends on whether they're +used on numbers or strings. The operators treat a string as a series +of bits and work with that (the string \f(CW"3"\fR is the bit pattern +\&\f(CW00110011\fR). The operators work with the binary form of a number +(the number \f(CW3\fR is treated as the bit pattern \f(CW00000011\fR). +.PP +So, saying \f(CW\*(C`11 & 3\*(C'\fR performs the "and" operation on numbers (yielding +\&\f(CW3\fR). Saying \f(CW"11" & "3"\fR performs the "and" operation on strings +(yielding \f(CW"1"\fR). +.PP +Most problems with \f(CW\*(C`&\*(C'\fR and \f(CW\*(C`|\*(C'\fR arise because the programmer thinks +they have a number but really it's a string or vice versa. To avoid this, +stringify the arguments explicitly (using \f(CW""\fR or \f(CWqq()\fR) or convert them +to numbers explicitly (using \f(CW\*(C`0+$arg\*(C'\fR). The rest arise because +the programmer says: +.PP +.Vb 3 +\& if ("\e020\e020" & "\e101\e101") { +\& # ... +\& } +.Ve +.PP +but a string consisting of two null bytes (the result of \f(CW"\e020\e020" +& "\e101\e101"\fR) is not a false value in Perl. 
You need: +.PP +.Vb 3 +\& if ( ("\e020\e020" & "\e101\e101") !~ /[^\e000]/) { +\& # ... +\& } +.Ve +.SS "How do I multiply matrices?" +.IX Subsection "How do I multiply matrices?" +Use the Math::Matrix or Math::MatrixReal modules (available from CPAN) +or the PDL extension (also available from CPAN). +.SS "How do I perform an operation on a series of integers?" +.IX Subsection "How do I perform an operation on a series of integers?" +To call a function on each element in an array, and collect the +results, use: +.PP +.Vb 1 +\& my @results = map { my_func($_) } @array; +.Ve +.PP +For example: +.PP +.Vb 1 +\& my @triple = map { 3 * $_ } @single; +.Ve +.PP +To call a function on each element of an array, but ignore the +results: +.PP +.Vb 3 +\& foreach my $iterator (@array) { +\& some_func($iterator); +\& } +.Ve +.PP +To call a function on each integer in a (small) range, you \fBcan\fR use: +.PP +.Vb 1 +\& my @results = map { some_func($_) } (5 .. 25); +.Ve +.PP +but you should be aware that in this form, the \f(CW\*(C`..\*(C'\fR operator +creates a list of all integers in the range, which can take a lot of +memory for large ranges. However, the problem does not occur when +using \f(CW\*(C`..\*(C'\fR within a \f(CW\*(C`for\*(C'\fR loop, because in that case the range +operator is optimized to \fIiterate\fR over the range, without creating +the entire list. So +.PP +.Vb 4 +\& my @results = (); +\& for my $i (5 .. 500_005) { +\& push(@results, some_func($i)); +\& } +.Ve +.PP +or even +.PP +.Vb 1 +\& push(@results, some_func($_)) for 5 .. 500_005; +.Ve +.PP +will not create an intermediate list of 500,000 integers. +.SS "How can I output Roman numerals?" +.IX Subsection "How can I output Roman numerals?" +Get the <http://www.cpan.org/modules/by\-module/Roman> module. +.SS "Why aren't my random numbers random?" +.IX Subsection "Why aren't my random numbers random?" 
+If you're using a version of Perl before 5.004, you must call \f(CW\*(C`srand\*(C'\fR +once at the start of your program to seed the random number generator. +.PP +.Vb 1 +\& BEGIN { srand() if $] < 5.004 } +.Ve +.PP +5.004 and later automatically call \f(CW\*(C`srand\*(C'\fR at the beginning. Don't +call \f(CW\*(C`srand\*(C'\fR more than once\-\-you make your numbers less random, +rather than more. +.PP +Computers are good at being predictable and bad at being random +(despite appearances caused by bugs in your programs :\-). The +\&\fIrandom\fR article in the "Far More Than You Ever Wanted To Know" +collection in <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy +of Tom Phoenix, talks more about this. John von Neumann said, "Anyone +who attempts to generate random numbers by deterministic means is, of +course, living in a state of sin." +.PP +Perl relies on the underlying system for the implementation of +\&\f(CW\*(C`rand\*(C'\fR and \f(CW\*(C`srand\*(C'\fR; on some systems, the generated numbers are +not random enough (especially on Windows : see +<http://www.perlmonks.org/?node_id=803632>). +Several CPAN modules in the \f(CW\*(C`Math\*(C'\fR namespace implement better +pseudorandom generators; see for example +Math::Random::MT ("Mersenne Twister", fast), or +Math::TrulyRandom (uses the imperfections in the system's +timer to generate random numbers, which is rather slow). +More algorithms for random numbers are described in +"Numerical Recipes in C" at <http://www.nr.com/> +.SS "How do I get a random number between X and Y?" +.IX Subsection "How do I get a random number between X and Y?" +To get a random number between two values, you can use the \f(CWrand()\fR +built-in to get a random number between 0 and 1. From there, you shift +that into the range that you want. +.PP +\&\f(CWrand($x)\fR returns a number such that \f(CW\*(C`0 <= rand($x) < $x\*(C'\fR. 
Thus +what you want to have perl figure out is a random number in the range +from 0 to the difference between your \fIX\fR and \fIY\fR. +.PP +That is, to get a number between 10 and 15, inclusive, you want a +random number between 0 and 5 that you can then add to 10. +.PP +.Vb 1 +\& my $number = 10 + int rand( 15\-10+1 ); # ( 10,11,12,13,14, or 15 ) +.Ve +.PP +Hence you derive the following simple function to abstract +that. It selects a random integer between the two given +integers (inclusive). For example: \f(CW\*(C`random_int_between(50,120)\*(C'\fR. +.PP +.Vb 7 +\& sub random_int_between { +\& my($min, $max) = @_; +\& # Assumes that the two arguments are integers themselves! +\& return $min if $min == $max; +\& ($min, $max) = ($max, $min) if $min > $max; +\& return $min + int rand(1 + $max \- $min); +\& } +.Ve +.SH "Data: Dates" +.IX Header "Data: Dates" +.SS "How do I find the day or week of the year?" +.IX Subsection "How do I find the day or week of the year?" +The day of the year is in the list returned +by the \f(CW\*(C`localtime\*(C'\fR function. Without an +argument \f(CW\*(C`localtime\*(C'\fR uses the current time. +.PP +.Vb 1 +\& my $day_of_year = (localtime)[7]; +.Ve +.PP +The POSIX module can also format a date as the day of the year or +week of the year. +.PP +.Vb 3 +\& use POSIX qw/strftime/; +\& my $day_of_year = strftime "%j", localtime; +\& my $week_of_year = strftime "%W", localtime; +.Ve +.PP +To get the day of year for any date, use POSIX's \f(CW\*(C`mktime\*(C'\fR to get +a time in epoch seconds for the argument to \f(CW\*(C`localtime\*(C'\fR. 
+.PP +.Vb 3 +\& use POSIX qw/mktime strftime/; +\& my $week_of_year = strftime "%W", +\& localtime( mktime( 0, 0, 0, 18, 11, 87 ) ); +.Ve +.PP +You can also use Time::Piece, which comes with Perl and provides a +\&\f(CW\*(C`localtime\*(C'\fR that returns an object: +.PP +.Vb 3 +\& use Time::Piece; +\& my $day_of_year = localtime\->yday; +\& my $week_of_year = localtime\->week; +.Ve +.PP +The Date::Calc module provides two functions to calculate these, too: +.PP +.Vb 3 +\& use Date::Calc; +\& my $day_of_year = Day_of_Year( 1987, 12, 18 ); +\& my $week_of_year = Week_of_Year( 1987, 12, 18 ); +.Ve +.SS "How do I find the current century or millennium?" +.IX Subsection "How do I find the current century or millennium?" +Use the following simple functions: +.PP +.Vb 3 +\& sub get_century { +\& return int((((localtime(shift || time))[5] + 1999))/100); +\& } +\& +\& sub get_millennium { +\& return 1+int((((localtime(shift || time))[5] + 1899))/1000); +\& } +.Ve +.PP +On some systems, the POSIX module's \f(CWstrftime()\fR function has been +extended in a non-standard way to use a \f(CW%C\fR format, which they +sometimes claim is the "century". It isn't, because on most such +systems, this is only the first two digits of the four-digit year, and +thus cannot be used to determine reliably the current century or +millennium. +.SS "How can I compare two dates and find the difference?" +.IX Subsection "How can I compare two dates and find the difference?" +(contributed by brian d foy) +.PP +You could just store all your dates as a number and then subtract. +Life isn't always that simple though. +.PP +The Time::Piece module, which comes with Perl, replaces localtime +with a version that returns an object. 
It also overloads the comparison +operators so you can compare them directly: +.PP +.Vb 3 +\& use Time::Piece; +\& my $date1 = localtime( $some_time ); +\& my $date2 = localtime( $some_other_time ); +\& +\& if( $date1 < $date2 ) { +\& print "The date was in the past\en"; +\& } +.Ve +.PP +You can also get differences with a subtraction, which returns a +Time::Seconds object: +.PP +.Vb 2 +\& my $date_diff = $date1 \- $date2; +\& print "The difference is ", $date_diff\->days, " days\en"; +.Ve +.PP +If you want to work with formatted dates, the Date::Manip, +Date::Calc, or DateTime modules can help you. +.SS "How can I take a string and turn it into epoch seconds?" +.IX Subsection "How can I take a string and turn it into epoch seconds?" +If it's a regular enough string that it always has the same format, +you can split it up and pass the parts to \f(CW\*(C`timelocal\*(C'\fR in the standard +Time::Local module. Otherwise, you should look into the Date::Calc, +Date::Parse, and Date::Manip modules from CPAN. +.SS "How can I find the Julian Day?" +.IX Subsection "How can I find the Julian Day?" 
+(contributed by brian d foy and Dave Cross) +.PP +You can use the Time::Piece module, part of the Standard Library, +which can convert a date/time to a Julian Day: +.PP +.Vb 2 +\& $ perl \-MTime::Piece \-le \*(Aqprint localtime\->julian_day\*(Aq +\& 2455607.7959375 +.Ve +.PP +Or the modified Julian Day: +.PP +.Vb 2 +\& $ perl \-MTime::Piece \-le \*(Aqprint localtime\->mjd\*(Aq +\& 55607.2961226851 +.Ve +.PP +Or even the day of the year (which is what some people think of as a +Julian day): +.PP +.Vb 2 +\& $ perl \-MTime::Piece \-le \*(Aqprint localtime\->yday\*(Aq +\& 45 +.Ve +.PP +You can also do the same things with the DateTime module: +.PP +.Vb 6 +\& $ perl \-MDateTime \-le\*(Aqprint DateTime\->today\->jd\*(Aq +\& 2453401.5 +\& $ perl \-MDateTime \-le\*(Aqprint DateTime\->today\->mjd\*(Aq +\& 53401 +\& $ perl \-MDateTime \-le\*(Aqprint DateTime\->today\->doy\*(Aq +\& 31 +.Ve +.PP +You can also use the Time::JulianDay module available on CPAN. Ensure +that you really want to find a Julian day, though, as many people have +different ideas about Julian days (see <http://www.hermetic.ch/cal_stud/jdn.htm> +for instance): +.PP +.Vb 2 +\& $ perl \-MTime::JulianDay \-le \*(Aqprint local_julian_day( time )\*(Aq +\& 55608 +.Ve +.SS "How do I find yesterday's date?" +.IX Xref "date yesterday DateTime Date::Calc Time::Local daylight saving time day Today_and_Now localtime timelocal" +.IX Subsection "How do I find yesterday's date?" +(contributed by brian d foy) +.PP +To do it correctly, you can use one of the \f(CW\*(C`Date\*(C'\fR modules since they +work with calendars instead of times. The DateTime module makes it +simple, and gives you the same time of day, only the day before, +despite daylight saving time changes: +.PP +.Vb 1 +\& use DateTime; +\& +\& my $yesterday = DateTime\->now\->subtract( days => 1 ); +\& +\& print "Yesterday was $yesterday\en"; +.Ve +.PP +You can also use the Date::Calc module using its \f(CW\*(C`Today_and_Now\*(C'\fR +function. 
+.PP +.Vb 1 +\& use Date::Calc qw( Today_and_Now Add_Delta_DHMS ); +\& +\& my @date_time = Add_Delta_DHMS( Today_and_Now(), \-1, 0, 0, 0 ); +\& +\& print "@date_time\en"; +.Ve +.PP +Most people try to use the time rather than the calendar to figure out +dates, but that assumes that days are twenty-four hours each. For +most people, there are two days a year when they aren't: the switch to +and from summer time throws this off. For example, the rest of the +suggestions will be wrong sometimes: +.PP +Starting with Perl 5.10, Time::Piece and Time::Seconds are part +of the standard distribution, so you might think that you could do +something like this: +.PP +.Vb 2 +\& use Time::Piece; +\& use Time::Seconds; +\& +\& my $yesterday = localtime() \- ONE_DAY; # WRONG +\& print "Yesterday was $yesterday\en"; +.Ve +.PP +The Time::Piece module exports a new \f(CW\*(C`localtime\*(C'\fR that returns an +object, and Time::Seconds exports the \f(CW\*(C`ONE_DAY\*(C'\fR constant that is a +set number of seconds. This means that it always gives the time 24 +hours ago, which is not always yesterday. This can cause problems +around the end of daylight saving time when there's one day that is 25 +hours long. +.PP +You have the same problem with Time::Local, which will give the wrong +answer for those same special cases: +.PP +.Vb 5 +\& # contributed by Gunnar Hjalmarsson +\& use Time::Local; +\& my $today = timelocal 0, 0, 12, ( localtime )[3..5]; +\& my ($d, $m, $y) = ( localtime $today\-86400 )[3..5]; # WRONG +\& printf "Yesterday: %d\-%02d\-%02d\en", $y+1900, $m+1, $d; +.Ve +.SS "Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?" +.IX Subsection "Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?" +(contributed by brian d foy) +.PP +Perl itself never had a Y2K problem, although that never stopped people +from creating Y2K problems on their own. See the documentation for +\&\f(CW\*(C`localtime\*(C'\fR for its proper use. 
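The usual way people created their own Y2K problem was to treat the year field from localtime or gmtime as a two-digit year. A small sketch of the difference follows; the fixed epoch value 946684800 (midnight UTC on 1 January 2000) and the variable names are chosen purely for illustration.

```perl
use strict;
use warnings;

# Fields 3, 4, 5 are day of month, month (0-11), and years since 1900.
my ($mday, $mon, $year) = (gmtime(946684800))[3, 4, 5];

# WRONG: pasting "19" in front of the year field gives "19100" in 2000.
my $broken = "19" . $year;

# Right: the field is an offset from 1900, so add 1900 to it.
my $date = sprintf "%04d-%02d-%02d", $year + 1900, $mon + 1, $mday;

print "$broken\n";   # prints 19100
print "$date\n";     # prints 2000-01-01
```

The same arithmetic applies to localtime; gmtime is used here only so the output does not depend on the local time zone.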
+.PP +Starting with Perl 5.12, \f(CW\*(C`localtime\*(C'\fR and \f(CW\*(C`gmtime\*(C'\fR can handle dates past +03:14:08 January 19, 2038, when a 32\-bit based time would overflow. You +still might get a warning on a 32\-bit \f(CW\*(C`perl\*(C'\fR: +.PP +.Vb 3 +\& % perl5.12 \-E \*(Aqsay scalar localtime( 0x9FFF_FFFFFFFF )\*(Aq +\& Integer overflow in hexadecimal number at \-e line 1. +\& Wed Nov 1 19:42:39 5576711 +.Ve +.PP +On a 64\-bit \f(CW\*(C`perl\*(C'\fR, you can get even larger dates for those really long +running projects: +.PP +.Vb 2 +\& % perl5.12 \-E \*(Aqsay scalar gmtime( 0x9FFF_FFFFFFFF )\*(Aq +\& Thu Nov 2 00:42:39 5576711 +.Ve +.PP +You're still out of luck if you need to keep track of decaying protons +though. +.SH "Data: Strings" +.IX Header "Data: Strings" +.SS "How do I validate input?" +.IX Subsection "How do I validate input?" +(contributed by brian d foy) +.PP +There are many ways to ensure that values are what you expect or +want to accept. Besides the specific examples that we cover in the +perlfaq, you can also look at the modules with "Assert" and "Validate" +in their names, along with other modules such as Regexp::Common. +.PP +Some modules have validation for particular types of input, such +as Business::ISBN, Business::CreditCard, Email::Valid, +and Data::Validate::IP. +.SS "How do I unescape a string?" +.IX Subsection "How do I unescape a string?" +It depends just what you mean by "escape". URL escapes are dealt +with in perlfaq9. Shell escapes with the backslash (\f(CW\*(C`\e\*(C'\fR) +character are removed with +.PP +.Vb 1 +\& s/\e\e(.)/$1/g; +.Ve +.PP +This won't expand \f(CW"\en"\fR or \f(CW"\et"\fR or any other special escapes. +.SS "How do I remove consecutive pairs of characters?" +.IX Subsection "How do I remove consecutive pairs of characters?" +(contributed by brian d foy) +.PP +You can use the substitution operator to find pairs of characters (or +runs of characters) and replace them with a single instance. 
In this +substitution, we find a character in \f(CW\*(C`(.)\*(C'\fR. The memory parentheses +store the matched character in the back-reference \f(CW\*(C`\eg1\*(C'\fR and we use +that to require that the same thing immediately follow it. We replace +that part of the string with the character in \f(CW$1\fR. +.PP +.Vb 1 +\& s/(.)\eg1/$1/g; +.Ve +.PP +We can also use the transliteration operator, \f(CW\*(C`tr///\*(C'\fR. In this +example, the search list side of our \f(CW\*(C`tr///\*(C'\fR contains nothing, but +the \f(CW\*(C`c\*(C'\fR option complements that so it contains everything. The +replacement list also contains nothing, so the transliteration is +almost a no-op since it won't do any replacements (or more exactly, +replace the character with itself). However, the \f(CW\*(C`s\*(C'\fR option squashes +duplicated and consecutive characters in the string so a character +does not show up next to itself +.PP +.Vb 2 +\& my $str = \*(AqHaarlem\*(Aq; # in the Netherlands +\& $str =~ tr///cs; # Now Harlem, like in New York +.Ve +.SS "How do I expand function calls in a string?" +.IX Subsection "How do I expand function calls in a string?" +(contributed by brian d foy) +.PP +This is documented in perlref, and although it's not the easiest +thing to read, it does work. In each of these examples, we call the +function inside the braces used to dereference a reference. If we +have more than one return value, we can construct and dereference an +anonymous array. In this case, we call the function in list context. +.PP +.Vb 1 +\& print "The time values are @{ [localtime] }.\en"; +.Ve +.PP +If we want to call the function in scalar context, we have to do a bit +more work. We can really have any code we like inside the braces, so +we simply have to end with the scalar reference, although how you do +that is up to you, and you can use code inside the braces. 
Note that +the use of parens creates a list context, so we need \f(CW\*(C`scalar\*(C'\fR to +force the scalar context on the function: +.PP +.Vb 1 +\& print "The time is ${\e(scalar localtime)}.\en"; +\& +\& print "The time is ${ my $x = localtime; \e$x }.\en"; +.Ve +.PP +If your function already returns a reference, you don't need to create +the reference yourself. +.PP +.Vb 1 +\& sub timestamp { my $t = localtime; \e$t } +\& +\& print "The time is ${ timestamp() }.\en"; +.Ve +.PP +The \f(CW\*(C`Interpolation\*(C'\fR module can also do a lot of magic for you. You can +specify a variable name, in this case \f(CW\*(C`E\*(C'\fR, to set up a tied hash that +does the interpolation for you. It has several other methods to do this +as well. +.PP +.Vb 2 +\& use Interpolation E => \*(Aqeval\*(Aq; +\& print "The time values are $E{localtime()}.\en"; +.Ve +.PP +In most cases, it is probably easier to simply use string concatenation, +which also forces scalar context. +.PP +.Vb 1 +\& print "The time is " . localtime() . ".\en"; +.Ve +.SS "How do I find matching/nesting anything?" +.IX Subsection "How do I find matching/nesting anything?" +To find something between two single +characters, a pattern like \f(CW\*(C`/x([^x]*)x/\*(C'\fR will get the intervening +bits in \f(CW$1\fR. For multi-character delimiters, something more like +\&\f(CW\*(C`/alpha(.*?)omega/\*(C'\fR would be needed. For nested patterns +and/or balanced expressions, see the so-called +(?PARNO) +construct (available since perl 5.10). +The CPAN module Regexp::Common can help to build such +regular expressions (see in particular +Regexp::Common::balanced and Regexp::Common::delimited). +.PP +More complex cases will require writing a parser, probably +using a parsing module from CPAN, like +Regexp::Grammars, Parse::RecDescent, Parse::Yapp, +Text::Balanced, or Marpa::R2. +.SS "How do I reverse a string?" +.IX Subsection "How do I reverse a string?" 
+Use \f(CWreverse()\fR in scalar context, as documented in +"reverse" in perlfunc. +.PP +.Vb 1 +\& my $reversed = reverse $string; +.Ve +.SS "How do I expand tabs in a string?" +.IX Subsection "How do I expand tabs in a string?" +You can do it yourself: +.PP +.Vb 1 +\& 1 while $string =~ s/\et+/\*(Aq \*(Aq x (length($&) * 8 \- length($\`) % 8)/e; +.Ve +.PP +Or you can just use the Text::Tabs module (part of the standard Perl +distribution). +.PP +.Vb 2 +\& use Text::Tabs; +\& my @expanded_lines = expand(@lines_with_tabs); +.Ve +.SS "How do I reformat a paragraph?" +.IX Subsection "How do I reformat a paragraph?" +Use Text::Wrap (part of the standard Perl distribution): +.PP +.Vb 2 +\& use Text::Wrap; +\& print wrap("\et", \*(Aq \*(Aq, @paragraphs); +.Ve +.PP +The paragraphs you give to Text::Wrap should not contain embedded +newlines. Text::Wrap doesn't justify the lines (flush-right). +.PP +Or use the CPAN module Text::Autoformat. Formatting files can be +easily done by making a shell alias, like so: +.PP +.Vb 2 +\& alias fmt="perl \-i \-MText::Autoformat \-n0777 \e +\& \-e \*(Aqprint autoformat $_, {all=>1}\*(Aq $*" +.Ve +.PP +See the documentation for Text::Autoformat to appreciate its many +capabilities. +.SS "How can I access or change N characters of a string?" +.IX Subsection "How can I access or change N characters of a string?" +You can access the first characters of a string with \fBsubstr()\fR. +To get the first character, for example, start at position 0 +and grab the string of length 1. +.PP +.Vb 2 +\& my $string = "Just another Perl Hacker"; +\& my $first_char = substr( $string, 0, 1 ); # \*(AqJ\*(Aq +.Ve +.PP +To change part of a string, you can use the optional fourth +argument which is the replacement string. +.PP +.Vb 1 +\& substr( $string, 13, 4, "Perl 5.8.0" ); +.Ve +.PP +You can also use \fBsubstr()\fR as an lvalue. +.PP +.Vb 1 +\& substr( $string, 13, 4 ) = "Perl 5.8.0"; +.Ve +.SS "How do I change the Nth occurrence of something?" 
+.IX Subsection "How do I change the Nth occurrence of something?" +You have to keep track of N yourself. For example, let's say you want +to change the fifth occurrence of \f(CW"whoever"\fR or \f(CW"whomever"\fR into +\&\f(CW"whosoever"\fR or \f(CW"whomsoever"\fR, case insensitively. These +all assume that \f(CW$_\fR contains the string to be altered. +.PP +.Vb 6 +\& $count = 0; +\& s{((whom?)ever)}{ +\& ++$count == 5 # is it the 5th? +\& ? "${2}soever" # yes, swap +\& : $1 # renege and leave it there +\& }ige; +.Ve +.PP +In the more general case, you can use the \f(CW\*(C`/g\*(C'\fR modifier in a \f(CW\*(C`while\*(C'\fR +loop, keeping count of matches. +.PP +.Vb 8 +\& $WANT = 3; +\& $count = 0; +\& $_ = "One fish two fish red fish blue fish"; +\& while (/(\ew+)\es+fish\eb/gi) { +\& if (++$count == $WANT) { +\& print "The third fish is a $1 one.\en"; +\& } +\& } +.Ve +.PP +That prints out: \f(CW"The third fish is a red one."\fR You can also use a +repetition count and repeated pattern like this: +.PP +.Vb 1 +\& /(?:\ew+\es+fish\es+){2}(\ew+)\es+fish/i; +.Ve +.SS "How can I count the number of occurrences of a substring within a string?" +.IX Subsection "How can I count the number of occurrences of a substring within a string?" +There are a number of ways, with varying efficiency. If you want a +count of a certain single character (X) within a string, you can use the +\&\f(CW\*(C`tr///\*(C'\fR function like so: +.PP +.Vb 3 +\& my $string = "ThisXlineXhasXsomeXx\*(AqsXinXit"; +\& my $count = ($string =~ tr/X//); +\& print "There are $count X characters in the string"; +.Ve +.PP +This is fine if you are just looking for a single character. However, +if you are trying to count multiple character substrings within a +larger string, \f(CW\*(C`tr///\*(C'\fR won't work. What you can do is wrap a \fBwhile()\fR +loop around a global pattern match. 
For example, let's count negative +integers: +.PP +.Vb 4 +\& my $string = "\-9 55 48 \-2 23 \-76 4 14 \-44"; +\& my $count = 0; +\& while ($string =~ /\-\ed+/g) { $count++ } +\& print "There are $count negative numbers in the string"; +.Ve +.PP +Another version uses a global match in list context, then assigns the +result to a scalar, producing a count of the number of matches. +.PP +.Vb 1 +\& my $count = () = $string =~ /\-\ed+/g; +.Ve +.SS "How do I capitalize all the words on one line?" +.IX Xref "Text::Autoformat capitalize case, title case, sentence" +.IX Subsection "How do I capitalize all the words on one line?" +(contributed by brian d foy) +.PP +Damian Conway's Text::Autoformat handles all of the thinking +for you. +.PP +.Vb 3 +\& use Text::Autoformat; +\& my $x = "Dr. Strangelove or: How I Learned to Stop ". +\& "Worrying and Love the Bomb"; +\& +\& print $x, "\en"; +\& for my $style (qw( sentence title highlight )) { +\& print autoformat($x, { case => $style }), "\en"; +\& } +.Ve +.PP +How do you want to capitalize those words? +.PP +.Vb 3 +\& FRED AND BARNEY\*(AqS LODGE # all uppercase +\& Fred And Barney\*(Aqs Lodge # title case +\& Fred and Barney\*(Aqs Lodge # highlight case +.Ve +.PP +It's not as easy a problem as it looks. How many words do you think +are in there? Wait for it... wait for it.... If you answered 5 +you're right. Perl words are groups of \f(CW\*(C`\ew+\*(C'\fR, but that's not what +you want to capitalize. How is Perl supposed to know not to capitalize +that \f(CW\*(C`s\*(C'\fR after the apostrophe? You could try a regular expression: +.PP +.Vb 6 +\& $string =~ s/ ( +\& (^\ew) #at the beginning of the line +\& | # or +\& (\es\ew) #preceded by whitespace +\& ) +\& /\eU$1/xg; +\& +\& $string =~ s/([\ew\*(Aq]+)/\eu\eL$1/g; +.Ve +.PP +Now, what if you don't want to capitalize that "and"? Just use +Text::Autoformat and get on with the next problem. :) +.SS "How can I split a [character]\-delimited string except when inside [character]?" 
+.IX Subsection "How can I split a [character]-delimited string except when inside [character]?" +Several modules can handle this sort of parsing\-\-Text::Balanced, +Text::CSV, Text::CSV_XS, and Text::ParseWords, among others. +.PP +Take the example case of trying to split a string that is +comma-separated into its different fields. You can't use \f(CW\*(C`split(/,/)\*(C'\fR +because you shouldn't split if the comma is inside quotes. For +example, take a data line like this: +.PP +.Vb 1 +\& SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped" +.Ve +.PP +Due to the restriction of the quotes, this is a fairly complex +problem. Thankfully, we have Jeffrey Friedl, author of +\&\fIMastering Regular Expressions\fR, to handle this for us. He +suggests (assuming your string is contained in \f(CW$text\fR): +.PP +.Vb 7 +\& my @new = (); +\& push(@new, $+) while $text =~ m{ +\& "([^\e"\e\e]*(?:\e\e.[^\e"\e\e]*)*)",? # groups the phrase inside the quotes +\& | ([^,]+),? +\& | , +\& }gx; +\& push(@new, undef) if substr($text,\-1,1) eq \*(Aq,\*(Aq; +.Ve +.PP +If you want to represent quotation marks inside a +quotation-mark-delimited field, escape them with backslashes (eg, +\&\f(CW"like \e"this\e""\fR). +.PP +Alternatively, the Text::ParseWords module (part of the standard +Perl distribution) lets you say: +.PP +.Vb 2 +\& use Text::ParseWords; +\& @new = quotewords(",", 0, $text); +.Ve +.PP +For parsing or generating CSV, though, using Text::CSV rather than +implementing it yourself is highly recommended; you'll save yourself odd bugs +popping up later by just using code which has already been tried and tested in +production for years. +.SS "How do I strip blank space from the beginning/end of a string?" +.IX Subsection "How do I strip blank space from the beginning/end of a string?" +(contributed by brian d foy) +.PP +A substitution can do this for you. For a single line, you want to +replace all the leading or trailing whitespace with nothing.
You +can do that with a pair of substitutions: +.PP +.Vb 2 +\& s/^\es+//; +\& s/\es+$//; +.Ve +.PP +You can also write that as a single substitution, although it turns +out the combined statement is slower than the separate ones. That +might not matter to you, though: +.PP +.Vb 1 +\& s/^\es+|\es+$//g; +.Ve +.PP +In this regular expression, the alternation matches either at the +beginning or the end of the string since the alternation has a lower +precedence than the anchors. With the \f(CW\*(C`/g\*(C'\fR flag, the substitution +makes all possible matches, so it gets both. Remember, the trailing +newline matches the \f(CW\*(C`\es+\*(C'\fR, and the \f(CW\*(C`$\*(C'\fR anchor can match to the +absolute end of the string, so the newline disappears too. Just add +the newline to the output, which has the added benefit of preserving +"blank" (consisting entirely of whitespace) lines which the \f(CW\*(C`^\es+\*(C'\fR +would remove all by itself: +.PP +.Vb 4 +\& while( <> ) { +\& s/^\es+|\es+$//g; +\& print "$_\en"; +\& } +.Ve +.PP +For a multi-line string, you can apply the regular expression to each +logical line in the string by adding the \f(CW\*(C`/m\*(C'\fR flag (for +"multi-line"). With the \f(CW\*(C`/m\*(C'\fR flag, the \f(CW\*(C`$\*(C'\fR matches \fIbefore\fR an +embedded newline, so it doesn't remove it. This pattern still removes +the newline at the end of the string: +.PP +.Vb 1 +\& $string =~ s/^\es+|\es+$//gm; +.Ve +.PP +Remember that lines consisting entirely of whitespace will disappear, +since the first part of the alternation can match the entire string +and replace it with nothing. If you need to keep embedded blank lines, +you have to do a little more work. Instead of matching any whitespace +(since that includes a newline), just match the other whitespace: +.PP +.Vb 1 +\& $string =~ s/^[\et\ef ]+|[\et\ef ]+$//mg; +.Ve +.SS "How do I pad a string with blanks or pad a number with zeroes?"
+.IX Subsection "How do I pad a string with blanks or pad a number with zeroes?" +In the following examples, \f(CW$pad_len\fR is the length to which you wish +to pad the string, \f(CW$text\fR or \f(CW$num\fR contains the string to be padded, +and \f(CW$pad_char\fR contains the padding character. You can use a single +character string constant instead of the \f(CW$pad_char\fR variable if you +know what it is in advance. And in the same way you can use an integer in +place of \f(CW$pad_len\fR if you know the pad length in advance. +.PP +The simplest method uses the \f(CW\*(C`sprintf\*(C'\fR function. It can pad on the left +or right with blanks and on the left with zeroes and it will not +truncate the result. The \f(CW\*(C`pack\*(C'\fR function can only pad strings on the +right with blanks and it will truncate the result to a maximum length of +\&\f(CW$pad_len\fR. +.PP +.Vb 3 +\& # Left padding a string with blanks (no truncation): +\& my $padded = sprintf("%${pad_len}s", $text); +\& my $padded = sprintf("%*s", $pad_len, $text); # same thing +\& +\& # Right padding a string with blanks (no truncation): +\& my $padded = sprintf("%\-${pad_len}s", $text); +\& my $padded = sprintf("%\-*s", $pad_len, $text); # same thing +\& +\& # Left padding a number with 0 (no truncation): +\& my $padded = sprintf("%0${pad_len}d", $num); +\& my $padded = sprintf("%0*d", $pad_len, $num); # same thing +\& +\& # Right padding a string with blanks using pack (will truncate): +\& my $padded = pack("A$pad_len",$text); +.Ve +.PP +If you need to pad with a character other than blank or zero you can use +one of the following methods. They all generate a pad string with the +\&\f(CW\*(C`x\*(C'\fR operator and combine that with \f(CW$text\fR. These methods do +not truncate \f(CW$text\fR. +.PP +Left and right padding with any character, creating a new string: +.PP +.Vb 2 +\& my $padded = $pad_char x ( $pad_len \- length( $text ) ) . $text; +\& my $padded = $text . 
$pad_char x ( $pad_len \- length( $text ) ); +.Ve +.PP +Left and right padding with any character, modifying \f(CW$text\fR directly: +.PP +.Vb 2 +\& substr( $text, 0, 0 ) = $pad_char x ( $pad_len \- length( $text ) ); +\& $text .= $pad_char x ( $pad_len \- length( $text ) ); +.Ve +.SS "How do I extract selected columns from a string?" +.IX Subsection "How do I extract selected columns from a string?" +(contributed by brian d foy) +.PP +If you know the columns that contain the data, you can +use \f(CW\*(C`substr\*(C'\fR to extract a single column. +.PP +.Vb 1 +\& my $column = substr( $line, $start_column, $length ); +.Ve +.PP +You can use \f(CW\*(C`split\*(C'\fR if the columns are separated by whitespace or +some other delimiter, as long as whitespace or the delimiter cannot +appear as part of the data. +.PP +.Vb 3 +\& my $line = \*(Aq fred barney betty \*(Aq; +\& my @columns = split /\es+/, $line; +\& # ( \*(Aq\*(Aq, \*(Aqfred\*(Aq, \*(Aqbarney\*(Aq, \*(Aqbetty\*(Aq ); +\& +\& my $line = \*(Aqfred||barney||betty\*(Aq; +\& my @columns = split /\e|/, $line; +\& # ( \*(Aqfred\*(Aq, \*(Aq\*(Aq, \*(Aqbarney\*(Aq, \*(Aq\*(Aq, \*(Aqbetty\*(Aq ); +.Ve +.PP +If you want to work with comma-separated values, don't do this since +that format is a bit more complicated. Use one of the modules that +handle that format, such as Text::CSV, Text::CSV_XS, or +Text::CSV_PP. +.PP +If you want to break apart an entire line of fixed columns, you can use +\&\f(CW\*(C`unpack\*(C'\fR with the A (ASCII) format. By using a number after the format +specifier, you can denote the column width. See the \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR +entries in perlfunc for more details. +.PP +.Vb 1 +\& my @fields = unpack( "A8 A8 A8 A16 A4", $line ); # template first, then the data +.Ve +.PP +Note that spaces in the format argument to \f(CW\*(C`unpack\*(C'\fR do not denote literal +spaces. If you have space separated data, you may want \f(CW\*(C`split\*(C'\fR instead. +.SS "How do I find the soundex value of a string?"
+.IX Subsection "How do I find the soundex value of a string?" +(contributed by brian d foy) +.PP +You can use the \f(CW\*(C`Text::Soundex\*(C'\fR module. If you want to do fuzzy or close +matching, you might also try the String::Approx, +Text::Metaphone, and Text::DoubleMetaphone modules. +.SS "How can I expand variables in text strings?" +.IX Subsection "How can I expand variables in text strings?" +(contributed by brian d foy) +.PP +If you can avoid it, don't, or if you can use a templating system, +such as Text::Template or Template Toolkit, do that instead. You +might even be able to get the job done with \f(CW\*(C`sprintf\*(C'\fR or \f(CW\*(C`printf\*(C'\fR: +.PP +.Vb 1 +\& my $string = sprintf \*(AqSay hello to %s and %s\*(Aq, $foo, $bar; +.Ve +.PP +However, for the one-off simple case where I don't want to pull out a +full templating system, I'll use a string that has two Perl scalar +variables in it. In this example, I want to expand \f(CW$foo\fR and \f(CW$bar\fR +to their variables' values: +.PP +.Vb 3 +\& my $foo = \*(AqFred\*(Aq; +\& my $bar = \*(AqBarney\*(Aq; +\& $string = \*(AqSay hello to $foo and $bar\*(Aq; +.Ve +.PP +One way I can do this involves the substitution operator and a double +\&\f(CW\*(C`/e\*(C'\fR flag. The first \f(CW\*(C`/e\*(C'\fR evaluates \f(CW$1\fR on the replacement side and +turns it into \f(CW$foo\fR. The second \f(CW\*(C`/e\*(C'\fR starts with \f(CW$foo\fR and replaces +it with its value. \f(CW$foo\fR, then, turns into 'Fred', and that's finally +what's left in the string: +.PP +.Vb 1 +\& $string =~ s/(\e$\ew+)/$1/eeg; # \*(AqSay hello to Fred and Barney\*(Aq +.Ve +.PP +The \f(CW\*(C`/e\*(C'\fR will also silently ignore violations of strict, replacing +undefined variable names with the empty string. Since I'm using the +\&\f(CW\*(C`/e\*(C'\fR flag (twice even!), I have all of the same security problems I +have with \f(CW\*(C`eval\*(C'\fR in its string form.
If there's something odd in +\&\f(CW$foo\fR, perhaps something like \f(CW\*(C`@{[ system "rm \-rf /" ]}\*(C'\fR, then +I could get myself in trouble. +.PP +To get around the security problem, I could also pull the values from +a hash instead of evaluating variable names. Using a single \f(CW\*(C`/e\*(C'\fR, I +can check the hash to ensure the value exists, and if it doesn't, I +can replace the missing value with a marker, in this case \f(CW\*(C`???\*(C'\fR to +signal that I missed something: +.PP +.Vb 1 +\& my $string = \*(AqThis has $foo and $bar\*(Aq; +\& +\& my %Replacements = ( +\& foo => \*(AqFred\*(Aq, +\& ); +\& +\& # $string =~ s/\e$(\ew+)/$Replacements{$1}/g; +\& $string =~ s/\e$(\ew+)/ +\& exists $Replacements{$1} ? $Replacements{$1} : \*(Aq???\*(Aq +\& /eg; +\& +\& print $string; +.Ve +.SS "Does Perl have anything like Ruby's #{} or Python's f string?" +.IX Subsection "Does Perl have anything like Ruby's #{} or Python's f string?" +Unlike the others, Perl allows you to embed a variable naked in a double +quoted string, e.g. \f(CW"variable $variable"\fR. When there isn't whitespace or +other non-word characters following the variable name, you can add braces +(e.g. \f(CW"foo ${foo}bar"\fR) to ensure correct parsing. +.PP +An array can also be embedded directly in a string, and will be expanded +by default with spaces between the elements. The default +LIST_SEPARATOR can be changed by assigning a +different string to the special variable \f(CW$"\fR, such as \f(CW\*(C`local $" = \*(Aq, \*(Aq;\*(C'\fR. +.PP +Perl also supports references within a string providing the equivalent of +the features in the other two languages. +.PP +\&\f(CW\*(C`${\e ... }\*(C'\fR embedded within a string will work for most simple statements +such as an object\->method call. More complex code can be wrapped in a do +block \f(CW\*(C`${\e do{...} }\*(C'\fR. +.PP +When you want a list to be expanded per \f(CW$"\fR, use \f(CW\*(C`@{[ ... ]}\*(C'\fR. 
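The two constructs just described can be sketched without any extra modules. This is only a minimal illustration: the helper \f(CW\*(C`shout\*(C'\fR and the list \f(CW@names\fR are hypothetical names invented for the example, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper, used only for illustration.
sub shout { return uc shift }

# Hypothetical sample data.
my @names = qw( wilma fred betty );

# ${\ ... } takes a reference to the value of an expression and
# dereferences it, so the value is interpolated into the string.
my $greeting = "Hello, ${\ shout('fred') }!";

# @{[ ... ]} builds an anonymous array and dereferences it; the
# elements are joined with the current value of $".
my $sorted = do {
    local $" = ', ';    # change the list separator only inside this block
    "Sorted: @{[ sort @names ]}";
};

print "$greeting\n";    # Hello, FRED!
print "$sorted\n";      # Sorted: betty, fred, wilma
```

Localizing \f(CW$"\fR inside a block keeps the changed separator from leaking into the rest of the program.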
+.PP +.Vb 4 +\& use feature \*(Aqsay\*(Aq; +\& use Time::Piece; +\& use Time::Seconds; +\& my $scalar = \*(AqSTRING\*(Aq; +\& my @array = ( \*(Aqzorro\*(Aq, \*(Aqa\*(Aq, 1, \*(AqB\*(Aq, 3 ); +\& +\& # Print the current date and time and then tomorrow +\& my $t = Time::Piece\->new; +\& say "Now is: ${\e $t\->cdate() }"; +\& say "Tomorrow: ${\e do{ my $T=Time::Piece\->new + ONE_DAY ; $T\->fullday }}"; +\& +\& # some variables in strings +\& say "This is some scalar I have $scalar, this is an array @array."; +\& say "You can also write it like this ${scalar} @{array}."; +\& +\& # Change the $LIST_SEPARATOR +\& local $" = \*(Aq:\*(Aq; +\& say "Set \e$\e" to delimit with \*(Aq:\*(Aq and sort the Array @{[ sort @array ]}"; +.Ve +.PP +You may also want to look at the module +Quote::Code, and templating tools such as Template::Toolkit and +Mojo::Template. +.PP +See also: "How can I expand variables in text strings?" and +"How do I expand function calls in a string?" in this FAQ. +.SS "What's wrong with always quoting ""$vars""?" +.IX Subsection "What's wrong with always quoting ""$vars""?" +The problem is that those double-quotes force +stringification\-\-coercing numbers and references into strings\-\-even +when you don't want them to be strings. Think of it this way: +double-quote expansion is used to produce new strings. If you already +have a string, why do you need more? +.PP +If you get used to writing odd things like these: +.PP +.Vb 3 +\& print "$var"; # BAD +\& my $new = "$old"; # BAD +\& somefunc("$var"); # BAD +.Ve +.PP +You'll be in trouble.
Those should (in 99.8% of the cases) be +the simpler and more direct: +.PP +.Vb 3 +\& print $var; +\& my $new = $old; +\& somefunc($var); +.Ve +.PP +Otherwise, besides slowing you down, you're going to break code when +the thing in the scalar is actually neither a string nor a number, but +a reference: +.PP +.Vb 5 +\& func(\e@array); +\& sub func { +\& my $aref = shift; +\& my $oref = "$aref"; # WRONG +\& } +.Ve +.PP +You can also get into subtle problems on those few operations in Perl +that actually do care about the difference between a string and a +number, such as the magical \f(CW\*(C`++\*(C'\fR autoincrement operator or the +\&\fBsyscall()\fR function. +.PP +Stringification also destroys arrays. +.PP +.Vb 3 +\& my @lines = \`command\`; +\& print "@lines"; # WRONG \- extra blanks +\& print @lines; # right +.Ve +.SS "Why don't my <<HERE documents work?" +.IX Subsection "Why don't my <<HERE documents work?" +Here documents are described in perlop. Check for these four things: +.IP "There must be no space after the << part." 4 +.IX Item "There must be no space after the << part." +.PD 0 +.IP "There (probably) should be a semicolon at the end of the opening token" 4 +.IX Item "There (probably) should be a semicolon at the end of the opening token" +.IP "You can't (easily) have any space in front of the tag." 4 +.IX Item "You can't (easily) have any space in front of the tag." +.IP "There needs to be at least a line separator after the end token." 4 +.IX Item "There needs to be at least a line separator after the end token." +.PD +.PP +If you want to indent the text in the here document, you +can do this: +.PP +.Vb 5 +\& # all in one +\& (my $VAR = <<HERE_TARGET) =~ s/^\es+//gm; +\& your text +\& goes here +\& HERE_TARGET +.Ve +.PP +But the HERE_TARGET must still be flush against the margin. +If you want that indented also, you'll have to quote +in the indentation.
+.PP +.Vb 7 +\& (my $quote = <<\*(Aq FINIS\*(Aq) =~ s/^\es+//gm; +\& ...we will have peace, when you and all your works have +\& perished\-\-and the works of your dark master to whom you +\& would deliver us. You are a liar, Saruman, and a corrupter +\& of men\*(Aqs hearts. \-\-Theoden in /usr/src/perl/taint.c +\& FINIS +\& $quote =~ s/\es+\-\-/\en\-\-/; +.Ve +.PP +A nice general-purpose fixer-upper function for indented here documents +follows. It expects to be called with a here document as its argument. +It looks to see whether each line begins with a common substring, and +if so, strips that substring off. Otherwise, it takes the amount of leading +whitespace found on the first line and removes that much off each +subsequent line. +.PP +.Vb 11 +\& sub fix { +\& local $_ = shift; +\& my ($white, $leader); # common whitespace and common leading string +\& if (/^\es*(?:([^\ew\es]+)(\es*).*\en)(?:\es*\eg1\eg2?.*\en)+$/) { +\& ($white, $leader) = ($2, quotemeta($1)); +\& } else { +\& ($white, $leader) = (/^(\es+)/, \*(Aq\*(Aq); +\& } +\& s/^\es*?$leader(?:$white)?//gm; +\& return $_; +\& } +.Ve +.PP +This works with leading special strings, dynamically determined: +.PP +.Vb 10 +\& my $remember_the_main = fix<<\*(Aq MAIN_INTERPRETER_LOOP\*(Aq; +\& @@@ int +\& @@@ runops() { +\& @@@ SAVEI32(runlevel); +\& @@@ runlevel++; +\& @@@ while ( op = (*op\->op_ppaddr)() ); +\& @@@ TAINT_NOT; +\& @@@ return 0; +\& @@@ } +\& MAIN_INTERPRETER_LOOP +.Ve +.PP +Or with a fixed amount of leading whitespace, with remaining +indentation correctly preserved: +.PP +.Vb 9 +\& my $poem = fix<<EVER_ON_AND_ON; +\& Now far ahead the Road has gone, +\& And I must follow, if I can, +\& Pursuing it with eager feet, +\& Until it joins some larger way +\& Where many paths and errands meet. +\& And whither then? I cannot say. 
+\& \-\-Bilbo in /usr/src/perl/pp_ctl.c +\& EVER_ON_AND_ON +.Ve +.PP +Beginning with Perl version 5.26, a much simpler and cleaner way to +write indented here documents has been added to the language: the +tilde (~) modifier. See "Indented Here-docs" in perlop for details. +.SH "Data: Arrays" +.IX Header "Data: Arrays" +.SS "What is the difference between a list and an array?" +.IX Subsection "What is the difference between a list and an array?" +(contributed by brian d foy) +.PP +A list is a fixed collection of scalars. An array is a variable that +holds a variable collection of scalars. An array can supply its collection +for list operations, so list operations also work on arrays: +.PP +.Vb 3 +\& # slices +\& ( \*(Aqdog\*(Aq, \*(Aqcat\*(Aq, \*(Aqbird\*(Aq )[2,3]; +\& @animals[2,3]; +\& +\& # iteration +\& foreach ( qw( dog cat bird ) ) { ... } +\& foreach ( @animals ) { ... } +\& +\& my @three = grep { length == 3 } qw( dog cat bird ); +\& my @three = grep { length == 3 } @animals; +\& +\& # supply an argument list +\& wash_animals( qw( dog cat bird ) ); +\& wash_animals( @animals ); +.Ve +.PP +Array operations, which change the scalars, rearrange them, or add +or subtract some scalars, only work on arrays. These can't work on a +list, which is fixed. Array operations include \f(CW\*(C`shift\*(C'\fR, \f(CW\*(C`unshift\*(C'\fR, +\&\f(CW\*(C`push\*(C'\fR, \f(CW\*(C`pop\*(C'\fR, and \f(CW\*(C`splice\*(C'\fR. +.PP +An array can also change its length: +.PP +.Vb 2 +\& $#animals = 1; # truncate to two elements +\& $#animals = 10000; # pre\-extend to 10,001 elements +.Ve +.PP +You can change an array element, but you can't change a list element: +.PP +.Vb 2 +\& $animals[0] = \*(AqRottweiler\*(Aq; +\& qw( dog cat bird )[0] = \*(AqRottweiler\*(Aq; # syntax error! +\& +\& foreach ( @animals ) { +\& s/^d/fr/; # works fine +\& } +\& +\& foreach ( qw( dog cat bird ) ) { +\& s/^d/fr/; # Error! Modification of read only value! 
+\& } +.Ve +.PP +However, if the list element is itself a variable, it appears that you +can change a list element. The list element, though, is the variable, not +the data. You're not changing the list element, but something the list +element refers to. The list element itself doesn't change: it's still +the same variable. +.PP +You also have to be careful about context. You can assign an array to +a scalar to get the number of elements in the array. This only works +for arrays, though: +.PP +.Vb 1 +\& my $count = @animals; # only works with arrays +.Ve +.PP +If you try to do the same thing with what you think is a list, you +get a quite different result. Although it looks like you have a list +on the righthand side, Perl actually sees a bunch of scalars separated +by a comma: +.PP +.Vb 1 +\& my $scalar = ( \*(Aqdog\*(Aq, \*(Aqcat\*(Aq, \*(Aqbird\*(Aq ); # $scalar gets bird +.Ve +.PP +Since you're assigning to a scalar, the righthand side is in scalar +context. The comma operator (yes, it's an operator!) in scalar +context evaluates its lefthand side, throws away the result, and +evaluates its righthand side and returns the result. In effect, +that list-lookalike assigns to \f(CW$scalar\fR its rightmost value. Many +people mess this up because they choose a list-lookalike whose +last element is also the count they expect: +.PP +.Vb 1 +\& my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally +.Ve +.ie n .SS "What is the difference between $array[1] and @array[1]?" +.el .SS "What is the difference between \f(CW$array\fP[1] and \f(CW@array\fP[1]?" +.IX Subsection "What is the difference between $array[1] and @array[1]?" +(contributed by brian d foy) +.PP +The difference is the sigil, that special character in front of the +array name. The \f(CW\*(C`$\*(C'\fR sigil means "exactly one item", while the \f(CW\*(C`@\*(C'\fR +sigil means "zero or more items". The \f(CW\*(C`$\*(C'\fR gets you a single scalar, +while the \f(CW\*(C`@\*(C'\fR gets you a list.
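As a minimal sketch of the sigil rule, the snippet below reads the same array once as a single element and once as a slice. The sample data in \f(CW@array\fR is hypothetical and chosen only so the result of each access is easy to see.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @array = qw( zero one two three four );

# The $ sigil asks for exactly one item: the element at index 1.
my $element = $array[1];

# The @ sigil asks for a list: here a slice of four elements,
# returned in the order the indices are given.
my @slice = @array[1, 4, 3, 0];

print "element: $element\n";    # element: one
print "slice: @slice\n";        # slice: one four three zero
```

In both cases the variable being accessed is the array \f(CW@array\fR; only the amount of data requested differs.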
+.PP +The confusion arises because people incorrectly assume that the sigil +denotes the variable type. +.PP +The \f(CW$array[1]\fR is a single-element access to the array. It's going +to return the item in index 1 (or undef if there is no item there). +If you intend to get exactly one element from the array, this is the +form you should use. +.PP +The \f(CW@array[1]\fR is an array slice, although it has only one index. +You can pull out multiple elements simultaneously by specifying +additional indices as a list, like \f(CW@array[1,4,3,0]\fR. +.PP +Using a slice on the lefthand side of the assignment supplies list +context to the righthand side. This can lead to unexpected results. +For instance, if you want to read a single line from a filehandle, +assigning to a scalar value is fine: +.PP +.Vb 1 +\& $array[1] = <STDIN>; +.Ve +.PP +However, in list context, the line input operator returns all of the +lines as a list. The first line goes into \f(CW@array[1]\fR and the rest +of the lines mysteriously disappear: +.PP +.Vb 1 +\& @array[1] = <STDIN>; # most likely not what you want +.Ve +.PP +Either the \f(CW\*(C`use warnings\*(C'\fR pragma or the \fB\-w\fR flag will warn you when +you use an array slice with a single index. +.SS "How can I remove duplicate elements from a list or array?" +.IX Subsection "How can I remove duplicate elements from a list or array?" +(contributed by brian d foy) +.PP +Use a hash. When you think the words "unique" or "duplicated", think +"hash keys". +.PP +If you don't care about the order of the elements, you could just +create the hash then extract the keys. It's not important how you +create that hash: just that you use \f(CW\*(C`keys\*(C'\fR to get the unique +elements. 
+.PP +.Vb 3 +\& my %hash = map { $_, 1 } @array; +\& # or a hash slice: @hash{ @array } = (); +\& # or a foreach: $hash{$_} = 1 foreach ( @array ); +\& +\& my @unique = keys %hash; +.Ve +.PP +If you want to use a module, try the \f(CW\*(C`uniq\*(C'\fR function from +List::MoreUtils. In list context it returns the unique elements, +preserving their order in the list. In scalar context, it returns the +number of unique elements. +.PP +.Vb 1 +\& use List::MoreUtils qw(uniq); +\& +\& my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7 +\& my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7 +.Ve +.PP +You can also go through each element and skip the ones you've seen +before. Use a hash to keep track. The first time the loop sees an +element, that element has no key in \f(CW%seen\fR. The \f(CW\*(C`next\*(C'\fR statement +creates the key and immediately uses its value, which is \f(CW\*(C`undef\*(C'\fR, so +the loop continues to the \f(CW\*(C`push\*(C'\fR and increments the value for that +key. The next time the loop sees that same element, its key exists in +the hash \fIand\fR the value for that key is true (since it's not 0 or +\&\f(CW\*(C`undef\*(C'\fR), so the \f(CW\*(C`next\*(C'\fR skips that iteration and the loop goes to the +next element. +.PP +.Vb 2 +\& my @unique = (); +\& my %seen = (); +\& +\& foreach my $elem ( @array ) { +\& next if $seen{ $elem }++; +\& push @unique, $elem; +\& } +.Ve +.PP +You can write this more briefly using a grep, which does the +same thing. +.PP +.Vb 2 +\& my %seen = (); +\& my @unique = grep { ! $seen{ $_ }++ } @array; +.Ve +.SS "How can I tell whether a certain element is contained in a list or array?" +.IX Subsection "How can I tell whether a certain element is contained in a list or array?" +(portions of this answer contributed by Anno Siegel and brian d foy) +.PP +Hearing the word "in" is an \fIin\fRdication that you probably should have +used a hash, not a list or array, to store your data.
Hashes are +designed to answer this question quickly and efficiently. Arrays aren't. +.PP +That being said, there are several ways to approach this. If you +are going to make this query many times over arbitrary string values, +the fastest way is probably to invert the original array and maintain a +hash whose keys are the first array's values: +.PP +.Vb 3 +\& my @blues = qw/azure cerulean teal turquoise lapis\-lazuli/; +\& my %is_blue = (); +\& for (@blues) { $is_blue{$_} = 1 } +.Ve +.PP +Now you can check whether \f(CW$is_blue{$some_color}\fR is true. It might have +been a good idea to keep the blues all in a hash in the first place. +.PP +If the values are all small integers, you could use a simple indexed +array. This kind of an array will take up less space: +.PP +.Vb 4 +\& my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31); +\& my @is_tiny_prime = (); +\& for (@primes) { $is_tiny_prime[$_] = 1 } +\& # or simply @is_tiny_prime[@primes] = (1) x @primes; +.Ve +.PP +Now you can check whether \f(CW$is_tiny_prime[$some_number]\fR is true. +.PP +If the values in question are integers instead of strings, you can save +quite a lot of space by using bit strings instead: +.PP +.Vb 3 +\& my @articles = ( 1..10, 150..2000, 2017 ); +\& my $read = \*(Aq\*(Aq; +\& for (@articles) { vec($read,$_,1) = 1 } +.Ve +.PP +Now check whether \f(CW\*(C`vec($read,$n,1)\*(C'\fR is true for some \f(CW$n\fR. +.PP +These methods guarantee fast individual tests but require a re-organization +of the original list or array. They only pay off if you have to test +multiple values against the same array. +.PP +If you are testing only once, the standard module List::Util exports +the function \f(CW\*(C`any\*(C'\fR for this purpose. It works by stopping once it +finds the element.
It's written in C for speed, and its Perl equivalent +looks like this subroutine: +.PP +.Vb 7 +\& sub any (&@) { +\& my $code = shift; +\& foreach (@_) { +\& return 1 if $code\->(); +\& } +\& return 0; +\& } +.Ve +.PP +If speed is of little concern, the common idiom uses grep in scalar context +(which returns the number of items that passed its condition) to traverse the +entire list. This does have the benefit of telling you how many matches it +found, though. +.PP +.Vb 1 +\& my $is_there = grep $_ eq $whatever, @array; +.Ve +.PP +If you want to actually extract the matching elements, simply use grep in +list context. +.PP +.Vb 1 +\& my @matches = grep $_ eq $whatever, @array; +.Ve +.SS "How do I compute the difference of two arrays? How do I compute the intersection of two arrays?" +.IX Subsection "How do I compute the difference of two arrays? How do I compute the intersection of two arrays?" +Use a hash. Here's code to do both and more. It assumes that each +element is unique in a given array: +.PP +.Vb 7 +\& my (@union, @intersection, @difference); +\& my %count = (); +\& foreach my $element (@array1, @array2) { $count{$element}++ } +\& foreach my $element (keys %count) { +\& push @union, $element; +\& push @{ $count{$element} > 1 ? \e@intersection : \e@difference }, $element; +\& } +.Ve +.PP +Note that this is the \fIsymmetric difference\fR, that is, all elements +in either A or in B but not in both. Think of it as an xor operation. +.SS "How do I test whether two arrays or hashes are equal?" +.IX Subsection "How do I test whether two arrays or hashes are equal?" +The following code works for single-level arrays. It uses a +stringwise comparison, and does not distinguish defined versus +undefined empty strings. Modify if you have other needs. 
+.PP +.Vb 1 +\& my $are_equal = compare_arrays(\e@frogs, \e@toads); +\& +\& sub compare_arrays { +\& my ($first, $second) = @_; +\& no warnings; # silence spurious \-w undef complaints +\& return 0 unless @$first == @$second; +\& for (my $i = 0; $i < @$first; $i++) { +\& return 0 if $first\->[$i] ne $second\->[$i]; +\& } +\& return 1; +\& } +.Ve +.PP +For multilevel structures, you may wish to use an approach more +like this one. It uses the CPAN module FreezeThaw: +.PP +.Vb 2 +\& use FreezeThaw qw(cmpStr); +\& my @a = my @b = ( "this", "that", [ "more", "stuff" ] ); +\& +\& printf "a and b contain %s arrays\en", +\& cmpStr(\e@a, \e@b) == 0 +\& ? "the same" +\& : "different"; +.Ve +.PP +This approach also works for comparing hashes. Here we'll demonstrate +two different answers: +.PP +.Vb 1 +\& use FreezeThaw qw(cmpStr cmpStrHard); +\& +\& my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] ); +\& $a{EXTRA} = \e%b; +\& $b{EXTRA} = \e%a; +\& +\& printf "a and b contain %s hashes\en", +\& cmpStr(\e%a, \e%b) == 0 ? "the same" : "different"; +\& +\& printf "a and b contain %s hashes\en", +\& cmpStrHard(\e%a, \e%b) == 0 ? "the same" : "different"; +.Ve +.PP +The first reports that both of the hashes contain the same data, +while the second reports that they do not. Which you prefer is left as +an exercise to the reader. +.SS "How do I find the first array element for which a condition is true?" +.IX Subsection "How do I find the first array element for which a condition is true?" +To find the first array element which satisfies a condition, you can +use the \f(CWfirst()\fR function in the List::Util module, which comes +with Perl 5.8. This example finds the first element that contains +"Perl". +.PP +.Vb 1 +\& use List::Util qw(first); +\& +\& my $element = first { /Perl/ } @array; +.Ve +.PP +If you cannot use List::Util, you can make your own loop to do the +same thing. Once you find the element, you stop the loop with last.
+.PP +.Vb 4 +\& my $found; +\& foreach ( @array ) { +\& if( /Perl/ ) { $found = $_; last } +\& } +.Ve +.PP +If you want the array index, use the \f(CWfirstidx()\fR function from +\&\f(CW\*(C`List::MoreUtils\*(C'\fR: +.PP +.Vb 2 +\& use List::MoreUtils qw(firstidx); +\& my $index = firstidx { /Perl/ } @array; +.Ve +.PP +Or write it yourself, iterating through the indices +and checking the array element at each index until you find one +that satisfies the condition: +.PP +.Vb 8 +\& my( $found, $index ) = ( undef, \-1 ); +\& for( my $i = 0; $i < @array; $i++ ) { +\& if( $array[$i] =~ /Perl/ ) { +\& $found = $array[$i]; +\& $index = $i; +\& last; +\& } +\& } +.Ve +.SS "How do I handle linked lists?" +.IX Subsection "How do I handle linked lists?" +(contributed by brian d foy) +.PP +Perl's arrays do not have a fixed size, so you don't need linked lists +if you just want to add or remove items. You can use array operations +such as \f(CW\*(C`push\*(C'\fR, \f(CW\*(C`pop\*(C'\fR, \f(CW\*(C`shift\*(C'\fR, \f(CW\*(C`unshift\*(C'\fR, or \f(CW\*(C`splice\*(C'\fR to do +that. +.PP +Sometimes, however, linked lists can be useful in situations where you +want to "shard" an array so you have many small arrays instead of +a single big array. You can keep arrays longer than Perl's largest +array index, lock smaller arrays separately in threaded programs, +reallocate less memory, or quickly insert elements in the middle of +the chain. +.PP +Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly +Linked Lists" ( <http://www.slideshare.net/lembark/perly\-linked\-lists> ), +although you can just use his LinkedList::Single module. +.SS "How do I handle circular lists?" +.IX Xref "circular array Tie::Cycle Array::Iterator::Circular cycle modulus" +.IX Subsection "How do I handle circular lists?"
+(contributed by brian d foy) +.PP +If you want to cycle through an array endlessly, you can increment the +index modulo the number of elements in the array: +.PP +.Vb 2 +\& my @array = qw( a b c ); +\& my $i = 0; +\& +\& while( 1 ) { +\& print $array[ $i++ % @array ], "\en"; +\& last if $i > 20; +\& } +.Ve +.PP +You can also use Tie::Cycle to use a scalar that always has the +next element of the circular array: +.PP +.Vb 1 +\& use Tie::Cycle; +\& +\& tie my $cycle, \*(AqTie::Cycle\*(Aq, [ qw( FFFFFF 000000 FFFF00 ) ]; +\& +\& print $cycle; # FFFFFF +\& print $cycle; # 000000 +\& print $cycle; # FFFF00 +.Ve +.PP +The Array::Iterator::Circular creates an iterator object for +circular arrays: +.PP +.Vb 1 +\& use Array::Iterator::Circular; +\& +\& my $color_iterator = Array::Iterator::Circular\->new( +\& qw(red green blue orange) +\& ); +\& +\& foreach ( 1 .. 20 ) { +\& print $color_iterator\->next, "\en"; +\& } +.Ve +.SS "How do I shuffle an array randomly?" +.IX Subsection "How do I shuffle an array randomly?" +If you either have Perl 5.8.0 or later installed, or if you have +Scalar-List-Utils 1.03 or later installed, you can say: +.PP +.Vb 1 +\& use List::Util \*(Aqshuffle\*(Aq; +\& +\& @shuffled = shuffle(@list); +.Ve +.PP +If not, you can use a Fisher-Yates shuffle. +.PP +.Vb 3 +\& sub fisher_yates_shuffle { +\& my $deck = shift; # $deck is a reference to an array +\& return unless @$deck; # must not be empty! +\& +\& my $i = @$deck; +\& while (\-\-$i) { +\& my $j = int rand ($i+1); +\& @$deck[$i,$j] = @$deck[$j,$i]; +\& } +\& } +\& +\& # shuffle my mpeg collection +\& # +\& my @mpeg = <audio/*/*.mp3>; +\& fisher_yates_shuffle( \e@mpeg ); # randomize @mpeg in place +\& print @mpeg; +.Ve +.PP +Note that the above implementation shuffles an array in place, +unlike the \f(CWList::Util::shuffle()\fR which takes a list and returns +a new shuffled list. 
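The difference between the two interfaces can be seen side by side in a short sketch. It repeats the fisher_yates_shuffle subroutine shown above so that it runs on its own; the array name @deck and the 1..10 demo values are illustrative assumptions, not part of the FAQ.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# In-place Fisher-Yates shuffle, same logic as the subroutine above.
sub fisher_yates_shuffle {
    my $deck = shift;              # $deck is a reference to an array
    return unless @$deck;          # nothing to do for an empty array
    my $i = @$deck;
    while (--$i) {
        my $j = int rand($i + 1);  # random index in 0 .. $i
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}

my @deck = (1 .. 10);              # demo data (assumption)

# shuffle() takes a list and returns a new list; @deck is untouched here.
my @copy = shuffle(@deck);

# fisher_yates_shuffle() takes a reference and reorders @deck itself.
fisher_yates_shuffle(\@deck);

print "copy: @copy\n";
print "deck: @deck\n";
```

Both results contain the same ten elements; only the calling convention differs, so pick the in-place version when copying a large array would be wasteful.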
+.PP +You've probably seen shuffling algorithms that work using splice, +randomly picking another element to swap the current element with: +.PP +.Vb 6 +\& srand; +\& my @new = (); +\& my @old = 1 .. 10; # just a demo +\& while (@old) { +\& push(@new, splice(@old, rand @old, 1)); +\& } +.Ve +.PP +This is bad because splice is already O(N), and since you do it N +times, you just invented a quadratic algorithm; that is, O(N**2). +This does not scale, although Perl is so efficient that you probably +won't notice this until you have rather largish arrays. +.SS "How do I process/modify each element of an array?" +.IX Subsection "How do I process/modify each element of an array?" +Use \f(CW\*(C`for\*(C'\fR/\f(CW\*(C`foreach\*(C'\fR: +.PP +.Vb 4 +\& for (@lines) { +\& s/foo/bar/; # change that word +\& tr/XZ/ZX/; # swap those letters +\& } +.Ve +.PP +Here's another; let's compute spherical volumes: +.PP +.Vb 5 +\& my @volumes = @radii; +\& for (@volumes) { # @volumes has changed parts +\& $_ **= 3; +\& $_ *= (4/3) * 3.14159; # this will be constant folded +\& } +.Ve +.PP +which can also be done with \f(CWmap()\fR, which is made to transform +one list into another: +.PP +.Vb 1 +\& my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii; +.Ve +.PP +If you want to do the same thing to modify the values of a +hash, you can use the \f(CW\*(C`values\*(C'\fR function. As of Perl 5.6 +the values are not copied, so if you modify \f(CW$orbit\fR (in this +case), you modify the value. +.PP +.Vb 3 +\& for my $orbit ( values %orbits ) { +\& ($orbit **= 3) *= (4/3) * 3.14159; +\& } +.Ve +.PP +Prior to perl 5.6 \f(CW\*(C`values\*(C'\fR returned copies of the values, +so older perl code often contains constructions such as +\&\f(CW@orbits{keys %orbits}\fR instead of \f(CW\*(C`values %orbits\*(C'\fR where +the hash is to be modified. +.SS "How do I select a random element from an array?" +.IX Subsection "How do I select a random element from an array?"
+Use the \f(CWrand()\fR function (see "rand" in perlfunc): +.PP +.Vb 2 +\& my $index = rand @array; +\& my $element = $array[$index]; +.Ve +.PP +Or, simply: +.PP +.Vb 1 +\& my $element = $array[ rand @array ]; +.Ve +.SS "How do I permute N elements of a list?" +.IX Xref "List::Permutor permute Algorithm::Loops Knuth The Art of Computer Programming Fischer-Krause" +.IX Subsection "How do I permute N elements of a list?" +Use the List::Permutor module on CPAN. If the list is actually an +array, try the Algorithm::Permute module (also on CPAN). It's +written in XS code and is very efficient: +.PP +.Vb 1 +\& use Algorithm::Permute; +\& +\& my @array = \*(Aqa\*(Aq..\*(Aqd\*(Aq; +\& my $p_iterator = Algorithm::Permute\->new ( \e@array ); +\& +\& while (my @perm = $p_iterator\->next) { +\& print "next permutation: (@perm)\en"; +\& } +.Ve +.PP +For even faster execution, you could do: +.PP +.Vb 1 +\& use Algorithm::Permute; +\& +\& my @array = \*(Aqa\*(Aq..\*(Aqd\*(Aq; +\& +\& Algorithm::Permute::permute { +\& print "next permutation: (@array)\en"; +\& } @array; +.Ve +.PP +Here's a little program that generates all permutations of all the +words on each line of input. 
The algorithm embodied in the +\&\f(CWpermute()\fR function is discussed in Volume 4 (still unpublished) of +Knuth's \fIThe Art of Computer Programming\fR and will work on any list: +.PP +.Vb 2 +\& #!/usr/bin/perl \-n +\& # Fischer\-Krause ordered permutation generator +\& +\& sub permute (&@) { +\& my $code = shift; +\& my @idx = 0..$#_; +\& while ( $code\->(@_[@idx]) ) { +\& my $p = $#idx; +\& \-\-$p while $idx[$p\-1] > $idx[$p]; +\& my $q = $p or return; +\& push @idx, reverse splice @idx, $p; +\& ++$q while $idx[$p\-1] > $idx[$q]; +\& @idx[$p\-1,$q]=@idx[$q,$p\-1]; +\& } +\& } +\& +\& permute { print "@_\en" } split; +.Ve +.PP +The Algorithm::Loops module also provides the \f(CW\*(C`NextPermute\*(C'\fR and +\&\f(CW\*(C`NextPermuteNum\*(C'\fR functions which efficiently find all unique permutations +of an array, even if it contains duplicate values, modifying it in-place: +if its elements are in reverse-sorted order then the array is reversed, +making it sorted, and it returns false; otherwise the next +permutation is returned. +.PP +\&\f(CW\*(C`NextPermute\*(C'\fR uses string order and \f(CW\*(C`NextPermuteNum\*(C'\fR numeric order, so +you can enumerate all the permutations of \f(CW0..9\fR like this: +.PP +.Vb 1 +\& use Algorithm::Loops qw(NextPermuteNum); +\& +\& my @list= 0..9; +\& do { print "@list\en" } while NextPermuteNum @list; +.Ve +.SS "How do I sort an array by (anything)?" +.IX Subsection "How do I sort an array by (anything)?" +Supply a comparison function to \fBsort()\fR (described in "sort" in perlfunc): +.PP +.Vb 1 +\& @list = sort { $a <=> $b } @list; +.Ve +.PP +The default sort function is cmp, string comparison, which would +sort \f(CW\*(C`(1, 2, 10)\*(C'\fR into \f(CW\*(C`(1, 10, 2)\*(C'\fR. \f(CW\*(C`<=>\*(C'\fR, used above, is +the numerical comparison operator. +.PP +If you have a complicated function needed to pull out the part you +want to sort on, then don't do it inside the sort function. 
Pull it +out first, because the sort BLOCK can be called many times for the +same element. Here's an example of how to pull out the first word +after the first number on each item, and then sort those words +case-insensitively. +.PP +.Vb 7 +\& my @idx; +\& for (@data) { +\& my $item; +\& ($item) = /\ed+\es*(\eS+)/; +\& push @idx, uc($item); +\& } +\& my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ]; +.Ve +.PP +which could also be written this way, using a trick +that's come to be known as the Schwartzian Transform: +.PP +.Vb 3 +\& my @sorted = map { $_\->[0] } +\& sort { $a\->[1] cmp $b\->[1] } +\& map { [ $_, uc( (/\ed+\es*(\eS+)/)[0]) ] } @data; +.Ve +.PP +If you need to sort on several fields, the following paradigm is useful. +.PP +.Vb 5 +\& my @sorted = sort { +\& field1($a) <=> field1($b) || +\& field2($a) cmp field2($b) || +\& field3($a) cmp field3($b) +\& } @data; +.Ve +.PP +This can be conveniently combined with precalculation of keys as given +above. +.PP +See the \fIsort\fR article in the "Far More Than You Ever Wanted +To Know" collection in <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for +more about this approach. +.PP +See also the question later in perlfaq4 on sorting hashes. +.SS "How do I manipulate arrays of bits?" +.IX Subsection "How do I manipulate arrays of bits?" +Use \f(CWpack()\fR and \f(CWunpack()\fR, or else \f(CWvec()\fR and the bitwise +operations. +.PP +For example, you don't have to store individual bits in an array +(which would mean that you're wasting a lot of space). To convert an +array of bits to a string, use \f(CWvec()\fR to set the right bits. This +sets \f(CW$vec\fR to have bit N set only if \f(CW$ints[N]\fR was set: +.PP +.Vb 5 +\& my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... ) +\& my $vec = \*(Aq\*(Aq; +\& foreach( 0 .. $#ints ) { +\& vec($vec,$_,1) = 1 if $ints[$_]; +\& } +.Ve +.PP +The string \f(CW$vec\fR only takes up as many bits as it needs. 
For +instance, if you had 16 entries in \f(CW@ints\fR, \f(CW$vec\fR only needs two +bytes to store them (not counting the scalar variable overhead). +.PP +Here's how, given a vector in \f(CW$vec\fR, you can get those bits into +your \f(CW@ints\fR array: +.PP +.Vb 7 +\& sub bitvec_to_list { +\& my $vec = shift; +\& my @ints; +\& # Find null\-byte density then select best algorithm +\& if ($vec =~ tr/\e0// / length $vec > 0.95) { +\& use integer; +\& my $i; +\& +\& # This method is faster with mostly null\-bytes +\& while($vec =~ /[^\e0]/g ) { +\& $i = \-9 + 8 * pos $vec; +\& push @ints, $i if vec($vec, ++$i, 1); +\& push @ints, $i if vec($vec, ++$i, 1); +\& push @ints, $i if vec($vec, ++$i, 1); +\& push @ints, $i if vec($vec, ++$i, 1); +\& push @ints, $i if vec($vec, ++$i, 1); +\& push @ints, $i if vec($vec, ++$i, 1); +\& push @ints, $i if vec($vec, ++$i, 1); +\& push @ints, $i if vec($vec, ++$i, 1); +\& } +\& } +\& else { +\& # This method is a fast general algorithm +\& use integer; +\& my $bits = unpack "b*", $vec; +\& push @ints, 0 if $bits =~ s/^(\ed)// && $1; +\& push @ints, pos $bits while($bits =~ /1/g); +\& } +\& +\& return \e@ints; +\& } +.Ve +.PP +This method gets faster the more sparse the bit vector is. +(Courtesy of Tim Bunce and Winfried Koenig.) +.PP +You can make the while loop a lot shorter with this suggestion +from Benjamin Goldberg: +.PP +.Vb 3 +\& while($vec =~ /[^\e0]+/g ) { +\& push @ints, grep vec($vec, $_, 1), $\-[0] * 8 .. $+[0] * 8; +\& } +.Ve +.PP +Or use the CPAN module Bit::Vector: +.PP +.Vb 3 +\& my $vector = Bit::Vector\->new($num_of_bits); +\& $vector\->Index_List_Store(@ints); +\& my @ints = $vector\->Index_List_Read(); +.Ve +.PP +Bit::Vector provides efficient methods for bit vectors, sets of +small integers, and "big int" math.
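As a small check on the vec()-based conversions described above, the following sketch packs a list of bits into a string and reads the set positions back out, using only core Perl. The variable names (@bits, @set, $bitstring) and the sample data are illustrative assumptions.

```perl
use strict;
use warnings;

my @bits = (1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1);   # demo data (assumption)

# Pack: set bit N of $vec whenever $bits[N] is true.
my $vec = '';
for my $n (0 .. $#bits) {
    vec($vec, $n, 1) = 1 if $bits[$n];
}

# Unpack: test every bit position the string can hold.
my @set = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

# "b*" gives one character per bit in ascending bit order within each
# byte, which matches vec()'s numbering for 1-bit elements.
my $bitstring = unpack "b*", $vec;

print "bytes used: ", length($vec), "\n";   # bytes used: 2
print "set bits:   @set\n";                 # set bits:   0 3 4 6 10
print "bit string: $bitstring\n";           # bit string: 1001101000100000
```

Eleven flags fit in two bytes here, and the trailing zeros in the bit string are the unused padding bits of the second byte.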
+.PP +Here's a more extensive illustration using \fBvec()\fR: +.PP +.Vb 7 +\& # vec demo +\& my $vector = "\exff\ex0f\exef\exfe"; +\& print "Ilya\*(Aqs string \e\exff\e\ex0f\e\exef\e\exfe represents the number ", +\& unpack("N", $vector), "\en"; +\& my $is_set = vec($vector, 23, 1); +\& print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\en"; +\& pvec($vector); +\& +\& set_vec(1,1,1); +\& set_vec(3,1,1); +\& set_vec(23,1,1); +\& +\& set_vec(3,1,3); +\& set_vec(3,2,3); +\& set_vec(3,4,3); +\& set_vec(3,4,7); +\& set_vec(3,8,3); +\& set_vec(3,8,7); +\& +\& set_vec(0,32,17); +\& set_vec(1,32,17); +\& +\& sub set_vec { +\& my ($offset, $width, $value) = @_; +\& my $vector = \*(Aq\*(Aq; +\& vec($vector, $offset, $width) = $value; +\& print "offset=$offset width=$width value=$value\en"; +\& pvec($vector); +\& } +\& +\& sub pvec { +\& my $vector = shift; +\& my $bits = unpack("b*", $vector); +\& my $i = 0; +\& my $BASE = 8; +\& +\& print "vector length in bytes: ", length($vector), "\en"; +\& my @bytes = unpack("A8" x length($vector), $bits); +\& print "bits are: @bytes\en\en"; +\& } +.Ve +.SS "Why does \fBdefined()\fP return true on empty arrays and hashes?" +.IX Subsection "Why does defined() return true on empty arrays and hashes?" +The short story is that you should probably only use defined on scalars or +functions, not on aggregates (arrays and hashes). See "defined" in perlfunc +in the 5.004 release or later of Perl for more detail. +.SH "Data: Hashes (Associative Arrays)" +.IX Header "Data: Hashes (Associative Arrays)" +.SS "How do I process an entire hash?" +.IX Subsection "How do I process an entire hash?" +(contributed by brian d foy) +.PP +There are a couple of ways that you can process an entire hash. You +can get a list of keys, then go through each key, or grab one +key-value pair at a time. +.PP +To go through all of the keys, use the \f(CW\*(C`keys\*(C'\fR function. This extracts +all of the keys of the hash and gives them back to you as a list.
You +can then get the value through the particular key you're processing: +.PP +.Vb 4 +\& foreach my $key ( keys %hash ) { +\& my $value = $hash{$key}; +\& ... +\& } +.Ve +.PP +Once you have the list of keys, you can process that list before you +process the hash elements. For instance, you can sort the keys so you +can process them in lexical order: +.PP +.Vb 4 +\& foreach my $key ( sort keys %hash ) { +\& my $value = $hash{$key}; +\& ... +\& } +.Ve +.PP +Or, you might want to only process some of the items. If you only want +to deal with the keys that start with \f(CW\*(C`text:\*(C'\fR, you can select just +those using \f(CW\*(C`grep\*(C'\fR: +.PP +.Vb 4 +\& foreach my $key ( grep /^text:/, keys %hash ) { +\& my $value = $hash{$key}; +\& ... +\& } +.Ve +.PP +If the hash is very large, you might not want to create a long list of +keys. To save some memory, you can grab one key-value pair at a time using +\&\f(CWeach()\fR, which returns a pair you haven't seen yet: +.PP +.Vb 3 +\& while( my( $key, $value ) = each( %hash ) ) { +\& ... +\& } +.Ve +.PP +The \f(CW\*(C`each\*(C'\fR operator returns the pairs in apparently random order, so if +ordering matters to you, you'll have to stick with the \f(CW\*(C`keys\*(C'\fR method. +.PP +The \f(CWeach()\fR operator can be a bit tricky though. You can't add or +delete keys of the hash while you're using it without possibly +skipping or re-processing some pairs after Perl internally rehashes +all of the elements. Additionally, a hash has only one iterator, so if +you mix \f(CW\*(C`keys\*(C'\fR, \f(CW\*(C`values\*(C'\fR, or \f(CW\*(C`each\*(C'\fR on the same hash, you risk resetting +the iterator and messing up your processing. See the \f(CW\*(C`each\*(C'\fR entry in +perlfunc for more details. +.SS "How do I merge two hashes?" +.IX Xref "hash merge slice, hash" +.IX Subsection "How do I merge two hashes?"
+(contributed by brian d foy) +.PP +Before you decide to merge two hashes, you have to decide what to do +if both hashes contain keys that are the same and if you want to leave +the original hashes as they were. +.PP +If you want to preserve the original hashes, copy one hash (\f(CW%hash1\fR) +to a new hash (\f(CW%new_hash\fR), then add the keys from the other hash +(\f(CW%hash2\fR) to the new hash. Checking that the key already exists in +\&\f(CW%new_hash\fR gives you a chance to decide what to do with the +duplicates: +.PP +.Vb 1 +\& my %new_hash = %hash1; # make a copy; leave %hash1 alone +\& +\& foreach my $key2 ( keys %hash2 ) { +\& if( exists $new_hash{$key2} ) { +\& warn "Key [$key2] is in both hashes!"; +\& # handle the duplicate (perhaps only warning) +\& ... +\& next; +\& } +\& else { +\& $new_hash{$key2} = $hash2{$key2}; +\& } +\& } +.Ve +.PP +If you don't want to create a new hash, you can still use this looping +technique; just change the \f(CW%new_hash\fR to \f(CW%hash1\fR. +.PP +.Vb 11 +\& foreach my $key2 ( keys %hash2 ) { +\& if( exists $hash1{$key2} ) { +\& warn "Key [$key2] is in both hashes!"; +\& # handle the duplicate (perhaps only warning) +\& ... +\& next; +\& } +\& else { +\& $hash1{$key2} = $hash2{$key2}; +\& } +\& } +.Ve +.PP +If you don't care that one hash overwrites keys and values from the other, you +could just use a hash slice to add one hash to another. In this case, values +from \f(CW%hash2\fR replace values from \f(CW%hash1\fR when they have keys in common: +.PP +.Vb 1 +\& @hash1{ keys %hash2 } = values %hash2; +.Ve +.SS "What happens if I add or remove keys from a hash while iterating over it?" +.IX Subsection "What happens if I add or remove keys from a hash while iterating over it?" +(contributed by brian d foy) +.PP +The easy answer is "Don't do that!" +.PP +If you iterate through the hash with \fBeach()\fR, you can delete the key +most recently returned without worrying about it.
If you delete or add +other keys, the iterator may skip or double up on them since perl +may rearrange the hash table. See the +entry for \f(CWeach()\fR in perlfunc. +.SS "How do I look up a hash element by value?" +.IX Subsection "How do I look up a hash element by value?" +Create a reverse hash: +.PP +.Vb 2 +\& my %by_value = reverse %by_key; +\& my $key = $by_value{$value}; +.Ve +.PP +That's not particularly efficient. It would be more space-efficient +to use: +.PP +.Vb 3 +\& while (my ($key, $value) = each %by_key) { +\& $by_value{$value} = $key; +\& } +.Ve +.PP +If your hash could have repeated values, the methods above will only find +one of the associated keys. This may or may not worry you. If it does +worry you, you can always reverse the hash into a hash of arrays instead: +.PP +.Vb 3 +\& while (my ($key, $value) = each %by_key) { +\& push @{$key_list_by_value{$value}}, $key; +\& } +.Ve +.SS "How can I know how many entries are in a hash?" +.IX Subsection "How can I know how many entries are in a hash?" +(contributed by brian d foy) +.PP +This is very similar to "How do I process an entire hash?", also in +perlfaq4, but a bit simpler in the common cases. +.PP +You can use the \f(CWkeys()\fR built-in function in scalar context to find out +how many entries you have in a hash: +.PP +.Vb 1 +\& my $key_count = keys %hash; # must be scalar context! +.Ve +.PP +If you want to find out how many entries have a defined value, that's +a bit different. You have to check each value. A \f(CW\*(C`grep\*(C'\fR is handy: +.PP +.Vb 1 +\& my $defined_value_count = grep { defined } values %hash; +.Ve +.PP +You can use that same structure to count the entries any way that +you like. If you want the count of the keys with vowels in them, +you just test for that instead: +.PP +.Vb 1 +\& my $vowel_count = grep { /[aeiou]/ } keys %hash; +.Ve +.PP +The \f(CW\*(C`grep\*(C'\fR in scalar context returns the count.
If you want the list +of matching items, just use it in list context instead: +.PP +.Vb 1 +\& my @defined_values = grep { defined } values %hash; +.Ve +.PP +The \f(CWkeys()\fR function also resets the iterator, which means that you may +see strange results if you use this between uses of other hash operators +such as \f(CWeach()\fR. +.SS "How do I sort a hash (optionally by value instead of key)?" +.IX Subsection "How do I sort a hash (optionally by value instead of key)?" +(contributed by brian d foy) +.PP +To sort a hash, start with the keys. In this example, we give the list of +keys to the sort function which then compares them ASCIIbetically (which +might be affected by your locale settings). The output list has the keys +in ASCIIbetical order. Once we have the keys, we can go through them to +create a report which lists the keys in ASCIIbetical order. +.PP +.Vb 1 +\& my @keys = sort { $a cmp $b } keys %hash; +\& +\& foreach my $key ( @keys ) { +\& printf "%\-20s %6d\en", $key, $hash{$key}; +\& } +.Ve +.PP +We could get more fancy in the \f(CWsort()\fR block though. Instead of +comparing the keys, we can compute a value with them and use that +value as the comparison. +.PP +For instance, to make our report order case-insensitive, we use +\&\f(CW\*(C`lc\*(C'\fR to lowercase the keys before comparing them: +.PP +.Vb 1 +\& my @keys = sort { lc $a cmp lc $b } keys %hash; +.Ve +.PP +Note: if the computation is expensive or the hash has many elements, +you may want to look at the Schwartzian Transform to cache the +computation results. +.PP +If we want to sort by the hash value instead, we use the hash key +to look it up. We still get out a list of keys, but this time they +are ordered by their value. +.PP +.Vb 1 +\& my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash; +.Ve +.PP +From there we can get more complex. If the hash values are the same, +we can provide a secondary sort on the hash key. 
+.PP +.Vb 5 +\& my @keys = sort { +\& $hash{$a} <=> $hash{$b} +\& or +\& "\eL$a" cmp "\eL$b" +\& } keys %hash; +.Ve +.SS "How can I always keep my hash sorted?" +.IX Xref "hash tie sort DB_File Tie::IxHash" +.IX Subsection "How can I always keep my hash sorted?" +You can look into using the \f(CW\*(C`DB_File\*(C'\fR module and \f(CWtie()\fR using the +\&\f(CW$DB_BTREE\fR hash bindings as documented in "In Memory +Databases" in DB_File. The Tie::IxHash module from CPAN might also be +instructive. Although this does keep your hash sorted, you might not +like the slowdown you suffer from the tie interface. Are you sure you +need to do this? :) +.SS "What's the difference between ""delete"" and ""undef"" with hashes?" +.IX Subsection "What's the difference between ""delete"" and ""undef"" with hashes?" +Hashes contain pairs of scalars: the first is the key, the +second is the value. The key will be coerced to a string, +although the value can be any kind of scalar: string, +number, or reference. If a key \f(CW$key\fR is present in +\&\f(CW%hash\fR, \f(CWexists($hash{$key})\fR will return true. The value +for a given key can be \f(CW\*(C`undef\*(C'\fR, in which case +\&\f(CW$hash{$key}\fR will be \f(CW\*(C`undef\*(C'\fR while \f(CW\*(C`exists $hash{$key}\*(C'\fR +will return true. This corresponds to (\f(CW$key\fR, \f(CW\*(C`undef\*(C'\fR) +being in the hash. +.PP +Pictures help... 
Here's the \f(CW%hash\fR table: +.PP +.Vb 7 +\& keys values +\& +\-\-\-\-\-\-+\-\-\-\-\-\-+ +\& | a | 3 | +\& | x | 7 | +\& | d | 0 | +\& | e | 2 | +\& +\-\-\-\-\-\-+\-\-\-\-\-\-+ +.Ve +.PP +And these conditions hold +.PP +.Vb 6 +\& $hash{\*(Aqa\*(Aq} is true +\& $hash{\*(Aqd\*(Aq} is false +\& defined $hash{\*(Aqd\*(Aq} is true +\& defined $hash{\*(Aqa\*(Aq} is true +\& exists $hash{\*(Aqa\*(Aq} is true (Perl 5 only) +\& grep ($_ eq \*(Aqa\*(Aq, keys %hash) is true +.Ve +.PP +If you now say +.PP +.Vb 1 +\& undef $hash{\*(Aqa\*(Aq} +.Ve +.PP +your table now reads: +.PP +.Vb 7 +\& keys values +\& +\-\-\-\-\-\-+\-\-\-\-\-\-+ +\& | a | undef| +\& | x | 7 | +\& | d | 0 | +\& | e | 2 | +\& +\-\-\-\-\-\-+\-\-\-\-\-\-+ +.Ve +.PP +and these conditions now hold; changes in caps: +.PP +.Vb 6 +\& $hash{\*(Aqa\*(Aq} is FALSE +\& $hash{\*(Aqd\*(Aq} is false +\& defined $hash{\*(Aqd\*(Aq} is true +\& defined $hash{\*(Aqa\*(Aq} is FALSE +\& exists $hash{\*(Aqa\*(Aq} is true (Perl 5 only) +\& grep ($_ eq \*(Aqa\*(Aq, keys %hash) is true +.Ve +.PP +Notice the last two: you have an undef value, but a defined key! +.PP +Now, consider this: +.PP +.Vb 1 +\& delete $hash{\*(Aqa\*(Aq} +.Ve +.PP +your table now reads: +.PP +.Vb 6 +\& keys values +\& +\-\-\-\-\-\-+\-\-\-\-\-\-+ +\& | x | 7 | +\& | d | 0 | +\& | e | 2 | +\& +\-\-\-\-\-\-+\-\-\-\-\-\-+ +.Ve +.PP +and these conditions now hold; changes in caps: +.PP +.Vb 6 +\& $hash{\*(Aqa\*(Aq} is false +\& $hash{\*(Aqd\*(Aq} is false +\& defined $hash{\*(Aqd\*(Aq} is true +\& defined $hash{\*(Aqa\*(Aq} is false +\& exists $hash{\*(Aqa\*(Aq} is FALSE (Perl 5 only) +\& grep ($_ eq \*(Aqa\*(Aq, keys %hash) is FALSE +.Ve +.PP +See, the whole entry is gone! +.SS "Why don't my tied hashes make the defined/exists distinction?" +.IX Subsection "Why don't my tied hashes make the defined/exists distinction?" +This depends on the tied hash's implementation of \fBEXISTS()\fR. 
+For example, there isn't the concept of undef with hashes +that are tied to DBM* files. It also means that \fBexists()\fR and +\&\fBdefined()\fR do the same thing with a DBM* file, and what they +end up doing is not what they do with ordinary hashes. +.SS "How do I reset an \fBeach()\fP operation part-way through?" +.IX Subsection "How do I reset an each() operation part-way through?" +(contributed by brian d foy) +.PP +You can use the \f(CW\*(C`keys\*(C'\fR or \f(CW\*(C`values\*(C'\fR functions to reset \f(CW\*(C`each\*(C'\fR. To +simply reset the iterator used by \f(CW\*(C`each\*(C'\fR without doing anything else, +use one of them in void context: +.PP +.Vb 2 +\& keys %hash; # resets iterator, nothing else. +\& values %hash; # resets iterator, nothing else. +.Ve +.PP +See the documentation for \f(CW\*(C`each\*(C'\fR in perlfunc. +.SS "How can I get the unique keys from two hashes?" +.IX Subsection "How can I get the unique keys from two hashes?" +First you extract the keys from the hashes into lists, then solve +the "removing duplicates" problem described above. For example: +.PP +.Vb 5 +\& my %seen = (); +\& for my $element (keys(%foo), keys(%bar)) { +\& $seen{$element}++; +\& } +\& my @uniq = keys %seen; +.Ve +.PP +Or more succinctly: +.PP +.Vb 1 +\& my @uniq = keys %{{%foo,%bar}}; +.Ve +.PP +Or if you really want to save space: +.PP +.Vb 8 +\& my %seen = (); +\& while (defined ($key = each %foo)) { +\& $seen{$key}++; +\& } +\& while (defined ($key = each %bar)) { +\& $seen{$key}++; +\& } +\& my @uniq = keys %seen; +.Ve +.SS "How can I store a multidimensional array in a DBM file?" +.IX Subsection "How can I store a multidimensional array in a DBM file?" +Either stringify the structure yourself (no fun), or else +get the MLDBM (which uses Data::Dumper) module from CPAN and layer +it on top of either DB_File or GDBM_File. You might also try DBM::Deep, but +it can be a bit slow. +.SS "How can I make my hash remember the order I put elements into it?" 
+.IX Subsection "How can I make my hash remember the order I put elements into it?" +Use the Tie::IxHash from CPAN. +.PP +.Vb 1 +\& use Tie::IxHash; +\& +\& tie my %myhash, \*(AqTie::IxHash\*(Aq; +\& +\& for (my $i=0; $i<20; $i++) { +\& $myhash{$i} = 2*$i; +\& } +\& +\& my @keys = keys %myhash; +\& # @keys = (0,1,2,3,...) +.Ve +.SS "Why does passing a subroutine an undefined element in a hash create it?" +.IX Subsection "Why does passing a subroutine an undefined element in a hash create it?" +(contributed by brian d foy) +.PP +Are you using a really old version of Perl? +.PP +Normally, accessing a hash key's value for a nonexistent key will +\&\fInot\fR create the key. +.PP +.Vb 3 +\& my %hash = (); +\& my $value = $hash{ \*(Aqfoo\*(Aq }; +\& print "This won\*(Aqt print\en" if exists $hash{ \*(Aqfoo\*(Aq }; +.Ve +.PP +Passing \f(CW$hash{ \*(Aqfoo\*(Aq }\fR to a subroutine used to be a special case, though. +Since you could assign directly to \f(CW$_[0]\fR, Perl had to be ready to +make that assignment so it created the hash key ahead of time: +.PP +.Vb 2 +\& my_sub( $hash{ \*(Aqfoo\*(Aq } ); +\& print "This will print before 5.004\en" if exists $hash{ \*(Aqfoo\*(Aq }; +\& +\& sub my_sub { +\& # $_[0] = \*(Aqbar\*(Aq; # create hash key in case you do this +\& 1; +\& } +.Ve +.PP +Since Perl 5.004, however, this situation is a special case and Perl +creates the hash key only when you make the assignment: +.PP +.Vb 2 +\& my_sub( $hash{ \*(Aqfoo\*(Aq } ); +\& print "This will print, even after 5.004\en" if exists $hash{ \*(Aqfoo\*(Aq }; +\& +\& sub my_sub { +\& $_[0] = \*(Aqbar\*(Aq; +\& } +.Ve +.PP +However, if you want the old behavior (and think carefully about that +because it's a weird side effect), you can pass a hash slice instead. +Perl 5.004 didn't make this a special case: +.PP +.Vb 1 +\& my_sub( @hash{ qw/foo/ } ); +.Ve +.SS "How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?" 
+.IX Subsection "How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?" +Usually a hash ref, perhaps like this: +.PP +.Vb 8 +\& $record = { +\& NAME => "Jason", +\& EMPNO => 132, +\& TITLE => "deputy peon", +\& AGE => 23, +\& SALARY => 37_000, +\& PALS => [ "Norbert", "Rhys", "Phineas"], +\& }; +.Ve +.PP +References are documented in perlref and perlreftut. +Examples of complex data structures are given in perldsc and +perllol. Examples of structures and object-oriented classes are +in perlootut. +.SS "How can I use a reference as a hash key?" +.IX Subsection "How can I use a reference as a hash key?" +(contributed by brian d foy and Ben Morrow) +.PP +Hash keys are strings, so you can't really use a reference as the key. +When you try to do that, perl turns the reference into its stringified +form (for instance, \f(CWHASH(0xDEADBEEF)\fR). From there you can't get +back the reference from the stringified form, at least without doing +some extra work on your own. +.PP +Remember that the entry in the hash will still be there even if +the referenced variable goes out of scope, and that it is entirely +possible for Perl to subsequently allocate a different variable at +the same address. This will mean a new variable might accidentally +be associated with the value for an old. +.PP +If you have Perl 5.10 or later, and you just want to store a value +against the reference for lookup later, you can use the core +Hash::Util::Fieldhash module. This will also handle renaming the +keys if you use multiple threads (which causes all variables to be +reallocated at new addresses, changing their stringification), and +garbage-collecting the entries when the referenced variable goes out +of scope. +.PP +If you actually need to be able to get a real reference back from +each hash entry, you can use the Tie::RefHash module, which does the +required work for you. +.SS "How can I check if a key exists in a multilevel hash?" 
+.IX Subsection "How can I check if a key exists in a multilevel hash?" +(contributed by brian d foy) +.PP +The trick to this problem is avoiding accidental autovivification. If +you want to check three keys deep, you might naïvely try this: +.PP +.Vb 4 +\& my %hash; +\& if( exists $hash{key1}{key2}{key3} ) { +\& ...; +\& } +.Ve +.PP +Even though you started with a completely empty hash, after that call to +\&\f(CW\*(C`exists\*(C'\fR you've created the structure you needed to check for \f(CW\*(C`key3\*(C'\fR: +.PP +.Vb 5 +\& %hash = ( +\& \*(Aqkey1\*(Aq => { +\& \*(Aqkey2\*(Aq => {} +\& } +\& ); +.Ve +.PP +That's autovivification. You can get around this in a few ways. The +easiest way is to just turn it off. The lexical \f(CW\*(C`autovivification\*(C'\fR +pragma is available on CPAN. Now you don't add to the hash: +.PP +.Vb 7 +\& { +\& no autovivification; +\& my %hash; +\& if( exists $hash{key1}{key2}{key3} ) { +\& ...; +\& } +\& } +.Ve +.PP +The Data::Diver module on CPAN can do it for you too. Its \f(CW\*(C`Dive\*(C'\fR +subroutine can tell you not only if the keys exist but also get the +value: +.PP +.Vb 1 +\& use Data::Diver qw(Dive); +\& +\& my @exists = Dive( \e%hash, qw(key1 key2 key3) ); +\& if( ! @exists ) { +\& ...; # keys do not exist +\& } +\& elsif( ! defined $exists[0] ) { +\& ...; # keys exist but value is undef +\& } +.Ve +.PP +You can easily do this yourself too by checking each level of the hash +before you move onto the next level. This is essentially what +Data::Diver does for you: +.PP +.Vb 3 +\& if( check_hash( \e%hash, qw(key1 key2 key3) ) ) { +\& ...; +\& } +\& +\& sub check_hash { +\& my( $hash, @keys ) = @_; +\& +\& return unless @keys; +\& +\& foreach my $key ( @keys ) { +\& return unless eval { exists $hash\->{$key} }; +\& $hash = $hash\->{$key}; +\& } +\& +\& return 1; +\& } +.Ve +.SS "How can I prevent addition of unwanted keys into a hash?" +.IX Subsection "How can I prevent addition of unwanted keys into a hash?" 
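As a hedged illustration for this question, the sketch below restricts a hash to its current keys with lock_keys and lifts the restriction again with unlock_keys, both of which the core Hash::Util module exports. The hash name, its keys, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use feature 'say';
use Hash::Util qw(lock_keys unlock_keys);

# Hypothetical record; the name and keys are only for illustration.
my %record = ( name => 'Jason', empno => 132 );

# Restrict %record to the keys it holds right now.
lock_keys(%record);

# Values of the allowed keys can still change.
$record{name} = 'Norbert';

# Storing to a key outside the allowed set dies, so trap it with eval.
my $added_while_locked = eval { $record{salary} = 37_000; 1 } ? 1 : 0;

# Lift the restriction; new keys are accepted again.
unlock_keys(%record);
$record{salary} = 37_000;

say "added while locked: $added_while_locked";
say join ' ', sort keys %record;
```

Trapping the failure with eval is only one way to handle it; a program that treats an unexpected key as a bug may prefer to let the exception propagate.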
+Since version 5.8.0, hashes can be \fIrestricted\fR to a fixed number +of given keys. Methods for creating and dealing with restricted hashes +are exported by the Hash::Util module. +.SH "Data: Misc" +.IX Header "Data: Misc" +.SS "How do I handle binary data correctly?" +.IX Subsection "How do I handle binary data correctly?" +Perl is binary-clean, so it can handle binary data just fine. +On Windows or DOS, however, you have to use \f(CW\*(C`binmode\*(C'\fR for binary +files to avoid conversions for line endings. In general, you should +use \f(CW\*(C`binmode\*(C'\fR any time you want to work with binary data. +.PP +Also see "binmode" in perlfunc or perlopentut. +.PP +If you're concerned about 8\-bit textual data then see perllocale. +If you want to deal with multibyte characters, however, there are +some gotchas. See the section on Regular Expressions. +.SS "How do I determine whether a scalar is a number/whole/integer/float?" +.IX Subsection "How do I determine whether a scalar is a number/whole/integer/float?" +Assuming that you don't care about IEEE notations like "NaN" or +"Infinity", you probably just want to use a regular expression (see also +perlretut and perlre): +.PP +.Vb 1 +\& use 5.010; +\& +\& if ( /\eD/ ) +\& { say "\ethas nondigits"; } +\& if ( /^\ed+\ez/ ) +\& { say "\etis a whole number"; } +\& if ( /^\-?\ed+\ez/ ) +\& { say "\etis an integer"; } +\& if ( /^[+\-]?\ed+\ez/ ) +\& { say "\etis a +/\- integer"; } +\& if ( /^\-?(?:\ed+\e.?|\e.\ed)\ed*\ez/ ) +\& { say "\etis a real number"; } +\& if ( /^[+\-]?(?=\e.?\ed)\ed*\e.?\ed*(?:e[+\-]?\ed+)?\ez/i ) +\& { say "\etis a C float" } +.Ve +.PP +There are also some commonly used modules for the task. +Scalar::Util (distributed with 5.8) provides access to perl's +internal function \f(CW\*(C`looks_like_number\*(C'\fR for determining whether a +variable looks like a number. Data::Types exports functions that +validate data types using both the above and other regular +expressions. 
Thirdly, there is Regexp::Common which has regular +expressions to match various types of numbers. Those three modules are +available from the CPAN. +.PP +If you're on a POSIX system, Perl supports the \f(CW\*(C`POSIX::strtod\*(C'\fR +function for converting strings to doubles (and also \f(CW\*(C`POSIX::strtol\*(C'\fR +for longs). Its semantics are somewhat cumbersome, so here's a +\&\f(CW\*(C`getnum\*(C'\fR wrapper function for more convenient access. This function +takes a string and returns the number it found, or \f(CW\*(C`undef\*(C'\fR for input +that isn't a C float. The \f(CW\*(C`is_numeric\*(C'\fR function is a front end to +\&\f(CW\*(C`getnum\*(C'\fR if you just want to say, "Is this a float?" +.PP +.Vb 10 +\& sub getnum { +\& use POSIX qw(strtod); +\& my $str = shift; +\& $str =~ s/^\es+//; +\& $str =~ s/\es+$//; +\& $! = 0; +\& my($num, $unparsed) = strtod($str); +\& if (($str eq \*(Aq\*(Aq) || ($unparsed != 0) || $!) { +\& return undef; +\& } +\& else { +\& return $num; +\& } +\& } +\& +\& sub is_numeric { defined getnum($_[0]) } +.Ve +.PP +Or you could check out the String::Scanf module on the CPAN +instead. +.SS "How do I keep persistent data across program calls?" +.IX Subsection "How do I keep persistent data across program calls?" +For some specific applications, you can use one of the DBM modules. +See AnyDBM_File. More generically, you should consult the FreezeThaw +or Storable modules from CPAN. Starting from Perl 5.8, Storable is part +of the standard distribution. Here's one example using Storable's \f(CW\*(C`store\*(C'\fR +and \f(CW\*(C`retrieve\*(C'\fR functions: +.PP +.Vb 2 +\& use Storable; +\& store(\e%hash, "filename"); +\& +\& # later on... +\& $href = retrieve("filename"); # by ref +\& %hash = %{ retrieve("filename") }; # direct to hash +.Ve +.SS "How do I print out or copy a recursive data structure?" +.IX Subsection "How do I print out or copy a recursive data structure?" 
+The Data::Dumper module (on CPAN, and bundled with Perl since 5.005) is great +for printing out data structures. The Storable module (on CPAN, and bundled +with Perl since 5.8) provides a function called \f(CW\*(C`dclone\*(C'\fR that recursively +copies its argument. +.PP +.Vb 2 +\& use Storable qw(dclone); +\& $r2 = dclone($r1); +.Ve +.PP +Here \f(CW$r1\fR can be a reference to any kind of data structure you'd like; +it will be deeply copied. Because \f(CW\*(C`dclone\*(C'\fR takes and returns references, +you'd have to add extra punctuation if you had a hash of arrays that +you wanted to copy. +.PP +.Vb 1 +\& %newhash = %{ dclone(\e%oldhash) }; +.Ve +.SS "How do I define methods for every class/object?" +.IX Subsection "How do I define methods for every class/object?" +(contributed by Ben Morrow) +.PP +You can use the \f(CW\*(C`UNIVERSAL\*(C'\fR class (see UNIVERSAL). However, please +be very careful to consider the consequences of doing this: adding +methods to every object is very likely to have unintended +consequences. If possible, it would be better to have all your objects +inherit from some common base class, or to use an object system like +Moose that supports roles. +.SS "How do I verify a credit card checksum?" +.IX Subsection "How do I verify a credit card checksum?" +Get the Business::CreditCard module from CPAN. +.SS "How do I pack arrays of doubles or floats for XS code?" +.IX Subsection "How do I pack arrays of doubles or floats for XS code?" +The arrays.h/arrays.c code in the PGPLOT module on CPAN does just this. +If you're doing a lot of float or double processing, consider using +the PDL module from CPAN instead\-\-it makes number-crunching easy. +.PP +See <https://metacpan.org/release/PGPLOT> for the code. +.SH "AUTHOR AND COPYRIGHT" +.IX Header "AUTHOR AND COPYRIGHT" +Copyright (c) 1997\-2010 Tom Christiansen, Nathan Torkington, and +other authors as noted. All rights reserved.
+.PP +This documentation is free; you can redistribute it and/or modify it +under the same terms as Perl itself. +.PP +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required.