Diffstat (limited to 'upstream/debian-unstable/man1/perlpacktut.1')
-rw-r--r-- upstream/debian-unstable/man1/perlpacktut.1 1413
1 files changed, 1413 insertions, 0 deletions
diff --git a/upstream/debian-unstable/man1/perlpacktut.1 b/upstream/debian-unstable/man1/perlpacktut.1
new file mode 100644
index 00000000..bdf326c5
--- /dev/null
+++ b/upstream/debian-unstable/man1/perlpacktut.1
@@ -0,0 +1,1413 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLPACKTUT 1"
+.TH PERLPACKTUT 1 2024-01-12 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlpacktut \- tutorial on "pack" and "unpack"
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+\&\f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR are two functions for transforming data according
+to a user-defined template, between the guarded way Perl stores values
+and some well-defined representation as might be required in the
+environment of a Perl program. Unfortunately, they're also two of
+the most misunderstood and most often overlooked functions that Perl
+provides. This tutorial will demystify them for you.
+.SH "The Basic Principle"
+.IX Header "The Basic Principle"
+Most programming languages don't shelter the memory where variables are
+stored. In C, for instance, you can take the address of some variable,
+and the \f(CW\*(C`sizeof\*(C'\fR operator tells you how many bytes are allocated to
+the variable. Using the address and the size, you may access the storage
+to your heart's content.
+.PP
+In Perl, you just can't access memory at random, but the structural and
+representational conversion provided by \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR is an
+excellent alternative. The \f(CW\*(C`pack\*(C'\fR function converts values to a byte
+sequence containing representations according to a given specification,
+the so-called "template" argument. \f(CW\*(C`unpack\*(C'\fR is the reverse process,
+deriving some values from the contents of a string of bytes. (Be cautioned,
+however, that not all that has been packed together can be neatly unpacked \-
+a very common experience as seasoned travellers are likely to confirm.)
+.PP
+Why, you may ask, would you need a chunk of memory containing some values
+in binary representation? One good reason is input and output accessing
+some file, a device, or a network connection, whereby this binary
+representation is either forced on you or will give you some benefit
+in processing. Another cause is passing data to some system call that
+is not available as a Perl function: \f(CW\*(C`syscall\*(C'\fR requires you to provide
+parameters laid out the way a C program would store them. Even text processing
+(as shown in the next section) may be simplified with judicious usage
+of these two functions.
+.PP
+To see how (un)packing works, we'll start with a simple template
+code where the conversion is in low gear: between the contents of a byte
+sequence and a string of hexadecimal digits. Let's use \f(CW\*(C`unpack\*(C'\fR, since
+this is likely to remind you of a dump program, or some desperate last
+message unfortunate programs are wont to throw at you before they expire
+into the wild blue yonder. Assuming that the variable \f(CW$mem\fR holds a
+sequence of bytes that we'd like to inspect without assuming anything
+about its meaning, we can write
+.PP
+.Vb 2
+\& my( $hex ) = unpack( \*(AqH*\*(Aq, $mem );
+\& print "$hex\en";
+.Ve
+.PP
+whereupon we might see something like this, with each pair of hex digits
+corresponding to a byte:
+.PP
+.Vb 1
+\& 41204d414e204120504c414e20412043414e414c2050414e414d41
+.Ve
+.PP
+What was in this chunk of memory? Numbers, characters, or a mixture of
+both? Assuming that we're on a computer where ASCII (or some similar)
+encoding is used: hexadecimal values in the range \f(CW0x41\fR \- \f(CW0x5A\fR
+indicate an uppercase letter, and \f(CW0x20\fR encodes a space. So we might
+assume it is a piece of text, which some are able to read like a tabloid;
+but others will have to get hold of an ASCII table and relive that
+first-grader feeling. Not caring too much about which way to read this,
+we note that \f(CW\*(C`unpack\*(C'\fR with the template code \f(CW\*(C`H\*(C'\fR converts the contents
+of a sequence of bytes into the customary hexadecimal notation. Since
+"a sequence of" is a pretty vague indication of quantity, \f(CW\*(C`H\*(C'\fR has been
+defined to convert just a single hexadecimal digit unless it is followed
+by a repeat count. An asterisk for the repeat count means to use whatever
+remains.
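+.PP
+As a quick check of these repeat-count rules, here is a small
+self-contained sketch. It assumes ASCII and substitutes a known string
+for the \f(CW$mem\fR of the example above:
+.PP
+.Vb 7
+\& use strict; use warnings;
+\& my $mem = \*(AqA MAN A PLAN A CANAL PANAMA\*(Aq;  # stand\-in for $mem
+\& my ($hex) = unpack( \*(AqH*\*(Aq,  $mem );   # every byte, two digits each
+\& my ($one) = unpack( \*(AqH\*(Aq,   $mem );   # no count: a single hex digit
+\& my ($two) = unpack( \*(AqH2\*(Aq,  $mem );   # two digits, i.e. one byte
+\& print "$hex\en";                # the dump shown above
+\& print "$one $two\en";           # prints "4 41"
+.Ve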
+.PP
+The inverse operation \- packing byte contents from a string of hexadecimal
+digits \- is just as easily written. For instance:
+.PP
+.Vb 2
+\& my $s = pack( \*(AqH2\*(Aq x 10, 30..39 );
+\& print "$s\en";
+.Ve
+.PP
+Since we feed a list of ten 2\-digit hexadecimal strings to \f(CW\*(C`pack\*(C'\fR, the
+pack template should contain ten pack codes. If this is run on a computer
+with ASCII character coding, it will print \f(CW0123456789\fR.
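+.PP
+Because \f(CW\*(C`H*\*(C'\fR consumes a single argument of any length, the same ten
+bytes can also be packed from one string of hex digits. A minimal sketch,
+again assuming ASCII:
+.PP
+.Vb 5
+\& use strict; use warnings;
+\& my $s = pack( \*(AqH2\*(Aq x 10, 30..39 );              # ten codes, ten arguments
+\& my $t = pack( \*(AqH*\*(Aq, join( \*(Aq\*(Aq, 30..39 ) );     # one code, one string
+\& print "$s\en";                                     # prints 0123456789
+\& print $s eq $t ? "same\en" : "different\en";        # prints same
+.Ve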
+.SH "Packing Text"
+.IX Header "Packing Text"
+Let's suppose you've got to read in a data file like this:
+.PP
+.Vb 4
+\& Date      |Description                | Income|Expenditure
+\& 01/24/2001 Zed\*(Aqs Camel Emporium                    1147.99
+\& 01/28/2001 Flea spray                                24.99
+\& 01/29/2001 Camel rides to tourists     1235.00
+.Ve
+.PP
+How do we do it? You might think first to use \f(CW\*(C`split\*(C'\fR; however, since
+\&\f(CW\*(C`split\*(C'\fR collapses blank fields, you'll never know whether a record was
+income or expenditure. Oops. Well, you could always use \f(CW\*(C`substr\*(C'\fR:
+.PP
+.Vb 7
+\& while (<>) {
+\&     my $date   = substr($_,  0, 11);
+\&     my $desc   = substr($_, 12, 27);
+\&     my $income = substr($_, 40,  7);
+\&     my $expend = substr($_, 52,  7);
+\&     ...
+\& }
+.Ve
+.PP
+It's not really a barrel of laughs, is it? In fact, it's worse than it
+may seem; the eagle-eyed may notice that the first field should only be
+10 characters wide, and the error has propagated right through the other
+numbers \- which we've had to count by hand. So it's error-prone as well
+as horribly unfriendly.
+.PP
+Or maybe we could use regular expressions:
+.PP
+.Vb 5
+\& while (<>) {
+\&     my($date, $desc, $income, $expend) =
+\&         m|(\ed\ed/\ed\ed/\ed{4}) (.{27}) (.{7})(.*)|;
+\&     ...
+\& }
+.Ve
+.PP
+Urgh. Well, it's a bit better, but \- well, would you want to maintain
+that?
+.PP
+Hey, isn't Perl supposed to make this sort of thing easy? Well, it does,
+if you use the right tools. \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR are designed to help
+you out when dealing with fixed-width data like the above. Let's have a
+look at a solution with \f(CW\*(C`unpack\*(C'\fR:
+.PP
+.Vb 4
+\& while (<>) {
+\&     my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
+\&     ...
+\& }
+.Ve
+.PP
+That looks a bit nicer; but we've got to take apart that weird template.
+Where did I pull that out of?
+.PP
+OK, let's have a look at some of our data again; in fact, we'll include
+the headers, and a handy ruler so we can keep track of where we are.
+.PP
+.Vb 5
+\&          1         2         3         4         5
+\& 1234567890123456789012345678901234567890123456789012345678
+\& Date      |Description                | Income|Expenditure
+\& 01/28/2001 Flea spray                                24.99
+\& 01/29/2001 Camel rides to tourists     1235.00
+.Ve
+.PP
+From this, we can see that the date column stretches from column 1 to
+column 10 \- ten characters wide. The \f(CW\*(C`pack\*(C'\fR\-ese for "character" is
+\&\f(CW\*(C`A\*(C'\fR, and ten of them are \f(CW\*(C`A10\*(C'\fR. So if we just wanted to extract the
+dates, we could say this:
+.PP
+.Vb 1
+\& my($date) = unpack("A10", $_);
+.Ve
+.PP
+OK, what's next? Between the date and the description is a blank column;
+we want to skip over that. The \f(CW\*(C`x\*(C'\fR template means "skip forward", so we
+want one of those. Next, we have another batch of characters, from 12 to
+38. That's 27 more characters, hence \f(CW\*(C`A27\*(C'\fR. (Don't make the fencepost
+error \- there are 27 characters between 12 and 38, not 26. Count 'em!)
+.PP
+Now we skip another character and pick up the next 7 characters:
+.PP
+.Vb 1
+\& my($date,$description,$income) = unpack("A10xA27xA7", $_);
+.Ve
+.PP
+Now comes the clever bit. Lines in our ledger which are just income and
+not expenditure might end at column 46. Hence, we don't want to tell our
+\&\f(CW\*(C`unpack\*(C'\fR pattern that we \fBneed\fR to find another 12 characters; we'll
+just say "if there's anything left, take it". As you might guess from
+regular expressions, that's what the \f(CW\*(C`*\*(C'\fR means: "use everything
+remaining".
+.IP \(bu 3
+Be warned, though, that unlike regular expressions, if the \f(CW\*(C`unpack\*(C'\fR
+template doesn't match the incoming data, Perl will scream and die.
+.PP
+Hence, putting it all together:
+.PP
+.Vb 2
+\& my ($date, $description, $income, $expend) =
+\&     unpack("A10xA27xA7xA*", $_);
+.Ve
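+.PP
+To see the template at work on a single record, the following sketch
+feeds it one income-only line, typed in by hand together with the trailing
+newline that \f(CW\*(C`<>\*(C'\fR would leave on it. The final \f(CW\*(C`x\*(C'\fR skips over
+that newline, so \f(CW$expend\fR comes back empty:
+.PP
+.Vb 5
+\& use strict; use warnings;
+\& my $line = "01/29/2001 Camel rides to tourists     1235.00\en";
+\& my ($date, $desc, $income, $expend) = unpack( "A10xA27xA7xA*", $line );
+\& print join( "|", $date, $desc, $income, $expend ), "\en";
+\& # prints: 01/29/2001|Camel rides to tourists|1235.00|
+.Ve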
+.PP
+Now, that's our data parsed. I suppose what we might want to do now is
+total up our income and expenditure, and add another line to the end of
+our ledger \- in the same format \- saying how much we've brought in and
+how much we've spent:
+.PP
+.Vb 6
+\& use POSIX ();   # for POSIX::strftime
+\& my ($tot_income, $tot_expend) = (0, 0);
+\&
+\& while (<>) {
+\&     my ($date, $desc, $income, $expend) =
+\&         unpack("A10xA27xA7xA*", $_);
+\&     $tot_income += $income;
+\&     $tot_expend += $expend;
+\& }
+\&
+\& $tot_income = sprintf("%.2f", $tot_income); # Get them into
+\& $tot_expend = sprintf("%.2f", $tot_expend); # "financial" format
+\&
+\& my $date = POSIX::strftime("%m/%d/%Y", localtime);
+\&
+\& # OK, let\*(Aqs go:
+\&
+\& print pack("A10xA27xA7xA*", $date, "Totals",
+\&     $tot_income, $tot_expend);
+.Ve
+.PP
+Oh, hmm. That didn't quite work. Let's see what happened:
+.PP
+.Vb 4
+\& 01/24/2001 Zed\*(Aqs Camel Emporium                    1147.99
+\& 01/28/2001 Flea spray                                24.99
+\& 01/29/2001 Camel rides to tourists     1235.00
+\& 03/23/2001Totals                     1235.001172.98
+.Ve
+.PP
+OK, it's a start, but what happened to the spaces? We put \f(CW\*(C`x\*(C'\fR, didn't
+we? Shouldn't it skip forward? Let's look at what "pack" in perlfunc says:
+.PP
+.Vb 1
+\& x A null byte.
+.Ve
+.PP
+Urgh. No wonder. There's a big difference between "a null byte",
+character zero, and "a space", character 32. Perl's put something
+between the date and the description \- but unfortunately, we can't see
+it!
+.PP
+What we actually need to do is expand the width of the fields. The \f(CW\*(C`A\*(C'\fR
+format pads any non-existent characters with spaces, so we can use the
+additional spaces to line up our fields, like this:
+.PP
+.Vb 2
+\& print pack("A11 A28 A8 A*", $date, "Totals",
+\&     $tot_income, $tot_expend);
+.Ve
+.PP
+(Note that you can put spaces in the template to make it more readable,
+but they don't translate to spaces in the output.) Here's what we got
+this time:
+.PP
+.Vb 4
+\& 01/24/2001 Zed\*(Aqs Camel Emporium                    1147.99
+\& 01/28/2001 Flea spray                                24.99
+\& 01/29/2001 Camel rides to tourists     1235.00
+\& 03/23/2001 Totals                      1235.00 1172.98
+.Ve
+.PP
+That's a bit better, but we still have that last column which needs to
+be moved further over. There's an easy way to fix this up:
+unfortunately, we can't get \f(CW\*(C`pack\*(C'\fR to right-justify our fields, but we
+can get \f(CW\*(C`sprintf\*(C'\fR to do it:
+.PP
+.Vb 5
+\& $tot_income = sprintf("%.2f", $tot_income);
+\& $tot_expend = sprintf("%11.2f", $tot_expend);
+\& $date = POSIX::strftime("%m/%d/%Y", localtime);
+\& print pack("A11 A28 A8 A*", $date, "Totals",
+\&     $tot_income, $tot_expend);
+.Ve
+.PP
+This time we get the right answer:
+.PP
+.Vb 3
+\& 01/28/2001 Flea spray                                24.99
+\& 01/29/2001 Camel rides to tourists     1235.00
+\& 03/23/2001 Totals                      1235.00     1172.98
+.Ve
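+.PP
+Putting both halves together, here is a self-contained sketch of the
+whole exercise. To keep it runnable and its output reproducible, it
+assumes the ledger lines are held in an array rather than read from
+\f(CW\*(C`<>\*(C'\fR, and it uses a fixed date instead of \f(CW\*(C`POSIX::strftime\*(C'\fR:
+.PP
+.Vb 16
+\& use strict; use warnings;
+\& my @ledger = (
+\&     "01/24/2001 Zed\*(Aqs Camel Emporium                    1147.99\en",
+\&     "01/28/2001 Flea spray                                24.99\en",
+\&     "01/29/2001 Camel rides to tourists     1235.00\en",
+\& );
+\& my ($tot_income, $tot_expend) = (0, 0);
+\& for my $line (@ledger) {
+\&     my (undef, undef, $income, $expend) = unpack( "A10xA27xA7xA*", $line );
+\&     $tot_income += $income || 0;    # blank fields count as zero
+\&     $tot_expend += $expend || 0;
+\& }
+\& my $totals = pack( "A11 A28 A8 A*", "03/23/2001", "Totals",
+\&                    sprintf( "%.2f",   $tot_income ),
+\&                    sprintf( "%11.2f", $tot_expend ) );
+\& print @ledger, "$totals\en";
+.Ve
+.PP
+The width of 11 in the last \f(CW\*(C`sprintf\*(C'\fR is what lines the totals up
+with column 58: the \f(CW\*(C`A*\*(C'\fR field starts right after the \f(CW\*(C`A8\*(C'\fR field,
+at column 48.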
+.PP
+So that's how we consume and produce fixed-width data. Let's recap what
+we've seen of \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR so far:
+.IP \(bu 3
+Use \f(CW\*(C`pack\*(C'\fR to go from several pieces of data to one fixed-width
+version; use \f(CW\*(C`unpack\*(C'\fR to turn a fixed-width-format string into several
+pieces of data.
+.IP \(bu 3
+The pack format \f(CW\*(C`A\*(C'\fR means "any character"; if you're \f(CW\*(C`pack\*(C'\fRing and
+you've run out of things to pack, \f(CW\*(C`pack\*(C'\fR will fill the rest up with
+spaces.
+.IP \(bu 3
+\&\f(CW\*(C`x\*(C'\fR means "skip a byte" when \f(CW\*(C`unpack\*(C'\fRing; when \f(CW\*(C`pack\*(C'\fRing, it means
+"introduce a null byte" \- that's probably not what you mean if you're
+dealing with plain text.
+.IP \(bu 3
+You can follow the formats with numbers to say how many characters
+should be affected by that format: \f(CW\*(C`A12\*(C'\fR means "take 12 characters";
+\&\f(CW\*(C`x6\*(C'\fR means "skip 6 bytes" or "character 0, 6 times".
+.IP \(bu 3
+Instead of a number, you can use \f(CW\*(C`*\*(C'\fR to mean "consume everything else
+left".
+.Sp
+\&\fBWarning\fR: when packing multiple pieces of data, \f(CW\*(C`*\*(C'\fR only means
+"consume all of the current piece of data". That's to say
+.Sp
+.Vb 1
+\& pack("A*A*", $one, $two)
+.Ve
+.Sp
+packs all of \f(CW$one\fR into the first \f(CW\*(C`A*\*(C'\fR and then all of \f(CW$two\fR into
+the second. This is a general principle: each format character
+corresponds to one piece of data to be \f(CW\*(C`pack\*(C'\fRed.
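+.PP
+The following sketch checks each of these rules on short literal strings
+(the values themselves are arbitrary):
+.PP
+.Vb 7
+\& use strict; use warnings;
+\& my $padded = pack( "A5", "ab" );             # "ab   ": padded with spaces
+\& my $cut    = pack( "A3", "abcdef" );         # "abc": the excess is dropped
+\& my $nul    = pack( "A2 x A2", "ab", "cd" );  # "ab\e0cd": x adds a null byte
+\& my $both   = pack( "A*A*", "ab", "cd" );     # "abcd": one item per A*
+\& my ($date) = unpack( "A5 x A*", "01/24 Zed" );  # x skips a byte: "01/24"
+\& print join( "|", $padded, $cut, $both, $date ), "\en";  # ab   |abc|abcd|01/24
+.Ve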
+.SH "Packing Numbers"
+.IX Header "Packing Numbers"
+So much for textual data. Let's get onto the meaty stuff that \f(CW\*(C`pack\*(C'\fR
+and \f(CW\*(C`unpack\*(C'\fR are best at: handling binary formats for numbers. There is,
+of course, not just one binary format \- life would be too simple \- but
+Perl will do all the finicky labor for you.
+.SS Integers
+.IX Subsection "Integers"
+Packing and unpacking numbers implies conversion to and from some
+\&\fIspecific\fR binary representation. Leaving floating point numbers
+aside for the moment, the salient properties of any such representation
+are:
+.IP \(bu 4
+the number of bytes used for storing the integer,
+.IP \(bu 4
+whether the contents are interpreted as a signed or unsigned number,
+.IP \(bu 4
+the byte ordering: whether the first byte is the least or most
+significant byte (or: little-endian or big-endian, respectively).
+.PP
+So, for instance, to pack 20302 to a signed 16 bit integer in your
+computer's representation you write
+.PP
+.Vb 1
+\& my $ps = pack( \*(Aqs\*(Aq, 20302 );
+.Ve
+.PP
+Again, the result is a string, now containing 2 bytes. If you print
+this string (which is, generally, not recommended) you might see
+\&\f(CW\*(C`ON\*(C'\fR or \f(CW\*(C`NO\*(C'\fR (depending on your system's byte ordering) \- or something
+entirely different if your computer doesn't use ASCII character encoding.
+Unpacking \f(CW$ps\fR with the same template returns the original integer value:
+.PP
+.Vb 1
+\& my( $s ) = unpack( \*(Aqs\*(Aq, $ps );
+.Ve
+.PP
+This is true for all numeric template codes. But don't expect miracles:
+if the packed value exceeds the allotted byte capacity, high order bits
+are silently discarded, and unpack certainly won't be able to pull them
+back out of some magic hat. And, when you pack using a signed template
+code such as \f(CW\*(C`s\*(C'\fR, an excess value may result in the sign bit
+getting set, and unpacking this will smartly return a negative value.
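+.PP
+A short sketch of both effects, assuming the usual two's complement
+integers (the out-of-range inputs are arbitrary):
+.PP
+.Vb 6
+\& use strict; use warnings;
+\& my $ps      = pack( \*(Aqs\*(Aq, 20302 );                     # 2 bytes, native order
+\& my ($s)     = unpack( \*(Aqs\*(Aq, $ps );                      # 20302 again
+\& my ($neg)   = unpack( \*(Aqs\*(Aq, pack( \*(Aqs\*(Aq, 40000 ) );   # sign bit set: \-25536
+\& my ($trunc) = unpack( \*(AqS\*(Aq, pack( \*(AqS\*(Aq, 70000 ) );   # high bits lost: 4464
+\& print join( \*(Aq \*(Aq, length( $ps ), $s, $neg, $trunc ), "\en";  # 2 20302 \-25536 4464
+.Ve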
+.PP
+16 bits won't get you too far with integers, but there is \f(CW\*(C`l\*(C'\fR and \f(CW\*(C`L\*(C'\fR
+for signed and unsigned 32\-bit integers. And if this is not enough and
+your system supports 64 bit integers you can push the limits much closer
+to infinity with pack codes \f(CW\*(C`q\*(C'\fR and \f(CW\*(C`Q\*(C'\fR. A notable exception is provided
+by pack codes \f(CW\*(C`i\*(C'\fR and \f(CW\*(C`I\*(C'\fR for signed and unsigned integers of the
+"local custom" variety: Such an integer will take up as many bytes as
+a local C compiler returns for \f(CWsizeof(int)\fR, but it'll use \fIat least\fR
+32 bits.
+.PP
+Each of the integer pack codes \f(CW\*(C`sSlLqQ\*(C'\fR results in a fixed number of bytes,
+no matter where you execute your program. This may be useful for some
+applications, but it does not provide for a portable way to pass data
+structures between Perl and C programs (bound to happen when you call
+XS extensions or the Perl function \f(CW\*(C`syscall\*(C'\fR), or when you read or
+write binary files. What you'll need in this case are template codes that
+depend on what your local C compiler compiles when you code \f(CW\*(C`short\*(C'\fR or
+\&\f(CW\*(C`unsigned long\*(C'\fR, for instance. These codes and their corresponding
+byte lengths are shown in the table below. Since the C standard leaves
+much leeway with respect to the relative sizes of these data types, actual
+values may vary, and that's why the values are given as expressions in
+C and Perl. (If you'd like to use values from \f(CW%Config\fR in your program
+you have to import it with \f(CW\*(C`use Config\*(C'\fR.)
+.PP
+.Vb 5
+\& signed unsigned  byte length in C   byte length in Perl
+\&   s!     S!      sizeof(short)      $Config{shortsize}
+\&   i!     I!      sizeof(int)        $Config{intsize}
+\&   l!     L!      sizeof(long)       $Config{longsize}
+\&   q!     Q!      sizeof(long long)  $Config{longlongsize}
+.Ve
+.PP
+The \f(CW\*(C`i!\*(C'\fR and \f(CW\*(C`I!\*(C'\fR codes aren't different from \f(CW\*(C`i\*(C'\fR and \f(CW\*(C`I\*(C'\fR; they are
+tolerated for completeness' sake.
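+.PP
+Whether the native codes really follow \f(CW%Config\fR on a given machine can
+be checked with a sketch like the one below. It only assumes the standard
+\f(CW\*(C`Config\*(C'\fR module; \f(CW\*(C`q!\*(C'\fR is left out because it requires 64\-bit
+integer support:
+.PP
+.Vb 8
+\& use strict; use warnings;
+\& use Config;
+\& my %size_key = ( \*(Aqs!\*(Aq => \*(Aqshortsize\*(Aq, \*(Aqi!\*(Aq => \*(Aqintsize\*(Aq, \*(Aql!\*(Aq => \*(Aqlongsize\*(Aq );
+\& for my $code ( sort keys %size_key ) {
+\&     my $bytes = length( pack( $code, 0 ) );
+\&     printf "%\-3s %d bytes (\e$Config{%s} = %d)\en",
+\&         $code, $bytes, $size_key{$code}, $Config{ $size_key{$code} };
+\& }
+.Ve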
+.SS "Unpacking a Stack Frame"
+.IX Subsection "Unpacking a Stack Frame"
+Requesting a particular byte ordering may be necessary when you work with
+binary data coming from some specific architecture whereas your program could
+run on a totally different system. As an example, assume you have 24 bytes
+containing a stack frame as it happens on an Intel 8086:
+.PP
+.Vb 11
+\&       +\-\-\-\-\-\-\-\-\-+        +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
+\&  TOS: |   IP    |  TOS+4:| FL | FH | FLAGS  TOS+14:|   SI    |
+\&       +\-\-\-\-\-\-\-\-\-+        +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
+\&       |   CS    |        | AL | AH | AX            |   DI    |
+\&       +\-\-\-\-\-\-\-\-\-+        +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
+\&                          | BL | BH | BX            |   BP    |
+\&                          +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
+\&                          | CL | CH | CX            |   DS    |
+\&                          +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
+\&                          | DL | DH | DX            |   ES    |
+\&                          +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
+.Ve
+.PP
+First, we note that this time-honored 16\-bit CPU uses little-endian order,
+and that's why the low order byte is stored at the lower address. To
+unpack such an (unsigned) short we'll have to use code \f(CW\*(C`v\*(C'\fR. A repeat
+count unpacks all 12 shorts:
+.PP
+.Vb 2
+\& my( $ip, $cs, $flags, $ax, $bx, $cx, $dx, $si, $di, $bp, $ds, $es ) =
+\&     unpack( \*(Aqv12\*(Aq, $frame );
+.Ve
+.PP
+Alternatively, we could have used \f(CW\*(C`C\*(C'\fR to unpack the individually
+accessible byte registers FL, FH, AL, AH, etc.:
+.PP
+.Vb 2
+\& my( $fl, $fh, $al, $ah, $bl, $bh, $cl, $ch, $dl, $dh ) =
+\&     unpack( \*(AqC10\*(Aq, substr( $frame, 4, 10 ) );
+.Ve
+.PP
+It would be nice if we could do this in one fell swoop: unpack a short,
+back up a little, and then unpack 2 bytes. Since Perl \fIis\fR nice, it
+proffers the template code \f(CW\*(C`X\*(C'\fR to back up one byte. Putting this all
+together, we may now write:
+.PP
+.Vb 5
+\& my( $ip, $cs,
+\&     $flags,$fl,$fh,
+\&     $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
+\&     $si, $di, $bp, $ds, $es ) =
+\&     unpack( \*(Aqv2\*(Aq . (\*(AqvXXCC\*(Aq x 5) . \*(Aqv5\*(Aq, $frame );
+.Ve
+.PP
+(The clumsy construction of the template can be avoided \- just read on!)
+.PP
+We've taken some pains to construct the template so that it matches
+the contents of our frame buffer. Otherwise we'd either get undefined values,
+or \f(CW\*(C`unpack\*(C'\fR could not unpack all. If \f(CW\*(C`pack\*(C'\fR runs out of items, it will
+supply null strings (which are coerced into zeroes whenever the pack code
+says so).
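+.PP
+The combined template can be tried out without a real 8086 by first
+packing a frame from made-up register values. In this sketch all the
+values are hypothetical; only the layout follows the diagram above:
+.PP
+.Vb 8
+\& use strict; use warnings;
+\& # Hypothetical IP, CS, FLAGS, AX, BX, CX, DX, SI, DI, BP, DS, ES
+\& my @regs  = ( 0x0100, 0x2000, 0x0246, 0x1234, 0xABCD, 0x0001,
+\&               0xFF00, 0x0010, 0x0020, 0x0030, 0x3000, 0x4000 );
+\& my $frame = pack( \*(Aqv12\*(Aq, @regs );    # 24 bytes of little\-endian shorts
+\& my( $ip, $cs, $flags, $fl, $fh, $ax, $al, $ah ) =
+\&     unpack( \*(Aqv2\*(Aq . (\*(AqvXXCC\*(Aq x 5) . \*(Aqv5\*(Aq, $frame );
+\& printf "AX=%04x AL=%02x AH=%02x\en", $ax, $al, $ah;  # AX=1234 AL=34 AH=12
+.Ve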
+.SS "How to Eat an Egg on a Net"
+.IX Subsection "How to Eat an Egg on a Net"
+The pack code for big-endian (high order byte at the lowest address) is
+\&\f(CW\*(C`n\*(C'\fR for 16 bit and \f(CW\*(C`N\*(C'\fR for 32 bit integers. You use these codes
+if you know that your data comes from a compliant architecture, but,
+surprisingly enough, you should also use these pack codes if you
+exchange binary data, across the network, with some system that you
+know next to nothing about. The simple reason is that this
+order has been chosen as the \fInetwork order\fR, and all standard-fearing
+programs ought to follow this convention. (This is, of course, a stern
+backing for one of the Lilliputian parties and may well influence the
+political development there.) So, if the protocol expects you to send
+a message by sending the length first, followed by just so many bytes,
+you could write:
+.PP
+.Vb 1
+\& my $buf = pack( \*(AqN\*(Aq, length( $msg ) ) . $msg;
+.Ve
+.PP
+or even:
+.PP
+.Vb 1
+\& my $buf = pack( \*(AqNA*\*(Aq, length( $msg ), $msg );
+.Ve
+.PP
+and pass \f(CW$buf\fR to your send routine. Some protocols demand that the
+count should include the length of the count itself: then just add 4
+to the data length. (But make sure to read "Lengths and Widths" before
+you really code this!)
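+.PP
+Here is a sketch of both ends of such an exchange, with a made-up message
+and no actual socket code: the sender puts the count first, and the
+receiver reads the count back before taking exactly that many bytes:
+.PP
+.Vb 7
+\& use strict; use warnings;
+\& my $msg = \*(AqHello, world\*(Aq;                     # 12 bytes
+\& my $buf = pack( \*(AqN\*(Aq, length( $msg ) ) . $msg;   # count first, network order
+\& my ($len)   = unpack( \*(AqN\*(Aq, $buf );
+\& my $payload = substr( $buf, 4, $len );
+\& print unpack( \*(AqH8\*(Aq, $buf ), "\en";             # 0000000c
+\& print "$payload\en";                               # Hello, world
+.Ve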
+.SS "Byte-order modifiers"
+.IX Subsection "Byte-order modifiers"
+In the previous sections we've learned how to use \f(CW\*(C`n\*(C'\fR, \f(CW\*(C`N\*(C'\fR, \f(CW\*(C`v\*(C'\fR and
+\&\f(CW\*(C`V\*(C'\fR to pack and unpack integers with big\- or little-endian byte-order.
+While this is nice, it's still rather limited because it leaves out all
+kinds of signed integers as well as 64\-bit integers. For example, if you
+wanted to unpack a sequence of signed big-endian 16\-bit integers in a
+platform-independent way, you would have to write:
+.PP
+.Vb 1
+\& my @data = unpack \*(Aqs*\*(Aq, pack \*(AqS*\*(Aq, unpack \*(Aqn*\*(Aq, $buf;
+.Ve
+.PP
+This is ugly. As of Perl 5.9.2, there's a much nicer way to express your
+desire for a certain byte-order: the \f(CW\*(C`>\*(C'\fR and \f(CW\*(C`<\*(C'\fR modifiers.
+\&\f(CW\*(C`>\*(C'\fR is the big-endian modifier, while \f(CW\*(C`<\*(C'\fR is the little-endian
+modifier. Using them, we could rewrite the above code as:
+.PP
+.Vb 1
+\& my @data = unpack \*(Aqs>*\*(Aq, $buf;
+.Ve
+.PP
+As you can see, the "big end" of the arrow touches the \f(CW\*(C`s\*(C'\fR, which is a
+nice way to remember that \f(CW\*(C`>\*(C'\fR is the big-endian modifier. The same
+obviously works for \f(CW\*(C`<\*(C'\fR, where the "little end" touches the code.
+.PP
+You will probably find these modifiers even more useful if you have
+to deal with big\- or little-endian C structures. Be sure to read
+"Packing and Unpacking C Structures" for more on that.
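+.PP
+A sketch comparing the two approaches on a made-up buffer of big-endian
+shorts, where 0xFFFE and 0x8000 have the sign bit set:
+.PP
+.Vb 6
+\& use strict; use warnings;
+\& my $buf  = pack( \*(Aqn*\*(Aq, 1, 0xFFFE, 0x8000 );
+\& my @ugly = unpack \*(Aqs*\*(Aq, pack \*(AqS*\*(Aq, unpack \*(Aqn*\*(Aq, $buf;
+\& my @nice = unpack \*(Aqs>*\*(Aq, $buf;
+\& print "@nice\en";                                     # 1 \-2 \-32768
+\& print unpack( \*(AqH*\*(Aq, pack( \*(Aqs<\*(Aq, \-2 ) ), "\en";    # feff
+.Ve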
+.SS "Floating point Numbers"
+.IX Subsection "Floating point Numbers"
+For packing floating point numbers you have the choice between the
+pack codes \f(CW\*(C`f\*(C'\fR, \f(CW\*(C`d\*(C'\fR, \f(CW\*(C`F\*(C'\fR and \f(CW\*(C`D\*(C'\fR. \f(CW\*(C`f\*(C'\fR and \f(CW\*(C`d\*(C'\fR pack into (or unpack
+from) single-precision or double-precision representation as it is provided
+by your system. If your system supports it, \f(CW\*(C`D\*(C'\fR can be used to pack and
+unpack \f(CW\*(C`long double\*(C'\fR values, which can offer even more resolution
+than \f(CW\*(C`f\*(C'\fR or \f(CW\*(C`d\*(C'\fR. \fBNote that there are different long double formats.\fR
+.PP
+\&\f(CW\*(C`F\*(C'\fR packs an \f(CW\*(C`NV\*(C'\fR, which is the floating point type used by Perl
+internally.
+.PP
+There is no such thing as a network representation for reals, so if
+you want to send your real numbers across computer boundaries, you'd
+better stick to text representation, possibly using the hexadecimal
+float format (avoiding the decimal conversion loss), unless you're
+absolutely sure what's on the other end of the line. For the even more
+adventuresome, you can use the byte-order modifiers from the previous
+section also on floating point codes.
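+.PP
+A small sketch of a byte-order modifier on a floating point code. It
+assumes IEEE 754 double precision, which is what \f(CW\*(C`d\*(C'\fR uses on virtually
+all current platforms:
+.PP
+.Vb 5
+\& use strict; use warnings;
+\& my $be  = pack( \*(Aqd>\*(Aq, 1.5 );          # big\-endian, whatever the host order
+\& my ($x) = unpack( \*(Aqd>\*(Aq, $be );        # 1.5 again
+\& print unpack( \*(AqH*\*(Aq, $be ), "\en";      # 3ff8000000000000
+\& printf "%a\en", $x;                      # hexadecimal float text (Perl 5.22+)
+.Ve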
+.SH "Exotic Templates"
+.IX Header "Exotic Templates"
+.SS "Bit Strings"
+.IX Subsection "Bit Strings"
+Bits are the atoms in the memory world. Access to individual bits may
+have to be used either as a last resort or because it is the most
+convenient way to handle your data. Bit string (un)packing converts
+between strings containing a series of \f(CW0\fR and \f(CW1\fR characters and
+a sequence of bytes each containing a group of 8 bits. This is almost
+as simple as it sounds, except that there are two ways the contents of
+a byte may be written as a bit string. Let's have a look at an annotated
+byte:
+.PP
+.Vb 5
+\&   7 6 5 4 3 2 1 0
+\& +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
+\& | 1 0 0 0 1 1 0 0 |
+\& +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
+\&  MSB           LSB
+.Ve
+.PP
+It's egg-eating all over again: Some think that as a bit string this should
+be written "10001100" i.e. beginning with the most significant bit, others
+insist on "00110001". Well, Perl isn't biased, so that's why we have two bit
+string codes:
+.PP
+.Vb 2
+\& $byte = pack( \*(AqB8\*(Aq, \*(Aq10001100\*(Aq ); # start with MSB
+\& $byte = pack( \*(Aqb8\*(Aq, \*(Aq00110001\*(Aq ); # start with LSB
+.Ve
+.PP
+It is not possible to pack or unpack bit fields \- just integral bytes.
+\&\f(CW\*(C`pack\*(C'\fR always starts at the next byte boundary and "rounds up" to the
+next multiple of 8 by adding zero bits as required. (If you do want bit
+fields, there is "vec" in perlfunc. Or you could implement bit field
+handling at the character string level, using split, substr, and
+concatenation on unpacked bit strings.)
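+.PP
+The following sketch confirms that the two calls above build the same
+byte, and shows how a short bit string is padded to a byte boundary:
+.PP
+.Vb 6
+\& use strict; use warnings;
+\& my $msb = pack( \*(AqB8\*(Aq, \*(Aq10001100\*(Aq );   # start with MSB
+\& my $lsb = pack( \*(Aqb8\*(Aq, \*(Aq00110001\*(Aq );   # start with LSB
+\& printf "%02x %02x\en", ord $msb, ord $lsb;  # 8c 8c
+\& my $pad = pack( \*(AqB3\*(Aq, \*(Aq101\*(Aq );        # rounded up with zero bits
+\& print unpack( \*(AqB8\*(Aq, $pad ), "\en";       # 10100000
+.Ve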
+.PP
+To illustrate unpacking for bit strings, we'll decompose a simple
+status register (a "\-" stands for a "reserved" bit):
+.PP
+.Vb 4
+\& +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
+\& | S Z \- A \- P \- C | \- \- \- \- O D I T |
+\& +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
+\&  MSB           LSB MSB           LSB
+.Ve
+.PP
+Converting these two bytes to a string can be done with the unpack
+template \f(CW\*(Aqb16\*(Aq\fR. To obtain the individual bit values from the bit
+string we use \f(CW\*(C`split\*(C'\fR with the "empty" separator pattern which dissects
+into individual characters. Bit values from the "reserved" positions are
+simply assigned to \f(CW\*(C`undef\*(C'\fR, a convenient notation for "I don't care where
+this goes".
+.PP
+.Vb 3
+\& ($carry, undef, $parity, undef, $auxcarry, undef, $zero, $sign,
+\&  $trace, $interrupt, $direction, $overflow) =
+\&     split( //, unpack( \*(Aqb16\*(Aq, $status ) );
+.Ve
+.PP
+We could have used an unpack template \f(CW\*(Aqb12\*(Aq\fR just as well, since the
+last 4 bits can be ignored anyway.
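+.PP
+To try this without real hardware, the sketch below builds a hypothetical
+status word with carry, zero and interrupt set, writing each byte's bits
+LSB first, as \f(CW\*(Aqb16\*(Aq\fR expects:
+.PP
+.Vb 8
+\& use strict; use warnings;
+\& # LSB first per byte: C \- P \- A \- Z S | T I D O \- \- \- \-
+\& my $status = pack( \*(Aqb16\*(Aq, \*(Aq10000010\*(Aq . \*(Aq01000000\*(Aq );
+\& my( $carry, undef, $parity, undef, $auxcarry, undef, $zero, $sign,
+\&     $trace, $interrupt, $direction, $overflow ) =
+\&     split( //, unpack( \*(Aqb16\*(Aq, $status ) );
+\& print "C=$carry Z=$zero I=$interrupt S=$sign\en";  # C=1 Z=1 I=1 S=0
+.Ve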
+.SS Uuencoding
+.IX Subsection "Uuencoding"
+Another odd-man-out in the template alphabet is \f(CW\*(C`u\*(C'\fR, which packs a
+"uuencoded string". ("uu" is short for Unix-to-Unix.) Chances are that
+you won't ever need this encoding technique which was invented to overcome
+the shortcomings of old-fashioned transmission mediums that do not support
+other than simple ASCII data. The essential recipe is simple: Take three
+bytes, or 24 bits. Split them into 4 six-packs, adding a space (0x20) to
+each. Repeat until all of the data is blended. Fold groups of 4 bytes into
+lines no longer than 60 and garnish them in front with the original byte count
+(incremented by 0x20) and a \f(CW"\en"\fR at the end. \- The \f(CW\*(C`pack\*(C'\fR chef will
+prepare this for you, a la minute, when you select pack code \f(CW\*(C`u\*(C'\fR on the menu:
+.PP
+.Vb 1
+\& my $uubuf = pack( \*(Aqu\*(Aq, $bindat );
+.Ve
+.PP
+A repeat count after \f(CW\*(C`u\*(C'\fR sets the number of bytes to put into a
+uuencoded line, which is the maximum of 45 by default, but could be
+set to some (smaller) integer multiple of three. \f(CW\*(C`unpack\*(C'\fR simply ignores
+the repeat count.
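+.PP
+A short sketch of a round trip. The three-byte input "Cat" is arbitrary,
+and the second call shows the default of 45 bytes per line:
+.PP
+.Vb 6
+\& use strict; use warnings;
+\& my $uu    = pack( \*(Aqu\*(Aq, \*(AqCat\*(Aq );      # "#0V%T\en": count, 4 characters, newline
+\& my $back  = unpack( \*(Aqu\*(Aq, $uu );        # "Cat" again
+\& my $long  = pack( \*(Aqu\*(Aq, \*(Aqx\*(Aq x 100 );  # lines of 45, 45 and 10 bytes
+\& my $lines = () = $long =~ /\en/g;         # 3
+\& print "$back $lines\en";                  # Cat 3
+.Ve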
+.SS "Doing Sums"
+.IX Subsection "Doing Sums"
+An even stranger template code is \f(CW\*(C`%\*(C'\fR<\fInumber\fR>. First, because
+it's used as a prefix to some other template code. Second, because it
+cannot be used in \f(CW\*(C`pack\*(C'\fR at all, and third, because in \f(CW\*(C`unpack\*(C'\fR it doesn't return the
+data as defined by the template code it precedes. Instead it'll give you an
+integer of \fInumber\fR bits that is computed from the data value by
+doing sums. For numeric unpack codes, no big feat is achieved:
+.PP
+.Vb 2
+\& my $buf = pack( \*(Aqiii\*(Aq, 100, 20, 3 );
+\& print unpack( \*(Aq%32i3\*(Aq, $buf ), "\en"; # prints 123
+.Ve
+.PP
+For string values, \f(CW\*(C`%\*(C'\fR returns the sum of the byte values saving
+you the trouble of a sum loop with \f(CW\*(C`substr\*(C'\fR and \f(CW\*(C`ord\*(C'\fR:
+.PP
+.Vb 1
+\& print unpack( \*(Aq%32A*\*(Aq, "\ex01\ex10" ), "\en"; # prints 17
+.Ve
+.PP
+Although the \f(CW\*(C`%\*(C'\fR code is documented as returning a "checksum":
+don't put your trust in such values! Even when applied to a small number
+of bytes, they won't guarantee a noticeable Hamming distance.
+.PP
+In connection with \f(CW\*(C`b\*(C'\fR or \f(CW\*(C`B\*(C'\fR, \f(CW\*(C`%\*(C'\fR simply adds bits, and this can be put
+to good use to count set bits efficiently:
+.PP
+.Vb 1
+\& my $bitcount = unpack( \*(Aq%32b*\*(Aq, $mask );
+.Ve
+.PP
+And an even parity bit can be determined like this:
+.PP
+.Vb 1
+\& my $evenparity = unpack( \*(Aq%1b*\*(Aq, $mask );
+.Ve
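+.PP
+The three uses of \f(CW\*(C`%\*(C'\fR shown above can be checked together in a sketch
+with an arbitrary 16\-bit mask:
+.PP
+.Vb 6
+\& use strict; use warnings;
+\& my $mask       = pack( \*(AqB16\*(Aq, \*(Aq1011000000000001\*(Aq );  # 4 bits set
+\& my $bitcount   = unpack( \*(Aq%32b*\*(Aq, $mask );             # 4
+\& my $evenparity = unpack( \*(Aq%1b*\*(Aq,  $mask );             # 0: count is even
+\& my $bytesum    = unpack( \*(Aq%32C*\*(Aq, "\ex01\ex10" );       # 1 + 16 = 17
+\& print "$bitcount $evenparity $bytesum\en";                # 4 0 17
+.Ve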
+.SS Unicode
+.IX Subsection "Unicode"
+Unicode is a character set that can represent most characters in most of
+the world's languages, providing room for over one million different
+characters. Unicode 3.1 specifies 94,140 characters: The Basic Latin
+characters are assigned to the numbers 0 \- 127. The Latin\-1 Supplement with
+characters that are used in several European languages is in the next
+range, up to 255. After some more Latin extensions we find the character
+sets from languages using non-Roman alphabets, interspersed with a
+variety of symbol sets such as currency symbols, Zapf Dingbats or Braille.
+(You might want to visit <https://www.unicode.org/> for a look at some of
+them \- my personal favourites are Telugu and Kannada.)
+.PP
+The Unicode character set associates characters with integers. Encoding
+these numbers in an equal number of bytes would more than double the
+requirements for storing texts written in Latin alphabets.
+The UTF\-8 encoding avoids this by storing the most common (from a western
+point of view) characters in a single byte while encoding the rarer
+ones in two to four bytes.
+.PP
+Perl uses UTF\-8, internally, for most Unicode strings.
+.PP
+So what has this got to do with \f(CW\*(C`pack\*(C'\fR? Well, if you want to compose a
+Unicode string (that is internally encoded as UTF\-8), you can do so by
+using template code \f(CW\*(C`U\*(C'\fR. As an example, let's produce the Euro currency
+symbol (code number 0x20AC):
+.PP
+.Vb 2
+\& $UTF8{Euro} = pack( \*(AqU\*(Aq, 0x20AC );
+\& # Equivalent to: $UTF8{Euro} = "\ex{20ac}";
+.Ve
+.PP
+Inspecting \f(CW$UTF8{Euro}\fR shows that it contains 3 bytes:
+"\exe2\ex82\exac". However, it contains only 1 character, number 0x20AC.
+The round trip can be completed with \f(CW\*(C`unpack\*(C'\fR:
+.PP
+.Vb 1
+\& $Unicode{Euro} = unpack( \*(AqU\*(Aq, $UTF8{Euro} );
+.Ve
+.PP
+Unpacking using the \f(CW\*(C`U\*(C'\fR template code also works on UTF\-8 encoded byte
+strings.
+.PP
+Usually you'll want to pack or unpack UTF\-8 strings:
+.PP
+.Vb 3
+\& # pack and unpack the Hebrew alphabet
+\& my $alefbet = pack( \*(AqU*\*(Aq, 0x05d0..0x05ea );
+\& my @hebrew = unpack( \*(AqU*\*(Aq, $alefbet );
+.Ve
+.PP
+Please note: in the general case, you're better off using
+\&\f(CW\*(C`Encode::decode(\*(AqUTF\-8\*(Aq, $utf)\*(C'\fR to decode a UTF\-8
+encoded byte string to a Perl Unicode string, and
+\&\f(CW\*(C`Encode::encode(\*(AqUTF\-8\*(Aq, $str)\*(C'\fR to encode a Perl Unicode
+string to UTF\-8 bytes. These functions provide means of handling invalid byte
+sequences and generally have a friendlier interface.
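+.PP
+A minimal sketch of that round trip with the core \f(CW\*(C`Encode\*(C'\fR module,
+reusing the Euro sign from above: encoding yields three bytes, and
+decoding them gives back a single character.
+.PP
+.Vb 8
+\& use Encode qw( encode decode );
+\&
+\& my $str   = "\ex{20ac}";                     # one character: EURO SIGN
+\& my $bytes = encode( \*(AqUTF\-8\*(Aq, $str );     # "\exe2\ex82\exac"
+\& my $back  = decode( \*(AqUTF\-8\*(Aq, $bytes );   # "\ex{20ac}" again
+\& printf "%d bytes, %d character(s), U+%04X\en",
+\&     length( $bytes ), length( $back ), ord( $back );
+\& # prints "3 bytes, 1 character(s), U+20AC"
+.Ve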
+.SS "Another Portable Binary Encoding"
+.IX Subsection "Another Portable Binary Encoding"
+The pack code \f(CW\*(C`w\*(C'\fR has been added to support a portable binary data
+encoding scheme that goes way beyond simple integers. (Details can
+be found at <https://github.com/mworks\-project/mw_scarab/blob/master/Scarab\-0.1.00d19/doc/binary\-serialization.txt>,
+the Scarab project.) A BER (Binary Encoded
+Representation) compressed unsigned integer stores base 128
+digits, most significant digit first, with as few digits as possible.
+Bit eight (the high bit) is set on each byte except the last. There
+is no size limit to BER encoding, but Perl won't go to extremes.
+.PP
+.Vb 1
+\& my $berbuf = pack( \*(Aqw*\*(Aq, 1, 128, 128+1, 128*128+127 );
+.Ve
+.PP
+A hex dump of \f(CW$berbuf\fR, with spaces inserted at the right places,
+shows 01 8100 8101 81807F. Since the last byte is always less than
+128, \f(CW\*(C`unpack\*(C'\fR knows where to stop.
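+.PP
+To make the base\-128 scheme concrete, here is a sketch of a hand\-written
+encoder. The \f(CW\*(C`ber_encode\*(C'\fR helper is hypothetical, not part of Perl; for
+non\-negative integers it should produce the same bytes as \f(CW\*(C`pack\*(C'\fR with
+\&\f(CW\*(C`w\*(C'\fR, which the last lines check.
+.PP
+.Vb 16
+\& # Hypothetical helper: BER\-encode one non\-negative integer by hand.
+\& sub ber_encode {
+\&     my ($n) = @_;
+\&     my @digits = ( $n & 0x7f );       # last digit: high bit clear
+\&     while ( $n >>= 7 ) {
+\&         # earlier digits go in front, each with the high bit set
+\&         unshift @digits, 0x80 | ( $n & 0x7f );
+\&     }
+\&     return pack( \*(AqC*\*(Aq, @digits );
+\& }
+\&
+\& my @values = ( 1, 128, 128+1, 128*128+127 );
+\& my $mine   = join( \*(Aq\*(Aq, map( ber_encode($_), @values ) );
+\& print unpack( \*(AqH*\*(Aq, $mine ), "\en";    # prints "018100810181807f"
+\& print "same\en" if $mine eq pack( \*(Aqw*\*(Aq, @values );
+\& my @back = unpack( \*(Aqw*\*(Aq, $mine );      # (1, 128, 129, 16511)
+.Ve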
+.SH "Template Grouping"
+.IX Header "Template Grouping"
+Prior to Perl 5.8, repetitions of templates had to be made by
+\&\f(CW\*(C`x\*(C'\fR\-multiplication of template strings. Now there is a better way as
+we may use the pack codes \f(CW\*(C`(\*(C'\fR and \f(CW\*(C`)\*(C'\fR combined with a repeat count.
+The \f(CW\*(C`unpack\*(C'\fR template from the Stack Frame example can simply
+be written like this:
+.PP
+.Vb 1
+\& unpack( \*(Aqv2 (vXXCC)5 v5\*(Aq, $frame )
+.Ve
+.PP
+Let's explore this feature a little more. We'll begin with the equivalent of
+.PP
+.Vb 1
+\& join( \*(Aq\*(Aq, map( substr( $_, 0, 1 ), @str ) )
+.Ve
+.PP
+which returns a string consisting of the first character from each string.
+Using pack, we can write
+.PP
+.Vb 1
+\& pack( \*(Aq(A)\*(Aq.@str, @str )
+.Ve
+.PP
+or, because a repeat count \f(CW\*(C`*\*(C'\fR means "repeat as often as required",
+simply
+.PP
+.Vb 1
+\& pack( \*(Aq(A)*\*(Aq, @str )
+.Ve
+.PP
+(Note that the template \f(CW\*(C`A*\*(C'\fR would only have packed \f(CW$str[0]\fR in full
+length.)
+.PP
+To pack dates stored as triplets ( day, month, year ) in an array \f(CW@dates\fR
+into a sequence of byte, byte, short integer we can write
+.PP
+.Vb 1
+\& $pd = pack( \*(Aq(CCS)*\*(Aq, map( @$_, @dates ) );
+.Ve
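+.PP
+Here is a small, self\-contained sketch of both grouping examples, with
+made\-up sample data; unpacking with the same \f(CW\*(C`(CCS)*\*(C'\fR template
+recovers the flat list of triplets.
+.PP
+.Vb 7
+\& my @str    = qw( apple banana cherry );
+\& my $firsts = pack( \*(Aq(A)*\*(Aq, @str );                # "abc"
+\&
+\& my @dates = ( [ 24, 12, 2023 ], [ 1, 1, 2024 ] );  # ( day, month, year )
+\& my $pd    = pack( \*(Aq(CCS)*\*(Aq, map( @$_, @dates ) ); # 2 * 4 = 8 bytes
+\& my @flat  = unpack( \*(Aq(CCS)*\*(Aq, $pd );            # (24, 12, 2023, 1, 1, 2024)
+\& print "$firsts ", length( $pd ), " @flat\en";  # prints "abc 8 24 12 2023 1 1 2024"
+.Ve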
+.PP
+To swap pairs of characters in a string (with even length) one could use
+several techniques. First, let's use \f(CW\*(C`x\*(C'\fR and \f(CW\*(C`X\*(C'\fR to skip forward and back:
+.PP
+.Vb 1
+\& $s = pack( \*(Aq(A)*\*(Aq, unpack( \*(Aq(xAXXAx)*\*(Aq, $s ) );
+.Ve
+.PP
+We can also use \f(CW\*(C`@\*(C'\fR to jump to an offset, with 0 being the position where
+we were when the last \f(CW\*(C`(\*(C'\fR was encountered:
+.PP
+.Vb 1
+\& $s = pack( \*(Aq(A)*\*(Aq, unpack( \*(Aq(@1A @0A @2)*\*(Aq, $s ) );
+.Ve
+.PP
+Finally, there is also an entirely different approach by unpacking big
+endian shorts and packing them in the reverse byte order:
+.PP
+.Vb 1
+\& $s = pack( \*(Aq(v)*\*(Aq, unpack( \*(Aq(n)*\*(Aq, $s ) );
+.Ve
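+.PP
+A quick way to convince yourself that the three techniques agree is to
+run them side by side on a sample string of even length:
+.PP
+.Vb 5
+\& my $s  = "abcdef";
+\& my $t1 = pack( \*(Aq(A)*\*(Aq, unpack( \*(Aq(xAXXAx)*\*(Aq, $s ) );
+\& my $t2 = pack( \*(Aq(A)*\*(Aq, unpack( \*(Aq(@1A @0A @2)*\*(Aq, $s ) );
+\& my $t3 = pack( \*(Aq(v)*\*(Aq, unpack( \*(Aq(n)*\*(Aq, $s ) );
+\& print "$t1 $t2 $t3\en";    # prints "badcfe badcfe badcfe"
+.Ve
+.PP
+One caveat: \f(CW\*(C`unpack\*(C'\fR with \f(CW\*(C`A\*(C'\fR strips trailing spaces and null bytes, so
+the first two variants turn a null byte into a space. Using \f(CW\*(C`a\*(C'\fR instead of
+\&\f(CW\*(C`A\*(C'\fR, or the \f(CW\*(C`n\*(C'\fR/\f(CW\*(C`v\*(C'\fR variant, copies the bytes unchanged.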
+.SH "Lengths and Widths"
+.IX Header "Lengths and Widths"
+.SS "String Lengths"
+.IX Subsection "String Lengths"
+In the previous section we've seen a network message that was constructed
+by prefixing the binary message length to the actual message. You'll find
+that packing a length followed by so many bytes of data is a
+frequently used recipe since appending a null byte won't work
+if a null byte may be part of the data. Here is an example where both
+techniques are used: after two null terminated strings with source and
+destination address, a Short Message (to a mobile phone) is sent after
+a length byte:
+.PP
+.Vb 1
+\& my $msg = pack( \*(AqZ*Z*CA*\*(Aq, $src, $dst, length( $sm ), $sm );
+.Ve
+.PP
+Unpacking this message can be done with the same template:
+.PP
+.Vb 1
+\& ( $src, $dst, $len, $sm ) = unpack( \*(AqZ*Z*CA*\*(Aq, $msg );
+.Ve
+.PP
+There's a subtle trap lurking in the offing: Adding another field after
+the Short Message (in variable \f(CW$sm\fR) is all right when packing, but this
+cannot be unpacked naively:
+.PP
+.Vb 2
+\& # pack a message
+\& my $msg = pack( \*(AqZ*Z*CA*C\*(Aq, $src, $dst, length( $sm ), $sm, $prio );
+\&
+\& # unpack fails \- $prio remains undefined!
+\& ( $src, $dst, $len, $sm, $prio ) = unpack( \*(AqZ*Z*CA*C\*(Aq, $msg );
+.Ve
+.PP
+The pack code \f(CW\*(C`A*\*(C'\fR gobbles up all remaining bytes, and \f(CW$prio\fR remains
+undefined! Before we let disappointment dampen the morale: Perl's got
+the trump card to make this trick too, just a little further up the sleeve.
+Watch this:
+.PP
+.Vb 2
+\& # pack a message: ASCIIZ, ASCIIZ, length/string, byte
+\& my $msg = pack( \*(AqZ* Z* C/A* C\*(Aq, $src, $dst, $sm, $prio );
+\&
+\& # unpack
+\& ( $src, $dst, $sm, $prio ) = unpack( \*(AqZ* Z* C/A* C\*(Aq, $msg );
+.Ve
+.PP
+Combining two pack codes with a slash (\f(CW\*(C`/\*(C'\fR) associates them with a single
+value from the argument list. In \f(CW\*(C`pack\*(C'\fR, the length of the argument is
+taken and packed according to the first code while the argument itself
+is added after being converted with the template code after the slash.
+This saves us the trouble of inserting the \f(CW\*(C`length\*(C'\fR call, but it is
+in \f(CW\*(C`unpack\*(C'\fR where we really score: The value of the length byte marks the
+end of the string to be taken from the buffer. Since this combination
+only makes sense when the second pack code is \f(CW\*(C`a*\*(C'\fR, \f(CW\*(C`A*\*(C'\fR
+or \f(CW\*(C`Z*\*(C'\fR, Perl won't let you do otherwise.
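+.PP
+The difference between the naive template and the \f(CW\*(C`C/A*\*(C'\fR version shows up
+in a short sketch; the addresses, text and priority below are made\-up
+sample values.
+.PP
+.Vb 11
+\& my ( $src, $dst, $sm, $prio ) = ( "alice", "bob", "Hi there", 7 );
+\&
+\& # Naive: A* swallows the priority byte, so only 4 values come back
+\& my $naive = pack( \*(AqZ*Z*CA*C\*(Aq, $src, $dst, length( $sm ), $sm, $prio );
+\& my @naive = unpack( \*(AqZ*Z*CA*C\*(Aq, $naive );
+\& print scalar( @naive ), "\en";          # prints 4
+\&
+\& # Length\-prefixed: C/A* stops after exactly length( $sm ) bytes
+\& my $msg = pack( \*(AqZ* Z* C/A* C\*(Aq, $src, $dst, $sm, $prio );
+\& my @got = unpack( \*(AqZ* Z* C/A* C\*(Aq, $msg );
+\& print "@got\en";                        # prints "alice bob Hi there 7"
+.Ve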
+.PP
+The pack code preceding \f(CW\*(C`/\*(C'\fR may be anything that's fit to represent a
+number: All the numeric binary pack codes, and even text codes such as
+\&\f(CW\*(C`A4\*(C'\fR or \f(CW\*(C`Z*\*(C'\fR:
+.PP
+.Vb 4
+\& # pack/unpack a string preceded by its length in ASCII
+\& my $buf = pack( \*(AqA4/A*\*(Aq, "Humpty\-Dumpty" );
+\& # $buf holds \*(Aq13  Humpty\-Dumpty\*(Aq (the length as 4 space\-padded chars)
+\& my $txt = unpack( \*(AqA4/A*\*(Aq, $buf );
+.Ve
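+.PP
+Binary protocols commonly use a fixed\-width binary count instead. As a
+sketch, here are two strings, each preceded by a 16\-bit big\-endian
+(network order) length, using \f(CW\*(C`n/a*\*(C'\fR:
+.PP
+.Vb 4
+\& my $rec = pack( \*(Aqn/a* n/a*\*(Aq, "key", "value" );
+\& print unpack( \*(AqH*\*(Aq, $rec ), "\en";   # prints "00036b6579000576616c7565"
+\& my ( $k, $v ) = unpack( \*(Aqn/a* n/a*\*(Aq, $rec );
+\& print "$k=$v\en";                      # prints "key=value"
+.Ve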
+.PP
+\&\f(CW\*(C`/\*(C'\fR is not implemented in Perls before 5.6, so if your code is required to
+work on ancient Perls you'll need to \f(CW\*(C`unpack( \*(AqZ* Z* C\*(Aq)\*(C'\fR to get the length,
+then use it to make a new unpack string. For example
+.PP
+.Vb 3
+\& # pack a message: ASCIIZ, ASCIIZ, length, string, byte
+\& # (5.005 compatible)
+\& my $msg = pack( \*(AqZ* Z* C A* C\*(Aq, $src, $dst, length $sm, $sm, $prio );
+\&
+\& # unpack
+\& ( undef, undef, $len) = unpack( \*(AqZ* Z* C\*(Aq, $msg );
+\& ($src, $dst, $sm, $prio) = unpack ( "Z* Z* x A$len C", $msg );
+.Ve
+.PP
+But that second \f(CW\*(C`unpack\*(C'\fR is rushing ahead. It isn't using a simple literal
+string for the template. So maybe we should introduce...
+.SS "Dynamic Templates"
+.IX Subsection "Dynamic Templates"
+So far, we've seen literals used as templates. If the list of pack
+items doesn't have fixed length, an expression constructing the
+template is required (whenever, for some reason, \f(CW\*(C`()*\*(C'\fR cannot be used).
+Here's an example: To store named string values in a way that can be
+conveniently parsed by a C program, we create a sequence of names and
+null terminated ASCII strings, with \f(CW\*(C`=\*(C'\fR between the name and the value,
+followed by an additional delimiting null byte. Here's how:
+.PP
+.Vb 2
+\& my $env = pack( \*(Aq(A*A*Z*)\*(Aq . keys( %Env ) . \*(AqC\*(Aq,
+\& map( { ( $_, \*(Aq=\*(Aq, $Env{$_} ) } keys( %Env ) ), 0 );
+.Ve
+.PP
+Let's examine the cogs of this byte mill, one by one. There's the \f(CW\*(C`map\*(C'\fR
+call, creating the items we intend to stuff into the \f(CW$env\fR buffer:
+to each key (in \f(CW$_\fR) it adds the \f(CW\*(C`=\*(C'\fR separator and the hash entry value.
+Each triplet is packed with the template code sequence \f(CW\*(C`A*A*Z*\*(C'\fR that
+is repeated according to the number of keys. (Yes, that's what the \f(CW\*(C`keys\*(C'\fR
+function returns in scalar context.) To get the very last null byte,
+we add a \f(CW0\fR at the end of the \f(CW\*(C`pack\*(C'\fR list, to be packed with \f(CW\*(C`C\*(C'\fR.
+(Attentive readers may have noticed that we could have omitted the 0.)
+.PP
+For the reverse operation, we'll have to determine the number of items
+in the buffer before we can let \f(CW\*(C`unpack\*(C'\fR rip it apart:
+.PP
+.Vb 2
+\& my $n = $env =~ tr/\e0// \- 1;
+\& my %env = map( split( /=/, $_, 2 ), unpack( "(Z*)$n", $env ) );
+.Ve
+.PP
+The \f(CW\*(C`tr\*(C'\fR counts the null bytes, and we subtract one for the extra
+null byte at the very end. The \f(CW\*(C`unpack\*(C'\fR call returns a list of
+name-value strings, each of which the \f(CW\*(C`map\*(C'\fR expression splits into a
+name and a value.
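+.PP
+Here is the whole round trip as a runnable sketch. The hash contents are
+made\-up sample values, and the keys are sorted only to make the buffer
+layout predictable:
+.PP
+.Vb 10
+\& my %Env  = ( HOME => "/home/ann", SHELL => "/bin/sh" );
+\& my @keys = sort keys( %Env );
+\&
+\& # "HOME=/home/ann\e0SHELL=/bin/sh\e0" followed by a final "\e0"
+\& my $env = pack( \*(Aq(A*A*Z*)\*(Aq . @keys . \*(AqC\*(Aq,
+\&                 ( map { ( $_, "=", $Env{$_} ) } @keys ), 0 );
+\&
+\& my $n   = ( $env =~ tr/\e0// ) \- 1;     # 3 null bytes, so 2 pairs
+\& my %env = map( split( /=/, $_, 2 ), unpack( "(Z*)$n", $env ) );
+\& print "$_=$env{$_}\en" for sort keys( %env );
+.Ve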
+.SS "Counting Repetitions"
+.IX Subsection "Counting Repetitions"
+Rather than storing a sentinel at the end of a data item (or a list of items),
+we could precede the data with a count. Again, we pack keys and values of
+a hash, preceding each with an unsigned short length count, and up front
+we store the number of pairs:
+.PP
+.Vb 1
+\& my $env = pack( \*(AqS(S/A* S/A*)*\*(Aq, scalar keys( %Env ), %Env );
+.Ve
+.PP
+This simplifies the reverse operation as the number of repetitions can be
+unpacked with the \f(CW\*(C`/\*(C'\fR code:
+.PP
+.Vb 1
+\& my %env = unpack( \*(AqS/(S/A* S/A*)\*(Aq, $env );
+.Ve
+.PP
+Note that this is one of the rare cases where you cannot use the same
+template for \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR because \f(CW\*(C`pack\*(C'\fR can't determine
+a repeat count for a \f(CW\*(C`()\*(C'\fR\-group.
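+.PP
+A short round trip shows the two templates side by side; the hash
+contents are made\-up sample values:
+.PP
+.Vb 5
+\& my %Env = ( LANG => "C", TZ => "UTC" );
+\& my $env = pack( \*(AqS(S/A* S/A*)*\*(Aq, scalar keys( %Env ), %Env );
+\& my %env = unpack( \*(AqS/(S/A* S/A*)\*(Aq, $env );
+\& print join( ",", map( "$_=$env{$_}", sort keys( %env ) ) ), "\en";
+\& # prints "LANG=C,TZ=UTC"
+.Ve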
+.SS "Intel HEX"
+.IX Subsection "Intel HEX"
+Intel HEX is a file format for representing binary data, mostly for
+programming various chips, as a text file. (See
+<https://en.wikipedia.org/wiki/.hex> for a detailed description, and
+<https://en.wikipedia.org/wiki/SREC_(file_format)> for the Motorola
+S\-record format, which can be unravelled using the same technique.)
+Each line begins with a colon (':') and is followed by a sequence of
+hexadecimal characters, specifying a byte count \fIn\fR (8 bit),
+an address (16 bit, big endian), a record type (8 bit), \fIn\fR data bytes
+and a checksum (8 bit) computed as the least significant byte of the two's
+complement of the sum of the preceding bytes. Example: \f(CW\*(C`:0300300002337A1E\*(C'\fR.
+.PP
+The first step in processing such a line is to convert the hexadecimal
+data to binary; from the result we can verify the checksum and extract the
+four fields. No surprise here: we'll start with a simple \f(CW\*(C`pack\*(C'\fR call to
+convert everything to binary:
+.PP
+.Vb 1
+\& my $binrec = pack( \*(AqH*\*(Aq, substr( $hexrec, 1 ) );
+.Ve
+.PP
+The resulting byte sequence is most convenient for checking the checksum.
+Don't slow your program down with a for loop adding the \f(CW\*(C`ord\*(C'\fR values
+of this string's bytes \- the \f(CW\*(C`unpack\*(C'\fR code \f(CW\*(C`%\*(C'\fR is the thing to use
+for computing the 8\-bit sum of all bytes, which must be equal to zero:
+.PP
+.Vb 1
+\& die unless unpack( "%8C*", $binrec ) == 0;
+.Ve
+.PP
+Finally, let's get those four fields. By now, you shouldn't have any
+problems with the first three fields \- but how can we use the byte count
+of the data in the first field as a length for the data field? Here
+the codes \f(CW\*(C`x\*(C'\fR and \f(CW\*(C`X\*(C'\fR come to the rescue, as they permit jumping
+back and forth in the string to unpack.
+.PP
+.Vb 1
+\& my( $addr, $type, $data ) = unpack( "x n C X4 C x3 /a", $binrec );
+.Ve
+.PP
+Code \f(CW\*(C`x\*(C'\fR skips a byte, since we don't need the count yet. Code \f(CW\*(C`n\*(C'\fR takes
+care of the 16\-bit big-endian integer address, and \f(CW\*(C`C\*(C'\fR unpacks the
+record type. Being at offset 4, where the data begins, we need the count.
+\&\f(CW\*(C`X4\*(C'\fR brings us back to square one, which is the byte at offset 0.
+Now we pick up the count, and zoom forth to offset 4, where we are
+now fully furnished to extract the exact number of data bytes, leaving
+the trailing checksum byte alone.
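+.PP
+Putting the three steps together for the sample record shown earlier
+gives a small sketch like the following (error handling is reduced to a
+plain \f(CW\*(C`die\*(C'\fR):
+.PP
+.Vb 9
+\& my $hexrec = ":0300300002337A1E";
+\&
+\& my $binrec = pack( \*(AqH*\*(Aq, substr( $hexrec, 1 ) );
+\& die "checksum error\en" unless unpack( "%8C*", $binrec ) == 0;
+\&
+\& my ( $addr, $type, $data ) = unpack( "x n C X4 C x3 /a", $binrec );
+\& printf "addr=%04X type=%02X data=%s\en",
+\&     $addr, $type, unpack( \*(AqH*\*(Aq, $data );
+\& # prints "addr=0030 type=00 data=02337a"
+.Ve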
+.SH "Packing and Unpacking C Structures"
+.IX Header "Packing and Unpacking C Structures"
+In previous sections we have seen how to pack numbers and character
+strings. If it were not for a couple of snags we could conclude this
+section right away with the terse remark that C structures don't
+contain anything else, and therefore you already know all there is to it.
+Sorry, no: read on, please.
+.PP
+If you have to deal with a lot of C structures, and don't want to
+hack all your template strings manually, you'll probably want to have
+a look at the CPAN module \f(CW\*(C`Convert::Binary::C\*(C'\fR. Not only can it parse
+your C source directly, but it also has built-in support for all the
+odds and ends described further on in this section.
+.SS "The Alignment Pit"
+.IX Subsection "The Alignment Pit"
+In the trade-off between speed and memory requirements, the balance
+has been tilted in favor of faster execution. This has influenced the
+way C compilers allocate memory for structures: On architectures
+where a 16\-bit or 32\-bit operand can be moved faster between places in
+memory, or to or from a CPU register, if it is aligned at an even or
+multiple-of-four or even at a multiple-of-eight address, a C compiler
+will give you this speed benefit by stuffing extra bytes into structures.
+If you don't cross the C shoreline this is not likely to cause you any
+grief (although you should care when you design large data structures,
+or you want your code to be portable between architectures (you do want
+that, don't you?)).
+.PP
+To see how this affects \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR, we'll compare these two
+C structures:
+.PP
+.Vb 6
+\& typedef struct {
+\&   char  c1;
+\&   short s;
+\&   char  c2;
+\&   long  l;
+\& } gappy_t;
+\&
+\& typedef struct {
+\&   long  l;
+\&   short s;
+\&   char  c1;
+\&   char  c2;
+\& } dense_t;
+.Ve
+.PP
+Typically, a C compiler allocates 12 bytes to a \f(CW\*(C`gappy_t\*(C'\fR variable, but
+requires only 8 bytes for a \f(CW\*(C`dense_t\*(C'\fR (assuming 16\-bit shorts and
+32\-bit longs). After investigating this further, we can draw memory maps,
+showing where the extra 4 bytes are hidden:
+.PP
+.Vb 5
+\&   0           +4          +8          +12
+\&   +\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+
+\&   |c1|xx|  s  |c2|xx|xx|xx|     l     |    xx = fill byte
+\&   +\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+
+\&   gappy_t
+\&
+\&   0           +4          +8
+\&   +\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+
+\&   |     l     |  s  |c1|c2|
+\&   +\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+\-\-+
+\&   dense_t
+.Ve
+.PP
+And that's where the first quirk strikes: \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR
+templates have to be stuffed with \f(CW\*(C`x\*(C'\fR codes to get those extra fill bytes.
+.PP
+The natural question: "Why can't Perl compensate for the gaps?" warrants
+an answer. One good reason is that C compilers might provide (non-ANSI)
+extensions permitting all sorts of fancy control over the way structures
+are aligned, even at the level of an individual structure field. And, if
+this were not enough, there is an insidious thing called \f(CW\*(C`union\*(C'\fR where
+the amount of fill bytes cannot be derived from the alignment of the next
+item alone.
+.PP
+OK, so let's bite the bullet. Here's one way to get the alignment right
+by inserting template codes \f(CW\*(C`x\*(C'\fR, which don't take a corresponding item
+from the list:
+.PP
+.Vb 1
+\& my $gappy = pack( \*(Aqcxs cxxx l!\*(Aq, $c1, $s, $c2, $l );
+.Ve
+.PP
+Note the \f(CW\*(C`!\*(C'\fR after \f(CW\*(C`l\*(C'\fR: We want to make sure that we pack a long
+integer as it is compiled by our C compiler. And even now, it will only
+work for the platforms where the compiler aligns things as above.
+And somebody somewhere has a platform where it doesn't.
+[Probably a Cray, where \f(CW\*(C`short\*(C'\fRs, \f(CW\*(C`int\*(C'\fRs and \f(CW\*(C`long\*(C'\fRs are all 8 bytes. :\-)]
+.PP
+Counting bytes and watching alignments in lengthy structures is bound to
+be a drag. Isn't there a way we can create the template with a simple
+program? Here's a C program that does the trick:
+.PP
+.Vb 2
+\& #include <stdio.h>
+\& #include <stddef.h>
+\&
+\& typedef struct {
+\&   char  fc1;
+\&   short fs;
+\&   char  fc2;
+\&   long  fl;
+\& } gappy_t;
+\&
+\& /* offsetof yields a size_t, so cast it to match %d */
+\& #define Pt(struct,field,tchar) \e
+\&   printf( "@%d%s ", (int) offsetof(struct,field), # tchar );
+\&
+\& int main() {
+\&   Pt( gappy_t, fc1, c  );
+\&   Pt( gappy_t, fs,  s! );
+\&   Pt( gappy_t, fc2, c  );
+\&   Pt( gappy_t, fl,  l! );
+\&   printf( "\en" );
+\&   return 0;
+\& }
+.Ve
+.PP
+The output line can be used as a template in a \f(CW\*(C`pack\*(C'\fR or \f(CW\*(C`unpack\*(C'\fR call:
+.PP
+.Vb 1
+\& my $gappy = pack( \*(Aq@0c @2s! @4c @8l!\*(Aq, $c1, $s, $c2, $l );
+.Ve
+.PP
+Gee, yet another template code \- as if we hadn't plenty. But
+\&\f(CW\*(C`@\*(C'\fR saves our day by enabling us to specify the offset from the beginning
+of the pack buffer to the next item: This is just the value
+the \f(CW\*(C`offsetof\*(C'\fR macro (defined in \f(CW\*(C`<stddef.h>\*(C'\fR) returns when
+given a \f(CW\*(C`struct\*(C'\fR type and one of its field names ("member-designator" in
+C standardese).
+.PP
+Neither using offsets nor adding \f(CW\*(C`x\*(C'\fR's to bridge the gaps is satisfactory.
+(Just imagine what happens if the structure changes.) What we really need
+is a way of saying "skip as many bytes as required to the next multiple of N".
+In fluent templates, you say this with \f(CW\*(C`x!N\*(C'\fR where N is replaced by the
+appropriate value. Here's the next version of our struct packaging:
+.PP
+.Vb 1
+\& my $gappy = pack( \*(Aqc x!2 s c x!4 l!\*(Aq, $c1, $s, $c2, $l );
+.Ve
+.PP
+That's certainly better, but we still have to know how long all the
+integers are, and portability is far away. Rather than \f(CW2\fR,
+for instance, we want to say "however long a short is". But this can be
+done by enclosing the appropriate pack code in brackets: \f(CW\*(C`[s]\*(C'\fR. So, here's
+the very best we can do:
+.PP
+.Vb 1
+\& my $gappy = pack( \*(Aqc x![s] s c x![l!] l!\*(Aq, $c1, $s, $c2, $l );
+.Ve
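+.PP
+To compare the approaches, the following sketch packs the same sample
+values with three of the templates above. It deliberately uses the
+fixed\-size codes \f(CW\*(C`s\*(C'\fR (16 bits) and \f(CW\*(C`l\*(C'\fR (32 bits) instead of the
+native \f(CW\*(C`s!\*(C'\fR and \f(CW\*(C`l!\*(C'\fR, so that the 12\-byte layout drawn above
+holds on any platform; all three buffers should come out identical.
+.PP
+.Vb 11
+\& my @vals = ( 1, 2, 3, 4 );             # c1, s, c2, l (sample values)
+\& my @templates = (
+\&     \*(Aqcxs cxxx l\*(Aq,                   # explicit fill bytes
+\&     \*(Aq@0c @2s @4c @8l\*(Aq,              # absolute offsets
+\&     \*(Aqc x![s] s c x![l] l\*(Aq,          # align to the size of each type
+\& );
+\& my @bufs = map { pack( $_, @vals ) } @templates;
+\& for my $buf (@bufs) {
+\&     printf "%2d bytes: %s\en", length( $buf ), unpack( \*(AqH*\*(Aq, $buf );
+\& }
+\& print "identical\en" if $bufs[0] eq $bufs[1] && $bufs[1] eq $bufs[2];
+.Ve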
+.SS "Dealing with Endian-ness"
+.IX Subsection "Dealing with Endian-ness"
+Now, imagine that we want to pack the data for a machine with a
+different byte-order. First, we'll have to figure out how big the data
+types on the target machine really are. Let's assume that the longs are
+32 bits wide and the shorts are 16 bits wide. You can then rewrite the
+template as:
+.PP
+.Vb 1
+\& my $gappy = pack( \*(Aqc x![s] s c x![l] l\*(Aq, $c1, $s, $c2, $l );
+.Ve
+.PP
+If the target machine is little-endian, we could write:
+.PP
+.Vb 1
+\& my $gappy = pack( \*(Aqc x![s] s< c x![l] l<\*(Aq, $c1, $s, $c2, $l );
+.Ve
+.PP
+This forces the short and the long members to be little-endian, and is
+just fine if you don't have too many struct members. But we could also
+use the byte-order modifier on a group and write the following:
+.PP
+.Vb 1
+\& my $gappy = pack( \*(Aq( c x![s] s c x![l] l )<\*(Aq, $c1, $s, $c2, $l );
+.Ve
+.PP
+This is not as short as before, but it makes it more obvious that we
+intend to have little-endian byte-order for a whole group, not only
+for individual template codes. It can also be more readable and easier
+to maintain.
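+.PP
+The effect of the group modifier is easy to see in a hex dump. In this
+sketch the sample values are chosen so that every byte of the short and
+of the long is distinct:
+.PP
+.Vb 5
+\& my @vals = ( 1, 0x0203, 4, 0x05060708 );
+\& my $le = pack( \*(Aq( c x![s] s c x![l] l )<\*(Aq, @vals );
+\& my $be = pack( \*(Aq( c x![s] s c x![l] l )>\*(Aq, @vals );
+\& print unpack( \*(AqH*\*(Aq, $le ), "\en";   # prints "010003020400000008070605"
+\& print unpack( \*(AqH*\*(Aq, $be ), "\en";   # prints "010002030400000005060708"
+.Ve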
+.SS "Alignment, Take 2"
+.IX Subsection "Alignment, Take 2"
+I'm afraid that we're not quite through with the alignment catch yet. The
+hydra raises another ugly head when you pack arrays of structures:
+.PP
+.Vb 4
+\& typedef struct {
+\&   short count;
+\&   char  glyph;
+\& } cell_t;
+\&
+\& typedef cell_t buffer_t[BUFLEN];
+.Ve
+.PP
+Where's the catch? Padding is neither required before the first field \f(CW\*(C`count\*(C'\fR,
+nor between this and the next field \f(CW\*(C`glyph\*(C'\fR, so why can't we simply pack
+like this:
+.PP
+.Vb 3
+\& # something goes wrong here:
+\& pack( \*(Aqs!a\*(Aq x @buffer,
+\& map{ ( $_\->{count}, $_\->{glyph} ) } @buffer );
+.Ve
+.PP
+This packs \f(CW\*(C`3*@buffer\*(C'\fR bytes, but it turns out that the size of
+\&\f(CW\*(C`buffer_t\*(C'\fR is four times \f(CW\*(C`BUFLEN\*(C'\fR! The moral of the story is that
+the required alignment of a structure or array is propagated to the
+next higher level where we have to consider padding \fIat the end\fR
+of each component as well. Thus the correct template is:
+.PP
+.Vb 2
+\& pack( \*(Aqs!ax\*(Aq x @buffer,
+\& map{ ( $_\->{count}, $_\->{glyph} ) } @buffer );
+.Ve
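+.PP
+Hard\-coding the trailing \f(CW\*(C`x\*(C'\fR works as long as a short is two bytes.
+As a sketch of an alternative, a repeated group can pad the end of each
+element with \f(CW\*(C`x![s!]\*(C'\fR, that is, up to the next multiple of the size of a
+native short (assumed here to equal its alignment, as is usual). The
+buffer contents are made\-up sample data.
+.PP
+.Vb 6
+\& my @buffer = ( { count => 1, glyph => "a" },
+\&                { count => 2, glyph => "b" } );
+\& my $cells = pack( \*(Aq(s! a x![s!])*\*(Aq,
+\&                   map { ( $_\->{count}, $_\->{glyph} ) } @buffer );
+\& # with 2\-byte shorts: 4 bytes per cell_t, like the C compiler
+\& print length( $cells ), "\en";    # prints 8 where a short is 2 bytes
+.Ve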
+.SS "Alignment, Take 3"
+.IX Subsection "Alignment, Take 3"
+And even if you take all the above into account, ANSI still lets this:
+.PP
+.Vb 3
+\& typedef struct {
+\&   char foo[2];
+\& } foo_t;
+.Ve
+.PP
+vary in size. The alignment constraint of the structure can be greater than
+any of its elements. [And if you think that this doesn't affect anything
+common, dismember the next cellphone that you see. Many have ARM cores, and
+the structure rules of older ARM ABIs make \f(CW\*(C`sizeof (foo_t)\*(C'\fR == 4.]
+.SS "Pointers for How to Use Them"
+.IX Subsection "Pointers for How to Use Them"
+The title of this section indicates the second problem you may run into
+sooner or later when you pack C structures. If the function you intend
+to call expects a, say, \f(CW\*(C`void *\*(C'\fR value, you \fIcannot\fR simply take
+a reference to a Perl variable. (Although that value certainly is a
+memory address, it's not the address where the variable's contents are
+stored.)
+.PP
+Template code \f(CW\*(C`P\*(C'\fR promises to pack a "pointer to a fixed length string".
+Isn't this what we want? Let's try:
+.PP
+.Vb 3
+\& # allocate some storage and pack a pointer to it
+\& my $memory = "\ex00" x $size;
+\& my $memptr = pack( \*(AqP\*(Aq, $memory );
+.Ve
+.PP
+But wait: doesn't \f(CW\*(C`pack\*(C'\fR just return a sequence of bytes? How can we pass this
+string of bytes to some C code expecting a pointer which is, after all,
+nothing but a number? The answer is simple: We have to obtain the numeric
+address from the bytes returned by \f(CW\*(C`pack\*(C'\fR.
+.PP
+.Vb 1
+\& my $ptr = unpack( \*(AqL!\*(Aq, $memptr );
+.Ve
+.PP
+Obviously this assumes that it is possible to typecast a pointer
+to an unsigned long and vice versa, which frequently works but should not
+be taken as a universal law. \- Now that we have this pointer the next question
+is: How can we put it to good use? We need a call to some C function
+where a pointer is expected. The \fBread\fR\|(2) system call comes to mind:
+.PP
+.Vb 1
+\& ssize_t read(int fd, void *buf, size_t count);
+.Ve
+.PP
+After reading perlfunc explaining how to use \f(CW\*(C`syscall\*(C'\fR we can write
+this Perl function copying a file to standard output:
+.PP
+.Vb 12
+\& require \*(Aqsyscall.ph\*(Aq; # run h2ph to generate this file
+\& sub cat($){
+\&     my $path = shift();
+\&     my $size = \-s $path;
+\&     my $memory = "\ex00" x $size;  # allocate some memory
+\&     # L! is a native unsigned long, wide enough for a pointer here
+\&     my $ptr = unpack( \*(AqL!\*(Aq, pack( \*(AqP\*(Aq, $memory ) );
+\&     open( F, $path ) || die( "$path: cannot open ($!)\en" );
+\&     my $fd = fileno(F);
+\&     my $res = syscall( &SYS_read, $fd, $ptr, $size );
+\&     print $memory;
+\&     close( F );
+\& }
+.Ve
+.PP
+This is neither a specimen of simplicity nor a paragon of portability but
+it illustrates the point: We are able to sneak behind the scenes and
+access Perl's otherwise well-guarded memory! (Important note: Perl's
+\&\f(CW\*(C`syscall\*(C'\fR does \fInot\fR require you to construct pointers in this roundabout
+way. You simply pass a string variable, and Perl forwards the address.)
+.PP
+How does \f(CW\*(C`unpack\*(C'\fR with \f(CW\*(C`P\*(C'\fR work? Imagine some pointer in the buffer
+about to be unpacked: If it isn't the null pointer (which will smartly
+produce the \f(CW\*(C`undef\*(C'\fR value) we have a start address \- but then what?
+Perl has no way of knowing how long this "fixed length string" is, so
+it's up to you to specify the actual size as an explicit length after \f(CW\*(C`P\*(C'\fR.
+.PP
+.Vb 2
+\& my $mem = "abcdefghijklmn";
+\& print unpack( \*(AqP5\*(Aq, pack( \*(AqP\*(Aq, $mem ) ); # prints "abcde"
+.Ve
+.PP
+As a consequence, \f(CW\*(C`pack\*(C'\fR ignores any number or \f(CW\*(C`*\*(C'\fR after \f(CW\*(C`P\*(C'\fR.
+.PP
+Now that we have seen \f(CW\*(C`P\*(C'\fR at work, we might as well give \f(CW\*(C`p\*(C'\fR a whirl.
+Why do we need a second template code for packing pointers at all? The
+answer lies in the simple fact that an \f(CW\*(C`unpack\*(C'\fR with \f(CW\*(C`p\*(C'\fR promises
+a null-terminated string starting at the address taken from the buffer,
+and that implies a length for the data item to be returned:
+.PP
+.Vb 2
+\& my $buf = pack( \*(Aqp\*(Aq, "abc\ex00efhijklmn" );
+\& print unpack( \*(Aqp\*(Aq, $buf ); # prints "abc"
+.Ve
+.PP
+This is apt to be confusing: because the length is implied by the
+string's length, a number after pack code \f(CW\*(C`p\*(C'\fR is a repeat
+count, not a length as after \f(CW\*(C`P\*(C'\fR.
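+.PP
+A small sketch of that difference: with \f(CW\*(C`p2\*(C'\fR, \f(CW\*(C`pack\*(C'\fR stores two
+pointers, one per string literal, and \f(CW\*(C`unpack\*(C'\fR follows each of them back
+to its null\-terminated string.
+.PP
+.Vb 4
+\& my $buf  = pack( \*(Aqp2\*(Aq, "first", "second" );    # two pointers
+\& my @strs = unpack( \*(Aqp2\*(Aq, $buf );                # ("first", "second")
+\& print "@strs\en";                                  # prints "first second"
+\& print length( $buf ) / length( pack( \*(Aqp\*(Aq, "x" ) ), "\en";  # prints 2
+.Ve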
+.PP
+Using \f(CW\*(C`pack(..., $x)\*(C'\fR with \f(CW\*(C`P\*(C'\fR or \f(CW\*(C`p\*(C'\fR to get the address where \f(CW$x\fR is
+actually stored must be used with circumspection. Perl's internal machinery
+considers the relation between a variable and that address as its very own
+private matter and doesn't really care that we have obtained a copy. Therefore:
+.IP \(bu 4
+Do not use \f(CW\*(C`pack\*(C'\fR with \f(CW\*(C`p\*(C'\fR or \f(CW\*(C`P\*(C'\fR to obtain the address of a variable
+that's bound to go out of scope (thereby freeing its memory) before you
+are done with using the memory at that address.
+.IP \(bu 4
+Be very careful with Perl operations that change the value of the
+variable. Appending something to the variable, for instance, might require
+reallocation of its storage, leaving you with a pointer into no-man's land.
+.IP \(bu 4
+Don't think that you can get the address of a Perl variable
+when it is stored as an integer or double number! \f(CW\*(C`pack(\*(AqP\*(Aq, $x)\*(C'\fR will
+force the variable's internal representation to string, just as if you
+had written something like \f(CW\*(C`$x .= \*(Aq\*(Aq\*(C'\fR.
+.PP
+It's safe, however, to P\- or p\-pack a string literal, because Perl simply
+allocates an anonymous variable.
+.SH "Pack Recipes"
+.IX Header "Pack Recipes"
+Here is a collection of (possibly) useful canned recipes for \f(CW\*(C`pack\*(C'\fR
+and \f(CW\*(C`unpack\*(C'\fR:
+.PP
+.Vb 2
+\& # Convert IP address for socket functions
+\& pack( "C4", split /\e./, "123.4.5.6" );
+\&
+\& # Count the bits in a chunk of memory (e.g. a select vector)
+\& unpack( \*(Aq%32b*\*(Aq, $mask );
+\&
+\& # Determine the endianness of your system
+\& $is_little_endian = unpack( \*(Aqc\*(Aq, pack( \*(Aqs\*(Aq, 1 ) );
+\& $is_big_endian = unpack( \*(Aqxc\*(Aq, pack( \*(Aqs\*(Aq, 1 ) );
+\&
+\& # Determine the number of bits in a native unsigned int
+\& $bits = unpack( \*(Aq%32b*\*(Aq, pack( \*(AqI!\*(Aq, ~0 ) );
+\&
+\& # Prepare argument for the nanosleep system call
+\& my $timespec = pack( \*(AqL!L!\*(Aq, $secs, $nanosecs );
+.Ve
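+.PP
+Two of these recipes can be cross\-checked against other sources of the
+same information. The sketch below compares the IP address recipe with
+the \f(CW\*(C`inet_ntoa\*(C'\fR function of the core \f(CW\*(C`Socket\*(C'\fR module, and computes the
+width of a native unsigned int in two independent ways; both numbers
+should agree.
+.PP
+.Vb 9
+\& use Socket qw( inet_ntoa );
+\&
+\& my $packed = pack( "C4", split( /\e./, "123.4.5.6" ) );
+\& print join( ".", unpack( "C4", $packed ) ), "\en";  # prints 123.4.5.6
+\& print inet_ntoa( $packed ), "\en";                   # prints 123.4.5.6
+\&
+\& my $bits_a = unpack( \*(Aq%32b*\*(Aq, pack( \*(AqI!\*(Aq, ~0 ) ); # count the set bits
+\& my $bits_b = 8 * length( pack( \*(AqI!\*(Aq, 0 ) );        # size in bytes, times 8
+\& print "$bits_a $bits_b\en";
+.Ve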
+.PP
+For a simple memory dump we unpack some bytes into just as
+many pairs of hex digits, and use \f(CW\*(C`map\*(C'\fR to handle the traditional
+spacing \- 16 bytes to a line:
+.PP
+.Vb 4
+\& my $i;
+\& print map( ++$i % 16 ? "$_ " : "$_\en",
+\& unpack( \*(AqH2\*(Aq x length( $mem ), $mem ) ),
+\& length( $mem ) % 16 ? "\en" : \*(Aq\*(Aq;
+.Ve
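+.PP
+A slightly more elaborate sketch adds an offset column and the printable
+characters. The \f(CW\*(C`hexdump_lines\*(C'\fR helper is hypothetical; it uses the group
+template \f(CW\*(C`(H2)*\*(C'\fR to split each 16\-byte chunk into hex pairs:
+.PP
+.Vb 14
+\& # Hypothetical helper: format $mem as classic hex dump lines.
+\& sub hexdump_lines {
+\&     my ($mem) = @_;
+\&     my @lines;
+\&     for ( my $off = 0; $off < length( $mem ); $off += 16 ) {
+\&         my $chunk = substr( $mem, $off, 16 );
+\&         my $hex   = join( " ", unpack( \*(Aq(H2)*\*(Aq, $chunk ) );
+\&         ( my $text = $chunk ) =~ tr/\ex20\-\ex7e/./c;  # non\-printables => "."
+\&         push @lines, sprintf( "%08x  %\-47s  %s", $off, $hex, $text );
+\&     }
+\&     return @lines;
+\& }
+\&
+\& print "$_\en" for hexdump_lines( "Hello, pack!\ex00\ex01\ex02 and unpack" );
+.Ve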
+.SH "Funnies Section"
+.IX Header "Funnies Section"
+.Vb 5
+\& # Pulling digits out of nowhere...
+\& print unpack( \*(AqC\*(Aq, pack( \*(Aqx\*(Aq ) ),
+\& unpack( \*(Aq%B*\*(Aq, pack( \*(AqA\*(Aq ) ),
+\& unpack( \*(AqH\*(Aq, pack( \*(AqA\*(Aq ) ),
+\& unpack( \*(AqA\*(Aq, unpack( \*(AqC\*(Aq, pack( \*(AqA\*(Aq ) ) ), "\en";
+\&
+\& # One for the road ;\-)
+\& my $advice = pack( \*(Aqall u can in a van\*(Aq );
+.Ve
+.SH Authors
+.IX Header "Authors"
+Simon Cozens and Wolfgang Laun.