Diffstat (limited to 'upstream/mageia-cauldron/man3pm/IO::Compress::FAQ.3pm')
-rw-r--r-- | upstream/mageia-cauldron/man3pm/IO::Compress::FAQ.3pm | 768 |
1 files changed, 768 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man3pm/IO::Compress::FAQ.3pm b/upstream/mageia-cauldron/man3pm/IO::Compress::FAQ.3pm new file mode 100644 index 00000000..dfed7b3a --- /dev/null +++ b/upstream/mageia-cauldron/man3pm/IO::Compress::FAQ.3pm @@ -0,0 +1,768 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "IO::Compress::FAQ 3pm" +.TH IO::Compress::FAQ 3pm 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +IO::Compress::FAQ \-\- Frequently Asked Questions about IO::Compress +.SH DESCRIPTION +.IX Header "DESCRIPTION" +Common questions answered. 
+.SH GENERAL +.IX Header "GENERAL" +.SS "Compatibility with Unix compress/uncompress." +.IX Subsection "Compatibility with Unix compress/uncompress." +Although \f(CW\*(C`Compress::Zlib\*(C'\fR has a pair of functions called \f(CW\*(C`compress\*(C'\fR and +\&\f(CW\*(C`uncompress\*(C'\fR, they are \fInot\fR related to the Unix programs of the same +name. The \f(CW\*(C`Compress::Zlib\*(C'\fR module is not compatible with Unix +\&\f(CW\*(C`compress\*(C'\fR. +.PP +If you have the \f(CW\*(C`uncompress\*(C'\fR program available, you can use this to read +compressed files +.PP +.Vb 4 +\& open F, "uncompress \-c $filename |"; +\& while (<F>) +\& { +\& ... +.Ve +.PP +Alternatively, if you have the \f(CW\*(C`gunzip\*(C'\fR program available, you can use +this to read compressed files +.PP +.Vb 4 +\& open F, "gunzip \-c $filename |"; +\& while (<F>) +\& { +\& ... +.Ve +.PP +and this to write compressed files, if you have the \f(CW\*(C`compress\*(C'\fR program +available +.PP +.Vb 4 +\& open F, "| compress \-c >$filename"; +\& print F "data"; +\& ... +\& close F ; +.Ve +.SS "Accessing .tar.Z files" +.IX Subsection "Accessing .tar.Z files" +The \f(CW\*(C`Archive::Tar\*(C'\fR module can optionally use \f(CW\*(C`Compress::Zlib\*(C'\fR (via the +\&\f(CW\*(C`IO::Zlib\*(C'\fR module) to access tar files that have been compressed with +\&\f(CW\*(C`gzip\*(C'\fR. Unfortunately tar files compressed with the Unix \f(CW\*(C`compress\*(C'\fR +utility cannot be read by \f(CW\*(C`Compress::Zlib\*(C'\fR and so cannot be directly +accessed by \f(CW\*(C`Archive::Tar\*(C'\fR. +.PP +If the \f(CW\*(C`uncompress\*(C'\fR or \f(CW\*(C`gunzip\*(C'\fR programs are available, you can use one +of these workarounds to read \f(CW\*(C`.tar.Z\*(C'\fR files from \f(CW\*(C`Archive::Tar\*(C'\fR +.PP +Firstly with \f(CW\*(C`uncompress\*(C'\fR +.PP +.Vb 3 +\& use strict; +\& use warnings; +\& use Archive::Tar; +\& +\& open F, "uncompress \-c $filename |"; +\& my $tar = Archive::Tar\->new(*F); +\& ... 
+.Ve +.PP +and this with \f(CW\*(C`gunzip\*(C'\fR +.PP +.Vb 3 +\& use strict; +\& use warnings; +\& use Archive::Tar; +\& +\& open F, "gunzip \-c $filename |"; +\& my $tar = Archive::Tar\->new(*F); +\& ... +.Ve +.PP +Similarly, if the \f(CW\*(C`compress\*(C'\fR program is available, you can use this to +write a \f(CW\*(C`.tar.Z\*(C'\fR file +.PP +.Vb 4 +\& use strict; +\& use warnings; +\& use Archive::Tar; +\& use IO::File; +\& +\& my $fh = IO::File\->new( "| compress \-c >$filename" ); +\& my $tar = Archive::Tar\->new(); +\& ... +\& $tar\->write($fh); +\& $fh\->close ; +.Ve +.SS "How do I recompress using a different compression?" +.IX Subsection "How do I recompress using a different compression?" +This is easier than you might expect if you realise that all the +\&\f(CW\*(C`IO::Compress::*\*(C'\fR objects are derived from \f(CW\*(C`IO::File\*(C'\fR and that all the +\&\f(CW\*(C`IO::Uncompress::*\*(C'\fR modules can read from an \f(CW\*(C`IO::File\*(C'\fR filehandle. +.PP +So, for example, say you have a file compressed with gzip that you want to +recompress with bzip2. Here is all that is needed to carry out the +recompression. +.PP +.Vb 2 +\& use IO::Uncompress::Gunzip \*(Aq:all\*(Aq; +\& use IO::Compress::Bzip2 \*(Aq:all\*(Aq; +\& +\& my $gzipFile = "somefile.gz"; +\& my $bzipFile = "somefile.bz2"; +\& +\& my $gunzip = IO::Uncompress::Gunzip\->new( $gzipFile ) +\& or die "Cannot gunzip $gzipFile: $GunzipError\en" ; +\& +\& bzip2 $gunzip => $bzipFile +\& or die "Cannot bzip2 to $bzipFile: $Bzip2Error\en" ; +.Ve +.PP +Note, there is a limitation of this technique. Some compression file +formats store extra information along with the compressed data payload. For +example, gzip can optionally store the original filename and Zip stores a +lot of information about the original file. If the original compressed file +contains any of this extra information, it will not be transferred to the +new compressed file using the technique above. 
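+.PP
+To see what is about to be lost, the gzip header can be inspected before
+recompressing. The following is only a sketch, not part of the recipe
+above: it assumes the \f(CW\*(C`getHeaderInfo\*(C'\fR method of
+\&\f(CW\*(C`IO::Uncompress::Gunzip\*(C'\fR and its \f(CW\*(C`Name\*(C'\fR and \f(CW\*(C`Time\*(C'\fR fields, and it reuses
+the illustrative file names from the recompression example.
+.PP
+.Vb 5
+\&    use strict;
+\&    use warnings;
+\&    use IO::Uncompress::Gunzip qw($GunzipError);
+\&    use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
+\&
+\&    my $gzipFile = "somefile.gz";
+\&    my $bzipFile = "somefile.bz2";
+\&
+\&    my $gunzip = IO::Uncompress::Gunzip\->new( $gzipFile )
+\&        or die "Cannot gunzip $gzipFile: $GunzipError\en" ;
+\&
+\&    # Assumption: in scalar context getHeaderInfo returns a hash reference
+\&    # describing the gzip header that has just been read.
+\&    my $hdr = $gunzip\->getHeaderInfo();
+\&
+\&    # Report the optional gzip fields that a bzip2 file cannot carry.
+\&    warn "Dropping stored name \*(Aq$hdr\->{Name}\*(Aq\en"
+\&        if defined $hdr\->{Name};
+\&    warn "Dropping stored modification time $hdr\->{Time}\en"
+\&        if $hdr\->{Time};
+\&
+\&    bzip2 $gunzip => $bzipFile
+\&        or die "Cannot bzip2 to $bzipFile: $Bzip2Error\en" ;
+.Ve
+.PP
+Whether to warn, abort, or record the dropped fields elsewhere is a design
+choice left to the application.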
+.SH ZIP +.IX Header "ZIP" +.SS "What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip support?" +.IX Subsection "What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip support?" +The following compression formats are supported by \f(CW\*(C`IO::Compress::Zip\*(C'\fR and +\&\f(CW\*(C`IO::Uncompress::Unzip\*(C'\fR +.IP \(bu 5 +Store (method 0) +.Sp +No compression at all. +.IP \(bu 5 +Deflate (method 8) +.Sp +This is the default compression used when creating a zip file with +\&\f(CW\*(C`IO::Compress::Zip\*(C'\fR. +.IP \(bu 5 +Bzip2 (method 12) +.Sp +Only supported if the \f(CW\*(C`IO\-Compress\-Bzip2\*(C'\fR module is installed. +.IP \(bu 5 +Lzma (method 14) +.Sp +Only supported if the \f(CW\*(C`IO\-Compress\-Lzma\*(C'\fR module is installed. +.SS "Can I Read/Write Zip files larger than 4 Gig?" +.IX Subsection "Can I Read/Write Zip files larger than 4 Gig?" +Yes, both the \f(CW\*(C`IO\-Compress\-Zip\*(C'\fR and \f(CW\*(C`IO\-Uncompress\-Unzip\*(C'\fR modules +support the zip feature called \fIZip64\fR. That allows them to read/write +files/buffers larger than 4Gig. +.PP +If you are creating a Zip file using the one-shot interface, and any of the +input files is greater than 4Gig, a zip64 compliant zip file will be +created. +.PP +.Vb 1 +\& zip "really\-large\-file" => "my.zip"; +.Ve +.PP +Similarly with the one-shot interface, if the input is a buffer larger than +4 Gig, a zip64 compliant zip file will be created. +.PP +.Vb 1 +\& zip \e$really_large_buffer => "my.zip"; +.Ve +.PP +The one-shot interface allows you to force the creation of a zip64 zip file +by including the \f(CW\*(C`Zip64\*(C'\fR option. +.PP +.Vb 1 +\& zip $filehandle => "my.zip", Zip64 => 1; +.Ve +.PP +If you want to create a zip64 zip file with the OO interface you must +specify the \f(CW\*(C`Zip64\*(C'\fR option. 
+.PP +.Vb 1 +\& my $zip = IO::Compress::Zip\->new( "whatever", Zip64 => 1 ); +.Ve +.PP +When uncompressing with \f(CW\*(C`IO\-Uncompress\-Unzip\*(C'\fR, it will automatically +detect if the zip file is zip64. +.PP +If you intend to manipulate the Zip64 zip files created with +\&\f(CW\*(C`IO\-Compress\-Zip\*(C'\fR using an external zip/unzip, make sure that it supports +Zip64. +.PP +In particular, if you are using Info-Zip you need to have zip version 3.x +or better to update a Zip64 archive and unzip version 6.x to read a zip64 +archive. +.SS "Can I write more than 64K entries in a Zip file?" +.IX Subsection "Can I write more than 64K entries in a Zip file?" +Yes. Zip64 allows this. See previous question. +.SS "Zip Resources" +.IX Subsection "Zip Resources" +The primary reference for zip files is the "appnote" document available at +<http://www.pkware.com/documents/casestudies/APPNOTE.TXT> +.PP +An alternative is the Info-Zip appnote. This is available from +<ftp://ftp.info\-zip.org/pub/infozip/doc/> +.SH GZIP +.IX Header "GZIP" +.SS "Gzip Resources" +.IX Subsection "Gzip Resources" +The primary reference for gzip files is RFC 1952 +<https://datatracker.ietf.org/doc/html/rfc1952> +.PP +The primary site for gzip is <http://www.gzip.org>. +.SS "Dealing with concatenated gzip files" +.IX Subsection "Dealing with concatenated gzip files" +If the gunzip program encounters a file containing multiple gzip files +concatenated together it will automatically uncompress them all. +The example below illustrates this behaviour +.PP +.Vb 5 +\& $ echo abc | gzip \-c >x.gz +\& $ echo def | gzip \-c >>x.gz +\& $ gunzip \-c x.gz +\& abc +\& def +.Ve +.PP +By default \f(CW\*(C`IO::Uncompress::Gunzip\*(C'\fR will \fInot\fR behave like the gunzip +program. 
It will only uncompress the first gzip data stream in the file, as +shown below +.PP +.Vb 2 +\& $ perl \-MIO::Uncompress::Gunzip=:all \-e \*(Aqgunzip "x.gz" => \e*STDOUT\*(Aq +\& abc +.Ve +.PP +To force \f(CW\*(C`IO::Uncompress::Gunzip\*(C'\fR to uncompress all the gzip data streams, +include the \f(CW\*(C`MultiStream\*(C'\fR option, as shown below +.PP +.Vb 3 +\& $ perl \-MIO::Uncompress::Gunzip=:all \-e \*(Aqgunzip "x.gz" => \e*STDOUT, MultiStream => 1\*(Aq +\& abc +\& def +.Ve +.SS "Reading bgzip files with IO::Uncompress::Gunzip" +.IX Subsection "Reading bgzip files with IO::Uncompress::Gunzip" +A \f(CW\*(C`bgzip\*(C'\fR file consists of a series of valid gzip-compliant data streams +concatenated together. To read a file created by \f(CW\*(C`bgzip\*(C'\fR with +\&\f(CW\*(C`IO::Uncompress::Gunzip\*(C'\fR use the \f(CW\*(C`MultiStream\*(C'\fR option as shown in the +previous section. +.PP +See the section titled "The BGZF compression format" in +<http://samtools.github.io/hts\-specs/SAMv1.pdf> for a definition of +\&\f(CW\*(C`bgzip\*(C'\fR. +.SH ZLIB +.IX Header "ZLIB" +.SS "Zlib Resources" +.IX Subsection "Zlib Resources" +The primary site for the \fIzlib\fR compression library is +<http://www.zlib.org>. +.SH Bzip2 +.IX Header "Bzip2" +.SS "Bzip2 Resources" +.IX Subsection "Bzip2 Resources" +The primary site for bzip2 is <http://www.bzip.org>. +.SS "Dealing with Concatenated bzip2 files" +.IX Subsection "Dealing with Concatenated bzip2 files" +If the bunzip2 program encounters a file containing multiple bzip2 files +concatenated together it will automatically uncompress them all. +The example below illustrates this behaviour +.PP +.Vb 5 +\& $ echo abc | bzip2 \-c >x.bz2 +\& $ echo def | bzip2 \-c >>x.bz2 +\& $ bunzip2 \-c x.bz2 +\& abc +\& def +.Ve +.PP +By default \f(CW\*(C`IO::Uncompress::Bunzip2\*(C'\fR will \fInot\fR behave like the bunzip2 +program. 
It will only uncompress the first bzip2 data stream in the file, as +shown below +.PP +.Vb 2 +\& $ perl \-MIO::Uncompress::Bunzip2=:all \-e \*(Aqbunzip2 "x.bz2" => \e*STDOUT\*(Aq +\& abc +.Ve +.PP +To force \f(CW\*(C`IO::Uncompress::Bunzip2\*(C'\fR to uncompress all the bzip2 data streams, +include the \f(CW\*(C`MultiStream\*(C'\fR option, as shown below +.PP +.Vb 3 +\& $ perl \-MIO::Uncompress::Bunzip2=:all \-e \*(Aqbunzip2 "x.bz2" => \e*STDOUT, MultiStream => 1\*(Aq +\& abc +\& def +.Ve +.SS "Interoperating with Pbzip2" +.IX Subsection "Interoperating with Pbzip2" +Pbzip2 (<http://compression.ca/pbzip2/>) is a parallel implementation of +bzip2. The output from pbzip2 consists of a series of concatenated bzip2 +data streams. +.PP +By default \f(CW\*(C`IO::Uncompress::Bunzip2\*(C'\fR will only uncompress the first bzip2 +data stream in a pbzip2 file. To uncompress the complete pbzip2 file you +must include the \f(CW\*(C`MultiStream\*(C'\fR option, like this. +.PP +.Vb 2 +\& bunzip2 $input => \e$output, MultiStream => 1 +\& or die "bunzip2 failed: $Bunzip2Error\en"; +.Ve +.SH "HTTP & NETWORK" +.IX Header "HTTP & NETWORK" +.SS "Apache::GZip Revisited" +.IX Subsection "Apache::GZip Revisited" +Below is a mod_perl Apache compression module, called \f(CW\*(C`Apache::GZip\*(C'\fR, +taken from +<http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression> +.PP +.Vb 2 +\& package Apache::GZip; +\& #File: Apache::GZip.pm +\& +\& use strict vars; +\& use Apache::Constants \*(Aq:common\*(Aq; +\& use Compress::Zlib; +\& use IO::File; +\& use constant GZIP_MAGIC => 0x1f8b; +\& use constant OS_MAGIC => 0x03; +\& +\& sub handler { +\& my $r = shift; +\& my ($fh,$gz); +\& my $file = $r\->filename; +\& return DECLINED unless $fh=IO::File\->new($file); +\& $r\->header_out(\*(AqContent\-Encoding\*(Aq=>\*(Aqgzip\*(Aq); +\& $r\->send_http_header; +\& return OK if $r\->header_only; +\& +\& tie *STDOUT,\*(AqApache::GZip\*(Aq,$r; +\& print($_) 
while <$fh>; +\& untie *STDOUT; +\& return OK; +\& } +\& +\& sub TIEHANDLE { +\& my($class,$r) = @_; +\& # initialize a deflation stream +\& my $d = deflateInit(\-WindowBits=>\-MAX_WBITS()) || return undef; +\& +\& # gzip header \-\- don\*(Aqt ask how I found out +\& $r\->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC)); +\& +\& return bless { r => $r, +\& crc => crc32(undef), +\& d => $d, +\& l => 0 +\& },$class; +\& } +\& +\& sub PRINT { +\& my $self = shift; +\& foreach (@_) { +\& # deflate the data +\& my $data = $self\->{d}\->deflate($_); +\& $self\->{r}\->print($data); +\& # keep track of its length and crc +\& $self\->{l} += length($_); +\& $self\->{crc} = crc32($_,$self\->{crc}); +\& } +\& } +\& +\& sub DESTROY { +\& my $self = shift; +\& +\& # flush the output buffers +\& my $data = $self\->{d}\->flush; +\& $self\->{r}\->print($data); +\& +\& # print the CRC and the total length (uncompressed) +\& $self\->{r}\->print(pack("LL",@{$self}{qw/crc l/})); +\& } +\& +\& 1; +.Ve +.PP +Here's the Apache configuration entry you'll need to make use of it. Once +set, everything in the /compressed directory will be +compressed automagically. +.PP +.Vb 4 +\& <Location /compressed> +\& SetHandler perl\-script +\& PerlHandler Apache::GZip +\& </Location> +.Ve +.PP +Although at first sight there seems to be quite a lot going on in +\&\f(CW\*(C`Apache::GZip\*(C'\fR, you could sum up what the code was doing as follows \-\- +read the contents of the file in \f(CW\*(C`$r\->filename\*(C'\fR, compress it and write +the compressed data to standard output. That's all. +.PP +This code has to jump through a few hoops to achieve this because +.IP 1. 4 +The gzip support in \f(CW\*(C`Compress::Zlib\*(C'\fR version 1.x can only work with a real +filesystem filehandle. The filehandles used by Apache modules are not +associated with the filesystem. +.IP 2. 
4 +That means all the gzip support has to be done by hand \- in this case by +creating a tied filehandle to deal with creating the gzip header and +trailer. +.PP +\&\f(CW\*(C`IO::Compress::Gzip\*(C'\fR doesn't have that filehandle limitation (this was one +of the reasons for writing it in the first place). So if +\&\f(CW\*(C`IO::Compress::Gzip\*(C'\fR is used instead of \f(CW\*(C`Compress::Zlib\*(C'\fR the whole tied +filehandle code can be removed. Here is the rewritten code. +.PP +.Vb 1 +\& package Apache::GZip; +\& +\& use strict vars; +\& use Apache::Constants \*(Aq:common\*(Aq; +\& use IO::Compress::Gzip; +\& use IO::File; +\& +\& sub handler { +\& my $r = shift; +\& my ($fh,$gz); +\& my $file = $r\->filename; +\& return DECLINED unless $fh=IO::File\->new($file); +\& $r\->header_out(\*(AqContent\-Encoding\*(Aq=>\*(Aqgzip\*(Aq); +\& $r\->send_http_header; +\& return OK if $r\->header_only; +\& +\& my $gz = IO::Compress::Gzip\->new( \*(Aq\-\*(Aq, Minimal => 1 ) +\& or return DECLINED ; +\& +\& print $gz $_ while <$fh>; +\& +\& return OK; +\& } +.Ve +.PP +or even more succinctly, like this, using a one-shot gzip +.PP +.Vb 1 +\& package Apache::GZip; +\& +\& use strict vars; +\& use Apache::Constants \*(Aq:common\*(Aq; +\& use IO::Compress::Gzip qw(gzip); +\& +\& sub handler { +\& my $r = shift; +\& $r\->header_out(\*(AqContent\-Encoding\*(Aq=>\*(Aqgzip\*(Aq); +\& $r\->send_http_header; +\& return OK if $r\->header_only; +\& +\& gzip $r\->filename => \*(Aq\-\*(Aq, Minimal => 1 +\& or return DECLINED ; +\& +\& return OK; +\& } +\& +\& 1; +.Ve +.PP +The use of one-shot \f(CW\*(C`gzip\*(C'\fR above just reads from \f(CW\*(C`$r\->filename\*(C'\fR and +writes the compressed data to standard output. +.PP +Note the use of the \f(CW\*(C`Minimal\*(C'\fR option in the code above. When using gzip +for Content-Encoding you should \fIalways\fR use this option. 
In the example +above it will prevent the filename being included in the gzip header and +make the size of the gzip data stream a slight bit smaller. +.SS "Compressed files and Net::FTP" +.IX Subsection "Compressed files and Net::FTP" +The \f(CW\*(C`Net::FTP\*(C'\fR module provides two low-level methods called \f(CW\*(C`stor\*(C'\fR and +\&\f(CW\*(C`retr\*(C'\fR that both return filehandles. These filehandles can be used with the +\&\f(CW\*(C`IO::Compress/Uncompress\*(C'\fR modules to compress or uncompress files read +from or written to an FTP Server on the fly, without having to create a +temporary file. +.PP +Firstly, here is code that uses \f(CW\*(C`retr\*(C'\fR to uncompress a file as it is +read from the FTP Server. +.PP +.Vb 2 +\& use Net::FTP; +\& use IO::Uncompress::Gunzip qw(:all); +\& +\& my $ftp = Net::FTP\->new( ... ); +\& +\& my $retr_fh = $ftp\->retr($compressed_filename); +\& gunzip $retr_fh => $outFilename, AutoClose => 1 +\& or die "Cannot uncompress \*(Aq$compressed_filename\*(Aq: $GunzipError\en"; +.Ve +.PP +and this to compress a file as it is written to the FTP Server +.PP +.Vb 2 +\& use Net::FTP; +\& use IO::Compress::Gzip qw(:all); +\& +\& my $stor_fh = $ftp\->stor($filename); +\& gzip $filename => $stor_fh, AutoClose => 1 +\& or die "Cannot compress \*(Aq$filename\*(Aq: $GzipError\en"; +.Ve +.SH MISC +.IX Header "MISC" +.ie n .SS "Using ""InputLength"" to uncompress data embedded in a larger file/buffer." +.el .SS "Using \f(CWInputLength\fP to uncompress data embedded in a larger file/buffer." +.IX Subsection "Using InputLength to uncompress data embedded in a larger file/buffer." +A fairly common use-case is where compressed data is embedded in a larger +file/buffer and you want to read both. +.PP +As an example consider the structure of a zip file. This is a well-defined +file format that mixes both compressed and uncompressed sections of data in +a single file. 
+.PP +For the purposes of this discussion you can think of a zip file as a sequence +of compressed data streams, each of which is prefixed by an uncompressed +local header. The local header contains information about the compressed +data stream, including the name of the compressed file and, in particular, +the length of the compressed data stream. +.PP +To illustrate how to use \f(CW\*(C`InputLength\*(C'\fR here is a script that walks a zip +file and prints out how many lines are in each compressed file (if you +intend to write code to walk through a zip file for real see +"Walking through a zip file" in IO::Uncompress::Unzip ). Also, although +this example uses the zlib-based compression, the technique can be used by +the other \f(CW\*(C`IO::Uncompress::*\*(C'\fR modules. +.PP +.Vb 2 +\& use strict; +\& use warnings; +\& +\& use IO::File; +\& use IO::Uncompress::RawInflate qw(:all); +\& +\& use constant ZIP_LOCAL_HDR_SIG => 0x04034b50; +\& use constant ZIP_LOCAL_HDR_LENGTH => 30; +\& +\& my $file = $ARGV[0] ; +\& +\& my $fh = IO::File\->new( "<$file" ) +\& or die "Cannot open \*(Aq$file\*(Aq: $!\en"; +\& +\& while (1) +\& { +\& my $sig; +\& my $buffer; +\& +\& my $x ; +\& ($x = $fh\->read($buffer, ZIP_LOCAL_HDR_LENGTH)) == ZIP_LOCAL_HDR_LENGTH +\& or die "Truncated file: $!\en"; +\& +\& my $signature = unpack ("V", substr($buffer, 0, 4)); +\& +\& last unless $signature == ZIP_LOCAL_HDR_SIG; +\& +\& # Read Local Header +\& my $gpFlag = unpack ("v", substr($buffer, 6, 2)); +\& my $compressedMethod = unpack ("v", substr($buffer, 8, 2)); +\& my $compressedLength = unpack ("V", substr($buffer, 18, 4)); +\& my $uncompressedLength = unpack ("V", substr($buffer, 22, 4)); +\& my $filename_length = unpack ("v", substr($buffer, 26, 2)); +\& my $extra_length = unpack ("v", substr($buffer, 28, 2)); +\& +\& my $filename ; +\& $fh\->read($filename, $filename_length) == $filename_length +\& or die "Truncated file\en"; +\& +\& $fh\->read($buffer, $extra_length) == $extra_length +\& 
or die "Truncated file\en"; +\& +\& if ($compressedMethod != 8 && $compressedMethod != 0) +\& { +\& warn "Skipping file \*(Aq$filename\*(Aq \- not deflated $compressedMethod\en"; +\& $fh\->read($buffer, $compressedLength) == $compressedLength +\& or die "Truncated file\en"; +\& next; +\& } +\& +\& if ($compressedMethod == 0 && ($gpFlag & 8) == 8) +\& { +\& die "Streamed Stored not supported for \*(Aq$filename\*(Aq\en"; +\& } +\& +\& next if $compressedLength == 0; +\& +\& # Done reading the Local Header +\& +\& my $inf = IO::Uncompress::RawInflate\->new( $fh, +\& Transparent => 1, +\& InputLength => $compressedLength ) +\& or die "Cannot uncompress $file [$filename]: $RawInflateError\en" ; +\& +\& my $line_count = 0; +\& +\& while (<$inf>) +\& { +\& ++ $line_count; +\& } +\& +\& print "$filename: $line_count\en"; +\& } +.Ve +.PP +The majority of the code above is concerned with reading the zip local +header data. The code that I want to focus on is at the bottom. +.PP +.Vb 1 +\& while (1) { +\& +\& # read local zip header data +\& # get $filename +\& # get $compressedLength +\& +\& my $inf = IO::Uncompress::RawInflate\->new( $fh, +\& Transparent => 1, +\& InputLength => $compressedLength ) +\& or die "Cannot uncompress $file [$filename]: $RawInflateError\en" ; +\& +\& my $line_count = 0; +\& +\& while (<$inf>) +\& { +\& ++ $line_count; +\& } +\& +\& print "$filename: $line_count\en"; +\& } +.Ve +.PP +The call to \f(CW\*(C`IO::Uncompress::RawInflate\*(C'\fR creates a new filehandle \f(CW$inf\fR +that can be used to read from the parent filehandle \f(CW$fh\fR, uncompressing +it as it goes. The use of the \f(CW\*(C`InputLength\*(C'\fR option will guarantee that +\&\fIat most\fR \f(CW$compressedLength\fR bytes of compressed data will be read from +the \f(CW$fh\fR filehandle (The only exception is for an error case like a +truncated file or a corrupt data stream). 
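+.PP
+One way to try this out without a real zip file is sketched below. It is
+only an illustration under stated assumptions: the container layout (a
+4\-byte length, a raw deflate payload, then the literal text TRAILER) is
+invented for this example, and the payload is produced with the one-shot
+\&\f(CW\*(C`rawdeflate\*(C'\fR function from \f(CW\*(C`IO::Compress::RawDeflate\*(C'\fR.
+.PP
+.Vb 5
+\&    use strict;
+\&    use warnings;
+\&    use File::Temp qw(tempfile);
+\&    use IO::Compress::RawDeflate qw(rawdeflate $RawDeflateError);
+\&    use IO::Uncompress::RawInflate qw($RawInflateError);
+\&
+\&    # Build the hypothetical container: length, payload, trailer.
+\&    my $text = "line 1\enline 2\en";
+\&    rawdeflate \e$text => \emy $payload
+\&        or die "rawdeflate failed: $RawDeflateError\en";
+\&
+\&    my ($fh, $name) = tempfile(UNLINK => 1);
+\&    binmode $fh;
+\&    print $fh pack("V", length $payload), $payload, "TRAILER";
+\&    seek $fh, 0, 0;
+\&
+\&    # Read the uncompressed length prefix by hand.
+\&    read($fh, my $len, 4) == 4
+\&        or die "Truncated file\en";
+\&    my $compressedLength = unpack "V", $len;
+\&
+\&    # Let RawInflate consume at most $compressedLength bytes from $fh.
+\&    my $inf = IO::Uncompress::RawInflate\->new( $fh,
+\&                    InputLength => $compressedLength )
+\&        or die "Cannot uncompress: $RawInflateError\en";
+\&    print while <$inf>;
+\&
+\&    # Whatever follows the payload should still be unread in $fh.
+\&    read($fh, my $rest, 7);
+\&    print "next bytes: $rest\en";
+.Ve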
+.PP +This means that once RawInflate is finished \f(CW$fh\fR will be left at the +byte directly after the compressed data stream. +.PP +Now consider what the code looks like without \f(CW\*(C`InputLength\*(C'\fR +.PP +.Vb 1 +\& while (1) { +\& +\& # read local zip header data +\& # get $filename +\& # get $compressedLength +\& +\& # read all the compressed data into $data +\& read($fh, $data, $compressedLength); +\& +\& my $inf = IO::Uncompress::RawInflate\->new( \e$data, +\& Transparent => 1 ) +\& or die "Cannot uncompress $file [$filename]: $RawInflateError\en" ; +\& +\& my $line_count = 0; +\& +\& while (<$inf>) +\& { +\& ++ $line_count; +\& } +\& +\& print "$filename: $line_count\en"; +\& } +.Ve +.PP +The difference here is the addition of the temporary variable \f(CW$data\fR. +This is used to store a copy of the compressed data while it is being +uncompressed. +.PP +If you know that \f(CW$compressedLength\fR isn't that big then using temporary +storage won't be a problem. But if \f(CW$compressedLength\fR is very large or +you are writing an application that other people will use, and so have no +idea how big \f(CW$compressedLength\fR will be, it could be an issue. +.PP +Using \f(CW\*(C`InputLength\*(C'\fR avoids the use of temporary storage and means the +application can cope with large compressed data streams. +.PP +One final point \-\- obviously \f(CW\*(C`InputLength\*(C'\fR can only be used whenever you +know the length of the compressed data beforehand, like here with a zip +file. +.SH SUPPORT +.IX Header "SUPPORT" +General feedback/questions/bug reports should be sent to +<https://github.com/pmqs//issues> (preferred) or +<https://rt.cpan.org/Public/Dist/Display.html?Name=>. 
+.SH "SEE ALSO" +.IX Header "SEE ALSO" +Compress::Zlib, IO::Compress::Gzip, IO::Uncompress::Gunzip, IO::Compress::Deflate, IO::Uncompress::Inflate, IO::Compress::RawDeflate, IO::Uncompress::RawInflate, IO::Compress::Bzip2, IO::Uncompress::Bunzip2, IO::Compress::Lzma, IO::Uncompress::UnLzma, IO::Compress::Xz, IO::Uncompress::UnXz, IO::Compress::Lzip, IO::Uncompress::UnLzip, IO::Compress::Lzop, IO::Uncompress::UnLzop, IO::Compress::Lzf, IO::Uncompress::UnLzf, IO::Compress::Zstd, IO::Uncompress::UnZstd, IO::Uncompress::AnyInflate, IO::Uncompress::AnyUncompress +.PP +IO::Compress::FAQ +.PP +File::GlobMapper, Archive::Zip, +Archive::Tar, +IO::Zlib +.SH AUTHOR +.IX Header "AUTHOR" +This module was written by Paul Marquess, \f(CW\*(C`pmqs@cpan.org\*(C'\fR. +.SH "MODIFICATION HISTORY" +.IX Header "MODIFICATION HISTORY" +See the Changes file. +.SH "COPYRIGHT AND LICENSE" +.IX Header "COPYRIGHT AND LICENSE" +Copyright (c) 2005\-2023 Paul Marquess. All rights reserved. +.PP +This program is free software; you can redistribute it and/or +modify it under the same terms as Perl itself. |