diff options
Diffstat (limited to 'doc/plzip.texi')
-rw-r--r-- | doc/plzip.texi | 211 |
1 files changed, 177 insertions, 34 deletions
diff --git a/doc/plzip.texi b/doc/plzip.texi index e13515a..c459cde 100644 --- a/doc/plzip.texi +++ b/doc/plzip.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 9 July 2015 -@set VERSION 1.4 +@set UPDATED 14 May 2016 +@set VERSION 1.5 @dircategory Data Compression @direntry @@ -41,12 +41,14 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}). * File format:: Detailed format of the compressed file * Memory requirements:: Memory required to compress and decompress * Minimum file sizes:: Minimum file sizes required for full speed +* Trailing data:: Extra data appended to the file +* Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts @end menu @sp 1 -Copyright @copyright{} 2009-2015 Antonio Diaz Diaz. +Copyright @copyright{} 2009-2016 Antonio Diaz Diaz. This manual is free documentation: you have unlimited permission to copy, distribute and modify it. @@ -83,7 +85,7 @@ program can repair bit-flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked merging of damaged copies of a file. @ifnothtml -@ref{Data safety,,,lziprecover}. +@xref{Data safety,,,lziprecover}. @end ifnothtml @item @@ -144,13 +146,6 @@ or more compressed files. The result is the concatenation of the corresponding uncompressed files. Integrity testing of concatenated compressed files is also supported. -WARNING! Even if plzip is bug-free, other causes may result in a corrupt -compressed file (bugs in the system libraries, memory errors, etc). -Therefore, if the data you are going to compress are important, give the -@samp{--keep} option to plzip and do not remove the original file until -you verify the compressed file with a command like -@w{@samp{plzip -cd file.lz | cmp file -}}. - @node Invoking plzip @chapter Invoking plzip @@ -165,6 +160,11 @@ The format for running plzip is: plzip [@var{options}] [@var{files}] @end example +@noindent +@samp{-} used as a @var{file} argument means standard input. It can be +mixed with other @var{files} and is read just once, the first time it +appears in the command line. + Plzip supports the following options: @table @code @@ -176,6 +176,13 @@ Print an informative help message describing the options and exit. @itemx --version Print the version number of plzip on the standard output and exit. +@anchor{--trailing-error} +@item -a +@itemx --trailing-error +Exit with error status 2 if any remaining input is detected after +decompressing the last member. Such remaining input is usually trailing +garbage that can be safely ignored. @xref{concat-example}. + @anchor{--data-size} @item -B @var{bytes} @itemx --data-size=@var{bytes} @@ -188,12 +195,17 @@ data size. @item -c @itemx --stdout -Compress or decompress to standard output. Needed when reading from a -named pipe (fifo) or from a device. +Compress or decompress to standard output; keep input files unchanged. +If compressing several files, each file is compressed independently. +This option is needed when reading from a named pipe (fifo) or from a +device. @item -d @itemx --decompress -Decompress. +Decompress the specified file(s). If a file does not exist or can't be +opened, plzip continues decompressing the rest of the files. If a file +fails to decompress, plzip exits immediately without decompressing the +rest of the files. @item -f @itemx --force @@ -238,11 +250,13 @@ Quiet operation. Suppress all messages. @item -s @var{bytes} @itemx --dictionary-size=@var{bytes} -Set the dictionary size limit in bytes. Valid values range from 4 KiB to -512 MiB. Plzip will use the smallest possible dictionary size for each -file without exceeding this limit. Note that dictionary sizes are -quantized. If the specified size does not match one of the valid sizes, -it will be rounded upwards by adding up to (@var{bytes} / 16) to it. +Set the dictionary size limit in bytes. Plzip will use the smallest +possible dictionary size for each file without exceeding this limit. +Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are +interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note that +dictionary sizes are quantized. If the specified size does not match one +of the valid sizes, it will be rounded upwards by adding up to +@w{(@var{bytes} / 8)} to it. For maximum compression you should use a dictionary size limit as large as possible, but keep in mind that the decompression memory requirement @@ -252,7 +266,9 @@ is affected at compression time by the choice of dictionary size limit. @itemx --test Check integrity of the specified file(s), but don't decompress them. This really performs a trial decompression and throws away the result. -Use it together with @samp{-v} to see information about the file. +Use it together with @samp{-v} to see information about the file(s). If +a file fails the test, plzip may be unable to check the rest of the +files. @item -v @itemx --verbose @@ -265,14 +281,14 @@ decompressed size, and compressed size. @item -0 .. -9 Set the compression parameters (dictionary size and match length limit) -as shown in the table below. Note that @samp{-9} can be much slower than -@samp{-0}. These options have no effect when decompressing. +as shown in the table below. The default compression level is @samp{-6}. +Note that @samp{-9} can be much slower than @samp{-0}. These options +have no effect when decompressing. The bidimensional parameter space of LZMA can't be mapped to a linear scale optimal for all files. If your files are large, very repetitive, -etc, you may need to use the @samp{--match-length} and -@samp{--dictionary-size} options directly to achieve optimal -performance. +etc, you may need to use the @samp{--dictionary-size} and +@samp{--match-length} options directly to achieve optimal performance. @multitable {Level} {Dictionary size} {Match length limit} @item Level @tab Dictionary size @tab Match length limit @@ -324,7 +340,7 @@ caused plzip to panic. When compressing, plzip divides the input file into chunks and compresses as many chunks simultaneously as worker threads are chosen, -creating a multi-member compressed file. +creating a multimember compressed file. When decompressing, plzip decompresses as many members simultaneously as worker threads are chosen. Files that were compressed with lzip will not @@ -383,14 +399,14 @@ additional information before, between, or after them. Each member has the following structure: @verbatim +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -| ID string | VN | DS | Lzma stream | CRC32 | Data size | Member size | +| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ @end verbatim All multibyte values are stored in little endian order. @table @samp -@item ID string +@item ID string (the "magic" bytes) A four byte string, identifying the lzip format, with the value "LZIP" (0x4C, 0x5A, 0x49, 0x50). @@ -407,8 +423,8 @@ from the base size to obtain the dictionary size.@* Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@* Valid values for dictionary size range from 4 KiB to 512 MiB. -@item Lzma stream -The lzma stream, finished by an end of stream marker. Uses default +@item LZMA stream +The LZMA stream, finished by an end of stream marker. Uses default values for encoder properties. @ifnothtml @xref{Stream format,,,lzip}, @@ -428,7 +444,7 @@ Size of the uncompressed original data. @item Member size (8 bytes) Total size of the member, including header and trailer. This field acts as a distributed index, allows the verification of stream integrity, and -facilitates safe recovery of undamaged members from multi-member files. +facilitates safe recovery of undamaged members from multimember files. @end table @@ -453,8 +469,8 @@ times the data size. Default is 136 MiB. For decompression of a regular (seekable) file to another regular file, or for testing of a regular file; the dictionary size. -(Note that regular files with more than 1024 bytes of trailing garbage -are treated as non-seekable). +(Note that regular files with more than 1024 bytes of trailing data are +treated as non-seekable). @item For testing of a non-seekable file or of standard input; the dictionary @@ -476,7 +492,7 @@ dictionary size plus up to 35 MiB. When compressing, plzip divides the input file into chunks and compresses as many chunks simultaneously as worker threads are chosen, -creating a multi-member compressed file. +creating a multimember compressed file. For this to work as expected (and roughly multiply the compression speed by the number of available processors), the uncompressed file must be at @@ -506,6 +522,133 @@ data size for each level: @end multitable +@node Trailing data +@chapter Extra data appended to the file +@cindex trailing data + +Sometimes extra data is found appended to a lzip file after the last +member. Such trailing data may be: + +@itemize @bullet +@item +Padding added to make the file size a multiple of some block size, for +example when writing to a tape. + +@item +Garbage added by some not totally successful copy operation. + +@item +Useful data added by the user; a cryptographically secure hash, a +description of file contents, etc. + +@item +Malicious data added to the file in order to make its total size and +hash value (for a chosen hash) coincide with those of another file. + +@item +In very rare cases, trailing data could be the corrupt header of another +member. In multimember or concatenated files the probability of +corruption happening in the magic bytes is 5 times smaller than the +probability of getting a false positive caused by the corruption of the +integrity information itself. Therefore it can be considered to be below +the noise level. +@end itemize + +Trailing data can be safely ignored in most cases. In some cases, like +that of user-added data, it is expected to be ignored. In those cases +where a file containing trailing data must be rejected, the option +@samp{--trailing-error} can be used. @xref{--trailing-error}. + + +@node Examples +@chapter A small tutorial with examples +@cindex examples + +WARNING! Even if plzip is bug-free, other causes may result in a corrupt +compressed file (bugs in the system libraries, memory errors, etc). +Therefore, if the data you are going to compress are important, give the +@samp{--keep} option to plzip and don't remove the original file until +you verify the compressed file with a command like +@w{@samp{plzip -cd file.lz | cmp file -}}. + +@sp 1 +@noindent +Example 1: Replace a regular file with its compressed version +@samp{file.lz} and show the compression ratio. + +@example +plzip -v file +@end example + +@sp 1 +@noindent +Example 2: Like example 1 but the created @samp{file.lz} has a block +size of 1 MiB. The compression ratio is not shown. + +@example +plzip -B 1MiB file +@end example + +@sp 1 +@noindent +Example 3: Restore a regular file from its compressed version +@samp{file.lz}. If the operation is successful, @samp{file.lz} is +removed. + +@example +plzip -d file.lz +@end example + +@sp 1 +@noindent +Example 4: Verify the integrity of the compressed file @samp{file.lz} +and show status. + +@example +plzip -tv file.lz +@end example + +@sp 1 +@noindent +Example 5: Compress a whole device in /dev/sdc and send the output to +@samp{file.lz}. + +@example +plzip -c /dev/sdc > file.lz +@end example + +@sp 1 +@anchor{concat-example} +@noindent +Example 6: The right way of concatenating compressed files. +@xref{Trailing data}. + +@example +Don't do this + cat file1.lz file2.lz file3.lz | plzip -d +Do this instead + plzip -cd file1.lz file2.lz file3.lz +@end example + +@sp 1 +@noindent +Example 7: Decompress @samp{file.lz} partially until 10 KiB of +decompressed data are produced. + +@example +plzip -cd file.lz | dd bs=1024 count=10 +@end example + +@sp 1 +@noindent +Example 8: Decompress @samp{file.lz} partially from decompressed byte +10000 to decompressed byte 15000 (5000 bytes are produced). + +@example +plzip -cd file.lz | dd bs=1000 skip=10 count=5 +@end example + + @node Problems @chapter Reporting bugs @cindex bugs |