From ac32e8eabf1b97208c4ccdfe908aea863d09d1f3 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Tue, 13 Feb 2018 08:06:07 +0100 Subject: Adding upstream version 1.7. Signed-off-by: Daniel Baumann --- doc/plzip.texi | 235 +++++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 170 insertions(+), 65 deletions(-) (limited to 'doc/plzip.texi') diff --git a/doc/plzip.texi b/doc/plzip.texi index 5f32f6e..44cff75 100644 --- a/doc/plzip.texi +++ b/doc/plzip.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 12 April 2017 -@set VERSION 1.6 +@set UPDATED 7 February 2018 +@set VERSION 1.7 @dircategory Data Compression @direntry @@ -36,6 +36,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}). @menu * Introduction:: Purpose and features of plzip +* Output:: Meaning of plzip's output * Invoking plzip:: Command line interface * Program design:: Internal structure of plzip * File format:: Detailed format of the compressed file @@ -48,7 +49,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}). @end menu @sp 1 -Copyright @copyright{} 2009-2017 Antonio Diaz Diaz. +Copyright @copyright{} 2009-2018 Antonio Diaz Diaz. This manual is free documentation: you have unlimited permission to copy, distribute and modify it. @@ -81,7 +82,7 @@ availability: The lzip format provides very safe integrity checking and some data recovery means. The @uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover} -program can repair bit-flip errors (one of the most common forms of data +program can repair bit flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked merging of damaged copies of a file. @ifnothtml @@ -143,9 +144,54 @@ incomprehensible and therefore pointless. Plzip will correctly decompress a file which is the concatenation of two or more compressed files. 
The result is the concatenation of the -corresponding uncompressed files. Integrity testing of concatenated +corresponding decompressed files. Integrity testing of concatenated compressed files is also supported. + +@node Output +@chapter Meaning of plzip's output +@cindex output + +The output of plzip looks like this: + +@example +plzip -v foo + foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out. + +plzip -tvv foo.lz + foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok +@end example + +The meaning of each field is as follows: + +@table @code +@item N:1 +The compression ratio @w{(uncompressed_size / compressed_size)}, shown +as N to 1. + +@item ratio +The inverse compression ratio @w{(compressed_size / uncompressed_size)}, +shown as a percentage. A decimal ratio is easily obtained by moving the +decimal point two places to the left; @w{14.98% = 0.1498}. + +@item saved +The space saved by compression @w{(1 - ratio)}, shown as a percentage. + +@item in +The size of the uncompressed data. When decompressing or testing, it is +shown as @code{decompressed}. Note that plzip always prints the +uncompressed size before the compressed size when compressing, +decompressing, testing or listing. + +@item out +The size of the compressed data. When decompressing or testing, it is +shown as @code{compressed}. + +@end table + +When decompressing or testing at verbosity level 4 (-vvvv), the +dictionary size used to compress the file is also shown. + LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have been compressed. Decompressed is used to refer to data which have undergone the process of decompression. @@ -169,7 +215,7 @@ plzip [@var{options}] [@var{files}] mixed with other @var{files} and is read just once, the first time it appears in the command line. -Plzip supports the following options: +plzip supports the following options: @table @code @item -h @@ -190,12 +236,12 @@ garbage that can be safely ignored. @xref{concat-example}. 
@anchor{--data-size} @item -B @var{bytes} @itemx --data-size=@var{bytes} -Set the size of the input data blocks, in bytes. The input file will be -divided in chunks of this size before compression is performed. Valid -values range from 8 KiB to 1 GiB. Default value is two times the -dictionary size, except for option @samp{-0} where it defaults to 1 MiB. -Plzip will reduce the dictionary size if it is larger than the chosen -data size. +When compressing, set the size of the input data blocks in bytes. The +input file will be divided in chunks of this size before compression is +performed. Valid values range from @w{8 KiB} to @w{1 GiB}. Default value +is two times the dictionary size, except for option @samp{-0} where it +defaults to @w{1 MiB}. Plzip will reduce the dictionary size if it is +larger than the chosen data size. @item -c @itemx --stdout @@ -206,10 +252,10 @@ device. @item -d @itemx --decompress -Decompress the specified file(s). If a file does not exist or can't be +Decompress the specified files. If a file does not exist or can't be opened, plzip continues decompressing the rest of the files. If a file -fails to decompress, plzip exits immediately without decompressing the -rest of the files. +fails to decompress, or is a terminal, plzip exits immediately without +decompressing the rest of the files. @item -f @itemx --force @@ -217,8 +263,8 @@ Force overwrite of output files. @item -F @itemx --recompress -Force re-compression of files whose name already has the @samp{.lz} or -@samp{.tlz} suffix. +When compressing, force re-compression of files whose name already has +the @samp{.lz} or @samp{.tlz} suffix. @item -k @itemx --keep @@ -227,7 +273,7 @@ Keep (don't delete) input files during compression or decompression. @item -l @itemx --list Print the uncompressed size, compressed size and percentage saved of the -specified file(s). Trailing data are ignored. The values produced are +specified files. Trailing data are ignored. 
The values produced are correct even for multimember files. If more than one file is given, a final line containing the cumulative sizes is printed. With @samp{-v}, the dictionary size, the number of members in the file, and the amount @@ -240,16 +286,21 @@ verifies that none of the specified files contain trailing data. @item -m @var{bytes} @itemx --match-length=@var{bytes} -Set the match length limit in bytes. After a match this long is found, -the search is finished. Valid values range from 5 to 273. Larger values -usually give better compression ratios but longer compression times. +When compressing, set the match length limit in bytes. After a match +this long is found, the search is finished. Valid values range from 5 to +273. Larger values usually give better compression ratios but longer +compression times. @item -n @var{n} @itemx --threads=@var{n} -Set the number of worker threads. Valid values range from 1 to "as many -as your system can support". If this option is not used, plzip tries to -detect the number of processors in the system and use it as default -value. @w{@samp{plzip --help}} shows the system's default value. +Set the number of worker threads, overriding the system's default. Valid +values range from 1 to "as many as your system can support". If this +option is not used, plzip tries to detect the number of processors in +the system and use it as default value. When compressing on a @w{32 bit} +system, plzip tries to limit the memory use to under @w{2.22 GiB} (4 +worker threads at level -9) by reducing the number of threads below the +system's default. @w{@samp{plzip --help}} shows the system's default +value. Note that the number of usable threads is limited to @w{ceil( file_size / data_size )} during compression (@pxref{Minimum file sizes}), and to @@ -260,7 +311,9 @@ the number of members in the input during decompression. 
When reading from standard input and @samp{--stdout} has not been specified, use @samp{@var{file}} as the virtual name of the uncompressed file. This produces a file named @samp{@var{file}} when decompressing, -and a file named @samp{@var{file}.lz} when compressing. +or a file named @samp{@var{file}.lz} when compressing. A second +@samp{.lz} extension is not added if @samp{@var{file}} already ends in +@samp{.lz} or @samp{.tlz}. @item -q @itemx --quiet @@ -268,12 +321,12 @@ Quiet operation. Suppress all messages. @item -s @var{bytes} @itemx --dictionary-size=@var{bytes} -Set the dictionary size limit in bytes. Plzip will use the smallest -possible dictionary size for each file without exceeding this limit. -Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are -interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note that -dictionary sizes are quantized. If the specified size does not match one -of the valid sizes, it will be rounded upwards by adding up to +When compressing, set the dictionary size limit in bytes. Plzip will use +the smallest possible dictionary size for each file without exceeding +this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12 +to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note +that dictionary sizes are quantized. If the specified size does not +match one of the valid sizes, it will be rounded upwards by adding up to @w{(@var{bytes} / 8)} to it. For maximum compression you should use a dictionary size limit as large @@ -282,27 +335,29 @@ is affected at compression time by the choice of dictionary size limit. @item -t @itemx --test -Check integrity of the specified file(s), but don't decompress them. -This really performs a trial decompression and throws away the result. -Use it together with @samp{-v} to see information about the file(s). If -a file does not exist, can't be opened, or is a terminal, plzip -continues checking the rest of the files. 
If a file fails the test, -plzip may be unable to check the rest of the files. +Check integrity of the specified files, but don't decompress them. This +really performs a trial decompression and throws away the result. Use it +together with @samp{-v} to see information about the files. If a file +does not exist, can't be opened, or is a terminal, plzip continues +checking the rest of the files. If a file fails the test, plzip may be +unable to check the rest of the files. @item -v @itemx --verbose Verbose mode.@* -When compressing, show the compression ratio for each file processed. A -second @samp{-v} shows the progress of compression.@* +When compressing, show the compression ratio and size for each file +processed.@* When decompressing or testing, further -v's (up to 4) increase the verbosity level, showing status, compression ratio, dictionary size, -decompressed size, and compressed size. +decompressed size, and compressed size.@* +Two or more @samp{-v} options show the progress of (de)compression, +except for single-member files. @item -0 .. -9 Set the compression parameters (dictionary size and match length limit) as shown in the table below. The default compression level is @samp{-6}. Note that @samp{-9} can be much slower than @samp{-0}. These options -have no effect when decompressing. +have no effect when decompressing, testing or listing. The bidimensional parameter space of LZMA can't be mapped to a linear scale optimal for all files. If your files are large, very repetitive, @@ -327,6 +382,12 @@ etc, you may need to use the @samp{--dictionary-size} and @itemx --best Aliases for GNU gzip compatibility. +@item --loose-trailing +When decompressing, testing or listing, allow trailing data whose first +bytes are so similar to the magic bytes of a lzip header that they can +be confused with a corrupt header. Use this option if a file triggers a +"corrupt header" error and the cause is not indeed a corrupt header. 
+ @end table Numbers given as arguments to options may be followed by a multiplier @@ -363,8 +424,8 @@ creating a multimember compressed file. When decompressing, plzip decompresses as many members simultaneously as worker threads are chosen. Files that were compressed with lzip will not -be decompressed faster than using lzip (unless the @samp{-b} option was -used) because lzip usually produces single-member files, which can't be +be decompressed faster than using lzip (unless the @samp{-b} option was used) +because lzip usually produces single-member files, which can't be decompressed in parallel. For each input file, a splitter thread and several worker threads are @@ -377,6 +438,19 @@ to the workers. The workers (de)compress the blocks received from the splitter. The muxer collects processed packets from the workers, and writes them to the output file. +@verbatim + ,------------, + ,-->| worker 0 |--, + | `------------' | +,-------, ,----------, | ,------------, | ,-------, ,--------, +| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output | +| file | `----------' | `------------' | `-------' | file | +`-------' | ... | `--------' + | ,------------, | + `-->| worker N-1 |--' + `------------' +@end verbatim + When decompressing from a regular file, the splitter is removed and the workers read directly from the input file. If the output file is also a regular file, the muxer is also removed and the workers write directly @@ -472,35 +546,60 @@ facilitates safe recovery of undamaged members from multimember files. @chapter Memory required to compress and decompress @cindex memory requirements -The amount of memory required @strong{per thread} is approximately the -following: +The amount of memory required @strong{per thread} for decompression or +testing is approximately the following: @itemize @bullet -@item -For compression at level -0; 1.5 MiB plus 3 times the data size -(@pxref{--data-size}). Default is 4.5 MiB. 
- -@item -For compression at other levels; 11 times the dictionary size plus 3 -times the data size. Default is 136 MiB. - @item For decompression of a regular (seekable) file to another regular file, or for testing of a regular file; the dictionary size. @item For testing of a non-seekable file or of standard input; the dictionary -size plus up to 5 MiB. +size plus up to @w{5 MiB}. @item For decompression of a regular file to a non-seekable file or to -standard output; the dictionary size plus up to 32 MiB. +standard output; the dictionary size plus up to @w{32 MiB}. @item For decompression of a non-seekable file or of standard input; the -dictionary size plus up to 35 MiB. +dictionary size plus up to @w{35 MiB}. +@end itemize + +@noindent +The amount of memory required @strong{per thread} for compression is +approximately the following: + +@itemize @bullet +@item +For compression at level -0; @w{1.5 MiB} plus 3.375 times the data size +(@pxref{--data-size}). Default is @w{4.875 MiB}. + +@item +For compression at other levels; 11 times the dictionary size plus 3.375 +times the data size. Default is @w{142 MiB}. @end itemize +@noindent +The following table shows the memory required @strong{per thread} for +compression at a given level, using the default data size for each +level: + +@multitable {Level} {Memory required} +@item Level @tab Memory required +@item -0 @tab 4.875 MiB +@item -1 @tab 17.75 MiB +@item -2 @tab 26.625 MiB +@item -3 @tab 35.5 MiB +@item -4 @tab 53.25 MiB +@item -5 @tab 71 MiB +@item -6 @tab 142 MiB +@item -7 @tab 284 MiB +@item -8 @tab 426 MiB +@item -9 @tab 568 MiB +@end multitable + @node Minimum file sizes @chapter Minimum file sizes required for full compression speed @@ -516,7 +615,8 @@ least as large as the number of worker threads times the chunk size (@pxref{--data-size}). Else some processors will not get any data to compress, and compression will be proportionally slower. 
The maximum speed increase achievable on a given file is limited by the ratio -@w{(file_size / data_size)}. +@w{(file_size / data_size)}. For example, a tarball the size of gcc or +linux will scale up to 8 processors at level -9. The following table shows the minimum uncompressed file size needed for full use of N processors at a given compression level, using the default @@ -554,9 +654,10 @@ padding zero bytes to a lzip file. @item Useful data added by the user; a cryptographically secure hash, a description of file contents, etc. It is safe to append any amount of -text to a lzip file as long as the text does not begin with the string -"LZIP", and does not contain any zero bytes (null characters). Nonzero -bytes and zero bytes can't be safely mixed in trailing data. +text to a lzip file as long as none of the first four bytes of the text +match the corresponding byte in the string "LZIP", and the text does not +contain any zero bytes (null characters). Nonzero bytes and zero bytes +can't be safely mixed in trailing data. @item Garbage added by some not totally successful copy operation. @@ -566,12 +667,16 @@ Malicious data added to the file in order to make its total size and hash value (for a chosen hash) coincide with those of another file. @item -In very rare cases, trailing data could be the corrupt header of another +In rare cases, trailing data could be the corrupt header of another member. In multimember or concatenated files the probability of corruption happening in the magic bytes is 5 times smaller than the probability of getting a false positive caused by the corruption of the integrity information itself. Therefore it can be considered to be below -the noise level. +the noise level. Additionally, the test used by plzip to discriminate +trailing data from a corrupt header has a Hamming distance (HD) of 3, +and the 3 bit flips must happen in different magic bytes for the test to +fail. 
In any case, the option @samp{--trailing-error} guarantees that +any corrupt header will be detected. @end itemize Trailing data are in no way part of the lzip file format, but tools @@ -607,7 +712,7 @@ plzip -v file @sp 1 @noindent Example 2: Like example 1 but the created @samp{file.lz} has a block -size of 1 MiB. The compression ratio is not shown. +size of @w{1 MiB}. The compression ratio is not shown. @example plzip -B 1MiB file @@ -656,7 +761,7 @@ Do this instead @sp 1 @noindent -Example 7: Decompress @samp{file.lz} partially until 10 KiB of +Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of decompressed data are produced. @example
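Note on the new "Output" chapter added by this patch: the three derived fields (N:1, ratio, saved) follow directly from the two sizes shown in the example line. The sketch below recomputes them from the "450560 in, 67493 out" figures quoted in the patch. It is only an illustration: the function name `plzip_stats` is hypothetical and not part of plzip, and the rounding used here (ordinary round-to-nearest formatting) is an assumption about how plzip formats its numbers.

```python
def plzip_stats(uncompressed_size: int, compressed_size: int) -> dict:
    """Recompute the fields described in the 'Output' chapter.

    Hypothetical helper for illustration only; plzip itself is a C++
    program and its exact rounding behaviour is assumed, not verified.
    """
    if uncompressed_size <= 0 or compressed_size <= 0:
        raise ValueError("sizes must be positive")

    # N:1 is the compression ratio: uncompressed_size / compressed_size.
    n_to_1 = uncompressed_size / compressed_size
    # 'ratio' is the inverse: compressed_size / uncompressed_size.
    ratio = compressed_size / uncompressed_size
    # 'saved' is the space saved by compression: 1 - ratio.
    saved = 1 - ratio

    return {
        "N:1": f"{n_to_1:.3f}:1",
        "ratio": f"{100 * ratio:.2f}%",
        "saved": f"{100 * saved:.2f}%",
    }


# Sizes taken from the example output in the patch.
stats = plzip_stats(450560, 67493)
print(f"{stats['N:1']}, {stats['ratio']} ratio, {stats['saved']} saved")
```

With the sizes from the patch, the formatted values agree with the example line in the new chapter (6.676:1, 14.98% ratio, 85.02% saved), which suggests the documented definitions and the sample output are consistent with each other.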