From ac32e8eabf1b97208c4ccdfe908aea863d09d1f3 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Tue, 13 Feb 2018 08:06:07 +0100 Subject: Adding upstream version 1.7. Signed-off-by: Daniel Baumann --- doc/plzip.1 | 9 +- doc/plzip.info | 268 +++++++++++++++++++++++++++++++++++++++------------------ doc/plzip.texi | 235 ++++++++++++++++++++++++++++++++++++-------------- 3 files changed, 358 insertions(+), 154 deletions(-) (limited to 'doc') diff --git a/doc/plzip.1 b/doc/plzip.1 index 5c47edd..99dfd8b 100644 --- a/doc/plzip.1 +++ b/doc/plzip.1 @@ -1,5 +1,5 @@ .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1. -.TH PLZIP "1" "April 2017" "plzip 1.6" "User Commands" +.TH PLZIP "1" "February 2018" "plzip 1.7" "User Commands" .SH NAME plzip \- reduces the size of files .SH SYNOPSIS @@ -68,6 +68,9 @@ alias for \fB\-0\fR .TP \fB\-\-best\fR alias for \fB\-9\fR +.TP +\fB\-\-loose\-trailing\fR +allow trailing data seeming corrupt header .PP If no file names are given, or if a file is '\-', plzip compresses or decompresses from standard input to standard output. @@ -92,8 +95,8 @@ Plzip home page: http://www.nongnu.org/lzip/plzip.html .SH COPYRIGHT Copyright \(co 2009 Laszlo Ersek. .br -Copyright \(co 2017 Antonio Diaz Diaz. -Using lzlib 1.9 +Copyright \(co 2018 Antonio Diaz Diaz. +Using lzlib 1.10 License GPLv2+: GNU GPL version 2 or later .br This is free software: you are free to change and redistribute it. diff --git a/doc/plzip.info b/doc/plzip.info index cf53f13..c8d7387 100644 --- a/doc/plzip.info +++ b/doc/plzip.info @@ -11,11 +11,12 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir) Plzip Manual ************ -This manual is for Plzip (version 1.6, 12 April 2017). +This manual is for Plzip (version 1.7, 7 February 2018). 
* Menu: * Introduction:: Purpose and features of plzip +* Output:: Meaning of plzip's output * Invoking plzip:: Command line interface * Program design:: Internal structure of plzip * File format:: Detailed format of the compressed file @@ -27,13 +28,13 @@ This manual is for Plzip (version 1.6, 12 April 2017). * Concept index:: Index of concepts - Copyright (C) 2009-2017 Antonio Diaz Diaz. + Copyright (C) 2009-2018 Antonio Diaz Diaz. This manual is free documentation: you have unlimited permission to copy, distribute and modify it.  -File: plzip.info, Node: Introduction, Next: Invoking plzip, Prev: Top, Up: Top +File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top 1 Introduction ************** @@ -58,7 +59,7 @@ archiving, taking into account both data integrity and decoder availability: * The lzip format provides very safe integrity checking and some data - recovery means. The lziprecover program can repair bit-flip errors + recovery means. The lziprecover program can repair bit flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked merging of damaged copies of a file. *Note Data safety: @@ -114,17 +115,60 @@ entirely incomprehensible and therefore pointless. Plzip will correctly decompress a file which is the concatenation of two or more compressed files. The result is the concatenation of the -corresponding uncompressed files. Integrity testing of concatenated +corresponding decompressed files. Integrity testing of concatenated compressed files is also supported. + +File: plzip.info, Node: Output, Next: Invoking plzip, Prev: Introduction, Up: Top + +2 Meaning of plzip's output +*************************** + +The output of plzip looks like this: + + plzip -v foo + foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out. + + plzip -tvv foo.lz + foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. 
ok + + The meaning of each field is as follows: + +'N:1' + The compression ratio (uncompressed_size / compressed_size), shown + as N to 1. + +'ratio' + The inverse compression ratio + (compressed_size / uncompressed_size), shown as a percentage. A + decimal ratio is easily obtained by moving the decimal point two + places to the left; 14.98% = 0.1498. + +'saved' + The space saved by compression (1 - ratio), shown as a percentage. + +'in' + The size of the uncompressed data. When decompressing or testing, + it is shown as 'decompressed'. Note that plzip always prints the + uncompressed size before the compressed size when compressing, + decompressing, testing or listing. + +'out' + The size of the compressed data. When decompressing or testing, it + is shown as 'compressed'. + + + When decompressing or testing at verbosity level 4 (-vvvv), the +dictionary size used to compress the file is also shown. + LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have been compressed. Decompressed is used to refer to data which have undergone the process of decompression.  -File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Introduction, Up: Top +File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top -2 Invoking plzip +3 Invoking plzip **************** The format for running plzip is: @@ -135,7 +179,7 @@ The format for running plzip is: other FILES and is read just once, the first time it appears in the command line. - Plzip supports the following options: + plzip supports the following options: '-h' '--help' @@ -154,12 +198,12 @@ command line. '-B BYTES' '--data-size=BYTES' - Set the size of the input data blocks, in bytes. The input file - will be divided in chunks of this size before compression is - performed. Valid values range from 8 KiB to 1 GiB. Default value - is two times the dictionary size, except for option '-0' where it - defaults to 1 MiB. 
Plzip will reduce the dictionary size if it is - larger than the chosen data size. + When compressing, set the size of the input data blocks in bytes. + The input file will be divided in chunks of this size before + compression is performed. Valid values range from 8 KiB to 1 GiB. + Default value is two times the dictionary size, except for option + '-0' where it defaults to 1 MiB. Plzip will reduce the dictionary + size if it is larger than the chosen data size. '-c' '--stdout' @@ -170,10 +214,10 @@ command line. '-d' '--decompress' - Decompress the specified file(s). If a file does not exist or - can't be opened, plzip continues decompressing the rest of the - files. If a file fails to decompress, plzip exits immediately - without decompressing the rest of the files. + Decompress the specified files. If a file does not exist or can't + be opened, plzip continues decompressing the rest of the files. If + a file fails to decompress, or is a terminal, plzip exits + immediately without decompressing the rest of the files. '-f' '--force' @@ -181,8 +225,8 @@ command line. '-F' '--recompress' - Force re-compression of files whose name already has the '.lz' or - '.tlz' suffix. + When compressing, force re-compression of files whose name already + has the '.lz' or '.tlz' suffix. '-k' '--keep' @@ -192,7 +236,7 @@ command line. '-l' '--list' Print the uncompressed size, compressed size and percentage saved - of the specified file(s). Trailing data are ignored. The values + of the specified files. Trailing data are ignored. The values produced are correct even for multimember files. If more than one file is given, a final line containing the cumulative sizes is printed. With '-v', the dictionary size, the number of members in @@ -206,18 +250,21 @@ command line. '-m BYTES' '--match-length=BYTES' - Set the match length limit in bytes. After a match this long is - found, the search is finished. Valid values range from 5 to 273. 
- Larger values usually give better compression ratios but longer - compression times. + When compressing, set the match length limit in bytes. After a + match this long is found, the search is finished. Valid values + range from 5 to 273. Larger values usually give better compression + ratios but longer compression times. '-n N' '--threads=N' - Set the number of worker threads. Valid values range from 1 to "as - many as your system can support". If this option is not used, - plzip tries to detect the number of processors in the system and - use it as default value. 'plzip --help' shows the system's default - value. + Set the number of worker threads, overriding the system's default. + Valid values range from 1 to "as many as your system can support". + If this option is not used, plzip tries to detect the number of + processors in the system and use it as default value. When + compressing on a 32 bit system, plzip tries to limit the memory + use to under 2.22 GiB (4 worker threads at level -9) by reducing + the number of threads below the system's default. 'plzip --help' + shows the system's default value. Note that the number of usable threads is limited to ceil( file_size / data_size ) during compression (*note Minimum @@ -228,8 +275,9 @@ command line. '--output=FILE' When reading from standard input and '--stdout' has not been specified, use 'FILE' as the virtual name of the uncompressed - file. This produces a file named 'FILE' when decompressing, and a - file named 'FILE.lz' when compressing. + file. This produces a file named 'FILE' when decompressing, or a + file named 'FILE.lz' when compressing. A second '.lz' extension is + not added if 'FILE' already ends in '.lz' or '.tlz'. '-q' '--quiet' @@ -237,13 +285,13 @@ command line. '-s BYTES' '--dictionary-size=BYTES' - Set the dictionary size limit in bytes. Plzip will use the smallest - possible dictionary size for each file without exceeding this - limit. Valid values range from 4 KiB to 512 MiB. 
Values 12 to 29 - are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note - that dictionary sizes are quantized. If the specified size does - not match one of the valid sizes, it will be rounded upwards by - adding up to (BYTES / 8) to it. + When compressing, set the dictionary size limit in bytes. Plzip + will use the smallest possible dictionary size for each file + without exceeding this limit. Valid values range from 4 KiB to + 512 MiB. Values 12 to 29 are interpreted as powers of two, meaning + 2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If + the specified size does not match one of the valid sizes, it will + be rounded upwards by adding up to (BYTES / 8) to it. For maximum compression you should use a dictionary size limit as large as possible, but keep in mind that the decompression memory @@ -252,10 +300,10 @@ command line. '-t' '--test' - Check integrity of the specified file(s), but don't decompress - them. This really performs a trial decompression and throws away - the result. Use it together with '-v' to see information about - the file(s). If a file does not exist, can't be opened, or is a + Check integrity of the specified files, but don't decompress them. + This really performs a trial decompression and throws away the + result. Use it together with '-v' to see information about the + files. If a file does not exist, can't be opened, or is a terminal, plzip continues checking the rest of the files. If a file fails the test, plzip may be unable to check the rest of the files. @@ -263,17 +311,19 @@ command line. '-v' '--verbose' Verbose mode. - When compressing, show the compression ratio for each file - processed. A second '-v' shows the progress of compression. + When compressing, show the compression ratio and size for each file + processed. When decompressing or testing, further -v's (up to 4) increase the verbosity level, showing status, compression ratio, dictionary size, decompressed size, and compressed size. 
+ Two or more '-v' options show the progress of (de)compression, + except for single-member files. '-0 .. -9' Set the compression parameters (dictionary size and match length limit) as shown in the table below. The default compression level is '-6'. Note that '-9' can be much slower than '-0'. These - options have no effect when decompressing. + options have no effect when decompressing, testing or listing. The bidimensional parameter space of LZMA can't be mapped to a linear scale optimal for all files. If your files are large, very @@ -296,6 +346,13 @@ command line. '--best' Aliases for GNU gzip compatibility. +'--loose-trailing' + When decompressing, testing or listing, allow trailing data whose + first bytes are so similar to the magic bytes of a lzip header + that they can be confused with a corrupt header. Use this option + if a file triggers a "corrupt header" error and the cause is not + indeed a corrupt header. + Numbers given as arguments to options may be followed by a multiplier and an optional 'B' for "byte". @@ -321,7 +378,7 @@ caused plzip to panic.  File: plzip.info, Node: Program design, Next: File format, Prev: Invoking plzip, Up: Top -3 Program design +4 Program design **************** When compressing, plzip divides the input file into chunks and @@ -344,6 +401,17 @@ them to the workers. The workers (de)compress the blocks received from the splitter. The muxer collects processed packets from the workers, and writes them to the output file. + ,------------, + ,-->| worker 0 |--, + | `------------' | +,-------, ,----------, | ,------------, | ,-------, ,--------, +| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output | +| file | `----------' | `------------' | `-------' | file | +`-------' | ... | `--------' + | ,------------, | + `-->| worker N-1 |--' + `------------' + When decompressing from a regular file, the splitter is removed and the workers read directly from the input file. 
If the output file is also a regular file, the muxer is also removed and the workers write @@ -355,7 +423,7 @@ I/O speed.  File: plzip.info, Node: File format, Next: Memory requirements, Prev: Program design, Up: Top -4 File format +5 File format ************* Perfection is reached, not when there is no longer anything to add, but @@ -426,17 +494,11 @@ additional information before, between, or after them.  File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: File format, Up: Top -5 Memory required to compress and decompress +6 Memory required to compress and decompress ******************************************** -The amount of memory required *per thread* is approximately the -following: - - * For compression at level -0; 1.5 MiB plus 3 times the data size - (*note --data-size::). Default is 4.5 MiB. - - * For compression at other levels; 11 times the dictionary size plus - 3 times the data size. Default is 136 MiB. +The amount of memory required *per thread* for decompression or testing +is approximately the following: * For decompression of a regular (seekable) file to another regular file, or for testing of a regular file; the dictionary size. @@ -450,10 +512,35 @@ following: * For decompression of a non-seekable file or of standard input; the dictionary size plus up to 35 MiB. +The amount of memory required *per thread* for compression is +approximately the following: + + * For compression at level -0; 1.5 MiB plus 3.375 times the data size + (*note --data-size::). Default is 4.875 MiB. + + * For compression at other levels; 11 times the dictionary size plus + 3.375 times the data size. Default is 142 MiB. 
+ +The following table shows the memory required *per thread* for +compression at a given level, using the default data size for each +level: + +Level Memory required +-0 4.875 MiB +-1 17.75 MiB +-2 26.625 MiB +-3 35.5 MiB +-4 53.25 MiB +-5 71 MiB +-6 142 MiB +-7 284 MiB +-8 426 MiB +-9 568 MiB +  File: plzip.info, Node: Minimum file sizes, Next: Trailing data, Prev: Memory requirements, Up: Top -6 Minimum file sizes required for full compression speed +7 Minimum file sizes required for full compression speed ******************************************************** When compressing, plzip divides the input file into chunks and @@ -466,7 +553,8 @@ must be at least as large as the number of worker threads times the chunk size (*note --data-size::). Else some processors will not get any data to compress, and compression will be proportionally slower. The maximum speed increase achievable on a given file is limited by the -ratio (file_size / data_size). +ratio (file_size / data_size). For example, a tarball the size of gcc or +linux will scale up to 8 processors at level -9. The following table shows the minimum uncompressed file size needed for full use of N processors at a given compression level, using the @@ -489,7 +577,7 @@ Level  File: plzip.info, Node: Trailing data, Next: Examples, Prev: Minimum file sizes, Up: Top -7 Extra data appended to the file +8 Extra data appended to the file ********************************* Sometimes extra data are found appended to a lzip file after the last @@ -501,10 +589,11 @@ member. Such trailing data may be: * Useful data added by the user; a cryptographically secure hash, a description of file contents, etc. It is safe to append any amount - of text to a lzip file as long as the text does not begin with the - string "LZIP", and does not contain any zero bytes (null - characters). Nonzero bytes and zero bytes can't be safely mixed in - trailing data. 
+ of text to a lzip file as long as none of the first four bytes of + the text match the corresponding byte in the string "LZIP", and + the text does not contain any zero bytes (null characters). + Nonzero bytes and zero bytes can't be safely mixed in trailing + data. * Garbage added by some not totally successful copy operation. @@ -512,12 +601,17 @@ member. Such trailing data may be: and hash value (for a chosen hash) coincide with those of another file. - * In very rare cases, trailing data could be the corrupt header of - another member. In multimember or concatenated files the - probability of corruption happening in the magic bytes is 5 times - smaller than the probability of getting a false positive caused by - the corruption of the integrity information itself. Therefore it - can be considered to be below the noise level. + * In rare cases, trailing data could be the corrupt header of another + member. In multimember or concatenated files the probability of + corruption happening in the magic bytes is 5 times smaller than the + probability of getting a false positive caused by the corruption + of the integrity information itself. Therefore it can be + considered to be below the noise level. Additionally, the test + used by plzip to discriminate trailing data from a corrupt header + has a Hamming distance (HD) of 3, and the 3 bit flips must happen + in different magic bytes for the test to fail. In any case, the + option '--trailing-error' guarantees that any corrupt header will + be detected. Trailing data are in no way part of the lzip file format, but tools reading lzip files are expected to behave as correctly and usefully as @@ -531,7 +625,7 @@ cases where a file containing trailing data must be rejected, the option  File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top -8 A small tutorial with examples +9 A small tutorial with examples ******************************** WARNING! 
Even if plzip is bug-free, other causes may result in a corrupt @@ -595,8 +689,8 @@ to decompressed byte 15000 (5000 bytes are produced).  File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top -9 Reporting bugs -**************** +10 Reporting bugs +***************** There are probably bugs in plzip. There are certainly errors and omissions in this manual. If you report them, they will get fixed. If @@ -625,6 +719,7 @@ Concept index * memory requirements: Memory requirements. (line 6) * minimum file sizes: Minimum file sizes. (line 6) * options: Invoking plzip. (line 6) +* output: Output. (line 6) * program design: Program design. (line 6) * trailing data: Trailing data. (line 6) * usage: Invoking plzip. (line 6) @@ -634,19 +729,20 @@ Concept index  Tag Table: Node: Top221 -Node: Introduction1103 -Node: Invoking plzip5274 -Ref: --trailing-error5843 -Ref: --data-size6086 -Node: Program design12796 -Node: File format14383 -Node: Memory requirements16815 -Node: Minimum file sizes17815 -Node: Trailing data19741 -Node: Examples21648 -Ref: concat-example22813 -Node: Problems23388 -Node: Concept index23914 +Node: Introduction1158 +Node: Output5134 +Node: Invoking plzip6614 +Ref: --trailing-error7177 +Ref: --data-size7420 +Node: Program design14938 +Node: File format17090 +Node: Memory requirements19522 +Node: Minimum file sizes20985 +Node: Trailing data23002 +Node: Examples25285 +Ref: concat-example26450 +Node: Problems27025 +Node: Concept index27553  End Tag Table diff --git a/doc/plzip.texi b/doc/plzip.texi index 5f32f6e..44cff75 100644 --- a/doc/plzip.texi +++ b/doc/plzip.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 12 April 2017 -@set VERSION 1.6 +@set UPDATED 7 February 2018 +@set VERSION 1.7 @dircategory Data Compression @direntry @@ -36,6 +36,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}). 
@menu * Introduction:: Purpose and features of plzip +* Output:: Meaning of plzip's output * Invoking plzip:: Command line interface * Program design:: Internal structure of plzip * File format:: Detailed format of the compressed file @@ -48,7 +49,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}). @end menu @sp 1 -Copyright @copyright{} 2009-2017 Antonio Diaz Diaz. +Copyright @copyright{} 2009-2018 Antonio Diaz Diaz. This manual is free documentation: you have unlimited permission to copy, distribute and modify it. @@ -81,7 +82,7 @@ availability: The lzip format provides very safe integrity checking and some data recovery means. The @uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover} -program can repair bit-flip errors (one of the most common forms of data +program can repair bit flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked merging of damaged copies of a file. @ifnothtml @@ -143,9 +144,54 @@ incomprehensible and therefore pointless. Plzip will correctly decompress a file which is the concatenation of two or more compressed files. The result is the concatenation of the -corresponding uncompressed files. Integrity testing of concatenated +corresponding decompressed files. Integrity testing of concatenated compressed files is also supported. + +@node Output +@chapter Meaning of plzip's output +@cindex output + +The output of plzip looks like this: + +@example +plzip -v foo + foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out. + +plzip -tvv foo.lz + foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok +@end example + +The meaning of each field is as follows: + +@table @code +@item N:1 +The compression ratio @w{(uncompressed_size / compressed_size)}, shown +as N to 1. + +@item ratio +The inverse compression ratio @w{(compressed_size / uncompressed_size)}, +shown as a percentage. 
A decimal ratio is easily obtained by moving the +decimal point two places to the left; @w{14.98% = 0.1498}. + +@item saved +The space saved by compression @w{(1 - ratio)}, shown as a percentage. + +@item in +The size of the uncompressed data. When decompressing or testing, it is +shown as @code{decompressed}. Note that plzip always prints the +uncompressed size before the compressed size when compressing, +decompressing, testing or listing. + +@item out +The size of the compressed data. When decompressing or testing, it is +shown as @code{compressed}. + +@end table + +When decompressing or testing at verbosity level 4 (-vvvv), the +dictionary size used to compress the file is also shown. + LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have been compressed. Decompressed is used to refer to data which have undergone the process of decompression. @@ -169,7 +215,7 @@ plzip [@var{options}] [@var{files}] mixed with other @var{files} and is read just once, the first time it appears in the command line. -Plzip supports the following options: +plzip supports the following options: @table @code @item -h @@ -190,12 +236,12 @@ garbage that can be safely ignored. @xref{concat-example}. @anchor{--data-size} @item -B @var{bytes} @itemx --data-size=@var{bytes} -Set the size of the input data blocks, in bytes. The input file will be -divided in chunks of this size before compression is performed. Valid -values range from 8 KiB to 1 GiB. Default value is two times the -dictionary size, except for option @samp{-0} where it defaults to 1 MiB. -Plzip will reduce the dictionary size if it is larger than the chosen -data size. +When compressing, set the size of the input data blocks in bytes. The +input file will be divided in chunks of this size before compression is +performed. Valid values range from @w{8 KiB} to @w{1 GiB}. Default value +is two times the dictionary size, except for option @samp{-0} where it +defaults to @w{1 MiB}. 
Plzip will reduce the dictionary size if it is +larger than the chosen data size. @item -c @itemx --stdout @@ -206,10 +252,10 @@ device. @item -d @itemx --decompress -Decompress the specified file(s). If a file does not exist or can't be +Decompress the specified files. If a file does not exist or can't be opened, plzip continues decompressing the rest of the files. If a file -fails to decompress, plzip exits immediately without decompressing the -rest of the files. +fails to decompress, or is a terminal, plzip exits immediately without +decompressing the rest of the files. @item -f @itemx --force @@ -217,8 +263,8 @@ Force overwrite of output files. @item -F @itemx --recompress -Force re-compression of files whose name already has the @samp{.lz} or -@samp{.tlz} suffix. +When compressing, force re-compression of files whose name already has +the @samp{.lz} or @samp{.tlz} suffix. @item -k @itemx --keep @@ -227,7 +273,7 @@ Keep (don't delete) input files during compression or decompression. @item -l @itemx --list Print the uncompressed size, compressed size and percentage saved of the -specified file(s). Trailing data are ignored. The values produced are +specified files. Trailing data are ignored. The values produced are correct even for multimember files. If more than one file is given, a final line containing the cumulative sizes is printed. With @samp{-v}, the dictionary size, the number of members in the file, and the amount @@ -240,16 +286,21 @@ verifies that none of the specified files contain trailing data. @item -m @var{bytes} @itemx --match-length=@var{bytes} -Set the match length limit in bytes. After a match this long is found, -the search is finished. Valid values range from 5 to 273. Larger values -usually give better compression ratios but longer compression times. +When compressing, set the match length limit in bytes. After a match +this long is found, the search is finished. Valid values range from 5 to +273. 
Larger values usually give better compression ratios but longer +compression times. @item -n @var{n} @itemx --threads=@var{n} -Set the number of worker threads. Valid values range from 1 to "as many -as your system can support". If this option is not used, plzip tries to -detect the number of processors in the system and use it as default -value. @w{@samp{plzip --help}} shows the system's default value. +Set the number of worker threads, overriding the system's default. Valid +values range from 1 to "as many as your system can support". If this +option is not used, plzip tries to detect the number of processors in +the system and use it as default value. When compressing on a @w{32 bit} +system, plzip tries to limit the memory use to under @w{2.22 GiB} (4 +worker threads at level -9) by reducing the number of threads below the +system's default. @w{@samp{plzip --help}} shows the system's default +value. Note that the number of usable threads is limited to @w{ceil( file_size / data_size )} during compression (@pxref{Minimum file sizes}), and to @@ -260,7 +311,9 @@ the number of members in the input during decompression. When reading from standard input and @samp{--stdout} has not been specified, use @samp{@var{file}} as the virtual name of the uncompressed file. This produces a file named @samp{@var{file}} when decompressing, -and a file named @samp{@var{file}.lz} when compressing. +or a file named @samp{@var{file}.lz} when compressing. A second +@samp{.lz} extension is not added if @samp{@var{file}} already ends in +@samp{.lz} or @samp{.tlz}. @item -q @itemx --quiet @@ -268,12 +321,12 @@ Quiet operation. Suppress all messages. @item -s @var{bytes} @itemx --dictionary-size=@var{bytes} -Set the dictionary size limit in bytes. Plzip will use the smallest -possible dictionary size for each file without exceeding this limit. -Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are -interpreted as powers of two, meaning 2^12 to 2^29 bytes. 
Note that -dictionary sizes are quantized. If the specified size does not match one -of the valid sizes, it will be rounded upwards by adding up to +When compressing, set the dictionary size limit in bytes. Plzip will use +the smallest possible dictionary size for each file without exceeding +this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12 +to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note +that dictionary sizes are quantized. If the specified size does not +match one of the valid sizes, it will be rounded upwards by adding up to @w{(@var{bytes} / 8)} to it. For maximum compression you should use a dictionary size limit as large @@ -282,27 +335,29 @@ is affected at compression time by the choice of dictionary size limit. @item -t @itemx --test -Check integrity of the specified file(s), but don't decompress them. -This really performs a trial decompression and throws away the result. -Use it together with @samp{-v} to see information about the file(s). If -a file does not exist, can't be opened, or is a terminal, plzip -continues checking the rest of the files. If a file fails the test, -plzip may be unable to check the rest of the files. +Check integrity of the specified files, but don't decompress them. This +really performs a trial decompression and throws away the result. Use it +together with @samp{-v} to see information about the files. If a file +does not exist, can't be opened, or is a terminal, plzip continues +checking the rest of the files. If a file fails the test, plzip may be +unable to check the rest of the files. @item -v @itemx --verbose Verbose mode.@* -When compressing, show the compression ratio for each file processed. 
A -second @samp{-v} shows the progress of compression.@* +When compressing, show the compression ratio and size for each file +processed.@* When decompressing or testing, further -v's (up to 4) increase the verbosity level, showing status, compression ratio, dictionary size, -decompressed size, and compressed size. +decompressed size, and compressed size.@* +Two or more @samp{-v} options show the progress of (de)compression, +except for single-member files. @item -0 .. -9 Set the compression parameters (dictionary size and match length limit) as shown in the table below. The default compression level is @samp{-6}. Note that @samp{-9} can be much slower than @samp{-0}. These options -have no effect when decompressing. +have no effect when decompressing, testing or listing. The bidimensional parameter space of LZMA can't be mapped to a linear scale optimal for all files. If your files are large, very repetitive, @@ -327,6 +382,12 @@ etc, you may need to use the @samp{--dictionary-size} and @itemx --best Aliases for GNU gzip compatibility. +@item --loose-trailing +When decompressing, testing or listing, allow trailing data whose first +bytes are so similar to the magic bytes of a lzip header that they can +be confused with a corrupt header. Use this option if a file triggers a +"corrupt header" error and the cause is not indeed a corrupt header. + @end table Numbers given as arguments to options may be followed by a multiplier @@ -363,8 +424,8 @@ creating a multimember compressed file. When decompressing, plzip decompresses as many members simultaneously as worker threads are chosen. Files that were compressed with lzip will not -be decompressed faster than using lzip (unless the @samp{-b} option was -used) because lzip usually produces single-member files, which can't be +be decompressed faster than using lzip (unless the @samp{-b} option was used) +because lzip usually produces single-member files, which can't be decompressed in parallel. 
For each input file, a splitter thread and several worker threads are @@ -377,6 +438,19 @@ to the workers. The workers (de)compress the blocks received from the splitter. The muxer collects processed packets from the workers, and writes them to the output file.
+@verbatim
+                             ,------------,
+                         ,-->| worker 0   |--,
+                         |   `------------'  |
+,-------,   ,----------, |   ,------------,  |   ,-------,   ,--------,
+| input |-->| splitter |-+-->| worker 1   |--+-->| muxer |-->| output |
+| file  |   `----------' |   `------------'  |   `-------'   | file   |
+`-------'                |        ...        |               `--------'
+                         |   ,------------,  |
+                         `-->| worker N-1 |--'
+                             `------------'
+@end verbatim
+
 When decompressing from a regular file, the splitter is removed and the workers read directly from the input file. If the output file is also a regular file, the muxer is also removed and the workers write directly @@ -472,35 +546,60 @@ facilitates safe recovery of undamaged members from multimember files. @chapter Memory required to compress and decompress @cindex memory requirements -The amount of memory required @strong{per thread} is approximately the -following: +The amount of memory required @strong{per thread} for decompression or +testing is approximately the following: @itemize @bullet -@item -For compression at level -0; 1.5 MiB plus 3 times the data size -(@pxref{--data-size}). Default is 4.5 MiB. - -@item -For compression at other levels; 11 times the dictionary size plus 3 -times the data size. Default is 136 MiB. - @item For decompression of a regular (seekable) file to another regular file, or for testing of a regular file; the dictionary size. @item For testing of a non-seekable file or of standard input; the dictionary -size plus up to 5 MiB. +size plus up to @w{5 MiB}. @item For decompression of a regular file to a non-seekable file or to -standard output; the dictionary size plus up to 32 MiB. +standard output; the dictionary size plus up to @w{32 MiB}.
@item For decompression of a non-seekable file or of standard input; the -dictionary size plus up to 35 MiB. +dictionary size plus up to @w{35 MiB}. +@end itemize + +@noindent +The amount of memory required @strong{per thread} for compression is +approximately the following: + +@itemize @bullet +@item +For compression at level -0; @w{1.5 MiB} plus 3.375 times the data size +(@pxref{--data-size}). Default is @w{4.875 MiB}. + +@item +For compression at other levels; 11 times the dictionary size plus 3.375 +times the data size. Default is @w{142 MiB}. @end itemize +@noindent +The following table shows the memory required @strong{per thread} for +compression at a given level, using the default data size for each +level: + +@multitable {Level} {Memory required} +@item Level @tab Memory required +@item -0 @tab 4.875 MiB +@item -1 @tab 17.75 MiB +@item -2 @tab 26.625 MiB +@item -3 @tab 35.5 MiB +@item -4 @tab 53.25 MiB +@item -5 @tab 71 MiB +@item -6 @tab 142 MiB +@item -7 @tab 284 MiB +@item -8 @tab 426 MiB +@item -9 @tab 568 MiB +@end multitable + @node Minimum file sizes @chapter Minimum file sizes required for full compression speed @@ -516,7 +615,8 @@ least as large as the number of worker threads times the chunk size (@pxref{--data-size}). Else some processors will not get any data to compress, and compression will be proportionally slower. The maximum speed increase achievable on a given file is limited by the ratio -@w{(file_size / data_size)}. +@w{(file_size / data_size)}. For example, a tarball the size of gcc or +linux will scale up to 8 processors at level -9. The following table shows the minimum uncompressed file size needed for full use of N processors at a given compression level, using the default @@ -554,9 +654,10 @@ padding zero bytes to a lzip file. @item Useful data added by the user; a cryptographically secure hash, a description of file contents, etc. 
It is safe to append any amount of -text to a lzip file as long as the text does not begin with the string -"LZIP", and does not contain any zero bytes (null characters). Nonzero -bytes and zero bytes can't be safely mixed in trailing data. +text to a lzip file as long as none of the first four bytes of the text +match the corresponding byte in the string "LZIP", and the text does not +contain any zero bytes (null characters). Nonzero bytes and zero bytes +can't be safely mixed in trailing data. @item Garbage added by some not totally successful copy operation. @@ -566,12 +667,16 @@ Malicious data added to the file in order to make its total size and hash value (for a chosen hash) coincide with those of another file. @item -In very rare cases, trailing data could be the corrupt header of another +In rare cases, trailing data could be the corrupt header of another member. In multimember or concatenated files the probability of corruption happening in the magic bytes is 5 times smaller than the probability of getting a false positive caused by the corruption of the integrity information itself. Therefore it can be considered to be below -the noise level. +the noise level. Additionally, the test used by plzip to discriminate +trailing data from a corrupt header has a Hamming distance (HD) of 3, +and the 3 bit flips must happen in different magic bytes for the test to +fail. In any case, the option @samp{--trailing-error} guarantees that +any corrupt header will be detected. @end itemize Trailing data are in no way part of the lzip file format, but tools @@ -607,7 +712,7 @@ plzip -v file @sp 1 @noindent Example 2: Like example 1 but the created @samp{file.lz} has a block -size of 1 MiB. The compression ratio is not shown. +size of @w{1 MiB}. The compression ratio is not shown. 
@example plzip -B 1MiB file @@ -656,7 +761,7 @@ Do this instead @sp 1 @noindent -Example 7: Decompress @samp{file.lz} partially until 10 KiB of +Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of decompressed data are produced. @example -- cgit v1.2.3