Diffstat (limited to 'doc/plzip.info')
-rw-r--r-- | doc/plzip.info | 268 |
1 files changed, 182 insertions, 86 deletions
diff --git a/doc/plzip.info b/doc/plzip.info index cf53f13..c8d7387 100644 --- a/doc/plzip.info +++ b/doc/plzip.info @@ -11,11 +11,12 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir) Plzip Manual ************ -This manual is for Plzip (version 1.6, 12 April 2017). +This manual is for Plzip (version 1.7, 7 February 2018). * Menu: * Introduction:: Purpose and features of plzip +* Output:: Meaning of plzip's output * Invoking plzip:: Command line interface * Program design:: Internal structure of plzip * File format:: Detailed format of the compressed file @@ -27,13 +28,13 @@ This manual is for Plzip (version 1.6, 12 April 2017). * Concept index:: Index of concepts - Copyright (C) 2009-2017 Antonio Diaz Diaz. + Copyright (C) 2009-2018 Antonio Diaz Diaz. This manual is free documentation: you have unlimited permission to copy, distribute and modify it. -File: plzip.info, Node: Introduction, Next: Invoking plzip, Prev: Top, Up: Top +File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top 1 Introduction ************** @@ -58,7 +59,7 @@ archiving, taking into account both data integrity and decoder availability: * The lzip format provides very safe integrity checking and some data - recovery means. The lziprecover program can repair bit-flip errors + recovery means. The lziprecover program can repair bit flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked merging of damaged copies of a file. *Note Data safety: @@ -114,17 +115,60 @@ entirely incomprehensible and therefore pointless. Plzip will correctly decompress a file which is the concatenation of two or more compressed files. The result is the concatenation of the -corresponding uncompressed files. Integrity testing of concatenated +corresponding decompressed files. Integrity testing of concatenated compressed files is also supported. 
+ +File: plzip.info, Node: Output, Next: Invoking plzip, Prev: Introduction, Up: Top + +2 Meaning of plzip's output +*************************** + +The output of plzip looks like this: + + plzip -v foo + foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out. + + plzip -tvv foo.lz + foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok + + The meaning of each field is as follows: + +'N:1' + The compression ratio (uncompressed_size / compressed_size), shown + as N to 1. + +'ratio' + The inverse compression ratio + (compressed_size / uncompressed_size), shown as a percentage. A + decimal ratio is easily obtained by moving the decimal point two + places to the left; 14.98% = 0.1498. + +'saved' + The space saved by compression (1 - ratio), shown as a percentage. + +'in' + The size of the uncompressed data. When decompressing or testing, + it is shown as 'decompressed'. Note that plzip always prints the + uncompressed size before the compressed size when compressing, + decompressing, testing or listing. + +'out' + The size of the compressed data. When decompressing or testing, it + is shown as 'compressed'. + + + When decompressing or testing at verbosity level 4 (-vvvv), the +dictionary size used to compress the file is also shown. + LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have been compressed. Decompressed is used to refer to data which have undergone the process of decompression. -File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Introduction, Up: Top +File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top -2 Invoking plzip +3 Invoking plzip **************** The format for running plzip is: @@ -135,7 +179,7 @@ The format for running plzip is: other FILES and is read just once, the first time it appears in the command line. - Plzip supports the following options: + plzip supports the following options: '-h' '--help' @@ -154,12 +198,12 @@ command line. 
'-B BYTES' '--data-size=BYTES' - Set the size of the input data blocks, in bytes. The input file - will be divided in chunks of this size before compression is - performed. Valid values range from 8 KiB to 1 GiB. Default value - is two times the dictionary size, except for option '-0' where it - defaults to 1 MiB. Plzip will reduce the dictionary size if it is - larger than the chosen data size. + When compressing, set the size of the input data blocks in bytes. + The input file will be divided in chunks of this size before + compression is performed. Valid values range from 8 KiB to 1 GiB. + Default value is two times the dictionary size, except for option + '-0' where it defaults to 1 MiB. Plzip will reduce the dictionary + size if it is larger than the chosen data size. '-c' '--stdout' @@ -170,10 +214,10 @@ command line. '-d' '--decompress' - Decompress the specified file(s). If a file does not exist or - can't be opened, plzip continues decompressing the rest of the - files. If a file fails to decompress, plzip exits immediately - without decompressing the rest of the files. + Decompress the specified files. If a file does not exist or can't + be opened, plzip continues decompressing the rest of the files. If + a file fails to decompress, or is a terminal, plzip exits + immediately without decompressing the rest of the files. '-f' '--force' @@ -181,8 +225,8 @@ command line. '-F' '--recompress' - Force re-compression of files whose name already has the '.lz' or - '.tlz' suffix. + When compressing, force re-compression of files whose name already + has the '.lz' or '.tlz' suffix. '-k' '--keep' @@ -192,7 +236,7 @@ command line. '-l' '--list' Print the uncompressed size, compressed size and percentage saved - of the specified file(s). Trailing data are ignored. The values + of the specified files. Trailing data are ignored. The values produced are correct even for multimember files. 
If more than one file is given, a final line containing the cumulative sizes is printed. With '-v', the dictionary size, the number of members in @@ -206,18 +250,21 @@ command line. '-m BYTES' '--match-length=BYTES' - Set the match length limit in bytes. After a match this long is - found, the search is finished. Valid values range from 5 to 273. - Larger values usually give better compression ratios but longer - compression times. + When compressing, set the match length limit in bytes. After a + match this long is found, the search is finished. Valid values + range from 5 to 273. Larger values usually give better compression + ratios but longer compression times. '-n N' '--threads=N' - Set the number of worker threads. Valid values range from 1 to "as - many as your system can support". If this option is not used, - plzip tries to detect the number of processors in the system and - use it as default value. 'plzip --help' shows the system's default - value. + Set the number of worker threads, overriding the system's default. + Valid values range from 1 to "as many as your system can support". + If this option is not used, plzip tries to detect the number of + processors in the system and use it as default value. When + compressing on a 32 bit system, plzip tries to limit the memory + use to under 2.22 GiB (4 worker threads at level -9) by reducing + the number of threads below the system's default. 'plzip --help' + shows the system's default value. Note that the number of usable threads is limited to ceil( file_size / data_size ) during compression (*note Minimum @@ -228,8 +275,9 @@ command line. '--output=FILE' When reading from standard input and '--stdout' has not been specified, use 'FILE' as the virtual name of the uncompressed - file. This produces a file named 'FILE' when decompressing, and a - file named 'FILE.lz' when compressing. + file. This produces a file named 'FILE' when decompressing, or a + file named 'FILE.lz' when compressing. 
A second '.lz' extension is + not added if 'FILE' already ends in '.lz' or '.tlz'. '-q' '--quiet' @@ -237,13 +285,13 @@ command line. '-s BYTES' '--dictionary-size=BYTES' - Set the dictionary size limit in bytes. Plzip will use the smallest - possible dictionary size for each file without exceeding this - limit. Valid values range from 4 KiB to 512 MiB. Values 12 to 29 - are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note - that dictionary sizes are quantized. If the specified size does - not match one of the valid sizes, it will be rounded upwards by - adding up to (BYTES / 8) to it. + When compressing, set the dictionary size limit in bytes. Plzip + will use the smallest possible dictionary size for each file + without exceeding this limit. Valid values range from 4 KiB to + 512 MiB. Values 12 to 29 are interpreted as powers of two, meaning + 2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If + the specified size does not match one of the valid sizes, it will + be rounded upwards by adding up to (BYTES / 8) to it. For maximum compression you should use a dictionary size limit as large as possible, but keep in mind that the decompression memory @@ -252,10 +300,10 @@ command line. '-t' '--test' - Check integrity of the specified file(s), but don't decompress - them. This really performs a trial decompression and throws away - the result. Use it together with '-v' to see information about - the file(s). If a file does not exist, can't be opened, or is a + Check integrity of the specified files, but don't decompress them. + This really performs a trial decompression and throws away the + result. Use it together with '-v' to see information about the + files. If a file does not exist, can't be opened, or is a terminal, plzip continues checking the rest of the files. If a file fails the test, plzip may be unable to check the rest of the files. @@ -263,17 +311,19 @@ command line. '-v' '--verbose' Verbose mode. 
- When compressing, show the compression ratio for each file - processed. A second '-v' shows the progress of compression. + When compressing, show the compression ratio and size for each file + processed. When decompressing or testing, further -v's (up to 4) increase the verbosity level, showing status, compression ratio, dictionary size, decompressed size, and compressed size. + Two or more '-v' options show the progress of (de)compression, + except for single-member files. '-0 .. -9' Set the compression parameters (dictionary size and match length limit) as shown in the table below. The default compression level is '-6'. Note that '-9' can be much slower than '-0'. These - options have no effect when decompressing. + options have no effect when decompressing, testing or listing. The bidimensional parameter space of LZMA can't be mapped to a linear scale optimal for all files. If your files are large, very @@ -296,6 +346,13 @@ command line. '--best' Aliases for GNU gzip compatibility. +'--loose-trailing' + When decompressing, testing or listing, allow trailing data whose + first bytes are so similar to the magic bytes of a lzip header + that they can be confused with a corrupt header. Use this option + if a file triggers a "corrupt header" error and the cause is not + indeed a corrupt header. + Numbers given as arguments to options may be followed by a multiplier and an optional 'B' for "byte". @@ -321,7 +378,7 @@ caused plzip to panic. File: plzip.info, Node: Program design, Next: File format, Prev: Invoking plzip, Up: Top -3 Program design +4 Program design **************** When compressing, plzip divides the input file into chunks and @@ -344,6 +401,17 @@ them to the workers. The workers (de)compress the blocks received from the splitter. The muxer collects processed packets from the workers, and writes them to the output file. 
+ ,------------, + ,-->| worker 0 |--, + | `------------' | +,-------, ,----------, | ,------------, | ,-------, ,--------, +| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output | +| file | `----------' | `------------' | `-------' | file | +`-------' | ... | `--------' + | ,------------, | + `-->| worker N-1 |--' + `------------' + When decompressing from a regular file, the splitter is removed and the workers read directly from the input file. If the output file is also a regular file, the muxer is also removed and the workers write @@ -355,7 +423,7 @@ I/O speed. File: plzip.info, Node: File format, Next: Memory requirements, Prev: Program design, Up: Top -4 File format +5 File format ************* Perfection is reached, not when there is no longer anything to add, but @@ -426,17 +494,11 @@ additional information before, between, or after them. File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: File format, Up: Top -5 Memory required to compress and decompress +6 Memory required to compress and decompress ******************************************** -The amount of memory required *per thread* is approximately the -following: - - * For compression at level -0; 1.5 MiB plus 3 times the data size - (*note --data-size::). Default is 4.5 MiB. - - * For compression at other levels; 11 times the dictionary size plus - 3 times the data size. Default is 136 MiB. +The amount of memory required *per thread* for decompression or testing +is approximately the following: * For decompression of a regular (seekable) file to another regular file, or for testing of a regular file; the dictionary size. @@ -450,10 +512,35 @@ following: * For decompression of a non-seekable file or of standard input; the dictionary size plus up to 35 MiB. +The amount of memory required *per thread* for compression is +approximately the following: + + * For compression at level -0; 1.5 MiB plus 3.375 times the data size + (*note --data-size::). 
Default is 4.875 MiB. + + * For compression at other levels; 11 times the dictionary size plus + 3.375 times the data size. Default is 142 MiB. + +The following table shows the memory required *per thread* for +compression at a given level, using the default data size for each +level: + +Level Memory required +-0 4.875 MiB +-1 17.75 MiB +-2 26.625 MiB +-3 35.5 MiB +-4 53.25 MiB +-5 71 MiB +-6 142 MiB +-7 284 MiB +-8 426 MiB +-9 568 MiB + File: plzip.info, Node: Minimum file sizes, Next: Trailing data, Prev: Memory requirements, Up: Top -6 Minimum file sizes required for full compression speed +7 Minimum file sizes required for full compression speed ******************************************************** When compressing, plzip divides the input file into chunks and @@ -466,7 +553,8 @@ must be at least as large as the number of worker threads times the chunk size (*note --data-size::). Else some processors will not get any data to compress, and compression will be proportionally slower. The maximum speed increase achievable on a given file is limited by the -ratio (file_size / data_size). +ratio (file_size / data_size). For example, a tarball the size of gcc or +linux will scale up to 8 processors at level -9. The following table shows the minimum uncompressed file size needed for full use of N processors at a given compression level, using the @@ -489,7 +577,7 @@ Level File: plzip.info, Node: Trailing data, Next: Examples, Prev: Minimum file sizes, Up: Top -7 Extra data appended to the file +8 Extra data appended to the file ********************************* Sometimes extra data are found appended to a lzip file after the last @@ -501,10 +589,11 @@ member. Such trailing data may be: * Useful data added by the user; a cryptographically secure hash, a description of file contents, etc. It is safe to append any amount - of text to a lzip file as long as the text does not begin with the - string "LZIP", and does not contain any zero bytes (null - characters). 
Nonzero bytes and zero bytes can't be safely mixed in - trailing data. + of text to a lzip file as long as none of the first four bytes of + the text match the corresponding byte in the string "LZIP", and + the text does not contain any zero bytes (null characters). + Nonzero bytes and zero bytes can't be safely mixed in trailing + data. * Garbage added by some not totally successful copy operation. @@ -512,12 +601,17 @@ member. Such trailing data may be: and hash value (for a chosen hash) coincide with those of another file. - * In very rare cases, trailing data could be the corrupt header of - another member. In multimember or concatenated files the - probability of corruption happening in the magic bytes is 5 times - smaller than the probability of getting a false positive caused by - the corruption of the integrity information itself. Therefore it - can be considered to be below the noise level. + * In rare cases, trailing data could be the corrupt header of another + member. In multimember or concatenated files the probability of + corruption happening in the magic bytes is 5 times smaller than the + probability of getting a false positive caused by the corruption + of the integrity information itself. Therefore it can be + considered to be below the noise level. Additionally, the test + used by plzip to discriminate trailing data from a corrupt header + has a Hamming distance (HD) of 3, and the 3 bit flips must happen + in different magic bytes for the test to fail. In any case, the + option '--trailing-error' guarantees that any corrupt header will + be detected. 
Trailing data are in no way part of the lzip file format, but tools reading lzip files are expected to behave as correctly and usefully as @@ -531,7 +625,7 @@ cases where a file containing trailing data must be rejected, the option File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top -8 A small tutorial with examples +9 A small tutorial with examples ******************************** WARNING! Even if plzip is bug-free, other causes may result in a corrupt @@ -595,8 +689,8 @@ to decompressed byte 15000 (5000 bytes are produced). File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top -9 Reporting bugs -**************** +10 Reporting bugs +***************** There are probably bugs in plzip. There are certainly errors and omissions in this manual. If you report them, they will get fixed. If @@ -625,6 +719,7 @@ Concept index * memory requirements: Memory requirements. (line 6) * minimum file sizes: Minimum file sizes. (line 6) * options: Invoking plzip. (line 6) +* output: Output. (line 6) * program design: Program design. (line 6) * trailing data: Trailing data. (line 6) * usage: Invoking plzip. (line 6) @@ -634,19 +729,20 @@ Concept index Tag Table: Node: Top221 -Node: Introduction1103 -Node: Invoking plzip5274 -Ref: --trailing-error5843 -Ref: --data-size6086 -Node: Program design12796 -Node: File format14383 -Node: Memory requirements16815 -Node: Minimum file sizes17815 -Node: Trailing data19741 -Node: Examples21648 -Ref: concat-example22813 -Node: Problems23388 -Node: Concept index23914 +Node: Introduction1158 +Node: Output5134 +Node: Invoking plzip6614 +Ref: --trailing-error7177 +Ref: --data-size7420 +Node: Program design14938 +Node: File format17090 +Node: Memory requirements19522 +Node: Minimum file sizes20985 +Node: Trailing data23002 +Node: Examples25285 +Ref: concat-example26450 +Node: Problems27025 +Node: Concept index27553 End Tag Table
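The figures this patch adds to the manual can be cross-checked with a few lines of arithmetic. The Python sketch below recomputes the fields of the sample output line in the new "Output" node ('N:1', 'ratio', 'saved') and the per-thread compression memory table in the "Memory requirements" node. It is not plzip's code. The function names are hypothetical, and the dictionary size for each level is an assumption based on lzip's usual level table, which does not appear in this diff.

```python
# Sketch only: recomputes numbers quoted in the patch; this is not plzip's code.

# ASSUMPTION: dictionary size in MiB for levels -1 .. -9 (not shown in the diff).
DICT_MIB = {1: 1, 2: 1.5, 3: 2, 4: 3, 5: 4, 6: 8, 7: 16, 8: 24, 9: 32}


def output_fields(uncompressed_size, compressed_size):
    """Format the 'N:1', 'ratio' and 'saved' fields as defined in the Output node."""
    n_to_1 = uncompressed_size / compressed_size          # compression ratio
    ratio = 100.0 * compressed_size / uncompressed_size   # inverse ratio, in percent
    saved = 100.0 - ratio                                  # (1 - ratio), in percent
    return f"{n_to_1:.3f}:1, {ratio:.2f}% ratio, {saved:.2f}% saved"


def compression_memory_mib(level):
    """Approximate per-thread compression memory in MiB, using default data sizes."""
    if level == 0:
        # Level -0: 1.5 MiB plus 3.375 times the data size (default 1 MiB).
        return 1.5 + 3.375 * 1.0
    dict_size = DICT_MIB[level]
    data_size = 2 * dict_size  # default data size is twice the dictionary size
    # Other levels: 11 times the dictionary size plus 3.375 times the data size.
    return 11 * dict_size + 3.375 * data_size


if __name__ == "__main__":
    print(output_fields(450560, 67493))
    for level in range(10):
        print(f"-{level}  {compression_memory_mib(level):g} MiB")
```

With the sizes from the sample line (450560 in, 67493 out), the first function gives 6.676:1, 14.98% ratio, 85.02% saved, matching the patch. Under the assumed dictionary sizes, the second reproduces the table values from 4.875 MiB at level -0 to 568 MiB at level -9.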