diff options
author | Daniel Baumann <mail@daniel-baumann.ch> | 2015-11-06 12:53:27 +0000 |
---|---|---|
committer | Daniel Baumann <mail@daniel-baumann.ch> | 2015-11-06 12:53:27 +0000 |
commit | 06a01dc7a04d92c60210008435950afa9b6c0a69 (patch) | |
tree | 7cb6896a8a1e8982a513e388554a59adac90a184 /doc/clzip.texi | |
parent | Adding debian version 1.7~rc1-1. (diff) | |
download | clzip-06a01dc7a04d92c60210008435950afa9b6c0a69.tar.xz clzip-06a01dc7a04d92c60210008435950afa9b6c0a69.zip |
Merging upstream version 1.7.
Signed-off-by: Daniel Baumann <mail@daniel-baumann.ch>
Diffstat (limited to 'doc/clzip.texi')
-rw-r--r-- | doc/clzip.texi | 162 |
1 files changed, 83 insertions, 79 deletions
diff --git a/doc/clzip.texi b/doc/clzip.texi index a74ec6f..e2ca889 100644 --- a/doc/clzip.texi +++ b/doc/clzip.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 23 May 2015 -@set VERSION 1.7-rc1 +@set UPDATED 7 July 2015 +@set VERSION 1.7 @dircategory Data Compression @direntry @@ -36,9 +36,9 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}). @menu * Introduction:: Purpose and features of clzip -* Algorithm:: How clzip compresses the data * Invoking clzip:: Command line interface * File format:: Detailed format of the compressed file +* Algorithm:: How clzip compresses the data * Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts @@ -72,10 +72,14 @@ availability: @itemize @bullet @item The lzip format provides very safe integrity checking and some data -recovery means. The lziprecover program can repair bit-flip errors (one -of the most common forms of data corruption) in lzip files, and provides -data recovery capabilities, including error-checked merging of damaged -copies of a file. +recovery means. The +@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover} +program can repair bit-flip errors (one of the most common forms of data +corruption) in lzip files, and provides data recovery capabilities, +including error-checked merging of damaged copies of a file. +@ifnothtml +@ref{Data safety,,,lziprecover}. +@end ifnothtml @item The lzip format is as simple as possible (but not simpler). The lzip @@ -111,6 +115,11 @@ bzip2, which makes it safer than compressors returning ambiguous warning values (like gzip) when it is used as a back end for other programs like tar or zutils. +Clzip will automatically use the smallest possible dictionary size for +each file without exceeding the given limit. Keep in mind that the +decompression memory requirement is affected at compression time by the +choice of dictionary size limit. + The amount of memory required for compression is about 1 or 2 times the dictionary size limit (1 if input file size is less than dictionary size limit, else 2) plus 9 times the dictionary size really used. The option @@ -118,11 +127,6 @@ limit, else 2) plus 9 times the dictionary size really used. The option of memory required for decompression is about 46 kB larger than the dictionary size really used. -Clzip will automatically use the smallest possible dictionary size for -each file without exceeding the given limit. Keep in mind that the -decompression memory requirement is affected at compression time by the -choice of dictionary size limit. - When compressing, clzip replaces every file given in the command line with a compressed version of itself, with the name "original_name.lz". When decompressing, clzip attempts to guess the name for the decompressed @@ -164,72 +168,6 @@ automatically creating multi-member output. The members so created are large, about 2 PiB each. -@node Algorithm -@chapter Algorithm -@cindex algorithm - -In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a -concrete algorithm; it is more like "any algorithm using the LZMA coding -scheme". For example, the option '-0' of lzip uses the scheme in almost -the simplest way possible; issuing the longest match it can find, or a -literal byte if it can't find a match. Inversely, a much more elaborated -way of finding coding sequences of minimum size than the one currently -used by lzip could be developed, and the resulting sequence could also -be coded using the LZMA coding scheme. - -Clzip currently implements two variants of the LZMA algorithm; fast -(used by option -0) and normal (used by all other compression levels). - -The high compression of LZMA comes from combining two basic, well-proven -compression ideas: sliding dictionaries (LZ77/78) and markov models (the -thing used by every compression algorithm that uses a range encoder or -similar order-0 entropy coder as its last stage) with segregation of -contexts according to what the bits are used for. - -Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder, -which reduces redundancy by translating chunks of data to their -corresponding distance-length pairs. The second stage is a range encoder -that uses a different probability model for each type of data; -distances, lengths, literal bytes, etc. - -Here is how it works, step by step: - -1) The member header is written to the output stream. - -2) The first byte is coded literally, because there are no previous -bytes to which the match finder can refer to. - -3) The main encoder advances to the next byte in the input data and -calls the match finder. - -4) The match finder fills an array with the minimum distances before the -current byte where a match of a given length can be found. - -5) Go back to step 3 until a sequence (formed of pairs, repeated -distances and literal bytes) of minimum price has been formed. Where the -price represents the number of output bits produced. - -6) The range encoder encodes the sequence produced by the main encoder -and sends the produced bytes to the output stream. - -7) Go back to step 3 until the input data are finished or until the -member or volume size limits are reached. - -8) The range encoder is flushed. - -9) The member trailer is written to the output stream. - -10) If there are more data to compress, go back to step 1. - -@sp 1 -@noindent -The ideas embodied in clzip are due to (at least) the following people: -Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for -the definition of Markov chains), G.N.N. Martin (for the definition of -range encoding), Igor Pavlov (for putting all the above together in -LZMA), and Julian Seward (for bzip2's CLI). - - @node Invoking clzip @chapter Invoking clzip @cindex invoking @@ -276,7 +214,7 @@ Force overwrite of output files. @item -F @itemx --recompress -Force recompression of files whose name already has the @samp{.lz} or +Force re-compression of files whose name already has the @samp{.lz} or @samp{.tlz} suffix. @item -k @@ -476,6 +414,72 @@ facilitates safe recovery of undamaged members from multi-member files. @end table +@node Algorithm +@chapter Algorithm +@cindex algorithm + +In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option @samp{-0} of lzip uses the scheme in almost +the simplest way possible; issuing the longest match it can find, or a +literal byte if it can't find a match. Inversely, a much more elaborated +way of finding coding sequences of minimum size than the one currently +used by lzip could be developed, and the resulting sequence could also +be coded using the LZMA coding scheme. + +Clzip currently implements two variants of the LZMA algorithm; fast +(used by option @samp{-0}) and normal (used by all other compression levels). + +The high compression of LZMA comes from combining two basic, well-proven +compression ideas: sliding dictionaries (LZ77/78) and markov models (the +thing used by every compression algorithm that uses a range encoder or +similar order-0 entropy coder as its last stage) with segregation of +contexts according to what the bits are used for. + +Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder, +which reduces redundancy by translating chunks of data to their +corresponding distance-length pairs. The second stage is a range encoder +that uses a different probability model for each type of data; +distances, lengths, literal bytes, etc. + +Here is how it works, step by step: + +1) The member header is written to the output stream. + +2) The first byte is coded literally, because there are no previous +bytes to which the match finder can refer to. + +3) The main encoder advances to the next byte in the input data and +calls the match finder. + +4) The match finder fills an array with the minimum distances before the +current byte where a match of a given length can be found. + +5) Go back to step 3 until a sequence (formed of pairs, repeated +distances and literal bytes) of minimum price has been formed. Where the +price represents the number of output bits produced. + +6) The range encoder encodes the sequence produced by the main encoder +and sends the produced bytes to the output stream. + +7) Go back to step 3 until the input data are finished or until the +member or volume size limits are reached. + +8) The range encoder is flushed. + +9) The member trailer is written to the output stream. + +10) If there are more data to compress, go back to step 1. + +@sp 1 +@noindent +The ideas embodied in clzip are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for +the definition of Markov chains), G.N.N. Martin (for the definition of +range encoding), Igor Pavlov (for putting all the above together in +LZMA), and Julian Seward (for bzip2's CLI). + + @node Examples @chapter A small tutorial with examples @cindex examples |