Diffstat (limited to 'doc/clzip.info')
-rw-r--r-- | doc/clzip.info | 177 |
1 files changed, 89 insertions, 88 deletions
diff --git a/doc/clzip.info b/doc/clzip.info index b66195e..786d8c1 100644 --- a/doc/clzip.info +++ b/doc/clzip.info @@ -11,14 +11,14 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir) Clzip Manual ************ -This manual is for Clzip (version 1.7-rc1, 23 May 2015). +This manual is for Clzip (version 1.7, 7 July 2015). * Menu: * Introduction:: Purpose and features of clzip -* Algorithm:: How clzip compresses the data * Invoking clzip:: Command line interface * File format:: Detailed format of the compressed file +* Algorithm:: How clzip compresses the data * Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts @@ -30,7 +30,7 @@ This manual is for Clzip (version 1.7-rc1, 23 May 2015). copy, distribute and modify it. -File: clzip.info, Node: Introduction, Next: Algorithm, Prev: Top, Up: Top +File: clzip.info, Node: Introduction, Next: Invoking clzip, Prev: Top, Up: Top 1 Introduction ************** @@ -53,7 +53,8 @@ availability: recovery means. The lziprecover program can repair bit-flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked - merging of damaged copies of a file. + merging of damaged copies of a file. *note Data safety: + (lziprecover)Data safety. * The lzip format is as simple as possible (but not simpler). The lzip manual provides the code of a simple decompressor along with @@ -87,6 +88,11 @@ bzip2, which makes it safer than compressors returning ambiguous warning values (like gzip) when it is used as a back end for other programs like tar or zutils. + Clzip will automatically use the smallest possible dictionary size +for each file without exceeding the given limit. Keep in mind that the +decompression memory requirement is affected at compression time by the +choice of dictionary size limit. 
+ The amount of memory required for compression is about 1 or 2 times the dictionary size limit (1 if input file size is less than dictionary size limit, else 2) plus 9 times the dictionary size really used. The @@ -94,11 +100,6 @@ option '-0' is special and only requires about 1.5 MiB at most. The amount of memory required for decompression is about 46 kB larger than the dictionary size really used. - Clzip will automatically use the smallest possible dictionary size -for each file without exceeding the given limit. Keep in mind that the -decompression memory requirement is affected at compression time by the -choice of dictionary size limit. - When compressing, clzip replaces every file given in the command line with a compressed version of itself, with the name "original_name.lz". When decompressing, clzip attempts to guess the name for the @@ -138,75 +139,9 @@ automatically creating multi-member output. The members so created are large, about 2 PiB each. -File: clzip.info, Node: Algorithm, Next: Invoking clzip, Prev: Introduction, Up: Top - -2 Algorithm -*********** - -In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a -concrete algorithm; it is more like "any algorithm using the LZMA coding -scheme". For example, the option '-0' of lzip uses the scheme in almost -the simplest way possible; issuing the longest match it can find, or a -literal byte if it can't find a match. Inversely, a much more elaborated -way of finding coding sequences of minimum size than the one currently -used by lzip could be developed, and the resulting sequence could also -be coded using the LZMA coding scheme. - - Clzip currently implements two variants of the LZMA algorithm; fast -(used by option -0) and normal (used by all other compression levels). 
- - The high compression of LZMA comes from combining two basic, -well-proven compression ideas: sliding dictionaries (LZ77/78) and -markov models (the thing used by every compression algorithm that uses -a range encoder or similar order-0 entropy coder as its last stage) -with segregation of contexts according to what the bits are used for. - - Clzip is a two stage compressor. The first stage is a Lempel-Ziv -coder, which reduces redundancy by translating chunks of data to their -corresponding distance-length pairs. The second stage is a range encoder -that uses a different probability model for each type of data; -distances, lengths, literal bytes, etc. - - Here is how it works, step by step: - - 1) The member header is written to the output stream. - - 2) The first byte is coded literally, because there are no previous -bytes to which the match finder can refer to. - - 3) The main encoder advances to the next byte in the input data and -calls the match finder. - - 4) The match finder fills an array with the minimum distances before -the current byte where a match of a given length can be found. - - 5) Go back to step 3 until a sequence (formed of pairs, repeated -distances and literal bytes) of minimum price has been formed. Where the -price represents the number of output bits produced. - - 6) The range encoder encodes the sequence produced by the main -encoder and sends the produced bytes to the output stream. - - 7) Go back to step 3 until the input data are finished or until the -member or volume size limits are reached. - - 8) The range encoder is flushed. - - 9) The member trailer is written to the output stream. - - 10) If there are more data to compress, go back to step 1. - - -The ideas embodied in clzip are due to (at least) the following people: -Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for -the definition of Markov chains), G.N.N. 
Martin (for the definition of -range encoding), Igor Pavlov (for putting all the above together in -LZMA), and Julian Seward (for bzip2's CLI). - - -File: clzip.info, Node: Invoking clzip, Next: File format, Prev: Algorithm, Up: Top +File: clzip.info, Node: Invoking clzip, Next: File format, Prev: Introduction, Up: Top -3 Invoking clzip +2 Invoking clzip **************** The format for running clzip is: @@ -246,7 +181,7 @@ The format for running clzip is: '-F' '--recompress' - Force recompression of files whose name already has the '.lz' or + Force re-compression of files whose name already has the '.lz' or '.tlz' suffix. '-k' @@ -363,9 +298,9 @@ invalid input file, 3 for an internal consistency error (eg, bug) which caused clzip to panic. -File: clzip.info, Node: File format, Next: Examples, Prev: Invoking clzip, Up: Top +File: clzip.info, Node: File format, Next: Algorithm, Prev: Invoking clzip, Up: Top -4 File format +3 File format ************* Perfection is reached, not when there is no longer anything to add, but @@ -434,7 +369,73 @@ additional information before, between, or after them. -File: clzip.info, Node: Examples, Next: Problems, Prev: File format, Up: Top +File: clzip.info, Node: Algorithm, Next: Examples, Prev: File format, Up: Top + +4 Algorithm +*********** + +In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option '-0' of lzip uses the scheme in almost +the simplest way possible; issuing the longest match it can find, or a +literal byte if it can't find a match. Inversely, a much more elaborated +way of finding coding sequences of minimum size than the one currently +used by lzip could be developed, and the resulting sequence could also +be coded using the LZMA coding scheme. + + Clzip currently implements two variants of the LZMA algorithm; fast +(used by option '-0') and normal (used by all other compression levels). 
+ + The high compression of LZMA comes from combining two basic, +well-proven compression ideas: sliding dictionaries (LZ77/78) and +markov models (the thing used by every compression algorithm that uses +a range encoder or similar order-0 entropy coder as its last stage) +with segregation of contexts according to what the bits are used for. + + Clzip is a two stage compressor. The first stage is a Lempel-Ziv +coder, which reduces redundancy by translating chunks of data to their +corresponding distance-length pairs. The second stage is a range encoder +that uses a different probability model for each type of data; +distances, lengths, literal bytes, etc. + + Here is how it works, step by step: + + 1) The member header is written to the output stream. + + 2) The first byte is coded literally, because there are no previous +bytes to which the match finder can refer to. + + 3) The main encoder advances to the next byte in the input data and +calls the match finder. + + 4) The match finder fills an array with the minimum distances before +the current byte where a match of a given length can be found. + + 5) Go back to step 3 until a sequence (formed of pairs, repeated +distances and literal bytes) of minimum price has been formed. Where the +price represents the number of output bits produced. + + 6) The range encoder encodes the sequence produced by the main +encoder and sends the produced bytes to the output stream. + + 7) Go back to step 3 until the input data are finished or until the +member or volume size limits are reached. + + 8) The range encoder is flushed. + + 9) The member trailer is written to the output stream. + + 10) If there are more data to compress, go back to step 1. + + +The ideas embodied in clzip are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for +the definition of Markov chains), G.N.N. 
Martin (for the definition of +range encoding), Igor Pavlov (for putting all the above together in +LZMA), and Julian Seward (for bzip2's CLI). + + +File: clzip.info, Node: Examples, Next: Problems, Prev: Algorithm, Up: Top 5 A small tutorial with examples ******************************** @@ -545,13 +546,13 @@ Concept index Tag Table: Node: Top210 -Node: Introduction897 -Node: Algorithm6100 -Node: Invoking clzip8930 -Node: File format14479 -Node: Examples16881 -Node: Problems18850 -Node: Concept index19376 +Node: Introduction893 +Node: Invoking clzip6152 +Node: File format11705 +Node: Algorithm14108 +Node: Examples16933 +Node: Problems18900 +Node: Concept index19426 End Tag Table