diff options
Diffstat (limited to '')
-rw-r--r-- | doc/clzip.1 | 4 | ||||
-rw-r--r-- | doc/clzip.info | 177 | ||||
-rw-r--r-- | doc/clzip.texi | 162 |
3 files changed, 174 insertions, 169 deletions
diff --git a/doc/clzip.1 b/doc/clzip.1 index cfc9050..32b3bde 100644 --- a/doc/clzip.1 +++ b/doc/clzip.1 @@ -1,5 +1,5 @@ .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1. -.TH CLZIP "1" "May 2015" "clzip 1.7-rc1" "User Commands" +.TH CLZIP "1" "July 2015" "clzip 1.7" "User Commands" .SH NAME clzip \- reduces the size of files .SH SYNOPSIS @@ -28,7 +28,7 @@ decompress overwrite existing output files .TP \fB\-F\fR, \fB\-\-recompress\fR -force recompression of compressed files +force re\-compression of compressed files .TP \fB\-k\fR, \fB\-\-keep\fR keep (don't delete) input files diff --git a/doc/clzip.info b/doc/clzip.info index b66195e..786d8c1 100644 --- a/doc/clzip.info +++ b/doc/clzip.info @@ -11,14 +11,14 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir) Clzip Manual ************ -This manual is for Clzip (version 1.7-rc1, 23 May 2015). +This manual is for Clzip (version 1.7, 7 July 2015). * Menu: * Introduction:: Purpose and features of clzip -* Algorithm:: How clzip compresses the data * Invoking clzip:: Command line interface * File format:: Detailed format of the compressed file +* Algorithm:: How clzip compresses the data * Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts @@ -30,7 +30,7 @@ This manual is for Clzip (version 1.7-rc1, 23 May 2015). copy, distribute and modify it. -File: clzip.info, Node: Introduction, Next: Algorithm, Prev: Top, Up: Top +File: clzip.info, Node: Introduction, Next: Invoking clzip, Prev: Top, Up: Top 1 Introduction ************** @@ -53,7 +53,8 @@ availability: recovery means. The lziprecover program can repair bit-flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked - merging of damaged copies of a file. + merging of damaged copies of a file. *note Data safety: + (lziprecover)Data safety. * The lzip format is as simple as possible (but not simpler). The lzip manual provides the code of a simple decompressor along with @@ -87,6 +88,11 @@ bzip2, which makes it safer than compressors returning ambiguous warning values (like gzip) when it is used as a back end for other programs like tar or zutils. + Clzip will automatically use the smallest possible dictionary size +for each file without exceeding the given limit. Keep in mind that the +decompression memory requirement is affected at compression time by the +choice of dictionary size limit. + The amount of memory required for compression is about 1 or 2 times the dictionary size limit (1 if input file size is less than dictionary size limit, else 2) plus 9 times the dictionary size really used. The @@ -94,11 +100,6 @@ option '-0' is special and only requires about 1.5 MiB at most. The amount of memory required for decompression is about 46 kB larger than the dictionary size really used. - Clzip will automatically use the smallest possible dictionary size -for each file without exceeding the given limit. Keep in mind that the -decompression memory requirement is affected at compression time by the -choice of dictionary size limit. - When compressing, clzip replaces every file given in the command line with a compressed version of itself, with the name "original_name.lz". When decompressing, clzip attempts to guess the name for the @@ -138,75 +139,9 @@ automatically creating multi-member output. The members so created are large, about 2 PiB each. -File: clzip.info, Node: Algorithm, Next: Invoking clzip, Prev: Introduction, Up: Top - -2 Algorithm -*********** - -In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a -concrete algorithm; it is more like "any algorithm using the LZMA coding -scheme". For example, the option '-0' of lzip uses the scheme in almost -the simplest way possible; issuing the longest match it can find, or a -literal byte if it can't find a match. Inversely, a much more elaborated -way of finding coding sequences of minimum size than the one currently -used by lzip could be developed, and the resulting sequence could also -be coded using the LZMA coding scheme. - - Clzip currently implements two variants of the LZMA algorithm; fast -(used by option -0) and normal (used by all other compression levels). - - The high compression of LZMA comes from combining two basic, -well-proven compression ideas: sliding dictionaries (LZ77/78) and -markov models (the thing used by every compression algorithm that uses -a range encoder or similar order-0 entropy coder as its last stage) -with segregation of contexts according to what the bits are used for. - - Clzip is a two stage compressor. The first stage is a Lempel-Ziv -coder, which reduces redundancy by translating chunks of data to their -corresponding distance-length pairs. The second stage is a range encoder -that uses a different probability model for each type of data; -distances, lengths, literal bytes, etc. - - Here is how it works, step by step: - - 1) The member header is written to the output stream. - - 2) The first byte is coded literally, because there are no previous -bytes to which the match finder can refer to. - - 3) The main encoder advances to the next byte in the input data and -calls the match finder. - - 4) The match finder fills an array with the minimum distances before -the current byte where a match of a given length can be found. - - 5) Go back to step 3 until a sequence (formed of pairs, repeated -distances and literal bytes) of minimum price has been formed. Where the -price represents the number of output bits produced. - - 6) The range encoder encodes the sequence produced by the main -encoder and sends the produced bytes to the output stream. - - 7) Go back to step 3 until the input data are finished or until the -member or volume size limits are reached. - - 8) The range encoder is flushed. - - 9) The member trailer is written to the output stream. - - 10) If there are more data to compress, go back to step 1. - - -The ideas embodied in clzip are due to (at least) the following people: -Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for -the definition of Markov chains), G.N.N. Martin (for the definition of -range encoding), Igor Pavlov (for putting all the above together in -LZMA), and Julian Seward (for bzip2's CLI). - - -File: clzip.info, Node: Invoking clzip, Next: File format, Prev: Algorithm, Up: Top +File: clzip.info, Node: Invoking clzip, Next: File format, Prev: Introduction, Up: Top -3 Invoking clzip +2 Invoking clzip **************** The format for running clzip is: @@ -246,7 +181,7 @@ The format for running clzip is: '-F' '--recompress' - Force recompression of files whose name already has the '.lz' or + Force re-compression of files whose name already has the '.lz' or '.tlz' suffix. '-k' @@ -363,9 +298,9 @@ invalid input file, 3 for an internal consistency error (eg, bug) which caused clzip to panic. -File: clzip.info, Node: File format, Next: Examples, Prev: Invoking clzip, Up: Top +File: clzip.info, Node: File format, Next: Algorithm, Prev: Invoking clzip, Up: Top -4 File format +3 File format ************* Perfection is reached, not when there is no longer anything to add, but @@ -434,7 +369,73 @@ additional information before, between, or after them. -File: clzip.info, Node: Examples, Next: Problems, Prev: File format, Up: Top +File: clzip.info, Node: Algorithm, Next: Examples, Prev: File format, Up: Top + +4 Algorithm +*********** + +In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option '-0' of lzip uses the scheme in almost +the simplest way possible; issuing the longest match it can find, or a +literal byte if it can't find a match. Inversely, a much more elaborated +way of finding coding sequences of minimum size than the one currently +used by lzip could be developed, and the resulting sequence could also +be coded using the LZMA coding scheme. + + Clzip currently implements two variants of the LZMA algorithm; fast +(used by option '-0') and normal (used by all other compression levels). + + The high compression of LZMA comes from combining two basic, +well-proven compression ideas: sliding dictionaries (LZ77/78) and +markov models (the thing used by every compression algorithm that uses +a range encoder or similar order-0 entropy coder as its last stage) +with segregation of contexts according to what the bits are used for. + + Clzip is a two stage compressor. The first stage is a Lempel-Ziv +coder, which reduces redundancy by translating chunks of data to their +corresponding distance-length pairs. The second stage is a range encoder +that uses a different probability model for each type of data; +distances, lengths, literal bytes, etc. + + Here is how it works, step by step: + + 1) The member header is written to the output stream. + + 2) The first byte is coded literally, because there are no previous +bytes to which the match finder can refer to. + + 3) The main encoder advances to the next byte in the input data and +calls the match finder. + + 4) The match finder fills an array with the minimum distances before +the current byte where a match of a given length can be found. + + 5) Go back to step 3 until a sequence (formed of pairs, repeated +distances and literal bytes) of minimum price has been formed. Where the +price represents the number of output bits produced. + + 6) The range encoder encodes the sequence produced by the main +encoder and sends the produced bytes to the output stream. + + 7) Go back to step 3 until the input data are finished or until the +member or volume size limits are reached. + + 8) The range encoder is flushed. + + 9) The member trailer is written to the output stream. + + 10) If there are more data to compress, go back to step 1. + + +The ideas embodied in clzip are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for +the definition of Markov chains), G.N.N. Martin (for the definition of +range encoding), Igor Pavlov (for putting all the above together in +LZMA), and Julian Seward (for bzip2's CLI). + + +File: clzip.info, Node: Examples, Next: Problems, Prev: Algorithm, Up: Top 5 A small tutorial with examples ******************************** @@ -545,13 +546,13 @@ Concept index Tag Table: Node: Top210 -Node: Introduction897 -Node: Algorithm6100 -Node: Invoking clzip8930 -Node: File format14479 -Node: Examples16881 -Node: Problems18850 -Node: Concept index19376 +Node: Introduction893 +Node: Invoking clzip6152 +Node: File format11705 +Node: Algorithm14108 +Node: Examples16933 +Node: Problems18900 +Node: Concept index19426 End Tag Table diff --git a/doc/clzip.texi b/doc/clzip.texi index a74ec6f..e2ca889 100644 --- a/doc/clzip.texi +++ b/doc/clzip.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 23 May 2015 -@set VERSION 1.7-rc1 +@set UPDATED 7 July 2015 +@set VERSION 1.7 @dircategory Data Compression @direntry @@ -36,9 +36,9 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}). @menu * Introduction:: Purpose and features of clzip -* Algorithm:: How clzip compresses the data * Invoking clzip:: Command line interface * File format:: Detailed format of the compressed file +* Algorithm:: How clzip compresses the data * Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts @@ -72,10 +72,14 @@ availability: @itemize @bullet @item The lzip format provides very safe integrity checking and some data -recovery means. The lziprecover program can repair bit-flip errors (one -of the most common forms of data corruption) in lzip files, and provides -data recovery capabilities, including error-checked merging of damaged -copies of a file. +recovery means. The +@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover} +program can repair bit-flip errors (one of the most common forms of data +corruption) in lzip files, and provides data recovery capabilities, +including error-checked merging of damaged copies of a file. +@ifnothtml +@ref{Data safety,,,lziprecover}. +@end ifnothtml @item The lzip format is as simple as possible (but not simpler). The lzip @@ -111,6 +115,11 @@ bzip2, which makes it safer than compressors returning ambiguous warning values (like gzip) when it is used as a back end for other programs like tar or zutils. +Clzip will automatically use the smallest possible dictionary size for +each file without exceeding the given limit. Keep in mind that the +decompression memory requirement is affected at compression time by the +choice of dictionary size limit. + The amount of memory required for compression is about 1 or 2 times the dictionary size limit (1 if input file size is less than dictionary size limit, else 2) plus 9 times the dictionary size really used. The option @@ -118,11 +127,6 @@ limit, else 2) plus 9 times the dictionary size really used. The option of memory required for decompression is about 46 kB larger than the dictionary size really used. -Clzip will automatically use the smallest possible dictionary size for -each file without exceeding the given limit. Keep in mind that the -decompression memory requirement is affected at compression time by the -choice of dictionary size limit. - When compressing, clzip replaces every file given in the command line with a compressed version of itself, with the name "original_name.lz". When decompressing, clzip attempts to guess the name for the decompressed @@ -164,72 +168,6 @@ automatically creating multi-member output. The members so created are large, about 2 PiB each. -@node Algorithm -@chapter Algorithm -@cindex algorithm - -In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a -concrete algorithm; it is more like "any algorithm using the LZMA coding -scheme". For example, the option '-0' of lzip uses the scheme in almost -the simplest way possible; issuing the longest match it can find, or a -literal byte if it can't find a match. Inversely, a much more elaborated -way of finding coding sequences of minimum size than the one currently -used by lzip could be developed, and the resulting sequence could also -be coded using the LZMA coding scheme. - -Clzip currently implements two variants of the LZMA algorithm; fast -(used by option -0) and normal (used by all other compression levels). - -The high compression of LZMA comes from combining two basic, well-proven -compression ideas: sliding dictionaries (LZ77/78) and markov models (the -thing used by every compression algorithm that uses a range encoder or -similar order-0 entropy coder as its last stage) with segregation of -contexts according to what the bits are used for. - -Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder, -which reduces redundancy by translating chunks of data to their -corresponding distance-length pairs. The second stage is a range encoder -that uses a different probability model for each type of data; -distances, lengths, literal bytes, etc. - -Here is how it works, step by step: - -1) The member header is written to the output stream. - -2) The first byte is coded literally, because there are no previous -bytes to which the match finder can refer to. - -3) The main encoder advances to the next byte in the input data and -calls the match finder. - -4) The match finder fills an array with the minimum distances before the -current byte where a match of a given length can be found. - -5) Go back to step 3 until a sequence (formed of pairs, repeated -distances and literal bytes) of minimum price has been formed. Where the -price represents the number of output bits produced. - -6) The range encoder encodes the sequence produced by the main encoder -and sends the produced bytes to the output stream. - -7) Go back to step 3 until the input data are finished or until the -member or volume size limits are reached. - -8) The range encoder is flushed. - -9) The member trailer is written to the output stream. - -10) If there are more data to compress, go back to step 1. - -@sp 1 -@noindent -The ideas embodied in clzip are due to (at least) the following people: -Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for -the definition of Markov chains), G.N.N. Martin (for the definition of -range encoding), Igor Pavlov (for putting all the above together in -LZMA), and Julian Seward (for bzip2's CLI). - - @node Invoking clzip @chapter Invoking clzip @cindex invoking @@ -276,7 +214,7 @@ Force overwrite of output files. @item -F @itemx --recompress -Force recompression of files whose name already has the @samp{.lz} or +Force re-compression of files whose name already has the @samp{.lz} or @samp{.tlz} suffix. @item -k @@ -476,6 +414,72 @@ facilitates safe recovery of undamaged members from multi-member files. @end table +@node Algorithm +@chapter Algorithm +@cindex algorithm + +In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option @samp{-0} of lzip uses the scheme in almost +the simplest way possible; issuing the longest match it can find, or a +literal byte if it can't find a match. Inversely, a much more elaborated +way of finding coding sequences of minimum size than the one currently +used by lzip could be developed, and the resulting sequence could also +be coded using the LZMA coding scheme. + +Clzip currently implements two variants of the LZMA algorithm; fast +(used by option @samp{-0}) and normal (used by all other compression levels). + +The high compression of LZMA comes from combining two basic, well-proven +compression ideas: sliding dictionaries (LZ77/78) and markov models (the +thing used by every compression algorithm that uses a range encoder or +similar order-0 entropy coder as its last stage) with segregation of +contexts according to what the bits are used for. + +Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder, +which reduces redundancy by translating chunks of data to their +corresponding distance-length pairs. The second stage is a range encoder +that uses a different probability model for each type of data; +distances, lengths, literal bytes, etc. + +Here is how it works, step by step: + +1) The member header is written to the output stream. + +2) The first byte is coded literally, because there are no previous +bytes to which the match finder can refer to. + +3) The main encoder advances to the next byte in the input data and +calls the match finder. + +4) The match finder fills an array with the minimum distances before the +current byte where a match of a given length can be found. + +5) Go back to step 3 until a sequence (formed of pairs, repeated +distances and literal bytes) of minimum price has been formed. Where the +price represents the number of output bits produced. + +6) The range encoder encodes the sequence produced by the main encoder +and sends the produced bytes to the output stream. + +7) Go back to step 3 until the input data are finished or until the +member or volume size limits are reached. + +8) The range encoder is flushed. + +9) The member trailer is written to the output stream. + +10) If there are more data to compress, go back to step 1. + +@sp 1 +@noindent +The ideas embodied in clzip are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for +the definition of Markov chains), G.N.N. Martin (for the definition of +range encoding), Igor Pavlov (for putting all the above together in +LZMA), and Julian Seward (for bzip2's CLI). + + @node Examples @chapter A small tutorial with examples @cindex examples |