diff options
author | Daniel Baumann <mail@daniel-baumann.ch> | 2015-11-07 10:03:36 +0000 |
---|---|---|
committer | Daniel Baumann <mail@daniel-baumann.ch> | 2015-11-07 10:03:36 +0000 |
commit | e43c45c952bb5d273724bfc6dd69e4c2de1aa190 (patch) | |
tree | bca9bb3d2f16f3edb5ab0d6f6b209be3f4fabb92 /README | |
parent | Adding upstream version 1.16~pre1. (diff) | |
download | lzip-e43c45c952bb5d273724bfc6dd69e4c2de1aa190.tar.xz lzip-e43c45c952bb5d273724bfc6dd69e4c2de1aa190.zip |
Adding upstream version 1.16~pre2.upstream/1.16_pre2
Signed-off-by: Daniel Baumann <mail@daniel-baumann.ch>
Diffstat (limited to 'README')
-rw-r--r-- | README | 81 |
1 files changed, 42 insertions, 39 deletions
@@ -5,43 +5,42 @@ one of gzip or bzip2. Lzip is about as fast as gzip, compresses most files more than bzip2, and is better than both from a data recovery perspective. Lzip is a clean implementation of the LZMA algorithm. -The lzip file format is designed for long-term data archiving and -provides very safe integrity checking. It is as simple as possible (but -not simpler), so that with the only help of the lzip manual it would be -possible for a digital archaeologist to extract the data from a lzip -file long after quantum computers eventually render LZMA obsolete. -Additionally lzip is copylefted, which guarantees that it will remain -free forever. - -The member trailer stores the 32-bit CRC of the original data, the size -of the original data and the size of the member. These values, together -with the value remaining in the range decoder and the end-of-stream -marker, provide a 4 factor integrity checking which guarantees that the -decompressed version of the data is identical to the original. This -guards against corruption of the compressed data, and against undetected -bugs in lzip (hopefully very unlikely). The chances of data corruption -going undetected are microscopic. Be aware, though, that the check -occurs upon decompression, so it can only tell you that something is -wrong. It can't help you recover the original uncompressed data. - -If you ever need to recover data from a damaged lzip file, try the -lziprecover program. Lziprecover makes lzip files resistant to bit-flip -(one of the most common forms of data corruption), and provides data -recovery capabilities, including error-checked merging of damaged copies -of a file. +The lzip file format is designed for long-term data archiving, taking +into account both data integrity and decoder availability: + + * The lzip format provides very safe integrity checking and some data + recovery means. The lziprecover program can repair bit-flip errors + (one of the most common forms of data corruption) in lzip files, + and provides data recovery capabilities, including error-checked + merging of damaged copies of a file. + + * The lzip format is as simple as possible (but not simpler). The + lzip manual provides the code of a simple decompressor along with a + detailed explanation of how it works, so that with the only help of + the lzip manual it would be possible for a digital archaeologist to + extract the data from a lzip file long after quantum computers + eventually render LZMA obsolete. + + * Additionally lzip is copylefted, which guarantees that it will + remain free forever. Lzip uses the same well-defined exit status values used by bzip2, which makes it safer than compressors returning ambiguous warning values (like gzip) when it is used as a back end for tar or zutils. +Lzip will automatically use the smallest possible dictionary size for +each file without exceeding the given limit. Keep in mind that the +decompression memory requirement is affected at compression time by the +choice of dictionary size limit. + When compressing, lzip replaces every file given in the command line with a compressed version of itself, with the name "original_name.lz". When decompressing, lzip attempts to guess the name for the decompressed file from that of the compressed file as follows: -filename.lz becomes filename -filename.tlz becomes filename.tar -anyothername becomes anyothername.out +filename.lz becomes filename +filename.tlz becomes filename.tar +anyothername becomes anyothername.out (De)compressing a file is much like copying or moving it; therefore lzip preserves the access and modification dates, permissions, and, when @@ -72,18 +71,22 @@ Lzip is able to compress and decompress streams of unlimited size by automatically creating multi-member output. The members so created are large, about 64 PiB each. -Lzip will automatically use the smallest possible dictionary size -without exceeding the given limit. Keep in mind that the decompression -memory requirement is affected at compression time by the choice of -dictionary size limit. - -Lzip implements a simplified version of the LZMA (Lempel-Ziv-Markov -chain-Algorithm) algorithm. The high compression of LZMA comes from -combining two basic, well-proven compression ideas: sliding dictionaries -(LZ77/78) and markov models (the thing used by every compression -algorithm that uses a range encoder or similar order-0 entropy coder as -its last stage) with segregation of contexts according to what the bits -are used for. +There is no such thing as a "LZMA algorithm"; it is more like a "LZMA +coding scheme". For example, the option '-0' of lzip uses the scheme in +almost the simplest way possible; issuing the longest match it can find, +or a literal byte if it can't find a match. Inversely, a much more +elaborated way of finding coding sequences of minimum price than the one +currently used by lzip could be developed, and the resulting sequence +could also be coded using the LZMA coding scheme. + +Lzip currently implements two variants of the LZMA algorithm; fast (used +by option -0) and normal (used by all other compression levels). + +The high compression of LZMA comes from combining two basic, well-proven +compression ideas: sliding dictionaries (LZ77/78) and markov models (the +thing used by every compression algorithm that uses a range encoder or +similar order-0 entropy coder as its last stage) with segregation of +contexts according to what the bits are used for. The ideas embodied in lzip are due to (at least) the following people: Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for |