-rw-r--r-- | doc/lzip.texi (renamed from doc/lzip.texinfo) | 100 |
1 files changed, 54 insertions, 46 deletions
diff --git a/doc/lzip.texinfo b/doc/lzip.texi
index cfc9138..957af34 100644
--- a/doc/lzip.texinfo
+++ b/doc/lzip.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 20 September 2013
-@set VERSION 1.15
+@set UPDATED 11 January 2014
+@set VERSION 1.16-pre1
 
 @dircategory Data Compression
 @direntry
@@ -47,7 +47,7 @@ This manual is for Lzip (version @value{VERSION}, @value{UPDATED}).
 @end menu
 
 @sp 1
-Copyright @copyright{} 2008, 2009, 2010, 2011, 2012, 2013
+Copyright @copyright{} 2008, 2009, 2010, 2011, 2012, 2013, 2014
 Antonio Diaz Diaz.
 
 This manual is free documentation: you have unlimited permission
@@ -59,23 +59,28 @@ to copy, distribute and modify it.
 @cindex introduction
 
 Lzip is a lossless data compressor with a user interface similar to the
-one of gzip or bzip2. Lzip decompresses almost as fast as gzip and
-compresses more than bzip2, which makes it well suited for software
-distribution and data archiving. Lzip is a clean implementation of the
-LZMA algorithm.
+one of gzip or bzip2. Lzip is about as fast as gzip, compresses most
+files more than bzip2, and is better than both from a data recovery
+perspective. Lzip is a clean implementation of the LZMA algorithm.
 
 The lzip file format is designed for long-term data archiving and
-provides very safe integrity checking. The member trailer stores the
-32-bit CRC of the original data, the size of the original data and the
-size of the member. These values, together with the value remaining in
-the range decoder and the end-of-stream marker, provide a 4 factor
-integrity checking which guarantees that the decompressed version of the
-data is identical to the original. This guards against corruption of the
-compressed data, and against undetected bugs in lzip (hopefully very
-unlikely). The chances of data corruption going undetected are
-microscopic. Be aware, though, that the check occurs upon decompression,
-so it can only tell you that something is wrong. It can't help you
-recover the original uncompressed data.
+provides very safe integrity checking. It is as simple as possible (but
+not simpler), so that with the only help of the lzip manual it would be
+possible for a digital archaeologist to extract the data from a lzip
+file long after quantum computers eventually render LZMA obsolete.
+Additionally lzip is copylefted, which guarantees that it will remain
+free forever.
+
+The member trailer stores the 32-bit CRC of the original data, the size
+of the original data and the size of the member. These values, together
+with the value remaining in the range decoder and the end-of-stream
+marker, provide a 4 factor integrity checking which guarantees that the
+decompressed version of the data is identical to the original. This
+guards against corruption of the compressed data, and against undetected
+bugs in lzip (hopefully very unlikely). The chances of data corruption
+going undetected are microscopic. Be aware, though, that the check
+occurs upon decompression, so it can only tell you that something is
+wrong. It can't help you recover the original uncompressed data.
 
 If you ever need to recover data from a damaged lzip file, try the
 lziprecover program. Lziprecover makes lzip files resistant to bit-flip
@@ -87,12 +92,25 @@ Lzip uses the same well-defined exit status values used by bzip2, which
 makes it safer than compressors returning ambiguous warning values (like
 gzip) when it is used as a back end for tar or zutils.
 
-Lzip replaces every file given in the command line with a compressed
-version of itself, with the name "original_name.lz". Each compressed
-file has the same modification date, permissions, and, when possible,
-ownership as the corresponding original, so that these properties can be
-correctly restored at decompression time. Lzip is able to read from some
-types of non regular files if the @samp{--stdout} option is specified.
+When compressing, lzip replaces every file given in the command line
+with a compressed version of itself, with the name "original_name.lz".
+When decompressing, lzip attempts to guess the name for the decompressed
+file from that of the compressed file as follows:
+
+@multitable {anyothername} {becomes} {anyothername.out}
+@item filename.lz @tab becomes @tab filename
+@item filename.tlz @tab becomes @tab filename.tar
+@item anyothername @tab becomes @tab anyothername.out
+@end multitable
+
+(De)compressing a file is much like copying or moving it; therefore lzip
+preserves the access and modification dates, permissions, and, when
+possible, ownership of the file just as "cp -p" does. (If the user ID or
+the group ID can't be duplicated, the file permission bits S_ISUID and
+S_ISGID are cleared).
+
+Lzip is able to read from some types of non regular files if the
+@samp{--stdout} option is specified.
 
 If no file names are specified, lzip compresses (or decompresses) from
 standard input to standard output. In this case, lzip will decline to
@@ -118,23 +136,14 @@ The amount of memory required for compression is about 1 or 2 times the
 dictionary size limit (1 if input file size is less than dictionary size
 limit, else 2) plus 9 times the dictionary size really used. The option
 @samp{-0} is special and only requires about 1.5 MiB at most. The amount
-of memory required for decompression is only a few tens of KiB larger
-than the dictionary size really used.
+of memory required for decompression is about 46 kB larger than the
+dictionary size really used.
 
 Lzip will automatically use the smallest possible dictionary size
 without exceeding the given limit. Keep in mind that the decompression
 memory requirement is affected at compression time by the choice of
 dictionary size limit.
 
-When decompressing, lzip attempts to guess the name for the decompressed
-file from that of the compressed file as follows:
-
-@multitable {anyothername} {becomes} {anyothername.out}
-@item filename.lz @tab becomes @tab filename
-@item filename.tlz @tab becomes @tab filename.tar
-@item anyothername @tab becomes @tab anyothername.out
-@end multitable
-
 @node Algorithm
 @chapter Algorithm
 
@@ -179,7 +188,7 @@ price represents the number of output bits produced.
 6) The range encoder encodes the sequence produced by the main encoder
 and sends the produced bytes to the output stream.
 
-7) Go back to step 3 until the input data is finished or until the
+7) Go back to step 3 until the input data are finished or until the
 member or volume size limits are reached.
 
 8) The range encoder is flushed.
@@ -566,14 +575,14 @@ byte.
 @samp{rep} is any one of @samp{rep0}, @samp{rep1}, @samp{rep2} or
 @samp{rep3}. The types of previous sequences corresponding to each state
 are:
-@multitable {State} {literal, shortrep, literal, literal}
+@multitable {State} {rep or (!literal, shortrep), literal, literal}
 @headitem State @tab Types of previous sequences
 @item 0 @tab literal, literal, literal
 @item 1 @tab match, literal, literal
-@item 2 @tab (rep or shortrep), literal, literal
+@item 2 @tab rep or (!literal, shortrep), literal, literal
 @item 3 @tab literal, shortrep, literal, literal
 @item 4 @tab match, literal
-@item 5 @tab (rep or shortrep), literal
+@item 5 @tab rep or (!literal, shortrep), literal
 @item 6 @tab literal, shortrep, literal
 @item 7 @tab literal, match
 @item 8 @tab literal, rep
@@ -667,7 +676,7 @@ Of Stream" marker is decoded.
 
 WARNING! Even if lzip is bug-free, other causes may result in a corrupt
 compressed file (bugs in the system libraries, memory errors, etc).
-Therefore, if the data you are going to compress is important, give the
+Therefore, if the data you are going to compress are important, give the
 @samp{--keep} option to lzip and do not remove the original file until
 you verify the compressed file with a command like
 @w{@samp{lzip -cd file.lz | cmp file -}}.
@@ -785,7 +794,7 @@ find by running @w{@samp{lzip --version}}.
 
 @verbatim
 /*  Lzd - Educational decompressor for lzip files
-    Copyright (C) 2013 Antonio Diaz Diaz.
+    Copyright (C) 2013, 2014 Antonio Diaz Diaz.
 
     This program is free software: you have unlimited permission
     to copy, distribute and modify it.
@@ -995,7 +1004,7 @@ public:
         break;
         }
       }
-    return symbol - 0x100;
+    return symbol & 0xFF;
     }
 
   int decode_len( Len_model & lm, const int pos_state )
@@ -1162,7 +1171,7 @@ bool LZ_decoder::decode_member() // Returns false if error
             }
           }
         state.set_match();
-        if( rep0 >= dictionary_size || ( rep0 >= pos && !partial_data_pos ) )
+        if( rep0 >= dictionary_size || rep0 >= data_position() )
           return false;
         }
       for( int i = 0; i < len; ++i )
@@ -1184,7 +1193,7 @@ int main( const int argc, const char * const argv[] )
                  "It is not safe to use lzd for any real work.\n"
                  "\nUsage: %s < file.lz > file\n", argv[0] );
     std::printf( "Lzd decompresses from standard input to standard output.\n"
-                 "\nCopyright (C) 2013 Antonio Diaz Diaz.\n"
+                 "\nCopyright (C) 2014 Antonio Diaz Diaz.\n"
                  "This is free software: you are free to change and redistribute it.\n"
                  "There is NO WARRANTY, to the extent permitted by law.\n"
                  "Report bugs to lzip-bug@nongnu.org\n"
@@ -1195,8 +1204,7 @@ int main( const int argc, const char * const argv[] )
   for( bool first_member = true; ; first_member = false )
     {
     File_header header;
-    for( int i = 0; i < 6; ++i )
-      header[i] = std::getc( stdin );
+    for( int i = 0; i < 6; ++i ) header[i] = std::getc( stdin );
     if( std::feof( stdin ) || std::memcmp( header, "LZIP", 4 ) != 0 )
       {
       if( first_member )