author Daniel Baumann <mail@daniel-baumann.ch> 2015-11-07 10:08:48 +0000
committer Daniel Baumann <mail@daniel-baumann.ch> 2015-11-07 10:08:48 +0000
commit 0a25ec869a1151af24e3b1aecf708d35eea02f49
tree 006290ebbabcae34be1487588ff0cbdea359284a /doc
parent Adding debian version 1.17~rc2-1.
Merging upstream version 1.17.
Signed-off-by: Daniel Baumann <mail@daniel-baumann.ch>
Diffstat (limited to 'doc')
-rw-r--r-- doc/lzip.1 4
-rw-r--r-- doc/lzip.info 506
-rw-r--r-- doc/lzip.texi 475
3 files changed, 509 insertions, 476 deletions
diff --git a/doc/lzip.1 b/doc/lzip.1
index 6b779f1..b6acca6 100644
--- a/doc/lzip.1
+++ b/doc/lzip.1
@@ -1,5 +1,5 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH LZIP "1" "May 2015" "lzip 1.17-rc2" "User Commands"
+.TH LZIP "1" "July 2015" "lzip 1.17" "User Commands"
.SH NAME
lzip \- reduces the size of files
.SH SYNOPSIS
@@ -28,7 +28,7 @@ decompress
overwrite existing output files
.TP
\fB\-F\fR, \fB\-\-recompress\fR
-force recompression of compressed files
+force re\-compression of compressed files
.TP
\fB\-k\fR, \fB\-\-keep\fR
keep (don't delete) input files
diff --git a/doc/lzip.info b/doc/lzip.info
index 6854503..f0aa011 100644
--- a/doc/lzip.info
+++ b/doc/lzip.info
@@ -11,16 +11,16 @@ File: lzip.info, Node: Top, Next: Introduction, Up: (dir)
Lzip Manual
***********
-This manual is for Lzip (version 1.17-rc2, 25 May 2015).
+This manual is for Lzip (version 1.17, 12 July 2015).
* Menu:
* Introduction:: Purpose and features of lzip
-* Algorithm:: How lzip compresses the data
* Invoking lzip:: Command line interface
+* Quality assurance:: Design, development and testing of lzip
* File format:: Detailed format of the compressed file
+* Algorithm:: How lzip compresses the data
* Stream format:: Format of the LZMA stream in lzip files
-* Quality assurance:: Design, development and testing of lzip
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Reference source code:: Source code illustrating stream format
@@ -33,7 +33,7 @@ This manual is for Lzip (version 1.17-rc2, 25 May 2015).
copy, distribute and modify it.

-File: lzip.info, Node: Introduction, Next: Algorithm, Prev: Top, Up: Top
+File: lzip.info, Node: Introduction, Next: Invoking lzip, Prev: Top, Up: Top
1 Introduction
**************
@@ -51,7 +51,8 @@ availability:
recovery means. The lziprecover program can repair bit-flip errors
(one of the most common forms of data corruption) in lzip files,
and provides data recovery capabilities, including error-checked
- merging of damaged copies of a file.
+ merging of damaged copies of a file. *note Data safety:
+ (lziprecover)Data safety.
* The lzip format is as simple as possible (but not simpler). The
lzip manual provides the code of a simple decompressor along with
@@ -85,6 +86,11 @@ which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs
like tar or zutils.
+ Lzip will automatically use the smallest possible dictionary size for
+each file without exceeding the given limit. Keep in mind that the
+decompression memory requirement is affected at compression time by the
+choice of dictionary size limit.
+
The amount of memory required for compression is about 1 or 2 times
the dictionary size limit (1 if input file size is less than dictionary
size limit, else 2) plus 9 times the dictionary size really used. The
@@ -92,11 +98,6 @@ option '-0' is special and only requires about 1.5 MiB at most. The
amount of memory required for decompression is about 46 kB larger than
the dictionary size really used.
- Lzip will automatically use the smallest possible dictionary size for
-each file without exceeding the given limit. Keep in mind that the
-decompression memory requirement is affected at compression time by the
-choice of dictionary size limit.
-
When compressing, lzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".
When decompressing, lzip attempts to guess the name for the decompressed
@@ -126,8 +127,8 @@ corresponding uncompressed files. Integrity testing of concatenated
compressed files is also supported.
Lzip can produce multi-member files and safely recover, with
-lziprecover, the undamaged members in case of file damage. Lzip can also
-split the compressed output in volumes of a given size, even when
+lziprecover, the undamaged members in case of file damage. Lzip can
+also split the compressed output in volumes of a given size, even when
reading from standard input. This allows the direct creation of
multivolume compressed tar archives.
@@ -136,75 +137,9 @@ automatically creating multi-member output. The members so created are
large, about 2 PiB each.

-File: lzip.info, Node: Algorithm, Next: Invoking lzip, Prev: Introduction, Up: Top
-
-2 Algorithm
-***********
-
-In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
-concrete algorithm; it is more like "any algorithm using the LZMA coding
-scheme". For example, the option '-0' of lzip uses the scheme in almost
-the simplest way possible; issuing the longest match it can find, or a
-literal byte if it can't find a match. Inversely, a much more elaborated
-way of finding coding sequences of minimum size than the one currently
-used by lzip could be developed, and the resulting sequence could also
-be coded using the LZMA coding scheme.
-
- Lzip currently implements two variants of the LZMA algorithm; fast
-(used by option -0) and normal (used by all other compression levels).
-
- The high compression of LZMA comes from combining two basic,
-well-proven compression ideas: sliding dictionaries (LZ77/78) and
-markov models (the thing used by every compression algorithm that uses
-a range encoder or similar order-0 entropy coder as its last stage)
-with segregation of contexts according to what the bits are used for.
-
- Lzip is a two stage compressor. The first stage is a Lempel-Ziv
-coder, which reduces redundancy by translating chunks of data to their
-corresponding distance-length pairs. The second stage is a range encoder
-that uses a different probability model for each type of data;
-distances, lengths, literal bytes, etc.
-
- Here is how it works, step by step:
-
- 1) The member header is written to the output stream.
-
- 2) The first byte is coded literally, because there are no previous
-bytes to which the match finder can refer to.
-
- 3) The main encoder advances to the next byte in the input data and
-calls the match finder.
-
- 4) The match finder fills an array with the minimum distances before
-the current byte where a match of a given length can be found.
-
- 5) Go back to step 3 until a sequence (formed of pairs, repeated
-distances and literal bytes) of minimum price has been formed. Where the
-price represents the number of output bits produced.
-
- 6) The range encoder encodes the sequence produced by the main
-encoder and sends the produced bytes to the output stream.
-
- 7) Go back to step 3 until the input data are finished or until the
-member or volume size limits are reached.
-
- 8) The range encoder is flushed.
-
- 9) The member trailer is written to the output stream.
-
- 10) If there are more data to compress, go back to step 1.
-
-
-The ideas embodied in lzip are due to (at least) the following people:
-Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
-the definition of Markov chains), G.N.N. Martin (for the definition of
-range encoding), Igor Pavlov (for putting all the above together in
-LZMA), and Julian Seward (for bzip2's CLI).
-
-
-File: lzip.info, Node: Invoking lzip, Next: File format, Prev: Algorithm, Up: Top
+File: lzip.info, Node: Invoking lzip, Next: Quality assurance, Prev: Introduction, Up: Top
-3 Invoking lzip
+2 Invoking lzip
***************
The format for running lzip is:
@@ -244,7 +179,7 @@ The format for running lzip is:
'-F'
'--recompress'
- Force recompression of files whose name already has the '.lz' or
+ Force re-compression of files whose name already has the '.lz' or
'.tlz' suffix.
'-k'
@@ -362,7 +297,155 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
caused lzip to panic.

-File: lzip.info, Node: File format, Next: Stream format, Prev: Invoking lzip, Up: Top
+File: lzip.info, Node: Quality assurance, Next: File format, Prev: Invoking lzip, Up: Top
+
+3 Design, development and testing of lzip
+*****************************************
+
+There are two ways of constructing a software design. One way is to make
+it so simple that there are obviously no deficiencies and the other is
+to make it so complicated that there are no obvious deficiencies.
+-- C.A.R. Hoare
+
+ Lzip has been designed, written and tested with great care to be the
+standard general-purpose compressor for unix-like systems. This chapter
+describes the lessons learned from previous compressors (gzip and
+bzip2), and their application to the design of lzip.
+
+
+3.1 Format design
+=================
+
+When gzip was designed in 1992, computers and operating systems were
+much less capable than they are today. Gzip tried to work around some of
+those limitations, like 8.3 file names, with additional fields in its
+file format.
+
+ Today those limitations have mostly disappeared, and the format of
+gzip has proved to be unnecessarily complicated. It includes fields
+that were never used, others that have lost their usefulness, and finally
+others that have become too limited.
+
+ Bzip2 was designed 5 years later, and its format is simpler than the
+one of gzip.
+
+ Probably the worst defect of the gzip format from the point of view
+of data safety is the variable size of its header. If the byte at
+offset 3 (flags) of a gzip member gets corrupted, it may become very
+difficult to recover the data, even if the compressed blocks are
+intact, because it can't be known with certainty where the compressed
+blocks begin.
+
+ By contrast, the header of a lzip member has a fixed length of 6. The
+lzma stream in a lzip member always starts at offset 6, making it
+trivial to recover the data even if the whole header becomes corrupt.
+
+ Bzip2 also provides a header of fixed length and marks the begin and
+end of each compressed block with six magic bytes, making it possible to
+find the compressed blocks even in case of file damage. But bzip2 does
+not store the size of each compressed block, as lzip does.
+
+ Lzip provides better data recovery capabilities than any other
+gzip-like compressor because its format has been designed from the
+beginning to be simple and safe. It would be very difficult to write an
+automatic recovery tool like lziprecover for the gzip format. And, as
+far as I know, it has never been written.
+
+ The lzip format is designed for long-term archiving. Therefore it
+excludes any unneeded features that may interfere with the future
+extraction of the uncompressed data.
+
+
+3.1.1 Gzip format (mis)features not present in lzip
+---------------------------------------------------
+
+'Multiple algorithms'
+ Gzip provides a CM (Compression Method) field that has never been
+ used because it is a bad idea to begin with. New compression
+ methods may require additional fields, making it impossible to
+ implement new methods and, at the same time, keep the same format.
+ This field does not solve the problem of format proliferation; it
+ just makes the problem less obvious.
+
+'Optional fields in header'
+ Unless special precautions are taken, optional fields are
+ generally a bad idea because they produce a header of variable
+ size. The gzip header has 2 fields that, in addition to being
+ optional, are zero-terminated. This means that if any byte inside
+ the field gets zeroed, or if the terminating zero gets altered,
+ gzip won't be able to find either the header CRC or the
+ compressed blocks.
+
+'Optional CRC for the header'
+ Using an optional checksum for the header is not only a bad idea,
+ it is an error; it may prevent the extraction of perfectly good
+ data. For example, if the checksum is used and the bit enabling it
+ is reset by a bit-flip, the header will appear to be intact (in
+ spite of being corrupt) while the compressed blocks will appear to
+ be totally unrecoverable (in spite of being intact). Very
+ misleading indeed.
+
+
+3.1.2 Lzip format improvements over gzip and bzip2
+--------------------------------------------------
+
+'64-bit size field'
+ Probably the most frequently reported shortcoming of the gzip
+ format is that it only stores the least significant 32 bits of the
+ uncompressed size. The size of any file larger than 4 GiB gets
+ truncated.
+
+ Bzip2 does not store the uncompressed size of the file.
+
+ The lzip format provides a 64-bit field for the uncompressed size.
+ Additionally, lzip produces multi-member output automatically when
+ the size is too large for a single member, allowing for an
+ unlimited uncompressed size.
+
+'Distributed index'
+ The lzip format provides a distributed index that, among other
+ things, helps plzip to decompress several times faster than pigz
+ and helps lziprecover do its job. Neither the gzip format nor the
+ bzip2 format provides an index.
+
+ A distributed index is safer and more scalable than a monolithic
+ index. The monolithic index introduces a single point of failure
+ in the compressed file and may limit the number of members or the
+ total uncompressed size.
+
+
+3.2 Quality of implementation
+=============================
+
+'Multiple implementations'
+ Just like the lzip format provides 4 factor protection against
+ undetected data corruption, the development methodology of the lzip
+ family of compressors provides 3 factor protection against
+ undetected programming errors.
+
+ Three related but independent compressor implementations, lzip,
+ clzip and minilzip/lzlib, are developed concurrently. Every stable
+ release of any of them is subjected to a hundred hours of
+ intensive testing to verify that it produces identical output to
+ the other two. This guarantees that all three implement the same
+ algorithm, and makes it unlikely that any of them may contain
+ serious undiscovered errors. In fact, no errors have been
+ discovered in lzip since 2009.
+
+'Dictionary size'
+ Lzip automatically uses the smallest possible dictionary size for
+ each file. In addition to reducing the amount of memory required
+ for decompression, this feature also minimizes the probability of
+ being affected by RAM errors during compression.
+
+'Exit status'
+ Returning a warning status of 2 is a design flaw of compress that
+ leaked into the design of gzip. Both bzip2 and lzip are free from
+ this flaw.
+
+
+
+File: lzip.info, Node: File format, Next: Algorithm, Prev: Quality assurance, Up: Top
4 File format
*************
@@ -433,9 +516,75 @@ additional information before, between, or after them.

-File: lzip.info, Node: Stream format, Next: Quality assurance, Prev: File format, Up: Top
+File: lzip.info, Node: Algorithm, Next: Stream format, Prev: File format, Up: Top
+
+5 Algorithm
+***********
+
+In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
+concrete algorithm; it is more like "any algorithm using the LZMA coding
+scheme". For example, the option '-0' of lzip uses the scheme in almost
+the simplest way possible; issuing the longest match it can find, or a
+literal byte if it can't find a match. Inversely, a much more elaborated
+way of finding coding sequences of minimum size than the one currently
+used by lzip could be developed, and the resulting sequence could also
+be coded using the LZMA coding scheme.
+
+ Lzip currently implements two variants of the LZMA algorithm; fast
+(used by option '-0') and normal (used by all other compression levels).
+
+ The high compression of LZMA comes from combining two basic,
+well-proven compression ideas: sliding dictionaries (LZ77/78) and
+markov models (the thing used by every compression algorithm that uses
+a range encoder or similar order-0 entropy coder as its last stage)
+with segregation of contexts according to what the bits are used for.
+
+ Lzip is a two stage compressor. The first stage is a Lempel-Ziv
+coder, which reduces redundancy by translating chunks of data to their
+corresponding distance-length pairs. The second stage is a range encoder
+that uses a different probability model for each type of data;
+distances, lengths, literal bytes, etc.
+
+ Here is how it works, step by step:
+
+ 1) The member header is written to the output stream.
+
+ 2) The first byte is coded literally, because there are no previous
+bytes to which the match finder can refer to.
+
+ 3) The main encoder advances to the next byte in the input data and
+calls the match finder.
+
+ 4) The match finder fills an array with the minimum distances before
+the current byte where a match of a given length can be found.
+
+ 5) Go back to step 3 until a sequence (formed of pairs, repeated
+distances and literal bytes) of minimum price has been formed. Where the
+price represents the number of output bits produced.
+
+ 6) The range encoder encodes the sequence produced by the main
+encoder and sends the produced bytes to the output stream.
+
+ 7) Go back to step 3 until the input data are finished or until the
+member or volume size limits are reached.
+
+ 8) The range encoder is flushed.
+
+ 9) The member trailer is written to the output stream.
+
+ 10) If there are more data to compress, go back to step 1.
+
+
+The ideas embodied in lzip are due to (at least) the following people:
+Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
+the definition of Markov chains), G.N.N. Martin (for the definition of
+range encoding), Igor Pavlov (for putting all the above together in
+LZMA), and Julian Seward (for bzip2's CLI).
+
+
+File: lzip.info, Node: Stream format, Next: Examples, Prev: Algorithm, Up: Top
-5 Format of the LZMA stream in lzip files
+6 Format of the LZMA stream in lzip files
*****************************************
The LZMA algorithm has three parameters, called "special LZMA
@@ -473,7 +622,7 @@ the lzip download directory. The source code of lzd is included in
appendix A. *note Reference source code::
-5.1 What is coded
+6.1 What is coded
=================
The LZMA stream includes literals, matches and repeated matches (matches
@@ -525,7 +674,7 @@ slot + direct_bits distances from 4 to 127
slot + (direct_bits - 4) + 4 bits distances from 128 to 2^32 - 1
-5.2 The coding contexts
+6.2 The coding contexts
=======================
These contexts ('Bit_model' in the source), are integers or arrays of
@@ -615,7 +764,7 @@ difference is found, the rest of the byte is decoded using the normal
bit tree context. (See 'decode_matched' in the source).
-5.3 The range decoder
+6.3 The range decoder
=====================
The LZMA stream is consumed one byte at a time by the range decoder.
@@ -635,7 +784,7 @@ range decoder. This is done by shifting 5 bytes in the initialization of
source).
-5.4 Decoding the LZMA stream
+6.4 Decoding the LZMA stream
============================
After decoding the member header and obtaining the dictionary size, the
@@ -646,144 +795,7 @@ with the appropriate contexts to decode the different coding sequences
Stream" marker is decoded.

-File: lzip.info, Node: Quality assurance, Next: Examples, Prev: Stream format, Up: Top
-
-6 Design, development and testing of lzip
-*****************************************
-
-There are two ways of constructing a software design. One way is to make
-it so simple that there are obviously no deficiencies and the other is
-to make it so complicated that there are no obvious deficiencies.
--- C.A.R. Hoare
-
- Lzip has been designed, written and tested with great care to be the
-standard general-purpose compressor for unix-like systems. This chapter
-describes the lessons learned from previous compressors (gzip and
-bzip2), and their application to the design of lzip.
-
-
-6.1 Format design
-=================
-
-When gzip was designed in 1992, computers and operating systems were
-much less capable than they are today. Gzip tried to work around some of
-those limitations, like 8.3 file names, with additional fields in its
-file format.
-
- Today those limitations have mostly disappeared, and the format of
-gzip has proved to be unnecessarily complicated. It includes fields
-that were never used, others that have lost its usefulness, and finally
-others that have become too limited.
-
- Bzip2 was designed 5 years later, and its format is in some aspects
-simpler than the one of gzip. But bzip2 also shows complexities in its
-file format which slow down decompression and, in retrospect, are
-unnecessary.
-
- Probably the worst defect of the gzip format from the point of view
-of data safety is the variable size of its header. If the byte at
-offset 3 (flags) of a gzip member gets corrupted, it mat become very
-difficult to recover the data, even if the compressed blocks are
-intact, because it can't be known with certainty where the compressed
-blocks begin.
-
- By contrast, the lzma stream in a lzip member always starts at
-offset 6, making it trivial to recover the data even if the whole
-header becomes corrupt.
-
- Lzip provides better data recovery capabilities than any other
-gzip-like compressor because its format has been designed from the
-beginning to be simple and safe. It would be very difficult to write an
-automatic recovery tool like lziprecover for the gzip format. And, as
-far as I know, it has never been writen.
-
- The lzip format is designed for long-term archiving. Therefore it
-excludes any unneeded features that may interfere with the future
-extraction of the uncompressed data.
-
-
-6.1.1 Gzip format (mis)features not present in lzip
----------------------------------------------------
-
-'Multiple algorithms'
- Gzip provides a CM (Compression Method) field that has never been
- used because it is a bad idea to begin with. New compression
- methods may require additional fields, making it impossible to
- implement new methods and, at the same time, keep the same format.
- This field does not solve the problem of format proliferation; it
- just makes the problem less obvious.
-
-'Optional fields in header'
- Unless special precautions are taken, optional fields are
- generally a bad idea because they produce a header of variable
- size. The gzip header has 2 fields that, in addition to being
- optional, are zero-terminated. This means that if any byte inside
- the field gets zeroed, or if the terminating zero gets altered,
- gzip won't be able to find neither the header CRC nor the
- compressed blocks.
-
- Using an optional checksum for the header is not only a bad idea,
- it is an error; it may prevent the extraction of perfectly good
- data. For example, if the checksum is used and the bit enabling it
- is reset by a bit-flip, the header will appear to be intact (in
- spite of being corrupt) while the compressed blocks will appear to
- be totally unrecoverable (in spite of being intact). Very
- misleading indeed.
-
-
-6.1.2 Lzip format improvements over gzip
-----------------------------------------
-
-'64-bit size field'
- Probably the most frequently reported shortcoming of the gzip
- format is that it only stores the least significant 32 bits of the
- uncompressed size. The size of any file larger than 4 GiB gets
- truncated.
-
- The lzip format provides a 64-bit field for the uncompressed size.
- Additionaly, lzip produces multi-member output automatically when
- the size is too large for a single member, allowing an unlimited
- uncompressed size.
-
-'Distributed index'
- The lzip format provides a distributed index that, among other
- things, helps plzip to decompress several times faster than pigz
- and helps lziprecover do its job. The gzip format does not provide
- an index.
-
- A distributed index is safer and more scalable than a monolithic
- index. The monolithic index introduces a single point of failure
- in the compressed file and may limit the number of members or the
- total uncompressed size.
-
-
-6.2 Quality of implementation
-=============================
-
-Three related but independent compressor implementations, lzip, clzip
-and minilzip/lzlib, are developed concurrently. Every stable release of
-any of them is subjected to a hundred hours of intensive testing to
-verify that it produces identical output to the other two. This
-guarantees that all three implement the same algorithm, and makes it
-unlikely that any of them may contain serious undiscovered errors. In
-fact, no errors have been discovered in lzip since 2009.
-
- Just like the lzip format provides 4 factor protection against
-undetected data corruption, the development methodology described above
-provides 3 factor protection against undetected programming errors in
-lzip.
-
- Lzip automatically uses the smallest possible dictionary size for
-each file. In addition to reducing the amount of memory required for
-decompression, this feature also minimizes the probability of being
-affected by RAM errors during compression.
-
- Returning a warning status of 2 is a design flaw of compress that
-leaked into the design of gzip. Both bzip2 and lzip are free form this
-flaw.
-
-
-File: lzip.info, Node: Examples, Next: Problems, Prev: Quality assurance, Up: Top
+File: lzip.info, Node: Examples, Next: Problems, Prev: Stream format, Up: Top
7 A small tutorial with examples
********************************
@@ -876,7 +888,7 @@ File: lzip.info, Node: Reference source code, Next: Concept index, Prev: Prob
Appendix A Reference source code
********************************
-/* Lzd - Educational decompressor for lzip files
+/* Lzd - Educational decompressor for the lzip format
Copyright (C) 2013-2015 Antonio Diaz Diaz.
This program is free software: you have unlimited permission
@@ -1133,7 +1145,7 @@ class LZ_decoder
}
public:
- LZ_decoder( const unsigned dict_size )
+ explicit LZ_decoder( const unsigned dict_size )
:
partial_data_pos( 0 ),
dictionary_size( dict_size ),
@@ -1160,7 +1172,7 @@ void LZ_decoder::flush_data()
crc32.update_buf( crc_, buffer + stream_pos, size );
errno = 0;
if( std::fwrite( buffer + stream_pos, 1, size, stdout ) != size )
- { std::fprintf( stderr, "Write error: %s.\n", std::strerror( errno ) );
+ { std::fprintf( stderr, "Write error: %s\n", std::strerror( errno ) );
std::exit( 1 ); }
if( pos >= dictionary_size ) { partial_data_pos += pos; pos = 0; }
stream_pos = pos;
@@ -1202,7 +1214,7 @@ bool LZ_decoder::decode_member() // Returns false if error
put_byte( rdec.decode_matched( bm, peek( rep0 ) ) );
state.set_char();
}
- else
+ else // match or repeated match
{
int len;
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
@@ -1231,7 +1243,7 @@ bool LZ_decoder::decode_member() // Returns false if error
state.set_rep();
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
}
- else
+ else // match
{
rep3 = rep2; rep2 = rep1; rep1 = rep0;
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
@@ -1273,7 +1285,7 @@ int main( const int argc, const char * const argv[] )
{
if( argc > 1 )
{
- std::printf( "Lzd %s - Educational decompressor for lzip files.\n",
+ std::printf( "Lzd %s - Educational decompressor for the lzip format.\n",
PROGVERSION );
std::printf( "Study the source to learn how a lzip decompressor works.\n"
"See the lzip manual for an explanation of the code.\n"
@@ -1300,19 +1312,19 @@ int main( const int argc, const char * const argv[] )
if( std::feof( stdin ) || std::memcmp( header, "LZIP\x01", 5 ) != 0 )
{
if( first_member )
- { std::fprintf( stderr, "Bad magic number (file not in lzip format)\n" );
+ { std::fputs( "Bad magic number (file not in lzip format).\n", stderr );
return 2; }
break;
}
unsigned dict_size = 1 << ( header[5] & 0x1F );
dict_size -= ( dict_size / 16 ) * ( ( header[5] >> 5 ) & 7 );
if( dict_size < min_dictionary_size || dict_size > max_dictionary_size )
- { std::fprintf( stderr, "Invalid dictionary size in member header\n" );
+ { std::fputs( "Invalid dictionary size in member header.\n", stderr );
return 2; }
LZ_decoder decoder( dict_size );
if( !decoder.decode_member() )
- { std::fprintf( stderr, "Data error\n" ); return 2; }
+ { std::fputs( "Data error\n", stderr ); return 2; }
File_trailer trailer;
for( int i = 0; i < 20; ++i ) trailer[i] = std::getc( stdin );
@@ -1321,11 +1333,11 @@ int main( const int argc, const char * const argv[] )
unsigned long long data_size = 0;
for( int i = 11; i >= 4; --i ) { data_size <<= 8; data_size += trailer[i]; }
if( crc != decoder.crc() || data_size != decoder.data_position() )
- { std::fprintf( stderr, "CRC error\n" ); return 2; }
+ { std::fputs( "CRC error\n", stderr ); return 2; }
}
if( std::fclose( stdout ) != 0 )
- { std::fprintf( stderr, "Can't close stdout: %s.\n", std::strerror( errno ) );
+ { std::fprintf( stderr, "Can't close stdout: %s\n", std::strerror( errno ) );
return 1; }
return 0;
}
@@ -1357,16 +1369,16 @@ Concept index

Tag Table:
Node: Top208
-Node: Introduction1090
-Node: Algorithm6008
-Node: Invoking lzip8833
-Node: File format14421
-Node: Stream format16806
-Node: Quality assurance26247
-Node: Examples32269
-Node: Problems34230
-Node: Reference source code34760
-Node: Concept index48358
+Node: Introduction1087
+Node: Invoking lzip6060
+Node: Quality assurance11658
+Node: File format18171
+Node: Algorithm20556
+Node: Stream format23382
+Node: Examples32812
+Node: Problems34769
+Node: Reference source code35299
+Node: Concept index48952

End Tag Table
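
The header and trailer parsing that appears in the lzd hunks above can be tried in isolation. The following is a minimal sketch, assuming only the two expressions visible in the diff (the dictionary size computed from header byte 5, and the data size read from trailer bytes 4 to 11). The function names decode_dict_size and read_data_size are hypothetical helpers introduced for illustration; they are not part of lzd.

```cpp
#include <cstdint>

// Hypothetical helper (not part of lzd): decode byte 5 of a member header.
// Bits 4-0 give the base-2 logarithm of a base size; bits 7-5 give the
// number of sixteenths of that base size to subtract from it.
unsigned decode_dict_size( const std::uint8_t coded )
  {
  unsigned size = 1u << ( coded & 0x1F );
  size -= ( size / 16 ) * ( ( coded >> 5 ) & 7 );
  return size;
  }

// Hypothetical helper (not part of lzd): read the little-endian 64-bit
// uncompressed data size stored in bytes 4 to 11 of the 20-byte trailer.
unsigned long long read_data_size( const std::uint8_t trailer[20] )
  {
  unsigned long long data_size = 0;
  for( int i = 11; i >= 4; --i ) { data_size <<= 8; data_size += trailer[i]; }
  return data_size;
  }
```

For instance, a coded byte of 0xD3 selects a base size of 2^19 bytes minus 6/16 of it, which gives 327680 bytes (320 KiB). The range check against min_dictionary_size and max_dictionary_size shown in the diff is left to the caller in this sketch.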
diff --git a/doc/lzip.texi b/doc/lzip.texi
index ac44ee9..69f44ae 100644
--- a/doc/lzip.texi
+++ b/doc/lzip.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header
-@set UPDATED 25 May 2015
-@set VERSION 1.17-rc2
+@set UPDATED 12 July 2015
+@set VERSION 1.17
@dircategory Data Compression
@direntry
@@ -36,11 +36,11 @@ This manual is for Lzip (version @value{VERSION}, @value{UPDATED}).
@menu
* Introduction:: Purpose and features of lzip
-* Algorithm:: How lzip compresses the data
* Invoking lzip:: Command line interface
+* Quality assurance:: Design, development and testing of lzip
* File format:: Detailed format of the compressed file
+* Algorithm:: How lzip compresses the data
* Stream format:: Format of the LZMA stream in lzip files
-* Quality assurance:: Design, development and testing of lzip
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Reference source code:: Source code illustrating stream format
@@ -70,10 +70,14 @@ availability:
@itemize @bullet
@item
The lzip format provides very safe integrity checking and some data
-recovery means. The lziprecover program can repair bit-flip errors (one
-of the most common forms of data corruption) in lzip files, and provides
-data recovery capabilities, including error-checked merging of damaged
-copies of a file.
+recovery means. The
+@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
+program can repair bit-flip errors (one of the most common forms of data
+corruption) in lzip files, and provides data recovery capabilities,
+including error-checked merging of damaged copies of a file.
+@ifnothtml
+@ref{Data safety,,,lziprecover}.
+@end ifnothtml
@item
The lzip format is as simple as possible (but not simpler). The lzip
@@ -109,6 +113,11 @@ makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or
zutils.
+Lzip will automatically use the smallest possible dictionary size for
+each file without exceeding the given limit. Keep in mind that the
+decompression memory requirement is affected at compression time by the
+choice of dictionary size limit.
+
The amount of memory required for compression is about 1 or 2 times the
dictionary size limit (1 if input file size is less than dictionary size
limit, else 2) plus 9 times the dictionary size really used. The option
@@ -116,11 +125,6 @@ limit, else 2) plus 9 times the dictionary size really used. The option
of memory required for decompression is about 46 kB larger than the
dictionary size really used.
-Lzip will automatically use the smallest possible dictionary size for
-each file without exceeding the given limit. Keep in mind that the
-decompression memory requirement is affected at compression time by the
-choice of dictionary size limit.
-
When compressing, lzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".
When decompressing, lzip attempts to guess the name for the decompressed
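Annotation: the memory estimates quoted in the hunk above can be turned into a small calculation. This is only a sketch. The function names are hypothetical, the "dictionary size really used" is taken as an input rather than derived, and "46 kB" is assumed to mean 46000 bytes. Special cases that the visible text does not cover are ignored.

```cpp
#include <cassert>

// Hypothetical helper: approximate compression memory in bytes.
// Per the manual text: 1 or 2 times the dictionary size limit (1 if the
// input file is smaller than the limit, else 2) plus 9 times the
// dictionary size really used.
unsigned long long compression_memory( const unsigned long long file_size,
                                       const unsigned long long dict_limit,
                                       const unsigned long long dict_used )
  {
  const unsigned long long factor = ( file_size < dict_limit ) ? 1 : 2;
  return factor * dict_limit + 9 * dict_used;
  }

// Hypothetical helper: approximate decompression memory in bytes.
// "About 46 kB larger than the dictionary size really used";
// 46 kB is assumed here to be 46000 bytes.
unsigned long long decompression_memory( const unsigned long long dict_used )
  {
  return dict_used + 46 * 1000ULL;
  }
```

For example, under these assumptions a 1 MiB file compressed with an 8 MiB limit and a 1 MiB dictionary really used needs roughly 8 MiB + 9 MiB for compression, and roughly 1 MiB plus 46 kB for decompression.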
@@ -152,8 +156,8 @@ corresponding uncompressed files. Integrity testing of concatenated
compressed files is also supported.
Lzip can produce multi-member files and safely recover, with
-lziprecover, the undamaged members in case of file damage. Lzip can also
-split the compressed output in volumes of a given size, even when
+lziprecover, the undamaged members in case of file damage. Lzip can
+also split the compressed output in volumes of a given size, even when
reading from standard input. This allows the direct creation of
multivolume compressed tar archives.
@@ -162,72 +166,6 @@ automatically creating multi-member output. The members so created are
large, about 2 PiB each.
-@node Algorithm
-@chapter Algorithm
-@cindex algorithm
-
-In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
-concrete algorithm; it is more like "any algorithm using the LZMA coding
-scheme". For example, the option '-0' of lzip uses the scheme in almost
-the simplest way possible; issuing the longest match it can find, or a
-literal byte if it can't find a match. Inversely, a much more elaborated
-way of finding coding sequences of minimum size than the one currently
-used by lzip could be developed, and the resulting sequence could also
-be coded using the LZMA coding scheme.
-
-Lzip currently implements two variants of the LZMA algorithm; fast
-(used by option -0) and normal (used by all other compression levels).
-
-The high compression of LZMA comes from combining two basic, well-proven
-compression ideas: sliding dictionaries (LZ77/78) and markov models (the
-thing used by every compression algorithm that uses a range encoder or
-similar order-0 entropy coder as its last stage) with segregation of
-contexts according to what the bits are used for.
-
-Lzip is a two stage compressor. The first stage is a Lempel-Ziv coder,
-which reduces redundancy by translating chunks of data to their
-corresponding distance-length pairs. The second stage is a range encoder
-that uses a different probability model for each type of data;
-distances, lengths, literal bytes, etc.
-
-Here is how it works, step by step:
-
-1) The member header is written to the output stream.
-
-2) The first byte is coded literally, because there are no previous
-bytes to which the match finder can refer to.
-
-3) The main encoder advances to the next byte in the input data and
-calls the match finder.
-
-4) The match finder fills an array with the minimum distances before the
-current byte where a match of a given length can be found.
-
-5) Go back to step 3 until a sequence (formed of pairs, repeated
-distances and literal bytes) of minimum price has been formed. Where the
-price represents the number of output bits produced.
-
-6) The range encoder encodes the sequence produced by the main encoder
-and sends the produced bytes to the output stream.
-
-7) Go back to step 3 until the input data are finished or until the
-member or volume size limits are reached.
-
-8) The range encoder is flushed.
-
-9) The member trailer is written to the output stream.
-
-10) If there are more data to compress, go back to step 1.
-
-@sp 1
-@noindent
-The ideas embodied in lzip are due to (at least) the following people:
-Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
-the definition of Markov chains), G.N.N. Martin (for the definition of
-range encoding), Igor Pavlov (for putting all the above together in
-LZMA), and Julian Seward (for bzip2's CLI).
-
-
@node Invoking lzip
@chapter Invoking lzip
@cindex invoking
@@ -274,7 +212,7 @@ Force overwrite of output files.
@item -F
@itemx --recompress
-Force recompression of files whose name already has the @samp{.lz} or
+Force re-compression of files whose name already has the @samp{.lz} or
@samp{.tlz} suffix.
@item -k
@@ -392,6 +330,157 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
caused lzip to panic.
+@node Quality assurance
+@chapter Design, development and testing of lzip
+@cindex quality assurance
+
+There are two ways of constructing a software design. One way is to make
+it so simple that there are obviously no deficiencies and the other is
+to make it so complicated that there are no obvious deficiencies.@*
+--- C.A.R. Hoare
+
+Lzip has been designed, written and tested with great care to be the
+standard general-purpose compressor for unix-like systems. This chapter
+describes the lessons learned from previous compressors (gzip and
+bzip2), and their application to the design of lzip.
+
+@sp 1
+@section Format design
+
+When gzip was designed in 1992, computers and operating systems were
+much less capable than they are today. Gzip tried to work around some of
+those limitations, like 8.3 file names, with additional fields in its
+file format.
+
+Today those limitations have mostly disappeared, and the format of gzip
+has proved to be unnecessarily complicated. It includes fields that were
+never used, others that have lost their usefulness, and finally others
+that have become too limited.
+
+Bzip2 was designed 5 years later, and its format is simpler than the one
+of gzip.
+
+Probably the worst defect of the gzip format from the point of view of
+data safety is the variable size of its header. If the byte at offset 3
+(flags) of a gzip member gets corrupted, it may become very difficult to
+recover the data, even if the compressed blocks are intact, because it
+can't be known with certainty where the compressed blocks begin.
+
+By contrast, the header of a lzip member has a fixed length of 6. The
+lzma stream in a lzip member always starts at offset 6, making it
+trivial to recover the data even if the whole header becomes corrupt.
+
+Bzip2 also provides a header of fixed length and marks the beginning and end
+of each compressed block with six magic bytes, making it possible to
+find the compressed blocks even in case of file damage. But bzip2 does
+not store the size of each compressed block, as lzip does.
+
+Lzip provides better data recovery capabilities than any other gzip-like
+compressor because its format has been designed from the beginning to be
+simple and safe. It would be very difficult to write an automatic
+recovery tool like lziprecover for the gzip format. And, as far as I
+know, it has never been written.
+
+The lzip format is designed for long-term archiving. Therefore it
+excludes any unneeded features that may interfere with the future
+extraction of the uncompressed data.
+
+@sp 1
+@subsection Gzip format (mis)features not present in lzip
+
+@table @samp
+@item Multiple algorithms
+
+Gzip provides a CM (Compression Method) field that has never been used
+because it is a bad idea to begin with. New compression methods may
+require additional fields, making it impossible to implement new methods
+and, at the same time, keep the same format. This field does not solve
+the problem of format proliferation; it just makes the problem less
+obvious.
+
+@item Optional fields in header
+
+Unless special precautions are taken, optional fields are generally a
+bad idea because they produce a header of variable size. The gzip header
+has 2 fields that, in addition to being optional, are zero-terminated.
+This means that if any byte inside the field gets zeroed, or if the
+terminating zero gets altered, gzip won't be able to find either the
+header CRC or the compressed blocks.
+
+@item Optional CRC for the header
+
+Using an optional checksum for the header is not only a bad idea, it is
+an error; it may prevent the extraction of perfectly good data. For
+example, if the checksum is used and the bit enabling it is reset by a
+bit-flip, the header will appear to be intact (in spite of being
+corrupt) while the compressed blocks will appear to be totally
+unrecoverable (in spite of being intact). Very misleading indeed.
+
+@end table
+
+@subsection Lzip format improvements over gzip and bzip2
+
+@table @samp
+@item 64-bit size field
+
+Probably the most frequently reported shortcoming of the gzip format is
+that it only stores the least significant 32 bits of the uncompressed
+size. The size of any file larger than 4 GiB gets truncated.
+
+Bzip2 does not store the uncompressed size of the file.
+
+The lzip format provides a 64-bit field for the uncompressed size.
+Additionally, lzip produces multi-member output automatically when the
+size is too large for a single member, allowing for an unlimited
+uncompressed size.
+
+@item Distributed index
+
+The lzip format provides a distributed index that, among other things,
+helps plzip to decompress several times faster than pigz and helps
+lziprecover do its job. Neither the gzip format nor the bzip2 format
+provides an index.
+
+A distributed index is safer and more scalable than a monolithic index.
+The monolithic index introduces a single point of failure in the
+compressed file and may limit the number of members or the total
+uncompressed size.
+
+@end table
+
+@section Quality of implementation
+
+@table @samp
+@item Multiple implementations
+
+Just like the lzip format provides 4 factor protection against
+undetected data corruption, the development methodology of the lzip
+family of compressors provides 3 factor protection against undetected
+programming errors.
+
+Three related but independent compressor implementations, lzip, clzip
+and minilzip/lzlib, are developed concurrently. Every stable release of
+any of them is subjected to a hundred hours of intensive testing to
+verify that it produces identical output to the other two. This
+guarantees that all three implement the same algorithm, and makes it
+unlikely that any of them may contain serious undiscovered errors. In
+fact, no errors have been discovered in lzip since 2009.
+
+@item Dictionary size
+
+Lzip automatically uses the smallest possible dictionary size for each
+file. In addition to reducing the amount of memory required for
+decompression, this feature also minimizes the probability of being
+affected by RAM errors during compression.
+
+@item Exit status
+
+Returning a warning status of 2 is a design flaw of compress that leaked
+into the design of gzip. Both bzip2 and lzip are free from this flaw.
+
+@end table
+
+
@node File format
@chapter File format
@cindex file format
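Annotation: the "64-bit size field" item added in the hunk above notes that a 32-bit field keeps only the least significant 32 bits of the uncompressed size. The sketch below illustrates that truncation. The function name is hypothetical and the code does not read any real gzip or lzip file.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the value a 32-bit size field would hold for a
// given real size. The conversion keeps the low 32 bits only, so any
// size of 4 GiB or more wraps around.
uint32_t size_in_32_bit_field( const uint64_t real_size )
  {
  return static_cast< uint32_t >( real_size );
  }
```

Under this sketch a 5 GiB file would be recorded as 1 GiB in a 32-bit field, while a 64-bit field stores the size unchanged.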
@@ -468,6 +557,72 @@ facilitates safe recovery of undamaged members from multi-member files.
@end table
+@node Algorithm
+@chapter Algorithm
+@cindex algorithm
+
+In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
+concrete algorithm; it is more like "any algorithm using the LZMA coding
+scheme". For example, the option @samp{-0} of lzip uses the scheme in almost
+the simplest way possible; issuing the longest match it can find, or a
+literal byte if it can't find a match. Conversely, a much more elaborate
+way of finding coding sequences of minimum size than the one currently
+used by lzip could be developed, and the resulting sequence could also
+be coded using the LZMA coding scheme.
+
+Lzip currently implements two variants of the LZMA algorithm; fast
+(used by option @samp{-0}) and normal (used by all other compression levels).
+
+The high compression of LZMA comes from combining two basic, well-proven
+compression ideas: sliding dictionaries (LZ77/78) and markov models (the
+thing used by every compression algorithm that uses a range encoder or
+similar order-0 entropy coder as its last stage) with segregation of
+contexts according to what the bits are used for.
+
+Lzip is a two stage compressor. The first stage is a Lempel-Ziv coder,
+which reduces redundancy by translating chunks of data to their
+corresponding distance-length pairs. The second stage is a range encoder
+that uses a different probability model for each type of data;
+distances, lengths, literal bytes, etc.
+
+Here is how it works, step by step:
+
+1) The member header is written to the output stream.
+
+2) The first byte is coded literally, because there are no previous
+bytes to which the match finder can refer.
+
+3) The main encoder advances to the next byte in the input data and
+calls the match finder.
+
+4) The match finder fills an array with the minimum distances before the
+current byte where a match of a given length can be found.
+
+5) Go back to step 3 until a sequence (formed of pairs, repeated
+distances and literal bytes) of minimum price has been formed, where the
+price represents the number of output bits produced.
+
+6) The range encoder encodes the sequence produced by the main encoder
+and sends the produced bytes to the output stream.
+
+7) Go back to step 3 until the input data are finished or until the
+member or volume size limits are reached.
+
+8) The range encoder is flushed.
+
+9) The member trailer is written to the output stream.
+
+10) If there are more data to compress, go back to step 1.
+
+@sp 1
+@noindent
+The ideas embodied in lzip are due to (at least) the following people:
+Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
+the definition of Markov chains), G.N.N. Martin (for the definition of
+range encoding), Igor Pavlov (for putting all the above together in
+LZMA), and Julian Seward (for bzip2's CLI).
+
+
@node Stream format
@chapter Format of the LZMA stream in lzip files
@cindex format of the LZMA stream
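Annotation: the Algorithm chapter moved by the hunk above describes option -0 as issuing the longest match it can find, or a literal byte if it can't find a match. The sketch below shows that greedy idea in isolation. It is not lzip's implementation: the Token type, the function name, the brute-force search and the minimum match length of 2 are all assumptions made for illustration, and no range coding is performed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token: either a literal byte or a (distance, length) pair.
struct Token { bool is_match; unsigned distance; unsigned length; char literal; };

// Greedy parse: at each position emit the longest earlier match if it is
// at least min_len bytes long, else emit one literal byte.
std::vector< Token > greedy_parse( const std::string & data,
                                   const unsigned min_len = 2 )
  {
  std::vector< Token > tokens;
  std::string::size_type pos = 0;
  while( pos < data.size() )
    {
    unsigned best_len = 0, best_dist = 0;
    // Brute-force search over every earlier start; matches may overlap pos.
    for( std::string::size_type start = 0; start < pos; ++start )
      {
      unsigned len = 0;
      while( pos + len < data.size() && data[start+len] == data[pos+len] )
        ++len;
      // On ties prefer the later start, that is, the smaller distance.
      if( len > 0 && len >= best_len )
        { best_len = len; best_dist = pos - start; }
      }
    if( best_len >= min_len )
      { tokens.push_back( Token{ true, best_dist, best_len, 0 } );
        pos += best_len; }
    else
      { tokens.push_back( Token{ false, 0, 0, data[pos] } ); ++pos; }
    }
  return tokens;
  }
```

With the input "abcabcabc" this sketch emits three literals followed by one match of distance 3 and length 6. A real encoder would use a match finder instead of the quadratic search and would then price and range-encode the sequence as described in steps 3 to 6.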
@@ -690,140 +845,6 @@ sequences (matches, repeated matches, and literal bytes), until the "End
Of Stream" marker is decoded.
-@node Quality assurance
-@chapter Design, development and testing of lzip
-@cindex quality assurance
-
-There are two ways of constructing a software design. One way is to make
-it so simple that there are obviously no deficiencies and the other is
-to make it so complicated that there are no obvious deficiencies.@*
---- C.A.R. Hoare
-
-Lzip has been designed, written and tested with great care to be the
-standard general-purpose compressor for unix-like systems. This chapter
-describes the lessons learned from previous compressors (gzip and
-bzip2), and their application to the design of lzip.
-
-@sp 1
-@section Format design
-
-When gzip was designed in 1992, computers and operating systems were
-much less capable than they are today. Gzip tried to work around some of
-those limitations, like 8.3 file names, with additional fields in its
-file format.
-
-Today those limitations have mostly disappeared, and the format of gzip
-has proved to be unnecessarily complicated. It includes fields that were
-never used, others that have lost its usefulness, and finally others
-that have become too limited.
-
-Bzip2 was designed 5 years later, and its format is in some aspects
-simpler than the one of gzip. But bzip2 also shows complexities in its
-file format which slow down decompression and, in retrospect, are
-unnecessary.
-
-Probably the worst defect of the gzip format from the point of view of
-data safety is the variable size of its header. If the byte at offset 3
-(flags) of a gzip member gets corrupted, it mat become very difficult to
-recover the data, even if the compressed blocks are intact, because it
-can't be known with certainty where the compressed blocks begin.
-
-By contrast, the lzma stream in a lzip member always starts at offset 6,
-making it trivial to recover the data even if the whole header becomes
-corrupt.
-
-Lzip provides better data recovery capabilities than any other gzip-like
-compressor because its format has been designed from the beginning to be
-simple and safe. It would be very difficult to write an automatic
-recovery tool like lziprecover for the gzip format. And, as far as I
-know, it has never been writen.
-
-The lzip format is designed for long-term archiving. Therefore it
-excludes any unneeded features that may interfere with the future
-extraction of the uncompressed data.
-
-@sp 1
-@subsection Gzip format (mis)features not present in lzip
-
-@table @samp
-@item Multiple algorithms
-
-Gzip provides a CM (Compression Method) field that has never been used
-because it is a bad idea to begin with. New compression methods may
-require additional fields, making it impossible to implement new methods
-and, at the same time, keep the same format. This field does not solve
-the problem of format proliferation; it just makes the problem less
-obvious.
-
-@item Optional fields in header
-
-Unless special precautions are taken, optional fields are generally a
-bad idea because they produce a header of variable size. The gzip header
-has 2 fields that, in addition to being optional, are zero-terminated.
-This means that if any byte inside the field gets zeroed, or if the
-terminating zero gets altered, gzip won't be able to find neither the
-header CRC nor the compressed blocks.
-
-Using an optional checksum for the header is not only a bad idea, it is
-an error; it may prevent the extraction of perfectly good data. For
-example, if the checksum is used and the bit enabling it is reset by a
-bit-flip, the header will appear to be intact (in spite of being
-corrupt) while the compressed blocks will appear to be totally
-unrecoverable (in spite of being intact). Very misleading indeed.
-
-@end table
-
-@subsection Lzip format improvements over gzip
-
-@table @samp
-@item 64-bit size field
-
-Probably the most frequently reported shortcoming of the gzip format is
-that it only stores the least significant 32 bits of the uncompressed
-size. The size of any file larger than 4 GiB gets truncated.
-
-The lzip format provides a 64-bit field for the uncompressed size.
-Additionaly, lzip produces multi-member output automatically when the
-size is too large for a single member, allowing an unlimited
-uncompressed size.
-
-@item Distributed index
-
-The lzip format provides a distributed index that, among other things,
-helps plzip to decompress several times faster than pigz and helps
-lziprecover do its job. The gzip format does not provide an index.
-
-A distributed index is safer and more scalable than a monolithic index.
-The monolithic index introduces a single point of failure in the
-compressed file and may limit the number of members or the total
-uncompressed size.
-
-@end table
-
-@section Quality of implementation
-
-Three related but independent compressor implementations, lzip, clzip
-and minilzip/lzlib, are developed concurrently. Every stable release of
-any of them is subjected to a hundred hours of intensive testing to
-verify that it produces identical output to the other two. This
-guarantees that all three implement the same algorithm, and makes it
-unlikely that any of them may contain serious undiscovered errors. In
-fact, no errors have been discovered in lzip since 2009.
-
-Just like the lzip format provides 4 factor protection against
-undetected data corruption, the development methodology described above
-provides 3 factor protection against undetected programming errors in
-lzip.
-
-Lzip automatically uses the smallest possible dictionary size for each
-file. In addition to reducing the amount of memory required for
-decompression, this feature also minimizes the probability of being
-affected by RAM errors during compression.
-
-Returning a warning status of 2 is a design flaw of compress that leaked
-into the design of gzip. Both bzip2 and lzip are free form this flaw.
-
-
@node Examples
@chapter A small tutorial with examples
@cindex examples
@@ -947,7 +968,7 @@ find by running @w{@code{lzip --version}}.
@cindex reference source code
@verbatim
-/* Lzd - Educational decompressor for lzip files
+/* Lzd - Educational decompressor for the lzip format
Copyright (C) 2013-2015 Antonio Diaz Diaz.
This program is free software: you have unlimited permission
@@ -1204,7 +1225,7 @@ class LZ_decoder
}
public:
- LZ_decoder( const unsigned dict_size )
+ explicit LZ_decoder( const unsigned dict_size )
:
partial_data_pos( 0 ),
dictionary_size( dict_size ),
@@ -1231,7 +1252,7 @@ void LZ_decoder::flush_data()
crc32.update_buf( crc_, buffer + stream_pos, size );
errno = 0;
if( std::fwrite( buffer + stream_pos, 1, size, stdout ) != size )
- { std::fprintf( stderr, "Write error: %s.\n", std::strerror( errno ) );
+ { std::fprintf( stderr, "Write error: %s\n", std::strerror( errno ) );
std::exit( 1 ); }
if( pos >= dictionary_size ) { partial_data_pos += pos; pos = 0; }
stream_pos = pos;
@@ -1273,7 +1294,7 @@ bool LZ_decoder::decode_member() // Returns false if error
put_byte( rdec.decode_matched( bm, peek( rep0 ) ) );
state.set_char();
}
- else
+ else // match or repeated match
{
int len;
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
@@ -1302,7 +1323,7 @@ bool LZ_decoder::decode_member() // Returns false if error
state.set_rep();
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
}
- else
+ else // match
{
rep3 = rep2; rep2 = rep1; rep1 = rep0;
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
@@ -1344,7 +1365,7 @@ int main( const int argc, const char * const argv[] )
{
if( argc > 1 )
{
- std::printf( "Lzd %s - Educational decompressor for lzip files.\n",
+ std::printf( "Lzd %s - Educational decompressor for the lzip format.\n",
PROGVERSION );
std::printf( "Study the source to learn how a lzip decompressor works.\n"
"See the lzip manual for an explanation of the code.\n"
@@ -1371,19 +1392,19 @@ int main( const int argc, const char * const argv[] )
if( std::feof( stdin ) || std::memcmp( header, "LZIP\x01", 5 ) != 0 )
{
if( first_member )
- { std::fprintf( stderr, "Bad magic number (file not in lzip format)\n" );
+ { std::fputs( "Bad magic number (file not in lzip format).\n", stderr );
return 2; }
break;
}
unsigned dict_size = 1 << ( header[5] & 0x1F );
dict_size -= ( dict_size / 16 ) * ( ( header[5] >> 5 ) & 7 );
if( dict_size < min_dictionary_size || dict_size > max_dictionary_size )
- { std::fprintf( stderr, "Invalid dictionary size in member header\n" );
+ { std::fputs( "Invalid dictionary size in member header.\n", stderr );
return 2; }
LZ_decoder decoder( dict_size );
if( !decoder.decode_member() )
- { std::fprintf( stderr, "Data error\n" ); return 2; }
+ { std::fputs( "Data error\n", stderr ); return 2; }
File_trailer trailer;
for( int i = 0; i < 20; ++i ) trailer[i] = std::getc( stdin );
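Annotation: the two dict_size lines in the hunk above decode the dictionary size from the sixth header byte. The sketch below isolates that arithmetic in a function so it can be tried on sample values. The function name is hypothetical, an unsigned literal is used for the shift, and the range check against min_dictionary_size and max_dictionary_size is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the arithmetic in main():
// bits 4-0 of the coded byte give the base 2 logarithm of the base size;
// bits 7-5 give the number of sixteenths of the base size to subtract.
unsigned decode_dict_size( const uint8_t coded )
  {
  unsigned size = 1u << ( coded & 0x1F );
  size -= ( size / 16 ) * ( ( coded >> 5 ) & 7 );
  return size;
  }
```

For instance, the coded byte 0xD3 has 19 in its low five bits and 6 in its high three bits, giving 2^19 - 6 * 2^15 = 327680 bytes (320 KiB).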
@@ -1392,11 +1413,11 @@ int main( const int argc, const char * const argv[] )
unsigned long long data_size = 0;
for( int i = 11; i >= 4; --i ) { data_size <<= 8; data_size += trailer[i]; }
if( crc != decoder.crc() || data_size != decoder.data_position() )
- { std::fprintf( stderr, "CRC error\n" ); return 2; }
+ { std::fputs( "CRC error\n", stderr ); return 2; }
}
if( std::fclose( stdout ) != 0 )
- { std::fprintf( stderr, "Can't close stdout: %s.\n", std::strerror( errno ) );
+ { std::fprintf( stderr, "Can't close stdout: %s\n", std::strerror( errno ) );
return 1; }
return 0;
}