Merging upstream version 1.17~rc2.

Signed-off-by: Daniel Baumann <mail@daniel-baumann.ch>
author: Daniel Baumann <mail@daniel-baumann.ch> 2015-11-07 10:08:13 +0000
committer: Daniel Baumann <mail@daniel-baumann.ch> 2015-11-07 10:08:13 +0000
commit: c31a05b15eb10df5b9a3daa9d9b1d6a5bc7918c5 (patch)
tree: 4aab201a8b40daa717615f9f3a7b9407a25936e6
parent: Adding debian version 1.17~rc1-1. (diff)
download: lzip-c31a05b15eb10df5b9a3daa9d9b1d6a5bc7918c5.tar.xz
lzip-c31a05b15eb10df5b9a3daa9d9b1d6a5bc7918c5.zip
10 files changed, 370 insertions, 68 deletions
diff --git a/ChangeLog b/ChangeLog
index f10c17a..aa3faae 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2015-05-25  Antonio Diaz Diaz  <antonio@gnu.org>
+
+	* Version 1.17-rc2 released.
+	* lzip.texi: Added chapter 'Quality assurance'.
+
 2015-04-17  Antonio Diaz Diaz  <antonio@gnu.org>
 
 	* Version 1.17-rc1 released.
@@ -27,8 +32,8 @@
 	* main.cc (show_header): Do not show header version.
 	* Ignore option '-n, --threads' for compatibility with plzip.
 	* configure: Options now accept a separate argument.
-	* Added chapter 'Stream format' and appendix 'Reference source code'
-	  to the manual.
+	* lzip.texinfo: Added chapter 'Stream format' and appendix
+	  'Reference source code'.
 
 2013-02-17  Antonio Diaz Diaz  <ant_diaz@teleline.es>
 
@@ -107,7 +112,7 @@
 	* main.cc: Fixed warning about fchown's return value being ignored.
 	* decoder.cc: '-tvvvv' now also shows compression ratio.
 	* main.cc: Set stdin/stdout in binary mode on MSVC and OS2.
-	* New examples have been added to the manual.
+	* lzip.texinfo: Added new examples.
 	* testsuite: 'test1' renamed to 'test.txt'. Added new tests.
 	* Matchfinder types HC4 (4 bytes hash-chain) and HT4 (4 bytes
 	  hash-table) have been tested and found no better than the
@@ -171,7 +176,7 @@
 	* 'member_size' and 'volume_size' are now accurate limits.
 	* Compression speed has been improved.
 	* Implemented bt4 type matchfinder.
-	* Added chapter 'Algorithm' to the manual.
+	* lzip.texinfo: Added chapter 'Algorithm'.
 	* Lzdiff and lzgrep now accept '-h' for '--help' and
 	  '-V' for '--version'.
 	* Makefile.in: Man page is now installed by default.
diff --git a/Makefile.in b/Makefile.in
index cfddf4f..af75ce6 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -18,13 +18,13 @@ objs = arg_parser.o encoder_base.o encoder.o fast_encoder.o decoder.o main.o
 all : $(progname)
 
 $(progname) : $(objs)
-	$(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(objs)
+	$(CXX) $(LDFLAGS) $(CXXFLAGS) -o $@ $(objs)
 
 main.o : main.cc
-	$(CXX) $(CXXFLAGS) $(CPPFLAGS) -DPROGVERSION=\"$(pkgversion)\" -c -o $@ $<
+	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -DPROGVERSION=\"$(pkgversion)\" -c -o $@ $<
 
 %.o : %.cc
-	$(CXX) $(CXXFLAGS) $(CPPFLAGS) -c -o $@ $<
+	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c -o $@ $<
 
 $(objs)        : Makefile
 arg_parser.o   : arg_parser.h
diff --git a/NEWS b/NEWS
index 7949a14..0a1cbf2 100644
--- a/NEWS
+++ b/NEWS
@@ -3,6 +3,8 @@ Changes in version 1.17:
 The compression code has been reorganized to ease the porting of the
 fast encoder to clzip and lzlib.
 
+The new chapter "Quality assurance" has been added to the manual.
+
 The targets "install-compress", "install-strip-compress",
 "install-info-compress" and "install-man-compress" have been added to
 the Makefile.
diff --git a/README b/README
index 0db23e7..894b77a 100644
--- a/README
+++ b/README
@@ -3,7 +3,7 @@ Description
 Lzip is a lossless data compressor with a user interface similar to the
 one of gzip or bzip2. Lzip is about as fast as gzip, compresses most
 files more than bzip2, and is better than both from a data recovery
-perspective. Lzip is a clean implementation of the LZMA "algorithm".
+perspective.
 
 The lzip file format is designed for data sharing and long-term
 archiving, taking into account both data integrity and decoder
@@ -76,18 +76,19 @@ multivolume compressed tar archives.
 
 Lzip is able to compress and decompress streams of unlimited size by
 automatically creating multi-member output. The members so created are
-large, about 64 PiB each.
-
-There is no such thing as a "LZMA algorithm"; it is more like a "LZMA
-coding scheme". For example, the option '-0' of lzip uses the scheme in
-almost the simplest way possible; issuing the longest match it can find,
-or a literal byte if it can't find a match. Inversely, a much more
-elaborated way of finding coding sequences of minimum size than the one
-currently used by lzip could be developed, and the resulting sequence
-could also be coded using the LZMA coding scheme.
-
-Lzip currently implements two variants of the LZMA algorithm; fast (used
-by option -0) and normal (used by all other compression levels).
+large, about 2 PiB each.
+
+In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
+concrete algorithm; it is more like "any algorithm using the LZMA coding
+scheme". For example, the option '-0' of lzip uses the scheme in almost
+the simplest way possible; issuing the longest match it can find, or a
+literal byte if it can't find a match. Inversely, a much more elaborated
+way of finding coding sequences of minimum size than the one currently
+used by lzip could be developed, and the resulting sequence could also
+be coded using the LZMA coding scheme.
+
+Lzip currently implements two variants of the LZMA algorithm; fast
+(used by option -0) and normal (used by all other compression levels).
 
 The high compression of LZMA comes from combining two basic, well-proven
 compression ideas: sliding dictionaries (LZ77/78) and markov models (the
diff --git a/configure b/configure
index ab1d532..2a09e4f 100755
--- a/configure
+++ b/configure
@@ -6,7 +6,7 @@
 # to copy, distribute and modify it.
 
 pkgname=lzip
-pkgversion=1.17-rc1
+pkgversion=1.17-rc2
 progname=lzip
 srctrigger=doc/${pkgname}.texi
 
diff --git a/doc/lzip.1 b/doc/lzip.1
index df98ed6..6b779f1 100644
--- a/doc/lzip.1
+++ b/doc/lzip.1
@@ -1,5 +1,5 @@
 .\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.46.1.
-.TH LZIP "1" "April 2015" "lzip 1.17-rc1" "User Commands"
+.TH LZIP "1" "May 2015" "lzip 1.17-rc2" "User Commands"
 .SH NAME
 lzip \- reduces the size of files
 .SH SYNOPSIS
diff --git a/doc/lzip.info b/doc/lzip.info
index 53f7c55..6854503 100644
--- a/doc/lzip.info
+++ b/doc/lzip.info
@@ -11,7 +11,7 @@ File: lzip.info,  Node: Top,  Next: Introduction,  Up: (dir)
 Lzip Manual
 ***********
 
-This manual is for Lzip (version 1.17-rc1, 17 April 2015).
+This manual is for Lzip (version 1.17-rc2, 25 May 2015).
 
 * Menu:
 
@@ -20,6 +20,7 @@ This manual is for Lzip (version 1.17-rc1, 17 April 2015).
 * Invoking lzip::          Command line interface
 * File format::            Detailed format of the compressed file
 * Stream format::          Format of the LZMA stream in lzip files
+* Quality assurance::      Design, development and testing of lzip
 * Examples::               A small tutorial with examples
 * Problems::               Reporting bugs
 * Reference source code::  Source code illustrating stream format
@@ -40,8 +41,7 @@ File: lzip.info,  Node: Introduction,  Next: Algorithm,  Prev: Top,  Up: Top
 Lzip is a lossless data compressor with a user interface similar to the
 one of gzip or bzip2. Lzip is about as fast as gzip, compresses most
 files more than bzip2, and is better than both from a data recovery
-perspective. Lzip is a clean implementation of the LZMA
-(Lempel-Ziv-Markov chain-Algorithm) "algorithm".
+perspective.
 
    The lzip file format is designed for data sharing and long-term
 archiving, taking into account both data integrity and decoder
@@ -133,7 +133,7 @@ multivolume compressed tar archives.
 
    Lzip is able to compress and decompress streams of unlimited size by
 automatically creating multi-member output. The members so created are
-large, about 64 PiB each.
+large, about 2 PiB each.
 
 
 File: lzip.info,  Node: Algorithm,  Next: Invoking lzip,  Prev: Introduction,  Up: Top
@@ -141,13 +141,14 @@ File: lzip.info,  Node: Algorithm,  Next: Invoking lzip,  Prev: Introduction,  U
 2 Algorithm
 ***********
 
-There is no such thing as a "LZMA algorithm"; it is more like a "LZMA
-coding scheme". For example, the option '-0' of lzip uses the scheme in
-almost the simplest way possible; issuing the longest match it can find,
-or a literal byte if it can't find a match. Inversely, a much more
-elaborated way of finding coding sequences of minimum size than the one
-currently used by lzip could be developed, and the resulting sequence
-could also be coded using the LZMA coding scheme.
+In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
+concrete algorithm; it is more like "any algorithm using the LZMA coding
+scheme". For example, the option '-0' of lzip uses the scheme in almost
+the simplest way possible; issuing the longest match it can find, or a
+literal byte if it can't find a match. Inversely, a much more elaborated
+way of finding coding sequences of minimum size than the one currently
+used by lzip could be developed, and the resulting sequence could also
+be coded using the LZMA coding scheme.
 
    Lzip currently implements two variants of the LZMA algorithm; fast
 (used by option -0) and normal (used by all other compression levels).
@@ -224,7 +225,7 @@ The format for running lzip is:
 '--member-size=BYTES'
      Set the member size limit to BYTES. A small member size may
      degrade compression ratio, so use it only when needed. Valid values
-     range from 100 kB to 64 PiB. Defaults to 64 PiB.
+     range from 100 kB to 2 PiB. Defaults to 2 PiB.
 
 '-c'
 '--stdout'
@@ -432,7 +433,7 @@ additional information before, between, or after them.
 
 
 
-File: lzip.info,  Node: Stream format,  Next: Examples,  Prev: File format,  Up: Top
+File: lzip.info,  Node: Stream format,  Next: Quality assurance,  Prev: File format,  Up: Top
 
 5 Format of the LZMA stream in lzip files
 *****************************************
@@ -645,9 +646,146 @@ with the appropriate contexts to decode the different coding sequences
 Stream" marker is decoded.
 
 
-File: lzip.info,  Node: Examples,  Next: Problems,  Prev: Stream format,  Up: Top
+File: lzip.info,  Node: Quality assurance,  Next: Examples,  Prev: Stream format,  Up: Top
 
-6 A small tutorial with examples
+6 Design, development and testing of lzip
+*****************************************
+
+There are two ways of constructing a software design. One way is to make
+it so simple that there are obviously no deficiencies and the other is
+to make it so complicated that there are no obvious deficiencies.
+-- C.A.R. Hoare
+
+   Lzip has been designed, written and tested with great care to be the
+standard general-purpose compressor for unix-like systems. This chapter
+describes the lessons learned from previous compressors (gzip and
+bzip2), and their application to the design of lzip.
+
+
+6.1 Format design
+=================
+
+When gzip was designed in 1992, computers and operating systems were
+much less capable than they are today. Gzip tried to work around some of
+those limitations, like 8.3 file names, with additional fields in its
+file format.
+
+   Today those limitations have mostly disappeared, and the format of
+gzip has proved to be unnecessarily complicated. It includes fields
+that were never used, others that have lost their usefulness, and finally
+others that have become too limited.
+
+   Bzip2 was designed 5 years later, and its format is in some aspects
+simpler than the one of gzip. But bzip2 also shows complexities in its
+file format which slow down decompression and, in retrospect, are
+unnecessary.
+
+   Probably the worst defect of the gzip format from the point of view
+of data safety is the variable size of its header. If the byte at
+offset 3 (flags) of a gzip member gets corrupted, it may become very
+difficult to recover the data, even if the compressed blocks are
+intact, because it can't be known with certainty where the compressed
+blocks begin.
+
+   By contrast, the lzma stream in a lzip member always starts at
+offset 6, making it trivial to recover the data even if the whole
+header becomes corrupt.
+
+   Lzip provides better data recovery capabilities than any other
+gzip-like compressor because its format has been designed from the
+beginning to be simple and safe. It would be very difficult to write an
+automatic recovery tool like lziprecover for the gzip format. And, as
+far as I know, it has never been written.
+
+   The lzip format is designed for long-term archiving. Therefore it
+excludes any unneeded features that may interfere with the future
+extraction of the uncompressed data.
+
+
+6.1.1 Gzip format (mis)features not present in lzip
+---------------------------------------------------
+
+'Multiple algorithms'
+     Gzip provides a CM (Compression Method) field that has never been
+     used because it is a bad idea to begin with. New compression
+     methods may require additional fields, making it impossible to
+     implement new methods and, at the same time, keep the same format.
+     This field does not solve the problem of format proliferation; it
+     just makes the problem less obvious.
+
+'Optional fields in header'
+     Unless special precautions are taken, optional fields are
+     generally a bad idea because they produce a header of variable
+     size. The gzip header has 2 fields that, in addition to being
+     optional, are zero-terminated.  This means that if any byte inside
+     the field gets zeroed, or if the terminating zero gets altered,
+     gzip will be able to find neither the header CRC nor the
+     compressed blocks.
+
+     Using an optional checksum for the header is not only a bad idea,
+     it is an error; it may prevent the extraction of perfectly good
+     data. For example, if the checksum is used and the bit enabling it
+     is reset by a bit-flip, the header will appear to be intact (in
+     spite of being corrupt) while the compressed blocks will appear to
+     be totally unrecoverable (in spite of being intact). Very
+     misleading indeed.
+
+
+6.1.2 Lzip format improvements over gzip
+----------------------------------------
+
+'64-bit size field'
+     Probably the most frequently reported shortcoming of the gzip
+     format is that it only stores the least significant 32 bits of the
+     uncompressed size. The size of any file larger than 4 GiB gets
+     truncated.
+
+     The lzip format provides a 64-bit field for the uncompressed size.
+     Additionally, lzip produces multi-member output automatically when
+     the size is too large for a single member, allowing an unlimited
+     uncompressed size.
+
+'Distributed index'
+     The lzip format provides a distributed index that, among other
+     things, helps plzip to decompress several times faster than pigz
+     and helps lziprecover do its job. The gzip format does not provide
+     an index.
+
+     A distributed index is safer and more scalable than a monolithic
+     index.  The monolithic index introduces a single point of failure
+     in the compressed file and may limit the number of members or the
+     total uncompressed size.
+
+
+6.2 Quality of implementation
+=============================
+
+Three related but independent compressor implementations, lzip, clzip
+and minilzip/lzlib, are developed concurrently. Every stable release of
+any of them is subjected to a hundred hours of intensive testing to
+verify that it produces identical output to the other two. This
+guarantees that all three implement the same algorithm, and makes it
+unlikely that any of them may contain serious undiscovered errors. In
+fact, no errors have been discovered in lzip since 2009.
+
+   Just like the lzip format provides 4 factor protection against
+undetected data corruption, the development methodology described above
+provides 3 factor protection against undetected programming errors in
+lzip.
+
+   Lzip automatically uses the smallest possible dictionary size for
+each file. In addition to reducing the amount of memory required for
+decompression, this feature also minimizes the probability of being
+affected by RAM errors during compression.
+
+   Returning a warning status of 2 is a design flaw of compress that
+leaked into the design of gzip. Both bzip2 and lzip are free from this
+flaw.
+
+
+File: lzip.info,  Node: Examples,  Next: Problems,  Prev: Quality assurance,  Up: Top
+
+7 A small tutorial with examples
 ********************************
 
 WARNING! Even if lzip is bug-free, other causes may result in a corrupt
@@ -720,7 +858,7 @@ file with a member size of 32 MiB.
 
 File: lzip.info,  Node: Problems,  Next: Reference source code,  Prev: Examples,  Up: Top
 
-7 Reporting bugs
+8 Reporting bugs
 ****************
 
 There are probably bugs in lzip. There are certainly errors and
@@ -761,6 +899,10 @@ Appendix A Reference source code
 #include <cstring>
 #include <stdint.h>
 #include <unistd.h>
+#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
+#include <fcntl.h>
+#include <io.h>
+#endif
 
 
 class State
@@ -1146,6 +1288,11 @@ int main( const int argc, const char * const argv[] )
     return 0;
     }
 
+#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
+  setmode( fileno( stdin ), O_BINARY );
+  setmode( fileno( stdout ), O_BINARY );
+#endif
+
   for( bool first_member = true; ; first_member = false )
     {
     File_header header;
@@ -1201,6 +1348,7 @@ Concept index
 * introduction:                          Introduction.          (line 6)
 * invoking:                              Invoking lzip.         (line 6)
 * options:                               Invoking lzip.         (line 6)
+* quality assurance:                     Quality assurance.     (line 6)
 * reference source code:                 Reference source code. (line 6)
 * usage:                                 Invoking lzip.         (line 6)
 * version:                               Invoking lzip.         (line 6)
@@ -1209,15 +1357,16 @@ Concept index
 
 Tag Table:
 Node: Top208
-Node: Introduction1025
-Node: Algorithm6036
-Node: Invoking lzip8793
-Node: File format14383
-Node: Stream format16768
-Node: Examples26200
-Node: Problems28157
-Node: Reference source code28687
-Node: Concept index42024
+Node: Introduction1090
+Node: Algorithm6008
+Node: Invoking lzip8833
+Node: File format14421
+Node: Stream format16806
+Node: Quality assurance26247
+Node: Examples32269
+Node: Problems34230
+Node: Reference source code34760
+Node: Concept index48358
 
 End Tag Table
 
diff --git a/doc/lzip.texi b/doc/lzip.texi
index 5046140..ac44ee9 100644
--- a/doc/lzip.texi
+++ b/doc/lzip.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 17 April 2015
-@set VERSION 1.17-rc1
+@set UPDATED 25 May 2015
+@set VERSION 1.17-rc2
 
 @dircategory Data Compression
 @direntry
@@ -40,6 +40,7 @@ This manual is for Lzip (version @value{VERSION}, @value{UPDATED}).
 * Invoking lzip::          Command line interface
 * File format::            Detailed format of the compressed file
 * Stream format::          Format of the LZMA stream in lzip files
+* Quality assurance::      Design, development and testing of lzip
 * Examples::               A small tutorial with examples
 * Problems::               Reporting bugs
 * Reference source code::  Source code illustrating stream format
@@ -60,8 +61,7 @@ to copy, distribute and modify it.
 Lzip is a lossless data compressor with a user interface similar to the
 one of gzip or bzip2. Lzip is about as fast as gzip, compresses most
 files more than bzip2, and is better than both from a data recovery
-perspective. Lzip is a clean implementation of the LZMA
-(Lempel-Ziv-Markov chain-Algorithm) "algorithm".
+perspective.
 
 The lzip file format is designed for data sharing and long-term
 archiving, taking into account both data integrity and decoder
@@ -159,23 +159,24 @@ multivolume compressed tar archives.
 
 Lzip is able to compress and decompress streams of unlimited size by
 automatically creating multi-member output. The members so created are
-large, about 64 PiB each.
+large, about 2 PiB each.
 
 
 @node Algorithm
 @chapter Algorithm
 @cindex algorithm
 
-There is no such thing as a "LZMA algorithm"; it is more like a "LZMA
-coding scheme". For example, the option '-0' of lzip uses the scheme in
-almost the simplest way possible; issuing the longest match it can find,
-or a literal byte if it can't find a match. Inversely, a much more
-elaborated way of finding coding sequences of minimum size than the one
-currently used by lzip could be developed, and the resulting sequence
-could also be coded using the LZMA coding scheme.
+In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
+concrete algorithm; it is more like "any algorithm using the LZMA coding
+scheme". For example, the option '-0' of lzip uses the scheme in almost
+the simplest way possible; issuing the longest match it can find, or a
+literal byte if it can't find a match. Inversely, a much more elaborated
+way of finding coding sequences of minimum size than the one currently
+used by lzip could be developed, and the resulting sequence could also
+be coded using the LZMA coding scheme.
 
-Lzip currently implements two variants of the LZMA algorithm; fast (used
-by option -0) and normal (used by all other compression levels).
+Lzip currently implements two variants of the LZMA algorithm; fast
+(used by option -0) and normal (used by all other compression levels).
 
 The high compression of LZMA comes from combining two basic, well-proven
 compression ideas: sliding dictionaries (LZ77/78) and markov models (the
@@ -242,7 +243,7 @@ lzip [@var{options}] [@var{files}]
 
 Lzip supports the following options:
 
-@table @samp
+@table @code
 @item -h
 @itemx --help
 Print an informative help message describing the options and exit.
@@ -255,7 +256,7 @@ Print the version number of lzip on the standard output and exit.
 @itemx --member-size=@var{bytes}
 Set the member size limit to @var{bytes}. A small member size may
 degrade compression ratio, so use it only when needed. Valid values
-range from 100 kB to 64 PiB. Defaults to 64 PiB.
+range from 100 kB to 2 PiB. Defaults to 2 PiB.
 
 @item -c
 @itemx --stdout
@@ -689,6 +690,140 @@ sequences (matches, repeated matches, and literal bytes), until the "End
 Of Stream" marker is decoded.
 
 
+@node Quality assurance
+@chapter Design, development and testing of lzip
+@cindex quality assurance
+
+There are two ways of constructing a software design. One way is to make
+it so simple that there are obviously no deficiencies and the other is
+to make it so complicated that there are no obvious deficiencies.@*
+--- C.A.R. Hoare
+
+Lzip has been designed, written and tested with great care to be the
+standard general-purpose compressor for unix-like systems. This chapter
+describes the lessons learned from previous compressors (gzip and
+bzip2), and their application to the design of lzip.
+
+@sp 1
+@section Format design
+
+When gzip was designed in 1992, computers and operating systems were
+much less capable than they are today. Gzip tried to work around some of
+those limitations, like 8.3 file names, with additional fields in its
+file format.
+
+Today those limitations have mostly disappeared, and the format of gzip
+has proved to be unnecessarily complicated. It includes fields that were
+never used, others that have lost their usefulness, and finally others
+that have become too limited.
+
+Bzip2 was designed 5 years later, and its format is in some aspects
+simpler than the one of gzip. But bzip2 also shows complexities in its
+file format which slow down decompression and, in retrospect, are
+unnecessary.
+
+Probably the worst defect of the gzip format from the point of view of
+data safety is the variable size of its header. If the byte at offset 3
+(flags) of a gzip member gets corrupted, it may become very difficult to
+recover the data, even if the compressed blocks are intact, because it
+can't be known with certainty where the compressed blocks begin.
+
+By contrast, the lzma stream in a lzip member always starts at offset 6,
+making it trivial to recover the data even if the whole header becomes
+corrupt.
+
+Lzip provides better data recovery capabilities than any other gzip-like
+compressor because its format has been designed from the beginning to be
+simple and safe. It would be very difficult to write an automatic
+recovery tool like lziprecover for the gzip format. And, as far as I
+know, it has never been written.
+
+The lzip format is designed for long-term archiving. Therefore it
+excludes any unneeded features that may interfere with the future
+extraction of the uncompressed data.
+
+@sp 1
+@subsection Gzip format (mis)features not present in lzip
+
+@table @samp
+@item Multiple algorithms
+
+Gzip provides a CM (Compression Method) field that has never been used
+because it is a bad idea to begin with. New compression methods may
+require additional fields, making it impossible to implement new methods
+and, at the same time, keep the same format. This field does not solve
+the problem of format proliferation; it just makes the problem less
+obvious.
+
+@item Optional fields in header
+
+Unless special precautions are taken, optional fields are generally a
+bad idea because they produce a header of variable size. The gzip header
+has 2 fields that, in addition to being optional, are zero-terminated.
+This means that if any byte inside the field gets zeroed, or if the
+terminating zero gets altered, gzip will be able to find neither the
+header CRC nor the compressed blocks.
+
+Using an optional checksum for the header is not only a bad idea, it is
+an error; it may prevent the extraction of perfectly good data. For
+example, if the checksum is used and the bit enabling it is reset by a
+bit-flip, the header will appear to be intact (in spite of being
+corrupt) while the compressed blocks will appear to be totally
+unrecoverable (in spite of being intact). Very misleading indeed.
+
+@end table
+
+@subsection Lzip format improvements over gzip
+
+@table @samp
+@item 64-bit size field
+
+Probably the most frequently reported shortcoming of the gzip format is
+that it only stores the least significant 32 bits of the uncompressed
+size. The size of any file larger than 4 GiB gets truncated.
+
+The lzip format provides a 64-bit field for the uncompressed size.
+Additionally, lzip produces multi-member output automatically when the
+size is too large for a single member, allowing an unlimited
+uncompressed size.
+
+@item Distributed index
+
+The lzip format provides a distributed index that, among other things,
+helps plzip to decompress several times faster than pigz and helps
+lziprecover do its job. The gzip format does not provide an index.
+
+A distributed index is safer and more scalable than a monolithic index.
+The monolithic index introduces a single point of failure in the
+compressed file and may limit the number of members or the total
+uncompressed size.
+
+@end table
+
+@section Quality of implementation
+
+Three related but independent compressor implementations, lzip, clzip
+and minilzip/lzlib, are developed concurrently. Every stable release of
+any of them is subjected to a hundred hours of intensive testing to
+verify that it produces identical output to the other two. This
+guarantees that all three implement the same algorithm, and makes it
+unlikely that any of them may contain serious undiscovered errors. In
+fact, no errors have been discovered in lzip since 2009.
+
+Just like the lzip format provides 4 factor protection against
+undetected data corruption, the development methodology described above
+provides 3 factor protection against undetected programming errors in
+lzip.
+
+Lzip automatically uses the smallest possible dictionary size for each
+file. In addition to reducing the amount of memory required for
+decompression, this feature also minimizes the probability of being
+affected by RAM errors during compression.
+
+Returning a warning status of 2 is a design flaw of compress that leaked
+into the design of gzip. Both bzip2 and lzip are free from this flaw.
+
+
 @node Examples
 @chapter A small tutorial with examples
 @cindex examples
@@ -835,6 +970,10 @@ find by running @w{@code{lzip --version}}.
 #include <cstring>
 #include <stdint.h>
 #include <unistd.h>
+#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
+#include <fcntl.h>
+#include <io.h>
+#endif
 
 
 class State
@@ -1220,6 +1359,11 @@ int main( const int argc, const char * const argv[] )
     return 0;
     }
 
+#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
+  setmode( fileno( stdin ), O_BINARY );
+  setmode( fileno( stdout ), O_BINARY );
+#endif
+
   for( bool first_member = true; ; first_member = false )
     {
     File_header header;
diff --git a/encoder.cc b/encoder.cc
index 4833dd4..3e707f3 100644
--- a/encoder.cc
+++ b/encoder.cc
@@ -441,7 +441,7 @@ int LZ_encoder::sequence_optimizer( const int reps[num_rep_distances],
         trials[++num_trials].price = infinite_price;
 
       int i = 0;
-      while( start_len > pairs[i].len ) ++i;
+      while( pairs[i].len < start_len ) ++i;
       int dis = pairs[i].dis;
       for( int len = start_len; ; ++len )
         {
diff --git a/main.cc b/main.cc
index 1fc42de..27cc156 100644
--- a/main.cc
+++ b/main.cc
@@ -227,7 +227,7 @@ unsigned long long getnum( const char * const ptr,
 int get_dict_size( const char * const arg )
   {
   char * tail;
-  int bits = std::strtol( arg, &tail, 0 );
+  const int bits = std::strtol( arg, &tail, 0 );
   if( bits >= min_dictionary_bits &&
       bits <= max_dictionary_bits && *tail == 0 )
     return ( 1 << bits );
@@ -566,8 +566,9 @@ int decompress( const int infd, const Pretty_print & pp, const bool testing )
                           header.version() ); }
         retval = 2; break;
         }
-      if( header.dictionary_size() < min_dictionary_size ||
-          header.dictionary_size() > max_dictionary_size )
+      const unsigned dictionary_size = header.dictionary_size();
+      if( dictionary_size < min_dictionary_size ||
+          dictionary_size > max_dictionary_size )
         { pp( "Invalid dictionary size in member header." ); retval = 2; break; }
 
       if( verbosity >= 2 || ( verbosity == 1 && first_member ) )
@@ -691,7 +692,7 @@ int main( const int argc, const char * const argv[] )
     { 3 << 23, 132 },		/* -8 */
     { 1 << 25, 273 } };		/* -9 */
   Lzma_options encoder_options = option_mapping[6];	// default = "-6"
-  const unsigned long long max_member_size = 0x0100000000000000ULL;
+  const unsigned long long max_member_size = 0x0008000000000000ULL;
   const unsigned long long max_volume_size = 0x4000000000000000ULL;
   unsigned long long member_size = max_member_size;
   unsigned long long volume_size = 0;
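
Note on the max_member_size change in the main.cc hunk above: the two
constants appear to correspond to the '64 PiB' and '2 PiB' limits quoted
in the README and manual hunks. A minimal sketch to check that
arithmetic, assuming 1 PiB = 2^50 bytes. The constant values are copied
from the hunk; the names old_max_member_size, new_max_member_size and
to_pib are illustrative only and do not appear in lzip.

```cpp
// Illustrative check only: the values are copied from the main.cc hunk,
// but the names below are hypothetical and not part of lzip.
constexpr unsigned long long old_max_member_size = 0x0100000000000000ULL; // removed line
constexpr unsigned long long new_max_member_size = 0x0008000000000000ULL; // added line

// Convert a byte count to whole pebibytes (assumes 1 PiB = 2^50 bytes).
constexpr unsigned long long to_pib( const unsigned long long bytes )
  { return bytes >> 50; }

// Both checks are evaluated at compile time (requires C++11 or later).
static_assert( to_pib( old_max_member_size ) == 64, "old limit is 64 PiB" );
static_assert( to_pib( new_max_member_size ) == 2, "new limit is 2 PiB" );
```

If the fragment compiles, both limits match the documented values:
0x0100000000000000 is 2^56 bytes and 0x0008000000000000 is 2^51 bytes.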