Diffstat (limited to 'doc/plzip.texi')
-rw-r--r-- doc/plzip.texi 72
1 files changed, 54 insertions, 18 deletions
diff --git a/doc/plzip.texi b/doc/plzip.texi
index 7678977..bc7af6d 100644
--- a/doc/plzip.texi
+++ b/doc/plzip.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header
-@set UPDATED 1 July 2014
-@set VERSION 1.2-rc2
+@set UPDATED 29 August 2014
+@set VERSION 1.2
@dircategory Data Compression
@direntry
@@ -44,8 +44,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
-Copyright @copyright{} 2009, 2010, 2011, 2012, 2013, 2014
-Antonio Diaz Diaz.
+Copyright @copyright{} 2009-2014 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission
to copy, distribute and modify it.
@@ -55,15 +54,15 @@ to copy, distribute and modify it.
@chapter Introduction
@cindex introduction
-Plzip is a massively parallel (multi-threaded), lossless data compressor
+Plzip is a massively parallel (multi-threaded) lossless data compressor
based on the lzlib compression library, with a user interface similar to
the one of lzip, bzip2 or gzip.
Plzip can compress/decompress large files on multiprocessor machines
much faster than lzip, at the cost of a slightly reduced compression
-ratio. Note that the number of usable threads is limited by file size,
-so on files larger than a few GB plzip can use hundreds of processors,
-but on files of only a few MB plzip is no faster than lzip.
+ratio. Note that the number of usable threads is limited by file size;
+on files larger than a few GB plzip can use hundreds of processors, but
+on files of only a few MB plzip is no faster than lzip.
Plzip uses the lzip file format; the files produced by plzip are fully
compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
@@ -92,6 +91,11 @@ Additionally lzip is copylefted, which guarantees that it will remain
free forever.
@end itemize
+A nice feature of the lzip format is that a corrupt byte is easier to
+repair the nearer it is from the beginning of the file. Therefore, with
+the help of lziprecover, losing an entire archive just because of a
+corrupt byte near the beginning is a thing of the past.
+
The member trailer stores the 32-bit CRC of the original data, the size
of the original data and the size of the member. These values, together
with the value remaining in the range decoder and the end-of-stream
@@ -105,7 +109,29 @@ wrong. It can't help you recover the original uncompressed data.
Plzip uses the same well-defined exit status values used by lzip and
bzip2, which makes it safer than compressors returning ambiguous warning
-values (like gzip) when it is used as a back end for tar or zutils.
+values (like gzip) when it is used as a back end for other programs like
+tar or zutils.
+
+The amount of memory required @strong{per thread} is approximately the
+following:
+
+@itemize @bullet
+@item
+For compression; 3 times the data size (@pxref{--data-size}) plus 11
+times the dictionary size.
+
+@item
+For decompression or testing of a non-seekable file or of standard
+input; 2 times the dictionary size plus up to 32 MiB.
+
+@item
+For decompression of a regular file to a non-seekable file or to
+standard output; the dictionary size plus up to 32 MiB.
+
+@item
+For decompression of a regular file to another regular file, or for
+testing of a regular file; the dictionary size.
+@end itemize
Plzip will automatically use the smallest possible dictionary size for
each file without exceeding the given limit. Keep in mind that the
@@ -154,6 +180,16 @@ you verify the compressed file with a command like
@chapter Program design
@cindex program design
+When compressing, plzip divides the input file into chunks and
+compresses as many chunks simultaneously as worker threads are chosen,
+creating a multi-member compressed file.
+
+When decompressing, plzip decompresses as many members simultaneously as
+worker threads are chosen. Files that were compressed with lzip will not
+be decompressed faster than using lzip (unless the @samp{-b} option was
+used) because lzip usually produces single-member files, which can't be
+decompressed in parallel.
+
For each input file, a splitter thread and several worker threads are
created, acting the main thread as muxer (multiplexer) thread. A "packet
courier" takes care of data transfers among threads and limits the
@@ -166,10 +202,10 @@ writes them to the output file.
When decompressing from a regular file, the splitter is removed and the
workers read directly from the input file. If the output file is also a
-regular file, the muxer is also removed, and the workers write directly
-to the output file. With these optimizations, decompression speed of
-large files with many members is only limited by the number of
-processors available and by I/O speed.
+regular file, the muxer is also removed and the workers write directly
+to the output file. With these optimizations, the use of RAM is greatly
+reduced and the decompression speed of large files with many members is
+only limited by the number of processors available and by I/O speed.
@node Invoking plzip
@@ -199,11 +235,11 @@ Print the version number of plzip on the standard output and exit.
@item -B @var{bytes}
@itemx --data-size=@var{bytes}
@anchor{--data-size}
-Set the input data block size in bytes. The input file will be divided
-in chunks of this size before compression is performed. Valid values
-range from 8 KiB to 1 GiB. Default value is two times the dictionary
-size. Plzip will reduce the dictionary size if it is larger than the
-chosen data size.
+Set the size of the input data blocks, in bytes. The input file will be
+divided in chunks of this size before compression is performed. Valid
+values range from 8 KiB to 1 GiB. Default value is two times the
+dictionary size. Plzip will reduce the dictionary size if it is larger
+than the chosen data size.
@item -c
@itemx --stdout