Diffstat (limited to 'doc/plzip.texi')
-rw-r--r-- doc/plzip.texi 287
1 files changed, 149 insertions, 138 deletions
diff --git a/doc/plzip.texi b/doc/plzip.texi
index 26c0820..818ecf5 100644
--- a/doc/plzip.texi
+++ b/doc/plzip.texi
@@ -6,10 +6,10 @@
@finalout
@c %**end of header
-@set UPDATED 3 January 2021
-@set VERSION 1.9
+@set UPDATED 24 January 2022
+@set VERSION 1.10
-@dircategory Data Compression
+@dircategory Compression
@direntry
* Plzip: (plzip). Massively parallel implementation of lzip
@end direntry
@@ -40,9 +40,9 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
* Output:: Meaning of plzip's output
* Invoking plzip:: Command line interface
* Program design:: Internal structure of plzip
-* File format:: Detailed format of the compressed file
* Memory requirements:: Memory required to compress and decompress
* Minimum file sizes:: Minimum file sizes required for full speed
+* File format:: Detailed format of the compressed file
* Trailing data:: Extra data appended to the file
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
@@ -50,7 +50,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
-Copyright @copyright{} 2009-2021 Antonio Diaz Diaz.
+Copyright @copyright{} 2009-2022 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@@ -69,13 +69,14 @@ compatible with lzip 1.4 or newer. Plzip uses the compression library
@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip}
is a lossless data compressor with a user interface similar to the one
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
-chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
-interoperability. Lzip can compress about as fast as gzip @w{(lzip -0)} or
-compress most files more than bzip2 @w{(lzip -9)}. Decompression speed is
-intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
-a data recovery perspective. Lzip has been designed, written, and tested
-with great care to replace gzip and bzip2 as the standard general-purpose
-compressed format for unix-like systems.
+chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity
+checking to maximize interoperability and optimize safety. Lzip can compress
+about as fast as gzip @w{(lzip -0)} or compress most files more than bzip2
+@w{(lzip -9)}. Decompression speed is intermediate between gzip and bzip2.
+Lzip is better than gzip and bzip2 from a data recovery perspective. Lzip
+has been designed, written, and tested with great care to replace gzip and
+bzip2 as the standard general-purpose compressed format for unix-like
+systems.
Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
@@ -85,8 +86,8 @@ hundreds of processors, but on files of only a few MB plzip is no faster
than lzip. @xref{Minimum file sizes}.
For creation and manipulation of compressed tar archives
-@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be
-more efficient than using tar and plzip because tarlz is able to keep the
+@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be more
+efficient than using tar and plzip because tarlz is able to keep the
alignment between tar members and lzip members.
@ifnothtml
@xref{Top,tarlz manual,,tarlz}.
@@ -112,8 +113,8 @@ The lzip format is as simple as possible (but not simpler). The lzip
manual provides the source code of a simple decompressor along with a
detailed explanation of how it works, so that with the only help of the
lzip manual it would be possible for a digital archaeologist to extract
-the data from a lzip file long after quantum computers eventually render
-LZMA obsolete.
+the data from a lzip file long after quantum computers eventually
+render LZMA obsolete.
@item
Additionally the lzip reference implementation is copylefted, which
@@ -145,9 +146,9 @@ file from that of the compressed file as follows:
@item anyothername @tab becomes @tab anyothername.out
@end multitable
-(De)compressing a file is much like copying or moving it; therefore plzip
+(De)compressing a file is much like copying or moving it. Therefore plzip
preserves the access and modification dates, permissions, and, when
-possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
+possible, ownership of the file just as @w{@samp{cp -p}} does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).
@@ -258,7 +259,7 @@ garbage that can be safely ignored. @xref{concat-example}.
@anchor{--data-size}
@item -B @var{bytes}
@itemx --data-size=@var{bytes}
-When compressing, set the size of the input data blocks in bytes. The
+When compressing, set the size in bytes of the input data blocks. The
input file will be divided in chunks of this size before compression is
performed. Valid values range from @w{8 KiB} to @w{1 GiB}. Default value
is two times the dictionary size, except for option @samp{-0} where it
@@ -276,10 +277,12 @@ overrides @samp{-o}. @samp{-c} has no effect when testing or listing.
@item -d
@itemx --decompress
-Decompress the files specified. If a file does not exist or can't be
-opened, plzip continues decompressing the rest of the files. If a file
-fails to decompress, or is a terminal, plzip exits immediately without
-decompressing the rest of the files.
+Decompress the files specified. If a file does not exist, can't be opened,
+or the destination file already exists and @samp{--force} has not been
+specified, plzip continues decompressing the rest of the files and exits with
+error status 1. If a file fails to decompress, or is a terminal, plzip exits
+immediately with error status 2 without decompressing the rest of the files.
+A terminal is considered an uncompressed file, and therefore invalid.
@item -f
@itemx --force
@@ -304,10 +307,11 @@ size, the number of members in the file, and the amount of trailing data (if
any) are also printed. With @samp{-vv}, the positions and sizes of each
member in multimember files are also printed.
-@samp{-lq} can be used to verify quickly (without decompressing) the
-structural integrity of the files specified. (Use @samp{--test} to verify
-the data integrity). @samp{-alq} additionally verifies that none of the
-files specified contain trailing data.
+If any file is damaged, does not exist, can't be opened, or is not regular,
+the final exit status will be @w{> 0}. @samp{-lq} can be used to verify
+quickly (without decompressing) the structural integrity of the files
+specified. (Use @samp{--test} to verify the data integrity). @samp{-alq}
+additionally verifies that none of the files specified contain trailing data.
@item -m @var{bytes}
@itemx --match-length=@var{bytes}
@@ -448,8 +452,9 @@ used to compile plzip with the version actually being used at run time and
exit. Report any differences found. Exit with error status 1 if differences
are found. A mismatch may indicate that lzlib is not correctly installed or
that a different version of lzlib has been installed after compiling plzip.
-@w{@samp{plzip -v --check-lib}} shows the version of lzlib being used and
-the value of @samp{LZ_API_VERSION} (if defined).
+Exit with error status 2 if LZ_API_VERSION and LZ_version_string don't
+match. @w{@samp{plzip -v --check-lib}} shows the version of lzlib being used
+and the value of LZ_API_VERSION (if defined).
@ifnothtml
@xref{Library version,,,lzlib}.
@end ifnothtml
@@ -475,9 +480,9 @@ Table of SI and binary prefixes (unit multipliers):
@sp 1
Exit status: 0 for a normal exit, 1 for environmental problems (file not
-found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
-invalid input file, 3 for an internal consistency error (eg, bug) which
-caused plzip to panic.
+found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid
+input file, 3 for an internal consistency error (e.g., bug) which caused
+plzip to panic.
@node Program design
@@ -486,7 +491,8 @@ caused plzip to panic.
When compressing, plzip divides the input file into chunks and compresses as
many chunks simultaneously as worker threads are chosen, creating a
-multimember compressed file.
+multimember compressed file. Each chunk is compressed in-place (using the
+same buffer for input and output), reducing the amount of RAM required.
When decompressing, plzip decompresses as many members simultaneously as
worker threads are chosen. Files that were compressed with lzip will not
@@ -505,14 +511,14 @@ splitter. The muxer collects processed packets from the workers, and
writes them to the output file.
@verbatim
-                             ,------------,
+                             .------------.
                          ,-->| worker 0   |--,
                          |   `------------'  |
-,-------,   ,----------, |   ,------------,  |   ,-------,   ,--------,
+.-------.   .----------. |   .------------.  |   .-------.   .--------.
 | input |-->| splitter |-+-->| worker 1   |--+-->| muxer |-->| output |
 | file  |   `----------' |   `------------'  |   `-------'   |  file  |
 `-------'                |        ...        |               `--------'
-                         |   ,------------,  |
+                         |   .------------.  |
                          `-->| worker N-1 |--'
                              `------------'
@end verbatim
@@ -525,92 +531,6 @@ reduced and the decompression speed of large files with many members is
only limited by the number of processors available and by I/O speed.
-@node File format
-@chapter File format
-@cindex file format
-
-Perfection is reached, not when there is no longer anything to add, but
-when there is no longer anything to take away.@*
---- Antoine de Saint-Exupery
-
-@sp 1
-In the diagram below, a box like this:
-
-@verbatim
-+---+
-|   | <-- the vertical bars might be missing
-+---+
-@end verbatim
-
-represents one byte; a box like this:
-
-@verbatim
-+==============+
-|              |
-+==============+
-@end verbatim
-
-represents a variable number of bytes.
-
-@sp 1
-A lzip file consists of a series of "members" (compressed data sets).
-The members simply appear one after another in the file, with no
-additional information before, between, or after them.
-
-Each member has the following structure:
-
-@verbatim
-+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
-+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-@end verbatim
-
-All multibyte values are stored in little endian order.
-
-@table @samp
-@item ID string (the "magic" bytes)
-A four byte string, identifying the lzip format, with the value "LZIP"
-(0x4C, 0x5A, 0x49, 0x50).
-
-@item VN (version number, 1 byte)
-Just in case something needs to be modified in the future. 1 for now.
-
-@anchor{coded-dict-size}
-@item DS (coded dictionary size, 1 byte)
-The dictionary size is calculated by taking a power of 2 (the base size)
-and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
-Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
-Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
-from the base size to obtain the dictionary size.@*
-Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
-Valid values for dictionary size range from 4 KiB to 512 MiB.
-
-@item LZMA stream
-The LZMA stream, finished by an end of stream marker. Uses default values
-for encoder properties.
-@ifnothtml
-@xref{Stream format,,,lzip},
-@end ifnothtml
-@ifhtml
-See
-@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format}
-@end ifhtml
-for a complete description.
-
-@item CRC32 (4 bytes)
-Cyclic Redundancy Check (CRC) of the uncompressed original data.
-
-@item Data size (8 bytes)
-Size of the uncompressed original data.
-
-@item Member size (8 bytes)
-Total size of the member, including header and trailer. This field acts
-as a distributed index, allows the verification of stream integrity, and
-facilitates safe recovery of undamaged members from multimember files.
-
-@end table
-
-
@node Memory requirements
@chapter Memory required to compress and decompress
@cindex memory requirements
@@ -709,6 +629,96 @@ data size for each level:
@end multitable
+@node File format
+@chapter File format
+@cindex file format
+
+Perfection is reached, not when there is no longer anything to add, but
+when there is no longer anything to take away.@*
+--- Antoine de Saint-Exupery
+
+@sp 1
+In the diagram below, a box like this:
+
+@verbatim
++---+
+|   | <-- the vertical bars might be missing
++---+
+@end verbatim
+
+represents one byte; a box like this:
+
+@verbatim
++==============+
+|              |
++==============+
+@end verbatim
+
+represents a variable number of bytes.
+
+@sp 1
+A lzip file consists of a series of independent "members" (compressed data
+sets). The members simply appear one after another in the file, with no
+additional information before, between, or after them. Each member can
+encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data.
+The size of a multimember file is unlimited.
+
+Each member has the following structure:
+
+@verbatim
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+@end verbatim
+
+All multibyte values are stored in little endian order.
+
+@table @samp
+@item ID string (the "magic" bytes)
+A four byte string, identifying the lzip format, with the value "LZIP"
+(0x4C, 0x5A, 0x49, 0x50).
+
+@item VN (version number, 1 byte)
+Just in case something needs to be modified in the future. 1 for now.
+
+@anchor{coded-dict-size}
+@item DS (coded dictionary size, 1 byte)
+The dictionary size is calculated by taking a power of 2 (the base size)
+and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
+Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
+Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
+from the base size to obtain the dictionary size.@*
+Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
+Valid values for dictionary size range from 4 KiB to 512 MiB.
+
+@item LZMA stream
+The LZMA stream, finished by an "End Of Stream" marker. Uses default values
+for encoder properties.
+@ifnothtml
+@xref{Stream format,,,lzip},
+@end ifnothtml
+@ifhtml
+See
+@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format}
+@end ifhtml
+for a complete description.
+
+@item CRC32 (4 bytes)
+Cyclic Redundancy Check (CRC) of the original uncompressed data.
+
+@item Data size (8 bytes)
+Size of the original uncompressed data.
+
+@item Member size (8 bytes)
+Total size of the member, including header and trailer. This field acts
+as a distributed index, allows the verification of stream integrity, and
+facilitates the safe recovery of undamaged members from multimember files.
+Member size should be limited to @w{2 PiB} to prevent the data size field
+from overflowing.
+
+@end table
+
+
@node Trailing data
@chapter Extra data appended to the file
@cindex trailing data
@@ -795,7 +805,7 @@ plzip -v file
@sp 1
@noindent
-Example 3: Like example 1 but the created @samp{file.lz} has a block size of
+Example 3: Like example 2 but the created @samp{file.lz} has a block size of
@w{1 MiB}. The compression ratio is not shown.
@example
@@ -821,20 +831,9 @@ plzip -tv file.lz
@end example
@sp 1
-@noindent
-Example 6: Compress a whole device in /dev/sdc and send the output to
-@samp{file.lz}.
-
-@example
- plzip -c /dev/sdc > file.lz
-or
- plzip /dev/sdc -o file.lz
-@end example
-
-@sp 1
@anchor{concat-example}
@noindent
-Example 7: The right way of concatenating the decompressed output of two or
+Example 6: The right way of concatenating the decompressed output of two or
more compressed files. @xref{Trailing data}.
@example
@@ -846,7 +845,7 @@ Do this instead
@sp 1
@noindent
-Example 8: Decompress @samp{file.lz} partially until @w{10 KiB} of
+Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of
decompressed data are produced.
@example
@@ -855,13 +854,24 @@ plzip -cd file.lz | dd bs=1024 count=10
@sp 1
@noindent
-Example 9: Decompress @samp{file.lz} partially from decompressed byte at
+Example 8: Decompress @samp{file.lz} partially from decompressed byte at
offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced).
@example
plzip -cd file.lz | dd bs=1000 skip=10 count=5
@end example
+@sp 1
+@noindent
+Example 9: Compress a whole device in /dev/sdc and send the output to
+@samp{file.lz}.
+
+@example
+ plzip -c /dev/sdc > file.lz
+or
+ plzip /dev/sdc -o file.lz
+@end example
+
@node Problems
@chapter Reporting bugs
@@ -875,7 +885,8 @@ for all eternity, if not longer.
If you find a bug in plzip, please send electronic mail to
@email{lzip-bug@@nongnu.org}. Include the version number, which you can
-find by running @w{@samp{plzip --version}}.
+find by running @w{@samp{plzip --version}} and
+@w{@samp{plzip -v --check-lib}}.
@node Concept index
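
The "File format" chapter relocated by this diff specifies a 6-byte member header (ID string, VN, DS) and a 20-byte trailer (CRC32, data size, member size), all little endian, along with the rule for decoding the DS byte. The following is a minimal Python sketch of those rules as stated in the manual text, not code taken from plzip or lzlib. The function and constant names (`decode_dict_size`, `parse_header`, `parse_trailer`, `HEADER_SIZE`, `TRAILER_SIZE`) are illustrative assumptions, and the sketch does not decode the LZMA stream or verify the CRC.

```python
import struct

# Sizes derived from the member diagram in the manual:
# header = 4 (ID string) + 1 (VN) + 1 (DS); trailer = 4 (CRC32) + 8 + 8.
HEADER_SIZE = 6
TRAILER_SIZE = 20
MIN_DICT_SIZE = 1 << 12  # 4 KiB
MAX_DICT_SIZE = 1 << 29  # 512 MiB


def decode_dict_size(ds):
    """Decode the coded dictionary size byte (DS) into a size in bytes."""
    base_log2 = ds & 0x1F         # bits 4-0: log2 of the base size
    numerator = (ds >> 5) & 0x07  # bits 7-5: sixteenths to subtract
    if not 12 <= base_log2 <= 29:
        raise ValueError("base size exponent out of range: %d" % base_log2)
    base_size = 1 << base_log2
    size = base_size - numerator * (base_size // 16)
    if not MIN_DICT_SIZE <= size <= MAX_DICT_SIZE:
        raise ValueError("dictionary size out of range: %d" % size)
    return size


def parse_header(header):
    """Return (version, dictionary_size) from a 6-byte member header."""
    if len(header) != HEADER_SIZE:
        raise ValueError("header must be %d bytes" % HEADER_SIZE)
    magic, version, ds = struct.unpack("<4sBB", header)
    if magic != b"LZIP":
        raise ValueError("bad ID string")
    if version != 1:
        raise ValueError("unsupported version number: %d" % version)
    return version, decode_dict_size(ds)


def parse_trailer(trailer):
    """Return (crc32, data_size, member_size) from a 20-byte trailer."""
    if len(trailer) != TRAILER_SIZE:
        raise ValueError("trailer must be %d bytes" % TRAILER_SIZE)
    return struct.unpack("<IQQ", trailer)


if __name__ == "__main__":
    # The manual's own example: 0xD3 = 2^19 - 6 * 2^15 = 320 KiB.
    print(decode_dict_size(0xD3))  # 327680
```

Because the member size is stored in the trailer, a reader could in principle walk a multimember file backwards from its end, one trailer at a time, which is the "distributed index" role the manual attributes to that field.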