Diffstat (limited to 'doc/plzip.info')
-rw-r--r-- doc/plzip.info 283
1 files changed, 147 insertions, 136 deletions
diff --git a/doc/plzip.info b/doc/plzip.info
index d70163e..c38ea5c 100644
--- a/doc/plzip.info
+++ b/doc/plzip.info
@@ -1,6 +1,6 @@
This is plzip.info, produced by makeinfo version 4.13+ from plzip.texi.
-INFO-DIR-SECTION Data Compression
+INFO-DIR-SECTION Compression
START-INFO-DIR-ENTRY
* Plzip: (plzip). Massively parallel implementation of lzip
END-INFO-DIR-ENTRY
@@ -11,7 +11,7 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
Plzip Manual
************
-This manual is for Plzip (version 1.9, 3 January 2021).
+This manual is for Plzip (version 1.10, 24 January 2022).
* Menu:
@@ -19,16 +19,16 @@ This manual is for Plzip (version 1.9, 3 January 2021).
* Output:: Meaning of plzip's output
* Invoking plzip:: Command line interface
* Program design:: Internal structure of plzip
-* File format:: Detailed format of the compressed file
* Memory requirements:: Memory required to compress and decompress
* Minimum file sizes:: Minimum file sizes required for full speed
+* File format:: Detailed format of the compressed file
* Trailing data:: Extra data appended to the file
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
- Copyright (C) 2009-2021 Antonio Diaz Diaz.
+ Copyright (C) 2009-2022 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@@ -44,13 +44,14 @@ compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
-chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
-interoperability. Lzip can compress about as fast as gzip (lzip -0) or
-compress most files more than bzip2 (lzip -9). Decompression speed is
-intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
-a data recovery perspective. Lzip has been designed, written, and tested
-with great care to replace gzip and bzip2 as the standard general-purpose
-compressed format for unix-like systems.
+chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity
+checking to maximize interoperability and optimize safety. Lzip can compress
+about as fast as gzip (lzip -0) or compress most files more than bzip2
+(lzip -9). Decompression speed is intermediate between gzip and bzip2. Lzip
+is better than gzip and bzip2 from a data recovery perspective. Lzip has
+been designed, written, and tested with great care to replace gzip and
+bzip2 as the standard general-purpose compressed format for unix-like
+systems.
Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
@@ -107,7 +108,7 @@ filename.lz becomes filename
filename.tlz becomes filename.tar
anyothername becomes anyothername.out
- (De)compressing a file is much like copying or moving it; therefore plzip
+ (De)compressing a file is much like copying or moving it. Therefore plzip
preserves the access and modification dates, permissions, and, when
possible, ownership of the file just as 'cp -p' does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
@@ -206,7 +207,7 @@ once, the first time it appears in the command line.
'-B BYTES'
'--data-size=BYTES'
- When compressing, set the size of the input data blocks in bytes. The
+ When compressing, set the size in bytes of the input data blocks. The
input file will be divided in chunks of this size before compression is
performed. Valid values range from 8 KiB to 1 GiB. Default value is
two times the dictionary size, except for option '-0' where it
@@ -224,10 +225,13 @@ once, the first time it appears in the command line.
'-d'
'--decompress'
- Decompress the files specified. If a file does not exist or can't be
- opened, plzip continues decompressing the rest of the files. If a file
- fails to decompress, or is a terminal, plzip exits immediately without
- decompressing the rest of the files.
+ Decompress the files specified. If a file does not exist, can't be
+ opened, or the destination file already exists and '--force' has not
+ been specified, plzip continues decompressing the rest of the files
+ and exits with error status 1. If a file fails to decompress, or is a
+ terminal, plzip exits immediately with error status 2 without
+ decompressing the rest of the files. A terminal is considered an
+ uncompressed file, and therefore invalid.
'-f'
'--force'
@@ -253,10 +257,12 @@ once, the first time it appears in the command line.
positions and sizes of each member in multimember files are also
printed.
- '-lq' can be used to verify quickly (without decompressing) the
- structural integrity of the files specified. (Use '--test' to verify
- the data integrity). '-alq' additionally verifies that none of the
- files specified contain trailing data.
+ If any file is damaged, does not exist, can't be opened, or is not
+ regular, the final exit status will be > 0. '-lq' can be used to verify
+ quickly (without decompressing) the structural integrity of the files
+ specified. (Use '--test' to verify the data integrity). '-alq'
+ additionally verifies that none of the files specified contain
+ trailing data.
'-m BYTES'
'--match-length=BYTES'
@@ -395,9 +401,10 @@ once, the first time it appears in the command line.
actually being used at run time and exit. Report any differences
found. Exit with error status 1 if differences are found. A mismatch
may indicate that lzlib is not correctly installed or that a different
- version of lzlib has been installed after compiling plzip.
+ version of lzlib has been installed after compiling plzip. Exit with
+ error status 2 if LZ_API_VERSION and LZ_version_string don't match.
'plzip -v --check-lib' shows the version of lzlib being used and the
- value of 'LZ_API_VERSION' (if defined). *Note Library version:
+ value of LZ_API_VERSION (if defined). *Note Library version:
(lzlib)Library version.
@@ -419,18 +426,19 @@ Y yottabyte (10^24) | Yi yobibyte (2^80)
Exit status: 0 for a normal exit, 1 for environmental problems (file not
found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid
-input file, 3 for an internal consistency error (eg, bug) which caused
+input file, 3 for an internal consistency error (e.g., bug) which caused
plzip to panic.

-File: plzip.info, Node: Program design, Next: File format, Prev: Invoking plzip, Up: Top
+File: plzip.info, Node: Program design, Next: Memory requirements, Prev: Invoking plzip, Up: Top
4 Internal structure of plzip
*****************************
When compressing, plzip divides the input file into chunks and compresses as
many chunks simultaneously as worker threads are chosen, creating a
-multimember compressed file.
+multimember compressed file. Each chunk is compressed in-place (using the
+same buffer for input and output), reducing the amount of RAM required.
When decompressing, plzip decompresses as many members simultaneously as
worker threads are chosen. Files that were compressed with lzip will not be
@@ -448,14 +456,14 @@ to the workers. The workers (de)compress the blocks received from the
splitter. The muxer collects processed packets from the workers, and writes
them to the output file.
- ,------------,
+ .------------.
,-->| worker 0 |--,
| `------------' |
-,-------, ,----------, | ,------------, | ,-------, ,--------,
+.-------. .----------. | .------------. | .-------. .--------.
| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output |
| file | `----------' | `------------' | `-------' | file |
`-------' | ... | `--------'
- | ,------------, |
+ | .------------. |
`-->| worker N-1 |--'
`------------'
@@ -467,82 +475,9 @@ reduced and the decompression speed of large files with many members is
only limited by the number of processors available and by I/O speed.

-File: plzip.info, Node: File format, Next: Memory requirements, Prev: Program design, Up: Top
-
-5 File format
-*************
-
-Perfection is reached, not when there is no longer anything to add, but
-when there is no longer anything to take away.
--- Antoine de Saint-Exupery
-
-
- In the diagram below, a box like this:
-
-+---+
-| | <-- the vertical bars might be missing
-+---+
-
- represents one byte; a box like this:
-
-+==============+
-| |
-+==============+
-
- represents a variable number of bytes.
-
-
- A lzip file consists of a series of "members" (compressed data sets).
-The members simply appear one after another in the file, with no additional
-information before, between, or after them.
-
- Each member has the following structure:
-
-+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
-+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
- All multibyte values are stored in little endian order.
-
-'ID string (the "magic" bytes)'
- A four byte string, identifying the lzip format, with the value "LZIP"
- (0x4C, 0x5A, 0x49, 0x50).
-
-'VN (version number, 1 byte)'
- Just in case something needs to be modified in the future. 1 for now.
-
-'DS (coded dictionary size, 1 byte)'
- The dictionary size is calculated by taking a power of 2 (the base
- size) and subtracting from it a fraction between 0/16 and 7/16 of the
- base size.
- Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
- Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
- from the base size to obtain the dictionary size.
- Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
- Valid values for dictionary size range from 4 KiB to 512 MiB.
-
-'LZMA stream'
- The LZMA stream, finished by an end of stream marker. Uses default
- values for encoder properties. *Note Stream format: (lzip)Stream
- format, for a complete description.
-
-'CRC32 (4 bytes)'
- Cyclic Redundancy Check (CRC) of the uncompressed original data.
-
-'Data size (8 bytes)'
- Size of the uncompressed original data.
-
-'Member size (8 bytes)'
- Total size of the member, including header and trailer. This field acts
- as a distributed index, allows the verification of stream integrity,
- and facilitates safe recovery of undamaged members from multimember
- files.
-
-
-
-File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: File format, Up: Top
+File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: Program design, Up: Top
-6 Memory required to compress and decompress
+5 Memory required to compress and decompress
********************************************
The amount of memory required *per worker thread* for decompression or
@@ -588,9 +523,9 @@ Level Memory required
-9 568 MiB

-File: plzip.info, Node: Minimum file sizes, Next: Trailing data, Prev: Memory requirements, Up: Top
+File: plzip.info, Node: Minimum file sizes, Next: File format, Prev: Memory requirements, Up: Top
-7 Minimum file sizes required for full compression speed
+6 Minimum file sizes required for full compression speed
********************************************************
When compressing, plzip divides the input file into chunks and compresses
@@ -625,7 +560,83 @@ Level
-9 128 MiB 256 MiB 512 MiB 1 GiB 4 GiB 16 GiB

-File: plzip.info, Node: Trailing data, Next: Examples, Prev: Minimum file sizes, Up: Top
+File: plzip.info, Node: File format, Next: Trailing data, Prev: Minimum file sizes, Up: Top
+
+7 File format
+*************
+
+Perfection is reached, not when there is no longer anything to add, but
+when there is no longer anything to take away.
+-- Antoine de Saint-Exupery
+
+
+ In the diagram below, a box like this:
+
++---+
+| | <-- the vertical bars might be missing
++---+
+
+ represents one byte; a box like this:
+
++==============+
+| |
++==============+
+
+ represents a variable number of bytes.
+
+
+ A lzip file consists of a series of independent "members" (compressed
+data sets). The members simply appear one after another in the file, with no
+additional information before, between, or after them. Each member can
+encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The
+size of a multimember file is unlimited.
+
+ Each member has the following structure:
+
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ All multibyte values are stored in little endian order.
+
+'ID string (the "magic" bytes)'
+ A four byte string, identifying the lzip format, with the value "LZIP"
+ (0x4C, 0x5A, 0x49, 0x50).
+
+'VN (version number, 1 byte)'
+ Just in case something needs to be modified in the future. 1 for now.
+
+'DS (coded dictionary size, 1 byte)'
+ The dictionary size is calculated by taking a power of 2 (the base
+ size) and subtracting from it a fraction between 0/16 and 7/16 of the
+ base size.
+ Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
+ Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
+ from the base size to obtain the dictionary size.
+ Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
+ Valid values for dictionary size range from 4 KiB to 512 MiB.
+
+'LZMA stream'
+ The LZMA stream, finished by an "End Of Stream" marker. Uses default
+ values for encoder properties. *Note Stream format: (lzip)Stream
+ format, for a complete description.
+
+'CRC32 (4 bytes)'
+ Cyclic Redundancy Check (CRC) of the original uncompressed data.
+
+'Data size (8 bytes)'
+ Size of the original uncompressed data.
+
+'Member size (8 bytes)'
+ Total size of the member, including header and trailer. This field acts
+ as a distributed index, allows the verification of stream integrity,
+ and facilitates the safe recovery of undamaged members from
+ multimember files. Member size should be limited to 2 PiB to prevent
+ the data size field from overflowing.
+
+
+
+File: plzip.info, Node: Trailing data, Next: Examples, Prev: File format, Up: Top
8 Extra data appended to the file
*********************************
@@ -699,7 +710,7 @@ show the compression ratio.
plzip -v file
-Example 3: Like example 1 but the created 'file.lz' has a block size of
+Example 3: Like example 2 but the created 'file.lz' has a block size of
1 MiB. The compression ratio is not shown.
plzip -B 1MiB file
@@ -717,15 +728,7 @@ status.
plzip -tv file.lz
-Example 6: Compress a whole device in /dev/sdc and send the output to
-'file.lz'.
-
- plzip -c /dev/sdc > file.lz
- or
- plzip /dev/sdc -o file.lz
-
-
-Example 7: The right way of concatenating the decompressed output of two or
+Example 6: The right way of concatenating the decompressed output of two or
more compressed files. *Note Trailing data::.
Don't do this
@@ -734,17 +737,25 @@ more compressed files. *Note Trailing data::.
plzip -cd file1.lz file2.lz file3.lz
-Example 8: Decompress 'file.lz' partially until 10 KiB of decompressed data
+Example 7: Decompress 'file.lz' partially until 10 KiB of decompressed data
are produced.
plzip -cd file.lz | dd bs=1024 count=10
-Example 9: Decompress 'file.lz' partially from decompressed byte at offset
+Example 8: Decompress 'file.lz' partially from decompressed byte at offset
10000 to decompressed byte at offset 14999 (5000 bytes are produced).
plzip -cd file.lz | dd bs=1000 skip=10 count=5
+
+Example 9: Compress a whole device in /dev/sdc and send the output to
+'file.lz'.
+
+ plzip -c /dev/sdc > file.lz
+ or
+ plzip /dev/sdc -o file.lz
+

File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
@@ -758,7 +769,7 @@ eternity, if not longer.
If you find a bug in plzip, please send electronic mail to
<lzip-bug@nongnu.org>. Include the version number, which you can find by
-running 'plzip --version'.
+running 'plzip --version' and 'plzip -v --check-lib'.

File: plzip.info, Node: Concept index, Prev: Problems, Up: Top
@@ -787,22 +798,22 @@ Concept index

Tag Table:
-Node: Top222
-Node: Introduction1159
-Node: Output5788
-Node: Invoking plzip7351
-Ref: --trailing-error8146
-Ref: --data-size8384
-Node: Program design18364
-Node: File format20542
-Ref: coded-dict-size21840
-Node: Memory requirements22995
-Node: Minimum file sizes24677
-Node: Trailing data26693
-Node: Examples28961
-Ref: concat-example30556
-Node: Problems31153
-Node: Concept index31681
+Node: Top217
+Node: Introduction1156
+Node: Output5829
+Node: Invoking plzip7392
+Ref: --trailing-error8187
+Ref: --data-size8425
+Node: Program design18819
+Node: Memory requirements21122
+Node: Minimum file sizes22807
+Node: File format24821
+Ref: coded-dict-size26260
+Node: Trailing data27514
+Node: Examples29775
+Ref: concat-example31210
+Node: Problems31967
+Node: Concept index32522

End Tag Table
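
The member layout described in the 'File format' node of this patch (a 6-byte
header made of ID string, VN, and DS, followed by the LZMA stream and a
20-byte trailer made of CRC32, Data size, and Member size, all little
endian) can be illustrated with a minimal Python sketch. It is not code from
plzip or lzlib. The function names are hypothetical, and rejecting a decoded
dictionary size outside the 4 KiB to 512 MiB range is an assumption of this
sketch, not documented behavior of the real tools.

```python
import struct

MIN_DICT_SIZE = 1 << 12   # 4 KiB, lower bound stated in the manual
MAX_DICT_SIZE = 1 << 29   # 512 MiB, upper bound stated in the manual


def decode_dictionary_size(ds):
    """Decode the DS byte: bits 4-0 hold log2 of the base size,
    bits 7-5 hold the numerator of the sixteenths to subtract."""
    base_log2 = ds & 0x1F
    numerator = (ds >> 5) & 0x07
    if not 12 <= base_log2 <= 29:
        raise ValueError("base size exponent out of range: %d" % base_log2)
    base = 1 << base_log2
    size = base - numerator * (base // 16)
    # Assumption of this sketch: treat out-of-range results as invalid.
    if not MIN_DICT_SIZE <= size <= MAX_DICT_SIZE:
        raise ValueError("dictionary size out of range: %d" % size)
    return size


def parse_member_ends(member):
    """Return the header and trailer fields of one complete member.
    The LZMA stream between them is not decoded here."""
    if len(member) < 6 + 20:
        raise ValueError("member too short")
    magic, version, ds = struct.unpack("<4sBB", member[:6])
    if magic != b"LZIP":
        raise ValueError("bad ID string")
    crc32, data_size, member_size = struct.unpack("<IQQ", member[-20:])
    return {
        "version": version,
        "dictionary_size": decode_dictionary_size(ds),
        "crc32": crc32,
        "data_size": data_size,
        "member_size": member_size,
    }


# The manual's own example: 0xD3 = 2^19 - 6 * 2^15 = 320 KiB
print(decode_dictionary_size(0xD3) // 1024, "KiB")
```

A caller could compare the returned 'member_size' with the number of bytes
actually read to check the structural integrity of a member, which is the
role the manual assigns to that field.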