author: Daniel Baumann <daniel.baumann@progress-linux.org> 2018-02-13 07:06:11 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2018-02-13 07:06:25 +0000
commit: aeab49c877a93051065ab41625d9b13a6017c1d6
tree: cd8f68d5ee0caebf508f6d4f1595d8656f0bff55 /doc
parent: Releasing debian version 1.6-5.
Merging upstream version 1.7.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc')
-rw-r--r-- doc/plzip.1 | 9
-rw-r--r-- doc/plzip.info | 268
-rw-r--r-- doc/plzip.texi | 235
3 files changed, 358 insertions, 154 deletions
diff --git a/doc/plzip.1 b/doc/plzip.1
index 5c47edd..99dfd8b 100644
--- a/doc/plzip.1
+++ b/doc/plzip.1
@@ -1,5 +1,5 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH PLZIP "1" "April 2017" "plzip 1.6" "User Commands"
+.TH PLZIP "1" "February 2018" "plzip 1.7" "User Commands"
.SH NAME
plzip \- reduces the size of files
.SH SYNOPSIS
@@ -68,6 +68,9 @@ alias for \fB\-0\fR
.TP
\fB\-\-best\fR
alias for \fB\-9\fR
+.TP
+\fB\-\-loose\-trailing\fR
+allow trailing data seeming corrupt header
.PP
If no file names are given, or if a file is '\-', plzip compresses or
decompresses from standard input to standard output.
@@ -92,8 +95,8 @@ Plzip home page: http://www.nongnu.org/lzip/plzip.html
.SH COPYRIGHT
Copyright \(co 2009 Laszlo Ersek.
.br
-Copyright \(co 2017 Antonio Diaz Diaz.
-Using lzlib 1.9
+Copyright \(co 2018 Antonio Diaz Diaz.
+Using lzlib 1.10
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
.br
This is free software: you are free to change and redistribute it.
diff --git a/doc/plzip.info b/doc/plzip.info
index cf53f13..c8d7387 100644
--- a/doc/plzip.info
+++ b/doc/plzip.info
@@ -11,11 +11,12 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
Plzip Manual
************
-This manual is for Plzip (version 1.6, 12 April 2017).
+This manual is for Plzip (version 1.7, 7 February 2018).
* Menu:
* Introduction:: Purpose and features of plzip
+* Output:: Meaning of plzip's output
* Invoking plzip:: Command line interface
* Program design:: Internal structure of plzip
* File format:: Detailed format of the compressed file
@@ -27,13 +28,13 @@ This manual is for Plzip (version 1.6, 12 April 2017).
* Concept index:: Index of concepts
- Copyright (C) 2009-2017 Antonio Diaz Diaz.
+ Copyright (C) 2009-2018 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to
copy, distribute and modify it.

-File: plzip.info, Node: Introduction, Next: Invoking plzip, Prev: Top, Up: Top
+File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
1 Introduction
**************
@@ -58,7 +59,7 @@ archiving, taking into account both data integrity and decoder
availability:
* The lzip format provides very safe integrity checking and some data
- recovery means. The lziprecover program can repair bit-flip errors
+ recovery means. The lziprecover program can repair bit flip errors
(one of the most common forms of data corruption) in lzip files,
and provides data recovery capabilities, including error-checked
merging of damaged copies of a file. *Note Data safety:
@@ -114,17 +115,60 @@ entirely incomprehensible and therefore pointless.
Plzip will correctly decompress a file which is the concatenation of
two or more compressed files. The result is the concatenation of the
-corresponding uncompressed files. Integrity testing of concatenated
+corresponding decompressed files. Integrity testing of concatenated
compressed files is also supported.
+
+File: plzip.info, Node: Output, Next: Invoking plzip, Prev: Introduction, Up: Top
+
+2 Meaning of plzip's output
+***************************
+
+The output of plzip looks like this:
+
+ plzip -v foo
+ foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
+
+ plzip -tvv foo.lz
+ foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok
+
+ The meaning of each field is as follows:
+
+'N:1'
+ The compression ratio (uncompressed_size / compressed_size), shown
+ as N to 1.
+
+'ratio'
+ The inverse compression ratio
+ (compressed_size / uncompressed_size), shown as a percentage. A
+ decimal ratio is easily obtained by moving the decimal point two
+ places to the left; 14.98% = 0.1498.
+
+'saved'
+ The space saved by compression (1 - ratio), shown as a percentage.
+
+'in'
+ The size of the uncompressed data. When decompressing or testing,
+ it is shown as 'decompressed'. Note that plzip always prints the
+ uncompressed size before the compressed size when compressing,
+ decompressing, testing or listing.
+
+'out'
+ The size of the compressed data. When decompressing or testing, it
+ is shown as 'compressed'.
+
+
+ When decompressing or testing at verbosity level 4 (-vvvv), the
+dictionary size used to compress the file is also shown.
+
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may
never have been compressed. Decompressed is used to refer to data which
have undergone the process of decompression.
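
As a worked check of the fields described above, the figures in the sample line can be recomputed from the two sizes it reports. This is only an illustrative sketch: `summarize` is a hypothetical helper, not part of plzip, and the rounding (three decimals for N:1, two for the percentages) is assumed from the sample output rather than taken from the program.

```python
def summarize(uncompressed_size: int, compressed_size: int) -> str:
    """Format two sizes in the style of the 'plzip -v' sample line.

    Hypothetical helper for illustration; not plzip's own code.
    """
    n_to_1 = uncompressed_size / compressed_size   # the 'N:1' field
    ratio = compressed_size / uncompressed_size    # the 'ratio' field, as a fraction
    saved = 1 - ratio                              # the 'saved' field, as a fraction
    return (
        f"{n_to_1:.3f}:1, {ratio * 100:.2f}% ratio, {saved * 100:.2f}% saved, "
        f"{uncompressed_size} in, {compressed_size} out."
    )


# Sizes taken from the sample output of 'plzip -v foo'.
print(summarize(450560, 67493))
```

With the sizes from the sample (450560 in, 67493 out) this reproduces the three derived fields shown there: 6.676:1, 14.98% ratio, 85.02% saved.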

-File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Introduction, Up: Top
+File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top
-2 Invoking plzip
+3 Invoking plzip
****************
The format for running plzip is:
@@ -135,7 +179,7 @@ The format for running plzip is:
other FILES and is read just once, the first time it appears in the
command line.
- Plzip supports the following options:
+ plzip supports the following options:
'-h'
'--help'
@@ -154,12 +198,12 @@ command line.
'-B BYTES'
'--data-size=BYTES'
- Set the size of the input data blocks, in bytes. The input file
- will be divided in chunks of this size before compression is
- performed. Valid values range from 8 KiB to 1 GiB. Default value
- is two times the dictionary size, except for option '-0' where it
- defaults to 1 MiB. Plzip will reduce the dictionary size if it is
- larger than the chosen data size.
+ When compressing, set the size of the input data blocks in bytes.
+ The input file will be divided in chunks of this size before
+ compression is performed. Valid values range from 8 KiB to 1 GiB.
+ Default value is two times the dictionary size, except for option
+ '-0' where it defaults to 1 MiB. Plzip will reduce the dictionary
+ size if it is larger than the chosen data size.
'-c'
'--stdout'
@@ -170,10 +214,10 @@ command line.
'-d'
'--decompress'
- Decompress the specified file(s). If a file does not exist or
- can't be opened, plzip continues decompressing the rest of the
- files. If a file fails to decompress, plzip exits immediately
- without decompressing the rest of the files.
+ Decompress the specified files. If a file does not exist or can't
+ be opened, plzip continues decompressing the rest of the files. If
+ a file fails to decompress, or is a terminal, plzip exits
+ immediately without decompressing the rest of the files.
'-f'
'--force'
@@ -181,8 +225,8 @@ command line.
'-F'
'--recompress'
- Force re-compression of files whose name already has the '.lz' or
- '.tlz' suffix.
+ When compressing, force re-compression of files whose name already
+ has the '.lz' or '.tlz' suffix.
'-k'
'--keep'
@@ -192,7 +236,7 @@ command line.
'-l'
'--list'
Print the uncompressed size, compressed size and percentage saved
- of the specified file(s). Trailing data are ignored. The values
+ of the specified files. Trailing data are ignored. The values
produced are correct even for multimember files. If more than one
file is given, a final line containing the cumulative sizes is
printed. With '-v', the dictionary size, the number of members in
@@ -206,18 +250,21 @@ command line.
'-m BYTES'
'--match-length=BYTES'
- Set the match length limit in bytes. After a match this long is
- found, the search is finished. Valid values range from 5 to 273.
- Larger values usually give better compression ratios but longer
- compression times.
+ When compressing, set the match length limit in bytes. After a
+ match this long is found, the search is finished. Valid values
+ range from 5 to 273. Larger values usually give better compression
+ ratios but longer compression times.
'-n N'
'--threads=N'
- Set the number of worker threads. Valid values range from 1 to "as
- many as your system can support". If this option is not used,
- plzip tries to detect the number of processors in the system and
- use it as default value. 'plzip --help' shows the system's default
- value.
+ Set the number of worker threads, overriding the system's default.
+ Valid values range from 1 to "as many as your system can support".
+ If this option is not used, plzip tries to detect the number of
+ processors in the system and use it as default value. When
+ compressing on a 32 bit system, plzip tries to limit the memory
+ use to under 2.22 GiB (4 worker threads at level -9) by reducing
+ the number of threads below the system's default. 'plzip --help'
+ shows the system's default value.
Note that the number of usable threads is limited to
ceil( file_size / data_size ) during compression (*note Minimum
@@ -228,8 +275,9 @@ command line.
'--output=FILE'
When reading from standard input and '--stdout' has not been
specified, use 'FILE' as the virtual name of the uncompressed
- file. This produces a file named 'FILE' when decompressing, and a
- file named 'FILE.lz' when compressing.
+ file. This produces a file named 'FILE' when decompressing, or a
+ file named 'FILE.lz' when compressing. A second '.lz' extension is
+ not added if 'FILE' already ends in '.lz' or '.tlz'.
'-q'
'--quiet'
@@ -237,13 +285,13 @@ command line.
'-s BYTES'
'--dictionary-size=BYTES'
- Set the dictionary size limit in bytes. Plzip will use the smallest
- possible dictionary size for each file without exceeding this
- limit. Valid values range from 4 KiB to 512 MiB. Values 12 to 29
- are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
- that dictionary sizes are quantized. If the specified size does
- not match one of the valid sizes, it will be rounded upwards by
- adding up to (BYTES / 8) to it.
+ When compressing, set the dictionary size limit in bytes. Plzip
+ will use the smallest possible dictionary size for each file
+ without exceeding this limit. Valid values range from 4 KiB to
+ 512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
+ 2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
+ the specified size does not match one of the valid sizes, it will
+ be rounded upwards by adding up to (BYTES / 8) to it.
For maximum compression you should use a dictionary size limit as
large as possible, but keep in mind that the decompression memory
@@ -252,10 +300,10 @@ command line.
'-t'
'--test'
- Check integrity of the specified file(s), but don't decompress
- them. This really performs a trial decompression and throws away
- the result. Use it together with '-v' to see information about
- the file(s). If a file does not exist, can't be opened, or is a
+ Check integrity of the specified files, but don't decompress them.
+ This really performs a trial decompression and throws away the
+ result. Use it together with '-v' to see information about the
+ files. If a file does not exist, can't be opened, or is a
terminal, plzip continues checking the rest of the files. If a
file fails the test, plzip may be unable to check the rest of the
files.
@@ -263,17 +311,19 @@ command line.
'-v'
'--verbose'
Verbose mode.
- When compressing, show the compression ratio for each file
- processed. A second '-v' shows the progress of compression.
+ When compressing, show the compression ratio and size for each file
+ processed.
When decompressing or testing, further -v's (up to 4) increase the
verbosity level, showing status, compression ratio, dictionary
size, decompressed size, and compressed size.
+ Two or more '-v' options show the progress of (de)compression,
+ except for single-member files.
'-0 .. -9'
Set the compression parameters (dictionary size and match length
limit) as shown in the table below. The default compression level
is '-6'. Note that '-9' can be much slower than '-0'. These
- options have no effect when decompressing.
+ options have no effect when decompressing, testing or listing.
The bidimensional parameter space of LZMA can't be mapped to a
linear scale optimal for all files. If your files are large, very
@@ -296,6 +346,13 @@ command line.
'--best'
Aliases for GNU gzip compatibility.
+'--loose-trailing'
+ When decompressing, testing or listing, allow trailing data whose
+ first bytes are so similar to the magic bytes of a lzip header
+ that they can be confused with a corrupt header. Use this option
+ if a file triggers a "corrupt header" error and the cause is not
+ indeed a corrupt header.
+
Numbers given as arguments to options may be followed by a multiplier
and an optional 'B' for "byte".
@@ -321,7 +378,7 @@ caused plzip to panic.

File: plzip.info, Node: Program design, Next: File format, Prev: Invoking plzip, Up: Top
-3 Program design
+4 Program design
****************
When compressing, plzip divides the input file into chunks and
@@ -344,6 +401,17 @@ them to the workers. The workers (de)compress the blocks received from
the splitter. The muxer collects processed packets from the workers, and
writes them to the output file.
+ ,------------,
+ ,-->| worker 0 |--,
+ | `------------' |
+,-------, ,----------, | ,------------, | ,-------, ,--------,
+| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output |
+| file | `----------' | `------------' | `-------' | file |
+`-------' | ... | `--------'
+ | ,------------, |
+ `-->| worker N-1 |--'
+ `------------'
+
When decompressing from a regular file, the splitter is removed and
the workers read directly from the input file. If the output file is
also a regular file, the muxer is also removed and the workers write
@@ -355,7 +423,7 @@ I/O speed.

File: plzip.info, Node: File format, Next: Memory requirements, Prev: Program design, Up: Top
-4 File format
+5 File format
*************
Perfection is reached, not when there is no longer anything to add, but
@@ -426,17 +494,11 @@ additional information before, between, or after them.

File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: File format, Up: Top
-5 Memory required to compress and decompress
+6 Memory required to compress and decompress
********************************************
-The amount of memory required *per thread* is approximately the
-following:
-
- * For compression at level -0; 1.5 MiB plus 3 times the data size
- (*note --data-size::). Default is 4.5 MiB.
-
- * For compression at other levels; 11 times the dictionary size plus
- 3 times the data size. Default is 136 MiB.
+The amount of memory required *per thread* for decompression or testing
+is approximately the following:
* For decompression of a regular (seekable) file to another regular
file, or for testing of a regular file; the dictionary size.
@@ -450,10 +512,35 @@ following:
* For decompression of a non-seekable file or of standard input; the
dictionary size plus up to 35 MiB.
+The amount of memory required *per thread* for compression is
+approximately the following:
+
+ * For compression at level -0; 1.5 MiB plus 3.375 times the data size
+ (*note --data-size::). Default is 4.875 MiB.
+
+ * For compression at other levels; 11 times the dictionary size plus
+ 3.375 times the data size. Default is 142 MiB.
+
+The following table shows the memory required *per thread* for
+compression at a given level, using the default data size for each
+level:
+
+Level Memory required
+-0 4.875 MiB
+-1 17.75 MiB
+-2 26.625 MiB
+-3 35.5 MiB
+-4 53.25 MiB
+-5 71 MiB
+-6 142 MiB
+-7 284 MiB
+-8 426 MiB
+-9 568 MiB
+

File: plzip.info, Node: Minimum file sizes, Next: Trailing data, Prev: Memory requirements, Up: Top
-6 Minimum file sizes required for full compression speed
+7 Minimum file sizes required for full compression speed
********************************************************
When compressing, plzip divides the input file into chunks and
@@ -466,7 +553,8 @@ must be at least as large as the number of worker threads times the
chunk size (*note --data-size::). Else some processors will not get any
data to compress, and compression will be proportionally slower. The
maximum speed increase achievable on a given file is limited by the
-ratio (file_size / data_size).
+ratio (file_size / data_size). For example, a tarball the size of gcc or
+linux will scale up to 8 processors at level -9.
The following table shows the minimum uncompressed file size needed
for full use of N processors at a given compression level, using the
@@ -489,7 +577,7 @@ Level

File: plzip.info, Node: Trailing data, Next: Examples, Prev: Minimum file sizes, Up: Top
-7 Extra data appended to the file
+8 Extra data appended to the file
*********************************
Sometimes extra data are found appended to a lzip file after the last
@@ -501,10 +589,11 @@ member. Such trailing data may be:
* Useful data added by the user; a cryptographically secure hash, a
description of file contents, etc. It is safe to append any amount
- of text to a lzip file as long as the text does not begin with the
- string "LZIP", and does not contain any zero bytes (null
- characters). Nonzero bytes and zero bytes can't be safely mixed in
- trailing data.
+ of text to a lzip file as long as none of the first four bytes of
+ the text match the corresponding byte in the string "LZIP", and
+ the text does not contain any zero bytes (null characters).
+ Nonzero bytes and zero bytes can't be safely mixed in trailing
+ data.
* Garbage added by some not totally successful copy operation.
@@ -512,12 +601,17 @@ member. Such trailing data may be:
and hash value (for a chosen hash) coincide with those of another
file.
- * In very rare cases, trailing data could be the corrupt header of
- another member. In multimember or concatenated files the
- probability of corruption happening in the magic bytes is 5 times
- smaller than the probability of getting a false positive caused by
- the corruption of the integrity information itself. Therefore it
- can be considered to be below the noise level.
+ * In rare cases, trailing data could be the corrupt header of another
+ member. In multimember or concatenated files the probability of
+ corruption happening in the magic bytes is 5 times smaller than the
+ probability of getting a false positive caused by the corruption
+ of the integrity information itself. Therefore it can be
+ considered to be below the noise level. Additionally, the test
+ used by plzip to discriminate trailing data from a corrupt header
+ has a Hamming distance (HD) of 3, and the 3 bit flips must happen
+ in different magic bytes for the test to fail. In any case, the
+ option '--trailing-error' guarantees that any corrupt header will
+ be detected.
Trailing data are in no way part of the lzip file format, but tools
reading lzip files are expected to behave as correctly and usefully as
@@ -531,7 +625,7 @@ cases where a file containing trailing data must be rejected, the option

File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top
-8 A small tutorial with examples
+9 A small tutorial with examples
********************************
WARNING! Even if plzip is bug-free, other causes may result in a corrupt
@@ -595,8 +689,8 @@ to decompressed byte 15000 (5000 bytes are produced).

File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
-9 Reporting bugs
-****************
+10 Reporting bugs
+*****************
There are probably bugs in plzip. There are certainly errors and
omissions in this manual. If you report them, they will get fixed. If
@@ -625,6 +719,7 @@ Concept index
* memory requirements: Memory requirements. (line 6)
* minimum file sizes: Minimum file sizes. (line 6)
* options: Invoking plzip. (line 6)
+* output: Output. (line 6)
* program design: Program design. (line 6)
* trailing data: Trailing data. (line 6)
* usage: Invoking plzip. (line 6)
@@ -634,19 +729,20 @@ Concept index

Tag Table:
Node: Top221
-Node: Introduction1103
-Node: Invoking plzip5274
-Ref: --trailing-error5843
-Ref: --data-size6086
-Node: Program design12796
-Node: File format14383
-Node: Memory requirements16815
-Node: Minimum file sizes17815
-Node: Trailing data19741
-Node: Examples21648
-Ref: concat-example22813
-Node: Problems23388
-Node: Concept index23914
+Node: Introduction1158
+Node: Output5134
+Node: Invoking plzip6614
+Ref: --trailing-error7177
+Ref: --data-size7420
+Node: Program design14938
+Node: File format17090
+Node: Memory requirements19522
+Node: Minimum file sizes20985
+Node: Trailing data23002
+Node: Examples25285
+Ref: concat-example26450
+Node: Problems27025
+Node: Concept index27553

End Tag Table
diff --git a/doc/plzip.texi b/doc/plzip.texi
index 5f32f6e..44cff75 100644
--- a/doc/plzip.texi
+++ b/doc/plzip.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header
-@set UPDATED 12 April 2017
-@set VERSION 1.6
+@set UPDATED 7 February 2018
+@set VERSION 1.7
@dircategory Data Compression
@direntry
@@ -36,6 +36,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
@menu
* Introduction:: Purpose and features of plzip
+* Output:: Meaning of plzip's output
* Invoking plzip:: Command line interface
* Program design:: Internal structure of plzip
* File format:: Detailed format of the compressed file
@@ -48,7 +49,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
-Copyright @copyright{} 2009-2017 Antonio Diaz Diaz.
+Copyright @copyright{} 2009-2018 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission
to copy, distribute and modify it.
@@ -81,7 +82,7 @@ availability:
The lzip format provides very safe integrity checking and some data
recovery means. The
@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
-program can repair bit-flip errors (one of the most common forms of data
+program can repair bit flip errors (one of the most common forms of data
corruption) in lzip files, and provides data recovery capabilities,
including error-checked merging of damaged copies of a file.
@ifnothtml
@@ -143,9 +144,54 @@ incomprehensible and therefore pointless.
Plzip will correctly decompress a file which is the concatenation of two
or more compressed files. The result is the concatenation of the
-corresponding uncompressed files. Integrity testing of concatenated
+corresponding decompressed files. Integrity testing of concatenated
compressed files is also supported.
+
+@node Output
+@chapter Meaning of plzip's output
+@cindex output
+
+The output of plzip looks like this:
+
+@example
+plzip -v foo
+ foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
+
+plzip -tvv foo.lz
+ foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok
+@end example
+
+The meaning of each field is as follows:
+
+@table @code
+@item N:1
+The compression ratio @w{(uncompressed_size / compressed_size)}, shown
+as N to 1.
+
+@item ratio
+The inverse compression ratio @w{(compressed_size / uncompressed_size)},
+shown as a percentage. A decimal ratio is easily obtained by moving the
+decimal point two places to the left; @w{14.98% = 0.1498}.
+
+@item saved
+The space saved by compression @w{(1 - ratio)}, shown as a percentage.
+
+@item in
+The size of the uncompressed data. When decompressing or testing, it is
+shown as @code{decompressed}. Note that plzip always prints the
+uncompressed size before the compressed size when compressing,
+decompressing, testing or listing.
+
+@item out
+The size of the compressed data. When decompressing or testing, it is
+shown as @code{compressed}.
+
+@end table
+
+When decompressing or testing at verbosity level 4 (-vvvv), the
+dictionary size used to compress the file is also shown.
+
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
have been compressed. Decompressed is used to refer to data which have
undergone the process of decompression.
@@ -169,7 +215,7 @@ plzip [@var{options}] [@var{files}]
mixed with other @var{files} and is read just once, the first time it
appears in the command line.
-Plzip supports the following options:
+plzip supports the following options:
@table @code
@item -h
@@ -190,12 +236,12 @@ garbage that can be safely ignored. @xref{concat-example}.
@anchor{--data-size}
@item -B @var{bytes}
@itemx --data-size=@var{bytes}
-Set the size of the input data blocks, in bytes. The input file will be
-divided in chunks of this size before compression is performed. Valid
-values range from 8 KiB to 1 GiB. Default value is two times the
-dictionary size, except for option @samp{-0} where it defaults to 1 MiB.
-Plzip will reduce the dictionary size if it is larger than the chosen
-data size.
+When compressing, set the size of the input data blocks in bytes. The
+input file will be divided in chunks of this size before compression is
+performed. Valid values range from @w{8 KiB} to @w{1 GiB}. Default value
+is two times the dictionary size, except for option @samp{-0} where it
+defaults to @w{1 MiB}. Plzip will reduce the dictionary size if it is
+larger than the chosen data size.
@item -c
@itemx --stdout
@@ -206,10 +252,10 @@ device.
@item -d
@itemx --decompress
-Decompress the specified file(s). If a file does not exist or can't be
+Decompress the specified files. If a file does not exist or can't be
opened, plzip continues decompressing the rest of the files. If a file
-fails to decompress, plzip exits immediately without decompressing the
-rest of the files.
+fails to decompress, or is a terminal, plzip exits immediately without
+decompressing the rest of the files.
@item -f
@itemx --force
@@ -217,8 +263,8 @@ Force overwrite of output files.
@item -F
@itemx --recompress
-Force re-compression of files whose name already has the @samp{.lz} or
-@samp{.tlz} suffix.
+When compressing, force re-compression of files whose name already has
+the @samp{.lz} or @samp{.tlz} suffix.
@item -k
@itemx --keep
@@ -227,7 +273,7 @@ Keep (don't delete) input files during compression or decompression.
@item -l
@itemx --list
Print the uncompressed size, compressed size and percentage saved of the
-specified file(s). Trailing data are ignored. The values produced are
+specified files. Trailing data are ignored. The values produced are
correct even for multimember files. If more than one file is given, a
final line containing the cumulative sizes is printed. With @samp{-v},
the dictionary size, the number of members in the file, and the amount
@@ -240,16 +286,21 @@ verifies that none of the specified files contain trailing data.
@item -m @var{bytes}
@itemx --match-length=@var{bytes}
-Set the match length limit in bytes. After a match this long is found,
-the search is finished. Valid values range from 5 to 273. Larger values
-usually give better compression ratios but longer compression times.
+When compressing, set the match length limit in bytes. After a match
+this long is found, the search is finished. Valid values range from 5 to
+273. Larger values usually give better compression ratios but longer
+compression times.
@item -n @var{n}
@itemx --threads=@var{n}
-Set the number of worker threads. Valid values range from 1 to "as many
-as your system can support". If this option is not used, plzip tries to
-detect the number of processors in the system and use it as default
-value. @w{@samp{plzip --help}} shows the system's default value.
+Set the number of worker threads, overriding the system's default. Valid
+values range from 1 to "as many as your system can support". If this
+option is not used, plzip tries to detect the number of processors in
+the system and use it as default value. When compressing on a @w{32 bit}
+system, plzip tries to limit the memory use to under @w{2.22 GiB} (4
+worker threads at level -9) by reducing the number of threads below the
+system's default. @w{@samp{plzip --help}} shows the system's default
+value.
Note that the number of usable threads is limited to @w{ceil( file_size
/ data_size )} during compression (@pxref{Minimum file sizes}), and to
@@ -260,7 +311,9 @@ the number of members in the input during decompression.
When reading from standard input and @samp{--stdout} has not been
specified, use @samp{@var{file}} as the virtual name of the uncompressed
file. This produces a file named @samp{@var{file}} when decompressing,
-and a file named @samp{@var{file}.lz} when compressing.
+or a file named @samp{@var{file}.lz} when compressing. A second
+@samp{.lz} extension is not added if @samp{@var{file}} already ends in
+@samp{.lz} or @samp{.tlz}.
@item -q
@itemx --quiet
@@ -268,12 +321,12 @@ Quiet operation. Suppress all messages.
@item -s @var{bytes}
@itemx --dictionary-size=@var{bytes}
-Set the dictionary size limit in bytes. Plzip will use the smallest
-possible dictionary size for each file without exceeding this limit.
-Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are
-interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note that
-dictionary sizes are quantized. If the specified size does not match one
-of the valid sizes, it will be rounded upwards by adding up to
+When compressing, set the dictionary size limit in bytes. Plzip will use
+the smallest possible dictionary size for each file without exceeding
+this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12
+to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
+that dictionary sizes are quantized. If the specified size does not
+match one of the valid sizes, it will be rounded upwards by adding up to
@w{(@var{bytes} / 8)} to it.
For maximum compression you should use a dictionary size limit as large
@@ -282,27 +335,29 @@ is affected at compression time by the choice of dictionary size limit.
@item -t
@itemx --test
-Check integrity of the specified file(s), but don't decompress them.
-This really performs a trial decompression and throws away the result.
-Use it together with @samp{-v} to see information about the file(s). If
-a file does not exist, can't be opened, or is a terminal, plzip
-continues checking the rest of the files. If a file fails the test,
-plzip may be unable to check the rest of the files.
+Check integrity of the specified files, but don't decompress them. This
+really performs a trial decompression and throws away the result. Use it
+together with @samp{-v} to see information about the files. If a file
+does not exist, can't be opened, or is a terminal, plzip continues
+checking the rest of the files. If a file fails the test, plzip may be
+unable to check the rest of the files.
@item -v
@itemx --verbose
Verbose mode.@*
-When compressing, show the compression ratio for each file processed. A
-second @samp{-v} shows the progress of compression.@*
+When compressing, show the compression ratio and size for each file
+processed.@*
When decompressing or testing, further -v's (up to 4) increase the
verbosity level, showing status, compression ratio, dictionary size,
-decompressed size, and compressed size.
+decompressed size, and compressed size.@*
+Two or more @samp{-v} options show the progress of (de)compression,
+except for single-member files.
@item -0 .. -9
Set the compression parameters (dictionary size and match length limit)
as shown in the table below. The default compression level is @samp{-6}.
Note that @samp{-9} can be much slower than @samp{-0}. These options
-have no effect when decompressing.
+have no effect when decompressing, testing or listing.
The bidimensional parameter space of LZMA can't be mapped to a linear
scale optimal for all files. If your files are large, very repetitive,
@@ -327,6 +382,12 @@ etc, you may need to use the @samp{--dictionary-size} and
@itemx --best
Aliases for GNU gzip compatibility.
+@item --loose-trailing
+When decompressing, testing or listing, allow trailing data whose first
+bytes are so similar to the magic bytes of a lzip header that they can
+be confused with a corrupt header. Use this option if a file triggers a
+"corrupt header" error that is actually caused by trailing data.
+
@end table
Numbers given as arguments to options may be followed by a multiplier
@@ -363,8 +424,8 @@ creating a multimember compressed file.
When decompressing, plzip decompresses as many members simultaneously as
worker threads are chosen. Files that were compressed with lzip will not
-be decompressed faster than using lzip (unless the @samp{-b} option was
-used) because lzip usually produces single-member files, which can't be
+be decompressed faster than using lzip (unless the @samp{-b} option was used)
+because lzip usually produces single-member files, which can't be
decompressed in parallel.
For each input file, a splitter thread and several worker threads are
@@ -377,6 +438,19 @@ to the workers. The workers (de)compress the blocks received from the
splitter. The muxer collects processed packets from the workers, and
writes them to the output file.
+@verbatim
+ ,------------,
+ ,-->| worker 0 |--,
+ | `------------' |
+,-------, ,----------, | ,------------, | ,-------, ,--------,
+| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output |
+| file | `----------' | `------------' | `-------' | file |
+`-------' | ... | `--------'
+ | ,------------, |
+ `-->| worker N-1 |--'
+ `------------'
+@end verbatim
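The splitter, worker, and muxer arrangement in the diagram above can be sketched in a few lines of Python. This is only an illustration of the data flow, not plzip's implementation: the function name `parallel_compress` is hypothetical, and Python's `lzma` module writes xz streams rather than lzip members.

```python
import lzma
import queue
import threading


def parallel_compress(data: bytes, block_size: int, num_workers: int) -> bytes:
    """Compress `data` block by block using a splitter, workers and a muxer."""
    work_q = queue.Queue()  # splitter -> workers: (index, block)
    done_q = queue.Queue()  # workers -> muxer: (index, compressed block)

    def splitter():
        # Cut the input into fixed-size blocks and hand them to the workers.
        for index, start in enumerate(range(0, len(data), block_size)):
            work_q.put((index, data[start:start + block_size]))
        for _ in range(num_workers):
            work_q.put(None)  # one stop marker per worker

    def worker():
        # Compress each block independently, as a separate stream.
        while True:
            item = work_q.get()
            if item is None:
                done_q.put(None)  # tell the muxer this worker is finished
                return
            index, block = item
            done_q.put((index, lzma.compress(block)))

    threads = [threading.Thread(target=splitter)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Muxer: write blocks in input order, holding back any that arrive early.
    pending, output, next_index, finished = {}, [], 0, 0
    while finished < num_workers:
        item = done_q.get()
        if item is None:
            finished += 1
            continue
        pending[item[0]] = item[1]
        while next_index in pending:
            output.append(pending.pop(next_index))
            next_index += 1

    for t in threads:
        t.join()
    return b"".join(output)
```

Because every block becomes an independent stream, the output is a concatenation of streams, which is the same property that lets a multimember file be decompressed in parallel.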
+
When decompressing from a regular file, the splitter is removed and the
workers read directly from the input file. If the output file is also a
regular file, the muxer is also removed and the workers write directly
@@ -472,35 +546,60 @@ facilitates safe recovery of undamaged members from multimember files.
@chapter Memory required to compress and decompress
@cindex memory requirements
-The amount of memory required @strong{per thread} is approximately the
-following:
+The amount of memory required @strong{per thread} for decompression or
+testing is approximately the following:
@itemize @bullet
@item
-For compression at level -0; 1.5 MiB plus 3 times the data size
-(@pxref{--data-size}). Default is 4.5 MiB.
-
-@item
-For compression at other levels; 11 times the dictionary size plus 3
-times the data size. Default is 136 MiB.
-
-@item
For decompression of a regular (seekable) file to another regular file,
or for testing of a regular file; the dictionary size.
@item
For testing of a non-seekable file or of standard input; the dictionary
-size plus up to 5 MiB.
+size plus up to @w{5 MiB}.
@item
For decompression of a regular file to a non-seekable file or to
-standard output; the dictionary size plus up to 32 MiB.
+standard output; the dictionary size plus up to @w{32 MiB}.
@item
For decompression of a non-seekable file or of standard input; the
-dictionary size plus up to 35 MiB.
+dictionary size plus up to @w{35 MiB}.
+@end itemize
+
+@noindent
+The amount of memory required @strong{per thread} for compression is
+approximately the following:
+
+@itemize @bullet
+@item
+For compression at level -0; @w{1.5 MiB} plus 3.375 times the data size
+(@pxref{--data-size}). Default is @w{4.875 MiB}.
+
+@item
+For compression at other levels; 11 times the dictionary size plus 3.375
+times the data size. Default is @w{142 MiB}.
@end itemize
+@noindent
+The following table shows the memory required @strong{per thread} for
+compression at a given level, using the default data size for each
+level:
+
+@multitable {Level} {Memory required}
+@item Level @tab Memory required
+@item -0 @tab 4.875 MiB
+@item -1 @tab 17.75 MiB
+@item -2 @tab 26.625 MiB
+@item -3 @tab 35.5 MiB
+@item -4 @tab 53.25 MiB
+@item -5 @tab 71 MiB
+@item -6 @tab 142 MiB
+@item -7 @tab 284 MiB
+@item -8 @tab 426 MiB
+@item -9 @tab 568 MiB
+@end multitable
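The figures in the table above follow from the two compression formulas. The Python sketch below reproduces them under two assumptions that are not stated in this section: the dictionary size for each level (64 KiB for -0, then 1, 1.5, 2, 3, 4, 8, 16, 24, and 32 MiB for -1 to -9), and a default data size of twice the dictionary size, except for -0, where it is assumed to be 1 MiB. The names `DICT_SIZE`, `default_data_size`, and `compression_memory` are hypothetical.

```python
MIB = 1024 * 1024

# Assumed dictionary size in bytes for each compression level.
DICT_SIZE = {
    0: 64 * 1024, 1: 1 * MIB, 2: 3 * MIB // 2, 3: 2 * MIB, 4: 3 * MIB,
    5: 4 * MIB, 6: 8 * MIB, 7: 16 * MIB, 8: 24 * MIB, 9: 32 * MIB,
}


def default_data_size(level):
    """Assumed default data size: 1 MiB at -0, else twice the dictionary."""
    return MIB if level == 0 else 2 * DICT_SIZE[level]


def compression_memory(level, data_size=None):
    """Approximate memory in bytes needed per thread for compression."""
    if data_size is None:
        data_size = default_data_size(level)
    if level == 0:
        return 1.5 * MIB + 3.375 * data_size
    return 11 * DICT_SIZE[level] + 3.375 * data_size


if __name__ == "__main__":
    for level in range(10):
        print(f"-{level}  {compression_memory(level) / MIB:g} MiB")
```

Passing an explicit `data_size` shows how a non-default `--data-size` changes the per-thread estimate.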
+
@node Minimum file sizes
@chapter Minimum file sizes required for full compression speed
@@ -516,7 +615,8 @@ least as large as the number of worker threads times the chunk size
(@pxref{--data-size}). Else some processors will not get any data to
compress, and compression will be proportionally slower. The maximum
speed increase achievable on a given file is limited by the ratio
-@w{(file_size / data_size)}.
+@w{(file_size / data_size)}. For example, a tarball the size of gcc or
+linux will scale up to 8 processors at level -9.
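The limit can be written as a small calculation. This Python sketch assumes a data size of 64 MiB at level -9 (twice an assumed 32 MiB dictionary), and the function name `max_useful_threads` is hypothetical.

```python
MIB = 1024 * 1024


def max_useful_threads(file_size, data_size):
    """Largest number of worker threads that can each receive a full chunk."""
    return max(1, file_size // data_size)


if __name__ == "__main__":
    # A 512 MiB file with 64 MiB chunks keeps at most 8 workers busy.
    print(max_useful_threads(512 * MIB, 64 * MIB))
```

Any threads beyond this number receive no data, so they add no speed.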
The following table shows the minimum uncompressed file size needed for
full use of N processors at a given compression level, using the default
@@ -554,9 +654,10 @@ padding zero bytes to a lzip file.
@item
Useful data added by the user; a cryptographically secure hash, a
description of file contents, etc. It is safe to append any amount of
-text to a lzip file as long as the text does not begin with the string
-"LZIP", and does not contain any zero bytes (null characters). Nonzero
-bytes and zero bytes can't be safely mixed in trailing data.
+text to a lzip file as long as none of the first four bytes of the text
+match the corresponding byte in the string "LZIP", and the text does not
+contain any zero bytes (null characters). Nonzero bytes and zero bytes
+can't be safely mixed in trailing data.
@item
Garbage added by some not totally successful copy operation.
@@ -566,12 +667,16 @@ Malicious data added to the file in order to make its total size and
hash value (for a chosen hash) coincide with those of another file.
@item
-In very rare cases, trailing data could be the corrupt header of another
+In rare cases, trailing data could be the corrupt header of another
member. In multimember or concatenated files the probability of
corruption happening in the magic bytes is 5 times smaller than the
probability of getting a false positive caused by the corruption of the
integrity information itself. Therefore it can be considered to be below
-the noise level.
+the noise level. Additionally, the test used by plzip to discriminate
+trailing data from a corrupt header has a Hamming distance (HD) of 3,
+and the 3 bit flips must happen in different magic bytes for the test to
+fail. In any case, the option @samp{--trailing-error} guarantees that
+any corrupt header will be detected.
@end itemize
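The rule for safely appending text, given in the list above, can be checked before the text is added to a file. The Python sketch below implements only that rule as worded here. It is not the test plzip runs internally, and the function name `is_safe_trailing_text` is hypothetical.

```python
LZIP_MAGIC = b"LZIP"


def is_safe_trailing_text(text: bytes) -> bool:
    """Return True if `text` meets the rule for safe trailing data."""
    # The text must not contain any zero bytes.
    if 0 in text:
        return False
    # None of the first four bytes may equal the corresponding magic byte.
    return all(t != m for t, m in zip(text, LZIP_MAGIC))
```

For example, a line starting with `sha256:` passes the check, while text starting with `LZIP`, or with `Z` as its second byte, does not.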
Trailing data are in no way part of the lzip file format, but tools
@@ -607,7 +712,7 @@ plzip -v file
@sp 1
@noindent
Example 2: Like example 1 but the created @samp{file.lz} has a block
-size of 1 MiB. The compression ratio is not shown.
+size of @w{1 MiB}. The compression ratio is not shown.
@example
plzip -B 1MiB file
@@ -656,7 +761,7 @@ Do this instead
@sp 1
@noindent
-Example 7: Decompress @samp{file.lz} partially until 10 KiB of
+Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of
decompressed data are produced.
@example