author | Daniel Baumann <mail@daniel-baumann.ch> | 2016-05-29 17:17:06 +0000 |
---|---|---|
committer | Daniel Baumann <mail@daniel-baumann.ch> | 2016-05-29 17:17:06 +0000 |
commit | b16de3164ab0d9f55adc5575b46f96c7e92f26f6 (patch) | |
tree | 1fce37f4d8499fbf261ad6fb9bb58f3547e9d01f /doc/clzip.texi | |
parent | Adding upstream version 1.7. (diff) | |
Adding upstream version 1.8. (upstream/1.8)
Signed-off-by: Daniel Baumann <mail@daniel-baumann.ch>
Diffstat (limited to 'doc/clzip.texi')
-rw-r--r-- | doc/clzip.texi | 165 |
1 files changed, 119 insertions, 46 deletions
diff --git a/doc/clzip.texi b/doc/clzip.texi
index e2ca889..331d4eb 100644
--- a/doc/clzip.texi
+++ b/doc/clzip.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 7 July 2015
-@set VERSION 1.7
+@set UPDATED 13 May 2016
+@set VERSION 1.8
 
 @dircategory Data Compression
 @direntry
@@ -39,13 +39,14 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
 * Invoking clzip::   Command line interface
 * File format::      Detailed format of the compressed file
 * Algorithm::        How clzip compresses the data
+* Trailing data::    Extra data appended to the file
 * Examples::         A small tutorial with examples
 * Problems::         Reporting bugs
 * Concept index::    Index of concepts
 @end menu
 
 @sp 1
-Copyright @copyright{} 2010-2015 Antonio Diaz Diaz.
+Copyright @copyright{} 2010-2016 Antonio Diaz Diaz.
 
 This manual is free documentation: you have unlimited permission
 to copy, distribute and modify it.
@@ -78,7 +79,7 @@ program can repair bit-flip errors (one of the most common forms of data
 corruption) in lzip files, and provides data recovery capabilities,
 including error-checked merging of damaged copies of a file.
 @ifnothtml
-@ref{Data safety,,,lziprecover}.
+@xref{Data safety,,,lziprecover}.
 @end ifnothtml
 
 @item
@@ -101,14 +102,14 @@ corrupt byte near the beginning is a thing of the past.
 
 The member trailer stores the 32-bit CRC of the original data, the size
 of the original data and the size of the member. These values, together
-with the value remaining in the range decoder and the end-of-stream
-marker, provide a 4 factor integrity checking which guarantees that the
-decompressed version of the data is identical to the original. This
-guards against corruption of the compressed data, and against undetected
-bugs in clzip (hopefully very unlikely). The chances of data corruption
-going undetected are microscopic. Be aware, though, that the check
-occurs upon decompression, so it can only tell you that something is
-wrong. It can't help you recover the original uncompressed data.
+with the end-of-stream marker, provide a 3 factor integrity checking
+which guarantees that the decompressed version of the data is identical
+to the original. This guards against corruption of the compressed data,
+and against undetected bugs in clzip (hopefully very unlikely). The
+chances of data corruption going undetected are microscopic. Be aware,
+though, that the check occurs upon decompression, so it can only tell
+you that something is wrong. It can't help you recover the original
+uncompressed data.
 
 Clzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
@@ -157,14 +158,14 @@ or more compressed files. The result is the concatenation of the
 corresponding uncompressed files. Integrity testing of concatenated
 compressed files is also supported.
 
-Clzip can produce multi-member files and safely recover, with
+Clzip can produce multimember files and safely recover, with
 lziprecover, the undamaged members in case of file damage. Clzip can
 also split the compressed output in volumes of a given size, even when
 reading from standard input. This allows the direct creation of
 multivolume compressed tar archives.
 
 Clzip is able to compress and decompress streams of unlimited size by
-automatically creating multi-member output. The members so created are
+automatically creating multimember output. The members so created are
 large, about 2 PiB each.
 
 
@@ -181,6 +182,11 @@ The format for running clzip is:
 clzip [@var{options}] [@var{files}]
 @end example
 
+@noindent
+@samp{-} used as a @var{file} argument means standard input. It can be
+mixed with other @var{files} and is read just once, the first time it
+appears in the command line.
+
 Clzip supports the following options:
 
 @table @code
@@ -192,6 +198,13 @@ Print an informative help message describing the options and exit.
 @itemx --version
 Print the version number of clzip on the standard output and exit.
 
+@anchor{--trailing-error}
+@item -a
+@itemx --trailing-error
+Exit with error status 2 if any remaining input is detected after
+decompressing the last member. Such remaining input is usually trailing
+garbage that can be safely ignored. @xref{concat-example}.
+
 @item -b @var{bytes}
 @itemx --member-size=@var{bytes}
 Set the member size limit to @var{bytes}. A small member size may
@@ -200,13 +213,18 @@ range from 100 kB to 2 PiB. Defaults to 2 PiB.
 
 @item -c
 @itemx --stdout
-Compress or decompress to standard output. Needed when reading from a
-named pipe (fifo) or from a device. Use it to recover as much of the
-uncompressed data as possible when decompressing a corrupt file.
+Compress or decompress to standard output; keep input files unchanged.
+If compressing several files, each file is compressed independently.
+This option is needed when reading from a named pipe (fifo) or from a
+device. Use it also to recover as much of the uncompressed data as
+possible when decompressing a corrupt file.
 
 @item -d
 @itemx --decompress
-Decompress.
+Decompress the specified file(s). If a file does not exist or can't be
+opened, clzip continues decompressing the rest of the files. If a file
+fails to decompress, clzip exits immediately without decompressing the
+rest of the files.
 
 @item -f
 @itemx --force
@@ -242,11 +260,13 @@ Quiet operation. Suppress all messages.
 
 @item -s @var{bytes}
 @itemx --dictionary-size=@var{bytes}
-Set the dictionary size limit in bytes. Valid values range from 4 KiB to
-512 MiB. Clzip will use the smallest possible dictionary size for each
-file without exceeding this limit. Note that dictionary sizes are
-quantized. If the specified size does not match one of the valid sizes,
-it will be rounded upwards by adding up to (@var{bytes} / 16) to it.
+Set the dictionary size limit in bytes. Clzip will use the smallest
+possible dictionary size for each file without exceeding this limit.
+Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are
+interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note that
+dictionary sizes are quantized. If the specified size does not match one
+of the valid sizes, it will be rounded upwards by adding up to
+@w{(@var{bytes} / 8)} to it.
 
 For maximum compression you should use a dictionary size limit as large
 as possible, but keep in mind that the decompression memory requirement
@@ -257,7 +277,7 @@ is affected at compression time by the choice of dictionary size limit.
 Split the compressed output into several volume files with names
 @samp{original_name00001.lz}, @samp{original_name00002.lz}, etc, and set
 the volume size limit to @var{bytes}. Each volume is a complete, maybe
-multi-member, lzip file. A small volume size may degrade compression
+multimember, lzip file. A small volume size may degrade compression
 ratio, so use it only when needed. Valid values range from 100 kB to 4
 EiB.
 
@@ -265,7 +285,8 @@ EiB.
 @itemx --test
 Check integrity of the specified file(s), but don't decompress them.
 This really performs a trial decompression and throws away the result.
-Use it together with @samp{-v} to see information about the file.
+Use it together with @samp{-v} to see information about the file(s). If
+a file fails the test, clzip continues checking the rest of the files.
 
 @item -v
 @itemx --verbose
@@ -274,18 +295,19 @@ When compressing, show the compression ratio for each file processed. A
 second @samp{-v} shows the progress of compression.@*
 When decompressing or testing, further -v's (up to 4) increase the
 verbosity level, showing status, compression ratio, dictionary size,
-and trailer contents (CRC, data size, member size).
+trailer contents (CRC, data size, member size), and up to 6 bytes of
+trailing data (if any).
 
 @item -0 .. -9
 Set the compression parameters (dictionary size and match length limit)
-as shown in the table below. Note that @samp{-9} can be much slower than
-@samp{-0}. These options have no effect when decompressing.
+as shown in the table below. The default compression level is @samp{-6}.
+Note that @samp{-9} can be much slower than @samp{-0}. These options
+have no effect when decompressing.
 
 The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
-etc, you may need to use the @samp{--match-length} and
-@samp{--dictionary-size} options directly to achieve optimal
-performance.
+etc, you may need to use the @samp{--dictionary-size} and
+@samp{--match-length} options directly to achieve optimal performance.
 
 @multitable {Level} {Dictionary size} {Match length limit}
 @item Level @tab Dictionary size @tab Match length limit
@@ -364,14 +386,14 @@ additional information before, between, or after them.
 Each member has the following structure:
 @verbatim
 +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-| ID string | VN | DS | Lzma stream | CRC32 |   Data size   |  Member size  |
+| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
 +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 @end verbatim
 
 All multibyte values are stored in little endian order.
 
 @table @samp
-@item ID string
+@item ID string (the "magic" bytes)
 A four byte string, identifying the lzip format, with the value "LZIP"
 (0x4C, 0x5A, 0x49, 0x50).
 
@@ -388,8 +410,8 @@ from the base size to obtain the dictionary size.@*
 Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
 Valid values for dictionary size range from 4 KiB to 512 MiB.
 
-@item Lzma stream
-The lzma stream, finished by an end of stream marker. Uses default
+@item LZMA stream
+The LZMA stream, finished by an end of stream marker. Uses default
 values for encoder properties.
 @ifnothtml
 @xref{Stream format,,,lzip},
@@ -409,7 +431,7 @@ Size of the uncompressed original data.
 @item Member size (8 bytes)
 Total size of the member, including header and trailer. This field acts
 as a distributed index, allows the verification of stream integrity, and
-facilitates safe recovery of undamaged members from multi-member files.
+facilitates safe recovery of undamaged members from multimember files.
 
 @end table
 
@@ -480,6 +502,44 @@ range encoding), Igor Pavlov (for putting all the above together in
 LZMA), and Julian Seward (for bzip2's CLI).
 
 
+@node Trailing data
+@chapter Extra data appended to the file
+@cindex trailing data
+
+Sometimes extra data is found appended to a lzip file after the last
+member. Such trailing data may be:
+
+@itemize @bullet
+@item
+Padding added to make the file size a multiple of some block size, for
+example when writing to a tape.
+
+@item
+Garbage added by some not totally successful copy operation.
+
+@item
+Useful data added by the user; a cryptographically secure hash, a
+description of file contents, etc.
+
+@item
+Malicious data added to the file in order to make its total size and
+hash value (for a chosen hash) coincide with those of another file.
+
+@item
+In very rare cases, trailing data could be the corrupt header of another
+member. In multimember or concatenated files the probability of
+corruption happening in the magic bytes is 5 times smaller than the
+probability of getting a false positive caused by the corruption of the
+integrity information itself. Therefore it can be considered to be below
+the noise level.
+@end itemize
+
+Trailing data can be safely ignored in most cases. In some cases, like
+that of user-added data, it is expected to be ignored. In those cases
+where a file containing trailing data must be rejected, the option
+@samp{--trailing-error} can be used. @xref{--trailing-error}.
+
+
 @node Examples
 @chapter A small tutorial with examples
 @cindex examples
@@ -487,7 +547,7 @@ LZMA), and Julian Seward (for bzip2's CLI).
 WARNING! Even if clzip is bug-free, other causes may result in a corrupt
 compressed file (bugs in the system libraries, memory errors, etc).
 Therefore, if the data you are going to compress are important, give the
-@samp{--keep} option to clzip and do not remove the original file until
+@samp{--keep} option to clzip and don't remove the original file until
 you verify the compressed file with a command like
 @w{@samp{clzip -cd file.lz | cmp file -}}.
 
@@ -502,7 +562,7 @@ clzip -v file
 
 @sp 1
 @noindent
-Example 2: Like example 1 but the created @samp{file.lz} is multi-member
+Example 2: Like example 1 but the created @samp{file.lz} is multimember
 with a member size of 1 MiB. The compression ratio is not shown.
 
 @example
@@ -530,16 +590,29 @@ clzip -tv file.lz
 
 @sp 1
 @noindent
-Example 5: Compress a whole floppy in /dev/fd0 and send the output to
+Example 5: Compress a whole device in /dev/sdc and send the output to
 @samp{file.lz}.
 
 @example
-clzip -c /dev/fd0 > file.lz
+clzip -c /dev/sdc > file.lz
+@end example
+
+@sp 1
+@anchor{concat-example}
+@noindent
+Example 6: The right way of concatenating compressed files.
+@xref{Trailing data}.
+
+@example
+Don't do this
+  cat file1.lz file2.lz file3.lz | clzip -d
+Do this instead
+  clzip -cd file1.lz file2.lz file3.lz
 @end example
 
 @sp 1
 @noindent
-Example 6: Decompress @samp{file.lz} partially until 10 KiB of
+Example 7: Decompress @samp{file.lz} partially until 10 KiB of
 decompressed data are produced.
 
 @example
@@ -548,7 +621,7 @@ clzip -cd file.lz | dd bs=1024 count=10
 
 @sp 1
 @noindent
-Example 7: Decompress @samp{file.lz} partially from decompressed byte
+Example 8: Decompress @samp{file.lz} partially from decompressed byte
 10000 to decompressed byte 15000 (5000 bytes are produced).
 
 @example
@@ -557,7 +630,7 @@ clzip -cd file.lz | dd bs=1000 skip=10 count=5
 
 @sp 1
 @noindent
-Example 8: Create a multivolume compressed tar archive with a volume
+Example 9: Create a multivolume compressed tar archive with a volume
 size of 1440 KiB.
 
 @example
@@ -566,7 +639,7 @@ tar -c some_directory | clzip -S 1440KiB -o volume_name
 
 @sp 1
 @noindent
-Example 9: Extract a multivolume compressed tar archive.
+Example 10: Extract a multivolume compressed tar archive.
 
 @example
 clzip -cd volume_name*.lz | tar -xf -
@@ -574,8 +647,8 @@ clzip -cd volume_name*.lz | tar -xf -
 
 @sp 1
 @noindent
-Example 10: Create a multivolume compressed backup of a large database
-file with a volume size of 650 MB, where each volume is a multi-member
+Example 11: Create a multivolume compressed backup of a large database
+file with a volume size of 650 MB, where each volume is a multimember
 file with a member size of 32 MiB.
 
 @example
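
Reviewer note: the "File format" hunks above touch the description of the DS header byte and the member trailer. The sketch below is not part of the commit or of clzip; it only illustrates how those fields can be read. It assumes the DS coding from the lzip format description (bits 4-0 hold the base 2 logarithm of the base size, bits 7-5 the number of sixteenths of the base size to subtract) and a 20-byte little-endian trailer made of CRC32, data size and member size. The function names `decode_dictionary_size` and `parse_trailer` are hypothetical.

```python
import struct

# Sizes taken from the manual text: 4-byte CRC32, 8-byte data size,
# 8-byte member size, all little endian.
TRAILER_FORMAT = "<IQQ"
TRAILER_SIZE = struct.calcsize(TRAILER_FORMAT)  # 20 bytes

MIN_DICT_SIZE = 4 * 1024            # 4 KiB
MAX_DICT_SIZE = 512 * 1024 * 1024   # 512 MiB


def decode_dictionary_size(ds: int) -> int:
    """Decode the one-byte DS field into a dictionary size in bytes.

    Assumption: bits 4-0 are log2 of the base size and bits 7-5 are the
    number of wedges (base size / 16) subtracted from the base size.
    """
    if not 0 <= ds <= 0xFF:
        raise ValueError("DS must fit in one byte")
    base_log2 = ds & 0x1F
    wedges = (ds >> 5) & 0x07
    base = 1 << base_log2
    size = base - wedges * (base // 16)
    if not MIN_DICT_SIZE <= size <= MAX_DICT_SIZE:
        raise ValueError(f"dictionary size {size} out of range")
    return size


def parse_trailer(trailer: bytes) -> tuple:
    """Split a member trailer into (crc32, data_size, member_size)."""
    if len(trailer) != TRAILER_SIZE:
        raise ValueError("trailer must be exactly 20 bytes")
    return struct.unpack(TRAILER_FORMAT, trailer)


if __name__ == "__main__":
    # The manual's own example: 0xD3 = 2^19 - 6 * 2^15 = 320 KiB.
    print(decode_dictionary_size(0xD3) // 1024, "KiB")
```

Because the member size is stored in the trailer, a reader can walk backwards from the end of the last member; any bytes left after it are what the new "Trailing data" chapter describes.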