author | Daniel Baumann <mail@daniel-baumann.ch> | 2016-05-29 17:17:10 +0000 |
---|---|---|
committer | Daniel Baumann <mail@daniel-baumann.ch> | 2016-05-29 17:17:40 +0000 |
commit | 5fcb0d00fb1cdc480ceae6aff80d0ed3ddd602cf (patch) | |
tree | 986e0e9aa7aaa4c5402455481822c5edfba12566 /doc | |
parent | Releasing debian version 1.7-2. (diff) | |
Merging upstream version 1.8.
Signed-off-by: Daniel Baumann <mail@daniel-baumann.ch>
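
The upstream 1.8 documentation merged by this commit says that numeric arguments may carry a multiplier (k = kB = 10^3, Ki = KiB = 2^10, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30) and that dictionary sizes 12 to 29 are interpreted as powers of two, with valid sizes ranging from 4 KiB to 512 MiB. The Python sketch below only illustrates that rule; it is not clzip's parser. The function name `parse_dictionary_size`, the accepted suffix spellings, and the choice to apply the power-of-two rule only to bare numbers are assumptions, and the quantization (rounding upwards) described in the manual is not modeled.

```python
import re

# Multipliers named in the 1.8 man page text (assumed spellings).
MULTIPLIERS = {
    "k": 10**3, "Ki": 2**10,
    "M": 10**6, "Mi": 2**20,
    "G": 10**9, "Gi": 2**30,
}

MIN_DICT_SIZE = 4 * 2**10    # 4 KiB, per the manual
MAX_DICT_SIZE = 512 * 2**20  # 512 MiB, per the manual


def parse_dictionary_size(arg: str) -> int:
    """Hypothetical helper: turn a --dictionary-size argument into bytes."""
    match = re.fullmatch(r"(\d+)(?:(k|Ki|M|Mi|G|Gi)B?)?", arg)
    if match is None:
        raise ValueError(f"bad numeric argument: {arg!r}")
    value = int(match.group(1))
    suffix = match.group(2)
    # Assumption: only bare numbers 12..29 are treated as exponents.
    if suffix is None and 12 <= value <= 29:
        return 2**value
    size = value * MULTIPLIERS.get(suffix, 1)
    if not MIN_DICT_SIZE <= size <= MAX_DICT_SIZE:
        raise ValueError(f"dictionary size out of range: {arg!r}")
    return size
```

Under these assumptions, `parse_dictionary_size("12")` and `parse_dictionary_size("4KiB")` both give 4096 bytes, while a bare `30` is rejected because 30 bytes is below the 4 KiB minimum.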
Diffstat (limited to 'doc')
-rw-r--r-- | doc/clzip.1 | 20 |
-rw-r--r-- | doc/clzip.info | 189 |
-rw-r--r-- | doc/clzip.texi | 165 |
3 files changed, 259 insertions, 115 deletions
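
The File format section touched by the diff below gives the worked example 0xD3 = 2^19 - 6 * 2^15 = 320 KiB for the coded dictionary size byte (DS). As a hedged illustration of that arithmetic, the Python sketch below decodes a DS byte. The bit layout used here (bits 4-0 hold the base-2 logarithm of the base size, bits 7-5 hold the number of sixteenths of the base size to subtract) comes from the lzip format description and is not visible in the hunks shown on this page. The function name `decode_dictionary_size` is hypothetical.

```python
MIN_DICT_SIZE = 1 << 12  # 4 KiB
MAX_DICT_SIZE = 1 << 29  # 512 MiB


def decode_dictionary_size(ds: int) -> int:
    """Hypothetical helper: decode the DS header byte into a size in bytes."""
    if not 0 <= ds <= 0xFF:
        raise ValueError("DS must be a single byte")
    base_log2 = ds & 0x1F        # bits 4-0: log2 of the base size
    numerator = ds >> 5          # bits 7-5: sixteenths to subtract (0 to 7)
    base_size = 1 << base_log2
    size = base_size - numerator * (base_size >> 4)
    if not MIN_DICT_SIZE <= size <= MAX_DICT_SIZE:
        raise ValueError("dictionary size out of the 4 KiB to 512 MiB range")
    return size
```

With this layout, `decode_dictionary_size(0xD3)` reproduces the manual's example: a base of 2^19 minus six sixteenths of it, which is 327680 bytes (320 KiB).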
diff --git a/doc/clzip.1 b/doc/clzip.1
index 32b3bde..5dbb695 100644
--- a/doc/clzip.1
+++ b/doc/clzip.1
@@ -1,5 +1,5 @@
 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH CLZIP "1" "July 2015" "clzip 1.7" "User Commands"
+.TH CLZIP "1" "May 2016" "clzip 1.8" "User Commands"
 .SH NAME
 clzip \- reduces the size of files
 .SH SYNOPSIS
@@ -15,11 +15,14 @@ display this help and exit
 \fB\-V\fR, \fB\-\-version\fR
 output version information and exit
 .TP
+\fB\-a\fR, \fB\-\-trailing\-error\fR
+exit with error status if trailing data
+.TP
 \fB\-b\fR, \fB\-\-member\-size=\fR<bytes>
 set member size limit in bytes
 .TP
 \fB\-c\fR, \fB\-\-stdout\fR
-send output to standard output
+write to standard output, keep input files
 .TP
 \fB\-d\fR, \fB\-\-decompress\fR
 decompress
@@ -37,7 +40,7 @@ keep (don't delete) input files
 set match length limit in bytes [36]
 .TP
 \fB\-o\fR, \fB\-\-output=\fR<file>
-if reading stdin, place the output into <file>
+if reading standard input, write to <file>
 .TP
 \fB\-q\fR, \fB\-\-quiet\fR
 suppress all messages
@@ -63,13 +66,16 @@ alias for \fB\-0\fR
 \fB\-\-best\fR
 alias for \fB\-9\fR
 .PP
-If no file names are given, clzip compresses or decompresses
-from standard input to standard output.
+If no file names are given, or if a file is '\-', clzip compresses or
+decompresses from standard input to standard output.
 Numbers may be followed by a multiplier: k = kB = 10^3 = 1000,
 Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...
+Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12
+to 2^29 bytes.
+.PP
 The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
-etc, you may need to use the \fB\-\-match\-length\fR and \fB\-\-dictionary\-size\fR
+etc, you may need to use the \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR
 options directly to achieve optimal performance.
 .PP
 Exit status: 0 for a normal exit, 1 for environmental problems (file
@@ -81,7 +87,7 @@ Report bugs to lzip\-bug@nongnu.org
 .br
 Clzip home page: http://www.nongnu.org/lzip/clzip.html
 .SH COPYRIGHT
-Copyright \(co 2015 Antonio Diaz Diaz.
+Copyright \(co 2016 Antonio Diaz Diaz.
 License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
 .br
 This is free software: you are free to change and redistribute it.
diff --git a/doc/clzip.info b/doc/clzip.info
index 786d8c1..c590473 100644
--- a/doc/clzip.info
+++ b/doc/clzip.info
@@ -11,7 +11,7 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir)
 Clzip Manual
 ************

-This manual is for Clzip (version 1.7, 7 July 2015).
+This manual is for Clzip (version 1.8, 13 May 2016).

 * Menu:

@@ -19,12 +19,13 @@ This manual is for Clzip (version 1.7, 7 July 2015).
 * Invoking clzip:: Command line interface
 * File format:: Detailed format of the compressed file
 * Algorithm:: How clzip compresses the data
+* Trailing data:: Extra data appended to the file
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts


- Copyright (C) 2010-2015 Antonio Diaz Diaz.
+ Copyright (C) 2010-2016 Antonio Diaz Diaz.

 This manual is free documentation: you have unlimited permission to
 copy, distribute and modify it.
@@ -53,7 +54,7 @@ availability:
 recovery means. The lziprecover program can repair bit-flip errors
 (one of the most common forms of data corruption) in lzip files,
 and provides data recovery capabilities, including error-checked
- merging of damaged copies of a file. *note Data safety:
+ merging of damaged copies of a file. *Note Data safety:
 (lziprecover)Data safety.

 * The lzip format is as simple as possible (but not simpler). The
@@ -73,15 +74,14 @@ corrupt byte near the beginning is a thing of the past.

 The member trailer stores the 32-bit CRC of the original data, the
 size of the original data and the size of the member. These values,
-together with the value remaining in the range decoder and the
-end-of-stream marker, provide a 4 factor integrity checking which
-guarantees that the decompressed version of the data is identical to
-the original. This guards against corruption of the compressed data,
-and against undetected bugs in clzip (hopefully very unlikely). The
-chances of data corruption going undetected are microscopic. Be aware,
-though, that the check occurs upon decompression, so it can only tell
-you that something is wrong. It can't help you recover the original
-uncompressed data.
+together with the end-of-stream marker, provide a 3 factor integrity
+checking which guarantees that the decompressed version of the data is
+identical to the original. This guards against corruption of the
+compressed data, and against undetected bugs in clzip (hopefully very
+unlikely). The chances of data corruption going undetected are
+microscopic. Be aware, though, that the check occurs upon
+decompression, so it can only tell you that something is wrong. It
+can't help you recover the original uncompressed data.

 Clzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
@@ -128,14 +128,14 @@ two or more compressed files. The result is the concatenation of the
 corresponding uncompressed files. Integrity testing of concatenated
 compressed files is also supported.

- Clzip can produce multi-member files and safely recover, with
+ Clzip can produce multimember files and safely recover, with
 lziprecover, the undamaged members in case of file damage. Clzip can
 also split the compressed output in volumes of a given size, even when
 reading from standard input. This allows the direct creation of
 multivolume compressed tar archives.

 Clzip is able to compress and decompress streams of unlimited size by
-automatically creating multi-member output. The members so created are
+automatically creating multimember output. The members so created are
 large, about 2 PiB each.


@@ -148,6 +148,10 @@ The format for running clzip is:

 clzip [OPTIONS] [FILES]

+'-' used as a FILE argument means standard input. It can be mixed with
+other FILES and is read just once, the first time it appears in the
+command line.
+
 Clzip supports the following options:

 '-h'
@@ -158,6 +162,13 @@ The format for running clzip is:
 '--version'
 Print the version number of clzip on the standard output and exit.

+'-a'
+'--trailing-error'
+ Exit with error status 2 if any remaining input is detected after
+ decompressing the last member. Such remaining input is usually
+ trailing garbage that can be safely ignored. *Note
+ concat-example::.
+
 '-b BYTES'
 '--member-size=BYTES'
 Set the member size limit to BYTES. A small member size may
@@ -166,14 +177,19 @@ The format for running clzip is:

 '-c'
 '--stdout'
- Compress or decompress to standard output. Needed when reading
- from a named pipe (fifo) or from a device. Use it to recover as
- much of the uncompressed data as possible when decompressing a
- corrupt file.
+ Compress or decompress to standard output; keep input files
+ unchanged. If compressing several files, each file is compressed
+ independently. This option is needed when reading from a named
+ pipe (fifo) or from a device. Use it also to recover as much of
+ the uncompressed data as possible when decompressing a corrupt
+ file.

 '-d'
 '--decompress'
- Decompress.
+ Decompress the specified file(s). If a file does not exist or
+ can't be opened, clzip continues decompressing the rest of the
+ files. If a file fails to decompress, clzip exits immediately
+ without decompressing the rest of the files.

 '-f'
 '--force'
@@ -211,12 +227,13 @@ The format for running clzip is:

 '-s BYTES'
 '--dictionary-size=BYTES'
- Set the dictionary size limit in bytes. Valid values range from 4
- KiB to 512 MiB. Clzip will use the smallest possible dictionary
- size for each file without exceeding this limit. Note that
- dictionary sizes are quantized. If the specified size does not
- match one of the valid sizes, it will be rounded upwards by adding
- up to (BYTES / 16) to it.
+ Set the dictionary size limit in bytes. Clzip will use the smallest
+ possible dictionary size for each file without exceeding this
+ limit. Valid values range from 4 KiB to 512 MiB. Values 12 to 29
+ are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
+ that dictionary sizes are quantized. If the specified size does
+ not match one of the valid sizes, it will be rounded upwards by
+ adding up to (BYTES / 8) to it.

 For maximum compression you should use a dictionary size limit as
 large as possible, but keep in mind that the decompression memory
@@ -228,16 +245,17 @@ The format for running clzip is:
 Split the compressed output into several volume files with names
 'original_name00001.lz', 'original_name00002.lz', etc, and set the
 volume size limit to BYTES. Each volume is a complete, maybe
- multi-member, lzip file. A small volume size may degrade
- compression ratio, so use it only when needed. Valid values range
- from 100 kB to 4 EiB.
+ multimember, lzip file. A small volume size may degrade compression
+ ratio, so use it only when needed. Valid values range from 100 kB
+ to 4 EiB.

 '-t'
 '--test'
 Check integrity of the specified file(s), but don't decompress
 them. This really performs a trial decompression and throws away
 the result. Use it together with '-v' to see information about
- the file.
+ the file(s). If a file fails the test, clzip continues checking
+ the rest of the files.

 '-v'
 '--verbose'
@@ -246,18 +264,19 @@ The format for running clzip is:
 processed. A second '-v' shows the progress of compression.
 When decompressing or testing, further -v's (up to 4) increase the
 verbosity level, showing status, compression ratio, dictionary
- size, and trailer contents (CRC, data size, member size).
+ size, trailer contents (CRC, data size, member size), and up to 6
+ bytes of trailing data (if any).

 '-0 .. -9'
 Set the compression parameters (dictionary size and match length
- limit) as shown in the table below. Note that '-9' can be much
- slower than '-0'. These options have no effect when decompressing.
+ limit) as shown in the table below. The default compression level
+ is '-6'. Note that '-9' can be much slower than '-0'. These
+ options have no effect when decompressing.

 The bidimensional parameter space of LZMA can't be mapped to a
 linear scale optimal for all files. If your files are large, very
- repetitive, etc, you may need to use the '--match-length' and
- '--dictionary-size' options directly to achieve optimal
- performance.
+ repetitive, etc, you may need to use the '--dictionary-size' and
+ '--match-length' options directly to achieve optimal performance.

 Level Dictionary size Match length limit
 -0 64 KiB 16 bytes
@@ -327,12 +346,12 @@ additional information before, between, or after them.
 Each member has the following structure:

 +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-| ID string | VN | DS | Lzma stream | CRC32 | Data size | Member size |
+| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
 +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 All multibyte values are stored in little endian order.

-'ID string'
+'ID string (the "magic" bytes)'
 A four byte string, identifying the lzip format, with the value
 "LZIP" (0x4C, 0x5A, 0x49, 0x50).

@@ -350,8 +369,8 @@ additional information before, between, or after them.
 Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
 Valid values for dictionary size range from 4 KiB to 512 MiB.

-'Lzma stream'
- The lzma stream, finished by an end of stream marker. Uses default
+'LZMA stream'
+ The LZMA stream, finished by an end of stream marker. Uses default
 values for encoder properties. *Note Stream format: (lzip)Stream
 format, for a complete description.

@@ -365,11 +384,11 @@ additional information before, between, or after them.
 Total size of the member, including header and trailer. This field
 acts as a distributed index, allows the verification of stream
 integrity, and facilitates safe recovery of undamaged members from
- multi-member files.
+ multimember files.


-File: clzip.info, Node: Algorithm, Next: Examples, Prev: File format, Up: Top
+File: clzip.info, Node: Algorithm, Next: Trailing data, Prev: File format, Up: Top

 4 Algorithm
 ***********

@@ -435,15 +454,48 @@ range encoding), Igor Pavlov (for putting all the above together in
 LZMA), and Julian Seward (for bzip2's CLI).


-File: clzip.info, Node: Examples, Next: Problems, Prev: Algorithm, Up: Top
+File: clzip.info, Node: Trailing data, Next: Examples, Prev: Algorithm, Up: Top
+
+5 Extra data appended to the file
+*********************************
+
+Sometimes extra data is found appended to a lzip file after the last
+member. Such trailing data may be:
+
+ * Padding added to make the file size a multiple of some block size,
+ for example when writing to a tape.
+
+ * Garbage added by some not totally successful copy operation.
+
+ * Useful data added by the user; a cryptographically secure hash, a
+ description of file contents, etc.
+
+ * Malicious data added to the file in order to make its total size
+ and hash value (for a chosen hash) coincide with those of another
+ file.

-5 A small tutorial with examples
+ * In very rare cases, trailing data could be the corrupt header of
+ another member. In multimember or concatenated files the
+ probability of corruption happening in the magic bytes is 5 times
+ smaller than the probability of getting a false positive caused by
+ the corruption of the integrity information itself. Therefore it
+ can be considered to be below the noise level.
+
+ Trailing data can be safely ignored in most cases. In some cases,
+like that of user-added data, it is expected to be ignored. In those
+cases where a file containing trailing data must be rejected, the option
+'--trailing-error' can be used. *Note --trailing-error::.
+
+
+File: clzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top
+
+6 A small tutorial with examples
 ********************************

 WARNING! Even if clzip is bug-free, other causes may result in a corrupt
 compressed file (bugs in the system libraries, memory errors, etc).
 Therefore, if the data you are going to compress are important, give the
-'--keep' option to clzip and do not remove the original file until you
+'--keep' option to clzip and don't remove the original file until you
 verify the compressed file with a command like 'clzip -cd file.lz | cmp
 file -'.

@@ -454,8 +506,8 @@ and show the compression ratio.
 clzip -v file


-Example 2: Like example 1 but the created 'file.lz' is multi-member
-with a member size of 1 MiB. The compression ratio is not shown.
+Example 2: Like example 1 but the created 'file.lz' is multimember with
+a member size of 1 MiB. The compression ratio is not shown.

 clzip -b 1MiB file

@@ -472,37 +524,46 @@ show status.
 clzip -tv file.lz


-Example 5: Compress a whole floppy in /dev/fd0 and send the output to
+Example 5: Compress a whole device in /dev/sdc and send the output to
 'file.lz'.

- clzip -c /dev/fd0 > file.lz
+ clzip -c /dev/sdc > file.lz
+
+
+Example 6: The right way of concatenating compressed files. *Note
+Trailing data::.
+
+ Don't do this
+ cat file1.lz file2.lz file3.lz | clzip -d
+ Do this instead
+ clzip -cd file1.lz file2.lz file3.lz


-Example 6: Decompress 'file.lz' partially until 10 KiB of decompressed
+Example 7: Decompress 'file.lz' partially until 10 KiB of decompressed
 data are produced.

 clzip -cd file.lz | dd bs=1024 count=10


-Example 7: Decompress 'file.lz' partially from decompressed byte 10000
+Example 8: Decompress 'file.lz' partially from decompressed byte 10000
 to decompressed byte 15000 (5000 bytes are produced).

 clzip -cd file.lz | dd bs=1000 skip=10 count=5


-Example 8: Create a multivolume compressed tar archive with a volume
+Example 9: Create a multivolume compressed tar archive with a volume
 size of 1440 KiB.

 tar -c some_directory | clzip -S 1440KiB -o volume_name


-Example 9: Extract a multivolume compressed tar archive.
+Example 10: Extract a multivolume compressed tar archive.

 clzip -cd volume_name*.lz | tar -xf -


-Example 10: Create a multivolume compressed backup of a large database
-file with a volume size of 650 MB, where each volume is a multi-member
+Example 11: Create a multivolume compressed backup of a large database
+file with a volume size of 650 MB, where each volume is a multimember
 file with a member size of 32 MiB.

 clzip -b 32MiB -S 650MB big_db
@@ -510,7 +571,7 @@ file with a member size of 32 MiB.

 File: clzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top

-6 Reporting bugs
+7 Reporting bugs
 ****************

 There are probably bugs in clzip. There are certainly errors and
@@ -539,6 +600,7 @@ Concept index
 * introduction: Introduction. (line 6)
 * invoking: Invoking clzip. (line 6)
 * options: Invoking clzip. (line 6)
+* trailing data: Trailing data. (line 6)
 * usage: Invoking clzip. (line 6)
 * version: Invoking clzip. (line 6)

@@ -546,13 +608,16 @@ Concept index

 Tag Table:
 Node: Top210
-Node: Introduction893
-Node: Invoking clzip6152
-Node: File format11705
-Node: Algorithm14108
-Node: Examples16933
-Node: Problems18900
-Node: Concept index19426
+Node: Introduction952
+Node: Invoking clzip6164
+Ref: --trailing-error6730
+Node: File format12728
+Node: Algorithm15150
+Node: Trailing data17980
+Node: Examples19355
+Ref: concat-example20537
+Node: Problems21544
+Node: Concept index22070

 End Tag Table

diff --git a/doc/clzip.texi b/doc/clzip.texi
index e2ca889..331d4eb 100644
--- a/doc/clzip.texi
+++ b/doc/clzip.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header

-@set UPDATED 7 July 2015
-@set VERSION 1.7
+@set UPDATED 13 May 2016
+@set VERSION 1.8

 @dircategory Data Compression
 @direntry
@@ -39,13 +39,14 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
 * Invoking clzip:: Command line interface
 * File format:: Detailed format of the compressed file
 * Algorithm:: How clzip compresses the data
+* Trailing data:: Extra data appended to the file
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
 @end menu

 @sp 1
-Copyright @copyright{} 2010-2015 Antonio Diaz Diaz.
+Copyright @copyright{} 2010-2016 Antonio Diaz Diaz.

 This manual is free documentation: you have unlimited permission to
 copy, distribute and modify it.
@@ -78,7 +79,7 @@ program can repair bit-flip errors (one of the most common forms of data
 corruption) in lzip files, and provides data recovery capabilities,
 including error-checked merging of damaged copies of a file.
 @ifnothtml
-@ref{Data safety,,,lziprecover}.
+@xref{Data safety,,,lziprecover}.
 @end ifnothtml

 @item
@@ -101,14 +102,14 @@ corrupt byte near the beginning is a thing of the past.

 The member trailer stores the 32-bit CRC of the original data, the size
 of the original data and the size of the member. These values together
-with the value remaining in the range decoder and the end-of-stream
-marker, provide a 4 factor integrity checking which guarantees that the
-decompressed version of the data is identical to the original. This
-guards against corruption of the compressed data, and against undetected
-bugs in clzip (hopefully very unlikely). The chances of data corruption
-going undetected are microscopic. Be aware, though, that the check
-occurs upon decompression, so it can only tell you that something is
-wrong. It can't help you recover the original uncompressed data.
+with the end-of-stream marker, provide a 3 factor integrity checking
+which guarantees that the decompressed version of the data is identical
+to the original. This guards against corruption of the compressed data,
+and against undetected bugs in clzip (hopefully very unlikely). The
+chances of data corruption going undetected are microscopic. Be aware,
+though, that the check occurs upon decompression, so it can only tell
+you that something is wrong. It can't help you recover the original
+uncompressed data.

 Clzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
@@ -157,14 +158,14 @@ or more compressed files. The result is the concatenation of the
 corresponding uncompressed files. Integrity testing of concatenated
 compressed files is also supported.

-Clzip can produce multi-member files and safely recover, with
+Clzip can produce multimember files and safely recover, with
 lziprecover, the undamaged members in case of file damage. Clzip can
 also split the compressed output in volumes of a given size, even when
 reading from standard input. This allows the direct creation of
 multivolume compressed tar archives.

 Clzip is able to compress and decompress streams of unlimited size by
-automatically creating multi-member output. The members so created are
+automatically creating multimember output. The members so created are
 large, about 2 PiB each.


@@ -181,6 +182,11 @@ The format for running clzip is:
 clzip [@var{options}] [@var{files}]
 @end example

+@noindent
+@samp{-} used as a @var{file} argument means standard input. It can be
+mixed with other @var{files} and is read just once, the first time it
+appears in the command line.
+
 Clzip supports the following options:

 @table @code
@@ -192,6 +198,13 @@ Print an informative help message describing the options and exit.
 @itemx --version
 Print the version number of clzip on the standard output and exit.

+@anchor{--trailing-error}
+@item -a
+@itemx --trailing-error
+Exit with error status 2 if any remaining input is detected after
+decompressing the last member. Such remaining input is usually trailing
+garbage that can be safely ignored. @xref{concat-example}.
+
 @item -b @var{bytes}
 @itemx --member-size=@var{bytes}
 Set the member size limit to @var{bytes}. A small member size may
@@ -200,13 +213,18 @@ range from 100 kB to 2 PiB. Defaults to 2 PiB.

 @item -c
 @itemx --stdout
-Compress or decompress to standard output. Needed when reading from a
-named pipe (fifo) or from a device. Use it to recover as much of the
-uncompressed data as possible when decompressing a corrupt file.
+Compress or decompress to standard output; keep input files unchanged.
+If compressing several files, each file is compressed independently.
+This option is needed when reading from a named pipe (fifo) or from a
+device. Use it also to recover as much of the uncompressed data as
+possible when decompressing a corrupt file.

 @item -d
 @itemx --decompress
-Decompress.
+Decompress the specified file(s). If a file does not exist or can't be
+opened, clzip continues decompressing the rest of the files. If a file
+fails to decompress, clzip exits immediately without decompressing the
+rest of the files.

 @item -f
 @itemx --force
@@ -242,11 +260,13 @@ Quiet operation. Suppress all messages.

 @item -s @var{bytes}
 @itemx --dictionary-size=@var{bytes}
-Set the dictionary size limit in bytes. Valid values range from 4 KiB to
-512 MiB. Clzip will use the smallest possible dictionary size for each
-file without exceeding this limit. Note that dictionary sizes are
-quantized. If the specified size does not match one of the valid sizes,
-it will be rounded upwards by adding up to (@var{bytes} / 16) to it.
+Set the dictionary size limit in bytes. Clzip will use the smallest
+possible dictionary size for each file without exceeding this limit.
+Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are
+interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note that
+dictionary sizes are quantized. If the specified size does not match one
+of the valid sizes, it will be rounded upwards by adding up to
+@w{(@var{bytes} / 8)} to it.

 For maximum compression you should use a dictionary size limit as large
 as possible, but keep in mind that the decompression memory requirement
@@ -257,7 +277,7 @@ is affected at compression time by the choice of dictionary size limit.
 Split the compressed output into several volume files with names
 @samp{original_name00001.lz}, @samp{original_name00002.lz}, etc, and set
 the volume size limit to @var{bytes}. Each volume is a complete, maybe
-multi-member, lzip file. A small volume size may degrade compression
+multimember, lzip file. A small volume size may degrade compression
 ratio, so use it only when needed. Valid values range from 100 kB to 4
 EiB.

@@ -265,7 +285,8 @@ EiB.
 @itemx --test
 Check integrity of the specified file(s), but don't decompress them.
 This really performs a trial decompression and throws away the result.
-Use it together with @samp{-v} to see information about the file.
+Use it together with @samp{-v} to see information about the file(s). If
+a file fails the test, clzip continues checking the rest of the files.

 @item -v
 @itemx --verbose
@@ -274,18 +295,19 @@ When compressing, show the compression ratio for each file processed. A
 second @samp{-v} shows the progress of compression.@*
 When decompressing or testing, further -v's (up to 4) increase the
 verbosity level, showing status, compression ratio, dictionary size,
-and trailer contents (CRC, data size, member size).
+trailer contents (CRC, data size, member size), and up to 6 bytes of
+trailing data (if any).

 @item -0 .. -9
 Set the compression parameters (dictionary size and match length limit)
-as shown in the table below. Note that @samp{-9} can be much slower than
-@samp{-0}. These options have no effect when decompressing.
+as shown in the table below. The default compression level is @samp{-6}.
+Note that @samp{-9} can be much slower than @samp{-0}. These options
+have no effect when decompressing.

 The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
-etc, you may need to use the @samp{--match-length} and
-@samp{--dictionary-size} options directly to achieve optimal
-performance.
+etc, you may need to use the @samp{--dictionary-size} and
+@samp{--match-length} options directly to achieve optimal performance.

 @multitable {Level} {Dictionary size} {Match length limit}
 @item Level @tab Dictionary size @tab Match length limit
@@ -364,14 +386,14 @@ additional information before, between, or after them.
 Each member has the following structure:
 @verbatim
 +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-| ID string | VN | DS | Lzma stream | CRC32 | Data size | Member size |
+| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
 +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 @end verbatim

 All multibyte values are stored in little endian order.

 @table @samp
-@item ID string
+@item ID string (the "magic" bytes)
 A four byte string, identifying the lzip format, with the value "LZIP"
 (0x4C, 0x5A, 0x49, 0x50).

@@ -388,8 +410,8 @@ from the base size to obtain the dictionary size.@*
 Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
 Valid values for dictionary size range from 4 KiB to 512 MiB.

-@item Lzma stream
-The lzma stream, finished by an end of stream marker. Uses default
+@item LZMA stream
+The LZMA stream, finished by an end of stream marker. Uses default
 values for encoder properties.
 @ifnothtml
 @xref{Stream format,,,lzip},
@@ -409,7 +431,7 @@ Size of the uncompressed original data.
 @item Member size (8 bytes)
 Total size of the member, including header and trailer. This field acts
 as a distributed index, allows the verification of stream integrity, and
-facilitates safe recovery of undamaged members from multi-member files.
+facilitates safe recovery of undamaged members from multimember files.
 @end table


@@ -480,6 +502,44 @@ range encoding), Igor Pavlov (for putting all the above together in
 LZMA), and Julian Seward (for bzip2's CLI).


+@node Trailing data
+@chapter Extra data appended to the file
+@cindex trailing data
+
+Sometimes extra data is found appended to a lzip file after the last
+member. Such trailing data may be:
+
+@itemize @bullet
+@item
+Padding added to make the file size a multiple of some block size, for
+example when writing to a tape.
+
+@item
+Garbage added by some not totally successful copy operation.
+
+@item
+Useful data added by the user; a cryptographically secure hash, a
+description of file contents, etc.
+
+@item
+Malicious data added to the file in order to make its total size and
+hash value (for a chosen hash) coincide with those of another file.
+
+@item
+In very rare cases, trailing data could be the corrupt header of another
+member. In multimember or concatenated files the probability of
+corruption happening in the magic bytes is 5 times smaller than the
+probability of getting a false positive caused by the corruption of the
+integrity information itself. Therefore it can be considered to be below
+the noise level.
+@end itemize
+
+Trailing data can be safely ignored in most cases. In some cases, like
+that of user-added data, it is expected to be ignored. In those cases
+where a file containing trailing data must be rejected, the option
+@samp{--trailing-error} can be used. @xref{--trailing-error}.
+
+
 @node Examples
 @chapter A small tutorial with examples
 @cindex examples
@@ -487,7 +547,7 @@ LZMA), and Julian Seward (for bzip2's CLI).
 WARNING! Even if clzip is bug-free, other causes may result in a corrupt
 compressed file (bugs in the system libraries, memory errors, etc).
 Therefore, if the data you are going to compress are important, give the
-@samp{--keep} option to clzip and do not remove the original file until
+@samp{--keep} option to clzip and don't remove the original file until
 you verify the compressed file with a command like
 @w{@samp{clzip -cd file.lz | cmp file -}}.

@@ -502,7 +562,7 @@ clzip -v file

 @sp 1
 @noindent
-Example 2: Like example 1 but the created @samp{file.lz} is multi-member
+Example 2: Like example 1 but the created @samp{file.lz} is multimember
 with a member size of 1 MiB. The compression ratio is not shown.

 @example
@@ -530,16 +590,29 @@ clzip -tv file.lz

 @sp 1
 @noindent
-Example 5: Compress a whole floppy in /dev/fd0 and send the output to
+Example 5: Compress a whole device in /dev/sdc and send the output to
 @samp{file.lz}.

 @example
-clzip -c /dev/fd0 > file.lz
+clzip -c /dev/sdc > file.lz
+@end example
+
+@sp 1
+@anchor{concat-example}
+@noindent
+Example 6: The right way of concatenating compressed files.
+@xref{Trailing data}.
+
+@example
+Don't do this
+ cat file1.lz file2.lz file3.lz | clzip -d
+Do this instead
+ clzip -cd file1.lz file2.lz file3.lz
 @end example

 @sp 1
 @noindent
-Example 6: Decompress @samp{file.lz} partially until 10 KiB of
+Example 7: Decompress @samp{file.lz} partially until 10 KiB of
 decompressed data are produced.

 @example
@@ -548,7 +621,7 @@ clzip -cd file.lz | dd bs=1024 count=10

 @sp 1
 @noindent
-Example 7: Decompress @samp{file.lz} partially from decompressed byte
+Example 8: Decompress @samp{file.lz} partially from decompressed byte
 10000 to decompressed byte 15000 (5000 bytes are produced).

 @example
@@ -557,7 +630,7 @@ clzip -cd file.lz | dd bs=1000 skip=10 count=5

 @sp 1
 @noindent
-Example 8: Create a multivolume compressed tar archive with a volume
+Example 9: Create a multivolume compressed tar archive with a volume
 size of 1440 KiB.

 @example
@@ -566,7 +639,7 @@ tar -c some_directory | clzip -S 1440KiB -o volume_name

 @sp 1
 @noindent
-Example 9: Extract a multivolume compressed tar archive.
+Example 10: Extract a multivolume compressed tar archive.

 @example
 clzip -cd volume_name*.lz | tar -xf -
@@ -574,8 +647,8 @@ clzip -cd volume_name*.lz | tar -xf -

 @sp 1
 @noindent
-Example 10: Create a multivolume compressed backup of a large database
-file with a volume size of 650 MB, where each volume is a multi-member
+Example 11: Create a multivolume compressed backup of a large database
+file with a volume size of 650 MB, where each volume is a multimember
 file with a member size of 32 MiB.

 @example