diff options
Diffstat (limited to '')
-rw-r--r-- | doc/clzip.info | 189 |
1 files changed, 127 insertions, 62 deletions
diff --git a/doc/clzip.info b/doc/clzip.info index 786d8c1..c590473 100644 --- a/doc/clzip.info +++ b/doc/clzip.info @@ -11,7 +11,7 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir) Clzip Manual ************ -This manual is for Clzip (version 1.7, 7 July 2015). +This manual is for Clzip (version 1.8, 13 May 2016). * Menu: @@ -19,12 +19,13 @@ This manual is for Clzip (version 1.7, 7 July 2015). * Invoking clzip:: Command line interface * File format:: Detailed format of the compressed file * Algorithm:: How clzip compresses the data +* Trailing data:: Extra data appended to the file * Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts - Copyright (C) 2010-2015 Antonio Diaz Diaz. + Copyright (C) 2010-2016 Antonio Diaz Diaz. This manual is free documentation: you have unlimited permission to copy, distribute and modify it. @@ -53,7 +54,7 @@ availability: recovery means. The lziprecover program can repair bit-flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked - merging of damaged copies of a file. *note Data safety: + merging of damaged copies of a file. *Note Data safety: (lziprecover)Data safety. * The lzip format is as simple as possible (but not simpler). The @@ -73,15 +74,14 @@ corrupt byte near the beginning is a thing of the past. The member trailer stores the 32-bit CRC of the original data, the size of the original data and the size of the member. These values, -together with the value remaining in the range decoder and the -end-of-stream marker, provide a 4 factor integrity checking which -guarantees that the decompressed version of the data is identical to -the original. This guards against corruption of the compressed data, -and against undetected bugs in clzip (hopefully very unlikely). The -chances of data corruption going undetected are microscopic. Be aware, -though, that the check occurs upon decompression, so it can only tell -you that something is wrong. It can't help you recover the original -uncompressed data. +together with the end-of-stream marker, provide a 3 factor integrity +checking which guarantees that the decompressed version of the data is +identical to the original. This guards against corruption of the +compressed data, and against undetected bugs in clzip (hopefully very +unlikely). The chances of data corruption going undetected are +microscopic. Be aware, though, that the check occurs upon +decompression, so it can only tell you that something is wrong. It +can't help you recover the original uncompressed data. Clzip uses the same well-defined exit status values used by lzip and bzip2, which makes it safer than compressors returning ambiguous warning @@ -128,14 +128,14 @@ two or more compressed files. The result is the concatenation of the corresponding uncompressed files. Integrity testing of concatenated compressed files is also supported. - Clzip can produce multi-member files and safely recover, with + Clzip can produce multimember files and safely recover, with lziprecover, the undamaged members in case of file damage. Clzip can also split the compressed output in volumes of a given size, even when reading from standard input. This allows the direct creation of multivolume compressed tar archives. Clzip is able to compress and decompress streams of unlimited size by -automatically creating multi-member output. The members so created are +automatically creating multimember output. The members so created are large, about 2 PiB each. @@ -148,6 +148,10 @@ The format for running clzip is: clzip [OPTIONS] [FILES] +'-' used as a FILE argument means standard input. It can be mixed with +other FILES and is read just once, the first time it appears in the +command line. + Clzip supports the following options: '-h' @@ -158,6 +162,13 @@ The format for running clzip is: '--version' Print the version number of clzip on the standard output and exit. +'-a' +'--trailing-error' + Exit with error status 2 if any remaining input is detected after + decompressing the last member. Such remaining input is usually + trailing garbage that can be safely ignored. *Note + concat-example::. + '-b BYTES' '--member-size=BYTES' Set the member size limit to BYTES. A small member size may @@ -166,14 +177,19 @@ The format for running clzip is: '-c' '--stdout' - Compress or decompress to standard output. Needed when reading - from a named pipe (fifo) or from a device. Use it to recover as - much of the uncompressed data as possible when decompressing a - corrupt file. + Compress or decompress to standard output; keep input files + unchanged. If compressing several files, each file is compressed + independently. This option is needed when reading from a named + pipe (fifo) or from a device. Use it also to recover as much of + the uncompressed data as possible when decompressing a corrupt + file. '-d' '--decompress' - Decompress. + Decompress the specified file(s). If a file does not exist or + can't be opened, clzip continues decompressing the rest of the + files. If a file fails to decompress, clzip exits immediately + without decompressing the rest of the files. '-f' '--force' @@ -211,12 +227,13 @@ The format for running clzip is: '-s BYTES' '--dictionary-size=BYTES' - Set the dictionary size limit in bytes. Valid values range from 4 - KiB to 512 MiB. Clzip will use the smallest possible dictionary - size for each file without exceeding this limit. Note that - dictionary sizes are quantized. If the specified size does not - match one of the valid sizes, it will be rounded upwards by adding - up to (BYTES / 16) to it. + Set the dictionary size limit in bytes. Clzip will use the smallest + possible dictionary size for each file without exceeding this + limit. Valid values range from 4 KiB to 512 MiB. Values 12 to 29 + are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note + that dictionary sizes are quantized. If the specified size does + not match one of the valid sizes, it will be rounded upwards by + adding up to (BYTES / 8) to it. For maximum compression you should use a dictionary size limit as large as possible, but keep in mind that the decompression memory @@ -228,16 +245,17 @@ The format for running clzip is: Split the compressed output into several volume files with names 'original_name00001.lz', 'original_name00002.lz', etc, and set the volume size limit to BYTES. Each volume is a complete, maybe - multi-member, lzip file. A small volume size may degrade - compression ratio, so use it only when needed. Valid values range - from 100 kB to 4 EiB. + multimember, lzip file. A small volume size may degrade compression + ratio, so use it only when needed. Valid values range from 100 kB + to 4 EiB. '-t' '--test' Check integrity of the specified file(s), but don't decompress them. This really performs a trial decompression and throws away the result. Use it together with '-v' to see information about - the file. + the file(s). If a file fails the test, clzip continues checking + the rest of the files. '-v' '--verbose' @@ -246,18 +264,19 @@ The format for running clzip is: processed. A second '-v' shows the progress of compression. When decompressing or testing, further -v's (up to 4) increase the verbosity level, showing status, compression ratio, dictionary - size, and trailer contents (CRC, data size, member size). + size, trailer contents (CRC, data size, member size), and up to 6 + bytes of trailing data (if any). '-0 .. -9' Set the compression parameters (dictionary size and match length - limit) as shown in the table below. Note that '-9' can be much - slower than '-0'. These options have no effect when decompressing. + limit) as shown in the table below. The default compression level + is '-6'. Note that '-9' can be much slower than '-0'. These + options have no effect when decompressing. The bidimensional parameter space of LZMA can't be mapped to a linear scale optimal for all files. If your files are large, very - repetitive, etc, you may need to use the '--match-length' and - '--dictionary-size' options directly to achieve optimal - performance. + repetitive, etc, you may need to use the '--dictionary-size' and + '--match-length' options directly to achieve optimal performance. Level Dictionary size Match length limit -0 64 KiB 16 bytes @@ -327,12 +346,12 @@ additional information before, between, or after them. Each member has the following structure: +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -| ID string | VN | DS | Lzma stream | CRC32 | Data size | Member size | +| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ All multibyte values are stored in little endian order. -'ID string' +'ID string (the "magic" bytes)' A four byte string, identifying the lzip format, with the value "LZIP" (0x4C, 0x5A, 0x49, 0x50). @@ -350,8 +369,8 @@ additional information before, between, or after them. Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB Valid values for dictionary size range from 4 KiB to 512 MiB. -'Lzma stream' - The lzma stream, finished by an end of stream marker. Uses default +'LZMA stream' + The LZMA stream, finished by an end of stream marker. Uses default values for encoder properties. *Note Stream format: (lzip)Stream format, for a complete description. @@ -365,11 +384,11 @@ additional information before, between, or after them. Total size of the member, including header and trailer. This field acts as a distributed index, allows the verification of stream integrity, and facilitates safe recovery of undamaged members from - multi-member files. + multimember files. -File: clzip.info, Node: Algorithm, Next: Examples, Prev: File format, Up: Top +File: clzip.info, Node: Algorithm, Next: Trailing data, Prev: File format, Up: Top 4 Algorithm *********** @@ -435,15 +454,48 @@ range encoding), Igor Pavlov (for putting all the above together in LZMA), and Julian Seward (for bzip2's CLI). -File: clzip.info, Node: Examples, Next: Problems, Prev: Algorithm, Up: Top +File: clzip.info, Node: Trailing data, Next: Examples, Prev: Algorithm, Up: Top + +5 Extra data appended to the file +********************************* + +Sometimes extra data is found appended to a lzip file after the last +member. Such trailing data may be: + + * Padding added to make the file size a multiple of some block size, + for example when writing to a tape. + + * Garbage added by some not totally successful copy operation. + + * Useful data added by the user; a cryptographically secure hash, a + description of file contents, etc. + + * Malicious data added to the file in order to make its total size + and hash value (for a chosen hash) coincide with those of another + file. -5 A small tutorial with examples + * In very rare cases, trailing data could be the corrupt header of + another member. In multimember or concatenated files the + probability of corruption happening in the magic bytes is 5 times + smaller than the probability of getting a false positive caused by + the corruption of the integrity information itself. Therefore it + can be considered to be below the noise level. + + Trailing data can be safely ignored in most cases. In some cases, +like that of user-added data, it is expected to be ignored. In those +cases where a file containing trailing data must be rejected, the option +'--trailing-error' can be used. *Note --trailing-error::. + + +File: clzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top + +6 A small tutorial with examples ******************************** WARNING! Even if clzip is bug-free, other causes may result in a corrupt compressed file (bugs in the system libraries, memory errors, etc). Therefore, if the data you are going to compress are important, give the -'--keep' option to clzip and do not remove the original file until you +'--keep' option to clzip and don't remove the original file until you verify the compressed file with a command like 'clzip -cd file.lz | cmp file -'. @@ -454,8 +506,8 @@ and show the compression ratio. clzip -v file -Example 2: Like example 1 but the created 'file.lz' is multi-member -with a member size of 1 MiB. The compression ratio is not shown. +Example 2: Like example 1 but the created 'file.lz' is multimember with +a member size of 1 MiB. The compression ratio is not shown. clzip -b 1MiB file @@ -472,37 +524,46 @@ show status. clzip -tv file.lz -Example 5: Compress a whole floppy in /dev/fd0 and send the output to +Example 5: Compress a whole device in /dev/sdc and send the output to 'file.lz'. - clzip -c /dev/fd0 > file.lz + clzip -c /dev/sdc > file.lz + + +Example 6: The right way of concatenating compressed files. *Note +Trailing data::. + + Don't do this + cat file1.lz file2.lz file3.lz | clzip -d + Do this instead + clzip -cd file1.lz file2.lz file3.lz -Example 6: Decompress 'file.lz' partially until 10 KiB of decompressed +Example 7: Decompress 'file.lz' partially until 10 KiB of decompressed data are produced. clzip -cd file.lz | dd bs=1024 count=10 -Example 7: Decompress 'file.lz' partially from decompressed byte 10000 +Example 8: Decompress 'file.lz' partially from decompressed byte 10000 to decompressed byte 15000 (5000 bytes are produced). clzip -cd file.lz | dd bs=1000 skip=10 count=5 -Example 8: Create a multivolume compressed tar archive with a volume +Example 9: Create a multivolume compressed tar archive with a volume size of 1440 KiB. tar -c some_directory | clzip -S 1440KiB -o volume_name -Example 9: Extract a multivolume compressed tar archive. +Example 10: Extract a multivolume compressed tar archive. clzip -cd volume_name*.lz | tar -xf - -Example 10: Create a multivolume compressed backup of a large database -file with a volume size of 650 MB, where each volume is a multi-member +Example 11: Create a multivolume compressed backup of a large database +file with a volume size of 650 MB, where each volume is a multimember file with a member size of 32 MiB. clzip -b 32MiB -S 650MB big_db @@ -510,7 +571,7 @@ file with a member size of 32 MiB. File: clzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top -6 Reporting bugs +7 Reporting bugs **************** There are probably bugs in clzip. There are certainly errors and @@ -539,6 +600,7 @@ Concept index * introduction: Introduction. (line 6) * invoking: Invoking clzip. (line 6) * options: Invoking clzip. (line 6) +* trailing data: Trailing data. (line 6) * usage: Invoking clzip. (line 6) * version: Invoking clzip. (line 6) @@ -546,13 +608,16 @@ Concept index Tag Table: Node: Top210 -Node: Introduction893 -Node: Invoking clzip6152 -Node: File format11705 -Node: Algorithm14108 -Node: Examples16933 -Node: Problems18900 -Node: Concept index19426 +Node: Introduction952 +Node: Invoking clzip6164 +Ref: --trailing-error6730 +Node: File format12728 +Node: Algorithm15150 +Node: Trailing data17980 +Node: Examples19355 +Ref: concat-example20537 +Node: Problems21544 +Node: Concept index22070 End Tag Table |