Diffstat (limited to 'doc/lzlib.texi')
-rw-r--r-- | doc/lzlib.texi | 275 |
1 files changed, 237 insertions, 38 deletions
diff --git a/doc/lzlib.texi b/doc/lzlib.texi index 8b4aaaf..34154cd 100644 --- a/doc/lzlib.texi +++ b/doc/lzlib.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 11 April 2017 -@set VERSION 1.9 +@set UPDATED 7 February 2018 +@set VERSION 1.10 @dircategory Data Compression @direntry @@ -35,22 +35,23 @@ This manual is for Lzlib (version @value{VERSION}, @value{UPDATED}). @menu -* Introduction:: Purpose and features of lzlib -* Library version:: Checking library version -* Buffering:: Sizes of lzlib's buffers -* Parameter limits:: Min / max values for some parameters -* Compression functions:: Descriptions of the compression functions -* Decompression functions:: Descriptions of the decompression functions -* Error codes:: Meaning of codes returned by functions -* Error messages:: Error messages corresponding to error codes -* Data format:: Detailed format of the compressed data -* Examples:: A small tutorial with examples -* Problems:: Reporting bugs -* Concept index:: Index of concepts +* Introduction:: Purpose and features of lzlib +* Library version:: Checking library version +* Buffering:: Sizes of lzlib's buffers +* Parameter limits:: Min / max values for some parameters +* Compression functions:: Descriptions of the compression functions +* Decompression functions:: Descriptions of the decompression functions +* Error codes:: Meaning of codes returned by functions +* Error messages:: Error messages corresponding to error codes +* Invoking minilzip:: Command line interface of the test program +* Data format:: Detailed format of the compressed data +* Examples:: A small tutorial with examples +* Problems:: Reporting bugs +* Concept index:: Index of concepts @end menu @sp 1 -Copyright @copyright{} 2009-2017 Antonio Diaz Diaz. +Copyright @copyright{} 2009-2018 Antonio Diaz Diaz. This manual is free documentation: you have unlimited permission to copy, distribute and modify it. 
@@ -74,7 +75,7 @@ availability: The lzip format provides very safe integrity checking and some data recovery means. The @uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover} -program can repair bit-flip errors (one of the most common forms of data +program can repair bit flip errors (one of the most common forms of data corruption) in lzip files, and provides data recovery capabilities, including error-checked merging of damaged copies of a file. @ifnothtml @@ -201,18 +202,18 @@ sizes: @item Input compression buffer. Written to by the @samp{LZ_compress_write} function. For the normal variant of LZMA, its size is two times the dictionary size set with the -@samp{LZ_compress_open} function or 64 KiB, whichever is larger. For the -fast variant, its size is 1 MiB. +@samp{LZ_compress_open} function or @w{64 KiB}, whichever is larger. For +the fast variant, its size is @w{1 MiB}. @item Output compression buffer. Read from by the -@samp{LZ_compress_read} function. Its size is 64 KiB. +@samp{LZ_compress_read} function. Its size is @w{64 KiB}. @item Input decompression buffer. Written to by the -@samp{LZ_decompress_write} function. Its size is 64 KiB. +@samp{LZ_decompress_write} function. Its size is @w{64 KiB}. @item Output decompression buffer. Read from by the @samp{LZ_decompress_read} function. Its size is the dictionary size set -in the header of the member currently being decompressed or 64 KiB, +in the header of the member currently being decompressed or @w{64 KiB}, whichever is larger. @end itemize @@ -271,10 +272,10 @@ does not return @samp{LZ_ok}, the returned pointer must not be used and should be freed with @samp{LZ_compress_close} to avoid memory leaks. @var{dictionary_size} sets the dictionary size to be used, in bytes. -Valid values range from 4 KiB to 512 MiB. Note that dictionary sizes are -quantized. 
If the specified size does not match one of the valid sizes, -it will be rounded upwards by adding up to (@var{dictionary_size} / 8) -to it. +Valid values range from @w{4 KiB} to @w{512 MiB}. Note that dictionary +sizes are quantized. If the specified size does not match one of the +valid sizes, it will be rounded upwards by adding up to +@w{(@var{dictionary_size} / 8)} to it. @var{match_len_limit} sets the match length limit in bytes. Valid values range from 5 to 273. Larger values usually give better compression @@ -283,13 +284,13 @@ ratios but longer compression times. If @var{dictionary_size} is 65535 and @var{match_len_limit} is 16, the fast variant of LZMA is chosen, which produces identical compressed output as @code{lzip -0}. (The dictionary size used will be rounded -upwards to 64 KiB). +upwards to @w{64 KiB}). @var{member_size} sets the member size limit in bytes. Minimum member -size limit is 100 kB. Small member size may degrade compression ratio, so -use it only when needed. To produce a single-member data stream, give -@var{member_size} a value larger than the amount of data to be produced, -for example INT64_MAX. +size limit is @w{100 kB}. Small member size may degrade compression +ratio, so use it only when needed. To produce a single-member data +stream, give @var{member_size} a value larger than the amount of data to +be produced, for example INT64_MAX. @end deftypefun @@ -369,7 +370,8 @@ Returns the current error code for @var{encoder} (@pxref{Error codes}). @deftypefun int LZ_compress_finished ( struct LZ_Encoder * const @var{encoder} ) Returns 1 if all the data have been read and @samp{LZ_compress_close} -can be safely called. Otherwise it returns 0. +can be safely called. Otherwise it returns 0. @samp{LZ_compress_finished} +implies @samp{LZ_compress_member_finished}. @end deftypefun @@ -606,7 +608,11 @@ The end of the data stream was reached in the middle of a member. 
@end deftypevr @deftypevr Constant {enum LZ_Errno} LZ_data_error -The data stream is corrupt. +The data stream is corrupt. If @samp{LZ_decompress_member_position} is 6 +or less, it indicates either a format version not supported, an invalid +dictionary size, a corrupt header in a multimember data stream, or +trailing data too similar to a valid lzip header. Lziprecover can be +used to remove conflicting trailing data from a file. @end deftypevr @deftypevr Constant {enum LZ_Errno} LZ_library_error @@ -629,6 +635,199 @@ The value of @var{lz_errno} normally comes from a call to @end deftypefun +@node Invoking minilzip +@chapter Invoking minilzip +@cindex invoking +@cindex options + +The format for running minilzip is: + +@example +minilzip [@var{options}] [@var{files}] +@end example + +@noindent +@samp{-} used as a @var{file} argument means standard input. It can be +mixed with other @var{files} and is read just once, the first time it +appears in the command line. + +minilzip supports the following options: + +@table @code +@item -h +@itemx --help +Print an informative help message describing the options and exit. + +@item -V +@itemx --version +Print the version number of minilzip on the standard output and exit. + +@anchor{--trailing-error} +@item -a +@itemx --trailing-error +Exit with error status 2 if any remaining input is detected after +decompressing the last member. Such remaining input is usually trailing +garbage that can be safely ignored. + +@item -b @var{bytes} +@itemx --member-size=@var{bytes} +When compressing, set the member size limit to @var{bytes}. A small +member size may degrade compression ratio, so use it only when needed. +Valid values range from @w{100 kB} to @w{2 PiB}. Defaults to @w{2 PiB}. + +@item -c +@itemx --stdout +Compress or decompress to standard output; keep input files unchanged. +If compressing several files, each file is compressed independently. +This option is needed when reading from a named pipe (fifo) or from a +device. 
Use it also to recover as much of the decompressed data as +possible when decompressing a corrupt file. + +@item -d +@itemx --decompress +Decompress the specified files. If a file does not exist or can't be +opened, minilzip continues decompressing the rest of the files. If a file +fails to decompress, or is a terminal, minilzip exits immediately without +decompressing the rest of the files. + +@item -f +@itemx --force +Force overwrite of output files. + +@item -F +@itemx --recompress +When compressing, force re-compression of files whose name already has +the @samp{.lz} or @samp{.tlz} suffix. + +@item -k +@itemx --keep +Keep (don't delete) input files during compression or decompression. + +@item -m @var{bytes} +@itemx --match-length=@var{bytes} +When compressing, set the match length limit in bytes. After a match +this long is found, the search is finished. Valid values range from 5 to +273. Larger values usually give better compression ratios but longer +compression times. + +@item -o @var{file} +@itemx --output=@var{file} +When reading from standard input and @samp{--stdout} has not been +specified, use @samp{@var{file}} as the virtual name of the uncompressed +file. This produces a file named @samp{@var{file}} when decompressing, +or a file named @samp{@var{file}.lz} when compressing. A second +@samp{.lz} extension is not added if @samp{@var{file}} already ends in +@samp{.lz} or @samp{.tlz}. When compressing and splitting the output in +volumes, several files named @samp{@var{file}00001.lz}, +@samp{@var{file}00002.lz}, etc, are created. + +@item -q +@itemx --quiet +Quiet operation. Suppress all messages. + +@item -s @var{bytes} +@itemx --dictionary-size=@var{bytes} +When compressing, set the dictionary size limit in bytes. Minilzip will use +the smallest possible dictionary size for each file without exceeding +this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12 +to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. 
Note +that dictionary sizes are quantized. If the specified size does not +match one of the valid sizes, it will be rounded upwards by adding up to +@w{(@var{bytes} / 8)} to it. + +For maximum compression you should use a dictionary size limit as large +as possible, but keep in mind that the decompression memory requirement +is affected at compression time by the choice of dictionary size limit. + +@item -S @var{bytes} +@itemx --volume-size=@var{bytes} +When compressing, split the compressed output into several volume files +with names @samp{original_name00001.lz}, @samp{original_name00002.lz}, +etc, and set the volume size limit to @var{bytes}. Input files are kept +unchanged. Each volume is a complete, maybe multimember, lzip file. A +small volume size may degrade compression ratio, so use it only when +needed. Valid values range from @w{100 kB} to @w{4 EiB}. + +@item -t +@itemx --test +Check integrity of the specified files, but don't decompress them. This +really performs a trial decompression and throws away the result. Use it +together with @samp{-v} to see information about the files. If a file +fails the test, does not exist, can't be opened, or is a terminal, minilzip +continues checking the rest of the files. A final diagnostic is shown at +verbosity level 1 or higher if any file fails the test when testing +multiple files. + +@item -v +@itemx --verbose +Verbose mode.@* +When compressing, show the compression ratio and size for each file +processed.@* +When decompressing or testing, further -v's (up to 4) increase the +verbosity level, showing status, compression ratio, dictionary size, +and trailer contents (CRC, data size, member size). + +@item -0 .. -9 +Set the compression parameters (dictionary size and match length limit) +as shown in the table below. The default compression level is @samp{-6}. +Note that @samp{-9} can be much slower than @samp{-0}. These options +have no effect when decompressing or testing. 
+ +The bidimensional parameter space of LZMA can't be mapped to a linear +scale optimal for all files. If your files are large, very repetitive, +etc, you may need to use the @samp{--dictionary-size} and +@samp{--match-length} options directly to achieve optimal performance. + +@multitable {Level} {Dictionary size} {Match length limit} +@item Level @tab Dictionary size @tab Match length limit +@item -0 @tab 64 KiB @tab 16 bytes +@item -1 @tab 1 MiB @tab 5 bytes +@item -2 @tab 1.5 MiB @tab 6 bytes +@item -3 @tab 2 MiB @tab 8 bytes +@item -4 @tab 3 MiB @tab 12 bytes +@item -5 @tab 4 MiB @tab 20 bytes +@item -6 @tab 8 MiB @tab 36 bytes +@item -7 @tab 16 MiB @tab 68 bytes +@item -8 @tab 24 MiB @tab 132 bytes +@item -9 @tab 32 MiB @tab 273 bytes +@end multitable + +@item --fast +@itemx --best +Aliases for GNU gzip compatibility. + +@item --loose-trailing +When decompressing or testing, allow trailing data whose first bytes are +so similar to the magic bytes of a lzip header that they can be confused +with a corrupt header. Use this option if a file triggers a "corrupt +header" error and the cause is not indeed a corrupt header. + +@end table + +Numbers given as arguments to options may be followed by a multiplier +and an optional @samp{B} for "byte". 
+ +Table of SI and binary prefixes (unit multipliers): + +@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)} +@item Prefix @tab Value @tab | @tab Prefix @tab Value +@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024) +@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20) +@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30) +@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40) +@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50) +@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60) +@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70) +@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80) +@end multitable + +@sp 1 +Exit status: 0 for a normal exit, 1 for environmental problems (file not +found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or +invalid input file, 3 for an internal consistency error (eg, bug) which +caused minilzip to panic. + + @node Data format @chapter Data format @cindex data format @@ -655,9 +854,9 @@ represents one byte; a box like this: represents a variable number of bytes. @sp 1 -A lzip data stream consists of a series of "members" (compressed data -sets). The members simply appear one after another in the data stream, -with no additional information before, between, or after them. +A lzip data stream consists of a series of "members" (compressed data sets). +The members simply appear one after another in the data stream, with no +additional information before, between, or after them. Each member has the following structure: @verbatim @@ -810,15 +1009,15 @@ Example 5: Multimember compression (@var{member_size} < total output). Example 6: Multimember compression (user-restarted members). @example - 1) LZ_compress_open + 1) LZ_compress_open (with @var{member_size} > largest member). 
  2) LZ_compress_write
  3) LZ_compress_read
  4) go back to step 2 until member termination is desired
  5) LZ_compress_finish
  6) LZ_compress_read
  7) go back to step 6 until LZ_compress_member_finished returns 1
- 8) verify that LZ_compress_finished returns 1
- 9) go to step 12 if all input data have been written
+ 9) go to step 12 if all input data have been written and
+    LZ_compress_finished returns 1
 10) LZ_compress_restart_member
 11) go back to step 2
 12) LZ_compress_close
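
Reviewer note: the new "Invoking minilzip" chapter in this diff documents a table mapping levels -0 to -9 to a dictionary size and match length limit, and states that -s values 12 to 29 are read as powers of two. The following is a minimal sketch of those two rules in C, using only the values printed in the table. The names `level_params`, `params_for_level` and `dictionary_size_from_arg` are hypothetical and are not part of lzlib or minilzip. The sketch does not model the quantization of dictionary sizes, and the real program may use different internal constants.

```c
#include <stddef.h>

/* Hypothetical type for this sketch; not an lzlib or minilzip type. */
struct level_params {
  int dictionary_size;    /* in bytes */
  int match_len_limit;    /* in bytes */
};

/* Values copied from the level table in the documented chapter. */
static const struct level_params level_table[10] = {
  { 1 << 16,  16 },       /* -0: 64 KiB  */
  { 1 << 20,   5 },       /* -1: 1 MiB   */
  { 3 << 19,   6 },       /* -2: 1.5 MiB */
  { 1 << 21,   8 },       /* -3: 2 MiB   */
  { 3 << 20,  12 },       /* -4: 3 MiB   */
  { 1 << 22,  20 },       /* -5: 4 MiB   */
  { 1 << 23,  36 },       /* -6: 8 MiB (documented default level) */
  { 1 << 24,  68 },       /* -7: 16 MiB  */
  { 3 << 23, 132 },       /* -8: 24 MiB  */
  { 1 << 25, 273 } };     /* -9: 32 MiB  */

/* Returns the table entry for levels 0 to 9, or NULL if out of range. */
const struct level_params * params_for_level( const int level )
{
  if( level < 0 || level > 9 ) return NULL;
  return &level_table[level];
}

/* Interprets a -s argument as documented: 12 to 29 mean 2^12 to 2^29
   bytes; any other value must lie between 4 KiB and 512 MiB.
   Returns -1 for values outside the valid range.
   Assumption: rounding to a quantized size is left out of this sketch. */
long dictionary_size_from_arg( const long value )
{
  const long min_size = 1L << 12;   /* 4 KiB */
  const long max_size = 1L << 29;   /* 512 MiB */
  if( value >= 12 && value <= 29 ) return 1L << value;
  if( value < min_size || value > max_size ) return -1;
  return value;
}
```

A caller could pass the two fields of `params_for_level( 6 )` as the first two arguments of `LZ_compress_open`, together with a member size limit, to approximate the documented default level.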