Diffstat (limited to 'doc/lzlib.texi')
-rw-r--r-- | doc/lzlib.texi | 101 |
1 files changed, 51 insertions, 50 deletions
diff --git a/doc/lzlib.texi b/doc/lzlib.texi index 462c840..23bbbfb 100644 --- a/doc/lzlib.texi +++ b/doc/lzlib.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 19 April 2024 -@set VERSION 1.15-pre1 +@set UPDATED 16 October 2024 +@set VERSION 1.15-pre2 @dircategory Compression @direntry @@ -45,7 +45,7 @@ This manual is for Lzlib (version @value{VERSION}, @value{UPDATED}). * Error codes:: Meaning of codes returned by functions * Error messages:: Error messages corresponding to error codes * Invoking minilzip:: Command-line interface of the test program -* Data format:: Detailed format of the compressed data +* File format:: Detailed format of the compressed file * Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts @@ -67,7 +67,7 @@ distribute, and modify it. is a data compression library providing in-memory LZMA compression and decompression functions, including integrity checking of the decompressed data. The compressed data format used by the library is the lzip format. -Lzlib is written in C. +Lzlib is written in C and is distributed under a 2-clause BSD license. The lzip file format is designed for data sharing and long-term archiving, taking into account both data integrity and decoder availability: @@ -103,16 +103,16 @@ lziprecover, losing an entire archive just because of a corrupt byte near the beginning is a thing of the past. The functions and variables forming the interface of the compression library -are declared in the file @samp{lzlib.h}. Usage examples of the library are -given in the files @samp{bbexample.c}, @samp{ffexample.c}, and -@samp{minilzip.c} from the source distribution. +are declared in the file @file{lzlib.h}. Usage examples of the library are +given in the files @file{bbexample.c}, @file{ffexample.c}, and +@file{minilzip.c} from the source distribution. 
-As @samp{lzlib.h} can be used by C and C++ programs, it must not impose a +As @file{lzlib.h} can be used by C and C++ programs, it must not impose a choice of system headers on the program by including one of them. Therefore it is the responsibility of the program using lzlib to include before -@samp{lzlib.h} some header that declares the type @samp{uint8_t}. There are -at least four such headers in C and C++: @samp{stdint.h}, @samp{cstdint}, -@samp{inttypes.h}, and @samp{cinttypes}. +@file{lzlib.h} some header that declares the type @samp{uint8_t}. There are +at least four such headers in C and C++: @file{stdint.h}, @file{cstdint}, +@file{inttypes.h}, and @file{cinttypes}. All the library functions are thread safe. The library does not install any signal handler. The decoder checks the consistency of the compressed data, @@ -183,10 +183,10 @@ versions of itself down to 1.0. Any application working with an older lzlib should work with a newer lzlib. Installing a newer lzlib should not break anything. This chapter describes the constants and functions that the application can use to discover the version of the library being used. All -of them are declared in @samp{lzlib.h}. +of them are declared in @file{lzlib.h}. @defvr Constant LZ_API_VERSION -This constant is defined in @samp{lzlib.h} and works as a version test +This constant is defined in @file{lzlib.h} and works as a version test macro. The application should check at compile time that LZ_API_VERSION is greater than or equal to the version required by the application: @@ -207,7 +207,7 @@ which allow the application to announce to the library its desire to have certain symbols and prototypes exposed. @deftypefun int LZ_api_version ( void ) -If LZ_API_VERSION >= 1012, this function is declared in @samp{lzlib.h} (else +If LZ_API_VERSION >= 1012, this function is declared in @file{lzlib.h} (else it doesn't exist). It returns the LZ_API_VERSION of the library object code being used. 
The application should check at run time that the value returned by @code{LZ_api_version} is greater than or equal to the version @@ -225,7 +225,7 @@ the functionality required by the application. @end deftypefun @deftypevr Constant {const char *} LZ_version_string -This string constant is defined in the header file @samp{lzlib.h} and +This string constant is defined in the header file @file{lzlib.h} and represents the version of the library being used at compile time. @end deftypevr @@ -385,7 +385,7 @@ lzip files; it is a device for interactive communication between applications using lzlib, but is useless and wasteful in a file, and is excluded from the media type @samp{application/lzip}. The LZMA marker @samp{2} ("End Of Stream" marker) is the only marker allowed in lzip files. -@xref{Data format}. +@xref{File format}. Repeated use of @samp{LZ_compress_sync_flush} may degrade compression ratio, so use it only when needed. If the interval between calls to @@ -524,7 +524,7 @@ detecting a truncated member. @deftypefun int LZ_decompress_reset ( struct LZ_Decoder * const @var{decoder} ) Resets the internal state of @var{decoder} as it was just after opening it with the function @samp{LZ_decompress_open}. Data stored in the -internal buffers is discarded. Position counters are set to 0. +internal buffers are discarded. Position counters are set to 0. @end deftypefun @@ -670,7 +670,7 @@ necessarily LZ_ok, and you should not use @samp{LZ_(de)compress_errno} to determine whether a call failed. If the call failed, then you can examine @samp{LZ_(de)compress_errno}. -The error codes are defined in the header file @samp{lzlib.h}. +The error codes are defined in the header file @file{lzlib.h}. @deftypevr Constant {enum LZ_Errno} LZ_ok The value of this constant is 0 and is used to indicate that there is no error. @@ -693,8 +693,8 @@ finished. @end deftypevr @deftypevr Constant {enum LZ_Errno} LZ_header_error -An invalid member header (one with the wrong magic bytes) was read. 
If -this happens at the end of the data stream it may indicate trailing data. +An invalid member header (one with the wrong magic bytes) was read. If this +happens at the end of the data stream it may indicate trailing data. @end deftypevr @deftypevr Constant {enum LZ_Errno} LZ_unexpected_eof @@ -702,11 +702,12 @@ The end of the data stream was reached in the middle of a member. @end deftypevr @deftypevr Constant {enum LZ_Errno} LZ_data_error -The data stream is corrupt. If @samp{LZ_decompress_member_position} is 6 -or less, it indicates either a format version not supported, an invalid -dictionary size, a corrupt header in a multimember data stream, or -trailing data too similar to a valid lzip header. Lziprecover can be -used to remove conflicting trailing data from a file. +The data stream is corrupt. If @samp{LZ_decompress_member_position} is 6 or +less, it indicates either a format version not supported, an invalid +dictionary size, a nonzero first LZMA byte, a corrupt header in a multimember +data stream, or trailing data too similar to a valid lzip header. +Lziprecover can be used to repair some of these errors and to remove +conflicting trailing data from a file. @end deftypevr @deftypevr Constant {enum LZ_Errno} LZ_library_error @@ -747,8 +748,8 @@ maximum dictionary size is 512 MiB so that any lzip file can be decompressed on 32-bit machines. Lzip provides accurate and robust 3-factor integrity checking. Lzip can compress about as fast as gzip @w{(lzip -0)} or compress most files more than bzip2 @w{(lzip -9)}. Decompression speed is intermediate between -gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery -perspective. Lzip has been designed, written, and tested with great care to +gzip and bzip2. Lzip provides better data recovery capabilities than gzip +and bzip2. Lzip has been designed, written, and tested with great care to replace gzip and bzip2 as the standard general-purpose compressed format for Unix-like systems. 
@@ -766,6 +767,7 @@ argument means standard input. It can be mixed with other @var{files} and is read just once, the first time it appears in the command line. Remember to prepend @file{./} to any file name beginning with a hyphen, or use @samp{--}. +@noindent minilzip supports the following @uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}: @ifnothtml @@ -823,7 +825,7 @@ Force overwrite of output files. @item -F @itemx --recompress When compressing, force re-compression of files whose name already has -the @samp{.lz} or @samp{.tlz} suffix. +the @file{.lz} or @file{.tlz} suffix. @item -k @itemx --keep @@ -846,8 +848,8 @@ when reading from a named pipe (fifo) or from a device. @w{@option{-o -}} is equivalent to @option{-c}. @option{-o} has no effect when testing. When compressing and splitting the output in volumes, @var{file} is used as -a prefix, and several files named @samp{@var{file}00001.lz}, -@samp{@var{file}00002.lz}, etc, are created. In this case, only one input +a prefix, and several files named @file{@var{file}00001.lz}, +@file{@var{file}00002.lz}, etc, are created. In this case, only one input file is allowed. @item -q @@ -873,7 +875,7 @@ is affected at compression time by the choice of dictionary size limit. @itemx --volume-size=@var{bytes} When compressing, and @option{-c} has not been also specified, split the compressed output into several volume files with names -@samp{original_name00001.lz}, @samp{original_name00002.lz}, etc, and set the +@file{original_name00001.lz}, @file{original_name00002.lz}, etc, and set the volume size limit to @var{bytes}. Input files are kept unchanged. Each volume is a complete, maybe multimember, lzip file. A small volume size may degrade compression ratio, so use it only when needed. Valid values range @@ -892,11 +894,10 @@ files. 
@item -v @itemx --verbose Verbose mode.@* -When compressing, show the compression ratio and size for each file -processed.@* -When decompressing or testing, further -v's (up to 4) increase the -verbosity level, showing status, compression ratio, dictionary size, -and trailer contents (CRC, data size, member size). +When compressing, show the compression ratio and size for each file processed.@* +When decompressing or testing, further -v's (up to 4) increase the verbosity +level, showing status, compression ratio, dictionary size, and trailer +contents (CRC, data size, member size). @item -0 .. -9 Compression level. Set the compression parameters (dictionary size and @@ -915,7 +916,7 @@ given, the last setting is used. For example @w{@option{-9 -s64MiB}} is equivalent to @w{@option{-s64MiB -m273}} @multitable {Level} {Dictionary size (-s)} {Match length limit (-m)} -@item Level @tab Dictionary size (-s) @tab Match length limit (-m) +@headitem Level @tab Dictionary size (-s) @tab Match length limit (-m) @item -0 @tab 64 KiB @tab 16 bytes @item -1 @tab 1 MiB @tab 5 bytes @item -2 @tab 1.5 MiB @tab 6 bytes @@ -960,7 +961,7 @@ and may be followed by a multiplier and an optional @samp{B} for "byte". Table of SI and binary prefixes (unit multipliers): @multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)} -@item Prefix @tab Value @tab | @tab Prefix @tab Value +@headitem Prefix @tab Value @tab | @tab Prefix @tab Value @item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024) @item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20) @item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30) @@ -980,15 +981,14 @@ indicate a corrupt or invalid input file, 3 for an internal consistency error (e.g., bug) which caused minilzip to panic. 
-@node Data format -@chapter Data format -@cindex data format +@node File format +@chapter File format +@cindex file format Perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away.@* --- Antoine de Saint-Exupery -@sp 1 In the diagram below, a box like this: @verbatim @@ -1007,12 +1007,13 @@ represents one byte; a box like this: represents a variable number of bytes. -@sp 1 -Lzip data consist of one or more independent "members" (compressed data -sets). The members simply appear one after another in the data stream, with -no additional information before, between, or after them. Each member can +@noindent +A lzip file consists of one or more independent "members" (compressed data +sets). The members simply appear one after another in the file, with no +additional information before, between, or after them. Each member can encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data. -The size of a multimember data stream is unlimited. +The size of a multimember file is unlimited. Empty members (data size = 0) +are not allowed in multimember files. Each member has the following structure: @@ -1043,7 +1044,7 @@ Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@* Valid values for dictionary size range from 4 KiB to 512 MiB. @item LZMA stream -The LZMA stream, finished by an "End Of Stream" marker. Uses default values +The LZMA stream, terminated by an "End Of Stream" marker. Uses default values for encoder properties. @ifnothtml @xref{Stream format,,,lzip}, @@ -1077,8 +1078,8 @@ overflowing. @cindex examples This chapter provides real code examples for the most common uses of the -library. See these examples in context in the files @samp{bbexample.c} and -@samp{ffexample.c} from the source distribution of lzlib. +library. See these examples in context in the files @file{bbexample.c} and +@file{ffexample.c} from the source distribution of lzlib. 
Note that the interface of lzlib is symmetrical. That is, the code for normal compression and decompression is identical except because one calls |