Diffstat (limited to 'doc/tarlz.texi')
-rw-r--r-- | doc/tarlz.texi | 123 |
1 files changed, 86 insertions, 37 deletions
diff --git a/doc/tarlz.texi b/doc/tarlz.texi index 00116ee..c6e7e89 100644 --- a/doc/tarlz.texi +++ b/doc/tarlz.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 30 July 2020 -@set VERSION 0.17 +@set UPDATED 8 January 2021 +@set VERSION 0.19 @dircategory Data Compression @direntry @@ -29,6 +29,7 @@ @contents @end ifnothtml +@ifnottex @node Top @top @@ -49,10 +50,11 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}). @end menu @sp 1 -Copyright @copyright{} 2013-2020 Antonio Diaz Diaz. +Copyright @copyright{} 2013-2021 Antonio Diaz Diaz. -This manual is free documentation: you have unlimited permission -to copy, distribute, and modify it. +This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it. +@end ifnottex @node Introduction @@ -61,13 +63,15 @@ to copy, distribute, and modify it. @uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel (multi-threaded) combined implementation of the tar archiver and the -@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates, -lists and extracts archives in a simplified and safer variant of the POSIX -pax format compressed with lzip, keeping the alignment between tar members -and lzip members. The resulting multimember tar.lz archive is fully backward -compatible with standard tar tools like GNU tar, which treat it like any -other tar.lz archive. Tarlz can append files to the end of such compressed -archives. +@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the +compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}. + +Tarlz creates tar archives using a simplified and safer variant of the POSIX +pax format compressed in lzip format, keeping the alignment between tar +members and lzip members. The resulting multimember tar.lz archive is fully +backward compatible with standard tar tools like GNU tar, which treat it +like any other tar.lz archive. 
Tarlz can append files to the end of such +compressed archives. Keeping the alignment between tar members and lzip members has two advantages. It adds an indexed lzip layer on top of the tar archive, making @@ -76,7 +80,7 @@ amount of data lost in case of corruption. Compressing a tar archive with plzip may even double the amount of files lost for each lzip member damaged because it does not keep the members aligned. -Tarlz can create tar archives with five levels of compression granularity; +Tarlz can create tar archives with five levels of compression granularity: per file (---no-solid), per block (---bsolid, default), per directory (---dsolid), appendable solid (---asolid), and solid (---solid). It can also create uncompressed tar archives. @@ -97,17 +101,17 @@ member), and unwanted members can be deleted from the archive. Just like an uncompressed tar archive. @item -It is a safe POSIX-style backup format. In case of corruption, -tarlz can extract all the undamaged members from the tar.lz -archive, skipping over the damaged members, just like the standard -(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be -used to recover as much data as possible from each damaged member, -and lziprecover can be used to recover some of the damaged members. +It is a safe POSIX-style backup format. In case of corruption, tarlz +can extract all the undamaged members from the tar.lz archive, +skipping over the damaged members, just like the standard +(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be used +to recover as much data as possible from each damaged member, and +lziprecover can be used to recover some of the damaged members. @item -A multimember tar.lz archive is usually smaller than the -corresponding solidly compressed tar.gz archive, except when -compressing files smaller than about 32 KiB individually. 
+A multimember tar.lz archive is usually smaller than the corresponding +solidly compressed tar.gz archive, except when individually +compressing files smaller than about 32 KiB. @end itemize Tarlz protects the extended records with a Cyclic Redundancy Check (CRC) in @@ -275,8 +279,6 @@ of 0 disables threads entirely. If this option is not used, tarlz tries to detect the number of processors in the system and use it as default value. @w{@samp{tarlz --help}} shows the system's default value. See the note about multi-threaded archive creation in the option @samp{-C} above. -Multi-threaded extraction of files from an archive is not yet implemented. -@xref{Multi-threaded decoding}. Note that the number of usable threads is limited during compression to @w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}), @@ -316,7 +318,8 @@ List the contents of an archive. If @var{files} are given, list only the @item -v @itemx --verbose -Verbosely list files processed. +Verbosely list files processed. Further -v's (up to 4) increase the +verbosity level. @item -x @itemx --extract @@ -409,8 +412,9 @@ decimal numeric group ID. @item --keep-damaged Don't delete partially extracted files. If a decompression error happens -while extracting a file, keep the partial data extracted. Use this -option to recover as much data as possible from each damaged member. +while extracting a file, keep the partial data extracted. Use this option to +recover as much data as possible from each damaged member. It is recommended +to run tarlz in single-threaded mode (--threads=0) when using this option. @item --missing-crc Exit with error status 2 if the CRC of the extended records is missing. @@ -429,6 +433,19 @@ number of packets may increase compression speed if the files being archived are larger than @w{64 MiB} compressed, but requires more memory. Valid values range from 1 to 1024. The default value is 64. 
+@item --check-lib +Compare the +@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib} +used to compile tarlz with the version actually being used and exit. Report +any differences found. Exit with error status 1 if differences are found. A +mismatch may indicate that lzlib is not correctly installed or that a +different version of lzlib has been installed after compiling tarlz. +@w{@samp{tarlz -v --check-lib}} shows the version of lzlib being used and +the value of @samp{LZ_API_VERSION} (if defined). +@ifnothtml +@xref{Library version,,,lzlib}. +@end ifnothtml + @ignore @item --permissive Allow some violations of the archive format, like consecutive extended @@ -613,8 +630,12 @@ protected by the CRC to guarante that corruption is always detected (except in case of CRC collision). A CRC was chosen because a checksum is too weak for a potentially large list of variable sized records. A checksum can't detect simple errors like the swapping of two bytes. + @end table +At verbosity level 1 or higher tarlz prints a diagnostic for each unknown +extended header keyword found in an archive, once per keyword. + @sp 1 @section Ustar header block @@ -839,11 +860,16 @@ or less similar to any other tar and won't be described here. The interesting parts described here are those related to Multi-threaded processing. The structure of the part of tarlz performing Multi-threaded archive -creation is somewhat similar to that of plzip with the added complication of -the solidity levels. A grouper thread and several worker threads are -created, acting the main thread as muxer (multiplexer) thread. A "packet -courier" takes care of data transfers among threads and limits the maximum -number of data blocks (packets) being processed simultaneously. +creation is somewhat similar to that of +@uref{http://www.nongnu.org/lzip/plzip.html#Program-design,,plzip} with the +added complication of the solidity levels. +@ifnothtml +@xref{Program design,,,plzip}. 
+@end ifnothtml +A grouper thread and several worker threads are created, acting the main +thread as muxer (multiplexer) thread. A "packet courier" takes care of data +transfers among threads and limits the maximum number of data blocks +(packets) being processed simultaneously. The grouper traverses the directory tree, groups together the metadata of the files to be archived in each lzip member, and distributes them to the @@ -876,8 +902,7 @@ access files in the file system either to read them (diff) or write them ,--------, | file |<---> data to/from each worker below | system | -`--------' - ,------------, +`--------' ,------------, ,-->| worker 0 |--, | `------------' | ,---------, | ,------------, | ,-------, ,--------, @@ -941,8 +966,7 @@ decoding it safely in parallel. Tarlz is able to automatically decode aligned and unaligned multimember tar.lz archives, keeping backwards compatibility. If tarlz finds a member misalignment during multi-threaded decoding, it switches to single-threaded -mode and continues decoding the archive. Currently only the options -@samp{--diff} and @samp{--list} are able to do multi-threaded decoding. +mode and continues decoding the archive. If the files in the archive are large, multi-threaded @samp{--list} on a regular (seekable) tar.lz archive can be hundreds of times faster than @@ -959,7 +983,32 @@ time tarlz -tf silesia.tar.lz (0.020s) On the other hand, multi-threaded @samp{--list} won't detect corruption in the tar member data because it only decodes the part of each lzip member -corresponding to the tar member header. +corresponding to the tar member header. This is another reason why the tar +headers must provide its own integrity checking. 
+ +@sp 1 +@section Limitations of multi-threaded extraction + +Multi-threaded extraction may produce different output than single-threaded +extraction in some cases: + +During multi-threaded extraction, several independent processes are +simultaneously reading the archive and creating files in the file system. The +archive is not read sequentially. As a consequence, any error or weirdness +in the archive (like a corrupt member or an EOF block in the middle of the +archive) won't be usually detected until part of the archive beyond that +point has been processed. + +If the archive contains two or more tar members with the same name, +single-threaded extraction extracts the members in the order they appear in +the archive and leaves in the file system the last version of the file. But +multi-threaded extraction may extract the members in any order and leave in +the file system any version of the file nondeterministically. It is +unspecified which of the tar members is extracted. + +If the same file is extracted through several paths (different member names +resolve to the same file in the file system), the result is undefined. +(Probably the resulting file will be mangled). @node Minimum archive sizes |