| Field | Value | Date |
|---|---|---|
| author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2021-01-27 16:07:31 +0000 |
| committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2021-01-27 16:07:31 +0000 |
| commit | adfd78d2ce2ac3feab8926823ec6349d79a83437 | |
| tree | c876cb9bf2908ca4082c65f20a380b1e13172e44 /doc | |
| parent | Adding upstream version 0.17. | |
| download | tarlz-adfd78d2ce2ac3feab8926823ec6349d79a83437.tar.xz, tarlz-adfd78d2ce2ac3feab8926823ec6349d79a83437.zip | |
Adding upstream version 0.19. (upstream/0.19)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | doc/tarlz.1 | 27 |
| -rw-r--r-- | doc/tarlz.info | 129 |
| -rw-r--r-- | doc/tarlz.texi | 123 |

3 files changed, 187 insertions, 92 deletions
diff --git a/doc/tarlz.1 b/doc/tarlz.1
index cf0f659..e2ed3de 100644
--- a/doc/tarlz.1
+++ b/doc/tarlz.1
@@ -1,5 +1,5 @@
-.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH TARLZ "1" "July 2020" "tarlz 0.17" "User Commands"
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.16.
+.TH TARLZ "1" "January 2021" "tarlz 0.19" "User Commands"
 .SH NAME
 tarlz \- creates tar archives with multimember lzip compression
 .SH SYNOPSIS
@@ -7,13 +7,15 @@ tarlz \- creates tar archives with multimember lzip compression
 [\fI\,options\/\fR] [\fI\,files\/\fR]
 .SH DESCRIPTION
 Tarlz is a massively parallel (multi\-threaded) combined implementation of
-the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
-archives in a simplified and safer variant of the POSIX pax format
-compressed with lzip, keeping the alignment between tar members and lzip
-members. The resulting multimember tar.lz archive is fully backward
-compatible with standard tar tools like GNU tar, which treat it like any
-other tar.lz archive. Tarlz can append files to the end of such compressed
-archives.
+the tar archiver and the lzip compressor. Tarlz uses the compression library
+lzlib.
+.PP
+Tarlz creates, lists, and extracts archives in a simplified and safer
+variant of the POSIX pax format compressed in lzip format, keeping the
+alignment between tar members and lzip members. The resulting multimember
+tar.lz archive is fully backward compatible with standard tar tools like GNU
+tar, which treat it like any other tar.lz archive. Tarlz can append files to
+the end of such compressed archives.
 .PP
 Keeping the alignment between tar members and lzip members has two advantages.
 It adds an indexed lzip layer on top of the tar archive, making
@@ -126,6 +128,9 @@ exit with error status if missing extended CRC
 .TP
 \fB\-\-out\-slots=\fR<n>
 number of 1 MiB output packets buffered [64]
+.TP
+\fB\-\-check\-lib\fR
+compare version of lzlib.h with liblz.{a,so}
 .PP
 Exit status: 0 for a normal exit, 1 for environmental problems (file not
 found, files differ, invalid flags, I/O errors, etc), 2 to indicate a
@@ -136,8 +141,8 @@ Report bugs to lzip\-bug@nongnu.org
 .br
 Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
 .SH COPYRIGHT
-Copyright \(co 2020 Antonio Diaz Diaz.
-Using lzlib 1.12\-rc1a
+Copyright \(co 2021 Antonio Diaz Diaz.
+Using lzlib 1.12
 License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
 .br
 This is free software: you are free to change and redistribute it.
diff --git a/doc/tarlz.info b/doc/tarlz.info
index e2c61db..d287697 100644
--- a/doc/tarlz.info
+++ b/doc/tarlz.info
@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
 Tarlz Manual
 ************
 
-This manual is for Tarlz (version 0.17, 30 July 2020).
+This manual is for Tarlz (version 0.19, 8 January 2021).
 
 * Menu:
 
@@ -28,10 +28,10 @@ This manual is for Tarlz (version 0.17, 30 July 2020).
 * Concept index:: Index of concepts
 
 
-   Copyright (C) 2013-2020 Antonio Diaz Diaz.
+   Copyright (C) 2013-2021 Antonio Diaz Diaz.
 
-   This manual is free documentation: you have unlimited permission to
-copy, distribute, and modify it.
+   This manual is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
 
 File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: Top
@@ -40,13 +40,15 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
 **************
 
 Tarlz is a massively parallel (multi-threaded) combined implementation of
-the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
-archives in a simplified and safer variant of the POSIX pax format
-compressed with lzip, keeping the alignment between tar members and lzip
-members. The resulting multimember tar.lz archive is fully backward
-compatible with standard tar tools like GNU tar, which treat it like any
-other tar.lz archive. Tarlz can append files to the end of such compressed
-archives.
+the tar archiver and the lzip compressor. Tarlz uses the compression
+library lzlib.
+
+   Tarlz creates tar archives using a simplified and safer variant of the
+POSIX pax format compressed in lzip format, keeping the alignment between
+tar members and lzip members. The resulting multimember tar.lz archive is
+fully backward compatible with standard tar tools like GNU tar, which treat
+it like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
 
    Keeping the alignment between tar members and lzip members has two
 advantages. It adds an indexed lzip layer on top of the tar archive, making
@@ -56,7 +58,7 @@ plzip may even double the amount of files lost for each lzip member
 damaged because it does not keep the members aligned.
 
    Tarlz can create tar archives with five levels of compression
-granularity; per file (--no-solid), per block (--bsolid, default), per
+granularity: per file (--no-solid), per block (--bsolid, default), per
 directory (--dsolid), appendable solid (--asolid), and solid (--solid).
 It can also create uncompressed tar archives.
 
@@ -79,8 +81,8 @@ archive, but it has the following advantages:
      lziprecover can be used to recover some of the damaged members.
 
    * A multimember tar.lz archive is usually smaller than the corresponding
-     solidly compressed tar.gz archive, except when compressing files
-     smaller than about 32 KiB individually.
+     solidly compressed tar.gz archive, except when individually
+     compressing files smaller than about 32 KiB.
 
    Tarlz protects the extended records with a Cyclic Redundancy Check (CRC)
 in a way compatible with standard tar tools. *Note crc32::.
@@ -240,8 +242,7 @@ to '-1 --solid'
      not used, tarlz tries to detect the number of processors in the system
      and use it as default value. 'tarlz --help' shows the system's
      default value. See the note about multi-threaded archive creation in the
-     option '-C' above. Multi-threaded extraction of files from an archive
-     is not yet implemented. *Note Multi-threaded decoding::.
+     option '-C' above.
 
      Note that the number of usable threads is limited during compression
      to ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
@@ -281,7 +282,8 @@ to '-1 --solid'
 
 '-v'
 '--verbose'
-     Verbosely list files processed.
+     Verbosely list files processed. Further -v's (up to 4) increase the
+     verbosity level.
 
 '-x'
 '--extract'
@@ -376,7 +378,8 @@ to '-1 --solid'
      Don't delete partially extracted files. If a decompression error
      happens while extracting a file, keep the partial data extracted. Use
      this option to recover as much data as possible from each damaged
-     member.
+     member. It is recommended to run tarlz in single-threaded mode
+     (-threads=0) when using this option.
 
 '--missing-crc'
      Exit with error status 2 if the CRC of the extended records is missing.
@@ -396,6 +399,15 @@ to '-1 --solid'
      more memory. Valid values range from 1 to 1024. The default value is
      64.
 
+'--check-lib'
+     Compare the version of lzlib used to compile tarlz with the version
+     actually being used and exit. Report any differences found. Exit with
+     error status 1 if differences are found. A mismatch may indicate that
+     lzlib is not correctly installed or that a different version of lzlib
+     has been installed after compiling tarlz. 'tarlz -v --check-lib' shows
+     the version of lzlib being used and the value of 'LZ_API_VERSION' (if
+     defined). *Note Library version: (lzlib)Library version.
+
 
    Exit status: 0 for a normal exit, 1 for environmental problems (file not
 found, files differ, invalid flags, I/O errors, etc), 2 to indicate a
@@ -546,6 +558,10 @@ space, equal-sign, and newline.
      the swapping of two bytes.
 
 
+   At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
+extended header keyword found in an archive, once per keyword.
+
+
 4.2 Ustar header block
 ======================
 
@@ -770,11 +786,12 @@ interesting parts described here are those related to Multi-threaded
 processing.
 
    The structure of the part of tarlz performing Multi-threaded archive
-creation is somewhat similar to that of plzip with the added complication of
-the solidity levels. A grouper thread and several worker threads are
-created, acting the main thread as muxer (multiplexer) thread. A "packet
-courier" takes care of data transfers among threads and limits the maximum
-number of data blocks (packets) being processed simultaneously.
+creation is somewhat similar to that of plzip with the added complication
+of the solidity levels. *Note Program design: (plzip)Program design. A
+grouper thread and several worker threads are created, acting the main
+thread as muxer (multiplexer) thread. A "packet courier" takes care of data
+transfers among threads and limits the maximum number of data blocks
+(packets) being processed simultaneously.
 
    The grouper traverses the directory tree, groups together the metadata of
 the files to be archived in each lzip member, and distributes them to the
@@ -805,8 +822,7 @@ the archive.
 ,--------,
 | file |<---> data to/from each worker below
 | system |
-`--------'
- ,------------,
+`--------' ,------------,
 ,-->| worker 0 |--,
 | `------------' |
 ,---------, | ,------------, | ,-------, ,--------,
@@ -870,8 +886,7 @@ possible decoding it safely in parallel.
    Tarlz is able to automatically decode aligned and unaligned multimember
 tar.lz archives, keeping backwards compatibility. If tarlz finds a member
 misalignment during multi-threaded decoding, it switches to single-threaded
-mode and continues decoding the archive. Currently only the options
-'--diff' and '--list' are able to do multi-threaded decoding.
+mode and continues decoding the archive.
 
    If the files in the archive are large, multi-threaded '--list' on a
 regular (seekable) tar.lz archive can be hundreds of times faster than
@@ -886,7 +901,33 @@ example listing the Silesia corpus on a dual core machine:
 
    On the other hand, multi-threaded '--list' won't detect corruption in
 the tar member data because it only decodes the part of each lzip member
-corresponding to the tar member header.
+corresponding to the tar member header. This is another reason why the tar
+headers must provide its own integrity checking.
+
+
+7.1 Limitations of multi-threaded extraction
+============================================
+
+Multi-threaded extraction may produce different output than single-threaded
+extraction in some cases:
+
+   During multi-threaded extraction, several independent processes are
+simultaneously reading the archive and creating files in the file system.
+The archive is not read sequentially. As a consequence, any error or
+weirdness in the archive (like a corrupt member or an EOF block in the
+middle of the archive) won't be usually detected until part of the archive
+beyond that point has been processed.
+
+   If the archive contains two or more tar members with the same name,
+single-threaded extraction extracts the members in the order they appear in
+the archive and leaves in the file system the last version of the file. But
+multi-threaded extraction may extract the members in any order and leave in
+the file system any version of the file nondeterministically. It is
+unspecified which of the tar members is extracted.
+
+   If the same file is extracted through several paths (different member
+names resolve to the same file in the file system), the result is undefined.
+(Probably the resulting file will be mangled).
 
 File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
@@ -1028,22 +1069,22 @@ Concept index
 Tag Table:
 Node: Top223
-Node: Introduction1212
-Node: Invoking tarlz3982
-Ref: --data-size6193
-Ref: --bsolid14608
-Node: Portable character set18244
-Node: File format18887
-Ref: key_crc3223812
-Node: Amendments to pax format29271
-Ref: crc3229935
-Ref: flawed-compat31220
-Node: Program design33865
-Node: Multi-threaded decoding37756
-Node: Minimum archive sizes40492
-Node: Examples42630
-Node: Problems44345
-Node: Concept index44873
+Node: Introduction1214
+Node: Invoking tarlz4022
+Ref: --data-size6233
+Ref: --bsolid14593
+Node: Portable character set18852
+Node: File format19495
+Ref: key_crc3224420
+Node: Amendments to pax format30021
+Ref: crc3230685
+Ref: flawed-compat31970
+Node: Program design34615
+Node: Multi-threaded decoding38540
+Node: Minimum archive sizes42482
+Node: Examples44620
+Node: Problems46335
+Node: Concept index46863
 End Tag Table
diff --git a/doc/tarlz.texi b/doc/tarlz.texi
index 00116ee..c6e7e89 100644
--- a/doc/tarlz.texi
+++ b/doc/tarlz.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 30 July 2020
-@set VERSION 0.17
+@set UPDATED 8 January 2021
+@set VERSION 0.19
 
 @dircategory Data Compression
 @direntry
@@ -29,6 +29,7 @@
 @contents
 @end ifnothtml
 
+@ifnottex
 @node Top
 @top
 
@@ -49,10 +50,11 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 @end menu
 
 @sp 1
-Copyright @copyright{} 2013-2020 Antonio Diaz Diaz.
+Copyright @copyright{} 2013-2021 Antonio Diaz Diaz.
 
-This manual is free documentation: you have unlimited permission
-to copy, distribute, and modify it.
+This manual is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
+@end ifnottex
 
 
 @node Introduction
@@ -61,13 +63,15 @@ to copy, distribute, and modify it.
 
 @uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
 (multi-threaded) combined implementation of the tar archiver and the
-@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates,
-lists and extracts archives in a simplified and safer variant of the POSIX
-pax format compressed with lzip, keeping the alignment between tar members
-and lzip members. The resulting multimember tar.lz archive is fully backward
-compatible with standard tar tools like GNU tar, which treat it like any
-other tar.lz archive. Tarlz can append files to the end of such compressed
-archives.
+@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the
+compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
+
+Tarlz creates tar archives using a simplified and safer variant of the POSIX
+pax format compressed in lzip format, keeping the alignment between tar
+members and lzip members. The resulting multimember tar.lz archive is fully
+backward compatible with standard tar tools like GNU tar, which treat it
+like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
 
 Keeping the alignment between tar members and lzip members has two
 advantages. It adds an indexed lzip layer on top of the tar archive, making
@@ -76,7 +80,7 @@ amount of data lost in case of corruption. Compressing a tar archive with
 plzip may even double the amount of files lost for each lzip member damaged
 because it does not keep the members aligned.
 
-Tarlz can create tar archives with five levels of compression granularity;
+Tarlz can create tar archives with five levels of compression granularity:
 per file (---no-solid), per block (---bsolid, default), per directory
 (---dsolid), appendable solid (---asolid), and solid (---solid). It can also
 create uncompressed tar archives.
@@ -97,17 +101,17 @@ member), and unwanted members can be deleted from the archive. Just
@item -It is a safe POSIX-style backup format. In case of corruption, -tarlz can extract all the undamaged members from the tar.lz -archive, skipping over the damaged members, just like the standard -(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be -used to recover as much data as possible from each damaged member, -and lziprecover can be used to recover some of the damaged members. +It is a safe POSIX-style backup format. In case of corruption, tarlz +can extract all the undamaged members from the tar.lz archive, +skipping over the damaged members, just like the standard +(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be used +to recover as much data as possible from each damaged member, and +lziprecover can be used to recover some of the damaged members. @item -A multimember tar.lz archive is usually smaller than the -corresponding solidly compressed tar.gz archive, except when -compressing files smaller than about 32 KiB individually. +A multimember tar.lz archive is usually smaller than the corresponding +solidly compressed tar.gz archive, except when individually +compressing files smaller than about 32 KiB. @end itemize Tarlz protects the extended records with a Cyclic Redundancy Check (CRC) in @@ -275,8 +279,6 @@ of 0 disables threads entirely. If this option is not used, tarlz tries to detect the number of processors in the system and use it as default value. @w{@samp{tarlz --help}} shows the system's default value. See the note about multi-threaded archive creation in the option @samp{-C} above. -Multi-threaded extraction of files from an archive is not yet implemented. -@xref{Multi-threaded decoding}. Note that the number of usable threads is limited during compression to @w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}), @@ -316,7 +318,8 @@ List the contents of an archive. If @var{files} are given, list only the @item -v @itemx --verbose -Verbosely list files processed. 
+Verbosely list files processed. Further -v's (up to 4) increase the
+verbosity level.
 
 @item -x
 @itemx --extract
@@ -409,8 +412,9 @@ decimal numeric group ID.
 
 @item --keep-damaged
 Don't delete partially extracted files. If a decompression error happens
-while extracting a file, keep the partial data extracted. Use this
-option to recover as much data as possible from each damaged member.
+while extracting a file, keep the partial data extracted. Use this option to
+recover as much data as possible from each damaged member. It is recommended
+to run tarlz in single-threaded mode (--threads=0) when using this option.
 
 @item --missing-crc
 Exit with error status 2 if the CRC of the extended records is missing.
@@ -429,6 +433,19 @@ number of packets may increase compression speed if the files being
 archived are larger than @w{64 MiB} compressed, but requires more memory.
 Valid values range from 1 to 1024. The default value is 64.
 
+@item --check-lib
+Compare the
+@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib}
+used to compile tarlz with the version actually being used and exit. Report
+any differences found. Exit with error status 1 if differences are found. A
+mismatch may indicate that lzlib is not correctly installed or that a
+different version of lzlib has been installed after compiling tarlz.
+@w{@samp{tarlz -v --check-lib}} shows the version of lzlib being used and
+the value of @samp{LZ_API_VERSION} (if defined).
+@ifnothtml
+@xref{Library version,,,lzlib}.
+@end ifnothtml
+
 @ignore
 @item --permissive
 Allow some violations of the archive format, like consecutive extended
@@ -613,8 +630,12 @@ protected by the CRC to guarante that corruption is always detected
 (except in case of CRC collision). A CRC was chosen because a checksum is
 too weak for a potentially large list of variable sized records. A checksum
 can't detect simple errors like the swapping of two bytes.
+
 @end table
 
+At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
+extended header keyword found in an archive, once per keyword.
+
 @sp 1
 @section Ustar header block
 
@@ -839,11 +860,16 @@ or less similar to any other tar and won't be described here. The
 interesting parts described here are those related to Multi-threaded processing.
 
 The structure of the part of tarlz performing Multi-threaded archive
-creation is somewhat similar to that of plzip with the added complication of
-the solidity levels. A grouper thread and several worker threads are
-created, acting the main thread as muxer (multiplexer) thread. A "packet
-courier" takes care of data transfers among threads and limits the maximum
-number of data blocks (packets) being processed simultaneously.
+creation is somewhat similar to that of
+@uref{http://www.nongnu.org/lzip/plzip.html#Program-design,,plzip} with the
+added complication of the solidity levels.
+@ifnothtml
+@xref{Program design,,,plzip}.
+@end ifnothtml
+A grouper thread and several worker threads are created, acting the main
+thread as muxer (multiplexer) thread. A "packet courier" takes care of data
+transfers among threads and limits the maximum number of data blocks
+(packets) being processed simultaneously.
 
 The grouper traverses the directory tree, groups together the metadata of
 the files to be archived in each lzip member, and distributes them to the
@@ -876,8 +902,7 @@ access files in the file system either to read them (diff) or write them
 ,--------,
 | file |<---> data to/from each worker below
 | system |
-`--------'
- ,------------,
+`--------' ,------------,
 ,-->| worker 0 |--,
 | `------------' |
 ,---------, | ,------------, | ,-------, ,--------,
@@ -941,8 +966,7 @@ decoding it safely in parallel.
 Tarlz is able to automatically decode aligned and unaligned multimember
 tar.lz archives, keeping backwards compatibility. If tarlz finds a member
 misalignment during multi-threaded decoding, it switches to single-threaded
-mode and continues decoding the archive. Currently only the options
-@samp{--diff} and @samp{--list} are able to do multi-threaded decoding.
+mode and continues decoding the archive.
 
 If the files in the archive are large, multi-threaded @samp{--list} on a
 regular (seekable) tar.lz archive can be hundreds of times faster than
@@ -959,7 +983,32 @@ time tarlz -tf silesia.tar.lz (0.020s)
 
 On the other hand, multi-threaded @samp{--list} won't detect corruption in
 the tar member data because it only decodes the part of each lzip member
-corresponding to the tar member header.
+corresponding to the tar member header. This is another reason why the tar
+headers must provide its own integrity checking.
+
+@sp 1
+@section Limitations of multi-threaded extraction
+
+Multi-threaded extraction may produce different output than single-threaded
+extraction in some cases:
+
+During multi-threaded extraction, several independent processes are
+simultaneously reading the archive and creating files in the file system. The
+archive is not read sequentially. As a consequence, any error or weirdness
+in the archive (like a corrupt member or an EOF block in the middle of the
+archive) won't be usually detected until part of the archive beyond that
+point has been processed.
+
+If the archive contains two or more tar members with the same name,
+single-threaded extraction extracts the members in the order they appear in
+the archive and leaves in the file system the last version of the file. But
+multi-threaded extraction may extract the members in any order and leave in
+the file system any version of the file nondeterministically. It is
+unspecified which of the tar members is extracted.
+
+If the same file is extracted through several paths (different member names
+resolve to the same file in the file system), the result is undefined.
+(Probably the resulting file will be mangled).
 
 
 @node Minimum archive sizes
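The File format hunks in tarlz.texi and tarlz.info add a rule: at verbosity level 1 or higher, tarlz prints a diagnostic for each unknown extended header keyword, once per keyword. The Python sketch below models only that reporting rule. Three parts of it are hypothetical and do not come from tarlz's sources:

* the set of "known" keywords,
* the message wording,
* the input shape, which is one list of keywords per extended header.

```python
# Minimal model of "one diagnostic per unknown extended header keyword".
# KNOWN_KEYWORDS is a hypothetical subset chosen for illustration only;
# it is not tarlz's actual list of recognized pax keywords.
KNOWN_KEYWORDS = frozenset({
    "path", "linkpath", "size", "uid", "gid",
    "uname", "gname", "mtime", "atime", "GNU.crc32",
})


def unknown_keyword_diagnostics(members, verbosity, known=KNOWN_KEYWORDS):
    """Return one message per distinct unknown keyword, in first-seen order.

    members   -- iterable of keyword lists, one list per extended header
    verbosity -- number of -v options given (0 means quiet)
    """
    if verbosity < 1:          # diagnostics are only shown with -v or more
        return []
    seen = set()
    messages = []
    for keywords in members:
        for keyword in keywords:
            if keyword in known or keyword in seen:
                continue       # recognized, or already reported once
            seen.add(keyword)
            # Hypothetical wording; tarlz's real message text may differ.
            messages.append(f"unknown extended header keyword '{keyword}'")
    return messages


if __name__ == "__main__":
    archive = [["path", "foo.bar"], ["foo.bar", "size", "baz"]]
    for line in unknown_keyword_diagnostics(archive, verbosity=1):
        print(line)
```

Run as a script, the sketch prints two diagnostics. One is for `foo.bar`, even though that keyword appears in both headers, and the other is for `baz`. The set of already-reported keywords is what turns "for each unknown keyword found" into "once per keyword".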
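The new section "Limitations of multi-threaded extraction" describes how tarlz handles several tar members with the same name:

* Single-threaded extraction leaves the last one in the file system.
* Multi-threaded extraction may leave any one of them.

The Python sketch below models that difference under two simplifying assumptions. A dictionary stands in for the file system, and a random permutation stands in for thread scheduling. It does not describe how tarlz actually schedules its workers.

```python
import random


def extract_in_order(members):
    """Model single-threaded extraction: members are written in archive
    order, so a later member with the same name overwrites an earlier one."""
    file_system = {}
    for name, data in members:
        file_system[name] = data
    return file_system


def extract_in_any_order(members, rng):
    """Model multi-threaded extraction as writes in an unspecified order.
    A shuffled copy of the member list stands in for thread scheduling."""
    order = list(members)
    rng.shuffle(order)
    return extract_in_order(order)


if __name__ == "__main__":
    archive = [("notes.txt", "version 1"), ("README", "x"),
               ("notes.txt", "version 2")]
    print(extract_in_order(archive)["notes.txt"])      # always the last one
    outcomes = {extract_in_any_order(archive, random.Random(seed))["notes.txt"]
                for seed in range(20)}
    print(sorted(outcomes))    # which versions appeared across 20 runs
```

The model ignores partial or interleaved writes on purpose. The manual leaves that case, the same file reached through several member names, as undefined.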