Diffstat (limited to 'doc')
-rw-r--r-- | doc/tarlz.1 | 8
-rw-r--r-- | doc/tarlz.info | 79
-rw-r--r-- | doc/tarlz.texi | 82
3 files changed, 116 insertions, 53 deletions
diff --git a/doc/tarlz.1 b/doc/tarlz.1 index b83a7e6..9450c57 100644 --- a/doc/tarlz.1 +++ b/doc/tarlz.1 @@ -1,5 +1,5 @@ .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1. -.TH TARLZ "1" "January 2019" "tarlz 0.9" "User Commands" +.TH TARLZ "1" "January 2019" "tarlz 0.10" "User Commands" .SH NAME tarlz \- creates tar archives with multimember lzip compression .SH SYNOPSIS @@ -33,6 +33,9 @@ output version information and exit \fB\-A\fR, \fB\-\-concatenate\fR append tar.lz archives to the end of an archive .TP +\fB\-B\fR, \fB\-\-data\-size=\fR<bytes> +set target size of input data blocks [2x8=16 MiB] +.TP \fB\-c\fR, \fB\-\-create\fR create a new archive .TP @@ -66,6 +69,9 @@ set compression level [default 6] \fB\-\-asolid\fR create solidly compressed appendable archive .TP +\fB\-\-bsolid\fR +create per\-data\-block compressed archive +.TP \fB\-\-dsolid\fR create per\-directory compressed archive .TP diff --git a/doc/tarlz.info b/doc/tarlz.info index 7f90766..bf1e1f5 100644 --- a/doc/tarlz.info +++ b/doc/tarlz.info @@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir) Tarlz Manual ************ -This manual is for Tarlz (version 0.9, 22 January 2019). +This manual is for Tarlz (version 0.10, 31 January 2019). * Menu: @@ -120,6 +120,13 @@ archive 'foo'. the archive if no FILES have been specified. Tarlz can't concatenate uncompressed tar archives. +'-B BYTES' +'--data-size=BYTES' + Set target size of input data blocks for the '--bsolid' option. + Valid values range from 8 KiB to 1 GiB. Default value is two times + the dictionary size, except for option '-0' where it defaults to + 1 MiB. + '-c' '--create' Create a new archive from FILES. @@ -190,6 +197,18 @@ archive 'foo'. members it creates, reducing the amount of memory required for decompression. 
+ Level Dictionary size Match length limit + -0 64 KiB 16 bytes + -1 1 MiB 5 bytes + -2 1.5 MiB 6 bytes + -3 2 MiB 8 bytes + -4 3 MiB 12 bytes + -5 4 MiB 20 bytes + -6 8 MiB 36 bytes + -7 16 MiB 68 bytes + -8 24 MiB 132 bytes + -9 32 MiB 273 bytes + '--asolid' When creating or appending to a compressed archive, use appendable solid compression. All the files being added to the archive are @@ -197,6 +216,15 @@ archive 'foo'. are compressed into a separate lzip member. This creates a solidly compressed appendable archive. +'--bsolid' + When creating or appending to a compressed archive, compress tar + members together in a lzip member until they approximate a target + uncompressed size. The size can't be exact because each solidly + compressed data block must contain an integer number of tar + members. This option improves compression efficiency for archives + with lots of small files. *Note --data-size::, to set the target + block size. + '--dsolid' When creating or appending to a compressed archive, use solid compression for each directory especified in the command line. The @@ -560,13 +588,13 @@ old tar programs from extracting the extended records as a file in the wrong place. Tarlz also sets to zero those fields of the ustar header overridden by extended records. - If the extended header is needed because of a file size larger than -8 GiB, the size field will be unable to contain the full size of the -file. Therefore the file may be partially extracted, and the tool will -issue a spurious warning about a corrupt header at the point where it -thinks the file ends. Setting to zero the overridden size in the ustar -header at least prevents the partial extraction and makes obvious that -the file has been truncated. + If an extended header is required for any reason (for example a file +size larger than 8 GiB or a link name longer than 100 bytes), tarlz +moves the filename also to the extended header to prevent an ustar tool +from trying to extract the file or link. 
This also makes easier during +parallel extraction or listing the detection of a tar member split +between two lzip members at the boundary between the extended header +and the ustar header. 4.3 As simple as possible (but not simpler) @@ -626,10 +654,10 @@ to single-threaded mode and continues decoding the archive. Currently only the '--list' option is able to do multi-threaded decoding. If the files in the archive are large, multi-threaded '--list' on a -regular tar.lz archive can be hundreds of times faster than sequential -'--list' because, in addition to using several processors, it only -needs to decompress part of each lzip member. See the following example -listing the Silesia corpus on a dual core machine: +regular (seekable) tar.lz archive can be hundreds of times faster than +sequential '--list' because, in addition to using several processors, +it only needs to decompress part of each lzip member. See the following +example listing the Silesia corpus on a dual core machine: tarlz -9 -cf silesia.tar.lz silesia time lzip -cd silesia.tar.lz | tar -tf - (5.032s) @@ -690,9 +718,9 @@ Example 7: Extract files 'a' and 'c' from archive 'archive.tar.lz'. Example 8: Copy the contents of directory 'sourcedir' to the directory -'targetdir'. +'destdir'. - tarlz -C sourcedir -c . | tarlz -C targetdir -x + tarlz -C sourcedir -c . 
| tarlz -C destdir -x File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top @@ -734,17 +762,18 @@ Concept index Tag Table: Node: Top223 -Node: Introduction1012 -Node: Invoking tarlz3124 -Node: File format10384 -Ref: key_crc3215169 -Node: Amendments to pax format20586 -Ref: crc3221110 -Ref: flawed-compat22135 -Node: Multi-threaded tar24508 -Node: Examples27012 -Node: Problems28682 -Node: Concept index29208 +Node: Introduction1013 +Node: Invoking tarlz3125 +Ref: --data-size4717 +Node: File format11536 +Ref: key_crc3216321 +Node: Amendments to pax format21738 +Ref: crc3222262 +Ref: flawed-compat23287 +Node: Multi-threaded tar25649 +Node: Examples28164 +Node: Problems29830 +Node: Concept index30356 End Tag Table diff --git a/doc/tarlz.texi b/doc/tarlz.texi index d9bdc14..2ab37fb 100644 --- a/doc/tarlz.texi +++ b/doc/tarlz.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 22 January 2019 -@set VERSION 0.9 +@set UPDATED 31 January 2019 +@set VERSION 0.10 @dircategory Data Compression @direntry @@ -89,7 +89,7 @@ member) just like to an uncompressed tar archive. It is a safe posix-style backup format. In case of corruption, tarlz can extract all the undamaged members from the tar.lz archive, skipping over the damaged members, just like the standard -(uncompressed) tar. Moreover, the option @code{--keep-damaged} can be +(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be used to recover as much data as possible from each damaged member, and lziprecover can be used to recover some of the damaged members. @@ -154,6 +154,13 @@ end-of-file blocks are removed as each new archive is concatenated. Exit with status 0 without modifying the archive if no @var{files} have been specified. Tarlz can't concatenate uncompressed tar archives. +@anchor{--data-size} +@item -B @var{bytes} +@itemx --data-size=@var{bytes} +Set target size of input data blocks for the @samp{--bsolid} option. 
Valid +values range from @w{8 KiB} to @w{1 GiB}. Default value is two times the +dictionary size, except for option @samp{-0} where it defaults to @w{1 MiB}. + @item -c @itemx --create Create a new archive from @var{files}. @@ -161,13 +168,13 @@ Create a new archive from @var{files}. @item -C @var{dir} @itemx --directory=@var{dir} Change to directory @var{dir}. When creating or appending, the position -of each @code{-C} option in the command line is significant; it will +of each @samp{-C} option in the command line is significant; it will change the current working directory for the following @var{files} until -a new @code{-C} option appears in the command line. When extracting, all -the @code{-C} options are executed in sequence before starting the -extraction. Listing ignores any @code{-C} options specified. @var{dir} +a new @samp{-C} option appears in the command line. When extracting, all +the @samp{-C} options are executed in sequence before starting the +extraction. Listing ignores any @samp{-C} options specified. @var{dir} is relative to the then current working directory, perhaps changed by a -previous @code{-C} option. +previous @samp{-C} option. @item -f @var{archive} @itemx --file=@var{archive} @@ -222,6 +229,20 @@ Set the compression level. The default compression level is @samp{-6}. Like lzip, tarlz also minimizes the dictionary size of the lzip members it creates, reducing the amount of memory required for decompression. 
+@multitable {Level} {Dictionary size} {Match length limit} +@item Level @tab Dictionary size @tab Match length limit +@item -0 @tab 64 KiB @tab 16 bytes +@item -1 @tab 1 MiB @tab 5 bytes +@item -2 @tab 1.5 MiB @tab 6 bytes +@item -3 @tab 2 MiB @tab 8 bytes +@item -4 @tab 3 MiB @tab 12 bytes +@item -5 @tab 4 MiB @tab 20 bytes +@item -6 @tab 8 MiB @tab 36 bytes +@item -7 @tab 16 MiB @tab 68 bytes +@item -8 @tab 24 MiB @tab 132 bytes +@item -9 @tab 32 MiB @tab 273 bytes +@end multitable + @item --asolid When creating or appending to a compressed archive, use appendable solid compression. All the files being added to the archive are compressed @@ -229,6 +250,14 @@ into a single lzip member, but the end-of-file blocks are compressed into a separate lzip member. This creates a solidly compressed appendable archive. +@item --bsolid +When creating or appending to a compressed archive, compress tar members +together in a lzip member until they approximate a target uncompressed size. +The size can't be exact because each solidly compressed data block must +contain an integer number of tar members. This option improves compression +efficiency for archives with lots of small files. @xref{--data-size}, to set +the target block size. + @item --dsolid When creating or appending to a compressed archive, use solid compression for each directory especified in the command line. The @@ -252,7 +281,7 @@ resulting archive is not appendable. No more files can be later appended to the archive. @item --anonymous -Equivalent to @code{--owner=root --group=root}. +Equivalent to @samp{--owner=root --group=root}. @item --owner=@var{owner} When creating or appending, use @var{owner} for files added to the @@ -287,7 +316,7 @@ keyword appearing in the same block of extended records. @end ignore @item --uncompressed -With @code{--create}, don't compress the created tar archive. Create an +With @samp{--create}, don't compress the created tar archive. Create an uncompressed tar archive instead. 
@end table @@ -350,7 +379,7 @@ Zero or more blocks that contain the contents of the file. @end itemize Each tar member must be contiguously stored in a lzip member for the -parallel decoding operations like @code{--list} to work. If any tar member +parallel decoding operations like @samp{--list} to work. If any tar member is split over two or more lzip members, the archive must be decoded sequentially. @xref{Multi-threaded tar}. @@ -381,7 +410,7 @@ tar.lz @end verbatim @ignore -When @code{--permissive} is used, the following violations of the +When @samp{--permissive} is used, the following violations of the archive format are allowed:@* If several extended headers precede an ustar header, only the last extended header takes effect. The other extended headers are ignored. @@ -623,13 +652,12 @@ programs from extracting the extended records as a file in the wrong place. Tarlz also sets to zero those fields of the ustar header overridden by extended records. -If the extended header is needed because of a file size larger than -@w{8 GiB}, the size field will be unable to contain the full size of the -file. Therefore the file may be partially extracted, and the tool will issue -a spurious warning about a corrupt header at the point where it thinks the -file ends. Setting to zero the overridden size in the ustar header at least -prevents the partial extraction and makes obvious that the file has been -truncated. +If an extended header is required for any reason (for example a file size +larger than @w{8 GiB} or a link name longer than 100 bytes), tarlz moves the +filename also to the extended header to prevent an ustar tool from trying to +extract the file or link. This also makes easier during parallel extraction +or listing the detection of a tar member split between two lzip members at +the boundary between the extended header and the ustar header. @sp 1 @section As simple as possible (but not simpler) @@ -679,14 +707,14 @@ decoding it safely in parallel. 
Tarlz is able to automatically decode aligned and unaligned multimember tar.lz archives, keeping backwards compatibility. If tarlz finds a member misalignment during multi-threaded decoding, it switches to single-threaded -mode and continues decoding the archive. Currently only the @code{--list} +mode and continues decoding the archive. Currently only the @samp{--list} option is able to do multi-threaded decoding. -If the files in the archive are large, multi-threaded @code{--list} on a -regular tar.lz archive can be hundreds of times faster than sequential -@code{--list} because, in addition to using several processors, it only -needs to decompress part of each lzip member. See the following example -listing the Silesia corpus on a dual core machine: +If the files in the archive are large, multi-threaded @samp{--list} on a +regular (seekable) tar.lz archive can be hundreds of times faster than +sequential @samp{--list} because, in addition to using several processors, +it only needs to decompress part of each lzip member. See the following +example listing the Silesia corpus on a dual core machine: @example tarlz -9 -cf silesia.tar.lz silesia @@ -772,10 +800,10 @@ tarlz -xf archive.tar.lz a c @sp 1 @noindent Example 8: Copy the contents of directory @samp{sourcedir} to the -directory @samp{targetdir}. +directory @samp{destdir}. @example -tarlz -C sourcedir -c . | tarlz -C targetdir -x +tarlz -C sourcedir -c . | tarlz -C destdir -x @end example |
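
The default for '--data-size' described in this diff (two times the dictionary size, except for '-0' where it is 1 MiB, with valid values from 8 KiB to 1 GiB) can be worked through as a small sketch. This is only an illustration of the documented rule, not tarlz's actual source code; the names DICTIONARY_SIZE, default_data_size and is_valid_data_size are hypothetical, and the dictionary sizes are copied from the level table added by this diff.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Dictionary size per compression level, copied from the table in the diff.
DICTIONARY_SIZE = {
    0: 64 * KIB,
    1: 1 * MIB,
    2: 3 * MIB // 2,  # 1.5 MiB
    3: 2 * MIB,
    4: 3 * MIB,
    5: 4 * MIB,
    6: 8 * MIB,
    7: 16 * MIB,
    8: 24 * MIB,
    9: 32 * MIB,
}

# Documented range of valid values for --data-size.
MIN_DATA_SIZE = 8 * KIB
MAX_DATA_SIZE = 1 * GIB


def default_data_size(level=6):
    """Return the documented default --data-size in bytes for a level.

    Hypothetical helper: two times the dictionary size, except for
    level 0, where the documentation gives a default of 1 MiB.
    """
    if level == 0:
        return 1 * MIB
    return 2 * DICTIONARY_SIZE[level]


def is_valid_data_size(size):
    """Check a size in bytes against the documented 8 KiB to 1 GiB range."""
    return MIN_DATA_SIZE <= size <= MAX_DATA_SIZE


if __name__ == "__main__":
    for level in sorted(DICTIONARY_SIZE):
        print(level, default_data_size(level) // KIB, "KiB")
```

For the default level '-6' this gives 2 x 8 MiB = 16 MiB, which agrees with the "[2x8=16 MiB]" note in the man page hunk.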