author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2019-02-14 05:12:04 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2019-02-14 05:12:04 +0000 |
commit | e9e3fad677df4b5329912c4dd611a8de620f15cb | |
tree | 3bc1ab40775cf56a94d8b9cd4ce14a71111b8545 /doc | |
parent | Adding upstream version 0.10a. | |
Adding upstream version 0.11. (upstream/0.11)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat
-rw-r--r-- doc/tarlz.1    |  28
-rw-r--r-- doc/tarlz.info | 202
-rw-r--r-- doc/tarlz.texi | 188
3 files changed, 265 insertions, 153 deletions
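The updated 'Invoking tarlz' text in this diff (tarlz.info and tarlz.texi) describes how member names are rewritten: on creation, leading and trailing slashes and any filename prefix containing a '..' component are removed; on extraction, members with a '..' component are skipped; on listing and extraction, leading './' strings are removed. The following is a minimal Python sketch of those rules under our reading of the text, not tarlz's own code. The function names are hypothetical, and stripping repeated './' prefixes is an assumption.

```python
def archive_member_name(filename: str) -> str:
    """Creation/appending: drop any prefix ending in a '..' component,
    then strip leading and trailing slashes (our reading of the text)."""
    parts = filename.split("/")
    # Position of the last '..' component, or -1 if there is none.
    last_dotdot = max((i for i, p in enumerate(parts) if p == ".."), default=-1)
    return "/".join(parts[last_dotdot + 1:]).strip("/")


def skip_on_extraction(member_name: str) -> bool:
    """Extraction: leading/trailing slashes are removed, and members
    with a '..' component in the filename are skipped."""
    return ".." in member_name.strip("/").split("/")


def strip_leading_dot_slash(name: str) -> str:
    """Listing/extraction: remove leading './' (repeated, by assumption)."""
    while name.startswith("./"):
        name = name[2:]
    return name


def matches(cmdline_name: str, member_name: str) -> bool:
    """Compare a FILES argument with a member name after normalizing both."""
    return strip_leading_dot_slash(cmdline_name) == strip_leading_dot_slash(member_name)


if __name__ == "__main__":
    print(archive_member_name("/usr/share/doc/"))           # usr/share/doc
    print(archive_member_name("a/../b/c"))                  # b/c
    print(matches("./bar", "bar"), matches("baz", "./baz"))  # True True
```

With this normalization, './bar' on the command line matches member 'bar' and 'baz' matches member './baz', which is the behavior the manual describes for 'tarlz -xf foo ./bar baz'.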
diff --git a/doc/tarlz.1 b/doc/tarlz.1 index c30c72f..82462cd 100644 --- a/doc/tarlz.1 +++ b/doc/tarlz.1 @@ -1,20 +1,20 @@ .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1. -.TH TARLZ "1" "February 2019" "tarlz 0.10a" "User Commands" +.TH TARLZ "1" "February 2019" "tarlz 0.11" "User Commands" .SH NAME tarlz \- creates tar archives with multimember lzip compression .SH SYNOPSIS .B tarlz [\fI\,options\/\fR] [\fI\,files\/\fR] .SH DESCRIPTION -Tarlz is a combined implementation of the tar archiver and the lzip -compressor. By default tarlz creates, lists and extracts archives in a -simplified posix pax format compressed with lzip on a per file basis. Each -tar member is compressed in its own lzip member, as well as the end\-of\-file -blocks. This method adds an indexed lzip layer on top of the tar archive, -making it possible to decode the archive safely in parallel. The resulting -multimember tar.lz archive is fully backward compatible with standard tar -tools like GNU tar, which treat it like any other tar.lz archive. Tarlz can -append files to the end of such compressed archives. +Tarlz is a massively parallel (multi\-threaded) combined implementation of +the tar archiver and the lzip compressor. Tarlz creates, lists and extracts +archives in a simplified posix pax format compressed with lzip, keeping the +alignment between tar members and lzip members. This method adds an indexed +lzip layer on top of the tar archive, making it possible to decode the +archive safely in parallel. The resulting multimember tar.lz archive is +fully backward compatible with standard tar tools like GNU tar, which treat +it like any other tar.lz archive. Tarlz can append files to the end of such +compressed archives. .PP The tarlz file format is a safe posix\-style backup format. 
In case of corruption, tarlz can extract all the undamaged members from the tar.lz @@ -46,7 +46,7 @@ change to directory <dir> use archive file <archive> .TP \fB\-n\fR, \fB\-\-threads=\fR<n> -set number of decompression threads [2] +set number of (de)compression threads [2] .TP \fB\-q\fR, \fB\-\-quiet\fR suppress all messages @@ -70,13 +70,13 @@ set compression level [default 6] create solidly compressed appendable archive .TP \fB\-\-bsolid\fR -create per\-data\-block compressed archive +create per block compressed archive (default) .TP \fB\-\-dsolid\fR -create per\-directory compressed archive +create per directory compressed archive .TP \fB\-\-no\-solid\fR -create per\-file compressed archive (default) +create per file compressed archive .TP \fB\-\-solid\fR create solidly compressed archive diff --git a/doc/tarlz.info b/doc/tarlz.info index bf1e1f5..288c441 100644 --- a/doc/tarlz.info +++ b/doc/tarlz.info @@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir) Tarlz Manual ************ -This manual is for Tarlz (version 0.10, 31 January 2019). +This manual is for Tarlz (version 0.11, 13 February 2019). * Menu: @@ -20,6 +20,7 @@ This manual is for Tarlz (version 0.10, 31 January 2019). * File format:: Detailed format of the compressed archive * Amendments to pax format:: The reasons for the differences with pax * Multi-threaded tar:: Limitations of parallel tar decoding +* Minimum archive sizes:: Sizes required for full multi-threaded speed * Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts @@ -36,23 +37,23 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T 1 Introduction ************** -Tarlz is a combined implementation of the tar archiver and the lzip -compressor. By default tarlz creates, lists and extracts archives in a -simplified posix pax format compressed with lzip on a per file basis. 
-Each tar member is compressed in its own lzip member, as well as the -end-of-file blocks. This method adds an indexed lzip layer on top of -the tar archive, making it possible to decode the archive safely in -parallel. The resulting multimember tar.lz archive is fully backward -compatible with standard tar tools like GNU tar, which treat it like -any other tar.lz archive. Tarlz can append files to the end of such -compressed archives. +Tarlz is a massively parallel (multi-threaded) combined implementation +of the tar archiver and the lzip compressor. Tarlz creates, lists and +extracts archives in a simplified posix pax format compressed with +lzip, keeping the alignment between tar members and lzip members. This +method adds an indexed lzip layer on top of the tar archive, making it +possible to decode the archive safely in parallel. The resulting +multimember tar.lz archive is fully backward compatible with standard +tar tools like GNU tar, which treat it like any other tar.lz archive. +Tarlz can append files to the end of such compressed archives. - Tarlz can create tar archives with four levels of compression -granularity; per file, per directory, appendable solid, and solid. + Tarlz can create tar archives with five levels of compression +granularity; per file, per block, per directory, appendable solid, and +solid. -Of course, compressing each file (or each directory) individually is -less efficient than compressing the whole tar archive, but it has the -following advantages: +Of course, compressing each file (or each directory) individually can't +achieve a compression ratio as high as compressing solidly the whole tar +archive, but it has the following advantages: * The resulting multimember tar.lz archive can be decompressed in parallel, multiplying the decompression speed. 
@@ -87,17 +88,23 @@ The format for running tarlz is: tarlz [OPTIONS] [FILES] -On archive creation or appending, tarlz removes leading and trailing -slashes from filenames, as well as filename prefixes containing a '..' -component. On extraction, archive members containing a '..' component -are skipped. Tarlz detects when the archive being created or enlarged -is among the files to be dumped, appended or concatenated, and skips it. +On archive creation or appending tarlz archives the files specified, but +removes from member names any leading and trailing slashes and any +filename prefixes containing a '..' component. On extraction, leading +and trailing slashes are also removed from member names, and archive +members containing a '..' component in the filename are skipped. Tarlz +detects when the archive being created or enlarged is among the files +to be dumped, appended or concatenated, and skips it. On extraction and listing, tarlz removes leading './' strings from member names in the archive or given in the command line, so that 'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from archive 'foo'. + If several compression levels or '--*solid' options are given, the +last setting is used. For example '-9 --solid --uncompressed -1' is +equivalent to '-1 --solid' + tarlz supports the following options: '-h' @@ -125,7 +132,7 @@ archive 'foo'. Set target size of input data blocks for the '--bsolid' option. Valid values range from 8 KiB to 1 GiB. Default value is two times the dictionary size, except for option '-0' where it defaults to - 1 MiB. + 1 MiB. *Note Minimum archive sizes::. '-c' '--create' @@ -142,6 +149,11 @@ archive 'foo'. relative to the then current working directory, perhaps changed by a previous '-C' option. + Note that a process can only have one current working directory + (CWD). Therefore multi-threading can't be used to create an + archive if a '-C' option appears after a relative filename in the + command line. 
+ '-f ARCHIVE' '--file=ARCHIVE' Use archive file ARCHIVE. '-' used as an ARCHIVE argument reads @@ -149,18 +161,21 @@ archive 'foo'. '-n N' '--threads=N' - Set the number of decompression threads, overriding the system's + Set the number of (de)compression threads, overriding the system's default. Valid values range from 0 to "as many as your system can support". A value of 0 disables threads entirely. If this option is not used, tarlz tries to detect the number of processors in the system and use it as default value. 'tarlz --help' shows the - system's default value. This option currently only has effect when - listing the contents of a multimember compressed archive. *Note + system's default value. See the note about multi-threaded archive + creation in the '-C' option above. Multi-threaded extraction of + files from an archive is not yet implemented. *Note Multi-threaded tar::. Note that the number of usable threads is limited during - decompression to the number of lzip members in the tar.lz archive, - which you can find by running 'lzip -lv archive.tar.lz'. + compression to ceil( uncompressed_size / data_size ) (*note + Minimum archive sizes::), and during decompression to the number + of lzip members in the tar.lz archive, which you can find by + running 'lzip -lv archive.tar.lz'. '-q' '--quiet' @@ -180,7 +195,7 @@ archive 'foo'. '-t' '--list' List the contents of an archive. If FILES are given, list only the - given FILES. + FILES given. '-v' '--verbose' @@ -189,7 +204,7 @@ archive 'foo'. '-x' '--extract' Extract files from an archive. If FILES are given, extract only - the given FILES. Else extract all the files in the archive. + the FILES given. Else extract all the files in the archive. '-0 .. -9' Set the compression level. The default compression level is '-6'. @@ -214,38 +229,43 @@ archive 'foo'. solid compression. 
All the files being added to the archive are compressed into a single lzip member, but the end-of-file blocks are compressed into a separate lzip member. This creates a solidly - compressed appendable archive. + compressed appendable archive. Solid archives can't be created + nor decoded in parallel. '--bsolid' - When creating or appending to a compressed archive, compress tar - members together in a lzip member until they approximate a target - uncompressed size. The size can't be exact because each solidly - compressed data block must contain an integer number of tar - members. This option improves compression efficiency for archives - with lots of small files. *Note --data-size::, to set the target + When creating or appending to a compressed archive, use block + compression. Tar members are compressed together in a lzip member + until they approximate a target uncompressed size. The size can't + be exact because each solidly compressed data block must contain + an integer number of tar members. Block compression is the default + because it improves compression ratio for archives with many files + smaller than the block size. This option allows tarlz revert to + default behavior if, for example, it is invoked through an alias + like 'tar='tarlz --solid''. *Note --data-size::, to set the target block size. '--dsolid' - When creating or appending to a compressed archive, use solid - compression for each directory especified in the command line. The - end-of-file blocks are compressed into a separate lzip member. This - creates a compressed appendable archive with a separate lzip - member for each top-level directory. + When creating or appending to a compressed archive, compress each + file specified in the command line separately in its own lzip + member, and use solid compression for each directory specified in + the command line. The end-of-file blocks are compressed into a + separate lzip member. 
This creates a compressed appendable archive + with a separate lzip member for each file or top-level directory + specified. '--no-solid' When creating or appending to a compressed archive, compress each - file separately. The end-of-file blocks are compressed into a - separate lzip member. This creates a compressed appendable archive - with a separate lzip member for each file. This option allows - tarlz revert to default behavior if, for example, tarlz is invoked - through an alias like 'tar='tarlz --solid''. + file separately in its own lzip member. The end-of-file blocks are + compressed into a separate lzip member. This creates a compressed + appendable archive with a lzip member for each file. '--solid' When creating or appending to a compressed archive, use solid - compression. The files being added to the archive, along with the + compression. The files being added to the archive, along with the end-of-file blocks, are compressed into a single lzip member. The resulting archive is not appendable. No more files can be later - appended to the archive. + appended to the archive. Solid archives can't be created nor + decoded in parallel. '--anonymous' Equivalent to '--owner=root --group=root'. @@ -341,9 +361,9 @@ blocks are either compressed in a separate lzip member or compressed along with the tar members contained in the last lzip member. The diagram below shows the correspondence between each tar member -(formed by one or two headers plus optional data) in the tar archive and -each lzip member in the resulting multimember tar.lz archive: *Note -File format: (lzip)File format. +(formed by one or two headers plus optional data) in the tar archive +and each lzip member in the resulting multimember tar.lz archive, when +per file compression is used: *Note File format: (lzip)File format. tar +========+======+=================+===============+========+======+========+ @@ -612,12 +632,12 @@ wasteful for a backup format. 
There is no portable way to tell what charset a text string is coded into. Therefore, tarlz stores all fields representing text strings -as-is, without conversion to UTF-8 nor any other transformation. This -prevents accidental double UTF-8 conversions. If the need arises this -behavior will be adjusted with a command line option in the future. +unmodified, without conversion to UTF-8 nor any other transformation. +This prevents accidental double UTF-8 conversions. If the need arises +this behavior will be adjusted with a command line option in the future. -File: tarlz.info, Node: Multi-threaded tar, Next: Examples, Prev: Amendments to pax format, Up: Top +File: tarlz.info, Node: Multi-threaded tar, Next: Minimum archive sizes, Prev: Amendments to pax format, Up: Top 5 Limitations of parallel tar decoding ************************************** @@ -659,15 +679,53 @@ sequential '--list' because, in addition to using several processors, it only needs to decompress part of each lzip member. See the following example listing the Silesia corpus on a dual core machine: - tarlz -9 -cf silesia.tar.lz silesia + tarlz -9 --no-solid -cf silesia.tar.lz silesia time lzip -cd silesia.tar.lz | tar -tf - (5.032s) time plzip -cd silesia.tar.lz | tar -tf - (3.256s) time tarlz -tf silesia.tar.lz (0.020s) -File: tarlz.info, Node: Examples, Next: Problems, Prev: Multi-threaded tar, Up: Top +File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded tar, Up: Top + +6 Minimum archive sizes required for multi-threaded block compression +********************************************************************* + +When creating or appending to a compressed archive using multi-threaded +block compression, tarlz puts tar members together in blocks and +compresses as many blocks simultaneously as worker threads are chosen, +creating a multimember compressed archive. 
+ + For this to work as expected (and roughly multiply the compression +speed by the number of available processors), the uncompressed archive +must be at least as large as the number of worker threads times the +block size (*note --data-size::). Else some processors will not get any +data to compress, and compression will be proportionally slower. The +maximum speed increase achievable on a given file is limited by the +ratio (uncompressed_size / data_size). For example, a tarball the size +of gcc or linux will scale up to 10 or 12 processors at level -9. + + The following table shows the minimum uncompressed archive size +needed for full use of N processors at a given compression level, using +the default data size for each level: + +Processors 2 4 8 16 64 256 +------------------------------------------------------------------ +Level +-0 2 MiB 4 MiB 8 MiB 16 MiB 64 MiB 256 MiB +-1 4 MiB 8 MiB 16 MiB 32 MiB 128 MiB 512 MiB +-2 6 MiB 12 MiB 24 MiB 48 MiB 192 MiB 768 MiB +-3 8 MiB 16 MiB 32 MiB 64 MiB 256 MiB 1 GiB +-4 12 MiB 24 MiB 48 MiB 96 MiB 384 MiB 1.5 GiB +-5 16 MiB 32 MiB 64 MiB 128 MiB 512 MiB 2 GiB +-6 32 MiB 64 MiB 128 MiB 256 MiB 1 GiB 4 GiB +-7 64 MiB 128 MiB 256 MiB 512 MiB 2 GiB 8 GiB +-8 96 MiB 192 MiB 384 MiB 768 MiB 3 GiB 12 GiB +-9 128 MiB 256 MiB 512 MiB 1 GiB 4 GiB 16 GiB + + +File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top -6 A small tutorial with examples +7 A small tutorial with examples ******************************** Example 1: Create a multimember compressed archive 'archive.tar.lz' @@ -725,7 +783,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top -7 Reporting bugs +8 Reporting bugs **************** There are probably bugs in tarlz. There are certainly errors and @@ -754,6 +812,7 @@ Concept index * getting help: Problems. (line 6) * introduction: Introduction. (line 6) * invoking: Invoking tarlz. 
(line 6) +* minimum archive sizes: Minimum archive sizes. (line 6) * options: Invoking tarlz. (line 6) * usage: Invoking tarlz. (line 6) * version: Invoking tarlz. (line 6) @@ -762,18 +821,19 @@ Concept index Tag Table: Node: Top223 -Node: Introduction1013 -Node: Invoking tarlz3125 -Ref: --data-size4717 -Node: File format11536 -Ref: key_crc3216321 -Node: Amendments to pax format21738 -Ref: crc3222262 -Ref: flawed-compat23287 -Node: Multi-threaded tar25649 -Node: Examples28164 -Node: Problems29830 -Node: Concept index30356 +Node: Introduction1089 +Node: Invoking tarlz3218 +Ref: --data-size5097 +Node: File format12673 +Ref: key_crc3217493 +Node: Amendments to pax format22910 +Ref: crc3223434 +Ref: flawed-compat24459 +Node: Multi-threaded tar26826 +Node: Minimum archive sizes29365 +Node: Examples31495 +Node: Problems33164 +Node: Concept index33690 End Tag Table diff --git a/doc/tarlz.texi b/doc/tarlz.texi index 2ab37fb..6026fe3 100644 --- a/doc/tarlz.texi +++ b/doc/tarlz.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 31 January 2019 -@set VERSION 0.10 +@set UPDATED 13 February 2019 +@set VERSION 0.11 @dircategory Data Compression @direntry @@ -40,6 +40,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}). * File format:: Detailed format of the compressed archive * Amendments to pax format:: The reasons for the differences with pax * Multi-threaded tar:: Limitations of parallel tar decoding +* Minimum archive sizes:: Sizes required for full multi-threaded speed * Examples:: A small tutorial with examples * Problems:: Reporting bugs * Concept index:: Index of concepts @@ -56,25 +57,24 @@ to copy, distribute and modify it. @chapter Introduction @cindex introduction -@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a combined -implementation of the tar archiver and the -@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. 
By default -tarlz creates, lists and extracts archives in a simplified posix pax format -compressed with lzip on a per file basis. Each tar member is compressed in -its own lzip member, as well as the end-of-file blocks. This method adds an -indexed lzip layer on top of the tar archive, making it possible to decode -the archive safely in parallel. The resulting multimember tar.lz archive is -fully backward compatible with standard tar tools like GNU tar, which treat -it like any other tar.lz archive. Tarlz can append files to the end of such -compressed archives. - -Tarlz can create tar archives with four levels of compression granularity; -per file, per directory, appendable solid, and solid. +@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel +(multi-threaded) combined implementation of the tar archiver and the +@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates, +lists and extracts archives in a simplified posix pax format compressed with +lzip, keeping the alignment between tar members and lzip members. This +method adds an indexed lzip layer on top of the tar archive, making it +possible to decode the archive safely in parallel. The resulting multimember +tar.lz archive is fully backward compatible with standard tar tools like GNU +tar, which treat it like any other tar.lz archive. Tarlz can append files to +the end of such compressed archives. + +Tarlz can create tar archives with five levels of compression granularity; +per file, per block, per directory, appendable solid, and solid. 
@noindent -Of course, compressing each file (or each directory) individually is -less efficient than compressing the whole tar archive, but it has the -following advantages: +Of course, compressing each file (or each directory) individually can't +achieve a compression ratio as high as compressing solidly the whole tar +archive, but it has the following advantages: @itemize @bullet @item @@ -120,18 +120,23 @@ tarlz [@var{options}] [@var{files}] @end example @noindent -On archive creation or appending, tarlz removes leading and trailing -slashes from filenames, as well as filename prefixes containing a -@samp{..} component. On extraction, archive members containing a -@samp{..} component are skipped. Tarlz detects when the archive being -created or enlarged is among the files to be dumped, appended or -concatenated, and skips it. +On archive creation or appending tarlz archives the files specified, but +removes from member names any leading and trailing slashes and any filename +prefixes containing a @samp{..} component. On extraction, leading and +trailing slashes are also removed from member names, and archive members +containing a @samp{..} component in the filename are skipped. Tarlz detects +when the archive being created or enlarged is among the files to be dumped, +appended or concatenated, and skips it. On extraction and listing, tarlz removes leading @samp{./} strings from member names in the archive or given in the command line, so that @w{@code{tarlz -xf foo ./bar baz}} extracts members @samp{bar} and @samp{./baz} from archive @samp{foo}. +If several compression levels or @samp{--*solid} options are given, the last +setting is used. For example @w{@samp{-9 --solid --uncompressed -1}} is +equivalent to @samp{-1 --solid} + tarlz supports the following options: @table @code @@ -160,6 +165,7 @@ specified. Tarlz can't concatenate uncompressed tar archives. Set target size of input data blocks for the @samp{--bsolid} option. 
Valid values range from @w{8 KiB} to @w{1 GiB}. Default value is two times the dictionary size, except for option @samp{-0} where it defaults to @w{1 MiB}. +@xref{Minimum archive sizes}. @item -c @itemx --create @@ -176,6 +182,10 @@ extraction. Listing ignores any @samp{-C} options specified. @var{dir} is relative to the then current working directory, perhaps changed by a previous @samp{-C} option. +Note that a process can only have one current working directory (CWD). +Therefore multi-threading can't be used to create an archive if a @samp{-C} +option appears after a relative filename in the command line. + @item -f @var{archive} @itemx --file=@var{archive} Use archive file @var{archive}. @samp{-} used as an @var{archive} @@ -183,17 +193,19 @@ argument reads from standard input or writes to standard output. @item -n @var{n} @itemx --threads=@var{n} -Set the number of decompression threads, overriding the system's default. +Set the number of (de)compression threads, overriding the system's default. Valid values range from 0 to "as many as your system can support". A value of 0 disables threads entirely. If this option is not used, tarlz tries to detect the number of processors in the system and use it as default value. -@w{@samp{tarlz --help}} shows the system's default value. This option -currently only has effect when listing the contents of a multimember -compressed archive. @xref{Multi-threaded tar}. +@w{@samp{tarlz --help}} shows the system's default value. See the note about +multi-threaded archive creation in the @samp{-C} option above. +Multi-threaded extraction of files from an archive is not yet implemented. +@xref{Multi-threaded tar}. -Note that the number of usable threads is limited during decompression to -the number of lzip members in the tar.lz archive, which you can find by -running @w{@code{lzip -lv archive.tar.lz}}. 
+Note that the number of usable threads is limited during compression to +@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}), +and during decompression to the number of lzip members in the tar.lz +archive, which you can find by running @w{@code{lzip -lv archive.tar.lz}}. @item -q @itemx --quiet @@ -213,7 +225,7 @@ to an uncompressed tar archive. @item -t @itemx --list List the contents of an archive. If @var{files} are given, list only the -given @var{files}. +@var{files} given. @item -v @itemx --verbose @@ -222,7 +234,7 @@ Verbosely list files processed. @item -x @itemx --extract Extract files from an archive. If @var{files} are given, extract only -the given @var{files}. Else extract all the files in the archive. +the @var{files} given. Else extract all the files in the archive. @item -0 .. -9 Set the compression level. The default compression level is @samp{-6}. @@ -245,40 +257,42 @@ it creates, reducing the amount of memory required for decompression. @item --asolid When creating or appending to a compressed archive, use appendable solid -compression. All the files being added to the archive are compressed -into a single lzip member, but the end-of-file blocks are compressed -into a separate lzip member. This creates a solidly compressed -appendable archive. +compression. All the files being added to the archive are compressed into a +single lzip member, but the end-of-file blocks are compressed into a +separate lzip member. This creates a solidly compressed appendable archive. +Solid archives can't be created nor decoded in parallel. @item --bsolid -When creating or appending to a compressed archive, compress tar members -together in a lzip member until they approximate a target uncompressed size. -The size can't be exact because each solidly compressed data block must -contain an integer number of tar members. This option improves compression -efficiency for archives with lots of small files. 
@xref{--data-size}, to set -the target block size. +When creating or appending to a compressed archive, use block compression. +Tar members are compressed together in a lzip member until they approximate +a target uncompressed size. The size can't be exact because each solidly +compressed data block must contain an integer number of tar members. Block +compression is the default because it improves compression ratio for +archives with many files smaller than the block size. This option allows +tarlz revert to default behavior if, for example, it is invoked through an +alias like @code{tar='tarlz --solid'}. @xref{--data-size}, to set the target +block size. @item --dsolid -When creating or appending to a compressed archive, use solid -compression for each directory especified in the command line. The -end-of-file blocks are compressed into a separate lzip member. This -creates a compressed appendable archive with a separate lzip member for -each top-level directory. +When creating or appending to a compressed archive, compress each file +specified in the command line separately in its own lzip member, and use +solid compression for each directory specified in the command line. The +end-of-file blocks are compressed into a separate lzip member. This creates +a compressed appendable archive with a separate lzip member for each file or +top-level directory specified. @item --no-solid When creating or appending to a compressed archive, compress each file -separately. The end-of-file blocks are compressed into a separate lzip -member. This creates a compressed appendable archive with a separate -lzip member for each file. This option allows tarlz revert to default -behavior if, for example, tarlz is invoked through an alias like -@code{tar='tarlz --solid'}. +separately in its own lzip member. The end-of-file blocks are compressed +into a separate lzip member. This creates a compressed appendable archive +with a lzip member for each file. 
@item --solid -When creating or appending to a compressed archive, use solid -compression. The files being added to the archive, along with the -end-of-file blocks, are compressed into a single lzip member. The -resulting archive is not appendable. No more files can be later appended -to the archive. +When creating or appending to a compressed archive, use solid compression. +The files being added to the archive, along with the end-of-file blocks, are +compressed into a single lzip member. The resulting archive is not +appendable. No more files can be later appended to the archive. Solid +archives can't be created nor decoded in parallel. @item --anonymous Equivalent to @samp{--owner=root --group=root}. @@ -388,11 +402,11 @@ binary zeros, interpreted as an end-of-archive indicator. These EOF blocks are either compressed in a separate lzip member or compressed along with the tar members contained in the last lzip member. -The diagram below shows the correspondence between each tar member -(formed by one or two headers plus optional data) in the tar archive and -each +The diagram below shows the correspondence between each tar member (formed +by one or two headers plus optional data) in the tar archive and each @uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#File-format,,lzip member} -in the resulting multimember tar.lz archive: +in the resulting multimember tar.lz archive, when per file compression is +used: @ifnothtml @xref{File format,,,lzip}. @end ifnothtml @@ -672,10 +686,10 @@ format. @section Avoid misconversions to/from UTF-8 There is no portable way to tell what charset a text string is coded into. -Therefore, tarlz stores all fields representing text strings as-is, without -conversion to UTF-8 nor any other transformation. This prevents accidental -double UTF-8 conversions. If the need arises this behavior will be adjusted -with a command line option in the future. 
+Therefore, tarlz stores all fields representing text strings unmodified, +without conversion to UTF-8 nor any other transformation. This prevents +accidental double UTF-8 conversions. If the need arises this behavior will +be adjusted with a command line option in the future. @node Multi-threaded tar @@ -717,13 +731,51 @@ it only needs to decompress part of each lzip member. See the following example listing the Silesia corpus on a dual core machine: @example -tarlz -9 -cf silesia.tar.lz silesia +tarlz -9 --no-solid -cf silesia.tar.lz silesia time lzip -cd silesia.tar.lz | tar -tf - (5.032s) time plzip -cd silesia.tar.lz | tar -tf - (3.256s) time tarlz -tf silesia.tar.lz (0.020s) @end example +@node Minimum archive sizes +@chapter Minimum archive sizes required for multi-threaded block compression +@cindex minimum archive sizes + +When creating or appending to a compressed archive using multi-threaded +block compression, tarlz puts tar members together in blocks and compresses +as many blocks simultaneously as worker threads are chosen, creating a +multimember compressed archive. + +For this to work as expected (and roughly multiply the compression speed by +the number of available processors), the uncompressed archive must be at +least as large as the number of worker threads times the block size +(@pxref{--data-size}). Else some processors will not get any data to +compress, and compression will be proportionally slower. The maximum speed +increase achievable on a given file is limited by the ratio +@w{(uncompressed_size / data_size)}. For example, a tarball the size of gcc +or linux will scale up to 10 or 12 processors at level -9. 
+ +The following table shows the minimum uncompressed archive size needed for +full use of N processors at a given compression level, using the default +data size for each level: + +@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} +@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256 +@item Level +@item -0 @tab 2 MiB @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 64 MiB @tab 256 MiB +@item -1 @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 128 MiB @tab 512 MiB +@item -2 @tab 6 MiB @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 192 MiB @tab 768 MiB +@item -3 @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 256 MiB @tab 1 GiB +@item -4 @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 96 MiB @tab 384 MiB @tab 1.5 GiB +@item -5 @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 512 MiB @tab 2 GiB +@item -6 @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 1 GiB @tab 4 GiB +@item -7 @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 2 GiB @tab 8 GiB +@item -8 @tab 96 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab 3 GiB @tab 12 GiB +@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 1 GiB @tab 4 GiB @tab 16 GiB +@end multitable + + @node Examples @chapter A small tutorial with examples @cindex examples |
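The new 'Minimum archive sizes' chapter reduces to two formulas: full use of N worker threads needs at least N times the block size (data_size) of uncompressed input, and at most ceil( uncompressed_size / data_size ) threads get any work. Below is a minimal Python sketch of that arithmetic, assuming per-level default data sizes read off the table's 2-processor column (half the minimum size, which agrees with "two times the dictionary size", and 1 MiB for '-0'). The function names are hypothetical; this is not tarlz code.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB

# Assumed default --data-size per level, in MiB: half of the table's
# "2 processors" column (twice the dictionary size; 1 MiB for -0).
DEFAULT_DATA_SIZE_MIB = {0: 1, 1: 2, 2: 3, 3: 4, 4: 6,
                         5: 8, 6: 16, 7: 32, 8: 48, 9: 64}


def min_archive_size(level: int, threads: int) -> int:
    """Uncompressed size (bytes) needed so every thread gets one block."""
    return threads * DEFAULT_DATA_SIZE_MIB[level] * MiB


def usable_threads(uncompressed_size: int, data_size: int, requested: int) -> int:
    """Threads that get work: ceil(uncompressed_size / data_size), capped by -n."""
    blocks = -(-uncompressed_size // data_size)  # integer ceiling division
    return min(requested, blocks)


def fmt(nbytes: int) -> str:
    """Format a size the way the table does (MiB below 1 GiB, else GiB)."""
    if nbytes >= GiB:
        return f"{nbytes / GiB:g} GiB"
    return f"{nbytes // MiB} MiB"


if __name__ == "__main__":
    # Rebuild the table rows for the processor counts it lists.
    cols = (2, 4, 8, 16, 64, 256)
    for level in range(10):
        row = "  ".join(fmt(min_archive_size(level, n)).rjust(8) for n in cols)
        print(f"-{level}  {row}")
```

Under these assumptions, a 600 MiB tarball compressed at '-9' (64 MiB blocks) splits into 10 blocks, so at most 10 threads are kept busy however large '-n' is.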