Merging upstream version 0.11.

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2019-02-14 05:12:08 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2019-02-14 05:12:08 +0000
commit: b276e37c698f0f64669894eac75044f3ab0fd4fe (patch)
tree: 2d9c4ef85d4cd7573088064af7be26631a65f51c /doc
parent: Releasing debian version 0.10a-3. (diff)
download: tarlz-b276e37c698f0f64669894eac75044f3ab0fd4fe.tar.xz
tarlz-b276e37c698f0f64669894eac75044f3ab0fd4fe.zip
3 files changed, 265 insertions, 153 deletions
diff --git a/doc/tarlz.1 b/doc/tarlz.1
index c30c72f..82462cd 100644
--- a/doc/tarlz.1
+++ b/doc/tarlz.1
@@ -1,20 +1,20 @@
 .\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.46.1.
-.TH TARLZ "1" "February 2019" "tarlz 0.10a" "User Commands"
+.TH TARLZ "1" "February 2019" "tarlz 0.11" "User Commands"
 .SH NAME
 tarlz \- creates tar archives with multimember lzip compression
 .SH SYNOPSIS
 .B tarlz
 [\fI\,options\/\fR] [\fI\,files\/\fR]
 .SH DESCRIPTION
-Tarlz is a combined implementation of the tar archiver and the lzip
-compressor. By default tarlz creates, lists and extracts archives in a
-simplified posix pax format compressed with lzip on a per file basis. Each
-tar member is compressed in its own lzip member, as well as the end\-of\-file
-blocks. This method adds an indexed lzip layer on top of the tar archive,
-making it possible to decode the archive safely in parallel. The resulting
-multimember tar.lz archive is fully backward compatible with standard tar
-tools like GNU tar, which treat it like any other tar.lz archive. Tarlz can
-append files to the end of such compressed archives.
+Tarlz is a massively parallel (multi\-threaded) combined implementation of
+the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
+archives in a simplified posix pax format compressed with lzip, keeping the
+alignment between tar members and lzip members. This method adds an indexed
+lzip layer on top of the tar archive, making it possible to decode the
+archive safely in parallel. The resulting multimember tar.lz archive is
+fully backward compatible with standard tar tools like GNU tar, which treat
+it like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
 .PP
 The tarlz file format is a safe posix\-style backup format. In case of
 corruption, tarlz can extract all the undamaged members from the tar.lz
@@ -46,7 +46,7 @@ change to directory <dir>
 use archive file <archive>
 .TP
 \fB\-n\fR, \fB\-\-threads=\fR<n>
-set number of decompression threads [2]
+set number of (de)compression threads [2]
 .TP
 \fB\-q\fR, \fB\-\-quiet\fR
 suppress all messages
@@ -70,13 +70,13 @@ set compression level [default 6]
 create solidly compressed appendable archive
 .TP
 \fB\-\-bsolid\fR
-create per\-data\-block compressed archive
+create per block compressed archive (default)
 .TP
 \fB\-\-dsolid\fR
-create per\-directory compressed archive
+create per directory compressed archive
 .TP
 \fB\-\-no\-solid\fR
-create per\-file compressed archive (default)
+create per file compressed archive
 .TP
 \fB\-\-solid\fR
 create solidly compressed archive
diff --git a/doc/tarlz.info b/doc/tarlz.info
index bf1e1f5..288c441 100644
--- a/doc/tarlz.info
+++ b/doc/tarlz.info
@@ -11,7 +11,7 @@ File: tarlz.info,  Node: Top,  Next: Introduction,  Up: (dir)
 Tarlz Manual
 ************
 
-This manual is for Tarlz (version 0.10, 31 January 2019).
+This manual is for Tarlz (version 0.11, 13 February 2019).
 
 * Menu:
 
@@ -20,6 +20,7 @@ This manual is for Tarlz (version 0.10, 31 January 2019).
 * File format::               Detailed format of the compressed archive
 * Amendments to pax format::  The reasons for the differences with pax
 * Multi-threaded tar::        Limitations of parallel tar decoding
+* Minimum archive sizes::     Sizes required for full multi-threaded speed
 * Examples::                  A small tutorial with examples
 * Problems::                  Reporting bugs
 * Concept index::             Index of concepts
@@ -36,23 +37,23 @@ File: tarlz.info,  Node: Introduction,  Next: Invoking tarlz,  Prev: Top,  Up: T
 1 Introduction
 **************
 
-Tarlz is a combined implementation of the tar archiver and the lzip
-compressor. By default tarlz creates, lists and extracts archives in a
-simplified posix pax format compressed with lzip on a per file basis.
-Each tar member is compressed in its own lzip member, as well as the
-end-of-file blocks. This method adds an indexed lzip layer on top of
-the tar archive, making it possible to decode the archive safely in
-parallel. The resulting multimember tar.lz archive is fully backward
-compatible with standard tar tools like GNU tar, which treat it like
-any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
+Tarlz is a massively parallel (multi-threaded) combined implementation
+of the tar archiver and the lzip compressor. Tarlz creates, lists and
+extracts archives in a simplified posix pax format compressed with
+lzip, keeping the alignment between tar members and lzip members. This
+method adds an indexed lzip layer on top of the tar archive, making it
+possible to decode the archive safely in parallel. The resulting
+multimember tar.lz archive is fully backward compatible with standard
+tar tools like GNU tar, which treat it like any other tar.lz archive.
+Tarlz can append files to the end of such compressed archives.
 
-   Tarlz can create tar archives with four levels of compression
-granularity; per file, per directory, appendable solid, and solid.
+   Tarlz can create tar archives with five levels of compression
+granularity; per file, per block, per directory, appendable solid, and
+solid.
 
-Of course, compressing each file (or each directory) individually is
-less efficient than compressing the whole tar archive, but it has the
-following advantages:
+Of course, compressing each file (or each directory) individually can't
+achieve a compression ratio as high as compressing solidly the whole tar
+archive, but it has the following advantages:
 
    * The resulting multimember tar.lz archive can be decompressed in
      parallel, multiplying the decompression speed.
@@ -87,17 +88,23 @@ The format for running tarlz is:
 
      tarlz [OPTIONS] [FILES]
 
-On archive creation or appending, tarlz removes leading and trailing
-slashes from filenames, as well as filename prefixes containing a '..'
-component. On extraction, archive members containing a '..' component
-are skipped. Tarlz detects when the archive being created or enlarged
-is among the files to be dumped, appended or concatenated, and skips it.
+On archive creation or appending tarlz archives the files specified, but
+removes from member names any leading and trailing slashes and any
+filename prefixes containing a '..' component. On extraction, leading
+and trailing slashes are also removed from member names, and archive
+members containing a '..' component in the filename are skipped. Tarlz
+detects when the archive being created or enlarged is among the files
+to be dumped, appended or concatenated, and skips it.
 
    On extraction and listing, tarlz removes leading './' strings from
 member names in the archive or given in the command line, so that
 'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from
 archive 'foo'.
 
+   If several compression levels or '--*solid' options are given, the
+last setting is used. For example '-9 --solid --uncompressed -1' is
+equivalent to '-1 --solid'
+
    tarlz supports the following options:
 
 '-h'
@@ -125,7 +132,7 @@ archive 'foo'.
      Set target size of input data blocks for the '--bsolid' option.
      Valid values range from 8 KiB to 1 GiB. Default value is two times
      the dictionary size, except for option '-0' where it defaults to
-     1 MiB.
+     1 MiB.  *Note Minimum archive sizes::.
 
 '-c'
 '--create'
@@ -142,6 +149,11 @@ archive 'foo'.
      relative to the then current working directory, perhaps changed by
      a previous '-C' option.
 
+     Note that a process can only have one current working directory
+     (CWD).  Therefore multi-threading can't be used to create an
+     archive if a '-C' option appears after a relative filename in the
+     command line.
+
 '-f ARCHIVE'
 '--file=ARCHIVE'
      Use archive file ARCHIVE. '-' used as an ARCHIVE argument reads
@@ -149,18 +161,21 @@ archive 'foo'.
 
 '-n N'
 '--threads=N'
-     Set the number of decompression threads, overriding the system's
+     Set the number of (de)compression threads, overriding the system's
      default.  Valid values range from 0 to "as many as your system can
      support". A value of 0 disables threads entirely. If this option
      is not used, tarlz tries to detect the number of processors in the
      system and use it as default value.  'tarlz --help' shows the
-     system's default value. This option currently only has effect when
-     listing the contents of a multimember compressed archive. *Note
+     system's default value. See the note about multi-threaded archive
+     creation in the '-C' option above.  Multi-threaded extraction of
+     files from an archive is not yet implemented.  *Note
      Multi-threaded tar::.
 
      Note that the number of usable threads is limited during
-     decompression to the number of lzip members in the tar.lz archive,
-     which you can find by running 'lzip -lv archive.tar.lz'.
+     compression to ceil( uncompressed_size / data_size ) (*note
+     Minimum archive sizes::), and during decompression to the number
+     of lzip members in the tar.lz archive, which you can find by
+     running 'lzip -lv archive.tar.lz'.
 
 '-q'
 '--quiet'
@@ -180,7 +195,7 @@ archive 'foo'.
 '-t'
 '--list'
      List the contents of an archive. If FILES are given, list only the
-     given FILES.
+     FILES given.
 
 '-v'
 '--verbose'
@@ -189,7 +204,7 @@ archive 'foo'.
 '-x'
 '--extract'
      Extract files from an archive. If FILES are given, extract only
-     the given FILES. Else extract all the files in the archive.
+     the FILES given. Else extract all the files in the archive.
 
 '-0 .. -9'
      Set the compression level. The default compression level is '-6'.
@@ -214,38 +229,43 @@ archive 'foo'.
      solid compression. All the files being added to the archive are
      compressed into a single lzip member, but the end-of-file blocks
      are compressed into a separate lzip member. This creates a solidly
-     compressed appendable archive.
+     compressed appendable archive.  Solid archives can't be created
+     nor decoded in parallel.
 
 '--bsolid'
-     When creating or appending to a compressed archive, compress tar
-     members together in a lzip member until they approximate a target
-     uncompressed size.  The size can't be exact because each solidly
-     compressed data block must contain an integer number of tar
-     members. This option improves compression efficiency for archives
-     with lots of small files. *Note --data-size::, to set the target
+     When creating or appending to a compressed archive, use block
+     compression.  Tar members are compressed together in a lzip member
+     until they approximate a target uncompressed size. The size can't
+     be exact because each solidly compressed data block must contain
+     an integer number of tar members. Block compression is the default
+     because it improves compression ratio for archives with many files
+     smaller than the block size. This option allows tarlz revert to
+     default behavior if, for example, it is invoked through an alias
+     like 'tar='tarlz --solid''. *Note --data-size::, to set the target
      block size.
 
 '--dsolid'
-     When creating or appending to a compressed archive, use solid
-     compression for each directory especified in the command line. The
-     end-of-file blocks are compressed into a separate lzip member. This
-     creates a compressed appendable archive with a separate lzip
-     member for each top-level directory.
+     When creating or appending to a compressed archive, compress each
+     file specified in the command line separately in its own lzip
+     member, and use solid compression for each directory specified in
+     the command line. The end-of-file blocks are compressed into a
+     separate lzip member. This creates a compressed appendable archive
+     with a separate lzip member for each file or top-level directory
+     specified.
 
 '--no-solid'
      When creating or appending to a compressed archive, compress each
-     file separately. The end-of-file blocks are compressed into a
-     separate lzip member. This creates a compressed appendable archive
-     with a separate lzip member for each file. This option allows
-     tarlz revert to default behavior if, for example, tarlz is invoked
-     through an alias like 'tar='tarlz --solid''.
+     file separately in its own lzip member. The end-of-file blocks are
+     compressed into a separate lzip member. This creates a compressed
+     appendable archive with a lzip member for each file.
 
 '--solid'
      When creating or appending to a compressed archive, use solid
-     compression. The files being added to the archive, along with the
+     compression.  The files being added to the archive, along with the
      end-of-file blocks, are compressed into a single lzip member. The
      resulting archive is not appendable. No more files can be later
-     appended to the archive.
+     appended to the archive. Solid archives can't be created nor
+     decoded in parallel.
 
 '--anonymous'
      Equivalent to '--owner=root --group=root'.
@@ -341,9 +361,9 @@ blocks are either compressed in a separate lzip member or compressed
 along with the tar members contained in the last lzip member.
 
    The diagram below shows the correspondence between each tar member
-(formed by one or two headers plus optional data) in the tar archive and
-each lzip member in the resulting multimember tar.lz archive: *Note
-File format: (lzip)File format.
+(formed by one or two headers plus optional data) in the tar archive
+and each lzip member in the resulting multimember tar.lz archive, when
+per file compression is used: *Note File format: (lzip)File format.
 
 tar
 +========+======+=================+===============+========+======+========+
@@ -612,12 +632,12 @@ wasteful for a backup format.
 
 There is no portable way to tell what charset a text string is coded
 into.  Therefore, tarlz stores all fields representing text strings
-as-is, without conversion to UTF-8 nor any other transformation. This
-prevents accidental double UTF-8 conversions. If the need arises this
-behavior will be adjusted with a command line option in the future.
+unmodified, without conversion to UTF-8 nor any other transformation.
+This prevents accidental double UTF-8 conversions. If the need arises
+this behavior will be adjusted with a command line option in the future.
 
 
-File: tarlz.info,  Node: Multi-threaded tar,  Next: Examples,  Prev: Amendments to pax format,  Up: Top
+File: tarlz.info,  Node: Multi-threaded tar,  Next: Minimum archive sizes,  Prev: Amendments to pax format,  Up: Top
 
 5 Limitations of parallel tar decoding
 **************************************
@@ -659,15 +679,53 @@ sequential '--list' because, in addition to using several processors,
 it only needs to decompress part of each lzip member. See the following
 example listing the Silesia corpus on a dual core machine:
 
-     tarlz -9 -cf silesia.tar.lz silesia
+     tarlz -9 --no-solid -cf silesia.tar.lz silesia
      time lzip -cd silesia.tar.lz | tar -tf -            (5.032s)
      time plzip -cd silesia.tar.lz | tar -tf -           (3.256s)
      time tarlz -tf silesia.tar.lz                       (0.020s)
 
 
-File: tarlz.info,  Node: Examples,  Next: Problems,  Prev: Multi-threaded tar,  Up: Top
+File: tarlz.info,  Node: Minimum archive sizes,  Next: Examples,  Prev: Multi-threaded tar,  Up: Top
+
+6 Minimum archive sizes required for multi-threaded block compression
+*********************************************************************
+
+When creating or appending to a compressed archive using multi-threaded
+block compression, tarlz puts tar members together in blocks and
+compresses as many blocks simultaneously as worker threads are chosen,
+creating a multimember compressed archive.
+
+   For this to work as expected (and roughly multiply the compression
+speed by the number of available processors), the uncompressed archive
+must be at least as large as the number of worker threads times the
+block size (*note --data-size::). Else some processors will not get any
+data to compress, and compression will be proportionally slower. The
+maximum speed increase achievable on a given file is limited by the
+ratio (uncompressed_size / data_size). For example, a tarball the size
+of gcc or linux will scale up to 10 or 12 processors at level -9.
+
+   The following table shows the minimum uncompressed archive size
+needed for full use of N processors at a given compression level, using
+the default data size for each level:
+
+Processors   2         4         8         16        64        256
+------------------------------------------------------------------
+Level                                                          
+-0           2 MiB     4 MiB     8 MiB     16 MiB    64 MiB    256 MiB
+-1           4 MiB     8 MiB     16 MiB    32 MiB    128 MiB   512 MiB
+-2           6 MiB     12 MiB    24 MiB    48 MiB    192 MiB   768 MiB
+-3           8 MiB     16 MiB    32 MiB    64 MiB    256 MiB   1 GiB
+-4           12 MiB    24 MiB    48 MiB    96 MiB    384 MiB   1.5 GiB
+-5           16 MiB    32 MiB    64 MiB    128 MiB   512 MiB   2 GiB
+-6           32 MiB    64 MiB    128 MiB   256 MiB   1 GiB     4 GiB
+-7           64 MiB    128 MiB   256 MiB   512 MiB   2 GiB     8 GiB
+-8           96 MiB    192 MiB   384 MiB   768 MiB   3 GiB     12 GiB
+-9           128 MiB   256 MiB   512 MiB   1 GiB     4 GiB     16 GiB
+
+
+File: tarlz.info,  Node: Examples,  Next: Problems,  Prev: Minimum archive sizes,  Up: Top
 
-6 A small tutorial with examples
+7 A small tutorial with examples
 ********************************
 
 Example 1: Create a multimember compressed archive 'archive.tar.lz'
@@ -725,7 +783,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory
 
 File: tarlz.info,  Node: Problems,  Next: Concept index,  Prev: Examples,  Up: Top
 
-7 Reporting bugs
+8 Reporting bugs
 ****************
 
 There are probably bugs in tarlz. There are certainly errors and
@@ -754,6 +812,7 @@ Concept index
 * getting help:                          Problems.              (line 6)
 * introduction:                          Introduction.          (line 6)
 * invoking:                              Invoking tarlz.        (line 6)
+* minimum archive sizes:                 Minimum archive sizes. (line 6)
 * options:                               Invoking tarlz.        (line 6)
 * usage:                                 Invoking tarlz.        (line 6)
 * version:                               Invoking tarlz.        (line 6)
@@ -762,18 +821,19 @@ Concept index
 
 Tag Table:
 Node: Top223
-Node: Introduction1013
-Node: Invoking tarlz3125
-Ref: --data-size4717
-Node: File format11536
-Ref: key_crc3216321
-Node: Amendments to pax format21738
-Ref: crc3222262
-Ref: flawed-compat23287
-Node: Multi-threaded tar25649
-Node: Examples28164
-Node: Problems29830
-Node: Concept index30356
+Node: Introduction1089
+Node: Invoking tarlz3218
+Ref: --data-size5097
+Node: File format12673
+Ref: key_crc3217493
+Node: Amendments to pax format22910
+Ref: crc3223434
+Ref: flawed-compat24459
+Node: Multi-threaded tar26826
+Node: Minimum archive sizes29365
+Node: Examples31495
+Node: Problems33164
+Node: Concept index33690
 
 End Tag Table
 
diff --git a/doc/tarlz.texi b/doc/tarlz.texi
index 2ab37fb..6026fe3 100644
--- a/doc/tarlz.texi
+++ b/doc/tarlz.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 31 January 2019
-@set VERSION 0.10
+@set UPDATED 13 February 2019
+@set VERSION 0.11
 
 @dircategory Data Compression
 @direntry
@@ -40,6 +40,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 * File format::               Detailed format of the compressed archive
 * Amendments to pax format::  The reasons for the differences with pax
 * Multi-threaded tar::        Limitations of parallel tar decoding
+* Minimum archive sizes::     Sizes required for full multi-threaded speed
 * Examples::                  A small tutorial with examples
 * Problems::                  Reporting bugs
 * Concept index::             Index of concepts
@@ -56,25 +57,24 @@ to copy, distribute and modify it.
 @chapter Introduction
 @cindex introduction
 
-@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a combined
-implementation of the tar archiver and the
-@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. By default
-tarlz creates, lists and extracts archives in a simplified posix pax format
-compressed with lzip on a per file basis. Each tar member is compressed in
-its own lzip member, as well as the end-of-file blocks. This method adds an
-indexed lzip layer on top of the tar archive, making it possible to decode
-the archive safely in parallel. The resulting multimember tar.lz archive is
-fully backward compatible with standard tar tools like GNU tar, which treat
-it like any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
-
-Tarlz can create tar archives with four levels of compression granularity;
-per file, per directory, appendable solid, and solid.
+@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
+(multi-threaded) combined implementation of the tar archiver and the
+@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates,
+lists and extracts archives in a simplified posix pax format compressed with
+lzip, keeping the alignment between tar members and lzip members. This
+method adds an indexed lzip layer on top of the tar archive, making it
+possible to decode the archive safely in parallel. The resulting multimember
+tar.lz archive is fully backward compatible with standard tar tools like GNU
+tar, which treat it like any other tar.lz archive. Tarlz can append files to
+the end of such compressed archives.
+
+Tarlz can create tar archives with five levels of compression granularity;
+per file, per block, per directory, appendable solid, and solid.
 
 @noindent
-Of course, compressing each file (or each directory) individually is
-less efficient than compressing the whole tar archive, but it has the
-following advantages:
+Of course, compressing each file (or each directory) individually can't
+achieve a compression ratio as high as compressing solidly the whole tar
+archive, but it has the following advantages:
 
 @itemize @bullet
 @item
@@ -120,18 +120,23 @@ tarlz [@var{options}] [@var{files}]
 @end example
 
 @noindent
-On archive creation or appending, tarlz removes leading and trailing
-slashes from filenames, as well as filename prefixes containing a
-@samp{..} component. On extraction, archive members containing a
-@samp{..} component are skipped. Tarlz detects when the archive being
-created or enlarged is among the files to be dumped, appended or
-concatenated, and skips it.
+On archive creation or appending tarlz archives the files specified, but
+removes from member names any leading and trailing slashes and any filename
+prefixes containing a @samp{..} component. On extraction, leading and
+trailing slashes are also removed from member names, and archive members
+containing a @samp{..} component in the filename are skipped. Tarlz detects
+when the archive being created or enlarged is among the files to be dumped,
+appended or concatenated, and skips it.
 
 On extraction and listing, tarlz removes leading @samp{./} strings from
 member names in the archive or given in the command line, so that
 @w{@code{tarlz -xf foo ./bar baz}} extracts members @samp{bar} and
 @samp{./baz} from archive @samp{foo}.
 
+If several compression levels or @samp{--*solid} options are given, the last
+setting is used. For example @w{@samp{-9 --solid --uncompressed -1}} is
+equivalent to @samp{-1 --solid}
+
 tarlz supports the following options:
 
 @table @code
@@ -160,6 +165,7 @@ specified. Tarlz can't concatenate uncompressed tar archives.
 Set target size of input data blocks for the @samp{--bsolid} option. Valid
 values range from @w{8 KiB} to @w{1 GiB}. Default value is two times the
 dictionary size, except for option @samp{-0} where it defaults to @w{1 MiB}.
+@xref{Minimum archive sizes}.
 
 @item -c
 @itemx --create
@@ -176,6 +182,10 @@ extraction. Listing ignores any @samp{-C} options specified. @var{dir}
 is relative to the then current working directory, perhaps changed by a
 previous @samp{-C} option.
 
+Note that a process can only have one current working directory (CWD).
+Therefore multi-threading can't be used to create an archive if a @samp{-C}
+option appears after a relative filename in the command line.
+
 @item -f @var{archive}
 @itemx --file=@var{archive}
 Use archive file @var{archive}. @samp{-} used as an @var{archive}
@@ -183,17 +193,19 @@ argument reads from standard input or writes to standard output.
 
 @item -n @var{n}
 @itemx --threads=@var{n}
-Set the number of decompression threads, overriding the system's default.
+Set the number of (de)compression threads, overriding the system's default.
 Valid values range from 0 to "as many as your system can support". A value
 of 0 disables threads entirely. If this option is not used, tarlz tries to
 detect the number of processors in the system and use it as default value.
-@w{@samp{tarlz --help}} shows the system's default value. This option
-currently only has effect when listing the contents of a multimember
-compressed archive. @xref{Multi-threaded tar}.
+@w{@samp{tarlz --help}} shows the system's default value. See the note about
+multi-threaded archive creation in the @samp{-C} option above.
+Multi-threaded extraction of files from an archive is not yet implemented.
+@xref{Multi-threaded tar}.
 
-Note that the number of usable threads is limited during decompression to
-the number of lzip members in the tar.lz archive, which you can find by
-running @w{@code{lzip -lv archive.tar.lz}}.
+Note that the number of usable threads is limited during compression to
+@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
+and during decompression to the number of lzip members in the tar.lz
+archive, which you can find by running @w{@code{lzip -lv archive.tar.lz}}.
 
 @item -q
 @itemx --quiet
@@ -213,7 +225,7 @@ to an uncompressed tar archive.
 @item -t
 @itemx --list
 List the contents of an archive. If @var{files} are given, list only the
-given @var{files}.
+@var{files} given.
 
 @item -v
 @itemx --verbose
@@ -222,7 +234,7 @@ Verbosely list files processed.
 @item -x
 @itemx --extract
 Extract files from an archive. If @var{files} are given, extract only
-the given @var{files}. Else extract all the files in the archive.
+the @var{files} given. Else extract all the files in the archive.
 
 @item -0 .. -9
 Set the compression level. The default compression level is @samp{-6}.
@@ -245,40 +257,42 @@ it creates, reducing the amount of memory required for decompression.
 
 @item --asolid
 When creating or appending to a compressed archive, use appendable solid
-compression. All the files being added to the archive are compressed
-into a single lzip member, but the end-of-file blocks are compressed
-into a separate lzip member. This creates a solidly compressed
-appendable archive.
+compression. All the files being added to the archive are compressed into a
+single lzip member, but the end-of-file blocks are compressed into a
+separate lzip member. This creates a solidly compressed appendable archive.
+Solid archives can't be created nor decoded in parallel.
 
 @item --bsolid
-When creating or appending to a compressed archive, compress tar members
-together in a lzip member until they approximate a target uncompressed size.
-The size can't be exact because each solidly compressed data block must
-contain an integer number of tar members. This option improves compression
-efficiency for archives with lots of small files. @xref{--data-size}, to set
-the target block size.
+When creating or appending to a compressed archive, use block compression.
+Tar members are compressed together in a lzip member until they approximate
+a target uncompressed size. The size can't be exact because each solidly
+compressed data block must contain an integer number of tar members. Block
+compression is the default because it improves compression ratio for
+archives with many files smaller than the block size. This option allows
+tarlz revert to default behavior if, for example, it is invoked through an
+alias like @code{tar='tarlz --solid'}. @xref{--data-size}, to set the target
+block size.
 
 @item --dsolid
-When creating or appending to a compressed archive, use solid
-compression for each directory especified in the command line. The
-end-of-file blocks are compressed into a separate lzip member. This
-creates a compressed appendable archive with a separate lzip member for
-each top-level directory.
+When creating or appending to a compressed archive, compress each file
+specified in the command line separately in its own lzip member, and use
+solid compression for each directory specified in the command line. The
+end-of-file blocks are compressed into a separate lzip member. This creates
+a compressed appendable archive with a separate lzip member for each file or
+top-level directory specified.
 
 @item --no-solid
 When creating or appending to a compressed archive, compress each file
-separately. The end-of-file blocks are compressed into a separate lzip
-member. This creates a compressed appendable archive with a separate
-lzip member for each file. This option allows tarlz revert to default
-behavior if, for example, tarlz is invoked through an alias like
-@code{tar='tarlz --solid'}.
+separately in its own lzip member. The end-of-file blocks are compressed
+into a separate lzip member. This creates a compressed appendable archive
+with a lzip member for each file.
 
 @item --solid
-When creating or appending to a compressed archive, use solid
-compression. The files being added to the archive, along with the
-end-of-file blocks, are compressed into a single lzip member. The
-resulting archive is not appendable. No more files can be later appended
-to the archive.
+When creating or appending to a compressed archive, use solid compression.
+The files being added to the archive, along with the end-of-file blocks, are
+compressed into a single lzip member. The resulting archive is not
+appendable. No more files can be later appended to the archive. Solid
+archives can't be created nor decoded in parallel.
 
 @item --anonymous
 Equivalent to @samp{--owner=root --group=root}.
@@ -388,11 +402,11 @@ binary zeros, interpreted as an end-of-archive indicator. These EOF
 blocks are either compressed in a separate lzip member or compressed
 along with the tar members contained in the last lzip member.
 
-The diagram below shows the correspondence between each tar member
-(formed by one or two headers plus optional data) in the tar archive and
-each
+The diagram below shows the correspondence between each tar member (formed
+by one or two headers plus optional data) in the tar archive and each
 @uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#File-format,,lzip member}
-in the resulting multimember tar.lz archive:
+in the resulting multimember tar.lz archive, when per file compression is
+used:
 @ifnothtml
 @xref{File format,,,lzip}.
 @end ifnothtml
@@ -672,10 +686,10 @@ format.
 @section Avoid misconversions to/from UTF-8
 
 There is no portable way to tell what charset a text string is coded into.
-Therefore, tarlz stores all fields representing text strings as-is, without
-conversion to UTF-8 nor any other transformation. This prevents accidental
-double UTF-8 conversions. If the need arises this behavior will be adjusted
-with a command line option in the future.
+Therefore, tarlz stores all fields representing text strings unmodified,
+without conversion to UTF-8 nor any other transformation. This prevents
+accidental double UTF-8 conversions. If the need arises this behavior will
+be adjusted with a command line option in the future.
 
 
 @node Multi-threaded tar
@@ -717,13 +731,51 @@ it only needs to decompress part of each lzip member. See the following
 example listing the Silesia corpus on a dual core machine:
 
 @example
-tarlz -9 -cf silesia.tar.lz silesia
+tarlz -9 --no-solid -cf silesia.tar.lz silesia
 time lzip -cd silesia.tar.lz | tar -tf -            (5.032s)
 time plzip -cd silesia.tar.lz | tar -tf -           (3.256s)
 time tarlz -tf silesia.tar.lz                       (0.020s)
 @end example
 
 
+@node Minimum archive sizes
+@chapter Minimum archive sizes required for multi-threaded block compression
+@cindex minimum archive sizes
+
+When creating or appending to a compressed archive using multi-threaded
+block compression, tarlz puts tar members together in blocks and compresses
+as many blocks simultaneously as worker threads are chosen, creating a
+multimember compressed archive.
+
+For this to work as expected (and roughly multiply the compression speed by
+the number of available processors), the uncompressed archive must be at
+least as large as the number of worker threads times the block size
+(@pxref{--data-size}). Else some processors will not get any data to
+compress, and compression will be proportionally slower. The maximum speed
+increase achievable on a given file is limited by the ratio
+@w{(uncompressed_size / data_size)}. For example, a tarball the size of gcc
+or linux will scale up to 10 or 12 processors at level -9.
+
+The following table shows the minimum uncompressed archive size needed for
+full use of N processors at a given compression level, using the default
+data size for each level:
+
+@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB}
+@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256
+@item Level
+@item -0 @tab   2 MiB @tab   4 MiB @tab   8 MiB @tab  16 MiB @tab  64 MiB @tab 256 MiB
+@item -1 @tab   4 MiB @tab   8 MiB @tab  16 MiB @tab  32 MiB @tab 128 MiB @tab 512 MiB
+@item -2 @tab   6 MiB @tab  12 MiB @tab  24 MiB @tab  48 MiB @tab 192 MiB @tab 768 MiB
+@item -3 @tab   8 MiB @tab  16 MiB @tab  32 MiB @tab  64 MiB @tab 256 MiB @tab   1 GiB
+@item -4 @tab  12 MiB @tab  24 MiB @tab  48 MiB @tab  96 MiB @tab 384 MiB @tab 1.5 GiB
+@item -5 @tab  16 MiB @tab  32 MiB @tab  64 MiB @tab 128 MiB @tab 512 MiB @tab   2 GiB
+@item -6 @tab  32 MiB @tab  64 MiB @tab 128 MiB @tab 256 MiB @tab   1 GiB @tab   4 GiB
+@item -7 @tab  64 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab   2 GiB @tab   8 GiB
+@item -8 @tab  96 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab   3 GiB @tab  12 GiB
+@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab   1 GiB @tab   4 GiB @tab  16 GiB
+@end multitable
+
+
 @node Examples
 @chapter A small tutorial with examples
 @cindex examples
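
The new "Minimum archive sizes" chapter in this diff can be checked with a little arithmetic. The Python sketch below is illustrative only and is not part of tarlz. It assumes the per-level default data sizes implied by the table's 2-processor column (each entry divided by 2), and the names `DEFAULT_DATA_SIZE_MIB`, `min_archive_size` and `usable_threads` are hypothetical. It shows the minimum uncompressed size for full use of N processors and the compression thread limit ceil( uncompressed_size / data_size ) stated under '--threads'.

```python
MIB = 1024 * 1024

# Assumed default --data-size per compression level, in MiB. Inferred from
# the 2-processor column of the table in the diff (entry / 2); these values
# are not read from the tarlz source.
DEFAULT_DATA_SIZE_MIB = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 6,
    5: 8, 6: 16, 7: 32, 8: 48, 9: 64,
}


def min_archive_size(level, processors):
    """Minimum uncompressed size in bytes for full use of `processors`
    worker threads: number of threads times the block size."""
    return processors * DEFAULT_DATA_SIZE_MIB[level] * MIB


def usable_threads(uncompressed_size, data_size, requested):
    """Threads that can get data during block compression: at most
    ceil(uncompressed_size / data_size), and never more than requested."""
    limit = (uncompressed_size + data_size - 1) // data_size  # integer ceil
    return min(requested, limit)
```

For instance, `min_archive_size(9, 16)` gives 1 GiB, matching the '-9' row of the table, and a 100 MiB archive at level '-9' (64 MiB blocks) can keep only 2 of 8 requested threads busy.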