author: Daniel Baumann <daniel.baumann@progress-linux.org> 2019-02-14 05:12:04 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2019-02-14 05:12:04 +0000
commit: e9e3fad677df4b5329912c4dd611a8de620f15cb
tree: 3bc1ab40775cf56a94d8b9cd4ce14a71111b8545 /doc
parent: Adding upstream version 0.10a.
Adding upstream version 0.11. (ref: upstream/0.11)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc')
-rw-r--r--  doc/tarlz.1     28
-rw-r--r--  doc/tarlz.info  202
-rw-r--r--  doc/tarlz.texi  188
3 files changed, 265 insertions, 153 deletions
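
Note on the "Minimum archive sizes" chapter added by this commit: the new text limits the usable compression threads to ceil( uncompressed_size / data_size ) and tabulates the minimum uncompressed size needed to keep N processors busy. The sketch below reproduces that arithmetic. It is not tarlz code; the function names are hypothetical, and the per-level default data sizes are an assumption inferred from the 2-processor column of the table (minimum size divided by 2).

```python
MIB = 1024 * 1024

# Assumed default --data-size per compression level, in MiB.
# Inferred from the table in the new chapter, not read from tarlz itself.
DEFAULT_DATA_SIZE_MIB = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 6,
    5: 8, 6: 16, 7: 32, 8: 48, 9: 64,
}


def usable_threads(uncompressed_size, data_size, requested_threads):
    """Return min(requested, ceil(uncompressed_size / data_size))."""
    if uncompressed_size <= 0:
        return 0
    # Integer ceiling division avoids float rounding on large sizes.
    blocks = (uncompressed_size + data_size - 1) // data_size
    return min(requested_threads, blocks)


def min_archive_size(level, threads):
    """Smallest uncompressed size (bytes) that gives each thread one block."""
    return threads * DEFAULT_DATA_SIZE_MIB[level] * MIB


if __name__ == "__main__":
    # 8 processors at level -9, expressed in MiB.
    print(min_archive_size(9, 8) // MIB)
    # A 100 MiB archive with 64 MiB blocks and 8 requested threads.
    print(usable_threads(100 * MIB, 64 * MIB, 8))
```

With these assumed defaults, min_archive_size(9, 8) gives 512 MiB, which agrees with the table row for level -9.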
diff --git a/doc/tarlz.1 b/doc/tarlz.1
index c30c72f..82462cd 100644
--- a/doc/tarlz.1
+++ b/doc/tarlz.1
@@ -1,20 +1,20 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH TARLZ "1" "February 2019" "tarlz 0.10a" "User Commands"
+.TH TARLZ "1" "February 2019" "tarlz 0.11" "User Commands"
.SH NAME
tarlz \- creates tar archives with multimember lzip compression
.SH SYNOPSIS
.B tarlz
[\fI\,options\/\fR] [\fI\,files\/\fR]
.SH DESCRIPTION
-Tarlz is a combined implementation of the tar archiver and the lzip
-compressor. By default tarlz creates, lists and extracts archives in a
-simplified posix pax format compressed with lzip on a per file basis. Each
-tar member is compressed in its own lzip member, as well as the end\-of\-file
-blocks. This method adds an indexed lzip layer on top of the tar archive,
-making it possible to decode the archive safely in parallel. The resulting
-multimember tar.lz archive is fully backward compatible with standard tar
-tools like GNU tar, which treat it like any other tar.lz archive. Tarlz can
-append files to the end of such compressed archives.
+Tarlz is a massively parallel (multi\-threaded) combined implementation of
+the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
+archives in a simplified posix pax format compressed with lzip, keeping the
+alignment between tar members and lzip members. This method adds an indexed
+lzip layer on top of the tar archive, making it possible to decode the
+archive safely in parallel. The resulting multimember tar.lz archive is
+fully backward compatible with standard tar tools like GNU tar, which treat
+it like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
.PP
The tarlz file format is a safe posix\-style backup format. In case of
corruption, tarlz can extract all the undamaged members from the tar.lz
@@ -46,7 +46,7 @@ change to directory <dir>
use archive file <archive>
.TP
\fB\-n\fR, \fB\-\-threads=\fR<n>
-set number of decompression threads [2]
+set number of (de)compression threads [2]
.TP
\fB\-q\fR, \fB\-\-quiet\fR
suppress all messages
@@ -70,13 +70,13 @@ set compression level [default 6]
create solidly compressed appendable archive
.TP
\fB\-\-bsolid\fR
-create per\-data\-block compressed archive
+create per block compressed archive (default)
.TP
\fB\-\-dsolid\fR
-create per\-directory compressed archive
+create per directory compressed archive
.TP
\fB\-\-no\-solid\fR
-create per\-file compressed archive (default)
+create per file compressed archive
.TP
\fB\-\-solid\fR
create solidly compressed archive
diff --git a/doc/tarlz.info b/doc/tarlz.info
index bf1e1f5..288c441 100644
--- a/doc/tarlz.info
+++ b/doc/tarlz.info
@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
-This manual is for Tarlz (version 0.10, 31 January 2019).
+This manual is for Tarlz (version 0.11, 13 February 2019).
* Menu:
@@ -20,6 +20,7 @@ This manual is for Tarlz (version 0.10, 31 January 2019).
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Multi-threaded tar:: Limitations of parallel tar decoding
+* Minimum archive sizes:: Sizes required for full multi-threaded speed
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@@ -36,23 +37,23 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
1 Introduction
**************
-Tarlz is a combined implementation of the tar archiver and the lzip
-compressor. By default tarlz creates, lists and extracts archives in a
-simplified posix pax format compressed with lzip on a per file basis.
-Each tar member is compressed in its own lzip member, as well as the
-end-of-file blocks. This method adds an indexed lzip layer on top of
-the tar archive, making it possible to decode the archive safely in
-parallel. The resulting multimember tar.lz archive is fully backward
-compatible with standard tar tools like GNU tar, which treat it like
-any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
+Tarlz is a massively parallel (multi-threaded) combined implementation
+of the tar archiver and the lzip compressor. Tarlz creates, lists and
+extracts archives in a simplified posix pax format compressed with
+lzip, keeping the alignment between tar members and lzip members. This
+method adds an indexed lzip layer on top of the tar archive, making it
+possible to decode the archive safely in parallel. The resulting
+multimember tar.lz archive is fully backward compatible with standard
+tar tools like GNU tar, which treat it like any other tar.lz archive.
+Tarlz can append files to the end of such compressed archives.
- Tarlz can create tar archives with four levels of compression
-granularity; per file, per directory, appendable solid, and solid.
+ Tarlz can create tar archives with five levels of compression
+granularity; per file, per block, per directory, appendable solid, and
+solid.
-Of course, compressing each file (or each directory) individually is
-less efficient than compressing the whole tar archive, but it has the
-following advantages:
+Of course, compressing each file (or each directory) individually can't
+achieve a compression ratio as high as compressing solidly the whole tar
+archive, but it has the following advantages:
* The resulting multimember tar.lz archive can be decompressed in
parallel, multiplying the decompression speed.
@@ -87,17 +88,23 @@ The format for running tarlz is:
tarlz [OPTIONS] [FILES]
-On archive creation or appending, tarlz removes leading and trailing
-slashes from filenames, as well as filename prefixes containing a '..'
-component. On extraction, archive members containing a '..' component
-are skipped. Tarlz detects when the archive being created or enlarged
-is among the files to be dumped, appended or concatenated, and skips it.
+On archive creation or appending tarlz archives the files specified, but
+removes from member names any leading and trailing slashes and any
+filename prefixes containing a '..' component. On extraction, leading
+and trailing slashes are also removed from member names, and archive
+members containing a '..' component in the filename are skipped. Tarlz
+detects when the archive being created or enlarged is among the files
+to be dumped, appended or concatenated, and skips it.
On extraction and listing, tarlz removes leading './' strings from
member names in the archive or given in the command line, so that
'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from
archive 'foo'.
+ If several compression levels or '--*solid' options are given, the
+last setting is used. For example '-9 --solid --uncompressed -1' is
+equivalent to '-1 --solid'
+
tarlz supports the following options:
'-h'
@@ -125,7 +132,7 @@ archive 'foo'.
Set target size of input data blocks for the '--bsolid' option.
Valid values range from 8 KiB to 1 GiB. Default value is two times
the dictionary size, except for option '-0' where it defaults to
- 1 MiB.
+ 1 MiB. *Note Minimum archive sizes::.
'-c'
'--create'
@@ -142,6 +149,11 @@ archive 'foo'.
relative to the then current working directory, perhaps changed by
a previous '-C' option.
+ Note that a process can only have one current working directory
+ (CWD). Therefore multi-threading can't be used to create an
+ archive if a '-C' option appears after a relative filename in the
+ command line.
+
'-f ARCHIVE'
'--file=ARCHIVE'
Use archive file ARCHIVE. '-' used as an ARCHIVE argument reads
@@ -149,18 +161,21 @@ archive 'foo'.
'-n N'
'--threads=N'
- Set the number of decompression threads, overriding the system's
+ Set the number of (de)compression threads, overriding the system's
default. Valid values range from 0 to "as many as your system can
support". A value of 0 disables threads entirely. If this option
is not used, tarlz tries to detect the number of processors in the
system and use it as default value. 'tarlz --help' shows the
- system's default value. This option currently only has effect when
- listing the contents of a multimember compressed archive. *Note
+ system's default value. See the note about multi-threaded archive
+ creation in the '-C' option above. Multi-threaded extraction of
+ files from an archive is not yet implemented. *Note
Multi-threaded tar::.
Note that the number of usable threads is limited during
- decompression to the number of lzip members in the tar.lz archive,
- which you can find by running 'lzip -lv archive.tar.lz'.
+ compression to ceil( uncompressed_size / data_size ) (*note
+ Minimum archive sizes::), and during decompression to the number
+ of lzip members in the tar.lz archive, which you can find by
+ running 'lzip -lv archive.tar.lz'.
'-q'
'--quiet'
@@ -180,7 +195,7 @@ archive 'foo'.
'-t'
'--list'
List the contents of an archive. If FILES are given, list only the
- given FILES.
+ FILES given.
'-v'
'--verbose'
@@ -189,7 +204,7 @@ archive 'foo'.
'-x'
'--extract'
Extract files from an archive. If FILES are given, extract only
- the given FILES. Else extract all the files in the archive.
+ the FILES given. Else extract all the files in the archive.
'-0 .. -9'
Set the compression level. The default compression level is '-6'.
@@ -214,38 +229,43 @@ archive 'foo'.
solid compression. All the files being added to the archive are
compressed into a single lzip member, but the end-of-file blocks
are compressed into a separate lzip member. This creates a solidly
- compressed appendable archive.
+ compressed appendable archive. Solid archives can't be created
+ nor decoded in parallel.
'--bsolid'
- When creating or appending to a compressed archive, compress tar
- members together in a lzip member until they approximate a target
- uncompressed size. The size can't be exact because each solidly
- compressed data block must contain an integer number of tar
- members. This option improves compression efficiency for archives
- with lots of small files. *Note --data-size::, to set the target
+ When creating or appending to a compressed archive, use block
+ compression. Tar members are compressed together in a lzip member
+ until they approximate a target uncompressed size. The size can't
+ be exact because each solidly compressed data block must contain
+ an integer number of tar members. Block compression is the default
+ because it improves compression ratio for archives with many files
+ smaller than the block size. This option allows tarlz revert to
+ default behavior if, for example, it is invoked through an alias
+ like 'tar='tarlz --solid''. *Note --data-size::, to set the target
block size.
'--dsolid'
- When creating or appending to a compressed archive, use solid
- compression for each directory especified in the command line. The
- end-of-file blocks are compressed into a separate lzip member. This
- creates a compressed appendable archive with a separate lzip
- member for each top-level directory.
+ When creating or appending to a compressed archive, compress each
+ file specified in the command line separately in its own lzip
+ member, and use solid compression for each directory specified in
+ the command line. The end-of-file blocks are compressed into a
+ separate lzip member. This creates a compressed appendable archive
+ with a separate lzip member for each file or top-level directory
+ specified.
'--no-solid'
When creating or appending to a compressed archive, compress each
- file separately. The end-of-file blocks are compressed into a
- separate lzip member. This creates a compressed appendable archive
- with a separate lzip member for each file. This option allows
- tarlz revert to default behavior if, for example, tarlz is invoked
- through an alias like 'tar='tarlz --solid''.
+ file separately in its own lzip member. The end-of-file blocks are
+ compressed into a separate lzip member. This creates a compressed
+ appendable archive with a lzip member for each file.
'--solid'
When creating or appending to a compressed archive, use solid
- compression. The files being added to the archive, along with the
+ compression. The files being added to the archive, along with the
end-of-file blocks, are compressed into a single lzip member. The
resulting archive is not appendable. No more files can be later
- appended to the archive.
+ appended to the archive. Solid archives can't be created nor
+ decoded in parallel.
'--anonymous'
Equivalent to '--owner=root --group=root'.
@@ -341,9 +361,9 @@ blocks are either compressed in a separate lzip member or compressed
along with the tar members contained in the last lzip member.
The diagram below shows the correspondence between each tar member
-(formed by one or two headers plus optional data) in the tar archive and
-each lzip member in the resulting multimember tar.lz archive: *Note
-File format: (lzip)File format.
+(formed by one or two headers plus optional data) in the tar archive
+and each lzip member in the resulting multimember tar.lz archive, when
+per file compression is used: *Note File format: (lzip)File format.
tar
+========+======+=================+===============+========+======+========+
@@ -612,12 +632,12 @@ wasteful for a backup format.
There is no portable way to tell what charset a text string is coded
into. Therefore, tarlz stores all fields representing text strings
-as-is, without conversion to UTF-8 nor any other transformation. This
-prevents accidental double UTF-8 conversions. If the need arises this
-behavior will be adjusted with a command line option in the future.
+unmodified, without conversion to UTF-8 nor any other transformation.
+This prevents accidental double UTF-8 conversions. If the need arises
+this behavior will be adjusted with a command line option in the future.

-File: tarlz.info, Node: Multi-threaded tar, Next: Examples, Prev: Amendments to pax format, Up: Top
+File: tarlz.info, Node: Multi-threaded tar, Next: Minimum archive sizes, Prev: Amendments to pax format, Up: Top
5 Limitations of parallel tar decoding
**************************************
@@ -659,15 +679,53 @@ sequential '--list' because, in addition to using several processors,
it only needs to decompress part of each lzip member. See the following
example listing the Silesia corpus on a dual core machine:
- tarlz -9 -cf silesia.tar.lz silesia
+ tarlz -9 --no-solid -cf silesia.tar.lz silesia
time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)

-File: tarlz.info, Node: Examples, Next: Problems, Prev: Multi-threaded tar, Up: Top
+File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded tar, Up: Top
+
+6 Minimum archive sizes required for multi-threaded block compression
+*********************************************************************
+
+When creating or appending to a compressed archive using multi-threaded
+block compression, tarlz puts tar members together in blocks and
+compresses as many blocks simultaneously as worker threads are chosen,
+creating a multimember compressed archive.
+
+ For this to work as expected (and roughly multiply the compression
+speed by the number of available processors), the uncompressed archive
+must be at least as large as the number of worker threads times the
+block size (*note --data-size::). Else some processors will not get any
+data to compress, and compression will be proportionally slower. The
+maximum speed increase achievable on a given file is limited by the
+ratio (uncompressed_size / data_size). For example, a tarball the size
+of gcc or linux will scale up to 10 or 12 processors at level -9.
+
+ The following table shows the minimum uncompressed archive size
+needed for full use of N processors at a given compression level, using
+the default data size for each level:
+
+Processors 2 4 8 16 64 256
+------------------------------------------------------------------
+Level
+-0 2 MiB 4 MiB 8 MiB 16 MiB 64 MiB 256 MiB
+-1 4 MiB 8 MiB 16 MiB 32 MiB 128 MiB 512 MiB
+-2 6 MiB 12 MiB 24 MiB 48 MiB 192 MiB 768 MiB
+-3 8 MiB 16 MiB 32 MiB 64 MiB 256 MiB 1 GiB
+-4 12 MiB 24 MiB 48 MiB 96 MiB 384 MiB 1.5 GiB
+-5 16 MiB 32 MiB 64 MiB 128 MiB 512 MiB 2 GiB
+-6 32 MiB 64 MiB 128 MiB 256 MiB 1 GiB 4 GiB
+-7 64 MiB 128 MiB 256 MiB 512 MiB 2 GiB 8 GiB
+-8 96 MiB 192 MiB 384 MiB 768 MiB 3 GiB 12 GiB
+-9 128 MiB 256 MiB 512 MiB 1 GiB 4 GiB 16 GiB
+
+
+File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top
-6 A small tutorial with examples
+7 A small tutorial with examples
********************************
Example 1: Create a multimember compressed archive 'archive.tar.lz'
@@ -725,7 +783,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory

File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
-7 Reporting bugs
+8 Reporting bugs
****************
There are probably bugs in tarlz. There are certainly errors and
@@ -754,6 +812,7 @@ Concept index
* getting help: Problems. (line 6)
* introduction: Introduction. (line 6)
* invoking: Invoking tarlz. (line 6)
+* minimum archive sizes: Minimum archive sizes. (line 6)
* options: Invoking tarlz. (line 6)
* usage: Invoking tarlz. (line 6)
* version: Invoking tarlz. (line 6)
@@ -762,18 +821,19 @@ Concept index

Tag Table:
Node: Top223
-Node: Introduction1013
-Node: Invoking tarlz3125
-Ref: --data-size4717
-Node: File format11536
-Ref: key_crc3216321
-Node: Amendments to pax format21738
-Ref: crc3222262
-Ref: flawed-compat23287
-Node: Multi-threaded tar25649
-Node: Examples28164
-Node: Problems29830
-Node: Concept index30356
+Node: Introduction1089
+Node: Invoking tarlz3218
+Ref: --data-size5097
+Node: File format12673
+Ref: key_crc3217493
+Node: Amendments to pax format22910
+Ref: crc3223434
+Ref: flawed-compat24459
+Node: Multi-threaded tar26826
+Node: Minimum archive sizes29365
+Node: Examples31495
+Node: Problems33164
+Node: Concept index33690

End Tag Table
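
Note on the option handling documented in the tarlz.info changes above: when several compression levels or '--*solid' options are given, the last setting is used, so '-9 --solid --uncompressed -1' is equivalent to '-1 --solid'. The sketch below models that rule only. It is not tarlz's parser; the function name is hypothetical, and treating '--uncompressed' as one of the level settings is an assumption drawn from the manual's example.

```python
# The five '--*solid' options named in the manual.
SOLID_OPTIONS = {"--asolid", "--bsolid", "--dsolid", "--no-solid", "--solid"}


def resolve(args):
    """Return (level, solidity) after applying the last-setting-wins rule."""
    level = "-6"           # documented default compression level
    solidity = "--bsolid"  # documented default granularity in 0.11
    for arg in args:
        is_level = len(arg) == 2 and arg[0] == "-" and arg[1] in "0123456789"
        if is_level or arg == "--uncompressed":
            level = arg      # a later level overrides an earlier one
        elif arg in SOLID_OPTIONS:
            solidity = arg   # a later --*solid overrides an earlier one
    return level, solidity


if __name__ == "__main__":
    print(resolve(["-9", "--solid", "--uncompressed", "-1"]))
```

The two settings are tracked independently, which is why the '--solid' in the example survives the later '-1'.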
diff --git a/doc/tarlz.texi b/doc/tarlz.texi
index 2ab37fb..6026fe3 100644
--- a/doc/tarlz.texi
+++ b/doc/tarlz.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header
-@set UPDATED 31 January 2019
-@set VERSION 0.10
+@set UPDATED 13 February 2019
+@set VERSION 0.11
@dircategory Data Compression
@direntry
@@ -40,6 +40,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Multi-threaded tar:: Limitations of parallel tar decoding
+* Minimum archive sizes:: Sizes required for full multi-threaded speed
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@@ -56,25 +57,24 @@ to copy, distribute and modify it.
@chapter Introduction
@cindex introduction
-@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a combined
-implementation of the tar archiver and the
-@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. By default
-tarlz creates, lists and extracts archives in a simplified posix pax format
-compressed with lzip on a per file basis. Each tar member is compressed in
-its own lzip member, as well as the end-of-file blocks. This method adds an
-indexed lzip layer on top of the tar archive, making it possible to decode
-the archive safely in parallel. The resulting multimember tar.lz archive is
-fully backward compatible with standard tar tools like GNU tar, which treat
-it like any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
-
-Tarlz can create tar archives with four levels of compression granularity;
-per file, per directory, appendable solid, and solid.
+@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
+(multi-threaded) combined implementation of the tar archiver and the
+@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates,
+lists and extracts archives in a simplified posix pax format compressed with
+lzip, keeping the alignment between tar members and lzip members. This
+method adds an indexed lzip layer on top of the tar archive, making it
+possible to decode the archive safely in parallel. The resulting multimember
+tar.lz archive is fully backward compatible with standard tar tools like GNU
+tar, which treat it like any other tar.lz archive. Tarlz can append files to
+the end of such compressed archives.
+
+Tarlz can create tar archives with five levels of compression granularity;
+per file, per block, per directory, appendable solid, and solid.
@noindent
-Of course, compressing each file (or each directory) individually is
-less efficient than compressing the whole tar archive, but it has the
-following advantages:
+Of course, compressing each file (or each directory) individually can't
+achieve a compression ratio as high as compressing solidly the whole tar
+archive, but it has the following advantages:
@itemize @bullet
@item
@@ -120,18 +120,23 @@ tarlz [@var{options}] [@var{files}]
@end example
@noindent
-On archive creation or appending, tarlz removes leading and trailing
-slashes from filenames, as well as filename prefixes containing a
-@samp{..} component. On extraction, archive members containing a
-@samp{..} component are skipped. Tarlz detects when the archive being
-created or enlarged is among the files to be dumped, appended or
-concatenated, and skips it.
+On archive creation or appending tarlz archives the files specified, but
+removes from member names any leading and trailing slashes and any filename
+prefixes containing a @samp{..} component. On extraction, leading and
+trailing slashes are also removed from member names, and archive members
+containing a @samp{..} component in the filename are skipped. Tarlz detects
+when the archive being created or enlarged is among the files to be dumped,
+appended or concatenated, and skips it.
On extraction and listing, tarlz removes leading @samp{./} strings from
member names in the archive or given in the command line, so that
@w{@code{tarlz -xf foo ./bar baz}} extracts members @samp{bar} and
@samp{./baz} from archive @samp{foo}.
+If several compression levels or @samp{--*solid} options are given, the last
+setting is used. For example @w{@samp{-9 --solid --uncompressed -1}} is
+equivalent to @samp{-1 --solid}
+
tarlz supports the following options:
@table @code
@@ -160,6 +165,7 @@ specified. Tarlz can't concatenate uncompressed tar archives.
Set target size of input data blocks for the @samp{--bsolid} option. Valid
values range from @w{8 KiB} to @w{1 GiB}. Default value is two times the
dictionary size, except for option @samp{-0} where it defaults to @w{1 MiB}.
+@xref{Minimum archive sizes}.
@item -c
@itemx --create
@@ -176,6 +182,10 @@ extraction. Listing ignores any @samp{-C} options specified. @var{dir}
is relative to the then current working directory, perhaps changed by a
previous @samp{-C} option.
+Note that a process can only have one current working directory (CWD).
+Therefore multi-threading can't be used to create an archive if a @samp{-C}
+option appears after a relative filename in the command line.
+
@item -f @var{archive}
@itemx --file=@var{archive}
Use archive file @var{archive}. @samp{-} used as an @var{archive}
@@ -183,17 +193,19 @@ argument reads from standard input or writes to standard output.
@item -n @var{n}
@itemx --threads=@var{n}
-Set the number of decompression threads, overriding the system's default.
+Set the number of (de)compression threads, overriding the system's default.
Valid values range from 0 to "as many as your system can support". A value
of 0 disables threads entirely. If this option is not used, tarlz tries to
detect the number of processors in the system and use it as default value.
-@w{@samp{tarlz --help}} shows the system's default value. This option
-currently only has effect when listing the contents of a multimember
-compressed archive. @xref{Multi-threaded tar}.
+@w{@samp{tarlz --help}} shows the system's default value. See the note about
+multi-threaded archive creation in the @samp{-C} option above.
+Multi-threaded extraction of files from an archive is not yet implemented.
+@xref{Multi-threaded tar}.
-Note that the number of usable threads is limited during decompression to
-the number of lzip members in the tar.lz archive, which you can find by
-running @w{@code{lzip -lv archive.tar.lz}}.
+Note that the number of usable threads is limited during compression to
+@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
+and during decompression to the number of lzip members in the tar.lz
+archive, which you can find by running @w{@code{lzip -lv archive.tar.lz}}.
@item -q
@itemx --quiet
@@ -213,7 +225,7 @@ to an uncompressed tar archive.
@item -t
@itemx --list
List the contents of an archive. If @var{files} are given, list only the
-given @var{files}.
+@var{files} given.
@item -v
@itemx --verbose
@@ -222,7 +234,7 @@ Verbosely list files processed.
@item -x
@itemx --extract
Extract files from an archive. If @var{files} are given, extract only
-the given @var{files}. Else extract all the files in the archive.
+the @var{files} given. Else extract all the files in the archive.
@item -0 .. -9
Set the compression level. The default compression level is @samp{-6}.
@@ -245,40 +257,42 @@ it creates, reducing the amount of memory required for decompression.
@item --asolid
When creating or appending to a compressed archive, use appendable solid
-compression. All the files being added to the archive are compressed
-into a single lzip member, but the end-of-file blocks are compressed
-into a separate lzip member. This creates a solidly compressed
-appendable archive.
+compression. All the files being added to the archive are compressed into a
+single lzip member, but the end-of-file blocks are compressed into a
+separate lzip member. This creates a solidly compressed appendable archive.
+Solid archives can't be created nor decoded in parallel.
@item --bsolid
-When creating or appending to a compressed archive, compress tar members
-together in a lzip member until they approximate a target uncompressed size.
-The size can't be exact because each solidly compressed data block must
-contain an integer number of tar members. This option improves compression
-efficiency for archives with lots of small files. @xref{--data-size}, to set
-the target block size.
+When creating or appending to a compressed archive, use block compression.
+Tar members are compressed together in a lzip member until they approximate
+a target uncompressed size. The size can't be exact because each solidly
+compressed data block must contain an integer number of tar members. Block
+compression is the default because it improves compression ratio for
+archives with many files smaller than the block size. This option allows
+tarlz revert to default behavior if, for example, it is invoked through an
+alias like @code{tar='tarlz --solid'}. @xref{--data-size}, to set the target
+block size.
@item --dsolid
-When creating or appending to a compressed archive, use solid
-compression for each directory especified in the command line. The
-end-of-file blocks are compressed into a separate lzip member. This
-creates a compressed appendable archive with a separate lzip member for
-each top-level directory.
+When creating or appending to a compressed archive, compress each file
+specified in the command line separately in its own lzip member, and use
+solid compression for each directory specified in the command line. The
+end-of-file blocks are compressed into a separate lzip member. This creates
+a compressed appendable archive with a separate lzip member for each file or
+top-level directory specified.
@item --no-solid
When creating or appending to a compressed archive, compress each file
-separately. The end-of-file blocks are compressed into a separate lzip
-member. This creates a compressed appendable archive with a separate
-lzip member for each file. This option allows tarlz revert to default
-behavior if, for example, tarlz is invoked through an alias like
-@code{tar='tarlz --solid'}.
+separately in its own lzip member. The end-of-file blocks are compressed
+into a separate lzip member. This creates a compressed appendable archive
+with a lzip member for each file.
@item --solid
-When creating or appending to a compressed archive, use solid
-compression. The files being added to the archive, along with the
-end-of-file blocks, are compressed into a single lzip member. The
-resulting archive is not appendable. No more files can be later appended
-to the archive.
+When creating or appending to a compressed archive, use solid compression.
+The files being added to the archive, along with the end-of-file blocks, are
+compressed into a single lzip member. The resulting archive is not
+appendable. No more files can be later appended to the archive. Solid
+archives can't be created nor decoded in parallel.
@item --anonymous
Equivalent to @samp{--owner=root --group=root}.
@@ -388,11 +402,11 @@ binary zeros, interpreted as an end-of-archive indicator. These EOF
blocks are either compressed in a separate lzip member or compressed
along with the tar members contained in the last lzip member.
-The diagram below shows the correspondence between each tar member
-(formed by one or two headers plus optional data) in the tar archive and
-each
+The diagram below shows the correspondence between each tar member (formed
+by one or two headers plus optional data) in the tar archive and each
@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#File-format,,lzip member}
-in the resulting multimember tar.lz archive:
+in the resulting multimember tar.lz archive, when per file compression is
+used:
@ifnothtml
@xref{File format,,,lzip}.
@end ifnothtml
@@ -672,10 +686,10 @@ format.
@section Avoid misconversions to/from UTF-8
There is no portable way to tell what charset a text string is coded into.
-Therefore, tarlz stores all fields representing text strings as-is, without
-conversion to UTF-8 nor any other transformation. This prevents accidental
-double UTF-8 conversions. If the need arises this behavior will be adjusted
-with a command line option in the future.
+Therefore, tarlz stores all fields representing text strings unmodified,
+without conversion to UTF-8 nor any other transformation. This prevents
+accidental double UTF-8 conversions. If the need arises this behavior will
+be adjusted with a command line option in the future.
@node Multi-threaded tar
@@ -717,13 +731,51 @@ it only needs to decompress part of each lzip member. See the following
example listing the Silesia corpus on a dual core machine:
@example
-tarlz -9 -cf silesia.tar.lz silesia
+tarlz -9 --no-solid -cf silesia.tar.lz silesia
time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)
@end example
+@node Minimum archive sizes
+@chapter Minimum archive sizes required for multi-threaded block compression
+@cindex minimum archive sizes
+
+When creating or appending to a compressed archive using multi-threaded
+block compression, tarlz puts tar members together in blocks and compresses
+as many blocks simultaneously as worker threads are chosen, creating a
+multimember compressed archive.
+
+For this to work as expected (and roughly multiply the compression speed by
+the number of available processors), the uncompressed archive must be at
+least as large as the number of worker threads times the block size
+(@pxref{--data-size}). Else some processors will not get any data to
+compress, and compression will be proportionally slower. The maximum speed
+increase achievable on a given file is limited by the ratio
+@w{(uncompressed_size / data_size)}. For example, a tarball the size of gcc
+or linux will scale up to 10 or 12 processors at level -9.
+
+The following table shows the minimum uncompressed archive size needed for
+full use of N processors at a given compression level, using the default
+data size for each level:
+
+@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB}
+@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256
+@item Level
+@item -0 @tab 2 MiB @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 64 MiB @tab 256 MiB
+@item -1 @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 128 MiB @tab 512 MiB
+@item -2 @tab 6 MiB @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 192 MiB @tab 768 MiB
+@item -3 @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 256 MiB @tab 1 GiB
+@item -4 @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 96 MiB @tab 384 MiB @tab 1.5 GiB
+@item -5 @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 512 MiB @tab 2 GiB
+@item -6 @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 1 GiB @tab 4 GiB
+@item -7 @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 2 GiB @tab 8 GiB
+@item -8 @tab 96 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab 3 GiB @tab 12 GiB
+@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 1 GiB @tab 4 GiB @tab 16 GiB
+@end multitable
+
+
@node Examples
@chapter A small tutorial with examples
@cindex examples