Adding upstream version 0.25.

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-14 12:57:29 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-14 12:57:29 +0000
commit: 29146f385a524ad6a4b1b127cc3d9641a8fe0adc
tree: 1caea11496a3d9e0333cdf649d9f9be6d5a67b78 /doc
parent: Initial commit.
3 files changed, 2823 insertions, 0 deletions
diff --git a/doc/tarlz.1 b/doc/tarlz.1
new file mode 100644
index 0000000..9d63da5
--- /dev/null
+++ b/doc/tarlz.1
@@ -0,0 +1,180 @@
+.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.49.2.
+.TH TARLZ "1" "January 2024" "tarlz 0.25" "User Commands"
+.SH NAME
+tarlz \- creates tar archives with multimember lzip compression
+.SH SYNOPSIS
+.B tarlz
+\fI\,operation \/\fR[\fI\,options\/\fR] [\fI\,files\/\fR]
+.SH DESCRIPTION
+Tarlz is a massively parallel (multi\-threaded) combined implementation of
+the tar archiver and the lzip compressor. Tarlz uses the compression library
+lzlib.
+.PP
+Tarlz creates tar archives using a simplified and safer variant of the POSIX
+pax format compressed in lzip format, keeping the alignment between tar
+members and lzip members. The resulting multimember tar.lz archive is
+backward compatible with standard tar tools like GNU tar, which treat it
+like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
+.PP
+Keeping the alignment between tar members and lzip members has two
+advantages. It adds an indexed lzip layer on top of the tar archive, making
+it possible to decode the archive safely in parallel. It also minimizes the
+amount of data lost in case of corruption.
+.PP
+The tarlz file format is a safe POSIX\-style backup format. In case of
+corruption, tarlz can extract all the undamaged members from the tar.lz
+archive, skipping over the damaged members, just like the standard
+(uncompressed) tar. Moreover, the option '\-\-keep\-damaged' can be used to
+recover as much data as possible from each damaged member, and lziprecover
+can be used to recover some of the damaged members.
+.SS "Operations:"
+.TP
+\fB\-\-help\fR
+display this help and exit
+.TP
+\fB\-V\fR, \fB\-\-version\fR
+output version information and exit
+.TP
+\fB\-A\fR, \fB\-\-concatenate\fR
+append archives to the end of an archive
+.TP
+\fB\-c\fR, \fB\-\-create\fR
+create a new archive
+.TP
+\fB\-d\fR, \fB\-\-diff\fR
+find differences between archive and file system
+.TP
+\fB\-\-delete\fR
+delete files/directories from an archive
+.TP
+\fB\-r\fR, \fB\-\-append\fR
+append files to the end of an archive
+.TP
+\fB\-t\fR, \fB\-\-list\fR
+list the contents of an archive
+.TP
+\fB\-x\fR, \fB\-\-extract\fR
+extract files/directories from an archive
+.TP
+\fB\-z\fR, \fB\-\-compress\fR
+compress existing POSIX tar archives
+.TP
+\fB\-\-check\-lib\fR
+check version of lzlib and exit
+.SH OPTIONS
+.TP
+\fB\-B\fR, \fB\-\-data\-size=\fR<bytes>
+set target size of input data blocks [2x8=16 MiB]
+.TP
+\fB\-C\fR, \fB\-\-directory=\fR<dir>
+change to directory <dir>
+.TP
+\fB\-f\fR, \fB\-\-file=\fR<archive>
+use archive file <archive>
+.TP
+\fB\-h\fR, \fB\-\-dereference\fR
+follow symlinks; archive the files they point to
+.TP
+\fB\-n\fR, \fB\-\-threads=\fR<n>
+set number of (de)compression threads [2]
+.TP
+\fB\-o\fR, \fB\-\-output=\fR<file>
+compress to <file> ('\-' for stdout)
+.TP
+\fB\-p\fR, \fB\-\-preserve\-permissions\fR
+don't subtract the umask on extraction
+.TP
+\fB\-q\fR, \fB\-\-quiet\fR
+suppress all messages
+.TP
+\fB\-v\fR, \fB\-\-verbose\fR
+verbosely list files processed
+.TP
+\fB\-0\fR .. \fB\-9\fR
+set compression level [default 6]
+.TP
+\fB\-\-uncompressed\fR
+don't compress the archive created
+.TP
+\fB\-\-asolid\fR
+create solidly compressed appendable archive
+.TP
+\fB\-\-bsolid\fR
+create per block compressed archive (default)
+.TP
+\fB\-\-dsolid\fR
+create per directory compressed archive
+.TP
+\fB\-\-no\-solid\fR
+create per file compressed archive
+.TP
+\fB\-\-solid\fR
+create solidly compressed archive
+.TP
+\fB\-\-anonymous\fR
+equivalent to '\-\-owner=root \-\-group=root'
+.TP
+\fB\-\-owner=\fR<owner>
+use <owner> name/ID for files added to archive
+.TP
+\fB\-\-group=\fR<group>
+use <group> name/ID for files added to archive
+.TP
+\fB\-\-exclude=\fR<pattern>
+exclude files matching a shell pattern
+.TP
+\fB\-\-ignore\-ids\fR
+ignore differences in owner and group IDs
+.TP
+\fB\-\-ignore\-metadata\fR
+compare only file size and file content
+.TP
+\fB\-\-ignore\-overflow\fR
+ignore mtime overflow differences on 32\-bit
+.TP
+\fB\-\-keep\-damaged\fR
+don't delete partially extracted files
+.TP
+\fB\-\-missing\-crc\fR
+exit with error status if missing extended CRC
+.TP
+\fB\-\-mtime=\fR<date>
+use <date> as mtime for files added to archive
+.TP
+\fB\-\-out\-slots=\fR<n>
+number of 1 MiB output packets buffered [64]
+.TP
+\fB\-\-warn\-newer\fR
+warn if any file is newer than the archive
+.PP
+If no archive is specified, tarlz tries to read it from standard input or
+write it to standard output.
+.PP
+Exit status: 0 for a normal exit, 1 for environmental problems
+(file not found, files differ, invalid command\-line options, I/O errors,
+etc.), 2 to indicate a corrupt or invalid input file, 3 for an internal
+consistency error (e.g., bug) which caused tarlz to panic.
+.SH "REPORTING BUGS"
+Report bugs to lzip\-bug@nongnu.org
+.br
+Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
+.SH COPYRIGHT
+Copyright \(co 2024 Antonio Diaz Diaz.
+Using lzlib 1.14\-rc1
+License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
+.br
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+.SH "SEE ALSO"
+The full documentation for
+.B tarlz
+is maintained as a Texinfo manual.  If the
+.B info
+and
+.B tarlz
+programs are properly installed at your site, the command
+.IP
+.B info tarlz
+.PP
+should give you access to the complete manual.
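Note on the exit statuses documented in doc/tarlz.1 above: a calling script can branch on them rather than treating every non-zero status alike. The following is a minimal sketch in Python, not part of tarlz. The names `EXIT_MEANINGS`, `describe_exit_status`, and `run_tarlz` are hypothetical helpers introduced for illustration, and the sketch assumes a `tarlz` binary is available on PATH.

```python
import subprocess

# Meanings paraphrased from the "Exit status" paragraph of doc/tarlz.1.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem",
    2: "corrupt or invalid input file",
    3: "internal consistency error",
}


def describe_exit_status(status):
    """Return the documented meaning of a tarlz exit status.

    Any value the man page does not list (for example a negative
    return code after the process was killed by a signal) is
    reported as undocumented.
    """
    return EXIT_MEANINGS.get(status, "undocumented exit status %d" % status)


def run_tarlz(args):
    """Run tarlz with the given argument list (hypothetical helper).

    Assumes 'tarlz' is on PATH; subprocess.run raises
    FileNotFoundError if it is not. Returns a tuple of the raw
    exit status and its documented meaning.
    """
    completed = subprocess.run(["tarlz"] + list(args), check=False)
    return completed.returncode, describe_exit_status(completed.returncode)
```

For instance, `run_tarlz(["-t", "-f", "archive.tar.lz"])` would list the archive and return the status together with its description, so a backup script could retry on status 1 but raise an alarm on status 2.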
diff --git a/doc/tarlz.info b/doc/tarlz.info
new file mode 100644
index 0000000..25ba882
--- /dev/null
+++ b/doc/tarlz.info
@@ -0,0 +1,1287 @@
+This is tarlz.info, produced by makeinfo version 4.13+ from tarlz.texi.
+
+INFO-DIR-SECTION Archiving
+START-INFO-DIR-ENTRY
+* Tarlz: (tarlz).               Archiver with multimember lzip compression
+END-INFO-DIR-ENTRY
+
+
+File: tarlz.info,  Node: Top,  Next: Introduction,  Up: (dir)
+
+Tarlz Manual
+************
+
+This manual is for Tarlz (version 0.25, 3 January 2024).
+
+* Menu:
+
+* Introduction::              Purpose and features of tarlz
+* Invoking tarlz::            Command-line interface
+* Portable character set::    POSIX portable filename character set
+* File format::               Detailed format of the compressed archive
+* Amendments to pax format::  The reasons for the differences with pax
+* Program design::            Internal structure of tarlz
+* Multi-threaded decoding::   Limitations of parallel tar decoding
+* Minimum archive sizes::     Sizes required for full multi-threaded speed
+* Examples::                  A small tutorial with examples
+* Problems::                  Reporting bugs
+* Concept index::             Index of concepts
+
+
+   Copyright (C) 2013-2024 Antonio Diaz Diaz.
+
+   This manual is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
+
+
+File: tarlz.info,  Node: Introduction,  Next: Invoking tarlz,  Prev: Top,  Up: Top
+
+1 Introduction
+**************
+
+Tarlz is a massively parallel (multi-threaded) combined implementation of
+the tar archiver and the lzip compressor. Tarlz uses the compression
+library lzlib.
+
+   Tarlz creates tar archives using a simplified and safer variant of the
+POSIX pax format compressed in lzip format, keeping the alignment between
+tar members and lzip members. The resulting multimember tar.lz archive is
+backward compatible with standard tar tools like GNU tar, which treat it
+like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
+
+   Keeping the alignment between tar members and lzip members has two
+advantages. It adds an indexed lzip layer on top of the tar archive, making
+it possible to decode the archive safely in parallel. It also minimizes the
+amount of data lost in case of corruption. Compressing a tar archive with
+plzip may even double the number of files lost for each lzip member damaged
+because it does not keep the members aligned.
+
+   Tarlz can create tar archives with five levels of compression
+granularity: per file ('--no-solid'), per block ('--bsolid', default), per
+directory ('--dsolid'), appendable solid ('--asolid'), and solid
+('--solid'). It can also create uncompressed tar archives.
+
+Of course, compressing each file (or each directory) individually can't
+achieve a compression ratio as high as compressing solidly the whole tar
+archive, but it has the following advantages:
+
+   * The resulting multimember tar.lz archive can be decompressed in
+     parallel, multiplying the decompression speed.
+
+   * New members can be appended to the archive (by removing the
+     end-of-archive member), and unwanted members can be deleted from the
+     archive. Just like an uncompressed tar archive.
+
+   * It is a safe POSIX-style backup format. In case of corruption, tarlz
+     can extract all the undamaged members from the tar.lz archive,
+     skipping over the damaged members, just like the standard
+     (uncompressed) tar. Moreover, the option '--keep-damaged' can be used
+     to recover as much data as possible from each damaged member, and
+     lziprecover can be used to recover some of the damaged members.
+
+   * A multimember tar.lz archive is usually smaller than the corresponding
+     solidly compressed tar.gz archive, except when individually
+     compressing files smaller than about 32 KiB.
+
+   Tarlz protects the extended records with a Cyclic Redundancy Check (CRC)
+in a way compatible with standard tar tools. *Note crc32::.
+
+   Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
+'star', or 'v7'. The command 'tarlz -t -f archive.tar.lz > /dev/null' can
+be used to check that the format of the archive is compatible with tarlz.
+
+
+File: tarlz.info,  Node: Invoking tarlz,  Next: Portable character set,  Prev: Introduction,  Up: Top
+
+2 Invoking tarlz
+****************
+
+The format for running tarlz is:
+
+     tarlz OPERATION [OPTIONS] [FILES]
+
+All operations except '--concatenate' and '--compress' operate on whole
+trees if any FILE is a directory. All operations except '--compress'
+overwrite output files without warning. If no archive is specified, tarlz
+tries to read it from standard input or write it to standard output. Tarlz
+refuses to read archive data from a terminal or write archive data to a
+terminal. Tarlz detects when the archive being created or enlarged is among
+the files to be archived, appended, or concatenated, and skips it.
+
+   Tarlz does not use absolute file names nor file names above the current
+working directory (perhaps changed by option '-C'). On archive creation or
+appending, tarlz archives the files specified, but removes from member names
+any leading and trailing slashes and any file name prefixes containing a
+'..' component. On extraction, leading and trailing slashes are also
+removed from member names, and archive members containing a '..' component
+in the file name are skipped. Tarlz does not follow symbolic links during
+extraction; not even symbolic links replacing intermediate directories.
+
+   On extraction and listing, tarlz removes leading './' strings from
+member names in the archive or given in the command line, so that
+'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from archive
+'foo'.
+
+   If several compression levels or '--*solid' options are given, the last
+setting is used. For example '-9 --solid --uncompressed -1' is equivalent
+to '-1 --solid'.
+
+   tarlz supports the following operations:
+
+'--help'
+     Print an informative help message describing the options and exit.
+
+'-V'
+'--version'
+     Print the version number of tarlz on the standard output and exit.
+     This version number should be included in all bug reports.
+
+'-A'
+'--concatenate'
+     Append one or more archives to the end of an archive. If no archive is
+     specified with the option '-f', concatenate the input archives to
+     standard output. All the archives involved must be regular (seekable)
+     files, and must be either all compressed or all uncompressed.
+     Compressed and uncompressed archives can't be mixed. Compressed
+     archives must be multimember lzip files with the two end-of-archive
+     blocks plus any zero padding contained in the last lzip member of each
+     archive. The intermediate end-of-archive blocks are removed as each
+     new archive is concatenated. If the archive is uncompressed, tarlz
+     parses tar headers until it finds the end-of-archive blocks. Exit with
+     status 0 without modifying the archive if no FILES have been specified.
+
+     Concatenating archives containing files in common results in two or
+     more tar members with the same name in the resulting archive, which
+     may produce nondeterministic behavior during multi-threaded extraction.
+     *Note mt-extraction::.
+
+'-c'
+'--create'
+     Create a new archive from FILES.
+
+'-d'
+'--diff'
+     Compare and report differences between archive and file system. For
+     each tar member in the archive, check that the corresponding file in
+     the file system exists and is of the same type (regular file,
+     directory, etc). Report on standard output the differences found in
+     type, mode (permissions), owner and group IDs, modification time, file
+     size, file contents (of regular files), target (of symlinks) and
+     device number (of block/character special files).
+
+     As tarlz removes leading slashes from member names, the option '-C' may
+     be used in combination with '--diff' when absolute file names were used
+     on archive creation: 'tarlz -C / -d'. Alternatively, tarlz may be run
+     from the root directory to perform the comparison.
+
+'--delete'
+     Delete files and directories from an archive in place. It currently can
+     delete only from uncompressed archives and from archives with files
+     compressed individually ('--no-solid' archives). Note that files of
+     about '--data-size' or larger are compressed individually even if
+     '--bsolid' is used, and can therefore be deleted. Tarlz takes care to
+     not delete a tar member unless it is possible to do so. For example it
+     won't try to delete a tar member that is not compressed individually.
+     Even in the case of finding a corrupt member after having deleted some
+     member(s), tarlz stops and copies the rest of the file as soon as
+     corruption is found, leaving it just as corrupt as it was, but not
+     worse.
+
+     To delete a directory without deleting the files under it, use
+     'tarlz --delete -f foo --exclude='dir/*' dir'. Deleting in place may
+     be dangerous. A corrupt archive, a power cut, or an I/O error may cause
+     data loss.
+
+'-r'
+'--append'
+     Append files to the end of an archive. The archive must be a regular
+     (seekable) file either compressed or uncompressed. Compressed members
+     can't be appended to an uncompressed archive, nor vice versa. If the
+     archive is compressed, it must be a multimember lzip file with the two
+     end-of-archive blocks plus any zero padding contained in the last lzip
+     member of the archive. It is possible to append files to an archive
+     with a different compression granularity. Appending works as follows:
+     first the end-of-archive blocks are removed, then the new members are
+     appended, and finally two new end-of-archive blocks are appended to
+     the archive. If the archive is uncompressed, tarlz parses and skips
+     tar headers until it finds the end-of-archive blocks. Exit with status
+     0 without modifying the archive if no FILES have been specified.
+
+     Appending files already present in the archive results in two or more
+     tar members with the same name, which may produce nondeterministic
+     behavior during multi-threaded extraction. *Note mt-extraction::.
+
+'-t'
+'--list'
+     List the contents of an archive. If FILES are given, list only the
+     FILES given.
+
+'-x'
+'--extract'
+     Extract files from an archive. If FILES are given, extract only the
+     FILES given. Else extract all the files in the archive. To extract a
+     directory without extracting the files under it, use
+     'tarlz -xf foo --exclude='dir/*' dir'. Tarlz removes files and empty
+     directories unconditionally before extracting over them. Other than
+     that, it does not make any special effort to extract a file over an
+     incompatible type of file. For example, extracting a file over a
+     non-empty directory usually fails.
+
+'-z'
+'--compress'
+     Compress existing POSIX tar archives aligning the lzip members to the
+     tar members with choice of granularity ('--bsolid' by default,
+     '--dsolid' works like '--asolid'). Exit with error status 2 if any
+     input archive is an empty file. The input archives are kept unchanged.
+     Existing compressed archives are not overwritten. A hyphen '-' used as
+     the name of an input archive reads from standard input and writes to
+     standard output (unless the option '--output' is used). Tarlz can be
+     used as compressor for GNU tar by using a command like
+     'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Tarlz can be used as
+     compressor for zupdate (zutils) by using a command like
+     'zupdate --lz="tarlz -z" foo.tar.gz'. Note that tarlz only works
+     reliably on archives without global headers, or with global headers
+     whose content can be ignored.
+
+     The compression is reversible, including any garbage present after the
+     end-of-archive blocks. Tarlz stops parsing after the first
+     end-of-archive block is found, and then compresses the rest of the
+     archive. Unless solid compression is requested, the end-of-archive
+     blocks are compressed in a lzip member separated from the preceding
+     members and from any non-zero garbage following the end-of-archive
+     blocks. '--compress' implies plzip argument style, not tar style. Each
+     input archive is compressed to a file with the extension '.lz' added
+     unless the option '--output' is used. When '--output' is used, only
+     one input archive can be specified. '-f' can't be used with
+     '--compress'.
+
+'--check-lib'
+     Compare the version of lzlib used to compile tarlz with the version
+     actually being used at run time and exit. Report any differences
+     found. Exit with error status 1 if differences are found. A mismatch
+     may indicate that lzlib is not correctly installed or that a different
+     version of lzlib has been installed after compiling tarlz. Exit with
+     error status 2 if LZ_API_VERSION and LZ_version_string don't match.
+     'tarlz -v --check-lib' shows the version of lzlib being used and the
+     value of LZ_API_VERSION (if defined). *Note Library version:
+     (lzlib)Library version.
+
+
+   tarlz supports the following options: *Note Argument syntax:
+(arg_parser)Argument syntax.
+
+'-B BYTES'
+'--data-size=BYTES'
+     Set target size of input data blocks for the option '--bsolid'. *Note
+     --bsolid::. Valid values range from 8 KiB to 1 GiB. Default value is
+     two times the dictionary size, except for option '-0' where it
+     defaults to 1 MiB. *Note Minimum archive sizes::.
+
+'-C DIR'
+'--directory=DIR'
+     Change to directory DIR. When creating, appending, comparing, or
+     extracting, the position of each '-C' option in the command line is
+     significant; it changes the current working directory for the following
+     FILES until a new '-C' option appears in the command line. '--list'
+     and '--delete' ignore any '-C' options specified. DIR is relative to
+     the then current working directory, perhaps changed by a previous '-C'
+     option.
+
+     Note that a process can only have one current working directory (CWD).
+     Therefore multi-threading can't be used to create or decode an archive
+     if a '-C' option appears after a (relative) file name in the command
+     line. (All file names are made relative when decoding).
+
+'-f ARCHIVE'
+'--file=ARCHIVE'
+     Use archive file ARCHIVE. A hyphen '-' used as an ARCHIVE argument
+     reads from standard input or writes to standard output.
+
+'-h'
+'--dereference'
+     Follow symbolic links during archive creation, appending or comparison.
+     Archive or compare the files they point to instead of the links
+     themselves.
+
+'-n N'
+'--threads=N'
+     Set the number of (de)compression threads, overriding the system's
+     default. Valid values range from 0 to "as many as your system can
+     support". A value of 0 disables threads entirely. If this option is
+     not used, tarlz tries to detect the number of processors in the system
+     and use it as default value. 'tarlz --help' shows the system's default
+     value. See the note about multi-threading in the option '-C' above.
+
+     Note that the number of usable threads is limited during compression to
+     ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
+     and during decompression to the number of lzip members in the tar.lz
+     archive, which you can find by running 'lzip -lv archive.tar.lz'.
+
+'-o FILE'
+'--output=FILE'
+     Write the compressed output to FILE. '-o -' writes the compressed
+     output to standard output. Currently '--output' only works with
+     '--compress'.
+
+'-p'
+'--preserve-permissions'
+     On extraction, set file permissions as they appear in the archive.
+     This is the default behavior when tarlz is run by the superuser. The
+     default for other users is to subtract the umask of the user running
+     tarlz from the permissions specified in the archive.
+
+'-q'
+'--quiet'
+     Quiet operation. Suppress all messages.
+
+'-v'
+'--verbose'
+     Verbosely list files processed. Further -v's (up to 4) increase the
+     verbosity level.
+
+'-0 .. -9'
+     Set the compression level for '--create', '--append', and
+     '--compress'. The default compression level is '-6'. Like lzip, tarlz
+     also minimizes the dictionary size of the lzip members it creates,
+     reducing the amount of memory required for decompression.
+
+     Level   Dictionary size   Match length limit
+     -0      64 KiB            16 bytes
+     -1      1 MiB             5 bytes
+     -2      1.5 MiB           6 bytes
+     -3      2 MiB             8 bytes
+     -4      3 MiB             12 bytes
+     -5      4 MiB             20 bytes
+     -6      8 MiB             36 bytes
+     -7      16 MiB            68 bytes
+     -8      24 MiB            132 bytes
+     -9      32 MiB            273 bytes
+
+'--uncompressed'
+     With '--create', don't compress the tar archive created. Create an
+     uncompressed tar archive instead. With '--append', don't compress the
+     new members appended to the tar archive. Compressed members can't be
+     appended to an uncompressed archive, nor vice versa. '--uncompressed'
+     can be omitted if it can be deduced from the archive name. (An
+     uncompressed archive name lacks a '.lz' or '.tlz' extension).
+
+'--asolid'
+     When creating or appending to a compressed archive, use appendable
+     solid compression. All the files being added to the archive are
+     compressed into a single lzip member, but the end-of-archive blocks
+     are compressed into a separate lzip member. This creates a solidly
+     compressed appendable archive. Solid archives can't be created nor
+     decoded in parallel.
+
+'--bsolid'
+     When creating or appending to a compressed archive, use block
+     compression. Tar members are compressed together in a lzip member
+     until they approximate a target uncompressed size. The size can't be
+     exact because each solidly compressed data block must contain an
+     integer number of tar members. Block compression is the default
+     because it improves compression ratio for archives with many files
+     smaller than the block size. This option allows tarlz to revert to
+     default behavior if, for example, it is invoked through an alias like
+     'tar='tarlz --solid''. *Note --data-size::, to set the target block
+     size.
+
+'--dsolid'
+     When creating or appending to a compressed archive, compress each file
+     specified in the command line separately in its own lzip member, and
+     use solid compression for each directory specified in the command
+     line. The end-of-archive blocks are compressed into a separate lzip
+     member. This creates a compressed appendable archive with a separate
+     lzip member for each file or top-level directory specified.
+
+'--no-solid'
+     When creating or appending to a compressed archive, compress each file
+     separately in its own lzip member. The end-of-archive blocks are
+     compressed into a separate lzip member. This creates a compressed
+     appendable archive with a lzip member for each file.
+
+'--solid'
+     When creating or appending to a compressed archive, use solid
+     compression. The files being added to the archive, along with the
+     end-of-archive blocks, are compressed into a single lzip member. The
+     resulting archive is not appendable. No more files can later be
+     appended to the archive. Solid archives can't be created nor decoded
+     in parallel.
+
+'--anonymous'
+     Equivalent to '--owner=root --group=root'.
+
+'--owner=OWNER'
+     When creating or appending, use OWNER for files added to the archive.
+     If OWNER is not a valid user name, it is decoded as a decimal numeric
+     user ID.
+
+'--group=GROUP'
+     When creating or appending, use GROUP for files added to the archive.
+     If GROUP is not a valid group name, it is decoded as a decimal numeric
+     group ID.
+
+'--exclude=PATTERN'
+     Exclude files matching a shell pattern like '*.o'. A file is considered
+     to match if any component of the file name matches. For example, '*.o'
+     matches 'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a
+     '/', it matches a corresponding '/' in the file name. For example,
+     'foo/*.o' matches 'foo/bar.o'. Multiple '--exclude' options can be
+     specified.
+
+'--ignore-ids'
+     Make '--diff' ignore differences in owner and group IDs. This option is
+     useful when comparing an '--anonymous' archive.
+
+'--ignore-metadata'
+     Make '--diff' ignore any differences in metadata (file permissions,
+     owner and group IDs, modification time). Compare only file type, file
+     size, and file content. This option is useful when file permissions
+     have not been fully restored because uid/gid changed on extraction.
+
+'--ignore-overflow'
+     Make '--diff' ignore differences in mtime caused by overflow on 32-bit
+     systems with a 32-bit time_t.
+
+'--keep-damaged'
+     Don't delete partially extracted files. If a decompression error
+     happens while extracting a file, keep the partial data extracted. Use
+     this option to recover as much data as possible from each damaged
+     member. It is recommended to run tarlz in single-threaded mode
+     ('--threads=0') when using this option.
+
+'--missing-crc'
+     Exit with error status 2 if the CRC of the extended records is
+     missing. When this option is used, tarlz detects any corruption in the
+     extended records (only limited by CRC collisions). But note that a
+     corrupt 'GNU.crc32' keyword, for example 'GNU.crc30', is reported as a
+     missing CRC instead of as a corrupt record. This misleading
+     'Missing CRC' message is the consequence of a flaw in the POSIX pax
+     format; i.e., the lack of a mandatory check sequence of the extended
+     records. *Note crc32::.
+
+'--mtime=DATE'
+     When creating or appending, use DATE as the modification time for
+     files added to the archive instead of their actual modification times.
+     The value of DATE may be either '@' followed by the number of seconds
+     since (or before) the epoch, or a date in format
+     '[-]YYYY-MM-DD HH:MM:SS' or '[-]YYYY-MM-DDTHH:MM:SS', or the name of
+     an existing reference file starting with '.' or '/' whose modification
+     time is used. The time of day 'HH:MM:SS' in the date format is
+     optional and defaults to '00:00:00'. The epoch is
+     '1970-01-01 00:00:00 UTC'. Negative seconds or years define a
+     modification time before the epoch.
+
+'--out-slots=N'
+     Number of 1 MiB output packets buffered per worker thread during
+     multi-threaded creation or appending to compressed archives.
+     Increasing the number of packets may increase compression speed if the
+     files being archived are larger than 64 MiB compressed, but requires
+     more memory. Valid values range from 1 to 1024. The default value is
+     64.
+
+'--warn-newer'
+     During archive creation, warn if any file being archived has a
+     modification time newer than the archive creation time. This option
+     may slow archive creation somewhat because it makes an extra call to
+     'stat' after archiving each file, but it guarantees that file contents
+     were not modified during the creation of the archive. Note that the
+     file must be at least one second newer than the archive for it to be
+     detected as newer.
+
+
+   Exit status: 0 for a normal exit, 1 for environmental problems (file not
+found, files differ, invalid command-line options, I/O errors, etc.), 2 to
+indicate a corrupt or invalid input file, 3 for an internal consistency
+error (e.g., bug) which caused tarlz to panic.
+
+
+File: tarlz.info,  Node: Portable character set,  Next: File format,  Prev: Invoking tarlz,  Up: Top
+
+3 POSIX portable filename character set
+***************************************
+
+The set of characters from which portable file names are constructed.
+
+     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
+     a b c d e f g h i j k l m n o p q r s t u v w x y z
+     0 1 2 3 4 5 6 7 8 9 . _ -
+
+   The last three characters are the period, underscore, and hyphen-minus
+characters, respectively.
+
+   File names are identifiers. Therefore, archiving works better when file
+names use only the portable character set without spaces added.
+
+
+File: tarlz.info,  Node: File format,  Next: Amendments to pax format,  Prev: Portable character set,  Up: Top
+
+4 File format
+*************
+
+In the diagram below, a box like this:
+
++---+
+|   | <-- the vertical bars might be missing
++---+
+
+   represents one byte; a box like this:
+
++==============+
+|              |
++==============+
+
+   represents a variable number of bytes or a fixed but large number of
+bytes (for example 512).
+
+
+   A tar.lz file consists of one or more lzip members (compressed data
+sets). The members simply appear one after another in the file, with no
+additional information before, between, or after them.
+
+   Each lzip member contains one or more tar members in a simplified POSIX
+pax interchange format. The only pax typeflag value supported by tarlz (in
+addition to the typeflag values defined by the ustar format) is 'x'. The
+pax format is an extension on top of the ustar format that removes the size
+limitations of the ustar format.
+
+   Each tar member contains one file archived, and is represented by the
+following sequence:
+
+   * An optional extended header block followed by one or more blocks that
+     contain the extended header records as if they were the contents of a
+     file; i.e., the extended header records are included as the data for
+     this header block. This header block is of the form described in pax
+     header block, with a typeflag value of 'x'.
+
+   * A header block in ustar format that describes the file. Any fields
+     defined in the preceding optional extended header records override the
+     associated fields in this header block for this file.
+
+   * Zero or more blocks that contain the contents of the file.
+
+   Each tar member must be contiguously stored in a lzip member for the
+parallel decoding operations like '--list' to work. If any tar member is
+split over two or more lzip members, the archive must be decoded
+sequentially. *Note Multi-threaded decoding::.
+
+   At the end of the archive file there are two 512-byte blocks filled with
+binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
+are either compressed in a separate lzip member or compressed along with the
+tar members contained in the last lzip member. For a compressed archive to
+be recognized by tarlz as appendable, the last lzip member must contain
+between 512 and 32256 zeros alone (without any non-zero bytes).
+
+   The diagram below shows the correspondence between each tar member
+(formed by one or two headers plus optional data) in the tar archive and
+each lzip member in the resulting multimember tar.lz archive, when per file
+compression is used: *Note File format: (lzip)File format.
+
+tar
++========+======+=================+===============+========+======+========+
+| header | data | extended header | extended data | header | data |   EOA  |
++========+======+=================+===============+========+======+========+
+
+tar.lz
++===============+=================================================+========+
+|     member    |                      member                     | member |
++===============+=================================================+========+
+
+
+4.1 Pax header block
+====================
+
+The pax header block is identical to the ustar header block described below
+except that the typeflag has the value 'x' (extended). The field 'size' is
+the size of the extended header data in bytes. Most other fields in the pax
+header block are zeroed on archive creation to prevent trouble if the
+archive is read by a ustar tool, and are ignored by tarlz on archive
+extraction. *Note flawed-compat::.
+
+   Tarlz limits the size of the pax extended header data so that the whole
+header set (extended header + extended data + ustar header) can be read and
+decoded in a buffer of size INT_MAX.
+
+   The pax extended header data consists of one or more records, each of
+them constructed as follows:
+'"%d %s=%s\n", <length>, <keyword>, <value>'
+
+   The fields <length> and <keyword> in the record must be limited to the
+portable character set (*note Portable character set::). The field <length>
+contains the decimal length of the record in bytes, including the trailing
+newline. The field <value> is stored as-is, without conversion to UTF-8 nor
+any other transformation. The fields are separated by the ASCII characters
+space, equal-sign, and newline.
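+
+   As a rough illustration (not taken from the tarlz sources), a record of
+this form could be built as in the Python sketch below. The function name
+'make_pax_record' is hypothetical. Because <length> counts its own digits,
+the sketch tries digit counts from the smallest upwards, which yields the
+shortest self-consistent length.
```python
def make_pax_record(keyword: str, value: bytes) -> bytes:
    """Build one pax extended record: b"<length> <keyword>=<value>\\n".

    Hypothetical helper for illustration; not part of tarlz.
    """
    # Everything after the <length> digits: space, keyword, '=', value, newline.
    body = b" " + keyword.encode("ascii") + b"=" + value + b"\n"
    # <length> includes its own decimal digits, so search for the smallest
    # digit count that is consistent with the resulting total length.
    digits = 1
    while len(str(len(body) + digits)) != digits:
        digits += 1
    return str(len(body) + digits).encode("ascii") + body


if __name__ == "__main__":
    print(make_pax_record("path", b"a"))
```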
+
+   These are the <keyword> values currently supported by tarlz:
+
+'atime'
+     The signed decimal representation of the access time of the following
+     file in seconds since (or before) the epoch, obtained from the function
+     'stat'. The atime record is created only for files with a modification
+     time outside of the ustar range. *Note ustar-mtime::.
+
+'gid'
+     The unsigned decimal representation of the group ID of the group that
+     owns the following file. The gid record is created only for files with
+     a group ID greater than 2_097_151 (octal 7_777_777). *Note
+     ustar-uid-gid::.
+
+'linkpath'
+     The file name of a link being created to another file, of any type,
+     previously archived. This record overrides the field 'linkname' in the
+     following ustar header block. The following ustar header block
+     determines the type of link created. If typeflag of the following
+     header block is 1, a hard link is created. If typeflag is 2, a
+     symbolic link is created and the linkpath value is used as the
+     contents of the symbolic link. The linkpath record is created only for
+     links with a link name that does not fit in the space provided by the
+     ustar header.
+
+'mtime'
+     The signed decimal representation of the modification time of the
+     following file in seconds since (or before) the epoch, obtained from
+     the function 'stat'. This record overrides the field 'mtime' in the
+     following ustar header block. The mtime record is created only for
+     files with a modification time outside of the ustar range. *Note
+     ustar-mtime::.
+
+'path'
+     The file name of the following file. This record overrides the fields
+     'name' and 'prefix' in the following ustar header block. The path
+     record is created for files with a name that does not fit in the space
+     provided by the ustar header, but is also created for files that
+     require any other extended record so that the fields 'name' and
+     'prefix' in the following ustar header block can be zeroed.
+
+'size'
+     The size of the file in bytes, expressed as a decimal number using
+     digits from the ISO/IEC 646:1991 (ASCII) standard. This record
+     overrides the field 'size' in the following ustar header block. The
+     size record is created only for files with a size value greater than
+     8_589_934_591 (octal 77_777_777_777); that is, 8 GiB (2^33 bytes) or
+     larger.
+
+'uid'
+     The unsigned decimal representation of the user ID of the file owner
+     of the following file. The uid record is created only for files with a
+     user ID greater than 2_097_151 (octal 7_777_777). *Note
+     ustar-uid-gid::.
+
+'GNU.crc32'
+     CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
+     representing the CRC <value> itself. The <value> is represented as 8
+     hexadecimal digits in big endian order, '22 GNU.crc32=00000000\n'. The
+     keyword of the CRC record is protected by the CRC to guarantee that
+     corruption is always detected when using '--missing-crc' (except in
+     case of CRC collision). A CRC was chosen because a checksum is too
+     weak for a potentially large list of variable sized records. A
+     checksum can't detect simple errors like the swapping of two bytes.
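+
+   The following Python sketch shows one way to compute such a CRC. It is
+only an illustration: the function names are hypothetical, the bitwise
+loop is chosen for brevity rather than speed, and it assumes that
+"excluding the 8 bytes" means the bytes before and after the <value> are
+fed to the CRC as one continuous stream.
```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78.

    Passing a previous result as 'crc' continues the computation.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc_excluding_value(ext_data: bytes, value_pos: int) -> int:
    """CRC32-C of ext_data, skipping the 8 bytes starting at value_pos.

    Assumption: the skipped bytes are the 8 hex digits of the CRC <value>.
    """
    crc = crc32c(ext_data[:value_pos])
    return crc32c(ext_data[value_pos + 8:], crc)


if __name__ == "__main__":
    data = b"9 path=a\n22 GNU.crc32=00000000\n"
    pos = data.rindex(b"=") + 1        # first byte of the CRC <value>
    print("%08X" % crc_excluding_value(data, pos))
```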
+
+
+   At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
+extended header keyword found in an archive, once per keyword.
+
+
+4.2 Ustar header block
+======================
+
+The ustar header block has a length of 512 bytes and is structured as shown
+in the following table. All lengths and offsets are in decimal.
+
+Field Name   Offset   Length (in bytes)
+name         0        100
+mode         100      8
+uid          108      8
+gid          116      8
+size         124      12
+mtime        136      12
+chksum       148      8
+typeflag     156      1
+linkname     157      100
+magic        257      6
+version      263      2
+uname        265      32
+gname        297      32
+devmajor     329      8
+devminor     337      8
+prefix       345      155
+
+   All characters in the header block are coded using the ISO/IEC 646:1991
+(ASCII) standard, except in fields storing names for files, users, and
+groups. For maximum portability between implementations, names should only
+contain characters from the portable character set (*note Portable
+character set::), but if an implementation supports the use of characters
+outside of '/' and the portable character set in names for files, users,
+and groups, tarlz will use the byte values in these names unmodified.
+
+   The fields 'name', 'linkname', and 'prefix' are null-terminated
+character strings except when all characters in the array contain non-null
+characters including the last character.
+
+   The fields 'name' and 'prefix' produce the file name. A new file name is
+formed, if prefix is not an empty string (its first character is not null),
+by concatenating prefix (up to the first null character), a slash
+character, and name; otherwise, name is used alone. In either case, name is
+terminated at the first null character. If prefix begins with a null
+character, it is ignored. In this manner, file names of at most 256
+characters can be supported. If a file name does not fit in the space
+provided, an extended record is used to store the file name.
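+
+   The rule above can be sketched in Python as follows. The function name
+'ustar_file_name' is hypothetical, and the sketch ignores any 'path'
+extended record that would override these fields.
```python
def ustar_file_name(name_field: bytes, prefix_field: bytes) -> bytes:
    """Join the ustar 'prefix' and 'name' fields into a file name.

    Hypothetical helper; both fields are cut at their first null byte.
    """
    name = name_field.split(b"\0", 1)[0]
    prefix = prefix_field.split(b"\0", 1)[0]
    # An empty prefix (first byte null) is ignored and name is used alone.
    return prefix + b"/" + name if prefix else name


if __name__ == "__main__":
    name = b"file.txt".ljust(100, b"\0")     # 100-byte 'name' field
    prefix = b"some/dir".ljust(155, b"\0")   # 155-byte 'prefix' field
    print(ustar_file_name(name, prefix))
```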
+
+   The field 'linkname' does not use the prefix to produce a file name. If
+the link name does not fit in the 100 characters provided, an extended
+record is used to store the link name.
+
+   The field 'mode' provides 12 access permission bits. The following table
+shows the symbolic name of each bit and its octal value:
+
+Bit Name   Value   Bit Name   Value   Bit Name   Value
+---------------------------------------------------
+S_ISUID    04000   S_ISGID    02000   S_ISVTX    01000
+S_IRUSR    00400   S_IWUSR    00200   S_IXUSR    00100
+S_IRGRP    00040   S_IWGRP    00020   S_IXGRP    00010
+S_IROTH    00004   S_IWOTH    00002   S_IXOTH    00001
+
+   The fields 'uid' and 'gid' are the user and group IDs of the owner and
+group of the file, respectively. If the file uid or gid are greater than
+2_097_151 (octal 7_777_777), an extended record is used to store the uid or
+gid.
+
+   The field 'size' contains the octal representation of the size of the
+file in bytes. If the field 'typeflag' specifies a file of type '0'
+(regular file) or '7' (high performance regular file), the number of logical
+records following the header is (size / 512) rounded up to the next integer.
+For all other values of typeflag, tarlz either sets the size field to 0 or
+ignores it, and does not store or expect any logical records following the
+header. If the file size is larger than 8_589_934_591 bytes
+(octal 77_777_777_777), an extended record is used to store the file size.
+
+   The field 'mtime' contains the octal representation of the modification
+time of the file at the time it was archived, obtained from the function
+'stat'. If the modification time is negative or larger than 8_589_934_591
+(octal 77_777_777_777) seconds since the epoch, an extended record is used
+to store the modification time. The ustar range of mtime goes from
+'1970-01-01 00:00:00 UTC' to '2242-03-16 12:56:31 UTC'.
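+
+   The numeric limits quoted in this section follow from the number of
+octal digits that fit in each field. The Python sketch below reproduces
+the arithmetic; the constant and function names are hypothetical and only
+serve the illustration.
```python
from datetime import datetime, timedelta, timezone

UID_GID_MAX = 0o7_777_777            # 7 octal digits: uid and gid
SIZE_MTIME_MAX = 0o77_777_777_777    # 11 octal digits: size and mtime

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def mtime_fits_ustar(mtime: int) -> bool:
    """True if mtime (seconds since the epoch) is inside the ustar range."""
    return 0 <= mtime <= SIZE_MTIME_MAX


def last_ustar_mtime() -> datetime:
    """Latest modification time representable without an extended record."""
    return EPOCH + timedelta(seconds=SIZE_MTIME_MAX)


if __name__ == "__main__":
    print(UID_GID_MAX, SIZE_MTIME_MAX)
    print(last_ustar_mtime().isoformat())
```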
+
+   The field 'chksum' contains the octal representation of the value of the
+simple sum of all bytes in the header logical record. Each byte in the
+header is treated as an unsigned value. When calculating the checksum, the
+chksum field is treated as if it were all space characters.
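+
+   A minimal Python sketch of this calculation is shown below, using the
+offset (148) and length (8) of 'chksum' from the table above. The function
+name 'ustar_chksum' is hypothetical; how the result is then formatted into
+the field is not shown.
```python
def ustar_chksum(header: bytes) -> int:
    """Simple sum of the 512 header bytes, each taken as unsigned.

    The 8 bytes of the chksum field (offsets 148..155) count as spaces.
    """
    if len(header) != 512:
        raise ValueError("a ustar header block is 512 bytes long")
    return sum(header[:148]) + 8 * 0x20 + sum(header[156:])


if __name__ == "__main__":
    header = bytearray(512)
    header[257:263] = b"ustar\0"     # 'magic' field at offset 257
    header[263:265] = b"00"          # 'version' field at offset 263
    print(ustar_chksum(bytes(header)))
```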
+
+   The field 'typeflag' contains a single character specifying the type of
+file archived:
+
+''0''
+     Regular file.
+
+''1''
+     Hard link to another file, of any type, previously archived. Hard
+     links must not contain file data.
+
+''2''
+     Symbolic link.
+
+''3', '4''
+     Character special file and block special file respectively. In this
+     case the fields 'devmajor' and 'devminor' contain information defining
+     the device in unspecified format.
+
+''5''
+     Directory.
+
+''6''
+     FIFO special file.
+
+''7''
+     Reserved to represent a file to which an implementation has associated
+     some high-performance attribute (contiguous file). Tarlz treats this
+     type of file as a regular file (type 0).
+
+
+   The field 'magic' contains the ASCII null-terminated string "ustar". The
+field 'version' contains the characters "00" (0x30,0x30). The fields
+'uname' and 'gname' are null-terminated character strings except when all
+characters in the array contain non-null characters including the last
+character. Each numeric field contains a leading space- or zero-filled,
+optionally null-terminated octal number using digits from the ISO/IEC
+646:1991 (ASCII) standard. Tarlz is able to decode numeric fields 1 byte
+longer than standard ustar by not requiring a terminating null character.
+
+
+File: tarlz.info,  Node: Amendments to pax format,  Next: Program design,  Prev: File format,  Up: Top
+
+5 The reasons for the differences with pax
+******************************************
+
+Tarlz creates safe archives that allow the reliable detection of invalid or
+corrupt metadata during decoding even when the integrity checking of lzip
+can't be used because the lzip members are only decompressed partially, as
+happens in parallel '--diff', '--list', and '--extract'. In order to
+achieve this goal and avoid some other flaws in the pax format, tarlz makes
+some changes to the variant of the pax format that it uses. This chapter
+describes these changes and the concrete reasons to implement them.
+
+
+5.1 Add a CRC of the extended records
+=====================================
+
+The POSIX pax format has a serious flaw. The metadata stored in pax extended
+records are not protected by any kind of check sequence. Corruption in a
+long file name may cause the extraction of the file in the wrong place
+without warning. Corruption in a large file size may cause the truncation of
+the file or the appending of garbage to the file, both followed by a
+spurious warning about a corrupt header far from the place of the undetected
+corruption.
+
+   Metadata like file name and file size must always be protected in an
+archive format because of the adverse effects of undetected corruption in
+them, potentially much worse than undetected corruption in the data. Even
+more so in the case of pax because the amount of metadata it stores is
+potentially large, making undetected corruption and archiver misbehavior
+more probable.
+
+   Headers and metadata must be protected separately from data because the
+integrity checking of lzip may not be able to detect the corruption before
+the metadata have been used, for example, to create a new file in the wrong
+place.
+
+   Because of the above, tarlz protects the extended records with a Cyclic
+Redundancy Check (CRC) in a way compatible with standard tar tools. *Note
+key_crc32::.
+
+
+5.2 Remove flawed backward compatibility
+========================================
+
+In order to allow the extraction of pax archives by a tar utility conforming
+to the POSIX-2:1993 standard, POSIX.1-2008 recommends selecting extended
+header field values that allow such tar to create a regular file containing
+the extended header records as data. This approach is broken because if the
+extended header is needed because of a long file name, the fields 'name'
+and 'prefix' are unable to contain the full file name. (Some tar
+implementations store the truncated name in the field 'name' alone,
+truncating the name to only 100 bytes instead of 256). Therefore the files
+corresponding to both the extended header and the overridden ustar header
+are extracted using truncated file names, perhaps overwriting existing
+files or directories. It may be a security risk to extract a file with a
+truncated file name.
+
+   To avoid this problem, tarlz writes extended headers with all fields
+zeroed except 'size' (which contains the size of the extended records),
+'chksum', 'typeflag', 'magic', and 'version'. In particular, tarlz sets the
+fields 'name' and 'prefix' to zero. This prevents old tar programs from
+extracting the extended records as a file in the wrong place. Tarlz also
+sets to zero those fields of the ustar header overridden by extended
+records. Finally, tarlz skips members with zeroed 'name' and 'prefix' when
+decoding, except when listing. This is needed to detect certain format
+violations during parallel extraction.
+
+   If an extended header is required for any reason (for example a file
+size of 8 GiB or larger, or a link name longer than 100 bytes), tarlz also
+moves the file name to the extended records to prevent a ustar tool from
+trying to extract the file or link. This also makes it easier, during
+parallel decoding, to detect a tar member split between two lzip members at
+the boundary between the extended header and the ustar header.
+
+
+5.3 As simple as possible (but not simpler)
+===========================================
+
+The tarlz format is mainly ustar. Extended pax headers are used only when
+needed because the length of a file name or link name, or the size or other
+attribute of a file exceeds the limits of the ustar format. Adding 1 KiB of
+extended header and records to each member just to save subsecond
+timestamps seems wasteful for a backup format. Moreover, minimizing the
+overhead may help recover the archive with lziprecover in case of
+corruption.
+
+   Global pax headers are tolerated, but not supported; they are parsed and
+ignored. Some operations may not behave as expected if the archive contains
+global headers.
+
+
+5.4 Improve reproducibility
+===========================
+
+Pax includes by default the process ID of the pax process in the ustar name
+of the extended headers, making the archive not reproducible. Tarlz stores
+the true name of the file just once, either in the ustar header or in the
+extended records, making it easier to produce reproducible archives.
+
+   Pax allows an extended record to have length x-1 or x if x is a power of
+ten; '99<97_bytes>' or '100<97_bytes>'. Tarlz minimizes the length of the
+record and always produces a length of x-1 in these cases.
+
+
+5.5 No data in hard links
+=========================
+
+Tarlz does not allow data in hard link members. The data (if any) must be in
+the member determining the type of the file (which can't be a link). If all
+the names of a file are stored as hard links, the type of the file is lost.
+Not allowing data in hard links also prevents invalid actions like
+extracting file data for a hard link to a symbolic link or to a directory.
+
+
+5.6 Avoid misconversions to/from UTF-8
+======================================
+
+There is no portable way to tell what charset a text string is coded in.
+Therefore, tarlz stores all fields representing text strings unmodified,
+without conversion to UTF-8 or any other transformation. This prevents
+accidental double UTF-8 conversions. If the need arises this behavior will
+be adjusted with a command-line option in the future.
+
+
+File: tarlz.info,  Node: Program design,  Next: Multi-threaded decoding,  Prev: Amendments to pax format,  Up: Top
+
+6 Internal structure of tarlz
+*****************************
+
+The parts of tarlz related to sequential processing of the archive are more
+or less similar to any other tar and won't be described here. The
+interesting parts described here are those related to multi-threaded
+processing.
+
+   The structure of the part of tarlz performing multi-threaded archive
+creation is somewhat similar to that of plzip with the added complication
+of the solidity levels. *Note Program design: (plzip)Program design. A
+grouper thread and several worker threads are created, with the main
+thread acting as muxer (multiplexer) thread. A "packet courier" takes care of data
+transfers among threads and limits the maximum number of data blocks
+(packets) being processed simultaneously.
+
+   The grouper traverses the directory tree, groups together the metadata of
+the files to be archived in each lzip member, and distributes them to the
+workers. The workers compress the metadata received from the grouper along
+with the file data read from the file system. The muxer collects processed
+packets from the workers, and writes them to the archive.
+
+.--------.
+|    data|---> to each worker below
+|        |                    .------------.
+| file   |                ,-->| worker   0 |--,
+| system |                |   `------------'  |
+|        |    .---------. |   .------------.  |   .-------.   .---------.
+|metadata|--->| grouper |-+-->| worker   1 |--+-->| muxer |-->| archive |
+`--------'    `---------' |   `------------'  |   `-------'   `---------'
+                          |        ...        |
+                          |   .------------.  |
+                          `-->| worker N-1 |--'
+                              `------------'
+
+   Decoding an archive is somewhat similar to how plzip decompresses a
+regular file to standard output, with the differences that only messages
+(not the data) are written to stdout/stderr, and that each worker may
+access files in the file system either to read them (diff) or write them
+(extract). As in plzip, each worker reads members directly from the
+archive.
+
+.--------.
+| file   |<---> data to/from each worker below
+| system |
+`--------'      .------------.
+            ,-->| worker   0 |--,
+            |   `------------'  |
+.---------. |   .------------.  |   .-------.   .--------.
+| archive |-+-->| worker   1 |--+-->| muxer |-->| stdout |
+`---------' |   `------------'  |   `-------'   | stderr |
+            |        ...        |               `--------'
+            |   .------------.  |
+            `-->| worker N-1 |--'
+                `------------'
+
+   As misaligned tar.lz archives can't be decoded in parallel, and the
+misalignment can't be detected until after decoding has started, a
+"mastership request" mechanism has been designed that allows the decoding to
+continue instead of signalling an error.
+
+   During parallel decoding, if a worker finds a misalignment, it requests
+mastership to decode the rest of the archive. When mastership is requested,
+an error_member_id is set, and all subsequently received packets with
+member_id > error_member_id are rejected. All workers requesting mastership
+are blocked at the request_mastership call until mastership is granted.
+Mastership is granted to the delivering worker when its queue is empty to
+make sure that all preceding packets have been processed. When mastership is
+granted, all packets are deleted and all subsequently received packets not
+coming from the master are rejected.
+
+   If a worker can't continue decoding for any cause (for example lack of
+memory or finding a split tar member at the beginning of a lzip member), it
+requests mastership to print an error and terminate the program. Only if
+some other worker requests mastership in a previous lzip member can this
+error be avoided.
+
+
+File: tarlz.info,  Node: Multi-threaded decoding,  Next: Minimum archive sizes,  Prev: Program design,  Up: Top
+
+7 Limitations of parallel tar decoding
+**************************************
+
+Safely decoding an arbitrary tar archive in parallel is only possible if one
+decodes the headers sequentially first. For example, if a tar archive
+containing another tar archive is decoded starting from some position other
+than the beginning, there is no way to know if the first header found there
+belongs to the outer tar archive or to the inner tar archive. Tar is an
+inherently serial format; it was designed for tapes.
+
+   The pax format is even more serial than the ustar format. Two headers
+need to be decoded sequentially for each file. The extended header may even
+need parsing to reveal something as basic as file size. If a thread decodes
+the ustar header skipping the preceding extended header, it may extract a
+file of incorrect size at the wrong place. Moreover, a pax archive with
+global headers can't be decoded in parallel because each thread can't know
+about the global headers decoded by other threads.
+
+   In the case of compressed tar archives, the start of each compressed
+block determines one point through which the tar archive can be decoded in
+parallel. Therefore, in tar.lz archives the decoding operations can't be
+parallelized if the tar members are not aligned with the lzip members. Tar
+archives compressed with plzip can't be decoded in parallel because tar and
+plzip do not have a way to align both sets of members. Certainly one can
+decompress one such archive with a multi-threaded tool like plzip, but the
+increase in speed is not as large as it could be because plzip must
+serialize the decompressed data and pass them to tar, which decodes them
+sequentially, one tar member at a time.
+
+   On the other hand, if the tar.lz archive is created with a tool like
+tarlz, which can guarantee the alignment between tar members and lzip
+members because it controls both archiving and compression, then the lzip
+format becomes an indexed layer on top of the tar archive which makes it
+possible to decode it safely in parallel.
+
+   Tarlz is able to automatically decode aligned and unaligned multimember
+tar.lz archives, keeping backwards compatibility. If tarlz finds a member
+misalignment during multi-threaded decoding, it switches to single-threaded
+mode and continues decoding the archive.
+
+   If the files in the archive are large, multi-threaded '--list' on a
+regular (seekable) tar.lz archive can be hundreds of times faster than
+sequential '--list' because, in addition to using several processors, it
+only needs to decompress part of each lzip member. See the following
+example listing the Silesia corpus on a dual core machine:
+
+     tarlz -9 --no-solid -cf silesia.tar.lz silesia
+     time lzip -cd silesia.tar.lz | tar -tf -            (5.032s)
+     time plzip -cd silesia.tar.lz | tar -tf -           (3.256s)
+     time tarlz -tf silesia.tar.lz                       (0.020s)
+
+   On the other hand, multi-threaded '--list' won't detect corruption in
+the tar member data because it only decodes the part of each lzip member
+corresponding to the tar member header. This is another reason why the tar
+headers must provide their own integrity checking.
+
+
+7.1 Limitations of multi-threaded extraction
+============================================
+
+Multi-threaded extraction may produce different output than single-threaded
+extraction in some cases:
+
+   During multi-threaded extraction, several independent threads are
+simultaneously reading the archive and creating files in the file system.
+The archive is not read sequentially. As a consequence, any error or
+weirdness in the archive (like a corrupt member or an end-of-archive block
+in the middle of the archive) usually won't be detected until part of the
+archive beyond that point has been processed.
+
+   If the archive contains two or more tar members with the same name,
+single-threaded extraction extracts the members in the order they appear in
+the archive and leaves in the file system the last version of the file. But
+multi-threaded extraction may extract the members in any order and leave in
+the file system any version of the file nondeterministically. It is
+unspecified which of the tar members is extracted.
+
+   If the same file is extracted through several paths (different member
+names resolve to the same file in the file system), the result is undefined.
+(Probably the resulting file will be mangled).
+
+   Extraction of a hard link may fail if it is extracted before the file it
+links to.
+
+
+File: tarlz.info,  Node: Minimum archive sizes,  Next: Examples,  Prev: Multi-threaded decoding,  Up: Top
+
+8 Minimum archive sizes required for multi-threaded block compression
+*********************************************************************
+
+When creating or appending to a compressed archive using multi-threaded
+block compression, tarlz puts tar members together in blocks and compresses
+as many blocks simultaneously as worker threads are chosen, creating a
+multimember compressed archive.
+
+   For this to work as expected (and roughly multiply the compression speed
+by the number of available processors), the uncompressed archive must be at
+least as large as the number of worker threads times the block size (*note
+--data-size::). Otherwise some processors do not get any data to compress,
+and compression is proportionally slower. The maximum speed increase
+achievable on a given archive is limited by the ratio
+(uncompressed_size / data_size).
+For example, a tarball the size of gcc or linux scales up to 10 or 14
+processors at level -9.
+
+   The following table shows the minimum uncompressed archive size needed
+for full use of N processors at a given compression level, using the default
+data size for each level:
+
+Processors   2         4         8         16        64        256
+------------------------------------------------------------------
+Level                                                          
+-0           2 MiB     4 MiB     8 MiB     16 MiB    64 MiB    256 MiB
+-1           4 MiB     8 MiB     16 MiB    32 MiB    128 MiB   512 MiB
+-2           6 MiB     12 MiB    24 MiB    48 MiB    192 MiB   768 MiB
+-3           8 MiB     16 MiB    32 MiB    64 MiB    256 MiB   1 GiB
+-4           12 MiB    24 MiB    48 MiB    96 MiB    384 MiB   1.5 GiB
+-5           16 MiB    32 MiB    64 MiB    128 MiB   512 MiB   2 GiB
+-6           32 MiB    64 MiB    128 MiB   256 MiB   1 GiB     4 GiB
+-7           64 MiB    128 MiB   256 MiB   512 MiB   2 GiB     8 GiB
+-8           96 MiB    192 MiB   384 MiB   768 MiB   3 GiB     12 GiB
+-9           128 MiB   256 MiB   512 MiB   1 GiB     4 GiB     16 GiB
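+
+   Each entry in the table is the number of processors times the data size
+for that level. The Python sketch below reproduces the table under that
+assumption. The per-level data sizes are inferred from the 2-processor
+column (half of each entry), not read from tarlz, and the names are
+hypothetical.
```python
MIB = 1 << 20

# Data size per level in MiB, inferred from the 2-processor column above.
DATA_SIZE_MIB = {0: 1, 1: 2, 2: 3, 3: 4, 4: 6, 5: 8, 6: 16, 7: 32, 8: 48, 9: 64}


def min_archive_size(level: int, workers: int) -> int:
    """Minimum uncompressed size in bytes that keeps all workers busy."""
    return workers * DATA_SIZE_MIB[level] * MIB


if __name__ == "__main__":
    for level in range(10):
        row = [min_archive_size(level, n) // MIB for n in (2, 4, 8, 16, 64, 256)]
        print(f"-{level}", *(f"{mib} MiB" for mib in row))
```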
+
+
+File: tarlz.info,  Node: Examples,  Next: Problems,  Prev: Minimum archive sizes,  Up: Top
+
+9 A small tutorial with examples
+********************************
+
+Example 1: Create a multimember compressed archive 'archive.tar.lz'
+containing files 'a', 'b' and 'c'.
+
+     tarlz -cf archive.tar.lz a b c
+
+
+Example 2: Append files 'd' and 'e' to the multimember compressed archive
+'archive.tar.lz'.
+
+     tarlz -rf archive.tar.lz d e
+
+
+Example 3: Create a solidly compressed appendable archive 'archive.tar.lz'
+containing files 'a', 'b' and 'c'. Then append files 'd' and 'e' to the
+archive.
+
+     tarlz --asolid -cf archive.tar.lz a b c
+     tarlz --asolid -rf archive.tar.lz d e
+
+
+Example 4: Create a compressed appendable archive containing directories
+'dir1', 'dir2' and 'dir3' with a separate lzip member per directory. Then
+append files 'a', 'b', 'c', 'd' and 'e' to the archive, all of them
+contained in a single lzip member. The resulting archive 'archive.tar.lz'
+contains 5 lzip members (including the end-of-archive member).
+
+     tarlz --dsolid -cf archive.tar.lz dir1 dir2 dir3
+     tarlz --asolid -rf archive.tar.lz a b c d e
+
+
+Example 5: Create a solidly compressed archive 'archive.tar.lz' containing
+files 'a', 'b' and 'c'. Note that no more files can be later appended to
+the archive.
+
+     tarlz --solid -cf archive.tar.lz a b c
+
+
+Example 6: Extract all files from archive 'archive.tar.lz'.
+
+     tarlz -xf archive.tar.lz
+
+
+Example 7: Extract files 'a' and 'c', and the whole tree under directory
+'dir1' from archive 'archive.tar.lz'.
+
+     tarlz -xf archive.tar.lz a c dir1
+
+
+Example 8: Copy the contents of directory 'sourcedir' to the directory
+'destdir'.
+
+     tarlz -C sourcedir --uncompressed -cf - . | tarlz -C destdir -xf -
+
+
+Example 9: Compress the existing POSIX archive 'archive.tar' and write the
+output to 'archive.tar.lz'. Compress each member individually for maximum
+availability. (If one member in the compressed archive gets damaged, the
+other members can still be extracted).
+
+     tarlz -z --no-solid archive.tar
+
+
+Example 10: Compress the archive 'archive.tar' and write the output to
+'foo.tar.lz'.
+
+     tarlz -z -o foo.tar.lz archive.tar
+
+
+Example 11: Concatenate and compress two archives 'archive1.tar' and
+'archive2.tar', and write the output to 'foo.tar.lz'.
+
+     tarlz -A archive1.tar archive2.tar | tarlz -z -o foo.tar.lz
+
+
+File: tarlz.info,  Node: Problems,  Next: Concept index,  Prev: Examples,  Up: Top
+
+10 Reporting bugs
+*****************
+
+There are probably bugs in tarlz. There are certainly errors and omissions
+in this manual. If you report them, they will get fixed. If you don't, no
+one will ever know about them and they will remain unfixed for all
+eternity, if not longer.
+
+   If you find a bug in tarlz, please send electronic mail to
+<lzip-bug@nongnu.org>. Include the version number, which you can find by
+running 'tarlz --version' and 'tarlz -v --check-lib'.
+
+
+File: tarlz.info,  Node: Concept index,  Prev: Problems,  Up: Top
+
+Concept index
+*************
+
+
+* Menu:
+
+* Amendments to pax format:              Amendments to pax format.  (line 6)
+* bugs:                                  Problems.                  (line 6)
+* examples:                              Examples.                  (line 6)
+* file format:                           File format.               (line 6)
+* getting help:                          Problems.                  (line 6)
+* introduction:                          Introduction.              (line 6)
+* invoking:                              Invoking tarlz.            (line 6)
+* minimum archive sizes:                 Minimum archive sizes.     (line 6)
+* options:                               Invoking tarlz.            (line 6)
+* parallel tar decoding:                 Multi-threaded decoding.   (line 6)
+* portable character set:                Portable character set.    (line 6)
+* program design:                        Program design.            (line 6)
+* usage:                                 Invoking tarlz.            (line 6)
+* version:                               Invoking tarlz.            (line 6)
+
+
+
+Tag Table:
+Node: Top216
+Node: Introduction1207
+Node: Invoking tarlz4032
+Ref: --data-size13076
+Ref: --bsolid17512
+Node: Portable character set23425
+Node: File format24068
+Ref: key_crc3231050
+Ref: ustar-uid-gid34315
+Ref: ustar-mtime35122
+Node: Amendments to pax format37125
+Ref: crc3237834
+Ref: flawed-compat39146
+Node: Program design43228
+Node: Multi-threaded decoding47153
+Ref: mt-extraction50434
+Node: Minimum archive sizes51740
+Node: Examples53867
+Node: Problems56234
+Node: Concept index56789
+
+End Tag Table
+
+
+Local Variables:
+coding: iso-8859-15
+End:
diff --git a/doc/tarlz.texi b/doc/tarlz.texi
new file mode 100644
index 0000000..f37164f
--- /dev/null
+++ b/doc/tarlz.texi
@@ -0,0 +1,1356 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header
+@setfilename tarlz.info
+@documentencoding ISO-8859-15
+@settitle Tarlz Manual
+@finalout
+@c %**end of header
+
+@set UPDATED 3 January 2024
+@set VERSION 0.25
+
+@dircategory Archiving
+@direntry
+* Tarlz: (tarlz).               Archiver with multimember lzip compression
+@end direntry
+
+
+@ifnothtml
+@titlepage
+@title Tarlz
+@subtitle Archiver with multimember lzip compression
+@subtitle for Tarlz version @value{VERSION}, @value{UPDATED}
+@author by Antonio Diaz Diaz
+
+@page
+@vskip 0pt plus 1filll
+@end titlepage
+
+@contents
+@end ifnothtml
+
+@ifnottex
+@node Top
+@top
+
+This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
+
+@menu
+* Introduction::              Purpose and features of tarlz
+* Invoking tarlz::            Command-line interface
+* Portable character set::    POSIX portable filename character set
+* File format::               Detailed format of the compressed archive
+* Amendments to pax format::  The reasons for the differences with pax
+* Program design::            Internal structure of tarlz
+* Multi-threaded decoding::   Limitations of parallel tar decoding
+* Minimum archive sizes::     Sizes required for full multi-threaded speed
+* Examples::                  A small tutorial with examples
+* Problems::                  Reporting bugs
+* Concept index::             Index of concepts
+@end menu
+
+@sp 1
+Copyright @copyright{} 2013-2024 Antonio Diaz Diaz.
+
+This manual is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
+@end ifnottex
+
+
+@node Introduction
+@chapter Introduction
+@cindex introduction
+
+@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
+(multi-threaded) combined implementation of the tar archiver and the
+@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the
+compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
+
+Tarlz creates tar archives using a simplified and safer variant of the POSIX
+pax format compressed in lzip format, keeping the alignment between tar
+members and lzip members. The resulting multimember tar.lz archive is
+backward compatible with standard tar tools like GNU tar, which treat it
+like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
+
+Keeping the alignment between tar members and lzip members has two
+advantages. It adds an indexed lzip layer on top of the tar archive, making
+it possible to decode the archive safely in parallel. It also minimizes the
+amount of data lost in case of corruption. Compressing a tar archive with
+plzip may even double the number of files lost for each damaged lzip member
+because plzip does not keep the members aligned.
+
+Tarlz can create tar archives with five levels of compression granularity:
+per file (@option{--no-solid}), per block (@option{--bsolid}, default), per
+directory (@option{--dsolid}), appendable solid (@option{--asolid}), and
+solid (@option{--solid}). It can also create uncompressed tar archives.
+
+@noindent
+Of course, compressing each file (or each directory) individually can't
+achieve a compression ratio as high as compressing the whole tar archive
+solidly, but it has the following advantages:
+
+@itemize @bullet
+@item
+The resulting multimember tar.lz archive can be decompressed in
+parallel, multiplying the decompression speed.
+
+@item
+New members can be appended to the archive (by removing the
+end-of-archive member), and unwanted members can be deleted from the
+archive, just like with an uncompressed tar archive.
+
+@item
+It is a safe POSIX-style backup format. In case of corruption, tarlz
+can extract all the undamaged members from the tar.lz archive,
+skipping over the damaged members, just like the standard
+(uncompressed) tar. Moreover, the option @option{--keep-damaged} can be used
+to recover as much data as possible from each damaged member, and
+lziprecover can be used to recover some of the damaged members.
+
+@item
+A multimember tar.lz archive is usually smaller than the corresponding
+solidly compressed tar.gz archive, except when individually
+compressing files smaller than about @w{32 KiB}.
+@end itemize
+
+Tarlz protects the extended records with a Cyclic Redundancy Check (CRC) in
+a way compatible with standard tar tools. @xref{crc32}.
+
+Tarlz does not understand other tar formats like @samp{gnu}, @samp{oldgnu},
+@samp{star}, or @samp{v7}. The command
+@w{@samp{tarlz -t -f archive.tar.lz > /dev/null}} can be used to check that
+the format of the archive is compatible with tarlz.
+
+
+@node Invoking tarlz
+@chapter Invoking tarlz
+@cindex invoking
+@cindex options
+@cindex usage
+@cindex version
+
+The format for running tarlz is:
+
+@example
+tarlz @var{operation} [@var{options}] [@var{files}]
+@end example
+
+@noindent
+All operations except @option{--concatenate} and @option{--compress} operate
+on whole trees if any @var{file} is a directory. All operations except
+@option{--compress} overwrite output files without warning. If no archive is
+specified, tarlz tries to read it from standard input or write it to
+standard output. Tarlz refuses to read archive data from a terminal or write
+archive data to a terminal. Tarlz detects when the archive being created or
+enlarged is among the files to be archived, appended, or concatenated, and
+skips it.
+
+Tarlz does not use absolute file names nor file names above the current
+working directory (perhaps changed by option @option{-C}). On archive creation
+or appending tarlz archives the files specified, but removes from member
+names any leading and trailing slashes and any file name prefixes containing
+a @samp{..} component. On extraction, leading and trailing slashes are also
+removed from member names, and archive members containing a @samp{..}
+component in the file name are skipped. Tarlz does not follow symbolic links
+during extraction; not even symbolic links replacing intermediate
+directories.
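+
+The name handling described above can be sketched roughly as follows. This
+is an illustration in Python, not tarlz's actual code; the function names
+@samp{archived_name} and @samp{skipped_on_extraction} are hypothetical, and
+the exact treatment of unusual names (for example repeated slashes) is an
+assumption.

```python
def archived_name(path: str) -> str:
    """Member name stored for 'path' on creation (illustrative only)."""
    # Remove leading and trailing slashes.
    parts = path.strip("/").split("/")
    if ".." in parts:
        # Drop the prefix up to and including the last '..' component.
        last_dotdot = len(parts) - 1 - parts[::-1].index("..")
        parts = parts[last_dotdot + 1:]
    return "/".join(parts)


def skipped_on_extraction(member_name: str) -> bool:
    """True if a member would be skipped because of a '..' component."""
    return ".." in member_name.split("/")


print(archived_name("/usr/../etc/passwd/"))
print(skipped_on_extraction("a/../b"))
```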
+
+On extraction and listing, tarlz removes leading @samp{./} strings from
+member names in the archive or given in the command line, so that
+@w{@samp{tarlz -xf foo ./bar baz}} extracts members @samp{bar} and
+@samp{./baz} from archive @samp{foo}.
+
+If several compression levels or @option{--*solid} options are given, the last
+setting is used. For example @w{@option{-9 --solid --uncompressed -1}} is
+equivalent to @w{@option{-1 --solid}}.
+
+Tarlz supports the following operations:
+
+@table @code
+@item --help
+Print an informative help message describing the options and exit.
+
+@item -V
+@itemx --version
+Print the version number of tarlz on the standard output and exit.
+This version number should be included in all bug reports.
+
+@item -A
+@itemx --concatenate
+Append one or more archives to the end of an archive. If no archive is
+specified with the option @option{-f}, concatenate the input archives to
+standard output. All the archives involved must be regular (seekable) files,
+and must be either all compressed or all uncompressed. Compressed and
+uncompressed archives can't be mixed. Compressed archives must be
+multimember lzip files with the two end-of-archive blocks plus any zero
+padding contained in the last lzip member of each archive. The intermediate
+end-of-archive blocks are removed as each new archive is concatenated. If
+the archive is uncompressed, tarlz parses tar headers until it finds the
+end-of-archive blocks. Exit with status 0 without modifying the archive if
+no @var{files} have been specified.
+
+Concatenating archives containing files in common results in two or more tar
+members with the same name in the resulting archive, which may produce
+nondeterministic behavior during multi-threaded extraction.
+@xref{mt-extraction}.
+
+@item -c
+@itemx --create
+Create a new archive from @var{files}.
+
+@item -d
+@itemx --diff
+Compare and report differences between archive and file system. For each tar
+member in the archive, check that the corresponding file in the file system
+exists and is of the same type (regular file, directory, etc). Report on
+standard output the differences found in type, mode (permissions), owner and
+group IDs, modification time, file size, file contents (of regular files),
+target (of symlinks) and device number (of block/character special files).
+
+As tarlz removes leading slashes from member names, the option @option{-C} may
+be used in combination with @option{--diff} when absolute file names were used
+on archive creation: @w{@samp{tarlz -C / -d}}. Alternatively, tarlz may be
+run from the root directory to perform the comparison.
+
+@item --delete
+Delete files and directories from an archive in place. It currently can
+delete only from uncompressed archives and from archives with files
+compressed individually (@option{--no-solid} archives). Note that files of
+about @option{--data-size} or larger are compressed individually even if
+@option{--bsolid} is used, and can therefore be deleted. Tarlz takes care to
+not delete a tar member unless it is possible to do so. For example it won't
+try to delete a tar member that is not compressed individually. Even in the
+case of finding a corrupt member after having deleted some member(s), tarlz
+stops and copies the rest of the file as soon as corruption is found,
+leaving it just as corrupt as it was, but not worse.
+
+To delete a directory without deleting the files under it, use
+@w{@samp{tarlz --delete -f foo --exclude='dir/*' dir}}. Deleting in place
+may be dangerous. A corrupt archive, a power cut, or an I/O error may cause
+data loss.
+
+@item -r
+@itemx --append
+Append files to the end of an archive. The archive must be a regular
+(seekable) file either compressed or uncompressed. Compressed members can't
+be appended to an uncompressed archive, nor vice versa. If the archive is
+compressed, it must be a multimember lzip file with the two end-of-archive
+blocks plus any zero padding contained in the last lzip member of the
+archive. It is possible to append files to an archive with a different
+compression granularity. Appending works as follows: first the
+end-of-archive blocks are removed, then the new members are appended, and
+finally two new end-of-archive blocks are appended to the archive. If the
+archive is uncompressed, tarlz parses and skips tar headers until it finds
+the end-of-archive blocks. Exit with status 0 without modifying the archive
+if no @var{files} have been specified.
+
+Appending files already present in the archive results in two or more tar
+members with the same name, which may produce nondeterministic behavior
+during multi-threaded extraction. @xref{mt-extraction}.
+
+@item -t
+@itemx --list
+List the contents of an archive. If @var{files} are given, list only the
+@var{files} given.
+
+@item -x
+@itemx --extract
+Extract files from an archive. If @var{files} are given, extract only the
+@var{files} given. Else extract all the files in the archive. To extract a
+directory without extracting the files under it, use
+@w{@samp{tarlz -xf foo --exclude='dir/*' dir}}. Tarlz removes files and
+empty directories unconditionally before extracting over them. Other than
+that, it does not make any special effort to extract a file over an
+incompatible type of file. For example, extracting a file over a non-empty
+directory usually fails.
+
+@item -z
+@itemx --compress
+Compress existing POSIX tar archives aligning the lzip members to the tar
+members with choice of granularity (@option{--bsolid} by default,
+@option{--dsolid} works like @option{--asolid}). Exit with error status 2 if
+any input archive is an empty file. The input archives are kept unchanged.
+Existing compressed archives are not overwritten. A hyphen @samp{-} used as
+the name of an input archive reads from standard input and writes to
+standard output (unless the option @option{--output} is used). Tarlz can be
+used as compressor for GNU tar by using a command like
+@w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be used as
+compressor for zupdate (zutils) by using a command like
+@w{@samp{zupdate --lz="tarlz -z" foo.tar.gz}}. Note that tarlz only works
+reliably on archives without global headers, or with global headers whose
+content can be ignored.
+
+The compression is reversible, including any garbage present after the
+end-of-archive blocks. Tarlz stops parsing after the first end-of-archive
+block is found, and then compresses the rest of the archive. Unless solid
+compression is requested, the end-of-archive blocks are compressed in a lzip
+member separated from the preceding members and from any non-zero garbage
+following the end-of-archive blocks. @option{--compress} implies plzip
+argument style, not tar style. Each input archive is compressed to a file
+with the extension @samp{.lz} added unless the option @option{--output} is
+used. When @option{--output} is used, only one input archive can be specified.
+@option{-f} can't be used with @option{--compress}.
+
+@item --check-lib
+Compare the
+@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib}
+used to compile tarlz with the version actually being used at run time and
+exit. Report any differences found. Exit with error status 1 if differences
+are found. A mismatch may indicate that lzlib is not correctly installed or
+that a different version of lzlib has been installed after compiling tarlz.
+Exit with error status 2 if LZ_API_VERSION and LZ_version_string don't
+match. @w{@samp{tarlz -v --check-lib}} shows the version of lzlib being used
+and the value of LZ_API_VERSION (if defined).
+@ifnothtml
+@xref{Library version,,,lzlib}.
+@end ifnothtml
+
+@end table
+
+Tarlz supports the following
+@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}:
+@ifnothtml
+@xref{Argument syntax,,,arg_parser}.
+@end ifnothtml
+
+@table @code
+@anchor{--data-size}
+@item -B @var{bytes}
+@itemx --data-size=@var{bytes}
+Set target size of input data blocks for the option @option{--bsolid}.
+@xref{--bsolid}. Valid values range from @w{8 KiB} to @w{1 GiB}. Default
+value is two times the dictionary size, except for option @option{-0} where it
+defaults to @w{1 MiB}. @xref{Minimum archive sizes}.
+
+@item -C @var{dir}
+@itemx --directory=@var{dir}
+Change to directory @var{dir}. When creating, appending, comparing, or
+extracting, the position of each @option{-C} option in the command line is
+significant; it changes the current working directory for the following
+@var{files} until a new @option{-C} option appears in the command line.
+@option{--list} and @option{--delete} ignore any @option{-C} options
+specified. @var{dir} is relative to the then current working directory,
+perhaps changed by a previous @option{-C} option.
+
+Note that a process can only have one current working directory (CWD).
+Therefore multi-threading can't be used to create or decode an archive if a
+@option{-C} option appears after a (relative) file name in the command line.
+(All file names are made relative when decoding).
+
+@item -f @var{archive}
+@itemx --file=@var{archive}
+Use archive file @var{archive}. A hyphen @samp{-} used as an @var{archive}
+argument reads from standard input or writes to standard output.
+
+@item -h
+@itemx --dereference
+Follow symbolic links during archive creation, appending or comparison.
+Archive or compare the files they point to instead of the links themselves.
+
+@item -n @var{n}
+@itemx --threads=@var{n}
+Set the number of (de)compression threads, overriding the system's default.
+Valid values range from 0 to "as many as your system can support". A value
+of 0 disables threads entirely. If this option is not used, tarlz tries to
+detect the number of processors in the system and use it as default value.
+@w{@samp{tarlz --help}} shows the system's default value. See the note about
+multi-threading in the option @option{-C} above.
+
+Note that the number of usable threads is limited during compression to
+@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
+and during decompression to the number of lzip members in the tar.lz
+archive, which you can find by running @w{@samp{lzip -lv archive.tar.lz}}.
+
+@item -o @var{file}
+@itemx --output=@var{file}
+Write the compressed output to @var{file}. @w{@option{-o -}} writes the
+compressed output to standard output. Currently @option{--output} only works
+with @option{--compress}.
+
+@item -p
+@itemx --preserve-permissions
+On extraction, set file permissions as they appear in the archive. This is
+the default behavior when tarlz is run by the superuser. The default for
+other users is to subtract the umask of the user running tarlz from the
+permissions specified in the archive.
+
+@item -q
+@itemx --quiet
+Quiet operation. Suppress all messages.
+
+@item -v
+@itemx --verbose
+Verbosely list files processed. Further -v's (up to 4) increase the
+verbosity level.
+
+@item -0 .. -9
+Set the compression level for @option{--create}, @option{--append}, and
+@option{--compress}. The default compression level is @option{-6}. Like lzip,
+tarlz also minimizes the dictionary size of the lzip members it creates,
+reducing the amount of memory required for decompression.
+
+@multitable {Level} {Dictionary size} {Match length limit}
+@item Level @tab Dictionary size @tab Match length limit
+@item -0 @tab 64 KiB @tab  16 bytes
+@item -1 @tab  1 MiB @tab   5 bytes
+@item -2 @tab  1.5 MiB @tab   6 bytes
+@item -3 @tab  2 MiB @tab   8 bytes
+@item -4 @tab  3 MiB @tab  12 bytes
+@item -5 @tab  4 MiB @tab  20 bytes
+@item -6 @tab  8 MiB @tab  36 bytes
+@item -7 @tab 16 MiB @tab  68 bytes
+@item -8 @tab 24 MiB @tab 132 bytes
+@item -9 @tab 32 MiB @tab 273 bytes
+@end multitable
+
+@item --uncompressed
+With @option{--create}, don't compress the tar archive created. Create an
+uncompressed tar archive instead. With @option{--append}, don't compress the
+new members appended to the tar archive. Compressed members can't be
+appended to an uncompressed archive, nor vice versa. @option{--uncompressed}
+can be omitted if it can be deduced from the archive name. (An uncompressed
+archive name lacks a @samp{.lz} or @samp{.tlz} extension).
+
+@item --asolid
+When creating or appending to a compressed archive, use appendable solid
+compression. All the files being added to the archive are compressed into a
+single lzip member, but the end-of-archive blocks are compressed into a
+separate lzip member. This creates a solidly compressed appendable archive.
+Solid archives can't be created nor decoded in parallel.
+
+@anchor{--bsolid}
+@item --bsolid
+When creating or appending to a compressed archive, use block compression.
+Tar members are compressed together in a lzip member until they approximate
+a target uncompressed size. The size can't be exact because each solidly
+compressed data block must contain an integer number of tar members. Block
+compression is the default because it improves compression ratio for
+archives with many files smaller than the block size. This option allows
+tarlz to revert to default behavior if, for example, it is invoked through an
+alias like @w{@samp{tar='tarlz --solid'}}. @xref{--data-size}, to set the
+target block size.
+
+@item --dsolid
+When creating or appending to a compressed archive, compress each file
+specified in the command line separately in its own lzip member, and use
+solid compression for each directory specified in the command line. The
+end-of-archive blocks are compressed into a separate lzip member. This
+creates a compressed appendable archive with a separate lzip member for each
+file or top-level directory specified.
+
+@item --no-solid
+When creating or appending to a compressed archive, compress each file
+separately in its own lzip member. The end-of-archive blocks are compressed
+into a separate lzip member. This creates a compressed appendable archive
+with a lzip member for each file.
+
+@item --solid
+When creating or appending to a compressed archive, use solid compression.
+The files being added to the archive, along with the end-of-archive blocks,
+are compressed into a single lzip member. The resulting archive is not
+appendable. No more files can be later appended to the archive. Solid
+archives can't be created nor decoded in parallel.
+
+@item --anonymous
+Equivalent to @w{@option{--owner=root --group=root}}.
+
+@item --owner=@var{owner}
+When creating or appending, use @var{owner} for files added to the archive.
+If @var{owner} is not a valid user name, it is decoded as a decimal numeric
+user ID.
+
+@item --group=@var{group}
+When creating or appending, use @var{group} for files added to the archive.
+If @var{group} is not a valid group name, it is decoded as a decimal numeric
+group ID.
+
+@item --exclude=@var{pattern}
+Exclude files matching a shell pattern like @samp{*.o}. A file is considered
+to match if any component of the file name matches. For example, @samp{*.o}
+matches @samp{foo.o}, @samp{foo.o/bar} and @samp{foo/bar.o}. If
+@var{pattern} contains a @samp{/}, it matches a corresponding @samp{/} in
+the file name. For example, @samp{foo/*.o} matches @samp{foo/bar.o}.
+Multiple @option{--exclude} options can be specified.
+
+@item --ignore-ids
+Make @option{--diff} ignore differences in owner and group IDs. This option is
+useful when comparing an @option{--anonymous} archive.
+
+@item --ignore-metadata
+Make @option{--diff} ignore any differences in metadata (file permissions,
+owner and group IDs, modification time). Compare only file type, file size,
+and file content. This option is useful when file permissions have not been
+fully restored because uid/gid changed on extraction.
+
+@item --ignore-overflow
+Make @option{--diff} ignore differences in mtime caused by overflow on 32-bit
+systems with a 32-bit time_t.
+
+@item --keep-damaged
+Don't delete partially extracted files. If a decompression error happens
+while extracting a file, keep the partial data extracted. Use this option to
+recover as much data as possible from each damaged member. It is recommended
+to run tarlz in single-threaded mode (@option{--threads=0}) when using this
+option.
+
+@item --missing-crc
+Exit with error status 2 if the CRC of the extended records is missing. When
+this option is used, tarlz detects any corruption in the extended records
+(only limited by CRC collisions). But note that a corrupt @samp{GNU.crc32}
+keyword, for example @samp{GNU.crc30}, is reported as a missing CRC instead
+of as a corrupt record. This misleading @w{@samp{Missing CRC}} message is
+the consequence of a flaw in the POSIX pax format; i.e., the lack of a
+mandatory check sequence of the extended records. @xref{crc32}.
+
+@item --mtime=@var{date}
+When creating or appending, use @var{date} as the modification time for
+files added to the archive instead of their actual modification times. The
+value of @var{date} may be either @samp{@@} followed by the number of
+seconds since (or before) the epoch, or a date in format
+@w{@samp{[-]YYYY-MM-DD HH:MM:SS}} or @samp{[-]YYYY-MM-DDTHH:MM:SS}, or the
+name of an existing reference file starting with @samp{.} or @samp{/} whose
+modification time is used. The time of day @samp{HH:MM:SS} in the date
+format is optional and defaults to @samp{00:00:00}. The epoch is
+@w{@samp{1970-01-01 00:00:00 UTC}}. Negative seconds or years define a
+modification time before the epoch.
+
+@item --out-slots=@var{n}
+Number of @w{1 MiB} output packets buffered per worker thread during
+multi-threaded creation or appending to compressed archives. Increasing the
+number of packets may increase compression speed if the files being archived
+are larger than @w{64 MiB} compressed, but requires more memory. Valid
+values range from 1 to 1024. The default value is 64.
+
+@item --warn-newer
+During archive creation, warn if any file being archived has a modification
+time newer than the archive creation time. This option may slow archive
+creation somewhat because it makes an extra call to @samp{stat} after
+archiving each file, but it guarantees that file contents were not modified
+during the creation of the archive. Note that the file must be at least one
+second newer than the archive for it to be detected as newer.
+
+@ignore
+@item --permissive
+Allow some violations of the archive format, like consecutive extended
+headers preceding a ustar header, or several records with the same
+keyword appearing in the same block of extended records.
+@end ignore
+
+@end table
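+
+The default @option{--data-size} and the compression thread limit described
+under @option{--threads} can be worked out from the table of compression
+levels above. The sketch below is only an illustration in Python; the names
+@samp{default_data_size} and @samp{max_compression_threads} are
+hypothetical and are not part of tarlz.

```python
KIB = 1024
MIB = 1024 * KIB

# Dictionary size per compression level, copied from the table above.
DICTIONARY_SIZE = {
    0: 64 * KIB, 1: 1 * MIB, 2: 3 * MIB // 2, 3: 2 * MIB, 4: 3 * MIB,
    5: 4 * MIB, 6: 8 * MIB, 7: 16 * MIB, 8: 24 * MIB, 9: 32 * MIB,
}


def default_data_size(level: int) -> int:
    """Two times the dictionary size, except for level 0 (1 MiB)."""
    return 1 * MIB if level == 0 else 2 * DICTIONARY_SIZE[level]


def max_compression_threads(uncompressed_size: int, data_size: int) -> int:
    """ceil( uncompressed_size / data_size ), using integer arithmetic."""
    return -(-uncompressed_size // data_size)


# A 100 MiB archive at the default level -6 uses 16 MiB blocks.
print(max_compression_threads(100 * MIB, default_data_size(6)))
```

+Under these assumptions, asking for more threads than this limit does not
+speed up compression of such an archive.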
+
+Exit status: 0 for a normal exit, 1 for environmental problems
+(file not found, files differ, invalid command-line options, I/O errors,
+etc), 2 to indicate a corrupt or invalid input file, 3 for an internal
+consistency error (e.g., bug) which caused tarlz to panic.
+
+
+@node Portable character set
+@chapter POSIX portable filename character set
+@cindex portable character set
+
+This is the set of characters from which portable file names are constructed.
+
+@example
+A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
+a b c d e f g h i j k l m n o p q r s t u v w x y z
+0 1 2 3 4 5 6 7 8 9 . _ -
+@end example
+
+The last three characters are the period, underscore, and hyphen-minus
+characters, respectively.
+
+File names are identifiers. Therefore, archiving works better when file
+names use only the portable character set without spaces added.
+
+
+@node File format
+@chapter File format
+@cindex file format
+
+In the diagram below, a box like this:
+
+@verbatim
++---+
+|   | <-- the vertical bars might be missing
++---+
+@end verbatim
+
+represents one byte; a box like this:
+
+@verbatim
++==============+
+|              |
++==============+
+@end verbatim
+
+represents a variable number of bytes or a fixed but large number of
+bytes (for example 512).
+
+@sp 1
+A tar.lz file consists of one or more lzip members (compressed data sets).
+The members simply appear one after another in the file, with no additional
+information before, between, or after them.
+
+Each lzip member contains one or more tar members in a simplified POSIX pax
+interchange format. The only pax typeflag value supported by tarlz (in
+addition to the typeflag values defined by the ustar format) is @samp{x}.
+The pax format is an extension on top of the ustar format that removes the
+size limitations of the ustar format.
+
+Each tar member contains one file archived, and is represented by the
+following sequence:
+
+@itemize @bullet
+@item
+An optional extended header block followed by one or more blocks that
+contain the extended header records as if they were the contents of a file;
+i.e., the extended header records are included as the data for this header
+block. This header block is of the form described in pax header block, with
+a typeflag value of @samp{x}.
+
+@item
+A header block in ustar format that describes the file. Any fields defined
+in the preceding optional extended header records override the associated
+fields in this header block for this file.
+
+@item
+Zero or more blocks that contain the contents of the file.
+@end itemize
+
+Each tar member must be contiguously stored in a lzip member for the
+parallel decoding operations like @option{--list} to work. If any tar member
+is split over two or more lzip members, the archive must be decoded
+sequentially. @xref{Multi-threaded decoding}.
+
+At the end of the archive file there are two 512-byte blocks filled with
+binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
+are either compressed in a separate lzip member or compressed along with the
+tar members contained in the last lzip member. For a compressed archive to
+be recognized by tarlz as appendable, the last lzip member must contain
+between 512 and 32256 zeros alone (without any non-zero bytes).
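+
+The appendability condition above can be expressed as a check on the
+decompressed contents of the last lzip member. This is a minimal sketch in
+Python, assuming the member has already been decompressed by other means;
+the name @samp{looks_appendable} is hypothetical.

```python
def looks_appendable(last_member_data: bytes) -> bool:
    """Check the decompressed data of the last lzip member.

    The member must hold between 512 and 32256 bytes, all of them zero.
    """
    size = len(last_member_data)
    return 512 <= size <= 32256 and last_member_data.count(0) == size


# Two 512-byte end-of-archive blocks compressed in their own member.
print(looks_appendable(bytes(1024)))
```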
+
+The diagram below shows the correspondence between each tar member (formed
+by one or two headers plus optional data) in the tar archive and each
+@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#File-format,,lzip member}
+in the resulting multimember tar.lz archive, when per file compression is
+used:
+@ifnothtml
+@xref{File format,,,lzip}.
+@end ifnothtml
+
+@verbatim
+tar
++========+======+=================+===============+========+======+========+
+| header | data | extended header | extended data | header | data |   EOA  |
++========+======+=================+===============+========+======+========+
+
+tar.lz
++===============+=================================================+========+
+|     member    |                      member                     | member |
++===============+=================================================+========+
+@end verbatim
+
+@ignore
+When @option{--permissive} is used, the following violations of the
+archive format are allowed:@*
+If several extended headers precede a ustar header, only the last
+extended header takes effect. The other extended headers are ignored.
+Similarly, if several records with the same keyword appear in the same
+block of extended records, only the last record for the repeated keyword
+takes effect. The other records for the repeated keyword are ignored.@*
+A global header inserted between an extended header and a ustar header.@*
+An extended header just before the end-of-archive blocks.
+@end ignore
+
+@sp 1
+@section Pax header block
+
+The pax header block is identical to the ustar header block described below
+except that the typeflag has the value @samp{x} (extended). The field
+@samp{size} is the size of the extended header data in bytes. Most other
+fields in the pax header block are zeroed on archive creation to prevent
+trouble if the archive is read by a ustar tool, and are ignored by tarlz on
+archive extraction. @xref{flawed-compat}.
+
+Tarlz limits the size of the pax extended header data so that the whole
+header set (extended header + extended data + ustar header) can be read and
+decoded in a buffer of size INT_MAX.
+
+The pax extended header data consists of one or more records, each of
+them constructed as follows:@*
+@w{@samp{"%d %s=%s\n", <length>, <keyword>, <value>}}
+
+The fields <length> and <keyword> in the record must be limited to the
+portable character set (@pxref{Portable character set}). The field <length>
+contains the decimal length of the record in bytes, including the trailing
+newline. The field <value> is stored as-is, without conversion to UTF-8 or
+any other transformation. The fields are separated by the ASCII characters
+space, equal-sign, and newline.
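+
+As an illustration of the record layout above, the following Python sketch
+splits extended header data into records. It is not taken from tarlz; the
+function name @samp{parse_pax_records} and the sample data are assumptions
+made for this example.
+

```python
def parse_pax_records(data):
    """Split pax extended header data into (keyword, value) pairs."""
    records = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)      # end of the <length> field
        length = int(data[pos:space])      # counts the whole record
        record = data[pos:pos + length]
        if len(record) != length or not record.endswith(b"\n"):
            raise ValueError("malformed extended record")
        # Skip "<length> " and drop the trailing newline.
        keyword, sep, value = record[space - pos + 1:-1].partition(b"=")
        if not sep:
            raise ValueError("missing '=' in extended record")
        records.append((keyword.decode("ascii"), value))  # value kept as-is
        pos += length
    return records


sample = b"12 size=123\n21 path=dir/file.txt\n"
print(parse_pax_records(sample))
```

+
+Note that the value is returned as raw bytes, mirroring the rule that
+<value> is stored without any character set conversion.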
+
+These are the <keyword> values currently supported by tarlz:
+
+@table @code
+@item atime
+The signed decimal representation of the access time of the following file
+in seconds since (or before) the epoch, obtained from the function
+@samp{stat}. The atime record is created only for files with a modification
+time outside of the ustar range. @xref{ustar-mtime}.
+
+@item gid
+The unsigned decimal representation of the group ID of the group that owns
+the following file. The gid record is created only for files with a group ID
+greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
+
+@item linkpath
+The file name of a link being created to another file, of any type,
+previously archived. This record overrides the field @samp{linkname} in the
+following ustar header block. The following ustar header block determines
+the type of link created. If typeflag of the following header block is 1, a
+hard link is created. If typeflag is 2, a symbolic link is created and the
+linkpath value is used as the contents of the symbolic link. The linkpath
+record is created only for links with a link name that does not fit in the
+space provided by the ustar header.
+
+@item mtime
+The signed decimal representation of the modification time of the following
+file in seconds since (or before) the epoch, obtained from the function
+@samp{stat}. This record overrides the field @samp{mtime} in the following
+ustar header block. The mtime record is created only for files with a
+modification time outside of the ustar range. @xref{ustar-mtime}.
+
+@item path
+The file name of the following file. This record overrides the fields
+@samp{name} and @samp{prefix} in the following ustar header block. The path
+record is created for files with a name that does not fit in the space
+provided by the ustar header, but is also created for files that require any
+other extended record so that the fields @samp{name} and @samp{prefix} in
+the following ustar header block can be zeroed.
+
+@item size
+The size of the file in bytes, expressed as a decimal number using digits
+from the ISO/IEC 646:1991 (ASCII) standard. This record overrides the field
+@samp{size} in the following ustar header block. The size record is created
+only for files with a size value greater than 8_589_934_591
+@w{(octal 77_777_777_777)}; that is, @w{8 GiB} (2^33 bytes) or larger.
+
+@item uid
+The unsigned decimal representation of the user ID of the file owner of the
+following file. The uid record is created only for files with a user ID
+greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
+
+@anchor{key_crc32}
+@item GNU.crc32
+CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
+representing the CRC <value> itself. The <value> is represented as 8
+hexadecimal digits in big endian order,
+@w{@samp{22 GNU.crc32=00000000\n}}. The keyword of the CRC record is
+protected by the CRC to guarantee that corruption is always detected when
+using @option{--missing-crc} (except in case of CRC collision). A CRC was
+chosen because a checksum is too weak for a potentially large list of
+variable sized records. A checksum can't detect simple errors like the
+swapping of two bytes.
+
+@end table
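+
+The following Python sketch shows one way to compute and check such a CRC.
+It is an illustration, not the tarlz implementation: the helper names, the
+placement of the CRC record at the end of the data, and the use of upper
+case hexadecimal digits are assumptions. Only the record prefix
+@samp{22 GNU.crc32=} and the exclusion of the 8 value bytes come from the
+description above.
+

```python
def crc32c(data):
    """Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


KEY = b"22 GNU.crc32="      # record prefix shown in the text above


def add_crc_record(records):
    """Append a CRC record covering everything except its own 8 digits."""
    covered = records + KEY + b"\n"
    return records + KEY + b"%08X" % crc32c(covered) + b"\n"


def crc_matches(ext_data):
    """Recompute the CRC with the 8 value bytes left out and compare."""
    pos = ext_data.find(KEY)            # simplification: first match wins
    if pos < 0:
        return False                    # no CRC record present
    start = pos + len(KEY)
    try:
        stored = int(ext_data[start:start + 8], 16)
    except ValueError:
        return False                    # the digits themselves are damaged
    return crc32c(ext_data[:start] + ext_data[start + 8:]) == stored


ext = add_crc_record(b"12 size=123\n")
print(crc_matches(ext))                                   # True
print(crc_matches(ext.replace(b"size=123", b"size=124"))) # False
```

+
+Because the keyword bytes are part of the covered data, damage to the
+keyword of the CRC record changes the computed CRC just like damage to any
+other record.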
+
+At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
+extended header keyword found in an archive, once per keyword.
+
+@sp 1
+@section Ustar header block
+
+The ustar header block has a length of 512 bytes and is structured as
+shown in the following table. All lengths and offsets are in decimal.
+
+@multitable {Field Name} {Offset} {Length (in bytes)}
+@item Field Name @tab Offset @tab Length (in bytes)
+@item name     @tab   0 @tab 100
+@item mode     @tab 100 @tab   8
+@item uid      @tab 108 @tab   8
+@item gid      @tab 116 @tab   8
+@item size     @tab 124 @tab  12
+@item mtime    @tab 136 @tab  12
+@item chksum   @tab 148 @tab   8
+@item typeflag @tab 156 @tab   1
+@item linkname @tab 157 @tab 100
+@item magic    @tab 257 @tab   6
+@item version  @tab 263 @tab   2
+@item uname    @tab 265 @tab  32
+@item gname    @tab 297 @tab  32
+@item devmajor @tab 329 @tab   8
+@item devminor @tab 337 @tab   8
+@item prefix   @tab 345 @tab 155
+@end multitable
+
+All characters in the header block are coded using the ISO/IEC 646:1991
+(ASCII) standard, except in fields storing names for files, users, and
+groups. For maximum portability between implementations, names should only
+contain characters from the portable character set (@pxref{Portable
+character set}), but if an implementation supports the use of characters
+outside of @samp{/} and the portable character set in names for files,
+users, and groups, tarlz will use the byte values in these names unmodified.
+
+The fields @samp{name}, @samp{linkname}, and @samp{prefix} are
+null-terminated character strings, except when every character in the
+array, including the last one, is non-null.
+
+The fields @samp{name} and @samp{prefix} produce the file name. A new file
+name is formed, if prefix is not an empty string (its first character is not
+null), by concatenating prefix (up to the first null character), a slash
+character, and name; otherwise, name is used alone. In either case, name is
+terminated at the first null character. If prefix begins with a null
+character, it is ignored. In this manner, file names of at most 256
+characters can be supported. If a file name does not fit in the space
+provided, an extended record is used to store the file name.
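+
+The rule above for combining @samp{prefix} and @samp{name} can be sketched
+in Python as follows. The function name is hypothetical; the offsets are
+those of the table above.
+

```python
def ustar_file_name(header):
    """Build the file name from the 'prefix' and 'name' fields."""
    name = header[0:100].split(b"\0", 1)[0]       # up to the first null
    prefix = header[345:500].split(b"\0", 1)[0]   # empty if it starts with null
    return prefix + b"/" + name if prefix else name


header = bytearray(512)
header[0:8] = b"file.txt"
header[345:352] = b"dir/sub"
print(ustar_file_name(bytes(header)))   # b'dir/sub/file.txt'
```

+
+The longest name that can be produced this way has 155 + 1 + 100 = 256
+characters, which is the limit stated above.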
+
+The field @samp{linkname} does not use the prefix to produce a file name. If
+the link name does not fit in the 100 characters provided, an extended
+record is used to store the link name.
+
+The field @samp{mode} provides 12 access permission bits. The following
+table shows the symbolic name of each bit and its octal value:
+
+@multitable {Bit Name} {Value} {Bit Name} {Value} {Bit Name} {Value}
+@headitem Bit Name @tab Value @tab Bit Name @tab Value @tab Bit Name @tab Value
+@item S_ISUID @tab 04000 @tab S_ISGID @tab 02000 @tab S_ISVTX @tab 01000
+@item S_IRUSR @tab 00400 @tab S_IWUSR @tab 00200 @tab S_IXUSR @tab 00100
+@item S_IRGRP @tab 00040 @tab S_IWGRP @tab 00020 @tab S_IXGRP @tab 00010
+@item S_IROTH @tab 00004 @tab S_IWOTH @tab 00002 @tab S_IXOTH @tab 00001
+@end multitable
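+
+As a small illustration, the Python sketch below lists the symbolic names
+of the bits set in a @samp{mode} field, using the constants of the same
+names from the standard @samp{stat} module. The function name
+@samp{mode_bit_names} is an assumption made for this example.
+

```python
import stat

MODE_BITS = [
    "S_ISUID", "S_ISGID", "S_ISVTX",
    "S_IRUSR", "S_IWUSR", "S_IXUSR",
    "S_IRGRP", "S_IWGRP", "S_IXGRP",
    "S_IROTH", "S_IWOTH", "S_IXOTH",
]


def mode_bit_names(mode_field):
    """Return the names of the permission bits set in an octal mode field."""
    mode = int(mode_field.rstrip(b"\0 ") or b"0", 8)
    return [name for name in MODE_BITS if mode & getattr(stat, name)]


print(mode_bit_names(b"0000644\0"))
# ['S_IRUSR', 'S_IWUSR', 'S_IRGRP', 'S_IROTH']
```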
+
+@anchor{ustar-uid-gid}
+The fields @samp{uid} and @samp{gid} are the user and group IDs of the owner
+and group of the file, respectively. If the file uid or gid are greater than
+2_097_151 @w{(octal 7_777_777)}, an extended record is used to store the uid
+or gid.
+
+The field @samp{size} contains the octal representation of the size of the
+file in bytes. If the field @samp{typeflag} specifies a file of type '0'
+(regular file) or '7' (high performance regular file), the number of logical
+records following the header is @w{(size / 512)} rounded up to the next
+integer. For all other values of typeflag, tarlz either sets the size field
+to 0 or ignores it, and does not store or expect any logical records
+following the header. If the file size is larger than 8_589_934_591 bytes
+@w{(octal 77_777_777_777)}, an extended record is used to store the file size.
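+
+The arithmetic above can be written as the following Python sketch. The
+function and constant names are hypothetical.
+

```python
USTAR_SIZE_MAX = 0o77777777777      # 8_589_934_591 bytes


def data_records(size, typeflag):
    """Number of 512-byte logical records following the header."""
    if typeflag not in ("0", "7"):  # only regular files carry data
        return 0
    return (size + 511) // 512      # size / 512, rounded up


def needs_size_record(size):
    """True if the size does not fit in the ustar 'size' field."""
    return size > USTAR_SIZE_MAX


print(data_records(513, "0"))       # 2
print(needs_size_record(2 ** 33))   # True
```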
+
+@anchor{ustar-mtime}
+The field @samp{mtime} contains the octal representation of the modification
+time of the file at the time it was archived, obtained from the function
+@samp{stat}. If the modification time is negative or larger than
+8_589_934_591 @w{(octal 77_777_777_777)} seconds since the epoch, an extended
+record is used to store the modification time. The ustar range of mtime goes
+from @w{@samp{1970-01-01 00:00:00 UTC}} to @w{@samp{2242-03-16 12:56:31 UTC}}.
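+
+The upper end of the range can be checked with a few lines of Python. The
+names used here are assumptions made for this example.
+

```python
from datetime import datetime, timedelta, timezone

USTAR_MTIME_MAX = 0o77777777777     # 8_589_934_591 seconds


def mtime_fits_ustar(mtime):
    """True if the modification time can be stored in the ustar header."""
    return 0 <= mtime <= USTAR_MTIME_MAX


epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(epoch + timedelta(seconds=USTAR_MTIME_MAX))
# 2242-03-16 12:56:31+00:00
```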
+
+The field @samp{chksum} contains the octal representation of the value of
+the simple sum of all bytes in the header logical record. Each byte in the
+header is treated as an unsigned value. When calculating the checksum, the
+chksum field is treated as if it were all space characters.
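+
+A minimal Python sketch of this checksum follows. The function name is
+hypothetical; the offsets 148 and 156 delimit the @samp{chksum} field
+according to the table above.
+

```python
def ustar_checksum(header):
    """Sum of all header bytes, with 'chksum' counted as 8 spaces."""
    if len(header) != 512:
        raise ValueError("a ustar header block has 512 bytes")
    return sum(header[:148]) + 8 * 0x20 + sum(header[156:])


header = bytearray(512)
header[0:1] = b"a"
print(ustar_checksum(bytes(header)))    # 353
```

+
+The result does not depend on the current contents of the @samp{chksum}
+field, so the same function can be used both to fill the field and to check
+it.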
+
+The field @samp{typeflag} contains a single character specifying the type of
+file archived:
+
+@table @code
+@item '0'
+Regular file.
+
+@item '1'
+Hard link to another file, of any type, previously archived. Hard links must
+not contain file data.
+
+@item '2'
+Symbolic link.
+
+@item '3', '4'
+Character special file and block special file respectively. In this case the
+fields @samp{devmajor} and @samp{devminor} contain information defining the
+device in unspecified format.
+
+@item '5'
+Directory.
+
+@item '6'
+FIFO special file.
+
+@item '7'
+Reserved to represent a file to which an implementation has associated some
+high-performance attribute (contiguous file). Tarlz treats this type of file
+as a regular file (type 0).
+
+@end table
+
+The field @samp{magic} contains the ASCII null-terminated string "ustar".
+The field @samp{version} contains the characters "00" (0x30,0x30). The
+fields @samp{uname} and @samp{gname} are null-terminated character strings,
+except when every character in the array, including the last one, is
+non-null. Each numeric field contains an octal number that is padded with
+leading spaces or zeros and is optionally null-terminated, using digits from
+the ISO/IEC 646:1991 (ASCII) standard. Tarlz is able to decode numeric
+fields 1 byte longer than standard ustar by not requiring a terminating null
+character.
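+
+The decoding of a numeric field can be sketched in Python as shown below.
+This is an illustration, not the tarlz code: the function name is
+hypothetical, and returning 0 for an empty field is an assumption.
+

```python
def parse_octal_field(field):
    """Decode a space- or zero-padded, optionally null-terminated octal field."""
    digits = field.split(b"\0", 1)[0].strip(b" ")
    if not digits:
        return 0                            # assumption: empty field means 0
    if any(c not in b"01234567" for c in digits):
        raise ValueError("invalid octal field")
    return int(digits, 8)


print(parse_octal_field(b"0000644\0"))      # 420
print(parse_octal_field(b"777777777777"))   # 68719476735
```

+
+The second call decodes a 12-byte field filled with digits and no
+terminating null, which is the case that gains one extra digit over
+standard ustar.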
+
+
+@node Amendments to pax format
+@chapter The reasons for the differences with pax
+@cindex Amendments to pax format
+
+Tarlz creates safe archives that allow the reliable detection of invalid or
+corrupt metadata during decoding even when the integrity checking of lzip
+can't be used because the lzip members are only decompressed partially, as
+happens in parallel @option{--diff}, @option{--list}, and @option{--extract}.
+In order to achieve this goal and avoid some other flaws in the pax format,
+tarlz makes some changes to the variant of the pax format that it uses. This
+chapter describes these changes and the concrete reasons to implement them.
+
+@sp 1
+@anchor{crc32}
+@section Add a CRC of the extended records
+
+The POSIX pax format has a serious flaw. The metadata stored in pax extended
+records are not protected by any kind of check sequence. Corruption in a
+long file name may cause the extraction of the file in the wrong place
+without warning. Corruption in a large file size may cause the truncation of
+the file or the appending of garbage to the file, both followed by a
+spurious warning about a corrupt header far from the place of the undetected
+corruption.
+
+Metadata like file name and file size must always be protected in an archive
+format because of the adverse effects of undetected corruption in them,
+potentially much worse than undetected corruption in the data. Even more so
+in the case of pax because the amount of metadata it stores is potentially
+large, making undetected corruption and archiver misbehavior more probable.
+
+Headers and metadata must be protected separately from data because the
+integrity checking of lzip may not be able to detect the corruption before
+the metadata have been used, for example, to create a new file in the wrong
+place.
+
+Because of the above, tarlz protects the extended records with a Cyclic
+Redundancy Check (CRC) in a way compatible with standard tar tools.
+@xref{key_crc32}.
+
+@sp 1
+@anchor{flawed-compat}
+@section Remove flawed backward compatibility
+
+In order to allow the extraction of pax archives by a tar utility conforming
+to the POSIX-2:1993 standard, POSIX.1-2008 recommends selecting extended
+header field values that allow such tar to create a regular file containing
+the extended header records as data. This approach is broken because if the
+extended header is needed because of a long file name, the fields
+@samp{name} and @samp{prefix} are unable to contain the full file name.
+(Some tar implementations store the truncated name in the field @samp{name}
+alone, truncating the name to only 100 bytes instead of 256). Therefore the
+files corresponding to both the extended header and the overridden ustar
+header are extracted using truncated file names, perhaps overwriting
+existing files or directories. It may be a security risk to extract a file
+with a truncated file name.
+
+To avoid this problem, tarlz writes extended headers with all fields zeroed
+except @samp{size} (which contains the size of the extended records),
+@samp{chksum}, @samp{typeflag}, @samp{magic}, and @samp{version}. In
+particular, tarlz sets the fields @samp{name} and @samp{prefix} to zero.
+This prevents old tar programs from extracting the extended records as a
+file in the wrong place. Tarlz also sets to zero those fields of the ustar
+header overridden by extended records. Finally, tarlz skips members with
+zeroed @samp{name} and @samp{prefix} when decoding, except when listing.
+This is needed to detect certain format violations during parallel
+extraction.
+
+If an extended header is required for any reason (for example a file size of
+@w{8 GiB} or larger, or a link name longer than 100 bytes), tarlz also moves
+the file name to the extended records to prevent an ustar tool from trying
+to extract the file or link. This also makes it easier, during parallel
+decoding, to detect a tar member split between two lzip members at the
+boundary between the extended header and the ustar header.
+
+@sp 1
+@section As simple as possible (but not simpler)
+
+The tarlz format is mainly ustar. Extended pax headers are used only when
+needed because the length of a file name or link name, or the size or other
+attribute of a file, exceeds the limits of the ustar format. Adding @w{1 KiB}
+of extended header and records to each member just to save subsecond
+timestamps seems wasteful for a backup format. Moreover, minimizing the
+overhead may help to recover the archive with lziprecover in case of
+corruption.
+
+Global pax headers are tolerated, but not supported; they are parsed and
+ignored. Some operations may not behave as expected if the archive contains
+global headers.
+
+@sp 1
+@section Improve reproducibility
+
+Pax includes by default the process ID of the pax process in the ustar name
+of the extended headers, making the archive not reproducible. Tarlz stores
+the true name of the file just once, either in the ustar header or in the
+extended records, making it easier to produce reproducible archives.
+
+Pax allows an extended record to have length x-1 or x if x is a power of
+ten; @samp{99<97_bytes>} or @samp{100<97_bytes>}. Tarlz minimizes the length
+of the record and always produces a length of x-1 in these cases.
+
+@sp 1
+@section No data in hard links
+
+Tarlz does not allow data in hard link members. The data (if any) must be in
+the member determining the type of the file (which can't be a link). If all
+the names of a file are stored as hard links, the type of the file is lost.
+Not allowing data in hard links also prevents invalid actions like
+extracting file data for a hard link to a symbolic link or to a directory.
+
+@sp 1
+@section Avoid misconversions to/from UTF-8
+
+There is no portable way to tell which character set a text string is
+encoded in. Therefore, tarlz stores all fields representing text strings
+unmodified, without conversion to UTF-8 or any other transformation. This
+prevents accidental double UTF-8 conversions. If the need arises, this
+behavior will be adjusted with a command-line option in the future.
+
+
+@node Program design
+@chapter Internal structure of tarlz
+@cindex program design
+
+The parts of tarlz related to sequential processing of the archive are more
+or less similar to any other tar and won't be described here. The interesting
+parts described here are those related to multi-threaded processing.
+
+The structure of the part of tarlz performing multi-threaded archive
+creation is somewhat similar to that of
+@uref{http://www.nongnu.org/lzip/plzip.html#Program-design,,plzip} with the
+added complication of the solidity levels.
+@ifnothtml
+@xref{Program design,,,plzip}.
+@end ifnothtml
+A grouper thread and several worker threads are created, with the main
+thread acting as the muxer (multiplexer) thread. A "packet courier" takes
+care of data transfers among threads and limits the maximum number of data
+blocks (packets) being processed simultaneously.
+
+The grouper traverses the directory tree, groups together the metadata of
+the files to be archived in each lzip member, and distributes them to the
+workers. The workers compress the metadata received from the grouper along
+with the file data read from the file system. The muxer collects processed
+packets from the workers, and writes them to the archive.
+
+@verbatim
+.--------.
+|    data|---> to each worker below
+|        |                    .------------.
+| file   |                ,-->| worker   0 |--,
+| system |                |   `------------'  |
+|        |    .---------. |   .------------.  |   .-------.   .---------.
+|metadata|--->| grouper |-+-->| worker   1 |--+-->| muxer |-->| archive |
+`--------'    `---------' |   `------------'  |   `-------'   `---------'
+                          |        ...        |
+                          |   .------------.  |
+                          `-->| worker N-1 |--'
+                              `------------'
+@end verbatim
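+
+The grouper, worker, and muxer roles can be sketched in Python as follows.
+This is a simplified model, not the tarlz implementation: all names are
+hypothetical, @samp{zlib} stands in for lzip compression, the payload is
+passed through the queue instead of being read from the file system by the
+workers, and a bounded queue stands in for the packet courier.
+

```python
import queue
import threading
import zlib


def grouper(groups, work_q, num_workers):
    for member_id, payload in enumerate(groups):
        work_q.put((member_id, payload))
    for _ in range(num_workers):
        work_q.put(None)                    # one stop marker per worker


def worker(work_q, done_q):
    while True:
        item = work_q.get()
        if item is None:
            done_q.put(None)                # tell the muxer this worker is done
            return
        member_id, payload = item
        done_q.put((member_id, zlib.compress(payload)))  # stand-in for lzip


def create_archive(groups, num_workers=2):
    work_q = queue.Queue(maxsize=2 * num_workers)   # bounds packets in flight
    done_q = queue.Queue()
    threads = [threading.Thread(target=grouper,
                                args=(groups, work_q, num_workers))]
    threads += [threading.Thread(target=worker, args=(work_q, done_q))
                for _ in range(num_workers)]
    for t in threads:
        t.start()
    # The main thread acts as muxer and emits members in their original order.
    pending, members, next_id, finished = {}, [], 0, 0
    while finished < num_workers:
        item = done_q.get()
        if item is None:
            finished += 1
            continue
        pending[item[0]] = item[1]
        while next_id in pending:
            members.append(pending.pop(next_id))
            next_id += 1
    for t in threads:
        t.join()
    return members


groups = [b"a" * 100, b"b" * 100, b"c" * 100]
members = create_archive(groups)
print([zlib.decompress(m) for m in members] == groups)   # True
```

+
+The muxer holds back members that finish early, so the order of members in
+the output does not depend on which worker finishes first.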
+
+Decoding an archive is somewhat similar to how plzip decompresses a regular
+file to standard output, with two differences: only messages, not the data,
+are written to stdout/stderr, and each worker may access files in the file
+system either to read them (diff) or to write them (extract). As in plzip,
+each worker reads members directly from the archive.
+
+@verbatim
+.--------.
+| file   |<---> data to/from each worker below
+| system |
+`--------'      .------------.
+            ,-->| worker   0 |--,
+            |   `------------'  |
+.---------. |   .------------.  |   .-------.   .--------.
+| archive |-+-->| worker   1 |--+-->| muxer |-->| stdout |
+`---------' |   `------------'  |   `-------'   | stderr |
+            |        ...        |               `--------'
+            |   .------------.  |
+            `-->| worker N-1 |--'
+                `------------'
+@end verbatim
+
+As misaligned tar.lz archives can't be decoded in parallel, and the
+misalignment can't be detected until after decoding has started, a
+"mastership request" mechanism has been designed that allows the decoding to
+continue instead of signalling an error.
+
+During parallel decoding, if a worker finds a misalignment, it requests
+mastership to decode the rest of the archive. When mastership is requested,
+an error_member_id is set, and all subsequently received packets with
+member_id > error_member_id are rejected. All workers requesting mastership
+are blocked at the request_mastership call until mastership is granted.
+Mastership is granted to the delivering worker when its queue is empty to
+make sure that all preceding packets have been processed. When mastership is
+granted, all packets are deleted and all subsequently received packets not
+coming from the master are rejected.
+
+If a worker can't continue decoding for any cause (for example lack of
+memory or finding a split tar member at the beginning of a lzip member), it
+requests mastership to print an error and terminate the program. Only if
+some other worker requests mastership in a previous lzip member can this
+error be avoided.
+
+
+@node Multi-threaded decoding
+@chapter Limitations of parallel tar decoding
+@cindex parallel tar decoding
+
+Safely decoding an arbitrary tar archive in parallel is only possible if one
+decodes the headers sequentially first. For example, if a tar archive
+containing another tar archive is decoded starting from some position other
+than the beginning, there is no way to know if the first header found there
+belongs to the outer tar archive or to the inner tar archive. Tar is an
+inherently serial format; it was designed for tapes.
+
+The pax format is even more serial than the ustar format. Two headers need
+to be decoded sequentially for each file. The extended header may even need
+parsing to reveal something as basic as file size. If a thread decodes the
+ustar header skipping the preceding extended header, it may extract a file
+of incorrect size at the wrong place. Moreover, a pax archive with global
+headers can't be decoded in parallel because each thread can't know about
+the global headers decoded by other threads.
+
+In the case of compressed tar archives, the start of each compressed block
+determines one point through which the tar archive can be decoded in
+parallel. Therefore, in tar.lz archives the decoding operations can't be
+parallelized if the tar members are not aligned with the lzip members. Tar
+archives compressed with plzip can't be decoded in parallel because tar and
+plzip do not have a way to align both sets of members. Certainly one can
+decompress one such archive with a multi-threaded tool like plzip, but the
+increase in speed is not as large as it could be because plzip must
+serialize the decompressed data and pass them to tar, which decodes them
+sequentially, one tar member at a time.
+
+On the other hand, if the tar.lz archive is created with a tool like tarlz,
+which can guarantee the alignment between tar members and lzip members
+because it controls both archiving and compression, then the lzip format
+becomes an indexed layer on top of the tar archive, which makes it possible
+to decode it safely in parallel.
+
+Tarlz is able to automatically decode aligned and unaligned multimember
+tar.lz archives, keeping backwards compatibility. If tarlz finds a member
+misalignment during multi-threaded decoding, it switches to single-threaded
+mode and continues decoding the archive.
+
+If the files in the archive are large, multi-threaded @option{--list} on a
+regular (seekable) tar.lz archive can be hundreds of times faster than
+sequential @option{--list} because, in addition to using several processors,
+it only needs to decompress part of each lzip member. See the following
+example listing the Silesia corpus on a dual core machine:
+
+@example
+tarlz -9 --no-solid -cf silesia.tar.lz silesia
+time lzip -cd silesia.tar.lz | tar -tf -            (5.032s)
+time plzip -cd silesia.tar.lz | tar -tf -           (3.256s)
+time tarlz -tf silesia.tar.lz                       (0.020s)
+@end example
+
+On the other hand, multi-threaded @option{--list} won't detect corruption in
+the tar member data because it only decodes the part of each lzip member
+corresponding to the tar member header. This is another reason why the tar
+headers must provide their own integrity checking.
+
+@sp 1
+@anchor{mt-extraction}
+@section Limitations of multi-threaded extraction
+
+Multi-threaded extraction may produce different output than single-threaded
+extraction in some cases:
+
+During multi-threaded extraction, several independent threads are
+simultaneously reading the archive and creating files in the file system.
+The archive is not read sequentially. As a consequence, any error or
+weirdness in the archive (like a corrupt member or an end-of-archive block
+in the middle of the archive) usually won't be detected until part of the
+archive beyond that point has been processed.
+
+If the archive contains two or more tar members with the same name,
+single-threaded extraction extracts the members in the order they appear in
+the archive and leaves in the file system the last version of the file. But
+multi-threaded extraction may extract the members in any order and leave in
+the file system any version of the file nondeterministically. It is
+unspecified which of the tar members is extracted.
+
+If the same file is extracted through several paths (different member names
+resolve to the same file in the file system), the result is undefined.
+(The resulting file will probably be mangled.)
+
+Extraction of a hard link may fail if it is extracted before the file it
+links to.
+
+
+@node Minimum archive sizes
+@chapter Minimum archive sizes required for multi-threaded block compression
+@cindex minimum archive sizes
+
+When creating or appending to a compressed archive using multi-threaded
+block compression, tarlz puts tar members together in blocks and compresses
+as many blocks simultaneously as worker threads are chosen, creating a
+multimember compressed archive.
+
+For this to work as expected (and roughly multiply the compression speed by
+the number of available processors), the uncompressed archive must be at
+least as large as the number of worker threads times the block size
+(@pxref{--data-size}). Otherwise, some processors do not get any data to
+compress, and compression is proportionally slower. The maximum speed
+increase achievable on a given archive is limited by the ratio
+@w{(uncompressed_size / data_size)}. For example, a tarball the size of gcc
+or linux scales up to 10 or 14 processors at level -9.
+
+The following table shows the minimum uncompressed archive size needed for
+full use of N processors at a given compression level, using the default
+data size for each level:
+
+@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB}
+@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256
+@item Level
+@item -0 @tab   2 MiB @tab   4 MiB @tab   8 MiB @tab  16 MiB @tab  64 MiB @tab 256 MiB
+@item -1 @tab   4 MiB @tab   8 MiB @tab  16 MiB @tab  32 MiB @tab 128 MiB @tab 512 MiB
+@item -2 @tab   6 MiB @tab  12 MiB @tab  24 MiB @tab  48 MiB @tab 192 MiB @tab 768 MiB
+@item -3 @tab   8 MiB @tab  16 MiB @tab  32 MiB @tab  64 MiB @tab 256 MiB @tab   1 GiB
+@item -4 @tab  12 MiB @tab  24 MiB @tab  48 MiB @tab  96 MiB @tab 384 MiB @tab 1.5 GiB
+@item -5 @tab  16 MiB @tab  32 MiB @tab  64 MiB @tab 128 MiB @tab 512 MiB @tab   2 GiB
+@item -6 @tab  32 MiB @tab  64 MiB @tab 128 MiB @tab 256 MiB @tab   1 GiB @tab   4 GiB
+@item -7 @tab  64 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab   2 GiB @tab   8 GiB
+@item -8 @tab  96 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab   3 GiB @tab  12 GiB
+@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab   1 GiB @tab   4 GiB @tab  16 GiB
+@end multitable
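+
+The table can be reproduced with the short Python sketch below. The
+per-level data sizes are not stated explicitly in this chapter; they are
+derived here by halving the 2-processor column of the table, and the
+function names are hypothetical.
+

```python
MIB = 1 << 20

# Data size per level in MiB, derived from the 2-processor column above.
DATA_SIZE_MIB = {0: 1, 1: 2, 2: 3, 3: 4, 4: 6, 5: 8, 6: 16, 7: 32, 8: 48, 9: 64}


def min_archive_size(level, workers):
    """Smallest uncompressed size (bytes) that keeps all workers busy."""
    return workers * DATA_SIZE_MIB[level] * MIB


def max_useful_workers(uncompressed_size, level):
    """Upper bound on useful workers: uncompressed_size / data_size."""
    return max(1, uncompressed_size // (DATA_SIZE_MIB[level] * MIB))


print(min_archive_size(9, 16) // MIB)           # 1024
print(max_useful_workers(900 * MIB, 9))         # 14
```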
+
+
+@node Examples
+@chapter A small tutorial with examples
+@cindex examples
+
+@noindent
+Example 1: Create a multimember compressed archive @samp{archive.tar.lz}
+containing files @samp{a}, @samp{b} and @samp{c}.
+
+@example
+tarlz -cf archive.tar.lz a b c
+@end example
+
+@sp 1
+@noindent
+Example 2: Append files @samp{d} and @samp{e} to the multimember compressed
+archive @samp{archive.tar.lz}.
+
+@example
+tarlz -rf archive.tar.lz d e
+@end example
+
+@sp 1
+@noindent
+Example 3: Create a solidly compressed appendable archive
+@samp{archive.tar.lz} containing files @samp{a}, @samp{b} and @samp{c}.
+Then append files @samp{d} and @samp{e} to the archive.
+
+@example
+tarlz --asolid -cf archive.tar.lz a b c
+tarlz --asolid -rf archive.tar.lz d e
+@end example
+
+@sp 1
+@noindent
+Example 4: Create a compressed appendable archive containing directories
+@samp{dir1}, @samp{dir2} and @samp{dir3} with a separate lzip member per
+directory. Then append files @samp{a}, @samp{b}, @samp{c}, @samp{d} and
+@samp{e} to the archive, all of them contained in a single lzip member.
+The resulting archive @samp{archive.tar.lz} contains 5 lzip members
+(including the end-of-archive member).
+
+@example
+tarlz --dsolid -cf archive.tar.lz dir1 dir2 dir3
+tarlz --asolid -rf archive.tar.lz a b c d e
+@end example
+
+@sp 1
+@noindent
+Example 5: Create a solidly compressed archive @samp{archive.tar.lz}
+containing files @samp{a}, @samp{b} and @samp{c}. Note that no more
+files can be later appended to the archive.
+
+@example
+tarlz --solid -cf archive.tar.lz a b c
+@end example
+
+@sp 1
+@noindent
+Example 6: Extract all files from archive @samp{archive.tar.lz}.
+
+@example
+tarlz -xf archive.tar.lz
+@end example
+
+@sp 1
+@noindent
+Example 7: Extract files @samp{a} and @samp{c}, and the whole tree under
+directory @samp{dir1} from archive @samp{archive.tar.lz}.
+
+@example
+tarlz -xf archive.tar.lz a c dir1
+@end example
+
+@sp 1
+@noindent
+Example 8: Copy the contents of directory @samp{sourcedir} to the directory
+@samp{destdir}.
+
+@example
+tarlz -C sourcedir --uncompressed -cf - . | tarlz -C destdir -xf -
+@end example
+
+@sp 1
+@noindent
+Example 9: Compress the existing POSIX archive @samp{archive.tar} and write
+the output to @samp{archive.tar.lz}. Compress each member individually for
+maximum availability. (If one member in the compressed archive gets damaged,
+the other members can still be extracted).
+
+@example
+tarlz -z --no-solid archive.tar
+@end example
+
+@sp 1
+@noindent
+Example 10: Compress the archive @samp{archive.tar} and write the output to
+@samp{foo.tar.lz}.
+
+@example
+tarlz -z -o foo.tar.lz archive.tar
+@end example
+
+@sp 1
+@noindent
+Example 11: Concatenate and compress two archives @samp{archive1.tar} and
+@samp{archive2.tar}, and write the output to @samp{foo.tar.lz}.
+
+@example
+tarlz -A archive1.tar archive2.tar | tarlz -z -o foo.tar.lz
+@end example
+
+
+@node Problems
+@chapter Reporting bugs
+@cindex bugs
+@cindex getting help
+
+There are probably bugs in tarlz. There are certainly errors and
+omissions in this manual. If you report them, they will get fixed. If
+you don't, no one will ever know about them and they will remain unfixed
+for all eternity, if not longer.
+
+If you find a bug in tarlz, please send electronic mail to
+@email{lzip-bug@@nongnu.org}. Include the version number, which you can
+find by running @w{@samp{tarlz --version}} and
+@w{@samp{tarlz -v --check-lib}}.
+
+
+@node Concept index
+@unnumbered Concept index
+
+@printindex cp
+
+@bye