author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-01-23 05:08:19 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-01-23 05:08:19 +0000
commit cb1387c92038634c063ee06a24e249b87525f519
tree aeebf76566be407c42678fff1c2482ee9dc8fe17 /doc
parent Releasing debian version 0.23-3.
Merging upstream version 0.25.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc')
-rw-r--r-- doc/tarlz.1 27
-rw-r--r-- doc/tarlz.info 179
-rw-r--r-- doc/tarlz.texi 212
3 files changed, 227 insertions, 191 deletions
diff --git a/doc/tarlz.1 b/doc/tarlz.1
index d23b164..9d63da5 100644
--- a/doc/tarlz.1
+++ b/doc/tarlz.1
@@ -1,5 +1,5 @@
-.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.16.
-.TH TARLZ "1" "September 2022" "tarlz 0.23" "User Commands"
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
+.TH TARLZ "1" "January 2024" "tarlz 0.25" "User Commands"
.SH NAME
tarlz \- creates tar archives with multimember lzip compression
.SH SYNOPSIS
@@ -10,12 +10,12 @@ Tarlz is a massively parallel (multi\-threaded) combined implementation of
the tar archiver and the lzip compressor. Tarlz uses the compression library
lzlib.
.PP
-Tarlz creates, lists, and extracts archives in a simplified and safer
-variant of the POSIX pax format compressed in lzip format, keeping the
-alignment between tar members and lzip members. The resulting multimember
-tar.lz archive is fully backward compatible with standard tar tools like GNU
-tar, which treat it like any other tar.lz archive. Tarlz can append files to
-the end of such compressed archives.
+Tarlz creates tar archives using a simplified and safer variant of the POSIX
+pax format compressed in lzip format, keeping the alignment between tar
+members and lzip members. The resulting multimember tar.lz archive is
+backward compatible with standard tar tools like GNU tar, which treat it
+like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
.PP
Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making
@@ -80,7 +80,7 @@ follow symlinks; archive the files they point to
set number of (de)compression threads [2]
.TP
\fB\-o\fR, \fB\-\-output=\fR<file>
-compress to <file>
+compress to <file> ('\-' for stdout)
.TP
\fB\-p\fR, \fB\-\-preserve\-permissions\fR
don't subtract the umask on extraction
@@ -127,6 +127,9 @@ exclude files matching a shell pattern
\fB\-\-ignore\-ids\fR
ignore differences in owner and group IDs
.TP
+\fB\-\-ignore\-metadata\fR
+compare only file size and file content
+.TP
\fB\-\-ignore\-overflow\fR
ignore mtime overflow differences on 32\-bit
.TP
@@ -149,7 +152,7 @@ If no archive is specified, tarlz tries to read it from standard input or
write it to standard output.
.PP
Exit status: 0 for a normal exit, 1 for environmental problems
-(file not found, files differ, invalid command line options, I/O errors,
+(file not found, files differ, invalid command\-line options, I/O errors,
etc), 2 to indicate a corrupt or invalid input file, 3 for an internal
consistency error (e.g., bug) which caused tarlz to panic.
.SH "REPORTING BUGS"
@@ -157,8 +160,8 @@ Report bugs to lzip\-bug@nongnu.org
.br
Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
.SH COPYRIGHT
-Copyright \(co 2022 Antonio Diaz Diaz.
-Using lzlib 1.13
+Copyright \(co 2024 Antonio Diaz Diaz.
+Using lzlib 1.14\-rc1
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
.br
This is free software: you are free to change and redistribute it.
diff --git a/doc/tarlz.info b/doc/tarlz.info
index d71c0a3..25ba882 100644
--- a/doc/tarlz.info
+++ b/doc/tarlz.info
@@ -11,12 +11,12 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
-This manual is for Tarlz (version 0.23, 23 September 2022).
+This manual is for Tarlz (version 0.25, 3 January 2024).
* Menu:
* Introduction:: Purpose and features of tarlz
-* Invoking tarlz:: Command line interface
+* Invoking tarlz:: Command-line interface
* Portable character set:: POSIX portable filename character set
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
@@ -28,7 +28,7 @@ This manual is for Tarlz (version 0.23, 23 September 2022).
* Concept index:: Index of concepts
- Copyright (C) 2013-2022 Antonio Diaz Diaz.
+ Copyright (C) 2013-2024 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@@ -46,8 +46,8 @@ library lzlib.
Tarlz creates tar archives using a simplified and safer variant of the
POSIX pax format compressed in lzip format, keeping the alignment between
tar members and lzip members. The resulting multimember tar.lz archive is
-fully backward compatible with standard tar tools like GNU tar, which treat
-it like any other tar.lz archive. Tarlz can append files to the end of such
+backward compatible with standard tar tools like GNU tar, which treat it
+like any other tar.lz archive. Tarlz can append files to the end of such
compressed archives.
Keeping the alignment between tar members and lzip members has two
@@ -58,9 +58,9 @@ plzip may even double the amount of files lost for each lzip member damaged
because it does not keep the members aligned.
Tarlz can create tar archives with five levels of compression
-granularity: per file (--no-solid), per block (--bsolid, default), per
-directory (--dsolid), appendable solid (--asolid), and solid (--solid). It
-can also create uncompressed tar archives.
+granularity: per file ('--no-solid'), per block ('--bsolid', default), per
+directory ('--dsolid'), appendable solid ('--asolid'), and solid
+('--solid'). It can also create uncompressed tar archives.
Of course, compressing each file (or each directory) individually can't
achieve a compression ratio as high as compressing solidly the whole tar
@@ -87,9 +87,9 @@ archive, but it has the following advantages:
Tarlz protects the extended records with a Cyclic Redundancy Check (CRC)
in a way compatible with standard tar tools. *Note crc32::.
- Tarlz does not understand other tar formats like 'gnu', 'oldgnu', 'star'
-or 'v7'. The command 'tarlz -tf archive.tar.lz > /dev/null' can be used to
-verify that the format of the archive is compatible with tarlz.
+ Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
+'star', or 'v7'. The command 'tarlz -t -f archive.tar.lz > /dev/null' can
+be used to check that the format of the archive is compatible with tarlz.

File: tarlz.info, Node: Invoking tarlz, Next: Portable character set, Prev: Introduction, Up: Top
@@ -140,7 +140,7 @@ to '-1 --solid'.
'-A'
'--concatenate'
Append one or more archives to the end of an archive. If no archive is
- specified with the option '-f', the input archives are concatenated to
+ specified with the option '-f', concatenate the input archives to
standard output. All the archives involved must be regular (seekable)
files, and must be either all compressed or all uncompressed.
Compressed and uncompressed archives can't be mixed. Compressed
@@ -163,7 +163,7 @@ to '-1 --solid'.
'-d'
'--diff'
Compare and report differences between archive and file system. For
- each tar member in the archive, verify that the corresponding file in
+ each tar member in the archive, check that the corresponding file in
the file system exists and is of the same type (regular file,
directory, etc). Report on standard output the differences found in
type, mode (permissions), owner and group IDs, modification time, file
@@ -224,22 +224,25 @@ to '-1 --solid'.
directory without extracting the files under it, use
'tarlz -xf foo --exclude='dir/*' dir'. Tarlz removes files and empty
directories unconditionally before extracting over them. Other than
- that, it will not make any special effort to extract a file over an
+ that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a
- non-empty directory will usually fail.
+ non-empty directory usually fails.
'-z'
'--compress'
Compress existing POSIX tar archives aligning the lzip members to the
- tar members with choice of granularity (--bsolid by default, --dsolid
- works like --asolid). The input archives are kept unchanged. Existing
- compressed archives are not overwritten. A hyphen '-' used as the name
- of an input archive reads from standard input and writes to standard
- output (unless the option '--output' is used). Tarlz can be used as
- compressor for GNU tar using a command like
- 'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Note that tarlz only
- works reliably on archives without global headers, or with global
- headers whose content can be ignored.
+ tar members with choice of granularity ('--bsolid' by default,
+ '--dsolid' works like '--asolid'). Exit with error status 2 if any
+ input archive is an empty file. The input archives are kept unchanged.
+ Existing compressed archives are not overwritten. A hyphen '-' used as
+ the name of an input archive reads from standard input and writes to
+ standard output (unless the option '--output' is used). Tarlz can be
+ used as compressor for GNU tar by using a command like
+ 'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Tarlz can be used as
+ compressor for zupdate (zutils) by using a command like
+ 'zupdate --lz="tarlz -z" foo.tar.gz'. Note that tarlz only works
+ reliably on archives without global headers, or with global headers
+ whose content can be ignored.
The compression is reversible, including any garbage present after the
end-of-archive blocks. Tarlz stops parsing after the first
@@ -277,18 +280,18 @@ to '-1 --solid'.
'-C DIR'
'--directory=DIR'
- Change to directory DIR. When creating or appending, the position of
- each '-C' option in the command line is significant; it will change the
- current working directory for the following FILES until a new '-C'
- option appears in the command line. When extracting or comparing, all
- the '-C' options are executed in sequence before reading the archive.
- Listing ignores any '-C' options specified. DIR is relative to the
- then current working directory, perhaps changed by a previous '-C'
+ Change to directory DIR. When creating, appending, comparing, or
+ extracting, the position of each '-C' option in the command line is
+ significant; it changes the current working directory for the following
+ FILES until a new '-C' option appears in the command line. '--list'
+ and '--delete' ignore any '-C' options specified. DIR is relative to
+ the then current working directory, perhaps changed by a previous '-C'
option.
Note that a process can only have one current working directory (CWD).
- Therefore multi-threading can't be used to create an archive if a '-C'
- option appears after a relative file name in the command line.
+ Therefore multi-threading can't be used to create or decode an archive
+ if a '-C' option appears after a (relative) file name in the command
+ line. (All file names are made relative when decoding).
'-f ARCHIVE'
'--file=ARCHIVE'
@@ -308,8 +311,7 @@ to '-1 --solid'.
support". A value of 0 disables threads entirely. If this option is
not used, tarlz tries to detect the number of processors in the system
and use it as default value. 'tarlz --help' shows the system's default
- value. See the note about multi-threaded archive creation in the
- option '-C' above.
+ value. See the note about multi-threading in the option '-C' above.
Note that the number of usable threads is limited during compression to
ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
@@ -360,7 +362,9 @@ to '-1 --solid'.
With '--create', don't compress the tar archive created. Create an
uncompressed tar archive instead. With '--append', don't compress the
new members appended to the tar archive. Compressed members can't be
- appended to an uncompressed archive, nor vice versa.
+ appended to an uncompressed archive, nor vice versa. '--uncompressed'
+ can be omitted if it can be deduced from the archive name. (An
+ uncompressed archive name lacks a '.lz' or '.tlz' extension).
'--asolid'
When creating or appending to a compressed archive, use appendable
@@ -429,6 +433,12 @@ to '-1 --solid'.
Make '--diff' ignore differences in owner and group IDs. This option is
useful when comparing an '--anonymous' archive.
+'--ignore-metadata'
+ Make '--diff' ignore any differences in metadata (file permissions,
+ owner and group IDs, modification time). Compare only file type, file
+ size, and file content. This option is useful when file permissions
+ have not been fully restored because uid/gid changed on extraction.
+
'--ignore-overflow'
Make '--diff' ignore differences in mtime caused by overflow on 32-bit
systems with a 32-bit time_t.
@@ -438,13 +448,13 @@ to '-1 --solid'.
happens while extracting a file, keep the partial data extracted. Use
this option to recover as much data as possible from each damaged
member. It is recommended to run tarlz in single-threaded mode
- (--threads=0) when using this option.
+ ('--threads=0') when using this option.
'--missing-crc'
Exit with error status 2 if the CRC of the extended records is
missing. When this option is used, tarlz detects any corruption in the
extended records (only limited by CRC collisions). But note that a
- corrupt 'GNU.crc32' keyword, for example 'GNU.crc33', is reported as a
+ corrupt 'GNU.crc32' keyword, for example 'GNU.crc30', is reported as a
missing CRC instead of as a corrupt record. This misleading
'Missing CRC' message is the consequence of a flaw in the POSIX pax
format; i.e., the lack of a mandatory check sequence of the extended
@@ -481,7 +491,7 @@ to '-1 --solid'.
Exit status: 0 for a normal exit, 1 for environmental problems (file not
-found, files differ, invalid command line options, I/O errors, etc), 2 to
+found, files differ, invalid command-line options, I/O errors, etc), 2 to
indicate a corrupt or invalid input file, 3 for an internal consistency
error (e.g., bug) which caused tarlz to panic.
@@ -525,7 +535,7 @@ In the diagram below, a box like this:
bytes (for example 512).
- A tar.lz file consists of a series of lzip members (compressed data
+ A tar.lz file consists of one or more lzip members (compressed data
sets). The members simply appear one after another in the file, with no
additional information before, between, or after them.
@@ -560,7 +570,7 @@ binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
are either compressed in a separate lzip member or compressed along with the
tar members contained in the last lzip member. For a compressed archive to
be recognized by tarlz as appendable, the last lzip member must contain
-between 512 and 32256 zeros alone.
+between 512 and 32256 zeros alone (without any non-zero bytes).
The diagram below shows the correspondence between each tar member
(formed by one or two headers plus optional data) in the tar archive and
@@ -588,6 +598,10 @@ header block are zeroed on archive creation to prevent trouble if the
archive is read by an ustar tool, and are ignored by tarlz on archive
extraction. *Note flawed-compat::.
+ Tarlz limits the size of the pax extended header data so that the whole
+header set (extended header + extended data + ustar header) can be read and
+decoded in a buffer of size INT_MAX.
+
The pax extended header data consists of one or more records, each of
them constructed as follows:
'"%d %s=%s\n", <length>, <keyword>, <value>'
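Reviewer note on the record layout quoted in the hunk above: the `<length>` field counts the whole record, including its own decimal digits and the trailing newline, which is why the manual's sample record `22 GNU.crc32=00000000\n` starts with 22. The following is a minimal Python sketch of that rule, not tarlz's actual code. The function name `pax_record` and the choice of UTF-8 for encoding the text are assumptions made for illustration.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Build one pax extended record: b"<length> <keyword>=<value>\\n".

    Hypothetical helper for illustration; not part of tarlz.
    """
    # Everything after the length digits: space, keyword, '=', value, newline.
    # UTF-8 is an assumption of this sketch.
    body = f" {keyword}={value}\n".encode("utf-8")

    # <length> includes its own digits, so grow the estimate until the
    # digit count plus the body size equals the length itself.
    length = len(body)
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total

    return str(length).encode("ascii") + body


if __name__ == "__main__":
    # Reproduces the sample CRC record shown in the manual.
    print(pax_record("GNU.crc32", "00000000"))
```

The loop is needed because adding a digit to `<length>` can itself push the record over a power of ten (for example, a 98-byte body yields a 101-byte record, not a 100-byte one).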
@@ -610,7 +624,7 @@ space, equal-sign, and newline.
'gid'
The unsigned decimal representation of the group ID of the group that
owns the following file. The gid record is created only for files with
- a group ID greater than 2_097_151 (octal 7777777). *Note
+ a group ID greater than 2_097_151 (octal 7_777_777). *Note
ustar-uid-gid::.
'linkpath'
@@ -618,11 +632,11 @@ space, equal-sign, and newline.
previously archived. This record overrides the field 'linkname' in the
following ustar header block. The following ustar header block
determines the type of link created. If typeflag of the following
- header block is 1, it will be a hard link. If typeflag is 2, it will
- be a symbolic link and the linkpath value will be used as the contents
- of the symbolic link. The linkpath record is created only for links
- with a link name that does not fit in the space provided by the ustar
- header.
+ header block is 1, a hard link is created. If typeflag is 2, a
+ symbolic link is created and the linkpath value is used as the
+ contents of the symbolic link. The linkpath record is created only for
+ links with a link name that does not fit in the space provided by the
+ ustar header.
'mtime'
The signed decimal representation of the modification time of the
@@ -645,19 +659,20 @@ space, equal-sign, and newline.
digits from the ISO/IEC 646:1991 (ASCII) standard. This record
overrides the field 'size' in the following ustar header block. The
size record is created only for files with a size value greater than
- 8_589_934_591 (octal 77777777777); that is, 8 GiB (2^33 bytes) or
+ 8_589_934_591 (octal 77_777_777_777); that is, 8 GiB (2^33 bytes) or
larger.
'uid'
The unsigned decimal representation of the user ID of the file owner
of the following file. The uid record is created only for files with a
- user ID greater than 2_097_151 (octal 7777777). *Note ustar-uid-gid::.
+ user ID greater than 2_097_151 (octal 7_777_777). *Note
+ ustar-uid-gid::.
'GNU.crc32'
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
representing the CRC <value> itself. The <value> is represented as 8
hexadecimal digits in big endian order, '22 GNU.crc32=00000000\n'. The
- keyword of the CRC record is protected by the CRC to guarante that
+ keyword of the CRC record is protected by the CRC to guarantee that
corruption is always detected when using '--missing-crc' (except in
case of CRC collision). A CRC was chosen because a checksum is too
weak for a potentially large list of variable sized records. A
@@ -729,7 +744,7 @@ S_IROTH 00004 S_IWOTH 00002 S_IXOTH 00001
The fields 'uid' and 'gid' are the user and group IDs of the owner and
group of the file, respectively. If the file uid or gid are greater than
-2_097_151 (octal 7777777), an extended record is used to store the uid or
+2_097_151 (octal 7_777_777), an extended record is used to store the uid or
gid.
The field 'size' contains the octal representation of the size of the
@@ -739,13 +754,13 @@ records following the header is (size / 512) rounded to the next integer.
For all other values of typeflag, tarlz either sets the size field to 0 or
ignores it, and does not store or expect any logical records following the
header. If the file size is larger than 8_589_934_591 bytes
-(octal 77777777777), an extended record is used to store the file size.
+(octal 77_777_777_777), an extended record is used to store the file size.
The field 'mtime' contains the octal representation of the modification
time of the file at the time it was archived, obtained from the function
'stat'. If the modification time is negative or larger than 8_589_934_591
-(octal 77777777777) seconds since the epoch, an extended record is used to
-store the modification time. The ustar range of mtime goes from
+(octal 77_777_777_777) seconds since the epoch, an extended record is used
+to store the modification time. The ustar range of mtime goes from
'1970-01-01 00:00:00 UTC' to '2242-03-16 12:56:31 UTC'.
The field 'chksum' contains the octal representation of the value of the
@@ -827,7 +842,7 @@ more probable.
Headers and metadata must be protected separately from data because the
integrity checking of lzip may not be able to detect the corruption before
-the metadata has been used, for example, to create a new file in the wrong
+the metadata have been used, for example, to create a new file in the wrong
place.
Because of the above, tarlz protects the extended records with a Cyclic
@@ -843,11 +858,11 @@ to the POSIX-2:1993 standard, POSIX.1-2008 recommends selecting extended
header field values that allow such tar to create a regular file containing
the extended header records as data. This approach is broken because if the
extended header is needed because of a long file name, the fields 'name'
-and 'prefix' will be unable to contain the full file name. (Some tar
+and 'prefix' are unable to contain the full file name. (Some tar
implementations store the truncated name in the field 'name' alone,
truncating the name to only 100 bytes instead of 256). Therefore the files
corresponding to both the extended header and the overridden ustar header
-will be extracted using truncated file names, perhaps overwriting existing
+are extracted using truncated file names, perhaps overwriting existing
files or directories. It may be a security risk to extract a file with a
truncated file name.
@@ -915,7 +930,7 @@ There is no portable way to tell what charset a text string is coded into.
Therefore, tarlz stores all fields representing text strings unmodified,
without conversion to UTF-8 nor any other transformation. This prevents
accidental double UTF-8 conversions. If the need arises this behavior will
-be adjusted with a command line option in the future.
+be adjusted with a command-line option in the future.

File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top
@@ -1098,11 +1113,11 @@ multimember compressed archive.
For this to work as expected (and roughly multiply the compression speed
by the number of available processors), the uncompressed archive must be at
least as large as the number of worker threads times the block size (*note
---data-size::). Else some processors will not get any data to compress, and
-compression will be proportionally slower. The maximum speed increase
-achievable on a given archive is limited by the ratio
-(uncompressed_size / data_size). For example, a tarball the size of gcc or
-linux will scale up to 10 or 14 processors at level -9.
+--data-size::). Else some processors do not get any data to compress, and
+compression is proportionally slower. The maximum speed increase achievable
+on a given archive is limited by the ratio (uncompressed_size / data_size).
+For example, a tarball the size of gcc or linux scales up to 10 or 14
+processors at level -9.
The following table shows the minimum uncompressed archive size needed
for full use of N processors at a given compression level, using the default
@@ -1244,25 +1259,25 @@ Concept index

Tag Table:
Node: Top216
-Node: Introduction1210
-Node: Invoking tarlz4029
-Ref: --data-size12880
-Ref: --bsolid17192
-Node: Portable character set22788
-Node: File format23431
-Ref: key_crc3230188
-Ref: ustar-uid-gid33452
-Ref: ustar-mtime34254
-Node: Amendments to pax format36254
-Ref: crc3236963
-Ref: flawed-compat38274
-Node: Program design42364
-Node: Multi-threaded decoding46289
-Ref: mt-extraction49570
-Node: Minimum archive sizes50876
-Node: Examples53014
-Node: Problems55381
-Node: Concept index55936
+Node: Introduction1207
+Node: Invoking tarlz4032
+Ref: --data-size13076
+Ref: --bsolid17512
+Node: Portable character set23425
+Node: File format24068
+Ref: key_crc3231050
+Ref: ustar-uid-gid34315
+Ref: ustar-mtime35122
+Node: Amendments to pax format37125
+Ref: crc3237834
+Ref: flawed-compat39146
+Node: Program design43228
+Node: Multi-threaded decoding47153
+Ref: mt-extraction50434
+Node: Minimum archive sizes51740
+Node: Examples53867
+Node: Problems56234
+Node: Concept index56789

End Tag Table
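Reviewer note on the ustar limits that the tarlz.info hunks above reformat (octal 7_777_777 and 77_777_777_777): these are the largest values that fit in 7 and 11 octal digits, and the manual says an extended record is created only when a uid, gid, size, or mtime falls outside them. The sketch below checks the arithmetic and the mtime upper bound quoted in the manual. The names `needs_extended_record` and `ustar_mtime_upper_bound` are hypothetical and only illustrate the rule as the manual states it; they are not tarlz functions.

```python
from datetime import datetime, timedelta, timezone

UID_GID_MAX = 0o7777777         # 7 octal digits  = 2_097_151
SIZE_MTIME_MAX = 0o77777777777  # 11 octal digits = 8_589_934_591 (2^33 - 1)


def needs_extended_record(field: str, value: int) -> bool:
    """Return True if `value` does not fit in the ustar header field.

    Follows the thresholds stated in the manual; hypothetical helper.
    """
    if field in ("uid", "gid"):
        return value > UID_GID_MAX
    if field == "size":
        return value > SIZE_MTIME_MAX
    if field == "mtime":
        # Negative times also need an extended record.
        return value < 0 or value > SIZE_MTIME_MAX
    raise ValueError(f"unknown field: {field}")


def ustar_mtime_upper_bound() -> datetime:
    """Largest mtime representable in the ustar header, as a UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=SIZE_MTIME_MAX)


if __name__ == "__main__":
    print(UID_GID_MAX, SIZE_MTIME_MAX)
    print(ustar_mtime_upper_bound().isoformat())
    print(needs_extended_record("size", 2**33))  # 8 GiB file
```

Adding a `timedelta` to the epoch is used instead of `datetime.fromtimestamp` so the result does not depend on the platform's time_t range.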
diff --git a/doc/tarlz.texi b/doc/tarlz.texi
index 5bdd2af..f37164f 100644
--- a/doc/tarlz.texi
+++ b/doc/tarlz.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header
-@set UPDATED 23 September 2022
-@set VERSION 0.23
+@set UPDATED 3 January 2024
+@set VERSION 0.25
@dircategory Archiving
@direntry
@@ -37,7 +37,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
@menu
* Introduction:: Purpose and features of tarlz
-* Invoking tarlz:: Command line interface
+* Invoking tarlz:: Command-line interface
* Portable character set:: POSIX portable filename character set
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
@@ -50,7 +50,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
-Copyright @copyright{} 2013-2022 Antonio Diaz Diaz.
+Copyright @copyright{} 2013-2024 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@@ -68,7 +68,7 @@ compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
Tarlz creates tar archives using a simplified and safer variant of the POSIX
pax format compressed in lzip format, keeping the alignment between tar
-members and lzip members. The resulting multimember tar.lz archive is fully
+members and lzip members. The resulting multimember tar.lz archive is
backward compatible with standard tar tools like GNU tar, which treat it
like any other tar.lz archive. Tarlz can append files to the end of such
compressed archives.
@@ -81,9 +81,9 @@ plzip may even double the amount of files lost for each lzip member damaged
because it does not keep the members aligned.
Tarlz can create tar archives with five levels of compression granularity:
-per file (---no-solid), per block (---bsolid, default), per directory
-(---dsolid), appendable solid (---asolid), and solid (---solid). It can also
-create uncompressed tar archives.
+per file (@option{--no-solid}), per block (@option{--bsolid}, default), per
+directory (@option{--dsolid}), appendable solid (@option{--asolid}), and
+solid (@option{--solid}). It can also create uncompressed tar archives.
@noindent
Of course, compressing each file (or each directory) individually can't
@@ -104,7 +104,7 @@ archive. Just like an uncompressed tar archive.
It is a safe POSIX-style backup format. In case of corruption, tarlz
can extract all the undamaged members from the tar.lz archive,
skipping over the damaged members, just like the standard
-(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be used
+(uncompressed) tar. Moreover, the option @option{--keep-damaged} can be used
to recover as much data as possible from each damaged member, and
lziprecover can be used to recover some of the damaged members.
@@ -118,8 +118,8 @@ Tarlz protects the extended records with a Cyclic Redundancy Check (CRC) in
a way compatible with standard tar tools. @xref{crc32}.
Tarlz does not understand other tar formats like @samp{gnu}, @samp{oldgnu},
-@samp{star} or @samp{v7}. The command
-@w{@samp{tarlz -tf archive.tar.lz > /dev/null}} can be used to verify that
+@samp{star}, or @samp{v7}. The command
+@w{@samp{tarlz -t -f archive.tar.lz > /dev/null}} can be used to check that
the format of the archive is compatible with tarlz.
@@ -137,9 +137,9 @@ tarlz @var{operation} [@var{options}] [@var{files}]
@end example
@noindent
-All operations except @samp{--concatenate} and @samp{--compress} operate on
-whole trees if any @var{file} is a directory. All operations except
-@samp{--compress} overwrite output files without warning. If no archive is
+All operations except @option{--concatenate} and @option{--compress} operate
+on whole trees if any @var{file} is a directory. All operations except
+@option{--compress} overwrite output files without warning. If no archive is
specified, tarlz tries to read it from standard input or write it to
standard output. Tarlz refuses to read archive data from a terminal or write
archive data to a terminal. Tarlz detects when the archive being created or
@@ -147,7 +147,7 @@ enlarged is among the files to be archived, appended, or concatenated, and
skips it.
Tarlz does not use absolute file names nor file names above the current
-working directory (perhaps changed by option @samp{-C}). On archive creation
+working directory (perhaps changed by option @option{-C}). On archive creation
or appending tarlz archives the files specified, but removes from member
names any leading and trailing slashes and any file name prefixes containing
a @samp{..} component. On extraction, leading and trailing slashes are also
@@ -161,9 +161,9 @@ member names in the archive or given in the command line, so that
@w{@samp{tarlz -xf foo ./bar baz}} extracts members @samp{bar} and
@samp{./baz} from archive @samp{foo}.
-If several compression levels or @samp{--*solid} options are given, the last
-setting is used. For example @w{@samp{-9 --solid --uncompressed -1}} is
-equivalent to @w{@samp{-1 --solid}}.
+If several compression levels or @option{--*solid} options are given, the last
+setting is used. For example @w{@option{-9 --solid --uncompressed -1}} is
+equivalent to @w{@option{-1 --solid}}.
tarlz supports the following operations:
@@ -179,7 +179,7 @@ This version number should be included in all bug reports.
@item -A
@itemx --concatenate
Append one or more archives to the end of an archive. If no archive is
-specified with the option @samp{-f}, the input archives are concatenated to
+specified with the option @option{-f}, concatenate the input archives to
standard output. All the archives involved must be regular (seekable) files,
and must be either all compressed or all uncompressed. Compressed and
uncompressed archives can't be mixed. Compressed archives must be
@@ -202,23 +202,23 @@ Create a new archive from @var{files}.
@item -d
@itemx --diff
Compare and report differences between archive and file system. For each tar
-member in the archive, verify that the corresponding file in the file system
+member in the archive, check that the corresponding file in the file system
exists and is of the same type (regular file, directory, etc). Report on
standard output the differences found in type, mode (permissions), owner and
group IDs, modification time, file size, file contents (of regular files),
target (of symlinks) and device number (of block/character special files).
-As tarlz removes leading slashes from member names, the option @samp{-C} may
-be used in combination with @samp{--diff} when absolute file names were used
+As tarlz removes leading slashes from member names, the option @option{-C} may
+be used in combination with @option{--diff} when absolute file names were used
on archive creation: @w{@samp{tarlz -C / -d}}. Alternatively, tarlz may be
run from the root directory to perform the comparison.
@item --delete
Delete files and directories from an archive in place. It currently can
delete only from uncompressed archives and from archives with files
-compressed individually (@samp{--no-solid} archives). Note that files of
-about @samp{--data-size} or larger are compressed individually even if
-@samp{--bsolid} is used, and can therefore be deleted. Tarlz takes care to
+compressed individually (@option{--no-solid} archives). Note that files of
+about @option{--data-size} or larger are compressed individually even if
+@option{--bsolid} is used, and can therefore be deleted. Tarlz takes care to
not delete a tar member unless it is possible to do so. For example it won't
try to delete a tar member that is not compressed individually. Even in the
case of finding a corrupt member after having deleted some member(s), tarlz
@@ -261,32 +261,36 @@ Extract files from an archive. If @var{files} are given, extract only the
directory without extracting the files under it, use
@w{@samp{tarlz -xf foo --exclude='dir/*' dir}}. Tarlz removes files and
empty directories unconditionally before extracting over them. Other than
-that, it will not make any special effort to extract a file over an
+that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a non-empty
-directory will usually fail.
+directory usually fails.
@item -z
@itemx --compress
Compress existing POSIX tar archives aligning the lzip members to the tar
-members with choice of granularity (---bsolid by default, ---dsolid works
-like ---asolid). The input archives are kept unchanged. Existing compressed
-archives are not overwritten. A hyphen @samp{-} used as the name of an input
-archive reads from standard input and writes to standard output (unless the
-option @samp{--output} is used). Tarlz can be used as compressor for GNU tar
-using a command like @w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}.
-Note that tarlz only works reliably on archives without global headers, or
-with global headers whose content can be ignored.
+members with choice of granularity (@option{--bsolid} by default,
+@option{--dsolid} works like @option{--asolid}). Exit with error status 2 if
+any input archive is an empty file. The input archives are kept unchanged.
+Existing compressed archives are not overwritten. A hyphen @samp{-} used as
+the name of an input archive reads from standard input and writes to
+standard output (unless the option @option{--output} is used). Tarlz can be
+used as compressor for GNU tar by using a command like
+@w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be used as
+compressor for zupdate (zutils) by using a command like
+@w{@samp{zupdate --lz="tarlz -z" foo.tar.gz}}. Note that tarlz only works
+reliably on archives without global headers, or with global headers whose
+content can be ignored.
The compression is reversible, including any garbage present after the
end-of-archive blocks. Tarlz stops parsing after the first end-of-archive
block is found, and then compresses the rest of the archive. Unless solid
compression is requested, the end-of-archive blocks are compressed in a lzip
member separated from the preceding members and from any non-zero garbage
-following the end-of-archive blocks. @samp{--compress} implies plzip
+following the end-of-archive blocks. @option{--compress} implies plzip
argument style, not tar style. Each input archive is compressed to a file
-with the extension @samp{.lz} added unless the option @samp{--output} is
-used. When @samp{--output} is used, only one input archive can be specified.
-@samp{-f} can't be used with @samp{--compress}.
+with the extension @samp{.lz} added unless the option @option{--output} is
+used. When @option{--output} is used, only one input archive can be specified.
+@option{-f} can't be used with @option{--compress}.
@item --check-lib
Compare the
@@ -314,25 +318,25 @@ tarlz supports the following
@anchor{--data-size}
@item -B @var{bytes}
@itemx --data-size=@var{bytes}
-Set target size of input data blocks for the option @samp{--bsolid}.
+Set target size of input data blocks for the option @option{--bsolid}.
@xref{--bsolid}. Valid values range from @w{8 KiB} to @w{1 GiB}. Default
-value is two times the dictionary size, except for option @samp{-0} where it
+value is two times the dictionary size, except for option @option{-0} where it
defaults to @w{1 MiB}. @xref{Minimum archive sizes}.
@item -C @var{dir}
@itemx --directory=@var{dir}
-Change to directory @var{dir}. When creating or appending, the position of
-each @samp{-C} option in the command line is significant; it will change the
-current working directory for the following @var{files} until a new
-@samp{-C} option appears in the command line. When extracting or comparing,
-all the @samp{-C} options are executed in sequence before reading the
-archive. Listing ignores any @samp{-C} options specified. @var{dir} is
-relative to the then current working directory, perhaps changed by a
-previous @samp{-C} option.
+Change to directory @var{dir}. When creating, appending, comparing, or
+extracting, the position of each @option{-C} option in the command line is
+significant; it changes the current working directory for the following
+@var{files} until a new @option{-C} option appears in the command line.
+@option{--list} and @option{--delete} ignore any @option{-C} options
+specified. @var{dir} is relative to the then current working directory,
+perhaps changed by a previous @option{-C} option.
Note that a process can only have one current working directory (CWD).
-Therefore multi-threading can't be used to create an archive if a @samp{-C}
-option appears after a relative file name in the command line.
+Therefore multi-threading can't be used to create or decode an archive if a
+@option{-C} option appears after a (relative) file name in the command line.
+(All file names are made relative when decoding).
@item -f @var{archive}
@itemx --file=@var{archive}
@@ -351,7 +355,7 @@ Valid values range from 0 to "as many as your system can support". A value
of 0 disables threads entirely. If this option is not used, tarlz tries to
detect the number of processors in the system and use it as default value.
@w{@samp{tarlz --help}} shows the system's default value. See the note about
-multi-threaded archive creation in the option @samp{-C} above.
+multi-threading in the option @option{-C} above.
Note that the number of usable threads is limited during compression to
@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
@@ -360,9 +364,9 @@ archive, which you can find by running @w{@samp{lzip -lv archive.tar.lz}}.
@item -o @var{file}
@itemx --output=@var{file}
-Write the compressed output to @var{file}. @w{@samp{-o -}} writes the
-compressed output to standard output. Currently @samp{--output} only works
-with @samp{--compress}.
+Write the compressed output to @var{file}. @w{@option{-o -}} writes the
+compressed output to standard output. Currently @option{--output} only works
+with @option{--compress}.
@item -p
@itemx --preserve-permissions
@@ -381,8 +385,8 @@ Verbosely list files processed. Further -v's (up to 4) increase the
verbosity level.
@item -0 .. -9
-Set the compression level for @samp{--create}, @samp{--append}, and
-@samp{--compress}. The default compression level is @samp{-6}. Like lzip,
+Set the compression level for @option{--create}, @option{--append}, and
+@option{--compress}. The default compression level is @option{-6}. Like lzip,
tarlz also minimizes the dictionary size of the lzip members it creates,
reducing the amount of memory required for decompression.
@@ -401,10 +405,12 @@ reducing the amount of memory required for decompression.
@end multitable
@item --uncompressed
-With @samp{--create}, don't compress the tar archive created. Create an
-uncompressed tar archive instead. With @samp{--append}, don't compress the
+With @option{--create}, don't compress the tar archive created. Create an
+uncompressed tar archive instead. With @option{--append}, don't compress the
new members appended to the tar archive. Compressed members can't be
-appended to an uncompressed archive, nor vice versa.
+appended to an uncompressed archive, nor vice versa. @option{--uncompressed}
+can be omitted if it can be deduced from the archive name. (An uncompressed
+archive name lacks a @samp{.lz} or @samp{.tlz} extension).
@item --asolid
When creating or appending to a compressed archive, use appendable solid
@@ -447,7 +453,7 @@ appendable. No more files can be later appended to the archive. Solid
archives can't be created nor decoded in parallel.
@item --anonymous
-Equivalent to @w{@samp{--owner=root --group=root}}.
+Equivalent to @w{@option{--owner=root --group=root}}.
@item --owner=@var{owner}
When creating or appending, use @var{owner} for files added to the archive.
@@ -465,27 +471,34 @@ to match if any component of the file name matches. For example, @samp{*.o}
matches @samp{foo.o}, @samp{foo.o/bar} and @samp{foo/bar.o}. If
@var{pattern} contains a @samp{/}, it matches a corresponding @samp{/} in
the file name. For example, @samp{foo/*.o} matches @samp{foo/bar.o}.
-Multiple @samp{--exclude} options can be specified.
+Multiple @option{--exclude} options can be specified.
@item --ignore-ids
-Make @samp{--diff} ignore differences in owner and group IDs. This option is
-useful when comparing an @samp{--anonymous} archive.
+Make @option{--diff} ignore differences in owner and group IDs. This option is
+useful when comparing an @option{--anonymous} archive.
+
+@item --ignore-metadata
+Make @option{--diff} ignore any differences in metadata (file permissions,
+owner and group IDs, modification time). Compare only file type, file size,
+and file content. This option is useful when file permissions have not been
+fully restored because uid/gid changed on extraction.
@item --ignore-overflow
-Make @samp{--diff} ignore differences in mtime caused by overflow on 32-bit
+Make @option{--diff} ignore differences in mtime caused by overflow on 32-bit
systems with a 32-bit time_t.
@item --keep-damaged
Don't delete partially extracted files. If a decompression error happens
while extracting a file, keep the partial data extracted. Use this option to
recover as much data as possible from each damaged member. It is recommended
-to run tarlz in single-threaded mode (---threads=0) when using this option.
+to run tarlz in single-threaded mode (@option{--threads=0}) when using this
+option.
@item --missing-crc
Exit with error status 2 if the CRC of the extended records is missing. When
this option is used, tarlz detects any corruption in the extended records
(only limited by CRC collisions). But note that a corrupt @samp{GNU.crc32}
-keyword, for example @samp{GNU.crc33}, is reported as a missing CRC instead
+keyword, for example @samp{GNU.crc30}, is reported as a missing CRC instead
of as a corrupt record. This misleading @w{@samp{Missing CRC}} message is
the consequence of a flaw in the POSIX pax format; i.e., the lack of a
mandatory check sequence of the extended records. @xref{crc32}.
@@ -527,7 +540,7 @@ keyword appearing in the same block of extended records.
@end table
Exit status: 0 for a normal exit, 1 for environmental problems
-(file not found, files differ, invalid command line options, I/O errors,
+(file not found, files differ, invalid command-line options, I/O errors,
etc), 2 to indicate a corrupt or invalid input file, 3 for an internal
consistency error (e.g., bug) which caused tarlz to panic.
@@ -575,7 +588,7 @@ represents a variable number of bytes or a fixed but large number of
bytes (for example 512).
@sp 1
-A tar.lz file consists of a series of lzip members (compressed data sets).
+A tar.lz file consists of one or more lzip members (compressed data sets).
The members simply appear one after another in the file, with no additional
information before, between, or after them.
@@ -606,7 +619,7 @@ Zero or more blocks that contain the contents of the file.
@end itemize
Each tar member must be contiguously stored in a lzip member for the
-parallel decoding operations like @samp{--list} to work. If any tar member
+parallel decoding operations like @option{--list} to work. If any tar member
is split over two or more lzip members, the archive must be decoded
sequentially. @xref{Multi-threaded decoding}.
@@ -615,7 +628,7 @@ binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
are either compressed in a separate lzip member or compressed along with the
tar members contained in the last lzip member. For a compressed archive to
be recognized by tarlz as appendable, the last lzip member must contain
-between 512 and 32256 zeros alone.
+between 512 and 32256 zeros alone (without any non-zero bytes).
The diagram below shows the correspondence between each tar member (formed
by one or two headers plus optional data) in the tar archive and each
@@ -639,7 +652,7 @@ tar.lz
@end verbatim
@ignore
-When @samp{--permissive} is used, the following violations of the
+When @option{--permissive} is used, the following violations of the
archive format are allowed:@*
If several extended headers precede an ustar header, only the last
extended header takes effect. The other extended headers are ignored.
@@ -660,6 +673,10 @@ fields in the pax header block are zeroed on archive creation to prevent
trouble if the archive is read by an ustar tool, and are ignored by tarlz on
archive extraction. @xref{flawed-compat}.
+Tarlz limits the size of the pax extended header data so that the whole
+header set (extended header + extended data + ustar header) can be read and
+decoded in a buffer of size INT_MAX.
+
The pax extended header data consists of one or more records, each of
them constructed as follows:@*
@w{@samp{"%d %s=%s\n", <length>, <keyword>, <value>}}
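Note on the record format above: the `<length>` field counts the whole record, including its own decimal digits, the space, and the trailing newline, so it has to be found by iteration rather than by a single `len()` call. The following is a minimal sketch in Python, not tarlz's own code; the function name `pax_record` and the choice of UTF-8 for encoding the arguments are assumptions made for illustration.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Build one pax extended record: b"<length> <keyword>=<value>\\n".

    <length> is the size in bytes of the complete record, including the
    digits of <length> itself.
    """
    # Everything after the length digits. UTF-8 is an assumption here.
    body = b" " + keyword.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"
    length = len(body) + 1                 # first guess: a 1-digit length
    # Adding digits can push the total past a power of ten, so repeat
    # until the number of digits and the total agree.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


record = pax_record("GNU.crc32", "00000000")
print(record)
```

With the keyword and value of the CRC record quoted elsewhere in the manual, this yields a 22-byte record, which matches the `22 GNU.crc32=00000000\n` example.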
@@ -683,17 +700,17 @@ time outside of the ustar range. @xref{ustar-mtime}.
@item gid
The unsigned decimal representation of the group ID of the group that owns
the following file. The gid record is created only for files with a group ID
-greater than 2_097_151 (octal 7777777). @xref{ustar-uid-gid}.
+greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
@item linkpath
The file name of a link being created to another file, of any type,
previously archived. This record overrides the field @samp{linkname} in the
following ustar header block. The following ustar header block determines
-the type of link created. If typeflag of the following header block is 1, it
-will be a hard link. If typeflag is 2, it will be a symbolic link and the
-linkpath value will be used as the contents of the symbolic link. The
-linkpath record is created only for links with a link name that does not fit
-in the space provided by the ustar header.
+the type of link created. If typeflag of the following header block is 1, a
+hard link is created. If typeflag is 2, a symbolic link is created and the
+linkpath value is used as the contents of the symbolic link. The linkpath
+record is created only for links with a link name that does not fit in the
+space provided by the ustar header.
@item mtime
The signed decimal representation of the modification time of the following
@@ -715,12 +732,12 @@ The size of the file in bytes, expressed as a decimal number using digits
from the ISO/IEC 646:1991 (ASCII) standard. This record overrides the field
@samp{size} in the following ustar header block. The size record is created
only for files with a size value greater than 8_589_934_591
-@w{(octal 77777777777)}; that is, @w{8 GiB} (2^33 bytes) or larger.
+@w{(octal 77_777_777_777)}; that is, @w{8 GiB} (2^33 bytes) or larger.
@item uid
The unsigned decimal representation of the user ID of the file owner of the
following file. The uid record is created only for files with a user ID
-greater than 2_097_151 (octal 7777777). @xref{ustar-uid-gid}.
+greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
@anchor{key_crc32}
@item GNU.crc32
@@ -728,8 +745,8 @@ CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
representing the CRC <value> itself. The <value> is represented as 8
hexadecimal digits in big endian order,
@w{@samp{22 GNU.crc32=00000000\n}}. The keyword of the CRC record is
-protected by the CRC to guarante that corruption is always detected when
-using @samp{--missing-crc} (except in case of CRC collision). A CRC was
+protected by the CRC to guarantee that corruption is always detected when
+using @option{--missing-crc} (except in case of CRC collision). A CRC was
chosen because a checksum is too weak for a potentially large list of
variable sized records. A checksum can't detect simple errors like the
swapping of two bytes.
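Note on the checksum argument above: a plain byte sum is unchanged when two bytes are swapped, while CRC32-C is not. The following Python sketch illustrates this with a bitwise CRC32-C using the reflected Castagnoli polynomial 0x82F63B78. It is not tarlz's implementation, and it does not reproduce how tarlz excludes the 8 value bytes when computing the CRC of the extended records; the sample records are made up.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C (Castagnoli), reflected form, init and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial if the bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


good = b"13 uid=1000\n"      # hypothetical record
swapped = b"13 udi=1000\n"   # same bytes, two of them swapped

print(sum(good) == sum(swapped))          # additive checksum misses the swap
print(crc32c(good) == crc32c(swapped))    # the CRC detects it
```

A table-driven version would be faster, but the bitwise loop keeps the sketch short and easy to check against the standard test vector `123456789`.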
@@ -804,7 +821,8 @@ table shows the symbolic name of each bit and its octal value:
@anchor{ustar-uid-gid}
The fields @samp{uid} and @samp{gid} are the user and group IDs of the owner
and group of the file, respectively. If the file uid or gid are greater than
-2_097_151 (octal 7777777), an extended record is used to store the uid or gid.
+2_097_151 @w{(octal 7_777_777)}, an extended record is used to store the uid
+or gid.
The field @samp{size} contains the octal representation of the size of the
file in bytes. If the field @samp{typeflag} specifies a file of type '0'
@@ -813,13 +831,13 @@ records following the header is @w{(size / 512)} rounded to the next
integer. For all other values of typeflag, tarlz either sets the size field
to 0 or ignores it, and does not store or expect any logical records
following the header. If the file size is larger than 8_589_934_591 bytes
-@w{(octal 77777777777)}, an extended record is used to store the file size.
+@w{(octal 77_777_777_777)}, an extended record is used to store the file size.
@anchor{ustar-mtime}
The field @samp{mtime} contains the octal representation of the modification
time of the file at the time it was archived, obtained from the function
@samp{stat}. If the modification time is negative or larger than
-8_589_934_591 @w{(octal 77777777777)} seconds since the epoch, an extended
+8_589_934_591 @w{(octal 77_777_777_777)} seconds since the epoch, an extended
record is used to store the modification time. The ustar range of mtime goes
from @w{@samp{1970-01-01 00:00:00 UTC}} to @w{@samp{2242-03-16 12:56:31 UTC}}.
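Note on the ustar limits above: the decimal values, their octal forms, and the end of the ustar mtime range can be cross-checked with a few lines of Python. This is only a sketch of the rules as stated in the text; the names `UID_GID_MAX`, `SIZE_MTIME_MAX`, and `needs_extended_record` are hypothetical and do not come from tarlz.

```python
from datetime import datetime, timedelta, timezone

UID_GID_MAX = int("7777777", 8)          # largest uid/gid that fits in ustar
SIZE_MTIME_MAX = int("77777777777", 8)   # largest size/mtime that fits in ustar
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def needs_extended_record(field: str, value: int) -> bool:
    """Return True if 'value' is outside the ustar range for 'field'."""
    if field in ("uid", "gid"):
        return value > UID_GID_MAX
    if field == "size":
        return value > SIZE_MTIME_MAX
    if field == "mtime":
        # Negative times (before the epoch) do not fit either.
        return value < 0 or value > SIZE_MTIME_MAX
    raise ValueError(f"unknown field: {field}")


# Last modification time representable in the ustar mtime field.
last_ustar_mtime = EPOCH + timedelta(seconds=SIZE_MTIME_MAX)
print(UID_GID_MAX, SIZE_MTIME_MAX, last_ustar_mtime.isoformat())
```

The date is computed by adding a `timedelta` to the epoch instead of calling `datetime.fromtimestamp`, which avoids platform limits on large timestamps.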
@@ -878,7 +896,7 @@ character.
Tarlz creates safe archives that allow the reliable detection of invalid or
corrupt metadata during decoding even when the integrity checking of lzip
can't be used because the lzip members are only decompressed partially, as
-it happens in parallel @samp{--diff}, @samp{--list}, and @samp{--extract}.
+it happens in parallel @option{--diff}, @option{--list}, and @option{--extract}.
In order to achieve this goal and avoid some other flaws in the pax format,
tarlz makes some changes to the variant of the pax format that it uses. This
chapter describes these changes and the concrete reasons to implement them.
@@ -903,7 +921,7 @@ large, making undetected corruption and archiver misbehavior more probable.
Headers and metadata must be protected separately from data because the
integrity checking of lzip may not be able to detect the corruption before
-the metadata has been used, for example, to create a new file in the wrong
+the metadata have been used, for example, to create a new file in the wrong
place.
Because of the above, tarlz protects the extended records with a Cyclic
@@ -919,11 +937,11 @@ to the POSIX-2:1993 standard, POSIX.1-2008 recommends selecting extended
header field values that allow such tar to create a regular file containing
the extended header records as data. This approach is broken because if the
extended header is needed because of a long file name, the fields
-@samp{name} and @samp{prefix} will be unable to contain the full file name.
+@samp{name} and @samp{prefix} are unable to contain the full file name.
(Some tar implementations store the truncated name in the field @samp{name}
alone, truncating the name to only 100 bytes instead of 256). Therefore the
files corresponding to both the extended header and the overridden ustar
-header will be extracted using truncated file names, perhaps overwriting
+header are extracted using truncated file names, perhaps overwriting
existing files or directories. It may be a security risk to extract a file
with a truncated file name.
@@ -988,7 +1006,7 @@ There is no portable way to tell what charset a text string is coded into.
Therefore, tarlz stores all fields representing text strings unmodified,
without conversion to UTF-8 nor any other transformation. This prevents
accidental double UTF-8 conversions. If the need arises this behavior will
-be adjusted with a command line option in the future.
+be adjusted with a command-line option in the future.
@node Program design
@@ -1117,9 +1135,9 @@ tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded
mode and continues decoding the archive.
-If the files in the archive are large, multi-threaded @samp{--list} on a
+If the files in the archive are large, multi-threaded @option{--list} on a
regular (seekable) tar.lz archive can be hundreds of times faster than
-sequential @samp{--list} because, in addition to using several processors,
+sequential @option{--list} because, in addition to using several processors,
it only needs to decompress part of each lzip member. See the following
example listing the Silesia corpus on a dual core machine:
@@ -1130,7 +1148,7 @@ time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)
@end example
-On the other hand, multi-threaded @samp{--list} won't detect corruption in
+On the other hand, multi-threaded @option{--list} won't detect corruption in
the tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. This is another reason why the tar
headers must provide their own integrity checking.
@@ -1176,11 +1194,11 @@ multimember compressed archive.
For this to work as expected (and roughly multiply the compression speed by
the number of available processors), the uncompressed archive must be at
least as large as the number of worker threads times the block size
-(@pxref{--data-size}). Else some processors will not get any data to
-compress, and compression will be proportionally slower. The maximum speed
-increase achievable on a given archive is limited by the ratio
+(@pxref{--data-size}). Else some processors do not get any data to compress,
+and compression is proportionally slower. The maximum speed increase
+achievable on a given archive is limited by the ratio
@w{(uncompressed_size / data_size)}. For example, a tarball the size of gcc
-or linux will scale up to 10 or 14 processors at level -9.
+or linux scales up to 10 or 14 processors at level -9.
The following table shows the minimum uncompressed archive size needed for
full use of N processors at a given compression level, using the default