author     Daniel Baumann <daniel.baumann@progress-linux.org> 2019-01-23 17:42:07 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org> 2019-01-23 17:42:07 +0000
commit     2f15376ba464cf08e710c3353bdacc4f503e11b4
tree       646663261d4ebf123dd0bb167d626b6c448dc3b8 /doc/tarlz.info
parent     Releasing debian version 0.8-2.
Merging upstream version 0.9.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/tarlz.info')
-rw-r--r-- doc/tarlz.info | 153
1 files changed, 109 insertions, 44 deletions
diff --git a/doc/tarlz.info b/doc/tarlz.info
index d6d17d0..7f90766 100644
--- a/doc/tarlz.info
+++ b/doc/tarlz.info
@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
-This manual is for Tarlz (version 0.8, 16 December 2018).
+This manual is for Tarlz (version 0.9, 22 January 2019).
* Menu:
@@ -19,12 +19,13 @@ This manual is for Tarlz (version 0.8, 16 December 2018).
* Invoking tarlz:: Command line interface
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
+* Multi-threaded tar:: Limitations of parallel tar decoding
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
- Copyright (C) 2013-2018 Antonio Diaz Diaz.
+ Copyright (C) 2013-2019 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to
copy, distribute and modify it.
@@ -35,12 +36,14 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
1 Introduction
**************
-Tarlz is a small and simple implementation of the tar archiver. By
-default tarlz creates, lists and extracts archives in a simplified
-posix pax format compressed with lzip on a per file basis. Each tar
-member is compressed in its own lzip member, as well as the end-of-file
-blocks. This method is fully backward compatible with standard tar tools
-like GNU tar, which treat the resulting multimember tar.lz archive like
+Tarlz is a combined implementation of the tar archiver and the lzip
+compressor. By default tarlz creates, lists and extracts archives in a
+simplified posix pax format compressed with lzip on a per file basis.
+Each tar member is compressed in its own lzip member, as well as the
+end-of-file blocks. This method adds an indexed lzip layer on top of
+the tar archive, making it possible to decode the archive safely in
+parallel. The resulting multimember tar.lz archive is fully backward
+compatible with standard tar tools like GNU tar, which treat it like
any other tar.lz archive. Tarlz can append files to the end of such
compressed archives.
@@ -52,7 +55,7 @@ less efficient than compressing the whole tar archive, but it has the
following advantages:
* The resulting multimember tar.lz archive can be decompressed in
- parallel with plzip, multiplying the decompression speed.
+ parallel, multiplying the decompression speed.
* New members can be appended to the archive (by removing the EOF
member) just like to an uncompressed tar archive.
@@ -74,10 +77,6 @@ with standard tar tools. *Note crc32::.
Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
'star' or 'v7'.
- Tarlz is intended as a showcase project for the maintainers of real
-tar programs to evaluate the format and perhaps implement it in their
-tools.
-

File: tarlz.info, Node: Invoking tarlz, Next: File format, Prev: Introduction, Up: Top
@@ -141,6 +140,21 @@ archive 'foo'.
Use archive file ARCHIVE. '-' used as an ARCHIVE argument reads
from standard input or writes to standard output.
+'-n N'
+'--threads=N'
+ Set the number of decompression threads, overriding the system's
+ default. Valid values range from 0 to "as many as your system can
+ support". A value of 0 disables threads entirely. If this option
+ is not used, tarlz tries to detect the number of processors in the
+ system and use it as default value. 'tarlz --help' shows the
+ system's default value. This option currently only has effect when
+ listing the contents of a multimember compressed archive. *Note
+ Multi-threaded tar::.
+
+ Note that the number of usable threads is limited during
+ decompression to the number of lzip members in the tar.lz archive,
+ which you can find by running 'lzip -lv archive.tar.lz'.
+
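The thread limits described for '-n' can be sketched in C as follows. This is a hypothetical helper, not tarlz's code: the name `effective_threads`, the convention that a negative `requested` value means '-n' was not given, and the use of `sysconf(_SC_NPROCESSORS_ONLN)` (a widely available extension, not strict POSIX) to detect processors are all assumptions made for illustration.

```c
#include <unistd.h>

/* Hypothetical helper, not tarlz's code. 'requested' is the value given
   with '-n', or a negative number if the option was not used (assumed
   convention). Returns how many threads are worth starting for an
   archive with 'lzip_members' members; 0 means "no threads". */
long effective_threads(long requested, long lzip_members)
{
  long n = requested;
  if (n < 0) {                          /* no '-n': detect processors */
    n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1)
      n = 1;                            /* detection failed: assume one */
  }
  if (n > lzip_members)                 /* at most one thread per member */
    n = lzip_members;
  return n;
}
```

For example, asking for 8 threads on an archive with only 3 lzip members yields 3, which is why 'lzip -lv' is useful for predicting the speedup.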
'-q'
'--quiet'
Quiet operation. Suppress all messages.
@@ -288,6 +302,11 @@ following sequence:
* Zero or more blocks that contain the contents of the file.
+ Each tar member must be stored contiguously in a single lzip member
+for parallel decoding operations like '--list' to work. If any tar member
+is split over two or more lzip members, the archive must be decoded
+sequentially. *Note Multi-threaded tar::.
+
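A minimal sketch of the alignment rule, assuming all offsets are positions in the uncompressed tar stream and that consecutive lzip members are described only by their end offsets. The function name and this representation are hypothetical; tarlz's internal data structures may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: lzip member i covers [lzip_end[i-1], lzip_end[i])
   of the uncompressed tar stream, the first member starting at 0.
   A tar member covers [tar_begin, tar_end).
   Returns true if the tar member lies entirely inside one lzip member. */
bool tar_member_aligned(const unsigned long long *lzip_end, size_t nlzip,
                        unsigned long long tar_begin,
                        unsigned long long tar_end)
{
  unsigned long long start = 0;          /* start of lzip member i */
  for (size_t i = 0; i < nlzip; ++i) {
    if (tar_begin >= start && tar_begin < lzip_end[i])
      return tar_end <= lzip_end[i];     /* false if it spills into i+1 */
    start = lzip_end[i];
  }
  return false;                          /* begins past the last member */
}
```

Under this model, an archive can be decoded in parallel only if the check holds for every tar member; a single failure forces sequential decoding.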
At the end of the archive file there are two 512-byte blocks filled
with binary zeros, interpreted as an end-of-archive indicator. These EOF
blocks are either compressed in a separate lzip member or compressed
@@ -417,19 +436,12 @@ record is used to store the linkname.
The mode field provides 12 access permission bits. The following
table shows the symbolic name of each bit and its octal value:
-Bit Name Bit value
-S_ISUID 04000
-S_ISGID 02000
-S_ISVTX 01000
-S_IRUSR 00400
-S_IWUSR 00200
-S_IXUSR 00100
-S_IRGRP 00040
-S_IWGRP 00020
-S_IXGRP 00010
-S_IROTH 00004
-S_IWOTH 00002
-S_IXOTH 00001
+Bit Name Value Bit Name Value Bit Name Value
+---------------------------------------------------
+S_ISUID 04000 S_ISGID 02000 S_ISVTX 01000
+S_IRUSR 00400 S_IWUSR 00200 S_IXUSR 00100
+S_IRGRP 00040 S_IWGRP 00020 S_IXGRP 00010
+S_IROTH 00004 S_IWOTH 00002 S_IXOTH 00001
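As an illustration of the bit values in the table above, the following C sketch renders a mode value (already converted from the octal mode field) in the usual 'ls -l' style, where 's'/'t' mark a set-ID or sticky bit with the matching execute bit set and 'S'/'T' mark one without it. The function name `mode_string` is hypothetical.

```c
/* Sketch: render the 12 permission bits of a mode value as an
   'ls -l'-style string. The file-type character is omitted.
   'out' must have room for 10 characters. */
void mode_string(unsigned mode, char out[10])
{
  static const char rwx[] = "rwxrwxrwx";
  for (int i = 0; i < 9; ++i)           /* S_IRUSR (00400) .. S_IXOTH (00001) */
    out[i] = (mode & (0400u >> i)) ? rwx[i] : '-';
  if (mode & 04000)                     /* S_ISUID */
    out[2] = (mode & 00100) ? 's' : 'S';
  if (mode & 02000)                     /* S_ISGID */
    out[5] = (mode & 00010) ? 's' : 'S';
  if (mode & 01000)                     /* S_ISVTX */
    out[8] = (mode & 00001) ? 't' : 'T';
  out[9] = '\0';
}
```

For instance, 04755 renders as "rwsr-xr-x" and 01777 as "rwxrwxrwt".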
The uid and gid fields are the user and group ID of the owner and
group of the file, respectively.
@@ -485,12 +497,16 @@ file archived:
The magic field contains the ASCII null-terminated string "ustar".
The version field contains the characters "00" (0x30,0x30). The fields
-uname, and gname are null-terminated character strings. Each numeric
-field contains a leading zero-filled, null-terminated octal number using
-digits from the ISO/IEC 646:1991 (ASCII) standard.
+uname and gname are null-terminated character strings, except when
+every character in the array, including the last one, is non-null.
+Each numeric field contains a leading space- or zero-filled,
+optionally null-terminated octal number using digits from the ISO/IEC
+646:1991 (ASCII) standard. Tarlz is able to decode numeric fields 1
+byte larger than standard ustar by not requiring a terminating null
+character.
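The decoding rules for numeric and name fields can be sketched in C as below. Both functions are hypothetical illustrations, not tarlz's code. Accepting a trailing space as a terminator, as many tar readers do, is an assumption that goes beyond the text above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lenient parser for a ustar numeric field of 'size' bytes.
   Skips leading spaces, reads octal digits (leading zeros are harmless),
   and stops at a NUL or (assumption) a space. The number may occupy all
   'size' bytes with no terminator at all.
   Returns -1 if a non-octal byte appears before any terminator. */
long long parse_octal(const char *field, size_t size)
{
  size_t i = 0;
  long long value = 0;
  while (i < size && field[i] == ' ')   /* leading space fill */
    ++i;
  for (; i < size; ++i) {
    const char c = field[i];
    if (c == '\0' || c == ' ')          /* optional terminator */
      break;
    if (c < '0' || c > '7')             /* not an octal digit */
      return -1;
    value = value * 8 + (c - '0');
  }
  return value;
}

/* Length of a uname/gname field: null-terminated unless the name
   occupies every byte of the array, including the last one. */
size_t name_length(const char *field, size_t size)
{
  const char *nul = memchr(field, '\0', size);
  return nul ? (size_t)(nul - field) : size;
}
```

For example, a 12-byte size field holding "777777777777" with no terminator decodes to 8^12 - 1.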

-File: tarlz.info, Node: Amendments to pax format, Next: Examples, Prev: File format, Up: Top
+File: tarlz.info, Node: Amendments to pax format, Next: Multi-threaded tar, Prev: File format, Up: Top
4 The reasons for the differences with pax
******************************************
@@ -508,7 +524,7 @@ and the concrete reasons to implement them.
The posix pax format has a serious flaw. The metadata stored in pax
extended records are not protected by any kind of check sequence.
Corruption in a long filename may cause the extraction of the file in
-the wrong place without warning. Corruption in a long file size may
+the wrong place without warning. Corruption in a large file size may
cause the truncation of the file or the appending of garbage to the
file, both followed by a spurious warning about a corrupt header far
from the place of the undetected corruption.
@@ -573,9 +589,57 @@ prevents accidental double UTF-8 conversions. If the need arises this
behavior will be adjusted with a command line option in the future.

-File: tarlz.info, Node: Examples, Next: Problems, Prev: Amendments to pax format, Up: Top
+File: tarlz.info, Node: Multi-threaded tar, Next: Examples, Prev: Amendments to pax format, Up: Top
+
+5 Limitations of parallel tar decoding
+**************************************
+
+Safely decoding an arbitrary tar archive in parallel is impossible. For
+example, if a tar archive containing another tar archive is decoded
+starting from some position other than the beginning, there is no way
+to know if the first header found there belongs to the outer tar
+archive or to the inner tar archive. Tar is an inherently serial format;
+it was designed for tapes.
+
+ In the case of compressed tar archives, the start of each compressed
+block determines one point through which the tar archive can be decoded
+in parallel. Therefore, in tar.lz archives the decoding operations
+can't be parallelized if the tar members are not aligned with the lzip
+members. Tar archives compressed with plzip can't be decoded in
+parallel because tar and plzip do not have a way to align both sets of
+members. Certainly one can decompress such an archive with a
+multi-threaded tool like plzip, but the increase in speed is not as
+large as it could be because plzip must serialize the decompressed data
+and pass them to tar, which decodes them sequentially, one tar member
+at a time.
+
+ On the other hand, if the tar.lz archive is created with a tool like
+tarlz, which can guarantee the alignment between tar members and lzip
+members because it controls both archiving and compression, then the
+lzip format becomes an indexed layer on top of the tar archive, which
+makes it possible to decode the archive safely in parallel.
+
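The index on the lzip side can be recovered from the lzip format itself: every member begins with the magic string "LZIP" and ends with a 20-byte trailer whose last 8 bytes hold the total member size (little-endian). The following C sketch, a hypothetical function operating on a file already read into memory, walks the members backwards from the end of the file to find where each one starts. It only illustrates the idea; it decompresses nothing and is not tarlz's implementation.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find the start offset of every lzip member in a
   multimember file held in 'buf'. Returns the member count, with
   starts[] in file order, or -1 on a malformed or oversized input. */
long lzip_member_starts(const uint8_t *buf, size_t len,
                        size_t *starts, size_t max_members)
{
  size_t pos = len;                     /* end of the member examined */
  size_t count = 0;
  while (pos > 0) {
    /* 26 = 6-byte header + 20-byte trailer, a loose lower bound */
    if (pos < 26 || count >= max_members)
      return -1;
    uint64_t msize = 0;
    for (int i = 7; i >= 0; --i)        /* little-endian member size */
      msize = (msize << 8) | buf[pos - 8 + i];
    if (msize < 26 || msize > pos)
      return -1;
    pos -= (size_t)msize;
    if (memcmp(buf + pos, "LZIP", 4) != 0)  /* check member magic */
      return -1;
    starts[count++] = pos;
  }
  for (size_t i = 0; i < count / 2; ++i) {  /* restore file order */
    size_t t = starts[i];
    starts[i] = starts[count - 1 - i];
    starts[count - 1 - i] = t;
  }
  return (long)count;
}
```

Once the member starts are known, each member can be handed to a separate decompression thread; whether the tar headers line up with those starts still has to be checked, as the text explains.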
+ Tarlz is able to automatically decode aligned and unaligned
+multimember tar.lz archives, keeping backwards compatibility. If tarlz
+finds a member misalignment during multi-threaded decoding, it switches
+to single-threaded mode and continues decoding the archive. Currently
+only the '--list' option is able to do multi-threaded decoding.
+
+ If the files in the archive are large, multi-threaded '--list' on a
+regular tar.lz archive can be hundreds of times faster than sequential
+'--list' because, in addition to using several processors, it only
+needs to decompress part of each lzip member. See the following example
+listing the Silesia corpus on a dual core machine:
+
+ tarlz -9 -cf silesia.tar.lz silesia
+ time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
+ time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
+ time tarlz -tf silesia.tar.lz (0.020s)
+
+
+File: tarlz.info, Node: Examples, Next: Problems, Prev: Multi-threaded tar, Up: Top
-5 A small tutorial with examples
+6 A small tutorial with examples
********************************
Example 1: Create a multimember compressed archive 'archive.tar.lz'
@@ -633,7 +697,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory

File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
-6 Reporting bugs
+7 Reporting bugs
****************
There are probably bugs in tarlz. There are certainly errors and
@@ -670,16 +734,17 @@ Concept index

Tag Table:
Node: Top223
-Node: Introduction946
-Node: Invoking tarlz3084
-Node: File format9606
-Ref: key_crc3214138
-Node: Amendments to pax format19215
-Ref: crc3219729
-Ref: flawed-compat20753
-Node: Examples23126
-Node: Problems24802
-Node: Concept index25328
+Node: Introduction1012
+Node: Invoking tarlz3124
+Node: File format10384
+Ref: key_crc3215169
+Node: Amendments to pax format20586
+Ref: crc3221110
+Ref: flawed-compat22135
+Node: Multi-threaded tar24508
+Node: Examples27012
+Node: Problems28682
+Node: Concept index29208

End Tag Table