Diffstat (limited to 'doc/tarlz.info')
-rw-r--r-- doc/tarlz.info 129
1 files changed, 85 insertions, 44 deletions
diff --git a/doc/tarlz.info b/doc/tarlz.info
index e2c61db..d287697 100644
--- a/doc/tarlz.info
+++ b/doc/tarlz.info
@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
-This manual is for Tarlz (version 0.17, 30 July 2020).
+This manual is for Tarlz (version 0.19, 8 January 2021).
* Menu:
@@ -28,10 +28,10 @@ This manual is for Tarlz (version 0.17, 30 July 2020).
* Concept index:: Index of concepts
- Copyright (C) 2013-2020 Antonio Diaz Diaz.
+ Copyright (C) 2013-2021 Antonio Diaz Diaz.
- This manual is free documentation: you have unlimited permission to
-copy, distribute, and modify it.
+ This manual is free documentation: you have unlimited permission to copy,
+distribute, and modify it.

File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: Top
@@ -40,13 +40,15 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
**************
Tarlz is a massively parallel (multi-threaded) combined implementation of
-the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
-archives in a simplified and safer variant of the POSIX pax format
-compressed with lzip, keeping the alignment between tar members and lzip
-members. The resulting multimember tar.lz archive is fully backward
-compatible with standard tar tools like GNU tar, which treat it like any
-other tar.lz archive. Tarlz can append files to the end of such compressed
-archives.
+the tar archiver and the lzip compressor. Tarlz uses the compression
+library lzlib.
+
+ Tarlz creates tar archives using a simplified and safer variant of the
+POSIX pax format compressed in lzip format, keeping the alignment between
+tar members and lzip members. The resulting multimember tar.lz archive is
+fully backward compatible with standard tar tools like GNU tar, which treat
+it like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making
@@ -56,7 +58,7 @@ plzip may even double the amount of files lost for each lzip member damaged
because it does not keep the members aligned.
Tarlz can create tar archives with five levels of compression
-granularity; per file (--no-solid), per block (--bsolid, default), per
+granularity: per file (--no-solid), per block (--bsolid, default), per
directory (--dsolid), appendable solid (--asolid), and solid (--solid). It
can also create uncompressed tar archives.
@@ -79,8 +81,8 @@ archive, but it has the following advantages:
lziprecover can be used to recover some of the damaged members.
* A multimember tar.lz archive is usually smaller than the corresponding
- solidly compressed tar.gz archive, except when compressing files
- smaller than about 32 KiB individually.
+ solidly compressed tar.gz archive, except when individually
+ compressing files smaller than about 32 KiB.
Tarlz protects the extended records with a Cyclic Redundancy Check (CRC)
in a way compatible with standard tar tools. *Note crc32::.
@@ -240,8 +242,7 @@ to '-1 --solid'
not used, tarlz tries to detect the number of processors in the system
and use it as default value. 'tarlz --help' shows the system's default
value. See the note about multi-threaded archive creation in the
- option '-C' above. Multi-threaded extraction of files from an archive
- is not yet implemented. *Note Multi-threaded decoding::.
+ option '-C' above.
Note that the number of usable threads is limited during compression to
ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
@@ -281,7 +282,8 @@ to '-1 --solid'
'-v'
'--verbose'
- Verbosely list files processed.
+ Verbosely list files processed. Further -v's (up to 4) increase the
+ verbosity level.
'-x'
'--extract'
@@ -376,7 +378,8 @@ to '-1 --solid'
Don't delete partially extracted files. If a decompression error
happens while extracting a file, keep the partial data extracted. Use
this option to recover as much data as possible from each damaged
- member.
+ member. It is recommended to run tarlz in single-threaded mode
+ (--threads=0) when using this option.
'--missing-crc'
Exit with error status 2 if the CRC of the extended records is missing.
@@ -396,6 +399,15 @@ to '-1 --solid'
more memory. Valid values range from 1 to 1024. The default value is
64.
+'--check-lib'
+ Compare the version of lzlib used to compile tarlz with the version
+ actually being used and exit. Report any differences found. Exit with
+ error status 1 if differences are found. A mismatch may indicate that
+ lzlib is not correctly installed or that a different version of lzlib
+ has been installed after compiling tarlz. 'tarlz -v --check-lib' shows
+ the version of lzlib being used and the value of 'LZ_API_VERSION' (if
+ defined). *Note Library version: (lzlib)Library version.
+
Exit status: 0 for a normal exit, 1 for environmental problems (file not
found, files differ, invalid flags, I/O errors, etc), 2 to indicate a
@@ -546,6 +558,10 @@ space, equal-sign, and newline.
the swapping of two bytes.
+ At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
+extended header keyword found in an archive, once per keyword.
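
The once-per-keyword diagnostic described above can be sketched in Python as follows. This is only an illustration, not tarlz's code: it assumes the standard pax record layout ("LEN KEY=VALUE" followed by a newline, where LEN counts the whole record), and the set KNOWN_KEYWORDS and all function names are hypothetical.

```python
# Illustrative only: the keyword set below is an assumption, not tarlz's list.
KNOWN_KEYWORDS = {"path", "linkpath", "size", "GNU.crc32"}

def parse_extended_records(data: bytes):
    """Yield (keyword, value) pairs from records of the form b'LEN KEY=VALUE\\n'."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])      # LEN counts the whole record, itself included
        record = data[space + 1 : pos + length]
        if not record.endswith(b"\n"):
            raise ValueError("extended record does not end with a newline")
        keyword, _, value = record[:-1].partition(b"=")
        yield keyword.decode("utf-8"), value
        pos += length

def unknown_keywords(data: bytes, already_reported: set):
    """Return unknown keywords not seen before and remember them in already_reported."""
    fresh = []
    for keyword, _ in parse_extended_records(data):
        if keyword not in KNOWN_KEYWORDS and keyword not in already_reported:
            already_reported.add(keyword)
            fresh.append(keyword)
    return fresh

def make_record(keyword: str, value: str) -> bytes:
    """Build one record for testing; LEN must include its own decimal digits."""
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body) + 1
    while len(str(length)) + len(body) != length:
        length += 1
    return str(length).encode() + body
```

Keeping the set of reported keywords outside the parser lets it span all extended headers of one archive, so a keyword that recurs in later members is reported only the first time.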
+
+
4.2 Ustar header block
======================
@@ -770,11 +786,12 @@ interesting parts described here are those related to Multi-threaded
processing.
The structure of the part of tarlz performing Multi-threaded archive
-creation is somewhat similar to that of plzip with the added complication of
-the solidity levels. A grouper thread and several worker threads are
-created, acting the main thread as muxer (multiplexer) thread. A "packet
-courier" takes care of data transfers among threads and limits the maximum
-number of data blocks (packets) being processed simultaneously.
+creation is somewhat similar to that of plzip with the added complication
+of the solidity levels. *Note Program design: (plzip)Program design. A
+grouper thread and several worker threads are created, with the main thread
+acting as muxer (multiplexer) thread. A "packet courier" takes care of data
+transfers among threads and limits the maximum number of data blocks
+(packets) being processed simultaneously.
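
The grouper/worker/muxer arrangement in the paragraph above can be sketched in Python as follows. It is a simplified model under stated assumptions, not tarlz's implementation: a bounded queue stands in for the "packet courier", the bound applies only to packets waiting for a worker, and run_pipeline, compress and max_packets are hypothetical names.

```python
import queue
import threading

def run_pipeline(groups, compress, num_workers=2, max_packets=4):
    """Process groups in parallel and return the results in input order."""
    todo = queue.Queue(maxsize=max_packets)   # bounds the packets waiting for a worker
    done = queue.Queue()

    def grouper():
        for index, group in enumerate(groups):
            todo.put((index, group))          # blocks while the queue is full
        for _ in range(num_workers):
            todo.put(None)                    # one stop marker per worker

    def worker():
        while True:
            packet = todo.get()
            if packet is None:
                done.put(None)                # tell the muxer this worker is finished
                return
            index, group = packet
            done.put((index, compress(group)))

    threads = [threading.Thread(target=grouper)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The calling thread acts as muxer: it restores the original order by index.
    pending, output, next_index, finished = {}, [], 0, 0
    while finished < num_workers:
        item = done.get()
        if item is None:
            finished += 1
            continue
        pending[item[0]] = item[1]
        while next_index in pending:
            output.append(pending.pop(next_index))
            next_index += 1
    for t in threads:
        t.join()
    return output
```

Tagging each packet with its index lets the workers finish in any order while the muxer still writes results in the order the grouper produced them.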
The grouper traverses the directory tree, groups together the metadata of
the files to be archived in each lzip member, and distributes them to the
@@ -805,8 +822,7 @@ the archive.
,--------,
| file |<---> data to/from each worker below
| system |
-`--------'
- ,------------,
+`--------' ,------------,
,-->| worker 0 |--,
| `------------' |
,---------, | ,------------, | ,-------, ,--------,
@@ -870,8 +886,7 @@ possible decoding it safely in parallel.
Tarlz is able to automatically decode aligned and unaligned multimember
tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded
-mode and continues decoding the archive. Currently only the options
-'--diff' and '--list' are able to do multi-threaded decoding.
+mode and continues decoding the archive.
If the files in the archive are large, multi-threaded '--list' on a
regular (seekable) tar.lz archive can be hundreds of times faster than
@@ -886,7 +901,33 @@ example listing the Silesia corpus on a dual core machine:
On the other hand, multi-threaded '--list' won't detect corruption in
the tar member data because it only decodes the part of each lzip member
-corresponding to the tar member header.
+corresponding to the tar member header. This is another reason why the tar
+headers must provide their own integrity checking.
+
+
+7.1 Limitations of multi-threaded extraction
+============================================
+
+Multi-threaded extraction may produce different output than single-threaded
+extraction in some cases:
+
+ During multi-threaded extraction, several independent processes are
+simultaneously reading the archive and creating files in the file system.
+The archive is not read sequentially. As a consequence, any error or
+weirdness in the archive (like a corrupt member or an EOF block in the
+middle of the archive) usually won't be detected until part of the archive
+beyond that point has been processed.
+
+ If the archive contains two or more tar members with the same name,
+single-threaded extraction extracts the members in the order they appear in
+the archive and leaves in the file system the last version of the file. But
+multi-threaded extraction may extract the members in any order and leave in
+the file system any version of the file nondeterministically. It is
+unspecified which of the tar members is extracted.
+
+ If the same file is extracted through several paths (different member
+names resolve to the same file in the file system), the result is undefined.
+(The resulting file will probably be mangled.)
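
The duplicate-name case above can be illustrated with a small Python sketch. It uses the standard tarfile module on an uncompressed archive held in memory, not tarlz itself, and models in-order extraction with a dictionary; build_archive and sequential_result are hypothetical helper names.

```python
import io
import tarfile

def build_archive(members):
    """Return the bytes of an uncompressed tar archive built from (name, data) pairs."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

def sequential_result(archive_bytes):
    """Model in-order extraction: a later member overwrites an earlier one."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as archive:
        for info in archive:
            files[info.name] = archive.extractfile(info).read()
    return files

# Two members share the name "note.txt"; reading in order keeps the last one.
archive_bytes = build_archive([("note.txt", b"first"), ("note.txt", b"second")])
result = sequential_result(archive_bytes)
```

If the two members were instead written by independent workers with no ordering between them, either content could end up in the file, which is the nondeterminism the section describes.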

File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
@@ -1028,22 +1069,22 @@ Concept index

Tag Table:
Node: Top223
-Node: Introduction1212
-Node: Invoking tarlz3982
-Ref: --data-size6193
-Ref: --bsolid14608
-Node: Portable character set18244
-Node: File format18887
-Ref: key_crc3223812
-Node: Amendments to pax format29271
-Ref: crc3229935
-Ref: flawed-compat31220
-Node: Program design33865
-Node: Multi-threaded decoding37756
-Node: Minimum archive sizes40492
-Node: Examples42630
-Node: Problems44345
-Node: Concept index44873
+Node: Introduction1214
+Node: Invoking tarlz4022
+Ref: --data-size6233
+Ref: --bsolid14593
+Node: Portable character set18852
+Node: File format19495
+Ref: key_crc3224420
+Node: Amendments to pax format30021
+Ref: crc3230685
+Ref: flawed-compat31970
+Node: Program design34615
+Node: Multi-threaded decoding38540
+Node: Minimum archive sizes42482
+Node: Examples44620
+Node: Problems46335
+Node: Concept index46863

End Tag Table