Diffstat (limited to 'README')
-rw-r--r-- README 35
1 files changed, 24 insertions, 11 deletions
diff --git a/README b/README
index c0ab721..8a02c72 100644
--- a/README
+++ b/README
@@ -2,13 +2,19 @@ Description
Tarlz is a massively parallel (multi-threaded) combined implementation of
the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
-archives in a simplified posix pax format compressed with lzip, keeping the
-alignment between tar members and lzip members. This method adds an indexed
-lzip layer on top of the tar archive, making it possible to decode the
-archive safely in parallel. The resulting multimember tar.lz archive is
-fully backward compatible with standard tar tools like GNU tar, which treat
-it like any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
+archives in a simplified and safer variant of the POSIX pax format
+compressed with lzip, keeping the alignment between tar members and lzip
+members. The resulting multimember tar.lz archive is fully backward
+compatible with standard tar tools like GNU tar, which treat it like any
+other tar.lz archive. Tarlz can append files to the end of such compressed
+archives.
+
+Keeping the alignment between tar members and lzip members has two
+advantages. It adds an indexed lzip layer on top of the tar archive, making
+it possible to decode the archive safely in parallel. It also minimizes the
+amount of data lost in case of corruption. Compressing a tar archive with
+plzip may even double the amount of files lost for each lzip member damaged
+because it does not keep the members aligned.
Tarlz can create tar archives with five levels of compression granularity;
per file (--no-solid), per block (--bsolid, default), per directory
@@ -25,7 +31,7 @@ archive, but it has the following advantages:
member), and unwanted members can be deleted from the archive. Just
like an uncompressed tar archive.
- * It is a safe posix-style backup format. In case of corruption,
+ * It is a safe POSIX-style backup format. In case of corruption,
tarlz can extract all the undamaged members from the tar.lz
archive, skipping over the damaged members, just like the standard
(uncompressed) tar. Moreover, the option '--keep-damaged' can be
@@ -36,24 +42,31 @@ archive, but it has the following advantages:
corresponding solidly compressed tar.gz archive, except when
individually compressing files smaller than about 32 KiB.
-Note that the posix pax format has a serious flaw. The metadata stored in
+Note that the POSIX pax format has a serious flaw. The metadata stored in
pax extended records are not protected by any kind of check sequence.
-Corruption in a long filename may cause the extraction of the file in the
+Corruption in a long file name may cause the extraction of the file in the
wrong place without warning. Corruption in a large file size may cause the
truncation of the file or the appending of garbage to the file, both
followed by a spurious warning about a corrupt header far from the place of
the undetected corruption.
-Metadata like filename and file size must be always protected in an archive
+Metadata like file name and file size must be always protected in an archive
format because of the adverse effects of undetected corruption in them,
potentially much worse than undetected corruption in the data. Even more so
in the case of pax because the amount of metadata it stores is potentially
large, making undetected corruption more probable.
+Headers and metadata must be protected separately from data because the
+integrity checking of lzip may not be able to detect the corruption before
+the metadata has been used, for example, to create a new file in the wrong
+place.
+
Because of the above, tarlz protects the extended records with a CRC in a
way compatible with standard tar tools.
Tarlz does not understand other tar formats like gnu, oldgnu, star or v7.
+'tarlz -tf archive.tar.lz > /dev/null' can be used to verify that the format
+of the archive is compatible with tarlz.
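
The check described above can be wrapped in a small shell function. This is
only a sketch: the name check_tarlz_format is hypothetical (it is not part of
tarlz), and it assumes that tarlz returns a nonzero exit status when it cannot
list the archive.

```shell
#!/bin/sh
# Sketch of a format check built on 'tarlz -tf'.
# check_tarlz_format is a hypothetical helper name, not a tarlz feature.
check_tarlz_format() {
    archive=$1
    # List the archive and discard the listing; only the exit status is used.
    # Assumption: tarlz exits with a nonzero status if listing fails.
    if tarlz -tf "$archive" > /dev/null; then
        echo "compatible: $archive"
    else
        echo "not compatible (or damaged): $archive"
        return 1
    fi
}
```

For example, 'check_tarlz_format archive.tar.lz && echo ok' prints "ok" only
when the listing succeeds.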
The diagram below shows the correspondence between each tar member (formed
by one or two headers plus optional data) in the tar archive and each lzip