Diffstat (limited to 'README')
-rw-r--r-- README 77
1 files changed, 47 insertions, 30 deletions
diff --git a/README b/README
index bafec18..44edeea 100644
--- a/README
+++ b/README
@@ -1,36 +1,16 @@
Description
-Tarlz is a small and simple implementation of the tar archiver. By
-default tarlz creates, lists and extracts archives in the 'ustar' format
-compressed with lzip on a per file basis. Tarlz can append files to the
-end of such compressed archives.
-
-Each tar member is compressed in its own lzip member, as well as the
-end-of-file blocks. This same method works for any tar format (gnu,
-ustar, posix) and is fully backward compatible with standard tar tools
-like GNU tar, which treat the resulting multimember tar.lz archive like
-any other tar.lz archive.
+Tarlz is a small and simple implementation of the tar archiver. By default
+tarlz creates, lists and extracts archives in a simplified posix pax format
+compressed with lzip on a per file basis. Each tar member is compressed in
+its own lzip member, as well as the end-of-file blocks. This method is fully
+backward compatible with standard tar tools like GNU tar, which treat the
+resulting multimember tar.lz archive like any other tar.lz archive. Tarlz
+can append files to the end of such compressed archives.
Tarlz can create tar archives with four levels of compression
granularity; per file, per directory, appendable solid, and solid.
-Tarlz is intended as a showcase project for the maintainers of real tar
-programs to evaluate the format and perhaps implement it in their tools.
-
-The diagram below shows the correspondence between tar members (formed
-by a header plus optional data) in the tar archive and lzip members in
-the resulting multimember tar.lz archive:
-
-tar
-+========+======+========+======+========+======+========+
-| header | data | header | data | header | data |  eof   |
-+========+======+========+======+========+======+========+
-
-tar.lz
-+===============+===============+===============+========+
-|    member     |    member     |    member     | member |
-+===============+===============+===============+========+
-
Of course, compressing each file (or each directory) individually is
less efficient than compressing the whole tar archive, but it has the
following advantages:
@@ -38,19 +18,56 @@ following advantages:
* The resulting multimember tar.lz archive can be decompressed in
parallel with plzip, multiplying the decompression speed.
- * New members can be appended to the archive (by removing the eof
+ * New members can be appended to the archive (by removing the EOF
member) just like to an uncompressed tar archive.
* It is a safe posix-style backup format. In case of corruption,
tarlz can extract all the undamaged members from the tar.lz
archive, skipping over the damaged members, just like the standard
- (uncompressed) tar. Moreover, lziprecover can be used to recover at
- least part of the contents of the damaged members.
+ (uncompressed) tar. Moreover, the option '--keep-damaged' can be
+ used to recover as much data as possible from each damaged member,
+ and lziprecover can be used to recover some of the damaged members.
* A multimember tar.lz archive is usually smaller than the
corresponding solidly compressed tar.gz archive, except when
individually compressing files smaller than about 32 KiB.
+Note that the posix pax format has a serious flaw. The metadata stored
+in pax extended records are not protected by any kind of check sequence.
+Corruption in a long filename may cause the extraction of the file in the
+wrong place without warning. Corruption in a long file size may cause the
+truncation of the file or the appending of garbage to the file, both
+followed by a spurious warning about a corrupt header far from the place
+of the undetected corruption.
+
+Metadata like filename and file size must always be protected in an archive
+format because of the adverse effects of undetected corruption in them,
+potentially much worse than undetected corruption in the data. Even more so
+in the case of pax because the amount of metadata it stores is potentially
+large, making undetected corruption more probable.
+
+Because of the above, tarlz protects the extended records with a CRC in
+a way compatible with standard tar tools.
+
+Tarlz does not understand other tar formats like gnu, oldgnu, star or v7.
+
+Tarlz is intended as a showcase project for the maintainers of real tar
+programs to evaluate the format and perhaps implement it in their tools.
+
+The diagram below shows the correspondence between each tar member
+(formed by one or two headers plus optional data) in the tar archive and
+each lzip member in the resulting multimember tar.lz archive:
+
+tar
++========+======+=================+===============+========+======+========+
+| header | data | extended header | extended data | header | data |  EOF   |
++========+======+=================+===============+========+======+========+
+
+tar.lz
++===============+=================================================+========+
+|    member     |                     member                      | member |
++===============+=================================================+========+
+
Copyright (C) 2013-2018 Antonio Diaz Diaz.
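
The pax flaw described in the new README text can be illustrated with a
short sketch. A pax extended record has the form "<length> <keyword>=<value>\n",
where the decimal length counts the whole record, including its own digits.
The code below builds two such records, flips one bit in the long filename,
and shows that the damaged data still parses cleanly while a CRC computed
over the records catches the change. The helper names (pax_record,
parse_records) are hypothetical, and zlib.crc32 is only a stand-in: the
README does not say which CRC tarlz uses or how it is stored, so this is
not a description of tarlz's actual scheme.

```python
import zlib


def pax_record(keyword: str, value: str) -> bytes:
    """Build one pax extended record: b"<length> <keyword>=<value>\\n"."""
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body)
    # The length field counts its own digits, so iterate to a fixed point.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def parse_records(data: bytes) -> dict:
    """Parse concatenated pax records into a keyword -> value dict."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        record = data[space + 1:pos + length]
        if not record.endswith(b"\n"):
            raise ValueError("malformed pax record")
        keyword, _, value = record[:-1].partition(b"=")
        records[keyword.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
        pos += length
    return records


# A name longer than 100 bytes, which is the kind of value that ends up
# in an extended record, plus a file size stored the same way.
long_name = "backup/" + "d" * 120 + "/report.txt"
data = pax_record("path", long_name) + pax_record("size", "8589934592")

# Assumption: a plain CRC-32 over all the records stands in for the check.
checksum = zlib.crc32(data)

# Flip one bit in the filename: 'r' (0x72) becomes 's' (0x73).
damaged = bytearray(data)
damaged[data.index(b"report")] ^= 0x01
damaged = bytes(damaged)

# The damaged records are still well formed, so parsing raises no error
# and the file would be extracted under the wrong name.
parsed = parse_records(damaged)
print("parsed name is intact:", parsed["path"] == long_name)

# The CRC comparison is the only step that notices the corruption.
print("CRC matches:", zlib.crc32(damaged) == checksum)
```

A single flipped bit changes neither the record lengths nor the record
structure, which is why the parser alone cannot notice it. Any check
sequence covering the extended records closes that gap; the choice of
CRC-32 here is for illustration only.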