From 48fe8b80e2592f26cae686b0b99547b76018aeb1 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Fri, 28 Dec 2018 10:51:38 +0100
Subject: Adding upstream version 0.8.

Signed-off-by: Daniel Baumann
---
 README | 77 ++++++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 47 insertions(+), 30 deletions(-)

(limited to 'README')

diff --git a/README b/README
index bafec18..44edeea 100644
--- a/README
+++ b/README
@@ -1,36 +1,16 @@
 Description
 
-Tarlz is a small and simple implementation of the tar archiver. By
-default tarlz creates, lists and extracts archives in the 'ustar' format
-compressed with lzip on a per file basis. Tarlz can append files to the
-end of such compressed archives.
-
-Each tar member is compressed in its own lzip member, as well as the
-end-of-file blocks. This same method works for any tar format (gnu,
-ustar, posix) and is fully backward compatible with standard tar tools
-like GNU tar, which treat the resulting multimember tar.lz archive like
-any other tar.lz archive.
+Tarlz is a small and simple implementation of the tar archiver. By default
+tarlz creates, lists and extracts archives in a simplified posix pax format
+compressed with lzip on a per file basis. Each tar member is compressed in
+its own lzip member, as well as the end-of-file blocks. This method is fully
+backward compatible with standard tar tools like GNU tar, which treat the
+resulting multimember tar.lz archive like any other tar.lz archive. Tarlz
+can append files to the end of such compressed archives.
 
 Tarlz can create tar archives with four levels of compression
 granularity; per file, per directory, appendable solid, and solid.
 
-Tarlz is intended as a showcase project for the maintainers of real tar
-programs to evaluate the format and perhaps implement it in their tools.
-
-The diagram below shows the correspondence between tar members (formed
-by a header plus optional data) in the tar archive and lzip members in
-the resulting multimember tar.lz archive:
-
-tar
-+========+======+========+======+========+======+========+
-| header | data | header | data | header | data |   eof  |
-+========+======+========+======+========+======+========+
-
-tar.lz
-+===============+===============+===============+========+
-|     member    |     member    |     member    | member |
-+===============+===============+===============+========+
-
 Of course, compressing each file (or each directory) individually is
 less efficient than compressing the whole tar archive, but it has the
 following advantages:
@@ -38,19 +18,56 @@ following advantages:
    * The resulting multimember tar.lz archive can be decompressed in
      parallel with plzip, multiplying the decompression speed.
 
-   * New members can be appended to the archive (by removing the eof
+   * New members can be appended to the archive (by removing the EOF
      member) just like to an uncompressed tar archive.
 
    * It is a safe posix-style backup format. In case of corruption,
      tarlz can extract all the undamaged members from the tar.lz
      archive, skipping over the damaged members, just like the standard
-     (uncompressed) tar. Moreover, lziprecover can be used to recover at
-     least part of the contents of the damaged members.
+     (uncompressed) tar. Moreover, the option '--keep-damaged' can be
+     used to recover as much data as possible from each damaged member,
+     and lziprecover can be used to recover some of the damaged members.
 
    * A multimember tar.lz archive is usually smaller than the
      corresponding solidly compressed tar.gz archive, except when
      individually compressing files smaller than about 32 KiB.
 
+Note that the posix pax format has a serious flaw. The metadata stored
+in pax extended records are not protected by any kind of check sequence.
+Corruption in a long filename may cause the extraction of the file in the
+wrong place without warning. Corruption in a long file size may cause the
+truncation of the file or the appending of garbage to the file, both
+followed by a spurious warning about a corrupt header far from the place
+of the undetected corruption.
+
+Metadata like filename and file size must be always protected in an archive
+format because of the adverse effects of undetected corruption in them,
+potentially much worse that undetected corruption in the data. Even more so
+in the case of pax because the amount of metadata it stores is potentially
+large, making undetected corruption more probable.
+
+Because of the above, tarlz protects the extended records with a CRC in
+a way compatible with standard tar tools.
+
+Tarlz does not understand other tar formats like gnu, oldgnu, star or v7.
+
+Tarlz is intended as a showcase project for the maintainers of real tar
+programs to evaluate the format and perhaps implement it in their tools.
+
+The diagram below shows the correspondence between each tar member
+(formed by one or two headers plus optional data) in the tar archive and
+each lzip member in the resulting multimember tar.lz archive:
+
+tar
++========+======+=================+===============+========+======+========+
+| header | data | extended header | extended data | header | data |   EOF  |
++========+======+=================+===============+========+======+========+
+
+tar.lz
++===============+=================================================+========+
+|     member    |                      member                     | member |
++===============+=================================================+========+
+
 
 Copyright (C) 2013-2018 Antonio Diaz Diaz.
 
-- 
cgit v1.2.3
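
The README text added by this patch argues that pax extended records need a check sequence. The following is a minimal Python sketch of that argument, not tarlz's implementation. It builds one pax extended record in the POSIX "length keyword=value" layout, overwrites a single byte of a long file name, and shows that the damaged record still parses cleanly while a CRC computed over the record changes. The helper names pax_record and parse_records are hypothetical, and zlib.crc32 is only a stand-in: the README does not say which CRC tarlz uses or where it stores it.

```python
import zlib


def pax_record(keyword: str, value: str) -> bytes:
    """Build one pax extended record: '<length> <keyword>=<value>' plus a newline."""
    body = f" {keyword}={value}\n".encode("utf-8")
    # The decimal length field counts its own digits, so iterate to a fixed point.
    length = len(body)
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def parse_records(data: bytes) -> dict:
    """Parse concatenated pax records into a dict (no integrity check at all)."""
    out = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        record = data[space + 1 : pos + length - 1]  # drop the trailing newline
        key, _, value = record.partition(b"=")
        out[key.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return out


# A long path, as it would be stored in a pax 'path' record.
name = "backups/" + "x" * 120 + "/report.txt"
good = pax_record("path", name)

# Simulate corruption: overwrite one byte inside the stored file name.
damaged = bytearray(good)
damaged[good.index(b"report")] = ord("d")  # "report.txt" becomes "deport.txt"
bad = bytes(damaged)

# The damaged record parses without any error and yields a different path.
print("path unchanged after damage:", parse_records(bad)["path"] == name)

# A CRC over the record bytes (stand-in: CRC-32 from zlib) exposes the damage.
print("CRC unchanged after damage: ", zlib.crc32(good) == zlib.crc32(bad))
```

Without the check value, an extractor has no way to tell the second path from the first, which is the failure mode the README describes for long file names.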