author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-14 12:56:09 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-14 12:56:09 +0000 |
commit | 7a268a7a1cbeb80359e05bf74cc258b1e7cd83e9 (patch) | |
tree | e94c5a1aa65e2c1b2370656f0df107edd33700f7 /doc | |
parent | Initial commit. (diff) | |
Adding upstream version 1.24. (refs: upstream/1.24, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat
-rw-r--r-- | doc/lziprecover.1 | 152 |
-rw-r--r-- | doc/lziprecover.info | 1536 |
-rw-r--r-- | doc/lziprecover.texi | 1617 |
3 files changed, 3305 insertions, 0 deletions
diff --git a/doc/lziprecover.1 b/doc/lziprecover.1 new file mode 100644 index 0000000..f95e80f --- /dev/null +++ b/doc/lziprecover.1 @@ -0,0 +1,152 @@ +.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2. +.TH LZIPRECOVER "1" "January 2024" "lziprecover 1.24" "User Commands" +.SH NAME +lziprecover \- recovers data from damaged lzip files +.SH SYNOPSIS +.B lziprecover +[\fI\,options\/\fR] [\fI\,files\/\fR] +.SH DESCRIPTION +Lziprecover is a data recovery tool and decompressor for files in the lzip +compressed data format (.lz). Lziprecover is able to repair slightly damaged +files (up to one single\-byte error per member), produce a correct file by +merging the good parts of two or more damaged copies, reproduce a missing +(zeroed) sector using a reference file, extract data from damaged files, +decompress files, and test integrity of files. +.PP +With the help of lziprecover, losing an entire archive just because of a +corrupt byte near the beginning is a thing of the past. +.PP +Lziprecover can remove the damaged members from multimember files, for +example multimember tar.lz archives. +.PP +Lziprecover provides random access to the data in multimember files; it only +decompresses the members containing the desired data. +.PP +Lziprecover facilitates the management of metadata stored as trailing data +in lzip files. +.PP +Lziprecover is not a replacement for regular backups, but a last line of +defense for the case where the backups are also damaged. 
+.SH OPTIONS +.TP +\fB\-h\fR, \fB\-\-help\fR +display this help and exit +.TP +\fB\-V\fR, \fB\-\-version\fR +output version information and exit +.TP +\fB\-a\fR, \fB\-\-trailing\-error\fR +exit with error status if trailing data +.TP +\fB\-A\fR, \fB\-\-alone\-to\-lz\fR +convert lzma\-alone files to lzip format +.TP +\fB\-c\fR, \fB\-\-stdout\fR +write to standard output, keep input files +.TP +\fB\-d\fR, \fB\-\-decompress\fR +decompress, test compressed file integrity +.TP +\fB\-D\fR, \fB\-\-range\-decompress=\fR<n\-m> +decompress a range of bytes to stdout +.TP +\fB\-e\fR, \fB\-\-reproduce\fR +try to reproduce a zeroed sector in file +.TP +\fB\-\-lzip\-level\fR=\fI\,N\/\fR|a|m[N] +reproduce one level, all, or match length +.TP +\fB\-\-lzip\-name=\fR<name> +name of lzip executable for \fB\-\-reproduce\fR +.TP +\fB\-\-reference\-file=\fR<file> +reference file for \fB\-\-reproduce\fR +.TP +\fB\-f\fR, \fB\-\-force\fR +overwrite existing output files +.TP +\fB\-i\fR, \fB\-\-ignore\-errors\fR +ignore some errors in \fB\-d\fR, \fB\-D\fR, \fB\-l\fR, \fB\-t\fR, \fB\-\-dump\fR +.TP +\fB\-k\fR, \fB\-\-keep\fR +keep (don't delete) input files +.TP +\fB\-l\fR, \fB\-\-list\fR +print (un)compressed file sizes +.TP +\fB\-m\fR, \fB\-\-merge\fR +repair errors in file using several copies +.TP +\fB\-o\fR, \fB\-\-output=\fR<file> +place the output into <file> +.TP +\fB\-q\fR, \fB\-\-quiet\fR +suppress all messages +.TP +\fB\-R\fR, \fB\-\-byte\-repair\fR +try to repair a corrupt byte in file +.TP +\fB\-s\fR, \fB\-\-split\fR +split multimember file in single\-member files +.TP +\fB\-t\fR, \fB\-\-test\fR +test compressed file integrity +.TP +\fB\-v\fR, \fB\-\-verbose\fR +be verbose (a 2nd \fB\-v\fR gives more) +.TP +\fB\-\-dump=\fR<list>:d:e:t +dump members, damaged/empty, tdata to stdout +.TP +\fB\-\-remove=\fR<list>:d:e:t +remove members, tdata from files in place +.TP +\fB\-\-strip=\fR<list>:d:e:t +copy files to stdout stripping members given +.TP +\fB\-\-empty\-error\fR +exit with 
error status if empty member in file +.TP +\fB\-\-marking\-error\fR +exit with error status if 1st LZMA byte not 0 +.TP +\fB\-\-loose\-trailing\fR +allow trailing data seeming corrupt header +.TP +\fB\-\-clear\-marking\fR +reset the first LZMA byte of each member +.PP +If no file names are given, or if a file is '\-', lziprecover decompresses +from standard input to standard output. +Numbers may be followed by a multiplier: k = kB = 10^3 = 1000, +Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc... +.PP +To extract all the files from archive 'foo.tar.lz', use the commands +\&'tar \fB\-xf\fR foo.tar.lz' or 'lziprecover \fB\-cd\fR foo.tar.lz | tar \fB\-xf\fR \-'. +.PP +Exit status: 0 for a normal exit, 1 for environmental problems +(file not found, invalid command\-line options, I/O errors, etc), 2 to +indicate a corrupt or invalid input file, 3 for an internal consistency +error (e.g., bug) which caused lziprecover to panic. +.SH "REPORTING BUGS" +Report bugs to lzip\-bug@nongnu.org +.br +Lziprecover home page: http://www.nongnu.org/lzip/lziprecover.html +.SH COPYRIGHT +Copyright \(co 2024 Antonio Diaz Diaz. +License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html> +.br +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. +.SH "SEE ALSO" +The full documentation for +.B lziprecover +is maintained as a Texinfo manual. If the +.B info +and +.B lziprecover +programs are properly installed at your site, the command +.IP +.B info lziprecover +.PP +should give you access to the complete manual. diff --git a/doc/lziprecover.info b/doc/lziprecover.info new file mode 100644 index 0000000..b1f820f --- /dev/null +++ b/doc/lziprecover.info @@ -0,0 +1,1536 @@ +This is lziprecover.info, produced by makeinfo version 4.13+ from +lziprecover.texi. + +INFO-DIR-SECTION Compression +START-INFO-DIR-ENTRY +* Lziprecover: (lziprecover). 
Data recovery tool for the lzip format +END-INFO-DIR-ENTRY + + +File: lziprecover.info, Node: Top, Next: Introduction, Up: (dir) + +Lziprecover Manual +****************** + +This manual is for Lziprecover (version 1.24, 20 January 2024). + +* Menu: + +* Introduction:: Purpose and features of lziprecover +* Invoking lziprecover:: Command-line interface +* Data safety:: Protecting data from accidental loss +* Repairing one byte:: Fixing bit flips and similar errors +* Merging files:: Fixing several damaged copies +* Reproducing one sector:: Fixing a missing (zeroed) sector +* Tarlz:: Options supporting the tar.lz format +* File names:: Names of the files produced by lziprecover +* File format:: Detailed format of the compressed file +* Trailing data:: Extra data appended to the file +* Examples:: A small tutorial with examples +* Unzcrash:: Testing the robustness of decompressors +* Problems:: Reporting bugs +* Concept index:: Index of concepts + + + Copyright (C) 2009-2024 Antonio Diaz Diaz. + + This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it. + + +File: lziprecover.info, Node: Introduction, Next: Invoking lziprecover, Prev: Top, Up: Top + +1 Introduction +************** + +Lziprecover is a data recovery tool and decompressor for files in the lzip +compressed data format (.lz). Lziprecover is able to repair slightly damaged +files (up to one single-byte error per member), produce a correct file by +merging the good parts of two or more damaged copies, reproduce a missing +(zeroed) sector using a reference file, extract data from damaged files, +decompress files, and test integrity of files. + + Lziprecover can remove the damaged members from multimember files, for +example multimember tar.lz archives. + + Lziprecover provides random access to the data in multimember files; it +only decompresses the members containing the desired data. 
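The random access described above can be pictured with a small sketch. It is only an illustration, not lziprecover's code: the function name `members_for_range` and the plain list of member sizes are hypothetical, and it assumes the decompressed size of every member is already known. Given a requested range of decompressed bytes, it returns the members that would have to be decompressed. Indices are 0-based here, whereas lziprecover numbers members from 1.

```python
from bisect import bisect_right
from itertools import accumulate


def members_for_range(member_sizes, begin, end):
    """Return 0-based indices of the members overlapping bytes [begin, end).

    member_sizes holds the decompressed size of each member, in file order
    (a hypothetical input; a real tool would have to obtain these sizes
    from the file itself).
    """
    # ends[i] is the decompressed offset just past the last byte of member i.
    ends = list(accumulate(member_sizes))
    total = ends[-1] if ends else 0
    end = min(end, total)          # clamp the request to the available data
    if begin < 0 or begin >= end:
        return []
    # First member whose end offset lies beyond 'begin'.
    first = bisect_right(ends, begin)
    # Member that contains the last requested byte (end - 1).
    last = bisect_right(ends, end - 1)
    return list(range(first, last + 1))
```

For three members of 100, 50, and 200 decompressed bytes, `members_for_range([100, 50, 200], 120, 160)` selects the second and third members only; the first one is never touched.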
+ + Lziprecover facilitates the management of metadata stored as trailing +data in lzip files. + + Lziprecover is not a replacement for regular backups, but a last line of +defense for the case where the backups are also damaged. + + The lzip file format is designed for data sharing and long-term +archiving, taking into account both data integrity and decoder availability: + + * The lzip format provides very safe integrity checking and some data + recovery means. The program lziprecover can repair bit flip errors + (one of the most common forms of data corruption) in lzip files, and + provides data recovery capabilities, including error-checked merging + of damaged copies of a file. *Note Data safety::. + + * The lzip format is as simple as possible (but not simpler). The lzip + manual provides the source code of a simple decompressor along with a + detailed explanation of how it works, so that with the only help of the + lzip manual it would be possible for a digital archaeologist to extract + the data from a lzip file long after quantum computers eventually + render LZMA obsolete. + + * Additionally the lzip reference implementation is copylefted, which + guarantees that it will remain free forever. + + A nice feature of the lzip format is that a corrupt byte is easier to +repair the nearer it is from the beginning of the file. Therefore, with the +help of lziprecover, losing an entire archive just because of a corrupt +byte near the beginning is a thing of the past. + + Compression may be good for long-term archiving. For compressible data, +multiple compressed copies may provide redundancy in a more useful form and +may have a better chance of surviving intact than one uncompressed copy +using the same amount of storage space. This is especially true if the +format provides recovery capabilities like those of lziprecover, which is +able to find and combine the good parts of several damaged copies. 
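The idea of combining the good parts of several damaged copies can be sketched as a toy example. This is not lziprecover's algorithm: the function `merge_copies`, its `max_diffs` limit, and the use of a CRC32 computed over the raw bytes are assumptions made for illustration. The sketch takes two equally sized damaged copies and a checksum of the intact data, then tries every way of picking each differing byte from one copy or the other until the checksum matches.

```python
import zlib
from itertools import product


def merge_copies(copy_a, copy_b, expected_crc, max_diffs=16):
    """Toy merge of two damaged copies, checked against a known CRC32.

    Returns the merged bytes, or None if no combination matches.
    Only works when no position is damaged in both copies.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("copies must have the same length")
    # Positions where the copies disagree; one of the two is presumably good.
    diffs = [i for i, (x, y) in enumerate(zip(copy_a, copy_b)) if x != y]
    if len(diffs) > max_diffs:
        raise ValueError("too many differing bytes for an exhaustive search")
    candidate = bytearray(copy_a)
    # Try every assignment of "take this byte from A" or "from B".
    for choice in product((copy_a, copy_b), repeat=len(diffs)):
        for pos, source in zip(diffs, choice):
            candidate[pos] = source[pos]
        if zlib.crc32(candidate) == expected_crc:
            return bytes(candidate)
    return None
```

The exhaustive search grows exponentially with the number of differing bytes, which is why the sketch caps it with `max_diffs`; a practical tool would need to group adjacent differences and verify candidates far more cleverly.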
+ + Lziprecover is able to recover or decompress files produced by any of the +compressors in the lzip family: lzip, plzip, minilzip/lzlib, clzip, and +pdlzip. + + If the cause of file corruption is a damaged medium, the combination +GNU ddrescue + lziprecover is the recommended option for recovering data +from damaged lzip files. *Note ddrescue-example::, and *note +ddrescue-example2::, for examples. + + If a file is too damaged for lziprecover to repair it, all the +recoverable data in all members of the file can be extracted with the +following command (the resulting file may contain errors and some garbage +data may be produced at the end of each damaged member): + + lziprecover -cd --ignore-errors file.lz > file + + When recovering data, lziprecover takes as arguments the names of the +damaged files and writes zero or more recovered files depending on the +operation selected and whether the recovery succeeded or not. The damaged +files themselves are kept unchanged. + + When decompressing or testing file integrity, lziprecover behaves like +lzip or lunzip. + + LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never +have been compressed. Decompressed is used to refer to data which have +undergone the process of decompression. + + +File: lziprecover.info, Node: Invoking lziprecover, Next: Data safety, Prev: Introduction, Up: Top + +2 Invoking lziprecover +********************** + +The format for running lziprecover is: + + lziprecover [OPTIONS] [FILES] + +When decompressing or testing, a hyphen '-' used as a FILE argument means +standard input. It can be mixed with other FILES and is read just once, the +first time it appears in the command line. If no file names are specified, +lziprecover decompresses from standard input to standard output. Remember +to prepend './' to any file name beginning with a hyphen, or use '--'. + + lziprecover supports the following options: *Note Argument syntax: +(arg_parser)Argument syntax. 
+ +'-h' +'--help' + Print an informative help message describing the options and exit. + +'-V' +'--version' + Print the version number of lziprecover on the standard output and + exit. This version number should be included in all bug reports. + +'-a' +'--trailing-error' + Exit with error status 2 if any remaining input is detected after + decompressing the last member. Such remaining input is usually trailing + garbage that can be safely ignored. *Note concat-example::. + +'-A' +'--alone-to-lz' + Convert lzma-alone files to lzip format without recompressing, just + adding a lzip header and trailer. The conversion minimizes the + dictionary size of the resulting file (and therefore the amount of + memory required to decompress it). Only streamed files with default + LZMA properties can be converted; non-streamed lzma-alone files lack + the "End Of Stream" marker required in lzip files. + + The name of the converted lzip file is derived from that of the + original lzma-alone file as follows: + + filename.lzma becomes filename.lz + filename.tlz becomes filename.tar.lz + anyothername becomes anyothername.lz + +'-c' +'--stdout' + Write decompressed data to standard output; keep input files + unchanged. This option (or '-o') is needed when reading from a named + pipe (fifo) or from a device. Use it also to recover as much of the + decompressed data as possible when decompressing a corrupt file. '-c' + overrides '-o'. '-c' has no effect when merging, removing members, + repairing, reproducing, splitting, testing or listing. + +'-d' +'--decompress' + Decompress the files specified. The integrity of the files specified is + checked. If a file does not exist, can't be opened, or the destination + file already exists and '--force' has not been specified, lziprecover + continues decompressing the rest of the files and exits with error + status 1. 
If a file fails to decompress, or is a terminal, lziprecover + exits immediately with error status 2 without decompressing the rest + of the files. A terminal is considered an uncompressed file, and + therefore invalid. + +'-D RANGE' +'--range-decompress=RANGE' + Decompress only a range of bytes starting at decompressed byte position + BEGIN and up to byte position END - 1. Byte positions start at 0. This + option provides random access to the data in multimember files; it + only decompresses the members containing the desired data. In order to + guarantee the correctness of the data produced, all members containing + any part of the desired data are decompressed and their integrity is + checked. + + Four formats of RANGE are recognized, 'BEGIN', 'BEGIN-END', + 'BEGIN,SIZE', and ',SIZE'. If only BEGIN is specified, END is taken as + the end of the file. If only SIZE is specified, BEGIN is taken as the + beginning of the file. The bytes produced are sent to standard output + unless the option '--output' is used. + +'-e' +'--reproduce' + Try to recover a missing (zeroed) sector in FILE using a reference + file and the same version of lzip that created FILE. If successful, a + repaired copy is written to the file FILE_fixed.lz. FILE is not + modified at all. The exit status is 0 if the member containing the + zeroed sector could be repaired, 2 otherwise. Note that FILE_fixed.lz + may still contain errors in the members following the one repaired. + *Note Reproducing one sector::, for a complete description of the + reproduce mode. + +'--lzip-level=DIGIT|a|m[LENGTH]' + Try only the given compression level or match length limit when + reproducing a zeroed sector. '--lzip-level=a' tries all the + compression levels (0 to 9), while '--lzip-level=m' tries all the + match length limits (5 to 273). + +'--lzip-name=NAME' + Set the name of the lzip executable used by '--reproduce'. If + '--lzip-name' is not specified, 'lzip' is used. 
+ +'--reference-file=FILE' + Set the reference file used by '--reproduce'. It must contain the + uncompressed data corresponding to the missing compressed data of the + zeroed sector, plus some context data before and after them. + +'-f' +'--force' + Force overwrite of output files. + +'-i' +'--ignore-errors' + Make '--decompress', '--test', and '--range-decompress' ignore format + and data errors and continue decompressing the remaining members in + the file; keep input files unchanged. For example, the commands + 'lziprecover -cd -i file.lz > file' or + 'lziprecover -D0 -i file.lz > file' decompress all the recoverable + data in all members of 'file.lz' without having to split it first. The + '-cd -i' method resyncs to the next member header after each error, + and is immune to some format errors that make '-D0 -i' fail. The range + decompressed may be smaller than the range requested, because of the + errors. The exit status is set to 0 unless other errors are found (I/O + errors, for example). + + Make '--list', '--dump', '--remove', and '--strip' ignore format + errors. The sizes of the members with errors (especially the last) may + be wrong. + +'-k' +'--keep' + Keep (don't delete) input files during decompression. + +'-l' +'--list' + Print the uncompressed size, compressed size, and percentage saved of + the files specified. Trailing data are ignored. The values produced + are correct even for multimember files. If more than one file is + given, a final line containing the cumulative sizes is printed. With + '-v', the dictionary size, the number of members in the file, and the + amount of trailing data (if any) are also printed. With '-vv', the + positions and sizes of each member in multimember files are also + printed. With '-i', format errors are ignored, and with '-ivv', gaps + between members are shown. The member numbers shown coincide with the + file numbers produced by '--split'. 
+ + If any file is damaged, does not exist, can't be opened, or is not + regular, the final exit status is > 0. '-lq' can be used to check + quickly (without decompressing) the structural integrity of the files + specified. (Use '--test' to check the data integrity). '-alq' + additionally checks that none of the files specified contain trailing + data. + +'-m' +'--merge' + Try to produce a correct file by merging the good parts of two or more + damaged copies. If successful, a repaired copy is written to the file + FILE_fixed.lz. The exit status is 0 if a correct file could be + produced, 2 otherwise. *Note Merging files::, for a complete + description of the merge mode. + +'-o FILE' +'--output=FILE' + Place the repaired output into FILE instead of into FILE_fixed.lz. If + splitting, the names of the files produced are in the form + 'rec01FILE', 'rec02FILE', etc. + + If '-c' has not been also specified, write the (de)compressed output + to FILE, automatically creating any missing parent directories; keep + input files unchanged. This option (or '-c') is needed when reading + from a named pipe (fifo) or from a device. '-o -' is equivalent to + '-c'. '-o' has no effect when testing or listing. + +'-q' +'--quiet' + Quiet operation. Suppress all messages. + +'-R' +'--byte-repair' + Try to repair a FILE with small errors (up to one single-byte error + per member). If successful, a repaired copy is written to the file + FILE_fixed.lz. FILE is not modified at all. The exit status is 0 if + the file could be repaired, 2 otherwise. *Note Repairing one byte::, + for a complete description of the repair mode. + +'-s' +'--split' + Search for members in FILE and write each member in its own file. Gaps + between members are detected and each gap is saved in its own file. + Trailing data (if any) are saved alone in the last file. 
You can then + use 'lziprecover -t' to test the integrity of the resulting files, + decompress those which are undamaged, and try to repair or partially + decompress those which are damaged. Gaps may contain garbage or may be + members with corrupt headers or trailers. If other lziprecover + functions fail to work on a multimember FILE because of damage in + headers or trailers, try to split FILE and then work on each member + individually. + + The names of the files produced are in the form 'rec01FILE', + 'rec02FILE', etc, and are designed so that the use of wildcards in + subsequent processing, for example, + 'lziprecover -cd rec*FILE > recovered_data', processes the files in + the correct order. The number of digits used in the names varies + depending on the number of members in FILE. + +'-t' +'--test' + Check integrity of the files specified, but don't decompress them. This + really performs a trial decompression and throws away the result. Use + it together with '-v' to see information about the files. If a file + fails the test, does not exist, can't be opened, or is a terminal, + lziprecover continues testing the rest of the files. A final + diagnostic is shown at verbosity level 1 or higher if any file fails + the test when testing multiple files. + +'-v' +'--verbose' + Verbose mode. + When decompressing or testing, further -v's (up to 4) increase the + verbosity level, showing status, compression ratio, dictionary size, + trailer contents (CRC, data size, member size), and up to 6 bytes of + trailing data (if any) both in hexadecimal and as a string of printable + ASCII characters. + Two or more '-v' options show the progress of decompression. + In other modes, increasing verbosity levels show final status, progress + of operations, and extra information (for example, the failed areas). 
+ +'--dump=[MEMBER_LIST][:damaged][:empty][:tdata]' + Dump the members listed, the damaged members (if any), the empty + members (if any), or the trailing data (if any) of one or more regular + multimember files to standard output, or to a file if the option + '--output' is used. If more than one file is given, the elements + dumped from all the files are concatenated. If a file does not exist, + can't be opened, or is not regular, lziprecover continues processing + the rest of the files. If the dump fails in one file, lziprecover + exits immediately without processing the rest of the files. Only + '--dump=tdata' can write to a terminal. '--dump=damaged' implies + '--ignore-errors'. + + The argument to '--dump' is a colon-separated list of the following + element specifiers; a member list (1,3-6), a reverse member list + (r1,3-6), and the strings "damaged", "empty", and "tdata" (which may + be shortened to 'd', 'e', and 't' respectively). A member list selects + the members (or gaps) listed, whose numbers coincide with those shown + by '--list'. A reverse member list selects the members listed counting + from the last member in the file (r1). Negated versions of both kinds + of lists exist (^1,3-6:r^1,3-6) which select all the members except + those in the list. The strings "damaged", "empty", and "tdata" select + the damaged members, the empty members (those with a data size = 0), + and the trailing data respectively. If the same member is selected + more than once, for example by '1:r1' in a single-member file, it is + dumped just once. 
See the following examples: + + '--dump' argument Elements dumped + --------------------------------------------------------------------- + '1,3-6' members 1, 3, 4, 5, 6 + 'r1-3' last 3 members in file + '^13,15' all but 13th and 15th members in file + 'r^1' all but last member in file + 'damaged' all damaged members in file + 'empty' all empty members in file + 'tdata' trailing data + '1-5:r1:tdata' members 1 to 5, last member, trailing data + 'damaged:tdata' damaged members, trailing data + '3,12:damaged:tdata' members 3, 12, damaged members, trailing data + +'--remove=[MEMBER_LIST][:damaged][:empty][:tdata]' + Remove the members listed, the damaged members (if any), the empty + members (if any), or the trailing data (if any) from regular + multimember files in place. The date of each file modified is + preserved if possible. If all members in a file are selected to be + removed, the file is left unchanged and the exit status is set to 2. + If a file does not exist, can't be opened, is not regular, or is left + unchanged, lziprecover continues processing the rest of the files. In + case of I/O error, lziprecover exits immediately without processing + the rest of the files. See '--dump' above for a description of the + argument. + + This option may be dangerous even if only the trailing data are being + removed because the file may be corrupt or the trailing data may + contain a forbidden combination of characters. *Note Trailing data::. + It is safer to send the output of '--strip' to a temporary file, check + it, and then copy it over the original file. But if you prefer + '--remove' because of its more efficient in-place removal, it is + advisable to make a backup before attempting the removal. At least + check that 'lzip -cd file.lz | wc -c' and the uncompressed size shown + by 'lzip -l file.lz' match before attempting the removal of trailing + data. 
+ +'--strip=[MEMBER_LIST][:damaged][:empty][:tdata]' + Copy one or more regular multimember files to standard output (or to a + file if the option '--output' is used), stripping the members listed, + the damaged members (if any), the empty members (if any), or the + trailing data (if any) from each file. If all members in a file are + selected to be stripped, the trailing data (if any) are also stripped + even if 'tdata' is not specified. If more than one file is given, the + files are concatenated. In this case the trailing data are also + stripped from all but the last file even if 'tdata' is not specified. + If a file does not exist, can't be opened, or is not regular, + lziprecover continues processing the rest of the files. If a file + fails to copy, lziprecover exits immediately without processing the + rest of the files. See '--dump' above for a description of the + argument. + +'--empty-error' + Exit with error status 2 if any empty member is found in the input + files. + +'--marking-error' + Exit with error status 2 if the first LZMA byte is non-zero in any + member of the input files. This may be caused by data corruption or by + deliberate insertion of tracking information in the file. Use + 'lziprecover --clear-marking' to clear any such non-zero bytes. + +'--loose-trailing' + When decompressing, testing, or listing, allow trailing data whose + first bytes are so similar to the magic bytes of a lzip header that + they can be confused with a corrupt header. Use this option if a file + triggers a "corrupt header" error and the cause is not indeed a + corrupt header. + +'--clear-marking' + Set to zero the first LZMA byte of each member in the files specified. + At verbosity level 1 (-v), print the number of members cleared. The + date of each file modified is preserved if possible. 
This option + exists because the first byte of the LZMA stream is ignored by the + range decoder, and can therefore be (mis)used to store any value which + can then be used as a watermark to track the path of the compressed + payload. + + + Lziprecover also supports the following debug options (for experts): + +'-E RANGE[,SECTOR_SIZE]' +'--debug-reproduce=RANGE[,SECTOR_SIZE]' + Load the compressed FILE into memory, set all bytes in the positions + specified by RANGE to 0, and try to reproduce a correct compressed + file. *Note --reproduce::. *Note range-format::, for a description of + RANGE. If a SECTOR_SIZE is specified, set each sector to 0 in sequence + and try to reproduce the file, printing to standard output final + statistics of the number of sectors reproduced successfully. Exit with + nonzero status only in case of fatal error. + +'-M' +'--md5sum' + Print to standard output the MD5 digests of the input FILES one per + line in the same format produced by the 'md5sum' tool. Lziprecover + uses MD5 digests to check the result of some operations. This option + can be used to test the correctness of lziprecover's implementation of + the MD5 algorithm. + +'-S[VALUE]' +'--nrep-stats[=VALUE]' + Compare the frequency of sequences of N repeated bytes of a given + VALUE in the compressed LZMA streams of the input FILES with the + frequency expected for random data (1 / 2^(8N)). If VALUE is not + specified, print the frequency of repeated sequences of all possible + byte values. Print cumulative data for all the files, followed by the + name of the first file with the longest sequence. + +'-U 1|BSIZE' +'--unzcrash=1|BSIZE' + With argument '1', test 1-bit errors in the LZMA stream of the + compressed input FILE like the command + 'unzcrash -b1 -p7 -s-20 'lzip -t' FILE' but in memory, and therefore + much faster (30 to 50 times faster). *Note Unzcrash::. This option + tests all the members independently in a multimember file, skipping + headers and trailers. 
If a decompression succeeds, the decompressed + output is compared with the decompressed output of the original FILE + using MD5 digests. FILE must not contain errors and must decompress + correctly for the comparisons to work. + + With argument 'B', test zeroed sectors (blocks of bytes) in the LZMA + stream of the compressed input FILE like the command + 'unzcrash --block=SIZE -d1 -p7 -s-(SIZE+20) 'lzip -t' FILE' but in + memory, and therefore much faster. Testing and comparisons work just + like with the argument '1' explained above. + + By default '--unzcrash' only prints the interesting cases; CRC + mismatches, size mismatches, unsupported marker codes, unexpected EOFs, + apparently successful decompressions, and decoder errors detected + 50_000 or more bytes beyond the byte (or the start of the block) being + tested. At verbosity level 1 (-v) it also prints decoder errors + detected 10_000 or more bytes beyond the byte being tested. At + verbosity level 2 (-vv) it prints all cases for 1-bit errors or the + decoder errors detected beyond the end of the block for zeroed blocks. + +'-W POSITION,VALUE' +'--debug-decompress=POSITION,VALUE' + Load the compressed FILE into memory, set the byte at POSITION to + VALUE, and decompress the modified compressed data to standard output. + If the damaged member can be decompressed to the end (just fails with + a CRC mismatch), the members following it are also decompressed. + +'-X[POSITION,VALUE]' +'--show-packets[=POSITION,VALUE]' + Load the compressed FILE into memory, optionally set the byte at + POSITION to VALUE, decompress the modified compressed data (discarding + the output), and print to standard output descriptions of the LZMA + packets being decoded. + +'-Y RANGE' +'--debug-delay=RANGE' + Load the compressed FILE into memory and then repeatedly decompress + it, increasing 256 times each byte of the subset of the compressed data + positions specified by RANGE, so as to test all possible one-byte + errors. 
For each decompression error find the error detection delay and + print to standard output the maximum delay. The error detection delay + is the difference between the position of the error and the position + where the decoder realized that the data contains an error. *Note + range-format::, for a description of RANGE. + +'-Z POSITION,VALUE' +'--debug-byte-repair=POSITION,VALUE' + Load the compressed FILE into memory, set the byte at POSITION to + VALUE, and then try to repair the byte error. *Note --byte-repair::. + + + Numbers given as arguments to options may be expressed in decimal, +hexadecimal, or octal (using the same syntax as integer constants in C++), +and may be followed by a multiplier and an optional 'B' for "byte". + + Table of SI and binary prefixes (unit multipliers): + +Prefix Value | Prefix Value +k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024) +M megabyte (10^6) | Mi mebibyte (2^20) +G gigabyte (10^9) | Gi gibibyte (2^30) +T terabyte (10^12) | Ti tebibyte (2^40) +P petabyte (10^15) | Pi pebibyte (2^50) +E exabyte (10^18) | Ei exbibyte (2^60) +Z zettabyte (10^21) | Zi zebibyte (2^70) +Y yottabyte (10^24) | Yi yobibyte (2^80) +R ronnabyte (10^27) | Ri robibyte (2^90) +Q quettabyte (10^30) | Qi quebibyte (2^100) + + + Exit status: 0 for a normal exit, 1 for environmental problems (file not +found, invalid command-line options, I/O errors, etc), 2 to indicate a +corrupt or invalid input file, 3 for an internal consistency error (e.g., +bug) which caused lziprecover to panic. + + +File: lziprecover.info, Node: Data safety, Next: Repairing one byte, Prev: Invoking lziprecover, Up: Top + +3 Protecting data from accidental loss +************************************** + +It is a fact of life that sometimes data becomes corrupt. Software has +errors. Hardware may misbehave or fail. RAM may be struck by a cosmic ray. 
+This is why sufficiently safe integrity checking is needed in compressed +formats, and why a data recovery tool is sometimes needed. + + There are 3 main types of data corruption that may cause data loss: +single-byte errors, multibyte errors (generally affecting a whole sector in +a block device), and total device failure. + + Lziprecover protects natively against single-byte errors as long as file +integrity is checked frequently enough that a second single-byte error does +not develop in the same member before the first one is repaired. *Note +Repairing one byte::. + + Lziprecover also protects against multibyte errors if at least one backup +copy of the file is made (*note Merging files::), or if the error is a +zeroed sector and the uncompressed data corresponding to the zeroed sector +are available (*note Reproducing one sector::). If you can choose between +merging and reproducing, try merging first because it is usually faster, +easier to use, and has a high probability of success. + + Lziprecover can't help in case of device failure. The only remedy for +total device failure is storing backup copies on separate media. + + The extraordinary safety of the lzip format allows lziprecover to exploit +the redundancy that occurs naturally when making compressed backups. +Lziprecover can recover data that would not be recoverable from files +compressed in other formats. Let's see two examples of how much better +lzip is than gzip and bzip2 with respect to data safety: + +* Menu: + +* Merging with a backup:: Recovering a file using a damaged backup +* Reproducing a mailbox:: Recovering new messages using an old backup + + +File: lziprecover.info, Node: Merging with a backup, Next: Reproducing a mailbox, Up: Data safety + +3.1 Recovering a file using a damaged backup +============================================ + +Let's suppose that you made a compressed backup of your valuable scientific +data and stored two copies on separate media. 
Years later you notice that +both copies are corrupt. + + If you compressed the data with gzip and both copies suffer any damage in +the data stream, even if it is just one altered bit, the original data can +only be recovered by an expert, if at all. + + If you used bzip2, and if the file is large enough to contain more than +one compressed data block (usually larger than 900 kB uncompressed), and if +no block is damaged in both files, then the data can be manually recovered +by splitting the files with bzip2recover, checking every block, and then +copying the right blocks in the right order into another file. + + But if you used lzip, the data can be automatically recovered with +'lziprecover --merge' as long as the damaged areas don't overlap. + + Note that each error in a bzip2 file makes a whole block unusable, but +each error in a lzip file only affects the damaged bytes, making it +possible to recover a file with thousands of errors. + + +File: lziprecover.info, Node: Reproducing a mailbox, Prev: Merging with a backup, Up: Data safety + +3.2 Recovering new messages using an old backup +=============================================== + +Let's suppose that you make periodic backups of your email messages stored +in one or more mailboxes. (A mailbox is a file containing a possibly large +number of email messages). New messages are appended to the end of each +mailbox, therefore the initial part of two consecutive backups is identical +unless some messages have been changed or deleted in the meantime. The new +messages added to each backup are usually a small part of the whole mailbox. 
+ ++============================================+ +| Older backup containing some messages | ++============================================+ ++============================================+========================+ +| Newer backup containing the messages above | plus some new messages | ++============================================+========================+ + + One day you discover that your mailbox has disappeared because you +deleted it inadvertently or because of a bug in your email reader. Not only +that. You need to recover a recent message, but the last backup you made of +the mailbox (the newer backup above) has lost the data corresponding to a +whole sector because of an I/O error in the part containing the old +messages. + + If you compressed the mailbox with gzip, usually none of the new messages +can be recovered even if they are intact because all the data beyond the +missing sector can't be decoded. + + If you used bzip2, and if the newer backup is large enough that the new +messages are in a different compressed data block than the one damaged +(usually larger than 900 kB uncompressed), then you can recover the new +messages manually with bzip2recover. If the backups are identical except for +the new messages appended, you may even recover the whole newer backup by +combining the good blocks from both backups. + + But if you used lzip, the whole newer backup can be automatically +recovered with 'lziprecover --reproduce' as long as the missing bytes can be +recovered from the older backup, even if other messages in the common part +have been changed or deleted. Mailboxes seem to be especially easy to +reproduce. The probability of reproducing a mailbox (*note +performance-of-reproduce::) is almost as high as that of merging two +identical backups (*note performance-of-merge::). 
+ + +File: lziprecover.info, Node: Repairing one byte, Next: Merging files, Prev: Data safety, Up: Top + +4 Repairing one byte +******************** + +Lziprecover can perfectly repair most files with small errors (up to one +single-byte error per member), without needing any extra redundancy at +all. If the repair is successful, the repaired file is identical bit for +bit to the original. This makes lzip files resistant to bit flip, one of the +most common forms of data corruption. + + The file is repaired in memory. Therefore, enough virtual memory +(RAM + swap) to contain the largest damaged member is required. + + The error may be located anywhere in the file except in the first 5 +bytes of each member header or in the 'Member size' field of the trailer +(last 8 bytes of each member). If the error is in the header it can be +easily repaired with a text editor like GNU Moe (*note File format::). If +the error is in the member size, it is enough to ignore the message about +'bad member size' when decompressing. + + Bit flip happens when one bit in the file is changed from 0 to 1 or vice +versa. It may be caused by bad RAM or even by natural radiation. I have +seen a case of bit flip in a file stored on a USB flash drive. + + One byte may seem small, but most file corruptions not produced by +transmission errors or I/O errors affect just one byte, or even one bit, of +the file. Also, unlike magnetic media, where errors usually affect a whole +sector, solid-state storage devices tend to produce single-byte errors, +making lzip the perfect format for data stored on such devices. + + Repairing a file can take some time. Small files or files with the error +located near the beginning can be repaired in a few seconds. But repairing +a large file compressed with a large dictionary size and with the error +located far from the beginning may take hours. 
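To get a feel for why the search can be slow, here is a toy sketch in Python of brute-force single-byte repair. It is not lziprecover's algorithm: it works on a plain byte string guarded by a CRC32 (computed with `zlib.crc32`), whereas a real repair has to re-decode the LZMA stream for every candidate. The function name `repair_one_byte` and the sample data are hypothetical.

```python
import zlib

def repair_one_byte(damaged: bytes, expected_crc: int):
    """Try every position and every other byte value until the CRC matches.

    Returns (position, repaired_bytes), or None if no single-byte change
    restores the expected CRC32.
    """
    buf = bytearray(damaged)
    for pos in range(len(buf)):
        original = buf[pos]
        for value in range(256):
            if value == original:
                continue                  # only the 255 *other* values
            buf[pos] = value
            if zlib.crc32(buf) == expected_crc:
                return pos, bytes(buf)
        buf[pos] = original               # restore before moving on
    return None

# Toy data standing in for a file whose checksum is known.
good = b"valuable scientific data"
crc = zlib.crc32(good)

bad = bytearray(good)
bad[2] ^= 0x10                            # simulate a single bit flip
result = repair_one_byte(bytes(bad), crc)
```

In this toy model the work grows with the number of positions times 255 integrity checks, which gives an intuition for why a large member can take a long time to repair.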
+ + On the other hand, errors located near the beginning of the file cause +much more loss of data than errors located near the end. So lziprecover +repairs more efficiently the worst errors. + + +File: lziprecover.info, Node: Merging files, Next: Reproducing one sector, Prev: Repairing one byte, Up: Top + +5 Merging files +*************** + +If you have several copies of a file but all of them are too damaged to +repair them individually (*note Repairing one byte::), lziprecover can try +to produce a correct file by merging the good parts of the damaged copies. + + The merge may succeed even if some copies of the file have all the +headers and trailers damaged, as long as there is at least one copy of +every header and trailer intact, even if they are in different copies of +the file. + + The merge fails if the damaged areas overlap (at least one byte is +damaged in all copies), or are adjacent and the boundary can't be +determined, or if the copies have too many damaged areas. + + All the copies to be merged must have the same size. If any of them is +larger or smaller than it should, either because it has been truncated or +because it got some garbage data appended at the end, it can be brought to +the correct size with the following command before merging it with the other +copies: + + ddrescue -s<correct_size> -x<correct_size> file.lz correct_size_file.lz + + To give you an idea of its possibilities, when merging two copies, each +of them with one damaged area affecting 1 percent of the copy, the +probability of obtaining a correct file is about 98 percent. With three +such copies the probability rises to 99.97 percent. For large files (a few +MB) with small errors (one sector damaged per copy), the probability +approaches 100 percent even with only two copies. (Supposing that the +errors are randomly located inside each copy). + + Some types of solid-state device (NAND flash, for example) can produce +bursts of scattered single-bit errors. 
Lziprecover is able to merge files +with thousands of such scattered errors by grouping the errors into +clusters and then merging the files as if each cluster were a single error. + + Here is a real case of successful merging. Two copies of the file +'icecat-3.5.3-x86.tar.lz' (compressed size 9 MB) became corrupt while +stored on the same NAND flash device. One of the copies had 76 single-bit +errors scattered in an area of 1020 bytes, and the other had 3028 such +errors in an area of 31729 bytes. Lziprecover produced a correct file, +identical to the original, in just 5 seconds: + + lziprecover -vvm a/icecat-3.5.3-x86.tar.lz b/icecat-3.5.3-x86.tar.lz + Merging member 1 of 1 (2552 errors) + 2552 errors have been grouped in 16 clusters. + Trying variation 2 of 2, block 2 + Input files merged successfully. + + Note that the number of errors reported by lziprecover (2552) is lower +than the number of corrupt bytes (3104) because contiguous corrupt bytes +are counted as a single multibyte error. + + +Example 1: Recover a compressed backup from two copies on CD-ROM with +error-checked merging of copies. *Note GNU ddrescue manual: (ddrescue)Top, +for details about ddrescue. + + ddrescue -d -r1 -b2048 /dev/cdrom cdimage1 mapfile1 + mount -t iso9660 -o loop,ro cdimage1 /mnt/cdimage + cp /mnt/cdimage/backup.tar.lz rescued1.tar.lz + umount /mnt/cdimage + (insert second copy in the CD drive) + ddrescue -d -r1 -b2048 /dev/cdrom cdimage2 mapfile2 + mount -t iso9660 -o loop,ro cdimage2 /mnt/cdimage + cp /mnt/cdimage/backup.tar.lz rescued2.tar.lz + umount /mnt/cdimage + lziprecover -m -v -o backup.tar.lz rescued1.tar.lz rescued2.tar.lz + Input files merged successfully. 
+ lziprecover -tv backup.tar.lz + backup.tar.lz: ok + + +Example 2: Recover the first volume of those created with the command +'lzip -b 32MiB -S 650MB big_db' from two copies, 'big_db1_00001.lz' and +'big_db2_00001.lz', with member 07 damaged in the first copy, member 18 +damaged in the second copy, and member 12 damaged in both copies. The +correct file produced is saved in 'big_db_00001.lz'. + + lziprecover -m -v -o big_db_00001.lz big_db1_00001.lz big_db2_00001.lz + Input files merged successfully. + lziprecover -tv big_db_00001.lz + big_db_00001.lz: ok + + +File: lziprecover.info, Node: Reproducing one sector, Next: Tarlz, Prev: Merging files, Up: Top + +6 Reproducing one sector +************************ + +Lziprecover can recover a zeroed sector in a lzip file by concatenating the +decompressed contents of the file up to the beginning of the zeroed sector +and the uncompressed data corresponding to the zeroed sector, and then +feeding the concatenated data to the same version of lzip that created the +file. For this to work, a reference file is required containing the +uncompressed data corresponding to the missing compressed data of the zeroed +sector, plus some context data before and after them. It is possible to +recover a large file using just a few kB of reference data. + + The difficult part is finding a suitable reference file. It must contain +the exact data required (possibly mixed with other data). Containing similar +data is not enough. + + A zeroed sector may be caused by the incomplete recovery of a damaged +storage device (with I/O errors) using, for example, ddrescue. The +reproduction can't be done if the zeroed sector overlaps with the first 15 +bytes of a member, or if the zeroed sector is smaller than 8 bytes. + + The file is reproduced in memory. Therefore, enough virtual memory +(RAM + swap) to contain the damaged member is required. 
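The two limits stated above (a zeroed sector of at least 8 bytes that does not overlap the first 15 bytes of a member) can be written down as a small check. The Python sketch below is only an illustration: `find_zero_runs` and `can_try_reproduce` are hypothetical helpers, not part of lziprecover, which locates the bad area by itself. A long run of zero bytes is merely suspicious in compressed data, not proof of damage.

```python
def find_zero_runs(data: bytes, min_len: int = 8):
    """Yield (begin, size) for each run of zero bytes at least min_len long."""
    begin = None
    for i, byte in enumerate(data):
        if byte == 0:
            if begin is None:
                begin = i                 # a new run of zeros starts here
        elif begin is not None:
            if i - begin >= min_len:
                yield begin, i - begin
            begin = None
    if begin is not None and len(data) - begin >= min_len:
        yield begin, len(data) - begin    # run reaching the end of the data

def can_try_reproduce(begin: int, size: int, member_start: int = 0) -> bool:
    """Apply the two limits from the text to a zeroed area of one member."""
    overlaps_first_15 = (begin < member_start + 15
                         and begin + size > member_start)
    return size >= 8 and not overlaps_first_15

# Synthetic buffer: 20 nonzero bytes, a zeroed 512-byte sector, 10 more bytes.
sample = b"\x01" * 20 + b"\x00" * 512 + b"\x02" * 10
runs = list(find_zero_runs(sample))
```

For the synthetic buffer, the single run found starts at offset 20 and is 512 bytes long, so it passes both limits.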
+ + To understand how it works, take any lzipped file, say 'foo.lz', +decompress it (keeping the original), and try to reproduce an artificially +zeroed sector in it by running the following commands: + + lzip -kd foo.lz + lziprecover -vv --debug-reproduce=65536,512 --reference-file=foo foo.lz + +which should produce an output like the following: + + Reproducing: foo.lz + Reference file: foo + Testing sectors of size 512 at file positions 65536 to 66047 + (master mpos = 65536, dpos = 296892) + foo: Match found at offset 296892 + Reproduction succeeded at pos 65536 + + 1 sectors tested + 1 reproductions returned with zero status + all comparisons passed + + Using 'foo' as reference file guarantees that any zeroed sector in +'foo.lz' can be reproduced because both files contain the same data. In +real use, the reference file needs to contain the data corresponding to the +zeroed sector, but the rest of the data (if any) may differ between both +files. The reference data may be obtained from the partial decompression of +the damaged file itself if it contains repeated data. For example if the +damaged file is a compressed tarball containing several partially modified +versions of the same file. + + The offset reported by lziprecover is the position in the reference file +of the first byte that could not be decompressed. This is the first byte +that will be compressed to reproduce the zeroed sector. + + The reproduce mode tries to reproduce the missing compressed data +originally present in the zeroed sector. It is based on the perfect +reproducibility of lzip files (lzip produces identical compressed output +from identical input). Therefore, the same version of lzip that created the +file to be reproduced should be used to reproduce the zeroed sector. Near +versions may also work because the output of lzip changes infrequently. 
If +reproducing a tar.lz archive created with tarlz, the version of lzip, +clzip, or minilzip corresponding to the version of the lzlib library used +by tarlz to create the archive should be used. + + When recovering a tar.lz archive and using as reference a file from the +filesystem, if the zeroed sector encodes (part of) a tar header, the archive +can't be reproduced. Therefore, the less overhead (smaller headers) a tar +archive has, the more probable it is that the zeroed sector does not include a +header, and that the archive can be reproduced. The tarlz format has minimum +overhead. It uses basic ustar headers, and only adds extended pax headers +when they are required. + +6.1 Performance of '--reproduce' +================================ + +Reproduce mode is especially useful when recovering a corrupt backup (or a +corrupt source tarball) that is part of a series. Usually only a small +fraction of the data changes from one backup to the next or from one version +of a source tarball to the next. This sometimes makes it possible to reproduce +a given corrupted version using reference data from a nearby version. The +following two tables show the fraction of reproducible sectors (reproducible +sectors divided by total sectors in archive) for some archives, using sector +sizes of 512 and 4096 bytes. 'mailbox-aug.tar.lz' is a backup of some of my +mailboxes. 
'backup-feb.tar.lz' and 'backup-apr.tar.lz' are real backups of
+my own working directory:
+
+Reference file    File                 Reproducible (512)
+---------------------------------------------------------
+backup-feb.tar    backup-apr.tar.lz    3273 / 4342 = 75.38%
+backup-apr.tar    backup-feb.tar.lz    3259 / 4161 = 78.32%
+gawk-5.0.0.tar    gawk-5.0.1.tar.lz    4369 / 5844 = 74.76%
+gawk-5.0.1.tar    gawk-5.0.0.tar.lz    4379 / 5603 = 78.15%
+gmp-6.1.1.tar     gmp-6.1.2.tar.lz     2454 / 3787 = 64.80%
+gmp-6.1.2.tar     gmp-6.1.1.tar.lz     2461 / 3782 = 65.07%
+
+Reference file    File                  Reproducible (4096)
+-----------------------------------------------------------
+mailbox-mar.tar   mailbox-aug.tar.lz    4036 / 4252 = 94.92%
+backup-feb.tar    backup-apr.tar.lz      264 /  542 = 48.71%
+backup-apr.tar    backup-feb.tar.lz      264 /  520 = 50.77%
+gawk-5.0.0.tar    gawk-5.0.1.tar.lz      327 /  730 = 44.79%
+gawk-5.0.1.tar    gawk-5.0.0.tar.lz      326 /  700 = 46.57%
+gmp-6.1.1.tar     gmp-6.1.2.tar.lz       175 /  473 = 37.00%
+gmp-6.1.2.tar     gmp-6.1.1.tar.lz       181 /  472 = 38.35%
+
+   Note that the "performance of reproduce" is a probability, not a partial
+recovery. The data are either recovered fully (with the probability X shown
+in the last column of the tables above) or not recovered at all (with
+probability 1 - X).
+
+Example 1: Recover a damaged source tarball with a zeroed sector of 512
+bytes at file position 1019904, using as reference another source tarball
+for a different version of the software.
+
+     lziprecover -vv -e --reference-file=gmp-6.1.1.tar gmp-6.1.2.tar.lz
+     Reproducing bad area in member 1 of 1
+       (begin = 1019904, size = 512, value = 0x00)
+       (master mpos = 1019904, dpos = 6292134)
+     warning: gmp-6.1.1.tar: Partial match found at offset 6277798, len 8716.
+     Reference data may be mixed with other data.
+     Trying level -9
+       Reproducing position 1015808
+     Member reproduced successfully.
+     Copy of input file reproduced successfully.
+ + +Example 2: Recover a damaged backup with a zeroed sector of 4096 bytes at +file position 1019904, using as reference a previous backup. The damaged +backup comes from a damaged partition copied with ddrescue. + + ddrescue -b4096 -r10 /dev/sdc1 hdimage mapfile + mount -o loop,ro hdimage /mnt/hdimage + cp /mnt/hdimage/backup.tar.lz backup.tar.lz + umount /mnt/hdimage + lzip -t backup.tar.lz + backup.tar.lz: Decoder error at pos 1020530 + lziprecover -vv -e --reference-file=old_backup.tar backup.tar.lz + Reproducing bad area in member 1 of 1 + (begin = 1019904, size = 4096, value = 0x00) + (master mpos = 1019903, dpos = 5857954) + warning: old_backup.tar: Partial match found at offset 5743778, len 9546. + Reference data may be mixed with other data. + Trying level -9 + Reproducing position 1015808 + Member reproduced successfully. + Copy of input file reproduced successfully. + + +Example 3: Recover a damaged backup with a zeroed sector of 4096 bytes at +file position 1019904, using as reference a file from the filesystem. (If +the zeroed sector encodes (part of) a tar header, the tarball can't be +reproduced). + + # List the contents of the backup tarball to locate the damaged member. + tarlz -n0 -tvf backup.tar.lz + [...] + example.txt + tarlz: Skipping to next header. + tarlz: backup.tar.lz: Archive ends unexpectedly. + # Find in the filesystem the last file listed and use it as reference. + lziprecover -vv -e --reference-file=/somedir/example.txt backup.tar.lz + Reproducing bad area in member 1 of 1 + (begin = 1019904, size = 4096, value = 0x00) + (master mpos = 1019903, dpos = 5857954) + /somedir/example.txt: Match found at offset 9378 + Trying level -9 + Reproducing position 1015808 + Member reproduced successfully. + Copy of input file reproduced successfully. + + If 'backup.tar.lz' is a multimember file with more than one member +damaged and lziprecover shows the message 'One member reproduced. 
Copy of +input file still contains errors.', the procedure shown in the example +above can be repeated until all the members have been reproduced. + + 'tarlz --keep-damaged -n0 -xf backup.tar.lz example.txt' produces a +partial copy of the reference file 'example.txt' that may help locate a +complete copy in the filesystem or in another backup, even if 'example.txt' +has been renamed. + + +File: lziprecover.info, Node: Tarlz, Next: File names, Prev: Reproducing one sector, Up: Top + +7 Options supporting the tar.lz format +************************************** + +Tarlz is a massively parallel (multi-threaded) combined implementation of +the tar archiver and the lzip compressor. + + Tarlz creates tar archives using a simplified and safer variant of the +POSIX pax format compressed in lzip format, keeping the alignment between +tar members and lzip members. The resulting multimember tar.lz archive is +backward compatible with standard tar tools like GNU tar, which treat it +like any other tar.lz archive. *Note tarlz manual: (tarlz)Top, and *note +lzip manual: (lzip)Top. + + Multimember tar.lz archives have some safety advantages over solidly +compressed tar.lz archives. For example, in case of corruption, tarlz can +extract all the undamaged members from the tar.lz archive, skipping over the +damaged members, just like the standard (uncompressed) tar. Keeping the +alignment between tar members and lzip members minimizes the amount of data +lost in case of corruption. In this chapter we'll explain the ways in which +lziprecover can recover and process multimember tar.lz archives. + + +7.1 Recovering damaged multimember tar.lz archives +================================================== + +If you have several copies of the damaged archive, try merging them first +because merging has a high probability of success. *Note Merging files::. If +the command below prints something like 'Input files merged successfully.' 
+you are done and 'archive.tar.lz' now contains the recovered archive: + + lziprecover -m -v -o archive.tar.lz a/archive.tar.lz b/archive.tar.lz + + If you only have one copy of the damaged archive with a zeroed block of +data caused by an I/O error, you may try to reproduce the archive. *Note +Reproducing one sector::. If the command below prints something like +'Copy of input file reproduced successfully.' you are done and +'archive_fixed.tar.lz' now contains the recovered archive: + + lziprecover -vv -e --reference-file=old_archive.tar archive.tar.lz + + If you only have one copy of the damaged archive, you may try to repair +the archive, but this has a lower probability of success. *Note Repairing +one byte::. If the command below prints something like +'Copy of input file repaired successfully.' you are done and +'archive_fixed.tar.lz' now contains the recovered archive: + + lziprecover -v -R archive.tar.lz + + If all the above fails, and the archive was created with tarlz, you may +save the damaged members for later and then copy the good members to another +archive. If the two commands below succeed, 'bad_members.tar.lz' will +contain all the damaged members and 'archive_cleaned.tar.lz' will contain a +good archive with the damaged members removed: + + lziprecover -v --dump=damaged -o bad_members.tar.lz archive.tar.lz + lziprecover -v --strip=damaged -o archive_cleaned.tar.lz archive.tar.lz + + You can then use 'tarlz --keep-damaged' to recover as much data as +possible from each damaged member in 'bad_members.tar.lz': + + mkdir tmp + cd tmp + tarlz --keep-damaged -xvf ../bad_members.tar.lz + + +7.2 Processing multimember tar.lz archives +========================================== + +Lziprecover is able to copy a list of members from a file to another. 
For +example the command +'lziprecover --dump=1-10:r1:tdata archive.tar.lz > subarch.tar.lz' creates +a subset archive containing the first ten members, the end-of-file blocks, +and the trailing data (if any) of 'archive.tar.lz'. The 'r1' part selects +the last member, which in an appendable tar.lz archive contains the +end-of-file blocks. + + +File: lziprecover.info, Node: File names, Next: File format, Prev: Tarlz, Up: Top + +8 Names of the files produced by lziprecover +******************************************** + +The name of the fixed file produced by '--byte-repair' and '--merge' is +made by appending the string '_fixed.lz' to the original file name. If the +original file name ends with one of the extensions '.tar.lz', '.lz', or +'.tlz', the string '_fixed' is inserted before the extension. + + +File: lziprecover.info, Node: File format, Next: Trailing data, Prev: File names, Up: Top + +9 File format +************* + +Perfection is reached, not when there is no longer anything to add, but +when there is no longer anything to take away. +-- Antoine de Saint-Exupery + + + In the diagram below, a box like this: + ++---+ +| | <-- the vertical bars might be missing ++---+ + + represents one byte; a box like this: + ++==============+ +| | ++==============+ + + represents a variable number of bytes. + + + A lzip file consists of one or more independent "members" (compressed +data sets). The members simply appear one after another in the file, with no +additional information before, between, or after them. Each member can +encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The +size of a multimember file is unlimited. 
+ + Each member has the following structure: + ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + All multibyte values are stored in little endian order. + +'ID string (the "magic" bytes)' + A four byte string, identifying the lzip format, with the value "LZIP" + (0x4C, 0x5A, 0x49, 0x50). + +'VN (version number, 1 byte)' + Just in case something needs to be modified in the future. 1 for now. + +'DS (coded dictionary size, 1 byte)' + The dictionary size is calculated by taking a power of 2 (the base + size) and subtracting from it a fraction between 0/16 and 7/16 of the + base size. + Bits 4-0 contain the base 2 logarithm of the base size (12 to 29). + Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract + from the base size to obtain the dictionary size. + Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB + Valid values for dictionary size range from 4 KiB to 512 MiB. + +'LZMA stream' + The LZMA stream, finished by an "End Of Stream" marker. Uses default + values for encoder properties. *Note Stream format: (lzip)Stream + format, for a complete description. + +'CRC32 (4 bytes)' + Cyclic Redundancy Check (CRC) of the original uncompressed data. + +'Data size (8 bytes)' + Size of the original uncompressed data. + +'Member size (8 bytes)' + Total size of the member, including header and trailer. This field acts + as a distributed index, improves the checking of stream integrity, and + facilitates the safe recovery of undamaged members from multimember + files. Lzip limits the member size to 2 PiB to prevent the data size + field from overflowing. 
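As a rough illustration of the layout described above, the following Python sketch decodes the coded dictionary size and reads the fixed-size header and trailer fields of one member. It relies only on what this section states (6-byte header, 20-byte little-endian trailer). The function names, the error handling, and the synthetic member built at the end are assumptions made for the example; the sketch does not decode or validate the LZMA stream.

```python
import struct

HEADER_SIZE = 6      # ID string (4) + VN (1) + DS (1)
TRAILER_SIZE = 20    # CRC32 (4) + Data size (8) + Member size (8)

def decode_dict_size(ds: int) -> int:
    """Decode the DS byte into a dictionary size in bytes."""
    base_log = ds & 0x1F             # bits 4-0: log2 of the base size
    numerator = (ds >> 5) & 0x07     # bits 7-5: sixteenths to subtract
    if not 12 <= base_log <= 29:
        raise ValueError("base size exponent out of range (12 to 29)")
    base = 1 << base_log
    size = base - numerator * (base // 16)
    if not 4 * 1024 <= size <= 512 * 1024 * 1024:
        raise ValueError("dictionary size out of range (4 KiB to 512 MiB)")
    return size

def parse_member(member: bytes) -> dict:
    """Read the header and trailer fields of a single complete member."""
    if len(member) < HEADER_SIZE + TRAILER_SIZE:
        raise ValueError("member too short")
    if member[:4] != b"LZIP":
        raise ValueError("bad magic bytes")
    # All multibyte values are little endian: '<' also disables padding.
    crc32, data_size, member_size = struct.unpack(
        "<IQQ", member[-TRAILER_SIZE:])
    return {
        "version": member[4],
        "dictionary_size": decode_dict_size(member[5]),
        "crc32": crc32,
        "data_size": data_size,
        "member_size": member_size,
    }

# Synthetic member: real header and trailer around 10 placeholder bytes
# standing in for the LZMA stream (not valid compressed data).
fake = (b"LZIP" + bytes([1, 0xD3]) + b"\xAA" * 10
        + struct.pack("<IQQ", 0x12345678, 1000, 36))
info = parse_member(fake)
```

The DS value 0xD3 used here is the one from the example above, so the decoded dictionary size is 320 KiB. Comparing the stored 'Member size' with the actual length of the member is one cheap consistency check a reader can add.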
+ + + +File: lziprecover.info, Node: Trailing data, Next: Examples, Prev: File format, Up: Top + +10 Extra data appended to the file +********************************** + +Sometimes extra data are found appended to a lzip file after the last +member. Such trailing data may be: + + * Padding added to make the file size a multiple of some block size, for + example when writing to a tape. It is safe to append any amount of + padding zero bytes to a lzip file. + + * Useful data added by the user; an "End Of File" string (to check that + the file has not been truncated), a cryptographically secure hash, a + description of file contents, etc. It is safe to append any amount of + text to a lzip file as long as none of the first four bytes of the + text matches the corresponding byte in the string "LZIP", and the text + does not contain any zero bytes (null characters). Nonzero bytes and + zero bytes can't be safely mixed in trailing data. + + * Garbage added by some not totally successful copy operation. + + * Malicious data added to the file in order to make its total size and + hash value (for a chosen hash) coincide with those of another file. + + * In rare cases, trailing data could be the corrupt header of another + member. In multimember or concatenated files the probability of + corruption happening in the magic bytes is 5 times smaller than the + probability of getting a false positive caused by the corruption of the + integrity information itself. Therefore it can be considered to be + below the noise level. Additionally, the test used by lziprecover to + discriminate trailing data from a corrupt header has a Hamming + distance (HD) of 3, and the 3 bit flips must happen in different magic + bytes for the test to fail. In any case, the option '--trailing-error' + guarantees that any corrupt header is detected. 
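The rule for user-added text in the list above is easy to check before appending anything. The Python sketch below encodes just that rule (no zero bytes, and none of the first four bytes equal to the corresponding byte of "LZIP"). The helper name `is_safe_trailing_text` is hypothetical, and the all-zero padding case is a separate rule that this function deliberately does not cover.

```python
MAGIC = b"LZIP"

def is_safe_trailing_text(text: bytes) -> bool:
    """Return True if text follows the rule for user-added trailing data."""
    if 0 in text:
        return False                     # text must not contain zero bytes
    # Compare each of the first four bytes with the byte at the same
    # position in the magic string; any coincidence makes the text unsafe.
    return all(t != m for t, m in zip(text[:4], MAGIC))

comment_ok = is_safe_trailing_text(b"This file contains this and that\n")
starts_with_l = is_safe_trailing_text(b"Lorem ipsum\n")
second_is_z = is_safe_trailing_text(b"xZyz\n")
has_nul = is_safe_trailing_text(b"ok\x00")
```

Note that the comparison is positional: a lowercase 'i' in third place is fine, while an uppercase 'Z' in second place is not, even if the other three bytes differ from the magic string.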
+ + Trailing data are in no way part of the lzip file format, but tools +reading lzip files are expected to behave as correctly and usefully as +possible in the presence of trailing data. + + Trailing data can be safely ignored in most cases. In some cases, like +that of user-added data, they are expected to be ignored. In those cases +where a file containing trailing data must be rejected, the option +'--trailing-error' can be used. *Note --trailing-error::. + + Lziprecover facilitates the management of metadata stored as trailing +data in lzip files. See the following examples: + +Example 1: Add a comment or description to a compressed file. + + # First append the comment as trailing data to a lzip file + echo 'This file contains this and that' >> file.lz + # This command prints the comment to standard output + lziprecover --dump=tdata file.lz + # This command outputs file.lz without the comment + lziprecover --strip=tdata file.lz > stripped_file.lz + # This command removes the comment from file.lz + lziprecover --remove=tdata file.lz + + +Example 2: Add and check a cryptographically secure hash. (This may be +convenient, but a separate copy of the hash must be kept in a safe place to +guarantee that both file and hash have not been maliciously replaced). + + sha256sum < file.lz >> file.lz + lziprecover --strip=tdata file.lz | sha256sum -c \ + <(lziprecover --dump=tdata file.lz) + + +File: lziprecover.info, Node: Examples, Next: Unzcrash, Prev: Trailing data, Up: Top + +11 A small tutorial with examples +********************************* + +Example 1: Extract all the files from archive 'foo.tar.lz'. + + tar -xf foo.tar.lz + or + lziprecover -cd foo.tar.lz | tar -xf - + + +Example 2: Restore a regular file from its compressed version 'file.lz'. If +the operation is successful, 'file.lz' is removed. + + lziprecover -d file.lz + + +Example 3: Check the integrity of the compressed file 'file.lz' and show +status. 
+ + lziprecover -tv file.lz + + +Example 4: The right way of concatenating the decompressed output of two or +more compressed files. *Note Trailing data::. + + Don't do this + cat file1.lz file2.lz file3.lz | lziprecover -d - + Do this instead + lziprecover -cd file1.lz file2.lz file3.lz + You may also concatenate the compressed files like this + lziprecover --strip=tdata file1.lz file2.lz file3.lz > file123.lz + Or keeping the trailing data of the last file like this + lziprecover --strip=empty file1.lz file2.lz file3.lz > file123.lz + + +Example 5: Decompress 'file.lz' partially until 10 KiB of decompressed data +are produced. + + lziprecover -D 0,10KiB file.lz + + +Example 6: Decompress 'file.lz' partially from decompressed byte at offset +10000 to decompressed byte at offset 14999 (5000 bytes are produced). + + lziprecover -D 10000-15000 file.lz + + +Example 7: Repair a corrupt byte in the file 'file.lz'. (Indented lines are +abridged diagnostic messages from lziprecover). + + lziprecover -v -R file.lz + Copy of input file repaired successfully. + lziprecover -tv file_fixed.lz + file_fixed.lz: ok + mv file_fixed.lz file.lz + + +Example 8: Split the multimember file 'file.lz' and write each member in +its own 'recXXXfile.lz' file. Then use 'lziprecover -t' to test the +integrity of the resulting files. + + lziprecover -s file.lz + lziprecover -tv rec*file.lz + + +File: lziprecover.info, Node: Unzcrash, Next: Problems, Prev: Examples, Up: Top + +12 Testing the robustness of decompressors +****************************************** + +*Note --unzcrash::, for a faster way of testing the robustness of lzip. + + The lziprecover package also includes unzcrash, a program written to test +robustness to decompression of corrupted data, inspired by unzcrash.c from +Julian Seward's bzip2. Type 'make unzcrash' in the lziprecover source +directory to build it. 
+
+   By default, unzcrash reads the file specified and then repeatedly
+decompresses it, increasing 256 times each byte of the compressed data, so
+as to test all possible one-byte errors. Note that it may take years or even
+centuries to test all possible one-byte errors in a large file (tens of MB).
+
+   If the option '--block' is given, unzcrash reads the file specified and
+then repeatedly decompresses it, setting all bytes in each successive block
+to the value given, so as to test all possible full sector errors.
+
+   If the option '--truncate' is given, unzcrash reads the file specified
+and then repeatedly decompresses it, truncating the file to increasing
+lengths, so as to test all possible truncation points.
+
+   None of the three test modes described above should cause any invalid
+memory accesses. If any of them does, please, report it as a bug to the
+maintainers of the decompressor being tested.
+
+   Unzcrash really executes as a subprocess the shell command specified in
+the first non-option argument, and then writes the file specified in the
+second non-option argument to the standard input of the subprocess,
+modifying the corresponding byte each time. Therefore unzcrash can be used
+to test any decompressor (not only lzip), or even other decoder programs
+having a suitable command-line syntax.
+
+   If the decompressor returns with zero status, unzcrash compares the
+output of the decompressor for the original and corrupt files. If the
+outputs differ, it means that the decompressor returned a false negative;
+it failed to recognize the corruption and produced garbage output. The only
+exception is when a multimember file is truncated just after the last byte
+of a member, producing a shorter but valid compressed file. Except in this
+latter case, please, report any false negative as a bug.
+
+   In order to compare the outputs, unzcrash needs a 'zcmp' program able to
+understand the format being tested. For example the 'zcmp' provided by
+zutils. If the 'zcmp' program used does not understand the format being
+tested, all the comparisons fail because the compressed files are compared
+without being decompressed first. Use '--zcmp=false' to disable comparisons.
+*Note Zcmp: (zutils)Zcmp.
+
+   The format for running unzcrash is:
+
+     unzcrash [OPTIONS] 'lzip -t' FILE
+
+The compressed FILE must not contain errors and the decompressor being
+tested must decompress it correctly for the comparisons to work.
+
+   unzcrash supports the following options:
+
+'-h'
+'--help'
+     Print an informative help message describing the options and exit.
+
+'-V'
+'--version'
+     Print the version number of unzcrash on the standard output and exit.
+     This version number should be included in all bug reports.
+
+'-b RANGE'
+'--bits=RANGE'
+     Test N-bit errors only, instead of testing all the 255 wrong values for
+     each byte. 'N-bit error' means any value differing from the original
+     value in N bit positions, not a value differing from the original
+     value in the bit position N.
+     The number of N-bit errors per byte (N = 1 to 8) is:
+     8 28 56 70 56 28 8 1
+
+     Examples of RANGE   Tests errors of N-bits
+     1                   1
+     1,2,3               1, 2, 3
+     2-4                 2, 3, 4
+     1,3-5,8             1, 3, 4, 5, 8
+     1-3,5-8             1, 2, 3, 5, 6, 7, 8
+
+'-B[SIZE][,VALUE]'
+'--block[=SIZE][,VALUE]'
+     Test block errors of given SIZE, simulating a whole sector I/O error.
+     SIZE defaults to 512 bytes. VALUE defaults to 0. By default, only
+     contiguous, non-overlapping blocks are tested, but this may be changed
+     with the option '--delta'.
+
+'-d N'
+'--delta=N'
+     Test one byte, block, or truncation size every N bytes. If '--delta'
+     is not specified, unzcrash tests all the bytes, non-overlapping
+     blocks, or truncation sizes. Values of N smaller than the block size
+     result in overlapping blocks. (Which is convenient for testing because
+     there are usually too few non-overlapping blocks in a file).
+
+'-e POSITION,VALUE'
+'--set-byte=POSITION,VALUE'
+     Set byte at POSITION to VALUE in the internal buffer after reading and
+     testing FILE but before the first test call to the decompressor. Byte
+     positions start at 0. If VALUE is preceded by '+', it is added to the
+     original value of the byte at POSITION. If VALUE is preceded by 'f'
+     (flip), it is XORed with the original value of the byte at POSITION.
+     This option can be used to run tests with a changed dictionary size,
+     for example.
+
+'-n'
+'--no-check'
+     Skip initial test of FILE and 'zcmp'. May speed up things a lot when
+     testing many (or large) known good files.
+
+'-p BYTES'
+'--position=BYTES'
+     First byte position to test in the file. Defaults to 0. Negative values
+     are relative to the end of the file.
+
+'-q'
+'--quiet'
+     Quiet operation. Suppress all messages.
+
+'-s BYTES'
+'--size=BYTES'
+     Number of byte positions to test. If not specified, the rest of the
+     file is tested (from '--position' to end of file). Negative values are
+     relative to the rest of the file.
+
+'-t'
+'--truncate'
+     Test all possible truncation points in the range specified by
+     '--position' and '--size'.
+
+'-v'
+'--verbose'
+     Verbose mode.
+
+'-z'
+'--zcmp=<command>'
+     Set zcmp command name and options. Defaults to 'zcmp'. Use
+     '--zcmp=false' to disable comparisons. If testing a decompressor
+     different from the one used by default by zcmp, it is needed to force
+     unzcrash and zcmp to use the same decompressor with a command like
+     'unzcrash --zcmp='zcmp --lz=plzip' 'plzip -t' FILE'
+
+
+   Exit status: 0 for a normal exit, 1 for environmental problems (file not
+found, invalid command-line options, I/O errors, etc), 2 to indicate a
+corrupt or invalid input file, 3 for an internal consistency error (e.g.,
+bug) which caused unzcrash to panic.
+
+
+File: lziprecover.info, Node: Problems, Next: Concept index, Prev: Unzcrash, Up: Top
+
+13 Reporting bugs
+*****************
+
+There are probably bugs in lziprecover. There are certainly errors and
+omissions in this manual. If you report them, they will get fixed. If you
+don't, no one will ever know about them and they will remain unfixed for
+all eternity, if not longer.
+
+   If you find a bug in lziprecover, please send electronic mail to
+<lzip-bug@nongnu.org>. Include the version number, which you can find by
+running 'lziprecover --version'.
+
+
+File: lziprecover.info, Node: Concept index, Prev: Problems, Up: Top
+
+Concept index
+*************
+
+
+* Menu:
+
+* bugs:                                  Problems.              (line 6)
+* data safety:                           Data safety.           (line 6)
+* examples:                              Examples.              (line 6)
+* file format:                           File format.           (line 6)
+* file names:                            File names.            (line 6)
+* getting help:                          Problems.              (line 6)
+* introduction:                          Introduction.          (line 6)
+* invoking:                              Invoking lziprecover.  (line 6)
+* merging files:                         Merging files.         (line 6)
+* merging with a backup:                 Merging with a backup. (line 6)
+* options:                               Invoking lziprecover.  (line 6)
+* repairing one byte:                    Repairing one byte.    (line 6)
+* reproducing a mailbox:                 Reproducing a mailbox. (line 6)
+* reproducing one sector:                Reproducing one sector. (line 6)
+* tarlz:                                 Tarlz.                 (line 6)
+* trailing data:                         Trailing data.         (line 6)
+* unzcrash:                              Unzcrash.              (line 6)
+* usage:                                 Invoking lziprecover.  (line 6)
+* version:                               Invoking lziprecover.  (line 6)
+
+
+
+Tag Table:
+Node: Top226
+Node: Introduction1406
+Node: Invoking lziprecover5412
+Ref: --trailing-error6359
+Ref: range-format8791
+Ref: --reproduce9126
+Ref: --byte-repair13411
+Ref: --unzcrash23209
+Node: Data safety27459
+Node: Merging with a backup29443
+Node: Reproducing a mailbox30706
+Node: Repairing one byte33160
+Node: Merging files35220
+Ref: performance-of-merge36399
+Ref: ddrescue-example38008
+Node: Reproducing one sector39295
+Ref: performance-of-reproduce43181
+Ref: ddrescue-example245855
+Node: Tarlz48275
+Node: File names51933
+Node: File format52395
+Node: Trailing data55082
+Node: Examples58397
+Ref: concat-example58972
+Node: Unzcrash60364
+Node: Problems66704
+Node: Concept index67256
+
+End Tag Table
+
+
+Local Variables:
+coding: iso-8859-15
+End:
diff --git a/doc/lziprecover.texi b/doc/lziprecover.texi
new file mode 100644
index 0000000..0d32d9d
--- /dev/null
+++ b/doc/lziprecover.texi
@@ -0,0 +1,1617 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header
+@setfilename lziprecover.info
+@documentencoding ISO-8859-15
+@settitle Lziprecover Manual
+@finalout
+@c %**end of header
+
+@set UPDATED 20 January 2024
+@set VERSION 1.24
+
+@dircategory Compression
+@direntry
+* Lziprecover: (lziprecover).     Data recovery tool for the lzip format
+@end direntry
+
+
+@ifnothtml
+@titlepage
+@title Lziprecover
+@subtitle Data recovery tool for the lzip format
+@subtitle for Lziprecover version @value{VERSION}, @value{UPDATED}
+@author by Antonio Diaz Diaz
+
+@page
+@vskip 0pt plus 1filll
+@end titlepage
+
+@contents
+@end ifnothtml
+
+@ifnottex
+@node Top
+@top
+
+This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}).
+ +@menu +* Introduction:: Purpose and features of lziprecover +* Invoking lziprecover:: Command-line interface +* Data safety:: Protecting data from accidental loss +* Repairing one byte:: Fixing bit flips and similar errors +* Merging files:: Fixing several damaged copies +* Reproducing one sector:: Fixing a missing (zeroed) sector +* Tarlz:: Options supporting the tar.lz format +* File names:: Names of the files produced by lziprecover +* File format:: Detailed format of the compressed file +* Trailing data:: Extra data appended to the file +* Examples:: A small tutorial with examples +* Unzcrash:: Testing the robustness of decompressors +* Problems:: Reporting bugs +* Concept index:: Index of concepts +@end menu + +@sp 1 +Copyright @copyright{} 2009-2024 Antonio Diaz Diaz. + +This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it. +@end ifnottex + + +@node Introduction +@chapter Introduction +@cindex introduction + +@uref{http://www.nongnu.org/lzip/lziprecover.html,,Lziprecover} +is a data recovery tool and decompressor for files in the lzip +compressed data format (.lz). Lziprecover is able to repair slightly damaged +files (up to one single-byte error per member), produce a correct file by +merging the good parts of two or more damaged copies, reproduce a missing +(zeroed) sector using a reference file, extract data from damaged files, +decompress files, and test integrity of files. + +Lziprecover can remove the damaged members from multimember files, for +example multimember tar.lz archives. + +Lziprecover provides random access to the data in multimember files; it only +decompresses the members containing the desired data. + +Lziprecover facilitates the management of metadata stored as trailing data +in lzip files. + +Lziprecover is not a replacement for regular backups, but a last line of +defense for the case where the backups are also damaged. 
+ +The lzip file format is designed for data sharing and long-term archiving, +taking into account both data integrity and decoder availability: + +@itemize @bullet +@item +The lzip format provides very safe integrity checking and some data +recovery means. The program lziprecover can repair bit flip errors +(one of the most common forms of data corruption) in lzip files, and +provides data recovery capabilities, including error-checked merging +of damaged copies of a file. @xref{Data safety}. + +@item +The lzip format is as simple as possible (but not simpler). The lzip +manual provides the source code of a simple decompressor along with a +detailed explanation of how it works, so that with the only help of the +lzip manual it would be possible for a digital archaeologist to extract +the data from a lzip file long after quantum computers eventually +render LZMA obsolete. + +@item +Additionally the lzip reference implementation is copylefted, which +guarantees that it will remain free forever. +@end itemize + +A nice feature of the lzip format is that a corrupt byte is easier to repair +the nearer it is from the beginning of the file. Therefore, with the help of +lziprecover, losing an entire archive just because of a corrupt byte near +the beginning is a thing of the past. + +Compression may be good for long-term archiving. For compressible data, +multiple compressed copies may provide redundancy in a more useful form and +may have a better chance of surviving intact than one uncompressed copy +using the same amount of storage space. This is especially true if the +format provides recovery capabilities like those of lziprecover, which is +able to find and combine the good parts of several damaged copies. + +Lziprecover is able to recover or decompress files produced by any of the +compressors in the lzip family: lzip, plzip, minilzip/lzlib, clzip, and +pdlzip. 
+ +If the cause of file corruption is a damaged medium, the combination +@w{GNU ddrescue + lziprecover} is the recommended option for recovering data +from damaged lzip files. @xref{ddrescue-example}, and +@ref{ddrescue-example2}, for examples. + +If a file is too damaged for lziprecover to repair it, all the recoverable +data in all members of the file can be extracted with the following command +(the resulting file may contain errors and some garbage data may be produced +at the end of each damaged member): + +@example +lziprecover -cd --ignore-errors file.lz > file +@end example + +When recovering data, lziprecover takes as arguments the names of the +damaged files and writes zero or more recovered files depending on the +operation selected and whether the recovery succeeded or not. The damaged +files themselves are kept unchanged. + +When decompressing or testing file integrity, lziprecover behaves like lzip +or lunzip. + +LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have +been compressed. Decompressed is used to refer to data which have undergone +the process of decompression. + + +@node Invoking lziprecover +@chapter Invoking lziprecover +@cindex invoking +@cindex options +@cindex usage +@cindex version + +The format for running lziprecover is: + +@example +lziprecover [@var{options}] [@var{files}] +@end example + +@noindent +When decompressing or testing, a hyphen @samp{-} used as a @var{file} +argument means standard input. It can be mixed with other @var{files} and is +read just once, the first time it appears in the command line. If no file +names are specified, lziprecover decompresses from standard input to +standard output. Remember to prepend @file{./} to any file name beginning +with a hyphen, or use @samp{--}. + +lziprecover supports the following +@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}: +@ifnothtml +@xref{Argument syntax,,,arg_parser}. 
+@end ifnothtml + +@table @code +@item -h +@itemx --help +Print an informative help message describing the options and exit. + +@item -V +@itemx --version +Print the version number of lziprecover on the standard output and exit. +This version number should be included in all bug reports. + +@anchor{--trailing-error} +@item -a +@itemx --trailing-error +Exit with error status 2 if any remaining input is detected after +decompressing the last member. Such remaining input is usually trailing +garbage that can be safely ignored. @xref{concat-example}. + +@item -A +@itemx --alone-to-lz +Convert lzma-alone files to lzip format without recompressing, just +adding a lzip header and trailer. The conversion minimizes the +dictionary size of the resulting file (and therefore the amount of +memory required to decompress it). Only streamed files with default LZMA +properties can be converted; non-streamed lzma-alone files lack the "End +Of Stream" marker required in lzip files. + +The name of the converted lzip file is derived from that of the original +lzma-alone file as follows: + +@multitable {filename.lzma} {becomes} {anyothername.lz} +@item filename.lzma @tab becomes @tab filename.lz +@item filename.tlz @tab becomes @tab filename.tar.lz +@item anyothername @tab becomes @tab anyothername.lz +@end multitable + +@item -c +@itemx --stdout +Write decompressed data to standard output; keep input files unchanged. This +option (or @option{-o}) is needed when reading from a named pipe (fifo) or +from a device. Use it also to recover as much of the decompressed data as +possible when decompressing a corrupt file. @option{-c} overrides @option{-o}. +@option{-c} has no effect when merging, removing members, repairing, +reproducing, splitting, testing or listing. + +@item -d +@itemx --decompress +Decompress the files specified. The integrity of the files specified is +checked. 
If a file does not exist, can't be opened, or the destination file +already exists and @option{--force} has not been specified, lziprecover +continues decompressing the rest of the files and exits with error status 1. +If a file fails to decompress, or is a terminal, lziprecover exits +immediately with error status 2 without decompressing the rest of the files. +A terminal is considered an uncompressed file, and therefore invalid. + +@item -D @var{range} +@itemx --range-decompress=@var{range} +Decompress only a range of bytes starting at decompressed byte position +@var{begin} and up to byte position @w{@var{end} - 1}. Byte positions start +at 0. This option provides random access to the data in multimember files; +it only decompresses the members containing the desired data. In order to +guarantee the correctness of the data produced, all members containing any +part of the desired data are decompressed and their integrity is checked. + +@anchor{range-format} +Four formats of @var{range} are recognized, @samp{@var{begin}}, +@samp{@var{begin}-@var{end}}, @samp{@var{begin},@var{size}}, and +@samp{,@var{size}}. If only @var{begin} is specified, @var{end} is taken as +the end of the file. If only @var{size} is specified, @var{begin} is taken +as the beginning of the file. The bytes produced are sent to standard output +unless the option @option{--output} is used. + +@anchor{--reproduce} +@item -e +@itemx --reproduce +Try to recover a missing (zeroed) sector in @var{file} using a reference +file and the same version of lzip that created @var{file}. If successful, a +repaired copy is written to the file @var{file}_fixed.lz. @var{file} is not +modified at all. The exit status is 0 if the member containing the zeroed +sector could be repaired, 2 otherwise. Note that @var{file}_fixed.lz may +still contain errors in the members following the one repaired. +@xref{Reproducing one sector}, for a complete description of the reproduce +mode. 
+ +@item --lzip-level=@var{digit}|a|m[@var{length}] +Try only the given compression level or match length limit when reproducing +a zeroed sector. @option{--lzip-level=a} tries all the compression levels +@w{(0 to 9)}, while @option{--lzip-level=m} tries all the match length limits +@w{(5 to 273)}. + +@item --lzip-name=@var{name} +Set the name of the lzip executable used by @option{--reproduce}. If +@option{--lzip-name} is not specified, @samp{lzip} is used. + +@item --reference-file=@var{file} +Set the reference file used by @option{--reproduce}. It must contain the +uncompressed data corresponding to the missing compressed data of the zeroed +sector, plus some context data before and after them. + +@item -f +@itemx --force +Force overwrite of output files. + +@item -i +@itemx --ignore-errors +Make @option{--decompress}, @option{--test}, and @option{--range-decompress} +ignore format and data errors and continue decompressing the remaining +members in the file; keep input files unchanged. For example, the commands +@w{@samp{lziprecover -cd -i file.lz > file}} or +@w{@samp{lziprecover -D0 -i file.lz > file}} decompress all the recoverable +data in all members of @samp{file.lz} without having to split it first. The +@w{@samp{-cd -i}} method resyncs to the next member header after each error, +and is immune to some format errors that make @w{@samp{-D0 -i}} fail. The +range decompressed may be smaller than the range requested, because of the +errors. The exit status is set to 0 unless other errors are found (I/O +errors, for example). + +Make @option{--list}, @option{--dump}, @option{--remove}, and @option{--strip} +ignore format errors. The sizes of the members with errors (especially the +last) may be wrong. + +@item -k +@itemx --keep +Keep (don't delete) input files during decompression. + +@item -l +@itemx --list +Print the uncompressed size, compressed size, and percentage saved of the +files specified. Trailing data are ignored. 
The values produced are correct +even for multimember files. If more than one file is given, a final line +containing the cumulative sizes is printed. With @option{-v}, the dictionary +size, the number of members in the file, and the amount of trailing data (if +any) are also printed. With @option{-vv}, the positions and sizes of each +member in multimember files are also printed. With @option{-i}, format errors +are ignored, and with @option{-ivv}, gaps between members are shown. The +member numbers shown coincide with the file numbers produced by @option{--split}. + +If any file is damaged, does not exist, can't be opened, or is not regular, +the final exit status is @w{> 0}. @option{-lq} can be used to check quickly +(without decompressing) the structural integrity of the files specified. +(Use @option{--test} to check the data integrity). @option{-alq} +additionally checks that none of the files specified contain trailing data. + +@item -m +@itemx --merge +Try to produce a correct file by merging the good parts of two or more +damaged copies. If successful, a repaired copy is written to the file +@var{file}_fixed.lz. The exit status is 0 if a correct file could be +produced, 2 otherwise. @xref{Merging files}, for a complete description of +the merge mode. + +@item -o @var{file} +@itemx --output=@var{file} +Place the repaired output into @var{file} instead of into +@var{file}_fixed.lz. If splitting, the names of the files produced are in +the form @samp{rec01@var{file}}, @samp{rec02@var{file}}, etc. + +If @option{-c} has not been also specified, write the (de)compressed output +to @var{file}, automatically creating any missing parent directories; keep +input files unchanged. This option (or @option{-c}) is needed when reading +from a named pipe (fifo) or from a device. @w{@option{-o -}} is equivalent +to @option{-c}. @option{-o} has no effect when testing or listing. + +@item -q +@itemx --quiet +Quiet operation. Suppress all messages. 
+ +@anchor{--byte-repair} +@item -R +@itemx --byte-repair +Try to repair a @var{file} with small errors (up to one single-byte error +per member). If successful, a repaired copy is written to the file +@var{file}_fixed.lz. @var{file} is not modified at all. The exit status is 0 +if the file could be repaired, 2 otherwise. @xref{Repairing one byte}, for a +complete description of the repair mode. + +@item -s +@itemx --split +Search for members in @var{file} and write each member in its own file. Gaps +between members are detected and each gap is saved in its own file. Trailing +data (if any) are saved alone in the last file. You can then use +@w{@samp{lziprecover -t}} to test the integrity of the resulting files, +decompress those which are undamaged, and try to repair or partially +decompress those which are damaged. Gaps may contain garbage or may be +members with corrupt headers or trailers. If other lziprecover functions +fail to work on a multimember @var{file} because of damage in headers or +trailers, try to split @var{file} and then work on each member individually. + +The names of the files produced are in the form @samp{rec01@var{file}}, +@samp{rec02@var{file}}, etc, and are designed so that the use of wildcards +in subsequent processing, for example, +@w{@samp{lziprecover -cd rec*@var{file} > recovered_data}}, processes the +files in the correct order. The number of digits used in the names varies +depending on the number of members in @var{file}. + +@item -t +@itemx --test +Check integrity of the files specified, but don't decompress them. This +really performs a trial decompression and throws away the result. Use it +together with @option{-v} to see information about the files. If a file +fails the test, does not exist, can't be opened, or is a terminal, lziprecover +continues testing the rest of the files. A final diagnostic is shown at +verbosity level 1 or higher if any file fails the test when testing multiple +files. 
+ +@item -v +@itemx --verbose +Verbose mode.@* +When decompressing or testing, further -v's (up to 4) increase the +verbosity level, showing status, compression ratio, dictionary size, +trailer contents (CRC, data size, member size), and up to 6 bytes of +trailing data (if any) both in hexadecimal and as a string of printable +ASCII characters.@* +Two or more @option{-v} options show the progress of decompression.@* +In other modes, increasing verbosity levels show final status, progress +of operations, and extra information (for example, the failed areas). + +@item --dump=[@var{member_list}][:damaged][:empty][:tdata] +Dump the members listed, the damaged members (if any), the empty members (if +any), or the trailing data (if any) of one or more regular multimember files +to standard output, or to a file if the option @option{--output} is used. If +more than one file is given, the elements dumped from all the files are +concatenated. If a file does not exist, can't be opened, or is not regular, +lziprecover continues processing the rest of the files. If the dump fails in +one file, lziprecover exits immediately without processing the rest of the +files. Only @option{--dump=tdata} can write to a terminal. +@option{--dump=damaged} implies @option{--ignore-errors}. + +The argument to @option{--dump} is a colon-separated list of the following +element specifiers; a member list (1,3-6), a reverse member list (r1,3-6), +and the strings "damaged", "empty", and "tdata" (which may be shortened to +'d', 'e', and 't' respectively). A member list selects the members (or gaps) +listed, whose numbers coincide with those shown by @option{--list}. A reverse +member list selects the members listed counting from the last member in the +file (r1). Negated versions of both kinds of lists exist (^1,3-6:r^1,3-6) +which select all the members except those in the list. 
The strings +"damaged", "empty", and "tdata" select the damaged members, the empty +members (those with a data size = 0), and the trailing data respectively. If +the same member is selected more than once, for example by @samp{1:r1} in a +single-member file, it is dumped just once. See the following examples: + +@multitable {@code{3,12:damaged:tdata}} {members 3, 12, damaged members, trailing data} +@headitem @code{--dump} argument @tab Elements dumped +@item @code{1,3-6} @tab members 1, 3, 4, 5, 6 +@item @code{r1-3} @tab last 3 members in file +@item @code{^13,15} @tab all but 13th and 15th members in file +@item @code{r^1} @tab all but last member in file +@item @code{damaged} @tab all damaged members in file +@item @code{empty} @tab all empty members in file +@item @code{tdata} @tab trailing data +@item @code{1-5:r1:tdata} @tab members 1 to 5, last member, trailing data +@item @code{damaged:tdata} @tab damaged members, trailing data +@item @code{3,12:damaged:tdata} @tab members 3, 12, damaged members, trailing data +@end multitable + +@item --remove=[@var{member_list}][:damaged][:empty][:tdata] +Remove the members listed, the damaged members (if any), the empty members +(if any), or the trailing data (if any) from regular multimember files in +place. The date of each file modified is preserved if possible. If all +members in a file are selected to be removed, the file is left unchanged and +the exit status is set to 2. If a file does not exist, can't be opened, is +not regular, or is left unchanged, lziprecover continues processing the rest +of the files. In case of I/O error, lziprecover exits immediately without +processing the rest of the files. See @option{--dump} above for a description +of the argument. + +This option may be dangerous even if only the trailing data are being +removed because the file may be corrupt or the trailing data may contain a +forbidden combination of characters. @xref{Trailing data}. 
It is safer to +send the output of @option{--strip} to a temporary file, check it, and then +copy it over the original file. But if you prefer @option{--remove} because of +its more efficient in-place removal, it is advisable to make a backup before +attempting the removal. At least check that @w{@samp{lzip -cd file.lz | wc -c}} +and the uncompressed size shown by @w{@samp{lzip -l file.lz}} match before +attempting the removal of trailing data. + +@item --strip=[@var{member_list}][:damaged][:empty][:tdata] +Copy one or more regular multimember files to standard output (or to a file +if the option @option{--output} is used), stripping the members listed, the +damaged members (if any), the empty members (if any), or the trailing data +(if any) from each file. If all members in a file are selected to be +stripped, the trailing data (if any) are also stripped even if @samp{tdata} +is not specified. If more than one file is given, the files are +concatenated. In this case the trailing data are also stripped from all but +the last file even if @samp{tdata} is not specified. If a file does not +exist, can't be opened, or is not regular, lziprecover continues processing +the rest of the files. If a file fails to copy, lziprecover exits +immediately without processing the rest of the files. See @option{--dump} +above for a description of the argument. + +@item --empty-error +Exit with error status 2 if any empty member is found in the input files. + +@item --marking-error +Exit with error status 2 if the first LZMA byte is non-zero in any member of +the input files. This may be caused by data corruption or by deliberate +insertion of tracking information in the file. Use +@w{@samp{lziprecover --clear-marking}} to clear any such non-zero bytes. + +@item --loose-trailing +When decompressing, testing, or listing, allow trailing data whose first +bytes are so similar to the magic bytes of a lzip header that they can +be confused with a corrupt header. 
Use this option if a file triggers a +"corrupt header" error and the cause is not indeed a corrupt header. + +@item --clear-marking +Set to zero the first LZMA byte of each member in the files specified. At +verbosity level 1 (-v), print the number of members cleared. The date of +each file modified is preserved if possible. This option exists because the +first byte of the LZMA stream is ignored by the range decoder, and can +therefore be (mis)used to store any value which can then be used as a +watermark to track the path of the compressed payload. + +@end table + +Lziprecover also supports the following debug options (for experts): + +@table @code +@item -E @var{range}[,@var{sector_size}] +@itemx --debug-reproduce=@var{range}[,@var{sector_size}] +Load the compressed @var{file} into memory, set all bytes in the positions +specified by @var{range} to 0, and try to reproduce a correct compressed +file. @xref{--reproduce}. @xref{range-format}, for a description of +@var{range}. If a @var{sector_size} is specified, set each sector to 0 in +sequence and try to reproduce the file, printing to standard output final +statistics of the number of sectors reproduced successfully. Exit with +nonzero status only in case of fatal error. + +@item -M +@itemx --md5sum +Print to standard output the MD5 digests of the input @var{files} one per +line in the same format produced by the @command{md5sum} tool. Lziprecover +uses MD5 digests to check the result of some operations. This option can be +used to test the correctness of lziprecover's implementation of the MD5 +algorithm. + +@item -S[@var{value}] +@itemx --nrep-stats[=@var{value}] +Compare the frequency of sequences of N repeated bytes of a given +@var{value} in the compressed LZMA streams of the input @var{files} with the +frequency expected for random data (1 / 2^(8N)). If @var{value} is not +specified, print the frequency of repeated sequences of all possible byte +values. 
Print cumulative data for all the files, followed by the name of the +first file with the longest sequence. + +@anchor{--unzcrash} +@item -U 1|B@var{size} +@itemx --unzcrash=1|B@var{size} +With argument @samp{1}, test 1-bit errors in the LZMA stream of the +compressed input @var{file} like the command +@w{@samp{unzcrash -b1 -p7 -s-20 'lzip -t' @var{file}}} but in memory, and +therefore much faster (30 to 50 times faster). @xref{Unzcrash}. This option +tests all the members independently in a multimember file, skipping headers +and trailers. If a decompression succeeds, the decompressed output is +compared with the decompressed output of the original @var{file} using MD5 +digests. @var{file} must not contain errors and must decompress correctly +for the comparisons to work. + +With argument @samp{B}, test zeroed sectors (blocks of bytes) in the LZMA +stream of the compressed input @var{file} like the command +@w{@samp{unzcrash --block=@var{size} -d1 -p7 -s-(@var{size}+20) 'lzip -t' @var{file}}} +but in memory, and therefore much faster. Testing and comparisons work just +like with the argument @samp{1} explained above. + +By default @option{--unzcrash} only prints the interesting cases; CRC +mismatches, size mismatches, unsupported marker codes, unexpected EOFs, +apparently successful decompressions, and decoder errors detected 50_000 or +more bytes beyond the byte (or the start of the block) being tested. At +verbosity level 1 (-v) it also prints decoder errors detected 10_000 or more +bytes beyond the byte being tested. At verbosity level 2 (-vv) it prints all +cases for 1-bit errors or the decoder errors detected beyond the end of the +block for zeroed blocks. + +@item -W @var{position},@var{value} +@itemx --debug-decompress=@var{position},@var{value} +Load the compressed @var{file} into memory, set the byte at @var{position} +to @var{value}, and decompress the modified compressed data to standard +output. 
If the damaged member can be decompressed to the end (just fails +with a CRC mismatch), the members following it are also decompressed. + +@item -X[@var{position},@var{value}] +@itemx --show-packets[=@var{position},@var{value}] +Load the compressed @var{file} into memory, optionally set the byte at +@var{position} to @var{value}, decompress the modified compressed data +(discarding the output), and print to standard output descriptions of the +LZMA packets being decoded. + +@item -Y @var{range} +@itemx --debug-delay=@var{range} +Load the compressed @var{file} into memory and then repeatedly decompress +it, increasing 256 times each byte of the subset of the compressed data +positions specified by @var{range}, so as to test all possible one-byte +errors. For each decompression error find the error detection delay and +print to standard output the maximum delay. The error detection delay is the +difference between the position of the error and the position where the +decoder realized that the data contains an error. @xref{range-format}, for a +description of @var{range}. + +@item -Z @var{position},@var{value} +@itemx --debug-byte-repair=@var{position},@var{value} +Load the compressed @var{file} into memory, set the byte at @var{position} +to @var{value}, and then try to repair the byte error. @xref{--byte-repair}. + +@end table + +Numbers given as arguments to options may be expressed in decimal, +hexadecimal, or octal (using the same syntax as integer constants in C++), +and may be followed by a multiplier and an optional @samp{B} for "byte". 
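As an illustration of the number syntax described above, the following shell function is a minimal sketch that converts such an argument to a plain value. The name parse_size is hypothetical and not part of lziprecover. The sketch assumes that the numeric part is consumed greedily before the multiplier is examined (so the B in 0x1B is a hexadecimal digit, not a suffix), and it only handles the multipliers up to T and Ti.

```shell
# parse_size is a hypothetical helper, not part of lziprecover.
# It prints the value of a number written in the syntax described above.
parse_size() {
  case "$1" in
    0[xX][0-9a-fA-F]*)                 # hexadecimal, as in C++
      body=${1#0[xX]}
      digits=${body%%[!0-9a-fA-F]*}    # greedy: take every leading hex digit
      num=0x$digits
      suffix=${body#"$digits"} ;;
    [0-9]*)                            # decimal, or octal if it starts with 0
      digits=${1%%[!0-9]*}
      num=$digits
      suffix=${1#"$digits"} ;;
    *) return 1 ;;
  esac
  suffix=${suffix%B}                   # optional trailing B for "byte"
  case "$suffix" in
    '') mult=1 ;;
    k)  mult=1000 ;;
    M)  mult=1000000 ;;
    G)  mult=1000000000 ;;
    T)  mult=1000000000000 ;;
    Ki) mult=$(( 1 << 10 )) ;;
    Mi) mult=$(( 1 << 20 )) ;;
    Gi) mult=$(( 1 << 30 )) ;;
    Ti) mult=$(( 1 << 40 )) ;;
    *)  return 1 ;;                    # unknown multiplier
  esac
  echo $(( $num * $mult ))
}

parse_size 0x10     # hexadecimal
parse_size 010      # octal
parse_size 4KiB     # binary multiplier followed by the optional B
parse_size 2k       # SI multiplier
```

The shell's own arithmetic expansion is used for the final conversion because it accepts the same decimal, hexadecimal, and octal constants as C++.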
+ +Table of SI and binary prefixes (unit multipliers): + +@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)} +@item Prefix @tab Value @tab | @tab Prefix @tab Value +@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024) +@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20) +@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30) +@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40) +@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50) +@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60) +@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70) +@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80) +@item R @tab ronnabyte (10^27) @tab | @tab Ri @tab robibyte (2^90) +@item Q @tab quettabyte (10^30) @tab | @tab Qi @tab quebibyte (2^100) +@end multitable + +@sp 1 +Exit status: 0 for a normal exit, 1 for environmental problems +(file not found, invalid command-line options, I/O errors, etc), 2 to +indicate a corrupt or invalid input file, 3 for an internal consistency +error (e.g., bug) which caused lziprecover to panic. + + +@node Data safety +@chapter Protecting data from accidental loss +@cindex data safety + +It is a fact of life that sometimes data becomes corrupt. Software has +errors. Hardware may misbehave or fail. RAM may be struck by a cosmic ray. +This is why a safe enough integrity checking is needed in compressed +formats, and the reason why a data recovery tool is sometimes needed. + +There are 3 main types of data corruption that may cause data loss: +single-byte errors, multibyte errors (generally affecting a whole sector +in a block device), and total device failure. + +Lziprecover protects natively against single-byte errors as long as file +integrity is checked frequently enough that a second single-byte error does +not develop in the same member before the first one is repaired. +@xref{Repairing one byte}. 
+ +Lziprecover also protects against multibyte errors if at least one backup +copy of the file is made (@pxref{Merging files}), or if the error is a +zeroed sector and the uncompressed data corresponding to the zeroed sector +are available (@pxref{Reproducing one sector}). If you can choose between +merging and reproducing, try merging first because it is usually faster, +easier to use, and has a high probability of success. + +Lziprecover can't help in case of device failure. The only remedy for total +device failure is storing backup copies on separate media. + +The extraordinary safety of the lzip format allows lziprecover to exploit +the redundancy that occurs naturally when making compressed backups. +Lziprecover can recover data that would not be recoverable from files +compressed in other formats. Let's see two examples of how much better +lzip is than gzip and bzip2 with respect to data safety: + +@menu +* Merging with a backup:: Recovering a file using a damaged backup +* Reproducing a mailbox:: Recovering new messages using an old backup +@end menu + + +@node Merging with a backup +@section Recovering a file using a damaged backup +@cindex merging with a backup + +Let's suppose that you made a compressed backup of your valuable scientific +data and stored two copies on separate media. Years later you notice that +both copies are corrupt. + +If you compressed the data with gzip and both copies suffer any damage in +the data stream, even if it is just one altered bit, the original data can +only be recovered by an expert, if at all. + +If you used bzip2, and if the file is large enough to contain more than one +compressed data block (usually larger than @w{900 kB} uncompressed), and if +no block is damaged in both files, then the data can be manually recovered +by splitting the files with bzip2recover, checking every block, and then +copying the right blocks in the right order into another file. 
+ +But if you used lzip, the data can be automatically recovered with +@w{@samp{lziprecover --merge}} as long as the damaged areas don't overlap. + +Note that each error in a bzip2 file makes a whole block unusable, but each +error in a lzip file only affects the damaged bytes, making it possible to +recover a file with thousands of errors. + + +@node Reproducing a mailbox +@section Recovering new messages using an old backup +@cindex reproducing a mailbox + +Let's suppose that you make periodic backups of your email messages stored +in one or more mailboxes. (A mailbox is a file containing a possibly large +number of email messages). New messages are appended to the end of each +mailbox, therefore the initial part of two consecutive backups is identical +unless some messages have been changed or deleted in the meantime. The new +messages added to each backup are usually a small part of the whole mailbox. + +@verbatim ++============================================+ +| Older backup containing some messages | ++============================================+ ++============================================+========================+ +| Newer backup containing the messages above | plus some new messages | ++============================================+========================+ +@end verbatim + +One day you discover that your mailbox has disappeared because you deleted +it inadvertently or because of a bug in your email reader. Not only that. +You need to recover a recent message, but the last backup you made of the +mailbox (the newer backup above) has lost the data corresponding to a whole +sector because of an I/O error in the part containing the old messages. + +If you compressed the mailbox with gzip, usually none of the new messages +can be recovered even if they are intact because all the data beyond the +missing sector can't be decoded. 
+ +If you used bzip2, and if the newer backup is large enough that the new +messages are in a different compressed data block than the one damaged +(usually larger than @w{900 kB} uncompressed), then you can recover the new +messages manually with bzip2recover. If the backups are identical except for +the new messages appended, you may even recover the whole newer backup by +combining the good blocks from both backups. + +But if you used lzip, the whole newer backup can be automatically recovered +with @w{@samp{lziprecover --reproduce}} as long as the missing bytes can be +recovered from the older backup, even if other messages in the common part +have been changed or deleted. Mailboxes seem to be especially easy to +reproduce. The probability of reproducing a mailbox +(@pxref{performance-of-reproduce}) is almost as high as that of merging two +identical backups (@pxref{performance-of-merge}). + + +@node Repairing one byte +@chapter Repairing one byte +@cindex repairing one byte + +Lziprecover can perfectly repair most files with small errors (up to one +single-byte error per member), without needing any extra redundancy at +all. If the repair is successful, the repaired file is identical bit for +bit to the original. This makes lzip files resistant to bit flips, one of the +most common forms of data corruption. + +The file is repaired in memory. Therefore, enough virtual memory +@w{(RAM + swap)} to contain the largest damaged member is required. + +The error may be located anywhere in the file except in the first 5 +bytes of each member header or in the @samp{Member size} field of the +trailer (last 8 bytes of each member). If the error is in the header it +can be easily repaired with a text editor like GNU Moe (@pxref{File +format}). If the error is in the member size, it is enough to ignore the +message about @samp{bad member size} when decompressing. + +A bit flip happens when one bit in the file is changed from 0 to 1 or vice +versa. 
It may be caused by bad RAM or even by natural radiation. I have +seen a case of a bit flip in a file stored on a USB flash drive. + +One byte may seem small, but most file corruptions not produced by +transmission errors or I/O errors just affect one byte, or even one bit, +of the file. Also, unlike magnetic media, where errors usually affect a +whole sector, solid-state storage devices tend to produce single-byte +errors, making lzip the perfect format for data stored on such devices. + +Repairing a file can take some time. Small files or files with the error +located near the beginning can be repaired in a few seconds. But +repairing a large file compressed with a large dictionary size and with +the error located far from the beginning may take hours. + +On the other hand, errors located near the beginning of the file cause +much more loss of data than errors located near the end. So lziprecover +repairs the worst errors most efficiently. + + +@node Merging files +@chapter Merging files +@cindex merging files + +If you have several copies of a file but all of them are too damaged to +be repaired individually (@pxref{Repairing one byte}), lziprecover can try +to produce a correct file by merging the good parts of the damaged copies. + +The merge may succeed even if some copies of the file have all the headers +and trailers damaged, as long as there is at least one copy of every header +and trailer intact, even if they are in different copies of the file. + +The merge fails if the damaged areas overlap (at least one byte is damaged +in all copies), or are adjacent and the boundary can't be determined, or if +the copies have too many damaged areas. + +All the copies to be merged must have the same size. 
If any of them is +larger or smaller than it should be, either because it has been truncated or +because it got some garbage data appended at the end, it can be brought to +the correct size with the following command before merging it with the other +copies: + +@example +ddrescue -s<correct_size> -x<correct_size> file.lz correct_size_file.lz +@end example + +@anchor{performance-of-merge} +To give you an idea of its possibilities, when merging two copies, each of +them with one damaged area affecting 1 percent of the copy, the probability +of obtaining a correct file is about 98 percent. With three such copies the +probability rises to 99.97 percent. For large files (a few MB) with small +errors (one sector damaged per copy), the probability approaches 100 percent +even with only two copies. (These figures assume that the errors are randomly +located inside each copy.) + +Some types of solid-state device (NAND flash, for example) can produce +bursts of scattered single-bit errors. Lziprecover is able to merge +files with thousands of such scattered errors by grouping the errors +into clusters and then merging the files as if each cluster were a +single error. + +Here is a real case of successful merging. Two copies of the file +@samp{icecat-3.5.3-x86.tar.lz} (compressed size @w{9 MB}) became corrupt +while stored on the same NAND flash device. One of the copies had 76 +single-bit errors scattered in an area of 1020 bytes, and the other had +3028 such errors in an area of 31729 bytes. Lziprecover produced a +correct file, identical to the original, in just 5 seconds: + +@example +lziprecover -vvm a/icecat-3.5.3-x86.tar.lz b/icecat-3.5.3-x86.tar.lz +Merging member 1 of 1 (2552 errors) + 2552 errors have been grouped in 16 clusters. + Trying variation 2 of 2, block 2 +Input files merged successfully. 
+@end example + +Note that the number of errors reported by lziprecover (2552) is lower +than the number of corrupt bytes (3104) because contiguous corrupt bytes +are counted as a single multibyte error. + +@sp 1 +@anchor{ddrescue-example} +@noindent +Example 1: Recover a compressed backup from two copies on CD-ROM with +error-checked merging of copies. +@ifnothtml +@xref{Top,GNU ddrescue manual,,ddrescue}, +@end ifnothtml +@ifhtml +See the +@uref{http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html,,ddrescue manual} +@end ifhtml +for details about ddrescue. + +@example +ddrescue -d -r1 -b2048 /dev/cdrom cdimage1 mapfile1 +mount -t iso9660 -o loop,ro cdimage1 /mnt/cdimage +cp /mnt/cdimage/backup.tar.lz rescued1.tar.lz +umount /mnt/cdimage + (insert second copy in the CD drive) +ddrescue -d -r1 -b2048 /dev/cdrom cdimage2 mapfile2 +mount -t iso9660 -o loop,ro cdimage2 /mnt/cdimage +cp /mnt/cdimage/backup.tar.lz rescued2.tar.lz +umount /mnt/cdimage +lziprecover -m -v -o backup.tar.lz rescued1.tar.lz rescued2.tar.lz + Input files merged successfully. +lziprecover -tv backup.tar.lz + backup.tar.lz: ok +@end example + +@sp 1 +@noindent +Example 2: Recover the first volume of those created with the command +@w{@samp{lzip -b 32MiB -S 650MB big_db}} from two copies, +@samp{big_db1_00001.lz} and @samp{big_db2_00001.lz}, with member 07 +damaged in the first copy, member 18 damaged in the second copy, and +member 12 damaged in both copies. The correct file produced is saved in +@samp{big_db_00001.lz}. + +@example +lziprecover -m -v -o big_db_00001.lz big_db1_00001.lz big_db2_00001.lz + Input files merged successfully. 
+lziprecover -tv big_db_00001.lz + big_db_00001.lz: ok +@end example + + +@node Reproducing one sector +@chapter Reproducing one sector +@cindex reproducing one sector + +Lziprecover can recover a zeroed sector in a lzip file by concatenating the +decompressed contents of the file up to the beginning of the zeroed sector +and the uncompressed data corresponding to the zeroed sector, and then +feeding the concatenated data to the same version of lzip that created the +file. For this to work, a reference file is required containing the +uncompressed data corresponding to the missing compressed data of the zeroed +sector, plus some context data before and after them. It is possible to +recover a large file using just a few kB of reference data. + +The difficult part is finding a suitable reference file. It must contain the +exact data required (possibly mixed with other data). Containing similar +data is not enough. + +A zeroed sector may be caused by the incomplete recovery of a damaged +storage device (with I/O errors) using, for example, ddrescue. The +reproduction can't be done if the zeroed sector overlaps with the first 15 +bytes of a member, or if the zeroed sector is smaller than 8 bytes. + +The file is reproduced in memory. Therefore, enough virtual memory +@w{(RAM + swap)} to contain the damaged member is required. 
+ +To understand how it works, take any lzipped file, say @samp{foo.lz}, +decompress it (keeping the original), and try to reproduce an artificially +zeroed sector in it by running the following commands: + +@example +lzip -kd foo.lz +lziprecover -vv --debug-reproduce=65536,512 --reference-file=foo foo.lz +@end example + +@noindent +which should produce an output like the following: + +@example +Reproducing: foo.lz +Reference file: foo +Testing sectors of size 512 at file positions 65536 to 66047 + (master mpos = 65536, dpos = 296892) +foo: Match found at offset 296892 +Reproduction succeeded at pos 65536 + + 1 sectors tested + 1 reproductions returned with zero status + all comparisons passed +@end example + +Using @samp{foo} as reference file guarantees that any zeroed sector in +@samp{foo.lz} can be reproduced because both files contain the same data. In +real use, the reference file needs to contain the data corresponding to the +zeroed sector, but the rest of the data (if any) may differ between both +files. The reference data may be obtained from the partial decompression of +the damaged file itself if it contains repeated data. For example if the +damaged file is a compressed tarball containing several partially modified +versions of the same file. + +The offset reported by lziprecover is the position in the reference file of +the first byte that could not be decompressed. This is the first byte that +will be compressed to reproduce the zeroed sector. + +The reproduce mode tries to reproduce the missing compressed data originally +present in the zeroed sector. It is based on the perfect reproducibility of +lzip files (lzip produces identical compressed output from identical input). +Therefore, the same version of lzip that created the file to be reproduced +should be used to reproduce the zeroed sector. Near versions may also work +because the output of lzip changes infrequently. 
If reproducing a tar.lz +archive created with tarlz, the version of lzip, clzip, or minilzip +corresponding to the version of the lzlib library used by tarlz to create +the archive should be used. + +When recovering a tar.lz archive and using as reference a file from the +filesystem, if the zeroed sector encodes (part of) a tar header, the archive +can't be reproduced. Therefore, the less overhead (smaller headers) a tar +archive has, the more probable it is that the zeroed sector does not include a +header, and that the archive can be reproduced. The tarlz format has minimum +overhead. It uses basic ustar headers, and only adds extended pax headers +when they are required. + +@anchor{performance-of-reproduce} +@section Performance of @option{--reproduce} +Reproduce mode is especially useful when recovering a corrupt backup (or a +corrupt source tarball) that is part of a series. Usually only a small +fraction of the data changes from one backup to the next or from one version +of a source tarball to the next. This sometimes makes it possible to reproduce +a given corrupted version using reference data from a nearby version. The +following two tables show the fraction of reproducible sectors (reproducible +sectors divided by total sectors in archive) for some archives, using sector +sizes of 512 and 4096 bytes. @samp{mailbox-aug.tar.lz} is a backup of some +of my mailboxes. 
@samp{backup-feb.tar.lz} and @samp{backup-apr.tar.lz} are +real backups of my own working directory: + +@multitable {Reference file} {gawk-5.0.1.tar.lz} {4369 / 5844 = 74.76%} +@headitem Reference file @tab File @tab Reproducible (512) +@item backup-feb.tar @tab backup-apr.tar.lz @tab 3273 / 4342 = 75.38% +@item backup-apr.tar @tab backup-feb.tar.lz @tab 3259 / 4161 = 78.32% +@item gawk-5.0.0.tar @tab gawk-5.0.1.tar.lz @tab 4369 / 5844 = 74.76% +@item gawk-5.0.1.tar @tab gawk-5.0.0.tar.lz @tab 4379 / 5603 = 78.15% +@item gmp-6.1.1.tar @tab gmp-6.1.2.tar.lz @tab 2454 / 3787 = 64.8% +@item gmp-6.1.2.tar @tab gmp-6.1.1.tar.lz @tab 2461 / 3782 = 65.07% +@end multitable + +@multitable {mailbox-mar.tar} {mailbox-aug.tar.lz} {4036 / 4252 = 94.92%} +@headitem Reference file @tab File @tab Reproducible (4096) +@item mailbox-mar.tar @tab mailbox-aug.tar.lz @tab 4036 / 4252 = 94.92% +@item backup-feb.tar @tab backup-apr.tar.lz @tab 264 / 542 = 48.71% +@item backup-apr.tar @tab backup-feb.tar.lz @tab 264 / 520 = 50.77% +@item gawk-5.0.0.tar @tab gawk-5.0.1.tar.lz @tab 327 / 730 = 44.79% +@item gawk-5.0.1.tar @tab gawk-5.0.0.tar.lz @tab 326 / 700 = 46.57% +@item gmp-6.1.1.tar @tab gmp-6.1.2.tar.lz @tab 175 / 473 = 37% +@item gmp-6.1.2.tar @tab gmp-6.1.1.tar.lz @tab 181 / 472 = 38.35% +@end multitable + +Note that the "performance of reproduce" is a probability, not a partial +recovery. The data are either recovered fully (with the probability X shown +in the last column of the tables above) or not recovered at all (with +probability @w{1 - X}). + +@noindent +Example 1: Recover a damaged source tarball with a zeroed sector of 512 +bytes at file position 1019904, using as reference another source tarball +for a different version of the software. 
+ +@example +lziprecover -vv -e --reference-file=gmp-6.1.1.tar gmp-6.1.2.tar.lz +Reproducing bad area in member 1 of 1 + (begin = 1019904, size = 512, value = 0x00) + (master mpos = 1019904, dpos = 6292134) +warning: gmp-6.1.1.tar: Partial match found at offset 6277798, len 8716. +Reference data may be mixed with other data. +Trying level -9 + Reproducing position 1015808 +Member reproduced successfully. +Copy of input file reproduced successfully. +@end example + +@sp 1 +@anchor{ddrescue-example2} +@noindent +Example 2: Recover a damaged backup with a zeroed sector of 4096 bytes at +file position 1019904, using as reference a previous backup. The damaged +backup comes from a damaged partition copied with ddrescue. + +@example +ddrescue -b4096 -r10 /dev/sdc1 hdimage mapfile +mount -o loop,ro hdimage /mnt/hdimage +cp /mnt/hdimage/backup.tar.lz backup.tar.lz +umount /mnt/hdimage +lzip -t backup.tar.lz + backup.tar.lz: Decoder error at pos 1020530 +lziprecover -vv -e --reference-file=old_backup.tar backup.tar.lz +Reproducing bad area in member 1 of 1 + (begin = 1019904, size = 4096, value = 0x00) + (master mpos = 1019903, dpos = 5857954) +warning: old_backup.tar: Partial match found at offset 5743778, len 9546. +Reference data may be mixed with other data. +Trying level -9 + Reproducing position 1015808 +Member reproduced successfully. +Copy of input file reproduced successfully. +@end example + +@sp 1 +@noindent +Example 3: Recover a damaged backup with a zeroed sector of 4096 bytes at +file position 1019904, using as reference a file from the filesystem. (If +the zeroed sector encodes (part of) a tar header, the tarball can't be +reproduced). + +@example +# List the contents of the backup tarball to locate the damaged member. +tarlz -n0 -tvf backup.tar.lz + [...] + example.txt +tarlz: Skipping to next header. +tarlz: backup.tar.lz: Archive ends unexpectedly. +# Find in the filesystem the last file listed and use it as reference. 
+lziprecover -vv -e --reference-file=/somedir/example.txt backup.tar.lz +Reproducing bad area in member 1 of 1 + (begin = 1019904, size = 4096, value = 0x00) + (master mpos = 1019903, dpos = 5857954) +/somedir/example.txt: Match found at offset 9378 +Trying level -9 + Reproducing position 1015808 +Member reproduced successfully. +Copy of input file reproduced successfully. +@end example + +If @samp{backup.tar.lz} is a multimember file with more than one member +damaged and lziprecover shows the message @samp{One member reproduced. Copy +of input file still contains errors.}, the procedure shown in the example +above can be repeated until all the members have been reproduced. + +@samp{tarlz --keep-damaged -n0 -xf backup.tar.lz example.txt} produces a +partial copy of the reference file @samp{example.txt} that may help locate a +complete copy in the filesystem or in another backup, even if +@samp{example.txt} has been renamed. + + +@node Tarlz +@chapter Options supporting the tar.lz format +@cindex tarlz + +@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,Tarlz} is a +massively parallel (multi-threaded) combined implementation of the tar +archiver and the +@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html,,lzip} compressor. + +Tarlz creates tar archives using a simplified and safer variant of the POSIX +pax format compressed in lzip format, keeping the alignment between tar +members and lzip members. The resulting multimember tar.lz archive is +backward compatible with standard tar tools like GNU tar, which treat it +like any other tar.lz archive. +@ifnothtml +@xref{Top,tarlz manual,,tarlz}, and @ref{Top,lzip manual,,lzip}. +@end ifnothtml + +Multimember tar.lz archives have some safety advantages over solidly +compressed tar.lz archives. For example, in case of corruption, tarlz can +extract all the undamaged members from the tar.lz archive, skipping over the +damaged members, just like the standard (uncompressed) tar. 
Keeping the +alignment between tar members and lzip members minimizes the amount of data +lost in case of corruption. In this chapter we'll explain the ways in which +lziprecover can recover and process multimember tar.lz archives. + +@sp 1 +@section Recovering damaged multimember tar.lz archives + +If you have several copies of the damaged archive, try merging them first +because merging has a high probability of success. @xref{Merging files}. If +the command below prints something like +@w{@samp{Input files merged successfully.}} you are done and +@samp{archive.tar.lz} now contains the recovered archive: + +@example +lziprecover -m -v -o archive.tar.lz a/archive.tar.lz b/archive.tar.lz +@end example + +If you only have one copy of the damaged archive with a zeroed block of data +caused by an I/O error, you may try to reproduce the archive. +@xref{Reproducing one sector}. If the command below prints something like +@w{@samp{Copy of input file reproduced successfully.}} you are done and +@samp{archive_fixed.tar.lz} now contains the recovered archive: + +@example +lziprecover -vv -e --reference-file=old_archive.tar archive.tar.lz +@end example + +If you only have one copy of the damaged archive, you may try to repair the +archive, but this has a lower probability of success. @xref{Repairing one +byte}. If the command below prints something like +@w{@samp{Copy of input file repaired successfully.}} you are done and +@samp{archive_fixed.tar.lz} now contains the recovered archive: + +@example +lziprecover -v -R archive.tar.lz +@end example + +If all the above fails, and the archive was created with tarlz, you may save +the damaged members for later and then copy the good members to another +archive. 
If the two commands below succeed, @samp{bad_members.tar.lz} will +contain all the damaged members and @samp{archive_cleaned.tar.lz} will +contain a good archive with the damaged members removed: + +@example +lziprecover -v --dump=damaged -o bad_members.tar.lz archive.tar.lz +lziprecover -v --strip=damaged -o archive_cleaned.tar.lz archive.tar.lz +@end example + +You can then use @samp{tarlz --keep-damaged} to recover as much data as +possible from each damaged member in @samp{bad_members.tar.lz}: + +@example +mkdir tmp +cd tmp +tarlz --keep-damaged -xvf ../bad_members.tar.lz +@end example + +@sp 1 +@section Processing multimember tar.lz archives + +Lziprecover is able to copy a list of members from one file to another. +For example, the command +@w{@samp{lziprecover --dump=1-10:r1:tdata archive.tar.lz > subarch.tar.lz}} +creates a subset archive containing the first ten members, the end-of-file +blocks, and the trailing data (if any) of @samp{archive.tar.lz}. The +@samp{r1} part selects the last member, which in an appendable tar.lz +archive contains the end-of-file blocks. + + +@node File names +@chapter Names of the files produced by lziprecover +@cindex file names + +The name of the fixed file produced by @option{--byte-repair} and +@option{--merge} is made by appending the string @samp{_fixed.lz} to the +original file name. If the original file name ends with one of the +extensions @samp{.tar.lz}, @samp{.lz}, or @samp{.tlz}, the string +@samp{_fixed} is inserted before the extension instead. 
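The naming rule above can be sketched as a small shell function, which may be handy in scripts that need to know in advance where the fixed file will be written. The name fixed_name is hypothetical and not part of lziprecover. The sketch only restates the rule given in this chapter and has not been checked against lziprecover's own code.

```shell
# fixed_name is a hypothetical helper, not part of lziprecover.
# It prints the name that the rule above gives for the fixed copy of $1.
fixed_name() {
  case "$1" in
    *.tar.lz) printf '%s_fixed.tar.lz\n' "${1%.tar.lz}" ;;  # test before *.lz
    *.tlz)    printf '%s_fixed.tlz\n' "${1%.tlz}" ;;
    *.lz)     printf '%s_fixed.lz\n' "${1%.lz}" ;;
    *)        printf '%s_fixed.lz\n' "$1" ;;                # no known extension
  esac
}

fixed_name archive.tar.lz
fixed_name data.lz
fixed_name backup.tlz
fixed_name notes
```

The .tar.lz pattern is tested before .lz because every name ending in .tar.lz also ends in .lz.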
+ + +@node File format +@chapter File format +@cindex file format + +Perfection is reached, not when there is no longer anything to add, but +when there is no longer anything to take away.@* +--- Antoine de Saint-Exupery + +@sp 1 +In the diagram below, a box like this: + +@verbatim ++---+ +| | <-- the vertical bars might be missing ++---+ +@end verbatim + +represents one byte; a box like this: + +@verbatim ++==============+ +| | ++==============+ +@end verbatim + +represents a variable number of bytes. + +@sp 1 +A lzip file consists of one or more independent "members" (compressed data +sets). The members simply appear one after another in the file, with no +additional information before, between, or after them. Each member can +encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data. +The size of a multimember file is unlimited. + +Each member has the following structure: + +@verbatim ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +@end verbatim + +All multibyte values are stored in little endian order. + +@table @samp +@item ID string (the "magic" bytes) +A four byte string, identifying the lzip format, with the value "LZIP" +(0x4C, 0x5A, 0x49, 0x50). + +@item VN (version number, 1 byte) +Just in case something needs to be modified in the future. 1 for now. + +@item DS (coded dictionary size, 1 byte) +The dictionary size is calculated by taking a power of 2 (the base size) +and subtracting from it a fraction between 0/16 and 7/16 of the base size.@* +Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@* +Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract +from the base size to obtain the dictionary size.@* +Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@* +Valid values for dictionary size range from 4 KiB to 512 MiB. 
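The arithmetic of the coded dictionary size can be sketched in shell as follows. The name decode_ds is hypothetical and not part of lziprecover. The function only applies the two bit fields described above and rejects base sizes outside the stated range of 12 to 29. Any further validity checks made by real tools are not modeled here.

```shell
# decode_ds is a hypothetical helper, not part of lziprecover.
# It prints the dictionary size in bytes coded by the DS byte given as $1.
decode_ds() {
  ds=$(( $1 ))
  log2=$(( ds & 0x1F ))                # bits 4-0: base 2 logarithm of the base size
  numerator=$(( (ds >> 5) & 0x07 ))    # bits 7-5: numerator of the fraction
  if [ "$log2" -lt 12 ] || [ "$log2" -gt 29 ]; then
    echo "decode_ds: base size out of range" >&2
    return 1
  fi
  base=$(( 1 << log2 ))
  # Subtract numerator/16 of the base size from the base size.
  echo $(( base - numerator * (base / 16) ))
}

decode_ds 0xD3      # the example above: 2^19 - 6 * 2^15 bytes = 320 KiB
```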
+ +@item LZMA stream +The LZMA stream, finished by an "End Of Stream" marker. Uses default values +for encoder properties. +@ifnothtml +@xref{Stream format,,,lzip}, +@end ifnothtml +@ifhtml +See +@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format} +@end ifhtml +for a complete description. + +@item CRC32 (4 bytes) +Cyclic Redundancy Check (CRC) of the original uncompressed data. + +@item Data size (8 bytes) +Size of the original uncompressed data. + +@item Member size (8 bytes) +Total size of the member, including header and trailer. This field acts +as a distributed index, improves the checking of stream integrity, and +facilitates the safe recovery of undamaged members from multimember files. +Lzip limits the member size to @w{2 PiB} to prevent the data size field from +overflowing. + +@end table + + +@node Trailing data +@chapter Extra data appended to the file +@cindex trailing data + +Sometimes extra data are found appended to a lzip file after the last +member. Such trailing data may be: + +@itemize @bullet +@item +Padding added to make the file size a multiple of some block size, for +example when writing to a tape. It is safe to append any amount of +padding zero bytes to a lzip file. + +@item +Useful data added by the user; an "End Of File" string (to check that the +file has not been truncated), a cryptographically secure hash, a description +of file contents, etc. It is safe to append any amount of text to a lzip +file as long as none of the first four bytes of the text matches the +corresponding byte in the string "LZIP", and the text does not contain any +zero bytes (null characters). Nonzero bytes and zero bytes can't be safely +mixed in trailing data. + +@item +Garbage added by some not totally successful copy operation. + +@item +Malicious data added to the file in order to make its total size and +hash value (for a chosen hash) coincide with those of another file. 
+ +@item +In rare cases, trailing data could be the corrupt header of another +member. In multimember or concatenated files the probability of +corruption happening in the magic bytes is 5 times smaller than the +probability of getting a false positive caused by the corruption of the +integrity information itself. Therefore it can be considered to be below +the noise level. Additionally, the test used by lziprecover to discriminate +trailing data from a corrupt header has a Hamming distance (HD) of 3, +and the 3 bit flips must happen in different magic bytes for the test to +fail. In any case, the option @option{--trailing-error} guarantees that +any corrupt header is detected. +@end itemize + +Trailing data are in no way part of the lzip file format, but tools +reading lzip files are expected to behave as correctly and usefully as +possible in the presence of trailing data. + +Trailing data can be safely ignored in most cases. In some cases, like +that of user-added data, they are expected to be ignored. In those cases +where a file containing trailing data must be rejected, the option +@option{--trailing-error} can be used. @xref{--trailing-error}. + +Lziprecover facilitates the management of metadata stored as trailing +data in lzip files. See the following examples: + +@noindent +Example 1: Add a comment or description to a compressed file. + +@example +# First append the comment as trailing data to a lzip file +echo 'This file contains this and that' >> file.lz +# This command prints the comment to standard output +lziprecover --dump=tdata file.lz +# This command outputs file.lz without the comment +lziprecover --strip=tdata file.lz > stripped_file.lz +# This command removes the comment from file.lz +lziprecover --remove=tdata file.lz +@end example + +@sp 1 +@noindent +Example 2: Add and check a cryptographically secure hash. 
(This may be +convenient, but a separate copy of the hash must be kept in a safe place +to guarantee that both file and hash have not been maliciously replaced). + +@example +sha256sum < file.lz >> file.lz +lziprecover --strip=tdata file.lz | sha256sum -c \ + <(lziprecover --dump=tdata file.lz) +@end example + + +@node Examples +@chapter A small tutorial with examples +@cindex examples + +Example 1: Extract all the files from archive @samp{foo.tar.lz}. + +@example + tar -xf foo.tar.lz +or + lziprecover -cd foo.tar.lz | tar -xf - +@end example + +@sp 1 +@noindent +Example 2: Restore a regular file from its compressed version +@samp{file.lz}. If the operation is successful, @samp{file.lz} is removed. + +@example +lziprecover -d file.lz +@end example + +@sp 1 +@noindent +Example 3: Check the integrity of the compressed file @samp{file.lz} and +show status. + +@example +lziprecover -tv file.lz +@end example + +@sp 1 +@anchor{concat-example} +@noindent +Example 4: The right way of concatenating the decompressed output of two or +more compressed files. @xref{Trailing data}. + +@example +Don't do this + cat file1.lz file2.lz file3.lz | lziprecover -d - +Do this instead + lziprecover -cd file1.lz file2.lz file3.lz +You may also concatenate the compressed files like this + lziprecover --strip=tdata file1.lz file2.lz file3.lz > file123.lz +Or keeping the trailing data of the last file like this + lziprecover --strip=empty file1.lz file2.lz file3.lz > file123.lz +@end example + +@sp 1 +@noindent +Example 5: Decompress @samp{file.lz} partially until @w{10 KiB} of +decompressed data are produced. + +@example +lziprecover -D 0,10KiB file.lz +@end example + +@sp 1 +@noindent +Example 6: Decompress @samp{file.lz} partially from decompressed byte at +offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced). + +@example +lziprecover -D 10000-15000 file.lz +@end example + +@sp 1 +@noindent +Example 7: Repair a corrupt byte in the file @samp{file.lz}. 
(Indented lines +are abridged diagnostic messages from lziprecover). + +@example +lziprecover -v -R file.lz + Copy of input file repaired successfully. +lziprecover -tv file_fixed.lz + file_fixed.lz: ok +mv file_fixed.lz file.lz +@end example + +@sp 1 +@noindent +Example 8: Split the multimember file @samp{file.lz} and write each member +in its own @samp{recXXXfile.lz} file. Then use @w{@samp{lziprecover -t}} to +test the integrity of the resulting files. + +@example +lziprecover -s file.lz +lziprecover -tv rec*file.lz +@end example + + +@node Unzcrash +@chapter Testing the robustness of decompressors +@cindex unzcrash + +@xref{--unzcrash}, for a faster way of testing the robustness of lzip. + +The lziprecover package also includes unzcrash, a program written to test +robustness to decompression of corrupted data, inspired by unzcrash.c from +Julian Seward's bzip2. Type @samp{make unzcrash} in the lziprecover source +directory to build it. + +By default, unzcrash reads the file specified and then repeatedly +decompresses it, increasing 256 times each byte of the compressed data, so +as to test all possible one-byte errors. Note that it may take years or even +centuries to test all possible one-byte errors in a large file (tens of MB). + +If the option @option{--block} is given, unzcrash reads the file specified and +then repeatedly decompresses it, setting all bytes in each successive block +to the value given, so as to test all possible full sector errors. + +If the option @option{--truncate} is given, unzcrash reads the file specified +and then repeatedly decompresses it, truncating the file to increasing +lengths, so as to test all possible truncation points. + +None of the three test modes described above should cause any invalid memory +accesses. If any of them does, please, report it as a bug to the maintainers +of the decompressor being tested. 
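The three test modes described above can be pictured as generators of corrupted copies of the compressed file. The following Python sketch is only an illustration of that idea, not the code of unzcrash: the function names (`byte_errors`, `block_errors`, `truncations`) and the handling of the `delta` parameter are assumptions made for this example. It only builds the corrupted buffers and leaves out the step of feeding them to a decompressor.

```python
def byte_errors(data, delta=1):
    """Yield copies of data where one byte takes each of its 255 wrong values."""
    for pos in range(0, len(data), delta):
        original = data[pos]
        for step in range(1, 256):
            corrupt = bytearray(data)
            # Incrementing modulo 256 visits every value except the original.
            corrupt[pos] = (original + step) % 256
            yield bytes(corrupt)


def block_errors(data, size=512, value=0, delta=None):
    """Yield copies of data with one block of `size` bytes set to `value`."""
    step = delta or size  # assumption: non-overlapping blocks unless delta is given
    for pos in range(0, len(data), step):
        corrupt = bytearray(data)
        end = min(pos + size, len(data))
        corrupt[pos:end] = bytes([value]) * (end - pos)
        yield bytes(corrupt)


def truncations(data, delta=1):
    """Yield prefixes of data of increasing length (never the whole file)."""
    # Starting at the empty prefix is a choice of this sketch.
    for length in range(0, len(data), delta):
        yield data[:length]
```

For a 3-byte input, `byte_errors` yields 3 * 255 = 765 corrupted copies, which shows why testing every one-byte error in a file of tens of MB takes so long.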
+ +Unzcrash actually executes the shell command specified in the first +non-option argument as a subprocess, and then writes the file specified in +the second non-option argument to the standard input of the subprocess, +modifying the corresponding byte each time. Therefore unzcrash can be used +to test any decompressor (not only lzip), or even other decoder programs +having a suitable command-line syntax. + +If the decompressor returns with zero status, unzcrash compares the output +of the decompressor for the original and corrupt files. If the outputs +differ, it means that the decompressor returned a false negative; it failed +to recognize the corruption and produced garbage output. The only exception +is when a multimember file is truncated just after the last byte of a +member, producing a shorter but valid compressed file. Except in this latter +case, please report any false negative as a bug. + +In order to compare the outputs, unzcrash needs a @samp{zcmp} program able +to understand the format being tested, for example the @samp{zcmp} provided +by @uref{http://www.nongnu.org/zutils/manual/zutils_manual.html#Zcmp,,zutils}. +If the @samp{zcmp} program used does not understand the format being tested, +all the comparisons fail because the compressed files are compared without +being decompressed first. Use @option{--zcmp=false} to disable comparisons. +@ifnothtml +@xref{Zcmp,,,zutils}. +@end ifnothtml + +The format for running unzcrash is: + +@example +unzcrash [@var{options}] 'lzip -t' @var{file} +@end example + +@noindent +The compressed @var{file} must not contain errors and the decompressor being +tested must decompress it correctly for the comparisons to work. + +unzcrash supports the following options: + +@table @code +@item -h +@itemx --help +Print an informative help message describing the options and exit. + +@item -V +@itemx --version +Print the version number of unzcrash on the standard output and exit. 
+This version number should be included in all bug reports. + +@item -b @var{range} +@itemx --bits=@var{range} +Test N-bit errors only, instead of testing all the 255 wrong values for +each byte. @samp{N-bit error} means any value differing from the +original value in N bit positions, not a value differing from the +original value in the bit position N.@* +The number of N-bit errors per byte (N = 1 to 8) is: +@w{8 28 56 70 56 28 8 1} + +@multitable {Examples of @var{range}} {Tests errors of N-bits} +@item Examples of @var{range} @tab Tests errors of N-bits +@item 1 @tab 1 +@item 1,2,3 @tab 1, 2, 3 +@item 2-4 @tab 2, 3, 4 +@item 1,3-5,8 @tab 1, 3, 4, 5, 8 +@item 1-3,5-8 @tab 1, 2, 3, 5, 6, 7, 8 +@end multitable + +@item -B[@var{size}][,@var{value}] +@itemx --block[=@var{size}][,@var{value}] +Test block errors of given @var{size}, simulating a whole sector I/O error. +@var{size} defaults to 512 bytes. @var{value} defaults to 0. By default, +only contiguous, non-overlapping blocks are tested, but this may be changed +with the option @option{--delta}. + +@item -d @var{n} +@itemx --delta=@var{n} +Test one byte, block, or truncation size every @var{n} bytes. If +@option{--delta} is not specified, unzcrash tests all the bytes, +non-overlapping blocks, or truncation sizes. Values of @var{n} smaller than +the block size result in overlapping blocks. (Which is convenient for +testing because there are usually too few non-overlapping blocks in a file). + +@item -e @var{position},@var{value} +@itemx --set-byte=@var{position},@var{value} +Set byte at @var{position} to @var{value} in the internal buffer after +reading and testing @var{file} but before the first test call to the +decompressor. Byte positions start at 0. If @var{value} is preceded by +@samp{+}, it is added to the original value of the byte at @var{position}. +If @var{value} is preceded by @samp{f} (flip), it is XORed with the original +value of the byte at @var{position}. 
This option can be used to run tests +with a changed dictionary size, for example. + +@item -n +@itemx --no-check +Skip initial test of @var{file} and @samp{zcmp}. May speed up things a lot +when testing many (or large) known good files. + +@item -p @var{bytes} +@itemx --position=@var{bytes} +First byte position to test in the file. Defaults to 0. Negative values +are relative to the end of the file. + +@item -q +@itemx --quiet +Quiet operation. Suppress all messages. + +@item -s @var{bytes} +@itemx --size=@var{bytes} +Number of byte positions to test. If not specified, the rest of the file +is tested (from @option{--position} to end of file). Negative values are +relative to the rest of the file. + +@item -t +@itemx --truncate +Test all possible truncation points in the range specified by +@option{--position} and @option{--size}. + +@item -v +@itemx --verbose +Verbose mode. + +@item -z +@itemx --zcmp=<command> +Set zcmp command name and options. Defaults to @samp{zcmp}. Use +@option{--zcmp=false} to disable comparisons. If testing a decompressor +different from the one used by default by zcmp, it is needed to force +unzcrash and zcmp to use the same decompressor with a command like +@w{@samp{unzcrash --zcmp='zcmp --lz=plzip' 'plzip -t' @var{file}}} + +@end table + +Exit status: 0 for a normal exit, 1 for environmental problems +(file not found, invalid command-line options, I/O errors, etc), 2 to +indicate a corrupt or invalid input file, 3 for an internal consistency +error (e.g., bug) which caused unzcrash to panic. + + +@node Problems +@chapter Reporting bugs +@cindex bugs +@cindex getting help + +There are probably bugs in lziprecover. There are certainly errors and +omissions in this manual. If you report them, they will get fixed. If +you don't, no one will ever know about them and they will remain unfixed +for all eternity, if not longer. + +If you find a bug in lziprecover, please send electronic mail to +@email{lzip-bug@@nongnu.org}. 
Include the version number, which you can +find by running @w{@samp{lziprecover --version}}. + + +@node Concept index +@unnumbered Concept index + +@printindex cp + +@bye