Diffstat (limited to 'doc/lziprecover.texi')
-rw-r--r-- | doc/lziprecover.texi | 606 |
1 files changed, 606 insertions, 0 deletions
diff --git a/doc/lziprecover.texi b/doc/lziprecover.texi new file mode 100644 index 0000000..be4fc27 --- /dev/null +++ b/doc/lziprecover.texi @@ -0,0 +1,606 @@ +\input texinfo @c -*-texinfo-*- +@c %**start of header +@setfilename lziprecover.info +@documentencoding ISO-8859-15 +@settitle Lziprecover Manual +@finalout +@c %**end of header + +@set UPDATED 5 April 2014 +@set VERSION 1.16-pre1 + +@dircategory Data Compression +@direntry +* Lziprecover: (lziprecover). Data recovery tool for lzip files +@end direntry + + +@ifnothtml +@titlepage +@title Lziprecover +@subtitle Data recovery tool for lzip files +@subtitle for Lziprecover version @value{VERSION}, @value{UPDATED} +@author by Antonio Diaz Diaz + +@page +@vskip 0pt plus 1filll +@end titlepage + +@contents +@end ifnothtml + +@node Top +@top + +This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}). + +@menu +* Introduction:: Purpose and features of lziprecover +* Invoking lziprecover:: Command line interface +* Repairing files:: Fixing bit-flip and similar errors +* Merging files:: Fixing several damaged copies +* File format:: Detailed format of the compressed file +* Examples:: A small tutorial with examples +* Unzcrash:: Testing the robustness of decompressors +* Problems:: Reporting bugs +* Concept index:: Index of concepts +@end menu + +@sp 1 +Copyright @copyright{} 2009, 2010, 2011, 2012, 2013, 2014 +Antonio Diaz Diaz. + +This manual is free documentation: you have unlimited permission +to copy, distribute and modify it. + + +@node Introduction +@chapter Introduction +@cindex introduction + +Lziprecover is a data recovery tool and decompressor for files in the +lzip compressed data format (.lz), able to repair slightly damaged +files, recover badly damaged files from two or more copies, extract data +from damaged files, decompress files and test integrity of files. 
+ +The lzip file format is designed for long-term data archiving, taking +into account both data integrity and decoder availability: + +@itemize @bullet +@item +The lzip format provides very safe integrity checking and some data +recovery means. The lziprecover program can repair bit-flip errors (one +of the most common forms of data corruption) in lzip files, and provides +data recovery capabilities, including error-checked merging of damaged +copies of a file. + +@item +The lzip format is as simple as possible (but not simpler). The lzip +manual provides the code of a simple decompressor along with a detailed +explanation of how it works, so that with the only help of the lzip +manual it would be possible for a digital archaeologist to extract the +data from a lzip file long after quantum computers eventually render +LZMA obsolete. + +@item +Additionally lzip is copylefted, which guarantees that it will remain +free forever. +@end itemize + +Lziprecover is able to recover or decompress files produced by any of +the compressors in the lzip family; lzip, plzip, minilzip/lzlib, clzip +and pdlzip. + +If the cause of file corruption is damaged media, the combination +@w{GNU ddrescue + lziprecover} is the best option for recovering data +from multiple damaged copies. @xref{ddrescue-example}, for an example. + +If a file is too damaged for lziprecover to repair it, all the +recoverable data in all members of the file can be extracted with the +following command (the resulting file may contain errors and some +garbage data may be produced at the end of each member): + +@example +lziprecover -D0 -i -o file -q file.lz +@end example + +Lziprecover is able to efficiently extract a range of bytes from a +multi-member file, because it only decompresses the members containing +the desired data. + +Lziprecover can print correct total file sizes and ratios even for +multi-member files. 
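As a rough illustration of why range extraction is cheap on multi-member files, the sketch below picks the members that overlap a requested byte range, given the uncompressed size of each member. It is not lziprecover's code: the function name `members_for_range` and the idea of passing the sizes as a plain list are assumptions made for this example.

```python
from itertools import accumulate


def members_for_range(data_sizes, begin, end):
    """Return the 0-based indices of members holding any byte in [begin, end).

    data_sizes is a list with the uncompressed size of each member, in
    file order (hypothetical input; a real tool would read these sizes
    from the member trailers).
    """
    ends = list(accumulate(data_sizes))   # uncompressed end offset of each member
    starts = [0] + ends[:-1]              # uncompressed start offset of each member
    return [i for i, (s, e) in enumerate(zip(starts, ends))
            if s < end and e > begin]
```

For members of 100, 50 and 200 bytes, a request for bytes 120 to 160 touches only the second and third members, so the first one never needs to be decompressed. The total uncompressed size is simply the sum of the list.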
+ +When recovering data, lziprecover takes as arguments the names of the +damaged files and writes zero or more recovered files depending on the +operation selected and whether the recovery succeeded or not. The +damaged files themselves are never modified. + +When decompressing or testing file integrity, lziprecover behaves like +lzip or lunzip. + +Lziprecover is not a replacement for regular backups, but a last line of +defense for the case where the backups are also damaged. + + +@node Invoking lziprecover +@chapter Invoking lziprecover +@cindex invoking + +The format for running lziprecover is: + +@example +lziprecover [@var{options}] [@var{files}] +@end example + +Lziprecover supports the following options: + +@table @samp +@item -h +@itemx --help +Print an informative help message describing the options and exit. + +@item -V +@itemx --version +Print the version number of lziprecover on the standard output and exit. + +@item -c +@itemx --stdout +Decompress to standard output. Needed when reading from a named pipe +(fifo) or from a device. Use it to recover as much of the uncompressed +data as possible when decompressing a corrupt file. + +@item -d +@itemx --decompress +Decompress. + +@item -D @var{range} +@itemx --range-decompress=@var{range} +Decompress only a range of bytes starting at decompressed byte position +@samp{@var{begin}} and up to byte position @w{@samp{@var{end} - 1}}. +Three formats of @var{range} are recognized, @samp{@var{begin}}, +@samp{@var{begin}-@var{end}}, and @samp{@var{begin},@var{size}}. If only +@var{begin} is specified, @var{end} is taken as the end of the file. The +produced bytes are sent to standard output unless the @samp{--output} +option is used. In order to guarantee the correctness of the data +produced, all members containing any part of the desired data are +decompressed and their integrity is verified. 
This operation is more +efficient in multi-member files because it only decompresses the members +containing the desired data. + +@item -f +@itemx --force +Force overwrite of output files. + +@item -i +@itemx --ignore-errors +Make @samp{--range-decompress} ignore data errors and continue +decompressing the remaining members in the file. For example, +@w{@samp{lziprecover -i -D0 file.lz > file}} decompresses all the +recoverable data in all members of @samp{file.lz} without having to +split it first. + +@item -k +@itemx --keep +Keep (don't delete) input files during decompression. + +@item -l +@itemx --list +Print total file sizes and ratios. The values produced are correct even +for multi-member files. Use it together with @samp{-v} to see +information about the members in the file. + +@item -m +@itemx --merge +Try to produce a correct file merging the good parts of two or more +damaged copies. If successful, a repaired copy is written to the file +@samp{@var{file}_fixed.lz}. The exit status is 0 if a correct file could +be produced, 2 otherwise. See the chapter @samp{Merging files} +(@pxref{Merging files}) for a complete description of the merge mode. + +@item -o @var{file} +@itemx --output=@var{file} +Place the output into @samp{@var{file}} instead of into +@samp{@var{file}_fixed.lz}. If splitting, the names of the files +produced are in the form @samp{rec01@var{file}}, @samp{rec02@var{file}}, +etc. If decompressing from standard input and @samp{--stdout} has not +been specified, use @samp{@var{file}} as the name of the decompressed +file. + +@item -q +@itemx --quiet +Quiet operation. Suppress all messages. + +@item -R +@itemx --repair +Try to repair a file with small errors (up to one byte error per +member). If successful, a repaired copy is written to the file +@samp{@var{file}_fixed.lz}. @samp{@var{file}} is not modified at all. +The exit status is 0 if the file could be repaired, 2 otherwise. 
See the +chapter @samp{Repairing files} (@pxref{Repairing files}) for a complete +description of the repair mode. + +@item -s +@itemx --split +Search for members in @samp{@var{file}} and write each member in its own +@samp{.lz} file. You can then use @samp{lziprecover -t} to test the +integrity of the resulting files, decompress those which are undamaged, +and try to repair or partially decompress those which are damaged. + +The names of the files produced are in the form +@samp{rec01@var{file}.lz}, @samp{rec02@var{file}.lz}, etc, and are +designed so that the use of wildcards in subsequent processing, for +example, @w{@samp{lziprecover -cd rec*@var{file}.lz > recovered_data}}, +processes the files in the correct order. The number of digits used in +the names varies depending on the number of members in @samp{@var{file}}. + +@item -t +@itemx --test +Check integrity of the specified file(s), but don't decompress them. +This really performs a trial decompression and throws away the result. +Use it together with @samp{-v} to see information about the file. + +@item -v +@itemx --verbose +Verbose mode.@* +When decompressing or testing, further -v's (up to 4) increase the +verbosity level, showing status, compression ratio, dictionary size, +trailer contents (CRC, data size, member size), and up to 6 bytes of +trailing garbage (if any). + +@end table + +Numbers given as arguments to options may be followed by a multiplier +and an optional @samp{B} for "byte". 
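The following sketch shows one way such numbers, and the three range formats accepted by --range-decompress, could be interpreted. It follows only what this manual states (the prefixes k, Ki, M, Mi and so on, with an optional trailing B); the names `parse_number` and `parse_range` are hypothetical, and the real parser in lziprecover may accept or reject inputs differently.

```python
import re


def parse_number(text):
    """Parse a number with an optional SI or binary multiplier and optional 'B'."""
    m = re.fullmatch(r"(\d+)(k|Ki|[MGTPEZY]i?)?B?", text)
    if m is None:
        raise ValueError(f"bad number: {text!r}")
    value = int(m.group(1))
    unit = m.group(2)
    if unit:
        base = 1024 if unit.endswith("i") else 1000
        exponent = "KMGTPEZY".index(unit[0].upper()) + 1
        value *= base ** exponent
    return value


def parse_range(text):
    """Return (begin, end); end is None when it means 'end of the file'."""
    if "-" in text:                       # begin-end
        begin, end = text.split("-", 1)
        return parse_number(begin), parse_number(end)
    if "," in text:                       # begin,size
        begin, size = text.split(",", 1)
        start = parse_number(begin)
        return start, start + parse_number(size)
    return parse_number(text), None       # begin only
```

With these definitions, the range `0,10KiB` maps to bytes 0 up to (but not including) 10240, and `10000-15000` maps to bytes 10000 up to 15000.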
+ +Table of SI and binary prefixes (unit multipliers): + +@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)} +@item Prefix @tab Value @tab | @tab Prefix @tab Value +@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024) +@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20) +@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30) +@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40) +@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50) +@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60) +@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70) +@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80) +@end multitable + +@sp 1 +Exit status: 0 for a normal exit, 1 for environmental problems (file not +found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or +invalid input file, 3 for an internal consistency error (e.g., a bug) which +caused lziprecover to panic. + + +@node Repairing files +@chapter Repairing files +@cindex repairing files + +Lziprecover is able to repair files with small errors (up to one byte +error per member). The error may be located anywhere in the file except +in the header (first 6 bytes of each member) or in the @samp{Member +size} field of the trailer (last 8 bytes of each member). This makes +lzip files resistant to bit-flip, one of the most common forms of data +corruption. + +Bit-flip happens when one bit in the file is changed from 0 to 1 or vice +versa. It may be caused by bad RAM or even by natural radiation. I have +seen a case of bit-flip in a file stored in a USB flash drive. + + +@node Merging files +@chapter Merging files +@cindex merging files + +If you have several copies of a file but all of them are too damaged to +be repaired (@pxref{Repairing files}), lziprecover can try to produce a +correct file by merging the good parts of the damaged copies. 
+ +The merge may succeed even if some copies of the file have all the +headers and trailers damaged, as long as there is at least one copy of +every header and trailer intact, even if they are in different copies of +the file. + +The merge will fail if the damaged areas overlap (at least one byte is +damaged in all copies), or are adjacent and the boundary can't be +determined, or if the copies have too many damaged areas. + +All the copies must have the same size. If some of them have been +truncated and are therefore smaller than they should be, you can extend +them to the correct size with the following command before merging them +with the other copies: + +@example +ddrescue --extend-outfile=<correct_size> small_file.lz extended_file.lz +@end example + +If some of the copies have garbage data at the end and are therefore +larger than they should be, you can reduce their sizes to the correct value +with the following command before merging them with the other copies: + +@example +ddrescue --size=<correct_size> large_file.lz reduced_file.lz +@end example + +To give you an idea of its possibilities, when merging two copies, each +of them with one damaged area affecting 1 percent of the copy, the +probability of obtaining a correct file is about 98 percent. With three +such copies the probability rises to 99.97 percent. For large files (a +few MB) with small errors (one sector damaged per copy), the probability +approaches 100 percent even with only two copies. + + +@node File format +@chapter File format +@cindex file format + +Perfection is reached, not when there is no longer anything to add, but +when there is no longer anything to take away.@* +--- Antoine de Saint-Exupery + +@sp 1 +In the diagram below, a box like this: +@verbatim ++---+ +| | <-- the vertical bars might be missing ++---+ +@end verbatim + +represents one byte; a box like this: +@verbatim ++==============+ +| | ++==============+ +@end verbatim + +represents a variable number of bytes. 
+ +@sp 1 +A lzip file consists of a series of "members" (compressed data sets). +The members simply appear one after another in the file, with no +additional information before, between, or after them. + +Each member has the following structure: +@verbatim ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| ID string | VN | DS | Lzma stream | CRC32 | Data size | Member size | ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +@end verbatim + +All multibyte values are stored in little endian order. + +@table @samp +@item ID string +A four byte string, identifying the lzip format, with the value "LZIP" +(0x4C, 0x5A, 0x49, 0x50). + +@item VN (version number, 1 byte) +Just in case something needs to be modified in the future. 1 for now. + +@item DS (coded dictionary size, 1 byte) +Lzip divides the distance between any two consecutive powers of 2 into 8 equally +spaced intervals, named "wedges". The dictionary size is calculated by +taking a power of 2 (the base size) and subtracting from it a number of +wedges between 0 and 7. The size of a wedge is (base_size / 16).@* +Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@* +Bits 7-5 contain the number of wedges (0 to 7) to subtract from the +base size to obtain the dictionary size.@* +Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@* +Valid values for dictionary size range from 4 KiB to 512 MiB. + +@item Lzma stream +The lzma stream, finished by an end of stream marker. Uses default values +for encoder properties. See the lzip manual for a full description. + +@item CRC32 (4 bytes) +CRC of the uncompressed original data. + +@item Data size (8 bytes) +Size of the uncompressed original data. + +@item Member size (8 bytes) +Total size of the member, including header and trailer. This field acts +as a distributed index, allows the verification of stream integrity, and +facilitates safe recovery of undamaged members from multi-member files. 
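The member layout described above can be read with a few lines of code. The sketch below is only an illustration built from the field descriptions in this chapter, not code from lziprecover: the names `dictionary_size` and `parse_member` are hypothetical, the LZMA stream is not decoded, and rejecting sizes below 4 KiB is this sketch's reading of the stated valid range.

```python
import struct

HEADER_SIZE = 6     # ID string (4) + VN (1) + DS (1)
TRAILER_SIZE = 20   # CRC32 (4) + Data size (8) + Member size (8)


def dictionary_size(ds):
    """Decode the coded dictionary size byte (DS)."""
    base_log2 = ds & 0x1F      # bits 4-0: base 2 logarithm of the base size
    wedges = ds >> 5           # bits 7-5: number of wedges to subtract
    if not 12 <= base_log2 <= 29:
        raise ValueError("base size logarithm out of range (12 to 29)")
    base = 1 << base_log2
    size = base - wedges * (base // 16)
    if size < 4096:
        raise ValueError("dictionary size below 4 KiB")
    return size


def parse_member(member):
    """Return the header and trailer fields of one complete member (bytes)."""
    if len(member) < HEADER_SIZE + TRAILER_SIZE:
        raise ValueError("member too short")
    magic, version, ds = struct.unpack("<4sBB", member[:HEADER_SIZE])
    if magic != b"LZIP":
        raise ValueError("bad ID string")
    # All multibyte values are little endian; "<" also disables padding.
    crc32, data_size, member_size = struct.unpack("<IQQ", member[-TRAILER_SIZE:])
    return {
        "version": version,
        "dictionary_size": dictionary_size(ds),
        "crc32": crc32,
        "data_size": data_size,
        "member_size": member_size,
    }
```

Decoding the DS value from the example in the text, `dictionary_size(0xD3)`, gives 327680 bytes (320 KiB). Comparing the returned `member_size` with the actual length of the member is a cheap first consistency check before any decompression is attempted.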
+ +@end table + + +@node Examples +@chapter A small tutorial with examples +@cindex examples + +Example 1: Restore a regular file from its compressed version +@samp{file.lz}. If the operation is successful, @samp{file.lz} is +removed. + +@example +lziprecover -d file.lz +@end example + +@sp 1 +@noindent +Example 2: Verify the integrity of the compressed file @samp{file.lz} +and show status. + +@example +lziprecover -tv file.lz +@end example + +@sp 1 +@noindent +Example 3: Decompress @samp{file.lz} partially until 10 KiB of +decompressed data are produced. + +@example +lziprecover -D 0,10KiB file.lz +@end example + +@sp 1 +@noindent +Example 4: Decompress @samp{file.lz} partially from decompressed byte +10000 to decompressed byte 15000 (5000 bytes are produced). + +@example +lziprecover -D 10000-15000 file.lz +@end example + +@sp 1 +@noindent +Example 5: Repair small errors in the file @samp{file.lz}. (Indented +lines are abridged diagnostic messages from lziprecover). + +@example +lziprecover -v -R file.lz + Copy of input file repaired successfully. +mv file_fixed.lz file.lz +@end example + +@sp 1 +@noindent +Example 6: Split the multi-member file @samp{file.lz} and write each +member in its own @samp{recXXXfile.lz} file. Then use +@w{@samp{lziprecover -t}} to test the integrity of the resulting files. + +@example +lziprecover -s file.lz +lziprecover -tv rec*file.lz +@end example + +@sp 1 +@anchor{ddrescue-example} +@noindent +Example 7: Recover a compressed backup from two copies on CD-ROM with +error-checked merging of copies +@ifnothtml +(@xref{Top,GNU ddrescue manual,,ddrescue}, +@end ifnothtml +@ifhtml +(See the +@uref{http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html,,ddrescue manual} +@end ifhtml +for details about ddrescue). 
+ +@example +ddrescue -b2048 /dev/cdrom cdimage1 logfile1 +mount -t iso9660 -o loop,ro cdimage1 /mnt/cdimage +cp /mnt/cdimage/backup.tar.lz rescued1.tar.lz +umount /mnt/cdimage + (insert second copy in the CD drive) +ddrescue -b2048 /dev/cdrom cdimage2 logfile2 +mount -t iso9660 -o loop,ro cdimage2 /mnt/cdimage +cp /mnt/cdimage/backup.tar.lz rescued2.tar.lz +umount /mnt/cdimage +lziprecover -m -v -o backup.tar.lz rescued1.tar.lz rescued2.tar.lz +@end example + +@sp 1 +@noindent +Example 8: Recover the first volume of those created with the command +@w{@samp{lzip -b 32MiB -S 650MB big_db}} from two copies, +@samp{big_db1_00001.lz} and @samp{big_db2_00001.lz}, with member 07 +damaged in the first copy, member 18 damaged in the second copy, and +member 12 damaged in both copies. The correct file produced is saved in +@samp{big_db_00001.lz}. + +@example +lziprecover -m -v -o big_db_00001.lz big_db1_00001.lz big_db2_00001.lz + Input files merged successfully +@end example + + +@node Unzcrash +@chapter Testing the robustness of decompressors +@cindex unzcrash + +The lziprecover package also includes unzcrash, a program written to +test robustness to decompression of corrupted data, inspired by +unzcrash.c from Julian Seward's bzip2. Type @samp{make unzcrash} in the +lziprecover source directory to build it. + +Unzcrash reads the specified file and then repeatedly decompresses it, +increasing 256 times each byte of the compressed data, so as to test all +possible one-byte errors. This should not cause any invalid memory +accesses. If it does, please, report it as a bug. + +Unzcrash really executes as a subprocess the shell command specified in +the first non-option argument, and then writes the file specified in the +second non-option argument to the standard input of the subprocess, +modifying the corresponding byte each time. Therefore you can use +unzcrash to test any decompressor (not only lzip), or even other decoder +programs with a suitable command line syntax. 
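The testing loop just described can be sketched as follows. This is an illustration of the idea only, not the unzcrash source: the function name `crash_test`, its parameters, and the choice to count runs ended by a signal are assumptions of this example, and the signal check relies on POSIX behavior of Python's `subprocess` module.

```python
import subprocess


def crash_test(command, data, position=0, size=None):
    """Feed every one-byte corruption of data to the stdin of a shell command.

    Returns the number of runs in which the child was killed by a signal
    (for example a segmentation fault), which is what a robust
    decompressor should never let happen.
    """
    end = len(data) if size is None else min(len(data), position + size)
    buf = bytearray(data)
    killed = 0
    for i in range(position, end):
        original = buf[i]
        for delta in range(1, 256):            # the 255 wrong values of this byte
            buf[i] = (original + delta) & 0xFF
            result = subprocess.run(command, shell=True, input=bytes(buf),
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
            if result.returncode < 0:          # negative: terminated by a signal
                killed += 1
        buf[i] = original                      # restore before moving on
    return killed
```

A call such as `crash_test("lzip -t", compressed_bytes)` would then run the decompressor 255 times per byte of input, so limiting `position` and `size` keeps experiments on large files manageable.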
+ +The format for running unzcrash is: + +@example +unzcrash [@var{options}] "lzip -tv" @var{filename}.lz +@end example + +Unzcrash supports the following options: + +@table @samp +@item -h +@itemx --help +Print an informative help message describing the options and exit. + +@item -V +@itemx --version +Print the version number of unzcrash on the standard output and exit. + +@item -b @var{range} +@itemx --bits=@var{range} +Test N-bit errors only, instead of testing all the 255 wrong values for +each byte. @samp{N-bit error} means any value differing from the +original value in N bit positions, not a value differing from the +original value in the bit position N.@* +The number of N-bit errors per byte (N = 1 to 8) is: 8 28 56 70 56 28 8 1@* +Examples of @var{range}: 1 1,2,3 1-4 1,3-5,8 1-3,5-8 + +@item -p @var{bytes} +@itemx --position=@var{bytes} +First byte position to test in the file. Defaults to 0. + +@item -q +@itemx --quiet +Quiet operation. Suppress all messages. + +@item -s @var{bytes} +@itemx --size=@var{bytes} +Number of byte positions to test. If not specified, the whole file is +tested. + +@item -v +@itemx --verbose +Verbose mode. + +@end table + +Exit status: 0 for a normal exit, 1 for environmental problems (file not +found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or +invalid input file, 3 for an internal consistency error (eg, bug) which +caused unzcrash to panic. + + +@node Problems +@chapter Reporting bugs +@cindex bugs +@cindex getting help + +There are probably bugs in lziprecover. There are certainly errors and +omissions in this manual. If you report them, they will get fixed. If +you don't, no one will ever know about them and they will remain unfixed +for all eternity, if not longer. + +If you find a bug in lziprecover, please send electronic mail to +@email{lzip-bug@@nongnu.org}. Include the version number, which you can +find by running @w{@samp{lziprecover --version}}. 
+ + +@node Concept index +@unnumbered Concept index + +@printindex cp + +@bye