author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-11-24 04:36:43 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-11-24 04:36:43 +0000 |
commit | ab77d16ba47322aab30703e251efbffb680ce0bb | |
tree | 5d2146c01e938fa6cac7c349192088cb2e962ac5 /doc | |
parent | Adding upstream version 1.25~pre1. | |
Adding upstream version 1.25~rc1. (refs: upstream/1.25_rc1, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/lziprecover.1 | 8 |
-rw-r--r-- | doc/lziprecover.info | 381 |
-rw-r--r-- | doc/lziprecover.texi | 259 |
3 files changed, 389 insertions, 259 deletions
diff --git a/doc/lziprecover.1 b/doc/lziprecover.1
index bb39f36..0fafa13 100644
--- a/doc/lziprecover.1
+++ b/doc/lziprecover.1
@@ -1,5 +1,5 @@
 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
-.TH LZIPRECOVER "1" "October 2024" "lziprecover 1.25-pre1" "User Commands"
+.TH LZIPRECOVER "1" "November 2024" "lziprecover 1.25-rc1" "User Commands"
 .SH NAME
 lziprecover \- recovers data from damaged lzip files
 .SH SYNOPSIS
@@ -119,12 +119,6 @@ remove members, tdata from files in place
 \fB\-\-strip=\fR<list>:d:e:t
 copy files to stdout stripping members given
 .TP
-\fB\-\-ignore\-empty\fR
-ignore empty members in multimember files
-.TP
-\fB\-\-ignore\-nonzero\fR
-ignore a nonzero first LZMA byte
-.TP
 \fB\-\-loose\-trailing\fR
 allow trailing data seeming corrupt header
 .TP
diff --git a/doc/lziprecover.info b/doc/lziprecover.info
index 197af5e..20fd014 100644
--- a/doc/lziprecover.info
+++ b/doc/lziprecover.info
@@ -12,12 +12,13 @@ File: lziprecover.info, Node: Top, Next: Introduction, Up: (dir)
 Lziprecover Manual
 ******************

-This manual is for Lziprecover (version 1.25-pre1, 1 October 2024).
+This manual is for Lziprecover (version 1.25-rc1, 18 November 2024).

 * Menu:

 * Introduction:: Purpose and features of lziprecover
 * Invoking lziprecover:: Command-line interface
+* Argument syntax:: By convention, options start with a hyphen
 * File format:: Detailed format of the compressed file
 * Data safety:: Protecting data from accidental loss
 * Fec files:: Forward Error Correction
@@ -112,8 +113,9 @@ pdlzip.

 If the cause of file corruption is a damaged medium, the combination
 GNU ddrescue + lziprecover is the recommended option for recovering data
-from damaged lzip files. *Note ddrescue-example::, and *note
-ddrescue-example2::, for examples.
+from damaged files. *Note ddrescue-example::, *note ddrescue-example2::, and
+*note ddrescue-example3::, for examples. *Note GNU ddrescue manual:
+(ddrescue)Top, for details about ddrescue.

 If a file is too damaged for lziprecover to repair it, all the recoverable
 data in all members of the file can be extracted with the
@@ -135,7 +137,7 @@ have been compressed. Decompressed is used to refer to data which have
 undergone the process of decompression.


-File: lziprecover.info, Node: Invoking lziprecover, Next: File format, Prev: Introduction, Up: Top
+File: lziprecover.info, Node: Invoking lziprecover, Next: Argument syntax, Prev: Introduction, Up: Top

 2 Invoking lziprecover
 **********************
@@ -150,8 +152,7 @@ first time it appears in the command line. If no file names are specified,
 lziprecover decompresses from standard input to standard output. Remember to
 prepend './' to any file name beginning with a hyphen, or use '--'.

-lziprecover supports the following options: *Note Argument syntax:
-(arg_parser)Argument syntax.
+lziprecover supports the following options: *Note Argument syntax::.

 '-h'
 '--help'
@@ -175,7 +176,7 @@ lziprecover supports the following options: *Note Argument syntax:
  dictionary size of the resulting file (and therefore the amount of
  memory required to decompress it). Only streamed files with default
  LZMA properties can be converted; non-streamed lzma-alone files lack
- the "End Of Stream" marker required in lzip files.
+ the 'End Of Stream' marker required in lzip files.

  The name of the converted lzip file is derived from that of the
  original lzma-alone file as follows:
@@ -215,23 +216,24 @@ lziprecover supports the following options: *Note Argument syntax:
  status 1. If a file fails to decompress, or is a terminal, lziprecover
  exits immediately with error status 2 without decompressing the rest
  of the files. A terminal is considered an uncompressed file, and
- therefore invalid.
+ therefore invalid. A multimember file with one or more empty members
+ is accepted if redirected to standard input or if '-i' is given.

 '-D RANGE'
 '--range-decompress=RANGE'
  Decompress only a range of bytes starting at decompressed byte position
- BEGIN and up to byte position END - 1. Byte positions start at 0. This
- option provides random access to the data in multimember files; it
- only decompresses the members containing the desired data. In order to
- guarantee the correctness of the data produced, all members containing
- any part of the desired data are decompressed and their integrity is
- checked.
+ BEGIN and up to byte position END - 1. Byte positions start at 0. The
+ bytes produced are sent to standard output unless the option '-o' is
+ used. This option provides random access to the data in multimember
+ files; it only decompresses the members containing the desired data.
+ In order to guarantee the correctness of the data produced, all
+ members containing any part of the desired data are decompressed and
+ their integrity is checked.

  Four formats of RANGE are recognized, 'BEGIN', 'BEGIN-END',
  'BEGIN,SIZE', and ',SIZE'. If only BEGIN is specified, END is taken as
  the end of the file. If only SIZE is specified, BEGIN is taken as the
- beginning of the file. The bytes produced are sent to standard output
- unless the option '--output' is used.
+ beginning of the file.

 '-e'
 '--reproduce'
@@ -325,7 +327,8 @@ lziprecover supports the following options: *Note Argument syntax:

 '-k'
 '--keep'
- Keep (don't delete) input files during decompression.
+ Keep (don't delete) input files during decompression or conversion from
+ lzma-alone.

 '-l'
 '--list'
@@ -336,9 +339,11 @@ lziprecover supports the following options: *Note Argument syntax:
  '-v', the dictionary size, the number of members in the file, and the
  amount of trailing data (if any) are also printed. With '-vv', the
  positions and sizes of each member in multimember files are also
- printed. With '-i', format errors are ignored, and with '-ivv', gaps
- between members are shown. The member numbers shown coincide with the
- file numbers produced by '--split'.
+ printed. A multimember file with one or more empty members is accepted
+ if redirected to standard input or if '-i' is given. With '-i', format
+ errors are ignored, and with '-ivv', gaps between members are shown.
+ The member numbers start at 1 and coincide with the file numbers
+ produced by '--split'.

  If any file is damaged, does not exist, can't be opened, or is not
  regular, the final exit status is > 0. '-lq' can be used to check
@@ -358,8 +363,8 @@ lziprecover supports the following options: *Note Argument syntax:
 '-n N'
 '--threads=N'
  Set the maximum number of worker threads for '--fec=create',
- overriding the system's default. Valid values range from 1 to "as many
- as your system can support". If this option is not used, lziprecover
+ overriding the system's default. Valid values range from 1 to as many
+ as your system can support. If this option is not used, lziprecover
  tries to detect the number of processors in the system and use it as
  default value. 'lziprecover --help' shows the system's default value.

@@ -367,7 +372,7 @@ lziprecover supports the following options: *Note Argument syntax:
 '--output=FILE[/]'
  If repairing, place the repaired output into FILE instead of into
  FILE_fixed.lz. If splitting, the names of the files produced are in
- the form 'rec01FILE', 'rec02FILE', etc.
+ the form 'rec1FILE', 'rec2FILE', etc.

  If creating FEC data and '-c' has not been also specified, write the
  FEC data to FILE. If FILE ends with a slash, it is interpreted as the
@@ -415,8 +420,8 @@ lziprecover supports the following options: *Note Argument syntax:
  headers or trailers, try to split FILE and then work on each member
  individually.

- The names of the files produced are in the form 'rec01FILE',
- 'rec02FILE', etc, and are designed so that the use of wildcards in
+ The names of the files produced are in the form 'rec1FILE',
+ 'rec2FILE', etc, and are designed so that the use of wildcards in
  subsequent processing, for example,
  'lziprecover -cd rec*FILE > recovered_data', processes the files in
  the correct order. The number of digits used in the names varies
@@ -430,7 +435,9 @@ lziprecover supports the following options: *Note Argument syntax:
  fails the test, does not exist, can't be opened, or is a terminal,
  lziprecover continues testing the rest of the files. A final
  diagnostic is shown at verbosity level 1 or higher if any file fails
- the test when testing multiple files.
+ the test when testing multiple files. A multimember file with one or
+ more empty members is accepted if redirected to standard input or if
+ '-i' is given.

 '-v'
 '--verbose'
@@ -448,14 +455,13 @@ lziprecover supports the following options: *Note Argument syntax:
 '--dump=[MEMBER_LIST][:damaged][:empty][:tdata]'
  Dump the members listed, the damaged members (if any), the empty
  members (if any), or the trailing data (if any) of one or more regular
- multimember files to standard output, or to a file if the option
- '--output' is used. If more than one file is given, the elements
- dumped from all the files are concatenated. If a file does not exist,
- can't be opened, or is not regular, lziprecover continues processing
- the rest of the files. If the dump fails in one file, lziprecover
- exits immediately without processing the rest of the files. Only
- '--dump=tdata' can write to a terminal. '--dump=damaged' implies
- '--ignore-errors'.
+ multimember files to standard output, or to a file if the option '-o'
+ is used. If more than one file is given, the elements dumped from all
+ the files are concatenated. If a file does not exist, can't be opened,
+ or is not regular, lziprecover continues processing the rest of the
+ files. If the dump fails in one file, lziprecover exits immediately
+ without processing the rest of the files. Only '--dump=tdata' can
+ write to a terminal. '--dump=damaged' implies '--ignore-errors'.

  The argument to '--dump' is a colon-separated list of the following
  element specifiers; a member list (1,3-6), a reverse member list
@@ -509,35 +515,23 @@ lziprecover supports the following options: *Note Argument syntax:

 '--strip=[MEMBER_LIST][:damaged][:empty][:tdata]'
  Copy one or more regular multimember files to standard output (or to a
- file if the option '--output' is used), stripping the members listed,
- the damaged members (if any), the empty members (if any), or the
- trailing data (if any) from each file. If all members in a file are
- selected to be stripped, the trailing data (if any) are also stripped
- even if 'tdata' is not specified. If more than one file is given, the
- files are concatenated. In this case the trailing data are also
- stripped from all but the last file even if 'tdata' is not specified.
- If a file does not exist, can't be opened, or is not regular,
- lziprecover continues processing the rest of the files. If a file
- fails to copy, lziprecover exits immediately without processing the
- rest of the files. See '--dump' above for a description of the
- argument.
-
-'--ignore-empty'
- When decompressing, testing, or listing, ignore empty members in
- multimember files. By default lziprecover exits with error status 2 if
- any empty member is found in a multimember file.
-
-'--ignore-nonzero'
- When decompressing or testing, ignore a nonzero first byte in the LZMA
- stream. By default lziprecover exits with error status 2 if the first
- LZMA byte is nonzero in any member of the input files. Use
- 'lziprecover --nonzero-repair' to repair any such nonzero bytes.
+ file if the option '-o' is used), stripping the members listed, the
+ damaged members (if any), the empty members (if any), or the trailing
+ data (if any) from each file. If all members in a file are selected to
+ be stripped, the trailing data (if any) are also stripped even if
+ 'tdata' is not specified. If more than one file is given, the files are
+ concatenated. In this case the trailing data are also stripped from
+ all but the last file even if 'tdata' is not specified. If a file does
+ not exist, can't be opened, or is not regular, lziprecover continues
+ processing the rest of the files. If a file fails to copy, lziprecover
+ exits immediately without processing the rest of the files. See
+ '--dump' above for a description of the argument.

 '--loose-trailing'
  When decompressing, testing, or listing, allow trailing data whose
  first bytes are so similar to the magic bytes of a lzip header that
  they can be confused with a corrupt header. Use this option if a file
- triggers a "corrupt header" error and the cause is not indeed a
+ triggers a 'corrupt header' error and the cause is not indeed a
  corrupt header.

 '--nonzero-repair'
@@ -625,14 +619,15 @@ lziprecover also supports the following debug options (for experts):
  Load the compressed FILE into memory, set the byte at POSITION to
  VALUE, and decompress the modified compressed data to standard output.
  If the damaged member can be decompressed to the end (just fails with
- a CRC mismatch), the members following it are also decompressed.
+ a CRC mismatch), the members following it are also decompressed. *Note
+ --set-byte::, for a description of VALUE.

 '-X[POSITION,VALUE]'
 '--show-packets[=POSITION,VALUE]'
  Load the compressed FILE into memory, optionally set the byte at
  POSITION to VALUE, decompress the modified compressed data (discarding
  the output), and print to standard output descriptions of the LZMA
- packets being decoded.
+ packets being decoded. *Note --set-byte::, for a description of VALUE.

 '-Y RANGE'
 '--debug-delay=RANGE'
@@ -649,6 +644,7 @@ lziprecover also supports the following debug options (for experts):
 '--debug-byte-repair=POSITION,VALUE'
  Load the compressed FILE into memory, set the byte at POSITION to
  VALUE, and then try to repair the byte error. *Note --byte-repair::.
+ *Note --set-byte::, for a description of VALUE.

 '--gf16'
  Forces the use of GF(2^16) when creating FEC blocks even if the number
@@ -681,9 +677,57 @@ corrupt or invalid input file, 3 for an internal consistency error (e.g.,
 bug) which caused lziprecover to panic.


-File: lziprecover.info, Node: File format, Next: Data safety, Prev: Invoking lziprecover, Up: Top
+File: lziprecover.info, Node: Argument syntax, Next: File format, Prev: Invoking lziprecover, Up: Top
+
+3 Syntax of command-line arguments
+**********************************
+
+POSIX recommends these conventions for command-line arguments.
+
+ * A command-line argument is an option if it begins with a hyphen ('-').
+
+ * Option names are single alphanumeric characters.
+
+ * Certain options require an argument.
+
+ * An option and its argument may or may not appear as separate tokens.
+ (In other words, the whitespace separating them is optional, unless the
+ argument is the empty string). Thus, '-o foo' and '-ofoo' are
+ equivalent.
+
+ * One or more options without arguments, followed by at most one option
+ that takes an argument, may follow a hyphen in a single token. Thus,
+ '-abc' is equivalent to '-a -b -c'.
+
+ * Options typically precede other non-option arguments.
+
+ * The argument '--' terminates all options; any following arguments are
+ treated as non-option arguments, even if they begin with a hyphen.
+
+ * A token consisting of a single hyphen character is interpreted as an
+ ordinary non-option argument. By convention, it is used to specify
+ standard input, standard output, or a file named '-'.
+
+GNU adds "long options" to these conventions:
+
+ * A long option consists of two hyphens ('--') followed by a name made
+ of alphanumeric characters and hyphens. Option names are typically one
+ to three words long, with hyphens to separate words. Abbreviations can
+ be used for the long option names as long as the abbreviations are
+ unique.
+
+ * A long option and its argument may or may not appear as separate
+ tokens. In the latter case they must be separated by an equal sign '='.
+ Thus, '--foo bar' and '--foo=bar' are equivalent.
+
+The syntax of options with an optional argument is
+'-<short_option><argument>' (without whitespace), or
+'--<long_option>=<argument>'.
+
+
+File: lziprecover.info, Node: File format, Next: Data safety, Prev: Argument syntax, Up: Top

-3 File format
+4 File format
 *************

 Perfection is reached, not when there is no longer anything to add, but
@@ -737,7 +781,7 @@ not allowed in multimember files.
  Valid values for dictionary size range from 4 KiB to 512 MiB.

 'LZMA stream'
- The LZMA stream, finished by an "End Of Stream" marker. Uses default
+ The LZMA stream, terminated by an 'End Of Stream' marker. Uses default
  values for encoder properties. *Note Stream format: (lzip)Stream
  format, for a complete description.

@@ -757,7 +801,7 @@ not allowed in multimember files.

 File: lziprecover.info, Node: Data safety, Next: Fec files, Prev: File format, Up: Top

-4 Protecting data from accidental loss
+5 Protecting data from accidental loss
 **************************************

 It is a fact of life that sometimes data becomes corrupt. Software has
@@ -803,7 +847,7 @@ with gzip and bzip2 with respect to data safety:

 File: lziprecover.info, Node: Merging with a backup, Next: Reproducing a mailbox, Up: Data safety

-4.1 Recovering a file using a damaged backup
+5.1 Recovering a file using a damaged backup
 ============================================

 Let's suppose that you made a compressed backup of your valuable scientific
@@ -830,7 +874,7 @@ possible to recover a file with thousands of errors.

 File: lziprecover.info, Node: Reproducing a mailbox, Prev: Merging with a backup, Up: Data safety

-4.2 Recovering new messages using an old backup
+5.2 Recovering new messages using an old backup
 ===============================================

 Let's suppose that you make periodic backups of your email messages stored
@@ -876,15 +920,14 @@ identical backups (*note performance-of-merge::).

 File: lziprecover.info, Node: Fec files, Next: Repairing one byte, Prev: Data safety, Up: Top

-5 Forward Error Correction
+6 Forward Error Correction
 **************************

-"Forward Error Correction" (FEC) is any way of protecting data from
-corruption by creating redundant data that can be used later to repair
-errors in the protected data. Lziprecover uses a Hilbert-based Reed-Solomon
-code to create one fec file (with extension '.fec') for each file that
-needs to be protected. The fec files created by lziprecover are
-reproducible.
+Forward Error Correction (FEC) is any way of protecting data from corruption
+by creating redundant data that can be used later to repair errors in the
+protected data. Lziprecover uses a Hilbert-based Reed-Solomon code to create
+one fec file (with extension '.fec') for each file that needs to be
+protected. The fec files created by lziprecover are reproducible.

 Reed-Solomon is the most space-efficient Error Correcting Code (ECC) for
 data stored in block devices. It creates redundant FEC blocks in such a way
@@ -892,8 +935,7 @@ that X FEC blocks allow the recuperation of any combination of up to X lost
 data blocks. All the blocks (data and FEC) are of the same size, which in
 fec files must be a multiple of 512 bytes. Reed-Solomon is not optimum for
 corruption affecting random single bits in a file because each corrupt bit
-invalidates the whole block containing it. But in block devices, scattered
-bit flips should not happen.
+invalidates the whole block containing it.

 Usually, a corrupt file does not provide an indication of where the
 corruption is located. Therefore, each fec file stores one or two arrays of
@@ -921,7 +963,7 @@ must be intact to provide 'prodata_size', 'prodata_md5', and 'gf16'.

 File: lziprecover.info, Node: How Reed-Solomon works, Next: Implementation details, Up: Fec files

-5.1 How Reed-Solomon works
+6.1 How Reed-Solomon works
 ==========================

 To illustrate how Reed-Solomon works on the BEC, we will use an example with
@@ -944,8 +986,8 @@ p, q, and r can be computed from the values of x, y, and z:

 Now, if the values of x and y are lost because of data corruption, they
 can be recomputed by using any two of the three equations above. For
-example, if we replace the known values of z, p, q, and r in equations (1)
-and (2) we get:
+example, if we replace the known values of z, p, and q in equations (1) and
+(2) we get:

  x + y + 3 = 6 (1b)
  x + 2y + 9 = 14 (2b)
@@ -982,7 +1024,7 @@ obtain the values of x and y (D = A^-1 * F):

 File: lziprecover.info, Node: Implementation details, Next: Creating fec files, Prev: How Reed-Solomon works, Up: Fec files

-5.2 How lziprecover implements Reed-Solomon
+6.2 How lziprecover implements Reed-Solomon
 ===========================================

 Lziprecover's implementation of Reed-Solomon can manage up to 128 data
@@ -1011,17 +1053,17 @@ blocks.
 Lziprecover implements GF(2^8) with polynomial 0x11D and GF(2^16) with
 polynomial 0x1100B.

- A Hilbert matrix is defined as 'A[i][j] = 1 / (i + j + 1)' for i and j
->= 0. But as in a Galois Field addition is exclusive or, applying the
-Hilbert definition produces a singular (non invertible) matrix. To avoid
-this problem, lziprecover uses a Hilbert matrix starting at row
-'gf_size / 2'. I.e., 'A[i][j] = 1 / (i + gf_size / 2 + j)' for
-'0 <= i,j < gf_size / 2'. (gf_size is the size of the Galois Field).
+ A Hilbert matrix is defined as A[i][j] = 1 / (i + j + 1) for i,j >= 0.
+But, as in a Galois Field the addition is the exclusive or operation,
+applying the Hilbert definition produces a singular (non invertible)
+matrix. To avoid this problem, lziprecover uses a Hilbert matrix starting
+at row r0 = gf_size / 2. I.e., A[i][j] = 1 / (i + j + r0) for
+0 <= i,j < r0. ('gf_size' is the size of the Galois Field).


 File: lziprecover.info, Node: Creating fec files, Next: Testing with fec files, Prev: Implementation details, Up: Fec files

-5.3 How to create fec files
+6.3 How to create fec files
 ===========================

 Example 1: Create the fec file 'archive.tar.lz.fec' and store it in the
@@ -1039,10 +1081,15 @@ Example 3: Create recursively one fec file for each file in the directory

  lziprecover -v -r -Fc -o fec/ datadir

+Example 4: Create fec files for a collection of photos stored in directory
+'photos' and store them in the directory 'photos-fec'.
+
+ lziprecover -v -Fc -o photos-fec/ photos/*
+

 File: lziprecover.info, Node: Testing with fec files, Next: Repairing with fec files, Prev: Creating fec files, Up: Fec files

-5.4 How to test files using fec files
+6.4 How to test files using fec files
 =====================================

 Example 1: Test the integrity of 'archive.tar.lz' using the fec file
@@ -1061,10 +1108,15 @@ directory 'fec'.

  lziprecover -v -r -Ft --fec-file=fec/ datadir

+Example 4: Test the integrity of a collection of photos stored in directory
+'photos' using fec files from directory 'photos-fec'.
+
+ lziprecover -v -Ft --fec-file=photos-fec/ photos/*
+

 File: lziprecover.info, Node: Repairing with fec files, Next: Fec file format, Prev: Testing with fec files, Up: Fec files

-5.5 How to repair files using fec files
+6.5 How to repair files using fec files
 =======================================

 Example 1: Repair the file 'archive.tar.lz' using the fec file
@@ -1084,10 +1136,22 @@ directory 'fec'.

  lziprecover -v -r -Fr --fec-file=fec/ datadir

+Example 4: Recover a collection of photos from a damaged external drive
+('/dev/sdc1'). The photos are in directory 'photos', and the fec files are
+in directory 'photos-fec'.
+
+ ddrescue -b4096 -r10 /dev/sdc1 hdimage mapfile
+ mount -o loop,ro hdimage /mnt/hdimage
+ cp -a /mnt/hdimage/photos photos
+ cp -a /mnt/hdimage/photos-fec photos-fec
+ umount /mnt/hdimage
+ lziprecover -v -Fr --fec-file=photos-fec/ photos/*
+ (Check and rename repaired files. They are named 'photos/*_fixed')
+

 File: lziprecover.info, Node: Fec file format, Prev: Repairing with fec files, Up: Fec files

-5.6 Fec file format
+6.6 Fec file format
 ===================

 A fec file consists of one chksum packet, one or more fec packets, and one
@@ -1127,7 +1191,7 @@ achieved by a careful design, without adding any padding bytes.
 The fec file format has an overhead of 8 bytes per protected data block,
 plus 16 bytes per FEC block, plus 80 bytes.

-5.6.1 Chksum packet
+6.6.1 Chksum packet
 -------------------

 A chksum packet contains one CRC for each of the N data blocks in the
@@ -1179,7 +1243,7 @@ payload_crc 36 + 4N 4
  present) contains an array of CRC32-Cs.

  For the expected thousands of bit flips caused by a zeroed sector, a
- "symmetric" CRC like CRC32 is probably better than CRC32-C, which
+ symmetric CRC like CRC32 is probably better than CRC32-C, which
  detects all the errors with an odd number of bit flips at the expense
  of a larger number of undetected errors with an even number of bit
  flips.
@@ -1187,7 +1251,7 @@ payload_crc 36 + 4N 4
 'payload_crc'
  CRC32 of the crc_array.

-5.6.2 Fec packet
+6.6.2 Fec packet
 ----------------

 A fec packet contains one FEC block and is structured as shown in the
@@ -1224,7 +1288,7 @@ payload_crc 12 + fbs 4

 File: lziprecover.info, Node: Repairing one byte, Next: Merging files, Prev: Fec files, Up: Top

-6 Repairing one byte
+7 Repairing one byte
 ********************

 Lziprecover can repair perfectly most files with small errors (up to one
@@ -1238,11 +1302,11 @@ most common forms of data corruption.
 is limited to 2 GiB on 32-bit systems.

 The error may be located anywhere in the file except in the first 5
-bytes of each member header or in the 'Member size' field of the trailer
-(last 8 bytes of each member). If the error is in the header it can be
-easily repaired with a text editor like GNU Moe (*note File format::). If
-the error is in the member size, it is enough to ignore the message about
-'bad member size' when decompressing.
+bytes of each member header (magic and version) or in the 'Member size'
+field of the trailer (last 8 bytes of each member). If the error is in the
+header it can be easily repaired with a text editor like GNU Moe (*note
+File format::). If the error is in the member size, it is enough to ignore
+the message about 'bad member size' when decompressing.

 Bit flip happens when one bit in the file is changed from 0 to 1 or vice
 versa. It may be caused by bad RAM or even by natural radiation. I have
@@ -1252,7 +1316,7 @@ seen a case of bit flip in a file stored on an USB flash drive.
 transmission errors or I/O errors just affect one byte, or even one bit, of
 the file. Also, unlike magnetic media, where errors usually affect a whole
 sector, solid-state storage devices tend to produce single-byte errors,
-making of lzip the perfect format for data stored on such devices.
+which lziprecover can repair.

 Repairing a file can take some time. Small files or files with the error
 located near the beginning can be repaired in a few seconds. But repairing
@@ -1266,7 +1330,7 @@ repairs more efficiently the worst errors.

 File: lziprecover.info, Node: Merging files, Next: Reproducing one sector, Prev: Repairing one byte, Up: Top

-7 Merging files
+8 Merging files
 ***************

 If you have several copies of a file but all of them are too damaged to
@@ -1320,10 +1384,8 @@ identical to the original, in just 5 seconds:
 than the number of corrupt bytes (3104) because contiguous corrupt bytes
 are counted as a single multibyte error.

-
 Example 1: Recover a compressed backup from two copies on CD-ROM with
-error-checked merging of copies. *Note GNU ddrescue manual: (ddrescue)Top,
-for details about ddrescue.
+error-checked merging of copies.

  ddrescue -d -r1 -b2048 /dev/cdrom cdimage1 mapfile1
  mount -t iso9660 -o loop,ro cdimage1 /mnt/cdimage
@@ -1339,7 +1401,6 @@ for details about ddrescue.
  lziprecover -tv backup.tar.lz
  backup.tar.lz: ok

-
 Example 2: Recover the first volume of those created with the command
 'lzip -b 32MiB -S 650MB big_db' from two copies, 'big_db1_00001.lz' and
 'big_db2_00001.lz', with member 07 damaged in the first copy, member 18
@@ -1354,7 +1415,7 @@ correct file produced is saved in 'big_db_00001.lz'.

 File: lziprecover.info, Node: Reproducing one sector, Next: Tarlz, Prev: Merging files, Up: Top

-8 Reproducing one sector
+9 Reproducing one sector
 ************************

 Lziprecover can recover a zeroed sector in a lzip file by concatenating the
@@ -1430,7 +1491,7 @@ header, and that the archive can be reproduced.
 The tarlz format has minimum overhead. It uses basic ustar headers, and only
 adds extended pax headers when they are required.

-8.1 Performance of '--reproduce'
+9.1 Performance of '--reproduce'
 ================================

 Reproduce mode is especially useful when recovering a corrupt backup (or a
@@ -1483,7 +1544,6 @@ for a different version of the software.
  Member reproduced successfully.
  Copy of input file reproduced successfully.

-
 Example 2: Recover a damaged backup with a zeroed sector of 4096 bytes at
 file position 1019904, using as reference a previous backup. The damaged
 backup comes from a damaged partition copied with ddrescue.
@@ -1505,7 +1565,6 @@ backup comes from a damaged partition copied with ddrescue.
  Member reproduced successfully.
  Copy of input file reproduced successfully.

-
 Example 3: Recover a damaged backup with a zeroed sector of 4096 bytes at
 file position 1019904, using as reference a file from the filesystem. (If
 the zeroed sector encodes (part of) a tar header, the tarball can't be
@@ -1541,8 +1600,8 @@ has been renamed.

 File: lziprecover.info, Node: Tarlz, Next: File names, Prev: Reproducing one sector, Up: Top

-9 Options supporting the tar.lz format
-**************************************
+10 Options supporting the tar.lz format
+***************************************

 Tarlz is a massively parallel (multi-threaded) combined implementation of
 the tar archiver and the lzip compressor.
@@ -1562,8 +1621,8 @@ alignment between tar members and lzip members minimizes the amount of data
 lost in case of corruption. In this chapter we'll explain the ways in which
 lziprecover can recover and process multimember tar.lz archives.

-9.1 Recovering damaged multimember tar.lz archives
-==================================================
+10.1 Recovering damaged multimember tar.lz archives
+===================================================

 If you have several copies of the damaged archive, try merging them first
 because merging has a high probability of success. *Note Merging files::. If
@@ -1604,8 +1663,8 @@ possible from each damaged member in 'bad_members.tar.lz':
  cd tmp
  tarlz --keep-damaged -xvf ../bad_members.tar.lz

-9.2 Processing multimember tar.lz archives
-==========================================
+10.2 Processing multimember tar.lz archives
+===========================================

 Lziprecover is able to copy a list of members from a file to another. For
 example the command
@@ -1618,7 +1677,7 @@ end-of-file blocks.

 File: lziprecover.info, Node: File names, Next: Trailing data, Prev: Tarlz, Up: Top

-10 Names of the files produced by lziprecover
+11 Names of the files produced by lziprecover
 *********************************************

 The name of the fixed file produced by '--byte-repair' and '--merge' is
@@ -1634,7 +1693,7 @@ string '_fixed' is inserted before the extension.

 File: lziprecover.info, Node: Trailing data, Next: Examples, Prev: File names, Up: Top

-11 Extra data appended to the file
+12 Extra data appended to the file
 **********************************

 Sometimes extra data are found appended to a lzip file after the last
@@ -1644,7 +1703,7 @@ member. Such trailing data may be:
  example when writing to a tape. It is safe to append any amount of
  padding zero bytes to a lzip file.

- * Useful data added by the user; an "End Of File" string (to check that
+ * Useful data added by the user; an 'End Of File' string (to check that
  the file has not been truncated), a cryptographically secure hash, a
  description of file contents, etc. It is safe to append any amount of
  text to a lzip file as long as none of the first four bytes of the
@@ -1691,7 +1750,6 @@ Example 1: Add a comment or description to a compressed file.
  # This command removes the comment from file.lz
  lziprecover --remove=tdata file.lz

-
 Example 2: Add and check a cryptographically secure hash. (This may be
 convenient, but a separate copy of the hash must be kept in a safe place to
 guarantee that both file and hash have not been maliciously replaced).
@@ -1703,7 +1761,7 @@ guarantee that both file and hash have not been maliciously replaced).

 File: lziprecover.info, Node: Examples, Next: Unzcrash, Prev: Trailing data, Up: Top

-12 A small tutorial with examples
+13 A small tutorial with examples
 *********************************

 Example 1: Extract all the files from archive 'foo.tar.lz'.
@@ -1763,7 +1821,7 @@ integrity of the resulting files.

 File: lziprecover.info, Node: Unzcrash, Next: Problems, Prev: Examples, Up: Top

-13 Testing the robustness of decompressors
+14 Testing the robustness of decompressors
 ******************************************

 *Note --unzcrash::, for a faster way of testing the robustness of lzip.
@@ -1849,10 +1907,11 @@ unzcrash supports the following options:

 '-B[SIZE][,VALUE]'
 '--block[=SIZE][,VALUE]'
- Test block errors of given SIZE, simulating a whole sector I/O error.
- SIZE defaults to 512 bytes. VALUE defaults to 0. By default, only
- contiguous, non-overlapping blocks are tested, but this may be changed
- with the option '--delta'.
+ Test block errors of given SIZE, simulating a whole sector I/O error
+ by setting all the bytes in the block to VALUE before attempting
+ decompression. SIZE defaults to 512 bytes. VALUE defaults to 0. By
+ default, only contiguous, non-overlapping blocks are tested, but this
+ may be changed with the option '--delta'.

 '-d N'
 '--delta=N'
@@ -1918,7 +1977,7 @@ bug) which caused unzcrash to panic.

 File: lziprecover.info, Node: Problems, Next: Concept index, Prev: Unzcrash, Up: Top

-14 Reporting bugs
+15 Reporting bugs
 *****************

 There are probably bugs in lziprecover. There are certainly errors and
@@ -1939,6 +1998,7 @@ Concept index

 * Menu:

+* argument syntax: Argument syntax. (line 6)
 * bugs: Problems. (line 6)
 * chksum packet: Fec file format. (line 46)
 * data safety: Data safety. (line 6)
@@ -1973,40 +2033,43 @@ Concept index

 Tag Table:
 Node: Top226
-Node: Introduction1463
-Node: Invoking lziprecover6223
-Ref: --trailing-error7167
-Ref: --byte-repair8261
-Ref: range-format10138
-Ref: --reproduce10473
-Ref: --unzcrash28457
-Node: File format32896
-Node: Data safety35653
-Node: Merging with a backup37899
-Node: Reproducing a mailbox39162
-Node: Fec files41616
-Node: How Reed-Solomon works43945
-Node: Implementation details46119
-Node: Creating fec files48188
-Node: Testing with fec files48852
-Node: Repairing with fec files49619
-Node: Fec file format50437
-Ref: fbs53308
-Node: Repairing one byte55099
-Node: Merging files57208
-Ref: performance-of-merge58387
-Ref: ddrescue-example59996
-Node: Reproducing one sector61283
-Ref: performance-of-reproduce65220
-Ref: ddrescue-example267894
-Node: Tarlz70314
-Node: File names73981
-Node: Trailing data74714
-Node: Examples78028
-Ref: concat-example78600
-Node: Unzcrash79999
-Node: Problems86385
-Node: Concept index86937
+Node: Introduction1535
+Node: Invoking lziprecover6387
+Ref: --trailing-error7308
+Ref: --byte-repair8402
+Ref: range-format10483
+Ref: --reproduce10728
+Ref: --unzcrash28455
+Node: Argument syntax33048
+Node: File format35005
+Node: Data safety37759
+Node: Merging with a backup40005
+Node: Reproducing a mailbox41268
+Node: Fec files43722
+Node: How Reed-Solomon works45988
+Node: Implementation details48159
+Node: Creating fec files50224
+Node: Testing with fec files51068
+Node: Repairing with fec files52023
+Ref: ddrescue-example52841
+Node: Fec file format53351
+Ref: fbs56222
+Node: Repairing one byte58011
+Node: Merging files60103
+Ref: performance-of-merge61282
+Ref: ddrescue-example262890
+Node: Reproducing one sector64106
+Ref: performance-of-reproduce68043
+Ref: ddrescue-example370716
+Node: Tarlz73135
+Node: File names76808
+Node: Trailing data77541
+Node: Examples80854
+Ref: concat-example81426
+Node: Unzcrash82825
+Ref: --set-byte87437
+Node: Problems89295
+Node: Concept
index89847 End Tag Table diff --git a/doc/lziprecover.texi b/doc/lziprecover.texi index 41f3641..98eeb8c 100644 --- a/doc/lziprecover.texi +++ b/doc/lziprecover.texi @@ -6,8 +6,8 @@ @finalout @c %**end of header -@set UPDATED 1 October 2024 -@set VERSION 1.25-pre1 +@set UPDATED 18 November 2024 +@set VERSION 1.25-rc1 @dircategory Compression @direntry @@ -38,6 +38,7 @@ This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}). @menu * Introduction:: Purpose and features of lziprecover * Invoking lziprecover:: Command-line interface +* Argument syntax:: By convention, options start with a hyphen * File format:: Detailed format of the compressed file * Data safety:: Protecting data from accidental loss * Fec files:: Forward Error Correction @@ -139,8 +140,16 @@ pdlzip. If the cause of file corruption is a damaged medium, the combination @w{GNU ddrescue + lziprecover} is the recommended option for recovering data -from damaged lzip files. @xref{ddrescue-example}, and -@ref{ddrescue-example2}, for examples. +from damaged files. @xref{ddrescue-example}, @ref{ddrescue-example2}, and +@ref{ddrescue-example3}, for examples. +@ifnothtml +@xref{Top,GNU ddrescue manual,,ddrescue}, +@end ifnothtml +@ifhtml +See the +@uref{http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html,,ddrescue manual} +@end ifhtml +for details about ddrescue. If a file is too damaged for lziprecover to repair it, all the recoverable data in all members of the file can be extracted with the following command @@ -186,11 +195,7 @@ standard output. Remember to prepend @file{./} to any file name beginning with a hyphen, or use @samp{--}. @noindent -lziprecover supports the following -@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}: -@ifnothtml -@xref{Argument syntax,,,arg_parser}. -@end ifnothtml +lziprecover supports the following options: @xref{Argument syntax}. 
@table @code @item -h @@ -211,12 +216,12 @@ garbage that can be safely ignored. @xref{concat-example}. @item -A @itemx --alone-to-lz -Convert lzma-alone files to lzip format without recompressing, just -adding a lzip header and trailer. The conversion minimizes the -dictionary size of the resulting file (and therefore the amount of -memory required to decompress it). Only streamed files with default LZMA -properties can be converted; non-streamed lzma-alone files lack the "End -Of Stream" marker required in lzip files. +Convert lzma-alone files to lzip format without recompressing, just adding a +lzip header and trailer. The conversion minimizes the dictionary size of the +resulting file (and therefore the amount of memory required to decompress +it). Only streamed files with default LZMA properties can be converted; +non-streamed lzma-alone files lack the 'End Of Stream' marker required in +lzip files. The name of the converted lzip file is derived from that of the original lzma-alone file as follows: @@ -258,24 +263,27 @@ already exists and @option{--force} has not been specified, lziprecover continues decompressing the rest of the files and exits with error status 1. If a file fails to decompress, or is a terminal, lziprecover exits immediately with error status 2 without decompressing the rest of the files. -A terminal is considered an uncompressed file, and therefore invalid. +A terminal is considered an uncompressed file, and therefore invalid. A +multimember file with one or more empty members is accepted if redirected to +standard input or if '-i' is given. @item -D @var{range} @itemx --range-decompress=@var{range} Decompress only a range of bytes starting at decompressed byte position @var{begin} and up to byte position @w{@var{end} - 1}. Byte positions start -at 0. This option provides random access to the data in multimember files; -it only decompresses the members containing the desired data. 
In order to -guarantee the correctness of the data produced, all members containing any -part of the desired data are decompressed and their integrity is checked. +at 0. The bytes produced are sent to standard output unless the option +@option{-o} is used. This option provides random access to the data in +multimember files; it only decompresses the members containing the desired +data. In order to guarantee the correctness of the data produced, all +members containing any part of the desired data are decompressed and their +integrity is checked. @anchor{range-format} Four formats of @var{range} are recognized, @samp{@var{begin}}, @samp{@var{begin}-@var{end}}, @samp{@var{begin},@var{size}}, and @samp{,@var{size}}. If only @var{begin} is specified, @var{end} is taken as the end of the file. If only @var{size} is specified, @var{begin} is taken -as the beginning of the file. The bytes produced are sent to standard output -unless the option @option{--output} is used. +as the beginning of the file. @anchor{--reproduce} @item -e @@ -371,7 +379,8 @@ last) may be wrong. @item -k @itemx --keep -Keep (don't delete) input files during decompression. +Keep (don't delete) input files during decompression or conversion from +lzma-alone. @item -l @itemx --list @@ -381,9 +390,11 @@ even for multimember files. If more than one file is given, a final line containing the cumulative sizes is printed. With @option{-v}, the dictionary size, the number of members in the file, and the amount of trailing data (if any) are also printed. With @option{-vv}, the positions and sizes of each -member in multimember files are also printed. With @option{-i}, format errors -are ignored, and with @option{-ivv}, gaps between members are shown. The -member numbers shown coincide with the file numbers produced by @option{--split}. +member in multimember files are also printed. A multimember file with one or +more empty members is accepted if redirected to standard input or if '-i' is +given. 
With @option{-i}, format errors are ignored, and with @option{-ivv}, +gaps between members are shown. The member numbers start at 1 and coincide +with the file numbers produced by @option{--split}. If any file is damaged, does not exist, can't be opened, or is not regular, the final exit status is @w{> 0}. @option{-lq} can be used to check quickly @@ -402,8 +413,8 @@ the merge mode. @item -n @var{n} @itemx --threads=@var{n} Set the maximum number of worker threads for @option{--fec=create}, -overriding the system's default. Valid values range from 1 to "as many as -your system can support". If this option is not used, lziprecover tries to +overriding the system's default. Valid values range from 1 to as many as +your system can support. If this option is not used, lziprecover tries to detect the number of processors in the system and use it as default value. @w{@samp{lziprecover --help}} shows the system's default value. @@ -411,7 +422,7 @@ detect the number of processors in the system and use it as default value. @itemx --output=@var{file}[/] If repairing, place the repaired output into @var{file} instead of into @var{file}_fixed.lz. If splitting, the names of the files produced are in -the form @file{rec01@var{file}}, @file{rec02@var{file}}, etc. +the form @file{rec1@var{file}}, @file{rec2@var{file}}, etc. If creating FEC data and @option{-c} has not been also specified, write the FEC data to @var{file}. If @var{file} ends with a slash, it is interpreted @@ -458,8 +469,8 @@ members with corrupt headers or trailers. If other lziprecover functions fail to work on a multimember @var{file} because of damage in headers or trailers, try to split @var{file} and then work on each member individually. 
-The names of the files produced are in the form @file{rec01@var{file}}, -@file{rec02@var{file}}, etc, and are designed so that the use of wildcards +The names of the files produced are in the form @file{rec1@var{file}}, +@file{rec2@var{file}}, etc, and are designed so that the use of wildcards in subsequent processing, for example, @w{@samp{lziprecover -cd rec*@var{file} > recovered_data}}, processes the files in the correct order. The number of digits used in the names varies @@ -473,7 +484,8 @@ together with @option{-v} to see information about the files. If a file fails the test, does not exist, can't be opened, or is a terminal, lziprecover continues testing the rest of the files. A final diagnostic is shown at verbosity level 1 or higher if any file fails the test when testing multiple -files. +files. A multimember file with one or more empty members is accepted if +redirected to standard input or if '-i' is given. @item -v @itemx --verbose @@ -489,8 +501,8 @@ operations, and extra information (for example, the failed areas). @item --dump=[@var{member_list}][:damaged][:empty][:tdata] Dump the members listed, the damaged members (if any), the empty members (if any), or the trailing data (if any) of one or more regular multimember files -to standard output, or to a file if the option @option{--output} is used. If -more than one file is given, the elements dumped from all the files are +to standard output, or to a file if the option @option{-o} is used. If more +than one file is given, the elements dumped from all the files are concatenated. If a file does not exist, can't be opened, or is not regular, lziprecover continues processing the rest of the files. If the dump fails in one file, lziprecover exits immediately without processing the rest of the @@ -547,7 +559,7 @@ attempting the removal of trailing data. 
@item --strip=[@var{member_list}][:damaged][:empty][:tdata] Copy one or more regular multimember files to standard output (or to a file -if the option @option{--output} is used), stripping the members listed, the +if the option @option{-o} is used), stripping the members listed, the damaged members (if any), the empty members (if any), or the trailing data (if any) from each file. If all members in a file are selected to be stripped, the trailing data (if any) are also stripped even if @samp{tdata} @@ -559,22 +571,11 @@ the rest of the files. If a file fails to copy, lziprecover exits immediately without processing the rest of the files. See @option{--dump} above for a description of the argument. -@item --ignore-empty -When decompressing, testing, or listing, ignore empty members in multimember -files. By default lziprecover exits with error status 2 if any empty member -is found in a multimember file. - -@item --ignore-nonzero -When decompressing or testing, ignore a nonzero first byte in the LZMA -stream. By default lziprecover exits with error status 2 if the first LZMA -byte is nonzero in any member of the input files. -Use @w{@samp{lziprecover --nonzero-repair}} to repair any such nonzero bytes. - @item --loose-trailing When decompressing, testing, or listing, allow trailing data whose first bytes are so similar to the magic bytes of a lzip header that they can be confused with a corrupt header. Use this option if a file triggers a -"corrupt header" error and the cause is not indeed a corrupt header. +'corrupt header' error and the cause is not indeed a corrupt header. @item --nonzero-repair Repair in place a nonzero first LZMA byte in the files specified. With @@ -666,13 +667,14 @@ Load the compressed @var{file} into memory, set the byte at @var{position} to @var{value}, and decompress the modified compressed data to standard output. 
If the damaged member can be decompressed to the end (just fails with a CRC mismatch), the members following it are also decompressed. +@xref{--set-byte}, for a description of @var{value}. @item -X[@var{position},@var{value}] @itemx --show-packets[=@var{position},@var{value}] Load the compressed @var{file} into memory, optionally set the byte at @var{position} to @var{value}, decompress the modified compressed data (discarding the output), and print to standard output descriptions of the -LZMA packets being decoded. +LZMA packets being decoded. @xref{--set-byte}, for a description of @var{value}. @item -Y @var{range} @itemx --debug-delay=@var{range} @@ -689,6 +691,7 @@ description of @var{range}. @itemx --debug-byte-repair=@var{position},@var{value} Load the compressed @var{file} into memory, set the byte at @var{position} to @var{value}, and then try to repair the byte error. @xref{--byte-repair}. +@xref{--set-byte}, for a description of @var{value}. @item --gf16 Forces the use of GF(2^16) when creating FEC blocks even if the number of @@ -723,6 +726,59 @@ indicate a corrupt or invalid input file, 3 for an internal consistency error (e.g., bug) which caused lziprecover to panic. +@node Argument syntax +@chapter Syntax of command-line arguments +@cindex argument syntax + +POSIX recommends these conventions for command-line arguments. + +@itemize @bullet +@item A command-line argument is an option if it begins with a hyphen +(@samp{-}). + +@item Option names are single alphanumeric characters. + +@item Certain options require an argument. + +@item An option and its argument may or may not appear as separate tokens. +(In other words, the whitespace separating them is optional, unless the +argument is the empty string). +Thus, @w{@option{-o foo}} and @option{-ofoo} are equivalent. + +@item One or more options without arguments, followed by at most one option +that takes an argument, may follow a hyphen in a single token. 
+Thus, @option{-abc} is equivalent to @w{@option{-a -b -c}}. + +@item Options typically precede other non-option arguments. + +@item The argument @samp{--} terminates all options; any following arguments +are treated as non-option arguments, even if they begin with a hyphen. + +@item A token consisting of a single hyphen character is interpreted as an +ordinary non-option argument. By convention, it is used to specify standard +input, standard output, or a file named @samp{-}. +@end itemize + +@noindent +GNU adds @dfn{long options} to these conventions: + +@itemize @bullet +@item A long option consists of two hyphens (@samp{--}) followed by a name +made of alphanumeric characters and hyphens. Option names are typically one +to three words long, with hyphens to separate words. Abbreviations can be +used for the long option names as long as the abbreviations are unique. + +@item A long option and its argument may or may not appear as separate +tokens. In the latter case they must be separated by an equal sign @samp{=}. +Thus, @w{@option{--foo bar}} and @option{--foo=bar} are equivalent. +@end itemize + +@noindent +The syntax of options with an optional argument is +@option{-<short_option><argument>} (without whitespace), or +@option{--<long_option>=<argument>}. + + @node File format @chapter File format @cindex file format @@ -785,7 +841,7 @@ Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@* Valid values for dictionary size range from 4 KiB to 512 MiB. @item LZMA stream -The LZMA stream, finished by an "End Of Stream" marker. Uses default values +The LZMA stream, terminated by an 'End Of Stream' marker. Uses default values for encoder properties. @ifnothtml @xref{Stream format,,,lzip}, @@ -932,12 +988,11 @@ identical backups (@pxref{performance-of-merge}). 
@chapter Forward Error Correction @cindex forward error correction -"Forward Error Correction" (FEC) is any way of protecting data from -corruption by creating redundant data that can be used later to repair -errors in the protected data. Lziprecover uses a Hilbert-based Reed-Solomon -code to create one fec file (with extension @file{.fec}) for each file that -needs to be protected. The fec files created by lziprecover are -reproducible. +Forward Error Correction (FEC) is any way of protecting data from corruption +by creating redundant data that can be used later to repair errors in the +protected data. Lziprecover uses a Hilbert-based Reed-Solomon code to create +one fec file (with extension @file{.fec}) for each file that needs to be +protected. The fec files created by lziprecover are reproducible. Reed-Solomon is the most space-efficient Error Correcting Code (ECC) for data stored in block devices. It creates redundant FEC blocks in such a way @@ -945,8 +1000,7 @@ that X FEC blocks allow the recuperation of any combination of up to X lost data blocks. All the blocks (data and FEC) are of the same size, which in fec files must be a multiple of 512 bytes. Reed-Solomon is not optimum for corruption affecting random single bits in a file because each corrupt bit -invalidates the whole block containing it. But in block devices, scattered -bit flips should not happen. +invalidates the whole block containing it. Usually, a corrupt file does not provide an indication of where the corruption is located. Therefore, each fec file stores one or two arrays of @@ -1000,8 +1054,7 @@ If we have that x = 1, y = 2, and z = 3, then p = 6, q = 14, and r = 13: Now, if the values of x and y are lost because of data corruption, they can be recomputed by using any two of the three equations above. 
For example, if -we replace the known values of z, p, q, and r in equations (1) and (2) we -get: +we replace the known values of z, p, and q in equations (1) and (2) we get: @example x + y + 3 = 6 (1b) @@ -1076,13 +1129,12 @@ missing data blocks. Lziprecover implements GF(2^8) with polynomial 0x11D and GF(2^16) with polynomial 0x1100B. -A Hilbert matrix is defined as @w{@samp{A[i][j] = 1 / (i + j + 1)}} for i -and j >= 0. But as in a Galois Field addition is exclusive or, applying the -Hilbert definition produces a singular (non invertible) matrix. To avoid -this problem, lziprecover uses a Hilbert matrix starting at row -@w{@samp{gf_size / 2}}. I.e., @w{@samp{A[i][j] = 1 / (i + gf_size / 2 + j)}} -for @w{@samp{0 <= i,j < gf_size / 2}}. (gf_size is the size of the Galois -Field). +A Hilbert matrix is defined as @w{A[i][j] = 1 / (i + j + 1)} for +@w{i,j >= 0}. But, as in a Galois Field the addition is the exclusive or +operation, applying the Hilbert definition produces a singular (non +invertible) matrix. To avoid this problem, lziprecover uses a Hilbert matrix +starting at row @w{r0 = gf_size / 2}. I.e., @w{A[i][j] = 1 / (i + j + r0)} +for @w{0 <= i,j < r0}. (@samp{gf_size} is the size of the Galois Field). @node Creating fec files @@ -1113,6 +1165,14 @@ Example 3: Create recursively one fec file for each file in the directory lziprecover -v -r -Fc -o fec/ datadir @end example +@noindent +Example 4: Create fec files for a collection of photos stored in directory +@file{photos} and store them in the directory @file{photos-fec}. + +@example +lziprecover -v -Fc -o photos-fec/ photos/* +@end example + @node Testing with fec files @section How to test files using fec files @@ -1143,6 +1203,14 @@ directory @file{fec}. lziprecover -v -r -Ft --fec-file=fec/ datadir @end example +@noindent +Example 4: Test the integrity of a collection of photos stored in directory +@file{photos} using fec files from directory @file{photos-fec}. 
+ +@example +lziprecover -v -Ft --fec-file=photos-fec/ photos/* +@end example + @node Repairing with fec files @section How to repair files using fec files @@ -1174,6 +1242,22 @@ directory @file{fec}. lziprecover -v -r -Fr --fec-file=fec/ datadir @end example +@anchor{ddrescue-example} +@noindent +Example 4: Recover a collection of photos from a damaged external drive +(@file{/dev/sdc1}). The photos are in directory @file{photos}, and the fec +files are in directory @file{photos-fec}. + +@example +ddrescue -b4096 -r10 /dev/sdc1 hdimage mapfile +mount -o loop,ro hdimage /mnt/hdimage +cp -a /mnt/hdimage/photos photos +cp -a /mnt/hdimage/photos-fec photos-fec +umount /mnt/hdimage +lziprecover -v -Fr --fec-file=photos-fec/ photos/* + (Check and rename repaired files. They are named @file{photos/*_fixed}) +@end example + @node Fec file format @section Fec file format @@ -1274,9 +1358,9 @@ The first chksum packet contains an array of CRC32s, while the second chksum packet (if present) contains an array of CRC32-Cs. For the expected thousands of bit flips caused by a zeroed sector, a -"symmetric" CRC like CRC32 is probably better than CRC32-C, which detects -all the errors with an odd number of bit flips at the expense of a larger -number of undetected errors with an even number of bit flips. +symmetric CRC like CRC32 is probably better than CRC32-C, which detects all +the errors with an odd number of bit flips at the expense of a larger number +of undetected errors with an even number of bit flips. @item payload_crc CRC32 of the crc_array. @@ -1334,9 +1418,9 @@ The file is repaired in memory. Therefore, enough virtual memory @w{(RAM + swap)} to contain the largest damaged member is required. Member size is limited to @w{2 GiB} on 32-bit systems. -The error may be located anywhere in the file except in the first 5 -bytes of each member header or in the @samp{Member size} field of the -trailer (last 8 bytes of each member). 
If the error is in the header it +The error may be located anywhere in the file except in the first 5 bytes of +each member header (magic and version) or in the @samp{Member size} field of +the trailer (last 8 bytes of each member). If the error is in the header it can be easily repaired with a text editor like GNU Moe (@pxref{File format}). If the error is in the member size, it is enough to ignore the message about @samp{bad member size} when decompressing. @@ -1349,7 +1433,7 @@ One byte may seem small, but most file corruptions not produced by transmission errors or I/O errors just affect one byte, or even one bit, of the file. Also, unlike magnetic media, where errors usually affect a whole sector, solid-state storage devices tend to produce single-byte -errors, making of lzip the perfect format for data stored on such devices. +errors, which lziprecover can repair. Repairing a file can take some time. Small files or files with the error located near the beginning can be repaired in a few seconds. But @@ -1421,19 +1505,10 @@ Note that the number of errors reported by lziprecover (2552) is lower than the number of corrupt bytes (3104) because contiguous corrupt bytes are counted as a single multibyte error. -@sp 1 -@anchor{ddrescue-example} +@anchor{ddrescue-example2} @noindent Example 1: Recover a compressed backup from two copies on CD-ROM with error-checked merging of copies. -@ifnothtml -@xref{Top,GNU ddrescue manual,,ddrescue}, -@end ifnothtml -@ifhtml -See the -@uref{http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html,,ddrescue manual} -@end ifhtml -for details about ddrescue. @example ddrescue -d -r1 -b2048 /dev/cdrom cdimage1 mapfile1 @@ -1451,7 +1526,6 @@ lziprecover -tv backup.tar.lz backup.tar.lz: ok @end example -@sp 1 @noindent Example 2: Recover the first volume of those created with the command @w{@samp{lzip -b 32MiB -S 650MB big_db}} from two copies, @@ -1608,8 +1682,7 @@ Member reproduced successfully. 
Copy of input file reproduced successfully. @end example -@sp 1 -@anchor{ddrescue-example2} +@anchor{ddrescue-example3} @noindent Example 2: Recover a damaged backup with a zeroed sector of 4096 bytes at file position 1019904, using as reference a previous backup. The damaged @@ -1634,7 +1707,6 @@ Member reproduced successfully. Copy of input file reproduced successfully. @end example -@sp 1 @noindent Example 3: Recover a damaged backup with a zeroed sector of 4096 bytes at file position 1019904, using as reference a file from the filesystem. (If @@ -1790,7 +1862,7 @@ example when writing to a tape. It is safe to append any amount of padding zero bytes to a lzip file. @item -Useful data added by the user; an "End Of File" string (to check that the +Useful data added by the user; an 'End Of File' string (to check that the file has not been truncated), a cryptographically secure hash, a description of file contents, etc. It is safe to append any amount of text to a lzip file as long as none of the first four bytes of the text matches the @@ -1844,7 +1916,6 @@ lziprecover --strip=tdata file.lz > stripped_file.lz lziprecover --remove=tdata file.lz @end example -@sp 1 @noindent Example 2: Add and check a cryptographically secure hash. (This may be convenient, but a separate copy of the hash must be kept in a safe place @@ -2036,10 +2107,11 @@ The number of N-bit errors per byte (N = 1 to 8) is: @item -B[@var{size}][,@var{value}] @itemx --block[=@var{size}][,@var{value}] -Test block errors of given @var{size}, simulating a whole sector I/O error. -@var{size} defaults to 512 bytes. @var{value} defaults to 0. By default, -only contiguous, non-overlapping blocks are tested, but this may be changed -with the option @option{--delta}. +Test block errors of given @var{size}, simulating a whole sector I/O error +by setting all the bytes in the block to @var{value} before attempting +decompression. @var{size} defaults to 512 bytes. @var{value} defaults to 0. 
+By default, only contiguous, non-overlapping blocks are tested, but this may +be changed with the option @option{--delta}. @item -d @var{n} @itemx --delta=@var{n} @@ -2049,6 +2121,7 @@ non-overlapping blocks, or truncation sizes. Values of @var{n} smaller than the block size result in overlapping blocks. (Which is convenient for testing because there are usually too few non-overlapping blocks in a file). +@anchor{--set-byte} @item -e @var{position},@var{value} @itemx --set-byte=@var{position},@var{value} Set byte at @var{position} to @var{value} in the internal buffer after |