Diffstat (limited to 'doc/lzip.info')
-rw-r--r-- doc/lzip.info 202
1 files changed, 116 insertions, 86 deletions
diff --git a/doc/lzip.info b/doc/lzip.info
index 0210f9e..cac370c 100644
--- a/doc/lzip.info
+++ b/doc/lzip.info
@@ -11,7 +11,7 @@ File: lzip.info, Node: Top, Next: Introduction, Up: (dir)
Lzip Manual
***********
-This manual is for Lzip (version 1.18, 14 May 2016).
+This manual is for Lzip (version 1.19, 13 April 2017).
* Menu:
@@ -28,7 +28,7 @@ This manual is for Lzip (version 1.18, 14 May 2016).
* Concept index:: Index of concepts
- Copyright (C) 2008-2016 Antonio Diaz Diaz.
+ Copyright (C) 2008-2017 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to
copy, distribute and modify it.
@@ -40,9 +40,10 @@ File: lzip.info, Node: Introduction, Next: Invoking lzip, Prev: Top, Up: Top
**************
Lzip is a lossless data compressor with a user interface similar to the
-one of gzip or bzip2. Lzip is about as fast as gzip, compresses most
-files more than bzip2, and is better than both from a data recovery
-perspective.
+one of gzip or bzip2. Lzip can compress about as fast as gzip
+(lzip -0), or compress most files more than bzip2 (lzip -9).
+Decompression speed is intermediate between gzip and bzip2. Lzip is
+better than gzip and bzip2 from a data recovery perspective.
The lzip file format is designed for data sharing and long-term
archiving, taking into account both data integrity and decoder
@@ -56,11 +57,11 @@ availability:
(lziprecover)Data safety.
* The lzip format is as simple as possible (but not simpler). The
- lzip manual provides the code of a simple decompressor along with
- a detailed explanation of how it works, so that with the only help
- of the lzip manual it would be possible for a digital
- archaeologist to extract the data from a lzip file long after
- quantum computers eventually render LZMA obsolete.
+ lzip manual provides the source code of a simple decompressor
+ along with a detailed explanation of how it works, so that with
+ the only help of the lzip manual it would be possible for a
+ digital archaeologist to extract the data from a lzip file long
+ after quantum computers eventually render LZMA obsolete.
* Additionally the lzip reference implementation is copylefted, which
guarantees that it will remain free forever.
@@ -126,9 +127,9 @@ two or more compressed files. The result is the concatenation of the
corresponding uncompressed files. Integrity testing of concatenated
compressed files is also supported.
- Lzip can produce multimember files and safely recover, with
-lziprecover, the undamaged members in case of file damage. Lzip can
-also split the compressed output in volumes of a given size, even when
+ Lzip can produce multimember files, and lziprecover can safely
+recover the undamaged members in case of file damage. Lzip can also
+split the compressed output in volumes of a given size, even when
reading from standard input. This allows the direct creation of
multivolume compressed tar archives.
@@ -136,6 +137,10 @@ multivolume compressed tar archives.
automatically creating multimember output. The members so created are
large, about 2 PiB each.
+ LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may
+never have been compressed. Decompressed is used to refer to data which
+have undergone the process of decompression.
+

File: lzip.info, Node: Invoking lzip, Next: Quality assurance, Prev: Introduction, Up: Top
@@ -203,6 +208,21 @@ command line.
Keep (don't delete) input files during compression or
decompression.
+'-l'
+'--list'
+ Print the uncompressed size, compressed size and percentage saved
+ of the specified file(s). Trailing data are ignored. The values
+ produced are correct even for multimember files. If more than one
+ file is given, a final line containing the cumulative sizes is
+ printed. With '-v', the dictionary size, the number of members in
+ the file, and the amount of trailing data (if any) are also
+ printed. With '-vv', the positions and sizes of each member in
+ multimember files are also printed. '-lq' can be used to verify
+ quickly (without decompressing) the structural integrity of the
+ specified files. (Use '--test' to verify the data integrity).
+ '-alq' additionally verifies that none of the specified files
+ contain trailing data.
+
'-m BYTES'
'--match-length=BYTES'
Set the match length limit in bytes. After a match this long is
@@ -252,8 +272,9 @@ command line.
Check integrity of the specified file(s), but don't decompress
them. This really performs a trial decompression and throws away
the result. Use it together with '-v' to see information about
- the file(s). If a file fails the test, lzip continues checking the
- rest of the files.
+ the file(s). If a file fails the test, does not exist, can't be
+ opened, or is a terminal, lzip continues checking the rest of the
+ files.
'-v'
'--verbose'
@@ -263,7 +284,8 @@ command line.
When decompressing or testing, further -v's (up to 4) increase the
verbosity level, showing status, compression ratio, dictionary
size, trailer contents (CRC, data size, member size), and up to 6
- bytes of trailing data (if any).
+ bytes of trailing data (if any) both in hexadecimal and as a
+ string of printable ASCII characters.
'-0 .. -9'
Set the compression parameters (dictionary size and match length
@@ -714,10 +736,10 @@ You may first send the position of the most significant bit that is set
to 1, which you may find by making a bit scan from the left (from the
MSB). A position of 0 means that the number is 0 (no bit is set), 1
means the LSB is the first bit set (the number is 1), and 32 means the
-MSB is set (i.e., the number is >= 0x80000000). Lets call this bit
-position a "slot". Then, if slot is > 1, you send the remaining slot -
-1 bits. Lets call these bits "direct_bits" because they are coded
-directly by value instead of indirectly by position.
+MSB is set (i.e., the number is >= 0x80000000). Let's call this bit
+position a "slot". Then, if slot is > 1, you send the remaining
+slot - 1 bits. Let's call these bits "direct_bits" because they are
+coded directly by value instead of indirectly by position.
The inconvenient of this simple method is that it needs 6 bits to
code the slot, but it just uses 33 of the 64 possible values, wasting
@@ -729,14 +751,15 @@ same 6 bits that would take to encode the position alone. This seems to
need 66 slots (2 * position + next_bit), but for slots 0 and 1 there is
no next bit, so the number of needed slots is 64 (0 to 63).
- The slot number is context-coded in 6 bits. 'direct_bits' is the
-amount of remaining bits (from 0 to 30) needed to form a complete
-distance, and is calculated as (slot >> 1) - 1. If a distance needs 6 or
-more direct_bits, the last 4 bits are coded separately. The last piece
-(all the direct_bits for distances 4 to 127 or the last 4 bits for
-distances >= 128) is context-coded in reverse order (from LSB to MSB).
-For distances >= 128, the 'direct_bits - 4' part is coded with fixed
-0.5 probability.
+ The 6 bits representing this "slot number" are then context-coded. If
+the distance is >= 4, the remaining bits are coded as follows.
+'direct_bits' is the amount of remaining bits (from 0 to 30) needed to
+form a complete distance, and is calculated as (slot >> 1) - 1. If a
+distance needs 6 or more direct_bits, the last 4 bits are coded
+separately. The last piece (all the direct_bits for distances 4 to 127
+or the last 4 bits for distances >= 128) is context-coded in reverse
+order (from LSB to MSB). For distances >= 128, the 'direct_bits - 4'
+part is coded with fixed 0.5 probability.
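
The slot arithmetic described above can be sketched as follows. This is a minimal illustration, not code from lzip: the helper names (dis_slot_of, direct_bits_of, dis_from_slot) are hypothetical, and "dis" is assumed to be the 32-bit coded distance value. Only the expressions (slot >> 1) - 1 and (2 | (slot & 1)) << direct_bits are taken from the manual and its reference decoder.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: slot (0 to 63) for a 32-bit coded distance value.
// Values 0 to 3 are their own slot; larger values use
// 2 * (index of the highest set bit) + (the bit just below it).
unsigned dis_slot_of( const uint32_t dis )
  {
  if( dis < 4 ) return dis;
  unsigned msb = 31;                    // 0-based index of highest set bit
  while( ( ( dis >> msb ) & 1 ) == 0 ) --msb;
  return 2 * msb + ( ( dis >> ( msb - 1 ) ) & 1 );
  }

// Number of remaining bits (0 to 30) coded after the slot.
unsigned direct_bits_of( const unsigned slot )
  { return ( slot < 4 ) ? 0 : ( slot >> 1 ) - 1; }

// Rebuild the value from its slot and the remaining low bits.
uint32_t dis_from_slot( const unsigned slot, const uint32_t low_bits )
  {
  if( slot < 4 ) return slot;
  const uint32_t base = (uint32_t)( 2 | ( slot & 1 ) ) << direct_bits_of( slot );
  return base + low_bits;
  }
```

For example, under these assumptions the value 128 maps to slot 14 with 6 direct bits, which is the first value whose last 4 bits would be coded separately.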
Bit sequence Description
--------------------------------------------------------------------------
@@ -871,16 +894,21 @@ File: lzip.info, Node: Trailing data, Next: Examples, Prev: Stream format, U
7 Extra data appended to the file
*********************************
-Sometimes extra data is found appended to a lzip file after the last
+Sometimes extra data are found appended to a lzip file after the last
member. Such trailing data may be:
* Padding added to make the file size a multiple of some block size,
- for example when writing to a tape.
-
- * Garbage added by some not totally successful copy operation.
+ for example when writing to a tape. It is safe to append any
+ amount of padding zero bytes to a lzip file.
* Useful data added by the user; a cryptographically secure hash, a
- description of file contents, etc.
+ description of file contents, etc. It is safe to append any amount
+ of text to a lzip file as long as the text does not begin with the
+ string "LZIP", and does not contain any zero bytes (null
+ characters). Nonzero bytes and zero bytes can't be safely mixed in
+ trailing data.
+
+ * Garbage added by some not totally successful copy operation.
* Malicious data added to the file in order to make its total size
and hash value (for a chosen hash) coincide with those of another
@@ -893,8 +921,12 @@ member. Such trailing data may be:
the corruption of the integrity information itself. Therefore it
can be considered to be below the noise level.
+ Trailing data are in no way part of the lzip file format, but tools
+reading lzip files are expected to behave as correctly and usefully as
+possible in the presence of trailing data.
+
Trailing data can be safely ignored in most cases. In some cases,
-like that of user-added data, it is expected to be ignored. In those
+like that of user-added data, they are expected to be ignored. In those
cases where a file containing trailing data must be rejected, the option
'--trailing-error' can be used. *Note --trailing-error::.
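
The rules stated above for data that may be safely appended to a lzip file can be sketched as a small predicate. This is only an illustration of those rules: the function name safe_to_append is hypothetical, and the code is not how lzip itself classifies trailing data.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if 'data' follows the stated rules for
// trailing data: either zero padding only, or text that contains no
// zero bytes and does not begin with the string "LZIP".
bool safe_to_append( const std::string & data )
  {
  if( data.empty() ) return true;       // nothing appended
  const bool has_zero = data.find( '\0' ) != std::string::npos;
  const bool has_nonzero = data.find_first_not_of( '\0' ) != std::string::npos;
  if( has_zero && has_nonzero ) return false;   // mixed bytes are unsafe
  if( has_zero ) return true;                   // pure zero padding
  return data.compare( 0, 4, "LZIP" ) != 0;     // must not look like a header
  }
```

A caller might run such a check on a hash line or a description before appending it to an archive, since text that begins with "LZIP" or mixes zero and nonzero bytes falls outside the cases the manual describes as safe.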
@@ -942,8 +974,8 @@ Example 5: Compress a whole device in /dev/sdc and send the output to
lzip -c /dev/sdc > file.lz
-Example 6: The right way of concatenating compressed files. *Note
-Trailing data::.
+Example 6: The right way of concatenating the decompressed output of two
+or more compressed files. *Note Trailing data::.
Don't do this
cat file1.lz file2.lz file3.lz | lzip -d
@@ -1002,7 +1034,7 @@ Appendix A Reference source code
********************************
/* Lzd - Educational decompressor for the lzip format
- Copyright (C) 2013-2016 Antonio Diaz Diaz.
+ Copyright (C) 2013-2017 Antonio Diaz Diaz.
This program is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided
@@ -1153,10 +1185,10 @@ public:
uint8_t get_byte() { return std::getc( stdin ); }
- int decode( const int num_bits )
+ unsigned decode( const int num_bits )
{
- int symbol = 0;
- for( int i = 0; i < num_bits; ++i )
+ unsigned symbol = 0;
+ for( int i = num_bits; i > 0; --i )
{
range >>= 1;
symbol <<= 1;
@@ -1167,9 +1199,9 @@ public:
return symbol;
}
- int decode_bit( Bit_model & bm )
+ unsigned decode_bit( Bit_model & bm )
{
- int symbol;
+ unsigned symbol;
const uint32_t bound = ( range >> bit_model_total_bits ) * bm.probability;
if( code < bound )
{
@@ -1189,18 +1221,18 @@ public:
return symbol;
}
- int decode_tree( Bit_model bm[], const int num_bits )
+ unsigned decode_tree( Bit_model bm[], const int num_bits )
{
- int symbol = 1;
+ unsigned symbol = 1;
for( int i = 0; i < num_bits; ++i )
symbol = ( symbol << 1 ) | decode_bit( bm[symbol] );
return symbol - (1 << num_bits);
}
- int decode_tree_reversed( Bit_model bm[], const int num_bits )
+ unsigned decode_tree_reversed( Bit_model bm[], const int num_bits )
{
- int symbol = decode_tree( bm, num_bits );
- int reversed_symbol = 0;
+ unsigned symbol = decode_tree( bm, num_bits );
+ unsigned reversed_symbol = 0;
for( int i = 0; i < num_bits; ++i )
{
reversed_symbol = ( reversed_symbol << 1 ) | ( symbol & 1 );
@@ -1209,14 +1241,13 @@ public:
return reversed_symbol;
}
- int decode_matched( Bit_model bm[], const int match_byte )
+ unsigned decode_matched( Bit_model bm[], const unsigned match_byte )
{
- Bit_model * const bm1 = bm + 0x100;
- int symbol = 1;
+ unsigned symbol = 1;
for( int i = 7; i >= 0; --i )
{
- const int match_bit = ( match_byte >> i ) & 1;
- const int bit = decode_bit( bm1[(match_bit<<8)+symbol] );
+ const unsigned match_bit = ( match_byte >> i ) & 1;
+ const unsigned bit = decode_bit( bm[symbol+(match_bit<<8)+0x100] );
symbol = ( symbol << 1 ) | bit;
if( match_bit != bit )
{
@@ -1228,7 +1259,7 @@ public:
return symbol & 0xFF;
}
- int decode_len( Len_model & lm, const int pos_state )
+ unsigned decode_len( Len_model & lm, const int pos_state )
{
if( decode_bit( lm.choice1 ) == 0 )
return decode_tree( lm.bm_low[pos_state], len_low_bits );
@@ -1256,9 +1287,9 @@ class LZ_decoder
uint8_t peek( const unsigned distance ) const
{
- unsigned i = pos - distance - 1;
- if( pos <= distance ) i += dictionary_size;
- return buffer[i];
+ if( pos > distance ) return buffer[pos - distance - 1];
+ if( pos_wrapped ) return buffer[dictionary_size + pos - distance - 1];
+ return 0; // prev_byte of first byte
}
void put_byte( const uint8_t b )
@@ -1277,7 +1308,7 @@ public:
stream_pos( 0 ),
crc_( 0xFFFFFFFFU ),
pos_wrapped( false )
- { buffer[dictionary_size-1] = 0; } // prev_byte of first byte
+ {}
~LZ_decoder() { delete[] buffer; }
@@ -1315,7 +1346,7 @@ bool LZ_decoder::decode_member() // Returns false if error
Bit_model bm_rep2[State::states];
Bit_model bm_len[State::states][pos_states];
Bit_model bm_dis_slot[len_states][1<<dis_slot_bits];
- Bit_model bm_dis[modeled_distances-end_dis_model];
+ Bit_model bm_dis[modeled_distances-end_dis_model+1];
Bit_model bm_align[dis_align_size];
Len_model match_len_model;
Len_model rep_len_model;
@@ -1344,7 +1375,12 @@ bool LZ_decoder::decode_member() // Returns false if error
int len;
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
{
- if( rdec.decode_bit( bm_rep0[state()] ) != 0 ) // 3rd bit
+ if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
+ {
+ if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
+ { state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
+ }
+ else
{
unsigned distance;
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
@@ -1360,11 +1396,6 @@ bool LZ_decoder::decode_member() // Returns false if error
rep1 = rep0;
rep0 = distance;
}
- else
- {
- if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
- { state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
- }
state.set_rep();
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
}
@@ -1373,15 +1404,14 @@ bool LZ_decoder::decode_member() // Returns false if error
rep3 = rep2; rep2 = rep1; rep1 = rep0;
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
const int len_state = std::min( len - min_match_len, len_states - 1 );
- const int dis_slot =
- rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
- if( dis_slot < start_dis_model ) rep0 = dis_slot;
- else
+ rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
+ if( rep0 >= start_dis_model )
{
+ const unsigned dis_slot = rep0;
const int direct_bits = ( dis_slot >> 1 ) - 1;
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
if( dis_slot < end_dis_model )
- rep0 += rdec.decode_tree_reversed( bm_dis + rep0 - dis_slot - 1,
+ rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
direct_bits );
else
{
@@ -1417,7 +1447,7 @@ int main( const int argc, const char * const argv[] )
"It is not safe to use lzd for any real work.\n"
"\nUsage: %s < file.lz > file\n", argv[0] );
std::printf( "Lzd decompresses from standard input to standard output.\n"
- "\nCopyright (C) 2016 Antonio Diaz Diaz.\n"
+ "\nCopyright (C) 2017 Antonio Diaz Diaz.\n"
"This is free software: you are free to change and redistribute it.\n"
"There is NO WARRANTY, to the extent permitted by law.\n"
"Report bugs to lzip-bug@nongnu.org\n"
@@ -1432,7 +1462,7 @@ int main( const int argc, const char * const argv[] )
for( bool first_member = true; ; first_member = false )
{
- File_header header;
+ File_header header; // verify header
for( int i = 0; i < 6; ++i ) header[i] = std::getc( stdin );
if( std::feof( stdin ) || std::memcmp( header, "LZIP\x01", 5 ) != 0 )
{
@@ -1447,11 +1477,11 @@ int main( const int argc, const char * const argv[] )
{ std::fputs( "Invalid dictionary size in member header.\n", stderr );
return 2; }
- LZ_decoder decoder( dict_size );
+ LZ_decoder decoder( dict_size ); // decode LZMA stream
if( !decoder.decode_member() )
{ std::fputs( "Data error\n", stderr ); return 2; }
- File_trailer trailer;
+ File_trailer trailer; // verify trailer
for( int i = 0; i < 20; ++i ) trailer[i] = std::getc( stdin );
unsigned crc = 0;
for( int i = 3; i >= 0; --i ) { crc <<= 8; crc += trailer[i]; }
@@ -1495,19 +1525,19 @@ Concept index

Tag Table:
Node: Top208
-Node: Introduction1145
-Node: Invoking lzip6071
-Ref: --trailing-error6635
-Node: Quality assurance12628
-Node: File format20782
-Node: Algorithm23186
-Node: Stream format26012
-Node: Trailing data36660
-Node: Examples38038
-Ref: concat-example39211
-Node: Problems40211
-Node: Reference source code40741
-Node: Concept index54957
+Node: Introduction1147
+Node: Invoking lzip6367
+Ref: --trailing-error6931
+Node: Quality assurance13849
+Node: File format22003
+Node: Algorithm24407
+Node: Stream format27233
+Node: Trailing data37973
+Node: Examples39874
+Ref: concat-example41047
+Node: Problems42085
+Node: Reference source code42615
+Node: Concept index56932

End Tag Table