Description

Lzlib is a data compression library providing in-memory LZMA compression
and decompression functions, including integrity checking of the
decompressed data. The compressed data format used by the library is the
lzip format. Lzlib is written in C.

The lzip file format is designed for long-term data archiving, taking
into account both data integrity and decoder availability:

   * The lzip format provides very safe integrity checking and some data
     recovery means. The lziprecover program can repair bit-flip errors
     (one of the most common forms of data corruption) in lzip files,
     and provides data recovery capabilities, including error-checked
     merging of damaged copies of a file.

   * The lzip format is as simple as possible (but not simpler). The
     lzip manual provides the code of a simple decompressor along with a
     detailed explanation of how it works, so that with the help of the
     lzip manual alone it would be possible for a digital archaeologist
     to extract the data from a lzip file long after quantum computers
     eventually render LZMA obsolete.

   * Additionally, lzip is copylefted, which guarantees that it will
     remain free forever.

A nice feature of the lzip format is that a corrupt byte is easier to
repair the nearer it is to the beginning of the file. Therefore, with
the help of lziprecover, losing an entire archive just because of a
corrupt byte near the beginning is a thing of the past.

The functions and variables forming the interface of the compression
library are declared in the file 'lzlib.h'. Usage examples of the
library are given in the files 'main.c' and 'bbexample.c' from the
source distribution.

Compression/decompression is done by repeatedly calling a pair of
read/write functions until all the data has been processed by the
library. This interface is safer and less error-prone than the
traditional zlib interface.

Compression/decompression is done when the read function is called. This
means the value returned by the position functions will not be updated
until some data is read, even if you write a lot of data. If you want
the data to be compressed in advance, just call the read function with a
size equal to 0.

Lzlib will correctly decompress a data stream which is the concatenation
of two or more compressed data streams. The result is the concatenation
of the corresponding decompressed data streams. Integrity testing of
concatenated compressed data streams is also supported.

All the library functions are thread safe. The library does not install
any signal handler. The decoder checks the consistency of the compressed
data, so the library should never crash even in case of corrupted input.

There is no such thing as an "LZMA algorithm"; it is more like an "LZMA
coding scheme". For example, the option '-0' of lzip uses the scheme in
almost the simplest way possible: issuing the longest match it can find,
or a literal byte if it can't find a match. Conversely, a much more
elaborate way of finding coding sequences of minimum price than the one
currently used by lzip could be developed, and the resulting sequence
could also be coded using the LZMA coding scheme.
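
The "longest match or literal" strategy described above can be sketched
in a few lines of C. This is only an illustration of greedy parsing, not
the code used by lzip or lzlib: the names 'Token', 'greedy_parse' and
'min_match' are hypothetical, the match search is brute force, and the
minimum match length of 2 is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token type for this sketch; not part of lzlib. */
typedef struct {
  int is_match;         /* 1 = match (distance, length), 0 = literal */
  unsigned char byte;   /* literal value, valid when is_match == 0 */
  size_t distance;      /* bytes back to the match start, when is_match == 1 */
  size_t length;        /* match length in bytes, when is_match == 1 */
} Token;

/* Shortest match worth issuing (an assumption of this sketch). */
static const size_t min_match = 2;

/* Greedy parse: at each position issue the longest earlier match, or a
   literal byte if no match of at least min_match bytes is found.
   Returns the number of tokens stored in 'out' (at most 'out_cap'). */
size_t greedy_parse(const unsigned char *data, size_t size,
                    Token *out, size_t out_cap)
{
  size_t pos = 0, count = 0;
  assert(data != NULL && out != NULL);
  while (pos < size && count < out_cap) {
    size_t best_len = 0, best_dist = 0, start;
    /* Brute-force search over every earlier start position. */
    for (start = 0; start < pos; ++start) {
      size_t len = 0;
      /* A match may run past 'pos' and overlap the data being coded. */
      while (pos + len < size && data[start + len] == data[pos + len])
        ++len;
      /* On equal length, prefer the nearest (latest) start. */
      if (len > 0 && len >= best_len) {
        best_len = len;
        best_dist = pos - start;
      }
    }
    if (best_len >= min_match) {
      out[count].is_match = 1;
      out[count].byte = 0;
      out[count].distance = best_dist;
      out[count].length = best_len;
      pos += best_len;
    } else {
      out[count].is_match = 0;
      out[count].byte = data[pos];
      out[count].distance = 0;
      out[count].length = 1;
      pos += 1;
    }
    ++count;
  }
  return count;
}
```

For the 9-byte input "abcabcabc" this sketch yields three literals
followed by one match with distance 3 and length 6. A more elaborate
parser could choose a different sequence of tokens for the same input,
and both sequences could be coded with the same scheme.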

Lzip currently implements two variants of the LZMA algorithm: fast (used
by option -0) and normal (used by all other compression levels). Lzlib
just implements the "normal" variant.

The high compression of LZMA comes from combining two basic, well-proven
compression ideas: sliding dictionaries (LZ77/78) and Markov models (the
thing used by every compression algorithm that uses a range encoder or
similar order-0 entropy coder as its last stage) with segregation of
contexts according to what the bits are used for.
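
The idea of segregated contexts can be sketched with a small adaptive
bit model in C. The 11-bit probability scale and the shift of 5 are the
values commonly used in LZMA implementations; the names 'Bit_model',
'bm_init' and 'bm_update' are hypothetical and chosen for this
illustration only, which is not the lzlib source.

```c
#include <assert.h>

enum { prob_bits = 11,                      /* probabilities scaled by 2^11 */
       prob_init = 1 << ( prob_bits - 1 ),  /* start at P(0) = 1/2 */
       move_bits = 5 };                     /* adaptation speed */

/* Estimated probability that the next bit is 0, in the range 0..2^11. */
typedef struct { int prob; } Bit_model;

void bm_init(Bit_model *bm) { bm->prob = prob_init; }

/* Move the estimate toward the bit just seen. */
void bm_update(Bit_model *bm, int bit)
{
  assert(bit == 0 || bit == 1);
  if (bit == 0)
    bm->prob += ((1 << prob_bits) - bm->prob) >> move_bits;
  else
    bm->prob -= bm->prob >> move_bits;
}
```

Keeping one such model per context (for example, one array entry for
each kind of bit being coded) means that the statistics of one kind of
bit do not disturb the estimates for another. The range encoder then
uses the estimate of the selected model to code each bit.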

The ideas embodied in lzlib are due to (at least) the following people:
Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
the definition of Markov chains), G.N.N. Martin (for the definition of
range encoding), Igor Pavlov (for putting all the above together in
LZMA), and Julian Seward (for bzip2's CLI).


Copyright (C) 2009-2014 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute and modify it.

The file Makefile.in is a data file used by configure to produce the
Makefile. It has the same copyright owner and permissions as configure
itself.