summaryrefslogtreecommitdiffstats
path: root/README
diff options
context:
space:
mode:
Diffstat (limited to 'README')
-rw-r--r--README108
1 files changed, 108 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..7dc2950
--- /dev/null
+++ b/README
@@ -0,0 +1,108 @@
+Description
+
+Lzlib is a data compression library providing in-memory LZMA compression and
+decompression functions, including integrity checking of the decompressed
+data. The compressed data format used by the library is the lzip format.
+Lzlib is written in C.
+
+The lzip file format is designed for data sharing and long-term archiving,
+taking into account both data integrity and decoder availability:
+
+ * The lzip format provides very safe integrity checking and some data
+ recovery means. The program lziprecover can repair bit flip errors
+ (one of the most common forms of data corruption) in lzip files, and
+ provides data recovery capabilities, including error-checked merging
+ of damaged copies of a file.
+
+ * The lzip format is as simple as possible (but not simpler). The lzip
+ manual provides the source code of a simple decompressor along with a
+ detailed explanation of how it works, so that with the only help of the
+ lzip manual it would be possible for a digital archaeologist to extract
+ the data from a lzip file long after quantum computers eventually
+ render LZMA obsolete.
+
+ * Additionally the lzip reference implementation is copylefted, which
+ guarantees that it will remain free forever.
+
+A nice feature of the lzip format is that a corrupt byte is easier to repair
+the nearer it is from the beginning of the file. Therefore, with the help of
+lziprecover, losing an entire archive just because of a corrupt byte near
+the beginning is a thing of the past.
+
+The functions and variables forming the interface of the compression library
+are declared in the file 'lzlib.h'. Usage examples of the library are given
+in the files 'bbexample.c', 'ffexample.c', and 'minilzip.c' from the source
+distribution.
+
+As 'lzlib.h' can be used by C and C++ programs, it must not impose a choice
+of system headers on the program by including one of them. Therefore it is
+the responsibility of the program using lzlib to include before 'lzlib.h'
+some header that declares the type 'uint8_t'. There are at least four such
+headers in C and C++: 'stdint.h', 'cstdint', 'inttypes.h', and 'cinttypes'.
+
+All the library functions are thread safe. The library does not install any
+signal handler. The decoder checks the consistency of the compressed data,
+so the library should never crash even in case of corrupted input.
+
+Compression/decompression is done by repeatedly calling a couple of
+read/write functions until all the data have been processed by the library.
+This interface is safer and less error prone than the traditional zlib
+interface.
+
+Compression/decompression is done when the read function is called. This
+means the value returned by the position functions is not updated until a
+read call, even if a lot of data are written. If you want the data to be
+compressed in advance, just call the read function with a size equal to 0.
+
+If all the data to be compressed are written in advance, lzlib automatically
+adjusts the header of the compressed data to use the largest dictionary size
+that does not exceed neither the data size nor the limit given to
+'LZ_compress_open'. This feature reduces the amount of memory needed for
+decompression and allows minilzip to produce identical compressed output as
+lzip.
+
+Lzlib correctly decompresses a data stream which is the concatenation of
+two or more compressed data streams. The result is the concatenation of the
+corresponding decompressed data streams. Integrity testing of concatenated
+compressed data streams is also supported.
+
+Lzlib is able to compress and decompress streams of unlimited size by
+automatically creating multimember output. The members so created are large,
+about 2 PiB each.
+
+In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
+concrete algorithm; it is more like "any algorithm using the LZMA coding
+scheme". For example, the option '-0' of lzip uses the scheme in almost the
+simplest way possible; issuing the longest match it can find, or a literal
+byte if it can't find a match. Inversely, a much more elaborated way of
+finding coding sequences of minimum size than the one currently used by lzip
+could be developed, and the resulting sequence could also be coded using the
+LZMA coding scheme.
+
+Lzlib currently implements two variants of the LZMA algorithm: fast (used by
+option '-0' of minilzip) and normal (used by all other compression levels).
+
+The high compression of LZMA comes from combining two basic, well-proven
+compression ideas: sliding dictionaries (LZ77) and markov models (the thing
+used by every compression algorithm that uses a range encoder or similar
+order-0 entropy coder as its last stage) with segregation of contexts
+according to what the bits are used for.
+
+The ideas embodied in lzlib are due to (at least) the following people:
+Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrei Markov (for the
+definition of Markov chains), G.N.N. Martin (for the definition of range
+encoding), Igor Pavlov (for putting all the above together in LZMA), and
+Julian Seward (for bzip2's CLI).
+
+LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
+been compressed. Decompressed is used to refer to data which have undergone
+the process of decompression.
+
+
+Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+This file is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
+
+The file Makefile.in is a data file used by configure to produce the Makefile.
+It has the same copyright owner and permissions that configure itself.