Data Compression

This page documents objects that implement the building blocks of compression algorithms. They can be combined in different ways to construct many different algorithms. Note that the compress_stream object contains complete compression algorithms, so if you just want to compress some data you can use that object directly and ignore the others.

Benchmark data is available for each of the compress_stream typedefs (see the Benchmarks list below). Each reported time is the time it takes to compress and then decompress a file. The benchmarks were run on a 3.0 GHz Pentium 4. For reference, see the Canterbury corpus web site.

Objects: compress_stream, conditioning_class, entropy_decoder, entropy_encoder, entropy_decoder_model, entropy_encoder_model, lz77_buffer, lzp_buffer

Benchmarks: kernel_1a, kernel_1b, kernel_1c, kernel_1da, kernel_1db, kernel_1ea, kernel_1eb, kernel_1ec, kernel_2a, kernel_3a, kernel_3b

compress_stream
dlib/compress_stream.h
dlib/compress_stream/compress_stream_kernel_abstract.h

This object is pretty straightforward. It has no state and just contains the functions compress and decompress. They do just what their names imply to iostream objects.

Examples: compress_stream_ex.cpp.html, file_to_code_ex.cpp.html

compress_stream_kernel_1
dlib/compress_stream/compress_stream_kernel_1.h
This implementation is done using the entropy_encoder_model and entropy_decoder_model objects.
kernel_1a is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_1b and entropy_decoder_model_kernel_1b.
kernel_1b is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_2b and entropy_decoder_model_kernel_2b.
kernel_1c is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_3b and entropy_decoder_model_kernel_3b.
kernel_1da is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_4a and entropy_decoder_model_kernel_4a.
kernel_1db is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_4b and entropy_decoder_model_kernel_4b.
kernel_1ea is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_5a and entropy_decoder_model_kernel_5a.
kernel_1eb is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_5b and entropy_decoder_model_kernel_5b.
kernel_1ec is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_5c and entropy_decoder_model_kernel_5c.

compress_stream_kernel_2
dlib/compress_stream/compress_stream_kernel_2.h
This implementation is done using the entropy_encoder_model and entropy_decoder_model objects. It also uses the lz77_buffer object. It uses the entropy coder models to encode symbols when there is no match found by the lz77_buffer.
kernel_2a is a typedef for compress_stream_kernel_2 which uses entropy_encoder_model_kernel_2b, entropy_decoder_model_kernel_2b, and lz77_buffer_kernel_2a.

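
The match-or-literal decision described for compress_stream_kernel_2 can be illustrated with a small standalone sketch. It is not dlib code: the Token type and the functions longest_match, tokenize, and detokenize are hypothetical names, the history search is a plain linear scan, and the window size and minimum match length are arbitrary assumptions. A real compressor would pass the literals to an entropy coder model instead of storing them in a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for illustration; not a dlib type.
struct Token {
    bool is_match;         // false: literal byte, true: (distance, length) pair
    char literal;          // valid when is_match is false
    std::size_t distance;  // how far back the match starts
    std::size_t length;    // number of bytes to copy
};

// Linear search of the history window for the longest match starting at pos.
Token longest_match(const std::string& data, std::size_t pos, std::size_t window) {
    assert(pos < data.size());
    Token best{false, data[pos], 0, 0};
    const std::size_t first = pos > window ? pos - window : 0;
    for (std::size_t start = first; start < pos; ++start) {
        std::size_t len = 0;
        while (pos + len < data.size() && data[start + len] == data[pos + len]) ++len;
        // On ties the later (nearer) start wins, giving the smaller distance.
        if (len > 0 && len >= best.length) {
            best.distance = pos - start;
            best.length = len;
        }
    }
    return best;
}

// Emit a match when one of at least min_match bytes exists, else a literal.
std::vector<Token> tokenize(const std::string& data,
                            std::size_t window = 4096,
                            std::size_t min_match = 3) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < data.size()) {
        Token t = longest_match(data, pos, window);
        if (t.length >= min_match) {
            t.is_match = true;
            pos += t.length;
        } else {
            t = Token{false, data[pos], 0, 0};
            pos += 1;
        }
        out.push_back(t);
    }
    return out;
}

// Rebuild the original bytes; matches may overlap the bytes being written.
std::string detokenize(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) {
        if (!t.is_match) { out.push_back(t.literal); continue; }
        const std::size_t from = out.size() - t.distance;
        for (std::size_t i = 0; i < t.length; ++i) out.push_back(out[from + i]);
    }
    return out;
}
```

For the input "abcabcabc" this sketch produces three literals followed by one match that copies six bytes from three positions back.
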
compress_stream_kernel_3
dlib/compress_stream/compress_stream_kernel_3.h
This implementation is done using the lzp_buffer object and crc32 object. It does not use any sort of entropy coding; instead, a byte-aligned output method is used.
kernel_3a is a typedef for compress_stream_kernel_3 which uses lzp_buffer_kernel_1.
kernel_3b is a typedef for compress_stream_kernel_3 which uses lzp_buffer_kernel_2.

conditioning_class
dlib/conditioning_class.h
dlib/conditioning_class/conditioning_class_kernel_abstract.h

This object represents a conditioning class used for arithmetic style compression. It maintains the cumulative counts which are needed by the entropy_encoder and entropy_decoder objects below.

conditioning_class_kernel_1
dlib/conditioning_class/conditioning_class_kernel_1.h
This implementation is done using an array to store all the counts, and they are summed whenever the cumulative counts are requested. It's pretty straightforward.
kernel_1a is a typedef for conditioning_class_kernel_1.

conditioning_class_kernel_2
dlib/conditioning_class/conditioning_class_kernel_2.h
This implementation is done using a binary tree where each node in the tree represents one symbol and contains that symbol's count and the sum of all the counts for the nodes to the left. This way, when you request a cumulative count, it can be computed by visiting log n nodes, where n is the size of the alphabet.
kernel_2a is a typedef for conditioning_class_kernel_2.

conditioning_class_kernel_3
dlib/conditioning_class/conditioning_class_kernel_3.h
This implementation is done using an array to store all the counts, and they are summed whenever the cumulative counts are requested. The counts are also kept in semi-sorted order to speed up the calculation of the cumulative count.
kernel_3a is a typedef for conditioning_class_kernel_3.

conditioning_class_kernel_4
dlib/conditioning_class/conditioning_class_kernel_4.h
This implementation is done using a linked list to store all the counts, and they are summed whenever the cumulative counts are requested. The counts are also kept in semi-sorted order to speed up the calculation of the cumulative count. This implementation also uses the memory_manager component to create a memory pool of linked list nodes. It is especially useful for high order contexts and/or very large and sparse alphabets.
kernel_4a is a typedef for conditioning_class_kernel_4 with a memory pool of 10,000 nodes.
kernel_4b is a typedef for conditioning_class_kernel_4 with a memory pool of 100,000 nodes.
kernel_4c is a typedef for conditioning_class_kernel_4 with a memory pool of 1,000,000 nodes.
kernel_4d is a typedef for conditioning_class_kernel_4 with a memory pool of 10,000,000 nodes.

entropy_decoder
dlib/entropy_decoder.h
dlib/entropy_decoder/entropy_decoder_kernel_abstract.h

This object represents an entropy decoder, e.g. the decoding part of an arithmetic coder.

entropy_decoder_kernel_1
dlib/entropy_decoder/entropy_decoder_kernel_1.h
This object is implemented using arithmetic coding and is done in the straightforward way using integers and fixed precision math.
kernel_1a is a typedef for entropy_decoder_kernel_1.

entropy_decoder_kernel_2
dlib/entropy_decoder/entropy_decoder_kernel_2.h
This object is implemented using "range" coding and is done in the straightforward way using integers and fixed precision math.
kernel_2a is a typedef for entropy_decoder_kernel_2.

entropy_encoder
dlib/entropy_encoder.h
dlib/entropy_encoder/entropy_encoder_kernel_abstract.h

This object represents an entropy encoder, e.g. the encoding part of an arithmetic coder.

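
To show how cumulative counts feed an arithmetic coder, here is a minimal standalone sketch using integers and fixed precision math. It is not the dlib interface: ArrayCounts, Interval, and narrow are hypothetical names, ArrayCounts simply sums a plain array on demand, and the renormalization and bit output that a complete coder needs are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a conditioning class: a plain array of counts
// that is summed whenever a cumulative count is requested.
struct ArrayCounts {
    std::vector<std::uint32_t> counts;

    // Sum of the counts of all symbols strictly below `symbol`.
    std::uint32_t low_count(std::size_t symbol) const {
        std::uint32_t sum = 0;
        for (std::size_t i = 0; i < symbol; ++i) sum += counts[i];
        return sum;
    }
    std::uint32_t high_count(std::size_t symbol) const {
        return low_count(symbol) + counts[symbol];
    }
    std::uint32_t total() const { return low_count(counts.size()); }
};

// Current coding interval with inclusive integer bounds.
struct Interval {
    std::uint32_t low;
    std::uint32_t high;
};

// One encoding step: shrink the interval to the slice owned by the symbol
// whose cumulative counts are [low_count, high_count) out of total.
Interval narrow(Interval iv, std::uint32_t low_count,
                std::uint32_t high_count, std::uint32_t total) {
    assert(low_count < high_count && high_count <= total);
    // 64-bit intermediate so range * count cannot overflow.
    const std::uint64_t range = static_cast<std::uint64_t>(iv.high) - iv.low + 1;
    Interval out;
    out.high = static_cast<std::uint32_t>(iv.low + range * high_count / total - 1);
    out.low = static_cast<std::uint32_t>(iv.low + range * low_count / total);
    return out;
}
```

With counts {2, 3, 5} and a 16-bit starting interval [0, 65535], encoding symbol 1 narrows the interval to [13107, 32767]. A decoder performs the same narrowing after working out which symbol's slice contains the encoded value.
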
entropy_encoder_kernel_1
dlib/entropy_encoder/entropy_encoder_kernel_1.h
This object is implemented using arithmetic coding and is done in the straightforward way using integers and fixed precision math.
kernel_1a is a typedef for entropy_encoder_kernel_1.

entropy_encoder_kernel_2
dlib/entropy_encoder/entropy_encoder_kernel_2.h
This object is implemented using "range" coding and is done in the straightforward way using integers and fixed precision math.
kernel_2a is a typedef for entropy_encoder_kernel_2.

entropy_decoder_model
dlib/entropy_decoder_model.h
dlib/entropy_decoder_model/entropy_decoder_model_kernel_abstract.h

This object represents some kind of statistical model. You can use it to read symbols from an entropy_decoder and it will calculate the cumulative counts/probabilities and manage contexts for you.

entropy_decoder_model_kernel_1
dlib/entropy_decoder_model/entropy_decoder_model_kernel_1.h
This object is implemented using the conditioning_class component. It implements an order-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_1a is a typedef for entropy_decoder_model_kernel_1 that uses conditioning_class_kernel_1a.
kernel_1b is a typedef for entropy_decoder_model_kernel_1 that uses conditioning_class_kernel_2a.
kernel_1c is a typedef for entropy_decoder_model_kernel_1 that uses conditioning_class_kernel_3a.

entropy_decoder_model_kernel_2
dlib/entropy_decoder_model/entropy_decoder_model_kernel_2.h
This object is implemented using the conditioning_class component. It implements an order-1-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_2a is a typedef for entropy_decoder_model_kernel_2 that uses conditioning_class_kernel_1a.
kernel_2b is a typedef for entropy_decoder_model_kernel_2 that uses conditioning_class_kernel_2a.
kernel_2c is a typedef for entropy_decoder_model_kernel_2 that uses conditioning_class_kernel_3a.
kernel_2d is a typedef for entropy_decoder_model_kernel_2 that uses conditioning_class_kernel_2a for its order-0 context and conditioning_class_kernel_4b for its order-1 context.

entropy_decoder_model_kernel_3
dlib/entropy_decoder_model/entropy_decoder_model_kernel_3.h
This object is implemented using the conditioning_class component. It implements an order-2-1-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_3a is a typedef for entropy_decoder_model_kernel_3 that uses conditioning_class_kernel_1a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3b is a typedef for entropy_decoder_model_kernel_3 that uses conditioning_class_kernel_2a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3c is a typedef for entropy_decoder_model_kernel_3 that uses conditioning_class_kernel_3a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.

entropy_decoder_model_kernel_4
dlib/entropy_decoder_model/entropy_decoder_model_kernel_4.h
This object is implemented using a variation of the PPM algorithm described by Alistair Moffat in his paper "Implementing the PPM data compression scheme." It provides template arguments to select the maximum order and maximum memory to use. For speed, exclusions are not used. The escape method used is method D.
kernel_4a is a typedef for entropy_decoder_model_kernel_4 with the max order set to 4 and the max number of nodes set to 200,000.
kernel_4b is a typedef for entropy_decoder_model_kernel_4 with the max order set to 5 and the max number of nodes set to 1,000,000.

entropy_decoder_model_kernel_5
dlib/entropy_decoder_model/entropy_decoder_model_kernel_5.h
This object is implemented using a variation of the PPM algorithm described by Alistair Moffat in his paper "Implementing the PPM data compression scheme." It provides template arguments to select the maximum order and maximum memory to use. Exclusions are used. The escape method used is method D. This implementation is very much like kernel_4 except it is tuned for higher compression rather than speed. It also uses Dmitry Shkarin's Information Inheritance scheme.
kernel_5a is a typedef for entropy_decoder_model_kernel_5 with the max order set to 4 and the max number of nodes set to 200,000.
kernel_5b is a typedef for entropy_decoder_model_kernel_5 with the max order set to 5 and the max number of nodes set to 1,000,000.
kernel_5c is a typedef for entropy_decoder_model_kernel_5 with the max order set to 7 and the max number of nodes set to 2,500,000.

entropy_decoder_model_kernel_6
dlib/entropy_decoder_model/entropy_decoder_model_kernel_6.h
This object just assigns every symbol the same probability, i.e. it uses an order-(-1) model.
kernel_6a is a typedef for entropy_decoder_model_kernel_6.

entropy_encoder_model
dlib/entropy_encoder_model.h
dlib/entropy_encoder_model/entropy_encoder_model_kernel_abstract.h

This object represents some kind of statistical model. You can use it to write symbols to an entropy_encoder and it will calculate the cumulative counts/probabilities and manage contexts for you.

entropy_encoder_model_kernel_1
dlib/entropy_encoder_model/entropy_encoder_model_kernel_1.h
This object is implemented using the conditioning_class component.
It implements an order-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_1a is a typedef for entropy_encoder_model_kernel_1 that uses conditioning_class_kernel_1a.
kernel_1b is a typedef for entropy_encoder_model_kernel_1 that uses conditioning_class_kernel_2a.
kernel_1c is a typedef for entropy_encoder_model_kernel_1 that uses conditioning_class_kernel_3a.

entropy_encoder_model_kernel_2
dlib/entropy_encoder_model/entropy_encoder_model_kernel_2.h
This object is implemented using the conditioning_class component. It implements an order-1-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_2a is a typedef for entropy_encoder_model_kernel_2 that uses conditioning_class_kernel_1a.
kernel_2b is a typedef for entropy_encoder_model_kernel_2 that uses conditioning_class_kernel_2a.
kernel_2c is a typedef for entropy_encoder_model_kernel_2 that uses conditioning_class_kernel_3a.
kernel_2d is a typedef for entropy_encoder_model_kernel_2 that uses conditioning_class_kernel_2a for its order-0 context and conditioning_class_kernel_4b for its order-1 context.

entropy_encoder_model_kernel_3
dlib/entropy_encoder_model/entropy_encoder_model_kernel_3.h
This object is implemented using the conditioning_class component. It implements an order-2-1-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_3a is a typedef for entropy_encoder_model_kernel_3 that uses conditioning_class_kernel_1a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3b is a typedef for entropy_encoder_model_kernel_3 that uses conditioning_class_kernel_2a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3c is a typedef for entropy_encoder_model_kernel_3 that uses conditioning_class_kernel_3a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.

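
The finite context models above all name method D as their escape method. As a rough illustration, the sketch below computes the textbook method D estimates for a single context: a symbol seen c times out of n total observations gets probability (2c - 1) / 2n, and the escape symbol gets q / 2n, where q is the number of distinct symbols seen. Fraction and method_d are hypothetical names, and the way dlib stores and scales its counts internally may differ from this.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Probability expressed as an unreduced fraction num / den.
struct Fraction {
    std::uint32_t num;
    std::uint32_t den;
};

// counts maps each symbol seen in this context to how often it was seen
// (every stored count is assumed to be at least 1).
// Returns the method D probability of `symbol`, or the escape probability
// when `symbol` has not been seen in this context.
Fraction method_d(const std::map<char, std::uint32_t>& counts, char symbol) {
    std::uint32_t n = 0;  // total number of observations in this context
    for (const auto& entry : counts) n += entry.second;
    assert(n > 0);
    const std::uint32_t q = static_cast<std::uint32_t>(counts.size());
    const auto it = counts.find(symbol);
    if (it == counts.end()) return Fraction{q, 2 * n};  // escape
    return Fraction{2 * it->second - 1, 2 * n};         // seen symbol
}
```

For a context in which 'a' was seen 3 times and 'b' once, this gives 5/8 for 'a', 1/8 for 'b', and 2/8 for the escape, which sum to 1. In an order-1-0 model, an escape in the order-1 context means the symbol is then coded in the order-0 context.
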
entropy_encoder_model_kernel_4
dlib/entropy_encoder_model/entropy_encoder_model_kernel_4.h
This object is implemented using a variation of the PPM algorithm described by Alistair Moffat in his paper "Implementing the PPM data compression scheme." It provides template arguments to select the maximum order and maximum memory to use. For speed, exclusions are not used. The escape method used is method D.
kernel_4a is a typedef for entropy_encoder_model_kernel_4 with the max order set to 4 and the max number of nodes set to 200,000.
kernel_4b is a typedef for entropy_encoder_model_kernel_4 with the max order set to 5 and the max number of nodes set to 1,000,000.

entropy_encoder_model_kernel_5
dlib/entropy_encoder_model/entropy_encoder_model_kernel_5.h
This object is implemented using a variation of the PPM algorithm described by Alistair Moffat in his paper "Implementing the PPM data compression scheme." It provides template arguments to select the maximum order and maximum memory to use. Exclusions are used. The escape method used is method D. This implementation is very much like kernel_4 except it is tuned for higher compression rather than speed. It also uses Dmitry Shkarin's Information Inheritance scheme.
kernel_5a is a typedef for entropy_encoder_model_kernel_5 with the max order set to 4 and the max number of nodes set to 200,000.
kernel_5b is a typedef for entropy_encoder_model_kernel_5 with the max order set to 5 and the max number of nodes set to 1,000,000.
kernel_5c is a typedef for entropy_encoder_model_kernel_5 with the max order set to 7 and the max number of nodes set to 2,500,000.

entropy_encoder_model_kernel_6
dlib/entropy_encoder_model/entropy_encoder_model_kernel_6.h
This object just assigns every symbol the same probability, i.e. it uses an order-(-1) model.
kernel_6a is a typedef for entropy_encoder_model_kernel_6.

lz77_buffer
dlib/lz77_buffer.h
dlib/lz77_buffer/lz77_buffer_kernel_abstract.h

This object represents a pair of buffers (history and lookahead buffers) used during lz77 style compression.

lz77_buffer_kernel_1
dlib/lz77_buffer/lz77_buffer_kernel_1.h
This object is implemented using the sliding_buffer and it just does simple linear searches of the history buffer to find matches.
kernel_1a is a typedef for lz77_buffer_kernel_1 that uses sliding_buffer_kernel_1.

lz77_buffer_kernel_2
dlib/lz77_buffer/lz77_buffer_kernel_2.h
This object is implemented using the sliding_buffer. It finds matches by using a hash table.
kernel_2a is a typedef for lz77_buffer_kernel_2 that uses sliding_buffer_kernel_1.

lzp_buffer
dlib/lzp_buffer.h
dlib/lzp_buffer/lzp_buffer_kernel_abstract.h

This object represents some variation on the LZP algorithm described by Charles Bloom in his paper "LZP: a new data compression algorithm."

lzp_buffer_kernel_1
dlib/lzp_buffer/lzp_buffer_kernel_1.h
This object is implemented using the sliding_buffer and uses an order-3 model to predict matches.
kernel_1a is a typedef for lzp_buffer_kernel_1 that uses sliding_buffer_kernel_1.

lzp_buffer_kernel_2
dlib/lzp_buffer/lzp_buffer_kernel_2.h
This object is implemented using the sliding_buffer and uses an order-5-4-3 model to predict matches.
kernel_2a is a typedef for lzp_buffer_kernel_2 that uses sliding_buffer_kernel_1.
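
As a rough illustration of what "uses an order-3 model to predict matches" means in LZP, the sketch below hashes the three bytes before the current position, looks up where that context was last seen, and measures how many bytes actually match from there. It is a standalone sketch, not dlib's lzp_buffer: Order3Predictor, predict_and_update, the table size, and the hash function are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Marker for "this context has not been seen yet".
const std::size_t kNoPosition = static_cast<std::size_t>(-1);
// Number of hash table slots (an arbitrary choice for this sketch).
const std::size_t kTableSize = 1u << 16;

// Hypothetical order-3 LZP predictor for illustration; not dlib's lzp_buffer.
class Order3Predictor {
public:
    Order3Predictor() : table_(kTableSize, kNoPosition) {}

    // Returns the length of the match predicted for position pos (0 if there
    // is no prediction or the prediction is wrong), then records pos as the
    // most recent position that followed this 3-byte context.
    std::size_t predict_and_update(const std::string& data, std::size_t pos) {
        assert(pos <= data.size());
        if (pos < 3) return 0;  // not enough context yet
        const std::size_t slot = hash_context(data, pos);
        const std::size_t candidate = table_[slot];
        table_[slot] = pos;
        if (candidate == kNoPosition) return 0;
        // Verify the prediction byte by byte; candidate is always below pos,
        // so candidate + length stays inside the data.
        std::size_t length = 0;
        while (pos + length < data.size() &&
               data[candidate + length] == data[pos + length]) {
            ++length;
        }
        return length;
    }

private:
    // Hash the three bytes preceding pos into a 16-bit table index.
    static std::size_t hash_context(const std::string& data, std::size_t pos) {
        const std::uint32_t a = static_cast<unsigned char>(data[pos - 3]);
        const std::uint32_t b = static_cast<unsigned char>(data[pos - 2]);
        const std::uint32_t c = static_cast<unsigned char>(data[pos - 1]);
        const std::uint32_t packed = (a << 16) | (b << 8) | c;
        return (packed * 2654435761u) >> 16;
    }

    std::vector<std::size_t> table_;
};
```

Because only one position is predicted per context, the compressor needs to transmit just a match length rather than an offset, which is the idea that distinguishes LZP from the lz77 style search described for lz77_buffer.
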