author    Daniel Baumann <daniel.baumann@progress-linux.org> 2020-06-28 09:38:47 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2020-06-28 09:38:47 +0000
commit    d1eec8184551651d58eefdea942648f2c8432240
tree      c8089fa3b24adda100afb1294c21f2747d321cb1 /README
parent    Releasing debian version 0.5-1.
Merging upstream version 0.6.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'README')
-rw-r--r-- README 100
1 files changed, 78 insertions, 22 deletions
diff --git a/README b/README
index c5ebbf3..3e26a40 100644
--- a/README
+++ b/README
@@ -1,25 +1,23 @@
Description
-Xlunzip is a test tool for the lzip decompression code of my lzip patch
-for linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress
-linux module as a backend. Xlunzip tests the module for stream,
-buffer-to-buffer and mixed decompression modes, including in-place
-decompression (using the same buffer for input and output). You can use
-xlunzip to verify that the module produces correct results when
-decompressing single member files, multimember files, or the
-concatenation of two or more compressed files. Xlunzip can be used with
-unzcrash to test the robustness of the module to the decompression of
-corrupted data.
-
-Note that the in-place decompression of concatenated files can't be
-guaranteed to work because an arbitrarily low compression ratio of the
-last part of the data can be achieved by appending enough empty
-compressed members to a file, masking a high compression ratio at the
-beginning of the data.
-
-The xlunzip tarball contains a copy of the lzip_decompress module and
-can be compiled and tested without downloading or applying the patch to
-the kernel.
+Xlunzip is a test tool for the lzip decompression code of my lzip patch for
+linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux
+module as a backend. Xlunzip tests the module for stream, buffer-to-buffer,
+and mixed decompression modes, including in-place decompression (using the
+same buffer for input and output). You can use xlunzip to verify that the
+module produces correct results when decompressing single member files,
+multimember files, or the concatenation of two or more compressed files.
+Xlunzip can be used with unzcrash to test the robustness of the module to
+the decompression of corrupted data.
+
+The distributed index feature of the lzip format allows xlunzip to
+decompress concatenated files in place. This can't be guaranteed to work
+with formats like gzip or bzip2 because they can't detect whether a high
+compression ratio in the first members of the multimember data is being
+masked by a low compression ratio in the last members.
+
+The xlunzip tarball contains a copy of the lzip_decompress module and can be
+compiled and tested without downloading or applying the patch to the kernel.
My lzip patch for linux can be found at
http://download.savannah.gnu.org/releases/lzip/kernel/
@@ -29,14 +27,72 @@ Lzip related components in the kernel
The lzip_decompress module in lib/lzip_decompress.c provides a versatile
lzip decompression function able to do buffer to buffer decompression or
-stream decompression with fill and flush callback functions. The usage
-of the function is documented in include/linux/lzip.h.
+stream decompression with fill and flush callback functions. The usage of
+the function is documented in include/linux/lzip.h.
For decompressing the kernel image, initramfs, and initrd, there is a
wrapper function in lib/decompress_lunzip.c providing the same common
interface as the other decompress_*.c files, which is defined in
include/linux/decompress/generic.h.
+Analysis of the in-place decompression
+======================================
+
+In order to decompress the kernel in place (using the same buffer for input
+and output), the compressed data is placed at the end of the buffer used to
+hold the decompressed data. After the decompressed data, the buffer must
+have enough extra space for a marker, a trailer, the maximum possible data
+expansion, and (if the compressed data consists of N > 1 members) N-1
+empty members.
+
+                 |------ compressed data ------|
+                 V                             V
+|----------------|-------------------|---------|
+^                                    ^  extra
+|-------- decompressed data ---------|
+
+The input pointer initially points to the beginning of the compressed data
+and the output pointer initially points to the beginning of the buffer.
+Decompressing compressible data reduces the distance between the pointers,
+while decompressing uncompressible data increases the distance. The extra
+space must be large enough that the output pointer does not overrun the
+input pointer even if all the overlap between compressed and decompressed
+data is uncompressible. The worst case is very compressible data followed by
+uncompressible data because in this case the output pointer increases faster
+when the input pointer is smaller.
+
+        |                        * <-- input pointer
+        |                    *   , <-- output pointer
+        |                * , '
+        |            x '           <-- overrun (x)
+memory  |        * ,'
+address |    *   ,'
+        |*     ,'
+        |    ,'
+        |  ,'
+        |,'
+        `--------------------------
+                    time
+
+All we need to know to calculate the minimum required extra space is:
+  The maximum expansion ratio.
+  The size of the last part of a member required to verify integrity.
+  For multimember data, the overhead per member (36 bytes for lzip).
+
+The maximum expansion ratio of LZMA data is about 1.4%. Rounding this up
+to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space
+required to decompress lzip data in place is:
+  extra_bytes = (compressed_size >> 6) + members * 36
+
+Using the compressed size to calculate extra_bytes (as in the equation
+above) may slightly overestimate the amount of space required in the worst
+case. But calculating extra_bytes from the uncompressed size (as linux
+does) is wrong (and inefficient for high compression ratios). The formula
+used in arch/x86/boot/header.S,
+  extra_bytes = (uncompressed_size >> 8) + 65536
+fails with 1 MB of zeros followed by 8 MB of random data, and wastes memory
+for compression ratios > 4:1.
+
Copyright (C) 2016-2020 Antonio Diaz Diaz.