From b9665c8d391b176a290d827c4802ec3bc50ed970 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Sun, 28 Jun 2020 11:38:41 +0200
Subject: Adding upstream version 0.6.

Signed-off-by: Daniel Baumann
---
 README | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 78 insertions(+), 22 deletions(-)

(limited to 'README')

diff --git a/README b/README
index c5ebbf3..3e26a40 100644
--- a/README
+++ b/README
@@ -1,25 +1,23 @@
 Description
 
-Xlunzip is a test tool for the lzip decompression code of my lzip patch
-for linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress
-linux module as a backend. Xlunzip tests the module for stream,
-buffer-to-buffer and mixed decompression modes, including in-place
-decompression (using the same buffer for input and output). You can use
-xlunzip to verify that the module produces correct results when
-decompressing single member files, multimember files, or the
-concatenation of two or more compressed files. Xlunzip can be used with
-unzcrash to test the robustness of the module to the decompression of
-corrupted data.
-
-Note that the in-place decompression of concatenated files can't be
-guaranteed to work because an arbitrarily low compression ratio of the
-last part of the data can be achieved by appending enough empty
-compressed members to a file, masking a high compression ratio at the
-beginning of the data.
-
-The xlunzip tarball contains a copy of the lzip_decompress module and
-can be compiled and tested without downloading or applying the patch to
-the kernel.
+Xlunzip is a test tool for the lzip decompression code of my lzip patch for
+linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux
+module as a backend. Xlunzip tests the module for stream, buffer-to-buffer,
+and mixed decompression modes, including in-place decompression (using the
+same buffer for input and output). You can use xlunzip to verify that the
+module produces correct results when decompressing single member files,
+multimember files, or the concatenation of two or more compressed files.
+Xlunzip can be used with unzcrash to test the robustness of the module to
+the decompression of corrupted data.
+
+The distributed index feature of the lzip format allows xlunzip to
+decompress concatenated files in place. This can't be guaranteed to work
+with formats like gzip or bzip2 because they can't detect whether a high
+compression ratio in the first members of the multimember data is being
+masked by a low compression ratio in the last members.
+
+The xlunzip tarball contains a copy of the lzip_decompress module and can be
+compiled and tested without downloading or applying the patch to the kernel.
 
 My lzip patch for linux can be found at
 http://download.savannah.gnu.org/releases/lzip/kernel/
@@ -29,14 +27,72 @@ Lzip related components in the kernel
 
 The lzip_decompress module in lib/lzip_decompress.c provides a versatile
 lzip decompression function able to do buffer to buffer decompression or
-stream decompression with fill and flush callback functions. The usage
-of the function is documented in include/linux/lzip.h.
+stream decompression with fill and flush callback functions. The usage of
+the function is documented in include/linux/lzip.h.
 
 For decompressing the kernel image, initramfs, and initrd, there is a
 wrapper function in lib/decompress_lunzip.c providing the same common
 interface as the other decompress_*.c files, which is defined in
 include/linux/decompress/generic.h.
 
+Analysis of the in-place decompression
+======================================
+
+In order to decompress the kernel in place (using the same buffer for input
+and output), the compressed data is placed at the end of the buffer used to
+hold the decompressed data. The buffer must be large enough to contain after
+the decompressed data extra space for a marker, a trailer, the maximum
+possible data expansion, and (if the compressed data consists of more than
+one member) N-1 empty members.
+
+                 |------ compressed data ------|
+                 V                             V
+|----------------|-------------------|---------|
+^                                    ^  extra
+|-------- decompressed data ---------|
+
+The input pointer initially points to the beginning of the compressed data
+and the output pointer initially points to the beginning of the buffer.
+Decompressing compressible data reduces the distance between the pointers,
+while decompressing uncompressible data increases the distance. The extra
+space must be large enough that the output pointer does not overrun the
+input pointer even if all the overlap between compressed and decompressed
+data is uncompressible. The worst case is very compressible data followed by
+uncompressible data because in this case the output pointer increases faster
+when the input pointer is smaller.
+
+        |                        * <-- input pointer
+        |                    * , <-- output pointer
+        |                * , '
+        |            x '  <-- overrun (x)
+memory  |        * ,'
+address |    *   ,'
+        |*     ,'
+        |    ,'
+        |  ,'
+        |,'
+        `--------------------------
+                    time
+
+All we need to know to calculate the minimum required extra space is:
+  The maximum expansion ratio.
+  The size of the last part of a member required to verify integrity.
+  For multimember data, the overhead per member. (36 bytes for lzip).
+
+The maximum expansion ratio of LZMA data is of about 1.4%. Rounding this up
+to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space
+required to decompress lzip data in place is:
+  extra_bytes = ( compressed_size >> 6 ) + members * 36
+
+Using the compressed size to calculate the extra_bytes (as in the equation
+above) may slightly overestimate the amount of space required in the worst
+case. But calculating the extra_bytes from the uncompressed size (as does
+linux) is wrong (and inefficient for high compression ratios). The formula
+used in arch/x86/boot/header.S
+  extra_bytes = (uncompressed_size >> 8) + 65536
+fails with 1 MB of zeros followed by 8 MB of random data, and wastes memory
+for compression ratios > 4:1.
+
 
 Copyright (C) 2016-2020 Antonio Diaz Diaz.
 
-- 
cgit v1.2.3
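
The two extra-space formulas quoted in the README's analysis of in-place
decompression can be compared numerically. The following is a minimal
sketch, not code from xlunzip or the kernel: the function names are
hypothetical, the 1.4% expansion figure is the one stated in the README,
and the compressed size of the 1 MB of zeros is assumed to be negligible.

```python
MIB = 1 << 20

def lzip_extra_bytes(compressed_size, members=1):
    """README formula: 1/64 of the compressed size plus 36 bytes per member."""
    return (compressed_size >> 6) + members * 36

def linux_extra_bytes(uncompressed_size):
    """Formula the README attributes to arch/x86/boot/header.S."""
    return (uncompressed_size >> 8) + 65536

# README counter-example: 1 MB of zeros followed by 8 MB of random data.
linux_extra = linux_extra_bytes(9 * MIB)   # 36864 + 65536

# Assumption: the random part expands by about 1.4% (figure from the README).
expansion = 8 * MIB * 14 // 1000

# The expansion alone exceeds the space reserved by the second formula.
print(linux_extra, expansion, expansion > linux_extra)   # 102400 117440 True

# The proportional terms of both formulas are equal at a 4:1 ratio; above
# that ratio the term based on the uncompressed size is the larger one.
compressed = 1 * MIB
print(((4 * compressed) >> 8) == (compressed >> 6))      # True
```

Under these assumptions the result is consistent with the README's claim
that the formula based on the uncompressed size is too small for this
input, while the 1/64 rule scales with the data that can actually expand.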