Description Xlunzip is a test tool for the lzip decompression code of my lzip patch for linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux module as a backend. Xlunzip tests the module for stream, buffer-to-buffer, and mixed decompression modes, including in-place decompression (using the same buffer for input and output). You can use xlunzip to verify that the module produces correct results when decompressing single member files, multimember files, or the concatenation of two or more compressed files. Xlunzip can be used with unzcrash to test the robustness of the module to the decompression of corrupted data. The distributed index feature of the lzip format allows xlunzip to decompress concatenated files in place. This can't be guaranteed to work with formats like gzip or bzip2 because they can't detect whether a high compression ratio in the first members of the multimember data is being masked by a low compression ratio in the last members. The xlunzip tarball contains a copy of the lzip_decompress module and can be compiled and tested without downloading or applying the patch to the kernel. My lzip patch for linux can be found at http://download.savannah.gnu.org/releases/lzip/kernel/ Lzip related components in the kernel ===================================== The lzip_decompress module in lib/lzip_decompress.c provides a versatile lzip decompression function able to do buffer to buffer decompression or stream decompression with fill and flush callback functions. The usage of the function is documented in include/linux/lzip.h. For decompressing the kernel image, initramfs, and initrd, there is a wrapper function in lib/decompress_lunzip.c providing the same common interface as the other decompress_*.c files, which is defined in include/linux/decompress/generic.h. Analysis of the in-place decompression ====================================== In order to decompress the kernel in place (using the same buffer for input and output), the compressed data is placed at the end of the buffer used to hold the decompressed data. The buffer must be large enough to contain after the decompressed data extra space for a marker, a trailer, the maximum possible data expansion, and (if the compressed data consists of more than one member) N-1 empty members. |------ compressed data ------| V V |----------------|-------------------|---------| ^ ^ extra |-------- decompressed data ---------| The input pointer initially points to the beginning of the compressed data and the output pointer initially points to the beginning of the buffer. Decompressing compressible data reduces the distance between the pointers, while decompressing uncompressible data increases the distance. The extra space must be large enough that the output pointer does not overrun the input pointer even if all the overlap between compressed and decompressed data is uncompressible. The worst case is very compressible data followed by uncompressible data because in this case the output pointer increases faster when the input pointer is smaller. | * <-- input pointer (*) | * , <-- output pointer (,) | * , ' | x ' <-- overrun (x) memory | * ,' address | * ,' |* ,' | ,' | ,' |,' '-------------------------- time All we need to know to calculate the minimum required extra space is: The maximum expansion ratio. The size of the last part of a member required to verify integrity. For multimember data, the overhead per member. (36 bytes for lzip). The maximum expansion ratio of LZMA data is of about 1.4%. Rounding this up to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space required to decompress lzip data in place is: extra_bytes = ( compressed_size >> 6 ) + members * 36 Using the compressed size to calculate the extra_bytes (as in the formula above) may slightly overestimate the amount of space required in the worst case. But calculating the extra_bytes from the uncompressed size (as does linux currently) is wrong (and inefficient for high compression ratios). The formula used in arch/x86/boot/header.S extra_bytes = ( uncompressed_size >> 8 ) + 65536 fails to decompress 1 MB of zeros followed by 8 MB of random data, wastes memory for compression ratios larger than 4:1, and does not even consider multimember data. Copyright (C) 2016-2021 Antonio Diaz Diaz. This file is free documentation: you have unlimited permission to copy, distribute, and modify it. The file Makefile.in is a data file used by configure to produce the Makefile. It has the same copyright owner and permissions that configure itself.