README


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108

Description

Xlunzip is a test tool for the lzip decompression code of my lzip patch for
linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux
module as a backend. Xlunzip tests the module for stream, buffer-to-buffer,
and mixed decompression modes, including in-place decompression (using the
same buffer for input and output). You can use xlunzip to verify that the
module produces correct results when decompressing single member files,
multimember files, or the concatenation of two or more compressed files.
Xlunzip can be used with unzcrash to test the robustness of the module to
the decompression of corrupted data.

The distributed index feature of the lzip format allows xlunzip to
decompress concatenated files in place. This can't be guaranteed to work
with formats like gzip or bzip2 because they can't detect whether a high
compression ratio in the first members of the multimember data is being
masked by a low compression ratio in the last members.

The xlunzip tarball contains a copy of the lzip_decompress module and can be
compiled and tested without downloading or applying the patch to the kernel.

My lzip patch for linux can be found at
http://download.savannah.gnu.org/releases/lzip/kernel/

Lzip related components in the kernel
=====================================

The lzip_decompress module in lib/lzip_decompress.c provides a versatile
lzip decompression function able to do buffer to buffer decompression or
stream decompression with fill and flush callback functions. The usage of
the function is documented in include/linux/lzip.h.

For decompressing the kernel image, initramfs, and initrd, there is a
wrapper function in lib/decompress_lunzip.c providing the same common
interface as the other decompress_*.c files, which is defined in
include/linux/decompress/generic.h.

Analysis of the in-place decompression
======================================

In order to decompress the kernel in place (using the same buffer for input
and output), the compressed data is placed at the end of the buffer used to
hold the decompressed data. The buffer must be large enough to contain after
the decompressed data extra space for a marker, a trailer, the maximum
possible data expansion, and (if the compressed data consists of more than
one member) N-1 empty members.

                 |------ compressed data ------|
                 V                             V
|----------------|-------------------|---------|
^                                    ^  extra
|-------- decompressed data ---------|

The input pointer initially points to the beginning of the compressed data
and the output pointer initially points to the beginning of the buffer.
Decompressing compressible data reduces the distance between the pointers,
while decompressing uncompressible data increases the distance. The extra
space must be large enough that the output pointer does not overrun the
input pointer even if all the overlap between compressed and decompressed
data is uncompressible. The worst case is very compressible data followed by
uncompressible data because in this case the output pointer increases faster
when the input pointer is smaller.

        |                        *   <-- input pointer (*)
        |                    *   ,   <-- output pointer (,)
        |                * ,  '
        |            x  '            <-- overrun (x)
memory  |        * ,'
address |    *   ,'
        |*     ,'
        |    ,'
        |  ,'
        |,'
        '--------------------------
                    time

All we need to know to calculate the minimum required extra space is:
  The maximum expansion ratio.
  The size of the last part of a member required to verify integrity.
  For multimember data, the overhead per member. (36 bytes for lzip).

The maximum expansion ratio of LZMA data is of about 1.4%. Rounding this up
to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space
required to decompress lzip data in place is:

  extra_bytes = ( compressed_size >> 6 ) + members * 36

Using the compressed size to calculate the extra_bytes (as in the formula
above) may slightly overestimate the amount of space required in the worst
case. But calculating the extra_bytes from the uncompressed size (as does
linux currently) is wrong (and inefficient for high compression ratios). The
formula used in arch/x86/boot/header.S

  extra_bytes = ( uncompressed_size >> 8 ) + 65536

fails to decompress 1 MB of zeros followed by 8 MB of random data, wastes
memory for compression ratios larger than 4:1, and does not even consider
multimember data.


Copyright (C) 2016-2021 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute, and modify it.

The file Makefile.in is a data file used by configure to produce the
Makefile. It has the same copyright owner and permissions that configure
itself.