The .lzma File Format
=====================

        0. Preface
           0.1. Notices and Acknowledgements
           0.2. Changes
        1. File Format
           1.1. Header
                1.1.1. Properties
                1.1.2. Dictionary Size
                1.1.3. Uncompressed Size
           1.2. LZMA Compressed Data
        2. References


0. Preface

        This document describes the .lzma file format, which is
        sometimes also called LZMA_Alone format. It is a legacy file
        format, which is being or has been replaced by the .xz format.
        The MIME type of the .lzma format is `application/x-lzma'.

        The most commonly used programs for handling .lzma files are
        LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document
        describes some of the differences between these implementations
        and gives hints about which subset of the .lzma format is the
        most portable.


0.1. Notices and Acknowledgements

        This file format was designed by Igor Pavlov for use in
        LZMA SDK. This document was written by Lasse Collin
        <lasse.collin@tukaani.org> using the documentation found
        in the LZMA SDK.

        This document has been put into the public domain.


0.2. Changes

        Last modified: 2024-04-08 17:35+0300

        From version 2011-04-12 11:55+0300 to 2022-07-13 21:00+0300:
        Section 1.1.3 was modified to allow an End of Payload Marker
        with a known Uncompressed Size.


1. File Format

        +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
        |         Header          |   LZMA Compressed Data   |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+

        A .lzma file consists of a 13-byte Header followed by the
        LZMA Compressed Data.

        Unlike the .gz, .bz2, and .xz formats, it is not possible to
        concatenate multiple .lzma files as is and expect the
        decompression tool to decode the resulting file as if it were
        a single .lzma file.

        For example, the command line tools from LZMA Utils and
        LZMA SDK silently ignore all the data after the first .lzma
        stream. In contrast, the command line tool from XZ Utils
        considers the .lzma file to be corrupt if there is data after
        the first .lzma stream.


1.1. Header

        +------------+----+----+----+----+--+--+--+--+--+--+--+--+
        | Properties |  Dictionary Size  |   Uncompressed Size   |
        +------------+----+----+----+----+--+--+--+--+--+--+--+--+
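
        As an illustration of this layout, the following minimal sketch
        reads the three fields from the first 13 bytes of a file. The
        struct and function names are hypothetical and do not come from
        any of the implementations listed in this document. The
        multi-byte fields are assembled one byte at a time, so the
        result does not depend on the byte order of the host. No
        validation is done here; the limits on each field are described
        in the subsections below.

```c
#include <stdint.h>

/* Hypothetical container for the three Header fields. */
struct dot_lzma_header {
    uint8_t  properties;        /* byte 0: encoded lc/lp/pb */
    uint32_t dict_size;         /* bytes 1-4: little endian */
    uint64_t uncompressed_size; /* bytes 5-12: little endian */
};

/* Hypothetical helper: split the 13-byte Header into its fields.
   Bytes are combined from the most significant (highest address)
   down, which decodes little endian on any host. */
static void parse_dot_lzma_header(const uint8_t buf[13],
                                  struct dot_lzma_header *hdr)
{
    hdr->properties = buf[0];

    hdr->dict_size = 0;
    for (int i = 3; i >= 0; --i)
        hdr->dict_size = (hdr->dict_size << 8) | buf[1 + i];

    hdr->uncompressed_size = 0;
    for (int i = 7; i >= 0; --i)
        hdr->uncompressed_size =
                (hdr->uncompressed_size << 8) | buf[5 + i];
}
```

        For example, Properties 0x5D, a dictionary size of 8 MiB, and
        an unknown Uncompressed Size are stored as the bytes
        5D 00 00 80 00 FF FF FF FF FF FF FF FF.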


1.1.1. Properties

        The Properties field contains three properties. An abbreviation
        is given in parentheses, followed by the value range of the
        property. The field consists of

            1) the number of literal context bits (lc, [0, 8]);
            2) the number of literal position bits (lp, [0, 4]); and
            3) the number of position bits (pb, [0, 4]).

        The properties are encoded using the following formula:

            Properties = (pb * 5 + lp) * 9 + lc

        The following C code illustrates a straightforward way to
        decode the Properties field:

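            #include <stdint.h>

            /* Split the Properties byte into lc, lp, and pb.
               Returns 0 on success or -1 if the byte is invalid. */
            int decode_properties(uint8_t prop,
                                  uint8_t *lc, uint8_t *lp, uint8_t *pb)
            {
                /* The largest valid value is (4 * 5 + 4) * 9 + 8 = 224. */
                if (prop > (4 * 5 + 4) * 9 + 8)
                    return -1;

                *pb = prop / (9 * 5);
                prop -= *pb * 9 * 5;
                *lp = prop / 9;
                *lc = prop - *lp * 9;
                return 0;
            }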

        XZ Utils has an additional requirement: lc + lp <= 4. Files
        which don't follow this requirement cannot be decompressed
        with XZ Utils. Usually this isn't a problem since the most
        common lc/lp/pb values are 3/0/2. This is the only combination
        that LZMA Utils uses when creating files, although LZMA Utils
        can decompress files with any lc/lp/pb.
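
        The encoding formula can also be applied in the other direction.
        The sketch below packs lc, lp, and pb into the Properties byte
        and can optionally enforce the lc + lp <= 4 requirement of
        XZ Utils. The function name and the extra flag are hypothetical
        conveniences for this example, not part of any existing API.

```c
#include <stdbool.h>

/* Hypothetical helper: Properties = (pb * 5 + lp) * 9 + lc.
   Returns the byte value (0 to 224), or -1 if a value is out of
   range or, when xz_utils_compatible is set, if lc + lp > 4. */
static int encode_properties(unsigned lc, unsigned lp, unsigned pb,
                             bool xz_utils_compatible)
{
    /* Value ranges from Section 1.1.1. */
    if (lc > 8 || lp > 4 || pb > 4)
        return -1;

    /* XZ Utils cannot decompress files with lc + lp > 4. */
    if (xz_utils_compatible && lc + lp > 4)
        return -1;

    return (int)((pb * 5 + lp) * 9 + lc);
}
```

        With the common 3/0/2 combination, the result is
        (2 * 5 + 0) * 9 + 3 = 93, that is, 0x5D, so .lzma files created
        with these settings begin with the byte 0x5D.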


1.1.2. Dictionary Size

        Dictionary Size is stored as an unsigned 32-bit little endian
        integer. Any 32-bit value is possible, but for maximum
        portability, only sizes of 2^n and 2^n + 2^(n-1) should be
        used.

        LZMA Utils creates only files with dictionary size 2^n,
        16 <= n <= 25. LZMA Utils can decompress files with any
        dictionary size.

        XZ Utils creates and decompresses .lzma files only with
        dictionary sizes 2^n and 2^n + 2^(n-1). If some other
        dictionary size is specified when compressing, the value
        stored in the Dictionary Size field is rounded up to the next
        such size, but the specified value is still used in the actual
        compression code.
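
        The portable dictionary sizes form the increasing sequence 1, 2,
        3, 4, 6, 8, 12, 16, 24, and so on. The sketch below checks
        whether a size is in that sequence and rounds other sizes up to
        the next value in it, matching the effect of the rounding
        described above. It is not the code used by XZ Utils; a plain
        loop is used for clarity, and the function names are
        hypothetical. Sizes above 0xC000_0000 (3 GiB) have no portable
        value that fits in 32 bits, so the rounding function returns 0
        for them as an assumed error convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if size is 2^n or 2^n + 2^(n-1). */
static bool dict_size_is_portable(uint32_t size)
{
    if (size == 0)
        return false;

    /* Find the highest set bit (2^n) without compiler builtins. */
    uint32_t top = 1;
    while (top <= size / 2)
        top <<= 1;

    /* The remainder must be zero or exactly the next lower bit. */
    uint32_t rest = size - top;
    return rest == 0 || rest == top / 2;
}

/* Hypothetical helper: smallest 2^n or 2^n + 2^(n-1) that is >= size,
   or 0 if no such value fits in 32 bits (size > 0xC0000000). */
static uint32_t dict_size_round_up(uint32_t size)
{
    for (unsigned n = 0; n < 32; ++n) {
        uint32_t pow = UINT32_C(1) << n;   /* 2^n */
        uint32_t mid = pow + pow / 2;      /* 2^n + 2^(n-1); pow when n == 0 */

        if (size <= pow)
            return pow;
        if (size <= mid)
            return mid;
    }

    return 0;
}
```

        For instance, a requested size of 10 MiB rounds up to 12 MiB,
        while 8 MiB and 12 MiB are already portable and stay unchanged.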


1.1.3. Uncompressed Size

        Uncompressed Size is stored as an unsigned 64-bit little endian
        integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
        that Uncompressed Size is unknown. End of Payload Marker (*)
        is required if Uncompressed Size is unknown. End of Payload
        Marker is allowed but rarely used if Uncompressed Size is known.
        XZ Utils 5.2.5 and older don't support .lzma files that have
        an End of Payload Marker together with a known Uncompressed
        Size.

        XZ Utils rejects files whose Uncompressed Size field specifies
        a known size that is 256 GiB or more. This reduces false
        positives when XZ Utils tries to guess whether the input file
        is in the .lzma format, which has no magic bytes to identify it.
        When Uncompressed Size is unknown, there is no limit for the
        uncompressed size of the file.
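
        The rules for the Uncompressed Size field can be summarized as a
        classification of the raw 64-bit value. This is a minimal sketch
        with hypothetical names; the 256 GiB limit reflects the XZ Utils
        behavior described above rather than a limit of the format.

```c
#include <stdint.h>

/* Hypothetical classification of the Uncompressed Size field. */
enum usize_kind {
    USIZE_UNKNOWN,      /* End of Payload Marker is required */
    USIZE_KNOWN,        /* End of Payload Marker allowed but rare */
    USIZE_TOO_LARGE_XZ  /* known size >= 256 GiB: XZ Utils rejects it */
};

static enum usize_kind classify_uncompressed_size(uint64_t field)
{
    /* 0xFFFF_FFFF_FFFF_FFFF means that the size is unknown. */
    if (field == UINT64_MAX)
        return USIZE_UNKNOWN;

    /* 256 GiB = 256 * 2^30 bytes. */
    if (field >= (UINT64_C(256) << 30))
        return USIZE_TOO_LARGE_XZ;

    return USIZE_KNOWN;
}
```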

        (*) Some tools use the term End of Stream (EOS) marker
            instead of End of Payload Marker.


1.2. LZMA Compressed Data

        A detailed description of the format of this field is out of
        the scope of this document.


2. References

        LZMA SDK - The original LZMA implementation
        https://7-zip.org/sdk.html

        7-Zip
        https://7-zip.org/

        LZMA Utils - LZMA adapted to POSIX-like systems
        https://tukaani.org/lzma/

        XZ Utils - The next generation of LZMA Utils
        https://tukaani.org/xz/

        The .xz file format - The successor of the .lzma format
        https://tukaani.org/xz/xz-file-format.txt