author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 17:23:08 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 17:23:08 +0000 |
commit | dd76e45c20acc3f352ffe8257208cc617ba33eba (patch) | |
tree | c50c016a4182a27fd1ece9ec7ba4abf405f19e5f /TECHNICAL-INFO | |
parent | Initial commit. (diff) | |
Adding upstream version 1:4.6.1.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r-- | TECHNICAL-INFO | 289 |
1 files changed, 289 insertions, 0 deletions
diff --git a/TECHNICAL-INFO b/TECHNICAL-INFO
new file mode 100644
index 0000000..a5cc491
--- /dev/null
+++ b/TECHNICAL-INFO
@@ -0,0 +1,289 @@
+
+1. REPRODUCIBLE BUILDS (since version 4.4)
+------------------------------------------
+
+Ever since Mksquashfs was parallelised back in 2006, there
+has been a certain randomness in how fragments and multi-block
+files are ordered in the output filesystem even if the input
+remains the same.
+
+This is because the multiple parallel threads can be scheduled
+differently between Mksquashfs runs. For example, the thread
+given fragment 10 to compress may finish before the thread
+given fragment 9 to compress on one run (writing fragment 10
+to the output filesystem before fragment 9), but, on the next
+run it could be vice-versa. There are many different scheduling
+scenarios here, all of which can have a knock-on effect causing
+different scheduling and ordering later in the filesystem too.
+
+Mksquashfs doesn't care about the ordering of fragments and
+multi-block files within the filesystem, as this does not
+affect the correctness of the filesystem.
+
+In fact not caring about the ordering, as it doesn't matter, allows
+Mksquashfs to run as fast as possible, maximising CPU and I/O
+performance.
+
+But, in the last couple of years, Squashfs has become used in
+scenarios (cloud etc) where this randomness is causing problems.
+Specifically this appears to be where downloaders, installers etc.
+try to work out the differences between Squashfs filesystem
+updates to minimise the amount of data that needs to be transferred
+to update an image.
+
+Additionally, in the last couple of years the notion of
+reproducible builds has arisen, that is the same source and build
+environment etc should be able to (re-)generate identical
+output. This is usually for verification and security, allowing
+binaries/distributions to be checked for malicious activity.
+See https://reproducible-builds.org/ for more information.
+
+Mksquashfs now generates reproducible images by default.
+Images generated by Mksquashfs will be ordered identically to
+previous runs if the same input has been supplied, and the
+same options used.
+
+1.1 Dealing with timestamps
+
+Timestamps embedded in the filesystem will still cause differences.
+Each new run of Mksquashfs will produce a different mkfs (make filesystem)
+timestamp in the super-block. Moreover if any file timestamps have changed
+(even if the content hasn't), this will produce a difference.
+
+To prevent timestamps from producing differences, the following
+new Mksquashfs options have been added.
+
+1.1.1 -mkfs-time <time>
+
+Set mkfs time to <time>. Time can be an integer which is the seconds since
+the epoch (1970-01-01 00:00:00 UTC), or a date string as recognised by the
+"date" command.
+
+1.1.2 -all-time <time>
+
+Set all file timestamps to <time>. Time can be an integer which is the seconds
+since the epoch (1970-01-01 00:00:00 UTC), or a date string as recognised by
+the "date" command.
+
+1.1.3 environment variable SOURCE_DATE_EPOCH
+
+As an alternative to the above command line options, you can
+set the environment variable SOURCE_DATE_EPOCH to a time value.
+
+This value will be used to set the mkfs time. Also any
+file timestamps which are after SOURCE_DATE_EPOCH will be
+clamped to SOURCE_DATE_EPOCH.
+
+See https://reproducible-builds.org/docs/source-date-epoch/
+for more information.
+
+1.1.4 -not-reproducible
+
+This option tells Mksquashfs that the files do not have to be
+strictly ordered. This will make Mksquashfs behave like version 4.3.
+
+
+2. EXTENDED ATTRIBUTES (XATTRS)
+-------------------------------
+
+Squashfs file systems now have extended attribute support. The
+extended attribute implementation has the following features:
+
+1. Layout can store up to 2^48 bytes of compressed xattr data.
+2. Number of xattrs per inode unlimited.
+3. Total size of xattr data per inode 2^48 bytes of compressed data.
+4. Up to 4 Gbytes of data per xattr value.
+5. Inline and out-of-line xattr values supported for higher performance
+   in xattr scanning (listxattr & getxattr), and to allow xattr value
+   de-duplication.
+6. Both whole inode xattr duplicate detection and individual xattr value
+   duplicate detection supported. These can obviously nest: file C's
+   xattrs can be a complete duplicate of file B, and file B's xattrs
+   can be a partial duplicate of file A.
+7. Xattr name prefix types stored, allowing the redundant "user.", "trusted."
+   etc. characters to be eliminated and more concisely stored.
+8. Support for files, directories, symbolic links, device nodes, fifos
+   and sockets.
+
+Extended attribute support is in 2.6.35 and later kernels. File systems
+with extended attributes can be mounted on 2.6.29 and later kernels, but the
+extended attributes will be ignored with a warning.
+
+
+3. FILESYSTEM LAYOUT
+--------------------
+
+A squashfs filesystem consists of a maximum of nine parts, packed together on a
+byte alignment:
+
+    ---------------
+   |  superblock   |
+   |---------------|
+   |  compression  |
+   |    options    |
+   |---------------|
+   |  datablocks   |
+   |  & fragments  |
+   |---------------|
+   |  inode table  |
+   |---------------|
+   |   directory   |
+   |     table     |
+   |---------------|
+   |   fragment    |
+   |     table     |
+   |---------------|
+   |    export     |
+   |     table     |
+   |---------------|
+   |    uid/gid    |
+   |  lookup table |
+   |---------------|
+   |     xattr     |
+   |     table     |
+    ---------------
+
+Compressed data blocks are written to the filesystem as files are read from
+the source directory, and checked for duplicates. Once all file data has been
+written the completed super-block, compression options, inode, directory,
+fragment, export, uid/gid lookup and xattr tables are written.
+
+3.1 Compression options
+-----------------------
+
+Compressors can optionally support compression specific options (e.g.
+dictionary size). If non-default compression options have been used, then
+these are stored here.
+
+3.2 Inodes
+----------
+
+Metadata (inodes and directories) are compressed in 8Kbyte blocks. Each
+compressed block is prefixed by a two byte length, the top bit is set if the
+block is uncompressed. A block will be uncompressed if the -noI option is set,
+or if the compressed block was larger than the uncompressed block.
+
+Inodes are packed into the metadata blocks, and are not aligned to block
+boundaries, therefore inodes overlap compressed blocks. Inodes are identified
+by a 48-bit number which encodes the location of the compressed metadata block
+containing the inode, and the byte offset into that block where the inode is
+placed (<block, offset>).
+
+To maximise compression there are different inodes for each file type
+(regular file, directory, device, etc.), the inode contents and length
+varying with the type.
+
+To further maximise compression, two types of regular file inode and
+directory inode are defined: inodes optimised for frequently occurring
+regular files and directories, and extended types where extra
+information has to be stored.
+
+3.3 Directories
+---------------
+
+Like inodes, directories are packed into compressed metadata blocks, stored
+in a directory table. Directories are accessed using the start address of
+the metablock containing the directory and the offset into the
+decompressed block (<block, offset>).
+
+Directories are organised in a slightly complex way, and are not simply
+a list of file names. The organisation takes advantage of the
+fact that (in most cases) the inodes of the files will be in the same
+compressed metadata block, and therefore, can share the start block.
+Directories are therefore organised in a two level list, a directory
+header containing the shared start block value, and a sequence of directory
+entries, each of which shares the shared start block. A new directory header
+is written once/if the inode start block changes. The directory
+header/directory entry list is repeated as many times as necessary.
+
+Directories are sorted, and can contain a directory index to speed up
+file lookup. Directory indexes store one entry per metablock, each entry
+storing the index/filename mapping to the first directory header
+in each metadata block. Directories are sorted in alphabetical order,
+and at lookup the index is scanned linearly looking for the first filename
+alphabetically larger than the filename being looked up. At this point the
+location of the metadata block the filename is in has been found.
+The general idea of the index is to ensure only one metadata block needs to be
+decompressed to do a lookup irrespective of the length of the directory.
+This scheme has the advantage that it doesn't require extra memory overhead
+and doesn't require much extra storage on disk.
+
+3.4 File data
+-------------
+
+Regular files consist of a sequence of contiguous compressed blocks, and/or a
+compressed fragment block (tail-end packed block). The compressed size
+of each datablock is stored in a block list contained within the
+file inode.
+
+To speed up access to datablocks when reading 'large' files (256 Mbytes or
+larger), the code implements an index cache that caches the mapping from
+block index to datablock location on disk.
+
+The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
+retaining a simple and space-efficient block list on disk. The cache
+is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
+Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
+The index cache is designed to be memory efficient, and by default uses
+16 KiB.
+
+3.5 Fragment lookup table
+-------------------------
+
+Regular files can contain a fragment index which is mapped to a fragment
+location on disk and compressed size using a fragment lookup table. This
+fragment lookup table is itself stored compressed into metadata blocks.
+A second index table is used to locate these. This second index table for
+speed of access (and because it is small) is read at mount time and cached
+in memory.
+
+3.6 Uid/gid lookup table
+------------------------
+
+For space efficiency regular files store uid and gid indexes, which are
+converted to 32-bit uids/gids using an id look up table. This table is
+stored compressed into metadata blocks. A second index table is used to
+locate these. This second index table for speed of access (and because it
+is small) is read at mount time and cached in memory.
+
+3.7 Export table
+----------------
+
+To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems
+can optionally (disabled with the -no-exports Mksquashfs option) contain
+an inode number to inode disk location lookup table. This is required to
+enable Squashfs to map inode numbers passed in filehandles to the inode
+location on disk, which is necessary when the export code reinstantiates
+expired/flushed inodes.
+
+This table is stored compressed into metadata blocks. A second index table is
+used to locate these. This second index table for speed of access (and because
+it is small) is read at mount time and cached in memory.
+
+3.8 Xattr table
+---------------
+
+The xattr table contains extended attributes for each inode. The xattrs
+for each inode are stored in a list, each list entry containing a type,
+name and value field. The type field encodes the xattr prefix
+("user.", "trusted." etc) and it also encodes how the name/value fields
+should be interpreted. Currently the type indicates whether the value
+is stored inline (in which case the value field contains the xattr value),
+or if it is stored out of line (in which case the value field stores a
+reference to where the actual value is stored). This allows large values
+to be stored out of line improving scanning and lookup performance and it
+also allows values to be de-duplicated, the value being stored once, and
+all other occurrences holding an out of line reference to that value.
+
+The xattr lists are packed into compressed 8K metadata blocks.
+To reduce overhead in inodes, rather than storing the on-disk
+location of the xattr list inside each inode, a 32-bit xattr id
+is stored. This xattr id is mapped into the location of the xattr
+list using a second xattr id lookup table.
+
+4. AUTHOR INFO
+--------------
+
+Squashfs was written by Phillip Lougher, email phillip@squashfs.org.uk,
+in Chepstow, Wales, UK. If you like the program, or have any problems,
+then please email me, as it's nice to get feedback!
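
Reading aid for section 3.2 of the added file: the Python sketch below decodes the two-byte metadata block length prefix and splits a 48-bit <block, offset> inode reference. It is an illustration under stated assumptions, not code from squashfs-tools. The top-bit flag comes from the text above. The little-endian byte order and the 32/16-bit split of the reference (block location in the upper bits, offset in the lower 16) are not stated in TECHNICAL-INFO; they are assumptions based on the Squashfs 4.0 on-disk format as handled by the Linux kernel. All function and constant names are hypothetical.

```python
import struct
from typing import Tuple

METADATA_BLOCK_SIZE = 8192   # "8Kbyte blocks" (section 3.2)
UNCOMPRESSED_FLAG = 0x8000   # top bit of the two-byte length prefix
OFFSET_BITS = 16             # assumption: offset occupies the low 16 bits


def parse_metadata_header(raw: bytes) -> Tuple[int, bool]:
    """Return (stored_length, is_compressed) for a metadata block prefix."""
    # Assumption: the prefix is stored little-endian.
    (header,) = struct.unpack("<H", raw[:2])
    stored_length = header & (UNCOMPRESSED_FLAG - 1)
    is_compressed = (header & UNCOMPRESSED_FLAG) == 0
    return stored_length, is_compressed


def make_inode_ref(block: int, offset: int) -> int:
    """Pack a metadata block location and a byte offset into a 48-bit value."""
    if not 0 <= offset < METADATA_BLOCK_SIZE:
        raise ValueError("offset must lie inside an uncompressed 8 KiB block")
    if not 0 <= block < (1 << 32):
        raise ValueError("block location must fit in 32 bits")
    return (block << OFFSET_BITS) | offset


def split_inode_ref(ref: int) -> Tuple[int, int]:
    """Split a 48-bit inode reference into (block, offset)."""
    if not 0 <= ref < (1 << 48):
        raise ValueError("inode reference must fit in 48 bits")
    return ref >> OFFSET_BITS, ref & ((1 << OFFSET_BITS) - 1)
```

For example, a prefix with the top bit set and a length of 8192 describes a block that was stored uncompressed, which the text says happens with -noI or when compression would have made the block larger.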