Diffstat (limited to 'TECHNICAL-INFO')
-rw-r--r-- TECHNICAL-INFO 289
1 files changed, 289 insertions, 0 deletions
diff --git a/TECHNICAL-INFO b/TECHNICAL-INFO
new file mode 100644
index 0000000..a5cc491
--- /dev/null
+++ b/TECHNICAL-INFO
@@ -0,0 +1,289 @@
+
+1. REPRODUCIBLE BUILDS (since version 4.4)
+------------------------------------------
+
+Ever since Mksquashfs was parallelised back in 2006, there
+has been a certain randomness in how fragments and multi-block
+files are ordered in the output filesystem even if the input
+remains the same.
+
+This is because the multiple parallel threads can be scheduled
+differently between Mksquashfs runs. For example, the thread
+given fragment 10 to compress may finish before the thread
+given fragment 9 to compress on one run (writing fragment 10
+to the output filesystem before fragment 9), but on the next
+run it could be the other way round. There are many different scheduling
+scenarios here, all of which can have a knock-on effect, causing
+different scheduling and ordering later in the filesystem too.
+
+Mksquashfs doesn't care about the ordering of fragments and
+multi-block files within the filesystem, as this does not
+affect the correctness of the filesystem.
+
+In fact, because the ordering doesn't matter, not enforcing one allows
+Mksquashfs to run as fast as possible, maximising CPU and I/O
+performance.
+
+But in the last couple of years Squashfs has come to be used in
+scenarios (cloud etc.) where this randomness causes problems.
+Specifically, this appears to be where downloaders, installers etc.
+try to work out the differences between Squashfs filesystem
+updates, to minimise the amount of data that needs to be transferred
+to update an image.
+
+Additionally, the last couple of years have seen the rise of
+reproducible builds: the idea that the same source and build
+environment should be able to (re-)generate identical
+output. This is usually for verification and security, allowing
+binaries/distributions to be checked for malicious modification.
+See https://reproducible-builds.org/ for more information.
+
+Mksquashfs now generates reproducible images by default.
+Images generated by Mksquashfs will be ordered identically to
+previous runs if the same input has been supplied, and the
+same options used.
+
+1.1 Dealing with timestamps
+
+Timestamps embedded in the filesystem will still cause differences.
+Each new run of Mksquashfs will produce a different mkfs (make filesystem)
+timestamp in the super-block. Moreover, if any file timestamps have changed
+(even if the content hasn't), this will produce a difference.
+
+To prevent timestamps from producing differences, the following
+new Mksquashfs options have been added.
+
+1.1.1 -mkfs-time <time>
+
+Set mkfs time to <time>. Time can be an integer (the number of seconds since
+the epoch of 1970-01-01 00:00:00 UTC), or a date string as recognised by the
+"date" command.
+
+1.1.2 -all-time <time>
+
+Set all file timestamps to <time>. Time can be an integer (the number of
+seconds since the epoch of 1970-01-01 00:00:00 UTC), or a date string as
+recognised by the "date" command.
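+
+As an illustration, the two options above might be combined from a build
+script as sketched below. The helper name build_mksquashfs_argv and the
+paths are hypothetical; only the -mkfs-time and -all-time options are
+taken from this document. The sketch only builds the argument list and
+does not run Mksquashfs.
+

```python
def build_mksquashfs_argv(source_dir, image_path, epoch_seconds):
    """Return an argv list that pins the mkfs time and all file times.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    # Use the integer form: seconds since 1970-01-01 00:00:00 UTC.
    stamp = str(int(epoch_seconds))
    return [
        "mksquashfs", source_dir, image_path,
        "-mkfs-time", stamp,   # fixes the super-block timestamp
        "-all-time", stamp,    # fixes every file timestamp
    ]


argv = build_mksquashfs_argv("rootfs", "rootfs.squashfs", 1577836800)
print(" ".join(argv))
```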
+
+1.1.3 environment variable SOURCE_DATE_EPOCH
+
+As an alternative to the above command line options, you can
+set the environment variable SOURCE_DATE_EPOCH to a time value.
+
+This value will be used to set the mkfs time. Also any
+file timestamps which are after SOURCE_DATE_EPOCH will be
+clamped to SOURCE_DATE_EPOCH.
+
+See https://reproducible-builds.org/docs/source-date-epoch/
+for more information.
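+
+The clamping rule can be sketched as follows. This is a minimal model of
+the behaviour described above, not the Mksquashfs source; the function
+name clamp_mtime is hypothetical.
+

```python
import os


def clamp_mtime(mtime, environ=os.environ):
    """Clamp a file timestamp to SOURCE_DATE_EPOCH, if that variable is set.

    Timestamps after SOURCE_DATE_EPOCH are pulled back to it; earlier
    timestamps are left unchanged.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        return mtime
    return min(mtime, int(value))
```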
+
+1.1.4 -not-reproducible
+
+This option tells Mksquashfs that the files do not have to be
+strictly ordered. This will make Mksquashfs behave like version 4.3.
+
+
+2. EXTENDED ATTRIBUTES (XATTRS)
+-------------------------------
+
+Squashfs file systems now have extended attribute support. The
+extended attribute implementation has the following features:
+
+1. Layout can store up to 2^48 bytes of compressed xattr data.
+2. Number of xattrs per inode unlimited.
+3. Total size of xattr data per inode 2^48 bytes of compressed data.
+4. Up to 4 Gbytes of data per xattr value.
+5. Inline and out-of-line xattr values supported for higher performance
+ in xattr scanning (listxattr & getxattr), and to allow xattr value
+ de-duplication.
+6. Both whole inode xattr duplicate detection and individual xattr value
+ duplicate detection supported. These can nest: file C's
+ xattrs can be a complete duplicate of file B, and file B's xattrs
+ can be a partial duplicate of file A.
+7. Xattr name prefix types stored, allowing the redundant "user.", "trusted."
+ etc. characters to be eliminated and more concisely stored.
+8. Support for files, directories, symbolic links, device nodes, fifos
+ and sockets.
+
+Extended attribute support is in 2.6.35 and later kernels. File systems
+with extended attributes can be mounted on 2.6.29 and later kernels, but the
+extended attributes will be ignored with a warning.
+
+
+3. FILESYSTEM LAYOUT
+--------------------
+
+A squashfs filesystem consists of a maximum of nine parts, packed together on a
+byte alignment:
+
+ ---------------
+ | superblock |
+ |---------------|
+ | compression |
+ | options |
+ |---------------|
+ | datablocks |
+ | & fragments |
+ |---------------|
+ | inode table |
+ |---------------|
+ | directory |
+ | table |
+ |---------------|
+ | fragment |
+ | table |
+ |---------------|
+ | export |
+ | table |
+ |---------------|
+ | uid/gid |
+ | lookup table |
+ |---------------|
+ | xattr |
+ | table |
+ ---------------
+
+Compressed data blocks are written to the filesystem as files are read from
+the source directory, and checked for duplicates. Once all file data has been
+written, the completed super-block, compression options, inode, directory,
+fragment, export, uid/gid lookup and xattr tables are written.
+
+3.1 Compression options
+-----------------------
+
+Compressors can optionally support compression specific options (e.g.
+dictionary size). If non-default compression options have been used, then
+these are stored here.
+
+3.2 Inodes
+----------
+
+Metadata (inodes and directories) is compressed in 8 Kbyte blocks. Each
+compressed block is prefixed by a two-byte length; the top bit is set if the
+block is uncompressed. A block will be uncompressed if the -noI option is set,
+or if the compressed block was larger than the uncompressed block.
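+
+Decoding the two-byte prefix can be sketched as below. The little-endian
+byte order is an assumption not stated in this document, and the function
+name parse_metadata_header is hypothetical.
+

```python
import struct

UNCOMPRESSED_BIT = 0x8000  # top bit of the 16-bit length prefix
LENGTH_MASK = 0x7FFF       # remaining 15 bits hold the on-disk length


def parse_metadata_header(raw):
    """Return (length, is_uncompressed) for a metadata block prefix."""
    # Assumption: the prefix is stored little-endian.
    (header,) = struct.unpack("<H", raw[:2])
    return header & LENGTH_MASK, bool(header & UNCOMPRESSED_BIT)
```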
+
+Inodes are packed into the metadata blocks, and are not aligned to block
+boundaries, so an inode can span two compressed blocks. Inodes are identified
+by a 48-bit number which encodes the location of the compressed metadata block
+containing the inode, and the byte offset into that block where the inode is
+placed (<block, offset>).
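+
+One way to pack and unpack such a <block, offset> pair is sketched below.
+The split into a 32-bit block location and a 16-bit offset is an
+assumption for illustration; this document only states that the whole
+number is 48 bits wide. The function names are hypothetical.
+

```python
OFFSET_BITS = 16  # assumption: low 16 bits hold the byte offset
BLOCK_BITS = 32   # assumption: high 32 bits hold the block location


def pack_inode_ref(block, offset):
    """Combine a metadata block location and byte offset into one number."""
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block location out of range")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset out of range")
    return (block << OFFSET_BITS) | offset


def unpack_inode_ref(ref):
    """Split a 48-bit inode reference back into (block, offset)."""
    return ref >> OFFSET_BITS, ref & ((1 << OFFSET_BITS) - 1)
```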
+
+To maximise compression there are different inodes for each file type
+(regular file, directory, device, etc.), the inode contents and length
+varying with the type.
+
+To further maximise compression, two types of regular file inode and
+directory inode are defined: inodes optimised for frequently occurring
+regular files and directories, and extended types where extra
+information has to be stored.
+
+3.3 Directories
+---------------
+
+Like inodes, directories are packed into compressed metadata blocks, stored
+in a directory table. Directories are accessed using the start address of
+the metablock containing the directory and the offset into the
+decompressed block (<block, offset>).
+
+Directories are organised in a slightly complex way, and are not simply
+a list of file names. The organisation takes advantage of the
+fact that (in most cases) the inodes of the files will be in the same
+compressed metadata block, and therefore, can share the start block.
+Directories are therefore organised in a two-level list: a directory
+header containing the shared start block value, and a sequence of directory
+entries, each of which shares that start block. A new directory header
+is written whenever the inode start block changes. The directory
+header/directory entry list is repeated as many times as necessary.
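+
+The two-level grouping can be sketched as follows. This models only the
+rule described above (a new header whenever the start block changes); the
+tuple layout and the function name group_by_start_block are hypothetical
+and do not reflect the on-disk structures.
+

```python
from itertools import groupby


def group_by_start_block(entries):
    """Group sorted directory entries under shared start-block headers.

    entries: list of (name, inode_start_block, inode_offset) tuples,
    already sorted by name. Returns a list of
    (start_block, [(name, inode_offset), ...]) pairs.
    """
    headers = []
    # groupby only merges consecutive items, so a header is emitted each
    # time the start block changes, even if an earlier value reappears.
    for start_block, run in groupby(entries, key=lambda entry: entry[1]):
        headers.append((start_block, [(name, off) for name, _, off in run]))
    return headers
```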
+
+Directories are sorted in alphabetical order, and can contain a directory
+index to speed up file lookup. Directory indexes store one entry per
+metablock, each entry storing the index/filename mapping to the first
+directory header in each metadata block. At lookup the index is scanned
+linearly, looking for the first filename alphabetically larger than the
+filename being looked up. At this point the location of the metadata block
+the filename is in has been found.
+The general idea of the index is to ensure that only one metadata block needs
+to be decompressed to do a lookup, irrespective of the length of the directory.
+This scheme has the advantage that it doesn't require extra memory overhead
+and doesn't require much extra storage on disk.
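+
+The linear index scan can be sketched as below. The index is modelled as
+a list of (first_filename, block_location) pairs, which is a
+simplification for illustration; the function name find_metadata_block is
+hypothetical, and Python string comparison stands in for the filesystem's
+sort order.
+

```python
def find_metadata_block(index, name):
    """Return the location of the metadata block that should hold `name`.

    index: list of (first_filename, block_location) pairs sorted by
    first_filename. Returns None when `name` sorts before every index
    entry, meaning the search starts at the directory's first block.
    """
    location = None
    for first_filename, block_location in index:
        if first_filename > name:
            # First entry larger than the target: the previous entry's
            # block is the one to decompress.
            break
        location = block_location
    return location
```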
+
+3.4 File data
+-------------
+
+Regular files consist of a sequence of contiguous compressed blocks, and/or a
+compressed fragment block (tail-end packed block). The compressed size
+of each datablock is stored in a block list contained within the
+file inode.
+
+To speed up access to datablocks when reading 'large' files (256 Mbytes or
+larger), the code implements an index cache that caches the mapping from
+block index to datablock location on disk.
+
+The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
+retaining a simple and space-efficient block list on disk. The cache
+is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
+Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
+The index cache is designed to be memory efficient, and by default uses
+16 KiB.
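+
+The figures above are consistent with each other, as the following
+arithmetic check shows. The constant names are chosen for illustration
+only.
+

```python
GIB = 1 << 30
TIB = 1 << 40

SLOTS = 8                   # number of cache slots
BYTES_PER_SLOT = 224 * GIB  # file size covered by one slot
BLOCK_SIZE = 128 * 1024     # 128 KiB data blocks

max_file_bytes = SLOTS * BYTES_PER_SLOT
blocks_per_slot = BYTES_PER_SLOT // BLOCK_SIZE

print(max_file_bytes / TIB)  # 1.75
print(blocks_per_slot)       # 1835008
```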
+
+3.5 Fragment lookup table
+-------------------------
+
+Regular files can contain a fragment index which is mapped to a fragment
+location on disk and compressed size using a fragment lookup table. This
+fragment lookup table is itself stored compressed into metadata blocks.
+A second index table is used to locate these. For speed of access (and
+because it is small), this second index table is read at mount time and
+cached in memory.
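+
+The two-level lookup can be sketched as follows. The 16-byte entry size
+is an assumption for illustration (this document does not give it), and
+the function name locate_fragment_entry is hypothetical. The in-memory
+index table is modelled as a list of metadata block locations.
+

```python
METADATA_BLOCK_SIZE = 8192  # uncompressed metadata block size (8 Kbytes)
ENTRY_SIZE = 16             # assumption: bytes per fragment table entry


def locate_fragment_entry(fragment_index, index_table):
    """Map a fragment index to (metadata block location, byte offset).

    index_table: the cached second-level table, one on-disk location per
    metadata block of the fragment lookup table.
    """
    entries_per_block = METADATA_BLOCK_SIZE // ENTRY_SIZE
    block_number = fragment_index // entries_per_block
    offset = (fragment_index % entries_per_block) * ENTRY_SIZE
    return index_table[block_number], offset
```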
+
+3.6 Uid/gid lookup table
+------------------------
+
+For space efficiency regular files store uid and gid indexes, which are
+converted to 32-bit uids/gids using an id lookup table. This table is
+stored compressed into metadata blocks. A second index table is used to
+locate these. For speed of access (and because it is small), this second
+index table is read at mount time and cached in memory.
+
+3.7 Export table
+----------------
+
+To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems
+can optionally (disabled with the -no-exports Mksquashfs option) contain
+an inode number to inode disk location lookup table. This is required to
+enable Squashfs to map inode numbers passed in filehandles to the inode
+location on disk, which is necessary when the export code reinstantiates
+expired/flushed inodes.
+
+This table is stored compressed into metadata blocks. A second index table is
+used to locate these. For speed of access (and because it is small), this
+second index table is read at mount time and cached in memory.
+
+3.8 Xattr table
+---------------
+
+The xattr table contains extended attributes for each inode. The xattrs
+for each inode are stored in a list, each list entry containing a type,
+name and value field. The type field encodes the xattr prefix
+("user.", "trusted." etc) and it also encodes how the name/value fields
+should be interpreted. Currently the type indicates whether the value
+is stored inline (in which case the value field contains the xattr value),
+or if it is stored out of line (in which case the value field stores a
+reference to where the actual value is stored). This allows large values
+to be stored out of line, improving scanning and lookup performance, and it
+also allows values to be de-duplicated, the value being stored once, and
+all other occurrences holding an out of line reference to that value.
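+
+The de-duplication idea can be sketched as below. This is a simplified
+model, not the on-disk format: entries are Python tuples, an out of line
+reference is just the list position of the entry holding the value, and
+both function names are hypothetical.
+

```python
def dedup_xattr_values(values):
    """Store each distinct value once; later copies become references.

    Returns a list of ("inline", value) or ("out_of_line", position)
    entries, where position indexes the entry that holds the value.
    """
    first_seen = {}
    entries = []
    for position, value in enumerate(values):
        if value in first_seen:
            entries.append(("out_of_line", first_seen[value]))
        else:
            first_seen[value] = position
            entries.append(("inline", value))
    return entries


def resolve_value(entries, position):
    """Return the actual value for an entry, following a reference if any."""
    kind, payload = entries[position]
    return payload if kind == "inline" else entries[payload][1]
```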
+
+The xattr lists are packed into compressed 8K metadata blocks.
+To reduce overhead in inodes, rather than storing the on-disk
+location of the xattr list inside each inode, a 32-bit xattr id
+is stored. This xattr id is mapped into the location of the xattr
+list using a second xattr id lookup table.
+
+4. AUTHOR INFO
+--------------
+
+Squashfs was written by Phillip Lougher, email phillip@squashfs.org.uk,
+in Chepstow, Wales, UK. If you like the program, or have any problems,
+then please email me, as it's nice to get feedback!