summaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems/ext4/orphan.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems/ext4/orphan.rst')
-rw-r--r--Documentation/filesystems/ext4/orphan.rst42
1 files changed, 42 insertions, 0 deletions
diff --git a/Documentation/filesystems/ext4/orphan.rst b/Documentation/filesystems/ext4/orphan.rst
new file mode 100644
index 000000000..03cca1788
--- /dev/null
+++ b/Documentation/filesystems/ext4/orphan.rst
@@ -0,0 +1,42 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Orphan file
+-----------
+
+In unix there can inodes that are unlinked from directory hierarchy but that
+are still alive because they are open. In case of crash the filesystem has to
+clean up these inodes as otherwise they (and the blocks referenced from them)
+would leak. Similarly if we truncate or extend the file, we need not be able
+to perform the operation in a single journalling transaction. In such case we
+track the inode as orphan so that in case of crash extra blocks allocated to
+the file get truncated.
+
+Traditionally ext4 tracks orphan inodes in a form of single linked list where
+superblock contains the inode number of the last orphan inode (s_last_orphan
+field) and then each inode contains inode number of the previously orphaned
+inode (we overload i_dtime inode field for this). However this filesystem
+global single linked list is a scalability bottleneck for workloads that result
+in heavy creation of orphan inodes. When orphan file feature
+(COMPAT_ORPHAN_FILE) is enabled, the filesystem has a special inode
+(referenced from the superblock through s_orphan_file_inum) with several
+blocks. Each of these blocks has a structure:
+
+============= ================ =============== ===============================
+Offset Type Name Description
+============= ================ =============== ===============================
+0x0 Array of Orphan inode Each __le32 entry is either
+ __le32 entries entries empty (0) or it contains
+ inode number of an orphan
+ inode.
+blocksize-8 __le32 ob_magic Magic value stored in orphan
+ block tail (0x0b10ca04)
+blocksize-4 __le32 ob_checksum Checksum of the orphan block.
+============= ================ =============== ===============================
+
+When a filesystem with orphan file feature is writeably mounted, we set
+RO_COMPAT_ORPHAN_PRESENT feature in the superblock to indicate there may
+be valid orphan entries. In case we see this feature when mounting the
+filesystem, we read the whole orphan file and process all orphan inodes found
+there as usual. When cleanly unmounting the filesystem we remove the
+RO_COMPAT_ORPHAN_PRESENT feature to avoid unnecessary scanning of the orphan
+file and also make the filesystem fully compatible with older kernels.