summaryrefslogtreecommitdiffstats
path: root/doc/dev/mds_internals
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--doc/dev/mds_internals/data-structures.rst44
-rw-r--r--doc/dev/mds_internals/exports.rst76
-rw-r--r--doc/dev/mds_internals/index.rst10
3 files changed, 130 insertions, 0 deletions
diff --git a/doc/dev/mds_internals/data-structures.rst b/doc/dev/mds_internals/data-structures.rst
new file mode 100644
index 000000000..c77175a16
--- /dev/null
+++ b/doc/dev/mds_internals/data-structures.rst
@@ -0,0 +1,44 @@
+MDS internal data structures
+==============================
+
+*CInode*
+ CInode contains the metadata of a file, there is one CInode for each file.
+ The CInode stores information like who owns the file, how big the file is.
+
+*CDentry*
+ CDentry is the glue that holds inodes and files together by relating inode to
+ file/directory names. A CDentry links to at most one CInode (it may not link
+ to any CInode). A CInode may be linked by multiple CDentries.
+
+*CDir*
+ CDir only exists for directory inode, it's used to link CDentries under the
+ directory. A CInode can have multiple CDir when the directory is fragmented.
+
+These data structures are linked together as::
+
+ CInode
+ CDir
+ | \
+ | \
+ | \
+ CDentry CDentry
+ CInode CInode
+ CDir CDir
+ | | \
+ | | \
+ | | \
+ CDentry CDentry CDentry
+ CInode CInode CInode
+
+As this doc is being written, size of CInode is about 1400 bytes, size of CDentry
+is about 400 bytes, size of CDir is about 700 bytes. These data structures are
+quite large. Please be careful if you want to add new fields to them.
+
+*OpenFileTable*
+ Open file table tracks open files and their ancestor directories. Recovering
+ MDS can easily get open files' paths, significantly reducing the time of
+ loading inodes for open files. Each entry in the table corresponds to an inode,
+ it records linkage information (parent inode and dentry name) of the inode. MDS
+ can constructs the inode's path by recursively lookup parent inode's linkage.
+ Open file table is stored in omap of RADOS objects, table entries correspond to
+ KV pairs in omap.
diff --git a/doc/dev/mds_internals/exports.rst b/doc/dev/mds_internals/exports.rst
new file mode 100644
index 000000000..c5b0e3915
--- /dev/null
+++ b/doc/dev/mds_internals/exports.rst
@@ -0,0 +1,76 @@
+
+===============
+Subtree exports
+===============
+
+Normal Migration
+----------------
+
+The exporter begins by doing some checks in export_dir() to verify
+that it is permissible to export the subtree at this time. In
+particular, the cluster must not be degraded, the subtree root may not
+be freezing or frozen (\ie already exporting, or nested beneath
+something that is exporting), and the path must be pinned (\ie not
+conflicted with a rename). If these conditions are met, the subtree
+freeze is initiated, and the exporter is committed to the subtree
+migration, barring an intervening failure of the importer or itself.
+
+The MExportDirDiscover serves simply to ensure that the base directory
+being exported is open on the destination node. It is pinned by the
+importer to prevent it from being trimmed. This occurs before the
+exporter completes the freeze of the subtree to ensure that the
+importer is able to replicate the necessary metadata. When the
+exporter receives the MExportDirDiscoverAck, it allows the freeze to proceed.
+
+The MExportDirPrep message then follows to populate a spanning tree that
+includes all dirs, inodes, and dentries necessary to reach any nested
+exports within the exported region. This replicates metadata as well,
+but it is pushed out by the exporter, avoiding deadlock with the
+regular discover and replication process. The importer is responsible
+for opening the bounding directories from any third parties before
+acknowledging. This ensures that the importer has correct dir_auth
+information about where authority is delegated for all points nested
+within the subtree being migrated. While processing the MExportDirPrep,
+the importer freezes the entire subtree region to prevent any new
+replication or cache expiration.
+
+The warning stage occurs only if the base subtree directory is open by
+nodes other than the importer and exporter. If so, then a
+MExportDirNotify message informs any bystanders that the authority for
+the region is temporarily ambiguous. In particular, bystanders who
+are trimming items from their cache must send MCacheExpire messages to
+both the old and new authorities. This is necessary to ensure that
+the surviving authority reliably receives all expirations even if the
+importer or exporter fails. While the subtree is frozen (on both the
+importer and exporter), expirations will not be immediately processed;
+instead, they will be queued until the region is unfrozen and it can
+be determined that the node is or is not authoritative for the region.
+
+The MExportDir message sends the actual subtree metadata to the importer.
+Upon receipt, the importer inserts the data into its cache, logs a
+copy in the EImportStart, and replies with an MExportDirAck. The exporter
+can now log an EExport, which ultimately specifies that
+the export was a success. In the presence of failures, it is the
+existence of the EExport that disambiguates authority during recovery.
+
+Once logged, the exporter will send an MExportDirNotify to any
+bystanders, informing them that the authority is no longer ambiguous
+and cache expirations should be sent only to the new authority (the
+importer). Once these are acknowledged, implicitly flushing the
+bystander to exporter message streams of any stray expiration notices,
+the exporter unfreezes the subtree, cleans up its state, and sends a
+final MExportDirFinish to the importer. Upon receipt, the importer logs
+an EImportFinish(true), unfreezes its subtree, and cleans up its
+state.
+
+
+PARTIAL FAILURE RECOVERY
+
+
+
+RECOVERY FROM JOURNAL
+
+
+
+
+
diff --git a/doc/dev/mds_internals/index.rst b/doc/dev/mds_internals/index.rst
new file mode 100644
index 000000000..c8c82ad10
--- /dev/null
+++ b/doc/dev/mds_internals/index.rst
@@ -0,0 +1,10 @@
+==============================
+MDS developer documentation
+==============================
+
+.. rubric:: Contents
+
+.. toctree::
+ :glob:
+
+ *