summaryrefslogtreecommitdiffstats
path: root/doc/dev/cephfs-reclaim.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/dev/cephfs-reclaim.rst')
-rw-r--r--doc/dev/cephfs-reclaim.rst104
1 files changed, 104 insertions, 0 deletions
diff --git a/doc/dev/cephfs-reclaim.rst b/doc/dev/cephfs-reclaim.rst
new file mode 100644
index 000000000..ce56f5eaa
--- /dev/null
+++ b/doc/dev/cephfs-reclaim.rst
@@ -0,0 +1,104 @@
+CephFS Reclaim Interface
+========================
+
+Introduction
+------------
+NFS servers typically do not track ephemeral state on stable storage. If
+the NFS server is restarted, then it will be resurrected with no
+ephemeral state, and the NFS clients are expected to send requests to
+reclaim what state they held during a grace period.
+
+In order to support this use-case, libcephfs has grown several functions
+that allow a client that has been stopped and restarted to destroy or
+reclaim state held by a previous incarnation of itself. This allows the
+client to reacquire state held by its previous incarnation, and to avoid
+the long wait for the old session to time out before releasing the state
+previously held.
+
+As soon as an NFS server running over cephfs goes down, it's racing
+against its MDS session timeout. If the Ceph session times out before
+the NFS grace period is started, then conflicting state could be
+acquired by another client. This mechanism also allows us to increase
+the timeout for these clients, to ensure that the server has a long
+window of time to be restarted.
+
+Setting the UUID
+----------------
+In order to properly reset or reclaim against the old session, we need a
+way to identify the old session. This done by setting a unique opaque
+value on the session using **ceph_set_uuid()**. The uuid value can be
+any string and is treated as opaque by the client.
+
+Setting the uuid directly can only be done on a new session, prior to
+mounting. When reclaim is performed the current session will inherit the
+old session's uuid.
+
+Starting Reclaim
+----------------
+After calling ceph_create and ceph_init on the resulting struct
+ceph_mount_info, the client should then issue ceph_start_reclaim,
+passing in the uuid of the previous incarnation of the client with any
+flags.
+
+CEPH_RECLAIM_RESET
+ This flag indicates that we do not intend to do any sort of reclaim
+ against the old session indicated by the given uuid, and that it
+ should just be discarded. Any state held by the previous client
+ should be released immediately.
+
+Finishing Reclaim
+-----------------
+After the Ceph client has completed all of its reclaim operations, the
+client should issue ceph_finish_reclaim to indicate that the reclaim is
+now complete.
+
+Setting Session Timeout (Optional)
+----------------------------------
+When a client dies and is restarted, and we need to preserve its state,
+we are effectively racing against the session expiration clock. In this
+situation we generally want a longer timeout since we expect to
+eventually kill off the old session manually.
+
+Example 1: Reset Old Session
+----------------------------
+This example just kills off the MDS session held by a previous instance
+of itself. An NFS server can start a grace period and then ask the MDS
+to tear down the old session. This allows clients to start reclaim
+immediately.
+
+(Note: error handling omitted for clarity)
+
+.. code-block:: c
+
+ struct ceph_mount_info *cmount;
+ const char *uuid = "foobarbaz";
+
+ /* Set up a new cephfs session, but don't mount it yet. */
+ rc = ceph_create(&cmount);
+ rc = ceph_init(&cmount);
+
+ /*
+ * Set the timeout to 5 minutes to lengthen the window of time for
+ * the server to restart, should it crash.
+ */
+ ceph_set_session_timeout(cmount, 300);
+
+ /*
+ * Start reclaim vs. session with old uuid. Before calling this,
+ * all NFS servers that could acquire conflicting state _must_ be
+ * enforcing their grace period locally.
+ */
+ rc = ceph_start_reclaim(cmount, uuid, CEPH_RECLAIM_RESET);
+
+ /* Declare reclaim complete */
+ rc = ceph_finish_reclaim(cmount);
+
+ /* Set uuid held by new session */
+ ceph_set_uuid(cmount, nodeid);
+
+ /*
+ * Now mount up the file system and do normal open/lock operations to
+ * satisfy reclaim requests.
+ */
+ ceph_mount(cmount, rootpath);
+ ...