Diffstat (limited to 'doc/cephfs/mds-states.rst')
 -rw-r--r--  doc/cephfs/mds-states.rst | 230
 1 file changed, 230 insertions, 0 deletions
diff --git a/doc/cephfs/mds-states.rst b/doc/cephfs/mds-states.rst
new file mode 100644
index 000000000..c7da639a0
--- /dev/null
+++ b/doc/cephfs/mds-states.rst
@@ -0,0 +1,230 @@

MDS States
==========


The Metadata Server (MDS) goes through several states during normal operation
in CephFS. For example, some states indicate that the MDS is recovering from a
failover by a previous instance of the MDS. Here we'll document all of these
states and include a state diagram to visualize the transitions.

State Descriptions
------------------

Common states
~~~~~~~~~~~~~


::

    up:active

This is the normal operating state of the MDS. It indicates that the MDS and
its rank in the file system are available.


::

    up:standby

The MDS is available to take over for a failed rank (see also
:ref:`mds-standby`). The monitor will automatically assign an MDS in this
state to a failed rank once available.


::

    up:standby_replay

The MDS is following the journal of another ``up:active`` MDS. Should the
active MDS fail, having a standby MDS in replay mode is desirable as it is
replaying the live journal and will take over more quickly. A downside to
having standby-replay MDSs is that they are not available to take over for
any other MDS that fails, only the MDS they follow.
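
In recent Ceph releases, standby-replay can be enabled per file system via the
``allow_standby_replay`` setting. As a brief sketch (assuming a file system
named ``cephfs``):

::

    $ ceph fs set cephfs allow_standby_replay true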


Less common or transitory states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


::

    up:boot

This state is broadcast to the Ceph monitors during startup. This state is
never visible as the monitors immediately assign the MDS to an available rank
or command the MDS to operate as a standby. The state is documented here for
completeness.


::

    up:creating

The MDS is creating a new rank (perhaps rank 0) by constructing some per-rank
metadata (like the journal) and entering the MDS cluster.


::

    up:starting

The MDS is restarting a stopped rank. It opens associated per-rank metadata
and enters the MDS cluster.


::

    up:stopping

When a rank is stopped, the monitors command an active MDS to enter the
``up:stopping`` state. In this state, the MDS accepts no new client
connections, migrates all subtrees to other ranks in the file system, flushes
its metadata journal, and, if it holds the last rank (0), evicts all clients
and shuts down (see also :ref:`cephfs-administration`).


::

    up:replay

The MDS is taking over a failed rank. This state represents that the MDS is
recovering its journal and other metadata.


::

    up:resolve

The MDS enters this state from ``up:replay`` if the Ceph file system has
multiple ranks (including this one), i.e. it's not a single active MDS
cluster. The MDS is resolving any uncommitted inter-MDS operations. All ranks
in the file system must be in this state or later for progress to be made,
i.e. no rank can be failed/damaged or ``up:replay``.


::

    up:reconnect

An MDS enters this state from ``up:replay`` or ``up:resolve``. This state
solicits reconnections from clients. Any client which had a session with this
rank must reconnect during this time, which is configurable via
``mds_reconnect_timeout``.


::

    up:rejoin

The MDS enters this state from ``up:reconnect``. In this state, the MDS is
rejoining the MDS cluster cache. In particular, all inter-MDS locks on
metadata are reestablished.

If there are no known client requests to be replayed, the MDS directly
becomes ``up:active`` from this state.


::

    up:clientreplay

The MDS may enter this state from ``up:rejoin``. The MDS is replaying any
client requests which were replied to but not yet durable (not journaled).
Clients resend these requests during ``up:reconnect`` and the requests are
replayed once again. The MDS enters ``up:active`` after completing replay.


Failed states
~~~~~~~~~~~~~

::

    down:failed

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in      0
    up      {}
    failed  0
    ...

Rank 0 is part of the failed set and is pending takeover by a standby MDS. If
this state persists, it indicates that no suitable MDS daemon was found to be
assigned to this rank. This may be caused by not having enough standby
daemons, or by all standby daemons having incompatible compat (see also
:ref:`upgrade-mds-cluster`).


::

    down:damaged

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in      0
    up      {}
    failed
    damaged 0
    ...

Rank 0 has become damaged (see also :ref:`cephfs-disaster-recovery`) and has
been placed in the ``damaged`` set. An MDS which was running as rank 0 found
metadata damage that could not be automatically recovered. Operator
intervention is required.


::

    down:stopped

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in      0
    up      {}
    failed
    damaged
    stopped 1
    ...

The rank has been stopped by reducing ``max_mds`` (see also
:ref:`cephfs-multimds`).

State Diagram
-------------

This state diagram shows the possible state transitions for the MDS/rank. The
legend is as follows:

Color
~~~~~

- Green: MDS is active.
- Orange: MDS is in a transient state, trying to become active.
- Red: MDS is indicating a state that causes the rank to be marked failed.
- Purple: MDS and rank are stopping.
- Black: MDS is indicating a state that causes the rank to be marked damaged.

Shape
~~~~~

- Circle: an MDS holds this state.
- Hexagon: no MDS holds this state (it is applied to the rank).

Lines
~~~~~

- A double-lined shape indicates the rank is "in".

.. graphviz:: mds-state-diagram.dot
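
The state currently held by each MDS daemon can be read from the MDS map at
any time, e.g. with ``ceph mds stat`` or ``ceph fs dump``. For example (the
output below is illustrative, assuming a single-rank file system named
``cephfs`` with one standby daemon):

::

    $ ceph mds stat
    cephfs:1 {0=a=up:active} 1 up:standby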