MDS States
==========


The Metadata Server (MDS) goes through several states during normal operation
in CephFS. For example, some states indicate that the MDS is recovering after
a failover from a previous instance of the MDS. This document describes each of
these states and includes a state diagram to visualize the transitions.

State Descriptions
------------------

Common states
~~~~~~~~~~~~~~


::

    up:active

This is the normal operating state of the MDS. It indicates that the MDS
and its rank in the file system are available.


::

    up:standby

The MDS is available to take over for a failed rank (see also :ref:`mds-standby`).
The monitors will automatically assign an MDS in this state to a failed rank
once available.


::

    up:standby_replay

The MDS is following the journal of another ``up:active`` MDS. Should the
active MDS fail, having a standby MDS in replay mode is desirable because it is
already replaying the live journal and can take over more quickly. A downside
to having standby replay MDSs is that they are not available to take over for
any other MDS that fails, only the MDS they follow.


Less common or transitory states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


::

    up:boot

This state is broadcast to the Ceph monitors during startup. This state is
never visible because the monitors immediately assign the MDS to an available
rank or command the MDS to operate as a standby. The state is documented here
for completeness.


::

    up:creating

The MDS is creating a new rank (perhaps rank 0) by constructing some per-rank
metadata (like the journal) and entering the MDS cluster.


::

    up:starting

The MDS is restarting a stopped rank. It opens associated per-rank metadata
and enters the MDS cluster.


::

    up:stopping

When a rank is stopped, the monitors command an active MDS to enter the
``up:stopping`` state. In this state, the MDS accepts no new client
connections, migrates all subtrees to other ranks in the file system, flushes
its metadata journal, and, if it holds the last rank (0), evicts all clients
and shuts down (see also :ref:`cephfs-administration`).


::

    up:replay

The MDS is taking over a failed rank. In this state, the MDS is recovering its
journal and other metadata.


::

    up:resolve

The MDS enters this state from ``up:replay`` if the Ceph file system has
multiple ranks (including this one), i.e. it is not a single active MDS
cluster. The MDS is resolving any uncommitted inter-MDS operations. All ranks
in the file system must be in this state or later for progress to be made,
i.e. no rank can be failed, damaged, or still in ``up:replay``.


::

    up:reconnect

An MDS enters this state from ``up:replay`` or ``up:resolve``. In this state,
the MDS solicits reconnections from clients. Any client that had a session
with this rank must reconnect during this period, the length of which is
configurable via ``mds_reconnect_timeout``.


::

    up:rejoin

The MDS enters this state from ``up:reconnect``. In this state, the MDS is
rejoining the MDS cluster cache. In particular, all inter-MDS locks on metadata
are reestablished.

If there are no known client requests to be replayed, the MDS directly becomes
``up:active`` from this state.


::

    up:clientreplay

The MDS may enter this state from ``up:rejoin``. The MDS is replaying any
client requests which were replied to but not yet durable (not journaled).
Clients resend these requests during ``up:reconnect`` and the requests are
replayed once again. The MDS enters ``up:active`` after completing replay.
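
The recovery sequence described in the entries above can be summarized in a
few lines. The following Python sketch is illustrative only: ``recovery_path``
and its two parameters are hypothetical names, not part of Ceph, and the
function encodes only the transitions stated in this document.

```python
def recovery_path(multiple_ranks, client_requests_to_replay):
    """Return the states an MDS passes through when taking over a failed rank.

    Hypothetical helper: it models only the transitions described in this
    document and is not derived from the Ceph source code.
    """
    path = ["up:replay"]
    if multiple_ranks:
        # up:resolve is entered only when the file system has multiple ranks.
        path.append("up:resolve")
    path.extend(["up:reconnect", "up:rejoin"])
    if client_requests_to_replay:
        # Without known client requests to replay, the MDS goes from
        # up:rejoin directly to up:active.
        path.append("up:clientreplay")
    path.append("up:active")
    return path


if __name__ == "__main__":
    print(" -> ".join(recovery_path(multiple_ranks=True,
                                    client_requests_to_replay=True)))
```

For a single-rank file system with no client requests to replay, the function
returns ``up:replay``, ``up:reconnect``, ``up:rejoin``, ``up:active``.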


Failed states
~~~~~~~~~~~~~

::

    down:failed

No MDS actually holds this state. Instead, it is applied to the rank in the file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in      0
    up      {}
    failed  0
    ...

Rank 0 is part of the failed set.


::

    down:damaged

No MDS actually holds this state. Instead, it is applied to the rank in the file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in      0
    up      {}
    failed  
    damaged 0
    ...

Rank 0 has become damaged (see also :ref:`cephfs-disaster-recovery`) and has
been placed in the ``damaged`` set. An MDS which was running as rank 0 found
metadata damage that could not be automatically recovered. Operator
intervention is required.


::

    down:stopped
    
No MDS actually holds this state. Instead, it is applied to the rank in the file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in      0
    up      {}
    failed  
    damaged 
    stopped 1
    ...

The rank has been stopped by reducing ``max_mds`` (see also :ref:`cephfs-multimds`).
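
When scripting around these rank sets, the ``failed``, ``damaged`` and
``stopped`` lines can be pulled out of an excerpt shaped like the
``ceph fs dump`` examples above. The following Python sketch is a minimal
illustration under stated assumptions: ``parse_rank_sets`` is a hypothetical
helper, it assumes each set is printed on one line as a key followed by rank
numbers separated by spaces or commas, and the real output may differ between
Ceph releases.

```python
def parse_rank_sets(dump_text):
    """Extract the failed, damaged and stopped rank sets from a dump excerpt.

    Hypothetical helper: assumes the "key rank[,rank...]" line layout shown
    in the examples in this document.
    """
    sets = {"failed": [], "damaged": [], "stopped": []}
    for line in dump_text.splitlines():
        fields = line.split()
        if fields and fields[0] in sets:
            # Accept ranks separated by whitespace or by commas (assumption).
            sets[fields[0]] = [
                int(rank)
                for token in fields[1:]
                for rank in token.split(",")
                if rank
            ]
    return sets


if __name__ == "__main__":
    excerpt = """\
    max_mds 1
    in      0
    up      {}
    failed
    damaged
    stopped 1
    """
    print(parse_rank_sets(excerpt))
```

An empty list for a key means that no rank is currently in that set.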

State Diagram
-------------

This state diagram shows the possible state transitions for the MDS/rank. The legend is as follows:

Color
~~~~~

- Green: MDS is active.
- Orange: MDS is in a transient state while trying to become active.
- Red: MDS is indicating a state that causes the rank to be marked failed.
- Purple: MDS and rank are stopping.
- Black: MDS is indicating a state that causes the rank to be marked damaged.

Shape
~~~~~

- Circle: an MDS holds this state.
- Hexagon: no MDS holds this state (it is applied to the rank).
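
The distinction behind the two shapes follows the state descriptions earlier
in this document: the ``down:`` states are applied to the rank, while the
``up:`` states are held by an MDS. The following Python sketch expresses that
rule; ``RANK_ONLY_STATES`` and ``held_by_mds`` are hypothetical names used
only for illustration.

```python
# States that this document describes as applied to the rank rather than
# held by an MDS daemon.
RANK_ONLY_STATES = frozenset({"down:failed", "down:damaged", "down:stopped"})


def held_by_mds(state):
    """Return True if an MDS holds the state, False if it applies to the rank.

    Hypothetical helper based solely on the descriptions in this document.
    """
    if state in RANK_ONLY_STATES:
        return False
    if state.startswith("up:"):
        return True
    raise ValueError(f"unknown state: {state!r}")


if __name__ == "__main__":
    for name in ("up:active", "up:standby_replay", "down:damaged"):
        print(name, held_by_mds(name))
```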

Lines
~~~~~

- A double-lined shape indicates the rank is "in".

.. image:: mds-state-diagram.svg