.. index:: control, commands

==================
 Control Commands
==================


Monitor Commands
================

Monitor commands are issued using the ``ceph`` utility:

.. prompt:: bash $

   ceph [-m monhost] {command}

The command is usually (though not always) of the form:

.. prompt:: bash $

   ceph {subsystem} {command}


System Commands
===============

Execute the following to display the current cluster status:

.. prompt:: bash $

   ceph -s
   ceph status

Execute the following to display a running summary of cluster status
and major events:

.. prompt:: bash $

   ceph -w

Execute the following to show the monitor quorum, including which monitors are
participating and which one is the leader:

.. prompt:: bash $

   ceph mon stat
   ceph quorum_status

Execute the following to query the status of a single monitor, including whether
or not it is in the quorum:

.. prompt:: bash $

   ceph tell mon.[id] mon_status

where the value of ``[id]`` can be determined, e.g., from ``ceph -s``.


Authentication Subsystem
========================

To add a keyring for an OSD, execute the following:

.. prompt:: bash $

   ceph auth add {osd} {--in-file|-i} {path-to-osd-keyring}

To list the cluster's keys and their capabilities, execute the following:

.. prompt:: bash $

   ceph auth ls


Placement Group Subsystem
=========================

To display the statistics for all placement groups (PGs), execute the following: 

.. prompt:: bash $

   ceph pg dump [--format {format}]

The valid formats are ``plain`` (default), ``json``, ``json-pretty``, ``xml``, and ``xml-pretty``.
When implementing monitoring and other tools, it is best to use ``json`` format.
JSON parsing is more deterministic than the human-oriented ``plain``, and the layout is much
less variable from release to release.  The ``jq`` utility can be invaluable when extracting
data from JSON output.
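
As a quick sketch of such extraction, the following lists each PG's ID and
state with ``jq``. This assumes a release in which the PG entries appear under
``pg_map.pg_stats`` in the JSON output; adjust the path to match what your
cluster actually emits:

.. prompt:: bash $

   ceph pg dump --format json | jq -r '.pg_map.pg_stats[] | "\(.pgid) \(.state)"'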

To display the statistics for all placement groups stuck in a specified state, 
execute the following: 

.. prompt:: bash $

   ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format {format}] [-t|--threshold {seconds}]


``--format`` may be ``plain`` (default), ``json``, ``json-pretty``, ``xml``, or ``xml-pretty``.

``--threshold`` defines the minimum number of seconds a PG must have been in one of
these states to be considered stuck (default: 300).

**Inactive** Placement groups cannot process reads or writes because they are waiting for an OSD
with the most up-to-date data to come back.

**Unclean** Placement groups contain objects that are not replicated the desired number
of times. They should be recovering.

**Stale** Placement groups are in an unknown state - the OSDs that host them have not
reported to the monitor cluster in a while (configured by
``mon_osd_report_timeout``).
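
For example, the following hypothetical invocation lists PGs that have been
stuck ``inactive`` for at least ten minutes:

.. prompt:: bash $

   ceph pg dump_stuck inactive --threshold 600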

Delete "lost" objects or revert them to their prior state, either a previous version
or delete them if they were just created. :

.. prompt:: bash $

   ceph pg {pgid} mark_unfound_lost revert|delete


.. _osd-subsystem:

OSD Subsystem
=============

Query OSD subsystem status:

.. prompt:: bash $

   ceph osd stat

Write a copy of the most recent OSD map to a file (see
:ref:`osdmaptool <osdmaptool>`):

.. prompt:: bash $

   ceph osd getmap -o file

Write a copy of the CRUSH map from the most recent OSD map to a
file:

.. prompt:: bash $

   ceph osd getcrushmap -o file

The foregoing is functionally equivalent to:

.. prompt:: bash $

   ceph osd getmap -o /tmp/osdmap
   osdmaptool /tmp/osdmap --export-crush file

Dump the OSD map. Valid formats for ``-f`` are ``plain``, ``json``, ``json-pretty``,
``xml``, and ``xml-pretty``. If no ``--format`` option is given, the OSD map is 
dumped as plain text. As above, JSON format is best for tools, scripting, and other automation:

.. prompt:: bash $

   ceph osd dump [--format {format}]

Dump the OSD map as a tree with one line per OSD containing weight
and state:

.. prompt:: bash $

   ceph osd tree [--format {format}]

Find out where a specific object is or would be stored in the system:

.. prompt:: bash $

   ceph osd map <pool-name> <object-name>
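
For example, with a hypothetical pool named ``mypool`` and an object named
``myobject``:

.. prompt:: bash $

   ceph osd map mypool myobject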

Add or move a new item (OSD) with the given id/name/weight at the specified
location:

.. prompt:: bash $

   ceph osd crush set {id} {weight} [{loc1} [{loc2} ...]]
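
For example, to place a hypothetical ``osd.3`` with a CRUSH weight of 1.0 under
``host=node1`` in the ``default`` root (the names and weight here are purely
illustrative):

.. prompt:: bash $

   ceph osd crush set osd.3 1.0 root=default host=node1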

Remove an existing item (OSD) from the CRUSH map:

.. prompt:: bash $

   ceph osd crush remove {name}

Remove an existing bucket from the CRUSH map:

.. prompt:: bash $

   ceph osd crush remove {bucket-name}

Move an existing bucket from one position in the hierarchy to another:

.. prompt:: bash $

   ceph osd crush move {id} {loc1} [{loc2} ...]

Set the weight of the item given by ``{name}`` to ``{weight}``:

.. prompt:: bash $

   ceph osd crush reweight {name} {weight}

Mark an OSD as ``lost``. This may result in permanent data loss. Use with caution:

.. prompt:: bash $

   ceph osd lost {id} [--yes-i-really-mean-it]

Create a new OSD. If no UUID is given, it will be set automatically when the OSD
starts up:

.. prompt:: bash $

   ceph osd create [{uuid}]

Remove the given OSD(s):

.. prompt:: bash $

   ceph osd rm [{id}...]

Query the current ``max_osd`` parameter in the OSD map:

.. prompt:: bash $

   ceph osd getmaxosd

Import the given CRUSH map:

.. prompt:: bash $

   ceph osd setcrushmap -i file

Set the ``max_osd`` parameter in the OSD map. This defaults to 10000 now so
most admins will never need to adjust this:

.. prompt:: bash $

   ceph osd setmaxosd {num}

Mark OSD ``{osd-num}`` down:

.. prompt:: bash $

   ceph osd down {osd-num}

Mark OSD ``{osd-num}`` out of the distribution (i.e. allocated no data):

.. prompt:: bash $

   ceph osd out {osd-num}

Mark ``{osd-num}`` in the distribution (i.e. allocated data):

.. prompt:: bash $

   ceph osd in {osd-num}

Set or clear the pause flags in the OSD map. If set, no IO requests
will be sent to any OSD. Clearing the flags via ``unpause`` results in
resending pending requests:

.. prompt:: bash $

   ceph osd pause
   ceph osd unpause

Set the override weight (reweight) of ``{osd-num}`` to ``{weight}``. Two OSDs with the
same weight will receive roughly the same number of I/O requests and
store approximately the same amount of data. ``ceph osd reweight``
sets an override weight on the OSD. This value is in the range 0 to 1,
and forces CRUSH to re-place (1-weight) of the data that would
otherwise live on this drive. It does not change weights assigned
to the buckets above the OSD in the CRUSH map, and is a corrective
measure in case the normal CRUSH distribution is not working out quite
right. For instance, if one of your OSDs is at 90% utilization and the others
are at 50%, you could reduce this weight to compensate:

.. prompt:: bash $

   ceph osd reweight {osd-num} {weight}
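
For example, to direct roughly 20% of the data away from a hypothetical
over-full ``osd.7``:

.. prompt:: bash $

   ceph osd reweight 7 0.8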

Balance OSD fullness by reducing the override weight of OSDs that are
overly utilized.  Note that these override (``reweight``) values
default to 1.00000 and are relative only to each other; they are not absolute.
It is crucial to distinguish them from CRUSH weights, which reflect the
absolute capacity of a bucket in TiB.  By default this command adjusts the
override weight of OSDs whose utilization is more than 20% above or below the
average, but if you include a ``threshold``, that percentage will be used instead:

.. prompt:: bash $

   ceph osd reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]

To limit the step by which any OSD's reweight will be changed, specify
``max_change``, which defaults to 0.05.  To limit the number of OSDs that will
be adjusted, specify ``max_osds`` as well; the default is 4.  Increasing these
parameters can speed the leveling of OSD utilization, at the potential cost of
greater impact on client operations due to more data moving at once.

To determine which and how many PGs and OSDs will be affected by a given
invocation, you can test it first without applying any changes:

.. prompt:: bash $

   ceph osd test-reweight-by-utilization [threshold [max_change max_osds]] [--no-increasing]

Adding ``--no-increasing`` to either command prevents it from increasing any
override weights that are currently < 1.00000.  This can be useful when
you are balancing in a hurry to remedy ``full`` or ``nearfull`` OSDs, or
when some OSDs are being evacuated or slowly brought into service.
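
A hypothetical sequence might first preview and then apply an adjustment using
a 110% threshold, a maximum change of 0.05, and at most 8 OSDs per run:

.. prompt:: bash $

   ceph osd test-reweight-by-utilization 110 0.05 8
   ceph osd reweight-by-utilization 110 0.05 8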

Deployments utilizing Nautilus (or later revisions of Luminous and Mimic)
that have no pre-Luminous clients may instead wish to enable the
``balancer`` module for ``ceph-mgr``.

Add/remove an IP address or CIDR range to/from the blocklist.
When adding to the blocklist, you can specify how long the entry should be
blocklisted, in seconds; otherwise it will default to 1 hour. A blocklisted
address is prevented from connecting to any OSD. If you blocklist an IP or
range that includes an OSD, be aware that the OSD will also be prevented from
performing operations on its peers where it acts as a client. (This includes
tiering and copy-from functionality.)

If you want to blocklist a range (in CIDR format), you may do so by
including the ``range`` keyword.

These commands are mostly useful for failure testing, as
blocklists are normally maintained automatically and should not need
manual intervention:

.. prompt:: bash $

   ceph osd blocklist ["range"] add ADDRESS[:source_port][/netmask_bits] [TIME]
   ceph osd blocklist ["range"] rm ADDRESS[:source_port][/netmask_bits]

Creates/deletes a snapshot of a pool:

.. prompt:: bash $

   ceph osd pool mksnap {pool-name} {snap-name}
   ceph osd pool rmsnap {pool-name} {snap-name}

Creates/deletes/renames a storage pool:

.. prompt:: bash $

   ceph osd pool create {pool-name} [pg_num [pgp_num]]
   ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
   ceph osd pool rename {old-name} {new-name}
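
For example, to create a hypothetical pool named ``mypool`` with 128 placement
groups and later delete it (note that the pool name must be given twice, along
with the confirmation flag):

.. prompt:: bash $

   ceph osd pool create mypool 128
   ceph osd pool delete mypool mypool --yes-i-really-really-mean-it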

Changes a pool setting:

.. prompt:: bash $

   ceph osd pool set {pool-name} {field} {value}

Valid fields are:

	* ``size``: Sets the number of copies (replicas) of objects in the pool.
	* ``pg_num``: The number of placement groups in the pool.
	* ``pgp_num``: The effective number of placement groups used when calculating placement.
	* ``crush_rule``: The CRUSH rule used for mapping object placement in the cluster.
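
For example, to set a hypothetical pool named ``mypool`` to keep three copies
of each object:

.. prompt:: bash $

   ceph osd pool set mypool size 3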

Get the value of a pool setting:

.. prompt:: bash $

   ceph osd pool get {pool-name} {field}

Valid fields are:

	* ``pg_num``: The number of placement groups in the pool.
	* ``pgp_num``: The effective number of placement groups used when calculating placement.


Sends a scrub command to OSD ``{osd-num}``. To send the command to all OSDs, use ``*``:

.. prompt:: bash $

   ceph osd scrub {osd-num}

Sends a repair command to OSD.N. To send the command to all OSDs, use ``*``:

.. prompt:: bash $

   ceph osd repair N

Runs a simple throughput benchmark against OSD.N, writing ``TOTAL_DATA_BYTES``
in write requests of ``BYTES_PER_WRITE`` each. By default, the test
writes 1 GB in total in 4-MB increments.
The benchmark is non-destructive and will not overwrite existing live
OSD data, but might temporarily affect the performance of clients
concurrently accessing the OSD:

.. prompt:: bash $

   ceph tell osd.N bench [TOTAL_DATA_BYTES] [BYTES_PER_WRITE]
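
For example, to write a total of 100 MB in 4 MB requests against a hypothetical
``osd.0``:

.. prompt:: bash $

   ceph tell osd.0 bench 104857600 4194304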

To clear an OSD's caches between benchmark runs, use the ``cache drop`` command:

.. prompt:: bash $

   ceph tell osd.N cache drop

To get the cache statistics of an OSD, use the ``cache status`` command:

.. prompt:: bash $

   ceph tell osd.N cache status

MDS Subsystem
=============

Change configuration parameters on a running MDS:

.. prompt:: bash $

   ceph tell mds.{mds-id} config set {setting} {value}

For example, to enable debug messages on ``mds.0``:

.. prompt:: bash $

   ceph tell mds.0 config set debug_ms 1

Display the status of all metadata servers:

.. prompt:: bash $

   ceph mds stat

Mark the active MDS as failed, triggering failover to a standby if present:

.. prompt:: bash $

   ceph mds fail 0

.. todo:: ``ceph mds`` subcommands missing docs: set, dump, getmap, stop, setmap


Mon Subsystem
=============

Show monitor stats:

.. prompt:: bash $

   ceph mon stat

::

	e2: 3 mons at {a=127.0.0.1:40000/0,b=127.0.0.1:40001/0,c=127.0.0.1:40002/0}, election epoch 6, quorum 0,1,2 a,b,c


The ``quorum`` list at the end identifies the monitor nodes that are part of the current quorum.

This is also available more directly:

.. prompt:: bash $

   ceph quorum_status -f json-pretty
	
.. code-block:: javascript	

	{
	    "election_epoch": 6,
	    "quorum": [
		0,
		1,
		2
	    ],
	    "quorum_names": [
		"a",
		"b",
		"c"
	    ],
	    "quorum_leader_name": "a",
	    "monmap": {
		"epoch": 2,
		"fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
		"modified": "2016-12-26 14:42:09.288066",
		"created": "2016-12-26 14:42:03.573585",
		"features": {
		    "persistent": [
			"kraken"
		    ],
		    "optional": []
		},
		"mons": [
		    {
			"rank": 0,
			"name": "a",
			"addr": "127.0.0.1:40000\/0",
			"public_addr": "127.0.0.1:40000\/0"
		    },
		    {
			"rank": 1,
			"name": "b",
			"addr": "127.0.0.1:40001\/0",
			"public_addr": "127.0.0.1:40001\/0"
		    },
		    {
			"rank": 2,
			"name": "c",
			"addr": "127.0.0.1:40002\/0",
			"public_addr": "127.0.0.1:40002\/0"
		    }
		]
	    }
	}
	  

The above will block until a quorum is reached.
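
For scripting, a field such as the leader's name can be pulled out with ``jq``
(a sketch based on the field names shown in the sample output above):

.. prompt:: bash $

   ceph quorum_status -f json | jq -r '.quorum_leader_name'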

For a status of just a single monitor:

.. prompt:: bash $

   ceph tell mon.[name] mon_status
	
where the value of ``[name]`` can be taken from ``ceph quorum_status``. Sample
output::
	
	{
	    "name": "b",
	    "rank": 1,
	    "state": "peon",
	    "election_epoch": 6,
	    "quorum": [
		0,
		1,
		2
	    ],
	    "features": {
		"required_con": "9025616074522624",
		"required_mon": [
		    "kraken"
		],
		"quorum_con": "1152921504336314367",
		"quorum_mon": [
		    "kraken"
		]
	    },
	    "outside_quorum": [],
	    "extra_probe_peers": [],
	    "sync_provider": [],
	    "monmap": {
		"epoch": 2,
		"fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
		"modified": "2016-12-26 14:42:09.288066",
		"created": "2016-12-26 14:42:03.573585",
		"features": {
		    "persistent": [
			"kraken"
		    ],
		    "optional": []
		},
		"mons": [
		    {
			"rank": 0,
			"name": "a",
			"addr": "127.0.0.1:40000\/0",
			"public_addr": "127.0.0.1:40000\/0"
		    },
		    {
			"rank": 1,
			"name": "b",
			"addr": "127.0.0.1:40001\/0",
			"public_addr": "127.0.0.1:40001\/0"
		    },
		    {
			"rank": 2,
			"name": "c",
			"addr": "127.0.0.1:40002\/0",
			"public_addr": "127.0.0.1:40002\/0"
		    }
		]
	    }
	}

A dump of the monitor state:

.. prompt:: bash $

   ceph mon dump

::

	dumped monmap epoch 2
	epoch 2
	fsid ba807e74-b64f-4b72-b43f-597dfe60ddbc
	last_changed 2016-12-26 14:42:09.288066
	created 2016-12-26 14:42:03.573585
	0: 127.0.0.1:40000/0 mon.a
	1: 127.0.0.1:40001/0 mon.b
	2: 127.0.0.1:40002/0 mon.c