.. index:: control, commands

==================
 Control Commands
==================


Monitor Commands
================

To issue monitor commands, use the ``ceph`` utility:

.. prompt:: bash $

   ceph [-m monhost] {command}

In most cases, monitor commands have the following form:

.. prompt:: bash $

   ceph {subsystem} {command}


System Commands
===============

To display the current cluster status, run the following commands:

.. prompt:: bash $

   ceph -s
   ceph status

To display a running summary of cluster status and major events, run the
following command:

.. prompt:: bash $

   ceph -w

To display the monitor quorum, including which monitors are participating and
which one is the leader, run the following commands:

.. prompt:: bash $

   ceph mon stat
   ceph quorum_status

To query the status of a single monitor, including whether it is in the quorum,
run the following command:

.. prompt:: bash $

   ceph tell mon.[id] mon_status

Here the value of ``[id]`` can be found by consulting the output of ``ceph
-s``.


Authentication Subsystem
========================

To add an OSD keyring for a specific OSD, run the following command:

.. prompt:: bash $

   ceph auth add {osd} {--in-file|-i} {path-to-osd-keyring}

To list the cluster's keys and their capabilities, run the following command:

.. prompt:: bash $

   ceph auth ls


Placement Group Subsystem
=========================

To display the statistics for all placement groups (PGs), run the following
command:

.. prompt:: bash $

   ceph pg dump [--format {format}]

Here the valid formats are ``plain`` (default), ``json``, ``json-pretty``,
``xml``, and ``xml-pretty``.  When implementing monitoring tools and other
tools, it is best to use the ``json`` format: JSON output is easier to parse
reliably than the ``plain`` format (which is more human-readable), and its
layout is much more consistent from release to release. The ``jq`` utility is
very useful for extracting data from JSON output.
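
For example, the following command extracts the ID and state of each PG from
the JSON dump. This is a minimal sketch: the exact field paths (here
``pg_map.pg_stats``) can vary between releases, and ``jq`` must be installed.

.. prompt:: bash $

   ceph pg dump --format json | jq '.pg_map.pg_stats[] | {pgid, state}'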

To display the statistics for all PGs stuck in a specified state, run the
following command:

.. prompt:: bash $

   ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format {format}] [-t|--threshold {seconds}]

Here ``--format`` may be ``plain`` (default), ``json``, ``json-pretty``,
``xml``, or ``xml-pretty``.

The ``--threshold`` argument determines the time interval (in seconds) for a PG
to be considered ``stuck`` (default: 300).

PGs might be stuck in any of the following states:

**Inactive** 

    PGs are unable to process reads or writes because they are waiting for an
    OSD that has the most up-to-date data to return to an ``up`` state.


**Unclean** 

    PGs contain objects that have not been replicated the desired number of
    times. These PGs have not yet completed the process of recovering.


**Stale** 

    PGs are in an unknown state, because the OSDs that host them have not
    reported to the monitor cluster for a certain period of time (specified by
    the ``mon_osd_report_timeout`` configuration setting).


To mark ``unfound`` (lost) objects as handled, either by reverting each object
to its previous version or by deleting it (because it was just created and has
no previous version), run the following command:

.. prompt:: bash $

   ceph pg {pgid} mark_unfound_lost revert|delete


.. _osd-subsystem:

OSD Subsystem
=============

To query OSD subsystem status, run the following command:

.. prompt:: bash $

   ceph osd stat

To write a copy of the most recent OSD map to a file (see :ref:`osdmaptool
<osdmaptool>`), run the following command:

.. prompt:: bash $

   ceph osd getmap -o file

To write a copy of the CRUSH map from the most recent OSD map to a file, run
the following command:

.. prompt:: bash $

   ceph osd getcrushmap -o file

Note that this command is functionally equivalent to the following two
commands:

.. prompt:: bash $

   ceph osd getmap -o /tmp/osdmap
   osdmaptool /tmp/osdmap --export-crush file

To dump the OSD map, run the following command:

.. prompt:: bash $

   ceph osd dump [--format {format}]

The ``--format`` option accepts the following arguments: ``plain`` (default),
``json``, ``json-pretty``, ``xml``, and ``xml-pretty``. As noted above, JSON is
the recommended format for tools, scripting, and other forms of automation. 

To dump the OSD map as a tree that lists one OSD per line and displays
information about the weights and states of the OSDs, run the following
command:

.. prompt:: bash $

   ceph osd tree [--format {format}]

To find out where a specific RADOS object is stored in the system, run a
command of the following form:

.. prompt:: bash $

   ceph osd map <pool-name> <object-name>
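
For example, for a hypothetical pool named ``mypool`` and object named
``my-object``, you might run the following command:

.. prompt:: bash $

   ceph osd map mypool my-object

The output names the pool, the placement group to which the object maps, and
the ``up`` and ``acting`` OSD sets for that placement group.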

To add a new OSD to a specific CRUSH location, or to move an existing OSD
there, specify the OSD's ID or name together with its CRUSH weight and run the
following command:

.. prompt:: bash $

   ceph osd crush set {id} {weight} [{loc1} [{loc2} ...]]
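
For example, to place ``osd.2`` with a CRUSH weight of 1.0 under a hypothetical
host bucket named ``node2`` in the ``default`` root, you might run:

.. prompt:: bash $

   ceph osd crush set osd.2 1.0 root=default host=node2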

To remove an existing OSD from the CRUSH map, run the following command:

.. prompt:: bash $

   ceph osd crush remove {name}

To remove an existing bucket from the CRUSH map, run the following command:

.. prompt:: bash $

   ceph osd crush remove {bucket-name}

To move an existing bucket from one position in the CRUSH hierarchy to another,
run the following command:

.. prompt:: bash $

   ceph osd crush move {id} {loc1} [{loc2} ...]

To set the CRUSH weight of a specific OSD (specified by ``{name}``) to
``{weight}``, run the following command:

.. prompt:: bash $

   ceph osd crush reweight {name} {weight}

To mark an OSD as ``lost``, run the following command:

.. prompt:: bash $

   ceph osd lost {id} [--yes-i-really-mean-it]

.. warning::
   This could result in permanent data loss. Use with caution!

To create a new OSD, run the following command:

.. prompt:: bash $

   ceph osd create [{uuid}]

If no UUID is given as part of this command, the UUID will be set automatically
when the OSD starts up.

To remove one or more specific OSDs, run the following command:

.. prompt:: bash $

   ceph osd rm [{id}...]

To display the current ``max_osd`` parameter in the OSD map, run the following
command:

.. prompt:: bash $

   ceph osd getmaxosd

To import a specific CRUSH map, run the following command:

.. prompt:: bash $

   ceph osd setcrushmap -i file

To set the ``max_osd`` parameter in the OSD map, run the following command:

.. prompt:: bash $

   ceph osd setmaxosd {count}

The parameter has a default value of 10000. Most operators will never need to
adjust it.

To mark a specific OSD ``down``, run the following command:

.. prompt:: bash $

   ceph osd down {osd-num}

To mark a specific OSD ``out`` (so that no data will be allocated to it), run
the following command:

.. prompt:: bash $

   ceph osd out {osd-num}

To mark a specific OSD ``in`` (so that data will be allocated to it), run the
following command:

.. prompt:: bash $

   ceph osd in {osd-num}

By using the "pause flags" in the OSD map, you can pause or unpause I/O
requests.  If the flags are set, then no I/O requests will be sent to any OSD.
When the flags are cleared, then pending I/O requests will be resent. To set or
clear pause flags, run one of the following commands:

.. prompt:: bash $

   ceph osd pause
   ceph osd unpause

You can assign an override or ``reweight`` weight value to a specific OSD if
the normal CRUSH distribution seems to be suboptimal. The weight of an OSD
helps determine the extent of its I/O requests and data storage: two OSDs with
the same weight will receive approximately the same number of I/O requests and
store approximately the same amount of data. The ``ceph osd reweight`` command
assigns an override weight to an OSD. The weight value is in the range 0 to 1,
and the command forces CRUSH to relocate a certain amount (1 - ``weight``) of
the data that would otherwise be on this OSD. The command does not change the
weights of the buckets above the OSD in the CRUSH map. Using the command is
merely a corrective measure: for example, if one of your OSDs is at 90% and the
others are at 50%, you could reduce the outlier weight to correct this
imbalance. To assign an override weight to a specific OSD, run the following
command:

.. prompt:: bash $

   ceph osd reweight {osd-num} {weight}
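
For example, the following command (which uses a hypothetical OSD ID) causes
CRUSH to move roughly 20% (that is, 1 - 0.8) of the data that would otherwise
be placed on ``osd.7`` to other OSDs:

.. prompt:: bash $

   ceph osd reweight 7 0.8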

.. note:: Any assigned override reweight value will conflict with the balancer.
   This means that if the balancer is in use, all override reweight values
   should be ``1.0000`` in order to avoid suboptimal cluster behavior.

A cluster's OSDs can be reweighted in order to maintain balance if some OSDs
are being disproportionately utilized. Note that override or ``reweight``
weights have values relative to one another that default to 1.00000; their
values are not absolute, and these weights must be distinguished from CRUSH
weights (which reflect the absolute capacity of a bucket, as measured in TiB).
To reweight OSDs by utilization, run the following command:

.. prompt:: bash $

   ceph osd reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]

By default, this command adjusts the override weight of OSDs whose utilization
deviates from the average by more than 20%, but you can specify a different
percentage with the ``threshold`` argument.

To limit the increment by which any OSD's reweight is to be changed, use the
``max_change`` argument (default: 0.05). To limit the number of OSDs that are
to be adjusted, use the ``max_osds`` argument (default: 4). Increasing these
variables can accelerate the reweighting process, but perhaps at the cost of
slower client operations (as a result of the increase in data movement).

You can test the ``osd reweight-by-utilization`` command before running it. To
find out which and how many PGs and OSDs will be affected by a specific use of
the ``osd reweight-by-utilization`` command, run the following command:

.. prompt:: bash $

   ceph osd test-reweight-by-utilization [threshold [max_change max_osds]] [--no-increasing]

The ``--no-increasing`` option can be added to the ``reweight-by-utilization``
and ``test-reweight-by-utilization`` commands in order to prevent any override
weights that are currently less than 1.00000 from being increased. This option
can be useful in certain circumstances: for example, when you are hastily
balancing in order to remedy ``full`` or ``nearfull`` OSDs, or when there are
OSDs being evacuated or slowly brought into service.
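
A cautious workflow is to preview the proposed changes with the dry-run command
and apply them only if they look reasonable. The threshold and limits below are
illustrative:

.. prompt:: bash $

   ceph osd test-reweight-by-utilization 120 0.05 8 --no-increasing
   ceph osd reweight-by-utilization 120 0.05 8 --no-increasing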

Operators of deployments that run Nautilus or newer (or later revisions of
Luminous and Mimic) and that have no pre-Luminous clients will likely prefer
to enable the ``balancer`` module for ``ceph-mgr`` instead.

The blocklist can be modified by adding or removing an IP address or a CIDR
range. If an address is blocklisted, it will be unable to connect to any OSD.
If an OSD is contained within an IP address or CIDR range that has been
blocklisted, the OSD will be unable to perform operations on its peers when it
acts as a client: such blocked operations include tiering and copy-from
functionality. To add or remove an IP address or CIDR range to the blocklist,
run one of the following commands:

.. prompt:: bash $

   ceph osd blocklist ["range"] add ADDRESS[:source_port][/netmask_bits] [TIME]
   ceph osd blocklist ["range"] rm ADDRESS[:source_port][/netmask_bits]

If you add something to the blocklist with the above ``add`` command, you can
use the ``TIME`` keyword to specify the length of time (in seconds) that it
will remain on the blocklist (default: one hour). To add or remove a CIDR
range, use the ``range`` keyword in the above commands.
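
For example, the following commands blocklist a hypothetical client address for
ten minutes and then blocklist an entire CIDR range (both addresses are
placeholders):

.. prompt:: bash $

   ceph osd blocklist add 192.168.1.123 600
   ceph osd blocklist range add 192.168.1.0/24 600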

Note that these commands are useful primarily in failure testing. Under normal
conditions, blocklists are maintained automatically and do not need any manual
intervention.

To create or delete a snapshot of a specific storage pool, run one of the
following commands:

.. prompt:: bash $

   ceph osd pool mksnap {pool-name} {snap-name}
   ceph osd pool rmsnap {pool-name} {snap-name}

To create, delete, or rename a specific storage pool, run one of the following
commands:

.. prompt:: bash $

   ceph osd pool create {pool-name} [pg_num [pgp_num]]
   ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
   ceph osd pool rename {old-name} {new-name}
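
For example, to create a hypothetical pool named ``mypool`` with 128 PGs and
then rename it, you might run:

.. prompt:: bash $

   ceph osd pool create mypool 128
   ceph osd pool rename mypool mynewpool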

To change a pool setting, run the following command: 

.. prompt:: bash $

   ceph osd pool set {pool-name} {field} {value}

The following are valid fields:

    * ``size``: The number of copies of data in the pool.
    * ``pg_num``: The PG number.
    * ``pgp_num``: The effective number of PGs when calculating placement.
    * ``crush_rule``: The rule number for mapping placement.
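
For example, to set the replica count of a hypothetical pool named ``mypool``
to 3, you might run:

.. prompt:: bash $

   ceph osd pool set mypool size 3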

To retrieve the value of a pool setting, run the following command:

.. prompt:: bash $

   ceph osd pool get {pool-name} {field}

Valid fields are:

    * ``pg_num``: The PG number.
    * ``pgp_num``: The effective number of PGs when calculating placement.

To send a scrub command to a specific OSD, or to all OSDs (by using ``*``), run
the following command:

.. prompt:: bash $

   ceph osd scrub {osd-num}

To send a repair command to a specific OSD, or to all OSDs (by using ``*``),
run the following command:

.. prompt:: bash $

   ceph osd repair N

You can run a simple throughput benchmark test against a specific OSD. This
test writes a total size of ``TOTAL_DATA_BYTES`` (default: 1 GB) incrementally,
in multiple write requests that each have a size of ``BYTES_PER_WRITE``
(default: 4 MB). The test is not destructive and it will not overwrite existing
live OSD data, but it might temporarily affect the performance of clients that
are concurrently accessing the OSD. To launch this benchmark test, run the
following command:

.. prompt:: bash $

   ceph tell osd.N bench [TOTAL_DATA_BYTES] [BYTES_PER_WRITE]
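
For example, to write a total of 100 MB in 4 MB requests against a hypothetical
``osd.0``, you might run:

.. prompt:: bash $

   ceph tell osd.0 bench 104857600 4194304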

To clear the caches of a specific OSD during the interval between one benchmark
run and another, run the following command:

.. prompt:: bash $

   ceph tell osd.N cache drop

To retrieve the cache statistics of a specific OSD, run the following command:

.. prompt:: bash $

   ceph tell osd.N cache status

MDS Subsystem
=============

To change the configuration parameters of a running metadata server, run the
following command:

.. prompt:: bash $

   ceph tell mds.{mds-id} config set {setting} {value}

For example, to enable debug messages on the MDS with rank 0, run the following
command:

.. prompt:: bash $

   ceph tell mds.0 config set debug_ms 1

To display the status of all metadata servers, run the following command:

.. prompt:: bash $

   ceph mds stat

To mark the active metadata server as failed (and to trigger failover to a
standby if a standby is present), run the following command:

.. prompt:: bash $

   ceph mds fail 0

.. todo:: ``ceph mds`` subcommands missing docs: set, dump, getmap, stop, setmap


Mon Subsystem
=============

To display monitor statistics, run the following command:

.. prompt:: bash $

   ceph mon stat

This command returns output similar to the following:

::

    e2: 3 mons at {a=127.0.0.1:40000/0,b=127.0.0.1:40001/0,c=127.0.0.1:40002/0}, election epoch 6, quorum 0,1,2 a,b,c

There is a ``quorum`` list at the end of the output. It lists those monitor
nodes that are part of the current quorum.

To retrieve this information in a more direct way, run the following command:

.. prompt:: bash $

   ceph quorum_status -f json-pretty

This command returns output similar to the following:

.. code-block:: javascript    

    {
        "election_epoch": 6,
        "quorum": [
        0,
        1,
        2
        ],
        "quorum_names": [
        "a",
        "b",
        "c"
        ],
        "quorum_leader_name": "a",
        "monmap": {
        "epoch": 2,
        "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
        "modified": "2016-12-26 14:42:09.288066",
        "created": "2016-12-26 14:42:03.573585",
        "features": {
            "persistent": [
            "kraken"
            ],
            "optional": []
        },
        "mons": [
            {
            "rank": 0,
            "name": "a",
            "addr": "127.0.0.1:40000\/0",
            "public_addr": "127.0.0.1:40000\/0"
            },
            {
            "rank": 1,
            "name": "b",
            "addr": "127.0.0.1:40001\/0",
            "public_addr": "127.0.0.1:40001\/0"
            },
            {
            "rank": 2,
            "name": "c",
            "addr": "127.0.0.1:40002\/0",
            "public_addr": "127.0.0.1:40002\/0"
            }
        ]
        }
    }
      

This command will block until a quorum is reached.

To see the status of a specific monitor, run the following command:

.. prompt:: bash $

   ceph tell mon.[name] mon_status

Here the value of ``[name]`` can be found by consulting the output of the
``ceph quorum_status`` command. This command returns output similar to the
following:

::

    {
        "name": "b",
        "rank": 1,
        "state": "peon",
        "election_epoch": 6,
        "quorum": [
        0,
        1,
        2
        ],
        "features": {
        "required_con": "9025616074522624",
        "required_mon": [
            "kraken"
        ],
        "quorum_con": "1152921504336314367",
        "quorum_mon": [
            "kraken"
        ]
        },
        "outside_quorum": [],
        "extra_probe_peers": [],
        "sync_provider": [],
        "monmap": {
        "epoch": 2,
        "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
        "modified": "2016-12-26 14:42:09.288066",
        "created": "2016-12-26 14:42:03.573585",
        "features": {
            "persistent": [
            "kraken"
            ],
            "optional": []
        },
        "mons": [
            {
            "rank": 0,
            "name": "a",
            "addr": "127.0.0.1:40000\/0",
            "public_addr": "127.0.0.1:40000\/0"
            },
            {
            "rank": 1,
            "name": "b",
            "addr": "127.0.0.1:40001\/0",
            "public_addr": "127.0.0.1:40001\/0"
            },
            {
            "rank": 2,
            "name": "c",
            "addr": "127.0.0.1:40002\/0",
            "public_addr": "127.0.0.1:40002\/0"
            }
        ]
        }
    }

To see a dump of the monitor state, run the following command:

.. prompt:: bash $

   ceph mon dump

This command returns output similar to the following:

::

    dumped monmap epoch 2
    epoch 2
    fsid ba807e74-b64f-4b72-b43f-597dfe60ddbc
    last_changed 2016-12-26 14:42:09.288066
    created 2016-12-26 14:42:03.573585
    0: 127.0.0.1:40000/0 mon.a
    1: 127.0.0.1:40001/0 mon.b
    2: 127.0.0.1:40002/0 mon.c