summaryrefslogtreecommitdiffstats
path: root/doc/rados/troubleshooting/troubleshooting-mon.rst
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-21 11:54:28 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-21 11:54:28 +0000
commite6918187568dbd01842d8d1d2c808ce16a894239 (patch)
tree64f88b554b444a49f656b6c656111a145cbbaa28 /doc/rados/troubleshooting/troubleshooting-mon.rst
parentInitial commit. (diff)
downloadceph-upstream/18.2.2.tar.xz
ceph-upstream/18.2.2.zip
Adding upstream version 18.2.2.upstream/18.2.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r--doc/rados/troubleshooting/troubleshooting-mon.rst713
1 files changed, 713 insertions, 0 deletions
diff --git a/doc/rados/troubleshooting/troubleshooting-mon.rst b/doc/rados/troubleshooting/troubleshooting-mon.rst
new file mode 100644
index 000000000..1170da7c3
--- /dev/null
+++ b/doc/rados/troubleshooting/troubleshooting-mon.rst
@@ -0,0 +1,713 @@
+.. _rados-troubleshooting-mon:
+
+==========================
+ Troubleshooting Monitors
+==========================
+
+.. index:: monitor, high availability
+
+Even if a cluster experiences monitor-related problems, the cluster is not
+necessarily in danger of going down. If a cluster has lost multiple monitors,
+it can still remain up and running as long as there are enough surviving
+monitors to form a quorum.
+
+If your cluster is having monitor-related problems, we recommend that you
+consult the following troubleshooting information.
+
+Initial Troubleshooting
+=======================
+
+The first steps in the process of troubleshooting Ceph Monitors involve making
+sure that the Monitors are running and that they are able to communicate with
+the network and on the network. Follow the steps in this section to rule out
+the simplest causes of Monitor malfunction.
+
+#. **Make sure that the Monitors are running.**
+
+ Make sure that the Monitor (*mon*) daemon processes (``ceph-mon``) are
+ running. It might be the case that the mons have not be restarted after an
+ upgrade. Checking for this simple oversight can save hours of painstaking
+ troubleshooting.
+
+ It is also important to make sure that the manager daemons (``ceph-mgr``)
+ are running. Remember that typical cluster configurations provide one
+ Manager (``ceph-mgr``) for each Monitor (``ceph-mon``).
+
+ .. note:: In releases prior to v1.12.5, Rook will not run more than two
+ managers.
+
+#. **Make sure that you can reach the Monitor nodes.**
+
+ In certain rare cases, ``iptables`` rules might be blocking access to
+ Monitor nodes or TCP ports. These rules might be left over from earlier
+ stress testing or rule development. To check for the presence of such
+ rules, SSH into each Monitor node and use ``telnet`` or ``nc`` or a similar
+ tool to attempt to connect to each of the other Monitor nodes on ports
+ ``tcp/3300`` and ``tcp/6789``.
+
+#. **Make sure that the "ceph status" command runs and receives a reply from the cluster.**
+
+ If the ``ceph status`` command receives a reply from the cluster, then the
+ cluster is up and running. Monitors answer to a ``status`` request only if
+ there is a formed quorum. Confirm that one or more ``mgr`` daemons are
+ reported as running. In a cluster with no deficiencies, ``ceph status``
+ will report that all ``mgr`` daemons are running.
+
+ If the ``ceph status`` command does not receive a reply from the cluster,
+ then there are probably not enough Monitors ``up`` to form a quorum. If the
+ ``ceph -s`` command is run with no further options specified, it connects
+ to an arbitrarily selected Monitor. In certain cases, however, it might be
+ helpful to connect to a specific Monitor (or to several specific Monitors
+ in sequence) by adding the ``-m`` flag to the command: for example, ``ceph
+ status -m mymon1``.
+
+#. **None of this worked. What now?**
+
+ If the above solutions have not resolved your problems, you might find it
+ helpful to examine each individual Monitor in turn. Even if no quorum has
+ been formed, it is possible to contact each Monitor individually and
+ request its status by using the ``ceph tell mon.ID mon_status`` command
+ (here ``ID`` is the Monitor's identifier).
+
+ Run the ``ceph tell mon.ID mon_status`` command for each Monitor in the
+ cluster. For more on this command's output, see :ref:`Understanding
+ mon_status
+ <rados_troubleshoting_troubleshooting_mon_understanding_mon_status>`.
+
+ There is also an alternative method for contacting each individual Monitor:
+ SSH into each Monitor node and query the daemon's admin socket. See
+ :ref:`Using the Monitor's Admin
+ Socket<rados_troubleshoting_troubleshooting_mon_using_admin_socket>`.
+
+.. _rados_troubleshoting_troubleshooting_mon_using_admin_socket:
+
+Using the monitor's admin socket
+================================
+
+A monitor's admin socket allows you to interact directly with a specific daemon
+by using a Unix socket file. This file is found in the monitor's ``run``
+directory. The admin socket's default directory is
+``/var/run/ceph/ceph-mon.ID.asok``, but this can be overridden and the admin
+socket might be elsewhere, especially if your cluster's daemons are deployed in
+containers. If you cannot find it, either check your ``ceph.conf`` for an
+alternative path or run the following command:
+
+.. prompt:: bash $
+
+ ceph-conf --name mon.ID --show-config-value admin_socket
+
+The admin socket is available for use only when the monitor daemon is running.
+Whenever the monitor has been properly shut down, the admin socket is removed.
+However, if the monitor is not running and the admin socket persists, it is
+likely that the monitor has been improperly shut down. In any case, if the
+monitor is not running, it will be impossible to use the admin socket, and the
+``ceph`` command is likely to return ``Error 111: Connection Refused``.
+
+To access the admin socket, run a ``ceph tell`` command of the following form
+(specifying the daemon that you are interested in):
+
+.. prompt:: bash $
+
+ ceph tell mon.<id> mon_status
+
+This command passes a ``help`` command to the specific running monitor daemon
+``<id>`` via its admin socket. If you know the full path to the admin socket
+file, this can be done more directly by running the following command:
+
+.. prompt:: bash $
+
+ ceph --admin-daemon <full_path_to_asok_file> <command>
+
+Running ``ceph help`` shows all supported commands that are available through
+the admin socket. See especially ``config get``, ``config show``, ``mon stat``,
+and ``quorum_status``.
+
+.. _rados_troubleshoting_troubleshooting_mon_understanding_mon_status:
+
+Understanding mon_status
+========================
+
+The status of the monitor (as reported by the ``ceph tell mon.X mon_status``
+command) can always be obtained via the admin socket. This command outputs a
+great deal of information about the monitor (including the information found in
+the output of the ``quorum_status`` command).
+
+To understand this command's output, let us consider the following example, in
+which we see the output of ``ceph tell mon.c mon_status``::
+
+ { "name": "c",
+ "rank": 2,
+ "state": "peon",
+ "election_epoch": 38,
+ "quorum": [
+ 1,
+ 2],
+ "outside_quorum": [],
+ "extra_probe_peers": [],
+ "sync_provider": [],
+ "monmap": { "epoch": 3,
+ "fsid": "5c4e9d53-e2e1-478a-8061-f543f8be4cf8",
+ "modified": "2013-10-30 04:12:01.945629",
+ "created": "2013-10-29 14:14:41.914786",
+ "mons": [
+ { "rank": 0,
+ "name": "a",
+ "addr": "127.0.0.1:6789\/0"},
+ { "rank": 1,
+ "name": "b",
+ "addr": "127.0.0.1:6790\/0"},
+ { "rank": 2,
+ "name": "c",
+ "addr": "127.0.0.1:6795\/0"}]}}
+
+It is clear that there are three monitors in the monmap (*a*, *b*, and *c*),
+the quorum is formed by only two monitors, and *c* is in the quorum as a
+*peon*.
+
+**Which monitor is out of the quorum?**
+
+ The answer is **a** (that is, ``mon.a``).
+
+**Why?**
+
+ When the ``quorum`` set is examined, there are clearly two monitors in the
+ set: *1* and *2*. But these are not monitor names. They are monitor ranks, as
+ established in the current ``monmap``. The ``quorum`` set does not include
+ the monitor that has rank 0, and according to the ``monmap`` that monitor is
+ ``mon.a``.
+
+**How are monitor ranks determined?**
+
+ Monitor ranks are calculated (or recalculated) whenever monitors are added or
+ removed. The calculation of ranks follows a simple rule: the **greater** the
+ ``IP:PORT`` combination, the **lower** the rank. In this case, because
+ ``127.0.0.1:6789`` is lower than the other two ``IP:PORT`` combinations,
+ ``mon.a`` has the highest rank: namely, rank 0.
+
+
+Most Common Monitor Issues
+===========================
+
+The Cluster Has Quorum but at Least One Monitor is Down
+-------------------------------------------------------
+
+When the cluster has quorum but at least one monitor is down, ``ceph health
+detail`` returns a message similar to the following::
+
+ $ ceph health detail
+ [snip]
+ mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)
+
+**How do I troubleshoot a Ceph cluster that has quorum but also has at least one monitor down?**
+
+ #. Make sure that ``mon.a`` is running.
+
+ #. Make sure that you can connect to ``mon.a``'s node from the
+ other Monitor nodes. Check the TCP ports as well. Check ``iptables`` and
+ ``nf_conntrack`` on all nodes and make sure that you are not
+ dropping/rejecting connections.
+
+ If this initial troubleshooting doesn't solve your problem, then further
+ investigation is necessary.
+
+ First, check the problematic monitor's ``mon_status`` via the admin
+ socket as explained in `Using the monitor's admin socket`_ and
+ `Understanding mon_status`_.
+
+ If the Monitor is out of the quorum, then its state will be one of the
+ following: ``probing``, ``electing`` or ``synchronizing``. If the state of
+ the Monitor is ``leader`` or ``peon``, then the Monitor believes itself to be
+ in quorum but the rest of the cluster believes that it is not in quorum. It
+ is possible that a Monitor that is in one of the ``probing``, ``electing``,
+ or ``synchronizing`` states has entered the quorum during the process of
+ troubleshooting. Check ``ceph status`` again to determine whether the Monitor
+ has entered quorum during your troubleshooting. If the Monitor remains out of
+ the quorum, then proceed with the investigations described in this section of
+ the documentation.
+
+
+**What does it mean when a Monitor's state is ``probing``?**
+
+ If ``ceph health detail`` shows that a Monitor's state is
+ ``probing``, then the Monitor is still looking for the other Monitors. Every
+ Monitor remains in this state for some time when it is started. When a
+ Monitor has connected to the other Monitors specified in the ``monmap``, it
+ ceases to be in the ``probing`` state. The amount of time that a Monitor is
+ in the ``probing`` state depends upon the parameters of the cluster of which
+ it is a part. For example, when a Monitor is a part of a single-monitor
+ cluster (never do this in production), the monitor passes through the probing
+ state almost instantaneously. In a multi-monitor cluster, the Monitors stay
+ in the ``probing`` state until they find enough monitors to form a quorum
+ |---| this means that if two out of three Monitors in the cluster are
+ ``down``, the one remaining Monitor stays in the ``probing`` state
+ indefinitely until you bring one of the other monitors up.
+
+ If quorum has been established, then the Monitor daemon should be able to
+ find the other Monitors quickly, as long as they can be reached. If a Monitor
+ is stuck in the ``probing`` state and you have exhausted the procedures above
+ that describe the troubleshooting of communications between the Monitors,
+ then it is possible that the problem Monitor is trying to reach the other
+ Monitors at a wrong address. ``mon_status`` outputs the ``monmap`` that is
+ known to the monitor: determine whether the other Monitors' locations as
+ specified in the ``monmap`` match the locations of the Monitors in the
+ network. If they do not, see `Recovering a Monitor's Broken monmap`_.
+ If the locations of the Monitors as specified in the ``monmap`` match the
+ locations of the Monitors in the network, then the persistent
+ ``probing`` state could be related to severe clock skews amongst the monitor
+ nodes. See `Clock Skews`_. If the information in `Clock Skews`_ does not
+ bring the Monitor out of the ``probing`` state, then prepare your system logs
+ and ask the Ceph community for help. See `Preparing your logs`_ for
+ information about the proper preparation of logs.
+
+
+**What does it mean when a Monitor's state is ``electing``?**
+
+ If ``ceph health detail`` shows that a Monitor's state is ``electing``, the
+ monitor is in the middle of an election. Elections typically complete
+ quickly, but sometimes the monitors can get stuck in what is known as an
+ *election storm*. See :ref:`Monitor Elections <dev_mon_elections>` for more
+ on monitor elections.
+
+ The presence of election storm might indicate clock skew among the monitor
+ nodes. See `Clock Skews`_ for more information.
+
+ If your clocks are properly synchronized, search the mailing lists and bug
+ tracker for issues similar to your issue. The ``electing`` state is not
+ likely to persist. In versions of Ceph after the release of Cuttlefish, there
+ is no obvious reason other than clock skew that explains why an ``electing``
+ state would persist.
+
+ It is possible to investigate the cause of a persistent ``electing`` state if
+ you put the problematic Monitor into a ``down`` state while you investigate.
+ This is possible only if there are enough surviving Monitors to form quorum.
+
+**What does it mean when a Monitor's state is ``synchronizing``?**
+
+ If ``ceph health detail`` shows that the Monitor is ``synchronizing``, the
+ monitor is catching up with the rest of the cluster so that it can join the
+ quorum. The amount of time that it takes for the Monitor to synchronize with
+ the rest of the quorum is a function of the size of the cluster's monitor
+ store, the cluster's size, and the state of the cluster. Larger and degraded
+ clusters generally keep Monitors in the ``synchronizing`` state longer than
+ do smaller, new clusters.
+
+ A Monitor that changes its state from ``synchronizing`` to ``electing`` and
+ then back to ``synchronizing`` indicates a problem: the cluster state may be
+ advancing (that is, generating new maps) too fast for the synchronization
+ process to keep up with the pace of the creation of the new maps. This issue
+ presented more frequently prior to the Cuttlefish release than it does in
+ more recent releases, because the synchronization process has since been
+ refactored and enhanced to avoid this dynamic. If you experience this in
+ later versions, report the issue in the `Ceph bug tracker
+ <https://tracker.ceph.com>`_. Prepare and provide logs to substantiate any
+ bug you raise. See `Preparing your logs`_ for information about the proper
+ preparation of logs.
+
+**What does it mean when a Monitor's state is ``leader`` or ``peon``?**
+
+ If ``ceph health detail`` shows that the Monitor is in the ``leader`` state
+ or in the ``peon`` state, it is likely that clock skew is present. Follow the
+ instructions in `Clock Skews`_. If you have followed those instructions and
+ ``ceph health detail`` still shows that the Monitor is in the ``leader``
+ state or the ``peon`` state, report the issue in the `Ceph bug tracker
+ <https://tracker.ceph.com>`_. If you raise an issue, provide logs to
+ substantiate it. See `Preparing your logs`_ for information about the
+ proper preparation of logs.
+
+
+Recovering a Monitor's Broken ``monmap``
+----------------------------------------
+
+This is how a ``monmap`` usually looks, depending on the number of
+monitors::
+
+
+ epoch 3
+ fsid 5c4e9d53-e2e1-478a-8061-f543f8be4cf8
+ last_changed 2013-10-30 04:12:01.945629
+ created 2013-10-29 14:14:41.914786
+ 0: 127.0.0.1:6789/0 mon.a
+ 1: 127.0.0.1:6790/0 mon.b
+ 2: 127.0.0.1:6795/0 mon.c
+
+This may not be what you have however. For instance, in some versions of
+early Cuttlefish there was a bug that could cause your ``monmap``
+to be nullified. Completely filled with zeros. This means that not even
+``monmaptool`` would be able to make sense of cold, hard, inscrutable zeros.
+It's also possible to end up with a monitor with a severely outdated monmap,
+notably if the node has been down for months while you fight with your vendor's
+TAC. The subject ``ceph-mon`` daemon might be unable to find the surviving
+monitors (e.g., say ``mon.c`` is down; you add a new monitor ``mon.d``,
+then remove ``mon.a``, then add a new monitor ``mon.e`` and remove
+``mon.b``; you will end up with a totally different monmap from the one
+``mon.c`` knows).
+
+In this situation you have two possible solutions:
+
+Scrap the monitor and redeploy
+
+ You should only take this route if you are positive that you won't
+ lose the information kept by that monitor; that you have other monitors
+ and that they are running just fine so that your new monitor is able
+ to synchronize from the remaining monitors. Keep in mind that destroying
+ a monitor, if there are no other copies of its contents, may lead to
+ loss of data.
+
+Inject a monmap into the monitor
+
+ These are the basic steps:
+
+ Retrieve the ``monmap`` from the surviving monitors and inject it into the
+ monitor whose ``monmap`` is corrupted or lost.
+
+ Implement this solution by carrying out the following procedure:
+
+ 1. Is there a quorum of monitors? If so, retrieve the ``monmap`` from the
+ quorum::
+
+ $ ceph mon getmap -o /tmp/monmap
+
+ 2. If there is no quorum, then retrieve the ``monmap`` directly from another
+ monitor that has been stopped (in this example, the other monitor has
+ the ID ``ID-FOO``)::
+
+ $ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
+
+ 3. Stop the monitor you are going to inject the monmap into.
+
+ 4. Inject the monmap::
+
+ $ ceph-mon -i ID --inject-monmap /tmp/monmap
+
+ 5. Start the monitor
+
+ .. warning:: Injecting ``monmaps`` can cause serious problems because doing
+ so will overwrite the latest existing ``monmap`` stored on the monitor. Be
+ careful!
+
+Clock Skews
+-----------
+
+The Paxos consensus algorithm requires close time synchroniziation, which means
+that clock skew among the monitors in the quorum can have a serious effect on
+monitor operation. The resulting behavior can be puzzling. To avoid this issue,
+run a clock synchronization tool on your monitor nodes: for example, use
+``Chrony`` or the legacy ``ntpd`` utility. Configure each monitor nodes so that
+the `iburst` option is in effect and so that each monitor has multiple peers,
+including the following:
+
+* Each other
+* Internal ``NTP`` servers
+* Multiple external, public pool servers
+
+.. note:: The ``iburst`` option sends a burst of eight packets instead of the
+ usual single packet, and is used during the process of getting two peers
+ into initial synchronization.
+
+Furthermore, it is advisable to synchronize *all* nodes in your cluster against
+internal and external servers, and perhaps even against your monitors. Run
+``NTP`` servers on bare metal: VM-virtualized clocks are not suitable for
+steady timekeeping. See `https://www.ntp.org <https://www.ntp.org>`_ for more
+information about the Network Time Protocol (NTP). Your organization might
+already have quality internal ``NTP`` servers available. Sources for ``NTP``
+server appliances include the following:
+
+* Microsemi (formerly Symmetricom) `https://microsemi.com <https://www.microsemi.com/product-directory/3425-timing-synchronization>`_
+* EndRun `https://endruntechnologies.com <https://endruntechnologies.com/products/ntp-time-servers>`_
+* Netburner `https://www.netburner.com <https://www.netburner.com/products/network-time-server/pk70-ex-ntp-network-time-server>`_
+
+Clock Skew Questions and Answers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**What's the maximum tolerated clock skew?**
+
+ By default, monitors allow clocks to drift up to a maximum of 0.05 seconds
+ (50 milliseconds).
+
+**Can I increase the maximum tolerated clock skew?**
+
+ Yes, but we strongly recommend against doing so. The maximum tolerated clock
+ skew is configurable via the ``mon-clock-drift-allowed`` option, but it is
+ almost certainly a bad idea to make changes to this option. The clock skew
+ maximum is in place because clock-skewed monitors cannot be relied upon. The
+ current default value has proven its worth at alerting the user before the
+ monitors encounter serious problems. Changing this value might cause
+ unforeseen effects on the stability of the monitors and overall cluster
+ health.
+
+**How do I know whether there is a clock skew?**
+
+ The monitors will warn you via the cluster status ``HEALTH_WARN``. When clock
+ skew is present, the ``ceph health detail`` and ``ceph status`` commands
+ return an output resembling the following::
+
+ mon.c addr 10.10.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)
+
+ In this example, the monitor ``mon.c`` has been flagged as suffering from
+ clock skew.
+
+ In Luminous and later releases, it is possible to check for a clock skew by
+ running the ``ceph time-sync-status`` command. Note that the lead monitor
+ typically has the numerically lowest IP address. It will always show ``0``:
+ the reported offsets of other monitors are relative to the lead monitor, not
+ to any external reference source.
+
+**What should I do if there is a clock skew?**
+
+ Synchronize your clocks. Using an NTP client might help. However, if you
+ are already using an NTP client and you still encounter clock skew problems,
+ determine whether the NTP server that you are using is remote to your network
+ or instead hosted on your network. Hosting your own NTP servers tends to
+ mitigate clock skew problems.
+
+
+Client Can't Connect or Mount
+-----------------------------
+
+Check your IP tables. Some operating-system install utilities add a ``REJECT``
+rule to ``iptables``. ``iptables`` rules will reject all clients other than
+``ssh`` that try to connect to the host. If your monitor host's IP tables have
+a ``REJECT`` rule in place, clients that are connecting from a separate node
+will fail and will raise a timeout error. Any ``iptables`` rules that reject
+clients trying to connect to Ceph daemons must be addressed. For example::
+
+ REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
+
+It might also be necessary to add rules to iptables on your Ceph hosts to
+ensure that clients are able to access the TCP ports associated with your Ceph
+monitors (default: port 6789) and Ceph OSDs (default: 6800 through 7300). For
+example::
+
+ iptables -A INPUT -m multiport -p tcp -s {ip-address}/{netmask} --dports 6789,6800:7300 -j ACCEPT
+
+
+Monitor Store Failures
+======================
+
+Symptoms of store corruption
+----------------------------
+
+Ceph monitors store the :term:`Cluster Map` in a key-value store. If key-value
+store corruption causes a monitor to fail, then the monitor log might contain
+one of the following error messages::
+
+ Corruption: error in middle of record
+
+or::
+
+ Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.foo/store.db/1234567.ldb
+
+Recovery using healthy monitor(s)
+---------------------------------
+
+If there are surviving monitors, we can always :ref:`replace
+<adding-and-removing-monitors>` the corrupted monitor with a new one. After the
+new monitor boots, it will synchronize with a healthy peer. After the new
+monitor is fully synchronized, it will be able to serve clients.
+
+.. _mon-store-recovery-using-osds:
+
+Recovery using OSDs
+-------------------
+
+Even if all monitors fail at the same time, it is possible to recover the
+monitor store by using information stored in OSDs. You are encouraged to deploy
+at least three (and preferably five) monitors in a Ceph cluster. In such a
+deployment, complete monitor failure is unlikely. However, unplanned power loss
+in a data center whose disk settings or filesystem settings are improperly
+configured could cause the underlying filesystem to fail and this could kill
+all of the monitors. In such a case, data in the OSDs can be used to recover
+the monitors. The following is such a script and can be used to recover the
+monitors:
+
+
+.. code-block:: bash
+
+ ms=/root/mon-store
+ mkdir $ms
+
+ # collect the cluster map from stopped OSDs
+ for host in $hosts; do
+ rsync -avz $ms/. user@$host:$ms.remote
+ rm -rf $ms
+ ssh user@$host <<EOF
+ for osd in /var/lib/ceph/osd/ceph-*; do
+ ceph-objectstore-tool --data-path \$osd --no-mon-config --op update-mon-db --mon-store-path $ms.remote
+ done
+ EOF
+ rsync -avz user@$host:$ms.remote/. $ms
+ done
+
+ # rebuild the monitor store from the collected map, if the cluster does not
+ # use cephx authentication, we can skip the following steps to update the
+ # keyring with the caps, and there is no need to pass the "--keyring" option.
+ # i.e. just use "ceph-monstore-tool $ms rebuild" instead
+ ceph-authtool /path/to/admin.keyring -n mon. \
+ --cap mon 'allow *'
+ ceph-authtool /path/to/admin.keyring -n client.admin \
+ --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
+ # add one or more ceph-mgr's key to the keyring. in this case, an encoded key
+ # for mgr.x is added, you can find the encoded key in
+ # /etc/ceph/${cluster}.${mgr_name}.keyring on the machine where ceph-mgr is
+ # deployed
+ ceph-authtool /path/to/admin.keyring --add-key 'AQDN8kBe9PLWARAAZwxXMr+n85SBYbSlLcZnMA==' -n mgr.x \
+ --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
+ # If your monitors' ids are not sorted by ip address, please specify them in order.
+ # For example. if mon 'a' is 10.0.0.3, mon 'b' is 10.0.0.2, and mon 'c' is 10.0.0.4,
+ # please passing "--mon-ids b a c".
+ # In addition, if your monitors' ids are not single characters like 'a', 'b', 'c', please
+ # specify them in the command line by passing them as arguments of the "--mon-ids"
+ # option. if you are not sure, please check your ceph.conf to see if there is any
+ # sections named like '[mon.foo]'. don't pass the "--mon-ids" option, if you are
+ # using DNS SRV for looking up monitors.
+ ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring --mon-ids alpha beta gamma
+
+ # make a backup of the corrupted store.db just in case! repeat for
+ # all monitors.
+ mv /var/lib/ceph/mon/mon.foo/store.db /var/lib/ceph/mon/mon.foo/store.db.corrupted
+
+ # move rebuild store.db into place. repeat for all monitors.
+ mv $ms/store.db /var/lib/ceph/mon/mon.foo/store.db
+ chown -R ceph:ceph /var/lib/ceph/mon/mon.foo/store.db
+
+This script performs the following steps:
+
+#. Collects the map from each OSD host.
+#. Rebuilds the store.
+#. Fills the entities in the keyring file with appropriate capabilities.
+#. Replaces the corrupted store on ``mon.foo`` with the recovered copy.
+
+
+Known limitations
+~~~~~~~~~~~~~~~~~
+
+The above recovery tool is unable to recover the following information:
+
+- **Certain added keyrings**: All of the OSD keyrings added using the ``ceph
+ auth add`` command are recovered from the OSD's copy, and the
+ ``client.admin`` keyring is imported using ``ceph-monstore-tool``. However,
+ the MDS keyrings and all other keyrings will be missing in the recovered
+ monitor store. You might need to manually re-add them.
+
+- **Creating pools**: If any RADOS pools were in the process of being created,
+ that state is lost. The recovery tool operates on the assumption that all
+ pools have already been created. If there are PGs that are stuck in the
+ 'unknown' state after the recovery for a partially created pool, you can
+ force creation of the *empty* PG by running the ``ceph osd force-create-pg``
+ command. Note that this will create an *empty* PG, so take this action only
+ if you know the pool is empty.
+
+- **MDS Maps**: The MDS maps are lost.
+
+
+Everything Failed! Now What?
+============================
+
+Reaching out for help
+---------------------
+
+You can find help on IRC in #ceph and #ceph-devel on OFTC (server
+irc.oftc.net), or at ``dev@ceph.io`` and ``ceph-users@lists.ceph.com``. Make
+sure that you have prepared your logs and that you have them ready upon
+request.
+
+See https://ceph.io/en/community/connect/ for current (as of October 2023)
+information on getting in contact with the upstream Ceph community.
+
+
+Preparing your logs
+-------------------
+
+The default location for monitor logs is ``/var/log/ceph/ceph-mon.FOO.log*``.
+However, if they are not there, you can find their current location by running
+the following command:
+
+.. prompt:: bash
+
+ ceph-conf --name mon.FOO --show-config-value log_file
+
+The amount of information in the logs is determined by the debug levels in the
+cluster's configuration files. If Ceph is using the default debug levels, then
+your logs might be missing important information that would help the upstream
+Ceph community address your issue.
+
+To make sure your monitor logs contain relevant information, you can raise
+debug levels. Here we are interested in information from the monitors. As with
+other components, the monitors have different parts that output their debug
+information on different subsystems.
+
+If you are an experienced Ceph troubleshooter, we recommend raising the debug
+levels of the most relevant subsystems. Of course, this approach might not be
+easy for beginners. In most cases, however, enough information to address the
+issue will be secured if the following debug levels are entered::
+
+ debug_mon = 10
+ debug_ms = 1
+
+Sometimes these debug levels do not yield enough information. In such cases,
+members of the upstream Ceph community might ask you to make additional changes
+to these or to other debug levels. In any case, it is better for us to receive
+at least some useful information than to receive an empty log.
+
+
+Do I need to restart a monitor to adjust debug levels?
+------------------------------------------------------
+
+No, restarting a monitor is not necessary. Debug levels may be adjusted by
+using two different methods, depending on whether or not there is a quorum:
+
+There is a quorum
+
+ Either inject the debug option into the specific monitor that needs to
+ be debugged::
+
+ ceph tell mon.FOO config set debug_mon 10/10
+
+ Or inject it into all monitors at once::
+
+ ceph tell mon.* config set debug_mon 10/10
+
+
+There is no quorum
+
+ Use the admin socket of the specific monitor that needs to be debugged
+ and directly adjust the monitor's configuration options::
+
+ ceph daemon mon.FOO config set debug_mon 10/10
+
+
+To return the debug levels to their default values, run the above commands
+using the debug level ``1/10`` rather than ``10/10``. To check a monitor's
+current values, use the admin socket and run either of the following commands:
+
+ .. prompt:: bash
+
+ ceph daemon mon.FOO config show
+
+or:
+
+ .. prompt:: bash
+
+ ceph daemon mon.FOO config get 'OPTION_NAME'
+
+
+
+I Reproduced the problem with appropriate debug levels. Now what?
+-----------------------------------------------------------------
+
+We prefer that you send us only the portions of your logs that are relevant to
+your monitor problems. Of course, it might not be easy for you to determine
+which portions are relevant so we are willing to accept complete and
+unabridged logs. However, we request that you avoid sending logs containing
+hundreds of thousands of lines with no additional clarifying information. One
+common-sense way of making our task easier is to write down the current time
+and date when you are reproducing the problem and then extract portions of your
+logs based on that information.
+
+Finally, reach out to us on the mailing lists or IRC or Slack, or by filing a
+new issue on the `tracker`_.
+
+.. _tracker: http://tracker.ceph.com/projects/ceph/issues/new
+
+.. |---| unicode:: U+2014 .. EM DASH
+ :trim: