author     Daniel Baumann <daniel.baumann@progress-linux.org>   2024-05-23 16:45:17 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org>   2024-05-23 16:45:44 +0000
commit     17d6a993fc17d533460c5f40f3908c708e057c18 (patch)
tree       1a3bd93e0ecd74fa02f93a528fe2f87e5314c4b5 /doc/rados/troubleshooting/troubleshooting-mon.rst
parent     Releasing progress-linux version 18.2.2-0progress7.99u1. (diff)
Merging upstream version 18.2.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/rados/troubleshooting/troubleshooting-mon.rst')
-rw-r--r--  doc/rados/troubleshooting/troubleshooting-mon.rst | 308
1 file changed, 166 insertions, 142 deletions
diff --git a/doc/rados/troubleshooting/troubleshooting-mon.rst b/doc/rados/troubleshooting/troubleshooting-mon.rst
index 1170da7c3..443d6c443 100644
--- a/doc/rados/troubleshooting/troubleshooting-mon.rst
+++ b/doc/rados/troubleshooting/troubleshooting-mon.rst
@@ -85,23 +85,27 @@ Using the monitor's admin socket
 ================================
 
 A monitor's admin socket allows you to interact directly with a specific daemon
-by using a Unix socket file. This file is found in the monitor's ``run``
-directory. The admin socket's default directory is
-``/var/run/ceph/ceph-mon.ID.asok``, but this can be overridden and the admin
-socket might be elsewhere, especially if your cluster's daemons are deployed in
-containers. If you cannot find it, either check your ``ceph.conf`` for an
-alternative path or run the following command:
+by using a Unix socket file. This socket file is found in the monitor's ``run``
+directory.
+
+The admin socket's default directory is ``/var/run/ceph/ceph-mon.ID.asok``. It
+is possible to override the admin socket's default location. If the default
+location has been overridden, then the admin socket will be elsewhere. This is
+often the case when a cluster's daemons are deployed in containers.
+
+To find the directory of the admin socket, check either your ``ceph.conf`` for
+an alternative path or run the following command:
 
 .. prompt:: bash $
 
    ceph-conf --name mon.ID --show-config-value admin_socket
 
-The admin socket is available for use only when the monitor daemon is running.
-Whenever the monitor has been properly shut down, the admin socket is removed.
-However, if the monitor is not running and the admin socket persists, it is
-likely that the monitor has been improperly shut down. In any case, if the
-monitor is not running, it will be impossible to use the admin socket, and the
-``ceph`` command is likely to return ``Error 111: Connection Refused``.
+The admin socket is available for use only when the Monitor daemon is running.
+Every time the Monitor is properly shut down, the admin socket is removed. If
+the Monitor is not running and yet the admin socket persists, it is likely that
+the Monitor has been improperly shut down. If the Monitor is not running, it
+will be impossible to use the admin socket, and the ``ceph`` command is likely
+to return ``Error 111: Connection Refused``.
 
 To access the admin socket, run a ``ceph tell`` command of the following form
 (specifying the daemon that you are interested in):
@@ -110,7 +114,7 @@ To access the admin socket, run a ``ceph tell`` command of the following form
 
    ceph tell mon.<id> mon_status
 
-This command passes a ``help`` command to the specific running monitor daemon
+This command passes the ``mon_status`` command to the specified running Monitor daemon
 ``<id>`` via its admin socket. If you know the full path to the admin socket
 file, this can be done more directly by running the following command:
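As a quick illustration of the admin-socket interface described in the hunk
above, a monitor can also be queried through its socket file directly. This is
a minimal sketch that assumes a monitor named ``mon.a`` and the default socket
path; adjust both for your deployment, especially when the daemons run in
containers::

   # Query the monitor's status through its admin socket.
   ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status

   # List every command that this daemon's admin socket supports.
   ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok help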
@@ -127,10 +131,11 @@ and ``quorum_status``.
 Understanding mon_status
 ========================
 
-The status of the monitor (as reported by the ``ceph tell mon.X mon_status``
-command) can always be obtained via the admin socket. This command outputs a
-great deal of information about the monitor (including the information found in
-the output of the ``quorum_status`` command).
+The status of a Monitor (as reported by the ``ceph tell mon.X mon_status``
+command) can be obtained via the admin socket. The ``ceph tell mon.X
+mon_status`` command outputs a great deal of information about the monitor
+(including the information found in the output of the ``quorum_status``
+command).
 
 To understand this command's output, let us consider the following example, in
 which we see the output of ``ceph tell mon.c mon_status``::
@@ -160,29 +165,34 @@ which we see the output of ``ceph tell mon.c mon_status``::
       "name": "c",
       "addr": "127.0.0.1:6795\/0"}]}}
 
-It is clear that there are three monitors in the monmap (*a*, *b*, and *c*),
-the quorum is formed by only two monitors, and *c* is in the quorum as a
-*peon*.
+This output reports that there are three monitors in the monmap (*a*, *b*, and
+*c*), that quorum is formed by only two monitors, and that *c* is in quorum as
+a *peon*.
 
-**Which monitor is out of the quorum?**
+**Which monitor is out of quorum?**
 
-  The answer is **a** (that is, ``mon.a``).
+  The answer is **a** (that is, ``mon.a``). ``mon.a`` is out of quorum.
 
-**Why?**
+**How do we know, in this example, that mon.a is out of quorum?**
 
-  When the ``quorum`` set is examined, there are clearly two monitors in the
-  set: *1* and *2*. But these are not monitor names. They are monitor ranks, as
-  established in the current ``monmap``. The ``quorum`` set does not include
-  the monitor that has rank 0, and according to the ``monmap`` that monitor is
-  ``mon.a``.
+  We know that ``mon.a`` is out of quorum because the ``quorum`` set does not
+  include rank 0, and rank 0 is the rank held by ``mon.a``.
+
+  If we examine the ``quorum`` set, we can see that there are clearly two
+  monitors in the set: *1* and *2*. But these are not monitor names. They are
+  monitor ranks, as established in the current ``monmap``. The ``quorum`` set
+  does not include the monitor that has rank 0, and according to the ``monmap``
+  that monitor is ``mon.a``.
 
 **How are monitor ranks determined?**
 
-  Monitor ranks are calculated (or recalculated) whenever monitors are added or
-  removed. The calculation of ranks follows a simple rule: the **greater** the
-  ``IP:PORT`` combination, the **lower** the rank. In this case, because
-  ``127.0.0.1:6789`` is lower than the other two ``IP:PORT`` combinations,
-  ``mon.a`` has the highest rank: namely, rank 0.
+  Monitor ranks are calculated (or recalculated) whenever monitors are added to
+  or removed from the cluster. The calculation of ranks follows a simple rule:
+  the **greater** the ``IP:PORT`` combination, the **lower** the rank. In this
+  case, because ``127.0.0.1:6789`` (``mon.a``) is numerically less than the
+  other two ``IP:PORT`` combinations (which are ``127.0.0.1:6790`` for "Monitor
+  b" and ``127.0.0.1:6795`` for "Monitor c"), ``mon.a`` has the highest rank:
+  namely, rank 0.
 
 Most Common Monitor Issues
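The quorum and rank information discussed in the hunk above can be checked on a
live cluster with standard commands. A small sketch, assuming a reachable
cluster and an admin keyring::

   # One-line summary of the monitors, their ranks, and the current quorum.
   ceph mon stat

   # Detailed quorum information, including the monmap, as JSON.
   ceph quorum_status --format json-pretty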
@@ -250,14 +260,15 @@ detail`` returns a message similar to the following::
 
   Monitors at a wrong address. ``mon_status`` outputs the ``monmap`` that is
   known to the monitor: determine whether the other Monitors' locations as
   specified in the ``monmap`` match the locations of the Monitors in the
-  network. If they do not, see `Recovering a Monitor's Broken monmap`_.
-  If the locations of the Monitors as specified in the ``monmap`` match the
-  locations of the Monitors in the network, then the persistent
-  ``probing`` state could be related to severe clock skews amongst the monitor
-  nodes. See `Clock Skews`_. If the information in `Clock Skews`_ does not
-  bring the Monitor out of the ``probing`` state, then prepare your system logs
-  and ask the Ceph community for help. See `Preparing your logs`_ for
-  information about the proper preparation of logs.
+  network. If they do not, see :ref:`Recovering a Monitor's Broken monmap
+  <rados_troubleshooting_troubleshooting_mon_recovering_broken_monmap>`. If
+  the locations of the Monitors as specified in the ``monmap`` match the
+  locations of the Monitors in the network, then the persistent ``probing``
+  state could be related to severe clock skews among the monitor nodes. See
+  `Clock Skews`_. If the information in `Clock Skews`_ does not bring the
+  Monitor out of the ``probing`` state, then prepare your system logs and ask
+  the Ceph community for help. See `Preparing your logs`_ for information about
+  the proper preparation of logs.
 
 **What does it mean when a Monitor's state is ``electing``?**
@@ -314,13 +325,16 @@ detail`` returns a message similar to the following::
 
   substantiate it. See `Preparing your logs`_ for information about the proper
   preparation of logs.
 
+.. _rados_troubleshooting_troubleshooting_mon_recovering_broken_monmap:
 
-Recovering a Monitor's Broken ``monmap``
-----------------------------------------
+Recovering a Monitor's Broken "monmap"
+--------------------------------------
 
-This is how a ``monmap`` usually looks, depending on the number of
-monitors::
+A monmap can be retrieved by using a command of the form ``ceph tell mon.c
+mon_status``, as described in :ref:`Understanding mon_status
+<rados_troubleshoting_troubleshooting_mon_understanding_mon_status>`.
+
+Here is an example of a ``monmap``::
 
    epoch 3
   fsid 5c4e9d53-e2e1-478a-8061-f543f8be4cf8
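For reference, a live copy of a ``monmap`` such as the one shown in the example
above can be saved to a file and pretty-printed. A small sketch, assuming a
cluster that still has quorum::

   # Fetch the current monmap and inspect its epoch, fsid, and mon addresses.
   ceph mon getmap -o /tmp/monmap
   monmaptool --print /tmp/monmap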
@@ -329,61 +343,64 @@ monitors::
    0: 127.0.0.1:6789/0 mon.a
    1: 127.0.0.1:6790/0 mon.b
    2: 127.0.0.1:6795/0 mon.c
-
-This may not be what you have however. For instance, in some versions of
-early Cuttlefish there was a bug that could cause your ``monmap``
-to be nullified. Completely filled with zeros. This means that not even
-``monmaptool`` would be able to make sense of cold, hard, inscrutable zeros.
-It's also possible to end up with a monitor with a severely outdated monmap,
-notably if the node has been down for months while you fight with your vendor's
-TAC. The subject ``ceph-mon`` daemon might be unable to find the surviving
-monitors (e.g., say ``mon.c`` is down; you add a new monitor ``mon.d``,
-then remove ``mon.a``, then add a new monitor ``mon.e`` and remove
-``mon.b``; you will end up with a totally different monmap from the one
-``mon.c`` knows).
-In this situation you have two possible solutions:
+This ``monmap`` is in working order, but your ``monmap`` might not be in
+working order. The ``monmap`` in a given node might be outdated because the
+node was down for a long time, during which the cluster's Monitors changed.
+
+There are two ways to update a Monitor's outdated ``monmap``:
+
+A. **Scrap the monitor and redeploy.**
+
+   Do this only if you are certain that you will not lose the information kept
+   by the Monitor that you scrap. Make sure that you have other Monitors in
+   good condition, so that the new Monitor will be able to synchronize with
+   the surviving Monitors. Remember that destroying a Monitor can lead to data
+   loss if there are no other copies of the Monitor's contents.
+
+B. **Inject a monmap into the monitor.**
 
-Scrap the monitor and redeploy
+   It is possible to fix a Monitor that has an outdated ``monmap`` by
+   retrieving an up-to-date ``monmap`` from surviving Monitors in the cluster
+   and injecting it into the Monitor that has a corrupted or missing
+   ``monmap``.
 
-  You should only take this route if you are positive that you won't
-  lose the information kept by that monitor; that you have other monitors
-  and that they are running just fine so that your new monitor is able
-  to synchronize from the remaining monitors. Keep in mind that destroying
-  a monitor, if there are no other copies of its contents, may lead to
-  loss of data.
+   Implement this solution by carrying out the following procedure:
 
-Inject a monmap into the monitor
+   #. Retrieve the ``monmap`` in one of the two following ways:
 
-  These are the basic steps:
+      a. **IF THERE IS A QUORUM OF MONITORS:**
+
+         Retrieve the ``monmap`` from the quorum:
 
-  Retrieve the ``monmap`` from the surviving monitors and inject it into the
-  monitor whose ``monmap`` is corrupted or lost.
+         .. prompt:: bash
 
-  Implement this solution by carrying out the following procedure:
+            ceph mon getmap -o /tmp/monmap
 
-  1. Is there a quorum of monitors? If so, retrieve the ``monmap`` from the
-     quorum::
+      b. **IF THERE IS NO QUORUM OF MONITORS:**
+
+         Retrieve the ``monmap`` directly from a Monitor that has been
+         stopped:
 
-     $ ceph mon getmap -o /tmp/monmap
+         .. prompt:: bash
 
-  2. If there is no quorum, then retrieve the ``monmap`` directly from another
-     monitor that has been stopped (in this example, the other monitor has
-     the ID ``ID-FOO``)::
+            ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
 
-     $ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
+         In this example, the ID of the stopped Monitor is ``ID-FOO``.
 
-  3. Stop the monitor you are going to inject the monmap into.
+   #. Stop the Monitor into which the ``monmap`` will be injected.
 
-  4. Inject the monmap::
+   #. Inject the monmap into the stopped Monitor:
 
-     $ ceph-mon -i ID --inject-monmap /tmp/monmap
+      .. prompt:: bash
 
-  5. Start the monitor
+         ceph-mon -i ID --inject-monmap /tmp/monmap
 
-  .. warning:: Injecting ``monmaps`` can cause serious problems because doing
-     so will overwrite the latest existing ``monmap`` stored on the monitor. Be
-     careful!
+   #. Start the Monitor.
+
+   .. warning:: Injecting a ``monmap`` into a Monitor can cause serious
+      problems. Injecting a ``monmap`` overwrites the latest existing
+      ``monmap`` stored on the monitor. Be careful!
 
 Clock Skews
 -----------
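Put together, the injection procedure described in the hunk above looks
roughly like the following. This is a sketch that assumes a systemd-managed
deployment, a surviving quorum, and a target monitor with the ID ``c``;
container-based deployments (for example, cephadm) start and stop the daemon
differently::

   ceph mon getmap -o /tmp/monmap    # retrieve the monmap from the quorum
   systemctl stop ceph-mon@c         # stop the monitor to be fixed
   ceph-mon -i c --inject-monmap /tmp/monmap
   systemctl start ceph-mon@c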
For example:: REJECT all -- anywhere anywhere reject-with icmp-host-prohibited @@ -487,9 +505,9 @@ Monitor Store Failures Symptoms of store corruption ---------------------------- -Ceph monitors store the :term:`Cluster Map` in a key-value store. If key-value -store corruption causes a monitor to fail, then the monitor log might contain -one of the following error messages:: +Ceph Monitors maintain the :term:`Cluster Map` in a key-value store. If +key-value store corruption causes a Monitor to fail, then the Monitor log might +contain one of the following error messages:: Corruption: error in middle of record @@ -500,10 +518,10 @@ or:: Recovery using healthy monitor(s) --------------------------------- -If there are surviving monitors, we can always :ref:`replace -<adding-and-removing-monitors>` the corrupted monitor with a new one. After the -new monitor boots, it will synchronize with a healthy peer. After the new -monitor is fully synchronized, it will be able to serve clients. +If the cluster contains surviving Monitors, the corrupted Monitor can be +:ref:`replaced <adding-and-removing-monitors>` with a new Monitor. After the +new Monitor boots, it will synchronize with a healthy peer. After the new +Monitor is fully synchronized, it will be able to serve clients. .. _mon-store-recovery-using-osds: @@ -511,15 +529,14 @@ Recovery using OSDs ------------------- Even if all monitors fail at the same time, it is possible to recover the -monitor store by using information stored in OSDs. You are encouraged to deploy -at least three (and preferably five) monitors in a Ceph cluster. In such a -deployment, complete monitor failure is unlikely. However, unplanned power loss -in a data center whose disk settings or filesystem settings are improperly -configured could cause the underlying filesystem to fail and this could kill -all of the monitors. In such a case, data in the OSDs can be used to recover -the monitors. The following is such a script and can be used to recover the -monitors: - +Monitor store by using information that is stored in OSDs. You are encouraged +to deploy at least three (and preferably five) Monitors in a Ceph cluster. In +such a deployment, complete Monitor failure is unlikely. However, unplanned +power loss in a data center whose disk settings or filesystem settings are +improperly configured could cause the underlying filesystem to fail and this +could kill all of the monitors. In such a case, data in the OSDs can be used to +recover the Monitors. The following is a script that can be used in such a case +to recover the Monitors: .. code-block:: bash @@ -572,10 +589,10 @@ monitors: This script performs the following steps: -#. Collects the map from each OSD host. -#. Rebuilds the store. -#. Fills the entities in the keyring file with appropriate capabilities. -#. Replaces the corrupted store on ``mon.foo`` with the recovered copy. +#. Collect the map from each OSD host. +#. Rebuild the store. +#. Fill the entities in the keyring file with appropriate capabilities. +#. Replace the corrupted store on ``mon.foo`` with the recovered copy. Known limitations @@ -587,19 +604,18 @@ The above recovery tool is unable to recover the following information: auth add`` command are recovered from the OSD's copy, and the ``client.admin`` keyring is imported using ``ceph-monstore-tool``. However, the MDS keyrings and all other keyrings will be missing in the recovered - monitor store. You might need to manually re-add them. + Monitor store. 
@@ -487,9 +505,9 @@ Monitor Store Failures
 
 Symptoms of store corruption
 ----------------------------
 
-Ceph monitors store the :term:`Cluster Map` in a key-value store. If key-value
-store corruption causes a monitor to fail, then the monitor log might contain
-one of the following error messages::
+Ceph Monitors maintain the :term:`Cluster Map` in a key-value store. If
+key-value store corruption causes a Monitor to fail, then the Monitor log might
+contain one of the following error messages::
 
   Corruption: error in middle of record
 
 or::
@@ -500,10 +518,10 @@ Recovery using healthy monitor(s)
 ---------------------------------
 
-If there are surviving monitors, we can always :ref:`replace
-<adding-and-removing-monitors>` the corrupted monitor with a new one. After the
-new monitor boots, it will synchronize with a healthy peer. After the new
-monitor is fully synchronized, it will be able to serve clients.
+If the cluster contains surviving Monitors, the corrupted Monitor can be
+:ref:`replaced <adding-and-removing-monitors>` with a new Monitor. After the
+new Monitor boots, it will synchronize with a healthy peer. After the new
+Monitor is fully synchronized, it will be able to serve clients.
 
 .. _mon-store-recovery-using-osds:
@@ -511,15 +529,14 @@ Recovery using OSDs
 -------------------
 
 Even if all monitors fail at the same time, it is possible to recover the
-monitor store by using information stored in OSDs. You are encouraged to deploy
-at least three (and preferably five) monitors in a Ceph cluster. In such a
-deployment, complete monitor failure is unlikely. However, unplanned power loss
-in a data center whose disk settings or filesystem settings are improperly
-configured could cause the underlying filesystem to fail and this could kill
-all of the monitors. In such a case, data in the OSDs can be used to recover
-the monitors. The following is such a script and can be used to recover the
-monitors:
-
+Monitor store by using information that is stored in OSDs. You are encouraged
+to deploy at least three (and preferably five) Monitors in a Ceph cluster. In
+such a deployment, complete Monitor failure is unlikely. However, unplanned
+power loss in a data center whose disk settings or filesystem settings are
+improperly configured could cause the underlying filesystem to fail and this
+could kill all of the monitors. In such a case, data in the OSDs can be used to
+recover the Monitors. The following is a script that can be used in such a case
+to recover the Monitors:
 
 .. code-block:: bash
@@ -572,10 +589,10 @@ This script performs the following steps:
 
-#. Collects the map from each OSD host.
-#. Rebuilds the store.
-#. Fills the entities in the keyring file with appropriate capabilities.
-#. Replaces the corrupted store on ``mon.foo`` with the recovered copy.
+#. Collect the map from each OSD host.
+#. Rebuild the store.
+#. Fill the entities in the keyring file with appropriate capabilities.
+#. Replace the corrupted store on ``mon.foo`` with the recovered copy.
 
 Known limitations
@@ -587,19 +604,18 @@ The above recovery tool is unable to recover the following information:
 
   auth add`` command are recovered from the OSD's copy, and the
   ``client.admin`` keyring is imported using ``ceph-monstore-tool``. However,
   the MDS keyrings and all other keyrings will be missing in the recovered
-  monitor store. You might need to manually re-add them.
+  Monitor store. It might be necessary to manually re-add them.
 
 - **Creating pools**: If any RADOS pools were in the process of being created,
   that state is lost. The recovery tool operates on the assumption that all
   pools have already been created. If there are PGs that are stuck in the
-  'unknown' state after the recovery for a partially created pool, you can
+  ``unknown`` state after the recovery for a partially created pool, you can
   force creation of the *empty* PG by running the ``ceph osd force-create-pg``
-  command. Note that this will create an *empty* PG, so take this action only
-  if you know the pool is empty.
+  command. This creates an *empty* PG, so take this action only if you are
+  certain that the pool is empty.
 
 - **MDS Maps**: The MDS maps are lost.
-
 
 Everything Failed! Now What?
 ============================
@@ -611,16 +627,20 @@ irc.oftc.net), or at ``dev@ceph.io`` and ``ceph-users@lists.ceph.com``. Make
 sure that you have prepared your logs and that you have them ready upon
 request.
 
-See https://ceph.io/en/community/connect/ for current (as of October 2023)
-information on getting in contact with the upstream Ceph community.
+The upstream Ceph Slack workspace can be joined at this address:
+https://ceph-storage.slack.com/
+
+See https://ceph.io/en/community/connect/ for current (as of December 2023)
+information on getting in contact with the upstream Ceph community.
 
 Preparing your logs
 -------------------
 
-The default location for monitor logs is ``/var/log/ceph/ceph-mon.FOO.log*``.
-However, if they are not there, you can find their current location by running
-the following command:
+The default location for Monitor logs is ``/var/log/ceph/ceph-mon.FOO.log*``.
+It is possible that the location of the Monitor logs has been changed from the
+default. If the location of the Monitor logs has been changed from the default
+location, find the location of the Monitor logs by running the following
+command:
 
 .. prompt:: bash
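The command under the prompt directive above is unchanged context that the
diff elides. For reference, one way to check a Monitor's active log location
is shown below; a sketch that assumes a monitor ID of ``FOO``::

   # Print the log_file setting that the named monitor is using.
   ceph-conf --name mon.FOO --show-config-value log_file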
@@ -631,21 +651,21 @@ cluster's configuration files. If Ceph is using the default debug levels, then
 your logs might be missing important information that would help the upstream
 Ceph community address your issue.
 
-To make sure your monitor logs contain relevant information, you can raise
-debug levels. Here we are interested in information from the monitors. As with
-other components, the monitors have different parts that output their debug
+Raise debug levels to make sure that your Monitor logs contain relevant
+information. Here we are interested in information from the Monitors. As with
+other components, the Monitors have different parts that output their debug
 information on different subsystems.
 
 If you are an experienced Ceph troubleshooter, we recommend raising the debug
-levels of the most relevant subsystems. Of course, this approach might not be
-easy for beginners. In most cases, however, enough information to address the
-issue will be secured if the following debug levels are entered::
+levels of the most relevant subsystems. This approach might not be easy for
+beginners. In most cases, however, enough information to address the issue will
+be logged if the following debug levels are entered::
 
   debug_mon = 10
   debug_ms = 1
 
 Sometimes these debug levels do not yield enough information. In such cases,
-members of the upstream Ceph community might ask you to make additional changes
+members of the upstream Ceph community will ask you to make additional changes
 to these or to other debug levels. In any case, it is better for us to receive
 at least some useful information than to receive an empty log.
@@ -653,10 +673,12 @@ Do I need to restart a monitor to adjust debug levels?
 ------------------------------------------------------
 
-No, restarting a monitor is not necessary. Debug levels may be adjusted by
-using two different methods, depending on whether or not there is a quorum:
+No. It is not necessary to restart a Monitor when adjusting its debug levels.
+
+There are two different methods for adjusting debug levels. One method is used
+when there is quorum. The other is used when there is no quorum.
 
-There is a quorum
+**Adjusting debug levels when there is a quorum**
 
   Either inject the debug option into the specific monitor that needs to be
   debugged::
 
    ceph tell mon.FOO config set debug_mon 10/10
@@ -668,17 +690,19 @@ There is a quorum
 
    ceph tell mon.* config set debug_mon 10/10
 
-There is no quorum
+**Adjusting debug levels when there is no quorum**
 
   Use the admin socket of the specific monitor that needs to be debugged and
   directly adjust the monitor's configuration options::
 
    ceph daemon mon.FOO config set debug_mon 10/10
 
+**Returning debug levels to their default values**
 
 To return the debug levels to their default values, run the above commands
-using the debug level ``1/10`` rather than ``10/10``. To check a monitor's
-current values, use the admin socket and run either of the following commands:
+using the debug level ``1/10`` rather than the debug level ``10/10``. To check
+a Monitor's current values, use the admin socket and run either of the
+following commands:
 
 .. prompt:: bash
@@ -695,17 +719,17 @@ or:
 
 I Reproduced the problem with appropriate debug levels. Now what?
 -----------------------------------------------------------------
 
-We prefer that you send us only the portions of your logs that are relevant to
-your monitor problems. Of course, it might not be easy for you to determine
-which portions are relevant so we are willing to accept complete and
-unabridged logs. However, we request that you avoid sending logs containing
-hundreds of thousands of lines with no additional clarifying information. One
-common-sense way of making our task easier is to write down the current time
-and date when you are reproducing the problem and then extract portions of your
+Send the upstream Ceph community only the portions of your logs that are
+relevant to your Monitor problems. Because it might not be easy for you to
+determine which portions are relevant, the upstream Ceph community accepts
+complete and unabridged logs. But don't send logs containing hundreds of
+thousands of lines with no additional clarifying information. One common-sense
+way to help the Ceph community help you is to write down the current time and
+date when you are reproducing the problem and then extract portions of your
 logs based on that information.
 
-Finally, reach out to us on the mailing lists or IRC or Slack, or by filing a
-new issue on the `tracker`_.
+Contact the upstream Ceph community on the mailing lists or IRC or Slack, or by
+filing a new issue on the `tracker`_.
 
 .. _tracker: http://tracker.ceph.com/projects/ceph/issues/new
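One way to carry out the time-based log extraction suggested in the final hunk
is shown below. This is a sketch that assumes systemd-managed daemons whose
output lands in the journal; adjust the unit name and the example timestamps
to match the window in which you reproduced the problem::

   # Extract only the window in which the problem was reproduced.
   journalctl -u ceph-mon@FOO --since "2024-05-23 16:40" --until "2024-05-23 16:50" > mon-FOO-window.log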