diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
commit | e6918187568dbd01842d8d1d2c808ce16a894239 (patch) | |
tree | 64f88b554b444a49f656b6c656111a145cbbaa28 /doc/rados/operations/add-or-rm-mons.rst | |
parent | Initial commit. (diff) | |
download | ceph-e6918187568dbd01842d8d1d2c808ce16a894239.tar.xz ceph-e6918187568dbd01842d8d1d2c808ce16a894239.zip |
Adding upstream version 18.2.2.upstream/18.2.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/rados/operations/add-or-rm-mons.rst')
-rw-r--r-- | doc/rados/operations/add-or-rm-mons.rst | 458 |
1 files changed, 458 insertions, 0 deletions
diff --git a/doc/rados/operations/add-or-rm-mons.rst b/doc/rados/operations/add-or-rm-mons.rst new file mode 100644 index 000000000..3688bb798 --- /dev/null +++ b/doc/rados/operations/add-or-rm-mons.rst @@ -0,0 +1,458 @@ +.. _adding-and-removing-monitors: + +========================== + Adding/Removing Monitors +========================== + +It is possible to add monitors to a running cluster as long as redundancy is +maintained. To bootstrap a monitor, see `Manual Deployment`_ or `Monitor +Bootstrap`_. + +.. _adding-monitors: + +Adding Monitors +=============== + +Ceph monitors serve as the single source of truth for the cluster map. It is +possible to run a cluster with only one monitor, but for a production cluster +it is recommended to have at least three monitors provisioned and in quorum. +Ceph monitors use a variation of the `Paxos`_ algorithm to maintain consensus +about maps and about other critical information across the cluster. Due to the +nature of Paxos, Ceph is able to maintain quorum (and thus establish +consensus) only if a majority of the monitors are ``active``. + +It is best to run an odd number of monitors. This is because a cluster that is +running an odd number of monitors is more resilient than a cluster running an +even number. For example, in a two-monitor deployment, no failures can be +tolerated if quorum is to be maintained; in a three-monitor deployment, one +failure can be tolerated; in a four-monitor deployment, one failure can be +tolerated; and in a five-monitor deployment, two failures can be tolerated. In +general, a cluster running an odd number of monitors is best because it avoids +what is called the *split brain* phenomenon. In short, Ceph is able to operate +only if a majority of monitors are ``active`` and able to communicate with each +other, (for example: there must be a single monitor, two out of two monitors, +two out of three monitors, three out of five monitors, or the like). + +For small or non-critical deployments of multi-node Ceph clusters, it is +recommended to deploy three monitors. For larger clusters or for clusters that +are intended to survive a double failure, it is recommended to deploy five +monitors. Only in rare circumstances is there any justification for deploying +seven or more monitors. + +It is possible to run a monitor on the same host that is running an OSD. +However, this approach has disadvantages: for example: `fsync` issues with the +kernel might weaken performance, monitor and OSD daemons might be inactive at +the same time and cause disruption if the node crashes, is rebooted, or is +taken down for maintenance. Because of these risks, it is instead +recommended to run monitors and managers on dedicated hosts. + +.. note:: A *majority* of monitors in your cluster must be able to + reach each other in order for quorum to be established. + +Deploying your Hardware +----------------------- + +Some operators choose to add a new monitor host at the same time that they add +a new monitor. For details on the minimum recommendations for monitor hardware, +see `Hardware Recommendations`_. Before adding a monitor host to the cluster, +make sure that there is an up-to-date version of Linux installed. + +Add the newly installed monitor host to a rack in your cluster, connect the +host to the network, and make sure that the host has network connectivity. + +.. _Hardware Recommendations: ../../../start/hardware-recommendations + +Installing the Required Software +-------------------------------- + +In manually deployed clusters, it is necessary to install Ceph packages +manually. For details, see `Installing Packages`_. Configure SSH so that it can +be used by a user that has passwordless authentication and root permissions. + +.. _Installing Packages: ../../../install/install-storage-cluster + + +.. _Adding a Monitor (Manual): + +Adding a Monitor (Manual) +------------------------- + +The procedure in this section creates a ``ceph-mon`` data directory, retrieves +both the monitor map and the monitor keyring, and adds a ``ceph-mon`` daemon to +the cluster. The procedure might result in a Ceph cluster that contains only +two monitor daemons. To add more monitors until there are enough ``ceph-mon`` +daemons to establish quorum, repeat the procedure. + +This is a good point at which to define the new monitor's ``id``. Monitors have +often been named with single letters (``a``, ``b``, ``c``, etc.), but you are +free to define the ``id`` however you see fit. In this document, ``{mon-id}`` +refers to the ``id`` exclusive of the ``mon.`` prefix: for example, if +``mon.a`` has been chosen as the ``id`` of a monitor, then ``{mon-id}`` is +``a``. ??? + +#. Create a data directory on the machine that will host the new monitor: + + .. prompt:: bash $ + + ssh {new-mon-host} + sudo mkdir /var/lib/ceph/mon/ceph-{mon-id} + +#. Create a temporary directory ``{tmp}`` that will contain the files needed + during this procedure. This directory should be different from the data + directory created in the previous step. Because this is a temporary + directory, it can be removed after the procedure is complete: + + .. prompt:: bash $ + + mkdir {tmp} + +#. Retrieve the keyring for your monitors (``{tmp}`` is the path to the + retrieved keyring and ``{key-filename}`` is the name of the file that + contains the retrieved monitor key): + + .. prompt:: bash $ + + ceph auth get mon. -o {tmp}/{key-filename} + +#. Retrieve the monitor map (``{tmp}`` is the path to the retrieved monitor map + and ``{map-filename}`` is the name of the file that contains the retrieved + monitor map): + + .. prompt:: bash $ + + ceph mon getmap -o {tmp}/{map-filename} + +#. Prepare the monitor's data directory, which was created in the first step. + The following command must specify the path to the monitor map (so that + information about a quorum of monitors and their ``fsid``\s can be + retrieved) and specify the path to the monitor keyring: + + .. prompt:: bash $ + + sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename} + +#. Start the new monitor. It will automatically join the cluster. To provide + information to the daemon about which address to bind to, use either the + ``--public-addr {ip}`` option or the ``--public-network {network}`` option. + For example: + + .. prompt:: bash $ + + ceph-mon -i {mon-id} --public-addr {ip:port} + +.. _removing-monitors: + +Removing Monitors +================= + +When monitors are removed from a cluster, it is important to remember +that Ceph monitors use Paxos to maintain consensus about the cluster +map. Such consensus is possible only if the number of monitors is sufficient +to establish quorum. + + +.. _Removing a Monitor (Manual): + +Removing a Monitor (Manual) +--------------------------- + +The procedure in this section removes a ``ceph-mon`` daemon from the cluster. +The procedure might result in a Ceph cluster that contains a number of monitors +insufficient to maintain quorum, so plan carefully. When replacing an old +monitor with a new monitor, add the new monitor first, wait for quorum to be +established, and then remove the old monitor. This ensures that quorum is not +lost. + + +#. Stop the monitor: + + .. prompt:: bash $ + + service ceph -a stop mon.{mon-id} + +#. Remove the monitor from the cluster: + + .. prompt:: bash $ + + ceph mon remove {mon-id} + +#. Remove the monitor entry from the ``ceph.conf`` file: + +.. _rados-mon-remove-from-unhealthy: + + +Removing Monitors from an Unhealthy Cluster +------------------------------------------- + +The procedure in this section removes a ``ceph-mon`` daemon from an unhealthy +cluster (for example, a cluster whose monitors are unable to form a quorum). + +#. Stop all ``ceph-mon`` daemons on all monitor hosts: + + .. prompt:: bash $ + + ssh {mon-host} + systemctl stop ceph-mon.target + + Repeat this step on every monitor host. + +#. Identify a surviving monitor and log in to the monitor's host: + + .. prompt:: bash $ + + ssh {mon-host} + +#. Extract a copy of the ``monmap`` file by running a command of the following + form: + + .. prompt:: bash $ + + ceph-mon -i {mon-id} --extract-monmap {map-path} + + Here is a more concrete example. In this example, ``hostname`` is the + ``{mon-id}`` and ``/tmp/monpap`` is the ``{map-path}``: + + .. prompt:: bash $ + + ceph-mon -i `hostname` --extract-monmap /tmp/monmap + +#. Remove the non-surviving or otherwise problematic monitors: + + .. prompt:: bash $ + + monmaptool {map-path} --rm {mon-id} + + For example, suppose that there are three monitors |---| ``mon.a``, ``mon.b``, + and ``mon.c`` |---| and that only ``mon.a`` will survive: + + .. prompt:: bash $ + + monmaptool /tmp/monmap --rm b + monmaptool /tmp/monmap --rm c + +#. Inject the surviving map that includes the removed monitors into the + monmap of the surviving monitor(s): + + .. prompt:: bash $ + + ceph-mon -i {mon-id} --inject-monmap {map-path} + + Continuing with the above example, inject a map into monitor ``mon.a`` by + running the following command: + + .. prompt:: bash $ + + ceph-mon -i a --inject-monmap /tmp/monmap + + +#. Start only the surviving monitors. + +#. Verify that the monitors form a quorum by running the command ``ceph -s``. + +#. The data directory of the removed monitors is in ``/var/lib/ceph/mon``: + either archive this data directory in a safe location or delete this data + directory. However, do not delete it unless you are confident that the + remaining monitors are healthy and sufficiently redundant. Make sure that + there is enough room for the live DB to expand and compact, and make sure + that there is also room for an archived copy of the DB. The archived copy + can be compressed. + +.. _Changing a Monitor's IP address: + +Changing a Monitor's IP Address +=============================== + +.. important:: Existing monitors are not supposed to change their IP addresses. + +Monitors are critical components of a Ceph cluster. The entire system can work +properly only if the monitors maintain quorum, and quorum can be established +only if the monitors have discovered each other by means of their IP addresses. +Ceph has strict requirements on the discovery of monitors. + +Although the ``ceph.conf`` file is used by Ceph clients and other Ceph daemons +to discover monitors, the monitor map is used by monitors to discover each +other. This is why it is necessary to obtain the current ``monmap`` at the time +a new monitor is created: as can be seen above in `Adding a Monitor (Manual)`_, +the ``monmap`` is one of the arguments required by the ``ceph-mon -i {mon-id} +--mkfs`` command. The following sections explain the consistency requirements +for Ceph monitors, and also explain a number of safe ways to change a monitor's +IP address. + + +Consistency Requirements +------------------------ + +When a monitor discovers other monitors in the cluster, it always refers to the +local copy of the monitor map. Using the monitor map instead of using the +``ceph.conf`` file avoids errors that could break the cluster (for example, +typos or other slight errors in ``ceph.conf`` when a monitor address or port is +specified). Because monitors use monitor maps for discovery and because they +share monitor maps with Ceph clients and other Ceph daemons, the monitor map +provides monitors with a strict guarantee that their consensus is valid. + +Strict consistency also applies to updates to the monmap. As with any other +updates on the monitor, changes to the monmap always run through a distributed +consensus algorithm called `Paxos`_. The monitors must agree on each update to +the monmap, such as adding or removing a monitor, to ensure that each monitor +in the quorum has the same version of the monmap. Updates to the monmap are +incremental so that monitors have the latest agreed upon version, and a set of +previous versions, allowing a monitor that has an older version of the monmap +to catch up with the current state of the cluster. + +There are additional advantages to using the monitor map rather than +``ceph.conf`` when monitors discover each other. Because ``ceph.conf`` is not +automatically updated and distributed, its use would bring certain risks: +monitors might use an outdated ``ceph.conf`` file, might fail to recognize a +specific monitor, might fall out of quorum, and might develop a situation in +which `Paxos`_ is unable to accurately ascertain the current state of the +system. Because of these risks, any changes to an existing monitor's IP address +must be made with great care. + +.. _operations_add_or_rm_mons_changing_mon_ip: + +Changing a Monitor's IP address (Preferred Method) +-------------------------------------------------- + +If a monitor's IP address is changed only in the ``ceph.conf`` file, there is +no guarantee that the other monitors in the cluster will receive the update. +For this reason, the preferred method to change a monitor's IP address is as +follows: add a new monitor with the desired IP address (as described in `Adding +a Monitor (Manual)`_), make sure that the new monitor successfully joins the +quorum, remove the monitor that is using the old IP address, and update the +``ceph.conf`` file to ensure that clients and other daemons are made aware of +the new monitor's IP address. + +For example, suppose that there are three monitors in place:: + + [mon.a] + host = host01 + addr = 10.0.0.1:6789 + [mon.b] + host = host02 + addr = 10.0.0.2:6789 + [mon.c] + host = host03 + addr = 10.0.0.3:6789 + +To change ``mon.c`` so that its name is ``host04`` and its IP address is +``10.0.0.4``: (1) follow the steps in `Adding a Monitor (Manual)`_ to add a new +monitor ``mon.d``, (2) make sure that ``mon.d`` is running before removing +``mon.c`` or else quorum will be broken, and (3) follow the steps in `Removing +a Monitor (Manual)`_ to remove ``mon.c``. To move all three monitors to new IP +addresses, repeat this process. + +Changing a Monitor's IP address (Advanced Method) +------------------------------------------------- + +There are cases in which the method outlined in :ref"`<Changing a Monitor's IP +Address (Preferred Method)> operations_add_or_rm_mons_changing_mon_ip` cannot +be used. For example, it might be necessary to move the cluster's monitors to a +different network, to a different part of the datacenter, or to a different +datacenter altogether. It is still possible to change the monitors' IP +addresses, but a different method must be used. + +For such cases, a new monitor map with updated IP addresses for every monitor +in the cluster must be generated and injected on each monitor. Although this +method is not particularly easy, such a major migration is unlikely to be a +routine task. As stated at the beginning of this section, existing monitors are +not supposed to change their IP addresses. + +Continue with the monitor configuration in the example from :ref"`<Changing a +Monitor's IP Address (Preferred Method)> +operations_add_or_rm_mons_changing_mon_ip` . Suppose that all of the monitors +are to be moved from the ``10.0.0.x`` range to the ``10.1.0.x`` range, and that +these networks are unable to communicate. Carry out the following procedure: + +#. Retrieve the monitor map (``{tmp}`` is the path to the retrieved monitor + map, and ``{filename}`` is the name of the file that contains the retrieved + monitor map): + + .. prompt:: bash $ + + ceph mon getmap -o {tmp}/{filename} + +#. Check the contents of the monitor map: + + .. prompt:: bash $ + + monmaptool --print {tmp}/{filename} + + :: + + monmaptool: monmap file {tmp}/{filename} + epoch 1 + fsid 224e376d-c5fe-4504-96bb-ea6332a19e61 + last_changed 2012-12-17 02:46:41.591248 + created 2012-12-17 02:46:41.591248 + 0: 10.0.0.1:6789/0 mon.a + 1: 10.0.0.2:6789/0 mon.b + 2: 10.0.0.3:6789/0 mon.c + +#. Remove the existing monitors from the monitor map: + + .. prompt:: bash $ + + monmaptool --rm a --rm b --rm c {tmp}/{filename} + + :: + + monmaptool: monmap file {tmp}/{filename} + monmaptool: removing a + monmaptool: removing b + monmaptool: removing c + monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors) + +#. Add the new monitor locations to the monitor map: + + .. prompt:: bash $ + + monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename} + + :: + + monmaptool: monmap file {tmp}/{filename} + monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors) + +#. Check the new contents of the monitor map: + + .. prompt:: bash $ + + monmaptool --print {tmp}/{filename} + + :: + + monmaptool: monmap file {tmp}/{filename} + epoch 1 + fsid 224e376d-c5fe-4504-96bb-ea6332a19e61 + last_changed 2012-12-17 02:46:41.591248 + created 2012-12-17 02:46:41.591248 + 0: 10.1.0.1:6789/0 mon.a + 1: 10.1.0.2:6789/0 mon.b + 2: 10.1.0.3:6789/0 mon.c + +At this point, we assume that the monitors (and stores) have been installed at +the new location. Next, propagate the modified monitor map to the new monitors, +and inject the modified monitor map into each new monitor. + +#. Make sure all of your monitors have been stopped. Never inject into a + monitor while the monitor daemon is running. + +#. Inject the monitor map: + + .. prompt:: bash $ + + ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename} + +#. Restart all of the monitors. + +Migration to the new location is now complete. The monitors should operate +successfully. + + + +.. _Manual Deployment: ../../../install/manual-deployment +.. _Monitor Bootstrap: ../../../dev/mon-bootstrap +.. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science) + +.. |---| unicode:: U+2014 .. EM DASH + :trim: |