diff options
Diffstat (limited to '')
-rw-r--r-- | doc/rados/operations/add-or-rm-mons.rst | 375 |
1 files changed, 375 insertions, 0 deletions
diff --git a/doc/rados/operations/add-or-rm-mons.rst b/doc/rados/operations/add-or-rm-mons.rst new file mode 100644 index 00000000..ba03839a --- /dev/null +++ b/doc/rados/operations/add-or-rm-mons.rst @@ -0,0 +1,375 @@ +.. _adding-and-removing-monitors: + +========================== + Adding/Removing Monitors +========================== + +When you have a cluster up and running, you may add or remove monitors +from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_ +or `Monitor Bootstrap`_. + +.. _adding-monitors: + +Adding Monitors +=============== + +Ceph monitors are light-weight processes that maintain a master copy of the +cluster map. You can run a cluster with 1 monitor. We recommend at least 3 +monitors for a production cluster. Ceph monitors use a variation of the +`Paxos`_ protocol to establish consensus about maps and other critical +information across the cluster. Due to the nature of Paxos, Ceph requires +a majority of monitors running to establish a quorum (thus establishing +consensus). + +It is advisable to run an odd-number of monitors but not mandatory. An +odd-number of monitors has a higher resiliency to failures than an +even-number of monitors. For instance, on a 2 monitor deployment, no +failures can be tolerated in order to maintain a quorum; with 3 monitors, +one failure can be tolerated; in a 4 monitor deployment, one failure can +be tolerated; with 5 monitors, two failures can be tolerated. This is +why an odd-number is advisable. Summarizing, Ceph needs a majority of +monitors to be running (and able to communicate with each other), but that +majority can be achieved using a single monitor, or 2 out of 2 monitors, +2 out of 3, 3 out of 4, etc. + +For an initial deployment of a multi-node Ceph cluster, it is advisable to +deploy three monitors, increasing the number two at a time if a valid need +for more than three exists. + +Since monitors are light-weight, it is possible to run them on the same +host as an OSD; however, we recommend running them on separate hosts, +because fsync issues with the kernel may impair performance. + +.. note:: A *majority* of monitors in your cluster must be able to + reach each other in order to establish a quorum. + +Deploy your Hardware +-------------------- + +If you are adding a new host when adding a new monitor, see `Hardware +Recommendations`_ for details on minimum recommendations for monitor hardware. +To add a monitor host to your cluster, first make sure you have an up-to-date +version of Linux installed (typically Ubuntu 16.04 or RHEL 7). + +Add your monitor host to a rack in your cluster, connect it to the network +and ensure that it has network connectivity. + +.. _Hardware Recommendations: ../../../start/hardware-recommendations + +Install the Required Software +----------------------------- + +For manually deployed clusters, you must install Ceph packages +manually. See `Installing Packages`_ for details. +You should configure SSH to a user with password-less authentication +and root permissions. + +.. _Installing Packages: ../../../install/install-storage-cluster + + +.. _Adding a Monitor (Manual): + +Adding a Monitor (Manual) +------------------------- + +This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map +and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster. If +this results in only two monitor daemons, you may add more monitors by +repeating this procedure until you have a sufficient number of ``ceph-mon`` +daemons to achieve a quorum. + +At this point you should define your monitor's id. Traditionally, monitors +have been named with single letters (``a``, ``b``, ``c``, ...), but you are +free to define the id as you see fit. For the purpose of this document, +please take into account that ``{mon-id}`` should be the id you chose, +without the ``mon.`` prefix (i.e., ``{mon-id}`` should be the ``a`` +on ``mon.a``). + +#. Create the default directory on the machine that will host your + new monitor. :: + + ssh {new-mon-host} + sudo mkdir /var/lib/ceph/mon/ceph-{mon-id} + +#. Create a temporary directory ``{tmp}`` to keep the files needed during + this process. This directory should be different from the monitor's default + directory created in the previous step, and can be removed after all the + steps are executed. :: + + mkdir {tmp} + +#. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to + the retrieved keyring, and ``{key-filename}`` is the name of the file + containing the retrieved monitor key. :: + + ceph auth get mon. -o {tmp}/{key-filename} + +#. Retrieve the monitor map, where ``{tmp}`` is the path to + the retrieved monitor map, and ``{map-filename}`` is the name of the file + containing the retrieved monitor map. :: + + ceph mon getmap -o {tmp}/{map-filename} + +#. Prepare the monitor's data directory created in the first step. You must + specify the path to the monitor map so that you can retrieve the + information about a quorum of monitors and their ``fsid``. You must also + specify a path to the monitor keyring:: + + sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename} + + +#. Start the new monitor and it will automatically join the cluster. + The daemon needs to know which address to bind to, via either the + ``--public-addr {ip}`` or ``--public-network {network}`` argument. + For example:: + + ceph-mon -i {mon-id} --public-addr {ip:port} + +.. _removing-monitors: + +Removing Monitors +================= + +When you remove monitors from a cluster, consider that Ceph monitors use +PAXOS to establish consensus about the master cluster map. You must have +a sufficient number of monitors to establish a quorum for consensus about +the cluster map. + +.. _Removing a Monitor (Manual): + +Removing a Monitor (Manual) +--------------------------- + +This procedure removes a ``ceph-mon`` daemon from your cluster. If this +procedure results in only two monitor daemons, you may add or remove another +monitor until you have a number of ``ceph-mon`` daemons that can achieve a +quorum. + +#. Stop the monitor. :: + + service ceph -a stop mon.{mon-id} + +#. Remove the monitor from the cluster. :: + + ceph mon remove {mon-id} + +#. Remove the monitor entry from ``ceph.conf``. + + +Removing Monitors from an Unhealthy Cluster +------------------------------------------- + +This procedure removes a ``ceph-mon`` daemon from an unhealthy +cluster, for example a cluster where the monitors cannot form a +quorum. + + +#. Stop all ``ceph-mon`` daemons on all monitor hosts. :: + + ssh {mon-host} + service ceph stop mon || stop ceph-mon-all + # and repeat for all mons + +#. Identify a surviving monitor and log in to that host. :: + + ssh {mon-host} + +#. Extract a copy of the monmap file. :: + + ceph-mon -i {mon-id} --extract-monmap {map-path} + # in most cases, that's + ceph-mon -i `hostname` --extract-monmap /tmp/monmap + +#. Remove the non-surviving or problematic monitors. For example, if + you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where + only ``mon.a`` will survive, follow the example below:: + + monmaptool {map-path} --rm {mon-id} + # for example, + monmaptool /tmp/monmap --rm b + monmaptool /tmp/monmap --rm c + +#. Inject the surviving map with the removed monitors into the + surviving monitor(s). For example, to inject a map into monitor + ``mon.a``, follow the example below:: + + ceph-mon -i {mon-id} --inject-monmap {map-path} + # for example, + ceph-mon -i a --inject-monmap /tmp/monmap + +#. Start only the surviving monitors. + +#. Verify the monitors form a quorum (``ceph -s``). + +#. You may wish to archive the removed monitors' data directory in + ``/var/lib/ceph/mon`` in a safe location, or delete it if you are + confident the remaining monitors are healthy and are sufficiently + redundant. + +.. _Changing a Monitor's IP address: + +Changing a Monitor's IP Address +=============================== + +.. important:: Existing monitors are not supposed to change their IP addresses. + +Monitors are critical components of a Ceph cluster, and they need to maintain a +quorum for the whole system to work properly. To establish a quorum, the +monitors need to discover each other. Ceph has strict requirements for +discovering monitors. + +Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors. +However, monitors discover each other using the monitor map, not ``ceph.conf``. +For example, if you refer to `Adding a Monitor (Manual)`_ you will see that you +need to obtain the current monmap for the cluster when creating a new monitor, +as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The +following sections explain the consistency requirements for Ceph monitors, and a +few safe ways to change a monitor's IP address. + + +Consistency Requirements +------------------------ + +A monitor always refers to the local copy of the monmap when discovering other +monitors in the cluster. Using the monmap instead of ``ceph.conf`` avoids +errors that could break the cluster (e.g., typos in ``ceph.conf`` when +specifying a monitor address or port). Since monitors use monmaps for discovery +and they share monmaps with clients and other Ceph daemons, the monmap provides +monitors with a strict guarantee that their consensus is valid. + +Strict consistency also applies to updates to the monmap. As with any other +updates on the monitor, changes to the monmap always run through a distributed +consensus algorithm called `Paxos`_. The monitors must agree on each update to +the monmap, such as adding or removing a monitor, to ensure that each monitor in +the quorum has the same version of the monmap. Updates to the monmap are +incremental so that monitors have the latest agreed upon version, and a set of +previous versions, allowing a monitor that has an older version of the monmap to +catch up with the current state of the cluster. + +If monitors discovered each other through the Ceph configuration file instead of +through the monmap, it would introduce additional risks because the Ceph +configuration files are not updated and distributed automatically. Monitors +might inadvertently use an older ``ceph.conf`` file, fail to recognize a +monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able +to determine the current state of the system accurately. Consequently, making +changes to an existing monitor's IP address must be done with great care. + + +Changing a Monitor's IP address (The Right Way) +----------------------------------------------- + +Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to +ensure that other monitors in the cluster will receive the update. To change a +monitor's IP address, you must add a new monitor with the IP address you want +to use (as described in `Adding a Monitor (Manual)`_), ensure that the new +monitor successfully joins the quorum; then, remove the monitor that uses the +old IP address. Then, update the ``ceph.conf`` file to ensure that clients and +other daemons know the IP address of the new monitor. + +For example, lets assume there are three monitors in place, such as :: + + [mon.a] + host = host01 + addr = 10.0.0.1:6789 + [mon.b] + host = host02 + addr = 10.0.0.2:6789 + [mon.c] + host = host03 + addr = 10.0.0.3:6789 + +To change ``mon.c`` to ``host04`` with the IP address ``10.0.0.4``, follow the +steps in `Adding a Monitor (Manual)`_ by adding a new monitor ``mon.d``. Ensure +that ``mon.d`` is running before removing ``mon.c``, or it will break the +quorum. Remove ``mon.c`` as described on `Removing a Monitor (Manual)`_. Moving +all three monitors would thus require repeating this process as many times as +needed. + + +Changing a Monitor's IP address (The Messy Way) +----------------------------------------------- + +There may come a time when the monitors must be moved to a different network, a +different part of the datacenter or a different datacenter altogether. While it +is possible to do it, the process becomes a bit more hazardous. + +In such a case, the solution is to generate a new monmap with updated IP +addresses for all the monitors in the cluster, and inject the new map on each +individual monitor. This is not the most user-friendly approach, but we do not +expect this to be something that needs to be done every other week. As it is +clearly stated on the top of this section, monitors are not supposed to change +IP addresses. + +Using the previous monitor configuration as an example, assume you want to move +all the monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these +networks are unable to communicate. Use the following procedure: + +#. Retrieve the monitor map, where ``{tmp}`` is the path to + the retrieved monitor map, and ``{filename}`` is the name of the file + containing the retrieved monitor map. :: + + ceph mon getmap -o {tmp}/{filename} + +#. The following example demonstrates the contents of the monmap. :: + + $ monmaptool --print {tmp}/{filename} + + monmaptool: monmap file {tmp}/{filename} + epoch 1 + fsid 224e376d-c5fe-4504-96bb-ea6332a19e61 + last_changed 2012-12-17 02:46:41.591248 + created 2012-12-17 02:46:41.591248 + 0: 10.0.0.1:6789/0 mon.a + 1: 10.0.0.2:6789/0 mon.b + 2: 10.0.0.3:6789/0 mon.c + +#. Remove the existing monitors. :: + + $ monmaptool --rm a --rm b --rm c {tmp}/{filename} + + monmaptool: monmap file {tmp}/{filename} + monmaptool: removing a + monmaptool: removing b + monmaptool: removing c + monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors) + +#. Add the new monitor locations. :: + + $ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename} + + monmaptool: monmap file {tmp}/{filename} + monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors) + +#. Check new contents. :: + + $ monmaptool --print {tmp}/{filename} + + monmaptool: monmap file {tmp}/{filename} + epoch 1 + fsid 224e376d-c5fe-4504-96bb-ea6332a19e61 + last_changed 2012-12-17 02:46:41.591248 + created 2012-12-17 02:46:41.591248 + 0: 10.1.0.1:6789/0 mon.a + 1: 10.1.0.2:6789/0 mon.b + 2: 10.1.0.3:6789/0 mon.c + +At this point, we assume the monitors (and stores) are installed at the new +location. The next step is to propagate the modified monmap to the new +monitors, and inject the modified monmap into each new monitor. + +#. First, make sure to stop all your monitors. Injection must be done while + the daemon is not running. + +#. Inject the monmap. :: + + ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename} + +#. Restart the monitors. + +After this step, migration to the new location is complete and +the monitors should operate successfully. + + +.. _Manual Deployment: ../../../install/manual-deployment +.. _Monitor Bootstrap: ../../../dev/mon-bootstrap +.. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science) |