Diffstat (limited to 'doc/rados/operations/add-or-rm-osds.rst')
-rw-r--r-- | doc/rados/operations/add-or-rm-osds.rst | 419 |
1 files changed, 419 insertions, 0 deletions
diff --git a/doc/rados/operations/add-or-rm-osds.rst b/doc/rados/operations/add-or-rm-osds.rst
new file mode 100644
index 000000000..1a6621148
--- /dev/null
+++ b/doc/rados/operations/add-or-rm-osds.rst
@@ -0,0 +1,419 @@
+======================
+ Adding/Removing OSDs
+======================
+
+When a cluster is up and running, it is possible to add or remove OSDs.
+
+Adding OSDs
+===========
+
+OSDs can be added to a cluster in order to expand the cluster's capacity and
+resilience. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on one
+storage drive within a host machine. But if your host machine has multiple
+storage drives, you may map one ``ceph-osd`` daemon for each drive on the
+machine.
+
+It's a good idea to check the capacity of your cluster so that you know when it
+approaches its capacity limits. If your cluster has reached its ``near full``
+ratio, then you should add OSDs to expand your cluster's capacity.
+
+.. warning:: Do not add an OSD after your cluster has reached its ``full
+   ratio``. OSD failures that occur after the cluster reaches its ``near full
+   ratio`` might cause the cluster to exceed its ``full ratio``.
+
+
+Deploying your Hardware
+-----------------------
+
+If you are also adding a new host when adding a new OSD, see `Hardware
+Recommendations`_ for details on minimum recommendations for OSD hardware. To
+add an OSD host to your cluster, begin by making sure that an appropriate
+version of Linux has been installed on the host machine and that all initial
+preparations for your storage drives have been carried out. For details, see
+`Filesystem Recommendations`_.
+
+Next, add your OSD host to a rack in your cluster, connect the host to the
+network, and ensure that the host has network connectivity. For details, see
+`Network Configuration Reference`_.
+
+
+.. _Hardware Recommendations: ../../../start/hardware-recommendations
+.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
+.. _Network Configuration Reference: ../../configuration/network-config-ref
+
+Installing the Required Software
+--------------------------------
+
+If your cluster has been manually deployed, you will need to install Ceph
+software packages manually. For details, see `Installing Ceph (Manual)`_.
+Configure SSH for the appropriate user to have both passwordless authentication
+and root permissions.
+
+.. _Installing Ceph (Manual): ../../../install
+
+
+Adding an OSD (Manual)
+----------------------
+
+The following procedure sets up a ``ceph-osd`` daemon, configures this OSD to
+use one drive, and configures the cluster to distribute data to the OSD. If
+your host machine has multiple drives, you may add an OSD for each drive on the
+host by repeating this procedure.
+
+As the following procedure will demonstrate, adding an OSD involves creating a
+metadata directory for it, configuring a data storage drive, adding the OSD to
+the cluster, and then adding it to the CRUSH map.
+
+When you add the OSD to the CRUSH map, you will need to consider the weight you
+assign to the new OSD. Since storage drive capacities increase over time, newer
+OSD hosts are likely to have larger hard drives than the older hosts in the
+cluster have and therefore might have greater weight as well.
+
+.. tip:: Ceph works best with uniform hardware across pools. It is possible to
+   add drives of dissimilar size and then adjust their weights accordingly.
+   However, for best performance, consider a CRUSH hierarchy that has drives of
+   the same type and size. It is better to add larger drives uniformly to
+   existing hosts. This can be done incrementally, replacing smaller drives
+   each time the new drives are added.
+
+#. Create the new OSD by running a command of the following form. If you opt
+   not to specify a UUID in this command, the UUID will be set automatically
+   when the OSD starts up. The OSD number, which is needed for subsequent
+   steps, is found in the command's output:
+
+   .. prompt:: bash $
+
+      ceph osd create [{uuid} [{id}]]
+
+   If the optional parameter ``{id}`` is specified, it will be used as the OSD
+   ID. However, if the ID number is already in use, the command will fail.
+
+   .. warning:: Explicitly specifying the ``{id}`` parameter is not
+      recommended. IDs are allocated as an array, and any skipping of entries
+      consumes extra memory. This memory consumption can become significant if
+      there are large gaps or if clusters are large. By leaving the ``{id}``
+      parameter unspecified, we ensure that Ceph uses the smallest ID number
+      available and that these problems are avoided.
+
+#. Create the default directory for your new OSD by running commands of the
+   following form:
+
+   .. prompt:: bash $
+
+      ssh {new-osd-host}
+      sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
+
+#. If the OSD will be created on a drive other than the OS drive, prepare it
+   for use with Ceph. Run commands of the following form:
+
+   .. prompt:: bash $
+
+      ssh {new-osd-host}
+      sudo mkfs -t {fstype} /dev/{drive}
+      sudo mount -o user_xattr /dev/{drive} /var/lib/ceph/osd/ceph-{osd-number}
+
+#. Initialize the OSD data directory by running commands of the following form:
+
+   .. prompt:: bash $
+
+      ssh {new-osd-host}
+      ceph-osd -i {osd-num} --mkfs --mkkey
+
+   Make sure that the directory is empty before running ``ceph-osd``.
+
+#. Register the OSD authentication key by running a command of the following
+   form:
+
+   .. prompt:: bash $
+
+      ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring
+
+   This presentation of the command has ``ceph-{osd-num}`` in the listed path
+   because many clusters have the name ``ceph``. However, if your cluster name
+   is not ``ceph``, then the string ``ceph`` in ``ceph-{osd-num}`` needs to be
+   replaced with your cluster name. For example, if your cluster name is
+   ``cluster1``, then the path in the command should be
+   ``/var/lib/ceph/osd/cluster1-{osd-num}/keyring``.
+
+#. Add the OSD to the CRUSH map by running the following command. This allows
+   the OSD to begin receiving data. The ``ceph osd crush add`` command can add
+   OSDs to the CRUSH hierarchy wherever you want. If you specify one or more
+   buckets, the command places the OSD in the most specific of those buckets,
+   and it moves that bucket underneath any other buckets that you have
+   specified. **Important:** If you specify only the root bucket, the command
+   will attach the OSD directly to the root, but CRUSH rules expect OSDs to be
+   inside of hosts. If the OSDs are not inside hosts, the OSDs will likely not
+   receive any data.
+
+   .. prompt:: bash $
+
+      ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]
+
+   Note that there is another way to add a new OSD to the CRUSH map: decompile
+   the CRUSH map, add the OSD to the device list, add the host as a bucket (if
+   it is not already in the CRUSH map), add the device as an item in the host,
+   assign the device a weight, recompile the CRUSH map, and set the CRUSH map.
+   For details, see `Add/Move an OSD`_. This is rarely necessary with recent
+   releases (this sentence was written the month that Reef was released).
+
+
+.. _rados-replacing-an-osd:
+
+Replacing an OSD
+----------------
+
+.. note:: If the procedure in this section does not work for you, try the
+   instructions in the ``cephadm`` documentation:
+   :ref:`cephadm-replacing-an-osd`.
+
+Sometimes OSDs need to be replaced: for example, when a disk fails, or when an
+administrator wants to reprovision OSDs with a new back end (perhaps when
+switching from Filestore to BlueStore). Replacing an OSD differs from `Removing
+the OSD`_ in that the replaced OSD's ID and CRUSH map entry must be kept intact
+after the OSD is destroyed for replacement.
+
+
+#. Make sure that it is safe to destroy the OSD:
+
+   .. prompt:: bash $
+
+      while ! ceph osd safe-to-destroy osd.{id} ; do sleep 10 ; done
+
+#. Destroy the OSD:
+
+   .. prompt:: bash $
+
+      ceph osd destroy {id} --yes-i-really-mean-it
+
+#. *Optional*: If the disk that you plan to use is not a new disk and has been
+   used before for other purposes, zap the disk:
+
+   .. prompt:: bash $
+
+      ceph-volume lvm zap /dev/sdX
+
+#. Prepare the disk for replacement by using the ID of the OSD that was
+   destroyed in previous steps:
+
+   .. prompt:: bash $
+
+      ceph-volume lvm prepare --osd-id {id} --data /dev/sdX
+
+#. Finally, activate the OSD:
+
+   .. prompt:: bash $
+
+      ceph-volume lvm activate {id} {fsid}
+
+Alternatively, instead of carrying out the final two steps (preparing the disk
+and activating the OSD), you can re-create the OSD by running a single command
+of the following form:
+
+   .. prompt:: bash $
+
+      ceph-volume lvm create --osd-id {id} --data /dev/sdX
+
+Starting the OSD
+----------------
+
+After an OSD is added to Ceph, the OSD is in the cluster. However, until it is
+started, the OSD is considered ``down`` and ``in``. The OSD is not running and
+will be unable to receive data. To start an OSD, either run ``service ceph``
+from your admin host or run a command of the following form to start the OSD
+from its host machine:
+
+   .. prompt:: bash $
+
+      sudo systemctl start ceph-osd@{osd-num}
+
+After the OSD is started, it is considered ``up`` and ``in``.
+
+Observing the Data Migration
+----------------------------
+
+After the new OSD has been added to the CRUSH map, Ceph begins rebalancing the
+cluster by migrating placement groups (PGs) to the new OSD. To observe this
+process by using the `ceph`_ tool, run the following command:
+
+   .. prompt:: bash $
+
+      ceph -w
+
+Or:
+
+   .. prompt:: bash $
+
+      watch ceph status
+
+The PG states will first change from ``active+clean`` to ``active, some
+degraded objects`` and then return to ``active+clean`` when migration
+completes. When you are finished observing, press Ctrl-C to exit.
+
+.. _Add/Move an OSD: ../crush-map#addosd
+.. _ceph: ../monitoring
+
+
+Removing OSDs (Manual)
+======================
+
+It is possible to remove an OSD manually while the cluster is running: you
+might want to do this in order to reduce the size of the cluster or when
+replacing hardware. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on
+one storage drive within a host machine. Alternatively, if your host machine
+has multiple storage drives, you might need to remove multiple ``ceph-osd``
+daemons: one daemon for each drive on the machine.
+
+.. warning:: Before you begin the process of removing an OSD, make sure that
+   your cluster is not near its ``full ratio``. Otherwise the act of removing
+   OSDs might cause the cluster to reach or exceed its ``full ratio``.
+
+
+Taking the OSD ``out`` of the Cluster
+-------------------------------------
+
+OSDs are typically ``up`` and ``in`` before they are removed from the cluster.
+Before the OSD can be removed from the cluster, the OSD must be taken ``out``
+of the cluster so that Ceph can begin rebalancing and copying its data to other
+OSDs. To take an OSD ``out`` of the cluster, run a command of the following
+form:
+
+   .. prompt:: bash $
+
+      ceph osd out {osd-num}
+
+
+Observing the Data Migration
+----------------------------
+
+After the OSD has been taken ``out`` of the cluster, Ceph begins rebalancing
+the cluster by migrating placement groups out of the OSD that was removed. To
+observe this process by using the `ceph`_ tool, run the following command:
+
+   .. prompt:: bash $
+
+      ceph -w
+
+The PG states will change from ``active+clean`` to ``active, some degraded
+objects`` and will then return to ``active+clean`` when migration completes.
+When you are finished observing, press Ctrl-C to exit.
+
+.. note:: Under certain conditions, the action of taking ``out`` an OSD
+   might lead CRUSH to encounter a corner case in which some PGs remain stuck
+   in the ``active+remapped`` state. This problem sometimes occurs in small
+   clusters with few hosts (for example, in a small testing cluster). To
+   address this problem, mark the OSD ``in`` by running a command of the
+   following form:
+
+   .. prompt:: bash $
+
+      ceph osd in {osd-num}
+
+   After the OSD has come back to its initial state, do not mark the OSD
+   ``out`` again. Instead, set the OSD's weight to ``0`` by running a command
+   of the following form:
+
+   .. prompt:: bash $
+
+      ceph osd crush reweight osd.{osd-num} 0
+
+   After the OSD has been reweighted, observe the data migration and confirm
+   that it has completed successfully. The difference between marking an OSD
+   ``out`` and reweighting the OSD to ``0`` has to do with the bucket that
+   contains the OSD. When an OSD is marked ``out``, the weight of the bucket is
+   not changed. But when an OSD is reweighted to ``0``, the weight of the
+   bucket is updated (namely, the weight of the OSD is subtracted from the
+   overall weight of the bucket). When operating small clusters, it can
+   sometimes be preferable to use the above reweight command.
+
+
+Stopping the OSD
+----------------
+
+After you take an OSD ``out`` of the cluster, the OSD might still be running.
+In such a case, the OSD is ``up`` and ``out``. Before it is removed from the
+cluster, the OSD must be stopped by running commands of the following form:
+
+   .. prompt:: bash $
+
+      ssh {osd-host}
+      sudo systemctl stop ceph-osd@{osd-num}
+
+After the OSD has been stopped, it is ``down``.
+
+
+Removing the OSD
+----------------
+
+The following procedure removes an OSD from the CRUSH map, removes the OSD's
+authentication key, removes the OSD from the OSD map, and removes the OSD from
+the ``ceph.conf`` file. If your host has multiple drives, it might be necessary
+to remove an OSD from each drive by repeating this procedure.
+
+#. Begin by having the cluster forget the OSD. This step removes the OSD from
+   the CRUSH map, removes the OSD's authentication key, and removes the OSD
+   from the OSD map. (The :ref:`purge subcommand <ceph-admin-osd>` was
+   introduced in Luminous. For older releases, see :ref:`the procedure linked
+   here <ceph_osd_purge_procedure_pre_luminous>`.):
+
+   .. prompt:: bash $
+
+      ceph osd purge {id} --yes-i-really-mean-it
+
+
+#. Navigate to the host where the master copy of the cluster's
+   ``ceph.conf`` file is kept:
+
+   .. prompt:: bash $
+
+      ssh {admin-host}
+      cd /etc/ceph
+      vim ceph.conf
+
+#. Remove the OSD entry from your ``ceph.conf`` file (if such an entry
+   exists)::
+
+      [osd.1]
+          host = {hostname}
+
+#. Copy the updated ``ceph.conf`` file from the location on the host where the
+   master copy of the cluster's ``ceph.conf`` is kept to the ``/etc/ceph``
+   directory of the other hosts in your cluster.
+
+.. _ceph_osd_purge_procedure_pre_luminous:
+
+If your Ceph cluster is older than Luminous, you will be unable to use the
+``ceph osd purge`` command. Instead, carry out the following procedure:
+
+#. Remove the OSD from the CRUSH map so that it no longer receives data (for
+   more details, see `Remove an OSD`_):
+
+   .. prompt:: bash $
+
+      ceph osd crush remove {name}
+
+   Instead of removing the OSD from the CRUSH map, you might opt for one of two
+   alternatives: (1) decompile the CRUSH map, remove the OSD from the device
+   list, and remove the device from the host bucket; (2) remove the host bucket
+   from the CRUSH map (provided that it is in the CRUSH map and that you intend
+   to remove the host), recompile the map, and set it.
+
+
+#. Remove the OSD authentication key:
+
+   .. prompt:: bash $
+
+      ceph auth del osd.{osd-num}
+
+#. Remove the OSD:
+
+   .. prompt:: bash $
+
+      ceph osd rm {osd-num}
+
+   For example:
+
+   .. prompt:: bash $
+
+      ceph osd rm 1
+
+.. _Remove an OSD: ../crush-map#removeosd
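
Reviewer sketch (not part of the patch): the manual removal sequence documented above (mark the OSD ``out``, watch the migration, stop the daemon, purge) might be summarized as a dry-run shell helper. The function name ``osd_removal_plan`` and the host name ``osd-host-1`` are hypothetical; the helper only prints the commands taken from the document and runs nothing against a cluster. It assumes a Luminous-or-later cluster, where ``ceph osd purge`` is available.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print, in order, the commands the document gives for
# removing one OSD. Nothing here is executed against a cluster.
# osd_removal_plan is a hypothetical helper name, not part of Ceph.
osd_removal_plan() {
    local osd_num="$1"
    local osd_host="$2"

    # Accept only a non-empty, all-digit OSD number and a non-empty host.
    case "$osd_num" in
        ''|*[!0-9]*)
            echo "usage: osd_removal_plan <osd-num> <osd-host>" >&2
            return 1
            ;;
    esac
    if [ -z "$osd_host" ]; then
        echo "usage: osd_removal_plan <osd-num> <osd-host>" >&2
        return 1
    fi

    # 1. Take the OSD out so that Ceph starts migrating its data elsewhere.
    echo "ceph osd out ${osd_num}"
    # 2. Watch until the PGs return to active+clean, then press Ctrl-C.
    echo "ceph -w"
    # 3. Stop the daemon on its host; the OSD is then down.
    echo "ssh ${osd_host} sudo systemctl stop ceph-osd@${osd_num}"
    # 4. Remove the OSD from the CRUSH map, auth database, and OSD map.
    echo "ceph osd purge ${osd_num} --yes-i-really-mean-it"
}

# Example: print the plan for osd.1 on a placeholder host.
osd_removal_plan 1 osd-host-1
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; an operator would still need to confirm that the migration has finished before issuing the stop and purge steps, and to clean up any ``ceph.conf`` entry by hand as described in the procedure.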