author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 18:45:59 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 18:45:59 +0000 |
commit | 19fcec84d8d7d21e796c7624e521b60d28ee21ed (patch) | |
tree | 42d26aa27d1e3f7c0b8bd3fd14e7d7082f5008dc /doc/cephadm | |
parent | Initial commit. (diff) | |
Adding upstream version 16.2.11+ds. (refs: upstream/16.2.11+ds, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/cephadm')
-rw-r--r-- | doc/cephadm/adoption.rst | 213 | ||||
-rw-r--r-- | doc/cephadm/client-setup.rst | 45 | ||||
-rw-r--r-- | doc/cephadm/compatibility.rst | 58 | ||||
-rw-r--r-- | doc/cephadm/host-management.rst | 436 | ||||
-rw-r--r-- | doc/cephadm/index.rst | 50 | ||||
-rw-r--r-- | doc/cephadm/install.rst | 439 | ||||
-rw-r--r-- | doc/cephadm/operations.rst | 545 | ||||
-rw-r--r-- | doc/cephadm/services/custom-container.rst | 79 | ||||
-rw-r--r-- | doc/cephadm/services/index.rst | 658 | ||||
-rw-r--r-- | doc/cephadm/services/iscsi.rst | 80 | ||||
-rw-r--r-- | doc/cephadm/services/mds.rst | 49 | ||||
-rw-r--r-- | doc/cephadm/services/mgr.rst | 43 | ||||
-rw-r--r-- | doc/cephadm/services/mon.rst | 179 | ||||
-rw-r--r-- | doc/cephadm/services/monitoring.rst | 457 | ||||
-rw-r--r-- | doc/cephadm/services/nfs.rst | 120 | ||||
-rw-r--r-- | doc/cephadm/services/osd.rst | 936 | ||||
-rw-r--r-- | doc/cephadm/services/rgw.rst | 324 | ||||
-rw-r--r-- | doc/cephadm/services/snmp-gateway.rst | 171 | ||||
-rw-r--r-- | doc/cephadm/troubleshooting.rst | 370 | ||||
-rw-r--r-- | doc/cephadm/upgrade.rst | 270 |
20 files changed, 5522 insertions, 0 deletions
diff --git a/doc/cephadm/adoption.rst b/doc/cephadm/adoption.rst new file mode 100644 index 000000000..78d1343eb --- /dev/null +++ b/doc/cephadm/adoption.rst @@ -0,0 +1,213 @@ +.. _cephadm-adoption: + +Converting an existing cluster to cephadm +========================================= + +It is possible to convert some existing clusters so that they can be managed +with ``cephadm``. This statement applies to some clusters that were deployed +with ``ceph-deploy``, ``ceph-ansible``, or ``DeepSea``. + +This section of the documentation explains how to determine whether your +clusters can be converted to a state in which they can be managed by +``cephadm`` and how to perform those conversions. + +Limitations +----------- + +* Cephadm works only with BlueStore OSDs. FileStore OSDs that are in your + cluster cannot be managed with ``cephadm``. + +Preparation +----------- + +#. Make sure that the ``cephadm`` command line tool is available on each host + in the existing cluster. See :ref:`get-cephadm` to learn how. + +#. Prepare each host for use by ``cephadm`` by running this command: + + .. prompt:: bash # + + cephadm prepare-host + +#. Choose a version of Ceph to use for the conversion. This procedure will work + with any release of Ceph that is Octopus (15.2.z) or later, inclusive. The + latest stable release of Ceph is the default. You might be upgrading from an + earlier Ceph release at the same time that you're performing this + conversion; if you are upgrading from an earlier release, make sure to + follow any upgrade-related instructions for that release. + + Pass the image to cephadm with the following command: + + .. prompt:: bash # + + cephadm --image $IMAGE <rest of command goes here> + + The conversion begins. + +#. Confirm that the conversion is underway by running ``cephadm ls`` and + making sure that the style of the daemons is changed: + + .. 
prompt:: bash # + + cephadm ls + + Before starting the conversion process, ``cephadm ls`` shows all existing + daemons to have a style of ``legacy``. As the adoption process progresses, + adopted daemons will appear with a style of ``cephadm:v1``. + + +Adoption process +---------------- + +#. Make sure that the Ceph configuration has been migrated to use the cluster + config database. If the ``/etc/ceph/ceph.conf`` is identical on each host, + then the following command can be run on one single host and will affect all + hosts: + + .. prompt:: bash # + + ceph config assimilate-conf -i /etc/ceph/ceph.conf + + If there are configuration variations between hosts, you will need to repeat + this command on each host. During this adoption process, view the cluster's + configuration to confirm that it is complete by running the following + command: + + .. prompt:: bash # + + ceph config dump + +#. Adopt each monitor: + + .. prompt:: bash # + + cephadm adopt --style legacy --name mon.<hostname> + + Each legacy monitor should stop, quickly restart as a cephadm + container, and rejoin the quorum. + +#. Adopt each manager: + + .. prompt:: bash # + + cephadm adopt --style legacy --name mgr.<hostname> + +#. Enable cephadm: + + .. prompt:: bash # + + ceph mgr module enable cephadm + ceph orch set backend cephadm + +#. Generate an SSH key: + + .. prompt:: bash # + + ceph cephadm generate-key + ceph cephadm get-pub-key > ~/ceph.pub + +#. Install the cluster SSH key on each host in the cluster: + + .. prompt:: bash # + + ssh-copy-id -f -i ~/ceph.pub root@<host> + + .. note:: + It is also possible to import an existing SSH key. See + :ref:`SSH errors <cephadm-ssh-errors>` in the troubleshooting + document for instructions that describe how to import existing + SSH keys. + + .. note:: + It is also possible to have cephadm use a non-root user to SSH + into cluster hosts. This user needs to have passwordless sudo access. 
+ Use ``ceph cephadm set-user <user>`` and copy the SSH key to that user. + See :ref:`cephadm-ssh-user` + +#. Tell cephadm which hosts to manage: + + .. prompt:: bash # + + ceph orch host add <hostname> [ip-address] + + This will perform a ``cephadm check-host`` on each host before adding it; + this check ensures that the host is functioning properly. The IP address + argument is recommended; if not provided, then the host name will be resolved + via DNS. + +#. Verify that the adopted monitor and manager daemons are visible: + + .. prompt:: bash # + + ceph orch ps + +#. Adopt all OSDs in the cluster: + + .. prompt:: bash # + + cephadm adopt --style legacy --name <name> + + For example: + + .. prompt:: bash # + + cephadm adopt --style legacy --name osd.1 + cephadm adopt --style legacy --name osd.2 + +#. Redeploy MDS daemons by telling cephadm how many daemons to run for + each file system. List file systems by name with the command ``ceph fs + ls``. Run the following command on the master nodes to redeploy the MDS + daemons: + + .. prompt:: bash # + + ceph orch apply mds <fs-name> [--placement=<placement>] + + For example, in a cluster with a single file system called `foo`: + + .. prompt:: bash # + + ceph fs ls + + .. code-block:: bash + + name: foo, metadata pool: foo_metadata, data pools: [foo_data ] + + .. prompt:: bash # + + ceph orch apply mds foo 2 + + Confirm that the new MDS daemons have started: + + .. prompt:: bash # + + ceph orch ps --daemon-type mds + + Finally, stop and remove the legacy MDS daemons: + + .. prompt:: bash # + + systemctl stop ceph-mds.target + rm -rf /var/lib/ceph/mds/ceph-* + +#. Redeploy RGW daemons. Cephadm manages RGW daemons by zone. For each + zone, deploy new RGW daemons with cephadm: + + .. 
prompt:: bash # + + ceph orch apply rgw <svc_id> [--realm=<realm>] [--zone=<zone>] [--port=<port>] [--ssl] [--placement=<placement>] + + where *<placement>* can be a simple daemon count, or a list of + specific hosts (see :ref:`orchestrator-cli-placement-spec`), and the + zone and realm arguments are needed only for a multisite setup. + + After the daemons have started and you have confirmed that they are + functioning, stop and remove the old, legacy daemons: + + .. prompt:: bash # + + systemctl stop ceph-rgw.target + rm -rf /var/lib/ceph/radosgw/ceph-* + +#. Check the output of the command ``ceph health detail`` for cephadm warnings + about stray cluster daemons or hosts that are not yet managed by cephadm. diff --git a/doc/cephadm/client-setup.rst b/doc/cephadm/client-setup.rst new file mode 100644 index 000000000..f98ba798b --- /dev/null +++ b/doc/cephadm/client-setup.rst @@ -0,0 +1,45 @@ +======================= +Basic Ceph Client Setup +======================= +Client machines require some basic configuration to interact with +Ceph clusters. This section describes how to configure a client machine +so that it can interact with a Ceph cluster. + +.. note:: + Most client machines need to install only the `ceph-common` package + and its dependencies. Such a setup supplies the basic `ceph` and + `rados` commands, as well as other commands including `mount.ceph` + and `rbd`. + +Config File Setup +================= +Client machines usually require smaller configuration files (here +sometimes called "config files") than do full-fledged cluster members. +To generate a minimal config file, log into a host that has been +configured as a client or that is running a cluster daemon, and then run the following command: + +.. prompt:: bash # + + ceph config generate-minimal-conf + +This command generates a minimal config file that tells the client how +to reach the Ceph monitors. The contents of this file should usually +be installed in ``/etc/ceph/ceph.conf``. 
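As a rough sketch of the step described above, the generated output can be redirected straight into a file. The ``write_minimal_conf`` helper name and its optional destination argument are illustrative assumptions and are not part of Ceph; only the ``ceph config generate-minimal-conf`` command and the ``/etc/ceph/ceph.conf`` path come from the text.

```shell
# Hypothetical helper (not part of Ceph): save the minimal config to a file.
# Run it on a host where the `ceph` command can already reach the cluster.
write_minimal_conf() {
    dest="${1:-/etc/ceph/ceph.conf}"   # default path named in the text above
    ceph config generate-minimal-conf > "$dest"
}
```

For example, ``write_minimal_conf ./ceph.conf`` writes the file to the current directory, from where it can be copied to ``/etc/ceph/ceph.conf`` on the client machine.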
+ +Keyring Setup +============= +Most Ceph clusters run with authentication enabled. This means that +the client needs keys in order to communicate with the machines in the +cluster. To generate a keyring file with credentials for `client.fs`, +log into a running cluster member and run the following command: + +.. prompt:: bash $ + + ceph auth get-or-create client.fs + +The resulting output should be saved in a keyring file, typically +``/etc/ceph/ceph.keyring``. + +To gain a broader understanding of client keyring distribution and administration, you should read :ref:`client_keyrings_and_configs`. + +To see an example that explains how to distribute ``ceph.conf`` configuration files to hosts that are tagged with the ``bare_config`` label, you should read the section called "Distributing ceph.conf to hosts tagged with bare_config" in the section called :ref:`etc_ceph_conf_distribution`. diff --git a/doc/cephadm/compatibility.rst b/doc/cephadm/compatibility.rst new file mode 100644 index 000000000..7d9c763bb --- /dev/null +++ b/doc/cephadm/compatibility.rst @@ -0,0 +1,58 @@ + +=========================== +Compatibility and Stability +=========================== + +.. _cephadm-compatibility-with-podman: + +Compatibility with Podman Versions +---------------------------------- + +Podman and Ceph have different end-of-life strategies. This means that care +must be taken in finding a version of Podman that is compatible with Ceph. 
+ +These versions are expected to work: + + ++-----------+---------------------------------------+ +| Ceph | Podman | ++-----------+-------+-------+-------+-------+-------+ +| | 1.9 | 2.0 | 2.1 | 2.2 | 3.0 | ++===========+=======+=======+=======+=======+=======+ +| <= 15.2.5 | True | False | False | False | False | ++-----------+-------+-------+-------+-------+-------+ +| >= 15.2.6 | True | True | True | False | False | ++-----------+-------+-------+-------+-------+-------+ +| >= 16.2.1 | False | True | True | False | True | ++-----------+-------+-------+-------+-------+-------+ + +.. warning:: + + To use Podman with Ceph Pacific, you must use **a version of Podman that + is 2.0.0 or higher**. However, **Podman version 2.2.1 does not work with + Ceph Pacific**. + + "Kubic stable" is known to work with Ceph Pacific, but it must be run + with a newer kernel. + + +.. _cephadm-stability: + +Stability +--------- + +Cephadm is under development. Some functionality is incomplete. Be aware +that some of the components of Ceph may not work perfectly with cephadm. +These include: + +- RGW + +Cephadm support remains under development for the following features: + +- Ingress +- Cephadm exporter daemon +- cephfs-mirror + +If a cephadm command fails or a service stops running properly, see +:ref:`cephadm-pause` for instructions on how to pause the Ceph cluster's +background activity and how to disable cephadm. diff --git a/doc/cephadm/host-management.rst b/doc/cephadm/host-management.rst new file mode 100644 index 000000000..df9525ca8 --- /dev/null +++ b/doc/cephadm/host-management.rst @@ -0,0 +1,436 @@ +.. _orchestrator-cli-host-management: + +=============== +Host Management +=============== + +Listing Hosts +============= + +Run a command of this form to list hosts associated with the cluster: + +.. 
prompt:: bash # + + ceph orch host ls [--format yaml] [--host-pattern <name>] [--label <label>] [--host-status <status>] + +In commands of this form, the arguments "host-pattern", "label" and +"host-status" are optional and are used for filtering. + +- "host-pattern" is a regex that matches against hostnames and returns only + matching hosts. +- "label" returns only hosts with the specified label. +- "host-status" returns only hosts with the specified status (currently + "offline" or "maintenance"). +- Any combination of these filtering flags is valid. It is possible to filter + against name, label and status simultaneously, or to filter against any + proper subset of name, label and status. + +.. _cephadm-adding-hosts: + +Adding Hosts +============ + +Hosts must have these :ref:`cephadm-host-requirements` installed. +Hosts without all the necessary requirements will fail to be added to the cluster. + +To add each new host to the cluster, perform two steps: + +#. Install the cluster's public SSH key in the new host's root user's ``authorized_keys`` file: + + .. prompt:: bash # + + ssh-copy-id -f -i /etc/ceph/ceph.pub root@*<new-host>* + + For example: + + .. prompt:: bash # + + ssh-copy-id -f -i /etc/ceph/ceph.pub root@host2 + ssh-copy-id -f -i /etc/ceph/ceph.pub root@host3 + +#. Tell Ceph that the new node is part of the cluster: + + .. prompt:: bash # + + ceph orch host add *<newhost>* [*<ip>*] [*<label1> ...*] + + For example: + + .. prompt:: bash # + + ceph orch host add host2 10.10.0.102 + ceph orch host add host3 10.10.0.103 + + It is best to explicitly provide the host IP address. If an IP is + not provided, then the host name will be immediately resolved via + DNS and that IP will be used. + + One or more labels can also be included to immediately label the + new host. For example, by default the ``_admin`` label will make + cephadm maintain a copy of the ``ceph.conf`` file and a + ``client.admin`` keyring file in ``/etc/ceph``: + + .. 
prompt:: bash # + + ceph orch host add host4 10.10.0.104 --labels _admin + +.. _cephadm-removing-hosts: + +Removing Hosts +============== + +A host can safely be removed from the cluster after all daemons are removed +from it. + +To drain all daemons from a host, run a command of the following form: + +.. prompt:: bash # + + ceph orch host drain *<host>* + +The ``_no_schedule`` label will be applied to the host. See +:ref:`cephadm-special-host-labels`. + +All OSDs on the host will be scheduled to be removed. You can check the progress of the OSD removal operation with the following command: + +.. prompt:: bash # + + ceph orch osd rm status + +See :ref:`cephadm-osd-removal` for more details about OSD removal. + +Use the following command to determine whether any daemons are still on the +host: + +.. prompt:: bash # + + ceph orch ps <host> + +After all daemons have been removed from the host, remove the host from the +cluster by running the following command: + +.. prompt:: bash # + + ceph orch host rm <host> + +Offline host removal +-------------------- + +Even if a host is offline and can not be recovered, it can be removed from the +cluster by running a command of the following form: + +.. prompt:: bash # + + ceph orch host rm <host> --offline --force + +.. warning:: This can potentially cause data loss. This command forcefully + purges OSDs from the cluster by calling ``osd purge-actual`` for each OSD. + Any service specs that still contain this host should be manually updated. + +.. _orchestrator-host-labels: + +Host labels +=========== + +The orchestrator supports assigning labels to hosts. Labels +are free form and have no particular meaning by itself and each host +can have multiple labels. They can be used to specify placement +of daemons. See :ref:`orch-placement-by-labels` + +Labels can be added when adding a host with the ``--labels`` flag: + +.. 
prompt:: bash # + + ceph orch host add my_hostname --labels=my_label1 + ceph orch host add my_hostname --labels=my_label1,my_label2 + +To add a label a existing host, run: + +.. prompt:: bash # + + ceph orch host label add my_hostname my_label + +To remove a label, run: + +.. prompt:: bash # + + ceph orch host label rm my_hostname my_label + + +.. _cephadm-special-host-labels: + +Special host labels +------------------- + +The following host labels have a special meaning to cephadm. All start with ``_``. + +* ``_no_schedule``: *Do not schedule or deploy daemons on this host*. + + This label prevents cephadm from deploying daemons on this host. If it is added to + an existing host that already contains Ceph daemons, it will cause cephadm to move + those daemons elsewhere (except OSDs, which are not removed automatically). + +* ``_no_autotune_memory``: *Do not autotune memory on this host*. + + This label will prevent daemon memory from being tuned even when the + ``osd_memory_target_autotune`` or similar option is enabled for one or more daemons + on that host. + +* ``_admin``: *Distribute client.admin and ceph.conf to this host*. + + By default, an ``_admin`` label is applied to the first host in the cluster (where + bootstrap was originally run), and the ``client.admin`` key is set to be distributed + to that host via the ``ceph orch client-keyring ...`` function. Adding this label + to additional hosts will normally cause cephadm to deploy config and keyring files + in ``/etc/ceph``. + +Maintenance Mode +================ + +Place a host in and out of maintenance mode (stops all Ceph daemons on host): + +.. 
prompt:: bash # + + ceph orch host maintenance enter <hostname> [--force] + ceph orch host maintenance exit <hostname> + +Where the force flag when entering maintenance allows the user to bypass warnings (but not alerts) + +See also :ref:`cephadm-fqdn` + +Rescanning Host Devices +======================= + +Some servers and external enclosures may not register device removal or insertion with the +kernel. In these scenarios, you'll need to perform a host rescan. A rescan is typically +non-disruptive, and can be performed with the following CLI command: + +.. prompt:: bash # + + ceph orch host rescan <hostname> [--with-summary] + +The ``with-summary`` flag provides a breakdown of the number of HBAs found and scanned, together +with any that failed: + +.. prompt:: bash [ceph:root@rh9-ceph1/]# + + ceph orch host rescan rh9-ceph1 --with-summary + +:: + + Ok. 2 adapters detected: 2 rescanned, 0 skipped, 0 failed (0.32s) + +Creating many hosts at once +=========================== + +Many hosts can be added at once using +``ceph orch apply -i`` by submitting a multi-document YAML file: + +.. code-block:: yaml + + service_type: host + hostname: node-00 + addr: 192.168.0.10 + labels: + - example1 + - example2 + --- + service_type: host + hostname: node-01 + addr: 192.168.0.11 + labels: + - grafana + --- + service_type: host + hostname: node-02 + addr: 192.168.0.12 + +This can be combined with service specifications (below) to create a cluster spec +file to deploy a whole cluster in one command. see ``cephadm bootstrap --apply-spec`` +also to do this during bootstrap. Cluster SSH Keys must be copied to hosts prior to adding them. + +Setting the initial CRUSH location of host +========================================== + +Hosts can contain a ``location`` identifier which will instruct cephadm to +create a new CRUSH host located in the specified hierachy. + +.. code-block:: yaml + + service_type: host + hostname: node-00 + addr: 192.168.0.10 + location: + rack: rack1 + +.. 
note:: + + The ``location`` attribute will only affect the initial CRUSH location. Subsequent + changes of the ``location`` property will be ignored. Also, removing a host will not remove + any CRUSH buckets. + +See also :ref:`crush_map_default_types`. + +SSH Configuration +================= + +Cephadm uses SSH to connect to remote hosts. SSH uses a key to authenticate +with those hosts in a secure way. + + +Default behavior +---------------- + +Cephadm stores an SSH key in the monitor that is used to +connect to remote hosts. When the cluster is bootstrapped, this SSH +key is generated automatically and no additional configuration +is necessary. + +A *new* SSH key can be generated with: + +.. prompt:: bash # + + ceph cephadm generate-key + +The public portion of the SSH key can be retrieved with: + +.. prompt:: bash # + + ceph cephadm get-pub-key + +The currently stored SSH key can be deleted with: + +.. prompt:: bash # + + ceph cephadm clear-key + +You can make use of an existing key by directly importing it with: + +.. prompt:: bash # + + ceph config-key set mgr/cephadm/ssh_identity_key -i <key> + ceph config-key set mgr/cephadm/ssh_identity_pub -i <pub> + +You will then need to restart the mgr daemon to reload the configuration with: + +.. prompt:: bash # + + ceph mgr fail + +.. _cephadm-ssh-user: + +Configuring a different SSH user +---------------------------------- + +Cephadm must be able to log into all the Ceph cluster nodes as a user +that has enough privileges to download container images, start containers +and execute commands without prompting for a password. If you do not want +to use the "root" user (default option in cephadm), you must provide +cephadm the name of the user that is going to be used to perform all the +cephadm operations. Use the command: + +.. 
prompt:: bash # + + ceph cephadm set-user <user> + +Prior to running this the cluster SSH key needs to be added to this users +authorized_keys file and non-root users must have passwordless sudo access. + + +Customizing the SSH configuration +--------------------------------- + +Cephadm generates an appropriate ``ssh_config`` file that is +used for connecting to remote hosts. This configuration looks +something like this:: + + Host * + User root + StrictHostKeyChecking no + UserKnownHostsFile /dev/null + +There are two ways to customize this configuration for your environment: + +#. Import a customized configuration file that will be stored + by the monitor with: + + .. prompt:: bash # + + ceph cephadm set-ssh-config -i <ssh_config_file> + + To remove a customized SSH config and revert back to the default behavior: + + .. prompt:: bash # + + ceph cephadm clear-ssh-config + +#. You can configure a file location for the SSH configuration file with: + + .. prompt:: bash # + + ceph config set mgr mgr/cephadm/ssh_config_file <path> + + We do *not recommend* this approach. The path name must be + visible to *any* mgr daemon, and cephadm runs all daemons as + containers. That means that the file either need to be placed + inside a customized container image for your deployment, or + manually distributed to the mgr data directory + (``/var/lib/ceph/<cluster-fsid>/mgr.<id>`` on the host, visible at + ``/var/lib/ceph/mgr/ceph-<id>`` from inside the container). + +.. _cephadm-fqdn: + +Fully qualified domain names vs bare host names +=============================================== + +.. note:: + + cephadm demands that the name of the host given via ``ceph orch host add`` + equals the output of ``hostname`` on remote hosts. + +Otherwise cephadm can't be sure that names returned by +``ceph * metadata`` match the hosts known to cephadm. This might result +in a :ref:`cephadm-stray-host` warning. 
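The requirement in the note above can be checked before a host is added. The following is a minimal sketch; ``check_host_name`` is a hypothetical helper and is not part of cephadm. It only compares two strings: the name you plan to pass to ``ceph orch host add`` and the output of ``hostname`` that you collected from the remote host.

```shell
# Hypothetical helper (not part of cephadm): compare the name you plan to pass
# to `ceph orch host add` with the output of `hostname` on that host.
check_host_name() {
    planned="$1"    # name intended for `ceph orch host add`
    reported="$2"   # output of `hostname` collected from the remote host
    if [ "$planned" = "$reported" ]; then
        echo "ok: $planned"
    else
        echo "mismatch: host reports '$reported', not '$planned'"
        return 1
    fi
}
```

One possible use, assuming root SSH access to the host, is ``check_host_name host2 "$(ssh root@host2 hostname)"``. A mismatch suggests that adding the host under the planned name may lead to the stray-host warning mentioned above.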
+ +When configuring new hosts, there are two **valid** ways to set the +``hostname`` of a host: + +1. Using the bare host name. In this case: + +- ``hostname`` returns the bare host name. +- ``hostname -f`` returns the FQDN. + +2. Using the fully qualified domain name as the host name. In this case: + +- ``hostname`` returns the FQDN +- ``hostname -s`` return the bare host name + +Note that ``man hostname`` recommends ``hostname`` to return the bare +host name: + + The FQDN (Fully Qualified Domain Name) of the system is the + name that the resolver(3) returns for the host name, such as, + ursula.example.com. It is usually the hostname followed by the DNS + domain name (the part after the first dot). You can check the FQDN + using ``hostname --fqdn`` or the domain name using ``dnsdomainname``. + + .. code-block:: none + + You cannot change the FQDN with hostname or dnsdomainname. + + The recommended method of setting the FQDN is to make the hostname + be an alias for the fully qualified name using /etc/hosts, DNS, or + NIS. For example, if the hostname was "ursula", one might have + a line in /etc/hosts which reads + + 127.0.1.1 ursula.example.com ursula + +Which means, ``man hostname`` recommends ``hostname`` to return the bare +host name. This in turn means that Ceph will return the bare host names +when executing ``ceph * metadata``. This in turn means cephadm also +requires the bare host name when adding a host to the cluster: +``ceph orch host add <bare-name>``. + +.. + TODO: This chapter needs to provide way for users to configure + Grafana in the dashboard, as this is right now very hard to do. diff --git a/doc/cephadm/index.rst b/doc/cephadm/index.rst new file mode 100644 index 000000000..bfa3a4bb2 --- /dev/null +++ b/doc/cephadm/index.rst @@ -0,0 +1,50 @@ +.. _cephadm: + +======= +Cephadm +======= + +``cephadm`` is a utility that is used to manage a Ceph cluster. 
+ +Here is a list of some of the things that ``cephadm`` can do: + +- ``cephadm`` can add a Ceph container to the cluster. +- ``cephadm`` can remove a Ceph container from the cluster. +- ``cephadm`` can update Ceph containers. + +``cephadm`` does not rely on external configuration tools like Ansible, Rook, +or Salt. However, those external configuration tools can be used to automate +operations not performed by cephadm itself. To learn more about these external +configuration tools, visit their pages: + + * https://github.com/ceph/cephadm-ansible + * https://rook.io/docs/rook/v1.10/Getting-Started/intro/ + * https://github.com/ceph/ceph-salt + +``cephadm`` manages the full lifecycle of a Ceph cluster. This lifecycle starts +with the bootstrapping process, when ``cephadm`` creates a tiny Ceph cluster on +a single node. This cluster consists of one monitor and one manager. +``cephadm`` then uses the orchestration interface to expand the cluster, adding +hosts and provisioning Ceph daemons and services. Management of this lifecycle +can be performed either via the Ceph command-line interface (CLI) or via the +dashboard (GUI). + +To use ``cephadm`` to get started with Ceph, follow the instructions in +:ref:`cephadm_deploying_new_cluster`. + +``cephadm`` was introduced in Ceph release v15.2.0 (Octopus) and does not +support older versions of Ceph. + +.. toctree:: + :maxdepth: 2 + + compatibility + install + adoption + host-management + Service Management <services/index> + upgrade + Cephadm operations <operations> + Client Setup <client-setup> + troubleshooting + Cephadm Feature Planning <../dev/cephadm/index> diff --git a/doc/cephadm/install.rst b/doc/cephadm/install.rst new file mode 100644 index 000000000..4c6179614 --- /dev/null +++ b/doc/cephadm/install.rst @@ -0,0 +1,439 @@ +.. 
_cephadm_deploying_new_cluster: + +============================ +Deploying a new Ceph cluster +============================ + +Cephadm creates a new Ceph cluster by "bootstrapping" on a single +host, expanding the cluster to encompass any additional hosts, and +then deploying the needed services. + +.. highlight:: console + +.. _cephadm-host-requirements: + +Requirements +============ + +- Python 3 +- Systemd +- Podman or Docker for running containers +- Time synchronization (such as chrony or NTP) +- LVM2 for provisioning storage devices + +Any modern Linux distribution should be sufficient. Dependencies +are installed automatically by the bootstrap process below. + +See the section :ref:`Compatibility With Podman +Versions<cephadm-compatibility-with-podman>` for a table of Ceph versions that +are compatible with Podman. Not every version of Podman is compatible with +Ceph. + + + +.. _get-cephadm: + +Install cephadm +=============== + +There are two ways to install ``cephadm``: + +#. a :ref:`curl-based installation<cephadm_install_curl>` method +#. :ref:`distribution-specific installation methods<cephadm_install_distros>` + + +.. _cephadm_install_curl: + +curl-based installation +----------------------- + +* Use ``curl`` to fetch the most recent version of the + standalone script. + + .. prompt:: bash # + :substitutions: + + curl --silent --remote-name --location https://github.com/ceph/ceph/raw/|stable-release|/src/cephadm/cephadm + + Make the ``cephadm`` script executable: + + .. prompt:: bash # + + chmod +x cephadm + + This script can be run directly from the current directory: + + .. prompt:: bash # + + ./cephadm <arguments...> + +* Although the standalone script is sufficient to get a cluster started, it is + convenient to have the ``cephadm`` command installed on the host. To install + the packages that provide the ``cephadm`` command, run the following + commands: + + .. 
prompt:: bash # + :substitutions: + + ./cephadm add-repo --release |stable-release| + ./cephadm install + + Confirm that ``cephadm`` is now in your PATH by running ``which``: + + .. prompt:: bash # + + which cephadm + + A successful ``which cephadm`` command will return this: + + .. code-block:: bash + + /usr/sbin/cephadm + +.. _cephadm_install_distros: + +distribution-specific installations +----------------------------------- + +.. important:: The methods of installing ``cephadm`` in this section are distinct from the curl-based method above. Use either the curl-based method above or one of the methods in this section, but not both the curl-based method and one of these. + +Some Linux distributions may already include up-to-date Ceph packages. In +that case, you can install cephadm directly. For example: + + In Ubuntu: + + .. prompt:: bash # + + apt install -y cephadm + + In CentOS Stream: + + .. prompt:: bash # + + dnf install --assumeyes centos-release-ceph-pacific.noarch + dnf install --assumeyes cephadm + + In Fedora: + + .. prompt:: bash # + + dnf -y install cephadm + + In SUSE: + + .. prompt:: bash # + + zypper install -y cephadm + + + +Bootstrap a new cluster +======================= + +What to know before you bootstrap +--------------------------------- + +The first step in creating a new Ceph cluster is running the ``cephadm +bootstrap`` command on the Ceph cluster's first host. The act of running the +``cephadm bootstrap`` command on the Ceph cluster's first host creates the Ceph +cluster's first "monitor daemon", and that monitor daemon needs an IP address. +You must pass the IP address of the Ceph cluster's first host to the ``cephadm +bootstrap`` command, so you'll need to know the IP address of that host. + +.. note:: If there are multiple networks and interfaces, be sure to choose one + that will be accessible by any host accessing the Ceph cluster. 
+ +Running the bootstrap command +----------------------------- + +Run the ``cephadm bootstrap`` command: + +.. prompt:: bash # + + cephadm bootstrap --mon-ip *<mon-ip>* + +This command will: + +* Create a monitor and manager daemon for the new cluster on the local + host. +* Generate a new SSH key for the Ceph cluster and add it to the root + user's ``/root/.ssh/authorized_keys`` file. +* Write a copy of the public key to ``/etc/ceph/ceph.pub``. +* Write a minimal configuration file to ``/etc/ceph/ceph.conf``. This + file is needed to communicate with the new cluster. +* Write a copy of the ``client.admin`` administrative (privileged!) + secret key to ``/etc/ceph/ceph.client.admin.keyring``. +* Add the ``_admin`` label to the bootstrap host. By default, any host + with this label will (also) get a copy of ``/etc/ceph/ceph.conf`` and + ``/etc/ceph/ceph.client.admin.keyring``. + +Further information about cephadm bootstrap +------------------------------------------- + +The default bootstrap behavior will work for most users. But if you'd like +to know more about ``cephadm bootstrap`` right away, read the list below. + +Also, you can run ``cephadm bootstrap -h`` to see all of ``cephadm``'s +available options. + +* By default, Ceph daemons send their log output to stdout/stderr, which is picked + up by the container runtime (docker or podman) and (on most systems) sent to + journald. If you want Ceph to write traditional log files to ``/var/log/ceph/$fsid``, + use the ``--log-to-file`` option during bootstrap. + +* Larger Ceph clusters perform better when (external to the Ceph cluster) + public network traffic is separated from (internal to the Ceph cluster) + cluster traffic. The internal cluster traffic handles replication, recovery, + and heartbeats between OSD daemons. You can define the :ref:`cluster + network<cluster-network>` by supplying the ``--cluster-network`` option to the ``bootstrap`` + subcommand. 
This parameter must define a subnet in CIDR notation (for example + ``10.90.90.0/24`` or ``fe80::/64``). + +* ``cephadm bootstrap`` writes to ``/etc/ceph`` the files needed to access + the new cluster. This central location makes it possible for Ceph + packages installed on the host (e.g., packages that give access to the + cephadm command line interface) to find these files. + + Daemon containers deployed with cephadm, however, do not need + ``/etc/ceph`` at all. Use the ``--output-dir *<directory>*`` option + to put them in a different directory (for example, ``.``). This may help + avoid conflicts with an existing Ceph configuration (cephadm or + otherwise) on the same host. + +* You can pass any initial Ceph configuration options to the new + cluster by putting them in a standard ini-style configuration file + and using the ``--config *<config-file>*`` option. For example:: + + $ cat <<EOF > initial-ceph.conf + [global] + osd crush chooseleaf type = 0 + EOF + $ ./cephadm bootstrap --config initial-ceph.conf ... + +* The ``--ssh-user *<user>*`` option makes it possible to choose which SSH + user cephadm will use to connect to hosts. The associated SSH key will be + added to ``/home/*<user>*/.ssh/authorized_keys``. The user that you + designate with this option must have passwordless sudo access. + +* If you are using a container on an authenticated registry that requires + login, you may add the argument: + + * ``--registry-json <path to json file>`` + + example contents of JSON file with login info:: + + {"url":"REGISTRY_URL", "username":"REGISTRY_USERNAME", "password":"REGISTRY_PASSWORD"} + + Cephadm will attempt to log in to this registry so it can pull your container + and then store the login info in its config database. Other hosts added to + the cluster will then also be able to make use of the authenticated registry. + +* See :ref:`cephadm-deployment-scenarios` for additional examples for using ``cephadm bootstrap``. + +.. 
_cephadm-enable-cli: + +Enable Ceph CLI +=============== + +Cephadm does not require any Ceph packages to be installed on the +host. However, we recommend enabling easy access to the ``ceph`` +command. There are several ways to do this: + +* The ``cephadm shell`` command launches a bash shell in a container + with all of the Ceph packages installed. By default, if + configuration and keyring files are found in ``/etc/ceph`` on the + host, they are passed into the container environment so that the + shell is fully functional. Note that when executed on a MON host, + ``cephadm shell`` will infer the ``config`` from the MON container + instead of using the default configuration. If ``--mount <path>`` + is given, then the host ``<path>`` (file or directory) will appear + under ``/mnt`` inside the container: + + .. prompt:: bash # + + cephadm shell + +* To execute ``ceph`` commands, you can also run commands like this: + + .. prompt:: bash # + + cephadm shell -- ceph -s + +* You can install the ``ceph-common`` package, which contains all of the + ceph commands, including ``ceph``, ``rbd``, ``mount.ceph`` (for mounting + CephFS file systems), etc.: + + .. prompt:: bash # + :substitutions: + + cephadm add-repo --release |stable-release| + cephadm install ceph-common + +Confirm that the ``ceph`` command is accessible with: + +.. prompt:: bash # + + ceph -v + + +Confirm that the ``ceph`` command can connect to the cluster and +report its status with: + +.. prompt:: bash # + + ceph status + +Adding Hosts +============ + +Next, add all hosts to the cluster by following :ref:`cephadm-adding-hosts`. + +By default, a ``ceph.conf`` file and a copy of the ``client.admin`` keyring +are maintained in ``/etc/ceph`` on all hosts with the ``_admin`` label, which is initially +applied only to the bootstrap host. We usually recommend that one or more other hosts be +given the ``_admin`` label so that the Ceph CLI (e.g., via ``cephadm shell``) is easily +accessible on multiple hosts.
To add the ``_admin`` label to additional host(s), run a command of the +following form: + + .. prompt:: bash # + + ceph orch host label add *<host>* _admin + +Adding additional MONs +====================== + +A typical Ceph cluster has three or five monitor daemons spread +across different hosts. We recommend deploying five +monitors if there are five or more nodes in your cluster. + +Please follow :ref:`deploy_additional_monitors` to deploy additional MONs. + +Adding Storage +============== + +To add storage to the cluster, tell Ceph to consume any +available and unused device: + + .. prompt:: bash # + + ceph orch apply osd --all-available-devices + +See :ref:`cephadm-deploy-osds` for more detailed instructions. + +Enabling OSD memory autotuning +------------------------------ + +.. warning:: By default, cephadm enables ``osd_memory_target_autotune`` on bootstrap, with ``mgr/cephadm/autotune_memory_target_ratio`` set to ``.7`` of total host memory. + +See :ref:`osd_autotune`. + +To deploy hyperconverged Ceph with TripleO, please refer to the TripleO documentation: `Scenario: Deploy Hyperconverged Ceph <https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/cephadm.html#scenario-deploy-hyperconverged-ceph>`_ + +In other cases where the cluster hardware is not exclusively used by Ceph (hyperconverged), +reduce the memory consumption of Ceph like so: + + .. prompt:: bash # + + # hyperconverged only: + ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2 + +Then enable memory autotuning: + + .. prompt:: bash # + + ceph config set osd osd_memory_target_autotune true + + +Using Ceph +========== + +To use the *Ceph Filesystem*, follow :ref:`orchestrator-cli-cephfs`. + +To use the *Ceph Object Gateway*, follow :ref:`cephadm-deploy-rgw`. + +To use *NFS*, follow :ref:`deploy-cephadm-nfs-ganesha`. + +To use *iSCSI*, follow :ref:`cephadm-iscsi`. + +..
_cephadm-deployment-scenarios: + +Different deployment scenarios +============================== + +Single host +----------- + +To configure a Ceph cluster to run on a single host, use the +``--single-host-defaults`` flag when bootstrapping. For use cases of this, see +:ref:`one-node-cluster`. + +The ``--single-host-defaults`` flag sets the following configuration options:: + + global/osd_crush_chooseleaf_type = 0 + global/osd_pool_default_size = 2 + mgr/mgr_standby_modules = False + +For more information on these options, see :ref:`one-node-cluster` and +``mgr_standby_modules`` in :ref:`mgr-administrator-guide`. + +.. _cephadm-airgap: + +Deployment in an isolated environment +------------------------------------- + +You might need to install cephadm in an environment that is not connected +directly to the internet (such an environment is also called an "isolated +environment"). This can be done if a custom container registry is used. Either +of two kinds of custom container registry can be used in this scenario: (1) a +Podman-based or Docker-based insecure registry, or (2) a secure registry. + +The practice of installing software on systems that are not connected directly +to the internet is called "airgapping" and registries that are not connected +directly to the internet are referred to as "airgapped". + +Make sure that your container image is inside the registry. Make sure that you +have access to all hosts that you plan to add to the cluster. + +#. Run a local container registry: + + .. prompt:: bash # + + podman run --privileged -d --name registry -p 5000:5000 -v /var/lib/registry:/var/lib/registry --restart=always registry:2 + +#. If you are using an insecure registry, configure Podman or Docker with the + hostname and port where the registry is running. + + .. note:: You must repeat this step for every host that accesses the local + insecure registry. + +#. Push your container image to your local registry. 
Here are some acceptable + kinds of container images: + + * Ceph container image. See :ref:`containers`. + * Prometheus container image + * Node exporter container image + * Grafana container image + * Alertmanager container image + +#. Create a temporary configuration file to store the names of the monitoring + images. (See :ref:`cephadm_monitoring-images`): + + .. prompt:: bash $ + + cat <<EOF > initial-ceph.conf + + :: + + [mgr] + mgr/cephadm/container_image_prometheus *<hostname>*:5000/prometheus + mgr/cephadm/container_image_node_exporter *<hostname>*:5000/node_exporter + mgr/cephadm/container_image_grafana *<hostname>*:5000/grafana + mgr/cephadm/container_image_alertmanager *<hostname>*:5000/alertmanager + +#. Run bootstrap using the ``--image`` flag and pass the name of your + container image as the argument of the image flag. For example: + + .. prompt:: bash # + + cephadm --image *<hostname>*:5000/ceph/ceph bootstrap --mon-ip *<mon-ip>* + +.. _cluster network: ../rados/configuration/network-config-ref#cluster-network diff --git a/doc/cephadm/operations.rst b/doc/cephadm/operations.rst new file mode 100644 index 000000000..23b396b51 --- /dev/null +++ b/doc/cephadm/operations.rst @@ -0,0 +1,545 @@ +================== +Cephadm Operations +================== + +.. _watching_cephadm_logs: + +Watching cephadm log messages +============================= + +Cephadm writes logs to the ``cephadm`` cluster log channel. You can +monitor Ceph's activity in real time by reading the logs as they fill +up. Run the following command to see the logs in real time: + +.. prompt:: bash # + + ceph -W cephadm + +By default, this command shows info-level events and above. To see +debug-level messages as well as info-level events, run the following +commands: + +.. prompt:: bash # + + ceph config set mgr mgr/cephadm/log_to_cluster_level debug + ceph -W cephadm --watch-debug + +.. warning:: + + The debug messages are very verbose!
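Because debug output is verbose, it is usually worth returning to the info level once troubleshooting is finished. The following is a minimal sketch: only the ``ceph config set`` command comes from the text above, while the wrapper function ``set_cephadm_log_level`` and the ``CEPH_BIN`` variable are hypothetical conveniences. The sketch accepts only the two levels mentioned in this section.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of Ceph.
# CEPH_BIN is an assumed override so that the function can be exercised
# without a running cluster; it defaults to the real `ceph` command.
set_cephadm_log_level() {
    local level="$1"
    case "$level" in
        debug|info) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    "${CEPH_BIN:-ceph}" config set mgr mgr/cephadm/log_to_cluster_level "$level"
}

# Usage on a cluster node:
#   set_cephadm_log_level debug   # then run: ceph -W cephadm --watch-debug
#   set_cephadm_log_level info    # return to info level when finished
```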
+ +You can see recent events by running the following command: + +.. prompt:: bash # + + ceph log last cephadm + +These events are also logged to the ``ceph.cephadm.log`` file on +monitor hosts as well as to the monitor daemons' stderr. + + +.. _cephadm-logs: + +Ceph daemon logs +================ + +Logging to journald +------------------- + +Ceph daemons traditionally write logs to ``/var/log/ceph``. Ceph daemons log to +journald by default and Ceph logs are captured by the container runtime +environment. They are accessible via ``journalctl``. + +.. note:: Prior to Quincy, ceph daemons logged to stderr. + +Example of logging to journald +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For example, to view the logs for the daemon ``mon.foo`` for a cluster +with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be +something like: + +.. prompt:: bash # + + journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo + +This works well for normal operations when logging levels are low. + +Logging to files +---------------- + +You can also configure Ceph daemons to log to files instead of to +journald if you prefer logs to appear in files (as they did in earlier, +pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files, +the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you choose to +configure Ceph to log to files instead of to journald, remember to +configure Ceph so that it will not log to journald (the commands for +this are covered below). + +Enabling logging to files +~~~~~~~~~~~~~~~~~~~~~~~~~ + +To enable logging to files, run the following commands: + +.. prompt:: bash # + + ceph config set global log_to_file true + ceph config set global mon_cluster_log_to_file true + +Disabling logging to journald +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you choose to log to files, we recommend disabling logging to journald or else +everything will be logged twice. Run the following commands to disable logging +to stderr: + +.. 
prompt:: bash # + + ceph config set global log_to_stderr false + ceph config set global mon_cluster_log_to_stderr false + ceph config set global log_to_journald false + ceph config set global mon_cluster_log_to_journald false + +.. note:: You can change the default by passing --log-to-file during + bootstrapping a new cluster. + +Modifying the log retention schedule +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By default, cephadm sets up log rotation on each host to rotate these +files. You can configure the logging retention schedule by modifying +``/etc/logrotate.d/ceph.<cluster-fsid>``. + + +Data location +============= + +Cephadm stores daemon data and logs in different locations than did +older, pre-cephadm (pre Octopus) versions of ceph: + +* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By + default, cephadm logs via stderr and the container runtime. These + logs will not exist unless you have enabled logging to files as + described in `cephadm-logs`_. +* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data + (besides logs). +* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for + an individual daemon. +* ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for + the cluster. +* ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon + data directories for stateful daemons (e.g., monitor, prometheus) + that have been removed by cephadm. + +Disk usage +---------- + +Because a few Ceph daemons (notably, the monitors and prometheus) store a +large amount of data in ``/var/lib/ceph`` , we recommend moving this +directory to its own disk, partition, or logical volume so that it does not +fill up the root file system. + + +Health checks +============= +The cephadm module provides additional health checks to supplement the +default health checks provided by the Cluster. 
These additional health +checks fall into two categories: + +- **cephadm operations**: Health checks in this category are always + executed when the cephadm module is active. +- **cluster configuration**: These health checks are *optional*, and + focus on the configuration of the hosts in the cluster. + +CEPHADM Operations +------------------ + +CEPHADM_PAUSED +~~~~~~~~~~~~~~ + +This indicates that cephadm background work has been paused with +``ceph orch pause``. Cephadm continues to perform passive monitoring +activities (like checking host and daemon status), but it will not +make any changes (like deploying or removing daemons). + +Resume cephadm work by running the following command: + +.. prompt:: bash # + + ceph orch resume + +.. _cephadm-stray-host: + +CEPHADM_STRAY_HOST +~~~~~~~~~~~~~~~~~~ + +This indicates that one or more hosts have Ceph daemons that are +running, but are not registered as hosts managed by *cephadm*. This +means that those services cannot currently be managed by cephadm +(e.g., restarted, upgraded, included in `ceph orch ps`). + +* You can manage the host(s) by running the following command: + + .. prompt:: bash # + + ceph orch host add *<hostname>* + + .. note:: + + You might need to configure SSH access to the remote host + before this will work. + +* See :ref:`cephadm-fqdn` for more information about host names and + domain names. + +* Alternatively, you can manually connect to the host and ensure that + services on that host are removed or migrated to a host that is + managed by *cephadm*. + +* This warning can be disabled entirely by running the following + command: + + .. prompt:: bash # + + ceph config set mgr mgr/cephadm/warn_on_stray_hosts false + +CEPHADM_STRAY_DAEMON +~~~~~~~~~~~~~~~~~~~~ + +One or more Ceph daemons are running but are not managed by +*cephadm*. This may be because they were deployed using a different +tool, or because they were started manually.
Those +services cannot currently be managed by cephadm (e.g., restarted, +upgraded, or included in `ceph orch ps`). + +* If the daemon is a stateful one (monitor or OSD), it should be adopted + by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is + usually easiest to provision a new daemon with the ``ceph orch apply`` + command and then stop the unmanaged daemon. + +* If the stray daemon(s) are running on hosts not managed by cephadm, you can manage the host(s) by running the following command: + + .. prompt:: bash # + + ceph orch host add *<hostname>* + + .. note:: + + You might need to configure SSH access to the remote host + before this will work. + +* See :ref:`cephadm-fqdn` for more information about host names and + domain names. + +* This warning can be disabled entirely by running the following command: + + .. prompt:: bash # + + ceph config set mgr mgr/cephadm/warn_on_stray_daemons false + +CEPHADM_HOST_CHECK_FAILED +~~~~~~~~~~~~~~~~~~~~~~~~~ + +One or more hosts have failed the basic cephadm host check, which verifies +that (1) the host is reachable and cephadm can be executed there, and (2) +that the host satisfies basic prerequisites, like a working container +runtime (podman or docker) and working time synchronization. +If this test fails, cephadm will not be able to manage services on that host. + +You can manually run this check by running the following command: + +.. prompt:: bash # + + ceph cephadm check-host *<hostname>* + +You can remove a broken host from management by running the following command: + +.. prompt:: bash # + + ceph orch host rm *<hostname>* + +You can disable this health warning by running the following command: + +.. prompt:: bash # + + ceph config set mgr mgr/cephadm/warn_on_failed_host_check false + +Cluster Configuration Checks +---------------------------- +Cephadm periodically scans each of the hosts in the cluster in order +to understand the state of the OS, disks, NICs, etc.
These facts can +then be analysed for consistency across the hosts in the cluster to +identify any configuration anomalies. + +Enabling Cluster Configuration Checks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The configuration checks are an **optional** feature, and are enabled +by running the following command: + +.. prompt:: bash # + + ceph config set mgr mgr/cephadm/config_checks_enabled true + +States Returned by Cluster Configuration Checks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The configuration checks are triggered after each host scan (1m). The +cephadm log entries will show the current state and outcome of the +configuration checks as follows: + +Disabled state (config_checks_enabled false): + +.. code-block:: bash + + ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable + +Enabled state (config_checks_enabled true): + +.. code-block:: bash + + CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected + +Managing Configuration Checks (subcommands) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The configuration checks themselves are managed through several cephadm subcommands. + +To determine whether the configuration checks are enabled, run the following command: + +.. prompt:: bash # + + ceph cephadm config-check status + +This command returns the status of the configuration checker as either "Enabled" or "Disabled". + + +To list all the configuration checks and their current states, run the following command: + +.. 
code-block:: console + + # ceph cephadm config-check ls + + NAME HEALTHCHECK STATUS DESCRIPTION + kernel_security CEPHADM_CHECK_KERNEL_LSM enabled checks SELINUX/Apparmor profiles are consistent across cluster hosts + os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled checks subscription states are consistent for all cluster hosts + public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a NIC on the Ceph public_network + osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting + osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common linkspeed + network_missing CEPHADM_CHECK_NETWORK_MISSING enabled checks that the cluster/public networks defined exist on the Ceph hosts + ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active) + kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the MAJ.MIN of the kernel on Ceph hosts is consistent + +The name of each configuration check can be used to enable or disable a specific check by running a command of the following form: + +.. prompt:: bash # + + ceph cephadm config-check disable <name> + +For example: + +.. prompt:: bash # + + ceph cephadm config-check disable kernel_security + +CEPHADM_CHECK_KERNEL_LSM +~~~~~~~~~~~~~~~~~~~~~~~~ +Each host within the cluster is expected to operate within the same Linux +Security Module (LSM) state. For example, if the majority of the hosts are +running with SELINUX in enforcing mode, any host not running in this mode is +flagged as an anomaly and a health check (WARNING) state is raised. + +CEPHADM_CHECK_SUBSCRIPTION +~~~~~~~~~~~~~~~~~~~~~~~~~~ +This check relates to the status of vendor subscription. This check is +performed only for hosts using RHEL, but helps to confirm that all hosts are +covered by an active subscription, which ensures that patches and updates are +available.
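Several of these checks, such as ``CEPHADM_CHECK_KERNEL_LSM``, compare each host against the value that the majority of hosts share. The following is a minimal sketch of that idea. It is illustrative only and is not cephadm's implementation; the function name ``find_anomalies``, the host names, and the sample values are all hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: this is not how cephadm implements its checks.
# Reads "<host> <value>" pairs on stdin, finds the most common value,
# and prints every host whose value differs from it.
find_anomalies() {
    awk '
        { host[NR] = $1; val[NR] = $2; count[$2]++ }
        END {
            best = ""; max = 0
            for (v in count) if (count[v] > max) { max = count[v]; best = v }
            for (i = 1; i <= NR; i++) if (val[i] != best) print host[i]
        }
    '
}

# Hypothetical hosts and LSM states; prints: host3
printf '%s\n' 'host1 enforcing' 'host2 enforcing' 'host3 permissive' | find_anomalies
```

The sketch assumes that one value has a clear majority. If two values are tied, the value that is treated as the baseline is not defined.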
+ +CEPHADM_CHECK_PUBLIC_MEMBERSHIP +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +All members of the cluster should have NICs configured on at least one of the +public network subnets. Hosts that are not on the public network will rely on +routing, which may affect performance. + +CEPHADM_CHECK_MTU +~~~~~~~~~~~~~~~~~ +The MTU of the NICs on OSDs can be a key factor in consistent performance. This +check examines hosts that are running OSD services to ensure that the MTU is +configured consistently within the cluster. This is determined by establishing +the MTU setting that the majority of hosts is using. Any anomalies result in a +Ceph health check. + +CEPHADM_CHECK_LINKSPEED +~~~~~~~~~~~~~~~~~~~~~~~ +This check is similar to the MTU check. Linkspeed consistency is a factor in +consistent cluster performance, just as the MTU of the NICs on the OSDs is. +This check determines the linkspeed shared by the majority of OSD hosts, and a +health check is run for any hosts that are set at a lower linkspeed rate. + +CEPHADM_CHECK_NETWORK_MISSING +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The `public_network` and `cluster_network` settings support subnet definitions +for IPv4 and IPv6. If these settings are not found on any host in the cluster, +a health check is raised. + +CEPHADM_CHECK_CEPH_RELEASE +~~~~~~~~~~~~~~~~~~~~~~~~~~ +Under normal operations, the Ceph cluster runs daemons under the same ceph +release (that is, the Ceph cluster runs all daemons under (for example) +Octopus). This check determines the active release for each daemon, and +reports any anomalies as a healthcheck. *This check is bypassed if an upgrade +process is active within the cluster.* + +CEPHADM_CHECK_KERNEL_VERSION +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The OS kernel version (maj.min) is checked for consistency across the hosts. +The kernel version of the majority of the hosts is used as the basis for +identifying anomalies. + +.. 
_client_keyrings_and_configs: + +Client keyrings and configs +=========================== + +Cephadm can distribute copies of the ``ceph.conf`` file and client keyring +files to hosts. It is usually a good idea to store a copy of the config and +``client.admin`` keyring on any host used to administer the cluster via the +CLI. By default, cephadm does this for any nodes that have the ``_admin`` +label (which normally includes the bootstrap host). + +When a client keyring is placed under management, cephadm will: + + - build a list of target hosts based on the specified placement spec (see + :ref:`orchestrator-cli-placement-spec`) + - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s) + - store a copy of the keyring file on the specified host(s) + - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors) + - update the keyring file if the entity's key is changed (e.g., via ``ceph + auth ...`` commands) + - ensure that the keyring file has the specified ownership and specified mode + - remove the keyring file when client keyring management is disabled + - remove the keyring file from old hosts if the keyring placement spec is + updated (as needed) + +Listing Client Keyrings +----------------------- + +To see the list of client keyrings that are currently under management, run the following command: + +.. prompt:: bash # + + ceph orch client-keyring ls + +Putting a Keyring Under Management +---------------------------------- + +To put a keyring under management, run a command of the following form: + +.. prompt:: bash # + + ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>.<gid>] [--path=<path>] + +- By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is + where Ceph looks by default. Be careful when specifying alternate locations, + as existing files may be overwritten. +- A placement of ``*`` (all hosts) is common.
+- The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root). + +For example, to create a ``client.rbd`` key and deploy it to hosts with the +``rbd-client`` label and make it group readable by uid/gid 107 (qemu), run the +following commands: + +.. prompt:: bash # + + ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool' + ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640 + +The resulting keyring file is: + +.. code-block:: console + + -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring + +Disabling Management of a Keyring File +-------------------------------------- + +To disable management of a keyring file, run a command of the following form: + +.. prompt:: bash # + + ceph orch client-keyring rm <entity> + +.. note:: + + This deletes any keyring files for this entity that were previously written + to cluster nodes. + +.. _etc_ceph_conf_distribution: + +/etc/ceph/ceph.conf +=================== + +Distributing ceph.conf to hosts that have no keyrings +----------------------------------------------------- + +It might be useful to distribute ``ceph.conf`` files to hosts without an +associated client keyring file. By default, cephadm deploys only a +``ceph.conf`` file to hosts where a client keyring is also distributed (see +above). To write config files to hosts without client keyrings, run the +following command: + +.. prompt:: bash # + + ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true + +Using Placement Specs to specify which hosts get keyrings +--------------------------------------------------------- + +By default, the configs are written to all hosts (i.e., those listed by ``ceph +orch host ls``). To specify which hosts get a ``ceph.conf``, run a command of +the following form: + +.. 
prompt:: bash # + + ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec> + +Distributing ceph.conf to hosts tagged with bare_config +------------------------------------------------------- + +For example, to distribute configs to hosts with the ``bare_config`` label, run the following command: + +.. prompt:: bash # + + ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config + +(See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.) + +Purging a cluster +================= + +.. danger:: THIS OPERATION WILL DESTROY ALL DATA STORED IN THIS CLUSTER + +To destroy a cluster and delete all data stored in it, first pause +cephadm to avoid deploying new daemons: + +.. prompt:: bash # + + ceph orch pause + +Then verify the FSID of the cluster: + +.. prompt:: bash # + + ceph fsid + +Purge the Ceph daemons from all hosts in the cluster: + +.. prompt:: bash # + + # For each host: + cephadm rm-cluster --force --zap-osds --fsid <fsid> diff --git a/doc/cephadm/services/custom-container.rst b/doc/cephadm/services/custom-container.rst new file mode 100644 index 000000000..3ece248c5 --- /dev/null +++ b/doc/cephadm/services/custom-container.rst @@ -0,0 +1,79 @@ +======================== +Custom Container Service +======================== + +The orchestrator enables custom containers to be deployed using a YAML file. +A corresponding :ref:`orchestrator-cli-service-spec` must look like: + +.. code-block:: yaml + + service_type: container + service_id: foo + placement: + ...
+ spec: + image: docker.io/library/foo:latest + entrypoint: /usr/bin/foo + uid: 1000 + gid: 1000 + args: + - "--net=host" + - "--cpus=2" + ports: + - 8080 + - 8443 + envs: + - SECRET=mypassword + - PORT=8080 + - PUID=1000 + - PGID=1000 + volume_mounts: + CONFIG_DIR: /etc/foo + bind_mounts: + - ['type=bind', 'source=lib/modules', 'destination=/lib/modules', 'ro=true'] + dirs: + - CONFIG_DIR + files: + CONFIG_DIR/foo.conf: + - refresh=true + - username=xyz + - "port: 1234" + +where the properties of a service specification are: + +* ``service_id`` + A unique name of the service. +* ``image`` + The name of the Docker image. +* ``uid`` + The UID to use when creating directories and files in the host system. +* ``gid`` + The GID to use when creating directories and files in the host system. +* ``entrypoint`` + Overwrite the default ENTRYPOINT of the image. +* ``args`` + A list of additional Podman/Docker command line arguments. +* ``ports`` + A list of TCP ports to open in the host firewall. +* ``envs`` + A list of environment variables. +* ``bind_mounts`` + When you use a bind mount, a file or directory on the host machine + is mounted into the container. Relative `source=...` paths will be + located below `/var/lib/ceph/<cluster-fsid>/<daemon-name>`. +* ``volume_mounts`` + When you use a volume mount, a new directory is created within + Docker’s storage directory on the host machine, and Docker manages + that directory’s contents. Relative source paths will be located below + `/var/lib/ceph/<cluster-fsid>/<daemon-name>`. +* ``dirs`` + A list of directories that are created below + `/var/lib/ceph/<cluster-fsid>/<daemon-name>`. +* ``files`` + A dictionary, where the key is the relative path of the file and the + value the file content. The content must be double quoted when using + a string. Use '\\n' for line breaks in that case. Otherwise define + multi-line content as list of strings. 
The given files will be created + below the directory `/var/lib/ceph/<cluster-fsid>/<daemon-name>`. + The absolute path of the directory where the file will be created must + exist. Use the `dirs` property to create them if necessary. diff --git a/doc/cephadm/services/index.rst b/doc/cephadm/services/index.rst new file mode 100644 index 000000000..26fd8864a --- /dev/null +++ b/doc/cephadm/services/index.rst @@ -0,0 +1,658 @@ +================== +Service Management +================== + +A service is a group of daemons configured together. See these chapters +for details on individual services: + +.. toctree:: + :maxdepth: 1 + + mon + mgr + osd + rgw + mds + nfs + iscsi + custom-container + monitoring + snmp-gateway + +Service Status +============== + + +To see the status of one +of the services running in the Ceph cluster, do the following: + +#. Use the command line to print a list of services. +#. Locate the service whose status you want to check. +#. Print the status of the service. + +The following command prints a list of services known to the orchestrator. To +limit the output to services only on a specified host, use the optional +``--host`` parameter. To limit the output to services of only a particular +type, use the optional ``--type`` parameter (mon, osd, mgr, mds, rgw): + + .. prompt:: bash # + + ceph orch ls [--service_type type] [--service_name name] [--export] [--format f] [--refresh] + +Discover the status of a particular service or daemon: + + .. prompt:: bash # + + ceph orch ls --service_type type --service_name <name> [--refresh] + +To export the service specifications knows to the orchestrator, run the following command. + + .. prompt:: bash # + + ceph orch ls --export + +The service specifications exported with this command will be exported as yaml +and that yaml can be used with the ``ceph orch apply -i`` command. 
+ +For information about retrieving the specifications of single services (including examples of commands), see :ref:`orchestrator-cli-service-spec-retrieve`. + +Daemon Status +============= + +A daemon is a systemd unit that is running and part of a service. + +To see the status of a daemon, do the following: + +#. Print a list of all daemons known to the orchestrator. +#. Query the status of the target daemon. + +First, print a list of all daemons known to the orchestrator: + + .. prompt:: bash # + + ceph orch ps [--hostname host] [--daemon_type type] [--service_name name] [--daemon_id id] [--format f] [--refresh] + +Then query the status of a particular service instance (mon, osd, mds, rgw). +For OSDs the id is the numeric OSD ID. For MDS services the id is the file +system name: + + .. prompt:: bash # + + ceph orch ps --daemon_type osd --daemon_id 0 + +.. _orchestrator-cli-service-spec: + +Service Specification +===================== + +A *Service Specification* is a data structure that is used to specify the +deployment of services. In addition to parameters such as `placement` or +`networks`, the user can set initial values of service configuration parameters +by means of the `config` section. For each param/value configuration pair, +cephadm calls the following command to set its value: + + .. prompt:: bash # + + ceph config set <service-name> <param> <value> + +cephadm raises health warnings if invalid configuration parameters are +found in the spec (`CEPHADM_INVALID_CONFIG_OPTION`) or if an error occurs while +trying to apply the new configuration option(s) (`CEPHADM_FAILED_SET_OPTION`). + +Here is an example of a service specification in YAML: + +.. code-block:: yaml + + service_type: rgw + service_id: realm.zone + placement: + hosts: + - host1 + - host2 + - host3 + config: + param_1: val_1 + ... + param_N: val_N + unmanaged: false + networks: + - 192.169.142.0/24 + spec: + # Additional service specific attributes.
+ +In this example, the properties of this service specification are: + +.. py:currentmodule:: ceph.deployment.service_spec + +.. autoclass:: ServiceSpec + :members: + +Each service type can have additional service-specific properties. + +Service specifications of type ``mon``, ``mgr``, and the monitoring +types do not require a ``service_id``. + +A service of type ``osd`` is described in :ref:`drivegroups` + +Many service specifications can be applied at once using ``ceph orch apply -i`` +by submitting a multi-document YAML file:: + + cat <<EOF | ceph orch apply -i - + service_type: mon + placement: + host_pattern: "mon*" + --- + service_type: mgr + placement: + host_pattern: "mgr*" + --- + service_type: osd + service_id: default_drive_group + placement: + host_pattern: "osd*" + data_devices: + all: true + EOF + +.. _orchestrator-cli-service-spec-retrieve: + +Retrieving the running Service Specification +-------------------------------------------- + +If the services have been started via ``ceph orch apply...``, then directly changing +the Services Specification is complicated. Instead of attempting to directly change +the Services Specification, we suggest exporting the running Service Specification by +following these instructions: + + .. prompt:: bash # + + ceph orch ls --service-name rgw.<realm>.<zone> --export > rgw.<realm>.<zone>.yaml + ceph orch ls --service-type mgr --export > mgr.yaml + ceph orch ls --export > cluster.yaml + +The Specification can then be changed and re-applied as above. + +Updating Service Specifications +------------------------------- + +The Ceph Orchestrator maintains a declarative state of each +service in a ``ServiceSpec``. For certain operations, like updating +the RGW HTTP port, we need to update the existing +specification. + +1. List the current ``ServiceSpec``: + + .. prompt:: bash # + + ceph orch ls --service_name=<service-name> --export > myservice.yaml + +2. Update the yaml file: + + .. 
prompt:: bash # + + vi myservice.yaml + +3. Apply the new ``ServiceSpec``: + + .. prompt:: bash # + + ceph orch apply -i myservice.yaml [--dry-run] + +.. _orchestrator-cli-placement-spec: + +Daemon Placement +================ + +For the orchestrator to deploy a *service*, it needs to know where to deploy +*daemons*, and how many to deploy. This is the role of a placement +specification. Placement specifications can be passed either as command line arguments +or in a YAML file. + +.. note:: + + cephadm will not deploy daemons on hosts with the ``_no_schedule`` label; see :ref:`cephadm-special-host-labels`. + +.. note:: + The **apply** command can be confusing. For this reason, we recommend using + YAML specifications. + + Each ``ceph orch apply <service-name>`` command supersedes the one before it. + If you do not use the proper syntax, you will clobber your work + as you go. + + For example: + + .. prompt:: bash # + + ceph orch apply mon host1 + ceph orch apply mon host2 + ceph orch apply mon host3 + + This results in only one host having a monitor applied to it: host3. + + (The first command creates a monitor on host1. Then the second command + clobbers the monitor on host1 and creates a monitor on host2. Then the + third command clobbers the monitor on host2 and creates a monitor on + host3. In this scenario, at this point, there is a monitor ONLY on + host3.) + + To make certain that a monitor is applied to each of these three hosts, + run a command like this: + + .. prompt:: bash # + + ceph orch apply mon "host1,host2,host3" + + There is another way to apply monitors to multiple hosts: a ``yaml`` file + can be used. Instead of using the "ceph orch apply mon" commands, run a + command of this form: + + .. prompt:: bash # + + ceph orch apply -i file.yaml + + Here is a sample **file.yaml** file: + + ..
code-block:: yaml + + service_type: mon + placement: + hosts: + - host1 + - host2 + - host3 + +Explicit placements +------------------- + +Daemons can be explicitly placed on hosts by simply specifying them: + + .. prompt:: bash # + + ceph orch apply prometheus --placement="host1 host2 host3" + +Or in YAML: + +.. code-block:: yaml + + service_type: prometheus + placement: + hosts: + - host1 + - host2 + - host3 + +MONs and other services may require some enhanced network specifications: + + .. prompt:: bash # + + ceph orch daemon add mon --placement="myhost:[v2:1.2.3.4:3300,v1:1.2.3.4:6789]=name" + +where ``[v2:1.2.3.4:3300,v1:1.2.3.4:6789]`` is the network address of the monitor +and ``=name`` specifies the name of the new monitor. + +.. _orch-placement-by-labels: + +Placement by labels +------------------- + +Daemon placement can be limited to hosts that match a specific label. To add +the label ``mylabel`` to the appropriate hosts, run this command: + + .. prompt:: bash # + + ceph orch host label add *<hostname>* mylabel + + To view the current hosts and labels, run this command: + + .. prompt:: bash # + + ceph orch host ls + + For example: + + .. prompt:: bash # + + ceph orch host label add host1 mylabel + ceph orch host label add host2 mylabel + ceph orch host label add host3 mylabel + ceph orch host ls + + .. code-block:: bash + + HOST ADDR LABELS STATUS + host1 mylabel + host2 mylabel + host3 mylabel + host4 + host5 + +Now, tell cephadm to deploy daemons based on the label by running +this command: + + .. prompt:: bash # + + ceph orch apply prometheus --placement="label:mylabel" + +Or in YAML: + +.. code-block:: yaml + + service_type: prometheus + placement: + label: "mylabel" + +* See :ref:`orchestrator-host-labels` + +Placement by pattern matching +----------------------------- + +Daemons can also be placed on hosts whose names match a pattern: + + .. prompt:: bash # + + ceph orch apply prometheus --placement='myhost[1-3]' + +Or in YAML: + +..
code-block:: yaml + + service_type: prometheus + placement: + host_pattern: "myhost[1-3]" + +To place a service on *all* hosts, use ``"*"``: + + .. prompt:: bash # + + ceph orch apply node-exporter --placement='*' + +Or in YAML: + +.. code-block:: yaml + + service_type: node-exporter + placement: + host_pattern: "*" + + +Changing the number of daemons +------------------------------ + +By specifying ``count``, only the number of daemons specified will be created: + + .. prompt:: bash # + + ceph orch apply prometheus --placement=3 + +To deploy *daemons* on a subset of hosts, specify the count: + + .. prompt:: bash # + + ceph orch apply prometheus --placement="2 host1 host2 host3" + +If the count is greater than the number of hosts, cephadm deploys one per host: + + .. prompt:: bash # + + ceph orch apply prometheus --placement="3 host1 host2" + +The command immediately above results in two Prometheus daemons. + +YAML can also be used to specify limits, in the following way: + +.. code-block:: yaml + + service_type: prometheus + placement: + count: 3 + +YAML can also be used to specify limits on hosts: + +.. code-block:: yaml + + service_type: prometheus + placement: + count: 2 + hosts: + - host1 + - host2 + - host3 + +.. _cephadm_co_location: + +Co-location of daemons +---------------------- + +Cephadm supports the deployment of multiple daemons on the same host: + +.. code-block:: yaml + + service_type: rgw + placement: + label: rgw + count_per_host: 2 + +The main reason for deploying multiple daemons per host is an additional +performance benefit for running multiple RGW and MDS daemons on the same host. + +See also: + +* :ref:`cephadm_mgr_co_location`. +* :ref:`cephadm-rgw-designated_gateways`. + +This feature was introduced in Pacific. + +Algorithm description +--------------------- + +Cephadm's declarative state consists of a list of service specifications +containing placement specifications.
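As a rough illustration of the host selection that this section describes, the Python sketch below resolves a placement specification to a list of hosts. It is not cephadm's implementation. The ``inventory`` mapping, the function names, and the use of ``fnmatch`` for host patterns are assumptions made for this example. The precedence (explicit hosts, then label, then host pattern, then all known hosts) follows the prose of this section.

```python
import fnmatch
import random


def candidate_hosts(placement, inventory):
    """Return candidate hosts for a placement dict.

    `inventory` is a hypothetical mapping of host name -> set of labels.
    """
    if placement.get("hosts"):
        # Explicit host names take precedence over everything else.
        return list(placement["hosts"])
    if placement.get("label"):
        return [h for h, labels in inventory.items()
                if placement["label"] in labels]
    if placement.get("host_pattern"):
        return [h for h in inventory
                if fnmatch.fnmatch(h, placement["host_pattern"])]
    # Last resort: every known host is a candidate.
    return list(inventory)


def choose_hosts(placement, inventory, running):
    """Pick the hosts that should run a daemon.

    `running` is the set of hosts that already run a daemon of the service.
    Hosts in `running` are preferred so that daemons are not moved.
    """
    candidates = candidate_hosts(placement, inventory)
    count = placement.get("count")
    if count is None or count >= len(candidates):
        # Fewer candidates than `count`: deploy only on the candidates.
        return sorted(candidates)
    keep = [h for h in candidates if h in running][:count]
    free = [h for h in candidates if h not in keep]
    # Fill the remaining slots with randomly chosen candidates.
    return sorted(keep + random.sample(free, count - len(keep)))


inventory = {
    "host1": {"myfs"},
    "host2": {"myfs"},
    "host3": set(),
    "myhost1": set(),
}
```

With ``count: 3`` and only two hosts labeled ``myfs``, ``choose_hosts`` returns both labeled hosts and nothing more, which mirrors the special case in which fewer hosts are selected than ``count`` demands.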
+ +Cephadm continually compares a list of daemons actually running in the cluster +against the list in the service specifications. Cephadm adds new daemons and +removes old daemons as necessary in order to conform to the service +specifications. + +Cephadm does the following to maintain compliance with the service +specifications. + +Cephadm first selects a list of candidate hosts. Cephadm seeks explicit host +names and selects them. If cephadm finds no explicit host names, it looks for +label specifications. If no label is defined in the specification, cephadm +selects hosts based on a host pattern. If no host pattern is defined, as a last +resort, cephadm selects all known hosts as candidates. + +Cephadm is aware of existing daemons running services and tries to avoid moving +them. + +Cephadm supports the deployment of a specific number of daemons. +Consider the following service specification: + +.. code-block:: yaml + + service_type: mds + service_name: myfs + placement: + count: 3 + label: myfs + +This service specification instructs cephadm to deploy three daemons on hosts +labeled ``myfs`` across the cluster. + +If there are fewer than three daemons deployed on the candidate hosts, cephadm +randomly chooses hosts on which to deploy new daemons. + +If there are more than three daemons deployed on the candidate hosts, cephadm +removes existing daemons. + +Finally, cephadm removes daemons on hosts that are outside of the list of +candidate hosts. + +.. note:: + + There is a special case that cephadm must consider. + + If there are fewer hosts selected by the placement specification than + demanded by ``count``, cephadm will deploy only on the selected hosts. + +Extra Container Arguments +========================= + +.. warning:: + The arguments provided for extra container args are limited to whatever arguments are available for a `run` command from whichever container engine you are using.
Providing any arguments the `run` command does not support (or invalid values for arguments) will cause the daemon to fail to start. + + +Cephadm supports providing extra miscellaneous container arguments for +specific cases when they may be necessary. For example, if a user needed +to limit the amount of cpus their mon daemons make use of they could apply +a spec like + +.. code-block:: yaml + + service_type: mon + service_name: mon + placement: + hosts: + - host1 + - host2 + - host3 + extra_container_args: + - "--cpus=2" + +which would cause each mon daemon to be deployed with `--cpus=2`. + +Mounting Files with Extra Container Arguments +--------------------------------------------- + +A common use case for extra container arguments is to mount additional +files within the container. However, some intuitive formats for doing +so can cause deployment to fail (see https://tracker.ceph.com/issues/57338). +The recommended syntax for mounting a file with extra container arguments is: + +.. code-block:: yaml + + extra_container_args: + - "-v" + - "/absolute/file/path/on/host:/absolute/file/path/in/container" + +For example: + +.. code-block:: yaml + + extra_container_args: + - "-v" + - "/opt/ceph_cert/host.cert:/etc/grafana/certs/cert_file:ro" + +.. _orch-rm: + +Removing a Service +================== + +In order to remove a service including the removal +of all daemons of that service, run + +.. prompt:: bash + + ceph orch rm <service-name> + +For example: + +.. prompt:: bash + + ceph orch rm rgw.myrgw + +.. _cephadm-spec-unmanaged: + +Disabling automatic deployment of daemons +========================================= + +Cephadm supports disabling the automated deployment and removal of daemons on a +per service basis. The CLI supports two commands for this. + +In order to fully remove a service, see :ref:`orch-rm`. 
+ +Disabling automatic management of daemons +----------------------------------------- + +To disable the automatic management of daemons, set ``unmanaged: true`` in the +:ref:`orchestrator-cli-service-spec` (``mgr.yaml``). + +``mgr.yaml``: + +.. code-block:: yaml + + service_type: mgr + unmanaged: true + placement: + label: mgr + + +.. prompt:: bash # + + ceph orch apply -i mgr.yaml + + +.. note:: + + After you apply this change in the Service Specification, cephadm will no + longer deploy any new daemons (even if the placement specification matches + additional hosts). + +Deploying a daemon on a host manually +------------------------------------- + +.. note:: + + This workflow has a very limited use case and should only be used + in rare circumstances. + +To manually deploy a daemon on a host, follow these steps: + +Modify the service spec for a service by getting the +existing spec, adding ``unmanaged: true``, and applying the modified spec. + +Then manually deploy the daemon using the following: + + .. prompt:: bash # + + ceph orch daemon add <daemon-type> --placement=<placement spec> + +For example: + + .. prompt:: bash # + + ceph orch daemon add mgr --placement=my_host + +.. note:: + + Removing ``unmanaged: true`` from the service spec will + enable the reconciliation loop for this service and will + potentially lead to the removal of the daemon, depending + on the placement spec. + +Removing a daemon from a host manually +-------------------------------------- + +To manually remove a daemon, run a command of the following form: + + .. prompt:: bash # + + ceph orch daemon rm <daemon name>... [--force] + +For example: + + .. prompt:: bash # + + ceph orch daemon rm mgr.my_host.xyzxyz + +.. note:: + + For managed services (``unmanaged: false``), cephadm will automatically + deploy a new daemon a few seconds later. + +See also +-------- + +* See :ref:`cephadm-osd-declarative` for special handling of unmanaged OSDs.
+* See also :ref:`cephadm-pause` diff --git a/doc/cephadm/services/iscsi.rst b/doc/cephadm/services/iscsi.rst new file mode 100644 index 000000000..e039e8d9a --- /dev/null +++ b/doc/cephadm/services/iscsi.rst @@ -0,0 +1,80 @@ +============= +iSCSI Service +============= + +.. _cephadm-iscsi: + +Deploying iSCSI +=============== + +To deploy an iSCSI gateway, create a yaml file containing a +service specification for iscsi: + +.. code-block:: yaml + + service_type: iscsi + service_id: iscsi + placement: + hosts: + - host1 + - host2 + spec: + pool: mypool # RADOS pool where ceph-iscsi config data is stored. + trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2" + api_port: ... # optional + api_user: ... # optional + api_password: ... # optional + api_secure: true/false # optional + ssl_cert: | # optional + ... + ssl_key: | # optional + ... + +For example: + +.. code-block:: yaml + + service_type: iscsi + service_id: iscsi + placement: + hosts: + - [...] + spec: + pool: iscsi_pool + trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2,IP_ADDRESS_3,..." + api_user: API_USERNAME + api_password: API_PASSWORD + ssl_cert: | + -----BEGIN CERTIFICATE----- + MIIDtTCCAp2gAwIBAgIYMC4xNzc1NDQxNjEzMzc2MjMyXzxvQ7EcMA0GCSqGSIb3 + DQEBCwUAMG0xCzAJBgNVBAYTAlVTMQ0wCwYDVQQIDARVdGFoMRcwFQYDVQQHDA5T + [...] + -----END CERTIFICATE----- + ssl_key: | + -----BEGIN PRIVATE KEY----- + MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC5jdYbjtNTAKW4 + /CwQr/7wOiLGzVxChn3mmCIF3DwbL/qvTFTX2d8bDf6LjGwLYloXHscRfxszX/4h + [...] + -----END PRIVATE KEY----- + +.. py:currentmodule:: ceph.deployment.service_spec + +.. autoclass:: IscsiServiceSpec + :members: + + +The specification can then be applied using: + +.. prompt:: bash # + + ceph orch apply -i iscsi.yaml + + +See :ref:`orchestrator-cli-placement-spec` for details of the placement specification. + +See also: :ref:`orchestrator-cli-service-spec`. 
+ +Further Reading +=============== + +* RBD: :ref:`ceph-iscsi` diff --git a/doc/cephadm/services/mds.rst b/doc/cephadm/services/mds.rst new file mode 100644 index 000000000..949a0fa5d --- /dev/null +++ b/doc/cephadm/services/mds.rst @@ -0,0 +1,49 @@ +=========== +MDS Service +=========== + + +.. _orchestrator-cli-cephfs: + +Deploy CephFS +============= + +One or more MDS daemons is required to use the :term:`CephFS` file system. +These are created automatically if the newer ``ceph fs volume`` +interface is used to create a new file system. For more information, +see :ref:`fs-volumes-and-subvolumes`. + +For example: + +.. prompt:: bash # + + ceph fs volume create <fs_name> --placement="<placement spec>" + +where ``fs_name`` is the name of the CephFS and ``placement`` is a +:ref:`orchestrator-cli-placement-spec`. + +For manually deploying MDS daemons, use this specification: + +.. code-block:: yaml + + service_type: mds + service_id: fs_name + placement: + count: 3 + + +The specification can then be applied using: + +.. prompt:: bash # + + ceph orch apply -i mds.yaml + +See :ref:`orchestrator-cli-stateless-services` for manually deploying +MDS daemons on the CLI. + +Further Reading +=============== + +* :ref:`ceph-file-system` + + diff --git a/doc/cephadm/services/mgr.rst b/doc/cephadm/services/mgr.rst new file mode 100644 index 000000000..133a00d77 --- /dev/null +++ b/doc/cephadm/services/mgr.rst @@ -0,0 +1,43 @@ +.. _mgr-cephadm-mgr: + +=========== +MGR Service +=========== + +The cephadm MGR service is hosting different modules, like the :ref:`mgr-dashboard` +and the cephadm manager module. + +.. _cephadm-mgr-networks: + +Specifying Networks +------------------- + +The MGR service supports binding only to a specific IP within a network. + +example spec file (leveraging a default placement): + +.. code-block:: yaml + + service_type: mgr + networks: + - 192.169.142.0/24 + +.. 
_cephadm_mgr_co_location: + +Allow co-location of MGR daemons +================================ + +In deployment scenarios with just a single host, cephadm still needs +to deploy at least two MGR daemons in order to allow an automated +upgrade of the cluster. See ``mgr_standby_modules`` in +the :ref:`mgr-administrator-guide` for further details. + +See also: :ref:`cephadm_co_location`. + + +Further Reading +=============== + +* :ref:`ceph-manager-daemon` +* :ref:`cephadm-manually-deploy-mgr` + diff --git a/doc/cephadm/services/mon.rst b/doc/cephadm/services/mon.rst new file mode 100644 index 000000000..6326b73f4 --- /dev/null +++ b/doc/cephadm/services/mon.rst @@ -0,0 +1,179 @@ +=========== +MON Service +=========== + +.. _deploy_additional_monitors: + +Deploying additional monitors +============================= + +A typical Ceph cluster has three or five monitor daemons that are spread +across different hosts. We recommend deploying five monitors if there are +five or more nodes in your cluster. + +.. _CIDR: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation + +Ceph deploys monitor daemons automatically as the cluster grows and Ceph +scales back monitor daemons automatically as the cluster shrinks. The +smooth execution of this automatic growing and shrinking depends upon +proper subnet configuration. + +The cephadm bootstrap procedure assigns the first monitor daemon in the +cluster to a particular subnet. ``cephadm`` designates that subnet as the +default subnet of the cluster. New monitor daemons will be assigned by +default to that subnet unless cephadm is instructed to do otherwise. + +If all of the ceph monitor daemons in your cluster are in the same subnet, +manual administration of the ceph monitor daemons is not necessary. +``cephadm`` will automatically add up to five monitors to the subnet, as +needed, as new hosts are added to the cluster. + +By default, cephadm will deploy 5 daemons on arbitrary hosts. 
See +:ref:`orchestrator-cli-placement-spec` for details of specifying +the placement of daemons. + +Designating a Particular Subnet for Monitors +-------------------------------------------- + +To designate a particular IP subnet for use by ceph monitor daemons, use a +command of the following form, including the subnet's address in `CIDR`_ +format (e.g., ``10.1.2.0/24``): + + .. prompt:: bash # + + ceph config set mon public_network *<mon-cidr-network>* + + For example: + + .. prompt:: bash # + + ceph config set mon public_network 10.1.2.0/24 + +Cephadm deploys new monitor daemons only on hosts that have IP addresses in +the designated subnet. + +You can also specify two public networks by using a list of networks: + + .. prompt:: bash # + + ceph config set mon public_network *<mon-cidr-network1>,<mon-cidr-network2>* + + For example: + + .. prompt:: bash # + + ceph config set mon public_network 10.1.2.0/24,192.168.0.1/24 + + +Deploying Monitors on a Particular Network +------------------------------------------ + +You can explicitly specify the IP address or CIDR network for each monitor and +control where each monitor is placed. To disable automated monitor deployment, +run this command: + + .. prompt:: bash # + + ceph orch apply mon --unmanaged + + To deploy each additional monitor: + + .. prompt:: bash # + + ceph orch daemon add mon *<host1:ip-or-network1> + + For example, to deploy a second monitor on ``newhost1`` using an IP + address ``10.1.2.123`` and a third monitor on ``newhost2`` in + network ``10.1.2.0/24``, run the following commands: + + .. prompt:: bash # + + ceph orch apply mon --unmanaged + ceph orch daemon add mon newhost1:10.1.2.123 + ceph orch daemon add mon newhost2:10.1.2.0/24 + + Now, enable automatic placement of Daemons + + .. prompt:: bash # + + ceph orch apply mon --placement="newhost1,newhost2,newhost3" --dry-run + + See :ref:`orchestrator-cli-placement-spec` for details of specifying + the placement of daemons. 
+ + Finally apply this new placement by dropping ``--dry-run`` + + .. prompt:: bash # + + ceph orch apply mon --placement="newhost1,newhost2,newhost3" + + +Moving Monitors to a Different Network +-------------------------------------- + +To move Monitors to a new network, deploy new monitors on the new network and +subsequently remove monitors from the old network. It is not advised to +modify and inject the ``monmap`` manually. + +First, disable the automated placement of daemons: + + .. prompt:: bash # + + ceph orch apply mon --unmanaged + +To deploy each additional monitor: + + .. prompt:: bash # + + ceph orch daemon add mon *<newhost1:ip-or-network1>* + +For example, to deploy a second monitor on ``newhost1`` using an IP +address ``10.1.2.123`` and a third monitor on ``newhost2`` in +network ``10.1.2.0/24``, run the following commands: + + .. prompt:: bash # + + ceph orch apply mon --unmanaged + ceph orch daemon add mon newhost1:10.1.2.123 + ceph orch daemon add mon newhost2:10.1.2.0/24 + + Subsequently remove monitors from the old network: + + .. prompt:: bash # + + ceph orch daemon rm *mon.<oldhost1>* + + Update the ``public_network``: + + .. prompt:: bash # + + ceph config set mon public_network *<mon-cidr-network>* + + For example: + + .. prompt:: bash # + + ceph config set mon public_network 10.1.2.0/24 + + Now, enable automatic placement of Daemons + + .. prompt:: bash # + + ceph orch apply mon --placement="newhost1,newhost2,newhost3" --dry-run + + See :ref:`orchestrator-cli-placement-spec` for details of specifying + the placement of daemons. + + Finally apply this new placement by dropping ``--dry-run`` + + .. 
prompt:: bash # + + ceph orch apply mon --placement="newhost1,newhost2,newhost3" + +Further Reading +=============== + +* :ref:`rados-operations` +* :ref:`rados-troubleshooting-mon` +* :ref:`cephadm-restore-quorum` + diff --git a/doc/cephadm/services/monitoring.rst b/doc/cephadm/services/monitoring.rst new file mode 100644 index 000000000..86e3e3f69 --- /dev/null +++ b/doc/cephadm/services/monitoring.rst @@ -0,0 +1,457 @@ +.. _mgr-cephadm-monitoring: + +Monitoring Services +=================== + +Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana +<https://grafana.com/>`_, and related tools to store and visualize detailed +metrics on cluster utilization and performance. Ceph users have three options: + +#. Have cephadm deploy and configure these services. This is the default + when bootstrapping a new cluster unless the ``--skip-monitoring-stack`` + option is used. +#. Deploy and configure these services manually. This is recommended for users + with existing prometheus services in their environment (and in cases where + Ceph is running in Kubernetes with Rook). +#. Skip the monitoring stack completely. Some Ceph dashboard graphs will + not be available. + +The monitoring stack consists of `Prometheus <https://prometheus.io/>`_, +Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter +<https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alert +Manager <https://prometheus.io/docs/alerting/alertmanager/>`_ and `Grafana +<https://grafana.com/>`_. + +.. note:: + + Prometheus' security model presumes that untrusted users have access to the + Prometheus HTTP endpoint and logs. Untrusted users have access to all the + (meta)data Prometheus collects that is contained in the database, plus a + variety of operational and debugging information. + + However, Prometheus' HTTP API is limited to read-only operations. + Configurations can *not* be changed using the API and secrets are not + exposed.
Moreover, Prometheus has some built-in measures to mitigate the + impact of denial of service attacks. + + Please see `Prometheus' Security model + <https://prometheus.io/docs/operating/security/>`_ for more detailed + information. + +Deploying monitoring with cephadm +--------------------------------- + +The default behavior of ``cephadm`` is to deploy a basic monitoring stack. It +is however possible that you have a Ceph cluster without a monitoring stack, +and you would like to add a monitoring stack to it. (Here are some ways that +you might have come to have a Ceph cluster without a monitoring stack: You +might have passed the ``--skip-monitoring-stack`` option to ``cephadm`` during +the installation of the cluster, or you might have converted an existing +cluster (which had no monitoring stack) to cephadm management.) + +To set up monitoring on a Ceph cluster that has no monitoring, follow the +steps below: + +#. Deploy a node-exporter service on every node of the cluster. The node-exporter provides host-level metrics like CPU and memory utilization: + + .. prompt:: bash # + + ceph orch apply node-exporter + +#. Deploy alertmanager: + + .. prompt:: bash # + + ceph orch apply alertmanager + +#. Deploy Prometheus. A single Prometheus instance is sufficient, but + for high availability (HA) you might want to deploy two: + + .. prompt:: bash # + + ceph orch apply prometheus + + or + + .. prompt:: bash # + + ceph orch apply prometheus --placement 'count:2' + +#. Deploy grafana: + + .. prompt:: bash # + + ceph orch apply grafana + +.. _cephadm-monitoring-networks-ports: + +Networks and Ports +~~~~~~~~~~~~~~~~~~ + +All monitoring services can have the network and port they bind to configured with a yaml service specification. + +Example spec file: + +.. code-block:: yaml + + service_type: grafana + service_name: grafana + placement: + count: 1 + networks: + - 192.169.142.0/24 + spec: + port: 4200 + +..
_cephadm_monitoring-images: + +Using custom images +~~~~~~~~~~~~~~~~~~~ + +It is possible to install or upgrade monitoring components based on other +images. To do so, the name of the image to be used needs to be stored in the +configuration first. The following configuration options are available. + +- ``container_image_prometheus`` +- ``container_image_grafana`` +- ``container_image_alertmanager`` +- ``container_image_node_exporter`` + +Custom images can be set with the ``ceph config`` command + +.. code-block:: bash + + ceph config set mgr mgr/cephadm/<option_name> <value> + +For example + +.. code-block:: bash + + ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1 + +If there were already running monitoring stack daemon(s) of the type whose +image you've changed, you must redeploy the daemon(s) in order to have them +actually use the new image. + +For example, if you had changed the prometheus image + +.. prompt:: bash # + + ceph orch redeploy prometheus + + +.. note:: + + By setting a custom image, the default value will be overridden (but not + overwritten). The default value changes when updates become available. + By setting a custom image, you will not be able to update the component + you have set the custom image for automatically. You will need to + manually update the configuration (image name and tag) to be able to + install updates. + + If you choose to go with the recommendations instead, you can reset the + custom image you have set before. After that, the default value will be + used again. Use ``ceph config rm`` to reset the configuration option + + .. code-block:: bash + + ceph config rm mgr mgr/cephadm/<option_name> + + For example + + .. code-block:: bash + + ceph config rm mgr mgr/cephadm/container_image_prometheus + +See also :ref:`cephadm-airgap`. + +.. 
_cephadm-overwrite-jinja2-templates: + +Using custom configuration files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By overriding cephadm templates, it is possible to completely customize the +configuration files for monitoring services. + +Internally, cephadm already uses `Jinja2 +<https://jinja.palletsprojects.com/en/2.11.x/>`_ templates to generate the +configuration files for all monitoring components. To be able to customize the +configuration of Prometheus, Grafana or the Alertmanager it is possible to store +a Jinja2 template for each service that will be used for configuration +generation instead. This template will be evaluated every time a service of that +kind is deployed or reconfigured. That way, the custom configuration is +preserved and automatically applied on future deployments of these services. + +.. note:: + + The configuration of the custom template is also preserved when the default + configuration of cephadm changes. If the updated configuration is to be used, + the custom template needs to be migrated *manually* after each upgrade of Ceph. + +Option names +"""""""""""" + +The following templates for files that will be generated by cephadm can be +overridden. These are the names to be used when storing with ``ceph config-key +set``: + +- ``services/alertmanager/alertmanager.yml`` +- ``services/grafana/ceph-dashboard.yml`` +- ``services/grafana/grafana.ini`` +- ``services/prometheus/prometheus.yml`` +- ``services/prometheus/alerting/custom_alerts.yml`` + +You can look up the file templates that are currently used by cephadm in +``src/pybind/mgr/cephadm/templates``: + +- ``services/alertmanager/alertmanager.yml.j2`` +- ``services/grafana/ceph-dashboard.yml.j2`` +- ``services/grafana/grafana.ini.j2`` +- ``services/prometheus/prometheus.yml.j2`` + +Usage +""""" + +The following command applies a single line value: + +.. 
code-block:: bash + + ceph config-key set mgr/cephadm/<option_name> <value> + +To set contents of files as template use the ``-i`` argument: + +.. code-block:: bash + + ceph config-key set mgr/cephadm/<option_name> -i $PWD/<filename> + +.. note:: + + When using files as input to ``config-key`` an absolute path to the file must + be used. + + +Then the configuration file for the service needs to be recreated. +This is done using `reconfig`. For more details see the following example. + +Example +""""""" + +.. code-block:: bash + + # set the contents of ./prometheus.yml.j2 as template + ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml \ + -i $PWD/prometheus.yml.j2 + + # reconfig the prometheus service + ceph orch reconfig prometheus + +.. code-block:: bash + + # set additional custom alerting rules for Prometheus + ceph config-key set mgr/cephadm/services/prometheus/alerting/custom_alerts.yml \ + -i $PWD/custom_alerts.yml + + # Note that custom alerting rules are not parsed by Jinja and hence escaping + # will not be an issue. + +Deploying monitoring without cephadm +------------------------------------ + +If you have an existing prometheus monitoring infrastructure, or would like +to manage it yourself, you need to configure it to integrate with your Ceph +cluster. + +* Enable the prometheus module in the ceph-mgr daemon + + .. code-block:: bash + + ceph mgr module enable prometheus + + By default, ceph-mgr presents prometheus metrics on port 9283 on each host + running a ceph-mgr daemon. Configure prometheus to scrape these. + +* To enable the dashboard's prometheus-based alerting, see :ref:`dashboard-alerting`. + +* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`. + +Disabling monitoring +-------------------- + +To disable monitoring and remove the software that supports it, run the following commands: + +.. 
code-block:: console + + $ ceph orch rm grafana + $ ceph orch rm prometheus --force # this will delete metrics data collected so far + $ ceph orch rm node-exporter + $ ceph orch rm alertmanager + $ ceph mgr module disable prometheus + +See also :ref:`orch-rm`. + +Setting up RBD-Image monitoring +------------------------------- + +For performance reasons, monitoring of RBD images is disabled by default. For more information please see +:ref:`prometheus-rbd-io-statistics`. If disabled, the overview and details dashboards will stay empty in Grafana +and the metrics will not be visible in Prometheus. + +Setting up Prometheus +----------------------- + +Setting Prometheus Retention Time +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Cephadm provides the option to set the Prometheus TSDB retention time using +a ``retention_time`` field in the Prometheus service spec. The value defaults +to 15 days (15d). If you would like a different value, such as 1 year (1y), you +can apply a service spec similar to: + +.. code-block:: yaml + + service_type: prometheus + placement: + count: 1 + spec: + retention_time: "1y" + +.. note:: + + If you already had Prometheus daemon(s) deployed before and are updating an + existing spec as opposed to doing a fresh Prometheus deployment, you must also + tell cephadm to redeploy the Prometheus daemon(s) to put this change into effect. + This can be done with a ``ceph orch redeploy prometheus`` command. + +Setting up Grafana +------------------ + +Manually setting the Grafana URL +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Cephadm automatically configures Prometheus, Grafana, and Alertmanager in +all cases except one. + +In some setups, the Dashboard user's browser might not be able to access the +Grafana URL that is configured in Ceph Dashboard. This can happen when the +cluster and the accessing user are in different DNS zones. 
+ +If this is the case, you can use a configuration option for Ceph Dashboard +to set the URL that the user's browser will use to access Grafana. This +value will never be altered by cephadm. To set this configuration option, +issue the following command: + + .. prompt:: bash $ + + ceph dashboard set-grafana-frontend-api-url <grafana-server-api> + +It might take a minute or two for services to be deployed. After the +services have been deployed, you should see something like this when you issue the command ``ceph orch ls``: + +.. code-block:: console + + $ ceph orch ls + NAME RUNNING REFRESHED IMAGE NAME IMAGE ID SPEC + alertmanager 1/1 6s ago docker.io/prom/alertmanager:latest 0881eb8f169f present + crash 2/2 6s ago docker.io/ceph/daemon-base:latest-master-devel mix present + grafana 1/1 0s ago docker.io/pcuzner/ceph-grafana-el8:latest f77afcf0bcf6 absent + node-exporter 2/2 6s ago docker.io/prom/node-exporter:latest e5a616e4b9cf present + prometheus 1/1 6s ago docker.io/prom/prometheus:latest e935122ab143 present + +Configuring SSL/TLS for Grafana +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +``cephadm`` deploys Grafana using the certificate defined in the ceph +key/value store. If no certificate is specified, ``cephadm`` generates a +self-signed certificate during the deployment of the Grafana service. + +A custom certificate can be configured using the following commands: + +.. prompt:: bash # + + ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem + ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem + +If you have already deployed Grafana, run ``reconfig`` on the service to +update its configuration: + +.. prompt:: bash # + + ceph orch reconfig grafana + +The ``reconfig`` command also sets the proper URL for Ceph Dashboard. + +Setting the initial admin password +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By default, Grafana will not create an initial +admin user. 
In order to create the admin user, please create a file +``grafana.yaml`` with this content: + +.. code-block:: yaml + + service_type: grafana + spec: + initial_admin_password: mypassword + +Then apply this specification: + +.. code-block:: bash + + ceph orch apply -i grafana.yaml + ceph orch redeploy grafana + +Grafana will now create an admin user called ``admin`` with the +given password. + + +Setting up Alertmanager +----------------------- + +Adding Alertmanager webhooks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To add new webhooks to the Alertmanager configuration, add additional +webhook urls like so: + +.. code-block:: yaml + + service_type: alertmanager + spec: + user_data: + default_webhook_urls: + - "https://foo" + - "https://bar" + +Where ``default_webhook_urls`` is a list of additional URLs that are +added to the default receivers' ``<webhook_configs>`` configuration. + +Run ``reconfig`` on the service to update its configuration: + +.. prompt:: bash # + + ceph orch reconfig alertmanager + +Turn on Certificate Validation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you are using certificates for alertmanager and want to make sure +these certs are verified, you should set the "secure" option to +true in your alertmanager spec (this defaults to false). + +.. code-block:: yaml + + service_type: alertmanager + spec: + secure: true + +If you already had alertmanager daemons running before applying the spec +you must reconfigure them to update their configuration + +.. prompt:: bash # + + ceph orch reconfig alertmanager + +Further Reading +--------------- + +* :ref:`mgr-prometheus` diff --git a/doc/cephadm/services/nfs.rst b/doc/cephadm/services/nfs.rst new file mode 100644 index 000000000..c48d0f765 --- /dev/null +++ b/doc/cephadm/services/nfs.rst @@ -0,0 +1,120 @@ +.. _deploy-cephadm-nfs-ganesha: + +=========== +NFS Service +=========== + +.. note:: Only the NFSv4 protocol is supported. 
+ +The simplest way to manage NFS is via the ``ceph nfs cluster ...`` +commands; see :ref:`mgr-nfs`. This document covers how to manage the +cephadm services directly, which should only be necessary for unusual NFS +configurations. + +Deploying NFS Ganesha +===================== + +Cephadm deploys an NFS Ganesha daemon (or a set of daemons). The configuration for +NFS is stored in the ``nfs-ganesha`` pool and exports are managed via the +``ceph nfs export ...`` commands and via the dashboard. + +To deploy an NFS Ganesha gateway, run the following command: + +.. prompt:: bash # + + ceph orch apply nfs *<svc_id>* [--port *<port>*] [--placement ...] + +For example, to deploy NFS with a service id of *foo* on the default +port 2049 with the default placement of a single daemon: + +.. prompt:: bash # + + ceph orch apply nfs foo + +See :ref:`orchestrator-cli-placement-spec` for the details of the placement +specification. + +Service Specification +===================== + +Alternatively, an NFS service can be applied using a YAML specification. + +.. code-block:: yaml + + service_type: nfs + service_id: mynfs + placement: + hosts: + - host1 + - host2 + spec: + port: 12345 + +In this example, we run the server on the non-default ``port`` of +12345 (instead of the default 2049) on ``host1`` and ``host2``. + +The specification can then be applied by running the following command: + +.. prompt:: bash # + + ceph orch apply -i nfs.yaml + +.. _cephadm-ha-nfs: + +High-availability NFS +===================== + +Deploying an *ingress* service for an existing *nfs* service will provide: + +* a stable, virtual IP that can be used to access the NFS server +* fail-over between hosts if there is a host failure +* load distribution across multiple NFS gateways (although this is rarely necessary) + +Ingress for NFS can be deployed for an existing NFS service +(``nfs.mynfs`` in this example) with the following specification: + +.. 
code-block:: yaml + + service_type: ingress + service_id: nfs.mynfs + placement: + count: 2 + spec: + backend_service: nfs.mynfs + frontend_port: 2049 + monitor_port: 9000 + virtual_ip: 10.0.0.123/24 + +A few notes: + + * The *virtual_ip* must include a CIDR prefix length, as in the + example above. The virtual IP will normally be configured on the + first identified network interface that has an existing IP in the + same subnet. You can also specify a *virtual_interface_networks* + property to match against IPs in other networks; see + :ref:`ingress-virtual-ip` for more information. + * The *monitor_port* is used to access the haproxy load status + page. The user is ``admin`` by default, but can be modified + via an *admin* property in the spec. If a password is not + specified via a *password* property in the spec, the auto-generated password + can be found with: + + .. prompt:: bash # + + ceph config-key get mgr/cephadm/ingress.*{svc_id}*/monitor_password + + For example: + + .. prompt:: bash # + + ceph config-key get mgr/cephadm/ingress.nfs.mynfs/monitor_password + + * The backend service (``nfs.mynfs`` in this example) should include + a *port* property that is not 2049 to avoid conflicting with the + ingress service, which could be placed on the same host(s). + +Further Reading +=============== + +* CephFS: :ref:`cephfs-nfs` +* MGR: :ref:`mgr-nfs` diff --git a/doc/cephadm/services/osd.rst b/doc/cephadm/services/osd.rst new file mode 100644 index 000000000..de0d4f82a --- /dev/null +++ b/doc/cephadm/services/osd.rst @@ -0,0 +1,936 @@ +*********** +OSD Service +*********** +.. _device management: ../rados/operations/devices +.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt + +List Devices +============ + +``ceph-volume`` scans each host in the cluster from time to time in order +to determine which devices are present and whether they are eligible to be +used as OSDs. 
+ +To print a list of devices discovered by ``cephadm``, run this command: + +.. prompt:: bash # + + ceph orch device ls [--hostname=...] [--wide] [--refresh] + +Example +:: + + Hostname Path Type Serial Size Health Ident Fault Available + srv-01 /dev/sdb hdd 15P0A0YFFRD6 300G Unknown N/A N/A No + srv-01 /dev/sdc hdd 15R0A08WFRD6 300G Unknown N/A N/A No + srv-01 /dev/sdd hdd 15R0A07DFRD6 300G Unknown N/A N/A No + srv-01 /dev/sde hdd 15P0A0QDFRD6 300G Unknown N/A N/A No + srv-02 /dev/sdb hdd 15R0A033FRD6 300G Unknown N/A N/A No + srv-02 /dev/sdc hdd 15R0A05XFRD6 300G Unknown N/A N/A No + srv-02 /dev/sde hdd 15R0A0ANFRD6 300G Unknown N/A N/A No + srv-02 /dev/sdf hdd 15R0A06EFRD6 300G Unknown N/A N/A No + srv-03 /dev/sdb hdd 15R0A0OGFRD6 300G Unknown N/A N/A No + srv-03 /dev/sdc hdd 15R0A0P7FRD6 300G Unknown N/A N/A No + srv-03 /dev/sdd hdd 15R0A0O7FRD6 300G Unknown N/A N/A No + +Using the ``--wide`` option provides all details relating to the device, +including any reasons that the device might not be eligible for use as an OSD. + +In the above example you can see fields named "Health", "Ident", and "Fault". +This information is provided by integration with `libstoragemgmt`_. By default, +this integration is disabled (because `libstoragemgmt`_ may not be 100% +compatible with your hardware). To make ``cephadm`` include these fields, +enable cephadm's "enhanced device scan" option as follows; + +.. prompt:: bash # + + ceph config set mgr mgr/cephadm/device_enhanced_scan true + +.. warning:: + Although the libstoragemgmt library performs standard SCSI inquiry calls, + there is no guarantee that your firmware fully implements these standards. + This can lead to erratic behaviour and even bus resets on some older + hardware. It is therefore recommended that, before enabling this feature, + you test your hardware's compatibility with libstoragemgmt first to avoid + unplanned interruptions to services. 
+ + There are a number of ways to test compatibility, but the simplest may be + to use the cephadm shell to call libstoragemgmt directly - ``cephadm shell + lsmcli ldl``. If your hardware is supported you should see something like + this: + + :: + + Path | SCSI VPD 0x83 | Link Type | Serial Number | Health Status + ---------------------------------------------------------------------------- + /dev/sda | 50000396082ba631 | SAS | 15P0A0R0FRD6 | Good + /dev/sdb | 50000396082bbbf9 | SAS | 15P0A0YFFRD6 | Good + + +After you have enabled libstoragemgmt support, the output will look something +like this: + +:: + + # ceph orch device ls + Hostname Path Type Serial Size Health Ident Fault Available + srv-01 /dev/sdb hdd 15P0A0YFFRD6 300G Good Off Off No + srv-01 /dev/sdc hdd 15R0A08WFRD6 300G Good Off Off No + : + +In this example, libstoragemgmt has confirmed the health of the drives and the ability to +interact with the Identification and Fault LEDs on the drive enclosures. For further +information about interacting with these LEDs, refer to `device management`_. + +.. note:: + The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and SATA based + local disks only. There is no official support for NVMe devices (PCIe) + +.. _cephadm-deploy-osds: + +Deploy OSDs +=========== + +Listing Storage Devices +----------------------- + +In order to deploy an OSD, there must be a storage device that is *available* on +which the OSD will be deployed. + +Run this command to display an inventory of storage devices on all cluster hosts: + +.. prompt:: bash # + + ceph orch device ls + +A storage device is considered *available* if all of the following +conditions are met: + +* The device must have no partitions. +* The device must not have any LVM state. +* The device must not be mounted. +* The device must not contain a file system. +* The device must not contain a Ceph BlueStore OSD. +* The device must be larger than 5 GB. 
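The availability conditions listed above can be read as a simple checklist. The following is a minimal Python sketch of that checklist, not the logic that ``ceph-volume`` actually runs: the function names, the dictionary keys (``partitions``, ``lvm``, ``mounted``, ``filesystem``, ``bluestore_osd``, ``size_bytes``) and the interpretation of "5 GB" as decimal gigabytes are all assumptions made for illustration.

```python
# Hypothetical model of the availability checklist; the real decision is
# made by ceph-volume on each host and may apply further criteria.
MIN_SIZE_BYTES = 5 * 10**9  # assumption: "5 GB" read as decimal gigabytes


def rejection_reasons(device):
    """Return the list of reasons a device (a plain dict) is not available."""
    reasons = []
    if device.get("partitions"):
        reasons.append("has partitions")
    if device.get("lvm"):
        reasons.append("has LVM state")
    if device.get("mounted"):
        reasons.append("is mounted")
    if device.get("filesystem"):
        reasons.append("contains a file system")
    if device.get("bluestore_osd"):
        reasons.append("contains a BlueStore OSD")
    if device.get("size_bytes", 0) <= MIN_SIZE_BYTES:
        reasons.append("not larger than 5 GB")
    return reasons


def is_available(device):
    """A device is available only if every condition in the list is met."""
    return not rejection_reasons(device)
```

To see why a real device was rejected, the ``--wide`` option of ``ceph orch device ls`` described earlier remains the authoritative source.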
+ +Ceph will not provision an OSD on a device that is not available. + +Creating New OSDs +----------------- + +There are a few ways to create new OSDs: + +* Tell Ceph to consume any available and unused storage device: + + .. prompt:: bash # + + ceph orch apply osd --all-available-devices + +* Create an OSD from a specific device on a specific host: + + .. prompt:: bash # + + ceph orch daemon add osd *<host>*:*<device-path>* + + For example: + + .. prompt:: bash # + + ceph orch daemon add osd host1:/dev/sdb + + Advanced OSD creation from specific devices on a specific host: + + .. prompt:: bash # + + ceph orch daemon add osd host1:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2 + +* You can use :ref:`drivegroups` to categorize device(s) based on their + properties. This might be useful in forming a clearer picture of which + devices are available to consume. Properties include device type (SSD or + HDD), device model names, size, and the hosts on which the devices exist: + + .. prompt:: bash # + + ceph orch apply -i spec.yml + +Dry Run +------- + +The ``--dry-run`` flag causes the orchestrator to present a preview of what +will happen without actually creating the OSDs. + +For example: + + .. prompt:: bash # + + ceph orch apply osd --all-available-devices --dry-run + + :: + + NAME HOST DATA DB WAL + all-available-devices node1 /dev/vdb - - + all-available-devices node2 /dev/vdc - - + all-available-devices node3 /dev/vdd - - + +.. _cephadm-osd-declarative: + +Declarative State +----------------- + +The effect of ``ceph orch apply`` is persistent. This means that drives that +are added to the system after the ``ceph orch apply`` command completes will be +automatically found and added to the cluster. It also means that drives that +become available (by zapping, for example) after the ``ceph orch apply`` +command completes will be automatically found and added to the cluster. + +We will examine the effects of the following command: + + .. 
prompt:: bash # + + ceph orch apply osd --all-available-devices + +After running the above command: + +* If you add new disks to the cluster, they will automatically be used to + create new OSDs. +* If you remove an OSD and clean the LVM physical volume, a new OSD will be + created automatically. + +To disable the automatic creation of OSDs on available devices, use the +``unmanaged`` parameter: + +.. prompt:: bash # + + ceph orch apply osd --all-available-devices --unmanaged=true + +.. note:: + + Keep these three facts in mind: + + - The default behavior of ``ceph orch apply`` causes cephadm to reconcile constantly. This means that cephadm creates OSDs as soon as new drives are detected. + + - Setting ``unmanaged: True`` disables the creation of OSDs. If ``unmanaged: True`` is set, nothing will happen even if you apply a new OSD service. + + - ``ceph orch daemon add`` creates OSDs, but does not add an OSD service. + +* For cephadm, see also :ref:`cephadm-spec-unmanaged`. + +.. _cephadm-osd-removal: + +Remove an OSD +============= + +Removing an OSD from a cluster involves two steps: + +#. evacuating all placement groups (PGs) from the OSD +#. removing the PG-free OSD from the cluster + +The following command performs these two steps: + +.. prompt:: bash # + + ceph orch osd rm <osd_id(s)> [--replace] [--force] + +Example: + +.. prompt:: bash # + + ceph orch osd rm 0 + +Expected output:: + + Scheduled OSD(s) for removal + +OSDs that are not safe to destroy will be rejected. + +.. note:: + After removing OSDs, if the drives the OSDs were deployed on once again + become available, cephadm may automatically try to deploy more OSDs + on these drives if they match an existing drivegroup spec. 
If you deployed + the OSDs you are removing with a spec and don't want any new OSDs deployed on + the drives after removal, it's best to modify the drivegroup spec before removal. + Either set ``unmanaged: true`` to stop it from picking up new drives at all, + or modify it in some way that it no longer matches the drives used for the + OSDs you wish to remove. Then re-apply the spec. For more info on drivegroup + specs see :ref:`drivegroups`. For more info on the declarative nature of + cephadm in reference to deploying OSDs, see :ref:`cephadm-osd-declarative`. + +Monitoring OSD State +-------------------- + +You can query the state of OSD operation with the following command: + +.. prompt:: bash # + + ceph orch osd rm status + +Expected output:: + + OSD_ID HOST STATE PG_COUNT REPLACE FORCE STARTED_AT + 2 cephadm-dev done, waiting for purge 0 True False 2020-07-17 13:01:43.147684 + 3 cephadm-dev draining 17 False True 2020-07-17 13:01:45.162158 + 4 cephadm-dev started 42 False True 2020-07-17 13:01:45.162158 + + +When no PGs are left on the OSD, it will be decommissioned and removed from the cluster. + +.. note:: + After removing an OSD, if you wipe the LVM physical volume in the device used by the removed OSD, a new OSD will be created. + For more information on this, read about the ``unmanaged`` parameter in :ref:`cephadm-osd-declarative`. + +Stopping OSD Removal +-------------------- + +It is possible to stop queued OSD removals by using the following command: + +.. prompt:: bash # + + ceph orch osd rm stop <osd_id(s)> + +Example: + +.. prompt:: bash # + + ceph orch osd rm stop 4 + +Expected output:: + + Stopped OSD(s) removal + +This resets the OSD to its initial state and takes it off the removal queue. + +.. _cephadm-replacing-an-osd: + +Replacing an OSD +---------------- + +.. prompt:: bash # + + ceph orch osd rm <osd_id(s)> --replace [--force] + +Example: + +.. 
prompt:: bash # + + ceph orch osd rm 4 --replace + +Expected output:: + + Scheduled OSD(s) for replacement + +This follows the same procedure as the procedure in the "Remove OSD" section, with +one exception: the OSD is not permanently removed from the CRUSH hierarchy, but is +instead assigned a 'destroyed' flag. + +.. note:: + The new OSD that will replace the removed OSD must be created on the same host + as the OSD that was removed. + +**Preserving the OSD ID** + +The 'destroyed' flag is used to determine which OSD ids will be reused in the +next OSD deployment. + +If you use OSDSpecs for OSD deployment, your newly added disks will be assigned +the OSD ids of their replaced counterparts. This assumes that the new disks +still match the OSDSpecs. + +Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd`` +command does what you want it to. The ``--dry-run`` flag shows you what the +outcome of the command will be without making the changes you specify. When +you are satisfied that the command will do what you want, run the command +without the ``--dry-run`` flag. + +.. tip:: + + The name of your OSDSpec can be retrieved with the command ``ceph orch ls`` + +Alternatively, you can use your OSDSpec file: + +.. prompt:: bash # + + ceph orch apply -i <osd_spec_file> --dry-run + +Expected output:: + + NAME HOST DATA DB WAL + <name_of_osd_spec> node1 /dev/vdb - - + + +When this output reflects your intention, omit the ``--dry-run`` flag to +execute the deployment. + + +Erasing Devices (Zapping Devices) +--------------------------------- + +Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume +zap`` on the remote host. + +.. prompt:: bash # + + ceph orch device zap <hostname> <path> + +Example command: + +.. prompt:: bash # + + ceph orch device zap my_hostname /dev/sdx + +.. note:: + If the unmanaged flag is unset, cephadm automatically deploys drives that + match the OSDSpec. 
For example, if you use the + ``all-available-devices`` option when creating OSDs, when you ``zap`` a + device the cephadm orchestrator automatically creates a new OSD on the + device. To disable this behavior, see :ref:`cephadm-osd-declarative`. + + +.. _osd_autotune: + +Automatically tuning OSD memory +=============================== + +OSD daemons will adjust their memory consumption based on the +``osd_memory_target`` config option (several gigabytes, by +default). If Ceph is deployed on dedicated nodes that are not sharing +memory with other services, cephadm can automatically adjust the per-OSD +memory consumption based on the total amount of RAM and the number of deployed +OSDs. + +.. warning:: Cephadm sets ``osd_memory_target_autotune`` to ``true`` by default, which is unsuitable for hyperconverged infrastructures. + +Cephadm will start with a fraction +(``mgr/cephadm/autotune_memory_target_ratio``, which defaults to +``.7``) of the total RAM in the system, subtract off any memory +consumed by non-autotuned daemons (non-OSDs, and OSDs for which +``osd_memory_target_autotune`` is false), and then divide by the +remaining OSDs. 
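The arithmetic described above can be worked through with a short Python sketch. This is only an illustration of the description in this section, under stated assumptions: the function name and its parameters are hypothetical, and the real cephadm implementation may round differently or apply additional limits.

```python
# Illustrative only: mirrors the prose description of autotuning,
# not the actual cephadm code.
def autotuned_osd_memory_target(total_ram_bytes, ratio,
                                non_autotuned_bytes, autotuned_osd_count):
    """Per-OSD memory target: (ratio * RAM - non-autotuned memory) / OSDs."""
    if autotuned_osd_count <= 0:
        raise ValueError("at least one autotuned OSD is required")
    # Start with a fraction of the total RAM in the system.
    budget = int(total_ram_bytes * ratio)
    # Subtract memory consumed by daemons that are not autotuned.
    budget -= non_autotuned_bytes
    if budget <= 0:
        raise ValueError("no memory left for autotuned OSDs")
    # Divide what remains evenly among the autotuned OSDs.
    return budget // autotuned_osd_count


GIB = 2**30
# Hypothetical host: 128 GiB RAM, ratio 0.7, 8 GiB used by
# non-autotuned daemons, 10 autotuned OSDs.
example_target = autotuned_osd_memory_target(128 * GIB, 0.7, 8 * GIB, 10)
```

With these assumed inputs, each OSD would receive a target of roughly 8.2 GiB. Lowering the ratio or adding OSDs reduces the per-OSD target proportionally.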
+ +The final targets are reflected in the config database with options like:: + + WHO MASK LEVEL OPTION VALUE + osd host:foo basic osd_memory_target 126092301926 + osd host:bar basic osd_memory_target 6442450944 + +Both the limits and the current memory consumed by each daemon are visible from +the ``ceph orch ps`` output in the ``MEM LIMIT`` column:: + + NAME HOST PORTS STATUS REFRESHED AGE MEM USED MEM LIMIT VERSION IMAGE ID CONTAINER ID + osd.1 dael running (3h) 10s ago 3h 72857k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 9e183363d39c + osd.2 dael running (81m) 10s ago 81m 63989k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 1f0cc479b051 + osd.3 dael running (62m) 10s ago 62m 64071k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 ac5537492f27 + +To exclude an OSD from memory autotuning, disable the autotune option +for that OSD and also set a specific memory target. For example, + + .. prompt:: bash # + + ceph config set osd.123 osd_memory_target_autotune false + ceph config set osd.123 osd_memory_target 16G + + +.. _drivegroups: + +Advanced OSD Service Specifications +=================================== + +:ref:`orchestrator-cli-service-spec`\s of type ``osd`` are a way to describe a +cluster layout, using the properties of disks. Service specifications give the +user an abstract way to tell Ceph which disks should turn into OSDs with which +configurations, without knowing the specifics of device names and paths. + +Service specifications make it possible to define a yaml or json file that can +be used to reduce the amount of manual work involved in creating OSDs. + +For example, instead of running the following command: + +.. prompt:: bash [monitor.1]# + + ceph orch daemon add osd *<host>*:*<path-to-device>* + +for each device and each host, we can define a yaml or json file that allows us +to describe the layout. Here's the most basic example. + +Create a file called (for example) ``osd_spec.yml``: + +.. 
code-block:: yaml + + service_type: osd + service_id: default_drive_group # custom name of the osd spec + placement: + host_pattern: '*' # which hosts to target + spec: + data_devices: # the type of devices you are applying specs to + all: true # a filter, check below for a full list + +This means : + +#. Turn any available device (ceph-volume decides what 'available' is) into an + OSD on all hosts that match the glob pattern '*'. (The glob pattern matches + against the registered hosts from `host ls`) A more detailed section on + host_pattern is available below. + +#. Then pass it to `osd create` like this: + + .. prompt:: bash [monitor.1]# + + ceph orch apply -i /path/to/osd_spec.yml + + This instruction will be issued to all the matching hosts, and will deploy + these OSDs. + + Setups more complex than the one specified by the ``all`` filter are + possible. See :ref:`osd_filters` for details. + + A ``--dry-run`` flag can be passed to the ``apply osd`` command to display a + synopsis of the proposed layout. + +Example + +.. prompt:: bash [monitor.1]# + + ceph orch apply -i /path/to/osd_spec.yml --dry-run + + + +.. _osd_filters: + +Filters +------- + +.. note:: + Filters are applied using an `AND` gate by default. This means that a drive + must fulfill all filter criteria in order to get selected. This behavior can + be adjusted by setting ``filter_logic: OR`` in the OSD specification. + +Filters are used to assign disks to groups, using their attributes to group +them. + +The attributes are based off of ceph-volume's disk query. You can retrieve +information about the attributes with this command: + +.. code-block:: bash + + ceph-volume inventory </path/to/disk> + +Vendor or Model +^^^^^^^^^^^^^^^ + +Specific disks can be targeted by vendor or model: + +.. code-block:: yaml + + model: disk_model_name + +or + +.. code-block:: yaml + + vendor: disk_vendor_name + + +Size +^^^^ + +Specific disks can be targeted by `Size`: + +.. 
code-block:: yaml + + size: size_spec + +Size specs +__________ + +Size specifications can be of the following forms: + +* LOW:HIGH +* :HIGH +* LOW: +* EXACT + +Concrete examples: + +To include disks of an exact size: + +.. code-block:: yaml + + size: '10G' + +To include disks within a given range of size: + +.. code-block:: yaml + + size: '10G:40G' + +To include disks that are less than or equal to 10G in size: + +.. code-block:: yaml + + size: ':10G' + +To include disks equal to or greater than 40G in size: + +.. code-block:: yaml + + size: '40G:' + +Sizes don't have to be specified exclusively in Gigabytes(G). + +Other units of size are supported: Megabyte(M), Gigabyte(G) and Terabyte(T). +Appending the (B) for byte is also supported: ``MB``, ``GB``, ``TB``. + + +Rotational +^^^^^^^^^^ + +This operates on the 'rotational' attribute of the disk. + +.. code-block:: yaml + + rotational: 0 | 1 + +`1` to match all disks that are rotational + +`0` to match all disks that are non-rotational (SSD, NVME etc) + + +All +^^^ + +This will take all disks that are 'available' + +.. note:: This is exclusive for the data_devices section. + +.. code-block:: yaml + + all: true + + +Limiter +^^^^^^^ + +If you have specified some valid filters but want to limit the number of disks that they match, use the ``limit`` directive: + +.. code-block:: yaml + + limit: 2 + +For example, if you used `vendor` to match all disks that are from `VendorA` +but want to use only the first two, you could use `limit`: + +.. code-block:: yaml + + data_devices: + vendor: VendorA + limit: 2 + +.. note:: `limit` is a last resort and shouldn't be used if it can be avoided. + + +Additional Options +------------------ + +There are multiple optional settings you can use to change the way OSDs are deployed. +You can add these options to the base level of an OSD spec for them to take effect. + +This example would deploy all OSDs with encryption enabled. + +.. 
code-block:: yaml + + service_type: osd + service_id: example_osd_spec + placement: + host_pattern: '*' + spec: + data_devices: + all: true + encrypted: true + +For the full list of options, see ``DriveGroupSpec``: + +.. py:currentmodule:: ceph.deployment.drive_group + +.. autoclass:: DriveGroupSpec + :members: + :exclude-members: from_json + +Examples +======== + +The simple case +--------------- + +All nodes with the same setup + +.. code-block:: none + + 20 HDDs + Vendor: VendorA + Model: HDD-123-foo + Size: 4TB + + 2 SSDs + Vendor: VendorB + Model: MC-55-44-ZX + Size: 512GB + +This is a common setup and can be described quite easily: + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_default + placement: + host_pattern: '*' + spec: + data_devices: + model: HDD-123-foo # Note, HDD-123 would also be valid + db_devices: + model: MC-55-44-ZX # Same here, MC-55-44 is valid + +However, we can improve it by reducing the filters on core properties of the drives: + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_default + placement: + host_pattern: '*' + spec: + data_devices: + rotational: 1 + db_devices: + rotational: 0 + +Now all rotating devices are declared as 'data devices' and all non-rotating devices will be used as shared_devices (wal, db). + +If you know that drives with more than 2TB will always be the slower data devices, you can also filter by size: + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_default + placement: + host_pattern: '*' + spec: + data_devices: + size: '2TB:' + db_devices: + size: ':2TB' + +.. note:: All of the above OSD specs are equally valid. Which of those you want to use depends on taste and on how much you expect your node layout to change. + + +Multiple OSD specs for a single host +------------------------------------ + +Here we have two distinct setups + +.. 
code-block:: none + + 20 HDDs + Vendor: VendorA + Model: HDD-123-foo + Size: 4TB + + 12 SSDs + Vendor: VendorB + Model: MC-55-44-ZX + Size: 512GB + + 2 NVMEs + Vendor: VendorC + Model: NVME-QQQQ-987 + Size: 256GB + + +* 20 HDDs should share 2 SSDs +* 10 SSDs should share 2 NVMes + +This can be described with two layouts. + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_hdd + placement: + host_pattern: '*' + spec: + data_devices: + rotational: 1 + db_devices: + model: MC-55-44-ZX + limit: 2 # db_slots is actually to be favoured here, but it's not implemented yet + --- + service_type: osd + service_id: osd_spec_ssd + placement: + host_pattern: '*' + spec: + data_devices: + model: MC-55-44-ZX + db_devices: + vendor: VendorC + +This would create the desired layout by using all HDDs as data_devices with two SSDs assigned as dedicated db/wal devices. +The remaining SSDs (10) will be data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal devices. + +Multiple hosts with the same disk layout +---------------------------------------- + +Assuming the cluster has different kinds of hosts each with similar disk +layout, it is recommended to apply different OSD specs matching only one +set of hosts. Typically you will have a spec for multiple hosts with the +same layout. + +The service id is the unique key: if a new OSD spec with an already +applied service id is applied, the existing OSD spec will be superseded. +cephadm will now create new OSD daemons based on the new spec +definition. Existing OSD daemons will not be affected. See :ref:`cephadm-osd-declarative`. + +Node1-5 + +.. code-block:: none + + 20 HDDs + Vendor: Intel + Model: SSD-123-foo + Size: 4TB + 2 SSDs + Vendor: VendorA + Model: MC-55-44-ZX + Size: 512GB + +Node6-10 + +.. 
code-block:: none + + 5 NVMEs + Vendor: Intel + Model: SSD-123-foo + Size: 4TB + 20 SSDs + Vendor: VendorA + Model: MC-55-44-ZX + Size: 512GB + +You can use the 'placement' key in the layout to target certain nodes. + +.. code-block:: yaml + + service_type: osd + service_id: disk_layout_a + placement: + label: disk_layout_a + spec: + data_devices: + rotational: 1 + db_devices: + rotational: 0 + --- + service_type: osd + service_id: disk_layout_b + placement: + label: disk_layout_b + spec: + data_devices: + model: MC-55-44-ZX + db_devices: + model: SSD-123-foo + +This applies different OSD specs to different hosts depending on the `placement` key. +See :ref:`orchestrator-cli-placement-spec`. + +.. note:: + + Assuming each host has a unique disk layout, each OSD + spec needs to have a different service id. + + +Dedicated wal + db +------------------ + +All previous cases co-located the WALs with the DBs. +It is, however, also possible to deploy the WAL on a dedicated device if that makes sense. + +.. code-block:: none + + 20 HDDs + Vendor: VendorA + Model: SSD-123-foo + Size: 4TB + + 2 SSDs + Vendor: VendorB + Model: MC-55-44-ZX + Size: 512GB + + 2 NVMEs + Vendor: VendorC + Model: NVME-QQQQ-987 + Size: 256GB + + +The OSD spec for this case would look like the following (using the `model` filter): + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_default + placement: + host_pattern: '*' + spec: + data_devices: + model: MC-55-44-ZX + db_devices: + model: SSD-123-foo + wal_devices: + model: NVME-QQQQ-987 + + +It is also possible to specify device paths directly on specific hosts, like the following: + +.. code-block:: yaml + + service_type: osd + service_id: osd_using_paths + placement: + hosts: + - Node01 + - Node02 + spec: + data_devices: + paths: + - /dev/sdb + db_devices: + paths: + - /dev/sdc + wal_devices: + paths: + - /dev/sdd + + +This can easily be done with other filters, like `size` or `vendor`, as well. + +..
_cephadm-osd-activate: + +Activate existing OSDs +====================== + +If the OS of a host was reinstalled, existing OSDs need to be activated +again. For this use case, cephadm provides a wrapper for :ref:`ceph-volume-lvm-activate` that +activates all existing OSDs on a host. + +.. prompt:: bash # + + ceph cephadm osd activate <host>... + +This will scan all existing disks for OSDs and deploy corresponding daemons. + +Further Reading +=============== + +* :ref:`ceph-volume` +* :ref:`rados-index` diff --git a/doc/cephadm/services/rgw.rst b/doc/cephadm/services/rgw.rst new file mode 100644 index 000000000..0f9b14650 --- /dev/null +++ b/doc/cephadm/services/rgw.rst @@ -0,0 +1,324 @@ +=========== +RGW Service +=========== + +.. _cephadm-deploy-rgw: + +Deploy RGWs +=========== + +Cephadm deploys radosgw as a collection of daemons that manage a +single-cluster deployment or a particular *realm* and *zone* in a +multisite deployment. (For more information about realms and zones, +see :ref:`multisite`.) + +Note that with cephadm, radosgw daemons are configured via the monitor +configuration database instead of via a `ceph.conf` or the command line. If +that configuration isn't already in place (usually in the +``client.rgw.<something>`` section), then the radosgw +daemons will start up with default settings (e.g., binding to port +80). + +To deploy a set of radosgw daemons, with an arbitrary service name +*name*, run the following command: + +.. prompt:: bash # + + ceph orch apply rgw *<name>* [--realm=*<realm-name>*] [--zone=*<zone-name>*] --placement="*<num-daemons>* [*<host1>* ...]" + +Trivial setup +------------- + +For example, to deploy 2 RGW daemons (the default) for a single-cluster RGW deployment +under the arbitrary service id *foo*: + +.. prompt:: bash # + + ceph orch apply rgw foo + +..
_cephadm-rgw-designated_gateways: + +Designated gateways +------------------- + +A common scenario is to have a labeled set of hosts that will act +as gateways, with multiple instances of radosgw running on consecutive +ports 8000 and 8001: + +.. prompt:: bash # + + ceph orch host label add gwhost1 rgw # the 'rgw' label can be anything + ceph orch host label add gwhost2 rgw + ceph orch apply rgw foo '--placement=label:rgw count-per-host:2' --port=8000 + +See also: :ref:`cephadm_co_location`. + +.. _cephadm-rgw-networks: + +Specifying Networks +------------------- + +The network that the RGW daemons bind to can be configured with a YAML service specification. + +Example spec file: + +.. code-block:: yaml + + service_type: rgw + service_id: foo + placement: + label: rgw + count_per_host: 2 + networks: + - 192.169.142.0/24 + spec: + rgw_frontend_port: 8080 + + +Multisite zones +--------------- + +To deploy RGWs serving the multisite *myorg* realm and the *us-east-1* zone on +*myhost1* and *myhost2*: + +.. prompt:: bash # + + ceph orch apply rgw east --realm=myorg --zone=us-east-1 --placement="2 myhost1 myhost2" + +Note that in a multisite situation, cephadm only deploys the daemons. It does not create +or update the realm or zone configurations. To create a new realm and zone, you need to do +something like: + +.. prompt:: bash # + + radosgw-admin realm create --rgw-realm=<realm-name> --default + +.. prompt:: bash # + + radosgw-admin zonegroup create --rgw-zonegroup=<zonegroup-name> --master --default + +.. prompt:: bash # + + radosgw-admin zone create --rgw-zonegroup=<zonegroup-name> --rgw-zone=<zone-name> --master --default + +.. prompt:: bash # + + radosgw-admin period update --rgw-realm=<realm-name> --commit + +See :ref:`orchestrator-cli-placement-spec` for details of the placement +specification. See :ref:`multisite` for more information on setting up multisite RGW.
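The order of the four ``radosgw-admin`` calls in the multisite section matters (realm, then zonegroup, then zone, then the period commit). As a minimal sketch, the sequence can be generated for a given set of names. The helper name ``multisite_bootstrap_commands`` and the zonegroup name ``us`` are assumptions made for illustration; only the sub-commands and flags shown in the text above are used, and nothing is executed.

```python
import shlex


def multisite_bootstrap_commands(realm, zonegroup, zone):
    """Return the radosgw-admin invocations from the text, in order, as argv lists.

    Hypothetical helper: it only builds the command lines, it does not run them.
    """
    return [
        ["radosgw-admin", "realm", "create",
         f"--rgw-realm={realm}", "--default"],
        ["radosgw-admin", "zonegroup", "create",
         f"--rgw-zonegroup={zonegroup}", "--master", "--default"],
        ["radosgw-admin", "zone", "create",
         f"--rgw-zonegroup={zonegroup}", f"--rgw-zone={zone}",
         "--master", "--default"],
        ["radosgw-admin", "period", "update",
         f"--rgw-realm={realm}", "--commit"],
    ]


if __name__ == "__main__":
    # "us" is an assumed zonegroup name; the realm and zone come from the example above.
    for argv in multisite_bootstrap_commands("myorg", "us", "us-east-1"):
        print(shlex.join(argv))
```

Building argv lists rather than one shell string keeps each argument separate, which makes it straightforward to hand them to ``subprocess.run`` later without quoting issues.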
+ +Setting up HTTPS +---------------- + +In order to enable HTTPS for RGW services, apply a spec file following this scheme: + +.. code-block:: yaml + + service_type: rgw + service_id: myrgw + spec: + rgw_frontend_ssl_certificate: | + -----BEGIN PRIVATE KEY----- + V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4gTG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFt + ZXQsIGNvbnNldGV0dXIgc2FkaXBzY2luZyBlbGl0ciwgc2VkIGRpYW0gbm9udW15 + IGVpcm1vZCB0ZW1wb3IgaW52aWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu + YSBhbGlxdXlhbSBlcmF0LCBzZWQgZGlhbSB2b2x1cHR1YS4gQXQgdmVybyBlb3Mg + ZXQgYWNjdXNhbSBldCBqdXN0byBkdW8= + -----END PRIVATE KEY----- + -----BEGIN CERTIFICATE----- + V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4gTG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFt + ZXQsIGNvbnNldGV0dXIgc2FkaXBzY2luZyBlbGl0ciwgc2VkIGRpYW0gbm9udW15 + IGVpcm1vZCB0ZW1wb3IgaW52aWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu + YSBhbGlxdXlhbSBlcmF0LCBzZWQgZGlhbSB2b2x1cHR1YS4gQXQgdmVybyBlb3Mg + ZXQgYWNjdXNhbSBldCBqdXN0byBkdW8= + -----END CERTIFICATE----- + ssl: true + +Then apply this yaml document: + +.. prompt:: bash # + + ceph orch apply -i myrgw.yaml + +Note that the value of ``rgw_frontend_ssl_certificate`` is a literal string, as +indicated by the ``|`` character, which preserves newline characters. + +Service specification +--------------------- + +.. py:currentmodule:: ceph.deployment.service_spec + +.. autoclass:: RGWSpec + :members: + +.. _orchestrator-haproxy-service-spec: + +High availability service for RGW +================================= + +The *ingress* service allows you to create a high availability endpoint +for RGW with a minimum set of configuration options. The orchestrator will +deploy and manage a combination of haproxy and keepalived to provide load +balancing on a floating virtual IP. + +If SSL is used, then SSL must be configured and terminated by the ingress service +and not RGW itself. + +.. image:: ../../images/HAProxy_for_RGW.svg + +There are N hosts where the ingress service is deployed. Each host +has a haproxy daemon and a keepalived daemon.
A virtual IP is +automatically configured on only one of these hosts at a time. + +Each keepalived daemon checks every few seconds whether the haproxy +daemon on the same host is responding. Keepalived will also check +that the master keepalived daemon is running without problems. If the +"master" keepalived daemon or the active haproxy is not responding, +one of the remaining keepalived daemons running in backup mode will be +elected as master, and the virtual IP will be moved to that node. + +The active haproxy acts like a load balancer, distributing all RGW requests +between all the RGW daemons available. + +Prerequisites +------------- + +* An existing RGW service, without SSL. (If you want SSL service, the certificate + should be configured on the ingress service, not the RGW service.) + +Deploying +--------- + +Use the command:: + + ceph orch apply -i <ingress_spec_file> + +Service specification +--------------------- + +It is a yaml format file with the following properties: + +.. code-block:: yaml + + service_type: ingress + service_id: rgw.something # adjust to match your existing RGW service + placement: + hosts: + - host1 + - host2 + - host3 + spec: + backend_service: rgw.something # adjust to match your existing RGW service + virtual_ip: <string>/<string> # ex: 192.168.20.1/24 + frontend_port: <integer> # ex: 8080 + monitor_port: <integer> # ex: 1967, used by haproxy for load balancer status + virtual_interface_networks: [ ... ] # optional: list of CIDR networks + ssl_cert: | # optional: SSL certificate and key + -----BEGIN CERTIFICATE----- + ... + -----END CERTIFICATE----- + -----BEGIN PRIVATE KEY----- + ... + -----END PRIVATE KEY----- + +.. 
code-block:: yaml + + service_type: ingress + service_id: rgw.something # adjust to match your existing RGW service + placement: + hosts: + - host1 + - host2 + - host3 + spec: + backend_service: rgw.something # adjust to match your existing RGW service + virtual_ips_list: + - <string>/<string> # ex: 192.168.20.1/24 + - <string>/<string> # ex: 192.168.20.2/24 + - <string>/<string> # ex: 192.168.20.3/24 + frontend_port: <integer> # ex: 8080 + monitor_port: <integer> # ex: 1967, used by haproxy for load balancer status + virtual_interface_networks: [ ... ] # optional: list of CIDR networks + ssl_cert: | # optional: SSL certificate and key + -----BEGIN CERTIFICATE----- + ... + -----END CERTIFICATE----- + -----BEGIN PRIVATE KEY----- + ... + -----END PRIVATE KEY----- + + +where the properties of this service specification are: + +* ``service_type`` + Mandatory and set to "ingress" +* ``service_id`` + The name of the service. We suggest naming this after the service you are + controlling ingress for (e.g., ``rgw.foo``). +* ``placement hosts`` + The hosts where it is desired to run the HA daemons. An haproxy and a + keepalived container will be deployed on these hosts. These hosts do not need + to match the nodes where RGW is deployed. +* ``virtual_ip`` + The virtual IP (and network) in CIDR format where the ingress service will be available. +* ``virtual_ips_list`` + The virtual IP addresses in CIDR format where the ingress service will be available. + Each virtual IP address will be primary on one node running the ingress service. The number + of virtual IP addresses must be less than or equal to the number of ingress nodes. +* ``virtual_interface_networks`` + A list of networks to identify which ethernet interface to use for the virtual IP. +* ``frontend_port`` + The port used to access the ingress service. +* ``ssl_cert`` + SSL certificate, if SSL is to be enabled. This must contain both the certificate and + private key blocks in .pem format. + +..
_ingress-virtual-ip: + +Selecting ethernet interfaces for the virtual IP +------------------------------------------------ + +You cannot simply provide the name of the network interface on which +to configure the virtual IP because interface names tend to vary +across hosts (and/or reboots). Instead, cephadm will select +interfaces based on other existing IP addresses that are already +configured. + +Normally, the virtual IP will be configured on the first network +interface that has an existing IP in the same subnet. For example, if +the virtual IP is 192.168.0.80/24 and eth2 has the static IP +192.168.0.40/24, cephadm will use eth2. + +In some cases, the virtual IP may not belong to the same subnet as an existing static +IP. In such cases, you can provide a list of subnets to match against existing IPs, +and cephadm will put the virtual IP on the first network interface to match. For example, +if the virtual IP is 192.168.0.80/24 and we want it on the same interface as the machine's +static IP in 10.10.0.0/16, you can use a spec like:: + + service_type: ingress + service_id: rgw.something + spec: + virtual_ip: 192.168.0.80/24 + virtual_interface_networks: + - 10.10.0.0/16 + ... + +A consequence of this strategy is that you cannot currently configure the virtual IP +on an interface that has no existing IP address. In this situation, we suggest +configuring a "dummy" IP address in an unroutable network on the correct interface +and referencing that dummy network in the networks list (see above). + + +Useful hints for ingress +------------------------ + +* It is good to have at least 3 RGW daemons. +* We recommend at least 3 hosts for the ingress service.
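The interface selection rule described under "Selecting ethernet interfaces for the virtual IP" can be illustrated with a short sketch. This is not cephadm's implementation; it is an approximation of the documented behaviour under stated assumptions: the function name ``pick_interface`` and the ``(name, address)`` input format are hypothetical, the same-subnet match is tried first, and ``virtual_interface_networks`` is used only as a fallback.

```python
import ipaddress


def pick_interface(virtual_ip, interfaces, virtual_interface_networks=None):
    """Pick the interface that should carry the virtual IP.

    virtual_ip: CIDR string, for example "192.168.0.80/24".
    interfaces: ordered list of (name, "address/prefix") pairs already
        configured on the host (hypothetical input format).
    virtual_interface_networks: optional list of CIDR networks to match
        against existing addresses when no interface shares the VIP subnet.
    Returns the interface name, or None if nothing matches.
    """
    vip_network = ipaddress.ip_interface(virtual_ip).network

    # First choice: an interface with an existing IP in the VIP's own subnet.
    for name, cidr in interfaces:
        if ipaddress.ip_interface(cidr).ip in vip_network:
            return name

    # Fallback: an interface whose existing IP lies in one of the listed networks.
    networks = [ipaddress.ip_network(n) for n in (virtual_interface_networks or [])]
    for name, cidr in interfaces:
        addr = ipaddress.ip_interface(cidr).ip
        if any(addr in net for net in networks):
            return name

    # An interface without any existing IP address can never be selected.
    return None


if __name__ == "__main__":
    host = [("eth0", "10.10.3.7/16"), ("eth2", "192.168.0.40/24")]
    print(pick_interface("192.168.0.80/24", host))
    print(pick_interface("192.168.0.80/24", host[:1], ["10.10.0.0/16"]))
```

The final ``return None`` mirrors the limitation noted above: with no existing address on the desired interface there is nothing to match against, which is why a "dummy" address is suggested.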
+ +Further Reading +=============== + +* :ref:`object-gateway` diff --git a/doc/cephadm/services/snmp-gateway.rst b/doc/cephadm/services/snmp-gateway.rst new file mode 100644 index 000000000..f927fdfd0 --- /dev/null +++ b/doc/cephadm/services/snmp-gateway.rst @@ -0,0 +1,171 @@ +==================== +SNMP Gateway Service +==================== + +SNMP_ is still a widely used protocol for monitoring distributed systems and devices across a variety of hardware +and software platforms. Ceph's SNMP integration focuses on forwarding alerts from its Prometheus Alertmanager +cluster to a gateway daemon. The gateway daemon transforms the alert into an SNMP Notification and sends +it on to a designated SNMP management platform. The gateway daemon is from the snmp_notifier_ project, +which provides SNMP V2c and V3 support (authentication and encryption). + +Ceph's SNMP gateway service deploys one instance of the gateway by default. You may increase this +by providing placement information. However, bear in mind that if you enable multiple SNMP gateway daemons, +your SNMP management platform will receive multiple notifications for the same event. + +.. _SNMP: https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol +..
_snmp_notifier: https://github.com/maxwo/snmp_notifier + +Compatibility +============= +The table below shows the SNMP versions that are supported by the gateway implementation + +================ =========== =============================================== + SNMP Version Supported Notes +================ =========== =============================================== + V1 ❌ Not supported by snmp_notifier + V2c ✔ + V3 authNoPriv ✔ uses username/password authentication, without + encryption (NoPriv = no privacy) + V3 authPriv ✔ uses username/password authentication with + encryption to the SNMP management platform +================ =========== =============================================== + + +Deploying an SNMP Gateway +========================= +Both SNMP V2c and V3 provide credentials support. In the case of V2c, this is just the community string - but for V3 +environments you must provide additional authentication information. These credentials are not supported on the command +line when deploying the service. Instead, you must create the service using a credentials file (in yaml format), or +specify the complete service definition in a yaml file. + +Command format +-------------- + +.. prompt:: bash # + + ceph orch apply snmp-gateway <snmp_version:V2c|V3> <destination> [<port:int>] [<engine_id>] [<auth_protocol: MD5|SHA>] [<privacy_protocol:DES|AES>] [<placement>] ... + + +Usage Notes + +- you must supply the ``--snmp-version`` parameter +- the ``--destination`` parameter must be of the format hostname:port (no default) +- you may omit ``--port``. It defaults to 9464 +- the ``--engine-id`` is a unique identifier for the device (in hex) and required for SNMP v3 only. 
+ Suggested value: 8000C53F<fsid> where the fsid is from your cluster, without the '-' symbols +- for SNMP V3, the ``--auth-protocol`` setting defaults to **SHA** +- for SNMP V3, with encryption you must define the ``--privacy-protocol`` +- you **must** provide a -i <filename> to pass the secrets/passwords to the orchestrator + +Deployment Examples +=================== + +SNMP V2c +-------- +Here's an example for V2c, showing CLI and service based deployments: + +.. prompt:: bash # + + ceph orch apply snmp-gateway --port 9464 --snmp-version=V2c --destination=192.168.122.73:162 -i ./snmp_creds.yaml + +with a credentials file that contains: + +.. code-block:: yaml + + --- + snmp_community: public + +Alternatively, you can create a yaml definition for the gateway and apply it from a single file: + +.. prompt:: bash # + + ceph orch apply -i snmp-gateway.yml + +with the file containing the following configuration: + +.. code-block:: yaml + + service_type: snmp-gateway + service_name: snmp-gateway + placement: + count: 1 + spec: + credentials: + snmp_community: public + port: 9464 + snmp_destination: 192.168.122.73:162 + snmp_version: V2c + + +SNMP V3 (authNoPriv) +-------------------- +Deploying an snmp-gateway service supporting SNMP V3 with authentication only would look like this: + +.. prompt:: bash # + + ceph orch apply snmp-gateway --snmp-version=V3 --engine-id=800C53F000000 --destination=192.168.122.1:162 -i ./snmpv3_creds.yml + +with a credentials file as follows: + +.. code-block:: yaml + + --- + snmp_v3_auth_username: myuser + snmp_v3_auth_password: mypassword + +or as a service configuration file: + +..
code-block:: yaml + + service_type: snmp-gateway + service_name: snmp-gateway + placement: + count: 1 + spec: + credentials: + snmp_v3_auth_password: mypassword + snmp_v3_auth_username: myuser + engine_id: 800C53F000000 + port: 9464 + snmp_destination: 192.168.122.1:162 + snmp_version: V3 + + +SNMP V3 (authPriv) +------------------ + +Defining an SNMP V3 gateway service that implements authentication and privacy (encryption) requires two additional values: + +.. prompt:: bash # + + ceph orch apply snmp-gateway --snmp-version=V3 --engine-id=800C53F000000 --destination=192.168.122.1:162 --privacy-protocol=AES -i ./snmpv3_creds.yml + +with a credentials file as follows: + +.. code-block:: yaml + + --- + snmp_v3_auth_username: myuser + snmp_v3_auth_password: mypassword + snmp_v3_priv_password: mysecret + + +.. note:: + + The credentials are stored on the host, restricted to the root user and passed to the snmp_notifier daemon as + an environment file (``--env-file``), to limit exposure. + + +AlertManager Integration +======================== +When an SNMP gateway service is deployed or updated, the Prometheus Alertmanager configuration is automatically updated to forward any +alert that has an OID_ label to the SNMP gateway daemon for processing. + +.. _OID: https://en.wikipedia.org/wiki/Object_identifier + +Implementing the MIB +====================== +To make sense of the SNMP Notification/Trap, you'll need to apply the MIB to your SNMP management platform. The MIB (CEPH-MIB.txt) can +be downloaded from the main Ceph repo_. + +.. _repo: https://github.com/ceph/ceph/tree/master/monitoring/snmp diff --git a/doc/cephadm/troubleshooting.rst b/doc/cephadm/troubleshooting.rst new file mode 100644 index 000000000..9a534f633 --- /dev/null +++ b/doc/cephadm/troubleshooting.rst @@ -0,0 +1,370 @@ +Troubleshooting +=============== + +You might need to investigate why a cephadm command failed +or why a certain service no longer runs properly. + +Cephadm deploys daemons as containers.
This means that +troubleshooting those containerized daemons might work +differently than you expect (and that is certainly true if +you expect this troubleshooting to work the way that +troubleshooting does when the daemons involved aren't +containerized). + +Here are some tools and commands to help you troubleshoot +your Ceph environment. + +.. _cephadm-pause: + +Pausing or disabling cephadm +---------------------------- + +If something goes wrong and cephadm is behaving badly, you can +pause most of the Ceph cluster's background activity by running +the following command: + +.. prompt:: bash # + + ceph orch pause + +This stops all changes in the Ceph cluster, but cephadm will +still periodically check hosts to refresh its inventory of +daemons and devices. You can disable cephadm completely by +running the following commands: + +.. prompt:: bash # + + ceph orch set backend '' + ceph mgr module disable cephadm + +These commands disable all of the ``ceph orch ...`` CLI commands. +All previously deployed daemon containers continue to exist and +will start as they did before you ran these commands. + +See :ref:`cephadm-spec-unmanaged` for information on disabling +individual services. + + +Per-service and per-daemon events +--------------------------------- + +In order to help with the process of debugging failed daemon +deployments, cephadm stores events per service and per daemon. +These events often contain information relevant to +troubleshooting +your Ceph cluster. + +Listing service events +~~~~~~~~~~~~~~~~~~~~~~ + +To see the events associated with a certain service, run a +command of the following form: + +.. prompt:: bash # + + ceph orch ls --service_name=<service-name> --format yaml + +This will return something in the following form: + +.. code-block:: yaml + + service_type: alertmanager + service_name: alertmanager + placement: + hosts: + - unknown_host + status: + ...
+ running: 1 + size: 1 + events: + - 2021-02-01T08:58:02.741162 service:alertmanager [INFO] "service was created" + - '2021-02-01T12:09:25.264584 service:alertmanager [ERROR] "Failed to apply: Cannot + place <AlertManagerSpec for service_name=alertmanager> on unknown_host: Unknown hosts"' + +Listing daemon events +~~~~~~~~~~~~~~~~~~~~~ + +To see the events associated with a certain daemon, run a +command of the following form: + +.. prompt:: bash # + + ceph orch ps --service-name <service-name> --daemon-id <daemon-id> --format yaml + +This will return something in the following form: + +.. code-block:: yaml + + daemon_type: mds + daemon_id: cephfs.hostname.ppdhsz + hostname: hostname + status_desc: running + ... + events: + - 2021-02-01T08:59:43.845866 daemon:mds.cephfs.hostname.ppdhsz [INFO] "Reconfigured + mds.cephfs.hostname.ppdhsz on host 'hostname'" + + +Checking cephadm logs +--------------------- + +To learn how to monitor the cephadm logs as they are generated, read :ref:`watching_cephadm_logs`. + +If your Ceph cluster has been configured to log events to files, there will be a +cephadm log file called ``ceph.cephadm.log`` on all monitor hosts (see +:ref:`cephadm-logs` for a more complete explanation of this). + +Gathering log files +------------------- + +Use journalctl to gather the log files of all daemons: + +.. note:: By default cephadm now stores logs in journald. This means + that you will no longer find daemon logs in ``/var/log/ceph/``. + +To read the log file of one specific daemon, run:: + + cephadm logs --name <name-of-daemon> + +Note: this only works when run on the same host where the daemon is running. To +get logs of a daemon running on a different host, give the ``--fsid`` option:: + + cephadm logs --fsid <fsid> --name <name-of-daemon> + +where the ``<fsid>`` corresponds to the cluster ID printed by ``ceph status``.
+ +To fetch all log files of all daemons on a given host, run:: + + for name in $(cephadm ls | jq -r '.[].name') ; do + cephadm logs --fsid <fsid> --name "$name" > "$name"; + done + +Collecting systemd status +------------------------- + +To print the state of a systemd unit, run:: + + systemctl status "ceph-$(cephadm shell ceph fsid)@<service name>.service" + + +To fetch the state of all daemons on a given host, run:: + + fsid="$(cephadm shell ceph fsid)" + for name in $(cephadm ls | jq -r '.[].name') ; do + systemctl status "ceph-$fsid@$name.service" > "$name"; + done + + +List all downloaded container images +------------------------------------ + +To list all container images that are downloaded on a host: + +.. note:: ``Image`` might also be called `ImageID` + +:: + + podman ps -a --format json | jq '.[].Image' + "docker.io/library/centos:8" + "registry.opensuse.org/opensuse/leap:15.2" + + +Manually running containers +--------------------------- + +Cephadm writes small wrappers that run containers. Refer to +``/var/lib/ceph/<cluster-fsid>/<service-name>/unit.run`` for the +container execution command. + +.. _cephadm-ssh-errors: + +SSH errors +---------- + +Error message:: + + execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-73z09u6g -i /tmp/cephadm-identity-ky7ahp_5 root@10.10.1.2 + ... + raise OrchestratorError(msg) from e + orchestrator._interface.OrchestratorError: Failed to connect to 10.10.1.2 (10.10.1.2). + Please make sure that the host is reachable and accepts connections using the cephadm SSH key + ... + +Things users can do: + +1.
Ensure cephadm has an SSH identity key:: + + [root@mon1~]# cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key + INFO:cephadm:Inferring fsid f8edc08a-7f17-11ea-8707-000c2915dd98 + INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 obtained 'mgr/cephadm/ssh_identity_key' + [root@mon1 ~] # chmod 0600 ~/cephadm_private_key + + If this fails, cephadm doesn't have a key. Fix this by running the following command:: + + [root@mon1 ~]# cephadm shell -- ceph cephadm generate-ssh-key + + or:: + + [root@mon1 ~]# cat ~/cephadm_private_key | cephadm shell -- ceph cephadm set-ssk-key -i - + +2. Ensure that the SSH config is correct:: + + [root@mon1 ~]# cephadm shell -- ceph cephadm get-ssh-config > config + +3. Verify that we can connect to the host:: + + [root@mon1 ~]# ssh -F config -i ~/cephadm_private_key root@mon1 + +Verifying that the Public Key is Listed in the authorized_keys file +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To verify that the public key is in the authorized_keys file, run the following commands:: + + [root@mon1 ~]# cephadm shell -- ceph cephadm get-pub-key > ~/ceph.pub + [root@mon1 ~]# grep "`cat ~/ceph.pub`" /root/.ssh/authorized_keys + +Failed to infer CIDR network error +---------------------------------- + +If you see this error:: + + ERROR: Failed to infer CIDR network for mon ip ***; pass --skip-mon-network to configure it later + +Or this error:: + + Must set public_network config option or specify a CIDR network, ceph addrvec, or plain IP + +This means that you must run a command of this form:: + + ceph config set mon public_network <mon_network> + +For more detail on operations of this kind, see :ref:`deploy_additional_monitors` + +Accessing the admin socket +-------------------------- + +Each Ceph daemon provides an admin socket that bypasses the +MONs (See :ref:`rados-monitoring-using-admin-socket`). 
+ +To access the admin socket, first enter the daemon container on the host:: + + [root@mon1 ~]# cephadm enter --name <daemon-name> + [ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok config show + +Calling miscellaneous ceph tools +-------------------------------- + +To call miscellaneous tools like ``ceph-objectstore-tool`` or +``ceph-monstore-tool``, you can run them by calling +``cephadm shell --name <daemon-name>`` like so:: + + root@myhostname # cephadm unit --name mon.myhostname stop + root@myhostname # cephadm shell --name mon.myhostname + [ceph: root@myhostname /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-myhostname get monmap > monmap + [ceph: root@myhostname /]# monmaptool --print monmap + monmaptool: monmap file monmap + epoch 1 + fsid 28596f44-3b56-11ec-9034-482ae35a5fbb + last_changed 2021-11-01T20:57:19.755111+0000 + created 2021-11-01T20:57:19.755111+0000 + min_mon_release 17 (quincy) + election_strategy: 1 + 0: [v2:127.0.0.1:3300/0,v1:127.0.0.1:6789/0] mon.myhostname + +This command sets up the environment in a way that is suitable +for extended daemon maintenance and running the daemon interactively. + +.. _cephadm-restore-quorum: + +Restoring the MON quorum +------------------------ + +In case the Ceph MONs cannot form a quorum, cephadm is not able +to manage the cluster until the quorum is restored. + +In order to restore the MON quorum, remove unhealthy MONs +from the monmap by following these steps: + +1. Stop all MONs. For each MON host:: + + ssh {mon-host} + cephadm unit --name mon.`hostname` stop + + +2. Identify a surviving monitor and log in to that host:: + + ssh {mon-host} + cephadm enter --name mon.`hostname` + +3. Follow the steps in :ref:`rados-mon-remove-from-unhealthy` + +.. _cephadm-manually-deploy-mgr: + +Manually deploying a MGR daemon +------------------------------- +cephadm requires a MGR daemon in order to manage the cluster.
In case the last MGR of a cluster was removed, follow these steps in order to manually deploy +a MGR ``mgr.hostname.smfvfd`` on a random host of your cluster. + +Disable the cephadm scheduler, in order to prevent cephadm from removing the new +MGR. See :ref:`cephadm-enable-cli`:: + + ceph config-key set mgr/cephadm/pause true + +Then get or create the auth entry for the new MGR:: + + ceph auth get-or-create mgr.hostname.smfvfd mon "profile mgr" osd "allow *" mds "allow *" + +Get the ceph.conf:: + + ceph config generate-minimal-conf + +Get the container image:: + + ceph config get "mgr.hostname.smfvfd" container_image + +Create a file ``config-json.json`` which contains the information necessary to deploy +the daemon: + +.. code-block:: json + + { + "config": "# minimal ceph.conf for 8255263a-a97e-4934-822c-00bfe029b28f\n[global]\n\tfsid = 8255263a-a97e-4934-822c-00bfe029b28f\n\tmon_host = [v2:192.168.0.1:40483/0,v1:192.168.0.1:40484/0]\n", + "keyring": "[mgr.hostname.smfvfd]\n\tkey = V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4=\n" + } + +Deploy the daemon:: + + cephadm --image <container-image> deploy --fsid <fsid> --name mgr.hostname.smfvfd --config-json config-json.json + +Analyzing core dumps +--------------------- + +In case a Ceph daemon crashes, cephadm supports analyzing core dumps. To enable core dumps, run + +.. prompt:: bash # + + ulimit -c unlimited + +Core dumps will now be written to ``/var/lib/systemd/coredump``. + +.. note:: + + Core dumps are not namespaced by the kernel, which means + they will be written to ``/var/lib/systemd/coredump`` on + the container host. + +Now, wait for the crash to happen again. (To simulate the crash of a daemon, run e.g.
``killall -3 ceph-mon``) + +Install debug packages by entering the cephadm shell and installing ``ceph-debuginfo``:: + + # cephadm shell --mount /var/lib/systemd/coredump + [ceph: root@host1 /]# dnf install ceph-debuginfo gdb zstd + [ceph: root@host1 /]# unzstd /mnt/coredump/core.ceph-*.zst + [ceph: root@host1 /]# gdb /usr/bin/ceph-mon /mnt/coredump/core.ceph-... + (gdb) bt + #0 0x00007fa9117383fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 + #1 0x00007fa910d7f8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6 + #2 0x00007fa913d3f48f in AsyncMessenger::wait() () from /usr/lib64/ceph/libceph-common.so.2 + #3 0x0000563085ca3d7e in main () diff --git a/doc/cephadm/upgrade.rst b/doc/cephadm/upgrade.rst new file mode 100644 index 000000000..2bbfc30e5 --- /dev/null +++ b/doc/cephadm/upgrade.rst @@ -0,0 +1,270 @@ +============== +Upgrading Ceph +============== + +Cephadm can safely upgrade Ceph from one bugfix release to the next. For +example, you can upgrade from v15.2.0 (the first Octopus release) to the next +point release, v15.2.1. + +The automated upgrade process follows Ceph best practices. For example: + +* The upgrade order starts with managers, monitors, then other daemons. +* Each daemon is restarted only after Ceph indicates that the cluster + will remain available. + +.. note:: + + The Ceph cluster health status is likely to switch to + ``HEALTH_WARN`` during the upgrade. + +.. note:: + + In case a host of the cluster is offline, the upgrade is paused. + + +Starting the upgrade +==================== + +Before you use cephadm to upgrade Ceph, verify that all hosts are currently online and that your cluster is healthy by running the following command: + +.. prompt:: bash # + + ceph -s + +To upgrade (or downgrade) to a specific release, run the following command: + +..
prompt:: bash # + + ceph orch upgrade start --ceph-version <version> + +For example, to upgrade to v16.2.6, run the following command: + +.. prompt:: bash # + + ceph orch upgrade start --ceph-version 16.2.6 + +.. note:: + + From version v16.2.6 the Docker Hub registry is no longer used, so if you use Docker you have to point it to the image in the quay.io registry: + +.. prompt:: bash # + + ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6 + + +Monitoring the upgrade +====================== + +Determine (1) whether an upgrade is in progress and (2) which version the +cluster is upgrading to by running the following command: + +.. prompt:: bash # + + ceph orch upgrade status + +Watching the progress bar during a Ceph upgrade +----------------------------------------------- + +During the upgrade, a progress bar is visible in the ceph status output. It +looks like this: + +.. code-block:: console + + # ceph -s + + [...] + progress: + Upgrade to docker.io/ceph/ceph:v15.2.1 (00h 20m 12s) + [=======.....................] (time remaining: 01h 43m 31s) + +Watching the cephadm log during an upgrade +------------------------------------------ + +Watch the cephadm log by running the following command: + +.. prompt:: bash # + + ceph -W cephadm + + +Canceling an upgrade +==================== + +You can stop the upgrade process at any time by running the following command: + +.. prompt:: bash # + + ceph orch upgrade stop + +Post upgrade actions +==================== + +In case the new version is based on ``cephadm``, once done with the upgrade the user +has to update the ``cephadm`` package (or ceph-common package in case the user +doesn't use ``cephadm shell``) to a version compatible with the new version. + +Potential problems +================== + +There are a few health alerts that can arise during the upgrade process. 
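A minimal sketch of one way to see which upgrade alerts are currently raised, by filtering health output. The helper name ``list_upgrade_alerts`` is hypothetical, and the sketch assumes that upgrade alert codes are upper-case tokens beginning with ``UPGRADE_``, as the alerts named in this section are.

```shell
# Hypothetical helper: print each distinct UPGRADE_* alert code found in
# health text read from stdin, one code per line.
# Assumption: alert codes are upper-case tokens that start with UPGRADE_.
list_upgrade_alerts() {
    grep -oE 'UPGRADE_[A-Z_]+' | sort -u
}

# Intended use on a node with a working ceph CLI:
#   ceph health detail | list_upgrade_alerts
```

Empty output means that no token of that form was present in the text; it does not by itself mean that the cluster is healthy.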
+
+UPGRADE_NO_STANDBY_MGR
+----------------------
+
+This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
+active standby manager daemon. In order to proceed with the upgrade, Ceph
+requires an active standby manager daemon (which you can think of in this
+context as "a second manager").
+
+You can ensure that Cephadm is configured to run 2 (or more) managers by
+running the following command:
+
+.. prompt:: bash #
+
+   ceph orch apply mgr 2  # or more
+
+You can check the status of existing mgr daemons by running the following
+command:
+
+.. prompt:: bash #
+
+   ceph orch ps --daemon-type mgr
+
+If an existing mgr daemon has stopped, you can try to restart it by running the
+following command:
+
+.. prompt:: bash #
+
+   ceph orch daemon restart <name>
+
+UPGRADE_FAILED_PULL
+-------------------
+
+This alert (``UPGRADE_FAILED_PULL``) means that Ceph was unable to pull the
+container image for the target version. This can happen if you specify a
+version or container image that does not exist (e.g. "1.2.3"), or if the
+container registry cannot be reached by one or more hosts in the cluster.
+
+To cancel the existing upgrade and to specify a different target version, run
+the following commands:
+
+.. prompt:: bash #
+
+   ceph orch upgrade stop
+   ceph orch upgrade start --ceph-version <version>
+
+
+Using customized container images
+=================================
+
+For most users, upgrading requires nothing more complicated than specifying the
+Ceph version number to upgrade to. In such cases, cephadm locates the specific
+Ceph container image to use by combining the ``container_image_base``
+configuration option (default: ``docker.io/ceph/ceph``) with a tag of
+``vX.Y.Z``.
+
+But it is possible to upgrade to an arbitrary container image, if that's what
+you need. For example, the following command upgrades to a development build:
+
+.. prompt:: bash #
+
+   ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name
+
+For more information about available container images, see :ref:`containers`.
+
+Staggered Upgrade
+=================
+
+Some users may prefer to upgrade components in phases rather than all at once.
+Starting in 16.2.11 and 17.2.1, the upgrade command accepts parameters that
+limit which daemons are upgraded by a single upgrade command. The options
+include ``daemon_types``, ``services``, ``hosts`` and ``limit``. ``daemon_types``
+takes a comma-separated list of daemon types and will only upgrade daemons of those
+types. ``services`` is mutually exclusive with ``daemon_types``, only takes services
+of one type at a time (e.g. can't provide an OSD and RGW service at the same time), and
+will only upgrade daemons belonging to those services. ``hosts`` can be combined
+with ``daemon_types`` or ``services`` or provided on its own. The ``hosts`` parameter
+follows the same format as the command line options for :ref:`orchestrator-cli-placement-spec`.
+``limit`` takes an integer > 0 and provides a numerical limit on the number of
+daemons cephadm will upgrade. ``limit`` can be combined with any of the other
+parameters. For example, if you specify to upgrade daemons of type osd on host
+Host1 with ``limit`` set to 3, cephadm will upgrade (up to) 3 osd daemons on
+Host1.
+
+Example: specifying daemon types and hosts:
+
+.. prompt:: bash #
+
+   ceph orch upgrade start --image <image-name> --daemon-types mgr,mon --hosts host1,host2
+
+Example: specifying services and using limit:
+
+.. prompt:: bash #
+
+   ceph orch upgrade start --image <image-name> --services rgw.example1,rgw.example2 --limit 2
+
+.. note::
+
+   Cephadm strictly enforces an order for the upgrade of daemons, and this order still applies
+   in staggered upgrade scenarios. The current upgrade ordering is
+   ``mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs``.
+   If you specify parameters that would upgrade daemons out of order, the upgrade
+   command will block and note which daemons will be missed if you proceed.
+
+.. note::
+
+   Upgrade commands with limiting parameters validate the options before beginning the
+   upgrade, which may require pulling the new container image. Do not be surprised
+   if the upgrade start command takes a while to return when limiting parameters are provided.
+
+.. note::
+
+   In staggered upgrade scenarios (when a limiting parameter is provided), monitoring
+   stack daemons, including Prometheus and node-exporter, are refreshed after the Manager
+   daemons have been upgraded. Do not be surprised if Manager upgrades thus take longer
+   than expected. Note that the versions of monitoring stack daemons may not change between
+   Ceph releases, in which case they are only redeployed.
+
+Upgrading to a version that supports staggered upgrade from one that doesn't
+----------------------------------------------------------------------------
+
+When upgrading from a version that already supports staggered upgrades, the process
+simply requires providing the necessary arguments. However, if you wish to upgrade
+to a version that supports staggered upgrade from one that does not, there is a
+workaround. It requires first manually upgrading the Manager daemons and then passing
+the limiting parameters as usual.
+
+.. warning::
+   Make sure you have multiple running mgr daemons before attempting this procedure.
+
+To start with, determine which Manager is your active one and which are standby. This
+can be done in a variety of ways, such as looking at the ``ceph -s`` output. Then,
+manually upgrade each standby mgr daemon with:
+
+.. prompt:: bash #
+
+   ceph orch daemon redeploy mgr.example1.abcdef --image <new-image-name>
+
+.. note::
+
+   If you are on a very early version of cephadm (early Octopus), the ``orch daemon redeploy``
+   command may not have the ``--image`` flag. In that case, you must manually set the
+   Manager container image with ``ceph config set mgr container_image <new-image-name>`` and then
+   redeploy the Manager with ``ceph orch daemon redeploy mgr.example1.abcdef``.
+
+At this point, a Manager failover should make the active Manager one
+running the new version.
+
+.. prompt:: bash #
+
+   ceph mgr fail
+
+Verify that the active Manager is now running the new version. To complete the Manager
+upgrade, run:
+
+.. prompt:: bash #
+
+   ceph orch upgrade start --image <new-image-name> --daemon-types mgr
+
+You should now have all your Manager daemons on the new version and be able to
+specify the limiting parameters for the rest of the upgrade.
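The workaround above can be summarized as a dry-run sketch that only prints the commands in the order the procedure gives them, so they can be reviewed and then run by hand. The function name ``print_mgr_first_upgrade`` is hypothetical, and the image and daemon names in the example call are placeholders. The sketch assumes that the installed ``orch daemon redeploy`` supports ``--image``.

```shell
# Hypothetical dry-run helper: print, in order, the commands from the
# manual Manager upgrade procedure. Nothing is executed.
# Usage: print_mgr_first_upgrade <new-image-name> <standby-mgr-name>...
print_mgr_first_upgrade() {
    image="$1"
    shift
    # Step 1: redeploy each standby Manager on the new image.
    for mgr in "$@"; do
        printf 'ceph orch daemon redeploy %s --image %s\n' "$mgr" "$image"
    done
    # Step 2: fail over so that an upgraded Manager becomes active.
    printf 'ceph mgr fail\n'
    # Step 3: upgrade the remaining Manager daemons.
    printf 'ceph orch upgrade start --image %s --daemon-types mgr\n' "$image"
}

# Example call with placeholder names:
print_mgr_first_upgrade "<new-image-name>" mgr.example1.abcdef
```

The commands are printed rather than executed because the procedure includes a manual check, after the failover, that the active Manager is running the new version.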