summaryrefslogtreecommitdiffstats
path: root/doc/cephadm/operations.rst
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-21 11:54:28 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-21 11:54:28 +0000
commite6918187568dbd01842d8d1d2c808ce16a894239 (patch)
tree64f88b554b444a49f656b6c656111a145cbbaa28 /doc/cephadm/operations.rst
parentInitial commit. (diff)
downloadceph-e6918187568dbd01842d8d1d2c808ce16a894239.tar.xz
ceph-e6918187568dbd01842d8d1d2c808ce16a894239.zip
Adding upstream version 18.2.2.upstream/18.2.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/cephadm/operations.rst')
-rw-r--r--doc/cephadm/operations.rst616
1 files changed, 616 insertions, 0 deletions
diff --git a/doc/cephadm/operations.rst b/doc/cephadm/operations.rst
new file mode 100644
index 000000000..5d8fdaca8
--- /dev/null
+++ b/doc/cephadm/operations.rst
@@ -0,0 +1,616 @@
+==================
+Cephadm Operations
+==================
+
+.. _watching_cephadm_logs:
+
+Watching cephadm log messages
+=============================
+
+Cephadm writes logs to the ``cephadm`` cluster log channel. You can
+monitor Ceph's activity in real time by reading the logs as they fill
+up. Run the following command to see the logs in real time:
+
+.. prompt:: bash #
+
+ ceph -W cephadm
+
+By default, this command shows info-level events and above. To see
+debug-level messages as well as info-level events, run the following
+commands:
+
+.. prompt:: bash #
+
+ ceph config set mgr mgr/cephadm/log_to_cluster_level debug
+ ceph -W cephadm --watch-debug
+
+.. warning::
+
+ The debug messages are very verbose!
+
+You can see recent events by running the following command:
+
+.. prompt:: bash #
+
+ ceph log last cephadm
+
+These events are also logged to the ``ceph.cephadm.log`` file on
+monitor hosts as well as to the monitor daemons' stderr.
+
+
+.. _cephadm-logs:
+
+
+Ceph daemon control
+===================
+
+Starting and stopping daemons
+-----------------------------
+
+You can stop, start, or restart a daemon with:
+
+.. prompt:: bash #
+
+ ceph orch daemon stop <name>
+ ceph orch daemon start <name>
+ ceph orch daemon restart <name>
+
+You can also do the same for all daemons for a service with:
+
+.. prompt:: bash #
+
+ ceph orch stop <name>
+ ceph orch start <name>
+ ceph orch restart <name>
+
+
+Redeploying or reconfiguring a daemon
+-------------------------------------
+
+The container for a daemon can be stopped, recreated, and restarted with
+the ``redeploy`` command:
+
+.. prompt:: bash #
+
+ ceph orch daemon redeploy <name> [--image <image>]
+
+A container image name can optionally be provided to force a
+particular image to be used (instead of the image specified by the
+``container_image`` config value).
+
+If only the ceph configuration needs to be regenerated, you can also
+issue a ``reconfig`` command, which will rewrite the ``ceph.conf``
+file but will not trigger a restart of the daemon.
+
+.. prompt:: bash #
+
+ ceph orch daemon reconfig <name>
+
+
+Rotating a daemon's authenticate key
+------------------------------------
+
+All Ceph and gateway daemons in the cluster have a secret key that is used to connect
+to and authenticate with the cluster. This key can be rotated (i.e., replaced with a
+new key) with the following command:
+
+.. prompt:: bash #
+
+ ceph orch daemon rotate-key <name>
+
+For MDS, OSD, and MGR daemons, this does not require a daemon restart. For other
+daemons, however (e.g., RGW), the daemon may be restarted to switch to the new key.
+
+
+Ceph daemon logs
+================
+
+Logging to journald
+-------------------
+
+Ceph daemons traditionally write logs to ``/var/log/ceph``. Ceph daemons log to
+journald by default and Ceph logs are captured by the container runtime
+environment. They are accessible via ``journalctl``.
+
+.. note:: Prior to Quincy, ceph daemons logged to stderr.
+
+Example of logging to journald
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For example, to view the logs for the daemon ``mon.foo`` for a cluster
+with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
+something like:
+
+.. prompt:: bash #
+
+ journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo
+
+This works well for normal operations when logging levels are low.
+
+Logging to files
+----------------
+
+You can also configure Ceph daemons to log to files instead of to
+journald if you prefer logs to appear in files (as they did in earlier,
+pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
+the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you choose to
+configure Ceph to log to files instead of to journald, remember to
+configure Ceph so that it will not log to journald (the commands for
+this are covered below).
+
+Enabling logging to files
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To enable logging to files, run the following commands:
+
+.. prompt:: bash #
+
+ ceph config set global log_to_file true
+ ceph config set global mon_cluster_log_to_file true
+
+Disabling logging to journald
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you choose to log to files, we recommend disabling logging to journald or else
+everything will be logged twice. Run the following commands to disable logging
+to stderr:
+
+.. prompt:: bash #
+
+ ceph config set global log_to_stderr false
+ ceph config set global mon_cluster_log_to_stderr false
+ ceph config set global log_to_journald false
+ ceph config set global mon_cluster_log_to_journald false
+
+.. note:: You can change the default by passing --log-to-file during
+ bootstrapping a new cluster.
+
+Modifying the log retention schedule
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, cephadm sets up log rotation on each host to rotate these
+files. You can configure the logging retention schedule by modifying
+``/etc/logrotate.d/ceph.<cluster-fsid>``.
+
+
+Data location
+=============
+
+Cephadm stores daemon data and logs in different locations than did
+older, pre-cephadm (pre Octopus) versions of ceph:
+
+* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By
+ default, cephadm logs via stderr and the container runtime. These
+ logs will not exist unless you have enabled logging to files as
+ described in `cephadm-logs`_.
+* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
+ (besides logs).
+* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
+ an individual daemon.
+* ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
+ the cluster.
+* ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
+ data directories for stateful daemons (e.g., monitor, prometheus)
+ that have been removed by cephadm.
+
+Disk usage
+----------
+
+Because a few Ceph daemons (notably, the monitors and prometheus) store a
+large amount of data in ``/var/lib/ceph`` , we recommend moving this
+directory to its own disk, partition, or logical volume so that it does not
+fill up the root file system.
+
+
+Health checks
+=============
+The cephadm module provides additional health checks to supplement the
+default health checks provided by the Cluster. These additional health
+checks fall into two categories:
+
+- **cephadm operations**: Health checks in this category are always
+ executed when the cephadm module is active.
+- **cluster configuration**: These health checks are *optional*, and
+ focus on the configuration of the hosts in the cluster.
+
+CEPHADM Operations
+------------------
+
+CEPHADM_PAUSED
+~~~~~~~~~~~~~~
+
+This indicates that cephadm background work has been paused with
+``ceph orch pause``. Cephadm continues to perform passive monitoring
+activities (like checking host and daemon status), but it will not
+make any changes (like deploying or removing daemons).
+
+Resume cephadm work by running the following command:
+
+.. prompt:: bash #
+
+ ceph orch resume
+
+.. _cephadm-stray-host:
+
+CEPHADM_STRAY_HOST
+~~~~~~~~~~~~~~~~~~
+
+This indicates that one or more hosts have Ceph daemons that are
+running, but are not registered as hosts managed by *cephadm*. This
+means that those services cannot currently be managed by cephadm
+(e.g., restarted, upgraded, included in `ceph orch ps`).
+
+* You can manage the host(s) by running the following command:
+
+ .. prompt:: bash #
+
+ ceph orch host add *<hostname>*
+
+ .. note::
+
+ You might need to configure SSH access to the remote host
+ before this will work.
+
+* See :ref:`cephadm-fqdn` for more information about host names and
+ domain names.
+
+* Alternatively, you can manually connect to the host and ensure that
+ services on that host are removed or migrated to a host that is
+ managed by *cephadm*.
+
+* This warning can be disabled entirely by running the following
+ command:
+
+ .. prompt:: bash #
+
+ ceph config set mgr mgr/cephadm/warn_on_stray_hosts false
+
+CEPHADM_STRAY_DAEMON
+~~~~~~~~~~~~~~~~~~~~
+
+One or more Ceph daemons are running but not are not managed by
+*cephadm*. This may be because they were deployed using a different
+tool, or because they were started manually. Those
+services cannot currently be managed by cephadm (e.g., restarted,
+upgraded, or included in `ceph orch ps`).
+
+* If the daemon is a stateful one (monitor or OSD), it should be adopted
+ by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
+ usually easiest to provision a new daemon with the ``ceph orch apply``
+ command and then stop the unmanaged daemon.
+
+* If the stray daemon(s) are running on hosts not managed by cephadm, you can manage the host(s) by running the following command:
+
+ .. prompt:: bash #
+
+ ceph orch host add *<hostname>*
+
+ .. note::
+
+ You might need to configure SSH access to the remote host
+ before this will work.
+
+* See :ref:`cephadm-fqdn` for more information about host names and
+ domain names.
+
+* This warning can be disabled entirely by running the following command:
+
+ .. prompt:: bash #
+
+ ceph config set mgr mgr/cephadm/warn_on_stray_daemons false
+
+CEPHADM_HOST_CHECK_FAILED
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One or more hosts have failed the basic cephadm host check, which verifies
+that (1) the host is reachable and cephadm can be executed there, and (2)
+that the host satisfies basic prerequisites, like a working container
+runtime (podman or docker) and working time synchronization.
+If this test fails, cephadm will no be able to manage services on that host.
+
+You can manually run this check by running the following command:
+
+.. prompt:: bash #
+
+ ceph cephadm check-host *<hostname>*
+
+You can remove a broken host from management by running the following command:
+
+.. prompt:: bash #
+
+ ceph orch host rm *<hostname>*
+
+You can disable this health warning by running the following command:
+
+.. prompt:: bash #
+
+ ceph config set mgr mgr/cephadm/warn_on_failed_host_check false
+
+Cluster Configuration Checks
+----------------------------
+Cephadm periodically scans each of the hosts in the cluster in order
+to understand the state of the OS, disks, NICs etc. These facts can
+then be analysed for consistency across the hosts in the cluster to
+identify any configuration anomalies.
+
+Enabling Cluster Configuration Checks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The configuration checks are an **optional** feature, and are enabled
+by running the following command:
+
+.. prompt:: bash #
+
+ ceph config set mgr mgr/cephadm/config_checks_enabled true
+
+States Returned by Cluster Configuration Checks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The configuration checks are triggered after each host scan (1m). The
+cephadm log entries will show the current state and outcome of the
+configuration checks as follows:
+
+Disabled state (config_checks_enabled false):
+
+.. code-block:: bash
+
+ ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable
+
+Enabled state (config_checks_enabled true):
+
+.. code-block:: bash
+
+ CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected
+
+Managing Configuration Checks (subcommands)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The configuration checks themselves are managed through several cephadm subcommands.
+
+To determine whether the configuration checks are enabled, run the following command:
+
+.. prompt:: bash #
+
+ ceph cephadm config-check status
+
+This command returns the status of the configuration checker as either "Enabled" or "Disabled".
+
+
+To list all the configuration checks and their current states, run the following command:
+
+.. code-block:: console
+
+ # ceph cephadm config-check ls
+
+ NAME HEALTHCHECK STATUS DESCRIPTION
+ kernel_security CEPHADM_CHECK_KERNEL_LSM enabled checks SELINUX/Apparmor profiles are consistent across cluster hosts
+ os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled checks subscription states are consistent for all cluster hosts
+ public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a NIC on the Ceph public_network
+ osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting
+ osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common linkspeed
+ network_missing CEPHADM_CHECK_NETWORK_MISSING enabled checks that the cluster/public networks defined exist on the Ceph hosts
+ ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
+ kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the MAJ.MIN of the kernel on Ceph hosts is consistent
+
+The name of each configuration check can be used to enable or disable a specific check by running a command of the following form:
+:
+
+.. prompt:: bash #
+
+ ceph cephadm config-check disable <name>
+
+For example:
+
+.. prompt:: bash #
+
+ ceph cephadm config-check disable kernel_security
+
+CEPHADM_CHECK_KERNEL_LSM
+~~~~~~~~~~~~~~~~~~~~~~~~
+Each host within the cluster is expected to operate within the same Linux
+Security Module (LSM) state. For example, if the majority of the hosts are
+running with SELINUX in enforcing mode, any host not running in this mode is
+flagged as an anomaly and a healthcheck (WARNING) state raised.
+
+CEPHADM_CHECK_SUBSCRIPTION
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+This check relates to the status of vendor subscription. This check is
+performed only for hosts using RHEL, but helps to confirm that all hosts are
+covered by an active subscription, which ensures that patches and updates are
+available.
+
+CEPHADM_CHECK_PUBLIC_MEMBERSHIP
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+All members of the cluster should have NICs configured on at least one of the
+public network subnets. Hosts that are not on the public network will rely on
+routing, which may affect performance.
+
+CEPHADM_CHECK_MTU
+~~~~~~~~~~~~~~~~~
+The MTU of the NICs on OSDs can be a key factor in consistent performance. This
+check examines hosts that are running OSD services to ensure that the MTU is
+configured consistently within the cluster. This is determined by establishing
+the MTU setting that the majority of hosts is using. Any anomalies result in a
+Ceph health check.
+
+CEPHADM_CHECK_LINKSPEED
+~~~~~~~~~~~~~~~~~~~~~~~
+This check is similar to the MTU check. Linkspeed consistency is a factor in
+consistent cluster performance, just as the MTU of the NICs on the OSDs is.
+This check determines the linkspeed shared by the majority of OSD hosts, and a
+health check is run for any hosts that are set at a lower linkspeed rate.
+
+CEPHADM_CHECK_NETWORK_MISSING
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The `public_network` and `cluster_network` settings support subnet definitions
+for IPv4 and IPv6. If these settings are not found on any host in the cluster,
+a health check is raised.
+
+CEPHADM_CHECK_CEPH_RELEASE
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Under normal operations, the Ceph cluster runs daemons under the same ceph
+release (that is, the Ceph cluster runs all daemons under (for example)
+Octopus). This check determines the active release for each daemon, and
+reports any anomalies as a healthcheck. *This check is bypassed if an upgrade
+process is active within the cluster.*
+
+CEPHADM_CHECK_KERNEL_VERSION
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The OS kernel version (maj.min) is checked for consistency across the hosts.
+The kernel version of the majority of the hosts is used as the basis for
+identifying anomalies.
+
+.. _client_keyrings_and_configs:
+
+Client keyrings and configs
+===========================
+Cephadm can distribute copies of the ``ceph.conf`` file and client keyring
+files to hosts. Starting from versions 16.2.10 (Pacific) and 17.2.1 (Quincy),
+in addition to the default location ``/etc/ceph/`` cephadm also stores config
+and keyring files in the ``/var/lib/ceph/<fsid>/config`` directory. It is usually
+a good idea to store a copy of the config and ``client.admin`` keyring on any host
+used to administer the cluster via the CLI. By default, cephadm does this for any
+nodes that have the ``_admin`` label (which normally includes the bootstrap host).
+
+.. note:: Ceph daemons will still use files on ``/etc/ceph/``. The new configuration
+ location ``/var/lib/ceph/<fsid>/config`` is used by cephadm only. Having this config
+ directory under the fsid helps cephadm to load the configuration associated with
+ the cluster.
+
+
+When a client keyring is placed under management, cephadm will:
+
+ - build a list of target hosts based on the specified placement spec (see
+ :ref:`orchestrator-cli-placement-spec`)
+ - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
+ - store a copy of the ``ceph.conf`` file at ``/var/lib/ceph/<fsid>/config/ceph.conf`` on the specified host(s)
+ - store a copy of the ``ceph.client.admin.keyring`` file at ``/var/lib/ceph/<fsid>/config/ceph.client.admin.keyring`` on the specified host(s)
+ - store a copy of the keyring file on the specified host(s)
+ - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
+ - update the keyring file if the entity's key is changed (e.g., via ``ceph
+ auth ...`` commands)
+ - ensure that the keyring file has the specified ownership and specified mode
+ - remove the keyring file when client keyring management is disabled
+ - remove the keyring file from old hosts if the keyring placement spec is
+ updated (as needed)
+
+Listing Client Keyrings
+-----------------------
+
+To see the list of client keyrings are currently under management, run the following command:
+
+.. prompt:: bash #
+
+ ceph orch client-keyring ls
+
+Putting a Keyring Under Management
+----------------------------------
+
+To put a keyring under management, run a command of the following form:
+
+.. prompt:: bash #
+
+ ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>.<gid>] [--path=<path>]
+
+- By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
+ where Ceph looks by default. Be careful when specifying alternate locations,
+ as existing files may be overwritten.
+- A placement of ``*`` (all hosts) is common.
+- The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).
+
+For example, to create a ``client.rbd`` key and deploy it to hosts with the
+``rbd-client`` label and make it group readable by uid/gid 107 (qemu), run the
+following commands:
+
+.. prompt:: bash #
+
+ ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
+ ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640
+
+The resulting keyring file is:
+
+.. code-block:: console
+
+ -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring
+
+Disabling Management of a Keyring File
+--------------------------------------
+
+To disable management of a keyring file, run a command of the following form:
+
+.. prompt:: bash #
+
+ ceph orch client-keyring rm <entity>
+
+.. note::
+
+ This deletes any keyring files for this entity that were previously written
+ to cluster nodes.
+
+.. _etc_ceph_conf_distribution:
+
+/etc/ceph/ceph.conf
+===================
+
+Distributing ceph.conf to hosts that have no keyrings
+-----------------------------------------------------
+
+It might be useful to distribute ``ceph.conf`` files to hosts without an
+associated client keyring file. By default, cephadm deploys only a
+``ceph.conf`` file to hosts where a client keyring is also distributed (see
+above). To write config files to hosts without client keyrings, run the
+following command:
+
+.. prompt:: bash #
+
+ ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true
+
+Using Placement Specs to specify which hosts get keyrings
+---------------------------------------------------------
+
+By default, the configs are written to all hosts (i.e., those listed by ``ceph
+orch host ls``). To specify which hosts get a ``ceph.conf``, run a command of
+the following form:
+
+.. prompt:: bash #
+
+ ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>
+
+For example, to distribute configs to hosts with the ``bare_config`` label, run
+the following command:
+
+Distributing ceph.conf to hosts tagged with bare_config
+-------------------------------------------------------
+
+For example, to distribute configs to hosts with the ``bare_config`` label, run the following command:
+
+.. prompt:: bash #
+
+ ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config
+
+(See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)
+
+Purging a cluster
+=================
+
+.. danger:: THIS OPERATION WILL DESTROY ALL DATA STORED IN THIS CLUSTER
+
+In order to destroy a cluster and delete all data stored in this cluster, disable
+cephadm to stop all orchestration operations (so we avoid deploying new daemons).
+
+.. prompt:: bash #
+
+ ceph mgr module disable cephadm
+
+Then verify the FSID of the cluster:
+
+.. prompt:: bash #
+
+ ceph fsid
+
+Purge ceph daemons from all hosts in the cluster
+
+.. prompt:: bash #
+
+ # For each host:
+ cephadm rm-cluster --force --zap-osds --fsid <fsid>