From e6918187568dbd01842d8d1d2c808ce16a894239 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sun, 21 Apr 2024 13:54:28 +0200 Subject: Adding upstream version 18.2.2. Signed-off-by: Daniel Baumann --- doc/rados/configuration/auth-config-ref.rst | 379 ++++++++++++ doc/rados/configuration/bluestore-config-ref.rst | 552 +++++++++++++++++ doc/rados/configuration/ceph-conf.rst | 715 +++++++++++++++++++++++ doc/rados/configuration/common.rst | 207 +++++++ doc/rados/configuration/demo-ceph.conf | 31 + doc/rados/configuration/filestore-config-ref.rst | 377 ++++++++++++ doc/rados/configuration/general-config-ref.rst | 19 + doc/rados/configuration/index.rst | 53 ++ doc/rados/configuration/journal-ref.rst | 39 ++ doc/rados/configuration/mclock-config-ref.rst | 699 ++++++++++++++++++++++ doc/rados/configuration/mon-config-ref.rst | 642 ++++++++++++++++++++ doc/rados/configuration/mon-lookup-dns.rst | 58 ++ doc/rados/configuration/mon-osd-interaction.rst | 245 ++++++++ doc/rados/configuration/msgr2.rst | 257 ++++++++ doc/rados/configuration/network-config-ref.rst | 355 +++++++++++ doc/rados/configuration/osd-config-ref.rst | 445 ++++++++++++++ doc/rados/configuration/pool-pg-config-ref.rst | 46 ++ doc/rados/configuration/pool-pg.conf | 21 + doc/rados/configuration/storage-devices.rst | 93 +++ 19 files changed, 5233 insertions(+) create mode 100644 doc/rados/configuration/auth-config-ref.rst create mode 100644 doc/rados/configuration/bluestore-config-ref.rst create mode 100644 doc/rados/configuration/ceph-conf.rst create mode 100644 doc/rados/configuration/common.rst create mode 100644 doc/rados/configuration/demo-ceph.conf create mode 100644 doc/rados/configuration/filestore-config-ref.rst create mode 100644 doc/rados/configuration/general-config-ref.rst create mode 100644 doc/rados/configuration/index.rst create mode 100644 doc/rados/configuration/journal-ref.rst create mode 100644 doc/rados/configuration/mclock-config-ref.rst create mode 100644 
doc/rados/configuration/mon-config-ref.rst create mode 100644 doc/rados/configuration/mon-lookup-dns.rst create mode 100644 doc/rados/configuration/mon-osd-interaction.rst create mode 100644 doc/rados/configuration/msgr2.rst create mode 100644 doc/rados/configuration/network-config-ref.rst create mode 100644 doc/rados/configuration/osd-config-ref.rst create mode 100644 doc/rados/configuration/pool-pg-config-ref.rst create mode 100644 doc/rados/configuration/pool-pg.conf create mode 100644 doc/rados/configuration/storage-devices.rst (limited to 'doc/rados/configuration') diff --git a/doc/rados/configuration/auth-config-ref.rst b/doc/rados/configuration/auth-config-ref.rst new file mode 100644 index 000000000..fc14f4ee6 --- /dev/null +++ b/doc/rados/configuration/auth-config-ref.rst @@ -0,0 +1,379 @@ +.. _rados-cephx-config-ref: + +======================== + CephX Config Reference +======================== + +The CephX protocol is enabled by default. The cryptographic authentication that +CephX provides has some computational costs, though they should generally be +quite low. If the network environment connecting your client and server hosts +is very safe and you cannot afford authentication, you can disable it. +**Disabling authentication is not generally recommended**. + +.. note:: If you disable authentication, you will be at risk of a + man-in-the-middle attack that alters your client/server messages, which + could have disastrous security effects. + +For information about creating users, see `User Management`_. For details on +the architecture of CephX, see `Architecture - High Availability +Authentication`_. + + +Deployment Scenarios +==================== + +How you initially configure CephX depends on your scenario. There are two +common strategies for deploying a Ceph cluster. If you are a first-time Ceph +user, you should probably take the easiest approach: using ``cephadm`` to +deploy a cluster. 
But if your cluster uses other deployment tools (for example, +Ansible, Chef, Juju, or Puppet), you will need either to use the manual +deployment procedures or to configure your deployment tool so that it will +bootstrap your monitor(s). + +Manual Deployment +----------------- + +When you deploy a cluster manually, it is necessary to bootstrap the monitors +manually and to create the ``client.admin`` user and keyring. To bootstrap +monitors, follow the steps in `Monitor Bootstrapping`_. Follow these steps when +using third-party deployment tools (for example, Chef, Puppet, and Juju). + + +Enabling/Disabling CephX +======================== + +Enabling CephX is possible only if the keys for your monitors, OSDs, and +metadata servers have already been deployed. If you are simply toggling CephX +on or off, it is not necessary to repeat the bootstrapping procedures. + +Enabling CephX +-------------- + +When CephX is enabled, Ceph will look for the keyring in the default search +path: this path includes ``/etc/ceph/$cluster.$name.keyring``. It is possible +to override this search-path location by adding a ``keyring`` option in the +``[global]`` section of your `Ceph configuration`_ file, but this is not +recommended. + +To enable CephX on a cluster for which authentication has been disabled, carry +out the following procedure. If you (or your deployment utility) have already +generated the keys, you may skip the steps related to generating keys. + +#. Create a ``client.admin`` key, and save a copy of the key for your client + host: + + .. prompt:: bash $ + + ceph auth get-or-create client.admin mon 'allow *' mds 'allow *' mgr 'allow *' osd 'allow *' -o /etc/ceph/ceph.client.admin.keyring + + **Warning:** This step will clobber any existing + ``/etc/ceph/client.admin.keyring`` file. Do not perform this step if a + deployment tool has already generated a keyring file for you. Be careful! + +#. Create a monitor keyring and generate a monitor secret key: + + .. 
prompt:: bash $ + + ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *' + +#. For each monitor, copy the monitor keyring into a ``ceph.mon.keyring`` file + in the monitor's ``mon data`` directory. For example, to copy the monitor + keyring to ``mon.a`` in a cluster called ``ceph``, run the following + command: + + .. prompt:: bash $ + + cp /tmp/ceph.mon.keyring /var/lib/ceph/mon/ceph-a/keyring + +#. Generate a secret key for every MGR, where ``{$id}`` is the MGR letter: + + .. prompt:: bash $ + + ceph auth get-or-create mgr.{$id} mon 'allow profile mgr' mds 'allow *' osd 'allow *' -o /var/lib/ceph/mgr/ceph-{$id}/keyring + +#. Generate a secret key for every OSD, where ``{$id}`` is the OSD number: + + .. prompt:: bash $ + + ceph auth get-or-create osd.{$id} mon 'allow rwx' osd 'allow *' -o /var/lib/ceph/osd/ceph-{$id}/keyring + +#. Generate a secret key for every MDS, where ``{$id}`` is the MDS letter: + + .. prompt:: bash $ + + ceph auth get-or-create mds.{$id} mon 'allow rwx' osd 'allow *' mds 'allow *' mgr 'allow profile mds' -o /var/lib/ceph/mds/ceph-{$id}/keyring + +#. Enable CephX authentication by setting the following options in the + ``[global]`` section of your `Ceph configuration`_ file: + + .. code-block:: ini + + auth_cluster_required = cephx + auth_service_required = cephx + auth_client_required = cephx + +#. Start or restart the Ceph cluster. For details, see `Operating a Cluster`_. + +For details on bootstrapping a monitor manually, see `Manual Deployment`_. + + + +Disabling CephX +--------------- + +The following procedure describes how to disable CephX. If your cluster +environment is safe, you might want to disable CephX in order to offset the +computational expense of running authentication. **We do not recommend doing +so.** However, setup and troubleshooting might be easier if authentication is +temporarily disabled and subsequently re-enabled. + +#. 
Disable CephX authentication by setting the following options in the + ``[global]`` section of your `Ceph configuration`_ file: + + .. code-block:: ini + + auth_cluster_required = none + auth_service_required = none + auth_client_required = none + +#. Start or restart the Ceph cluster. For details, see `Operating a Cluster`_. + + +Configuration Settings +====================== + +Enablement +---------- + + +``auth_cluster_required`` + +:Description: If this configuration setting is enabled, the Ceph Storage + Cluster daemons (that is, ``ceph-mon``, ``ceph-osd``, + ``ceph-mds``, and ``ceph-mgr``) are required to authenticate with + each other. Valid settings are ``cephx`` or ``none``. + +:Type: String +:Required: No +:Default: ``cephx``. + + +``auth_service_required`` + +:Description: If this configuration setting is enabled, then Ceph clients can + access Ceph services only if those clients authenticate with the + Ceph Storage Cluster. Valid settings are ``cephx`` or ``none``. + +:Type: String +:Required: No +:Default: ``cephx``. + + +``auth_client_required`` + +:Description: If this configuration setting is enabled, then communication + between the Ceph client and Ceph Storage Cluster can be + established only if the Ceph Storage Cluster authenticates + against the Ceph client. Valid settings are ``cephx`` or + ``none``. + +:Type: String +:Required: No +:Default: ``cephx``. + + +.. index:: keys; keyring + +Keys +---- + +When Ceph is run with authentication enabled, ``ceph`` administrative commands +and Ceph clients can access the Ceph Storage Cluster only if they use +authentication keys. + +The most common way to make these keys available to ``ceph`` administrative +commands and Ceph clients is to include a Ceph keyring under the ``/etc/ceph`` +directory. For Octopus and later releases that use ``cephadm``, the filename is +usually ``ceph.client.admin.keyring``. 
If the keyring is included in the +``/etc/ceph`` directory, then it is unnecessary to specify a ``keyring`` entry +in the Ceph configuration file. + +Because the Ceph Storage Cluster's keyring file contains the ``client.admin`` +key, we recommend copying the keyring file to nodes from which you run +administrative commands. + +To perform this step manually, run the following command: + +.. prompt:: bash $ + + sudo scp {user}@{ceph-cluster-host}:/etc/ceph/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring + +.. tip:: Make sure that the ``ceph.keyring`` file has appropriate permissions + (for example, ``chmod 644``) set on your client machine. + +You can specify the key itself by using the ``key`` setting in the Ceph +configuration file (this approach is not recommended), or instead specify a +path to a keyfile by using the ``keyfile`` setting in the Ceph configuration +file. + +``keyring`` + +:Description: The path to the keyring file. +:Type: String +:Required: No +:Default: ``/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin`` + + +``keyfile`` + +:Description: The path to a keyfile (that is, a file containing only the key). +:Type: String +:Required: No +:Default: None + + +``key`` + +:Description: The key (that is, the text string of the key itself). We do not + recommend that you use this setting unless you know what you're + doing. +:Type: String +:Required: No +:Default: None + + +Daemon Keyrings +--------------- + +Administrative users or deployment tools (for example, ``cephadm``) generate +daemon keyrings in the same way that they generate user keyrings. By default, +Ceph stores the keyring of a daemon inside that daemon's data directory. The +default keyring locations and the capabilities that are necessary for the +daemon to function are shown below. 
+ +``ceph-mon`` + +:Location: ``$mon_data/keyring`` +:Capabilities: ``mon 'allow *'`` + +``ceph-osd`` + +:Location: ``$osd_data/keyring`` +:Capabilities: ``mgr 'allow profile osd' mon 'allow profile osd' osd 'allow *'`` + +``ceph-mds`` + +:Location: ``$mds_data/keyring`` +:Capabilities: ``mds 'allow' mgr 'allow profile mds' mon 'allow profile mds' osd 'allow rwx'`` + +``ceph-mgr`` + +:Location: ``$mgr_data/keyring`` +:Capabilities: ``mon 'allow profile mgr' mds 'allow *' osd 'allow *'`` + +``radosgw`` + +:Location: ``$rgw_data/keyring`` +:Capabilities: ``mon 'allow rwx' osd 'allow rwx'`` + + +.. note:: The monitor keyring (that is, ``mon.``) contains a key but no + capabilities, and this keyring is not part of the cluster ``auth`` database. + +The daemon's data-directory locations default to directories of the form:: + + /var/lib/ceph/$type/$cluster-$id + +For example, ``osd.12`` would have the following data directory:: + + /var/lib/ceph/osd/ceph-12 + +It is possible to override these locations, but it is not recommended. + + +.. index:: signatures + +Signatures +---------- + +Ceph performs a signature check that provides some limited protection against +messages being tampered with in flight (for example, by a "man in the middle" +attack). + +As with other parts of Ceph authentication, signatures admit of fine-grained +control. You can enable or disable signatures for service messages between +clients and Ceph, and for messages between Ceph daemons. + +Note that even when signatures are enabled data is not encrypted in flight. + +``cephx_require_signatures`` + +:Description: If this configuration setting is set to ``true``, Ceph requires + signatures on all message traffic between the Ceph client and the + Ceph Storage Cluster, and between daemons within the Ceph Storage + Cluster. + +.. 
note:: + **ANTIQUATED NOTE:** + + Neither Ceph Argonaut nor Linux kernel versions prior to 3.19 + support signatures; if one of these clients is in use, ``cephx_require_signatures`` + can be disabled in order to allow the client to connect. + + +:Type: Boolean +:Required: No +:Default: ``false`` + + +``cephx_cluster_require_signatures`` + +:Description: If this configuration setting is set to ``true``, Ceph requires + signatures on all message traffic between Ceph daemons within the + Ceph Storage Cluster. + +:Type: Boolean +:Required: No +:Default: ``false`` + + +``cephx_service_require_signatures`` + +:Description: If this configuration setting is set to ``true``, Ceph requires + signatures on all message traffic between Ceph clients and the + Ceph Storage Cluster. + +:Type: Boolean +:Required: No +:Default: ``false`` + + +``cephx_sign_messages`` + +:Description: If this configuration setting is set to ``true``, and if the Ceph + version supports message signing, then Ceph will sign all + messages so that they are more difficult to spoof. + +:Type: Boolean +:Default: ``true`` + + +Time to Live +------------ + +``auth_service_ticket_ttl`` + +:Description: When the Ceph Storage Cluster sends a ticket for authentication + to a Ceph client, the Ceph Storage Cluster assigns that ticket a + Time To Live (TTL). + +:Type: Double +:Default: ``60*60`` + + +.. _Monitor Bootstrapping: ../../../install/manual-deployment#monitor-bootstrapping +.. _Operating a Cluster: ../../operations/operating +.. _Manual Deployment: ../../../install/manual-deployment +.. _Ceph configuration: ../ceph-conf +.. _Architecture - High Availability Authentication: ../../../architecture#high-availability-authentication +.. 
_User Management: ../../operations/user-management diff --git a/doc/rados/configuration/bluestore-config-ref.rst b/doc/rados/configuration/bluestore-config-ref.rst new file mode 100644 index 000000000..3707be1aa --- /dev/null +++ b/doc/rados/configuration/bluestore-config-ref.rst @@ -0,0 +1,552 @@ +================================== + BlueStore Configuration Reference +================================== + +Devices +======= + +BlueStore manages either one, two, or in certain cases three storage devices. +These *devices* are "devices" in the Linux/Unix sense. This means that they are +assets listed under ``/dev`` or ``/devices``. Each of these devices may be an +entire storage drive, or a partition of a storage drive, or a logical volume. +BlueStore does not create or mount a conventional file system on devices that +it uses; BlueStore reads and writes to the devices directly in a "raw" fashion. + +In the simplest case, BlueStore consumes all of a single storage device. This +device is known as the *primary device*. The primary device is identified by +the ``block`` symlink in the data directory. + +The data directory is a ``tmpfs`` mount. When this data directory is booted or +activated by ``ceph-volume``, it is populated with metadata files and links +that hold information about the OSD: for example, the OSD's identifier, the +name of the cluster that the OSD belongs to, and the OSD's private keyring. + +In more complicated cases, BlueStore is deployed across one or two additional +devices: + +* A *write-ahead log (WAL) device* (identified as ``block.wal`` in the data + directory) can be used to separate out BlueStore's internal journal or + write-ahead log. Using a WAL device is advantageous only if the WAL device + is faster than the primary device (for example, if the WAL device is an SSD + and the primary device is an HDD). +* A *DB device* (identified as ``block.db`` in the data directory) can be used + to store BlueStore's internal metadata. 
BlueStore (or more precisely, the + embedded RocksDB) will put as much metadata as it can on the DB device in + order to improve performance. If the DB device becomes full, metadata will + spill back onto the primary device (where it would have been located in the + absence of the DB device). Again, it is advantageous to provision a DB device + only if it is faster than the primary device. + +If there is only a small amount of fast storage available (for example, less +than a gigabyte), we recommend using the available space as a WAL device. But +if more fast storage is available, it makes more sense to provision a DB +device. Because the BlueStore journal is always placed on the fastest device +available, using a DB device provides the same benefit that using a WAL device +would, while *also* allowing additional metadata to be stored off the primary +device (provided that it fits). DB devices make this possible because whenever +a DB device is specified but an explicit WAL device is not, the WAL will be +implicitly colocated with the DB on the faster device. + +To provision a single-device (colocated) BlueStore OSD, run the following +command: + +.. prompt:: bash $ + + ceph-volume lvm prepare --bluestore --data <device> + +To specify a WAL device or DB device, run the following command: + +.. prompt:: bash $ + + ceph-volume lvm prepare --bluestore --data <device> --block.wal <wal-device> --block.db <db-device> + +.. note:: The option ``--data`` can take as its argument any of the + following devices: logical volumes specified using *vg/lv* notation, + existing logical volumes, and GPT partitions. + + + +Provisioning strategies +----------------------- + +BlueStore differs from Filestore in that there are several ways to deploy a +BlueStore OSD. However, the overall deployment strategy for BlueStore can be +clarified by examining just these two common arrangements: + +.. 
_bluestore-single-type-device-config: + +**block (data) only** +^^^^^^^^^^^^^^^^^^^^^ +If all devices are of the same type (for example, they are all HDDs), and if +there are no fast devices available for the storage of metadata, then it makes +sense to specify the block device only and to leave ``block.db`` and +``block.wal`` unseparated. The :ref:`ceph-volume-lvm` command for a single +``/dev/sda`` device is as follows: + +.. prompt:: bash $ + + ceph-volume lvm create --bluestore --data /dev/sda + +If the devices to be used for a BlueStore OSD are pre-created logical volumes, +then the :ref:`ceph-volume-lvm` call for a logical volume named +``ceph-vg/block-lv`` is as follows: + +.. prompt:: bash $ + + ceph-volume lvm create --bluestore --data ceph-vg/block-lv + +.. _bluestore-mixed-device-config: + +**block and block.db** +^^^^^^^^^^^^^^^^^^^^^^ + +If you have a mix of fast and slow devices (for example, SSD or HDD), then we +recommend placing ``block.db`` on the faster device while ``block`` (that is, +the data) is stored on the slower device (that is, the rotational drive). + +You must create these volume groups and logical volumes manually, as the +``ceph-volume`` tool is currently unable to create them automatically. + +The following procedure illustrates the manual creation of volume groups and +logical volumes. For this example, we shall assume four rotational drives +(``sda``, ``sdb``, ``sdc``, and ``sdd``) and one (fast) SSD (``sdx``). First, +to create the volume groups, run the following commands: + +.. prompt:: bash $ + + vgcreate ceph-block-0 /dev/sda + vgcreate ceph-block-1 /dev/sdb + vgcreate ceph-block-2 /dev/sdc + vgcreate ceph-block-3 /dev/sdd + +Next, to create the logical volumes for ``block``, run the following commands: + +.. 
prompt:: bash $ + + lvcreate -l 100%FREE -n block-0 ceph-block-0 + lvcreate -l 100%FREE -n block-1 ceph-block-1 + lvcreate -l 100%FREE -n block-2 ceph-block-2 + lvcreate -l 100%FREE -n block-3 ceph-block-3 + +Because there are four HDDs, there will be four OSDs. Supposing that there is a +200GB SSD in ``/dev/sdx``, we can create four 50GB logical volumes by running +the following commands: + +.. prompt:: bash $ + + vgcreate ceph-db-0 /dev/sdx + lvcreate -L 50GB -n db-0 ceph-db-0 + lvcreate -L 50GB -n db-1 ceph-db-0 + lvcreate -L 50GB -n db-2 ceph-db-0 + lvcreate -L 50GB -n db-3 ceph-db-0 + +Finally, to create the four OSDs, run the following commands: + +.. prompt:: bash $ + + ceph-volume lvm create --bluestore --data ceph-block-0/block-0 --block.db ceph-db-0/db-0 + ceph-volume lvm create --bluestore --data ceph-block-1/block-1 --block.db ceph-db-0/db-1 + ceph-volume lvm create --bluestore --data ceph-block-2/block-2 --block.db ceph-db-0/db-2 + ceph-volume lvm create --bluestore --data ceph-block-3/block-3 --block.db ceph-db-0/db-3 + +After this procedure is finished, there should be four OSDs, ``block`` should +be on the four HDDs, and each HDD should have a 50GB logical volume +(specifically, a DB device) on the shared SSD. + +Sizing +====== +When using a :ref:`mixed spinning-and-solid-drive setup +<bluestore-mixed-device-config>`, it is important to make a large enough +``block.db`` logical volume for BlueStore. The logical volumes associated with +``block.db`` should be *as large as possible*. + +It is generally recommended that the size of ``block.db`` be somewhere between +1% and 4% of the size of ``block``. For RGW workloads, it is recommended that +the ``block.db`` be at least 4% of the ``block`` size, because RGW makes heavy +use of ``block.db`` to store metadata (in particular, omap keys). For example, +if the ``block`` size is 1TB, then ``block.db`` should have a size of at least +40GB. 
For RBD workloads, however, ``block.db`` usually needs no more than 1% to +2% of the ``block`` size. + +In older releases, internal level sizes are such that the DB can fully utilize +only those specific partition / logical volume sizes that correspond to sums of +L0, L0+L1, L1+L2, and so on--that is, given default settings, sizes of roughly +3GB, 30GB, 300GB, and so on. Most deployments do not substantially benefit from +sizing that accommodates L3 and higher, though DB compaction can be facilitated +by doubling these figures to 6GB, 60GB, and 600GB. + +Improvements in Nautilus 14.2.12, Octopus 15.2.6, and subsequent releases allow +for better utilization of arbitrarily-sized DB devices. Moreover, the Pacific +release brings experimental dynamic-level support. Because of these advances, +users of older releases might want to plan ahead by provisioning larger DB +devices today so that the benefits of scale can be realized when upgrades are +made in the future. + +When *not* using a mix of fast and slow devices, there is no requirement to +create separate logical volumes for ``block.db`` or ``block.wal``. BlueStore +will automatically colocate these devices within the space of ``block``. + +Automatic Cache Sizing +====================== + +BlueStore can be configured to automatically resize its caches, provided that +certain conditions are met: TCMalloc must be configured as the memory allocator +and the ``bluestore_cache_autotune`` configuration option must be enabled (note +that it is currently enabled by default). When automatic cache sizing is in +effect, BlueStore attempts to keep OSD heap-memory usage under a certain target +size (as determined by ``osd_memory_target``). This approach makes use of a +best-effort algorithm and caches do not shrink smaller than the size defined by +the value of ``osd_memory_cache_min``. Cache ratios are selected in accordance +with a hierarchy of priorities. 
But if priority information is not available, +the values specified in the ``bluestore_cache_meta_ratio`` and +``bluestore_cache_kv_ratio`` options are used as fallback cache ratios. + +.. confval:: bluestore_cache_autotune +.. confval:: osd_memory_target +.. confval:: bluestore_cache_autotune_interval +.. confval:: osd_memory_base +.. confval:: osd_memory_expected_fragmentation +.. confval:: osd_memory_cache_min +.. confval:: osd_memory_cache_resize_interval + + +Manual Cache Sizing +=================== + +The amount of memory consumed by each OSD to be used for its BlueStore cache is +determined by the ``bluestore_cache_size`` configuration option. If that option +has not been specified (that is, if it remains at 0), then Ceph uses a +different configuration option to determine the default memory budget: +``bluestore_cache_size_hdd`` if the primary device is an HDD, or +``bluestore_cache_size_ssd`` if the primary device is an SSD. + +BlueStore and the rest of the Ceph OSD daemon make every effort to work within +this memory budget. Note that in addition to the configured cache size, there +is also memory consumed by the OSD itself. There is additional utilization due +to memory fragmentation and other allocator overhead. + +The configured cache-memory budget can be used to store the following types of +things: + +* Key/Value metadata (that is, RocksDB's internal cache) +* BlueStore metadata +* BlueStore data (that is, recently read or recently written object data) + +Cache memory usage is governed by the configuration options +``bluestore_cache_meta_ratio`` and ``bluestore_cache_kv_ratio``. The fraction +of the cache that is reserved for data is governed by both the effective +BlueStore cache size (which depends on the relevant +``bluestore_cache_size[_ssd|_hdd]`` option and the device class of the primary +device) and the "meta" and "kv" ratios. 
This data fraction can be calculated +with the following formula: ``<effective_cache_size> * (1 - +bluestore_cache_meta_ratio - bluestore_cache_kv_ratio)``. + +.. confval:: bluestore_cache_size +.. confval:: bluestore_cache_size_hdd +.. confval:: bluestore_cache_size_ssd +.. confval:: bluestore_cache_meta_ratio +.. confval:: bluestore_cache_kv_ratio + +Checksums +========= + +BlueStore checksums all metadata and all data written to disk. Metadata +checksumming is handled by RocksDB and uses the `crc32c` algorithm. By +contrast, data checksumming is handled by BlueStore and can use either +`crc32c`, `xxhash32`, or `xxhash64`. Nonetheless, `crc32c` is the default +checksum algorithm and it is suitable for most purposes. + +Full data checksumming increases the amount of metadata that BlueStore must +store and manage. Whenever possible (for example, when clients hint that data +is written and read sequentially), BlueStore will checksum larger blocks. In +many cases, however, it must store a checksum value (usually 4 bytes) for every +4 KB block of data. + +It is possible to obtain a smaller checksum value by truncating the checksum to +one or two bytes and reducing the metadata overhead. A drawback of this +approach is that it increases the probability of a random error going +undetected: about one in four billion given a 32-bit (4 byte) checksum, 1 in +65,536 given a 16-bit (2 byte) checksum, and 1 in 256 given an 8-bit (1 byte) +checksum. To use the smaller checksum values, select `crc32c_16` or `crc32c_8` +as the checksum algorithm. + +The *checksum algorithm* can be specified either via a per-pool ``csum_type`` +configuration option or via the global configuration option. For example: + +.. prompt:: bash $ + + ceph osd pool set <pool-name> csum_type <algorithm> + +.. confval:: bluestore_csum_type + +Inline Compression +================== + +BlueStore supports inline compression using `snappy`, `zlib`, `lz4`, or `zstd`. 
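As a quick, concrete illustration (the pool name ``mypool`` and the choice of ``zstd`` are assumptions for this sketch, not recommendations), per-pool compression might be enabled like this:

```shell
# Hypothetical sketch: enable zstd compression on a pool named "mypool".
# Both the pool name and the algorithm choice are illustrative assumptions;
# these commands require a running Ceph cluster with admin credentials.
ceph osd pool set mypool compression_algorithm zstd
ceph osd pool set mypool compression_mode aggressive
```

The available compression modes, and the per-pool properties they interact with, are described in the rest of this section.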
+ +Whether data in BlueStore is compressed is determined by two factors: (1) the +*compression mode* and (2) any client hints associated with a write operation. +The compression modes are as follows: + +* **none**: Never compress data. +* **passive**: Do not compress data unless the write operation has a + *compressible* hint set. +* **aggressive**: Do compress data unless the write operation has an + *incompressible* hint set. +* **force**: Try to compress data no matter what. + +For more information about the *compressible* and *incompressible* I/O hints, +see :c:func:`rados_set_alloc_hint`. + +Note that data in Bluestore will be compressed only if the data chunk will be +sufficiently reduced in size (as determined by the ``bluestore compression +required ratio`` setting). No matter which compression modes have been used, if +the compressed chunk is too big, then it will be discarded and the original +(uncompressed) data will be stored instead. For example, if ``bluestore +compression required ratio`` is set to ``.7``, then data compression will take +place only if the size of the compressed data is no more than 70% of the size +of the original data. + +The *compression mode*, *compression algorithm*, *compression required ratio*, +*min blob size*, and *max blob size* settings can be specified either via a +per-pool property or via a global config option. To specify pool properties, +run the following commands: + +.. prompt:: bash $ + + ceph osd pool set <pool-name> compression_algorithm <algorithm> + ceph osd pool set <pool-name> compression_mode <mode> + ceph osd pool set <pool-name> compression_required_ratio <ratio> + ceph osd pool set <pool-name> compression_min_blob_size <size> + ceph osd pool set <pool-name> compression_max_blob_size <size> + +.. confval:: bluestore_compression_algorithm +.. confval:: bluestore_compression_mode +.. confval:: bluestore_compression_required_ratio +.. confval:: bluestore_compression_min_blob_size +.. confval:: bluestore_compression_min_blob_size_hdd +.. confval:: bluestore_compression_min_blob_size_ssd +.. 
confval:: bluestore_compression_max_blob_size +.. confval:: bluestore_compression_max_blob_size_hdd +.. confval:: bluestore_compression_max_blob_size_ssd + +.. _bluestore-rocksdb-sharding: + +RocksDB Sharding +================ + +BlueStore maintains several types of internal key-value data, all of which are +stored in RocksDB. Each data type in BlueStore is assigned a unique prefix. +Prior to the Pacific release, all key-value data was stored in a single RocksDB +column family: 'default'. In Pacific and later releases, however, BlueStore can +divide key-value data into several RocksDB column families. BlueStore achieves +better caching and more precise compaction when keys are similar: specifically, +when keys have similar access frequency, similar modification frequency, and a +similar lifetime. Under such conditions, performance is improved and less disk +space is required during compaction (because each column family is smaller and +is able to compact independently of the others). + +OSDs deployed in Pacific or later releases use RocksDB sharding by default. +However, if Ceph has been upgraded to Pacific or a later version from a +previous version, sharding is disabled on any OSDs that were created before +Pacific. + +To enable sharding and apply the Pacific defaults to a specific OSD, stop the +OSD and run the following command: + + .. prompt:: bash # + + ceph-bluestore-tool \ + --path <data path> \ + --sharding="m(3) p(3,0-12) o(3,0-13)=block_cache={type=binned_lru} l p" \ + reshard + +.. confval:: bluestore_rocksdb_cf +.. confval:: bluestore_rocksdb_cfs + +Throttling +========== + +.. confval:: bluestore_throttle_bytes +.. confval:: bluestore_throttle_deferred_bytes +.. confval:: bluestore_throttle_cost_per_io +.. confval:: bluestore_throttle_cost_per_io_hdd +.. confval:: bluestore_throttle_cost_per_io_ssd + +SPDK Usage +========== + +To use the SPDK driver for NVMe devices, you must first prepare your system. +See `SPDK document`__. + +.. 
__: http://www.spdk.io/doc/getting_started.html#getting_started_examples
+
+SPDK offers a script that will configure the device automatically. Run this
+script with root permissions:
+
+.. prompt:: bash $
+
+   sudo src/spdk/scripts/setup.sh
+
+You will need to specify the subject NVMe device's device selector with the
+"spdk:" prefix for ``bluestore_block_path``.
+
+In the following example, you first find the device selector of an Intel NVMe
+SSD by running the following command:
+
+.. prompt:: bash $
+
+   lspci -mm -n -D -d 8086:0953
+
+The form of the device selector is either ``DDDD:BB:DD.FF`` or
+``DDDD.BB.DD.FF``.
+
+Next, supposing that ``0000:01:00.0`` is the device selector found in the
+output of the ``lspci`` command, you can specify the device selector by running
+the following command::
+
+   bluestore_block_path = "spdk:trtype:pcie traddr:0000:01:00.0"
+
+You may also specify a remote NVMeoF target over the TCP transport, as in the
+following example::
+
+   bluestore_block_path = "spdk:trtype:tcp traddr:10.67.110.197 trsvcid:4420 subnqn:nqn.2019-02.io.spdk:cnode1"
+
+To run multiple SPDK instances per node, you must make sure each instance uses
+its own DPDK memory by specifying for each instance the amount of DPDK memory
+(in MB) that the instance will use.
+
+In most cases, a single device can be used for data, DB, and WAL. We describe
+this strategy as *colocating* these components. Be sure to enter the below
+settings to ensure that all I/Os are issued through SPDK::
+
+   bluestore_block_db_path = ""
+   bluestore_block_db_size = 0
+   bluestore_block_wal_path = ""
+   bluestore_block_wal_size = 0
+
+If these settings are not entered, then the current implementation will
+populate the SPDK map files with kernel file system symbols and will use the
+kernel driver to issue DB/WAL I/Os.
+
+Minimum Allocation Size
+=======================
+
+There is a configured minimum amount of storage that BlueStore allocates on an
+underlying storage device. 
In practice, this is the least amount of capacity
+that even a tiny RADOS object can consume on each OSD's primary device. The
+configuration option in question--:confval:`bluestore_min_alloc_size`--derives
+its value from the value of either :confval:`bluestore_min_alloc_size_hdd` or
+:confval:`bluestore_min_alloc_size_ssd`, depending on the OSD's ``rotational``
+attribute. Thus if an OSD is created on an HDD, BlueStore is initialized with
+the current value of :confval:`bluestore_min_alloc_size_hdd`; but with SSD OSDs
+(including NVMe devices), BlueStore is initialized with the current value of
+:confval:`bluestore_min_alloc_size_ssd`.
+
+In Mimic and earlier releases, the default values were 64KB for rotational
+media (HDD) and 16KB for non-rotational media (SSD). The Octopus release
+changed the default value for non-rotational media (SSD) to 4KB, and the
+Pacific release changed the default value for rotational media (HDD) to 4KB.
+
+These changes were driven by space amplification that was experienced by Ceph
+RADOS Gateway (RGW) deployments that hosted large numbers of small files
+(S3/Swift objects).
+
+For example, when an RGW client stores a 1 KB S3 object, that object is written
+to a single RADOS object. In accordance with the default
+:confval:`min_alloc_size` value, 4 KB of underlying drive space is allocated.
+This means that roughly 3 KB (that is, 4 KB minus 1 KB) is allocated but never
+used: this corresponds to 300% overhead or 25% efficiency. Similarly, a 5 KB
+user object consumes two 4 KB allocation units, with the result that 3 KB of
+device capacity (4 KB minus the 1 KB remainder) is stranded. In this case,
+however, the overhead percentage is much smaller. Think of this in terms
+of the remainder from a modulus operation. The overhead *percentage* thus
+decreases rapidly as object size increases. 
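The modulus arithmetic described above can be sketched in a few lines of Python. This is an illustrative helper only, not Ceph code; the function names are invented for this sketch:

```python
def stranded_bytes(object_size: int, min_alloc: int = 4096) -> int:
    """Bytes allocated but never used for one stored object, per replica."""
    remainder = object_size % min_alloc
    return (min_alloc - remainder) % min_alloc

def overhead_pct(object_size: int, min_alloc: int = 4096) -> float:
    """Stranded capacity as a percentage of the object's logical size."""
    return 100.0 * stranded_bytes(object_size, min_alloc) / object_size

print(stranded_bytes(1024))       # 3072 -- a 1 KB object strands ~3 KB
print(overhead_pct(1024))         # 300.0 (% overhead)
print(stranded_bytes(5 * 1024))   # 3072 -- same remainder, smaller percentage
print(overhead_pct(1024 * 1024))  # 0.0 -- a 1 MiB object aligns exactly
```

Running it with a larger ``min_alloc`` (say 65536, the old HDD default) shows why the pre-Pacific defaults amplified small-object workloads so severely.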
+ +There is an additional subtlety that is easily missed: the amplification +phenomenon just described takes place for *each* replica. For example, when +using the default of three copies of data (3R), a 1 KB S3 object actually +strands roughly 9 KB of storage device capacity. If erasure coding (EC) is used +instead of replication, the amplification might be even higher: for a ``k=4, +m=2`` pool, our 1 KB S3 object allocates 24 KB (that is, 4 KB multiplied by 6) +of device capacity. + +When an RGW bucket pool contains many relatively large user objects, the effect +of this phenomenon is often negligible. However, with deployments that can +expect a significant fraction of relatively small user objects, the effect +should be taken into consideration. + +The 4KB default value aligns well with conventional HDD and SSD devices. +However, certain novel coarse-IU (Indirection Unit) QLC SSDs perform and wear +best when :confval:`bluestore_min_alloc_size_ssd` is specified at OSD creation +to match the device's IU: this might be 8KB, 16KB, or even 64KB. These novel +storage drives can achieve read performance that is competitive with that of +conventional TLC SSDs and write performance that is faster than that of HDDs, +with higher density and lower cost than TLC SSDs. + +Note that when creating OSDs on these novel devices, one must be careful to +apply the non-default value only to appropriate devices, and not to +conventional HDD and SSD devices. Error can be avoided through careful ordering +of OSD creation, with custom OSD device classes, and especially by the use of +central configuration *masks*. + +In Quincy and later releases, you can use the +:confval:`bluestore_use_optimal_io_size_for_min_alloc_size` option to allow +automatic discovery of the correct value as each OSD is created. 
Note that the
+use of ``bcache``, ``OpenCAS``, ``dmcrypt``, ``ATA over Ethernet``, ``iSCSI``,
+or other device-layering and abstraction technologies might confound the
+determination of correct values. Moreover, OSDs deployed on top of VMware
+storage have sometimes been found to report a ``rotational`` attribute that
+does not match the underlying hardware.
+
+We suggest inspecting such OSDs at startup via logs and admin sockets in order
+to ensure that their behavior is correct. Be aware that this kind of inspection
+might not work as expected with older kernels. To check for this issue,
+examine the presence and value of ``/sys/block/<device>/queue/optimal_io_size``.
+
+.. note:: When running Reef or a later Ceph release, the ``min_alloc_size``
+   baked into each OSD is conveniently reported by ``ceph osd metadata``.
+
+To inspect a specific OSD, run the following command:
+
+.. prompt:: bash #
+
+   ceph osd metadata osd.1701 | egrep rotational\|alloc
+
+This space amplification might manifest as an unusually high ratio of raw to
+stored data as reported by ``ceph df``. There might also be ``%USE`` / ``VAR``
+values reported by ``ceph osd df`` that are unusually high in comparison to
+other, ostensibly identical, OSDs. Finally, there might be unexpected balancer
+behavior in pools that use OSDs that have mismatched ``min_alloc_size`` values.
+
+This BlueStore attribute takes effect *only* at OSD creation; if the attribute
+is changed later, a specific OSD's behavior will not change unless and until
+the OSD is destroyed and redeployed with the appropriate option value(s).
+Upgrading to a later Ceph release will *not* change the value used by OSDs that
+were deployed under older releases or with other settings.
+
+.. confval:: bluestore_min_alloc_size
+.. confval:: bluestore_min_alloc_size_hdd
+.. confval:: bluestore_min_alloc_size_ssd
+.. 
confval:: bluestore_use_optimal_io_size_for_min_alloc_size + +DSA (Data Streaming Accelerator) Usage +====================================== + +If you want to use the DML library to drive the DSA device for offloading +read/write operations on persistent memory (PMEM) in BlueStore, you need to +install `DML`_ and the `idxd-config`_ library. This will work only on machines +that have a SPR (Sapphire Rapids) CPU. + +.. _dml: https://github.com/intel/dml +.. _idxd-config: https://github.com/intel/idxd-config + +After installing the DML software, configure the shared work queues (WQs) with +reference to the following WQ configuration example: + +.. prompt:: bash $ + + accel-config config-wq --group-id=1 --mode=shared --wq-size=16 --threshold=15 --type=user --name="myapp1" --priority=10 --block-on-fault=1 dsa0/wq0.1 + accel-config config-engine dsa0/engine0.1 --group-id=1 + accel-config enable-device dsa0 + accel-config enable-wq dsa0/wq0.1 diff --git a/doc/rados/configuration/ceph-conf.rst b/doc/rados/configuration/ceph-conf.rst new file mode 100644 index 000000000..d8d5c9d03 --- /dev/null +++ b/doc/rados/configuration/ceph-conf.rst @@ -0,0 +1,715 @@ +.. _configuring-ceph: + +================== + Configuring Ceph +================== + +When Ceph services start, the initialization process activates a set of +daemons that run in the background. A :term:`Ceph Storage Cluster` runs at +least three types of daemons: + +- :term:`Ceph Monitor` (``ceph-mon``) +- :term:`Ceph Manager` (``ceph-mgr``) +- :term:`Ceph OSD Daemon` (``ceph-osd``) + +Any Ceph Storage Cluster that supports the :term:`Ceph File System` also runs +at least one :term:`Ceph Metadata Server` (``ceph-mds``). Any Cluster that +supports :term:`Ceph Object Storage` runs Ceph RADOS Gateway daemons +(``radosgw``). + +Each daemon has a number of configuration options, and each of those options +has a default value. Adjust the behavior of the system by changing these +configuration options. 
Make sure to understand the consequences before +overriding the default values, as it is possible to significantly degrade the +performance and stability of your cluster. Remember that default values +sometimes change between releases. For this reason, it is best to review the +version of this documentation that applies to your Ceph release. + +Option names +============ + +Each of the Ceph configuration options has a unique name that consists of words +formed with lowercase characters and connected with underscore characters +(``_``). + +When option names are specified on the command line, underscore (``_``) and +dash (``-``) characters can be used interchangeably (for example, +``--mon-host`` is equivalent to ``--mon_host``). + +When option names appear in configuration files, spaces can also be used in +place of underscores or dashes. However, for the sake of clarity and +convenience, we suggest that you consistently use underscores, as we do +throughout this documentation. + +Config sources +============== + +Each Ceph daemon, process, and library pulls its configuration from one or more +of the several sources listed below. Sources that occur later in the list +override those that occur earlier in the list (when both are present). + +- the compiled-in default value +- the monitor cluster's centralized configuration database +- a configuration file stored on the local host +- environment variables +- command-line arguments +- runtime overrides that are set by an administrator + +One of the first things a Ceph process does on startup is parse the +configuration options provided via the command line, via the environment, and +via the local configuration file. Next, the process contacts the monitor +cluster to retrieve centrally-stored configuration for the entire cluster. +After a complete view of the configuration is available, the startup of the +daemon or process will commence. + +.. 
_bootstrap-options:
+
+Bootstrap options
+-----------------
+
+Bootstrap options are configuration options that affect the process's ability
+to contact the monitors, to authenticate, and to retrieve the cluster-stored
+configuration. For this reason, these options might need to be stored locally
+on the node, and set by means of a local configuration file. These options
+include the following:
+
+.. confval:: mon_host
+.. confval:: mon_host_override
+
+- :confval:`mon_dns_srv_name`
+- :confval:`mon_data`, :confval:`osd_data`, :confval:`mds_data`,
+  :confval:`mgr_data`, and similar options that define which local directory
+  the daemon stores its data in.
+- :confval:`keyring`, :confval:`keyfile`, and/or :confval:`key`, which can be
+  used to specify the authentication credential to use to authenticate with the
+  monitor. Note that in most cases the default keyring location is in the data
+  directory specified above.
+
+In most cases, there is no reason to modify the default values of these
+options. However, there is one exception to this: the :confval:`mon_host`
+option that identifies the addresses of the cluster's monitors. When
+:ref:`DNS is used to identify monitors <mon-dns-lookup>`, a local Ceph
+configuration file can be avoided entirely.
+
+
+Skipping monitor config
+-----------------------
+
+The option ``--no-mon-config`` can be passed to any command in order to skip
+the step that retrieves configuration information from the cluster's monitors.
+Skipping this retrieval step can be useful in cases where configuration is
+managed entirely via configuration files, or when maintenance activity needs to
+be done but the monitor cluster is down.
+
+.. _ceph-conf-file:
+
+Configuration sections
+======================
+
+Each of the configuration options associated with a single process or daemon
+has a single value. However, the values for a configuration option can vary
+across daemon types, and can vary even across different daemons of the same
+type. 
Ceph options that are stored in the monitor configuration database or in +local configuration files are grouped into sections |---| so-called "configuration +sections" |---| to indicate which daemons or clients they apply to. + + +These sections include the following: + +.. confsec:: global + + Settings under ``global`` affect all daemons and clients + in a Ceph Storage Cluster. + + :example: ``log_file = /var/log/ceph/$cluster-$type.$id.log`` + +.. confsec:: mon + + Settings under ``mon`` affect all ``ceph-mon`` daemons in + the Ceph Storage Cluster, and override the same setting in + ``global``. + + :example: ``mon_cluster_log_to_syslog = true`` + +.. confsec:: mgr + + Settings in the ``mgr`` section affect all ``ceph-mgr`` daemons in + the Ceph Storage Cluster, and override the same setting in + ``global``. + + :example: ``mgr_stats_period = 10`` + +.. confsec:: osd + + Settings under ``osd`` affect all ``ceph-osd`` daemons in + the Ceph Storage Cluster, and override the same setting in + ``global``. + + :example: ``osd_op_queue = wpq`` + +.. confsec:: mds + + Settings in the ``mds`` section affect all ``ceph-mds`` daemons in + the Ceph Storage Cluster, and override the same setting in + ``global``. + + :example: ``mds_cache_memory_limit = 10G`` + +.. confsec:: client + + Settings under ``client`` affect all Ceph clients + (for example, mounted Ceph File Systems, mounted Ceph Block Devices) + as well as RADOS Gateway (RGW) daemons. + + :example: ``objecter_inflight_ops = 512`` + + +Configuration sections can also specify an individual daemon or client name. For example, +``mon.foo``, ``osd.123``, and ``client.smith`` are all valid section names. + + +Any given daemon will draw its settings from the global section, the daemon- or +client-type section, and the section sharing its name. 
Settings in the
+most-specific section take precedence: for example, if the same option is
+specified in :confsec:`global`, :confsec:`mon`, and ``mon.foo`` in the same
+source (that is, in the same configuration file), the ``mon.foo`` setting will
+be used.
+
+If multiple values of the same configuration option are specified in the same
+section, the last value specified takes precedence.
+
+Note that values from the local configuration file always take precedence over
+values from the monitor configuration database, regardless of the section in
+which they appear.
+
+.. _ceph-metavariables:
+
+Metavariables
+=============
+
+Metavariables dramatically simplify Ceph storage cluster configuration. When a
+metavariable is set in a configuration value, Ceph expands the metavariable at
+the time the configuration value is used. In this way, Ceph metavariables
+behave similarly to the way that variable expansion works in the Bash shell.
+
+Ceph supports the following metavariables:
+
+.. describe:: $cluster
+
+   Expands to the Ceph Storage Cluster name. Useful when running
+   multiple Ceph Storage Clusters on the same hardware.
+
+   :example: ``/etc/ceph/$cluster.keyring``
+   :default: ``ceph``
+
+.. describe:: $type
+
+   Expands to a daemon or process type (for example, ``mds``, ``osd``, or
+   ``mon``).
+
+   :example: ``/var/lib/ceph/$type``
+
+.. describe:: $id
+
+   Expands to the daemon or client identifier. For
+   ``osd.0``, this would be ``0``; for ``mds.a``, it would
+   be ``a``.
+
+   :example: ``/var/lib/ceph/$type/$cluster-$id``
+
+.. describe:: $host
+
+   Expands to the host name where the process is running.
+
+.. describe:: $name
+
+   Expands to ``$type.$id``.
+
+   :example: ``/var/run/ceph/$cluster-$name.asok``
+
+.. describe:: $pid
+
+   Expands to daemon pid. 
+
+   :example: ``/var/run/ceph/$cluster-$name-$pid.asok``
+
+
+Ceph configuration file
+=======================
+
+On startup, Ceph processes search for a configuration file in the
+following locations:
+
+#. ``$CEPH_CONF`` (that is, the path following the ``$CEPH_CONF``
+   environment variable)
+#. ``-c path/path`` (that is, the ``-c`` command line argument)
+#. ``/etc/ceph/$cluster.conf``
+#. ``~/.ceph/$cluster.conf``
+#. ``./$cluster.conf`` (that is, in the current working directory)
+#. On FreeBSD systems only, ``/usr/local/etc/ceph/$cluster.conf``
+
+Here ``$cluster`` is the cluster's name (default: ``ceph``).
+
+The Ceph configuration file uses an ``ini`` style syntax. You can add "comment
+text" after a pound sign (#) or a semicolon (;). For example:
+
+.. code-block:: ini
+
+   # <--A number sign (#) precedes a comment.
+   ; A comment may be anything.
+   # Comments always follow a semicolon (;) or a pound sign (#) on each line.
+   # The end of the line terminates a comment.
+   # We recommend that you provide comments in your configuration file(s).
+
+
+.. _ceph-conf-settings:
+
+Config file section names
+-------------------------
+
+The configuration file is divided into sections. Each section must begin with a
+valid configuration section name (see `Configuration sections`_, above) that is
+surrounded by square brackets. For example:
+
+.. code-block:: ini
+
+   [global]
+   debug_ms = 0
+
+   [osd]
+   debug_ms = 1
+
+   [osd.1]
+   debug_ms = 10
+
+   [osd.2]
+   debug_ms = 10
+
+Config file option values
+-------------------------
+
+The value of a configuration option is a string. If the string is too long to
+fit on a single line, you can put a backslash (``\``) at the end of the line
+and the backslash will act as a line continuation marker. In such a case, the
+value of the option will be the string after ``=`` in the current line,
+combined with the string in the next line. 
Here is an example::
+
+   [global]
+   foo = long long ago\
+   long ago
+
+In this example, the value of the "``foo``" option is "``long long ago long
+ago``".
+
+An option value typically ends with either a newline or a comment. For
+example:
+
+.. code-block:: ini
+
+   [global]
+   obscure_one = difficult to explain # I will try harder in next release
+   simpler_one = nothing to explain
+
+In this example, the value of the "``obscure_one``" option is "``difficult to
+explain``" and the value of the "``simpler_one``" option is "``nothing to
+explain``".
+
+When an option value contains spaces, it can be enclosed within single quotes
+or double quotes in order to make its scope clear and in order to make sure
+that the first space in the value is not interpreted as the end of the value.
+For example:
+
+.. code-block:: ini
+
+   [global]
+   line = "to be, or not to be"
+
+In option values, there are four characters that are treated as escape
+characters: ``=``, ``#``, ``;`` and ``[``. They are permitted to occur in an
+option value only if they are immediately preceded by the backslash character
+(``\``). For example:
+
+.. code-block:: ini
+
+   [global]
+   secret = "i love \# and \["
+
+Each configuration option falls under one of the following types:
+
+.. describe:: int
+
+   64-bit signed integer. Some SI suffixes are supported, such as "K", "M",
+   "G", "T", "P", and "E" (meaning, respectively, 10\ :sup:`3`, 10\ :sup:`6`,
+   10\ :sup:`9`, etc.). "B" is the only supported unit string. Thus "1K", "1M",
+   "128B" and "-1" are all valid option values. When a negative value is
+   assigned to a threshold option, this can indicate that the option is
+   "unlimited" -- that is, that there is no threshold or limit in effect.
+
+   :example: ``42``, ``-1``
+
+.. describe:: uint
+
+   This differs from ``integer`` only in that negative values are not
+   permitted.
+
+   :example: ``256``, ``0``
+
+.. describe:: str
+
+   A string encoded in UTF-8. Certain characters are not permitted. 
Reference
+   the above notes for the details.
+
+   :example: ``"hello world"``, ``"i love \#"``, ``yet-another-name``
+
+.. describe:: boolean
+
+   Typically either of the two values ``true`` or ``false``. However, any
+   integer is permitted: "0" implies ``false``, and any non-zero value implies
+   ``true``.
+
+   :example: ``true``, ``false``, ``1``, ``0``
+
+.. describe:: addr
+
+   A single address, optionally prefixed with ``v1``, ``v2`` or ``any`` for the
+   messenger protocol. If no prefix is specified, the ``v2`` protocol is used.
+   For more details, see :ref:`address_formats`.
+
+   :example: ``v1:1.2.3.4:567``, ``v2:1.2.3.4:567``, ``1.2.3.4:567``, ``2409:8a1e:8fb6:aa20:1260:4bff:fe92:18f5::567``, ``[::1]:6789``
+
+.. describe:: addrvec
+
+   A set of addresses separated by ",". The addresses can be optionally quoted
+   with ``[`` and ``]``.
+
+   :example: ``[v1:1.2.3.4:567,v2:1.2.3.4:568]``, ``v1:1.2.3.4:567,v1:1.2.3.14:567``, ``[2409:8a1e:8fb6:aa20:1260:4bff:fe92:18f5::567], [2409:8a1e:8fb6:aa20:1260:4bff:fe92:18f5::568]``
+
+.. describe:: uuid
+
+   The string format of a uuid defined by `RFC4122
+   <https://www.ietf.org/rfc/rfc4122.txt>`_. Certain variants are also
+   supported: for more details, see the Boost uuid documentation.
+
+   :example: ``f81d4fae-7dec-11d0-a765-00a0c91e6bf6``
+
+.. describe:: size
+
+   64-bit unsigned integer. Both SI prefixes and IEC prefixes are supported.
+   "B" is the only supported unit string. Negative values are not permitted.
+
+   :example: ``1Ki``, ``1K``, ``1KiB`` and ``1B``.
+
+.. describe:: secs
+
+   Denotes a duration of time. The default unit of time is the second.
+   The following units of time are supported:
+
+   * second: ``s``, ``sec``, ``second``, ``seconds``
+   * minute: ``m``, ``min``, ``minute``, ``minutes``
+   * hour: ``hs``, ``hr``, ``hour``, ``hours``
+   * day: ``d``, ``day``, ``days``
+   * week: ``w``, ``wk``, ``week``, ``weeks``
+   * month: ``mo``, ``month``, ``months``
+   * year: ``y``, ``yr``, ``year``, ``years``
+
+   :example: ``1 m``, ``1m`` and ``1 week``
+
+.. 
_ceph-conf-database:
+
+Monitor configuration database
+==============================
+
+The monitor cluster manages a database of configuration options that can be
+consumed by the entire cluster. This allows for streamlined central
+configuration management of the entire system. For ease of administration and
+transparency, the vast majority of configuration options can and should be
+stored in this database.
+
+Some settings might need to be stored in local configuration files because they
+affect the ability of the process to connect to the monitors, to authenticate,
+and to fetch configuration information. In most cases this applies only to the
+``mon_host`` option. This issue can be avoided by using :ref:`DNS SRV
+records <mon-dns-lookup>`.
+
+Sections and masks
+------------------
+
+Configuration options stored by the monitor can be stored in a global section,
+in a daemon-type section, or in a specific daemon section. In this, they are
+no different from the options in a configuration file.
+
+In addition, options may have a *mask* associated with them to further restrict
+which daemons or clients the option applies to. Masks take two forms:
+
+#. ``type:location`` where ``type`` is a CRUSH property like ``rack`` or
+   ``host``, and ``location`` is a value for that property. For example,
+   ``host:foo`` would limit the option only to daemons or clients
+   running on a particular host.
+#. ``class:device-class`` where ``device-class`` is the name of a CRUSH
+   device class (for example, ``hdd`` or ``ssd``). For example,
+   ``class:ssd`` would limit the option only to OSDs backed by SSDs.
+   (This mask has no effect on non-OSD daemons or clients.)
+
+In commands that specify a configuration option, the argument of the option (in
+the following examples, this is the "who" string) may be a section name, a
+mask, or a combination of both separated by a slash character (``/``). For
+example, ``osd/rack:foo`` would refer to all OSD daemons in the ``foo`` rack. 
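As a toy illustration of the section-and-mask matching described above, the following Python sketch models how one option value is chosen for one daemon. This is not Ceph source code; all names and the data layout are invented for the sketch, and the local-file-overrides-monitor rule is deliberately left out:

```python
def pick_option(entries, daemon):
    """Return one option's value for one daemon, honoring section
    specificity (global < daemon type < exact name) and optional masks."""
    def rank(section):
        # Lower rank = less specific; None = section does not apply.
        if section == "global":
            return 0
        if section == daemon["type"]:
            return 1
        if section == daemon["name"]:
            return 2
        return None

    def mask_ok(mask):
        if mask is None:
            return True
        kind, _, value = mask.partition(":")
        # 'class:ssd' checks the CRUSH device class; 'host:foo' (or any
        # other CRUSH property) checks the matching daemon attribute.
        key = "device_class" if kind == "class" else kind
        return daemon.get(key) == value

    best = None
    for section, mask, value in entries:
        r = rank(section)
        if r is None or not mask_ok(mask):
            continue
        if best is None or r >= best[0]:  # later entries of equal rank win
            best = (r, value)
    return None if best is None else best[1]

entries = [
    ("global", None, "4GiB"),
    ("osd", "class:ssd", "8GiB"),  # only OSDs backed by SSDs
    ("osd.1", None, "12GiB"),      # one specific daemon
]
ssd_osd = {"type": "osd", "name": "osd.1", "host": "foo", "device_class": "ssd"}
hdd_osd = {"type": "osd", "name": "osd.2", "host": "bar", "device_class": "hdd"}
print(pick_option(entries, ssd_osd))  # 12GiB: the exact-name section wins
print(pick_option(entries, hdd_osd))  # 4GiB: the class:ssd mask filters out 8GiB
```

A "who" string such as ``osd/rack:foo`` corresponds here to the pair ``("osd", "rack:foo")``: the section narrows by daemon type or name, and the mask narrows further by CRUSH property or device class.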
+ +When configuration options are shown, the section name and mask are presented +in separate fields or columns to make them more readable. + +Commands +-------- + +The following CLI commands are used to configure the cluster: + +* ``ceph config dump`` dumps the entire monitor configuration + database for the cluster. + +* ``ceph config get `` dumps the configuration options stored in + the monitor configuration database for a specific daemon or client + (for example, ``mds.a``). + +* ``ceph config get