=======================
 Config Settings
=======================

See `Block Device`_ for additional details.

Generic IO Settings
===================

.. confval:: rbd_compression_hint
.. confval:: rbd_read_from_replica_policy
.. confval:: rbd_default_order

Cache Settings
=======================

.. sidebar:: Kernel Caching

   The kernel driver for Ceph block devices can use the Linux page cache to
   improve performance.

The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
take advantage of the Linux page cache, so it includes its own in-memory
caching, called "RBD caching." RBD caching behaves just like well-behaved hard
disk caching. When the OS sends a barrier or a flush request, all dirty data is
written to the OSDs. This means that using write-back caching is just as safe as
using a well-behaved physical hard disk with a VM that properly sends flushes
(i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU)
algorithm, and in write-back mode it can coalesce contiguous requests for
better throughput.

The librbd cache is enabled by default and supports three different cache
policies: write-around, write-back, and write-through. Writes return
immediately under both the write-around and write-back policies, unless there
are more than ``rbd_cache_max_dirty`` bytes not yet written to the storage
cluster. The write-around policy differs from the write-back policy in that it
does not attempt to service read requests from the cache, and is therefore
faster for write-heavy workloads. Under the write-through policy, writes
return only when the data is on disk on all replicas, but reads may come from
the cache.

Until the first flush request is received, the cache behaves like a
write-through cache. This ensures safe operation with older guest operating
systems that do not send flushes and therefore cannot guarantee
crash-consistent behavior.

If the librbd cache is disabled, writes and reads go directly to the storage
cluster, and writes return only when the data is on disk on all replicas.

.. note::
   The cache is in memory on the client, and each RBD image has its own.
   Since the cache is local to the client, there is no coherency if other
   clients are accessing the same image. Running GFS or OCFS on top of RBD
   will not work with caching enabled.


Option settings for RBD should be set in the ``[client]`` section of your
configuration file or the central config store (an example snippet follows the
read-ahead settings below). These settings include:

.. confval:: rbd_cache
.. confval:: rbd_cache_policy
.. confval:: rbd_cache_writethrough_until_flush
.. confval:: rbd_cache_size
.. confval:: rbd_cache_max_dirty
.. confval:: rbd_cache_target_dirty
.. confval:: rbd_cache_max_dirty_age

.. _Block Device: ../../rbd


Read-ahead Settings
=======================

librbd supports read-ahead/prefetching to optimize small, sequential reads.
This should normally be handled by the guest OS in the case of a VM, but boot
loaders may not issue efficient reads. Read-ahead is automatically disabled if
caching is disabled or if the policy is write-around.

.. confval:: rbd_readahead_trigger_requests
.. confval:: rbd_readahead_max_bytes
.. confval:: rbd_readahead_disable_after_bytes
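As a rough illustration of the cache and read-ahead options above, the
following ``ceph.conf`` snippet enables the write-back policy and adjusts the
cache sizing. The option names are the ones listed above; the values are only
example numbers and should be tuned for your workload.

.. code-block:: ini

   [client]
   # Enable the librbd cache and select a policy
   # (writethrough, writeback, or writearound).
   rbd_cache = true
   rbd_cache_policy = writeback
   # Stay in write-through mode until the first flush is received.
   rbd_cache_writethrough_until_flush = true
   # Cache sizing in bytes: total size, dirty ceiling, and dirty target.
   rbd_cache_size = 67108864
   rbd_cache_max_dirty = 50331648
   rbd_cache_target_dirty = 33554432
   # Stop read-ahead after the first 64 MiB have been read from an image.
   rbd_readahead_disable_after_bytes = 67108864

The same options can instead be stored in the central config store, for
example:

.. code-block:: console

   $ ceph config set client rbd_cache_policy writeback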
Image Features
==============

RBD supports advanced features which can be specified via the command line
when creating images. Alternatively, the default feature set can be configured
via ``rbd_default_features = <sum of feature numeric values>`` or
``rbd_default_features = <comma-delimited list of CLI values>``, as shown in
the example that follows. The available features are described below.
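The snippet below is a brief sketch of the two equivalent ways of setting the
default feature set, followed by per-image feature management with the ``rbd``
CLI. The pool and image names (``mypool``, ``myimage``) and the chosen
features are placeholders, not recommendations.

.. code-block:: ini

   [client]
   # Equivalent forms: 1 + 4 + 8 + 16 = layering, exclusive-lock,
   # object-map, fast-diff.
   rbd_default_features = 29
   # rbd_default_features = layering,exclusive-lock,object-map,fast-diff

.. code-block:: console

   $ # Create an image with an explicit feature set.
   $ rbd create --size 10G --image-feature layering,exclusive-lock mypool/myimage
   $ # Enable or disable features on an existing image.
   $ rbd feature enable mypool/myimage object-map fast-diff
   $ rbd feature disable mypool/myimage object-map fast-diff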
``Layering``

:Description: Layering enables cloning.
:Internal value: 1
:CLI value: layering
:Added in: v0.52 (Bobtail)
:KRBD support: since v3.10
:Default: yes

``Striping v2``

:Description: Striping spreads data across multiple objects. Striping helps with
              parallelism for sequential read/write workloads.
:Internal value: 2
:CLI value: striping
:Added in: v0.55 (Bobtail)
:KRBD support: since v3.10 (default striping only, "fancy" striping added in v4.17)
:Default: yes

``Exclusive locking``

:Description: When enabled, it requires a client to acquire a lock on an object
              before making a write. Exclusive lock should only be enabled when
              a single client is accessing an image at any given time.
:Internal value: 4
:CLI value: exclusive-lock
:Added in: v0.92 (Hammer)
:KRBD support: since v4.9
:Default: yes

``Object map``

:Description: Object map support depends on exclusive lock support. Block
              devices are thin provisioned, which means that they only store
              data that actually has been written, i.e. they are *sparse*.
              Object map support helps track which objects actually exist
              (have data stored on a device). Enabling object map support
              speeds up I/O operations for cloning, for importing and
              exporting a sparsely populated image, and for deletion.
:Internal value: 8
:CLI value: object-map
:Added in: v0.93 (Hammer)
:KRBD support: since v5.3
:Default: yes

``Fast-diff``

:Description: Fast-diff support depends on object map support and exclusive lock
              support. It adds another property to the object map, which makes
              it much faster to generate diffs between snapshots of an image.
              It is also much faster to calculate the actual data usage of a
              snapshot or volume (``rbd du``).
:Internal value: 16
:CLI value: fast-diff
:Added in: v9.0.1 (Infernalis)
:KRBD support: since v5.3
:Default: yes

``Deep-flatten``

:Description: Deep-flatten enables ``rbd flatten`` to work on all snapshots of
              an image, in addition to the image itself. Without it, snapshots
              of an image will still rely on the parent, so the parent cannot be
              deleted until the snapshots are first deleted. Deep-flatten makes
              a parent independent of its clones, even if they have snapshots,
              at the expense of using additional OSD device space.
:Internal value: 32
:CLI value: deep-flatten
:Added in: v9.0.2 (Infernalis)
:KRBD support: since v5.1
:Default: yes

``Journaling``

:Description: Journaling support depends on exclusive lock support. Journaling
              records all modifications to an image in the order they occur. RBD
              mirroring can utilize the journal to replicate a crash-consistent
              image to a remote cluster. It is best to let ``rbd-mirror``
              manage this feature only as needed, as enabling it long term may
              result in substantial additional OSD space consumption.
:Internal value: 64
:CLI value: journaling
:Added in: v10.0.1 (Jewel)
:KRBD support: no
:Default: no

``Data pool``

:Description: On erasure-coded pools, the image data block objects need to be
              stored in a separate pool from the image metadata.
:Internal value: 128
:Added in: v11.1.0 (Kraken)
:KRBD support: since v4.11
:Default: no

``Operations``

:Description: Used to restrict older clients from performing certain maintenance
              operations against an image (e.g. clone, snap create).
:Internal value: 256
:Added in: v13.0.2 (Mimic)
:KRBD support: since v4.16

``Migrating``

:Description: Used to restrict older clients from opening an image when it is in
              migration state.
:Internal value: 512
:Added in: v14.0.1 (Nautilus)
:KRBD support: no

``Non-primary``

:Description: Used to restrict changes to non-primary images when using
              snapshot-based mirroring.
:Internal value: 1024
:Added in: v15.2.0 (Octopus)
:KRBD support: no


QoS Settings
============

librbd supports limiting per-image IO in several ways. These limits all apply
to a given image within a given process: the same image used in multiple
places (e.g. two separate VMs) would have independent limits.

* **IOPS:** number of I/Os per second (any type of I/O)
* **read IOPS:** number of read I/Os per second
* **write IOPS:** number of write I/Os per second
* **bps:** bytes per second (any type of I/O)
* **read bps:** bytes per second read
* **write bps:** bytes per second written

Each of these limits operates independently of the others, and all are off by
default. Every type of limit throttles I/O using a token bucket algorithm,
with a configurable limit (the average rate over time) and the potential for
a higher rate (a burst) for a short period of time (``burst_seconds``). When
any of these limits is reached and there is no burst capacity left, librbd
reduces the rate of that type of I/O to the limit.

For example, if a read bps limit of 100 MB/s were configured but writes were
not limited, writes could proceed as quickly as possible, while reads would be
throttled to 100 MB/s on average. If a read bps burst of 150 MB/s were set,
and the read bps burst seconds were set to five, reads could proceed at
150 MB/s for up to five seconds before dropping back to the 100 MB/s limit.

The following options configure these throttles (a brief example follows the
list):

.. confval:: rbd_qos_iops_limit
.. confval:: rbd_qos_iops_burst
.. confval:: rbd_qos_iops_burst_seconds
.. confval:: rbd_qos_read_iops_limit
.. confval:: rbd_qos_read_iops_burst
.. confval:: rbd_qos_read_iops_burst_seconds
.. confval:: rbd_qos_write_iops_limit
.. confval:: rbd_qos_write_iops_burst
.. confval:: rbd_qos_write_iops_burst_seconds
.. confval:: rbd_qos_bps_limit
.. confval:: rbd_qos_bps_burst
.. confval:: rbd_qos_bps_burst_seconds
.. confval:: rbd_qos_read_bps_limit
.. confval:: rbd_qos_read_bps_burst
.. confval:: rbd_qos_read_bps_burst_seconds
.. confval:: rbd_qos_write_bps_limit
.. confval:: rbd_qos_write_bps_burst
.. confval:: rbd_qos_write_bps_burst_seconds
.. confval:: rbd_qos_schedule_tick_min
.. confval:: rbd_qos_exclude_ops
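A minimal sketch of the worked example above, applied per image with
``rbd config image set``. The image spec ``mypool/myimage`` is a placeholder,
and the values are in bytes per second.

.. code-block:: console

   $ # Throttle reads to roughly 100 MB/s on average (104857600 bytes/s) ...
   $ rbd config image set mypool/myimage rbd_qos_read_bps_limit 104857600
   $ # ... while allowing bursts of roughly 150 MB/s for up to five seconds.
   $ rbd config image set mypool/myimage rbd_qos_read_bps_burst 157286400
   $ rbd config image set mypool/myimage rbd_qos_read_bps_burst_seconds 5

The same option names can also be placed in the ``[client]`` section of
``ceph.conf`` (or set in the central config store) to apply a default to every
image opened by that client.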