commit e6918187568dbd01842d8d1d2c808ce16a894239 (upstream/18.2.2)
Author:    Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-21 11:54:28 +0000
Committer: Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-21 11:54:28 +0000

    Adding upstream version 18.2.2.

    Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>

Diffstat (limited to 'doc/rados/configuration/bluestore-config-ref.rst'):

 doc/rados/configuration/bluestore-config-ref.rst | 552
 1 file changed, 552 insertions(+), 0 deletions(-)
diff --git a/doc/rados/configuration/bluestore-config-ref.rst b/doc/rados/configuration/bluestore-config-ref.rst
new file mode 100644
index 000000000..3707be1aa
--- /dev/null
+++ b/doc/rados/configuration/bluestore-config-ref.rst
@@ -0,0 +1,552 @@

==================================
 BlueStore Configuration Reference
==================================

Devices
=======

BlueStore manages one, two, or (in certain cases) three storage devices. These
*devices* are "devices" in the Linux/Unix sense: that is, they are assets
listed under ``/dev`` or ``/devices``. Each of these devices may be an entire
storage drive, a partition of a storage drive, or a logical volume. BlueStore
does not create or mount a conventional file system on the devices that it
uses; BlueStore reads and writes to the devices directly in a "raw" fashion.

In the simplest case, BlueStore consumes all of a single storage device. This
device is known as the *primary device*. The primary device is identified by
the ``block`` symlink in the data directory.

The data directory is a ``tmpfs`` mount. When this data directory is booted or
activated by ``ceph-volume``, it is populated with metadata files and links
that hold information about the OSD: for example, the OSD's identifier, the
name of the cluster that the OSD belongs to, and the OSD's private keyring.

In more complicated cases, BlueStore is deployed across one or two additional
devices:

* A *write-ahead log (WAL) device* (identified as ``block.wal`` in the data
  directory) can be used to separate out BlueStore's internal journal or
  write-ahead log. Using a WAL device is advantageous only if the WAL device
  is faster than the primary device (for example, if the WAL device is an SSD
  and the primary device is an HDD).
* A *DB device* (identified as ``block.db`` in the data directory) can be used
  to store BlueStore's internal metadata. BlueStore (or more precisely, the
  embedded RocksDB) will put as much metadata as it can on the DB device in
  order to improve performance. If the DB device becomes full, metadata will
  spill back onto the primary device (where it would have been located in the
  absence of the DB device). Again, it is advantageous to provision a DB device
  only if it is faster than the primary device.

If there is only a small amount of fast storage available (for example, less
than a gigabyte), we recommend using the available space as a WAL device. But
if more fast storage is available, it makes more sense to provision a DB
device. Because the BlueStore journal is always placed on the fastest device
available, using a DB device provides the same benefit that using a WAL device
would, while *also* allowing additional metadata to be stored off the primary
device (provided that it fits). DB devices make this possible because whenever
a DB device is specified but an explicit WAL device is not, the WAL will be
implicitly colocated with the DB on the faster device.

To provision a single-device (colocated) BlueStore OSD, run the following
command:

.. prompt:: bash $

   ceph-volume lvm prepare --bluestore --data <device>

To specify a WAL device or DB device, run the following command:

.. prompt:: bash $

   ceph-volume lvm prepare --bluestore --data <device> --block.wal <wal-device> --block.db <db-device>

.. note:: The option ``--data`` can take as its argument any of the
   following devices: logical volumes specified using *vg/lv* notation,
   existing logical volumes, and GPT partitions.
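As a sketch of the *vg/lv* notation mentioned in the note above, pre-created
logical volumes can be referenced directly. The volume-group and
logical-volume names used here (``ceph-data-0/data-0``, ``ceph-db-0/db-0``,
and ``ceph-wal-0/wal-0``) are hypothetical placeholders, not names that Ceph
creates for you:

.. prompt:: bash $

   ceph-volume lvm prepare --bluestore --data ceph-data-0/data-0 --block.db ceph-db-0/db-0 --block.wal ceph-wal-0/wal-0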
Provisioning strategies
-----------------------

BlueStore differs from Filestore in that there are several ways to deploy a
BlueStore OSD. However, the overall deployment strategy for BlueStore can be
clarified by examining just these two common arrangements:

.. _bluestore-single-type-device-config:

**block (data) only**
^^^^^^^^^^^^^^^^^^^^^

If all devices are of the same type (for example, they are all HDDs), and if
there are no fast devices available for the storage of metadata, then it makes
sense to specify the block device only and to leave ``block.db`` and
``block.wal`` unseparated. The :ref:`ceph-volume-lvm` command for a single
``/dev/sda`` device is as follows:

.. prompt:: bash $

   ceph-volume lvm create --bluestore --data /dev/sda

If the devices to be used for a BlueStore OSD are pre-created logical volumes,
then the :ref:`ceph-volume-lvm` call for a logical volume named
``ceph-vg/block-lv`` is as follows:

.. prompt:: bash $

   ceph-volume lvm create --bluestore --data ceph-vg/block-lv

.. _bluestore-mixed-device-config:

**block and block.db**
^^^^^^^^^^^^^^^^^^^^^^

If you have a mix of fast and slow devices (for example, SSD and HDD), then we
recommend placing ``block.db`` on the faster device while ``block`` (that is,
the data) is stored on the slower device (that is, the rotational drive).

You must create these volume groups and logical volumes manually, because the
``ceph-volume`` tool is currently unable to create them automatically.

The following procedure illustrates the manual creation of volume groups and
logical volumes. For this example, we shall assume four rotational drives
(``sda``, ``sdb``, ``sdc``, and ``sdd``) and one (fast) SSD (``sdx``). First,
to create the volume groups, run the following commands:

.. prompt:: bash $

   vgcreate ceph-block-0 /dev/sda
   vgcreate ceph-block-1 /dev/sdb
   vgcreate ceph-block-2 /dev/sdc
   vgcreate ceph-block-3 /dev/sdd

Next, to create the logical volumes for ``block``, run the following commands:

.. prompt:: bash $

   lvcreate -l 100%FREE -n block-0 ceph-block-0
   lvcreate -l 100%FREE -n block-1 ceph-block-1
   lvcreate -l 100%FREE -n block-2 ceph-block-2
   lvcreate -l 100%FREE -n block-3 ceph-block-3

Because there are four HDDs, there will be four OSDs. Supposing that there is a
200GB SSD in ``/dev/sdx``, we can create four 50GB logical volumes by running
the following commands:

.. prompt:: bash $

   vgcreate ceph-db-0 /dev/sdx
   lvcreate -L 50GB -n db-0 ceph-db-0
   lvcreate -L 50GB -n db-1 ceph-db-0
   lvcreate -L 50GB -n db-2 ceph-db-0
   lvcreate -L 50GB -n db-3 ceph-db-0

Finally, to create the four OSDs, run the following commands:

.. prompt:: bash $

   ceph-volume lvm create --bluestore --data ceph-block-0/block-0 --block.db ceph-db-0/db-0
   ceph-volume lvm create --bluestore --data ceph-block-1/block-1 --block.db ceph-db-0/db-1
   ceph-volume lvm create --bluestore --data ceph-block-2/block-2 --block.db ceph-db-0/db-2
   ceph-volume lvm create --bluestore --data ceph-block-3/block-3 --block.db ceph-db-0/db-3

After this procedure is finished, there should be four OSDs, ``block`` should
be on the four HDDs, and each OSD should have a 50GB DB logical volume
on the shared SSD.
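As a quick, read-only sanity check of the resulting layout, the logical
volumes and devices backing each OSD's ``block`` and ``block.db`` can be
reviewed with ``ceph-volume``'s inventory command:

.. prompt:: bash $

   ceph-volume lvm list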
Sizing
======

When using a :ref:`mixed spinning-and-solid-drive setup
<bluestore-mixed-device-config>`, it is important to make a large enough
``block.db`` logical volume for BlueStore. The logical volumes associated with
``block.db`` should be *as large as possible*.

It is generally recommended that the size of ``block.db`` be somewhere between
1% and 4% of the size of ``block``. For RGW workloads, it is recommended that
the ``block.db`` be at least 4% of the ``block`` size, because RGW makes heavy
use of ``block.db`` to store metadata (in particular, omap keys). For example,
if the ``block`` size is 1TB, then ``block.db`` should have a size of at least
40GB. For RBD workloads, however, ``block.db`` usually needs no more than 1% to
2% of the ``block`` size.

In older releases, internal level sizes are such that the DB can fully utilize
only those specific partition / logical volume sizes that correspond to sums of
L0, L0+L1, L1+L2, and so on--that is, given default settings, sizes of roughly
3GB, 30GB, 300GB, and so on. Most deployments do not substantially benefit from
sizing that accommodates L3 and higher, though DB compaction can be facilitated
by doubling these figures to 6GB, 60GB, and 600GB.

Improvements in Nautilus 14.2.12, Octopus 15.2.6, and subsequent releases allow
for better utilization of arbitrarily-sized DB devices. Moreover, the Pacific
release brings experimental dynamic-level support. Because of these advances,
users of older releases might want to plan ahead by provisioning larger DB
devices today so that the benefits of scale can be realized when upgrades are
made in the future.

When *not* using a mix of fast and slow devices, there is no requirement to
create separate logical volumes for ``block.db`` or ``block.wal``. BlueStore
will automatically colocate these devices within the space of ``block``.

Automatic Cache Sizing
======================

BlueStore can be configured to automatically resize its caches, provided that
certain conditions are met: TCMalloc must be configured as the memory allocator
and the ``bluestore_cache_autotune`` configuration option must be enabled (note
that it is currently enabled by default). When automatic cache sizing is in
effect, BlueStore attempts to keep OSD heap-memory usage under a certain target
size (as determined by ``osd_memory_target``). This approach makes use of a
best-effort algorithm, and caches do not shrink below the size defined by the
value of ``osd_memory_cache_min``. Cache ratios are selected in accordance
with a hierarchy of priorities. But if priority information is not available,
the values specified in the ``bluestore_cache_meta_ratio`` and
``bluestore_cache_kv_ratio`` options are used as fallback cache ratios.

.. confval:: bluestore_cache_autotune
.. confval:: osd_memory_target
.. confval:: bluestore_cache_autotune_interval
.. confval:: osd_memory_base
.. confval:: osd_memory_expected_fragmentation
.. confval:: osd_memory_cache_min
.. confval:: osd_memory_cache_resize_interval
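Perhaps the most commonly adjusted of these options is ``osd_memory_target``.
As a rough sketch, on hosts with enough RAM to give each OSD about 6 GiB, the
target could be raised for all OSDs through the central configuration
database. The 6 GiB figure (6442450944 bytes) is only an illustrative value,
not a recommendation:

.. prompt:: bash #

   ceph config set osd osd_memory_target 6442450944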
Manual Cache Sizing
===================

The amount of memory consumed by each OSD to be used for its BlueStore cache is
determined by the ``bluestore_cache_size`` configuration option. If that option
has not been specified (that is, if it remains at 0), then Ceph uses a
different configuration option to determine the default memory budget:
``bluestore_cache_size_hdd`` if the primary device is an HDD, or
``bluestore_cache_size_ssd`` if the primary device is an SSD.

BlueStore and the rest of the Ceph OSD daemon make every effort to work within
this memory budget. Note that in addition to the configured cache size, there
is also memory consumed by the OSD itself. There is additional utilization due
to memory fragmentation and other allocator overhead.

The configured cache-memory budget can be used to store the following types of
things:

* Key/Value metadata (that is, RocksDB's internal cache)
* BlueStore metadata
* BlueStore data (that is, recently read or recently written object data)

Cache memory usage is governed by the configuration options
``bluestore_cache_meta_ratio`` and ``bluestore_cache_kv_ratio``. The fraction
of the cache that is reserved for data is governed by both the effective
BlueStore cache size (which depends on the relevant
``bluestore_cache_size[_ssd|_hdd]`` option and the device class of the primary
device) and the "meta" and "kv" ratios. This data fraction can be calculated
with the following formula: ``<effective_cache_size> * (1 -
bluestore_cache_meta_ratio - bluestore_cache_kv_ratio)``.

.. confval:: bluestore_cache_size
.. confval:: bluestore_cache_size_hdd
.. confval:: bluestore_cache_size_ssd
.. confval:: bluestore_cache_meta_ratio
.. confval:: bluestore_cache_kv_ratio

Checksums
=========

BlueStore checksums all metadata and all data written to disk. Metadata
checksumming is handled by RocksDB and uses the `crc32c` algorithm. By
contrast, data checksumming is handled by BlueStore and can use `crc32c`,
`xxhash32`, or `xxhash64`. The default checksum algorithm is `crc32c`, which
is suitable for most purposes.

Full data checksumming increases the amount of metadata that BlueStore must
store and manage. Whenever possible (for example, when clients hint that data
is written and read sequentially), BlueStore will checksum larger blocks. In
many cases, however, it must store a checksum value (usually 4 bytes) for every
4 KB block of data.

It is possible to obtain a smaller checksum value by truncating the checksum to
one or two bytes, thereby reducing the metadata overhead. A drawback of this
approach is that it increases the probability of a random error going
undetected: about one in four billion given a 32-bit (4 byte) checksum, one in
65,536 given a 16-bit (2 byte) checksum, and one in 256 given an 8-bit (1 byte)
checksum. To use the smaller checksum values, select `crc32c_16` or `crc32c_8`
as the checksum algorithm.

The *checksum algorithm* can be specified either via a per-pool ``csum_type``
property or via the global ``bluestore_csum_type`` configuration option. For
example:

.. prompt:: bash $

   ceph osd pool set <pool-name> csum_type <algorithm>

.. confval:: bluestore_csum_type
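As a concrete sketch (the pool name ``rbd-pool`` and the chosen algorithms are
placeholders for illustration only), a truncated checksum could be applied to
a single pool while the cluster-wide default is changed through the central
configuration database:

.. prompt:: bash $

   ceph osd pool set rbd-pool csum_type crc32c_16
   ceph config set osd bluestore_csum_type xxhash64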
Inline Compression
==================

BlueStore supports inline compression using `snappy`, `zlib`, `lz4`, or `zstd`.

Whether data in BlueStore is compressed is determined by two factors: (1) the
*compression mode* and (2) any client hints associated with a write operation.
The compression modes are as follows:

* **none**: Never compress data.
* **passive**: Do not compress data unless the write operation has a
  *compressible* hint set.
* **aggressive**: Compress data unless the write operation has an
  *incompressible* hint set.
* **force**: Try to compress data no matter what.

For more information about the *compressible* and *incompressible* I/O hints,
see :c:func:`rados_set_alloc_hint`.

Note that data in BlueStore is compressed only if the data chunk is
sufficiently reduced in size (as determined by the ``bluestore compression
required ratio`` setting). No matter which compression mode is in use, if the
compressed chunk does not shrink enough, it is discarded and the original
(uncompressed) data is stored instead. For example, if ``bluestore compression
required ratio`` is set to ``.7``, then data compression takes place only if
the size of the compressed data is no more than 70% of the size of the
original data.

The *compression mode*, *compression algorithm*, *compression required ratio*,
*min blob size*, and *max blob size* settings can be specified either via a
per-pool property or via a global config option. To specify pool properties,
run the following commands:

.. prompt:: bash $

   ceph osd pool set <pool-name> compression_algorithm <algorithm>
   ceph osd pool set <pool-name> compression_mode <mode>
   ceph osd pool set <pool-name> compression_required_ratio <ratio>
   ceph osd pool set <pool-name> compression_min_blob_size <size>
   ceph osd pool set <pool-name> compression_max_blob_size <size>

.. confval:: bluestore_compression_algorithm
.. confval:: bluestore_compression_mode
.. confval:: bluestore_compression_required_ratio
.. confval:: bluestore_compression_min_blob_size
.. confval:: bluestore_compression_min_blob_size_hdd
.. confval:: bluestore_compression_min_blob_size_ssd
.. confval:: bluestore_compression_max_blob_size
.. confval:: bluestore_compression_max_blob_size_hdd
.. confval:: bluestore_compression_max_blob_size_ssd

.. _bluestore-rocksdb-sharding:

RocksDB Sharding
================

BlueStore maintains several types of internal key-value data, all of which are
stored in RocksDB. Each data type in BlueStore is assigned a unique prefix.
Prior to the Pacific release, all key-value data was stored in a single RocksDB
column family: 'default'. In Pacific and later releases, however, BlueStore can
divide key-value data into several RocksDB column families. BlueStore achieves
better caching and more precise compaction when keys are similar: specifically,
when keys have similar access frequency, similar modification frequency, and a
similar lifetime. Under such conditions, performance is improved and less disk
space is required during compaction (because each column family is smaller and
is able to compact independently of the others).

OSDs deployed in Pacific or later releases use RocksDB sharding by default.
However, if Ceph has been upgraded to Pacific or a later version from a
previous version, sharding is disabled on any OSDs that were created before
Pacific.

To enable sharding and apply the Pacific defaults to a specific OSD, stop the
OSD and run the following command:

.. prompt:: bash #

   ceph-bluestore-tool \
     --path <data path> \
     --sharding="m(3) p(3,0-12) o(3,0-13)=block_cache={type=binned_lru} l p" \
     reshard

.. confval:: bluestore_rocksdb_cf
.. confval:: bluestore_rocksdb_cfs
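To check whether a given OSD has already been resharded, the sharding
definition currently applied to its RocksDB can be displayed with the same
tool. The OSD must be stopped first, and ``<data path>`` is that OSD's data
directory (for example, ``/var/lib/ceph/osd/ceph-0``):

.. prompt:: bash #

   ceph-bluestore-tool --path <data path> show-sharding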
Throttling
==========

.. confval:: bluestore_throttle_bytes
.. confval:: bluestore_throttle_deferred_bytes
.. confval:: bluestore_throttle_cost_per_io
.. confval:: bluestore_throttle_cost_per_io_hdd
.. confval:: bluestore_throttle_cost_per_io_ssd

SPDK Usage
==========

To use the SPDK driver for NVMe devices, you must first prepare your system.
See the `SPDK document`__.

.. __: http://www.spdk.io/doc/getting_started.html#getting_started_examples

SPDK offers a script that will configure the device automatically. Run this
script with root permissions:

.. prompt:: bash $

   sudo src/spdk/scripts/setup.sh

You will need to specify the subject NVMe device's device selector, with the
``spdk:`` prefix, as the value of ``bluestore_block_path``.

In the following example, you first find the device selector of an Intel NVMe
SSD by running the following command:

.. prompt:: bash $

   lspci -mm -n -D -d 8086:0953

The form of the device selector is either ``DDDD:BB:DD.FF`` or
``DDDD.BB.DD.FF``.

Next, supposing that ``0000:01:00.0`` is the device selector found in the
output of the ``lspci`` command, you can specify the device selector by running
the following command::

  bluestore_block_path = "spdk:trtype:pcie traddr:0000:01:00.0"

You may also specify a remote NVMeoF target over the TCP transport, as in the
following example::

  bluestore_block_path = "spdk:trtype:tcp traddr:10.67.110.197 trsvcid:4420 subnqn:nqn.2019-02.io.spdk:cnode1"

To run multiple SPDK instances per node, you must make sure each instance uses
its own DPDK memory by specifying for each instance the amount of DPDK memory
(in MB) that the instance will use.

In most cases, a single device can be used for data, DB, and WAL. We describe
this strategy as *colocating* these components. Be sure to enter the following
settings to ensure that all I/Os are issued through SPDK::

  bluestore_block_db_path = ""
  bluestore_block_db_size = 0
  bluestore_block_wal_path = ""
  bluestore_block_wal_size = 0

If these settings are not entered, then the current implementation will
populate the SPDK map files with kernel file system symbols and will use the
kernel driver to issue DB/WAL I/Os.
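Putting these pieces together, a colocated SPDK OSD might be described in
``ceph.conf`` roughly as follows. This is only a sketch that combines the
settings shown above; the PCIe selector ``0000:01:00.0`` is the example value
found earlier, and the section name assumes a single OSD whose ID is ``0``::

  [osd.0]
  bluestore_block_path = "spdk:trtype:pcie traddr:0000:01:00.0"
  bluestore_block_db_path = ""
  bluestore_block_db_size = 0
  bluestore_block_wal_path = ""
  bluestore_block_wal_size = 0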
Minimum Allocation Size
=======================

There is a configured minimum amount of storage that BlueStore allocates on an
underlying storage device. In practice, this is the least amount of capacity
that even a tiny RADOS object can consume on each OSD's primary device. The
configuration option in question--:confval:`bluestore_min_alloc_size`--derives
its value from the value of either :confval:`bluestore_min_alloc_size_hdd` or
:confval:`bluestore_min_alloc_size_ssd`, depending on the OSD's ``rotational``
attribute. Thus if an OSD is created on an HDD, BlueStore is initialized with
the current value of :confval:`bluestore_min_alloc_size_hdd`; but with SSD OSDs
(including NVMe devices), BlueStore is initialized with the current value of
:confval:`bluestore_min_alloc_size_ssd`.

In Mimic and earlier releases, the default values were 64KB for rotational
media (HDD) and 16KB for non-rotational media (SSD). The Octopus release
changed the default value for non-rotational media (SSD) to 4KB, and the
Pacific release changed the default value for rotational media (HDD) to 4KB.

These changes were driven by space amplification experienced by Ceph RADOS
Gateway (RGW) deployments that hosted large numbers of small files (S3/Swift
objects).

For example, when an RGW client stores a 1 KB S3 object, that object is written
to a single RADOS object. In accordance with the default
:confval:`bluestore_min_alloc_size` value, 4 KB of underlying drive space is
allocated. This means that roughly 3 KB (that is, 4 KB minus 1 KB) is allocated
but never used: this corresponds to 300% overhead or 25% efficiency. Similarly,
a 5 KB user object will be stored as two RADOS objects, a 4 KB RADOS object and
a 1 KB RADOS object, with the result that 3 KB of device capacity is stranded.
In this case, however, the overhead percentage is much smaller. Think of this
in terms of the remainder from a modulus operation. The overhead *percentage*
thus decreases rapidly as object size increases.

There is an additional subtlety that is easily missed: the amplification
phenomenon just described takes place for *each* replica. For example, when
using the default of three copies of data (3R), a 1 KB S3 object actually
strands roughly 9 KB of storage device capacity. If erasure coding (EC) is used
instead of replication, the amplification might be even higher: for a ``k=4,
m=2`` pool, our 1 KB S3 object allocates 24 KB (that is, 4 KB multiplied by 6)
of device capacity.

When an RGW bucket pool contains many relatively large user objects, the effect
of this phenomenon is often negligible. However, with deployments that can
expect a significant fraction of relatively small user objects, the effect
should be taken into consideration.

The 4KB default value aligns well with conventional HDD and SSD devices.
However, certain novel coarse-IU (Indirection Unit) QLC SSDs perform and wear
best when :confval:`bluestore_min_alloc_size_ssd` is specified at OSD creation
to match the device's IU: this might be 8KB, 16KB, or even 64KB. These novel
storage drives can achieve read performance that is competitive with that of
conventional TLC SSDs and write performance that is faster than that of HDDs,
with higher density and lower cost than TLC SSDs.

Note that when creating OSDs on these novel devices, one must be careful to
apply the non-default value only to appropriate devices, and not to
conventional HDD and SSD devices. Errors can be avoided through careful
ordering of OSD creation, through custom OSD device classes, and especially by
the use of central configuration *masks*, as illustrated below.
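A minimal sketch of the configuration-mask approach: before creating OSDs on a
host that holds only coarse-IU QLC drives, the SSD default could be overridden
for that host alone. The host name ``qlc-host-1`` and the 16KB (16384 byte) IU
are hypothetical values chosen for illustration:

.. prompt:: bash #

   ceph config set osd/host:qlc-host-1 bluestore_min_alloc_size_ssd 16384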
In Quincy and later releases, you can use the
:confval:`bluestore_use_optimal_io_size_for_min_alloc_size` option to allow
automatic discovery of the correct value as each OSD is created. Note that the
use of ``bcache``, ``OpenCAS``, ``dmcrypt``, ``ATA over Ethernet``, ``iSCSI``,
or other device-layering and abstraction technologies might confound the
determination of correct values. Moreover, OSDs deployed on top of VMware
storage have sometimes been found to report a ``rotational`` attribute that
does not match the underlying hardware.

We suggest inspecting such OSDs at startup via logs and admin sockets in order
to ensure that their behavior is correct. Be aware that this kind of inspection
might not work as expected with older kernels. To check for this issue, examine
the presence and value of ``/sys/block/<drive>/queue/optimal_io_size``.

.. note:: When running Reef or a later Ceph release, the ``min_alloc_size``
   baked into each OSD is conveniently reported by ``ceph osd metadata``.

To inspect a specific OSD, run the following command:

.. prompt:: bash #

   ceph osd metadata osd.1701 | egrep rotational\|alloc

This space amplification might manifest as an unusually high ratio of raw to
stored data as reported by ``ceph df``. There might also be ``%USE`` / ``VAR``
values reported by ``ceph osd df`` that are unusually high in comparison to
other, ostensibly identical, OSDs. Finally, there might be unexpected balancer
behavior in pools that use OSDs that have mismatched ``min_alloc_size`` values.

This BlueStore attribute takes effect *only* at OSD creation; if the attribute
is changed later, a specific OSD's behavior will not change unless and until
the OSD is destroyed and redeployed with the appropriate option value(s).
Upgrading to a later Ceph release will *not* change the value used by OSDs that
were deployed under older releases or with other settings.

.. confval:: bluestore_min_alloc_size
.. confval:: bluestore_min_alloc_size_hdd
.. confval:: bluestore_min_alloc_size_ssd
.. confval:: bluestore_use_optimal_io_size_for_min_alloc_size

DSA (Data Streaming Accelerator) Usage
======================================

If you want to use the DML library to drive the DSA device for offloading
read/write operations on persistent memory (PMEM) in BlueStore, you need to
install `DML`_ and the `idxd-config`_ library. This will work only on machines
that have a SPR (Sapphire Rapids) CPU.

.. _DML: https://github.com/intel/dml
.. _idxd-config: https://github.com/intel/idxd-config

After installing the DML software, configure the shared work queues (WQs) with
reference to the following WQ configuration example:

.. prompt:: bash $

   accel-config config-wq --group-id=1 --mode=shared --wq-size=16 --threshold=15 --type=user --name="myapp1" --priority=10 --block-on-fault=1 dsa0/wq0.1
   accel-config config-engine dsa0/engine0.1 --group-id=1
   accel-config enable-device dsa0
   accel-config enable-wq dsa0/wq0.1
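As a final, read-only verification step, the resulting device, engine, and WQ
state can be reviewed with ``accel-config``'s listing command; this sketch
assumes the same ``dsa0/wq0.1`` queue configured above:

.. prompt:: bash $

   accel-config list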