author    Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-21 11:54:28 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-21 11:54:28 +0000
commit    e6918187568dbd01842d8d1d2c808ce16a894239 (patch)
tree      64f88b554b444a49f656b6c656111a145cbbaa28 /doc/rbd
parent    Initial commit. (diff)
Adding upstream version 18.2.2. (tag: upstream/18.2.2)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/rbd')
-rw-r--r--  doc/rbd/api/index.rst | 8
-rw-r--r--  doc/rbd/api/librbdpy.rst | 85
-rw-r--r--  doc/rbd/disk.conf | 8
-rw-r--r--  doc/rbd/index.rst | 72
-rw-r--r--  doc/rbd/iscsi-initiator-esx.rst | 105
-rw-r--r--  doc/rbd/iscsi-initiator-linux.rst | 119
-rw-r--r--  doc/rbd/iscsi-initiator-win.rst | 102
-rw-r--r--  doc/rbd/iscsi-initiators.rst | 25
-rw-r--r--  doc/rbd/iscsi-monitoring.rst | 85
-rw-r--r--  doc/rbd/iscsi-overview.rst | 57
-rw-r--r--  doc/rbd/iscsi-requirements.rst | 51
-rw-r--r--  doc/rbd/iscsi-target-ansible.rst | 236
-rw-r--r--  doc/rbd/iscsi-target-cli-manual-install.rst | 190
-rw-r--r--  doc/rbd/iscsi-target-cli.rst | 266
-rw-r--r--  doc/rbd/iscsi-targets.rst | 27
-rw-r--r--  doc/rbd/libvirt.rst | 323
-rw-r--r--  doc/rbd/man/index.rst | 16
-rw-r--r--  doc/rbd/qemu-rbd.rst | 219
-rw-r--r--  doc/rbd/rados-rbd-cmds.rst | 326
-rw-r--r--  doc/rbd/rbd-cloudstack.rst | 157
-rw-r--r--  doc/rbd/rbd-config-ref.rst | 265
-rw-r--r--  doc/rbd/rbd-encryption.rst | 246
-rw-r--r--  doc/rbd/rbd-exclusive-locks.rst | 104
-rw-r--r--  doc/rbd/rbd-integrations.rst | 16
-rw-r--r--  doc/rbd/rbd-ko.rst | 59
-rw-r--r--  doc/rbd/rbd-kubernetes.rst | 364
-rw-r--r--  doc/rbd/rbd-live-migration.rst | 367
-rw-r--r--  doc/rbd/rbd-mirroring.rst | 538
-rw-r--r--  doc/rbd/rbd-nomad.rst | 475
-rw-r--r--  doc/rbd/rbd-openstack.rst | 395
-rw-r--r--  doc/rbd/rbd-operations.rst | 16
-rw-r--r--  doc/rbd/rbd-persistent-read-only-cache.rst | 201
-rw-r--r--  doc/rbd/rbd-persistent-write-log-cache.rst | 139
-rw-r--r--  doc/rbd/rbd-replay.rst | 42
-rw-r--r--  doc/rbd/rbd-snapshot.rst | 368
-rw-r--r--  doc/rbd/rbd-windows.rst | 235
36 files changed, 6307 insertions, 0 deletions
diff --git a/doc/rbd/api/index.rst b/doc/rbd/api/index.rst
new file mode 100644
index 000000000..27bb4485d
--- /dev/null
+++ b/doc/rbd/api/index.rst
@@ -0,0 +1,8 @@
+========================
+ Ceph Block Device APIs
+========================
+
+.. toctree::
+ :maxdepth: 2
+
+ librbd (Python) <librbdpy>
diff --git a/doc/rbd/api/librbdpy.rst b/doc/rbd/api/librbdpy.rst
new file mode 100644
index 000000000..7a74b0498
--- /dev/null
+++ b/doc/rbd/api/librbdpy.rst
@@ -0,0 +1,85 @@
+.. _rbd api py:
+
+================
+ Librbd (Python)
+================
+
+.. highlight:: python
+
+The `rbd` Python module provides file-like access to RBD images.
+
+
+Example: Creating and writing to an image
+=========================================
+
+To use `rbd`, you must first connect to RADOS and open an IO
+context::
+
+    import rados
+    import rbd
+
+    cluster = rados.Rados(conffile='my_ceph.conf')
+    cluster.connect()
+    ioctx = cluster.open_ioctx('mypool')
+
+Then you instantiate an :class:`rbd.RBD` object, which you use to create the
+image::
+
+ rbd_inst = rbd.RBD()
+ size = 4 * 1024**3 # 4 GiB
+ rbd_inst.create(ioctx, 'myimage', size)
+
+To perform I/O on the image, you instantiate an :class:`rbd.Image` object::
+
+ image = rbd.Image(ioctx, 'myimage')
+ data = b'foo' * 200
+ image.write(data, 0)
+
+This writes 'foo' to the first 600 bytes of the image. Note that the data
+must be ``bytes``, not ``unicode`` - ``librbd`` does not know how to deal with
+characters wider than a :c:type:`char`.
+
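+If you want to verify the write, the data can be read back in the same
+byte-oriented way (a short sketch that reuses the ``image`` object opened
+above)::
+
+    returned = image.read(0, 600)  # read 600 bytes starting at offset 0
+    assert returned == data
+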
+In the end, you will want to close the image, the IO context and the connection to RADOS::
+
+ image.close()
+ ioctx.close()
+ cluster.shutdown()
+
+To be safe, each of these calls would need to be in a separate ``finally``
+block::
+
+    cluster = rados.Rados(conffile='my_ceph.conf')
+    try:
+        cluster.connect()
+        ioctx = cluster.open_ioctx('mypool')
+ try:
+ rbd_inst = rbd.RBD()
+ size = 4 * 1024**3 # 4 GiB
+ rbd_inst.create(ioctx, 'myimage', size)
+ image = rbd.Image(ioctx, 'myimage')
+ try:
+ data = b'foo' * 200
+ image.write(data, 0)
+ finally:
+ image.close()
+ finally:
+ ioctx.close()
+ finally:
+ cluster.shutdown()
+
+This can be cumbersome, so the :class:`Rados`, :class:`Ioctx`, and
+:class:`Image` classes can be used as context managers that close/shutdown
+automatically (see :pep:`343`). Using them as context managers, the
+above example becomes::
+
+ with rados.Rados(conffile='my_ceph.conf') as cluster:
+ with cluster.open_ioctx('mypool') as ioctx:
+ rbd_inst = rbd.RBD()
+ size = 4 * 1024**3 # 4 GiB
+ rbd_inst.create(ioctx, 'myimage', size)
+ with rbd.Image(ioctx, 'myimage') as image:
+ data = b'foo' * 200
+ image.write(data, 0)
+
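+An :class:`RBD` instance can also be used to enumerate and delete images.
+A brief sketch, assuming the pool and image from the examples above::
+
+    with rados.Rados(conffile='my_ceph.conf') as cluster:
+        with cluster.open_ioctx('mypool') as ioctx:
+            rbd_inst = rbd.RBD()
+            print(rbd_inst.list(ioctx))        # e.g. ['myimage']
+            rbd_inst.remove(ioctx, 'myimage')  # delete the image
+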
+API Reference
+=============
+
+.. automodule:: rbd
+ :members: RBD, Image, SnapIterator
diff --git a/doc/rbd/disk.conf b/doc/rbd/disk.conf
new file mode 100644
index 000000000..3db9b8a11
--- /dev/null
+++ b/doc/rbd/disk.conf
@@ -0,0 +1,8 @@
+<disk type='network' device='disk'>
+ <source protocol='rbd' name='poolname/imagename'>
+ <host name='{fqdn}' port='6789'/>
+ <host name='{fqdn}' port='6790'/>
+ <host name='{fqdn}' port='6791'/>
+ </source>
+ <target dev='vda' bus='virtio'/>
+</disk>
diff --git a/doc/rbd/index.rst b/doc/rbd/index.rst
new file mode 100644
index 000000000..4a8029bba
--- /dev/null
+++ b/doc/rbd/index.rst
@@ -0,0 +1,72 @@
+===================
+ Ceph Block Device
+===================
+
+.. index:: Ceph Block Device; introduction
+
+A block is a sequence of bytes (often 512).
+Block-based storage interfaces are a mature and common way to store data on
+media including HDDs, SSDs, CDs, floppy disks, and even tape.
+The ubiquity of block device interfaces makes them a natural fit for
+interacting with mass data storage systems like Ceph.
+
+Ceph block devices are thin-provisioned, resizable, and store data striped over
+multiple OSDs. Ceph block devices leverage
+:abbr:`RADOS (Reliable Autonomic Distributed Object Store)` capabilities
+including snapshotting, replication and strong consistency. Ceph block
+storage clients communicate with Ceph clusters through kernel modules or
+the ``librbd`` library.
+
+.. ditaa::
+
+ +------------------------+ +------------------------+
+ | Kernel Module | | librbd |
+ +------------------------+-+------------------------+
+ | RADOS Protocol |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. note:: Kernel modules can use Linux page caching. For ``librbd``-based
+ applications, Ceph supports `RBD Caching`_.
+
+Ceph block devices deliver high performance with vast scalability to
+clients that use `kernel modules`_ or :abbr:`KVM (kernel virtual machine)`-based
+hypervisors such as `QEMU`_, and to cloud computing systems like `OpenStack`_
+and `CloudStack`_ that rely on libvirt and QEMU to integrate with Ceph block
+devices. You can use the same cluster to operate the
+:ref:`Ceph RADOS Gateway <object-gateway>`, the
+:ref:`Ceph File System <ceph-file-system>`, and Ceph block devices simultaneously.
+
+.. important:: To use Ceph Block Devices, you must have access to a running
+ Ceph cluster.
+
+.. toctree::
+ :maxdepth: 1
+
+ Basic Commands <rados-rbd-cmds>
+
+.. toctree::
+ :maxdepth: 2
+
+ Operations <rbd-operations>
+
+.. toctree::
+ :maxdepth: 2
+
+ Integrations <rbd-integrations>
+
+.. toctree::
+ :maxdepth: 2
+
+ Manpages <man/index>
+
+.. toctree::
+ :maxdepth: 2
+
+ APIs <api/index>
+
+.. _RBD Caching: ./rbd-config-ref/
+.. _kernel modules: ./rbd-ko/
+.. _QEMU: ./qemu-rbd/
+.. _OpenStack: ./rbd-openstack
+.. _CloudStack: ./rbd-cloudstack
diff --git a/doc/rbd/iscsi-initiator-esx.rst b/doc/rbd/iscsi-initiator-esx.rst
new file mode 100644
index 000000000..8bed6f2a2
--- /dev/null
+++ b/doc/rbd/iscsi-initiator-esx.rst
@@ -0,0 +1,105 @@
+------------------------------
+iSCSI Initiator for VMware ESX
+------------------------------
+
+**Prerequisite:**
+
+- VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6.
+
+**iSCSI Discovery and Multipath Device Setup:**
+
+The following instructions will use the default vSphere web client and esxcli.
+
+#. Enable Software iSCSI
+
+ .. image:: ../images/esx_web_client_storage_main.png
+ :align: center
+
+ Click on "Storage" from "Navigator", and select the "Adapters" tab.
+ From there right click "Configure iSCSI".
+
+#. Set Initiator Name
+
+ .. image:: ../images/esx_config_iscsi_main.png
+ :align: center
+
+ If the initiator name in the "Name & alias" section is not the same name
+ used when creating the client during gwcli setup or the initiator name used
+ in the ansible client_connections client variable, then ssh to the ESX
+ host and run the following esxcli commands to change the name.
+
+ Get the adapter name for Software iSCSI:
+
+ ::
+
+ > esxcli iscsi adapter list
+ > Adapter Driver State UID Description
+ > ------- --------- ------ ------------- ----------------------
+ > vmhba64 iscsi_vmk online iscsi.vmhba64 iSCSI Software Adapter
+
+ In this example the software iSCSI adapter is vmhba64 and the initiator
+ name is iqn.1994-05.com.redhat:rh7-client:
+
+ ::
+
+ > esxcli iscsi adapter set -A vmhba64 -n iqn.1994-05.com.redhat:rh7-client
+
+#. Setup CHAP
+
+ .. image:: ../images/esx_chap.png
+ :align: center
+
+ Expand the CHAP authentication section, select "Do not use CHAP unless
+ required by target" and enter the CHAP credentials used in the gwcli
+ auth command or ansible client_connections credentials variable.
+
+ The Mutual CHAP authentication section should have "Do not use CHAP"
+ selected.
+
+   Warning: There is a bug in the web client where the requested CHAP
+   settings are not always used initially. In that case, the iSCSI gateway's
+   kernel logs will show the error:
+
+ ::
+
+ > kernel: CHAP user or password not set for Initiator ACL
+ > kernel: Security negotiation failed.
+ > kernel: iSCSI Login negotiation failed.
+
+   To work around this, set the CHAP settings with the esxcli command. Here
+   ``authname`` is the username and ``secret`` is the password used in the
+   previous examples:
+
+ ::
+
+ > esxcli iscsi adapter auth chap set --direction=uni --authname=myiscsiusername --secret=myiscsipassword --level=discouraged -A vmhba64
+
+#. Configure iSCSI Settings
+
+ .. image:: ../images/esx_iscsi_recov_timeout.png
+ :align: center
+
+ Expand Advanced settings and set the "RecoveryTimeout" to 25.
+
+#. Set the discovery address
+
+ .. image:: ../images/esx_config_iscsi_main.png
+ :align: center
+
+ In the Dynamic targets section, click "Add dynamic target" and under
+ Addresses add one of the gateway IP addresses added during the iSCSI
+ gateway setup stage in the gwcli section or an IP set in the ansible
+   gateway_ip_list variable. Only one address needs to be added, as the gateways
+   have been set up so that all the iSCSI portals are returned during discovery.
+
+ Finally, click the "Save configuration" button. In the Devices tab, you
+ should see the RBD image.
+
+   The LUN should be automatically configured to use the ALUA SATP and
+   MRU PSP. Other SATPs and PSPs must not be used. This can be verified with
+   the esxcli command:
+
+ ::
+
+ > esxcli storage nmp path list -d eui.your_devices_id
+
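+If you prefer the command line, the "RecoveryTimeout" value from the
+"Configure iSCSI Settings" step can usually also be applied with esxcli (a
+sketch that assumes the ``vmhba64`` adapter from the earlier examples):
+
+::
+
+   > esxcli iscsi adapter param set --adapter=vmhba64 --key=RecoveryTimeout --value=25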
diff --git a/doc/rbd/iscsi-initiator-linux.rst b/doc/rbd/iscsi-initiator-linux.rst
new file mode 100644
index 000000000..bf8c930f3
--- /dev/null
+++ b/doc/rbd/iscsi-initiator-linux.rst
@@ -0,0 +1,119 @@
+-------------------------
+iSCSI Initiator for Linux
+-------------------------
+
+**Prerequisite:**
+
+- Package ``iscsi-initiator-utils``
+
+- Package ``device-mapper-multipath``
+
+**Installing:**
+
+Install the iSCSI initiator and multipath tools:
+
+.. prompt:: bash #
+
+ yum install iscsi-initiator-utils
+ yum install device-mapper-multipath
+
+**Configuring:**
+
+#. Create the default ``/etc/multipath.conf`` file and enable the
+ ``multipathd`` service:
+
+ .. prompt:: bash #
+
+ mpathconf --enable --with_multipathd y
+
+#. Add the following to the ``/etc/multipath.conf`` file:
+
+ ::
+
+ devices {
+ device {
+ vendor "LIO-ORG"
+ product "TCMU device"
+ hardware_handler "1 alua"
+ path_grouping_policy "failover"
+ path_selector "queue-length 0"
+ failback 60
+ path_checker tur
+ prio alua
+ prio_args exclusive_pref_bit
+ fast_io_fail_tmo 25
+ no_path_retry queue
+ }
+ }
+
+#. Reload the ``multipathd`` service:
+
+ .. prompt:: bash #
+
+ systemctl reload multipathd
+
+**iSCSI Discovery and Setup:**
+
+#. Enable CHAP authentication and provide the initiator CHAP username
+ and password by uncommenting and setting the following options in
+ the ``/etc/iscsi/iscsid.conf`` file:
+
+ ::
+
+ node.session.auth.authmethod = CHAP
+ node.session.auth.username = myusername
+ node.session.auth.password = mypassword
+
+ If you intend to use mutual (bidirectional) authentication, provide the
+ target CHAP username and password:
+
+ ::
+
+ node.session.auth.username_in = mytgtusername
+ node.session.auth.password_in = mytgtpassword
+
+#. Discover the target portals:
+
+ .. prompt:: bash #
+
+ iscsiadm -m discovery -t st -p 192.168.56.101
+
+ ::
+
+ 192.168.56.101:3260,1 iqn.2003-01.org.linux-iscsi.rheln1
+ 192.168.56.102:3260,2 iqn.2003-01.org.linux-iscsi.rheln1
+
+#. Log in to the target:
+
+ .. prompt:: bash #
+
+ iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.rheln1 -l
+
+**Multipath IO Setup:**
+
+#. The multipath daemon (``multipathd``) uses the ``multipath.conf`` settings
+ to set up devices automatically. Running the ``multipath`` command shows
+ that the devices have been set up in a failover configuration. Notice that
+ each path has been placed into its own priority group:
+
+ .. prompt:: bash #
+
+ multipath -ll
+
+ ::
+
+ mpathbt (360014059ca317516a69465c883a29603) dm-1 LIO-ORG ,IBLOCK
+ size=1.0G features='0' hwhandler='1 alua' wp=rw
+ |-+- policy='queue-length 0' prio=50 status=active
+ | `- 28:0:0:1 sde 8:64 active ready running
+ `-+- policy='queue-length 0' prio=10 status=enabled
+ `- 29:0:0:1 sdc 8:32 active ready running
+
+ You should now be able to use the RBD image in the same way that you would
+ use a normal multipath iSCSI disk.
+
+#. Log out of the target:
+
+ .. prompt:: bash #
+
+ iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.rheln1 -u
diff --git a/doc/rbd/iscsi-initiator-win.rst b/doc/rbd/iscsi-initiator-win.rst
new file mode 100644
index 000000000..7816059bc
--- /dev/null
+++ b/doc/rbd/iscsi-initiator-win.rst
@@ -0,0 +1,102 @@
+-------------------------------------
+iSCSI Initiator for Microsoft Windows
+-------------------------------------
+
+**Prerequisite:**
+
+- Microsoft Windows Server 2016 or later
+
+**iSCSI Initiator, Discovery and Setup:**
+
+#. Install the iSCSI initiator driver and MPIO tools.
+
+#. Launch the MPIO program, click on the "Discover Multi-Paths" tab, check the
+   "Add support for iSCSI devices" box, and click "Add". This will require a
+   reboot.
+
+#. On the iSCSI Initiator Properties window, on the "Discovery" tab, add a target
+   portal. Enter the IP address or DNS name and Port of the Ceph iSCSI gateway.
+
+#. On the "Targets" tab, select the target and click on "Connect".
+
+#. On the "Connect To Target" window, select the "Enable multi-path" option, and
+   click the "Advanced" button.
+
+#. Under the "Connect using" section, select a "Target portal IP". Select the
+   "Enable CHAP login on" option and enter the "Name" and "Target secret" values
+   from the Ceph iSCSI Ansible client credentials section, and click OK.
+
+#. Repeat steps 5 and 6 for each target portal defined when setting up
+ the iSCSI gateway.
+
+**Multipath IO Setup:**
+
+Configuring the MPIO load balancing policy and setting the timeout and
+retry options are done in PowerShell with the ``mpclaim`` command. The
+rest is done in the iSCSI Initiator tool.
+
+.. note::
+ It is recommended to increase the ``PDORemovePeriod`` option to 120
+ seconds from PowerShell. This value might need to be adjusted based
+ on the application. When all paths are down, and 120 seconds
+ expires, the operating system will start failing IO requests.
+
+::
+
+ Set-MPIOSetting -NewPDORemovePeriod 120
+
+::
+
+ mpclaim.exe -l -m 1
+
+::
+
+ mpclaim -s -m
+ MSDSM-wide Load Balance Policy: Fail Over Only
+
+#. Using the iSCSI Initiator tool, from the "Targets" tab, click on
+   the "Devices..." button.
+
+#. From the Devices window, select a disk and click the
+   "MPIO..." button.
+
+#. On the "Device Details" window the paths to each target portal are
+   displayed. If using the ``ceph-ansible`` setup method, the
+   iSCSI gateway will use ALUA to tell the iSCSI initiator which path
+   and iSCSI gateway should be used as the primary path. The Load
+   Balancing Policy "Fail Over Only" must be selected.
+
+::
+
+ mpclaim -s -d $MPIO_DISK_ID
+
+.. note::
+ For the ``ceph-ansible`` setup method, there will be one
+ Active/Optimized path which is the path to the iSCSI gateway node
+ that owns the LUN, and there will be an Active/Unoptimized path for
+ each other iSCSI gateway node.
+
+**Tuning:**
+
+Consider using the following registry settings:
+
+- Windows Disk Timeout
+
+ ::
+
+ HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk
+
+ ::
+
+ TimeOutValue = 65
+
+- Microsoft iSCSI Initiator Driver
+
+ ::
+
+      HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance_Number>\Parameters
+
+ ::
+
+ LinkDownTime = 25
+ SRBTimeoutDelta = 15
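+
+These registry values can be applied from an elevated PowerShell prompt. The
+following is a sketch for the disk timeout shown above; the iSCSI initiator
+driver values live under the instance-numbered key listed above and must be
+set under that specific key:
+
+::
+
+    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Disk" -Name "TimeOutValue" -Value 65 -Type DWord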
diff --git a/doc/rbd/iscsi-initiators.rst b/doc/rbd/iscsi-initiators.rst
new file mode 100644
index 000000000..b24338952
--- /dev/null
+++ b/doc/rbd/iscsi-initiators.rst
@@ -0,0 +1,25 @@
+.. _configuring-the-iscsi-initiators:
+
+--------------------------------
+Configuring the iSCSI Initiators
+--------------------------------
+
+- `iSCSI Initiator for Linux <../iscsi-initiator-linux>`_
+
+- `iSCSI Initiator for Microsoft Windows <../iscsi-initiator-win>`_
+
+- `iSCSI Initiator for VMware ESX <../iscsi-initiator-esx>`_
+
+ .. warning::
+
+     Applications that use SCSI persistent group reservations (PGR) and
+     SCSI-2 based reservations are not supported when exporting an RBD image
+     through more than one iSCSI gateway.
+
+.. toctree::
+ :maxdepth: 1
+ :hidden:
+
+ Linux <iscsi-initiator-linux>
+ Microsoft Windows <iscsi-initiator-win>
+ VMware ESX <iscsi-initiator-esx>
diff --git a/doc/rbd/iscsi-monitoring.rst b/doc/rbd/iscsi-monitoring.rst
new file mode 100644
index 000000000..a36cc3cdb
--- /dev/null
+++ b/doc/rbd/iscsi-monitoring.rst
@@ -0,0 +1,85 @@
+------------------------------
+Monitoring Ceph iSCSI gateways
+------------------------------
+
+Ceph provides a tool for iSCSI gateway environments
+to monitor performance of exported RADOS Block Device (RBD) images.
+
+The ``gwtop`` tool is a ``top``-like tool that displays aggregated
+performance metrics of RBD images that are exported to clients over
+iSCSI. The metrics are sourced from a Performance Metrics Domain Agent
+(PMDA). Information from the Linux-IO target (LIO) PMDA is used to list
+each exported RBD image, the connected client, and its associated I/O
+metrics.
+
+**Requirements:**
+
+- A running Ceph iSCSI gateway
+
+**Installing:**
+
+#. As ``root``, install the ``ceph-iscsi-tools`` package on each iSCSI
+ gateway node:
+
+ .. prompt:: bash #
+
+ yum install ceph-iscsi-tools
+
+#. As ``root``, install the performance co-pilot package on each iSCSI
+ gateway node:
+
+ .. prompt:: bash #
+
+ yum install pcp
+
+#. As ``root``, install the LIO PMDA package on each iSCSI gateway node:
+
+ .. prompt:: bash #
+
+ yum install pcp-pmda-lio
+
+#. As ``root``, enable and start the performance co-pilot service on
+ each iSCSI gateway node:
+
+ .. prompt:: bash #
+
+ systemctl enable pmcd
+ systemctl start pmcd
+
+#. As ``root``, register the ``pcp-pmda-lio`` agent:
+
+ .. prompt:: bash #
+
+ cd /var/lib/pcp/pmdas/lio
+ ./Install
+
+By default, ``gwtop`` assumes the iSCSI gateway configuration object is
+stored in a RADOS object called ``gateway.conf`` in the ``rbd`` pool.
+This configuration defines the iSCSI gateways to contact for gathering
+the performance statistics. This can be overridden by using either the
+``-g`` or ``-c`` flags. See ``gwtop --help`` for more details.
+
+The LIO configuration determines which type of performance statistics to
+extract from performance co-pilot. When ``gwtop`` starts, it looks at the
+LIO configuration, and if it finds user-space disks, ``gwtop``
+selects the LIO collector automatically.
+
+**Example gwtop Output:**
+
+::
+
+ gwtop 2/2 Gateways CPU% MIN: 4 MAX: 5 Network Total In: 2M Out: 3M 10:20:00
+ Capacity: 8G Disks: 8 IOPS: 503 Clients: 1 Ceph: HEALTH_OK OSDs: 3
+ Pool.Image Src Size iops rMB/s wMB/s Client
+ iscsi.t1703 500M 0 0.00 0.00
+ iscsi.testme1 500M 0 0.00 0.00
+ iscsi.testme2 500M 0 0.00 0.00
+ iscsi.testme3 500M 0 0.00 0.00
+ iscsi.testme5 500M 0 0.00 0.00
+ rbd.myhost_1 T 4G 504 1.95 0.00 rh460p(CON)
+ rbd.test_2 1G 0 0.00 0.00
+ rbd.testme 500M 0 0.00 0.00
+
+In the *Client* column, ``(CON)`` means the iSCSI initiator (client) is
+currently logged into the iSCSI gateway. If ``-multi-`` is displayed,
+then multiple clients are mapped to that single RBD image.
diff --git a/doc/rbd/iscsi-overview.rst b/doc/rbd/iscsi-overview.rst
new file mode 100644
index 000000000..879083c3f
--- /dev/null
+++ b/doc/rbd/iscsi-overview.rst
@@ -0,0 +1,57 @@
+.. _ceph-iscsi:
+
+==================
+Ceph iSCSI Gateway
+==================
+
+The iSCSI Gateway presents a Highly Available (HA) iSCSI target that exports
+RADOS Block Device (RBD) images as SCSI disks. The iSCSI protocol allows
+clients (initiators) to send SCSI commands to storage devices (targets) over a
+TCP/IP network, enabling clients without native Ceph client support to access
+Ceph block storage.
+
+Each iSCSI gateway exploits the Linux IO target kernel subsystem (LIO) to
+provide iSCSI protocol support. LIO utilizes userspace passthrough (TCMU) to
+interact with Ceph's librbd library and expose RBD images to iSCSI clients.
+With Ceph’s iSCSI gateway you can provision a fully integrated block-storage
+infrastructure with all the features and benefits of a conventional Storage
+Area Network (SAN).
+
+.. ditaa::
+ Cluster Network (optional)
+ +-------------------------------------------+
+ | | | |
+ +-------+ +-------+ +-------+ +-------+
+ | | | | | | | |
+ | OSD 1 | | OSD 2 | | OSD 3 | | OSD N |
+ | {s}| | {s}| | {s}| | {s}|
+ +-------+ +-------+ +-------+ +-------+
+ | | | |
+ +--------->| | +---------+ | |<---------+
+ : | | | RBD | | | :
+ | +----------------| Image |----------------+ |
+ | Public Network | {d} | |
+ | +---------+ |
+ | |
+ | +-------------------+ |
+ | +--------------+ | iSCSI Initiators | +--------------+ |
+ | | iSCSI GW | | +-----------+ | | iSCSI GW | |
+ +-->| RBD Module |<--+ | Various | +-->| RBD Module |<--+
+ | | | | Operating | | | |
+ +--------------+ | | Systems | | +--------------+
+ | +-----------+ |
+ +-------------------+
+
+.. warning::
+
+ The iSCSI gateway is in maintenance as of November 2022. This means that
+ it is no longer in active development and will not be updated to add
+ new features.
+
+.. toctree::
+ :maxdepth: 1
+
+ Requirements <iscsi-requirements>
+ Configuring the iSCSI Target <iscsi-targets>
+ Configuring the iSCSI Initiators <iscsi-initiators>
+ Monitoring the iSCSI Gateways <iscsi-monitoring>
diff --git a/doc/rbd/iscsi-requirements.rst b/doc/rbd/iscsi-requirements.rst
new file mode 100644
index 000000000..50dfc2a27
--- /dev/null
+++ b/doc/rbd/iscsi-requirements.rst
@@ -0,0 +1,51 @@
+==========================
+iSCSI Gateway Requirements
+==========================
+
+It is recommended to provision two to four iSCSI gateway nodes to
+realize a highly available Ceph iSCSI gateway solution.
+
+For hardware recommendations, see :ref:`hardware-recommendations` .
+
+.. note::
+   On iSCSI gateway nodes the memory footprint is a function of
+   the number of RBD images mapped and can grow to be large. Plan memory
+   requirements accordingly based on the number of RBD images to be mapped.
+
+There are no specific iSCSI gateway options for the Ceph Monitors or
+OSDs, but it is important to lower the default heartbeat interval for
+detecting down OSDs to reduce the possibility of initiator timeouts.
+The following configuration options are suggested::
+
+ [osd]
+ osd heartbeat grace = 20
+ osd heartbeat interval = 5
+
+- Updating Running State From a Ceph Monitor Node
+
+ ::
+
+ ceph tell <daemon_type>.<id> config set <parameter_name> <new_value>
+
+ ::
+
+ ceph tell osd.* config set osd_heartbeat_grace 20
+ ceph tell osd.* config set osd_heartbeat_interval 5
+
+- Updating Running State On Each OSD Node
+
+ ::
+
+ ceph daemon <daemon_type>.<id> config set osd_client_watch_timeout 15
+
+ ::
+
+ ceph daemon osd.0 config set osd_heartbeat_grace 20
+ ceph daemon osd.0 config set osd_heartbeat_interval 5
+
+For more details on setting Ceph's configuration options, see
+:ref:`configuring-ceph`. Be sure to persist these settings in
+``/etc/ceph/ceph.conf`` or, on Mimic and later releases, in the
+centralized config store.
+
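+For example, the suggested values can be persisted in the centralized config
+store as follows (a sketch; adjust the values to your environment)::
+
+    ceph config set osd osd_heartbeat_grace 20
+    ceph config set osd osd_heartbeat_interval 5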
+
diff --git a/doc/rbd/iscsi-target-ansible.rst b/doc/rbd/iscsi-target-ansible.rst
new file mode 100644
index 000000000..f89c4a0d2
--- /dev/null
+++ b/doc/rbd/iscsi-target-ansible.rst
@@ -0,0 +1,236 @@
+==========================================
+Configuring the iSCSI Target using Ansible
+==========================================
+
+The Ceph iSCSI gateway is the iSCSI target node and also a Ceph client
+node. The Ceph iSCSI gateway can be provisioned on a dedicated node
+or be colocated on a Ceph Object Store Disk (OSD) node. The following steps will
+install and configure the Ceph iSCSI gateway for basic operation.
+
+**Requirements:**
+
+- A running Ceph Luminous (12.2.x) cluster or newer
+
+- Red Hat Enterprise Linux/CentOS 7.5 (or newer); Linux kernel v4.16 (or newer)
+
+- The ``ceph-iscsi`` package installed on all the iSCSI gateway nodes
+
+**Installation:**
+
+#. On the Ansible installer node, which could be either the administration node
+ or a dedicated deployment node, perform the following steps:
+
+ #. As ``root``, install the ``ceph-ansible`` package:
+
+ ::
+
+ # yum install ceph-ansible
+
+ #. Add an entry in ``/etc/ansible/hosts`` file for the gateway group:
+
+ ::
+
+ [iscsigws]
+ ceph-igw-1
+ ceph-igw-2
+
+.. note::
+ If co-locating the iSCSI gateway with an OSD node, then add the OSD node to the
+ ``[iscsigws]`` section.
+
+**Configuration:**
+
+The ``ceph-ansible`` package places a file in the ``/usr/share/ceph-ansible/group_vars/``
+directory called ``iscsigws.yml.sample``. Create a copy of this sample file named
+``iscsigws.yml``. Review the following Ansible variables and descriptions,
+and update accordingly. See the ``iscsigws.yml.sample`` for a full list of
+advanced variables.
+
++--------------------------------------+--------------------------------------+
+| Variable | Meaning/Purpose |
++======================================+======================================+
+| ``seed_monitor`` | Each gateway needs access to the |
+| | ceph cluster for rados and rbd |
+| | calls. This means the iSCSI gateway |
+| | must have an appropriate |
+| | ``/etc/ceph/`` directory defined. |
+| | The ``seed_monitor`` host is used to |
+| | populate the iSCSI gateway’s |
+| | ``/etc/ceph/`` directory. |
++--------------------------------------+--------------------------------------+
+| ``cluster_name`` | Define a custom storage cluster |
+| | name. |
++--------------------------------------+--------------------------------------+
+| ``gateway_keyring`` | Define a custom keyring name. |
++--------------------------------------+--------------------------------------+
+| ``deploy_settings`` | If set to ``true``, then deploy the |
+|                                      | settings when the playbook is run.   |
++--------------------------------------+--------------------------------------+
+| ``perform_system_checks`` | This is a boolean value that checks |
+| | for multipath and lvm configuration |
+| | settings on each gateway. It must be |
+| | set to true for at least the first |
+| | run to ensure multipathd and lvm are |
+| | configured properly. |
++--------------------------------------+--------------------------------------+
+| ``api_user`` | The user name for the API. The |
+| | default is `admin`. |
++--------------------------------------+--------------------------------------+
+| ``api_password`` | The password for using the API. The |
+| | default is `admin`. |
++--------------------------------------+--------------------------------------+
+| ``api_port`` | The TCP port number for using the |
+| | API. The default is `5000`. |
++--------------------------------------+--------------------------------------+
+| ``api_secure`` | True if TLS must be used. The |
+| | default is `false`. If true the user |
+| | must create the necessary |
+| | certificate and key files. See the |
+| | gwcli man file for details. |
++--------------------------------------+--------------------------------------+
+| ``trusted_ip_list`` | A list of IPv4 or IPv6 addresses |
+|                                      | that have access to the API. By      |
+| | default, only the iSCSI gateway |
+| | nodes have access. |
++--------------------------------------+--------------------------------------+
+
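+For reference, a minimal ``iscsigws.yml`` built from the variables above might
+look like the following (host names and addresses are placeholders; consult
+``iscsigws.yml.sample`` for the authoritative format and defaults)::
+
+    seed_monitor: mon-1
+    cluster_name: ceph
+    gateway_keyring: ceph.client.admin.keyring
+    deploy_settings: true
+    perform_system_checks: true
+    api_user: admin
+    api_password: admin
+    api_port: 5000
+    api_secure: false
+    trusted_ip_list: "192.168.122.121,192.168.122.122"
+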
+**Deployment:**
+
+Perform the following steps on the Ansible installer node.
+
+#. As ``root``, execute the Ansible playbook:
+
+ .. prompt:: bash #
+
+ cd /usr/share/ceph-ansible
+ ansible-playbook site.yml --limit iscsigws
+
+ .. note::
+ The Ansible playbook will handle RPM dependencies, setting up daemons,
+ and installing gwcli so it can be used to create iSCSI targets and export
+ RBD images as LUNs. In past versions, ``iscsigws.yml`` could define the
+ iSCSI target and other objects like clients, images and LUNs, but this is
+ no longer supported.
+
+#. Verify the configuration from an iSCSI gateway node:
+
+ .. prompt:: bash #
+
+ gwcli ls
+
+ .. note::
+ See the `Configuring the iSCSI Target using the Command Line Interface`_
+ section to create gateways, LUNs, and clients using the `gwcli` tool.
+
+ .. important::
+ Attempting to use the ``targetcli`` tool to change the configuration will
+ cause problems including ALUA misconfiguration and path failover
+ issues. There is the potential to corrupt data, to have mismatched
+ configuration across iSCSI gateways, and to have mismatched WWN information,
+ leading to client multipath problems.
+
+**Service Management:**
+
+The ``ceph-iscsi`` package installs the configuration management
+logic and a Systemd service called ``rbd-target-api``. When the Systemd
+service is enabled, the ``rbd-target-api`` will start at boot time and
+will restore the Linux IO state. The Ansible playbook disables the
+target service during the deployment. Below are the outcomes of
+interacting with the ``rbd-target-api`` Systemd service.
+
+.. prompt:: bash #
+
+ systemctl <start|stop|restart|reload> rbd-target-api
+
+- ``reload``
+
+ A reload request will force ``rbd-target-api`` to reread the
+ configuration and apply it to the current running environment. This
+ is normally not required, since changes are deployed in parallel from
+  Ansible to all iSCSI gateway nodes.
+
+- ``stop``
+
+  A stop request will close the gateway’s portal interfaces, drop
+  connections to clients, and wipe the current LIO configuration from
+  the kernel. This returns the iSCSI gateway to a clean state. When
+ clients are disconnected, active I/O is rescheduled to the other
+ iSCSI gateways by the client side multipathing layer.
+
+**Removing the Configuration:**
+
+The ``ceph-ansible`` package provides an Ansible playbook to
+remove the iSCSI gateway configuration and related RBD images. The
+Ansible playbook is ``/usr/share/ceph-ansible/purge_gateways.yml``. When
+this Ansible playbook is run, you are prompted for the type of purge to
+perform:
+
+*lio* :
+
+In this mode the LIO configuration is purged on all iSCSI gateways that
+are defined. Disks that were created are left untouched within the Ceph
+storage cluster.
+
+*all* :
+
+When ``all`` is chosen, the LIO configuration is removed together with
+**all** RBD images that were defined within the iSCSI gateway
+environment; other unrelated RBD images will not be removed. Ensure the
+correct mode is chosen, as this operation will delete data.
+
+.. warning::
+   A purge operation is a destructive action against your iSCSI gateway
+   environment.
+
+.. warning::
+   A purge operation will fail if RBD images have snapshots or clones and
+   are exported through the Ceph iSCSI gateway.
+
+.. highlight:: console
+
+::
+
+ [root@rh7-iscsi-client ceph-ansible]# ansible-playbook purge_gateways.yml
+ Which configuration elements should be purged? (all, lio or abort) [abort]: all
+
+
+ PLAY [Confirm removal of the iSCSI gateway configuration] *********************
+
+
+ GATHERING FACTS ***************************************************************
+ ok: [localhost]
+
+
+ TASK: [Exit playbook if user aborted the purge] *******************************
+ skipping: [localhost]
+
+
+ TASK: [set_fact ] *************************************************************
+ ok: [localhost]
+
+
+ PLAY [Removing the gateway configuration] *************************************
+
+
+ GATHERING FACTS ***************************************************************
+ ok: [ceph-igw-1]
+ ok: [ceph-igw-2]
+
+
+ TASK: [igw_purge | purging the gateway configuration] *************************
+ changed: [ceph-igw-1]
+ changed: [ceph-igw-2]
+
+
+ TASK: [igw_purge | deleting configured rbd devices] ***************************
+ changed: [ceph-igw-1]
+ changed: [ceph-igw-2]
+
+
+ PLAY RECAP ********************************************************************
+ ceph-igw-1 : ok=3 changed=2 unreachable=0 failed=0
+ ceph-igw-2 : ok=3 changed=2 unreachable=0 failed=0
+ localhost : ok=2 changed=0 unreachable=0 failed=0
+
+
+.. _Configuring the iSCSI Target using the Command Line Interface: ../iscsi-target-cli
diff --git a/doc/rbd/iscsi-target-cli-manual-install.rst b/doc/rbd/iscsi-target-cli-manual-install.rst
new file mode 100644
index 000000000..005f8aa94
--- /dev/null
+++ b/doc/rbd/iscsi-target-cli-manual-install.rst
@@ -0,0 +1,190 @@
+==============================
+Manual ceph-iscsi Installation
+==============================
+
+**Requirements**
+
+To complete the installation of ceph-iscsi, there are 4 steps:
+
+1. Install common packages from your Linux distribution's software repository
+2. Install Git to fetch the remaining packages directly from their Git repositories
+3. Ensure a compatible kernel is used
+4. Install all the components of ceph-iscsi and start associated daemons:
+
+ - tcmu-runner
+ - rtslib-fb
+ - configshell-fb
+ - targetcli-fb
+ - ceph-iscsi
+
+
+1. Install Common Packages
+==========================
+
+The following packages will be used by ceph-iscsi and target tools.
+They must be installed from your Linux distribution's software repository
+on each machine that will be an iSCSI gateway:
+
+- libnl3
+- libkmod
+- librbd1
+- pyparsing
+- python kmod
+- python pyudev
+- python gobject
+- python urwid
+- python pyparsing
+- python rados
+- python rbd
+- python netifaces
+- python crypto
+- python requests
+- python flask
+- pyOpenSSL
+
+
+2. Install Git
+==============
+
+In order to install all the packages needed to run iSCSI with Ceph, you need to download them directly from their repository by using Git.
+On CentOS/RHEL execute:
+
+.. prompt:: bash >
+
+ sudo yum install git
+
+On Debian/Ubuntu execute:
+
+.. prompt:: bash >
+
+ sudo apt install git
+
+To learn more about Git and how it works, visit https://git-scm.com.
+
+
+3. Ensure a compatible kernel is used
+=====================================
+
+Ensure you use a supported kernel that contains the required Ceph iSCSI patches:
+
+- any Linux distribution with a kernel v4.16 or newer, or
+- Red Hat Enterprise Linux or CentOS 7.5 or later (in these distributions ceph-iscsi support is backported)
+
+If you are already using a compatible kernel, you can go to the next step.
+However, if you are NOT using a compatible kernel, check your distribution's
+documentation for specific instructions on how to build this kernel. The only
+Ceph iSCSI specific requirements are that the following build options must be
+enabled:
+
+ .. code-block:: ini
+
+ CONFIG_TARGET_CORE=m
+ CONFIG_TCM_USER2=m
+ CONFIG_ISCSI_TARGET=m
+
+
+4. Install ceph-iscsi
+========================================================
+
+Finally, the remaining tools can be fetched directly from their Git repositories and their associated services started.
+
+
+tcmu-runner
+-----------
+
+ Installation:
+
+ .. prompt:: bash >
+
+ git clone https://github.com/open-iscsi/tcmu-runner
+ cd tcmu-runner
+
+ Run the following command to install all the needed dependencies:
+
+ .. prompt:: bash >
+
+ ./extra/install_dep.sh
+
+ Now you can build the tcmu-runner.
+ To do so, use the following build command:
+
+ .. prompt:: bash >
+
+ cmake -Dwith-glfs=false -Dwith-qcow=false -DSUPPORT_SYSTEMD=ON -DCMAKE_INSTALL_PREFIX=/usr
+ make install
+
+ Enable and start the daemon:
+
+ .. prompt:: bash >
+
+ systemctl daemon-reload
+ systemctl enable tcmu-runner
+ systemctl start tcmu-runner
+
+
+rtslib-fb
+---------
+
+ Installation:
+
+ .. prompt:: bash >
+
+ git clone https://github.com/open-iscsi/rtslib-fb.git
+ cd rtslib-fb
+ python setup.py install
+
+configshell-fb
+--------------
+
+ Installation:
+
+ .. prompt:: bash >
+
+ git clone https://github.com/open-iscsi/configshell-fb.git
+ cd configshell-fb
+ python setup.py install
+
+targetcli-fb
+------------
+
+ Installation:
+
+ .. prompt:: bash >
+
+ git clone https://github.com/open-iscsi/targetcli-fb.git
+ cd targetcli-fb
+ python setup.py install
+ mkdir /etc/target
+ mkdir /var/target
+
+ .. warning:: The ceph-iscsi tools assume they are managing all targets
+     on the system. If targets have been set up and are being managed by
+     targetcli, the target service must be disabled.
+
+ceph-iscsi
+-----------------
+
+ Installation:
+
+ .. prompt:: bash >
+
+ git clone https://github.com/ceph/ceph-iscsi.git
+ cd ceph-iscsi
+ python setup.py install --install-scripts=/usr/bin
+ cp usr/lib/systemd/system/rbd-target-gw.service /lib/systemd/system
+ cp usr/lib/systemd/system/rbd-target-api.service /lib/systemd/system
+
+ Enable and start the daemon:
+
+ .. prompt:: bash >
+
+ systemctl daemon-reload
+ systemctl enable rbd-target-gw
+ systemctl start rbd-target-gw
+ systemctl enable rbd-target-api
+ systemctl start rbd-target-api
+
+Installation is complete. Proceed to the setup section in the
+`main ceph-iscsi CLI page`_.
+
+.. _`main ceph-iscsi CLI page`: ../iscsi-target-cli
diff --git a/doc/rbd/iscsi-target-cli.rst b/doc/rbd/iscsi-target-cli.rst
new file mode 100644
index 000000000..44da56000
--- /dev/null
+++ b/doc/rbd/iscsi-target-cli.rst
@@ -0,0 +1,266 @@
+=============================================================
+Configuring the iSCSI Target using the Command Line Interface
+=============================================================
+
+The Ceph iSCSI gateway is both an iSCSI target and a Ceph client;
+think of it as a "translator" between Ceph's RBD interface
+and the iSCSI standard. The Ceph iSCSI gateway can run on a
+standalone node or be colocated with other daemons, e.g. on
+a Ceph Object Store Disk (OSD) node. When colocating, ensure
+that sufficient CPU and memory are available to share.
+The following steps install and configure the Ceph iSCSI gateway for basic operation.
+
+**Requirements:**
+
+- A running Ceph Luminous or later storage cluster
+
+- Red Hat Enterprise Linux/CentOS 7.5 (or newer); Linux kernel v4.16 (or newer)
+
+- The following packages must be installed from your Linux distribution's software repository:
+
+ - ``targetcli-2.1.fb47`` or newer package
+
+ - ``python-rtslib-2.1.fb68`` or newer package
+
+ - ``tcmu-runner-1.4.0`` or newer package
+
+ - ``ceph-iscsi-3.2`` or newer package
+
+ .. important::
+ If previous versions of these packages exist, then they must
+ be removed first before installing the newer versions.
+
+Do the following steps on the Ceph iSCSI gateway node before proceeding
+to the *Installing* section:
+
+#. If the Ceph iSCSI gateway is not colocated on an OSD node, then copy
+ the Ceph configuration files, located in ``/etc/ceph/``, from a
+ running Ceph node in the storage cluster to the iSCSI Gateway node.
+ The Ceph configuration files must exist on the iSCSI gateway node
+ under ``/etc/ceph/``.
+
+#. Install and configure the `Ceph Command-line Interface`_
+
+#. If needed, open TCP ports 3260 and 5000 on the firewall.
+
+ .. note::
+ Access to port 5000 should be restricted to a trusted internal network or
+ only the individual hosts where ``gwcli`` is used or ``ceph-mgr`` daemons
+ are running.
+
+#. Create a new RADOS Block Device (RBD) image or use an existing one.
+
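+On systems that use ``firewalld``, the ports mentioned in the firewall step
+above can be opened with, for example (a sketch; adjust it to your
+environment):
+
+.. prompt:: bash #
+
+   firewall-cmd --permanent --add-port=3260/tcp
+   firewall-cmd --permanent --add-port=5000/tcp
+   firewall-cmd --reload
+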
+**Installing:**
+
+If you are using the upstream ceph-iscsi package follow the
+`manual install instructions`_.
+
+.. _`manual install instructions`: ../iscsi-target-cli-manual-install
+
+.. toctree::
+ :hidden:
+
+ iscsi-target-cli-manual-install
+
+For RPM-based installations, execute the following commands:
+
+#. As ``root``, on all iSCSI gateway nodes, install the
+ ``ceph-iscsi`` package:
+
+ .. prompt:: bash #
+
+ yum install ceph-iscsi
+
+#. As ``root``, on all iSCSI gateway nodes, install the ``tcmu-runner``
+ package:
+
+ .. prompt:: bash #
+
+ yum install tcmu-runner
+
+**Setup:**
+
+#. gwcli requires a pool with the name ``rbd``, so it can store metadata
+   like the iSCSI configuration. To check whether this pool has been created,
+   run:
+
+ .. prompt:: bash #
+
+ ceph osd lspools
+
+   If it does not exist, instructions for creating pools can be found on the
+ `RADOS pool operations page
+ <http://docs.ceph.com/en/latest/rados/operations/pools/>`_.
+
+#. As ``root``, on an iSCSI gateway node, create a file named
+ ``iscsi-gateway.cfg`` in the ``/etc/ceph/`` directory:
+
+ .. prompt:: bash #
+
+ touch /etc/ceph/iscsi-gateway.cfg
+
+ #. Edit the ``iscsi-gateway.cfg`` file and add the following lines:
+
+ .. code-block:: ini
+
+ [config]
+ # Name of the Ceph storage cluster. A suitable Ceph configuration file allowing
+ # access to the Ceph storage cluster from the gateway node is required, if not
+ # colocated on an OSD node.
+ cluster_name = ceph
+
+ # Place a copy of the ceph cluster's admin keyring in the gateway's /etc/ceph
+ # directory and reference the filename here
+ gateway_keyring = ceph.client.admin.keyring
+
+
+ # API settings.
+ # The API supports a number of options that allow you to tailor it to your
+ # local environment. If you want to run the API under https, you will need to
+ # create cert/key files that are compatible for each iSCSI gateway node, that is
+ # not locked to a specific node. SSL cert and key files *must* be called
+ # 'iscsi-gateway.crt' and 'iscsi-gateway.key' and placed in the '/etc/ceph/' directory
+ # on *each* gateway node. With the SSL files in place, you can use 'api_secure = true'
+ # to switch to https mode.
+
+ # To support the API, the bare minimum settings are:
+ api_secure = false
+
+ # Additional API configuration options are as follows, defaults shown.
+ # api_user = admin
+ # api_password = admin
+ # api_port = 5001
+ # trusted_ip_list = 192.168.0.10,192.168.0.11
+
+ .. note::
+ trusted_ip_list is a list of IP addresses on each iSCSI gateway that
+ will be used for management operations like target creation, LUN
+ exporting, etc. The IP can be the same that will be used for iSCSI
+ data, like READ/WRITE commands to/from the RBD image, but using
+ separate IPs is recommended.
+
+ .. important::
+ The ``iscsi-gateway.cfg`` file must be identical on all iSCSI gateway nodes.
+
+ #. As ``root``, copy the ``iscsi-gateway.cfg`` file to all iSCSI
+ gateway nodes.
+
+#. As ``root``, on all iSCSI gateway nodes, enable and start the API
+ service:
+
+ .. prompt:: bash #
+
+ systemctl daemon-reload
+
+ systemctl enable rbd-target-gw
+ systemctl start rbd-target-gw
+
+ systemctl enable rbd-target-api
+ systemctl start rbd-target-api
+
+
+**Configuring:**
+
+``gwcli`` will create and configure the iSCSI target and RBD images and copy the
+configuration across the gateways set up in the last section. Lower level
+tools, including ``targetcli`` and ``rbd``, can be used to query the local
+configuration, but should not be used to modify it. The next section
+demonstrates how to create an iSCSI target and export an RBD image as LUN 0.
+
+#. As ``root``, on an iSCSI gateway node, start the iSCSI gateway
+ command-line interface:
+
+ .. prompt:: bash #
+
+ gwcli
+
+#. Go to iscsi-targets and create a target with the name
+ iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw:
+
+ .. code-block:: console
+
+ > /> cd /iscsi-targets
+ > /iscsi-targets> create iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
+
+#. Create the iSCSI gateways. The IPs used below are the ones that will be
+ used for iSCSI data like READ and WRITE commands. They can be the
+ same IPs used for management operations listed in trusted_ip_list,
+ but it is recommended that different IPs are used.
+
+ .. code-block:: console
+
+ > /iscsi-targets> cd iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/gateways
+ > /iscsi-target...-igw/gateways> create ceph-gw-1 10.172.19.21
+ > /iscsi-target...-igw/gateways> create ceph-gw-2 10.172.19.22
+
+ If not using RHEL/CentOS or using an upstream or ceph-iscsi-test kernel,
+ the skipchecks=true argument must be used. This will avoid the Red Hat kernel
+ and rpm checks:
+
+ .. code-block:: console
+
+ > /iscsi-targets> cd iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/gateways
+ > /iscsi-target...-igw/gateways> create ceph-gw-1 10.172.19.21 skipchecks=true
+ > /iscsi-target...-igw/gateways> create ceph-gw-2 10.172.19.22 skipchecks=true
+
+#. Add an RBD image with the name disk_1 in the pool rbd:
+
+ .. code-block:: console
+
+ > /iscsi-target...-igw/gateways> cd /disks
+ > /disks> create pool=rbd image=disk_1 size=90G
+
+#. Create a client with the initiator name iqn.1994-05.com.redhat:rh7-client:
+
+ .. code-block:: console
+
+ > /disks> cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/hosts
+ > /iscsi-target...eph-igw/hosts> create iqn.1994-05.com.redhat:rh7-client
+
+#. Set the initiator CHAP username and password which the target would
+ use when authenticating the initiator:
+
+ .. code-block:: console
+
+ > /iscsi-target...at:rh7-client> auth username=myusername password=mypassword
+
+ .. warning::
+ CHAP must always be configured. Without CHAP, the target will
+ reject any login requests.
+
+ To use mutual (bidirectional) authentication, also set the target CHAP
+ username and password which the initiator would use when authenticating
+ the target:
+
+ .. code-block:: console
+
+ > /iscsi-target...at:rh7-client> auth username=myusername password=mypassword mutual_username=mytgtusername mutual_password=mytgtpassword
+
+ .. note::
+ CHAP usernames must be between 8 and 64 characters long. Valid
+ characters: ``0`` to ``9``, ``a`` to ``z``, ``A`` to ``Z``, ``@``,
+ ``_``, ``-``, ``.``, ``:``.
+
+ .. note::
+ CHAP passwords must be between 12 and 16 characters long. Valid
+ characters: ``0`` to ``9``, ``a`` to ``z``, ``A`` to ``Z``, ``@``,
+ ``_``, ``-``, ``/``.
+
+ .. note::
+ For mutual CHAP, initiator and target usernames and passwords
+ must not be the same.
+
+#. Add the disk to the client:
+
+ .. code-block:: console
+
+ > /iscsi-target...at:rh7-client> disk add rbd/disk_1
+
+The next step is to configure the iSCSI initiators.
+
+.. _`Ceph Command-line Interface`: ../../start/quick-rbd/#install-ceph
+
+.. toctree::
+ :hidden:
+
+ ../../start/quick-rbd
diff --git a/doc/rbd/iscsi-targets.rst b/doc/rbd/iscsi-targets.rst
new file mode 100644
index 000000000..d2a035283
--- /dev/null
+++ b/doc/rbd/iscsi-targets.rst
@@ -0,0 +1,27 @@
+=============
+iSCSI Targets
+=============
+
+Traditionally, block-level access to a Ceph storage cluster has been
+limited to QEMU and ``librbd``, which is a key enabler for adoption
+within OpenStack environments. Starting with the Ceph Luminous release,
+block-level access is expanding to offer standard iSCSI support allowing
+wider platform usage, and potentially opening new use cases.
+
+- Red Hat Enterprise Linux/CentOS 7.5 (or newer); Linux kernel v4.16 (or newer)
+
+- A working Ceph Storage cluster, deployed with ``ceph-ansible`` or using the command-line interface
+
+- iSCSI gateway nodes, which can either be colocated with OSD nodes or on dedicated nodes
+
+- Separate network subnets for iSCSI front-end traffic and Ceph back-end traffic
+
+Ansible and the command-line interface are the two available methods for
+installing and configuring the Ceph iSCSI gateway:
+
+.. toctree::
+ :maxdepth: 1
+
+ Using Ansible <iscsi-target-ansible>
+ Using the Command Line Interface <iscsi-target-cli>
diff --git a/doc/rbd/libvirt.rst b/doc/rbd/libvirt.rst
new file mode 100644
index 000000000..e3523f8a8
--- /dev/null
+++ b/doc/rbd/libvirt.rst
@@ -0,0 +1,323 @@
+=================================
+ Using libvirt with Ceph RBD
+=================================
+
+.. index:: Ceph Block Device; libvirt
+
+The ``libvirt`` library creates a virtual machine abstraction layer between
+hypervisor interfaces and the software applications that use them. With
+``libvirt``, developers and system administrators can focus on a common
+management framework, common API, and common shell interface (i.e., ``virsh``)
+to many different hypervisors, including:
+
+- QEMU/KVM
+- XEN
+- LXC
+- VirtualBox
+- etc.
+
+Ceph block devices support QEMU/KVM. You can use Ceph block devices with
+software that interfaces with ``libvirt``. The following stack diagram
+illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
+
+
+.. ditaa::
+
+ +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+
+The most common ``libvirt`` use case involves providing Ceph block devices to
+cloud solutions like OpenStack or CloudStack. The cloud solution uses
+``libvirt`` to interact with QEMU/KVM, and QEMU/KVM interacts with Ceph block
+devices via ``librbd``. See `Block Devices and OpenStack`_ and `Block Devices
+and CloudStack`_ for details. See `Installation`_ for installation details.
+
+You can also use Ceph block devices with ``libvirt``, ``virsh`` and the
+``libvirt`` API. See `libvirt Virtualization API`_ for details.
+
+
+To create VMs that use Ceph block devices, use the procedures in the following
+sections. The examples below use ``libvirt-pool`` for the pool
+name, ``client.libvirt`` for the user name, and ``new-libvirt-image`` for the
+image name. You may use any value you like, but ensure you replace those values
+when executing commands in the subsequent procedures.
+
+
+Configuring Ceph
+================
+
+To configure Ceph for use with ``libvirt``, perform the following steps:
+
+#. `Create a pool`_. The following example uses the
+ pool name ``libvirt-pool``.::
+
+ ceph osd pool create libvirt-pool
+
+ Verify the pool exists. ::
+
+ ceph osd lspools
+
+#. Use the ``rbd`` tool to initialize the pool for use by RBD::
+
+ rbd pool init <pool-name>
+
+#. `Create a Ceph User`_ (or use ``client.admin`` for version 0.9.7 and
+ earlier). The following example uses the Ceph user name ``client.libvirt``
+ and references ``libvirt-pool``. ::
+
+ ceph auth get-or-create client.libvirt mon 'profile rbd' osd 'profile rbd pool=libvirt-pool'
+
+ Verify the name exists. ::
+
+ ceph auth ls
+
+ **NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``,
+ not the Ceph name ``client.libvirt``. See `User Management - User`_ and
+ `User Management - CLI`_ for a detailed explanation of the difference
+ between ID and name.
+
+#. Use QEMU to `create an image`_ in your RBD pool.
+ The following example uses the image name ``new-libvirt-image``
+ and references ``libvirt-pool``. ::
+
+ qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 2G
+
+ Verify the image exists. ::
+
+ rbd -p libvirt-pool ls
+
+ **NOTE:** You can also use `rbd create`_ to create an image, but we
+ recommend ensuring that QEMU is working properly.
+
+.. tip:: Optionally, if you wish to enable debug logs and the admin socket for
+ this client, you can add the following section to ``/etc/ceph/ceph.conf``::
+
+ [client.libvirt]
+ log file = /var/log/ceph/qemu-guest-$pid.log
+ admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
+
+ The ``client.libvirt`` section name should match the cephx user you created
+ above.
+   If SELinux or AppArmor is enabled, note that this could prevent the client
+   process (qemu via libvirt) from performing some operations, such as writing
+   logs or operating on the images or admin socket in the destination
+   locations (``/var/log/ceph`` or ``/var/run/ceph``). Additionally, make sure
+   that the libvirt and qemu users have appropriate access to the specified
+   directory.
+
+
+Preparing the VM Manager
+========================
+
+You may use ``libvirt`` without a VM manager, but you may find it simpler to
+create your first domain with ``virt-manager``.
+
+#. Install a virtual machine manager. See `KVM/VirtManager`_ for details. ::
+
+ sudo apt-get install virt-manager
+
+#. Download an OS image (if necessary).
+
+#. Launch the virtual machine manager. ::
+
+ sudo virt-manager
+
+
+
+Creating a VM
+=============
+
+To create a VM with ``virt-manager``, perform the following steps:
+
+#. Press the **Create New Virtual Machine** button.
+
+#. Name the new virtual machine domain. In this example, we
+   use the name ``libvirt-virtual-machine``. You may use any name you wish,
+ but ensure you replace ``libvirt-virtual-machine`` with the name you
+ choose in subsequent commandline and configuration examples. ::
+
+ libvirt-virtual-machine
+
+#. Import the image. ::
+
+ /path/to/image/recent-linux.img
+
+ **NOTE:** Import a recent image. Some older images may not rescan for
+ virtual devices properly.
+
+#. Configure and start the VM.
+
+#. You may use ``virsh list`` to verify the VM domain exists. ::
+
+ sudo virsh list
+
+#. Log in to the VM (root/root).
+
+#. Stop the VM before configuring it for use with Ceph.
+
+
+Configuring the VM
+==================
+
+When configuring the VM for use with Ceph, it is important to use ``virsh``
+where appropriate. Additionally, ``virsh`` commands often require root
+privileges (i.e., ``sudo``) and will not return appropriate results or notify
+you that root privileges are required. For a reference of ``virsh``
+commands, refer to `Virsh Command Reference`_.
+
+
+#. Open the configuration file with ``virsh edit``. ::
+
+ sudo virsh edit {vm-domain-name}
+
+ Under ``<devices>`` there should be a ``<disk>`` entry. ::
+
+ <devices>
+ <emulator>/usr/bin/kvm</emulator>
+ <disk type='file' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source file='/path/to/image/recent-linux.img'/>
+ <target dev='vda' bus='virtio'/>
+ <address type='drive' controller='0' bus='0' unit='0'/>
+ </disk>
+
+
+ Replace ``/path/to/image/recent-linux.img`` with the path to the OS image.
+ The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See
+ `Virtio`_ for details.
+
+ **IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit
+ the configuration file under ``/etc/libvirt/qemu`` with a text editor,
+ ``libvirt`` may not recognize the change. If there is a discrepancy between
+ the contents of the XML file under ``/etc/libvirt/qemu`` and the result of
+ ``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work
+ properly.
+
+
+#. Add the Ceph RBD image you created as a ``<disk>`` entry. ::
+
+ <disk type='network' device='disk'>
+ <source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
+ <host name='{monitor-host}' port='6789'/>
+ </source>
+ <target dev='vdb' bus='virtio'/>
+ </disk>
+
+ Replace ``{monitor-host}`` with the name of your host, and replace the
+ pool and/or image name as necessary. You may add multiple ``<host>``
+ entries for your Ceph monitors. The ``dev`` attribute is the logical
+ device name that will appear under the ``/dev`` directory of your
+ VM. The optional ``bus`` attribute indicates the type of disk device to
+ emulate. The valid settings are driver specific (e.g., "ide", "scsi",
+ "virtio", "xen", "usb" or "sata").
+
+ See `Disks`_ for details of the ``<disk>`` element, and its child elements
+ and attributes.
+
+#. Save the file.
+
+#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by
+ default), you must generate a secret. ::
+
+ cat > secret.xml <<EOF
+ <secret ephemeral='no' private='no'>
+ <usage type='ceph'>
+ <name>client.libvirt secret</name>
+ </usage>
+ </secret>
+ EOF
+
+#. Define the secret. ::
+
+ sudo virsh secret-define --file secret.xml
+ {uuid of secret}
+
+#. Get the ``client.libvirt`` key and save the key string to a file. ::
+
+ ceph auth get-key client.libvirt | sudo tee client.libvirt.key
+
+#. Set the UUID of the secret. ::
+
+ sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml
+
+ You must also reference the secret in the domain configuration by adding the
+ following ``<auth>`` entry to the ``<disk>`` element you created earlier
+ (replacing the ``uuid`` value with the UUID returned by the command-line
+ example above). ::
+
+ sudo virsh edit {vm-domain-name}
+
+ Then, add the ``<auth>`` element to the domain configuration file::
+
+ ...
+ </source>
+ <auth username='libvirt'>
+ <secret type='ceph' uuid='{uuid of secret}'/>
+ </auth>
+ <target ...
+
+
+ **NOTE:** The ID used here is ``libvirt``, not the full Ceph name
+ ``client.libvirt`` generated at step 2 of `Configuring Ceph`_. Ensure that
+ you use only the ID component of the Ceph name you generated. If for some reason
+ you need to regenerate the secret, you will have to execute
+ ``sudo virsh secret-undefine {uuid}`` before executing
+ ``sudo virsh secret-set-value`` again.
+
+
+Summary
+=======
+
+Once you have configured the VM for use with Ceph, you can start the VM.
+To verify that the VM and Ceph are communicating, you may perform the
+following procedures.
+
+
+#. Check to see if Ceph is running::
+
+ ceph health
+
+#. Check to see if the VM is running. ::
+
+ sudo virsh list
+
+#. Check to see if the VM is communicating with Ceph. Replace
+ ``{vm-domain-name}`` with the name of your VM domain::
+
+ sudo virsh qemu-monitor-command --hmp {vm-domain-name} 'info block'
+
+#. Check to see if the device from ``<target dev='vdb' bus='virtio'/>`` exists::
+
+ virsh domblklist {vm-domain-name} --details
+
+If everything looks okay, you may begin using the Ceph block device
+within your VM.
+
+
+.. _Installation: ../../install
+.. _libvirt Virtualization API: http://www.libvirt.org
+.. _Block Devices and OpenStack: ../rbd-openstack
+.. _Block Devices and CloudStack: ../rbd-cloudstack
+.. _Create a pool: ../../rados/operations/pools#create-a-pool
+.. _Create a Ceph User: ../../rados/operations/user-management#add-a-user
+.. _create an image: ../qemu-rbd#creating-images-with-qemu
+.. _Virsh Command Reference: http://www.libvirt.org/virshcmdref.html
+.. _KVM/VirtManager: https://help.ubuntu.com/community/KVM/VirtManager
+.. _Ceph Authentication: ../../rados/configuration/auth-config-ref
+.. _Disks: http://www.libvirt.org/formatdomain.html#elementsDisks
+.. _rbd create: ../rados-rbd-cmds#creating-a-block-device-image
+.. _User Management - User: ../../rados/operations/user-management#user
+.. _User Management - CLI: ../../rados/operations/user-management#command-line-usage
+.. _Virtio: http://www.linux-kvm.org/page/Virtio
diff --git a/doc/rbd/man/index.rst b/doc/rbd/man/index.rst
new file mode 100644
index 000000000..110273acc
--- /dev/null
+++ b/doc/rbd/man/index.rst
@@ -0,0 +1,16 @@
+============================
+ Ceph Block Device Manpages
+============================
+
+.. toctree::
+ :maxdepth: 1
+
+ rbd <../../man/8/rbd>
+ rbd-fuse <../../man/8/rbd-fuse>
+ rbd-nbd <../../man/8/rbd-nbd>
+ rbd-ggate <../../man/8/rbd-ggate>
+ rbd-map <../../man/8/rbdmap>
+ ceph-rbdnamer <../../man/8/ceph-rbdnamer>
+ rbd-replay-prep <../../man/8/rbd-replay-prep>
+ rbd-replay <../../man/8/rbd-replay>
+ rbd-replay-many <../../man/8/rbd-replay-many>
diff --git a/doc/rbd/qemu-rbd.rst b/doc/rbd/qemu-rbd.rst
new file mode 100644
index 000000000..281335ebe
--- /dev/null
+++ b/doc/rbd/qemu-rbd.rst
@@ -0,0 +1,219 @@
+========================
+ QEMU and Block Devices
+========================
+
+.. index:: Ceph Block Device; QEMU KVM
+
+The most frequent Ceph Block Device use case involves providing block device
+images to virtual machines. For example, a user may create a "golden" image
+with an OS and any relevant software in an ideal configuration. Then the user
+takes a snapshot of the image. Finally the user clones the snapshot (potentially
+many times). See `Snapshots`_ for details. The ability to make copy-on-write
+clones of a snapshot means that Ceph can provision block device images to
+virtual machines quickly, because the client doesn't have to download the entire
+image each time it spins up a new virtual machine.
+
+
+.. ditaa::
+
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+
+Ceph Block Devices attach to QEMU virtual machines. For details on
+QEMU, see `QEMU Open Source Processor Emulator`_. For QEMU documentation, see
+`QEMU Manual`_. For installation details, see `Installation`_.
+
+.. important:: To use Ceph Block Devices with QEMU, you must have access to a
+ running Ceph cluster.
+
+
+Usage
+=====
+
+The QEMU command line expects you to specify the Ceph pool and image name. You
+may also specify a snapshot.
+
+QEMU will assume that Ceph configuration resides in the default
+location (e.g., ``/etc/ceph/$cluster.conf``) and that you are executing
+commands as the default ``client.admin`` user unless you expressly specify
+another Ceph configuration file path or another user. When specifying a user,
+QEMU uses the ``ID`` rather than the full ``TYPE:ID``. See `User Management -
+User`_ for details. Do not prepend the client type (i.e., ``client.``) to the
+beginning of the user ``ID``, or you will receive an authentication error. You
+should have the key for the ``admin`` user, or the key of another user that you
+specify with the ``:id={user}`` option, in a keyring file stored in a default
+path (i.e., ``/etc/ceph``) or in the local directory, with appropriate file
+ownership and permissions. Usage takes the following form::
+
+ qemu-img {command} [options] rbd:{pool-name}/{image-name}[@snapshot-name][:option1=value1][:option2=value2...]
+
+For example, specifying the ``id`` and ``conf`` options might look like the following::
+
+ qemu-img {command} [options] rbd:glance-pool/maipo:id=glance:conf=/etc/ceph/ceph.conf
+
+.. tip:: Configuration values containing ``:``, ``@``, or ``=`` can be escaped with a
+ leading ``\`` character.
+
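+For example, passing a monitor address as part of the ``file`` parameter
+requires escaping the colon in the address. This is a sketch only; the address
+is hypothetical and the accepted keys depend on your QEMU version::
+
+ qemu-img info rbd:rbd/myimage:id=admin:mon_host=192.168.0.1\:6789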
+
+Creating Images with QEMU
+=========================
+
+You can create a block device image from QEMU. You must specify ``rbd``, the
+pool name, and the name of the image you wish to create. You must also specify
+the size of the image. ::
+
+ qemu-img create -f raw rbd:{pool-name}/{image-name} {size}
+
+For example::
+
+ qemu-img create -f raw rbd:data/foo 10G
+
+.. important:: The ``raw`` data format is really the only sensible
+ ``format`` option to use with RBD. Technically, you could use other
+ QEMU-supported formats (such as ``qcow2`` or ``vmdk``), but doing
+ so would add additional overhead, and would also render the volume
+ unsafe for virtual machine live migration when caching (see below)
+ is enabled.
+
+
+Resizing Images with QEMU
+=========================
+
+You can resize a block device image from QEMU. You must specify ``rbd``,
+the pool name, and the name of the image you wish to resize. You must also
+specify the new size of the image. ::
+
+ qemu-img resize rbd:{pool-name}/{image-name} {size}
+
+For example::
+
+ qemu-img resize rbd:data/foo 10G
+
+
+Retrieving Image Info with QEMU
+===============================
+
+You can retrieve block device image information from QEMU. You must
+specify ``rbd``, the pool name, and the name of the image. ::
+
+ qemu-img info rbd:{pool-name}/{image-name}
+
+For example::
+
+ qemu-img info rbd:data/foo
+
+
+Running QEMU with RBD
+=====================
+
+QEMU can pass a block device from the host on to a guest, but since
+QEMU 0.15, there's no need to map an image as a block device on
+the host. Instead, QEMU attaches an image as a virtual block
+device directly via ``librbd``. This strategy increases performance
+by avoiding context switches and taking advantage of `RBD caching`_.
+
+You can use ``qemu-img`` to convert existing virtual machine images to Ceph
+block device images. For example, if you have a qcow2 image, you could run::
+
+ qemu-img convert -f qcow2 -O raw debian_squeeze.qcow2 rbd:data/squeeze
+
+To run a virtual machine booting from that image, you could run::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze
+
+`RBD caching`_ can significantly improve performance.
+Since QEMU 1.2, QEMU's cache options control ``librbd`` caching::
+
+ qemu -m 1024 -drive format=rbd,file=rbd:data/squeeze,cache=writeback
+
+If you have an older version of QEMU, you can set the ``librbd`` cache
+configuration (like any Ceph configuration option) as part of the
+'file' parameter::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback
+
+.. important:: If you set rbd_cache=true, you must set cache=writeback
+ or risk data loss. Without cache=writeback, QEMU will not send
+ flush requests to librbd. If QEMU exits uncleanly in this
+ configuration, file systems on top of rbd can be corrupted.
+
+.. _RBD caching: ../rbd-config-ref/#rbd-cache-config-settings
+
+
+.. index:: Ceph Block Device; discard trim and libvirt
+
+Enabling Discard/TRIM
+=====================
+
+Since Ceph version 0.46 and QEMU version 1.1, Ceph Block Devices support the
+discard operation. This means that a guest can send TRIM requests to let a Ceph
+block device reclaim unused space. This can be enabled in the guest by mounting
+``ext4`` or ``XFS`` with the ``discard`` option.
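+
+For example, inside the guest (assuming the RBD-backed disk appears as the
+hypothetical device ``/dev/vdb``)::
+
+ mount -o discard /dev/vdb /mnt
+
+Alternatively, omit the ``discard`` mount option and issue periodic trims with
+``fstrim /mnt``.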
+
+For this to be available to the guest, it must be explicitly enabled
+for the block device. To do this, you must specify a
+``discard_granularity`` associated with the drive::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,id=drive1,if=none \
+ -device driver=ide-hd,drive=drive1,discard_granularity=512
+
+Note that this uses the IDE driver. The virtio driver supports discard since Linux kernel version 5.0.
+
+If using libvirt, edit your libvirt domain's configuration file using ``virsh
+edit`` to include the ``xmlns:qemu`` value. Then, add a ``qemu:commandline``
+block as a child of that domain. The following example shows how to set two
+devices, referenced by their QEMU IDs, to different ``discard_granularity`` values.
+
+.. code-block:: xml
+
+ <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
+ <qemu:commandline>
+ <qemu:arg value='-set'/>
+ <qemu:arg value='block.scsi0-0-0.discard_granularity=4096'/>
+ <qemu:arg value='-set'/>
+ <qemu:arg value='block.scsi0-0-1.discard_granularity=65536'/>
+ </qemu:commandline>
+ </domain>
+
+
+.. index:: Ceph Block Device; cache options
+
+QEMU Cache Options
+==================
+
+QEMU's cache options correspond to the following Ceph `RBD Cache`_ settings.
+
+Writeback::
+
+ rbd_cache = true
+
+Writethrough::
+
+ rbd_cache = true
+ rbd_cache_max_dirty = 0
+
+None::
+
+ rbd_cache = false
+
+QEMU's cache settings override Ceph's cache settings (including settings that
+are explicitly set in the Ceph configuration file).
+
+.. note:: Prior to QEMU v2.4.0, if you explicitly set `RBD Cache`_ settings
+ in the Ceph configuration file, your Ceph settings override the QEMU cache
+ settings.
+
+.. _QEMU Open Source Processor Emulator: http://wiki.qemu.org/Main_Page
+.. _QEMU Manual: http://wiki.qemu.org/Manual
+.. _RBD Cache: ../rbd-config-ref/
+.. _Snapshots: ../rbd-snapshot/
+.. _Installation: ../../install
+.. _User Management - User: ../../rados/operations/user-management#user
diff --git a/doc/rbd/rados-rbd-cmds.rst b/doc/rbd/rados-rbd-cmds.rst
new file mode 100644
index 000000000..0bbcb2611
--- /dev/null
+++ b/doc/rbd/rados-rbd-cmds.rst
@@ -0,0 +1,326 @@
+=============================
+ Basic Block Device Commands
+=============================
+
+.. index:: Ceph Block Device; image management
+
+The ``rbd`` command enables you to create, list, introspect and remove block
+device images. You can also use it to clone images, create snapshots,
+roll back an image to a snapshot, view a snapshot, etc. For details on using
+the ``rbd`` command, see `RBD – Manage RADOS Block Device (RBD) Images`_.
+
+.. important:: To use Ceph Block Device commands, you must have access to
+ a running Ceph cluster.
+
+Create a Block Device Pool
+==========================
+
+#. Use the ``ceph`` tool to `create a pool`_.
+
+#. Use the ``rbd`` tool to initialize the pool for use by RBD:
+
+ .. prompt:: bash $
+
+ rbd pool init <pool-name>
+
+ .. note:: The ``rbd`` tool assumes a default pool name of 'rbd' if no pool
+ name is specified in the command.
+
+
+Create a Block Device User
+==========================
+
+Unless otherwise specified, the ``rbd`` command uses the Ceph user ID ``admin``
+to access the Ceph cluster. The ``admin`` Ceph user ID allows full
+administrative access to the cluster. We recommend that you access the Ceph
+cluster with a Ceph user ID that has fewer permissions than the ``admin`` Ceph
+user ID does. We call this non-``admin`` Ceph user ID a "block device user" or
+"Ceph user".
+
+To `create a Ceph user`_, use the ``ceph auth get-or-create`` command to
+specify the Ceph user ID name, monitor caps (capabilities), and OSD caps
+(capabilities):
+
+.. prompt:: bash $
+
+ ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]' mgr 'profile rbd [pool={pool-name}]'
+
+For example: to create a Ceph user ID named ``qemu`` that has read-write access
+to the pool ``vms`` and read-only access to the pool ``images``, run the
+following command:
+
+.. prompt:: bash $
+
+ ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=images'
+
+The output from the ``ceph auth get-or-create`` command is the keyring for the
+specified Ceph user ID, which can be written to
+``/etc/ceph/ceph.client.{ID}.keyring``.
+
+.. note:: Specify the Ceph user ID by providing the ``--id {id}`` argument when
+ using the ``rbd`` command. This argument is optional.
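+
+For example, to list the images in the ``vms`` pool as the ``qemu`` user
+created above (a sketch; adjust the pool and user names to your setup):
+
+.. prompt:: bash $
+
+ rbd ls vms --id qemu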
+
+Creating a Block Device Image
+=============================
+
+Before you can add a block device to a node, you must create an image for it in
+the :term:`Ceph Storage Cluster`. To create a block device image, run a command of this form:
+
+.. prompt:: bash $
+
+ rbd create --size {megabytes} {pool-name}/{image-name}
+
+For example, to create a 1GB image named ``bar`` that stores information in a
+pool named ``swimmingpool``, run this command:
+
+.. prompt:: bash $
+
+ rbd create --size 1024 swimmingpool/bar
+
+If you don't specify a pool when you create an image, then the image will be
+stored in the default pool ``rbd``. For example, if you ran this command, you
+would create a 1GB image named ``foo`` that is stored in the default pool
+``rbd``:
+
+.. prompt:: bash $
+
+ rbd create --size 1024 foo
+
+.. note:: You must create a pool before you can specify it as a source. See
+ `Storage Pools`_ for details.
+
+Listing Block Device Images
+===========================
+
+To list block devices in the ``rbd`` pool, run the following command:
+
+.. prompt:: bash $
+
+ rbd ls
+
+.. note:: ``rbd`` is the default pool name, and ``rbd ls`` lists the images
+ in the default pool.
+
+To list block devices in a particular pool, run the following command, but
+replace ``{poolname}`` with the name of the pool:
+
+.. prompt:: bash $
+
+ rbd ls {poolname}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd ls swimmingpool
+
+To list "deferred delete" block devices in the ``rbd`` pool, run the
+following command:
+
+.. prompt:: bash $
+
+ rbd trash ls
+
+To list "deferred delete" block devices in a particular pool, run the
+following command, but replace ``{poolname}`` with the name of the pool:
+
+.. prompt:: bash $
+
+ rbd trash ls {poolname}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd trash ls swimmingpool
+
+Retrieving Image Information
+============================
+
+To retrieve information from a particular image, run the following command, but
+replace ``{image-name}`` with the name for the image:
+
+.. prompt:: bash $
+
+ rbd info {image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd info foo
+
+To retrieve information from an image within a pool, run the following command,
+but replace ``{image-name}`` with the name of the image and replace
+``{pool-name}`` with the name of the pool:
+
+.. prompt:: bash $
+
+ rbd info {pool-name}/{image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd info swimmingpool/bar
+
+.. note:: Other naming conventions are possible, and might conflict with the
+ naming convention described here. For example, ``userid/<uuid>`` is a
+ possible name for an RBD image, and such a name might (at the least) be
+ confusing.
+
+Resizing a Block Device Image
+=============================
+
+:term:`Ceph Block Device` images are thin provisioned. They don't actually use
+any physical storage until you begin saving data to them. However, they do have
+a maximum capacity that you set with the ``--size`` option. If you want to
+increase (or decrease) the maximum size of a Ceph Block Device image, run one
+of the following commands:
+
+Increasing the Size of a Block Device Image
+-------------------------------------------
+
+.. prompt:: bash $
+
+ rbd resize --size 2048 foo
+
+Decreasing the Size of a Block Device Image
+-------------------------------------------
+
+.. prompt:: bash $
+
+ rbd resize --size 2048 foo --allow-shrink
+
+
+Removing a Block Device Image
+=============================
+
+To remove a block device, run the following command, but replace
+``{image-name}`` with the name of the image you want to remove:
+
+.. prompt:: bash $
+
+ rbd rm {image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd rm foo
+
+Removing a Block Device from a Pool
+-----------------------------------
+
+To remove a block device from a pool, run the following command but replace
+``{image-name}`` with the name of the image to be removed, and replace
+``{pool-name}`` with the name of the pool from which the image is to be
+removed:
+
+.. prompt:: bash $
+
+ rbd rm {pool-name}/{image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd rm swimmingpool/bar
+
+"Defer Deleting" a Block Device from a Pool
+-------------------------------------------
+
+To defer delete a block device from a pool (which entails moving it to the
+"trash" and deleting it later), run the following command but replace
+``{image-name}`` with the name of the image to be moved to the trash and
+replace ``{pool-name}`` with the name of the pool:
+
+.. prompt:: bash $
+
+ rbd trash mv {pool-name}/{image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd trash mv swimmingpool/bar
+
+Removing a Deferred Block Device from a Pool
+--------------------------------------------
+
+To remove a deferred block device from a pool, run the following command but
+replace ``{image-id}`` with the ID of the image to be removed, and replace
+``{pool-name}`` with the name of the pool from which the image is to be
+removed:
+
+.. prompt:: bash $
+
+ rbd trash rm {pool-name}/{image-id}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd trash rm swimmingpool/2bf4474b0dc51
+
+.. note::
+
+ * You can move an image to the trash even if it has snapshot(s) or is
+ actively in use by clones. However, you cannot remove it from the trash
+ under those conditions.
+
+ * You can use ``--expires-at`` to set the deferment time (default is
+ ``now``). If the deferment time has not yet arrived, you cannot remove the
+ image unless you use ``--force``.
+
+Restoring a Block Device Image
+==============================
+
+To restore a "deferred delete" block device in the ``rbd`` pool, run the
+following command but replace ``{image-id}`` with the ID of the image:
+
+.. prompt:: bash $
+
+ rbd trash restore {image-id}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd trash restore 2bf4474b0dc51
+
+Restoring a Block Device Image in a Specific Pool
+-------------------------------------------------
+
+To restore a "deferred delete" block device in a particular pool, run the
+following command but replace ``{image-id}`` with the ID of the image and
+replace ``{pool-name}`` with the name of the pool:
+
+.. prompt:: bash $
+
+ rbd trash restore {pool-name}/{image-id}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd trash restore swimmingpool/2bf4474b0dc51
+
+
+Renaming an Image While Restoring It
+------------------------------------
+
+You can also use ``--image`` to rename the image while restoring it.
+
+For example:
+
+.. prompt:: bash $
+
+ rbd trash restore swimmingpool/2bf4474b0dc51 --image new-name
+
+
+.. _create a pool: ../../rados/operations/pools/#create-a-pool
+.. _Storage Pools: ../../rados/operations/pools
+.. _RBD – Manage RADOS Block Device (RBD) Images: ../../man/8/rbd/
+.. _create a Ceph user: ../../rados/operations/user-management#add-a-user
diff --git a/doc/rbd/rbd-cloudstack.rst b/doc/rbd/rbd-cloudstack.rst
new file mode 100644
index 000000000..1b961234b
--- /dev/null
+++ b/doc/rbd/rbd-cloudstack.rst
@@ -0,0 +1,157 @@
+=============================
+ Block Devices and CloudStack
+=============================
+
+You may use Ceph Block Device images with CloudStack 4.0 and higher through
+``libvirt``, which configures the QEMU interface to ``librbd``. Ceph stripes
+block device images as objects across the cluster, which means that large Ceph
+Block Device images perform better than they would on a standalone server!
+
+To use Ceph Block Devices with CloudStack 4.0 and higher, you must install QEMU,
+``libvirt``, and CloudStack first. We recommend using a separate physical host
+for your CloudStack installation. CloudStack recommends a minimum of 4GB of RAM
+and a dual-core processor, but more CPU and RAM will perform better. The
+following diagram depicts the CloudStack/Ceph technology stack.
+
+
+.. ditaa::
+
+ +---------------------------------------------------+
+ | CloudStack |
+ +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. important:: To use Ceph Block Devices with CloudStack, you must have
+ access to a running Ceph Storage Cluster.
+
+CloudStack integrates with Ceph block devices to provide a back end for
+CloudStack's Primary Storage. The instructions below detail the
+setup for CloudStack Primary Storage.
+
+.. note:: We recommend installing with Ubuntu 14.04 or later so that
+ you can use package installation instead of having to compile
+ libvirt from source.
+
+Installing and configuring QEMU for use with CloudStack doesn't require any
+special handling. Ensure that you have a running Ceph Storage Cluster. Install
+QEMU and configure it for use with Ceph; then, install ``libvirt`` version
+0.9.13 or higher (you may need to compile from source) and ensure that it is
+working with Ceph.
+
+
+.. note:: Ubuntu 14.04 and CentOS 7.2 will have ``libvirt`` with RBD storage
+ pool support enabled by default.
+
+.. index:: pools; CloudStack
+
+Create a Pool
+=============
+
+By default, Ceph block devices use the ``rbd`` pool. Create a pool for
+CloudStack Primary Storage. Ensure your Ceph cluster is running, then create
+the pool. ::
+
+ ceph osd pool create cloudstack
+
+See `Create a Pool`_ for details on specifying the number of placement groups
+for your pools, and `Placement Groups`_ for details on the number of placement
+groups you should set for your pools.
+
+A newly created pool must be initialized prior to use. Use the ``rbd`` tool
+to initialize the pool::
+
+ rbd pool init cloudstack
+
+Create a Ceph User
+==================
+
+To access the Ceph cluster, we require a Ceph user that has the correct
+credentials to access the ``cloudstack`` pool we just created. Although we could
+use ``client.admin`` for this, it is recommended to create a user with access
+only to the ``cloudstack`` pool. ::
+
+ ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'
+
+Use the information returned by the command in the next step when adding the
+Primary Storage.
+
+See `User Management`_ for additional details.
+
+Add Primary Storage
+===================
+
+To add a Ceph block device as Primary Storage, complete the following steps:
+
+#. Log in to the CloudStack UI.
+#. Click **Infrastructure** on the left side navigation bar.
+#. Select **View All** under **Primary Storage**.
+#. Click the **Add Primary Storage** button on the top right hand side.
+#. Fill in the following information, according to your infrastructure setup:
+
+ - Scope (i.e. Cluster or Zone-Wide).
+
+ - Zone.
+
+ - Pod.
+
+ - Cluster.
+
+ - Name of Primary Storage.
+
+ - For **Protocol**, select ``RBD``.
+
+ - For **Provider**, select the appropriate provider type (i.e. DefaultPrimary, SolidFire, SolidFireShared, or CloudByte). Depending on the provider chosen, fill out the information pertinent to your setup.
+
+#. Add cluster information (``cephx`` is supported).
+
+ - For **RADOS Monitor**, provide the IP address of a Ceph monitor node.
+
+ - For **RADOS Pool**, provide the name of an RBD pool.
+
+ - For **RADOS User**, provide a user that has sufficient rights to the RBD pool. Note: Do not include the ``client.`` part of the user.
+
+ - For **RADOS Secret**, provide the user's secret.
+
+ - **Storage Tags** are optional. Use tags at your own discretion. For more information about storage tags in CloudStack, refer to `Storage Tags`_.
+
+#. Click **OK**.
+
+Create a Disk Offering
+======================
+
+To create a new disk offering, refer to `Create a New Disk Offering`_.
+Create a disk offering so that it matches the ``rbd`` storage tag.
+The ``StoragePoolAllocator`` will then choose the RBD-backed
+pool when searching for a suitable storage pool. If the disk offering doesn't
+match the ``rbd`` tag, the ``StoragePoolAllocator`` may not select the pool you
+created (e.g., ``cloudstack``).
+
+
+Limitations
+===========
+
+- CloudStack will bind to only one monitor (you can, however, create a round-robin DNS record that resolves to multiple monitors)
+
+
+
+.. _Create a Pool: ../../rados/operations/pools#createpool
+.. _Placement Groups: ../../rados/operations/placement-groups
+.. _Install and Configure QEMU: ../qemu-rbd
+.. _Install and Configure libvirt: ../libvirt
+.. _KVM Hypervisor Host Installation: http://docs.cloudstack.apache.org/en/latest/installguide/hypervisor/kvm.html
+.. _Storage Tags: http://docs.cloudstack.apache.org/en/latest/adminguide/storage.html#storage-tags
+.. _Create a New Disk Offering: http://docs.cloudstack.apache.org/en/latest/adminguide/service_offerings.html#creating-a-new-disk-offering
+.. _User Management: ../../rados/operations/user-management
diff --git a/doc/rbd/rbd-config-ref.rst b/doc/rbd/rbd-config-ref.rst
new file mode 100644
index 000000000..c21731adc
--- /dev/null
+++ b/doc/rbd/rbd-config-ref.rst
@@ -0,0 +1,265 @@
+=======================
+ Config Settings
+=======================
+
+See `Block Device`_ for additional details.
+
+Generic IO Settings
+===================
+
+.. confval:: rbd_compression_hint
+.. confval:: rbd_read_from_replica_policy
+.. confval:: rbd_default_order
+
+Cache Settings
+=======================
+
+.. sidebar:: Kernel Caching
+
+ The kernel driver for Ceph block devices can use the Linux page cache to
+ improve performance.
+
+The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
+take advantage of the Linux page cache, so it includes its own in-memory
+caching, called "RBD caching." RBD caching behaves just like well-behaved hard
+disk caching. When the OS sends a barrier or a flush request, all dirty data is
+written to the OSDs. This means that using write-back caching is just as safe as
+using a well-behaved physical hard disk with a VM that properly sends flushes
+(i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU)
+algorithm, and in write-back mode it can coalesce contiguous requests for
+better throughput.
+
+The librbd cache is enabled by default and supports three different cache
+policies: write-around, write-back, and write-through. Writes return
+immediately under both the write-around and write-back policies, unless there
+are more than ``rbd_cache_max_dirty`` unwritten bytes to the storage cluster.
+The write-around policy differs from the write-back policy in that it does
+not attempt to service read requests from the cache, unlike the write-back
+policy, and is therefore faster for high performance write workloads. Under the
+write-through policy, writes return only when the data is on disk on all
+replicas, but reads may come from the cache.
+
+Prior to receiving a flush request, the cache behaves like a write-through
+cache. This ensures safe operation for older operating systems that do not
+send the flushes needed to ensure crash-consistent behavior.
+
+If the librbd cache is disabled, writes and
+reads go directly to the storage cluster, and writes return only when the data
+is on disk on all replicas.
+
+.. note::
+ The cache is in memory on the client, and each RBD image has
+ its own. Since the cache is local to the client, there's no coherency
+ if there are others accessing the image. Running GFS or OCFS on top of
+ RBD will not work with caching enabled.
+
+
+Option settings for RBD should be set in the ``[client]``
+section of your configuration file or the central config store. These settings
+include:
+
+.. confval:: rbd_cache
+.. confval:: rbd_cache_policy
+.. confval:: rbd_cache_writethrough_until_flush
+.. confval:: rbd_cache_size
+.. confval:: rbd_cache_max_dirty
+.. confval:: rbd_cache_target_dirty
+.. confval:: rbd_cache_max_dirty_age
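+
+For example, a minimal ``[client]`` snippet that enables write-back caching
+might look like this (the values are illustrative, not tuning
+recommendations)::
+
+ [client]
+ rbd_cache = true
+ rbd_cache_policy = writeback
+ rbd_cache_size = 33554432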
+
+.. _Block Device: ../../rbd
+
+
+Read-ahead Settings
+=======================
+
+librbd supports read-ahead/prefetching to optimize small, sequential reads.
+This should normally be handled by the guest OS in the case of a VM,
+but boot loaders may not issue efficient reads. Read-ahead is automatically
+disabled if caching is disabled or if the policy is write-around.
+
+
+.. confval:: rbd_readahead_trigger_requests
+.. confval:: rbd_readahead_max_bytes
+.. confval:: rbd_readahead_disable_after_bytes
+
+Image Features
+==============
+
+RBD supports advanced features which can be specified via the command line when
+creating images or the default features can be configured via
+``rbd_default_features = <sum of feature numeric values>`` or
+``rbd_default_features = <comma-delimited list of CLI values>``.
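+
+For example, the default features can be set in the ``[client]`` section (a
+sketch; the feature selection is illustrative)::
+
+ [client]
+ rbd_default_features = layering,exclusive-lock,object-map,fast-diff
+
+or features can be specified per image at creation time (the pool and image
+names are hypothetical)::
+
+ rbd create --size 1024 --image-feature layering --image-feature exclusive-lock mypool/myimage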
+
+``Layering``
+
+:Description: Layering enables cloning.
+:Internal value: 1
+:CLI value: layering
+:Added in: v0.52 (Bobtail)
+:KRBD support: since v3.10
+:Default: yes
+
+``Striping v2``
+
+:Description: Striping spreads data across multiple objects. Striping helps with
+ parallelism for sequential read/write workloads.
+:Internal value: 2
+:CLI value: striping
+:Added in: v0.55 (Bobtail)
+:KRBD support: since v3.10 (default striping only, "fancy" striping added in v4.17)
+:Default: yes
+
+``Exclusive locking``
+
+:Description: When enabled, it requires a client to acquire a lock on an object
+ before making a write. Exclusive lock should only be enabled when
+ a single client is accessing an image at any given time.
+:Internal value: 4
+:CLI value: exclusive-lock
+:Added in: v0.92 (Hammer)
+:KRBD support: since v4.9
+:Default: yes
+
+``Object map``
+
+:Description: Object map support depends on exclusive lock support. Block
+ devices are thin provisioned, which means that they only store
+ data that has actually been written, i.e. they are *sparse*. Object
+ map support helps track which objects actually exist (have data
+ stored on a device). Enabling object map support speeds up I/O
+ operations for cloning, for importing and exporting a sparsely
+ populated image, and for deleting an image.
+:Internal value: 8
+:CLI value: object-map
+:Added in: v0.93 (Hammer)
+:KRBD support: since v5.3
+:Default: yes
+
+
+``Fast-diff``
+
+:Description: Fast-diff support depends on object map support and exclusive lock
+ support. It adds another property to the object map, which makes
+ it much faster to generate diffs between snapshots of an image.
+ It is also much faster to calculate the actual data usage of a
+ snapshot or volume (``rbd du``).
+:Internal value: 16
+:CLI value: fast-diff
+:Added in: v9.0.1 (Infernalis)
+:KRBD support: since v5.3
+:Default: yes
+
+
+``Deep-flatten``
+
+:Description: Deep-flatten enables ``rbd flatten`` to work on all snapshots of
+ an image, in addition to the image itself. Without it, snapshots
+ of an image will still rely on the parent, so the parent cannot be
+ deleted until the snapshots are first deleted. Deep-flatten makes
+ a parent independent of its clones, even if they have snapshots,
+ at the expense of using additional OSD device space.
+:Internal value: 32
+:CLI value: deep-flatten
+:Added in: v9.0.2 (Infernalis)
+:KRBD support: since v5.1
+:Default: yes
+
+
+``Journaling``
+
+:Description: Journaling support depends on exclusive lock support. Journaling
+ records all modifications to an image in the order they occur. RBD
+ mirroring can utilize the journal to replicate a crash-consistent
+ image to a remote cluster. It is best to let ``rbd-mirror``
+ manage this feature only as needed, as enabling it long term may
+ result in substantial additional OSD space consumption.
+:Internal value: 64
+:CLI value: journaling
+:Added in: v10.0.1 (Jewel)
+:KRBD support: no
+:Default: no
+
+
+``Data pool``
+
+:Description: On erasure-coded pools, the image data block objects need to be stored on a separate pool from the image metadata.
+:Internal value: 128
+:Added in: v11.1.0 (Kraken)
+:KRBD support: since v4.11
+:Default: no
+
+
+``Operations``
+
+:Description: Used to restrict older clients from performing certain maintenance operations against an image (e.g. clone, snap create).
+:Internal value: 256
+:Added in: v13.0.2 (Mimic)
+:KRBD support: since v4.16
+
+
+``Migrating``
+
+:Description: Used to restrict older clients from opening an image when it is in migration state.
+:Internal value: 512
+:Added in: v14.0.1 (Nautilus)
+:KRBD support: no
+
+``Non-primary``
+
+:Description: Used to restrict changes to non-primary images using snapshot-based mirroring.
+:Internal value: 1024
+:Added in: v15.2.0 (Octopus)
+:KRBD support: no
+
+
+QoS Settings
+============
+
+librbd supports limiting per-image IO in several ways. These all apply
+to a given image within a given process - the same image used in
+multiple places, e.g. two separate VMs, would have independent limits.
+
+* **IOPS:** number of I/Os per second (any type of I/O)
+* **read IOPS:** number of read I/Os per second
+* **write IOPS:** number of write I/Os per second
+* **bps:** bytes per second (any type of I/O)
+* **read bps:** bytes per second read
+* **write bps:** bytes per second written
+
+Each of these limits operates independently of the others. They are
+all off by default. Every type of limit throttles I/O using a token
+bucket algorithm, with the ability to configure the limit (average
+speed over time) and potential for a higher rate (a burst) for a short
+period of time (burst_seconds). When any of these limits is reached,
+and there is no burst capacity left, librbd reduces the rate of that
+type of I/O to the limit.
+
+For example, if a read bps limit of 100MB/s was configured, but writes
+were not limited, writes could proceed as quickly as possible, while
+reads would be throttled to 100MB/s on average. If a read bps burst of
+150MB/s was set, and read burst seconds was set to five seconds, reads
+could proceed at 150MB/s for up to five seconds before dropping back
+to the 100MB/s limit.
+
+The following options configure these throttles:
+
+.. confval:: rbd_qos_iops_limit
+.. confval:: rbd_qos_iops_burst
+.. confval:: rbd_qos_iops_burst_seconds
+.. confval:: rbd_qos_read_iops_limit
+.. confval:: rbd_qos_read_iops_burst
+.. confval:: rbd_qos_read_iops_burst_seconds
+.. confval:: rbd_qos_write_iops_limit
+.. confval:: rbd_qos_write_iops_burst
+.. confval:: rbd_qos_write_iops_burst_seconds
+.. confval:: rbd_qos_bps_limit
+.. confval:: rbd_qos_bps_burst
+.. confval:: rbd_qos_bps_burst_seconds
+.. confval:: rbd_qos_read_bps_limit
+.. confval:: rbd_qos_read_bps_burst
+.. confval:: rbd_qos_read_bps_burst_seconds
+.. confval:: rbd_qos_write_bps_limit
+.. confval:: rbd_qos_write_bps_burst
+.. confval:: rbd_qos_write_bps_burst_seconds
+.. confval:: rbd_qos_schedule_tick_min
+.. confval:: rbd_qos_exclude_ops
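+
+For example, a per-image limit can be applied with ``rbd config image set``
+and a pool-wide default with ``rbd config pool set`` (a sketch; the pool and
+image names are hypothetical)::
+
+ rbd config image set data/foo rbd_qos_read_bps_limit 104857600
+ rbd config pool set data rbd_qos_iops_limit 500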
diff --git a/doc/rbd/rbd-encryption.rst b/doc/rbd/rbd-encryption.rst
new file mode 100644
index 000000000..3f37a8b1c
--- /dev/null
+++ b/doc/rbd/rbd-encryption.rst
@@ -0,0 +1,246 @@
+======================
+ Image Encryption
+======================
+
+.. index:: Ceph Block Device; encryption
+
+Starting with the Pacific release, image-level encryption can be handled
+internally by RBD clients. This means you can set a secret key that will be
+used to encrypt a specific RBD image. This page describes the scope of the
+RBD encryption feature.
+
+.. note::
+ The ``krbd`` kernel module does not support encryption at this time.
+
+.. note::
+ External tools (e.g. dm-crypt, QEMU) can be used as well to encrypt
+ an RBD image, and the feature set and limitation set for that use may be
+ different than described here.
+
+Encryption Format
+=================
+
+By default, RBD images are not encrypted. To encrypt an RBD image, it needs to
+be formatted to one of the supported encryption formats. The format operation
+persists encryption metadata to the image. The encryption metadata usually
+includes information such as the encryption format and version, cipher
+algorithm and mode specification, as well as information used to secure the
+encryption key. The encryption key itself is protected by a user-kept secret
+(usually a passphrase), which is never persisted. The basic encryption format
+operation will require specifying the encryption format and a secret.
+
+Some of the encryption metadata may be stored as part of the image data,
+typically an encryption header will be written to the beginning of the raw
+image data. This means that the effective image size of the encrypted image may
+be lower than the raw image size. See the `Supported Formats`_ section for more
+details.
+
+.. note::
+ Unless explicitly (re-)formatted, clones of an encrypted image are
+ inherently encrypted using the same format and secret.
+
+.. note::
+ Clones of an encrypted image are always encrypted.
+ Re-formatting to plaintext is not supported.
+
+.. note::
+ Any data written to the image prior to its format may become unreadable,
+ though it may still occupy storage resources.
+
+.. note::
+ Images with the `journal feature`_ enabled cannot be formatted and encrypted
+ by RBD clients.
+
+Encryption Load
+=================
+
+Formatting an image is a necessary prerequisite for enabling encryption.
+However, formatted images will still be treated as raw unencrypted images by
+all of the RBD APIs. In particular, an encrypted RBD image can be opened
+by the same APIs as any other image, and raw unencrypted data can be
+read / written. Such raw IOs may risk the integrity of the encryption format,
+for example by overriding encryption metadata located at the beginning of the
+image.
+
+In order to safely perform encrypted IO on the formatted image, an additional
+*encryption load* operation should be applied after opening the image. The
+encryption load operation requires supplying the encryption format and a secret
+for unlocking the encryption key for the image itself and each of its explicitly
+formatted ancestor images. Following a successful encryption load operation,
+all IOs for the opened image will be encrypted / decrypted. For a cloned
+image, this includes IOs for ancestor images as well. The encryption keys will
+be stored in-memory by the RBD client until the image is closed.
+
+.. note::
+ Once encryption has been loaded, no other encryption load / format
+ operations can be applied to the context of the opened image.
+
+.. note::
+ Once encryption has been loaded, API calls for retrieving the image size
+ and the parent overlap using the opened image context will return the
+ effective image size and the effective parent overlap respectively.
+
+.. note::
+ Once encryption has been loaded, API calls for resizing the image will
+ interpret the specified target size as effective image size.
+
+.. note::
+ If a clone of an encrypted image is explicitly formatted, the operation of
+ flattening the cloned image ceases to be transparent since the parent data
+ must be re-encrypted according to the cloned image format as it is copied
+ from the parent snapshot. If encryption is not loaded before the flatten
+ operation is issued, any parent data that was previously accessible in the
+ cloned image may become unreadable.
+
+.. note::
+ If a clone of an encrypted image is explicitly formatted, the operation of
+ shrinking the cloned image ceases to be transparent since in some cases
+ (e.g. if the cloned image has snapshots or if the cloned image is being
+ shrunk to a size that is not aligned with the object size) it involves
+ copying some data from the parent snapshot, similar to flattening. If
+ encryption is not loaded before the shrink operation is issued, any parent
+ data that was previously accessible in the cloned image may become
+ unreadable.
+
+.. note::
+ Encryption load can be automatically applied when mounting RBD images as
+ block devices via `rbd-nbd`_.
+
+Supported Formats
+=================
+
+LUKS
+~~~~~~~
+
+Both LUKS1 and LUKS2 are supported. The data layout is fully compliant with the
+LUKS specification. Thus, images formatted by RBD can be loaded using external
+LUKS-supporting tools such as dm-crypt or QEMU. Furthermore, existing LUKS
+data, created outside of RBD, can be imported (by copying the raw LUKS data
+into the image) and loaded by RBD encryption.
+
+.. note::
+ The LUKS formats are supported on Linux-based systems only.
+
+.. note::
+ Currently, only AES-128 and AES-256 encryption algorithms are supported.
+ Additionally, xts-plain64 is currently the only supported encryption mode.
+
+To use the LUKS format, start by formatting the image:
+
+.. prompt:: bash $
+
+ rbd encryption format [--cipher-alg {aes-128|aes-256}] {image-spec} {luks1|luks2} {passphrase-file}
+
+The encryption format operation generates a LUKS header and writes it to the
+beginning of the image. The header is appended with a single keyslot holding a
+randomly-generated encryption key, and is protected by the passphrase read from
+`passphrase-file`.
+
+.. note::
+ In older versions, if the content of `passphrase-file` ended with a newline
+ character, it was stripped off.
+
+By default, AES-256 in xts-plain64 mode (which is the current recommended mode,
+and the usual default for other tools) will be used. The format operation
+allows selecting AES-128 as well. Adding / removing passphrases is currently
+not supported by RBD, but can be applied to the raw RBD data using compatible
+tools such as cryptsetup.
+
+The LUKS header size can vary (up to 136MiB in LUKS2), but is usually up to
+16MiB, depending on the version of `libcryptsetup` installed. For optimal
+performance, the encryption format will set the data offset to be aligned with
+the image stripe period size. For example, expect a minimum overhead of 8MiB if
+using an image configured with an 8MiB object size and a minimum overhead of
+12MiB if using an image configured with a 4MiB object size and `stripe count`_
+of 3.
+
+In LUKS1, sectors, which are the minimal encryption units, are fixed at 512
+bytes. LUKS2 supports larger sectors, and for better performance we set
+the default sector size to the maximum of 4KiB. Writes which are either smaller
+than a sector, or are not aligned to a sector start, will trigger a guarded
+read-modify-write chain on the client, with a considerable latency penalty.
+A batch of such unaligned writes can lead to IO races which will further
+deteriorate performance. Thus it is advisable to avoid using RBD encryption
+in cases where incoming writes cannot be guaranteed to be sector-aligned.
+
+To map a LUKS-formatted image run:
+
+.. prompt:: bash #
+
+ rbd device map -t nbd -o encryption-passphrase-file={passphrase-file} {image-spec}
+
+Note that for security reasons, both the encryption format and encryption load
+operations are CPU-intensive, and may take a few seconds to complete. For the
+encryption operations of actual image IO, assuming AES-NI is enabled, expect
+only a relatively small added latency (on the order of microseconds), as well
+as a small increase in CPU utilization.
+
+Examples
+========
+
+Create a LUKS2-formatted image with the effective size of 50GiB:
+
+.. prompt:: bash $
+
+ rbd create --size 50G mypool/myimage
+ rbd encryption format mypool/myimage luks2 passphrase.bin
+ rbd resize --size 50G --encryption-passphrase-file passphrase.bin mypool/myimage
+
+The ``rbd resize`` command at the end grows the image to compensate for the
+overhead associated with the LUKS2 header.
+
+Given a LUKS2-formatted image, create a LUKS2-formatted clone with the
+same effective size:
+
+.. prompt:: bash $
+
+ rbd snap create mypool/myimage@snap
+ rbd snap protect mypool/myimage@snap
+ rbd clone mypool/myimage@snap mypool/myclone
+ rbd encryption format mypool/myclone luks2 clone-passphrase.bin
+
+Given a LUKS2-formatted image with the effective size of 50GiB, create
+a LUKS1-formatted clone with the same effective size:
+
+.. prompt:: bash $
+
+ rbd snap create mypool/myimage@snap
+ rbd snap protect mypool/myimage@snap
+ rbd clone mypool/myimage@snap mypool/myclone
+ rbd encryption format mypool/myclone luks1 clone-passphrase.bin
+ rbd resize --size 50G --allow-shrink --encryption-passphrase-file clone-passphrase.bin --encryption-passphrase-file passphrase.bin mypool/myclone
+
+Since the LUKS1 header is usually smaller than the LUKS2 header, the
+``rbd resize`` command at the end shrinks the cloned image to get rid of
+unneeded space allowance.
+
+Given a LUKS1-formatted image with the effective size of 50GiB, create
+a LUKS2-formatted clone with the same effective size:
+
+.. prompt:: bash $
+
+ rbd resize --size 51G mypool/myimage
+ rbd snap create mypool/myimage@snap
+ rbd snap protect mypool/myimage@snap
+ rbd clone mypool/myimage@snap mypool/myclone
+ rbd encryption format mypool/myclone luks2 clone-passphrase.bin
+ rbd resize --size 50G --allow-shrink --encryption-passphrase-file passphrase.bin mypool/myimage
+ rbd resize --size 50G --allow-shrink --encryption-passphrase-file clone-passphrase.bin --encryption-passphrase-file passphrase.bin mypool/myclone
+
+Since the LUKS2 header is usually bigger than the LUKS1 header, the
+``rbd resize`` command at the beginning temporarily grows the parent image to
+reserve some extra space in the parent snapshot and consequently the cloned
+image. This is necessary to make all parent data accessible in the
+cloned image. ``rbd resize`` commands at the end shrink the parent
+image back to its original size (this does not impact the parent
+snapshot) and also the cloned image to get rid of unused reserved
+space.
+
+The same applies to creating a formatted clone of an unformatted
+(plaintext) image since an unformatted image does not have a header at
+all.
+
+.. _journal feature: ../rbd-mirroring/#enable-image-journaling-feature
+.. _Supported Formats: #supported-formats
+.. _rbd-nbd: ../../man/8/rbd-nbd
+.. _stripe count: ../../man/8/rbd/#striping
diff --git a/doc/rbd/rbd-exclusive-locks.rst b/doc/rbd/rbd-exclusive-locks.rst
new file mode 100644
index 000000000..f9b99dfb4
--- /dev/null
+++ b/doc/rbd/rbd-exclusive-locks.rst
@@ -0,0 +1,104 @@
+.. _rbd-exclusive-locks:
+
+====================
+ RBD Exclusive Locks
+====================
+
+.. index:: Ceph Block Device; RBD exclusive locks; exclusive-lock
+
+Exclusive locks are mechanisms designed to prevent multiple processes from
+accessing the same RADOS Block Device (RBD) in an uncoordinated fashion.
+Exclusive locks are used heavily in virtualization (where they prevent VMs from
+clobbering each other's writes) and in `RBD mirroring`_ (where they are a
+prerequisite for journaling in journal-based mirroring and fast generation of
+incremental diffs in snapshot-based mirroring).
+
+The ``exclusive-lock`` feature is enabled on newly created images. This default
+can be overridden via the ``rbd_default_features`` configuration option or the
+``--image-feature`` and ``--image-shared`` options of the ``rbd create`` command.
+
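+For example, to create an image intended for concurrent access, with
+``exclusive-lock`` (and the features that depend on it) disabled, a sketch
+might look like this (the pool and image names are hypothetical)::
+
+ rbd create --size 1024 --image-shared mypool/shared-image
+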
+.. note::
+ Many image features, including ``object-map`` and ``fast-diff``, depend upon
+ exclusive locking. Disabling the ``exclusive-lock`` feature will negatively
+ affect the performance of some operations.
+
+To maintain multi-client access, the ``exclusive-lock`` feature implements
+automatic cooperative lock transitions between clients. It ensures that only
+a single client can write to an RBD image at any given time and thus protects
+internal image structures such as the object map, the journal or the `PWL
+cache`_ from concurrent modification.
+
+Exclusive locking is mostly transparent to the user:
+
+* Whenever a client (a ``librbd`` process or, in case of a ``krbd`` client,
+ a client node's kernel) needs to handle a write to an RBD image on which
+ exclusive locking has been enabled, it first acquires an exclusive lock on
+ the image. If the lock is already held by some other client, that client is
+ requested to release it.
+
+* Whenever a client that holds an exclusive lock on an RBD image gets
+ a request to release the lock, it stops handling writes, flushes its caches
+ and releases the lock.
+
+* Whenever a client that holds an exclusive lock on an RBD image terminates
+ gracefully, the lock is also released gracefully.
+
+* A graceful release of an exclusive lock on an RBD image (whether by request
+ or due to client termination) enables another, subsequent, client to acquire
+ the lock and start handling writes.
+
+.. warning::
+ By default, the ``exclusive-lock`` feature does not prevent two or more
+ concurrently running clients from opening the same RBD image and writing to
+ it in turns (whether on the same node or not). In effect, their writes just
+ get linearized as the lock is automatically transitioned back and forth in
+ a cooperative fashion.
+
+.. note::
+ To disable automatic lock transitions between clients, the
+ ``RBD_LOCK_MODE_EXCLUSIVE`` flag may be specified when acquiring the
+ exclusive lock. This is exposed by the ``--exclusive`` option of the
+ ``rbd device map`` command, as in the example below.
+
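+For example (the pool and image names are hypothetical)::
+
+ sudo rbd device map --exclusive mypool/myimage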
+
+Blocklisting
+============
+
+Sometimes a client that previously held an exclusive lock on an RBD image does
+not terminate gracefully, but dies abruptly. This may be because the client
+process received a ``KILL`` or ``ABRT`` signal, or because the client node
+underwent a hard reboot or suffered a power failure. In cases like this, the
+lock is never gracefully released. This means that any new client that comes up
+and attempts to write to the image must break the previously held exclusive
+lock.
+
+However, a process (or kernel thread) may hang or merely lose network
+connectivity to the Ceph cluster for some amount of time. In that case,
+breaking the lock would be potentially catastrophic: the hung process or
+connectivity issue could resolve itself and the original process might then
+compete with one that started in the interim, thus accessing RBD data in an
+uncoordinated and destructive manner.
+
+In the event that a lock cannot be acquired in the standard graceful manner,
+the overtaking process not only breaks the lock but also blocklists the
+previous lock holder. This is negotiated between the new client process and the
+Ceph Monitor.
+
+* Upon receiving the blocklist request, the monitor instructs the relevant OSDs
+ to no longer serve requests from the old client process;
+* after the associated OSD map update is complete, the new client can break the
+ previously held lock;
+* after the new client has acquired the lock, it can commence writing
+ to the image.
+
+Blocklisting is thus a form of storage-level resource `fencing`_.
+
+.. note::
+ In order for blocklisting to work, the client must have the ``osd
+ blocklist`` capability. This capability is included in the ``profile
+ rbd`` capability profile, which should be set generally on all Ceph
+ :ref:`client identities <user-management>` using RBD.
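+
+For example, an existing client's capabilities can be switched to the ``rbd``
+profiles with ``ceph auth caps`` (a sketch; the client ID and pool name are
+hypothetical)::
+
+ ceph auth caps client.myuser mon 'profile rbd' osd 'profile rbd pool=mypool'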
+
+.. _RBD mirroring: ../rbd-mirroring
+.. _PWL cache: ../rbd-persistent-write-log-cache
+.. _fencing: https://en.wikipedia.org/wiki/Fencing_(computing)
diff --git a/doc/rbd/rbd-integrations.rst b/doc/rbd/rbd-integrations.rst
new file mode 100644
index 000000000..f55604a6f
--- /dev/null
+++ b/doc/rbd/rbd-integrations.rst
@@ -0,0 +1,16 @@
+=========================================
+ Ceph Block Device 3rd Party Integration
+=========================================
+
+.. toctree::
+ :maxdepth: 1
+
+ Kernel Modules <rbd-ko>
+ QEMU <qemu-rbd>
+ libvirt <libvirt>
+ Kubernetes <rbd-kubernetes>
+ Nomad <rbd-nomad>
+ OpenStack <rbd-openstack>
+ CloudStack <rbd-cloudstack>
+ LIO iSCSI Gateway <iscsi-overview>
+ Windows <rbd-windows>
diff --git a/doc/rbd/rbd-ko.rst b/doc/rbd/rbd-ko.rst
new file mode 100644
index 000000000..70c407839
--- /dev/null
+++ b/doc/rbd/rbd-ko.rst
@@ -0,0 +1,59 @@
+==========================
+ Kernel Module Operations
+==========================
+
+.. index:: Ceph Block Device; kernel module
+
+.. important:: To use kernel module operations, you must have a running Ceph cluster.
+
+Get a List of Images
+====================
+
+To mount a block device image, first list the available images. ::
+
+ rbd list
+
+Map a Block Device
+==================
+
+Use ``rbd`` to map an image to a kernel block device. You must specify the
+image name, the pool name, and the user name. ``rbd`` will load the RBD kernel
+module on your behalf if it is not already loaded. ::
+
+ sudo rbd device map {pool-name}/{image-name} --id {user-name}
+
+For example::
+
+ sudo rbd device map rbd/myimage --id admin
+
+If you use `cephx`_ authentication, you must also specify a secret. It may come
+from a keyring or a file containing the secret. ::
+
+ sudo rbd device map rbd/myimage --id admin --keyring /path/to/keyring
+ sudo rbd device map rbd/myimage --id admin --keyfile /path/to/file
+
+
+Show Mapped Block Devices
+=========================
+
+To show the block device images mapped to kernel modules with the ``rbd``
+command, specify the ``device list`` arguments. ::
+
+ rbd device list
+
+
+Unmapping a Block Device
+========================
+
+To unmap a block device image with the ``rbd`` command, specify the
+``device unmap`` arguments and the device name (i.e., by convention the
+same as the block device image name). ::
+
+ sudo rbd device unmap /dev/rbd/{poolname}/{imagename}
+
+For example::
+
+ sudo rbd device unmap /dev/rbd/rbd/foo
+
+
+.. _cephx: ../../rados/operations/user-management/
diff --git a/doc/rbd/rbd-kubernetes.rst b/doc/rbd/rbd-kubernetes.rst
new file mode 100644
index 000000000..ccec4813a
--- /dev/null
+++ b/doc/rbd/rbd-kubernetes.rst
@@ -0,0 +1,364 @@
+==============================
+ Block Devices and Kubernetes
+==============================
+
+You may use Ceph Block Device images with Kubernetes v1.13 and later through
+`ceph-csi`_, which dynamically provisions RBD images to back Kubernetes
+`volumes`_ and maps these RBD images as block devices (optionally mounting
+a file system contained within the image) on worker nodes running
+`pods`_ that reference an RBD-backed volume. Ceph stripes block device images as
+objects across the cluster, which means that large Ceph Block Device images
+perform better than they would on a standalone server!
+
+To use Ceph Block Devices with Kubernetes v1.13 and higher, you must install
+and configure ``ceph-csi`` within your Kubernetes environment. The following
+diagram depicts the Kubernetes/Ceph technology stack.
+
+.. ditaa::
+ +---------------------------------------------------+
+ | Kubernetes |
+ +---------------------------------------------------+
+ | ceph--csi |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +------------------------+ +------------------------+
+ | | | rbd--nbd |
+ | Kernel Modules | +------------------------+
+ | | | librbd |
+ +------------------------+-+------------------------+
+ | RADOS Protocol |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+
+.. important::
+ ``ceph-csi`` uses the RBD kernel modules by default, which may not support all
+ Ceph `CRUSH tunables`_ or `RBD image features`_.
+
+Create a Pool
+=============
+
+By default, Ceph block devices use the ``rbd`` pool. Create a pool for
+Kubernetes volume storage. Ensure your Ceph cluster is running, then create
+the pool. ::
+
+ $ ceph osd pool create kubernetes
+
+See `Create a Pool`_ for details on specifying the number of placement groups
+for your pools, and `Placement Groups`_ for details on the number of placement
+groups you should set for your pools.
+
+A newly created pool must be initialized prior to use. Use the ``rbd`` tool
+to initialize the pool::
+
+ $ rbd pool init kubernetes
+
+Configure ceph-csi
+==================
+
+Setup Ceph Client Authentication
+--------------------------------
+
+Create a new user for Kubernetes and `ceph-csi`. Execute the following and
+record the generated key::
+
+ $ ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes'
+ [client.kubernetes]
+ key = AQD9o0Fd6hQRChAAt7fMaSZXduT3NWEqylNpmg==
+
+Generate `ceph-csi` `ConfigMap`
+-------------------------------
+
+`ceph-csi` requires a `ConfigMap` object stored in Kubernetes to define the
+Ceph monitor addresses for the Ceph cluster. Collect both the Ceph cluster
+unique `fsid` and the monitor addresses::
+
+ $ ceph mon dump
+ <...>
+ fsid b9127830-b0cc-4e34-aa47-9d1a2e9949a8
+ <...>
+ 0: [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0] mon.a
+ 1: [v2:192.168.1.2:3300/0,v1:192.168.1.2:6789/0] mon.b
+ 2: [v2:192.168.1.3:3300/0,v1:192.168.1.3:6789/0] mon.c
+
+.. note::
+ ``ceph-csi`` currently only supports the `legacy V1 protocol`_.
+
+Generate a `csi-config-map.yaml` file similar to the example below, substituting
+the `fsid` for "clusterID", and the monitor addresses for "monitors"::
+
+ $ cat <<EOF > csi-config-map.yaml
+ ---
+ apiVersion: v1
+ kind: ConfigMap
+ data:
+ config.json: |-
+ [
+ {
+ "clusterID": "b9127830-b0cc-4e34-aa47-9d1a2e9949a8",
+ "monitors": [
+ "192.168.1.1:6789",
+ "192.168.1.2:6789",
+ "192.168.1.3:6789"
+ ]
+ }
+ ]
+ metadata:
+ name: ceph-csi-config
+ EOF
+
+Once generated, store the new `ConfigMap` object in Kubernetes::
+
+ $ kubectl apply -f csi-config-map.yaml
+
+Recent versions of `ceph-csi` also require an additional `ConfigMap` object to
+define Key Management Service (KMS) provider details. If KMS isn't set up, put
+an empty configuration in a `csi-kms-config-map.yaml` file or refer to examples
+at https://github.com/ceph/ceph-csi/tree/master/examples/kms::
+
+ $ cat <<EOF > csi-kms-config-map.yaml
+ ---
+ apiVersion: v1
+ kind: ConfigMap
+ data:
+ config.json: |-
+ {}
+ metadata:
+ name: ceph-csi-encryption-kms-config
+ EOF
+
+Once generated, store the new `ConfigMap` object in Kubernetes::
+
+ $ kubectl apply -f csi-kms-config-map.yaml
+
+Recent versions of `ceph-csi` also require another `ConfigMap` object that
+defines the Ceph configuration to be added to the ``ceph.conf`` file inside the
+CSI containers::
+
+ $ cat <<EOF > ceph-config-map.yaml
+ ---
+ apiVersion: v1
+ kind: ConfigMap
+ data:
+ ceph.conf: |
+ [global]
+ auth_cluster_required = cephx
+ auth_service_required = cephx
+ auth_client_required = cephx
+ # keyring is a required key and its value should be empty
+ keyring: |
+ metadata:
+ name: ceph-config
+ EOF
+
+Once generated, store the new `ConfigMap` object in Kubernetes::
+
+ $ kubectl apply -f ceph-config-map.yaml
+
+Generate `ceph-csi` cephx `Secret`
+----------------------------------
+
+`ceph-csi` requires the cephx credentials for communicating with the Ceph
+cluster. Generate a `csi-rbd-secret.yaml` file similar to the example below,
+using the newly created Kubernetes user id and cephx key::
+
+ $ cat <<EOF > csi-rbd-secret.yaml
+ ---
+ apiVersion: v1
+ kind: Secret
+ metadata:
+ name: csi-rbd-secret
+ namespace: default
+ stringData:
+ userID: kubernetes
+ userKey: AQD9o0Fd6hQRChAAt7fMaSZXduT3NWEqylNpmg==
+ EOF
+
+Once generated, store the new `Secret` object in Kubernetes::
+
+ $ kubectl apply -f csi-rbd-secret.yaml
+
+Configure `ceph-csi` Plugins
+----------------------------
+
+Create the required `ServiceAccount` and RBAC `ClusterRole`/`ClusterRoleBinding`
+Kubernetes objects. These objects do not necessarily need to be customized for
+your Kubernetes environment and therefore can be used as-is from the `ceph-csi`
+deployment YAMLs::
+
+ $ kubectl apply -f https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-provisioner-rbac.yaml
+ $ kubectl apply -f https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-nodeplugin-rbac.yaml
+
+Finally, create the `ceph-csi` provisioner and node plugins. With the
+possible exception of the `ceph-csi` container release version, these objects do
+not necessarily need to be customized for your Kubernetes environment and
+therefore can be used as-is from the `ceph-csi` deployment YAMLs::
+
+ $ wget https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml
+ $ kubectl apply -f csi-rbdplugin-provisioner.yaml
+ $ wget https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-rbdplugin.yaml
+ $ kubectl apply -f csi-rbdplugin.yaml
+
+.. important::
+ The provisioner and node plugin YAMLs will, by default, pull the development
+ release of the `ceph-csi` container (quay.io/cephcsi/cephcsi:canary).
+ The YAMLs should be updated to use a release version container for
+ production workloads.
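+
+For example, assuming a hypothetical release tag such as ``v3.9.0``, the image
+reference could be pinned before applying the YAMLs::
+
+ sed -i 's|cephcsi:canary|cephcsi:v3.9.0|' csi-rbdplugin-provisioner.yaml csi-rbdplugin.yaml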
+
+Using Ceph Block Devices
+========================
+
+Create a `StorageClass`
+-----------------------
+
+The Kubernetes `StorageClass` defines a class of storage. Multiple `StorageClass`
+objects can be created to map to different quality-of-service levels (e.g. NVMe
+vs HDD-based pools) and features.
+
+For example, to create a `ceph-csi` `StorageClass` that maps to the `kubernetes`
+pool created above, the following YAML file can be used after ensuring that the
+"clusterID" property matches your Ceph cluster's `fsid`::
+
+ $ cat <<EOF > csi-rbd-sc.yaml
+ ---
+ apiVersion: storage.k8s.io/v1
+ kind: StorageClass
+ metadata:
+ name: csi-rbd-sc
+ provisioner: rbd.csi.ceph.com
+ parameters:
+ clusterID: b9127830-b0cc-4e34-aa47-9d1a2e9949a8
+ pool: kubernetes
+ imageFeatures: layering
+ csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
+ csi.storage.k8s.io/provisioner-secret-namespace: default
+ csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
+ csi.storage.k8s.io/controller-expand-secret-namespace: default
+ csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
+ csi.storage.k8s.io/node-stage-secret-namespace: default
+ reclaimPolicy: Delete
+ allowVolumeExpansion: true
+ mountOptions:
+ - discard
+ EOF
+ $ kubectl apply -f csi-rbd-sc.yaml
+
+Note that in Kubernetes v1.14 and v1.15 the volume expansion feature was in
+alpha status and required enabling the `ExpandCSIVolumes` feature gate.
+
+Create a `PersistentVolumeClaim`
+--------------------------------
+
+A `PersistentVolumeClaim` is a request for abstract storage resources by a user.
+The `PersistentVolumeClaim` would then be associated with a `Pod` resource to
+provision a `PersistentVolume`, which would be backed by a Ceph block image.
+An optional `volumeMode` can be included to select between a mounted file system
+(default) or raw block device-based volume.
+
+Using `ceph-csi`, specifying `Filesystem` for `volumeMode` can support both
+`ReadWriteOnce` and `ReadOnlyMany` `accessMode` claims, and specifying `Block`
+for `volumeMode` can support `ReadWriteOnce`, `ReadWriteMany`, and
+`ReadOnlyMany` `accessMode` claims.
+
+For example, to create a block-based `PersistentVolumeClaim` that utilizes
+the `ceph-csi`-based `StorageClass` created above, the following YAML can be
+used to request raw block storage from the `csi-rbd-sc` `StorageClass`::
+
+ $ cat <<EOF > raw-block-pvc.yaml
+ ---
+ apiVersion: v1
+ kind: PersistentVolumeClaim
+ metadata:
+ name: raw-block-pvc
+ spec:
+ accessModes:
+ - ReadWriteOnce
+ volumeMode: Block
+ resources:
+ requests:
+ storage: 1Gi
+ storageClassName: csi-rbd-sc
+ EOF
+ $ kubectl apply -f raw-block-pvc.yaml
+
+The following example binds the above `PersistentVolumeClaim` to a `Pod`
+resource as a raw block device::
+
+ $ cat <<EOF > raw-block-pod.yaml
+ ---
+ apiVersion: v1
+ kind: Pod
+ metadata:
+ name: pod-with-raw-block-volume
+ spec:
+ containers:
+ - name: fc-container
+ image: fedora:26
+ command: ["/bin/sh", "-c"]
+ args: ["tail -f /dev/null"]
+ volumeDevices:
+ - name: data
+ devicePath: /dev/xvda
+ volumes:
+ - name: data
+ persistentVolumeClaim:
+ claimName: raw-block-pvc
+ EOF
+ $ kubectl apply -f raw-block-pod.yaml
+
+To create a file-system-based `PersistentVolumeClaim` that utilizes the
+`ceph-csi`-based `StorageClass` created above, the following YAML can be used to
+request a mounted file system (backed by an RBD image) from the `csi-rbd-sc`
+`StorageClass`::
+
+ $ cat <<EOF > pvc.yaml
+ ---
+ apiVersion: v1
+ kind: PersistentVolumeClaim
+ metadata:
+ name: rbd-pvc
+ spec:
+ accessModes:
+ - ReadWriteOnce
+ volumeMode: Filesystem
+ resources:
+ requests:
+ storage: 1Gi
+ storageClassName: csi-rbd-sc
+ EOF
+ $ kubectl apply -f pvc.yaml
+
+The following example binds the above `PersistentVolumeClaim` to a `Pod`
+resource as a mounted file system::
+
+ $ cat <<EOF > pod.yaml
+ ---
+ apiVersion: v1
+ kind: Pod
+ metadata:
+ name: csi-rbd-demo-pod
+ spec:
+ containers:
+ - name: web-server
+ image: nginx
+ volumeMounts:
+ - name: mypvc
+ mountPath: /var/lib/www/html
+ volumes:
+ - name: mypvc
+ persistentVolumeClaim:
+ claimName: rbd-pvc
+ readOnly: false
+ EOF
+ $ kubectl apply -f pod.yaml
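+
+Once the pod is running, the mounted RBD-backed file system can be inspected
+from inside the container, and (because the `StorageClass` sets
+``allowVolumeExpansion: true``) the claim can be grown in place. The following
+is a minimal sketch using the resources created above; the requested ``2Gi``
+size is only an example::
+
+ $ kubectl exec csi-rbd-demo-pod -- df -h /var/lib/www/html
+ $ kubectl patch pvc rbd-pvc -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'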
+
+.. _ceph-csi: https://github.com/ceph/ceph-csi/
+.. _volumes: https://kubernetes.io/docs/concepts/storage/volumes/
+.. _pods: https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/
+.. _Create a Pool: ../../rados/operations/pools#createpool
+.. _Placement Groups: ../../rados/operations/placement-groups
+.. _CRUSH tunables: ../../rados/operations/crush-map/#tunables
+.. _RBD image features: ../rbd-config-ref/#image-features
+.. _legacy V1 protocol: ../../rados/configuration/msgr2/#address-formats
diff --git a/doc/rbd/rbd-live-migration.rst b/doc/rbd/rbd-live-migration.rst
new file mode 100644
index 000000000..c3e09193d
--- /dev/null
+++ b/doc/rbd/rbd-live-migration.rst
@@ -0,0 +1,367 @@
+======================
+ Image Live-Migration
+======================
+
+.. index:: Ceph Block Device; live-migration
+
+RBD images can be live-migrated between different pools within the same cluster;
+between different image formats and layouts; or from external data sources.
+When started, the source will be deep-copied to the destination image, pulling
+all snapshot history while preserving the sparse allocation of data where
+possible.
+
+By default, when live-migrating RBD images within the same Ceph cluster, the
+source image will be marked read-only and all clients will instead redirect
+IOs to the new target image. In addition, this mode can optionally preserve the
+link to the source image's parent to preserve sparseness, or it can flatten the
+image during the migration to remove the dependency on the source image's
+parent.
+
+The live-migration process can also be used in an import-only mode where the
+source image remains unmodified and the target image can be linked to an
+external data source such as a backing file, HTTP(s) file, or S3 object.
+
+The live-migration copy process can safely run in the background while the new
+target image is in use. There is currently a requirement to temporarily stop
+using the source image before preparing a migration when not using the
+import-only mode of operation. This helps to ensure that the client using the
+image is updated to point to the new target image.
+
+.. note::
+ Image live-migration requires the Ceph Nautilus release or later. Support for
+ external data sources requires the Ceph Pacific release or later. The
+ ``krbd`` kernel module does not support live-migration at this time.
+
+
+.. ditaa::
+
+ +-------------+ +-------------+
+ | {s} c999 | | {s} |
+ | Live | Target refers | Live |
+ | migration |<-------------*| migration |
+ | source | to Source | target |
+ | | | |
+ | (read only) | | (writable) |
+ +-------------+ +-------------+
+
+ Source Target
+
+The live-migration process is comprised of three steps:
+
+#. **Prepare Migration:** The initial step creates the new target image and
+ links the target image to the source. When not configured in the import-only
+ mode, the source image will also be linked to the target image and marked
+ read-only.
+
+ Similar to `layered images`_, attempts to read uninitialized data extents
+ within the target image will internally redirect the read to the source
+ image, and writes to uninitialized extents within the target will internally
+ deep-copy the overlapping source image block to the target image.
+
+
+#. **Execute Migration:** This is a background operation that deep-copies all
+ initialized blocks from the source image to the target. This step can be
+ run while clients are actively using the new target image.
+
+
+#. **Finish Migration:** Once the background migration process has completed,
+ the migration can be committed or aborted. Committing the migration will
+ remove the cross-links between the source and target images, and will
+ remove the source image if not configured in the import-only mode. Aborting
+ the migration will remove the cross-links, and will remove the target image.
+
+Prepare Migration
+=================
+
+The default live-migration process for images within the same Ceph cluster is
+initiated by running the `rbd migration prepare` command, providing the source
+and target images::
+
+ $ rbd migration prepare migration_source [migration_target]
+
+The `rbd migration prepare` command accepts all the same layout options as the
+`rbd create` command, which allows changes to the otherwise immutable on-disk
+image layout. The `migration_target` can be omitted if the goal is only to
+change the on-disk layout while keeping the original image name.
+
+All clients using the source image must be stopped prior to preparing a
+live-migration. The prepare step will fail if it finds any running clients with
+the image open in read/write mode. Once the prepare step is complete, the
+clients can be restarted using the new target image name. Attempting to restart
+the clients using the source image name will result in failure.
+
+The `rbd status` command will show the current state of the live-migration::
+
+ $ rbd status migration_target
+ Watchers: none
+ Migration:
+ source: rbd/migration_source (5e2cba2f62e)
+ destination: rbd/migration_target (5e2ed95ed806)
+ state: prepared
+
+Note that the source image will be moved to the RBD trash to avoid mistaken
+usage during the migration process::
+
+ $ rbd info migration_source
+ rbd: error opening image migration_source: (2) No such file or directory
+ $ rbd trash ls --all
+ 5e2cba2f62e migration_source
+
+
+Prepare Import-Only Migration
+=============================
+
+The import-only live-migration process is initiated by running the same
+`rbd migration prepare` command, but adding the `--import-only` option
+and providing a JSON-encoded ``source-spec`` that describes how to access
+the source image data. This ``source-spec`` can either be passed
+directly via the `--source-spec` option, or supplied from a file or STDIN via
+the `--source-spec-path` option::
+
+ $ rbd migration prepare --import-only --source-spec "<JSON>" migration_target
+
+The `rbd migration prepare` command accepts all the same layout options as the
+`rbd create` command.
+
+The `rbd status` command will show the current state of the live-migration::
+
+ $ rbd status migration_target
+ Watchers: none
+ Migration:
+ source: {"stream":{"file_path":"/mnt/image.raw","type":"file"},"type":"raw"}
+ destination: rbd/migration_target (ac69113dc1d7)
+ state: prepared
+
+The general format for the ``source-spec`` JSON is as follows::
+
+ {
+ "type": "<format-type>",
+ <format unique parameters>
+ "stream": {
+ "type": "<stream-type>",
+ <stream unique parameters>
+ }
+ }
+
+The following formats are currently supported: ``native``, ``qcow``, and
+``raw``. The following streams are currently supported: ``file``, ``http``, and
+``s3``.
+
+Formats
+~~~~~~~
+
+The ``native`` format can be used to describe a native RBD image within a
+Ceph cluster as the source image. Its ``source-spec`` JSON is encoded
+as follows::
+
+ {
+ "type": "native",
+ "pool_name": "<pool-name>",
+ ["pool_id": <pool-id>,] (optional alternative to "pool_name")
+ ["pool_namespace": "<pool-namespace",] (optional)
+ "image_name": "<image-name>",
+ ["image_id": "<image-id>",] (optional if image in trash)
+ "snap_name": "<snap-name>",
+ ["snap_id": "<snap-id>",] (optional alternative to "snap_name")
+ }
+
+Note that the ``native`` format does not include the ``stream`` object since
+it utilizes native Ceph operations. For example, to import from the image
+``rbd/ns1/image1@snap1``, the ``source-spec`` could be encoded as::
+
+ {
+ "type": "native",
+ "pool_name": "rbd",
+ "pool_namespace": "ns1",
+ "image_name": "image1",
+ "snap_name": "snap1"
+ }
+
+The ``qcow`` format can be used to describe a QCOW (QEMU copy-on-write) block
+device. Both the QCOW (v1) and QCOW2 formats are currently supported with the
+exception of advanced features such as compression, encryption, backing
+files, and external data files. Support for these missing features may be added
+in a future release. The ``qcow`` format data can be linked to any supported
+stream source described below. For example, its base ``source-spec`` JSON is
+encoded as follows::
+
+ {
+ "type": "qcow",
+ "stream": {
+ <stream unique parameters>
+ }
+ }
+
+The ``raw`` format can be used to describe a thick-provisioned, raw block device
+export (i.e. `rbd export --export-format 1 <snap-spec>`). The ``raw`` format
+data can be linked to any supported stream source described below. For example,
+its base ``source-spec`` JSON is encoded as follows::
+
+ {
+ "type": "raw",
+ "stream": {
+ <stream unique parameters for HEAD, non-snapshot revision>
+ },
+ "snapshots": [
+ {
+ "type": "raw",
+ "name": "<snapshot-name>",
+ "stream": {
+ <stream unique parameters for snapshot>
+ }
+ },
+ ] (optional oldest to newest ordering of snapshots)
+ }
+
+The inclusion of the ``snapshots`` array is optional and currently only supports
+thick-provisioned ``raw`` snapshot exports.
+
+Additional formats such as RBD export-format v2 and RBD export-diff
+snapshots will be added in a future release.
+
+Streams
+~~~~~~~
+
+The ``file`` stream can be used to import from a locally accessible POSIX file
+source. Its ``source-spec`` JSON is encoded as follows::
+
+ {
+ <format unique parameters>
+ "stream": {
+ "type": "file",
+ "file_path": "<file-path>"
+ }
+ }
+
+For example, to import a raw-format image from a file located at
+"/mnt/image.raw", its ``source-spec`` JSON is encoded as follows::
+
+ {
+ "type": "raw",
+ "stream": {
+ "type": "file",
+ "file_path": "/mnt/image.raw"
+ }
+ }
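+
+Such a ``source-spec`` can be saved to a file and passed to the prepare step.
+As a sketch, assuming the JSON above was written to a hypothetical
+``/tmp/spec.json`` file::
+
+ $ rbd migration prepare --import-only --source-spec-path /tmp/spec.json migration_target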
+
+The ``http`` stream can be used to import from a remote HTTP or HTTPS web
+server. Its ``source-spec`` JSON is encoded as follows::
+
+ {
+ <format unique parameters>
+ "stream": {
+ "type": "http",
+ "url": "<url-path>"
+ }
+ }
+
+For example, to import a raw-format image from a file located at
+``http://download.ceph.com/image.raw``, its ``source-spec`` JSON is encoded
+as follows::
+
+ {
+ "type": "raw",
+ "stream": {
+ "type": "http",
+ "url": "http://download.ceph.com/image.raw"
+ }
+ }
+
+The ``s3`` stream can be used to import from a remote S3 bucket. Its
+``source-spec`` JSON is encoded as follows::
+
+ {
+ <format unique parameters>
+ "stream": {
+ "type": "s3",
+ "url": "<url-path>",
+ "access_key": "<access-key>",
+ "secret_key": "<secret-key>"
+ }
+ }
+
+For example, to import a raw-format image from a file located at
+`http://s3.ceph.com/bucket/image.raw`, its ``source-spec`` JSON is encoded
+as follows::
+
+ {
+ "type": "raw",
+ "stream": {
+ "type": "s3",
+ "url": "http://s3.ceph.com/bucket/image.raw",
+ "access_key": "NX5QOQKC6BH2IDN8HC7A",
+ "secret_key": "LnEsqNNqZIpkzauboDcLXLcYaWwLQ3Kop0zAnKIn"
+ }
+ }
+
+.. note::
+ The ``access_key`` and ``secret_key`` parameters support storing the keys in
+ the MON config-key store by prefixing the key values with ``config://``
+ followed by the path in the MON config-key store to the value. Values can be
+ stored in the config-key store via ``ceph config-key set <key-path> <value>``
+ (e.g. ``ceph config-key set rbd/s3/access_key NX5QOQKC6BH2IDN8HC7A``).
+
+Execute Migration
+=================
+
+After preparing the live-migration, the image blocks from the source image
+must be copied to the target image. This is accomplished by running the
+`rbd migration execute` command::
+
+ $ rbd migration execute migration_target
+ Image migration: 100% complete...done.
+
+The `rbd status` command will also provide feedback on the progress of the
+migration block deep-copy process::
+
+ $ rbd status migration_target
+ Watchers:
+ watcher=1.2.3.4:0/3695551461 client.123 cookie=123
+ Migration:
+ source: rbd/migration_source (5e2cba2f62e)
+ destination: rbd/migration_target (5e2ed95ed806)
+ state: executing (32% complete)
+
+
+Commit Migration
+================
+
+Once the live-migration has completed deep-copying all data blocks from the
+source image to the target, the migration can be committed::
+
+ $ rbd status migration_target
+ Watchers: none
+ Migration:
+ source: rbd/migration_source (5e2cba2f62e)
+ destination: rbd/migration_target (5e2ed95ed806)
+ state: executed
+ $ rbd migration commit migration_target
+ Commit image migration: 100% complete...done.
+
+If the `migration_source` image is a parent of one or more clones, the `--force`
+option will need to be specified after ensuring all descendent clone images are
+not in use.
+
+Committing the live-migration will remove the cross-links between the source
+and target images, and will remove the source image::
+
+ $ rbd trash list --all
+
+
+Abort Migration
+===============
+
+If you wish to revert the prepare or execute step, run the `rbd migration abort`
+command to revert the migration process::
+
+ $ rbd migration abort migration_target
+ Abort image migration: 100% complete...done.
+
+Aborting the migration will result in the target image being deleted and access
+to the original source image being restored::
+
+ $ rbd ls
+ migration_source
+
+
+.. _layered images: ../rbd-snapshot/#layering
diff --git a/doc/rbd/rbd-mirroring.rst b/doc/rbd/rbd-mirroring.rst
new file mode 100644
index 000000000..74a2a364e
--- /dev/null
+++ b/doc/rbd/rbd-mirroring.rst
@@ -0,0 +1,538 @@
+===============
+ RBD Mirroring
+===============
+
+.. index:: Ceph Block Device; mirroring
+
+RBD images can be asynchronously mirrored between two Ceph clusters. This
+capability is available in two modes:
+
+* **Journal-based**: This mode uses the RBD journaling image feature to ensure
+ point-in-time, crash-consistent replication between clusters. Every write to
+ the RBD image is first recorded to the associated journal before modifying the
+ actual image. The remote cluster will read from this associated journal and
+ replay the updates to its local copy of the image. Since each write to the
+ RBD image will result in two writes to the Ceph cluster, expect write
+ latencies to nearly double while using the RBD journaling image feature.
+
+* **Snapshot-based**: This mode uses periodically scheduled or manually
+ created RBD image mirror-snapshots to replicate crash-consistent RBD images
+ between clusters. The remote cluster will determine any data or metadata
+ updates between two mirror-snapshots and copy the deltas to its local copy of
+ the image. With the help of the RBD ``fast-diff`` image feature, updated data
+ blocks can be quickly determined without the need to scan the full RBD image.
+ Since this mode is not as fine-grained as journaling, the complete delta
+ between two snapshots will need to be synced prior to use during a failover
+ scenario. Any partially applied set of deltas will be rolled back at the
+ moment of failover.
+
+.. note:: journal-based mirroring requires the Ceph Jewel release or later;
+ snapshot-based mirroring requires the Ceph Octopus release or later.
+
+Mirroring is configured on a per-pool basis within peer clusters and can be
+configured on a specific subset of images within the pool. You can also mirror
+all images within a given pool when using journal-based
+mirroring. Mirroring is configured using the ``rbd`` command. The
+``rbd-mirror`` daemon is responsible for pulling image updates from the remote
+peer cluster and applying them to the image within the local cluster.
+
+Depending on the desired needs for replication, RBD mirroring can be configured
+for either one- or two-way replication:
+
+* **One-way Replication**: When data is only mirrored from a primary cluster to
+ a secondary cluster, the ``rbd-mirror`` daemon runs only on the secondary
+ cluster.
+
+* **Two-way Replication**: When data is mirrored from primary images on one
+ cluster to non-primary images on another cluster (and vice-versa), the
+ ``rbd-mirror`` daemon runs on both clusters.
+
+.. important:: Each instance of the ``rbd-mirror`` daemon must be able to
+ connect to both the local and remote Ceph clusters simultaneously (i.e.
+ all monitor and OSD hosts). Additionally, the network must have sufficient
+ bandwidth between the two data centers to handle the mirroring workload.
+
+Pool Configuration
+==================
+
+The following procedures demonstrate how to perform the basic administrative
+tasks to configure mirroring using the ``rbd`` command. Mirroring is
+configured on a per-pool basis.
+
+These pool configuration steps should be performed on both peer clusters. These
+procedures assume that both clusters, named "site-a" and "site-b", are accessible
+from a single host for clarity.
+
+See the `rbd`_ manpage for additional details of how to connect to different
+Ceph clusters.
+
+.. note:: The cluster name in the following examples corresponds to a Ceph
+ configuration file of the same name (e.g. /etc/ceph/site-b.conf). See the
+ `ceph-conf`_ documentation for how to configure multiple clusters. Note
+ that ``rbd-mirror`` does **not** require the source and destination clusters
+ to have unique internal names; both can and should call themselves ``ceph``.
+ The config `files` that ``rbd-mirror`` needs for local and remote clusters
+ can be named arbitrarily, and containerizing the daemon is one strategy
+ for maintaining them outside of ``/etc/ceph`` to avoid confusion.
+
+Enable Mirroring
+----------------
+
+To enable mirroring on a pool with ``rbd``, issue the ``mirror pool enable``
+subcommand with the pool name, the mirroring mode, and an optional friendly
+site name to describe the local cluster::
+
+ rbd mirror pool enable [--site-name {local-site-name}] {pool-name} {mode}
+
+The mirroring mode can either be ``image`` or ``pool``:
+
+* **image**: When configured in ``image`` mode, mirroring must be
+ `explicitly enabled`_ on each image.
+* **pool** (default): When configured in ``pool`` mode, all images in the pool
+ with the journaling feature enabled are mirrored.
+
+For example::
+
+ $ rbd --cluster site-a mirror pool enable --site-name site-a image-pool image
+ $ rbd --cluster site-b mirror pool enable --site-name site-b image-pool image
+
+The site name can also be specified when creating or importing a new
+`bootstrap token`_.
+
+The site name can be changed later using the same ``mirror pool enable``
+subcommand, but note that the local site name and the corresponding site name
+used by the remote cluster generally must match.
+
+Disable Mirroring
+-----------------
+
+To disable mirroring on a pool with ``rbd``, specify the ``mirror pool disable``
+command and the pool name::
+
+ rbd mirror pool disable {pool-name}
+
+When mirroring is disabled on a pool in this way, mirroring will also be
+disabled on any images (within the pool) for which mirroring was enabled
+explicitly.
+
+For example::
+
+ $ rbd --cluster site-a mirror pool disable image-pool
+ $ rbd --cluster site-b mirror pool disable image-pool
+
+Bootstrap Peers
+---------------
+
+In order for the ``rbd-mirror`` daemon to discover its peer cluster, the peer
+must be registered and a user account must be created.
+This process can be automated with ``rbd`` and the
+``mirror pool peer bootstrap create`` and ``mirror pool peer bootstrap import``
+commands.
+
+To manually create a new bootstrap token with ``rbd``, issue the
+``mirror pool peer bootstrap create`` subcommand, a pool name, and an
+optional friendly site name to describe the local cluster::
+
+ rbd mirror pool peer bootstrap create [--site-name {local-site-name}] {pool-name}
+
+The output of ``mirror pool peer bootstrap create`` will be a token that should
+be provided to the ``mirror pool peer bootstrap import`` command. For example,
+on site-a::
+
+ $ rbd --cluster site-a mirror pool peer bootstrap create --site-name site-a image-pool
+ eyJmc2lkIjoiOWY1MjgyZGItYjg5OS00NTk2LTgwOTgtMzIwYzFmYzM5NmYzIiwiY2xpZW50X2lkIjoicmJkLW1pcnJvci1wZWVyIiwia2V5IjoiQVFBUnczOWQwdkhvQmhBQVlMM1I4RmR5dHNJQU50bkFTZ0lOTVE9PSIsIm1vbl9ob3N0IjoiW3YyOjE5Mi4xNjguMS4zOjY4MjAsdjE6MTkyLjE2OC4xLjM6NjgyMV0ifQ==
+
+To manually import the bootstrap token created by another cluster with ``rbd``,
+specify the ``mirror pool peer bootstrap import`` command, the pool name, a file
+path to the created token (or '-' to read from standard input), along with an
+optional friendly site name to describe the local cluster and a mirroring
+direction (defaults to rx-tx for bidirectional mirroring, but can also be set
+to rx-only for unidirectional mirroring)::
+
+ rbd mirror pool peer bootstrap import [--site-name {local-site-name}] [--direction {rx-only or rx-tx}] {pool-name} {token-path}
+
+For example, on site-b::
+
+ $ cat <<EOF > token
+ eyJmc2lkIjoiOWY1MjgyZGItYjg5OS00NTk2LTgwOTgtMzIwYzFmYzM5NmYzIiwiY2xpZW50X2lkIjoicmJkLW1pcnJvci1wZWVyIiwia2V5IjoiQVFBUnczOWQwdkhvQmhBQVlMM1I4RmR5dHNJQU50bkFTZ0lOTVE9PSIsIm1vbl9ob3N0IjoiW3YyOjE5Mi4xNjguMS4zOjY4MjAsdjE6MTkyLjE2OC4xLjM6NjgyMV0ifQ==
+ EOF
+ $ rbd --cluster site-b mirror pool peer bootstrap import --site-name site-b image-pool token
+
+Add Cluster Peer Manually
+-------------------------
+
+Cluster peers can be specified manually if desired or if the above bootstrap
+commands are not available with the currently installed Ceph release.
+
+The remote ``rbd-mirror`` daemon will need access to the local cluster to
+perform mirroring. A new local Ceph user should be created for the remote
+daemon to use. To `create a Ceph user`_ with ``ceph``, specify the
+``auth get-or-create`` command, user name, monitor caps, and OSD caps::
+
+ $ ceph auth get-or-create client.rbd-mirror-peer mon 'profile rbd-mirror-peer' osd 'profile rbd'
+
+The resulting keyring should be copied to the other cluster's ``rbd-mirror``
+daemon hosts if not using the Ceph monitor ``config-key`` store described below.
+
+To manually add a mirroring peer Ceph cluster with ``rbd``, specify the
+``mirror pool peer add`` command, the pool name, and a cluster specification::
+
+ rbd mirror pool peer add {pool-name} {client-name}@{cluster-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror pool peer add image-pool client.rbd-mirror-peer@site-b
+ $ rbd --cluster site-b mirror pool peer add image-pool client.rbd-mirror-peer@site-a
+
+By default, the ``rbd-mirror`` daemon needs to have access to a Ceph
+configuration file located at ``/etc/ceph/{cluster-name}.conf`` that provides
+the addresses of the peer cluster's monitors, in addition to a keyring for
+``{client-name}`` located in the default or configured keyring search paths
+(e.g. ``/etc/ceph/{cluster-name}.{client-name}.keyring``).
+
+Alternatively, the peer cluster's monitor and/or client key can be securely
+stored within the local Ceph monitor ``config-key`` store. To specify the
+peer cluster connection attributes when adding a mirroring peer, use the
+``--remote-mon-host`` and ``--remote-key-file`` optionals. For example::
+
+ $ cat <<EOF > remote-key-file
+ AQAeuZdbMMoBChAAcj++/XUxNOLFaWdtTREEsw==
+ EOF
+ $ rbd --cluster site-a mirror pool peer add image-pool client.rbd-mirror-peer@site-b --remote-mon-host 192.168.1.1,192.168.1.2 --remote-key-file remote-key-file
+ $ rbd --cluster site-a mirror pool info image-pool --all
+ Mode: pool
+ Peers:
+ UUID NAME CLIENT MON_HOST KEY
+ 587b08db-3d33-4f32-8af8-421e77abb081 site-b client.rbd-mirror-peer 192.168.1.1,192.168.1.2 AQAeuZdbMMoBChAAcj++/XUxNOLFaWdtTREEsw==
+
+Remove Cluster Peer
+-------------------
+
+To remove a mirroring peer Ceph cluster with ``rbd``, specify the
+``mirror pool peer remove`` command, the pool name, and the peer UUID
+(available from the ``rbd mirror pool info`` command)::
+
+ rbd mirror pool peer remove {pool-name} {peer-uuid}
+
+For example::
+
+ $ rbd --cluster site-a mirror pool peer remove image-pool 55672766-c02b-4729-8567-f13a66893445
+ $ rbd --cluster site-b mirror pool peer remove image-pool 60c0e299-b38f-4234-91f6-eed0a367be08
+
+Data Pools
+----------
+
+When creating images in the destination cluster, ``rbd-mirror`` selects a data
+pool as follows:
+
+#. If the destination cluster has a default data pool configured (with the
+ ``rbd_default_data_pool`` configuration option), it will be used.
+#. Otherwise, if the source image uses a separate data pool, and a pool with the
+ same name exists on the destination cluster, that pool will be used.
+#. If neither of the above is true, no data pool will be set.
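+
+For example, to have replicated images use a dedicated data pool on the
+destination cluster, the option could be set in that cluster's centralized
+configuration (a sketch; ``site-b-data`` is a hypothetical pool name)::
+
+ $ ceph --cluster site-b config set client rbd_default_data_pool site-b-data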
+
+Image Configuration
+===================
+
+Unlike pool configuration, image configuration only needs to be performed
+against a single mirroring peer Ceph cluster.
+
+Mirrored RBD images are designated as either primary or non-primary. This is a
+property of the image and not the pool. Images that are designated as
+non-primary cannot be modified.
+
+Images are automatically promoted to primary when mirroring is first enabled on
+an image (either implicitly if the pool mirror mode was ``pool`` and the image
+has the journaling image feature enabled, or `explicitly enabled`_ by the
+``rbd`` command if the pool mirror mode was ``image``).
+
+Enable Image Mirroring
+----------------------
+
+If mirroring is configured in ``image`` mode for the image's pool, then it
+is necessary to explicitly enable mirroring for each image within the pool.
+To enable mirroring for a specific image with ``rbd``, specify the
+``mirror image enable`` command along with the pool, image name, and mode::
+
+ rbd mirror image enable {pool-name}/{image-name} {mode}
+
+The mirror image mode can either be ``journal`` or ``snapshot``:
+
+* **journal** (default): When configured in ``journal`` mode, mirroring will
+ utilize the RBD journaling image feature to replicate the image contents. If
+ the RBD journaling image feature is not yet enabled on the image, it will be
+ automatically enabled.
+
+* **snapshot**: When configured in ``snapshot`` mode, mirroring will utilize
+ RBD image mirror-snapshots to replicate the image contents. Once enabled, an
+ initial mirror-snapshot will automatically be created. Additional RBD image
+ `mirror-snapshots`_ can be created by the ``rbd`` command.
+
+For example::
+
+ $ rbd --cluster site-a mirror image enable image-pool/image-1 snapshot
+ $ rbd --cluster site-a mirror image enable image-pool/image-2 journal
+
+Enable Image Journaling Feature
+-------------------------------
+
+RBD journal-based mirroring uses the RBD image journaling feature to ensure that
+the replicated image always remains crash-consistent. When using the ``image``
+mirroring mode, the journaling feature will be automatically enabled when
+mirroring is enabled on the image. When using the ``pool`` mirroring mode,
+before an image can be mirrored to a peer cluster, the RBD image journaling
+feature must be enabled. The feature can be enabled at image creation time by
+providing the ``--image-feature exclusive-lock,journaling`` option to the
+``rbd`` command.
+
+Alternatively, the journaling feature can be dynamically enabled on
+pre-existing RBD images. To enable journaling with ``rbd``, specify
+the ``feature enable`` command, the pool and image name, and the feature name::
+
+ rbd feature enable {pool-name}/{image-name} {feature-name}
+
+For example::
+
+ $ rbd --cluster site-a feature enable image-pool/image-1 journaling
+
+.. note:: The journaling feature is dependent on the exclusive-lock feature. If
+ the exclusive-lock feature is not already enabled, it should be enabled prior
+ to enabling the journaling feature.
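+
+ For example, on a pre-existing image both features might be enabled in
+ order (a sketch; adjust the cluster, pool, and image names)::
+
+  $ rbd --cluster site-a feature enable image-pool/image-1 exclusive-lock
+  $ rbd --cluster site-a feature enable image-pool/image-1 journaling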
+
+.. tip:: You can enable journaling on all new images by default by adding
+ ``rbd default features = 125`` to your Ceph configuration file.
+
+.. tip:: ``rbd-mirror`` tunables are set by default to values suitable for
+ mirroring an entire pool. When using ``rbd-mirror`` to migrate single
+ volumes between clusters, you may achieve substantial performance gains
+ by setting ``rbd_mirror_journal_max_fetch_bytes=33554432`` and
+ ``rbd_journal_max_payload_bytes=8388608`` within the ``[client]`` config
+ section of the local or centralized configuration. Note that these
+ settings may allow ``rbd-mirror`` to present a substantial write workload
+ to the destination cluster: monitor cluster performance closely during
+ migrations and test carefully before running multiple migrations in parallel.
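+
+ As a sketch, these values could be applied through the centralized
+ configuration of the relevant cluster::
+
+  ceph config set client rbd_mirror_journal_max_fetch_bytes 33554432
+  ceph config set client rbd_journal_max_payload_bytes 8388608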
+
+Create Image Mirror-Snapshots
+-----------------------------
+
+When using snapshot-based mirroring, mirror-snapshots will need to be created
+whenever it is desired to mirror the changed contents of the RBD image. To
+create a mirror-snapshot manually with ``rbd``, specify the
+``mirror image snapshot`` command along with the pool and image name::
+
+ rbd mirror image snapshot {pool-name}/{image-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror image snapshot image-pool/image-1
+
+By default up to ``5`` mirror-snapshots will be created per-image. The most
+recent mirror-snapshot is automatically pruned if the limit is reached.
+The limit can be overridden via the ``rbd_mirroring_max_mirroring_snapshots``
+configuration option if required. Additionally, mirror-snapshots are
+automatically deleted when the image is removed or when mirroring is disabled.
+
+Mirror-snapshots can also be automatically created on a periodic basis if
+mirror-snapshot schedules are defined. The mirror-snapshot can be scheduled
+globally, per-pool, or per-image levels. Multiple mirror-snapshot schedules can
+be defined at any level, but only the most-specific snapshot schedules that
+match an individual mirrored image will run.
+
+To create a mirror-snapshot schedule with ``rbd``, specify the
+``mirror snapshot schedule add`` command along with an optional pool or
+image name; interval; and optional start time::
+
+ rbd mirror snapshot schedule add [--pool {pool-name}] [--image {image-name}] {interval} [{start-time}]
+
+The ``interval`` can be specified in days, hours, or minutes using the ``d``,
+``h``, or ``m`` suffix respectively. The optional ``start-time`` can be
+specified using the ISO 8601 time format. For example::
+
+ $ rbd --cluster site-a mirror snapshot schedule add --pool image-pool 24h 14:00:00-05:00
+ $ rbd --cluster site-a mirror snapshot schedule add --pool image-pool --image image1 6h
+
+To remove a mirror-snapshot schedule with ``rbd``, specify the
+``mirror snapshot schedule remove`` command with options that match the
+corresponding ``add`` schedule command.
+
+To list all snapshot schedules for a specific level (global, pool, or image)
+with ``rbd``, specify the ``mirror snapshot schedule ls`` command along with
+an optional pool or image name. Additionally, the ``--recursive`` option can
+be specified to list all schedules at the specified level and below. For
+example::
+
+ $ rbd --cluster site-a mirror snapshot schedule ls --pool image-pool --recursive
+ POOL NAMESPACE IMAGE SCHEDULE
+ image-pool - - every 1d starting at 14:00:00-05:00
+ image-pool image1 every 6h
+
+To view when the next snapshots will be created for snapshot-based mirroring
+RBD images with ``rbd``, specify the ``mirror snapshot schedule status``
+command along with an optional pool or image name::
+
+ rbd mirror snapshot schedule status [--pool {pool-name}] [--image {image-name}]
+
+For example::
+
+ $ rbd --cluster site-a mirror snapshot schedule status
+ SCHEDULE TIME IMAGE
+ 2020-02-26 18:00:00 image-pool/image1
+
+Disable Image Mirroring
+-----------------------
+
+To disable mirroring for a specific image with ``rbd``, specify the
+``mirror image disable`` command along with the pool and image name::
+
+ rbd mirror image disable {pool-name}/{image-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror image disable image-pool/image-1
+
+Image Promotion and Demotion
+----------------------------
+
+In a failover scenario where the primary designation needs to be moved to the
+image in the peer Ceph cluster, stop access to the primary image (e.g. power
+down the VM or detach the associated drive from the VM), demote the current
+primary image, promote the new primary image, and resume access to the image
+on the alternate cluster.
+
+.. note:: RBD only provides the necessary tools to facilitate an orderly
+ failover of an image. An external mechanism is required to coordinate the
+ full failover process (e.g. closing the image before demotion).
+
+To demote a specific image to non-primary with ``rbd``, specify the
+``mirror image demote`` command along with the pool and image name::
+
+ rbd mirror image demote {pool-name}/{image-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror image demote image-pool/image-1
+
+To demote all primary images within a pool to non-primary with ``rbd``, specify
+the ``mirror pool demote`` command along with the pool name::
+
+ rbd mirror pool demote {pool-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror pool demote image-pool
+
+To promote a specific image to primary with ``rbd``, specify the
+``mirror image promote`` command along with the pool and image name::
+
+ rbd mirror image promote [--force] {pool-name}/{image-name}
+
+For example::
+
+ $ rbd --cluster site-b mirror image promote image-pool/image-1
+
+To promote all non-primary images within a pool to primary with ``rbd``, specify
+the ``mirror pool promote`` command along with the pool name::
+
+ rbd mirror pool promote [--force] {pool-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror pool promote image-pool
+
+.. tip:: Since the primary / non-primary status is per-image, it is possible to
+ have two clusters split the IO load and stage failover / failback.
+
+.. note:: Promotion can be forced using the ``--force`` option. Forced
+ promotion is needed when the demotion cannot be propagated to the peer
+ Ceph cluster (e.g. Ceph cluster failure, communication outage). This will
+ result in a split-brain scenario between the two peers and the image will no
+ longer be in-sync until a `force resync command`_ is issued.
+
+Force Image Resync
+------------------
+
+If a split-brain event is detected by the ``rbd-mirror`` daemon, it will not
+attempt to mirror the affected image until corrected. To resume mirroring for an
+image, first `demote the image`_ determined to be out-of-date and then request a
+resync to the primary image. To request an image resync with ``rbd``, specify
+the ``mirror image resync`` command along with the pool and image name::
+
+ rbd mirror image resync {pool-name}/{image-name}
+
+For example::
+
+ $ rbd mirror image resync image-pool/image-1
+
+.. note:: The ``rbd`` command only flags the image as requiring a resync. The
+ local cluster's ``rbd-mirror`` daemon process is responsible for performing
+ the resync asynchronously.
+
+Mirror Status
+=============
+
+The peer cluster replication status is stored for every primary mirrored image.
+This status can be retrieved using the ``mirror image status`` and
+``mirror pool status`` commands.
+
+To request the mirror image status with ``rbd``, specify the
+``mirror image status`` command along with the pool and image name::
+
+ rbd mirror image status {pool-name}/{image-name}
+
+For example::
+
+ $ rbd mirror image status image-pool/image-1
+
+To request the mirror pool summary status with ``rbd``, specify the
+``mirror pool status`` command along with the pool name::
+
+ rbd mirror pool status {pool-name}
+
+For example::
+
+ $ rbd mirror pool status image-pool
+
+.. note:: Adding ``--verbose`` option to the ``mirror pool status`` command will
+ additionally output status details for every mirroring image in the pool.
+
+rbd-mirror Daemon
+=================
+
+The two ``rbd-mirror`` daemons are responsible for watching image journals on
+the remote peer cluster and replaying the journal events against the local
+cluster. The RBD image journaling feature records all modifications to the
+image in the order they occur. This ensures that a crash-consistent mirror of
+the remote image is available locally.
+
+The ``rbd-mirror`` daemon is available within the optional ``rbd-mirror``
+distribution package.
+
+.. important:: Each ``rbd-mirror`` daemon requires the ability to connect
+ to both clusters simultaneously.
+.. warning:: Pre-Luminous releases: only run a single ``rbd-mirror`` daemon per
+ Ceph cluster.
+
+Each ``rbd-mirror`` daemon should use a unique Ceph user ID. To
+`create a Ceph user`_ with ``ceph``, specify the ``auth get-or-create``
+command, user name, monitor caps, and OSD caps::
+
+ ceph auth get-or-create client.rbd-mirror.{unique id} mon 'profile rbd-mirror' osd 'profile rbd'
+
+The ``rbd-mirror`` daemon can be managed by ``systemd`` by specifying the user
+ID as the daemon instance::
+
+ systemctl enable ceph-rbd-mirror@rbd-mirror.{unique id}
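+
+Enabling the unit only arranges for it to start at boot; assuming the same
+``systemd`` packaging, the daemon can be started immediately with::
+
+ systemctl start ceph-rbd-mirror@rbd-mirror.{unique id}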
+
+The ``rbd-mirror`` daemon can also be run in the foreground with the ``rbd-mirror`` command::
+
+ rbd-mirror -f --log-file={log_path}
+
+.. _rbd: ../../man/8/rbd
+.. _ceph-conf: ../../rados/configuration/ceph-conf/#running-multiple-clusters
+.. _explicitly enabled: #enable-image-mirroring
+.. _bootstrap token: #bootstrap-peers
+.. _force resync command: #force-image-resync
+.. _demote the image: #image-promotion-and-demotion
+.. _create a Ceph user: ../../rados/operations/user-management#add-a-user
+.. _mirror-snapshots: #create-image-mirror-snapshots
diff --git a/doc/rbd/rbd-nomad.rst b/doc/rbd/rbd-nomad.rst
new file mode 100644
index 000000000..66d87d6ce
--- /dev/null
+++ b/doc/rbd/rbd-nomad.rst
@@ -0,0 +1,475 @@
+=========================
+ Block Devices and Nomad
+=========================
+
+Like Kubernetes, Nomad can use Ceph Block Devices. This is made possible by
+`ceph-csi`_, which allows you to dynamically provision RBD images or import
+existing RBD images.
+
+Every version of Nomad is compatible with `ceph-csi`_, but the reference
+version of Nomad that was used to generate the procedures and guidance in this
+document is Nomad v1.1.2, the latest version available at the time of writing.
+
+To use Ceph Block Devices with Nomad, you must install
+and configure ``ceph-csi`` within your Nomad environment. The following
+diagram shows the Nomad/Ceph technology stack.
+
+.. ditaa::
+ +-------------------------+-------------------------+
+ | Container | ceph--csi |
+ | | node |
+ | ^ | ^ |
+ | | | | |
+ +----------+--------------+-------------------------+
+ | | | |
+ | v | |
+ | Nomad | |
+ | | |
+ +---------------------------------------------------+
+ | ceph--csi |
+ | controller |
+ +--------+------------------------------------------+
+ | |
+ | configures maps |
+ +---------------+ +----------------+
+ | |
+ v v
+ +------------------------+ +------------------------+
+ | | | rbd--nbd |
+ | Kernel Modules | +------------------------+
+ | | | librbd |
+ +------------------------+-+------------------------+
+ | RADOS Protocol |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. note::
+ Nomad has many possible task drivers, but this example uses only a Docker container.
+
+.. important::
+ ``ceph-csi`` uses the RBD kernel modules by default, which may not support
+ all Ceph `CRUSH tunables`_ or `RBD image features`_.
+
+Create a Pool
+=============
+
+By default, Ceph block devices use the ``rbd`` pool. Ensure that your Ceph
+cluster is running, then create a pool for Nomad persistent storage:
+
+.. prompt:: bash $
+
+ ceph osd pool create nomad
+
+See `Create a Pool`_ for details on specifying the number of placement groups
+for your pools. See `Placement Groups`_ for details on the number of placement
+groups you should set for your pools.
+
+A newly created pool must be initialized prior to use. Use the ``rbd`` tool
+to initialize the pool:
+
+.. prompt:: bash $
+
+ rbd pool init nomad
+
+Configure ceph-csi
+==================
+
+Ceph Client Authentication Setup
+--------------------------------
+
+Create a new user for Nomad and `ceph-csi`. Execute the following command and
+record the generated key:
+
+.. code-block:: console
+
+ $ ceph auth get-or-create client.nomad mon 'profile rbd' osd 'profile rbd pool=nomad' mgr 'profile rbd pool=nomad'
+ [client.nomad]
+ key = AQAlh9Rgg2vrDxAARy25T7KHabs6iskSHpAEAQ==
+
+
+Configure Nomad
+---------------
+
+Configuring Nomad to Allow Containers to Use Privileged Mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, Nomad doesn't allow containers to use privileged mode. Configure
+Nomad to allow privileged containers by adding the following configuration
+block to `/etc/nomad.d/nomad.hcl`::
+
+ plugin "docker" {
+ config {
+ allow_privileged = true
+ }
+ }
+
+Loading the rbd module
+~~~~~~~~~~~~~~~~~~~~~~
+
+Nomad must have the `rbd` module loaded. Run the following command to confirm that the `rbd` module is loaded:
+
+.. code-block:: console
+
+ $ lsmod | grep rbd
+ rbd 94208 2
+ libceph 364544 1 rbd
+
+If the `rbd` module is not loaded, load it:
+
+.. prompt:: bash $
+
+ sudo modprobe rbd
+
+Restarting Nomad
+~~~~~~~~~~~~~~~~
+
+Restart Nomad:
+
+.. prompt:: bash $
+
+ sudo systemctl restart nomad
+
+
+Create ceph-csi controller and plugin nodes
+===========================================
+
+The `ceph-csi`_ plugin requires two components:
+
+- **Controller plugin**: communicates with the provider's API.
+- **Node plugin**: executes tasks on the client.
+
+.. note::
+ We'll set the ceph-csi version in these files. See `ceph-csi release`_
+ for information about ceph-csi's compatibility with other versions.
+
+Configure controller plugin
+---------------------------
+
+The controller plugin requires the Ceph monitor addresses of the Ceph
+cluster. Collect both (1) the Ceph cluster unique `fsid` and (2) the monitor
+addresses:
+
+.. code-block:: console
+
+ $ ceph mon dump
+ <...>
+ fsid b9127830-b0cc-4e34-aa47-9d1a2e9949a8
+ <...>
+ 0: [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0] mon.a
+ 1: [v2:192.168.1.2:3300/0,v1:192.168.1.2:6789/0] mon.b
+ 2: [v2:192.168.1.3:3300/0,v1:192.168.1.3:6789/0] mon.c
+
+Generate a ``ceph-csi-plugin-controller.nomad`` file similar to the example
+below. Substitute the `fsid` for "clusterID", and the monitor addresses for
+"monitors"::
+
+ job "ceph-csi-plugin-controller" {
+ datacenters = ["dc1"]
+ group "controller" {
+ network {
+ port "metrics" {}
+ }
+ task "ceph-controller" {
+ template {
+ data = <<EOF
+ [{
+ "clusterID": "b9127830-b0cc-4e34-aa47-9d1a2e9949a8",
+ "monitors": [
+ "192.168.1.1",
+ "192.168.1.2",
+ "192.168.1.3"
+ ]
+ }]
+ EOF
+ destination = "local/config.json"
+ change_mode = "restart"
+ }
+ driver = "docker"
+ config {
+ image = "quay.io/cephcsi/cephcsi:v3.3.1"
+ volumes = [
+ "./local/config.json:/etc/ceph-csi-config/config.json"
+ ]
+ mounts = [
+ {
+ type = "tmpfs"
+ target = "/tmp/csi/keys"
+ readonly = false
+ tmpfs_options = {
+ size = 1000000 # size in bytes
+ }
+ }
+ ]
+ args = [
+ "--type=rbd",
+ "--controllerserver=true",
+ "--drivername=rbd.csi.ceph.com",
+ "--endpoint=unix://csi/csi.sock",
+ "--nodeid=${node.unique.name}",
+ "--instanceid=${node.unique.name}-controller",
+ "--pidlimit=-1",
+ "--logtostderr=true",
+ "--v=5",
+ "--metricsport=$${NOMAD_PORT_metrics}"
+ ]
+ }
+ resources {
+ cpu = 500
+ memory = 256
+ }
+ service {
+ name = "ceph-csi-controller"
+ port = "metrics"
+ tags = [ "prometheus" ]
+ }
+ csi_plugin {
+ id = "ceph-csi"
+ type = "controller"
+ mount_dir = "/csi"
+ }
+ }
+ }
+ }
+
+Configure plugin node
+---------------------
+
+Generate a ``ceph-csi-plugin-nodes.nomad`` file similar to the example below.
+Substitute the `fsid` for "clusterID" and the monitor addresses for
+"monitors"::
+
+ job "ceph-csi-plugin-nodes" {
+ datacenters = ["dc1"]
+ type = "system"
+ group "nodes" {
+ network {
+ port "metrics" {}
+ }
+ task "ceph-node" {
+ driver = "docker"
+ template {
+ data = <<EOF
+ [{
+ "clusterID": "b9127830-b0cc-4e34-aa47-9d1a2e9949a8",
+ "monitors": [
+ "192.168.1.1",
+ "192.168.1.2",
+ "192.168.1.3"
+ ]
+ }]
+ EOF
+ destination = "local/config.json"
+ change_mode = "restart"
+ }
+ config {
+ image = "quay.io/cephcsi/cephcsi:v3.3.1"
+ volumes = [
+ "./local/config.json:/etc/ceph-csi-config/config.json"
+ ]
+ mounts = [
+ {
+ type = "tmpfs"
+ target = "/tmp/csi/keys"
+ readonly = false
+ tmpfs_options = {
+ size = 1000000 # size in bytes
+ }
+ }
+ ]
+ args = [
+ "--type=rbd",
+ "--drivername=rbd.csi.ceph.com",
+ "--nodeserver=true",
+ "--endpoint=unix://csi/csi.sock",
+ "--nodeid=${node.unique.name}",
+ "--instanceid=${node.unique.name}-nodes",
+ "--pidlimit=-1",
+ "--logtostderr=true",
+ "--v=5",
+ "--metricsport=$${NOMAD_PORT_metrics}"
+ ]
+ privileged = true
+ }
+ resources {
+ cpu = 500
+ memory = 256
+ }
+ service {
+ name = "ceph-csi-nodes"
+ port = "metrics"
+ tags = [ "prometheus" ]
+ }
+ csi_plugin {
+ id = "ceph-csi"
+ type = "node"
+ mount_dir = "/csi"
+ }
+ }
+ }
+ }
+
+Start plugin controller and node
+--------------------------------
+
+To start the plugin controller and the Nomad node, run the following commands:
+
+.. prompt:: bash $
+
+ nomad job run ceph-csi-plugin-controller.nomad
+ nomad job run ceph-csi-plugin-nodes.nomad
+
+The `ceph-csi`_ image will be downloaded.
+
+Check the plugin status after a few minutes:
+
+.. code-block:: console
+
+ $ nomad plugin status ceph-csi
+ ID = ceph-csi
+ Provider = rbd.csi.ceph.com
+ Version = 3.3.1
+ Controllers Healthy = 1
+ Controllers Expected = 1
+ Nodes Healthy = 1
+ Nodes Expected = 1
+
+ Allocations
+ ID Node ID Task Group Version Desired Status Created Modified
+ 23b4db0c a61ef171 nodes 4 run running 3h26m ago 3h25m ago
+ fee74115 a61ef171 controller 6 run running 3h26m ago 3h25m ago
+
+Using Ceph Block Devices
+========================
+
+Create rbd image
+----------------
+
+``ceph-csi`` requires the cephx credentials for communicating with the Ceph
+cluster. Generate a ``ceph-volume.hcl`` file similar to the example below,
+using the newly created nomad user id and cephx key::
+
+ id = "ceph-mysql"
+ name = "ceph-mysql"
+ type = "csi"
+ plugin_id = "ceph-csi"
+ capacity_max = "200G"
+ capacity_min = "100G"
+
+ capability {
+ access_mode = "single-node-writer"
+ attachment_mode = "file-system"
+ }
+
+ secrets {
+ userID = "admin"
+ userKey = "AQAlh9Rgg2vrDxAARy25T7KHabs6iskSHpAEAQ=="
+ }
+
+ parameters {
+ clusterID = "b9127830-b0cc-4e34-aa47-9d1a2e9949a8"
+ pool = "nomad"
+ imageFeatures = "layering"
+ }
+
+After the ``ceph-volume.hcl`` file has been generated, create the volume:
+
+.. prompt:: bash $
+
+ nomad volume create ceph-volume.hcl
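+
+One way to confirm that the volume has been registered with the ``ceph-csi``
+plugin is to check its status (output omitted here):
+
+.. prompt:: bash $
+
+ nomad volume status ceph-mysql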
+
+Use rbd image with a container
+------------------------------
+
+As an exercise in using an rbd image with a container, modify the Hashicorp
+`nomad stateful`_ example.
+
+Generate a ``mysql.nomad`` file similar to the example below::
+
+ job "mysql-server" {
+ datacenters = ["dc1"]
+ type = "service"
+ group "mysql-server" {
+ count = 1
+ volume "ceph-mysql" {
+ type = "csi"
+ attachment_mode = "file-system"
+ access_mode = "single-node-writer"
+ read_only = false
+ source = "ceph-mysql"
+ }
+ network {
+ port "db" {
+ static = 3306
+ }
+ }
+ restart {
+ attempts = 10
+ interval = "5m"
+ delay = "25s"
+ mode = "delay"
+ }
+ task "mysql-server" {
+ driver = "docker"
+ volume_mount {
+ volume = "ceph-mysql"
+ destination = "/srv"
+ read_only = false
+ }
+ env {
+ MYSQL_ROOT_PASSWORD = "password"
+ }
+ config {
+ image = "hashicorp/mysql-portworx-demo:latest"
+ args = ["--datadir", "/srv/mysql"]
+ ports = ["db"]
+ }
+ resources {
+ cpu = 500
+ memory = 1024
+ }
+ service {
+ name = "mysql-server"
+ port = "db"
+ check {
+ type = "tcp"
+ interval = "10s"
+ timeout = "2s"
+ }
+ }
+ }
+ }
+ }
+
+Start the job:
+
+.. prompt:: bash $
+
+ nomad job run mysql.nomad
+
+Check the status of the job:
+
+.. code-block:: console
+
+ $ nomad job status mysql-server
+ ...
+ Status = running
+ ...
+ Allocations
+ ID Node ID Task Group Version Desired Status Created Modified
+ 38070da7 9ad01c63 mysql-server 0 run running 6s ago 3s ago
+
+To verify that the data is persistent, modify the database, purge the job, and
+then re-create it using the same file. The same RBD image will be reused.
+
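+For example (using the job and file names from above), purge and re-create the
+job, then check that the database changes are still present:
+
+.. prompt:: bash $
+
+ nomad job stop -purge mysql-server
+ nomad job run mysql.nomad
+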
+.. _ceph-csi: https://github.com/ceph/ceph-csi/
+.. _csi: https://www.nomadproject.io/docs/internals/plugins/csi
+.. _Create a Pool: ../../rados/operations/pools#createpool
+.. _Placement Groups: ../../rados/operations/placement-groups
+.. _CRUSH tunables: ../../rados/operations/crush-map/#tunables
+.. _RBD image features: ../rbd-config-ref/#image-features
+.. _nomad stateful: https://learn.hashicorp.com/tutorials/nomad/stateful-workloads-csi-volumes?in=nomad/stateful-workloads#create-the-job-file
+.. _ceph-csi release: https://github.com/ceph/ceph-csi#ceph-csi-container-images-and-release-compatibility
diff --git a/doc/rbd/rbd-openstack.rst b/doc/rbd/rbd-openstack.rst
new file mode 100644
index 000000000..7d64b3548
--- /dev/null
+++ b/doc/rbd/rbd-openstack.rst
@@ -0,0 +1,395 @@
+=============================
+ Block Devices and OpenStack
+=============================
+
+.. index:: Ceph Block Device; OpenStack
+
+You can attach Ceph Block Device images to OpenStack instances through ``libvirt``,
+which configures the QEMU interface to ``librbd``. Ceph stripes block volumes
+across multiple OSDs within the cluster, which means that large volumes can
+realize better performance than local drives on a standalone server!
+
+To use Ceph Block Devices with OpenStack, you must install QEMU, ``libvirt``,
+and OpenStack first. We recommend using a separate physical node for your
+OpenStack installation. OpenStack recommends a minimum of 8GB of RAM and a
+quad-core processor. The following diagram depicts the OpenStack/Ceph
+technology stack.
+
+
+.. ditaa::
+
+ +---------------------------------------------------+
+ | OpenStack |
+ +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. important:: To use Ceph Block Devices with OpenStack, you must have
+ access to a running Ceph Storage Cluster.
+
+Three parts of OpenStack integrate with Ceph's block devices:
+
+- **Images**: OpenStack Glance manages images for VMs. Images are immutable.
+ OpenStack treats images as binary blobs and downloads them accordingly.
+
+- **Volumes**: Volumes are block devices. OpenStack uses volumes to boot VMs,
+ or to attach volumes to running VMs. OpenStack manages volumes using
+ Cinder services.
+
+- **Guest Disks**: Guest disks are guest operating system disks. By default,
+ when you boot a virtual machine, its disk appears as a file on the file system
+ of the hypervisor (usually under ``/var/lib/nova/instances/<uuid>/``). Prior
+ to OpenStack Havana, the only way to boot a VM in Ceph was to use the
+ boot-from-volume functionality of Cinder. However, now it is possible to boot
+ every virtual machine inside Ceph directly without using Cinder, which is
+ advantageous because it allows you to perform maintenance operations easily
+ with the live-migration process. Additionally, if your hypervisor dies it is
+ also convenient to trigger ``nova evacuate`` and reinstate the virtual machine
+ elsewhere almost seamlessly. In doing so,
+ :ref:`exclusive locks <rbd-exclusive-locks>` prevent multiple
+ compute nodes from concurrently accessing the guest disk.
+
+
+You can use OpenStack Glance to store images as Ceph Block Devices, and you
+can use Cinder to boot a VM using a copy-on-write clone of an image.
+
+The instructions below detail the setup for Glance, Cinder and Nova, although
+they do not have to be used together. You may store images in Ceph block devices
+while running VMs using a local disk, or vice versa.
+
+.. important:: Using QCOW2 for hosting a virtual machine disk is NOT recommended.
+ If you want to boot virtual machines in Ceph (ephemeral backend or boot
+ from volume), please use the ``raw`` image format within Glance.
+
+.. index:: pools; OpenStack
+
+Create a Pool
+=============
+
+By default, Ceph block devices live within the ``rbd`` pool. You may use any
+suitable pool by specifying it explicitly. We recommend creating a pool for
+Cinder and a pool for Glance. Ensure your Ceph cluster is running, then create the pools. ::
+
+ ceph osd pool create volumes
+ ceph osd pool create images
+ ceph osd pool create backups
+ ceph osd pool create vms
+
+See `Create a Pool`_ for detail on specifying the number of placement groups for
+your pools, and `Placement Groups`_ for details on the number of placement
+groups you should set for your pools.
+
+Newly created pools must be initialized prior to use. Use the ``rbd`` tool
+to initialize the pools::
+
+ rbd pool init volumes
+ rbd pool init images
+ rbd pool init backups
+ rbd pool init vms
+
+.. _Create a Pool: ../../rados/operations/pools#createpool
+.. _Placement Groups: ../../rados/operations/placement-groups
+
+
+Configure OpenStack Ceph Clients
+================================
+
+The nodes running ``glance-api``, ``cinder-volume``, ``nova-compute`` and
+``cinder-backup`` act as Ceph clients. Each requires the ``ceph.conf`` file::
+
+ ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf
+
+
+Install Ceph client packages
+----------------------------
+
+On the ``glance-api`` node, you will need the Python bindings for ``librbd``::
+
+ sudo apt-get install python-rbd
+ sudo yum install python-rbd
+
+On the ``nova-compute``, ``cinder-backup``, and ``cinder-volume`` nodes, install
+both the Python bindings and the client command line tools::
+
+ sudo apt-get install ceph-common
+ sudo yum install ceph-common
+
+
+Setup Ceph Client Authentication
+--------------------------------
+
+If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder
+and Glance. Execute the following::
+
+ ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
+ ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
+ ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'
+
+Add the keyrings for ``client.cinder``, ``client.glance``, and
+``client.cinder-backup`` to the appropriate nodes and change their ownership::
+
+ ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
+ ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
+ ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
+ ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
+ ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
+ ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring
+
+Nodes running ``nova-compute`` need the keyring file for the ``nova-compute``
+process::
+
+ ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
+
+They also need to store the secret key of the ``client.cinder`` user in
+``libvirt``. The libvirt process needs it to access the cluster while attaching
+a block device from Cinder.
+
+Create a temporary copy of the secret key on the nodes running
+``nova-compute``::
+
+ ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key
+
+Then, on the compute nodes, add the secret key to ``libvirt`` and remove the
+temporary copy of the key::
+
+ uuidgen
+ 457eb676-33da-42ec-9a8c-9293d545c337
+
+ cat > secret.xml <<EOF
+ <secret ephemeral='no' private='no'>
+ <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
+ <usage type='ceph'>
+ <name>client.cinder secret</name>
+ </usage>
+ </secret>
+ EOF
+ sudo virsh secret-define --file secret.xml
+ Secret 457eb676-33da-42ec-9a8c-9293d545c337 created
+ sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key) && rm client.cinder.key secret.xml
+
+Save the uuid of the secret for configuring ``nova-compute`` later.
+
+.. important:: You don't necessarily need the UUID on all the compute nodes.
+ However from a platform consistency perspective, it's better to keep the
+ same UUID.
+
+.. _cephx authentication: ../../rados/configuration/auth-config-ref/#enabling-disabling-cephx
+
+
+Configure OpenStack to use Ceph
+===============================
+
+Configuring Glance
+------------------
+
+Glance can use multiple back ends to store images. To use Ceph block devices by
+default, configure Glance as follows.
+
+
+Kilo and after
+~~~~~~~~~~~~~~
+
+Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::
+
+ [glance_store]
+ stores = rbd
+ default_store = rbd
+ rbd_store_pool = images
+ rbd_store_user = glance
+ rbd_store_ceph_conf = /etc/ceph/ceph.conf
+ rbd_store_chunk_size = 8
+
+For more information about the configuration options available in Glance please refer to the OpenStack Configuration Reference: http://docs.openstack.org/.
+
+Enable copy-on-write cloning of images
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Note that this exposes the back end location via Glance's API, so the endpoint
+with this option enabled should not be publicly accessible.
+
+Any OpenStack version except Mitaka
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you want to enable copy-on-write cloning of images, also add under the ``[DEFAULT]`` section::
+
+ show_image_direct_url = True
+
+Disable cache management (any OpenStack version)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Disable the Glance cache management to avoid images getting cached under ``/var/lib/glance/image-cache/``,
+assuming your configuration file has ``flavor = keystone+cachemanagement``::
+
+ [paste_deploy]
+ flavor = keystone
+
+Image properties
+~~~~~~~~~~~~~~~~
+
+We recommend using the following properties for your images (an example of
+applying them follows this list):
+
+- ``hw_scsi_model=virtio-scsi``: add the virtio-scsi controller to get better performance and support for the discard operation
+- ``hw_disk_bus=scsi``: connect every Cinder block device to that controller
+- ``hw_qemu_guest_agent=yes``: enable the QEMU guest agent
+- ``os_require_quiesce=yes``: send fs-freeze/thaw calls through the QEMU guest agent
+
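+As a sketch (the image name ``ubuntu-22.04`` is a placeholder), these properties
+can be applied with the OpenStack client::
+
+ openstack image set ubuntu-22.04 \
+ --property hw_scsi_model=virtio-scsi \
+ --property hw_disk_bus=scsi \
+ --property hw_qemu_guest_agent=yes \
+ --property os_require_quiesce=yes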
+
+Configuring Cinder
+------------------
+
+OpenStack requires a driver to interact with Ceph block devices. You must also
+specify the pool name for the block device. On your OpenStack node, edit
+``/etc/cinder/cinder.conf`` by adding::
+
+ [DEFAULT]
+ ...
+ enabled_backends = ceph
+ glance_api_version = 2
+ ...
+ [ceph]
+ volume_driver = cinder.volume.drivers.rbd.RBDDriver
+ volume_backend_name = ceph
+ rbd_pool = volumes
+ rbd_ceph_conf = /etc/ceph/ceph.conf
+ rbd_flatten_volume_from_snapshot = false
+ rbd_max_clone_depth = 5
+ rbd_store_chunk_size = 4
+ rados_connect_timeout = -1
+
+If you are using `cephx authentication`_, also configure the user and uuid of
+the secret you added to ``libvirt`` as documented earlier::
+
+ [ceph]
+ ...
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+
+Note that if you are configuring multiple cinder back ends,
+``glance_api_version = 2`` must be in the ``[DEFAULT]`` section.
+
+
+Configuring Cinder Backup
+-------------------------
+
+OpenStack Cinder Backup requires a dedicated daemon (``cinder-backup``), so be
+sure that it is installed.
+On your Cinder Backup node, edit ``/etc/cinder/cinder.conf`` and add::
+
+ backup_driver = cinder.backup.drivers.ceph
+ backup_ceph_conf = /etc/ceph/ceph.conf
+ backup_ceph_user = cinder-backup
+ backup_ceph_chunk_size = 134217728
+ backup_ceph_pool = backups
+ backup_ceph_stripe_unit = 0
+ backup_ceph_stripe_count = 0
+ restore_discard_excess_bytes = true
+
+
+Configuring Nova to attach Ceph RBD block device
+------------------------------------------------
+
+In order to attach Cinder devices (either normal block devices or volumes used
+to boot from), you must tell Nova (and libvirt) which user and UUID to refer to
+when attaching the device. libvirt will refer to this user when connecting to
+and authenticating with the Ceph cluster. Add the following to the ``[libvirt]``
+section of ``/etc/nova/nova.conf``::
+
+ [libvirt]
+ ...
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+
+These two flags are also used by the Nova ephemeral back end.
+
+
+Configuring Nova
+----------------
+
+In order to boot virtual machines directly from Ceph volumes, you must
+configure the ephemeral backend for Nova.
+
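+As a minimal sketch (not an exhaustive configuration), the RBD ephemeral back
+end is typically selected in the ``[libvirt]`` section of ``nova.conf``, using
+the ``vms`` pool created earlier together with the ``rbd_user`` and
+``rbd_secret_uuid`` settings shown above::
+
+ [libvirt]
+ images_type = rbd
+ images_rbd_pool = vms
+ images_rbd_ceph_conf = /etc/ceph/ceph.conf
+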
+It is recommended to enable the RBD cache in your Ceph configuration file; this
+has been enabled by default since the Giant release. Moreover, enabling the
+client admin socket allows the collection of metrics and can be invaluable
+for troubleshooting.
+
+This socket can be accessed on the hypervisor (Nova compute) node::
+
+ ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok help
+
+To enable the RBD cache and admin sockets, ensure that each hypervisor's
+``ceph.conf`` contains::
+
+ [client]
+ rbd cache = true
+ rbd cache writethrough until flush = true
+ admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
+ log file = /var/log/qemu/qemu-guest-$pid.log
+ rbd concurrent management ops = 20
+
+Configure permissions for these directories::
+
+ mkdir -p /var/run/ceph/guests/ /var/log/qemu/
+ chown qemu:libvirtd /var/run/ceph/guests /var/log/qemu/
+
+Note that the user ``qemu`` and the group ``libvirtd`` can vary depending on
+your system. The provided example works for Red Hat-based systems.
+
+.. tip:: If your virtual machine is already running, restart it to enable the admin socket.
+
+
+Restart OpenStack
+=================
+
+To activate the Ceph block device driver and load the block device pool name
+into the configuration, you must restart the related OpenStack services.
+For Debian-based systems, execute these commands on the appropriate nodes::
+
+ sudo glance-control api restart
+ sudo service nova-compute restart
+ sudo service cinder-volume restart
+ sudo service cinder-backup restart
+
+For Red Hat-based systems, execute::
+
+ sudo service openstack-glance-api restart
+ sudo service openstack-nova-compute restart
+ sudo service openstack-cinder-volume restart
+ sudo service openstack-cinder-backup restart
+
+Once OpenStack is up and running, you should be able to create a volume
+and boot from it.
+
+
+Booting from a Block Device
+===========================
+
+You can create a volume from an image using the Cinder command line tool::
+
+ cinder create --image-id {id of image} --display-name {name of volume} {size of volume}
+
+You can use `qemu-img`_ to convert from one format to another. For example::
+
+ qemu-img convert -f {source-format} -O {output-format} {source-filename} {output-filename}
+ qemu-img convert -f qcow2 -O raw precise-cloudimg.img precise-cloudimg.raw
+
+When Glance and Cinder both use Ceph block devices, the new volume is a
+copy-on-write clone of the image, so volumes are created quickly. In the
+OpenStack dashboard, you can boot from that volume by performing the following
+steps:
+
+#. Launch a new instance.
+#. Choose the image associated with the copy-on-write clone.
+#. Select 'boot from volume'.
+#. Select the volume you created.
+
+.. _qemu-img: ../qemu-rbd/#running-qemu-with-rbd
diff --git a/doc/rbd/rbd-operations.rst b/doc/rbd/rbd-operations.rst
new file mode 100644
index 000000000..df702114b
--- /dev/null
+++ b/doc/rbd/rbd-operations.rst
@@ -0,0 +1,16 @@
+==============================
+ Ceph Block Device Operations
+==============================
+
+.. toctree::
+ :maxdepth: 1
+
+ Snapshots<rbd-snapshot>
+ Exclusive Locking <rbd-exclusive-locks>
+ Mirroring <rbd-mirroring>
+ Live-Migration <rbd-live-migration>
+ Persistent Read-only Cache <rbd-persistent-read-only-cache>
+ Persistent Write Log Cache <rbd-persistent-write-log-cache>
+ Encryption <rbd-encryption>
+ Config Settings (librbd) <rbd-config-ref/>
+ RBD Replay <rbd-replay>
diff --git a/doc/rbd/rbd-persistent-read-only-cache.rst b/doc/rbd/rbd-persistent-read-only-cache.rst
new file mode 100644
index 000000000..5bef7f592
--- /dev/null
+++ b/doc/rbd/rbd-persistent-read-only-cache.rst
@@ -0,0 +1,201 @@
+===============================
+ RBD Persistent Read-only Cache
+===============================
+
+.. index:: Ceph Block Device; Persistent Read-only Cache
+
+Shared, Read-only Parent Image Cache
+====================================
+
+`Cloned RBD images`_ usually modify only a small fraction of the parent
+image. For example, in a VDI use-case, VMs are cloned from the same
+base image and initially differ only by hostname and IP address. During
+booting, all of these VMs read portions of the same parent
+image data. A local cache of the parent image speeds up reads on the caching
+host and reduces client-to-cluster network traffic. This shared, read-only
+cache must be explicitly enabled in
+``ceph.conf``. The ``ceph-immutable-object-cache`` daemon is responsible for
+caching the parent content on the local disk, and future reads of that data
+will be serviced from the local cache.
+
+.. note:: RBD shared read-only parent image cache requires the Ceph Nautilus release or later.
+
+.. ditaa::
+
+ +--------------------------------------------------------+
+ | QEMU |
+ +--------------------------------------------------------+
+ | librbd (cloned images) |
+ +-------------------+-+----------------------------------+
+ | librados | | ceph--immutable--object--cache |
+ +-------------------+ +----------------------------------+
+ | OSDs/Mons | | local cached parent image |
+ +-------------------+ +----------------------------------+
+
+
+Enable RBD Shared Read-only Parent Image Cache
+----------------------------------------------
+
+To enable the RBD shared read-only parent image cache, the following Ceph
+settings need to be added to the ``[client]`` `section`_ of your ``ceph.conf``
+file::
+
+ rbd parent cache enabled = true
+ rbd plugins = parent_cache
+
+Immutable Object Cache Daemon
+=============================
+
+Introduction and Generic Settings
+---------------------------------
+
+The ``ceph-immutable-object-cache`` daemon is responsible for caching parent
+image content within its local caching directory. Using SSDs as the underlying
+storage is recommended because doing so provides better performance.
+
+The key components of the daemon are:
+
+#. **Domain socket based IPC:** The daemon listens on a local domain socket at
+ startup and waits for connections from librbd clients.
+
+#. **LRU based promotion/demotion policy:** The daemon maintains in-memory
+ statistics of cache hits for each cache file. It demotes cold cache files
+ when capacity reaches the configured threshold.
+
+#. **File-based caching store:** The daemon maintains a simple file-based cache
+ store. On promotion, the RADOS objects are fetched from the RADOS cluster and
+ stored in the local caching directory.
+
+When each cloned RBD image is opened, ``librbd`` tries to connect to the cache
+daemon through its Unix domain socket. After ``librbd`` is successfully
+connected, it coordinates with the daemon upon every subsequent read. In the
+case of an uncached read, the daemon promotes the RADOS object to the local
+caching directory and the next read of the object is serviced from the cache.
+The daemon maintains simple LRU statistics, which are used to evict cold cache
+files when required (for example, when the cache is at capacity and under
+pressure).
+
+Here are some important cache configuration settings:
+
+``immutable_object_cache_sock``
+
+:Description: The path to the domain socket used for communication between
+ librbd clients and the ceph-immutable-object-cache daemon.
+:Type: String
+:Required: No
+:Default: ``/var/run/ceph/immutable_object_cache_sock``
+
+
+``immutable_object_cache_path``
+
+:Description: The immutable object cache data directory.
+:Type: String
+:Required: No
+:Default: ``/tmp/ceph_immutable_object_cache``
+
+
+``immutable_object_cache_max_size``
+
+:Description: The maximum size of the immutable object cache.
+:Type: Size
+:Required: No
+:Default: ``1G``
+
+
+``immutable_object_cache_watermark``
+
+:Description: The high-water mark for the cache. The value is between (0, 1).
+ If cache usage reaches this threshold, the daemon starts deleting
+ cold cache files based on LRU statistics.
+:Type: Float
+:Required: No
+:Default: ``0.9``
+
+The ``ceph-immutable-object-cache`` daemon is available within the optional
+``ceph-immutable-object-cache`` distribution package.
+
+.. important:: The ``ceph-immutable-object-cache`` daemon requires the ability
+ to connect to the RADOS cluster.
+
+Running the Immutable Object Cache Daemon
+-----------------------------------------
+
+The ``ceph-immutable-object-cache`` daemon should use a unique Ceph user ID.
+To `create a Ceph user`_, use the ``ceph auth get-or-create`` command and
+specify the user name, monitor caps, and OSD caps::
+
+ ceph auth get-or-create client.ceph-immutable-object-cache.{unique id} mon 'allow r' osd 'profile rbd-read-only'
+
+The ``ceph-immutable-object-cache`` daemon can be managed by ``systemd`` by specifying the user
+ID as the daemon instance::
+
+ systemctl enable ceph-immutable-object-cache@ceph-immutable-object-cache.{unique id}
+
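+For example, with a hypothetical unique ID of ``foo``::
+
+ systemctl enable ceph-immutable-object-cache@ceph-immutable-object-cache.foo
+ systemctl start ceph-immutable-object-cache@ceph-immutable-object-cache.foo
+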
+The daemon can also be run in the foreground with the ``ceph-immutable-object-cache`` command::
+
+ ceph-immutable-object-cache -f --log-file={log_path}
+
+QoS Settings
+------------
+
+The immutable object cache supports throttling, controlled by the following settings:
+
+``immutable_object_cache_qos_schedule_tick_min``
+
+:Description: Minimum schedule tick for immutable object cache.
+:Type: Milliseconds
+:Required: No
+:Default: ``50``
+
+
+``immutable_object_cache_qos_iops_limit``
+
+:Description: The desired immutable object cache IO operations limit per second.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``immutable_object_cache_qos_iops_burst``
+
+:Description: The desired burst limit of immutable object cache IO operations.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``immutable_object_cache_qos_iops_burst_seconds``
+
+:Description: The desired burst duration in seconds of immutable object cache IO operations.
+:Type: Seconds
+:Required: No
+:Default: ``1``
+
+
+``immutable_object_cache_qos_bps_limit``
+
+:Description: The desired immutable object cache IO bytes limit per second.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``immutable_object_cache_qos_bps_burst``
+
+:Description: The desired burst limit of immutable object cache IO bytes.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``immutable_object_cache_qos_bps_burst_seconds``
+
+:Description: The desired burst duration in seconds of immutable object cache IO bytes.
+:Type: Seconds
+:Required: No
+:Default: ``1``
+
+.. _Cloned RBD Images: ../rbd-snapshot/#layering
+.. _section: ../../rados/configuration/ceph-conf/#configuration-sections
+.. _create a Ceph user: ../../rados/operations/user-management#add-a-user
+
diff --git a/doc/rbd/rbd-persistent-write-log-cache.rst b/doc/rbd/rbd-persistent-write-log-cache.rst
new file mode 100644
index 000000000..af323962d
--- /dev/null
+++ b/doc/rbd/rbd-persistent-write-log-cache.rst
@@ -0,0 +1,139 @@
+================================
+ RBD Persistent Write Log Cache
+================================
+
+.. index:: Ceph Block Device; Persistent Write Log Cache
+
+Persistent Write Log Cache
+===========================
+
+The Persistent Write Log Cache (PWL) provides a persistent, fault-tolerant
+write-back cache for librbd-based RBD clients.
+
+This cache uses a log-ordered write-back design which maintains checkpoints
+internally so that writes that get flushed back to the cluster are always
+crash consistent. Even if the client cache is lost entirely, the disk image is
+still consistent but the data will appear to be stale.
+
+This cache can be used with PMEM or SSD as a cache device. For PMEM, the cache
+mode is called ``replica write log (rwl)``. At present, only local cache is
+supported, and the replica function is under development. For SSD, the cache
+mode is called ``ssd``.
+
+Usage
+=====
+
+The PWL cache manages the cache data in a persistent device. It looks for and
+creates cache files in a configured directory, and then caches data in the
+file.
+
+The PWL cache depends on the exclusive-lock feature. The cache can be loaded
+only after the exclusive lock is acquired.
+
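+The exclusive-lock feature is enabled by default on newly created images. If it
+has been disabled on an image, it can be re-enabled with the ``rbd`` tool, for
+example::
+
+ rbd feature enable {pool-name}/{image-name} exclusive-lock
+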
+The cache provides two different persistence modes. In persistent-on-write mode,
+a write completes only after the data has been persisted to the cache device,
+so it will be readable after a crash. In persistent-on-flush mode, a write
+completes as soon as the cache no longer needs the caller's data buffer, but
+the write is not guaranteed to be readable after a crash; the data is persisted
+to the cache device only when a flush request is received.
+
+The cache initially operates in persistent-on-write mode and switches to
+persistent-on-flush mode after the first flush request is received.
+
+Enable Cache
+============
+
+To enable the PWL cache, set the following configuration settings::
+
+ rbd_persistent_cache_mode = {cache-mode}
+ rbd_plugins = pwl_cache
+
+The value of ``{cache-mode}`` can be ``rwl``, ``ssd``, or ``disabled``. By
+default the cache is disabled.
+
+The ``rwl`` cache mode depends on the libpmem library (part of PMDK). It should
+be universally available on x86_64 architecture and may also be available on
+ppc64le and aarch64 architectures on some distributions. It is not available
+on s390x architecture.
+
+Here are some cache configuration settings:
+
+- ``rbd_persistent_cache_path`` The directory in which cache files are stored.
+ When using ``rwl`` mode, this directory must reside on a DAX-enabled file
+ system (see `DAX`_) to avoid performance degradation; a mounting sketch
+ follows this list.
+
+- ``rbd_persistent_cache_size`` The cache size per image. The minimum cache
+ size is 1 GB.
+
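+As a minimal sketch (the device and mount point names are hypothetical), a PMEM
+device is typically formatted and mounted with the ``dax`` option before
+``rbd_persistent_cache_path`` is pointed at it::
+
+ mkfs.xfs /dev/pmem0
+ mount -o dax /dev/pmem0 /mnt/pmem
+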
+The above settings can be configured per host, per pool, per image, and so on.
+For example, to configure them per host, add the overrides to the appropriate
+`section`_ of the host's ``ceph.conf`` file. To configure them per pool or per
+image, refer to the ``rbd config`` `commands`_, as in the sketch below.
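+
+For example (pool and image names are placeholders), the cache mode can be set
+for every image in a pool and the cache size for a single image::
+
+ rbd config pool set {pool-name} rbd_persistent_cache_mode ssd
+ rbd config image set {pool-name}/{image-name} rbd_persistent_cache_size 2G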
+
+Cache Status
+------------
+
+The PWL cache is enabled when the exclusive lock is acquired,
+and it is closed when the exclusive lock is released. To check the cache status,
+users may use the command ``rbd status``. ::
+
+ rbd status {pool-name}/{image-name}
+
+The cache status is shown, including whether the cache is present and clean,
+its size and location, and some basic metrics.
+
+For example::
+
+ $ rbd status rbd/foo
+ Watchers:
+ watcher=10.10.0.102:0/1061883624 client.25496 cookie=140338056493088
+ Persistent cache state:
+ host: sceph9
+ path: /mnt/nvme0/rbd-pwl.rbd.101e5824ad9a.pool
+ size: 1 GiB
+ mode: ssd
+ stats_timestamp: Sun Apr 10 13:26:32 2022
+ present: true empty: false clean: false
+ allocated: 509 MiB
+ cached: 501 MiB
+ dirty: 338 MiB
+ free: 515 MiB
+ hits_full: 1450 / 61%
+ hits_partial: 0 / 0%
+ misses: 924
+ hit_bytes: 192 MiB / 66%
+ miss_bytes: 97 MiB
+
+Flush Cache
+-----------
+
+To flush a cache file with ``rbd``, specify the ``persistent-cache flush``
+command, the pool name and the image name. ::
+
+ rbd persistent-cache flush {pool-name}/{image-name}
+
+If the application dies unexpectedly, this command can also be used to flush
+the cache back to OSDs.
+
+For example::
+
+ $ rbd persistent-cache flush rbd/foo
+
+Invalidate Cache
+----------------
+
+To invalidate (discard) a cache file with ``rbd``, specify the
+``persistent-cache invalidate`` command, the pool name and the image name. ::
+
+ rbd persistent-cache invalidate {pool-name}/{image-name}
+
+The command removes the cache metadata of the corresponding image, disables
+the cache feature and deletes the local cache file if it exists.
+
+For example::
+
+ $ rbd persistent-cache invalidate rbd/foo
+
+.. _section: ../../rados/configuration/ceph-conf/#configuration-sections
+.. _commands: ../../man/8/rbd#commands
+.. _DAX: https://www.kernel.org/doc/Documentation/filesystems/dax.txt
diff --git a/doc/rbd/rbd-replay.rst b/doc/rbd/rbd-replay.rst
new file mode 100644
index 000000000..b1fc4973f
--- /dev/null
+++ b/doc/rbd/rbd-replay.rst
@@ -0,0 +1,42 @@
+===================
+ RBD Replay
+===================
+
+.. index:: Ceph Block Device; RBD Replay
+
+RBD Replay is a set of tools for capturing and replaying RADOS Block Device
+(RBD) workloads. To capture an RBD workload, ``lttng-tools`` must be installed
+on the client, and ``librbd`` on the client must be the v0.87 (Giant) release
+or later. To replay an RBD workload, ``librbd`` on the client must be the Giant
+release or later.
+
+Capturing and replaying a workload takes three steps:
+
+#. Capture the trace. Make sure to capture ``pthread_id`` context::
+
+ mkdir -p traces
+ lttng create -o traces librbd
+ lttng enable-event -u 'librbd:*'
+ lttng add-context -u -t pthread_id
+ lttng start
+ # run RBD workload here
+ lttng stop
+
+#. Process the trace with `rbd-replay-prep`_::
+
+ rbd-replay-prep traces/ust/uid/*/* replay.bin
+
+#. Replay the trace with `rbd-replay`_. Use read-only until you know
+ it's doing what you want::
+
+ rbd-replay --read-only replay.bin
+
+.. important:: ``rbd-replay`` will destroy data by default. Do not use against
+ an image you wish to keep, unless you use the ``--read-only`` option.
+
+The replayed workload does not have to be against the same RBD image or even the
+same cluster as the captured workload. To account for differences, you may need
+to use the ``--pool`` and ``--map-image`` options of ``rbd-replay``.
+
+.. _rbd-replay: ../../man/8/rbd-replay
+.. _rbd-replay-prep: ../../man/8/rbd-replay-prep
diff --git a/doc/rbd/rbd-snapshot.rst b/doc/rbd/rbd-snapshot.rst
new file mode 100644
index 000000000..120dd8ec1
--- /dev/null
+++ b/doc/rbd/rbd-snapshot.rst
@@ -0,0 +1,368 @@
+===========
+ Snapshots
+===========
+
+.. index:: Ceph Block Device; snapshots
+
+A snapshot is a read-only logical copy of an image at a particular point in
+time: a checkpoint. One of the advanced features of Ceph block devices is that
+you can create snapshots of images to retain point-in-time state history. Ceph
+also supports snapshot layering, which allows you to clone images (for example,
+VM images) quickly and easily. Ceph block device snapshots are managed using
+the ``rbd`` command and several higher-level interfaces, including `QEMU`_,
+`libvirt`_, `OpenStack`_, and `CloudStack`_.
+
+.. important:: To use RBD snapshots, you must have a running Ceph cluster.
+
+
+.. note:: Because RBD is unaware of any file system within an image (volume),
+ snapshots are merely `crash-consistent` unless they are coordinated within
+ the mounting (attaching) operating system. We therefore recommend that you
+ pause or stop I/O before taking a snapshot.
+
+ If the volume contains a file system, the file system should be in an
+ internally consistent state before a snapshot is taken. Snapshots taken
+ without write quiescing could need an `fsck` pass before they are mounted
+ again. To quiesce I/O you can use `fsfreeze` command. See the `fsfreeze(8)`
+ man page for more details.
+
+ For virtual machines, `qemu-guest-agent` can be used to automatically freeze
+ file systems when creating a snapshot.
+
+.. ditaa::
+
+ +------------+ +-------------+
+ | {s} | | {s} c999 |
+ | Active |<-------*| Snapshot |
+ | Image | | of Image |
+ | (stop i/o) | | (read only) |
+ +------------+ +-------------+
+
+
+Cephx Notes
+===========
+
+When `cephx`_ authentication is enabled (it is by default), you must specify a
+user name or ID and a path to the keyring containing the corresponding key. See
+:ref:`User Management <user-management>` for details.
+
+.. prompt:: bash $
+
+ rbd --id {user-ID} --keyring /path/to/secret [commands]
+ rbd --name {username} --keyring /path/to/secret [commands]
+
+For example:
+
+.. prompt:: bash $
+
+ rbd --id admin --keyring /etc/ceph/ceph.keyring [commands]
+ rbd --name client.admin --keyring /etc/ceph/ceph.keyring [commands]
+
+.. tip:: Add the user and secret to the ``CEPH_ARGS`` environment variable to
+ avoid re-entry of these parameters.
+
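+For example (using the user and keyring path from the example above):
+
+.. prompt:: bash $
+
+ export CEPH_ARGS="--id admin --keyring /etc/ceph/ceph.keyring"
+ rbd snap ls rbd/foo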
+
+Snapshot Basics
+===============
+
+The following procedures demonstrate how to create, list, and remove
+snapshots using the ``rbd`` command.
+
+Create Snapshot
+---------------
+
+To create a snapshot, use the ``rbd snap create`` command and specify the pool
+name, the image name, and the snap name:
+
+.. prompt:: bash $
+
+ rbd snap create {pool-name}/{image-name}@{snap-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd snap create rbd/foo@snapname
+
+
+List Snapshots
+--------------
+
+To list the snapshots of an image, use the ``rbd snap ls`` command and specify
+the pool name and the image name:
+
+.. prompt:: bash $
+
+ rbd snap ls {pool-name}/{image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd snap ls rbd/foo
+
+
+Roll back Snapshot
+------------------
+
+To roll back to a snapshot, use the ``rbd snap rollback`` command and specify
+the pool name, the image name, and the snap name:
+
+.. prompt:: bash $
+
+ rbd snap rollback {pool-name}/{image-name}@{snap-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd snap rollback rbd/foo@snapname
+
+
+.. note:: Rolling back an image to a snapshot means overwriting the current
+ version of the image with data from a snapshot. The time it takes to execute
+ a rollback increases with the size of the image. It is **faster to clone**
+ from a snapshot **than to roll back** an image to a snapshot. Cloning from a
+ snapshot is the preferred method of returning to a pre-existing state.
+
+
+Delete a Snapshot
+-----------------
+
+To delete a snapshot, use the ``rbd snap rm`` command and specify the pool
+name, the image name, and the snap name:
+
+.. prompt:: bash $
+
+ rbd snap rm {pool-name}/{image-name}@{snap-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd snap rm rbd/foo@snapname
+
+
+.. note:: Ceph OSDs delete data asynchronously, so deleting a snapshot does
+ not immediately free up the capacity of the underlying OSDs. This process is
+ known as "snaptrim", and is referred to as such in ``ceph status`` output.
+
+Purge Snapshots
+---------------
+
+To delete all snapshots, use the ``rbd snap purge`` command and specify the
+pool name and the image name:
+
+.. prompt:: bash $
+
+ rbd snap purge {pool-name}/{image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd snap purge rbd/foo
+
+
+.. index:: Ceph Block Device; snapshot layering
+
+Layering
+========
+
+Ceph supports the ability to create many copy-on-write (COW) clones of a block
+device snapshot. Snapshot layering enables Ceph block device clients to create
+images very quickly. For example, you might create a block device image with a
+Linux VM written to it, snapshot the image, protect the snapshot, and create as
+many copy-on-write clones as you like. A snapshot is read-only, so cloning a
+snapshot simplifies semantics, making it possible to create clones rapidly.
+
+
+.. ditaa::
+
+ +-------------+ +-------------+
+ | {s} c999 | | {s} |
+ | Snapshot | Child refers | COW Clone |
+ | of Image |<------------*| of Snapshot |
+ | | to Parent | |
+ | (read only) | | (writable) |
+ +-------------+ +-------------+
+
+ Parent Child
+
+.. note:: The terms "parent" and "child" refer to a Ceph block device snapshot
+ (parent) and the corresponding image cloned from the snapshot (child).
+ These terms are important for the command line usage below.
+
+Each cloned image (child) stores a reference to its parent image, which enables
+the cloned image to open the parent snapshot and read it.
+
+A copy-on-write clone of a snapshot behaves exactly like any other Ceph
+block device image. You can read from, write to, clone, and resize cloned
+images. There are no special restrictions with cloned images. However, the
+copy-on-write clone of a snapshot depends on the snapshot, so you must
+protect the snapshot before you clone it. The diagram below depicts this
+process.
+
+.. note:: Ceph supports the cloning of only "RBD format 2" images (that is,
+ images created without specifying ``--image-format 1``). The Linux kernel
+ client supports cloned images beginning with the 3.10 release.
+
+Getting Started with Layering
+-----------------------------
+
+Ceph block device layering is a simple process. You must have an image. You
+must create a snapshot of the image. You must protect the snapshot. After you
+have performed these steps, you can begin cloning the snapshot.
+
+.. ditaa::
+
+ +----------------------------+ +-----------------------------+
+ | | | |
+ | Create Block Device Image |------->| Create a Snapshot |
+ | | | |
+ +----------------------------+ +-----------------------------+
+ |
+ +--------------------------------------+
+ |
+ v
+ +----------------------------+ +-----------------------------+
+ | | | |
+ | Protect the Snapshot |------->| Clone the Snapshot |
+ | | | |
+ +----------------------------+ +-----------------------------+
+
+
+The cloned image has a reference to the parent snapshot, and includes the pool
+ID, the image ID, and the snapshot ID. The inclusion of the pool ID means that
+you may clone snapshots from one pool to images in another pool.
+
+#. **Image Template:** A common use case for block device layering is to create
+ a base image and a snapshot that serves as a template for clones. For
+ example: a user may create an image for a Linux distribution (for example,
+ Ubuntu 22.04) and create a snapshot of it. The user may occasionally update
+ the image and create a new snapshot (by using such commands as ``sudo
+ apt-get update``, ``sudo apt-get upgrade``, or ``sudo apt-get dist-upgrade``
+ followed by ``rbd snap create``). As the image matures, the user can clone
+ any one of the snapshots.
+
+#. **Extended Template:** A more advanced use case includes extending a
+ template image to provide more information than a base image. For
+ example, a user may clone an image (for example, a VM template) and install
+ other software (for example, a database, a content management system, an
+ analytics system) and then snapshot the extended image, which may itself be
+ updated just like the base image.
+
+#. **Template Pool:** One way to use block device layering is to create a pool
+ that contains (1) base images that act as templates and (2) snapshots of
+ those templates. You may then extend read-only privileges to users so that
+ they may clone the snapshots even though they do not have permissions that
+ allow them to write or execute within the pool.
+
+#. **Image Migration/Recovery:** One way to use block device layering is to
+ migrate or recover data from one pool into another pool.
+
+Protecting a Snapshot
+---------------------
+
+Clones access the parent snapshots. All clones would break if a user
+inadvertently deleted the parent snapshot. To prevent data loss, you must
+protect the snapshot before you can clone it:
+
+.. prompt:: bash $
+
+ rbd snap protect {pool-name}/{image-name}@{snapshot-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd snap protect rbd/foo@snapname
+
+.. note:: You cannot delete a protected snapshot.
+
+Cloning a Snapshot
+------------------
+
+To clone a snapshot, specify the parent pool, the parent image, and the parent
+snapshot; and also the child pool together with the image name. You must
+protect the snapshot before you can clone it:
+
+.. prompt:: bash $
+
+ rbd clone {pool-name}/{parent-image-name}@{snap-name} {pool-name}/{child-image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd clone rbd/foo@snapname rbd/bar
+
+
+.. note:: You may clone a snapshot from one pool to an image in another pool.
+ For example, you may maintain read-only images and snapshots as templates in
+ one pool, and writeable clones in another pool.
+
+Unprotecting a Snapshot
+-----------------------
+
+Before you can delete a snapshot, you must first unprotect it. Additionally,
+you may *NOT* delete snapshots that have references from clones. You must
+flatten or delete each clone of a snapshot before you can unprotect the
+snapshot:
+
+.. prompt:: bash $
+
+ rbd snap unprotect {pool-name}/{image-name}@{snapshot-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd snap unprotect rbd/foo@snapname
+
+
+Listing Children of a Snapshot
+------------------------------
+
+To list the children of a snapshot, use the ``rbd children`` command and
+specify the pool name, the image name, and the snap name:
+
+.. prompt:: bash $
+
+ rbd children {pool-name}/{image-name}@{snapshot-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd children rbd/foo@snapname
+
+
+Flattening a Cloned Image
+-------------------------
+
+Cloned images retain a reference to the parent snapshot. When you remove the
+reference to the parent snapshot from the clone, you effectively "flatten" the
+clone by copying the data stored in the snapshot to the clone. The time it
+takes to flatten a clone increases with the size of the snapshot. To delete a
+snapshot, you must first flatten the child images (or delete them):
+
+.. prompt:: bash $
+
+ rbd flatten {pool-name}/{image-name}
+
+For example:
+
+.. prompt:: bash $
+
+ rbd flatten rbd/bar
+
+.. note:: Since a flattened image contains all the data stored in the snapshot,
+ a flattened image takes up more storage space than a layered clone does.
+
+
+.. _cephx: ../../rados/configuration/auth-config-ref/
+.. _QEMU: ../qemu-rbd/
+.. _OpenStack: ../rbd-openstack/
+.. _CloudStack: ../rbd-cloudstack/
+.. _libvirt: ../libvirt/
diff --git a/doc/rbd/rbd-windows.rst b/doc/rbd/rbd-windows.rst
new file mode 100644
index 000000000..df4bd172e
--- /dev/null
+++ b/doc/rbd/rbd-windows.rst
@@ -0,0 +1,235 @@
+==============
+RBD on Windows
+==============
+
+The ``rbd`` command can be used to create, remove, import, export, map or
+unmap images exactly like it would on Linux. Make sure to check the
+`RBD basic commands`_ guide.
+
+``librbd.dll`` is also available for applications that can natively use Ceph.
+
+Please check the `installation guide`_ to get started.
+
+Windows service
+===============
+On Windows, ``rbd-wnbd`` daemons are managed by a centralized service. This allows
+decoupling the daemons from the Windows session from which they originate. At
+the same time, the service is responsible of recreating persistent mappings,
+usually when the host boots.
+
+Note that only one such service may run per host.
+
+By default, all image mappings are persistent. Non-persistent mappings can be
+requested using the ``-onon-persistent`` ``rbd`` flag.
+
+Persistent mappings are recreated when the service starts, unless explicitly
+unmapped. The service disconnects the mappings when being stopped. This also
+allows adjusting the Windows service start order so that RBD images can be
+mapped before starting services that may depend on them, such as VMMS.
+
+In order to be able to reconnect the images, ``rbd-wnbd`` stores mapping
+information in the Windows registry at the following location:
+``SYSTEM\CurrentControlSet\Services\rbd-wnbd``.
+
+The following command can be used to configure the service. Please update
+the ``rbd-wnbd.exe`` path accordingly::
+
+ New-Service -Name "ceph-rbd" `
+ -Description "Ceph RBD Mapping Service" `
+ -BinaryPathName "c:\ceph\rbd-wnbd.exe service" `
+ -StartupType Automatic
+
+Note that the Ceph MSI installer takes care of creating the ``ceph-rbd``
+Windows service.
+
+Usage
+=====
+
+Integration
+-----------
+
+RBD images can be exposed to the OS and host Windows partitions or they can be
+attached to Hyper-V VMs in the same way as iSCSI disks.
+
+Starting with OpenStack Wallaby, the Nova Hyper-V driver can attach RBD Cinder
+volumes to Hyper-V VMs.
+
+Mapping images
+--------------
+
+The workflow and CLI is similar to the Linux counterpart, with a few
+notable differences:
+
+* device paths cannot be requested. The disk number and path will be picked by
+ Windows. If a device path is provided by the user when mapping an image, it
+ will be used as an identifier, which can also be used when unmapping the
+ image.
+* the ``show`` command was added, which describes a specific mapping.
+ This can be used for retrieving the disk path.
+* the ``service`` command was added, allowing ``rbd-wnbd`` to run as a Windows service.
+ All mappings are by default persistent, being recreated when the service
+ stops, unless explicitly unmapped. The service disconnects the mappings
+ when being stopped.
+* the ``list`` command also includes a ``status`` column.
+
+The purpose of the ``service`` mode is to ensure that mappings survive reboots
+and that the Windows service start order can be adjusted so that RBD images can
+be mapped before starting services that may depend on them, such as VMMS.
+
+The mapped images can either be consumed by the host directly or exposed to
+Hyper-V VMs.
+
+Hyper-V VM disks
+----------------
+
+The following sample imports an RBD image and boots a Hyper-V VM using it::
+
+ # Feel free to use any other image. This one is convenient to use for
+ # testing purposes because it's very small (~15MB) and the login prompt
+ # prints the pre-configured password.
+ wget http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img `
+ -OutFile cirros-0.5.1-x86_64-disk.img
+
+ # We'll need to make sure that the imported images are raw (so no qcow2 or vhdx).
+ # You may get qemu-img from https://cloudbase.it/qemu-img-windows/
+ # You can add the extracted location to $env:Path or update the path accordingly.
+ qemu-img convert -O raw cirros-0.5.1-x86_64-disk.img cirros-0.5.1-x86_64-disk.raw
+
+ rbd import cirros-0.5.1-x86_64-disk.raw
+ # Let's give it a hefty 100MB size.
+ rbd resize cirros-0.5.1-x86_64-disk.raw --size=100MB
+
+ rbd device map cirros-0.5.1-x86_64-disk.raw
+
+ # Let's have a look at the mappings.
+ rbd device list
+ Get-Disk
+
+ $mappingJson = rbd-wnbd show cirros-0.5.1-x86_64-disk.raw --format=json
+ $mappingJson = $mappingJson | ConvertFrom-Json
+
+ $diskNumber = $mappingJson.disk_number
+
+ New-VM -VMName BootFromRBD -MemoryStartupBytes 512MB
+ # The disk must be turned offline before it can be passed to Hyper-V VMs
+ Set-Disk -Number $diskNumber -IsOffline $true
+ Add-VMHardDiskDrive -VMName BootFromRBD -DiskNumber $diskNumber
+ Start-VM -VMName BootFromRBD
+
+Windows partitions
+------------------
+
+The following sample creates an empty RBD image, attaches it to the host and
+initializes a partition::
+
+ rbd create blank_image --size=1G
+ rbd device map blank_image -onon-persistent
+
+ $mappingJson = rbd-wnbd show blank_image --format=json
+ $mappingJson = $mappingJson | ConvertFrom-Json
+
+ $diskNumber = $mappingJson.disk_number
+
+ # The disk must be online before creating or accessing partitions.
+ Set-Disk -Number $diskNumber -IsOffline $false
+
+ # Initialize the disk, partition it and create a filesystem.
+ Get-Disk -Number $diskNumber | `
+ Initialize-Disk -PassThru | `
+ New-Partition -AssignDriveLetter -UseMaximumSize | `
+ Format-Volume -Force -Confirm:$false
+
+ # Show the partition letter (for example, "D:" or "F:"):
+ (Get-Partition -DiskNumber $diskNumber).DriveLetter
+
+SAN policy
+----------
+
+The Windows SAN policy determines which disks will be automatically mounted.
+The default policy (``offlineShared``) specifies that:
+
+ All newly discovered disks that do not reside on a shared bus (such as SCSI
+ and iSCSI) are brought online and made read-write. Disks that are left
+ offline will be read-only by default.
+
+Note that recent WNBD driver versions report rbd-wnbd disks as SAS, which is
+also considered a shared bus. As a result, the disks will be offline and
+read-only by default.
+
+In order to turn a disk online (mounting the disk partitions) and clear the
+read-only flag, use the following commands::
+
+ Set-Disk -Number $diskNumber -IsOffline $false
+ Set-Disk -Number $diskNumber -IsReadOnly $false
+
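+Alternatively, the host-wide SAN policy itself can be changed. As a sketch, the
+``Set-StorageSetting`` cmdlet referenced below can set a policy such as
+``OnlineAll``, which brings all newly discovered disks online automatically::
+
+ Set-StorageSetting -NewDiskPolicy OnlineAll
+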
+Please check the `Limitations`_ section to learn about the Windows limitations
+that affect automatically mounted disks.
+
+Windows documentation:
+
+* `SAN policy reference`_
+* `san command`_
+* `StorageSetting command`_
+
+Limitations
+-----------
+
+CSV support
+~~~~~~~~~~~
+
+At the moment, the Microsoft Failover Cluster can't use WNBD disks as
+Cluster Shared Volumes (CSVs) underlying storage. The main reason is that
+``WNBD`` and ``rbd-wnbd`` don't support the *SCSI Persistent Reservations*
+feature yet.
+
+Hyper-V disk addressing
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+ Hyper-V identifies passthrough VM disks by number instead of SCSI ID, although
+ the disk number can change across host reboots. This means that the VMs can end
+ up using incorrect disks after rebooting the host, which is an important
+ security concern. This issue also affects iSCSI and Fibre Channel disks.
+
+There are a few possible ways of avoiding this Hyper-V limitation:
+
+* use an NTFS/ReFS partition to store VHDX image files instead of directly
+ attaching the RBD image. This may slightly impact the IO performance.
+* use the Hyper-V ``AutomaticStartAction`` setting to prevent the VMs from
+ booting with the incorrect disks and have a script that updates VM disks
+ attachments before powering them back on. The ``ElementName`` field of the
+ `Msvm_StorageAllocationSettingData`_ `WMI`_ class may be used to label VM
+ disk attachments.
+* use the OpenStack Hyper-V driver, which automatically refreshes the VM disk
+ attachments before powering them back on.
+
+Automatically mounted disks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Disks that are marked as "online" or "writable" will remain so after being
+reconnected (e.g. due to host reboots, Ceph service restarts, etc).
+
+Unfortunately, Windows restores the disk status based on the disk number,
+ignoring the disk unique identifier. However, the disk numbers can change
+after being reconnected. This issue also affects iSCSI and Fibre Channel disks.
+
+Let's assume that the `SAN policy`_ is set to ``offlineShared``, three
+RBD images are attached and disk 1 is turned online. After a reboot, disk 1
+will become online but it may now correspond to a different RBD image. This can
+be an issue if the disk that was mounted on the host was actually meant for a
+VM.
+
+Troubleshooting
+===============
+
+Please consult the `Windows troubleshooting`_ page.
+
+.. _Windows troubleshooting: ../../install/windows-troubleshooting
+.. _installation guide: ../../install/windows-install
+.. _RBD basic commands: ../rados-rbd-cmds
+.. _WNBD driver: https://github.com/cloudbase/wnbd
+.. _Msvm_StorageAllocationSettingData: https://docs.microsoft.com/en-us/windows/win32/hyperv_v2/msvm-storageallocationsettingdata
+.. _WMI: https://docs.microsoft.com/en-us/windows/win32/wmisdk/wmi-start-page
+.. _san command: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/san
+.. _StorageSetting command: https://learn.microsoft.com/en-us/powershell/module/storage/set-storagesetting?view=windowsserver2022-ps
+.. _SAN policy reference: https://learn.microsoft.com/en-us/windows-hardware/customize/desktop/unattend/microsoft-windows-partitionmanager-sanpolicy