author    Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-27 18:24:20 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-27 18:24:20 +0000
commit    483eb2f56657e8e7f419ab1a4fab8dce9ade8609 (patch)
tree      e5d88d25d870d5dedacb6bbdbe2a966086a0a5cf /doc/rbd
parent    Initial commit.
Adding upstream version 14.2.21.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat
 doc/rbd/api/index.rst                       |    8
 doc/rbd/api/librbdpy.rst                    |   83
 doc/rbd/disk.conf                           |    8
 doc/rbd/index.rst                           |   73
 doc/rbd/iscsi-initiator-esx.rst             |  105
 doc/rbd/iscsi-initiator-linux.rst           |   91
 doc/rbd/iscsi-initiator-win.rst             |  102
 doc/rbd/iscsi-initiators.rst                |   23
 doc/rbd/iscsi-monitoring.rst                |   85
 doc/rbd/iscsi-overview.rst                  |   52
 doc/rbd/iscsi-requirements.rst              |   50
 doc/rbd/iscsi-target-ansible.rst            |  343
 doc/rbd/iscsi-target-cli-manual-install.rst |  190
 doc/rbd/iscsi-target-cli.rst                |  235
 doc/rbd/iscsi-targets.rst                   |   27
 doc/rbd/libvirt.rst                         |  321
 doc/rbd/man/index.rst                       |   16
 doc/rbd/qemu-rbd.rst                        |  220
 doc/rbd/rados-rbd-cmds.rst                  |  224
 doc/rbd/rbd-cloudstack.rst                  |  157
 doc/rbd/rbd-config-ref.rst                  |  354
 doc/rbd/rbd-ko.rst                          |   59
 doc/rbd/rbd-live-migration.rst              |  157
 doc/rbd/rbd-mirroring.rst                   |  410
 doc/rbd/rbd-openstack.rst                   |  514
 doc/rbd/rbd-replay.rst                      |   42
 doc/rbd/rbd-snapshot.rst                    |  314
 27 files changed, 4263 insertions(+), 0 deletions(-)
diff --git a/doc/rbd/api/index.rst b/doc/rbd/api/index.rst
new file mode 100644
index 00000000..27bb4485
--- /dev/null
+++ b/doc/rbd/api/index.rst
@@ -0,0 +1,8 @@
+========================
+ Ceph Block Device APIs
+========================
+
+.. toctree::
+ :maxdepth: 2
+
+ librbd (Python) <librbdpy>
diff --git a/doc/rbd/api/librbdpy.rst b/doc/rbd/api/librbdpy.rst
new file mode 100644
index 00000000..981235f8
--- /dev/null
+++ b/doc/rbd/api/librbdpy.rst
@@ -0,0 +1,83 @@
+================
+ Librbd (Python)
+================
+
+.. highlight:: python
+
+The `rbd` python module provides file-like access to RBD images.
+
+
+Example: Creating and writing to an image
+=========================================
+
+To use `rbd`, you must first connect to RADOS and open an IO
+context::
+
+ cluster = rados.Rados(conffile='my_ceph.conf')
+ cluster.connect()
+ ioctx = cluster.open_ioctx('mypool')
+
+Then you instantiate an :class:`rbd.RBD` object, which you use to create the
+image::
+
+ rbd_inst = rbd.RBD()
+ size = 4 * 1024**3 # 4 GiB
+ rbd_inst.create(ioctx, 'myimage', size)
+
+To perform I/O on the image, you instantiate an :class:`rbd.Image` object::
+
+ image = rbd.Image(ioctx, 'myimage')
+ data = 'foo' * 200
+ image.write(data, 0)
+
+This writes 'foo' to the first 600 bytes of the image. Note that data
+cannot be :type:`unicode` - `Librbd` does not know how to deal with
+characters wider than a :c:type:`char`.
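+
+For example, under Python 3 the data passed to ``write()`` should be
+``bytes`` (a minimal sketch)::
+
+    data = b'foo' * 200   # bytes, not a unicode str
+    image.write(data, 0)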
+
+In the end, you will want to close the image, the IO context and the connection to RADOS::
+
+ image.close()
+ ioctx.close()
+ cluster.shutdown()
+
+To be safe, each of these calls would need to be in a separate ``finally``
+block::
+
+    cluster = rados.Rados(conffile='my_ceph_conf')
+    try:
+        cluster.connect()
+        ioctx = cluster.open_ioctx('my_pool')
+        try:
+            rbd_inst = rbd.RBD()
+            size = 4 * 1024**3  # 4 GiB
+            rbd_inst.create(ioctx, 'myimage', size)
+            image = rbd.Image(ioctx, 'myimage')
+            try:
+                data = 'foo' * 200
+                image.write(data, 0)
+            finally:
+                image.close()
+        finally:
+            ioctx.close()
+    finally:
+        cluster.shutdown()
+
+This can be cumbersome, so the :class:`Rados`, :class:`Ioctx`, and
+:class:`Image` classes can be used as context managers that close/shutdown
+automatically (see :pep:`343`). Using them as context managers, the
+above example becomes::
+
+    with rados.Rados(conffile='my_ceph.conf') as cluster:
+        with cluster.open_ioctx('mypool') as ioctx:
+            rbd_inst = rbd.RBD()
+            size = 4 * 1024**3  # 4 GiB
+            rbd_inst.create(ioctx, 'myimage', size)
+            with rbd.Image(ioctx, 'myimage') as image:
+                data = 'foo' * 200
+                image.write(data, 0)
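+
+Data can be read back with :meth:`rbd.Image.read`, which takes an offset
+and a length in bytes. For example (a minimal sketch, reusing the image
+created above)::
+
+    with rados.Rados(conffile='my_ceph.conf') as cluster:
+        with cluster.open_ioctx('mypool') as ioctx:
+            with rbd.Image(ioctx, 'myimage') as image:
+                data = image.read(0, 600)   # read back the first 600 bytes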
+
+API Reference
+=============
+
+.. automodule:: rbd
+ :members: RBD, Image, SnapIterator
diff --git a/doc/rbd/disk.conf b/doc/rbd/disk.conf
new file mode 100644
index 00000000..3db9b8a1
--- /dev/null
+++ b/doc/rbd/disk.conf
@@ -0,0 +1,8 @@
+<disk type='network' device='disk'>
+ <source protocol='rbd' name='poolname/imagename'>
+ <host name='{fqdn}' port='6789'/>
+ <host name='{fqdn}' port='6790'/>
+ <host name='{fqdn}' port='6791'/>
+ </source>
+ <target dev='vda' bus='virtio'/>
+</disk>
diff --git a/doc/rbd/index.rst b/doc/rbd/index.rst
new file mode 100644
index 00000000..2a410c37
--- /dev/null
+++ b/doc/rbd/index.rst
@@ -0,0 +1,73 @@
+===================
+ Ceph Block Device
+===================
+
+.. index:: Ceph Block Device; introduction
+
+A block is a sequence of bytes (for example, a 512-byte block of data).
+Block-based storage interfaces are the most common way to store data with
+rotating media such as hard disks, CDs, floppy disks, and even traditional
+9-track tape. The ubiquity of block device interfaces makes a virtual block
+device an ideal candidate to interact with a mass data storage system like Ceph.
+
+Ceph block devices are thin-provisioned, resizable and store data striped over
+multiple OSDs in a Ceph cluster. Ceph block devices leverage
+:abbr:`RADOS (Reliable Autonomic Distributed Object Store)` capabilities
+such as snapshotting, replication and consistency. Ceph's
+:abbr:`RADOS (Reliable Autonomic Distributed Object Store)` Block Devices (RBD)
+interact with OSDs using kernel modules or the ``librbd`` library.
+
+.. ditaa::
+
+ +------------------------+ +------------------------+
+ | Kernel Module | | librbd |
+ +------------------------+-+------------------------+
+ | RADOS Protocol |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. note:: Kernel modules can use Linux page caching. For ``librbd``-based
+ applications, Ceph supports `RBD Caching`_.
+
+Ceph's block devices deliver high performance with infinite scalability to
+`kernel modules`_, or to :abbr:`KVMs (kernel virtual machines)` such as `QEMU`_, and
+cloud-based computing systems like `OpenStack`_ and `CloudStack`_ that rely on
+libvirt and QEMU to integrate with Ceph block devices. You can use the same cluster
+to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
+:ref:`CephFS filesystem <ceph-filesystem>`, and Ceph block devices simultaneously.
+
+.. important:: To use Ceph Block Devices, you must have access to a running
+ Ceph cluster.
+
+.. toctree::
+ :maxdepth: 1
+
+ Commands <rados-rbd-cmds>
+ Kernel Modules <rbd-ko>
+ Snapshots <rbd-snapshot>
+ Mirroring <rbd-mirroring>
+ Live-Migration <rbd-live-migration>
+ LIO iSCSI Gateway <iscsi-overview>
+ QEMU <qemu-rbd>
+ libvirt <libvirt>
+ librbd Settings <rbd-config-ref/>
+ OpenStack <rbd-openstack>
+ CloudStack <rbd-cloudstack>
+ RBD Replay <rbd-replay>
+
+.. toctree::
+ :maxdepth: 2
+
+ Manpages <man/index>
+
+.. toctree::
+ :maxdepth: 2
+
+ APIs <api/index>
+
+.. _RBD Caching: ./rbd-config-ref/
+.. _kernel modules: ./rbd-ko/
+.. _QEMU: ./qemu-rbd/
+.. _OpenStack: ./rbd-openstack
+.. _CloudStack: ./rbd-cloudstack
diff --git a/doc/rbd/iscsi-initiator-esx.rst b/doc/rbd/iscsi-initiator-esx.rst
new file mode 100644
index 00000000..41c144dd
--- /dev/null
+++ b/doc/rbd/iscsi-initiator-esx.rst
@@ -0,0 +1,105 @@
+------------------------------
+iSCSI Initiator for VMware ESX
+------------------------------
+
+**Prerequisite:**
+
+- VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6.
+
+**iSCSI Discovery and Multipath Device Setup:**
+
+The following instructions will use the default vSphere web client and esxcli.
+
+#. Enable Software iSCSI
+
+ .. image:: ../images/esx_web_client_storage_main.png
+ :align: center
+
+ Click on "Storage" from "Navigator", and select the "Adapters" tab.
+ From there, right click "Configure iSCSI".
+
+#. Set Initiator Name
+
+ .. image:: ../images/esx_config_iscsi_main.png
+ :align: center
+
+ If the initiator name in the "Name & alias" section is not the same name
+ used when creating the client during gwcli setup or the initiator name used
+ in the ansible client_connections client variable, then ssh to the ESX
+ host and run the following esxcli commands to change the name.
+
+ Get the adapter name for Software iSCSI:
+
+ ::
+
+ > esxcli iscsi adapter list
+ > Adapter Driver State UID Description
+ > ------- --------- ------ ------------- ----------------------
+ > vmhba64 iscsi_vmk online iscsi.vmhba64 iSCSI Software Adapter
+
+ In this example the software iSCSI adapter is vmhba64 and the initiator
+ name is iqn.1994-05.com.redhat:rh7-client:
+
+ ::
+
+ > esxcli iscsi adapter set -A vmhba64 -n iqn.1994-05.com.redhat:rh7-client
+
+#. Setup CHAP
+
+ .. image:: ../images/esx_chap.png
+ :align: center
+
+ Expand the CHAP authentication section, select "Do not use CHAP unless
+ required by target" and enter the CHAP credentials used in the gwcli
+ auth command or ansible client_connections credentials variable.
+
+ The Mutual CHAP authentication section should have "Do not use CHAP"
+ selected.
+
+ Warning: There is a bug in the web client where the requested CHAP
+ settings are not always used initially. On the iSCSI gateway kernel
+ logs you will see the error:
+
+ ::
+
+ > kernel: CHAP user or password not set for Initiator ACL
+ > kernel: Security negotiation failed.
+ > kernel: iSCSI Login negotiation failed.
+
+ To work around this, set the CHAP settings with the esxcli command. Here
+ authname is the username and secret is the password used in previous
+ examples:
+
+ ::
+
+ > esxcli iscsi adapter auth chap set --direction=uni --authname=myiscsiusername --secret=myiscsipassword --level=discouraged -A vmhba64
+
+#. Configure iSCSI Settings
+
+ .. image:: ../images/esx_iscsi_recov_timeout.png
+ :align: center
+
+ Expand Advanced settings and set the "RecoveryTimeout" to 25.
+
+#. Set the discovery address
+
+ .. image:: ../images/esx_config_iscsi_main.png
+ :align: center
+
+ In the Dynamic targets section, click "Add dynamic target" and under
+ Addresses add one of the gateway IP addresses added during the iSCSI
+ gateway setup stage in the gwcli section or an IP set in the ansible
+ gateway_ip_list variable. Only one address needs to be added because the
+ gateways have been set up so that all the iSCSI portals are returned during
+ discovery.
+
+ Finally, click the "Save configuration" button. In the Devices tab, you
+ should see the RBD image.
+
+ The LUN should be automatically configured to use the ALUA SATP and
+ MRU PSP. Other SATPs and PSPs must not be used. This can be verified with
+ the esxcli command:
+
+ ::
+
+ > esxcli storage nmp path list -d eui.your_devices_id
+
diff --git a/doc/rbd/iscsi-initiator-linux.rst b/doc/rbd/iscsi-initiator-linux.rst
new file mode 100644
index 00000000..ba374c40
--- /dev/null
+++ b/doc/rbd/iscsi-initiator-linux.rst
@@ -0,0 +1,91 @@
+-------------------------
+iSCSI Initiator for Linux
+-------------------------
+
+**Prerequisite:**
+
+- Package ``iscsi-initiator-utils``
+
+- Package ``device-mapper-multipath``
+
+**Installing:**
+
+Install the iSCSI initiator and multipath tools:
+
+ ::
+
+ # yum install iscsi-initiator-utils
+ # yum install device-mapper-multipath
+
+**Configuring:**
+
+#. Create the default ``/etc/multipath.conf`` file and enable the
+ ``multipathd`` service:
+
+ ::
+
+ # mpathconf --enable --with_multipathd y
+
+#. Add the following to ``/etc/multipath.conf`` file:
+
+ ::
+
+   devices {
+           device {
+                   vendor                   "LIO-ORG"
+                   hardware_handler         "1 alua"
+                   path_grouping_policy     "failover"
+                   path_selector            "queue-length 0"
+                   failback                 60
+                   path_checker             tur
+                   prio                     alua
+                   prio_args                exclusive_pref_bit
+                   fast_io_fail_tmo         25
+                   no_path_retry            queue
+           }
+   }
+
+#. Reload the ``multipathd`` service:
+
+ ::
+
+ # systemctl reload multipathd
+
+**iSCSI Discovery and Setup:**
+
+#. If CHAP was set up on the iSCSI gateway, provide a CHAP username and
+ password by updating the ``/etc/iscsi/iscsid.conf`` file accordingly.
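+
+   For example, the CHAP settings in ``/etc/iscsi/iscsid.conf`` would
+   typically look like the following (shown with the example credentials
+   used in the gwcli section of this guide):
+
+   ::
+
+      node.session.auth.authmethod = CHAP
+      node.session.auth.username = myiscsiusername
+      node.session.auth.password = myiscsipassword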
+
+#. Discover the target portals:
+
+ ::
+
+ # iscsiadm -m discovery -t st -p 192.168.56.101
+ 192.168.56.101:3260,1 iqn.2003-01.org.linux-iscsi.rheln1
+ 192.168.56.102:3260,2 iqn.2003-01.org.linux-iscsi.rheln1
+
+#. Login to target:
+
+ ::
+
+ # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.rheln1 -l
+
+**Multipath IO Setup:**
+
+The multipath daemon (``multipathd``) will set up devices automatically
+based on the ``multipath.conf`` settings. Running the ``multipath``
+command shows devices set up in a failover configuration with a priority
+group for each path.
+
+::
+
+ # multipath -ll
+ mpathbt (360014059ca317516a69465c883a29603) dm-1 LIO-ORG ,IBLOCK
+ size=1.0G features='0' hwhandler='1 alua' wp=rw
+ |-+- policy='queue-length 0' prio=50 status=active
+ | `- 28:0:0:1 sde 8:64 active ready running
+ `-+- policy='queue-length 0' prio=10 status=enabled
+ `- 29:0:0:1 sdc 8:32 active ready running
+
+You should now be able to use the RBD image like you would a normal
+multipath’d iSCSI disk.
diff --git a/doc/rbd/iscsi-initiator-win.rst b/doc/rbd/iscsi-initiator-win.rst
new file mode 100644
index 00000000..42df1086
--- /dev/null
+++ b/doc/rbd/iscsi-initiator-win.rst
@@ -0,0 +1,102 @@
+-------------------------------------
+iSCSI Initiator for Microsoft Windows
+-------------------------------------
+
+**Prerequisite:**
+
+- Microsoft Windows Server 2016
+
+**iSCSI Initiator, Discovery and Setup:**
+
+#. Install the iSCSI initiator driver and MPIO tools.
+
+#. Launch the MPIO program, click on the "Discover Multi-Paths" tab, check the
+ "Add support for iSCSI devices” box, and click "Add". This will require a
+ reboot.
+
+#. On the iSCSI Initiator Properties window, on the "Discovery" tab, add a target
+ portal. Enter the IP address or DNS name and Port of the Ceph iSCSI gateway.
+
+#. On the “Targets” tab, select the target and click on “Connect”.
+
+#. On the “Connect To Target” window, select the “Enable multi-path” option, and
+ click the “Advanced” button.
+
+#. Under the "Connect using" section, select a “Target portal IP”. Select the
+ “Enable CHAP login on” and enter the "Name" and "Target secret" values from the
+ Ceph iSCSI Ansible client credentials section, and click OK.
+
+#. Repeat steps 5 and 6 for each target portal defined when setting up
+ the iSCSI gateway.
+
+**Multipath IO Setup:**
+
+The MPIO load balancing policy, timeout, and retry options are configured
+from PowerShell with the ``mpclaim`` command. The
+rest is done in the iSCSI Initiator tool.
+
+.. note::
+ It is recommended to increase the ``PDORemovePeriod`` option to 120
+ seconds from PowerShell. This value might need to be adjusted based
+ on the application. When all paths are down, and 120 seconds
+ expires, the operating system will start failing IO requests.
+
+::
+
+ Set-MPIOSetting -NewPDORemovePeriod 120
+
+::
+
+ mpclaim.exe -l -m 1
+
+::
+
+ mpclaim -s -m
+ MSDSM-wide Load Balance Policy: Fail Over Only
+
+#. Using the iSCSI Initiator tool, from the “Targets” tab, click on
+ the “Devices...” button.
+
+#. From the Devices window, select a disk and click the
+ “MPIO...” button.
+
+#. On the "Device Details" window the paths to each target portal are
+ displayed. If using the ``ceph-ansible`` setup method, the
+ iSCSI gateway will use ALUA to tell the iSCSI initiator which path
+ and iSCSI gateway should be used as the primary path. The Load
+ Balancing Policy “Fail Over Only” must be selected.
+
+::
+
+ mpclaim -s -d $MPIO_DISK_ID
+
+.. note::
+ For the ``ceph-ansible`` setup method, there will be one
+ Active/Optimized path which is the path to the iSCSI gateway node
+ that owns the LUN, and there will be an Active/Unoptimized path for
+ each other iSCSI gateway node.
+
+**Tuning:**
+
+Consider using the following registry settings:
+
+- Windows Disk Timeout
+
+ ::
+
+ HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk
+
+ ::
+
+ TimeOutValue = 65
+
+- Microsoft iSCSI Initiator Driver
+
+ ::
+
+ HKEY_LOCAL_MACHINE\\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance_Number>\Parameters
+
+ ::
+
+ LinkDownTime = 25
+ SRBTimeoutDelta = 15
diff --git a/doc/rbd/iscsi-initiators.rst b/doc/rbd/iscsi-initiators.rst
new file mode 100644
index 00000000..67447789
--- /dev/null
+++ b/doc/rbd/iscsi-initiators.rst
@@ -0,0 +1,23 @@
+--------------------------------
+Configuring the iSCSI Initiators
+--------------------------------
+
+- `iSCSI Initiator for Linux <../iscsi-initiator-linux>`_
+
+- `iSCSI Initiator for Microsoft Windows <../iscsi-initiator-win>`_
+
+- `iSCSI Initiator for VMware ESX <../iscsi-initiator-esx>`_
+
+ .. warning::
+
+ Applications that use SCSI persistent group reservations (PGR) and
+ SCSI 2 based reservations are not supported when exporting an RBD image
+ through more than one iSCSI gateway.
+
+.. toctree::
+ :maxdepth: 1
+ :hidden:
+
+ Linux <iscsi-initiator-linux>
+ Microsoft Windows <iscsi-initiator-win>
+ VMware ESX <iscsi-initiator-esx>
diff --git a/doc/rbd/iscsi-monitoring.rst b/doc/rbd/iscsi-monitoring.rst
new file mode 100644
index 00000000..8a15dd73
--- /dev/null
+++ b/doc/rbd/iscsi-monitoring.rst
@@ -0,0 +1,85 @@
+-----------------------------
+Monitoring the iSCSI gateways
+-----------------------------
+
+Ceph provides an additional tool for iSCSI gateway environments
+to monitor performance of exported RADOS Block Device (RBD) images.
+
+The ``gwtop`` tool is a ``top``-like tool that displays aggregated
+performance metrics of RBD images that are exported to clients over
+iSCSI. The metrics are sourced from a Performance Metrics Domain Agent
+(PMDA). Information from the Linux-IO target (LIO) PMDA is used to list
+each exported RBD image with the connected client and its associated I/O
+metrics.
+
+**Requirements:**
+
+- A running Ceph iSCSI gateway
+
+**Installing:**
+
+#. As ``root``, install the ``ceph-iscsi-tools`` package on each iSCSI
+ gateway node:
+
+ ::
+
+ # yum install ceph-iscsi-tools
+
+#. As ``root``, install the performance co-pilot package on each iSCSI
+ gateway node:
+
+ ::
+
+ # yum install pcp
+
+#. As ``root``, install the LIO PMDA package on each iSCSI gateway node:
+
+ ::
+
+ # yum install pcp-pmda-lio
+
+#. As ``root``, enable and start the performance co-pilot service on
+ each iSCSI gateway node:
+
+ ::
+
+ # systemctl enable pmcd
+ # systemctl start pmcd
+
+#. As ``root``, register the ``pcp-pmda-lio`` agent:
+
+ ::
+
+ cd /var/lib/pcp/pmdas/lio
+ ./Install
+
+By default, ``gwtop`` assumes the iSCSI gateway configuration object is
+stored in a RADOS object called ``gateway.conf`` in the ``rbd`` pool.
+This configuration defines the iSCSI gateways to contact for gathering
+the performance statistics. This can be overridden by using either the
+``-g`` or ``-c`` flags. See ``gwtop --help`` for more details.
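+
+For example, to point ``gwtop`` at a specific set of gateways instead of
+the default configuration object, an invocation might look like the
+following (an illustrative sketch; confirm the exact option syntax with
+``gwtop --help``):
+
+::
+
+   # gwtop -g ceph-igw-1,ceph-igw-2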
+
+The LIO configuration determines which type of performance statistics to
+extract from performance co-pilot. When ``gwtop`` starts it looks at the
+LIO configuration, and if it finds user-space disks, then ``gwtop``
+selects the LIO collector automatically.
+
+**Example ``gwtop`` Outputs**
+
+::
+
+ gwtop 2/2 Gateways CPU% MIN: 4 MAX: 5 Network Total In: 2M Out: 3M 10:20:00
+ Capacity: 8G Disks: 8 IOPS: 503 Clients: 1 Ceph: HEALTH_OK OSDs: 3
+ Pool.Image Src Size iops rMB/s wMB/s Client
+ iscsi.t1703 500M 0 0.00 0.00
+ iscsi.testme1 500M 0 0.00 0.00
+ iscsi.testme2 500M 0 0.00 0.00
+ iscsi.testme3 500M 0 0.00 0.00
+ iscsi.testme5 500M 0 0.00 0.00
+ rbd.myhost_1 T 4G 504 1.95 0.00 rh460p(CON)
+ rbd.test_2 1G 0 0.00 0.00
+ rbd.testme 500M 0 0.00 0.00
+
+In the *Client* column, ``(CON)`` means the iSCSI initiator (client) is
+currently logged into the iSCSI gateway. If ``-multi-`` is displayed,
+then multiple clients are mapped to the single RBD image.
diff --git a/doc/rbd/iscsi-overview.rst b/doc/rbd/iscsi-overview.rst
new file mode 100644
index 00000000..034dc1bc
--- /dev/null
+++ b/doc/rbd/iscsi-overview.rst
@@ -0,0 +1,52 @@
+.. _ceph-iscsi:
+
+==================
+Ceph iSCSI Gateway
+==================
+
+The iSCSI gateway integrates Ceph Storage with the iSCSI standard to provide
+a Highly Available (HA) iSCSI target that exports RADOS Block Device (RBD) images
+as SCSI disks. The iSCSI protocol allows clients (initiators) to send SCSI commands
+to SCSI storage devices (targets) over a TCP/IP network. This allows for heterogeneous
+clients, such as Microsoft Windows, to access the Ceph Storage cluster.
+
+Each iSCSI gateway runs the Linux IO target kernel subsystem (LIO) to provide the
+iSCSI protocol support. LIO utilizes a userspace passthrough (TCMU) to interact
+with Ceph's librbd library and expose RBD images to iSCSI clients. With Ceph’s
+iSCSI gateway you can effectively run a fully integrated block-storage
+infrastructure with all the features and benefits of a conventional Storage Area
+Network (SAN).
+
+.. ditaa::
+ Cluster Network
+ +-------------------------------------------+
+ | | | |
+ +-------+ +-------+ +-------+ +-------+
+ | | | | | | | |
+ | OSD 1 | | OSD 2 | | OSD 3 | | OSD N |
+ | {s}| | {s}| | {s}| | {s}|
+ +-------+ +-------+ +-------+ +-------+
+ | | | |
+ +--------->| | +---------+ | |<---------+
+ : | | | RBD | | | :
+ | +----------------| Image |----------------+ |
+ | Public Network | {d} | |
+ | +---------+ |
+ | |
+ | +-------------------+ |
+ | +--------------+ | iSCSI Initiators | +--------------+ |
+ | | iSCSI GW | | +-----------+ | | iSCSI GW | |
+ +-->| RBD Module |<--+ | Various | +-->| RBD Module |<--+
+ | | | | Operating | | | |
+ +--------------+ | | Systems | | +--------------+
+ | +-----------+ |
+ +-------------------+
+
+
+.. toctree::
+ :maxdepth: 1
+
+ Requirements <iscsi-requirements>
+ Configuring the iSCSI Target <iscsi-targets>
+ Configuring the iSCSI Initiators <iscsi-initiators>
+ Monitoring the iSCSI Gateways <iscsi-monitoring>
diff --git a/doc/rbd/iscsi-requirements.rst b/doc/rbd/iscsi-requirements.rst
new file mode 100644
index 00000000..57e42be9
--- /dev/null
+++ b/doc/rbd/iscsi-requirements.rst
@@ -0,0 +1,50 @@
+==========================
+iSCSI Gateway Requirements
+==========================
+
+To implement the Ceph iSCSI gateway there are a few requirements. It is recommended
+to use two to four iSCSI gateway nodes for a highly available Ceph iSCSI gateway
+solution.
+
+For hardware recommendations, see :ref:`hardware-recommendations` for more
+details.
+
+.. note::
+ On the iSCSI gateway nodes, the memory footprint of the RBD images
+ can grow to a large size. Plan memory requirements accordingly based
+ on the number of RBD images mapped.
+
+There are no specific iSCSI gateway options for the Ceph Monitors or
+OSDs, but it is important to lower the default timers for detecting
+down OSDs to reduce the possibility of initiator timeouts. The following
+configuration options are suggested for each OSD node in the storage
+cluster::
+
+ [osd]
+ osd heartbeat grace = 20
+ osd heartbeat interval = 5
+
+- Online Updating Using the Ceph Monitor
+
+ ::
+
+ ceph tell <daemon_type>.<id> config set <parameter_name> <new_value>
+
+ ::
+
+ ceph tell osd.0 config set osd_heartbeat_grace 20
+ ceph tell osd.0 config set osd_heartbeat_interval 5
+
+- Online Updating on the OSD Node
+
+ ::
+
+ ceph daemon <daemon_type>.<id> config set <parameter_name> <new_value>
+
+ ::
+
+ ceph daemon osd.0 config set osd_heartbeat_grace 20
+ ceph daemon osd.0 config set osd_heartbeat_interval 5
+
+For more details on setting Ceph's configuration options, see
+:ref:`configuring-ceph`.
diff --git a/doc/rbd/iscsi-target-ansible.rst b/doc/rbd/iscsi-target-ansible.rst
new file mode 100644
index 00000000..564c1064
--- /dev/null
+++ b/doc/rbd/iscsi-target-ansible.rst
@@ -0,0 +1,343 @@
+==========================================
+Configuring the iSCSI Target using Ansible
+==========================================
+
+The Ceph iSCSI gateway is the iSCSI target node and also a Ceph client
+node. The Ceph iSCSI gateway can be a standalone node or be colocated on
+a Ceph Object Store Disk (OSD) node. Completing the following steps will
+install and configure the Ceph iSCSI gateway for basic operation.
+
+**Requirements:**
+
+- A running Ceph Luminous (12.2.x) cluster or newer
+
+- Red Hat Enterprise Linux/CentOS 7.5 (or newer); Linux kernel v4.16 (or newer)
+
+- The ``ceph-iscsi`` package installed on all the iSCSI gateway nodes
+
+**Installing:**
+
+#. On the Ansible installer node, which could be either the administration node
+ or a dedicated deployment node, perform the following steps:
+
+ #. As ``root``, install the ``ceph-ansible`` package:
+
+ ::
+
+ # yum install ceph-ansible
+
+ #. Add an entry in ``/etc/ansible/hosts`` file for the gateway group:
+
+ ::
+
+ [ceph-iscsi-gw]
+ ceph-igw-1
+ ceph-igw-2
+
+.. note::
+ If co-locating the iSCSI gateway with an OSD node, then add the OSD node to the
+ ``[ceph-iscsi-gw]`` section.
+
+**Configuring:**
+
+The ``ceph-ansible`` package places a file in the ``/usr/share/ceph-ansible/group_vars/``
+directory called ``ceph-iscsi-gw.sample``. Create a copy of this sample file named
+``ceph-iscsi-gw.yml``. Review the following Ansible variables and descriptions,
+and update accordingly.
+
++--------------------------------------+--------------------------------------+
+| Variable | Meaning/Purpose |
++======================================+======================================+
+| ``seed_monitor`` | Each gateway needs access to the |
+| | ceph cluster for rados and rbd |
+| | calls. This means the iSCSI gateway |
+| | must have an appropriate |
+| | ``/etc/ceph/`` directory defined. |
+| | The ``seed_monitor`` host is used to |
+| | populate the iSCSI gateway’s |
+| | ``/etc/ceph/`` directory. |
++--------------------------------------+--------------------------------------+
+| ``cluster_name`` | Define a custom storage cluster |
+| | name. |
++--------------------------------------+--------------------------------------+
+| ``gateway_keyring`` | Define a custom keyring name. |
++--------------------------------------+--------------------------------------+
+| ``deploy_settings`` | If set to ``true``, then deploy the |
+| | settings when the playbook is run. |
++--------------------------------------+--------------------------------------+
+| ``perform_system_checks`` | This is a boolean value that checks |
+| | for multipath and lvm configuration |
+| | settings on each gateway. It must be |
+| | set to true for at least the first |
+| | run to ensure multipathd and lvm are |
+| | configured properly. |
++--------------------------------------+--------------------------------------+
+| ``gateway_iqn`` | This is the iSCSI IQN that all the |
+| | gateways will expose to clients. |
+| | This means each client will see the |
+| | gateway group as a single subsystem. |
++--------------------------------------+--------------------------------------+
+| ``gateway_ip_list`` | The ip list defines the IP addresses |
+| | that will be used on the front end |
+| | network for iSCSI traffic. This IP |
+| | will be bound to the active target |
+| | portal group on each node, and is |
+| | the access point for iSCSI traffic. |
+| | Each IP should correspond to an IP |
+| | available on the hosts defined in |
+| | the ``ceph-iscsi-gw`` host group in |
+| | ``/etc/ansible/hosts``. |
++--------------------------------------+--------------------------------------+
+| ``rbd_devices`` | This section defines the RBD images |
+| | that will be controlled and managed |
+| | within the iSCSI gateway |
+| | configuration. Parameters like |
+| | ``pool`` and ``image`` are self |
+| | explanatory. Here are the other |
+| | parameters: ``size`` = This defines |
+| | the size of the RBD. You may |
+| | increase the size later, by simply |
+| | changing this value, but shrinking |
+| | the size of an RBD is not supported |
+| | and is ignored. ``host`` = This is |
+| | the iSCSI gateway host name that |
+| | will be responsible for the rbd |
+| | allocation/resize. Every defined |
+| | ``rbd_device`` entry must have a |
+| | host assigned. ``state`` = This is |
+| | typical Ansible syntax for whether |
+| | the resource should be defined or |
+| | removed. A request with a state of |
+| | absent will first be checked to |
+| | ensure the rbd is not mapped to any |
+| | client. If the RBD is unallocated, |
+| | it will be removed from the iSCSI |
+| | gateway and deleted from the |
+| | configuration. |
++--------------------------------------+--------------------------------------+
+| ``client_connections`` | This section defines the iSCSI |
+| | client connection details together |
+| | with the LUN (RBD image) masking. |
+| | Currently only CHAP is supported as |
+| | an authentication mechanism. Each |
+| | connection defines an ``image_list`` |
+| | which is a comma separated list of |
+| | the form |
+| | ``pool.rbd_image[,pool.rbd_image]``. |
+| | RBD images can be added and removed |
+| | from this list, to change the client |
+| | masking. Note that there are no |
+| | checks done to limit RBD sharing |
+| | across client connections. |
++--------------------------------------+--------------------------------------+
+
+.. note::
+ When using the ``gateway_iqn`` variable, and for Red Hat Enterprise Linux
+ clients, installing the ``iscsi-initiator-utils`` package is required for
+ retrieving the gateway’s IQN name. The iSCSI initiator name is located in the
+ ``/etc/iscsi/initiatorname.iscsi`` file.
+
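+For illustration, a minimal ``ceph-iscsi-gw.yml`` might look like the
+following sketch. The variable names come from the table above; the exact
+entry format, and keys such as ``chap`` and ``status``, are assumptions
+here, so copy the structure from the ``ceph-iscsi-gw.sample`` file shipped
+with your ``ceph-ansible`` version:
+
+::
+
+   seed_monitor: mon1
+   cluster_name: ceph
+   gateway_keyring: ceph.client.admin.keyring
+   deploy_settings: true
+   perform_system_checks: true
+   gateway_iqn: "iqn.2003-01.com.redhat.iscsi-gw:ceph-igw"
+   gateway_ip_list: 192.168.1.1,192.168.1.2
+   rbd_devices:
+     - { pool: 'rbd', image: 'disk_1', size: '50G', host: 'ceph-igw-1', state: 'present' }
+   client_connections:
+     - { client: 'iqn.1994-05.com.redhat:rh7-client', image_list: 'rbd.disk_1', chap: 'myiscsiusername/myiscsipassword', status: 'present' }
+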
+**Deploying:**
+
+On the Ansible installer node, perform the following steps.
+
+#. As ``root``, execute the Ansible playbook:
+
+ ::
+
+ # cd /usr/share/ceph-ansible
+ # ansible-playbook ceph-iscsi-gw.yml
+
+ .. note::
+ The Ansible playbook will handle RPM dependencies, RBD creation
+ and Linux IO configuration.
+
+#. Verify the configuration from an iSCSI gateway node:
+
+ ::
+
+ # gwcli ls
+
+ .. note::
+ For more information on using the ``gwcli`` command to install and configure
+ a Ceph iSCSI gateway, see the `Configuring the iSCSI Target using the Command Line Interface`_
+ section.
+
+ .. important::
+ Attempting to use the ``targetcli`` tool to change the configuration will
+ result in issues such as ALUA misconfiguration and path failover
+ problems. There is the potential to corrupt data, to have mismatched
+ configuration across iSCSI gateways, and to have mismatched WWN information,
+ which will lead to client multipath problems.
+
+**Service Management:**
+
+The ``ceph-iscsi`` package installs the configuration management
+logic and a Systemd service called ``rbd-target-gw``. When the Systemd
+service is enabled, the ``rbd-target-gw`` will start at boot time and
+will restore the Linux IO state. The Ansible playbook disables the
+target service during the deployment. Below are the outcomes of when
+interacting with the ``rbd-target-gw`` Systemd service.
+
+::
+
+ # systemctl <start|stop|restart|reload> rbd-target-gw
+
+- ``reload``
+
+ A reload request will force ``rbd-target-gw`` to reread the
+ configuration and apply it to the current running environment. This
+ is normally not required, since changes are deployed in parallel from
+ Ansible to all iSCSI gateway nodes.
+
+- ``stop``
+
+ A stop request will close the gateway’s portal interfaces, drop
+ connections to clients, and wipe the current LIO configuration from
+ the kernel. This returns the iSCSI gateway to a clean state. When
+ clients are disconnected, active I/O is rescheduled to the other
+ iSCSI gateways by the client side multipathing layer.
+
+**Administration:**
+
+Within the ``/usr/share/ceph-ansible/group_vars/ceph-iscsi-gw`` file
+there are a number of operational workflows that the Ansible playbook
+supports.
+
+.. warning::
+ Before removing RBD images from the iSCSI gateway configuration,
+ follow the standard procedures for removing a storage device from
+ the operating system.
+
++--------------------------------------+--------------------------------------+
+| I want to…​ | Update the ``ceph-iscsi-gw`` file |
+| | by…​ |
++======================================+======================================+
+| Add more RBD images | Adding another entry to the |
+| | ``rbd_devices`` section with the new |
+| | image. |
++--------------------------------------+--------------------------------------+
+| Resize an existing RBD image | Updating the size parameter within |
+| | the ``rbd_devices`` section. Client |
+| | side actions are required to pick up |
+| | the new size of the disk. |
++--------------------------------------+--------------------------------------+
+| Add a client | Adding an entry to the |
+| | ``client_connections`` section. |
++--------------------------------------+--------------------------------------+
+| Add another RBD to a client | Adding the relevant RBD |
+| | ``pool.image`` name to the |
+| | ``image_list`` variable for the |
+| | client. |
++--------------------------------------+--------------------------------------+
+| Remove an RBD from a client | Removing the RBD ``pool.image`` name |
+| | from the client's ``image_list`` |
+| | variable. |
++--------------------------------------+--------------------------------------+
+| Remove an RBD from the system | Changing the RBD entry state |
+| | variable to ``absent``. The RBD |
+| | image must be unallocated from the |
+| | operating system first for this to |
+| | succeed. |
++--------------------------------------+--------------------------------------+
+| Change the client's CHAP credentials | Updating the relevant CHAP details |
+| | in ``client_connections``. This will |
+| | need to be coordinated with the |
+| | clients. For example, the client |
+| | issues an iSCSI logout, the |
+| | credentials are changed by the |
+| | Ansible playbook, the credentials |
+| | are changed at the client, then the |
+| | client performs an iSCSI login. |
++--------------------------------------+--------------------------------------+
+| Remove a client | Updating the relevant |
+| | ``client_connections`` item with a |
+| | state of ``absent``. Once the |
+| | Ansible playbook is run, the client |
+| | will be purged from the system, but |
+| | the disks will remain defined to |
+| | Linux IO for potential reuse. |
++--------------------------------------+--------------------------------------+
+
+Once a change has been made, rerun the Ansible playbook to apply the
+change across the iSCSI gateway nodes.
+
+::
+
+ # ansible-playbook ceph-iscsi-gw.yml
+
+**Removing the Configuration:**
+
+The ``ceph-ansible`` package provides an Ansible playbook to
+remove the iSCSI gateway configuration and related RBD images. The
+Ansible playbook is ``/usr/share/ceph-ansible/purge_gateways.yml``. When
+this Ansible playbook is run, you are prompted for the type of purge to
+perform:
+
+*lio* :
+
+In this mode the LIO configuration is purged on all iSCSI gateways that
+are defined. Disks that were created are left untouched within the Ceph
+storage cluster.
+
+*all* :
+
+When ``all`` is chosen, the LIO configuration is removed together with
+**all** RBD images that were defined within the iSCSI gateway
+environment; other unrelated RBD images will not be removed. Ensure the
+correct mode is chosen, because this operation will delete data.
+
+.. warning::
+ A purge operation is a destructive action against your iSCSI gateway
+ environment.
+
+.. warning::
+ A purge operation will fail if RBD images have snapshots or clones
+ and are exported through the Ceph iSCSI gateway.
+
+::
+
+ [root@rh7-iscsi-client ceph-ansible]# ansible-playbook purge_gateways.yml
+ Which configuration elements should be purged? (all, lio or abort) [abort]: all
+
+
+ PLAY [Confirm removal of the iSCSI gateway configuration] *********************
+
+
+ GATHERING FACTS ***************************************************************
+ ok: [localhost]
+
+
+ TASK: [Exit playbook if user aborted the purge] *******************************
+ skipping: [localhost]
+
+
+ TASK: [set_fact ] *************************************************************
+ ok: [localhost]
+
+
+ PLAY [Removing the gateway configuration] *************************************
+
+
+ GATHERING FACTS ***************************************************************
+ ok: [ceph-igw-1]
+ ok: [ceph-igw-2]
+
+
+ TASK: [igw_purge | purging the gateway configuration] *************************
+ changed: [ceph-igw-1]
+ changed: [ceph-igw-2]
+
+
+ TASK: [igw_purge | deleting configured rbd devices] ***************************
+ changed: [ceph-igw-1]
+ changed: [ceph-igw-2]
+
+
+ PLAY RECAP ********************************************************************
+ ceph-igw-1 : ok=3 changed=2 unreachable=0 failed=0
+ ceph-igw-2 : ok=3 changed=2 unreachable=0 failed=0
+ localhost : ok=2 changed=0 unreachable=0 failed=0
+
+
+.. _Configuring the iSCSI Target using the Command Line Interface: ../iscsi-target-cli
diff --git a/doc/rbd/iscsi-target-cli-manual-install.rst b/doc/rbd/iscsi-target-cli-manual-install.rst
new file mode 100644
index 00000000..ccc422e0
--- /dev/null
+++ b/doc/rbd/iscsi-target-cli-manual-install.rst
@@ -0,0 +1,190 @@
+==============================
+Manual ceph-iscsi Installation
+==============================
+
+**Requirements**
+
+To complete the installation of ceph-iscsi, there are 4 steps:
+
+1. Install common packages from your Linux distribution's software repository
+2. Install Git to fetch the remaining packages directly from their Git repositories
+3. Ensure a compatible kernel is used
+4. Install all the components of ceph-iscsi and start associated daemons:
+
+ - tcmu-runner
+ - rtslib-fb
+ - configshell-fb
+ - targetcli-fb
+ - ceph-iscsi
+
+
+1. Install Common Packages
+==========================
+
+The following packages will be used by ceph-iscsi and target tools.
+They must be installed from your Linux distribution's software repository
+on each machine that will be an iSCSI gateway:
+
+- libnl3
+- libkmod
+- librbd1
+- pyparsing
+- python kmod
+- python pyudev
+- python gobject
+- python urwid
+- python pyparsing
+- python rados
+- python rbd
+- python netifaces
+- python crypto
+- python requests
+- python flask
+- pyOpenSSL
+
+
+2. Install Git
+==============
+
+To install all of the packages needed to run iSCSI with Ceph, you need to download them directly from their Git repositories.
+On CentOS/RHEL execute:
+
+::
+
+ > sudo yum install git
+
+On Debian/Ubuntu execute:
+
+::
+
+ > sudo apt install git
+
+To learn more about Git and how it works, visit https://git-scm.com.
+
+
+3. Ensure a compatible kernel is used
+=====================================
+
+Ensure you use a supported kernel that contains the required Ceph iSCSI patches:
+
+- any Linux distribution with a kernel v4.16 or newer, or
+- Red Hat Enterprise Linux or CentOS 7.5 or later (in these distributions ceph-iscsi support is backported)
+
+If you are already using a compatible kernel, you can go to the next step.
+However, if you are NOT using a compatible kernel then check your distro's
+documentation for specific instructions on how to build this kernel. The only
+Ceph iSCSI specific requirements are that the following build options must be
+enabled:
+
+ ::
+
+ CONFIG_TARGET_CORE=m
+ CONFIG_TCM_USER2=m
+ CONFIG_ISCSI_TARGET=m
+
+
+4. Install ceph-iscsi
+========================================================
+
+Finally, the remaining tools can be fetched directly from their Git repositories and their associated services started.
+
+
+tcmu-runner
+-----------
+
+ Installation:
+
+ ::
+
+ > git clone https://github.com/open-iscsi/tcmu-runner
+ > cd tcmu-runner
+
+ Run the following command to install all the needed dependencies:
+
+ ::
+
+ > ./extra/install_dep.sh
+
+ Now you can build the tcmu-runner.
+ To do so, use the following build command:
+
+ ::
+
+ > cmake -Dwith-glfs=false -Dwith-qcow=false -DSUPPORT_SYSTEMD=ON -DCMAKE_INSTALL_PREFIX=/usr
+ > make install
+
+ Enable and start the daemon:
+
+ ::
+
+ > systemctl daemon-reload
+ > systemctl enable tcmu-runner
+ > systemctl start tcmu-runner
+
+
+rtslib-fb
+---------
+
+ Installation:
+
+ ::
+
+ > git clone https://github.com/open-iscsi/rtslib-fb.git
+ > cd rtslib-fb
+ > python setup.py install
+
+configshell-fb
+--------------
+
+ Installation:
+
+ ::
+
+ > git clone https://github.com/open-iscsi/configshell-fb.git
+ > cd configshell-fb
+ > python setup.py install
+
+targetcli-fb
+------------
+
+ Installation:
+
+ ::
+
+ > git clone https://github.com/open-iscsi/targetcli-fb.git
+ > cd targetcli-fb
+ > python setup.py install
+ > mkdir /etc/target
+ > mkdir /var/target
+
+ .. warning:: The ceph-iscsi tools assume they are managing all targets
+ on the system. If targets have been set up and are being managed by
+ targetcli, the target service must be disabled.
+
+ceph-iscsi
+-----------------
+
+ Installation:
+
+ ::
+
+ > git clone https://github.com/ceph/ceph-iscsi.git
+ > cd ceph-iscsi
+ > python setup.py install --install-scripts=/usr/bin
+ > cp usr/lib/systemd/system/rbd-target-gw.service /lib/systemd/system
+ > cp usr/lib/systemd/system/rbd-target-api.service /lib/systemd/system
+
+ Enable and start the daemon:
+
+ ::
+
+ > systemctl daemon-reload
+ > systemctl enable rbd-target-gw
+ > systemctl start rbd-target-gw
+ > systemctl enable rbd-target-api
+ > systemctl start rbd-target-api
+
+Installation is complete. Proceed to the setup section in the
+`main ceph-iscsi CLI page`_.
+
+.. _`main ceph-iscsi CLI page`: ../iscsi-target-cli
diff --git a/doc/rbd/iscsi-target-cli.rst b/doc/rbd/iscsi-target-cli.rst
new file mode 100644
index 00000000..44bf0ec6
--- /dev/null
+++ b/doc/rbd/iscsi-target-cli.rst
@@ -0,0 +1,235 @@
+=============================================================
+Configuring the iSCSI Target using the Command Line Interface
+=============================================================
+
+The Ceph iSCSI gateway is the iSCSI target node and also a Ceph client
+node. The Ceph iSCSI gateway can be a standalone node or be colocated on
+a Ceph Object Store Disk (OSD) node. Completing the following steps will
+install and configure the Ceph iSCSI gateway for basic operation.
+
+**Requirements:**
+
+- A running Ceph Luminous or later storage cluster
+
+- Red Hat Enterprise Linux/CentOS 7.5 (or newer); Linux kernel v4.16 (or newer)
+
+- The following packages must be installed from your Linux distribution's software repository:
+
+ - ``targetcli-2.1.fb47`` or newer package
+
+ - ``python-rtslib-2.1.fb64`` or newer package
+
+ - ``tcmu-runner-1.3.0`` or newer package
+
+ - ``ceph-iscsi-3.2`` or newer package
+
+ .. important::
+ If previous versions of these packages exist, then they must
+ be removed before installing the newer versions.
+
+Do the following steps on the Ceph iSCSI gateway node before proceeding
+to the *Installing* section:
+
+#. If the Ceph iSCSI gateway is not colocated on an OSD node, then copy
+ the Ceph configuration files, located in ``/etc/ceph/``, from a
+ running Ceph node in the storage cluster to the iSCSI Gateway node.
+ The Ceph configuration files must exist on the iSCSI gateway node
+ under ``/etc/ceph/``.
+
+#. Install and configure the `Ceph Command-line
+ Interface <http://docs.ceph.com/docs/master/start/quick-rbd/#install-ceph>`_
+
+#. If needed, open TCP ports 3260 and 5000 on the firewall.
+
+ .. note::
+ Access to port 5000 should be restricted to a trusted internal network or
+ only the individual hosts where ``gwcli`` is used or ``ceph-mgr`` daemons
+ are running.
+
+#. Create a new RADOS Block Device (RBD) image or use an existing one.
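+
+   For example, if you prefer to create the image ahead of time rather than
+   through ``gwcli`` in the *Configuring* section below, a sketch using the
+   ``rbd`` tool (pool, image name, and size are placeholders):
+
+   ::
+
+      # rbd create rbd/disk_1 --size 90G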
+
+**Installing:**
+
+If you are using the upstream ceph-iscsi package follow the
+`manual install instructions`_.
+
+.. _`manual install instructions`: ../iscsi-target-cli-manual-install
+
+.. toctree::
+ :hidden:
+
+ iscsi-target-cli-manual-install
+
+For RPM-based installations, execute the following commands:
+
+#. As ``root``, on all iSCSI gateway nodes, install the
+ ``ceph-iscsi`` package:
+
+ ::
+
+ # yum install ceph-iscsi
+
+#. As ``root``, on all iSCSI gateway nodes, install the ``tcmu-runner``
+ package:
+
+ ::
+
+ # yum install tcmu-runner
+
+**Setup:**
+
+#. gwcli requires a pool with the name ``rbd``, so it can store metadata
+ like the iSCSI configuration. To check if this pool has been created
+ run:
+
+ ::
+
+ # ceph osd lspools
+
+ If it does not exist instructions for creating pools can be found on the
+ `RADOS pool operations page
+ <http://docs.ceph.com/docs/master/rados/operations/pools/>`_.
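+
+   For example, a small ``rbd`` pool could be created and initialized as
+   follows (a sketch; the placement-group count of 32 is only illustrative
+   and should be sized for your cluster):
+
+   ::
+
+      # ceph osd pool create rbd 32 32
+      # rbd pool init rbd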
+
+#. As ``root``, on an iSCSI gateway node, create a file named
+ ``iscsi-gateway.cfg`` in the ``/etc/ceph/`` directory:
+
+ ::
+
+ # touch /etc/ceph/iscsi-gateway.cfg
+
+ #. Edit the ``iscsi-gateway.cfg`` file and add the following lines:
+
+ ::
+
+ [config]
+ # Name of the Ceph storage cluster. A suitable Ceph configuration file allowing
+ # access to the Ceph storage cluster from the gateway node is required, if not
+ # colocated on an OSD node.
+ cluster_name = ceph
+
+ # Place a copy of the ceph cluster's admin keyring in the gateway's /etc/ceph
+ # directory and reference the filename here
+ gateway_keyring = ceph.client.admin.keyring
+
+
+ # API settings.
+ # The API supports a number of options that allow you to tailor it to your
+ # local environment. If you want to run the API under https, you will need to
+ # create cert/key files that are compatible for each iSCSI gateway node, that is
+ # not locked to a specific node. SSL cert and key files *must* be called
+ # 'iscsi-gateway.crt' and 'iscsi-gateway.key' and placed in the '/etc/ceph/' directory
+ # on *each* gateway node. With the SSL files in place, you can use 'api_secure = true'
+ # to switch to https mode.
+
+ # To support the API, the bare minimum settings are:
+ api_secure = false
+
+ # Additional API configuration options are as follows, defaults shown.
+ # api_user = admin
+ # api_password = admin
+ # api_port = 5001
+ # trusted_ip_list = 192.168.0.10,192.168.0.11
+
+ .. note::
+ trusted_ip_list is a list of IP addresses on each iscsi gateway that
+ will be used for management operations like target creation, lun
+ exporting, etc. The IP can be the same that will be used for iSCSI
+ data, like READ/WRITE commands to/from the RBD image, but using
+ separate IPs is recommended.
+
+ .. important::
+ The ``iscsi-gateway.cfg`` file must be identical on all iSCSI gateway nodes.
+
+ #. As ``root``, copy the ``iscsi-gateway.cfg`` file to all iSCSI
+ gateway nodes.
+
+#. As ``root``, on all iSCSI gateway nodes, enable and start the API
+ service:
+
+ ::
+
+ # systemctl daemon-reload
+
+ # systemctl enable rbd-target-gw
+ # systemctl start rbd-target-gw
+
+ # systemctl enable rbd-target-api
+ # systemctl start rbd-target-api
+
+
+**Configuring:**
+
+gwcli will create and configure the iSCSI target and RBD images and copy the
+configuration across the gateways set up in the last section. Lower level
+tools, like targetcli and rbd, can be used to query the local configuration,
+but should not be used to modify it. This next section will demonstrate how
+to create an iSCSI target and export an RBD image as LUN 0.
+
+#. As ``root``, on an iSCSI gateway node, start the iSCSI gateway
+ command-line interface:
+
+ ::
+
+ # gwcli
+
+#. Go to iscsi-targets and create a target with the name
+ iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw:
+
+ ::
+
+ > /> cd /iscsi-target
+ > /iscsi-target> create iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
+
+#. Create the iSCSI gateways. The IPs used below are the ones that will be
+ used for iSCSI data like READ and WRITE commands. They can be the
+ same IPs used for management operations listed in trusted_ip_list,
+ but it is recommended that different IPs are used.
+
+ ::
+
+ > /iscsi-target> cd iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/gateways
+ > /iscsi-target...-igw/gateways> create ceph-gw-1 10.172.19.21
+ > /iscsi-target...-igw/gateways> create ceph-gw-2 10.172.19.22
+
+ If not using RHEL/CentOS or using an upstream or ceph-iscsi-test kernel,
+ the skipchecks=true argument must be used. This will avoid the Red Hat kernel
+ and rpm checks:
+
+ ::
+
+ > /iscsi-target> cd iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/gateways
+ > /iscsi-target...-igw/gateways> create ceph-gw-1 10.172.19.21 skipchecks=true
+ > /iscsi-target...-igw/gateways> create ceph-gw-2 10.172.19.22 skipchecks=true
+
+#. Add an RBD image with the name disk_1 in the pool rbd:
+
+ ::
+
+ > /iscsi-target...-igw/gateways> cd /disks
+ > /disks> create pool=rbd image=disk_1 size=90G
+
+#. Create a client with the initiator name iqn.1994-05.com.redhat:rh7-client:
+
+ ::
+
+ > /disks> cd /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/hosts
+ > /iscsi-target...eph-igw/hosts> create iqn.1994-05.com.redhat:rh7-client
+
+#. Set the client's CHAP username to myiscsiusername and password to
+ myiscsipassword:
+
+ ::
+
+ > /iscsi-target...at:rh7-client> auth username=myiscsiusername password=myiscsipassword
+
+ .. warning::
+ CHAP must always be configured. Without CHAP, the target will
+ reject any login requests.
+
+#. Add the disk to the client:
+
+ ::
+
+ > /iscsi-target...at:rh7-client> disk add rbd/disk_1
+
+The next step is to configure the iSCSI initiators.
diff --git a/doc/rbd/iscsi-targets.rst b/doc/rbd/iscsi-targets.rst
new file mode 100644
index 00000000..d2a03528
--- /dev/null
+++ b/doc/rbd/iscsi-targets.rst
@@ -0,0 +1,27 @@
+=============
+iSCSI Targets
+=============
+
+Traditionally, block-level access to a Ceph storage cluster has been
+limited to QEMU and ``librbd``, which is a key enabler for adoption
+within OpenStack environments. Starting with the Ceph Luminous release,
+block-level access is expanding to offer standard iSCSI support allowing
+wider platform usage, and potentially opening new use cases.
+
+- Red Hat Enterprise Linux/CentOS 7.5 (or newer); Linux kernel v4.16 (or newer)
+
+- A working Ceph Storage cluster, deployed with ``ceph-ansible`` or using the command-line interface
+
+- iSCSI gateways nodes, which can either be colocated with OSD nodes or on dedicated nodes
+
+- Separate network subnets for iSCSI front-end traffic and Ceph back-end traffic
+
+Ansible and the command-line interface are the two available methods for
+installing and configuring the Ceph iSCSI gateway:
+
+.. toctree::
+ :maxdepth: 1
+
+ Using Ansible <iscsi-target-ansible>
+ Using the Command Line Interface <iscsi-target-cli>
diff --git a/doc/rbd/libvirt.rst b/doc/rbd/libvirt.rst
new file mode 100644
index 00000000..b7f07316
--- /dev/null
+++ b/doc/rbd/libvirt.rst
@@ -0,0 +1,321 @@
+=================================
+ Using libvirt with Ceph RBD
+=================================
+
+.. index:: Ceph Block Device; libvirt
+
+The ``libvirt`` library creates a virtual machine abstraction layer between
+hypervisor interfaces and the software applications that use them. With
+``libvirt``, developers and system administrators can focus on a common
+management framework, common API, and common shell interface (i.e., ``virsh``)
+to many different hypervisors, including:
+
+- QEMU/KVM
+- XEN
+- LXC
+- VirtualBox
+- etc.
+
+Ceph block devices support QEMU/KVM. You can use Ceph block devices with
+software that interfaces with ``libvirt``. The following stack diagram
+illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
+
+
+.. ditaa::
+
+ +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+
+The most common ``libvirt`` use case involves providing Ceph block devices to
+cloud solutions like OpenStack or CloudStack. The cloud solution uses
+``libvirt`` to interact with QEMU/KVM, and QEMU/KVM interacts with Ceph block
+devices via ``librbd``. See `Block Devices and OpenStack`_ and `Block Devices
+and CloudStack`_ for details. See `Installation`_ for installation details.
+
+You can also use Ceph block devices with ``libvirt``, ``virsh`` and the
+``libvirt`` API. See `libvirt Virtualization API`_ for details.
+
+
+To create VMs that use Ceph block devices, use the procedures in the following
+sections. In the following examples, we use ``libvirt-pool`` for the pool
+name, ``client.libvirt`` for the user name, and ``new-libvirt-image`` for the
+image name. You may use any value you like, but ensure you replace those values
+when executing commands in the subsequent procedures.
+
+
+Configuring Ceph
+================
+
+To configure Ceph for use with ``libvirt``, perform the following steps:
+
+#. `Create a pool`_. The following example uses the
+ pool name ``libvirt-pool`` with 128 placement groups. ::
+
+ ceph osd pool create libvirt-pool 128 128
+
+ Verify the pool exists. ::
+
+ ceph osd lspools
+
+#. Use the ``rbd`` tool to initialize the pool for use by RBD::
+
+ rbd pool init <pool-name>
+
+#. `Create a Ceph User`_ (or use ``client.admin`` for version 0.9.7 and
+ earlier). The following example uses the Ceph user name ``client.libvirt``
+ and references ``libvirt-pool``. ::
+
+ ceph auth get-or-create client.libvirt mon 'profile rbd' osd 'profile rbd pool=libvirt-pool'
+
+ Verify the name exists. ::
+
+ ceph auth ls
+
+ **NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``,
+ not the Ceph name ``client.libvirt``. See `User Management - User`_ and
+ `User Management - CLI`_ for a detailed explanation of the difference
+ between ID and name.
+
+#. Use QEMU to `create an image`_ in your RBD pool.
+ The following example uses the image name ``new-libvirt-image``
+ and references ``libvirt-pool``. ::
+
+ qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 2G
+
+ Verify the image exists. ::
+
+ rbd -p libvirt-pool ls
+
+ **NOTE:** You can also use `rbd create`_ to create an image, but we
+ recommend ensuring that QEMU is working properly.
+
+.. tip:: Optionally, if you wish to enable debug logs and the admin socket for
+ this client, you can add the following section to ``/etc/ceph/ceph.conf``::
+
+ [client.libvirt]
+ log file = /var/log/ceph/qemu-guest-$pid.log
+ admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
+
+ The ``client.libvirt`` section name should match the cephx user you created
+ above. If SELinux or AppArmor is enabled, note that this could prevent the
+ client process (qemu via libvirt) from writing the logs or admin socket to
+ the destination locations (``/var/log/ceph`` or ``/var/run/ceph``).
+
+
+
+Preparing the VM Manager
+========================
+
+You may use ``libvirt`` without a VM manager, but you may find it simpler to
+create your first domain with ``virt-manager``.
+
+#. Install a virtual machine manager. See `KVM/VirtManager`_ for details. ::
+
+ sudo apt-get install virt-manager
+
+#. Download an OS image (if necessary).
+
+#. Launch the virtual machine manager. ::
+
+ sudo virt-manager
+
+
+
+Creating a VM
+=============
+
+To create a VM with ``virt-manager``, perform the following steps:
+
+#. Press the **Create New Virtual Machine** button.
+
+#. Name the new virtual machine domain. In this example, we
+ use the name ``libvirt-virtual-machine``. You may use any name you wish,
+ but ensure you replace ``libvirt-virtual-machine`` with the name you
+ choose in subsequent commandline and configuration examples. ::
+
+ libvirt-virtual-machine
+
+#. Import the image. ::
+
+ /path/to/image/recent-linux.img
+
+ **NOTE:** Import a recent image. Some older images may not rescan for
+ virtual devices properly.
+
+#. Configure and start the VM.
+
+#. You may use ``virsh list`` to verify the VM domain exists. ::
+
+ sudo virsh list
+
+#. Log in to the VM (root/root).
+
+#. Stop the VM before configuring it for use with Ceph.
+
+
+Configuring the VM
+==================
+
+When configuring the VM for use with Ceph, it is important to use ``virsh``
+where appropriate. Additionally, ``virsh`` commands often require root
+privileges (i.e., ``sudo``) and will not return appropriate results or notify
+you that root privileges are required. For a reference of ``virsh``
+commands, refer to `Virsh Command Reference`_.
+
+
+#. Open the configuration file with ``virsh edit``. ::
+
+ sudo virsh edit {vm-domain-name}
+
+ Under ``<devices>`` there should be a ``<disk>`` entry. ::
+
+ <devices>
+ <emulator>/usr/bin/kvm</emulator>
+ <disk type='file' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source file='/path/to/image/recent-linux.img'/>
+ <target dev='vda' bus='virtio'/>
+ <address type='drive' controller='0' bus='0' unit='0'/>
+ </disk>
+
+
+ Replace ``/path/to/image/recent-linux.img`` with the path to the OS image.
+ The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See
+ `Virtio`_ for details.
+
+ **IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit
+ the configuration file under ``/etc/libvirt/qemu`` with a text editor,
+ ``libvirt`` may not recognize the change. If there is a discrepancy between
+ the contents of the XML file under ``/etc/libvirt/qemu`` and the result of
+ ``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work
+ properly.
+
+
+#. Add the Ceph RBD image you created as a ``<disk>`` entry. ::
+
+ <disk type='network' device='disk'>
+ <source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
+ <host name='{monitor-host}' port='6789'/>
+ </source>
+        <target dev='vdb' bus='virtio'/>
+ </disk>
+
+ Replace ``{monitor-host}`` with the name of your host, and replace the
+ pool and/or image name as necessary. You may add multiple ``<host>``
+ entries for your Ceph monitors. The ``dev`` attribute is the logical
+ device name that will appear under the ``/dev`` directory of your
+ VM. The optional ``bus`` attribute indicates the type of disk device to
+ emulate. The valid settings are driver specific (e.g., "ide", "scsi",
+ "virtio", "xen", "usb" or "sata").
+
+ See `Disks`_ for details of the ``<disk>`` element, and its child elements
+ and attributes.
+
+#. Save the file.
+
+#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by
+ default), you must generate a secret. ::
+
+ cat > secret.xml <<EOF
+ <secret ephemeral='no' private='no'>
+ <usage type='ceph'>
+ <name>client.libvirt secret</name>
+ </usage>
+ </secret>
+ EOF
+
+#. Define the secret. ::
+
+ sudo virsh secret-define --file secret.xml
+ <uuid of secret is output here>
+
+#. Get the ``client.libvirt`` key and save the key string to a file. ::
+
+ ceph auth get-key client.libvirt | sudo tee client.libvirt.key
+
+#. Set the value of the secret, using the UUID output by ``virsh secret-define``. ::
+
+ sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml
+
+   You must also reference the secret from the ``<disk>`` element you entered
+   earlier by adding an ``<auth>`` entry (replacing the ``uuid`` value with
+   the result from the command-line example above). ::
+
+        sudo virsh edit {vm-domain-name}
+
+   Then, add the ``<auth>`` element to the domain configuration file::
+
+ ...
+ </source>
+ <auth username='libvirt'>
+ <secret type='ceph' uuid='9ec59067-fdbc-a6c0-03ff-df165c0587b8'/>
+ </auth>
+ <target ...
+
+
+   **NOTE:** The ID used in this example is ``libvirt``, not the Ceph name
+   ``client.libvirt`` generated at step 2 of `Configuring Ceph`_. Ensure
+   you use the ID component of the Ceph name you generated. If for some reason
+ you need to regenerate the secret, you will have to execute
+ ``sudo virsh secret-undefine {uuid}`` before executing
+ ``sudo virsh secret-set-value`` again.
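+
+   To confirm that ``libvirt`` stored the secret and that a value is attached
+   to it, you can list the defined secrets and print the stored value, using
+   the UUID output by ``virsh secret-define``::
+
+        sudo virsh secret-list
+        sudo virsh secret-get-value {uuid of secret}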
+
+
+Summary
+=======
+
+Once you have configured the VM for use with Ceph, you can start the VM.
+To verify that the VM and Ceph are communicating, you may perform the
+following procedures.
+
+
+#. Check to see if Ceph is running::
+
+ ceph health
+
+#. Check to see if the VM is running. ::
+
+ sudo virsh list
+
+#. Check to see if the VM is communicating with Ceph. Replace
+ ``{vm-domain-name}`` with the name of your VM domain::
+
+ sudo virsh qemu-monitor-command --hmp {vm-domain-name} 'info block'
+
+#. Check to see if the device from ``<target dev='vdb' bus='virtio'/>`` appears
+   under ``/dev`` or under ``/proc/partitions``. ::
+
+        ls /dev
+        cat /proc/partitions
+
+If everything looks okay, you may begin using the Ceph block device
+within your VM.
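+
+For example, if the new disk appears as ``/dev/vdb`` inside the guest (the exact
+name depends on the ``dev`` attribute you chose and on how many disks the guest
+already has), a minimal way to exercise it is to create a filesystem on it and
+mount it::
+
+    sudo mkfs.ext4 /dev/vdb
+    sudo mkdir /mnt/rbd
+    sudo mount /dev/vdb /mnt/rbd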
+
+
+.. _Installation: ../../install
+.. _libvirt Virtualization API: http://www.libvirt.org
+.. _Block Devices and OpenStack: ../rbd-openstack
+.. _Block Devices and CloudStack: ../rbd-cloudstack
+.. _Create a pool: ../../rados/operations/pools#create-a-pool
+.. _Create a Ceph User: ../../rados/operations/user-management#add-a-user
+.. _create an image: ../qemu-rbd#creating-images-with-qemu
+.. _Virsh Command Reference: http://www.libvirt.org/virshcmdref.html
+.. _KVM/VirtManager: https://help.ubuntu.com/community/KVM/VirtManager
+.. _Ceph Authentication: ../../rados/configuration/auth-config-ref
+.. _Disks: http://www.libvirt.org/formatdomain.html#elementsDisks
+.. _rbd create: ../rados-rbd-cmds#creating-a-block-device-image
+.. _User Management - User: ../../rados/operations/user-management#user
+.. _User Management - CLI: ../../rados/operations/user-management#command-line-usage
+.. _Virtio: http://www.linux-kvm.org/page/Virtio
diff --git a/doc/rbd/man/index.rst b/doc/rbd/man/index.rst
new file mode 100644
index 00000000..33a192a7
--- /dev/null
+++ b/doc/rbd/man/index.rst
@@ -0,0 +1,16 @@
+============================
+ Ceph Block Device Manpages
+============================
+
+.. toctree::
+ :maxdepth: 1
+
+ rbd <../../man/8/rbd>
+ rbd-fuse <../../man/8/rbd-fuse>
+ rbd-nbd <../../man/8/rbd-nbd>
+ rbd-ggate <../../man/8/rbd-ggate>
+ ceph-rbdnamer <../../man/8/ceph-rbdnamer>
+ rbd-replay-prep <../../man/8/rbd-replay-prep>
+ rbd-replay <../../man/8/rbd-replay>
+ rbd-replay-many <../../man/8/rbd-replay-many>
+   rbdmap <../../man/8/rbdmap>
diff --git a/doc/rbd/qemu-rbd.rst b/doc/rbd/qemu-rbd.rst
new file mode 100644
index 00000000..3b8b75fe
--- /dev/null
+++ b/doc/rbd/qemu-rbd.rst
@@ -0,0 +1,220 @@
+========================
+ QEMU and Block Devices
+========================
+
+.. index:: Ceph Block Device; QEMU KVM
+
+The most frequent Ceph Block Device use case involves providing block device
+images to virtual machines. For example, a user may create a "golden" image
+with an OS and any relevant software in an ideal configuration. Then, the user
+takes a snapshot of the image. Finally, the user clones the snapshot (usually
+many times). See `Snapshots`_ for details. The ability to make copy-on-write
+clones of a snapshot means that Ceph can provision block device images to
+virtual machines quickly, because the client doesn't have to download an entire
+image each time it spins up a new virtual machine.
+
+
+.. ditaa::
+
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+
+Ceph Block Devices can integrate with the QEMU virtual machine. For details on
+QEMU, see `QEMU Open Source Processor Emulator`_. For QEMU documentation, see
+`QEMU Manual`_. For installation details, see `Installation`_.
+
+.. important:: To use Ceph Block Devices with QEMU, you must have access to a
+ running Ceph cluster.
+
+
+Usage
+=====
+
+The QEMU command line expects you to specify the pool name and image name. You
+may also specify a snapshot name.
+
+QEMU will assume that the Ceph configuration file resides in the default
+location (e.g., ``/etc/ceph/$cluster.conf``) and that you are executing
+commands as the default ``client.admin`` user unless you expressly specify
+another Ceph configuration file path or another user. When specifying a user,
+QEMU uses the ``ID`` rather than the full ``TYPE:ID``. See `User Management -
+User`_ for details. Do not prepend the client type (i.e., ``client.``) to the
+beginning of the user ``ID``, or you will receive an authentication error. The
+key for the ``admin`` user, or for another user you specify with the
+``:id={user}`` option, should be stored in a keyring file in a default path
+(i.e., ``/etc/ceph`` or the local directory) with appropriate file ownership
+and permissions. Usage takes the following form::
+
+ qemu-img {command} [options] rbd:{pool-name}/{image-name}[@snapshot-name][:option1=value1][:option2=value2...]
+
+For example, specifying the ``id`` and ``conf`` options might look like the following::
+
+ qemu-img {command} [options] rbd:glance-pool/maipo:id=glance:conf=/etc/ceph/ceph.conf
+
+.. tip:: Configuration values containing ``:``, ``@``, or ``=`` can be escaped with a
+ leading ``\`` character.
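+
+For example, Ceph options such as ``mon_host`` can be appended in the same way
+as ``id`` and ``conf``; a value that itself contains a colon (here a
+hypothetical monitor address) must have that colon escaped so it is not treated
+as an option separator::
+
+    qemu-img info rbd:glance-pool/maipo:id=glance:mon_host=192.168.0.1\:6789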
+
+
+Creating Images with QEMU
+=========================
+
+You can create a block device image from QEMU. You must specify ``rbd``, the
+pool name, and the name of the image you wish to create. You must also specify
+the size of the image. ::
+
+ qemu-img create -f raw rbd:{pool-name}/{image-name} {size}
+
+For example::
+
+ qemu-img create -f raw rbd:data/foo 10G
+
+.. important:: The ``raw`` data format is really the only sensible
+ ``format`` option to use with RBD. Technically, you could use other
+ QEMU-supported formats (such as ``qcow2`` or ``vmdk``), but doing
+ so would add additional overhead, and would also render the volume
+ unsafe for virtual machine live migration when caching (see below)
+ is enabled.
+
+
+Resizing Images with QEMU
+=========================
+
+You can resize a block device image from QEMU. You must specify ``rbd``,
+the pool name, and the name of the image you wish to resize. You must also
+specify the size of the image. ::
+
+ qemu-img resize rbd:{pool-name}/{image-name} {size}
+
+For example::
+
+ qemu-img resize rbd:data/foo 10G
+
+
+Retrieving Image Info with QEMU
+===============================
+
+You can retrieve block device image information from QEMU. You must
+specify ``rbd``, the pool name, and the name of the image. ::
+
+ qemu-img info rbd:{pool-name}/{image-name}
+
+For example::
+
+ qemu-img info rbd:data/foo
+
+
+Running QEMU with RBD
+=====================
+
+QEMU can pass a block device from the host on to a guest, but since
+QEMU 0.15, there's no need to map an image as a block device on
+the host. Instead, QEMU can access an image as a virtual block
+device directly via ``librbd``. This performs better because it avoids
+an additional context switch, and can take advantage of `RBD caching`_.
+
+You can use ``qemu-img`` to convert existing virtual machine images to Ceph
+block device images. For example, if you have a qcow2 image, you could run::
+
+ qemu-img convert -f qcow2 -O raw debian_squeeze.qcow2 rbd:data/squeeze
+
+To run a virtual machine booting from that image, you could run::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze
+
+`RBD caching`_ can significantly improve performance.
+Since QEMU 1.2, QEMU's cache options control ``librbd`` caching::
+
+    qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,cache=writeback
+
+If you have an older version of QEMU, you can set the ``librbd`` cache
+configuration (like any Ceph configuration option) as part of the
+'file' parameter::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback
+
+.. important:: If you set rbd_cache=true, you must set cache=writeback
+ or risk data loss. Without cache=writeback, QEMU will not send
+ flush requests to librbd. If QEMU exits uncleanly in this
+ configuration, filesystems on top of rbd can be corrupted.
+
+.. _RBD caching: ../rbd-config-ref/#rbd-cache-config-settings
+
+
+.. index:: Ceph Block Device; discard trim and libvirt
+
+Enabling Discard/TRIM
+=====================
+
+Since Ceph version 0.46 and QEMU version 1.1, Ceph Block Devices support the
+discard operation. This means that a guest can send TRIM requests to let a Ceph
+block device reclaim unused space. This can be enabled in the guest by mounting
+``ext4`` or ``XFS`` with the ``discard`` option.
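+
+For example, inside the guest the option can be supplied at mount time, or an
+already-mounted filesystem can be trimmed on demand (the device and mount point
+below are hypothetical)::
+
+    sudo mount -o discard /dev/sdb1 /mnt/data
+    sudo fstrim -v /mnt/data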
+
+For this to be available to the guest, it must be explicitly enabled
+for the block device. To do this, you must specify a
+``discard_granularity`` associated with the drive::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,id=drive1,if=none \
+ -device driver=ide-hd,drive=drive1,discard_granularity=512
+
+Note that this uses the IDE driver. The virtio driver does not
+support discard.
+
+If using libvirt, edit your libvirt domain's configuration file using ``virsh
+edit`` to include the ``xmlns:qemu`` value. Then, add a ``qemu:commandline``
+block as a child of that domain. The following example shows how to set two
+devices with ``qemu id=`` to different ``discard_granularity`` values.
+
+.. code-block:: xml
+
+ <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
+ <qemu:commandline>
+ <qemu:arg value='-set'/>
+ <qemu:arg value='block.scsi0-0-0.discard_granularity=4096'/>
+ <qemu:arg value='-set'/>
+ <qemu:arg value='block.scsi0-0-1.discard_granularity=65536'/>
+ </qemu:commandline>
+ </domain>
+
+
+.. index:: Ceph Block Device; cache options
+
+QEMU Cache Options
+==================
+
+QEMU's cache options correspond to the following Ceph `RBD Cache`_ settings.
+
+Writeback::
+
+ rbd_cache = true
+
+Writethrough::
+
+ rbd_cache = true
+ rbd_cache_max_dirty = 0
+
+None::
+
+ rbd_cache = false
+
+QEMU's cache settings override Ceph's cache settings (including settings that
+are explicitly set in the Ceph configuration file).
+
+.. note:: Prior to QEMU v2.4.0, if you explicitly set `RBD Cache`_ settings
+ in the Ceph configuration file, your Ceph settings override the QEMU cache
+ settings.
+
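+For example, to run the earlier guest with the RBD cache disabled regardless of
+what the Ceph configuration file contains, you could pass ``cache=none``
+(reusing the ``rbd:data/squeeze`` image from the examples above)::
+
+    qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,cache=none
+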
+.. _QEMU Open Source Processor Emulator: http://wiki.qemu.org/Main_Page
+.. _QEMU Manual: http://wiki.qemu.org/Manual
+.. _RBD Cache: ../rbd-config-ref/
+.. _Snapshots: ../rbd-snapshot/
+.. _Installation: ../../install
+.. _User Management - User: ../../rados/operations/user-management#user
diff --git a/doc/rbd/rados-rbd-cmds.rst b/doc/rbd/rados-rbd-cmds.rst
new file mode 100644
index 00000000..3b0ef345
--- /dev/null
+++ b/doc/rbd/rados-rbd-cmds.rst
@@ -0,0 +1,224 @@
+=======================
+ Block Device Commands
+=======================
+
+.. index:: Ceph Block Device; image management
+
+The ``rbd`` command enables you to create, list, introspect and remove block
+device images. You can also use it to clone images, create snapshots,
+roll back an image to a snapshot, view a snapshot, etc. For details on using
+the ``rbd`` command, see `RBD – Manage RADOS Block Device (RBD) Images`_.
+
+.. important:: To use Ceph Block Device commands, you must have access to
+ a running Ceph cluster.
+
+Create a Block Device Pool
+==========================
+
+#. On the admin node, use the ``ceph`` tool to `create a pool`_.
+
+#. On the admin node, use the ``rbd`` tool to initialize the pool for use by RBD::
+
+ rbd pool init <pool-name>
+
+.. note:: The ``rbd`` tool assumes a default pool name of ``rbd`` when no pool
+   name is specified.
+
+Create a Block Device User
+==========================
+
+Unless specified, the ``rbd`` command will access the Ceph cluster using the ID
+``admin``. This ID allows full administrative access to the cluster. It is
+recommended that you utilize a more restricted user wherever possible.
+
+To `create a Ceph user`_, with ``ceph`` specify the ``auth get-or-create``
+command, user name, monitor caps, and OSD caps::
+
+ ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]' mgr 'profile rbd [pool={pool-name}]'
+
+For example, to create a user ID named ``qemu`` with read-write access to the
+pool ``vms`` and read-only access to the pool ``images``, execute the
+following::
+
+    ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=vms, profile rbd-read-only pool=images'
+
+The output from the ``ceph auth get-or-create`` command will be the keyring for
+the specified user, which can be written to ``/etc/ceph/ceph.client.{ID}.keyring``.
+
+.. note:: The user ID can be specified when using the ``rbd`` command by
+ providing the ``--id {id}`` optional argument.
+
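+For example, to list the images in the ``vms`` pool as the ``qemu`` user created
+above (assuming its keyring was saved to ``/etc/ceph/ceph.client.qemu.keyring``)::
+
+    rbd ls vms --id qemu
+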
+Creating a Block Device Image
+=============================
+
+Before you can add a block device to a node, you must create an image for it in
+the :term:`Ceph Storage Cluster`. To create a block device image, execute
+the following::
+
+ rbd create --size {megabytes} {pool-name}/{image-name}
+
+For example, to create a 1GB image named ``bar`` that stores information in a
+pool named ``swimmingpool``, execute the following::
+
+ rbd create --size 1024 swimmingpool/bar
+
+If you don't specify a pool when creating an image, it will be stored in the
+default pool ``rbd``. For example, to create a 1GB image named ``foo`` stored in
+the default pool ``rbd``, execute the following::
+
+ rbd create --size 1024 foo
+
+.. note:: You must create the pool before you can specify it when creating an
+   image. See `Storage Pools`_ for details.
+
+Listing Block Device Images
+===========================
+
+To list block devices in the ``rbd`` pool, execute the following
+(i.e., ``rbd`` is the default pool name)::
+
+ rbd ls
+
+To list block devices in a particular pool, execute the following,
+but replace ``{poolname}`` with the name of the pool::
+
+ rbd ls {poolname}
+
+For example::
+
+ rbd ls swimmingpool
+
+To list deferred delete block devices in the ``rbd`` pool, execute the
+following::
+
+ rbd trash ls
+
+To list deferred delete block devices in a particular pool, execute the
+following, but replace ``{poolname}`` with the name of the pool::
+
+ rbd trash ls {poolname}
+
+For example::
+
+ rbd trash ls swimmingpool
+
+Retrieving Image Information
+============================
+
+To retrieve information from a particular image, execute the following,
+but replace ``{image-name}`` with the name for the image::
+
+ rbd info {image-name}
+
+For example::
+
+ rbd info foo
+
+To retrieve information from an image within a pool, execute the following,
+but replace ``{image-name}`` with the name of the image and replace ``{pool-name}``
+with the name of the pool::
+
+ rbd info {pool-name}/{image-name}
+
+For example::
+
+ rbd info swimmingpool/bar
+
+Resizing a Block Device Image
+=============================
+
+:term:`Ceph Block Device` images are thin provisioned. They don't actually use
+any physical storage until you begin saving data to them. However, they do have
+a maximum capacity that you set with the ``--size`` option. If you want to
+increase (or decrease) the maximum size of a Ceph Block Device image, execute
+the following::
+
+    rbd resize --size 2048 foo                  # to increase
+    rbd resize --size 2048 foo --allow-shrink   # to decrease
+
+
+Removing a Block Device Image
+=============================
+
+To remove a block device, execute the following, but replace ``{image-name}``
+with the name of the image you want to remove::
+
+ rbd rm {image-name}
+
+For example::
+
+ rbd rm foo
+
+To remove a block device from a pool, execute the following, but replace
+``{image-name}`` with the name of the image to remove and replace
+``{pool-name}`` with the name of the pool::
+
+ rbd rm {pool-name}/{image-name}
+
+For example::
+
+ rbd rm swimmingpool/bar
+
+To defer delete a block device from a pool, execute the following, but
+replace ``{image-name}`` with the name of the image to move and replace
+``{pool-name}`` with the name of the pool::
+
+ rbd trash mv {pool-name}/{image-name}
+
+For example::
+
+ rbd trash mv swimmingpool/bar
+
+To remove a deferred block device from a pool, execute the following, but
+replace ``{image-id}`` with the id of the image to remove and replace
+``{pool-name}`` with the name of the pool::
+
+ rbd trash rm {pool-name}/{image-id}
+
+For example::
+
+ rbd trash rm swimmingpool/2bf4474b0dc51
+
+.. note::
+
+  * You can move an image to the trash even if it has snapshot(s) or is
+    actively in use by clones, but in that case it cannot be removed from the
+    trash.
+
+  * You can use *--expires-at* to set the deferment time (default is ``now``).
+    If the deferment time has not yet expired, the image cannot be removed
+    unless you use *--force*.
+
+Restoring a Block Device Image
+==============================
+
+To restore a deferred delete block device in the rbd pool, execute the
+following, but replace ``{image-id}`` with the id of the image::
+
+ rbd trash restore {image-id}
+
+For example::
+
+ rbd trash restore 2bf4474b0dc51
+
+To restore a deferred delete block device in a particular pool, execute
+the following, but replace ``{image-id}`` with the id of the image and
+replace ``{pool-name}`` with the name of the pool::
+
+ rbd trash restore {pool-name}/{image-id}
+
+For example::
+
+ rbd trash restore swimmingpool/2bf4474b0dc51
+
+You can also use ``--image`` to rename the image while restoring it.
+
+For example::
+
+ rbd trash restore swimmingpool/2bf4474b0dc51 --image new-name
+
+
+.. _create a pool: ../../rados/operations/pools/#create-a-pool
+.. _Storage Pools: ../../rados/operations/pools
+.. _RBD – Manage RADOS Block Device (RBD) Images: ../../man/8/rbd/
+.. _create a Ceph user: ../../rados/operations/user-management#add-a-user
diff --git a/doc/rbd/rbd-cloudstack.rst b/doc/rbd/rbd-cloudstack.rst
new file mode 100644
index 00000000..4d02f95b
--- /dev/null
+++ b/doc/rbd/rbd-cloudstack.rst
@@ -0,0 +1,157 @@
+=============================
+ Block Devices and CloudStack
+=============================
+
+You may use Ceph Block Device images with CloudStack 4.0 and higher through
+``libvirt``, which configures the QEMU interface to ``librbd``. Ceph stripes
+block device images as objects across the cluster, which means that large Ceph
+Block Device images perform better than they would on a standalone server.
+
+To use Ceph Block Devices with CloudStack 4.0 and higher, you must install QEMU,
+``libvirt``, and CloudStack first. We recommend using a separate physical host
+for your CloudStack installation. CloudStack recommends a minimum of 4GB of RAM
+and a dual-core processor, but more CPU and RAM will improve performance. The
+following diagram depicts the CloudStack/Ceph technology stack.
+
+
+.. ditaa::
+
+ +---------------------------------------------------+
+ | CloudStack |
+ +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. important:: To use Ceph Block Devices with CloudStack, you must have
+ access to a running Ceph Storage Cluster.
+
+CloudStack integrates with Ceph's block devices to provide CloudStack with a
+back end for CloudStack's Primary Storage. The instructions below detail the
+setup for CloudStack Primary Storage.
+
+.. note:: We recommend installing with Ubuntu 14.04 or later so that
+ you can use package installation instead of having to compile
+ libvirt from source.
+
+Installing and configuring QEMU for use with CloudStack doesn't require any
+special handling. Ensure that you have a running Ceph Storage Cluster. Install
+QEMU and configure it for use with Ceph; then, install ``libvirt`` version
+0.9.13 or higher (you may need to compile from source) and ensure it is running
+with Ceph.
+
+
+.. note:: Ubuntu 14.04 and CentOS 7.2 will have ``libvirt`` with RBD storage
+ pool support enabled by default.
+
+.. index:: pools; CloudStack
+
+Create a Pool
+=============
+
+By default, Ceph block devices use the ``rbd`` pool. Create a pool for
+CloudStack Primary Storage. Ensure your Ceph cluster is running, then create
+the pool. ::
+
+ ceph osd pool create cloudstack
+
+See `Create a Pool`_ for details on specifying the number of placement groups
+for your pools, and `Placement Groups`_ for details on the number of placement
+groups you should set for your pools.
+
+A newly created pool must be initialized prior to use. Use the ``rbd`` tool
+to initialize the pool::
+
+ rbd pool init cloudstack
+
+Create a Ceph User
+==================
+
+To access the Ceph cluster we require a Ceph user with the correct credentials
+to access the ``cloudstack`` pool we just created. Although we could use
+``client.admin`` for this, it's recommended to create a user with access only
+to the ``cloudstack`` pool. ::
+
+ ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'
+
+Use the information returned by the command in the next step when adding the
+Primary Storage.
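+
+For example, the key string that the **RADOS Secret** field expects can be
+printed on its own with::
+
+    ceph auth get-key client.cloudstack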
+
+See `User Management`_ for additional details.
+
+Add Primary Storage
+===================
+
+To add a Ceph block device as Primary Storage, perform the following steps:
+
+#. Log in to the CloudStack UI.
+#. Click **Infrastructure** on the left side navigation bar.
+#. Select **View All** under **Primary Storage**.
+#. Click the **Add Primary Storage** button on the top right hand side.
+#. Fill in the following information, according to your infrastructure setup:
+
+ - Scope (i.e. Cluster or Zone-Wide).
+
+ - Zone.
+
+ - Pod.
+
+ - Cluster.
+
+ - Name of Primary Storage.
+
+ - For **Protocol**, select ``RBD``.
+
+ - For **Provider**, select the appropriate provider type (i.e. DefaultPrimary, SolidFire, SolidFireShared, or CloudByte). Depending on the provider chosen, fill out the information pertinent to your setup.
+
+#. Add cluster information (``cephx`` is supported).
+
+ - For **RADOS Monitor**, provide the IP address of a Ceph monitor node.
+
+ - For **RADOS Pool**, provide the name of an RBD pool.
+
+ - For **RADOS User**, provide a user that has sufficient rights to the RBD pool. Note: Do not include the ``client.`` part of the user.
+
+   - For **RADOS Secret**, provide the user's secret key.
+
+ - **Storage Tags** are optional. Use tags at your own discretion. For more information about storage tags in CloudStack, refer to `Storage Tags`_.
+
+#. Click **OK**.
+
+Create a Disk Offering
+======================
+
+To create a new disk offering, refer to `Create a New Disk Offering`_.
+Create a disk offering so that it matches the ``rbd`` storage tag.
+The ``StoragePoolAllocator`` will then choose the ``rbd``-tagged
+pool when searching for a suitable storage pool. If the disk offering doesn't
+match the ``rbd`` tag, the ``StoragePoolAllocator`` is not guaranteed to select
+the pool you created (e.g., ``cloudstack``).
+
+
+Limitations
+===========
+
+- CloudStack will only bind to one monitor (you can, however, create a
+  round-robin DNS record over multiple monitors)
+
+
+
+.. _Create a Pool: ../../rados/operations/pools#createpool
+.. _Placement Groups: ../../rados/operations/placement-groups
+.. _Install and Configure QEMU: ../qemu-rbd
+.. _Install and Configure libvirt: ../libvirt
+.. _KVM Hypervisor Host Installation: http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Installation_Guide/hypervisor-kvm-install-flow.html
+.. _Storage Tags: http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/4.11/storage.html#storage-tags
+.. _Create a New Disk Offering: http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/master/service_offerings.html#creating-a-new-disk-offering
+.. _User Management: ../../rados/operations/user-management
diff --git a/doc/rbd/rbd-config-ref.rst b/doc/rbd/rbd-config-ref.rst
new file mode 100644
index 00000000..2fa313b7
--- /dev/null
+++ b/doc/rbd/rbd-config-ref.rst
@@ -0,0 +1,354 @@
+=======================
+ librbd Settings
+=======================
+
+See `Block Device`_ for additional details.
+
+Generic IO Settings
+===================
+
+``rbd compression hint``
+
+:Description: Hint to send to the OSDs on write operations. If set to `compressible` and the OSD `bluestore compression mode` setting is `passive`, the OSD will attempt to compress the data. If set to `incompressible` and the OSD compression setting is `aggressive`, the OSD will not attempt to compress the data.
+:Type: Enum
+:Required: No
+:Default: ``none``
+:Values: ``none``, ``compressible``, ``incompressible``
+
+Cache Settings
+=======================
+
+.. sidebar:: Kernel Caching
+
+ The kernel driver for Ceph block devices can use the Linux page cache to
+ improve performance.
+
+The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
+take advantage of the Linux page cache, so it includes its own in-memory
+caching, called "RBD caching." RBD caching behaves just like well-behaved hard
+disk caching. When the OS sends a barrier or a flush request, all dirty data is
+written to the OSDs. This means that using write-back caching is just as safe as
+using a well-behaved physical hard disk with a VM that properly sends flushes
+(i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU)
+algorithm, and in write-back mode it can coalesce contiguous requests for
+better throughput.
+
+.. versionadded:: 0.46
+
+Ceph supports write-back caching for RBD. To enable it, add ``rbd cache =
+true`` to the ``[client]`` section of your ``ceph.conf`` file (in recent
+releases this is the default, as the settings below show). Without caching,
+writes and reads go directly to the storage cluster, and writes return only
+when the data is on disk on all replicas. With caching enabled, writes return
+immediately, unless there are more than ``rbd cache max dirty`` unflushed
+bytes. In this case, the write triggers writeback and blocks until enough
+bytes are flushed.
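+
+If you prefer not to edit ``ceph.conf`` directly, the same option can also be
+changed through the cluster's centralized configuration database (a sketch;
+assumes the ``ceph config`` command available in recent releases)::
+
+    ceph config set client rbd_cache true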
+
+.. versionadded:: 0.47
+
+Ceph supports write-through caching for RBD. You can set the size of
+the cache, and you can set targets and limits to switch from
+write-back caching to write-through caching. To enable write-through
+mode, set ``rbd cache max dirty`` to 0. This means writes return only
+when the data is on disk on all replicas, but reads may come from the
+cache. The cache is in memory on the client, and each RBD image has
+its own. Since the cache is local to the client, there's no coherency
+if there are others accessing the image. Running GFS or OCFS on top of
+RBD will not work with caching enabled.
+
+The ``ceph.conf`` file settings for RBD should be set in the ``[client]``
+section of your configuration file. The settings include:
+
+
+``rbd cache``
+
+:Description: Enable caching for RADOS Block Device (RBD).
+:Type: Boolean
+:Required: No
+:Default: ``true``
+
+
+``rbd cache size``
+
+:Description: The RBD cache size in bytes.
+:Type: 64-bit Integer
+:Required: No
+:Default: ``32 MiB``
+
+
+``rbd cache max dirty``
+
+:Description: The ``dirty`` limit in bytes at which the cache triggers write-back. If ``0``, uses write-through caching.
+:Type: 64-bit Integer
+:Required: No
+:Constraint: Must be less than ``rbd cache size``.
+:Default: ``24 MiB``
+
+
+``rbd cache target dirty``
+
+:Description: The ``dirty target`` before the cache begins writing data to the data storage. Does not block writes to the cache.
+:Type: 64-bit Integer
+:Required: No
+:Constraint: Must be less than ``rbd cache max dirty``.
+:Default: ``16 MiB``
+
+
+``rbd cache max dirty age``
+
+:Description: The number of seconds dirty data is in the cache before writeback starts.
+:Type: Float
+:Required: No
+:Default: ``1.0``
+
+.. versionadded:: 0.60
+
+``rbd cache writethrough until flush``
+
+:Description: Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but safe setting in case VMs running on rbd are too old to send flushes, like the virtio driver in Linux before 2.6.32.
+:Type: Boolean
+:Required: No
+:Default: ``true``
+
+.. _Block Device: ../../rbd
+
+
+Read-ahead Settings
+=======================
+
+.. versionadded:: 0.86
+
+RBD supports read-ahead/prefetching to optimize small, sequential reads.
+This should normally be handled by the guest OS in the case of a VM,
+but boot loaders may not issue efficient reads.
+Read-ahead is automatically disabled if caching is disabled.
+
+
+``rbd readahead trigger requests``
+
+:Description: Number of sequential read requests necessary to trigger read-ahead.
+:Type: Integer
+:Required: No
+:Default: ``10``
+
+
+``rbd readahead max bytes``
+
+:Description: Maximum size of a read-ahead request. If zero, read-ahead is disabled.
+:Type: 64-bit Integer
+:Required: No
+:Default: ``512 KiB``
+
+
+``rbd readahead disable after bytes``
+
+:Description: After this many bytes have been read from an RBD image, read-ahead is disabled for that image until it is closed. This allows the guest OS to take over read-ahead once it is booted. If zero, read-ahead stays enabled.
+:Type: 64-bit Integer
+:Required: No
+:Default: ``50 MiB``
+
+
+RBD Features
+============
+
+RBD supports advanced features which can be specified via the command line when
+creating images. Default features can also be configured in the Ceph
+configuration file with ``rbd_default_features = <sum of feature numeric values>``
+or ``rbd_default_features = <comma-delimited list of CLI values>``.
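+
+For example, a specific feature set can be requested at image creation time
+(a sketch; the pool and image names are placeholders)::
+
+    rbd create --size 1024 --image-feature layering,exclusive-lock,object-map,fast-diff image-pool/image-1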
+
+``Layering``
+
+:Description: Layering enables you to use cloning.
+:Internal value: 1
+:CLI value: layering
+:Added in: v0.70 (Emperor)
+:KRBD support: since v3.10
+:Default: yes
+
+``Striping v2``
+
+:Description: Striping spreads data across multiple objects. Striping helps with parallelism for sequential read/write workloads.
+:Internal value: 2
+:CLI value: striping
+:Added in: v0.70 (Emperor)
+:KRBD support: since v3.10
+:Default: yes
+
+``Exclusive locking``
+
+:Description: When enabled, it requires a client to get a lock on an object before making a write. Exclusive lock should only be enabled when a single client is accessing an image at the same time.
+:Internal value: 4
+:CLI value: exclusive-lock
+:Added in: v0.92 (Hammer)
+:KRBD support: since v4.9
+:Default: yes
+
+``Object map``
+
+:Description: Object map support depends on exclusive lock support. Block devices are thin provisioned—meaning, they only store data that actually exists. Object map support helps track which objects actually exist (have data stored on a drive). Enabling object map support speeds up I/O operations for cloning; importing and exporting a sparsely populated image; and deleting.
+:Internal value: 8
+:CLI value: object-map
+:Added in: v0.93 (Hammer)
+:KRBD support: no
+:Default: yes
+
+
+``Fast-diff``
+
+:Description: Fast-diff support depends on object map support and exclusive lock support. It adds another property to the object map, which makes it much faster to generate diffs between snapshots of an image and to calculate the actual data usage of a snapshot.
+:Internal value: 16
+:CLI value: fast-diff
+:Added in: v9.0.1 (Infernalis)
+:KRBD support: no
+:Default: yes
+
+
+``Deep-flatten``
+
+:Description: Deep-flatten makes rbd flatten work on all the snapshots of an image, in addition to the image itself. Without it, snapshots of an image will still rely on the parent, so the parent cannot be deleted until the snapshots are deleted. Deep-flatten makes a parent independent of its clones, even if they have snapshots.
+:Internal value: 32
+:CLI value: deep-flatten
+:Added in: v9.0.2 (Infernalis)
+:KRBD support: no
+:Default: yes
+
+
+``Journaling``
+
+:Description: Journaling support depends on exclusive lock support. Journaling records all modifications to an image in the order they occur. RBD mirroring utilizes the journal to replicate a crash consistent image to a remote cluster.
+:Internal value: 64
+:CLI value: journaling
+:Added in: v10.0.1 (Jewel)
+:KRBD support: no
+:Default: no
+
+
+``Data pool``
+
+:Description: On erasure-coded pools, the image data block objects need to be stored on a separate pool from the image metadata.
+:Internal value: 128
+:Added in: v11.1.0 (Kraken)
+:KRBD support: since v4.11
+:Default: no
+
+
+``Operations``
+
+:Description: Used to restrict older clients from performing certain maintenance operations against an image (e.g. clone, snap create).
+:Internal value: 256
+:Added in: v13.0.2 (Mimic)
+:KRBD support: since v4.16
+
+
+``Migrating``
+
+:Description: Used to restrict older clients from opening an image when it is in migration state.
+:Internal value: 512
+:Added in: v14.0.1 (Nautilus)
+:KRBD support: no
+
+
+RBD QOS Settings
+================
+
+RBD supports limiting per image IO, controlled by the following
+settings.
+
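+These settings can be overridden per image; for example, the following sketch
+caps a hypothetical image at 100 write operations per second using the
+``rbd config image set`` command::
+
+    rbd config image set image-pool/image-1 rbd_qos_write_iops_limit 100
+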
+``rbd qos iops limit``
+
+:Description: The desired limit of IO operations per second.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos bps limit``
+
+:Description: The desired limit of IO bytes per second.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos read iops limit``
+
+:Description: The desired limit of read operations per second.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos write iops limit``
+
+:Description: The desired limit of write operations per second.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos read bps limit``
+
+:Description: The desired limit of read bytes per second.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos write bps limit``
+
+:Description: The desired limit of write bytes per second.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos iops burst``
+
+:Description: The desired burst limit of IO operations.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos bps burst``
+
+:Description: The desired burst limit of IO bytes.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos read iops burst``
+
+:Description: The desired burst limit of read operations.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos write iops burst``
+
+:Description: The desired burst limit of write operations.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos read bps burst``
+
+:Description: The desired burst limit of read bytes.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos write bps burst``
+
+:Description: The desired burst limit of write bytes.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``0``
+
+
+``rbd qos schedule tick min``
+
+:Description: The minimum schedule tick (in milliseconds) for QoS.
+:Type: Unsigned Integer
+:Required: No
+:Default: ``50``
diff --git a/doc/rbd/rbd-ko.rst b/doc/rbd/rbd-ko.rst
new file mode 100644
index 00000000..70c40783
--- /dev/null
+++ b/doc/rbd/rbd-ko.rst
@@ -0,0 +1,59 @@
+==========================
+ Kernel Module Operations
+==========================
+
+.. index:: Ceph Block Device; kernel module
+
+.. important:: To use kernel module operations, you must have a running Ceph cluster.
+
+Get a List of Images
+====================
+
+To mount a block device image, first list the available images. ::
+
+ rbd list
+
+Map a Block Device
+==================
+
+Use ``rbd`` to map an image to a kernel block device. You must specify the
+image name, the pool name, and the user name. ``rbd`` will load the RBD kernel
+module on your behalf if it's not already loaded. ::
+
+ sudo rbd device map {pool-name}/{image-name} --id {user-name}
+
+For example::
+
+ sudo rbd device map rbd/myimage --id admin
+
+If you use `cephx`_ authentication, you must also specify a secret. It may come
+from a keyring or a file containing the secret. ::
+
+ sudo rbd device map rbd/myimage --id admin --keyring /path/to/keyring
+ sudo rbd device map rbd/myimage --id admin --keyfile /path/to/file
+
+
+Show Mapped Block Devices
+=========================
+
+To show block device images mapped to kernel modules with the ``rbd`` command,
+specify the ``device list`` arguments. ::
+
+ rbd device list
+
+
+Unmapping a Block Device
+========================
+
+To unmap a block device image with the ``rbd`` command, specify the
+``device unmap`` arguments and the device name (i.e., by convention the
+same as the block device image name). ::
+
+ sudo rbd device unmap /dev/rbd/{poolname}/{imagename}
+
+For example::
+
+ sudo rbd device unmap /dev/rbd/rbd/foo
+
+
+.. _cephx: ../../rados/operations/user-management/
diff --git a/doc/rbd/rbd-live-migration.rst b/doc/rbd/rbd-live-migration.rst
new file mode 100644
index 00000000..0757f4a2
--- /dev/null
+++ b/doc/rbd/rbd-live-migration.rst
@@ -0,0 +1,157 @@
+======================
+ Image Live-Migration
+======================
+
+.. index:: Ceph Block Device; live-migration
+
+RBD images can be live-migrated between different pools within the same cluster
+or between different image formats and layouts. When started, the source image
+will be deep-copied to the destination image, pulling all snapshot history and
+optionally keeping any link to the source image's parent to help preserve
+sparseness.
+
+This copy process can safely run in the background while the new target image is
+in-use. There is currently a requirement to temporarily stop using the source
+image before preparing a migration. This helps to ensure that the client using
+the image is updated to point to the new target image.
+
+.. note::
+ Image live-migration requires the Ceph Nautilus release or later. The krbd
+ kernel module does not support live-migration at this time.
+
+
+.. ditaa::
+
+ +-------------+ +-------------+
+ | {s} c999 | | {s} |
+ | Live | Target refers | Live |
+ | migration |<-------------*| migration |
+ | source | to Source | target |
+ | | | |
+ | (read only) | | (writable) |
+ +-------------+ +-------------+
+
+ Source Target
+
+The live-migration process consists of three steps:
+
+#. **Prepare Migration:** The initial step creates the new target image and
+ cross-links the source and target images. Similar to `layered images`_,
+ attempts to read uninitialized extents within the target image will
+ internally redirect the read to the source image, and writes to
+ uninitialized extents within the target will internally deep-copy the
+ overlapping source image block to the target image.
+
+
+#. **Execute Migration:** This is a background operation that deep-copies all
+ initialized blocks from the source image to the target. This step can be
+ run while clients are actively using the new target image.
+
+
+#. **Finish Migration:** Once the background migration process has completed,
+ the migration can be committed or aborted. Committing the migration will
+ remove the cross-links between the source and target images, and will
+ remove the source image. Aborting the migration will remove the cross-links,
+ and will remove the target image.
+
+Prepare Migration
+=================
+
+The live-migration process is initiated by running the `rbd migration prepare`
+command, providing the source and target images::
+
+ $ rbd migration prepare migration_source [migration_target]
+
+The `rbd migration prepare` command accepts all the same layout options as the
+`rbd create` command, which allows changes to the otherwise immutable on-disk
+layout of the image. The `migration_target` can be skipped if the goal is only
+to change the on-disk layout, keeping the original image name.
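+
+For example, a sketch that keeps the image name but moves the data blocks to a
+separate (hypothetical) data pool might look like this::
+
+    $ rbd migration prepare --data-pool ssd-data-pool migration_source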
+
+All clients using the source image must be stopped prior to preparing a
+live-migration. The prepare step will fail if it finds any running clients with
+the image open in read/write mode. Once the prepare step is complete, the
+clients can be restarted using the new target image name. Attempting to restart
+the clients using the source image name will result in failure.
+
+The `rbd status` command will show the current state of the live-migration::
+
+ $ rbd status migration_target
+ Watchers: none
+ Migration:
+ source: rbd/migration_source (5e2cba2f62e)
+ destination: rbd/migration_target (5e2ed95ed806)
+ state: prepared
+
+Note that the source image will be moved to the RBD trash to avoid mistaken
+usage during the migration process::
+
+ $ rbd info migration_source
+ rbd: error opening image migration_source: (2) No such file or directory
+ $ rbd trash ls --all
+ 5e2cba2f62e migration_source
+
+
+Execute Migration
+=================
+
+After preparing the live-migration, the image blocks from the source image
+must be copied to the target image. This is accomplished by running the
+`rbd migration execute` command::
+
+ $ rbd migration execute migration_target
+ Image migration: 100% complete...done.
+
+The `rbd status` command will also provide feedback on the progress of the
+migration block deep-copy process::
+
+ $ rbd status migration_target
+ Watchers:
+ watcher=1.2.3.4:0/3695551461 client.123 cookie=123
+ Migration:
+ source: rbd/migration_source (5e2cba2f62e)
+ destination: rbd/migration_target (5e2ed95ed806)
+ state: executing (32% complete)
+
+
+Commit Migration
+================
+
+Once the live-migration has completed deep-copying all data blocks from the
+source image to the target, the migration can be committed::
+
+ $ rbd status migration_target
+ Watchers: none
+ Migration:
+ source: rbd/migration_source (5e2cba2f62e)
+ destination: rbd/migration_target (5e2ed95ed806)
+ state: executed
+ $ rbd migration commit migration_target
+ Commit image migration: 100% complete...done.
+
+If the `migration_source` image is a parent of one or more clones, the `--force`
+option will need to be specified after ensuring all descendant clone images are
+not in use.
+
+Committing the live-migration will remove the cross-links between the source
+and target images, and will remove the source image::
+
+ $ rbd trash list --all
+
+
+Abort Migration
+===============
+
+If you wish to revert the prepare or execute step, run the `rbd migration abort`
+command to revert the migration process::
+
+ $ rbd migration abort migration_target
+ Abort image migration: 100% complete...done.
+
+Aborting the migration will result in the target image being deleted and access
+to the original source image being restored::
+
+ $ rbd ls
+ migration_source
+
+
+.. _layered images: ../rbd-snapshot/#layering
diff --git a/doc/rbd/rbd-mirroring.rst b/doc/rbd/rbd-mirroring.rst
new file mode 100644
index 00000000..8762b9b9
--- /dev/null
+++ b/doc/rbd/rbd-mirroring.rst
@@ -0,0 +1,410 @@
+===============
+ RBD Mirroring
+===============
+
+.. index:: Ceph Block Device; mirroring
+
+RBD images can be asynchronously mirrored between two Ceph clusters. This
+capability uses the RBD journaling image feature to ensure crash-consistent
+replication between clusters. Mirroring is configured on a per-pool basis
+within peer clusters and can be configured to automatically mirror all
+images within a pool or only a specific subset of images. Mirroring is
+configured using the ``rbd`` command. The ``rbd-mirror`` daemon is responsible
+for pulling image updates from the remote, peer cluster and applying them to
+the image within the local cluster.
+
+.. note:: RBD mirroring requires the Ceph Jewel release or later.
+
+Depending on the desired needs for replication, RBD mirroring can be configured
+for either one- or two-way replication:
+
+* **One-way Replication**: When data is only mirrored from a primary cluster to
+ a secondary cluster, the ``rbd-mirror`` daemon runs only on the secondary
+ cluster.
+
+* **Two-way Replication**: When data is mirrored from primary images on one
+ cluster to non-primary images on another cluster (and vice-versa), the
+ ``rbd-mirror`` daemon runs on both clusters.
+
+.. important:: Each instance of the ``rbd-mirror`` daemon must be able to
+ connect to both the local and remote Ceph clusters simultaneously (i.e.
+ all monitor and OSD hosts). Additionally, the network must have sufficient
+   bandwidth between the two data centers to handle the mirroring workload.
+
+Pool Configuration
+==================
+
+The following procedures demonstrate how to perform the basic administrative
+tasks to configure mirroring using the ``rbd`` command. Mirroring is
+configured on a per-pool basis within the Ceph clusters.
+
+The pool configuration steps should be performed on both peer clusters. For
+clarity, these procedures assume that two clusters, named "site-a" and
+"site-b", are accessible from a single host.
+
+See the `rbd`_ manpage for additional details of how to connect to different
+Ceph clusters.
+
+.. note:: The cluster name in the following examples corresponds to a Ceph
+ configuration file of the same name (e.g. /etc/ceph/site-b.conf). See the
+ `ceph-conf`_ documentation for how to configure multiple clusters.
+
+Enable Mirroring
+----------------
+
+To enable mirroring on a pool with ``rbd``, specify the ``mirror pool enable``
+command, the pool name, and the mirroring mode::
+
+ rbd mirror pool enable {pool-name} {mode}
+
+The mirroring mode can either be ``pool`` or ``image``:
+
+* **pool**: When configured in ``pool`` mode, all images in the pool with the
+ journaling feature enabled are mirrored.
+* **image**: When configured in ``image`` mode, mirroring needs to be
+ `explicitly enabled`_ on each image.
+
+For example::
+
+ $ rbd --cluster site-a mirror pool enable image-pool pool
+ $ rbd --cluster site-b mirror pool enable image-pool pool
+
+Disable Mirroring
+-----------------
+
+To disable mirroring on a pool with ``rbd``, specify the ``mirror pool disable``
+command and the pool name::
+
+ rbd mirror pool disable {pool-name}
+
+When mirroring is disabled on a pool in this way, mirroring will also be
+disabled on any images (within the pool) for which mirroring was enabled
+explicitly.
+
+For example::
+
+ $ rbd --cluster site-a mirror pool disable image-pool
+ $ rbd --cluster site-b mirror pool disable image-pool
+
+Bootstrap Peers
+---------------
+
+In order for the ``rbd-mirror`` daemon to discover its peer cluster, the peer
+needs to be registered to the pool and a user account needs to be created.
+This process can be automated with ``rbd`` and the
+``mirror pool peer bootstrap create`` and ``mirror pool peer bootstrap import``
+commands.
+
+To manually create a new bootstrap token with ``rbd``, specify the
+``mirror pool peer bootstrap create`` command, a pool name, along with an
+optional friendly site name to describe the local cluster::
+
+ rbd mirror pool peer bootstrap create [--site-name {local-site-name}] {pool-name}
+
+The output of ``mirror pool peer bootstrap create`` will be a token that should
+be provided to the ``mirror pool peer bootstrap import`` command. For example,
+on site-a::
+
+ $ rbd --cluster site-a mirror pool peer bootstrap create --site-name site-a image-pool
+ eyJmc2lkIjoiOWY1MjgyZGItYjg5OS00NTk2LTgwOTgtMzIwYzFmYzM5NmYzIiwiY2xpZW50X2lkIjoicmJkLW1pcnJvci1wZWVyIiwia2V5IjoiQVFBUnczOWQwdkhvQmhBQVlMM1I4RmR5dHNJQU50bkFTZ0lOTVE9PSIsIm1vbl9ob3N0IjoiW3YyOjE5Mi4xNjguMS4zOjY4MjAsdjE6MTkyLjE2OC4xLjM6NjgyMV0ifQ==
+
+To manually import the bootstrap token created by another cluster with ``rbd``,
+specify the ``mirror pool peer bootstrap import`` command, the pool name, a file
+path to the created token (or '-' to read from standard input), along with an
+optional friendly site name to describe the local cluster and a mirroring
+direction (defaults to rx-tx for bidirectional mirroring, but can also be set
+to rx-only for unidirectional mirroring)::
+
+ rbd mirror pool peer bootstrap import [--site-name {local-site-name}] [--direction {rx-only or rx-tx}] {pool-name} {token-path}
+
+For example, on site-b::
+
+ $ cat <<EOF > token
+ eyJmc2lkIjoiOWY1MjgyZGItYjg5OS00NTk2LTgwOTgtMzIwYzFmYzM5NmYzIiwiY2xpZW50X2lkIjoicmJkLW1pcnJvci1wZWVyIiwia2V5IjoiQVFBUnczOWQwdkhvQmhBQVlMM1I4RmR5dHNJQU50bkFTZ0lOTVE9PSIsIm1vbl9ob3N0IjoiW3YyOjE5Mi4xNjguMS4zOjY4MjAsdjE6MTkyLjE2OC4xLjM6NjgyMV0ifQ==
+ EOF
+ $ rbd --cluster site-b mirror pool peer bootstrap import --site-name site-b image-pool token
+
+Add Cluster Peer Manually
+-------------------------
+
+Cluster peers can be specified manually if desired or if the above bootstrap
+commands are not available with the currently installed Ceph release.
+
+The remote ``rbd-mirror`` daemon will need access to the local cluster to
+perform mirroring. A new local Ceph user should be created for the remote
+daemon to use. To `create a Ceph user`_, with ``ceph`` specify the
+``auth get-or-create`` command, user name, monitor caps, and OSD caps::
+
+ ceph auth get-or-create client.rbd-mirror-peer mon 'profile rbd' osd 'profile rbd'
+
+The resulting keyring should be copied to the other cluster's ``rbd-mirror``
+daemon hosts if not using the Ceph monitor ``config-key`` store described below.
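+
+For example, the new user's keyring might be exported on one cluster and copied
+to a (hypothetical) host running the peer cluster's ``rbd-mirror`` daemon::
+
+    $ ceph --cluster site-a auth get client.rbd-mirror-peer -o /etc/ceph/site-a.client.rbd-mirror-peer.keyring
+    $ scp /etc/ceph/site-a.client.rbd-mirror-peer.keyring mirror-host:/etc/ceph/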
+
+To manually add a mirroring peer Ceph cluster with ``rbd``, specify the
+``mirror pool peer add`` command, the pool name, and a cluster specification::
+
+ rbd mirror pool peer add {pool-name} {client-name}@{cluster-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror pool peer add image-pool client.rbd-mirror-peer@site-b
+ $ rbd --cluster site-b mirror pool peer add image-pool client.rbd-mirror-peer@site-a
+
+By default, the ``rbd-mirror`` daemon needs to have access to a Ceph
+configuration file located at ``/etc/ceph/{cluster-name}.conf`` that provides
+the addresses of the peer cluster's monitors, in addition to a keyring for
+``{client-name}`` located in the default or configured keyring search paths
+(e.g. ``/etc/ceph/{cluster-name}.{client-name}.keyring``).
+
+Alternatively, the peer cluster's monitor and/or client key can be securely
+stored within the local Ceph monitor ``config-key`` store. To specify the
+peer cluster connection attributes when adding a mirroring peer, use the
+``--remote-mon-host`` and ``--remote-key-file`` optionals. For example::
+
+ $ cat <<EOF > remote-key-file
+ AQAeuZdbMMoBChAAcj++/XUxNOLFaWdtTREEsw==
+ EOF
+ $ rbd --cluster site-a mirror pool peer add image-pool client.rbd-mirror-peer@site-b --remote-mon-host 192.168.1.1,192.168.1.2 --remote-key-file remote-key-file
+ $ rbd --cluster site-a mirror pool info image-pool --all
+ Mode: pool
+ Peers:
+ UUID NAME CLIENT MON_HOST KEY
+ 587b08db-3d33-4f32-8af8-421e77abb081 site-b client.rbd-mirror-peer 192.168.1.1,192.168.1.2 AQAeuZdbMMoBChAAcj++/XUxNOLFaWdtTREEsw==
+
+Remove Cluster Peer
+-------------------
+
+To remove a mirroring peer Ceph cluster with ``rbd``, specify the
+``mirror pool peer remove`` command, the pool name, and the peer UUID
+(available from the ``rbd mirror pool info`` command)::
+
+ rbd mirror pool peer remove {pool-name} {peer-uuid}
+
+For example::
+
+ $ rbd --cluster site-a mirror pool peer remove image-pool 55672766-c02b-4729-8567-f13a66893445
+ $ rbd --cluster site-b mirror pool peer remove image-pool 60c0e299-b38f-4234-91f6-eed0a367be08
+
+Data Pools
+----------
+
+When creating images in the destination cluster, ``rbd-mirror`` selects a data
+pool as follows:
+
+#. If the destination cluster has a default data pool configured (with the
+ ``rbd_default_data_pool`` configuration option), it will be used.
+#. Otherwise, if the source image uses a separate data pool, and a pool with the
+ same name exists on the destination cluster, that pool will be used.
+#. If neither of the above is true, no data pool will be set.
+
+Image Configuration
+===================
+
+Unlike pool configuration, image configuration only needs to be performed against
+a single mirroring peer Ceph cluster.
+
+Mirrored RBD images are designated as either primary or non-primary. This is a
+property of the image and not the pool. Images that are designated as
+non-primary cannot be modified.
+
+Images are automatically promoted to primary when mirroring is first enabled on
+an image (either implicitly if the pool mirror mode was **pool** and the image
+has the journaling image feature enabled, or `explicitly enabled`_ by the
+``rbd`` command).
+
+Enable Image Journaling Support
+-------------------------------
+
+RBD mirroring uses the RBD journaling feature to ensure that the replicated
+image always remains crash-consistent. Before an image can be mirrored to
+a peer cluster, the journaling feature must be enabled. The feature can be
+enabled at image creation time by providing the
+``--image-feature exclusive-lock,journaling`` option to the ``rbd`` command.
+
+Alternatively, the journaling feature can be dynamically enabled on
+pre-existing RBD images. To enable journaling with ``rbd``, specify
+the ``feature enable`` command, the pool and image name, and the feature name::
+
+ rbd feature enable {pool-name}/{image-name} {feature-name}
+
+For example::
+
+ $ rbd --cluster site-a feature enable image-pool/image-1 journaling
+
+.. note:: The journaling feature is dependent on the exclusive-lock feature. If
+ the exclusive-lock feature is not already enabled, it should be enabled prior
+ to enabling the journaling feature.
+
+.. tip:: You can enable journaling on all new images by default by adding
+ ``rbd default features = 125`` to your Ceph configuration file.
+
+Enable Image Mirroring
+----------------------
+
+If the mirroring is configured in ``image`` mode for the image's pool, then it
+is necessary to explicitly enable mirroring for each image within the pool.
+To enable mirroring for a specific image with ``rbd``, specify the
+``mirror image enable`` command along with the pool and image name::
+
+ rbd mirror image enable {pool-name}/{image-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror image enable image-pool/image-1
+
+Disable Image Mirroring
+-----------------------
+
+To disable mirroring for a specific image with ``rbd``, specify the
+``mirror image disable`` command along with the pool and image name::
+
+ rbd mirror image disable {pool-name}/{image-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror image disable image-pool/image-1
+
+Image Promotion and Demotion
+----------------------------
+
+In a failover scenario where the primary designation needs to be moved to the
+image in the peer Ceph cluster, stop access to the primary image (e.g. power
+down the VM or remove the associated drive from a VM), demote the current
+primary image, promote the new primary image, and resume access to the image
+on the alternate cluster.
+
+.. note:: RBD only provides the necessary tools to facilitate an orderly
+ failover of an image. An external mechanism is required to coordinate the
+ full failover process (e.g. closing the image before demotion).
+
+To demote a specific image to non-primary with ``rbd``, specify the
+``mirror image demote`` command along with the pool and image name::
+
+ rbd mirror image demote {pool-name}/{image-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror image demote image-pool/image-1
+
+To demote all primary images within a pool to non-primary with ``rbd``, specify
+the ``mirror pool demote`` command along with the pool name::
+
+ rbd mirror pool demote {pool-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror pool demote image-pool
+
+To promote a specific image to primary with ``rbd``, specify the
+``mirror image promote`` command along with the pool and image name::
+
+ rbd mirror image promote [--force] {pool-name}/{image-name}
+
+For example::
+
+ $ rbd --cluster site-b mirror image promote image-pool/image-1
+
+To promote all non-primary images within a pool to primary with ``rbd``, specify
+the ``mirror pool promote`` command along with the pool name::
+
+ rbd mirror pool promote [--force] {pool-name}
+
+For example::
+
+ $ rbd --cluster site-a mirror pool promote image-pool
+
+.. tip:: Since the primary / non-primary status is per-image, it is possible to
+ have two clusters split the IO load and stage failover / failback.
+
+.. note:: Promotion can be forced using the ``--force`` option. Forced
+ promotion is needed when the demotion cannot be propagated to the peer
+ Ceph cluster (e.g. Ceph cluster failure, communication outage). This will
+ result in a split-brain scenario between the two peers and the image will no
+ longer be in-sync until a `force resync command`_ is issued.
+
+Force Image Resync
+------------------
+
+If a split-brain event is detected by the ``rbd-mirror`` daemon, it will not
+attempt to mirror the affected image until corrected. To resume mirroring for an
+image, first `demote the image`_ determined to be out-of-date and then request a
+resync to the primary image. To request an image resync with ``rbd``, specify the
+``mirror image resync`` command along with the pool and image name::
+
+ rbd mirror image resync {pool-name}/{image-name}
+
+For example::
+
+ $ rbd mirror image resync image-pool/image-1
+
+.. note:: The ``rbd`` command only flags the image as requiring a resync. The
+ local cluster's ``rbd-mirror`` daemon process is responsible for performing
+ the resync asynchronously.
+
+Mirror Status
+=============
+
+The peer cluster replication status is stored for every primary mirrored image.
+This status can be retrieved using the ``mirror image status`` and
+``mirror pool status`` commands.
+
+To request the mirror image status with ``rbd``, specify the
+``mirror image status`` command along with the pool and image name::
+
+ rbd mirror image status {pool-name}/{image-name}
+
+For example::
+
+ $ rbd mirror image status image-pool/image-1
+
+To request the mirror pool summary status with ``rbd``, specify the
+``mirror pool status`` command along with the pool name::
+
+ rbd mirror pool status {pool-name}
+
+For example::
+
+ $ rbd mirror pool status image-pool
+
+.. note:: Adding the ``--verbose`` option to the ``mirror pool status`` command
+ will additionally output status details for every mirroring image in the pool.
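+
+For example, to include per-image status details in the pool summary::
+
+ $ rbd mirror pool status image-pool --verbose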
+
+rbd-mirror Daemon
+=================
+
+The two ``rbd-mirror`` daemons are responsible for watching image journals on the
+remote, peer cluster and replaying the journal events against the local
+cluster. The RBD image journaling feature records all modifications to the
+image in the order they occur. This ensures that a crash-consistent mirror of
+the remote image is available locally.
+
+The ``rbd-mirror`` daemon is available within the optional ``rbd-mirror``
+distribution package.
+
+.. important:: Each ``rbd-mirror`` daemon requires the ability to connect
+ to both clusters simultaneously.
+.. warning:: Pre-Luminous releases: only run a single ``rbd-mirror`` daemon per
+ Ceph cluster.
+
+Each ``rbd-mirror`` daemon should use a unique Ceph user ID. To
+`create a Ceph user`_ with ``ceph``, specify the ``auth get-or-create``
+command, user name, monitor caps, and OSD caps::
+
+ ceph auth get-or-create client.rbd-mirror.{unique id} mon 'profile rbd-mirror' osd 'profile rbd'
+
+The ``rbd-mirror`` daemon can be managed by ``systemd`` by specifying the user
+ID as the daemon instance::
+
+ systemctl enable ceph-rbd-mirror@rbd-mirror.{unique id}
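+
+Enabling the unit does not start it immediately; the daemon can then be started
+(or restarted) with the usual ``systemd`` commands::
+
+ systemctl start ceph-rbd-mirror@rbd-mirror.{unique id}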
+
+The ``rbd-mirror`` daemon can also be run in the foreground with the ``rbd-mirror`` command::
+
+ rbd-mirror -f --log-file={log_path}
+
+.. _rbd: ../../man/8/rbd
+.. _ceph-conf: ../../rados/configuration/ceph-conf/#running-multiple-clusters
+.. _explicitly enabled: #enable-image-mirroring
+.. _force resync command: #force-image-resync
+.. _demote the image: #image-promotion-and-demotion
+.. _create a Ceph user: ../../rados/operations/user-management#add-a-user
+
diff --git a/doc/rbd/rbd-openstack.rst b/doc/rbd/rbd-openstack.rst
new file mode 100644
index 00000000..a7c95078
--- /dev/null
+++ b/doc/rbd/rbd-openstack.rst
@@ -0,0 +1,514 @@
+=============================
+ Block Devices and OpenStack
+=============================
+
+.. index:: Ceph Block Device; OpenStack
+
+You may use Ceph Block Device images with OpenStack through ``libvirt``, which
+configures the QEMU interface to ``librbd``. Ceph stripes block device images as
+objects across the cluster, which means that large Ceph Block Device images have
+better performance than a standalone server!
+
+To use Ceph Block Devices with OpenStack, you must install QEMU, ``libvirt``,
+and OpenStack first. We recommend using a separate physical node for your
+OpenStack installation. OpenStack recommends a minimum of 8GB of RAM and a
+quad-core processor. The following diagram depicts the OpenStack/Ceph
+technology stack.
+
+
+.. ditaa::
+
+ +---------------------------------------------------+
+ | OpenStack |
+ +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. important:: To use Ceph Block Devices with OpenStack, you must have
+ access to a running Ceph Storage Cluster.
+
+Three parts of OpenStack integrate with Ceph's block devices:
+
+- **Images**: OpenStack Glance manages images for VMs. Images are immutable.
+ OpenStack treats images as binary blobs and downloads them accordingly.
+
+- **Volumes**: Volumes are block devices. OpenStack uses volumes to boot VMs,
+ or to attach volumes to running VMs. OpenStack manages volumes using
+ Cinder services.
+
+- **Guest Disks**: Guest disks are guest operating system disks. By default,
+ when you boot a virtual machine, its disk appears as a file on the filesystem
+ of the hypervisor (usually under ``/var/lib/nova/instances/<uuid>/``). Prior
+ to OpenStack Havana, the only way to boot a VM in Ceph was to use the
+ boot-from-volume functionality of Cinder. However, now it is possible to boot
+ every virtual machine inside Ceph directly without using Cinder, which is
+ advantageous because it allows you to perform maintenance operations easily
+ with the live-migration process. Additionally, if your hypervisor dies it is
+ also convenient to trigger ``nova evacuate`` and run the virtual machine
+ elsewhere almost seamlessly.
+
+You can use OpenStack Glance to store images in a Ceph Block Device, and you
+can use Cinder to boot a VM using a copy-on-write clone of an image.
+
+The instructions below detail the setup for Glance, Cinder and Nova, although
+they do not have to be used together. You may store images in Ceph block devices
+while running VMs using a local disk, or vice versa.
+
+.. important:: Ceph doesn’t support QCOW2 for hosting a virtual machine disk.
+ Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot
+ from volume), the Glance image format must be ``RAW``.
+
+.. tip:: This document describes using Ceph Block Devices with OpenStack Havana.
+ For earlier versions of OpenStack see
+ `Block Devices and OpenStack (Dumpling)`_.
+
+.. index:: pools; OpenStack
+
+Create a Pool
+=============
+
+By default, Ceph block devices use the ``rbd`` pool. You may use any available
+pool. We recommend creating a pool for Cinder and a pool for Glance. Ensure
+your Ceph cluster is running, then create the pools. ::
+
+ ceph osd pool create volumes 128
+ ceph osd pool create images 128
+ ceph osd pool create backups 128
+ ceph osd pool create vms 128
+
+See `Create a Pool`_ for detail on specifying the number of placement groups for
+your pools, and `Placement Groups`_ for details on the number of placement
+groups you should set for your pools.
+
+Newly created pools must be initialized prior to use. Use the ``rbd`` tool
+to initialize the pools::
+
+ rbd pool init volumes
+ rbd pool init images
+ rbd pool init backups
+ rbd pool init vms
+
+.. _Create a Pool: ../../rados/operations/pools#createpool
+.. _Placement Groups: ../../rados/operations/placement-groups
+
+
+Configure OpenStack Ceph Clients
+================================
+
+The nodes running ``glance-api``, ``cinder-volume``, ``nova-compute`` and
+``cinder-backup`` act as Ceph clients. Each requires the ``ceph.conf`` file::
+
+ ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf
+
+
+Install Ceph client packages
+----------------------------
+
+On the ``glance-api`` node, you will need the Python bindings for ``librbd``::
+
+ sudo apt-get install python-rbd
+ sudo yum install python-rbd
+
+On the ``nova-compute``, ``cinder-backup`` and ``cinder-volume`` nodes, use
+both the Python bindings and the client command line tools::
+
+ sudo apt-get install ceph-common
+ sudo yum install ceph-common
+
+
+Setup Ceph Client Authentication
+--------------------------------
+
+If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder
+and Glance. Execute the following::
+
+ ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
+ ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
+ ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'
+
+Add the keyrings for ``client.cinder``, ``client.glance``, and
+``client.cinder-backup`` to the appropriate nodes and change their ownership::
+
+ ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
+ ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
+ ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
+ ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
+ ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
+ ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring
+
+Nodes running ``nova-compute`` need the keyring file for the ``nova-compute``
+process::
+
+ ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
+
+They also need to store the secret key of the ``client.cinder`` user in
+``libvirt``. The libvirt process needs it to access the cluster while attaching
+a block device from Cinder.
+
+Create a temporary copy of the secret key on the nodes running
+``nova-compute``::
+
+ ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key
+
+Then, on the compute nodes, add the secret key to ``libvirt`` and remove the
+temporary copy of the key::
+
+ uuidgen
+ 457eb676-33da-42ec-9a8c-9293d545c337
+
+ cat > secret.xml <<EOF
+ <secret ephemeral='no' private='no'>
+ <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
+ <usage type='ceph'>
+ <name>client.cinder secret</name>
+ </usage>
+ </secret>
+ EOF
+ sudo virsh secret-define --file secret.xml
+ Secret 457eb676-33da-42ec-9a8c-9293d545c337 created
+ sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key) && rm client.cinder.key secret.xml
+
+Save the uuid of the secret for configuring ``nova-compute`` later.
+
+.. important:: You don't necessarily need the UUID on all the compute nodes.
+ However, from a platform consistency perspective, it's better to keep the
+ same UUID.
+
+.. _cephx authentication: ../../rados/configuration/auth-config-ref/#enabling-disabling-cephx
+
+
+Configure OpenStack to use Ceph
+===============================
+
+Configuring Glance
+------------------
+
+Glance can use multiple back ends to store images. To use Ceph block devices by
+default, configure Glance as follows.
+
+Prior to Juno
+~~~~~~~~~~~~~~
+
+Edit ``/etc/glance/glance-api.conf`` and add under the ``[DEFAULT]`` section::
+
+ default_store = rbd
+ rbd_store_user = glance
+ rbd_store_pool = images
+ rbd_store_chunk_size = 8
+
+
+Juno
+~~~~
+
+Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::
+
+ [DEFAULT]
+ ...
+ default_store = rbd
+ ...
+ [glance_store]
+ stores = rbd
+ rbd_store_pool = images
+ rbd_store_user = glance
+ rbd_store_ceph_conf = /etc/ceph/ceph.conf
+ rbd_store_chunk_size = 8
+
+.. important:: Glance has not completely moved to ``glance_store`` yet, so the
+ store still needs to be configured in the ``[DEFAULT]`` section until Kilo.
+
+Kilo and after
+~~~~~~~~~~~~~~
+
+Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::
+
+ [glance_store]
+ stores = rbd
+ default_store = rbd
+ rbd_store_pool = images
+ rbd_store_user = glance
+ rbd_store_ceph_conf = /etc/ceph/ceph.conf
+ rbd_store_chunk_size = 8
+
+For more information about the configuration options available in Glance please refer to the OpenStack Configuration Reference: http://docs.openstack.org/.
+
+Enable copy-on-write cloning of images
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Note that this exposes the back end location via Glance's API, so the endpoint
+with this option enabled should not be publicly accessible.
+
+Any OpenStack version except Mitaka
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you want to enable copy-on-write cloning of images, also add under the ``[DEFAULT]`` section::
+
+ show_image_direct_url = True
+
+For Mitaka only
+^^^^^^^^^^^^^^^
+
+To enable image locations and take advantage of copy-on-write cloning for images, add under the ``[DEFAULT]`` section::
+
+ show_multiple_locations = True
+ show_image_direct_url = True
+
+Disable cache management (any OpenStack version)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Disable the Glance cache management to avoid images getting cached under ``/var/lib/glance/image-cache/``,
+assuming your configuration file has ``flavor = keystone+cachemanagement``::
+
+ [paste_deploy]
+ flavor = keystone
+
+Image properties
+~~~~~~~~~~~~~~~~
+
+We recommend using the following properties for your images (a sketch showing
+how to set them follows this list):
+
+- ``hw_scsi_model=virtio-scsi``: add the virtio-scsi controller to get better performance and support for the discard operation
+- ``hw_disk_bus=scsi``: connect every Cinder block device to that controller
+- ``hw_qemu_guest_agent=yes``: enable the QEMU guest agent
+- ``os_require_quiesce=yes``: send fs-freeze/thaw calls through the QEMU guest agent
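+
+A sketch of setting these properties on an existing Glance image, assuming the
+``openstack`` command line client is available and ``{image-name}`` is a
+placeholder::
+
+ openstack image set --property hw_scsi_model=virtio-scsi --property hw_disk_bus=scsi \
+ --property hw_qemu_guest_agent=yes --property os_require_quiesce=yes {image-name}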
+
+
+Configuring Cinder
+------------------
+
+OpenStack requires a driver to interact with Ceph block devices. You must also
+specify the pool name for the block device. On your OpenStack node, edit
+``/etc/cinder/cinder.conf`` by adding::
+
+ [DEFAULT]
+ ...
+ enabled_backends = ceph
+ glance_api_version = 2
+ ...
+ [ceph]
+ volume_driver = cinder.volume.drivers.rbd.RBDDriver
+ volume_backend_name = ceph
+ rbd_pool = volumes
+ rbd_ceph_conf = /etc/ceph/ceph.conf
+ rbd_flatten_volume_from_snapshot = false
+ rbd_max_clone_depth = 5
+ rbd_store_chunk_size = 4
+ rados_connect_timeout = -1
+
+If you are using `cephx authentication`_, also configure the user and uuid of
+the secret you added to ``libvirt`` as documented earlier::
+
+ [ceph]
+ ...
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+
+Note that if you are configuring multiple cinder back ends,
+``glance_api_version = 2`` must be in the ``[DEFAULT]`` section.
+
+
+Configuring Cinder Backup
+-------------------------
+
+OpenStack Cinder Backup requires a specific daemon (``cinder-backup``), so don't forget to install it.
+On your Cinder Backup node, edit ``/etc/cinder/cinder.conf`` and add::
+
+ backup_driver = cinder.backup.drivers.ceph
+ backup_ceph_conf = /etc/ceph/ceph.conf
+ backup_ceph_user = cinder-backup
+ backup_ceph_chunk_size = 134217728
+ backup_ceph_pool = backups
+ backup_ceph_stripe_unit = 0
+ backup_ceph_stripe_count = 0
+ restore_discard_excess_bytes = true
+
+
+Configuring Nova to attach Ceph RBD block device
+------------------------------------------------
+
+In order to attach Cinder devices (either normal block or by issuing a boot
+from volume), you must tell Nova (and libvirt) which user and UUID to refer to
+when attaching the device. libvirt will refer to this user when connecting and
+authenticating with the Ceph cluster. ::
+
+ [libvirt]
+ ...
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+
+These two flags are also used by the Nova ephemeral backend.
+
+
+Configuring Nova
+----------------
+
+In order to boot all the virtual machines directly into Ceph, you must
+configure the ephemeral backend for Nova.
+
+It is recommended to enable the RBD cache in your Ceph configuration file (it
+is enabled by default since Giant). Moreover, enabling the admin socket brings
+a lot of benefits while troubleshooting. Having one socket per virtual machine
+that uses a Ceph block device helps when investigating performance and/or
+incorrect behavior.
+
+This socket can be accessed like this::
+
+ ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok help
+
+Now, on every compute node, edit your Ceph configuration file::
+
+ [client]
+ rbd cache = true
+ rbd cache writethrough until flush = true
+ admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
+ log file = /var/log/qemu/qemu-guest-$pid.log
+ rbd concurrent management ops = 20
+
+Configure the permissions of these paths::
+
+ mkdir -p /var/run/ceph/guests/ /var/log/qemu/
+ chown qemu:libvirtd /var/run/ceph/guests /var/log/qemu/
+
+Note that the user ``qemu`` and the group ``libvirtd`` can vary depending on your system.
+The provided example works for Red Hat based systems.
+
+.. tip:: If your virtual machine is already running, you can simply restart it to get the socket.
+
+
+Havana and Icehouse
+~~~~~~~~~~~~~~~~~~~
+
+Havana and Icehouse require patches to implement copy-on-write cloning and fix
+bugs with image size and live migration of ephemeral disks on rbd. These are
+available in branches based on upstream Nova `stable/havana`_ and
+`stable/icehouse`_. Using them is not mandatory but **highly recommended** in
+order to take advantage of the copy-on-write clone functionality.
+
+On every Compute node, edit ``/etc/nova/nova.conf`` and add::
+
+ libvirt_images_type = rbd
+ libvirt_images_rbd_pool = vms
+ libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf
+ disk_cachemodes="network=writeback"
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+
+It is also a good practice to disable file injection. While booting an
+instance, Nova usually attempts to open the rootfs of the virtual machine.
+Then, Nova injects values such as password, ssh keys etc. directly into the
+filesystem. However, it is better to rely on the metadata service and
+``cloud-init``.
+
+On every Compute node, edit ``/etc/nova/nova.conf`` and add::
+
+ libvirt_inject_password = false
+ libvirt_inject_key = false
+ libvirt_inject_partition = -2
+
+To ensure a proper live-migration, use the following flags::
+
+ libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
+
+Juno
+~~~~
+
+In Juno, the Ceph block device settings were moved under the ``[libvirt]`` section.
+On every Compute node, edit ``/etc/nova/nova.conf`` under the ``[libvirt]``
+section and add::
+
+ [libvirt]
+ images_type = rbd
+ images_rbd_pool = vms
+ images_rbd_ceph_conf = /etc/ceph/ceph.conf
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+ disk_cachemodes="network=writeback"
+
+
+It is also a good practice to disable file injection. While booting an
+instance, Nova usually attempts to open the rootfs of the virtual machine.
+Then, Nova injects values such as password, ssh keys etc. directly into the
+filesystem. However, it is better to rely on the metadata service and
+``cloud-init``.
+
+On every Compute node, edit ``/etc/nova/nova.conf`` and add the following
+under the ``[libvirt]`` section::
+
+ inject_password = false
+ inject_key = false
+ inject_partition = -2
+
+To ensure a proper live-migration, use the following flags (under the ``[libvirt]`` section)::
+
+ live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
+
+Kilo
+~~~~
+
+Enable discard support for virtual machine ephemeral root disk::
+
+ [libvirt]
+ ...
+ ...
+ hw_disk_discard = unmap # enable discard support (be careful of performance)
+
+
+Restart OpenStack
+=================
+
+To activate the Ceph block device driver and load the block device pool name
+into the configuration, you must restart OpenStack. Thus, for Debian based
+systems execute these commands on the appropriate nodes::
+
+ sudo glance-control api restart
+ sudo service nova-compute restart
+ sudo service cinder-volume restart
+ sudo service cinder-backup restart
+
+For Red Hat based systems execute::
+
+ sudo service openstack-glance-api restart
+ sudo service openstack-nova-compute restart
+ sudo service openstack-cinder-volume restart
+ sudo service openstack-cinder-backup restart
+
+Once OpenStack is up and running, you should be able to create a volume
+and boot from it.
+
+
+Booting from a Block Device
+===========================
+
+You can create a volume from an image using the Cinder command line tool::
+
+ cinder create --image-id {id of image} --display-name {name of volume} {size of volume}
+
+Note that the image must be in RAW format. You can use `qemu-img`_ to convert
+from one format to another. For example::
+
+ qemu-img convert -f {source-format} -O {output-format} {source-filename} {output-filename}
+ qemu-img convert -f qcow2 -O raw precise-cloudimg.img precise-cloudimg.raw
+
+When Glance and Cinder are both using Ceph block devices, the image is a
+copy-on-write clone, so it can create a new volume quickly. In the OpenStack
+dashboard, you can boot from that volume by performing the following steps:
+
+#. Launch a new instance.
+#. Choose the image associated to the copy-on-write clone.
+#. Select 'boot from volume'.
+#. Select the volume you created.
+
+.. _qemu-img: ../qemu-rbd/#running-qemu-with-rbd
+.. _Block Devices and OpenStack (Dumpling): http://docs.ceph.com/docs/dumpling/rbd/rbd-openstack
+.. _stable/havana: https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd
+.. _stable/icehouse: https://github.com/angdraug/nova/tree/rbd-ephemeral-clone-stable-icehouse
diff --git a/doc/rbd/rbd-replay.rst b/doc/rbd/rbd-replay.rst
new file mode 100644
index 00000000..e1c96b21
--- /dev/null
+++ b/doc/rbd/rbd-replay.rst
@@ -0,0 +1,42 @@
+===================
+ RBD Replay
+===================
+
+.. index:: Ceph Block Device; RBD Replay
+
+RBD Replay is a set of tools for capturing and replaying Rados Block Device
+(RBD) workloads. To capture an RBD workload, ``lttng-tools`` must be installed
+on the client, and ``librbd`` on the client must be the v0.87 (Giant) release
+or later. To replay an RBD workload, ``librbd`` on the client must be the Giant
+release or later.
+
+Capture and replay takes three steps:
+
+#. Capture the trace. Make sure to capture ``pthread_id`` context::
+
+ mkdir -p traces
+ lttng create -o traces librbd
+ lttng enable-event -u 'librbd:*'
+ lttng add-context -u -t pthread_id
+ lttng start
+ # run RBD workload here
+ lttng stop
+
+#. Process the trace with `rbd-replay-prep`_::
+
+ rbd-replay-prep traces/ust/uid/*/* replay.bin
+
+#. Replay the trace with `rbd-replay`_. Use read-only until you know
+ it's doing what you want::
+
+ rbd-replay --read-only replay.bin
+
+.. important:: ``rbd-replay`` will destroy data by default. Do not use against
+ an image you wish to keep, unless you use the ``--read-only`` option.
+
+The replayed workload does not have to be against the same RBD image or even the
+same cluster as the captured workload. To account for differences, you may need
+to use the ``--pool`` and ``--map-image`` options of ``rbd-replay``.
+
+.. _rbd-replay: ../../man/8/rbd-replay
+.. _rbd-replay-prep: ../../man/8/rbd-replay-prep
diff --git a/doc/rbd/rbd-snapshot.rst b/doc/rbd/rbd-snapshot.rst
new file mode 100644
index 00000000..47eba16a
--- /dev/null
+++ b/doc/rbd/rbd-snapshot.rst
@@ -0,0 +1,314 @@
+===========
+ Snapshots
+===========
+
+.. index:: Ceph Block Device; snapshots
+
+A snapshot is a read-only copy of the state of an image at a particular point in
+time. One of the advanced features of Ceph block devices is that you can create
+snapshots of the images to retain a history of an image's state. Ceph also
+supports snapshot layering, which allows you to clone images (e.g., a VM image)
+quickly and easily. Ceph supports block device snapshots using the ``rbd``
+command and many higher level interfaces, including `QEMU`_, `libvirt`_,
+`OpenStack`_ and `CloudStack`_.
+
+.. important:: To use RBD snapshots, you must have a running Ceph cluster.
+
+.. note:: Because RBD does not know about the filesystem, snapshots are
+ `crash-consistent` if they are not coordinated with the mounting
+ computer. So, we recommend you stop `I/O` before taking a snapshot of
+ an image. If the image contains a filesystem, the filesystem must be
+ in a consistent state before taking a snapshot or you may have to run
+ `fsck`. To stop `I/O` you can use the `fsfreeze` command. See
+ `fsfreeze(8)` man page for more details.
+ For virtual machines, `qemu-guest-agent` can be used to automatically
+ freeze filesystems when creating a snapshot.
+
+.. ditaa::
+
+ +------------+ +-------------+
+ | {s} | | {s} c999 |
+ | Active |<-------*| Snapshot |
+ | Image | | of Image |
+ | (stop i/o) | | (read only) |
+ +------------+ +-------------+
+
+
+Cephx Notes
+===========
+
+When `cephx`_ is enabled (it is by default), you must specify a user name or ID
+and a path to the keyring containing the corresponding key for the user. See
+:ref:`User Management <user-management>` for details. You may also add the ``CEPH_ARGS`` environment
+variable to avoid re-entry of the following parameters. ::
+
+ rbd --id {user-ID} --keyring=/path/to/secret [commands]
+ rbd --name {username} --keyring=/path/to/secret [commands]
+
+For example::
+
+ rbd --id admin --keyring=/etc/ceph/ceph.keyring [commands]
+ rbd --name client.admin --keyring=/etc/ceph/ceph.keyring [commands]
+
+.. tip:: Add the user and secret to the ``CEPH_ARGS`` environment
+ variable so that you don't need to enter them each time.
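+
+For example, a minimal sketch of setting ``CEPH_ARGS`` in the shell, using the
+illustrative user and keyring path from above::
+
+ export CEPH_ARGS="--id admin --keyring=/etc/ceph/ceph.keyring"
+ rbd ls   # subsequent rbd commands pick up the user and keyring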
+
+
+Snapshot Basics
+===============
+
+The following procedures demonstrate how to create, list, and remove
+snapshots using the ``rbd`` command on the command line.
+
+Create Snapshot
+---------------
+
+To create a snapshot with ``rbd``, specify the ``snap create`` option, the pool
+name and the image name. ::
+
+ rbd snap create {pool-name}/{image-name}@{snap-name}
+
+For example::
+
+ rbd snap create rbd/foo@snapname
+
+
+List Snapshots
+--------------
+
+To list snapshots of an image, specify the pool name and the image name. ::
+
+ rbd snap ls {pool-name}/{image-name}
+
+For example::
+
+ rbd snap ls rbd/foo
+
+
+Rollback Snapshot
+-----------------
+
+To rollback to a snapshot with ``rbd``, specify the ``snap rollback`` option, the
+pool name, the image name and the snap name. ::
+
+ rbd snap rollback {pool-name}/{image-name}@{snap-name}
+
+For example::
+
+ rbd snap rollback rbd/foo@snapname
+
+
+.. note:: Rolling back an image to a snapshot means overwriting
+ the current version of the image with data from a snapshot. The
+ time it takes to execute a rollback increases with the size of the
+ image. It is **faster to clone** from a snapshot **than to rollback**
+ an image to a snapshot, and it is the preferred method of returning
+ to a pre-existing state.
+
+
+Delete a Snapshot
+-----------------
+
+To delete a snapshot with ``rbd``, specify the ``snap rm`` option, the pool
+name, the image name and the snap name. ::
+
+ rbd snap rm {pool-name}/{image-name}@{snap-name}
+
+For example::
+
+ rbd snap rm rbd/foo@snapname
+
+
+.. note:: Ceph OSDs delete data asynchronously, so deleting a snapshot
+ doesn't free up the disk space immediately.
+
+Purge Snapshots
+---------------
+
+To delete all snapshots for an image with ``rbd``, specify the ``snap purge``
+option and the image name. ::
+
+ rbd snap purge {pool-name}/{image-name}
+
+For example::
+
+ rbd snap purge rbd/foo
+
+
+.. index:: Ceph Block Device; snapshot layering
+
+Layering
+========
+
+Ceph supports the ability to create many copy-on-write (COW) clones of a block
+device snapshot. Snapshot layering enables Ceph block device clients to create
+images very quickly. For example, you might create a block device image with a
+Linux VM written to it; then, snapshot the image, protect the snapshot, and
+create as many copy-on-write clones as you like. A snapshot is read-only,
+so cloning a snapshot simplifies semantics--making it possible to create
+clones rapidly.
+
+
+.. ditaa::
+
+ +-------------+ +-------------+
+ | {s} c999 | | {s} |
+ | Snapshot | Child refers | COW Clone |
+ | of Image |<------------*| of Snapshot |
+ | | to Parent | |
+ | (read only) | | (writable) |
+ +-------------+ +-------------+
+
+ Parent Child
+
+.. note:: The terms "parent" and "child" mean a Ceph block device snapshot (parent),
+ and the corresponding image cloned from the snapshot (child). These terms are
+ important for the command line usage below.
+
+Each cloned image (child) stores a reference to its parent image, which enables
+the cloned image to open the parent snapshot and read it.
+
+A COW clone of a snapshot behaves exactly like any other Ceph block device
+image. You can read from, write to, clone, and resize cloned images. There are
+no special restrictions with cloned images. However, the copy-on-write clone of
+a snapshot refers to the snapshot, so you **MUST** protect the snapshot before
+you clone it. The following diagram depicts the process.
+
+.. note:: Ceph only supports cloning for format 2 images (i.e., created with
+ ``rbd create --image-format 2``). The kernel client supports cloned images
+ since kernel 3.10.
+
+Getting Started with Layering
+-----------------------------
+
+Ceph block device layering is a simple process. You must have an image. You must
+create a snapshot of the image. You must protect the snapshot. Once you have
+performed these steps, you can begin cloning the snapshot.
+
+.. ditaa::
+
+ +----------------------------+ +-----------------------------+
+ | | | |
+ | Create Block Device Image |------->| Create a Snapshot |
+ | | | |
+ +----------------------------+ +-----------------------------+
+ |
+ +--------------------------------------+
+ |
+ v
+ +----------------------------+ +-----------------------------+
+ | | | |
+ | Protect the Snapshot |------->| Clone the Snapshot |
+ | | | |
+ +----------------------------+ +-----------------------------+
+
+
+The cloned image has a reference to the parent snapshot, and includes the pool
+ID, image ID and snapshot ID. The inclusion of the pool ID means that you may
+clone snapshots from one pool to images in another pool.
+
+
+#. **Image Template:** A common use case for block device layering is to create a
+ master image and a snapshot that serves as a template for clones. For example,
+ a user may create an image for a Linux distribution (e.g., Ubuntu 12.04), and
+ create a snapshot for it. Periodically, the user may update the image and create
+ a new snapshot (e.g., ``sudo apt-get update``, ``sudo apt-get upgrade``,
+ ``sudo apt-get dist-upgrade`` followed by ``rbd snap create``). As the image
+ matures, the user can clone any one of the snapshots.
+
+#. **Extended Template:** A more advanced use case includes extending a template
+ image that provides more information than a base image. For example, a user may
+ clone an image (e.g., a VM template) and install other software (e.g., a database,
+ a content management system, an analytics system, etc.) and then snapshot the
+ extended image, which itself may be updated just like the base image.
+
+#. **Template Pool:** One way to use block device layering is to create a
+ pool that contains master images that act as templates, and snapshots of those
+ templates. You may then extend read-only privileges to users so that they
+ may clone the snapshots without the ability to write or execute within the pool.
+
+#. **Image Migration/Recovery:** One way to use block device layering is to migrate
+ or recover data from one pool into another pool.
+
+Protecting a Snapshot
+---------------------
+
+Clones access the parent snapshots. All clones would break if a user inadvertently
+deleted the parent snapshot. To prevent data loss, you **MUST** protect the
+snapshot before you can clone it. ::
+
+ rbd snap protect {pool-name}/{image-name}@{snapshot-name}
+
+For example::
+
+ rbd snap protect rbd/my-image@my-snapshot
+
+.. note:: You cannot delete a protected snapshot.
+
+Cloning a Snapshot
+------------------
+
+To clone a snapshot, you need to specify the parent pool, image and snapshot,
+as well as the child pool and image name. You must protect the snapshot
+before you can clone it. ::
+
+ rbd clone {pool-name}/{parent-image}@{snap-name} {pool-name}/{child-image-name}
+
+For example::
+
+ rbd clone rbd/my-image@my-snapshot rbd/new-image
+
+.. note:: You may clone a snapshot from one pool to an image in another pool. For example,
+ you may maintain read-only images and snapshots as templates in one pool, and writeable
+ clones in another pool.
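+
+To confirm the relationship, inspecting the child image with ``rbd info``
+should show a reference to its parent snapshot (a sketch using the example
+names above)::
+
+ rbd info rbd/new-image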
+
+Unprotecting a Snapshot
+-----------------------
+
+Before you can delete a snapshot, you must unprotect it. Additionally, you
+may *NOT* delete snapshots that are referenced by clones. You must flatten
+each clone of a snapshot before you can delete the snapshot. ::
+
+ rbd snap unprotect {pool-name}/{image-name}@{snapshot-name}
+
+For example::
+
+ rbd snap unprotect rbd/my-image@my-snapshot
+
+
+Listing Children of a Snapshot
+------------------------------
+
+To list the children of a snapshot, execute the following::
+
+ rbd children {pool-name}/{image-name}@{snapshot-name}
+
+For example::
+
+ rbd children rbd/my-image@my-snapshot
+
+
+Flattening a Cloned Image
+-------------------------
+
+Cloned images retain a reference to the parent snapshot. When you remove the
+reference from the child clone to the parent snapshot, you effectively "flatten"
+the image by copying the information from the snapshot to the clone. The time
+it takes to flatten a clone increases with the size of the snapshot. To delete
+a snapshot, you must flatten the child images first. ::
+
+ rbd flatten {pool-name}/{image-name}
+
+For example::
+
+ rbd flatten rbd/new-image
+
+.. note:: Since a flattened image contains all the information from the snapshot,
+ a flattened image will take up more storage space than a layered clone.
+
+
+.. _cephx: ../../rados/configuration/auth-config-ref/
+.. _QEMU: ../qemu-rbd/
+.. _OpenStack: ../rbd-openstack/
+.. _CloudStack: ../rbd-cloudstack/
+.. _libvirt: ../libvirt/