24 files changed, 1733 insertions, 0 deletions
diff --git a/doc/ceph-volume/drive-group.rst b/doc/ceph-volume/drive-group.rst
new file mode 100644
index 000000000..f9d1cf3c3
--- /dev/null
+++ b/doc/ceph-volume/drive-group.rst
@@ -0,0 +1,12 @@
+.. _ceph-volume-drive-group:
+
+``drive-group``
+===============
+The ``drive-group`` subcommand allows passing :ref:`drivegroups` specifications
+directly to ``ceph-volume`` as JSON. ``ceph-volume`` then attempts to deploy
+these drive groups by using the ``batch`` subcommand.
+
+The specification can be passed in a file, as a string argument, or on stdin.
+See the subcommand help for further details::
+
+    # ceph-volume drive-group --help
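+
+The three input channels can be pictured with a small Python sketch. The
+helper ``load_drive_group_spec`` and its parameters are hypothetical. They only
+illustrate reading a JSON specification from a file, a string, or stdin, and
+they are neither ``ceph-volume``'s implementation nor its real option names
+(see the ``--help`` output for those). The sample spec keys are also
+assumptions, not a validated schema.
+
```python
import io
import json
import sys
from typing import Optional, TextIO


def load_drive_group_spec(path: Optional[str] = None,
                          text: Optional[str] = None,
                          stream: Optional[TextIO] = None) -> dict:
    """Read a JSON drive group spec from a file, a string, or stdin.

    Hypothetical helper. The precedence (file, then string, then stdin)
    is an assumption made for this sketch.
    """
    if path is not None:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    if text is not None:
        return json.loads(text)
    # No file or string given: read from stdin (or an injected stream).
    return json.load(stream if stream is not None else sys.stdin)


if __name__ == "__main__":
    # Illustrative spec only; these keys are not a validated schema.
    spec_text = '{"service_id": "example", "data_devices": {"paths": ["/dev/sdb"]}}'
    print(load_drive_group_spec(text=spec_text)["service_id"])
    print(load_drive_group_spec(stream=io.StringIO(spec_text)))
```
+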
diff --git a/doc/ceph-volume/index.rst b/doc/ceph-volume/index.rst
new file mode 100644
index 000000000..9271bc2a0
--- /dev/null
+++ b/doc/ceph-volume/index.rst
@@ -0,0 +1,87 @@
+.. _ceph-volume:
+
+ceph-volume
+===========
+Deploy OSDs with different device technologies, such as LVM or physical disks,
+using pluggable tools (:doc:`lvm/index` itself is treated like a plugin) and
+following a predictable and robust way of preparing, activating, and starting
+OSDs.
+
+:ref:`Overview <ceph-volume-overview>` |
+:ref:`Plugin Guide <ceph-volume-plugins>`
+
+
+**Command Line Subcommands**
+
+There is currently support for ``lvm`` and for plain disks (with GPT
+partitions) that may have been deployed with ``ceph-disk``.
+
+``zfs`` support is available for running a FreeBSD cluster.
+
+* :ref:`ceph-volume-lvm`
+* :ref:`ceph-volume-simple`
+* :ref:`ceph-volume-zfs`
+
+**Node inventory**
+
+The :ref:`ceph-volume-inventory` subcommand provides information and metadata
+about a node's physical disk inventory.
+
+
+Migrating
+---------
+Starting with Ceph version 13.0.0, ``ceph-disk`` is deprecated. Deprecation
+warnings will show up that link to this page. Users are strongly encouraged to
+start using ``ceph-volume``. There are two paths for migrating:
+
+#. Keep OSDs deployed with ``ceph-disk``: The :ref:`ceph-volume-simple` command
+   provides a way to take over the management while disabling ``ceph-disk``
+   triggers.
+#. Redeploy existing OSDs with ``ceph-volume``: This is covered in depth in
+   :ref:`rados-replacing-an-osd`.
+
+For details on why ``ceph-disk`` was removed, see the :ref:`Why was
+ceph-disk replaced? <ceph-disk-replaced>` section.
+
+
+New deployments
+^^^^^^^^^^^^^^^
+For new deployments, :ref:`ceph-volume-lvm` is recommended. It can use any
+logical volume as input for data OSDs, or it can set up a minimal/naive logical
+volume from a device.
+
+Existing OSDs
+^^^^^^^^^^^^^
+If the cluster has OSDs that were provisioned with ``ceph-disk``, then
+``ceph-volume`` can take over the management of these with
+:ref:`ceph-volume-simple`. A scan is done on the data device or OSD directory,
+and ``ceph-disk`` is fully disabled. Encryption is fully supported.
+
+
+.. toctree::
+   :hidden:
+   :maxdepth: 3
+   :caption: Contents:
+
+   intro
+   systemd
+   inventory
+   drive-group
+   lvm/index
+   lvm/activate
+   lvm/batch
+   lvm/encryption
+   lvm/prepare
+   lvm/create
+   lvm/scan
+   lvm/systemd
+   lvm/list
+   lvm/zap
+   lvm/migrate
+   lvm/newdb
+   lvm/newwal
+   simple/index
+   simple/activate
+   simple/scan
+   simple/systemd
+   zfs/index
+   zfs/inventory
diff --git a/doc/ceph-volume/intro.rst b/doc/ceph-volume/intro.rst
new file mode 100644
index 000000000..c36f12a77
--- /dev/null
+++ b/doc/ceph-volume/intro.rst
@@ -0,0 +1,84 @@
+.. _ceph-volume-overview:
+
+Overview
+--------
+The ``ceph-volume`` tool aims to be a single-purpose command-line tool to deploy
+logical volumes as OSDs, trying to maintain a similar API to ``ceph-disk`` when
+preparing, activating, and creating OSDs.
+
+It deviates from ``ceph-disk`` by not interacting with or relying on the udev
+rules that are installed with Ceph. These rules allow automatic detection of
+previously set up devices, which are in turn fed into ``ceph-disk`` to activate
+them.
+
+.. _ceph-disk-replaced:
+
+Replacing ``ceph-disk``
+-----------------------
+The ``ceph-disk`` tool was created at a time when the project was required to
+support many different types of init systems (upstart, sysvinit, etc.) while
+being able to discover devices. This caused the tool to concentrate initially
+(and exclusively afterwards) on GPT partitions, specifically on GPT GUIDs,
+which were used to label devices in a unique way to answer questions like:
+
+* is this device a Journal?
+* an encrypted data partition?
+* was the device left partially prepared?
+
+To answer these questions, it used ``UDEV`` rules to match the GUIDs. Those
+rules would call ``ceph-disk``, ending up in a back-and-forth between the
+``ceph-disk`` systemd unit and the ``ceph-disk`` executable. The process was
+very unreliable and time-consuming (a timeout of close to three hours
+**per OSD** had to be put in place), and would cause OSDs not to come up at
+all during the boot process of a node.
+
+It was hard to debug, or even reproduce, these problems given the asynchronous
+behavior of ``UDEV``.
+
+Because ``ceph-disk`` was built around GPT partitions exclusively, it could not
+work with other technologies like LVM, or similar device mapper devices. It was
+ultimately decided to create something modular, starting with LVM support, and
+the ability to expand on other technologies as needed.
+
+
+GPT partitions are simple?
+--------------------------
+Although partitions in general are simple to reason about, ``ceph-disk``
+partitions were not simple by any means. They required a large number of
+special flags in order to work correctly with the device discovery workflow.
+Here is an example call to create a data partition::
+
+    /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:f0fc39fd-eeb2-49f1-b922-a11939cf8a0f --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb
+
+Not only were these partitions hard to create, they also required devices to be
+exclusively owned by Ceph. For example, in some cases a special partition would
+be created when devices were encrypted, which would contain unencrypted keys.
+This was ``ceph-disk`` domain knowledge, which would not translate to a "GPT
+partitions are simple" understanding. Here is an example of that special
+partition being created::
+
+    /sbin/sgdisk --new=5:0:+10M --change-name=5:ceph lockbox --partition-guid=5:None --typecode=5:fb3aabf9-d25f-47cc-bf5e-721d181642be --mbrtogpt -- /dev/sdad
+
+
+Modularity
+----------
+``ceph-volume`` was designed to be a modular tool because we anticipate that
+there are going to be lots of ways that people provision the hardware devices
+that we need to consider. There are already two: legacy ``ceph-disk`` devices
+that are still in use and have GPT partitions (handled by
+:ref:`ceph-volume-simple`), and LVM. SPDK devices, where NVMe devices are
+managed directly from userspace, are on the immediate horizon; LVM won't work
+there because the kernel isn't involved at all.
+
+``ceph-volume lvm``
+-------------------
+By making use of :term:`LVM tags`, the :ref:`ceph-volume-lvm` sub-command is
+able to store, re-discover, and query devices associated with OSDs so that
+they can later be activated.
+
+LVM performance penalty
+-----------------------
+In short: we haven't noticed any significant performance penalties associated
+with the change to LVM. Because ``ceph-volume`` works closely with LVM, the
+ability to work with other device mapper technologies comes naturally: there is
+no technical difficulty in working with anything that can sit below a Logical
+Volume.
diff --git a/doc/ceph-volume/inventory.rst b/doc/ceph-volume/inventory.rst
new file mode 100644
index 000000000..edb1fd205
--- /dev/null
+++ b/doc/ceph-volume/inventory.rst
@@ -0,0 +1,17 @@
+.. _ceph-volume-inventory:
+
+``inventory``
+=============
+The ``inventory`` subcommand queries a host's disk inventory and provides
+hardware information and metadata on every physical device.
+
+By default the command returns a short, human-readable report of all physical disks.
+
+For programmatic consumption of this report, pass ``--format json`` to generate
+a JSON formatted report. This report includes extensive information on the
+physical drives, such as disk metadata (like model and size), logical volumes
+and whether they are used by Ceph, and whether the disk is usable by Ceph and,
+if not, the reasons why.
+
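+A script could consume the JSON report roughly as sketched below. The field
+names ``path``, ``available`` and ``rejected_reasons`` are assumptions about the
+report's schema that may differ between releases, so verify them against real
+``ceph-volume inventory --format json`` output. The sample data is made up, and
+the function names are hypothetical.
+
```python
import json
from typing import Dict, List

# Made-up sample shaped like an inventory report (field names are assumptions).
SAMPLE_REPORT = """
[
  {"path": "/dev/sdb", "available": true, "rejected_reasons": []},
  {"path": "/dev/sda", "available": false,
   "rejected_reasons": ["locked", "Has partitions"]}
]
"""


def usable_devices(report_json: str) -> List[str]:
    """Return the paths of devices the report marks as available."""
    return [dev["path"] for dev in json.loads(report_json) if dev.get("available")]


def rejection_reasons(report_json: str) -> Dict[str, List[str]]:
    """Map each unavailable device path to the reasons it was rejected."""
    return {
        dev["path"]: dev.get("rejected_reasons", [])
        for dev in json.loads(report_json)
        if not dev.get("available")
    }


if __name__ == "__main__":
    print(usable_devices(SAMPLE_REPORT))  # ['/dev/sdb']
    print(rejection_reasons(SAMPLE_REPORT))
```
+
+In practice the JSON string would come from running the command itself, for
+example through ``subprocess.run(..., capture_output=True, text=True)``.
+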
+A device path can be specified to report extensive information on a single
+device, in both plain and JSON formats.
diff --git a/doc/ceph-volume/lvm/activate.rst b/doc/ceph-volume/lvm/activate.rst
new file mode 100644
index 000000000..fe34ecb71
--- /dev/null
+++ b/doc/ceph-volume/lvm/activate.rst
@@ -0,0 +1,112 @@
+.. _ceph-volume-lvm-activate:
+
+``activate``
+============
+
+After :ref:`ceph-volume-lvm-prepare` has completed its run, the volume can be
+activated.
+
+Activating the volume involves enabling a ``systemd`` unit that persists the
+``OSD ID`` and its ``UUID`` (which is also called the ``fsid`` in the Ceph CLI
+tools). After this information has been persisted, the cluster can determine
+which OSD is enabled and must be mounted.
+
+.. note:: The execution of this call is fully idempotent. This means that the
+   call can be executed multiple times without changing the result of its first
+   successful execution.
+
+For information about OSDs deployed by cephadm, refer to
+:ref:`cephadm-osd-activate`.
+
+New OSDs
+--------
+To activate newly prepared OSDs, both the :term:`OSD id` and :term:`OSD uuid`
+need to be supplied. For example::
+
+    ceph-volume lvm activate --bluestore 0 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8
+
+.. note:: The UUID is stored in the ``fsid`` file in the OSD path, which is
+          generated when :ref:`ceph-volume-lvm-prepare` is used.
+
+Activating all OSDs
+-------------------
+
+.. note:: For OSDs deployed by cephadm, please refer to :ref:`cephadm-osd-activate`
+          instead.
+
+It is possible to activate all existing OSDs at once by using the ``--all``
+flag. For example::
+
+    ceph-volume lvm activate --all
+
+This call will inspect all the OSDs created by ceph-volume that are inactive
+and will activate them one by one. If any of the OSDs are already running, it
+will report them in the command output and skip them, making it safe to rerun
+(idempotent).
+
+requiring uuids
+^^^^^^^^^^^^^^^
+The :term:`OSD uuid` is required as an extra step to ensure that the right OSD
+is being activated. A previous OSD with the same id may exist, and without the
+uuid the incorrect one could end up being activated.
+
+
+dmcrypt
+^^^^^^^
+If the OSD was prepared with dmcrypt by ceph-volume, there is no need to
+specify ``--dmcrypt`` on the command line again (that flag is not available for
+the ``activate`` subcommand). An encrypted OSD will be automatically detected.
+
+
+Discovery
+---------
+With OSDs previously created by ``ceph-volume``, a *discovery* process is
+performed using :term:`LVM tags` to enable the systemd units.
+
+The systemd unit will capture the :term:`OSD id` and :term:`OSD uuid` and
+persist them. Internally, the activation will enable it like::
+
+    systemctl enable ceph-volume@lvm-$id-$uuid
+
+For example::
+
+    systemctl enable ceph-volume@lvm-0-8715BEB4-15C5-49DE-BA6F-401086EC7B41
+
+This would start the discovery process for the OSD with an id of ``0`` and a
+UUID of ``8715BEB4-15C5-49DE-BA6F-401086EC7B41``.
+
+.. note:: For more details on the systemd workflow, see :ref:`ceph-volume-lvm-systemd`.
+
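+The instance name after ``@`` encodes the device type, the OSD id, and the OSD
+uuid. The uuid itself contains hyphens, so only the first two hyphens separate
+fields. The Python sketch below builds and parses such names. The function
+names are hypothetical, and this is not ``ceph-volume``'s own parser.
+
```python
from typing import NamedTuple


class UnitInstance(NamedTuple):
    device_type: str  # e.g. "lvm"
    osd_id: str
    osd_uuid: str


PREFIX = "ceph-volume@"
SUFFIX = ".service"


def unit_name(osd_id: int, osd_uuid: str, device_type: str = "lvm") -> str:
    """Build a unit name following the ceph-volume@<type>-<id>-<uuid> pattern."""
    return f"{PREFIX}{device_type}-{osd_id}-{osd_uuid}"


def parse_unit_name(name: str) -> UnitInstance:
    """Split a unit name into its type, OSD id and OSD uuid."""
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    if not name.startswith(PREFIX):
        raise ValueError(f"not a ceph-volume unit: {name!r}")
    # The uuid contains hyphens itself, so split on the first two only.
    device_type, osd_id, osd_uuid = name[len(PREFIX):].split("-", 2)
    return UnitInstance(device_type, osd_id, osd_uuid)


if __name__ == "__main__":
    name = unit_name(0, "8715BEB4-15C5-49DE-BA6F-401086EC7B41")
    print(name)  # ceph-volume@lvm-0-8715BEB4-15C5-49DE-BA6F-401086EC7B41
    print(parse_unit_name(name).osd_id)  # 0
```
+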
+The systemd unit will look for the matching OSD device, and by looking at its
+:term:`LVM tags` will proceed to:
+
+#. Mount the device in the corresponding location (by convention this is
+   ``/var/lib/ceph/osd/<cluster name>-<osd id>/``).
+
+#. Ensure that all required devices are ready for that OSD.
+
+#. Start the ``ceph-osd@0`` systemd unit.
+
+.. note:: The system infers the objectstore type by
+          inspecting the LVM tags applied to the OSD devices
+
+Existing OSDs
+-------------
+Existing OSDs that have been deployed with ``ceph-disk`` need to be scanned and
+activated :ref:`using the simple sub-command <ceph-volume-simple>`. If a
+different tool was used, the only way to port them over to the new mechanism is
+to prepare them again (losing data). See :ref:`ceph-volume-lvm-existing-osds`
+for details on how to proceed.
+
+Summary
+-------
+To recap the ``activate`` process for :term:`bluestore`:
+
+#. Require both :term:`OSD id` and :term:`OSD uuid`
+#. Enable the systemd unit with matching id and uuid
+#. Create the ``tmpfs`` mount at the OSD directory in
+   ``/var/lib/ceph/osd/$cluster-$id/``
+#. Recreate all the files needed with ``ceph-bluestore-tool prime-osd-dir`` by
+   pointing it to the OSD ``block`` device.
+#. The systemd unit will ensure all devices are ready and linked
+#. The matching ``ceph-osd`` systemd unit will get started
diff --git a/doc/ceph-volume/lvm/batch.rst b/doc/ceph-volume/lvm/batch.rst
new file mode 100644
index 000000000..2114518bf
--- /dev/null
+++ b/doc/ceph-volume/lvm/batch.rst
@@ -0,0 +1,179 @@
+.. _ceph-volume-lvm-batch:
+
+``batch``
+===========
+This subcommand allows creating multiple OSDs at the same time, given an input
+of devices. The ``batch`` subcommand is closely related to drive-groups. One
+individual drive group specification translates to a single ``batch``
+invocation.
+
+The subcommand is based on :ref:`ceph-volume-lvm-create`, and uses the very
+same code path. All ``batch`` does is calculate the appropriate sizes of all
+volumes and skip over already created volumes.
+
+All the features that ``ceph-volume lvm create`` supports, such as ``dmcrypt``,
+preventing ``systemd`` units from starting, and defining bluestore, are
+supported.
+
+
+.. _ceph-volume-lvm-batch_auto:
+
+Automatic sorting of disks
+--------------------------
+If ``batch`` receives only a single list of data devices, and no separate device
+lists for ``block.db``, ``block.wal`` or ``journal`` are passed, ``ceph-volume``
+will auto-sort disks by their rotational property and use non-rotating disks
+for ``block.db`` or ``journal``, depending on the objectstore used. If all
+devices are to be used for standalone OSDs, no matter whether rotating or solid
+state, pass ``--no-auto``.
+For example, assuming :term:`bluestore` is used and ``--no-auto`` is not passed,
+the deprecated behavior would deploy the following, depending on the devices
+passed:
+
+#. Devices are all spinning HDDs: 1 OSD is created per device
+#. Devices are all SSDs: 2 OSDs are created per device
+#. Devices are a mix of HDDs and SSDs: data is placed on the spinning device,
+   the ``block.db`` is created on the SSD, as large as possible.
+
+.. note:: Although operations in ``ceph-volume lvm create`` allow usage of
+          ``block.wal`` it isn't supported with the ``auto`` behavior.
+
+This default auto-sorting behavior is now DEPRECATED and will be changed in
+future releases. Instead, devices will not be automatically sorted unless the
+``--auto`` option is passed.
+
+It is recommended to make use of the explicit device lists for ``block.db``,
+``block.wal`` and ``journal``.
+
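+The deprecated auto-sorting rules can be summarized in a short Python sketch.
+It is a simplification for illustration, not ``ceph-volume``'s code. The
+rotational flag is read from the Linux sysfs attribute
+``/sys/block/<name>/queue/rotational`` (``1`` for spinning disks, ``0`` for
+solid state), and the layout function takes a plain mapping so it can be tried
+without real hardware. The function names are hypothetical.
+
```python
from pathlib import Path
from typing import Dict, List, Tuple


def is_rotational(dev_name: str, sysfs: str = "/sys/block") -> bool:
    """Read the kernel's rotational flag for a whole-disk device like 'sdb'."""
    flag = Path(sysfs, dev_name, "queue", "rotational").read_text().strip()
    return flag == "1"


def auto_layout(rotational: Dict[str, bool]) -> Tuple[List[str], List[str], int]:
    """Return (data devices, block.db devices, OSDs per data device).

    Mirrors the deprecated bluestore behavior described above:
    all HDDs -> 1 OSD per device; all SSDs -> 2 OSDs per device;
    mixed -> data on the HDDs, block.db on the SSDs.
    """
    hdds = sorted(d for d, rot in rotational.items() if rot)
    ssds = sorted(d for d, rot in rotational.items() if not rot)
    if hdds and ssds:
        return hdds, ssds, 1
    if ssds:
        return ssds, [], 2
    return hdds, [], 1


if __name__ == "__main__":
    print(auto_layout({"/dev/sdb": True, "/dev/sdc": True, "/dev/nvme0n1": False}))
```
+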
+.. _ceph-volume-lvm-batch_bluestore:
+
+Reporting
+=========
+By default ``batch`` will print a report of the computed OSD layout and ask the
+user to confirm. This can be overridden by passing ``--yes``.
+
+To try out several invocations without being asked to deploy, pass
+``--report``. ``ceph-volume`` will exit after printing the report.
+
+Consider the following invocation::
+
+    $ ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
+
+Without ``--report``, this would deploy three OSDs with external ``db`` and
+``wal`` volumes on an NVMe device.
+
+Pretty reporting
+----------------
+
+The ``pretty`` report format (the default) would
+look like this::
+
+    $ ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
+    --> passed data devices: 3 physical, 0 LVM
+    --> relative data size: 1.0
+    --> passed block_db devices: 1 physical, 0 LVM
+
+    Total OSDs: 3
+
+      Type            Path                                                    LV Size         % of device
+    ----------------------------------------------------------------------------------------------------
+      data            /dev/sdb                                              300.00 GB         100.00%
+      block_db        /dev/nvme0n1                                           66.67 GB         33.33%
+    ----------------------------------------------------------------------------------------------------
+      data            /dev/sdc                                              300.00 GB         100.00%
+      block_db        /dev/nvme0n1                                           66.67 GB         33.33%
+    ----------------------------------------------------------------------------------------------------
+      data            /dev/sdd                                              300.00 GB         100.00%
+      block_db        /dev/nvme0n1                                           66.67 GB         33.33%
+
+
+JSON reporting
+--------------
+
+Reporting can produce a structured output with ``--format json`` or
+``--format json-pretty``::
+
+    $ ceph-volume lvm batch --report --format json-pretty /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
+    --> passed data devices: 3 physical, 0 LVM
+    --> relative data size: 1.0
+    --> passed block_db devices: 1 physical, 0 LVM
+    [
+        {
+            "block_db": "/dev/nvme0n1",
+            "block_db_size": "66.67 GB",
+            "data": "/dev/sdb",
+            "data_size": "300.00 GB",
+            "encryption": "None"
+        },
+        {
+            "block_db": "/dev/nvme0n1",
+            "block_db_size": "66.67 GB",
+            "data": "/dev/sdc",
+            "data_size": "300.00 GB",
+            "encryption": "None"
+        },
+        {
+            "block_db": "/dev/nvme0n1",
+            "block_db_size": "66.67 GB",
+            "data": "/dev/sdd",
+            "data_size": "300.00 GB",
+            "encryption": "None"
+        }
+    ]
+
+Sizing
+======
+When no sizing arguments are passed, ``ceph-volume`` will derive the sizing from
+the passed device lists (or the sorted lists when using the automatic sorting).
+``ceph-volume batch`` will attempt to fully utilize a device's available capacity.
+Relying on automatic sizing is recommended.
+
+If one requires a different sizing policy for wal, db or journal devices,
+``ceph-volume`` offers implicit and explicit sizing rules.
+
+Implicit sizing
+---------------
+In scenarios where devices are under-committed, or where not all data devices
+are currently ready for use (due to a broken disk, for example), one can still
+rely on ``ceph-volume`` automatic sizing.
+Users can provide hints to ``ceph-volume`` as to how many data devices should
+have their external volumes on a set of fast devices. These options are:
+
+* ``--block-db-slots``
+* ``--block-wal-slots``
+* ``--journal-slots``
+
+For example, consider an OSD host that is supposed to contain 5 data devices and
+one device for wal/db volumes. However, one data device is currently broken and
+is being replaced. Instead of calculating the explicit sizes for the wal/db
+volume, one can simply call::
+
+    $ ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd /dev/sde --db-devices /dev/nvme0n1 --block-db-slots 5
+
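+The slot arithmetic can be illustrated with a rough sketch that divides one
+fast device evenly. Assuming a 200 GB NVMe device (consistent with the 33.33%
+per volume in the report shown earlier), three data devices without a hint get
+66.67 GB each. With ``--block-db-slots 5``, each volume would be 40 GB, leaving
+a slot for the missing fifth data device. The helper name is hypothetical, and
+the sketch ignores LVM metadata overhead and extent rounding, which real sizing
+has to account for.
+
```python
GB = 10 ** 9


def slot_size(fast_device_bytes: int, data_devices: int, slots: int = 0) -> int:
    """Size of each external volume carved from one fast device.

    Without a slot hint, the device is split among the data devices passed.
    With a hint (like --block-db-slots), it is split into that many slots,
    leaving unused slots free for data devices added later.
    Treating fewer slots than data devices as an error is an assumption.
    """
    if data_devices < 1:
        raise ValueError("need at least one data device")
    n = slots if slots else data_devices
    if n < data_devices:
        raise ValueError("fewer slots than data devices")
    return fast_device_bytes // n


if __name__ == "__main__":
    nvme = 200 * GB  # assumed capacity
    print(round(slot_size(nvme, 3) / GB, 2))   # 66.67
    print(slot_size(nvme, 4, slots=5) / GB)    # 40.0
```
+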
+Explicit sizing
+---------------
+It is also possible to provide explicit sizes to ``ceph-volume`` via the arguments:
+
+* ``--block-db-size``
+* ``--block-wal-size``
+* ``--journal-size``
+
+``ceph-volume`` will try to satisfy the requested sizes given the passed disks. If
+this is not possible, no OSDs will be deployed.
+
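+The all-or-nothing rule can be sketched as a capacity check. Each fast device
+is expanded into as many whole slots of the requested size as it can hold, and
+if there are fewer slots than data devices, nothing is planned. This is a
+hypothetical simplification that ignores LVM overhead. ``ceph-volume``'s actual
+placement logic may differ, and the 200 GB capacity is an assumption.
+
```python
from typing import Dict, List, Optional

GB = 10 ** 9


def plan_explicit(data_devices: List[str], fast_devices: Dict[str, int],
                  volume_size: int) -> Optional[Dict[str, str]]:
    """Assign each data device a fast device for its volume, or return None.

    All-or-nothing: if not every requested volume fits, nothing is planned.
    Ignores LVM metadata overhead and extent rounding.
    """
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    # Expand each fast device into the whole slots it can hold, capped at
    # the number of data devices (more could never be used).
    slots = [
        dev
        for dev, capacity in sorted(fast_devices.items())
        for _ in range(min(capacity // volume_size, len(data_devices)))
    ]
    if len(slots) < len(data_devices):
        return None  # request cannot be satisfied: deploy no OSDs
    return dict(zip(data_devices, slots))


if __name__ == "__main__":
    data = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]
    nvme = {"/dev/nvme0n1": 200 * GB}  # assumed capacity
    print(plan_explicit(data, nvme, 60 * GB))  # all three fit
    print(plan_explicit(data, nvme, 70 * GB))  # None: only two fit
```
+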
+
+Idempotency and disk replacements
+=================================
+``ceph-volume lvm batch`` intends to be idempotent, that is, calling the same
+command repeatedly must result in the same outcome. For example, calling::
+
+    $ ceph-volume lvm batch /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
+
+will result in three deployed OSDs (if all disks were available). Calling this
+command again still results in three OSDs, and ``ceph-volume`` exits with
+return code 0.
+
+Suppose ``/dev/sdc`` goes bad and needs to be replaced. After destroying the OSD
+and replacing the hardware, you can call the same command again, and
+``ceph-volume`` will detect that only two out of the three wanted OSDs are set
+up and re-create the missing OSD.
+
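+The idempotent behavior can be pictured as a set difference between the data
+devices that were asked for and those that already back an OSD. This sketch is
+illustrative only. How ``ceph-volume`` actually detects existing OSDs (for
+example through :term:`LVM tags`) is not modeled, and the function name is
+hypothetical.
+
```python
from typing import Iterable, List, Set


def devices_to_deploy(requested: Iterable[str], in_use: Set[str]) -> List[str]:
    """Return requested data devices that do not yet back an OSD.

    Once every requested device is in use the result is empty, which is
    what makes repeating the same batch call a no-op.
    """
    return [dev for dev in requested if dev not in in_use]


if __name__ == "__main__":
    wanted = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]
    print(devices_to_deploy(wanted, set()))                     # all three
    print(devices_to_deploy(wanted, set(wanted)))               # []
    # /dev/sdc was replaced: only it needs a new OSD.
    print(devices_to_deploy(wanted, {"/dev/sdb", "/dev/sdd"}))  # ['/dev/sdc']
```
+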
+This idempotency notion is tightly coupled to and extensively used by :ref:`drivegroups`.
diff --git a/doc/ceph-volume/lvm/create.rst b/doc/ceph-volume/lvm/create.rst
new file mode 100644
index 000000000..17fe9fa5a
--- /dev/null
+++ b/doc/ceph-volume/lvm/create.rst
@@ -0,0 +1,23 @@
+.. _ceph-volume-lvm-create:
+
+``create``
+===========
+This subcommand wraps the two-step process to provision a new OSD (calling
+``prepare`` first and then ``activate``) into a single one. The reason to
+prefer ``prepare`` and then ``activate`` as separate steps is to gradually
+introduce new OSDs into a cluster, avoiding large amounts of data being
+rebalanced.
+
+The single-call process unifies exactly what :ref:`ceph-volume-lvm-prepare` and
+:ref:`ceph-volume-lvm-activate` do, with the convenience of doing it all at
+once.
+
+The process is otherwise no different, except that the OSD will become up and
+in immediately after completion.
+
+The backing objectstore can be specified with:
+
+* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
+
+All command line flags and options are the same as ``ceph-volume lvm prepare``.
+Please refer to :ref:`ceph-volume-lvm-prepare` for details.
diff --git a/doc/ceph-volume/lvm/encryption.rst b/doc/ceph-volume/lvm/encryption.rst
new file mode 100644
index 000000000..4564a7ffe
--- /dev/null
+++ b/doc/ceph-volume/lvm/encryption.rst
@@ -0,0 +1,84 @@
+.. _ceph-volume-lvm-encryption:
+
+Encryption
+==========
+
+Logical volumes can be encrypted using ``dmcrypt`` by specifying the
+``--dmcrypt`` flag when creating OSDs. When using LVM, logical volumes can be
+encrypted in different ways. ``ceph-volume`` does not offer as many options as
+LVM does, but it encrypts logical volumes in a way that is consistent and
+robust.
+
+In this case, ``ceph-volume lvm`` follows this constraint:
+
+* Non-LVM devices (such as partitions) are encrypted with the same OSD key.
+
+
+LUKS
+----
+There are currently two versions of LUKS, 1 and 2. Version 2 is a bit easier to
+implement but is not widely available in all Linux distributions supported by
+Ceph.
+
+.. note:: Version 1 of LUKS is referred to in this documentation as "LUKS".
+   Version 2 of LUKS is referred to in this documentation as "LUKS2".
+
+
+LUKS on LVM
+-----------
+Encryption is done on top of existing logical volumes (this is not the same as
+encrypting the physical device). Any single logical volume can be encrypted,
+leaving other volumes unencrypted. This method also allows for flexible logical
+volume setups, since encryption will happen once the LV is created.
+
+
+Workflow
+--------
+When setting up the OSD, a secret key is created. That secret key is passed
+to the monitor in JSON format through ``stdin`` to prevent the key from being
+captured in the logs.
+
+The JSON payload looks something like this::
+
+        {
+            "cephx_secret": CEPHX_SECRET,
+            "dmcrypt_key": DMCRYPT_KEY,
+            "cephx_lockbox_secret": LOCKBOX_SECRET
+        }
+
+The naming convention for the keys is **strict**: the keys are named after the
+hardcoded (legacy) names used by ``ceph-disk``.
+
+* ``cephx_secret`` : The cephx key used to authenticate
+* ``dmcrypt_key`` : The secret (or private) key to unlock encrypted devices
+* ``cephx_lockbox_secret`` : The authentication key used to retrieve the
+  ``dmcrypt_key``. It is named *lockbox* because ceph-disk used to have an
+  unencrypted partition named after it, which was used to store public keys and
+  other OSD metadata.
+
+The naming convention is strict because Monitors supported the naming
+convention of ``ceph-disk``, which used these key names. In order to maintain
+compatibility and prevent ``ceph-disk`` from breaking, ``ceph-volume`` uses the
+same naming convention *although it does not make sense for the new encryption
+workflow*.
+
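+Passing secrets on stdin matters because command-line arguments are visible to
+other processes (for example through ``ps`` or ``/proc/<pid>/cmdline``) and are
+commonly recorded when a tool logs the commands it runs. Data written to a
+child's stdin is not exposed this way. The Python sketch below serializes the
+three keys under the strict names shown above and hands them to a child process
+on stdin. The key values are placeholders, ``cat`` stands in for the real
+consumer, and this is not the exact command ``ceph-volume`` runs.
+
```python
import json
import subprocess
from typing import List


def secrets_payload(cephx_secret: str, dmcrypt_key: str, lockbox_secret: str) -> str:
    """Serialize the keys using the strict, legacy ceph-disk key names."""
    return json.dumps({
        "cephx_secret": cephx_secret,
        "dmcrypt_key": dmcrypt_key,
        "cephx_lockbox_secret": lockbox_secret,
    })


def send_on_stdin(cmd: List[str], payload: str) -> str:
    """Run cmd with the payload on stdin so it never appears in argv."""
    result = subprocess.run(cmd, input=payload, capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    payload = secrets_payload("CEPHX_SECRET", "DMCRYPT_KEY", "LOCKBOX_SECRET")
    # 'cat' is a stand-in consumer that echoes stdin back.
    echoed = send_on_stdin(["cat"], payload)
    print(json.loads(echoed)["dmcrypt_key"])  # DMCRYPT_KEY
```
+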
+After the common steps of setting up the OSD during the "prepare stage" (with
+:term:`bluestore`), the logical volume is left ready to be activated,
+regardless of the state of the device (encrypted or decrypted).
+
+At the time of its activation, the logical volume is decrypted. The OSD starts
+after the process completes correctly.
+
+Summary of the encryption workflow for creating a new OSD
+----------------------------------------------------------
+
+#. The OSD is created. Both lockbox and dmcrypt keys are created and sent to
+   the monitors in JSON format, indicating an encrypted OSD.
+
+#. All complementary devices (like journal, db, or wal) get created and
+   encrypted with the same OSD key. The key is stored in the LVM metadata of
+   the OSD.
+
+#. Activation continues by ensuring devices are mounted, retrieving the dmcrypt
+   secret key from the monitors, and decrypting before the OSD gets started.
diff --git a/doc/ceph-volume/lvm/index.rst b/doc/ceph-volume/lvm/index.rst
new file mode 100644
index 000000000..962e51a51
--- /dev/null
+++ b/doc/ceph-volume/lvm/index.rst
@@ -0,0 +1,34 @@
+.. _ceph-volume-lvm:
+
+``lvm``
+=======
+Implements the functionality needed to deploy OSDs from the ``lvm`` subcommand:
+``ceph-volume lvm``
+
+**Command Line Subcommands**
+
+* :ref:`ceph-volume-lvm-prepare`
+
+* :ref:`ceph-volume-lvm-activate`
+
+* :ref:`ceph-volume-lvm-create`
+
+* :ref:`ceph-volume-lvm-list`
+
+* :ref:`ceph-volume-lvm-migrate`
+
+* :ref:`ceph-volume-lvm-newdb`
+
+* :ref:`ceph-volume-lvm-newwal`
+
+.. not yet implemented
+.. * :ref:`ceph-volume-lvm-scan`
+
+**Internal functionality**
+
+There are other aspects of the ``lvm`` subcommand that are internal and not
+exposed to the user. These sections explain how these pieces work together,
+clarifying the workflows of the tool.
+
+:ref:`Systemd Units <ceph-volume-lvm-systemd>` |
+:ref:`lvm <ceph-volume-lvm-api>`
diff --git a/doc/ceph-volume/lvm/list.rst b/doc/ceph-volume/lvm/list.rst
new file mode 100644
index 000000000..718154b10
--- /dev/null
+++ b/doc/ceph-volume/lvm/list.rst
@@ -0,0 +1,184 @@
+.. _ceph-volume-lvm-list:
+
+``list``
+========
+This subcommand will list any devices (logical and physical) that may be
+associated with a Ceph cluster, as long as they contain enough metadata to
+allow for that discovery.
+
+Output is grouped by the OSD ID associated with the devices, and unlike
+``ceph-disk`` it does not provide any information for devices that aren't
+associated with Ceph.
+
+Command line options:
+
+* ``--format`` Allows a ``json`` or ``pretty`` value. Defaults to ``pretty``
+  which will group the device information in a human-readable format.
+
+Full Reporting
+--------------
+When no positional arguments are used, a full report is presented. This
+means that all devices and logical volumes found in the system will be
+displayed.
+
+Full ``pretty`` reporting for two OSDs, one with an LV as a journal and another
+with a physical device as a journal, may look similar to::
+
+    # ceph-volume lvm list
+
+
+    ====== osd.1 =======
+
+      [journal]    /dev/journals/journal1
+
+          journal uuid              C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+          osd id                    1
+          cluster fsid              ce454d91-d748-4751-a318-ff7f7aa18ffd
+          type                      journal
+          osd fsid                  661b24f8-e062-482b-8110-826ffe7f13fa
+          data uuid                 SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+          journal device            /dev/journals/journal1
+          data device               /dev/test_group/data-lv2
+          devices                   /dev/sda
+
+      [data]    /dev/test_group/data-lv2
+
+          journal uuid              C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+          osd id                    1
+          cluster fsid              ce454d91-d748-4751-a318-ff7f7aa18ffd
+          type                      data
+          osd fsid                  661b24f8-e062-482b-8110-826ffe7f13fa
+          data uuid                 SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+          journal device            /dev/journals/journal1
+          data device               /dev/test_group/data-lv2
+          devices                   /dev/sdb
+
+    ====== osd.0 =======
+
+      [data]    /dev/test_group/data-lv1
+
+          journal uuid              cd72bd28-002a-48da-bdf6-d5b993e84f3f
+          osd id                    0
+          cluster fsid              ce454d91-d748-4751-a318-ff7f7aa18ffd
+          type                      data
+          osd fsid                  943949f0-ce37-47ca-a33c-3413d46ee9ec
+          data uuid                 TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00
+          journal device            /dev/sdd1
+          data device               /dev/test_group/data-lv1
+          devices                   /dev/sdc
+
+      [journal]    /dev/sdd1
+
+          PARTUUID                  cd72bd28-002a-48da-bdf6-d5b993e84f3f
+
+
+For logical volumes, the ``devices`` key is populated with the physical devices
+associated with the logical volume. Since LVM allows multiple physical devices
+to be part of a logical volume, the value will be comma-separated when using
+``pretty``, but an array when using ``json``.
+
+.. note:: Tags are displayed in a readable format. The ``osd id`` key is stored
+          as a ``ceph.osd_id`` tag. For more information on lvm tag conventions
+          see :ref:`ceph-volume-lvm-tag-api`
+
+Single Reporting
+----------------
+Single reporting can consume both devices and logical volumes as input
+(positional parameters). For logical volumes, it is required to use the group
+name as well as the logical volume name.
+
+For example, the ``data-lv2`` logical volume in the ``test_group`` volume group
+can be listed in the following way::
+
+    # ceph-volume lvm list test_group/data-lv2
+
+
+    ====== osd.1 =======
+
+      [data]    /dev/test_group/data-lv2
+
+          journal uuid              C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+          osd id                    1
+          cluster fsid              ce454d91-d748-4751-a318-ff7f7aa18ffd
+          type                      data
+          osd fsid                  661b24f8-e062-482b-8110-826ffe7f13fa
+          data uuid                 SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+          journal device            /dev/journals/journal1
+          data device               /dev/test_group/data-lv2
+          devices                   /dev/sdc
+
+
+
+For plain disks, the full path to the device is required. For example, for
+a device like ``/dev/sdd1``, the output can look like::
+
+    # ceph-volume lvm list /dev/sdd1
+
+
+    ====== osd.0 =======
+
+      [journal]    /dev/sdd1
+
+          PARTUUID                  cd72bd28-002a-48da-bdf6-d5b993e84f3f
+
+
+
+``json`` output
+---------------
+All output using ``--format=json`` will show everything the system has stored
+as metadata for the devices, including tags.
+
+No readability changes are made with ``json`` reporting, and all
+information is presented as-is. Full output as well as single devices can be
+listed.
+
+For brevity, this is how a single logical volume would look with ``json``
+output (note how tags aren't modified)::
+
+    # ceph-volume lvm list --format=json test_group/data-lv1
+    {
+        "0": [
+            {
+                "devices": ["/dev/sda"],
+                "lv_name": "data-lv1",
+                "lv_path": "/dev/test_group/data-lv1",
+                "lv_tags": "ceph.cluster_fsid=ce454d91-d748-4751-a318-ff7f7aa18ffd,ceph.data_device=/dev/test_group/data-lv1,ceph.data_uuid=TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00,ceph.journal_device=/dev/sdd1,ceph.journal_uuid=cd72bd28-002a-48da-bdf6-d5b993e84f3f,ceph.osd_fsid=943949f0-ce37-47ca-a33c-3413d46ee9ec,ceph.osd_id=0,ceph.type=data",
+                "lv_uuid": "TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00",
+                "name": "data-lv1",
+                "path": "/dev/test_group/data-lv1",
+                "tags": {
+                    "ceph.cluster_fsid": "ce454d91-d748-4751-a318-ff7f7aa18ffd",
+                    "ceph.data_device": "/dev/test_group/data-lv1",
+                    "ceph.data_uuid": "TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00",
+                    "ceph.journal_device": "/dev/sdd1",
+                    "ceph.journal_uuid": "cd72bd28-002a-48da-bdf6-d5b993e84f3f",
+                    "ceph.osd_fsid": "943949f0-ce37-47ca-a33c-3413d46ee9ec",
+                    "ceph.osd_id": "0",
+                    "ceph.type": "data"
+                },
+                "type": "data",
+                "vg_name": "test_group"
+            }
+        ]
+    }
+
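+The ``tags`` object above carries the same data as the comma-separated
+``lv_tags`` string. A minimal sketch of that split, assuming ``key=value``
+pairs whose values never contain commas (``parse_lv_tags`` is a hypothetical
+helper):
+
```python
def parse_lv_tags(lv_tags: str) -> dict:
    """Split an lvs-style tag string ("k1=v1,k2=v2") into a dict."""
    tags = {}
    if not lv_tags:
        return tags
    for item in lv_tags.split(","):
        # partition() splits on the first '=' only, so values may contain '='
        key, _, value = item.partition("=")
        tags[key] = value
    return tags


if __name__ == "__main__":
    raw = "ceph.osd_id=0,ceph.type=data,ceph.journal_device=/dev/sdd1"
    print(parse_lv_tags(raw))
```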
+
+Synchronized information
+------------------------
+Before any type of listing, the LVM API is queried to ensure that physical
+devices that may be in use have not been renamed. Non-persistent device names
+such as ``/dev/sda1`` can change (for example, to ``/dev/sdb1``).
+
+The detection is possible because the ``PARTUUID`` is stored as part of the
+metadata in the logical volume for the data lv. Even in the case of a journal
+that is a physical device, this information is still stored on the data logical
+volume associated with it.
+
+If the name is no longer the same (as reported by ``blkid`` when using the
+``PARTUUID``), the tag will get updated and the report will use the newly
+refreshed information.
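+
+A sketch of this refresh for a partition journal, assuming the
+``ceph.journal_uuid`` tag holds the partition's ``PARTUUID`` (as in the JSON
+example above) and that ``blkid -t PARTUUID=<uuid> -o device`` reports the
+current device name. The function names are hypothetical, and the resolver is
+injectable so the logic can be exercised without real devices:
+
```python
import subprocess


def blkid_device_for_partuuid(partuuid: str) -> str:
    """Ask blkid which device currently carries this PARTUUID."""
    result = subprocess.run(
        ["blkid", "-t", f"PARTUUID={partuuid}", "-o", "device"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def refresh_journal_device(tags: dict, resolve=blkid_device_for_partuuid) -> dict:
    """Return a copy of the tags with ceph.journal_device updated if it was renamed."""
    refreshed = dict(tags)
    partuuid = tags.get("ceph.journal_uuid")
    if partuuid:
        current = resolve(partuuid)
        if current and current != tags.get("ceph.journal_device"):
            refreshed["ceph.journal_device"] = current
    return refreshed
```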
diff --git a/doc/ceph-volume/lvm/migrate.rst b/doc/ceph-volume/lvm/migrate.rst
new file mode 100644
index 000000000..983d2e797
--- /dev/null
+++ b/doc/ceph-volume/lvm/migrate.rst
@@ -0,0 +1,47 @@
+.. _ceph-volume-lvm-migrate:
+
+``migrate``
+===========
+
+Moves BlueFS data from one or more source volumes to the target volume. On
+success, the source volumes (except the main volume, i.e. the data or block
+volume) are removed.
+
+Only LVM volumes are permitted as the target, whether they are already attached
+to the OSD or new.
+
+In the latter case, the new volume is attached to the OSD, replacing one of the
+source devices.
+
+The following replacement rules apply (in order of precedence; stop at the
+first match):
+
+    - if the source list has a DB volume, the target device replaces it.
+    - if the source list has a WAL volume, the target device replaces it.
+    - if the source list has only the slow (main) volume, the operation is not
+      permitted and requires explicit allocation via the ``new-db`` or
+      ``new-wal`` command.
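+
+The precedence rules above can be expressed as a short Python sketch.
+``pick_replaced_volume`` is a hypothetical helper that takes the ``--from``
+values (``data``, ``db``, ``wal``) and reports which volume a *new* target LV
+would replace; it mirrors the listed rules only, not ``ceph-volume``'s own
+implementation:
+
```python
def pick_replaced_volume(sources):
    """Return which source volume ("db" or "wal") a *new* target LV replaces.

    `sources` holds the --from values, e.g. ["data", "db", "wal"].
    """
    if "db" in sources:       # rule 1: a DB volume in the source list is replaced
        return "db"
    if "wal" in sources:      # rule 2: otherwise a WAL volume is replaced
        return "wal"
    # rule 3: only the slow (main) volume was given
    raise ValueError("not permitted: allocate a new volume with new-db/new-wal instead")


if __name__ == "__main__":
    print(pick_replaced_volume(["data", "db", "wal"]))  # db
    print(pick_replaced_volume(["data", "wal"]))        # wal
```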
+
+Moves BlueFS data from main device to LV already attached as DB::
+
+    ceph-volume lvm migrate --osd-id 1 --osd-fsid <uuid> --from data --target vgname/db
+
+Moves BlueFS data from shared main device to LV which will be attached as a
+new DB::
+
+    ceph-volume lvm migrate --osd-id 1 --osd-fsid <uuid> --from data --target vgname/new_db
+
+Moves BlueFS data from DB device to new LV, DB is replaced::
+
+    ceph-volume lvm migrate --osd-id 1 --osd-fsid <uuid> --from db --target vgname/new_db
+
+Moves BlueFS data from main and DB devices to new LV, DB is replaced::
+
+    ceph-volume lvm migrate --osd-id 1 --osd-fsid <uuid> --from data db --target vgname/new_db
+
+Moves BlueFS data from main, DB and WAL devices to new LV, WAL is removed and
+DB is replaced::
+
+    ceph-volume lvm migrate --osd-id 1 --osd-fsid <uuid> --from data db wal --target vgname/new_db
+
+Moves BlueFS data from DB and WAL devices to main device, WAL and DB are
+removed::
+
+    ceph-volume lvm migrate --osd-id 1 --osd-fsid <uuid> --from db wal --target vgname/data
diff --git a/doc/ceph-volume/lvm/newdb.rst b/doc/ceph-volume/lvm/newdb.rst
new file mode 100644
index 000000000..dcc87fc8a
--- /dev/null
+++ b/doc/ceph-volume/lvm/newdb.rst
@@ -0,0 +1,11 @@
+.. _ceph-volume-lvm-newdb:
+
+``new-db``
+===========
+
+Attaches the given logical volume to the OSD as a DB.
+Logical volume name format is vg/lv. Fails if the OSD already has a DB attached.
+
+Attach vgname/lvname as a DB volume to OSD 1::
+
+    ceph-volume lvm new-db --osd-id 1 --osd-fsid 55BD4219-16A7-4037-BC20-0F158EFCC83D --target vgname/new_db
diff --git a/doc/ceph-volume/lvm/newwal.rst b/doc/ceph-volume/lvm/newwal.rst
new file mode 100644
index 000000000..05f87fff6
--- /dev/null
+++ b/doc/ceph-volume/lvm/newwal.rst
@@ -0,0 +1,11 @@
+.. _ceph-volume-lvm-newwal:
+
+``new-wal``
+===========
+
+Attaches the given logical volume to the given OSD as a WAL volume.
+Logical volume format is vg/lv. Fails if the OSD already has a WAL attached.
+
+Attach vgname/lvname as a WAL volume to OSD 1::
+
+    ceph-volume lvm new-wal --osd-id 1 --osd-fsid 55BD4219-16A7-4037-BC20-0F158EFCC83D --target vgname/new_wal
diff --git a/doc/ceph-volume/lvm/prepare.rst b/doc/ceph-volume/lvm/prepare.rst
new file mode 100644
index 000000000..2faf12a4e
--- /dev/null
+++ b/doc/ceph-volume/lvm/prepare.rst
@@ -0,0 +1,332 @@
+.. _ceph-volume-lvm-prepare:
+
+``prepare``
+===========
+Before you run ``ceph-volume lvm prepare``, we recommend that you provision a
+logical volume. Then you can run ``prepare`` on that logical volume.
+
+``prepare`` adds metadata to logical volumes but does not alter them in any
+other way.
+
+.. note:: This is part of a two-step process to deploy an OSD. If you prefer
+   to deploy an OSD by using only one command, see :ref:`ceph-volume-lvm-create`.
+
+``prepare`` uses :term:`LVM tags` to assign several pieces of metadata to a
+logical volume. Volumes tagged in this way are easier to identify and easier to
+use with Ceph. :term:`LVM tags` identify logical volumes by the role that they
+play in the Ceph cluster (for example: BlueStore data or BlueStore WAL+DB).
+
+:term:`BlueStore<bluestore>` is the default backend. Ceph permits changing
+the backend, which can be done by using the following flags and arguments:
+
+* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
+
+.. _ceph-volume-lvm-prepare_bluestore:
+
+``bluestore``
+-------------
+:term:`BlueStore<bluestore>` is the default backend for new OSDs. BlueStore
+supports the following configurations:
+
+* a block device, a block.wal device, and a block.db device
+* a block device and a block.wal device
+* a block device and a block.db device
+* a single block device
+
+With ``--bluestore``, the various device parameters accept physical block
+devices, partitions on physical block devices, or logical volumes. If a
+physical block device is provided, a logical volume will be created. If the
+provided volume group's name begins with ``ceph``, it will be created if it
+does not yet exist, and it will be clobbered and reused if it already exists.
+This allows for a simpler approach to using LVM, but at the cost of
+flexibility: no option or configuration can be used to change how the logical
+volume is created.
+
+The ``block`` is specified with the ``--data`` flag, and in its simplest use
+case it looks like:
+
+.. prompt:: bash #
+
+    ceph-volume lvm prepare --bluestore --data vg/lv
+
+A raw device can be specified in the same way:
+
+.. prompt:: bash #
+
+    ceph-volume lvm prepare --bluestore --data /path/to/device
+
+For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required:
+
+.. prompt:: bash #
+
+    ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv
+
+If a ``block.db`` device or a ``block.wal`` device is needed, it can be
+specified with ``--block.db`` or ``--block.wal``. These can be physical
+devices, partitions, or logical volumes. ``block.db`` and ``block.wal`` are
+optional for bluestore.
+
+For both ``block.db`` and ``block.wal``, partitions can be used as-is and
+therefore are not made into logical volumes.
+
+While creating the OSD directory, the process uses a ``tmpfs`` mount to hold
+the files needed for the OSD. These files are created by ``ceph-osd --mkfs``
+and are ephemeral.
+
+A symlink is created for the ``block`` device, and is optional for ``block.db``
+and ``block.wal``. For a cluster with a default name and an OSD ID of 0, the
+directory looks like this::
+
+    # ls -l /var/lib/ceph/osd/ceph-0
+    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
+    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
+    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
+    -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
+    -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
+    -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
+    -rw-------. 1 ceph ceph  6 Oct 20 13:05 ready
+    -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
+    -rw-------. 1 ceph ceph  2 Oct 20 13:05 whoami
+
+In the above case, a device was used for ``block``, so ``ceph-volume`` created
+a volume group and a logical volume using the following conventions:
+
+* volume group name: ``ceph-{cluster fsid}`` (or if the volume group already
+  exists: ``ceph-{random uuid}``)
+
+* logical volume name: ``osd-block-{osd_fsid}``
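+
+A minimal sketch of these naming conventions, assuming the caller already
+knows which volume groups exist (``block_vg_lv_names`` is a hypothetical
+helper):
+
```python
import uuid


def block_vg_lv_names(cluster_fsid: str, osd_fsid: str, existing_vgs) -> tuple:
    """Return (vg_name, lv_name) for a raw device passed as the block device."""
    vg_name = f"ceph-{cluster_fsid}"
    if vg_name in existing_vgs:
        # Documented fallback when that volume group already exists.
        vg_name = f"ceph-{uuid.uuid4()}"
    return vg_name, f"osd-block-{osd_fsid}"


if __name__ == "__main__":
    print(block_vg_lv_names("ce454d91-d748-4751-a318-ff7f7aa18ffd",
                            "25cf0a05-2bc6-44ef-9137-79d65bd7ad62", set()))
```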
+
+
+.. _ceph-volume-lvm-prepare_filestore:
+
+``filestore``
+-------------
+.. warning:: Filestore has been deprecated in the Reef release and is no longer supported.
+
+With ``--filestore``, ``prepare`` sets up logical volumes for a
+Filestore-backed object-store OSD.
+
+Filestore uses a logical volume to store OSD data, and it uses physical
+devices, partitions, or logical volumes to store the journal. If a physical
+device is used to create a Filestore backend, a logical volume will be created
+on that physical device. If the provided volume group's name begins with
+``ceph``, it will be created if it does not yet exist, and it will be clobbered
+and reused if it already exists. No special preparation is needed for these
+volumes, but be sure to meet the minimum size requirements for OSD data and
+for the journal.
+
+Use the following command to create a basic filestore OSD:
+
+.. prompt:: bash #
+
+   ceph-volume lvm prepare --filestore --data <data block device>
+
+Use this command to deploy filestore with an external journal:
+
+.. prompt:: bash #
+
+   ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>
+
+To enable :ref:`encryption <ceph-volume-lvm-encryption>`, use this command (the ``--dmcrypt`` flag is required):
+
+.. prompt:: bash #
+
+   ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>
+
+The data block device and the journal can each take one of three forms:
+
+* a physical block device
+* a partition on a physical block device
+* a logical volume
+
+If you use a logical volume to deploy filestore, the value that you pass in the
+command *must* be of the format ``volume_group/logical_volume_name``. Because
+logical volume names need only be unique within a volume group, this format is
+an important safeguard against accidentally choosing the wrong volume (and
+clobbering its data).
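+
+As an illustration of that safeguard, a hypothetical helper that accepts only
+the ``volume_group/logical_volume_name`` form and rejects bare LV names and
+device paths might look like this (a sketch, not ``ceph-volume``'s own argument
+parser):
+
```python
def split_vg_lv(arg: str) -> tuple:
    """Split 'volume_group/logical_volume_name' into (vg, lv) or raise ValueError."""
    vg, sep, lv = arg.partition("/")
    # Reject bare LV names ("data-lv"), device paths ("/dev/sdc"), and extra levels.
    if not sep or not vg or not lv or "/" in lv:
        raise ValueError(f"expected volume_group/logical_volume_name, got {arg!r}")
    return vg, lv


if __name__ == "__main__":
    print(split_vg_lv("volume_group/journal_lv"))
```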
+
+If you use a partition to deploy filestore, the partition *must* contain a
+``PARTUUID`` that can be discovered by ``blkid``. This ensures that the
+partition can be identified correctly regardless of the device's name (or path).
+
+For example, to use a logical volume for OSD data and a partition
+(``/dev/sdc1``) for the journal, run a command of this form:
+
+.. prompt:: bash #
+
+   ceph-volume lvm prepare --filestore --data volume_group/logical_volume_name --journal /dev/sdc1
+
+Or, to use a bare device for data and a logical volume for the journal:
+
+.. prompt:: bash #
+
+   ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv
+
+A generated UUID is used when asking the cluster for a new OSD. These two
+pieces of information (the OSD ID and the OSD UUID) are necessary for
+identifying a given OSD and will later be used throughout the
+:ref:`activation<ceph-volume-lvm-activate>` process.
+
+The OSD data directory is created using the following convention::
+
+    /var/lib/ceph/osd/<cluster name>-<osd id>
+
+To link the journal volume to the mounted data volume, use this command:
+
+.. prompt:: bash #
+
+   ln -s /path/to/journal /var/lib/ceph/osd/<cluster_name>-<osd-id>/journal
+
+To fetch the monmap by using the OSD bootstrap key, use this command:
+
+.. prompt:: bash #
+
+   /usr/bin/ceph --cluster ceph --name client.bootstrap-osd \
+       --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring \
+       mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap
+
+To populate the OSD directory (which has already been mounted), use this ``ceph-osd`` command:
+
+.. prompt:: bash #
+
+   ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
+       --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap \
+       --osd-data /var/lib/ceph/osd/<cluster name>-<osd id> \
+       --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
+       --osd-uuid <osd uuid> \
+       --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
+       --setuser ceph --setgroup ceph
+
+All of the information from the previous steps is used in the above command.
+
+
+
+.. _ceph-volume-lvm-partitions:
+
+Partitioning
+------------
+``ceph-volume lvm`` does not currently create partitions from a whole device.
+If you use device partitions, the only requirement is that they contain a
+``PARTUUID`` that is discoverable by ``blkid``. Both ``fdisk`` and ``parted``
+create it automatically for a new partition.
+
+For example, using a new, unformatted drive (``/dev/sdd`` in this case), we can
+use ``parted`` to create a new partition. First, we list the device
+information::
+
+    $ parted --script /dev/sdd print
+    Model: VBOX HARDDISK (scsi)
+    Disk /dev/sdd: 11.5GB
+    Sector size (logical/physical): 512B/512B
+    Disk Flags:
+
+This device is not even labeled yet, so we can use ``parted`` to create
+a ``gpt`` label before we create a partition, and verify again with ``parted
+print``::
+
+    $ parted --script /dev/sdd mklabel gpt
+    $ parted --script /dev/sdd print
+    Model: VBOX HARDDISK (scsi)
+    Disk /dev/sdd: 11.5GB
+    Sector size (logical/physical): 512B/512B
+    Partition Table: gpt
+    Disk Flags:
+
+Now let's create a single partition and verify that ``blkid`` can find the
+``PARTUUID`` that ``ceph-volume`` needs::
+
+    $ parted --script /dev/sdd mkpart primary 1 100%
+    $ blkid /dev/sdd1
+    /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"
+
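+To read the ``PARTUUID`` from a script, the ``blkid`` output line above can be
+parsed with Python's ``shlex`` module. This is a sketch that assumes the
+``DEVICE: KEY="value" ...`` layout shown above; ``parse_blkid_line`` is a
+hypothetical helper. If only the value is needed,
+``blkid -s PARTUUID -o value /dev/sdd1`` prints it directly.
+
```python
import shlex


def parse_blkid_line(line: str) -> tuple:
    """Parse one line of `blkid <device>` output into (device, {KEY: value})."""
    device, _, rest = line.partition(":")
    fields = {}
    # shlex strips the double quotes: PARTLABEL="primary" -> PARTLABEL=primary
    for token in shlex.split(rest):
        key, _, value = token.partition("=")
        fields[key] = value
    return device, fields


if __name__ == "__main__":
    line = '/dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"'
    device, fields = parse_blkid_line(line)
    print(device, fields["PARTUUID"])
```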
+
+.. _ceph-volume-lvm-existing-osds:
+
+Existing OSDs
+-------------
+For existing clusters that want to use this new system and have OSDs that are
+already running there are a few things to take into account:
+
+.. warning:: This process will forcefully format the data device, destroying
+             existing data, if any.
+
+* OSD paths should follow this convention::
+
+     /var/lib/ceph/osd/<cluster name>-<osd id>
+
+* Preferably, no other mechanism for mounting the volume (such as an fstab
+  mount point) should exist; any such mechanism should be removed.
+
+The one-time process for an existing OSD with an ID of 0 and a ``"ceph"``
+cluster name would look like this (the following command will **destroy any
+data** in the OSD)::
+
+    ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB
+
+The command line tool will not contact the monitor to generate an OSD ID and
+will format the LVM device in addition to storing the metadata on it so that it
+can be started later (for detailed metadata description see
+:ref:`ceph-volume-lvm-tags`).
+
+
+Crush device class
+------------------
+
+To set the CRUSH device class for the OSD, use the ``--crush-device-class`` flag:
+
+.. prompt:: bash #
+
+   ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo
+
+
+.. _ceph-volume-lvm-multipath:
+
+``multipath`` support
+---------------------
+``multipath`` devices are supported if ``lvm`` is configured properly.
+
+**Leave it to LVM**
+
+Most Linux distributions should ship their LVM2 package with
+``multipath_component_detection = 1`` in the default configuration. With this
+setting ``LVM`` ignores any device that is a multipath component and
+``ceph-volume`` will accordingly not touch these devices.
+
+**Using filters**
+
+Should this setting be unavailable, a correct ``filter`` expression must be
+provided in ``lvm.conf``. ``ceph-volume`` must not be able to use both the
+multipath device and its multipath components.
+
+Storing metadata
+----------------
+The following tags will get applied as part of the preparation process
+regardless of the type of volume (journal or data) or OSD objectstore:
+
+* ``cluster_fsid``
+* ``encrypted``
+* ``osd_fsid``
+* ``osd_id``
+* ``crush_device_class``
+
+For :term:`bluestore` these tags will be added:
+
+* ``block_device``
+* ``block_uuid``
+* ``db_device``
+* ``db_uuid``
+* ``wal_device``
+* ``wal_uuid``
+
+.. note:: For the complete lvm tag conventions see :ref:`ceph-volume-lvm-tag-api`
+
+
+Summary
+-------
+To recap the ``prepare`` process for :term:`bluestore`:
+
+#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
+#. Creates logical volumes on any raw physical devices.
+#. Generates a UUID for the OSD.
+#. Asks the monitor for an OSD ID, reusing the generated UUID.
+#. Creates the OSD data directory on a tmpfs mount.
+#. Symlinks ``block``, ``block.wal``, and ``block.db`` if they are defined.
+#. Fetches the monmap for activation.
+#. Populates the data directory by using ``ceph-osd``.
+#. Assigns all the Ceph metadata to the logical volumes by using LVM tags.
diff --git a/doc/ceph-volume/lvm/scan.rst b/doc/ceph-volume/lvm/scan.rst
new file mode 100644
index 000000000..aa9990f71
--- /dev/null
+++ b/doc/ceph-volume/lvm/scan.rst
@@ -0,0 +1,9 @@
+scan
+====
+This sub-command will allow discovering Ceph volumes previously set up by the
+tool, by looking into the system's logical volumes and their tags.
+
+As part of the :ref:`ceph-volume-lvm-prepare` process, the logical volumes are assigned
+a few tags with important pieces of information.
+
+.. note:: This sub-command is not yet implemented
diff --git a/doc/ceph-volume/lvm/systemd.rst b/doc/ceph-volume/lvm/systemd.rst
new file mode 100644
index 000000000..30260de7e
--- /dev/null
+++ b/doc/ceph-volume/lvm/systemd.rst
@@ -0,0 +1,28 @@
+.. _ceph-volume-lvm-systemd:
+
+systemd
+=======
+Upon startup, it will identify the logical volume using :term:`LVM tags`,
+finding a matching ID and later ensuring it is the right one with
+the :term:`OSD uuid`.
+
+After identifying the correct volume it will then proceed to mount it by using
+the OSD destination conventions, that is::
+
+    /var/lib/ceph/osd/<cluster name>-<osd id>
+
+For our example OSD with an id of ``0``, that means the identified device will
+be mounted at::
+
+
+    /var/lib/ceph/osd/ceph-0
+
+
+Once that process is complete, a call will be made to start the OSD::
+
+    systemctl start ceph-osd@0
+
+The systemd portion of this process is handled by the ``ceph-volume lvm
+trigger`` sub-command, which is only in charge of parsing metadata coming from
+systemd and startup, and then dispatching to ``ceph-volume lvm activate`` which
+would proceed with activation.
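+
+The mount point and unit name used above follow directly from the cluster name
+and the OSD id. A minimal sketch (the helper names are hypothetical) that builds
+both:
+
```python
def osd_mount_point(cluster_name: str, osd_id: int) -> str:
    """Mount point convention: /var/lib/ceph/osd/<cluster name>-<osd id>."""
    return f"/var/lib/ceph/osd/{cluster_name}-{osd_id}"


def osd_unit_name(osd_id: int) -> str:
    """systemd unit that is started once the OSD is mounted."""
    return f"ceph-osd@{osd_id}"


if __name__ == "__main__":
    print(osd_mount_point("ceph", 0))  # /var/lib/ceph/osd/ceph-0
    print(["systemctl", "start", osd_unit_name(0)])
```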
diff --git a/doc/ceph-volume/lvm/zap.rst b/doc/ceph-volume/lvm/zap.rst
new file mode 100644
index 000000000..e737fc386
--- /dev/null
+++ b/doc/ceph-volume/lvm/zap.rst
@@ -0,0 +1,65 @@
+.. _ceph-volume-lvm-zap:
+
+``zap``
+=======
+
+This subcommand is used to zap lvs, partitions or raw devices that have been used
+by ceph OSDs so that they may be reused. If given a path to a logical
+volume it must be in the format of vg/lv. Any file systems present
+on the given lv or partition will be removed and all data will be purged.
+
+.. note:: The lv or partition will be kept intact.
+
+.. note:: If the logical volume, raw device or partition is being used for any ceph related
+          mount points they will be unmounted.
+
+Zapping a logical volume::
+
+    ceph-volume lvm zap {vg name/lv name}
+
+Zapping a partition::
+
+    ceph-volume lvm zap /dev/sdc1
+
+Removing Devices
+----------------
+When zapping, and looking for full removal of the device (lv, vg, or partition)
+use the ``--destroy`` flag. A common use case is to simply deploy OSDs using
+a whole raw device. If you do so and then wish to reuse that device for another
+OSD you must use the ``--destroy`` flag when zapping so that the vgs and lvs
+that ceph-volume created on the raw device will be removed.
+
+.. note:: Multiple devices can be passed at once to zap them all.
+
+Zapping a raw device and destroying any vgs or lvs present::
+
+    ceph-volume lvm zap /dev/sdc --destroy
+
+
+This action can be performed on partitions, and logical volumes as well::
+
+    ceph-volume lvm zap /dev/sdc1 --destroy
+    ceph-volume lvm zap osd-vg/data-lv --destroy
+
+
+Finally, multiple devices can be detected if filtering by OSD ID and/or OSD
+FSID. Either identifier can be used or both can be used at the same time. This
+is useful in situations where multiple devices associated with a specific ID
+need to be purged. When using the FSID, the filtering is stricter, and might
+not match other (possibly invalid) devices associated with an ID.
+
+By ID only::
+
+    ceph-volume lvm zap --destroy --osd-id 1
+
+By FSID::
+
+    ceph-volume lvm zap --destroy --osd-fsid 2E8FBE58-0328-4E3B-BFB7-3CACE4E9A6CE
+
+By both::
+
+    ceph-volume lvm zap --destroy --osd-fsid 2E8FBE58-0328-4E3B-BFB7-3CACE4E9A6CE --osd-id 1
+
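+The ID/FSID filtering described above can be sketched as follows, assuming each
+logical volume is represented like an entry of
+``ceph-volume lvm list --format=json`` with a ``tags`` dictionary.
+``select_osd_lvs`` is hypothetical, and the case-insensitive FSID comparison is
+an assumption (the examples here use upper-case FSIDs, while the tags shown
+elsewhere are lower-case):
+
```python
def select_osd_lvs(lvs, osd_id=None, osd_fsid=None):
    """Return the LVs whose ceph.* tags match every filter that was given."""
    if osd_id is None and osd_fsid is None:
        raise ValueError("give an OSD ID, an OSD FSID, or both")
    selected = []
    for lv in lvs:
        tags = lv.get("tags", {})
        if osd_id is not None and tags.get("ceph.osd_id") != str(osd_id):
            continue
        # Assumption: FSIDs are compared case-insensitively.
        if osd_fsid is not None and tags.get("ceph.osd_fsid", "").lower() != osd_fsid.lower():
            continue
        selected.append(lv)
    return selected
```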
+
+.. warning:: If the systemd unit associated with the OSD ID to be zapped is
+             detected as running, the tool will refuse to zap until the daemon is stopped.
diff --git a/doc/ceph-volume/simple/activate.rst b/doc/ceph-volume/simple/activate.rst
new file mode 100644
index 000000000..8c7737162
--- /dev/null
+++ b/doc/ceph-volume/simple/activate.rst
@@ -0,0 +1,79 @@
+.. _ceph-volume-simple-activate:
+
+``activate``
+============
+Once :ref:`ceph-volume-simple-scan` has been completed, and all the metadata
+captured for an OSD has been persisted to ``/etc/ceph/osd/{id}-{uuid}.json``
+the OSD is now ready to get "activated".
+
+This activation process **disables** all ``ceph-disk`` systemd units by masking
+them, to prevent the UDEV/ceph-disk interaction that will attempt to start them
+up at boot time.
+
+The disabling of ``ceph-disk`` units is done only when calling ``ceph-volume
+simple activate`` directly, but is avoided when being called by systemd when
+the system is booting up.
+
+The activation process requires both the :term:`OSD id` and the :term:`OSD uuid`.
+To activate parsed OSDs::
+
+    ceph-volume simple activate 0 6cc43680-4f6e-4feb-92ff-9c7ba204120e
+
+The above command will assume that a JSON configuration will be found in::
+
+    /etc/ceph/osd/0-6cc43680-4f6e-4feb-92ff-9c7ba204120e.json
+
+Alternatively, using a path to a JSON file directly is also possible::
+
+    ceph-volume simple activate --file /etc/ceph/osd/0-6cc43680-4f6e-4feb-92ff-9c7ba204120e.json
+
+requiring uuids
+^^^^^^^^^^^^^^^
+The :term:`OSD uuid` is required as an extra step to ensure that the right OSD
+is activated. A previous OSD with the same id may exist, and without the uuid
+the wrong one could end up being activated.
+
+
+Discovery
+---------
+With OSDs previously scanned by ``ceph-volume``, a *discovery* process is
+performed using ``blkid`` and ``lvm``. There is currently support only for
+devices with GPT partitions and LVM logical volumes.
+
+The GPT partitions will have a ``PARTUUID`` that can be queried by calling out
+to ``blkid``, and the logical volumes will have a ``lv_uuid`` that can be
+queried against ``lvs`` (the LVM tool to list logical volumes).
+
+This discovery process ensures that devices can be correctly detected even if
+they are repurposed into another system or if their name changes (as in the
+case of non-persistent names like ``/dev/sda1``).
+
+The JSON configuration file used to map what devices go to what OSD will then
+coordinate the mounting and symlinking as part of activation.
+
+To ensure that the symlinks are always correct, any symlinks that already exist
+in the OSD directory are re-created.
+
+A systemd unit will capture the :term:`OSD id` and :term:`OSD uuid` and
+persist them. Internally, the activation will enable it like::
+
+    systemctl enable ceph-volume@simple-$id-$uuid
+
+For example::
+
+    systemctl enable ceph-volume@simple-0-8715BEB4-15C5-49DE-BA6F-401086EC7B41
+
+This would start the discovery process for the OSD with an id of ``0`` and a
+UUID of ``8715BEB4-15C5-49DE-BA6F-401086EC7B41``.
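+
+Because the UUID itself contains hyphens, only the first two hyphens in the
+instance name (the part after ``ceph-volume@``) separate its fields. A minimal
+sketch of that split, with ``parse_instance`` as a hypothetical helper name:
+
```python
def parse_instance(instance: str) -> tuple:
    """Split a 'simple-<id>-<uuid>' instance name into (subcommand, osd_id, osd_uuid)."""
    # maxsplit=2 keeps the hyphens inside the UUID intact
    subcommand, osd_id, osd_uuid = instance.split("-", 2)
    if not osd_id.isdigit():
        raise ValueError(f"unexpected OSD id in {instance!r}")
    return subcommand, int(osd_id), osd_uuid


if __name__ == "__main__":
    print(parse_instance("simple-0-8715BEB4-15C5-49DE-BA6F-401086EC7B41"))
```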
+
+
+The systemd process will call out to activate passing the information needed to
+identify the OSD and its devices, and it will proceed to:
+
+#. mount the device in the corresponding location (by convention this is
+   ``/var/lib/ceph/osd/<cluster name>-<osd id>/``)
+
+#. ensure that all required devices are ready for that OSD and properly linked.
+   The symbolic link will **always** be re-done to ensure that the correct
+   device is linked.
+
+#. start the ``ceph-osd@0`` systemd unit
diff --git a/doc/ceph-volume/simple/index.rst b/doc/ceph-volume/simple/index.rst
new file mode 100644
index 000000000..315dea99a
--- /dev/null
+++ b/doc/ceph-volume/simple/index.rst
@@ -0,0 +1,32 @@
+.. _ceph-volume-simple:
+
+``simple``
+==========
+Implements the functionality needed to manage OSDs from the ``simple`` subcommand:
+``ceph-volume simple``.
+
+**Command Line Subcommands**
+
+* :ref:`ceph-volume-simple-scan`
+
+* :ref:`ceph-volume-simple-activate`
+
+* :ref:`ceph-volume-simple-systemd`
+
+
+By *taking over* management, it disables all ``ceph-disk`` systemd units used
+to trigger devices at startup, relying on basic (customizable) JSON
+configuration and systemd for starting up OSDs.
+
+This process involves two steps:
+
+#. :ref:`Scan <ceph-volume-simple-scan>` the running OSD or the data device
+#. :ref:`Activate <ceph-volume-simple-activate>` the scanned OSD
+
+The scanning will infer everything that ``ceph-volume`` needs to start the OSD,
+so that when activation is needed, the OSD can start normally without getting
+interference from ``ceph-disk``.
+
+As part of the activation process, the systemd units for ``ceph-disk`` in charge
+of reacting to ``udev`` events are linked to ``/dev/null`` so that they are
+fully inactive.
diff --git a/doc/ceph-volume/simple/scan.rst b/doc/ceph-volume/simple/scan.rst
new file mode 100644
index 000000000..2749b14b6
--- /dev/null
+++ b/doc/ceph-volume/simple/scan.rst
@@ -0,0 +1,176 @@
+.. _ceph-volume-simple-scan:
+
+``scan``
+========
+Scanning captures any important details from an already-deployed OSD so that
+``ceph-volume`` can manage it without the need for any other startup workflows
+or tools (like ``udev`` or ``ceph-disk``). Encryption with LUKS or PLAIN
+formats is fully supported.
+
+The command has the ability to inspect a running OSD, by inspecting the
+directory where the OSD data is stored, or by consuming the data partition.
+The command can also scan all running OSDs if no path or device is provided.
+
+Once scanned, the metadata will (by default) be persisted as JSON in a file in
+``/etc/ceph/osd``. This ``JSON`` file will use the naming convention
+``{OSD ID}-{OSD FSID}.json``. For an OSD with an id of 1 and an FSID like
+``86ebd829-1405-43d3-8fd6-4cbc9b6ecf96``, the absolute path of the file would
+be::
+
+    /etc/ceph/osd/1-86ebd829-1405-43d3-8fd6-4cbc9b6ecf96.json
+
+The ``scan`` subcommand will refuse to write to this file if it already exists.
+If overwriting the contents is needed, the ``--force`` flag must be used::
+
+    ceph-volume simple scan --force {path}
+
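+A minimal sketch of the file naming and the overwrite rule, assuming
+``--force`` simply means "overwrite" (the helper names are hypothetical).
+Opening the file with mode ``"x"`` makes Python raise ``FileExistsError`` when
+the file already exists:
+
```python
import json
import os
import tempfile


def scan_json_path(osd_id, osd_fsid, base="/etc/ceph/osd"):
    """Build the default path: {base}/{OSD ID}-{OSD FSID}.json."""
    return os.path.join(base, f"{osd_id}-{osd_fsid}.json")


def write_scan_json(path, metadata, force=False):
    """Persist metadata as JSON, refusing to overwrite an existing file unless force=True."""
    mode = "w" if force else "x"  # "x" raises FileExistsError if the file exists
    with open(path, mode) as handle:
        json.dump(metadata, handle, indent=4, sort_keys=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = scan_json_path(1, "86ebd829-1405-43d3-8fd6-4cbc9b6ecf96", base=tmp)
        write_scan_json(path, {"whoami": "1"})
        try:
            write_scan_json(path, {"whoami": "1"})
        except FileExistsError:
            print("refused to overwrite", os.path.basename(path))
        write_scan_json(path, {"whoami": "1"}, force=True)  # like --force
```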
+If there is no need to persist the ``JSON`` metadata, there is support to send
+the contents to ``stdout`` (no file will be written)::
+
+    ceph-volume simple scan --stdout {path}
+
+
+.. _ceph-volume-simple-scan-directory:
+
+Running OSDs scan
+-----------------
+Using this command without providing an OSD directory or device will scan the
+directories of any currently running OSDs. If a running OSD was not created
+by ceph-disk it will be ignored and not scanned.
+
+To scan all running ceph-disk OSDs, the command would look like::
+
+    ceph-volume simple scan
+
+Directory scan
+--------------
+The directory scan will capture OSD file contents from interesting files. There
+are a few files that must exist in order to have a successful scan:
+
+* ``ceph_fsid``
+* ``fsid``
+* ``keyring``
+* ``ready``
+* ``type``
+* ``whoami``
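+
+A sketch of that pre-check, listing which of the required files are missing
+from an OSD directory (``missing_required_files`` is a hypothetical helper):
+
```python
import os

REQUIRED_FILES = ("ceph_fsid", "fsid", "keyring", "ready", "type", "whoami")


def missing_required_files(osd_dir: str) -> list:
    """Return the required OSD files that are absent from osd_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(osd_dir, name))]


if __name__ == "__main__":
    missing = missing_required_files("/var/lib/ceph/osd/ceph-1")
    print("ok" if not missing else f"cannot scan, missing: {', '.join(missing)}")
```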
+
+If the OSD is encrypted, it will additionally add the following keys:
+
+* ``encrypted``
+* ``encryption_type``
+* ``lockbox_keyring``
+
+In the case of any other file, as long as it is not a binary or a directory, it
+will also get captured and persisted as part of the JSON object.
+
+The convention for the keys in the JSON object is that any file name will be
+a key, and its contents will be its value. If the contents are a single line
+(like in the case of the ``whoami``) the contents are trimmed, and the newline
+is dropped. For example, with an OSD with an id of 1, this is how the JSON entry
+would look::
+
+    "whoami": "1",
+
+For files that may have more than one line, the contents are left as-is, except
+for keyrings which are treated specially and parsed to extract the keyring. For
+example, a ``keyring`` that gets read as::
+
+    [osd.1]\n\tkey = AQBBJ/dZp57NIBAAtnuQS9WOS0hnLVe0rZnE6Q==\n
+
+Would get stored as::
+
+    "keyring": "AQBBJ/dZp57NIBAAtnuQS9WOS0hnLVe0rZnE6Q==",
+
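+The conversion rules above (trim single-line files, keep other multi-line files
+as-is, extract the key from a keyring) can be sketched as follows. This assumes
+the keyring contains a ``key = <secret>`` line and applies keyring parsing only
+to the file named ``keyring``; ``file_value`` is a hypothetical helper:
+
```python
import re

# A keyring line such as "\tkey = AQBB...==" (assumed format)
_KEY_LINE = re.compile(r"^\s*key\s*=\s*(\S+)\s*$", re.MULTILINE)


def file_value(name: str, contents: str) -> str:
    """Convert an OSD data-directory file into the value stored in the scan JSON."""
    if name == "keyring":
        match = _KEY_LINE.search(contents)
        if match:
            return match.group(1)
    if len(contents.splitlines()) <= 1:
        # single-line files (e.g. whoami): trim whitespace and the trailing newline
        return contents.strip()
    return contents  # other multi-line files are kept as-is
```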
+
+For a directory like ``/var/lib/ceph/osd/ceph-1``, the command could look
+like::
+
+    ceph-volume simple scan /var/lib/ceph/osd/ceph-1
+
+
+.. _ceph-volume-simple-scan-device:
+
+Device scan
+-----------
+When an OSD directory is not available (OSD is not running, or device is not
+mounted) the ``scan`` command is able to introspect the device to capture
+required data. Just like :ref:`ceph-volume-simple-scan-directory`, it still
+requires a few files to be present. This means that the device to be scanned
+**must be** the data partition of the OSD.
+
+As long as the data partition of the OSD is being passed in as an argument, the
+sub-command can scan its contents.
+
+In the case where the device is already mounted, the tool can detect this
+scenario and capture file contents from that directory.
+
+If the device is not mounted, a temporary directory will be created, and the
+device will be mounted temporarily just for scanning the contents. Once
+contents are scanned, the device will be unmounted.
+
+For a device like ``/dev/sda1`` which **must** be a data partition, the command
+could look like::
+
+    ceph-volume simple scan /dev/sda1
+
+
+.. _ceph-volume-simple-scan-json:
+
+``JSON`` contents
+-----------------
+The contents of the JSON object are very simple. The scan not only persists
+information from the special OSD files and their contents, but also validates
+paths and device UUIDs. Unlike ``ceph-disk``, which stores them in
+``{device type}_uuid`` files, the tool persists them as part of the device type
+key.
+
+For example, a ``block.db`` device would look something like::
+
+    "block.db": {
+        "path": "/dev/disk/by-partuuid/6cc43680-4f6e-4feb-92ff-9c7ba204120e",
+        "uuid": "6cc43680-4f6e-4feb-92ff-9c7ba204120e"
+    },
+
+But it will also persist the ``ceph-disk`` special file generated, like so::
+
+    "block.db_uuid": "6cc43680-4f6e-4feb-92ff-9c7ba204120e",
+
+This duplication is in place because the tool is trying to ensure the
+following:
+
+#. Support OSDs that may not have ceph-disk special files
+#. Check the most up-to-date information on the device, by querying against LVM
+   and ``blkid``
+#. Support both logical volumes and GPT devices
+
+This is sample ``JSON`` metadata from an OSD that is using ``bluestore``::
+
+    {
+        "active": "ok",
+        "block": {
+            "path": "/dev/disk/by-partuuid/40fd0a64-caa5-43a3-9717-1836ac661a12",
+            "uuid": "40fd0a64-caa5-43a3-9717-1836ac661a12"
+        },
+        "block.db": {
+            "path": "/dev/disk/by-partuuid/6cc43680-4f6e-4feb-92ff-9c7ba204120e",
+            "uuid": "6cc43680-4f6e-4feb-92ff-9c7ba204120e"
+        },
+        "block.db_uuid": "6cc43680-4f6e-4feb-92ff-9c7ba204120e",
+        "block_uuid": "40fd0a64-caa5-43a3-9717-1836ac661a12",
+        "bluefs": "1",
+        "ceph_fsid": "c92fc9eb-0610-4363-aafc-81ddf70aaf1b",
+        "cluster_name": "ceph",
+        "data": {
+            "path": "/dev/sdr1",
+            "uuid": "86ebd829-1405-43d3-8fd6-4cbc9b6ecf96"
+        },
+        "fsid": "86ebd829-1405-43d3-8fd6-4cbc9b6ecf96",
+        "keyring": "AQBBJ/dZp57NIBAAtnuQS9WOS0hnLVe0rZnE6Q==",
+        "kv_backend": "rocksdb",
+        "magic": "ceph osd volume v026",
+        "mkfs_done": "yes",
+        "ready": "ready",
+        "systemd": "",
+        "type": "bluestore",
+        "whoami": "3"
+    }
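+
+Because the same UUID is stored twice, a consumer of this file can check that
+the two copies agree. The sketch below is hypothetical and not part of
+``ceph-volume``. The ``DEVICE_KEYS`` tuple is an assumption covering the
+``filestore`` (``journal``) and ``bluestore`` keys. Only keys that have both a
+nested object and a ``{device type}_uuid`` entry are compared.
+
+.. code-block:: python
+
+    import json
+
+    # Device keys that may carry a nested {"path", "uuid"} object; this list is
+    # an assumption covering both filestore ("journal") and bluestore layouts.
+    DEVICE_KEYS = ("data", "journal", "block", "block.db", "block.wal")
+
+
+    def load_scan(path):
+        """Load scan output, e.g. /etc/ceph/osd/{id}-{uuid}.json."""
+        with open(path) as f:
+            return json.load(f)
+
+
+    def mismatched_uuids(metadata):
+        """Return device keys whose nested uuid disagrees with the `{key}_uuid` entry."""
+        mismatched = []
+        for key in DEVICE_KEYS:
+            device = metadata.get(key)
+            legacy = metadata.get(key + "_uuid")
+            # Compare only when both the nested object and the ceph-disk value exist.
+            if isinstance(device, dict) and legacy is not None:
+                if device.get("uuid") != legacy:
+                    mismatched.append(key)
+        return mismatched
+
+For the sample above, ``mismatched_uuids`` returns an empty list, because
+``block`` and ``block.db`` agree with ``block_uuid`` and ``block.db_uuid``.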
diff --git a/doc/ceph-volume/simple/systemd.rst b/doc/ceph-volume/simple/systemd.rst
new file mode 100644
index 000000000..aa5bebffe
--- /dev/null
+++ b/doc/ceph-volume/simple/systemd.rst
@@ -0,0 +1,28 @@
+.. _ceph-volume-simple-systemd:
+
+systemd
+=======
+Upon startup, the unit identifies the OSD by loading the JSON file at
+``/etc/ceph/osd/{id}-{uuid}.json`` that corresponds to the instance name of the
+systemd unit.
+
+After identifying the correct device, it mounts it using the OSD destination
+conventions, that is::
+
+    /var/lib/ceph/osd/{cluster name}-{osd id}
+
+For an example OSD with an id of ``0``, the identified device will be mounted
+at::
+
+    /var/lib/ceph/osd/ceph-0
+
+Once that process is complete, a call will be made to start the OSD::
+
+    systemctl start ceph-osd@0
+
+The systemd portion of this process is handled by the ``ceph-volume simple
+trigger`` sub-command. It is only in charge of parsing the metadata coming from
+systemd at startup and then dispatching to ``ceph-volume simple activate``,
+which proceeds with activation.
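+
+The naming conventions above can be sketched as a few hypothetical helpers.
+They are not the ``ceph-volume`` code. They take the ``cluster_name`` and
+``whoami`` keys from the metadata described in
+:ref:`ceph-volume-simple-scan-json`. Falling back to ``ceph`` when
+``cluster_name`` is missing is an assumption.
+
+.. code-block:: python
+
+    def metadata_path(osd_id, osd_uuid):
+        """JSON file the simple scan persisted for this OSD."""
+        return "/etc/ceph/osd/%s-%s.json" % (osd_id, osd_uuid)
+
+
+    def mount_target(metadata):
+        """Mount point following /var/lib/ceph/osd/{cluster name}-{osd id}."""
+        # Assumes "ceph" when the metadata has no cluster_name key.
+        cluster = metadata.get("cluster_name", "ceph")
+        return "/var/lib/ceph/osd/%s-%s" % (cluster, metadata["whoami"])
+
+
+    def osd_unit(metadata):
+        """systemd unit started once the device is mounted."""
+        return "ceph-osd@%s" % metadata["whoami"]
+
+For metadata with ``"cluster_name": "ceph"`` and ``"whoami": "0"``, these
+return ``/var/lib/ceph/osd/ceph-0`` and ``ceph-osd@0``.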
diff --git a/doc/ceph-volume/systemd.rst b/doc/ceph-volume/systemd.rst
new file mode 100644
index 000000000..5b5273c9c
--- /dev/null
+++ b/doc/ceph-volume/systemd.rst
@@ -0,0 +1,49 @@
+.. _ceph-volume-systemd:
+
+systemd
+=======
+As part of the activation process (either with :ref:`ceph-volume-lvm-activate`
+or :ref:`ceph-volume-simple-activate`), systemd units that use the OSD id and
+uuid as part of their name are enabled. These units run when the system boots
+and activate their corresponding volumes via their sub-command
+implementation.
+
+The API for activation is a bit loose: it only requires two parts, the
+subcommand to use and any extra metadata, separated by a dash. This convention
+makes the units look like::
+
+    ceph-volume@{command}-{extra metadata}
+
+The *extra metadata* can be anything that the subcommand implementing the
+processing needs. In the case of :ref:`ceph-volume-lvm` and
+:ref:`ceph-volume-simple`, both consume the :term:`OSD id` and :term:`OSD uuid`.
+This is not a hard requirement; it is just how the sub-commands are
+implemented.
+
+Both the command and the extra metadata are persisted by systemd as part of the
+*"instance name"* of the unit. For example, enabling the unit for an OSD with an
+ID of 0 that uses the ``lvm`` sub-command would look like::
+
+    systemctl enable ceph-volume@lvm-0-0A3E1ED2-DA8A-4F0E-AA95-61DEC71768D6
+
+The enabled unit is a :term:`systemd oneshot` service, meant to start at boot
+after the local file system is ready to be used.
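+
+Because the OSD uuid itself contains dashes, only the first dash can separate
+the command from the extra metadata. The following sketch splits an instance
+name on that rule and then splits the ``lvm``/``simple`` metadata into the OSD
+id and uuid. The helpers are hypothetical and are not the ``ceph-volume``
+parser.
+
+.. code-block:: python
+
+    def split_instance(name):
+        """Split 'ceph-volume@{command}-{extra metadata}' into (command, extra metadata)."""
+        prefix, suffix = "ceph-volume@", ".service"
+        if name.startswith(prefix):
+            name = name[len(prefix):]
+        if name.endswith(suffix):
+            name = name[:-len(suffix)]
+        # Only the first dash separates the command; the rest belongs to the metadata.
+        command, sep, extra = name.partition("-")
+        if not (command and sep and extra):
+            raise ValueError("expected {command}-{extra metadata}, got %r" % name)
+        return command, extra
+
+
+    def split_osd_metadata(extra):
+        """Split lvm/simple metadata '{osd id}-{osd uuid}'; the uuid keeps its dashes."""
+        osd_id, sep, osd_uuid = extra.partition("-")
+        if not (osd_id.isdigit() and sep and osd_uuid):
+            raise ValueError("expected {osd id}-{osd uuid}, got %r" % extra)
+        return osd_id, osd_uuid
+
+Applied to the unit enabled above, ``split_instance`` yields ``lvm`` and
+``0-0A3E1ED2-DA8A-4F0E-AA95-61DEC71768D6``. Passing that metadata to
+``split_osd_metadata`` yields the id ``0`` and the full uuid.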
+
+
+Failure and Retries
+-------------------
+Failures are common while a system is coming online. Devices are sometimes not
+fully available yet, and this unpredictable behavior may cause an OSD to not be
+ready to be used.
+
+There are two configurable environment variables used to set the retry
+behavior:
+
+* ``CEPH_VOLUME_SYSTEMD_TRIES``: Defaults to 30
+* ``CEPH_VOLUME_SYSTEMD_INTERVAL``: Defaults to 5
+
+The *"tries"* is a number that sets the maximum number of times the unit will
+attempt to activate an OSD before giving up.
+
+The *"interval"* is a value in seconds that determines the waiting time before
+initiating another try at activating the OSD.
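+
+The following is a minimal sketch of how these two variables could drive a
+retry loop. It assumes ``activate`` is a hypothetical callable that returns
+``True`` once the OSD is up. It is not the ``ceph-volume`` implementation;
+``sleep`` is injectable so the loop can be exercised without waiting.
+
+.. code-block:: python
+
+    import os
+    import time
+
+
+    def retry_settings(environ=os.environ):
+        """Read the retry knobs, falling back to the documented defaults."""
+        tries = int(environ.get("CEPH_VOLUME_SYSTEMD_TRIES", 30))
+        interval = float(environ.get("CEPH_VOLUME_SYSTEMD_INTERVAL", 5))
+        return tries, interval
+
+
+    def activate_with_retries(activate, environ=os.environ, sleep=time.sleep):
+        """Call activate() until it returns True or the tries run out."""
+        tries, interval = retry_settings(environ)
+        for attempt in range(1, tries + 1):
+            if activate():
+                return True
+            if attempt < tries:
+                # Wait "interval" seconds before initiating another try.
+                sleep(interval)
+        return False
+
+In this sketch the waits happen only between tries. With the defaults, a unit
+that never succeeds makes 29 waits of 5 seconds (145 seconds in total) before
+giving up.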
diff --git a/doc/ceph-volume/zfs/index.rst b/doc/ceph-volume/zfs/index.rst
new file mode 100644
index 000000000..c06228de9
--- /dev/null
+++ b/doc/ceph-volume/zfs/index.rst
@@ -0,0 +1,31 @@
+.. _ceph-volume-zfs:
+
+``zfs``
+=======
+Implements the functionality needed to deploy OSDs from the ``zfs`` subcommand:
+``ceph-volume zfs``.
+
+The current implementation only works for ZFS on FreeBSD.
+
+**Command Line Subcommands**
+
+* :ref:`ceph-volume-zfs-inventory`
+
+.. not yet implemented
+.. * :ref:`ceph-volume-zfs-prepare`
+
+.. * :ref:`ceph-volume-zfs-activate`
+
+.. * :ref:`ceph-volume-zfs-create`
+
+.. * :ref:`ceph-volume-zfs-list`
+
+.. * :ref:`ceph-volume-zfs-scan`
+
+**Internal functionality**
+
+There are other aspects of the ``zfs`` subcommand that are internal and not
+exposed to the user. These sections explain how these pieces work together,
+clarifying the workflows of the tool.
+
+:ref:`zfs <ceph-volume-zfs-api>`
diff --git a/doc/ceph-volume/zfs/inventory.rst b/doc/ceph-volume/zfs/inventory.rst
new file mode 100644
index 000000000..fd00325b6
--- /dev/null
+++ b/doc/ceph-volume/zfs/inventory.rst
@@ -0,0 +1,19 @@
+.. _ceph-volume-zfs-inventory:
+
+``inventory``
+=============
+The ``inventory`` subcommand queries a host's disk inventory through GEOM and
+provides hardware information and metadata on every physical device.
+
+This only works on a FreeBSD platform.
+
+By default the command returns a short, human-readable report of all physical disks.
+
+For programmatic consumption of this report, pass ``--format json`` to generate
+a JSON formatted report. This report includes extensive information on the
+physical drives, such as disk metadata (like model and size), logical volumes
+and whether they are used by Ceph, and whether the disk is usable by Ceph and,
+if not, the reasons why.
+
+A device path can be specified to report extensive information on a single
+device, in both plain and JSON format.
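+
+The following is a rough sketch of consuming the JSON report from Python. The
+command line (``ceph-volume zfs inventory --format json``) is an assumption.
+So are the field names ``path``, ``available`` and ``rejected_reasons``, which
+are modelled on the LVM inventory report rather than on a documented schema
+for the ``zfs`` variant.
+
+.. code-block:: python
+
+    import json
+    import subprocess
+
+
+    def inventory_report(device=None):
+        """Run the inventory and return the parsed JSON (command line is assumed)."""
+        cmd = ["ceph-volume", "zfs", "inventory", "--format", "json"]
+        if device:
+            cmd.append(device)
+        out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
+        return json.loads(out)
+
+
+    def summarize(report):
+        """Map each device path to None if usable, else to its rejection reasons."""
+        # A single-device report is assumed to be one object rather than a list.
+        devices = report if isinstance(report, list) else [report]
+        # "path", "available" and "rejected_reasons" are assumed field names.
+        return {
+            d["path"]: None if d.get("available") else d.get("rejected_reasons", [])
+            for d in devices
+        }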