summaryrefslogtreecommitdiffstats
path: root/doc/ceph-volume/lvm
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--doc/ceph-volume/lvm/activate.rst113
-rw-r--r--doc/ceph-volume/lvm/batch.rst129
-rw-r--r--doc/ceph-volume/lvm/create.rst24
-rw-r--r--doc/ceph-volume/lvm/encryption.rst86
-rw-r--r--doc/ceph-volume/lvm/index.rst28
-rw-r--r--doc/ceph-volume/lvm/list.rst184
-rw-r--r--doc/ceph-volume/lvm/prepare.rst332
-rw-r--r--doc/ceph-volume/lvm/scan.rst9
-rw-r--r--doc/ceph-volume/lvm/systemd.rst28
-rw-r--r--doc/ceph-volume/lvm/zap.rst65
10 files changed, 998 insertions, 0 deletions
diff --git a/doc/ceph-volume/lvm/activate.rst b/doc/ceph-volume/lvm/activate.rst
new file mode 100644
index 00000000..5e43d4cb
--- /dev/null
+++ b/doc/ceph-volume/lvm/activate.rst
@@ -0,0 +1,113 @@
+.. _ceph-volume-lvm-activate:
+
+``activate``
+============
+Once :ref:`ceph-volume-lvm-prepare` is completed, and all the various steps
+that entails are done, the volume is ready to get "activated".
+
+This activation process enables a systemd unit that persists the OSD ID and its
+UUID (also called ``fsid`` in Ceph CLI tools), so that at boot time it can
+understand what OSD is enabled and needs to be mounted.
+
+.. note:: The execution of this call is fully idempotent, and there is no
+ side-effects when running multiple times
+
+New OSDs
+--------
+To activate newly prepared OSDs both the :term:`OSD id` and :term:`OSD uuid`
+need to be supplied. For example::
+
+ ceph-volume lvm activate --bluestore 0 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8
+
+.. note:: The UUID is stored in the ``fsid`` file in the OSD path, which is
+ generated when :ref:`ceph-volume-lvm-prepare` is used.
+
+Activating all OSDs
+-------------------
+It is possible to activate all existing OSDs at once by using the ``--all``
+flag. For example::
+
+ ceph-volume lvm activate --all
+
+This call will inspect all the OSDs created by ceph-volume that are inactive
+and will activate them one by one. If any of the OSDs are already running, it
+will report them in the command output and skip them, making it safe to rerun
+(idempotent).
+
+requiring uuids
+^^^^^^^^^^^^^^^
+The :term:`OSD uuid` is being required as an extra step to ensure that the
+right OSD is being activated. It is entirely possible that a previous OSD with
+the same id exists and would end up activating the incorrect one.
+
+
+dmcrypt
+^^^^^^^
+If the OSD was prepared with dmcrypt by ceph-volume, there is no need to
+specify ``--dmcrypt`` on the command line again (that flag is not available for
+the ``activate`` subcommand). An encrypted OSD will be automatically detected.
+
+
+Discovery
+---------
+With OSDs previously created by ``ceph-volume``, a *discovery* process is
+performed using :term:`LVM tags` to enable the systemd units.
+
+The systemd unit will capture the :term:`OSD id` and :term:`OSD uuid` and
+persist it. Internally, the activation will enable it like::
+
+ systemctl enable ceph-volume@lvm-$id-$uuid
+
+For example::
+
+ systemctl enable ceph-volume@lvm-0-8715BEB4-15C5-49DE-BA6F-401086EC7B41
+
+Would start the discovery process for the OSD with an id of ``0`` and a UUID of
+``8715BEB4-15C5-49DE-BA6F-401086EC7B41``.
+
+.. note:: for more details on the systemd workflow see :ref:`ceph-volume-lvm-systemd`
+
+The systemd unit will look for the matching OSD device, and by looking at its
+:term:`LVM tags` will proceed to:
+
+# mount the device in the corresponding location (by convention this is
+ ``/var/lib/ceph/osd/<cluster name>-<osd id>/``)
+
+# ensure that all required devices are ready for that OSD. In the case of
+a journal (when ``--filestore`` is selected) the device will be queried (with
+``blkid`` for partitions, and lvm for logical volumes) to ensure that the
+correct device is being linked. The symbolic link will *always* be re-done to
+ensure that the correct device is linked.
+
+# start the ``ceph-osd@0`` systemd unit
+
+.. note:: The system infers the objectstore type (filestore or bluestore) by
+ inspecting the LVM tags applied to the OSD devices
+
+Existing OSDs
+-------------
+For existing OSDs that have been deployed with ``ceph-disk``, they need to be
+scanned and activated :ref:`using the simple sub-command <ceph-volume-simple>`.
+If a different tooling was used then the only way to port them over to the new
+mechanism is to prepare them again (losing data). See
+:ref:`ceph-volume-lvm-existing-osds` for details on how to proceed.
+
+Summary
+-------
+To recap the ``activate`` process for :term:`bluestore`:
+
+#. require both :term:`OSD id` and :term:`OSD uuid`
+#. enable the system unit with matching id and uuid
+#. Create the ``tmpfs`` mount at the OSD directory in
+ ``/var/lib/ceph/osd/$cluster-$id/``
+#. Recreate all the files needed with ``ceph-bluestore-tool prime-osd-dir`` by
+ pointing it to the OSD ``block`` device.
+#. the systemd unit will ensure all devices are ready and linked
+#. the matching ``ceph-osd`` systemd unit will get started
+
+And for :term:`filestore`:
+
+#. require both :term:`OSD id` and :term:`OSD uuid`
+#. enable the system unit with matching id and uuid
+#. the systemd unit will ensure all devices are ready and mounted (if needed)
+#. the matching ``ceph-osd`` systemd unit will get started
diff --git a/doc/ceph-volume/lvm/batch.rst b/doc/ceph-volume/lvm/batch.rst
new file mode 100644
index 00000000..84959fb5
--- /dev/null
+++ b/doc/ceph-volume/lvm/batch.rst
@@ -0,0 +1,129 @@
+.. _ceph-volume-lvm-batch:
+
+``batch``
+===========
+The subcommand allows to create multiple OSDs at the same time given
+an input of devices.
+
+The subcommand is based to :ref:`ceph-volume-lvm-create`, and will use the very
+same code path. All ``batch`` does is to calculate the appropriate sizes of all
+volumes and skip over already created volumes.
+
+All the features that ``ceph-volume lvm create`` supports, like ``dmcrypt``,
+avoiding ``systemd`` units from starting, defining bluestore or filestore,
+are supported.
+
+
+.. _ceph-volume-lvm-batch_auto:
+
+Automatic sorting of disks
+--------------------------
+If ``batch`` receives only a single list of devices and the ``--no-auto`` option
+ is not passed, ``ceph-volume`` will auto-sort disks by its rotational
+ property and use non-rotating disks for ``block.db`` or ``journal`` depending
+ on the objectstore used.
+This behavior is now DEPRECATED and will be removed in future releases. Instead
+ an ``auto`` option will be introduced to retain this behavior.
+It is recommended to make use of the explicit device lists for ``block.db``,
+ ``block.wal`` and ``journal``.
+For example assuming :term:`bluestore` is used and ``--no-auto`` is not passed,
+ the deprecated behavior would deploy the following, depending on the devices
+ passed:
+
+#. Devices are all spinning HDDs: 1 OSD is created per device
+#. Devices are all SSDs: 2 OSDs are created per device
+#. Devices are a mix of HDDs and SSDs: data is placed on the spinning device,
+ the ``block.db`` is created on the SSD, as large as possible.
+
+.. note:: Although operations in ``ceph-volume lvm create`` allow usage of
+ ``block.wal`` it isn't supported with the ``auto`` behavior.
+
+.. _ceph-volume-lvm-batch_bluestore:
+
+Reporting
+=========
+By default ``batch`` will print a report of the computed OSD layout and ask the
+user to confirm. This can be overridden by passing ``--yes``.
+
+If one wants to try out several invocations with being asked to deploy
+``--report`` can be passed. ``ceph-volume`` will exit after printing the report.
+
+Consider the following invocation::
+
+ $ ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
+
+This will deploy three OSDs with external ``db`` and ``wal`` volumes on
+an NVME device.
+
+**pretty reporting**
+The ``pretty`` report format (the default) would
+look like this::
+
+ $ ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
+
+ Total OSDs: 3
+
+
+
+
+**JSON reporting**
+Reporting can produce a structured output with ``--format json``::
+
+ $ ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
+
+Sizing
+======
+When no sizing arguments are passed, `ceph-volume` will derive the sizing from
+the passed device lists (or the sorted lists when using the automatic sorting).
+`ceph-volume batch` will attempt to fully utilize a devices available capacity.
+Relying on automatic sizing is recommended.
+
+If one requires a different sizing policy for wal, db or journal devices,
+`ceph-volume` offers implicit and explicit sizing rules.
+
+Implicit sizing
+---------------
+Scenarios in which either devices are under-comitted or not all data devices are
+currently ready for use (due to a broken disk for example), one can still rely
+on `ceph-volume` automatic sizing.
+Users can provide hints to `ceph-volume` as to how many data devices should have
+their external volumes on a set of fast devices. These options are:
+
+* `--block-db-slots`
+* `--block-wal-slots`
+* `--journal-slots`
+
+For example consider an OSD host that is supposed to contain 5 data devices and
+one device for wal/db volumes. However one data device is currently broken and
+is being replaced. Instead of calculating the explicit sizes for the wal/db
+volume one can simply call::
+
+ $ ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd /dev/sde --db-devices /dev/nvme0n1 --block-db-slots 5
+
+Explicit sizing
+---------------
+It is also possible to provide explicit sizes to `ceph-volume` via the arguments
+
+* `--block-db-size`
+* `--block-wal-size`
+* `--journal-size`
+
+`ceph-volume` will try to satisfy the requested sizes given the passed disks. If
+this is not possible, no OSDs will be deployed.
+
+
+Idempotency and disk replacements
+=================================
+`ceph-volume lvm batch` intends to be idempotent, i.e. calling the same command
+repeatedly must result in the same outcome. For example calling::
+
+ $ ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1
+
+will result in three deployed OSDs (if all disks were available). Calling this
+command again, you will still end up with three OSDs and ceph-volume will exit
+with return code 0.
+
+Suppose /dev/sdc goes bad and needs to be replaced. After destroying the OSD and
+replacing the hardware, you can again call the same command and `ceph-volume`
+will detect that only two out of the three wanted OSDs are setup and re-create
+the missing OSD.
diff --git a/doc/ceph-volume/lvm/create.rst b/doc/ceph-volume/lvm/create.rst
new file mode 100644
index 00000000..c90d1f6f
--- /dev/null
+++ b/doc/ceph-volume/lvm/create.rst
@@ -0,0 +1,24 @@
+.. _ceph-volume-lvm-create:
+
+``create``
+===========
+This subcommand wraps the two-step process to provision a new osd (calling
+``prepare`` first and then ``activate``) into a single
+one. The reason to prefer ``prepare`` and then ``activate`` is to gradually
+introduce new OSDs into a cluster, and avoiding large amounts of data being
+rebalanced.
+
+The single-call process unifies exactly what :ref:`ceph-volume-lvm-prepare` and
+:ref:`ceph-volume-lvm-activate` do, with the convenience of doing it all at
+once.
+
+There is nothing different to the process except the OSD will become up and in
+immediately after completion.
+
+The backing objectstore can be specified with:
+
+* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
+* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
+
+All command line flags and options are the same as ``ceph-volume lvm prepare``.
+Please refer to :ref:`ceph-volume-lvm-prepare` for details.
diff --git a/doc/ceph-volume/lvm/encryption.rst b/doc/ceph-volume/lvm/encryption.rst
new file mode 100644
index 00000000..1483ef32
--- /dev/null
+++ b/doc/ceph-volume/lvm/encryption.rst
@@ -0,0 +1,86 @@
+.. _ceph-volume-lvm-encryption:
+
+Encryption
+==========
+
+Logical volumes can be encrypted using ``dmcrypt`` by specifying the
+``--dmcrypt`` flag when creating OSDs. Encryption can be done in different ways,
+specially with LVM. ``ceph-volume`` is somewhat opinionated with the way it
+sets up encryption with logical volumes so that the process is consistent and
+robust.
+
+In this case, ``ceph-volume lvm`` follows these constraints:
+
+* only LUKS (version 1) is used
+* Logical Volumes are encrypted, while their underlying PVs (physical volumes)
+ aren't
+* Non-LVM devices like partitions are also encrypted with the same OSD key
+
+
+LUKS
+----
+There are currently two versions of LUKS, 1 and 2. Version 2 is a bit easier
+to implement but not widely available in all distros Ceph supports. LUKS 1 is
+not going to be deprecated in favor of LUKS 2, so in order to have as wide
+support as possible, ``ceph-volume`` uses LUKS version 1.
+
+.. note:: Version 1 of LUKS is just referenced as "LUKS" whereas version 2 is
+ referred to as LUKS2
+
+
+LUKS on LVM
+-----------
+Encryption is done on top of existing logical volumes (unlike encrypting the
+physical device). Any single logical volume can be encrypted while other
+volumes can remain unencrypted. This method also allows for flexible logical
+volume setups, since encryption will happen once the LV is created.
+
+
+Workflow
+--------
+When setting up the OSD, a secret key will be created, that will be passed
+along to the monitor in JSON format as ``stdin`` to prevent the key from being
+captured in the logs.
+
+The JSON payload looks something like::
+
+ {
+ "cephx_secret": CEPHX_SECRET,
+ "dmcrypt_key": DMCRYPT_KEY,
+ "cephx_lockbox_secret": LOCKBOX_SECRET,
+ }
+
+The naming convention for the keys is **strict**, and they are named like that
+for the hardcoded (legacy) names ceph-disk used.
+
+* ``cephx_secret`` : The cephx key used to authenticate
+* ``dmcrypt_key`` : The secret (or private) key to unlock encrypted devices
+* ``cephx_lockbox_secret`` : The authentication key used to retrieve the
+ ``dmcrypt_key``. It is named *lockbox* because ceph-disk used to have an
+ unencrypted partition named after it, used to store public keys and other
+ OSD metadata.
+
+The naming convention is strict because Monitors supported the naming
+convention by ceph-disk, which used these key names. In order to keep
+compatibility and prevent ceph-disk from breaking, ceph-volume will use the same
+naming convention *although they don't make sense for the new encryption
+workflow*.
+
+After the common steps of setting up the OSD during the prepare stage, either
+with :term:`filestore` or :term:`bluestore`, the logical volume is left ready
+to be activated, regardless of the state of the device (encrypted or decrypted).
+
+At activation time, the logical volume will get decrypted and the OSD started
+once the process completes correctly.
+
+Summary of the encryption workflow for creating a new OSD:
+
+#. OSD is created, both lockbox and dmcrypt keys are created, and sent along
+ with JSON to the monitors, indicating an encrypted OSD.
+
+#. All complementary devices (like journal, db, or wal) get created and
+ encrypted with the same OSD key. Key is stored in the LVM metadata of the
+ OSD
+
+#. Activation continues by ensuring devices are mounted, retrieving the dmcrypt
+ secret key from the monitors and decrypting before the OSD gets started.
diff --git a/doc/ceph-volume/lvm/index.rst b/doc/ceph-volume/lvm/index.rst
new file mode 100644
index 00000000..9a2191fb
--- /dev/null
+++ b/doc/ceph-volume/lvm/index.rst
@@ -0,0 +1,28 @@
+.. _ceph-volume-lvm:
+
+``lvm``
+=======
+Implements the functionality needed to deploy OSDs from the ``lvm`` subcommand:
+``ceph-volume lvm``
+
+**Command Line Subcommands**
+
+* :ref:`ceph-volume-lvm-prepare`
+
+* :ref:`ceph-volume-lvm-activate`
+
+* :ref:`ceph-volume-lvm-create`
+
+* :ref:`ceph-volume-lvm-list`
+
+.. not yet implemented
+.. * :ref:`ceph-volume-lvm-scan`
+
+**Internal functionality**
+
+There are other aspects of the ``lvm`` subcommand that are internal and not
+exposed to the user, these sections explain how these pieces work together,
+clarifying the workflows of the tool.
+
+:ref:`Systemd Units <ceph-volume-lvm-systemd>` |
+:ref:`lvm <ceph-volume-lvm-api>`
diff --git a/doc/ceph-volume/lvm/list.rst b/doc/ceph-volume/lvm/list.rst
new file mode 100644
index 00000000..718154b1
--- /dev/null
+++ b/doc/ceph-volume/lvm/list.rst
@@ -0,0 +1,184 @@
+.. _ceph-volume-lvm-list:
+
+``list``
+========
+This subcommand will list any devices (logical and physical) that may be
+associated with a Ceph cluster, as long as they contain enough metadata to
+allow for that discovery.
+
+Output is grouped by the OSD ID associated with the devices, and unlike
+``ceph-disk`` it does not provide any information for devices that aren't
+associated with Ceph.
+
+Command line options:
+
+* ``--format`` Allows a ``json`` or ``pretty`` value. Defaults to ``pretty``
+ which will group the device information in a human-readable format.
+
+Full Reporting
+--------------
+When no positional arguments are used, a full reporting will be presented. This
+means that all devices and logical volumes found in the system will be
+displayed.
+
+Full ``pretty`` reporting for two OSDs, one with a lv as a journal, and another
+one with a physical device may look similar to::
+
+ # ceph-volume lvm list
+
+
+ ====== osd.1 =======
+
+ [journal] /dev/journals/journal1
+
+ journal uuid C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+ osd id 1
+ cluster fsid ce454d91-d748-4751-a318-ff7f7aa18ffd
+ type journal
+ osd fsid 661b24f8-e062-482b-8110-826ffe7f13fa
+ data uuid SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+ journal device /dev/journals/journal1
+ data device /dev/test_group/data-lv2
+ devices /dev/sda
+
+ [data] /dev/test_group/data-lv2
+
+ journal uuid C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+ osd id 1
+ cluster fsid ce454d91-d748-4751-a318-ff7f7aa18ffd
+ type data
+ osd fsid 661b24f8-e062-482b-8110-826ffe7f13fa
+ data uuid SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+ journal device /dev/journals/journal1
+ data device /dev/test_group/data-lv2
+ devices /dev/sdb
+
+ ====== osd.0 =======
+
+ [data] /dev/test_group/data-lv1
+
+ journal uuid cd72bd28-002a-48da-bdf6-d5b993e84f3f
+ osd id 0
+ cluster fsid ce454d91-d748-4751-a318-ff7f7aa18ffd
+ type data
+ osd fsid 943949f0-ce37-47ca-a33c-3413d46ee9ec
+ data uuid TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00
+ journal device /dev/sdd1
+ data device /dev/test_group/data-lv1
+ devices /dev/sdc
+
+ [journal] /dev/sdd1
+
+ PARTUUID cd72bd28-002a-48da-bdf6-d5b993e84f3f
+
+
+For logical volumes the ``devices`` key is populated with the physical devices
+associated with the logical volume. Since LVM allows multiple physical devices
+to be part of a logical volume, the value will be comma separated when using
+``pretty``, but an array when using ``json``.
+
+.. note:: Tags are displayed in a readable format. The ``osd id`` key is stored
+ as a ``ceph.osd_id`` tag. For more information on lvm tag conventions
+ see :ref:`ceph-volume-lvm-tag-api`
+
+Single Reporting
+----------------
+Single reporting can consume both devices and logical volumes as input
+(positional parameters). For logical volumes, it is required to use the group
+name as well as the logical volume name.
+
+For example the ``data-lv2`` logical volume, in the ``test_group`` volume group
+can be listed in the following way::
+
+ # ceph-volume lvm list test_group/data-lv2
+
+
+ ====== osd.1 =======
+
+ [data] /dev/test_group/data-lv2
+
+ journal uuid C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+ osd id 1
+ cluster fsid ce454d91-d748-4751-a318-ff7f7aa18ffd
+ type data
+ osd fsid 661b24f8-e062-482b-8110-826ffe7f13fa
+ data uuid SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+ journal device /dev/journals/journal1
+ data device /dev/test_group/data-lv2
+ devices /dev/sdc
+
+
+.. note:: Tags are displayed in a readable format. The ``osd id`` key is stored
+ as a ``ceph.osd_id`` tag. For more information on lvm tag conventions
+ see :ref:`ceph-volume-lvm-tag-api`
+
+
+For plain disks, the full path to the device is required. For example, for
+a device like ``/dev/sdd1`` it can look like::
+
+
+ # ceph-volume lvm list /dev/sdd1
+
+
+ ====== osd.0 =======
+
+ [journal] /dev/sdd1
+
+ PARTUUID cd72bd28-002a-48da-bdf6-d5b993e84f3f
+
+
+
+``json`` output
+---------------
+All output using ``--format=json`` will show everything the system has stored
+as metadata for the devices, including tags.
+
+No changes for readability are done with ``json`` reporting, and all
+information is presented as-is. Full output as well as single devices can be
+listed.
+
+For brevity, this is how a single logical volume would look with ``json``
+output (note how tags aren't modified)::
+
+ # ceph-volume lvm list --format=json test_group/data-lv1
+ {
+ "0": [
+ {
+ "devices": ["/dev/sda"],
+ "lv_name": "data-lv1",
+ "lv_path": "/dev/test_group/data-lv1",
+ "lv_tags": "ceph.cluster_fsid=ce454d91-d748-4751-a318-ff7f7aa18ffd,ceph.data_device=/dev/test_group/data-lv1,ceph.data_uuid=TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00,ceph.journal_device=/dev/sdd1,ceph.journal_uuid=cd72bd28-002a-48da-bdf6-d5b993e84f3f,ceph.osd_fsid=943949f0-ce37-47ca-a33c-3413d46ee9ec,ceph.osd_id=0,ceph.type=data",
+ "lv_uuid": "TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00",
+ "name": "data-lv1",
+ "path": "/dev/test_group/data-lv1",
+ "tags": {
+ "ceph.cluster_fsid": "ce454d91-d748-4751-a318-ff7f7aa18ffd",
+ "ceph.data_device": "/dev/test_group/data-lv1",
+ "ceph.data_uuid": "TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00",
+ "ceph.journal_device": "/dev/sdd1",
+ "ceph.journal_uuid": "cd72bd28-002a-48da-bdf6-d5b993e84f3f",
+ "ceph.osd_fsid": "943949f0-ce37-47ca-a33c-3413d46ee9ec",
+ "ceph.osd_id": "0",
+ "ceph.type": "data"
+ },
+ "type": "data",
+ "vg_name": "test_group"
+ }
+ ]
+ }
+
+
+Synchronized information
+------------------------
+Before any listing type, the lvm API is queried to ensure that physical devices
+that may be in use haven't changed naming. It is possible that non-persistent
+devices like ``/dev/sda1`` could change to ``/dev/sdb1``.
+
+The detection is possible because the ``PARTUUID`` is stored as part of the
+metadata in the logical volume for the data lv. Even in the case of a journal
+that is a physical device, this information is still stored on the data logical
+volume associated with it.
+
+If the name is no longer the same (as reported by ``blkid`` when using the
+``PARTUUID``), the tag will get updated and the report will use the newly
+refreshed information.
diff --git a/doc/ceph-volume/lvm/prepare.rst b/doc/ceph-volume/lvm/prepare.rst
new file mode 100644
index 00000000..78a7c091
--- /dev/null
+++ b/doc/ceph-volume/lvm/prepare.rst
@@ -0,0 +1,332 @@
+.. _ceph-volume-lvm-prepare:
+
+``prepare``
+===========
+This subcommand allows a :term:`filestore` or :term:`bluestore` setup. It is
+recommended to pre-provision a logical volume before using it with
+``ceph-volume lvm``.
+
+Logical volumes are not altered except for adding extra metadata.
+
+.. note:: This is part of a two step process to deploy an OSD. If looking for
+ a single-call way, please see :ref:`ceph-volume-lvm-create`
+
+To help identify volumes, the process of preparing a volume (or volumes) to
+work with Ceph, the tool will assign a few pieces of metadata information using
+:term:`LVM tags`.
+
+:term:`LVM tags` makes volumes easy to discover later, and help identify them as
+part of a Ceph system, and what role they have (journal, filestore, bluestore,
+etc...)
+
+Although initially :term:`filestore` is supported (and supported by default)
+the back end can be specified with:
+
+
+* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
+* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
+
+.. _ceph-volume-lvm-prepare_bluestore:
+
+``bluestore``
+-------------
+The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
+more flexibility for devices compared to :term:`filestore`.
+Bluestore supports the following configurations:
+
+* A block device, a block.wal, and a block.db device
+* A block device and a block.wal device
+* A block device and a block.db device
+* A single block device
+
+The bluestore subcommand accepts physical block devices, partitions on
+physical block devices or logical volumes as arguments for the various device parameters
+If a physical device is provided, a logical volume will be created. A volume group will
+either be created or reused it its name begins with ``ceph``.
+This allows a simpler approach at using LVM but at the cost of flexibility:
+there are no options or configurations to change how the LV is created.
+
+The ``block`` is specified with the ``--data`` flag, and in its simplest use
+case it looks like::
+
+ ceph-volume lvm prepare --bluestore --data vg/lv
+
+A raw device can be specified in the same way::
+
+ ceph-volume lvm prepare --bluestore --data /path/to/device
+
+For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::
+
+ ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv
+
+If a ``block.db`` or a ``block.wal`` is needed (they are optional for
+bluestore) they can be specified with ``--block.db`` and ``--block.wal``
+accordingly. These can be a physical device, a partition or
+a logical volume.
+
+For both ``block.db`` and ``block.wal`` partitions aren't made logical volumes
+because they can be used as-is.
+
+While creating the OSD directory, the process will use a ``tmpfs`` mount to
+place all the files needed for the OSD. These files are initially created by
+``ceph-osd --mkfs`` and are fully ephemeral.
+
+A symlink is always created for the ``block`` device, and optionally for
+``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
+id of 0, the directory could look like::
+
+ # ls -l /var/lib/ceph/osd/ceph-0
+ lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
+ lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
+ lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
+ -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
+ -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
+ -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
+ -rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
+ -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
+ -rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami
+
+In the above case, a device was used for ``block`` so ``ceph-volume`` create
+a volume group and a logical volume using the following convention:
+
+* volume group name: ``ceph-{cluster fsid}`` or if the vg exists already
+ ``ceph-{random uuid}``
+
+* logical volume name: ``osd-block-{osd_fsid}``
+
+
+.. _ceph-volume-lvm-prepare_filestore:
+
+``filestore``
+-------------
+This is the OSD backend that allows preparation of logical volumes for
+a :term:`filestore` objectstore OSD.
+
+It can use a logical volume for the OSD data and a physical device, a partition
+or logical volume for the journal. A physical device will have a logical volume
+created on it. A volume group will either be created or reused it its name begins
+with ``ceph``. No special preparation is needed for these volumes other than
+following the minimum size requirements for data and journal.
+
+The CLI call looks like this of a basic standalone filestore OSD::
+
+ ceph-volume lvm prepare --filestore --data <data block device>
+
+To deploy file store with an external journal::
+
+ ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>
+
+For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::
+
+ ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>
+
+Both the journal and data block device can take three forms:
+
+* a physical block device
+* a partition on a physical block device
+* a logical volume
+
+When using logical volumes the value *must* be of the format
+``volume_group/logical_volume``. Since logical volume names
+are not enforced for uniqueness, this prevents accidentally
+choosing the wrong volume.
+
+When using a partition, it *must* contain a ``PARTUUID``, that can be
+discovered by ``blkid``. THis ensure it can later be identified correctly
+regardless of the device name (or path).
+
+For example: passing a logical volume for data and a partition ``/dev/sdc1`` for
+the journal::
+
+ ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1
+
+Passing a bare device for data and a logical volume ias the journal::
+
+ ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv
+
+A generated uuid is used to ask the cluster for a new OSD. These two pieces are
+crucial for identifying an OSD and will later be used throughout the
+:ref:`ceph-volume-lvm-activate` process.
+
+The OSD data directory is created using the following convention::
+
+ /var/lib/ceph/osd/<cluster name>-<osd id>
+
+At this point the data volume is mounted at this location, and the journal
+volume is linked::
+
+ ln -s /path/to/journal /var/lib/ceph/osd/<cluster_name>-<osd-id>/journal
+
+The monmap is fetched using the bootstrap key from the OSD::
+
+ /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
+ --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
+ mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap
+
+``ceph-osd`` will be called to populate the OSD directory, that is already
+mounted, re-using all the pieces of information from the initial steps::
+
+ ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
+ --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap --osd-data \
+ /var/lib/ceph/osd/<cluster name>-<osd id> --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
+ --osd-uuid <osd uuid> --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
+ --setuser ceph --setgroup ceph
+
+
+.. _ceph-volume-lvm-partitions:
+
+Partitioning
+------------
+``ceph-volume lvm`` does not currently create partitions from a whole device.
+If using device partitions the only requirement is that they contain the
+``PARTUUID`` and that it is discoverable by ``blkid``. Both ``fdisk`` and
+``parted`` will create that automatically for a new partition.
+
+For example, using a new, unformatted drive (``/dev/sdd`` in this case) we can
+use ``parted`` to create a new partition. First we list the device
+information::
+
+ $ parted --script /dev/sdd print
+ Model: VBOX HARDDISK (scsi)
+ Disk /dev/sdd: 11.5GB
+ Sector size (logical/physical): 512B/512B
+ Disk Flags:
+
+This device is not even labeled yet, so we can use ``parted`` to create
+a ``gpt`` label before we create a partition, and verify again with ``parted
+print``::
+
+ $ parted --script /dev/sdd mklabel gpt
+ $ parted --script /dev/sdd print
+ Model: VBOX HARDDISK (scsi)
+ Disk /dev/sdd: 11.5GB
+ Sector size (logical/physical): 512B/512B
+ Partition Table: gpt
+ Disk Flags:
+
+Now lets create a single partition, and verify later if ``blkid`` can find
+a ``PARTUUID`` that is needed by ``ceph-volume``::
+
+ $ parted --script /dev/sdd mkpart primary 1 100%
+ $ blkid /dev/sdd1
+ /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"
+
+
+.. _ceph-volume-lvm-existing-osds:
+
+Existing OSDs
+-------------
+For existing clusters that want to use this new system and have OSDs that are
+already running there are a few things to take into account:
+
+.. warning:: this process will forcefully format the data device, destroying
+ existing data, if any.
+
+* OSD paths should follow this convention::
+
+ /var/lib/ceph/osd/<cluster name>-<osd id>
+
+* Preferably, no other mechanisms to mount the volume should exist, and should
+ be removed (like fstab mount points)
+
+The one time process for an existing OSD, with an ID of 0 and using
+a ``"ceph"`` cluster name would look like (the following command will **destroy
+any data** in the OSD)::
+
+ ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB
+
+The command line tool will not contact the monitor to generate an OSD ID and
+will format the LVM device in addition to storing the metadata on it so that it
+can be started later (for detailed metadata description see
+:ref:`ceph-volume-lvm-tags`).
+
+
+Crush device class
+------------------
+
+To set the crush device class for the OSD, use the ``--crush-device-class`` flag. This will
+work for both bluestore and filestore OSDs::
+
+ ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo
+
+
+.. _ceph-volume-lvm-multipath:
+
+``multipath`` support
+---------------------
+Devices that come from ``multipath`` are not supported as-is. The tool will
+refuse to consume a raw multipath device and will report a message like::
+
+ --> RuntimeError: Cannot use device (/dev/mapper/<name>). A vg/lv path or an existing device is needed
+
+The reason for not supporting multipath is that depending on the type of the
+multipath setup, if using an active/passive array as the underlying physical
+devices, filters are required in ``lvm.conf`` to exclude the disks that are part of
+those underlying devices.
+
+It is unfeasible for ceph-volume to understand what type of configuration is
+needed for LVM to be able to work in various different multipath scenarios. The
+functionality to create the LV for you is merely a (naive) convenience,
+anything that involves different settings or configuration must be provided by
+a config management system which can then provide VGs and LVs for ceph-volume
+to consume.
+
+This situation will only arise when trying to use the ceph-volume functionality
+that creates a volume group and logical volume from a device. If a multipath
+device is already a logical volume it *should* work, given that the LVM
+configuration is done correctly to avoid issues.
+
+
+Storing metadata
+----------------
+The following tags will get applied as part of the preparation process
+regardless of the type of volume (journal or data) or OSD objectstore:
+
+* ``cluster_fsid``
+* ``encrypted``
+* ``osd_fsid``
+* ``osd_id``
+* ``crush_device_class``
+
+For :term:`filestore` these tags will be added:
+
+* ``journal_device``
+* ``journal_uuid``
+
+For :term:`bluestore` these tags will be added:
+
+* ``block_device``
+* ``block_uuid``
+* ``db_device``
+* ``db_uuid``
+* ``wal_device``
+* ``wal_uuid``
+
+.. note:: For the complete lvm tag conventions see :ref:`ceph-volume-lvm-tag-api`
+
+
+Summary
+-------
+To recap the ``prepare`` process for :term:`bluestore`:
+
+#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
+#. Creates logical volumes on any raw physical devices.
+#. Generate a UUID for the OSD
+#. Ask the monitor get an OSD ID reusing the generated UUID
+#. OSD data directory is created on a tmpfs mount.
+#. ``block``, ``block.wal``, and ``block.db`` are symlinked if defined.
+#. monmap is fetched for activation
+#. Data directory is populated by ``ceph-osd``
+#. Logical Volumes are assigned all the Ceph metadata using lvm tags
+
+
+And the ``prepare`` process for :term:`filestore`:
+
+#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
+#. Generate a UUID for the OSD
+#. Ask the monitor get an OSD ID reusing the generated UUID
+#. OSD data directory is created and data volume mounted
+#. Journal is symlinked from data volume to journal location
+#. monmap is fetched for activation
+#. devices is mounted and data directory is populated by ``ceph-osd``
+#. data and journal volumes are assigned all the Ceph metadata using lvm tags
diff --git a/doc/ceph-volume/lvm/scan.rst b/doc/ceph-volume/lvm/scan.rst
new file mode 100644
index 00000000..aa9990f7
--- /dev/null
+++ b/doc/ceph-volume/lvm/scan.rst
@@ -0,0 +1,9 @@
+scan
+====
+This sub-command will allow to discover Ceph volumes previously setup by the
+tool by looking into the system's logical volumes and their tags.
+
+As part of the :ref:`ceph-volume-lvm-prepare` process, the logical volumes are assigned
+a few tags with important pieces of information.
+
+.. note:: This sub-command is not yet implemented
diff --git a/doc/ceph-volume/lvm/systemd.rst b/doc/ceph-volume/lvm/systemd.rst
new file mode 100644
index 00000000..30260de7
--- /dev/null
+++ b/doc/ceph-volume/lvm/systemd.rst
@@ -0,0 +1,28 @@
+.. _ceph-volume-lvm-systemd:
+
+systemd
+=======
+Upon startup, it will identify the logical volume using :term:`LVM tags`,
+finding a matching ID and later ensuring it is the right one with
+the :term:`OSD uuid`.
+
+After identifying the correct volume it will then proceed to mount it by using
+the OSD destination conventions, that is::
+
+ /var/lib/ceph/osd/<cluster name>-<osd id>
+
+For our example OSD with an id of ``0``, that means the identified device will
+be mounted at::
+
+
+ /var/lib/ceph/osd/ceph-0
+
+
+Once that process is complete, a call will be made to start the OSD::
+
+ systemctl start ceph-osd@0
+
+The systemd portion of this process is handled by the ``ceph-volume lvm
+trigger`` sub-command, which is only in charge of parsing metadata coming from
+systemd and startup, and then dispatching to ``ceph-volume lvm activate`` which
+would proceed with activation.
diff --git a/doc/ceph-volume/lvm/zap.rst b/doc/ceph-volume/lvm/zap.rst
new file mode 100644
index 00000000..367d7469
--- /dev/null
+++ b/doc/ceph-volume/lvm/zap.rst
@@ -0,0 +1,65 @@
+.. _ceph-volume-lvm-zap:
+
+``zap``
+=======
+
+This subcommand is used to zap lvs, partitions or raw devices that have been used
+by ceph OSDs so that they may be reused. If given a path to a logical
+volume it must be in the format of vg/lv. Any filesystems present
+on the given lv or partition will be removed and all data will be purged.
+
+.. note:: The lv or partition will be kept intact.
+
+.. note:: If the logical volume, raw device or partition is being used for any ceph related
+ mount points they will be unmounted.
+
+Zapping a logical volume::
+
+ ceph-volume lvm zap {vg name/lv name}
+
+Zapping a partition::
+
+ ceph-volume lvm zap /dev/sdc1
+
+Removing Devices
+----------------
+When zapping, and looking for full removal of the device (lv, vg, or partition)
+use the ``--destroy`` flag. A common use case is to simply deploy OSDs using
+a whole raw device. If you do so and then wish to reuse that device for another
+OSD you must use the ``--destroy`` flag when zapping so that the vgs and lvs
+that ceph-volume created on the raw device will be removed.
+
+.. note:: Multiple devices can be accepted at once, to zap them all
+
+Zapping a raw device and destroying any vgs or lvs present::
+
+ ceph-volume lvm zap /dev/sdc --destroy
+
+
+This action can be performed on partitions, and logical volumes as well::
+
+ ceph-volume lvm zap /dev/sdc1 --destroy
+ ceph-volume lvm zap osd-vg/data-lv --destroy
+
+
+Finally, multiple devices can be detected if filtering by OSD ID and/or OSD
+FSID. Either identifier can be used or both can be used at the same time. This
+is useful in situations where multiple devices associated with a specific ID
+need to be purged. When using the FSID, the filtering is stricter, and might
+not match other (possibly invalid) devices associated to an ID.
+
+By ID only::
+
+ ceph-volume lvm zap --destroy --osd-id 1
+
+By FSID::
+
+ ceph-volume lvm zap --destroy --osd-fsid 2E8FBE58-0328-4E3B-BFB7-3CACE4E9A6CE
+
+By both::
+
+ ceph-volume lvm zap --destroy --osd-fsid 2E8FBE58-0328-4E3B-BFB7-3CACE4E9A6CE --osd-id 1
+
+
+.. warning:: If the systemd unit associated with the OSD ID to be zapped is
+ detected as running, the tool will refuse to zap until the daemon is stopped.