author    Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-07 18:45:59 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-07 18:45:59 +0000
commit    19fcec84d8d7d21e796c7624e521b60d28ee21ed
tree      42d26aa27d1e3f7c0b8bd3fd14e7d7082f5008dc /doc/dev/cephadm
parent    Initial commit.
Adding upstream version 16.2.11+ds.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/dev/cephadm')
-rw-r--r-- doc/dev/cephadm/cephadm-exporter.rst     306
-rw-r--r-- doc/dev/cephadm/compliance-check.rst     121
-rw-r--r-- doc/dev/cephadm/developing-cephadm.rst   263
-rw-r--r-- doc/dev/cephadm/host-maintenance.rst     104
-rw-r--r-- doc/dev/cephadm/index.rst                 15
-rw-r--r-- doc/dev/cephadm/scalability-notes.rst     95
6 files changed, 904 insertions, 0 deletions
diff --git a/doc/dev/cephadm/cephadm-exporter.rst b/doc/dev/cephadm/cephadm-exporter.rst
new file mode 100644
index 000000000..bc41fcaeb
--- /dev/null
+++ b/doc/dev/cephadm/cephadm-exporter.rst
@@ -0,0 +1,306 @@
+================
+cephadm Exporter
+================
+
+The cephadm 'binary' runs a number of long-running tasks that can take several seconds
+to complete. This latency represents a scalability challenge to the Ceph orchestrator management plane.
+
+To address this, cephadm needs to be able to run some of these longer-running tasks asynchronously. This
+frees up processing on the mgr by offloading tasks to each host, reduces latency and improves scalability.
+
+This document describes the implementation requirements and design for an 'exporter' feature.
+
+
+Requirements
+============
+The exporter should address these functional and non-functional requirements:
+
+* run as a normal systemd unit
+* utilise the same filesystem schema as other services deployed with cephadm
+* require only python3 standard library modules (no external dependencies)
+* use encryption to protect the data flowing from a host to Ceph mgr
+* execute data gathering tasks as background threads
+* be easily extended to include more data gathering tasks
+* monitor itself for the health of the data gathering threads
+* cache metadata to respond to queries quickly
+* respond to a metadata query in <30ms to support large Ceph clusters (thousands of nodes)
+* provide CLI interaction to enable the exporter to be deployed either at bootstrap time, or once the
+ cluster has been deployed.
+* be deployed as a normal orchestrator service (similar to the node-exporter)
+
+High Level Design
+=================
+
+This section will focus on the exporter logic **only**.
+
+.. code::
+
+   Establish a metadata cache object (tasks will be represented by separate attributes)
+   Create a thread for each data gathering task: host, ceph-volume and list_daemons
+       each thread updates its own attribute within the cache object
+   Start a server instance passing requests to a specific request handler
+       the request handler only interacts with the cache object
+       the request handler passes metadata back to the caller
+   Main Loop
+       Leave the loop if a 'stop' request is received
+       check thread health
+           if a thread that was active is now inactive
+               update the cache marking the task as inactive
+               update the cache with an error message for that task
+       wait for n secs
+
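The high-level design above can be sketched as a small, self-contained Python program. This is an illustration under assumptions, not the cephadm code: ``MetadataCache``, ``run_exporter`` and the task callables are hypothetical names, and in the real exporter each task would wrap an existing cephadm command (host facts, ceph-volume inventory, list_daemons).

```python
import threading
import time
from typing import Callable, Dict, Iterable


class MetadataCache:
    """Hypothetical metadata cache: one entry per data gathering task."""

    def __init__(self, tasks: Iterable[str]):
        self._lock = threading.Lock()
        self._data = {t: {"status": "active", "data": None, "error": None,
                          "last_updated": None} for t in tasks}

    def update(self, task: str, **fields) -> None:
        with self._lock:
            self._data[task].update(fields)

    def snapshot(self) -> Dict[str, dict]:
        # The request handler would only ever read from a copy like this.
        with self._lock:
            return {t: dict(v) for t, v in self._data.items()}


def run_exporter(tasks: Dict[str, Callable[[], object]], cache: MetadataCache,
                 stop: threading.Event, interval: float = 1.0) -> None:
    def worker(name: str, fn: Callable[[], object]) -> None:
        # Each thread updates only its own entry in the cache.
        while not stop.is_set():
            try:
                result = fn()
            except Exception as exc:
                cache.update(name, error=f"{type(exc).__name__}: {exc}")
                return  # the thread ends; the main loop will notice
            cache.update(name, data=result, last_updated=time.time())
            stop.wait(interval)

    threads = {name: threading.Thread(target=worker, args=(name, fn), daemon=True)
               for name, fn in tasks.items()}
    for t in threads.values():
        t.start()

    # Main loop: leave when a stop is requested, otherwise check thread health.
    while not stop.is_set():
        for name, t in threads.items():
            if t.is_alive():
                continue
            entry = cache.snapshot()[name]  # read only after the thread has exited
            if entry["status"] == "active":
                cache.update(name, status="inactive",
                             error=entry["error"] or f"{name} thread is no longer running")
        stop.wait(interval)
```

Because a request handler in this design would read only from ``snapshot()``, answering a query never waits on a cephadm command; that separation is what makes the sub-30ms response target plausible.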
+
+In the initial implementation, the exporter is exposed as a RESTful API.
+
+
+Security
+========
+
+The cephadm 'binary' only supports standard python3 features, so the RESTful API has been
+developed using the http.server module, which itself is not intended for production use. However, the implementation
+is not complex (based only on HTTPServer and BaseHTTPRequestHandler) and only supports the GET method, so the
+security risk is perceived as low.
+
+Current mgr-to-host interactions occur over an SSH connection, so the goal of the exporter is to adopt a similar
+security model.
+
+The initial REST API is implemented with the following features:
+
+* a generic self-signed or user-provided SSL crt/key to encrypt traffic between the mgr and the host
+* 'token' based authentication of the request
+
+All exporter instances will use the **same** crt/key to secure the link from the mgr to the host(s), in the same way
+that the ssh access uses the same public key and port for each host connection.
+
+.. note:: Since the same SSL configuration is used on every exporter, when you supply your own settings you must
+ ensure that the CN or SAN components of the distinguished name are either **not** used or created using wildcard naming.
+
+The crt, key and token files are all created with restrictive permissions (600) to help mitigate the risk of exposure
+to any other user on the Ceph cluster node(s).
+
+Administrator Interaction
+=========================
+Several new commands are required to configure the exporter, and additional parameters should be added to the bootstrap
+process to allow the exporter to be deployed automatically for new clusters.
+
+
+Enhancements to the 'bootstrap' process
+---------------------------------------
+Bootstrap should support additional parameters to automatically configure exporter daemons across hosts.
+
+``--with-exporter``
+
+By using this flag, you tell the bootstrap process to include the cephadm-exporter service within the
+cluster. If you do not provide a specific configuration (SSL, token, port) to use, defaults are applied.
+
+``--exporter-config``
+
+With the ``--exporter-config`` option, you may pass your own SSL, token and port information. The file must be in
+JSON format and contain the following fields: crt, key, token and port. The JSON content should be validated, and any
+errors detected should be reported back to the user during the argument parsing phase (before any changes are made).
+
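A sketch of the kind of validation ``--exporter-config`` could perform during argument parsing, assuming only the four fields listed above. ``validate_exporter_config`` is a hypothetical name, and the PEM marker checks and port range test are illustrative choices rather than the actual cephadm rules.

```python
import json
from typing import List

REQUIRED_FIELDS = ("crt", "key", "token", "port")


def validate_exporter_config(text: str) -> List[str]:
    """Return a list of error strings; an empty list means the config looks usable."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc}"]
    if not isinstance(cfg, dict):
        return ["config must be a JSON object"]

    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in cfg]
    # Cheap sanity checks only; a real implementation might parse the PEM data.
    if "crt" in cfg and "BEGIN CERTIFICATE" not in str(cfg["crt"]):
        errors.append("crt does not look like a PEM certificate")
    if "key" in cfg and "PRIVATE KEY" not in str(cfg["key"]):
        errors.append("key does not look like a PEM private key")
    if "token" in cfg and not str(cfg["token"]).strip():
        errors.append("token must not be empty")
    if "port" in cfg:
        try:
            port = int(cfg["port"])
        except (TypeError, ValueError):
            errors.append("port must be an integer")
        else:
            if not 1 <= port <= 65535:
                errors.append("port must be between 1 and 65535")
    return errors
```

Collecting every problem into a list, rather than stopping at the first one, lets the user fix the whole file in one pass before bootstrap makes any changes.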
+
+Additional ceph commands
+------------------------
+::
+
+    # ceph cephadm generate-exporter-config
+
+This command generates a default configuration consisting of a self-signed certificate, a randomly generated
+32-character token and the default port of 9443 for the REST API.
+
+::
+
+    # ceph cephadm set-exporter-config -i <config.json>
+
+Use a JSON file to define the crt, key, token and port for the REST API. The crt, key and token are validated by
+the mgr/cephadm module prior to storing the values in the KV store. Invalid or missing entries should be reported to the
+user.
+
+::
+
+    # ceph cephadm clear-exporter-config
+
+Clear the current configuration (removes the associated keys from the KV store).
+
+::
+
+    # ceph cephadm get-exporter-config
+
+Show the current exporter configuration, in JSON format.
+
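The default configuration described for ``generate-exporter-config`` can be sketched as follows. ``build_default_exporter_config`` is a hypothetical helper, and producing the self-signed certificate is deliberately left out (the crt/key are passed in), because how cephadm generates them is not shown here. The 32-character token comes from the standard library's ``secrets`` module.

```python
import json
import secrets

DEFAULT_EXPORTER_PORT = 9443


def build_default_exporter_config(crt: str, key: str) -> str:
    """Return exporter config JSON with a fresh token and the default port.

    crt/key are supplied by the caller; certificate generation is out of scope.
    """
    cfg = {
        "crt": crt,
        "key": key,
        "token": secrets.token_hex(16),  # 16 random bytes -> 32 hex characters
        "port": DEFAULT_EXPORTER_PORT,
    }
    return json.dumps(cfg, indent=2)
```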
+
+.. note:: If the service is already deployed, any attempt to change or clear the configuration will
+ be denied. In order to change settings you must remove the service, apply the required configuration
+ and re-apply the service (``ceph orch apply cephadm-exporter``).
+
+
+
+New Ceph Configuration Keys
+===========================
+The exporter configuration is persisted to the monitor's KV store, with the following keys:
+
+| mgr/cephadm/exporter_config
+| mgr/cephadm/exporter_enabled
+
+
+
+RESTful API
+===========
+The primary goal of the exporter is to provide metadata from the host to the mgr. This interaction takes
+place over a simple GET interface. Although only the GET method is supported, the API exposes multiple URLs that
+provide different views of the metadata that has been gathered.
+
+.. csv-table:: Supported URL endpoints
+ :header: "URL", "Purpose"
+
+ "/v1/metadata", "show all metadata including health of all threads"
+ "/v1/metadata/health", "only report on the health of the data gathering threads"
+ "/v1/metadata/disks", "show the disk output (ceph-volume inventory data)"
+ "/v1/metadata/host", "show host related metadata from the gather-facts command"
+ "/v1/metadata/daemons", "show the status of all ceph cluster related daemons on the host"
+
+Return Codes
+------------
+The following HTTP return codes are generated by the API:
+
+.. csv-table:: Supported HTTP Responses
+ :header: "Status Code", "Meaning"
+
+ "200", "OK"
+ "204", "the thread associated with this request is no longer active, no data is returned"
+ "206", "some threads have stopped, so some content is missing"
+ "401", "request is not authorised - check your token is correct"
+ "404", "URL is malformed, not found"
+ "500", "all threads have stopped - unable to provide any metadata for the host"
+
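The mapping from URL to status code in the two tables above can be expressed as a pure function. This is a sketch under assumptions: the cache layout and the ``build_response`` name are hypothetical, 401 handling is assumed to happen earlier (during token checking), and treating ``/v1/metadata/health`` as always answerable is an interpretation of the tables rather than documented behaviour.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from URL endpoint to cache task (None means "all tasks").
ENDPOINTS = {
    "/v1/metadata": None,
    "/v1/metadata/health": "health",
    "/v1/metadata/disks": "disks",
    "/v1/metadata/host": "host",
    "/v1/metadata/daemons": "daemons",
}


def build_response(path: str, cache: Dict[str, dict]) -> Tuple[int, Optional[dict]]:
    """Pick the HTTP status and body for an (already authenticated) GET request.

    cache maps task name -> {"status": "active" | "inactive", "data": ...}.
    """
    if path not in ENDPOINTS:
        return 404, None
    task = ENDPOINTS[path]
    if task == "health":
        # Thread health is reportable even when some threads have stopped.
        return 200, {name: entry["status"] for name, entry in cache.items()}
    active = [name for name, entry in cache.items() if entry["status"] == "active"]
    if task is not None:
        # A single-task endpoint returns 204 once its thread is no longer active.
        if task not in active:
            return 204, None
        return 200, cache[task]["data"]
    if not active:
        return 500, None
    return (200 if len(active) == len(cache) else 206), dict(cache)
```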
+
+Deployment
+==========
+During the initial phases of the exporter implementation, deployment is regarded as optional but is available
+to new clusters and existing clusters that have the feature (Pacific and above).
+
+* new clusters : use the ``--with-exporter`` option
+* existing clusters : you'll need to set the configuration and deploy the service manually
+
+.. code::
+
+ # ceph cephadm generate-exporter-config
+ # ceph orch apply cephadm-exporter
+
+If you choose to remove the cephadm-exporter service, you can simply run:
+
+.. code::
+
+ # ceph orch rm cephadm-exporter
+
+This will remove the daemons and the exporter-related settings stored in the KV store.
+
+
+Management
+==========
+Once the exporter is deployed, you can use the following snippet to extract the host's metadata.
+
+.. code-block:: python
+
+    import json
+    import ssl
+    import sys
+    import time
+    from urllib.error import HTTPError
+    from urllib.request import Request, urlopen
+
+    # CHANGE THIS to the host running the exporter
+    hostname = "rh8-1.storage.lab"
+
+    print("Reading config.json")
+    try:
+        with open('./config.json', 'r') as f:
+            cfg = json.load(f)
+    except FileNotFoundError:
+        print("You must first create a config.json file using the "
+              "'ceph cephadm get-exporter-config' command")
+        sys.exit(1)
+
+    # Trust the exporter's certificate directly from the PEM text in the config.
+    # Hostname checking is disabled because every exporter shares the same crt.
+    ctx = ssl.create_default_context(cadata=cfg['crt'])
+    ctx.check_hostname = False
+
+    hdrs = {"Authorization": f"Bearer {cfg['token']}"}
+    port = cfg.get('port', 9443)
+    req = Request(f"https://{hostname}:{port}/v1/metadata", headers=hdrs)
+
+    print("Issuing call to gather metadata")
+    s_time = time.time()
+    try:
+        with urlopen(req, context=ctx) as r:
+            print(r.status)
+            # 200: all data; 206: some threads have stopped; 204 has no body
+            if r.status in (200, 206):
+                js = json.loads(r.read().decode())
+                print(json.dumps(js, indent=2))
+    except HTTPError as e:
+        # urlopen raises HTTPError for 401, 404 and 500 responses
+        print(f"Request failed: {e.code} {e.reason}")
+        sys.exit(1)
+    print("call complete")
+    elapsed = time.time() - s_time
+    print(f"Elapsed secs : {elapsed}")
+
+
+.. note:: The above example uses python3 and assumes that you have saved the output of ``ceph cephadm get-exporter-config`` to ``config.json``.
+
+
+Implementation Specific Details
+===============================
+
+In the same way as a typical container-based deployment, the exporter is deployed to a directory under ``/var/lib/ceph/<fsid>``. The
+cephadm binary is stored in this cluster folder, and the daemon's configuration and systemd settings are stored
+under ``/var/lib/ceph/<fsid>/cephadm-exporter.<id>/``.
+
+.. code::
+
+ [root@rh8-1 cephadm-exporter.rh8-1]# pwd
+ /var/lib/ceph/cb576f70-2f72-11eb-b141-525400da3eb7/cephadm-exporter.rh8-1
+ [root@rh8-1 cephadm-exporter.rh8-1]# ls -al
+ total 24
+ drwx------. 2 root root 100 Nov 25 18:10 .
+ drwx------. 8 root root 160 Nov 25 23:19 ..
+ -rw-------. 1 root root 1046 Nov 25 18:10 crt
+ -rw-------. 1 root root 1704 Nov 25 18:10 key
+ -rw-------. 1 root root 64 Nov 25 18:10 token
+ -rw-------. 1 root root 38 Nov 25 18:10 unit.configured
+ -rw-------. 1 root root 48 Nov 25 18:10 unit.created
+ -rw-r--r--. 1 root root 157 Nov 25 18:10 unit.run
+
+
+In order to respond to requests quickly, the CephadmDaemon uses a cache object (CephadmCache) to hold the results
+of the cephadm commands.
+
+The exporter doesn't introduce any new data gathering capability - instead it merely calls the existing cephadm commands.
+
+The CephadmDaemon class creates a local HTTP server (using ThreadingMixIn), secured with TLS, and uses the CephadmDaemonHandler
+to handle the requests. The request handler inspects the request header and looks for a valid Bearer token; if this is invalid
+or missing, the caller receives a 401 Unauthorized error.
+
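The token check described above can be sketched with the same standard-library pieces (``HTTPServer``, ``ThreadingMixIn``, ``BaseHTTPRequestHandler``). This is a hypothetical, plain-HTTP illustration: ``TOKEN``, ``ExporterServer`` and ``TokenCheckingHandler`` are invented names, the response bodies are placeholders, and TLS is omitted to keep the example short.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

# Hypothetical token; the real daemon reads it from its 'token' file.
TOKEN = "example-token"


class ExporterServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True  # handler threads don't block interpreter exit


class TokenCheckingHandler(BaseHTTPRequestHandler):
    def _send_json(self, code: int, body: dict) -> None:
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        supplied = self.headers.get("Authorization", "")
        expected = f"Bearer {TOKEN}"
        # compare_digest avoids leaking token contents through timing differences
        if not hmac.compare_digest(supplied.encode(), expected.encode()):
            self._send_json(401, {"message": "Unauthorized"})
            return
        if self.path != "/v1/metadata":
            self._send_json(404, {"message": "Not found"})
            return
        # A real handler would serialise the cache contents here.
        self._send_json(200, {"host": {"status": "active"}})

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this sketch
```

To add TLS, one option is to build an ``ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)``, call ``load_cert_chain()`` with the crt and key files, and replace the listening socket with ``ctx.wrap_socket(srv.socket, server_side=True)``.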
+The 'run' method of the CephadmDaemon class places the scrape_* methods into different threads, with each thread supporting
+a different refresh interval. Each thread then periodically issues its cephadm command and places the output
+in the cache object.
+
+In addition to the command output, each thread also maintains its own timestamp record in the cache, so the caller can
+easily determine the age of the data it has received.
+
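On the caller's side, those per-thread timestamps make staleness checks straightforward. A minimal sketch, assuming each task entry carries a hypothetical ``scrape_timestamp`` field holding a Unix epoch value (the real field name and format are not specified here):

```python
import time
from typing import Dict, List, Optional


def stale_tasks(metadata: Dict[str, dict], max_age: float,
                now: Optional[float] = None) -> List[str]:
    """Return the tasks whose data is older than max_age seconds (or never scraped).

    Assumes a hypothetical 'scrape_timestamp' (epoch seconds) in each task entry.
    """
    now = time.time() if now is None else now
    stale = []
    for task, entry in sorted(metadata.items()):
        ts = entry.get("scrape_timestamp")
        if ts is None or now - ts > max_age:
            stale.append(task)
    return stale
```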
+If the underlying cephadm command execution hits an exception, the thread passes control to a _handle_thread_exception method.
+Here the exception is logged to the daemon's log file and the exception details are added to the cache, providing visibility
+of the problem to the caller.
+
+Although each thread is effectively given its own URL endpoint (host, disks, daemons), the recommended way to gather data from
+the host is to simply use the ``/v1/metadata`` endpoint. This will provide all of the data, and indicate whether any of the
+threads have failed.
+
+The run method uses the ``signal`` module to establish a reload hook, but in the initial implementation this doesn't take any action and simply
+logs that a reload was received.
+
+
+Future Work
+===========
+
+#. Consider the potential of adding a restart policy for threads
+#. Once the exporter is fully integrated into mgr/cephadm, the goal would be to make the exporter the
+ default means of data gathering. However, until then the exporter will remain as an opt-in 'feature
+ preview'.
diff --git a/doc/dev/cephadm/compliance-check.rst b/doc/dev/cephadm/compliance-check.rst
new file mode 100644
index 000000000..eea462445
--- /dev/null
+++ b/doc/dev/cephadm/compliance-check.rst
@@ -0,0 +1,121 @@
+================
+Compliance Check
+================
+
+The stability and reliability of a Ceph cluster depend not just upon the Ceph daemons, but
+also upon the OS and hardware that Ceph is installed on. This document is intended to promote a design
+discussion for providing a "compliance" feature within mgr/cephadm, which would be responsible for
+identifying common platform-related issues that could impact Ceph stability and operation.
+
+The ultimate goal of these checks is to identify issues early and raise a healthcheck WARN
+event, to alert the Administrator to the issue.
+
+Prerequisites
+=============
+In order to effectively analyse the hosts that Ceph is deployed to, this feature requires a cache
+of host-related metadata. The metadata is already available from cephadm's HostFacts class and the
+``gather-facts`` cephadm command. For the purposes of this document, we will assume that this
+data is available within the mgr/cephadm "cache" structure.
+
+Some checks will also require that the host status is populated (e.g. ONLINE, OFFLINE, MAINTENANCE).
+
+Administrator Interaction
+=========================
+Not all users will require this feature, so they must be able to opt out. For this reason,
+mgr/cephadm must provide controls such as the following:
+
+.. code-block::
+
+ ceph cephadm compliance enable | disable | status [--format json]
+ ceph cephadm compliance ls [--format json]
+ ceph cephadm compliance enable-check <name>
+ ceph cephadm compliance disable-check <name>
+ ceph cephadm compliance set-check-interval <int>
+ ceph cephadm compliance get-check-interval
+
+The status option would show the enabled/disabled state of the feature, along with the
+check-interval.
+
+The ``ls`` subcommand would show all checks in the following format:
+
+``check-name status description``
+
+Proposed Integration
+====================
+The compliance checks are not required to run all the time, but instead should run at discrete
+intervals. The interval would be configurable via the :code:`set-check-interval`
+subcommand (the default would be every 12 hours).
+
+
+mgr/cephadm currently executes an event-driven (time-based) serve loop to act on deploy/remove and
+reconcile activity. In order to execute the compliance checks, the compliance check code would be
+called from this main serve loop whenever the configured check interval has elapsed.
+
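Calling the checks from the serve loop only when the interval has elapsed could look like the following sketch. ``ComplianceScheduler`` is a hypothetical helper that the serve loop would call on every iteration; the 12-hour default comes from the text above, while everything else is an assumption.

```python
import time
from typing import Callable, Optional

DEFAULT_CHECK_INTERVAL = 12 * 60 * 60  # 12 hours, in seconds


class ComplianceScheduler:
    """Hypothetical helper called once per serve-loop iteration."""

    def __init__(self, interval: int = DEFAULT_CHECK_INTERVAL):
        self.interval = interval
        self.last_run: Optional[float] = None

    def due(self, now: float) -> bool:
        return self.last_run is None or now - self.last_run >= self.interval

    def maybe_run(self, run_checks: Callable[[], None],
                  now: Optional[float] = None) -> bool:
        """Run the checks if the interval has elapsed; return True if they ran."""
        now = time.time() if now is None else now
        if not self.due(now):
            return False
        run_checks()
        self.last_run = now
        return True
```

Keeping the timing decision in a small object like this leaves the serve loop itself unchanged apart from one call, and makes ``set-check-interval`` a matter of updating ``interval``.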
+
+Proposed Checks
+===============
+All checks would push any errors to a list, so multiple issues can be escalated to the Admin at
+the same time. The list below provides a description of each check, with the text in parentheses
+after each name giving its shortname *(the shortname is used to refer to the check when enabling
+or disabling it)*.
+
+
+OS Consistency (OS)
+___________________
+* all hosts must use the same vendor
+* all hosts must be on the same major release (this check would only be applicable to distributions that
+ offer a long-term-support strategy: RHEL, CentOS, SLES, Ubuntu etc.)
+
+
+*src: gather-facts output*
+
+Linux Security Module (LSM)
+___________________________
+* All hosts should have a consistent SELinux/AppArmor configuration
+
+*src: gather-facts output*
+
+Services Check (SERVICES)
+_________________________
+Hosts that are in an ONLINE state should adhere to the following:
+
+* all daemons (systemd units) should be enabled
+* all daemons should be running (not dead)
+
+*src: list_daemons output*
+
+Support Status (SUPPORT)
+________________________
+If support status has been detected, it should be consistent across all hosts. At this point
+support status is available only for Red Hat machines.
+
+*src: gather-facts output*
+
+Network : MTU (MTU)
+________________________________
+All network interfaces on the same Ceph network (public/cluster) should have the same MTU.
+
+*src: gather-facts output*
+
+Network : LinkSpeed (LINKSPEED)
+____________________________________________
+All network interfaces on the same Ceph network (public/cluster) should have the same link speed.
+
+*src: gather-facts output*
+
+Network : Consistency (INTERFACE)
+______________________________________________
+All hosts with OSDs should have a consistent network configuration, e.g. if some hosts do
+not separate cluster/public traffic but others do, that is an anomaly that would generate a
+compliance check warning.
+
+*src: gather-facts output*
+
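The "push errors to a list" approach can be sketched for two of the checks above (OS and MTU). The host facts layout here (``vendor``, ``os_version``, ``mtu_by_network``) is invented for the example and is much simpler than the real ``gather-facts`` output; the check functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical, simplified host facts: hostname -> facts dict.
HostFacts = Dict[str, dict]


def check_os(facts: HostFacts, errors: List[str]) -> None:
    vendors = {f["vendor"] for f in facts.values()}
    if len(vendors) > 1:
        errors.append(f"OS: hosts use different vendors: {sorted(vendors)}")
    majors = {f["os_version"].split(".")[0] for f in facts.values()}
    if len(majors) > 1:
        errors.append(f"OS: hosts are on different major releases: {sorted(majors)}")


def check_mtu(facts: HostFacts, errors: List[str]) -> None:
    mtus = defaultdict(set)  # network -> set of MTUs seen on that network
    for f in facts.values():
        for network, mtu in f["mtu_by_network"].items():
            mtus[network].add(mtu)
    for network, seen in sorted(mtus.items()):
        if len(seen) > 1:
            errors.append(f"MTU: inconsistent MTU on {network}: {sorted(seen)}")


def run_checks(facts: HostFacts, enabled: Iterable[str] = ("OS", "MTU")) -> List[str]:
    checks = {"OS": check_os, "MTU": check_mtu}
    errors: List[str] = []
    for name in enabled:
        checks[name](facts, errors)  # every check appends to the same list
    return errors
```

A non-empty result from ``run_checks`` is what would be escalated as a single WARN health event listing every issue found.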
+Notification Strategy
+=====================
+If any of the checks fail, mgr/cephadm would raise a WARN level alert.
+
+Futures
+=======
+The checks highlighted here serve only as a starting point, and we should expect to expand
+on the checks over time.
diff --git a/doc/dev/cephadm/developing-cephadm.rst b/doc/dev/cephadm/developing-cephadm.rst
new file mode 100644
index 000000000..d9f81c2c0
--- /dev/null
+++ b/doc/dev/cephadm/developing-cephadm.rst
@@ -0,0 +1,263 @@
+=======================
+Developing with cephadm
+=======================
+
+There are several ways to develop with cephadm. Which you use depends
+on what you're trying to accomplish.
+
+vstart --cephadm
+================
+
+- Start a cluster with vstart, with cephadm configured
+- Manage any additional daemons with cephadm
+- Requires compiled ceph binaries
+
+In this case, the mon and manager at a minimum are running in the usual
+vstart way, not managed by cephadm. But cephadm is enabled and the local
+host is added, so you can deploy additional daemons or add additional hosts.
+
+This works well for developing cephadm itself, because any mgr/cephadm
+or cephadm/cephadm code changes can be applied by kicking ceph-mgr
+with ``ceph mgr fail x``. (When the mgr (re)starts, it loads the
+cephadm/cephadm script into memory.)
+
+::
+
+ MON=1 MGR=1 OSD=0 MDS=0 ../src/vstart.sh -d -n -x --cephadm
+
+- ``~/.ssh/id_dsa[.pub]`` is used as the cluster key. It is assumed that
+ this key is authorized to ssh with no passphrase to root@`hostname`.
+- cephadm does not try to manage any daemons started by vstart.sh (any
+ nonzero number in the environment variables). No service spec is defined
+ for mon or mgr.
+- You'll see health warnings from cephadm about stray daemons; that's because
+ the vstart-launched daemons aren't controlled by cephadm.
+- The default image is ``quay.io/ceph-ci/ceph:master``, but you can change
+ this by passing ``-o container_image=...`` or ``ceph config set global container_image ...``.
+
+
+cstart and cpatch
+=================
+
+The ``cstart.sh`` script will launch a cluster using cephadm and put the
+conf and keyring in your build dir, so that the ``bin/ceph ...`` CLI works
+(just like with vstart). The ``ckill.sh`` script will tear it down.
+
+- A unique but stable fsid is stored in ``fsid`` (in the build dir).
+- The mon port is random, just like with vstart.
+- The container image is ``quay.io/ceph-ci/ceph:$tag`` where $tag is
+ the first 8 chars of the fsid.
+- If the container image doesn't exist yet when you run cstart for the
+ first time, it is built with cpatch.
+
+There are a few advantages here:
+
+- The cluster is a "normal" cephadm cluster that looks and behaves
+ just like a user's cluster would. In contrast, vstart and teuthology
+ clusters tend to be special in subtle (and not-so-subtle) ways (e.g.
+ having the ``lockdep`` turned on).
+
+To start a test cluster::
+
+ sudo ../src/cstart.sh
+
+The last line of the output will be a line you can cut+paste to update
+the container image. For instance::
+
+ sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e
+
+By default, cpatch will patch everything it can think of from the local
+build dir into the container image. If you are working on a specific
+part of the system, though, you can get away with smaller changes so that
+cpatch runs faster. For instance::
+
+ sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --py
+
+will update the mgr modules (minus the dashboard). Or::
+
+ sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --core
+
+will do most binaries and libraries. Pass ``-h`` to cpatch for all options.
+
+Once the container is updated, you can refresh/restart daemons by bouncing
+them with::
+
+ sudo systemctl restart ceph-`cat fsid`.target
+
+When you're done, you can tear down the cluster with::
+
+ sudo ../src/ckill.sh # or,
+ sudo ../src/cephadm/cephadm rm-cluster --force --fsid `cat fsid`
+
+cephadm bootstrap --shared_ceph_folder
+======================================
+
+Cephadm can also be used directly without compiled ceph binaries.
+
+Run cephadm like so::
+
+ sudo ./cephadm bootstrap --mon-ip 127.0.0.1 \
+ --ssh-private-key /home/<user>/.ssh/id_rsa \
+ --skip-mon-network \
+ --skip-monitoring-stack --single-host-defaults \
+ --skip-dashboard \
+ --shared_ceph_folder /home/<user>/path/to/ceph/
+
+- ``~/.ssh/id_rsa`` is used as the cluster key. It is assumed that
+ this key is authorized to ssh with no passphrase to root@`hostname`.
+
+Source code changes made in the ``pybind/mgr/`` directory then
+require a daemon restart to take effect.
+
+Note regarding network calls from CLI handlers
+==============================================
+
+Executing any cephadm CLI command, such as ``ceph orch ls``, will block the
+mon command handler thread within the MGR, thus preventing any concurrent
+CLI calls. Note that pressing ``^C`` will not resolve this situation,
+as *only* the client will be aborted, not the execution of the command
+within the orchestrator manager module itself. This means that cephadm will
+be completely unresponsive until the execution of the CLI handler is
+fully completed. Note that even ``ceph orch ps`` will not respond while
+another handler is executing.
+
+This means we should make very few synchronous calls to remote hosts.
+As a guideline, cephadm should make at most ``O(1)`` network calls in CLI handlers.
+Everything else should be done asynchronously in other threads, like ``serve()``.
+
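The guideline above amounts to "CLI handlers only record intent; a background thread does the remote work". A minimal, hypothetical sketch of that split (``Orchestrator``, ``handle_cli_redeploy`` and ``remote_call`` are invented names, not mgr/cephadm APIs; ``remote_call`` stands in for an ssh round-trip):

```python
import queue
import threading
from typing import Callable, List


class Orchestrator:
    """Hypothetical illustration: CLI handlers enqueue, serve() does remote work."""

    def __init__(self, remote_call: Callable[[str], str]):
        self._remote_call = remote_call
        self._todo: "queue.Queue[str]" = queue.Queue()
        self.done: List[str] = []
        threading.Thread(target=self.serve, daemon=True).start()

    def handle_cli_redeploy(self, hosts: List[str]) -> str:
        # No network I/O here, so the mon command handler thread returns at once.
        for host in hosts:
            self._todo.put(host)
        return f"Scheduled redeploy on {len(hosts)} host(s)"

    def serve(self) -> None:
        # Background loop: the only place that talks to remote hosts.
        while True:
            host = self._todo.get()
            self.done.append(self._remote_call(host))
            self._todo.task_done()

    def wait_idle(self) -> None:
        self._todo.join()
```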
+Note regarding different variables used in the code
+===================================================
+
+* a ``service_type`` is something like mon, mgr, alertmanager etc., as defined
+ in ``ServiceSpec``
+* a ``service_id`` is the name of the service. Some services don't have
+ names.
+* a ``service_name`` is ``<service_type>.<service_id>`` (or just ``<service_type>`` for services without a name)
+* a ``daemon_type`` is the same as the service_type, except for ingress,
+ which has the haproxy and keepalived daemon types.
+* a ``daemon_id`` is typically ``<service_id>.<hostname>.<random-string>``.
+ (This is not the case for OSDs, for example: OSDs are always called ``osd.N``.)
+* a ``daemon_name`` is ``<daemon_type>.<daemon_id>``
+
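The naming rules above can be sketched as a few helpers. These are hypothetical functions, not the mgr/cephadm implementation; in particular, the 6-letter random suffix and the ``<hostname>.<random-string>`` form for unnamed services are assumptions made for the example.

```python
import random
import string
from typing import Optional


def service_name(service_type: str, service_id: Optional[str] = None) -> str:
    # Services without a name (e.g. mon, mgr) are identified by their type alone.
    return f"{service_type}.{service_id}" if service_id else service_type


def make_daemon_id(service_id: Optional[str], hostname: str, length: int = 6) -> str:
    # <service_id>.<hostname>.<random-string>; suffix length is an assumption.
    suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    parts = [service_id, hostname, suffix] if service_id else [hostname, suffix]
    return ".".join(parts)


def daemon_name(daemon_type: str, daemon_id: str) -> str:
    return f"{daemon_type}.{daemon_id}"
```

For example, ``daemon_name("rgw", make_daemon_id("myrealm", "host1"))`` yields something like ``rgw.myrealm.host1.<random-string>``, while an OSD is simply ``daemon_name("osd", "3")``, i.e. ``osd.3``.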
+Kcli: a virtualization management tool to make orchestrator development easier
+==============================================================================
+`Kcli <https://github.com/karmab/kcli>`_ is meant to interact with existing
+virtualization providers (libvirt, KubeVirt, oVirt, OpenStack, VMware vSphere,
+GCP and AWS) and to easily deploy and customize VMs from cloud images.
+
+It allows you to set up an environment with several VMs with your preferred
+configuration (memory, CPUs, disks) and OS flavor.
+
+Main advantages:
+----------------
+ - It is fast. Typically you can have a completely new Ceph cluster ready to debug
+ and develop orchestrator features in less than 5 minutes.
+ - It is a "near production" lab. The lab created with kcli is very close to "real"
+ clusters in QE labs or even in production, so it is easy to test "real things" in
+ an almost "real environment".
+ - It is safe and isolated. It does not depend on the things you have installed on your
+ machine, and the VMs are isolated from your environment.
+ - It is an easy "dev" environment for non-compiled software components,
+ for example any mgr module. It is an environment that allows you to test your
+ changes interactively.
+
+Installation:
+-------------
+Complete documentation is available in `kcli installation <https://kcli.readthedocs.io/en/latest/#installation>`_,
+but we strongly suggest using the container image approach.
+
+So, things to do:
+ - 1. Review the `requirements <https://kcli.readthedocs.io/en/latest/#libvirt-hypervisor-requisites>`_
+ and install/configure whatever you need to meet them.
+ - 2. Get the kcli image and create an alias for executing the kcli command
+ ::
+
+ # podman pull quay.io/karmab/kcli
+ # alias kcli='podman run --net host -it --rm --security-opt label=disable -v $HOME/.ssh:/root/.ssh -v $HOME/.kcli:/root/.kcli -v /var/lib/libvirt/images:/var/lib/libvirt/images -v /var/run/libvirt:/var/run/libvirt -v $PWD:/workdir -v /var/tmp:/ignitiondir quay.io/karmab/kcli'
+
+.. note:: ``/var/lib/libvirt/images`` can be customized; be sure that you are
+ using this folder for your OS images.
+
+.. note:: Once you have used your kcli tool to create and use different labs, we
+ suggest that you "save" and use your own kcli image.
+ Why? kcli is actively developed and changes often (and for the moment only one tag
+ exists: latest). Because the current functionality is more than enough for us, and
+ what we want is overall stability,
+ we suggest storing the kcli image you are using in a safe place and updating
+ your kcli alias to use your own image.
+
+Test your kcli installation:
+----------------------------
+See the kcli `basic usage workflow <https://kcli.readthedocs.io/en/latest/#basic-workflow>`_.
+
+Create a Ceph lab cluster
+-------------------------
+To make this task easy, we are going to use a kcli plan.
+
+A kcli plan is a file where you can define the different settings you want to
+have in a set of VMs.
+You can define hardware parameters (cpu, memory, disks ..), operating system and
+it also allows you to automate the installation and configuration of any
+software you want to have.
+
+There is a `repository <https://github.com/karmab/kcli-plans>`_ with a collection of
+plans that can be used for different purposes. It includes predefined plans to
+install Ceph clusters using ceph-ansible or cephadm. Let's create our first Ceph
+cluster using cephadm::
+
+ # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml
+
+This will create a set of three VMs using the plan file pointed to by the URL.
+After a few minutes (depending on your laptop's power), let's examine the cluster:
+
+* Take a look at the VMs created::
+
+ # kcli list vms
+
+* Log in to the bootstrap node::
+
+ # kcli ssh ceph-node-00
+
+* Take a look at the installed Ceph cluster::
+
+ [centos@ceph-node-00 ~]$ sudo -i
+ [root@ceph-node-00 ~]# cephadm version
+ [root@ceph-node-00 ~]# cephadm shell
+ [ceph: root@ceph-node-00 /]# ceph orch host ls
+
+Create a Ceph cluster for developing mgr modules (Orchestrators and Dashboard)
+------------------------------------------------------------------------------
+The cephadm kcli plan (and cephadm) are prepared to do that.
+
+The idea behind this method is to replace several python mgr folders in each of
+the ceph daemons with the source code folders in your host machine.
+This "trick" will allow you to make changes in any orchestrator or dashboard
+module and test them immediately (you only need to disable/enable the mgr module).
+
+So in order to create a ceph cluster for development purposes you must use the
+same cephadm plan but with a new parameter pointing to your Ceph source code folder::
+
+ # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml -P ceph_dev_folder=/home/mycodefolder/ceph
+
+Ceph Dashboard development
+--------------------------
+The Ceph Dashboard module will not be loaded unless you have previously
+generated the frontend bundle.
+
+For now, in order to load the Ceph Dashboard module properly and to apply frontend
+changes, you have to run "ng build" on your laptop::
+
+ # Start local frontend build with watcher (in background):
+ sudo dnf install -y nodejs
+ cd <path-to-your-ceph-repo>
+ cd src/pybind/mgr/dashboard/frontend
+ sudo chown -R <your-user>:root dist node_modules
+ NG_CLI_ANALYTICS=false npm ci
+ npm run build -- --deleteOutputPath=false --watch &
+
+After saving your changes, the frontend bundle will be built again.
+When completed, you'll see::
+
+ "Localized bundle generation complete."
+
+Then you can reload your Dashboard browser tab.
diff --git a/doc/dev/cephadm/host-maintenance.rst b/doc/dev/cephadm/host-maintenance.rst
new file mode 100644
index 000000000..2b84ec7bd
--- /dev/null
+++ b/doc/dev/cephadm/host-maintenance.rst
@@ -0,0 +1,104 @@
+================
+Host Maintenance
+================
+
+All hosts that support Ceph daemons need to support maintenance activity, whether the host
+is physical or virtual. This means that management workflows should provide
+a simple and consistent way to support this operational requirement. This document defines
+the maintenance strategy that could be implemented in cephadm and mgr/cephadm.
+
+
+High Level Design
+=================
+Placing a host into maintenance follows this workflow:
+
+#. Confirm that the removal of the host does not impact data availability (the following
+ steps will assume it is safe to proceed)
+
+ * ``orch host ok-to-stop <host>`` would be used here
+
+#. If the host has OSD daemons, apply noout to the host subtree to prevent data migration
+ from triggering during the planned maintenance slot.
+#. Stop the ceph target (all daemons stop).
+#. Disable the ceph target on that host, to prevent a reboot from automatically starting
+ ceph services again.
+
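The ordering of the workflow, and its reversal on exit, can be sketched as functions that build the command sequence without running it. The helpers are hypothetical; ``ceph osd set-group noout <host>`` / ``unset-group`` is assumed here as one way to apply noout to a host subtree, and the ``ceph-<fsid>.target`` unit name follows the convention used elsewhere in these docs.

```python
from typing import List


def enter_maintenance_commands(host: str, fsid: str, has_osds: bool) -> List[str]:
    """Return the commands for entering maintenance, in workflow order."""
    cmds = [f"ceph orch host ok-to-stop {host}"]          # 1. data availability check
    if has_osds:
        cmds.append(f"ceph osd set-group noout {host}")   # 2. noout on the host subtree
    cmds.append(f"systemctl stop ceph-{fsid}.target")     # 3. stop all daemons (on the host)
    cmds.append(f"systemctl disable ceph-{fsid}.target")  # 4. stay stopped across reboots
    return cmds


def exit_maintenance_commands(host: str, fsid: str, has_osds: bool) -> List[str]:
    """Exiting reverses the sequence: re-enable and start the target, then drop noout."""
    cmds = [f"systemctl enable ceph-{fsid}.target",
            f"systemctl start ceph-{fsid}.target"]
    if has_osds:
        cmds.append(f"ceph osd unset-group noout {host}")
    return cmds
```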
+
+Exiting maintenance is essentially the reverse of the above sequence.
+
+Admin Interaction
+=================
+The ceph orch command will be extended to support maintenance.
+
+.. code-block::
+
+ ceph orch host maintenance enter <host> [ --force ]
+ ceph orch host maintenance exit <host>
+
+.. note:: In addition, the host's status should be updated to reflect whether it
+ is in maintenance or not.
+
+The 'check' Option
+__________________
+The ``orch host ok-to-stop`` command focuses on ceph daemons (mon, osd, mds), which
+provides the first check. However, a ceph cluster also uses other types of daemons
+for monitoring, management and non-native protocol support which means the
+logic will need to consider service impact too. The 'check' option provides
+this additional layer to alert the user of service impact to *secondary*
+daemons.
+
+The list below shows some of these additional daemons.
+
+* mgr (not included in ok-to-stop checks)
+* prometheus, grafana, alertmanager
+* rgw
+* haproxy
+* iSCSI gateways
+* NFS-Ganesha gateways
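+
+As an illustration, the secondary daemons on a host could be picked out by
+daemon type. This sketch assumes daemon names of the form ``<type>.<id>``
+and uses hypothetical type names for the services in the list above; it is
+not the orchestrator's actual classification.
+
+.. code-block:: python
+
+   from typing import Dict, List
+
+   # Hypothetical type names for the secondary daemons listed above.
+   SECONDARY_TYPES = {
+       "mgr", "prometheus", "grafana", "alertmanager",
+       "rgw", "haproxy", "iscsi", "nfs",
+   }
+
+
+   def secondary_impact(daemons: List[str]) -> Dict[str, List[str]]:
+       """Group a host's daemon names by secondary service type."""
+       impact: Dict[str, List[str]] = {}
+       for name in daemons:
+           daemon_type = name.split(".", 1)[0]  # assumes "<type>.<id>" naming
+           if daemon_type in SECONDARY_TYPES:
+               impact.setdefault(daemon_type, []).append(name)
+       return impact
+
+For example, ``secondary_impact(["osd.1", "mgr.node1.abc", "mon.node1"])``
+returns ``{"mgr": ["mgr.node1.abc"]}``; the OSD and mon are left to the
+ok-to-stop check.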
+
+By running the ``--check`` option first, the administrator can decide whether to
+proceed. This step is optional for CLI users, but could be integrated into the
+UI workflow to help less experienced administrators manage the cluster.
+
+With this two-phase approach, a UI-based workflow could look something
+like this:
+
+#. User selects a host to place into maintenance
+
+ * orchestrator checks for data **and** service impact
+#. If potential impact is shown, the next steps depend on the impact type
+
+ * **data availability** : maintenance is denied, informing the user of the issue
+ * **service availability** : user is provided a list of affected services and
+ asked to confirm
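+
+The decision in the second step can be sketched as a small function. This is
+a hypothetical helper that assumes the data availability result and the list
+of affected services have already been gathered:
+
+.. code-block:: python
+
+   from enum import Enum
+   from typing import List
+
+
+   class Decision(Enum):
+       DENY = "deny"        # data availability is at risk
+       CONFIRM = "confirm"  # only secondary services are affected
+       PROCEED = "proceed"  # no impact detected
+
+
+   def maintenance_decision(data_ok: bool, affected_services: List[str]) -> Decision:
+       # Hypothetical helper; inputs come from the data and service checks.
+       # Data availability always takes precedence over service impact.
+       if not data_ok:
+           return Decision.DENY
+       if affected_services:
+           return Decision.CONFIRM
+       return Decision.PROCEED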
+
+
+Components Impacted
+===================
+Implementing this capability will require changes to the following:
+
+* cephadm
+
+ * Add a ``maintenance`` subcommand with the following verbs: enter, exit, check
+
+* mgr/cephadm
+
+ * add methods to CephadmOrchestrator for enter/exit and check
+ * data gathering would be skipped for hosts in a maintenance state
+
+* mgr/orchestrator
+
+ * add CLI commands to OrchestratorCli which expose the enter/exit and check interaction
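+
+A minimal sketch of the two ends of this change, assuming ``argparse`` for the
+host-side subcommand (the ``--fsid`` argument is an assumption) and a plain
+dict of host statuses on the mgr side. Neither function reflects the actual
+cephadm or mgr/cephadm code.
+
+.. code-block:: python
+
+   import argparse
+   from typing import Dict, List, Optional
+
+
+   def build_parser() -> argparse.ArgumentParser:
+       """Host side: a 'maintenance' subcommand with enter/exit/check verbs."""
+       parser = argparse.ArgumentParser(prog="cephadm")
+       subparsers = parser.add_subparsers(dest="command", required=True)
+       maint = subparsers.add_parser("maintenance", help="manage host maintenance")
+       maint.add_argument("verb", choices=["enter", "exit", "check"])
+       # Assumed: the host needs the cluster fsid to locate its daemons.
+       maint.add_argument("--fsid", required=True, help="cluster fsid")
+       return parser
+
+
+   def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
+       return build_parser().parse_args(argv)
+
+
+   def hosts_to_refresh(statuses: Dict[str, str]) -> List[str]:
+       """mgr side: skip data gathering for hosts in maintenance."""
+       return sorted(h for h, s in statuses.items() if s != "maintenance")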
+
+
+Ideas for Future Work
+=====================
+#. When a host is placed into maintenance, the time of the event could be persisted. This
+ would allow the orchestrator layer to establish a maintenance window for the task and
+ alert if the maintenance window has been exceeded.
+#. The maintenance process could support plugins to allow other integration tasks to be
+ initiated as part of the transition to and from maintenance. This plugin capability could
+ support actions such as:
+
+ * alert suppression in third-party monitoring frameworks
+ * service level reporting, to record outage windows
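+
+Both ideas could be combined in a small tracker. The sketch below is
+hypothetical: it keeps the entry timestamps in memory (a real implementation
+would persist them, as suggested above) and models plugins as plain callables
+that run on each transition.
+
+.. code-block:: python
+
+   from datetime import datetime, timedelta, timezone
+   from typing import Callable, Dict, List, Optional
+
+   Hook = Callable[[str], None]  # hypothetical plugin signature: receives the host name
+
+
+   class MaintenanceTracker:
+       def __init__(self, window: timedelta) -> None:
+           self.window = window
+           self.started: Dict[str, datetime] = {}  # in memory only in this sketch
+           self.on_enter: List[Hook] = []  # e.g. suppress third-party alerts
+           self.on_exit: List[Hook] = []   # e.g. record the outage window
+
+       def enter(self, host: str, now: Optional[datetime] = None) -> None:
+           self.started[host] = now or datetime.now(timezone.utc)
+           for hook in self.on_enter:
+               hook(host)
+
+       def exit(self, host: str) -> None:
+           self.started.pop(host, None)
+           for hook in self.on_exit:
+               hook(host)
+
+       def overdue(self, now: Optional[datetime] = None) -> List[str]:
+           """Hosts whose maintenance has lasted longer than the window."""
+           now = now or datetime.now(timezone.utc)
+           return sorted(h for h, t in self.started.items() if now - t > self.window)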
diff --git a/doc/dev/cephadm/index.rst b/doc/dev/cephadm/index.rst
new file mode 100644
index 000000000..a09baffdb
--- /dev/null
+++ b/doc/dev/cephadm/index.rst
@@ -0,0 +1,15 @@
+===================================
+CEPHADM Developer Documentation
+===================================
+
+.. rubric:: Contents
+
+.. toctree::
+ :maxdepth: 1
+
+
+ developing-cephadm
+ host-maintenance
+ compliance-check
+ cephadm-exporter
+ scalability-notes
diff --git a/doc/dev/cephadm/scalability-notes.rst b/doc/dev/cephadm/scalability-notes.rst
new file mode 100644
index 000000000..157153cb3
--- /dev/null
+++ b/doc/dev/cephadm/scalability-notes.rst
@@ -0,0 +1,95 @@
+#############################################
+ Notes and Thoughts on Cephadm's scalability
+#############################################
+
+*********************
+ About this document
+*********************
+
+This document does NOT define a specific proposal or planned future work.
+Instead, it merely lists a few thoughts that MIGHT be relevant for future
+cephadm enhancements.
+
+*******
+ Intro
+*******
+
+Current situation:
+
+Cephadm manages all registered hosts. This means that it periodically
+scrapes data from each host to identify changes on the host, such as:
+
+- disk added/removed
+- daemon added/removed
+- host network/firewall configuration changed
+
+Currently, cephadm scrapes each host (up to 10 in parallel) every 6
+minutes, unless a refresh is forced manually.
+
+Refreshes for disks (ceph-volume), daemons (podman/docker), etc., happen
+in sequence.
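+
+The refresh pattern described above (hosts in parallel, refresh steps per
+host in sequence) can be sketched with a thread pool. The step callables are
+hypothetical placeholders for, e.g., a ceph-volume inventory or a container
+listing; this is not the mgr/cephadm code.
+
+.. code-block:: python
+
+   from concurrent.futures import ThreadPoolExecutor
+   from typing import Callable, Dict, List
+
+   MAX_WORKERS = 10  # "up to 10 in parallel"
+   RefreshStep = Callable[[str], object]  # hypothetical: takes a host name
+
+
+   def refresh_host(host: str, steps: Dict[str, RefreshStep]) -> Dict[str, object]:
+       # The steps for a single host run one after another.
+       return {name: step(host) for name, step in steps.items()}
+
+
+   def refresh_all(hosts: List[str],
+                   steps: Dict[str, RefreshStep]) -> Dict[str, Dict[str, object]]:
+       # Hosts are scraped concurrently, at most MAX_WORKERS at a time.
+       with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
+           results = pool.map(lambda host: refresh_host(host, steps), hosts)
+           return dict(zip(hosts, results))
+
+In this model, a single host's refresh time is the sum of its step times.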
+
+With the cephadm exporter, we have now reduced the time to scan hosts
+considerably, but the question remains:
+
+Is the cephadm-exporter sufficient to solve all future scalability
+issues?
+
+***********************************************
+ Considerations of cephadm-exporter's REST API
+***********************************************
+
+The cephadm-exporter uses HTTP to serve an endpoint for each host's
+metadata. We MIGHT encounter some issues with this approach, which need
+to be mitigated at some point.
+
+- With the cephadm-exporter we use both SSH and HTTP to connect to each
+ host. Having two distinct transport layers feels odd, and we might
+ want to consolidate on a single protocol.
+
+- The current approach of delivering ``bin/cephadm`` to the host doesn't
+ allow the use of external dependencies. This means that we're stuck
+ with Python's built-in HTTP server library, which doesn't provide a
+ good developer experience. ``bin/cephadm`` needs to be packaged and
+ distributed (one way or the other) for us to make use of a better
+ HTTP server library.
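+
+To illustrate what "built-in" means here, the sketch below serves host
+metadata with nothing but Python's standard library (``http.server``).
+The endpoint path and metadata contents are hypothetical. Anything beyond
+this, such as routing, TLS or authentication, has to be implemented or
+wired up by hand.
+
+.. code-block:: python
+
+   import json
+   from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+
+   # Hypothetical metadata; a real exporter would fill this from host scans.
+   HOST_METADATA = {"hostname": "node1", "disks": [], "daemons": []}
+
+
+   class MetadataHandler(BaseHTTPRequestHandler):
+       def do_GET(self) -> None:
+           if self.path != "/v1/metadata":  # hypothetical endpoint path
+               self.send_error(404)
+               return
+           body = json.dumps(HOST_METADATA).encode()
+           self.send_response(200)
+           self.send_header("Content-Type", "application/json")
+           self.send_header("Content-Length", str(len(body)))
+           self.end_headers()
+           self.wfile.write(body)
+
+       def log_message(self, format: str, *args) -> None:
+           pass  # silence the default per-request logging to stderr
+
+
+   def make_server(port: int = 0) -> ThreadingHTTPServer:
+       # Port 0 lets the OS pick a free port.
+       return ThreadingHTTPServer(("127.0.0.1", port), MetadataHandler)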
+
+************************
+ MON's config-key store
+************************
+
+After ``mgr/cephadm`` has queried metadata from each host, it stores
+the data in the mon's key-value (config-key) store.
+
+If each host were allowed to write its own metadata to the store,
+``mgr/cephadm`` would no longer be required to gather the data.
+
+Some questions arise:
+
+- ``mgr/cephadm`` would then need to query data from the config-key
+ store, instead of relying on cached data.
+
+- cephadm knows three different types of data: (1) Data that is
+ critical and needs to be stored in the config-key store. (2) Data
+ that can be kept in memory only. (3) Data that can be stored in a
+ RADOS pool. How can we apply this idea to each of these types of
+ data?
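+
+A minimal sketch of the host-writes/mgr-reads idea, using an in-memory dict
+as a stand-in for the config-key store (the real store is accessed with
+commands such as ``ceph config-key set`` and ``ceph config-key get``). The key
+layout and the staleness rule are assumptions; stale entries are treated as
+missing, which is one possible answer to the cache question above.
+
+.. code-block:: python
+
+   import json
+   from typing import Dict, Optional
+
+
+   class FakeConfigKeyStore:
+       """In-memory stand-in for the mon config-key store."""
+
+       def __init__(self) -> None:
+           self._data: Dict[str, str] = {}
+
+       def set(self, key: str, value: str) -> None:
+           self._data[key] = value
+
+       def get(self, key: str) -> Optional[str]:
+           return self._data.get(key)
+
+
+   def _key(host: str) -> str:
+       return f"cephadm/host-metadata/{host}"  # hypothetical key layout
+
+
+   def host_publish(store: FakeConfigKeyStore, host: str,
+                    metadata: dict, now: float) -> None:
+       # Host side: each host writes its own metadata plus a timestamp.
+       store.set(_key(host), json.dumps({"last_update": now, "metadata": metadata}))
+
+
+   def mgr_read(store: FakeConfigKeyStore, host: str,
+                max_age: float, now: float) -> Optional[dict]:
+       # mgr side: read from the store instead of scraping the host.
+       raw = store.get(_key(host))
+       if raw is None:
+           return None
+       record = json.loads(raw)
+       if now - record["last_update"] > max_age:
+           return None  # assumed policy: too old to trust
+       return record["metadata"]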
+
+*******************************
+ Increase the worker pool size
+*******************************
+
+``mgr/cephadm`` is currently able to scrape 10 hosts at the same time.
+
+Increasing the pool size would not change the time it takes to scrape
+an individual host; it would only reduce the overall execution time.
+
+At best we can reach O(hosts) + O(daemons).
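+
+A back-of-the-envelope model shows the effect of the pool size. Assuming
+every host takes the same time to scrape and workers never sit idle, one full
+refresh takes roughly ``ceil(hosts / workers) * per_host_seconds``:
+
+.. code-block:: python
+
+   import math
+
+
+   def full_refresh_seconds(hosts: int, workers: int, per_host_seconds: float) -> float:
+       """Approximate wall-clock time of one refresh cycle.
+
+       Idealised model: uniform per-host time, no idle workers.
+       """
+       batches = math.ceil(hosts / workers)
+       return batches * per_host_seconds
+
+With 100 hosts at 5 seconds each, 10 workers need about 50 seconds and 20
+workers about 25 seconds, but each individual host still takes 5 seconds.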
+
+*************************
+ Backwards compatibility
+*************************
+
+Any changes need to be backwards compatible or completely isolated from
+any existing functionality. There are running cephadm clusters out there
+that require an upgrade path.