author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 18:45:59 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 18:45:59 +0000 |
commit | 19fcec84d8d7d21e796c7624e521b60d28ee21ed | |
tree | 42d26aa27d1e3f7c0b8bd3fd14e7d7082f5008dc /doc/dev/cephadm | |
parent | Initial commit. | |
Adding upstream version 16.2.11+ds. (refs: upstream/16.2.11+ds, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/dev/cephadm')
-rw-r--r-- | doc/dev/cephadm/cephadm-exporter.rst | 306
-rw-r--r-- | doc/dev/cephadm/compliance-check.rst | 121
-rw-r--r-- | doc/dev/cephadm/developing-cephadm.rst | 263
-rw-r--r-- | doc/dev/cephadm/host-maintenance.rst | 104
-rw-r--r-- | doc/dev/cephadm/index.rst | 15
-rw-r--r-- | doc/dev/cephadm/scalability-notes.rst | 95
6 files changed, 904 insertions, 0 deletions
diff --git a/doc/dev/cephadm/cephadm-exporter.rst b/doc/dev/cephadm/cephadm-exporter.rst
new file mode 100644
index 000000000..bc41fcaeb
--- /dev/null
+++ b/doc/dev/cephadm/cephadm-exporter.rst
@@ -0,0 +1,306 @@
================
cephadm Exporter
================

There are a number of long-running tasks that the cephadm 'binary' performs which can take several
seconds to complete. This latency represents a scalability challenge to the Ceph orchestrator
management plane.

To address this, cephadm needs to be able to run some of these longer running tasks asynchronously.
This frees up processing on the mgr by offloading tasks to each host, reduces latency and improves
scalability.

This document describes the implementation requirements and design for an 'exporter' feature.


Requirements
============
The exporter should address these functional and non-functional requirements:

* run as a normal systemd unit
* utilise the same filesystem schema as other services deployed with cephadm
* require only python3 standard library modules (no external dependencies)
* use encryption to protect the data flowing from a host to the Ceph mgr
* execute data gathering tasks as background threads
* be easily extended to include more data gathering tasks
* monitor itself for the health of the data gathering threads
* cache metadata to respond to queries quickly
* respond to a metadata query in <30ms to support large Ceph clusters (thousands of nodes)
* provide CLI interaction to enable the exporter to be deployed either at bootstrap time, or once the
  cluster has been deployed
* be deployed as a normal orchestrator service (similar to the node-exporter)

High Level Design
=================

This section will focus on the exporter logic **only**.

.. code::

    Establish a metadata cache object (tasks will be represented by separate attributes)
    Create a thread for each data gathering task; host, ceph-volume and list_daemons
        each thread updates its own attribute within the cache object
    Start a server instance passing requests to a specific request handler
        the request handler only interacts with the cache object
        the request handler passes metadata back to the caller
    Main Loop
        Leave the loop if a 'stop' request is received
        check thread health
            if a thread that was active, is now inactive
                update the cache marking the task as inactive
                update the cache with an error message for that task
        wait for n secs


In the initial implementation, the exporter is implemented as a RESTful API.


Security
========

The cephadm 'binary' only uses the python3 standard library, so the RESTful API has been developed
using the ``http.server`` module, which itself is not intended for production use. However, the
implementation is not complex (based only on ``HTTPServer`` and ``BaseHTTPRequestHandler``) and only
supports the GET method, so the security risk is perceived as low.

Current mgr-to-host interactions occur within an SSH connection, so the goal of the exporter is to
adopt a similar security model.

The initial REST API is implemented with the following features:

* generic self-signed, or user-provided SSL crt/key to encrypt traffic between the mgr and the host
* 'token' based authentication of the request

All exporter instances will use the **same** crt/key to secure the link from the mgr to the host(s),
in the same way that SSH access uses the same public key and port for each host connection.

.. note:: Since the same SSL configuration is used on every exporter, when you supply your own
   settings you must ensure that the CN or SAN entries of the certificate are either **not** used
   or created using wildcard naming.
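
The security model described above (a GET-only handler built on ``HTTPServer`` and
``BaseHTTPRequestHandler``, bearer-token authentication and a shared crt/key pair) can be sketched
with the Python standard library alone. This is a minimal, hypothetical sketch rather than the
cephadm implementation: ``TokenAuthHandler``, ``make_server`` and the JSON payload are illustrative
names. TLS is only applied when a crt/key path pair is supplied, so the same code can also be
exercised over plain HTTP on localhost.

.. code-block:: python

    import hmac
    import json
    import ssl
    import threading
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from socketserver import ThreadingMixIn
    from typing import Optional


    class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
        """HTTPServer that serves each request in its own thread."""
        daemon_threads = True


    class TokenAuthHandler(BaseHTTPRequestHandler):
        """Only do_GET is defined, so other methods get 501 from the base class."""
        token = ""  # hypothetical: set per server by make_server()

        def _authorised(self) -> bool:
            # Expect "Authorization: Bearer <token>"; compare in constant time.
            scheme, _, supplied = self.headers.get("Authorization", "").partition(" ")
            return scheme == "Bearer" and hmac.compare_digest(
                supplied.encode(), self.token.encode())

        def do_GET(self) -> None:
            if not self._authorised():
                self.send_error(401, "Unauthorized")
                return
            body = json.dumps({"path": self.path}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args) -> None:
            pass  # silence per-request logging in this example


    def make_server(token: str, port: int = 0, crt: Optional[str] = None,
                    key: Optional[str] = None) -> ThreadedHTTPServer:
        """Bind to localhost (port 0 = any free port); wrap with TLS if crt/key given."""
        handler = type("Handler", (TokenAuthHandler,), {"token": token})
        server = ThreadedHTTPServer(("127.0.0.1", port), handler)
        if crt and key:
            ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
            ctx.load_cert_chain(certfile=crt, keyfile=key)
            server.socket = ctx.wrap_socket(server.socket, server_side=True)
        return server


    if __name__ == "__main__":
        server = make_server("s3cret")
        threading.Thread(target=server.serve_forever, daemon=True).start()
        url = f"http://127.0.0.1:{server.server_address[1]}/v1/metadata"
        # Bypass any configured proxy so the request goes straight to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        req = urllib.request.Request(url, headers={"Authorization": "Bearer s3cret"})
        with opener.open(req) as resp:
            print(resp.status, resp.read().decode())
        server.shutdown()
        server.server_close()

Run as a script, the ``__main__`` block starts the server on a free port and issues one authorised
request. A request with a missing or wrong token receives a 401, as described for the request
handler.
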

The crt, key and token files are all created with restrictive permissions (600) to help mitigate
the risk of exposure to any other user on the Ceph cluster node(s).

Administrator Interaction
=========================
Several new commands are required to configure the exporter, and additional parameters should be
added to the bootstrap process to allow the exporter to be deployed automatically for new clusters.


Enhancements to the 'bootstrap' process
---------------------------------------
The bootstrap process should support additional parameters to automatically configure exporter
daemons across hosts.

``--with-exporter``

By using this flag, you're telling the bootstrap process to include the cephadm-exporter service
within the cluster. If you do not provide a specific configuration (SSL, token, port) to use,
defaults are applied.

``--exporter-config``

With the ``--exporter-config`` option, you may pass your own SSL, token and port information. The
file must be in JSON format and contain the following fields: crt, key, token and port. The JSON
content should be validated, and any errors detected passed back to the user during the argument
parsing phase (before any changes are made).


Additional ceph commands
------------------------
::

    # ceph cephadm generate-exporter-config

This command generates a default configuration consisting of a self-signed certificate, a randomly
generated 32-character token and the default port of 9443 for the REST API.

::

    # ceph cephadm set-exporter-config -i <config.json>

Use a JSON file to define the crt, key, token and port for the REST API. The crt, key and token are
validated by the mgr/cephadm module prior to storing the values in the KV store. Invalid or missing
entries should be reported to the user.
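
The default-configuration and validation behaviour described above could look roughly like the
sketch below. The function names, the PEM-marker checks and the error messages are assumptions for
illustration only. The standard library cannot create X.509 certificates, so self-signed certificate
generation is left out; only the 32-character token and the default port 9443 are produced.

.. code-block:: python

    import json
    import secrets
    import string
    from typing import Any, Dict, List

    REQUIRED_FIELDS = ("crt", "key", "token", "port")
    DEFAULT_PORT = 9443


    def generate_token(length: int = 32) -> str:
        """Random alphanumeric token from a cryptographically secure source."""
        alphabet = string.ascii_letters + string.digits
        return "".join(secrets.choice(alphabet) for _ in range(length))


    def validate_exporter_config(raw: str) -> List[str]:
        """Return a list of problems; an empty list means the config looks usable."""
        try:
            cfg: Dict[str, Any] = json.loads(raw)
        except json.JSONDecodeError as exc:
            return [f"invalid JSON: {exc}"]
        if not isinstance(cfg, dict):
            return ["config must be a JSON object"]

        errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in cfg]
        # Shallow PEM checks only; a real implementation would parse the crt/key.
        if "crt" in cfg and "BEGIN CERTIFICATE" not in str(cfg["crt"]):
            errors.append("crt is not a PEM certificate")
        if "key" in cfg and "PRIVATE KEY" not in str(cfg["key"]):
            errors.append("key is not a PEM private key")
        if "token" in cfg and not (isinstance(cfg["token"], str) and cfg["token"]):
            errors.append("token must be a non-empty string")
        if "port" in cfg:
            port = cfg["port"]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
                errors.append("port must be an integer between 1 and 65535")
        return errors

Because ``validate_exporter_config`` only collects messages and changes nothing, it can run during
argument parsing so that every problem is reported at once, before any changes are made.
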

::

    # ceph cephadm clear-exporter-config

Clear the current configuration (removes the associated keys from the KV store).

::

    # ceph cephadm get-exporter-config

Show the current exporter configuration, in JSON format.


.. note:: If the service is already deployed, any attempt to change or clear the configuration will
   be denied. In order to change settings you must remove the service, apply the required
   configuration and re-apply the service (``ceph orch apply cephadm-exporter``).



New Ceph Configuration Keys
===========================
The exporter configuration is persisted to the monitor's KV store, with the following keys:

| mgr/cephadm/exporter_config
| mgr/cephadm/exporter_enabled



RESTful API
===========
The primary goal of the exporter is the provision of metadata from the host to the mgr. This
interaction takes place over a simple GET interface. Although only the GET method is supported, the
API provides multiple URLs to provide different views on the metadata that has been gathered.

.. csv-table:: Supported URL endpoints
   :header: "URL", "Purpose"

   "/v1/metadata", "show all metadata including health of all threads"
   "/v1/metadata/health", "only report on the health of the data gathering threads"
   "/v1/metadata/disks", "show the disk output (ceph-volume inventory data)"
   "/v1/metadata/host", "show host related metadata from the gather-facts command"
   "/v1/metadata/daemons", "show the status of all ceph cluster related daemons on the host"

Return Codes
------------
The following HTTP return codes are generated by the API:

.. csv-table:: Supported HTTP Responses
   :header: "Status Code", "Meaning"

   "200", "OK"
   "204", "the thread associated with this request is no longer active, no data is returned"
   "206", "some threads have stopped, so some content is missing"
   "401", "request is not authorised - check your token is correct"
   "404", "URL is malformed, not found"
   "500", "all threads have stopped - unable to provide any metadata for the host"


Deployment
==========
During the initial phases of the exporter implementation, deployment is regarded as optional but is
available to new clusters and existing clusters that have the feature (Pacific and above).

* new clusters: use the ``--with-exporter`` option
* existing clusters: you'll need to set the configuration and deploy the service manually

.. code::

    # ceph cephadm generate-exporter-config
    # ceph orch apply cephadm-exporter

If you choose to remove the cephadm-exporter service, you may simply run:

.. code::

    # ceph orch rm cephadm-exporter

This will remove the daemons, and the exporter related settings stored in the KV store.


Management
==========
Once the exporter is deployed, you can use the following snippet to extract the host's metadata.

.. code-block:: python

    import json
    import ssl
    import sys
    import time
    from urllib.error import HTTPError
    from urllib.request import Request, urlopen

    # CHANGE THIS to the host running the exporter
    hostname = "rh8-1.storage.lab"

    print("Reading config.json")
    try:
        # config.json holds the output of: ceph cephadm get-exporter-config
        with open('./config.json', 'r') as f:
            cfg = json.load(f)
    except FileNotFoundError:
        print("You must first create a config.json file using the cephadm get-exporter-config command")
        sys.exit(1)

    # Trust the exporter's (self-signed) certificate directly from the PEM text in the config.
    ctx = ssl.create_default_context(cadata=cfg['crt'])
    # The shared certificate is not issued per hostname, so skip the hostname check.
    ctx.check_hostname = False

    port = cfg.get('port', 9443)
    hdrs = {"Authorization": f"Bearer {cfg['token']}"}
    req = Request(f"https://{hostname}:{port}/v1/metadata", headers=hdrs)

    print("Issuing call to gather metadata")
    s_time = time.time()
    try:
        with urlopen(req, context=ctx) as r:
            print(r.status)
            # 200 = all data, 206 = some threads have stopped (partial data)
            if r.status in (200, 206):
                js = json.loads(r.read().decode())
                print(json.dumps(js, indent=2))
    except HTTPError as e:
        # 401, 404 and 500 responses are raised by urlopen as HTTPError
        print(f"Request failed: {e.code} {e.reason}")
    print("call complete")
    elapsed = time.time() - s_time
    print(f"Elapsed secs : {elapsed}")


.. note:: The above example uses python3, and assumes that you've extracted the config using the
   ``get-exporter-config`` command.


Implementation Specific Details
===============================

In the same way as a typical container based deployment, the exporter is deployed to a directory
under ``/var/lib/ceph/<fsid>``. The cephadm binary is stored in this cluster folder, and the daemon's
configuration and systemd settings are stored under ``/var/lib/ceph/<fsid>/cephadm-exporter.<id>/``.

.. code::

    [root@rh8-1 cephadm-exporter.rh8-1]# pwd
    /var/lib/ceph/cb576f70-2f72-11eb-b141-525400da3eb7/cephadm-exporter.rh8-1
    [root@rh8-1 cephadm-exporter.rh8-1]# ls -al
    total 24
    drwx------. 2 root root  100 Nov 25 18:10 .
    drwx------. 8 root root  160 Nov 25 23:19 ..
    -rw-------. 1 root root 1046 Nov 25 18:10 crt
    -rw-------. 1 root root 1704 Nov 25 18:10 key
    -rw-------. 1 root root   64 Nov 25 18:10 token
    -rw-------. 1 root root   38 Nov 25 18:10 unit.configured
    -rw-------. 1 root root   48 Nov 25 18:10 unit.created
    -rw-r--r--. 1 root root  157 Nov 25 18:10 unit.run


In order to respond to requests quickly, the CephadmDaemon uses a cache object (CephadmCache) to hold
the results of the cephadm commands.

The exporter doesn't introduce any new data gathering capability; instead it merely calls the
existing cephadm commands.

The CephadmDaemon class creates a local HTTP server (using ThreadingMixIn), secured with TLS, and uses
the CephadmDaemonHandler to handle the requests. The request handler inspects the request header and
looks for a valid Bearer token; if this is invalid or missing, the caller receives a 401 Unauthorized
error.

The 'run' method of the CephadmDaemon class places the scrape_* methods into different threads, with
each thread supporting a different refresh interval. Each thread then periodically issues its cephadm
command, and places the output in the cache object.

In addition to the command output, each thread also maintains its own timestamp record in the cache,
so the caller can very easily determine the age of the data it has received.

If the underlying cephadm command execution hits an exception, the thread passes control to a
_handle_thread_exception method. Here the exception is logged to the daemon's log file and the
exception details are added to the cache, providing visibility of the problem to the caller.

Although each thread is effectively given its own URL endpoint (host, disks, daemons), the
recommended way to gather data from the host is to simply use the ``/v1/metadata`` endpoint. This
will provide all of the data, and indicate whether any of the threads have failed.
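
The thread, cache and status-code behaviour described above can be modelled in a few dozen lines.
This is a simplified, hypothetical sketch, not the actual ``CephadmDaemon``/``CephadmCache`` code.
Each scrape task runs in its own thread with its own refresh interval. It stores its output and a
timestamp in a lock-protected cache, and records any exception in the cache before exiting.
``check_threads`` plays the part of the main loop's health check. ``http_status`` shows one way to
derive the 200/206/500 codes from the return-code table. Task names and intervals are placeholders.

.. code-block:: python

    import threading
    import time
    from typing import Any, Callable, Dict, List, Tuple


    class MetadataCache:
        """Lock-protected store with one entry per task: data, timestamp, health."""

        def __init__(self) -> None:
            self._lock = threading.Lock()
            self._tasks: Dict[str, Dict[str, Any]] = {}

        def update(self, task: str, **fields: Any) -> None:
            with self._lock:
                self._tasks.setdefault(task, {}).update(fields)

        def snapshot(self) -> Dict[str, Dict[str, Any]]:
            with self._lock:
                return {name: dict(entry) for name, entry in self._tasks.items()}


    def scrape_loop(cache: MetadataCache, task: str, func: Callable[[], Any],
                    interval: float, stop: threading.Event) -> None:
        """Refresh one task every `interval` seconds until stopped or it fails."""
        while not stop.is_set():
            try:
                data = func()
            except Exception as exc:
                # Stands in for _handle_thread_exception: record the error, then exit.
                cache.update(task, health="inactive", error=repr(exc))
                return
            cache.update(task, health="active", data=data, scrape_timestamp=time.time())
            stop.wait(interval)


    def start_tasks(cache: MetadataCache,
                    tasks: Dict[str, Tuple[Callable[[], Any], float]],
                    stop: threading.Event) -> List[threading.Thread]:
        """Start one named thread per task, each with its own refresh interval."""
        threads = []
        for name, (func, interval) in tasks.items():
            cache.update(name, health="active")
            t = threading.Thread(target=scrape_loop, name=name, daemon=True,
                                 args=(cache, name, func, interval, stop))
            t.start()
            threads.append(t)
        return threads


    def check_threads(cache: MetadataCache, threads: List[threading.Thread]) -> None:
        """Main-loop health check: flag any thread that is no longer running."""
        snapshot = cache.snapshot()
        for t in threads:
            if not t.is_alive() and snapshot.get(t.name, {}).get("health") != "inactive":
                cache.update(t.name, health="inactive", error="thread exited")


    def http_status(cache: MetadataCache) -> int:
        """Map thread health to the /v1/metadata response code."""
        states = [entry.get("health") for entry in cache.snapshot().values()]
        inactive = states.count("inactive")
        if states and inactive == len(states):
            return 500  # all threads have stopped
        if inactive:
            return 206  # some threads have stopped: partial content
        return 200

A single-task endpoint such as ``/v1/metadata/disks`` could follow the same pattern. It would look
up that task's entry and return 204 when its ``health`` is ``inactive``.
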

The run method uses ``signal`` to establish a reload hook, but in the initial implementation this
doesn't take any action and simply logs that a reload was received.


Future Work
===========

#. Consider the potential of adding a restart policy for threads.
#. Once the exporter is fully integrated into mgr/cephadm, the goal would be to make the exporter the
   default means of data gathering. However, until then the exporter will remain as an opt-in
   'feature preview'.

diff --git a/doc/dev/cephadm/compliance-check.rst b/doc/dev/cephadm/compliance-check.rst
new file mode 100644
index 000000000..eea462445
--- /dev/null
+++ b/doc/dev/cephadm/compliance-check.rst
@@ -0,0 +1,121 @@
================
Compliance Check
================

The stability and reliability of a Ceph cluster are dependent not just upon the Ceph daemons, but
also upon the OS and hardware that Ceph is installed on. This document is intended to promote a
design discussion for providing a "compliance" feature within mgr/cephadm, which would be
responsible for identifying common platform-related issues that could impact Ceph stability and
operation.

The ultimate goal of these checks is to identify issues early and raise a healthcheck WARN event,
to alert the Administrator to the issue.

Prerequisites
=============
In order to effectively analyse the hosts that Ceph is deployed to, this feature requires a cache
of host-related metadata. The metadata is already available from cephadm's HostFacts class and the
``gather-facts`` cephadm command. For the purposes of this document, we will assume that this
data is available within the mgr/cephadm "cache" structure.

Some checks will require that the host status is also populated, e.g. ONLINE, OFFLINE or
MAINTENANCE.

Administrator Interaction
=========================
Not all users will require this feature, so they must be able to 'opt out'. For this reason,
mgr/cephadm must provide controls, such as the following:

.. code-block::

    ceph cephadm compliance enable | disable | status [--format json]
    ceph cephadm compliance ls [--format json]
    ceph cephadm compliance enable-check <name>
    ceph cephadm compliance disable-check <name>
    ceph cephadm compliance set-check-interval <int>
    ceph cephadm compliance get-check-interval

The status option would show the enabled/disabled state of the feature, along with the
check-interval.

The ``ls`` subcommand would show all checks in the following format:

``check-name status description``

Proposed Integration
====================
The compliance checks are not required to run all the time, but instead should run at discrete
intervals. The interval would be configurable via the :code:`set-check-interval` subcommand (the
default would be every 12 hours).


mgr/cephadm currently executes an event driven (time based) serve loop to act on deploy/remove and
reconcile activity. In order to execute the compliance checks, the compliance check code would be
called from this main serve loop whenever the interval set by :code:`set-check-interval` has
elapsed.


Proposed Checks
===============
All checks would push any errors to a list, so multiple issues can be escalated to the Admin at
the same time.
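
The error-collecting approach can be sketched as follows, assuming the per-host facts are available
as plain dictionaries. The fact keys (``vendor``, ``major_release``, ``mtu_by_network``) and all
function names are hypothetical and do not reflect the real ``gather-facts`` output. The two checks
are simplified versions of the OS and MTU checks described below; the LTS-only condition on major
releases is omitted. ``checks_due`` shows how the serve loop could gate the run on the check
interval.

.. code-block:: python

    from typing import Callable, Dict, Iterable, List, Optional

    HostFacts = Dict[str, dict]          # hostname -> facts (hypothetical schema)
    DEFAULT_INTERVAL = 12 * 60 * 60      # 12 hours, in seconds


    def check_os(hosts: HostFacts) -> List[str]:
        """OS: all hosts share one vendor and one major release."""
        errors = []
        for field in ("vendor", "major_release"):
            values = {str(facts.get(field)) for facts in hosts.values()}
            if len(values) > 1:
                errors.append(f"OS: inconsistent {field}: {sorted(values)}")
        return errors


    def check_mtu(hosts: HostFacts) -> List[str]:
        """MTU: every interface on the same Ceph network uses the same MTU."""
        by_network: Dict[str, Dict[str, int]] = {}
        for host, facts in hosts.items():
            for network, mtu in facts.get("mtu_by_network", {}).items():
                by_network.setdefault(network, {})[host] = mtu
        return [f"MTU: inconsistent on {net}: {mtus}"
                for net, mtus in sorted(by_network.items())
                if len(set(mtus.values())) > 1]


    CHECKS: Dict[str, Callable[[HostFacts], List[str]]] = {"OS": check_os, "MTU": check_mtu}


    def run_compliance(hosts: HostFacts, disabled: Iterable[str] = ()) -> List[str]:
        """Run every enabled check and push all errors onto one list."""
        skip = set(disabled)
        errors: List[str] = []
        for name, check in CHECKS.items():
            if name not in skip:
                errors.extend(check(hosts))
        return errors


    def checks_due(last_run: Optional[float], now: float,
                   interval: float = DEFAULT_INTERVAL) -> bool:
        """True when the serve loop should run the checks again."""
        return last_run is None or now - last_run >= interval

Each message is prefixed with the check's shortname, so a single WARN health event could list every
failing check at once.
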

The list below provides a description of each check, with the text in parentheses after the name
giving the check's shortname *(the shortname is used to refer to the check when enabling or
disabling it)*.


OS Consistency (OS)
___________________
* all hosts must use the same vendor
* all hosts must be on the same major release (this check would only be applicable to distributions
  that offer a long-term-support strategy, such as RHEL, CentOS, SLES and Ubuntu)


*src: gather-facts output*

Linux Kernel Security Mode (LSM)
________________________________
* All hosts should have a consistent SELINUX/AppArmor configuration

*src: gather-facts output*

Services Check (SERVICES)
_________________________
Hosts that are in an ONLINE state should adhere to the following:

* all daemons (systemd units) should be enabled
* all daemons should be running (not dead)

*src: list_daemons output*

Support Status (SUPPORT)
________________________
If support status has been detected, it should be consistent across all hosts. At this point
support status is available only for Red Hat machines.

*src: gather-facts output*

Network : MTU (MTU)
________________________________
All network interfaces on the same Ceph network (public/cluster) should have the same MTU.

*src: gather-facts output*

Network : LinkSpeed (LINKSPEED)
____________________________________________
All network interfaces on the same Ceph network (public/cluster) should have the same link speed.

*src: gather-facts output*

Network : Consistency (INTERFACE)
______________________________________________
All hosts with OSDs should have a consistent network configuration; e.g. if some hosts do not
separate cluster/public traffic but others do, that is an anomaly that would generate a
compliance check warning.

*src: gather-facts output*

Notification Strategy
=====================
If any of the checks fail, mgr/cephadm would raise a WARN level alert.

Futures
=======
The checks highlighted here serve only as a starting point, and we should expect to expand
on the checks over time.

diff --git a/doc/dev/cephadm/developing-cephadm.rst b/doc/dev/cephadm/developing-cephadm.rst
new file mode 100644
index 000000000..d9f81c2c0
--- /dev/null
+++ b/doc/dev/cephadm/developing-cephadm.rst
@@ -0,0 +1,263 @@
=======================
Developing with cephadm
=======================

There are several ways to develop with cephadm. Which you use depends
on what you're trying to accomplish.

vstart --cephadm
================

- Start a cluster with vstart, with cephadm configured
- Manage any additional daemons with cephadm
- Requires compiled ceph binaries

In this case, the mon and manager at a minimum are running in the usual
vstart way, not managed by cephadm. But cephadm is enabled and the local
host is added, so you can deploy additional daemons or add additional hosts.

This works well for developing cephadm itself, because any mgr/cephadm
or cephadm/cephadm code changes can be applied by kicking ceph-mgr
with ``ceph mgr fail x``. (When the mgr (re)starts, it loads the
cephadm/cephadm script into memory.)

::

  MON=1 MGR=1 OSD=0 MDS=0 ../src/vstart.sh -d -n -x --cephadm

- ``~/.ssh/id_dsa[.pub]`` is used as the cluster key. It is assumed that
  this key is authorized to ssh with no passphrase to root@`hostname`.
- cephadm does not try to manage any daemons started by vstart.sh (any
  nonzero number in the environment variables). No service spec is defined
  for mon or mgr.
- You'll see health warnings from cephadm about stray daemons; that's because
  the vstart-launched daemons aren't controlled by cephadm.
- The default image is ``quay.io/ceph-ci/ceph:master``, but you can change
  this by passing ``-o container_image=...`` or ``ceph config set global container_image ...``.


cstart and cpatch
=================

The ``cstart.sh`` script will launch a cluster using cephadm and put the
conf and keyring in your build dir, so that the ``bin/ceph ...`` CLI works
(just like with vstart). The ``ckill.sh`` script will tear it down.

- A unique but stable fsid is stored in ``fsid`` (in the build dir).
- The mon port is random, just like with vstart.
- The container image is ``quay.io/ceph-ci/ceph:$tag`` where $tag is
  the first 8 chars of the fsid.
- If the container image doesn't exist yet when you run cstart for the
  first time, it is built with cpatch.

There are a few advantages here:

- The cluster is a "normal" cephadm cluster that looks and behaves
  just like a user's cluster would. In contrast, vstart and teuthology
  clusters tend to be special in subtle (and not-so-subtle) ways (e.g.
  having ``lockdep`` turned on).

To start a test cluster::

  sudo ../src/cstart.sh

The last line of the output will be a line you can cut+paste to update
the container image. For instance::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e

By default, cpatch will patch everything it can think of from the local
build dir into the container image. If you are working on a specific
part of the system, though, you can get away with smaller changes so that
cpatch runs faster. For instance::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --py

will update the mgr modules (minus the dashboard). Or::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --core

will update most binaries and libraries. Pass ``-h`` to cpatch for all options.

Once the container is updated, you can refresh/restart daemons by bouncing
them with::

  sudo systemctl restart ceph-`cat fsid`.target

When you're done, you can tear down the cluster with::

  sudo ../src/ckill.sh   # or,
  sudo ../src/cephadm/cephadm rm-cluster --force --fsid `cat fsid`

cephadm bootstrap --shared_ceph_folder
======================================

Cephadm can also be used directly without compiled ceph binaries.

Run cephadm like so::

  sudo ./cephadm bootstrap --mon-ip 127.0.0.1 \
    --ssh-private-key /home/<user>/.ssh/id_rsa \
    --skip-mon-network \
    --skip-monitoring-stack --single-host-defaults \
    --skip-dashboard \
    --shared_ceph_folder /home/<user>/path/to/ceph/

- ``~/.ssh/id_rsa`` is used as the cluster key. It is assumed that
  this key is authorized to ssh with no passphrase to root@`hostname`.

Source code changes made in the ``pybind/mgr/`` directory then
require a daemon restart to take effect.

Note regarding network calls from CLI handlers
==============================================

Executing any cephadm CLI command like ``ceph orch ls`` will block the
mon command handler thread within the MGR, thus preventing any concurrent
CLI calls. Note that pressing ``^C`` will not resolve this situation,
as *only* the client will be aborted, but not execution of the command
within the orchestrator manager module itself. This means cephadm will
be completely unresponsive until the execution of the CLI handler is
fully completed. Note that even ``ceph orch ps`` will not respond while
another handler is executing.

This means we should make very few synchronous calls to remote hosts.
As a guideline, cephadm should make at most ``O(1)`` network calls in CLI handlers.
Everything else should be done asynchronously in other threads, like ``serve()``.
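
The guideline can be illustrated with a small sketch. A background thread (standing in for
``serve()``) performs the slow per-host refreshes, while the CLI handler only reads what is already
cached and therefore returns immediately. All names (``HostCache``, ``refresh_host``,
``handle_orch_ls``) are hypothetical, and the remote call is simulated with ``time.sleep``. This is
not the mgr/cephadm code.

.. code-block:: python

    import threading
    import time
    from typing import Dict, List


    class HostCache:
        """Per-host daemon lists shared between serve() and CLI handlers."""

        def __init__(self) -> None:
            self._lock = threading.Lock()
            self._daemons: Dict[str, List[str]] = {}

        def set(self, host: str, daemons: List[str]) -> None:
            with self._lock:
                self._daemons[host] = daemons

        def get_all(self) -> Dict[str, List[str]]:
            with self._lock:
                return {h: list(d) for h, d in self._daemons.items()}


    def refresh_host(host: str) -> List[str]:
        """Simulated slow remote call (e.g. listing daemons over SSH)."""
        time.sleep(0.2)
        return [f"mon.{host}"]


    def serve(cache: HostCache, hosts: List[str], stop: threading.Event,
              interval: float = 1.0) -> None:
        """Background loop: all network I/O happens here, never in a CLI handler."""
        while not stop.is_set():
            for host in hosts:
                cache.set(host, refresh_host(host))
            stop.wait(interval)


    def handle_orch_ls(cache: HostCache) -> Dict[str, List[str]]:
        """CLI handler: makes no network calls, it only reads the cache."""
        return cache.get_all()

If ``handle_orch_ls`` called ``refresh_host`` for every host instead, its latency would grow with the
number of hosts. It would also hold the mon command handler thread for that entire time.
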

Note regarding different variables used in the code
===================================================

* a ``service_type`` is something like mon, mgr, alertmanager etc., defined
  in ``ServiceSpec``
* a ``service_id`` is the name of the service. Some services don't have
  names.
* a ``service_name`` is ``<service_type>.<service_id>``
* a ``daemon_type`` is the same as the service_type, except for ingress,
  which has the haproxy and keepalived daemon types.
* a ``daemon_id`` is typically ``<service_id>.<hostname>.<random-string>``.
  (This is not the case for OSDs, for example: OSDs are always called ``osd.N``.)
* a ``daemon_name`` is ``<daemon_type>.<daemon_id>``

Kcli: a virtualization management tool to make orchestrator development easy
=============================================================================
`Kcli <https://github.com/karmab/kcli>`_ is meant to interact with existing
virtualization providers (libvirt, KubeVirt, oVirt, OpenStack, VMware vSphere,
GCP and AWS) and to easily deploy and customize VMs from cloud images.

It allows you to set up an environment with several VMs with your preferred
configuration (memory, CPUs, disks) and OS flavor.

main advantages:
----------------
 - It is fast. Typically you can have a completely new Ceph cluster ready to debug
   and develop orchestrator features in less than 5 minutes.
 - It is a "near production" lab. The lab created with kcli is very close to "real"
   clusters in QE labs or even in production, so it is easy to test "real things" in
   an almost "real" environment.
 - It is safe and isolated. It does not depend on the things you have installed on your
   machine, and the VMs are isolated from your environment.
 - It is an easy "dev" environment for software pieces that do not need to be compiled,
   for example any mgr module. It allows you to test your changes interactively.

Installation:
-------------
Complete documentation is available in `kcli installation <https://kcli.readthedocs.io/en/latest/#installation>`_,
but we strongly suggest using the container image approach.

So things to do:
 - 1. Review `requirements <https://kcli.readthedocs.io/en/latest/#libvirt-hypervisor-requisites>`_
   and install/configure whatever you need to meet them.
 - 2. Get the kcli image and create one alias for executing the kcli command
   ::

     # podman pull quay.io/karmab/kcli
     # alias kcli='podman run --net host -it --rm --security-opt label=disable -v $HOME/.ssh:/root/.ssh -v $HOME/.kcli:/root/.kcli -v /var/lib/libvirt/images:/var/lib/libvirt/images -v /var/run/libvirt:/var/run/libvirt -v $PWD:/workdir -v /var/tmp:/ignitiondir quay.io/karmab/kcli'

.. note:: /var/lib/libvirt/images can be customized; be sure that you are
   using this folder for your OS images.

.. note:: Once you have used your kcli tool to create and use different labs, we
   suggest that you "save" and use your own kcli image.
   Why? kcli is under active development and it changes (and for the moment only one
   tag exists: latest). Because the current functionality is more than enough for us,
   and what we want is overall stability, we suggest storing the kcli image you are
   using in a safe place and updating your kcli alias to use your own image.

Test your kcli installation:
----------------------------
See the kcli `basic usage workflow <https://kcli.readthedocs.io/en/latest/#basic-workflow>`_

Create a Ceph lab cluster
-------------------------
To make this task easy, we are going to use a kcli plan.

A kcli plan is a file where you can define the different settings you want to
have in a set of VMs.
You can define hardware parameters (cpu, memory, disks ..) and the operating system,
and it also allows you to automate the installation and configuration of any
software you want to have.

There is a `repository <https://github.com/karmab/kcli-plans>`_ with a collection of
plans that can be used for different purposes. There are also predefined plans to
install Ceph clusters using ceph-ansible or cephadm, so let's create our first Ceph
cluster using cephadm::

  # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml

This will create a set of three VMs using the plan file pointed to by the URL.
After a few minutes (depending on the power of your laptop), let's examine the cluster:

* Take a look at the VMs created::

    # kcli list vms

* Log in to the bootstrap node::

    # kcli ssh ceph-node-00

* Take a look at the installed Ceph cluster::

    [centos@ceph-node-00 ~]$ sudo -i
    [root@ceph-node-00 ~]# cephadm version
    [root@ceph-node-00 ~]# cephadm shell
    [ceph: root@ceph-node-00 /]# ceph orch host ls

Create a Ceph cluster to make developing mgr modules easy (Orchestrators and Dashboard)
------------------------------------------------------------------------------------------
The cephadm kcli plan (and cephadm) are prepared to do that.

The idea behind this method is to replace several python mgr folders in each of
the ceph daemons with the source code folders in your host machine.
This "trick" will allow you to make changes in any orchestrator or dashboard
module and test them immediately (you only need to disable/enable the mgr module).

So in order to create a ceph cluster for development purposes you must use the
same cephadm plan but with a new parameter pointing to your Ceph source code folder::

  # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml -P ceph_dev_folder=/home/mycodefolder/ceph

Ceph Dashboard development
--------------------------
The Ceph Dashboard module will not be loaded unless you have previously
generated the frontend bundle.

For now, in order to load the Ceph Dashboard module properly and to apply frontend
changes, you have to run "ng build" on your laptop::

  # Start local frontend build with watcher (in background):
  sudo dnf install -y nodejs
  cd <path-to-your-ceph-repo>
  cd src/pybind/mgr/dashboard/frontend
  sudo chown -R <your-user>:root dist node_modules
  NG_CLI_ANALYTICS=false npm ci
  npm run build -- --deleteOutputPath=false --watch &

After saving your changes, the frontend bundle will be built again.
When completed, you'll see::

  "Localized bundle generation complete."

Then you can reload your Dashboard browser tab.

diff --git a/doc/dev/cephadm/host-maintenance.rst b/doc/dev/cephadm/host-maintenance.rst
new file mode 100644
index 000000000..2b84ec7bd
--- /dev/null
+++ b/doc/dev/cephadm/host-maintenance.rst
@@ -0,0 +1,104 @@
================
Host Maintenance
================

All hosts that support Ceph daemons need to support maintenance activity, whether the host
is physical or virtual. This means that management workflows should provide
a simple and consistent way to support this operational requirement. This document defines
the maintenance strategy that could be implemented in cephadm and mgr/cephadm.


High Level Design
=================
Placing a host into maintenance adopts the following workflow:

#. Confirm that the removal of the host does not impact data availability (the following
   steps will assume it is safe to proceed)

   * orch host ok-to-stop <host> would be used here

#. If the host has osd daemons, apply noout to the host subtree to prevent data migration
   from triggering during the planned maintenance slot.
#. Stop the ceph target (all daemons stop).
#. Disable the ceph target on that host, to prevent a reboot from automatically starting
   ceph services again.


Exiting maintenance is basically the reverse of the above sequence.

Admin Interaction
=================
The ceph orch command will be extended to support maintenance.

.. code-block::

    ceph orch host maintenance enter <host> [ --force ]
    ceph orch host maintenance exit <host>

.. note:: In addition, the host's status should be updated to reflect whether it
   is in maintenance or not.

The 'check' Option
__________________
The orch host ok-to-stop command focuses on ceph daemons (mon, osd, mds), which
provides the first check. However, a ceph cluster also uses other types of daemons
for monitoring, management and non-native protocol support, which means the
logic will need to consider service impact too. The 'check' option provides
this additional layer to alert the user of service impact to *secondary*
daemons.

The list below shows some of these additional daemons.

* mgr (not included in ok-to-stop checks)
* prometheus, grafana, alertmanager
* rgw
* haproxy
* iscsi gateways
* ganesha gateways

By using the --check option first, the Admin can choose whether to proceed. This
workflow is obviously optional for the CLI user, but could be integrated into the
UI workflow to help less experienced Administrators manage the cluster.

By adopting this two-phase approach, a UI based workflow would look something
like this:

#. User selects a host to place into maintenance

   * orchestrator checks for data **and** service impact
#. If potential impact is shown, the next steps depend on the impact type

   * **data availability** : maintenance is denied, informing the user of the issue
   * **service availability** : user is provided a list of affected services and
     asked to confirm


Components Impacted
===================
Implementing this capability will require changes to the following:

* cephadm

  * Add maintenance subcommand with the following 'verbs': enter, exit, check

* mgr/cephadm

  * add methods to CephadmOrchestrator for enter/exit and check
  * data gathering would be skipped for hosts in a maintenance state

* mgr/orchestrator

  * add CLI commands to OrchestratorCli which expose the enter/exit and check interaction


Ideas for Future Work
=====================
#. When a host is placed into maintenance, the time of the event could be persisted. This
   would allow the orchestrator layer to establish a maintenance window for the task and
   alert if the maintenance window has been exceeded.
#. The maintenance process could support plugins to allow other integration tasks to be
   initiated as part of the transition to and from maintenance. This plugin capability could
   support actions like:

   * alert suppression to 3rd party monitoring framework(s)
   * service level reporting, to record outage windows

diff --git a/doc/dev/cephadm/index.rst b/doc/dev/cephadm/index.rst
new file mode 100644
index 000000000..a09baffdb
--- /dev/null
+++ b/doc/dev/cephadm/index.rst
@@ -0,0 +1,15 @@
===================================
CEPHADM Developer Documentation
===================================

.. rubric:: Contents

.. toctree::
   :maxdepth: 1


   developing-cephadm
   host-maintenance
   compliance-check
   cephadm-exporter
   scalability-notes

diff --git a/doc/dev/cephadm/scalability-notes.rst b/doc/dev/cephadm/scalability-notes.rst
new file mode 100644
index 000000000..157153cb3
--- /dev/null
+++ b/doc/dev/cephadm/scalability-notes.rst
@@ -0,0 +1,95 @@
#############################################
 Notes and Thoughts on Cephadm's scalability
#############################################

*********************
 About this document
*********************

This document does NOT define a specific proposal or some future work.
Instead it merely lists a few thoughts that MIGHT be relevant for future
cephadm enhancements.

*******
 Intro
*******

Current situation:

Cephadm manages all registered hosts. This means that it periodically
scrapes data from each host to identify changes on the host like:

- disk added/removed
- daemon added/removed
- host network/firewall etc has changed

Currently, cephadm scrapes each host (up to 10 in parallel) every 6
minutes, unless a refresh is forced manually.

Refreshes for disks (ceph-volume), daemons (podman/docker), etc., happen
in sequence.

With the cephadm exporter, we have now reduced the time to scan hosts
considerably, but the question remains:

Is the cephadm-exporter sufficient to solve all future scalability
issues?

***********************************************
 Considerations of cephadm-exporter's REST API
***********************************************

The cephadm-exporter uses HTTP to serve an endpoint to the host's
metadata. We MIGHT encounter some issues with this approach, which need
to be mitigated at some point.

- With the cephadm-exporter we use SSH and HTTP to connect to each
  host. Having two distinct transport layers feels odd, and we might
  want to consider reducing it to only a single protocol.
+
+- The current approach of delivering ``bin/cephadm`` to the host doesn't
+  allow the use of external dependencies. This means that we're stuck
+  with the built-in HTTP server lib, which isn't great for providing a
+  good developer experience. ``bin/cephadm`` needs to be packaged and
+  distributed (one way or the other) for us to make use of a better
+  HTTP server library.
+
+************************
+ MON's config-key store
+************************
+
+After ``mgr/cephadm`` has queried metadata from each host, cephadm stores
+the data within the mon's k-v store.
+
+If each host were allowed to write its own metadata to the store,
+``mgr/cephadm`` would no longer be required to gather the data.
+
+Some questions arise:
+
+- ``mgr/cephadm`` now needs to query data from the config-key store,
+  instead of relying on cached data.
+
+- cephadm knows three different types of data: (1) Data that is
+  critical and needs to be stored in the config-key store. (2) Data
+  that can be kept in memory only. (3) Data that can be stored in a
+  RADOS pool. How can we apply this idea to those different types of
+  data?
+
+*******************************
+ Increase the worker pool size
+*******************************
+
+``mgr/cephadm`` is currently able to scrape 10 nodes at the same time.
+
+The scrape of an individual host would still take the same amount of time;
+we'd just reduce the overall execution time.
+
+At best we can reach O(hosts) + O(daemons).
+
+*************************
+ Backwards compatibility
+*************************
+
+Any changes need to be backwards compatible or completely isolated from
+any existing functionality. There are running cephadm clusters out there
+that require an upgrade path.
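The point made under "Increase the worker pool size" (a larger pool shortens the overall refresh, while each individual host scrape takes just as long) can be illustrated with a minimal Python sketch. It assumes a hypothetical per-host ``scrape`` callable and a uniform per-host duration. Neither ``scrape_all`` nor ``estimate_refresh_seconds`` is part of ``mgr/cephadm``; both are illustrative only.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def scrape_all(hosts: Iterable[str],
               scrape: Callable[[str], T],
               workers: int = 10) -> Dict[str, T]:
    """Run a hypothetical per-host `scrape` for every host, at most
    `workers` at a time (10 mirrors the current mgr/cephadm limit).

    The pool only overlaps scrapes; it does not make any single one faster.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with `hosts`.
        results = pool.map(scrape, hosts)
        return dict(zip(hosts, results))


def estimate_refresh_seconds(num_hosts: int,
                             per_host_seconds: float,
                             workers: int = 10) -> float:
    """Rough wall-clock estimate, assuming every scrape takes the same
    time so hosts are processed in full waves of `workers`."""
    waves = math.ceil(num_hosts / workers)
    return waves * per_host_seconds
```

For example, with 1000 hosts at 5 seconds each, ``estimate_refresh_seconds(1000, 5.0)`` gives 500.0 seconds with 10 workers and ``estimate_refresh_seconds(1000, 5.0, workers=50)`` gives 100.0; the 5 second per-host cost is unchanged in both cases.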