=======================
Developing with cephadm
=======================

There are several ways to develop with cephadm.  Which you use depends
on what you're trying to accomplish.

vstart --cephadm
================

- Start a cluster with vstart, with cephadm configured
- Manage any additional daemons with cephadm
- Requires compiled ceph binaries

In this case, the mon and manager at a minimum are running in the usual
vstart way, not managed by cephadm.  But cephadm is enabled and the local
host is added, so you can deploy additional daemons or add additional hosts.

This works well for developing cephadm itself, because any mgr/cephadm
or cephadm/cephadm code changes can be applied by kicking ceph-mgr
with ``ceph mgr fail x``.  (When the mgr (re)starts, it loads the
cephadm/cephadm script into memory.)

::

   MON=1 MGR=1 OSD=0 MDS=0 ../src/vstart.sh -d -n -x --cephadm

- ``~/.ssh/id_dsa[.pub]`` is used as the cluster key.  It is assumed that
  this key is authorized to ssh with no passphrase to root@`hostname`.
- cephadm does not try to manage any daemons started by vstart.sh (any
  nonzero number in the environment variables).  No service spec is defined
  for mon or mgr.
- You'll see health warnings from cephadm about stray daemons; that's because
  the vstart-launched daemons aren't controlled by cephadm.
- The default image is ``quay.io/ceph-ci/ceph:main``, but you can change
  this by passing ``-o container_image=...`` or ``ceph config set global container_image ...``.


cstart and cpatch
=================

The ``cstart.sh`` script will launch a cluster using cephadm and put the
conf and keyring in your build dir, so that the ``bin/ceph ...`` CLI works
(just like with vstart).  The ``ckill.sh`` script will tear it down.

- A unique but stable fsid is stored in ``fsid`` (in the build dir).
- The mon port is random, just like with vstart.
- The container image is ``quay.io/ceph-ci/ceph:$tag`` where $tag is
  the first 8 chars of the fsid.
- If the container image doesn't exist yet when you run cstart for the
  first time, it is built with cpatch.
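
The image tag can be derived from the stored fsid with plain shell.  A quick
sketch (the fsid value below is made up; the real one is written to the
``fsid`` file in the build dir)::

  # take the first 8 chars of the fsid to form the image tag
  fsid="8f509f4e-1b2c-4d5e-8f90-123456789abc"   # made-up example value
  tag="${fsid:0:8}"
  echo "quay.io/ceph-ci/ceph:$tag"              # quay.io/ceph-ci/ceph:8f509f4e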

There are a few advantages here:

- The cluster is a "normal" cephadm cluster that looks and behaves
  just like a user's cluster would.  In contrast, vstart and teuthology
  clusters tend to be special in subtle (and not-so-subtle) ways (e.g.
  having the ``lockdep`` turned on).

To start a test cluster::

  sudo ../src/cstart.sh

The last line of the output will be a line you can cut+paste to update
the container image.  For instance::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e

By default, cpatch will patch everything it can think of from the local
build dir into the container image.  If you are working on a specific
part of the system, though, you can get away with smaller changes so that
cpatch runs faster.  For instance::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --py

will update the mgr modules (minus the dashboard).  Or::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --core

will do most binaries and libraries.  Pass ``-h`` to cpatch for all options.

Once the container is updated, you can refresh/restart daemons by bouncing
them with::

  sudo systemctl restart ceph-`cat fsid`.target

When you're done, you can tear down the cluster with::

  sudo ../src/ckill.sh   # or,
  sudo ../src/cephadm/cephadm rm-cluster --force --fsid `cat fsid`

cephadm bootstrap --shared_ceph_folder
======================================

Cephadm can also be used directly without compiled ceph binaries.

Run cephadm like so::

  sudo ./cephadm bootstrap --mon-ip 127.0.0.1 \
    --ssh-private-key /home/<user>/.ssh/id_rsa \
    --skip-mon-network \
    --skip-monitoring-stack --single-host-defaults \
    --skip-dashboard \
    --shared_ceph_folder /home/<user>/path/to/ceph/

- ``~/.ssh/id_rsa`` is used as the cluster key.  It is assumed that
  this key is authorized to ssh with no passphrase to root@`hostname`.

Source code changes made in the ``pybind/mgr/`` directory then
require a daemon restart to take effect.

Kcli: a virtualization management tool for easy orchestrator development
========================================================================
`Kcli <https://github.com/karmab/kcli>`_ is meant to interact with existing
virtualization providers (libvirt, KubeVirt, oVirt, OpenStack, VMware vSphere,
GCP and AWS) and to easily deploy and customize VMs from cloud images.

It allows you to set up an environment with several VMs with your preferred
configuration (memory, CPUs, disks) and OS flavor.

Main advantages:
----------------
  - Fast. Typically you can have a completely new Ceph cluster ready to debug
    and develop orchestrator features in less than 5 minutes.
  - "Close to production" lab. The resulting lab is close to "real" clusters
    in QE labs or even production. It makes it easy to test "real things" in
    an almost "real" environment.
  - Safe and isolated. It does not depend on the things you have installed on
    your machine, and the VMs are isolated from your environment.
  - Easy "dev" environment. For "not compiled" software pieces, for example
    any mgr module, it is an environment that allows you to test your
    changes interactively.

Installation:
-------------
Complete documentation is available in the `kcli installation <https://kcli.readthedocs.io/en/latest/#installation>`_
guide, but we suggest using the container image approach.

So things to do:
  - 1. Review `requirements <https://kcli.readthedocs.io/en/latest/#libvirt-hypervisor-requisites>`_
    and install/configure whatever is needed to meet them.
  - 2. Get the kcli image and create an alias for executing the kcli command
    ::

        # podman pull quay.io/karmab/kcli
        # alias kcli='podman run --net host -it --rm --security-opt label=disable -v $HOME/.ssh:/root/.ssh -v $HOME/.kcli:/root/.kcli -v /var/lib/libvirt/images:/var/lib/libvirt/images -v /var/run/libvirt:/var/run/libvirt -v $PWD:/workdir -v /var/tmp:/ignitiondir quay.io/karmab/kcli'

.. note:: This assumes that /var/lib/libvirt/images is your default libvirt pool. Adjust if you are using a different path.

.. note:: Once you have used kcli to create and use different labs, we
   suggest you stick to a given container tag and update your kcli alias.
   Why? kcli uses a rolling release model, and sticking to a specific
   container tag will improve overall stability.

Test your kcli installation:
----------------------------
See the kcli `basic usage workflow <https://kcli.readthedocs.io/en/latest/#basic-workflow>`_

Create a Ceph lab cluster
-------------------------
In order to make this task simple, we are going to use a "plan".

A "plan" is a file where you can define a set of VMs with different settings.
You can define hardware parameters (CPU, memory, disks, ...) and the operating
system, and it also allows you to automate the installation and configuration
of any software you want to have.
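
For orientation only, a minimal plan might look like the sketch below.  The VM
name and values here are hypothetical, not taken from the actual
``ceph_cluster.yml``::

  # hypothetical-plan.yml (illustrative only)
  ceph-node-00:
    image: centos8stream
    numcpus: 2
    memory: 4096
    disks:
      - 30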

There is a `repository <https://github.com/karmab/kcli-plans>`_ with a collection of
plans that can be used for different purposes. It includes predefined plans to
install Ceph clusters using ceph-ansible or cephadm, so let's create our first Ceph
cluster using cephadm::

# kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml

This will create a set of three VMs using the plan file pointed to by the URL.
After a few minutes, let's check the cluster:

* Take a look at the VMs created::

  # kcli list vms

* Enter the bootstrap node::

  # kcli ssh ceph-node-00

* Take a look at the Ceph cluster installed::

  [centos@ceph-node-00 ~]$ sudo -i
  [root@ceph-node-00 ~]# cephadm version
  [root@ceph-node-00 ~]# cephadm shell
  [ceph: root@ceph-node-00 /]# ceph orch host ls

Create a Ceph cluster for easy development of mgr modules (Orchestrators and Dashboard)
------------------------------------------------------------------------------------------
The cephadm kcli plan (and cephadm itself) are prepared to do this.

The idea behind this method is to replace several python mgr folders in each of
the Ceph daemons with the source code folders on your host machine.
This "trick" allows you to make changes in any orchestrator or dashboard
module and test them immediately (you only need to disable/enable the mgr module).

So, in order to create a Ceph cluster for development purposes, use the same
cephadm plan but with an additional parameter pointing to your Ceph source code folder::

  # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml -P ceph_dev_folder=/home/mycodefolder/ceph

Ceph Dashboard development
--------------------------
The Ceph Dashboard module will not load if you have not previously
generated the frontend bundle.

For now, in order to properly load the Ceph Dashboard module and to apply
frontend changes, you have to run "ng build" on your laptop::

  # Start local frontend build with watcher (in background):
  sudo dnf install -y nodejs
  cd <path-to-your-ceph-repo>
  cd src/pybind/mgr/dashboard/frontend
  sudo chown -R <your-user>:root dist node_modules
  NG_CLI_ANALYTICS=false npm ci
  npm run build -- --deleteOutputPath=false --watch &

After saving your changes, the frontend bundle will be built again.
When completed, you'll see::

  "Localized bundle generation complete."

Then you can reload your Dashboard browser tab.

Cephadm box container (Podman inside Podman) development environment
====================================================================

As kcli has a long startup time, we created a faster alternative using
Podman inside Podman. This approach has its downsides too, as we have to
simulate the creation of OSDs and the addition of devices with loopback devices.

Cephadm's box environment is simple to set up. The setup requires you to
get the required Podman images for Ceph and what we call boxes.
A box is the first layer of Podman containers which can be either a seed or a
host. A seed is the main box which holds Cephadm and where you bootstrap the
cluster. On the other hand, you have hosts with an SSH server set up so you can
add those hosts to the cluster. The second layer, managed by Cephadm inside the
seed box, requires the Ceph image.

.. warning:: This development environment is still experimental and can have unexpected
             behaviour. Please take a look at the road map and the known issues sections
             to see the development progress.

Requirements
------------

* `podman-compose <https://github.com/containers/podman-compose>`_
* lvm

Setup
-----

In order to set up Cephadm's box, run::

  cd src/cephadm/box
  ./box.py -v cluster setup

.. note:: It is recommended to run box with verbose (-v) as it will show the output of
          shell commands being run.

After getting all needed images we can create a simple cluster without OSDs and hosts with::

  ./box.py -v cluster start

If you want to deploy the cluster with more OSDs and hosts::

  # 3 osds and 3 hosts by default
  sudo ./box.py -v cluster start --extended
  # explicitly change number of hosts and osds
  sudo ./box.py -v cluster start --extended --osds 5 --hosts 5

.. warning:: OSDs are still not supported in the box implementation with Podman. It is
             work in progress.


Without the extended option, explicitly adding either more hosts or OSDs won't change the state
of the cluster.

.. note:: ``cluster start`` will try to set up the cluster even if ``cluster setup``
   was not called.
.. note:: OSDs are created with loopback devices, and hence sudo is needed to
   create loopback devices capable of holding OSDs.
.. note:: Each OSD will require 5 GiB of space.
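
The loopback backing can be sketched with plain shell.  This sketch only
creates the sparse backing file; actually attaching it as a block device with
``losetup`` requires root, which is why sudo is needed::

  # create a sparse 5 GiB file that could back a loopback OSD device
  truncate -s 5G /tmp/osd-backing.img
  stat -c %s /tmp/osd-backing.img     # 5368709120 (bytes), but ~0 used on disk
  # attaching it as a block device requires root:
  #   sudo losetup -f --show /tmp/osd-backing.img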

After bootstrapping the cluster you can go inside the seed box in which you'll be
able to run Cephadm commands::

  ./box.py -v cluster bash
  [root@8d52a7860245] cephadm --help
  [root@8d52a7860245] cephadm shell
  ...


If you want to navigate to the dashboard, enter https://localhost:8443 in your browser.

You can also find the hostname and ip of each box container with::

  ./box.py cluster list

and you'll see something like::

  IP               Name            Hostname
  172.30.0.2       box_hosts_1     6283b7b51d91
  172.30.0.3       box_hosts_3     3dcf7f1b25a4
  172.30.0.4       box_seed_1      8d52a7860245
  172.30.0.5       box_hosts_2     c3c7b3273bf1

To remove the cluster and clean up run::

  ./box.py cluster down

If you just want to clean up the last cluster created run::

  ./box.py cluster cleanup

To check all available commands run::

  ./box.py --help

If you want to run the box with Docker you can. You'll have to specify which
engine you want to use, like so::

  ./box.py -v --engine docker cluster list

With Docker, commands like bootstrap and OSD creation should be called with
sudo, since they require privileges to create OSDs on VGs and LVs::

  sudo ./box.py -v --engine docker cluster start --extended

.. warning:: Using Docker as the box engine is dangerous as there were some instances
             where the Xorg session was killed.

Known issues
------------

* If you get permission issues with Cephadm because it cannot infer the keyring
  and configuration, please run cephadm like in this example::

    cephadm shell --config /etc/ceph/ceph.conf --keyring /etc/ceph/ceph.keyring

* Docker containers run with the --privileged flag enabled which has been seen
  to make some computers log out.
* If SELinux is not disabled, you'll probably see unexpected behaviour. For
  example, if not all permissions of the Ceph repo files are set to your user,
  it will probably fail to start with podman-compose.
* If a command fails to run a podman command because it couldn't find the
  container, you can debug by running the same ``podman-compose .. up`` command
  displayed, with the flag ``-v``.

Road map
------------

* Create osds with ``ceph-volume raw``.
* Enable ceph-volume to mark loopback devices as a valid block device in
  the inventory.
* Make the box ready to run dashboard CI tests (including cluster expansion).

Note regarding network calls from CLI handlers
==============================================

Executing any cephadm CLI commands like ``ceph orch ls`` will block the
mon command handler thread within the MGR, thus preventing any concurrent
CLI calls. Note that pressing ``^C`` will not resolve this situation,
as *only* the client will be aborted, but not execution of the command
within the orchestrator manager module itself. This means cephadm will
be completely unresponsive until the execution of the CLI handler is
fully completed. Note that even ``ceph orch ps`` will not respond while
another handler is executing.

This means we should do very few synchronous calls to remote hosts.
As a guideline, cephadm should do at most ``O(1)`` network calls in CLI handlers.
Everything else should be done asynchronously in other threads, like ``serve()``.

Note regarding different variables used in the code
===================================================

* a ``service_type`` is something like mon, mgr, alertmanager etc., defined
  in ``ServiceSpec``
* a ``service_id`` is the name of the service. Some services don't have
  names.
* a ``service_name`` is ``<service_type>.<service_id>``
* a ``daemon_type`` is the same as the service_type, except for ingress,
  which has the haproxy and keepalived daemon types.
* a ``daemon_id`` is typically ``<service_id>.<hostname>.<random-string>``.
  (Not the case for e.g. OSDs. OSDs are always called OSD.N)
* a ``daemon_name`` is ``<daemon_type>.<daemon_id>``
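
The composition rules above can be illustrated with plain shell.  The ingress
service ``rgw.foo`` and the hostname/random-string parts are made-up example
values::

  service_type="ingress"
  service_id="rgw.foo"
  service_name="${service_type}.${service_id}"   # ingress.rgw.foo
  # ingress is the exception: its daemons are haproxy/keepalived
  daemon_type="haproxy"
  daemon_id="${service_id}.host1.abcxyz"         # <service_id>.<hostname>.<random-string>
  daemon_name="${daemon_type}.${daemon_id}"      # haproxy.rgw.foo.host1.abcxyz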

.. _compiling-cephadm:

Compiling cephadm
=================

Recent versions of cephadm are based on `Python Zip Application`_ support, and
are "compiled" from Python source code files in the ceph tree. To create your
own copy of the cephadm "binary" use the script located at
``src/cephadm/build.py`` in the Ceph tree.  The command should take the form
``./src/cephadm/build.py [output]``.
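
The zipapp mechanism itself can be sketched with the standard library alone.
Here ``/tmp/demoapp`` is a hypothetical stand-in for the cephadm sources, not
the actual build::

  # build and run a tiny PEP 441 zip application
  mkdir -p /tmp/demoapp
  printf 'print("hello from a zipapp")\n' > /tmp/demoapp/__main__.py
  python3 -m zipapp /tmp/demoapp -o /tmp/demoapp.pyz
  python3 /tmp/demoapp.pyz    # hello from a zipapp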

.. _Python Zip Application: https://peps.python.org/pep-0441/