#############################################
 Notes and Thoughts on Cephadm's scalability
#############################################

*********************
 About this document
*********************

This document does NOT define a specific proposal or any future work.
Instead, it merely lists a few thoughts that MIGHT be relevant for future
cephadm enhancements.

*******
 Intro
*******

Current situation:

Cephadm manages all registered hosts. This means that it periodically
scrapes data from each host to identify changes on the host, such as:

-  disk added/removed
-  daemon added/removed
-  host network/firewall etc. has changed

Currently, cephadm scrapes each host (up to 10 in parallel) every 6
minutes, unless a refresh is forced manually.

Refreshes for disks (ceph-volume), daemons (podman/docker), etc., happen
in sequence.
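
A rough sketch of this refresh cycle, with hypothetical helper names rather
than the actual ``mgr/cephadm`` code, might look like this:

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   def _refresh_host_daemons(host: str) -> None:
       print(f"{host}: listing daemons via podman/docker")

   def _refresh_host_devices(host: str) -> None:
       print(f"{host}: listing devices via ceph-volume")

   def _refresh_host_facts(host: str) -> None:
       print(f"{host}: gathering network/firewall facts")

   def refresh_host(host: str) -> None:
       # The individual refreshes happen in sequence per host.
       _refresh_host_daemons(host)
       _refresh_host_devices(host)
       _refresh_host_facts(host)

   def refresh_all(hosts: list[str]) -> None:
       # Up to 10 hosts are scraped in parallel.
       with ThreadPoolExecutor(max_workers=10) as pool:
           list(pool.map(refresh_host, hosts))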

With the cephadm exporter, we have now reduced the time to scan hosts
considerably, but the question remains:

Is the cephadm-exporter sufficient to solve all future scalability
issues?

***********************************************
 Considerations of cephadm-exporter's REST API
***********************************************

The cephadm-exporter uses HTTP to serve an endpoint that exposes each
host's metadata. We MIGHT encounter some issues with this approach, which
would need to be mitigated at some point.

-  With the cephadm-exporter we use SSH and HTTP to connect to each
   host. Having two distinct transport layers feels odd, and we might
   want to consider reducing it to only a single protocol.

-  The current approach of delivering ``bin/cephadm`` to the host doesn't
   allow the use of external dependencies. This means that we're stuck
   with the built-in HTTP server lib, which isn't great for providing a
   good developer experience. ``bin/cephadm`` needs to be packaged and
   distributed (one way or the other) for us to make use of a better
   HTTP server library.
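
For illustration, fetching metadata from such an HTTP endpoint could look
roughly like the sketch below; the port, path, and payload layout are
assumptions for this example, not the actual cephadm-exporter API:

.. code-block:: python

   import json
   import ssl
   import urllib.request

   def fetch_host_metadata(host: str, port: int = 9443) -> dict:
       # Port and path are assumptions for this sketch; the real
       # exporter also requires authentication and proper TLS.
       url = f"https://{host}:{port}/v1/metadata"
       ctx = ssl.create_default_context()
       ctx.check_hostname = False
       ctx.verify_mode = ssl.CERT_NONE  # sketch only: skip cert checks
       with urllib.request.urlopen(url, timeout=10, context=ctx) as resp:
           return json.loads(resp.read())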

************************
 MON's config-key store
************************

After ``mgr/cephadm`` has queried the metadata from each host, it stores
the data in the MON's key-value (config-key) store.

If each host were allowed to write its own metadata to the store,
``mgr/cephadm`` would no longer be required to gather the data itself.
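
A minimal sketch of that idea, assuming the host has a keyring that allows
writing config-key entries and using a made-up key naming scheme:

.. code-block:: python

   import json
   import socket
   import subprocess

   def publish_host_metadata(metadata: dict) -> None:
       # The key name below is hypothetical; cephadm's actual key
       # layout may differ.
       key = f"mgr/cephadm/host.{socket.gethostname()}"
       subprocess.run(
           ["ceph", "config-key", "set", key, json.dumps(metadata)],
           check=True,
       )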

Some questions arise:

-  ``mgr/cephadm`` would then need to query data from the config-key
   store instead of relying on its in-memory cache.

-  cephadm knows three different types of data: (1) data that is
   critical and needs to be stored in the config-key store, (2) data
   that can be kept in memory only, and (3) data that can be stored in
   a RADOS pool. How can we apply this idea to those different types
   of data? (See the sketch after this list.)
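
One way to make these classes explicit, purely as an illustration and not
existing cephadm code:

.. code-block:: python

   from enum import Enum, auto

   class StorageClass(Enum):
       CONFIG_KEY = auto()  # critical data, persisted in the config-key store
       MEMORY = auto()      # data that only lives in the mgr's memory
       RADOS = auto()       # data that can live in a RADOS pool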

*******************************
 Increase the worker pool size
*******************************

``mgr/cephadm`` is currently able to scrape 10 nodes at the same time.

Increasing the pool size would not change how long scraping an individual
host takes; it would only reduce the overall execution time.

At best we can reach O(hosts) + O(daemons).
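
A back-of-the-envelope model of that effect, with made-up numbers and
assuming hosts are processed in waves of ``pool_size``:

.. code-block:: python

   import math

   def estimated_refresh_time(num_hosts: int, per_host_seconds: float,
                              pool_size: int) -> float:
       # Upper bound: hosts are scraped in waves of `pool_size`.
       return math.ceil(num_hosts / pool_size) * per_host_seconds

   # Made-up example: 500 hosts, ~30 s per scrape.
   print(estimated_refresh_time(500, 30.0, pool_size=10))  # 1500.0 s
   print(estimated_refresh_time(500, 30.0, pool_size=50))  # 300.0 s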

*************************
 Backwards compatibility
*************************

Any changes need to be backwards compatible or completely isolated from
any existing functionality. There are running cephadm clusters out there
that require an upgrade path.