.. _monitoring:

===================
Monitoring overview
===================

The aim of this part of the documentation is to explain the Ceph monitoring
stack and the meaning of the main Ceph metrics.

With a good understanding of the Ceph monitoring stack and metrics, users can
create customized monitoring tools, such as Prometheus queries, Grafana
dashboards, or scripts.


Ceph Monitoring stack
=====================

Ceph provides a default monitoring stack which is installed by cephadm and
explained in the :ref:`Monitoring Services <mgr-cephadm-monitoring>` section of
the cephadm documentation.


Ceph metrics
============

The main source of Ceph metrics is the set of performance counters exposed by
each Ceph daemon. The :doc:`../dev/perf_counters` are native Ceph monitoring
data.

Performance counters are transformed into standard Prometheus metrics by the
Ceph exporter daemon. This daemon runs on every Ceph cluster host and exposes a
metrics endpoint where the performance counters of all the Ceph daemons running
on that host are published in the form of Prometheus metrics.

In addition to the Ceph exporter, there is another agent that exposes Ceph
metrics: the Prometheus manager module, which exposes metrics related to the
whole cluster, that is, metrics that are not produced by individual Ceph
daemons.

The main source for obtaining Ceph metrics is the metrics endpoint exposed by
the cluster Prometheus server. Ceph can provide you with the Prometheus
endpoint, where you can obtain the complete list of metrics (coming from the
Ceph exporter daemons and the Prometheus manager module) and execute queries.

Use the following command to obtain the Prometheus server endpoint in your
cluster:

Example:

.. code-block:: bash

  # ceph orch ps --service_name prometheus
  NAME                         HOST                          PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
  prometheus.cephtest-node-00  cephtest-node-00.cephlab.com  *:9095  running (103m)    50s ago   5w     142M        -  2.33.4   514e6a882f6e  efe3cbc2e521

With this information you can connect to
``http://cephtest-node-00.cephlab.com:9095`` to access the Prometheus server
interface.

The complete list of metrics (with help text) for your cluster is available
at:

``http://cephtest-node-00.cephlab.com:9095/api/v1/targets/metadata``
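
As a rough illustration of how a script might consume that endpoint, the
following Python sketch extracts metric names and help strings from a response
body. It assumes the response layout described in the Prometheus HTTP API
documentation (a ``data`` list whose entries carry ``metric``, ``type`` and
``help`` fields). The sample payload, including its help strings, is
hand-written for illustration, and the function name ``metric_help`` is
hypothetical.

.. code-block:: python

  import json

  # Hand-written sample shaped like a /api/v1/targets/metadata response.
  # Assumed layout and invented help strings; a real script would fetch
  # the body from the Prometheus server instead.
  SAMPLE = """
  {"status": "success",
   "data": [
     {"target": {"instance": "cephtest-node-00.cephlab.com:9283", "job": "ceph"},
      "metric": "ceph_osd_op_r", "type": "counter",
      "help": "Client read operations", "unit": ""},
     {"target": {"instance": "cephtest-node-00.cephlab.com:9283", "job": "ceph"},
      "metric": "ceph_osd_op_w", "type": "counter",
      "help": "Client write operations", "unit": ""}
   ]}
  """


  def metric_help(body):
      """Return a {metric name: help text} mapping from a response body."""
      payload = json.loads(body)
      if payload.get("status") != "success":
          raise ValueError("Prometheus reported an error")
      return {entry["metric"]: entry["help"] for entry in payload["data"]}


  for name, text in sorted(metric_help(SAMPLE).items()):
      print(f"{name}: {text}")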


Note that the main tool for observing and monitoring a Ceph cluster is the
**Ceph dashboard**. It provides graphs of the most important cluster and
service metrics. Most of the examples in this document are extracted from the
dashboard graphs or extrapolated from the metrics exposed by the Ceph
dashboard.


Performance metrics
===================

This section describes the main metrics used to measure Ceph cluster
performance.

All metrics have the following labels:

- ``ceph_daemon``: identifier of the OSD daemon generating the metric
- ``instance``: the IP address of the Ceph exporter instance exposing the
  metric
- ``job``: Prometheus scrape job

Example:

.. code-block:: bash

  ceph_osd_op_r{ceph_daemon="osd.0", instance="192.168.122.7:9283", job="ceph"} = 73981

*Cluster I/O (throughput):*
Use ``ceph_osd_op_r_out_bytes`` and ``ceph_osd_op_w_in_bytes`` to obtain the
cluster throughput generated by clients.

Example:

.. code-block:: bash

  Writes (B/s):
  sum(irate(ceph_osd_op_w_in_bytes[1m]))

  Reads (B/s):
  sum(irate(ceph_osd_op_r_out_bytes[1m]))


*Cluster I/O (operations):*
Use ``ceph_osd_op_r`` and ``ceph_osd_op_w`` to obtain the number of operations
generated by clients.

Example:

.. code-block:: bash

  Writes (ops/s):
  sum(irate(ceph_osd_op_w[1m]))

  Reads (ops/s):
  sum(irate(ceph_osd_op_r[1m]))

*Latency:*
Use ``ceph_osd_op_latency_sum``, which represents the delay before an OSD data
transfer begins following a client instruction for that transfer.

Example:

.. code-block:: bash

  sum(irate(ceph_osd_op_latency_sum[1m]))
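
Latency counters come in ``_sum`` and ``_count`` pairs, and the per-OSD read
latency query in the next section divides the rate of one by the rate of the
other. The arithmetic behind that division can be sketched in Python as
follows. The sample values and the helper name ``average_latency`` are
hypothetical, and the sketch only compares two samples (much as ``irate``
looks at the last two points in the range).

.. code-block:: python

  def average_latency(prev, last):
      """Average seconds per operation between two (sum, count) samples."""
      d_sum = last[0] - prev[0]
      d_count = last[1] - prev[1]
      if d_count <= 0:
          # No operation completed in the interval (or the counter was reset).
          return 0.0
      return d_sum / d_count


  # Hypothetical samples: (latency sum in seconds, operation count).
  prev = (120.0, 40000)
  last = (121.5, 40300)

  # 1.5 seconds spread over 300 operations, about 5 ms per operation.
  print(average_latency(prev, last))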


OSD performance
===============

The cluster performance metrics explained above are based on OSD metrics. By
selecting the right label, we can obtain the same performance information for
a single OSD:

Example:

.. code-block:: bash

  OSD 0 read latency
  irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"osd.0"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])

  OSD 0 write IOPS
  irate(ceph_osd_op_w{ceph_daemon=~"osd.0"}[1m])

  OSD 0 write throughput (bytes)
  irate(ceph_osd_op_w_in_bytes{ceph_daemon=~"osd.0"}[1m])

  OSD.0 total raw capacity available
  ceph_osd_stat_bytes{ceph_daemon="osd.0", instance="cephtest-node-00.cephlab.com:9283", job="ceph"} = 536451481


Physical disk performance
=========================

By combining Prometheus ``node_exporter`` metrics with Ceph metrics, we can
obtain information about the performance of the physical disks used by OSDs.

Example:

.. code-block:: bash

  Read latency of device used by OSD 0:
  label_replace(irate(node_disk_read_time_seconds_total[1m]) / irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

  Write latency of device used by OSD 0
  label_replace(irate(node_disk_write_time_seconds_total[1m]) / irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

  IOPS (device used by OSD.0)
  reads:
  label_replace(irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

  writes:
  label_replace(irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

  Throughput (device used by OSD.0)
  reads:
  label_replace(irate(node_disk_read_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

  writes:
  label_replace(irate(node_disk_written_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

  Physical Device Utilization (%) for OSD.0 in the last 5 minutes
  label_replace(irate(node_disk_io_time_seconds_total[5m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
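
The queries above rely on ``label_replace`` to make the ``instance`` and
``device`` labels of both metric families comparable before joining them. The
Python sketch below imitates what the two regular expressions do to a single
label value. It is only an approximation: the helper name
``label_replace_value`` is hypothetical, only the ``"$1"`` replacement is
handled, and the port ``9100`` in the sample value is an assumption.

.. code-block:: python

  import re


  def label_replace_value(value, regex):
      """Imitate label_replace(..., "$1", ..., regex) on one label value.

      PromQL anchors the regex to the whole value, hence fullmatch. When
      the regex does not match, the value is left unchanged.
      """
      match = re.fullmatch(regex, value)
      return match.group(1) if match else value


  # Strip the domain and port so both sides carry a short host name.
  print(label_replace_value("cephtest-node-00.cephlab.com:9100", r"([^:.]*).*"))

  # Strip the /dev/ prefix so the device matches node_exporter's label.
  print(label_replace_value("/dev/sda", r"/dev/(.*)"))

Because ``[^:.]*`` stops at the first dot, an ``instance`` label holding an IP
address would be reduced to its first octet, so this pattern is best suited to
host names.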

Pool metrics
============

These metrics have the following labels:

- ``instance``: the IP address of the Ceph exporter daemon producing the
  metric
- ``pool_id``: identifier of the pool
- ``job``: Prometheus scrape job


- ``ceph_pool_metadata``: Information about the pool. It can be used together
  with other metrics to provide more contextual information in queries and
  graphs. Apart from the three common labels, this metric provides the
  following extra labels:

  - ``compression_mode``: compression used in the pool (lz4, snappy, zlib,
    zstd, none). Example: compression_mode="none"

  - ``description``: brief description of the pool type (replica:number of
    replicas or Erasure code: ec profile). Example: description="replica:3"
  - ``name``: name of the pool. Example: name=".mgr"
  - ``type``: type of pool (replicated/erasure code). Example: type="replicated"

- ``ceph_pool_bytes_used``: Total raw capacity consumed per pool by user data
  and associated overheads (metadata + redundancy)

- ``ceph_pool_stored``: Total of CLIENT data stored in the pool

- ``ceph_pool_compress_under_bytes``: Data eligible to be compressed in the pool

- ``ceph_pool_compress_bytes_used``:  Data compressed in the pool

- ``ceph_pool_rd``: CLIENT read operations per pool (reads per second)

- ``ceph_pool_rd_bytes``: CLIENT read operations in bytes per pool

- ``ceph_pool_wr``: CLIENT write operations per pool (writes per second)

- ``ceph_pool_wr_bytes``: CLIENT write operations in bytes per pool


**Useful queries**:

.. code-block:: bash

  Total raw capacity available in the cluster:
  sum(ceph_osd_stat_bytes)

  Total raw capacity consumed in the cluster (including metadata + redundancy):
  sum(ceph_pool_bytes_used)

  Total of CLIENT data stored in the cluster:
  sum(ceph_pool_stored)

  Compression savings:
  sum(ceph_pool_compress_under_bytes - ceph_pool_compress_bytes_used)

  CLIENT IOPS for a pool (testrbdpool)
  reads: irate(ceph_pool_rd[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
  writes: irate(ceph_pool_wr[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}

  CLIENT Throughput for a pool
  reads: irate(ceph_pool_rd_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
  writes: irate(ceph_pool_wr_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
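
The per-pool queries above multiply a rate by ``ceph_pool_metadata`` matched
on ``pool_id``, which both filters by pool name and copies the ``name`` label
onto the result. A simplified Python sketch of that join is shown below. The
rates, pool IDs and the helper name ``join_on_pool_id`` are hypothetical, and
the sketch assumes the metadata series always has the value 1.

.. code-block:: python

  # Hypothetical per-second read rates keyed by pool_id, standing in for
  # the result of irate(ceph_pool_rd[1m]).
  read_rates = {"1": 12.0, "2": 250.0}

  # Stand-in for ceph_pool_metadata: assumed value 1, labels carry the name.
  pool_metadata = [
      {"pool_id": "1", "name": ".mgr", "value": 1.0},
      {"pool_id": "2", "name": "testrbdpool", "value": 1.0},
  ]


  def join_on_pool_id(rates, metadata, name):
      """Keep rates whose pool_id matches a metadata entry with this name."""
      result = []
      for meta in metadata:
          if meta["name"] != name:
              continue
          pool_id = meta["pool_id"]
          if pool_id in rates:
              # Multiplying by 1 keeps the rate; the name label is copied over.
              result.append({
                  "pool_id": pool_id,
                  "name": meta["name"],
                  "value": rates[pool_id] * meta["value"],
              })
      return result


  print(join_on_pool_id(read_rates, pool_metadata, "testrbdpool"))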

Object metrics
==============

These metrics have the following labels:

- ``instance``: the IP address of the Ceph exporter daemon providing the
  metric
- ``instance_id``: identifier of the RGW daemon
- ``job``: Prometheus scrape job

Example:

.. code-block:: bash

  ceph_rgw_req{instance="192.168.122.7:9283", instance_id="154247", job="ceph"} = 12345


Generic metrics
---------------
- ``ceph_rgw_metadata``: Provides generic information about the RGW daemon.  It
  can be used together with other metrics to provide more contextual
  information in queries and graphs. Apart from the three common labels, this
  metric provides the following extra labels:

  - ``ceph_daemon``: Name of the Ceph daemon. Example:
    ceph_daemon="rgw.rgwtest.cephtest-node-00.sxizyq"
  - ``ceph_version``: Version of Ceph daemon. Example: ceph_version="ceph
    version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)"
  - ``hostname``: Name of the host where the daemon runs. Example:
    hostname="cephtest-node-00.cephlab.com"

- ``ceph_rgw_req``: Total number of requests for the daemon (GET+PUT+DELETE).
  Useful to detect bottlenecks and optimize load distribution.

- ``ceph_rgw_qlen``: RGW operations queue length for the daemon.
  Useful to detect bottlenecks and optimize load distribution.

- ``ceph_rgw_failed_req``: Aborted requests.
  Useful to detect daemon errors.


GET operations: related metrics
-------------------------------
- ``ceph_rgw_get_initial_lat_count``: Number of GET operations

- ``ceph_rgw_get_initial_lat_sum``: Total latency time for the GET operations

- ``ceph_rgw_get``: Total number of GET requests

- ``ceph_rgw_get_b``: Total bytes transferred in GET operations


PUT operations: related metrics
-------------------------------
- ``ceph_rgw_put_initial_lat_count``: Number of PUT operations

- ``ceph_rgw_put_initial_lat_sum``: Total latency time for the PUT operations

- ``ceph_rgw_put``: Total number of PUT operations

- ``ceph_rgw_put_b``: Total bytes transferred in PUT operations


Useful queries
--------------

.. code-block:: bash

  The average of get latencies:
  rate(ceph_rgw_get_initial_lat_sum[30s]) / rate(ceph_rgw_get_initial_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

  The average of put latencies:
  rate(ceph_rgw_put_initial_lat_sum[30s]) / rate(ceph_rgw_put_initial_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

  Total requests per second:
  rate(ceph_rgw_req[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

  Total number of "other" operations (LIST, DELETE)
  rate(ceph_rgw_req[30s]) -  (rate(ceph_rgw_get[30s]) + rate(ceph_rgw_put[30s]))

  Bandwidth consumed by GET operations
  sum(rate(ceph_rgw_get_b[30s]))

  Bandwidth consumed by PUT operations
  sum(rate(ceph_rgw_put_b[30s]))

  Bandwidth consumed by RGW instance (PUTs + GETs)
  sum by (instance_id) (rate(ceph_rgw_get_b[30s]) + rate(ceph_rgw_put_b[30s])) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

  HTTP errors:
  rate(ceph_rgw_failed_req[30s])
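
The "other" operations query above subtracts the GET and PUT rates from the
total request rate. The arithmetic can be sketched in Python as follows. The
counter samples are invented, and ``simple_rate`` is a hypothetical helper
that ignores the counter-reset handling and extrapolation that the Prometheus
``rate`` function performs.

.. code-block:: python

  def simple_rate(first, last, seconds):
      """Per-second increase between two counter samples (no reset handling)."""
      return (last - first) / seconds


  WINDOW = 30  # seconds, matching the [30s] range used in the queries

  # Invented first and last samples of each counter within the window.
  req_rate = simple_rate(12000, 12345, WINDOW)  # all requests
  get_rate = simple_rate(7000, 7210, WINDOW)    # GET requests
  put_rate = simple_rate(3000, 3090, WINDOW)    # PUT requests

  # Whatever is neither GET nor PUT (for example LIST or DELETE).
  other_rate = req_rate - (get_rate + put_rate)
  print(other_rate)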


Filesystem Metrics
==================

These metrics have the following labels:

- ``ceph_daemon``: the name of the MDS daemon
- ``instance``: the IP address (and port) of the Ceph exporter daemon exposing
  the metric
- ``job``: Prometheus scrape job

Example:

.. code-block:: bash

  ceph_mds_request{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", instance="192.168.122.7:9283", job="ceph"} = 1452


Main metrics
------------

- ``ceph_mds_metadata``: Provides general information about the MDS daemon.  It
  can be used together with other metrics to provide more contextual
  information in queries and graphs.  It provides the following extra labels:

  - ``ceph_version``: MDS daemon Ceph version
  - ``fs_id``: filesystem cluster id
  - ``hostname``: Host name where the MDS daemon runs
  - ``public_addr``: Public address where the MDS daemon runs
  - ``rank``: Rank of the MDS daemon

Example:

.. code-block:: bash

 ceph_mds_metadata{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", ceph_version="ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", fs_id="-1", hostname="cephtest-node-00.cephlab.com", instance="cephtest-node-00.cephlab.com:9283", job="ceph", public_addr="192.168.122.145:6801/118896446", rank="-1"}


- ``ceph_mds_request``: Total number of requests for the MDS daemon

- ``ceph_mds_reply_latency_sum``: Reply latency total

- ``ceph_mds_reply_latency_count``: Reply latency count

- ``ceph_mds_server_handle_client_request``: Number of client requests

- ``ceph_mds_sessions_session_count``: Session count

- ``ceph_mds_sessions_total_load``: Total load

- ``ceph_mds_sessions_sessions_open``: Sessions currently open

- ``ceph_mds_sessions_sessions_stale``: Sessions currently stale

- ``ceph_objecter_op_r``: Number of read operations

- ``ceph_objecter_op_w``: Number of write operations

- ``ceph_mds_root_rbytes``: Total number of bytes managed by the daemon

- ``ceph_mds_root_rfiles``: Total number of files managed by the daemon


Useful queries
--------------

.. code-block:: bash

  Total MDS daemons read workload:
  sum(rate(ceph_objecter_op_r[1m]))

  Total MDS daemons write workload:
  sum(rate(ceph_objecter_op_w[1m]))

  MDS daemon read workload: (daemon name is "mdstest")
  sum(rate(ceph_objecter_op_r{ceph_daemon=~"mdstest"}[1m]))

  MDS daemon write workload: (daemon name is "mdstest")
  sum(rate(ceph_objecter_op_w{ceph_daemon=~"mdstest"}[1m]))

  The average of reply latencies:
  rate(ceph_mds_reply_latency_sum[30s]) / rate(ceph_mds_reply_latency_count[30s])

  Total requests per second:
  rate(ceph_mds_request[30s]) * on (instance) group_right (ceph_daemon) ceph_mds_metadata


Block metrics
=============

By default, RBD metrics for images are not available, in order to provide the
best performance in the Prometheus manager module.

To produce metrics for RBD images, you must properly configure the manager
option ``mgr/prometheus/rbd_stats_pools``. For more information, see
:ref:`prometheus-rbd-io-statistics`.


These metrics have the following labels:

- ``image``: Name of the image which produces the metric value.
- ``instance``: Node where the RBD metric is produced. (It points to the Ceph
  exporter daemon.)
- ``job``: Name of the Prometheus scrape job.
- ``pool``: Image pool name.

Example:

.. code-block:: bash

  ceph_rbd_read_bytes{image="test2", instance="cephtest-node-00.cephlab.com:9283", job="ceph", pool="testrbdpool"}


Main metrics
------------

- ``ceph_rbd_read_bytes``: RBD image bytes read

- ``ceph_rbd_read_latency_count``: RBD image reads latency count

- ``ceph_rbd_read_latency_sum``: RBD image reads latency total

- ``ceph_rbd_read_ops``: RBD image reads count

- ``ceph_rbd_write_bytes``: RBD image bytes written

- ``ceph_rbd_write_latency_count``: RBD image writes latency count

- ``ceph_rbd_write_latency_sum``: RBD image writes latency total

- ``ceph_rbd_write_ops``: RBD image writes count


Useful queries
--------------

.. code-block:: bash

  The average of read latencies:
  rate(ceph_rbd_read_latency_sum[30s]) / rate(ceph_rbd_read_latency_count[30s])


Hardware monitoring
===================

See :ref:`hardware-monitoring`