1 files changed, 474 insertions, 0 deletions
diff --git a/doc/monitoring/index.rst b/doc/monitoring/index.rst
new file mode 100644
index 000000000..2bf2aca90
--- /dev/null
+++ b/doc/monitoring/index.rst
@@ -0,0 +1,474 @@
+.. _monitoring:
+
+===================
+Monitoring overview
+===================
+
+The aim of this part of the documentation is to explain the Ceph monitoring
+stack and the meaning of the main Ceph metrics.
+
+With a good understand of the Ceph monitoring stack and metrics users can
+create customized monitoring tools, like Prometheus queries, Grafana
+dashboards, or scripts.
+
+
+Ceph Monitoring stack
+=====================
+
+Ceph provides a default monitoring stack wich is installed by cephadm and
+explained in the :ref:`Monitoring Services <mgr-cephadm-monitoring>` section of
+the cephadm documentation.
+
+
+Ceph metrics
+============
+
+The main source for Ceph metrics are the performance counters exposed by each
+Ceph daemon. The :doc:`../dev/perf_counters` are native Ceph monitoring data
+
+Performance counters are transformed into standard Prometheus metrics by the
+Ceph exporter daemon. This daemon runs on every Ceph cluster host and exposes a
+metrics end point where all the performance counters exposed by all the Ceph
+daemons running in the host are published in the form of Prometheus metrics.
+
+In addition to the Ceph exporter, there is another agent to expose Ceph
+metrics. It is the Prometheus manager module, wich exposes metrics related to
+the whole cluster, basically metrics that are not produced by individual Ceph
+daemons.
+
+The main source for obtaining Ceph metrics is the metrics endpoint exposed by
+the Cluster Prometheus server.  Ceph can provide you with the Prometheus
+endpoint where you can obtain the complete list of metrics (coming from Ceph
+exporter daemons and Prometheus manager module) and exeute queries.
+
+Use the following command to obtain the Prometheus server endpoint in your
+cluster:
+
+Example:
+
+.. code-block:: bash
+
+  # ceph orch ps --service_name prometheus
+  NAME                         HOST                          PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
+  prometheus.cephtest-node-00  cephtest-node-00.cephlab.com  *:9095  running (103m)    50s ago   5w     142M        -  2.33.4   514e6a882f6e  efe3cbc2e521
+
+With this information you can connect to
+``http://cephtest-node-00.cephlab.com:9095`` to access the Prometheus server
+interface.
+
+And the complete list of metrics (with help) for your cluster will be available
+in:
+
+``http://cephtest-node-00.cephlab.com:9095/api/v1/targets/metadata``
+
+
+It is good to outline that the main tool allowing users to observe and monitor a Ceph cluster is the **Ceph dashboard**. It provides graphics where the most important cluster and service metrics are represented. Most of the examples in this document are extracted from the dashboard graphics or extrapolated from the metrics exposed by the Ceph dashboard.
+
+
+Performance metrics
+===================
+
+Main metrics used to measure Cluster Ceph performance:
+
+All metrics have the following labels:
+``ceph_daemon``: identifier of the OSD daemon generating the metric
+``instance``: the IP address of the ceph exporter instance exposing the metric.
+``job``: prometheus scrape job
+
+Example:
+
+.. code-block:: bash
+
+  ceph_osd_op_r{ceph_daemon="osd.0", instance="192.168.122.7:9283", job="ceph"} = 73981
+
+*Cluster I/O (throughput):*
+Use ``ceph_osd_op_r_out_bytes`` and ``ceph_osd_op_w_in_bytes`` to obtain the cluster throughput generated by clients
+
+Example:
+
+.. code-block:: bash
+
+  Writes (B/s):
+  sum(irate(ceph_osd_op_w_in_bytes[1m]))
+
+  Reads (B/s):
+  sum(irate(ceph_osd_op_r_out_bytes[1m]))
+
+
+*Cluster I/O (operations):*
+Use ``ceph_osd_op_r``, ``ceph_osd_op_w`` to obtain the number of operations generated by clients
+
+Example:
+
+.. code-block:: bash
+
+  Writes (ops/s):
+  sum(irate(ceph_osd_op_w[1m]))
+
+  Reads (ops/s):
+  sum(irate(ceph_osd_op_r[1m]))
+
+*Latency:*
+Use ``ceph_osd_op_latency_sum`` wich represents the delay before a OSD transfer of data begins following a client instruction for its transfer
+
+Example:
+
+.. code-block:: bash
+
+  sum(irate(ceph_osd_op_latency_sum[1m]))
+
+
+OSD performance
+===============
+
+The previous explained cluster performance metrics are based in OSD metrics, selecting the right label we can obtain for a single OSD the same performance information explained for the cluster:
+
+Example:
+
+.. code-block:: bash
+
+  OSD 0 read latency
+  irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"osd.0"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
+
+  OSD 0 write IOPS
+  irate(ceph_osd_op_w{ceph_daemon=~"osd.0"}[1m])
+
+  OSD 0 write thughtput (bytes)
+  irate(ceph_osd_op_w_in_bytes{ceph_daemon=~"osd.0"}[1m])
+
+  OSD.0 total raw capacity available
+  ceph_osd_stat_bytes{ceph_daemon="osd.0", instance="cephtest-node-00.cephlab.com:9283", job="ceph"} = 536451481
+
+
+Physical disk performance:
+==========================
+
+Combining Prometheus ``node_exporter`` metrics with Ceph metrics we can have
+information about the performance provided by physical disks used by OSDs.
+
+Example:
+
+.. code-block:: bash
+
+  Read latency of device used by OSD 0:
+  label_replace(irate(node_disk_read_time_seconds_total[1m]) / irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
+
+  Write latency of device used by OSD 0
+  label_replace(irate(node_disk_write_time_seconds_total[1m]) / irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
+
+  IOPS (device used by OSD.0)
+  reads:
+  label_replace(irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
+
+  writes:
+  label_replace(irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
+
+  Throughput (device used by OSD.0)
+  reads:
+  label_replace(irate(node_disk_read_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
+
+  writes:
+  label_replace(irate(node_disk_written_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
+
+  Physical Device Utilization (%) for OSD.0 in the last 5 minutes
+  label_replace(irate(node_disk_io_time_seconds_total[5m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
+
+Pool metrics
+============
+
+These metrics have the following labels:
+``instance``: the ip address of the Ceph exporter daemon producing the metric.
+``pool_id``: identifier of the pool
+``job``: prometheus scrape job
+
+
+- ``ceph_pool_metadata``: Information about the pool It can be used together
+  with other metrics to provide more contextual information in queries and
+  graphs.  Apart of the three common labels this metric provide the following
+  extra labels:
+
+  - ``compression_mode``: compression used in the pool (lz4, snappy, zlib,
+    zstd, none). Example: compression_mode="none"
+
+  - ``description``: brief description of the pool type (replica:number of
+    replicas or Erasure code: ec profile). Example: description="replica:3"
+  - ``name``: name of the pool. Example: name=".mgr"
+  - ``type``: type of pool (replicated/erasure code). Example: type="replicated"
+
+- ``ceph_pool_bytes_used``: Total raw capacity consumed by user data and associated overheads by pool (metadata + redundancy):
+
+- ``ceph_pool_stored``: Total of CLIENT data stored in the pool
+
+- ``ceph_pool_compress_under_bytes``: Data eligible to be compressed in the pool
+
+- ``ceph_pool_compress_bytes_used``:  Data compressed in the pool
+
+- ``ceph_pool_rd``: CLIENT read operations per pool (reads per second)
+
+- ``ceph_pool_rd_bytes``: CLIENT read operations in bytes per pool
+
+- ``ceph_pool_wr``: CLIENT write operations per pool (writes per second)
+
+- ``ceph_pool_wr_bytes``: CLIENT write operation in bytes per pool
+
+
+**Useful queries**:
+
+.. code-block:: bash
+
+  Total raw capacity available in the cluster:
+  sum(ceph_osd_stat_bytes)
+
+  Total raw capacity consumed in the cluster (including metadata + redundancy):
+  sum(ceph_pool_bytes_used)
+
+  Total of CLIENT data stored in the cluster:
+  sum(ceph_pool_stored)
+
+  Compression savings:
+  sum(ceph_pool_compress_under_bytes - ceph_pool_compress_bytes_used)
+
+  CLIENT IOPS for a pool (testrbdpool)
+  reads: irate(ceph_pool_rd[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
+  writes: irate(ceph_pool_wr[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
+
+  CLIENT Throughput for a pool
+  reads: irate(ceph_pool_rd_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
+  writes: irate(ceph_pool_wr_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
+
+Object metrics
+==============
+
+These metrics have the following labels:
+``instance``: the ip address of the ceph exporter daemon providing the metric
+``instance_id``: identifier of the rgw daemon
+``job``: prometheus scrape job
+
+Example:
+
+.. code-block:: bash
+
+  ceph_rgw_req{instance="192.168.122.7:9283", instance_id="154247", job="ceph"} = 12345
+
+
+Generic metrics
+---------------
+- ``ceph_rgw_metadata``: Provides generic information about the RGW daemon.  It
+  can be used together with other metrics to provide more contextual
+  information in queries and graphs. Apart from the three common labels, this
+  metric provides the following extra labels:
+
+  - ``ceph_daemon``: Name of the Ceph daemon. Example:
+    ceph_daemon="rgw.rgwtest.cephtest-node-00.sxizyq",
+  - ``ceph_version``: Version of Ceph daemon. Example: ceph_version="ceph
+    version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)",
+  - ``hostname``: Name of the host where the daemon runs. Example:
+    hostname:"cephtest-node-00.cephlab.com",
+
+- ``ceph_rgw_req``: Number total of requests for the daemon (GET+PUT+DELETE)
+    Useful to detect bottlenecks and optimize load distribution.
+
+- ``ceph_rgw_qlen``: RGW operations queue length for the daemon.
+    Useful to detect bottlenecks and optimize load distribution.
+
+- ``ceph_rgw_failed_req``: Aborted requests.
+    Useful to detect daemon errors
+
+
+GET operations: related metrics
+-------------------------------
+- ``ceph_rgw_get_initial_lat_count``: Number of get operations
+
+- ``ceph_rgw_get_initial_lat_sum``: Total latency time for the GET operations
+
+- ``ceph_rgw_get``: Number total of GET requests
+
+- ``ceph_rgw_get_b``: Total bytes transferred in GET operations
+
+
+Put operations: related metrics
+-------------------------------
+- ``ceph_rgw_put_initial_lat_count``: Number of get operations
+
+- ``ceph_rgw_put_initial_lat_sum``: Total latency time for the PUT operations
+
+- ``ceph_rgw_put``: Total number of PUT operations
+
+- ``ceph_rgw_get_b``: Total bytes transferred in PUT operations
+
+
+Useful queries
+--------------
+
+.. code-block:: bash
+
+  The average of get latencies:
+  rate(ceph_rgw_get_initial_lat_sum[30s]) / rate(ceph_rgw_get_initial_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
+
+  The average of put latencies:
+  rate(ceph_rgw_put_initial_lat_sum[30s]) / rate(ceph_rgw_put_initial_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
+
+  Total requests per second:
+  rate(ceph_rgw_req[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
+
+  Total number of "other" operations (LIST, DELETE)
+  rate(ceph_rgw_req[30s]) -  (rate(ceph_rgw_get[30s]) + rate(ceph_rgw_put[30s]))
+
+  GET latencies
+  rate(ceph_rgw_get_initial_lat_sum[30s]) /  rate(ceph_rgw_get_initial_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
+
+  PUT latencies
+  rate(ceph_rgw_put_initial_lat_sum[30s]) /  rate(ceph_rgw_put_initial_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
+
+  Bandwidth consumed by GET operations
+  sum(rate(ceph_rgw_get_b[30s]))
+
+  Bandwidth consumed by PUT operations
+  sum(rate(ceph_rgw_put_b[30s]))
+
+  Bandwidth consumed by RGW instance (PUTs + GETs)
+  sum by (instance_id) (rate(ceph_rgw_get_b[30s]) + rate(ceph_rgw_put_b[30s])) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
+
+  Http errors:
+  rate(ceph_rgw_failed_req[30s])
+
+
+Filesystem Metrics
+==================
+
+These metrics have the following labels:
+``ceph_daemon``: The name of the MDS daemon
+``instance``: the ip address (and port) of of the Ceph exporter daemon exposing the metric
+``job``: prometheus scrape job
+
+Example:
+
+.. code-block:: bash
+
+  ceph_mds_request{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", instance="192.168.122.7:9283", job="ceph"} = 1452
+
+
+Main metrics
+------------
+
+- ``ceph_mds_metadata``: Provides general information about the MDS daemon.  It
+  can be used together with other metrics to provide more contextual
+  information in queries and graphs.  It provides the following extra labels:
+
+  - ``ceph_version``: MDS daemon Ceph version
+  - ``fs_id``: filesystem cluster id
+  - ``hostname``: Host name where the MDS daemon runs
+  - ``public_addr``: Public address where the MDS daemon runs
+  - ``rank``: Rank of the MDS daemon
+
+Example:
+
+.. code-block:: bash
+
+ ceph_mds_metadata{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", ceph_version="ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", fs_id="-1", hostname="cephtest-node-00.cephlab.com", instance="cephtest-node-00.cephlab.com:9283", job="ceph", public_addr="192.168.122.145:6801/118896446", rank="-1"}
+
+
+- ``ceph_mds_request``: Total number of requests for the MDs daemon
+
+- ``ceph_mds_reply_latency_sum``: Reply latency total
+
+- ``ceph_mds_reply_latency_count``: Reply latency count
+
+- ``ceph_mds_server_handle_client_request``: Number of client requests
+
+- ``ceph_mds_sessions_session_count``: Session count
+
+- ``ceph_mds_sessions_total_load``: Total load
+
+- ``ceph_mds_sessions_sessions_open``: Sessions currently open
+
+- ``ceph_mds_sessions_sessions_stale``: Sessions currently stale
+
+- ``ceph_objecter_op_r``: Number of read operations
+
+- ``ceph_objecter_op_w``: Number of write operations
+
+- ``ceph_mds_root_rbytes``: Total number of bytes managed by the daemon
+
+- ``ceph_mds_root_rfiles``: Total number of files managed by the daemon
+
+
+Useful queries:
+---------------
+
+.. code-block:: bash
+
+  Total MDS daemons read workload:
+  sum(rate(ceph_objecter_op_r[1m]))
+
+  Total MDS daemons write workload:
+  sum(rate(ceph_objecter_op_w[1m]))
+
+  MDS daemon read workload: (daemon name is "mdstest")
+  sum(rate(ceph_objecter_op_r{ceph_daemon=~"mdstest"}[1m]))
+
+  MDS daemon write workload: (daemon name is "mdstest")
+  sum(rate(ceph_objecter_op_r{ceph_daemon=~"mdstest"}[1m]))
+
+  The average of reply latencies:
+  rate(ceph_mds_reply_latency_sum[30s]) / rate(ceph_mds_reply_latency_count[30s])
+
+  Total requests per second:
+  rate(ceph_mds_request[30s]) * on (instance) group_right (ceph_daemon) ceph_mds_metadata
+
+
+Block metrics
+=============
+
+By default RBD metrics for images are not available in order to provide the
+best performance in the prometheus manager module.
+
+To produce metrics for RBD images it is needed to configure properly the
+manager option ``mgr/prometheus/rbd_stats_pools``. For more information please
+see :ref:`prometheus-rbd-io-statistics`
+
+
+These metrics have the following labels:
+``image``: Name of the image which produces the metric value.
+``instance``: Node where the rbd metric is produced. (It points to the Ceph exporter daemon)
+``job``: Name of the Prometheus scrape job.
+``pool``: Image pool name.
+
+Example:
+
+.. code-block:: bash
+
+  ceph_rbd_read_bytes{image="test2", instance="cephtest-node-00.cephlab.com:9283", job="ceph", pool="testrbdpool"}
+
+
+Main metrics
+------------
+
+- ``ceph_rbd_read_bytes``: RBD image bytes read
+
+- ``ceph_rbd_read_latency_count``: RBD image reads latency count
+
+- ``ceph_rbd_read_latency_sum``: RBD image reads latency total
+
+- ``ceph_rbd_read_ops``: RBD image reads count
+
+- ``ceph_rbd_write_bytes``: RBD image bytes written
+
+- ``ceph_rbd_write_latency_count``: RBD image writes latency count
+
+- ``ceph_rbd_write_latency_sum``: RBD image writes latency total
+
+- ``ceph_rbd_write_ops``: RBD image writes count
+
+
+Useful queries
+--------------
+
+.. code-block:: bash
+
+  The average of read latencies:
+  rate(ceph_rbd_read_latency_sum[30s]) / rate(ceph_rbd_read_latency_count[30s]) * on (instance) group_left (ceph_daemon) ceph_rgw_metadata
+
+
+
+