Insights Module
===============

The insights module collects and exposes system information to the Insights Core
data analysis framework. It is intended to replace explicit interrogation of
Ceph CLIs and daemon admin sockets, reducing the API surface that Insights
depends on. The insights report contains the following:

* **Health reports**. In addition to reporting the current health of the
  cluster, the insights module reports a summary of the last 24 hours of health
  checks. This feature is important for catching cluster health issues that are
  transient and may not be present at the moment the report is generated. Health
  checks are deduplicated to avoid unbounded data growth.

* **Crash reports**. A summary of any daemon crashes in the past 24 hours is
  included in the insights report. Crashes are reported as the number of crashes
  per daemon type (e.g. ``ceph-osd``) within the time window. Full details of a
  crash may be obtained using the `crash module`_.

* Software version, storage utilization, cluster maps, placement group summary,
  monitor status, cluster configuration, and OSD metadata.
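
The per-daemon-type crash summary described above can be illustrated with a
minimal sketch. This is not the module's implementation: the input shape (a list
of ``(daemon_name, timestamp)`` pairs) and the function name
``summarize_crashes`` are assumptions made for this example only.

.. code-block:: python

  from collections import Counter
  from datetime import datetime, timedelta, timezone


  def summarize_crashes(crashes, now, window=timedelta(hours=24)):
      """Count crashes per daemon type within the trailing time window.

      `crashes` is an iterable of (daemon_name, timestamp) pairs. This
      input shape is hypothetical and chosen only for illustration.
      """
      cutoff = now - window
      counts = Counter()
      for daemon_name, timestamp in crashes:
          if cutoff <= timestamp <= now:
              # "osd.3" -> "osd": keep only the daemon type.
              counts[daemon_name.split(".", 1)[0]] += 1
      return dict(counts)


  now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
  crashes = [
      ("osd.3", now - timedelta(hours=1)),
      ("osd.7", now - timedelta(hours=5)),
      ("mon.a", now - timedelta(hours=23)),
      ("osd.3", now - timedelta(hours=30)),  # outside the window, ignored
  ]
  print(summarize_crashes(crashes, now))

Only the counts per type are kept, which is why individual crash details have
to be looked up separately.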

Enabling
--------

The *insights* module is enabled with::

  ceph mgr module enable insights

Commands
--------
::

  ceph insights

Generate the full report.
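
For scripted use, the report can be captured from the command's output. The
following is a sketch that assumes the ``ceph`` binary is on ``PATH`` and that
the command prints the report as JSON on standard output. The function name
``fetch_insights_report`` and its ``run`` parameter are hypothetical.

.. code-block:: python

  import json
  import subprocess


  def fetch_insights_report(run=subprocess.run):
      """Run `ceph insights` and return the parsed report.

      `run` defaults to subprocess.run and can be replaced in tests.
      Assumes the command writes a JSON document to standard output.
      """
      result = run(
          ["ceph", "insights"],
          check=True,           # raise CalledProcessError on failure
          capture_output=True,  # collect stdout instead of printing it
          text=True,            # decode stdout to str
      )
      return json.loads(result.stdout)

Calling ``fetch_insights_report()`` on a node with a working ``ceph`` client
would return the report as a Python object.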

::

  ceph insights prune-health <hours>

Remove historical health data older than ``<hours>`` hours. Passing ``0`` for
``<hours>`` clears all health data.

This command is useful for cleaning the health history before automated nightly
reports are generated. Without pruning, those reports may contain spurious
health checks that accumulated during system maintenance, or health checks that
have since been resolved. There is no need to prune health data to reclaim
storage space; garbage collection runs regularly and removes old health data
from persistent storage.
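
A maintenance script could issue the prune command once its work is finished.
The sketch below assumes the ``ceph`` binary is on ``PATH``. The function name
``prune_health`` and its ``run`` parameter are hypothetical and not part of
Ceph.

.. code-block:: python

  import subprocess


  def prune_health(hours, run=subprocess.run):
      """Invoke `ceph insights prune-health <hours>`.

      Passing 0 clears all recorded health data. `run` defaults to
      subprocess.run and can be replaced in tests.
      """
      if not isinstance(hours, int) or hours < 0:
          raise ValueError("hours must be a non-negative integer")
      argv = ["ceph", "insights", "prune-health", str(hours)]
      run(argv, check=True)  # raise CalledProcessError on failure
      return argv

Calling ``prune_health(0)`` after maintenance and before the nightly report
keeps maintenance-related health checks out of that report.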

.. _crash module: ../crash