.. _health-checks:

===============
 Health checks
===============

Overview
========

There is a finite set of health messages that a Ceph cluster can raise. These
messages are known as *health checks*. Each health check has a unique
identifier.

The identifier is a terse human-readable string -- that is, the identifier is
readable in much the same way as a typical variable name. It is intended to
enable tools (for example, UIs) to make sense of health checks and present them
in a way that reflects their meaning.

This page lists the health checks that are raised by the monitor and manager
daemons. In addition to these, you might see health checks that originate from
MDS daemons (see :ref:`cephfs-health-messages`), and health checks that are
defined by ``ceph-mgr`` python modules.

Definitions
===========

Monitor
-------

DAEMON_OLD_VERSION
__________________

Warn if one or more old versions of Ceph are running on any daemons. A health
check is raised if multiple versions are detected. This condition must exist
for a period of time greater than ``mon_warn_older_version_delay`` (set to one
week by default) in order for the health check to be raised. This allows most
upgrades to proceed without the occurrence of a false warning. If the upgrade
is paused for an extended time period, ``health mute`` can be used by running
``ceph health mute DAEMON_OLD_VERSION --sticky``. Be sure, however, to run
``ceph health unmute DAEMON_OLD_VERSION`` after the upgrade has finished.

MON_DOWN
________

One or more monitor daemons are currently down. The cluster requires a
majority (more than one-half) of the monitors to be available. When one or
more monitors are down, clients might have a harder time forming their initial
connection to the cluster, as they might need to try more addresses before
they reach an operating monitor.

The down monitor daemon should be restarted as soon as possible to reduce the
risk of a subsequent monitor failure leading to a service outage.

MON_CLOCK_SKEW
______________

The clocks on the hosts running the ceph-mon monitor daemons are not
well-synchronized. This health check is raised if the cluster detects a clock
skew greater than ``mon_clock_drift_allowed``.

This issue is best resolved by synchronizing the clocks by using a tool like
``ntpd`` or ``chrony``.

If it is impractical to keep the clocks closely synchronized, the
``mon_clock_drift_allowed`` threshold can also be increased. However, this
value must stay significantly below the ``mon_lease`` interval in order for
the monitor cluster to function properly.

MON_MSGR2_NOT_ENABLED
_____________________

The :confval:`ms_bind_msgr2` option is enabled but one or more monitors are
not configured to bind to a v2 port in the cluster's monmap. This means that
features specific to the msgr2 protocol (for example, encryption) are
unavailable on some or all connections.

In most cases this can be corrected by running the following command:

.. prompt:: bash $

   ceph mon enable-msgr2

After this command is run, any monitor configured to listen on the old default
port (6789) will continue to listen for v1 connections on 6789 and begin to
listen for v2 connections on the new default port 3300.

If a monitor is configured to listen for v1 connections on a non-standard port
(that is, a port other than 6789), then the monmap will need to be modified
manually.

MON_DISK_LOW
____________

One or more monitors are low on disk space.
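As a quick first check, you can confirm how much space remains on the file
system that holds the monitor's database. The path below assumes the default
monitor data location; adjust it if your deployment stores monitor data
elsewhere:

.. prompt:: bash $

   df -h /var/lib/ceph/mon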
This health check is raised if the percentage of available space on the file
system used by the monitor database (normally ``/var/lib/ceph/mon``) drops
below the percentage value ``mon_data_avail_warn`` (default: 30%).

This alert might indicate that some other process or user on the system is
filling up the file system used by the monitor. It might also indicate that
the monitor database is too large (see ``MON_DISK_BIG`` below). If space
cannot be freed, the monitor's data directory might need to be moved to
another storage device or file system (this relocation process must be carried
out while the monitor daemon is not running).

MON_DISK_CRIT
_____________

One or more monitors are critically low on disk space. This health check is
raised if the percentage of available space on the file system used by the
monitor database (normally ``/var/lib/ceph/mon``) drops below the percentage
value ``mon_data_avail_crit`` (default: 5%). See ``MON_DISK_LOW``, above.

MON_DISK_BIG
____________

The database size for one or more monitors is very large. This health check is
raised if the size of the monitor database is larger than
``mon_data_size_warn`` (default: 15 GiB).

A large database is unusual, but does not necessarily indicate a problem.
Monitor databases might grow in size when there are placement groups that have
not reached an ``active+clean`` state in a long time.

This alert might also indicate that the monitor's database is not properly
compacting, an issue that has been observed with some older versions of
leveldb and rocksdb. Forcing a compaction with ``ceph daemon mon.<id>
compact`` might shrink the database's on-disk size.

This alert might also indicate that the monitor has a bug that prevents it
from pruning the cluster metadata that it stores. If the problem persists,
please report a bug.

To adjust the warning threshold, run the following command:

.. prompt:: bash $

   ceph config set global mon_data_size_warn <size>

AUTH_INSECURE_GLOBAL_ID_RECLAIM
_______________________________

One or more clients or daemons that are connected to the cluster are not
securely reclaiming their ``global_id`` (a unique number that identifies each
entity in the cluster) when reconnecting to a monitor. The client is being
permitted to connect anyway because the
``auth_allow_insecure_global_id_reclaim`` option is set to ``true`` (which may
be necessary until all Ceph clients have been upgraded) and because the
``auth_expose_insecure_global_id_reclaim`` option is set to ``true`` (which
allows monitors to detect clients with "insecure reclaim" sooner by forcing
those clients to reconnect immediately after their initial authentication).

To identify which client(s) are using unpatched Ceph client code, run the
following command:

.. prompt:: bash $

   ceph health detail

If you collect a dump of the clients that are connected to an individual
monitor and examine the ``global_id_status`` field in the output of the dump,
you can see the ``global_id`` reclaim behavior of those clients. Here
``reclaim_insecure`` means that a client is unpatched and is contributing to
this health check. To collect such a dump of clients, run the following
command:

.. prompt:: bash $

   ceph tell mon.\* sessions

We strongly recommend that all clients in the system be upgraded to a newer
version of Ceph that correctly reclaims ``global_id`` values.
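If you want to narrow the session dump down to just the offending clients, a
filter along the following lines can help. This is only a sketch: it assumes
that ``jq`` is installed, that ``mon.a`` is one of your monitors (an example
name), and that your release reports a ``global_id_status`` field for each
session entry:

.. prompt:: bash $

   # repeat for each monitor in the cluster
   ceph tell mon.a sessions | jq '.[] | select(.global_id_status == "reclaim_insecure")'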
After all clients have been updated, run the following command to stop
allowing insecure reconnections:

.. prompt:: bash $

   ceph config set mon auth_allow_insecure_global_id_reclaim false

If it is impractical to upgrade all clients immediately, you can temporarily
silence this alert by running the following command:

.. prompt:: bash $

   ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w   # 1 week

Although we do NOT recommend doing so, you can also disable this alert
indefinitely by running the following command:

.. prompt:: bash $

   ceph config set mon mon_warn_on_insecure_global_id_reclaim false

AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
_______________________________________

Ceph is currently configured to allow clients that reconnect to monitors using
an insecure process to reclaim their previous ``global_id``. Such reclaiming
is allowed because, by default, ``auth_allow_insecure_global_id_reclaim`` is
set to ``true``. It might be necessary to leave this setting enabled while
existing Ceph clients are upgraded to newer versions of Ceph that correctly
and securely reclaim their ``global_id``.

If the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health check has not also been
raised and if the ``auth_expose_insecure_global_id_reclaim`` setting has not
been disabled (it is enabled by default), then there are currently no clients
connected that need to be upgraded. In that case, it is safe to disable
``insecure global_id reclaim`` by running the following command:

.. prompt:: bash $

   ceph config set mon auth_allow_insecure_global_id_reclaim false

On the other hand, if there are still clients that need to be upgraded, then
this alert can be temporarily silenced by running the following command:

.. prompt:: bash $

   ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w   # 1 week

Although we do NOT recommend doing so, you can also disable this alert
indefinitely by running the following command:

.. prompt:: bash $

   ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

Manager
-------

MGR_DOWN
________

All manager daemons are currently down. The cluster should normally have at
least one running manager (``ceph-mgr``) daemon. If no manager daemon is
running, the cluster's ability to monitor itself will be compromised, and
parts of the management API will become unavailable (for example, the
dashboard will not work, and most CLI commands that report metrics or runtime
state will block). However, the cluster will still be able to perform all I/O
operations and to recover from failures.

The "down" manager daemon should be restarted as soon as possible to ensure
that the cluster can be monitored (for example, so that the ``ceph -s``
information is up to date, or so that metrics can be scraped by Prometheus).

MGR_MODULE_DEPENDENCY
_____________________

An enabled manager module is failing its dependency check. This health check
typically comes with an explanatory message from the module about the problem.

For example, a module might report that a required package is not installed:
in this case, you should install the required package and restart your manager
daemons.

This health check is applied only to enabled modules. If a module is not
enabled, you can see whether it is reporting dependency issues in the output
of ``ceph mgr module ls``.

MGR_MODULE_ERROR
________________

A manager module has experienced an unexpected error. Typically, this means
that an unhandled exception was raised from the module's ``serve`` function.
The human-readable description of the error might be obscurely worded if the
exception did not provide a useful description of itself.
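The full traceback is usually written to the active manager's log. One way to
retrieve it is sketched below; it assumes a package-based deployment in which
the manager runs under a systemd unit named ``ceph-mgr@<name>`` (cephadm and
container deployments keep their logs elsewhere):

.. prompt:: bash $

   journalctl -u ceph-mgr@<name> | grep -A 20 Traceback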
This health check might indicate a bug: please open a Ceph bug report if you
think you have encountered a bug.

However, if you believe the error is transient, you may restart your manager
daemon(s) or use ``ceph mgr fail`` on the active daemon in order to force
failover to another daemon.

OSDs
----

OSD_DOWN
________

One or more OSDs are marked "down". The ceph-osd daemon might have been
stopped, or peer OSDs might be unable to reach the OSD over the network.
Common causes include a stopped or crashed daemon, a "down" host, or a network
outage.

Verify that the host is healthy, the daemon is started, and the network is
functioning. If the daemon has crashed, the daemon log file
(``/var/log/ceph/ceph-osd.*``) might contain debugging information.

OSD_<crush type>_DOWN
_____________________

(for example, OSD_HOST_DOWN, OSD_ROOT_DOWN)

All of the OSDs within a particular CRUSH subtree are marked "down" (for
example, all OSDs on a host).

OSD_ORPHAN
__________

An OSD is referenced in the CRUSH map hierarchy, but does not exist.

To remove the OSD from the CRUSH map hierarchy, run the following command:

.. prompt:: bash $

   ceph osd crush rm osd.<id>

OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `nearfull`, `backfillfull`, `full`, and/or
`failsafe_full` are not ascending. In particular, the following pattern is
expected: `nearfull < backfillfull`, `backfillfull < full`, and `full <
failsafe_full`.

To adjust these utilization thresholds, run the following commands:

.. prompt:: bash $

   ceph osd set-nearfull-ratio <ratio>
   ceph osd set-backfillfull-ratio <ratio>
   ceph osd set-full-ratio <ratio>

OSD_FULL
________

One or more OSDs have exceeded the `full` threshold and are preventing the
cluster from servicing writes.

To check utilization by pool, run the following command:

.. prompt:: bash $

   ceph df

To see the currently defined `full` ratio, run the following command:

.. prompt:: bash $

   ceph osd dump | grep full_ratio

A short-term workaround to restore write availability is to raise the full
threshold by a small amount. To do so, run the following command:

.. prompt:: bash $

   ceph osd set-full-ratio <ratio>

Additional OSDs should be deployed in order to add new storage to the cluster,
or existing data should be deleted in order to free up space in the cluster.

OSD_BACKFILLFULL
________________

One or more OSDs have exceeded the `backfillfull` threshold or *would* exceed
it if the currently-mapped backfills were to finish, which will prevent data
from rebalancing to this OSD. This alert is an early warning that rebalancing
might be unable to complete and that the cluster is approaching full.

To check utilization by pool, run the following command:

.. prompt:: bash $

   ceph df

OSD_NEARFULL
____________

One or more OSDs have exceeded the `nearfull` threshold. This alert is an
early warning that the cluster is approaching full.

To check utilization by pool, run the following command:

.. prompt:: bash $

   ceph df
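To see which individual OSDs are closest to the ``nearfull`` threshold,
per-OSD utilization can be examined as well:

.. prompt:: bash $

   ceph osd df

The output includes each OSD's size, utilization percentage, and PG count,
which makes it easy to spot outliers that might benefit from reweighting or
rebalancing.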
OSDMAP_FLAGS
____________

One or more cluster flags of interest have been set. These flags include:

* *full* - the cluster is flagged as full and cannot serve writes
* *pauserd*, *pausewr* - there are paused reads or writes
* *noup* - OSDs are not allowed to start
* *nodown* - OSD failure reports are being ignored, and that means that the
  monitors will not mark OSDs "down"
* *noin* - OSDs that were previously marked ``out`` are not being marked back
  ``in`` when they start
* *noout* - "down" OSDs are not automatically being marked ``out`` after the
  configured interval
* *nobackfill*, *norecover*, *norebalance* - recovery or data rebalancing is
  suspended
* *noscrub*, *nodeep_scrub* - scrubbing is disabled
* *notieragent* - cache-tiering activity is suspended

With the exception of *full*, these flags can be set or cleared by running the
following commands:

.. prompt:: bash $

   ceph osd set <flag>
   ceph osd unset <flag>

OSD_FLAGS
_________

One or more OSDs or CRUSH {nodes, device classes} have a flag of interest set.
These flags include:

* *noup*: these OSDs are not allowed to start
* *nodown*: failure reports for these OSDs will be ignored
* *noin*: if these OSDs were previously marked ``out`` automatically after a
  failure, they will not be marked ``in`` when they start
* *noout*: if these OSDs are "down" they will not automatically be marked
  ``out`` after the configured interval

To set and clear these flags in batch, run the following commands:

.. prompt:: bash $

   ceph osd set-group <flags> <who>
   ceph osd unset-group <flags> <who>

For example:

.. prompt:: bash $

   ceph osd set-group noup,noout osd.0 osd.1
   ceph osd unset-group noup,noout osd.0 osd.1
   ceph osd set-group noup,noout host-foo
   ceph osd unset-group noup,noout host-foo
   ceph osd set-group noup,noout class-hdd
   ceph osd unset-group noup,noout class-hdd

OLD_CRUSH_TUNABLES
__________________

The CRUSH map is using very old settings and should be updated. The oldest set
of tunables that can be used (that is, the oldest client version that can
connect to the cluster) without raising this health check is determined by the
``mon_crush_min_required_version`` config option. For more information, see
:ref:`crush-map-tunables`.

OLD_CRUSH_STRAW_CALC_VERSION
____________________________

The CRUSH map is using an older, non-optimal method of calculating
intermediate weight values for ``straw`` buckets. The CRUSH map should be
updated to use the newer method (that is: ``straw_calc_version=1``). For more
information, see :ref:`crush-map-tunables`.

CACHE_POOL_NO_HIT_SET
_____________________

One or more cache pools are not configured with a *hit set* to track
utilization. This issue prevents the tiering agent from identifying cold
objects that are to be flushed and evicted from the cache.

To configure hit sets on the cache pool, run the following commands:

.. prompt:: bash $

   ceph osd pool set <poolname> hit_set_type <type>
   ceph osd pool set <poolname> hit_set_period <period-in-seconds>
   ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
   ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>

OSD_NO_SORTBITWISE
__________________

No pre-Luminous v12.y.z OSDs are running, but the ``sortbitwise`` flag has not
been set. The ``sortbitwise`` flag must be set in order for OSDs running
Luminous v12.y.z or newer to start.

To safely set the flag, run the following command:

.. prompt:: bash $

   ceph osd set sortbitwise
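To confirm whether the flag is already present, you can inspect the flags
recorded in the OSD map:

.. prompt:: bash $

   ceph osd dump | grep flags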
OSD_FILESTORE
_____________

Warn if OSDs are running Filestore. The Filestore OSD back end has been
deprecated; the BlueStore back end has been the default object store since the
Ceph Luminous release.

The 'mclock_scheduler' is not supported for Filestore OSDs. For this reason,
the default 'osd_op_queue' is set to 'wpq' for Filestore OSDs and is enforced
even if the user attempts to change it.

To list Filestore OSDs, run the following command:

.. prompt:: bash $

   ceph report | jq -c '."osd_metadata" | .[] | select(.osd_objectstore | contains("filestore")) | {id, osd_objectstore}'

**In order to upgrade to Reef or a later release, you must first migrate any
Filestore OSDs to BlueStore.**

If you are upgrading a pre-Reef release to Reef or later, but it is not
feasible to migrate Filestore OSDs to BlueStore immediately, you can
temporarily silence this alert by running the following command:

.. prompt:: bash $

   ceph health mute OSD_FILESTORE

Since this migration can take a considerable amount of time to complete, we
recommend that you begin the process well in advance of any update to Reef or
to later releases.

POOL_FULL
_________

One or more pools have reached their quota and are no longer allowing writes.

To see pool quotas and utilization, run the following command:

.. prompt:: bash $

   ceph df detail

If you opt to raise the pool quota, run the following commands:

.. prompt:: bash $

   ceph osd pool set-quota <poolname> max_objects <num-objects>
   ceph osd pool set-quota <poolname> max_bytes <num-bytes>

If not, delete some existing data to reduce utilization.

BLUEFS_SPILLOVER
________________

One or more OSDs that use the BlueStore back end have been allocated `db`
partitions (that is, storage space for metadata, normally on a faster device),
but because that space has been filled, metadata has "spilled over" onto the
slow device. This is not necessarily an error condition or even unexpected
behavior, but may result in degraded performance. If the administrator had
expected that all metadata would fit on the faster device, this alert
indicates that not enough space was provided.

To disable this alert on all OSDs, run the following command:

.. prompt:: bash $

   ceph config set osd bluestore_warn_on_bluefs_spillover false

Alternatively, to disable the alert on a specific OSD, run the following
command:

.. prompt:: bash $

   ceph config set osd.123 bluestore_warn_on_bluefs_spillover false

To secure more metadata space, you can destroy and reprovision the OSD in
question. This process involves data migration and recovery.

It might also be possible to expand the LVM logical volume that backs the `db`
storage. If the underlying LV has been expanded, you must stop the OSD daemon
and inform BlueFS of the device-size change by running the following command:

.. prompt:: bash $

   ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID

BLUEFS_AVAILABLE_SPACE
______________________

To see how much space is free for BlueFS, run the following command:

.. prompt:: bash $

   ceph daemon osd.123 bluestore bluefs available

This will output up to three values: ``BDEV_DB free``, ``BDEV_SLOW free``, and
``available_from_bluestore``. ``BDEV_DB`` and ``BDEV_SLOW`` report the amount
of space that has been acquired by BlueFS and is now considered free. The
value ``available_from_bluestore`` indicates the ability of BlueStore to
relinquish more space to BlueFS. It is normal for this value to differ from
the amount of BlueStore free space, because the BlueFS allocation unit is
typically larger than the BlueStore allocation unit. This means that only part
of the BlueStore free space will be available for BlueFS.

BLUEFS_LOW_SPACE
________________

If BlueFS is running low on available free space and there is not much free
space available from BlueStore (in other words, `available_from_bluestore` has
a low value), consider reducing the BlueFS allocation unit size.
To simulate available space when the allocation unit is different, run the
following command:

.. prompt:: bash $

   ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>

BLUESTORE_FRAGMENTATION
_______________________

As BlueStore operates, the free space on the underlying storage will become
fragmented. This is normal and unavoidable, but excessive fragmentation causes
slowdown.

To inspect BlueStore fragmentation, run the following command:

.. prompt:: bash $

   ceph daemon osd.123 bluestore allocator score block

The fragmentation score is given in a [0-1] range:

* [0.0 .. 0.4] tiny fragmentation
* [0.4 .. 0.7] small, acceptable fragmentation
* [0.7 .. 0.9] considerable, but safe fragmentation
* [0.9 .. 1.0] severe fragmentation, might impact BlueFS's ability to get
  space from BlueStore

To see a detailed report of free fragments, run the following command:

.. prompt:: bash $

   ceph daemon osd.123 bluestore allocator dump block

For OSD processes that are not currently running, fragmentation can be
inspected with `ceph-bluestore-tool`. To see the fragmentation score, run the
following command:

.. prompt:: bash $

   ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score

To dump detailed free chunks, run the following command:

.. prompt:: bash $

   ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump

BLUESTORE_LEGACY_STATFS
_______________________

One or more OSDs have BlueStore volumes that were created prior to the
Nautilus release. (In Nautilus, BlueStore tracks its internal usage statistics
on a granular, per-pool basis.)

If *all* OSDs are older than Nautilus, this means that the per-pool metrics
are simply unavailable. But if there is a mixture of pre-Nautilus and
post-Nautilus OSDs, the cluster usage statistics reported by ``ceph df`` will
be inaccurate.

The old OSDs can be updated to use the new usage-tracking scheme by stopping
each OSD, running a repair operation, and then restarting the OSD. For
example, to update ``osd.123``, run the following commands:

.. prompt:: bash $

   systemctl stop ceph-osd@123
   ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
   systemctl start ceph-osd@123

To disable this alert, run the following command:

.. prompt:: bash $

   ceph config set global bluestore_warn_on_legacy_statfs false

BLUESTORE_NO_PER_POOL_OMAP
__________________________

One or more OSDs have volumes that were created prior to the Octopus release.
(In Octopus and later releases, BlueStore tracks omap space utilization by
pool.)

If there are any BlueStore OSDs that do not have the new tracking enabled, the
cluster will report an approximate value for per-pool omap usage based on the
most recent deep scrub.

The OSDs can be updated to track by pool by stopping each OSD, running a
repair operation, and then restarting the OSD. For example, to update
``osd.123``, run the following commands:

.. prompt:: bash $

   systemctl stop ceph-osd@123
   ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
   systemctl start ceph-osd@123

To disable this alert, run the following command:

.. prompt:: bash $

   ceph config set global bluestore_warn_on_no_per_pool_omap false
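If many OSDs on a host need this repair, the stop/repair/start cycle shown
above can be scripted. The sketch below assumes a package-based deployment
with ``ceph-osd@<id>`` systemd units and OSD data under ``/var/lib/ceph/osd``,
and it uses example OSD IDs; run it on one host at a time and confirm that the
cluster is healthy before moving on to the next host:

.. code-block:: bash

   # example OSD IDs on this host; adjust to match your deployment
   for ID in 123 124 125 ; do
       systemctl stop ceph-osd@$ID
       ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$ID
       systemctl start ceph-osd@$ID
   done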
BLUESTORE_NO_PER_PG_OMAP
________________________

One or more OSDs have volumes that were created prior to Pacific. (In Pacific
and later releases, BlueStore tracks omap space utilization by Placement Group
(PG).) Per-PG omap allows faster PG removal when PGs migrate.

The older OSDs can be updated to track by PG by stopping each OSD, running a
repair operation, and then restarting the OSD. For example, to update
``osd.123``, run the following commands:

.. prompt:: bash $

   systemctl stop ceph-osd@123
   ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
   systemctl start ceph-osd@123

To disable this alert, run the following command:

.. prompt:: bash $

   ceph config set global bluestore_warn_on_no_per_pg_omap false

BLUESTORE_DISK_SIZE_MISMATCH
____________________________

One or more BlueStore OSDs have an internal inconsistency between the size of
the physical device and the metadata that tracks its size. This inconsistency
can lead to the OSD(s) crashing in the future.

The OSDs that have this inconsistency should be destroyed and reprovisioned.
Be very careful to execute this procedure on only one OSD at a time, so as to
minimize the risk of losing any data. To execute this procedure, where ``$N``
is the OSD that has the inconsistency, run the following commands:

.. prompt:: bash $

   ceph osd out osd.$N
   while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done
   ceph osd destroy osd.$N
   ceph-volume lvm zap /path/to/device
   ceph-volume lvm create --osd-id $N --data /path/to/device

.. note::

   Wait for this recovery procedure to complete on one OSD before running it
   on the next.

BLUESTORE_NO_COMPRESSION
________________________

One or more OSDs are unable to load a BlueStore compression plugin. This issue
might be caused by a broken installation, in which the ``ceph-osd`` binary
does not match the compression plugins. Or it might be caused by a recent
upgrade in which the ``ceph-osd`` daemon was not restarted.

To resolve this issue, verify that all of the packages on the host that is
running the affected OSD(s) are correctly installed and that the OSD daemon(s)
have been restarted. If the problem persists, check the OSD log for
information about the source of the problem.

BLUESTORE_SPURIOUS_READ_ERRORS
______________________________

One or more BlueStore OSDs have detected spurious read errors on the main
device. BlueStore has recovered from these errors by retrying disk reads. This
alert might indicate issues with underlying hardware, issues with the I/O
subsystem, or something similar. In theory, such issues can cause permanent
data corruption. Some observations on the root cause of spurious read errors
can be found here: https://tracker.ceph.com/issues/22464

This alert does not require an immediate response, but the affected host might
need additional attention: for example, upgrading the host to the latest
OS/kernel versions and implementing hardware-resource-utilization monitoring.

To disable this alert on all OSDs, run the following command:

.. prompt:: bash $

   ceph config set osd bluestore_warn_on_spurious_read_errors false

Or, to disable this alert on a specific OSD, run the following command:

.. prompt:: bash $

   ceph config set osd.123 bluestore_warn_on_spurious_read_errors false

Device health
-------------

DEVICE_HEALTH
_____________

One or more OSD devices are expected to fail soon, where the warning threshold
is determined by the ``mgr/devicehealth/warn_threshold`` config option.

Because this alert applies only to OSDs that are currently marked ``in``, the
appropriate response to this expected failure is (1) to mark the OSD ``out``
so that data is migrated off of the OSD, and then (2) to remove the hardware
from the system. Note that this marking ``out`` is normally done automatically
if ``mgr/devicehealth/self_heal`` is enabled (as determined by
``mgr/devicehealth/mark_out_threshold``).
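To list the devices known to the device-health module, together with the
daemons that use them and any recorded life expectancy, run the following
command:

.. prompt:: bash $

   ceph device ls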
To check device health, run the following command:

.. prompt:: bash $

   ceph device info <device-id>

Device life expectancy is set either by a prediction model that the mgr runs
or by an external tool that is activated by running the following command:

.. prompt:: bash $

   ceph device set-life-expectancy <device-id> <from> <to>

You can change the stored life expectancy manually, but such a change usually
doesn't accomplish anything. The reason for this is that whichever tool
originally set the stored life expectancy will probably undo your change by
setting it again, and a change to the stored value does not affect the actual
health of the hardware device.

DEVICE_HEALTH_IN_USE
____________________

One or more devices (that is, OSDs) are expected to fail soon and have been
marked ``out`` of the cluster (as controlled by
``mgr/devicehealth/mark_out_threshold``), but they are still participating in
one or more Placement Groups. This might be because the OSD(s) were marked
``out`` only recently and data is still migrating, or because data cannot be
migrated off of the OSD(s) for some reason (for example, the cluster is nearly
full, or the CRUSH hierarchy is structured so that there isn't another
suitable OSD to migrate the data to).

This message can be silenced by disabling self-heal behavior (that is, setting
``mgr/devicehealth/self_heal`` to ``false``), by adjusting
``mgr/devicehealth/mark_out_threshold``, or by addressing whichever condition
is preventing data from being migrated off of the ailing OSD(s).

.. _rados_health_checks_device_health_toomany:

DEVICE_HEALTH_TOOMANY
_____________________

Too many devices (that is, OSDs) are expected to fail soon, and because
``mgr/devicehealth/self_heal`` behavior is enabled, marking ``out`` all of the
ailing OSDs would exceed the cluster's ``mon_osd_min_in_ratio`` ratio. This
ratio prevents a cascade of too many OSDs from being automatically marked
``out``.

You should promptly add new OSDs to the cluster to prevent data loss, or
incrementally replace the failing OSDs.

Alternatively, you can silence this health check by adjusting options
including ``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``.
Be warned, however, that this will increase the likelihood of unrecoverable
data loss.

Data health (pools & placement groups)
--------------------------------------

PG_AVAILABILITY
_______________

Data availability is reduced. In other words, the cluster is unable to service
potential read or write requests for at least some data in the cluster. More
precisely, one or more Placement Groups (PGs) are in a state that does not
allow I/O requests to be serviced. Any of the following PG states are
problematic if they do not clear quickly: *peering*, *stale*, *incomplete*,
and the lack of *active*.

For detailed information about which PGs are affected, run the following
command:

.. prompt:: bash $

   ceph health detail

In most cases, the root cause of this issue is that one or more OSDs are
currently ``down``: see ``OSD_DOWN`` above.

To see the state of a specific problematic PG, run the following command:

.. prompt:: bash $

   ceph tell <pgid> query

PG_DEGRADED
___________

Data redundancy is reduced for some data: in other words, the cluster does not
have the desired number of replicas for all data (in the case of replicated
pools) or erasure code fragments (in the case of erasure-coded pools). More
precisely, one or more Placement Groups (PGs):

* have the *degraded* or *undersized* flag set, which means that there are not
  enough instances of that PG in the cluster; or
* have not had the *clean* state set for a long time.
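The affected PGs can also be listed directly. The following is a sketch that
assumes a release in which ``ceph pg ls`` accepts PG states as filters:

.. prompt:: bash $

   ceph pg ls degraded undersized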
For detailed information about which PGs are affected, run the following
command:

.. prompt:: bash $

   ceph health detail

In most cases, the root cause of this issue is that one or more OSDs are
currently "down": see ``OSD_DOWN`` above.

To see the state of a specific problematic PG, run the following command:

.. prompt:: bash $

   ceph tell <pgid> query

PG_RECOVERY_FULL
________________

Data redundancy might be reduced or even put at risk for some data due to a
lack of free space in the cluster. More precisely, one or more Placement
Groups have the *recovery_toofull* flag set, which means that the cluster is
unable to migrate or recover data because one or more OSDs are above the
``full`` threshold.

For steps to resolve this condition, see *OSD_FULL* above.

PG_BACKFILL_FULL
________________

Data redundancy might be reduced or even put at risk for some data due to a
lack of free space in the cluster. More precisely, one or more Placement
Groups have the *backfill_toofull* flag set, which means that the cluster is
unable to migrate or recover data because one or more OSDs are above the
``backfillfull`` threshold.

For steps to resolve this condition, see *OSD_BACKFILLFULL* above.

PG_DAMAGED
__________

Data scrubbing has discovered problems with data consistency in the cluster.
More precisely, one or more Placement Groups either (1) have the
*inconsistent* or ``snaptrim_error`` flag set, which indicates that an earlier
data scrub operation found a problem, or (2) have the *repair* flag set, which
means that a repair for such an inconsistency is currently in progress.

For more information, see :doc:`pg-repair`.

OSD_SCRUB_ERRORS
________________

Recent OSD scrubs have discovered inconsistencies. This alert is generally
paired with *PG_DAMAGED* (see above).

For more information, see :doc:`pg-repair`.

OSD_TOO_MANY_REPAIRS
____________________

The count of read repairs has exceeded the config value threshold
``mon_osd_warn_num_repaired`` (default: ``10``). Because scrub handles errors
only for data at rest, and because any read error that occurs when another
replica is available will be repaired immediately so that the client can get
the object data, there might exist failing disks that are not registering any
scrub errors. This repair count is maintained as a way of identifying any such
failing disks.

LARGE_OMAP_OBJECTS
__________________

One or more pools contain large omap objects, as determined by
``osd_deep_scrub_large_omap_object_key_threshold`` (threshold for the number
of keys to determine what is considered a large omap object) or
``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold for
the summed size in bytes of all key values to determine what is considered a
large omap object) or both. To find more information on object name, key
count, and size in bytes, search the cluster log for 'Large omap object
found'. This issue can be caused by RGW-bucket index objects that do not have
automatic resharding enabled. For more information on resharding, see
:ref:`RGW Dynamic Bucket Index Resharding <rgw_dynamic_bucket_index_resharding>`.

To adjust the thresholds mentioned above, run the following commands:

.. prompt:: bash $

   ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
   ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>
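To find the specific objects that triggered the warning, search the cluster
log for the message mentioned above. The path below assumes the default
cluster-log location on a monitor host; adjust it for your deployment:

.. prompt:: bash $

   grep 'Large omap object found' /var/log/ceph/ceph.log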
CACHE_POOL_NEAR_FULL
____________________

A cache-tier pool is nearly full, as determined by the ``target_max_bytes``
and ``target_max_objects`` properties of the cache pool. Once the pool reaches
the target threshold, write requests to the pool might block while data is
flushed and evicted from the cache. This state normally leads to very high
latencies and poor performance.

To adjust the cache pool's target size, run the following commands:

.. prompt:: bash $

   ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
   ceph osd pool set <cache-pool-name> target_max_objects <objects>

There might be other reasons that normal cache flush and evict activity are
throttled: for example, reduced availability of the base tier, reduced
performance of the base tier, or overall cluster load.

TOO_FEW_PGS
___________

The number of Placement Groups (PGs) that are in use in the cluster is below
the configurable threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This
can lead to suboptimal distribution and suboptimal balance of data across the
OSDs in the cluster, and a reduction of overall performance.

If data pools have not yet been created, this condition is expected.

To address this issue, you can increase the PG count for existing pools or
create new pools. For more information, see
:ref:`choosing-number-of-placement-groups`.

POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________

One or more pools have a ``pg_num`` value that is not a power of two. Although
this is not strictly incorrect, it does lead to a less balanced distribution
of data because some Placement Groups will have roughly twice as much data as
others have.

This is easily corrected by setting the ``pg_num`` value for the affected
pool(s) to a nearby power of two. To do so, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_num <value>

To disable this health check, run the following command:

.. prompt:: bash $

   ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

POOL_TOO_FEW_PGS
________________

One or more pools should probably have more Placement Groups (PGs), given the
amount of data that is currently stored in the pool. This issue can lead to
suboptimal distribution and suboptimal balance of data across the OSDs in the
cluster, and a reduction of overall performance. This alert is raised only if
the ``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the alert, entirely disable auto-scaling of PGs for the pool by
running the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs for the pool,
run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_autoscale_mode on

Alternatively, to manually set the number of PGs for the pool to the
recommended amount, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_num <new-pg-num>

For more information, see :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler`.

TOO_MANY_PGS
____________

The number of Placement Groups (PGs) in use in the cluster is above the
configurable threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this
threshold is exceeded, the cluster will not allow new pools to be created,
pool `pg_num` to be increased, or pool replication to be increased (any of
which, if allowed, would lead to more PGs in the cluster). A large number of
PGs can lead to higher memory utilization for OSD daemons, slower peering
after cluster state changes (for example, OSD restarts, additions, or
removals), and higher load on the Manager and Monitor daemons.

The simplest way to mitigate the problem is to increase the number of OSDs in
the cluster by adding more hardware.
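Before making changes, it can help to confirm the threshold currently in
effect. The sketch below assumes that your release reports the option when it
is queried against the ``mon`` section; if not, query the daemon type that
your release associates with this option:

.. prompt:: bash $

   ceph config get mon mon_max_pg_per_osd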
Note that, because the OSD count that is used for the purposes of this health
check is the number of ``in`` OSDs, marking ``out`` OSDs ``in`` (if there are
any ``out`` OSDs available) can also help. To do so, run the following
command:

.. prompt:: bash $

   ceph osd in <osd IDs>

For more information, see :ref:`choosing-number-of-placement-groups`.

POOL_TOO_MANY_PGS
_________________

One or more pools should probably have fewer Placement Groups (PGs), given the
amount of data that is currently stored in the pool. This issue can lead to
higher memory utilization for OSD daemons, slower peering after cluster state
changes (for example, OSD restarts, additions, or removals), and higher load
on the Manager and Monitor daemons. This alert is raised only if the
``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the alert, entirely disable auto-scaling of PGs for the pool by
running the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs for the pool,
run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_autoscale_mode on

Alternatively, to manually set the number of PGs for the pool to the
recommended amount, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> pg_num <new-pg-num>

For more information, see :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler`.

POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_bytes`` property that is set in order
to estimate the expected size of the pool, but the value(s) of this property
are greater than the total available storage (either by themselves or in
combination with other pools).

This alert is usually an indication that the ``target_size_bytes`` value for
the pool is too large and should be reduced or set to zero. To reduce the
``target_size_bytes`` value or set it to zero, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> target_size_bytes 0

The above command sets the value of ``target_size_bytes`` to zero. To set the
value of ``target_size_bytes`` to a non-zero value, replace the ``0`` with
that non-zero value.

For more information, see :ref:`specifying_pool_target_size`.

POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO
____________________________________

One or more pools have both ``target_size_bytes`` and ``target_size_ratio``
set in order to estimate the expected size of the pool. Only one of these
properties should be non-zero. If both are set to a non-zero value, then
``target_size_ratio`` takes precedence and ``target_size_bytes`` is ignored.

To reset ``target_size_bytes`` to zero, run the following command:

.. prompt:: bash $

   ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

TOO_FEW_OSDS
____________

The number of OSDs in the cluster is below the configurable threshold of
``osd_pool_default_size``. This means that some or all data may not be able to
satisfy the data protection policy specified in CRUSH rules and pool settings.

SMALLER_PGP_NUM
_______________

One or more pools have a ``pgp_num`` value less than ``pg_num``. This alert is
normally an indication that the Placement Group (PG) count was increased
without any increase in the placement behavior.

This disparity is sometimes brought about deliberately, in order to separate
out the `split` step when the PG count is adjusted from the data migration
that is needed when ``pgp_num`` is changed.
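To compare the two values for a given pool (``mypool`` is a hypothetical pool
name here), run the following commands:

.. prompt:: bash $

   ceph osd pool get mypool pg_num
   ceph osd pool get mypool pgp_num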
This issue is normally resolved by setting ``pgp_num`` to match ``pg_num``, so
as to trigger the data migration, by running the following command:

.. prompt:: bash $

   ceph osd pool set <pool> pgp_num <pg-num-value>

MANY_OBJECTS_PER_PG
___________________

One or more pools have an average number of objects per Placement Group (PG)
that is significantly higher than the overall cluster average. The specific
threshold is determined by the ``mon_pg_warn_max_object_skew`` configuration
value.

This alert is usually an indication that the pool(s) that contain most of the
data in the cluster have too few PGs, or that other pools that contain less
data have too many PGs. See *TOO_MANY_PGS* above.

To silence the health check, raise the threshold by adjusting the
``mon_pg_warn_max_object_skew`` config option on the managers.

The health check will be silenced for a specific pool only if
``pg_autoscale_mode`` is set to ``on``.

POOL_APP_NOT_ENABLED
____________________

A pool exists but the pool has not been tagged for use by a particular
application.

To resolve this issue, tag the pool for use by an application. For example, if
the pool is used by RBD, run the following command:

.. prompt:: bash $

   rbd pool init <pool-name>

Alternatively, if the pool is being used by a custom application (here 'foo'),
you can label the pool by running the following low-level command:

.. prompt:: bash $

   ceph osd pool application enable <pool-name> foo

For more information, see :ref:`associate-pool-to-application`.

POOL_FULL
_________

One or more pools have reached (or are very close to reaching) their quota.
The threshold to raise this health check is determined by the
``mon_pool_quota_crit_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) by running the following
commands:

.. prompt:: bash $

   ceph osd pool set-quota <pool> max_bytes <bytes>
   ceph osd pool set-quota <pool> max_objects <objects>

To disable a quota, set the quota value to 0.

POOL_NEAR_FULL
______________

One or more pools are approaching a configured fullness threshold.

One of the several thresholds that can raise this health check is determined
by the ``mon_pool_quota_warn_threshold`` configuration option.

Pool quotas can be adjusted up or down (or removed) by running the following
commands:

.. prompt:: bash $

   ceph osd pool set-quota <pool> max_bytes <bytes>
   ceph osd pool set-quota <pool> max_objects <objects>

To disable a quota, set the quota value to 0.

Other thresholds that can raise the two health checks above are
``mon_osd_nearfull_ratio`` and ``mon_osd_full_ratio``. For details and
resolution, see :ref:`storage-capacity` and :ref:`no-free-drive-space`.

OBJECT_MISPLACED
________________

One or more objects in the cluster are not stored on the node that CRUSH would
prefer that they be stored on. This alert is an indication that data migration
due to a recent cluster change has not yet completed.

Misplaced data is not a dangerous condition in and of itself; data consistency
is never at risk, and old copies of objects will not be removed until the
desired number of new copies (in the desired locations) has been created.

OBJECT_UNFOUND
______________

One or more objects in the cluster cannot be found. More precisely, the OSDs
know that a new or updated copy of an object should exist, but no such copy
has been found on OSDs that are currently online.

Read or write requests to unfound objects will block.

Ideally, a "down" OSD that has a more recent copy of the unfound object can be
brought back online. To identify candidate OSDs, check the peering state of
the PG(s) responsible for the unfound object.
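To list the unfound objects in a specific placement group (``2.4`` is a
hypothetical PG ID here), run the following command:

.. prompt:: bash $

   ceph pg 2.4 list_unfound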
To see the peering state, run the following command:

.. prompt:: bash $

   ceph tell <pgid> query

On the other hand, if the latest copy of the object is not available, the
cluster can be told to roll back to a previous version of the object. For more
information, see :ref:`failures-osd-unfound`.

SLOW_OPS
________

One or more OSD requests or monitor requests are taking a long time to
process. This alert might be an indication of extreme load, a slow storage
device, or a software bug.

To query the request queue for the daemon that is causing the slowdown, run
the following command from the daemon's host:

.. prompt:: bash $

   ceph daemon osd.<id> ops

To see a summary of the slowest recent requests, run the following command:

.. prompt:: bash $

   ceph daemon osd.<id> dump_historic_ops

To see the location of a specific OSD, run the following command:

.. prompt:: bash $

   ceph osd find osd.<id>

PG_NOT_SCRUBBED
_______________

One or more Placement Groups (PGs) have not been scrubbed recently. PGs are
normally scrubbed within an interval determined by
:confval:`osd_scrub_max_interval` globally. This interval can be overridden on
a per-pool basis by changing the value of :confval:`scrub_max_interval`. This
health check is raised if a certain percentage (determined by
``mon_warn_pg_not_scrubbed_ratio``) of the interval has elapsed after the time
the scrub was scheduled and no scrub has been performed.

PGs will be scrubbed only if they are flagged as ``clean`` (which means that
they are to be cleaned, and not that they have been examined and found to be
clean). Misplaced or degraded PGs will not be flagged as ``clean`` (see
*PG_AVAILABILITY* and *PG_DEGRADED* above).

To manually initiate a scrub of a clean PG, run the following command:

.. prompt:: bash $

   ceph pg scrub <pgid>

PG_NOT_DEEP_SCRUBBED
____________________

One or more Placement Groups (PGs) have not been deep scrubbed recently. PGs
are normally scrubbed every :confval:`osd_deep_scrub_interval` seconds at
most. This health check is raised if a certain percentage (determined by
``mon_warn_pg_not_deep_scrubbed_ratio``) of the interval has elapsed after the
time the scrub was scheduled and no scrub has been performed.

PGs will receive a deep scrub only if they are flagged as *clean* (which means
that they are to be cleaned, and not that they have been examined and found to
be clean). Misplaced or degraded PGs might not be flagged as ``clean`` (see
*PG_AVAILABILITY* and *PG_DEGRADED* above).

To manually initiate a deep scrub of a clean PG, run the following command:

.. prompt:: bash $

   ceph pg deep-scrub <pgid>

PG_SLOW_SNAP_TRIMMING
_____________________

The snapshot trim queue for one or more PGs has exceeded the configured
warning threshold. This alert indicates either that an extremely large number
of snapshots was recently deleted, or that OSDs are unable to trim snapshots
quickly enough to keep up with the rate of new snapshot deletions.

The warning threshold is determined by the ``mon_osd_snap_trim_queue_warn_on``
option (default: 32768).

This alert might be raised if OSDs are under excessive load and unable to keep
up with their background work, or if the OSDs' internal metadata database is
heavily fragmented and unable to perform. The alert might also indicate some
other performance issue with the OSDs.

The exact size of the snapshot trim queue is reported by the ``snaptrimq_len``
field of ``ceph pg ls -f json-detail``.
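A sketch for finding the PGs with the longest snapshot trim queues is shown
below. It assumes that ``jq`` is installed and that, in your release, the
per-PG entries in the JSON output appear under a ``pg_stats`` key; adjust the
filter if the layout differs:

.. prompt:: bash $

   ceph pg ls -f json-detail | jq '[.pg_stats[] | {pgid, snaptrimq_len}] | sort_by(.snaptrimq_len) | reverse | .[:10]'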
Stretch Mode
------------

INCORRECT_NUM_BUCKETS_STRETCH_MODE
__________________________________

Stretch mode currently supports only two dividing buckets that contain OSDs.
This warning indicates that the number of dividing buckets is not equal to two
after stretch mode has been enabled. You can expect unpredictable failures and
MON assertions until the condition is fixed.

We encourage you to fix this by removing the additional dividing buckets or by
bringing the number of dividing buckets to two.

UNEVEN_WEIGHTS_STRETCH_MODE
___________________________

The two dividing buckets must have equal weights when stretch mode is enabled.
This warning indicates that the two dividing buckets have uneven weights after
stretch mode has been enabled. This is not immediately fatal; however, you can
expect Ceph to behave unpredictably when it processes transitions between
dividing buckets.

We encourage you to fix this by making the weights even on both dividing
buckets. This can be done by making sure the combined weight of the OSDs on
each dividing bucket is the same.

Miscellaneous
-------------

RECENT_CRASH
____________

One or more Ceph daemons have crashed recently, and the crash(es) have not yet
been acknowledged and archived by the administrator. This alert might indicate
a software bug, a hardware problem (for example, a failing disk), or some
other problem.

To list recent crashes, run the following command:

.. prompt:: bash $

   ceph crash ls-new

To examine information about a specific crash, run the following command:

.. prompt:: bash $

   ceph crash info <crash-id>

To silence this alert, you can archive the crash (perhaps after the crash has
been examined by an administrator) by running the following command:

.. prompt:: bash $

   ceph crash archive <crash-id>

Similarly, to archive all recent crashes, run the following command:

.. prompt:: bash $

   ceph crash archive-all

Archived crashes will still be visible by running the command ``ceph crash
ls``, but not by running the command ``ceph crash ls-new``. The time period
that is considered recent is determined by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).

To entirely disable this alert, run the following command:

.. prompt:: bash $

   ceph config set mgr mgr/crash/warn_recent_interval 0

RECENT_MGR_MODULE_CRASH
_______________________

One or more ``ceph-mgr`` modules have crashed recently, and the crash(es) have
not yet been acknowledged and archived by the administrator. This alert
usually indicates a software bug in one of the software modules that are
running inside the ``ceph-mgr`` daemon. The module that experienced the
problem might be disabled as a result, but other modules are unaffected and
continue to function as expected.

As with the *RECENT_CRASH* health check, a specific crash can be inspected by
running the following command:

.. prompt:: bash $

   ceph crash info <crash-id>

To silence this alert, you can archive the crash (perhaps after the crash has
been examined by an administrator) by running the following command:

.. prompt:: bash $

   ceph crash archive <crash-id>

Similarly, to archive all recent crashes, run the following command:

.. prompt:: bash $

   ceph crash archive-all

Archived crashes will still be visible by running the command ``ceph crash
ls`` but not by running the command ``ceph crash ls-new``. The time period
that is considered recent is determined by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).
To entirely disable this alert, run the following command:

.. prompt:: bash $

   ceph config set mgr mgr/crash/warn_recent_interval 0

TELEMETRY_CHANGED
_________________

Telemetry has been enabled, but because the contents of the telemetry report
have changed in the meantime, telemetry reports will not be sent.

Ceph developers occasionally revise the telemetry feature to include new and
useful information, or to remove information found to be useless or sensitive.
If any new information is included in the report, Ceph requires the
administrator to re-enable telemetry. This requirement ensures that the
administrator has an opportunity to (re)review the information that will be
shared.

To review the contents of the telemetry report, run the following command:

.. prompt:: bash $

   ceph telemetry show

Note that the telemetry report consists of several channels that may be
independently enabled or disabled. For more information, see :ref:`telemetry`.

To re-enable telemetry (and silence the alert), run the following command:

.. prompt:: bash $

   ceph telemetry on

To disable telemetry (and silence the alert), run the following command:

.. prompt:: bash $

   ceph telemetry off

AUTH_BAD_CAPS
_____________

One or more auth users have capabilities that cannot be parsed by the
monitors. As a general rule, this alert indicates that there are one or more
daemon types that the user is not authorized to use to perform any action.

This alert is most likely to be raised after an upgrade if (1) the
capabilities were set with an older version of Ceph that did not properly
validate the syntax of those capabilities, or if (2) the syntax of the
capabilities has changed.

To remove the user(s) in question, run the following command:

.. prompt:: bash $

   ceph auth rm <entity-name>

(This resolves the health check, but it prevents clients from being able to
authenticate as the removed user.)

Alternatively, to update the capabilities for the user(s), run the following
command:

.. prompt:: bash $

   ceph auth caps <entity-name> <daemon-type> <caps> [<daemon-type> <caps> ...]

For more information about auth capabilities, see :ref:`user-management`.

OSD_NO_DOWN_OUT_INTERVAL
________________________

The ``mon_osd_down_out_interval`` option is set to zero, which means that the
system does not automatically perform any repair or healing operations when an
OSD fails. Instead, an administrator or an external orchestrator must manually
mark "down" OSDs as ``out`` (by running ``ceph osd out <osd-id>``) in order to
trigger recovery.

This option is normally set to five or ten minutes, which should be enough
time for a host to power-cycle or reboot.

To silence this alert, set ``mon_warn_on_osd_down_out_interval_zero`` to
``false`` by running the following command:

.. prompt:: bash $

   ceph config set global mon_warn_on_osd_down_out_interval_zero false

DASHBOARD_DEBUG
_______________

The Dashboard debug mode is enabled. This means that if there is an error
while processing a REST API request, the HTTP error response will contain a
Python traceback. This mode should be disabled in production environments
because such a traceback might contain and expose sensitive information.

To disable the debug mode, run the following command:

.. prompt:: bash $

   ceph dashboard debug disable
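To confirm the current state of the debug flag, you can query its status. This
assumes a release in which the dashboard module provides the ``debug status``
subcommand:

.. prompt:: bash $

   ceph dashboard debug status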