1 files changed, 1084 insertions, 0 deletions
diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
new file mode 100644
index 00000000..03e357f4
--- /dev/null
+++ b/doc/rados/operations/health-checks.rst
@@ -0,0 +1,1084 @@
+
+=============
+Health checks
+=============
+
+Overview
+========
+
+There is a finite set of possible health messages that a Ceph cluster can
+raise -- these are defined as *health checks* which have unique identifiers.
+
+The identifier is a terse pseudo-human-readable (i.e. like a variable name)
+string. It is intended to enable tools (such as UIs) to make sense of
+health checks, and present them in a way that reflects their meaning.
+
+This page lists the health checks that are raised by the monitor and manager
+daemons. In addition to these, you may also see health checks that originate
+from MDS daemons (see :ref:`cephfs-health-messages`), and health checks
+that are defined by ceph-mgr python modules.
+
+Definitions
+===========
+
+Monitor
+-------
+
+MON_DOWN
+________
+
+One or more monitor daemons is currently down. The cluster requires a
+majority (more than 1/2) of the monitors in order to function. When
+one or more monitors are down, clients may have a harder time forming
+their initial connection to the cluster as they may need to try more
+addresses before they reach an operating monitor.
+
+The down monitor daemon should generally be restarted as soon as
+possible to reduce the risk of a subsequent monitor failure leading to
+a service outage.
+
+MON_CLOCK_SKEW
+______________
+
+The clocks on the hosts running the ceph-mon monitor daemons are not
+sufficiently well synchronized. This health alert is raised if the
+cluster detects a clock skew greater than ``mon_clock_drift_allowed``.
+
+This is best resolved by synchronizing the clocks using a tool like
+``ntpd`` or ``chrony``.
+
+If it is impractical to keep the clocks closely synchronized, the
+``mon_clock_drift_allowed`` threshold can also be increased, but this
+value must stay significantly below the ``mon_lease`` interval in
+order for the monitor cluster to function properly.
+
+MON_MSGR2_NOT_ENABLED
+_____________________
+
+The ``ms_bind_msgr2`` option is enabled but one or more monitors is
+not configured to bind to a v2 port in the cluster's monmap. This
+means that features specific to the msgr2 protocol (e.g., encryption)
+are not available on some or all connections.
+
+In most cases this can be corrected by issuing the command::
+
+    ceph mon enable-msgr2
+
+That command will change any monitor configured for the old default
+port 6789 to continue to listen for v1 connections on 6789 and also
+listen for v2 connections on the new default 3300 port.
+
+If a monitor is configured to listen for v1 connections on a non-standard
+port (not 6789), then the monmap will need to be modified manually.
+
+AUTH_INSECURE_GLOBAL_ID_RECLAIM
+_______________________________
+
+One or more clients or daemons are connected to the cluster that are
+not securely reclaiming their global_id (a unique number identifying
+each entity in the cluster) when reconnecting to a monitor. The
+client is being permitted to connect anyway because the
+``auth_allow_insecure_global_id_reclaim`` option is set to ``true`` (which may
+be necessary until all ceph clients have been upgraded), and the
+``auth_expose_insecure_global_id_reclaim`` option is set to ``true`` (which
+allows monitors to detect clients with insecure reclaim early by forcing them to
+reconnect right after they first authenticate).
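+
+If you are unsure how these two options are currently set, one way to check them
+is with the standard ``ceph config get`` command, for example::
+
+    ceph config get mon auth_allow_insecure_global_id_reclaim
+    ceph config get mon auth_expose_insecure_global_id_reclaim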
+
+You can identify which client(s) are using unpatched ceph client code with::
+
+    ceph health detail
+
+Clients' global_id reclaim behavior can also be seen in the
+``global_id_status`` field in the dump of clients connected to an
+individual monitor (``reclaim_insecure`` means the client is
+unpatched and is contributing to this health alert)::
+
+    ceph tell mon.\* sessions
+
+We strongly recommend that all clients in the system are upgraded to a
+newer version of Ceph that correctly reclaims global_id values. Once
+all clients have been updated, you can stop allowing insecure reconnections
+with::
+
+    ceph config set mon auth_allow_insecure_global_id_reclaim false
+
+Although we do NOT recommend doing so, you can disable this warning indefinitely
+with::
+
+    ceph config set mon mon_warn_on_insecure_global_id_reclaim false
+
+AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
+_______________________________________
+
+Ceph is currently configured to allow clients to reconnect to monitors using
+an insecure process to reclaim their previous global_id because the setting
+``auth_allow_insecure_global_id_reclaim`` is set to ``true``. It may be necessary to
+leave this setting enabled while existing Ceph clients are upgraded to newer
+versions of Ceph that correctly and securely reclaim their global_id.
+
+If the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health alert has not also been raised and
+the ``auth_expose_insecure_global_id_reclaim`` setting has not been disabled (it is
+on by default), then there are currently no clients connected that need to be
+upgraded, and it is safe to disallow insecure global_id reclaim with::
+
+    ceph config set mon auth_allow_insecure_global_id_reclaim false
+
+Although we do NOT recommend doing so, you can disable this warning indefinitely
+with::
+
+    ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
+
+
+Manager
+-------
+
+MGR_MODULE_DEPENDENCY
+_____________________
+
+An enabled manager module is failing its dependency check. This health check
+should come with an explanatory message from the module about the problem.
+
+For example, a module might report that a required package is not installed:
+install the required package and restart your manager daemons.
+
+This health check is only applied to enabled modules. If a module is
+not enabled, you can see whether it is reporting dependency issues in
+the output of `ceph mgr module ls`.
+
+
+MGR_MODULE_ERROR
+________________
+
+A manager module has experienced an unexpected error. Typically,
+this means an unhandled exception was raised from the module's `serve`
+function. The human-readable description of the error may be obscurely
+worded if the exception did not provide a useful description of itself.
+
+This health check may indicate a bug: please open a Ceph bug report if you
+think you have encountered a bug.
+
+If you believe the error is transient, you may restart your manager
+daemon(s), or use `ceph mgr fail` on the active daemon to prompt
+a failover to another daemon.
+
+
+OSDs
+----
+
+OSD_DOWN
+________
+
+One or more OSDs are marked down. The ceph-osd daemon may have been
+stopped, or peer OSDs may be unable to reach the OSD over the network.
+Common causes include a stopped or crashed daemon, a down host, or a
+network outage.
+
+Verify that the host is healthy, the daemon is started, and the network is
+functioning. If the daemon has crashed, the daemon log file
+(``/var/log/ceph/ceph-osd.*``) may contain debugging information.
+
+OSD_<crush type>_DOWN
+_____________________
+
+(e.g.
+OSD_HOST_DOWN, OSD_ROOT_DOWN)
+
+All the OSDs within a particular CRUSH subtree are marked down, for example
+all OSDs on a host.
+
+OSD_ORPHAN
+__________
+
+An OSD is referenced in the CRUSH map hierarchy but does not exist.
+
+The OSD can be removed from the CRUSH hierarchy with::
+
+    ceph osd crush rm osd.<id>
+
+OSD_OUT_OF_ORDER_FULL
+_____________________
+
+The utilization thresholds for `backfillfull`, `nearfull`, `full`,
+and/or `failsafe_full` are not ascending. In particular, we expect
+`backfillfull < nearfull`, `nearfull < full`, and `full <
+failsafe_full`.
+
+The thresholds can be adjusted with::
+
+    ceph osd set-backfillfull-ratio <ratio>
+    ceph osd set-nearfull-ratio <ratio>
+    ceph osd set-full-ratio <ratio>
+
+
+OSD_FULL
+________
+
+One or more OSDs has exceeded the `full` threshold and is preventing
+the cluster from servicing writes.
+
+Utilization by pool can be checked with::
+
+    ceph df
+
+The currently defined `full` ratio can be seen with::
+
+    ceph osd dump | grep full_ratio
+
+A short-term workaround to restore write availability is to raise the full
+threshold by a small amount::
+
+    ceph osd set-full-ratio <ratio>
+
+New storage should be added to the cluster by deploying more OSDs or
+existing data should be deleted in order to free up space.
+
+OSD_BACKFILLFULL
+________________
+
+One or more OSDs has exceeded the `backfillfull` threshold, which will
+prevent data from being rebalanced to this device. This is
+an early warning that rebalancing may not be able to complete and that
+the cluster is approaching full.
+
+Utilization by pool can be checked with::
+
+    ceph df
+
+OSD_NEARFULL
+____________
+
+One or more OSDs has exceeded the `nearfull` threshold. This is an early
+warning that the cluster is approaching full.
+
+Utilization by pool can be checked with::
+
+    ceph df
+
+OSDMAP_FLAGS
+____________
+
+One or more cluster flags of interest has been set. These flags include:
+
+* *full* - the cluster is flagged as full and cannot serve writes
+* *pauserd*, *pausewr* - paused reads or writes
+* *noup* - OSDs are not allowed to start
+* *nodown* - OSD failure reports are being ignored, such that the
+  monitors will not mark OSDs `down`
+* *noin* - OSDs that were previously marked `out` will not be marked
+  back `in` when they start
+* *noout* - down OSDs will not automatically be marked out after the
+  configured interval
+* *nobackfill*, *norecover*, *norebalance* - recovery or data
+  rebalancing is suspended
+* *noscrub*, *nodeep_scrub* - scrubbing is disabled
+* *notieragent* - cache tiering activity is suspended
+
+With the exception of *full*, these flags can be set or cleared with::
+
+    ceph osd set <flag>
+    ceph osd unset <flag>
+
+OSD_FLAGS
+_________
+
+One or more OSDs or CRUSH {nodes,device classes} has a flag of interest set.
+These flags include:
+
+* *noup*: these OSDs are not allowed to start
+* *nodown*: failure reports for these OSDs will be ignored
+* *noin*: if these OSDs were previously marked `out` automatically
+  after a failure, they will not be marked in when they start
+* *noout*: if these OSDs are down they will not automatically be marked
+  `out` after the configured interval
+
+These flags can be set and cleared in batch with::
+
+    ceph osd set-group <flags> <who>
+    ceph osd unset-group <flags> <who>
+
+For example, ::
+
+    ceph osd set-group noup,noout osd.0 osd.1
+    ceph osd unset-group noup,noout osd.0 osd.1
+    ceph osd set-group noup,noout host-foo
+    ceph osd unset-group noup,noout host-foo
+    ceph osd set-group noup,noout class-hdd
+    ceph osd unset-group noup,noout class-hdd
+
+OLD_CRUSH_TUNABLES
+__________________
+
+The CRUSH map is using very old settings and should be updated. The
+oldest tunables that can be used (i.e., the oldest client version that
+can connect to the cluster) without triggering this health warning is
+determined by the ``mon_crush_min_required_version`` config option.
+See :ref:`crush-map-tunables` for more information.
+
+OLD_CRUSH_STRAW_CALC_VERSION
+____________________________
+
+The CRUSH map is using an older, non-optimal method for calculating
+intermediate weight values for ``straw`` buckets.
+
+The CRUSH map should be updated to use the newer method
+(``straw_calc_version=1``). See
+:ref:`crush-map-tunables` for more information.
+
+CACHE_POOL_NO_HIT_SET
+_____________________
+
+One or more cache pools is not configured with a *hit set* to track
+utilization, which will prevent the tiering agent from identifying
+cold objects to flush and evict from the cache.
+
+Hit sets can be configured on the cache pool with::
+
+    ceph osd pool set <poolname> hit_set_type <type>
+    ceph osd pool set <poolname> hit_set_period <period-in-seconds>
+    ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
+    ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>
+
+OSD_NO_SORTBITWISE
+__________________
+
+No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not
+been set.
+
+The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
+OSDs can start. You can safely set the flag with::
+
+    ceph osd set sortbitwise
+
+POOL_FULL
+_________
+
+One or more pools has reached its quota and is no longer allowing writes.
+
+Pool quotas and utilization can be seen with::
+
+    ceph df detail
+
+You can either raise the pool quota with::
+
+    ceph osd pool set-quota <poolname> max_objects <num-objects>
+    ceph osd pool set-quota <poolname> max_bytes <num-bytes>
+
+or delete some existing data to reduce utilization.
+
+BLUEFS_SPILLOVER
+________________
+
+One or more OSDs that use the BlueStore backend have been allocated
+`db` partitions (storage space for metadata, normally on a faster
+device) but that space has filled, such that metadata has "spilled
+over" onto the normal slow device. This isn't necessarily an error
+condition or even unexpected, but if the administrator's expectation
+was that all metadata would fit on the faster device, it indicates
+that not enough space was provided.
+
+This warning can be disabled on all OSDs with::
+
+    ceph config set osd bluestore_warn_on_bluefs_spillover false
+
+Alternatively, it can be disabled on a specific OSD with::
+
+    ceph config set osd.123 bluestore_warn_on_bluefs_spillover false
+
+To provide more metadata space, the OSD in question could be destroyed and
+reprovisioned.
+This will involve data migration and recovery.
+
+It may also be possible to expand the LVM logical volume backing the
+`db` storage. If the underlying LV has been expanded, the OSD daemon
+needs to be stopped and BlueFS informed of the device size change with::
+
+    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID
+
+BLUEFS_AVAILABLE_SPACE
+______________________
+
+To check how much space is free for BlueFS, run::
+
+    ceph daemon osd.123 bluestore bluefs available
+
+This will output up to 3 values: `BDEV_DB free`, `BDEV_SLOW free` and
+`available_from_bluestore`. `BDEV_DB free` and `BDEV_SLOW free` report the amount
+of space that has been acquired by BlueFS and is considered free. The value
+`available_from_bluestore` denotes the ability of BlueStore to relinquish more
+space to BlueFS. It is normal for this value to differ from the amount of
+BlueStore free space, as the BlueFS allocation unit is typically larger than the
+BlueStore allocation unit. This means that only part of the BlueStore free space
+will be usable by BlueFS.
+
+BLUEFS_LOW_SPACE
+________________
+
+If BlueFS is running low on available free space and there is little
+`available_from_bluestore`, consider reducing the BlueFS allocation unit size.
+To simulate the available space with a different allocation unit, run::
+
+    ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>
+
+BLUESTORE_FRAGMENTATION
+_______________________
+
+As BlueStore operates, free space on the underlying storage will become
+fragmented. This is normal and unavoidable, but excessive fragmentation will
+cause slowdowns. To inspect BlueStore fragmentation, run::
+
+    ceph daemon osd.123 bluestore allocator score block
+
+The score is given in the [0-1] range:
+
+* [0.0 .. 0.4] tiny fragmentation
+* [0.4 .. 0.7] small, acceptable fragmentation
+* [0.7 .. 0.9] considerable, but safe fragmentation
+* [0.9 .. 1.0] severe fragmentation, which may impact the ability of BlueFS to
+  get space from BlueStore
+
+If a detailed report of free fragments is required, run::
+
+    ceph daemon osd.123 bluestore allocator dump block
+
+If the OSD process is not running, fragmentation can instead be
+inspected with `ceph-bluestore-tool`.
+To get the fragmentation score::
+
+    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score
+
+And to dump detailed free chunks::
+
+    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump
+
+BLUESTORE_LEGACY_STATFS
+_______________________
+
+As of the Nautilus release, BlueStore tracks its internal usage
+statistics on a per-pool basis, and one or more OSDs have
+BlueStore volumes that were created prior to Nautilus. If *all* OSDs
+are older than Nautilus, this just means that the per-pool metrics are
+not available. However, if there is a mix of pre-Nautilus and
+post-Nautilus OSDs, the cluster usage statistics reported by ``ceph
+df`` will not be accurate.
+
+The old OSDs can be updated to use the new usage tracking scheme by stopping
+each OSD, running a repair operation, and then restarting it. For example, to
+update ``osd.123``::
+
+    systemctl stop ceph-osd@123
+    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
+    systemctl start ceph-osd@123
+
+This warning can be disabled with::
+
+    ceph config set global bluestore_warn_on_legacy_statfs false
+
+
+BLUESTORE_DISK_SIZE_MISMATCH
+____________________________
+
+One or more OSDs using BlueStore has an internal inconsistency between the size
+of the physical device and the metadata tracking its size.
+This can lead to
+the OSD crashing in the future.
+
+The OSDs in question should be destroyed and reprovisioned. Care should be
+taken to do this one OSD at a time, and in a way that doesn't put any data at
+risk. For example, if osd ``$N`` has the error::
+
+    ceph osd out osd.$N
+    while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done
+    ceph osd destroy osd.$N
+    ceph-volume lvm zap /path/to/device
+    ceph-volume lvm create --osd-id $N --data /path/to/device
+
+
+Device health
+-------------
+
+DEVICE_HEALTH
+_____________
+
+One or more devices is expected to fail soon, where the warning
+threshold is controlled by the ``mgr/devicehealth/warn_threshold``
+config option.
+
+This warning only applies to OSDs that are currently marked "in", so
+the expected response to this failure is to mark the device "out" so
+that data is migrated off of the device, and then to remove the
+hardware from the system. Note that the marking out is normally done
+automatically if ``mgr/devicehealth/self_heal`` is enabled based on
+the ``mgr/devicehealth/mark_out_threshold``.
+
+Device health can be checked with::
+
+    ceph device info <device-id>
+
+Device life expectancy is set by a prediction model run by
+the mgr or by an external tool via the command::
+
+    ceph device set-life-expectancy <device-id> <from> <to>
+
+You can change the stored life expectancy manually, but that usually
+doesn't accomplish anything as whatever tool originally set it will
+probably set it again, and changing the stored value does not affect
+the actual health of the hardware device.
+
+DEVICE_HEALTH_IN_USE
+____________________
+
+One or more devices is expected to fail soon and has been marked "out"
+of the cluster based on ``mgr/devicehealth/mark_out_threshold``, but it
+is still participating in one or more PGs. This may be because it was
+only recently marked "out" and data is still migrating, or because data
+cannot be migrated off for some reason (e.g., the cluster is nearly
+full, or the CRUSH hierarchy is such that there isn't another suitable
+OSD to migrate the data to).
+
+This message can be silenced by disabling the self heal behavior
+(setting ``mgr/devicehealth/self_heal`` to false), by adjusting the
+``mgr/devicehealth/mark_out_threshold``, or by addressing what is
+preventing data from being migrated off of the ailing device.
+
+DEVICE_HEALTH_TOOMANY
+_____________________
+
+Too many devices are expected to fail soon and the
+``mgr/devicehealth/self_heal`` behavior is enabled, such that marking
+out all of the ailing devices would exceed the cluster's
+``mon_osd_min_in_ratio`` ratio that prevents too many OSDs from being
+automatically marked "out".
+
+This generally indicates that too many devices in your cluster are
+expected to fail soon and you should take action to add newer
+(healthier) devices before too many devices fail and data is lost.
+
+The health message can also be silenced by adjusting parameters like
+``mon_osd_min_in_ratio`` or ``mgr/devicehealth/mark_out_threshold``,
+but be warned that this will increase the likelihood of unrecoverable
+data loss in the cluster.
+
+
+Data health (pools & placement groups)
+--------------------------------------
+
+PG_AVAILABILITY
+_______________
+
+Data availability is reduced, meaning that the cluster is unable to
+service potential read or write requests for some data in the cluster.
+Specifically, one or more PGs is in a state that does not allow IO
+requests to be serviced.
+Problematic PG states include *peering*,
+*stale*, *incomplete*, and the lack of *active* (if those conditions do not clear
+quickly).
+
+Detailed information about which PGs are affected is available from::
+
+    ceph health detail
+
+In most cases the root cause is that one or more OSDs is currently
+down; see the discussion for ``OSD_DOWN`` above.
+
+The state of specific problematic PGs can be queried with::
+
+    ceph tell <pgid> query
+
+PG_DEGRADED
+___________
+
+Data redundancy is reduced for some data, meaning the cluster does not
+have the desired number of replicas for all data (for replicated
+pools) or erasure code fragments (for erasure coded pools).
+Specifically, one or more PGs:
+
+* has the *degraded* or *undersized* flag set, meaning there are not
+  enough instances of that placement group in the cluster;
+* has not had the *clean* flag set for some time.
+
+Detailed information about which PGs are affected is available from::
+
+    ceph health detail
+
+In most cases the root cause is that one or more OSDs is currently
+down; see the discussion for ``OSD_DOWN`` above.
+
+The state of specific problematic PGs can be queried with::
+
+    ceph tell <pgid> query
+
+
+PG_RECOVERY_FULL
+________________
+
+Data redundancy may be reduced or at risk for some data due to a lack
+of free space in the cluster. Specifically, one or more PGs has the
+*recovery_toofull* flag set, meaning that the
+cluster is unable to migrate or recover data because one or more OSDs
+is above the *full* threshold.
+
+See the discussion for *OSD_FULL* above for steps to resolve this condition.
+
+PG_BACKFILL_FULL
+________________
+
+Data redundancy may be reduced or at risk for some data due to a lack
+of free space in the cluster. Specifically, one or more PGs has the
+*backfill_toofull* flag set, meaning that the
+cluster is unable to migrate or recover data because one or more OSDs
+is above the *backfillfull* threshold.
+
+See the discussion for *OSD_BACKFILLFULL* above for
+steps to resolve this condition.
+
+PG_DAMAGED
+__________
+
+Data scrubbing has discovered some problems with data consistency in
+the cluster. Specifically, one or more PGs has the *inconsistent* or
+*snaptrim_error* flag set, indicating that an earlier scrub operation
+found a problem, or has the *repair* flag set, meaning that a repair
+for such an inconsistency is currently in progress.
+
+See :doc:`pg-repair` for more information.
+
+OSD_SCRUB_ERRORS
+________________
+
+Recent OSD scrubs have uncovered inconsistencies. This error is generally
+paired with *PG_DAMAGED* (see above).
+
+See :doc:`pg-repair` for more information.
+
+OSD_TOO_MANY_REPAIRS
+____________________
+
+When a read error occurs and another replica is available, it is used to repair
+the error immediately, so that the client can get the object data. Scrub
+handles errors for data at rest. In order to identify possible failing disks
+that aren't seeing scrub errors, a count of read repairs is maintained. If
+this count exceeds the configurable threshold ``mon_osd_warn_num_repaired``
+(default: 10), this health warning is generated.
+
+In order to allow clearing of the warning, a new command
+``ceph tell osd.# clear_shards_repaired [count]`` has been added.
+By default it will set the repair count to 0. If you want the warning to be
+raised again if any additional repairs are performed, you can provide a count
+equal to the current value of ``mon_osd_warn_num_repaired``.
+This command will be replaced in future releases by the health mute/unmute feature.
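+
+For example, to clear the warning on a hypothetical ``osd.5`` and reset its
+repair count to zero::
+
+    ceph tell osd.5 clear_shards_repaired
+
+or, if you want the warning to return after any further repairs, pass the
+current ``mon_osd_warn_num_repaired`` value (10 by default) as the count::
+
+    ceph tell osd.5 clear_shards_repaired 10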
+
+LARGE_OMAP_OBJECTS
+__________________
+
+One or more pools contain large omap objects as determined by
+``osd_deep_scrub_large_omap_object_key_threshold`` (threshold for number of keys
+to determine a large omap object) or
+``osd_deep_scrub_large_omap_object_value_sum_threshold`` (the threshold for
+summed size (bytes) of all key values to determine a large omap object) or both.
+More information on the object name, key count, and size in bytes can be found
+by searching the cluster log for 'Large omap object found'. Large omap objects
+can be caused by RGW bucket index objects that do not have automatic resharding
+enabled. Please see :ref:`RGW Dynamic Bucket Index Resharding
+<rgw_dynamic_bucket_index_resharding>` for more information on resharding.
+
+The thresholds can be adjusted with::
+
+    ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
+    ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>
+
+CACHE_POOL_NEAR_FULL
+____________________
+
+A cache tier pool is nearly full. Full in this context is determined
+by the ``target_max_bytes`` and ``target_max_objects`` properties on
+the cache pool. Once the pool reaches the target threshold, write
+requests to the pool may block while data is flushed and evicted
+from the cache, a state that normally leads to very high latencies and
+poor performance.
+
+The cache pool target size can be adjusted with::
+
+    ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
+    ceph osd pool set <cache-pool-name> target_max_objects <objects>
+
+Normal cache flush and evict activity may also be throttled due to reduced
+availability or performance of the base tier, or overall cluster load.
+
+TOO_FEW_PGS
+___________
+
+The number of PGs in use in the cluster is below the configurable
+threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead
+to suboptimal distribution and balance of data across the OSDs in
+the cluster, and similarly reduce overall performance.
+
+This may be an expected condition if data pools have not yet been
+created.
+
+The PG count for existing pools can be increased or new pools can be created.
+Please refer to :ref:`choosing-number-of-placement-groups` for more
+information.
+
+POOL_PG_NUM_NOT_POWER_OF_TWO
+____________________________
+
+One or more pools has a ``pg_num`` value that is not a power of two.
+Although this is not strictly incorrect, it does lead to a less
+balanced distribution of data because some PGs have roughly twice as
+much data as others.
+
+This is easily corrected by setting the ``pg_num`` value for the
+affected pool(s) to a nearby power of two::
+
+    ceph osd pool set <pool-name> pg_num <value>
+
+This health warning can be disabled with::
+
+    ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
+
+POOL_TOO_FEW_PGS
+________________
+
+One or more pools should probably have more PGs, based on the amount
+of data that is currently stored in the pool. This can lead to
+suboptimal distribution and balance of data across the OSDs in the
+cluster, and similarly reduce overall performance. This warning is
+generated if the ``pg_autoscale_mode`` property on the pool is set to
+``warn``.
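+
+Assuming the ``pg_autoscaler`` manager module is enabled, its current view of
+each pool, including the ``pg_num`` it would recommend, can be reviewed with::
+
+    ceph osd pool autoscale-status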
+
+To disable the warning, you can disable auto-scaling of PGs for the
+pool entirely with::
+
+    ceph osd pool set <pool-name> pg_autoscale_mode off
+
+To allow the cluster to automatically adjust the number of PGs::
+
+    ceph osd pool set <pool-name> pg_autoscale_mode on
+
+You can also manually set the number of PGs for the pool to the
+recommended amount with::
+
+    ceph osd pool set <pool-name> pg_num <new-pg-num>
+
+Please refer to :ref:`choosing-number-of-placement-groups` and
+:ref:`pg-autoscaler` for more information.
+
+TOO_MANY_PGS
+____________
+
+The number of PGs in use in the cluster is above the configurable
+threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is
+exceeded, the cluster will not allow new pools to be created, pool `pg_num` to
+be increased, or pool replication to be increased (any of which would lead to
+more PGs in the cluster). A large number of PGs can lead
+to higher memory utilization for OSD daemons, slower peering after
+cluster state changes (like OSD restarts, additions, or removals), and
+higher load on the Manager and Monitor daemons.
+
+The simplest way to mitigate the problem is to increase the number of
+OSDs in the cluster by adding more hardware. Note that the OSD count
+used for the purposes of this health check is the number of "in" OSDs,
+so marking "out" OSDs "in" (if there are any) can also help::
+
+    ceph osd in <osd id(s)>
+
+Please refer to :ref:`choosing-number-of-placement-groups` for more
+information.
+
+POOL_TOO_MANY_PGS
+_________________
+
+One or more pools should probably have fewer PGs, based on the amount
+of data that is currently stored in the pool. This can lead to higher
+memory utilization for OSD daemons, slower peering after cluster state
+changes (like OSD restarts, additions, or removals), and higher load
+on the Manager and Monitor daemons. This warning is generated if the
+``pg_autoscale_mode`` property on the pool is set to ``warn``.
+
+To disable the warning, you can disable auto-scaling of PGs for the
+pool entirely with::
+
+    ceph osd pool set <pool-name> pg_autoscale_mode off
+
+To allow the cluster to automatically adjust the number of PGs::
+
+    ceph osd pool set <pool-name> pg_autoscale_mode on
+
+You can also manually set the number of PGs for the pool to the
+recommended amount with::
+
+    ceph osd pool set <pool-name> pg_num <new-pg-num>
+
+Please refer to :ref:`choosing-number-of-placement-groups` and
+:ref:`pg-autoscaler` for more information.
+
+POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
+____________________________________
+
+One or more pools have a ``target_size_bytes`` property set to
+estimate the expected size of the pool,
+but the value(s) exceed the total available storage (either by
+themselves or in combination with other pools' actual usage).
+
+This is usually an indication that the ``target_size_bytes`` value for
+the pool is too large and should be reduced or set to zero with::
+
+    ceph osd pool set <pool-name> target_size_bytes 0
+
+For more information, see :ref:`specifying_pool_target_size`.
+
+POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO
+____________________________________
+
+One or more pools have both ``target_size_bytes`` and
+``target_size_ratio`` set to estimate the expected size of the pool.
+Only one of these properties should be non-zero. If both are set,
+``target_size_ratio`` takes precedence and ``target_size_bytes`` is
+ignored.
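+
+One way to see which pools currently have these properties set is to inspect
+the pool details (the exact fields shown vary by release)::
+
+    ceph osd pool ls detail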
+
+To reset ``target_size_bytes`` to zero::
+
+    ceph osd pool set <pool-name> target_size_bytes 0
+
+For more information, see :ref:`specifying_pool_target_size`.
+
+TOO_FEW_OSDS
+____________
+
+The number of OSDs in the cluster is below the configurable
+threshold of ``osd_pool_default_size``.
+
+SMALLER_PGP_NUM
+_______________
+
+One or more pools has a ``pgp_num`` value less than ``pg_num``. This
+is normally an indication that the PG count was increased without
+also increasing the placement count (``pgp_num``).
+
+This is sometimes done deliberately to separate out the `split` step
+when the PG count is adjusted from the data migration that is needed
+when ``pgp_num`` is changed.
+
+This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
+triggering the data migration, with::
+
+    ceph osd pool set <pool> pgp_num <pg-num-value>
+
+MANY_OBJECTS_PER_PG
+___________________
+
+One or more pools has an average number of objects per PG that is
+significantly higher than the overall cluster average. The specific
+threshold is controlled by the ``mon_pg_warn_max_object_skew``
+configuration value.
+
+This is usually an indication that the pool(s) containing most of the
+data in the cluster have too few PGs, and/or that other pools that do
+not contain as much data have too many PGs. See the discussion of
+*TOO_MANY_PGS* above.
+
+The threshold can be raised to silence the health warning by adjusting
+the ``mon_pg_warn_max_object_skew`` config option on the monitors.
+
+
+POOL_APP_NOT_ENABLED
+____________________
+
+A pool exists that contains one or more objects but has not been
+tagged for use by a particular application.
+
+Resolve this warning by labeling the pool for use by an application. For
+example, if the pool is used by RBD::
+
+    rbd pool init <poolname>
+
+If the pool is being used by a custom application 'foo', you can also label
+it via the low-level command::
+
+    ceph osd pool application enable <poolname> foo
+
+For more information, see :ref:`associate-pool-to-application`.
+
+POOL_FULL
+_________
+
+One or more pools has reached (or is very close to reaching) its
+quota. The threshold to trigger this error condition is controlled by
+the ``mon_pool_quota_crit_threshold`` configuration option.
+
+Pool quotas can be adjusted up or down (or removed) with::
+
+    ceph osd pool set-quota <pool> max_bytes <bytes>
+    ceph osd pool set-quota <pool> max_objects <objects>
+
+Setting the quota value to 0 will disable the quota.
+
+POOL_NEAR_FULL
+______________
+
+One or more pools is approaching its quota. The threshold to trigger
+this warning condition is controlled by the
+``mon_pool_quota_warn_threshold`` configuration option.
+
+Pool quotas can be adjusted up or down (or removed) with::
+
+    ceph osd pool set-quota <pool> max_bytes <bytes>
+    ceph osd pool set-quota <pool> max_objects <objects>
+
+Setting the quota value to 0 will disable the quota.
+
+OBJECT_MISPLACED
+________________
+
+One or more objects in the cluster is not stored on the node the
+cluster would like it to be stored on. This is an indication that
+data migration due to some recent cluster change has not yet completed.
+
+Misplaced data is not a dangerous condition in and of itself; data
+consistency is never at risk, and old copies of objects are never
+removed until the desired number of new copies (in the desired
+locations) are present.
+
+OBJECT_UNFOUND
+______________
+
+One or more objects in the cluster cannot be found.
+Specifically, the
+OSDs know that a new or updated copy of an object should exist, but a
+copy of that version of the object has not been found on OSDs that are
+currently online.
+
+Read or write requests to unfound objects will block.
+
+Ideally, a down OSD that has the more recent copy of the unfound object
+can be brought back online. Candidate OSDs can be identified from the
+peering state for the PG(s) responsible for the unfound object::
+
+    ceph tell <pgid> query
+
+If the latest copy of the object is not available, the cluster can be
+told to roll back to a previous version of the object. See
+:ref:`failures-osd-unfound` for more information.
+
+SLOW_OPS
+________
+
+One or more OSD requests is taking a long time to process. This can
+be an indication of extreme load, a slow storage device, or a software
+bug.
+
+The request queue on the OSD(s) in question can be queried with the
+following command, executed from the OSD host::
+
+    ceph daemon osd.<id> ops
+
+A summary of the slowest recent requests can be seen with::
+
+    ceph daemon osd.<id> dump_historic_ops
+
+The location of an OSD can be found with::
+
+    ceph osd find osd.<id>
+
+PG_NOT_SCRUBBED
+_______________
+
+One or more PGs has not been scrubbed recently. PGs are normally
+scrubbed every ``mon_scrub_interval`` seconds, and this warning
+triggers when ``mon_warn_pg_not_scrubbed_ratio`` of the interval has elapsed
+since a scrub was due without one having occurred.
+
+PGs will not scrub if they are not flagged as *clean*, which may
+happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
+*PG_DEGRADED* above).
+
+You can manually initiate a scrub of a clean PG with::
+
+    ceph pg scrub <pgid>
+
+PG_NOT_DEEP_SCRUBBED
+____________________
+
+One or more PGs has not been deep scrubbed recently. PGs are normally
+scrubbed every ``osd_deep_scrub_interval`` seconds, and this warning
+triggers when ``mon_warn_pg_not_deep_scrubbed_ratio`` of the interval has elapsed
+since a deep scrub was due without one having occurred.
+
+PGs will not (deep) scrub if they are not flagged as *clean*, which may
+happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
+*PG_DEGRADED* above).
+
+You can manually initiate a scrub of a clean PG with::
+
+    ceph pg deep-scrub <pgid>
+
+
+Miscellaneous
+-------------
+
+RECENT_CRASH
+____________
+
+One or more Ceph daemons has crashed recently, and the crash has not
+yet been archived (acknowledged) by the administrator. This may
+indicate a software bug, a hardware problem (e.g., a failing disk), or
+some other problem.
+
+New crashes can be listed with::
+
+    ceph crash ls-new
+
+Information about a specific crash can be examined with::
+
+    ceph crash info <crash-id>
+
+This warning can be silenced by "archiving" the crash (perhaps after
+being examined by an administrator) so that it does not generate this
+warning::
+
+    ceph crash archive <crash-id>
+
+Similarly, all new crashes can be archived with::
+
+    ceph crash archive-all
+
+Archived crashes will still be visible via ``ceph crash ls`` but not
+``ceph crash ls-new``.
+
+The time period for what "recent" means is controlled by the option
+``mgr/crash/warn_recent_interval`` (default: two weeks).
+
+These warnings can be disabled entirely with::
+
+    ceph config set mgr mgr/crash/warn_recent_interval 0
+
+TELEMETRY_CHANGED
+_________________
+
+Telemetry has been enabled, but the contents of the telemetry report
+have changed since that time, so telemetry reports will not be sent.
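+
+Assuming a release that provides it, the current state of the telemetry module
+(whether it is on, and which channels are enabled) can be checked with::
+
+    ceph telemetry status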
+
+The Ceph developers periodically revise the telemetry feature to
+include new and useful information, or to remove information found to
+be useless or sensitive. If any new information is included in the
+report, Ceph will require the administrator to re-enable telemetry to
+ensure they have an opportunity to (re)review what information will be
+shared.
+
+To review the contents of the telemetry report::
+
+    ceph telemetry show
+
+Note that the telemetry report consists of several optional channels
+that may be independently enabled or disabled. For more information, see
+:ref:`telemetry`.
+
+To re-enable telemetry (and make this warning go away)::
+
+    ceph telemetry on
+
+To disable telemetry (and make this warning go away)::
+
+    ceph telemetry off
+
+DASHBOARD_DEBUG
+_______________
+
+The Dashboard debug mode is enabled. This means that if there is an error
+while processing a REST API request, the HTTP error response will contain
+a Python traceback. This behaviour should be disabled in production
+environments because such a traceback might contain and expose sensitive
+information.
+
+The debug mode can be disabled with::
+
+    ceph dashboard debug disable
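+
+If you are unsure whether debug mode is currently enabled, recent releases of
+the dashboard module also provide a status command::
+
+    ceph dashboard debug status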