author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-07-24 09:54:23 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-07-24 09:54:44 +0000 |
commit | 836b47cb7e99a977c5a23b059ca1d0b5065d310e (patch) | |
tree | 1604da8f482d02effa033c94a84be42bc0c848c3 /src/health/guides/elasticsearch | |
parent | Releasing debian version 1.44.3-2. (diff) | |
download | netdata-836b47cb7e99a977c5a23b059ca1d0b5065d310e.tar.xz netdata-836b47cb7e99a977c5a23b059ca1d0b5065d310e.zip |
Merging upstream version 1.46.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/health/guides/elasticsearch')
5 files changed, 254 insertions, 0 deletions
diff --git a/src/health/guides/elasticsearch/elasticsearch_cluster_health_status_red.md b/src/health/guides/elasticsearch/elasticsearch_cluster_health_status_red.md
new file mode 100644
index 000000000..494a7853c
--- /dev/null
+++ b/src/health/guides/elasticsearch/elasticsearch_cluster_health_status_red.md
@@ -0,0 +1,55 @@
+### Understand the alert
+
+This alert is triggered when the Elasticsearch cluster health status turns `RED`. If you receive this alert, it means that there is a problem that needs immediate attention, such as data loss or one or more primary shards, along with their replicas, not being allocated in the cluster.
+
+### Elasticsearch Cluster Health Status
+
+Elasticsearch cluster health status provides an indication of the cluster's overall health, based on the state of its shards. The status can be `green`, `yellow`, or `red`:
+
+- `Green`: All primary and replica shards are allocated.
+- `Yellow`: All primary shards are allocated, but some replica shards are not.
+- `Red`: One or more primary shards are not allocated, so some data is unavailable and may be lost.
+
+### Troubleshoot the alert
+
+1. Check the Elasticsearch cluster health using the `_cat` API:
+
+```
+curl -XGET 'http://localhost:9200/_cat/health?v'
+```
+
+Examine the output to understand the current health status, the number of nodes and shards, and any unassigned shards.
+
+2. To get more details on the unassigned shards, use the `_cat/shards` API:
+
+```
+curl -XGET 'http://localhost:9200/_cat/shards?v'
+```
+
+Look for shards with the status `UNASSIGNED`.
+
+3. Identify the root cause of the issue, such as:
+
+   - A node has left the cluster or failed, causing the primary shard to become unassigned.
+   - Insufficient disk space is available, preventing shards from being allocated.
+   - Cluster settings or shard allocation settings are misconfigured.
+
+4. Take appropriate action based on the root cause:
+
+   - Ensure all Elasticsearch nodes are running and connected to the cluster.
+   - Add more nodes or increase disk space as needed.
+   - Review and correct cluster and shard allocation settings.
+
+5. Monitor the health status as the cluster recovers:
+
+```
+curl -XGET 'http://localhost:9200/_cat/health?v'
+```
+
+If the health status turns `YELLOW` or `GREEN`, the cluster is no longer in the `RED` state.
+
+### Useful resources
+
+1. [Elasticsearch Cluster Health](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html)
+2. [Fixing Elasticsearch Cluster Health Status "RED"](https://www.elastic.co/guide/en/elasticsearch/guide/current/_cluster_health.html)
+3. [Elasticsearch Shard Allocation](https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html)
\ No newline at end of file
diff --git a/src/health/guides/elasticsearch/elasticsearch_cluster_health_status_yellow.md b/src/health/guides/elasticsearch/elasticsearch_cluster_health_status_yellow.md
new file mode 100644
index 000000000..2f8bf854d
--- /dev/null
+++ b/src/health/guides/elasticsearch/elasticsearch_cluster_health_status_yellow.md
@@ -0,0 +1,57 @@
+### Understand the alert
+
+The `elasticsearch_cluster_health_status_yellow` alert triggers when the Elasticsearch cluster's health status is `yellow` for longer than 10 minutes. This may indicate potential issues in the cluster, like unassigned or missing replicas. The alert class is `Errors`, and the type is `SearchEngine`.
+
+### What does the health status mean?
+
+In Elasticsearch, cluster health status can be one of three colors:
+
+- Green: All primary shards and replicas are active and properly assigned to each index.
+- Yellow: All primary shards are active, but one or more replicas are unassigned or missing.
+- Red: One or more primary shards are unassigned or missing.
+
+### Troubleshoot the alert
+
+1. Check the Elasticsearch cluster health.
+
+You can check the health of the Elasticsearch cluster using the `/_cluster/health` API endpoint:
+
+```
+curl -XGET 'http://localhost:9200/_cluster/health?pretty'
+```
+
+2. Identify the unassigned or missing replicas.
+
+You can check for any unassigned or missing shards using the `/_cat/shards` API endpoint:
+
+```
+curl -XGET 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state'
+```
+
+3. Check Elasticsearch logs for any errors or warnings:
+
+```
+sudo journalctl --unit elasticsearch
+```
+
+4. Check disk space on all Elasticsearch nodes. Insufficient disk space may lead to unassigned or missing replicas:
+
+```
+df -h
+```
+
+5. Ensure Elasticsearch is properly configured.
+
+Check the `elasticsearch.yml` configuration file on all nodes for any misconfigurations or errors:
+
+```
+sudo nano /etc/elasticsearch/elasticsearch.yml
+```
+
+6. Review the Elasticsearch documentation on [Cluster-Level Shard Allocation and Routing Settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html) to understand how to properly assign and balance shards.
+
+### Useful resources
+
+1. [Elasticsearch Cluster Health](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html)
+2. [Elasticsearch Shards](https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html)
+3. [Allocation Awareness in Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html)
\ No newline at end of file
diff --git a/src/health/guides/elasticsearch/elasticsearch_node_index_health_red.md b/src/health/guides/elasticsearch/elasticsearch_node_index_health_red.md
new file mode 100644
index 000000000..1e2877d14
--- /dev/null
+++ b/src/health/guides/elasticsearch/elasticsearch_node_index_health_red.md
@@ -0,0 +1,49 @@
+### Understand the alert
+
+This alert is triggered when the health status of an Elasticsearch node index turns `red`. If you receive this alert, it means that at least one primary shard and its replicas are not allocated to any node, and the data in the index is potentially at risk.
+
+### What does a red index health status mean?
+
+In Elasticsearch, the index health status can be green, yellow, or red:
+
+- Green: All primary and replica shards are allocated and active.
+- Yellow: All primary shards are active, but not all replicas are allocated due to the lack of available nodes.
+- Red: At least one primary shard and its replicas are not allocated, which means the cluster can't serve all the incoming data, and data loss is possible.
+
+### Troubleshoot the alert
+
+1. Check the cluster health
+
+   Use the Elasticsearch `_cluster/health` endpoint to check the health status of your cluster:
+   ```
+   curl -X GET "localhost:9200/_cluster/health?pretty"
+   ```
+
+2. Identify the unassigned shards
+
+   Use the Elasticsearch `_cat/shards` endpoint to view the status of all shards in your cluster:
+   ```
+   curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason"
+   ```
+
+3. Check Elasticsearch logs
+
+   Examine the Elasticsearch logs for any error messages or alerts related to shard allocation. The log files are usually located in `/var/log/elasticsearch/`.
+
+4. Resolve shard allocation issues
+
+   Depending on the cause of the unassigned shards, you may need to perform actions such as:
+
+   - Add more nodes to the cluster to distribute the load evenly.
+   - Reallocate shards manually using the Elasticsearch `_cluster/reroute` API.
+   - Adjust shard allocation settings in the Elasticsearch `elasticsearch.yml` configuration file.
+
+5. Recheck the cluster health
+
+   After addressing the issues found in the previous steps, use the `_cluster/health` endpoint again to check if the health status of the affected index has improved.
+
+### Useful resources
+
+1. [Elasticsearch: Cluster Health](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html)
+2. [Elasticsearch: Shards and Replicas](https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html#shards-and-replicas)
+3. [Elasticsearch: Shard Allocation and Cluster-Level Settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html)
\ No newline at end of file
diff --git a/src/health/guides/elasticsearch/elasticsearch_node_indices_search_time_fetch.md b/src/health/guides/elasticsearch/elasticsearch_node_indices_search_time_fetch.md
new file mode 100644
index 000000000..e0bcc1125
--- /dev/null
+++ b/src/health/guides/elasticsearch/elasticsearch_node_indices_search_time_fetch.md
@@ -0,0 +1,49 @@
+### Understand the alert
+
+This alert is triggered when the Elasticsearch node's average `search_time_fetch` exceeds the warning or critical thresholds over a 10-minute window. The `search_time_fetch` measures the time spent fetching data from shards during search operations. If you receive this alert, it means your Elasticsearch search performance is degraded, and fetches are running slowly.
+
+### Troubleshoot the alert
+
+1. Check the Elasticsearch cluster health
+
+Run the following command to check the health of your Elasticsearch cluster:
+
+```
+curl -XGET 'http://localhost:9200/_cluster/health?pretty'
+```
+
+Look for the `status` field in the output, which indicates the overall health of the cluster:
+
+- green: All primary and replica shards are active and allocated.
+- yellow: All primary shards are active, but not all replica shards are active.
+- red: Some primary shards are not active.
+
+2. Identify slow search queries
+
+Run the following command to gather information on slow search queries:
+
+```
+curl -XGET 'http://localhost:9200/_nodes/stats/indices/search?pretty'
+```
+
+Look for the `query_total`, `query_time_in_millis`, `fetch_total`, and `fetch_time_in_millis` fields in the output, which show the operation counts and cumulative time spent in the query and fetch phases of search.
+
+3. Check Elasticsearch node resources
+
+Ensure the Elasticsearch node has sufficient resources (CPU, memory, disk space, and disk I/O). Use system monitoring tools like `top`, `htop`, `vmstat`, and `iostat` to analyze the resource usage on the Elasticsearch node.
+
+4. Optimize search queries
+
+If slow search queries are identified in Step 2, consider optimizing them for better performance. Some techniques for optimizing Elasticsearch search performance include using filters, limiting result set size, and disabling expensive operations like sorting and aggregations when not needed.
+
+5. Review Elasticsearch configuration
+
+Check your Elasticsearch configuration to ensure it is optimized for search performance. Verify settings such as index refresh intervals, query caches, and field data caches. Consult the Elasticsearch documentation for best practices on configuration settings.
+
+6. Consider horizontal scaling
+
+If your Elasticsearch node is experiencing high search loads regularly, consider adding more nodes to distribute the load evenly across the cluster.
+
+### Useful resources
+
+1. [Elasticsearch Performance Tuning](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html)
diff --git a/src/health/guides/elasticsearch/elasticsearch_node_indices_search_time_query.md b/src/health/guides/elasticsearch/elasticsearch_node_indices_search_time_query.md
new file mode 100644
index 000000000..3a82a64ac
--- /dev/null
+++ b/src/health/guides/elasticsearch/elasticsearch_node_indices_search_time_query.md
@@ -0,0 +1,44 @@
+### Understand the alert
+
+This alert is triggered when the average search time for Elasticsearch queries has been higher than the defined warning thresholds. If you receive this alert, it means that your search performance is degraded, and queries are running slower than usual.
+
+### What does search performance mean?
+
+Search performance in Elasticsearch refers to how quickly and efficiently search queries are executed, and the respective results are returned. Good search performance is essential for providing fast and relevant results in applications and services relying on Elasticsearch for their search capabilities.
+
+### What causes degraded search performance?
+
+Several factors can cause search performance degradation, including:
+
+- High system load, causing CPU, memory or disk I/O bottlenecks
+- Poorly optimized search queries
+- High query rate, resulting in a large number of concurrent queries
+- Insufficient hardware or resources allocated to Elasticsearch
+
+### Troubleshoot the alert
+
+1. Check the Elasticsearch logs for any error messages or warnings:
+
+   ```
+   cat /var/log/elasticsearch/elasticsearch.log
+   ```
+
+2. Monitor the system resources (CPU, memory, and disk I/O) using tools like `top`, `vmstat`, and `iotop`. Determine if there are any resource bottlenecks affecting the search performance.
+
+3. Analyze and optimize the slow search queries by using the Elasticsearch [Slow Log](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html).
+
+4. Evaluate the cluster health status by running the following Elasticsearch API command:
+
+   ```
+   curl -XGET 'http://localhost:9200/_cluster/health?pretty'
+   ```
+
+   Check for any issues that may be impacting the search performance.
+
+5. Assess the number of concurrent queries and, if possible, reduce the query rate or distribute the load among additional Elasticsearch nodes.
+
+6. If the issue persists, consider scaling up your Elasticsearch deployment or allocating additional resources to the affected nodes to improve their performance.
+
+### Useful resources
+
+1. [Tune for Search Speed - Elasticsearch Guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html)
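
Reviewer note: the guides added in this diff repeatedly ask the operator to scan `_cat/shards` output for `UNASSIGNED` rows by eye. The following is a minimal sketch of how that step might be scripted, not part of the commit. The function names `count_unassigned` and `classify_health` are hypothetical, `ES_URL` is an assumed variable defaulting to the `localhost:9200` address used in the guides, and the colour derived here only approximates the status that `_cluster/health` reports.

```shell
# Assumed default; the guides use http://localhost:9200 in their examples.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: reads rows in the column order
# index, shard, prirep, state (as produced by
# `_cat/shards?h=index,shard,prirep,state`, without the `v` header flag)
# on stdin and prints how many primaries and replicas are unassigned.
count_unassigned() {
  awk '$4 == "UNASSIGNED" { if ($3 == "p") p++; else r++ }
       END { printf "primaries=%d replicas=%d\n", p, r }'
}

# Hypothetical helper: maps the same rows to the colour scheme the guides
# describe (red: a primary is unassigned; yellow: only replicas are).
classify_health() {
  awk '$4 == "UNASSIGNED" { if ($3 == "p") p++; else r++ }
       END { if (p > 0) print "red"; else if (r > 0) print "yellow"; else print "green" }'
}

# Example against a live node (requires curl and a reachable cluster):
#   curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state" | count_unassigned
```

Both helpers read from standard input, so they can be tried on saved `_cat/shards` output before being pointed at a live cluster.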