Diffstat (limited to 'health/guides/consul/consul_node_health_check_status.md')
-rw-r--r-- health/guides/consul/consul_node_health_check_status.md | 34
1 files changed, 34 insertions, 0 deletions
diff --git a/health/guides/consul/consul_node_health_check_status.md b/health/guides/consul/consul_node_health_check_status.md
new file mode 100644
index 000000000..44b431edc
--- /dev/null
+++ b/health/guides/consul/consul_node_health_check_status.md
@@ -0,0 +1,34 @@
+### Understand the alert
+
+This alert is triggered when a Consul node health check reports a failure. Consul is a service mesh solution for service discovery and configuration. Receiving this alert means that the health check for a specific service on a node within the Consul cluster has failed.
+
+### What does the health check status mean?
+
+Consul runs health checks to confirm that the services registered within the cluster are functioning as expected. The health check status represents the result of these checks: zero means the check is passing, and a non-zero value means it has failed. A failed health check can lead to downtime or degraded performance for the affected service.
+
+### Troubleshoot the alert
+
+1. Check the alert details: The alert information provided should include the `check_name`, `node_name`, and `datacenter` affected. Note these details as they will be useful in further troubleshooting.
+
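+2. Verify the health check status in Consul: To confirm the failure, open the Consul UI or query the health HTTP API (see [Service and Node Health](https://www.consul.io/api-docs/health)) for the affected node. From the command line, `consul members` lists the cluster members and their gossip status, and `consul monitor` streams the local agent's logs, which typically include check state changes: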
+
+ ```
+ consul members
+ ```
+
+ ```
+ consul monitor
+ ```
+
+3. Investigate the failed service: Once you confirm the health check failure, start investigating the specific service affected. Check logs, resource usage, configuration files, and other relevant information to identify the root cause of the failure.
+
+4. Fix the issue: Based on your investigation, apply the necessary fixes to the service or its configuration. This may include restarting the service, adjusting resource allocation, or fixing any configuration errors.
+
+5. Verify service health: After applying the required fixes, verify the health status of the service once again through the Consul UI or command-line tool. If the service health check status has returned to normal (zero value), the issue has been resolved.
+
+6. Monitor for any recurrence: Keep an eye on the service, node, and overall Consul cluster health to ensure the issue does not reappear and to catch any other potential problems.
+
+### Useful resources
+
+1. [Consul documentation](https://www.consul.io/docs/)
+2. [Service and Node Health](https://www.consul.io/api-docs/health)
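
For steps 2 and 5 of the guide above, the check status can also be read from Consul's health HTTP API (`GET /v1/health/node/<node>`), which returns every check registered on a node along with its `Status`. The following is a minimal sketch, not part of the patch. It assumes a local agent reachable at `http://127.0.0.1:8500` with no ACL token required, and the helper names `fetch_node_checks` and `non_passing` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: a local Consul agent on the default HTTP port, no ACL token needed.
CONSUL_ADDR = "http://127.0.0.1:8500"


def non_passing(checks):
    """Return the checks whose Status is anything other than "passing"."""
    return [c for c in checks if c.get("Status") != "passing"]


def fetch_node_checks(node_name, addr=CONSUL_ADDR):
    """Fetch all checks registered on a node via GET /v1/health/node/<node>."""
    url = f"{addr}/v1/health/node/{quote(node_name, safe='')}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Example usage (requires a reachable agent; use the node_name from the alert):
#   for check in non_passing(fetch_node_checks("my-node")):
#       print(check["Node"], check["Name"], check["Status"], check.get("Output", ""))
```

An empty result from `non_passing` after applying a fix suggests that the node's checks have recovered. If ACLs are enabled in your cluster, the request would also need an `X-Consul-Token` header.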