author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
commit: be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97
tree: 9754ff1ca740f6346cf8483ec915d4054bc5da2d /health/guides/consul/consul_autopilot_health_status.md
parent: Initial commit.
Adding upstream version 1.44.3. (refs: upstream/1.44.3, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'health/guides/consul/consul_autopilot_health_status.md')
-rw-r--r-- health/guides/consul/consul_autopilot_health_status.md | 53
1 files changed, 53 insertions, 0 deletions
diff --git a/health/guides/consul/consul_autopilot_health_status.md b/health/guides/consul/consul_autopilot_health_status.md
new file mode 100644
index 00000000..42ccab5a
--- /dev/null
+++ b/health/guides/consul/consul_autopilot_health_status.md
@@ -0,0 +1,53 @@
+### Understand the alert
+
+This alert monitors the health of the Consul cluster as reported by its autopilot feature. If you receive this alert, the Consul server has reported the health status of the Consul datacenter as `unhealthy`.
+
+### What is Consul autopilot?
+
+Consul's autopilot feature automatically manages and stabilizes Consul server clusters to keep them in a healthy state. It includes server health monitoring, automatic dead server reaping, and stable server introduction.
+
+### What does unhealthy mean?
+
+An unhealthy Consul cluster can have problems with its operations, services, leader elections, and cluster consistency. When this alert fires, autopilot has flagged the cluster as unhealthy, which can lead to stability and performance problems.
+
+### Troubleshoot the alert
+
+Here are some steps to troubleshoot the `consul_autopilot_health_status` alert:
+
+1. Check the logs of the Consul server to identify any error messages or warning signs. The logs will often provide insights into the underlying problems.
+
+ ```
+ journalctl -u consul
+ ```
+
+2. Inspect the autopilot configuration with the Consul CLI:
+
+ ```
+ consul operator autopilot get-config
+ ```
+
+ Then query the autopilot health status through the Consul HTTP API:
+ ```
+ curl http://<consul_server>:8500/v1/operator/autopilot/health
+ ```
+
+3. Verify the configuration of the Consul servers. Check the `retry_join` setting and the server addresses in the configuration file:
+
+ ```
+ grep retry_join /etc/consul.d/consul.hcl
+ ```
+
+4. Ensure that there are enough Consul servers and that they are healthy. The `consul members` command shows the status of each cluster member:
+
+ ```
+ consul members
+ ```
+
+5. Check the network connectivity between Consul servers with network diagnostics such as `ping` and `traceroute`.
+
+6. Review Consul documentation to gain a deeper understanding of the autopilot health issues and potential configuration problems.
+
+
+### Useful resources
+
+- [Consul CLI reference](https://www.consul.io/docs/commands)
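
The checks in troubleshooting steps 2 and 4 of the guide above could be scripted roughly as follows. This is a minimal sketch, not part of the committed file: the function names (`classify_autopilot_status`, `count_alive_servers`, `quorum_size`, `check_cluster`) and the default address `http://127.0.0.1:8500` are hypothetical. It assumes the autopilot health endpoint answers with HTTP 200 when the cluster is healthy and 429 when it is not, and that `consul members` prints its default columns (Node, Address, Status, Type, ...).

```shell
#!/bin/sh
# Sketch only: helper names and the default address are illustrative assumptions.

# Map the HTTP status code of /v1/operator/autopilot/health to a label.
# Assumption: 200 means healthy and 429 means unhealthy for this endpoint.
classify_autopilot_status() {
    case "$1" in
        200) echo "healthy" ;;
        429) echo "unhealthy" ;;
        *)   echo "unknown" ;;
    esac
}

# Count servers reported as alive in `consul members` output read from stdin.
# Assumes the default column order: Node, Address, Status, Type, ...
count_alive_servers() {
    awk 'NR > 1 && $3 == "alive" && $4 == "server" { n++ } END { print n + 0 }'
}

# Smallest number of servers that still forms a majority (Raft quorum)
# for a cluster with the given total number of voting servers.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Hypothetical entry point: pass the Consul HTTP address as the first argument.
check_cluster() {
    addr="${1:-http://127.0.0.1:8500}"
    code=$(curl -s -o /dev/null -w '%{http_code}' "$addr/v1/operator/autopilot/health")
    echo "autopilot: $(classify_autopilot_status "$code")"
    echo "alive servers: $(consul members | count_alive_servers)"
}
```

Calling `check_cluster http://<consul_server>:8500` prints the autopilot label and the number of alive servers. Comparing that number with `quorum_size` for the intended cluster size (for example, `quorum_size 5`) indicates whether enough servers remain to elect a leader.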