diff --git a/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_09.md b/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_09.md
new file mode 100644
index 000000000..6c71f1cf6
--- /dev/null
+++ b/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_09.md
@@ -0,0 +1,45 @@
+### Understand the alert
+
+This alert monitors the 0.9 quantile of the Pod Lifecycle Event Generator (PLEG) relisting latency, averaged over the last minute. It relates to the Kubelet, the node agent in a Kubernetes cluster responsible for running and supervising the containers inside Pods. If you receive this alert, relisting latency has increased on the affected node, which can degrade the responsiveness of your workloads.
+
+### What does PLEG relisting latency mean?
+
+In Kubernetes, the PLEG is the Kubelet component that keeps track of container lifecycle events, such as a container starting, dying, or being removed. It periodically relists the Pods and containers on the node by querying the container runtime, compares the result with the previous state to generate events, and updates the Pod status, so that the scheduler and other components see the correct state of the containers. Increased relisting latency leads to slower Pod status updates and overall degraded performance.
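+
+If you want to inspect the raw data behind this alert, the Kubelet exposes PLEG relist latency on its Prometheus metrics endpoint. The sketch below queries it through the API server node proxy; the metric is exposed as `kubelet_pleg_relist_duration_seconds` on recent Kubernetes releases (older releases used `kubelet_pleg_relist_latency_microseconds`), and `<node-name>` is a placeholder for the node that raised the alert:
+
+```
+# Fetch the Kubelet metrics of the affected node and keep only the PLEG relist entries
+kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep -i pleg_relist
+```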
+
+### What does 0.9 quantile mean?
+
+The 0.9 quantile is the value below which 90% of the observed relisting latencies fall. For example, if nine out of ten relist operations finish within 100 ms and one takes 2 s, the 0.9 quantile is roughly 100 ms. When an alert based on the 0.9 quantile fires, that tail value has exceeded the threshold, meaning the slowest 10% of relist operations are experiencing increased latency, which can lead to issues in your cluster.
+
+### Troubleshoot the alert
+
+1. Check Kubelet logs for errors or warnings related to PLEG:
+
+ Access the logs of the Kubelet component running on the affected node:
+
+ ```
+ sudo journalctl -u kubelet
+ ```
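+
+ If the full log is too noisy, you can narrow it down to PLEG and relist related entries; this is just a convenience filter and assumes `grep` with extended regular expressions is available on the node:
+
+ ```
+ # Show only Kubelet log lines that mention the PLEG or relisting
+ sudo journalctl -u kubelet --no-pager | grep -iE 'pleg|relist'
+ ```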
+
+2. Monitor the overall performance of your Kubernetes cluster:
+
+ Use `kubectl top nodes` to check the resource usage of your nodes and identify any bottlenecks, such as high CPU or memory consumption.
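+
+ For example, assuming the metrics-server add-on is installed (required by `kubectl top`) and `<node-name>` is the node that raised the alert:
+
+ ```
+ # Show CPU and memory usage per node
+ kubectl top nodes
+
+ # Inspect the affected node's conditions, allocated resources, and number of running Pods
+ kubectl describe node <node-name>
+ ```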
+
+3. Check the status of Pods:
+
+ Use `kubectl get pods --all-namespaces` to check the status of all Pods in your cluster. Look for Pods in an abnormal state (e.g., Pending, CrashLoopBackOff, or Terminating), which could be related to high PLEG relisting latency.
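+
+ To quickly surface Pods that are not healthy, you can filter on the Pod phase. Note that the field selector only covers the phase, so conditions such as CrashLoopBackOff still show up under the `STATUS` column of the plain listing:
+
+ ```
+ # List Pods whose phase is neither Running nor Succeeded
+ kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded
+ ```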
+
+4. Analyze Pod logs for issues:
+
+ Investigate the logs of the affected Pods to understand any issues with the container lifecycle events:
+
+ ```
+ kubectl logs <pod-name> -n <namespace>
+ ```
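+
+ The recent events of a Pod are often more revealing than its logs, and for a container that keeps restarting, the logs of the previous instance usually matter most:
+
+ ```
+ # Show Pod details, container states, and recent events
+ kubectl describe pod <pod-name> -n <namespace>
+
+ # Fetch the logs of the previously terminated container instance
+ kubectl logs <pod-name> -n <namespace> --previous
+ ```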
+
+5. Review the Kubelet configuration:
+
+ Ensure that the Kubelet configuration matches your workloads; for example, a node running a very large number of containers, or a slow or overloaded container runtime, makes each relist cycle more expensive. If necessary, adjust the settings or reduce the Pod density on the node to improve PLEG relisting performance.
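+
+ You can inspect the running Kubelet configuration through the node's `configz` endpoint, or directly on the node; the file path below is the default used by kubeadm-based clusters and may differ in your distribution:
+
+ ```
+ # Fetch the live Kubelet configuration of a node via the API server (pretty-printed with Python's json.tool)
+ kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" | python3 -m json.tool
+
+ # Or, on the node itself, review the Kubelet configuration file
+ sudo cat /var/lib/kubelet/config.yaml
+ ```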
+
+### Useful resources
+
+1. [Kubernetes Troubleshooting Guide](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/)