author     Daniel Baumann <daniel.baumann@progress-linux.org>   2024-03-09 13:19:48 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org>   2024-03-09 13:20:02 +0000
commit     58daab21cd043e1dc37024a7f99b396788372918 (patch)
tree       96771e43bb69f7c1c2b0b4f7374cb74d7866d0cb /health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md
parent     Releasing debian version 1.43.2-1. (diff)
Merging upstream version 1.44.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md')
-rw-r--r--  health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md | 36
1 files changed, 36 insertions, 0 deletions
diff --git a/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md b/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md
new file mode 100644
index 000000000..39e031628
--- /dev/null
+++ b/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md
@@ -0,0 +1,36 @@
+### Understand the alert
+
+This alert measures the average Pod Lifecycle Event Generator (PLEG) relisting latency over the last minute, at the 0.99 quantile, in microseconds. If you receive this alert, the Kubelet's PLEG latency is high, which can slow down pod lifecycle operations across your Kubernetes cluster.
+
+### What does PLEG latency mean?
+
+The Pod Lifecycle Event Generator (PLEG) is the Kubelet component that watches for container events on the node and generates events for a pod's lifecycle. High PLEG latency indicates a delay in processing these events, which can in turn delay pod startup, termination, and updates.
+
+### Troubleshoot the alert
+
+1. Check the overall Kubelet performance and system load:
+
+   a. Run `kubectl get nodes` to check the status of the nodes in your cluster.
+   b. Investigate the node with high PLEG latency using `kubectl describe node <NODE_NAME>` to view detailed information about its resource usage and events.
+   c. Use monitoring tools such as `top`, `htop`, or `vmstat` to check for high CPU, memory, or disk usage on the node.
+
+2. Look for problematic pods or containers:
+
+   a. Run `kubectl get pods --all-namespaces` to check the status of all pods across namespaces.
+   b. Use `kubectl logs <POD_NAME> -n <NAMESPACE>` to inspect the logs of any suspicious pod.
+   c. Investigate pods with high restart counts, crash loops, or other abnormal statuses.
+
+3. Verify the Kubelet configuration and logs:
+
+   a. Check the Kubelet configuration on the node for misconfigurations or settings that could cause high latency, such as an unusually high number of pods per node.
+   b. Check the Kubelet logs with `journalctl -u kubelet` for more information about PLEG events and errors.
+
+4. Consider evaluating your workloads and scaling your cluster:
+
+   a. If multiple nodes experience high PLEG latency, or if the overall load on your nodes is consistently high, you may need to scale your cluster.
+   b. Evaluate your workloads and adjust resource requests and limits to make the best use of the available resources.
+
+### Useful resources
+
+1. [Understanding the Kubernetes Kubelet](https://kubernetes.io/docs/concepts/overview/components/#kubelet)
+2. [Troubleshooting Kubernetes Clusters](https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/)
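
The checks in steps 1–3 of the guide above can be strung together into a single triage pass. The following is a minimal sketch, not part of the committed guide: it assumes `kubectl` access to the cluster, takes the node name from the alert as its first argument, and the script name and argument handling are illustrative. The final `journalctl` command must be run on the affected node itself, so it is shown commented out.

```bash
#!/usr/bin/env bash
# Quick triage for a high PLEG relist latency alert (illustrative sketch).
# Usage: ./pleg-triage.sh <node-name>   (node name taken from the alert)
set -euo pipefail

NODE="${1:?usage: $0 <node-name>}"

# Step 1a: overall node status across the cluster.
kubectl get nodes

# Step 1b: conditions, resource pressure, and recent events on the node.
kubectl describe node "$NODE" | sed -n '/Conditions:/,/Events:/p'

# Step 2a: pods that are not in the Running phase, cluster-wide.
kubectl get pods --all-namespaces --field-selector=status.phase!=Running

# Step 2c: the ten pods whose first container has the highest restart
# count; crash-looping pods often accompany PLEG stalls.
kubectl get pods --all-namespaces \
  --sort-by='.status.containerStatuses[0].restartCount' | tail -n 10

# Step 3b: on the affected node itself (e.g. over SSH), pull recent
# kubelet log lines that mention PLEG:
#   journalctl -u kubelet --since "30 minutes ago" | grep -i pleg
```

If the latency is high on several nodes at once, that points toward cluster-wide load rather than a single misbehaving pod, which is when step 4 (scaling the cluster and adjusting resource requests and limits) becomes relevant.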