author    Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-19 02:57:58 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-19 02:57:58 +0000
commit    be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97
tree      9754ff1ca740f6346cf8483ec915d4054bc5da2d /health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md
parent    Initial commit.
Adding upstream version 1.44.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md')
-rw-r--r--  health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md | 36
1 file changed, 36 insertions, 0 deletions
diff --git a/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md b/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md
new file mode 100644
index 00000000..39e03162
--- /dev/null
+++ b/health/guides/kubelet/kubelet_1m_pleg_relist_latency_quantile_099.md
@@ -0,0 +1,36 @@
+### Understand the alert
+
+This alert tracks the 0.99 quantile of the Pod Lifecycle Event Generator (PLEG) relisting latency, averaged over the last minute and measured in microseconds. If you receive this alert, it means the Kubelet's PLEG relisting latency is high, which can slow down pod lifecycle operations across your Kubernetes cluster.
+
+### What does PLEG latency mean?
+
+The Pod Lifecycle Event Generator (PLEG) is a Kubelet component that periodically relists the containers on a node, compares the result with its previous state, and generates lifecycle events for the affected pods. High PLEG relisting latency indicates a delay in processing these events, which can delay pod startup, termination, and updates.
+
+### Troubleshoot the alert
+
+1. Check the overall Kubelet performance and system load:
+
+   a. Run `kubectl get nodes` to check the status of the nodes in your cluster.
+   b. Investigate the node with high PLEG latency using `kubectl describe node <NODE_NAME>` to view detailed information about resource usage and events.
+   c. Use monitoring tools like `top`, `htop`, or `vmstat` to check for high CPU, memory, or disk usage on the node.
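+
+   As a quick starting point, the commands below combine these checks (a minimal sketch; `<NODE_NAME>` is a placeholder for the affected node, and `kubectl top` requires the metrics-server to be installed):
+
+   ```bash
+   # List all nodes with their readiness, roles, and versions
+   kubectl get nodes -o wide
+
+   # Inspect the affected node's conditions, allocatable resources, and recent events
+   kubectl describe node <NODE_NAME>
+
+   # Check current CPU and memory usage on the node (needs metrics-server)
+   kubectl top node <NODE_NAME>
+
+   # On the node itself, sample overall load, memory, and I/O wait for 5 seconds
+   vmstat 1 5
+   ```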
+
+2. Look for problematic pods or containers:
+
+   a. Run `kubectl get pods --all-namespaces` to check the status of all pods across namespaces.
+   b. Use `kubectl logs <POD_NAME> -n <NAMESPACE>` to check the logs of a suspicious pod in its namespace.
+   c. Investigate pods with high restart counts, crash loops, or other abnormal statuses.
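+
+   For example, the following commands (a sketch; adjust the placeholders to your cluster) help surface unhealthy or restarting pods:
+
+   ```bash
+   # Show pods that are not Running or Succeeded, across all namespaces
+   kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded
+
+   # Sort pods by the restart count of their first container to spot crash loops
+   kubectl get pods --all-namespaces --sort-by='.status.containerStatuses[0].restartCount'
+
+   # Tail the logs of a suspicious pod; --previous shows the last crashed container instance
+   kubectl logs <POD_NAME> -n <NAMESPACE> --previous --tail=100
+   ```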
+
+3. Verify Kubelet configurations and logs:
+
+   a. Check the Kubelet configuration on the node. Look for any misconfigurations or settings that could cause high latency.
+   b. Check Kubelet logs using `journalctl -u kubelet` for more information about PLEG events and errors.
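+
+   A minimal sketch of these checks is shown below. Note that `/var/lib/kubelet/config.yaml` is only a common default location; the actual path is set by the kubelet's `--config` flag and may differ on your distribution:
+
+   ```bash
+   # Find the config file path passed to the kubelet via --config
+   ps -ef | grep '[k]ubelet' | tr ' ' '\n' | grep -A1 -- '--config'
+
+   # Review the kubelet configuration (adjust the path to your setup)
+   cat /var/lib/kubelet/config.yaml
+
+   # Search recent kubelet logs for PLEG- and relist-related messages
+   journalctl -u kubelet --since "1 hour ago" | grep -iE 'pleg|relist'
+   ```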
+
+4. Consider evaluating your workloads and scaling your cluster:
+
+   a. If you have multiple nodes experiencing high PLEG latency, or if the overall load on your nodes is consistently high, you might need to scale your cluster.
+   b. Evaluate your workloads and adjust resource requests and limits to make the best use of your available resources.
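+
+   The commands below (a sketch, assuming the metrics-server is installed) help compare requested resources against actual usage before deciding to scale:
+
+   ```bash
+   # Current CPU and memory usage per node
+   kubectl top nodes
+
+   # Requests and limits already allocated on the affected node
+   kubectl describe node <NODE_NAME> | grep -A 10 "Allocated resources"
+
+   # The heaviest pods by memory, candidates for tuning requests and limits
+   kubectl top pods --all-namespaces --sort-by=memory | head -20
+   ```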
+
+### Useful resources
+
+1. [Understanding the Kubernetes Kubelet](https://kubernetes.io/docs/concepts/overview/components/#kubelet)
+2. [Troubleshooting Kubernetes Clusters](https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/)