Diffstat (limited to 'health/guides/cgroups')
-rw-r--r--  health/guides/cgroups/cgroup_10min_cpu_usage.md      |  5
-rw-r--r--  health/guides/cgroups/cgroup_ram_in_use.md           |  5
-rw-r--r--  health/guides/cgroups/k8s_cgroup_10min_cpu_usage.md  | 48
-rw-r--r--  health/guides/cgroups/k8s_cgroup_ram_in_use.md       | 42
4 files changed, 100 insertions, 0 deletions
diff --git a/health/guides/cgroups/cgroup_10min_cpu_usage.md b/health/guides/cgroups/cgroup_10min_cpu_usage.md
new file mode 100644
index 00000000..0ba41363
--- /dev/null
+++ b/health/guides/cgroups/cgroup_10min_cpu_usage.md
@@ -0,0 +1,5 @@
+### Understand the alert
+
+The Netdata Agent calculates the average CPU utilization of a cgroup over the last 10 minutes. This alert indicates high cgroup CPU utilization. When usage exceeds the configured limit, the system throttles the cgroup's CPU usage. To fix this issue, try increasing the cgroup CPU limit.
+
+This alert is triggered in warning state when the average CPU utilization is between 75-80% and in critical state when it is between 85-95%.
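+
+How you raise the limit depends on how the cgroup is managed. As an illustration only (the cgroup path and service name below are placeholders, not values from your system), with cgroup v2 the limit is exposed in the `cpu.max` file, and for systemd-managed services it can be raised with `systemctl set-property`:
+
+```
+# Inspect the current limit: "<quota> <period>" in microseconds, or "max" for no limit
+cat /sys/fs/cgroup/<cgroup-path>/cpu.max
+
+# For a systemd-managed service, allow up to two full CPUs
+systemctl set-property <service-name>.service CPUQuota=200%
+```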
diff --git a/health/guides/cgroups/cgroup_ram_in_use.md b/health/guides/cgroups/cgroup_ram_in_use.md
new file mode 100644
index 00000000..59440e0b
--- /dev/null
+++ b/health/guides/cgroups/cgroup_ram_in_use.md
@@ -0,0 +1,5 @@
+### Understand the alert
+
+The Netdata Agent calculates the percentage of RAM used by a cgroup. This alert indicates high cgroup memory utilization. When utilization reaches 100%, the Out-Of-Memory (OOM) killer starts terminating processes in the cgroup. To fix this issue, try increasing the cgroup memory limit (if one is set).
+
+This alert is triggered in warning state when the percentage of used memory is between 80-90% and in critical state when it is between 90-98%.
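+
+As an illustration only (the cgroup path and service name below are placeholders), with cgroup v2 the current usage and limit are exposed in `memory.current` and `memory.max`, and for systemd-managed services the limit can be raised with `systemctl set-property`:
+
+```
+# Inspect current usage and the configured limit (bytes, or "max" for no limit)
+cat /sys/fs/cgroup/<cgroup-path>/memory.current
+cat /sys/fs/cgroup/<cgroup-path>/memory.max
+
+# For a systemd-managed service, allow up to 2 GiB of RAM
+systemctl set-property <service-name>.service MemoryMax=2G
+```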
diff --git a/health/guides/cgroups/k8s_cgroup_10min_cpu_usage.md b/health/guides/cgroups/k8s_cgroup_10min_cpu_usage.md
new file mode 100644
index 00000000..3168e279
--- /dev/null
+++ b/health/guides/cgroups/k8s_cgroup_10min_cpu_usage.md
@@ -0,0 +1,48 @@
+### Understand the alert
+
+The Netdata Agent calculates the average `cgroup CPU utilization` over the past 10 minutes for each cgroup in a Kubernetes cluster. If you receive this alert at the warning or critical level, the cgroup is heavily utilizing the CPU resources available to it.
+
+### What does cgroup CPU utilization mean?
+
+`cgroups` (control groups) are a Linux kernel feature that limits and isolates the resource usage (CPU, memory, disk I/O, etc.) of a collection of processes. Kubernetes relies on cgroups to enforce container resource requests and limits. The `cgroup CPU utilization` metric measures the percentage of the available CPU resources consumed by the processes within a cgroup.
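+
+A container's CPU `limit` is what the kubelet translates into the cgroup CPU quota, so the configured requests and limits are a good starting point. For example (the Pod and namespace names are placeholders):
+
+```
+kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'
+```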
+
+### Troubleshoot the alert
+
+- Identify the over-utilizing cgroup
+
+Check the alert message for the specific cgroup that is causing high CPU utilization.
+
+- Determine the processes utilizing the most CPU resources in the cgroup
+
+To see which control groups are consuming the most CPU, run `systemd-cgtop` on the Kubernetes node:
+
+```
+# order by CPU usage, batch output, a few iterations so CPU% can be sampled
+systemd-cgtop -c -b -d 1 -n 3
+```
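+
+If the container has a CPU limit, you can also check whether it is being throttled. As a rough sketch assuming cgroup v2 (the path is a placeholder for the container's actual cgroup on the node):
+
+```
+# nr_throttled and throttled_usec growing over time indicate CPU throttling
+cat /sys/fs/cgroup/<container-cgroup-path>/cpu.stat
+```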
+
+- Analyze the Kubernetes resource usage
+
+Use `kubectl top` to get an overview of the resource usage in your Kubernetes cluster:
+
+```
+kubectl top nodes
+kubectl top pods
+```
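+
+If your `kubectl` version supports it, sorting the output makes the heaviest CPU consumers easier to spot:
+
+```
+kubectl top pods --all-namespaces --sort-by=cpu
+```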
+
+- Investigate the Kubernetes events and logs
+
+Examine the cluster events and the logs of the Pods that are causing the high CPU utilization:
+
+```
+kubectl get events --sort-by='.metadata.creationTimestamp'
+kubectl logs <pod-name> -n <namespace> --timestamps -f
+```
+
+- Optimize the resource usage of the cluster
+
+You may need to add more resources to the cluster, adjust the resource limits of the affected workloads, or optimize the application code to reduce CPU usage.
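+
+For example, CPU limits can be adjusted in place with `kubectl` (the Deployment name, namespace, and values below are placeholders; choose values that fit your workload):
+
+```
+# Raise the CPU request and limit of a Deployment's containers
+kubectl set resources deployment <deployment-name> -n <namespace> --requests=cpu=500m --limits=cpu=1
+```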
+
+### Useful resources
+
+1. [Overview of a Pod](https://kubernetes.io/docs/concepts/workloads/pods/)
+2. [Assign CPU Resources to Containers and Pods](https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/)
diff --git a/health/guides/cgroups/k8s_cgroup_ram_in_use.md b/health/guides/cgroups/k8s_cgroup_ram_in_use.md
new file mode 100644
index 00000000..aec443b7
--- /dev/null
+++ b/health/guides/cgroups/k8s_cgroup_ram_in_use.md
@@ -0,0 +1,42 @@
+### Understand the alert
+
+This alert monitors `RAM usage` in a Kubernetes cluster by calculating the ratio of the memory used by a cgroup to its memory limit. If usage crosses the warning or critical threshold, the cgroup is approaching its memory limit and its processes risk being killed by the OOM killer.
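+
+As a rough illustration of what the alert computes (assuming cgroup v2 and a numeric memory limit; the cgroup path below is a placeholder), the same ratio can be read directly from the cgroup files on the node:
+
+```
+CGROUP=/sys/fs/cgroup/<container-cgroup-path>
+# percentage of the memory limit currently in use (memory.max must hold a number, not "max")
+awk -v max="$(cat $CGROUP/memory.max)" '{ printf "RAM in use: %.1f%%\n", $1 * 100 / max }' $CGROUP/memory.current
+```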
+
+### Troubleshoot the alert
+
+1. Check overall RAM usage in the cluster
+
+ Use the `kubectl top nodes` command to check the overall memory usage on the cluster nodes:
+ ```
+ kubectl top nodes
+ ```
+
+2. Identify Pods with high memory usage
+
+ Use the `kubectl top pods --all-namespaces` command to identify Pods consuming a high amount of memory:
+ ```
+ kubectl top pods --all-namespaces
+ ```
+
+3. Inspect logs for errors or misconfigurations
+
+ Check the logs of Pods consuming high memory for any issues or misconfigurations:
+ ```
+ kubectl logs -n <namespace> <pod_name>
+ ```
+
+4. Inspect container resource limits
+
+ Review the resource limits defined in the Pod's yaml file, particularly the `limits` and `requests` sections. If you're not setting limits on Pods, then consider setting appropriate limits to prevent running out of resources.
+
+5. Scale or optimize applications
+
+ If high memory usage is expected and justified, consider scaling the application by adding replicas or increasing the allocated resources.
+
+ If the memory usage is not justified, optimizing the application code or configurations may help reduce memory usage.
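+
+For example (the names and sizes below are placeholders): check whether containers are being OOM-killed, then raise the memory limit or add replicas as appropriate:
+
+```
+# Look for "OOMKilled" in the containers' last state
+kubectl describe pod <pod_name> -n <namespace>
+
+# Raise the memory request and limit of a Deployment's containers
+kubectl set resources deployment <deployment_name> -n <namespace> --requests=memory=512Mi --limits=memory=1Gi
+
+# Or spread the load across more replicas
+kubectl scale deployment <deployment_name> -n <namespace> --replicas=3
+```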
+
+### Useful resources
+
+1. [Kubernetes best practices: Organizing with Namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
+2. [Managing Resources for Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)
+3. [Configure Default Memory Requests and Limits](https://kubernetes.io/docs/tasks/administer-cluster/memory-default-namespace/)
\ No newline at end of file