author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
commit: be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97
tree: 9754ff1ca740f6346cf8483ec915d4054bc5da2d /health/guides/load/load_average_5.md
parent: Initial commit.
Adding upstream version 1.44.3. (upstream/1.44.3, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'health/guides/load/load_average_5.md')
-rw-r--r-- health/guides/load/load_average_5.md | 66
1 files changed, 66 insertions, 0 deletions
diff --git a/health/guides/load/load_average_5.md b/health/guides/load/load_average_5.md
new file mode 100644
index 00000000..6eacfcec
--- /dev/null
+++ b/health/guides/load/load_average_5.md
@@ -0,0 +1,66 @@
+### Understand the alert
+
+This alert tracks the system `load average` (CPU and I/O demand) over a five-minute period. If you receive this alert, your system is "overloaded."
+
+The alert is raised to warning when the metric reaches 4 times the expected value, and it is cleared when the metric falls back to 3.5 times the expected value.
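+
+As a rough illustration of the raise and clear thresholds, the sketch below classifies a five-minute load average. The function name `load5_status` is hypothetical. Treating the number of CPU cores as the "expected value" and using strict greater-than comparisons are both assumptions; this is not Netdata's actual alert expression.
+
+```shell
+# Hypothetical helper, not part of Netdata.
+# $1 = five-minute load average
+# $2 = expected value (assumed here to be the number of CPU cores)
+# $3 = previous status: "warning" or "clear"
+load5_status() {
+  awk -v avg="$1" -v expected="$2" -v prev="$3" 'BEGIN {
+    # Once in warning, the lower 3.5x threshold applies until the load drops.
+    factor = (prev == "warning") ? 3.5 : 4
+    print (avg > factor * expected) ? "warning" : "clear"
+  }'
+}
+```
+
+For example, `load5_status "$(cut -d' ' -f2 /proc/loadavg)" "$(nproc)" clear` would classify the current five-minute load on a Linux machine.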
+
+For further information on how our alerts are calculated, please have a look at our [Documentation](https://learn.netdata.cloud/docs/agent/health/reference#expressions).
+
+
+### What does "load average" mean?
+
+On a Linux machine, the `system load average` measures the **number of threads that are currently working and those waiting to work** (for CPU, disk, or uninterruptible locks). Simply stated: **the system load average measures the number of threads that aren't idle.**
+
+### What does "overloaded" mean?
+
+Let's look at a single-core CPU system and think of its one core as a single car lane on a bridge. A car represents a process in this example:
+
+- At a load average of 0.5, traffic on the bridge flows freely; the bridge is at 50% of its capacity.
+- At a load average of 1, the bridge is full; it is utilized 100%.
+- At a load average of 2 (remember, this is a single-core machine), one full lane of cars is crossing the bridge while **another** full lane's worth of cars is waiting to cross.
+
+This is how you can picture CPU load, but keep in mind that `load average` also counts I/O demand, for which an analogous picture applies.
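+
+The bridge analogy can be sketched in a few lines of shell. The helper name `bridge_view` is hypothetical, and the split into "crossing" and "waiting" is a CPU-only idealization that ignores I/O demand.
+
+```shell
+# Hypothetical helper illustrating the bridge analogy.
+# $1 = load average (cars), $2 = number of cores (lanes)
+bridge_view() {
+  awk -v avg="$1" -v lanes="$2" 'BEGIN {
+    # Cars on the bridge are capped by the number of lanes; the rest wait.
+    crossing = (avg < lanes) ? avg : lanes
+    waiting = avg - crossing
+    printf "capacity used: %d%%, waiting: %.1f\n", 100 * crossing / lanes, waiting
+  }'
+}
+
+bridge_view 0.5 1   # capacity used: 50%, waiting: 0.0
+bridge_view 1 1     # capacity used: 100%, waiting: 0.0
+bridge_view 2 1     # capacity used: 100%, waiting: 1.0
+```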
+
+### Useful resources
+
+1. [UNIX Load Average Part 1: How It Works](https://www.helpsystems.com/resources/guides/unix-load-average-part-1-how-it-works)
+2. [UNIX Load Average Part 2: Not Your Average Average](https://www.helpsystems.com/resources/guides/unix-load-average-part-2-not-your-average-average)
+3. [Understanding Linux CPU Load](https://scoutapm.com/blog/understanding-load-averages)
+4. [Linux Load Averages: Solving the Mystery](https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html)
+5. [Understanding Linux Process States](https://access.redhat.com/sites/default/files/attachments/processstates_20120831.pdf)
+
+
+### Troubleshoot the alert
+
+- Determine if the problem is CPU or I/O bound
+
+First, check whether you are facing a CPU load problem or an I/O load problem.
+
+1. To get a report of your system statistics, run `vmstat` (or `vmstat 1` to print an update every second).
+
+The `procs` columns show:
+- `r`: The number of runnable processes (running or waiting for run time).
+- `b`: The number of processes blocked waiting for I/O to complete.
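+
+A minimal sketch for pulling out just these two columns is shown below. It assumes the procps `vmstat` layout, in which two header lines come first and `r` and `b` are the first two columns; the helper name `procs_summary` is hypothetical.
+
+```shell
+# Hypothetical helper: keep only the r and b columns of vmstat output.
+# Assumes two header lines, then data rows with r and b in columns 1 and 2.
+procs_summary() {
+  awk 'NR > 2 { print "runnable=" $1, "blocked=" $2 }'
+}
+```
+
+To use it, run `vmstat 1 3 | procs_summary`. A consistently high `runnable` value points to CPU pressure, while a high `blocked` value points to I/O pressure.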
+
+2. List your currently running processes using the `ps` command:
+
+Piping the `ps` output through `grep` keeps only the processes whose state code starts with either `R` (running or runnable, on the run queue) or `D` (uninterruptible sleep, usually I/O).
+
+3. Reduce the load by stopping any unnecessary processes among the main consumers. We strongly advise you to double-check whether a process is needed before you stop it.
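+
+One way to write such a filter is sketched below. The helper name `only_running_or_blocked` is hypothetical, and it assumes the process state is the first column of the input.
+
+```shell
+# Hypothetical helper: keep lines whose first character is R or D.
+# Assumes the state code is the first column of the ps output.
+only_running_or_blocked() {
+  grep '^[RD]'
+}
+```
+
+With procps `ps`, this could be used as `ps -eo state,user,cmd | only_running_or_blocked`.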
+
+- Check per-process CPU/disk usage to find the top consumers
+
+1. To see the processes that are the main CPU consumers, use the task manager program `top` like this:
+
+ ```
+ top -o +%CPU -i
+ ```
+
+2. Use `iotop`:
+ `iotop` is a tool similar to `top` that monitors disk I/O usage per process. If you don't have it, [install it](https://www.tecmint.com/iotop-monitor-linux-disk-io-activity-per-process/).
+ ```
+ sudo iotop
+ ```
+
+3. Reduce the load by stopping any unnecessary processes among the main consumers. We strongly advise you to double-check whether a process is needed before you stop it.
+