author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-05 11:19:16 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-05 12:07:37 +0000 |
commit | b485aab7e71c1625cfc27e0f92c9509f42378458 (patch) | |
tree | ae9abe108601079d1679194de237c9a435ae5b55 /src/health/guides/load/load_average_15.md | |
parent | Adding upstream version 1.44.3. (diff) | |
Adding upstream version 1.45.3+dfsg.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/health/guides/load/load_average_15.md')
-rw-r--r-- | src/health/guides/load/load_average_15.md | 55 |
1 files changed, 55 insertions, 0 deletions
diff --git a/src/health/guides/load/load_average_15.md b/src/health/guides/load/load_average_15.md
new file mode 100644
index 000000000..ba8b1e3e0
--- /dev/null
+++ b/src/health/guides/load/load_average_15.md
@@ -0,0 +1,55 @@
+### Understand the alert
+
+This alarm calculates the system `load average` (CPU and I/O demand) over a period of fifteen minutes. If you receive this alarm, it means that your system is "overloaded."
+
+The alert is raised to warning when the metric reaches 2 times the expected value and is cleared when the value falls back to 1.75 times the expected value.
+
+For further information on how our alerts are calculated, please have a look at our [Documentation](https://learn.netdata.cloud/docs/agent/health/reference#expressions).
+
+### What does "load average" mean?
+
+On a Linux machine, the `system load average` measures the **number of threads that are currently working and those waiting to work** (CPU, disk, uninterruptible locks). Simply stated: **System load average measures the number of threads that aren't idle.**
+
+### What does "overloaded" mean?
+
+Let's look at a single-core CPU system and think of its core count as car lanes on a bridge. A car represents a process in this example:
+
+- At a load average of 0.5, traffic on the bridge is fine: it is at 50% of its capacity.
+- If the load average is at 1, then the bridge is full, and it is utilized 100%.
+- If the load average gets to 2 (remember we are on a single-core machine), one full lane of cars is crossing the bridge while **another** full lane of cars waits to cross.
+
+This is how you can picture CPU load, but keep in mind that `load average` also counts I/O demand, so an analogous picture applies there.
+
+### Troubleshoot the alert
+
+- Determine if the problem is CPU load or I/O load
+
+To get a report about your system statistics, use `vmstat` (or `vmstat 1`, to set a delay between updates in seconds):
+
+The `procs` columns show:
+r: The number of runnable processes (running or waiting for run time).
+b: The number of processes blocked waiting for I/O to complete.
+
+- Check per-process CPU/disk usage to find the top consumers
+
+1. To see the processes that are the main CPU consumers, use the task manager program `top` like this:
+
+   ```
+   top -o +%CPU -i
+   ```
+
+2. Use `iotop`:
+   `iotop` is a useful tool, similar to `top`, for monitoring disk I/O usage. If you don't have it, [install it](https://www.tecmint.com/iotop-monitor-linux-disk-io-activity-per-process/).
+   ```
+   sudo iotop
+   ```
+
+3. Minimize the load by closing any unnecessary main consumer processes. We strongly advise you to double-check whether the process you want to close is necessary.
+
+### Useful resources
+
+1. [UNIX Load Average Part 1: How It Works](https://www.helpsystems.com/resources/guides/unix-load-average-part-1-how-it-works)
+2. [UNIX Load Average Part 2: Not Your Average Average](https://www.helpsystems.com/resources/guides/unix-load-average-part-2-not-your-average-average)
+3. [Understanding Linux CPU Load](https://scoutapm.com/blog/understanding-load-averages)
+4. [Linux Load Averages: Solving the Mystery](https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html)
+5. [Understanding Linux Process States](https://access.redhat.com/sites/default/files/attachments/processstates_20120831.pdf)
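
Note on the added guide: the raise and clear thresholds it describes (2 times and 1.75 times the expected value) can be sketched roughly as below. This is only an illustration, not Netdata's implementation. It assumes the "expected value" is the number of CPU cores, which the guide does not state, and that the comparison is strict. The function names `next_state` and `current_load15` are hypothetical.

```python
import os


def next_state(load15, cpu_count, in_warning, raise_at=2.0, clear_at=1.75):
    """Return True if the alert should be in warning after this sample.

    Assumption: the "expected value" is the CPU core count, so the metric
    is the 15-minute load average divided by the number of cores.
    The strict ">" comparison is also an assumption.
    """
    ratio = load15 / cpu_count
    # Hysteresis: once in warning, use the lower (clear) threshold so the
    # alert does not flap when the load hovers around the raise threshold.
    threshold = clear_at if in_warning else raise_at
    return ratio > threshold


def current_load15():
    """Return the 15-minute load average reported by the operating system."""
    return os.getloadavg()[2]


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    try:
        load15 = current_load15()
    except (OSError, AttributeError):
        # Load average is not available on this platform.
        print("load average unavailable")
    else:
        warning = next_state(load15, cpus, in_warning=False)
        print(f"load15={load15:.2f} cpus={cpus} warning={warning}")
```

On a single-core machine this matches the bridge analogy: a load of 2.5 raises the warning, and the warning stays active until the load falls to 1.75 or lower.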