Adding upstream version 1.45.3+dfsg.

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-05 11:19:16 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-05 12:07:37 +0000
commit: b485aab7e71c1625cfc27e0f92c9509f42378458 (patch)
tree: ae9abe108601079d1679194de237c9a435ae5b55 /collectors/apps.plugin/README.md
parent: Adding upstream version 1.44.3. (diff)
download: netdata-b485aab7e71c1625cfc27e0f92c9509f42378458.tar.xz
netdata-b485aab7e71c1625cfc27e0f92c9509f42378458.zip
1 files changed, 0 insertions, 402 deletions
diff --git a/collectors/apps.plugin/README.md b/collectors/apps.plugin/README.md
deleted file mode 100644
index fd5371f08..000000000
--- a/collectors/apps.plugin/README.md
+++ /dev/null
@@ -1,402 +0,0 @@
-<!--
-title: "Application monitoring (apps.plugin)"
-sidebar_label: "Application monitoring "
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/apps.plugin/README.md"
-learn_status: "Published"
-learn_topic_type: "References"
-learn_rel_path: "Integrations/Monitor/System metrics"
--->
-
-# Application monitoring (apps.plugin)
-
-`apps.plugin` breaks down system resource usage to **processes**, **users** and **user groups**.  
-It is enabled by default on every Netdata installation.
-
-To achieve this task, it iterates through the whole process tree, collecting resource usage information
-for every process found running.
-
-Since Netdata needs to present this information in charts and track them through time,
-instead of presenting a `top`-like list, `apps.plugin` uses a pre-defined list of **process groups**
-to which it assigns all running processes. This list is customizable via `apps_groups.conf`, and Netdata
-ships with a good default for most cases (to edit it on your system run `/etc/netdata/edit-config apps_groups.conf`).
-
-So, `apps.plugin` builds a process tree (much like `ps fax` does in Linux), and groups
-processes together (evaluating both child and parent processes) so that the result is always a list with
-a predefined set of members (of course, only process groups found running are reported).
-
-> If you find that `apps.plugin` categorizes standard applications as `other`, we would be
-> glad to accept pull requests improving the defaults shipped with Netdata in `apps_groups.conf`.
-
-Unlike traditional process monitoring tools (like `top`), `apps.plugin` can account for the resource
-utilization of exited processes. Their utilization is attributed to their currently running parents.
-So, `apps.plugin` is perfectly able to measure the resources used by shell scripts and other processes
-that fork/spawn other short-lived processes hundreds of times per second.
-
-## Charts
-
-`apps.plugin` provides charts for 3 sections:
-
-1.  Per application charts as **Applications** at Netdata dashboards
-2.  Per user charts as **Users** at Netdata dashboards
-3.  Per user group charts as **User Groups** at Netdata dashboards
-
-Each of these sections provides the same number of charts:
-
--   CPU utilization (`apps.cpu`)
-    -   Total CPU usage
-    -   User/system CPU usage (`apps.cpu_user`/`apps.cpu_system`)
--   Disk I/O
-    -   Physical reads/writes (`apps.preads`/`apps.pwrites`)
-    -   Logical reads/writes (`apps.lreads`/`apps.lwrites`)
-    -   Open unique files (if a file is found open multiple times, it is counted just once, `apps.files`)
--   Memory
-    -   Real Memory Used (non-shared, `apps.mem`)
-    -   Virtual Memory Allocated (`apps.vmem`)
-    -   Minor page faults (i.e. memory activity, `apps.minor_faults`)
--   Processes
-    -   Threads running (`apps.threads`)
-    -   Processes running (`apps.processes`)
-    -   Carried over uptime (since the last Netdata Agent restart, `apps.uptime`)
-    -   Minimum uptime (`apps.uptime_min`)
-    -   Average uptime (`apps.uptime_average`)
-    -   Maximum uptime (`apps.uptime_max`)
-    -   Pipes open (`apps.pipes`)
--   Swap memory
-    -   Swap memory used (`apps.swap`)
-    -   Major page faults (i.e. swap activity, `apps.major_faults`)
--   Network
-    -   Sockets open (`apps.sockets`)
-    
-In addition, if the [eBPF collector](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md) is running, your dashboard will also show an
-additional [list of charts](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md#integration-with-appsplugin) using low-level Linux
-metrics.
-
-The above are reported:
-
--   For **Applications** per target configured.
--   For **Users** per username or UID (when the username is not available).
--   For **User Groups** per group name or GID (when group name is not available).
-
-## Performance
-
-`apps.plugin` is a complex piece of software and has a lot of work to do.
-We are proud that `apps.plugin` is a lot faster than any other similar tool,
-while collecting a lot more information about the processes. However, the fact is that
-this plugin requires more CPU resources than the `netdata` daemon itself.
-
-Under Linux, `apps.plugin` reads several `/proc` files for each running process.
-Doing this work every second, especially on hosts with several thousand
-processes, may increase the CPU resources consumed by the plugin.
-
-In such cases, you may need to lower its data collection frequency.
-
-To do this, edit `/etc/netdata/netdata.conf` and find this section:
-
-```
-[plugin:apps]
-	# update every = 1
-	# command options =
-```
-
-Uncomment the `update every` line and set it to a higher number. If you set it to `2`,
-its CPU usage will be cut roughly in half, and data will be collected once every 2 seconds.
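-
-As a sketch of the result, the section would look like this after the change (the value `2` is only an example; pick whatever interval suits your host):
-
-```
-[plugin:apps]
-	update every = 2
-	# command options =
-```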
-
-## Configuration
-
-The configuration file is `/etc/netdata/apps_groups.conf`. To edit it on your system, run `/etc/netdata/edit-config apps_groups.conf`.
-
-The configuration file accepts multiple lines, each having this format:
-
-```txt
-group: process1 process2 ...
-```
-
-Each group can be given multiple times, to add more processes to it.
-
-For the **Applications** section, only groups configured in this file are reported.
-All other processes will be reported as `other`.
-
-For each process given, its whole process tree will be grouped, not just the process matched.
-The plugin will include both parents and children. If including the parents in the group is
-undesirable, the line `other: *` should be appended to `apps_groups.conf`.
-
-The process names are the ones returned by:
-
--   `ps -e` or `cat /proc/PID/stat`
--   in case of substring mode (see below): `/proc/PID/cmdline`
-
-To add process names with spaces, enclose them in quotes (single or double),
-for example: `'Plex Media Serv'` or `"my other process"`.
-
-You can add an asterisk `*` at the beginning and/or the end of a process:
-
--   `*name` _suffix_ mode: will search for processes ending with `name` (at `/proc/PID/stat`)
--   `name*` _prefix_ mode: will search for processes beginning with `name` (at `/proc/PID/stat`)
--   `*name*` _substring_ mode: will search for `name` in the whole command line (at `/proc/PID/cmdline`)
-
-If you enter even just one `*name*` (substring) pattern, `apps.plugin` will process
-`/proc/PID/cmdline` for all processes (of course, only once per process: when it is first seen).
-
-To add processes with single quotes, enclose them in double quotes: `"process with this ' single quote"`
-
-To add processes with double quotes, enclose them in single quotes: `'process with this " double quote'`
-
-If a group or process name starts with a `-`, the dimension will be hidden from the chart (cpu chart only).
-
-If a process starts with a `+`, debugging will be enabled for it (debugging produces a lot of output - do not enable it in production systems).
-
-You can add any number of groups. Only the ones found running will affect the charts generated.
-However, producing charts with hundreds of dimensions may slow down your web browser.
-
-The order of the entries in this list is important: the first that matches a process is used, so put important
-ones at the top. Processes not matched by any row inherit their group from their parents or children.
-
-The order also controls the order of the dimensions on the generated charts (although applications started
-after `apps.plugin` has started will be appended to the existing list of dimensions the `netdata` daemon maintains).
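-
-To illustrate the matching rules above, here is a hypothetical `apps_groups.conf` fragment. The group and process names are made up for this example and are not the defaults shipped with Netdata:
-
-```txt
-media: 'Plex Media Serv'
-databases: mysqld* *postgres*
-backup: *backup
-databases: redis-server
-other: *
-```
-
-Here `mysqld*` is a prefix match, `*backup` is a suffix match, and `*postgres*` is a substring match against the whole command line. The second `databases:` line adds one more process to the group defined earlier, and `other: *` comes last so that the more specific rows are tried first.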
-
-There are a few command line options you can pass to `apps.plugin`. The list of available options can be obtained with the `--help` flag. The options can be set in the `netdata.conf` file. For example, to disable user and user group charts, set:
-
-```
-[plugin:apps]
-  command options = without-users without-groups
-```
-
-### Integration with eBPF
-
-If you don't see charts under the **eBPF syscall** or **eBPF net** sections, you should edit your
-[`ebpf.d.conf`](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md#configure-the-ebpf-collector) file to ensure the eBPF program is enabled.
-
-Also see our [guide on troubleshooting apps with eBPF
-metrics](https://github.com/netdata/netdata/blob/master/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md) for ideas on how to interpret these charts in a
-few scenarios.
-
-## Permissions
-
-`apps.plugin` requires additional privileges to collect all the information it needs.
-The problem is described in issue #157.
-
-When Netdata is installed, `apps.plugin` is given the capabilities `cap_dac_read_search,cap_sys_ptrace+ep`.
-If this fails (i.e. `setcap` fails), `apps.plugin` is setuid to `root`.
-
-### Linux capabilities in containers
-
-There are a few cases, like `docker` and `virtuozzo` containers, where `setcap` succeeds, but the capabilities
-are silently ignored (in `lxc` containers `setcap` fails).
-
-In this case, you will have to make `apps.plugin` setuid to `root` by running these commands:
-
-```sh
-chown root:netdata /usr/libexec/netdata/plugins.d/apps.plugin
-chmod 4750 /usr/libexec/netdata/plugins.d/apps.plugin
-```
-
-You will have to run these commands every time you update Netdata.
-
-## Security
-
-`apps.plugin` performs a hard-coded function of building the process tree in memory,
-iterating forever, collecting metrics for each running process and sending them to Netdata.
-This is a one-way communication, from `apps.plugin` to Netdata.
-
-So, since `apps.plugin` cannot be instructed by Netdata for the actions it performs,
-we think it is pretty safe to allow it to have these increased privileges.
-
-Keep in mind that `apps.plugin` will still run without escalated permissions,
-but it will not be able to collect all the information.
-
-## Application Badges
-
-You can create badges that you can embed anywhere you like, with URLs like this:
-
-```
-https://your.netdata.ip:19999/api/v1/badge.svg?chart=apps.processes&dimensions=myapp&value_color=green%3E0%7Cred
-```
-
-The color expression unescaped is this: `value_color=green>0|red`.
-
-Here is an example for the process group `sql` at `https://registry.my-netdata.io`:
-
-![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.processes&dimensions=sql&value_color=green%3E0%7Cred)
-
-Netdata is able to give you a lot more badges for your app.
-Examples below for process group `sql`:
-
--   CPU usage: ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.cpu&dimensions=sql&value_color=green=0%7Corange%3C50%7Cred)
--   Disk Physical Reads ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.preads&dimensions=sql&value_color=green%3C100%7Corange%3C1000%7Cred)
--   Disk Physical Writes ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.pwrites&dimensions=sql&value_color=green%3C100%7Corange%3C1000%7Cred)
--   Disk Logical Reads ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.lreads&dimensions=sql&value_color=green%3C100%7Corange%3C1000%7Cred)
--   Disk Logical Writes ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.lwrites&dimensions=sql&value_color=green%3C100%7Corange%3C1000%7Cred)
--   Open Files ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.fds_files&dimensions=sql&value_color=green%3E30%7Cred)
--   Real Memory ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.mem&dimensions=sql&value_color=green%3C100%7Corange%3C200%7Cred)
--   Virtual Memory ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.vmem&dimensions=sql&value_color=green%3C100%7Corange%3C1000%7Cred)
--   Swap Memory ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.swap&dimensions=sql&value_color=green=0%7Cred)
--   Minor Page Faults ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.minor_faults&dimensions=sql&value_color=green%3C100%7Corange%3C1000%7Cred)
--   Processes ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.processes&dimensions=sql&value_color=green%3E0%7Cred)
--   Threads ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.threads&dimensions=sql&value_color=green%3E=28%7Cred)
--   Major Faults (swap activity) ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.major_faults&dimensions=sql&value_color=green=0%7Cred)
--   Open Pipes ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.fds_pipes&dimensions=sql&value_color=green=0%7Cred)
--   Open Sockets ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.fds_sockets&dimensions=sql&value_color=green%3E=3%7Cred)
-
-For more information about badges, see [Generating Badges](https://github.com/netdata/netdata/blob/master/web/api/badges/README.md).
-
-## Comparison with console tools
-
-SSH to a server running Netdata and execute this:
-
-```sh
-while true; do ls -l /var/run >/dev/null; done
-```
-
-On most systems `/var/run` is a `tmpfs` device, so there is nothing that can stop this command
-from entirely consuming one of the CPU cores of the machine.
-
-As we will see below, **none** of the console performance monitoring tools can report that this
-command is using 100% CPU. They do report of course that the CPU is busy, but **they fail to
-identify the process that consumes so much CPU**.
-
-Here is what common Linux console monitoring tools report:
-
-### top
-
-`top` reports that `bash` is using just 14%.
-
-If you check the total system CPU utilization, it says there is no idle CPU at all, but `top`
-fails to provide a breakdown of the CPU consumption in the system. The sum of the CPU utilization
-of all processes reported by `top` is 15.6%.
-
-```
-top - 18:46:28 up 3 days, 20:14,  2 users,  load average: 0.22, 0.05, 0.02
-Tasks:  76 total,   2 running,  74 sleeping,   0 stopped,   0 zombie
-%Cpu(s): 32.8 us, 65.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  1.3 hi,  0.3 si,  0.0 st
-KiB Mem :  1016576 total,   244112 free,    52012 used,   720452 buff/cache
-KiB Swap:        0 total,        0 free,        0 used.   753712 avail Mem
-
-  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
-12789 root      20   0   14980   4180   3020 S 14.0  0.4   0:02.82 bash
-    9 root      20   0       0      0      0 S  1.0  0.0   0:22.36 rcuos/0
-  642 netdata   20   0  132024  20112   2660 S  0.3  2.0  14:26.29 netdata
-12522 netdata   20   0    9508   2476   1828 S  0.3  0.2   0:02.26 apps.plugin
-    1 root      20   0   67196  10216   7500 S  0.0  1.0   0:04.83 systemd
-    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
-```
-
-### htop
-
-Exactly like `top`, `htop` is providing an incomplete breakdown of the system CPU utilization.
-
-```
-  CPU[||||||||||||||||||||||||100.0%]   Tasks: 27, 11 thr; 2 running
-  Mem[||||||||||||||||||||85.4M/993M]   Load average: 1.16 0.88 0.90
-  Swp[                         0K/0K]   Uptime: 3 days, 21:37:03
-
-  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
-12789 root       20   0 15104  4484  3208 S 14.0  0.4 10:57.15 -bash
- 7024 netdata    20   0  9544  2480  1744 S  0.7  0.2  0:00.88 /usr/libexec/netd
- 7009 netdata    20   0  138M 21016  2712 S  0.7  2.1  0:00.89 /usr/sbin/netdata
- 7012 netdata    20   0  138M 21016  2712 S  0.0  2.1  0:00.31 /usr/sbin/netdata
-  563 root       20   0  308M  202M  202M S  0.0 20.4  1:00.81 /usr/lib/systemd/
- 7019 netdata    20   0  138M 21016  2712 S  0.0  2.1  0:00.14 /usr/sbin/netdata
-```
-
-### atop
-
-`atop` also fails to break down CPU usage.
-
-```
-ATOP - localhost            2016/12/10  20:11:27    -----------      10s elapsed
-PRC | sys    1.13s | user   0.43s | #proc     75 | #zombie    0 | #exit   5383 |
-CPU | sys      67% | user     31% | irq       2% | idle      0% | wait      0% |
-CPL | avg1    1.34 | avg5    1.05 | avg15   0.96 | csw    51346 | intr   10508 |
-MEM | tot   992.8M | free  211.5M | cache 470.0M | buff   87.2M | slab  164.7M |
-SWP | tot     0.0M | free    0.0M |              | vmcom 207.6M | vmlim 496.4M |
-DSK |          vda | busy      0% | read       0 | write      4 | avio 1.50 ms |
-NET | transport    | tcpi      16 | tcpo      15 | udpi       0 | udpo       0 |
-NET | network      | ipi       16 | ipo       15 | ipfrw      0 | deliv     16 |
-NET | eth0    ---- | pcki      16 | pcko      15 | si    1 Kbps | so    4 Kbps |
-
-  PID SYSCPU USRCPU   VGROW  RGROW  RDDSK   WRDSK ST EXC  S  CPU CMD       1/600
-12789  0.98s  0.40s      0K     0K     0K    336K --   -  S  14% bash
-    9  0.08s  0.00s      0K     0K     0K      0K --   -  S   1% rcuos/0
- 7024  0.03s  0.00s      0K     0K     0K      0K --   -  S   0% apps.plugin
- 7009  0.01s  0.01s      0K     0K     0K      4K --   -  S   0% netdata
-```
-
-### glances
-
-The same is true for `glances`. The system runs at 100%, but the per-process utilization
-that `glances` reports adds up to only 17%.
-
-Note also that, being a Python program, `glances` uses 1.6% CPU while it runs.
-
-```
-localhost                                               Uptime: 3 days, 21:42:00
-
-CPU  [100.0%]   CPU     100.0%   MEM     23.7%   SWAP      0.0%   LOAD    1-core
-MEM  [ 23.7%]   user:    30.9%   total:   993M   total:       0   1 min:    1.18
-SWAP [  0.0%]   system:  67.8%   used:    236M   used:        0   5 min:    1.08
-                idle:     0.0%   free:    757M   free:        0   15 min:   1.00
-
-NETWORK     Rx/s   Tx/s   TASKS  75 (90 thr), 1 run, 74 slp, 0 oth
-eth0        168b    2Kb
-eth1          0b     0b     CPU%  MEM%   PID USER        NI S Command
-lo            0b     0b     13.5   0.4 12789 root         0 S -bash
-                             1.6   2.2  7025 root         0 R /usr/bin/python /u
-DISK I/O     R/s    W/s      1.0   0.0     9 root         0 S rcuos/0
-vda1           0     4K      0.3   0.2  7024 netdata      0 S /usr/libexec/netda
-                             0.3   0.0     7 root         0 S rcu_sched
-FILE SYS    Used  Total      0.3   2.1  7009 netdata      0 S /usr/sbin/netdata
-/ (vda1)   1.56G  29.5G      0.0   0.0    17 root         0 S oom_reaper
-```
-
-### Why does this happen?
-
-All the console tools report usage based on the processes found running *at the moment they
-examine the process tree*. So, they see just one `ls` command, which is actually very quick
-with minor CPU utilization. But the shell is spawning hundreds of them, one after another
-(much like shell scripts do).
-
-### What does Netdata report?
-
-The total CPU utilization of the system:
-
-![image](https://cloud.githubusercontent.com/assets/2662304/21076212/9198e5a6-bf2e-11e6-9bc0-6bdea25befb2.png)
-<br/>***Figure 1**: The system overview section at Netdata, just a few seconds after the command was run*
-
-And in the Applications section, `apps.plugin` breaks down CPU usage per application:
-
-![image](https://cloud.githubusercontent.com/assets/2662304/21076220/c9687848-bf2e-11e6-8d81-348592c5aca2.png)
-<br/>***Figure 2**: The Applications section at Netdata, just a few seconds after the command was run*
-
-So, the `ssh` session is using 95% CPU time.
-
-Why `ssh`?
-
-`apps.plugin` groups all processes based on its configuration file.
-The default configuration has no entry for `bash`, but it has one for `sshd`, so Netdata accumulates
-all ssh sessions to a dimension on the charts, called `ssh`. This includes all the processes in
-the process tree of `sshd`, **including the exited children**.
-
-> Distributions based on `systemd` provide another way to get CPU utilization per user session
-> or running service: control groups, or cgroups, commonly used as part of containers.
-> `apps.plugin` does not use these mechanisms. The process grouping made by `apps.plugin` works
-> on any Linux, `systemd` based or not.
-
-#### A more technical description of how Netdata works
-
-Netdata reads `/proc/<pid>/stat` for all processes, once per second and extracts `utime` and
-`stime` (user and system cpu utilization), much like all the console tools do.
-
-But it also extracts `cutime` and `cstime`, which account for the user and system time of the exited children of each process.
-By keeping a map in memory of the whole process tree, it is capable of assigning the right time to every process, taking
-into account all its exited children.
-
-It is tricky, since a process may be running for 1 hour and once it exits, its parent should not
-receive the whole 1 hour of cpu time in just 1 second - you have to subtract the cpu time that has
-been reported for it prior to this iteration.
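-
-As a worked illustration with made-up numbers: suppose a child has used 3600 seconds of CPU time over its lifetime, and 3599 of those seconds were already reported in earlier iterations. When the child exits, the time to attribute to its parent in the current iteration is only the part not yet reported:
-
-$$
-3600\,\text{s} - 3599\,\text{s} = 1\,\text{s}
-$$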
-
-It is even trickier, because walking through the entire process tree takes some time itself. So,
-if you sum the CPU utilization of all processes, you might have more CPU time than the reported
-total CPU time of the system. Netdata solves this by adapting the per-process CPU utilization to
-the total of the system. [Netdata adds charts that document this normalization](https://london.my-netdata.io/default.html#menu_netdata_submenu_apps_plugin).
-
-