Diffstat
-rw-r--r-- collectors/apps.plugin/README.md | 80
1 files changed, 47 insertions, 33 deletions
diff --git a/collectors/apps.plugin/README.md b/collectors/apps.plugin/README.md
index 1b682bc65..d10af1cdd 100644
--- a/collectors/apps.plugin/README.md
+++ b/collectors/apps.plugin/README.md
@@ -1,3 +1,9 @@
+<!--
+title: "apps.plugin"
+sidebar_label: "Application monitoring (apps.plugin)"
+custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/apps.plugin/README.md
+-->
+
 # apps.plugin
 
 `apps.plugin` breaks down system resource usage to **processes**, **users** and **user groups**.
@@ -7,7 +13,7 @@ for every process found running.
 
 Since Netdata needs to present this information in charts and track them through time,
 instead of presenting a `top` like list, `apps.plugin` uses a pre-defined list of **process groups**
-to which it assigns all running processes. This list is [customizable](apps_groups.conf) and Netdata
+to which it assigns all running processes. This list is customizable via `apps_groups.conf`, and Netdata
 ships with a good default for most cases (to edit it on your system run `/etc/netdata/edit-config apps_groups.conf`).
 
 So, `apps.plugin` builds a process tree (much like `ps fax` does in Linux), and groups
@@ -15,7 +21,7 @@ processes together (evaluating both child and parent processes) so that the resu
 a predefined set of members (of course, only process groups found running are reported).
 
 > If you find that `apps.plugin` categorizes standard applications as `other`, we would be
-> glad to accept pull requests improving the [defaults](apps_groups.conf) shipped with Netdata.
+> glad to accept pull requests improving the defaults shipped with Netdata in `apps_groups.conf`.
 
 Unlike traditional process monitoring tools (like `top`), `apps.plugin` is able to account the resource
 utilization of exit processes. Their utilization is accounted at their currently running parents.
@@ -32,35 +38,38 @@ that fork/spawn other short lived processes hundreds of times per second.
 
 Each of these sections provides the same number of charts:
 
-- CPU Utilization
+- CPU utilization (`apps.cpu`)
   - Total CPU usage
-  - User / System CPU usage
+  - User/system CPU usage (`apps.cpu_user`/`apps.cpu_system`)
 - Disk I/O
-  - Physical Reads / Writes
-  - Logical Reads / Writes
-  - Open Unique Files (if a file is found open multiple times, it is counted just once)
+  - Physical reads/writes (`apps.preads`/`apps.pwrites`)
+  - Logical reads/writes (`apps.lreads`/`apps.lwrites`)
+  - Open unique files (if a file is found open multiple times, it is counted just once, `apps.files`)
 - Memory
-  - Real Memory Used (non shared)
-  - Virtual Memory Allocated
-  - Minor Page Faults (i.e. memory activity)
+  - Real Memory Used (non-shared, `apps.mem`)
+  - Virtual Memory Allocated (`apps.vmem`)
+  - Minor page faults (i.e. memory activity, `apps.minor_faults`)
 - Processes
-  - Threads Running
-  - Processes Running
-  - Pipes Open
-  - Carried Over Uptime (since the Netdata restart)
-  - Minimum Uptime
-  - Average Uptime
-  - Maximum Uptime
-
-- Swap Memory
-  - Swap Memory Used
-  - Major Page Faults (i.e. swap activity)
+  - Threads running (`apps.threads`)
+  - Processes running (`apps.processes`)
+  - Carried over uptime (since the last Netdata Agent restart, `apps.uptime`)
+  - Minimum uptime (`apps.uptime_min`)
+  - Average uptime (`apps.uptime_average`)
+  - Maximum uptime (`apps.uptime_max`)
+  - Pipes open (`apps.pipes`)
+- Swap memory
+  - Swap memory used (`apps.swap`)
+  - Major page faults (i.e. swap activity, `apps.major_faults`)
 - Network
-  - Sockets Open
+  - Sockets open (`apps.sockets`)
+
+In addition, if the [eBPF collector](/collectors/ebpf.plugin/README.md) is running, your dashboard will also show an
+additional [list of charts](/collectors/ebpf.plugin/README.md#integration-with-appsplugin) using low-level Linux
+metrics.
 
 The above are reported:
 
-- For **Applications** per [target configured](apps_groups.conf).
+- For **Applications** per target configured.
 - For **Users** per username or UID (when the username is not available).
 - For **User Groups** per groupname or GID (when groupname is not available).
 
@@ -90,8 +99,7 @@ its CPU resources will be cut in half, and data collection will be once every 2
 
 ## Configuration
 
-The configuration file is `/etc/netdata/apps_groups.conf` (the default is [here](apps_groups.conf)).
-To edit it on your system run `/etc/netdata/edit-config apps_groups.conf`.
+The configuration file is `/etc/netdata/apps_groups.conf`. To edit it on your system, run `/etc/netdata/edit-config apps_groups.conf`.
 
 The configuration file works accepts multiple lines, each having this format:
 
@@ -149,6 +157,15 @@ There are a few command line options you can pass to `apps.plugin`. The list of
   command options = without-users without-groups
 ```
 
+### Integration with eBPF
+
+If you don't see charts under the **eBPF syscall** or **eBPF net** sections, you should edit your
+[`ebpf.conf`](/collectors/ebpf.plugin/README.md#ebpf-programs) file to ensure the eBPF program is enabled.
+
+Also see our [guide on troubleshooting apps with eBPF
+metrics](/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md) for ideas on how to interpret these charts in a
+few scenarios.
+
 ## Permissions
 
 `apps.plugin` requires additional privileges to collect all the information it needs.
@@ -217,7 +234,7 @@ Examples below for process group `sql`:
 - Open Pipes ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.pipes&dimensions=sql&value_color=green=0%7Cred)
 - Open Sockets ![image](https://registry.my-netdata.io/api/v1/badge.svg?chart=apps.sockets&dimensions=sql&value_color=green%3E=3%7Cred)
 
-For more information about badges check [Generating Badges](../../web/api/badges)
+For more information about badges check [Generating Badges](/web/api/badges/README.md)
 
 ## Comparison with console tools
 
@@ -351,9 +368,7 @@ So, the `ssh` session is using 95% CPU time.
 
 Why `ssh`?
 
-`apps.plugin` groups all processes based on its configuration file
-[`/etc/netdata/apps_groups.conf`](apps_groups.conf)
-(to edit it on your system run `/etc/netdata/edit-config apps_groups.conf`).
+`apps.plugin` groups all processes based on its configuration file.
 The default configuration has nothing for `bash`, but it has for `sshd`, so Netdata accumulates
 all ssh sessions to a dimension on the charts, called `ssh`. This includes all the processes in
 the process tree of `sshd`, **including the exited children**.
@@ -368,10 +383,9 @@ the process tree of `sshd`, **including the exited children**.
 Netdata reads `/proc/<pid>/stat` for all processes, once per second and extracts `utime` and
 `stime` (user and system cpu utilization), much like all the console tools do.
 
-But it [also extracts `cutime` and `cstime`](https://github.com/netdata/netdata/blob/62596cc6b906b1564657510ca9135c08f6d4cdda/src/apps_plugin.c#L636-L642)
-that account the user and system time of the exit children of each process. By keeping a map in
-memory of the whole process tree, it is capable of assigning the right time to every process,
-taking into account all its exited children.
+But it also extracts `cutime` and `cstime` that account the user and system time of the exit children of each process.
+By keeping a map in memory of the whole process tree, it is capable of assigning the right time to every process, taking
+into account all its exited children.
 
 It is tricky, since a process may be running for 1 hour and once it exits, its parent should not
 receive the whole 1 hour of cpu time in just 1 second - you have to subtract the cpu time that has