diff options
Diffstat (limited to 'collectors/proc.plugin/README.md')
-rw-r--r-- | collectors/proc.plugin/README.md | 121 |
1 files changed, 114 insertions, 7 deletions
diff --git a/collectors/proc.plugin/README.md b/collectors/proc.plugin/README.md index 7e2aa1096..085afb4fb 100644 --- a/collectors/proc.plugin/README.md +++ b/collectors/proc.plugin/README.md @@ -1,3 +1,8 @@ +<!-- +title: "proc.plugin" +custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/README.md +--> + # proc.plugin - `/proc/net/dev` (all network interfaces for all their values) @@ -6,6 +11,7 @@ - `/proc/net/snmp` (total IPv4, TCP and UDP usage) - `/proc/net/snmp6` (total IPv6 usage) - `/proc/net/netstat` (more IPv4 usage) +- `/proc/net/wireless` (wireless extension) - `/proc/net/stat/nf_conntrack` (connection tracking performance) - `/proc/net/stat/synproxy` (synproxy performance) - `/proc/net/ip_vs/stats` (IPVS connection statistics) @@ -18,8 +24,10 @@ - `/proc/interrupts` (total and per core hardware interrupts) - `/proc/softirqs` (total and per core software interrupts) - `/proc/loadavg` (system load and total processes running) +- `/proc/pressure/{cpu,memory,io}` (pressure stall information) - `/proc/sys/kernel/random/entropy_avail` (random numbers pool availability - used in cryptography) - `/sys/class/power_supply` (power supply properties) +- `/sys/class/infiniband` (infiniband interconnect) - `ipc` (IPC semaphores and message queues) - `ksm` Kernel Same-Page Merging performance (several files under `/sys/kernel/mm/ksm`). - `netdata` (internal Netdata resources utilization) @@ -78,8 +86,8 @@ By default, Netdata will enable monitoring metrics only when they are not zero. Netdata categorizes all block devices in 3 categories: -1. physical disks (i.e. block devices that does not have slaves and are not partitions) -2. virtual disks (i.e. block devices that have slaves - like RAID devices) +1. physical disks (i.e. block devices that do not have child devices and are not partitions) +2. virtual disks (i.e. block devices that have child devices - like RAID devices) 3. disk partitions (i.e. block devices that are part of a physical disk) Performance metrics are enabled by default for all disk devices, except partitions and not-mounted virtual disks. Of course, you can enable/disable monitoring any block device by editing the Netdata configuration file. @@ -226,13 +234,31 @@ So, to disable performance metrics for all loop devices you could add `performan ## Monitoring CPUs -The `/proc/stat` module monitors CPU utilization, interrupts, context switches, processes started/running, thermal throttling, frequency, and idle states. It gathers this information from multiple files. +The `/proc/stat` module monitors CPU utilization, interrupts, context switches, processes started/running, thermal +throttling, frequency, and idle states. It gathers this information from multiple files. -If more than 50 cores are present in a system then CPU thermal throttling, frequency, and idle state charts are disabled. +If your system has more than 50 processors (`physical processors * cores per processor * threads per core`), the Agent +automatically disables CPU thermal throttling, frequency, and idle state charts. To override this default, see the next +section on configuration. -#### configuration +### Configuration + +The settings for monitoring CPUs is in the `[plugin:proc:/proc/stat]` of your `netdata.conf` file. -`keep per core files open` option in the `[plugin:proc:/proc/stat]` configuration section allows reducing the number of file operations on multiple files. +The `keep per core files open` option lets you reduce the number of file operations on multiple files. + +If your system has more than 50 processors and you would like to see the CPU thermal throttling, frequency, and idle +state charts that are automatically disabled, you can set the following boolean options in the +`[plugin:proc:/proc/stat]` section. + +```conf + keep per core files open = yes + keep cpuidle files open = yes + core_throttle_count = yes + package_throttle_count = yes + cpu frequency = yes + cpu idle states = yes +``` ### CPU frequency @@ -295,11 +321,50 @@ each state. By default Netdata will enable monitoring metrics only when they are not zero. If they are constantly zero they are ignored. Metrics that will start having values, after Netdata is started, will be detected and charts will be automatically added to the dashboard (a refresh of the dashboard is needed for them to appear though). +### Monitoring wireless network interfaces + +The settings for monitoring wireless is in the `[plugin:proc:/proc/net/wireless]` section of your `netdata.conf` file. + +```conf + status for all interfaces = yes + quality for all interfaces = yes + discarded packets for all interfaces = yes + missed beacon for all interface = yes +``` + +You can set the following values for each configuration option: + +- `auto` = enable monitoring if the collected values are not zero +- `yes` = enable monitoring +- `no` = disable monitoring + +#### Monitored wireless interface metrics + +- **Status** + The current state of the interface. This is a device-dependent option. + +- **Link** + Overall quality of the link. + +- **Level** + Received signal strength (RSSI), which indicates how strong the received signal is. + +- **Noise** + Background noise level. + +- **Discarded packets** + Discarded packets for: Number of packets received with a different NWID or ESSID (`nwid`), unable to decrypt (`crypt`), hardware was not able to properly re-assemble the link layer fragments (`frag`), packets failed to deliver (`retry`), and packets lost in relation with specific wireless operations (`misc`). + +- **Missed beacon** + Number of periodic beacons from the cell or the access point the interface has missed. + +#### Wireless configuration + #### alarms There are several alarms defined in `health.d/net.conf`. -The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alarms can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a slave or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alarm with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the [families](../../health/#alarm-line-families) line in the alarm configuration. For example, if you want to disable the `inbound packets dropped` alarm for `eth0`, set `families: !eth0 *` in the alarm definition for `template: inbound_packets_dropped`. +The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alarms can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a child or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alarm with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the [families](/health/REFERENCE.md#alarm-line-families) line in the alarm configuration. For example, if you want to disable the `inbound packets dropped` alarm for `eth0`, set `families: !eth0 *` in the alarm definition for `template: inbound_packets_dropped`. #### configuration @@ -436,6 +501,48 @@ and metrics: the corresponding `min` or `empty`, which will then always read as zero. This way, alerts which match on these will still work. +## Infiniband interconnect + +This module monitors every active Infiniband port. It provides generic counters statistics, and per-vendor hw-counters (if vendor is supported). + +### Monitored interface metrics + +Each port will have its counters metrics monitored, grouped in the following charts: + +- **Bandwidth usage** + Sent/Received data, in KB/s + +- **Packets Statistics** + Sent/Received packets, in 3 categories: total, unicast and multicast. + +- **Errors Statistics** + Many errors counters are provided, presenting statistics for: + - Packets: malformated, sent/received discarded by card/switch, missing ressource + - Link: downed, recovered, integrity error, minor error + - Other events: Tick Wait to send, buffer overrun + +If your vendor is supported, you'll also get HW-Counters statistics. These being vendor specific, please refer to their documentation. + +- Mellanox: [see statistics documentation](https://community.mellanox.com/s/article/understanding-mlx5-linux-counters-and-status-parameters) + +### configuration + +Default configuration will monitor only enabled infiniband ports, and refresh newly activated or created ports every 30 seconds + +``` +[plugin:proc:/sys/class/infiniband] + # dirname to monitor = /sys/class/infiniband + # bandwidth counters = yes + # packets counters = yes + # errors counters = yes + # hardware packets counters = auto + # hardware errors counters = auto + # monitor only ports being active = auto + # disable by default interfaces matching = + # refresh ports state every seconds = 30 +``` + + ## IPC ### Monitored IPC metrics |