summaryrefslogtreecommitdiffstats
path: root/collectors/proc.plugin/README.md
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--collectors/proc.plugin/README.md106
1 files changed, 94 insertions, 12 deletions
diff --git a/collectors/proc.plugin/README.md b/collectors/proc.plugin/README.md
index de62aeca7..b505bced2 100644
--- a/collectors/proc.plugin/README.md
+++ b/collectors/proc.plugin/README.md
@@ -20,6 +20,7 @@
- `/proc/loadavg` (system load and total processes running)
- `/proc/sys/kernel/random/entropy_avail` (random numbers pool availability - used in cryptography)
- `/sys/class/power_supply` (power supply properties)
+ - `ipc` (IPC semaphores and message queues)
- `ksm` Kernel Same-Page Merging performance (several files under `/sys/kernel/mm/ksm`).
- `netdata` (internal netdata resources utilization)
@@ -36,33 +37,33 @@ Hopefully, the Linux kernel provides many metrics that can provide deep insights
### Monitored disk metrics
-- I/O bandwidth/s (kb/s)
+- **I/O bandwidth/s (kb/s)**
The amount of data transferred from and to the disk.
-- I/O operations/s
+- **I/O operations/s**
The number of I/O operations completed.
-- Queued I/O operations
+- **Queued I/O operations**
The number of currently queued I/O operations. For traditional disks that execute commands one after another, one of them is being run by the disk and the rest are just waiting in a queue.
-- Backlog size (time in ms)
+- **Backlog size (time in ms)**
The expected duration of the currently queued I/O operations.
-- Utilization (time percentage)
+- **Utilization (time percentage)**
The percentage of time the disk was busy with something. This is a very interesting metric, since for most disks, that execute commands sequentially, **this is the key indication of congestion**. A sequential disk that is 100% of the available time busy, has no time to do anything more, so even if the bandwidth or the number of operations executed by the disk is low, its capacity has been reached.
Of course, for newer disk technologies (like fusion cards) that are capable to execute multiple commands in parallel, this metric is just meaningless.
-- Average I/O operation time (ms)
+- **Average I/O operation time (ms)**
The average time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
-- Average I/O operation size (kb)
+- **Average I/O operation size (kb)**
The average amount of data of the completed I/O operations.
-- Average Service Time (ms)
+- **Average Service Time (ms)**
The average service time for completed I/O operations. This metric is calculated using the total busy time of the disk and the number of completed operations. If the disk is able to execute multiple parallel operations the reporting average service time will be misleading.
-- Merged I/O operations/s
+- **Merged I/O operations/s**
The Linux kernel is capable of merging I/O operations. So, if two requests to read data from the disk are adjacent, the Linux kernel may merge them to one before giving them to disk. This metric measures the number of operations that have been merged by the Linux kernel.
-- Total I/O time
+- **Total I/O time**
The sum of the duration of all completed I/O operations. This number can exceed the interval if the disk is able to execute multiple I/O operations in parallel.
-- Space usage
+- **Space usage**
For mounted disks, netdata will provide a chart for their space, with 3 dimensions:
1. free
2. used
3. reserved for root
-- inode usage
+- **inode usage**
For mounted disks, netdata will provide a chart for their inodes (number of file and directories), with 3 dimensions:
1. free
2. used
@@ -249,6 +250,73 @@ each state.
`schedstat filename to monitor`, `cpuidle name filename to monitor`, and `cpuidle time filename to monitor` in the `[plugin:proc:/proc/stat]` configuration section
+## Monitoring Network Interfaces
+
+### Monitored network interface metrics
+
+- **Physical Network Interfaces Aggregated Bandwidth (kilobits/s)**
+ The amount of data received and sent through all physical interfaces in the system. This is the source of data for the Net Inbound and Net Outbound dials in the System Overview section.
+
+- **Bandwidth (kilobits/s)**
+ The amount of data received and sent through the interface.
+- **Packets (packets/s)**
+ The number of packets received, packets sent, and multicast packets transmitted through the interface.
+
+- **Interface Errors (errors/s)**
+ The number of errors for the inbound and outbound traffic on the interface.
+- **Interface Drops (drops/s)**
+ The number of packets dropped for the inbound and outbound traffic on the interface.
+- **Interface FIFO Buffer Errors (errors/s)**
+ The number of FIFO buffer errors encountered while receiving and transmitting data through the interface.
+- **Compressed Packets (packets/s)**
+ The number of compressed packets transmitted or received by the device driver.
+- **Network Interface Events (events/s)**
+ The number of packet framing errors, collisions detected on the interface, and carrier losses detected by the device driver.
+
+By default netdata will enable monitoring metrics only when they are not zero. If they are constantly zero they are ignored. Metrics that will start having values, after netdata is started, will be detected and charts will be automatically added to the dashboard (a refresh of the dashboard is needed for them to appear though).
+
+#### alarms
+
+There are several alarms defined in `health.d/net.conf`.
+
+The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alarms can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a slave or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alarm with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the [families](../../health/#alarm-line-families) line in the alarm configuration. For example, if you want to disable the `inbound packets dropped` alarm for `eth0`, set `families: !eth0 *` in the alarm definition for `template: inbound_packets_dropped`.
+
+#### configuration
+
+Module configuration:
+
+```
+[plugin:proc:/proc/net/dev]
+ # filename to monitor = /proc/net/dev
+ # path to get virtual interfaces = /sys/devices/virtual/net/%s
+ # path to get net device speed = /sys/class/net/%s/speed
+ # enable new interfaces detected at runtime = auto
+ # bandwidth for all interfaces = auto
+ # packets for all interfaces = auto
+ # errors for all interfaces = auto
+ # drops for all interfaces = auto
+ # fifo for all interfaces = auto
+ # compressed packets for all interfaces = auto
+ # frames, collisions, carrier counters for all interfaces = auto
+ # disable by default interfaces matching = lo fireqos* *-ifb
+ # refresh interface speed every seconds = 10
+```
+
+Per interface configuration:
+
+```
+[plugin:proc:/proc/net/dev:enp0s3]
+ # enabled = yes
+ # virtual = no
+ # bandwidth = auto
+ # packets = auto
+ # errors = auto
+ # drops = auto
+ # fifo = auto
+ # compressed = auto
+ # events = auto
+```
+
## Linux Anti-DDoS
![image6](https://cloud.githubusercontent.com/assets/2662304/14253733/53550b16-fa95-11e5-8d9d-4ed171df4735.gif)
@@ -343,4 +411,18 @@ corresponding `min` or `empty` attribute, then Netdata will still provide
the corresponding `min` or `empty`, which will then always read as zero.
This way, alerts which match on these will still work.
+## IPC
+
+This module monitors the number of semaphores, semaphore arrays, number of messages in message queues, and amount of memory used by message queues. As far as the message queue charts are dynamic, sane limits are applied for the number of dimensions per chart (the limit is configurable).
+
+#### configuration
+
+```
+[plugin:proc:ipc]
+ # semaphore totals = yes
+ # message queues = yes
+ # msg filename to monitor = /proc/sysvipc/msg
+ # max dimensions in memory allowed = 50
+```
+
[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fcollectors%2Fproc.plugin%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)]()