Diffstat (limited to 'health/README.md')
-rw-r--r-- health/README.md 53
1 files changed, 22 insertions, 31 deletions
diff --git a/health/README.md b/health/README.md
index 597bd3c3..5d68d752 100644
--- a/health/README.md
+++ b/health/README.md
@@ -1,4 +1,3 @@
-
# Health monitoring
Each netdata node runs an independent thread evaluating health monitoring checks.
@@ -40,16 +39,16 @@ killall -USR2 netdata
There are 2 entities:
-1. **alarms**, which are attached to specific charts, and
+1. **alarms**, which are attached to specific charts, and
-2. **templates**, which define rules that should be applied to all charts having a
+1. **templates**, which define rules that should be applied to all charts having a
specific `context`. You can use this feature to apply **alarms** to all disks,
all network interfaces, all mysql databases, all nginx web servers, etc.
Both of these entities have exactly the same format and feature set.
The only difference is the label `alarm` or `template`.
-netdata supports overriding **templates** with **alarms**.
+Netdata supports overriding **templates** with **alarms**.
For example, when a template is defined for a set of charts, an alarm with exactly the
same name attached to the same chart the template matches, will have higher precedence
(i.e. netdata will use the alarm on this chart and prevent the template from being applied
@@ -59,7 +58,7 @@ to it).
The following lines are parsed.
-#### alarm line `alarm` or `template`
+#### Alarm line `alarm` or `template`
This line starts an alarm or alarm template.
@@ -78,7 +77,7 @@ This line has to be first on each alarm or template.
---
-#### alarm line `on`
+#### Alarm line `on`
This line defines the data the alarm should be attached to.
@@ -112,7 +111,7 @@ So, `plugin = proc`, `module = /proc/net/dev` and `context = net.net`.
---
-#### alarm line `os`
+#### Alarm line `os`
This alarm or template will be used only if the O/S of the host loading it, matches this
pattern list. The value is a space separated list of simple patterns (use `*` as wildcard,
@@ -124,7 +123,7 @@ os: linux freebsd macos
---
-#### alarm line `hosts`
+#### Alarm line `hosts`
This alarm or template will be used only if the hostname of the host loading it, matches
this pattern list. The value is a space separated list of simple patterns (use `*` as wildcard,
@@ -141,7 +140,7 @@ This is useful when you centralize metrics from multiple hosts, to one netdata.
---
-#### alarm line `families`
+#### Alarm line `families`
This line is only used in alarm templates. It filters the charts. So, if you need to create
an alarm template for a few of a kind of chart (a few of your disks, or a few of your network
@@ -165,7 +164,7 @@ The family of a chart is usually the submenu of the netdata dashboard it appears
---
-#### alarm line `lookup`
+#### Alarm line `lookup`
This lines makes a database lookup to find a value. This result of this lookup is available as `$this`.
@@ -205,7 +204,7 @@ The timestamps of the timeframe evaluated by the database lookup is available as
---
-#### alarm line `calc`
+#### Alarm line `calc`
This expression is evaluated just after the `lookup` (if any). Its purpose is to apply some
calculation before using the value looked up from the db.
@@ -225,7 +224,7 @@ Check [Expressions](#expressions) for more information.
---
-#### alarm line `every`
+#### Alarm line `every`
Sets the update frequency of this alarm. This is the same to the `every DURATION` given
in the `lookup` lines.
@@ -240,7 +239,7 @@ every: DURATION
---
-#### alarm lines `green` and `red`
+#### Alarm lines `green` and `red`
Set the green and red thresholds of a chart. Both are available as `$green` and `$red` in
expressions. If multiple alarms define different thresholds, the ones defined by the first
@@ -257,7 +256,7 @@ red: NUMBER
---
-#### alarm lines `warn` and `crit`
+#### Alarm lines `warn` and `crit`
These expressions should evaluate to true or false (alternatively non-zero or zero).
They trigger the alarm. Both are optional.
@@ -272,7 +271,7 @@ Check [Expressions](#expressions) for more information.
---
-#### alarm line `to`
+#### Alarm line `to`
This will be the first parameter of the script to be executed when the alarm switches status.
Its meaning is left up to the `exec` script.
@@ -288,7 +287,7 @@ to: ROLE1 ROLE2 ROLE3 ...
---
-#### alarm line `exec`
+#### Alarm line `exec`
The script that will be executed when the alarm changes status.
@@ -303,7 +302,7 @@ methods netdata supports, including custom hooks.
---
-#### alarm line `delay`
+#### Alarm line `delay`
This is used to provide optional hysteresis settings for the notifications, to defend
against notification floods. These settings do not affect the actual alarm - only the time
@@ -374,13 +373,9 @@ Expressions can have variables. Variables start with `$`. Check below for more i
There are two special values you can use:
- - `nan`, for example `$this != nan` will check if the variable `this` is available.
- A variable can be `nan` if the database lookup failed. All calculations (i.e. addition,
- multiplication, etc) with a `nan` result in a `nan`.
+- `nan`, for example `$this != nan` will check if the variable `this` is available. A variable can be `nan` if the database lookup failed. All calculations (i.e. addition, multiplication, etc) with a `nan` result in a `nan`.
- - `inf`, for example `$this != inf` will check if `this` is not infinite. A value or
- variable can be infinite if divided by zero. All calculations (i.e. addition,
- multiplication, etc) with a `inf` result in a `inf`.
+- `inf`, for example `$this != inf` will check if `this` is not infinite. A value or variable can be infinite if divided by zero. All calculations (i.e. addition, multiplication, etc) with a `inf` result in a `inf`.
---
@@ -412,10 +407,10 @@ Which in turn, results in the following behavior:
* While the value is falling, it will return to a warning state when it goes below 85,
and a normal state when it goes below 75.
-
+
* If the value is constantly varying between 80 and 90, then it will trigger a warning the
first time it goes above 85, but will remain a warning until it goes below 75 (or goes above 85).
-
+
* If the value is constantly varying between 90 and 100, then it will trigger a critical alert
the first time it goes above 95, but will remain a critical alert goes below 85 (at which
point it will return to being a warning).
@@ -490,8 +485,7 @@ The external script will be called for all status changes.
## Examples
-
-Check the **[health.d directory](health.d)** for all alarms shipped with netdata.
+Check the `health/health.d/` directory for all alarms shipped with netdata.
Here are a few examples:
@@ -650,8 +644,5 @@ Important: this will generate a lot of output in debug.log.
You can find the context of charts by looking up the chart in either
`http://your.netdata:19999/netdata.conf` or `http://your.netdata:19999/api/v1/charts`.
-You can find how netdata interpreted the expressions by examining the alarm at
-`http://your.netdata:19999/api/v1/alarms?all`. For each expression, netdata will return the
-expression as given in its config file, and the same expression with additional parentheses
-added to indicate the evaluation flow of the expression.
+You can find how netdata interpreted the expressions by examining the alarm at `http://your.netdata:19999/api/v1/alarms?all`. For each expression, netdata will return the expression as given in its config file, and the same expression with additional parentheses added to indicate the evaluation flow of the expression.
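
The hysteresis behavior listed in the `@@ -412,10 +407,10 @@` hunk can be illustrated with a small simulation. This is only a sketch, not netdata code: the `Status` enum, its numeric values, and the function names are hypothetical, and the thresholds (raise a warning above 85 and clear it below 75, raise a critical above 95 and drop back to warning below 85) are taken from the bullets in that hunk.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical status values for this sketch; netdata's own constants may differ.
    CLEAR = 0
    WARNING = 1
    CRITICAL = 2


def next_status(value: float, status: Status) -> Status:
    """Return the new status for `value`, given the current `status`."""
    # Once the alarm is already raised, a lower threshold keeps it raised.
    warn_threshold = 75 if status >= Status.WARNING else 85
    crit_threshold = 85 if status == Status.CRITICAL else 95
    if value > crit_threshold:
        return Status.CRITICAL
    if value > warn_threshold:
        return Status.WARNING
    return Status.CLEAR


def run(values, status=Status.CLEAR):
    """Feed a sequence of values through next_status and record each status."""
    history = []
    for value in values:
        status = next_status(value, status)
        history.append(status)
    return history


if __name__ == "__main__":
    # Rises above 85, then falls: the warning holds until the value drops below 75.
    print([s.name for s in run([80, 86, 80, 76, 74])])
    # Rises above 95, then falls: critical holds until the value drops below 85.
    print([s.name for s in run([96, 90, 86, 84])])
```

Because the threshold depends on the current status, a value that oscillates around 85 changes the status only once instead of on every crossing, which is the notification-flood protection the README describes.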