Diffstat
-rw-r--r-- src/health/REFERENCE.md | 48
1 file changed, 24 insertions, 24 deletions
diff --git a/src/health/REFERENCE.md b/src/health/REFERENCE.md
index 85f1d2281..8b0a9177e 100644
--- a/src/health/REFERENCE.md
+++ b/src/health/REFERENCE.md
@@ -4,13 +4,13 @@ Netdata's health watchdog is highly configurable, with support for dynamic thres
more. You can tweak any of the existing alerts based on your infrastructure's topology or specific monitoring needs, or
create new entities.
-You can use health alerts in conjunction with any of Netdata's [collectors](https://github.com/netdata/netdata/blob/master/src/collectors/README.md) (see
-the [supported collector list](https://github.com/netdata/netdata/blob/master/src/collectors/COLLECTORS.md)) to monitor the health of your systems, containers, and
+You can use health alerts in conjunction with any of Netdata's [collectors](/src/collectors/README.md) (see
+the [supported collector list](/src/collectors/COLLECTORS.md)) to monitor the health of your systems, containers, and
applications in real time.
While you can see active alerts both on the local dashboard and Netdata Cloud, all health alerts are configured _per
node_ via individual Netdata Agents. If you want to deploy a new alert across your
-[infrastructure](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md), you must configure each node with the same health configuration
+[infrastructure](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md), you must configure each node with the same health configuration
files.
## Reload health configuration
@@ -33,14 +33,14 @@ You can configure the Agent's health watchdog service by editing files in two lo
- The `[health]` section in `netdata.conf`. By editing the daemon's behavior, you can disable health monitoring
altogether, run health checks more or less often, and more. See
- [daemon configuration](https://github.com/netdata/netdata/blob/master/src/daemon/config/README.md#health-section-options) for a table of
+ [daemon configuration](/src/daemon/config/README.md#health-section-options) for a table of
all the available settings, their default values, and what they control.
- The individual `.conf` files in `health.d/`. These health entity files are organized by the type of metric they are
performing calculations on or their associated collector. You should edit these files using the `edit-config`
script. For example: `sudo ./edit-config health.d/cpu.conf`.
-Navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) and
+Navigate to your [Netdata config directory](/docs/netdata-agent/configuration/README.md) and
use `edit-config` to make changes to any of these files.
### Edit individual alerts
@@ -84,7 +84,7 @@ Save the file and [reload Netdata's health configuration](#reload-health-configu
## Disable or silence alerts
Alerts and notifications can be disabled permanently via configuration changes, or temporarily, via the
-[health management API](https://github.com/netdata/netdata/blob/master/src/web/api/health/README.md). The
+[health management API](/src/web/api/health/README.md). The
available options are described below.
### Disable all alerts
@@ -114,7 +114,7 @@ This action requires that you [reload Netdata's health configuration](#reload-he
When you need to frequently disable all or some alerts from triggering during certain times (for instance
when running backups) you can use the
-[health management API](https://github.com/netdata/netdata/blob/master/src/web/api/health/README.md).
+[health management API](/src/web/api/health/README.md).
The API allows you to issue commands to control the health engine's behavior without changing configuration,
or restarting the agent.
@@ -122,7 +122,7 @@ or restarting the agent.
If you want health checks to keep running and alerts to keep getting triggered, but notifications to be
suppressed temporarily, you can use the
-[health management API](https://github.com/netdata/netdata/blob/master/src/web/api/health/README.md).
+[health management API](/src/web/api/health/README.md).
The API allows you to issue commands to control the health engine's behavior without changing configuration,
or restarting the agent.
@@ -134,7 +134,7 @@ your systems, containers, and applications work.
Read the [health entity reference](#health-entity-reference) for a full listing of the format,
syntax, and functionality of health entities.
-To write a new health entity into a new file, navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md),
+To write a new health entity into a new file, navigate to your [Netdata config directory](/docs/netdata-agent/configuration/README.md),
then use `touch` to create a new file in the `health.d/` directory. Use `edit-config` to start editing the file.
As an example, let's create a `ram-usage.conf` file.
@@ -223,7 +223,7 @@ Netdata parses the following lines. Beneath the table is an in-depth explanation
- The `every` line is **required** if not using `lookup`.
- Each entity **must** have at least one of the following lines: `lookup`, `calc`, `warn`, or `crit`.
- A few lines use space-separated lists to define how the entity behaves. You can use `*` as a wildcard or prefix with
- `!` for a negative match. Order is important, too! See our [simple patterns docs](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md) for
+ `!` for a negative match. Order is important, too! See our [simple patterns docs](/src/libnetdata/simple_pattern/README.md) for
more examples.
- Lines terminated by a `\` are spliced together with the next line. The backslash is removed and the following line is
joined with the current one. No space is inserted, so you may split a line anywhere, even in the middle of a word.
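
The splicing rule described above can be illustrated with a minimal Python sketch. It is not Netdata's parser: the function name `splice_continuations` is hypothetical, and the handling of whitespace around the backslash is an assumption made for illustration.

```python
def splice_continuations(text: str) -> list[str]:
    """Join lines that end with a backslash to the line that follows.

    The backslash is dropped and no space is inserted, so a line can be
    split anywhere, even in the middle of a word.
    """
    spliced = []
    pending = ""  # text accumulated from lines that ended with a backslash
    for line in text.splitlines():
        if line.endswith("\\"):
            pending += line[:-1]  # drop the trailing backslash, keep joining
        else:
            spliced.append(pending + line)
            pending = ""
    if pending:
        # The last line ended with a backslash and nothing followed it.
        spliced.append(pending)
    return spliced
```

Because no space is added at the join, any space needed between words has to appear before the backslash or at the start of the next line.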
@@ -388,9 +388,9 @@ The format is:
lookup: METHOD(GROUPING OPTIONS) AFTER [at BEFORE] [every DURATION] [OPTIONS] [of DIMENSIONS]
```
-The full [database query API](https://github.com/netdata/netdata/blob/master/src/web/api/queries/README.md) is supported. In short:
+The full [database query API](/src/web/api/queries/README.md) is supported. In short:
-- `METHOD` is one of the available [grouping methods](https://github.com/netdata/netdata/blob/master/src/web/api/queries/README.md#grouping-methods) such as `average`, `min`, `max` etc.
+- `METHOD` is one of the available [grouping methods](/src/web/api/queries/README.md#grouping-methods) such as `average`, `min`, `max` etc.
This is required.
- `GROUPING OPTIONS` are optional and can have the form `CONDITION VALUE`, where `CONDITION` is `!=`, `=`, `<=`, `<`, `>`, `>=` and `VALUE` is a number. The `CONDITION` and `VALUE` are required for `countif`, while `VALUE` is used by `percentile`, `trimmed_mean` and `trimmed_median`.
@@ -416,7 +416,7 @@ The full [database query API](https://github.com/netdata/netdata/blob/master/src
- `average` after time-aggregation of each dimension, return the average of all dimensions.
- `sum` after time-aggregation of each dimension, return the sum of all dimensions (this is the default).
- `min2max` after time-aggregation of each dimension, return the delta between the min and the max of the dimensions.
- - `unligned` prevents shifting the query window to multiples of the query duration.
+ - `unaligned` prevents shifting the query window to multiples of the query duration.
- `match-ids` matches the dimensions based on their IDs (the default is enabled, give `match-names` to disable).
- `match-names` matches the dimension based on their names (the default is enabled, give `match-ids` to disable).
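
The three dimension-aggregation options listed above (`average`, `sum`, `min2max`) can be sketched in Python. This is an illustrative model only: the function `aggregate_dimensions` and the sample dimension names are hypothetical, and the input is assumed to be one already time-aggregated value per dimension.

```python
def aggregate_dimensions(values: dict[str, float], option: str = "sum") -> float:
    """Reduce one time-aggregated value per dimension to a single number."""
    numbers = list(values.values())
    if not numbers:
        raise ValueError("at least one dimension is required")
    if option == "sum":  # described as the default in the list above
        return sum(numbers)
    if option == "average":
        return sum(numbers) / len(numbers)
    if option == "min2max":  # delta between the smallest and largest dimension
        return max(numbers) - min(numbers)
    raise ValueError(f"unknown option: {option}")


# Hypothetical per-dimension values after time-aggregation.
cpu = {"user": 10.0, "system": 5.0, "iowait": 3.0}
total = aggregate_dimensions(cpu)              # sum of all dimensions
spread = aggregate_dimensions(cpu, "min2max")  # max minus min
```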
@@ -600,7 +600,7 @@ good idea to tell Netdata to not clear the notification, by using the `no-clear-
#### Alert line `host labels`
-Defines the list of labels present on a host. See our [host labels guide](https://github.com/netdata/netdata/blob/master/docs/guides/using-host-labels.md) for
+Defines the list of labels present on a host. See our [host labels guide](/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts.md) for
an explanation of host labels and how to implement them.
For example, let's suppose that `netdata.conf` is configured with the following labels:
@@ -633,7 +633,7 @@ that will be applied to all hosts installed in the last decade with the followin
host labels: installed = 201*
```
-See our [simple patterns docs](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md) for more examples.
+See our [simple patterns docs](/src/libnetdata/simple_pattern/README.md) for more examples.
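
As a rough illustration of how a pattern such as `installed = 201*` selects label values, the Python sketch below models a space-separated pattern list with `*` wildcards and `!` negative matches. It assumes that the first pattern to match decides the result, which is why order matters. The function name `simple_pattern_matches` is hypothetical, and this is not Netdata's implementation.

```python
import re


def simple_pattern_matches(pattern_list: str, value: str) -> bool:
    """Return True if value is accepted by a space-separated pattern list."""
    for pattern in pattern_list.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        # Treat every character literally except `*`, which matches anything.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, value):
            # The first matching pattern wins, positive or negative.
            return not negative
    return False  # no pattern matched
```

With this model, `"!2019 201*"` rejects `2019` but accepts `2015`, while `"201* !2019"` accepts `2019` because the positive pattern is evaluated first.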
#### Alert line `chart labels`
@@ -662,7 +662,7 @@ chart labels: mount_point=/mnt/disk1 device=sda
Will create the alert if the `mount_point` is `/mnt/disk1` and the `device` is `sda`. Furthermore, if a chart label name
is specified that does not exist in the chart, the chart won't be matched.
-See our [simple patterns docs](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md) for more examples.
+See our [simple patterns docs](/src/libnetdata/simple_pattern/README.md) for more examples.
#### Alert line `summary`
@@ -808,14 +808,14 @@ You can find all the variables that can be used for a given chart, using
Agent dashboard. For example, [variables for the `system.cpu` chart of the
registry](https://registry.my-netdata.io/api/v1/alarm_variables?chart=system.cpu).
-> If you don't know how to find the CHART_NAME, you can read about it [here](https://github.com/netdata/netdata/blob/master/src/web/README.md#charts).
+> If you don't know how to find the CHART_NAME, you can read about it [here](/src/web/README.md#charts).
Netdata supports 3 internal indexes for variables that will be used in health monitoring.
<details><summary>The variables below can be used in both chart alerts and context templates.</summary>
Although the `alarm_variables` link shows you variables for a particular chart, the same variables can also be used in
-templates for charts belonging to a given [context](https://github.com/netdata/netdata/blob/master/src/web/README.md#contexts). The reason is that all charts of a given
+templates for charts belonging to a given [context](/src/web/README.md#contexts). The reason is that all charts of a given
context are essentially identical, with the only difference being the family that identifies a particular hardware or software instance.
</details>
@@ -1048,9 +1048,9 @@ lookup: mean -10s of user
Since [`z = (x - mean) / stddev`](https://en.wikipedia.org/wiki/Standard_score) we create two input alerts, one for `mean` and one for `stddev` and then use them both as inputs in our final `cpu_user_zscore` alert.
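
To make the arithmetic concrete, here is a small Python sketch of the same z-score formula applied to a window of samples. The function name `zscore`, the sample values, and the choice of population standard deviation are assumptions for illustration; the alert itself computes `mean` and `stddev` through `lookup` lines.

```python
from statistics import mean, pstdev


def zscore(x: float, window: list[float]) -> float:
    """Return (x - mean) / stddev for x relative to the window of samples."""
    mu = mean(window)
    sigma = pstdev(window)  # population standard deviation (assumption)
    if sigma == 0:
        return 0.0  # a flat window has no spread; avoid dividing by zero
    return (x - mu) / sigma


# Hypothetical CPU `user` samples, in percent.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
latest = zscore(9.0, samples)  # how many deviations 9.0 sits above the mean
```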
-### Example 6 - [Anomaly rate](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-rate) based CPU chart alert
+### Example 6 - [Anomaly rate](/src/ml/README.md#anomaly-rate) based CPU chart alert
-Warning if 5 minute rolling [anomaly rate](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-rate) averaged across all CPU dimensions is above 5%, critical if it goes above 20%:
+Warning if 5 minute rolling [anomaly rate](/src/ml/README.md#anomaly-rate) averaged across all CPU dimensions is above 5%, critical if it goes above 20%:
```yaml
template: ml_5min_cpu_chart
@@ -1067,9 +1067,9 @@ template: ml_5min_cpu_chart
The `lookup` line will calculate the average anomaly rate across all `system.cpu` dimensions over the last 5 minutes. In this case
Netdata will create one alert for the chart.
-### Example 7 - [Anomaly rate](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-rate) based node level alert
+### Example 7 - [Anomaly rate](/src/ml/README.md#anomaly-rate) based node level alert
-Warning if 5 minute rolling [anomaly rate](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-rate) averaged across all ML enabled dimensions is above 5%, critical if it goes above 20%:
+Warning if 5 minute rolling [anomaly rate](/src/ml/README.md#anomaly-rate) averaged across all ML enabled dimensions is above 5%, critical if it goes above 20%:
```yaml
template: ml_5min_node
@@ -1083,11 +1083,11 @@ template: ml_5min_node
info: rolling 5min anomaly rate for all ML enabled dims
```
-The `lookup` line will use the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node level anomaly rate](https://github.com/netdata/netdata/blob/master/src/ml/README.md#node-anomaly-rate) over the last 5 minutes.
+The `lookup` line will use the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node level anomaly rate](/src/ml/README.md#node-anomaly-rate) over the last 5 minutes.
## Troubleshooting
-You can compile Netdata with [debugging](https://github.com/netdata/netdata/blob/master/src/daemon/README.md#debugging) and then set in `netdata.conf`:
+You can compile Netdata with [debugging](/src/daemon/README.md#debugging) and then set in `netdata.conf`:
```yaml
[global]