author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2023-05-08 16:27:08 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2023-05-08 16:27:08 +0000 |
commit | 81581f9719bc56f01d5aa08952671d65fda9867a (patch) | |
tree | 0f5c6b6138bf169c23c9d24b1fc0a3521385cb18 /health | |
parent | Releasing debian version 1.38.1-1. (diff) | |
Merging upstream version 1.39.0.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'health')
45 files changed, 2452 insertions, 1304 deletions
diff --git a/health/Makefile.am b/health/Makefile.am
index f0cbb7715..ea1b6e961 100644
--- a/health/Makefile.am
+++ b/health/Makefile.am
@@ -40,7 +40,7 @@ dist_healthconfig_DATA = \
 health.d/disks.conf \
 health.d/dnsmasq_dhcp.conf \
 health.d/dns_query.conf \
- health.d/dockerd.conf \
+ health.d/docker.conf \
 health.d/elasticsearch.conf \
 health.d/entropy.conf \
 health.d/exporting.conf \
@@ -97,7 +97,7 @@ dist_healthconfig_DATA = \
 health.d/vsphere.conf \
 health.d/web_log.conf \
 health.d/whoisquery.conf \
- health.d/wmi.conf \
+ health.d/windows.conf \
 health.d/x509check.conf \
 health.d/zfs.conf \
 health.d/dbengine.conf \
diff --git a/health/README.md b/health/README.md
index 460f65680..96f71f87a 100644
--- a/health/README.md
+++ b/health/README.md
@@ -1,13 +1,4 @@
-<!--
-title: "Health monitoring"
-custom_edit_url: https://github.com/netdata/netdata/edit/master/health/README.md
-sidebar_label: "Health monitoring"
-learn_status: "Published"
-learn_topic_type: "Concepts"
-learn_rel_path: "Concepts"
--->
-
-# Health monitoring
+# Alerts and notifications
 The Netdata Agent is a health watchdog for the health and performance of your systems, services, and applications. We've
 worked closely with our community of DevOps engineers, SREs, and developers to define hundreds of production-ready
@@ -18,23 +9,6 @@ community-configured alarms for every app/service [the Agent collects metrics fr
 silence anything you're not interested in. You can even power complex lookups by running statistical algorithms against
 your metrics.
-Ready to take the next steps with health monitoring?
-
-[Configuration reference](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md)
-
-## Guides
-
-Every infrastructure is different, so we're not interested in mandating how you should configure Netdata's health
-monitoring features. Instead, these guides should give you the details you need to tweak alarms to your heart's
-content.
-
-[Stopping notifications for individual alarms](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/stop-notifications-alarms.md)
-
-[Use dimension templates to create dynamic alarms](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/dimension-templates.md)
-
-## Related features
-
-**[Notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md)**: Get notified about ongoing alarms from your Agents via your
-favorite platform(s), such as Slack, Discord, PagerDuty, email, and much more.
-
-
+You can [use various alert notification methods](https://github.com/netdata/netdata/edit/master/docs/monitor/enable-notifications.md),
+[customize alerts](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md), and
+[disable/silence](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#disable-or-silence-alerts) alerts.
diff --git a/health/REFERENCE.md b/health/REFERENCE.md
index 27031cd19..b95dc852e 100644
--- a/health/REFERENCE.md
+++ b/health/REFERENCE.md
@@ -1,34 +1,190 @@
-<!--
-title: "Health configuration reference"
-sidebar_label: "Health"
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/REFERENCE.md"
-learn_status: "Published"
-learn_topic_type: "Tasks"
-learn_rel_path: "Operations/Alerts"
--->
+# Configure alerts
-# Health configuration reference
+Netdata's health watchdog is highly configurable, with support for dynamic thresholds, hysteresis, alarm templates, and
+more. You can tweak any of the existing alarms based on your infrastructure's topology or specific monitoring needs, or
+create new entities.
-Welcome to the health configuration reference.
+You can use health alarms in conjunction with any of Netdata's [collectors](https://github.com/netdata/netdata/blob/master/collectors/README.md) (see
+the [supported collector list](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md)) to monitor the health of your systems, containers, and
+applications in real time.
-This guide contains information about editing health configuration files to tweak existing alarms or create new health
-entities that are customized to the needs of your infrastructure.
+While you can see active alarms both on the local dashboard and Netdata Cloud, all health alarms are configured _per
+node_ via individual Netdata Agents. If you want to deploy a new alarm across your
+[infrastructure](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md), you must configure each node with the same health configuration
+files.
-To learn the basics of locating and editing health configuration files, see the [health
-quickstart](https://github.com/netdata/netdata/blob/master/health/QUICKSTART.md).
-
-## Health configuration files
+## Edit health configuration files
 You can configure the Agent's health watchdog service by editing files in two locations:
-- The `[health]` section in `netdata.conf`. By editing the daemon's behavior, you can disable health monitoring
- altogether, run health checks more or less often, and more. See [daemon
- configuration](https://github.com/netdata/netdata/blob/master/daemon/config/README.md#health-section-options) for a table of all the available settings, their
- default values, and what they control.
-- The individual `.conf` files in `health.d/`. These health entity files are organized by the type of metric they are
+- The `[health]` section in `netdata.conf`. By editing the daemon's behavior, you can disable health monitoring
+ altogether, run health checks more or less often, and more. See
+ [daemon configuration](https://github.com/netdata/netdata/blob/master/daemon/config/README.md#health-section-options) for a table of
+ all the available settings, their default values, and what they control.
+
+- The individual `.conf` files in `health.d/`. These health entity files are organized by the type of metric they are
 performing calculations on or their associated collector. You should edit these files using the `edit-config` script. For example: `sudo ./edit-config health.d/cpu.conf`.
+Navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) and
+use `edit-config` to make changes to any of these files.
+
+### Edit individual alerts
+
+For example, to edit the `cpu.conf` health configuration file, run:
+
+```bash
+sudo ./edit-config health.d/cpu.conf
+```
+
+Each health configuration file contains one or more health _entities_, which always begin with `alarm:` or `template:`.
+For example, here is the first health entity in `health.d/cpu.conf`:
+
+```yaml
+template: 10min_cpu_usage
+ on: system.cpu
+ os: linux
+ hosts: *
+ lookup: average -10m unaligned of user,system,softirq,irq,guest
+ units: %
+ every: 1m
+ warn: $this > (($status >= $WARNING) ? (75) : (85))
+ crit: $this > (($status == $CRITICAL) ? (85) : (95))
+ delay: down 15m multiplier 1.5 max 1h
+ info: average cpu utilization for the last 10 minutes (excluding iowait, nice and steal)
+ to: sysadmin
+```
+
+To tune this alarm to trigger warning and critical alarms at a lower CPU utilization, change the `warn` and `crit` lines
+to the values of your choosing. For example:
+
+```yaml
+ warn: $this > (($status >= $WARNING) ? (60) : (75))
+ crit: $this > (($status == $CRITICAL) ? (75) : (85))
+```
+
+Save the file and [reload Netdata's health configuration](#reload-health-configuration) to apply your changes.
+
+## Disable or silence alerts
+
+Alerts and notifications can be disabled permanently via configuration changes, or temporarily, via the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md). The
+available options are described below.
+
+### Disable all alerts
+
+In the `netdata.conf` `[health]` section, set `enabled` to `no`, and restart the agent.
+
+### Disable some alerts
+
+In the `netdata.conf` `[health]` section, set `enabled alarms` to a
+[simple pattern](https://github.com/netdata/netdata/edit/master/libnetdata/simple_pattern/README.md) that
+excludes one or more alerts. e.g. `enabled alarms = !oom_kill *` will load all alarms except `oom_kill`.
+
+You can also [edit the file where the alert is defined](#edit-individual-alerts), comment out its definition,
+and [reload Netdata's health configuration](#reload-health-configuration).
+
+### Silence an individual alert
+
+You can stop receiving notification for an individual alert by [changing](#edit-individual-alerts) the `to:` line to `silent`.
+
+```yaml
+ to: silent
+```
+
+This action requires that you [reload Netdata's health configuration](#reload-health-configuration).
+
+### Temporarily disable alerts at runtime
+
+When you need to frequently disable all or some alerts from triggering during certain times (for instance
+when running backups) you can use the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md).
+The API allows you to issue commands to control the health engine's behavior without changing configuration,
+or restarting the agent.
+
+### Temporarily silence notifications at runtime
+
+If you want health checks to keep running and alerts to keep getting triggered, but notifications to be
+suppressed temporarily, you can use the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md).
+The API allows you to issue commands to control the health engine's behavior without changing configuration,
+or restarting the agent.
+
+## Write a new health entity
+
+While tuning existing alarms may work in some cases, you may need to write entirely new health entities based on how
+your systems, containers, and applications work.
+
+Read the [health entity reference](#health-entity-reference) for a full listing of the format,
+syntax, and functionality of health entities.
+
+To write a new health entity into a new file, navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md),
+then use `touch` to create a new file in the `health.d/` directory. Use `edit-config` to start editing the file.
+
+As an example, let's create a `ram-usage.conf` file.
+
+```bash
+sudo touch health.d/ram-usage.conf
+sudo ./edit-config health.d/ram-usage.conf
+```
+
+For example, here is a health entity that triggers a warning alarm when a node's RAM usage rises above 80%, and a
+critical alarm above 90%:
+
+```yaml
+ alarm: ram_usage
+ on: system.ram
+lookup: average -1m percentage of used
+ units: %
+ every: 1m
+ warn: $this > 80
+ crit: $this > 90
+ info: The percentage of RAM being used by the system.
+```
+
+Let's look into each of the lines to see how they create a working health entity.
+
+- `alarm`: The name for your new entity. The name needs to follow these requirements:
+ - Any alphabet letter or number.
+ - The symbols `.` and `_`.
+ - Cannot be `chart name`, `dimension name`, `family name`, or `chart variable names`.
+
+- `on`: Which chart the entity listens to.
+
+- `lookup`: Which metrics the alarm monitors, the duration of time to monitor, and how to process the metrics into a
+ usable format.
+ - `average`: Calculate the average of all the metrics collected.
+ - `-1m`: Use metrics from 1 minute ago until now to calculate that average.
+ - `percentage`: Clarify that we're calculating a percentage of RAM usage.
+ - `of used`: Specify which dimension (`used`) on the `system.ram` chart you want to monitor with this entity.
+
+- `units`: Use percentages rather than absolute units.
+
+- `every`: How often to perform the `lookup` calculation to decide whether or not to trigger this alarm.
+
+- `warn`/`crit`: The value at which Netdata should trigger a warning or critical alarm. This example uses simple
+ syntax, but most pre-configured health entities use
+ [hysteresis](#special-use-of-the-conditional-operator) to avoid superfluous notifications.
+
+- `info`: A description of the alarm, which will appear in the dashboard and notifications.
+
+In human-readable format:
+
+> This health entity, named **ram_usage**, watches the **system.ram** chart. It looks up the last **1 minute** of
+> metrics from the **used** dimension and calculates the **average** of all those metrics in a **percentage** format,
+> using a **% unit**. The entity performs this lookup **every minute**.
+>
+> If the average RAM usage percentage over the last 1 minute is **more than 80%**, the entity triggers a warning alarm.
+> If the usage is **more than 90%**, the entity triggers a critical alarm.
+
+When you finish writing this new health entity, [reload Netdata's health configuration](#reload-health-configuration) to
+see it live on the local dashboard or Netdata Cloud.
+
+## Reload health configuration
+
+To make any changes to your health configuration live, you must reload Netdata's health monitoring system. To do that
+without restarting all of Netdata, run `netdatacli reload-health` or `killall -USR2 netdata`.
+
 ## Health entity reference
 The following reference contains information about the syntax and options of _health entities_, which Netdata attaches
@@ -51,14 +207,14 @@ to the same chart, Netdata will use the alarm.
 Netdata parses the following lines. Beneath the table is an in-depth explanation of each line's purpose and syntax.
-- The `alarm` or `template` line must be the first line of any entity.
-- The `on` line is **always required**.
-- The `every` line is **required** if not using `lookup`.
-- Each entity **must** have at least one of the following lines: `lookup`, `calc`, `warn`, or `crit`.
-- A few lines use space-separated lists to define how the entity behaves. You can use `*` as a wildcard or prefix with
+- The `alarm` or `template` line must be the first line of any entity.
+- The `on` line is **always required**.
+- The `every` line is **required** if not using `lookup`.
+- Each entity **must** have at least one of the following lines: `lookup`, `calc`, `warn`, or `crit`.
+- A few lines use space-separated lists to define how the entity behaves. You can use `*` as a wildcard or prefix with
 `!` for a negative match. Order is important, too! See our [simple patterns docs](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) for more examples.
-- Lines terminated by a `\` are spliced together with the next line. The backslash is removed and the following line is
+- Lines terminated by a `\` are spliced together with the next line. The backslash is removed and the following line is
 joined with the current one. No space is inserted, so you may split a line anywhere, even in the middle of a word. This comes in handy if your `info` line consists of several sentences.
@@ -106,7 +262,7 @@ alarm: NAME
 template: NAME
 ```
-`NAME` can be any alpha character, with `.` (period) and `_` (underscore) as the only allowed symbols, but the names
+`NAME` can be any alpha character, with `.` (period) and `_` (underscore) as the only allowed symbols, but the names
 cannot be `chart name`, `dimension name`, `family name`, or `chart variables names`.
 #### Alarm line `on`
@@ -138,7 +294,7 @@ shows a disk I/O chart, the tooltip reads: `proc:/proc/diskstats, disk.io`.
 ![Finding the context of a chart via the tooltip](https://user-images.githubusercontent.com/1153921/68882856-2b230880-06cd-11ea-923b-b28c4632d479.png)
-You're interested in what comes after the comma: `disk.io`. That's the name of the chart's context.
+You're interested in what comes after the comma: `disk.io`. That's the name of the chart's context.
 If you create a template using the `disk.io` context, it will apply an alarm to every disk available on your system.
@@ -160,7 +316,6 @@ class: Latency
 | Utilization |
 | Workload |
-
 </details>
 `class` will default to `Unknown` if the line is missing from the alarm configuration.
@@ -172,34 +327,35 @@ Type can be used to indicate the broader area of the system that the alarm appli
 ```yaml
 type: Database
 ```
+
 <details>
 <summary>Netdata's stock alarms use the following `type` attributes by default, but feel free to adjust for your own requirements.</summary>
-| Type | Description |
-| ------------------------ | ------------------------------------------------------------------------------------------------ |
-| Ad Filtering | Services related to Ad Filtering (like pi-hole) |
-| Certificates | Certificates monitoring related |
-| Cgroups | Alerts for cpu and memory usage of control groups |
-| Computing | Alerts for shared computing applications (e.g. boinc) |
-| Containers | Container related alerts (e.g. docker instances) |
-| Database | Database systems (e.g. MySQL, PostgreSQL, etc) |
-| Data Sharing | Used to group together alerts for data sharing applications |
-| DHCP | Alerts for dhcp related services |
-| DNS | Alerts for dns related services |
-| Kubernetes | Alerts for kubernetes nodes monitoring |
-| KV Storage | Key-Value pairs services alerts (e.g. memcached) |
-| Linux | Services specific to Linux (e.g. systemd) |
-| Messaging | Alerts for message passing services (e.g. vernemq) |
-| Netdata | Internal Netdata components monitoring |
-| Other | When an alert doesn't fit in other types. |
-| Power Supply | Alerts from power supply related services (e.g. apcupsd) |
-| Search engine | Alerts for search services (e.g. elasticsearch) |
-| Storage | Class for alerts dealing with storage services (storage devices typically live under `System`) |
-| System | General system alarms (e.g. cpu, network, etc.) |
-| Virtual Machine | Virtual Machine software |
-| Web Proxy | Web proxy software (e.g. squid) |
-| Web Server | Web server software (e.g. Apache, ngnix, etc.) |
-| Windows | Alerts for monitor of wmi services |
+| Type | Description |
+|-----------------|------------------------------------------------------------------------------------------------|
+| Ad Filtering | Services related to Ad Filtering (like pi-hole) |
+| Certificates | Certificates monitoring related |
+| Cgroups | Alerts for cpu and memory usage of control groups |
+| Computing | Alerts for shared computing applications (e.g. boinc) |
+| Containers | Container related alerts (e.g. docker instances) |
+| Database | Database systems (e.g. MySQL, PostgreSQL, etc) |
+| Data Sharing | Used to group together alerts for data sharing applications |
+| DHCP | Alerts for dhcp related services |
+| DNS | Alerts for dns related services |
+| Kubernetes | Alerts for kubernetes nodes monitoring |
+| KV Storage | Key-Value pairs services alerts (e.g. memcached) |
+| Linux | Services specific to Linux (e.g. systemd) |
+| Messaging | Alerts for message passing services (e.g. vernemq) |
+| Netdata | Internal Netdata components monitoring |
+| Other | When an alert doesn't fit in other types. |
+| Power Supply | Alerts from power supply related services (e.g. apcupsd) |
+| Search engine | Alerts for search services (e.g. elasticsearch) |
+| Storage | Class for alerts dealing with storage services (storage devices typically live under `System`) |
+| System | General system alarms (e.g. cpu, network, etc.) |
+| Virtual Machine | Virtual Machine software |
+| Web Proxy | Web proxy software (e.g. squid) |
+| Web Server | Web server software (e.g. Apache, ngnix, etc.) |
+| Windows | Alerts for monitor of windows services |
 </details>
@@ -212,6 +368,7 @@ Component can be used to narrow down what the previous `type` value specifies fo
 ```yaml
 component: MySQL
 ```
+
 As with the `class` and `type` line, if `component` is missing from the configuration, its value will default to `Unknown`.
 #### Alarm line `os`
@@ -264,7 +421,7 @@ module: isc_dhcpd
 #### Alarm line `charts`
-The `charts` line filters which chart this alarm should apply to. It is only available on entities using the
+The `charts` line filters which chart this alarm should apply to. It is only available on entities using the
 [`template`](#alarm-line-alarm-or-template) line. The value is a space-separated list of [simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). For example, a template that applies to `disk.svctm` (Average Service Time) context, but excludes the disk `sdb` from alarms:
@@ -299,35 +456,36 @@ The format is:
 lookup: METHOD AFTER [at BEFORE] [every DURATION] [OPTIONS] [of DIMENSIONS] [foreach DIMENSIONS]
 ```
-Everything is the same with [badges](https://github.com/netdata/netdata/blob/master/web/api/badges/README.md). In short:
+The full [database query API](https://github.com/netdata/netdata/blob/master/web/api/queries/README.md) is supported. In short:
-- `METHOD` is one of `average`, `min`, `max`, `sum`, `incremental-sum`.
+- `METHOD` is one of the available [grouping methods](https://github.com/netdata/netdata/blob/master/web/api/queries/README.md#grouping-methods) such as `average`, `min`, `max` etc.
 This is required.
-- `AFTER` is a relative number of seconds, but it also accepts a single letter for changing
+- `AFTER` is a relative number of seconds, but it also accepts a single letter for changing
 the units, like `-1s` = 1 second in the past, `-1m` = 1 minute in the past, `-1h` = 1 hour in the past, `-1d` = 1 day in the past.
 You need a negative number (i.e. how far in the past to look for the value). **This is required**.
-- `at BEFORE` is by default 0 and is not required. Using this you can define the end of the
+- `at BEFORE` is by default 0 and is not required. Using this you can define the end of the
 lookup. So data will be evaluated between `AFTER` and `BEFORE`.
-- `every DURATION` sets the updated frequency of the lookup (supports single letter units as
+- `every DURATION` sets the updated frequency of the lookup (supports single letter units as
 above too).
-- `OPTIONS` is a space separated list of `percentage`, `absolute`, `min2max`, `unaligned`,
+- `OPTIONS` is a space separated list of `percentage`, `absolute`, `min2max`, `unaligned`,
 `match-ids`, `match-names`. Check the [badges](https://github.com/netdata/netdata/blob/master/web/api/badges/README.md) documentation for more info.
-- `of DIMENSIONS` is optional and has to be the last parameter. Dimensions have to be separated
+- `of DIMENSIONS` is optional and has to be the last parameter. Dimensions have to be separated
 by `,` or `|`. The space characters found in dimensions will be kept as-is (a few dimensions have spaces in their names). This accepts Netdata simple patterns _(with `words` separated by `,` or `|` instead of spaces)_ and the `match-ids` and `match-names` options affect the searches for dimensions.
-- `foreach DIMENSIONS` is optional, will always be the last parameter, and uses the same `,`/`|`
+- `foreach DIMENSIONS` is optional, will always be the last parameter, and uses the same `,`/`|`
 rules as the `of` parameter. Each dimension you specify in `foreach` will use the same rule to trigger an alarm. If you set both `of` and `foreach`, Netdata will ignore the `of` parameter
- and replace it with one of the dimensions you gave to `foreach`.
+ and replace it with one of the dimensions you gave to `foreach`. This option allows you to
+ [use dimension templates to create dynamic alarms](#use-dimension-templates-to-create-dynamic-alarms).
 The result of the lookup will be available as `$this` and `$NAME` in expressions. The timestamps of the timeframe evaluated by the database lookup is available as variables
@@ -427,21 +585,21 @@ Format:
 delay: [[[up U] [down D] multiplier M] max X]
 ```
-- `up U` defines the delay to be applied to a notification for an alarm that raised its status
+- `up U` defines the delay to be applied to a notification for an alarm that raised its status
 (i.e. CLEAR to WARNING, CLEAR to CRITICAL, WARNING to CRITICAL). For example, `up 10s`, the notification for this event will be sent 10 seconds after the actual event. This is used in hope the alarm will get back to its previous state within the duration given. The default `U` is zero.
-- `down D` defines the delay to be applied to a notification for an alarm that moves to lower
+- `down D` defines the delay to be applied to a notification for an alarm that moves to lower
 state (i.e. CRITICAL to WARNING, CRITICAL to CLEAR, WARNING to CLEAR). For example, `down 1m` will delay the notification by 1 minute. This is used to prevent notifications for flapping alarms. The default `D` is zero.
-- `multiplier M` multiplies `U` and `D` when an alarm changes state, while a notification is
+- `multiplier M` multiplies `U` and `D` when an alarm changes state, while a notification is
 delayed. The default multiplier is `1.0`.
-- `max X` defines the maximum absolute notification delay an alarm may get. The default `X`
+- `max X` defines the maximum absolute notification delay an alarm may get. The default `X`
 is `max(U * M, D * M)` (i.e. the max duration of `U` or `D` multiplied once with `M`).
 Example:
@@ -459,9 +617,9 @@ delay: [[[up U] [down D] multiplier M] max X]
 So:
- - `U` and `D` are multiplied by `M` every time the alarm changes state (any state, not just
+ - `U` and `D` are multiplied by `M` every time the alarm changes state (any state, not just
 their matching one) and a delay is in place.
- - All are reset to their defaults when the alarm switches state without a delay in place.
+ - All are reset to their defaults when the alarm switches state without a delay in place.
 #### Alarm line `repeat`
@@ -477,11 +635,11 @@ Format:
 repeat: [off] [warning DURATION] [critical DURATION]
 ```
-- `off`: Turns off the repeating feature for the current alarm. This is effective when the default repeat settings has
+- `off`: Turns off the repeating feature for the current alarm. This is effective when the default repeat settings has
 been enabled in health configuration.
-- `warning DURATION`: Defines the interval when the alarm is in WARNING state. Use `0s` to turn off the repeating
+- `warning DURATION`: Defines the interval when the alarm is in WARNING state. Use `0s` to turn off the repeating
 notification for WARNING mode.
-- `critical DURATION`: Defines the interval when the alarm is in CRITICAL state. Use `0s` to turn off the repeating
+- `critical DURATION`: Defines the interval when the alarm is in CRITICAL state. Use `0s` to turn off the repeating
 notification for CRITICAL mode.
 #### Alarm line `options`
@@ -529,7 +687,7 @@ line to any alarms you'd like to apply to hosts that have the label `room = serv
 host labels: room = server
 ```
-The `host labels` is a space-separated list that accepts simple patterns. For example, you can create an alarm
+The `host labels` is a space-separated list that accepts simple patterns. For example, you can create an alarm
 that will be applied to all hosts installed in the last decade with the following line:
 ```yaml
@@ -584,7 +742,7 @@ info: average ratio of HTTP responses with unexpected status over the last 5 min
 ## Expressions
-Netdata has an internal [infix expression parser](/libnetdata/eval). This parses expressions and creates an internal
+Netdata has an internal infix expression parser under `libnetdata/eval`. This parses expressions and creates an internal
 structure that allows fast execution of them.
 These operators are supported `+`, `-`, `*`, `/`, `<`, `==`, `<=`, `<>`, `!=`, `>`, `>=`, `&&`, `||`, `!`, `AND`, `OR`, `NOT`.
@@ -605,10 +763,10 @@ Expressions can have variables. Variables start with `$`. Check below for more i
 There are two special values you can use:
-- `nan`, for example `$this != nan` will check if the variable `this` is available. A variable can be `nan` if the
+- `nan`, for example `$this != nan` will check if the variable `this` is available. A variable can be `nan` if the
 database lookup failed. All calculations (i.e. addition, multiplication, etc) with a `nan` result in a `nan`.
-- `inf`, for example `$this != inf` will check if `this` is not infinite. A value or variable can be set to infinite
+- `inf`, for example `$this != inf` will check if `this` is not infinite. A value or variable can be set to infinite
 if divided by zero. All calculations (i.e. addition, multiplication, etc) with a `inf` result in a `inf`.
 ### Special use of the conditional operator
@@ -627,21 +785,21 @@ crit: $this > (($status == $CRITICAL) ? (85) : (95))
 The above say:
-- If the alarm is currently a warning, then the threshold for being considered a warning is 75, otherwise it's 85.
+- If the alarm is currently a warning, then the threshold for being considered a warning is 75, otherwise it's 85.
-- If the alarm is currently critical, then the threshold for being considered critical is 85, otherwise it's 95.
+- If the alarm is currently critical, then the threshold for being considered critical is 85, otherwise it's 95.
 Which in turn, results in the following behavior:
-- While the value is rising, it will trigger a warning when it exceeds 85, and a critical alert when it exceeds 95.
+- While the value is rising, it will trigger a warning when it exceeds 85, and a critical alert when it exceeds 95.
-- While the value is falling, it will return to a warning state when it goes below 85, and a normal state when it goes
+- While the value is falling, it will return to a warning state when it goes below 85, and a normal state when it goes
 below 75.
-- If the value is constantly varying between 80 and 90, then it will trigger a warning the first time it goes above
+- If the value is constantly varying between 80 and 90, then it will trigger a warning the first time it goes above
 85, but will remain a warning until it goes below 75 (or goes above 85).
-- If the value is constantly varying between 90 and 100, then it will trigger a critical alert the first time it goes
+- If the value is constantly varying between 90 and 100, then it will trigger a critical alert the first time it goes
 above 95, but will remain a critical alert goes below 85 (at which point it will return to being a warning).
 ## Variables
@@ -665,15 +823,15 @@ unless if you explicitly limit an alarm with the [alarm line `families`](#alarm-
 </details>
-- **chart local variables**. All the dimensions of the chart are exposed as local variables. The value of `$this` for
+- **chart local variables**. All the dimensions of the chart are exposed as local variables. The value of `$this` for
 the other configured alarms of the chart also appears, under the name of each configured alarm.
 Charts also define a few special variables:
Charts also define a few special variables: - - `$last_collected_t` is the unix timestamp of the last data collection - - `$collected_total_raw` is the sum of all the dimensions (their last collected values) - - `$update_every` is the update frequency of the chart - - `$green` and `$red` the threshold defined in alarms (these are per chart - the charts + - `$last_collected_t` is the unix timestamp of the last data collection + - `$collected_total_raw` is the sum of all the dimensions (their last collected values) + - `$update_every` is the update frequency of the chart + - `$green` and `$red` the threshold defined in alarms (these are per chart - the charts inherits them from the the first alarm that defined them) Chart dimensions define their last calculated (i.e. interpolated) value, exactly as @@ -682,43 +840,43 @@ unless if you explicitly limit an alarm with the [alarm line `families`](#alarm- that resolves to unix timestamp the dimension was last collected (there may be dimensions that fail to be collected while others continue normally). -- **family variables**. Families are used to group charts together. For example all `eth0` +- **family variables**. Families are used to group charts together. For example all `eth0` charts, have `family = eth0`. This index includes all local variables, but if there are overlapping variables, only the first are exposed. -- **host variables**. All the dimensions of all charts, including all alarms, in fullname. +- **host variables**. All the dimensions of all charts, including all alarms, in fullname. Fullname is `CHART.VARIABLE`, where `CHART` is either the chart id or the chart name (both are supported). -- **special variables\*** are: +- **special variables\*** are: - - `$this`, which is resolved to the value of the current alarm. + - `$this`, which is resolved to the value of the current alarm. 
- - `$status`, which is resolved to the current status of the alarm (the current = the last + - `$status`, which is resolved to the current status of the alarm (the current = the last status, i.e. before the current database lookup and the evaluation of the `calc` line). This values can be compared with `$REMOVED`, `$UNINITIALIZED`, `$UNDEFINED`, `$CLEAR`, `$WARNING`, `$CRITICAL`. These values are incremental, ie. `$status > $CLEAR` works as expected. - - `$now`, which is resolved to current unix timestamp. + - `$now`, which is resolved to current unix timestamp. ## Alarm statuses Alarms can have the following statuses: -- `REMOVED` - the alarm has been deleted (this happens when a SIGUSR2 is sent to Netdata +- `REMOVED` - the alarm has been deleted (this happens when a SIGUSR2 is sent to Netdata to reload health configuration) -- `UNINITIALIZED` - the alarm is not initialized yet +- `UNINITIALIZED` - the alarm is not initialized yet -- `UNDEFINED` - the alarm failed to be calculated (i.e. the database lookup failed, +- `UNDEFINED` - the alarm failed to be calculated (i.e. the database lookup failed, a division by zero occurred, etc) -- `CLEAR` - the alarm is not armed / raised (i.e. is OK) +- `CLEAR` - the alarm is not armed / raised (i.e. is OK) -- `WARNING` - the warning expression resulted in true or non-zero +- `WARNING` - the warning expression resulted in true or non-zero -- `CRITICAL` - the critical expression resulted in true or non-zero +- `CRITICAL` - the critical expression resulted in true or non-zero The external script will be called for all status changes. @@ -762,9 +920,9 @@ The above applies the **template** to all charts that have `context = apache.req calc: $now - $last_collected_t ``` -- `$now` is a standard variable that resolves to the current timestamp. +- `$now` is a standard variable that resolves to the current timestamp. -- `$last_collected_t` is the last data collection timestamp of the chart. 
+- `$last_collected_t` is the last data collection timestamp of the chart. So this calculation gives the number of seconds passed since the last data collection. ```yaml @@ -780,7 +938,7 @@ The alarm will be evaluated every 10 seconds. If these result in non-zero or true, they trigger the alarm. -- `$this` refers to the value of this alarm (i.e. the result of the `calc` line. +- `$this` refers to the value of this alarm (i.e. the result of the `calc` line. We could also use `$apache_last_collected_secs`. `$update_every` is the update frequency of the chart, in seconds. @@ -935,9 +1093,9 @@ lookup: mean -10s of user Since [`z = (x - mean) / stddev`](https://en.wikipedia.org/wiki/Standard_score) we create two input alarms, one for `mean` and one for `stddev` and then use them both as inputs in our final `cpu_user_zscore` alarm. -### Example 8 - [Anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) based CPU dimensions alarm +### Example 8 - [Anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-rate) based CPU dimensions alarm -Warning if 5 minute rolling [anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) for any CPU dimension is above 5%, critical if it goes above 20%: +Warning if 5 minute rolling [anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-rate) for any CPU dimension is above 5%, critical if it goes above 20%: ```yaml template: ml_5min_cpu_dims @@ -956,9 +1114,9 @@ template: ml_5min_cpu_dims The `lookup` line will calculate the average anomaly rate of each `system.cpu` dimension over the last 5 minues. In this case Netdata will create alarms for all dimensions of the chart. 
-### Example 9 - [Anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) based CPU chart alarm +### Example 9 - [Anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-rate) based CPU chart alarm -Warning if 5 minute rolling [anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) averaged across all CPU dimensions is above 5%, critical if it goes above 20%: +Warning if 5 minute rolling [anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-rate) averaged across all CPU dimensions is above 5%, critical if it goes above 20%: ```yaml template: ml_5min_cpu_chart @@ -977,9 +1135,9 @@ template: ml_5min_cpu_chart The `lookup` line will calculate the average anomaly rate across all `system.cpu` dimensions over the last 5 minues. In this case Netdata will create one alarm for the chart. -### Example 10 - [Anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) based node level alarm +### Example 10 - [Anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-rate) based node level alarm -Warning if 5 minute rolling [anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) averaged across all ML enabled dimensions is above 5%, critical if it goes above 20%: +Warning if 5 minute rolling [anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-rate) averaged across all ML enabled dimensions is above 5%, critical if it goes above 20%: ```yaml template: ml_5min_node @@ -995,7 +1153,168 @@ template: ml_5min_node info: rolling 5min anomaly rate for all ML enabled dims ``` -The `lookup` line will use the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node level anomaly rate](https://learn.netdata.cloud/docs/agent/ml#node-anomaly-rate) over the last 5 minues. 
+The `lookup` line will use the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node level anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#node-anomaly-rate) over the last 5 minues. + +## Use dimension templates to create dynamic alarms + +In v1.18 of Netdata, we introduced **dimension templates** for alarms, which simplifies the process of +writing [alarm entities](#health-entity-reference) for +charts with many dimensions. + +Dimension templates can condense many individual entities into one—no more copy-pasting one entity and changing the +`alarm`/`template` and `lookup` lines for each dimension you'd like to monitor. + +### The fundamentals of `foreach` + +Our dimension templates update creates a new `foreach` parameter to the +existing [`lookup` line](#alarm-line-lookup). This +is where the magic happens. + +You use the `foreach` parameter to specify which dimensions you want to monitor with this single alarm. You can separate +them with a comma (`,`) or a pipe (`|`). You can also use +a [Netdata simple pattern](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) to create +many alarms with a regex-like syntax. + +The `foreach` parameter _has_ to be the last parameter in your `lookup` line, and if you have both `of` and `foreach` in +the same `lookup` line, Netdata will ignore the `of` parameter and use `foreach` instead. + +Let's get into some examples so you can see how the new parameter works. + +> ⚠️ The following entities are examples to showcase the functionality and syntax of dimension templates. They are not +> meant to be run as-is on production systems. + +### Condensing entities with `foreach` + +Let's say you want to monitor the `system`, `user`, and `nice` dimensions in your system's overall CPU utilization. 
+Before dimension templates, you would need the following three entities: + +```yaml + alarm: cpu_system + on: system.cpu +lookup: average -10m percentage of system + every: 1m + warn: $this > 50 + crit: $this > 80 + + alarm: cpu_user + on: system.cpu +lookup: average -10m percentage of user + every: 1m + warn: $this > 50 + crit: $this > 80 + + alarm: cpu_nice + on: system.cpu +lookup: average -10m percentage of nice + every: 1m + warn: $this > 50 + crit: $this > 80 +``` + +With dimension templates, you can condense these into a single alarm. Take note of the `alarm` and `lookup` lines. + +```yaml + alarm: cpu_template + on: system.cpu +lookup: average -10m percentage foreach system,user,nice + every: 1m + warn: $this > 50 + crit: $this > 80 +``` + +The `alarm` line specifies the naming scheme Netdata will use. You can use whatever naming scheme you'd like, with `.` +and `_` being the only allowed symbols. + +The `lookup` line has changed from `of` to `foreach`, and we're now passing three dimensions. + +In this example, Netdata will create three alarms with the names `cpu_template_system`, `cpu_template_user`, and +`cpu_template_nice`. Every minute, each alarm will use the same database query to calculate the average CPU usage for +the `system`, `user`, and `nice` dimensions over the last 10 minutes and send out alarms if necessary. + +You can find these three alarms active by clicking on the **Alarms** button in the top navigation, and then clicking on +the **All** tab and scrolling to the **system - cpu** collapsible section. + +![Three new alarms created from the dimension template](https://user-images.githubusercontent.com/1153921/66218994-29523800-e67f-11e9-9bcb-9bca23e2c554.png) + +Let's look at some other examples of how `foreach` works so you can best apply it in your configurations. + +### Using a Netdata simple pattern in `foreach` + +In the last example, we used `foreach system,user,nice` to create three distinct alarms using dimension templates. 
But +what if you want to quickly create alarms for _all_ the dimensions of a given chart? + +Use a [simple pattern](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md)! One example of a simple pattern is a single wildcard +(`*`). + +Instead of monitoring system CPU usage, let's monitor per-application CPU usage using the `apps.cpu` chart. Passing a +wildcard as the simple pattern tells Netdata to create a separate alarm for _every_ process on your system: + +```yaml + alarm: app_cpu + on: apps.cpu +lookup: average -10m percentage foreach * + every: 1m + warn: $this > 50 + crit: $this > 80 +``` + +This entity will now create alarms for every dimension in the `apps.cpu` chart. Given that most `apps.cpu` charts have +10 or more dimensions, using the wildcard ensures you catch every CPU-hogging process. + +To learn more about how to use simple patterns with dimension templates, see +our [simple patterns documentation](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). + +### Using `foreach` with alarm templates + +Dimension templates also work +with [alarm templates](#alarm-line-alarm-or-template). +Alarm templates help you create alarms for all the charts with a given context—for example, all the cores of your +system's CPU. + +By combining the two, you can create dozens of individual alarms with a single template entity. Here's how you would +create alarms for the `system`, `user`, and `nice` dimensions for every chart in the `cpu.cpu` context—or, in other +words, every CPU core. 
+ +```yaml +template: cpu_template + on: cpu.cpu + lookup: average -10m percentage foreach system,user,nice + every: 1m + warn: $this > 50 + crit: $this > 80 +``` + +On a system with a 6-core, 12-thread Ryzen 5 1600 CPU, this one entity creates alarms on the following charts and +dimensions: + +- `cpu.cpu0` + - `cpu_template_user` + - `cpu_template_system` + - `cpu_template_nice` + +- `cpu.cpu1` + - `cpu_template_user` + - `cpu_template_system` + - `cpu_template_nice` + +- `cpu.cpu2` + - `cpu_template_user` + - `cpu_template_system` + - `cpu_template_nice` + +- ... + +- `cpu.cpu11` + - `cpu_template_user` + - `cpu_template_system` + - `cpu_template_nice` + +And how just a few of those dimension template-generated alarms look like in the Netdata dashboard. + +![A few of the created alarms in the Netdata dashboard](https://user-images.githubusercontent.com/1153921/66219669-708cf880-e680-11e9-8b3a-7bfe178fa28b.png) + +All in all, this single entity creates 36 individual alarms. Much easier than writing 36 separate entities in your +health configuration files! ## Troubleshooting @@ -1016,12 +1335,3 @@ You can find how Netdata interpreted the expressions by examining the alarm at `http://NODE:19999/api/v1/alarms?all`. For each expression, Netdata will return the expression as given in its config file, and the same expression with additional parentheses added to indicate the evaluation flow of the expression. - -## Disabling health checks or silencing notifications at runtime - -It's currently not possible to schedule notifications from within the alarm template. For those scenarios where you need -to temporary disable notifications (for instance when running backups triggers a disk alert) you can disable or silence -notifications are runtime. The health checks can be controlled at runtime via the [health management -api](https://github.com/netdata/netdata/blob/master/web/api/health/README.md). 
- - diff --git a/health/health.c b/health/health.c index b34f54ab5..5c2b85bc5 100644 --- a/health/health.c +++ b/health/health.c @@ -17,6 +17,11 @@ #error WORKER_UTILIZATION_MAX_JOB_TYPES has to be at least 10 #endif +unsigned int default_health_enabled = 1; +char *silencers_filename; +SIMPLE_PATTERN *conf_enabled_alarms = NULL; +DICTIONARY *health_rrdvars; + static bool prepare_command(BUFFER *wb, const char *exec, const char *recipient, @@ -157,10 +162,6 @@ static bool prepare_command(BUFFER *wb, return true; } -unsigned int default_health_enabled = 1; -char *silencers_filename; -SIMPLE_PATTERN *conf_enabled_alarms = NULL; - // the queue of executed alarm notifications that haven't been waited for yet static struct { ALARM_ENTRY *head; // oldest @@ -346,6 +347,15 @@ static void health_reload_host(RRDHOST *host) { rrdcalctemplate_link_matching_templates_to_rrdset(st); } rrdset_foreach_done(st); + +#ifdef ENABLE_ACLK + if (netdata_cloud_setting) { + struct aclk_sync_host_config *wc = (struct aclk_sync_host_config *)host->aclk_sync_host_config; + if (likely(wc)) { + wc->alert_queue_removed = SEND_REMOVED_AFTER_HEALTH_LOOPS; + } + } +#endif } /** @@ -356,19 +366,11 @@ static void health_reload_host(RRDHOST *host) { void health_reload(void) { sql_refresh_hashes(); - rrd_rdlock(); - RRDHOST *host; - rrdhost_foreach_read(host) + dfe_start_reentrant(rrdhost_root_index, host){ health_reload_host(host); - - rrd_unlock(); - -#ifdef ENABLE_ACLK - if (netdata_cloud_setting) { - aclk_alert_reloaded = 1; } -#endif + dfe_done(host); } // ---------------------------------------------------------------------------- @@ -752,7 +754,8 @@ static void health_main_cleanup(void *ptr) { log_health("Health thread ended."); } -static void initialize_health(RRDHOST *host, int is_localhost) { +static void initialize_health(RRDHOST *host) +{ if(!host->health.health_enabled || rrdhost_flag_check(host, RRDHOST_FLAG_INITIALIZED_HEALTH) || !service_running(SERVICE_HEALTH)) @@ -779,25 +782,13 @@ 
static void initialize_health(RRDHOST *host, int is_localhost) { else host->health_log.max = (unsigned int)n; - conf_enabled_alarms = simple_pattern_create(config_get(CONFIG_SECTION_HEALTH, "enabled alarms", "*"), NULL, SIMPLE_PATTERN_EXACT); + conf_enabled_alarms = simple_pattern_create(config_get(CONFIG_SECTION_HEALTH, "enabled alarms", "*"), NULL, + SIMPLE_PATTERN_EXACT, true); netdata_rwlock_init(&host->health_log.alarm_log_rwlock); char filename[FILENAME_MAX + 1]; - if(!is_localhost) { - int r = mkdir(host->varlib_dir, 0775); - if (r != 0 && errno != EEXIST) - error("Host '%s': cannot create directory '%s'", rrdhost_hostname(host), host->varlib_dir); - } - - { - snprintfz(filename, FILENAME_MAX, "%s/health", host->varlib_dir); - int r = mkdir(filename, 0775); - if(r != 0 && errno != EEXIST) - error("Host '%s': cannot create directory '%s'", rrdhost_hostname(host), filename); - } - snprintfz(filename, FILENAME_MAX, "%s/alarm-notify.sh", netdata_configured_primary_plugins_dir); host->health.health_default_exec = string_strdupz(config_get(CONFIG_SECTION_HEALTH, "script to execute on alarm", filename)); host->health.health_default_recipient = string_strdupz("root"); @@ -814,7 +805,7 @@ static void initialize_health(RRDHOST *host, int is_localhost) { // link the loaded alarms to their charts RRDSET *st; - rrdset_foreach_write(st, host) { + rrdset_foreach_reentrant(st, host) { if (rrdset_flag_check(st, RRDSET_FLAG_ARCHIVED)) continue; @@ -849,11 +840,11 @@ static SILENCE_TYPE check_silenced(RRDCALC *rc, const char *host, SILENCERS *sil for (s = silencers->silencers; s!=NULL; s=s->next){ if ( - (!s->alarms_pattern || (rc->name && s->alarms_pattern && simple_pattern_matches(s->alarms_pattern, rrdcalc_name(rc)))) && - (!s->contexts_pattern || (rc->rrdset && rc->rrdset->context && s->contexts_pattern && simple_pattern_matches(s->contexts_pattern, rrdset_context(rc->rrdset)))) && - (!s->hosts_pattern || (host && s->hosts_pattern && 
simple_pattern_matches(s->hosts_pattern,host))) && - (!s->charts_pattern || (rc->chart && s->charts_pattern && simple_pattern_matches(s->charts_pattern, rrdcalc_chart_name(rc)))) && - (!s->families_pattern || (rc->rrdset && rc->rrdset->family && s->families_pattern && simple_pattern_matches(s->families_pattern, rrdset_family(rc->rrdset)))) + (!s->alarms_pattern || (rc->name && s->alarms_pattern && simple_pattern_matches_string(s->alarms_pattern, rc->name))) && + (!s->contexts_pattern || (rc->rrdset && rc->rrdset->context && s->contexts_pattern && simple_pattern_matches_string(s->contexts_pattern, rc->rrdset->context))) && + (!s->hosts_pattern || (host && s->hosts_pattern && simple_pattern_matches(s->hosts_pattern, host))) && + (!s->charts_pattern || (rc->chart && s->charts_pattern && simple_pattern_matches_string(s->charts_pattern, rc->chart))) && + (!s->families_pattern || (rc->rrdset && rc->rrdset->family && s->families_pattern && simple_pattern_matches_string(s->families_pattern, rc->rrdset->family))) ) { debug(D_HEALTH, "Alarm matches command API silence entry %s:%s:%s:%s:%s", s->alarms,s->charts, s->contexts, s->hosts, s->families); if (unlikely(silencers->stype == STYPE_NONE)) { @@ -925,19 +916,6 @@ static void health_execute_delayed_initializations(RRDHOST *host) { worker_is_busy(WORKER_HEALTH_JOB_DELAYED_INIT_RRDSET); - if(!st->rrdfamily) - st->rrdfamily = rrdfamily_add_and_acquire(host, rrdset_family(st)); - - if(!st->rrdvars) - st->rrdvars = rrdvariables_create(); - - rrddimvar_index_init(st); - - rrdsetvar_add_and_leave_released(st, "last_collected_t", RRDVAR_TYPE_TIME_T, &st->last_collected_time.tv_sec, RRDVAR_FLAG_NONE); - rrdsetvar_add_and_leave_released(st, "green", RRDVAR_TYPE_CALCULATED, &st->green, RRDVAR_FLAG_NONE); - rrdsetvar_add_and_leave_released(st, "red", RRDVAR_TYPE_CALCULATED, &st->red, RRDVAR_FLAG_NONE); - rrdsetvar_add_and_leave_released(st, "update_every", RRDVAR_TYPE_INT, &st->update_every, RRDVAR_FLAG_NONE); - 
rrdcalc_link_matching_alerts_to_rrdset(st); rrdcalctemplate_link_matching_templates_to_rrdset(st); @@ -948,19 +926,19 @@ static void health_execute_delayed_initializations(RRDHOST *host) { worker_is_busy(WORKER_HEALTH_JOB_DELAYED_INIT_RRDDIM); - rrddimvar_add_and_leave_released(rd, RRDVAR_TYPE_CALCULATED, NULL, NULL, &rd->last_stored_value, RRDVAR_FLAG_NONE); - rrddimvar_add_and_leave_released(rd, RRDVAR_TYPE_COLLECTED, NULL, "_raw", &rd->last_collected_value, RRDVAR_FLAG_NONE); - rrddimvar_add_and_leave_released(rd, RRDVAR_TYPE_TIME_T, NULL, "_last_collected_t", &rd->last_collected_time.tv_sec, RRDVAR_FLAG_NONE); - RRDCALCTEMPLATE *rt; foreach_rrdcalctemplate_read(host, rt) { if(!rt->foreach_dimension_pattern) continue; - if(rrdcalctemplate_check_rrdset_conditions(rt, st, host)) + if(rrdcalctemplate_check_rrdset_conditions(rt, st, host)) { rrdcalctemplate_check_rrddim_conditions_and_link(rt, st, rd, host); + } } foreach_rrdcalctemplate_done(rt); + + if (health_variable_check(health_rrdvars, st, rd)) + rrdvar_store_for_chart(host, st); } rrddim_foreach_done(rd); } @@ -1002,9 +980,7 @@ void *health_main(void *ptr) { rrdcalc_delete_alerts_not_matching_host_labels_from_all_hosts(); unsigned int loop = 0; -#ifdef ENABLE_ACLK - unsigned int marked_aclk_reload_loop = 0; -#endif + while(service_running(SERVICE_HEALTH)) { loop++; debug(D_HEALTH, "Health monitoring iteration no %u started", loop); @@ -1033,15 +1009,8 @@ void *health_main(void *ptr) { } } -#ifdef ENABLE_ACLK - if (aclk_alert_reloaded && !marked_aclk_reload_loop) - marked_aclk_reload_loop = loop; -#endif - worker_is_busy(WORKER_HEALTH_JOB_RRD_LOCK); - rrd_rdlock(); - - rrdhost_foreach_read(host) { + dfe_start_reentrant(rrdhost_root_index, host) { if(unlikely(!service_running(SERVICE_HEALTH))) break; @@ -1049,11 +1018,8 @@ void *health_main(void *ptr) { if (unlikely(!host->health.health_enabled)) continue; - if (unlikely(!rrdhost_flag_check(host, RRDHOST_FLAG_INITIALIZED_HEALTH))) { - rrd_unlock(); - 
initialize_health(host, host == localhost); - rrd_rdlock(); - } + if (unlikely(!rrdhost_flag_check(host, RRDHOST_FLAG_INITIALIZED_HEALTH))) + initialize_health(host); health_execute_delayed_initializations(host); @@ -1147,7 +1113,7 @@ void *health_main(void *ptr) { rc->value = NAN; #ifdef ENABLE_ACLK - if (netdata_cloud_setting && likely(!aclk_alert_reloaded)) + if (netdata_cloud_setting) sql_queue_alarm_to_aclk(host, ae, 1); #endif } @@ -1518,9 +1484,28 @@ void *health_main(void *ptr) { } break; } - } //for each host +#ifdef ENABLE_ACLK + if (netdata_cloud_setting) { + struct aclk_sync_host_config *wc = (struct aclk_sync_host_config *)host->aclk_sync_host_config; + if (unlikely(!wc)) { + continue; + } + + if (wc->alert_queue_removed == 1) { + sql_queue_removed_alerts_to_aclk(host); + } else if (wc->alert_queue_removed > 1) { + wc->alert_queue_removed--; + } - rrd_unlock(); + if (wc->alert_checkpoint_req == 1) { + aclk_push_alarm_checkpoint(host); + } else if (wc->alert_checkpoint_req > 1) { + wc->alert_checkpoint_req--; + } + } +#endif + } + dfe_done(host); // wait for all notifications to finish before allowing health to be cleaned up ALARM_ENTRY *ae; @@ -1531,22 +1516,6 @@ void *health_main(void *ptr) { health_alarm_wait_for_execution(ae); } -#ifdef ENABLE_ACLK - if (netdata_cloud_setting && unlikely(aclk_alert_reloaded) && loop > (marked_aclk_reload_loop + 2)) { - rrdhost_foreach_read(host) { - if(unlikely(!service_running(SERVICE_HEALTH))) - break; - - if (unlikely(!host->health.health_enabled)) - continue; - - sql_queue_removed_alerts_to_aclk(host); - } - aclk_alert_reloaded = 0; - marked_aclk_reload_loop = 0; - } -#endif - if(unlikely(!service_running(SERVICE_HEALTH))) break; diff --git a/health/health.d/btrfs.conf b/health/health.d/btrfs.conf index 8d197aa8d..ab63ff28d 100644 --- a/health/health.d/btrfs.conf +++ b/health/health.d/btrfs.conf @@ -66,3 +66,78 @@ component: File system delay: up 1m down 15m multiplier 1.5 max 1h info: utilization of BTRFS 
system space to: sysadmin + + template: btrfs_device_read_errors + on: btrfs.device_errors + class: Errors + type: System +component: File system + os: * + hosts: * + families: * + units: errors + lookup: max -10m every 1m of read_errs + warn: $this > 0 + delay: up 1m down 15m multiplier 1.5 max 1h + info: number of encountered BTRFS read errors + to: sysadmin + + template: btrfs_device_write_errors + on: btrfs.device_errors + class: Errors + type: System +component: File system + os: * + hosts: * + families: * + units: errors + lookup: max -10m every 1m of write_errs + warn: $this > 0 + delay: up 1m down 15m multiplier 1.5 max 1h + info: number of encountered BTRFS write errors + to: sysadmin + + template: btrfs_device_flush_errors + on: btrfs.device_errors + class: Errors + type: System +component: File system + os: * + hosts: * + families: * + units: errors + lookup: max -10m every 1m of flush_errs + warn: $this > 0 + delay: up 1m down 15m multiplier 1.5 max 1h + info: number of encountered BTRFS flush errors + to: sysadmin + + template: btrfs_device_corruption_errors + on: btrfs.device_errors + class: Errors + type: System +component: File system + os: * + hosts: * + families: * + units: errors + lookup: max -10m every 1m of corruption_errs + warn: $this > 0 + delay: up 1m down 15m multiplier 1.5 max 1h + info: number of encountered BTRFS corruption errors + to: sysadmin + + template: btrfs_device_generation_errors + on: btrfs.device_errors + class: Errors + type: System +component: File system + os: * + hosts: * + families: * + units: errors + lookup: max -10m every 1m of generation_errs + warn: $this > 0 + delay: up 1m down 15m multiplier 1.5 max 1h + info: number of encountered BTRFS generation errors + to: sysadmin diff --git a/health/health.d/docker.conf b/health/health.d/docker.conf new file mode 100644 index 000000000..f17028472 --- /dev/null +++ b/health/health.d/docker.conf @@ -0,0 +1,11 @@ + template: docker_container_unhealthy + on: 
docker.container_health_status + class: Errors + type: Containers +component: Docker + units: status + every: 10s + lookup: average -10s of unhealthy + crit: $this > 0 + info: ${label:container_name} docker container health status is unhealthy + to: sysadmin diff --git a/health/health.d/dockerd.conf b/health/health.d/dockerd.conf deleted file mode 100644 index 220ddd664..000000000 --- a/health/health.d/dockerd.conf +++ /dev/null @@ -1,11 +0,0 @@ - template: docker_unhealthy_containers - on: docker.unhealthy_containers - class: Errors - type: Containers -component: Docker - units: unhealthy containers - every: 10s - lookup: average -10s - crit: $this > 0 - info: average number of unhealthy docker containers over the last 10 seconds - to: sysadmin diff --git a/health/health.d/wmi.conf b/health/health.d/windows.conf index 90d39ce9d..d678ac3ae 100644 --- a/health/health.d/wmi.conf +++ b/health/health.d/windows.conf @@ -1,8 +1,8 @@ ## CPU - template: wmi_10min_cpu_usage - on: wmi.cpu_utilization_total + template: windows_10min_cpu_usage + on: windows.cpu_utilization_total class: Utilization type: Windows component: CPU @@ -20,8 +20,8 @@ component: CPU ## Memory - template: wmi_ram_in_use - on: wmi.memory_utilization + template: windows_ram_in_use + on: windows.memory_utilization class: Utilization type: Windows component: Memory @@ -36,8 +36,8 @@ component: Memory info: memory utilization to: sysadmin - template: wmi_swap_in_use - on: wmi.memory_swap_utilization + template: windows_swap_in_use + on: windows.memory_swap_utilization class: Utilization type: Windows component: Memory @@ -55,8 +55,8 @@ component: Memory ## Network - template: wmi_inbound_packets_discarded - on: wmi.net_discarded + template: windows_inbound_packets_discarded + on: windows.net_discarded class: Errors type: Windows component: Network @@ -71,8 +71,8 @@ component: Network info: number of inbound discarded packets for the network interface in the last 10 minutes to: sysadmin - template: 
wmi_outbound_packets_discarded - on: wmi.net_discarded + template: windows_outbound_packets_discarded + on: windows.net_discarded class: Errors type: Windows component: Network @@ -87,8 +87,8 @@ component: Network info: number of outbound discarded packets for the network interface in the last 10 minutes to: sysadmin - template: wmi_inbound_packets_errors - on: wmi.net_errors + template: windows_inbound_packets_errors + on: windows.net_errors class: Errors type: Windows component: Network @@ -103,8 +103,8 @@ component: Network info: number of inbound errors for the network interface in the last 10 minutes to: sysadmin - template: wmi_outbound_packets_errors - on: wmi.net_errors + template: windows_outbound_packets_errors + on: windows.net_errors class: Errors type: Windows component: Network @@ -122,8 +122,8 @@ component: Network ## Disk - template: wmi_disk_in_use - on: wmi.logical_disk_utilization + template: windows_disk_in_use + on: windows.logical_disk_utilization class: Utilization type: Windows component: Disk diff --git a/health/health.h b/health/health.h index 50c3e3452..902e36c62 100644 --- a/health/health.h +++ b/health/health.h @@ -32,6 +32,7 @@ extern unsigned int default_health_enabled; extern char *silencers_filename; extern SIMPLE_PATTERN *conf_enabled_alarms; +extern DICTIONARY *health_rrdvars; void health_init(void); diff --git a/health/health_config.c b/health/health_config.c index 55d5e10eb..38857fc9a 100644 --- a/health/health_config.c +++ b/health/health_config.c @@ -185,6 +185,51 @@ static inline int health_parse_repeat( return 1; } +static inline int isvariableterm(const char s) { + if(isalnum(s) || s == '.' 
|| s == '_') + return 0; + + return 1; +} + +static inline void parse_variables_and_store_in_health_rrdvars(char *value, size_t len) { + const char *s = value; + char buffer[RRDVAR_MAX_LENGTH]; + + // $ + while (*s) { + if(*s == '$') { + size_t i = 0; + s++; + + if(*s == '{') { + // ${variable_name} + + s++; + while (*s && *s != '}' && i < len) + buffer[i++] = *s++; + + if(*s == '}') + s++; + } + else { + // $variable_name + + while (*s && !isvariableterm(*s) && i < len) + buffer[i++] = *s++; + } + + buffer[i] = '\0'; + + //TODO: check and try to store only variables + STRING *name_string = rrdvar_name_to_string(buffer); + rrdvar_add("health", health_rrdvars, name_string, RRDVAR_TYPE_CALCULATED, RRDVAR_FLAG_CONFIG_VAR, NULL); + string_freez(name_string); + } else + s++; + } +} + /** * Health pattern from Foreach * @@ -206,7 +251,7 @@ static SIMPLE_PATTERN *health_pattern_from_foreach(const char *s) { if(convert) { dimension_remove_pipe_comma(convert); - val = simple_pattern_create(convert, NULL, SIMPLE_PATTERN_EXACT); + val = simple_pattern_create(convert, NULL, SIMPLE_PATTERN_EXACT, true); freez(convert); } @@ -215,7 +260,7 @@ static SIMPLE_PATTERN *health_pattern_from_foreach(const char *s) { static inline int health_parse_db_lookup( size_t line, const char *filename, char *string, - RRDR_GROUPING *group_method, int *after, int *before, int *every, + RRDR_TIME_GROUPING *group_method, int *after, int *before, int *every, RRDCALC_OPTIONS *options, STRING **dimensions, STRING **foreachdim ) { debug(D_HEALTH, "Health configuration parsing database lookup %zu@%s: %s", line, filename, string); @@ -241,7 +286,7 @@ static inline int health_parse_db_lookup( return 0; } - if((*group_method = web_client_api_request_v1_data_group(key, RRDR_GROUPING_UNDEFINED)) == RRDR_GROUPING_UNDEFINED) { + if((*group_method = time_grouping_parse(key, RRDR_GROUPING_UNDEFINED)) == RRDR_GROUPING_UNDEFINED) { error("Health configuration at line %zu of file '%s': invalid group method '%s'", 
                   line, filename, key);
             return 0;
@@ -634,9 +679,9 @@ static int health_readfile(const char *filename, void *data) {
         else if(hash == hash_os && !strcasecmp(key, HEALTH_OS_KEY)) {
             char *os_match = value;
             if (alert_cfg) alert_cfg->os = string_strdupz(value);
-            SIMPLE_PATTERN *os_pattern = simple_pattern_create(os_match, NULL, SIMPLE_PATTERN_EXACT);
+            SIMPLE_PATTERN *os_pattern = simple_pattern_create(os_match, NULL, SIMPLE_PATTERN_EXACT, true);
 
-            if(!simple_pattern_matches(os_pattern, rrdhost_os(host))) {
+            if(!simple_pattern_matches_string(os_pattern, host->os)) {
                 if(rc)
                     debug(D_HEALTH, "HEALTH on '%s' ignoring alarm '%s' defined at %zu@%s: host O/S does not match '%s'",
                           rrdhost_hostname(host), rrdcalc_name(rc), line, filename, os_match);
@@ -651,9 +696,9 @@ static int health_readfile(const char *filename, void *data) {
         else if(hash == hash_host && !strcasecmp(key, HEALTH_HOST_KEY)) {
             char *host_match = value;
             if (alert_cfg) alert_cfg->host = string_strdupz(value);
-            SIMPLE_PATTERN *host_pattern = simple_pattern_create(host_match, NULL, SIMPLE_PATTERN_EXACT);
+            SIMPLE_PATTERN *host_pattern = simple_pattern_create(host_match, NULL, SIMPLE_PATTERN_EXACT, true);
 
-            if(!simple_pattern_matches(host_pattern, rrdhost_hostname(host))) {
+            if(!simple_pattern_matches_string(host_pattern, host->hostname)) {
                 if(rc)
                     debug(D_HEALTH, "HEALTH on '%s' ignoring alarm '%s' defined at %zu@%s: hostname does not match '%s'",
                           rrdhost_hostname(host), rrdcalc_name(rc), line, filename, host_match);
@@ -728,7 +773,7 @@ static int health_readfile(const char *filename, void *data) {
                     if (rc->dimensions)
                         alert_cfg->p_db_lookup_dimensions = string_dup(rc->dimensions);
                     if (rc->group)
-                        alert_cfg->p_db_lookup_method = string_strdupz(group_method2string(rc->group));
+                        alert_cfg->p_db_lookup_method = string_strdupz(time_grouping_method2string(rc->group));
                     alert_cfg->p_db_lookup_options = rc->options;
                     alert_cfg->p_db_lookup_after = rc->after;
                     alert_cfg->p_db_lookup_before = rc->before;
@@ -769,6 +814,7 @@ static int health_readfile(const char *filename, void *data) {
                     error("Health configuration at line %zu of file '%s' for alarm '%s' at key '%s' has unparse-able expression '%s': %s at '%s'",
                           line, filename, rrdcalc_name(rc), key, value, expression_strerror(error), failed_at);
                 }
+                parse_variables_and_store_in_health_rrdvars(value, HEALTH_CONF_MAX_LINE);
             }
             else if(hash == hash_warn && !strcasecmp(key, HEALTH_WARN_KEY)) {
                 alert_cfg->warn = string_strdupz(value);
@@ -779,6 +825,7 @@ static int health_readfile(const char *filename, void *data) {
                     error("Health configuration at line %zu of file '%s' for alarm '%s' at key '%s' has unparse-able expression '%s': %s at '%s'",
                           line, filename, rrdcalc_name(rc), key, value, expression_strerror(error), failed_at);
                 }
+                parse_variables_and_store_in_health_rrdvars(value, HEALTH_CONF_MAX_LINE);
             }
             else if(hash == hash_crit && !strcasecmp(key, HEALTH_CRIT_KEY)) {
                 alert_cfg->crit = string_strdupz(value);
@@ -789,6 +836,7 @@ static int health_readfile(const char *filename, void *data) {
                     error("Health configuration at line %zu of file '%s' for alarm '%s' at key '%s' has unparse-able expression '%s': %s at '%s'",
                           line, filename, rrdcalc_name(rc), key, value, expression_strerror(error), failed_at);
                 }
+                parse_variables_and_store_in_health_rrdvars(value, HEALTH_CONF_MAX_LINE);
             }
             else if(hash == hash_exec && !strcasecmp(key, HEALTH_EXEC_KEY)) {
                 alert_cfg->exec = string_strdupz(value);
@@ -870,7 +918,8 @@ static int health_readfile(const char *filename, void *data) {
                     rc->host_labels = string_strdupz(tmp);
                     freez(tmp);
                 }
-                rc->host_labels_pattern = simple_pattern_create(rrdcalc_host_labels(rc), NULL, SIMPLE_PATTERN_EXACT);
+                rc->host_labels_pattern = simple_pattern_create(rrdcalc_host_labels(rc), NULL, SIMPLE_PATTERN_EXACT,
+                                                                true);
             }
             else if(hash == hash_plugin && !strcasecmp(key, HEALTH_PLUGIN_KEY)) {
                 alert_cfg->plugin = string_strdupz(value);
@@ -878,7 +927,7 @@ static int health_readfile(const char *filename, void *data) {
                 simple_pattern_free(rc->plugin_pattern);
 
                 rc->plugin_match = string_strdupz(value);
-                rc->plugin_pattern = simple_pattern_create(rrdcalc_plugin_match(rc), NULL, SIMPLE_PATTERN_EXACT);
+                rc->plugin_pattern = simple_pattern_create(rrdcalc_plugin_match(rc), NULL, SIMPLE_PATTERN_EXACT, true);
             }
             else if(hash == hash_module && !strcasecmp(key, HEALTH_MODULE_KEY)) {
                 alert_cfg->module = string_strdupz(value);
@@ -886,7 +935,7 @@ static int health_readfile(const char *filename, void *data) {
                 simple_pattern_free(rc->module_pattern);
 
                 rc->module_match = string_strdupz(value);
-                rc->module_pattern = simple_pattern_create(rrdcalc_module_match(rc), NULL, SIMPLE_PATTERN_EXACT);
+                rc->module_pattern = simple_pattern_create(rrdcalc_module_match(rc), NULL, SIMPLE_PATTERN_EXACT, true);
             }
             else {
                 error("Health configuration at line %zu of file '%s' for alarm '%s' has unknown key '%s'.",
@@ -950,7 +999,8 @@ static int health_readfile(const char *filename, void *data) {
                 simple_pattern_free(rt->family_pattern);
 
                 rt->family_match = string_strdupz(value);
-                rt->family_pattern = simple_pattern_create(rrdcalctemplate_family_match(rt), NULL, SIMPLE_PATTERN_EXACT);
+                rt->family_pattern = simple_pattern_create(rrdcalctemplate_family_match(rt), NULL, SIMPLE_PATTERN_EXACT,
+                                                           true);
             }
             else if(hash == hash_plugin && !strcasecmp(key, HEALTH_PLUGIN_KEY)) {
                 alert_cfg->plugin = string_strdupz(value);
@@ -958,7 +1008,8 @@ static int health_readfile(const char *filename, void *data) {
                 simple_pattern_free(rt->plugin_pattern);
 
                 rt->plugin_match = string_strdupz(value);
-                rt->plugin_pattern = simple_pattern_create(rrdcalctemplate_plugin_match(rt), NULL, SIMPLE_PATTERN_EXACT);
+                rt->plugin_pattern = simple_pattern_create(rrdcalctemplate_plugin_match(rt), NULL, SIMPLE_PATTERN_EXACT,
+                                                           true);
             }
             else if(hash == hash_module && !strcasecmp(key, HEALTH_MODULE_KEY)) {
                 alert_cfg->module = string_strdupz(value);
@@ -966,7 +1017,8 @@ static int health_readfile(const char *filename, void *data) {
                 simple_pattern_free(rt->module_pattern);
 
                 rt->module_match = string_strdupz(value);
-                rt->module_pattern = simple_pattern_create(rrdcalctemplate_module_match(rt), NULL, SIMPLE_PATTERN_EXACT);
+                rt->module_pattern = simple_pattern_create(rrdcalctemplate_module_match(rt), NULL, SIMPLE_PATTERN_EXACT,
+                                                           true);
             }
             else if(hash == hash_charts && !strcasecmp(key, HEALTH_CHARTS_KEY)) {
                 alert_cfg->charts = string_strdupz(value);
@@ -974,7 +1026,8 @@ static int health_readfile(const char *filename, void *data) {
                 simple_pattern_free(rt->charts_pattern);
 
                 rt->charts_match = string_strdupz(value);
-                rt->charts_pattern = simple_pattern_create(rrdcalctemplate_charts_match(rt), NULL, SIMPLE_PATTERN_EXACT);
+                rt->charts_pattern = simple_pattern_create(rrdcalctemplate_charts_match(rt), NULL, SIMPLE_PATTERN_EXACT,
+                                                           true);
             }
             else if(hash == hash_lookup && !strcasecmp(key, HEALTH_LOOKUP_KEY)) {
                 alert_cfg->lookup = string_strdupz(value);
@@ -989,7 +1042,7 @@ static int health_readfile(const char *filename, void *data) {
                         alert_cfg->p_db_lookup_dimensions = string_dup(rt->dimensions);
 
                     if (rt->group)
-                        alert_cfg->p_db_lookup_method = string_strdupz(group_method2string(rt->group));
+                        alert_cfg->p_db_lookup_method = string_strdupz(time_grouping_method2string(rt->group));
 
                     alert_cfg->p_db_lookup_options = rt->options;
                     alert_cfg->p_db_lookup_after = rt->after;
@@ -1031,6 +1084,7 @@ static int health_readfile(const char *filename, void *data) {
                     error("Health configuration at line %zu of file '%s' for template '%s' at key '%s' has unparse-able expression '%s': %s at '%s'",
                           line, filename, rrdcalctemplate_name(rt), key, value, expression_strerror(error), failed_at);
                 }
+                parse_variables_and_store_in_health_rrdvars(value, HEALTH_CONF_MAX_LINE);
             }
             else if(hash == hash_warn && !strcasecmp(key, HEALTH_WARN_KEY)) {
                 alert_cfg->warn = string_strdupz(value);
@@ -1041,6 +1095,7 @@ static int health_readfile(const char *filename, void *data) {
                     error("Health configuration at line %zu of file '%s' for template '%s' at key '%s' has unparse-able expression '%s': %s at '%s'",
                           line, filename, rrdcalctemplate_name(rt), key, value, expression_strerror(error), failed_at);
                 }
+                parse_variables_and_store_in_health_rrdvars(value, HEALTH_CONF_MAX_LINE);
             }
             else if(hash == hash_crit && !strcasecmp(key, HEALTH_CRIT_KEY)) {
                 alert_cfg->crit = string_strdupz(value);
@@ -1051,6 +1106,7 @@ static int health_readfile(const char *filename, void *data) {
                     error("Health configuration at line %zu of file '%s' for template '%s' at key '%s' has unparse-able expression '%s': %s at '%s'",
                           line, filename, rrdcalctemplate_name(rt), key, value, expression_strerror(error), failed_at);
                 }
+                parse_variables_and_store_in_health_rrdvars(value, HEALTH_CONF_MAX_LINE);
             }
             else if(hash == hash_exec && !strcasecmp(key, HEALTH_EXEC_KEY)) {
                 alert_cfg->exec = string_strdupz(value);
@@ -1130,7 +1186,8 @@ static int health_readfile(const char *filename, void *data) {
                     rt->host_labels = string_strdupz(tmp);
                     freez(tmp);
                 }
-                rt->host_labels_pattern = simple_pattern_create(rrdcalctemplate_host_labels(rt), NULL, SIMPLE_PATTERN_EXACT);
+                rt->host_labels_pattern = simple_pattern_create(rrdcalctemplate_host_labels(rt), NULL,
+                                                                SIMPLE_PATTERN_EXACT, true);
             }
             else {
                 error("Health configuration at line %zu of file '%s' for template '%s' has unknown key '%s'.",
@@ -1185,6 +1242,9 @@ void health_readdir(RRDHOST *host, const char *user_path, const char *stock_path
         stock_path = user_path;
     }
 
+    if (!health_rrdvars)
+        health_rrdvars = health_rrdvariables_create();
+
     recursive_config_double_dir_load(user_path, stock_path, subpath, health_readfile, (void *) host, 0);
     log_health("[%s]: Read health configuration.", rrdhost_hostname(host));
     sql_store_hashes = 0;
diff --git a/health/health_json.c b/health/health_json.c
index 8cabaa0bf..ba18bddba 100644
--- a/health/health_json.c
+++ b/health/health_json.c
@@ -103,11 +103,11 @@ void health_alarm_entry2json_nolock(BUFFER *wb, ALARM_ENTRY *ae, RRDHOST *host)
     }
 
     buffer_strcat(wb, "\t\t\"value\":");
-    buffer_rrd_value(wb, ae->new_value);
+    buffer_print_netdata_double(wb, ae->new_value);
     buffer_strcat(wb, ",\n");
 
     buffer_strcat(wb, "\t\t\"old_value\":");
-    buffer_rrd_value(wb, ae->old_value);
+    buffer_print_netdata_double(wb, ae->old_value);
     buffer_strcat(wb, "\n");
 
     buffer_strcat(wb, "\t}");
@@ -152,7 +152,7 @@ static inline void health_rrdcalc_values2json_nolock(RRDHOST *host, BUFFER *wb,
                    , (unsigned long)rc->id);
 
     buffer_strcat(wb, "\t\t\t\"value\":");
-    buffer_rrd_value(wb, rc->value);
+    buffer_print_netdata_double(wb, rc->value);
     buffer_strcat(wb, ",\n");
 
     buffer_strcat(wb, "\t\t\t\"last_updated\":");
@@ -257,11 +257,11 @@ static inline void health_rrdcalc2json_nolock(RRDHOST *host, BUFFER *wb, RRDCALC
                 "\t\t\t\"lookup_after\": %d,\n"
                 "\t\t\t\"lookup_before\": %d,\n"
                 "\t\t\t\"lookup_options\": \"",
-                (unsigned long) rc->db_after,
-                (unsigned long) rc->db_before,
-                group_method2string(rc->group),
-                rc->after,
-                rc->before
+                       (unsigned long) rc->db_after,
+                       (unsigned long) rc->db_before,
+                       time_grouping_method2string(rc->group),
+                       rc->after,
+                       rc->before
         );
         buffer_data_options2string(wb, rc->options);
         buffer_strcat(wb, "\",\n");
@@ -283,15 +283,15 @@ static inline void health_rrdcalc2json_nolock(RRDHOST *host, BUFFER *wb, RRDCALC
     }
 
     buffer_strcat(wb, "\t\t\t\"green\":");
-    buffer_rrd_value(wb, rc->green);
+    buffer_print_netdata_double(wb, rc->green);
     buffer_strcat(wb, ",\n");
 
     buffer_strcat(wb, "\t\t\t\"red\":");
-    buffer_rrd_value(wb, rc->red);
+    buffer_print_netdata_double(wb, rc->red);
     buffer_strcat(wb, ",\n");
 
     buffer_strcat(wb, "\t\t\t\"value\":");
-    buffer_rrd_value(wb, rc->value);
+    buffer_print_netdata_double(wb, rc->value);
     buffer_strcat(wb, "\n");
 
     buffer_strcat(wb, "\t\t}");
@@ -309,7 +309,7 @@ void health_aggregate_alarms(RRDHOST *host, BUFFER *wb, BUFFER* contexts, RRDCAL
     if (contexts) {
         p = (char*)buffer_tostring(contexts);
-        while(p && *p && (tok = mystrsep(&p, ", |"))) {
+        while(p && *p && (tok = strsep_skip_consecutive_separators(&p, ", |"))) {
             if(!*tok) continue;
             STRING *tok_string = string_strdupz(tok);
diff --git a/health/health_log.c b/health/health_log.c
index d3417493b..b1f59a1a5 100644
--- a/health/health_log.c
+++ b/health/health_log.c
@@ -95,6 +95,8 @@ inline void health_alarm_log_add_entry(
 ) {
     debug(D_HEALTH, "Health adding alarm log entry with id: %u", ae->unique_id);
 
+    __atomic_add_fetch(&host->health_transitions, 1, __ATOMIC_RELAXED);
+
     // link it
     netdata_rwlock_wrlock(&host->health_log.alarm_log_rwlock);
     ae->next = host->health_log.alarms;
diff --git a/health/notifications/Makefile.am b/health/notifications/Makefile.am
index f026171a7..3114abc4e 100644
--- a/health/notifications/Makefile.am
+++ b/health/notifications/Makefile.am
@@ -37,6 +37,7 @@ include irc/Makefile.inc
 include kavenegar/Makefile.inc
 include messagebird/Makefile.inc
 include msteams/Makefile.inc
+include ntfy/Makefile.inc
 include opsgenie/Makefile.inc
 include pagerduty/Makefile.inc
 include pushbullet/Makefile.inc
diff --git a/health/notifications/README.md b/health/notifications/README.md
index c59fecced..05efb3a06 100644
--- a/health/notifications/README.md
+++ b/health/notifications/README.md
@@ -1,72 +1,107 @@
-<!--
-title: "Alarm notifications"
-description: "Reference documentation for Netdata's alarm notification feature, which supports dozens of endpoints, user roles, and more."
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/README.md"
-sidebar_label: "Notifications Reference"
-learn_status: "Published"
-learn_topic_type: "Tasks"
-learn_rel_path: "Operations/Alerts"
--->
+# Agent alert notifications
 
-# Alarm notifications
+This is a reference documentation for Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more.
 
-The `exec` line in health configuration defines an external script that will be called once
-the alarm is triggered. The default script is `alarm-notify.sh`.
+The `script to execute on alarm` line in `netdata.conf` defines the external script that will be called once the alert is triggered.
 
-You can change the default script globally by editing `/etc/netdata/netdata.conf`.
+The default script is `alarm-notify.sh`.
+
+> ### Info
+>
+> This file mentions editing configuration files.
+>
+> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically.
+> Note that to run the script you need to be inside your Netdata config directory.
+>
+> - Please also note that after most configuration changes you will need to [restart the Agent](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for the changes to take effect.
+>
+> It is recommended to use this way for configuring Netdata.
+
+You can change the default script globally by editing `netdata.conf` and changing the `script to execute on alarm` in the `[health]` section.
 
 `alarm-notify.sh` is capable of sending notifications:
 
-- to multiple recipients
-- using multiple notification methods
-- filtering severity per recipient
+- to multiple recipients
+- using multiple notification methods
+- filtering severity per recipient
 
 It uses **roles**. For example `sysadmin`, `webmaster`, `dba`, etc.
 
-Each alarm is assigned to one or more roles, using the `to` line of the alarm configuration. Then `alarm-notify.sh` uses
-its own configuration file `/etc/netdata/health_alarm_notify.conf`. To edit it on your system, run
-`/etc/netdata/edit-config health_alarm_notify.conf` and find the destination address of the notification for each
-method.
+Each alert is assigned to one or more roles, using the `to` line of the alert configuration. For example, here is the alert configuration for `ram.conf` that defaults to the role `sysadmin`:
+
+```conf
+    alarm: ram_in_use
+       on: system.ram
+    class: Utilization
+     type: System
+component: Memory
+       os: linux
+    hosts: *
+     calc: $used * 100 / ($used + $cached + $free + $buffers)
+    units: %
+    every: 10s
+     warn: $this > (($status >= $WARNING) ? (80) : (90))
+     crit: $this > (($status == $CRITICAL) ? (90) : (98))
+    delay: down 15m multiplier 1.5 max 1h
+     info: system memory utilization
+       to: sysadmin
+```
+
+Then `alarm-notify.sh` uses its own configuration file `health_alarm_notify.conf`, which at the bottom of the file stores the recipients per role, for all notification methods.
+
+Here is an example, of the `sysadmin`'s role recipients for the email notification.
+You can send the notification to multiple recipients by separating the emails with a space.
+
+```conf
+
+###############################################################################
+# RECIPIENTS PER ROLE
+
+# -----------------------------------------------------------------------------
+# generic system alarms
+# CPU, disks, network interfaces, entropy, etc
 
-Each role may have one or more destinations.
+role_recipients_email[sysadmin]="someone@exaple.com someoneelse@example.com"
+```
+
+Each role may have one or more destinations and one or more notification methods.
 
 So, for example the `sysadmin` role may send:
 
-1. emails to admin1@example.com and admin2@example.com
-2. pushover.net notifications to USERTOKENS `A`, `B` and `C`.
-3. pushbullet.com push notifications to admin1@example.com and admin2@example.com
-4. messages to slack.com channel `#alarms` and `#systems`.
-5. messages to Discord channels `#alarms` and `#systems`.
+1. emails to admin1@example.com and admin2@example.com
+2. pushover.net notifications to USERTOKENS `A`, `B` and `C`.
+3. pushbullet.com push notifications to admin1@example.com and admin2@example.com
+4. messages to the `#alerts` and `#systems` channels of a Slack workspace.
+5. messages to Discord channels `#alerts` and `#systems`.
 
 ## Configuration
 
-Edit `/etc/netdata/health_alarm_notify.conf` by running `/etc/netdata/edit-config health_alarm_notify.conf`:
+You can edit `health_alarm_notify.conf` using the `edit-config` script to configure:
 
-- settings per notification method:
+- **Settings** per notification method:
 
-  all notification methods except email, require some configuration
-  (i.e. API keys, tokens, destination rooms, channels, etc).
+  All notification methods except email, require some configuration (i.e. API keys, tokens, destination rooms, channels, etc). Please check this section's content to find the configuration guides for your notification option of choice
 
-- **recipients** per **role** per **notification method**
+- **Recipients** per role per notification method
 
-```sh
-grep sysadmin /etc/netdata/health_alarm_notify.conf
-
-role_recipients_email[sysadmin]="${DEFAULT_RECIPIENT_EMAIL}"
-role_recipients_pushover[sysadmin]="${DEFAULT_RECIPIENT_PUSHOVER}"
-role_recipients_pushbullet[sysadmin]="${DEFAULT_RECIPIENT_PUSHBULLET}"
-role_recipients_telegram[sysadmin]="${DEFAULT_RECIPIENT_TELEGRAM}"
-role_recipients_slack[sysadmin]="${DEFAULT_RECIPIENT_SLACK}"
-...
-```
+  ```conf
+  role_recipients_email[sysadmin]="${DEFAULT_RECIPIENT_EMAIL}"
+  role_recipients_pushover[sysadmin]="${DEFAULT_RECIPIENT_PUSHOVER}"
+  role_recipients_pushbullet[sysadmin]="${DEFAULT_RECIPIENT_PUSHBULLET}"
+  role_recipients_telegram[sysadmin]="${DEFAULT_RECIPIENT_TELEGRAM}"
+  role_recipients_slack[sysadmin]="${DEFAULT_RECIPIENT_SLACK}"
+  ...
+  ```
 
-## Testing Notifications
+  Here you can change the `${DEFAULT_...}` values to the values of the recipients you want, separated by a space if you have multiple recipients.
 
-You can run the following command by hand, to test alarms configuration:
+## Testing Alert Notifications
+
+You can run the following command by hand, to test alerts configuration:
 
 ```sh
 # become user netdata
-su -s /bin/bash netdata
+sudo su -s /bin/bash netdata
 
 # enable debugging info on the console
 export NETDATA_ALARM_NOTIFY_DEBUG=1
@@ -78,13 +113,95 @@ export NETDATA_ALARM_NOTIFY_DEBUG=1
 /usr/libexec/netdata/plugins.d/alarm-notify.sh test "ROLE"
 ```
 
-Note that in versions before 1.16, the plugins.d directory may be installed in a different location in certain OSs (e.g. under `/usr/lib/netdata`). You can always find the location of the alarm-notify.sh script in `netdata.conf`.
+If you are [running your own registry](https://github.com/netdata/netdata/blob/master/registry/README.md#run-your-own-registry), add `export NETDATA_REGISTRY_URL=[YOUR_URL]` before calling `alarm-notify.sh`.
+
+> If you need to dig even deeper, you can trace the execution with `bash -x`. Note that in test mode, `alarm-notify.sh` calls itself with many more arguments. So first do:
+>
+>```sh
+>bash -x /usr/libexec/netdata/plugins.d/alarm-notify.sh test
+>```
+>
+> And then look in the output for the alarm-notify.sh calls and run the one you want to trace with `bash -x`.
+
+## Global configuration options
+
+### Notification Filtering
+
+When you define recipients per role for notification methods, you can append `|critical` to limit the notifications that are sent.
+
+In the following examples, the first recipient receives all the alarms, while the second one receives only notifications for alarms that have at some point become critical.
+The second user may still receive warning and clear notifications, but only for the event that previously caused a critical alarm.
+
+```conf
+ email      : "user1@example.com user2@example.com|critical"
+ pushover   : "2987343...9437837 8756278...2362736|critical"
+ telegram   : "111827421 112746832|critical"
+ slack      : "alarms disasters|critical"
+ alerta     : "alarms disasters|critical"
+ flock      : "alarms disasters|critical"
+ discord    : "alarms disasters|critical"
+ twilio     : "+15555555555 +17777777777|critical"
+ messagebird: "+15555555555 +17777777777|critical"
+ kavenegar  : "09155555555 09177777777|critical"
+ pd         : "<pd_service_key_1> <pd_service_key_2>|critical"
+ irc        : "<irc_channel_1> <irc_channel_2>|critical"
+```
+
+If a per role recipient is set to an empty string, the default recipient of the given
+notification method (email, pushover, telegram, slack, alerta, etc) will be used.
 
-If you need to dig even deeper, you can trace the execution with `bash -x`. Note that in test mode, alarm-notify.sh calls itself with many more arguments. So first do
+To disable a notification, use the recipient called: disabled
+This works for all notification methods (including the default recipients).
 
-```sh
-bash -x /usr/libexec/netdata/plugins.d/alarm-notify.sh test
+### Proxy configuration
+
+If you need to send curl based notifications (pushover, pushbullet, slack, alerta,
+flock, discord, telegram) via a proxy, you should set these variables to your proxy address:
+
+```conf
+export http_proxy="http://10.0.0.1:3128/"
+export https_proxy="http://10.0.0.1:3128/"
+```
+
+### Notification images
+
+Images in notifications need to be downloaded from an Internet facing site.
+
+To allow notification providers to fetch the icons/images, by default we set the URL of the global public netdata registry.
+
+If you have an Internet facing netdata (or you have copied the images/ folder
+of netdata to your web server), set its URL here, to fetch the notification
+images from it.
+
+```conf
+images_base_url="http://my.public.netdata.server:19999"
 ```
 
- Then look in the output for the alarm-notify.sh calls and run the one you want to trace with `bash -x`.
+### Date handling
+
+You can configure netdata alerts to send dates in any format you want via editing the `date_format` variable.
+
+This uses standard `date` command format strings. See `man date` for
+more info on what formats are supported.
+
+Note that this has to start with a '+', otherwise it won't work.
+
+- For ISO 8601 dates, use `+%FT%T%z`
+- For RFC 5322 dates, use `+%a, %d %b %Y %H:%M:%S %z`
+- For RFC 3339 dates, use `+%F %T%:z`
+- For RFC 1123 dates, use `+%a, %d %b %Y %H:%M:%S %Z`
+- For RFC 1036 dates, use `+%A, %d-%b-%y %H:%M:%S %Z`
+- For a reasonably local date and time (in that order), use `+%x %X`
+- For the old default behavior (compatible with ANSI C's `asctime()` function), leave the `date_format` field empty.
+
+### Hostname handling
+
+By default, Netdata will use the simple hostname for the system (the hostname with everything after the first `.` removed) when displaying the hostname in alert notifications.
+
+If you instead prefer to have Netdata use the host's fully qualified domain name, you can set `use_fdqn` to `YES`.
+
+This setting does not account for child systems for which the system you are configuring is a parent.
+> ### Note
+>
+> If the system's host name is overridden in `/etc/netdata.conf` with the `hostname` option, that name will be used unconditionally.
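Editor's sketch (not part of the patch): the `date_format` strings listed under "Date handling" can be tried with the `date` command before they are written to `health_alarm_notify.conf`. This is a minimal example, assuming GNU `date` (the `-d @0` epoch input and the `%:z` specifier are GNU extensions) and a fixed instant so the output is reproducible. `show_date` is a hypothetical helper for illustration and is not a function of `alarm-notify.sh`.

```sh
#!/bin/sh
# Render a fixed instant (the Unix epoch, in UTC, C locale) with a
# date_format value. Assumes GNU date; show_date is a hypothetical helper.
show_date() {
    # $1 is the format string; as the README notes, it must begin with '+'
    TZ=UTC LC_ALL=C date -d @0 "$1"
}

show_date '+%FT%T%z'                   # ISO 8601 style
show_date '+%a, %d %b %Y %H:%M:%S %z'  # RFC 5322 style
show_date '+%F %T%:z'                  # RFC 3339 style
```

Pinning the time zone and locale keeps the three lines stable across machines; in a real configuration the Agent's own time zone and locale would apply.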
diff --git a/health/notifications/alarm-notify.sh.in b/health/notifications/alarm-notify.sh.in
index 0090427a0..51c000218 100755
--- a/health/notifications/alarm-notify.sh.in
+++ b/health/notifications/alarm-notify.sh.in
@@ -39,6 +39,7 @@
 # - Stackpulse Event by @thiagoftsm
 # - Opsgenie by @thiaoftsm #9858
 # - Gotify by @coffeegrind123
+# - ntfy.sh by @Dim-P
 
 # -----------------------------------------------------------------------------
 # testing notifications
@@ -176,6 +177,7 @@ sms
 hangouts
 dynatrace
 matrix
+ntfy
 "
 
 # -----------------------------------------------------------------------------
@@ -199,7 +201,7 @@ fi
 [ -z "${NETDATA_STOCK_CONFIG_DIR}" ] && NETDATA_STOCK_CONFIG_DIR="@libconfigdir_POST@"
 [ -z "${NETDATA_CACHE_DIR}" ] && NETDATA_CACHE_DIR="@cachedir_POST@"
 [ -z "${NETDATA_REGISTRY_URL}" ] && NETDATA_REGISTRY_URL="https://registry.my-netdata.io"
-[ -z "${NETDATA_REGISTRY_CLOUD_BASE_URL}" ] && NETDATA_REGISTRY_CLOUD_BASE_URL="https://api.netdata.cloud"
+[ -z "${NETDATA_REGISTRY_CLOUD_BASE_URL}" ] && NETDATA_REGISTRY_CLOUD_BASE_URL="https://app.netdata.cloud"
 
 # -----------------------------------------------------------------------------
 # parse command line parameters
@@ -654,6 +656,9 @@ filter_recipient_by_criticality() {
 # check gotify
 { [ -z "${GOTIFY_APP_TOKEN}" ] || [ -z "${GOTIFY_APP_URL}" ]; } && SEND_GOTIFY="NO"
 
+# check ntfy
+[ -z "${DEFAULT_RECIPIENT_NTFY}" ] && SEND_NTFY="NO"
+
 # check stackpulse
 [ -z "${STACKPULSE_WEBHOOK}" ] && SEND_STACKPULSE="NO"
 
@@ -692,7 +697,8 @@ if [ "${SEND_PUSHOVER}" = "YES" ] ||
   [ "${SEND_DYNATRACE}" = "YES" ] ||
   [ "${SEND_STACKPULSE}" = "YES" ] ||
   [ "${SEND_OPSGENIE}" = "YES" ] ||
-  [ "${SEND_GOTIFY}" = "YES" ]; then
+  [ "${SEND_GOTIFY}" = "YES" ] ||
+  [ "${SEND_NTFY}" = "YES" ]; then
   # if we need curl, check for the curl command
   if [ -z "${curl}" ]; then
     curl="$(command -v curl 2>/dev/null)"
@@ -723,6 +729,7 @@ if [ "${SEND_PUSHOVER}" = "YES" ] ||
     SEND_STACKPULSE="NO"
     SEND_OPSGENIE="NO"
     SEND_GOTIFY="NO"
+    SEND_NTFY="NO"
   fi
 fi
 
@@ -863,7 +870,8 @@ for method in "${SEND_EMAIL}" \
   "${SEND_DYNATRACE}" \
   "${SEND_STACKPULSE}" \
   "${SEND_OPSGENIE}" \
-  "${SEND_GOTIFY}" ; do
+  "${SEND_GOTIFY}" \
+  "${SEND_NTFY}" ; do
 
   if [ "${method}" == "YES" ]; then
     proceed=1
@@ -2313,7 +2321,7 @@ EOF
 # Opsgenie sender
 
 send_opsgenie() {
-  local payload httpcode oldv currv
+  local payload httpcode oldv currv priority
   [ "${SEND_OPSGENIE}" != "YES" ] && return 1
 
   if [ -z "${OPSGENIE_API_KEY}" ] ; then
@@ -2321,6 +2329,14 @@ send_opsgenie() {
     return 1
   fi
 
+  # Priority for OpsGenie alert (https://support.atlassian.com/opsgenie/docs/update-alert-priority-level/)
+  case "${status}" in
+    CRITICAL) priority="P1" ;;  # Critical is P1
+    WARNING) priority="P3" ;;   # Warning is P3
+    CLEAR) priority="P5" ;;     # Clear is P5
+    *) priority="P3" ;;         # OpsGenie's default alert level is P3
+  esac
+
   # We are sending null when values are nan to avoid errors while JSON message is parsed
   [ "${old_value}" != "nan" ] && oldv="${old_value}" || oldv="null"
   [ "${value}" != "nan" ] && currv="${value}" || currv="null"
@@ -2335,6 +2351,7 @@ send_opsgenie() {
     "when": ${when},
     "name" : "${name}",
     "family" : "${family}",
+    "priority" : "${priority}",
     "status" : "${status}",
     "old_status" : "${old_status}",
     "value" : ${currv},
@@ -2403,6 +2420,50 @@ EOF
 }
 
 # -----------------------------------------------------------------------------
+# ntfy sender
+
+send_ntfy() {
+  local httpcode priority recipients=${1} sent=0 msg
+
+  [ "${SEND_NTFY}" != "YES" ] && return 1
+
+  case "${status}" in
+    WARNING) emoji="warning" ;;
+    CRITICAL) emoji="red_circle" ;;
+    CLEAR) emoji="white_check_mark" ;;
+    *) emoji="white_circle" ;;
+  esac
+
+  case ${status} in
+    WARNING) priority="high";;
+    CRITICAL) priority="urgent";;
+    *) priority="default" ;;
+  esac
+
+  for recipient in ${recipients}; do
+    msg="${host} ${status_message}: ${alarm} - ${info}"
+    httpcode=$(docurl -X POST \
+      -H "Icon: https://raw.githubusercontent.com/netdata/netdata/master/web/gui/dashboard/images/favicon-196x196.png" \
+      -H "Title: ${host}: ${name}" \
+      -H "Tags: ${emoji}" \
+      -H "Priority: ${priority}" \
+      -H "Actions: view, View node, ${goto_url}, clear=true;" \
+      -d "${msg}" \
+      ${recipient})
+    if [ "${httpcode}" == "200" ]; then
+      info "sent ntfy notification for: ${host} ${chart}.${name} is ${status} to '${recipient}'"
+      sent=$((sent + 1))
+    else
+      error "failed to send ntfy notification for: ${host} ${chart}.${name} is ${status} to '${recipient}', with HTTP response status code ${httpcode}."
+    fi
+  done
+
+  [ ${sent} -gt 0 ] && return 0
+
+  return 1
+}
+
+# -----------------------------------------------------------------------------
 # prepare the content of the notification
 
 # the url to send the user on click
@@ -3609,6 +3670,11 @@ send_gotify
 SENT_GOTIFY=$?
 
 # -----------------------------------------------------------------------------
+# send messages to ntfy
+send_ntfy "${DEFAULT_RECIPIENT_NTFY}"
+SENT_NTFY=$?
+
+# -----------------------------------------------------------------------------
 # let netdata know
 for state in "${SENT_EMAIL}" \
   "${SENT_PUSHOVER}" \
@@ -3638,7 +3704,8 @@ for state in "${SENT_EMAIL}" \
   "${SENT_DYNATRACE}" \
   "${SENT_STACKPULSE}" \
   "${SENT_OPSGENIE}" \
-  "${SENT_GOTIFY}"; do
+  "${SENT_GOTIFY}" \
+  "${SENT_NTFY}"; do
   if [ "${state}" -eq 0 ]; then
     # we sent something
     exit 0
diff --git a/health/notifications/alerta/README.md b/health/notifications/alerta/README.md
index 5ecf55eea..237b9a78e 100644
--- a/health/notifications/alerta/README.md
+++ b/health/notifications/alerta/README.md
@@ -1,86 +1,76 @@
-<!--
-title: "alerta.io"
-sidebar_label: "Alerta"
-description: "Send alarm notifications to Alerta to see the latest health status updates from multiple nodes in a single interface."
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/alerta/README.md"
-learn_status: "Published"
-learn_topic_type: "Tasks"
-learn_rel_path: "Setup/Notification/Agent"
-learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}"
--->
+# Alerta Agent alert notifications
 
-# alerta.io
+Learn how to send notifications to Alerta using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more.
 
-The [Alerta](https://alerta.io) monitoring system is a tool used to
-consolidate and de-duplicate alerts from multiple sources for quick
-‘at-a-glance’ visualisation. With just one system you can monitor
-alerts from many other monitoring tools on a single screen.
+> ### Note
+>
+> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works.
 
-![Alerta dashboard](https://docs.alerta.io/_images/alerta-screen-shot-3.png "Alerta dashboard showing several alerts.")
+The [Alerta](https://alerta.io) monitoring system is a tool used to consolidate and de-duplicate alerts from multiple sources for quick ‘at-a-glance’ visualization.
+With just one system you can monitor alerts from many other monitoring tools on a single screen.
 
-Alerta's advantage is the main view, where you can see all active alarms with the most recent state. You can also view an alert history. You can send Netdata alerts to Alerta to see alerts coming from many Netdata hosts or also from a multi-host
-Netdata configuration.
+![Alerta dashboard showing several alerts](https://docs.alerta.io/_images/alerta-screen-shot-3.png)
 
-## Deploying Alerta
+Alerta's advantage is the main view, where you can see all active alert with the most recent state.
+You can also view an alert history.
 
-The recommended setup is using a dedicated server, VM or container. If you have other NGINX or Apache servers in your organization,
-it is recommended to proxy to this new server.
+You can send Netdata alerts to Alerta to see alerts coming from many Netdata hosts or also from a multi-host Netdata configuration.
 
-You can install Alerta in several ways:
-- **Docker**: Alerta provides a [Docker image](https://hub.docker.com/r/alerta/alerta-web/) to get you started quickly.
-- **Deployment on Ubuntu server**: Alerta's [getting started tutorial](https://docs.alerta.io/gettingstarted/tutorial-1-deploy-alerta.html) walks you through this process.
-- **Advanced deployment scenarios**: More ways to install and deploy Alerta are documented on the [Alerta docs](http://docs.alerta.io/en/latest/deployment.html).
+## Prerequisites
 
-## Sending alerts to Alerta
+You need:
 
-### Step 1. Create an API key (if authentication in Alerta is enabled)
+- an Alerta instance
+- an Alerta API key (if authentication in Alerta is enabled)
+- terminal access to the Agent you wish to configure
 
-You will need an API key to send messages from any source, if
-Alerta is configured to use authentication (recommended).
+## Configure Netdata to send alert notifications to Alerta
 
-Create a new API key in Alerta:
-1. Go to *Configuration* > *API Keys*
-2. Create a new API key called "netdata" with `write:alerts` permission.
+> ### Info
+>
+> This file mentions editing configuration files.
+>
+> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically.
+> Note that to run the script you need to be inside your Netdata config directory.
+> +> It is recommended to use this way for configuring Netdata. -### Step 2. Configure Netdata to send alerts to Alerta -1. Edit the `health_alarm_notify.conf` by running: -```sh -/etc/netdata/edit-config health_alarm_notify.conf -``` +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -2. Modify the file as below: -``` -# enable/disable sending alerta notifications -SEND_ALERTA="YES" +1. Set `SEND_ALERTA` to `YES`. +2. Set `ALERTA_WEBHOOK_URL` to the API URL you defined when you installed the Alerta server (without a trailing slash). +3. Set `ALERTA_API_KEY` to your API key. + You will need an API key to send messages from any source, if Alerta is configured to use authentication (recommended). To create a new API key: + 1. Go to *Configuration* > *API Keys*. + 2. Create a new API key called "netdata" with `write:alerts` permission. +4. Set `DEFAULT_RECIPIENT_ALERTA` to the default recipient environment you want the alert notifications to be sent to. + All roles will default to this variable if left unconfigured. -# here set your alerta server API url -# this is the API url you defined when installed Alerta server, -# it is the same for all users. Do not include last slash. -ALERTA_WEBHOOK_URL="http://yourserver/alerta/api" +You can then have different recipient environments per **role**, by setting the environment name you want in the following `role_recipients_alerta` entries at the bottom of the same file: -# Login with an administrative user to you Alerta server and create an API KEY -# with write permissions.
-ALERTA_API_KEY="INSERT_YOUR_API_KEY_HERE" - -# you can define environments in /etc/alertad.conf option ALLOWED_ENVIRONMENTS -# standard environments are Production and Development -# if a role's recipients are not configured, a notification will be send to -# this Environment (empty = do not send a notification for unconfigured roles): -DEFAULT_RECIPIENT_ALERTA="Production" +```conf +role_recipients_alerta[sysadmin]="Systems" +role_recipients_alerta[domainadmin]="Domains" +role_recipients_alerta[dba]="Databases Systems" +role_recipients_alerta[webmaster]="Marketing Development" +role_recipients_alerta[proxyadmin]="Proxy" +role_recipients_alerta[sitemgr]="Sites" ``` -## Test alarms +The values you provide should be defined as environments in `/etc/alertad.conf` option `ALLOWED_ENVIRONMENTS`. -We can test alarms using the standard approach: +An example working configuration would be: -```sh -/opt/netdata/netdata-plugins/plugins.d/alarm-notify.sh test -``` - -> **Note** This script will send 3 alarms. -> Alerta will not show the alerts in the main page, because last alarm is "CLEAR". -> To see the test alarms, you need to select "closed" alarms in the top-right lookup. +```conf +#------------------------------------------------------------------------------ +# alerta (alerta.io) global notification options -For more information see the [Alerta documentation](https://docs.alerta.io) +SEND_ALERTA="YES" +ALERTA_WEBHOOK_URL="http://yourserver/alerta/api" +ALERTA_API_KEY="INSERT_YOUR_API_KEY_HERE" +DEFAULT_RECIPIENT_ALERTA="Production" +``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
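Review note on the Alerta settings above: a minimal sketch of a pre-flight check for the four variables this README asks you to set. `check_alerta_conf` is a hypothetical helper, not part of Netdata, and it has only been reasoned about against a small fragment containing these variables, not against a full `health_alarm_notify.conf`. The trailing-slash rule comes from the comment removed in this diff ("Do not include last slash").

```sh
# Hypothetical helper (not shipped with Netdata): sanity-check the Alerta
# settings in a shell-style config fragment. Prints "ok" on success.
check_alerta_conf() {
    local conf="$1"
    (
        # Run in a subshell so sourcing the file cannot leak into the caller.
        SEND_ALERTA="" ALERTA_WEBHOOK_URL="" ALERTA_API_KEY="" DEFAULT_RECIPIENT_ALERTA=""
        . "$conf" || exit 2

        [ "$SEND_ALERTA" = "YES" ] || { echo "SEND_ALERTA is not YES"; exit 1; }
        [ -n "$ALERTA_WEBHOOK_URL" ] || { echo "ALERTA_WEBHOOK_URL is empty"; exit 1; }
        case "$ALERTA_WEBHOOK_URL" in
            */) echo "ALERTA_WEBHOOK_URL must not end with a slash"; exit 1 ;;
        esac

        # These two are legitimate to leave empty, so only warn on stderr.
        [ -n "$ALERTA_API_KEY" ] || echo "warning: no API key (fine only if Alerta auth is disabled)" >&2
        [ -n "$DEFAULT_RECIPIENT_ALERTA" ] || echo "warning: unconfigured roles will get no notification" >&2

        echo "ok"
    )
}
```

It could be invoked as `check_alerta_conf /etc/netdata/health_alarm_notify.conf`, assuming that is where your config lives. The check is deliberately read-only; it does not replace the notification test described in the "Test the notification method" section.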
diff --git a/health/notifications/awssns/README.md b/health/notifications/awssns/README.md index 97768399e..f02e70912 100644 --- a/health/notifications/awssns/README.md +++ b/health/notifications/awssns/README.md @@ -1,31 +1,29 @@ -<!-- -title: "Amazon SNS" -sidebar_label: "Amazon SNS" -description: "hello" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/awssns/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Amazon SNS - -As part of its AWS suite, Amazon provides a notification broker service called 'Simple Notification Service' (SNS). Amazon SNS works similarly to Netdata's own notification system, allowing to dispatch a single notification to multiple subscribers of different types. While Amazon SNS supports sending differently formatted messages for different delivery methods, Netdata does not currently support this functionality. -Among other things, SNS supports sending notifications to: - -- Email addresses. -- Mobile Phones via SMS. -- HTTP or HTTPS web hooks. -- AWS Lambda functions. -- AWS SQS queues. -- Mobile applications via push notifications. - -For email notification support, we recommend using Netdata's email notifications, as it is has the following benefits: +# Amazon SNS Agent alert notifications + +Learn how to send notifications through Amazon SNS using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. + +As part of its AWS suite, Amazon provides a notification broker service called 'Simple Notification Service' (SNS). 
Amazon SNS works similarly to Netdata's own notification system, allowing you to dispatch a single notification to multiple subscribers of different types. Among other things, SNS supports sending notifications to: + +- email addresses +- mobile phones via SMS +- HTTP or HTTPS web hooks +- AWS Lambda functions +- AWS SQS queues +- mobile applications via push notifications + +> ### Note +> +> While Amazon SNS supports sending differently formatted messages for different delivery methods, Netdata does not currently support this functionality. + +For email notification support, we recommend using Netdata's [email notifications](https://github.com/netdata/netdata/blob/master/health/notifications/email/README.md), as they have the following benefits: - In most cases, it requires less configuration. - Netdata's emails are nicely pre-formatted and support features like threading, which requires a lot of manual effort in SNS. -- It is less resource intensive and more cost-efficient than SNS. +- It is less resource intensive and more cost-efficient than SNS. Read on to learn how to set up Amazon SNS in Netdata. @@ -33,26 +31,97 @@ Read on to learn how to set up Amazon SNS in Netdata. Before you can enable SNS, you need: -- The [Amazon Web Services CLI tools](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) (`awscli`). -- An actual home directory for the user you run Netdata as, instead of just using `/` as a home directory. The setup depends on the distribution, but `/var/lib/netdata` is the recommended directory. If you are using Netdata as a dedicated user, the permissions will already be correct. -- An Amazon SNS topic to send notifications to with one or more subscribers. The [Getting Started](https://docs.aws.amazon.com/sns/latest/dg/sns-getting-started.html) section of the Amazon SNS documentation covers the basics of how to set this up. Make note of the **Topic ARN** when you create the topic.
-- While not mandatory, it is highly recommended to create a dedicated IAM user on your account for Netdata to send notifications. This user needs to have programmatic access, and should only allow access to SNS. For an additional layer of security, you can create one for each system or group of systems. - -## Enabling Amazon SNS - -To enable SNS: -1. Run the following command as the user Netdata runs under: - ``` - aws configure - ``` -2. Enter the access key and secret key for accessing Amazon SNS. The system also prompts you to enter the default region and output format, but you can leave those blank because Netdata doesn't use them. - -3. Specify the desired topic ARN as a recipient, see [SNS documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/US_SetupSNS.html#set-up-sns-topic-cli). -4. Optional: To change the notification format for SNS notifications, change the `AWSSNS_MESSAGE_FORMAT` variable in `health_alarm_notify.conf`. -This variable supports all the same variables you can use in custom notifications. - - The default format looks like this: - ```bash - AWSSNS_MESSAGE_FORMAT="${status} on ${host} at ${date}: ${chart} ${value_string}" - ``` - +- The [Amazon Web Services CLI tools](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) (`awscli`). +- An actual home directory for the user you run Netdata as, instead of just using `/` as a home directory. + The setup depends on the distribution, but `/var/lib/netdata` is the recommended directory. If you are using Netdata as a dedicated user, the permissions will already be correct. +- An Amazon SNS topic to send notifications to with one or more subscribers. + The [Getting Started](https://docs.aws.amazon.com/sns/latest/dg/sns-getting-started.html) section of the Amazon SNS documentation covers the basics of how to set this up. Make note of the **Topic ARN** when you create the topic. 
+- While not mandatory, it is highly recommended to create a dedicated IAM user on your account for Netdata to send notifications. + This user needs to have programmatic access, and should only be allowed access to SNS. For an additional layer of security, you can create one for each system or group of systems. +- Terminal access to the Agent you wish to configure. + +## Configure Netdata to send alert notifications to Amazon SNS + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. + +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: + +1. Set `SEND_AWSSNS` to `YES`. +2. Set `AWSSNS_MESSAGE_FORMAT` to the format string you want the alert notification to use.
+ + The supported variables are: + + | Variable name | Description | + |:---------------------------:|:---------------------------------------------------------------------------------| + | `${alarm}` | Like "name = value units" | + | `${status_message}` | Like "needs attention", "recovered", "is critical" | + | `${severity}` | Like "Escalated to CRITICAL", "Recovered from WARNING" | + | `${raised_for}` | Like "(alarm was raised for 10 minutes)" | + | `${host}` | The host generated this event | + | `${url_host}` | Same as ${host} but URL encoded | + | `${unique_id}` | The unique id of this event | + | `${alarm_id}` | The unique id of the alarm that generated this event | + | `${event_id}` | The incremental id of the event, for this alarm id | + | `${when}` | The timestamp this event occurred | + | `${name}` | The name of the alarm, as given in netdata health.d entries | + | `${url_name}` | Same as ${name} but URL encoded | + | `${chart}` | The name of the chart (type.id) | + | `${url_chart}` | Same as ${chart} but URL encoded | + | `${family}` | The family of the chart | + | `${url_family}` | Same as ${family} but URL encoded | + | `${status}` | The current status : REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL | + | `${old_status}` | The previous status: REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL | + | `${value}` | The current value of the alarm | + | `${old_value}` | The previous value of the alarm | + | `${src}` | The line number and file the alarm has been configured | + | `${duration}` | The duration in seconds of the previous alarm state | + | `${duration_txt}` | Same as ${duration} for humans | + | `${non_clear_duration}` | The total duration in seconds this is/was non-clear | + | `${non_clear_duration_txt}` | Same as ${non_clear_duration} for humans | + | `${units}` | The units of the value | + | `${info}` | A short description of the alarm | + | `${value_string}` | Friendly value (with units) | + | `${old_value_string}` | 
Friendly old value (with units) | + | `${image}` | The URL of an image to represent the status of the alarm | + | `${color}` | A color in #AABBCC format for the alarm | + | `${goto_url}` | The URL the user can click to see the Netdata dashboard | + | `${calc_expression}` | The expression evaluated to provide the value for the alarm | + | `${calc_param_values}` | The value of the variables in the evaluated expression | + | `${total_warnings}` | The total number of alarms in WARNING state on the host | + | `${total_critical}` | The total number of alarms in CRITICAL state on the host | + +3. Set `DEFAULT_RECIPIENT_AWSSNS` to the Topic ARN you noted down upon creating the Topic. + All roles will default to this variable if left unconfigured. + +You can then have different recipient Topics per **role**, by setting the Topic ARN you want in the following `role_recipients_awssns` entries at the bottom of the same file: + +```conf +role_recipients_awssns[sysadmin]="arn:aws:sns:us-east-2:123456789012:Systems" +role_recipients_awssns[domainadmin]="arn:aws:sns:us-east-2:123456789012:Domains" +role_recipients_awssns[dba]="arn:aws:sns:us-east-2:123456789012:Databases" +role_recipients_awssns[webmaster]="arn:aws:sns:us-east-2:123456789012:Development" +role_recipients_awssns[proxyadmin]="arn:aws:sns:us-east-2:123456789012:Proxy" +role_recipients_awssns[sitemgr]="arn:aws:sns:us-east-2:123456789012:Sites" +``` + +An example working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# Amazon SNS notifications + +SEND_AWSSNS="YES" +AWSSNS_MESSAGE_FORMAT="${status} on ${host} at ${date}: ${chart} ${value_string}" +DEFAULT_RECIPIENT_AWSSNS="arn:aws:sns:us-east-2:123456789012:MyTopic" +``` + +## Test the notification method + +To test this alert notification method refer to the ["Testing Alert
Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/custom/README.md b/health/notifications/custom/README.md index df8f88e40..ad64cea27 100644 --- a/health/notifications/custom/README.md +++ b/health/notifications/custom/README.md @@ -1,29 +1,128 @@ -<!-- -title: "Custom" -sidebar_label: "Custom endpoint" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/custom/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Custom - -Netdata allows you to send custom notifications to any endpoint you choose. - -To configure custom notifications, you will need to customize `health_alarm_notify.conf`. Open the file for editing -using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) from the [Netdata config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory), which is typically at `/etc/netdata`. +# Custom Agent alert notifications + +Netdata Agent's alert notification feature allows you to send custom notifications to any endpoint you choose. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. + +## Prerequisites + +You need to have terminal access to the Agent you wish to configure. + +## Configure Netdata to send alert notifications to a custom endpoint + +> ### Info +> +> This file mentions editing configuration files. 
+> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. + +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: + +1. Set `SEND_CUSTOM` to `YES`. +2. The `DEFAULT_RECIPIENT_CUSTOM`'s value is dependent on how you handle the `${to}` variable inside the `custom_sender()` function. + All roles will default to this variable if left unconfigured. +3. Edit the `custom_sender()` function. + You can look at the other senders in `/usr/libexec/netdata/plugins.d/alarm-notify.sh` for examples of how to modify the function in this configuration file. 
+ + The following is a sample `custom_sender()` function in `health_alarm_notify.conf`, to send an SMS via an imaginary HTTPS endpoint to the SMS gateway: + + ```sh + custom_sender() { + # example human readable SMS + local msg="${host} ${status_message}: ${alarm} ${raised_for}" + + # limit it to 160 characters and encode it for use in a URL + urlencode "${msg:0:160}" >/dev/null; msg="${REPLY}" + + # a space separated list of the recipients to send alarms to + to="${1}" + + for phone in ${to}; do + httpcode=$(docurl -X POST \ + --data-urlencode "From=XXX" \ + --data-urlencode "To=${phone}" \ + --data-urlencode "Body=${msg}" \ + -u "${accountsid}:${accounttoken}" \ + https://domain.website.com/) + + if [ "${httpcode}" = "200" ]; then + info "sent custom notification ${msg} to ${phone}" + sent=$((sent + 1)) + else + error "failed to send custom notification ${msg} to ${phone} with HTTP error code ${httpcode}." + fi + done + } + ``` + + The supported variables that you can use for the function's `msg` variable are: + + | Variable name | Description | + |:---------------------------:|:---------------------------------------------------------------------------------| + | `${alarm}` | Like "name = value units" | + | `${status_message}` | Like "needs attention", "recovered", "is critical" | + | `${severity}` | Like "Escalated to CRITICAL", "Recovered from WARNING" | + | `${raised_for}` | Like "(alarm was raised for 10 minutes)" | + | `${host}` | The host generated this event | + | `${url_host}` | Same as ${host} but URL encoded | + | `${unique_id}` | The unique id of this event | + | `${alarm_id}` | The unique id of the alarm that generated this event | + | `${event_id}` | The incremental id of the event, for this alarm id | + | `${when}` | The timestamp this event occurred | + | `${name}` | The name of the alarm, as given in netdata health.d entries | + | `${url_name}` | Same as ${name} but URL encoded | + | `${chart}` | The name of the chart (type.id) | + | 
`${url_chart}` | Same as ${chart} but URL encoded | + | `${family}` | The family of the chart | + | `${url_family}` | Same as ${family} but URL encoded | + | `${status}` | The current status: REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL | + | `${old_status}` | The previous status: REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL | + | `${value}` | The current value of the alarm | + | `${old_value}` | The previous value of the alarm | + | `${src}` | The line number and file in which the alarm was configured | + | `${duration}` | The duration in seconds of the previous alarm state | + | `${duration_txt}` | Same as ${duration} for humans | + | `${non_clear_duration}` | The total duration in seconds this is/was non-clear | + | `${non_clear_duration_txt}` | Same as ${non_clear_duration} for humans | + | `${units}` | The units of the value | + | `${info}` | A short description of the alarm | + | `${value_string}` | Friendly value (with units) | + | `${old_value_string}` | Friendly old value (with units) | + | `${image}` | The URL of an image to represent the status of the alarm | + | `${color}` | A color in #AABBCC format for the alarm | + | `${goto_url}` | The URL the user can click to see the Netdata dashboard | + | `${calc_expression}` | The expression evaluated to provide the value for the alarm | + | `${calc_param_values}` | The value of the variables in the evaluated expression | + | `${total_warnings}` | The total number of alarms in WARNING state on the host | + | `${total_critical}` | The total number of alarms in CRITICAL state on the host | + +You can then have a different `${to}` value per **role**, by setting the recipients you want in the following `role_recipients_custom` entries at the bottom of the same file: + +```conf +role_recipients_custom[sysadmin]="systems" +role_recipients_custom[domainadmin]="domains" +role_recipients_custom[dba]="databases systems" +role_recipients_custom[webmaster]="marketing development"
+role_recipients_custom[proxyadmin]="proxy-admin" +role_recipients_custom[sitemgr]="sites" +``` -You can look at the other senders in `/usr/libexec/netdata/plugins.d/alarm-notify.sh` for examples of how to modify the `custom_sender()` function in `health_alarm_notify.conf`. +An example working configuration would be: -As with other notifications, you will also need to define the recipient list in `DEFAULT_RECIPIENT_CUSTOM` and/or the `role_recipients_custom` array. +```conf +#------------------------------------------------------------------------------ +# custom notifications -The following is a sample `custom_sender` function in `health_alarm_notify.conf`, to send an SMS via an imaginary HTTPS endpoint to the SMS gateway: +SEND_CUSTOM="YES" +DEFAULT_RECIPIENT_CUSTOM="" -``` - custom_sender() { +# The custom_sender() is a custom function to do whatever you need to do +custom_sender() { # example human readable SMS local msg="${host} ${status_message}: ${alarm} ${raised_for}" @@ -35,63 +134,22 @@ The following is a sample `custom_sender` function in `health_alarm_notify.conf` for phone in ${to}; do httpcode=$(docurl -X POST \ - --data-urlencode "From=XXX" \ - --data-urlencode "To=${phone}" \ - --data-urlencode "Body=${msg}" \ - -u "${accountsid}:${accounttoken}" \ + --data-urlencode "From=XXX" \ + --data-urlencode "To=${phone}" \ + --data-urlencode "Body=${msg}" \ + -u "${accountsid}:${accounttoken}" \ https://domain.website.com/) - if [ "${httpcode}" = "200" ]; then - info "sent custom notification ${msg} to ${phone}" - sent=$((sent + 1)) - else - error "failed to send custom notification ${msg} to ${phone} with HTTP error code ${httpcode}." - fi + if [ "${httpcode}" = "200" ]; then + info "sent custom notification ${msg} to ${phone}" + sent=$((sent + 1)) + else + error "failed to send custom notification ${msg} to ${phone} with HTTP error code ${httpcode}." 
+ fi done } ``` -Variables available to the custom_sender: - -- `${to_custom}` the list of recipients for the alarm -- `${host}` the host generated this event -- `${url_host}` same as `${host}` but URL encoded -- `${unique_id}` the unique id of this event -- `${alarm_id}` the unique id of the alarm that generated this event -- `${event_id}` the incremental id of the event, for this alarm id -- `${when}` the timestamp this event occurred -- `${name}` the name of the alarm, as given in Netdata health.d entries -- `${url_name}` same as `${name}` but URL encoded -- `${chart}` the name of the chart (type.id) -- `${url_chart}` same as `${chart}` but URL encoded -- `${family}` the family of the chart -- `${url_family}` same as `${family}` but URL encoded -- `${status}` the current status : REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL -- `${old_status}` the previous status: REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL -- `${value}` the current value of the alarm -- `${old_value}` the previous value of the alarm -- `${src}` the line number and file the alarm has been configured -- `${duration}` the duration in seconds of the previous alarm state -- `${duration_txt}` same as `${duration}` for humans -- `${non_clear_duration}` the total duration in seconds this is/was non-clear -- `${non_clear_duration_txt}` same as `${non_clear_duration}` for humans -- `${units}` the units of the value -- `${info}` a short description of the alarm -- `${value_string}` friendly value (with units) -- `${old_value_string}` friendly old value (with units) -- `${image}` the URL of an image to represent the status of the alarm -- `${color}` a color in #AABBCC format for the alarm -- `${goto_url}` the URL the user can click to see the Netdata dashboard -- `${calc_expression}` the expression evaluated to provide the value for the alarm -- `${calc_param_values}` the value of the variables in the evaluated expression -- `${total_warnings}` the total number of alarms in 
WARNING state on the host -- `${total_critical}` the total number of alarms in CRITICAL state on the host - -The following are more human friendly: - -- `${alarm}` like "name = value units" -- `${status_message}` like "needs attention", "recovered", "is critical" -- `${severity}` like "Escalated to CRITICAL", "Recovered from WARNING" -- `${raised_for}` like "(alarm was raised for 10 minutes)" - +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/discord/README.md b/health/notifications/discord/README.md index b4cbce533..b4aa7fd95 100644 --- a/health/notifications/discord/README.md +++ b/health/notifications/discord/README.md @@ -1,53 +1,71 @@ -<!-- -title: "Discord.com" -sidebar_label: "Discord" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/discord/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Discord.com +# Discord Agent alert notifications + +Learn how to send notifications to Discord using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. This is what you will get: ![image](https://cloud.githubusercontent.com/assets/7321975/22215935/b49ede7e-e162-11e6-98d0-ae8541e6b92e.png) -You need: +## Prerequisites -1. The **incoming webhook URL** as given by Discord. 
Create a webhook by following the official [Discord documentation](https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks). You can use the same on all your Netdata servers (or you can have multiple if you like - your decision). -2. One or more Discord channels to post the messages to. +You will need: -Set them in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +- The **incoming webhook URL** as given by Discord. + Create a webhook by following the official [Discord documentation](https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks). You can use the same on all your Netdata servers (or you can have multiple if you like - your decision). +- one or more Discord channels to post the messages to +- terminal access to the Agent you wish to configure -``` -############################################################################### -# sending discord notifications +## Configure Netdata to send alert notifications to Discord -# note: multiple recipients can be given like this: -# "CHANNEL1 CHANNEL2 ..." +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. 
-# enable/disable sending discord notifications -SEND_DISCORD="YES" +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -# Create a webhook by following the official documentation - -# https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks -DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/XXXXXXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" +1. Set `SEND_DISCORD` to `YES`. +2. Set `DISCORD_WEBHOOK_URL` to your webhook URL. +3. Set `DEFAULT_RECIPIENT_DISCORD` to the channel you want the alert notifications to be sent to. + You can define multiple channels like this: `alerts systems`. + All roles will default to this variable if left unconfigured. -# if a role's recipients are not configured, a notification will be send to -# this discord channel (empty = do not send a notification for unconfigured -# roles): -DEFAULT_RECIPIENT_DISCORD="alarms" -``` + > ### Note + > + > You don't have to include the hashtag "#" of the channel, just its name. -You can define multiple channels like this: `alarms systems`. -You can give different channels per **role** using these (at the same file): +You can then have different channels per **role**, by setting the channel you want in the following `role_recipients_discord` entries at the bottom of the same file: -``` +```conf role_recipients_discord[sysadmin]="systems" +role_recipients_discord[domainadmin]="domains" role_recipients_discord[dba]="databases systems" role_recipients_discord[webmaster]="marketing development" +role_recipients_discord[proxyadmin]="proxy-admin" +role_recipients_discord[sitemgr]="sites" ``` -The keywords `systems`, `databases`, `marketing`, `development` are discord.com channels (they should already exist within your discord server). +The values you provide should already exist as Discord channels in your server.
+ +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# discord (discordapp.com) global notification options + +SEND_DISCORD="YES" +DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/XXXXXXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" +DEFAULT_RECIPIENT_DISCORD="alerts" +``` + +## Test the notification method + +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/dynatrace/README.md b/health/notifications/dynatrace/README.md index a36683933..7665d0ca2 100644 --- a/health/notifications/dynatrace/README.md +++ b/health/notifications/dynatrace/README.md @@ -1,39 +1,66 @@ -<!-- -title: "Dynatrace" -sidebar_label: "Dynatrace Events" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/dynatrace/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Dynatrace +# Dynatrace Agent alert notifications + +Learn how to send notifications to Dynatrace using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. Dynatrace allows you to receive notifications using their Events REST API. -See [the Dynatrace documentation](https://www.dynatrace.com/support/help/extend-dynatrace/dynatrace-api/environment-api/events/post-event/) about POSTing an event in the Events API for more details. - - - -You need: - -1. 
Dynatrace Server. You can use the same on all your Netdata servers but make sure the server is network visible from your Netdata hosts. -The Dynatrace server should be with protocol prefixed (`http://` or `https://`). For example: `https://monitor.example.com` -This is a required parameter. -2. API Token. Generate a secure access API token that enables access to your Dynatrace monitoring data via the REST-based API. -Generate a Dynatrace API authentication token. On your Dynatrace server, go to **Settings** --> **Integration** --> **Dynatrace API** --> **Generate token**. -See [Dynatrace API - Authentication](https://www.dynatrace.com/support/help/extend-dynatrace/dynatrace-api/basics/dynatrace-api-authentication/) for more details. -This is a required parameter. -3. API Space. This is the URL part of the page you have access in order to generate the API Token. For example, the URL - for a generated API token might look like: - `https://monitor.illumineit.com/e/2a93fe0e-4cd5-469a-9d0d-1a064235cfce/#settings/integration/apikeys;gf=all` In that - case, my space is _2a93fe0e-4cd5-469a-9d0d-1a064235cfce_ This is a required parameter. -4. Generate a Server Tag. On your Dynatrace Server, go to **Settings** --> **Tags** --> **Manually applied tags** and create the Tag. -The Netdata alarm is sent as a Dynatrace Event to be correlated with all those hosts tagged with this Tag you have created. -This is a required parameter. -5. Specify the Dynatrace event. This can be one of `CUSTOM_INFO`, `CUSTOM_ANNOTATION`, `CUSTOM_CONFIGURATION`, and `CUSTOM_DEPLOYMENT`. -The default value is `CUSTOM_INFO`. -This is a required parameter. -6. Specify the annotation type. This is the source of the Dynatrace event. Put whatever it fits you, for example, -_Netdata Alarm_, which is also the default value. +See [the Dynatrace documentation](https://www.dynatrace.com/support/help/dynatrace-api/environment-api/events-v2/post-event) about POSTing an event in the Events API for more details. 
+ +## Prerequisites + +You will need: + +- A Dynatrace Server. You can use the same on all your Netdata servers but make sure the server is network visible from your Netdata hosts. + The Dynatrace server should be with protocol prefixed (`http://` or `https://`), for example: `https://monitor.example.com`. +- An API Token. Generate a secure access API token that enables access to your Dynatrace monitoring data via the REST-based API. + See [Dynatrace API - Authentication](https://www.dynatrace.com/support/help/extend-dynatrace/dynatrace-api/basics/dynatrace-api-authentication/) for more details. +- An API Space. This is the URL part of the page you have access in order to generate the API Token. + For example, the URL for a generated API token might look like: `https://monitor.illumineit.com/e/2a93fe0e-4cd5-469a-9d0d-1a064235cfce/#settings/integration/apikeys;gf=all` In that case, the Space is `2a93fe0e-4cd5-469a-9d0d-1a064235cfce`. +- A Server Tag. To generate one on your Dynatrace Server, go to **Settings** --> **Tags** --> **Manually applied tags** and create the Tag. + The Netdata alarm is sent as a Dynatrace Event to be correlated with all those hosts tagged with this Tag you have created. +- terminal access to the Agent you wish to configure + +## Configure Netdata to send alert notifications to Dynatrace + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. 
+ +Edit `health_alarm_notify.conf`: + +1. Set `SEND_DYNATRACE` to `YES`. +2. Set `DYNATRACE_SERVER` to the Dynatrace server with the protocol prefix, for example `https://monitor.example.com`. +3. Set `DYNATRACE_TOKEN` to your Dynatrace API authentication token +4. Set `DYNATRACE_SPACE` to the API Space, it is the URL part of the page you have access in order to generate the API Token. For example, the URL for a generated API token might look like: `https://monitor.illumineit.com/e/2a93fe0e-4cd5-469a-9d0d-1a064235cfce/#settings/integration/apikeys;gf=all` In that case, the Space is `2a93fe0e-4cd5-469a-9d0d-1a064235cfce`. +5. Set `DYNATRACE_TAG_VALUE` to your Dynatrace Server Tag. +6. `DYNATRACE_ANNOTATION_TYPE` can be left to its default value `Netdata Alarm`, but you can change it to better fit your needs. +7. Set `DYNATRACE_EVENT` to the Dynatrace `eventType` you want, possible values are: + `AVAILABILITY_EVENT`, `CUSTOM_ALERT`, `CUSTOM_ANNOTATION`, `CUSTOM_CONFIGURATION`, `CUSTOM_DEPLOYMENT`, `CUSTOM_INFO`, `ERROR_EVENT`, `MARKED_FOR_TERMINATION`, `PERFORMANCE_EVENT`, `RESOURCE_CONTENTION_EVENT`. You can read more [here](https://www.dynatrace.com/support/help/dynatrace-api/environment-api/events-v2/post-event#request-body-objects) + +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# Dynatrace global notification options + +SEND_DYNATRACE="YES" +DYNATRACE_SERVER="https://monitor.example.com" +DYNATRACE_TOKEN="XXXXXXX" +DYNATRACE_SPACE="2a93fe0e-4cd5-469a-9d0d-1a064235cfce" +DYNATRACE_TAG_VALUE="SERVERTAG" +DYNATRACE_ANNOTATION_TYPE="Netdata Alert" +DYNATRACE_EVENT="AVAILABILITY_EVENT" +``` + +## Test the notification method + +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
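As a rough sanity check for the Dynatrace settings listed above, the following sketch sources a configuration file and confirms that the four required values are set and that the server carries a protocol prefix. `check_dynatrace_conf` is a hypothetical helper, not something shipped with Netdata, and it assumes the file consists of shell-style `NAME="value"` assignments, as in the example configuration.

```sh
# Hypothetical helper, not part of Netdata.
# Assumes the file holds shell-style NAME="value" assignments.
check_dynatrace_conf() (
    # The function body is a subshell, so sourced variables do not leak out.
    . "$1" || exit 1
    for name in DYNATRACE_SERVER DYNATRACE_TOKEN DYNATRACE_SPACE DYNATRACE_TAG_VALUE; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            exit 1
        fi
    done
    # The server must be given with its protocol prefix.
    case "$DYNATRACE_SERVER" in
        http://*|https://*) echo "ok" ;;
        *) echo "DYNATRACE_SERVER needs an http:// or https:// prefix"; exit 1 ;;
    esac
)
```

Call it with a full path, for example `check_dynatrace_conf /etc/netdata/health_alarm_notify.conf`. A real configuration file may contain bash-only syntax, so bash is probably the safer shell to run this in.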
diff --git a/health/notifications/email/README.md b/health/notifications/email/README.md index 01dfd0e6f..2470ac4d7 100644 --- a/health/notifications/email/README.md +++ b/health/notifications/email/README.md @@ -1,53 +1,83 @@ -<!-- -title: "Email" -sidebar_label: "Email" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/email/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': True, 'part_of_agent': True}" ---> +# Email Agent alert notifications -# Email +Learn how to send notifications via Email using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. -You need a working `sendmail` command for email alerts to work. Almost all MTAs provide a `sendmail` interface. +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. -Netdata sends all emails as user `netdata`, so make sure your `sendmail` works for local users. +Email notifications look like this: -email notifications look like this: +<img src="https://user-images.githubusercontent.com/1905463/133216974-a2ca0e4f-787b-4dce-b1b2-9996a8c5f718.png" alt="Email notification screenshot" width="50%"></img> -![image](https://user-images.githubusercontent.com/1905463/133216974-a2ca0e4f-787b-4dce-b1b2-9996a8c5f718.png) +## Prerequisites -## Configuration +You will need: -To edit `health_alarm_notify.conf` on your system run `/etc/netdata/edit-config health_alarm_notify.conf`. +- A working `sendmail` command for email alerts to work. Almost all MTAs provide a `sendmail` interface. + Netdata sends all emails as user `netdata`, so make sure your `sendmail` works for local users. 
-You can configure recipients in [`/etc/netdata/health_alarm_notify.conf`](https://github.com/netdata/netdata/blob/99d44b7d0c4e006b11318a28ba4a7e7d3f9b3bae/conf.d/health_alarm_notify.conf#L101). + > ### Note + > + > If you are using our Docker images, or are running Netdata on a system that does not have a working `sendmail` command, see [the section below about using msmtp in place of sendmail](#using-msmtp-instead-of-sendmail). +- terminal access to the Agent you wish to configure -You can also configure per role recipients [in the same file, a few lines below](https://github.com/netdata/netdata/blob/99d44b7d0c4e006b11318a28ba4a7e7d3f9b3bae/conf.d/health_alarm_notify.conf#L313). +## Configure Netdata to send alerts via Email -Changes to this file do not require a Netdata restart. +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. -You can test your configuration by issuing the commands: +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -```sh -# become user netdata -sudo su -s /bin/bash netdata +1. You can change `EMAIL_SENDER` to the email address sending the notifications, the default is the system user Netdata runs as, usually being `netdata`. 
+ Supported formats are: -# send a test alarm -/usr/libexec/netdata/plugins.d/alarm-notify.sh test [ROLE] + ```conf + EMAIL_SENDER="user@domain" + EMAIL_SENDER="User Name <user@domain>" + EMAIL_SENDER="'User Name' <user@domain>" + EMAIL_SENDER="\"User Name\" <user@domain>" + ``` + +2. Set `SEND_EMAIL` to `YES`. +3. Set `DEFAULT_RECIPIENT_EMAIL` to the email address you want the email to be sent by default. + You can define multiple email addresses like this: `alarms@example.com systems@example.com`. + All roles will default to this variable if left unconfigured. +4. There are also other optional configuration entries that can be found in the same section of the file. + +You can then have different email addresses per **role**, by editing `DEFAULT_RECIPIENT_EMAIL` with the email address you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_email[sysadmin]="systems@example.com" +role_recipients_email[domainadmin]="domains@example.com" +role_recipients_email[dba]="databases@example.com systems@example.com" +role_recipients_email[webmaster]="marketing@example.com development@example.com" +role_recipients_email[proxyadmin]="proxy-admin@example.com" +role_recipients_email[sitemgr]="sites@example.com" ``` -Where `[ROLE]` is the role you want to test. The default (if you don't give a `[ROLE]`) is `sysadmin`. +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# email global notification options -Note that in versions before 1.16, the plugins.d directory may be installed in a different location in certain OSs (e.g. under `/usr/lib/netdata`). -You can always find the location of the alarm-notify.sh script in `netdata.conf`. 
+EMAIL_SENDER="example@domain.com" +SEND_EMAIL="YES" +DEFAULT_RECIPIENT_EMAIL="recipient@example.com" +``` -## Filtering +### Filtering Every notification email (both the plain text and the rich html versions) from the Netdata agent, contain a set of custom email headers that can be used for filtering using an email client. Example: -``` +```conf X-Netdata-Severity: warning X-Netdata-Alert-Name: inbound_packets_dropped_ratio X-Netdata-Chart: net_packets.enp2s0 @@ -57,26 +87,37 @@ X-Netdata-Host: winterland X-Netdata-Role: sysadmin ``` -## Simple SMTP transport configuration +### Using msmtp instead of sendmail -If you want an alternative to `sendmail` in order to have a simple MTA configuration for sending emails and auth to an existing SMTP server, you can do the following: +[msmtp](https://marlam.de/msmtp/) provides a simple alternative to a full-blown local mail server and `sendmail` +that will still allow you to send email notifications. It comes pre-installed in our Docker images, and is available +on most distributions in the system package repositories. -- Install `msmtp`. -- Modify the `sendmail` path in `health_alarm_notify.conf` to point to the location of `msmtp`: -``` -# The full path to the sendmail command. -# If empty, the system $PATH will be searched for it. -# If not found, email notifications will be disabled (silently). -sendmail="/usr/bin/msmtp" -``` -- Login as netdata : -```sh -(sudo) su -s /bin/bash netdata -``` -- Configure `~/.msmtprc` as shown [in the documentation](https://marlam.de/msmtp/documentation/). -- Finally set the appropriate permissions on the `.msmtprc` file : -```sh -chmod 600 ~/.msmtprc -``` +To use msmtp with Netdata for sending email alerts: + +1. If it’s not already installed, install msmtp. Most distributions have it in their package repositories with the package name `msmtp`. +2. 
Modify the `sendmail` path in `health_alarm_notify.conf` to point to the location of `msmtp`: + + ```conf + # The full path to the sendmail command. + # If empty, the system $PATH will be searched for it. + # If not found, email notifications will be disabled (silently). + sendmail="/usr/bin/msmtp" + ``` + +3. Login as netdata: + + ```sh + (sudo) su -s /bin/bash netdata + ``` + +4. Configure `~/.msmtprc` as shown [in the documentation](https://marlam.de/msmtp/documentation/). +5. Finally set the appropriate permissions on the `.msmtprc` file : + + ```sh + chmod 600 ~/.msmtprc + ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/flock/README.md b/health/notifications/flock/README.md index 175f8a466..daf50abf4 100644 --- a/health/notifications/flock/README.md +++ b/health/notifications/flock/README.md @@ -1,42 +1,66 @@ -<!-- -title: "Flock" -sidebar_label: "Flock" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/flock/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Flock +# Flock Agent alert notifications + +Learn how to send notifications to Flock using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. 
This is what you will get: ![Flock](https://i.imgur.com/ok9bRzw.png) -You need: +## Prerequisites + +You will need: + +- The **incoming webhook URL** as given by flock.com + You can use the same on all your Netdata servers (or you can have multiple if you like - your decision). + Read more about flock webhooks and how to get one [here](https://admin.flock.com/webhooks). +- Terminal access to the Agent you wish to configure + +## Configure Netdata to send alert notifications to Flock + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. -The **incoming webhook URL** as given by flock.com. -You can use the same on all your Netdata servers (or you can have multiple if you like - your decision). +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -Get them here: <https://admin.flock.com/webhooks> +1. Set `SEND_FLOCK` to `YES`. +2. Set `FLOCK_WEBHOOK_URL` to your webhook URL. +3. Set `DEFAULT_RECIPIENT_FLOCK` to the Flock channel you want the alert notifications to be sent to. + All roles will default to this variable if left unconfigured. 
-Set them in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +You can then have different channels per **role**, by editing `DEFAULT_RECIPIENT_FLOCK` with the channel you want, in the following entries at the bottom of the same file: +```conf +role_recipients_flock[sysadmin]="systems" +role_recipients_flock[domainadmin]="domains" +role_recipients_flock[dba]="databases systems" +role_recipients_flock[webmaster]="marketing development" +role_recipients_flock[proxyadmin]="proxy-admin" +role_recipients_flock[sitemgr]="sites" ``` -############################################################################### -# sending flock notifications -# enable/disable sending pushover notifications -SEND_FLOCK="YES" +The values you provide should already exist as Flock channels. -# Login to flock.com and create an incoming webhook. -# You need only one for all your Netdata servers. -# Without it, Netdata cannot send flock notifications. -FLOCK_WEBHOOK_URL="https://api.flock.com/hooks/sendMessage/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# flock (flock.com) global notification options -# if a role recipient is not configured, no notification will be sent +SEND_FLOCK="YES" +FLOCK_WEBHOOK_URL="https://api.flock.com/hooks/sendMessage/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" DEFAULT_RECIPIENT_FLOCK="alarms" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
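The linked testing section is not reproduced in this diff, but the README text removed elsewhere in this commit quotes the test command itself. A minimal sketch built on that command follows. The `send_test_alert` function and the `ALARM_NOTIFY` variable are hypothetical names used only here, and the script path is the one from the removed text, which may differ on your installation.

```sh
# Sketch only: path taken from the removed README text; adjust as needed.
ALARM_NOTIFY="${ALARM_NOTIFY:-/usr/libexec/netdata/plugins.d/alarm-notify.sh}"

send_test_alert() {
    # sysadmin is the default role when none is given.
    role="${1:-sysadmin}"
    if [ ! -x "$ALARM_NOTIFY" ]; then
        echo "alarm-notify.sh not found at $ALARM_NOTIFY" >&2
        return 1
    fi
    # Ask the notification script to send a test alert for the role.
    "$ALARM_NOTIFY" test "$role"
}
```

The removed instructions run the command as the `netdata` user (`sudo su -s /bin/bash netdata`), so the sketch is meant to be used from that shell, for example `send_test_alert sysadmin`.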
diff --git a/health/notifications/gotify/README.md b/health/notifications/gotify/README.md index d01502b65..4f6760f64 100644 --- a/health/notifications/gotify/README.md +++ b/health/notifications/gotify/README.md @@ -1,66 +1,49 @@ -<!-- -title: "Send notifications to Gotify" -description: "Send alerts to your Gotify instance when an alert gets triggered in Netdata." -sidebar_label: "Gotify" -custom_edit_url: https://github.com/netdata/netdata/edit/master/health/notifications/gotify/README.md -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Send notifications to Gotify +# Gotify agent alert notifications -[Gotify](https://gotify.net/) is a self-hosted push notification service created for sending and receiving messages in real time. - -## Configuring Gotify +Learn how to send alerts to your Gotify instance using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. -### Prerequisites +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. -To use Gotify as your notification service, you need an application token. -You can generate a new token in the Gotify Web UI. +[Gotify](https://gotify.net/) is a self-hosted push notification service created for sending and receiving messages in real time. -### Configuration +This is what you will get: -To set up Gotify in Netdata: +<img src="https://user-images.githubusercontent.com/103264516/162509205-1e88e5d9-96b6-4f7f-9426-182776158128.png" alt="Example alarm notifications in Gotify" width="70%"></img> -1. 
Switch to your [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) and edit the file `health_alarm_notify.conf` using the edit config script. - - ```bash - ./edit-config health_alarm_notify.conf - ``` +## Prerequisites -2. Change the variable `GOTIFY_APP_TOKEN` to the application token you generated in the Gotify Web UI. Change -`GOTIFY_APP_URL` to point to your Gotify instance. +You will need: - ```conf - SEND_GOTIFY="YES" +- An application token. You can generate a new token in the Gotify Web UI. +- terminal access to the Agent you wish to configure - # Application token - # Gotify instance url - GOTIFY_APP_TOKEN=XXXXXXXXXXXXXXX - GOTIFY_APP_URL=https://push.example.de/ - ``` +## Configure Netdata to send alert notifications to Gotify - Changes to `health_alarm_notify.conf` do not require a Netdata restart. - -3. Test your Gotify notifications configuration by running the following commands, replacing `ROLE` with your preferred role: +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. - ```sh - # become user netdata - sudo su -s /bin/bash netdata +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: - # send a test alarm - /usr/libexec/netdata/plugins.d/alarm-notify.sh test ROLE - ``` +1. Set `SEND_GOTIFY` to `YES` +2. Set `GOTIFY_APP_TOKEN` to the app token you generated +3. 
Set `GOTIFY_APP_URL` to point to your Gotify instance, for example `https://push.example.domain/` - 🟢 If everything works, you'll see alarms in Gotify: +An example of a working configuration would be: - ![Example alarm notifications in Gotify](https://user-images.githubusercontent.com/103264516/162509205-1e88e5d9-96b6-4f7f-9426-182776158128.png) +```conf +SEND_GOTIFY="YES" +GOTIFY_APP_TOKEN="XXXXXXXXXXXXXXX" +GOTIFY_APP_URL="https://push.example.domain/" +``` - 🔴 If sending the test notifications fails, check `/var/log/netdata/error.log` to find the relevant error message: +## Test the notification method - ```log - 2020-09-03 23:07:00: alarm-notify.sh: ERROR: failed to send Gotify notification for: hades test.chart.test_alarm is CRITICAL, with HTTP error code 401. - ``` +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/hangouts/README.md b/health/notifications/hangouts/README.md index 45da1bfa0..491b738bc 100644 --- a/health/notifications/hangouts/README.md +++ b/health/notifications/hangouts/README.md @@ -1,15 +1,15 @@ <!-- -title: "Send notifications to Google Hangouts" +title: "Google Hangouts agent alert notifications" description: "Send alerts to Send notifications to Google Hangouts any time an anomaly or performance issue strikes a node in your infrastructure." 
sidebar_label: "Google Hangouts" custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/hangouts/README.md" learn_status: "Published" learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" +learn_rel_path: "Integrations/Notify/Agent alert notifications" learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" --> -# Send notifications to Google Hangouts +# Google Hangouts agent alert notifications [Google Hangouts](https://hangouts.google.com/) is a cross-platform messaging app developed by Google. You can configure Netdata to send alarm notifications to a Hangouts room in order to stay aware of possible health or performance issues diff --git a/health/notifications/health_alarm_notify.conf b/health/notifications/health_alarm_notify.conf index 4878661aa..b7fa6e796 100755 --- a/health/notifications/health_alarm_notify.conf +++ b/health/notifications/health_alarm_notify.conf @@ -22,6 +22,7 @@ # - message to Microsoft Teams (through webhook) # - message to Rocket.Chat (through webhook) # - message to Google Hangouts Chat (through webhook) +# - push notifications to your mobile phone or desktop (ntfy.sh) # # The 'to' line given at netdata alarms defines a *role*, so that many # people can be notified for each role. @@ -854,6 +855,18 @@ MATRIX_ACCESSTOKEN= DEFAULT_RECIPIENT_MATRIX="" #------------------------------------------------------------------------------ +# ntfy.sh global notification options + +# enable/disable sending ntfy notifications +SEND_NTFY="YES" + +# if a role's recipients are not configured, a notification will be sent to +# this ntfy server / topic combination (empty = do not send a notification for +# unconfigured roles). +# Multiple recipients can be given like this: "https://SERVER1/TOPIC1 https://SERVER2/TOPIC2 ..." 
+DEFAULT_RECIPIENT_NTFY="" + +#------------------------------------------------------------------------------ # custom notifications # @@ -997,6 +1010,8 @@ role_recipients_stackpulse[sysadmin]="${DEFAULT_RECIPIENT_STACKPULSE}" role_recipients_gotify[sysadmin]="${DEFAULT_RECIPIENT_GOTIFY}" +role_recipients_ntfy[sysadmin]="${DEFAULT_RECIPIENT_NTFY}" + # ----------------------------------------------------------------------------- # DNS related alarms @@ -1056,6 +1071,8 @@ role_recipients_stackpulse[domainadmin]="${DEFAULT_RECIPIENT_STACKPULSE}" role_recipients_gotify[domainadmin]="${DEFAULT_RECIPIENT_GOTIFY}" +role_recipients_ntfy[domainadmin]="${DEFAULT_RECIPIENT_NTFY}" + # ----------------------------------------------------------------------------- # database servers alarms # mysql, redis, memcached, postgres, etc @@ -1116,6 +1133,8 @@ role_recipients_stackpulse[dba]="${DEFAULT_RECIPIENT_STACKPULSE}" role_recipients_gotify[dba]="${DEFAULT_RECIPIENT_GOTIFY}" +role_recipients_ntfy[dba]="${DEFAULT_RECIPIENT_NTFY}" + # ----------------------------------------------------------------------------- # web servers alarms # apache, nginx, lighttpd, etc @@ -1176,6 +1195,8 @@ role_recipients_stackpulse[webmaster]="${DEFAULT_RECIPIENT_STACKPULSE}" role_recipients_gotify[webmaster]="${DEFAULT_RECIPIENT_GOTIFY}" +role_recipients_ntfy[webmaster]="${DEFAULT_RECIPIENT_NTFY}" + # ----------------------------------------------------------------------------- # proxy servers alarms # squid, etc @@ -1236,6 +1257,8 @@ role_recipients_stackpulse[proxyadmin]="${DEFAULT_RECIPIENT_STACKPULSE}" role_recipients_gotify[proxyadmin]="${DEFAULT_RECIPIENT_GOTIFY}" +role_recipients_ntfy[proxyadmin]="${DEFAULT_RECIPIENT_NTFY}" + # ----------------------------------------------------------------------------- # peripheral devices # UPS, photovoltaics, etc @@ -1293,3 +1316,5 @@ role_recipients_matrix[sitemgr]="${DEFAULT_RECIPIENT_MATRIX}" 
role_recipients_stackpulse[sitemgr]="${DEFAULT_RECIPIENT_STACKPULSE}" role_recipients_gotify[sitemgr]="${DEFAULT_RECIPIENT_GOTIFY}" + +role_recipients_ntfy[sitemgr]="${DEFAULT_RECIPIENT_NTFY}" diff --git a/health/notifications/irc/README.md b/health/notifications/irc/README.md index a4877f48a..bf40bfb6b 100644 --- a/health/notifications/irc/README.md +++ b/health/notifications/irc/README.md @@ -1,83 +1,88 @@ -<!-- -title: "IRC" -sidebar_label: "IRC" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/irc/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# IRC +# IRC Agent alert notifications + +Learn how to send notifications to IRC using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. This is what you will get: -IRCCloud web client:\ +IRCCloud web client: ![image](https://user-images.githubusercontent.com/31221999/36793487-3735673e-1ca6-11e8-8880-d1d8b6cd3bc0.png) -Irssi terminal client: +Irssi terminal client: ![image](https://user-images.githubusercontent.com/31221999/36793486-3713ada6-1ca6-11e8-8c12-70d956ad801e.png) -You need: - -1. The `nc` utility. If you do not set the path, Netdata will search for it in your system `$PATH`. - -Set the path for `nc` in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: - -``` -#------------------------------------------------------------------------------ -# external commands -# -# The full path of the nc command. 
-# If empty, the system $PATH will be searched for it. -# If not found, irc notifications will be silently disabled. -nc="/usr/bin/nc" +## Prerequisites + +You will need: + +- The `nc` utility. + You can set the path to it, or Netdata will search for it in your system `$PATH`. +- terminal access to the Agent you wish to configure + +## Configure Netdata to send alert notifications to IRC + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. + +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: + +1. Set the path for `nc`, otherwise Netdata will search for it in your system `$PATH`: + + ```conf + #------------------------------------------------------------------------------ + # external commands + # + # The full path of the nc command. + # If empty, the system $PATH will be searched for it. + # If not found, irc notifications will be silently disabled. + nc="/usr/bin/nc" + ``` + +2. Set `SEND_IRC` to `YES` +3. Set `DEFAULT_RECIPIENT_IRC` to one or more channels to post the messages to. + You can define multiple channels like this: `#alarms #systems`. + All roles will default to this variable if left unconfigured. +4. Set `IRC_NETWORK` to the IRC network which your preferred channels belong to. +5. Set `IRC_PORT` to the IRC port to which a connection will occur. +6. 
Set `IRC_NICKNAME` to the IRC nickname which is required to send the notification. + It must not be an already registered name as the connection's `MODE` is defined as a `guest`. +7. Set `IRC_REALNAME` to the IRC realname which is required in order to make the connection. + +You can then have different channels per **role**, by editing `DEFAULT_RECIPIENT_IRC` with the channel you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_irc[sysadmin]="#systems" +role_recipients_irc[domainadmin]="#domains" +role_recipients_irc[dba]="#databases #systems" +role_recipients_irc[webmaster]="#marketing #development" +role_recipients_irc[proxyadmin]="#proxy-admin" +role_recipients_irc[sitemgr]="#sites" ``` -2. An `IRC_NETWORK` to which your preferred channels belong to. -3. One or more channels ( `DEFAULT_RECIPIENT_IRC` ) to post the messages to. -4. An `IRC_NICKNAME` and an `IRC_REALNAME` to identify in IRC. +The values you provide should be IRC channels which belong to the specified IRC network. -Set them in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +An example of a working configuration would be: -``` +```conf #------------------------------------------------------------------------------ # irc notification options # -# irc notifications require only the nc utility to be installed. - -# multiple recipients can be given like this: -# "<irc_channel_1> <irc_channel_2> ..." - -# enable/disable sending irc notifications SEND_IRC="YES" - -# if a role's recipients are not configured, a notification will not be sent. -# (empty = do not send a notification for unconfigured roles): DEFAULT_RECIPIENT_IRC="#system-alarms" - -# The irc network to which the recipients belong. It must be the full network. IRC_NETWORK="irc.freenode.net" - -# The irc nickname which is required to send the notification. 
It must not be -# an already registered name as the connection's MODE is defined as a 'guest'. IRC_NICKNAME="netdata-alarm-user" - -# The irc realname which is required in order to make the connection and is an -# extra identifier. IRC_REALNAME="netdata-user" ``` -You can define multiple channels like this: `#system-alarms #networking-alarms`.\ -You can also filter the notifications like this: `#system-alarms|critical`.\ -You can give different channels per **role** using these (at the same file): - -``` -role_recipients_irc[sysadmin]="#user-alarms #networking-alarms #system-alarms" -role_recipients_irc[dba]="#databases-alarms" -role_recipients_irc[webmaster]="#networking-alarms" -``` - -The keywords `#user-alarms`, `#networking-alarms`, `#system-alarms`, `#databases-alarms` are irc channels which belong to the specified IRC network. - +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/kavenegar/README.md b/health/notifications/kavenegar/README.md index 443fcdba4..434354f6d 100644 --- a/health/notifications/kavenegar/README.md +++ b/health/notifications/kavenegar/README.md @@ -1,51 +1,67 @@ -<!-- -title: "Kavenegar" -sidebar_label: "Kavenegar" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/kavenegar/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Kavenegar +# Kavenegar Agent alert notifications + +Learn how to send notifications to Kavenegar using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. 
+ +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. [Kavenegar](https://kavenegar.com/) as service for software developers, based in Iran, provides send and receive SMS, calling voice by using its APIs. -Will look like this on your Android device: +This is what you will get: + +![image](https://user-images.githubusercontent.com/70198089/229841323-6c4b1956-dd91-423e-abaf-2799000f72a8.png) -![image](https://cloud.githubusercontent.com/assets/17090999/20034652/620b6100-a39b-11e6-96af-4f83b8e830e2.png) +## Prerequisites You will need: -1. Signup and Login to kavenegar.com -2. Get your APIKEY and Sender from `http://panel.kavenegar.com/client/setting/account` -3. Fill in KAVENEGAR_API_KEY="" KAVENEGAR_SENDER="" -4. Add the recipient phone numbers to DEFAULT_RECIPIENT_KAVENEGAR="" +- the `APIKEY` and Sender from <http://panel.kavenegar.com/client/setting/account> +- terminal access to the Agent you wish to configure + +## Configure Netdata to send alert notifications to Kavenegar + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. 
-Set them in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: +1. Set `SEND_KAVENEGAR` to `YES`. +2. Set `KAVENEGAR_API_KEY` to your `APIKEY`. +3. Set `KAVENEGAR_SENDER` to the value of your Sender. +4. Set `DEFAULT_RECIPIENT_KAVENEGAR` to the SMS recipient you want the alert notifications to be sent to. + You can define multiple recipients like this: `09155555555 09177777777`. + All roles will default to this variable if left unconfigured. + +You can then have different SMS recipients per **role**, by editing `DEFAULT_RECIPIENT_KAVENEGAR` with the SMS recipients you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_kavenegar[sysadmin]="09100000000" +role_recipients_kavenegar[domainadmin]="09111111111" +role_recipients_kavenegar[dba]="0922222222" +role_recipients_kavenegar[webmaster]="0933333333" +role_recipients_kavenegar[proxyadmin]="0944444444" +role_recipients_kavenegar[sitemgr]="0955555555" ``` -############################################################################### -# Kavenegar (kavenegar.com) SMS options -# multiple recipients can be given like this: -# "09155555555 09177777777" +An example of a working configuration would be: -# enable/disable sending kavenegar SMS -SEND_KAVENEGAR="YES" +```conf +#------------------------------------------------------------------------------ +# Kavenegar (Kavenegar.com) SMS options -# to get an access key, after selecting and purchasing your desired service -# at http://kavenegar.com/pricing.html -# login to your account, go to your dashboard and my account are -# https://panel.kavenegar.com/Client/setting/account from API Key -# copy your api key. You can generate new API Key too. -# You can find and select kavenegar sender number from this place. 
- -# Without an API key, Netdata cannot send KAVENEGAR text messages. -KAVENEGAR_API_KEY="" -KAVENEGAR_SENDER="" -DEFAULT_RECIPIENT_KAVENEGAR="" +SEND_KAVENEGAR="YES" +KAVENEGAR_API_KEY="XXXXXXXXXXXX" +KAVENEGAR_SENDER="YYYYYYYY" +DEFAULT_RECIPIENT_KAVENEGAR="0912345678" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/matrix/README.md b/health/notifications/matrix/README.md index 80e22da37..714d8c22e 100644 --- a/health/notifications/matrix/README.md +++ b/health/notifications/matrix/README.md @@ -1,62 +1,74 @@ -<!-- -title: "Send Netdata notifications to Matrix network rooms" -description: "Stay aware of warning or critical anomalies by sending health alarms to Matrix network rooms with Netdata's health monitoring watchdog." -sidebar_label: "Matrix" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/matrix/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> +# Matrix Agent alert notifications -# Matrix +Learn how to send notifications to Matrix network rooms using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. -Send notifications to [Matrix](https://matrix.org/) network rooms. +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. -The requirements for this notification method are: +## Prerequisites -1. The url of the homeserver (`https://homeserver:port`). -2. 
Credentials for connecting to the homeserver, in the form of a valid access token for your account (or for a - dedicated notification account). These tokens usually don't expire. -3. The room ids that you want to send the notification to. +You will need: -To obtain the access token, you can use the following `curl` command: +- The url of the homeserver (`https://homeserver:port`). +- Credentials for connecting to the homeserver, in the form of a valid access token for your account (or for a dedicated notification account). These tokens usually don't expire. +- The room ids that you want to send the notification to. -```bash -curl -XPOST -d '{"type":"m.login.password", "user":"example", "password":"wordpass"}' "https://homeserver:8448/_matrix/client/r0/login" -``` +## Configure Netdata to send alert notifications to Matrix + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. + +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: + +1. Set `SEND_MATRIX` to `YES`. +2. Set `MATRIX_HOMESERVER` to the URL of the Matrix homeserver. +3. Set `MATRIX_ACCESSTOKEN` to the access token from your Matrix account. + To obtain the access token, you can use the following `curl` command: -The room ids are unique identifiers and can be obtained from the room settings in a Matrix client (e.g. Riot). Their -format is `!uniqueid:homeserver`. 
+ ```bash + curl -XPOST -d '{"type":"m.login.password", "user":"example", "password":"wordpass"}' "https://homeserver:8448/_matrix/client/r0/login" + ``` -Multiple room ids can be defined by separating with a space character. +4. Set `DEFAULT_RECIPIENT_MATRIX` to the rooms you want the alert notifications to be sent to. + The format is `!roomid:homeservername`. -Detailed information about the Matrix client API is available at the [official -site](https://matrix.org/docs/guides/client-server.html). + The room ids are unique identifiers and can be obtained from the room settings in a Matrix client (e.g. Riot). -Your `health_alarm_notify.conf` should look like this: + You can define multiple rooms like this: `!roomid1:homeservername !roomid2:homeservername`. + All roles will default to this variable if left unconfigured. + +Detailed information about the Matrix client API is available at the [official site](https://matrix.org/docs/guides/client-server.html). + +You can then have different rooms per **role**, by editing `DEFAULT_RECIPIENT_MATRIX` with the `!roomid:homeservername` you want, in the following entries at the bottom of the same file: ```conf -############################################################################### +role_recipients_matrix[sysadmin]="!roomid1:homeservername" +role_recipients_matrix[domainadmin]="!roomid2:homeservername" +role_recipients_matrix[dba]="!roomid3:homeservername" +role_recipients_matrix[webmaster]="!roomid4:homeservername" +role_recipients_matrix[proxyadmin]="!roomid5:homeservername" +role_recipients_matrix[sitemgr]="!roomid6:homeservername" +``` + +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ # Matrix notifications -# -# enable/disable Matrix notifications SEND_MATRIX="YES" - -# The url of the Matrix homeserver -# e.g https://matrix.org:8448 MATRIX_HOMESERVER="https://matrix.org:8448" - -# A access token from a valid Matrix 
account. Tokens usually don't expire, -# can be controlled from a Matrix client. -# See https://matrix.org/docs/guides/client-server.html MATRIX_ACCESSTOKEN="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" - -# Specify the default rooms to receive the notification if no rooms are provided -# in a role's recipients. -# The format is !roomid:homeservername DEFAULT_RECIPIENT_MATRIX="!XXXXXXXXXXXX:matrix.org" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/messagebird/README.md b/health/notifications/messagebird/README.md index 014301985..6b96c0d96 100644 --- a/health/notifications/messagebird/README.md +++ b/health/notifications/messagebird/README.md @@ -1,50 +1,65 @@ -<!-- -title: "Messagebird" -sidebar_label: "Messagebird" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/messagebird/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> +# MessageBird Agent alert notifications -# Messagebird +Learn how to send notifications to MessageBird using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. -The messagebird notifications will look like this on your Android device: +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. 
-![image](https://cloud.githubusercontent.com/assets/17090999/20034652/620b6100-a39b-11e6-96af-4f83b8e830e2.png) +This is what you will get: + +![image](https://user-images.githubusercontent.com/70198089/229841323-6c4b1956-dd91-423e-abaf-2799000f72a8.png) + +## Prerequisites You will need: -1. Signup and Login to messagebird.com -2. Pick an SMS capable number after sign up to get some free credits -3. Go to <https://www.messagebird.com/app/settings/developers/access> -4. Create a new access key under 'API ACCESS (REST)' (you will want a live key) -5. Fill in MESSAGEBIRD_ACCESS_KEY="XXXXXXXX" MESSAGEBIRD_NUMBER="+XXXXXXXXXXX" -6. Add the recipient phone numbers to DEFAULT_RECIPIENT_MESSAGEBIRD="+XXXXXXXXXXX" +- an access key under 'API ACCESS (REST)' (you will want a live key), you can read more [here](https://developers.messagebird.com/quickstarts/sms/test-credits-api-keys/) +- terminal access to the Agent you wish to configure + +## Configure Netdata to send alert notifications to MessageBird + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. + +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: + +1. Set `SEND_MESSAGEBIRD` to `YES`. +2. Set `MESSAGEBIRD_ACCESS_KEY` to your API access key. +3. Set `MESSAGEBIRD_NUMBER` to the MessageBird number you want to use for the alert. +4. 
Set `DEFAULT_RECIPIENT_MESSAGEBIRD` to the number you want the alert notification to be sent as an SMS. + You can define multiple recipients like this: `+15555555555 +17777777777`. + All roles will default to this variable if left unconfigured. -Set them in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +You can then have different recipients per **role**, by editing `DEFAULT_RECIPIENT_MESSAGEBIRD` with the number you want, in the following entries at the bottom of the same file: +```conf +role_recipients_messagebird[sysadmin]="+15555555555" +role_recipients_messagebird[domainadmin]="+15555555556" +role_recipients_messagebird[dba]="+15555555557" +role_recipients_messagebird[webmaster]="+15555555558" +role_recipients_messagebird[proxyadmin]="+15555555559" +role_recipients_messagebird[sitemgr]="+15555555550" ``` + +An example of a working configuration would be: + +```conf #------------------------------------------------------------------------------ # Messagebird (messagebird.com) SMS options -# multiple recipients can be given like this: -# "+15555555555 +17777777777" - -# enable/disable sending messagebird SMS SEND_MESSAGEBIRD="YES" - -# to get an access key, create a free account at https://www.messagebird.com -# verify and activate the account (no CC info needed) -# login to your account and enter your phonenumber to get some free credits -# to get the API key, click on 'API' in the sidebar, then 'API Access (REST)' -# click 'Add access key' and fill in data (you want a live key to send SMS) - -# Without an access key, Netdata cannot send Messagebird text messages. 
MESSAGEBIRD_ACCESS_KEY="XXXXXXXX" MESSAGEBIRD_NUMBER="XXXXXXX" -DEFAULT_RECIPIENT_MESSAGEBIRD="XXXXXXX" +DEFAULT_RECIPIENT_MESSAGEBIRD="+15555555555" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/msteams/README.md b/health/notifications/msteams/README.md index 75e652a72..5511a97b9 100644 --- a/health/notifications/msteams/README.md +++ b/health/notifications/msteams/README.md @@ -1,48 +1,67 @@ -<!-- -title: "Microsoft Teams" -sidebar_label: "Microsoft Teams" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/msteams/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Microsoft Teams +# Microsoft Teams Agent alert notifications + +Learn how to send notifications to Microsoft Teams using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. This is what you will get: ![image](https://user-images.githubusercontent.com/1122372/92710359-0385e680-f358-11ea-8c52-f366a4fb57dd.png) -You need: +## Prerequisites -1. The **incoming webhook URL** as given by Microsoft Teams. You can use the same on all your Netdata servers (or you can have multiple if you like - your decision). -2. One or more channels to post the messages to. 
+You will need: -In Microsoft Teams the channel name is encoded in the URI after `/IncomingWebhook/` (for clarity the marked with `[]` in the following example): `https://outlook.office.com/webhook/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX@XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/IncomingWebhook/[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX` +- the **incoming webhook URL** as given by Microsoft Teams. You can use the same on all your Netdata servers (or you can have multiple if you like - your decision) +- one or more channels to post the messages to +- terminal access to the Agent you wish to configure -You have to replace the encoded channel name by the placeholder `CHANNEL` in `MSTEAMS_WEBHOOK_URL`. The placeholder `CHANNEL` will be replaced by the actual encoded channel name before sending the notification. This makes it possible to publish to several channels in the same team. +## Configure Netdata to send alert notifications to Microsoft Teams -The encoded channel name must then be added to `DEFAULT_RECIPIENTS_MSTEAMS` or to one of the specific variables `role_recipients_msteams[]`. **At least one channel is mandatory for `DEFAULT_RECIPIENTS_MSTEAMS`.** +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. 
-Set the webhook and the recipients in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -``` -SEND_MSTEAMS="YES" +1. Set `SEND_MSTEAMS` to `YES`. +2. Set `MSTEAMS_WEBHOOK_URL` to the incoming webhook URL as given by Microsoft Teams. +3. Set `DEFAULT_RECIPIENT_MSTEAMS` to the **encoded** Microsoft Teams channel name you want the alert notifications to be sent to. + In Microsoft Teams the channel name is encoded in the URI after `/IncomingWebhook/`. + You can define multiple channels like this: `CHANNEL1 CHANNEL2`. + All roles will default to this variable if left unconfigured. +4. You can also set the icons and colors for the different alerts in the same section of the file. -MSTEAMS_WEBHOOK_URL="https://outlook.office.com/webhook/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX@XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/IncomingWebhook/CHANNEL/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" +You can then have different channels per **role**, by editing `DEFAULT_RECIPIENT_MSTEAMS` with the channel you want, in the following entries at the bottom of the same file: -DEFAULT_RECIPIENT_MSTEAMS="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" +```conf +role_recipients_msteams[sysadmin]="CHANNEL1" +role_recipients_msteams[domainadmin]="CHANNEL2" +role_recipients_msteams[dba]="databases CHANNEL3" +role_recipients_msteams[webmaster]="CHANNEL4" +role_recipients_msteams[proxyadmin]="CHANNEL5" +role_recipients_msteams[sitemgr]="CHANNEL6" ``` -You can define multiple recipients by listing the encoded channel names like this: `XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY`. -This example will send the alarm to the two channels specified by their encoded channel names. +The values you provide should already exist as Microsoft Teams channels in the same Team. 
-You can give different recipients per **role** using these (in the same file): +An example of a working configuration would be: -``` -role_recipients_msteams[sysadmin]="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -role_recipients_msteams[dba]="YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY" -role_recipients_msteams[webmaster]="ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ" +```conf +#------------------------------------------------------------------------------ +# Microsoft Teams (office.com) global notification options + +SEND_MSTEAMS="YES" +MSTEAMS_WEBHOOK_URL="https://outlook.office.com/webhook/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX@XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/IncomingWebhook/CHANNEL/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" +DEFAULT_RECIPIENT_MSTEAMS="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/ntfy/Makefile.inc b/health/notifications/ntfy/Makefile.inc new file mode 100644 index 000000000..b2045192c --- /dev/null +++ b/health/notifications/ntfy/Makefile.inc @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +# THIS IS NOT A COMPLETE Makefile +# IT IS INCLUDED BY ITS PARENT'S Makefile.am +# IT IS REQUIRED TO REFERENCE ALL FILES RELATIVE TO THE PARENT + +# install these files +dist_noinst_DATA += \ + ntfy/README.md \ + ntfy/Makefile.inc \ + $(NULL) + diff --git a/health/notifications/ntfy/README.md b/health/notifications/ntfy/README.md new file mode 100644 index 000000000..156fb09e2 --- /dev/null +++ b/health/notifications/ntfy/README.md @@ -0,0 +1,66 @@ +# ntfy agent alert notifications + +Learn how to send alerts to an ntfy server using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. 
+ +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. + +[ntfy](https://ntfy.sh/) (pronounce: notify) is a simple HTTP-based [pub-sub](https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern) notification service. It allows you to send notifications to your phone or desktop via scripts from any computer, entirely without signup, cost or setup. It's also [open source](https://github.com/binwiederhier/ntfy) if you want to run your own server. + +This is what you will get: + +<img src="https://user-images.githubusercontent.com/5953192/230661442-a180abe2-c8bd-496e-88be-9038e62fb4f7.png" alt="Example alarm notifications in Ntfy" width="60%"></img> + +## Prerequisites + +You will need: + +- (Optional) A [self-hosted ntfy server](https://docs.ntfy.sh/faq/#can-i-self-host-it), in case you don't want to use https://ntfy.sh +- A new [topic](https://ntfy.sh/#subscribe) for the notifications to be published to +- terminal access to the Agent you wish to configure + +## Configure Netdata to send alert notifications to ntfy + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. + +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: + +1. 
Set `SEND_NTFY` to `YES` +2. Set `DEFAULT_RECIPIENT_NTFY` to the URL formed by the server-topic combination you want the alert notifications to be sent to. Unless you are hosting your own server, the server should always be set to [https://ntfy.sh](https://ntfy.sh) + + You can define multiple recipient URLs like this: `https://SERVER1/TOPIC1 https://SERVER2/TOPIC2` + All roles will default to this variable if left unconfigured. + +> ### Warning +> All topics published on https://ntfy.sh are public, so anyone can subscribe to them and follow your notifications. To avoid that, ensure the topic is unique enough using a long, randomly generated ID, like in the following examples. +> + +An example of a working configuration with two topics as recipients, using the [https://ntfy.sh](https://ntfy.sh) server would be: + +```conf +SEND_NTFY="YES" +DEFAULT_RECIPIENT_NTFY="https://ntfy.sh/netdata-X7seHg7d3Tw9zGOk https://ntfy.sh/netdata-oIPm4IK1IlUtlA30" +``` + +You can then have different servers and/or topics per **role**, by editing `DEFAULT_RECIPIENT_NTFY` with the server-topic combination you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_ntfy[sysadmin]="https://SERVER1/TOPIC1" +role_recipients_ntfy[domainadmin]="https://SERVER2/TOPIC2" +role_recipients_ntfy[dba]="https://SERVER3/TOPIC3" +role_recipients_ntfy[webmaster]="https://SERVER4/TOPIC4" +role_recipients_ntfy[proxyadmin]="https://SERVER5/TOPIC5" +role_recipients_ntfy[sitemgr]="https://SERVER6/TOPIC6" +``` + +## Test the notification method + +To test this alert refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
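The warning in the ntfy README above recommends a long, randomly generated topic ID. The following is a minimal sketch of one way to produce such an ID, not part of the commit. It assumes a shell with `/dev/urandom`, `tr`, and `head -c` available; the variable names `suffix` and `topic_url` are hypothetical, and the `netdata-` prefix and 16-character length only mirror the README's example.

```bash
# Hypothetical helper, not part of Netdata: build a hard-to-guess ntfy topic URL.
# Read random bytes, keep only ASCII letters and digits, stop after 16 characters.
suffix="$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 16)"

# Assumes the public https://ntfy.sh server; replace it with your own server if self-hosting.
topic_url="https://ntfy.sh/netdata-${suffix}"

# Print a line that could be pasted into health_alarm_notify.conf.
echo "DEFAULT_RECIPIENT_NTFY=\"${topic_url}\""
```

The printed value has the same shape as the `DEFAULT_RECIPIENT_NTFY` example in the README; subscribe to the same topic in your ntfy client before testing.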
diff --git a/health/notifications/opsgenie/README.md b/health/notifications/opsgenie/README.md index 20f14b396..5b0303243 100644 --- a/health/notifications/opsgenie/README.md +++ b/health/notifications/opsgenie/README.md @@ -1,66 +1,51 @@ -<!-- -title: "Send notifications to Opsgenie" -description: "Send alerts to your Opsgenie incident response account any time an anomaly or performance issue strikes a node in your infrastructure." -sidebar_label: "Opsgenie" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/opsgenie/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Send notifications to Opsgenie - -[Opsgenie](https://www.atlassian.com/software/opsgenie) is an alerting and incident response tool. It is designed to -group and filter alarms, build custom routing rules for on-call teams, and correlate deployments and commits to -incidents. - -The first step is to create a [Netdata integration](https://docs.opsgenie.com/docs/api-integration) in the -[Opsgenie](https://www.atlassian.com/software/opsgenie) dashboard. After this, you need to edit -`health_alarm_notify.conf` on your system, by running the following from -your [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md): - -```bash -./edit-config health_alarm_notify.conf -``` +# Opsgenie Agent alert notifications -Change the variable `OPSGENIE_API_KEY` with the API key you got from Opsgenie. `OPSGENIE_API_URL` defaults to -`https://api.opsgenie.com`, however there are region-specific API URLs such as `https://eu.api.opsgenie.com`, so set -this if required. +Learn how to send notifications to Opsgenie using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. 
-```conf -SEND_OPSGENIE="YES" +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. -# Api key -# Default Opsgenie API -OPSGENIE_API_KEY="11111111-2222-3333-4444-555555555555" -OPSGENIE_API_URL="" -``` +[Opsgenie](https://www.atlassian.com/software/opsgenie) is an alerting and incident response tool. +It is designed to group and filter alarms, build custom routing rules for on-call teams, and correlate deployments and commits to incidents. -Changes to `health_alarm_notify.conf` do not require a Netdata restart. You can test your Opsgenie notifications -configuration by issuing the commands, replacing `ROLE` with your preferred role: +This is what you will get: +![Example alarm notifications in +Opsgenie](https://user-images.githubusercontent.com/49162938/92184518-f725f900-ee40-11ea-9afa-e7c639c72206.png) -```sh -# become user netdata -sudo su -s /bin/bash netdata +## Prerequisites -# send a test alarm -/usr/libexec/netdata/plugins.d/alarm-notify.sh test ROLE -``` +You will need: -If everything works, you'll see alarms in your Opsgenie platform: +- An Opsgenie integration. You can create an [integration](https://docs.opsgenie.com/docs/api-integration) in the [Opsgenie](https://www.atlassian.com/software/opsgenie) dashboard. -![Example alarm notifications in -Opsgenie](https://user-images.githubusercontent.com/49162938/92184518-f725f900-ee40-11ea-9afa-e7c639c72206.png) +- terminal access to the Agent you wish to configure -If sending the test notifications fails, you can look in `/var/log/netdata/error.log` to find the relevant error -message: +## Configure Netdata to send alert notifications to your Opsgenie account -```log -2020-09-03 23:07:00: alarm-notify.sh: ERROR: failed to send opsgenie notification for: hades test.chart.test_alarm is CRITICAL, with HTTP error code 401. 
-``` +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit-config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> This is the recommended way to configure Netdata. + +Edit `health_alarm_notify.conf` (changes to this file do not require restarting Netdata): + +1. Set `SEND_OPSGENIE` to `YES`. +2. Set `OPSGENIE_API_KEY` to the API key you got from Opsgenie. +3. `OPSGENIE_API_URL` defaults to `https://api.opsgenie.com`. There are region-specific API URLs such as `https://eu.api.opsgenie.com`, so set this if your account requires one. + +An example of a working configuration would be: -You can find more details about the Opsgenie error codes in -their [response docs](https://docs.opsgenie.com/docs/response). +```conf +SEND_OPSGENIE="YES" +OPSGENIE_API_KEY="11111111-2222-3333-4444-555555555555" +OPSGENIE_API_URL="" +``` +## Test the notification method +To test this alert notification method, refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
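As a variation on the Opsgenie configuration above, an account hosted in the EU region would set the region-specific API URL instead of leaving it empty. This is only a sketch based on the URL mentioned in step 3: the API key is the same placeholder used above, and whether your account needs the EU endpoint is an assumption you should confirm in your Opsgenie settings.

```conf
SEND_OPSGENIE="YES"
# Placeholder key; replace it with the API key from your Opsgenie integration
OPSGENIE_API_KEY="11111111-2222-3333-4444-555555555555"
# Assumes an EU-hosted account; leave empty to use the default https://api.opsgenie.com
OPSGENIE_API_URL="https://eu.api.opsgenie.com"
```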
diff --git a/health/notifications/pagerduty/README.md b/health/notifications/pagerduty/README.md index c6190e83f..70d6090d5 100644 --- a/health/notifications/pagerduty/README.md +++ b/health/notifications/pagerduty/README.md @@ -1,67 +1,69 @@ -<!-- -title: "Send alert notifications to PagerDuty" -description: "Send alerts to your PagerDuty dashboard any time an anomaly or performance issue strikes a node in your infrastructure." -sidebar_label: "PagerDuty" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/pagerduty/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Send alert notifications to PagerDuty +# PagerDuty Agent alert notifications + +Learn how to send notifications to PagerDuty using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. [PagerDuty](https://www.pagerduty.com/company/) is an enterprise incident resolution service that integrates with ITOps and DevOps monitoring stacks to improve operational reliability and agility. From enriching and aggregating events to correlating them into incidents, PagerDuty streamlines the incident management process by reducing alert noise and resolution times. 
-## What you need to get started +## Prerequisites + +You will need: + +- an installation of the [PagerDuty agent](https://www.pagerduty.com/docs/guides/agent-install-guide/) on the node running the Netdata Agent +- a PagerDuty `Generic API` service using either the `Events API v2` or `Events API v1` +- terminal access to the Agent you wish to configure -- An installation of the open-source [Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) monitoring agent. -- An installation of the [PagerDuty agent](https://www.pagerduty.com/docs/guides/agent-install-guide/) on the node - running Netdata. -- A PagerDuty `Generic API` service using either the `Events API v2` or `Events API v1`. +## Configure Netdata to send alert notifications to PagerDuty -## Setup +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. -[Add a new service](https://support.pagerduty.com/docs/services-and-integrations#section-configuring-services-and-integrations) +Firstly, [Add a new service](https://support.pagerduty.com/docs/services-and-integrations#section-configuring-services-and-integrations) to PagerDuty. Click **Use our API directly** and select either `Events API v2` or `Events API v1`. Once you finish creating the service, click on the **Integrations** tab to find your **Integration Key**. 
-Navigate to the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) and use -[`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) to open -`health_alarm_notify.conf`. +Then, edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -```bash -cd /etc/netdata -sudo ./edit-config health_alarm_notify.conf -``` - -Scroll down to the `# pagerduty.com notification options` section. +1. Set `SEND_PD` to `YES`. +2. Set `DEFAULT_RECIPIENT_PD` to the PagerDuty service key you want the alert notifications to be sent to. + You can define multiple service keys like this: `pd_service_key_1 pd_service_key_2`. + All roles will default to this variable if left unconfigured. +3. If you chose `Events API v2` during service setup on PagerDuty, change `USE_PD_VERSION` to `2`. -Ensure `SEND_PD` is set to `YES`, then copy your Integration Key into `DEFAULT_RECIPIENT_ID`. Change `USE_PD_VERSION` to -`2` if you chose `Events API v2` during service setup on PagerDuty. 
Minus comments, the section should look like this: +You can then have different PagerDuty service keys per **role**, by editing `DEFAULT_RECIPIENT_PD` with the service key you want, in the following entries at the bottom of the same file: ```conf -SEND_PD="YES" -DEFAULT_RECIPIENT_PD="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" -USE_PD_VERSION="2" +role_recipients_pd[sysadmin]="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxa" +role_recipients_pd[domainadmin]="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxb" +role_recipients_pd[dba]="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxc" +role_recipients_pd[webmaster]="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxd" +role_recipients_pd[proxyadmin]="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxe" +role_recipients_pd[sitemgr]="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxf" ``` -## Testing +An example of a working configuration would be: -To test alert notifications to PagerDuty, run the following: +```conf +#------------------------------------------------------------------------------ +# pagerduty.com notification options -```bash -sudo su -s /bin/bash netdata -/usr/libexec/netdata/plugins.d/alarm-notify.sh test +SEND_PD="YES" +DEFAULT_RECIPIENT_PD="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +USE_PD_VERSION="2" ``` -## Configuration - -Aside from the three values set in `health_alarm_notify.conf`, there is no further configuration required to send alert -notifications to PagerDuty. +## Test the notification method -To configure individual alarms, read our [alert configuration](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) doc or -the [health entity reference](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) doc. +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
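The PagerDuty steps above can be combined into one fragment, shown here as a rough sketch. It assumes two default service keys and a separate key for the `dba` role; all three key names are placeholders (the first two are the ones used in step 2, the third is hypothetical), and `USE_PD_VERSION="2"` assumes the service was created with `Events API v2`.

```conf
SEND_PD="YES"
# Two placeholder service keys, separated by a space
DEFAULT_RECIPIENT_PD="pd_service_key_1 pd_service_key_2"
# Assumes the PagerDuty service uses Events API v2
USE_PD_VERSION="2"

# Hypothetical override: the dba role notifies a different service
role_recipients_pd[dba]="pd_service_key_3"
```

Roles without their own `role_recipients_pd` entry keep using `DEFAULT_RECIPIENT_PD`.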
diff --git a/health/notifications/prowl/README.md b/health/notifications/prowl/README.md index 8656c1314..a57405297 100644 --- a/health/notifications/prowl/README.md +++ b/health/notifications/prowl/README.md @@ -1,21 +1,17 @@ -<!-- -title: "Prowl" -sidebar_label: "Prowl" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/prowl/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> +# Prowl Agent alert notifications -# Prowl +Learn how to send notifications to Prowl using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. -[Prowl](https://www.prowlapp.com/) is a push notification service for iOS devices. Netdata -supports delivering notifications to iOS devices through Prowl. +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. + +[Prowl](https://www.prowlapp.com/) is a push notification service for iOS devices. +Netdata supports delivering notifications to iOS devices through Prowl. Because of how Netdata integrates with Prowl, there is a hard limit of at most 1000 notifications per hour (starting from the first notification -sent). Any alerts beyond the first thousand in an hour will be dropped. +sent). Any alerts beyond the first thousand in an hour will be dropped. Warning messages will be sent with the 'High' priority, critical messages will be sent with the 'Emergency' priority, and all other messages will @@ -23,10 +19,52 @@ be sent with the normal priority. Opening the notification's associated URL will take you to the Netdata dashboard of the system that issued the alert, directly to the chart that it triggered on. 
-## configuration +## Prerequisites + +You will need: + +- a Prowl API key, which can be requested through the Prowl website after registering +- terminal access to the Agent you wish to configure + +## Configure Netdata to send alert notifications to Prowl + +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit-config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> This is the recommended way to configure Netdata. + +Edit `health_alarm_notify.conf` (changes to this file do not require restarting Netdata): + +1. Set `SEND_PROWL` to `YES`. +2. Set `DEFAULT_RECIPIENT_PROWL` to the Prowl API key you want the alert notifications to be sent to. + You can define multiple API keys like this: `APIKEY1, APIKEY2`. + All roles will default to this variable if left unconfigured. 
+ +You can then have different API keys per **role**, by editing `DEFAULT_RECIPIENT_PROWL` with the API keys you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_prowl[sysadmin]="AAAAAAAA" +role_recipients_prowl[domainadmin]="BBBBBBBBB" +role_recipients_prowl[dba]="CCCCCCCCC" +role_recipients_prowl[webmaster]="DDDDDDDDDD" +role_recipients_prowl[proxyadmin]="EEEEEEEEEE" +role_recipients_prowl[sitemgr]="FFFFFFFFFF" +``` + +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# iOS Push Notifications + +SEND_PROWL="YES" +DEFAULT_RECIPIENT_PROWL="XXXXXXXXXX" +``` -To use this, you will need a Prowl API key, which can be requested through -the Prowl website after registering. +## Test the notification method -Once you have an API key, simply specify that as a recipient for Prowl -notifications. +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/pushbullet/README.md b/health/notifications/pushbullet/README.md index 17ed93646..6b19536a1 100644 --- a/health/notifications/pushbullet/README.md +++ b/health/notifications/pushbullet/README.md @@ -1,55 +1,72 @@ -<!-- -title: "PushBullet" -sidebar_label: "PushBullet" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/pushbullet/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> +# Pushbullet Agent alert notifications -# PushBullet +Learn how to send notifications to Pushbullet using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. 
-Will look like this on your browser: -![image](https://cloud.githubusercontent.com/assets/4300670/19109636/278b1c0c-8aee-11e6-8a09-7fc94fdbfec8.png) +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. -And like this on your Android device: +This is what it will look like in your browser: +![image](https://user-images.githubusercontent.com/70198089/229842827-e9c93e44-3c86-4ab6-9b44-d8b36a00b015.png) -![image](https://cloud.githubusercontent.com/assets/4300670/19109635/278a1dde-8aee-11e6-9984-0bc87a13312d.png) +And this is what it will look like on your Android device: + +![image](https://user-images.githubusercontent.com/70198089/229842936-ea7e8f92-a353-43ca-a993-b1cc08e8508b.png) + +## Prerequisites You will need: -1. Sign up and log in to [pushbullet.com](https://www.pushbullet.com/) -2. Create a new access token in your [account settings](https://www.pushbullet.com/#settings/account). -3. Fill in the `PUSHBULLET_ACCESS_TOKEN` with the newly generated access token. -4. Add the recipient emails or channel tags (each channel tag must be prefixed with #, e.g. #channeltag) to `DEFAULT_RECIPIENT_PUSHBULLET`. - > 🚨 The pushbullet notification service will send emails to the email recipient, regardless of if they have a pushbullet account. +- a Pushbullet access token that can be created in your [account settings](https://www.pushbullet.com/#settings/account) +- terminal access to the Agent you wish to configure -To add notification channels, run `/etc/netdata/edit-config health_alarm_notify.conf` +## Configure Netdata to send alert notifications to Pushbullet -You can change the configuration like this: +> ### Info +> +> This file mentions editing configuration files. 
+> +> - To edit configuration files in a safe way, we provide the [`edit-config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> This is the recommended way to configure Netdata. -``` -############################################################################### -# pushbullet (pushbullet.com) push notification options +Edit `health_alarm_notify.conf` (changes to this file do not require restarting Netdata): -# multiple recipients (a combination of email addresses or channel tags) can be given like this: -# "user1@email.com user2@mail.com #channel1 #channel2" +1. Set `SEND_PUSHBULLET` to `YES`. +2. Set `PUSHBULLET_ACCESS_TOKEN` to the token you generated. +3. Set `DEFAULT_RECIPIENT_PUSHBULLET` to the email (e.g. `example@domain.com`) or the channel tag (e.g. `#channel`) you want the alert notifications to be sent to. -# enable/disable sending pushbullet notifications -SEND_PUSHBULLET="YES" + > ### Note + > + > Please note that the Pushbullet notification service will send emails to the email recipient, regardless of whether they have a Pushbullet account. + + You can define multiple entries like this: `user1@email.com user2@email.com`. + All roles will default to this variable if left unconfigured. +4. While optional, you can also set `PUSHBULLET_SOURCE_DEVICE` to the identifier of the sending device. 
-# Signup and Login to pushbullet.com -# To get your Access Token, go to https://www.pushbullet.com/#settings/account -# And create a new access token -# Then just set the recipients emails and/or channel tags (channel tags must be prefixed with #) -# Please note that the if an email in the DEFAULT_RECIPIENT_PUSHBULLET does -# not have a pushbullet account, the pushbullet service will send an email -# to that address instead +You can then have different recipients per **role**, by editing `DEFAULT_RECIPIENT_PUSHBULLET` with the recipients you want, in the following entries at the bottom of the same file: -# Without an access token, Netdata cannot send pushbullet notifications. -PUSHBULLET_ACCESS_TOKEN="o.Sometokenhere" +```conf +role_recipients_pushbullet[sysadmin]="user1@email.com" +role_recipients_pushbullet[domainadmin]="user2@mail.com" +role_recipients_pushbullet[dba]="#channel1" +role_recipients_pushbullet[webmaster]="#channel2" +role_recipients_pushbullet[proxyadmin]="user3@mail.com" +role_recipients_pushbullet[sitemgr]="user4@mail.com" +``` + +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# pushbullet (pushbullet.com) push notification options + +SEND_PUSHBULLET="YES" +PUSHBULLET_ACCESS_TOKEN="XXXXXXXXX" DEFAULT_RECIPIENT_PUSHBULLET="admin1@example.com admin3@somemail.com #examplechanneltag #anotherchanneltag" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
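To illustrate how the optional source device and a per-role override fit alongside the Pushbullet settings above, here is a small sketch. The token, the device identifier, and the recipients are all placeholders, and the choice of the `dba` role for the override is purely hypothetical.

```conf
SEND_PUSHBULLET="YES"
PUSHBULLET_ACCESS_TOKEN="XXXXXXXXX"
# Optional; placeholder identifier of the sending device
PUSHBULLET_SOURCE_DEVICE="XXXXXXXXX"
# One email recipient and one channel tag, separated by a space
DEFAULT_RECIPIENT_PUSHBULLET="user1@email.com #channel1"

# Hypothetical override: the dba role notifies a different channel
role_recipients_pushbullet[dba]="#channel2"
```

Remember that an email recipient without a Pushbullet account will still receive the notification as an email.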
diff --git a/health/notifications/pushover/README.md b/health/notifications/pushover/README.md index 4d5ea5a96..cd3621ef1 100644 --- a/health/notifications/pushover/README.md +++ b/health/notifications/pushover/README.md @@ -1,28 +1,67 @@ -<!-- -title: "PushOver" -sidebar_label: "PushOver" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/pushover/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> +# Pushover Agent alert notifications -# PushOver +Learn how to send notifications to Pushover using Netdata's Agent alert notification +feature, which supports dozens of endpoints, user roles, and more. -pushover.net allows you to receive push notifications on your mobile phone. The service seems free for up to 7.500 messages per month. +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. -Netdata will send warning messages with priority `0` and critical messages with priority `1`. pushover.net allows you to select do-not-disturb hours. The way this is configured, critical notifications will ring and vibrate your phone, even during the do-not-disturb-hours. All other notifications will be delivered silently. +This is what you will get: -You need: +![image](https://user-images.githubusercontent.com/70198089/229842244-4ac998bb-6158-4955-ac2d-766a9999cc98.png) -1. APP TOKEN. You can use the same on all your Netdata servers. -2. USER TOKEN for each user you are going to send notifications to. This is the actual recipient of the notification. +Netdata will send warning messages with priority `0` and critical messages with priority `1`. Pushover allows you to select do-not-disturb hours. 
The way this is configured, critical notifications will ring and vibrate your phone, even during the do-not-disturb hours. All other notifications will be delivered silently. -The configuration is like above (slack messages). +## Prerequisites -pushover.net notifications look like this: +You will need: -![image](https://cloud.githubusercontent.com/assets/2662304/18407319/839c10c4-7715-11e6-92c0-12f8215128d3.png) +- An Application token. You can use the same token on all your Netdata servers. +- A User token for each user you are going to send notifications to. This is the actual recipient of the notification. +- terminal access to the Agent you wish to configure +## Configure Netdata to send alert notifications to Pushover +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit-config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> This is the recommended way to configure Netdata. + +Edit `health_alarm_notify.conf` (changes to this file do not require restarting Netdata): + +1. Set `SEND_PUSHOVER` to `YES`. +2. Set `PUSHOVER_APP_TOKEN` to your Pushover Application token. +3. Set `DEFAULT_RECIPIENT_PUSHOVER` to the Pushover User token you want the alert notifications to be sent to. + You can define multiple User tokens like this: `USERTOKEN1 USERTOKEN2`. + All roles will default to this variable if left unconfigured. 
+ +You can then have different User tokens per **role**, by editing `DEFAULT_RECIPIENT_PUSHOVER` with the token you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_pushover[sysadmin]="USERTOKEN1" +role_recipients_pushover[domainadmin]="USERTOKEN2" +role_recipients_pushover[dba]="USERTOKEN3 USERTOKEN4" +role_recipients_pushover[webmaster]="USERTOKEN5" +role_recipients_pushover[proxyadmin]="USERTOKEN6" +role_recipients_pushover[sitemgr]="USERTOKEN7" +``` + +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# pushover (pushover.net) global notification options + +SEND_PUSHOVER="YES" +PUSHOVER_APP_TOKEN="XXXXXXXXX" +DEFAULT_RECIPIENT_PUSHOVER="USERTOKEN" +``` + +## Test the notification method + +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/rocketchat/README.md b/health/notifications/rocketchat/README.md index 0f7867d0f..6f722aa86 100644 --- a/health/notifications/rocketchat/README.md +++ b/health/notifications/rocketchat/README.md @@ -1,57 +1,66 @@ -<!-- -title: "Rocket.Chat" -sidebar_label: "Rocket Chat" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/rocketchat/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Rocket.Chat +# Rocket.Chat Agent alert notifications + +Learn how to send notifications to Rocket.Chat using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. 
+ +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. This is what you will get: ![Netdata on RocketChat](https://i.imgur.com/Zu4t3j3.png) -You need: -1. The **incoming webhook URL** as given by RocketChat. You can use the same on all your Netdata servers (or you can have multiple if you like - your decision). -2. One or more channels to post the messages to. +## Prerequisites -Get them here: <https://rocket.chat/docs/administrator-guides/integrations/index.html#how-to-create-a-new-incoming-webhook> +You will need: -Set them in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +- The **incoming webhook URL** as given by RocketChat. You can use the same on all your Netdata servers (or you can have multiple if you like - your decision). +- one or more channels to post the messages to. +- terminal access to the Agent you wish to configure -``` -#------------------------------------------------------------------------------ -# rocketchat (rocket.chat) global notification options +## Configure Netdata to send alert notifications to Rocket.Chat -# multiple recipients can be given like this: -# "CHANNEL1 CHANNEL2 ..." +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. 
+> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. -# enable/disable sending rocketchat notifications -SEND_ROCKETCHAT="YES" +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -# Login to rocket.chat and create an incoming webhook. You need only one for all -# your Netdata servers (or you can have one for each of your Netdata). -# Without it, Netdata cannot send rocketchat notifications. -ROCKETCHAT_WEBHOOK_URL="<your_incoming_webhook_url>" +1. Set `SEND_ROCKETCHAT` to `YES`. +2. Set `ROCKETCHAT_WEBHOOK_URL` to your webhook URL. +3. Set `DEFAULT_RECIPIENT_ROCKETCHAT` to the channel you want the alert notifications to be sent to. + You can define multiple channels like this: `alerts systems`. + All roles will default to this variable if left unconfigured. -# if a role's recipients are not configured, a notification will be send to -# this rocketchat channel (empty = do not send a notification for unconfigured -# roles). -DEFAULT_RECIPIENT_ROCKETCHAT="monitoring_alarms" -``` +You can then have different channels per **role**, by editing `DEFAULT_RECIPIENT_ROCKETCHAT` with the channel you want, in the following entries at the bottom of the same file: -You can define multiple channels like this: `alarms systems`. -You can give different channels per **role** using these (at the same file): - -``` +```conf role_recipients_rocketchat[sysadmin]="systems" +role_recipients_rocketchat[domainadmin]="domains" role_recipients_rocketchat[dba]="databases systems" role_recipients_rocketchat[webmaster]="marketing development" +role_recipients_rocketchat[proxyadmin]="proxy_admin" +role_recipients_rocketchat[sitemgr]="sites" +``` + +The values you provide should already exist as Rocket.Chat channels. 
+ +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# rocketchat (rocket.chat) global notification options + +SEND_ROCKETCHAT="YES" +ROCKETCHAT_WEBHOOK_URL="<your_incoming_webhook_url>" +DEFAULT_RECIPIENT_ROCKETCHAT="monitoring_alarms" ``` -The keywords `systems`, `databases`, `marketing`, `development` are RocketChat channels (they should already exist). -Both public and private channels can be used, even if they differ from the channel configured in your RocketChat incoming webhook. +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/slack/README.md b/health/notifications/slack/README.md index ad9a21346..66fdcc027 100644 --- a/health/notifications/slack/README.md +++ b/health/notifications/slack/README.md @@ -1,55 +1,54 @@ -<!-- -title: "Slack" -sidebar_label: "Slack" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/slack/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Slack +# Slack Agent alert notifications + +Learn how to send notifications to a Slack workspace using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. 
This is what you will get: -![image](https://cloud.githubusercontent.com/assets/2662304/18407116/bbd0fee6-7710-11e6-81cf-58c0defaee2b.png) -You need: +![image](https://user-images.githubusercontent.com/70198089/229841857-77ed2562-ee62-427b-803a-cef03d08238d.png) -1. The **incoming webhook URL** as given by slack.com. You can use the same on all your Netdata servers (or you can have multiple if you like - your decision). -2. One or more channels to post the messages to. -To get a webhook that works on multiple channels, you will need to login to your slack.com workspace and create an incoming webhook using the [Incoming Webhooks App](https://slack.com/apps/A0F7XDUAZ-incoming-webhooks). -Do NOT use the instructions in <https://api.slack.com/incoming-webhooks#enable_webhooks>, as the particular webhooks work only for a single channel. +## Prerequisites -Set the webhook and the recipients in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +You will need: -``` -SEND_SLACK="YES" +- a Slack app along with an incoming webhook, read Slack's guide on the topic [here](https://api.slack.com/messaging/webhooks) +- one or more channels to post the messages to +- terminal access to the Agent you wish to configure -SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXXXXXXX/XXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" +## Configure Netdata to send alert notifications to Slack -# if a role's recipients are not configured, a notification will be send to: -# - A slack channel (syntax: '#channel' or 'channel') -# - A slack user (syntax: '@user') -# - The channel or user defined in slack for the webhook (syntax: '#') -# empty = do not send a notification for unconfigured roles -DEFAULT_RECIPIENT_SLACK="alarms" -``` +> ### Info +> +> This file mentions editing configuration files. 
+> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. -You can define multiple recipients like this: `# #alarms systems @myuser`. -This example will send the alarm to: +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -- The recipient defined in slack for the webhook (not known to Netdata) -- The channel 'alarms' -- The channel 'systems' -- The user @myuser +1. Set `SEND_SLACK` to `YES`. +2. Set `SLACK_WEBHOOK_URL` to your Slack app's webhook URL. +3. Set `DEFAULT_RECIPIENT_SLACK` to the Slack channel your Slack app is set to send messages to. + The syntax for channels is `#channel` or `channel`. + All roles will default to this variable if left unconfigured. 
-You can give different recipients per **role** using these (at the same file): +An example of a working configuration would be: -``` -role_recipients_slack[sysadmin]="systems" -role_recipients_slack[dba]="databases systems" -role_recipients_slack[webmaster]="marketing development" +```conf +#------------------------------------------------------------------------------ +# slack (slack.com) global notification options + +SEND_SLACK="YES" +SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXXXXXXX/XXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" +DEFAULT_RECIPIENT_SLACK="#alarms" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/smstools3/README.md b/health/notifications/smstools3/README.md index 9535c9549..d72df4a62 100644 --- a/health/notifications/smstools3/README.md +++ b/health/notifications/smstools3/README.md @@ -1,49 +1,73 @@ -<!-- -title: "SMS Server Tools 3" -sidebar_label: "SMS server" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/smstools3/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# SMS Server Tools 3 +# SMS Server Tools 3 Agent alert notifications + +Learn how to send notifications to `smstools3` using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. 
The [SMS Server Tools 3](http://smstools3.kekekasvi.com/) is a SMS Gateway software which can send and receive short messages through GSM modems and mobile phones. -To have Netdata send notifications via SMS Server Tools 3, you'll first need to [install](http://smstools3.kekekasvi.com/index.php?p=compiling) and [configure](http://smstools3.kekekasvi.com/index.php?p=configure) smsd. +## Prerequisites -Ensure that the user `netdata` can execute `sendsms`. Any user executing `sendsms` needs to: +You will need: -- Have write permissions to `/tmp` and `/var/spool/sms/outgoing` -- Be a member of group `smsd` +- to [install](http://smstools3.kekekasvi.com/index.php?p=compiling) and [configure](http://smstools3.kekekasvi.com/index.php?p=configure) smsd -To ensure that the steps above are successful, just `su netdata` and execute `sendsms phone message`. +- To ensure that the user `netdata` can execute `sendsms`. Any user executing `sendsms` needs to: + - have write permissions to `/tmp` and `/var/spool/sms/outgoing` + - be a member of group `smsd` -You then just need to configure the recipient phone numbers in `health_alarm_notify.conf`: + To ensure that the steps above are successful, just `su netdata` and execute `sendsms phone message`. +- terminal access to the Agent you wish to configure -```sh -#------------------------------------------------------------------------------ -# SMS Server Tools 3 (smstools3) global notification options +## Configure Netdata to send alert notifications to SMS Server Tools 3 -# enable/disable sending SMS Server Tools 3 SMS notifications -SEND_SMS="YES" +> ### Info +> +> This file mentions editing configuration files. 
+> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. + +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -# if a role's recipients are not configured, a notification will be sent to -# this SMS channel (empty = do not send a notification for unconfigured -# roles). Multiple recipients can be given like this: "PHONE1 PHONE2 ..." +1. Set the path for `sendsms`, otherwise Netdata will search for it in your system `$PATH`: -DEFAULT_RECIPIENT_SMS="" + ```conf + # The full path of the sendsms command (smstools3). + # If empty, the system $PATH will be searched for it. + # If not found, SMS notifications will be silently disabled. + sendsms="/usr/bin/sendsms" + ``` + +2. Set `SEND_SMS` to `YES`. +3. Set `DEFAULT_RECIPIENT_SMS` to the phone number you want the alert notifications to be sent to. + You can define multiple phone numbers like this: `PHONE1 PHONE2`. + All roles will default to this variable if left unconfigured. 
+ +You can then have different phone numbers per **role**, by editing `DEFAULT_RECIPIENT_SMS` with the phone number you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_sms[sysadmin]="PHONE1" +role_recipients_sms[domainadmin]="PHONE2" +role_recipients_sms[dba]="PHONE3" +role_recipients_sms[webmaster]="PHONE4" +role_recipients_sms[proxyadmin]="PHONE5" +role_recipients_sms[sitemgr]="PHONE6" ``` -Netdata uses the script `sendsms` that is installed by `smstools3` and just passes a phone number and a message to it. If `sendsms` is not in `$PATH`, you can pass its location in `health_alarm_notify.conf`: +An example of a working configuration would be: -```sh -# The full path of the sendsms command (smstools3). -# If empty, the system $PATH will be searched for it. -# If not found, SMS notifications will be silently disabled. -sendsms="" +```conf +#------------------------------------------------------------------------------ +# SMS Server Tools 3 (smstools3) global notification options +SEND_SMS="YES" +DEFAULT_RECIPIENT_SMS="1234567890" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. diff --git a/health/notifications/stackpulse/README.md b/health/notifications/stackpulse/README.md index 25266e822..b488ca192 100644 --- a/health/notifications/stackpulse/README.md +++ b/health/notifications/stackpulse/README.md @@ -1,15 +1,15 @@ <!-- -title: "Send notifications to StackPulse" +title: "StackPulse agent alert notifications" description: "Send alerts to your StackPulse Netdata integration any time an anomaly or performance issue strikes a node in your infrastructure."
sidebar_label: "StackPulse" custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/stackpulse/README.md" learn_status: "Published" learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" +learn_rel_path: "Integrations/Notify/Agent alert notifications" learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" --> -# Send notifications to StackPulse +# StackPulse agent alert notifications [StackPulse](https://stackpulse.com/) is a software-as-a-service platform for site reliability engineering. It helps SREs, DevOps Engineers and Software Developers reduce toil and alert fatigue while improving reliability of diff --git a/health/notifications/syslog/README.md b/health/notifications/syslog/README.md index 3527decc4..4cda14b37 100644 --- a/health/notifications/syslog/README.md +++ b/health/notifications/syslog/README.md @@ -1,39 +1,77 @@ -<!-- -title: "Syslog" -sidebar_label: "Syslog" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/syslog/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> +# Syslog Agent alert notifications -# Syslog +Learn how to send notifications to Syslog using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. -You need a working `logger` command for this to work. This is the case on pretty much every Linux system in existence, and most BSD systems. +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. 
Logged messages will look like this: -``` +```bash netdata WARNING on hostname at Tue Apr 3 09:00:00 EDT 2018: disk_space._ out of disk space time = 5h ``` -## configuration +## Prerequisites -System log targets are configured as recipients in [`/etc/netdata/health_alarm_notify.conf`](https://github.com/netdata/netdata/blob/36bedc044584dea791fd29455bdcd287c3306cb2/conf.d/health_alarm_notify.conf#L534) (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`). +You will need: -You can also configure per-role targets in the same file a bit further down. +- A working `logger` command for this to work. This is the case on pretty much every Linux system in existence, and most BSD systems. +- terminal access to the Agent you wish to configure -Targets are defined as follows: +## Configure Netdata to send alert notifications to Syslog -``` -[[facility.level][@host[:port]]/]prefix +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. + +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: + +1. Set `SYSLOG_FACILITY` to the facility used for logging, by default this value is set to `local6`. +2. Set `DEFAULT_RECIPIENT_SYSLOG` to the recipient you want the alert notifications to be sent to. 
+ Targets are defined as follows: + + ```conf + [[facility.level][@host[:port]]/]prefix + ``` + + `prefix` defines what the log messages are prefixed with. By default, all lines are prefixed with 'netdata'. + + The `facility` and `level` are the standard syslog facility and level options, for more info on them see your local `logger` and `syslog` documentation. By default, Netdata will log to the `local6` facility, with a log level dependent on the type of message (`crit` for CRITICAL, `warning` for WARNING, and `info` for everything else). + + You can configure sending directly to remote log servers by specifying a host (and optionally a port). However, this has a somewhat high overhead, so it is much preferred to use your local syslog daemon to handle the forwarding of messages to remote systems (pretty much all of them allow at least simple forwarding, and most of the really popular ones support complex queueing and routing of messages to remote log servers). + + You can define multiple recipients like this: `daemon.notice@loghost:514/netdata daemon.notice@loghost2:514/netdata`. + All roles will default to this variable if left unconfigured. +3. Lastly, set `SEND_SYSLOG` to `YES`, make sure you have everything else configured _before_ turning this on. + +You can then have different recipients per **role**, by editing `DEFAULT_RECIPIENT_SYSLOG` with the recipient you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_syslog[sysadmin]="daemon.notice@loghost1:514/netdata" +role_recipients_syslog[domainadmin]="daemon.notice@loghost2:514/netdata" +role_recipients_syslog[dba]="daemon.notice@loghost3:514/netdata" +role_recipients_syslog[webmaster]="daemon.notice@loghost4:514/netdata" +role_recipients_syslog[proxyadmin]="daemon.notice@loghost5:514/netdata" +role_recipients_syslog[sitemgr]="daemon.notice@loghost6:514/netdata" ``` -`prefix` defines what the log messages are prefixed with. 
By default, all lines are prefixed with 'netdata'. +An example of a working configuration would be: -The `facility` and `level` are the standard syslog facility and level options, for more info on them see your local `logger` and `syslog` documentation. By default, Netdata will log to the `local6` facility, with a log level dependent on the type of message (`crit` for CRITICAL, `warning` for WARNING, and `info` for everything else). +```conf +#------------------------------------------------------------------------------ +# syslog notifications -You can configure sending directly to remote log servers by specifying a host (and optionally a port). However, this has a somewhat high overhead, so it is much preferred to use your local syslog daemon to handle the forwarding of messages to remote systems (pretty much all of them allow at least simple forwarding, and most of the really popular ones support complex queueing and routing of messages to remote log servers). +SEND_SYSLOG="YES" +SYSLOG_FACILITY='local6' +DEFAULT_RECIPIENT_SYSLOG="daemon.notice@loghost6:514/netdata" +``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
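The target syntax `[[facility.level][@host[:port]]/]prefix` described in the Syslog README above can be easier to follow when split into its parts. The following is a minimal Python sketch of one way to read such a target; it is not how Netdata's notification script parses it. The names `SyslogTarget`, `parse_syslog_target` and `default_level` are hypothetical, and IPv6 hosts are assumed not to occur.

```python
from typing import NamedTuple, Optional


class SyslogTarget(NamedTuple):
    facility: Optional[str]  # None: fall back to SYSLOG_FACILITY (local6 by default)
    level: Optional[str]     # None: derive the level from the alert status
    host: Optional[str]      # None: log through the local syslog daemon
    port: Optional[int]      # None: no explicit port was given
    prefix: str


def parse_syslog_target(target: str) -> SyslogTarget:
    """Split one recipient written as [[facility.level][@host[:port]]/]prefix."""
    routing, slash, prefix = target.partition("/")
    if not slash:
        # No '/' at all: the whole string is the prefix.
        return SyslogTarget(None, None, None, None, target)

    facility = level = host = None
    port = None
    priority, at, remote = routing.partition("@")
    if priority:
        facility, dot, level = priority.partition(".")
        if not dot or not facility or not level:
            raise ValueError(f"expected facility.level, got {priority!r}")
    if at:
        host, colon, port_text = remote.partition(":")
        if not host:
            raise ValueError(f"missing host in {target!r}")
        port = int(port_text) if colon else None
    return SyslogTarget(facility, level, host, port, prefix)


def default_level(status: str) -> str:
    """Level used when the target names none: crit, warning, or info."""
    return {"CRITICAL": "crit", "WARNING": "warning"}.get(status, "info")
```

With the README's own example, `parse_syslog_target("daemon.notice@loghost:514/netdata")` separates the `daemon` facility, the `notice` level, the host `loghost`, port `514` and the prefix `netdata`, while a bare `netdata` is treated as a prefix only.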
diff --git a/health/notifications/telegram/README.md b/health/notifications/telegram/README.md index f80a2838d..9cc77d68b 100644 --- a/health/notifications/telegram/README.md +++ b/health/notifications/telegram/README.md @@ -1,50 +1,71 @@ -<!-- -title: "Telegram" -sidebar_label: "Telegram" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/telegram/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Telegram +# Telegram Agent alert notifications + +Learn how to send notifications to Telegram using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. [Telegram](https://telegram.org/) is a messaging app with a focus on speed and security, it’s super-fast, simple and free. You can use Telegram on all your devices at the same time — your messages sync seamlessly across any number of your phones, tablets or computers. -With Telegram, you can send messages, photos, videos and files of any type (doc, zip, mp3, etc), as well as create groups for up to 100,000 people or channels for broadcasting to unlimited audiences. You can write to your phone contacts and find people by their usernames. As a result, Telegram is like SMS and email combined — and can take care of all your personal or business messaging needs. +Telegram messages look like this: + +<img src="https://user-images.githubusercontent.com/1153921/66612223-f07dfb80-eb75-11e9-976f-5734ffd93ecd.png" width="50%"></img> Netdata will send warning messages without vibration. -You need to: +## Prerequisites + +You will need: + +- A bot token. 
To get one, contact the [@BotFather](https://t.me/BotFather) bot and send the command `/newbot` and follow the instructions. + Start a conversation with your bot or invite it into a group where you want it to send messages. +- The chat ID for every chat you want to send messages to. Contact the [@myidbot](https://t.me/myidbot) bot and send the `/getid` command to get your personal chat ID or invite it into a group and use the `/getgroupid` command to get the group chat ID. Group IDs start with a hyphen, supergroup IDs start with `-100`. -1. Get a bot token. To get one, contact the [@BotFather](https://t.me/BotFather) bot and send the command `/newbot`. Follow the instructions. -2. Start a conversation with your bot or invite it into a group where you want it to send messages. -3. Find the chat ID for every chat you want to send messages to. Contact the [@myidbot](https://t.me/myidbot) bot and send the `/getid` command to get your personal chat ID or invite it into a group and use the `/getgroupid` command to get the group chat ID. Group IDs start with a hyphen, supergroup IDs start with `-100`. Alternatively, you can get the chat ID directly from the bot API. Send *your* bot a command in the chat you want to use, then check `https://api.telegram.org/bot{YourBotToken}/getUpdates`, eg. `https://api.telegram.org/bot111122223:7OpFlFFRzRBbrUUmIjj5HF9Ox2pYJZy5/getUpdates` -4. Set the bot token and the chat ID of the recipient in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: -``` -SEND_TELEGRAM="YES" -TELEGRAM_BOT_TOKEN="111122223:7OpFlFFRzRBbrUUmIjj5HF9Ox2pYJZy5" -DEFAULT_RECIPIENT_TELEGRAM="-100233335555" -``` +- terminal access to the Agent you wish to configure -You can define multiple recipients like this: `"-100311112222 212341234|critical"`. 
-This example will send: +## Configure Netdata to send alert notifications to Telegram -- All alerts to the group with ID -100311112222 -- Critical alerts to the user with ID 212341234 +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. -You can give different recipients per **role** using these (in the same file): +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -``` -role_recipients_telegram[sysadmin]="212341234" -role_recipients_telegram[dba]="-1004444333321" -role_recipients_telegram[webmaster]="49999333322 -1009999222255" +1. Set `SEND_TELEGRAM` to `YES`. +2. Set `TELEGRAM_BOT_TOKEN` to your bot token. +3. Set `DEFAULT_RECIPIENT_TELEGRAM` to the chat ID you want the alert notifications to be sent to. + You can define multiple chat IDs like this: `49999333322 -1009999222255`. + All roles will default to this variable if left unconfigured. 
+ +You can then have different chats per **role**, by editing `DEFAULT_RECIPIENT_TELEGRAM` with the chat ID you want, in the following entries at the bottom of the same file: + +```conf +role_recipients_telegram[sysadmin]="49999333324" +role_recipients_telegram[domainadmin]="49999333389" +role_recipients_telegram[dba]="-1009999222255" +role_recipients_telegram[webmaster]="-1009999222255 49999333389" +role_recipients_telegram[proxyadmin]="49999333344" +role_recipients_telegram[sitemgr]="49999333876" ``` -Telegram messages look like this: +An example of a working configuration would be: -![Netdata notifications via Telegram](https://user-images.githubusercontent.com/1153921/66612223-f07dfb80-eb75-11e9-976f-5734ffd93ecd.png) +```conf +#------------------------------------------------------------------------------ +# telegram (telegram.org) global notification options + +SEND_TELEGRAM="YES" +TELEGRAM_BOT_TOKEN="111122223:7OpFlFFRzRBbrUUmIjj5HF9Ox2pYJZy5" +DEFAULT_RECIPIENT_TELEGRAM="-100233335555" +``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page. 
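The Telegram README above states two rules that are easy to get wrong when filling in recipients: group IDs start with a hyphen and supergroup IDs start with `-100`, and roles without their own entry fall back to `DEFAULT_RECIPIENT_TELEGRAM`. The Python sketch below illustrates both under stated assumptions: the function names are hypothetical, the label `personal` is our own, and an empty role entry is assumed to count as unconfigured.

```python
from typing import Dict, List


def classify_chat_id(chat_id: str) -> str:
    """Name the kind of chat an ID refers to, based on its prefix."""
    if chat_id.startswith("-100"):
        return "supergroup"
    if chat_id.startswith("-"):
        return "group"
    return "personal"


def recipients_for_role(
    role: str, role_recipients: Dict[str, str], default_recipient: str
) -> List[str]:
    """Return the chat IDs for a role, falling back to the default recipient.

    Assumption: a missing or empty entry means the role is unconfigured.
    """
    configured = role_recipients.get(role, "")
    # Recipients are space-separated, as in the README examples.
    return (configured or default_recipient).split()
```

For instance, with only `webmaster` set to `-1009999222255 49999333389`, a lookup for `dba` returns the default recipient, and `classify_chat_id("-1009999222255")` reports a supergroup.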
diff --git a/health/notifications/twilio/README.md b/health/notifications/twilio/README.md index 470b2413b..8214b6a42 100644 --- a/health/notifications/twilio/README.md +++ b/health/notifications/twilio/README.md @@ -1,52 +1,72 @@ -<!-- -title: "Twilio" -sidebar_label: "Twilio" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/twilio/README.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" -learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" ---> - -# Twilio +# Twilio Agent alert notifications + +Learn how to send notifications to Twilio using Netdata's Agent alert notification feature, which supports dozens of endpoints, user roles, and more. + +> ### Note +> +> This file assumes you have read the [Introduction to Agent alert notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md), detailing how the Netdata Agent's alert notification method works. Will look like this on your Android device: -![image](https://cloud.githubusercontent.com/assets/17090999/20034652/620b6100-a39b-11e6-96af-4f83b8e830e2.png) +![image](https://user-images.githubusercontent.com/70198089/229841323-6c4b1956-dd91-423e-abaf-2799000f72a8.png) + + +## Prerequisites You will need: -1. Signup and Login to twilio.com -2. Pick an SMS capable number during sign up. -3. Get your SID, and Token from <https://www.twilio.com/console> -4. Fill in TWILIO_ACCOUNT_SID="XXXXXXXX" TWILIO_ACCOUNT_TOKEN="XXXXXXXXX" TWILIO_NUMBER="+XXXXXXXXXXX" -5. 
Add the recipient phone numbers to DEFAULT_RECIPIENT_TWILIO="+XXXXXXXXXXX" +- to get your SID, and Token from <https://www.twilio.com/console> +- terminal access to the Agent you wish to configure -!!PLEASE NOTE THAT IF YOUR ACCOUNT IS A TRIAL ACCOUNT YOU WILL ONLY BE ABLE TO SEND NOTIFICATIONS TO THE NUMBER YOU SIGNED UP WITH +## Configure Netdata to send alert notifications to Twilio -Set them in `/etc/netdata/health_alarm_notify.conf` (to edit it on your system run `/etc/netdata/edit-config health_alarm_notify.conf`), like this: +> ### Info +> +> This file mentions editing configuration files. +> +> - To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. +> Note that to run the script you need to be inside your Netdata config directory. +> +> It is recommended to use this way for configuring Netdata. -``` -############################################################################### -# Twilio (twilio.com) SMS options +Edit `health_alarm_notify.conf`, changes to this file do not require restarting Netdata: -# multiple recipients can be given like this: -# "+15555555555 +17777777777" +1. Set `SEND_TWILIO` to `YES`. +2. Set `TWILIO_ACCOUNT_SID` to your account SID. +3. Set `TWILIO_ACCOUNT_TOKEN` to your account token. +4. Set `TWILIO_NUMBER` to your account's number. +5. Set `DEFAULT_RECIPIENT_TWILIO` to the number you want the alert notifications to be sent to. + You can define multiple numbers like this: `+15555555555 +17777777777`. + All roles will default to this variable if left unconfigured. 
-# enable/disable sending twilio SMS -SEND_TWILIO="YES" + > ### Note + > + > Please note that if your account is a trial account you will only be able to send notifications to the number you signed up with. -# Signup for free trial and select a SMS capable Twilio Number -# To get your Account SID and Token, go to https://www.twilio.com/console -# Place your sid, token and number below. -# Then just set the recipients' phone numbers. -# The trial account is only allowed to use the number specified when set up. +You can then have different recipients per **role**, by editing `DEFAULT_RECIPIENT_TWILIO` with the recipient's number you want, in the following entries at the bottom of the same file: -# Without an account sid and token, Netdata cannot send Twilio text messages. +```conf +role_recipients_twilio[sysadmin]="+15555555555" +role_recipients_twilio[domainadmin]="+15555555556" +role_recipients_twilio[dba]="+15555555557" +role_recipients_twilio[webmaster]="+15555555558" +role_recipients_twilio[proxyadmin]="+15555555559" +role_recipients_twilio[sitemgr]="+15555555550" +``` + +An example of a working configuration would be: + +```conf +#------------------------------------------------------------------------------ +# Twilio (twilio.com) SMS options + +SEND_TWILIO="YES" TWILIO_ACCOUNT_SID="xxxxxxxxx" TWILIO_ACCOUNT_TOKEN="xxxxxxxxxx" TWILIO_NUMBER="xxxxxxxxxxx" DEFAULT_RECIPIENT_TWILIO="+15555555555" ``` +## Test the notification method +To test this alert notification method refer to the ["Testing Alert Notifications"](https://github.com/netdata/netdata/blob/master/health/notifications/README.md#testing-alert-notifications) section of the Agent alert notifications page.
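The Twilio steps above name four settings that all have to be filled in before messages can be sent. The Python sketch below shows one way to check a configuration fragment for gaps before testing; it is not part of Netdata. The helper names are hypothetical, and it assumes the fragment contains only simple one-line `KEY="value"` assignments (lines it cannot parse are skipped).

```python
import shlex
from typing import Dict, List

# Settings named in the Twilio configuration steps.
REQUIRED_TWILIO_KEYS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_ACCOUNT_TOKEN",
    "TWILIO_NUMBER",
    "DEFAULT_RECIPIENT_TWILIO",
)


def read_assignments(text: str) -> Dict[str, str]:
    """Collect one-line KEY="value" assignments, ignoring comments and blanks."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        try:
            # shlex removes the surrounding quotes and any trailing comment.
            parts = shlex.split(raw, comments=True)
        except ValueError:
            continue  # e.g. a quoted value that spans several lines
        values[key.strip()] = parts[0] if parts else ""
    return values


def missing_twilio_settings(text: str) -> List[str]:
    """Return the required keys that are absent or empty in the fragment."""
    values = read_assignments(text)
    return [key for key in REQUIRED_TWILIO_KEYS if not values.get(key)]
```

An empty result only means that every setting has some value; whether the SID, token and number are valid can still only be confirmed by sending a test notification.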
diff --git a/health/notifications/web/README.md b/health/notifications/web/README.md index b4afd9ea7..36ca26689 100644 --- a/health/notifications/web/README.md +++ b/health/notifications/web/README.md @@ -1,14 +1,14 @@ <!-- -title: "Pop up" -sidebar_label: "Pop up notifications" +title: "Browser pop up agent alert notifications" +sidebar_label: "Browser pop ups" custom_edit_url: "https://github.com/netdata/netdata/edit/master/health/notifications/web/README.md" learn_status: "Published" learn_topic_type: "Tasks" -learn_rel_path: "Setup/Notification/Agent" +learn_rel_path: "Integrations/Notify/Agent alert notifications" learn_autogeneration_metadata: "{'part_of_cloud': False, 'part_of_agent': True}" --> -# Pop up notifications +# Browser pop up agent alert notifications The Netdata dashboard shows HTML notifications, when it is open. |