diff options
Diffstat (limited to '')
3 files changed, 278 insertions, 0 deletions
diff --git a/docs/cloud/alerts-notifications/manage-alert-notification-silencing-rules.md b/docs/cloud/alerts-notifications/manage-alert-notification-silencing-rules.md new file mode 100644 index 00000000..b9806c6f --- /dev/null +++ b/docs/cloud/alerts-notifications/manage-alert-notification-silencing-rules.md @@ -0,0 +1,58 @@ +# Manage alert notification silencing rules + +From the Cloud interface, you can manage your space's alert notification silencing rules settings as well as allow users to define their personal ones. + +## Prerequisites + +To manage **space's alert notification silencing rule settings**, you will need the following: + +- A Netdata Cloud account +- Access to the space as an **administrator** or **manager** (**troubleshooters** can only view space rules) + + +To manage your **personal alert notification silencing rule settings**, you will need the following: + +- A Netdata Cloud account +- Access to the space with any roles except **billing** + +### Steps + +1. Click on the **Space settings** cog (located above your profile icon) +1. Click on the **Alert & Notification** tab on the left hand-side +1. Click on the **Notification Silencing Rules** tab +1. You will be presented with a table of the configured alert notification silencing rules for: + * the space (if aren't an **observer**) + * yourself + + You will be able to: + 1. **Add a new** alert notification silencing rule configuration. + - Choose if it applies to **All users** or **Myself** (All users is only available for **administrators** and **managers**) + - You need to provide a name for the configuration so you can easily refer to it + - Define criteria for Nodes: To which Rooms will this apply? What Nodes? Does it apply to host labels key-value pairs? + - Define criteria for Alerts: Which alert name is being targeted? What alert context? Will it apply to a specific alert role? + - Define when it will be applied: + - Immediately, from now till until it is turned off or until a specific duration (start and end date automatically set) + - Scheduled, you specify the start and end time for when the rule becomes active and then inactive (time is set according to your browser local timezone) + Note: You are only able to add a rule if your space is on a [paid plan](https://github.com/netdata/netdata/edit/master/docs/cloud/manage/plans.md). + 1. **Edit an existing** alert notification silencing rule configurations. You will be able to change: + - The name provided for it + - Who it applies to + - Selection criteria for Nodes and Alert + - When it will be applied + 1. **Enable/Disable** a given alert notification silencing rule configuration. + - Use the toggle to enable or disable + 1. **Delete an existing** alert notification silencing rule. + - Use the trash icon to delete your configuration + +## Silencing rules examples + +| Rule name | War Rooms | Nodes | Host Label | Alert name | Alert context | Alert role | Description | +| :-- | :-- | :-- | :-- | :-- | :-- | :-- | :--| +| Space silencing | All Rooms | * | * | * | * | * | This rule silences the entire space, targets all nodes and for all users. E.g. infrastructure wide maintenance window. | +| DB Servers Rooms | PostgreSQL Servers | * | * | * | * | * | This rules silences the nodes in the room named PostgreSQL Servers, for example it doesn't silence the `All Nodes` room. E.g. My team with membership to this room doesn't want to receive notifications for these nodes. | +| Node child1 | All Rooms | `child1` | * | * | * | * | This rule silences all alert state transitions for node `child1` on all rooms and for all users. E.g. node could be going under maintenance. | +| Production nodes | All Rooms | * | `environment:production` | * | * | * | This rule silences all alert state transitions for nodes with the host label key-value pair `environment:production`. E.g. Maintenance window on nodes with specific host labels. | +| Third party maintenance | All Rooms | * | * | `httpcheck_posthog_netdata_cloud.request_status` | * | * | This rule silences this specific alert since third party partner will be undergoing maintenance. | +| Intended stress usage on CPU | All Rooms | * | * | * | `system.cpu` | * | This rule silences specific alerts across all nodes and their CPU cores. | +| Silence role webmaster | All Rooms | * | * | * | * | `webmaster` | This rule silences all alerts configured with the role `webmaster`. | +| Silence alert on node | All Rooms | `child1` | * | `httpcheck_posthog_netdata_cloud.request_status` | * | * | This rule silences the specific alert on the `child1` node. | diff --git a/docs/cloud/alerts-notifications/manage-notification-methods.md b/docs/cloud/alerts-notifications/manage-notification-methods.md new file mode 100644 index 00000000..f61b6bf6 --- /dev/null +++ b/docs/cloud/alerts-notifications/manage-notification-methods.md @@ -0,0 +1,73 @@ +# Manage notification methods + +From the Cloud interface, you can manage your space's notification settings as well as allow users to personalize their notifications setting + +## Manage space notification settings + +### Prerequisites + +To manage space notification settings, you will need the following: + +- A Netdata Cloud account +- Access to the space as an **administrator** + +### Available actions per notification methods based on service level + +| **Action** | **Personal service level** | **System service level** | +| :- | :-: | :-: | +| Enable / Disable | X | X | +| Edit | | X | | +| Delete | X | X | +| Add multiple configurations for same method | | X | + +Notes: +* For Netadata provided ones you can't delete the existing notification method configuration. +* Enable, Edit and Add actions over specific notification methods will only be allowed if your plan has access to those ([service classification](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md#service-classification)) + +### Steps + +1. Click on the **Space settings** cog (located above your profile icon) +1. Click on the **Alerts & Notification** tab on the left hand-side +1. Click on the **Notification Methods** tab +1. You will be presented with a table of the configured notification methods for the space. You will be able to: + 1. **Add a new** notification method configuration. + - Choose the service from the list of the available ones, you'll may see a list of unavailable options if your plan doesn't allow some of them (you will see on the + card the plan level that allows a specific service) + - You can optionally provide a name for the configuration so you can easily refer to what it + - Define filtering criteria. To which Rooms will this apply? What notifications I want to receive? (All Alerts and unreachable, All Alerts, Critical only) + - Depending on the service different inputs will be present, please note that there are mandatory and optional inputs + - If you doubts on how to configure the service you can find a link at the top of the modal that takes you to the specific documentation page to help you + 1. **Edit an existing** notification method configuration. Personal level ones can't be edited here, see [Manage user notification settings](#manage-user-notification-settings). You will be able to change: + - The name provided for it + - Filtering criteria + - Service specific inputs + 1. **Enable/Disable** a given notification method configuration. + - Use the toggle to enable or disable the notification method configuration + 1. **Delete an existing** notification method configuration. Netdata provided ones can't be deleted, e.g. Email + - Use the trash icon to delete your configuration + +## Manage user notification settings + +### Prerequisites + +To manage user specific notification settings, you will need the following: + +- A Cloud account +- Have access to, at least, a space + +Note: If an administrator has disabled a Personal [service level](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md#service-level) notification method this will override any user specific setting. + +### Steps + +1. Click on the **User notification settings** shortcut on top of the help button +1. You are presented with: + - The Personal [service level](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md#service-level) notification methods you can manage + - The list spaces and rooms inside those where you have access to + - If you're an administrator, Manager or Troubleshooter you'll also see the Rooms from a space you don't have access to on **All Rooms** tab and you can activate notifications for them by joining the room +1. On this modal you will be able to: + 1. **Enable/Disable** the notification method for you, this applies accross all spaces and rooms + - Use the the toggle enable or disable the notification method + 1. **Define what notifications you want** to per space/room: All Alerts and unreachable, All Alerts, Critical only or No notifications + 1. **Activate notifications** for a room you aren't a member of + - From the **All Rooms** tab click on the Join button for the room(s) you want + diff --git a/docs/cloud/alerts-notifications/notifications.md b/docs/cloud/alerts-notifications/notifications.md new file mode 100644 index 00000000..cde30a2b --- /dev/null +++ b/docs/cloud/alerts-notifications/notifications.md @@ -0,0 +1,147 @@ +# Cloud alert notifications + +import Callout from '@site/src/components/Callout' + +Netdata Cloud can send centralized alert notifications to your team whenever a node enters a warning, critical, or +unreachable state. By enabling notifications, you ensure no alert, on any node in your infrastructure, goes unnoticed by +you or your team. + +Having this information centralized helps you: +* Have a clear view of the health across your infrastructure, seeing all alerts in one place. +* Easily [set up your alert notification process](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md): +methods to use and where to use them, filtering rules, etc. +* Quickly troubleshoot using [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) +or [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md) + +If a node is getting disconnected often or has many alerts, we protect you and your team from alert fatigue by sending +you a flood protection notification. Getting one of these notifications is a good signal of health or performance issues +on that node. + +Admins must enable alert notifications for their [Space(s)](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md#manage-space-notification-settings). All users in a +Space can then personalize their notifications settings from within their [account +menu](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/#manage-user-notification-settings). + +<Callout type="notice"> + +Centralized alert notifications from Netdata Cloud is a independent process from [notifications from +Netdata](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md). You can enable one or the other, or both, based on your needs. However, +the alerts you see in Netdata Cloud are based on those streamed from your Netdata-monitoring nodes. If you want to tweak +or add new alert that you see in Netdata Cloud, and receive via centralized alert notifications, you must +[configure](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) each node's alert watchdog. + +</Callout> + +## Alert notifications + +Netdata Cloud can send centralized alert notifications to your team whenever a node enters a warning, critical, or unreachable state. By enabling notifications, +you ensure no alert, on any node in your infrastructure, goes unnoticed by you or your team. + +If a node is getting disconnected often or has many alerts, we protect you and your team from alert fatigue by sending you a flood protection notification. +Getting one of these notifications is a good signal of health or performance issues on that node. + +Alert notifications can be delivered through different methods, these can go from an Email sent from Netdata to the use of a 3rd party tool like PagerDuty. + +Notification methods are classified on two main attributes: +* Service level: Personal or System +* Service classification: Community or Business + +Only administrators are able to manage the space's alert notification settings. +All users in a Space can personalize their notifications settings, for Personal service level notification methods, from within their profile menu. + +> ⚠️ Netdata Cloud supports different notification methods and their availability will depend on the plan you are at. +> For more details check [Service classification](#service-classification) or [netdata.cloud/pricing](https://www.netdata.cloud/pricing). + +### Service level + +#### Personal + +The notifications methods classified as **Personal** are what we consider generic, meaning that these can't have specific rules for them set by the administrators. + +These notifications are sent to the destination of the channel which is a user-specific attribute, e.g. user's e-mail, and the users are the ones that will then be able to +manage what specific configurations they want for the Space / Room(s) and the desired Notification level, they can achieve this from their User Profile page under +**Notifications**. + +One example of such a notification method is the E-mail. + +#### System + +For **System** notification methods, the destination of the channel will be a target that usually isn't specific to a single user, e.g. slack channel. + +These notification methods allow for fine-grain rule settings to be done by administrators and more than one configuration can exist for them since. You can specify +different targets depending on Rooms or Notification level settings. + +Some examples of such notification methods are: Webhook, PagerDuty, Slack. + +### Service classification + +#### Community + +Notification methods classified as Community can be used by everyone independent on the plan your space is at. +These are: Email and discord + +#### Pro + +Notification methods classified as Pro are only available for **Pro** and **Business** plans +These are: webhook + +#### Business + +Notification methods classified as Business are only available for **Business** plans +These are: PagerDuty, Slack, Opsgenie + +## Silencing Alert notifications + +Netdata Cloud provides you a Silencing Rule engine which allows you to mute alert notifications. This muting action is specific to alert state transition notifications, it doesn't include node unreachable state transitions. + +The Silencing Rule engine is flexible and allows you to enter silence rules for the two main entities involved on alert notifications and can be set using different attributes. The main entities you can enter are **Nodes** and **Alerts** which can be used in combination or isolation to target specific needs - see some examples [here](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-alert-notification-silencing-rules.md#silencing-rules-examples). + +### Scope definition for Nodes +* **Space:** silencing the space, selecting `All Rooms`, silences all alert state transitions from any node claimed to the space. +* **War Room:** silencing a specific room will silence all alert state transitions from any node in that room. Please note if the node belongs to +another room which isn't silenced it can trigger alert notifications to the users with membership to that other room. +* **Node:** silencing a specific node can be done for the entire space, selecting `All Rooms`, or for specific war room(s). The main difference is +if the node should be silenced for the entire space or just for specific rooms (when specific rooms are selected only users with membership to that room won't receive notifications). + +### Scope definition for Alerts +* **Alert name:** silencing a specific alert name silences all alert state transitions for that specific alert. +* **Alert context:** silencing a specific alert context will silence all alert state transitions for alerts targeting that chart context, for more details check [alert configuration docs](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alert-line-on). +* **Alert role:** silencing a specific alert role will silence all the alert state transitions for alerts that are configured to be specific role recipients, for more details check [alert configuration docs](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alert-line-to). + +Beside the above two main entities there are another two important settings that you can define on a silencing rule: +* Who does the rule affect? **All user** in the space or **Myself** +* When does is to apply? **Immediately** or on a **Schedule** (when setting immediately you can set duration) + +For further help on setting alert notification silencing rules go to [Manage Alert Notification Silencing Rules](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-alert-notification-silencing-rules.md). + +> ⚠️ This feature is only available for [Netdata paid plans](https://github.com/netdata/netdata/edit/master/docs/cloud/manage/plans.md). + +## Flood protection + +If a node has too many state changes like firing too many alerts or going from reachable to unreachable, Netdata Cloud +enables flood protection. As long as a node is in flood protection mode, Netdata Cloud does not send notifications about +this node. Even with flood protection active, it is possible to access the node directly, either via Netdata Cloud or +the local Agent dashboard at `http://NODE:19999`. + +## Anatomy of an alert notification + +Email alert notifications show the following information: + +- The Space's name +- The node's name +- Alert status: critical, warning, cleared +- Previous alert status +- Time at which the alert triggered +- Chart context that triggered the alert +- Name and information about the triggered alert +- Alert value +- Total number of warning and critical alerts on that node +- Threshold for triggering the given alert state +- Calculation or database lookups that Netdata uses to compute the value +- Source of the alert, including which file you can edit to configure this alert on an individual node + +Email notifications also feature a **Go to Node** button, which takes you directly to the offending chart for that node +within Cloud's embedded dashboards. + +Here's an example email notification for the `ram_available` chart, which is in a critical state: + +![Screenshot of an alert notification email from Netdata Cloud](https://user-images.githubusercontent.com/1153921/87461878-e933c480-c5c3-11ea-870b-affdb0801854.png) |