diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-13 11:32:39 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-13 11:32:39 +0000 |
commit | 56ae875861ab260b80a030f50c4aff9f9dc8fff0 (patch) | |
tree | 531412110fc901a5918c7f7442202804a83cada9 /doc/03-monitoring-basics.md | |
parent | Initial commit. (diff) | |
download | icinga2-upstream/2.14.2.tar.xz icinga2-upstream/2.14.2.zip |
Adding upstream version 2.14.2.upstream/2.14.2upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r-- | doc/03-monitoring-basics.md | 3305 |
1 files changed, 3305 insertions, 0 deletions
diff --git a/doc/03-monitoring-basics.md b/doc/03-monitoring-basics.md new file mode 100644 index 0000000..06ea0c1 --- /dev/null +++ b/doc/03-monitoring-basics.md @@ -0,0 +1,3305 @@ +# Monitoring Basics <a id="monitoring-basics"></a> + +This part of the Icinga 2 documentation provides an overview of all the basic +monitoring concepts you need to know to run Icinga 2. +Keep in mind these examples are made with a Linux server. If you are +using Windows, you will need to change the services accordingly. See the [ITL reference](10-icinga-template-library.md#windows-plugins) + for further information. + +## Attribute Value Types <a id="attribute-value-types"></a> + +The Icinga 2 configuration uses different value types for attributes. + + Type | Example + -------------------------------------------------------|--------------------------------------------------------- + [Number](17-language-reference.md#numeric-literals) | `5` + [Duration](17-language-reference.md#duration-literals) | `1m` + [String](17-language-reference.md#string-literals) | `"These are notes"` + [Boolean](17-language-reference.md#boolean-literals) | `true` + [Array](17-language-reference.md#array) | `[ "value1", "value2" ]` + [Dictionary](17-language-reference.md#dictionary) | `{ "key1" = "value1", "key2" = false }` + +It is important to use the correct value type for object attributes +as otherwise the [configuration validation](11-cli-commands.md#config-validation) will fail. + +## Hosts and Services <a id="hosts-services"></a> + +Icinga 2 can be used to monitor the availability of hosts and services. Hosts +and services can be virtually anything which can be checked in some way: + +* Network services (HTTP, SMTP, SNMP, SSH, etc.) +* Printers +* Switches or routers +* Temperature sensors +* Other local or network-accessible services + +Host objects provide a mechanism to group services that are running +on the same physical device. + +Here is an example of a host object which defines two child services: + +``` +object Host "my-server1" { + address = "10.0.0.1" + check_command = "hostalive" +} + +object Service "ping4" { + host_name = "my-server1" + check_command = "ping4" +} + +object Service "http" { + host_name = "my-server1" + check_command = "http" +} +``` + +The example creates two services `ping4` and `http` which belong to the +host `my-server1`. + +It also specifies that the host should perform its own check using the `hostalive` +check command. + +The `address` attribute is used by check commands to determine which network +address is associated with the host object. + +Details on troubleshooting check problems can be found [here](15-troubleshooting.md#troubleshooting). + +### Host States <a id="host-states"></a> + +Hosts can be in any one of the following states: + + Name | Description + ------------|-------------- + UP | The host is available. + DOWN | The host is unavailable. + +### Service States <a id="service-states"></a> + +Services can be in any one of the following states: + + Name | Description + ------------|-------------- + OK | The service is working properly. + WARNING | The service is experiencing some problems but is still considered to be in working condition. + CRITICAL | The check successfully determined that the service is in a critical state. + UNKNOWN | The check could not determine the service's state. + +### Check Result State Mapping <a id="check-result-state-mapping"></a> + +[Check plugins](05-service-monitoring.md#service-monitoring-plugins) return +with an exit code which is converted into a state number. +Services map the states directly while hosts will treat `0` or `1` as `UP` +for example. + + Value | Host State | Service State + ------|------------|-------------- + 0 | Up | OK + 1 | Up | Warning + 2 | Down | Critical + 3 | Down | Unknown + +### Hard and Soft States <a id="hard-soft-states"></a> + +When detecting a problem with a host/service, Icinga re-checks the object a number of +times (based on the `max_check_attempts` and `retry_interval` settings) before sending +notifications. This ensures that no unnecessary notifications are sent for +transient failures. During this time the object is in a `SOFT` state. + +After all re-checks have been executed and the object is still in a non-OK +state, the host/service switches to a `HARD` state and notifications are sent. + + Name | Description + ------------|-------------- + HARD | The host/service's state hasn't recently changed. `check_interval` applies here. + SOFT | The host/service has recently changed state and is being re-checked with `retry_interval`. + +### Host and Service Checks <a id="host-service-checks"></a> + +Hosts and services determine their state by running checks in a regular interval. + +``` +object Host "router" { + check_command = "hostalive" + address = "10.0.0.1" +} +``` + +The `hostalive` command is one of several built-in check commands. It sends ICMP +echo requests to the IP address specified in the `address` attribute to determine +whether a host is online. + +> **Tip** +> +> `hostalive` is the same as `ping` but with different default thresholds. +> Both use the `ping` CLI command to execute sequential checks. +> +> If you need faster ICMP checks, look into the [icmp](10-icinga-template-library.md#plugin-check-command-icmp) CheckCommand. + +A number of other [built-in check commands](10-icinga-template-library.md#icinga-template-library) are also +available. In addition to these commands the next few chapters will explain in +detail how to set up your own check commands. + +#### Host Check Alternatives <a id="host-check-alternatives"></a> + +If the host is not reachable with ICMP, HTTP, etc. you can +also use the [dummy](10-icinga-template-library.md#itl-dummy) CheckCommand to set a default state. + +``` +object Host "dummy-host" { + check_command = "dummy" + vars.dummy_state = 0 //Up + vars.dummy_text = "Everything OK." +} +``` + +This method is also used when you send in [external check results](08-advanced-topics.md#external-check-results). + +A more advanced technique is to calculate an overall state +based on all services. This is described [here](08-advanced-topics.md#access-object-attributes-at-runtime-cluster-check). + + +## Templates <a id="object-inheritance-using-templates"></a> + +Templates may be used to apply a set of identical attributes to more than one +object: + +``` +template Service "generic-service" { + max_check_attempts = 3 + check_interval = 5m + retry_interval = 1m + enable_perfdata = true +} + +apply Service "ping4" { + import "generic-service" + + check_command = "ping4" + + assign where host.address +} + +apply Service "ping6" { + import "generic-service" + + check_command = "ping6" + + assign where host.address6 +} +``` + + +In this example the `ping4` and `ping6` services inherit properties from the +template `generic-service`. + +Objects as well as templates themselves can import an arbitrary number of +other templates. Attributes inherited from a template can be overridden in the +object if necessary. + +You can also import existing non-template objects. + +> **Note** +> +> Templates and objects share the same namespace, i.e. you can't define a template +> that has the same name like an object. + + +### Multiple Templates <a id="object-inheritance-using-multiple-templates"></a> + +The following example uses [custom variables](03-monitoring-basics.md#custom-variables) which +are provided in each template. The `web-server` template is used as the +base template for any host providing web services. In addition to that it +specifies the custom variable `webserver_type`, e.g. `apache`. Since this +template is also the base template, we import the `generic-host` template here. +This provides the `check_command` attribute by default and we don't need +to set it anywhere later on. + +``` +template Host "web-server" { + import "generic-host" + vars = { + webserver_type = "apache" + } +} +``` + +The `wp-server` host template specifies a Wordpress instance and sets +the `application_type` custom variable. Please note the `+=` [operator](17-language-reference.md#dictionary-operators) +which adds [dictionary](17-language-reference.md#dictionary) items, +but does not override any previous `vars` attribute. + +``` +template Host "wp-server" { + vars += { + application_type = "wordpress" + } +} +``` + +The final host object imports both templates. The order is important here: +First the base template `web-server` is added to the object, then additional +attributes are imported from the `wp-server` object. + +``` +object Host "wp.example.com" { + import "web-server" + import "wp-server" + + address = "192.168.56.200" +} +``` + +If you want to override specific attributes inherited from templates, you can +specify them on the host object. + +``` +object Host "wp1.example.com" { + import "web-server" + import "wp-server" + + vars.webserver_type = "nginx" //overrides attribute from base template + + address = "192.168.56.201" +} +``` + +<!-- Keep this for compatibility --> +<a id="custom-attributes"></a> + +## Custom Variables <a id="custom-variables"></a> + +In addition to built-in object attributes you can define your own custom +attributes inside the `vars` attribute. + +> **Tip** +> +> This is called `custom variables` throughout the documentation, backends and web interfaces. +> +> Older documentation versions referred to this as `custom attribute`. + +The following example specifies the key `ssh_port` as custom +variable and assigns an integer value. + +``` +object Host "localhost" { + check_command = "ssh" + vars.ssh_port = 2222 +} +``` + +`vars` is a [dictionary](17-language-reference.md#dictionary) where you +can set specific keys to values. The example above uses the shorter +[indexer](17-language-reference.md#indexer) syntax. + +An alternative representation can be written like this: + +``` + vars = { + ssh_port = 2222 + } +``` + +or + +``` + vars["ssh_port"] = 2222 +``` + +### Custom Variable Values <a id="custom-variables-values"></a> + +Valid values for custom variables include: + +* [Strings](17-language-reference.md#string-literals), [numbers](17-language-reference.md#numeric-literals) and [booleans](17-language-reference.md#boolean-literals) +* [Arrays](17-language-reference.md#array) and [dictionaries](17-language-reference.md#dictionary) +* [Functions](03-monitoring-basics.md#custom-variables-functions) + +You can also define nested values such as dictionaries in dictionaries. + +This example defines the custom variable `disks` as dictionary. +The first key is set to `disk /` is itself set to a dictionary +with one key-value pair. + +``` + vars.disks["disk /"] = { + disk_partitions = "/" + } +``` + +This can be written as resolved structure like this: + +``` + vars = { + disks = { + "disk /" = { + disk_partitions = "/" + } + } + } +``` + +Keep this in mind when trying to access specific sub-keys +in apply rules or functions. + +Another example which is shown in the example configuration: + +``` + vars.notification["mail"] = { + groups = [ "icingaadmins" ] + } +``` + +This defines the `notification` custom variable as dictionary +with the key `mail`. Its value is a dictionary with the key `groups` +which itself has an array as value. Note: This array is the exact +same as the `user_groups` attribute for [notification apply rules](#03-monitoring-basics.md#using-apply-notifications) +expects. + +``` + vars.notification = { + mail = { + groups = [ + "icingaadmins" + ] + } + } +``` + +<!-- Keep this for compatibility --> +<a id="custom-attributes-functions"></a> + +### Functions as Custom Variables <a id="custom-variables-functions"></a> + +Icinga 2 lets you specify [functions](17-language-reference.md#functions) for custom variables. +The special case here is that whenever Icinga 2 needs the value for such a custom variable it runs +the function and uses whatever value the function returns: + +``` +object CheckCommand "random-value" { + command = [ PluginDir + "/check_dummy", "0", "$text$" ] + + vars.text = {{ Math.random() * 100 }} +} +``` + +This example uses the [abbreviated lambda syntax](17-language-reference.md#nullary-lambdas). + +These functions have access to a number of variables: + + Variable | Description + -------------|--------------- + user | The User object (for notifications). + service | The Service object (for service checks/notifications/event handlers). + host | The Host object. + command | The command object (e.g. a CheckCommand object for checks). + +Here's an example: + +``` +vars.text = {{ host.check_interval }} +``` + +In addition to these variables the [macro](18-library-reference.md#scoped-functions-macro) function can be used to retrieve the +value of arbitrary macro expressions: + +``` +vars.text = {{ + if (macro("$address$") == "127.0.0.1") { + log("Running a check for localhost!") + } + + return "Some text" +}} +``` + +The `resolve_arguments` function can be used to resolve a command and its arguments much in +the same fashion Icinga does this for the `command` and `arguments` attributes for +commands. The `by_ssh` command uses this functionality to let users specify a +command and arguments that should be executed via SSH: + +``` +arguments = { + "-C" = {{ + var command = macro("$by_ssh_command$") + var arguments = macro("$by_ssh_arguments$") + + if (typeof(command) == String && !arguments) { + return command + } + + var escaped_args = [] + for (arg in resolve_arguments(command, arguments)) { + escaped_args.add(escape_shell_arg(arg)) + } + return escaped_args.join(" ") + }} + ... +} +``` + +Accessing object attributes at runtime inside these functions is described in the +[advanced topics](08-advanced-topics.md#access-object-attributes-at-runtime) chapter. + + +## Runtime Macros <a id="runtime-macros"></a> + +Macros can be used to access other objects' attributes and [custom variables](03-monitoring-basics.md#custom-variables) +at runtime. For example they are used in command definitions to figure out +which IP address a check should be run against: + +``` +object CheckCommand "my-ping" { + command = [ PluginDir + "/check_ping" ] + + arguments = { + "-H" = "$ping_address$" + "-w" = "$ping_wrta$,$ping_wpl$%" + "-c" = "$ping_crta$,$ping_cpl$%" + "-p" = "$ping_packets$" + } + + // Resolve from a host attribute, or custom variable. + vars.ping_address = "$address$" + + // Default values + vars.ping_wrta = 100 + vars.ping_wpl = 5 + + vars.ping_crta = 250 + vars.ping_cpl = 10 + + vars.ping_packets = 5 +} + +object Host "router" { + check_command = "my-ping" + address = "10.0.0.1" +} +``` + +In this example we are using the `$address$` macro to refer to the host's `address` +attribute. + +We can also directly refer to custom variables, e.g. by using `$ping_wrta$`. Icinga +automatically tries to find the closest match for the attribute you specified. The +exact rules for this are explained in the next section. + +> **Note** +> +> When using the `$` sign as single character you must escape it with an +> additional dollar character (`$$`). + + +### Evaluation Order <a id="macro-evaluation-order"></a> + +When executing commands Icinga 2 checks the following objects in this order to look +up macros and their respective values: + +1. User object (only for notifications) +2. Service object +3. Host object +4. Command object +5. Global custom variables in the `Vars` constant + +This execution order allows you to define default values for custom variables +in your command objects. + +Here's how you can override the custom variable `ping_packets` from the previous +example: + +``` +object Service "ping" { + host_name = "localhost" + check_command = "my-ping" + + vars.ping_packets = 10 // Overrides the default value of 5 given in the command +} +``` + +If a custom variable isn't defined anywhere, an empty value is used and a warning is +written to the Icinga 2 log. + +You can also directly refer to a specific attribute -- thereby ignoring these evaluation +rules -- by specifying the full attribute name: + +``` +$service.vars.ping_wrta$ +``` + +This retrieves the value of the `ping_wrta` custom variable for the service. This +returns an empty value if the service does not have such a custom variable no matter +whether another object such as the host has this attribute. + + +### Host Runtime Macros <a id="host-runtime-macros"></a> + +The following host custom variables are available in all commands that are executed for +hosts or services: + + Name | Description + -----------------------------|-------------- + host.name | The name of the host object. + host.display\_name | The value of the `display_name` attribute. + host.state | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`. + host.state\_id | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable). + host.state\_type | The host's current state type. Can be one of `SOFT` and `HARD`. + host.check\_attempt | The current check attempt number. + host.max\_check\_attempts | The maximum number of checks which are executed before changing to a hard state. + host.last\_state | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`. + host.last\_state\_id | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable). + host.last\_state\_type | The host's previous state type. Can be one of `SOFT` and `HARD`. + host.last\_state\_change | The last state change's timestamp. + host.downtime\_depth | The number of active downtimes. + host.duration\_sec | The time since the last state change. + host.latency | The host's check latency. + host.execution\_time | The host's check execution time. + host.output | The last check's output. + host.perfdata | The last check's performance data. + host.last\_check | The timestamp when the last check was executed. + host.check\_source | The monitoring instance that performed the last check. + host.num\_services | Number of services associated with the host. + host.num\_services\_ok | Number of services associated with the host which are in an `OK` state. + host.num\_services\_warning | Number of services associated with the host which are in a `WARNING` state. + host.num\_services\_unknown | Number of services associated with the host which are in an `UNKNOWN` state. + host.num\_services\_critical | Number of services associated with the host which are in a `CRITICAL` state. + +In addition to these specific runtime macros [host object](09-object-types.md#objecttype-host) +attributes can be accessed too. + +### Service Runtime Macros <a id="service-runtime-macros"></a> + +The following service macros are available in all commands that are executed for +services: + + Name | Description + -----------------------------|-------------- + service.name | The short name of the service object. + service.display\_name | The value of the `display_name` attribute. + service.check\_command | The short name of the command along with any arguments to be used for the check. + service.state | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`. + service.state\_id | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown). + service.state\_type | The service's current state type. Can be one of `SOFT` and `HARD`. + service.check\_attempt | The current check attempt number. + service.max\_check\_attempts | The maximum number of checks which are executed before changing to a hard state. + service.last\_state | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`. + service.last\_state\_id | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown). + service.last\_state\_type | The service's previous state type. Can be one of `SOFT` and `HARD`. + service.last\_state\_change | The last state change's timestamp. + service.downtime\_depth | The number of active downtimes. + service.duration\_sec | The time since the last state change. + service.latency | The service's check latency. + service.execution\_time | The service's check execution time. + service.output | The last check's output. + service.perfdata | The last check's performance data. + service.last\_check | The timestamp when the last check was executed. + service.check\_source | The monitoring instance that performed the last check. + +In addition to these specific runtime macros [service object](09-object-types.md#objecttype-service) +attributes can be accessed too. + +### Command Runtime Macros <a id="command-runtime-macros"></a> + +The following custom variables are available in all commands: + + Name | Description + -----------------------|-------------- + command.name | The name of the command object. + +### User Runtime Macros <a id="user-runtime-macros"></a> + +The following custom variables are available in all commands that are executed for +users: + + Name | Description + -----------------------|-------------- + user.name | The name of the user object. + user.display\_name | The value of the `display_name` attribute. + +In addition to these specific runtime macros [user object](09-object-types.md#objecttype-user) +attributes can be accessed too. + +### Notification Runtime Macros <a id="notification-runtime-macros"></a> + + Name | Description + -----------------------|-------------- + notification.type | The type of the notification. + notification.author | The author of the notification comment if existing. + notification.comment | The comment of the notification if existing. + +In addition to these specific runtime macros [notification object](09-object-types.md#objecttype-notification) +attributes can be accessed too. + +### Global Runtime Macros <a id="global-runtime-macros"></a> + +The following macros are available in all executed commands: + + Name | Description + -------------------------|-------------- + icinga.timet | Current UNIX timestamp. + icinga.long\_date\_time | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000` + icinga.short\_date\_time | Current date and time. Example: `2014-01-03 11:23:08` + icinga.date | Current date. Example: `2014-01-03` + icinga.time | Current time including timezone information. Example: `11:23:08 +0000` + icinga.uptime | Current uptime of the Icinga 2 process. + +The following macros provide global statistics: + + Name | Description + ------------------------------------|------------------------------------ + icinga.num\_services\_ok | Current number of services in state 'OK'. + icinga.num\_services\_warning | Current number of services in state 'Warning'. + icinga.num\_services\_critical | Current number of services in state 'Critical'. + icinga.num\_services\_unknown | Current number of services in state 'Unknown'. + icinga.num\_services\_pending | Current number of pending services. + icinga.num\_services\_unreachable | Current number of unreachable services. + icinga.num\_services\_flapping | Current number of flapping services. + icinga.num\_services\_in\_downtime | Current number of services in downtime. + icinga.num\_services\_acknowledged | Current number of acknowledged service problems. + icinga.num\_hosts\_up | Current number of hosts in state 'Up'. + icinga.num\_hosts\_down | Current number of hosts in state 'Down'. + icinga.num\_hosts\_unreachable | Current number of unreachable hosts. + icinga.num\_hosts\_pending | Current number of pending hosts. + icinga.num\_hosts\_flapping | Current number of flapping hosts. + icinga.num\_hosts\_in\_downtime | Current number of hosts in downtime. + icinga.num\_hosts\_acknowledged | Current number of acknowledged host problems. + +### Environment Variable Runtime Macros <a id="env-runtime-macros"></a> + +All environment variables of the Icinga process are available as runtime macros +named `env.<env var name>`. E.g. `$env.ProgramFiles$` for ProgramFiles which is +especially useful on Windows. In contrast to the other runtime macros env vars +require the `env.` prefix. + + +## Apply Rules <a id="using-apply"></a> + +Several object types require an object relation, e.g. [Service](09-object-types.md#objecttype-service), +[Notification](09-object-types.md#objecttype-notification), [Dependency](09-object-types.md#objecttype-dependency), +[ScheduledDowntime](09-object-types.md#objecttype-scheduleddowntime) objects. The +object relations are documented in the linked chapters. + +If you for example create a service object you have to specify the [host_name](09-object-types.md#objecttype-service) +attribute and reference an existing host attribute. + +``` +object Service "ping4" { + check_command = "ping4" + host_name = "icinga2-agent1.localdomain" +} +``` + +This isn't comfortable when managing a huge set of configuration objects which could +[match](03-monitoring-basics.md#using-apply-expressions) on a common pattern. + +Instead you want to use **[apply](17-language-reference.md#apply) rules**. + +If you want basic monitoring for all your hosts, add a `ping4` service apply rule +for all hosts which have the `address` attribute specified. Just one rule for 1000 hosts +instead of 1000 service objects. Apply rules will automatically generate them for you. + +``` +apply Service "ping4" { + check_command = "ping4" + assign where host.address +} +``` + +More explanations on assign where expressions can be found [here](03-monitoring-basics.md#using-apply-expressions). + +### Apply Rules: Prerequisites <a id="using-apply-prerquisites"></a> + +Before you start with apply rules keep the following in mind: + +* Define the best match. + * A set of unique [custom variables](03-monitoring-basics.md#custom-variables) for these hosts/services? + * Or [group](03-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup which should have a service set? + * A generic pattern [match](18-library-reference.md#global-functions-match) on the host/service name? + * [Multiple expressions combined](03-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](17-language-reference.md#expression-operators) +* All expressions must return a boolean value (an empty string is equal to `false` e.g.) + +More specific object type requirements are described in these chapters: + +* [Apply services to hosts](03-monitoring-basics.md#using-apply-services) +* [Apply notifications to hosts and services](03-monitoring-basics.md#using-apply-notifications) +* [Apply dependencies to hosts and services](03-monitoring-basics.md#using-apply-dependencies) +* [Apply scheduled downtimes to hosts and services](03-monitoring-basics.md#using-apply-scheduledowntimes) + +### Apply Rules: Usage Examples <a id="using-apply-usage-examples"></a> + +You can set/override object attributes in apply rules using the respectively available +objects in that scope (host and/or service objects). + +``` +vars.application_type = host.vars.application_type +``` + +[Custom variables](03-monitoring-basics.md#custom-variables) can also store +nested dictionaries and arrays. That way you can use them for not only matching +for their existence or values in apply expressions, but also assign +("inherit") their values into the generated objected from apply rules. + +Remember the examples shown for [custom variable values](03-monitoring-basics.md#custom-variables-values): + +``` + vars.notification["mail"] = { + groups = [ "icingaadmins" ] + } +``` + +You can do two things here: + +* Check for the existence of the `notification` custom variable and its nested dictionary key `mail`. +If this is boolean true, the notification object will be generated. +* Assign the value of the `groups` key to the `user_groups` attribute. + +``` +apply Notification "mail-icingaadmin" to Host { + [...] + + user_groups = host.vars.notification.mail.groups + + assign where host.vars.notification.mail +} + +``` + +A more advanced example is to use [apply rules with for loops on arrays or +dictionaries](03-monitoring-basics.md#using-apply-for) provided by +[custom atttributes](03-monitoring-basics.md#custom-variables) or groups. + +Remember the examples shown for [custom variable values](03-monitoring-basics.md#custom-variables-values): + +``` + vars.disks["disk /"] = { + disk_partitions = "/" + } +``` + +You can iterate over all dictionary keys defined in `disks`. +You can optionally use the value to specify additional object attributes. + +``` +apply Service for (disk => config in host.vars.disks) { + [...] + + vars.disk_partitions = config.disk_partitions +} +``` + +Please read the [apply for chapter](03-monitoring-basics.md#using-apply-for) +for more specific insights. + + +> **Tip** +> +> Building configuration in that dynamic way requires detailed information +> of the generated objects. Use the `object list` [CLI command](11-cli-commands.md#cli-command-object) +> after successful [configuration validation](11-cli-commands.md#config-validation). + + +### Apply Rules Expressions <a id="using-apply-expressions"></a> + +You can use simple or advanced combinations of apply rule expressions. Each +expression must evaluate into the boolean `true` value. An empty string +will be for instance interpreted as `false`. In a similar fashion undefined +attributes will return `false`. + +Returns `false`: + +``` +assign where host.vars.attribute_does_not_exist +``` + +Multiple `assign where` condition rows are evaluated as `OR` condition. + +You can combine multiple expressions for matching only a subset of objects. In some cases, +you want to be able to add more than one assign/ignore where expression which matches +a specific condition. To achieve this you can use the logical `and` and `or` operators. + +#### Apply Rules Expressions Examples <a id="using-apply-expressions-examples"></a> + +Assign a service to a specific host in a host group [array](18-library-reference.md#array-type) using the [in operator](17-language-reference.md#expression-operators): + +``` +assign where "hostgroup-dev" in host.groups +``` + +Assign an object when a custom variable is [equal](17-language-reference.md#expression-operators) to a value: + +``` +assign where host.vars.application_type == "database" + +assign where service.vars.sms_notify == true +``` + +Assign an object if a dictionary [contains](18-library-reference.md#dictionary-contains) a given key: + +``` +assign where host.vars.app_dict.contains("app") +``` + +Match the host name by either using a [case insensitive match](18-library-reference.md#global-functions-match): + +``` +assign where match("webserver*", host.name) +``` + +Match the host name by using a [regular expression](18-library-reference.md#global-functions-regex). Please note the [escaped](17-language-reference.md#string-literals-escape-sequences) backslash character: + +``` +assign where regex("^webserver-[\\d+]", host.name) +``` + +[Match](18-library-reference.md#global-functions-match) all `*mysql*` patterns in the host name and (`&&`) custom variable `prod_mysql_db` +matches the `db-*` pattern. All hosts with the custom variable `test_server` set to `true` +should be ignored, or any host name ending with `*internal` pattern. + +``` +object HostGroup "mysql-server" { + display_name = "MySQL Server" + + assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db) + ignore where host.vars.test_server == true + ignore where match("*internal", host.name) +} +``` + +Similar example for advanced notification apply rule filters: If the service +attribute `notes` [matches](18-library-reference.md#global-functions-match) the `has gold support 24x7` string `AND` one of the +two condition passes, either the `customer` host custom variable is set to `customer-xy` +`OR` the host custom variable `always_notify` is set to `true`. + +The notification is ignored for services whose host name ends with `*internal` +`OR` the `priority` custom variable is [less than](17-language-reference.md#expression-operators) `2`. + +``` +template Notification "cust-xy-notification" { + users = [ "noc-xy", "mgmt-xy" ] + command = "mail-service-notification" +} + +apply Notification "notify-cust-xy-mysql" to Service { + import "cust-xy-notification" + + assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true) + ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true) +} +``` + +More advanced examples are covered [here](08-advanced-topics.md#use-functions-assign-where). + +### Apply Services to Hosts <a id="using-apply-services"></a> + +The sample configuration already includes a detailed example in [hosts.conf](04-configuration.md#hosts-conf) +and [services.conf](04-configuration.md#services-conf) for this use case. + +The example for `ssh` applies a service object to all hosts with the `address` +attribute being defined and the custom variable `os` set to the string `Linux` in `vars`. + +``` +apply Service "ssh" { + import "generic-service" + + check_command = "ssh" + + assign where host.address && host.vars.os == "Linux" +} +``` + +Other detailed examples are used in their respective chapters, for example +[apply services with custom command arguments](03-monitoring-basics.md#command-passing-parameters). + +### Apply Notifications to Hosts and Services <a id="using-apply-notifications"></a> + +Notifications are applied to specific targets (`Host` or `Service`) and work in a similar +manner: + +``` +apply Notification "mail-noc" to Service { + import "mail-service-notification" + + user_groups = [ "noc" ] + + assign where host.vars.notification.mail +} +``` + +In this example the `mail-noc` notification will be created as object for all services having the +`notification.mail` custom variable defined. The notification command is set to `mail-service-notification` +and all members of the user group `noc` will get notified. + +It is also possible to generally apply a notification template and dynamically overwrite values from +the template by checking for custom variables. This can be achieved by using [conditional statements](17-language-reference.md#conditional-statements): + +``` +apply Notification "host-mail-noc" to Host { + import "mail-host-notification" + + // replace interval inherited from `mail-host-notification` template with new notfication interval set by a host custom variable + if (host.vars.notification_interval) { + interval = host.vars.notification_interval + } + + // same with notification period + if (host.vars.notification_period) { + period = host.vars.notification_period + } + + // Send SMS instead of email if the host's custom variable `notification_type` is set to `sms` + if (host.vars.notification_type == "sms") { + command = "sms-host-notification" + } else { + command = "mail-host-notification" + } + + user_groups = [ "noc" ] + + assign where host.address +} +``` + +In the example above the notification template `mail-host-notification` +contains all relevant notification settings. +The apply rule is applied on all host objects where the `host.address` is defined. + +If the host object has a specific custom variable set, its value is inherited +into the local notification object scope, e.g. `host.vars.notification_interval`, +`host.vars.notification_period` and `host.vars.notification_type`. +This overwrites attributes already specified in the imported `mail-host-notification` +template. + +The corresponding host object could look like this: + +``` +object Host "host1" { + import "host-linux-prod" + display_name = "host1" + address = "192.168.1.50" + vars.notification_interval = 1h + vars.notification_period = "24x7" + vars.notification_type = "sms" +} +``` + +### Apply Dependencies to Hosts and Services <a id="using-apply-dependencies"></a> + +Detailed examples can be found in the [dependencies](03-monitoring-basics.md#dependencies) chapter. + +### Apply Recurring Downtimes to Hosts and Services <a id="using-apply-scheduledowntimes"></a> + +The sample configuration includes an example in [downtimes.conf](04-configuration.md#downtimes-conf). + +Detailed examples can be found in the [recurring downtimes](08-advanced-topics.md#recurring-downtimes) chapter. + + +### Using Apply For Rules <a id="using-apply-for"></a> + +Next to the standard way of using [apply rules](03-monitoring-basics.md#using-apply) +there is the requirement of applying objects based on a set (array or +dictionary) using [apply for](17-language-reference.md#apply-for) expressions. + +The sample configuration already includes a detailed example in [hosts.conf](04-configuration.md#hosts-conf) +and [services.conf](04-configuration.md#services-conf) for this use case. + +Take the following example: A host provides the snmp oids for different service check +types. This could look like the following example: + +``` +object Host "router-v6" { + check_command = "hostalive" + address6 = "2001:db8:1234::42" + + vars.oids["if01"] = "1.1.1.1.1" + vars.oids["temp"] = "1.1.1.1.2" + vars.oids["bgp"] = "1.1.1.1.5" +} +``` + +The idea is to create service objects for `if01` and `temp` but not `bgp`. +The oid value should also be used as service custom variable `snmp_oid`. +This is the command argument required by the [snmp](10-icinga-template-library.md#plugin-check-command-snmp) +check command. +The service's `display_name` should be set to the identifier inside the dictionary, +e.g. `if01`. + +``` +apply Service for (identifier => oid in host.vars.oids) { + check_command = "snmp" + display_name = identifier + vars.snmp_oid = oid + + ignore where identifier == "bgp" //don't generate service for bgp checks +} +``` + +Icinga 2 evaluates the `apply for` rule for all objects with the custom variable +`oids` set. +It iterates over all dictionary items inside the `for` loop and evaluates the +`assign/ignore where` expressions. You can access the loop variable +in these expressions, e.g. to ignore specific values. + +In this example the `bgp` identifier is ignored. This avoids to generate +unwanted services. A different approach would be to match the `oid` value with a +[regex](18-library-reference.md#global-functions-regex)/[wildcard match](18-library-reference.md#global-functions-match) pattern for example. + +``` + ignore where regex("^\d.\d.\d.\d.5$", oid) +``` + +> **Note** +> +> You don't need an `assign where` expression which checks for the existence of the +> `oids` custom variable. + +This method saves you from creating multiple apply rules. It also moves +the attribute specification logic from the service to the host. + +<!-- Keep this for compatibility --> +<a id="using-apply-for-custom-attribute-override"></a> + +#### Apply For and Custom Variable Override <a id="using-apply-for-custom-variable-override"></a> + +Imagine a different more advanced example: You are monitoring your network device (host) +with many interfaces (services). The following requirements/problems apply: + +* Each interface service should be named with a prefix and a name defined in your host object (which could be generated from your CMDB, etc.) +* Each interface has its own VLAN tag +* Some interfaces have QoS enabled +* Additional attributes such as `display_name` or `notes`, `notes_url` and `action_url` must be +dynamically generated. + + +> **Tip** +> +> Define the SNMP community as global constant in your [constants.conf](04-configuration.md#constants-conf) file. + +``` +const IftrafficSnmpCommunity = "public" +``` + +Define the `interfaces` [custom variable](03-monitoring-basics.md#custom-variables) +on the `cisco-catalyst-6509-34` host object and add three example interfaces as dictionary keys. + +Specify additional attributes inside the nested dictionary +as learned with [custom variable values](03-monitoring-basics.md#custom-variables-values): + +``` +object Host "cisco-catalyst-6509-34" { + import "generic-host" + display_name = "Catalyst 6509 #34 VIE21" + address = "127.0.1.4" + + /* "GigabitEthernet0/2" is the interface name, + * and key name in service apply for later on + */ + vars.interfaces["GigabitEthernet0/2"] = { + /* define all custom variables with the + * same name required for command parameters/arguments + * in service apply (look into your CheckCommand definition) + */ + iftraffic_units = "g" + iftraffic_community = IftrafficSnmpCommunity + iftraffic_bandwidth = 1 + vlan = "internal" + qos = "disabled" + } + vars.interfaces["GigabitEthernet0/4"] = { + iftraffic_units = "g" + //iftraffic_community = IftrafficSnmpCommunity + iftraffic_bandwidth = 1 + vlan = "remote" + qos = "enabled" + } + vars.interfaces["MgmtInterface1"] = { + iftraffic_community = IftrafficSnmpCommunity + vlan = "mgmt" + interface_address = "127.99.0.100" #special management ip + } +} +``` + +Start with the apply for definition and iterate over `host.vars.interfaces`. +This is a dictionary and should use the variables `interface_name` as key +and `interface_config` as value for each generated object scope. + +`"if-"` specifies the object name prefix for each service which results +in `if-<interface_name>` for each iteration. + +``` +/* loop over the host.vars.interfaces dictionary + * for (key => value in dict) means `interface_name` as key + * and `interface_config` as value. Access config attributes + * with the indexer (`.`) character. + */ +apply Service "if-" for (interface_name => interface_config in host.vars.interfaces) { +``` + +Import the `generic-service` template, assign the [iftraffic](10-icinga-template-library.md#plugin-contrib-command-iftraffic) +`check_command`. Use the dictionary key `interface_name` to set a proper `display_name` +string for external interfaces. + +``` + import "generic-service" + check_command = "iftraffic" + display_name = "IF-" + interface_name +``` + +The `interface_name` key's value is the same string used as command parameter for +`iftraffic`: + +``` + /* use the key as command argument (no duplication of values in host.vars.interfaces) */ + vars.iftraffic_interface = interface_name +``` + +Remember that `interface_config` is a nested dictionary. In the first iteration it looks +like this: + +``` +interface_config = { + iftraffic_units = "g" + iftraffic_community = IftrafficSnmpCommunity + iftraffic_bandwidth = 1 + vlan = "internal" + qos = "disabled" +} +``` + +Access the dictionary keys with the [indexer](17-language-reference.md#indexer) syntax +and assign them to custom variables used as command parameters for the `iftraffic` +check command. + +``` + /* map the custom variables as command arguments */ + vars.iftraffic_units = interface_config.iftraffic_units + vars.iftraffic_community = interface_config.iftraffic_community +``` + +If you just want to inherit all attributes specified inside the `interface_config` +dictionary, add it to the generated service custom variables like this: + +``` + /* the above can be achieved in a shorter fashion if the names inside host.vars.interfaces + * are the _exact_ same as required as command parameter by the check command + * definition. + */ + vars += interface_config +``` + +If the user did not specify default values for required service custom variables, +add them here. This also helps to avoid unwanted configuration validation errors or +runtime failures. Please read more about conditional statements [here](17-language-reference.md#conditional-statements). + +``` + /* set a default value for units and bandwidth */ + if (interface_config.iftraffic_units == "") { + vars.iftraffic_units = "m" + } + if (interface_config.iftraffic_bandwidth == "") { + vars.iftraffic_bandwidth = 1 + } + if (interface_config.vlan == "") { + vars.vlan = "not set" + } + if (interface_config.qos == "") { + vars.qos = "not set" + } +``` + +If the host object did not specify a custom SNMP community, +set a default value specified by the [global constant](17-language-reference.md#constants) `IftrafficSnmpCommunity`. + +``` + /* set the global constant if not explicitely + * not provided by the `interfaces` dictionary on the host + */ + if (len(interface_config.iftraffic_community) == 0 || len(vars.iftraffic_community) == 0) { + vars.iftraffic_community = IftrafficSnmpCommunity + } +``` + +Use the provided values to [calculate](17-language-reference.md#expression-operators) +more object attributes which can be e.g. seen in external interfaces. + +``` + /* Calculate some additional object attributes after populating the `vars` dictionary */ + notes = "Interface check for " + interface_name + " (units: '" + interface_config.iftraffic_units + "') in VLAN '" + vars.vlan + "' with ' QoS '" + vars.qos + "'" + notes_url = "https://foreman.company.com/hosts/" + host.name + action_url = "https://snmp.checker.company.com/" + host.name + "/if-" + interface_name +} +``` + +> **Tip** +> +> Building configuration in that dynamic way requires detailed information +> of the generated objects. Use the `object list` [CLI command](11-cli-commands.md#cli-command-object) +> after successful [configuration validation](11-cli-commands.md#config-validation). + +Verify that the apply-for-rule successfully created the service objects with the +inherited custom variables: + +``` +# icinga2 daemon -C +# icinga2 object list --type Service --name *catalyst* + +Object 'cisco-catalyst-6509-34!if-GigabitEthernet0/2' of type 'Service': +...... + * vars + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26 + * iftraffic_bandwidth = 1 + * iftraffic_community = "public" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65 + * iftraffic_interface = "GigabitEthernet0/2" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43 + * iftraffic_units = "g" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57 + * qos = "disabled" + * vlan = "internal" + + +Object 'cisco-catalyst-6509-34!if-GigabitEthernet0/4' of type 'Service': +... + * vars + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26 + * iftraffic_bandwidth = 1 + * iftraffic_community = "public" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65 + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 79:5-79:53 + * iftraffic_interface = "GigabitEthernet0/4" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43 + * iftraffic_units = "g" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57 + * qos = "enabled" + * vlan = "remote" + +Object 'cisco-catalyst-6509-34!if-MgmtInterface1' of type 'Service': +... + * vars + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26 + * iftraffic_bandwidth = 1 + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 66:5-66:32 + * iftraffic_community = "public" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65 + * iftraffic_interface = "MgmtInterface1" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43 + * iftraffic_units = "m" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57 + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 63:5-63:30 + * interface_address = "127.99.0.100" + * qos = "not set" + % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 72:5-72:24 + * vlan = "mgmt" +``` + +### Use Object Attributes in Apply Rules <a id="using-apply-object-attributes"></a> + +Since apply rules are evaluated after the generic objects, you +can reference existing host and/or service object attributes as +values for any object attribute specified in that apply rule. + +``` +object Host "opennebula-host" { + import "generic-host" + address = "10.1.1.2" + + vars.hosting["cust1"] = { + http_uri = "/shop" + customer_name = "Customer 1" + customer_id = "7568" + support_contract = "gold" + } + vars.hosting["cust2"] = { + http_uri = "/" + customer_name = "Customer 2" + customer_id = "7569" + support_contract = "silver" + } +} +``` + +`hosting` is a custom variable with the Dictionary value type. +This is mandatory to iterate with the `key => value` notation +in the below apply for rule. + +``` +apply Service for (customer => config in host.vars.hosting) { + import "generic-service" + check_command = "ping4" + + vars.qos = "disabled" + + vars += config + + vars.http_uri = "/" + customer + "/" + config.http_uri + + display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id + + notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")." + + notes_url = "https://foreman.company.com/hosts/" + host.name + action_url = "https://snmp.checker.company.com/" + host.name + "/" + vars.customer_id +} +``` + +Each loop iteration has different values for `customer` and config` +in the local scope. + +1. + +``` +customer = "cust 1" +config = { + http_uri = "/shop" + customer_name = "Customer 1" + customer_id = "7568" + support_contract = "gold" +} +``` + +2. + +``` +customer = "cust2" +config = { + http_uri = "/" + customer_name = "Customer 2" + customer_id = "7569" + support_contract = "silver" +} +``` + +You can now add the `config` dictionary into `vars`. + +``` +vars += config +``` + +Now it looks like the following in the first iteration: + +``` +customer = "cust 1" +vars = { + http_uri = "/shop" + customer_name = "Customer 1" + customer_id = "7568" + support_contract = "gold" +} +``` + +Remember, you know this structure already. Custom +attributes can also be accessed by using the [indexer](17-language-reference.md#indexer) +syntax. + +``` + vars.http_uri = ... + config.http_uri +``` + +can also be written as + +``` + vars += config + vars.http_uri = ... + vars.http_uri +``` + + +## Groups <a id="groups"></a> + +A group is a collection of similar objects. Groups are primarily used as a +visualization aid in web interfaces. + +Group membership is defined at the respective object itself. If +you have a hostgroup name `windows` for example, and want to assign +specific hosts to this group for later viewing the group on your +alert dashboard, first create a HostGroup object: + +``` +object HostGroup "windows" { + display_name = "Windows Servers" +} +``` + +Then add your hosts to this group: + +``` +template Host "windows-server" { + groups += [ "windows" ] +} + +object Host "mssql-srv1" { + import "windows-server" + + vars.mssql_port = 1433 +} + +object Host "mssql-srv2" { + import "windows-server" + + vars.mssql_port = 1433 +} +``` + +This can be done for service and user groups the same way: + +``` +object UserGroup "windows-mssql-admins" { + display_name = "Windows MSSQL Admins" +} + +template User "generic-windows-mssql-users" { + groups += [ "windows-mssql-admins" ] +} + +object User "win-mssql-noc" { + import "generic-windows-mssql-users" + + email = "noc@example.com" +} + +object User "win-mssql-ops" { + import "generic-windows-mssql-users" + + email = "ops@example.com" +} +``` + +### Group Membership Assign <a id="group-assign-intro"></a> + +Instead of manually assigning each object to a group you can also assign objects +to a group based on their attributes: + +``` +object HostGroup "prod-mssql" { + display_name = "Production MSSQL Servers" + + assign where host.vars.mssql_port && host.vars.prod_mysql_db + ignore where host.vars.test_server == true + ignore where match("*internal", host.name) +} +``` + +In this example all hosts with the `vars` attribute `mssql_port` +will be added as members to the host group `mssql`. However, all +hosts [matching](18-library-reference.md#global-functions-match) the string `\*internal` +or with the `test_server` attribute set to `true` are **not** added to this group. + +Details on the `assign where` syntax can be found in the +[Language Reference](17-language-reference.md#apply). + +## Notifications <a id="alert-notifications"></a> + +Notifications for service and host problems are an integral part of your +monitoring setup. + +When a host or service is in a downtime, a problem has been acknowledged or +the dependency logic determined that the host/service is unreachable, no +notifications are sent. You can configure additional type and state filters +refining the notifications being actually sent. + +There are many ways of sending notifications, e.g. by email, XMPP, +IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications. +Instead it relies on external mechanisms such as shell scripts to notify users. +More notification methods are listed in the [addons and plugins](13-addons.md#notification-scripts-interfaces) +chapter. + +A notification specification requires one or more users (and/or user groups) +who will be notified in case of problems. These users must have all custom +attributes defined which will be used in the `NotificationCommand` on execution. + +The user `icingaadmin` in the example below will get notified only on `Warning` and +`Critical` problems. In addition to that `Recovery` notifications are sent (they require +the `OK` state). + +``` +object User "icingaadmin" { + display_name = "Icinga 2 Admin" + enable_notifications = true + states = [ OK, Warning, Critical ] + types = [ Problem, Recovery ] + email = "icinga@localhost" +} +``` + +If you don't set the `states` and `types` configuration attributes for the `User` +object, notifications for all states and types will be sent. + +Details on troubleshooting notification problems can be found [here](15-troubleshooting.md#troubleshooting). + +> **Note** +> +> Make sure that the [notification](11-cli-commands.md#enable-features) feature is enabled +> in order to execute notification commands. + +You should choose which information you (and your notified users) are interested in +case of emergency, and also which information does not provide any value to you and +your environment. + +An example notification command is explained [here](03-monitoring-basics.md#notification-commands). + +You can add all shared attributes to a `Notification` template which is inherited +to the defined notifications. That way you'll save duplicated attributes in each +`Notification` object. Attributes can be overridden locally. + +``` +template Notification "generic-notification" { + interval = 15m + + command = "mail-service-notification" + + states = [ Warning, Critical, Unknown ] + types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart, + FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ] + + period = "24x7" +} +``` + +The time period `24x7` is included as example configuration with Icinga 2. + +Use the `apply` keyword to create `Notification` objects for your services: + +``` +apply Notification "notify-cust-xy-mysql" to Service { + import "generic-notification" + + users = [ "noc-xy", "mgmt-xy" ] + + assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true + ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true) +} +``` + + +Instead of assigning users to notifications, you can also add the `user_groups` +attribute with a list of user groups to the `Notification` object. Icinga 2 will +send notifications to all group members. + +> **Note** +> +> Only users who have been notified of a problem before (`Warning`, `Critical`, `Unknown` +states for services, `Down` for hosts) will receive `Recovery` notifications. + +Icinga 2 v2.10 allows you to configure a `User` object with `Acknowledgement` and/or `Recovery` +without a `Problem` notification. These notifications will be sent without +any problem notifications beforehand, and can be used for e.g. ticket systems. + +``` +object User "ticketadmin" { + display_name = "Ticket Admin" + enable_notifications = true + states = [ OK, Warning, Critical ] + types = [ Acknowledgement, Recovery ] + email = "ticket@localhost" +} +``` + +### Notifications: Users from Host/Service <a id="alert-notifications-users-host-service"></a> + +A common pattern is to store the users and user groups +on the host or service objects instead of the notification +object itself. + +The sample configuration provided in [hosts.conf](04-configuration.md#hosts-conf) and [notifications.conf](notifications-conf) +already provides an example for this question. + +> **Tip** +> +> Please make sure to read the [apply](03-monitoring-basics.md#using-apply) and +> [custom variable values](03-monitoring-basics.md#custom-variables-values) chapter to +> fully understand these examples. + + +Specify the user and groups as nested custom variable on the host object: + +``` +object Host "icinga2-agent1.localdomain" { + [...] + + vars.notification["mail"] = { + groups = [ "icingaadmins" ] + users = [ "icingaadmin" ] + } + vars.notification["sms"] = { + users = [ "icingaadmin" ] + } +} +``` + +As you can see, there is the option to use two different notification +apply rules here: One for `mail` and one for `sms`. + +This example assigns the `users` and `groups` nested keys from the `notification` +custom variable to the actual notification object attributes. + +Since errors are hard to debug if host objects don't specify the required +configuration attributes, you can add a safety condition which logs which +host object is affected. + +``` +critical/config: Host 'icinga2-client3.localdomain' does not specify required user/user_groups configuration attributes for notification 'mail-icingaadmin'. +``` + +You can also use the [script debugger](20-script-debugger.md#script-debugger) for more advanced insights. + +``` +apply Notification "mail-host-notification" to Host { + [...] + + /* Log which host does not specify required user/user_groups attributes. This will fail immediately during config validation and help a lot. */ + if (len(host.vars.notification.mail.users) == 0 && len(host.vars.notification.mail.user_groups) == 0) { + log(LogCritical, "config", "Host '" + host.name + "' does not specify required user/user_groups configuration attributes for notification '" + name + "'.") + } + + users = host.vars.notification.mail.users + user_groups = host.vars.notification.mail.groups + + assign where host.vars.notification.mail && typeof(host.vars.notification.mail) == Dictionary +} + +apply Notification "sms-host-notification" to Host { + [...] + + /* Log which host does not specify required user/user_groups attributes. This will fail immediately during config validation and help a lot. */ + if (len(host.vars.notification.sms.users) == 0 && len(host.vars.notification.sms.user_groups) == 0) { + log(LogCritical, "config", "Host '" + host.name + "' does not specify required user/user_groups configuration attributes for notification '" + name + "'.") + } + + users = host.vars.notification.sms.users + user_groups = host.vars.notification.sms.groups + + assign where host.vars.notification.sms && typeof(host.vars.notification.sms) == Dictionary +} +``` + +The example above uses [typeof](18-library-reference.md#global-functions-typeof) as safety function to ensure that +the `mail` key really provides a dictionary as value. Otherwise +the configuration validation could fail if an admin adds something +like this on another host: + +``` + vars.notification.mail = "yes" +``` + + +You can also do a more fine granular assignment on the service object: + +``` +apply Service "http" { + [...] + + vars.notification["mail"] = { + groups = [ "icingaadmins" ] + users = [ "icingaadmin" ] + } + + [...] +} +``` + +This notification apply rule is different to the one above. The service +notification users and groups are inherited from the service and if not set, +from the host object. A default user is set too. + +``` +apply Notification "mail-service-notification" to Service { + [...] + + if (service.vars.notification.mail.users) { + users = service.vars.notification.mail.users + } else if (host.vars.notification.mail.users) { + users = host.vars.notification.mail.users + } else { + /* Default user who receives everything. */ + users = [ "icingaadmin" ] + } + + if (service.vars.notification.mail.groups) { + user_groups = service.vars.notification.mail.groups + } else if (host.vars.notification.mail.groups) { + user_groups = host.vars.notification.mail.groups + } + + assign where ( host.vars.notification.mail && typeof(host.vars.notification.mail) == Dictionary ) || ( service.vars.notification.mail && typeof(service.vars.notification.mail) == Dictionary ) +} +``` + +### Notification Escalations <a id="notification-escalations"></a> + +When a problem notification is sent and a problem still exists at the time of re-notification +you may want to escalate the problem to the next support level. A different approach +is to configure the default notification by email, and escalate the problem via SMS +if not already solved. + +You can define notification start and end times as additional configuration +attributes making the `Notification` object a so-called `notification escalation`. +Using templates you can share the basic notification attributes such as users or the +`interval` (and override them for the escalation then). + +Using the example from above, you can define additional users being escalated for SMS +notifications between start and end time. + +``` +object User "icinga-oncall-2nd-level" { + display_name = "Icinga 2nd Level" + + vars.mobile = "+1 555 424642" +} + +object User "icinga-oncall-1st-level" { + display_name = "Icinga 1st Level" + + vars.mobile = "+1 555 424642" +} +``` + +Define an additional [NotificationCommand](03-monitoring-basics.md#notification-commands) for SMS notifications. + +> **Note** +> +> The example is not complete as there are many different SMS providers. +> Please note that sending SMS notifications will require an SMS provider +> or local hardware with an active SIM card. + +``` +object NotificationCommand "sms-notification" { + command = [ + PluginDir + "/send_sms_notification", + "$mobile$", + "..." +} +``` + +The two new notification escalations are added onto the local host +and its service `ping4` using the `generic-notification` template. +The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification` +command) after `30m` until `1h`. + +> **Note** +> +> The `interval` was set to 15m in the `generic-notification` +> template example. Lower that value in your escalations by using a secondary +> template or by overriding the attribute directly in the `notifications` array +> position for `escalation-sms-2nd-level`. + +If the problem does not get resolved nor acknowledged preventing further notifications, +the `escalation-sms-1st-level` user will be escalated `1h` after the initial problem was +notified, but only for one hour (`2h` as `end` key for the `times` dictionary). + +``` +apply Notification "mail" to Service { + import "generic-notification" + + command = "mail-notification" + users = [ "icingaadmin" ] + + assign where service.name == "ping4" +} + +apply Notification "escalation-sms-2nd-level" to Service { + import "generic-notification" + + command = "sms-notification" + users = [ "icinga-oncall-2nd-level" ] + + times = { + begin = 30m + end = 1h + } + + assign where service.name == "ping4" +} + +apply Notification "escalation-sms-1st-level" to Service { + import "generic-notification" + + command = "sms-notification" + users = [ "icinga-oncall-1st-level" ] + + times = { + begin = 1h + end = 2h + } + + assign where service.name == "ping4" +} +``` + +### Notification Delay <a id="notification-delay"></a> + +Sometimes the problem in question should not be announced when the notification is due +(the object reaching the `HARD` state), but after a certain period. In Icinga 2 +you can use the `times` dictionary and set `begin = 15m` as key and value if you want to +postpone the notification window for 15 minutes. Leave out the `end` key -- if not set, +Icinga 2 will not check against any end time for this notification. + +> **Note** +> +> Setting the `end` key to `0` will stop sending notifications immediately +> when a problem occurs, effectively disabling the notification. + +Make sure to specify a relatively low notification `interval` to get notified soon enough again. + +``` +apply Notification "mail" to Service { + import "generic-notification" + + command = "mail-notification" + users = [ "icingaadmin" ] + + interval = 5m + + times.begin = 15m // delay notification window + + assign where service.name == "ping4" +} +``` + +Also note that this mechanism doesn't take downtimes etc. into account, only +the `HARD` state change time matters. E.g. for a problem which occurred in the +middle of a downtime from 2 PM to 4 PM `times.begin = 2h` means 5 PM, not 6 PM. + +### Disable Re-notifications <a id="disable-renotification"></a> + +If you prefer to be notified only once, you can disable re-notifications by setting the +`interval` attribute to `0`. + +``` +apply Notification "notify-once" to Service { + import "generic-notification" + + command = "mail-notification" + users = [ "icingaadmin" ] + + interval = 0 // disable re-notification + + assign where service.name == "ping4" +} +``` + +### Notification Filters by State and Type <a id="notification-filters-state-type"></a> + +If there are no notification state and type filter attributes defined at the `Notification` +or `User` object, Icinga 2 assumes that all states and types are being notified. + +Available state and type filters for notifications are: + +``` +template Notification "generic-notification" { + + states = [ OK, Warning, Critical, Unknown ] + types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart, + FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ] +} +``` + + +## Commands <a id="commands"></a> + +Icinga 2 uses three different command object types to specify how +checks should be performed, notifications should be sent, and +events should be handled. + +### Check Commands <a id="check-commands"></a> + +[CheckCommand](09-object-types.md#objecttype-checkcommand) objects define the command line how +a check is called. + +[CheckCommand](09-object-types.md#objecttype-checkcommand) objects are referenced by +[Host](09-object-types.md#objecttype-host) and [Service](09-object-types.md#objecttype-service) objects +using the `check_command` attribute. + +> **Note** +> +> Make sure that the [checker](11-cli-commands.md#enable-features) feature is enabled in order to +> execute checks. + +#### Integrate the Plugin with a CheckCommand Definition <a id="command-plugin-integration"></a> + +Unless you have done so already, download your check plugin and put it +into the [PluginDir](04-configuration.md#constants-conf) directory. The following example uses the +`check_mysql` plugin contained in the Monitoring Plugins package. + +The plugin path and all command arguments are made a list of +double-quoted string arguments for proper shell escaping. + +Call the `check_mysql` plugin with the `--help` parameter to see +all available options. Our example defines warning (`-w`) and +critical (`-c`) thresholds. + +``` +icinga@icinga2 $ /usr/lib64/nagios/plugins/check_mysql --help +... +This program tests connections to a MySQL server + +Usage: +check_mysql [-d database] [-H host] [-P port] [-s socket] +[-u user] [-p password] [-S] [-l] [-a cert] [-k key] +[-C ca-cert] [-D ca-dir] [-L ciphers] [-f optfile] [-g group] +``` + +Next step is to understand how [command parameters](03-monitoring-basics.md#command-passing-parameters) +are being passed from a host or service object, and add a [CheckCommand](09-object-types.md#objecttype-checkcommand) +definition based on these required parameters and/or default values. + +Please continue reading in the [plugins section](05-service-monitoring.md#service-monitoring-plugins) for additional integration examples. + +#### Passing Check Command Parameters from Host or Service <a id="command-passing-parameters"></a> + +Check command parameters are defined as custom variables which can be accessed as runtime macros +by the executed check command. + +The check command parameters for ITL provided plugin check command definitions are documented +[here](10-icinga-template-library.md#icinga-template-library), for example +[disk](10-icinga-template-library.md#plugin-check-command-disk). + +In order to practice passing command parameters you should [integrate your own plugin](03-monitoring-basics.md#command-plugin-integration). + +The following example will use `check_mysql` provided by the [Monitoring Plugins](https://www.monitoring-plugins.org/). + +Define the default check command custom variables, for example `mysql_user` and `mysql_password` +(freely definable naming schema) and optional their default threshold values. You can +then use these custom variables as runtime macros for [command arguments](03-monitoring-basics.md#command-arguments) +on the command line. + +> **Tip** +> +> Use a common command type as prefix for your command arguments to increase +> readability. `mysql_user` helps understanding the context better than just +> `user` as argument. + +The default custom variables can be overridden by the custom variables +defined in the host or service using the check command `my-mysql`. The custom variables +can also be inherited from a parent template using additive inheritance (`+=`). + +``` +# vim /etc/icinga2/conf.d/commands.conf + +object CheckCommand "my-mysql" { + command = [ PluginDir + "/check_mysql" ] //constants.conf -> const PluginDir + + arguments = { + "-H" = "$mysql_host$" + "-u" = { + required = true + value = "$mysql_user$" + } + "-p" = "$mysql_password$" + "-P" = "$mysql_port$" + "-s" = "$mysql_socket$" + "-a" = "$mysql_cert$" + "-d" = "$mysql_database$" + "-k" = "$mysql_key$" + "-C" = "$mysql_ca_cert$" + "-D" = "$mysql_ca_dir$" + "-L" = "$mysql_ciphers$" + "-f" = "$mysql_optfile$" + "-g" = "$mysql_group$" + "-S" = { + set_if = "$mysql_check_slave$" + description = "Check if the slave thread is running properly." + } + "-l" = { + set_if = "$mysql_ssl$" + description = "Use ssl encryption" + } + } + + vars.mysql_check_slave = false + vars.mysql_ssl = false + vars.mysql_host = "$address$" +} +``` + +The check command definition also sets `mysql_host` to the `$address$` default value. You can override +this command parameter if for example your MySQL host is not running on the same server's ip address. + +Make sure pass all required command parameters, such as `mysql_user`, `mysql_password` and `mysql_database`. +`MysqlUsername` and `MysqlPassword` are specified as [global constants](04-configuration.md#constants-conf) +in this example. + +``` +# vim /etc/icinga2/conf.d/services.conf + +apply Service "mysql-icinga-db-health" { + import "generic-service" + + check_command = "my-mysql" + + vars.mysql_user = MysqlUsername + vars.mysql_password = MysqlPassword + + vars.mysql_database = "icinga" + vars.mysql_host = "192.168.33.11" + + assign where match("icinga2*", host.name) + ignore where host.vars.no_health_check == true +} +``` + + +Take a different example: The example host configuration in [hosts.conf](04-configuration.md#hosts-conf) +also applies an `ssh` service check. Your host's ssh port is not the default `22`, but set to `2022`. +You can pass the command parameter as custom variable `ssh_port` directly inside the service apply rule +inside [services.conf](04-configuration.md#services-conf): + +``` +apply Service "ssh" { + import "generic-service" + + check_command = "ssh" + vars.ssh_port = 2022 //custom command parameter + + assign where (host.address || host.address6) && host.vars.os == "Linux" +} +``` + +If you prefer this being configured at the host instead of the service, modify the host configuration +object instead. The runtime macro resolving order is described [here](03-monitoring-basics.md#macro-evaluation-order). + +``` +object Host "icinga2-agent1.localdomain { +... + vars.ssh_port = 2022 +} +``` + +#### Passing Check Command Parameters Using Apply For <a id="command-passing-parameters-apply-for"></a> + +The host `localhost` with the generated services from the `basic-partitions` dictionary (see +[apply for](03-monitoring-basics.md#using-apply-for) for details) checks a basic set of disk partitions +with modified custom variables (warning thresholds at `10%`, critical thresholds at `5%` +free disk space). + +The custom variable `disk_partition` can either hold a single string or an array of +string values for passing multiple partitions to the `check_disk` check plugin. + +``` +object Host "my-server" { + import "generic-host" + address = "127.0.0.1" + address6 = "::1" + + vars.local_disks["basic-partitions"] = { + disk_partitions = [ "/", "/tmp", "/var", "/home" ] + } +} + +apply Service for (disk => config in host.vars.local_disks) { + import "generic-service" + check_command = "my-disk" + + vars += config + + vars.disk_wfree = "10%" + vars.disk_cfree = "5%" +} +``` + + +More details on using arrays in custom variables can be found in +[this chapter](03-monitoring-basics.md#custom-variables). + + +#### Command Arguments <a id="command-arguments"></a> + +Next to the short `command` array specified in the command object, +it is advised to define plugin/script parameters in the `arguments` +dictionary attribute. + +The value of the `--parameter` key itself is a dictionary with additional +keys. They allow to create generic command objects and are also for documentation +purposes, e.g. with the `description` field copying the plugin's help text in there. +The Icinga Director uses this field to show the argument's purpose when selecting it. + +``` + arguments = { + "--parameter" = { + description = "..." + value = "..." + } + } +``` + +Each argument is optional by default and is omitted if +the value is not set. + +Learn more about integrating plugins with CheckCommand +objects in [this chapter](05-service-monitoring.md#service-monitoring-plugin-checkcommand). + +There are additional possibilities for creating a command only once, +with different parameters and arguments, shown below. + +##### Command Arguments: Value <a id="command-arguments-value"></a> + +In order to find out about the command argument, call the plugin's help +or consult the README. + +``` +./check_systemd.py --help + +... + + -u UNIT, --unit UNIT Name of the systemd unit that is beeing tested. +``` + +Whenever the long parameter name is available, prefer this over the short one. + +``` + arguments = { + "--unit" = { + + } + } +``` + +Define a unique `prefix` for the command's specific arguments. Best practice is to follow this schema: + +``` +<command name>_<parameter name> +``` + +Therefore use `systemd_` as prefix, and use the long plugin parameter name `unit` inside the [runtime macro](03-monitoring-basics.md#runtime-macros) +syntax. + +``` + arguments = { + "--unit" = { + value = "$systemd_unit$" + } + } +``` + +In order to specify a default value, specify +a [custom variable](03-monitoring-basics.md#custom-variables) inside +the CheckCommand object. + +``` + vars.systemd_unit = "icinga2" +``` + +This value can be overridden from the host/service +object as command parameters. + + +##### Command Arguments: Description <a id="command-arguments-description"></a> + +Best practice, also inside the [ITL](10-icinga-template-library.md#icinga-template-library), is to always +copy the command parameter help output into the `description` +field of your check command. + +Learn more about integrating plugins with CheckCommand +objects in [this chapter](05-service-monitoring.md#service-monitoring-plugin-checkcommand). + +With the [example above](03-monitoring-basics.md#command-arguments-value), +inspect the parameter's help text. + +``` +./check_systemd.py --help + +... + + -u UNIT, --unit UNIT Name of the systemd unit that is beeing tested. +``` + +Copy this into the command arguments `description` entry. + +``` + arguments = { + "--unit" = { + value = "$systemd_unit$" + description = "Name of the systemd unit that is beeing tested." + } + } +``` + +##### Command Arguments: Required <a id="command-arguments-required"></a> + +Specifies whether this command argument is required, or not. By +default all arguments are optional. + +> **Tip** +> +> Good plugins provide optional parameters in square brackets, e.g. `[-w SECONDS]`. + +The `required` field can be toggled with a [boolean](17-language-reference.md#boolean-literals) value. + +``` + arguments = { + "--host" = { + value = "..." + description = "..." + required = true + } + } +``` + +Whenever the check is executed and the argument is missing, Icinga +logs an error. This allows to better debug configuration errors +instead of sometimes unreadable plugin errors when parameters are +missing. + +##### Command Arguments: Skip Key <a id="command-arguments-skip-key"></a> + +The `arguments` attribute requires a key, empty values are not allowed. +To overcome this for parameters which don't need the name in front of +the value, use the `skip_key` [boolean](17-language-reference.md#boolean-literals) toggle. + +``` + command = [ PrefixDir + "/bin/icingacli", "businessprocess", "process", "check" ] + + arguments = { + "--process" = { + value = "$icingacli_businessprocess_process$" + description = "Business process to monitor" + skip_key = true + required = true + order = -1 + } + } +``` + +The service specifies the [custom variable](03-monitoring-basics.md#custom-variables) `icingacli_businessprocess_process`. + +``` + vars.icingacli_businessprocess_process = "bp-shop-web" +``` + +This results in this command line without the `--process` parameter: + +```bash +'/bin/icingacli' 'businessprocess' 'process' 'check' 'bp-shop-web' +``` + +You can use this method to put everything into the `arguments` attribute +in a defined order and without keys. This avoids entries in the `command` +attributes too. + + +##### Command Arguments: Set If <a id="command-arguments-set-if"></a> + +This can be used for the following scenarios: + +**Parameters without value, e.g. `--sni`.** + +``` + command = [ PluginDir + "/check_http"] + + arguments = { + "--sni" = { + set_if = "$http_sni$" + } + } +``` + +Whenever a host/service object sets the `http_sni` [custom variable](03-monitoring-basics.md#custom-variables) +to `true`, the parameter is added to the command line. + +```bash +'/usr/lib64/nagios/plugins/check_http' '--sni' +``` + +[Numeric](17-language-reference.md#numeric-literals) values are allowed too. + +**Parameters with value, but additionally controlled with an extra custom variable boolean flag.** + +The following example is taken from the [postgres]() CheckCommand. The host +parameter should use a `value` but only whenever the `postgres_unixsocket` +[custom variable](03-monitoring-basics.md#custom-variables) is set to false. + +Note: `set_if` is using a runtime lambda function because the value +is evaluated at runtime. This is explained in [this chapter](08-advanced-topics.md#use-functions-object-config). + +``` + command = [ PluginContribDir + "/check_postgres.pl" ] + + arguments = { + "-H" = { + value = "$postgres_host$" + set_if = {{ macro("$postgres_unixsocket$") == false }} + description = "hostname(s) to connect to; defaults to none (Unix socket)" + } +``` + +An executed check for this host and services ... + +``` +object Host "postgresql-cluster" { + // ... + + vars.postgres_host = "192.168.56.200" + vars.postgres_unixsocket = false +} +``` + +... use the following command line: + +```bash +'/usr/lib64/nagios/plugins/check_postgres.pl' '-H' '192.168.56.200' +``` + +Host/service objects which set `postgres_unixsocket` to `false` don't add the `-H` parameter +and its value to the command line. + +References: [abbreviated lambda syntax](17-language-reference.md#nullary-lambdas), [macro](18-library-reference.md#scoped-functions-macro). + +##### Command Arguments: Order <a id="command-arguments-order"></a> + +Plugin may require parameters in a special order. One after the other, +or e.g. one parameter always in the first position. + +``` + arguments = { + "--first" = { + value = "..." + description = "..." + order = -5 + } + "--second" = { + value = "..." + description = "..." + order = -4 + } + "--last" = { + value = "..." + description = "..." + order = 99 + } + } +``` + +Keep in mind that positional arguments need to be tested thoroughly. + +##### Command Arguments: Repeat Key <a id="command-arguments-repeat-key"></a> + +Parameters can use [Array](17-language-reference.md#array) as value type. Whenever Icinga encounters +an array, it repeats the parameter key and each value element by default. + +``` + command = [ NscpPath + "\\nscp.exe", "client" ] + + arguments = { + "-a" = { + value = "$nscp_arguments$" + description = "..." + repeat_key = true + } + } +``` + +On a host/service object, specify the `nscp_arguments` [custom variable](03-monitoring-basics.md#custom-variables) +as an array. + +``` + vars.nscp_arguments = [ "exclude=sppsvc", "exclude=ShellHWDetection" ] +``` + +This translates into the following command line: + +``` +nscp.exe 'client' '-a' 'exclude=sppsvc' '-a' 'exclude=ShellHWDetection' +``` + +If the plugin requires you to pass the list without repeating the key, +set `repeat_key = false` in the argument definition. + +``` + command = [ NscpPath + "\\nscp.exe", "client" ] + + arguments = { + "-a" = { + value = "$nscp_arguments$" + description = "..." + repeat_key = false + } + } +``` + +This translates into the following command line: + +``` +nscp.exe 'client' '-a' 'exclude=sppsvc' 'exclude=ShellHWDetection' +``` + + +##### Command Arguments: Key <a id="command-arguments-key"></a> + +The `arguments` attribute requires unique keys. Sometimes, you'll +need to override this in the resulting command line with same key +names. Therefore you can specifically override the arguments key. + +``` +arguments = { + "--key1" = { + value = "..." + key = "-specialkey" + } + "--key2" = { + value = "..." + key = "-specialkey" + } +} +``` + +This results in the following command line: + +``` + '-specialkey' '...' '-specialkey' '...' +``` + +#### Environment Variables <a id="command-environment-variables"></a> + +The `env` command object attribute specifies a list of environment variables with values calculated +from custom variables which should be exported as environment variables prior to executing the command. + +This is useful for example for hiding sensitive information on the command line output +when passing credentials to database checks: + +``` +object CheckCommand "mysql" { + command = [ PluginDir + "/check_mysql" ] + + arguments = { + "-H" = "$mysql_address$" + "-d" = "$mysql_database$" + } + + vars.mysql_address = "$address$" + vars.mysql_database = "icinga" + vars.mysql_user = "icinga_check" + vars.mysql_pass = "password" + + env.MYSQLUSER = "$mysql_user$" + env.MYSQLPASS = "$mysql_pass$" +} +``` + +The executed command line visible with `ps` or `top` looks like this and hides +the database credentials in the user's environment. + +```bash +/usr/lib/nagios/plugins/check_mysql -H 192.168.56.101 -d icinga +``` + +> **Note** +> +> If the CheckCommand also supports setting the parameter in the command line, +> ensure to use a different name for the custom variable. Otherwise Icinga 2 +> adds the command line parameter. + +If a specific CheckCommand object provided with the [Icinga Template Library](10-icinga-template-library.md#icinga-template-library) +needs additional environment variables, you can import it into a new custom +CheckCommand object and add additional `env` keys. Example for the [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health) +CheckCommand: + +``` +object CheckCommand "mysql_health_env" { + import "mysql_health" + + // https://labs.consol.de/nagios/check_mysql_health/ + env.NAGIOS__SERVICEMYSQL_USER = "$mysql_health_env_username$" + env.NAGIOS__SERVICEMYSQL_PASS = "$mysql_health_env_password$" +} +``` + +Specify the custom variables `mysql_health_env_username` and `mysql_health_env_password` +in the service object then. + +> **Note** +> +> Keep in mind that the values are still visible with the [debug console](11-cli-commands.md#cli-command-console) +> and the inspect mode in the [Icinga Director](https://icinga.com/docs/director/latest/). + +You can also set global environment variables in the application's +sysconfig configuration file, e.g. `HOME` or specific library paths +for Oracle. Beware that these environment variables can be used +by any CheckCommand object and executed plugin and can leak sensitive +information. + +### Notification Commands <a id="notification-commands"></a> + +[NotificationCommand](09-object-types.md#objecttype-notificationcommand) +objects define how notifications are delivered to external interfaces +(email, XMPP, IRC, Twitter, etc.). +[NotificationCommand](09-object-types.md#objecttype-notificationcommand) +objects are referenced by [Notification](09-object-types.md#objecttype-notification) +objects using the `command` attribute. + +> **Note** +> +> Make sure that the [notification](11-cli-commands.md#enable-features) feature is enabled +> in order to execute notification commands. + +While it's possible to specify an entire notification command right +in the NotificationCommand object it is generally advisable to create a +shell script in the `/etc/icinga2/scripts` directory and have the +NotificationCommand object refer to that. + +A fresh Icinga 2 install comes with with two example scripts for host +and service notifications by email. Based on the Icinga 2 runtime macros +(such as `$service.output$` for the current check output) it's possible +to send email to the user(s) associated with the notification itself +(`$user.email$`). Feel free to take these scripts as a starting point +for your own individual notification solution - and keep in mind that +nearly everything is technically possible. + +Information needed to generate notifications is passed to the scripts as +arguments. The NotificationCommand objects `mail-host-notification` and +`mail-service-notification` correspond to the shell scripts +`mail-host-notification.sh` and `mail-service-notification.sh` in +`/etc/icinga2/scripts` and define default values for arguments. These +defaults can always be overwritten locally. + +> **Note** +> +> This example requires the `mail` binary installed on the Icinga 2 +> master. +> +> Depending on the distribution, you need a local mail transfer +> agent (MTA) such as Postfix, Exim or Sendmail in order +> to send emails. +> +> These tools virtually provide the `mail` binary executed +> by the notification scripts below. + +#### mail-host-notification <a id="mail-host-notification"></a> + +The `mail-host-notification` NotificationCommand object uses the +example notification script located in `/etc/icinga2/scripts/mail-host-notification.sh`. + +Here is a quick overview of the arguments that can be used. See also [host runtime +macros](03-monitoring-basics.md#-host-runtime-macros) for further +information. + + Name | Description + -------------------------------|--------------------------------------- + `notification_date` | **Required.** Date and time. Defaults to `$icinga.long_date_time$`. + `notification_hostname` | **Required.** The host's `FQDN`. Defaults to `$host.name$`. + `notification_hostdisplayname` | **Required.** The host's display name. Defaults to `$host.display_name$`. + `notification_hostoutput` | **Required.** Output from host check. Defaults to `$host.output$`. + `notification_useremail` | **Required.** The notification's recipient(s). Defaults to `$user.email$`. + `notification_hoststate` | **Required.** Current state of host. Defaults to `$host.state$`. + `notification_type` | **Required.** Type of notification. Defaults to `$notification.type$`. + `notification_address` | **Optional.** The host's IPv4 address. Defaults to `$address$`. + `notification_address6` | **Optional.** The host's IPv6 address. Defaults to `$address6$`. + `notification_author` | **Optional.** Comment author. Defaults to `$notification.author$`. + `notification_comment` | **Optional.** Comment text. Defaults to `$notification.comment$`. + `notification_from` | **Optional.** Define a valid From: string (e.g. `"Icinga 2 Host Monitoring <icinga@example.com>"`). Requires `GNU mailutils` (Debian/Ubuntu) or `mailx` (RHEL/SUSE). + `notification_icingaweb2url` | **Optional.** Define URL to your Icinga Web 2 (e.g. `"https://www.example.com/icingaweb2"`) + `notification_logtosyslog` | **Optional.** Set `true` to log notification events to syslog; useful for debugging. Defaults to `false`. + +#### mail-service-notification <a id="mail-service-notification"></a> + +The `mail-service-notification` NotificationCommand object uses the +example notification script located in `/etc/icinga2/scripts/mail-service-notification.sh`. + +Here is a quick overview of the arguments that can be used. See also [service runtime +macros](03-monitoring-basics.md#-service-runtime-macros) for further +information. + + Name | Description + ----------------------------------|--------------------------------------- + `notification_date` | **Required.** Date and time. Defaults to `$icinga.long_date_time$`. + `notification_hostname` | **Required.** The host's `FQDN`. Defaults to `$host.name$`. + `notification_servicename` | **Required.** The service name. Defaults to `$service.name$`. + `notification_hostdisplayname` | **Required.** Host display name. Defaults to `$host.display_name$`. + `notification_servicedisplayname` | **Required.** Service display name. Defaults to `$service.display_name$`. + `notification_serviceoutput` | **Required.** Output from service check. Defaults to `$service.output$`. + `notification_useremail` | **Required.** The notification's recipient(s). Defaults to `$user.email$`. + `notification_servicestate` | **Required.** Current state of host. Defaults to `$service.state$`. + `notification_type` | **Required.** Type of notification. Defaults to `$notification.type$`. + `notification_address` | **Optional.** The host's IPv4 address. Defaults to `$address$`. + `notification_address6` | **Optional.** The host's IPv6 address. Defaults to `$address6$`. + `notification_author` | **Optional.** Comment author. Defaults to `$notification.author$`. + `notification_comment` | **Optional.** Comment text. Defaults to `$notification.comment$`. + `notification_from` | **Optional.** Define a valid From: string (e.g. `"Icinga 2 Host Monitoring <icinga@example.com>"`). Requires `GNU mailutils` (Debian/Ubuntu) or `mailx` (RHEL/SUSE). + `notification_icingaweb2url` | **Optional.** Define URL to your Icinga Web 2 (e.g. `"https://www.example.com/icingaweb2"`) + `notification_logtosyslog` | **Optional.** Set `true` to log notification events to syslog; useful for debugging. Defaults to `false`. + +### Event Commands <a id="event-commands"></a> + +Unlike notifications, event commands for hosts/services are called on every +check execution if one of these conditions matches: + +* The host/service is in a [soft state](03-monitoring-basics.md#hard-soft-states) +* The host/service state changes into a [hard state](03-monitoring-basics.md#hard-soft-states) +* The host/service state recovers from a [soft or hard state](03-monitoring-basics.md#hard-soft-states) to [OK](03-monitoring-basics.md#service-states)/[Up](03-monitoring-basics.md#host-states) + +[EventCommand](09-object-types.md#objecttype-eventcommand) objects are referenced by +[Host](09-object-types.md#objecttype-host) and [Service](09-object-types.md#objecttype-service) objects +with the `event_command` attribute. + +Therefore the `EventCommand` object should define a command line +evaluating the current service state and other service runtime attributes +available through runtime variables. Runtime macros such as `$service.state_type$` +and `$service.state$` will be processed by Icinga 2 and help with fine-granular +triggered events + +If the host/service is located on a client as [command endpoint](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) +the event command will be executed on the client itself (similar to the check +command). + +Common use case scenarios are a failing HTTP check which requires an immediate +restart via event command. Another example would be an application that is not +responding and therefore requires a restart. You can also use event handlers +to forward more details on state changes and events than the typical notification +alerts provide. + +#### Use Event Commands to Send Information from the Master <a id="event-command-send-information-from-master"></a> + +This example sends a web request from the master node to an external tool +for every event triggered on a `businessprocess` service. + +Define an [EventCommand](09-object-types.md#objecttype-eventcommand) +object `send_to_businesstool` which sends state changes to the external tool. + +``` +object EventCommand "send_to_businesstool" { + command = [ + "/usr/bin/curl", + "-s", + "-X PUT" + ] + + arguments = { + "-H" = { + value ="$businesstool_url$" + skip_key = true + } + "-d" = "$businesstool_message$" + } + + vars.businesstool_url = "http://localhost:8080/businesstool" + vars.businesstool_message = "$host.name$ $service.name$ $service.state$ $service.state_type$ $service.check_attempt$" +} +``` + +Set the `event_command` attribute to `send_to_businesstool` on the Service. + +``` +object Service "businessprocess" { + host_name = "businessprocess" + + check_command = "icingacli-businessprocess" + vars.icingacli_businessprocess_process = "icinga" + vars.icingacli_businessprocess_config = "training" + + event_command = "send_to_businesstool" +} +``` + +In order to test this scenario you can run: + +```bash +nc -l 8080 +``` + +This allows to catch the web request. You can also enable the [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) +and search for the event command execution log message. + +```bash +tail -f /var/log/icinga2/debug.log | grep EventCommand +``` + +Feed in a check result via REST API action [process-check-result](12-icinga2-api.md#icinga2-api-actions-process-check-result) +or via Icinga Web 2. + +Expected Result: + +``` +# nc -l 8080 +PUT /businesstool HTTP/1.1 +User-Agent: curl/7.29.0 +Host: localhost:8080 +Accept: */* +Content-Length: 47 +Content-Type: application/x-www-form-urlencoded + +businessprocess businessprocess CRITICAL SOFT 1 +``` + +#### Use Event Commands to Restart Service Daemon via Command Endpoint on Linux <a id="event-command-restart-service-daemon-command-endpoint-linux"></a> + +This example triggers a restart of the `httpd` service on the local system +when the `procs` service check executed via Command Endpoint fails. It only +triggers if the service state is `Critical` and attempts to restart the +service before a notification is sent. + +Requirements: + +* Icinga 2 as client on the remote node +* icinga user with sudo permissions to the httpd daemon + +Example on CentOS 7: + +``` +# visudo +icinga ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart httpd +``` + +Note: Distributions might use a different name. On Debian/Ubuntu the service is called `apache2`. + +Define an [EventCommand](09-object-types.md#objecttype-eventcommand) object `restart_service` +which allows to trigger local service restarts. Put it into a [global zone](06-distributed-monitoring.md#distributed-monitoring-global-zone-config-sync) +to sync its configuration to all clients. + +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/global-templates/eventcommands.conf + +object EventCommand "restart_service" { + command = [ PluginDir + "/restart_service" ] + + arguments = { + "-s" = "$service.state$" + "-t" = "$service.state_type$" + "-a" = "$service.check_attempt$" + "-S" = "$restart_service$" + } + + vars.restart_service = "$procs_command$" +} +``` + +This event command triggers the following script which restarts the service. +The script only is executed if the service state is `CRITICAL`. Warning and Unknown states +are ignored as they indicate not an immediate failure. + +``` +[root@icinga2-agent1.localdomain /]# vim /usr/lib64/nagios/plugins/restart_service + +#!/bin/bash + +while getopts "s:t:a:S:" opt; do + case $opt in + s) + servicestate=$OPTARG + ;; + t) + servicestatetype=$OPTARG + ;; + a) + serviceattempt=$OPTARG + ;; + S) + service=$OPTARG + ;; + esac +done + +if ( [ -z $servicestate ] || [ -z $servicestatetype ] || [ -z $serviceattempt ] || [ -z $service ] ); then + echo "USAGE: $0 -s servicestate -z servicestatetype -a serviceattempt -S service" + exit 3; +else + # Only restart on the third attempt of a critical event + if ( [ $servicestate == "CRITICAL" ] && [ $servicestatetype == "SOFT" ] && [ $serviceattempt -eq 3 ] ); then + sudo /usr/bin/systemctl restart $service + fi +fi + +[root@icinga2-agent1.localdomain /]# chmod +x /usr/lib64/nagios/plugins/restart_service +``` + +Add a service on the master node which is executed via command endpoint on the client. +Set the `event_command` attribute to `restart_service`, the name of the previously defined +EventCommand object. + +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/icinga2-agent1.localdomain.conf + +object Service "Process httpd" { + check_command = "procs" + event_command = "restart_service" + max_check_attempts = 4 + + host_name = "icinga2-agent1.localdomain" + command_endpoint = "icinga2-agent1.localdomain" + + vars.procs_command = "httpd" + vars.procs_warning = "1:10" + vars.procs_critical = "1:" +} +``` + +In order to test this configuration just stop the `httpd` on the remote host `icinga2-agent1.localdomain`. + +``` +[root@icinga2-agent1.localdomain /]# systemctl stop httpd +``` + +You can enable the [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) and search for the +executed command line. + +``` +[root@icinga2-agent1.localdomain /]# tail -f /var/log/icinga2/debug.log | grep restart_service +``` + +#### Use Event Commands to Restart Service Daemon via Command Endpoint on Windows <a id="event-command-restart-service-daemon-command-endpoint-windows"></a> + +This example triggers a restart of the `httpd` service on the remote system +when the `service-windows` service check executed via Command Endpoint fails. +It only triggers if the service state is `Critical` and attempts to restart the +service before a notification is sent. + +Requirements: + +* Icinga 2 as client on the remote node +* Icinga 2 service with permissions to execute Powershell scripts (which is the default) + +Define an [EventCommand](09-object-types.md#objecttype-eventcommand) object `restart_service-windows` +which allows to trigger local service restarts. Put it into a [global zone](06-distributed-monitoring.md#distributed-monitoring-global-zone-config-sync) +to sync its configuration to all clients. + +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/global-templates/eventcommands.conf + +object EventCommand "restart_service-windows" { + command = [ + "C:\\Windows\\SysWOW64\\WindowsPowerShell\\v1.0\\powershell.exe", + PluginDir + "/restart_service.ps1" + ] + + arguments = { + "-ServiceState" = "$service.state$" + "-ServiceStateType" = "$service.state_type$" + "-ServiceAttempt" = "$service.check_attempt$" + "-Service" = "$restart_service$" + "; exit" = { + order = 99 + value = "$$LASTEXITCODE" + } + } + + vars.restart_service = "$service_win_service$" +} +``` + +This event command triggers the following script which restarts the service. +The script only is executed if the service state is `CRITICAL`. Warning and Unknown states +are ignored as they indicate not an immediate failure. + +Add the `restart_service.ps1` Powershell script into `C:\Program Files\Icinga2\sbin`: + +``` +param( + [string]$Service = '', + [string]$ServiceState = '', + [string]$ServiceStateType = '', + [int]$ServiceAttempt = '' + ) + +if (!$Service -Or !$ServiceState -Or !$ServiceStateType -Or !$ServiceAttempt) { + $scriptName = GCI $MyInvocation.PSCommandPath | Select -Expand Name; + Write-Host "USAGE: $scriptName -ServiceState servicestate -ServiceStateType servicestatetype -ServiceAttempt serviceattempt -Service service" -ForegroundColor red; + exit 3; +} + +# Only restart on the third attempt of a critical event +if ($ServiceState -eq "CRITICAL" -And $ServiceStateType -eq "SOFT" -And $ServiceAttempt -eq 3) { + Restart-Service $Service; +} + +exit 0; +``` + +Add a service on the master node which is executed via command endpoint on the client. +Set the `event_command` attribute to `restart_service-windows`, the name of the previously defined +EventCommand object. + +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/icinga2-agent2.localdomain.conf + +object Service "Service httpd" { + check_command = "service-windows" + event_command = "restart_service-windows" + max_check_attempts = 4 + + host_name = "icinga2-agent2.localdomain" + command_endpoint = "icinga2-agent2.localdomain" + + vars.service_win_service = "httpd" +} +``` + +In order to test this configuration just stop the `httpd` on the remote host `icinga2-agent1.localdomain`. + +``` +C:> net stop httpd +``` + +You can enable the [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) and search for the +executed command line in `C:\ProgramData\icinga2\var\log\icinga2\debug.log`. + + +#### Use Event Commands to Restart Service Daemon via SSH <a id="event-command-restart-service-daemon-ssh"></a> + +This example triggers a restart of the `httpd` daemon +via SSH when the `http` service check fails. + +Requirements: + +* SSH connection allowed (firewall, packet filters) +* icinga user with public key authentication +* icinga user with sudo permissions to restart the httpd daemon. + +Example on Debian: + +``` +# ls /home/icinga/.ssh/ +authorized_keys + +# visudo +icinga ALL=(ALL) NOPASSWD: /etc/init.d/apache2 restart +``` + +Define a generic [EventCommand](09-object-types.md#objecttype-eventcommand) object `event_by_ssh` +which can be used for all event commands triggered using SSH: + +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/local_eventcommands.conf + +/* pass event commands through ssh */ +object EventCommand "event_by_ssh" { + command = [ PluginDir + "/check_by_ssh" ] + + arguments = { + "-H" = "$event_by_ssh_address$" + "-p" = "$event_by_ssh_port$" + "-C" = "$event_by_ssh_command$" + "-l" = "$event_by_ssh_logname$" + "-i" = "$event_by_ssh_identity$" + "-q" = { + set_if = "$event_by_ssh_quiet$" + } + "-w" = "$event_by_ssh_warn$" + "-c" = "$event_by_ssh_crit$" + "-t" = "$event_by_ssh_timeout$" + } + + vars.event_by_ssh_address = "$address$" + vars.event_by_ssh_quiet = false +} +``` + +The actual event command only passes the `event_by_ssh_command` attribute. +The `event_by_ssh_service` custom variable takes care of passing the correct +daemon name, while `test $service.state_id$ -gt 0` makes sure that the daemon +is only restarted when the service is not in an `OK` state. + +``` +object EventCommand "event_by_ssh_restart_service" { + import "event_by_ssh" + + //only restart the daemon if state > 0 (not-ok) + //requires sudo permissions for the icinga user + vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && sudo systemctl restart $event_by_ssh_service$" +} +``` + + +Now set the `event_command` attribute to `event_by_ssh_restart_service` and tell it +which service should be restarted using the `event_by_ssh_service` attribute. + +``` +apply Service "http" { + import "generic-service" + check_command = "http" + + event_command = "event_by_ssh_restart_service" + vars.event_by_ssh_service = "$host.vars.httpd_name$" + + //vars.event_by_ssh_logname = "icinga" + //vars.event_by_ssh_identity = "/home/icinga/.ssh/id_rsa.pub" + + assign where host.vars.httpd_name +} +``` + +Specify the `httpd_name` custom variable on the host to assign the +service and set the event handler service. + +``` +object Host "remote-http-host" { + import "generic-host" + address = "192.168.1.100" + + vars.httpd_name = "apache2" +} +``` + +In order to test this configuration just stop the `httpd` on the remote host `icinga2-agent1.localdomain`. + +``` +[root@icinga2-agent1.localdomain /]# systemctl stop httpd +``` + +You can enable the [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) and search for the +executed command line. + +``` +[root@icinga2-agent1.localdomain /]# tail -f /var/log/icinga2/debug.log | grep by_ssh +``` + + +## Dependencies <a id="dependencies"></a> + +Icinga 2 uses host and service [Dependency](09-object-types.md#objecttype-dependency) objects +for determining their network reachability. + +A service can depend on a host, and vice versa. A service has an implicit +dependency (parent) to its host. A host to host dependency acts implicitly +as host parent relation. +When dependencies are calculated, not only the immediate parent is taken into +account but all parents are inherited. + +The `parent_host_name` and `parent_service_name` attributes are mandatory for +service dependencies, `parent_host_name` is required for host dependencies. +[Apply rules](03-monitoring-basics.md#using-apply) will allow you to +[determine these attributes](03-monitoring-basics.md#dependencies-apply-custom-variables) in a more +dynamic fashion if required. + +``` +parent_host_name = "core-router" +parent_service_name = "uplink-port" +``` + +Notifications are suppressed by default if a host or service becomes unreachable. +You can control that option by defining the `disable_notifications` attribute. + +``` +disable_notifications = false +``` + +If the dependency should be triggered in the parent object's soft state, you +need to set `ignore_soft_states` to `false`. + +The dependency state filter must be defined based on the parent object being +either a host (`Up`, `Down`) or a service (`OK`, `Warning`, `Critical`, `Unknown`). + +The following example will make the dependency fail and trigger it if the parent +object is **not** in one of these states: + +``` +states = [ OK, Critical, Unknown ] +``` + +> **In other words** +> +> If the parent service object changes into the `Warning` state, this +> dependency will fail and render all child objects (hosts or services) unreachable. + +You can determine the child's reachability by querying the `last_reachable` attribute +via the [REST API](12-icinga2-api.md#icinga2-api). + +> **Note** +> +> Reachability calculation depends on fresh and processed check results. If dependencies +> disable checks for child objects, this won't work reliably. + +### Implicit Dependencies for Services on Host <a id="dependencies-implicit-host-service"></a> + +Icinga 2 automatically adds an implicit dependency for services on their host. That way +service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency +does not overwrite other dependencies and implicitly sets `disable_notifications = true` and +`states = [ Up ]` for all service objects. + +Service checks are still executed. If you want to prevent them from happening, you can +apply the following dependency to all services setting their host as `parent_host_name` +and disabling the checks. `assign where true` matches on all `Service` objects. + +``` +apply Dependency "disable-host-service-checks" to Service { + disable_checks = true + assign where true +} +``` + +### Dependencies for Network Reachability <a id="dependencies-network-reachability"></a> + +A common scenario is the Icinga 2 server behind a router. Checking internet +access by pinging the Google DNS server `google-dns` is a common method, but +will fail in case the `dsl-router` host is down. Therefore the example below +defines a host dependency which acts implicitly as parent relation too. + +Furthermore the host may be reachable but ping probes are dropped by the +router's firewall. In case the `dsl-router`'s `ping4` service check fails, all +further checks for the `ping4` service on host `google-dns` service should +be suppressed. This is achieved by setting the `disable_checks` attribute to `true`. + +``` +object Host "dsl-router" { + import "generic-host" + address = "192.168.1.1" +} + +object Host "google-dns" { + import "generic-host" + address = "8.8.8.8" +} + +apply Service "ping4" { + import "generic-service" + + check_command = "ping4" + + assign where host.address +} + +apply Dependency "internet" to Host { + parent_host_name = "dsl-router" + disable_checks = true + disable_notifications = true + + assign where host.name != "dsl-router" +} + +apply Dependency "internet" to Service { + parent_host_name = "dsl-router" + parent_service_name = "ping4" + disable_checks = true + + assign where host.name != "dsl-router" +} +``` + +### Redundancy Groups <a id="dependencies-redundancy-groups"></a> + +Sometimes you want dependencies to accumulate, +i.e. to consider the parent reachable only if no dependency is violated. +Sometimes you want them to be regarded as redundant, +i.e. to consider the parent unreachable only if no dependency is fulfilled. +Think of a host connected to both a network and a storage switch vs. a host connected to redundant routers. + +Sometimes you even want a mixture of both. +Think of a service like SSH depeding on both LDAP and DNS to function, +while operating redundant LDAP servers as well as redundant DNS resolvers. + +Before v2.12, Icinga regarded all dependecies as cumulative. +In v2.12 and v2.13, Icinga regarded all dependencies redundant. +The latter led to unrelated services being inadvertantly regarded to be redundant to each other. + +v2.14 restored the former behavior and allowed to override it. +I.e. all dependecies are regarded as essential for the parent by default. +Specifying the `redundancy_group` attribute for two dependecies of a child object with the equal value +causes them to be regarded as redundant (only inside that redundancy group). + +<!-- Keep this for compatibility --> +<a id="dependencies-apply-custom-attrÃbutes"></a> + +### Apply Dependencies based on Custom Variables <a id="dependencies-apply-custom-variables"></a> + +You can use [apply rules](03-monitoring-basics.md#using-apply) to set parent or +child attributes, e.g. `parent_host_name` to other objects' +attributes. + +A common example are virtual machines hosted on a master. The object +name of that master is auto-generated from your CMDB or VMWare inventory +into the host's custom variables (or a generic template for your +cloud). + +Define your master host object: + +``` +/* your master */ +object Host "master.example.com" { + import "generic-host" +} +``` + +Add a generic template defining all common host attributes: + +``` +/* generic template for your virtual machines */ +template Host "generic-vm" { + import "generic-host" +} +``` + +Add a template for all hosts on your example.com cloud setting +custom variable `vm_parent` to `master.example.com`: + +``` +template Host "generic-vm-example.com" { + import "generic-vm" + vars.vm_parent = "master.example.com" +} +``` + +Define your guest hosts: + +``` +object Host "www.example1.com" { + import "generic-vm-master.example.com" +} + +object Host "www.example2.com" { + import "generic-vm-master.example.com" +} +``` + +Apply the host dependency to all child hosts importing the +`generic-vm` template and set the `parent_host_name` +to the previously defined custom variable `host.vars.vm_parent`. + +``` +apply Dependency "vm-host-to-parent-master" to Host { + parent_host_name = host.vars.vm_parent + assign where "generic-vm" in host.templates +} +``` + +You can extend this example, and make your services depend on the +`master.example.com` host too. Their local scope allows you to use +`host.vars.vm_parent` similar to the example above. + +``` +apply Dependency "vm-service-to-parent-master" to Service { + parent_host_name = host.vars.vm_parent + assign where "generic-vm" in host.templates +} +``` + +That way you don't need to wait for your guest hosts becoming +unreachable when the master host goes down. Instead the services +will detect their reachability immediately when executing checks. + +> **Note** +> +> This method with setting locally scoped variables only works in +> apply rules, but not in object definitions. + + +### Dependencies for Agent Checks <a id="dependencies-agent-checks"></a> + +Another good example are agent based checks. You would define a health check +for the agent daemon responding to your requests, and make all other services +querying that daemon depend on that health check. + +``` +apply Service "agent-health" { + check_command = "cluster-zone" + + display_name = "cluster-health-" + host.name + + /* This follows the convention that the agent zone name is the FQDN which is the same as the host object name. */ + vars.cluster_zone = host.name + + assign where host.vars.agent_endpoint +} +``` + +Now, make all other agent based checks dependent on the OK state of the `agent-health` +service. + +``` +apply Dependency "agent-health-check" to Service { + parent_service_name = "agent-health" + + states = [ OK ] // Fail if the parent service state switches to NOT-OK + disable_notifications = true + + assign where host.vars.agent_endpoint // Automatically assigns all agent endpoint checks as child services on the matched host + ignore where service.name == "agent-health" // Avoid a self reference from child to parent +} + +``` + +This is described in detail in [this chapter](06-distributed-monitoring.md#distributed-monitoring-health-checks). |