summaryrefslogtreecommitdiffstats
path: root/doc/sphinx/Pacemaker_Administration/alerts.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/sphinx/Pacemaker_Administration/alerts.rst')
-rw-r--r--doc/sphinx/Pacemaker_Administration/alerts.rst311
1 files changed, 311 insertions, 0 deletions
diff --git a/doc/sphinx/Pacemaker_Administration/alerts.rst b/doc/sphinx/Pacemaker_Administration/alerts.rst
new file mode 100644
index 0000000..c0f54c6
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/alerts.rst
@@ -0,0 +1,311 @@
+.. index::
+ single: alert; agents
+
+Alert Agents
+------------
+
+.. index::
+ single: alert; sample agents
+
+Using the Sample Alert Agents
+#############################
+
+Pacemaker provides several sample alert agents, installed in
+``/usr/share/pacemaker/alerts`` by default.
+
+While these sample scripts may be copied and used as-is, they are provided
+mainly as templates to be edited to suit your purposes. See their source code
+for the full set of instance attributes they support.
+
+.. topic:: Sending cluster events as SNMP v2c traps
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="snmp_alert" path="/path/to/alert_snmp.sh">
+ <instance_attributes id="config_for_alert_snmp">
+ <nvpair id="trap_node_states" name="trap_node_states"
+ value="all"/>
+ </instance_attributes>
+ <meta_attributes id="config_for_timestamp">
+ <nvpair id="ts_fmt" name="timestamp-format"
+ value="%Y-%m-%d,%H:%M:%S.%01N"/>
+ </meta_attributes>
+ <recipient id="snmp_destination" value="192.168.1.2"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+.. note:: **SNMP alert agent attributes**
+
+ The ``timestamp-format`` meta-attribute should always be set to
+ ``%Y-%m-%d,%H:%M:%S.%01N`` when using the SNMP agent, to match the SNMP
+ standard.
+
+ The SNMP agent provides a number of instance attributes in addition to the
+ one used in the example above. The most useful are ``trap_version``, which
+ defaults to ``2c``, and ``trap_community``, which defaults to ``public``.
+ See the source code for more details.
+
+.. topic:: Sending cluster events as SNMP v3 traps
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="snmp_alert" path="/path/to/alert_snmp.sh">
+ <instance_attributes id="config_for_alert_snmp">
+ <nvpair id="trap_node_states" name="trap_node_states"
+ value="all"/>
+ <nvpair id="trap_version" name="trap_version" value="3"/>
+ <nvpair id="trap_community" name="trap_community" value=""/>
+ <nvpair id="trap_options" name="trap_options"
+ value="-l authNoPriv -a MD5 -u testuser -A secret1"/>
+ </instance_attributes>
+ <meta_attributes id="config_for_timestamp">
+ <nvpair id="ts_fmt" name="timestamp-format"
+ value="%Y-%m-%d,%H:%M:%S.%01N"/>
+ </meta_attributes>
+ <recipient id="snmp_destination" value="192.168.1.2"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+.. note:: **SNMP v3 trap configuration**
+
+ To use SNMP v3, ``trap_version`` must be set to ``3``. ``trap_community``
+ will be ignored.
+
+ The example above uses the ``trap_options`` instance attribute to override
+ the security level, authentication protocol, authentication user, and
+ authentication password from snmp.conf. These will be passed to the snmptrap
+ command. Passing the password on the command line is considered insecure;
+ specify authentication and privacy options suitable for your environment.
+
+.. topic:: Sending cluster events as e-mails
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="smtp_alert" path="/path/to/alert_smtp.sh">
+ <instance_attributes id="config_for_alert_smtp">
+ <nvpair id="email_sender" name="email_sender"
+ value="donotreply@example.com"/>
+ </instance_attributes>
+ <recipient id="smtp_destination" value="admin@example.com"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+
+.. index::
+ single: alert; agent development
+
+Writing an Alert Agent
+######################
+
+.. index::
+ single: alert; environment variables
+ single: environment variable; alert agents
+
+.. table:: **Environment variables passed to alert agents**
+ :class: longtable
+ :widths: 1 3
+
+ +---------------------------+----------------------------------------------------------------+
+ | Environment Variable | Description |
+ +===========================+================================================================+
+ | CRM_alert_kind | .. index:: |
+ | | single:environment variable; CRM_alert_kind |
+ | | single:CRM_alert_kind |
+ | | |
+ | | The type of alert (``node``, ``fencing``, ``resource``, or |
+ | | ``attribute``) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_node | .. index:: |
+ | | single:environment variable; CRM_alert_node |
+ | | single:CRM_alert_node |
+ | | |
+ | | Name of affected node |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_node_sequence | .. index:: |
+ | | single:environment variable; CRM_alert_sequence |
+ | | single:CRM_alert_sequence |
+ | | |
+ | | A sequence number increased whenever an alert is being issued |
+ | | on the local node, which can be used to reference the order in |
+ | | which alerts have been issued by Pacemaker. An alert for an |
+ | | event that happened later in time reliably has a higher |
+ | | sequence number than alerts for earlier events. |
+ | | |
+ | | Be aware that this number has no cluster-wide meaning. |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_recipient | .. index:: |
+ | | single:environment variable; CRM_alert_recipient |
+ | | single:CRM_alert_recipient |
+ | | |
+ | | The configured recipient |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_timestamp | .. index:: |
+ | | single:environment variable; CRM_alert_timestamp |
+ | | single:CRM_alert_timestamp |
+ | | |
+ | | A timestamp created prior to executing the agent, in the |
+ | | format specified by the ``timestamp-format`` meta-attribute. |
+ | | This allows the agent to have a reliable, high-precision time |
+ | | of when the event occurred, regardless of when the agent |
+ | | itself was invoked (which could potentially be delayed due to |
+ | | system load, etc.). |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_timestamp_epoch | .. index:: |
+ | | single:environment variable; CRM_alert_timestamp_epoch |
+ | | single:CRM_alert_timestamp_epoch |
+ | | |
+ | | The same time as ``CRM_alert_timestamp``, expressed as the |
+ | | integer number of seconds since January 1, 1970. This (along |
+ | | with ``CRM_alert_timestamp_usec``) can be useful for alert |
+ | | agents that need to format time in a specific way rather than |
+ | | let the user configure it. |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_timestamp_usec | .. index:: |
+ | | single:environment variable; CRM_alert_timestamp_usec |
+ | | single:CRM_alert_timestamp_usec |
+ | | |
+ | | The same time as ``CRM_alert_timestamp``, expressed as the |
+ | | integer number of microseconds since |
+ | | ``CRM_alert_timestamp_epoch``. |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_version | .. index:: |
+ | | single:environment variable; CRM_alert_version |
+ | | single:CRM_alert_version |
+ | | |
+ | | The version of Pacemaker sending the alert |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_desc | .. index:: |
+ | | single:environment variable; CRM_alert_desc |
+ | | single:CRM_alert_desc |
+ | | |
+ | | Detail about event. For ``node`` alerts, this is the node's |
+ | | current state (``member`` or ``lost``). For ``fencing`` |
+ | | alerts, this is a summary of the requested fencing operation, |
+ | | including origin, target, and fencing operation error code, if |
+ | | any. For ``resource`` alerts, this is a readable string |
+ | | equivalent of ``CRM_alert_status``. |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_nodeid | .. index:: |
+ | | single:environment variable; CRM_alert_nodeid |
+ | | single:CRM_alert_nodeid |
+ | | |
+ | | ID of node whose status changed (provided with ``node`` alerts |
+ | | only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_rc | .. index:: |
+ | | single:environment variable; CRM_alert_rc |
+ | | single:CRM_alert_rc |
+ | | |
+ | | The numerical return code of the fencing or resource operation |
+ | | (provided with ``fencing`` and ``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_task | .. index:: |
+ | | single:environment variable; CRM_alert_task |
+ | | single:CRM_alert_task |
+ | | |
+ | | The requested fencing or resource operation (provided with |
+ | | ``fencing`` and ``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_exec_time | .. index:: |
+ | | single:environment variable; CRM_alert_exec_time |
+ | | single:CRM_alert_exec_time |
+ | | |
+ | | The (wall-clock) time, in milliseconds, that it took to |
+ | | execute the action. If the action timed out, |
+ | | ``CRM_alert_status`` will be 2, ``CRM_alert_desc`` will be |
+ | | "Timed Out", and this value will be the action timeout. May |
+ | | not be supported on all platforms. (``resource`` alerts only) |
+ | | *(since 2.0.1)* |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_interval | .. index:: |
+ | | single:environment variable; CRM_alert_interval |
+ | | single:CRM_alert_interval |
+ | | |
+ | | The interval of the resource operation (``resource`` alerts |
+ | | only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_rsc | .. index:: |
+ | | single:environment variable; CRM_alert_rsc |
+ | | single:CRM_alert_rsc |
+ | | |
+ | | The name of the affected resource (``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_status | .. index:: |
+ | | single:environment variable; CRM_alert_status |
+ | | single:CRM_alert_status |
+ | | |
+ | | A numerical code used by Pacemaker to represent the operation |
+ | | result (``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_target_rc | .. index:: |
+ | | single:environment variable; CRM_alert_target_rc |
+ | | single:CRM_alert_target_rc |
+ | | |
+ | | The expected numerical return code of the operation |
+ | | (``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_attribute_name | .. index:: |
+ | | single:environment variable; CRM_alert_attribute_name |
+ | | single:CRM_alert_attribute_name |
+ | | |
+ | | The name of the node attribute that changed (``attribute`` |
+ | | alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_attribute_value | .. index:: |
+ | | single:environment variable; CRM_alert_attribute_value |
+ | | single:CRM_alert_attribute_value |
+ | | |
+ | | The new value of the node attribute that changed |
+ | | (``attribute`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+
+Special concerns when writing alert agents:
+
+* Alert agents may be called with no recipient (if none is configured),
+ so the agent must be able to handle this situation, even if it
+ only exits in that case. (Users may modify the configuration in
+ stages, and add a recipient later.)
+
+* If more than one recipient is configured for an alert, the alert agent will
+ be called once per recipient. If an agent is not able to run concurrently, it
+ should be configured with only a single recipient. The agent is free,
+ however, to interpret the recipient as a list.
+
+* When a cluster event occurs, all alerts are fired off at the same time as
+ separate processes. Depending on how many alerts and recipients are
+ configured, and on what is done within the alert agents,
+ a significant load burst may occur. The agent could be written to take
+ this into consideration, for example by queueing resource-intensive actions
+ into some other instance, instead of directly executing them.
+
+* Alert agents are run as the ``hacluster`` user, which has a minimal set
+ of permissions. If an agent requires additional privileges, it is
+ recommended to configure ``sudo`` to allow the agent to run the necessary
+ commands as another user with the appropriate privileges.
+
+* As always, take care to validate and sanitize user-configured parameters,
+ such as ``CRM_alert_timestamp`` (whose content is specified by the
+ user-configured ``timestamp-format``), ``CRM_alert_recipient,`` and all
+ instance attributes. Mostly this is needed simply to protect against
+ configuration errors, but if some user can modify the CIB without having
+ ``hacluster``-level access to the cluster nodes, it is a potential security
+ concern as well, to avoid the possibility of code injection.
+
+.. note:: **ocf:pacemaker:ClusterMon compatibility**
+
+ The alerts interface is designed to be backward compatible with the external
+ scripts interface used by the ``ocf:pacemaker:ClusterMon`` resource, which
+ is now deprecated. To preserve this compatibility, the environment variables
+ passed to alert agents are available prepended with ``CRM_notify_``
+ as well as ``CRM_alert_``. One break in compatibility is that ``ClusterMon``
+ ran external scripts as the ``root`` user, while alert agents are run as the
+ ``hacluster`` user.