.. index::
   single: alert; agents

Alert Agents
------------

.. index::
   single: alert; sample agents

Using the Sample Alert Agents
#############################
   
Pacemaker provides several sample alert agents, installed in
``/usr/share/pacemaker/alerts`` by default.
   
While these sample scripts may be copied and used as-is, they are provided
mainly as templates to be edited to suit your purposes. See their source code
for the full set of instance attributes they support.
   
.. topic:: Sending cluster events as SNMP v2c traps

   .. code-block:: xml

      <configuration>
         <alerts>
            <alert id="snmp_alert" path="/path/to/alert_snmp.sh">
               <instance_attributes id="config_for_alert_snmp">
                  <nvpair id="trap_node_states" name="trap_node_states"
                          value="all"/>
               </instance_attributes>
               <meta_attributes id="config_for_timestamp">
                  <nvpair id="ts_fmt" name="timestamp-format"
                          value="%Y-%m-%d,%H:%M:%S.%01N"/>
               </meta_attributes>
               <recipient id="snmp_destination" value="192.168.1.2"/>
            </alert>
         </alerts>
      </configuration>

.. note:: **SNMP alert agent attributes**

   The ``timestamp-format`` meta-attribute should always be set to
   ``%Y-%m-%d,%H:%M:%S.%01N`` when using the SNMP agent, to match the SNMP
   standard.

   The SNMP agent provides a number of instance attributes in addition to the
   one used in the example above. The most useful are ``trap_version``, which
   defaults to ``2c``, and ``trap_community``, which defaults to ``public``.
   See the source code for more details.

.. topic:: Sending cluster events as SNMP v3 traps

   .. code-block:: xml

      <configuration>
         <alerts>
            <alert id="snmp_alert" path="/path/to/alert_snmp.sh">
               <instance_attributes id="config_for_alert_snmp">
                  <nvpair id="trap_node_states" name="trap_node_states"
                          value="all"/>
                  <nvpair id="trap_version" name="trap_version" value="3"/>
                  <nvpair id="trap_community" name="trap_community" value=""/>
                  <nvpair id="trap_options" name="trap_options"
                          value="-l authNoPriv -a MD5 -u testuser -A secret1"/>
               </instance_attributes>
               <meta_attributes id="config_for_timestamp">
                  <nvpair id="ts_fmt" name="timestamp-format"
                          value="%Y-%m-%d,%H:%M:%S.%01N"/>
               </meta_attributes>
               <recipient id="snmp_destination" value="192.168.1.2"/>
            </alert>
         </alerts>
      </configuration>

.. note:: **SNMP v3 trap configuration**

   To use SNMP v3, ``trap_version`` must be set to ``3``. ``trap_community``
   will be ignored.

   The example above uses the ``trap_options`` instance attribute to override
   the security level, authentication protocol, authentication user, and
   authentication password from ``snmp.conf``. These options are passed to the
   ``snmptrap`` command. Passing the password on the command line is considered
   insecure; specify authentication and privacy options suitable for your
   environment.

.. topic:: Sending cluster events as e-mails

   .. code-block:: xml

      <configuration>
         <alerts>
            <alert id="smtp_alert" path="/path/to/alert_smtp.sh">
               <instance_attributes id="config_for_alert_smtp">
                  <nvpair id="email_sender" name="email_sender"
                          value="donotreply@example.com"/>
               </instance_attributes>
               <recipient id="smtp_destination" value="admin@example.com"/>
            </alert>
         </alerts>
      </configuration>


.. index::
   single: alert; agent development

Writing an Alert Agent
######################
   
.. index::
   single: alert; environment variables
   single: environment variable; alert agents

.. table:: **Environment variables passed to alert agents**
   :class: longtable
   :widths: 1 3
   
   +---------------------------+----------------------------------------------------------------+
   | Environment Variable      | Description                                                    |
   +===========================+================================================================+
   | CRM_alert_kind            | .. index::                                                     | 
   |                           |   single:environment variable; CRM_alert_kind                  |
   |                           |   single:CRM_alert_kind                                        |
   |                           |                                                                |
   |                           | The type of alert (``node``, ``fencing``, ``resource``, or     |
   |                           | ``attribute``)                                                 |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_node            | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_node                  |
   |                           |   single:CRM_alert_node                                        |
   |                           |                                                                |
   |                           | Name of affected node                                          |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_node_sequence   | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_node_sequence         |
   |                           |   single:CRM_alert_node_sequence                               |
   |                           |                                                                |
   |                           | A sequence number increased whenever an alert is being issued  |
   |                           | on the local node, which can be used to reference the order in |
   |                           | which alerts have been issued by Pacemaker. An alert for an    |
   |                           | event that happened later in time reliably has a higher        |
   |                           | sequence number than alerts for earlier events.                |
   |                           |                                                                |
   |                           | Be aware that this number has no cluster-wide meaning.         |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_recipient       | .. index::                                                     | 
   |                           |   single:environment variable; CRM_alert_recipient             |
   |                           |   single:CRM_alert_recipient                                   |
   |                           |                                                                |
   |                           | The configured recipient                                       |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_timestamp       | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_timestamp             |
   |                           |   single:CRM_alert_timestamp                                   |
   |                           |                                                                |
   |                           | A timestamp created prior to executing the agent, in the       |
   |                           | format specified by the ``timestamp-format`` meta-attribute.   |
   |                           | This allows the agent to have a reliable, high-precision time  |
   |                           | of when the event occurred, regardless of when the agent       |
   |                           | itself was invoked (which could potentially be delayed due to  |
   |                           | system load, etc.).                                            |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_timestamp_epoch | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_timestamp_epoch       |
   |                           |   single:CRM_alert_timestamp_epoch                             |
   |                           |                                                                |
   |                           | The same time as ``CRM_alert_timestamp``, expressed as the     |
   |                           | integer number of seconds since January 1, 1970. This (along   |
   |                           | with ``CRM_alert_timestamp_usec``) can be useful for alert     |
   |                           | agents that need to format time in a specific way rather than  |
   |                           | let the user configure it.                                     |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_timestamp_usec  | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_timestamp_usec        |
   |                           |   single:CRM_alert_timestamp_usec                              |
   |                           |                                                                |
   |                           | The same time as ``CRM_alert_timestamp``, expressed as the     |
   |                           | integer number of microseconds since                           |
   |                           | ``CRM_alert_timestamp_epoch``.                                 |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_version         | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_version               |
   |                           |   single:CRM_alert_version                                     |
   |                           |                                                                |
   |                           | The version of Pacemaker sending the alert                     |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_desc            | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_desc                  |
   |                           |   single:CRM_alert_desc                                        |
   |                           |                                                                |
   |                           | Detail about event. For ``node`` alerts, this is the node's    |
   |                           | current state (``member`` or ``lost``). For ``fencing``        |
   |                           | alerts, this is a summary of the requested fencing operation,  |
   |                           | including origin, target, and fencing operation error code, if |
   |                           | any. For ``resource`` alerts, this is a readable string        |
   |                           | equivalent of ``CRM_alert_status``.                            |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_nodeid          | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_nodeid                |
   |                           |   single:CRM_alert_nodeid                                      |
   |                           |                                                                |
   |                           | ID of node whose status changed (provided with ``node`` alerts |
   |                           | only)                                                          |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_rc              | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_rc                    |
   |                           |   single:CRM_alert_rc                                          |
   |                           |                                                                |
   |                           | The numerical return code of the fencing or resource operation |
   |                           | (provided with ``fencing`` and ``resource`` alerts only)       |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_task            | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_task                  |
   |                           |   single:CRM_alert_task                                        |
   |                           |                                                                |
   |                           | The requested fencing or resource operation (provided with     |
   |                           | ``fencing`` and ``resource`` alerts only)                      |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_exec_time       | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_exec_time             |
   |                           |   single:CRM_alert_exec_time                                   |
   |                           |                                                                |
   |                           | The (wall-clock) time, in milliseconds, that it took to        |
   |                           | execute the action. If the action timed out,                   |
   |                           | ``CRM_alert_status`` will be 2, ``CRM_alert_desc`` will be     |
   |                           | "Timed Out", and this value will be the action timeout. May    |
   |                           | not be supported on all platforms. (``resource`` alerts only)  |
   |                           | *(since 2.0.1)*                                                |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_interval        | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_interval              |
   |                           |   single:CRM_alert_interval                                    |
   |                           |                                                                |
   |                           | The interval of the resource operation (``resource`` alerts    |
   |                           | only)                                                          |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_rsc             | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_rsc                   |
   |                           |   single:CRM_alert_rsc                                         |
   |                           |                                                                |
   |                           | The name of the affected resource (``resource`` alerts only)   |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_status          | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_status                |
   |                           |   single:CRM_alert_status                                      |
   |                           |                                                                |
   |                           | A numerical code used by Pacemaker to represent the operation  |
   |                           | result (``resource`` alerts only)                              |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_target_rc       | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_target_rc             |
   |                           |   single:CRM_alert_target_rc                                   |
   |                           |                                                                |
   |                           | The expected numerical return code of the operation            |
   |                           | (``resource`` alerts only)                                     |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_attribute_name  | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_attribute_name        |
   |                           |   single:CRM_alert_attribute_name                              |
   |                           |                                                                |
   |                           | The name of the node attribute that changed (``attribute``     |
   |                           | alerts only)                                                   |
   +---------------------------+----------------------------------------------------------------+
   | CRM_alert_attribute_value | .. index::                                                     |
   |                           |   single:environment variable; CRM_alert_attribute_value       |
   |                           |   single:CRM_alert_attribute_value                             |
   |                           |                                                                |
   |                           | The new value of the node attribute that changed               |
   |                           | (``attribute`` alerts only)                                    |
   +---------------------------+----------------------------------------------------------------+
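
As a rough illustration of how an agent might consume these variables, the
following Python sketch builds a one-line summary of an alert from its
environment. Only the ``CRM_alert_*`` variable names come from the table above;
the function names ``event_time`` and ``describe_alert`` and the message layout
are hypothetical, and a real agent would deliver the message somewhere rather
than print it.

.. code-block:: python

   import os
   from datetime import datetime, timedelta, timezone


   def event_time(env):
       """Combine the epoch seconds and microseconds into an aware datetime."""
       seconds = int(env.get("CRM_alert_timestamp_epoch", "0"))
       usec = int(env.get("CRM_alert_timestamp_usec", "0"))
       base = datetime.fromtimestamp(seconds, tz=timezone.utc)
       return base + timedelta(microseconds=usec)


   def describe_alert(env):
       """Return a one-line summary of the alert (illustrative layout only)."""
       kind = env.get("CRM_alert_kind", "")
       node = env.get("CRM_alert_node", "")
       desc = env.get("CRM_alert_desc", "")
       when = event_time(env).strftime("%Y-%m-%d %H:%M:%S.%f")
       if kind == "node":
           detail = f"node {node} is now {desc}"
       elif kind == "fencing":
           detail = f"fencing: {desc}"
       elif kind == "resource":
           detail = (f"{env.get('CRM_alert_task', '')} of "
                     f"{env.get('CRM_alert_rsc', '')} on {node}: {desc} "
                     f"(rc={env.get('CRM_alert_rc', '')})")
       elif kind == "attribute":
           detail = (f"{env.get('CRM_alert_attribute_name', '')}="
                     f"{env.get('CRM_alert_attribute_value', '')} on {node}")
       else:
           detail = f"unrecognized alert kind {kind!r}"
       return f"{when} {detail}"


   if __name__ == "__main__":
       # Pacemaker passes the alert details in the process environment.
       print(describe_alert(os.environ))

Building the time from ``CRM_alert_timestamp_epoch`` and
``CRM_alert_timestamp_usec`` keeps the output format under the agent's control,
independent of the user-configured ``timestamp-format``.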
   
Special concerns when writing alert agents:
   
* Alert agents may be called with no recipient (if none is configured),
  so the agent must be able to handle this situation, even if it
  only exits in that case. (Users may modify the configuration in
  stages, and add a recipient later.)
   
* If more than one recipient is configured for an alert, the alert agent will
  be called once per recipient. If an agent is not able to run concurrently, it
  should be configured with only a single recipient. The agent is free,
  however, to interpret the recipient as a list.
   
* When a cluster event occurs, all alerts are fired off at the same time as
  separate processes. Depending on how many alerts and recipients are
  configured, and on what the alert agents do, a significant load burst may
  occur. An agent can be written to take this into consideration, for example
  by queueing resource-intensive actions for some other process or service to
  handle, instead of executing them directly.
   
* Alert agents are run as the ``hacluster`` user, which has a minimal set
  of permissions. If an agent requires additional privileges, it is
  recommended to configure ``sudo`` to allow the agent to run the necessary
  commands as another user with the appropriate privileges.
   
* As always, take care to validate and sanitize user-configured parameters,
  such as ``CRM_alert_timestamp`` (whose content is specified by the
  user-configured ``timestamp-format``), ``CRM_alert_recipient``, and all
  instance attributes. Mostly this is needed simply to protect against
  configuration errors, but if some user can modify the CIB without having
  ``hacluster``-level access to the cluster nodes, it is also a potential
  security concern, since unsanitized values could allow code injection.
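
Several of the concerns above (a missing recipient, concurrent invocations,
load bursts, and untrusted values) might be handled along the following lines.
This is only a sketch under stated assumptions: the recipient is assumed to be
an e-mail address, the names ``RECIPIENT_RE`` and ``queue_alert`` and the spool
directory are hypothetical, and a separate worker process (not shown) is
assumed to pick up the spooled files. The spool directory would need to be
writable by the ``hacluster`` user.

.. code-block:: python

   import json
   import os
   import re
   import tempfile

   # Hypothetical policy: accept only simple e-mail-like recipients.
   RECIPIENT_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+")


   def queue_alert(env, spool_dir):
       """Validate the alert and spool it for a separate worker.

       Returns the path of the spooled file, or None if no recipient is set.
       """
       recipient = env.get("CRM_alert_recipient", "")
       if not recipient:
           # No recipient configured (yet): do nothing rather than fail.
           return None
       if not RECIPIENT_RE.fullmatch(recipient):
           raise ValueError(f"refusing suspicious recipient: {recipient!r}")

       # Keep only the alert variables; they are stored as data and are
       # never interpolated into a shell command here.
       payload = {k: v for k, v in env.items() if k.startswith("CRM_alert_")}

       # Write under a unique temporary name, then rename, so that agents
       # running concurrently do not collide and the worker never sees a
       # half-written file.
       fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
       with os.fdopen(fd, "w") as handle:
           json.dump(payload, handle)
       final_path = tmp_path[:-len(".tmp")] + ".json"
       os.rename(tmp_path, final_path)
       return final_path

Spooling keeps the agent itself cheap: the expensive work (sending mail,
calling a remote API) happens later, at a pace the worker chooses.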
   
.. note:: **ocf:pacemaker:ClusterMon compatibility**

   The alerts interface is designed to be backward compatible with the external
   scripts interface used by the ``ocf:pacemaker:ClusterMon`` resource, which
   is now deprecated. To preserve this compatibility, each environment variable
   passed to alert agents is available with the prefix ``CRM_notify_`` as well
   as ``CRM_alert_``. One break in compatibility is that ``ClusterMon`` ran
   external scripts as the ``root`` user, while alert agents are run as the
   ``hacluster`` user.
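
A script that must work both as an alert agent and with older ``ClusterMon``
setups could look up each value under either prefix. The helper below is a
minimal sketch; the name ``alert_value`` is hypothetical and not part of
Pacemaker.

.. code-block:: python

   import os


   def alert_value(name, env=None, default=""):
       """Look up an alert variable by its bare name, such as "kind".

       Prefers the CRM_alert_ prefix and falls back to CRM_notify_.
       """
       env = os.environ if env is None else env
       for prefix in ("CRM_alert_", "CRM_notify_"):
           value = env.get(prefix + name)
           if value is not None:
               return value
       return default

For example, ``alert_value("node")`` returns the value of ``CRM_alert_node``
if it is set, and otherwise that of ``CRM_notify_node``.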