summaryrefslogtreecommitdiffstats
path: root/doc/sphinx/Pacemaker_Administration/agents.rst
blob: e5b17e273f3a836309a5ce2a4f4ca9f9406a6a90 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
.. index::
   single: resource agent

Resource Agents
---------------


Action Completion
#################

If one resource depends on another resource via constraints, the cluster will
interpret an expected result as sufficient to continue with dependent actions.
This may cause timing issues if the resource agent start returns before the
service is not only launched but fully ready to perform its function, or if the
resource agent stop returns before the service has fully released all its
claims on system resources. At a minimum, the start or stop should not return
before a status command would return the expected (started or stopped) result.


.. index::
   single: OCF resource agent
   single: resource agent; OCF

OCF Resource Agents
###################

.. index::
   single: OCF resource agent; location

Location of Custom Scripts
__________________________

OCF Resource Agents are found in ``/usr/lib/ocf/resource.d/$PROVIDER``

When creating your own agents, you are encouraged to create a new directory
under ``/usr/lib/ocf/resource.d/`` so that they are not confused with (or
overwritten by) the agents shipped by existing providers.

So, for example, if you choose the provider name of big-corp and want a new
resource named big-app, you would create a resource agent called
``/usr/lib/ocf/resource.d/big-corp/big-app`` and define a resource:
 
.. code-block: xml

   <primitive id="custom-app" class="ocf" provider="big-corp" type="big-app"/>


.. index::
   single: OCF resource agent; action

Actions
_______

All OCF resource agents are required to implement the following actions.

.. table:: **Required Actions for OCF Agents**

   +--------------+-------------+------------------------------------------------+
   | Action       | Description | Instructions                                   |
   +==============+=============+================================================+
   | start        | Start the   | .. index::                                     |
   |              | resource    |    single: OCF resource agent; start           |
   |              |             |    single: start action                        |
   |              |             |                                                |
   |              |             | Return 0 on success and an appropriate         |
   |              |             | error code otherwise. Must not report          |
   |              |             | success until the resource is fully            |
   |              |             | active.                                        |
   +--------------+-------------+------------------------------------------------+
   | stop         | Stop the    | .. index::                                     |
   |              | resource    |    single: OCF resource agent; stop            |
   |              |             |    single: stop action                         |
   |              |             |                                                |
   |              |             | Return 0 on success and an appropriate         |
   |              |             | error code otherwise. Must not report          |
   |              |             | success until the resource is fully            |
   |              |             | stopped.                                       |
   +--------------+-------------+------------------------------------------------+
   | monitor      | Check the   | .. index::                                     |
   |              | resource's  |    single: OCF resource agent; monitor         |
   |              | state       |    single: monitor action                      |
   |              |             |                                                |
   |              |             | Exit 0 if the resource is running, 7           |
   |              |             | if it is stopped, and any other OCF            |
   |              |             | exit code if it is failed. NOTE: The           |
   |              |             | monitor script should test the state           |
   |              |             | of the resource on the local machine           |
   |              |             | only.                                          |
   +--------------+-------------+------------------------------------------------+
   | meta-data    | Describe    | .. index::                                     |
   |              | the         |    single: OCF resource agent; meta-data       |
   |              | resource    |    single: meta-data action                    |
   |              |             |                                                |
   |              |             | Provide information about this                 |
   |              |             | resource in the XML format defined by          |
   |              |             | the OCF standard. Exit with 0. NOTE:           |
   |              |             | This is *not* required to be performed         |
   |              |             | as root.                                       |
   +--------------+-------------+------------------------------------------------+

OCF resource agents may optionally implement additional actions. Some are used
only with advanced resource types such as clones.

.. table:: **Optional Actions for OCF Resource Agents**

   +--------------+-------------+------------------------------------------------+
   | Action       | Description | Instructions                                   |
   +==============+=============+================================================+
   | validate-all | This should | .. index::                                     |
   |              | validate    |    single: OCF resource agent; validate-all    |
   |              | the         |    single: validate-all action                 |
   |              | instance    |                                                |
   |              | parameters  | Return 0 if parameters are valid, 2 if         |
   |              | provided.   | not valid, and 6 if resource is not            |
   |              |             | configured.                                    |
   +--------------+-------------+------------------------------------------------+
   | promote      | Bring the   | .. index::                                     |
   |              | local       |    single: OCF resource agent; promote         |
   |              | instance of |    single: promote action                      |
   |              | a promotable|                                                |
   |              | clone       | Return 0 on success                            |
   |              | resource to |                                                |
   |              | the promoted|                                                |
   |              | role.       |                                                |
   +--------------+-------------+------------------------------------------------+
   | demote       | Bring the   | .. index::                                     |
   |              | local       |    single: OCF resource agent; demote          |
   |              | instance of |    single: demote action                       |
   |              | a promotable|                                                |
   |              | clone       | Return 0 on success                            |
   |              | resource to |                                                |
   |              | the         |                                                |
   |              | unpromoted  |                                                |
   |              | role.       |                                                |
   +--------------+-------------+------------------------------------------------+
   | notify       | Used by the | .. index::                                     |
   |              | cluster to  |    single: OCF resource agent; notify          |
   |              | send        |    single: notify action                       |
   |              | the agent   |                                                |
   |              | pre- and    | Must not fail. Must exit with 0                |
   |              | post-       |                                                |
   |              | notification|                                                |
   |              | events      |                                                |
   |              | telling the |                                                |
   |              | resource    |                                                |
   |              | what has    |                                                |
   |              | happened and|                                                |
   |              | will happen.|                                                |
   +--------------+-------------+------------------------------------------------+
   | reload       | Reload the  | .. index::                                     |
   |              | service's   |    single: OCF resource agent; reload          |
   |              | own         |    single: reload action                       |
   |              | config.     |                                                |
   |              |             | Not used by Pacemaker                          |
   +--------------+-------------+------------------------------------------------+
   | reload-agent | Make        | .. index::                                     |
   |              | effective   |    single: OCF resource agent; reload-agent    |
   |              | any changes |    single: reload-agent action                 |
   |              | in instance |                                                |
   |              | parameters  | This is used when the agent can handle a       |
   |              | marked as   | change in some of its parameters more          |
   |              | reloadable  | efficiently than stopping and starting the     |
   |              | in the      | resource.                                      |
   |              | agent's     |                                                |
   |              | meta-data.  |                                                |
   +--------------+-------------+------------------------------------------------+
   | recover      | Restart the | .. index::                                     |
   |              | service.    |    single: OCF resource agent; recover         |
   |              |             |    single: recover action                      |
   |              |             |                                                |
   |              |             | Not used by Pacemaker                          |
   +--------------+-------------+------------------------------------------------+

.. important::

   If you create a new OCF resource agent, use `ocf-tester` to verify that the
   agent complies with the OCF standard properly.


.. index::
   single: OCF resource agent; return code

How are OCF Return Codes Interpreted?
_____________________________________

The first thing the cluster does is to check the return code against
the expected result.  If the result does not match the expected value,
then the operation is considered to have failed, and recovery action is
initiated.

There are three types of failure recovery:

.. table:: **Types of recovery performed by the cluster**

   +-------+--------------------------------------------+--------------------------------------+
   | Type  | Description                                | Action Taken by the Cluster          |
   +=======+============================================+======================================+
   | soft  | .. index::                                 | Restart the resource or move it to a |
   |       |    single: OCF resource agent; soft error  | new location                         |
   |       |                                            |                                      |
   |       | A transient error occurred                 |                                      |
   +-------+--------------------------------------------+--------------------------------------+
   | hard  | .. index::                                 | Move the resource elsewhere and      |
   |       |    single: OCF resource agent; hard error  | prevent it from being retried on the |
   |       |                                            | current node                         |
   |       | A non-transient error that                 |                                      |
   |       | may be specific to the                     |                                      |
   |       | current node                               |                                      |
   +-------+--------------------------------------------+--------------------------------------+
   | fatal | .. index::                                 | Stop the resource and prevent it     |
   |       |    single: OCF resource agent; fatal error | from being started on any cluster    |
   |       |                                            | node                                 |
   |       | A non-transient error that                 |                                      |
   |       | will be common to all                      |                                      |
   |       | cluster nodes (e.g. a bad                  |                                      |
   |       | configuration was specified)               |                                      |
   +-------+--------------------------------------------+--------------------------------------+

.. _ocf_return_codes:

OCF Return Codes
________________

The following table outlines the different OCF return codes and the type of
recovery the cluster will initiate when a failure code is received. Although
counterintuitive, even actions that return 0 (aka. ``OCF_SUCCESS``) can be
considered to have failed, if 0 was not the expected return value.

.. table:: **OCF Exit Codes and their Recovery Types**

   +-------+-----------------------+---------------------------------------------------+----------+
   | Exit  | OCF Alias             | Description                                       | Recovery |
   | Code  |                       |                                                   |          |
   +=======+=======================+===================================================+==========+
   | 0     | OCF_SUCCESS           | .. index::                                        | soft     |
   |       |                       |    single: OCF_SUCCESS                            |          |
   |       |                       |    single: OCF return code; OCF_SUCCESS           |          |
   |       |                       |    pair: OCF return code; 0                       |          |
   |       |                       |                                                   |          |
   |       |                       | Success. The command completed successfully.      |          |
   |       |                       | This is the expected result for all start,        |          |
   |       |                       | stop, promote and demote commands.                |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 1     | OCF_ERR_GENERIC       | .. index::                                        | soft     |
   |       |                       |    single: OCF_ERR_GENERIC                        |          |
   |       |                       |    single: OCF return code; OCF_ERR_GENERIC       |          |
   |       |                       |    pair: OCF return code; 1                       |          |
   |       |                       |                                                   |          |
   |       |                       | Generic "there was a problem" error code.         |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 2     | OCF_ERR_ARGS          | .. index::                                        | hard     |
   |       |                       |     single: OCF_ERR_ARGS                          |          |
   |       |                       |     single: OCF return code; OCF_ERR_ARGS         |          |
   |       |                       |     pair: OCF return code; 2                      |          |
   |       |                       |                                                   |          |
   |       |                       | The resource's parameter values are not valid on  |          |
   |       |                       | this machine (for example, a value refers to a    |          |
   |       |                       | file not found on the local host).                |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 3     | OCF_ERR_UNIMPLEMENTED | .. index::                                        | hard     |
   |       |                       |    single: OCF_ERR_UNIMPLEMENTED                  |          |
   |       |                       |    single: OCF return code; OCF_ERR_UNIMPLEMENTED |          |
   |       |                       |    pair: OCF return code; 3                       |          |
   |       |                       |                                                   |          |
   |       |                       | The requested action is not implemented.          |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 4     | OCF_ERR_PERM          | .. index::                                        | hard     |
   |       |                       |    single: OCF_ERR_PERM                           |          |
   |       |                       |    single: OCF return code; OCF_ERR_PERM          |          |
   |       |                       |    pair: OCF return code; 4                       |          |
   |       |                       |                                                   |          |
   |       |                       | The resource agent does not have                  |          |
   |       |                       | sufficient privileges to complete the task.       |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 5     | OCF_ERR_INSTALLED     | .. index::                                        | hard     |
   |       |                       |    single: OCF_ERR_INSTALLED                      |          |
   |       |                       |    single: OCF return code; OCF_ERR_INSTALLED     |          |
   |       |                       |    pair: OCF return code; 5                       |          |
   |       |                       |                                                   |          |
   |       |                       | The tools required by the resource are            |          |
   |       |                       | not installed on this machine.                    |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 6     | OCF_ERR_CONFIGURED    | .. index::                                        | fatal    |
   |       |                       |    single: OCF_ERR_CONFIGURED                     |          |
   |       |                       |    single: OCF return code; OCF_ERR_CONFIGURED    |          |
   |       |                       |    pair: OCF return code; 6                       |          |
   |       |                       |                                                   |          |
   |       |                       | The resource's parameter values are inherently    |          |
   |       |                       | invalid (for example, a required parameter was    |          |
   |       |                       | not given).                                       |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 7     | OCF_NOT_RUNNING       | .. index::                                        | N/A      |
   |       |                       |    single: OCF_NOT_RUNNING                        |          |
   |       |                       |    single: OCF return code; OCF_NOT_RUNNING       |          |
   |       |                       |    pair: OCF return code; 7                       |          |
   |       |                       |                                                   |          |
   |       |                       | The resource is safely stopped. This should only  |          |
   |       |                       | be returned by monitor actions, not stop actions. |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 8     | OCF_RUNNING_PROMOTED  | .. index::                                        | soft     |
   |       |                       |    single: OCF_RUNNING_PROMOTED                   |          |
   |       |                       |    single: OCF return code; OCF_RUNNING_PROMOTED  |          |
   |       |                       |    pair: OCF return code; 8                       |          |
   |       |                       |                                                   |          |
   |       |                       | The resource is running in the promoted role.     |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 9     | OCF_FAILED_PROMOTED   | .. index::                                        | soft     |
   |       |                       |    single: OCF_FAILED_PROMOTED                    |          |
   |       |                       |    single: OCF return code; OCF_FAILED_PROMOTED   |          |
   |       |                       |    pair: OCF return code; 9                       |          |
   |       |                       |                                                   |          |
   |       |                       | The resource is (or might be) in the promoted     |          |
   |       |                       | role but has failed. The resource will be         |          |
   |       |                       | demoted, stopped and then started (and possibly   |          |
   |       |                       | promoted) again.                                  |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 190   | OCF_DEGRADED          | .. index::                                        | none     |
   |       |                       |    single: OCF_DEGRADED                           |          |
   |       |                       |    single: OCF return code; OCF_DEGRADED          |          |
   |       |                       |    pair: OCF return code; 190                     |          |
   |       |                       |                                                   |          |
   |       |                       | The resource is properly active, but in such a    |          |
   |       |                       | condition that future failures are more likely.   |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | 191   | OCF_DEGRADED_PROMOTED | .. index::                                        | none     |
   |       |                       |    single: OCF_DEGRADED_PROMOTED                  |          |
   |       |                       |    single: OCF return code; OCF_DEGRADED_PROMOTED |          |
   |       |                       |    pair: OCF return code; 191                     |          |
   |       |                       |                                                   |          |
   |       |                       | The resource is properly active in the promoted   |          |
   |       |                       | role, but in such a condition that future         |          |
   |       |                       | failures are more likely.                         |          |
   +-------+-----------------------+---------------------------------------------------+----------+
   | other | *none*                | Custom error code.                                | soft     |
   +-------+-----------------------+---------------------------------------------------+----------+

Exceptions to the recovery handling described above:

* Probes (non-recurring monitor actions) that find a resource active
  (or in the promoted role) will not result in recovery action unless it is
  also found active elsewhere.
* The recovery action taken when a resource is found active more than
  once is determined by the resource's ``multiple-active`` property.
* Recurring actions that return ``OCF_ERR_UNIMPLEMENTED``
  do not cause any type of recovery.
* Actions that return one of the "degraded" codes will be treated the same as
  if they had returned success, but status output will indicate that the
  resource is degraded.


.. index::
   single: resource agent; LSB
   single: LSB resource agent
   single: init script

LSB Resource Agents (Init Scripts)
##################################

LSB Compliance
______________

The relevant part of the
`LSB specifications <http://refspecs.linuxfoundation.org/lsb.shtml>`_
includes a description of all the return codes listed here.
    
Assuming `some_service` is configured correctly and currently
inactive, the following sequence will help you determine if it is
LSB-compatible:

#. Start (stopped):
 
   .. code-block:: none

      # /etc/init.d/some_service start ; echo "result: $?"

   * Did the service start?
   * Did the echo command print ``result: 0`` (in addition to the init script's
     usual output)?

#. Status (running):
 
   .. code-block:: none

      # /etc/init.d/some_service status ; echo "result: $?"

   * Did the script accept the command?
   * Did the script indicate the service was running?
   * Did the echo command print ``result: 0`` (in addition to the init script's
     usual output)?

#. Start (running):
 
   .. code-block:: none

      # /etc/init.d/some_service start ; echo "result: $?"

   * Is the service still running?
   * Did the echo command print ``result: 0`` (in addition to the init
      script's usual output)?

#. Stop (running):
 
   .. code-block:: none

      # /etc/init.d/some_service stop ; echo "result: $?"

   * Was the service stopped?
   * Did the echo command print ``result: 0`` (in addition to the init
     script's usual output)?

#. Status (stopped):
 
   .. code-block:: none

      # /etc/init.d/some_service status ; echo "result: $?"

   * Did the script accept the command?
   * Did the script indicate the service was not running?
   * Did the echo command print ``result: 3`` (in addition to the init
     script's usual output)?

#. Stop (stopped):
 
   .. code-block:: none

      # /etc/init.d/some_service stop ; echo "result: $?"

   * Is the service still stopped?
   * Did the echo command print ``result: 0`` (in addition to the init
     script's usual output)?

#. Status (failed):

   This step is not readily testable and relies on manual inspection of the script.

   The script can use one of the error codes (other than 3) listed in the
   LSB spec to indicate that it is active but failed. This tells the
   cluster that before moving the resource to another node, it needs to
   stop it on the existing one first.

If the answer to any of the above questions is no, then the script is not
LSB-compliant. Your options are then to either fix the script or write an OCF
agent based on the existing script.