= The OCF Resource Agent Developer's Guide

== Introduction

This document serves as a guide and reference for all developers,
maintainers, and contributors working on OCF (Open Cluster Framework)
compliant cluster resource agents. It explains the anatomy and general
functionality of a resource agent, illustrates the resource agent API,
and provides valuable hints and tips to resource agent authors.

=== What is a resource agent?

A resource agent is an executable that manages a cluster resource. No
formal definition of a cluster resource exists, other than "anything a
cluster manages is a resource." Cluster resources can be as diverse as
IP addresses, file systems, database services, and entire virtual
machines -- to name just a few examples.

=== Who or what uses a resource agent?

Any Open Cluster Framework (OCF) compliant cluster management
application is capable of managing resources using the resource agents
described in this document. At the time of writing, two OCF compliant
cluster management applications exist for the Linux platform:

* _Pacemaker_, a cluster manager supporting both the Corosync and
  Heartbeat cluster messaging frameworks. Pacemaker evolved out of the
  Linux-HA project.
* _RGmanager_, the cluster manager bundled in Red Hat Cluster
  Suite. It supports the Corosync cluster messaging framework
  exclusively.

=== Which language is a resource agent written in?

An OCF compliant resource agent can be implemented in _any_
programming language. The API is not language specific. However, most
resource agents are implemented as shell scripts, which is why this
guide primarily uses example code written in shell language.

=== Is there a naming convention?

Yes. By agreed convention, resource agent names use lower case
letters, with words separated by dashes (+example-agent-name+).

Existing agents may or may not follow this convention, but future
agents are expected to follow it.

== API definitions

=== Environment variables

A resource agent receives all configuration information about the
resource it manages via environment variables. The names of these
environment variables are always the name of the resource parameter,
prefixed with +OCF_RESKEY_+. For example, if the resource has an +ip+
parameter set to +192.168.1.1+, then the resource agent will have
access to an environment variable +OCF_RESKEY_ip+ holding that value.

For any resource parameter that the user is not required to set
-- that is, whose parameter definition in the resource agent metadata
does not specify +required="true"+ -- the resource agent must do one
of the following:

* Provide a reasonable default. This should be advertised in the
  metadata. By convention, the resource agent uses a variable named
  +OCF_RESKEY_<parametername>_default+ that holds this default.
* Alternatively, cater correctly for the value being empty.
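
The default-handling convention above can be sketched as follows. This
is a minimal illustration, not a definitive implementation; the
parameter name +superfrobnicate+ and the default value +false+ are
assumptions chosen for the example.

[source,bash]
--------------------------------------------------------------------------
# Default for the optional parameter (illustrative value); the same
# value should be advertised in the metadata
OCF_RESKEY_superfrobnicate_default="false"

# Assign the default only if the cluster manager did not pass the
# parameter in the environment
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
--------------------------------------------------------------------------

Note that this form of parameter expansion assigns only when the
variable is unset. A value that was explicitly set to the empty string
is left alone, so the agent must still cater for it.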

In addition, the cluster manager may also support _meta_ resource
parameters. These do not apply directly to the resource configuration,
but rather specify _how_ the cluster resource manager is expected to manage
the resource. For example, the Pacemaker cluster manager uses the
+target-role+ meta parameter to specify whether the resource should be
started or stopped.

Meta parameters are passed into the resource agent in the
+OCF_RESKEY_CRM_meta_+ namespace, with any hyphens converted to
underscores. Thus, the +target-role+ attribute maps to an environment
variable named +OCF_RESKEY_CRM_meta_target_role+.
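
The name mapping can be expressed in a few lines of shell. The helper
+meta_param_env_name+ below is hypothetical and exists purely to
illustrate the rule; a resource agent would normally reference the
resulting variable directly.

[source,bash]
--------------------------------------------------------------------------
# Hypothetical helper: print the name of the environment variable
# under which a given meta parameter reaches the resource agent
meta_param_env_name() {
    printf 'OCF_RESKEY_CRM_meta_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

meta_param_env_name target-role
--------------------------------------------------------------------------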

The <<_script_variables>> section contains other system environment
variables.

=== Actions

Any resource agent must support one command-line argument which
specifies the action the resource agent is about to execute. The
following actions must be supported by any resource agent:

* +start+ -- starts the resource.
* +stop+ -- shuts down the resource.
* +monitor+ -- queries the resource for its state.
* +meta-data+ -- dumps the resource agent metadata.

In addition, resource agents may optionally support the following
actions:

* +promote+ -- turns a resource into the +Master+ role (Master/Slave
  resources only).
* +demote+ -- turns a resource into the +Slave+ role (Master/Slave
  resources only).
* +migrate_to+ and +migrate_from+ -- implement live migration of
  resources.
* +validate-all+ -- validates a resource's configuration.
* +usage+ or +help+ -- displays a usage message when the resource
  agent is invoked from the command line, rather than by the cluster
  manager.
* +notify+ -- informs the resource about changes in the state of
  other clones.
* +status+ -- historical (deprecated) synonym for +monitor+.

=== Timeouts

Action timeouts are enforced outside the resource agent proper. It is
the cluster manager's responsibility to monitor how long a resource
agent action has been running, and terminate it if it does not meet
its completion deadline. Thus, resource agents need not themselves
check for any timeout expiry.

Resource agents can, however, _advise_ the user of sensible timeout
values (which, when correctly set, will be duly enforced by the
cluster manager). See <<_metadata,the following section>> for details
on how a resource agent advertises its suggested timeouts.

=== Metadata

Every resource agent must describe its own purpose and supported
parameters in a set of XML metadata. This metadata is used by cluster
management applications for on-line help, and resource agent man pages
are generated from it as well. The following is a fictitious set of
metadata from an imaginary resource agent:

[source,xml]
--------------------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="foobar" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">
This is a fictitious example resource agent written for the
OCF Resource Agent Developers Guide.
  </longdesc>
  <shortdesc lang="en">Example resource agent for budding OCF RA developers</shortdesc>
  <parameters>
    <parameter name="eggs" unique="0" required="1">
      <longdesc lang="en">
      Number of eggs, an example numeric parameter
      </longdesc>
      <shortdesc lang="en">Number of eggs</shortdesc>
      <content type="integer"/>
    </parameter>
    <parameter name="superfrobnicate" unique="0" required="0">
      <longdesc lang="en">
      Enable superfrobnication, an example boolean parameter
      </longdesc>
      <shortdesc lang="en">Enable superfrobnication</shortdesc>
      <content type="boolean" default="false"/>
    </parameter>
    <parameter name="datadir" unique="0" required="1">
      <longdesc lang="en">
      Data directory, an example string parameter
      </longdesc>
      <shortdesc lang="en">Data directory</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start"        timeout="20" />
    <action name="stop"         timeout="20" />
    <action name="monitor"      timeout="20"
                                interval="10" depth="0" />
    <action name="notify"       timeout="20" />
    <action name="reload"       timeout="20" />
    <action name="migrate_to"   timeout="20" />
    <action name="migrate_from" timeout="20" />
    <action name="meta-data"    timeout="5" />
    <action name="validate-all"   timeout="20" />
  </actions>
</resource-agent>
--------------------------------------------------------------------------

The +resource-agent+ element, of which there must only be one per
resource agent, defines the resource agent +name+ and +version+. The
+version+ element specifies the version of the OCF standard that the
metadata complies with.

The +longdesc+ and +shortdesc+ elements in +resource-agent+ provide a
long and short description of the resource agent's
functionality. While +shortdesc+ is a one-line description of what
the resource agent does and is usually used in terse listings,
+longdesc+ should give a full-blown description of the resource agent
in as much detail as possible.

The +parameters+ element describes the resource agent parameters, and
should hold any number of +parameter+ children -- one for each
parameter that the resource agent supports.

Every +parameter+ should, like the +resource-agent+ as a whole, come
with a +shortdesc+ and a +longdesc+, and also a +content+ child that
describes the parameter's expected content.

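A parameter is described by four attributes. As the example above
shows, +type+ and +default+ are set on the +content+ element, while
+required+ and +unique+ are set on the enclosing +parameter+ element: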

* +type+ describes the parameter type (+string+, +integer+, or
  +boolean+). If unset, +type+ defaults to +string+.

* +required+ indicates whether setting the parameter is mandatory
  (+required="true"+) or optional (+required="false"+).

* For optional parameters, it is customary to provide a sensible
  default via the +default+ attribute.

* Finally, the +unique+ attribute (allowed values: +true+ or +false+)
  indicates that a specific value must be unique across the cluster,
  for this parameter of this particular resource type. For example, a
  highly available floating IP address is declared +unique+ -- as that
  one IP address should run only once throughout the cluster, avoiding
  duplicates.

The +actions+ list defines the actions that the resource agent
advertises as supported.

Every +action+ should list its own +timeout+ value. This is a hint to
the user as to what _minimal_ timeout should be configured for the
action. It caters for the fact that some resources are quick to start
and stop (IP addresses or filesystems, for example), while others may
take several minutes to do so (such as databases).

In addition, recurring actions (such as +monitor+) should also specify
a recommended minimum +interval+, which is the time between two
consecutive invocations of the same action. Like +timeout+, this value
does not constitute a default -- it is merely a hint to the user as to
the minimum action interval to configure.
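
In a shell-based agent, the +meta-data+ action is commonly implemented
as a function that prints the XML from a here-document. The following
is a shortened sketch under that assumption: it reuses the fictitious
+foobar+ metadata from above, trimmed to one parameter and four
actions, and the variable +OCF_RESKEY_superfrobnicate_default+ is
defined here only for illustration.

[source,bash]
--------------------------------------------------------------------------
# Illustrative default, expanded inside the here-document below
OCF_RESKEY_superfrobnicate_default="false"

foobar_meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="foobar" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">
This is a fictitious example resource agent written for the
OCF Resource Agent Developers Guide.
  </longdesc>
  <shortdesc lang="en">Example resource agent for budding OCF RA developers</shortdesc>
  <parameters>
    <parameter name="superfrobnicate" unique="0" required="0">
      <longdesc lang="en">
      Enable superfrobnication, an example boolean parameter
      </longdesc>
      <shortdesc lang="en">Enable superfrobnication</shortdesc>
      <content type="boolean" default="${OCF_RESKEY_superfrobnicate_default}"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start"     timeout="20" />
    <action name="stop"      timeout="20" />
    <action name="monitor"   timeout="20" interval="10" depth="0" />
    <action name="meta-data" timeout="5" />
  </actions>
</resource-agent>
EOF
}
--------------------------------------------------------------------------

Because the here-document delimiter is unquoted, the shell expands the
default variable, which keeps the advertised default and the default
actually used by the agent in one place.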

== Return codes

For any invocation, resource agents must exit with a defined return
code that informs the caller of the outcome of the invoked
action. The return codes are explained in detail in the following
subsections.

=== +OCF_SUCCESS+ (0)

The action completed successfully. This is the expected return code
for any successful +start+, +stop+, +promote+, +demote+,
+migrate_from+, +migrate_to+, +meta-data+, +help+, and +usage+ action.

For +monitor+ (and its deprecated alias, +status+), however, a
modified convention applies:

* For primitive (stateless) resources, +OCF_SUCCESS+ from +monitor+
  means that the resource is running. Non-running and gracefully
  shut-down resources must instead return +OCF_NOT_RUNNING+.

* For master/slave (stateful) resources, +OCF_SUCCESS+ from +monitor+
  means that the resource is running _in Slave mode_. Resources
  running in Master mode must instead return +OCF_RUNNING_MASTER+, and
  gracefully shut-down resources must instead return
  +OCF_NOT_RUNNING+.

=== +OCF_ERR_GENERIC+ (1)

The action returned a generic error. A resource agent should use this
exit code only when none of the more specific error codes, defined
below, accurately describes the problem.

The cluster resource manager interprets this exit code as a _soft_
error. This means that unless specifically configured otherwise, the
resource manager will attempt to recover a resource which failed with
+OCF_ERR_GENERIC+ in-place -- usually by restarting the resource on
the same node.

=== +OCF_ERR_ARGS+ (2)

The resource's configuration is not valid on this machine. For
example, it refers to a location not found on the node.

NOTE: The resource agent should not return this error when instructed
to perform an action that it does not support. Instead, under those
circumstances, it should return +OCF_ERR_UNIMPLEMENTED+.

=== +OCF_ERR_UNIMPLEMENTED+ (3)

The resource agent was instructed to execute an action that the agent
does not implement.

Not all resource agent actions are mandatory. +promote+, +demote+,
+migrate_to+, +migrate_from+, and +notify+ are all optional actions
which the resource agent may or may not implement. When a non-stateful
resource agent is misconfigured as a master/slave resource, for
example, then the resource agent should alert the user about this
misconfiguration by returning +OCF_ERR_UNIMPLEMENTED+ on the +promote+
and +demote+ actions.

=== +OCF_ERR_PERM+ (4)

The action failed due to insufficient permissions. This may be due to
the agent not being able to open a certain file, to listen on a
specific socket, to write to a directory, or similar.

The cluster resource manager interprets this exit code as a _hard_
error. This means that unless specifically configured otherwise, the
resource manager will attempt to recover a resource which failed with
this error by restarting the resource on a different node (where the
permission problem may not exist).

=== +OCF_ERR_INSTALLED+ (5)

The action failed because a required component is missing on the node
where the action was executed. This may be due to a required binary
not being executable, or a vital configuration file being unreadable.

The cluster resource manager interprets this exit code as a _hard_
error. This means that unless specifically configured otherwise, the
resource manager will attempt to recover a resource which failed with
this error by restarting the resource on a different node (where the
required files or binaries may be present).
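
A check for a missing binary might look like the following sketch.
The helper name +foobar_check_binary+ is hypothetical, and the return
code values are defined locally only to keep the example
self-contained; a real agent obtains them from the function library
it sources.

[source,bash]
--------------------------------------------------------------------------
# Defined here for illustration only
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5

# Hypothetical helper: succeed only if the named program can be found
foobar_check_binary() {
    if command -v "$1" >/dev/null 2>&1; then
        return $OCF_SUCCESS
    fi
    echo "Required binary '$1' not found" >&2
    return $OCF_ERR_INSTALLED
}
--------------------------------------------------------------------------

An agent could call such a helper early and leave immediately on
failure, for example with +foobar_check_binary sh || exit $?+.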

=== +OCF_ERR_CONFIGURED+ (6)

The action failed because the user misconfigured the resource. For
example, the user may have configured an alphanumeric string for a
parameter that really should be an integer.

The cluster resource manager interprets this exit code as a _fatal_
error. Since this is a configuration error that is present
cluster-wide, it would make no sense to recover such a resource on a
different node, let alone in-place. When a resource fails with this
error, the cluster manager will attempt to shut down the resource, and
wait for administrator intervention.
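
The integer example above can be sketched in portable shell as
follows. The helper name +foobar_check_integer+ is hypothetical, the
check accepts non-negative integers only, and the return code values
are defined locally just to keep the example self-contained.

[source,bash]
--------------------------------------------------------------------------
# Defined here for illustration only
OCF_SUCCESS=0
OCF_ERR_CONFIGURED=6

# Hypothetical helper: accept a value only if it consists of digits
foobar_check_integer() {
    case "$1" in
        ''|*[!0-9]*) return $OCF_ERR_CONFIGURED ;;
        *)           return $OCF_SUCCESS ;;
    esac
}
--------------------------------------------------------------------------

A validation function could then reject a bad value with
+foobar_check_integer "$OCF_RESKEY_eggs" || exit $?+, where +eggs+ is
the example numeric parameter from the metadata.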

=== +OCF_NOT_RUNNING+ (7)

The resource was found not to be running. This is an exit code that
may be returned by the +monitor+ action exclusively. Note that this
implies that the resource has either _gracefully_ shut down, or has
never been started.

If the resource is not running due to an error condition, the
+monitor+ action should instead return one of the +OCF_ERR_+ exit
codes or +OCF_FAILED_MASTER+.

=== +OCF_RUNNING_MASTER+ (8)

The resource was found to be running in the +Master+ role. This
applies only to stateful (Master/Slave) resources, and only to
their +monitor+ action.

Note that there is no specific exit code for "running in slave
mode". This is because there is no functional distinction between a
primitive resource running normally, and a stateful resource running
as a slave. The +monitor+ action of a stateful resource running
normally in the +Slave+ role should simply return +OCF_SUCCESS+.

=== +OCF_FAILED_MASTER+ (9)

The resource was found to have failed in the +Master+ role. This
applies only to stateful (Master/Slave) resources, and only to their
+monitor+ action.

The cluster resource manager interprets this exit code as a _soft_
error. This means that unless specifically configured otherwise, the
resource manager will attempt to recover a resource which failed with
+$OCF_FAILED_MASTER+ in-place -- usually by demoting, stopping,
starting and then promoting the resource on the same node.
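
For quick reference while debugging, the numeric values from the
preceding subsections can be translated back into their symbolic
names. The helper +foobar_rc_name+ below is hypothetical and not part
of any function library; it merely restates the values listed above.

[source,bash]
--------------------------------------------------------------------------
# Hypothetical helper: print the symbolic name of a numeric exit code
foobar_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "unknown ($1)" ;;
    esac
}
--------------------------------------------------------------------------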


== Resource agent structure

A typical (shell-based) resource agent contains standard structural
items, in the order listed in this section. The section describes the
expected behavior of a resource agent with respect to the various
actions it supports, using a fictitious resource agent named +foobar+
as an example.

=== Resource agent interpreter

Any resource agent implemented as a script must specify its
interpreter using standard "shebang" (+#!+) header syntax.

[source,bash]
--------------------------------------------------------------------------
#!/bin/sh
--------------------------------------------------------------------------

If a resource agent is written in shell, specifying the generic shell
interpreter (+#!/bin/sh+) is generally preferred, though not
required. Resource agents declared as +/bin/sh+ compatible must not
use constructs native to a specific shell (such as, for example,
+${!variable}+ syntax native to +bash+). It is advisable to
occasionally run such resource agents through a sanitization utility
such as +checkbashisms+.
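
As an illustration, the +bash+-only indirect expansion mentioned above
can usually be replaced with +eval+ in a generic shell. The variable
names in this sketch are assumptions made for the example.

[source,bash]
--------------------------------------------------------------------------
OCF_RESKEY_ip="192.168.1.1"
param="ip"

# bash only:  name="OCF_RESKEY_${param}"; value=${!name}
# Portable alternative: let eval build the variable reference
eval "value=\${OCF_RESKEY_${param}}"

echo "$value"
--------------------------------------------------------------------------

Since +eval+ executes whatever it is given, this is only appropriate
when the parameter name comes from the agent itself and not from
untrusted input.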

It is considered a regression to introduce a patch that will make a
previously +sh+ compatible resource agent suitable only for +bash+,
+ksh+, or any other non-generic shell. It is, however, perfectly
acceptable for a new resource agent to explicitly define a specific
shell, such as +/bin/bash+, as its interpreter.

=== Author and license information

The resource agent should contain a comment listing the resource agent
author(s) and/or copyright holder(s), and stating the license that
applies to the resource agent:

[source,bash]
--------------------------------------------------------------------------
#
#   Resource Agent for managing foobar resources.
#
#   License:      GNU General Public License (GPL)
#   (c) 2008-2010 John Doe, Jane Roe,
#                 and Linux-HA contributors
--------------------------------------------------------------------------

When a resource agent refers to a license for which multiple versions
exist, it is assumed that the current version applies.

=== Initialization

Any shell resource agent should source the +ocf-shellfuncs+ function
library. With the syntax below, this is done in terms of
+$OCF_FUNCTIONS_DIR+, which -- for testing purposes, and also for
generating documentation -- may be overridden from the command line.

[source,bash]
--------------------------------------------------------------------------
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
--------------------------------------------------------------------------
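
The first line of the initialization relies on a shell idiom that
assigns a default only when the variable is not already set, which is
what makes the override possible. The following sketch demonstrates
the behavior with a hypothetical variable and made-up paths:

[source,bash]
--------------------------------------------------------------------------
# Case 1: the variable is unset, so the default is assigned
unset FOOBAR_FUNCTIONS_DIR
: ${FOOBAR_FUNCTIONS_DIR=/default/lib/heartbeat}
unset_result=$FOOBAR_FUNCTIONS_DIR

# Case 2: the variable was set beforehand (for example, on the
# command line), so the override is kept
FOOBAR_FUNCTIONS_DIR=/tmp/ra-testing
: ${FOOBAR_FUNCTIONS_DIR=/default/lib/heartbeat}
override_result=$FOOBAR_FUNCTIONS_DIR

echo "$unset_result"
echo "$override_result"
--------------------------------------------------------------------------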

=== Functions implementing resource agent actions

What follows next are the functions implementing the resource agent's
advertised actions. The individual actions are described in detail in
<<_resource_agent_actions>>.

=== Execution block

This is the part of the resource agent that actually executes when the
resource agent is invoked. It typically follows a fairly standard
structure:

[source,bash]
--------------------------------------------------------------------------
# Make sure meta-data and usage always succeed
case $__OCF_ACTION in
meta-data)	foobar_meta_data
		exit $OCF_SUCCESS
		;;
usage|help)	foobar_usage
		exit $OCF_SUCCESS
		;;
esac

# Anything other than meta-data and usage must pass validation
foobar_validate_all || exit $?

# Translate each action into the appropriate function call
case $__OCF_ACTION in
start)		foobar_start;;
stop)		foobar_stop;;
status|monitor)	foobar_monitor;;
promote)	foobar_promote;;
demote)		foobar_demote;;
notify)		foobar_notify;;
reload)		ocf_log info "Reloading..."
	        foobar_start
		;;
validate-all)	;;
*)		foobar_usage
		exit $OCF_ERR_UNIMPLEMENTED
		;;
esac
rc=$?

# The resource agent may optionally log a debug message
ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION returned $rc"
exit $rc
--------------------------------------------------------------------------


== Resource agent actions

Each action is typically implemented in a separate function or method
in the resource agent. By convention, these are usually named
+<agent>_<action>+, so the function implementing the +start+ action in
+foobar+ would be named +foobar_start()+.

As a general rule, whenever the resource agent encounters an error
that it is not able to recover from, it is permitted to immediately exit,
throw an exception, or otherwise cease execution. Examples for this
include configuration issues, missing binaries, permission problems,
etc. It is not necessary to pass these errors up the call stack.

It is the cluster manager's responsibility to initiate the appropriate
recovery action based on the user's configuration. The resource agent
should not guess at said configuration.

=== +start+ action

When invoked with the +start+ action, the resource agent must start
the resource if it is not yet running. This means that the agent must
verify the resource's configuration, query its state, and then start
it only if it is not running. A common way of doing this would be to
invoke the +validate_all+ and +monitor+ functions first, as in the
following example:

[source,bash]
--------------------------------------------------------------------------
foobar_start() {
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # if resource is already running, bail out early
    if foobar_monitor; then
	ocf_log info "Resource is already running"
	return $OCF_SUCCESS
    fi
    
    # actually start up the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ...

    # After the resource has been started, check whether it started up
    # correctly. If the resource starts asynchronously, the agent may
    # spin on the monitor function here -- if the resource does not
    # start up within the defined timeout, the cluster manager will
    # consider the start action failed
    while ! foobar_monitor; do
	ocf_log debug "Resource has not started yet, waiting"
	sleep 1
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------


=== +stop+ action

When invoked with the +stop+ action, the resource agent must stop the
resource, if it is running. This means that the agent must verify the
resource configuration, query its state, and then stop it only if it
is currently running. A common way of doing this would be to invoke
the +validate_all+ and +monitor+ functions first. It is important to
understand that +stop+ is a force operation -- the resource agent must
do everything in its power to shut down the resource, short of
rebooting the node or shutting it off. Consider the following example:

[source,bash]
--------------------------------------------------------------------------
foobar_stop() {
    local rc

    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    foobar_monitor
    rc=$?
    case "$rc" in
        "$OCF_SUCCESS")
            # Currently running. Normal, expected behavior.
            ocf_log debug "Resource is currently running"
            ;;
        "$OCF_RUNNING_MASTER")
            # Running as a Master. Need to demote before stopping.
            ocf_log info "Resource is currently running as Master"
            foobar_demote || \
                ocf_log warn "Demote failed, trying to stop anyway"
            ;;
        "$OCF_NOT_RUNNING")
            # Currently not running. Nothing to do.
            ocf_log info "Resource is already stopped"
            return $OCF_SUCCESS
            ;;
    esac
    
    # actually shut down the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ...

    # After the resource has been stopped, check whether it shut down
    # correctly. If the resource stops asynchronously, the agent may
    # spin on the monitor function here -- if the resource does not
    # shut down within the defined timeout, the cluster manager will
    # consider the stop action failed
    while foobar_monitor; do
        ocf_log debug "Resource has not stopped yet, waiting"
        sleep 1
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------

NOTE: The expected exit code for a successful stop operation is
+$OCF_SUCCESS+, _not_ +$OCF_NOT_RUNNING+.

IMPORTANT: A failed stop operation is a potentially dangerous
situation which the cluster manager will almost invariably try to
resolve by means of node fencing. In other words, the cluster manager
will forcibly evict from the cluster a node on which a stop operation
has failed. While this measure serves ultimately to protect data, it
does cause disruption to applications and their users. Thus, a
resource agent should make sure that it exits with an error only if
all avenues for proper resource shutdown have been exhausted.

=== +monitor+ action

The +monitor+ action queries the current status of a resource. It must
discern between three different states:

* resource is currently running (return +$OCF_SUCCESS+);
* resource has stopped gracefully (return +$OCF_NOT_RUNNING+);
* resource has run into a problem and must be considered failed
  (return the appropriate +$OCF_ERR_+ code to indicate the nature of the
  problem).


[source,bash]
--------------------------------------------------------------------------
foobar_monitor() {
    local rc

    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    ocf_run frobnicate --test

    # This example assumes the following exit code convention
    # for frobnicate:
    # 0: running, and fully caught up with master
    # 1: gracefully stopped
    # any other: error
    case "$?" in
        0)
            rc=$OCF_SUCCESS
            ocf_log debug "Resource is running"
            ;;
        1)
            rc=$OCF_NOT_RUNNING
            ocf_log debug "Resource is not running"
            ;;
        *)
            ocf_log err "Resource has failed"
            exit $OCF_ERR_GENERIC
            ;;
    esac

    return $rc
}
--------------------------------------------------------------------------

Stateful (master/slave) resource agents may use a more elaborate
monitoring scheme where they can provide "hints" to the cluster
manager identifying which instance is best suited to assume the
+Master+ role. <<_specifying_a_master_preference>> explains the
details.

NOTE: The cluster manager may invoke the +monitor+ action for a
_probe_, which is a test of whether the resource is currently
running. Normally, the monitor operation would behave exactly the same
during a probe and a "real" monitor action. If a specific resource
does require special treatment for probes, however, the +ocf_is_probe+
convenience function is available in the OCF shell functions library
for that purpose.
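
As an illustration of what sets a probe apart, the sketch below tells
a probe from a recurring monitor. It assumes that a probe arrives as a
+monitor+ action whose interval, passed in
+$OCF_RESKEY_CRM_meta_interval+, is zero. The +is_probe_sketch+ and
+describe_monitor_call+ helpers are hypothetical stand-ins so that the
fragment runs outside an agent; a real agent should call
+ocf_is_probe+ instead.

```bash
# Hypothetical stand-in for ocf_is_probe, for illustration only.
# Assumption: a probe is a monitor action with an interval of 0.
is_probe_sketch() {
    [ "$__OCF_ACTION" = "monitor" ] &&
        [ "${OCF_RESKEY_CRM_meta_interval:-unset}" = "0" ]
}

# Example: report which kind of call the agent is handling
describe_monitor_call() {
    if is_probe_sketch; then
        echo "probe"
    else
        echo "not a probe"
    fi
}
```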

=== +validate-all+ action

The +validate-all+ action tests for correct resource agent
configuration and a working environment. +validate-all+ should exit
with one of the following return codes:

* +$OCF_SUCCESS+ -- all is well, the configuration is valid and
  usable.
* +$OCF_ERR_CONFIGURED+ -- the user has misconfigured the resource.
* +$OCF_ERR_INSTALLED+ -- the resource has possibly been configured
  correctly, but a vital component is missing on the node where
  +validate-all+ is being executed.
* +$OCF_ERR_PERM+ -- the resource is configured correctly and is not
  missing any required components, but is suffering from a permission
  issue (such as not being able to create a necessary file).

+validate-all+ is usually wrapped in a function that is not only
called when explicitly invoking the corresponding action, but also --
as a sanity check -- from just about any other function. Therefore,
the resource agent author must keep in mind that the function may be
invoked during the +start+, +stop+, and +monitor+ operations, and also
during probes.

Probes pose a separate challenge for validation. During a probe (when
the cluster manager may expect the resource _not_ to be running on the
node where the probe is executed), some required components may be
_expected_ to not be available on the affected node. For example, this
includes any shared data on storage devices not available for reading
during the probe. The +validate-all+ function may thus need to treat
probes specially, using the +ocf_is_probe+ convenience function:

[source,bash]
--------------------------------------------------------------------------
foobar_validate_all() {
    # Test for configuration errors first
    if ! ocf_is_decimal $OCF_RESKEY_eggs; then
       ocf_log err "eggs is not numeric!"
       exit $OCF_ERR_CONFIGURED
    fi

    # Test for required binaries
    check_binary frobnicate

    # Check for data directory (this may be on shared storage, so
    # disable this test during probes)
    if ! ocf_is_probe; then
        # quote the parameter: an empty value would otherwise make
        # the test succeed
        if ! [ -d "$OCF_RESKEY_datadir" ]; then
            ocf_log err "$OCF_RESKEY_datadir does not exist or is not a directory!"
            exit $OCF_ERR_INSTALLED
        fi
    fi

    return $OCF_SUCCESS
}
--------------------------------------------------------------------------
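
The example above does not exercise +$OCF_ERR_PERM+. As a rough
sketch, a permission test could be kept apart from the existence test
as shown below. The +foobar_check_datadir+ helper is hypothetical, and
the numeric fallback values are only there so that the fragment runs
outside an agent; inside a real agent they come from the shell
functions library.

```bash
# Fallback values so the sketch runs standalone; a real agent gets
# these from ocf-shellfuncs.
: ${OCF_SUCCESS=0}
: ${OCF_ERR_PERM=4}
: ${OCF_ERR_INSTALLED=5}

# Hypothetical helper: distinguish "missing" from "not writable"
foobar_check_datadir() {
    local dir="$1"

    if ! [ -d "$dir" ]; then
        # a vital component is missing on this node
        return $OCF_ERR_INSTALLED
    fi
    if ! [ -w "$dir" ]; then
        # present, but the agent lacks permission to use it
        return $OCF_ERR_PERM
    fi
    return $OCF_SUCCESS
}
```

A validation function could then call
+foobar_check_datadir "$OCF_RESKEY_datadir" || exit $?+ inside the
branch that is skipped during probes.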

=== +meta-data+ action

The +meta-data+ action dumps the resource agent metadata to standard
output. The output must follow the metadata format as specified in
<<_metadata>>.

[source,bash]
--------------------------------------------------------------------------
foobar_meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="foobar" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">
...
EOF
}
--------------------------------------------------------------------------

=== +promote+ action

The +promote+ action is optional. It must only be supported by
_stateful_ resource agents, which means agents that discern between
two distinct _roles_: +Master+ and +Slave+. +Slave+ is functionally
identical to the +Started+ state in a stateless resource agent. Thus,
while a regular (stateless) resource agent only needs to implement
+start+ and +stop+, a stateful resource agent must also support the
+promote+ action to be able to make a transition between the +Started+
(+Slave+) and +Master+ roles.

[source,bash]
--------------------------------------------------------------------------
foobar_promote() {
    local rc

    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # test the resource's current state
    foobar_monitor
    rc=$?
    case "$rc" in
        "$OCF_SUCCESS")
            # Running as slave. Normal, expected behavior.
            ocf_log debug "Resource is currently running as Slave"
            ;;
        "$OCF_RUNNING_MASTER")
            # Already a master. Unexpected, but not a problem.
            ocf_log info "Resource is already running as Master"
            return $OCF_SUCCESS
            ;;
        "$OCF_NOT_RUNNING")
            # Currently not running. Need to start before promoting.
            ocf_log info "Resource is currently not running"
            foobar_start
            ;;
        *)
            # Failed resource. Let the cluster manager recover.
            ocf_log err "Unexpected error, cannot promote"
            exit $rc
            ;;
    esac
    
    # actually promote the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ocf_run frobnicate --master-mode || exit $OCF_ERR_GENERIC

    # After the resource has been promoted, check whether the
    # promotion worked. If the resource promotion is asynchronous, the
    # agent may spin on the monitor function here -- if the resource
    # does not assume the Master role within the defined timeout, the
    # cluster manager will consider the promote action failed.
    while true; do
        foobar_monitor
        if [ $? -eq $OCF_RUNNING_MASTER ]; then
            ocf_log debug "Resource promoted"
            break
        else
            ocf_log debug "Resource still awaiting promotion"
            sleep 1
        fi
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------

=== +demote+ action

The +demote+ action is optional. It must only be supported by
_stateful_ resource agents, which means agents that discern between
two distinct _roles_: +Master+ and +Slave+. +Slave+ is functionally
identical to the +Started+ state in a stateless resource agent. Thus,
while a regular (stateless) resource agent only needs to implement
+start+ and +stop+, a stateful resource agent must also support the
+demote+ action to be able to make a transition between the +Master+
and +Started+ (+Slave+) roles.

[source,bash]
--------------------------------------------------------------------------
foobar_demote() {
    local rc

    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # test the resource's current state
    foobar_monitor
    rc=$?
    case "$rc" in
        "$OCF_RUNNING_MASTER")
            # Running as master. Normal, expected behavior.
            ocf_log debug "Resource is currently running as Master"
            ;;
        "$OCF_SUCCESS")
            # Already running as slave. Nothing to do.
            ocf_log debug "Resource is currently running as Slave"
            return $OCF_SUCCESS
            ;;
        "$OCF_NOT_RUNNING")
            # Currently not running. Getting a demote action
            # in this state is unexpected. Exit with an error
            # and let the cluster manager recover.
            ocf_log err "Resource is currently not running"
            exit $OCF_ERR_GENERIC
            ;;
        *)
            # Failed resource. Let the cluster manager recover.
            ocf_log err "Unexpected error, cannot demote"
            exit $rc
            ;;
    esac
    
    # actually demote the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ocf_run frobnicate --unset-master-mode || exit $OCF_ERR_GENERIC

    # After the resource has been demoted, check whether the
    # demotion worked. If the resource demotion is asynchronous, the
    # agent may spin on the monitor function here -- if the resource
    # does not assume the Slave role within the defined timeout, the
    # cluster manager will consider the demote action failed.
    while true; do
        foobar_monitor
        if [ $? -eq $OCF_RUNNING_MASTER ]; then
            ocf_log debug "Resource still demoting"
            sleep 1
        else
            ocf_log debug "Resource demoted"
            break
        fi
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------

=== +migrate_to+ action

The +migrate_to+ action can serve one of two purposes:

* Initiate a native _push_ type migration for the resource. In other
  words, instruct the resource to move _to_ a specific node from the
  node it is currently running on. The resource agent knows about its
  destination node via the +$OCF_RESKEY_CRM_meta_migrate_target+ environment
  variable.

* Freeze the resource in a _freeze/thaw_ (also known as
  _suspend/resume_) type migration. In this mode, the resource does
  not need any information about its destination node at this point.

The example below illustrates a push type migration:

[source,bash]
--------------------------------------------------------------------------
foobar_migrate_to() {
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # if resource is not running, bail out early
    if ! foobar_monitor; then
        ocf_log err "Resource is not running"
        exit $OCF_ERR_GENERIC
    fi

    # actually migrate the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ocf_run frobnicate --migrate \
                       --dest=$OCF_RESKEY_CRM_meta_migrate_target \
                       || exit $OCF_ERR_GENERIC
    ...

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------

In contrast, a freeze/thaw type migration may implement its freeze
operation like this:

[source,bash]
--------------------------------------------------------------------------
foobar_migrate_to() {
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # if resource is not running, bail out early
    if ! foobar_monitor; then
        ocf_log err "Resource is not running"
        exit $OCF_ERR_GENERIC
    fi

    # actually freeze the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ocf_run frobnicate --freeze || exit $OCF_ERR_GENERIC
    ...

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------


=== +migrate_from+ action

The +migrate_from+ action can serve one of two purposes:

* Complete a native _push_ type migration for the resource. In other
  words, check whether the migration has succeeded properly, and the
  resource is running on the local node. The resource agent knows
  about the migration source via the
  +$OCF_RESKEY_CRM_meta_migrate_source+ environment variable.

* Thaw the resource in a _freeze/thaw_ (also known as
  _suspend/resume_) type migration. In this mode, the resource usually
  does not need any information about its source node at this point.

The example below illustrates a push type migration:

[source,bash]
--------------------------------------------------------------------------
foobar_migrate_from() {
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # After the resource has been migrated, check whether it resumed
    # correctly. If the resource starts asynchronously, the agent may
    # spin on the monitor function here -- if the resource does not
    # run within the defined timeout, the cluster manager will
    # consider the migrate_from action failed
    while ! foobar_monitor; do
        ocf_log debug "Resource has not yet migrated, waiting"
        sleep 1
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------

In contrast, a freeze/thaw type migration may implement its thaw
operation like this:

[source,bash]
--------------------------------------------------------------------------
foobar_migrate_from() {
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # actually thaw the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ocf_run frobnicate --thaw || exit $OCF_ERR_GENERIC

    # After the resource has been migrated, check whether it resumed
    # correctly. If the resource starts asynchronously, the agent may
    # spin on the monitor function here -- if the resource does not
    # run within the defined timeout, the cluster manager will
    # consider the migrate_from action failed
    while ! foobar_monitor; do
        ocf_log debug "Resource has not yet migrated, waiting"
        sleep 1
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------


=== +notify+ action

With notifications, instances of clones (and of master/slave
resources, which are an extended kind of clones) can inform each other
about their state. When notifications are enabled, certain actions on
any instance of a clone carry a +pre+ and a +post+ notification.

List of actions that trigger notifications:

* start
* stop
* promote
* demote

The cluster manager invokes the +notify+ operation on _all_ clone
instances. For +notify+ operations, additional environment variables
are passed into the resource agent during execution:

* +$OCF_RESKEY_CRM_meta_notify_type+ -- the notification type (+pre+
  or +post+)

* +$OCF_RESKEY_CRM_meta_notify_operation+ -- the operation (action)
  that the notification is about (+start+, +stop+, +promote+, +demote+
  etc.)

* +$OCF_RESKEY_CRM_meta_notify_start_uname+ -- node name of the node
  where the resource is being started (+start+ notifications only)

* +$OCF_RESKEY_CRM_meta_notify_stop_uname+ -- node name of the node
  where the resource is being stopped (+stop+ notifications only)

* +$OCF_RESKEY_CRM_meta_notify_master_uname+ -- node name of the node
  where the resource currently _is in_ the Master role

* +$OCF_RESKEY_CRM_meta_notify_promote_uname+ -- node name of the node
  where the resource currently _is being promoted to_ the Master role
  (+promote+ notifications only)

* +$OCF_RESKEY_CRM_meta_notify_demote_uname+ -- node name of the node
  where the resource currently _is being demoted to_ the Slave role
  (+demote+ notifications only)

Notifications come in particularly handy for master/slave resources
using a "pull" scheme, where the master is a publisher and the slave a
subscriber. Since the master is obviously only available as such when
a promotion has occurred, the slaves can use a "pre-promote"
notification to configure themselves to subscribe to the right
publisher.

Likewise, the subscribers may want to unsubscribe from the publisher
after it has relinquished its master status, and a "post-demote"
notification can be used for that purpose.

Consider the example below to illustrate the concept.

[source,bash]
--------------------------------------------------------------------------
foobar_notify() {
    local type_op
    type_op="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"

    ocf_log debug "Received $type_op notification."
    case "$type_op" in
        'pre-promote')
            ocf_run frobnicate --slave-mode \
                               --master=$OCF_RESKEY_CRM_meta_notify_promote_uname \
                               || exit $OCF_ERR_GENERIC
            ;;
        'post-demote')
            ocf_run frobnicate --unset-slave-mode || exit $OCF_ERR_GENERIC
            ;;
    esac

    return $OCF_SUCCESS
}
--------------------------------------------------------------------------

NOTE: A master/slave resource agent may support a _multi-master_
configuration, where there is possibly more than one master at any
given time. If that is the case, then the
+$OCF_RESKEY_CRM_meta_notify_*_uname+ variables may each contain a
space-separated list of host names, rather than a single host name as
shown in the example. Under those circumstances the resource agent
would have to properly iterate over this list.
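
A minimal sketch of such an iteration follows. It relies on the
shell's word splitting to walk the space-separated list;
+foobar_subscribe_to+ and +foobar_notify_pre_promote+ are hypothetical
names, and the former is only a placeholder for whatever the agent
does per master.

```bash
# Hypothetical per-master action; a real agent would reconfigure the
# resource here instead of printing.
foobar_subscribe_to() {
    echo "subscribing to $1"
}

foobar_notify_pre_promote() {
    local host

    # Intentionally unquoted: split the list on whitespace
    for host in $OCF_RESKEY_CRM_meta_notify_promote_uname; do
        foobar_subscribe_to "$host"
    done
}
```

An empty list simply results in no iterations, so the same code path
also covers notifications that name no node.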

== Script variables

This section outlines variables typically available to resource agents,
primarily for convenience purposes. For additional variables
available while the agent is being executed, refer to
<<_environment_variables>> and <<_return_codes>>.

=== +$OCF_RA_VERSION_MAJOR+

The major version number of the resource agent API that the cluster
manager is currently using.

=== +$OCF_RA_VERSION_MINOR+

The minor version number of the resource agent API that the cluster
manager is currently using.

=== +$OCF_ROOT+

The root of the OCF resource agent hierarchy. This should never be
changed by a resource agent. This is usually +/usr/lib/ocf+.

=== +$OCF_FUNCTIONS_DIR+

The directory where the resource agent shell function library,
+ocf-shellfuncs+, resides. This is usually defined in terms of
+$OCF_ROOT+ and should never be changed by a resource agent. This
variable may, however, be overridden from the command line while
testing a new or modified resource agent.
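
One way to honor such an override is to apply the default only when
the variable is not already set. The sketch below assumes that the
library lives in +lib/heartbeat+ beneath +$OCF_ROOT+, which may differ
between installations; +foobar_functions_dir+ is a hypothetical helper
that merely prints the directory it would use.

```bash
# Assumption: default library location relative to $OCF_ROOT
foobar_functions_dir() {
    local root="${OCF_ROOT:-/usr/lib/ocf}"

    # ${VAR-default} keeps any value supplied by the caller
    echo "${OCF_FUNCTIONS_DIR-${root}/lib/heartbeat}"
}
```

During testing, an invocation along the lines of
+OCF_FUNCTIONS_DIR=/tmp/ra-test ./foobar monitor+ (with a made-up
path) would then pick up the library from the alternative directory.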

=== +$OCF_EXIT_REASON_PREFIX+

Used as a prefix when printing error messages from the resource agent.
Script functions use this automatically, so no explicit use is required
for shell-based scripts.

=== +$OCF_RESOURCE_INSTANCE+

The resource instance name. For primitive (non-clone, non-stateful)
resources, this is simply the resource name. For clones and stateful
resources, this is the primitive name, followed by a colon and the
clone instance number (such as +p_foobar:0+).
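
If an agent needs the two parts separately, shell parameter expansion
is sufficient. The helper names in this sketch are hypothetical:

```bash
# Split "p_foobar:0" into primitive name and clone instance number
foobar_primitive_name() {
    echo "${OCF_RESOURCE_INSTANCE%%:*}"
}

foobar_clone_number() {
    case "$OCF_RESOURCE_INSTANCE" in
        *:*) echo "${OCF_RESOURCE_INSTANCE##*:}" ;;
        *)   echo "" ;;   # primitive: no instance number
    esac
}
```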

=== +$OCF_RESOURCE_TYPE+

The resource type of the current resource, e.g. IPaddr2.

=== +$OCF_RESOURCE_PROVIDER+

The resource provider, e.g. heartbeat. This variable may not be set by
all cluster managers that implement Resource Agent API version 1.0.

=== +$__OCF_ACTION+

The currently invoked action. This is exactly the first command-line
argument that the cluster manager specifies when it invokes the
resource agent.
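
Agents typically branch on this variable to select the function to
run. A minimal sketch of such a dispatcher follows. The +foobar_*+
functions here are placeholders that only print their name, and the
fallback value 3 stands in for +$OCF_ERR_UNIMPLEMENTED+ so that the
fragment runs outside an agent.

```bash
: ${OCF_ERR_UNIMPLEMENTED=3}   # fallback so the sketch runs standalone

# Placeholder actions for illustration
foobar_start()   { echo "start"; }
foobar_stop()    { echo "stop"; }
foobar_monitor() { echo "monitor"; }

foobar_dispatch() {
    case "$1" in
        start)   foobar_start ;;
        stop)    foobar_stop ;;
        monitor) foobar_monitor ;;
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

In an agent, the call would be +foobar_dispatch "$__OCF_ACTION"+,
with the agent exiting with the dispatcher's return code.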

=== +$__SCRIPT_NAME+

The name of the resource agent. This is exactly the base name of the
resource agent script, with leading directory names removed.

=== +$HA_RSCTMP+

A temporary directory for use by resource agents. The system startup
sequence (on any LSB compliant Linux distribution) guarantees that
this directory is emptied on system startup, so this directory will
not contain any stale data after a node reboot.

== Convenience functions

=== Logging: +ocf_log+

Resource agents should use the +ocf_log+ function for logging
purposes. This convenient logging wrapper is invoked as follows:

[source,bash]
--------------------------------------------------------------------------
ocf_log <severity> "Log message"
--------------------------------------------------------------------------

It supports the following severity levels:

* +debug+ -- for debugging messages. Most logging configurations
  suppress this level by default.
* +info+ -- for informational messages about the agent's behavior or
  status.
* +warn+ -- for warnings. This is for any messages which reflect
  unexpected behavior that does _not_ constitute an unrecoverable
  error.
* +err+ -- for errors. As a general rule, this logging level should
  only be used immediately prior to an +exit+ with the appropriate
  error code.
* +crit+ -- for critical errors. As with +err+, this logging level
  should not be used unless the resource agent also exits with an
  error code. Very rarely used.

=== Testing for binaries: +have_binary+ and +check_binary+

A resource agent may need to test for the availability of a specific
executable. The +have_binary+ convenience function comes in handy
here:

[source,bash]
--------------------------------------------------------------------------
if ! have_binary frobnicate; then
   ocf_log warn "Missing frobnicate binary, frobnication disabled!"
fi
--------------------------------------------------------------------------

If a missing binary is a fatal problem for the resource, then the
+check_binary+ function should be used:

[source,bash]
--------------------------------------------------------------------------
check_binary frobnicate
--------------------------------------------------------------------------

Using +check_binary+ is a shorthand method for testing for the
existence (and executability) of the specified binary, and exiting
with +$OCF_ERR_INSTALLED+ if it cannot be found or executed.

NOTE: Both +have_binary+ and +check_binary+ honor +$PATH+ when the
binary to test for is not specified as a full path. It is usually wise
to _not_ test for a full path, as binary installation paths may vary
by distribution or user policy.

=== Executing commands and capturing their output: +ocf_run+

Whenever a resource agent needs to execute a command and capture its
output, it should use the +ocf_run+ convenience function, invoked as
in this example:

[source,bash]
--------------------------------------------------------------------------
ocf_run frobnicate --spam=eggs || exit $OCF_ERR_GENERIC
--------------------------------------------------------------------------

With the command specified above, the resource agent will invoke
+frobnicate --spam=eggs+ and capture its output and
exit code. If the exit code is nonzero (indicating an error),
+ocf_run+ logs the command output with the +err+ logging severity, and
the resource agent subsequently exits.  If the exit code is zero
(indicating success), any command output will be logged with the +info+
logging severity.

If the resource agent wishes to ignore the output of a successful
command execution, it can use the +-q+ flag with +ocf_run+. In the
example below, +ocf_run+ will only log output if the command exit code
is nonzero.

[source,bash]
--------------------------------------------------------------------------
ocf_run -q frobnicate --spam=eggs || exit $OCF_ERR_GENERIC
--------------------------------------------------------------------------

Finally, if the resource agent wants to log the output of a command
with a nonzero exit code with a severity _other_ than error, it may do
so by adding the +-info+ or +-warn+ option to +ocf_run+:

[source,bash]
--------------------------------------------------------------------------
ocf_run -warn frobnicate --spam=eggs
--------------------------------------------------------------------------

=== Locks: +ocf_take_lock+ and +ocf_release_lock_on_exit+

Occasionally, there may be different resources of the same type in a
cluster configuration that should not execute actions in
parallel. When a resource agent needs to guard against parallel
execution on the same machine, it can use the +ocf_take_lock+ and
+ocf_release_lock_on_exit+ convenience functions:

[source,bash]
--------------------------------------------------------------------------
LOCKFILE=${HA_RSCTMP}/foobar
ocf_release_lock_on_exit $LOCKFILE

foobar_start() {
    ...
    ocf_take_lock $LOCKFILE
    ...
}
--------------------------------------------------------------------------

+ocf_take_lock+ attempts to acquire the designated +$LOCKFILE+. When
it is unavailable, it sleeps a random amount of time between 0 and 1
seconds, and retries. +ocf_release_lock_on_exit+ releases the lock
file when the agent exits (for any reason).

=== Testing for numerical values: +ocf_is_decimal+

Specifically for parameter validation, it can be helpful to test
whether a given value is numeric. The +ocf_is_decimal+ function exists
for that purpose:

[source,bash]
--------------------------------------------------------------------------
foobar_validate_all() {
    if ! ocf_is_decimal $OCF_RESKEY_eggs; then
        ocf_log err "eggs is not numeric!"
        exit $OCF_ERR_CONFIGURED
    fi
    ...
}
--------------------------------------------------------------------------
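
Being numeric is often only half of the check, since many parameters
must also fall within a range. The sketch below combines both tests
for a port number. So that it runs outside an agent, it uses
+is_decimal_sketch+, a hypothetical stand-in that accepts only
non-empty strings of digits; a real agent would call +ocf_is_decimal+
instead. +foobar_is_valid_port+ is likewise a made-up name.

```bash
# Hypothetical stand-in for ocf_is_decimal: digits only, not empty
is_decimal_sketch() {
    case "$1" in
        ""|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Valid TCP/UDP ports are 1 through 65535
foobar_is_valid_port() {
    is_decimal_sketch "$1" || return 1
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}
```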

=== Testing for boolean values: +ocf_is_true+

When a resource agent defines a boolean parameter, the value
for this parameter may be specified by the user as +1+/+0+,
+true+/+false+, or +on+/+off+. Since it is tedious to test for all
these values from within the resource agent, the agent should instead
use the +ocf_is_true+ convenience function:

[source,bash]
--------------------------------------------------------------------------
if ocf_is_true $OCF_RESKEY_superfrobnicate; then
    ocf_run frobnicate --super
fi
--------------------------------------------------------------------------

NOTE: If +ocf_is_true+ is used against an empty or nonexistent
variable, it always returns an exit code of +1+, which is equivalent
to +false+.

=== Version comparison: +ocf_version_cmp+

A resource agent may want to check the version of software
installed. +ocf_version_cmp+ takes care of all the necessary
details.

The return codes are:

* +0+ -- the first version is smaller (earlier) than the second
* +1+ -- the two versions are equal
* +2+ -- the first version is greater (later) than the second
* +3+ -- one of the arguments is not recognized as a version string

The versions are allowed to contain digits, dots, and dashes.

[source,bash]
--------------------------------------------------------------------------
local v=`gooey --version`
ocf_version_cmp "$v" 12.0.8-1
case $? in
	0) ocf_log err "we do not support version $v, it is too old"
	   exit $OCF_ERR_INSTALLED
	;;
	[12]) ;; # we can work with versions >= 12.0.8-1
	3) ocf_log err "gooey produced version <$v>, too funky for me"
	   exit $OCF_ERR_INSTALLED
	;;
esac
--------------------------------------------------------------------------

=== Pseudo resources: +ha_pseudo_resource+

"Pseudo resources" are those where the resource agent does not
actually start or stop something akin to a runnable process, but
merely executes a single action and then needs some way of tracking
whether that action has been executed or not. The +portblock+ resource
agent is an example of this.

Resource agents for pseudo resources can use a convenience function,
+ha_pseudo_resource+, which makes use of _tracking files_ to keep tabs
on the status of a resource. If +foobar+ was designed to manage a
pseudo resource, then its +start+ action could look like this:

[source,bash]
--------------------------------------------------------------------------
foobar_start() {
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # if resource is already running, bail out early
    if foobar_monitor; then
        ocf_log info "Resource is already running"
        return $OCF_SUCCESS
    fi

    # start the pseudo resource
    ha_pseudo_resource ${OCF_RESOURCE_INSTANCE} start

    # After the resource has been started, check whether it started up
    # correctly. If the resource starts asynchronously, the agent may
    # spin on the monitor function here -- if the resource does not
    # start up within the defined timeout, the cluster manager will
    # consider the start action failed
    while ! foobar_monitor; do
        ocf_log debug "Resource has not started yet, waiting"
        sleep 1
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------
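
To illustrate the idea of tracking files, the sketch below
reimplements the concept in a few lines. It is not the library's
implementation: +pseudo_resource_sketch+ and +$SKETCH_STATEDIR+ are
hypothetical names, and the literal return values stand in for the
corresponding +$OCF_+ codes so that the fragment runs outside an
agent.

```bash
# Assumption: a scratch directory standing in for $HA_RSCTMP
SKETCH_STATEDIR="${SKETCH_STATEDIR:-${TMPDIR:-/tmp}}"

pseudo_resource_sketch() {
    local tracking_file="${SKETCH_STATEDIR}/$1"

    case "$2" in
        start)   touch "$tracking_file" ;;
        stop)    rm -f "$tracking_file" ;;
        monitor)
            # the file exists only while the resource is "running"
            if [ -f "$tracking_file" ]; then
                return 0    # stands in for $OCF_SUCCESS
            else
                return 7    # stands in for $OCF_NOT_RUNNING
            fi
            ;;
        *)       return 3 ;;  # stands in for $OCF_ERR_UNIMPLEMENTED
    esac
}
```

A +monitor+ or +stop+ function for the pseudo resource would
presumably pass the respective action name as the second argument, in
the same way that the +start+ function above passes +start+.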


== Conventions

This section contains a collection of conventions that have emerged in
the resource agent repositories over the years. Following these
conventions is by no means mandatory for resource agent authors, but
it is a good idea based on the
http://en.wikipedia.org/wiki/Principle_of_least_surprise[Principle of
Least Surprise] -- resource agents following these conventions will be
easier to understand, review, and use than those that do not.

=== Well-known parameter names

Several parameter names are supported by a number of resource
agents. For new resource agents, following these examples is generally
a good idea:

* +binary+ -- the name of a binary that principally manages the
  resource, such as a server daemon
* +config+ -- the full path to a configuration file
* +pid+ -- the full path to a file holding a process ID (PID)
* +log+ -- the full path to a log file
* +socket+ -- the full path to a UNIX socket that the resource manages
* +ip+ -- an IP address that a daemon binds to
* +port+ -- a TCP or UDP port that a daemon binds to

Needless to say, resource agents should only implement any of these
parameters if they are sensible to use in the agent's context.

=== Parameter defaults

Defaults for resource agent parameters should be set by initializing
variables with the suffix +_default+:

[source,bash]
--------------------------------------------------------------------------
# Defaults
OCF_RESKEY_superfrobnicate_default=0

: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
--------------------------------------------------------------------------

NOTE: The resource agent should make sure that it sets a default for
any parameter not marked as +required+ in the metadata.
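
The idiom above relies on the shell's default-assignment parameter
expansion, which assigns the default only when the variable is unset.
The following is a minimal sketch of the resulting behavior. It reuses
the +superfrobnicate+ parameter from the example; the helper
+effective_superfrobnicate+ is hypothetical and exists only for
illustration.

```shell
# Defaults
OCF_RESKEY_superfrobnicate_default=0

# Hypothetical helper for illustration: apply the default and print the
# effective value. The body runs in a subshell, so the assignment made
# by one call does not leak into the next.
effective_superfrobnicate() (
    : ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
    echo "superfrobnicate=${OCF_RESKEY_superfrobnicate}"
)

# Parameter not configured: the default applies
unset OCF_RESKEY_superfrobnicate
effective_superfrobnicate

# Parameter configured by the user: the default is ignored
OCF_RESKEY_superfrobnicate=1
effective_superfrobnicate
```

Because the expansion is written without a colon, a parameter that is
explicitly set to the empty string stays empty rather than receiving
the default.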


=== Honoring +PATH+ for binaries

When a resource agent supports a parameter designed to hold the name
of a binary (such as a daemon, or a client utility for querying
status), then that parameter should honor the +PATH+ environment
variable. Do not supply full paths. Thus, the following approach:

[source,bash]
--------------------------------------------------------------------------
# Good example -- do it this way
OCF_RESKEY_frobnicate_default="frobnicate"
: ${OCF_RESKEY_frobnicate="${OCF_RESKEY_frobnicate_default}"}
--------------------------------------------------------------------------

is much preferred over specifying a full path, as shown here:

[source,bash]
--------------------------------------------------------------------------
# Bad example -- avoid if you can
OCF_RESKEY_frobnicate_default="/usr/local/sbin/frobnicate"
: ${OCF_RESKEY_frobnicate="${OCF_RESKEY_frobnicate_default}"}
--------------------------------------------------------------------------

This rule holds for defaults, as well.
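
One way to verify such a parameter is to let the shell perform the
+PATH+ lookup. The sketch below uses only the POSIX +command -v+
builtin. The helper name +foobar_find_binary+ is hypothetical; a real
agent would more likely rely on the +check_binary+ or +have_binary+
functions from +ocf-shellfuncs+.

```shell
OCF_RESKEY_frobnicate_default="frobnicate"
: ${OCF_RESKEY_frobnicate="${OCF_RESKEY_frobnicate_default}"}

# Hypothetical helper: print the resolved location of a binary and
# return 0, or return non-zero if it cannot be found. A bare name is
# looked up in PATH; a full path set by the user is accepted as well.
foobar_find_binary() {
    command -v "$1" 2>/dev/null
}

if frobnicate_path=$(foobar_find_binary "$OCF_RESKEY_frobnicate"); then
    echo "frobnicate found at $frobnicate_path"
else
    echo "frobnicate not found in PATH"
fi
```

With this approach the default stays a bare name, while an
administrator can still point the parameter at a binary in a
non-standard location.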



== Special considerations

=== Licensing

Whenever possible, resource agent contributors are _encouraged_ to use
the GNU General Public License (GPL), version 2 or later, for any new
resource agents. The shell functions library does not strictly mandate
this, however, as it is licensed under the GNU Lesser General Public
License (LGPL), version 2.1 or later (so it can be used by non-GPL
agents).

The resource agent _must_ explicitly state its own license in the
agent source code.


=== Locale settings

When sourcing +ocf-shellfuncs+ as explained in <<_initialization>>,
any resource agent automatically sets +LANG+ and +LC_ALL+ to the +C+
locale. Resource agents can thus expect to always operate in the +C+
locale, and need not reset +LANG+ or any of the +LC_+ environment
variables themselves.


=== Testing for running processes

For testing whether a particular process (with a known process ID) is
currently running, a frequently found method is to send it a +0+
signal and catch errors, similar to this example:

[source,bash]
--------------------------------------------------------------------------
if kill -s 0 "$(cat "$daemon_pid_file")"; then
    ocf_log debug "Process is currently running"
else
    ocf_log warn "Process is dead, removing pid file"
    rm -f "$daemon_pid_file"
fi
--------------------------------------------------------------------------

IMPORTANT: An approach far superior to this example is to instead test
the _functionality_ of the daemon by connecting to it with a client
process, as shown in the example in
<<_literal_monitor_literal_action>>.
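
If an agent does fall back to a PID check, it helps to guard against a
missing or corrupt PID file before sending the signal. The following
is a minimal sketch under that assumption; the helper name
+foobar_pid_is_running+ is hypothetical.

```shell
# Hypothetical helper: succeed only if the PID file is readable,
# contains a plain number, and a process with that PID accepts
# signal 0.
foobar_pid_is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    kill -s 0 "$pid" 2>/dev/null
}
```

Note that a successful check only shows that _some_ process with that
PID exists and may be signalled. It says nothing about whether the
daemon actually works, which is why the functional test remains
preferable.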


=== Specifying a master preference

Stateful (master/slave) resources must set their own _master
preference_ -- they can thus provide hints to the cluster manager
about which instance is the best one to promote to the +Master+ role.

IMPORTANT: It is acceptable for multiple instances to have identical
positive master preferences. In that case, the cluster resource
manager will automatically select one of these instances to
promote. However, if _all_ instances have the (default) master score
of zero, the cluster manager will not promote any instance at
all. Thus, it is crucial that at least one instance has a positive
master score.

For this purpose, +crm_master+ comes in handy. This convenience
wrapper around +crm_attribute+ sets a node attribute named
+master-<<_literal_ocf_resource_instance_literal,$OCF_RESOURCE_INSTANCE>>+
for the node it is being executed on, and fills this attribute with
the specified value. The cluster manager is then expected to translate
this into a promotion score for the corresponding instance, and base
its promotion preference on that score.

Stateful resource agents typically execute +crm_master+ during the
<<_literal_monitor_literal_action,+monitor+>> and/or
<<_literal_notify_literal_action,+notify+>> action.

The following example assumes that the +foobar+ resource agent can
test the application's status by executing a binary that returns
certain exit codes based on whether

* the resource is either in the master role, or is a slave that is
  fully caught up with the master (at any rate, it has current data),
  or
* the resource is in the slave role, but through some form of
  asynchronous replication has "fallen behind" the master, or
* the resource has gracefully stopped, or
* the resource has unexpectedly failed.

[source,bash]
--------------------------------------------------------------------------
foobar_monitor() {
    local rc

    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    ocf_run frobnicate --test

    # This example assumes the following exit code convention
    # for frobnicate:
    # 0: running, and fully caught up with master
    # 1: gracefully stopped
    # 2: running, but lagging behind master
    # any other: error
    case "$?" in
	0)
            rc=$OCF_SUCCESS
	    ocf_log debug "Resource is running"
            # Set a high master preference. The current master
            # will always get this, plus 1. Any current slaves
            # will get a high preference so that if the master
            # fails, they are next in line to take over.
            crm_master -l reboot -v 100
            ;;
	1)
            rc=$OCF_NOT_RUNNING
	    ocf_log debug "Resource is not running"
            # Remove the master preference for this node
            crm_master -l reboot -D
	    ;;
        2)
            rc=$OCF_SUCCESS
            ocf_log debug "Resource is lagging behind master"
            # Set a low master preference: if the master fails
            # right now, and there is another slave that does
            # not lag behind the master, its higher master
            # preference will win and that slave will become
            # the new master
            crm_master -l reboot -v 5
            ;;
	*)
	    ocf_log err "Resource has failed"
	    exit $OCF_ERR_GENERIC
	    ;;
    esac

    return $rc
}
--------------------------------------------------------------------------
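
The mapping from probe result to master preference can also be kept in
a small helper, which makes it testable without a running cluster. The
sketch below assumes the same +frobnicate --test+ exit code convention
and the same scores as the example above; the helper name
+foobar_master_score+ is hypothetical.

```shell
# Hypothetical helper: map the exit code of "frobnicate --test"
# (convention as assumed in the monitor example) to a master
# preference. Prints the score, or nothing if the preference should be
# removed. Returns 1 for exit codes that indicate a failure.
foobar_master_score() {
    case "$1" in
        0) echo 100 ;;   # running and fully caught up
        1) ;;            # gracefully stopped: no preference
        2) echo 5 ;;     # running, but lagging behind
        *) return 1 ;;   # anything else is an error
    esac
}
```

A monitor function could then pass a non-empty result to
+crm_master -l reboot -v+, and call +crm_master -l reboot -D+ when the
result is empty.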


== Testing resource agents

This section discusses automated testing for resource agents. Testing
is a vital aspect of development; it is crucial both for creating new
resource agents, and for modifying existing ones.


=== Testing with +ocf-tester+

The resource agents repository (and hence, any installed resource
agents package) contains a utility named +ocf-tester+. This shell
script allows you to conveniently and easily test the functionality of
your resource agent.

+ocf-tester+ is commonly invoked, as +root+, like this:

--------------------------------------------------------------------------
ocf-tester -n <name> [-o <param>=<value> ... ] <resource agent>
--------------------------------------------------------------------------

* +<name>+ is an arbitrary resource name.

* You may set any number of +<param>=<value>+ with the +-o+ option,
  corresponding to any resource parameters you wish to set for
  testing.

* +<resource agent>+ is the full path to your resource agent.

When invoked, +ocf-tester+ executes all mandatory actions and enforces
action behavior as explained in <<_resource_agent_actions>>.

It also tests for optional actions. Optional actions must behave as
expected when advertised, but do not cause +ocf-tester+ to flag an
error if not implemented.

IMPORTANT: +ocf-tester+ does not initiate "dry runs" of actions, nor
does it create resource dummies of any kind. Instead, it exercises the
actual resource agent as-is, whether that may include opening and
closing databases, mounting file systems, starting or stopping virtual
machines, etc. Use with care.

For example, you could run +ocf-tester+ on the +foobar+ resource agent
as follows:

--------------------------------------------------------------------------
# ocf-tester -n foobartest \
             -o superfrobnicate=true \
             -o datadir=/tmp \
             /home/johndoe/ra-dev/foobar
Beginning tests for /home/johndoe/ra-dev/foobar...
* Your agent does not support the notify action (optional)
* Your agent does not support the reload action (optional)
/home/johndoe/ra-dev/foobar passed all tests
--------------------------------------------------------------------------

If the resource agent exhibits behavior that is hard to understand,
which is common for newly developed software, use the +-v+ and +-d+
options to produce more output. If that does not help, instruct
+ocf-tester+ to trace the resource agent with +-X+ (make sure to
redirect output to a file, unless you are a really fast reader).
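
A traced run with the +-v+, +-d+ and +-X+ options can be wrapped in a
small function that always saves the output to a file. This is only a
sketch: the function name +foobar_trace_tests+ and the +OCF_TESTER+
variable are hypothetical and not part of +ocf-tester+.

```shell
# Hypothetical wrapper around ocf-tester: run the tests for one agent
# with verbose, debug and trace output, and save everything to a file.
# OCF_TESTER is an assumption of this sketch; it lets you substitute
# another command (for example "echo") for a dry run.
foobar_trace_tests() {
    name=$1
    agent=$2
    logfile=$3
    "${OCF_TESTER:-ocf-tester}" -v -d -X -n "$name" "$agent" \
        >"$logfile" 2>&1
}

# Example invocation (as root), using the paths from the example above:
# foobar_trace_tests foobartest /home/johndoe/ra-dev/foobar /tmp/foobar-trace.log
```

The function returns the exit status of +ocf-tester+, so it can be
used directly in a condition.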

=== Testing with +ocft+

+ocft+ is a testing tool for resource agents. The main difference
to +ocf-tester+ is that +ocft+ can automate creating complex
testing environments. That includes package installation and
arbitrary shell scripting.

==== +ocft+ components

+ocft+ consists of the following components:

* A test case generator (+/usr/sbin/ocft+) -- generates shell
  scripts from test case configuration files

* Configuration files (+/usr/share/resource-agents/ocft/configs/+) --
  a configuration file contains environment setup and test cases
  for one resource agent

* The testing scripts are stored in +/var/lib/resource-agents/ocft/cases/+,
  but normally there is no need to inspect them

==== Customizing the testing environment

+ocft+ modifies the runtime environment of the resource agent
either by changing environment variables (through the interface
defined by OCF) or by running ad-hoc shell scripts which can for
instance change permissions of a file or unmount a file system.

==== How to test

You need to know the software (resource) you want to test. Draw a
sketch of all interesting scenarios, with all expected and
unexpected conditions and how the resource agent should react to
them. Then you need to encode these conditions and the expected
outcomes as +ocft+ test cases. Running +ocft+ is then simple:

---------------------------------------
# ocft make <RA>
# ocft test <RA>
---------------------------------------

The first subcommand generates the scripts for your test cases
whereas the second runs them and checks the outcome.

==== +ocft+ configuration file syntax

There are four top level options each of which can contain
one or more sub-options.

===== +CONFIG+ (top level option)

This option is global and influences every test case.

  ** +AgentRoot+ (sub-option)
---------------------------------------
AgentRoot /usr/lib/ocf/resource.d/xxx
---------------------------------------

Normally, we assume that the resource agent lives under the
+heartbeat+ provider. Use +AgentRoot+ to test an agent that is
distributed by another vendor.

  ** +InstallPackage+ (sub-option)
---------------------------------------
InstallPackage package [package2 [...]]
---------------------------------------

Install packages necessary for testing. The installation is
skipped if the packages have already been installed.

  ** +HangTimeout+ (sub-option)
---------------------------------------
HangTimeout secs
---------------------------------------

The maximum time, in seconds, allowed for a single RA action. If this
timer expires, the action is considered failed.

===== +SETUP-AGENT+ (top level option)
---------------------------------------
SETUP-AGENT
  bash commands
---------------------------------------

If the RA needs to be initialized before testing, you can put
bash code here for that purpose. The initialization is done only
once. If you need to reinitialize then delete the
+/tmp/.[AGENT_NAME]_set+ stamp file.

===== +CASE+ (top level option)
---------------------------------------
CASE "description"
---------------------------------------

This is the main building block of the test suite. Each test
case is to be described in one +CASE+ top level option.

One case consists of several suboptions typically followed by the
+RunAgent+ suboption.

  ** +Var+ (sub-option)
---------------------------------------
Var VARIABLE=value
---------------------------------------

Sets an environment variable for the resource agent. These variables
usually take the form OCF_RESKEY_xxx. Note that there must be no
whitespace on either side of the "=".

  ** +Unvar+ (sub-option)
---------------------------------------
Unvar VARIABLE [VARIABLE2 [...]]
---------------------------------------

Removes one or more environment variables.

  ** +Include+ (sub-option)
---------------------------------------
Include macro_name
---------------------------------------

Includes the statements defined in the macro +macro_name+. See below
for a description of +CASE-BLOCK+.

  ** +Bash+ (sub-option)
---------------------------------------
Bash bash_codes
---------------------------------------

This option sets up the operating system environment: you can insert
arbitrary bash code here to customize the system. Take care not to
cause unrecoverable damage to the system.

  ** +BashAtExit+ (sub-option)
---------------------------------------
BashAtExit bash_codes
---------------------------------------

This option restores the operating system environment so that the next
test case can run correctly. Of course, you can also use the +Bash+
option for this. However, if an error occurs during the test case, the
script quits immediately and never reaches your recovery code. Code
given with +BashAtExit+ still runs in that situation, restoring the
system environment before the script quits.

  ** +RunAgent+ (sub-option)
---------------------------------------
RunAgent cmd [ret_value]
---------------------------------------

This option runs the resource agent. +cmd+ is the action to pass to
the resource agent, such as +start+, +status+, or +stop+. The second
parameter is optional: if given, the script compares the value
actually returned by the resource agent with this expected value. A
mismatch indicates a bug.

It is also possible to execute a suboption on a remote host
instead of locally. The protocol used is ssh and the command is
run in the background. Just add the +@<ipaddr>+ suffix to the
suboption name. For instance:

---------------------------------------
Bash@192.168.1.100 date
---------------------------------------

would run the +date+ program on the host +192.168.1.100+.

NB: It is not clear how ssh can be automated, as we do not know the
environment in advance. Perhaps use "well-known" host names such
as "node2"? Also, if the command runs in the background, it is not
clear how the exit code is checked. Finally, does Var@node make
sense? Or is the current environment somehow copied over? We
probably need an example here.

Need examples in general.

===== +CASE-BLOCK+ (top level option)
---------------------------------------
CASE-BLOCK macro_name
---------------------------------------

The +CASE-BLOCK+ option defines a macro which can be +Include+d
in any +CASE+. All +CASE+ suboptions are valid in +CASE-BLOCK+.


== Installing and packaging resource agents

This section discusses what to do with your resource agent once it is
done and tested -- where to install it, and how to include it in either
your own application package or in the Linux-HA resource agents
repository.

=== Installing resource agents

If you choose to include your resource agent in your own project, make
sure it installs into the correct location. Resource agents should
install into the +/usr/lib/ocf/resource.d/<provider>+ directory, where
+<provider>+ is the name of your project or any other name you wish to
identify the resource agent with.

For example, if your +foobar+ resource agent is being packaged as part
of a project named +fortytwo+, then the correct full path to your
resource agent would be
+/usr/lib/ocf/resource.d/fortytwo/foobar+. Make sure your resource
agent installs with +0755+ (+-rwxr-xr-x+) permission bits.

When installed this way, OCF-compliant cluster resource managers will
be able to properly identify, parse, and execute your resource
agent. The Pacemaker cluster manager, for example, would map the
above-mentioned installation path to the +ocf:fortytwo:foobar+
resource type identifier.
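
The installation step and the path-to-identifier mapping can be
sketched in shell as follows. Both helper names are hypothetical, and
the optional staging root argument is an assumption borrowed from
common packaging practice.

```shell
# Hypothetical helper: install an agent below an optional staging root
# with the recommended 0755 permission bits.
# Usage: foobar_install_agent <agent file> <provider> [<staging root>]
foobar_install_agent() {
    target_dir="${3:-}/usr/lib/ocf/resource.d/$2"
    install -d "$target_dir" &&
    install -m 0755 "$1" "$target_dir/"
}

# Hypothetical helper: derive the resource type identifier from an
# installation path such as /usr/lib/ocf/resource.d/fortytwo/foobar
foobar_type_id() {
    agent=${1##*/}        # last path component: agent name
    dir=${1%/*}           # everything before it
    provider=${dir##*/}   # last component of that: provider name
    echo "ocf:${provider}:${agent}"
}
```

For example, +foobar_install_agent foobar fortytwo+ run as root would
install into +/usr/lib/ocf/resource.d/fortytwo+, and +foobar_type_id+
applied to the resulting path prints +ocf:fortytwo:foobar+.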

=== Packaging resource agents

When you package resource agents as part of your own project, you
should apply the considerations outlined in this section.

NOTE: If you instead prefer to submit your resource agent to the
Linux-HA resource agents repository, see
<<_submitting_resource_agents>> for information on doing so.

==== RPM packaging

It is recommended to put your OCF resource agent(s) in an RPM
sub-package, with the name +<toppackage>-resource-agents+. Ensure that
the package owns its provider directory, and depends on the upstream
+resource-agents+ package which lays out the directory hierarchy and
provides convenience shell functions. An example RPM spec snippet is
given below:

--------------------------------------------------------------------------
%package resource-agents
Summary: OCF resource agent for Foobar
Group: System Environment/Base
Requires: %{name} = %{version}-%{release}, resource-agents

%description resource-agents
This package contains the OCF-compliant resource agents for Foobar.

%files resource-agents
%defattr(755,root,root,-)
%dir %{_prefix}/lib/ocf/resource.d/fortytwo
%{_prefix}/lib/ocf/resource.d/fortytwo/foobar
--------------------------------------------------------------------------

NOTE: If an RPM spec file contains a +%package+ declaration, then RPM
considers this a sub-package which inherits top-level fields such as
+Name+, +Version+, +License+, etc. Sub-packages have the top-level
package name automatically prepended to their own name. Thus the snippet
above would create a sub-package named +foobar-resource-agents+
(presuming the package +Name+ is +foobar+).

==== Debian packaging

For Debian packages, like for <<_rpm_packaging,RPMs>>, it is
recommended to create a separate package holding your resource agents,
which then should depend on the +cluster-agents+ package.

NOTE: This section assumes that you are packaging with +debhelper+.

An example +debian/control+ snippet is given below:

--------------------------------------------------------------------------
Package: foobar-cluster-agents
Priority: extra
Architecture: all
Depends: cluster-agents
Description: OCF-compliant resource agents for Foobar
--------------------------------------------------------------------------

You will also create a separate +.install+ file, named after the
package declared in +debian/control+. Sticking with the example of
installing the +foobar+ resource agent under the +fortytwo+ provider,
the +debian/foobar-cluster-agents.install+ file could consist of the
following content:

--------------------------------------------------------------------------
usr/lib/ocf/resource.d/fortytwo/foobar
--------------------------------------------------------------------------

=== Submitting resource agents

If you choose not to bundle your resource agent with your own package,
but instead wish to submit it to the upstream resource agent
repository hosted on
https://github.com/ClusterLabs/resource-agents[the ClusterLabs
repository on GitHub], please follow the steps outlined in this section.

Create a fork of the
https://github.com/ClusterLabs/resource-agents[upstream repository] and
clone it with the following commands:

--------------------------------------------------------------------------
git clone git@github.com:<your-username>/resource-agents.git
cd resource-agents
git remote add upstream git@github.com:ClusterLabs/resource-agents.git
git checkout -b <new-branch>
--------------------------------------------------------------------------

Then, copy your resource agent into the +heartbeat+ subdirectory:
--------------------------------------------------------------------------
cd heartbeat
cp /path/to/your/local/copy/of/foobar .
chmod 0755 foobar
cd ..
--------------------------------------------------------------------------

Next, modify the +Makefile.am+ file in +resource-agents/heartbeat+ and
add your new resource agent to the +ocf_SCRIPTS+ list. This will make
sure the agent is properly installed.

Lastly, open +Makefile.am+ in +resource-agents/doc/man+ and add
+ocf_heartbeat_<name>.7+ to the +man_MANS+ variable. This will
automatically generate a resource agent manual page from its metadata,
and then install that man page into the correct location.

Now, add your new resource agent, and the two modifications to the
Makefiles, to your changeset:

--------------------------------------------------------------------------
git add heartbeat/foobar
git add heartbeat/Makefile.am
git add doc/man/Makefile.am
git commit
--------------------------------------------------------------------------

In your commit message, be sure to include a meaningful description,
for example:
--------------------------------------------------------------------------
High: foobar: new resource agent

This new resource agent adds functionality to manage a foobar service.
It supports being configured as a primitive or as a master/slave set,
and also optionally supports superfrobnication.
--------------------------------------------------------------------------

Now push the patch set to GitHub:
--------------------------------------------------------------------------
git push -u origin <new-branch>
--------------------------------------------------------------------------

Create a pull request (PR) on GitHub, which will be reviewed by the
upstream developers.

Once your new resource agent has been accepted for merging, one of the
upstream developers will merge the pull request into the upstream
repository. At that point, you can update your main branch from
upstream, and remove your own branch.

--------------------------------------------------------------------------
git checkout main
git fetch upstream
git merge upstream/main
git branch -D <new-branch>
--------------------------------------------------------------------------

=== Maintaining resource agents

If you maintain a specific resource agent, or you are making repeated
contributions to the codebase, it's usually a good idea to maintain
your own _fork_ of the +ClusterLabs/resource-agents+ repository on
GitHub.

To do so,

* https://github.com/signup[Create a GitHub account] if you do not
  have one already.
* http://help.github.com/fork-a-repo/[Fork] the
  https://github.com/ClusterLabs/resource-agents[+resource-agents+
  repository].
* Clone your personal fork into a local working copy.

As you work on resource agents, *please* commit early, and commit
often. You can always fold commits later with +git rebase -i+.

Once you have made a number of changes that you would like others to
review, push them to your GitHub fork and send a post to the
+linux-ha-dev+ mailing list pointing people to it.

After the review is done, fix up your tree with any requested changes,
and then issue a pull request. There are two ways of doing so:

* You can use the +git request-pull+ utility to get a pre-populated
  email skeleton summarizing your changesets. Add any information you
  see fit, and send it to the list. It is a good idea to prefix your
  email subject with +[GIT PULL]+ so upstream maintainers can pick the
  message out easily.

* You can also issue a pull request directly on GitHub. GitHub
  automatically notifies upstream maintainers about new pull requests
  by email. Please refer to
  http://help.github.com/send-pull-requests/[github:help] for details
  on initiating pull requests.