.. index::
single: fencing
single: STONITH
.. _fencing:
Fencing
-------
What Is Fencing?
################
*Fencing* is the ability to make a node unable to run resources, even when that
node is unresponsive to cluster commands.
Fencing is also known as *STONITH*, an acronym for "Shoot The Other Node In The
Head", since the most common fencing method is cutting power to the node.
Another method is "fabric fencing", cutting the node's access to some
capability required to run resources (such as network access or a shared disk).
.. index::
single: fencing; why necessary
Why Is Fencing Necessary?
#########################
Fencing protects your data from being corrupted by malfunctioning nodes or
unintentional concurrent access to shared resources.
Fencing protects against the "split brain" failure scenario, where cluster
nodes have lost the ability to reliably communicate with each other but are
still able to run resources. If the cluster just assumed that uncommunicative
nodes were down, then multiple instances of a resource could be started on
different nodes.
The effect of split brain depends on the resource type. For example, an IP
address brought up on two hosts on a network will cause packets to randomly be
sent to one or the other host, rendering the IP useless. For a database or
clustered file system, the effect could be much more severe, causing data
corruption or divergence.
Fencing is also used when a resource cannot otherwise be stopped. If a
resource fails to stop on a node, it cannot be started on a different node
without risking the same type of conflict as split-brain. Fencing the
original node ensures the resource can be safely started elsewhere.
Users may also configure the ``on-fail`` property of :ref:`operation` or the
``loss-policy`` property of
:ref:`ticket constraints <ticket-constraints>` to ``fence``, in which
case the cluster will fence the resource's node if the operation fails or the
ticket is lost.
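For example, a resource could ask the cluster to fence its node whenever a
stop fails. A minimal sketch (the resource, agent, and IDs here are
placeholders, not from this document):

.. code-block:: xml

   <primitive id="my-db" class="ocf" provider="heartbeat" type="Dummy">
     <operations>
       <!-- If this resource cannot be stopped cleanly, fence its node -->
       <op id="my-db-stop" name="stop" interval="0" timeout="60s" on-fail="fence"/>
     </operations>
   </primitive>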
.. index::
single: fencing; device
Fence Devices
#############
A *fence device* or *fencing device* is a special type of resource that
provides the means to fence a node.
Examples of fencing devices include intelligent power switches and IPMI
controllers that accept commands to cut power to a node, and iSCSI controllers
that allow SCSI reservations to be used to cut a node's access to a shared disk.
Since fencing devices will be used to recover from loss of networking
connectivity to other nodes, it is essential that they do not rely on the same
network as the cluster itself, otherwise that network becomes a single point of
failure.
Since loss of a node due to power outage is indistinguishable from loss of
network connectivity to that node, it is also essential that at least one fence
device for a node does not share power with that node. For example, an on-board
IPMI controller that shares power with its host should not be used as the sole
fencing device for that host.
Since fencing is used to isolate malfunctioning nodes, no fence device should
rely on its target functioning properly. This includes, for example, devices
that ssh into a node and issue a shutdown command (such devices might be
suitable for testing, but never for production).
.. index::
single: fencing; agent
Fence Agents
############
A *fence agent* or *fencing agent* is a ``stonith``-class resource agent.
The fence agent standard provides commands (such as ``off`` and ``reboot``)
that the cluster can use to fence nodes. As with other resource agent classes,
this allows a layer of abstraction so that Pacemaker doesn't need any knowledge
about specific fencing technologies -- that knowledge is isolated in the agent.
Pacemaker supports two fence agent standards, both inherited from
no-longer-active projects:
* Red Hat Cluster Suite (RHCS) style: These are typically installed in
``/usr/sbin`` with names starting with ``fence_``.
* Linux-HA style: These typically have names starting with ``external/``.
Pacemaker can support these agents using the **fence_legacy** RHCS-style
agent as a wrapper, *if* support was enabled when Pacemaker was built, which
requires the ``cluster-glue`` library.
When a Fence Device Can Be Used
###############################
Fencing devices do not actually "run" like most services. Typically, they just
provide an interface for sending commands to an external device.
Additionally, fencing may be initiated by Pacemaker, by other cluster-aware
software such as DRBD or DLM, or manually by an administrator, at any point in
the cluster life cycle, including before any resources have been started.
To accommodate this, Pacemaker does not require the fence device resource to be
"started" in order to be used. Whether a fence device is started or not
determines whether a node runs any recurring monitor for the device, and gives
the node a slight preference for being chosen to execute fencing using that
device.
By default, any node can execute any fencing device. If a fence device is
disabled by setting its ``target-role`` to ``Stopped``, then no node can use
that device. If a location constraint with a negative score prevents a specific
node from "running" a fence device, then that node will never be chosen to
execute fencing using the device. A node may fence itself, but the cluster will
choose that only if no other nodes can do the fencing.
A common configuration scenario is to have one fence device per target node.
In such a case, users often configure anti-location constraints so that
the target node does not monitor its own device.
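A sketch of such an anti-location constraint, assuming a fence device named
``Fencing-pcmk-1`` that targets node ``pcmk-1`` (both names are placeholders):

.. code-block:: xml

   <!-- Never let pcmk-1 run (and therefore monitor) its own fence device -->
   <rsc_location id="Fencing-pcmk-1-avoids-pcmk-1" rsc="Fencing-pcmk-1"
                 node="pcmk-1" score="-INFINITY"/>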
Limitations of Fencing Resources
################################
Fencing resources have certain limitations that other resource classes don't:
* They may have only one set of meta-attributes and one set of instance
attributes.
* If :ref:`rules` are used to determine fencing resource options, these
might be evaluated only when first read, meaning that later changes to the
rules will have no effect. Therefore, it is better to avoid confusion and not
use rules at all with fencing resources.
These limitations could be revisited if there is sufficient user demand.
.. index::
single: fencing; special instance attributes
.. _fencing-attributes:
Special Meta-Attributes for Fencing Resources
#############################################
The table below lists special resource meta-attributes that may be set for any
fencing resource.
.. table:: **Additional Properties of Fencing Resources**
:widths: 2 1 2 4
+----------------------+---------+--------------------+----------------------------------------+
| Field | Type | Default | Description |
+======================+=========+====================+========================================+
| provides | string | | .. index:: |
| | | | single: provides |
| | | | |
| | | | Any special capability provided by the |
| | | | fence device. Currently, only one such |
| | | | capability is meaningful: |
| | | | :ref:`unfencing <unfencing>`. |
+----------------------+---------+--------------------+----------------------------------------+
Special Instance Attributes for Fencing Resources
#################################################
The table below lists special instance attributes that may be set for any
fencing resource (*not* meta-attributes, even though they are interpreted by
Pacemaker rather than the fence agent). These are also listed in the man page
for ``pacemaker-fenced``.
.. Not_Yet_Implemented:
+----------------------+---------+--------------------+----------------------------------------+
| priority | integer | 0 | .. index:: |
| | | | single: priority |
| | | | |
| | | | The priority of the fence device. |
| | | | Devices are tried in order of highest |
| | | | priority to lowest. |
+----------------------+---------+--------------------+----------------------------------------+
.. table:: **Additional Instance Attributes of Fencing Resources**
:class: longtable
:widths: 2 1 2 4
+----------------------+---------+--------------------+----------------------------------------+
| Field | Type | Default | Description |
+======================+=========+====================+========================================+
| stonith-timeout | time | | .. index:: |
| | | | single: stonith-timeout |
| | | | |
| | | | This is not used by Pacemaker (see the |
| | | | ``pcmk_reboot_timeout``, |
| | | | ``pcmk_off_timeout``, etc. properties |
| | | | instead), but it may be used by |
| | | | Linux-HA fence agents. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_host_map | string | | .. index:: |
| | | | single: pcmk_host_map |
| | | | |
| | | | A mapping of node names to ports |
| | | | for devices that do not understand |
| | | | the node names. |
| | | | |
| | | | Example: ``node1:1;node2:2,3`` tells |
| | | | the cluster to use port 1 for |
| | | | ``node1`` and ports 2 and 3 for |
| | | | ``node2``. If ``pcmk_host_check`` is |
| | | | explicitly set to ``static-list``, |
| | | | either this or ``pcmk_host_list`` must |
| | | | be set. The port portion of the map |
| | | | may contain special characters such as |
| | | | spaces if preceded by a backslash |
| | | | *(since 2.1.2)*. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_host_list | string | | .. index:: |
| | | | single: pcmk_host_list |
| | | | |
| | | | A list of machines controlled by this |
| | | | device. If ``pcmk_host_check`` is |
| | | | explicitly set to ``static-list``, |
| | | | either this or ``pcmk_host_map`` must |
| | | | be set. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_host_check | string | Value appropriate | .. index:: |
| | | to other | single: pcmk_host_check |
| | | parameters (see | |
| | | "Default Check | The method Pacemaker should use to |
| | | Type" below) | determine which nodes can be targeted |
| | | | by this device. Allowed values: |
| | | | |
| | | | * ``static-list:`` targets are listed |
| | | | in the ``pcmk_host_list`` or |
| | | | ``pcmk_host_map`` attribute |
| | | | * ``dynamic-list:`` query the device |
| | | | via the agent's ``list`` action |
| | | | * ``status:`` query the device via the |
| | | | agent's ``status`` action |
| | | | * ``none:`` assume the device can |
| | | | fence any node |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_delay_max | time | 0s | .. index:: |
| | | | single: pcmk_delay_max |
| | | | |
| | | | Enable a delay of no more than the |
| | | | time specified before executing |
| | | | fencing actions. Pacemaker derives the |
| | | | overall delay by taking the value of |
| | | | pcmk_delay_base and adding a random |
| | | | delay value such that the sum is kept |
| | | | below this maximum. This is sometimes |
| | | | used in two-node clusters to ensure |
| | | | that the nodes don't fence each other |
| | | | at the same time. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_delay_base | time | 0s | .. index:: |
| | | | single: pcmk_delay_base |
| | | | |
| | | | Enable a static delay before executing |
| | | | fencing actions. This can be used, for |
| | | | example, in two-node clusters to |
| | | | ensure that the nodes don't fence each |
| | | | other, by having separate fencing |
| | | | resources with different values. The |
| | | | node that is fenced with the shorter |
| | | | delay will lose a fencing race. The |
| | | | overall delay introduced by pacemaker |
| | | | is derived from this value plus a |
| | | | random delay such that the sum is kept |
| | | | below the maximum delay. A single |
| | | | device can have different delays per |
| | | | node using a host map *(since 2.1.2)*, |
| | | | for example ``node1:0s;node2:5s.`` |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_action_limit | integer | 1 | .. index:: |
| | | | single: pcmk_action_limit |
| | | | |
| | | | The maximum number of actions that can |
| | | | be performed in parallel on this |
| | | | device. A value of -1 means unlimited. |
| | | | Node fencing actions initiated by the |
| | | | cluster (as opposed to an administrator|
| | | | running the ``stonith_admin`` tool or |
| | | | the fencer running recurring device |
| | | | monitors and ``status`` and ``list`` |
| | | | commands) are additionally subject to |
| | | | the ``concurrent-fencing`` cluster |
| | | | property. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_host_argument | string | ``port`` otherwise | .. index:: |
| | | ``plug`` if | single: pcmk_host_argument |
| | | supported | |
| | | according to the | *Advanced use only.* Which parameter |
| | | metadata of the | should be supplied to the fence agent |
| | | fence agent | to identify the node to be fenced. |
| | | | Some devices support neither the |
| | | | standard ``plug`` nor the deprecated |
| | | | ``port`` parameter, or may provide |
| | | | additional ones. Use this to specify |
| | | | an alternate, device-specific |
| | | | parameter. A value of ``none`` tells |
| | | | the cluster not to supply any |
| | | | additional parameters. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_reboot_action | string | reboot | .. index:: |
| | | | single: pcmk_reboot_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | reboot a node. Some devices do not |
| | | | support the standard commands or may |
| | | | provide additional ones. Use this to |
| | | | specify an alternate, device-specific |
| | | | command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_reboot_timeout | time | 60s | .. index:: |
| | | | single: pcmk_reboot_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``reboot`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_reboot_retries | integer | 2 | .. index:: |
| | | | single: pcmk_reboot_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``reboot`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_off_action | string | off | .. index:: |
| | | | single: pcmk_off_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | shut down a node. Some devices do not |
| | | | support the standard commands or may |
| | | | provide additional ones. Use this to |
| | | | specify an alternate, device-specific |
| | | | command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_off_timeout | time | 60s | .. index:: |
| | | | single: pcmk_off_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``off`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_off_retries | integer | 2 | .. index:: |
| | | | single: pcmk_off_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``off`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_list_action | string | list | .. index:: |
| | | | single: pcmk_list_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | list nodes. Some devices do not |
| | | | support the standard commands or may |
| | | | provide additional ones. Use this to |
| | | | specify an alternate, device-specific |
| | | | command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_list_timeout | time | 60s | .. index:: |
| | | | single: pcmk_list_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``list`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_list_retries | integer | 2 | .. index:: |
| | | | single: pcmk_list_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``list`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_monitor_action | string | monitor | .. index:: |
| | | | single: pcmk_monitor_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | report extended status. Some devices do|
| | | | not support the standard commands or |
| | | | may provide additional ones. Use this |
| | | | to specify an alternate, |
| | | | device-specific command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_monitor_timeout | time | 60s | .. index:: |
| | | | single: pcmk_monitor_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``monitor`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_monitor_retries | integer | 2 | .. index:: |
| | | | single: pcmk_monitor_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``monitor`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_status_action | string | status | .. index:: |
| | | | single: pcmk_status_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | report status. Some devices do |
| | | | not support the standard commands or |
| | | | may provide additional ones. Use this |
| | | | to specify an alternate, |
| | | | device-specific command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_status_timeout | time | 60s | .. index:: |
| | | | single: pcmk_status_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``status`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_status_retries | integer | 2 | .. index:: |
| | | | single: pcmk_status_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``status`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
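For instance, ``pcmk_delay_base`` can give each node a different static delay
within a single device, using the host-map syntax shown in the table above. A
sketch (the nvpair ID is a placeholder, and the nvpair belongs in the device's
instance attributes):

.. code-block:: xml

   <!-- Fence node1 immediately, but wait 5 seconds before fencing node2 -->
   <nvpair id="Fencing-delay-base" name="pcmk_delay_base" value="node1:0s;node2:5s"/>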
Default Check Type
##################
If the user does not explicitly configure ``pcmk_host_check`` for a fence
device, a default value appropriate to other configured parameters will be
used:
* If either ``pcmk_host_list`` or ``pcmk_host_map`` is configured,
``static-list`` will be used;
* otherwise, if the fence device supports the ``list`` action, and the first
attempt at using ``list`` succeeds, ``dynamic-list`` will be used;
* otherwise, if the fence device supports the ``status`` action, ``status``
will be used;
* otherwise, ``none`` will be used.
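To pin the behavior rather than rely on these defaults, ``pcmk_host_check``
can be set explicitly in the device's instance attributes. A sketch (the
nvpair ID is a placeholder):

.. code-block:: xml

   <nvpair id="Fencing-host-check" name="pcmk_host_check" value="static-list"/>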
.. index::
single: unfencing
single: fencing; unfencing
.. _unfencing:
Unfencing
#########
With fabric fencing (such as cutting network or shared disk access rather than
power), it is expected that the cluster will fence the node, and then a system
administrator must manually investigate what went wrong, correct any issues
found, then reboot (or restart the cluster services on) the node.
Once the node reboots and rejoins the cluster, some fabric fencing devices
require an explicit command to restore the node's access. This capability is
called *unfencing* and is typically implemented as the fence agent's ``on``
command.
If any cluster resource has ``requires`` set to ``unfencing``, then that
resource will not be probed or started on a node until that node has been
unfenced.
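As a sketch of how the two settings fit together (assuming a fabric fencing
agent such as ``fence_scsi``; the resource names are placeholders and agent
parameters are omitted), the fence device advertises the capability with the
``provides`` meta-attribute, and a protected resource opts in with
``requires``:

.. code-block:: xml

   <primitive id="fence-disk" class="stonith" type="fence_scsi">
     <meta_attributes id="fence-disk-meta">
       <!-- This device requires unfencing (the agent's "on" command) -->
       <nvpair id="fence-disk-provides" name="provides" value="unfencing"/>
     </meta_attributes>
   </primitive>
   <primitive id="shared-fs" class="ocf" provider="heartbeat" type="Filesystem">
     <!-- Filesystem parameters omitted for brevity -->
     <meta_attributes id="shared-fs-meta">
       <!-- Never probe or start this resource on a node until it is unfenced -->
       <nvpair id="shared-fs-requires" name="requires" value="unfencing"/>
     </meta_attributes>
   </primitive>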
Fencing and Quorum
##################
In general, a cluster partition may execute fencing only if the partition has
quorum, and the ``stonith-enabled`` cluster property is set to true. However,
there are exceptions:
* The requirements apply only to fencing initiated by Pacemaker. If an
administrator initiates fencing using the ``stonith_admin`` command, or an
external application such as DLM initiates fencing using Pacemaker's C API,
the requirements do not apply.
* A cluster partition without quorum is allowed to fence any active member of
that partition. As a corollary, this allows a ``no-quorum-policy`` of
``suicide`` to work.
* If the ``no-quorum-policy`` cluster property is set to ``ignore``, then
quorum is not required to execute fencing of any node.
Fencing Timeouts
################
Fencing timeouts are complicated, since a single fencing operation can involve
many steps, each of which may have a separate timeout.
Fencing may be initiated in one of several ways:
* An administrator may initiate fencing using the ``stonith_admin`` tool,
which has a ``--timeout`` option (defaulting to 2 minutes) that will be used
as the fence operation timeout.
* An external application such as DLM may initiate fencing using the Pacemaker
C API. The application will specify the fence operation timeout in this case,
which might or might not be configurable by the user.
* The cluster may initiate fencing itself. In this case, the
``stonith-timeout`` cluster property (defaulting to 1 minute) will be used as
the fence operation timeout.
However fencing is initiated, the initiator contacts Pacemaker's fencer
(``pacemaker-fenced``) to request fencing. This connection and request has its
own timeout, separate from the fencing operation timeout, but usually happens
very quickly.
The fencer will contact all fencers in the cluster to ask what devices they
have available to fence the target node. The fence operation timeout will be
used as the timeout for each of these queries.
Once a fencing device has been selected, the fencer will check whether any
action-specific timeout has been configured for the device, to use instead of
the fence operation timeout. For example, if ``stonith-timeout`` is 60 seconds,
but the fencing device has ``pcmk_reboot_timeout`` configured as 90 seconds,
then a timeout of 90 seconds will be used for reboot actions using that device.
A device may have retries configured, in which case the timeout applies across
all attempts. For example, if a device has ``pcmk_reboot_retries`` configured
as 2, and the first reboot attempt fails, the second attempt will only have
whatever time is remaining in the action timeout after subtracting how much
time the first attempt used. This means that if the first attempt fails due to
using the entire timeout, no further attempts will be made. There is currently
no way to configure a per-attempt timeout.
If more than one device is required to fence a target, whether due to failure
of the first device or a fencing topology with multiple devices configured for
the target, each device will have its own separate action timeout.
For all of the above timeouts, the fencer will generally multiply the
configured value by 1.2 to get an actual value to use, to account for time
needed by the fencer's own processing (for example, a configured 60-second
timeout results in an effective 72-second timeout).
Separate from the fencer's timeouts, some fence agents have internal timeouts
for individual steps of their fencing process. These agents often have
parameters to configure these timeouts, such as ``login-timeout``,
``shell-timeout``, or ``power-timeout``. Many such agents also have a
``disable-timeout`` parameter to ignore their internal timeouts and just let
Pacemaker handle the timeout. This causes a difference in retry behavior.
If ``disable-timeout`` is not set, and the agent hits one of its internal
timeouts, it will report that as a failure to Pacemaker, which can then retry.
If ``disable-timeout`` is set, and Pacemaker hits a timeout for the agent, then
there will be no time remaining, and no retry will be done.
Fence Devices Dependent on Other Resources
##########################################
In some cases, a fence device may require some other cluster resource (such as
an IP address) to be active in order to function properly.
This is obviously undesirable in general: fencing may be required when the
depended-on resource is not active, or fencing may be required because the node
running the depended-on resource is no longer responding.
However, this may be acceptable under certain conditions:
* The dependent fence device should not be able to target any node that is
allowed to run the depended-on resource.
* The depended-on resource should not be disabled during production operation.
* The ``concurrent-fencing`` cluster property should be set to ``true``.
Otherwise, if both the node running the depended-on resource and some node
targeted by the dependent fence device need to be fenced, the fencing of the
node running the depended-on resource might be ordered first, making the
second fencing impossible and blocking further recovery. With concurrent
fencing, the dependent fence device might fail at first due to the
depended-on resource being unavailable, but it will be retried and eventually
succeed once the resource is brought back up.
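The last condition can be set with the same tool used later in this chapter
for ``stonith-enabled``:

.. code-block:: none

   # crm_attribute --type crm_config --name concurrent-fencing --update true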
Even under those conditions, there is one unlikely problem scenario. The DC
always schedules fencing of itself after any other fencing needed, to avoid
unnecessary repeated DC elections. If the dependent fence device targets the
DC, and both the DC and a different node running the depended-on resource need
to be fenced, the DC fencing will always fail and block further recovery. Note,
however, that losing a DC node entirely causes some other node to become DC and
schedule the fencing, so this is only a risk when a stop or other operation
with ``on-fail`` set to ``fence`` fails on the DC.
.. index::
single: fencing; configuration
Configuring Fencing
###################
Higher-level tools can provide simpler interfaces to this process, but using
Pacemaker command-line tools, this is how you could configure a fence device.
#. Find the correct driver:
.. code-block:: none
# stonith_admin --list-installed
.. note::
You may have to install packages to make fence agents available on your
host. Searching your available packages for ``fence-`` is usually
helpful. Ensure the packages providing the fence agents you require are
installed on every cluster node.
#. Find the required parameters associated with the device
(replacing ``$AGENT_NAME`` with the name obtained from the previous step):
.. code-block:: none
# stonith_admin --metadata --agent $AGENT_NAME
#. Create a file called ``stonith.xml`` containing a primitive resource
with a class of ``stonith``, a type equal to the agent name obtained earlier,
and a parameter for each of the values returned in the previous step.
#. If the device does not know how to fence nodes based on their uname,
you may also need to set the special ``pcmk_host_map`` parameter. See
:ref:`fencing-attributes` for details.
#. If the device does not support the ``list`` command, you may also need
to set the special ``pcmk_host_list`` and/or ``pcmk_host_check``
parameters. See :ref:`fencing-attributes` for details.
#. If the device does not expect the target to be specified with the
``port`` parameter, you may also need to set the special
``pcmk_host_argument`` parameter. See :ref:`fencing-attributes` for details.
#. Upload it into the CIB using cibadmin:
.. code-block:: none
# cibadmin --create --scope resources --xml-file stonith.xml
#. Set ``stonith-enabled`` to true:
.. code-block:: none
# crm_attribute --type crm_config --name stonith-enabled --update true
#. Once the stonith resource is running, you can test it by executing the
following, replacing ``$NODE_NAME`` with the name of the node to fence
(although you might want to stop the cluster on that machine first):
.. code-block:: none
# stonith_admin --reboot $NODE_NAME
Example Fencing Configuration
_____________________________
For this example, we assume we have a cluster node, ``pcmk-1``, whose IPMI
controller is reachable at the IP address 192.0.2.1. The IPMI controller uses
the username ``testuser`` and the password ``abc123``.
#. Looking at what's installed, we may see a variety of available agents:
.. code-block:: none
# stonith_admin --list-installed
.. code-block:: none
(... some output omitted ...)
fence_idrac
fence_ilo3
fence_ilo4
fence_ilo5
fence_imm
fence_ipmilan
(... some output omitted ...)
Perhaps after reading some man pages and doing some Internet searches,
we might decide ``fence_ipmilan`` is our best choice.
#. Next, we would check what parameters ``fence_ipmilan`` provides:
.. code-block:: none
# stonith_admin --metadata -a fence_ipmilan
.. code-block:: xml
<resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI">
<symlink name="fence_ilo3" shortdesc="Fence agent for HP iLO3"/>
<symlink name="fence_ilo4" shortdesc="Fence agent for HP iLO4"/>
<symlink name="fence_ilo5" shortdesc="Fence agent for HP iLO5"/>
<symlink name="fence_imm" shortdesc="Fence agent for IBM Integrated Management Module"/>
<symlink name="fence_idrac" shortdesc="Fence agent for Dell iDRAC"/>
<longdesc>fence_ipmilan is an I/O Fencing agentwhich can be used with machines controlled by IPMI.This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option.</longdesc>
<vendor-url/>
<parameters>
<parameter name="action" unique="0" required="0">
<getopt mixed="-o, --action=[action]"/>
<content type="string" default="reboot"/>
<shortdesc lang="en">Fencing action</shortdesc>
</parameter>
<parameter name="auth" unique="0" required="0">
<getopt mixed="-A, --auth=[auth]"/>
<content type="select">
<option value="md5"/>
<option value="password"/>
<option value="none"/>
</content>
<shortdesc lang="en">IPMI Lan Auth type.</shortdesc>
</parameter>
<parameter name="cipher" unique="0" required="0">
<getopt mixed="-C, --cipher=[cipher]"/>
<content type="string"/>
<shortdesc lang="en">Ciphersuite to use (same as ipmitool -C parameter)</shortdesc>
</parameter>
<parameter name="hexadecimal_kg" unique="0" required="0">
<getopt mixed="--hexadecimal-kg=[key]"/>
<content type="string"/>
<shortdesc lang="en">Hexadecimal-encoded Kg key for IPMIv2 authentication</shortdesc>
</parameter>
<parameter name="ip" unique="0" required="0" obsoletes="ipaddr">
<getopt mixed="-a, --ip=[ip]"/>
<content type="string"/>
<shortdesc lang="en">IP address or hostname of fencing device</shortdesc>
</parameter>
<parameter name="ipaddr" unique="0" required="0" deprecated="1">
<getopt mixed="-a, --ip=[ip]"/>
<content type="string"/>
<shortdesc lang="en">IP address or hostname of fencing device</shortdesc>
</parameter>
<parameter name="ipport" unique="0" required="0">
<getopt mixed="-u, --ipport=[port]"/>
<content type="integer" default="623"/>
<shortdesc lang="en">TCP/UDP port to use for connection with device</shortdesc>
</parameter>
<parameter name="lanplus" unique="0" required="0">
<getopt mixed="-P, --lanplus"/>
<content type="boolean" default="0"/>
<shortdesc lang="en">Use Lanplus to improve security of connection</shortdesc>
</parameter>
<parameter name="login" unique="0" required="0" deprecated="1">
<getopt mixed="-l, --username=[name]"/>
<content type="string"/>
<shortdesc lang="en">Login name</shortdesc>
</parameter>
<parameter name="method" unique="0" required="0">
<getopt mixed="-m, --method=[method]"/>
<content type="select" default="onoff">
<option value="onoff"/>
<option value="cycle"/>
</content>
<shortdesc lang="en">Method to fence</shortdesc>
</parameter>
<parameter name="passwd" unique="0" required="0" deprecated="1">
<getopt mixed="-p, --password=[password]"/>
<content type="string"/>
<shortdesc lang="en">Login password or passphrase</shortdesc>
</parameter>
<parameter name="passwd_script" unique="0" required="0" deprecated="1">
<getopt mixed="-S, --password-script=[script]"/>
<content type="string"/>
<shortdesc lang="en">Script to run to retrieve password</shortdesc>
</parameter>
<parameter name="password" unique="0" required="0" obsoletes="passwd">
<getopt mixed="-p, --password=[password]"/>
<content type="string"/>
<shortdesc lang="en">Login password or passphrase</shortdesc>
</parameter>
<parameter name="password_script" unique="0" required="0" obsoletes="passwd_script">
<getopt mixed="-S, --password-script=[script]"/>
<content type="string"/>
<shortdesc lang="en">Script to run to retrieve password</shortdesc>
</parameter>
<parameter name="plug" unique="0" required="0" obsoletes="port">
<getopt mixed="-n, --plug=[ip]"/>
<content type="string"/>
<shortdesc lang="en">IP address or hostname of fencing device (together with --port-as-ip)</shortdesc>
</parameter>
<parameter name="port" unique="0" required="0" deprecated="1">
<getopt mixed="-n, --plug=[ip]"/>
<content type="string"/>
<shortdesc lang="en">IP address or hostname of fencing device (together with --port-as-ip)</shortdesc>
</parameter>
<parameter name="privlvl" unique="0" required="0">
<getopt mixed="-L, --privlvl=[level]"/>
<content type="select" default="administrator">
<option value="callback"/>
<option value="user"/>
<option value="operator"/>
<option value="administrator"/>
</content>
<shortdesc lang="en">Privilege level on IPMI device</shortdesc>
</parameter>
<parameter name="target" unique="0" required="0">
<getopt mixed="--target=[targetaddress]"/>
<content type="string"/>
<shortdesc lang="en">Bridge IPMI requests to the remote target address</shortdesc>
</parameter>
<parameter name="username" unique="0" required="0" obsoletes="login">
<getopt mixed="-l, --username=[name]"/>
<content type="string"/>
<shortdesc lang="en">Login name</shortdesc>
</parameter>
<parameter name="quiet" unique="0" required="0">
<getopt mixed="-q, --quiet"/>
<content type="boolean"/>
<shortdesc lang="en">Disable logging to stderr. Does not affect --verbose or --debug-file or logging to syslog.</shortdesc>
</parameter>
<parameter name="verbose" unique="0" required="0">
<getopt mixed="-v, --verbose"/>
<content type="boolean"/>
<shortdesc lang="en">Verbose mode</shortdesc>
</parameter>
<parameter name="debug" unique="0" required="0" deprecated="1">
<getopt mixed="-D, --debug-file=[debugfile]"/>
<content type="string"/>
<shortdesc lang="en">Write debug information to given file</shortdesc>
</parameter>
<parameter name="debug_file" unique="0" required="0" obsoletes="debug">
<getopt mixed="-D, --debug-file=[debugfile]"/>
<content type="string"/>
<shortdesc lang="en">Write debug information to given file</shortdesc>
</parameter>
<parameter name="version" unique="0" required="0">
<getopt mixed="-V, --version"/>
<content type="boolean"/>
<shortdesc lang="en">Display version information and exit</shortdesc>
</parameter>
<parameter name="help" unique="0" required="0">
<getopt mixed="-h, --help"/>
<content type="boolean"/>
<shortdesc lang="en">Display help and exit</shortdesc>
</parameter>
<parameter name="delay" unique="0" required="0">
<getopt mixed="--delay=[seconds]"/>
<content type="second" default="0"/>
<shortdesc lang="en">Wait X seconds before fencing is started</shortdesc>
</parameter>
<parameter name="ipmitool_path" unique="0" required="0">
<getopt mixed="--ipmitool-path=[path]"/>
<content type="string" default="/usr/bin/ipmitool"/>
<shortdesc lang="en">Path to ipmitool binary</shortdesc>
</parameter>
<parameter name="login_timeout" unique="0" required="0">
<getopt mixed="--login-timeout=[seconds]"/>
<content type="second" default="5"/>
<shortdesc lang="en">Wait X seconds for cmd prompt after login</shortdesc>
</parameter>
<parameter name="port_as_ip" unique="0" required="0">
<getopt mixed="--port-as-ip"/>
<content type="boolean"/>
<shortdesc lang="en">Make "port/plug" to be an alias to IP address</shortdesc>
</parameter>
<parameter name="power_timeout" unique="0" required="0">
<getopt mixed="--power-timeout=[seconds]"/>
<content type="second" default="20"/>
<shortdesc lang="en">Test X seconds for status change after ON/OFF</shortdesc>
</parameter>
<parameter name="power_wait" unique="0" required="0">
<getopt mixed="--power-wait=[seconds]"/>
<content type="second" default="2"/>
<shortdesc lang="en">Wait X seconds after issuing ON/OFF</shortdesc>
</parameter>
<parameter name="shell_timeout" unique="0" required="0">
<getopt mixed="--shell-timeout=[seconds]"/>
<content type="second" default="3"/>
<shortdesc lang="en">Wait X seconds for cmd prompt after issuing command</shortdesc>
</parameter>
<parameter name="retry_on" unique="0" required="0">
<getopt mixed="--retry-on=[attempts]"/>
<content type="integer" default="1"/>
<shortdesc lang="en">Count of attempts to retry power on</shortdesc>
</parameter>
<parameter name="sudo" unique="0" required="0" deprecated="1">
<getopt mixed="--use-sudo"/>
<content type="boolean"/>
<shortdesc lang="en">Use sudo (without password) when calling 3rd party software</shortdesc>
</parameter>
<parameter name="use_sudo" unique="0" required="0" obsoletes="sudo">
<getopt mixed="--use-sudo"/>
<content type="boolean"/>
<shortdesc lang="en">Use sudo (without password) when calling 3rd party software</shortdesc>
</parameter>
<parameter name="sudo_path" unique="0" required="0">
<getopt mixed="--sudo-path=[path]"/>
<content type="string" default="/usr/bin/sudo"/>
<shortdesc lang="en">Path to sudo binary</shortdesc>
</parameter>
</parameters>
<actions>
<action name="on" automatic="0"/>
<action name="off"/>
<action name="reboot"/>
<action name="status"/>
<action name="monitor"/>
<action name="metadata"/>
<action name="manpage"/>
<action name="validate-all"/>
<action name="diag"/>
<action name="stop" timeout="20s"/>
<action name="start" timeout="20s"/>
</actions>
</resource-agent>
Once we've decided what parameter values we think we need, it is a good idea
to run the fence agent's status action manually, to verify that our values
work correctly:
.. code-block:: none
# fence_ipmilan --lanplus -a 192.0.2.1 -l testuser -p abc123 -o status
Chassis Power is on
#. Based on that, we might create a fencing resource configuration like this in
``stonith.xml`` (or any file name, just use the same name with ``cibadmin``
later):
.. code-block:: xml
<primitive id="Fencing-pcmk-1" class="stonith" type="fence_ipmilan" >
<instance_attributes id="Fencing-params" >
<nvpair id="Fencing-lanplus" name="lanplus" value="1" />
<nvpair id="Fencing-ip" name="ip" value="192.0.2.1" />
<nvpair id="Fencing-password" name="password" value="testuser" />
<nvpair id="Fencing-username" name="username" value="abc123" />
</instance_attributes>
<operations >
<op id="Fencing-monitor-10m" interval="10m" name="monitor" timeout="300s" />
</operations>
</primitive>
.. note::
Even though the man page shows that the ``action`` parameter is
supported, we do not provide that in the resource configuration.
Pacemaker will supply an appropriate action whenever the fence device
must be used.
#. In this case, we don't need to configure ``pcmk_host_map`` because
``fence_ipmilan`` ignores the target node name and instead uses its
``ip`` parameter to know how to contact the IPMI controller.
#. We do need to let Pacemaker know which cluster node can be fenced by this
device, since ``fence_ipmilan`` doesn't support the ``list`` action. Add
a line like this to the agent's instance attributes:
.. code-block:: xml
<nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="pcmk-1" />
#. We don't need to configure ``pcmk_host_argument`` since ``ip`` is all the
fence agent needs (it ignores the target name).
#. Make the configuration active:
.. code-block:: none
# cibadmin --create --scope resources --xml-file stonith.xml
#. Set ``stonith-enabled`` to true (this only has to be done once):
.. code-block:: none
# crm_attribute --type crm_config --name stonith-enabled --update true
#. Since our cluster is still in testing, we can reboot ``pcmk-1`` without
bothering anyone, so we'll test our fencing configuration by running this
from one of the other cluster nodes:
.. code-block:: none
# stonith_admin --reboot pcmk-1
Then we will verify that the node did, in fact, reboot.
We can repeat that process to create a separate fencing resource for each node.
With some other fence device types, a single fencing resource can be used for
all nodes. In fact, we could do that with ``fence_ipmilan``, using the
``port_as_ip`` parameter along with ``pcmk_host_map``, as sketched below.
Either approach is fine.
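A sketch of that alternative, assuming both IPMI controllers accept the same
credentials (the address for a hypothetical second node, ``pcmk-2``, is a
placeholder):

.. code-block:: xml

   <primitive id="Fencing" class="stonith" type="fence_ipmilan" >
     <instance_attributes id="Fencing-params" >
       <nvpair id="Fencing-lanplus" name="lanplus" value="1" />
       <nvpair id="Fencing-username" name="username" value="testuser" />
       <nvpair id="Fencing-password" name="password" value="abc123" />
       <!-- Interpret each node's "port" value as an IP address -->
       <nvpair id="Fencing-port-as-ip" name="port_as_ip" value="1" />
       <!-- Map each node name to the address of its IPMI controller -->
       <nvpair id="Fencing-host-map" name="pcmk_host_map"
               value="pcmk-1:192.0.2.1;pcmk-2:192.0.2.2" />
     </instance_attributes>
   </primitive>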
.. index::
single: fencing; topology
single: fencing-topology
single: fencing-level
Fencing Topologies
##################
Pacemaker supports fencing nodes with multiple devices through a feature called
*fencing topologies*. Fencing topologies may be used to provide alternative
devices in case one fails, or to require multiple devices to all be executed
successfully in order to consider the node successfully fenced, or even a
combination of the two.
Create the individual devices as you normally would, then define one or more
``fencing-level`` entries in the ``fencing-topology`` section of the
configuration.
* Each fencing level is attempted in order of ascending ``index``. Allowed
values are 1 through 9.
* If a device fails, processing terminates for the current level. No further
devices in that level are exercised, and the next level is attempted instead.
* If the operation succeeds for all the listed devices in a level, the level is
deemed to have passed.
* The operation is finished when a level has passed (success), or all levels
have been attempted (failed).
* If the operation failed, the next step is determined by the scheduler and/or
the controller.
Some possible uses of topologies include:
* Try on-board IPMI, then an intelligent power switch if that fails
* Try fabric fencing of both disk and network, then fall back to power fencing
if either fails
* Wait up to a certain time for a kernel dump to complete, then cut power to
the node
.. table:: **Attributes of a fencing-level Element**
:class: longtable
:widths: 1 4
+------------------+-----------------------------------------------------------------------------------------+
| Attribute | Description |
+==================+=========================================================================================+
| id | .. index:: |
| | pair: fencing-level; id |
| | |
| | A unique name for this element (required) |
+------------------+-----------------------------------------------------------------------------------------+
| target | .. index:: |
| | pair: fencing-level; target |
| | |
| | The name of a single node to which this level applies |
+------------------+-----------------------------------------------------------------------------------------+
| target-pattern | .. index:: |
| | pair: fencing-level; target-pattern |
| | |
| | An extended regular expression (as defined in `POSIX |
| | <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04>`_) |
| | matching the names of nodes to which this level applies |
+------------------+-----------------------------------------------------------------------------------------+
| target-attribute | .. index:: |
| | pair: fencing-level; target-attribute |
| | |
| | The name of a node attribute that is set (to ``target-value``) for nodes to which this |
| | level applies |
+------------------+-----------------------------------------------------------------------------------------+
| target-value | .. index:: |
| | pair: fencing-level; target-value |
| | |
| | The node attribute value (of ``target-attribute``) that is set for nodes to which this |
| | level applies |
+------------------+-----------------------------------------------------------------------------------------+
| index | .. index:: |
| | pair: fencing-level; index |
| | |
| | The order in which to attempt the levels. Levels are attempted in ascending order |
| | *until one succeeds*. Valid values are 1 through 9. |
+------------------+-----------------------------------------------------------------------------------------+
| devices | .. index:: |
| | pair: fencing-level; devices |
| | |
| | A comma-separated list of devices that must all be tried for this level |
+------------------+-----------------------------------------------------------------------------------------+
.. note:: **Fencing topology with different devices for different nodes**
.. code-block:: xml
<cib crm_feature_set="3.6.0" validate-with="pacemaker-3.5" admin_epoch="1" epoch="0" num_updates="0">
<configuration>
...
<fencing-topology>
<!-- For pcmk-1, try poison-pill and fail back to power -->
<fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
<fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>
<!-- For pcmk-2, try disk and network, and fail back to power -->
<fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
<fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
</fencing-topology>
...
</configuration>
<status/>
</cib>
Example Dual-Layer, Dual-Device Fencing Topologies
__________________________________________________
The following example illustrates an advanced use of ``fencing-topology`` in a
cluster with the following properties:
* 2 nodes (prod-mysql1 and prod-mysql2)
* the nodes have IPMI controllers reachable at 192.0.2.1 and 192.0.2.2
* the nodes each have two independent Power Supply Units (PSUs) connected to
two independent Power Distribution Units (PDUs) reachable at 198.51.100.1
(port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
* fencing via the IPMI controller uses the ``fence_ipmilan`` agent (1 fence device
per controller, with each device targeting a separate node)
* fencing via the PDUs uses the ``fence_apc_snmp`` agent (1 fence device per
PDU, with both devices targeting both nodes)
* a random delay is used to lessen the chance of a "death match"
* fencing topology is set to try IPMI fencing first then dual PDU fencing if
that fails
In a node failure scenario, Pacemaker will first select ``fence_ipmilan`` to
try to kill the faulty node. Using the fencing topology, if that method fails,
it will then move on to selecting ``fence_apc_snmp`` twice (once for the first
PDU, then again for the second PDU).
The fence action is considered successful only if both PDUs report the required
status. If any of them fails, fencing loops back to the first fencing method,
``fence_ipmilan``, and so on, until the node is fenced or the fencing action is
cancelled.
.. note:: **First fencing method: single IPMI device per target**
Each cluster node has its own dedicated IPMI controller that can be contacted
for fencing using the following primitives:
.. code-block:: xml
<primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql1_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql2_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
</instance_attributes>
</primitive>
.. note:: **Second fencing method: dual PDU devices**
Each cluster node also has 2 distinct power supplies controlled by 2
distinct PDUs:
* Node 1: PDU 1 port 10 and PDU 2 port 10
* Node 2: PDU 1 port 11 and PDU 2 port 11
The matching fencing agents are configured as follows:
.. code-block:: xml
<primitive class="stonith" id="fence_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_apc1-instance_attributes">
<nvpair id="fence_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_apc1-instance_attributes-pcmk_host_list"
name="pcmk_host_map" value="prod-mysql1:10;prod-mysql2:11"/>
<nvpair id="fence_apc1-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_apc2-instance_attributes">
<nvpair id="fence_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_apc2-instance_attributes-pcmk_host_list"
name="pcmk_host_map" value="prod-mysql1:10;prod-mysql2:11"/>
<nvpair id="fence_apc2-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
</instance_attributes>
</primitive>
.. note:: **Fencing topology**
Now that all the fencing resources are defined, it's time to create the
right topology. We want to first fence using IPMI and if that does not work,
fence both PDUs to effectively and surely kill the node.
.. code-block:: xml
<fencing-topology>
<fencing-level id="level-1-1" target="prod-mysql1" index="1" devices="fence_prod-mysql1_ipmi" />
<fencing-level id="level-1-2" target="prod-mysql1" index="2" devices="fence_apc1,fence_apc2" />
<fencing-level id="level-2-1" target="prod-mysql2" index="1" devices="fence_prod-mysql2_ipmi" />
<fencing-level id="level-2-2" target="prod-mysql2" index="2" devices="fence_apc1,fence_apc2" />
</fencing-topology>
In ``fencing-topology``, the lowest ``index`` value for a target determines
its first fencing method.
Remapping Reboots
#################
When the cluster needs to reboot a node, whether because ``stonith-action`` is
``reboot`` or because a reboot was requested externally (such as by
``stonith_admin --reboot``), it will remap that to other commands in two cases:
* If the chosen fencing device does not support the ``reboot`` command, the
cluster will ask it to perform ``off`` instead.
* If a fencing topology level with multiple devices must be executed, the
cluster will ask all the devices to perform ``off``, then ask the devices to
perform ``on``.
To understand the second case, consider the example of a node with redundant
power supplies connected to intelligent power switches. Rebooting one switch
and then the other would have no effect on the node. Turning both switches off,
and then on, actually reboots the node.
In such a case, the fencing operation will be treated as successful as long as
the ``off`` commands succeed, because then it is safe for the cluster to
recover any resources that were on the node. Timeouts and errors in the ``on``
phase will be logged but ignored.
When a reboot operation is remapped, any action-specific timeout for the
remapped action will be used (for example, ``pcmk_off_timeout`` will be used
when executing the ``off`` command, not ``pcmk_reboot_timeout``).