# Two-node cross-link fence agent

The problem that this fence agent tries to solve is the following:

Given a two-node cluster with a direct crosslink Ethernet cable
between the two nodes (in addition to the normal networking setup), we want to
be able to maintain quorum on node (A) when node (B) loses power.
Losing power on node (B) also takes down its BMC/IPMI, which would
normally be used to fence it.

Note: An external PDU would be preferable and would solve this
situation more elegantly. The assumption here is that no such
device is available in this environment.

This works by creating stonith levels with BMC/IPMI
fencing at level 1 and the fence_crosslink agent at level 2.

If node (B) has lost power, node (A) will do the following:
1. Try to fence node (B) via IPMI, which will fail since the node has no
power and the BMC is unavailable
2. Check the cross-cable interconnect via fence_crosslink. If the cross cable
IP is not reachable, then we know for "sure" (this is a potentially broad
assumption) that the node is really down, and fence_crosslink tells pacemaker
that the fencing was successful, so pacemaker can proceed with that new
information.
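
The two-step fallback can be sketched as a small shell function. This is only an illustration of the decision described above, not the agent's actual code: `fence_peer`, `crosslink_reachable`, and the `PROBE` override are hypothetical names, the use of a single `ping` as the reachability check is an assumption, and treating a still-reachable peer as a fencing failure is inferred rather than stated in this README.

```shell
#!/bin/sh
# Sketch only: models the level 1 -> level 2 fallback; not the agent's real code.

# crosslink_reachable IP: succeeds if the peer answers on the cross cable IP.
# PROBE is a hypothetical override so the logic can be exercised without a network;
# by default a single ping with a 2 second timeout is assumed.
crosslink_reachable() {
    ${PROBE:-ping -c 1 -W 2} "$1" >/dev/null 2>&1
}

# fence_peer IPMI_STATUS CROSSLINK_IP
# IPMI_STATUS is the exit status of the level 1 attempt (0 = peer was fenced).
fence_peer() {
    if [ "$1" -eq 0 ]; then
        echo "level1: fenced via IPMI"
    elif ! crosslink_reachable "$2"; then
        # IPMI failed and the crosslink is silent: report success to the cluster.
        echo "level2: crosslink silent, peer assumed off"
    else
        # Assumption: a peer that still answers must not be reported as fenced.
        echo "failed: peer still answers on the crosslink"
        return 1
    fi
}
```

For example, `fence_peer 1 1.1.1.2` models node (A) handling a failed IPMI attempt against a peer whose cross cable IP is 1.1.1.2.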
 
Here are some example configuration commands:
~~~
pcs stonith create crosslink-controller-1 fence_crosslink crosscableip=1.1.1.2 pcmk_host_list=controller-1 pcmk_reboot_action=off
pcs stonith create crosslink-controller-0 fence_crosslink crosscableip=1.1.1.1 pcmk_host_list=controller-0 pcmk_reboot_action=off
# Make sure each stonith resource does not run on the node it is meant to fence
pcs constraint location crosslink-controller-1 avoids controller-1
pcs constraint location crosslink-controller-0 avoids controller-0
pcs stonith level add 2 controller-0 crosslink-controller-0
pcs stonith level add 2 controller-1 crosslink-controller-1
~~~
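
The commands above only register the level 2 devices. A matching level 1 setup might look like the following sketch, assuming `fence_ipmilan` is the BMC/IPMI agent in use. The resource names, BMC addresses, and credentials are placeholders that do not come from this README and must be adapted to the environment.

```shell
# Hypothetical level 1 IPMI devices; addresses and credentials are placeholders
pcs stonith create ipmi-controller-0 fence_ipmilan ip=192.0.2.10 username=admin password=secret lanplus=1 pcmk_host_list=controller-0
pcs stonith create ipmi-controller-1 fence_ipmilan ip=192.0.2.11 username=admin password=secret lanplus=1 pcmk_host_list=controller-1
# Register the IPMI devices at level 1 so they are tried before fence_crosslink
pcs stonith level add 1 controller-0 ipmi-controller-0
pcs stonith level add 1 controller-1 ipmi-controller-1
```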

Testing done:
- Simulate power outage by turning off the controller-1 VM and its IPMI interface and leaving the crosslink intact.

  * Expected Outcome:
  We should retain quorum on controller-0 and all services should be running on controller-0. No UNCLEAN resources should be observed on controller-0.
  * Actual Outcome:
  Matched the expected outcome.