Diffstat (limited to 'agents/crosslink/README.md')
-rw-r--r-- | agents/crosslink/README.md | 44 |
1 files changed, 44 insertions, 0 deletions
diff --git a/agents/crosslink/README.md b/agents/crosslink/README.md
new file mode 100644
index 0000000..990b790
--- /dev/null
+++ b/agents/crosslink/README.md
@@ -0,0 +1,44 @@
+# Two node cross-link fence agent
+
+The problem that this fence agent tries to solve is the following:
+
+Given a two-node cluster with a direct crosslink ethernet cable
+between the two nodes (in addition to the normal networking setup), we want to
+be able to maintain quorum on node (A) when node (B) loses power.
+The loss of power on node (B) implies that its BMC/IPMI, which would
+normally be used for fencing, is also unavailable.
+
+Note: An external PDU would be preferable and would solve this
+situation more elegantly. The assumption here is that something
+like that won't be available in this environment.
+
+This works by creating stonith levels: BMC/IPMI fencing
+at level 1 and then the fence_crosslink agent at level 2.
+
+In case node (A) has lost power, then node (B) will do the following:
+1. Try to fence node (A) via IPMI, which will fail since the node has no
+power and the BMC is unavailable
+2. Check via fence_crosslink the cross-cable interconnect. If the cross cable
+IP is not reachable, then we know for "sure" (this is a potentially broad
+assumption) that the node is really down and fence_crosslink tells pacemaker
+that the fencing was successful, so pacemaker can work with that new
+information.
+
+Here are some example configuration commands:
+~~~
+pcs stonith create crosslink-controller-1 fence_crosslink crosscableip=1.1.1.2 pcmk_host_list=controller-1 pcmk_reboot_action=off
+pcs stonith create crosslink-controller-0 fence_crosslink crosscableip=1.1.1.1 pcmk_host_list=controller-0 pcmk_reboot_action=off
+# We make sure the stonith resources do not run on the same node as the fencing target
+pcs constraint location crosslink-controller-1 avoids controller-1
+pcs constraint location crosslink-controller-0 avoids controller-0
+pcs stonith level add 2 controller-0 crosslink-controller-0
+pcs stonith level add 2 controller-1 crosslink-controller-1
+~~~
+
+Testing done:
+- Simulate power outage by turning off the controller-1 VM and its IPMI interface and leaving the crosslink intact.
+
+  * Expected Outcome:
+    We should retain quorum on controller-0 and all services should be running on controller-0. No UNCLEAN resources should be observed on controller-0.
+  * Actual Outcome:
+    Matched the expected outcome.
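
The two-level fencing decision described in the README can be sketched as follows. This is a minimal illustration, not the actual `fence_crosslink` implementation: the function names (`ping_reachable`, `crosslink_level`, `fence_node`) are hypothetical, the reachability probe assumes a Linux iputils-style `ping` with `-c` and `-W`, and in a real cluster it is pacemaker, not the agent, that moves from level 1 to level 2.

```python
import subprocess
from typing import Callable


def ping_reachable(ip: str, count: int = 3, timeout_s: int = 2) -> bool:
    """Assumed probe: True if `ping` exits 0 (a reply came back from ip)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def crosslink_level(
    cross_cable_ip: str,
    is_reachable: Callable[[str], bool] = ping_reachable,
) -> bool:
    """Level 2: report the peer as fenced only if its cross-cable IP is silent."""
    return not is_reachable(cross_cable_ip)


def fence_node(
    ipmi_fence: Callable[[], bool],
    cross_cable_ip: str,
    is_reachable: Callable[[str], bool] = ping_reachable,
) -> bool:
    """Mimic the level ordering: IPMI first, crosslink check as the fallback."""
    if ipmi_fence():  # level 1 succeeded, nothing more to do
        return True
    return crosslink_level(cross_cable_ip, is_reachable)  # level 2
```

Passing the probe in as a parameter keeps the decision logic testable without a network: with a failing IPMI call and an unreachable cross-cable IP the function returns `True` (the peer is treated as fenced), while a reachable cross-cable IP makes it return `False`, so the peer is not assumed to be down.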