summaryrefslogtreecommitdiffstats
path: root/agents/crosslink/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'agents/crosslink/README.md')
-rw-r--r--agents/crosslink/README.md44
1 files changed, 44 insertions, 0 deletions
diff --git a/agents/crosslink/README.md b/agents/crosslink/README.md
new file mode 100644
index 0000000..990b790
--- /dev/null
+++ b/agents/crosslink/README.md
@@ -0,0 +1,44 @@
+# Two node cross-link fence agent
+
+The problem that this fence agents tries to solve is the following:
+
+Given a two-node cluster with a direct crosslink ethernet cable
+between the two nodes (in addition to the normal networking setup), we want to
+be able to maintain quorum on node (A) when node (B) lost power.
+The loss of power on node (B) in this case implies its BMC/IPMI is also
+not available which would be normally used in fencing in this case.
+
+Note: An external PDU would be preferrable and would solve this
+situation more elegantly. The assumption here is that something
+like that won't be available in this environment.
+
+This works by creating a stonith level composed of a BMC/IPMI
+fencing at level 1 and then the fence_crosslink agent at level 2.
+
+In case node (A) has lost power, then node (B) will do the following:
+1. Try to fence node (B) via IPMI, which will fail since the node has no
+power and the BMC is unavailable
+2. Check via fence_crosslink the cross-cable interconnect. If the cross cable
+IP is not reachable, then we know for "sure" (this is a potentially broad
+assumption) that the node is really down and fence_crosslink tells pacemaker
+that the fencing was successful, so pacemaker can work with that new
+information.
+
+Here are some example configuration commands:
+~~~
+pcs stonith create crosslink-controller-1 fence_crosslink crosscableip=1.1.1.2 pcmk_host_list=controller-1 pcmk_reboot_action=off
+pcs stonith create crosslink-controller-0 fence_crosslink crosscableip=1.1.1.1 pcmk_host_list=controller-0 pcmk_reboot_action=off
+# We make sure the stonith resource do not run on the same node as the fencing target
+pcs constraint location crosslink-controller-1 avoids controller-1
+pcs constraint location crosslink-controller-0 avoids controller-0
+pcs stonith level add 2 controller-0 crosslink-controller-0
+pcs stonith level add 2 controller-1 crosslink-controller-1
+~~~
+
+Testing done:
+- Simulate power outage by turning off the controller-1 VM and its IPMI interface and leaving the crosslink intact.
+
+ * Expected Outcome:
+ We should retain quorum on controller-0 and all services should be running on controller-0. No UNCLEAN resources should be observed on controller-0.
+ * Actual Outcome:
+ Matched the expected outcome.