summaryrefslogtreecommitdiffstats
path: root/ctdb/doc/cluster_mutex_helper.txt
diff options
context:
space:
mode:
Diffstat (limited to 'ctdb/doc/cluster_mutex_helper.txt')
-rw-r--r--ctdb/doc/cluster_mutex_helper.txt80
1 files changed, 80 insertions, 0 deletions
diff --git a/ctdb/doc/cluster_mutex_helper.txt b/ctdb/doc/cluster_mutex_helper.txt
new file mode 100644
index 0000000..4ee018f
--- /dev/null
+++ b/ctdb/doc/cluster_mutex_helper.txt
@@ -0,0 +1,80 @@
+Writing CTDB cluster mutex helpers
+==================================
+
+CTDB uses cluster-wide mutexes to protect against a "split brain",
+which could occur if the cluster becomes partitioned due to network
+failure or similar.
+
+CTDB uses a cluster-wide mutex for its "cluster lock", which is used
+to ensure that only one database recovery can happen at a time. For
+an overview of cluster lock configuration see the CLUSTER LOCK
+section in ctdb(7). CTDB tries to ensure correct operation of the
+cluster lock by attempting to take the cluster lock when CTDB knows
+that it should already be held.
+
+By default, CTDB uses a supplied mutex helper that uses a fcntl(2)
+lock on a specified file in the cluster filesystem.
+
+However, a user supplied mutex helper can be used as an alternative.
+The rest of this document describes the API for mutex helpers.
+
+A mutex helper is an external executable
+----------------------------------------
+
+A mutex helper is an external executable that can be run by CTDB.
+There are no CTDB-specific compilation dependencies. This means that
+a helper could easily be scripted around existing commands. Mutex
+helpers are run relatively rarely and are not time critical.
+Therefore, reliability is preferred over high performance.
+
+Taking a mutex with a helper
+----------------------------
+
+1. Helper is executed with helper-specific arguments
+
+2. Helper attempts to take mutex
+
+3. On success, the helper writes ASCII 0 to standard output
+
+4. Helper stays running, holding mutex, awaiting termination by CTDB
+
+5. When a helper receives SIGTERM it must release any mutex it is
+ holding and then exit.
+
+Status codes
+------------
+
+CTDB ignores the exit code of a helper. Instead, CTDB reacts to a
+single ASCII character that is sent to it via a helper's standard
+output.
+
+Valid status codes are:
+
+0 - The helper took the mutex and is holding it, awaiting termination.
+
+1 - The helper was unable to take the mutex due to contention.
+
+2 - The helper took too long to take the mutex.
+
+ Helpers do not need to implement this status code. CTDB
+ already implements any required timeout handling.
+
+3 - An unexpected error occurred.
+
+If a 0 status code is sent then it the helper should periodically
+check if the (original) parent processes still exists while awaiting
+termination. If the parent process disappears then the helper should
+release the mutex and exit. This avoids stale mutexes. Note that a
+helper should never wait for parent process ID 1!
+
+If a non-0 status code is sent then the helper can exit immediately.
+However, if the helper does not exit then it must terminate if it
+receives SIGTERM.
+
+Logging
+-------
+
+Anything written to standard error by a helper is incorporated into
+CTDB's logs. A helper should generally only output to stderr for
+unexpected errors and avoid output to stderr on success or on mutex
+contention.