Writing CTDB cluster mutex helpers
==================================

CTDB uses cluster-wide mutexes to protect against a "split brain",
which could occur if the cluster becomes partitioned due to a network
failure or a similar fault.

CTDB uses a cluster-wide mutex for its "cluster lock", which is used
to ensure that only one database recovery can happen at a time. For
an overview of cluster lock configuration, see the CLUSTER LOCK
section in ctdb(7). CTDB tries to ensure correct operation of the
cluster lock by attempting to take the cluster lock when CTDB knows
that it should already be held. If the cluster lock is working
correctly, such an attempt fails due to contention.

By default, CTDB uses a supplied mutex helper that takes an fcntl(2)
lock on a specified file in the cluster filesystem.

However, a user-supplied mutex helper can be used as an alternative.
The rest of this document describes the API for mutex helpers.

A mutex helper is an external executable
----------------------------------------

A mutex helper is an external executable that can be run by CTDB.
There are no CTDB-specific compilation dependencies, so a helper
could easily be a script wrapped around existing commands. Mutex
helpers are run relatively rarely and are not time-critical, so
reliability is preferred over high performance.

Taking a mutex with a helper
----------------------------

1. The helper is executed with helper-specific arguments.

2. The helper attempts to take the mutex.

3. On success, the helper writes the ASCII character '0' to standard
   output. On failure, it writes one of the other status codes
   listed below.

4. The helper stays running, holding the mutex, and awaits
   termination by CTDB.

5. When a helper receives SIGTERM, it must release any mutex it is
   holding and then exit.

Status codes
------------

CTDB ignores the exit code of a helper. Instead, CTDB reacts to a
single ASCII character that is sent to it via the helper's standard
output.

Valid status codes are:

0 - The helper took the mutex and is holding it, awaiting termination.

1 - The helper was unable to take the mutex due to contention.

2 - The helper took too long to take the mutex.

    Helpers do not need to implement this status code. CTDB
    already implements any required timeout handling.

3 - An unexpected error occurred.

If a 0 status code is sent, then the helper should periodically check
whether its original parent process still exists while awaiting
termination. If the parent process disappears, then the helper
should release the mutex and exit. This avoids stale mutexes. Note
that a helper should never wait for parent process ID 1! A parent
process ID of 1 normally means that the original parent has already
exited and the helper has been re-parented to init.

If a non-0 status code is sent, then the helper can exit immediately.
However, if the helper does not exit, then it must terminate when it
receives SIGTERM.

Logging
-------

Anything written to standard error by a helper is incorporated into
CTDB's logs. A helper should generally write to stderr only for
unexpected errors, and should avoid writing to stderr on success or
on mutex contention.