From 06cba6ccd165ca8b224797e37fccb9e63f026d77 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Sat, 21 Mar 2020 11:28:17 +0100
Subject: Adding upstream version 1.9.1.

Signed-off-by: Daniel Baumann
---
 iredis/data/commands/cluster-failover.md | 81 ++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 iredis/data/commands/cluster-failover.md

(limited to 'iredis/data/commands/cluster-failover.md')

diff --git a/iredis/data/commands/cluster-failover.md b/iredis/data/commands/cluster-failover.md
new file mode 100644
index 0000000..c811c04
--- /dev/null
+++ b/iredis/data/commands/cluster-failover.md
@@ -0,0 +1,81 @@
+This command, which can only be sent to a Redis Cluster replica node, forces the
+replica to start a manual failover of its master instance.
+
+A manual failover is a special kind of failover that is usually executed when
+there are no actual failures, but we wish to swap the current master with one of
+its replicas (which is the node we send the command to), in a safe way, without
+any window for data loss. It works in the following way:
+
+1. The replica tells the master to stop processing queries from clients.
+2. The master replies to the replica with the current _replication offset_.
+3. The replica waits for the replication offset to match on its side, to make
+   sure it processed all the data from the master before it continues.
+4. The replica starts a failover, obtains a new configuration epoch from the
+   majority of the masters, and broadcasts the new configuration.
+5. The old master receives the configuration update: unblocks its clients and
+   starts replying with redirection messages so that they'll continue the chat
+   with the new master.
+
+This way clients are moved away from the old master to the new master atomically
+and only when the replica that is turning into the new master has processed all
+of the replication stream from the old master.
+
+## FORCE option: manual failover when the master is down
+
+The command behavior can be modified by two options: **FORCE** and **TAKEOVER**.
+
+If the **FORCE** option is given, the replica does not perform any handshake
+with the master, which may not be reachable, but instead just starts a failover
+ASAP starting from point 4. This is useful when we want to start a manual
+failover while the master is no longer reachable.
+
+However, using **FORCE** we still need the majority of masters to be available in
+order to authorize the failover and generate a new configuration epoch for the
+replica that is going to become master.
+
+## TAKEOVER option: manual failover without cluster consensus
+
+There are situations where this is not enough, and we want a replica to fail over
+without any agreement with the rest of the cluster. A real world use case for
+this is to mass promote replicas in a different data center to masters in order
+to perform a data center switch, while all the masters are down or partitioned
+away.
+
+The **TAKEOVER** option implies everything **FORCE** implies, but also does not
+use any cluster authorization in order to fail over. A replica receiving
+`CLUSTER FAILOVER TAKEOVER` will instead:
+
+1. Generate a new `configEpoch` unilaterally, just taking the current greatest
+   epoch available and incrementing it if its local configuration epoch is not
+   already the greatest.
+2. Assign itself all the hash slots of its master, and propagate the new
+   configuration to every node which is reachable ASAP, and eventually to every
+   other node.
+
+Note that **TAKEOVER violates the last-failover-wins principle** of Redis
+Cluster, since the configuration epoch generated by the replica violates the
+normal generation of configuration epochs in several ways:
+
+1. There is no guarantee that it is actually the highest configuration epoch,
+   since, for example, we can use the **TAKEOVER** option within a minority, nor
+   is any message exchange performed to generate the new configuration epoch.
+2. If we generate a configuration epoch which happens to collide with another
+   instance, eventually our configuration epoch, or the one of another instance
+   with our same epoch, will be moved away using the _configuration epoch
+   collision resolution algorithm_.
+
+Because of this, the **TAKEOVER** option should be used with care.
+
+## Implementation details and notes
+
+`CLUSTER FAILOVER`, unless the **TAKEOVER** option is specified, does not
+execute a failover synchronously; it only _schedules_ a manual failover,
+bypassing the failure detection stage. To check whether the failover actually
+happened, use `CLUSTER NODES` or other means to verify that the state of the
+cluster changes some time after the command was sent.
+
+@return
+
+@simple-string-reply: `OK` if the command was accepted and a manual failover is
+going to be attempted. An error if the operation cannot be executed, for example
+if we are talking with a node which is already a master.
-- 
cgit v1.2.3
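
The added document says that `CLUSTER FAILOVER` only schedules the failover and that `CLUSTER NODES` should be used to confirm it happened. The following is a minimal Python sketch of that check, not part of the patch. It assumes the usual `CLUSTER NODES` line layout (`<id> <ip:port@cport> <flags> <master-id> ...`, with `<flags>` a comma-separated list containing `myself` for the queried node). The function names `myself_role` and `failover_completed` and the sample node IDs and addresses are hypothetical; how the text is fetched (for example with `redis-cli cluster nodes`) is left to the caller.

```python
def myself_role(cluster_nodes_output: str) -> str:
    """Return 'master' or 'replica' for the node flagged `myself`.

    Assumes each line looks like:
    <id> <ip:port@cport> <flags> <master-id> ... (flags are comma-separated).
    """
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        flags = fields[2].split(",")
        if "myself" in flags:
            # Any node that is not flagged as a master is treated as a replica.
            return "master" if "master" in flags else "replica"
    raise ValueError("no line flagged 'myself' in CLUSTER NODES output")


def failover_completed(before: str, after: str) -> bool:
    """True if the queried node was a replica before and is a master after."""
    return myself_role(before) == "replica" and myself_role(after) == "master"


# Hypothetical snapshots taken on the replica before and after the command.
BEFORE = """\
07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 myself,slave e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 0 1426238317239 4 connected
e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 master - 0 1426238316232 1 connected 0-5460
"""
AFTER = """\
07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 myself,master - 0 1426238317239 5 connected 0-5460
e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 slave 07c37dfeb235213a872192d90877d0cd55635b91 0 1426238316232 5 connected
"""

if __name__ == "__main__":
    print(failover_completed(BEFORE, AFTER))
```

In practice the "after" snapshot would be polled a few times with a timeout, since the document only promises that the state changes some time after the command is accepted.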