Diffstat (limited to 'man/votequorum.5')
-rw-r--r--   man/votequorum.5   410
1 file changed, 410 insertions, 0 deletions
diff --git a/man/votequorum.5 b/man/votequorum.5
new file mode 100644
index 0000000..0cb03c2
--- /dev/null
+++ b/man/votequorum.5
@@ -0,0 +1,410 @@
+.\"/*
+.\" * Copyright (c) 2012-2014 Red Hat, Inc.
+.\" *
+.\" * All rights reserved.
+.\" *
+.\" * Authors: Christine Caulfield <ccaulfie@redhat.com>
+.\" * Fabio M. Di Nitto <fdinitto@redhat.com>
+.\" *
+.\" * This software licensed under BSD license, the text of which follows:
+.\" *
+.\" * Redistribution and use in source and binary forms, with or without
+.\" * modification, are permitted provided that the following conditions are met:
+.\" *
+.\" * - Redistributions of source code must retain the above copyright notice,
+.\" * this list of conditions and the following disclaimer.
+.\" * - Redistributions in binary form must reproduce the above copyright notice,
+.\" * this list of conditions and the following disclaimer in the documentation
+.\" * and/or other materials provided with the distribution.
+.\" * - Neither the name of the MontaVista Software, Inc. nor the names of its
+.\" * contributors may be used to endorse or promote products derived from this
+.\" * software without specific prior written permission.
+.\" *
+.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+.\" * THE POSSIBILITY OF SUCH DAMAGE.
+.\" */
+.TH VOTEQUORUM 5 2018-12-14 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
+.SH NAME
+votequorum \- Votequorum Configuration Overview
+.SH OVERVIEW
+The votequorum service is part of the corosync project. It can optionally be loaded
+into the nodes of a corosync cluster to avoid split-brain situations.
+It does this by assigning a number of votes to each system in the cluster and ensuring
+that cluster operations are only allowed to proceed when a majority of the votes are present.
+The service must be loaded into either all nodes or none. If it is loaded into only a subset
+of cluster nodes, the results will be unpredictable.
+.PP
+The following corosync.conf extract will enable votequorum service within corosync:
+.PP
+.nf
+quorum {
+ provider: corosync_votequorum
+}
+.fi
+.PP
+votequorum reads its configuration from corosync.conf. Some values can be changed at runtime, others
+are only read at corosync startup. It is very important that those values are consistent
+across all the nodes participating in the cluster or votequorum behavior will be unpredictable.
+.PP
+votequorum requires an expected_votes value to function; this can be provided in two ways.
+The number of expected votes is calculated automatically when the nodelist { } section is
+present in corosync.conf, or expected_votes can be specified in the quorum { } section.
+If neither is present, votequorum is disabled. If both are present at the same time,
+the quorum.expected_votes value will override the one calculated from the nodelist.
+.PP
+Example (no nodelist) of an 8 node cluster (each node has 1 vote):
+.nf
+
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 8
+}
+.fi
+.PP
+Example (with nodelist) of a 3 node cluster (each node has 1 vote):
+.nf
+
+quorum {
+ provider: corosync_votequorum
+}
+
+nodelist {
+ node {
+ ring0_addr: 192.168.1.1
+ }
+ node {
+ ring0_addr: 192.168.1.2
+ }
+ node {
+ ring0_addr: 192.168.1.3
+ }
+}
+.fi
+.SH SPECIAL FEATURES
+.PP
+.B two_node: 1
+.PP
+Enables two node cluster operations (default: 0).
+.PP
+The "two node cluster" is a use case that requires special consideration.
+With a standard two node cluster, each node with a single vote, there
+are 2 votes in the cluster. Using the simple majority calculation
+(50% of the votes + 1) to calculate quorum, the quorum would be 2.
+This means that the both nodes would always have
+to be alive for the cluster to be quorate and operate.
+.PP
+When two_node: 1 is enabled, quorum is artificially set to 1.
+.PP
+Example configuration 1:
+
+.nf
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 2
+ two_node: 1
+}
+.fi
+
+.PP
+Example configuration 2:
+
+.nf
+quorum {
+ provider: corosync_votequorum
+ two_node: 1
+}
+
+nodelist {
+ node {
+ ring0_addr: 192.168.1.1
+ }
+ node {
+ ring0_addr: 192.168.1.2
+ }
+}
+.fi
+.PP
+NOTES: Enabling two_node: 1 automatically enables wait_for_all. It is
+still possible to override wait_for_all by explicitly setting it to 0.
+If more than 2 nodes join the cluster, the two_node option is
+automatically disabled.
+.PP
+.B wait_for_all: 1
+.PP
+Enables Wait For All (WFA) feature (default: 0).
+.PP
+The general behaviour of votequorum is to switch a cluster from inquorate to quorate
+as soon as possible. For example, in an 8 node cluster, where every node has 1 vote,
+expected_votes is set to 8 and quorum is (50% + 1) 5. As soon as 5 (or more) nodes
+are visible to each other, the partition of 5 (or more) becomes quorate and can
+start operating.
+.PP
+When WFA is enabled, the cluster will be quorate for the first time
+only after all nodes have been visible at least once at the same time.
+.PP
+This feature has the advantage of avoiding some startup race conditions, with the cost
+that all nodes need to be up at the same time at least once before the cluster
+can operate.
+.PP
+A common startup race condition, based on the above example, is that as soon as 5
+nodes become quorate while the other 3 are still offline, those 3 nodes will
+be fenced.
+.PP
+It is very useful when combined with last_man_standing (see below).
+.PP
+Example configuration:
+.nf
+
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 8
+ wait_for_all: 1
+}
+.fi
+.PP
+.B last_man_standing: 1
+/
+.B last_man_standing_window: 10000
+.PP
+Enables Last Man Standing (LMS) feature (default: 0).
+Tunable last_man_standing_window (default: 10 seconds, expressed in ms).
+.PP
+The general behaviour of votequorum is to set expected_votes and quorum
+at startup (unless modified by the user at runtime, see below) and use
+those values during the whole lifetime of the cluster.
+.PP
+Using, for example, an 8 node cluster where each node has 1 vote, expected_votes
+is set to 8 and quorum to 5. This condition tolerates the failure of up to 3
+nodes. If a 4th node fails, the cluster becomes inquorate and it will
+stop providing services.
+.PP
+Enabling LMS allows the cluster to dynamically recalculate expected_votes
+and quorum under specific circumstances. It is essential to enable
+WFA when using LMS in High Availability clusters.
+.PP
+Using the above 8 node cluster example, with LMS enabled the cluster can retain
+quorum and continue operating by losing, in a cascade fashion, up to 6 nodes with
+only 2 remaining active.
+.PP
+Example chain of events:
+.nf
+1) cluster is fully operational with 8 nodes.
+ (expected_votes: 8 quorum: 5)
+
+2) 3 nodes die, cluster is quorate with 5 nodes.
+
+3) after last_man_standing_window timer expires,
+ expected_votes and quorum are recalculated.
+ (expected_votes: 5 quorum: 3)
+
+4) at this point, 2 more nodes can die and
+ cluster will still be quorate with 3.
+
+5) once again, after last_man_standing_window
+ timer expires expected_votes and quorum are
+ recalculated.
+ (expected_votes: 3 quorum: 2)
+
+6) at this point, 1 more node can die and
+ cluster will still be quorate with 2.
+
+7) after one more last_man_standing_window timer
+   expires, expected_votes and quorum are recalculated.
+   (expected_votes: 2 quorum: 2)
+.fi
+.PP
+NOTES: In order for the cluster to downgrade automatically from 2 nodes
+to a 1 node cluster, the auto_tie_breaker feature must also be enabled (see below).
+If auto_tie_breaker is not enabled, and one more failure occurs, the
+remaining node will not be quorate. LMS does not work with asymmetric voting
+schemes; each node must vote 1. LMS is also incompatible with quorum devices:
+if last_man_standing is specified in corosync.conf then the quorum device
+will be disabled.
+
+.PP
+Example configuration 1:
+.nf
+
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 8
+ last_man_standing: 1
+}
+.fi
+.PP
+Example configuration 2 (increase timeout to 20 seconds):
+.nf
+
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 8
+ last_man_standing: 1
+ last_man_standing_window: 20000
+}
+.fi
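+.PP
+Because enabling WFA together with LMS is recommended above, a combined
+configuration, sketched in the same style as the previous examples, could
+look like this:
+.nf
+
+quorum {
+    provider: corosync_votequorum
+    expected_votes: 8
+    last_man_standing: 1
+    wait_for_all: 1
+}
+.fi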
+.PP
+.B auto_tie_breaker: 1
+.PP
+Enables Auto Tie Breaker (ATB) feature (default: 0).
+.PP
+The general behaviour of votequorum is to tolerate the simultaneous failure of up
+to 50% - 1 of the nodes, assuming each node has 1 vote.
+.PP
+When ATB is enabled, the cluster can suffer up to 50% of the nodes failing
+at the same time, in a deterministic fashion. By default, the cluster
+partition, that is, the set of nodes that is still in contact with the
+node that has the lowest nodeid, will remain quorate. The other nodes will
+be inquorate. This behaviour can be changed by also specifying
+.PP
+.B auto_tie_breaker_node: lowest|highest|<list of node IDs>
+.PP
+\(oqlowest\(cq is the default; \(oqhighest\(cq is similar in that if the current set of
+nodes contains the highest nodeid then it will remain quorate. Alternatively,
+it is possible to specify a particular node ID or list of node IDs that will
+be required to maintain quorum. If a (space-separated) list is given, the
+nodes are evaluated in order: if the first node is present then it will
+be used to determine the quorate partition; if that node is not in either
+half (i.e. it was not in the cluster before the split) then the second node ID
+will be checked, and so on. ATB is incompatible with quorum devices:
+if auto_tie_breaker is specified in corosync.conf then the quorum device
+will be disabled.
+.PP
+Example configuration 1:
+.nf
+
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 8
+ auto_tie_breaker: 1
+ auto_tie_breaker_node: lowest
+}
+.fi
+.PP
+Example configuration 2:
+.nf
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 8
+ auto_tie_breaker: 1
+ auto_tie_breaker_node: 1 3 5
+}
+.fi
+.PP
+.B allow_downscale: 1
+.PP
+Enables allow downscale (AD) feature (default: 0).
+.PP
+THIS FEATURE IS INCOMPLETE AND CURRENTLY UNSUPPORTED.
+.PP
+The general behaviour of votequorum is to never decrease expected votes or quorum.
+.PP
+When AD is enabled, both expected votes and quorum are recalculated when
+a node leaves the cluster in a clean state (normal corosync shutdown process), down
+to the configured expected_votes.
+.PP
+Example use case:
+.PP
+.nf
+1) N node cluster (where N is any value higher than 3)
+
+2) expected_votes set to 3 in corosync.conf
+
+3) only 3 nodes are running
+
+4) the admin needs to increase processing power and adds 10 nodes
+
+5) internal expected_votes is automatically set to 13
+
+6) minimum expected_votes is 3 (from configuration)
+
+- up to this point this is standard votequorum behavior -
+
+7) once the work is done, the admin wants to remove nodes from the cluster
+
+8) using an ordered shutdown, the admin can automatically reduce the
+   cluster size back to 3, but not below 3; at that point normal
+   quorum operation works as usual.
+
+.fi
+.PP
+Example configuration:
+.nf
+
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 3
+ allow_downscale: 1
+}
+.fi
+allow_downscale implicitly enables EVT (see below).
+.PP
+.B expected_votes_tracking: 1
+.PP
+Enables Expected Votes Tracking (EVT) feature (default: 0).
+.PP
+Expected Votes Tracking stores the highest-seen value of expected votes on disk and uses
+that as the minimum value for expected votes in the absence of any higher authority (e.g.
+a currently quorate cluster). This is useful when a group of nodes becomes detached from
+the main cluster and, after a restart, could have enough votes to provide quorum, which can
+happen after using allow_downscale.
+.PP
+Note that even if the in-memory version of expected_votes is reduced, e.g. by removing nodes
+or using corosync-quorumtool, the stored value will still be the highest value seen; it
+never gets reduced.
+.PP
+The value is held in the file ev_tracking (stored in the directory configured in system.state_dir,
+or /var/lib/corosync/ when unset), which can be deleted if you
+really do need to reduce the expected votes for any reason, such as when the node has been moved
+to a different cluster.
+.PP
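+Example configuration (a sketch following the pattern of the examples above;
+the exact expected_votes value is only illustrative):
+.nf
+
+quorum {
+    provider: corosync_votequorum
+    expected_votes: 3
+    expected_votes_tracking: 1
+}
+.fi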
+.PP
+.SH VARIOUS NOTES
+.PP
+* WFA / LMS / ATB / AD can be combined with each other; see the sketch below.
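+.PP
+As a sketch (not a definitive recommendation), enabling WFA, LMS and ATB at
+the same time could look like this:
+.nf
+
+quorum {
+    provider: corosync_votequorum
+    expected_votes: 8
+    wait_for_all: 1
+    last_man_standing: 1
+    auto_tie_breaker: 1
+}
+.fi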
+.PP
+* In order to change the default votes for a node there are two options:
+.nf
+
+1) nodelist:
+
+nodelist {
+ node {
+ ring0_addr: 192.168.1.1
+ quorum_votes: 3
+ }
+ ....
+}
+
+2) quorum section (deprecated):
+
+quorum {
+ provider: corosync_votequorum
+ expected_votes: 2
+ votes: 2
+}
+
+.fi
+In the event that both nodelist and quorum { votes: } are defined, the value
+from the nodelist will be used.
+.PP
+* Only votes, quorum_votes, expected_votes and two_node can be changed at runtime. Everything else
+requires a cluster restart.
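+.PP
+For example, votes and expected votes can be adjusted at runtime with
+corosync-quorumtool; the following sketch assumes the \-v, \-n and \-e
+options documented in
+.BR corosync-quorumtool (8):
+.nf
+
+# change the number of votes for the node with nodeid 2 to 3
+corosync-quorumtool -v 3 -n 2
+
+# change the cluster-wide expected votes to 8
+corosync-quorumtool -e 8
+.fi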
+.SH BUGS
+No known bugs at the time of writing. The authors are from outer space. Deal with it.
+.SH "SEE ALSO"
+.BR corosync (8),
+.BR corosync.conf (5),
+.BR corosync-quorumtool (8),
+.BR corosync-qdevice (8),
+.BR votequorum_overview (3)
+.PP