Diffstat (limited to 'man/votequorum.5')
-rw-r--r-- | man/votequorum.5 | 410
1 file changed, 410 insertions, 0 deletions
diff --git a/man/votequorum.5 b/man/votequorum.5
new file mode 100644
index 0000000..0cb03c2
--- /dev/null
+++ b/man/votequorum.5
@@ -0,0 +1,410 @@
.\"/*
.\" * Copyright (c) 2012-2014 Red Hat, Inc.
.\" *
.\" * All rights reserved.
.\" *
.\" * Authors: Christine Caulfield <ccaulfie@redhat.com>
.\" *          Fabio M. Di Nitto <fdinitto@redhat.com>
.\" *
.\" * This software licensed under BSD license, the text of which follows:
.\" *
.\" * Redistribution and use in source and binary forms, with or without
.\" * modification, are permitted provided that the following conditions are met:
.\" *
.\" * - Redistributions of source code must retain the above copyright notice,
.\" *   this list of conditions and the following disclaimer.
.\" * - Redistributions in binary form must reproduce the above copyright notice,
.\" *   this list of conditions and the following disclaimer in the documentation
.\" *   and/or other materials provided with the distribution.
.\" * - Neither the name of the MontaVista Software, Inc. nor the names of its
.\" *   contributors may be used to endorse or promote products derived from this
.\" *   software without specific prior written permission.
.\" *
.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
.\" * THE POSSIBILITY OF SUCH DAMAGE.
.\" */
.TH VOTEQUORUM 5 2018-12-14 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
.SH NAME
votequorum \- Votequorum Configuration Overview
.SH OVERVIEW
The votequorum service is part of the corosync project. This service can be optionally loaded
into the nodes of a corosync cluster to avoid split-brain situations.
It does this by having a number of votes assigned to each system in the cluster and ensuring
that cluster operations are only allowed to proceed when a majority of the votes are present.
The service must be loaded into either all nodes or none. If it is loaded into a subset of the
cluster nodes, the results will be unpredictable.
.PP
The following corosync.conf extract will enable the votequorum service within corosync:
.PP
.nf
quorum {
    provider: corosync_votequorum
}
.fi
.PP
votequorum reads its configuration from corosync.conf. Some values can be changed at runtime,
others are only read at corosync startup. It is very important that those values are consistent
across all the nodes participating in the cluster, or votequorum behavior will be unpredictable.
.PP
votequorum requires an expected_votes value to function; this can be provided in two ways.
The number of expected votes will be calculated automatically when the nodelist { } section is
present in corosync.conf, or expected_votes can be specified in the quorum { } section. Lack
of both will disable votequorum. If both are present at the same time,
the quorum.expected_votes value will override the one calculated from the nodelist.
.PP
Example (no nodelist) of an 8 node cluster (each node has 1 vote):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
}
.fi
.PP
Example (with nodelist) of a 3 node cluster (each node has 1 vote):
.nf

quorum {
    provider: corosync_votequorum
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
    node {
        ring0_addr: 192.168.1.3
    }
}
.fi
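.PP
Example (nodelist plus explicit expected_votes) illustrating the override described above; the
addresses and the value of 5 are arbitrary and chosen only to show that the explicit setting
takes precedence over the 3 votes derived from the nodelist:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 5
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
    node {
        ring0_addr: 192.168.1.3
    }
}
.fi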
.SH SPECIAL FEATURES
.PP
.B two_node: 1
.PP
Enables two node cluster operations (default: 0).
.PP
The "two node cluster" is a use case that requires special consideration.
With a standard two node cluster, each node with a single vote, there
are 2 votes in the cluster. Using the simple majority calculation
(50% of the votes + 1) to calculate quorum, the quorum would be 2.
This means that both nodes would always have
to be alive for the cluster to be quorate and operate.
.PP
Enabling two_node: 1 sets quorum artificially to 1.
.PP
Example configuration 1:

.nf
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
.fi

.PP
Example configuration 2:

.nf
quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
}
.fi
.PP
NOTES: enabling two_node: 1 automatically enables wait_for_all. It is
still possible to override wait_for_all by explicitly setting it to 0.
If more than 2 nodes join the cluster, the two_node option is
automatically disabled.
.PP
.B wait_for_all: 1
.PP
Enables the Wait For All (WFA) feature (default: 0).
.PP
The general behaviour of votequorum is to switch a cluster from inquorate to quorate
as soon as possible. For example, in an 8 node cluster where every node has 1 vote,
expected_votes is set to 8 and quorum is (50% + 1) 5. As soon as 5 (or more) nodes
are visible to each other, the partition of 5 (or more) becomes quorate and can
start operating.
.PP
When WFA is enabled, the cluster will be quorate for the first time
only after all nodes have been visible at least once at the same time.
.PP
This feature has the advantage of avoiding some startup race conditions, at the cost
that all nodes need to be up at the same time at least once before the cluster
can operate.
.PP
A common startup race condition, based on the above example, is that as soon as 5
nodes become quorate, with the other 3 still offline, the remaining 3 nodes will
be fenced.
.PP
It is very useful when combined with last_man_standing (see below).
.PP
Example configuration:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    wait_for_all: 1
}
.fi
.PP
.B last_man_standing: 1
/
.B last_man_standing_window: 10000
.PP
Enables the Last Man Standing (LMS) feature (default: 0).
Tunable last_man_standing_window (default: 10 seconds, expressed in ms).
.PP
The general behaviour of votequorum is to set expected_votes and quorum
at startup (unless modified by the user at runtime, see below) and use
those values during the whole lifetime of the cluster.
.PP
Using for example an 8 node cluster where each node has 1 vote, expected_votes
is set to 8 and quorum to 5. This condition allows a total failure of 3
nodes. If a 4th node fails, the cluster becomes inquorate and it will
stop providing services.
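.PP
For reference, the figures above follow from the simple majority calculation already mentioned
(50% of the votes + 1); the numbers below are simply that calculation worked out for this
8 node example:
.nf

expected_votes:           8
quorum:                   8 / 2 + 1 = 5
tolerated node failures:  8 - 5 = 3
.fi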
.PP
Enabling LMS allows the cluster to dynamically recalculate expected_votes
and quorum under specific circumstances. It is essential to enable
WFA when using LMS in High Availability clusters.
.PP
Using the above 8 node cluster example, with LMS enabled the cluster can retain
quorum and continue operating by losing, in a cascade fashion, up to 6 nodes with
only 2 remaining active.
.PP
Example chain of events:
.nf
1) cluster is fully operational with 8 nodes.
   (expected_votes: 8 quorum: 5)

2) 3 nodes die, cluster is quorate with 5 nodes.

3) after the last_man_standing_window timer expires,
   expected_votes and quorum are recalculated.
   (expected_votes: 5 quorum: 3)

4) at this point, 2 more nodes can die and
   the cluster will still be quorate with 3.

5) once again, after the last_man_standing_window
   timer expires, expected_votes and quorum are
   recalculated.
   (expected_votes: 3 quorum: 2)

6) at this point, 1 more node can die and
   the cluster will still be quorate with 2.

7) after one more last_man_standing_window timer expiry:
   (expected_votes: 2 quorum: 2)
.fi
.PP
NOTES: In order for the cluster to downgrade automatically from 2 nodes
to a 1 node cluster, the auto_tie_breaker feature must also be enabled (see below).
If auto_tie_breaker is not enabled, and one more failure occurs, the
remaining node will not be quorate. LMS does not work with asymmetric voting
schemes: each node must have exactly 1 vote. LMS is also incompatible with quorum
devices; if last_man_standing is specified in corosync.conf then the quorum device
will be disabled.

.PP
Example configuration 1:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    last_man_standing: 1
}
.fi
.PP
Example configuration 2 (increase timeout to 20 seconds):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    last_man_standing: 1
    last_man_standing_window: 20000
}
.fi
.PP
.B auto_tie_breaker: 1
.PP
Enables the Auto Tie Breaker (ATB) feature (default: 0).
.PP
The general behaviour of votequorum allows simultaneous failure of up
to 50% - 1 of the nodes, assuming each node has 1 vote.
.PP
When ATB is enabled, the cluster can suffer up to 50% of the nodes failing
at the same time, in a deterministic fashion. By default the cluster
partition, or the set of nodes that are still in contact with the
node that has the lowest nodeid, will remain quorate. The other nodes will
be inquorate. This behaviour can be changed by also specifying
.PP
.B auto_tie_breaker_node: lowest|highest|<list of node IDs>
.PP
\(oqlowest\(cq is the default; \(oqhighest\(cq is similar in that if the current set of
nodes contains the highest nodeid then it will remain quorate. Alternatively
it is possible to specify a particular node ID or a list of node IDs that will
be required to maintain quorum. If a (space-separated) list is given, the
nodes are evaluated in order, so if the first node is present then it will
be used to determine the quorate partition; if that node is not in either
half (i.e. was not in the cluster before the split) then the second node ID
will be checked, and so on. ATB is incompatible with quorum devices;
if auto_tie_breaker is specified in corosync.conf then the quorum device
will be disabled.
.PP
Example configuration 1:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest
}
.fi
.PP
Example configuration 2:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    auto_tie_breaker: 1
    auto_tie_breaker_node: 1 3 5
}
.fi
.PP
.B allow_downscale: 1
.PP
Enables the allow downscale (AD) feature (default: 0).
.PP
THIS FEATURE IS INCOMPLETE AND CURRENTLY UNSUPPORTED.
.PP
The general behaviour of votequorum is to never decrease expected votes or quorum.
.PP
When AD is enabled, both expected votes and quorum are recalculated when
a node leaves the cluster in a clean state (normal corosync shutdown process), down
to the configured expected_votes.
.PP
Example use case:
.PP
.nf
1) N node cluster (where N is any value higher than 3)

2) expected_votes set to 3 in corosync.conf

3) only 3 nodes are running

4) the admin needs to increase processing power and adds 10 nodes

5) the internal expected_votes is automatically set to 13

6) the minimum expected_votes is 3 (from the configuration)

- up to this point this is standard votequorum behavior -

7) once the work is done, the admin wants to remove nodes from the cluster

8) using an ordered shutdown the admin can reduce the cluster size
   automatically back to 3, but not below 3, where normal quorum
   operation will work as usual.

.fi
.PP
Example configuration:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 3
    allow_downscale: 1
}
.fi
allow_downscale implicitly enables EVT (see below).
.PP
.B expected_votes_tracking: 1
.PP
Enables the Expected Votes Tracking (EVT) feature (default: 0).
.PP
Expected Votes Tracking stores the highest-seen value of expected votes on disk and uses
that as the minimum value for expected votes in the absence of any higher authority (e.g.
a current quorate cluster). This is useful when a group of nodes becomes detached from
the main cluster and after a restart could have enough votes to provide quorum, which can
happen after using allow_downscale.
.PP
Note that even if the in-memory version of expected_votes is reduced, e.g. by removing nodes
or by using corosync-quorumtool, the stored value will still be the highest value seen - it
never gets reduced.
.PP
The value is held in the file ev_tracking (stored in the directory configured in
system.state_dir, or /var/lib/corosync/ when unset), which can be deleted if you
really do need to reduce the expected votes for any reason, such as the node having been
moved to a different cluster.
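.PP
Example configuration (the expected_votes value here is purely illustrative; a nodelist would
work equally well, as described in the OVERVIEW section above):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    expected_votes_tracking: 1
}
.fi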
.SH VARIOUS NOTES
.PP
* WFA / LMS / ATB / AD can be used in combination with each other.
.PP
* In order to change the default votes for a node there are two options:
.nf

1) nodelist:

nodelist {
    node {
        ring0_addr: 192.168.1.1
        quorum_votes: 3
    }
    ....
}

2) quorum section (deprecated):

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    votes: 2
}

.fi
In the event that both the nodelist and quorum { votes: } are defined, the value
from the nodelist will be used.
.PP
* Only votes, quorum_votes, expected_votes and two_node can be changed at runtime. Everything
else requires a cluster restart.
.SH BUGS
No known bugs at the time of writing. The authors are from outerspace. Deal with it.
.SH "SEE ALSO"
.BR corosync (8),
.BR corosync.conf (5),
.BR corosync-quorumtool (8),
.BR corosync-qdevice (8),
.BR votequorum_overview (3)
.PP