diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-06-03 17:01:24 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-06-03 17:01:24 +0000 |
commit | 6dd3dfb79125cd02d02efbce435a6c82e5af92ef (patch) | |
tree | 45084fc83278586f6bbafcb935f92d53f71a6b03 /man/corosync.conf.5 | |
parent | Initial commit. (diff) | |
download | corosync-upstream.tar.xz corosync-upstream.zip |
Adding upstream version 3.1.8.upstream/3.1.8upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'man/corosync.conf.5')
-rw-r--r-- | man/corosync.conf.5 | 1061 |
1 files changed, 1061 insertions, 0 deletions
diff --git a/man/corosync.conf.5 b/man/corosync.conf.5 new file mode 100644 index 0000000..e76d64e --- /dev/null +++ b/man/corosync.conf.5 @@ -0,0 +1,1061 @@ +.\"/* +.\" * Copyright (c) 2005 MontaVista Software, Inc. +.\" * Copyright (c) 2006-2022 Red Hat, Inc. +.\" * +.\" * All rights reserved. +.\" * +.\" * Author: Steven Dake (sdake@redhat.com) +.\" * +.\" * This software licensed under BSD license, the text of which follows: +.\" * +.\" * Redistribution and use in source and binary forms, with or without +.\" * modification, are permitted provided that the following conditions are met: +.\" * +.\" * - Redistributions of source code must retain the above copyright notice, +.\" * this list of conditions and the following disclaimer. +.\" * - Redistributions in binary form must reproduce the above copyright notice, +.\" * this list of conditions and the following disclaimer in the documentation +.\" * and/or other materials provided with the distribution. +.\" * - Neither the name of the MontaVista Software, Inc. nor the names of its +.\" * contributors may be used to endorse or promote products derived from this +.\" * software without specific prior written permission. +.\" * +.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +.\" * THE POSSIBILITY OF SUCH DAMAGE. +.\" */ +.TH COROSYNC_CONF 5 2022-10-20 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual" +.SH NAME +corosync.conf - corosync executive configuration file + +.SH SYNOPSIS +/etc/corosync/corosync.conf + +.SH DESCRIPTION +The corosync.conf instructs the corosync executive about various parameters +needed to control the corosync executive. Empty lines and lines starting with +# character are ignored. The configuration file consists of bracketed top level +directives. The possible directive choices are: + +.TP +totem { } +This top level directive contains configuration options for the totem protocol. +.TP +logging { } +This top level directive contains configuration options for logging. +.TP +quorum { } +This top level directive contains configuration options for quorum. +.TP +nodelist { } +This top level directive contains configuration options for nodes in cluster. +.TP +system { } +This top level directive contains configuration options related to system. +.TP +resources { } +This top level directive contains configuration options for resources. +.TP +nozzle { } +This top level directive contains configuration options for a libnozzle device. + +.PP +The +.B interface sub-directive of totem is optional for UDP and knet transports. + +For knet, multiple interface subsections define parameters for each knet link on the +system. + +For UDPU an interface section is not needed and it is recommended that the nodelist +is used to define cluster nodes. + +.TP +linknumber +This specifies the link number for the interface. When using the knet +protocol, each interface should specify separate link numbers to uniquely +identify to the membership protocol which interface to use for which link. +The linknumber must start at 0. For UDP the only supported linknumber is 0. + +.TP +knet_link_priority +This specifies the priority for the link when knet is used in 'passive' +mode. (see link_mode below) + +.TP +knet_ping_interval +This specifies the interval between knet link pings. +knet_ping_interval and knet_ping_timeout +are a pair, if one is specified the other should be too, otherwise one will be calculated from +the token timeout and one will be taken from the config file. +(default is token timeout / (knet_pong_count*2)) + +.TP +knet_ping_timeout +If no ping is received within this time, the knet link is declared dead. +knet_ping_interval and knet_ping_timeout +are a pair, if one is specified the other should be too, otherwise one will be calculated from +the token timeout and one will be taken from the config file. +(default is token timeout / knet_pong_count) + +.TP +knet_ping_precision +How many values of latency are used to calculate +the average link latency. (default 2048 samples) + +.TP +knet_pong_count +How many valid ping/pongs before a link is marked UP. (default 2) +.TP + +knet_transport +Which IP transport knet should use. valid values are "sctp" or "udp". (default: udp) + +.TP +bindnetaddr (udp only) +This specifies the network address the corosync executive should bind +to when using udp. + +bindnetaddr (udp only) +should be an IP address configured on the system, or a network +address. + +For example, if the local interface is 192.168.5.92 with netmask +255.255.255.0, you should set bindnetaddr to 192.168.5.92 or 192.168.5.0. +If the local interface is 192.168.5.92 with netmask 255.255.255.192, +set bindnetaddr to 192.168.5.92 or 192.168.5.64, and so forth. + +This may also be an IPV6 address, in which case IPV6 networking will be used. +In this case, the exact address must be specified and there is no automatic +selection of the network interface within a specific subnet as with IPv4. + +If IPv6 networking is used, the nodeid field in nodelist must be specified. + +.TP +broadcast (udp only) +This is optional and can be set to yes. If it is set to yes, the broadcast +address will be used for communication. If this option is set, mcastaddr +should not be set. + +.TP +mcastaddr (udp only) +This is the multicast address used by corosync executive. The default +should work for most networks, but the network administrator should be queried +about a multicast address to use. Avoid 224.x.x.x because this is a "config" +multicast address. + +This may also be an IPV6 multicast address, in which case IPV6 networking +will be used. If IPv6 networking is used, the nodeid field in nodelist must +be specified. + +It's not necessary to use this option if cluster_name option is used. If both options +are used, mcastaddr has higher priority. + +.TP +mcastport (udp only) +This specifies the UDP port number. It is possible to use the same multicast +address on a network with the corosync services configured for different +UDP ports. +Please note corosync uses two UDP ports mcastport (for mcast receives) and +mcastport - 1 (for mcast sends). +If you have multiple clusters on the same network using the same mcastaddr +please configure the mcastports with a gap. + +.TP +ttl (udp only) +This specifies the Time To Live (TTL). If you run your cluster on a routed +network then the default of "1" will be too small. This option provides +a way to increase this up to 255. The valid range is 0..255. + +.PP +.PP +Within the +.B totem +directive, there are seven configuration options of which one is required, +five are optional, and one is required when IPV6 is configured in the interface +subdirective. The required directive controls the version of the totem +configuration. The optional option unless using IPV6 directive controls +identification of the processor. The optional options control secrecy and +authentication, the network mode of operation and maximum network MTU +field. + +.TP +version +This specifies the version of the configuration file. Currently the only +valid version for this directive is 2. + +.TP +clear_node_high_bit +This configuration option is optional and is only relevant when no nodeid is +specified. Some corosync clients require a signed 32 bit nodeid that is greater +than zero however by default corosync uses all 32 bits of the IPv4 address space +when generating a nodeid. Set this option to yes to force the high bit to be +zero and therefore ensure the nodeid is a positive signed 32 bit integer. + +WARNING: Cluster behavior is undefined if this option is enabled on only +a subset of the cluster (for example during a rolling upgrade). + +.TP +crypto_model +This specifies which cryptographic library should be used by knet. +Supported values depend on the libknet build and on the installed +cryptography libraries. Typically nss and openssl will be available +but gcrypt and others could also be allowed. + +The default is nss. + +.TP +crypto_hash +This specifies which HMAC authentication should be used to authenticate all +messages. Valid values are none (no authentication), md5, sha1, sha256, +sha384 and sha512. Encrypted transmission is only supported for +the knet transport. + +The default is none. + +.TP +crypto_cipher +This specifies which cipher should be used to encrypt all messages. +Valid values are none (no encryption), aes256, aes192 and aes128. +Enabling crypto_cipher, requires also enabling of crypto_hash. Encrypted +transmission is only supported for the knet transport. + +The default is none. + +.TP +secauth +This implies crypto_cipher=aes256 and crypto_hash=sha256, unless those options +are explicitly set. Encrypted transmission is only supported for the knet +transport. + +The default is off. + +.TP +keyfile +This specifies the fully qualified path to the shared key used to +authenticate and encrypt data used within the Totem protocol. + +The default is /etc/corosync/authkey. + +.TP +key +Shared key stored in configuration instead of authkey file. This option +has lower precedence than keyfile option so it's +used only when keyfile is not specified. +Using this option is not recommended for security reasons. + +.TP +link_mode +This specifies the Kronosnet mode, which may be passive, active, or +rr (round-robin). +.B passive: +the active link with the highest priority (highest number) will be used. If one or more +links share the same priority the one with the lowest link ID will +be used. +.B active: +All active links will be used simultaneously to send traffic. +link priority is ignored. +.B rr: +Round-Robin policy. Each packet will be sent to the next active link in +order. + +If only one interface directive is specified, passive is automatically chosen. + +The maximum number of interface directives that is allowed with Kronosnet +is 8. For other transports it is 1. + +.TP +netmtu +This specifies maximum packet length sent by corosync. It's mainly for the UDPU +(and UDP) transport, where it specifies the network maximum transmit size, but +can be used also with the KNET transport, where it defines the maximum length of packets +passed to the knet layer. To specify the network MTU manually for KNET, use the +.B knet_mtu +option. + +For UDPU (and UDP), setting this value beyond 1500, the regular frame MTU, +requires ethernet devices that support large, or +also called jumbo, frames. If any device in the network doesn't support large +frames, the protocol will not operate properly. The hosts must also have their +mtu size set from 1500 to whatever frame size is specified here. + +Please note while some NICs or switches claim large frame support, they support +9000 MTU as the maximum frame size including the IP header. Setting the netmtu +and host MTUs to 9000 will cause totem to use the full 9000 bytes of the frame. +Then Linux will add a 18 byte header moving the full frame size to 9018. As a +result some hardware will not operate properly with this size of data. A netmtu +of 8982 seems to work for the few large frame devices that have been tested. +Some manufacturers claim large frame support when in fact they support frame +sizes of 4500 bytes. + +When sending multicast traffic, if the network frequently reconfigures, chances are +that some device in the network doesn't support large frames. + +Choose hardware carefully if intending to use large frame support. + +The default is 1500 for UDPU (and UDP) and 65536 for the KNET transport. + +.TP +transport +This directive controls the transport mechanism used. +The default is knet. The transport type can also be set to udpu or udp. +Only knet allows crypto or multiple interfaces per node. + +.TP +cluster_name +This specifies the name of cluster and it's used for automatic generating +of multicast address. + +.TP +config_version +This specifies version of config file. This is converted to unsigned 64-bit int. +By default it's 0. Option is used to prevent joining old nodes with not +up-to-date configuration. If value is not 0, and node is going for first time +(only for first time, join after split doesn't follow this rules) +from single-node membership to multiple nodes membership, other nodes +config_versions are collected. If current node config_version is not +equal to highest of collected versions, corosync is terminated. + +.TP +ip_version +This specifies version of IP to ask DNS resolver for. +The value can be one of +.B ipv4 +(look only for an IPv4 address) +, +.B ipv6 +(check only IPv6 address) +, +.B ipv4-6 +(look for all address families and use first IPv4 address found in the list if there is such address, +otherwise use first IPv6 address) and +.B ipv6-4 +(look for all address families and use first IPv6 address found in the list if there is such address, +otherwise use first IPv4 address). + +Default (if unspecified) is +.B ipv6-4 +for knet and udpu transports and +.B ipv4 +for udp. + +The knet transport supports IPv4 and IPv6 addresses concurrently, +provided they are consistent on each link. + +Within the +.B totem +directive, there are several configuration options which are used to control +the operation of the protocol. It is generally not recommended to change any +of these values without proper guidance and sufficient testing. Some networks +may require larger values if suffering from frequent reconfigurations. Some +applications may require faster failure detection times which can be achieved +by reducing the token timeout. + +.TP +token +This timeout is used directly or as a base for real token timeout calculation (explained in +.B token_coefficient +section). Token timeout specifies in milliseconds until a token loss is declared after not +receiving a token. This is the time spent detecting a failure of a processor +in the current configuration. Reforming a new configuration takes about 50 +milliseconds in addition to this timeout. + +For real token timeout used by totem it's possible to read cmap value of +.B runtime.config.totem.token +key. + +Be careful to use the same timeout values on each of the nodes in the cluster +or unpredictable results may occur. + +The default is 3000 milliseconds. + +.TP +token_warning +Specifies the interval between warnings that the token has not been received. The +value is a percentage of the token timeout and can be set to 0 to disable +warnings. + +The default is 75%. + +.TP +token_coefficient +This value is used only when +.B nodelist +section is specified and contains at least 3 nodes. If so, real token timeout +is then computed as token + (number_of_nodes - 2) * token_coefficient. +This allows cluster to scale without manually changing token timeout +every time new node is added. This value can be set to 0 resulting +in effective removal of this feature. + +The default is 650 milliseconds. + +.TP +token_retransmit +This timeout specifies in milliseconds after how long before receiving a token +the token is retransmitted. This will be automatically calculated if token +is modified. It is not recommended to alter this value without guidance from +the corosync community. + +The minimum is 30 milliseconds. If not set and error occur, make sure +token / (token_retransmits_before_loss_const + 0.2) is more than 30. + +The default is 238 milliseconds for two nodes cluster. Three or more nodes reference +.B token_coefficient. + +.TP +knet_compression_model +Type of compression used by Kronosnet. Supported values depend on +the libknet build and on the installed compression libraries. Typically zlib and lz4 will be available +but bzip2 and others could also be allowed. The default is 'none'. + +.TP +knet_compression_threshold +Tells knet to NOT compress any packets that are smaller than the value +indicated. Default 100 bytes. + +Set to 0 to reset to the default. +Set to 1 to compress everything. + +.TP +knet_compression_level +Many compression libraries allow tuning of compression parameters. For example +0 or 1 ... 9 are commonly used to determine the level of compression. This value +is passed unmodified to the compression library so it is recommended to consult +the library's documentation for more detailed information. + +.TP +hold +This timeout specifies in milliseconds how long the token should be held by +the representative when the protocol is under low utilization. It is not +recommended to alter this value without guidance from the corosync community. + +The default is 180 milliseconds. + +.TP +token_retransmits_before_loss_const +This value identifies how many token retransmits should be attempted before +forming a new configuration. It is also used for token_retransmit +and hold calculations. + +The default is 4 retransmissions. + +.TP +join +This timeout specifies in milliseconds how long to wait for join messages in +the membership protocol. + +The default is 50 milliseconds. + +.TP +send_join +This timeout specifies in milliseconds an upper range between 0 and send_join +to wait before sending a join message. For configurations with less than +32 nodes, this parameter is not necessary. For larger rings, this parameter +is necessary to ensure the NIC is not overflowed with join messages on +formation of a new ring. A reasonable value for large rings (128 nodes) would +be 80msec. Other timer values must also change if this value is changed. Seek +advice from the corosync mailing list if trying to run larger configurations. + +The default is 0 milliseconds. + +.TP +consensus +This timeout specifies in milliseconds how long to wait for consensus to be +achieved before starting a new round of membership configuration. The minimum +value for consensus must be 1.2 * token. This value will be automatically +calculated at 1.2 * token if the user doesn't specify a consensus value. + +For two node clusters, a consensus larger than the join timeout but less than +token is safe. For three node or larger clusters, consensus should be larger +than token. There is an increasing risk of odd membership changes, which still +guarantee virtual synchrony, as node count grows if consensus is less than +token. + +The default is 3600 milliseconds. + +.TP +merge +This timeout specifies in milliseconds how long to wait before checking for +a partition when no multicast traffic is being sent. If multicast traffic +is being sent, the merge detection happens automatically as a function of +the protocol. + +The default is 200 milliseconds. + +.TP +downcheck +This timeout specifies in milliseconds how long to wait before checking +that a network interface is back up after it has been downed. + +The default is 1000 milliseconds. + +.TP +fail_recv_const +This constant specifies how many rotations of the token without receiving any +of the messages when messages should be received may occur before a new +configuration is formed. + +The default is 2500 failures to receive a message. + +.TP +seqno_unchanged_const +This constant specifies how many rotations of the token without any multicast +traffic should occur before the hold timer is started. + +The default is 30 rotations. + +.TP +heartbeat_failures_allowed +[HeartBeating mechanism] +Configures the optional HeartBeating mechanism for faster failure detection. Keep in +mind that engaging this mechanism in lossy networks could cause faulty loss declaration +as the mechanism relies on the network for heartbeating. + +So as a rule of thumb use this mechanism if you require improved failure in low to +medium utilized networks. + +This constant specifies the number of heartbeat failures the system should tolerate +before declaring heartbeat failure e.g 3. Also if this value is not set or is 0 then the +heartbeat mechanism is not engaged in the system and token rotation is the method +of failure detection + +The default is 0 (disabled). + +.TP +max_network_delay +[HeartBeating mechanism] +This constant specifies in milliseconds the approximate delay that your network takes +to transport one packet from one machine to another. This value is to be set by system +engineers and please don't change if not sure as this effects the failure detection +mechanism using heartbeat. + +The default is 50 milliseconds. + +.TP +window_size +This constant specifies the maximum number of messages that may be sent on one +token rotation. If all processors perform equally well, this value could be +large (300), which would introduce higher latency from origination to delivery +for very large rings. To reduce latency in large rings(16+), the defaults are +a safe compromise. If 1 or more slow processor(s) are present among fast +processors, window_size should be no larger than 256000 / netmtu to avoid +overflow of the kernel receive buffers. The user is notified of this by +the display of a retransmit list in the notification logs. There is no loss +of data, but performance is reduced when these errors occur. + +The default is 50 messages. + +.TP +max_messages +This constant specifies the maximum number of messages that may be sent by one +processor on receipt of the token. The max_messages parameter is limited to +256000 / netmtu to prevent overflow of the kernel transmit buffers. + +The default is 17 messages. + +.TP +miss_count_const +This constant defines the maximum number of times on receipt of a token +a message is checked for retransmission before a retransmission occurs. This +parameter is useful to modify for switches that delay multicast packets +compared to unicast packets. The default setting works well for nearly all +modern switches. + +The default is 5 messages. + +.TP +knet_pmtud_interval +How often the knet PMTUd runs to look for network MTU changes. +Value in seconds, default: 30 + +.TP +knet_mtu +Switch between manual and automatic MTU discovery. A value of 0 means +automatic, other values set a manual MTU. +In a setup with multiple interfaces, please specify +the lowest MTU of the selected interfaces. + +The default value is 0. + +.TP +block_unlisted_ips +Allow UDPU and KNET to drop packets from IP addresses that are not known +(nodes which don't exist in the nodelist) to corosync. +Value is yes or no. + +This feature is mainly to protect against the joining of nodes +with outdated configurations after a cluster split. +Another use case is to allow the atomic merge of two independent clusters. + +Changing the default value is not recommended, the overhead is tiny and +an existing cluster may fail if corosync is started on an unlisted node +with an old configuration. + +The default value is yes. + +.TP +cancel_token_hold_on_retransmit +Allows Corosync to hold token by representative when there is too much +retransmit messages. This allows network to process increased load without +overloading it. Used mechanism is same as described for +.B hold +directive. + +Some deployments may prefer to never hold token when there is +retransmit messages. If so, option should be set to yes. + +The default value is no. + +.PP +Within the +.B logging +directive, there are several configuration options which are all optional. + +.PP +The following 3 options are valid only for the top level logging directive: + +.TP +timestamp +This specifies that a timestamp is placed on all log messages. It can be one +of off (no timestamp), on (second precision timestamp) or +hires (millisecond precision timestamp - only when supported by LibQB). + +The default is hires (or on if hires is not supported). + +.TP +fileline +This specifies that file and line should be printed. + +The default is off. + +.TP +function_name +This specifies that the code function name should be printed. + +The default is off. + +.TP +blackbox +This specifies that blackbox functionality should be enabled. + +The default is on. + +.PP +The following options are valid both for top level logging directive +and they can be overridden in logger_subsys entries. + +.TP +to_stderr +.TP +to_logfile +.TP +to_syslog +These specify the destination of logging output. Any combination of +these options may be specified. Valid options are +.B yes +and +.B no. + +The default is syslog and stderr. + +Please note, if you are using to_logfile and want to rotate the file, use logrotate(8) +with the option +.B +copytruncate. +eg. +.ne 18 +.RS +.nf +.ft CW +/var/log/corosync.log { + missingok + compress + notifempty + daily + rotate 7 + copytruncate +} +.ft +.fi +.RE + +.TP +logfile +If the +.B to_logfile +directive is set to +.B yes +, this option specifies the pathname of the log file. + +No default. + +.TP +logfile_priority +This specifies the logfile priority for this particular subsystem. Ignored if debug is on. +Possible values are: alert, crit, debug (same as debug = on), emerg, err, info, notice, warning. + +The default is: info. + +.TP +syslog_facility +This specifies the syslog facility type that will be used for any messages +sent to syslog. options are daemon, local0, local1, local2, local3, local4, +local5, local6 & local7. + +The default is daemon. + +.TP +syslog_priority +This specifies the syslog level for this particular subsystem. Ignored if debug is on. +Possible values are: alert, crit, debug (same as debug = on), emerg, err, info, notice, warning. + +The default is: info. + +.TP +debug +This specifies whether debug output is logged for this particular logger. Also can contain +value trace, what is highest level of debug information. + +The default is off. + +.PP +Within the +.B logging +directive, logger_subsys directives are optional. + +.PP +Within the +.B logger_subsys +sub-directive, all of the above logging configuration options are valid and +can be used to override the default settings. +The subsys entry, described below, is mandatory to identify the subsystem. + +.TP +subsys +This specifies the subsystem identity (name) for which logging is specified. This is the +name used by a service in the log_init() call. E.g. 'CPG'. This directive is +required. + +.PP +Within the +.B quorum +directive it is possible to specify the quorum algorithm to use with the + +.TP +provider +directive. At the time of writing only corosync_votequorum is supported. +See votequorum(5) for configuration options. + +.PP +Within the +.B nodelist +directive it is possible to specify specific information about nodes in cluster. Directive +can contain only +.B node +sub-directive, which specifies every node that should be a member of the membership, and where +non-default options are needed. Every node must have at least ring0_addr field filled. + +Every node that should be a member of the membership must be specified. + +Possible options are: +.TP +ringX_addr +This specifies IP or network hostname address of the particular node. +X is a link number. + +.TP +nodeid +This configuration option is required for each node for Kronosnet mode. +It is a 32 bit value specifying the node identifier delivered to the +cluster membership service. The node identifier value of zero is +reserved and should not be used. If knet is set, this field must be set. + +.TP +name +This option is used mainly with knet transport to identify local node. +It's also used by client software (pacemaker). +Algorithm for identifying local node is following: +.RS +.IP 1. +Looks up $HOSTNAME in the nodelist +.IP 2. +If this fails strip the domain name from $HOSTNAME and looks up +that in the nodelist +.IP 3. +If this fails look in the nodelist for a fully-qualified name whose +short version matches the short version of $HOSTNAME +.IP 4. +If all this fails then search the interfaces list for an address that +matches a name in the nodelist +.RE + +.PP +Within the +.B system +directive it is possible to specify system options. + +Possible options are: +.TP +qb_ipc_type +This specifies type of IPC to use. Can be one of native (default), shm and socket. +Native means one of shm or socket, depending on what is supported by OS. On systems +with support for both, SHM is selected. SHM is generally faster, but need to allocate +ring buffer file in /dev/shm. + +.TP +sched_rr +Should be set to yes (default) if corosync should try to set round robin realtime +scheduling with maximal priority to itself. When setting of scheduler fails, fallback to set +maximal priority. + +.TP +priority +Set priority of corosync process. Valid only when sched_rr is set to no. +Can be ether numeric value with similar meaning as +.BR nice (1) +or +.B max +/ +.B min +meaning maximal / minimal priority (so minimal / maximal nice value). + +.TP +move_to_root_cgroup +Can be one of +.B yes +(Corosync always moves itself to root cgroup), +.B no +(Corosync never tries to move itself to root cgroup) or +.B auto +(Corosync first checks if sched_rr is enabled, and if +so, it tries to set round robin realtime scheduling with maximal priority to itself. +If setting of priority fails, corosync tries to move itself to root +cgroup and retries setting of priority). + +This feature is available only for systems with cgroups v1 with RT +sched enabled (Linux with CONFIG_RT_GROUP_SCHED kernel option) and cgroups v2. + +It's worth noting that currently (May 3 2021) cgroup2 doesn’t yet +support control of realtime processes and the cpu controller can only be +enabled when all RT processes are in the root cgroup (applies only for kernel +with CONFIG_RT_GROUP_SCHED enabled). So when move_to_root_cgroup +is disabled, kernel is compiled with CONFIG_RT_GROUP_SCHED and systemd is used, +it may be impossible to make systemd options +like CPUQuota working correctly until corosync is stopped. + +Also when moving to root cgroup is enforced and used together with cgroup2 and systemd +it makes impossible (most of the time) for journald to add systemd specific +metadata (most importantly _SYSTEMD_UNIT) properly, because corosync is +moved out of cgroup created by systemd. This means +it is not possible to filter corosync logged messages based on these metadata +(for example using -u or _SYSTEMD_UNIT=UNIT pattern) and also running +systemctl status doesn't display (all) corosync log messages. +The problem is even worse because journald caches pid for some time +(approx. 5 sec) so initial corosync messages have correct metadata. + +.TP +allow_knet_handle_fallback +If knet handle creation fails using privileged operations, allow fallback to +creating knet handle using unprivileged operations. Defaults to no, meaning +if privileged knet handle creation fails, corosync will refuse to start. + +The knet handle will always be created using privileged operations if possible, +setting this to yes only allows fallback to unprivileged operations. This fallback +may result in performance issues, but if running in an unprivileged environment, +e.g. as a normal user or in unprivileged container, this may be required. + +.TP +state_dir +Existing directory where corosync should chdir into. Corosync stores +important state files and blackboxes there. + +The default is /var/lib/corosync. + +.PP +Within the +.B resources +directive it is possible to specify options for resources. + +Possible option is: +.TP +watchdog_device +(Valid only if Corosync was compiled with watchdog support.) +.br +Watchdog device to use, for example /dev/watchdog. +If unset, empty or "off", no watchdog is used. +.IP +In a cluster with properly configured power fencing a watchdog +provides no additional value. On the other hand, slow watchdog +communication may incur multi-second delays in the Corosync main loop, +potentially breaking down membership. IPMI watchdogs are particularly +notorious in this regard: read about kipmid_max_busy_us in IPMI.txt in +the Linux kernel documentation. + + +.PP +Within the +.B nozzle +directive it is possible to specify options for a libnozzle device. This is a pseudo +ethernet device that routes network traffic through a channel on the corosync knet network +(NOT cpg or any corosync internal service) to other nodes in the cluster. This allows +applications to take advantage of knet features such as multipathing, automatic failover, +link switching etc. Note that libnozzle is not a reliable transport, but you can tunnel TCP +through it for reliable communications. +.br +libnozzle also supports optional interface up/down scripts that are kept under a +/etc/corosync/updown.d/ directory. See the knet documentation for more information. +.br +Only one nozzle device is allowed. +.br +The nozzle stanza takes several options: +.TP +name +The name of the network device to be created. On Linux this may be any name at all, other +platforms have restrictions on the name. +.TP +ipaddr +The IP address (IPv6 or IPv4) of the interface. The bottom part of this address will be replaced +by the local node's nodeid in conjunction with ipprefix. so, eg +ipaddr: 192.168.1.0 +ipprefix: 24 +will make nodeids 1,2,5 use IP addresses 192.168.1.1, 192.168.1.2 & 192.168.1.5. +If a prefix length of 16 is used then the bottom two bytes will be filled in with nodeid numbers. +IPv6 addresses must end in '::', the nodeid will be added after the two colons to make the +local IP address. +Only one IP address is currently supported in the corosync.conf file. Additional IP addresses +can be added in the ifup script if necessary. +.TP +ipprefix +specifies the IP address prefix for the nozzle device (see above) +.TP +macaddr +Specifies the MAC address prefix for the nozzle device. As for the IP address, the bottom part +of the MAC address will be filled in with the node id. In this case no prefix applies, the bottom +two bytes of the MAC address will always be overwritten with the node id. So specifying +macaddr: 54:54:12:24:12:12 on nodeid 1 will result in it having a MAC address of 54:54:12:24:00:01 + +.SH "TO ADD A NEW NODE TO THE CLUSTER" +For example to add a node with address 10.24.38.108 with nodeid 3. The node has the name NEW +(in DNS or /etc/hosts) and is not currently running corosync. The current corosync.conf nodelist +looks like this: +.PP +.nf +.RS +nodelist { + node { + nodeid: 1 + ring0_addr: 10.24.38.101 + name: node1 + } + node { + nodeid: 2 + ring0_addr: 10.24.38.102 + name: node2 + + } +} +.RE +.fi +.PP +Add a new entry for the node below the existing nodes. Node entries don't have +to be in nodeid order, but it will help keep you sane. So the nodelist now looks like this: +.PP +.nf +.RS +nodelist { + node { + nodeid: 1 + ring0_addr: 10.24.38.101 + name: node1 + } + node { + nodeid: 2 + ring0_addr: 10.24.38.102 + name: node2 + + } + node { + nodeid: 3 + ring0_addr: 10.24.38.108 + name: NEW + + } +} +.RE +.fi +.PP + +.PP +This file must then be copied onto all three nodes - the existing two nodes, and the new one. +On one of the existing corosync nodes, tell corosync to re-read the updated config file into memory: +.PP +.nf +.RS +corosync-cfgtool -R +.RE +.fi +.PP +This command only needs to be run on one node in the cluster. You may then start corosync on the NEW node +and it should join the cluster. If this doesn't work as expected then check the communications between all +three nodes is working, and check the syslog files on all nodes for more information. It's important to note +that the key bit of information about a node failing to join might be on a different node than you expect. + +.SH "TO REMOVE A NODE FROM THE CLUSTER" +This is the reverse procedure to 'Adding a node' above. First you need to shut down the node you will +be removing from the cluster. +.PP +.nf +.RS +corosync-cfgtool -H +.RE +.fi + + +.PP +Then delete the nodelist stanza from corosync.conf and finally update corosync on the remaining nodes by +running +.PP +.nf +.RS +corosync-cfgtool -R +.RE +.fi +.TP +on one of them. + +.SH "ADDRESS RESOLUTION" +corosync resolves ringX_addr names/IP addresses using the getaddrinfo(3) call with respect +of totem.ip_version setting. + +getaddrinfo() function uses a sophisticated algorithm to sort node addresses into a preferred +order and corosync always chooses the first address in that list of the required family. +As such it is essential that your DNS or /etc/hosts files are correctly configured so that +all addresses for ringX appear on the same network (or are reachable with minimal hops) +and over the same IP protocol. If this is not the case then some nodes might not be able +to join the cluster. It is possible to override the search order used +by getaddrinfo() using the configuration file /etc/gai.conf(5) if necessary, +but this is not recommended. + +If there is any doubt about the order of addresses returned from getaddrinfo() then it might be simpler to use +IP addresses (v4 or v6) in the ringX_addr field. + +.SH "FILES" +.TP +/etc/corosync/corosync.conf +The corosync executive configuration file. + +.SH "SEE ALSO" +.BR corosync_overview (7), +.BR votequorum (5), +.BR corosync-qdevice (8), +.BR logrotate (8) +.BR getaddrinfo (3) +.BR gai.conf (5) +.PP |