author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-06-03 13:39:28 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-06-03 13:39:28 +0000 |
commit | 924f5ea83e48277e014ebf0d19a27187cb93e2f7 | |
tree | 75920a275bba045f6d108204562c218a9a26ea15 /doc/sphinx/Pacemaker_Explained/cluster-options.rst | |
parent | Adding upstream version 2.1.7. | |
Adding upstream version 2.1.8~rc1. (refs: upstream/2.1.8_rc1, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/sphinx/Pacemaker_Explained/cluster-options.rst')
-rw-r--r-- | doc/sphinx/Pacemaker_Explained/cluster-options.rst | 242 |
1 files changed, 80 insertions, 162 deletions
diff --git a/doc/sphinx/Pacemaker_Explained/cluster-options.rst b/doc/sphinx/Pacemaker_Explained/cluster-options.rst
index 77bd7e6..042ed0b 100644
--- a/doc/sphinx/Pacemaker_Explained/cluster-options.rst
+++ b/doc/sphinx/Pacemaker_Explained/cluster-options.rst
@@ -62,143 +62,6 @@ Normally, you will use command-line tools that abstract the XML, so the
 distinction will be unimportant; both properties and options are cluster
 settings you can tweak.
 
-Configuration Value Types
-#########################
-
-Throughout this document, configuration values will be designated as having one
-of the following types:
-
-.. list-table:: **Configuration Value Types**
-   :class: longtable
-   :widths: 1 3
-   :header-rows: 1
-
-   * - Type
-     - Description
-   * - .. _boolean:
-
-       .. index::
-          pair: type; boolean
-
-       boolean
-     - Case-insensitive text value where ``1``, ``yes``, ``y``, ``on``,
-       and ``true`` evaluate as true and ``0``, ``no``, ``n``, ``off``,
-       ``false``, and unset evaluate as false
-   * - .. _date_time:
-
-       .. index::
-          pair: type; date/time
-
-       date/time
-     - Textual timestamp like ``Sat Dec 21 11:47:45 2013``
-   * - .. _duration:
-
-       .. index::
-          pair: type; duration
-
-       duration
-     - A time duration, specified either like a :ref:`timeout <timeout>` or an
-       `ISO 8601 duration <https://en.wikipedia.org/wiki/ISO_8601#Durations>`_.
-       A duration may be up to approximately 49 days but is intended for much
-       smaller time periods.
-   * - .. _enumeration:
-
-       .. index::
-          pair: type; enumeration
-
-       enumeration
-     - Text that must be one of a set of defined values (which will be listed
-       in the description)
-   * - .. _integer:
-
-       .. index::
-          pair: type; integer
-
-       integer
-     - 32-bit signed integer value (-2,147,483,648 to 2,147,483,647)
-   * - .. _nonnegative_integer:
-
-       .. index::
-          pair: type; nonnegative integer
-
-       nonnegative integer
-     - 32-bit nonnegative integer value (0 to 2,147,483,647)
-   * - .. _port:
-
-       .. index::
-          pair: type; port
-
-       port
-     - Integer TCP port number (0 to 65535)
-   * - .. _score:
-
-       .. index::
-          pair: type; score
-
-       score
-     - A Pacemaker score can be an integer between -1,000,000 and 1,000,000, or
-       a string alias: ``INFINITY`` or ``+INFINITY`` is equivalent to
-       1,000,000, ``-INFINITY`` is equivalent to -1,000,000, and ``red``,
-       ``yellow``, and ``green`` are equivalent to integers as described in
-       :ref:`node-health`.
-   * - .. _text:
-
-       .. index::
-          pair: type; text
-
-       text
-     - A text string
-   * - .. _timeout:
-
-       .. index::
-          pair: type; timeout
-
-       timeout
-     - A time duration, specified as a bare number (in which case it is
-       considered to be in seconds) or a number with a unit (``ms`` or ``msec``
-       for milliseconds, ``us`` or ``usec`` for microseconds, ``s`` or ``sec``
-       for seconds, ``m`` or ``min`` for minutes, ``h`` or ``hr`` for hours)
-       optionally with whitespace before and/or after the number.
-   * - .. _version:
-
-       .. index::
-          pair: type; version
-
-       version
-     - Version number (any combination of alphanumeric characters, dots, and
-       dashes, starting with a number).
-
-
-Scores
-______
-
-Scores are integral to how Pacemaker works. Practically everything from moving
-a resource to deciding which resource to stop in a degraded cluster is achieved
-by manipulating scores in some way.
-
-Scores are calculated per resource and node. Any node with a negative score for
-a resource can't run that resource. The cluster places a resource on the node
-with the highest score for it.
-
-Score addition and subtraction follow these rules:
-
-* Any value (including ``INFINITY``) - ``INFINITY`` = ``-INFINITY``
-* ``INFINITY`` + any value other than ``-INFINITY`` = ``INFINITY``
-
-.. note::
-
-   What if you want to use a score higher than 1,000,000? Typically this possibility
-   arises when someone wants to base the score on some external metric that might
-   go above 1,000,000.
-
-   The short answer is you can't.
-
-   The long answer is it is sometimes possible work around this limitation
-   creatively. You may be able to set the score to some computed value based on
-   the external metric rather than use the metric directly. For nodes, you can
-   store the metric as a node attribute, and query the attribute when computing
-   the score (possibly as part of a custom resource agent).
-
 CIB Properties
 ##############
 
@@ -321,6 +184,15 @@ holds. So the decision was made to place them in an easy-to-find location.
      -
      - Node ID of the cluster's current designated controller (DC).
        Used and maintained by the cluster.
+   * - .. _execution_date:
+
+       .. index::
+          pair: execution-date; cib
+
+       execution-date
+     - :ref:`epoch time <epoch_time>`
+     -
+     - Time to use when evaluating rules.
 
 .. _cluster_options:
 
@@ -427,6 +299,29 @@ values, by running the ``man pacemaker-schedulerd`` and
      - The number of :ref:`live migration <live-migration>` actions that the
        cluster is allowed to execute in parallel on a node. A value of -1 means
        unlimited.
+   * - .. _load_threshold:
+
+       .. index::
+          pair: cluster option; load-threshold
+
+       load-threshold
+     - :ref:`percentage <percentage>`
+     - 80%
+     - Maximum amount of system load that should be used by cluster nodes. The
+       cluster will slow down its recovery process when the amount of system
+       resources used (currently CPU) approaches this limit.
+   * - .. _node_action_limit:
+
+       .. index::
+          pair: cluster option; node-action-limit
+
+       node-action-limit
+     - :ref:`integer <integer>`
+     - 0
+     - Maximum number of jobs that can be scheduled per node. If nonpositive or
+       invalid, double the number of cores is used as the maximum number of jobs
+       per node. :ref:`PCMK_node_action_limit <pcmk_node_action_limit>`
+       overrides this option on a per-node basis.
    * - .. _symmetric_cluster:
 
        .. index::
@@ -558,6 +453,22 @@ values, by running the ``man pacemaker-schedulerd`` and
      - How many times fencing can fail for a target before the cluster will
        no longer immediately re-attempt it. Any value below 1 will be ignored,
        and the default will be used instead.
+   * - .. _have_watchdog:
+
+       .. index::
+          pair: cluster option; have-watchdog
+
+       have-watchdog
+     - :ref:`boolean <boolean>`
+     - *detected*
+     - Whether watchdog integration is enabled. This is set automatically by the
+       cluster according to whether SBD is detected to be in use.
+       User-configured values are ignored. The value `true` is meaningful if
+       diskless SBD is used and
+       :ref:`stonith-watchdog-timeout <stonith_watchdog_timeout>` is nonzero. In
+       that case, if fencing is required, watchdog-based self-fencing will be
+       performed via SBD without requiring a fencing resource explicitly
+       configured.
    * - .. _stonith_watchdog_timeout:
 
        .. index::
@@ -568,23 +479,29 @@ values, by running the ``man pacemaker-schedulerd`` and
      - 0
      - If nonzero, and the cluster detects ``have-watchdog`` as ``true``, then
        watchdog-based self-fencing will be performed via SBD when fencing is
-       required, without requiring a fencing resource explicitly configured.
-
-       If this is set to a positive value, unseen nodes are assumed to
-       self-fence within this much time.
+       required.
 
-       **Warning:** It must be ensured that this value is larger than the
-       ``SBD_WATCHDOG_TIMEOUT`` environment variable on all nodes. Pacemaker
-       verifies the settings individually on all nodes and prevents startup or
-       shuts down if configured wrongly on the fly. It is strongly recommended
-       that ``SBD_WATCHDOG_TIMEOUT`` be set to the same value on all nodes.
+       If this is set to a positive value, lost nodes are assumed to achieve
+       self-fencing within this much time.
+
+       This does not require a fencing resource to be explicitly configured,
+       though a fence_watchdog resource can be configured, to limit use to
+       specific nodes.
+
+       If this is set to 0 (the default), the cluster will never assume
+       watchdog-based self-fencing.
+
+       If this is set to a negative value, the cluster will use twice the local
+       value of the ``SBD_WATCHDOG_TIMEOUT`` environment variable if that is
+       positive, or otherwise treat this as 0.
 
-       If this is set to a negative value, and ``SBD_WATCHDOG_TIMEOUT`` is set,
-       twice that value will be used.
+       **Warning:** When used, this timeout must be larger than
+       ``SBD_WATCHDOG_TIMEOUT`` on all nodes that use watchdog-based SBD, and
+       Pacemaker will refuse to start on any of those nodes where this is not
+       true for the local value or SBD is not active. When this is set to a
+       negative value, ``SBD_WATCHDOG_TIMEOUT`` must be set to the same value
+       on all nodes that use SBD, otherwise data corruption or loss could occur.
 
-       **Warning:** In this case, it is essential (and currently not verified
-       by pacemaker) that ``SBD_WATCHDOG_TIMEOUT`` is set to the same value on
-       all nodes.
    * - .. _concurrent-fencing:
 
        .. index::
@@ -607,12 +524,13 @@ values, by running the ``man pacemaker-schedulerd`` and
      - :ref:`enumeration <enumeration>`
      - stop
      - How should a cluster node react if notified of its own fencing? A
-       cluster node may receive notification of its own fencing if fencing is
-       misconfigured, or if fabric fencing is in use that doesn't cut cluster
-       communication. Allowed values are ``stop`` to attempt to immediately
-       stop Pacemaker and stay stopped, or ``panic`` to attempt to immediately
-       reboot the local node, falling back to stop on failure. The default is
-       likely to be changed to ``panic`` in a future release. *(since 2.0.3)*
+       cluster node may receive notification of a "succeeded" fencing that
+       targeted it if fencing is misconfigured, or if fabric fencing is in use
+       that doesn't cut cluster communication. Allowed values are ``stop`` to
+       attempt to immediately stop Pacemaker and stay stopped, or ``panic`` to
+       attempt to immediately reboot the local node, falling back to stop on
+       failure. The default is likely to be changed to ``panic`` in a future
+       release. *(since 2.0.3)*
    * - .. _priority_fencing_delay:
 
        .. index::
@@ -784,7 +702,7 @@ values, by running the ``man pacemaker-schedulerd`` and
 
        node-health-red
      - :ref:`score <score>`
-     - 0
+     - -INFINITY
      - The score to use for a node health attribute whose value is ``red``.
        Only used when ``node-health-strategy`` is ``progressive`` or
        ``custom``.
@@ -797,10 +715,10 @@ values, by running the ``man pacemaker-schedulerd`` and
      - :ref:`duration <duration>`
      - 15min
      - Pacemaker is primarily event-driven, and looks ahead to know when to
-       recheck the cluster for failure timeouts and most time-based rules
-       *(since 2.0.3)*. However, it will also recheck the cluster after this
-       amount of inactivity. This has two goals: rules with ``date_spec`` are
-       only guaranteed to be checked this often, and it also serves as a
+       recheck the cluster for failure-timeout settings and most time-based
+       rules *(since 2.0.3)*. However, it will also recheck the cluster after
+       this amount of inactivity. This has two goals: rules with ``date_spec``
+       are only guaranteed to be checked this often, and it also serves as a
        fail-safe for some kinds of scheduler bugs. A value of 0 disables this
        polling.
    * - .. _shutdown_lock: