Diffstat (limited to 'doc/sphinx/Pacemaker_Administration')
-rw-r--r-- | doc/sphinx/Pacemaker_Administration/administrative.rst | 150
-rw-r--r-- | doc/sphinx/Pacemaker_Administration/alerts.rst         |   6
-rw-r--r-- | doc/sphinx/Pacemaker_Administration/configuring.rst    | 109
-rw-r--r-- | doc/sphinx/Pacemaker_Administration/index.rst          |   2
-rw-r--r-- | doc/sphinx/Pacemaker_Administration/moving.rst         | 305
-rw-r--r-- | doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst      |  14
6 files changed, 516 insertions, 70 deletions
diff --git a/doc/sphinx/Pacemaker_Administration/administrative.rst b/doc/sphinx/Pacemaker_Administration/administrative.rst
new file mode 100644
index 0000000..7c8b346
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/administrative.rst
@@ -0,0 +1,150 @@
+.. index::
+   single: administrative mode
+
+Administrative Modes
+--------------------
+
+Intrusive administration can be performed on a Pacemaker cluster without
+causing resource failures, recovery, and fencing, by putting the cluster or a
+subset of it into an administrative mode.
+
+Pacemaker supports several administrative modes:
+
+* Maintenance mode for the entire cluster, specific nodes, or specific
+  resources
+* Unmanaged resources
+* Disabled configuration items
+* Standby mode for specific nodes
+
+Rules may be used to automatically set any of these modes for specific times or
+other conditions.
+
+
+.. index::
+   pair: administrative mode; maintenance mode
+
+.. _maintenance_mode:
+
+Maintenance Mode
+################
+
+In maintenance mode, the cluster will not start or stop resources. Recurring
+monitors for affected resources will be paused, except those specifying
+``role`` as ``Stopped``.
+
+To put a specific resource into maintenance mode, set the resource's
+``maintenance`` meta-attribute to ``true``.
+
+To put all active resources on a specific node into maintenance mode, set the
+node's ``maintenance`` node attribute to ``true``. When enabled, this overrides
+resource-specific maintenance mode.
+
+.. warning::
+
+   Restarting Pacemaker on a node that is in single-node maintenance mode will
+   likely lead to undesirable effects. If ``maintenance`` is set as a transient
+   attribute, it will be erased when Pacemaker is stopped, which will
+   immediately take the node out of maintenance mode and likely get it fenced.
+   If set as a permanent attribute, any resources active on the node will have
+   their local history erased when Pacemaker is restarted, so the cluster will
+   no longer consider them running on the node and thus will consider them
+   managed again, allowing them to be started elsewhere.
+
+To put all resources in the cluster into maintenance mode, set the
+``maintenance-mode`` cluster option to ``true``. When enabled, this overrides
+node- or resource-specific maintenance mode.
+
+Maintenance mode, at any level, overrides other administrative modes.
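
For illustration, the three levels described above map onto the command-line
tools roughly as follows; this is a sketch assuming a hypothetical resource
``myrsc`` and node ``node1``:

.. code-block:: none

   # crm_resource --resource myrsc --meta --set-parameter maintenance --parameter-value true
   # crm_attribute --node node1 --name maintenance --update true
   # crm_attribute --name maintenance-mode --update true

The first command affects only ``myrsc``, the second affects all resources
active on ``node1``, and the third affects every resource in the cluster.
Setting the same attribute or option to ``false`` leaves maintenance mode.
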
+
+
+.. index::
+   pair: administrative mode; unmanaged resources
+
+.. _unmanaged_resources:
+
+Unmanaged Resources
+###################
+
+An unmanaged resource will not be started or stopped by the cluster. A resource
+may become unmanaged in several ways:
+
+* The administrator may set the ``is-managed`` resource meta-attribute to
+  ``false`` (whether for a specific resource, or all resources without an
+  explicit setting via ``rsc_defaults``)
+* :ref:`Maintenance mode <maintenance_mode>` causes affected resources to
+  become unmanaged (and overrides any ``is-managed`` setting)
+* Certain types of failure cause affected resources to become unmanaged. These
+  include:
+
+  * Failed stop operations when the ``stonith-enabled`` cluster property is set
+    to ``false``
+  * Failure of an operation that has ``on-fail`` set to ``block``
+  * A resource detected as incorrectly active on more than one node when its
+    ``multiple-active`` meta-attribute is set to ``block``
+  * A resource constrained by a revoked ``rsc_ticket`` with ``loss-policy`` set
+    to ``freeze``
+  * Resources with ``requires`` set (or defaulting) to anything other than
+    ``nothing`` in a partition that loses quorum when the ``no-quorum-policy``
+    cluster option is set to ``freeze``
+
+Recurring actions are not affected by unmanaging a resource.
+
+.. warning::
+
+   Manually starting an unmanaged resource on a different node is strongly
+   discouraged. It will at least cause the cluster to consider the resource
+   failed, and may require the resource's ``target-role`` to be set to
+   ``Stopped`` then ``Started`` in order for recovery to succeed.
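
As a concrete sketch of the ``is-managed`` route (the resource name is made
up), a resource could be unmanaged and later returned to cluster control with:

.. code-block:: none

   # crm_resource --resource myrsc --meta --set-parameter is-managed --parameter-value false
   # crm_resource --resource myrsc --meta --delete-parameter is-managed

Deleting the meta-attribute, rather than setting it back to ``true``, lets the
resource fall back to whatever ``rsc_defaults`` specifies.
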
+
+
+.. index::
+   pair: administrative mode; disabled configuration
+
+.. _disabled_configuration:
+
+Disabled Configuration
+######################
+
+Some configuration elements disable particular behaviors:
+
+* The ``stonith-enabled`` cluster option, when set to ``false``, disables node
+  fencing. This is highly discouraged, as it can lead to data unavailability,
+  loss, or corruption.
+
+* The ``stop-all-resources`` cluster option, when set to ``true``, causes all
+  resources to be stopped.
+
+* Certain elements support an ``enabled`` meta-attribute, which if set to
+  ``false``, causes the cluster to act as if the specific element is not
+  configured. These include ``op``, ``alert`` *(since 2.1.6)*, and
+  ``recipient`` *(since 2.1.6)*. ``enabled`` may be set for specific ``op``
+  elements, or all operations without an explicit setting via ``op_defaults``.
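
To make the ``enabled`` meta-attribute more concrete, a disabled recurring
monitor might look like the following sketch (the identifiers are invented):

.. code-block:: xml

   <op id="myrsc-monitor-10s" name="monitor" interval="10s" enabled="false"/>

The operation definition stays in the CIB, but the cluster behaves as if it
were not configured until ``enabled`` is set back to ``true`` or removed.
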
+
+
+.. index::
+   pair: administrative mode; standby
+
+.. _standby:
+
+Standby Mode
+############
+
+When a node is put into standby, all resources will be moved away from the
+node, and all recurring operations will be stopped on the node, except those
+specifying ``role`` as ``Stopped`` (which will be newly initiated if
+appropriate).
+
+A node may be put into standby mode by setting its ``standby`` node attribute
+to ``true``. The attribute may be queried and set using the ``crm_standby``
+tool.
+
+
+.. index::
+   pair: administrative mode; rules
+
+Rules
+#####
+
+Rules may be used to set administrative mode options automatically according to
+various criteria such as date and time. See the "Rules" chapter of the
+*Pacemaker Explained* document for details.
diff --git a/doc/sphinx/Pacemaker_Administration/alerts.rst b/doc/sphinx/Pacemaker_Administration/alerts.rst
index c0f54c6..42efc8d 100644
--- a/doc/sphinx/Pacemaker_Administration/alerts.rst
+++ b/doc/sphinx/Pacemaker_Administration/alerts.rst
@@ -287,7 +287,7 @@ Special concerns when writing alert agents:
   this into consideration, for example by queueing resource-intensive actions
   into some other instance, instead of directly executing them.
 
-* Alert agents are run as the ``hacluster`` user, which has a minimal set
+* Alert agents are run as the |CRM_DAEMON_USER| user, which has a minimal set
   of permissions. If an agent requires additional privileges, it is
   recommended to configure ``sudo`` to allow the agent to run the necessary
   commands as another user with the appropriate privileges.
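
A minimal sketch of the ``sudo`` recommendation in the hunk above, assuming the
cluster daemon user is ``hacluster`` and a hypothetical helper script, might be
a drop-in sudoers file such as:

.. code-block:: none

   # /etc/sudoers.d/pacemaker-alerts (illustrative only)
   hacluster ALL = (root) NOPASSWD: /usr/local/bin/alert-helper.sh

The alert agent would then run ``sudo /usr/local/bin/alert-helper.sh`` for the
one privileged step instead of needing broader permissions itself.
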
@@ -297,7 +297,7 @@ Special concerns when writing alert agents:
   user-configured ``timestamp-format``), ``CRM_alert_recipient,`` and all
   instance attributes. Mostly this is needed simply to protect against
   configuration errors, but if some user can modify the CIB without having
-  ``hacluster``-level access to the cluster nodes, it is a potential security
+  |CRM_DAEMON_USER| access to the cluster nodes, it is a potential security
   concern as well, to avoid the possibility of code injection.
 
 .. note:: **ocf:pacemaker:ClusterMon compatibility**
@@ -308,4 +308,4 @@ Special concerns when writing alert agents:
    passed to alert agents are available prepended with ``CRM_notify_`` as well
    as ``CRM_alert_``. One break in compatibility is that ``ClusterMon`` ran
    external scripts as the ``root`` user, while alert agents are run as the
-   ``hacluster`` user.
+   |CRM_DAEMON_USER| user.
diff --git a/doc/sphinx/Pacemaker_Administration/configuring.rst b/doc/sphinx/Pacemaker_Administration/configuring.rst
index 415dd81..295c96a 100644
--- a/doc/sphinx/Pacemaker_Administration/configuring.rst
+++ b/doc/sphinx/Pacemaker_Administration/configuring.rst
@@ -189,48 +189,53 @@ cluster even if the machine itself is not in the same cluster. To do this, one
 simply sets up a number of environment variables and runs the same commands as
 when working on a cluster node.
 
-.. table:: **Environment Variables Used to Connect to Remote Instances of the CIB**
-
-   +----------------------+-----------+------------------------------------------------+
-   | Environment Variable | Default   | Description                                    |
-   +======================+===========+================================================+
-   | CIB_user             | $USER     | .. index::                                     |
-   |                      |           |    single: CIB_user                            |
-   |                      |           |    single: environment variable; CIB_user      |
-   |                      |           |                                                |
-   |                      |           | The user to connect as. Needs to be            |
-   |                      |           | part of the ``haclient`` group on              |
-   |                      |           | the target host.                               |
-   +----------------------+-----------+------------------------------------------------+
-   | CIB_passwd           |           | .. index::                                     |
-   |                      |           |    single: CIB_passwd                          |
-   |                      |           |    single: environment variable; CIB_passwd    |
-   |                      |           |                                                |
-   |                      |           | The user's password. Read from the             |
-   |                      |           | command line if unset.                         |
-   +----------------------+-----------+------------------------------------------------+
-   | CIB_server           | localhost | .. index::                                     |
-   |                      |           |    single: CIB_server                          |
-   |                      |           |    single: environment variable; CIB_server    |
-   |                      |           |                                                |
-   |                      |           | The host to contact                            |
-   +----------------------+-----------+------------------------------------------------+
-   | CIB_port             |           | .. index::                                     |
-   |                      |           |    single: CIB_port                            |
-   |                      |           |    single: environment variable; CIB_port      |
-   |                      |           |                                                |
-   |                      |           | The port on which to contact the server;       |
-   |                      |           | required.                                      |
-   +----------------------+-----------+------------------------------------------------+
-   | CIB_encrypted        | TRUE      | .. index::                                     |
-   |                      |           |    single: CIB_encrypted                       |
-   |                      |           |    single: environment variable; CIB_encrypted |
-   |                      |           |                                                |
-   |                      |           | Whether to encrypt network traffic             |
-   +----------------------+-----------+------------------------------------------------+
+.. list-table:: **Environment Variables Used to Connect to Remote Instances of the CIB**
+   :class: longtable
+   :widths: 2 2 5
+   :header-rows: 1
+
+   * - Environment Variable
+     - Default
+     - Description
+   * - .. index::
+          single: CIB_user
+          single: environment variable; CIB_user
+
+       CIB_user
+     - |CRM_DAEMON_USER_RAW|
+     - The user to connect as. Needs to be part of the |CRM_DAEMON_GROUP| group
+       on the target host.
+   * - .. index::
+          single: CIB_passwd
+          single: environment variable; CIB_passwd
+
+       CIB_passwd
+     -
+     - The user's password. Read from the command line if unset.
+   * - .. index::
+          single: CIB_server
+          single: environment variable; CIB_server
+
+       CIB_server
+     - localhost
+     - The host to contact
+   * - .. index::
+          single: CIB_port
+          single: environment variable; CIB_port
+
+       CIB_port
+     -
+     - The port on which to contact the server; required
+   * - .. index::
+          single: CIB_encrypted
+          single: environment variable; CIB_encrypted
+
+       CIB_encrypted
+     - true
+     - Whether to encrypt network traffic
 
 So, if **c001n01** is an active cluster node and is listening on port 1234
-for connections, and **someuser** is a member of the **haclient** group,
+for connections, and **someuser** is a member of the |CRM_DAEMON_GROUP| group,
 then the following would prompt for **someuser**'s password and return the
 cluster's current configuration:
@@ -243,27 +248,9 @@ For security reasons, the cluster does not listen for remote connections by
 default. If you wish to allow remote access, you need to set the
 ``remote-tls-port`` (encrypted) or ``remote-clear-port`` (unencrypted) CIB
 properties (i.e., those kept in the ``cib`` tag, like ``num_updates`` and
-``epoch``).
-
-.. table:: **Extra top-level CIB properties for remote access**
-
-   +----------------------+-----------+------------------------------------------------------+
-   | CIB Property         | Default   | Description                                          |
-   +======================+===========+======================================================+
-   | remote-tls-port      |           | .. index::                                           |
-   |                      |           |    single: remote-tls-port                           |
-   |                      |           |    single: CIB property; remote-tls-port             |
-   |                      |           |                                                      |
-   |                      |           | Listen for encrypted remote connections              |
-   |                      |           | on this port.                                        |
-   +----------------------+-----------+------------------------------------------------------+
-   | remote-clear-port    |           | .. index::                                           |
-   |                      |           |    single: remote-clear-port                         |
-   |                      |           |    single: CIB property; remote-clear-port           |
-   |                      |           |                                                      |
-   |                      |           | Listen for plaintext remote connections              |
-   |                      |           | on this port.                                        |
-   +----------------------+-----------+------------------------------------------------------+
+``epoch``). Encrypted communication is keyless, which makes it subject to
+man-in-the-middle attacks, and thus either option should be used only on
+protected networks.
 
 .. important::
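
Tying the two hunks above together, a rough sketch of enabling remote access
and then querying the CIB from another machine could look like the following;
the host, port, and user names are placeholders:

.. code-block:: none

   # cibadmin --modify --xml-text '<cib remote-tls-port="1234"/>'

   # CIB_server=c001n01 CIB_port=1234 CIB_user=someuser cibadmin --query

The second command prompts for the password (since ``CIB_passwd`` is unset)
and prints the cluster's current configuration.
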
diff --git a/doc/sphinx/Pacemaker_Administration/index.rst b/doc/sphinx/Pacemaker_Administration/index.rst
index 327ad31..af89380 100644
--- a/doc/sphinx/Pacemaker_Administration/index.rst
+++ b/doc/sphinx/Pacemaker_Administration/index.rst
@@ -22,6 +22,8 @@ Table of Contents
    cluster
    configuring
    tools
+   administrative
+   moving
    troubleshooting
    upgrading
    alerts
diff --git a/doc/sphinx/Pacemaker_Administration/moving.rst b/doc/sphinx/Pacemaker_Administration/moving.rst
new file mode 100644
index 0000000..3d6a92a
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/moving.rst
@@ -0,0 +1,305 @@
+Moving Resources
+----------------
+
+.. index::
+   single: resource; move
+
+Moving Resources Manually
+#########################
+
+There are primarily two occasions when you would want to move a resource from
+its current location: when the whole node is under maintenance, and when a
+single resource needs to be moved.
+
+.. index::
+   single: standby mode
+   single: node; standby mode
+
+Standby Mode
+____________
+
+Since everything eventually comes down to a score, you could create constraints
+for every resource to prevent them from running on one node. While Pacemaker
+configuration can seem convoluted at times, not even we would require this of
+administrators.
+
+Instead, you can set a special node attribute which tells the cluster "don't
+let anything run here". There is even a helpful tool to help query and set it,
+called ``crm_standby``. To check the standby status of the current machine,
+run:
+
+.. code-block:: none
+
+   # crm_standby -G
+
+A value of ``on`` indicates that the node is *not* able to host any resources,
+while a value of ``off`` says that it *can*.
+
+You can also check the status of other nodes in the cluster by specifying the
+``--node`` option:
+
+.. code-block:: none
+
+   # crm_standby -G --node sles-2
+
+To change the current node's standby status, use ``-v`` instead of ``-G``:
+
+.. code-block:: none
+
+   # crm_standby -v on
+
+Again, you can change another host's value by supplying a hostname with
+``--node``.
+
+A cluster node in standby mode will not run resources, but still contributes to
+quorum, and may fence or be fenced by nodes.
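
One extra detail that may be worth noting here: if the installed version of
``crm_standby`` supports the ``--lifetime`` option, standby can be made to last
only until the node's cluster services restart (the node name below is again
hypothetical):

.. code-block:: none

   # crm_standby -v on --node sles-2 --lifetime reboot

With a ``reboot`` lifetime the attribute is stored as a transient attribute and
is cleared automatically, rather than having to be turned ``off`` by hand.
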
+
+Moving One Resource
+___________________
+
+When only one resource is required to move, we could do this by creating
+location constraints. However, once again we provide a user-friendly shortcut
+as part of the ``crm_resource`` command, which creates and modifies the extra
+constraints for you. If ``Email`` were running on ``sles-1`` and you wanted it
+moved to a specific location, the command would look something like:
+
+.. code-block:: none
+
+   # crm_resource -M -r Email -H sles-2
+
+Behind the scenes, the tool will create the following location constraint:
+
+.. code-block:: xml
+
+   <rsc_location id="cli-prefer-Email" rsc="Email" node="sles-2" score="INFINITY"/>
+
+It is important to note that subsequent invocations of ``crm_resource -M`` are
+not cumulative. So, if you ran these commands:
+
+.. code-block:: none
+
+   # crm_resource -M -r Email -H sles-2
+   # crm_resource -M -r Email -H sles-3
+
+then it is as if you had never performed the first command.
+
+To allow the resource to move back again, use:
+
+.. code-block:: none
+
+   # crm_resource -U -r Email
+
+Note the use of the word *allow*. The resource *can* move back to its original
+location, but depending on ``resource-stickiness``, location constraints, and
+so forth, it might stay where it is.
+
+To be absolutely certain that it moves back to ``sles-1``, move it there before
+issuing the call to ``crm_resource -U``:
+
+.. code-block:: none
+
+   # crm_resource -M -r Email -H sles-1
+   # crm_resource -U -r Email
+
+Alternatively, if you only care that the resource should be moved from its
+current location, try:
+
+.. code-block:: none
+
+   # crm_resource -B -r Email
+
+which will instead create a negative constraint, like:
+
+.. code-block:: xml
+
+   <rsc_location id="cli-ban-Email-on-sles-1" rsc="Email" node="sles-1" score="-INFINITY"/>
+
+This will achieve the desired effect, but will also have long-term
+consequences. As the tool will warn you, the creation of a ``-INFINITY``
+constraint will prevent the resource from running on that node until
+``crm_resource -U`` is used. This includes the situation where every other
+cluster node is no longer available!
+
+In some cases, such as when ``resource-stickiness`` is set to ``INFINITY``, it
+is possible that you will end up with nodes with the same score, forcing the
+cluster to choose one (which may not be the one you want). The tool can detect
+some of these cases and deals with them by creating both positive and negative
+constraints. For example:
+
+.. code-block:: xml
+
+   <rsc_location id="cli-ban-Email-on-sles-1" rsc="Email" node="sles-1" score="-INFINITY"/>
+   <rsc_location id="cli-prefer-Email" rsc="Email" node="sles-2" score="INFINITY"/>
+
+which has the same long-term consequences as discussed earlier.
+
+Moving Resources Due to Connectivity Changes
+############################################
+
+You can configure the cluster to move resources when external connectivity is
+lost in two steps.
+
+.. index::
+   single: ocf:pacemaker:ping resource
+   single: ping resource
+
+Tell Pacemaker to Monitor Connectivity
+______________________________________
+
+First, add an ``ocf:pacemaker:ping`` resource to the cluster. The ``ping``
+resource uses the system utility of the same name to test whether a list of
+machines (specified by DNS hostname or IP address) are reachable, and uses the
+results to maintain a node attribute.
+
+The node attribute is called ``pingd`` by default, but is customizable in order
+to allow multiple ping groups to be defined.
+
+Normally, the ping resource should run on all cluster nodes, which means that
+you'll need to create a clone. A template for this can be found below, along
+with a description of the most interesting parameters.
+
+.. table:: **Commonly Used ocf:pacemaker:ping Resource Parameters**
+   :widths: 1 4
+
+   +--------------------+--------------------------------------------------------------+
+   | Resource Parameter | Description                                                  |
+   +====================+==============================================================+
+   | dampen             | .. index::                                                   |
+   |                    |    single: ocf:pacemaker:ping resource; dampen parameter     |
+   |                    |    single: dampen; ocf:pacemaker:ping resource parameter     |
+   |                    |                                                              |
+   |                    | The time to wait (dampening) for further changes to occur.   |
+   |                    | Use this to prevent a resource from bouncing around the      |
+   |                    | cluster when cluster nodes notice the loss of connectivity   |
+   |                    | at slightly different times.                                 |
+   +--------------------+--------------------------------------------------------------+
+   | multiplier         | .. index::                                                   |
+   |                    |    single: ocf:pacemaker:ping resource; multiplier parameter |
+   |                    |    single: multiplier; ocf:pacemaker:ping resource parameter |
+   |                    |                                                              |
+   |                    | The number of connected ping nodes gets multiplied by this   |
+   |                    | value to get a score. Useful when there are multiple ping    |
+   |                    | nodes configured.                                            |
+   +--------------------+--------------------------------------------------------------+
+   | host_list          | .. index::                                                   |
+   |                    |    single: ocf:pacemaker:ping resource; host_list parameter  |
+   |                    |    single: host_list; ocf:pacemaker:ping resource parameter  |
+   |                    |                                                              |
+   |                    | The machines to contact in order to determine the current    |
+   |                    | connectivity status. Allowed values include resolvable DNS   |
+   |                    | connectivity host names, IPv4 addresses, and IPv6 addresses. |
+   +--------------------+--------------------------------------------------------------+
+
+.. topic:: Example ping resource that checks node connectivity once every minute
+
+   .. code-block:: xml
+
+      <clone id="Connected">
+         <primitive id="ping" class="ocf" provider="pacemaker" type="ping">
+            <instance_attributes id="ping-attrs">
+               <nvpair id="ping-dampen" name="dampen" value="5s"/>
+               <nvpair id="ping-multiplier" name="multiplier" value="1000"/>
+               <nvpair id="ping-hosts" name="host_list" value="my.gateway.com www.bigcorp.com"/>
+            </instance_attributes>
+            <operations>
+               <op id="ping-monitor-60s" interval="60s" name="monitor"/>
+            </operations>
+         </primitive>
+      </clone>
+
+.. important::
+
+   You're only half done. The next section deals with telling Pacemaker how to
+   deal with the connectivity status that ``ocf:pacemaker:ping`` is recording.
+
+Tell Pacemaker How to Interpret the Connectivity Data
+_____________________________________________________
+
+.. important::
+
+   Before attempting the following, make sure you understand rules. See the
+   "Rules" chapter of the *Pacemaker Explained* document for details.
+
+There are a number of ways to use the connectivity data.
+
+The most common setup is for people to have a single ping target (for example,
+the service network's default gateway), to prevent the cluster from running a
+resource on any unconnected node.
+
+.. topic:: Don't run a resource on unconnected nodes
+
+   .. code-block:: xml
+
+      <rsc_location id="WebServer-no-connectivity" rsc="Webserver">
+         <rule id="ping-exclude-rule" score="-INFINITY" >
+            <expression id="ping-exclude" attribute="pingd" operation="not_defined"/>
+         </rule>
+      </rsc_location>
+
+A more complex setup is to have a number of ping targets configured. You can
+require the cluster to only run resources on nodes that can connect to all (or
+a minimum subset) of them.
+
+.. topic:: Run only on nodes connected to three or more ping targets
+
+   .. code-block:: xml
+
+      <primitive id="ping" provider="pacemaker" class="ocf" type="ping">
+      ... <!-- omitting some configuration to highlight important parts -->
+         <nvpair id="ping-multiplier" name="multiplier" value="1000"/>
+      ...
+      </primitive>
+      ...
+      <rsc_location id="WebServer-connectivity" rsc="Webserver">
+         <rule id="ping-prefer-rule" score="-INFINITY" >
+            <expression id="ping-prefer" attribute="pingd" operation="lt" value="3000"/>
+         </rule>
+      </rsc_location>
+
+Alternatively, you can tell the cluster only to *prefer* nodes with the best
+connectivity, by using ``score-attribute`` in the rule. Just be sure to set
+``multiplier`` to a value higher than that of ``resource-stickiness`` (and
+don't set either of them to ``INFINITY``).
+
+.. topic:: Prefer node with most connected ping nodes
+
+   .. code-block:: xml
+
+      <rsc_location id="WebServer-connectivity" rsc="Webserver">
+         <rule id="ping-prefer-rule" score-attribute="pingd" >
+            <expression id="ping-prefer" attribute="pingd" operation="defined"/>
+         </rule>
+      </rsc_location>
+
+It is perhaps easier to think of this in terms of the simple constraints that
+the cluster translates it into. For example, if ``sles-1`` is connected to all
+five ping nodes but ``sles-2`` is only connected to two, then it would be as if
+you instead had the following constraints in your configuration:
+
+.. topic:: How the cluster translates the above location constraint
+
+   .. code-block:: xml
+
+      <rsc_location id="ping-1" rsc="Webserver" node="sles-1" score="5000"/>
+      <rsc_location id="ping-2" rsc="Webserver" node="sles-2" score="2000"/>
+
+The advantage is that you don't have to manually update any constraints
+whenever your network connectivity changes.
+
+You can also combine the concepts above into something even more complex. The
+example below shows how you can prefer the node with the most connected ping
+nodes provided they have connectivity to at least three (again assuming that
+``multiplier`` is set to 1000).
+
+.. topic:: More complex example of choosing location based on connectivity
+
+   .. code-block:: xml
+
+      <rsc_location id="WebServer-connectivity" rsc="Webserver">
+         <rule id="ping-exclude-rule" score="-INFINITY" >
+            <expression id="ping-exclude" attribute="pingd" operation="lt" value="3000"/>
+         </rule>
+         <rule id="ping-prefer-rule" score-attribute="pingd" >
+            <expression id="ping-prefer" attribute="pingd" operation="defined"/>
+         </rule>
+      </rsc_location>
diff --git a/doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst b/doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst
index 61ab4e6..3eda60a 100644
--- a/doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst
+++ b/doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst
@@ -118,14 +118,11 @@ Manage Resources
 .. topic:: Create a Resource
 
    .. code-block:: none
-
-      crmsh # crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
-         params ip=192.168.122.120 cidr_netmask=24 \
-         op monitor interval=30s
+      crmsh # crm configure primitive ClusterIP IPaddr2 params ip=192.168.122.120 cidr_netmask=24
       pcs # pcs resource create ClusterIP IPaddr2 ip=192.168.122.120 cidr_netmask=24
 
-pcs determines the standard and provider (``ocf:heartbeat``) automatically
-since ``IPaddr2`` is unique, and automatically creates operations (including
+Both crmsh and pcs determine the standard and provider (``ocf:heartbeat``) automatically
+since ``IPaddr2`` is unique, and automatically create operations (including
 monitor) based on the agent's meta-data.
 
 .. topic:: Show Configuration of All Resources
@@ -270,6 +267,10 @@ edited and verified before committing to the live configuration:
       crmsh # crm configure ms WebDataClone WebData \
          meta master-max=1 master-node-max=1 \
          clone-max=2 clone-node-max=1 notify=true
+      crmsh # crm configure clone WebDataClone WebData \
+         meta promotable=true \
+         promoted-max=1 promoted-node-max=1 \
+         clone-max=2 clone-node-max=1 notify=true
       pcs-0.9 # pcs resource master WebDataClone WebData \
          master-max=1 master-node-max=1 \
          clone-max=2 clone-node-max=1 notify=true
@@ -277,6 +278,7 @@ edited and verified before committing to the live configuration:
          promoted-max=1 promoted-node-max=1 \
          clone-max=2 clone-node-max=1 notify=true
 
+crmsh supports both ways of configuring a promotable clone since crmsh 4.4.0 ('configure ms' is deprecated).
 pcs will generate the clone name automatically if it is omitted from the
 command line.