Diffstat (limited to 'doc/sphinx/Pacemaker_Explained')
 doc/sphinx/Pacemaker_Explained/acls.rst                  |  460
 doc/sphinx/Pacemaker_Explained/advanced-options.rst      |  586
 doc/sphinx/Pacemaker_Explained/advanced-resources.rst    | 1629
 doc/sphinx/Pacemaker_Explained/alerts.rst                |  257
 doc/sphinx/Pacemaker_Explained/ap-samples.rst            |  148
 doc/sphinx/Pacemaker_Explained/constraints.rst           | 1106
 doc/sphinx/Pacemaker_Explained/fencing.rst               | 1298
 doc/sphinx/Pacemaker_Explained/images/resource-set.png   |  bin 0 -> 27238 bytes
 doc/sphinx/Pacemaker_Explained/images/three-sets.png     |  bin 0 -> 69969 bytes
 doc/sphinx/Pacemaker_Explained/images/two-sets.png       |  bin 0 -> 47601 bytes
 doc/sphinx/Pacemaker_Explained/index.rst                 |   41
 doc/sphinx/Pacemaker_Explained/intro.rst                 |   22
 doc/sphinx/Pacemaker_Explained/multi-site-clusters.rst   |  341
 doc/sphinx/Pacemaker_Explained/nodes.rst                 |  441
 doc/sphinx/Pacemaker_Explained/options.rst               |  622
 doc/sphinx/Pacemaker_Explained/resources.rst             | 1074
 doc/sphinx/Pacemaker_Explained/reusing-configuration.rst |  415
 doc/sphinx/Pacemaker_Explained/rules.rst                 | 1021
 doc/sphinx/Pacemaker_Explained/status.rst                |  372
 doc/sphinx/Pacemaker_Explained/utilization.rst           |  264
 20 files changed, 10097 insertions(+), 0 deletions(-)
diff --git a/doc/sphinx/Pacemaker_Explained/acls.rst b/doc/sphinx/Pacemaker_Explained/acls.rst
new file mode 100644
index 0000000..67d5d15
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/acls.rst
@@ -0,0 +1,460 @@
+.. index::
+ single: Access Control List (ACL)
+
+.. _acl:
+
+Access Control Lists (ACLs)
+---------------------------
+
+By default, the ``root`` user or any user in the ``haclient`` group can modify
+Pacemaker's CIB without restriction. Pacemaker offers *access control lists
+(ACLs)* to provide more fine-grained authorization.
+
+.. important::
+
+ Being able to modify the CIB's resource section allows a user to run any
+ executable file as root, by configuring it as an LSB resource with a full
+ path.
+
+ACL Prerequisites
+#################
+
+In order to use ACLs:
+
+* The ``enable-acl`` :ref:`cluster option <cluster_options>` must be set to
+ true (see the example after this list).
+
+* Desired users must have user accounts in the ``haclient`` group on all
+ cluster nodes.
+
+* If your CIB was created before Pacemaker 1.1.12, it might need to be updated
+ to the current schema (using ``cibadmin --upgrade`` or a higher-level tool
+ equivalent) in order to use the syntax documented here.
+
+* Prior to the 2.1.0 release, the Pacemaker software had to have been built
+ with ACL support. If you are using an older release, your installation
+ supports ACLs only if the output of the command ``pacemakerd --features``
+ contains ``acls``. In newer versions, ACLs are always enabled.
+
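+For example, the ``enable-acl`` option can be set using ``crm_attribute``
+(higher-level tools provide their own equivalents):
+
+.. code-block:: none
+
+   # crm_attribute --type crm_config --name enable-acl --update true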
+
+.. index::
+ single: Access Control List (ACL); acls
+ pair: acls; XML element
+
+ACL Configuration
+#################
+
+ACLs are specified within an ``acls`` element of the CIB. The ``acls`` element
+may contain any number of ``acl_role``, ``acl_target``, and ``acl_group``
+elements.
+
+
+.. index::
+ single: Access Control List (ACL); acl_role
+ pair: acl_role; XML element
+
+ACL Roles
+#########
+
+An ACL *role* is a collection of permissions allowing or denying access to
+particular portions of the CIB. A role is configured with an ``acl_role``
+element in the CIB ``acls`` section.
+
+.. table:: **Properties of an acl_role element**
+ :widths: 1 3
+
+ +------------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +==================+===========================================================+
+ | id | .. index:: |
+ | | single: acl_role; id (attribute) |
+ | | single: id; acl_role attribute |
+ | | single: attribute; id (acl_role) |
+ | | |
+ | | A unique name for the role *(required)* |
+ +------------------+-----------------------------------------------------------+
+ | description | .. index:: |
+ | | single: acl_role; description (attribute) |
+ | | single: description; acl_role attribute |
+ | | single: attribute; description (acl_role) |
+ | | |
+ | | Arbitrary text (not used by Pacemaker) |
+ +------------------+-----------------------------------------------------------+
+
+An ``acl_role`` element may contain any number of ``acl_permission`` elements.
+
+.. index::
+ single: Access Control List (ACL); acl_permission
+ pair: acl_permission; XML element
+
+.. table:: **Properties of an acl_permission element**
+ :widths: 1 3
+
+ +------------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +==================+===========================================================+
+ | id | .. index:: |
+ | | single: acl_permission; id (attribute) |
+ | | single: id; acl_permission attribute |
+ | | single: attribute; id (acl_permission) |
+ | | |
+ | | A unique name for the permission *(required)* |
+ +------------------+-----------------------------------------------------------+
+ | description | .. index:: |
+ | | single: acl_permission; description (attribute) |
+ | | single: description; acl_permission attribute |
+ | | single: attribute; description (acl_permission) |
+ | | |
+ | | Arbitrary text (not used by Pacemaker) |
+ +------------------+-----------------------------------------------------------+
+ | kind | .. index:: |
+ | | single: acl_permission; kind (attribute) |
+ | | single: kind; acl_permission attribute |
+ | | single: attribute; kind (acl_permission) |
+ | | |
+ | | The access being granted. Allowed values are ``read``, |
+ | | ``write``, and ``deny``. A value of ``write`` grants both |
+ | | read and write access. |
+ +------------------+-----------------------------------------------------------+
+ | object-type | .. index:: |
+ | | single: acl_permission; object-type (attribute) |
+ | | single: object-type; acl_permission attribute |
+ | | single: attribute; object-type (acl_permission) |
+ | | |
+ | | The name of an XML element in the CIB to which the |
+ | | permission applies. (Exactly one of ``object-type``, |
+ | | ``xpath``, and ``reference`` must be specified for a |
+ | | permission.) |
+ +------------------+-----------------------------------------------------------+
+ | attribute | .. index:: |
+ | | single: acl_permission; attribute (attribute) |
+ | | single: attribute; acl_permission attribute |
+ | | single: attribute; attribute (acl_permission) |
+ | | |
+ | | If specified, the permission applies only to |
+ | | ``object-type`` elements that have this attribute set (to |
+ | | any value). If not specified, the permission applies to |
+ | | all ``object-type`` elements. May only be used with |
+ | | ``object-type``. |
+ +------------------+-----------------------------------------------------------+
+ | reference | .. index:: |
+ | | single: acl_permission; reference (attribute) |
+ | | single: reference; acl_permission attribute |
+ | | single: attribute; reference (acl_permission) |
+ | | |
+ | | The ID of an XML element in the CIB to which the |
+ | | permission applies. (Exactly one of ``object-type``, |
+ | | ``xpath``, and ``reference`` must be specified for a |
+ | | permission.) |
+ +------------------+-----------------------------------------------------------+
+ | xpath | .. index:: |
+ | | single: acl_permission; xpath (attribute) |
+ | | single: xpath; acl_permission attribute |
+ | | single: attribute; xpath (acl_permission) |
+ | | |
+ | | An `XPath <https://www.w3.org/TR/xpath-10/>`_ |
+ | | specification selecting an XML element in the CIB to |
+ | | which the permission applies. Attributes may be specified |
+ | | in the XPath to select particular elements, but the |
+ | | permissions apply to the entire element. (Exactly one of |
+ | | ``object-type``, ``xpath``, and ``reference`` must be |
+ | | specified for a permission.) |
+ +------------------+-----------------------------------------------------------+
+
+.. important::
+
+ * Permissions are applied to the selected XML element's entire XML subtree
+ (all elements enclosed within it).
+
+ * Write permission grants the ability to create, modify, or remove the
+ element and its subtree, and also the ability to create any "scaffolding"
+ elements (enclosing elements that do not have attributes other than an
+ ID).
+
+ * Permissions for more specific matches (more deeply nested elements) take
+ precedence over more general ones.
+
+ * If multiple permissions are configured for the same match (for example, in
+ different roles applied to the same user), any ``deny`` permission takes
+ precedence, then ``write``, then lastly ``read``.
+
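+For example, in this sketch (the IDs and the attribute name are hypothetical),
+the ``deny`` permission wins for any matching ``nvpair``, both because it is a
+more specific (more deeply nested) match and because ``deny`` takes precedence
+over ``write``:
+
+.. code-block:: xml
+
+   <acl_role id="example-role">
+     <!-- grant read/write to the entire CIB... -->
+     <acl_permission id="example-write-cib" kind="write" xpath="/cib"/>
+     <!-- ...but hide any nvpair named "secret" (a hypothetical name) -->
+     <acl_permission id="example-deny-secret" kind="deny"
+         xpath="//nvpair[@name='secret']"/>
+   </acl_role>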
+
+ACL Targets and Groups
+######################
+
+ACL targets correspond to user accounts on the system.
+
+.. index::
+ single: Access Control List (ACL); acl_target
+ pair: acl_target; XML element
+
+.. table:: **Properties of an acl_target element**
+ :widths: 1 3
+
+ +------------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +==================+===========================================================+
+ | id | .. index:: |
+ | | single: acl_target; id (attribute) |
+ | | single: id; acl_target attribute |
+ | | single: attribute; id (acl_target) |
+ | | |
+ | | A unique identifier for the target (if ``name`` is not |
+ | | specified, this must be the name of the user account) |
+ | | *(required)* |
+ +------------------+-----------------------------------------------------------+
+ | name | .. index:: |
+ | | single: acl_target; name (attribute) |
+ | | single: name; acl_target attribute |
+ | | single: attribute; name (acl_target) |
+ | | |
+ | | If specified, the user account name (this allows you to |
+ | | specify a user name that is already used as the ``id`` |
+ | | for some other configuration element) *(since 2.1.5)* |
+ +------------------+-----------------------------------------------------------+
+
+ACL groups correspond to groups on the system. Any role configured for these
+groups applies to all users in those groups *(since 2.1.5)*.
+
+.. index::
+ single: Access Control List (ACL); acl_group
+ pair: acl_group; XML element
+
+.. table:: **Properties of an acl_group element**
+ :widths: 1 3
+
+ +------------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +==================+===========================================================+
+ | id | .. index:: |
+ | | single: acl_group; id (attribute) |
+ | | single: id; acl_group attribute |
+ | | single: attribute; id (acl_group) |
+ | | |
+ | | A unique identifier for the group (if ``name`` is not |
+ | | specified, this must be the group name) *(required)* |
+ +------------------+-----------------------------------------------------------+
+ | name | .. index:: |
+ | | single: acl_group; name (attribute) |
+ | | single: name; acl_group attribute |
+ | | single: attribute; name (acl_group) |
+ | | |
+ | | If specified, the group name (this allows you to specify |
+ | | a group name that is already used as the ``id`` for some |
+ | | other configuration element) |
+ +------------------+-----------------------------------------------------------+
+
+Each ``acl_target`` and ``acl_group`` element may contain any number of ``role``
+elements.
+
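+For example, this sketch (assuming a ``sysadmins`` system group exists, and an
+``administrator`` role like the one in the examples below) grants that role to
+every user in the group:
+
+.. code-block:: xml
+
+   <acl_group id="sysadmins">
+     <role id="administrator"/>
+   </acl_group>
+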
+.. note::
+
+ If the system users and groups are defined by some network service (such as
+ LDAP), the cluster itself will be unaffected by outages in the service, but
+ affected users and groups will not be able to make changes to the CIB.
+
+
+.. index::
+ single: Access Control List (ACL); role
+ pair: role; XML element
+
+.. table:: **Properties of a role element**
+ :widths: 1 3
+
+ +------------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +==================+===========================================================+
+ | id | .. index:: |
+ | | single: role; id (attribute) |
+ | | single: id; role attribute |
+ | | single: attribute; id (role) |
+ | | |
+ | | The ``id`` of an ``acl_role`` element that specifies |
+ | | permissions granted to the enclosing target or group. |
+ +------------------+-----------------------------------------------------------+
+
+.. important::
+
+ The ``root`` and ``hacluster`` user accounts always have full access to the
+ CIB, regardless of ACLs. For all other user accounts, when ``enable-acl`` is
+ true, permission to all parts of the CIB is denied by default (permissions
+ must be explicitly granted).
+
+ACL Examples
+############
+
+.. code-block:: xml
+
+ <acls>
+
+ <acl_role id="read_all">
+ <acl_permission id="read_all-cib" kind="read" xpath="/cib" />
+ </acl_role>
+
+ <acl_role id="operator">
+
+ <acl_permission id="operator-maintenance-mode" kind="write"
+ xpath="//crm_config//nvpair[@name='maintenance-mode']" />
+
+ <acl_permission id="operator-maintenance-attr" kind="write"
+ xpath="//nvpair[@name='maintenance']" />
+
+ <acl_permission id="operator-target-role" kind="write"
+ xpath="//resources//meta_attributes/nvpair[@name='target-role']" />
+
+ <acl_permission id="operator-is-managed" kind="write"
+ xpath="//resources//nvpair[@name='is-managed']" />
+
+ <acl_permission id="operator-rsc_location" kind="write"
+ object-type="rsc_location" />
+
+ </acl_role>
+
+ <acl_role id="administrator">
+ <acl_permission id="administrator-cib" kind="write" xpath="/cib" />
+ </acl_role>
+
+ <acl_role id="minimal">
+
+ <acl_permission id="minimal-standby" kind="read"
+ description="allow reading standby node attribute (permanent or transient)"
+ xpath="//instance_attributes/nvpair[@name='standby']"/>
+
+ <acl_permission id="minimal-maintenance" kind="read"
+ description="allow reading maintenance node attribute (permanent or transient)"
+ xpath="//nvpair[@name='maintenance']"/>
+
+ <acl_permission id="minimal-target-role" kind="read"
+ description="allow reading resource target roles"
+ xpath="//resources//meta_attributes/nvpair[@name='target-role']"/>
+
+ <acl_permission id="minimal-is-managed" kind="read"
+ description="allow reading resource managed status"
+ xpath="//resources//meta_attributes/nvpair[@name='is-managed']"/>
+
+ <acl_permission id="minimal-deny-instance-attributes" kind="deny"
+ xpath="//instance_attributes"/>
+
+ <acl_permission id="minimal-deny-meta-attributes" kind="deny"
+ xpath="//meta_attributes"/>
+
+ <acl_permission id="minimal-deny-operations" kind="deny"
+ xpath="//operations"/>
+
+ <acl_permission id="minimal-deny-utilization" kind="deny"
+ xpath="//utilization"/>
+
+ <acl_permission id="minimal-nodes" kind="read"
+ description="allow reading node names/IDs (attributes are denied separately)"
+ xpath="/cib/configuration/nodes"/>
+
+ <acl_permission id="minimal-resources" kind="read"
+ description="allow reading resource names/agents (parameters are denied separately)"
+ xpath="/cib/configuration/resources"/>
+
+ <acl_permission id="minimal-deny-constraints" kind="deny"
+ xpath="/cib/configuration/constraints"/>
+
+ <acl_permission id="minimal-deny-topology" kind="deny"
+ xpath="/cib/configuration/fencing-topology"/>
+
+ <acl_permission id="minimal-deny-op_defaults" kind="deny"
+ xpath="/cib/configuration/op_defaults"/>
+
+ <acl_permission id="minimal-deny-rsc_defaults" kind="deny"
+ xpath="/cib/configuration/rsc_defaults"/>
+
+ <acl_permission id="minimal-deny-alerts" kind="deny"
+ xpath="/cib/configuration/alerts"/>
+
+ <acl_permission id="minimal-deny-acls" kind="deny"
+ xpath="/cib/configuration/acls"/>
+
+ <acl_permission id="minimal-cib" kind="read"
+ description="allow reading cib element and crm_config/status sections"
+ xpath="/cib"/>
+
+ </acl_role>
+
+ <acl_target id="alice">
+ <role id="minimal"/>
+ </acl_target>
+
+ <acl_target id="bob">
+ <role id="read_all"/>
+ </acl_target>
+
+ <acl_target id="carol">
+ <role id="read_all"/>
+ <role id="operator"/>
+ </acl_target>
+
+ <acl_target id="dave">
+ <role id="administrator"/>
+ </acl_target>
+
+ </acls>
+
+In the above example, the user ``alice`` has the minimal permissions necessary
+to run basic Pacemaker CLI tools, including using ``crm_mon`` to view the
+cluster status, without being able to modify anything. The user ``bob`` can
+view the entire configuration and status of the cluster, but not make any
+changes. The user ``carol`` can read everything, and change selected cluster
+properties as well as resource roles and location constraints. Finally,
+``dave`` has full read and write access to the entire CIB.
+
+Looking at the ``minimal`` role in more depth, it is designed to allow read
+access to the ``cib`` tag itself, while denying access to particular portions
+of its subtree (which is the entire CIB).
+
+This is because the DC node is indicated in the ``cib`` tag, so ``crm_mon``
+will not be able to report the DC otherwise. However, this does change the
+security model to an allow-by-default one, since any portions of the CIB not
+explicitly denied will be readable. The ``cib`` read access could be removed
+and replaced with read access to just the ``crm_config`` and ``status``
+sections, for a safer approach at the cost of not seeing the DC in status
+output.
+
+For a simpler configuration, the ``minimal`` role allows read access to the
+entire ``crm_config`` section, which contains cluster properties. It would be
+possible to allow read access to specific properties instead (such as
+``stonith-enabled``, ``dc-uuid``, ``have-quorum``, and ``cluster-name``) to
+restrict access further while still allowing status output, but cluster
+properties are unlikely to be considered sensitive.
+
+
+ACL Limitations
+###############
+
+Actions performed via IPC rather than the CIB
+_____________________________________________
+
+ACLs apply *only* to the CIB.
+
+That means ACLs apply to command-line tools that operate by reading or writing
+the CIB, such as ``crm_attribute`` when managing permanent node attributes,
+``crm_mon``, and ``cibadmin``.
+
+However, command-line tools that communicate directly with Pacemaker daemons
+via IPC are not affected by ACLs. For example, users in the ``haclient`` group
+may still do the following, regardless of ACLs:
+
+* Query transient node attribute values using ``crm_attribute`` and
+ ``attrd_updater``.
+
+* Query basic node information using ``crm_node``.
+
+* Erase resource operation history using ``crm_resource``.
+
+* Query fencing configuration information, and execute fencing against nodes,
+ using ``stonith_admin``.
+
+ACLs and Pacemaker Remote
+_________________________
+
+ACLs apply to commands run on Pacemaker Remote nodes using the Pacemaker Remote
+node's name as the ACL user name.
+
+The idea is that Pacemaker Remote nodes (especially virtual machines and
+containers) are likely to be purpose-built and have different user accounts
+from full cluster nodes.
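+
+For example, if a Pacemaker Remote node is named ``remote1`` (a hypothetical
+name), commands run on it would be restricted by a target such as:
+
+.. code-block:: xml
+
+   <acl_target id="remote1">
+     <role id="minimal"/>
+   </acl_target>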
diff --git a/doc/sphinx/Pacemaker_Explained/advanced-options.rst b/doc/sphinx/Pacemaker_Explained/advanced-options.rst
new file mode 100644
index 0000000..20ab79e
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/advanced-options.rst
@@ -0,0 +1,586 @@
+Advanced Configuration
+----------------------
+
+.. index::
+ single: start-delay; operation attribute
+ single: interval-origin; operation attribute
+ single: interval; interval-origin
+ single: operation; interval-origin
+ single: operation; start-delay
+
+Specifying When Recurring Actions are Performed
+###############################################
+
+By default, recurring actions are scheduled relative to when the resource
+started. In some cases, you might prefer that a recurring action start relative
+to a specific date and time. For example, you might schedule an in-depth
+monitor to run once every 24 hours, and want it to run outside business hours.
+
+To do this, set the operation's ``interval-origin``. The cluster uses this point
+to calculate the correct ``start-delay`` such that the operation will occur
+at ``interval-origin`` plus a multiple of the operation interval.
+
+For example, if the recurring operation's interval is 24h, its
+``interval-origin`` is set to 02:00, and it is currently 14:32, then the
+cluster would initiate the operation after 11 hours and 28 minutes.
+
+The value specified for ``interval`` and ``interval-origin`` can be any
+date/time conforming to the
+`ISO8601 standard <https://en.wikipedia.org/wiki/ISO_8601>`_. By way of
+example, to specify an operation that would run on the first Monday of
+2021 and every Monday after that, you would add:
+
+.. topic:: Example recurring action that runs relative to base date/time
+
+ .. code-block:: xml
+
+ <op id="intensive-monitor" name="monitor" interval="P7D" interval-origin="2021-W01-1"/>
+
+.. index::
+ single: resource; failure recovery
+ single: operation; failure recovery
+
+.. _failure-handling:
+
+Handling Resource Failure
+#########################
+
+By default, Pacemaker will attempt to recover failed resources by restarting
+them. However, failure recovery is highly configurable.
+
+.. index::
+ single: resource; failure count
+ single: operation; failure count
+
+Failure Counts
+______________
+
+Pacemaker tracks resource failures for each combination of node, resource, and
+operation (start, stop, monitor, etc.).
+
+You can query the fail count for a particular node, resource, and/or operation
+using the ``crm_failcount`` command. For example, to see how many times the
+10-second monitor for ``myrsc`` has failed on ``node1``, run:
+
+.. code-block:: none
+
+ # crm_failcount --query -r myrsc -N node1 -n monitor -I 10s
+
+If you omit the node, ``crm_failcount`` will use the local node. If you omit
+the operation and interval, ``crm_failcount`` will display the sum of the fail
+counts for all operations on the resource.
+
+You can use ``crm_resource --cleanup`` or ``crm_failcount --delete`` to clear
+fail counts. For example, to clear the above monitor failures, run:
+
+.. code-block:: none
+
+ # crm_resource --cleanup -r myrsc -N node1 -n monitor -I 10s
+
+If you omit the resource, ``crm_resource --cleanup`` will clear failures for
+all resources. If you omit the node, it will clear failures on all nodes. If
+you omit the operation and interval, it will clear the failures for all
+operations on the resource.
+
+.. note::
+
+ Even when cleaning up only a single operation, all failed operations will
+ disappear from the status display. This allows us to trigger a re-check of
+ the resource's current status.
+
+Higher-level tools may provide other commands for querying and clearing
+fail counts.
+
+The ``crm_mon`` tool shows the current cluster status, including any failed
+operations. To see the current fail counts for any failed resources, call
+``crm_mon`` with the ``--failcounts`` option. This shows the fail counts per
+resource (that is, the sum of any operation fail counts for the resource).
+
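+For example, to display the cluster status along with fail counts:
+
+.. code-block:: none
+
+   # crm_mon --failcounts
+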
+.. index::
+ single: migration-threshold; resource meta-attribute
+ single: resource; migration-threshold
+
+Failure Response
+________________
+
+Normally, if a running resource fails, Pacemaker will try to stop it and start
+it again. Pacemaker will choose the best location to start it each time, which
+may be the same node that it failed on.
+
+However, if a resource fails repeatedly, it is possible that there is an
+underlying problem on that node, and you might want to try a different node in
+such a case. Pacemaker allows you to set your preference via the
+``migration-threshold`` resource meta-attribute. [#]_
+
+If you define ``migration-threshold`` to *N* for a resource, it will be banned
+from the original node after *N* failures there.
+
+.. note::
+
+ The ``migration-threshold`` is per *resource*, even though fail counts are
+ tracked per *operation*. The operation fail counts are added together
+ to compare against the ``migration-threshold``.
+
+By default, fail counts remain until manually cleared by an administrator
+using ``crm_resource --cleanup`` or ``crm_failcount --delete`` (hopefully after
+first fixing the failure's cause). It is possible to have fail counts expire
+automatically by setting the ``failure-timeout`` resource meta-attribute.
+
+.. important::
+
+ A successful operation does not clear past failures. If a recurring monitor
+ operation fails once, succeeds many times, then fails again days later, its
+ fail count is 2. Fail counts are cleared only by manual intervention or
+ failure timeout.
+
+For example, setting ``migration-threshold`` to 2 and ``failure-timeout`` to
+``60s`` would cause the resource to move to a new node after 2 failures, and
+allow it to move back (depending on stickiness and constraint scores) after one
+minute.
+
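+Expressed in the CIB, that policy might look like this sketch (a hypothetical
+resource using the ``ocf:pacemaker:Dummy`` test agent):
+
+.. code-block:: xml
+
+   <primitive id="myrsc" class="ocf" provider="pacemaker" type="Dummy">
+     <meta_attributes id="myrsc-meta">
+       <nvpair id="myrsc-migration-threshold" name="migration-threshold" value="2"/>
+       <nvpair id="myrsc-failure-timeout" name="failure-timeout" value="60s"/>
+     </meta_attributes>
+   </primitive>
+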
+.. note::
+
+ ``failure-timeout`` is measured since the most recent failure. That is, older
+ failures do not individually time out and lower the fail count. Instead, all
+ failures are timed out simultaneously (and the fail count is reset to 0) if
+ there is no new failure for the timeout period.
+
+There are two exceptions to the migration threshold: when a resource either
+fails to start or fails to stop.
+
+If the cluster property ``start-failure-is-fatal`` is set to ``true`` (which is
+the default), start failures cause the fail count to be set to ``INFINITY`` and
+thus always cause the resource to move immediately.
+
+Stop failures are slightly different and crucial. If a resource fails to stop
+and fencing is enabled, then the cluster will fence the node in order to be
+able to start the resource elsewhere. If fencing is disabled, then the cluster
+has no way to continue and will not try to start the resource elsewhere, but
+will try to stop it again after any failure timeout or clearing.
+
+.. index::
+ single: resource; move
+
+Moving Resources
+################
+
+Moving Resources Manually
+_________________________
+
+There are primarily two occasions when you would want to move a resource from
+its current location: when the whole node is under maintenance, and when a
+single resource needs to be moved.
+
+.. index::
+ single: standby mode
+ single: node; standby mode
+
+Standby Mode
+~~~~~~~~~~~~
+
+Since everything eventually comes down to a score, you could create constraints
+for every resource to prevent them from running on one node. While Pacemaker
+configuration can seem convoluted at times, not even we would require this of
+administrators.
+
+Instead, you can set a special node attribute which tells the cluster "don't
+let anything run here". There is even a convenient tool for querying and
+setting it, called ``crm_standby``. To check the standby status of the current
+machine, run:
+
+.. code-block:: none
+
+ # crm_standby -G
+
+A value of ``on`` indicates that the node is *not* able to host any resources,
+while a value of ``off`` says that it *can*.
+
+You can also check the status of other nodes in the cluster by specifying the
+``--node`` option:
+
+.. code-block:: none
+
+ # crm_standby -G --node sles-2
+
+To change the current node's standby status, use ``-v`` instead of ``-G``:
+
+.. code-block:: none
+
+ # crm_standby -v on
+
+Again, you can change another host's value by supplying a hostname with
+``--node``.
+
+A cluster node in standby mode will not run resources, but still contributes to
+quorum, and may fence or be fenced by other nodes.
+
+Moving One Resource
+~~~~~~~~~~~~~~~~~~~
+
+When only one resource is required to move, we could do this by creating
+location constraints. However, once again we provide a user-friendly shortcut
+as part of the ``crm_resource`` command, which creates and modifies the extra
+constraints for you. If ``Email`` were running on ``sles-1`` and you wanted it
+moved to a specific location, the command would look something like:
+
+.. code-block:: none
+
+ # crm_resource -M -r Email -H sles-2
+
+Behind the scenes, the tool will create the following location constraint:
+
+.. code-block:: xml
+
+ <rsc_location id="cli-prefer-Email" rsc="Email" node="sles-2" score="INFINITY"/>
+
+It is important to note that subsequent invocations of ``crm_resource -M`` are
+not cumulative. So, if you ran these commands:
+
+.. code-block:: none
+
+ # crm_resource -M -r Email -H sles-2
+ # crm_resource -M -r Email -H sles-3
+
+then it is as if you had never performed the first command.
+
+To allow the resource to move back again, use:
+
+.. code-block:: none
+
+ # crm_resource -U -r Email
+
+Note the use of the word *allow*. The resource *can* move back to its original
+location, but depending on ``resource-stickiness``, location constraints, and
+so forth, it might stay where it is.
+
+To be absolutely certain that it moves back to ``sles-1``, move it there before
+issuing the call to ``crm_resource -U``:
+
+.. code-block:: none
+
+ # crm_resource -M -r Email -H sles-1
+ # crm_resource -U -r Email
+
+Alternatively, if you only care that the resource should be moved from its
+current location, try:
+
+.. code-block:: none
+
+ # crm_resource -B -r Email
+
+which will instead create a negative constraint, like:
+
+.. code-block:: xml
+
+ <rsc_location id="cli-ban-Email-on-sles-1" rsc="Email" node="sles-1" score="-INFINITY"/>
+
+This will achieve the desired effect, but will also have long-term
+consequences. As the tool will warn you, the creation of a ``-INFINITY``
+constraint will prevent the resource from running on that node until
+``crm_resource -U`` is used. This includes the situation where every other
+cluster node is no longer available!
+
+In some cases, such as when ``resource-stickiness`` is set to ``INFINITY``, it
+is possible that you will end up with the problem described in
+:ref:`node-score-equal`. The tool can detect some of these cases and deal with
+them by creating both positive and negative constraints. For example:
+
+.. code-block:: xml
+
+ <rsc_location id="cli-ban-Email-on-sles-1" rsc="Email" node="sles-1" score="-INFINITY"/>
+ <rsc_location id="cli-prefer-Email" rsc="Email" node="sles-2" score="INFINITY"/>
+
+which has the same long-term consequences as discussed earlier.
+
+Moving Resources Due to Connectivity Changes
+____________________________________________
+
+You can configure the cluster to move resources when external connectivity is
+lost in two steps.
+
+.. index::
+ single: ocf:pacemaker:ping resource
+ single: ping resource
+
+Tell Pacemaker to Monitor Connectivity
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First, add an ``ocf:pacemaker:ping`` resource to the cluster. The ``ping``
+resource uses the system utility of the same name to test whether the machines
+in a list (specified by DNS host name or IP address) are reachable, and uses
+the results to maintain a node attribute.
+
+The node attribute is called ``pingd`` by default, but is customizable in order
+to allow multiple ping groups to be defined.
+
+Normally, the ping resource should run on all cluster nodes, which means that
+you'll need to create a clone. A template for this can be found below, along
+with a description of the most interesting parameters.
+
+.. table:: **Commonly Used ocf:pacemaker:ping Resource Parameters**
+ :widths: 1 4
+
+ +--------------------+--------------------------------------------------------------+
+ | Resource Parameter | Description |
+ +====================+==============================================================+
+ | dampen | .. index:: |
+ | | single: ocf:pacemaker:ping resource; dampen parameter |
+ | | single: dampen; ocf:pacemaker:ping resource parameter |
+ | | |
+ | | The time to wait (dampening) for further changes to occur. |
+ | | Use this to prevent a resource from bouncing around the |
+ | | cluster when cluster nodes notice the loss of connectivity |
+ | | at slightly different times. |
+ +--------------------+--------------------------------------------------------------+
+ | multiplier | .. index:: |
+ | | single: ocf:pacemaker:ping resource; multiplier parameter |
+ | | single: multiplier; ocf:pacemaker:ping resource parameter |
+ | | |
+ | | The number of connected ping nodes gets multiplied by this |
+ | | value to get a score. Useful when there are multiple ping |
+ | | nodes configured. |
+ +--------------------+--------------------------------------------------------------+
+ | host_list | .. index:: |
+ | | single: ocf:pacemaker:ping resource; host_list parameter |
+ | | single: host_list; ocf:pacemaker:ping resource parameter |
+ | | |
+ | | The machines to contact in order to determine the current |
+ | | connectivity status. Allowed values include resolvable DNS |
+ | | host names, IPv4 addresses, and IPv6 addresses. |
+ +--------------------+--------------------------------------------------------------+
+
+.. topic:: Example ping resource that checks node connectivity once every minute
+
+ .. code-block:: xml
+
+ <clone id="Connected">
+ <primitive id="ping" class="ocf" provider="pacemaker" type="ping">
+ <instance_attributes id="ping-attrs">
+ <nvpair id="ping-dampen" name="dampen" value="5s"/>
+ <nvpair id="ping-multiplier" name="multiplier" value="1000"/>
+ <nvpair id="ping-hosts" name="host_list" value="my.gateway.com www.bigcorp.com"/>
+ </instance_attributes>
+ <operations>
+ <op id="ping-monitor-60s" interval="60s" name="monitor"/>
+ </operations>
+ </primitive>
+ </clone>
+
+.. important::
+
+ You're only half done. The next section deals with telling Pacemaker how to
+ deal with the connectivity status that ``ocf:pacemaker:ping`` is recording.
+
+Tell Pacemaker How to Interpret the Connectivity Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. important::
+
+ Before attempting the following, make sure you understand
+ :ref:`rules`.
+
+There are a number of ways to use the connectivity data.
+
+The most common setup is for people to have a single ping target (for example,
+the service network's default gateway), to prevent the cluster from running a
+resource on any unconnected node.
+
+.. topic:: Don't run a resource on unconnected nodes
+
+ .. code-block:: xml
+
+ <rsc_location id="WebServer-no-connectivity" rsc="Webserver">
+ <rule id="ping-exclude-rule" score="-INFINITY" >
+ <expression id="ping-exclude" attribute="pingd" operation="not_defined"/>
+ </rule>
+ </rsc_location>
+
+A more complex setup is to have a number of ping targets configured. You can
+require the cluster to only run resources on nodes that can connect to all (or
+a minimum subset) of them.
+
+.. topic:: Run only on nodes connected to three or more ping targets
+
+ .. code-block:: xml
+
+ <primitive id="ping" provider="pacemaker" class="ocf" type="ping">
+ ... <!-- omitting some configuration to highlight important parts -->
+ <nvpair id="ping-multiplier" name="multiplier" value="1000"/>
+ ...
+ </primitive>
+ ...
+ <rsc_location id="WebServer-connectivity" rsc="Webserver">
+ <rule id="ping-prefer-rule" score="-INFINITY" >
+ <expression id="ping-prefer" attribute="pingd" operation="lt" value="3000"/>
+ </rule>
+ </rsc_location>
+
+Alternatively, you can tell the cluster only to *prefer* nodes with the best
+connectivity, by using ``score-attribute`` in the rule. Just be sure to set
+``multiplier`` to a value higher than that of ``resource-stickiness`` (and
+don't set either of them to ``INFINITY``).
+
+.. topic:: Prefer node with most connected ping nodes
+
+ .. code-block:: xml
+
+ <rsc_location id="WebServer-connectivity" rsc="Webserver">
+ <rule id="ping-prefer-rule" score-attribute="pingd" >
+ <expression id="ping-prefer" attribute="pingd" operation="defined"/>
+ </rule>
+ </rsc_location>
+
+It is perhaps easier to think of this in terms of the simple constraints that
+the cluster translates it into. For example, if ``sles-1`` is connected to all
+five ping nodes but ``sles-2`` is only connected to two, then it would be as if
+you instead had the following constraints in your configuration:
+
+.. topic:: How the cluster translates the above location constraint
+
+ .. code-block:: xml
+
+ <rsc_location id="ping-1" rsc="Webserver" node="sles-1" score="5000"/>
+ <rsc_location id="ping-2" rsc="Webserver" node="sles-2" score="2000"/>
+
+The advantage is that you don't have to manually update any constraints
+whenever your network connectivity changes.
+
+You can also combine the concepts above into something even more complex. The
+example below shows how you can prefer the node with the most connected ping
+nodes provided they have connectivity to at least three (again assuming that
+``multiplier`` is set to 1000).
+
+.. topic:: More complex example of choosing location based on connectivity
+
+ .. code-block:: xml
+
+ <rsc_location id="WebServer-connectivity" rsc="Webserver">
+ <rule id="ping-exclude-rule" score="-INFINITY" >
+ <expression id="ping-exclude" attribute="pingd" operation="lt" value="3000"/>
+ </rule>
+ <rule id="ping-prefer-rule" score-attribute="pingd" >
+ <expression id="ping-prefer" attribute="pingd" operation="defined"/>
+ </rule>
+ </rsc_location>
+
+
+.. _live-migration:
+
+Migrating Resources
+___________________
+
+Normally, when the cluster needs to move a resource, it fully restarts the
+resource (that is, it stops the resource on the current node and starts it on
+the new node).
+
+However, some types of resources, such as many virtual machines, are able to
+move to another location without loss of state (often referred to as live
+migration or hot migration). In Pacemaker, this is called resource migration.
+Pacemaker can be configured to migrate a resource when moving it, rather than
+restarting it.
+
+Not all resources are able to migrate; see the
+:ref:`migration checklist <migration_checklist>` below. Even those that can,
+won't do so in all situations. Conceptually, there are two requirements from
+which the other prerequisites follow:
+
+* The resource must be active and healthy at the old location; and
+* everything required for the resource to run must be available on both the old
+ and new locations.
+
+The cluster is able to accommodate both *push* and *pull* migration models by
+requiring the resource agent to support two special actions: ``migrate_to``
+(performed on the current location) and ``migrate_from`` (performed on the
+destination).
+
+In push migration, the process on the current location transfers the resource
+to the new location where it is later activated. In this scenario, most of the
+work would be done in the ``migrate_to`` action and, if anything, the
+activation would occur during ``migrate_from``.
+
+Conversely for pull, the ``migrate_to`` action is practically empty and
+``migrate_from`` does most of the work, extracting the relevant resource state
+from the old location and activating it.
+
+There is no wrong or right way for a resource agent to implement migration, as
+long as it works.
+
+.. _migration_checklist:
+
+.. topic:: Migration Checklist
+
+ * The resource may not be a clone.
+ * The resource agent standard must be OCF.
+ * The resource must not be in a failed or degraded state.
+ * The resource agent must support ``migrate_to`` and ``migrate_from``
+ actions, and advertise them in its meta-data.
+ * The resource must have the ``allow-migrate`` meta-attribute set to
+ ``true`` (which is not the default).
+
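+As a sketch, a live-migratable virtual machine resource might look like the
+following (``ocf:heartbeat:VirtualDomain`` advertises ``migrate_to`` and
+``migrate_from``; the ``config`` value here is a placeholder):
+
+.. code-block:: xml
+
+   <primitive id="myvm" class="ocf" provider="heartbeat" type="VirtualDomain">
+     <instance_attributes id="myvm-attrs">
+       <nvpair id="myvm-config" name="config" value="/etc/libvirt/qemu/myvm.xml"/>
+     </instance_attributes>
+     <meta_attributes id="myvm-meta">
+       <!-- allow-migrate is not the default and must be enabled explicitly -->
+       <nvpair id="myvm-allow-migrate" name="allow-migrate" value="true"/>
+     </meta_attributes>
+   </primitive>
+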
+If an otherwise migratable resource depends on another resource via an ordering
+constraint, there are special situations in which it will be restarted rather
+than migrated.
+
+For example, if the resource depends on a clone, and at the time the resource
+needs to be moved, the clone has instances that are stopping and instances that
+are starting, then the resource will be restarted. The scheduler is not yet
+able to model this situation correctly and so takes the safer (if less optimal)
+path.
+
+Also, if a migratable resource depends on a non-migratable resource, and both
+need to be moved, the migratable resource will be restarted.
+
+
+.. index::
+ single: reload
+ single: reload-agent
+
+Reloading an Agent After a Definition Change
+############################################
+
+The cluster automatically detects changes to the configuration of active
+resources. The cluster's normal response is to stop the service (using the old
+definition) and start it again (with the new definition). This works, but some
+resource agents are smarter and can be told to use a new set of options without
+restarting.
+
+To take advantage of this capability, the resource agent must:
+
+* Implement the ``reload-agent`` action. What it should do depends completely
+ on your application!
+
+ .. note::
+
+ Resource agents may also implement a ``reload`` action to make the managed
+ service reload its own *native* configuration. This is different from
+ ``reload-agent``, which makes effective changes in the resource's
+ *Pacemaker* configuration (specifically, the values of the agent's
+ reloadable parameters).
+
+* Advertise the ``reload-agent`` operation in the ``actions`` section of its
+ meta-data.
+
+* Set the ``reloadable`` attribute to 1 in the ``parameters`` section of
+ its meta-data for any parameters eligible to be reloaded after a change,
+ as sketched below.
+
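+A minimal sketch of the relevant agent meta-data (the parameter name and
+timeout are hypothetical):
+
+.. code-block:: xml
+
+   <parameters>
+     <parameter name="my_reloadable_param" reloadable="1">
+       <content type="string"/>
+     </parameter>
+   </parameters>
+   <actions>
+     <action name="reload-agent" timeout="20s"/>
+   </actions>
+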
+Once these requirements are satisfied, the cluster will automatically know to
+reload the resource (instead of restarting) when a reloadable parameter
+changes.
+
+.. note::
+
+ Metadata will not be re-read unless the resource needs to be started. If you
+ edit the agent of an already active resource to set a parameter reloadable,
+ the resource may restart the first time the parameter value changes.
+
+.. note::
+
+ If both a reloadable and non-reloadable parameter are changed
+ simultaneously, the resource will be restarted.
+
+.. rubric:: Footnotes
+
+.. [#] The naming of this option was perhaps unfortunate as it is easily
+ confused with live migration, the process of moving a resource from one
+ node to another without stopping it. Xen virtual guests are the most
+ common example of resources that can be migrated in this manner.
diff --git a/doc/sphinx/Pacemaker_Explained/advanced-resources.rst b/doc/sphinx/Pacemaker_Explained/advanced-resources.rst
new file mode 100644
index 0000000..a61b76d
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/advanced-resources.rst
@@ -0,0 +1,1629 @@
+Advanced Resource Types
+-----------------------
+
+.. index::
+ single: group resource
+ single: resource; group
+
+.. _group-resources:
+
+Groups - A Syntactic Shortcut
+#############################
+
+One of the most common elements of a cluster is a set of resources
+that need to be located together, start sequentially, and stop in the
+reverse order. To simplify this configuration, we support the concept
+of groups.
+
+.. topic:: A group of two primitive resources
+
+ .. code-block:: xml
+
+ <group id="shortcut">
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+ <primitive id="Email" class="lsb" type="exim"/>
+ </group>
+
+Although the example above contains only two resources, there is no
+limit to the number of resources a group can contain. The example is
+also sufficient to explain the fundamental properties of a group:
+
+* Resources are started in the order they appear in (**Public-IP** first,
+ then **Email**)
+* Resources are stopped in the reverse of the order in which they appear
+ (**Email** first, then **Public-IP**)
+
+If a resource in the group can't run anywhere, then nothing after that
+is allowed to run, either.
+
+* If **Public-IP** can't run anywhere, neither can **Email**;
+* but if **Email** can't run anywhere, this does not affect **Public-IP**
+ in any way
+
+The group above is logically equivalent to writing:
+
+.. topic:: How the cluster sees a group resource
+
+ .. code-block:: xml
+
+ <configuration>
+ <resources>
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+ <primitive id="Email" class="lsb" type="exim"/>
+ </resources>
+ <constraints>
+ <rsc_colocation id="xxx" rsc="Email" with-rsc="Public-IP" score="INFINITY"/>
+ <rsc_order id="yyy" first="Public-IP" then="Email"/>
+ </constraints>
+ </configuration>
+
+Obviously as the group grows bigger, the reduced configuration effort
+can become significant.
+
+Another (typical) example of a group is a DRBD volume, the filesystem
+mount, an IP address, and an application that uses them.
+
+.. index::
+ pair: XML element; group
+
+Group Properties
+________________
+
+.. table:: **Properties of a Group Resource**
+ :widths: 1 4
+
+ +-------------+------------------------------------------------------------------+
+ | Field | Description |
+ +=============+==================================================================+
+ | id | .. index:: |
+ | | single: group; property, id |
+ | | single: property; id (group) |
+ | | single: id; group property |
+ | | |
+ | | A unique name for the group |
+ +-------------+------------------------------------------------------------------+
+ | description | .. index:: |
+ | | single: group; attribute, description |
+ | | single: attribute; description (group) |
+ | | single: description; group attribute |
+ | | |
+ | | An optional description of the group, for the user's own |
+ | | purposes. |
+ | | E.g. ``resources needed for website`` |
+ +-------------+------------------------------------------------------------------+
+
+Group Options
+_____________
+
+Groups inherit the ``priority``, ``target-role``, and ``is-managed`` properties
+from primitive resources. See :ref:`resource_options` for information about
+those properties.
+
+Group Instance Attributes
+_________________________
+
+Groups have no instance attributes. However, any that are set for the group
+object will be inherited by the group's children.
+
+Group Contents
+______________
+
+Groups may only contain a collection of cluster resources (see
+:ref:`primitive-resource`). To refer to a child of a group resource, just use
+the child's ``id`` instead of the group's.
+
+Group Constraints
+_________________
+
+Although it is possible to reference a group's children in
+constraints, it is usually preferable to reference the group itself.
+
+.. topic:: Some constraints involving groups
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_location id="group-prefers-node1" rsc="shortcut" node="node1" score="500"/>
+ <rsc_colocation id="webserver-with-group" rsc="Webserver" with-rsc="shortcut"/>
+ <rsc_order id="start-group-then-webserver" first="shortcut" then="Webserver"/>
+ </constraints>
+
+.. index::
+ pair: resource-stickiness; group
+
+Group Stickiness
+________________
+
+Stickiness, the measure of how much a resource wants to stay where it
+is, is additive in groups. Every active resource of the group will
+contribute its stickiness value to the group's total. So if the
+default ``resource-stickiness`` is 100, and a group has seven members,
+five of which are active, then the group as a whole will prefer its
+current location with a score of 500.
+
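+For example, a default stickiness for all resources can be set in the
+``rsc_defaults`` section of the CIB:
+
+.. code-block:: xml
+
+   <rsc_defaults>
+     <meta_attributes id="rsc-defaults-meta">
+       <nvpair id="rsc-defaults-stickiness" name="resource-stickiness" value="100"/>
+     </meta_attributes>
+   </rsc_defaults>
+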
+.. index::
+ single: clone
+ single: resource; clone
+
+.. _s-resource-clone:
+
+Clones - Resources That Can Have Multiple Active Instances
+##########################################################
+
+*Clone* resources are resources that can have more than one copy active at the
+same time. This allows you, for example, to run a copy of a daemon on every
+node. You can clone any primitive or group resource [#]_.
+
+Anonymous versus Unique Clones
+______________________________
+
+A clone resource is configured to be either *anonymous* or *globally unique*.
+
+Anonymous clones are the simplest. These behave completely identically
+everywhere they are running. Because of this, there can be only one instance of
+an anonymous clone active per node.
+
+The instances of globally unique clones are distinct entities. All instances
+are launched identically, but one instance of the clone is not identical to any
+other instance, whether running on the same node or a different node. As an
+example, a cloned IP address can use special kernel functionality such that
+each instance handles a subset of requests for the same IP address.
+
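+For example, a globally unique clone of an IP address might look like this
+sketch (the ``IPaddr2`` parameter values are placeholders):
+
+.. code-block:: xml
+
+   <clone id="ip-clone">
+     <meta_attributes id="ip-clone-meta">
+       <nvpair id="ip-clone-unique" name="globally-unique" value="true"/>
+       <nvpair id="ip-clone-node-max" name="clone-node-max" value="2"/>
+     </meta_attributes>
+     <primitive id="clusterip" class="ocf" provider="heartbeat" type="IPaddr2">
+       <instance_attributes id="clusterip-attrs">
+         <nvpair id="clusterip-addr" name="ip" value="192.0.2.10"/>
+       </instance_attributes>
+     </primitive>
+   </clone>
+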
+.. index::
+ single: promotable clone
+ single: resource; promotable
+
+.. _s-resource-promotable:
+
+Promotable clones
+_________________
+
+If a clone is *promotable*, its instances can perform a special role that
+Pacemaker will manage via the ``promote`` and ``demote`` actions of the resource
+agent.
+
+Services that support such a special role have various terms for the special
+role and the default role: primary and secondary, master and replica,
+controller and worker, etc. Pacemaker uses the terms *promoted* and
+*unpromoted* to be agnostic to what the service calls them or what they do.
+
+All that Pacemaker cares about is that an instance comes up in the unpromoted role
+when started, and the resource agent supports the ``promote`` and ``demote`` actions
+to manage entering and exiting the promoted role.
+
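+A promotable clone is configured as an ordinary clone with the ``promotable``
+meta-attribute set to ``true``, as in this sketch (using the
+``ocf:pacemaker:Stateful`` test agent, which supports promotion):
+
+.. code-block:: xml
+
+   <clone id="db-clone">
+     <meta_attributes id="db-clone-meta">
+       <nvpair id="db-clone-promotable" name="promotable" value="true"/>
+     </meta_attributes>
+     <primitive id="db" class="ocf" provider="pacemaker" type="Stateful"/>
+   </clone>
+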
+.. index::
+ pair: XML element; clone
+
+Clone Properties
+________________
+
+.. table:: **Properties of a Clone Resource**
+ :widths: 1 4
+
+ +-------------+------------------------------------------------------------------+
+ | Field | Description |
+ +=============+==================================================================+
+ | id | .. index:: |
+ | | single: clone; property, id |
+ | | single: property; id (clone) |
+ | | single: id; clone property |
+ | | |
+ | | A unique name for the clone |
+ +-------------+------------------------------------------------------------------+
+ | description | .. index:: |
+ | | single: clone; attribute, description |
+ | | single: attribute; description (clone) |
+ | | single: description; clone attribute |
+ | | |
+ | | An optional description of the clone, for the user's own |
+ | | purposes. |
+ | | E.g. ``IP address for website`` |
+ +-------------+------------------------------------------------------------------+
+
+.. index::
+ pair: options; clone
+
+Clone Options
+_____________
+
+:ref:`Options <resource_options>` inherited from primitive resources:
+``priority, target-role, is-managed``
+
+.. table:: **Clone-specific configuration options**
+ :class: longtable
+ :widths: 1 1 3
+
+ +-------------------+-----------------+-------------------------------------------------------+
+ | Field | Default | Description |
+ +===================+=================+=======================================================+
+ | globally-unique | false | .. index:: |
+ | | | single: clone; option, globally-unique |
+ | | | single: option; globally-unique (clone) |
+ | | | single: globally-unique; clone option |
+ | | | |
+ | | | If **true**, each clone instance performs a |
+ | | | distinct function |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | clone-max | 0 | .. index:: |
+ | | | single: clone; option, clone-max |
+ | | | single: option; clone-max (clone) |
+ | | | single: clone-max; clone option |
+ | | | |
+ | | | The maximum number of clone instances that can |
+ | | | be started across the entire cluster. If 0, the |
+ | | | number of nodes in the cluster will be used. |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | clone-node-max | 1 | .. index:: |
+ | | | single: clone; option, clone-node-max |
+ | | | single: option; clone-node-max (clone) |
+ | | | single: clone-node-max; clone option |
+ | | | |
+ | | | If ``globally-unique`` is **true**, the maximum |
+ | | | number of clone instances that can be started |
+ | | | on a single node |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | clone-min | 0 | .. index:: |
+ | | | single: clone; option, clone-min |
+ | | | single: option; clone-min (clone) |
+ | | | single: clone-min; clone option |
+ | | | |
+ | | | Require at least this number of clone instances |
+ | | | to be runnable before allowing resources |
+ | | | depending on the clone to be runnable. A value |
+ | | | of 0 means require all clone instances to be |
+ | | | runnable. |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | notify | false | .. index:: |
+ | | | single: clone; option, notify |
+ | | | single: option; notify (clone) |
+ | | | single: notify; clone option |
+ | | | |
+ | | | Call the resource agent's **notify** action for |
+ | | | all active instances, before and after starting |
+ | | | or stopping any clone instance. The resource |
+ | | | agent must support this action. |
+ | | | Allowed values: **false**, **true** |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | ordered | false | .. index:: |
+ | | | single: clone; option, ordered |
+ | | | single: option; ordered (clone) |
+ | | | single: ordered; clone option |
+ | | | |
+ | | | If **true**, clone instances must be started |
+ | | | sequentially instead of in parallel. |
+ | | | Allowed values: **false**, **true** |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | interleave | false | .. index:: |
+ | | | single: clone; option, interleave |
+ | | | single: option; interleave (clone) |
+ | | | single: interleave; clone option |
+ | | | |
+ | | | When this clone is ordered relative to another |
+ | | | clone, if this option is **false** (the default), |
+ | | | the ordering is relative to *all* instances of |
+ | | | the other clone, whereas if this option is |
+ | | | **true**, the ordering is relative only to |
+ | | | instances on the same node. |
+ | | | Allowed values: **false**, **true** |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | promotable | false | .. index:: |
+ | | | single: clone; option, promotable |
+ | | | single: option; promotable (clone) |
+ | | | single: promotable; clone option |
+ | | | |
+ | | | If **true**, clone instances can perform a |
+ | | | special role that Pacemaker will manage via the |
+ | | | resource agent's **promote** and **demote** |
+ | | | actions. The resource agent must support these |
+ | | | actions. |
+ | | | Allowed values: **false**, **true** |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | promoted-max | 1 | .. index:: |
+ | | | single: clone; option, promoted-max |
+ | | | single: option; promoted-max (clone) |
+ | | | single: promoted-max; clone option |
+ | | | |
+ | | | If ``promotable`` is **true**, the number of |
+ | | | instances that can be promoted at one time |
+ | | | across the entire cluster |
+ +-------------------+-----------------+-------------------------------------------------------+
+ | promoted-node-max | 1 | .. index:: |
+ | | | single: clone; option, promoted-node-max |
+ | | | single: option; promoted-node-max (clone) |
+ | | | single: promoted-node-max; clone option |
+ | | | |
+ | | | If ``promotable`` is **true** and ``globally-unique`` |
+ | | is **false**, the number of clone instances that can |
+ | | be promoted at one time on a single node |
+ +-------------------+-----------------+-------------------------------------------------------+
+
+.. note:: **Deprecated Terminology**
+
+ In older documentation and online examples, you may see promotable clones
+ referred to as *multi-state*, *stateful*, or *master/slave*; these mean the
+ same thing as *promotable*. Certain syntax is supported for backward
+ compatibility, but is deprecated and will be removed in a future version:
+
+ * Using a ``master`` tag, instead of a ``clone`` tag with the ``promotable``
+ meta-attribute set to ``true``
+ * Using the ``master-max`` meta-attribute instead of ``promoted-max``
+ * Using the ``master-node-max`` meta-attribute instead of
+ ``promoted-node-max``
+ * Using ``Master`` as a role name instead of ``Promoted``
+ * Using ``Slave`` as a role name instead of ``Unpromoted``
+
+
+Clone Contents
+______________
+
+Clones must contain exactly one primitive or group resource.
+
+.. topic:: A clone that runs a web server on all nodes
+
+ .. code-block:: xml
+
+ <clone id="apache-clone">
+ <primitive id="apache" class="lsb" type="apache">
+ <operations>
+ <op id="apache-monitor" name="monitor" interval="30"/>
+ </operations>
+ </primitive>
+ </clone>
+
+.. warning::
+
+ You should never reference the name of a clone's child (the primitive or group
+ resource being cloned). If you think you need to do this, you probably need to
+ re-evaluate your design.
+
+Clone Instance Attributes
+_________________________
+
+Clones have no instance attributes; however, any that are set here will be
+inherited by the clone's child.
+
+.. index::
+ single: clone; constraint
+
+Clone Constraints
+_________________
+
+In most cases, a clone will have a single instance on each active cluster
+node. If this is not the case, you can indicate which nodes the
+cluster should preferentially assign copies to with resource location
+constraints. These constraints are written no differently from those
+for primitive resources except that the clone's **id** is used.
+
+.. topic:: Some constraints involving clones
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_location id="clone-prefers-node1" rsc="apache-clone" node="node1" score="500"/>
+ <rsc_colocation id="stats-with-clone" rsc="apache-stats" with="apache-clone"/>
+ <rsc_order id="start-clone-then-stats" first="apache-clone" then="apache-stats"/>
+ </constraints>
+
+Ordering constraints behave slightly differently for clones. In the
+example above, ``apache-stats`` will wait until all copies of ``apache-clone``
+that need to be started have done so before being started itself.
+Only if *no* copies can be started will ``apache-stats`` be prevented
+from being active. Additionally, the clone will wait for
+``apache-stats`` to be stopped before stopping itself.
+
+Colocation of a primitive or group resource with a clone means that
+the resource can run on any node with an active instance of the clone.
+The cluster will choose an instance based on where the clone is running and
+the resource's own location preferences.
+
+Colocation between clones is also possible. If one clone **A** is colocated
+with another clone **B**, the set of allowed locations for **A** is limited to
+nodes on which **B** is (or will be) active. Placement is then performed
+normally.
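+
+For example, a minimal sketch (with hypothetical clone names) that colocates
+the instances of clone **A** with those of clone **B**:
+
+.. code-block:: xml
+
+   <rsc_colocation id="A-with-B" score="INFINITY" rsc="A" with-rsc="B"/>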
+
+.. index::
+ single: promotable clone; constraint
+
+.. _promotable-clone-constraints:
+
+Promotable Clone Constraints
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For promotable clone resources, the ``first-action`` and/or ``then-action`` fields
+for ordering constraints may be set to ``promote`` or ``demote`` to constrain the
+promoted role, and colocation constraints may contain ``rsc-role`` and/or
+``with-rsc-role`` fields.
+
+.. topic:: Constraints involving promotable clone resources
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_location id="db-prefers-node1" rsc="database" node="node1" score="500"/>
+ <rsc_colocation id="backup-with-db-unpromoted" rsc="backup"
+ with-rsc="database" with-rsc-role="Unpromoted"/>
+ <rsc_colocation id="myapp-with-db-promoted" rsc="myApp"
+ with-rsc="database" with-rsc-role="Promoted"/>
+ <rsc_order id="start-db-before-backup" first="database" then="backup"/>
+ <rsc_order id="promote-db-then-app" first="database" first-action="promote"
+ then="myApp" then-action="start"/>
+ </constraints>
+
+In the example above, **myApp** will wait until one of the database
+copies has been started and promoted before being started
+itself on the same node. Only if no copies can be promoted will **myApp** be
+prevented from being active. Additionally, the cluster will wait for
+**myApp** to be stopped before demoting the database.
+
+Colocation of a primitive or group resource with a promotable clone
+resource means that it can run on any node with an active instance of
+the promotable clone resource that has the specified role (``Promoted`` or
+``Unpromoted``). In the example above, the cluster will choose a location
+based on where database is running in the promoted role, and if there are
+multiple promoted instances it will also factor in **myApp**'s own location
+preferences when deciding which location to choose.
+
+Colocation with regular clones and other promotable clone resources is also
+possible. In such cases, the set of allowed locations for the **rsc**
+clone is (after role filtering) limited to nodes on which the
+``with-rsc`` promotable clone resource is (or will be) in the specified role.
+Placement is then performed as normal.
+
+Using Promotable Clone Resources in Colocation Sets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a promotable clone is used in a :ref:`resource set <s-resource-sets>`
+inside a colocation constraint, the resource set may take a ``role`` attribute.
+
+In the following example, an instance of **B** may be promoted only on a node
+where **A** is in the promoted role. Additionally, resources **C** and **D**
+must be located on a node where both **A** and **B** are promoted.
+
+.. topic:: Colocate C and D with A's and B's promoted instances
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_colocation id="coloc-1" score="INFINITY" >
+ <resource_set id="colocated-set-example-1" sequential="true" role="Promoted">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ </resource_set>
+ <resource_set id="colocated-set-example-2" sequential="true">
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ </resource_set>
+ </rsc_colocation>
+ </constraints>
+
+Using Promotable Clone Resources in Ordered Sets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a promotable clone is used in a :ref:`resource set <s-resource-sets>`
+inside an ordering constraint, the resource set may take an ``action``
+attribute.
+
+.. topic:: Start C and D after first promoting A and B
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_order id="order-1" score="INFINITY" >
+ <resource_set id="ordered-set-1" sequential="true" action="promote">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ </resource_set>
+ <resource_set id="ordered-set-2" sequential="true" action="start">
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ </resource_set>
+ </rsc_order>
+ </constraints>
+
+In the above example, **B** cannot be promoted until **A** has been promoted.
+Additionally, resources **C** and **D** must wait until **A** and **B** have
+been promoted before they can start.
+
+.. index::
+ pair: resource-stickiness; clone
+
+.. _s-clone-stickiness:
+
+Clone Stickiness
+________________
+
+To achieve a stable allocation pattern, clones are slightly sticky by
+default. If no value for ``resource-stickiness`` is provided, the clone
+will use a value of 1. Being a small value, it causes minimal
+disturbance to the score calculations of other resources but is enough
+to prevent Pacemaker from needlessly moving copies around the cluster.
+
+.. note::
+
+ For globally unique clones, this may result in multiple instances of the
+ clone staying on a single node, even after another eligible node becomes
+ active (for example, after being put into standby mode then made active again).
+ If you do not want this behavior, specify a ``resource-stickiness`` of 0
+ for the clone temporarily and let the cluster adjust, then set it back
+ to 1 if you want the default behavior to apply again.
+
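+As a sketch (reusing the hypothetical ``apache-clone`` from the earlier
+example), an explicit stickiness can be set as a clone meta-attribute:
+
+.. code-block:: xml
+
+   <clone id="apache-clone">
+     <meta_attributes id="apache-clone-meta">
+       <nvpair id="apache-clone-stickiness" name="resource-stickiness" value="0"/>
+     </meta_attributes>
+     <primitive id="apache" class="lsb" type="apache"/>
+   </clone>
+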
+.. important::
+
+ If ``resource-stickiness`` is set in the ``rsc_defaults`` section, it will
+ apply to clone instances as well. This means an explicit ``resource-stickiness``
+ of 0 in ``rsc_defaults`` works differently from the implicit default used when
+ ``resource-stickiness`` is not specified.
+
+Clone Resource Agent Requirements
+_________________________________
+
+Any resource can be used as an anonymous clone, as it requires no
+additional support from the resource agent. Whether it makes sense to
+do so depends on your resource and its resource agent.
+
+Resource Agent Requirements for Globally Unique Clones
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Globally unique clones require additional support in the resource agent. In
+particular, it must only respond with ``${OCF_SUCCESS}`` if the node has that
+exact instance active. All other probes for instances of the clone should
+result in ``${OCF_NOT_RUNNING}`` (or one of the other OCF error codes if
+the instance has failed).
+
+Individual instances of a clone are identified by appending a colon and a
+numerical offset, e.g. **apache:2**.
+
+Resource agents can find out how many copies there are by examining
+the ``OCF_RESKEY_CRM_meta_clone_max`` environment variable and which
+instance it is by examining ``OCF_RESKEY_CRM_meta_clone``.
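+
+For example (a hypothetical illustration), when instance **apache:2** of a
+five-instance clone is probed, the agent would see:
+
+.. code-block:: none
+
+   OCF_RESKEY_CRM_meta_clone=2
+   OCF_RESKEY_CRM_meta_clone_max=5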
+
+The resource agent must not make any assumptions (based on
+``OCF_RESKEY_CRM_meta_clone``) about which numerical instances are active. In
+particular, the list of active copies will not always be an unbroken
+sequence, nor always start at 0.
+
+Resource Agent Requirements for Promotable Clones
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Promotable clone resources require two extra actions, ``demote`` and ``promote``,
+which are responsible for changing the state of the resource. Like **start** and
+**stop**, they should return ``${OCF_SUCCESS}`` if they completed successfully or
+a relevant error code if they did not.
+
+The states can mean whatever you wish, but when the resource is
+started, it must come up in the unpromoted role. From there, the
+cluster will decide which instances to promote.
+
+In addition to the clone requirements for monitor actions, agents must
+also *accurately* report which state they are in. The cluster relies
+on the agent to report its status (including role) accurately and does
+not indicate to the agent what role it currently believes it to be in.
+
+.. table:: **Role implications of OCF return codes**
+ :widths: 1 3
+
+ +----------------------+--------------------------------------------------+
+ | Monitor Return Code | Description |
+ +======================+==================================================+
+ | OCF_NOT_RUNNING | .. index:: |
+ | | single: OCF_NOT_RUNNING |
+ | | single: OCF return code; OCF_NOT_RUNNING |
+ | | |
+ | | Stopped |
+ +----------------------+--------------------------------------------------+
+ | OCF_SUCCESS | .. index:: |
+ | | single: OCF_SUCCESS |
+ | | single: OCF return code; OCF_SUCCESS |
+ | | |
+ | | Running (Unpromoted) |
+ +----------------------+--------------------------------------------------+
+ | OCF_RUNNING_PROMOTED | .. index:: |
+ | | single: OCF_RUNNING_PROMOTED |
+ | | single: OCF return code; OCF_RUNNING_PROMOTED |
+ | | |
+ | | Running (Promoted) |
+ +----------------------+--------------------------------------------------+
+ | OCF_FAILED_PROMOTED | .. index:: |
+ | | single: OCF_FAILED_PROMOTED |
+ | | single: OCF return code; OCF_FAILED_PROMOTED |
+ | | |
+ | | Failed (Promoted) |
+ +----------------------+--------------------------------------------------+
+ | Other | .. index:: |
+ | | single: return code |
+ | | |
+ | | Failed (Unpromoted) |
+ +----------------------+--------------------------------------------------+
+
+Clone Notifications
+~~~~~~~~~~~~~~~~~~~
+
+If the clone has the ``notify`` meta-attribute set to **true**, and the resource
+agent supports the ``notify`` action, Pacemaker will call the action when
+appropriate, passing a number of extra variables which, when combined with
+additional context, can be used to calculate the current state of the cluster
+and what is about to happen to it.
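+
+For example (a minimal sketch reusing the hypothetical ``apache-clone``),
+notifications are enabled via a clone meta-attribute:
+
+.. code-block:: xml
+
+   <clone id="apache-clone">
+     <meta_attributes id="apache-clone-meta-notify">
+       <nvpair id="apache-clone-notify" name="notify" value="true"/>
+     </meta_attributes>
+     <primitive id="apache" class="lsb" type="apache"/>
+   </clone>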
+
+.. index::
+ single: clone; environment variables
+ single: notify; environment variables
+
+.. table:: **Environment variables supplied with Clone notify actions**
+ :widths: 1 1
+
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | Variable | Description |
+ +==============================================+===============================================================================+
+ | OCF_RESKEY_CRM_meta_notify_type | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_type |
+ | | single: OCF_RESKEY_CRM_meta_notify_type |
+ | | |
+ | | Allowed values: **pre**, **post** |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_operation | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_operation |
+ | | single: OCF_RESKEY_CRM_meta_notify_operation |
+ | | |
+ | | Allowed values: **start**, **stop** |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_start_resource | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_start_resource |
+ | | single: OCF_RESKEY_CRM_meta_notify_start_resource |
+ | | |
+ | | Resources to be started |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_stop_resource | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_stop_resource |
+ | | single: OCF_RESKEY_CRM_meta_notify_stop_resource |
+ | | |
+ | | Resources to be stopped |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_active_resource | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_active_resource |
+ | | single: OCF_RESKEY_CRM_meta_notify_active_resource |
+ | | |
+ | | Resources that are running |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_inactive_resource | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_inactive_resource |
+ | | single: OCF_RESKEY_CRM_meta_notify_inactive_resource |
+ | | |
+ | | Resources that are not running |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_start_uname | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_start_uname |
+ | | single: OCF_RESKEY_CRM_meta_notify_start_uname |
+ | | |
+ | | Nodes on which resources will be started |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_stop_uname | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_stop_uname |
+ | | single: OCF_RESKEY_CRM_meta_notify_stop_uname |
+ | | |
+ | | Nodes on which resources will be stopped |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_active_uname | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_active_uname |
+ | | single: OCF_RESKEY_CRM_meta_notify_active_uname |
+ | | |
+ | | Nodes on which resources are running |
+ +----------------------------------------------+-------------------------------------------------------------------------------+
+
+The variables come in pairs, such as
+``OCF_RESKEY_CRM_meta_notify_start_resource`` and
+``OCF_RESKEY_CRM_meta_notify_start_uname``, and should be treated as an
+array of whitespace-separated elements.
+
+``OCF_RESKEY_CRM_meta_notify_inactive_resource`` is an exception, as the
+matching **uname** variable does not exist since inactive resources
+are not running on any node.
+
+Thus, in order to indicate that **clone:0** will be started on **sles-1**,
+**clone:2** will be started on **sles-3**, and **clone:3** will be started
+on **sles-2**, the cluster would set:
+
+.. topic:: Notification variables
+
+ .. code-block:: none
+
+ OCF_RESKEY_CRM_meta_notify_start_resource="clone:0 clone:2 clone:3"
+ OCF_RESKEY_CRM_meta_notify_start_uname="sles-1 sles-3 sles-2"
+
+.. note::
+
+ Pacemaker will log but otherwise ignore failures of notify actions.
+
+Interpretation of Notification Variables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Pre-notification (stop):**
+
+* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
+* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
+* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+**Post-notification (stop) / Pre-notification (start):**
+
+* Active resources
+
+ * ``$OCF_RESKEY_CRM_meta_notify_active_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+* Inactive resources
+
+ * ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+**Post-notification (start):**
+
+* Active resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_active_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+
+* Inactive resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+
+* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+Extra Notifications for Promotable Clones
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. index::
+ single: clone; environment variables
+ single: promotable; environment variables
+
+.. table:: **Extra environment variables supplied for promotable clones**
+ :widths: 1 1
+
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+ | Variable | Description |
+ +================================================+=================================================================================+
+ | OCF_RESKEY_CRM_meta_notify_promoted_resource | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_promoted_resource |
+ | | single: OCF_RESKEY_CRM_meta_notify_promoted_resource |
+ | | |
+ | | Resources that are running in the promoted role |
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_unpromoted_resource | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_unpromoted_resource |
+ | | single: OCF_RESKEY_CRM_meta_notify_unpromoted_resource |
+ | | |
+ | | Resources that are running in the unpromoted role |
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_promote_resource | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_promote_resource |
+ | | single: OCF_RESKEY_CRM_meta_notify_promote_resource |
+ | | |
+ | | Resources to be promoted |
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_demote_resource | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_demote_resource |
+ | | single: OCF_RESKEY_CRM_meta_notify_demote_resource |
+ | | |
+ | | Resources to be demoted |
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_promote_uname | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_promote_uname |
+ | | single: OCF_RESKEY_CRM_meta_notify_promote_uname |
+ | | |
+ | | Nodes on which resources will be promoted |
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_demote_uname | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_demote_uname |
+ | | single: OCF_RESKEY_CRM_meta_notify_demote_uname |
+ | | |
+ | | Nodes on which resources will be demoted |
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_promoted_uname | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_promoted_uname |
+ | | single: OCF_RESKEY_CRM_meta_notify_promoted_uname |
+ | | |
+ | | Nodes on which resources are running in the promoted role |
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+ | OCF_RESKEY_CRM_meta_notify_unpromoted_uname | .. index:: |
+ | | single: environment variable; OCF_RESKEY_CRM_meta_notify_unpromoted_uname |
+ | | single: OCF_RESKEY_CRM_meta_notify_unpromoted_uname |
+ | | |
+ | | Nodes on which resources are running in the unpromoted role |
+ +------------------------------------------------+---------------------------------------------------------------------------------+
+
+Interpretation of Promotable Notification Variables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Pre-notification (demote):**
+
+* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
+* Promoted resources: ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
+* Unpromoted resources: ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
+* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
+* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
+* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+**Post-notification (demote) / Pre-notification (stop):**
+
+* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
+* Promoted resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+
+* Unpromoted resources: ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
+* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
+* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
+* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+
+**Post-notification (stop) / Pre-notification (start):**
+
+* Active resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_active_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+* Promoted resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+
+* Unpromoted resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+* Inactive resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
+* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+**Post-notification (start) / Pre-notification (promote):**
+
+* Active resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_active_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+
+* Promoted resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+
+* Unpromoted resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+
+* Inactive resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+
+* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
+* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+**Post-notification (promote):**
+
+* Active resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_active_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+
+* Promoted resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
+
+* Unpromoted resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
+
+* Inactive resources:
+
+ * ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
+ * plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+ * minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+
+* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
+* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
+* Resources that were promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
+* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
+* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
+
+Monitoring Promotable Clone Resources
+_____________________________________
+
+The usual monitor actions are insufficient to monitor a promotable clone
+resource, because Pacemaker needs to verify not only that the resource is
+active, but also that its actual role matches its intended one.
+
+Define two monitoring actions: the usual one will cover the unpromoted role,
+and an additional one with ``role="Promoted"`` will cover the promoted role.
+
+.. topic:: Monitoring both states of a promotable clone resource
+
+ .. code-block:: xml
+
+ <clone id="myPromotableRsc">
+ <meta_attributes id="myPromotableRsc-meta">
+ <nvpair name="promotable" value="true"/>
+ </meta_attributes>
+ <primitive id="myRsc" class="ocf" type="myApp" provider="myCorp">
+ <operations>
+ <op id="public-ip-unpromoted-check" name="monitor" interval="60"/>
+ <op id="public-ip-promoted-check" name="monitor" interval="61" role="Promoted"/>
+ </operations>
+ </primitive>
+ </clone>
+
+.. important::
+
+ It is crucial that *every* monitor operation has a different interval!
+ Pacemaker currently differentiates between operations
+ only by resource and interval; so if (for example) a promotable clone resource
+ had the same monitor interval for both roles, Pacemaker would ignore the
+ role when checking the status -- which would cause unexpected return
+ codes, and therefore unnecessary complications.
+
+.. _s-promotion-scores:
+
+Determining Which Instance is Promoted
+______________________________________
+
+Pacemaker can choose a promotable clone instance to be promoted in one of two
+ways:
+
+* Promotion scores: These are node attributes set via the ``crm_attribute``
+ command using the ``--promotion`` option, which generally would be called by
+ the resource agent's start action if it supports promotable clones. This tool
+ automatically detects both the resource and host, and should be used to set a
+ preference for being promoted. Based on this, ``promoted-max``, and
+ ``promoted-node-max``, the instance(s) with the highest preference will be
+ promoted.
+
+* Constraints: Location constraints can indicate which nodes are most preferred
+ to be promoted.
+
+.. topic:: Explicitly preferring node1 to be promoted
+
+ .. code-block:: xml
+
+ <rsc_location id="promoted-location" rsc="myPromotableRsc">
+ <rule id="promoted-rule" score="100" role="Promoted">
+ <expression id="promoted-exp" attribute="#uname" operation="eq" value="node1"/>
+ </rule>
+ </rsc_location>
+
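+For the promotion-score approach described above, the resource agent might set
+this node's preference with a command along these lines (a sketch; the score
+value is arbitrary, and the resource is taken from the agent's
+``OCF_RESOURCE_INSTANCE`` environment variable):
+
+.. code-block:: none
+
+   # Set this node's promotion preference for the managed resource
+   crm_attribute --promotion -v 100
+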
+.. index::
+ single: bundle
+ single: resource; bundle
+ pair: container; Docker
+ pair: container; podman
+ pair: container; rkt
+
+.. _s-resource-bundle:
+
+Bundles - Containerized Resources
+#################################
+
+Pacemaker supports a special syntax for launching a service inside a
+`container <https://en.wikipedia.org/wiki/Operating-system-level_virtualization>`_
+with any infrastructure it requires: the *bundle*.
+
+Pacemaker bundles support `Docker <https://www.docker.com/>`_,
+`podman <https://podman.io/>`_ *(since 2.0.1)*, and
+`rkt <https://coreos.com/rkt/>`_ container technologies. [#]_
+
+.. topic:: A bundle for a containerized web server
+
+ .. code-block:: xml
+
+ <bundle id="httpd-bundle">
+ <podman image="pcmk:http" replicas="3"/>
+ <network ip-range-start="192.168.122.131"
+ host-netmask="24"
+ host-interface="eth0">
+ <port-mapping id="httpd-port" port="80"/>
+ </network>
+ <storage>
+ <storage-mapping id="httpd-syslog"
+ source-dir="/dev/log"
+ target-dir="/dev/log"
+ options="rw"/>
+ <storage-mapping id="httpd-root"
+ source-dir="/srv/html"
+ target-dir="/var/www/html"
+ options="rw,Z"/>
+ <storage-mapping id="httpd-logs"
+ source-dir-root="/var/log/pacemaker/bundles"
+ target-dir="/etc/httpd/logs"
+ options="rw,Z"/>
+ </storage>
+ <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
+ </bundle>
+
+Bundle Prerequisites
+____________________
+
+Before configuring a bundle in Pacemaker, the user must install the appropriate
+container launch technology (Docker, podman, or rkt), and supply a fully
+configured container image, on every node allowed to run the bundle.
+
+Pacemaker will create an implicit resource of type **ocf:heartbeat:docker**,
+**ocf:heartbeat:podman**, or **ocf:heartbeat:rkt** to manage a bundle's
+container. The user must ensure that the appropriate resource agent is
+installed on every node allowed to run the bundle.
+
+.. index::
+ pair: XML element; bundle
+
+Bundle Properties
+_________________
+
+.. table:: **XML Attributes of a bundle Element**
+ :widths: 1 4
+
+ +-------------+------------------------------------------------------------------+
+ | Field | Description |
+ +=============+==================================================================+
+ | id | .. index:: |
+ | | single: bundle; attribute, id |
+ | | single: attribute; id (bundle) |
+ | | single: id; bundle attribute |
+ | | |
+ | | A unique name for the bundle (required) |
+ +-------------+------------------------------------------------------------------+
+ | description | .. index:: |
+ | | single: bundle; attribute, description |
+ | | single: attribute; description (bundle) |
+ | | single: description; bundle attribute |
+ | | |
+ | | An optional description of the bundle, for the user's own |
+ | | purposes. |
+ | | E.g. ``manages the container that runs the service`` |
+ +-------------+------------------------------------------------------------------+
+
+
+A bundle must contain exactly one ``docker``, ``podman``, or ``rkt`` element.
+
+.. index::
+ pair: XML element; docker
+ pair: XML element; podman
+ pair: XML element; rkt
+
+Bundle Container Properties
+___________________________
+
+.. table:: **XML attributes of a docker, podman, or rkt Element**
+ :class: longtable
+ :widths: 2 3 4
+
+ +-------------------+------------------------------------+---------------------------------------------------+
+ | Attribute | Default | Description |
+ +===================+====================================+===================================================+
+ | image | | .. index:: |
+ | | | single: docker; attribute, image |
+ | | | single: attribute; image (docker) |
+ | | | single: image; docker attribute |
+ | | | single: podman; attribute, image |
+ | | | single: attribute; image (podman) |
+ | | | single: image; podman attribute |
+ | | | single: rkt; attribute, image |
+ | | | single: attribute; image (rkt) |
+ | | | single: image; rkt attribute |
+ | | | |
+ | | | Container image tag (required) |
+ +-------------------+------------------------------------+---------------------------------------------------+
+ | replicas | Value of ``promoted-max`` | .. index:: |
+ | | if that is positive, else 1 | single: docker; attribute, replicas |
+ | | | single: attribute; replicas (docker) |
+ | | | single: replicas; docker attribute |
+ | | | single: podman; attribute, replicas |
+ | | | single: attribute; replicas (podman) |
+ | | | single: replicas; podman attribute |
+ | | | single: rkt; attribute, replicas |
+ | | | single: attribute; replicas (rkt) |
+ | | | single: replicas; rkt attribute |
+ | | | |
+ | | | A positive integer specifying the number of |
+ | | | container instances to launch |
+ +-------------------+------------------------------------+---------------------------------------------------+
+ | replicas-per-host | 1 | .. index:: |
+ | | | single: docker; attribute, replicas-per-host |
+ | | | single: attribute; replicas-per-host (docker) |
+ | | | single: replicas-per-host; docker attribute |
+ | | | single: podman; attribute, replicas-per-host |
+ | | | single: attribute; replicas-per-host (podman) |
+ | | | single: replicas-per-host; podman attribute |
+ | | | single: rkt; attribute, replicas-per-host |
+ | | | single: attribute; replicas-per-host (rkt) |
+ | | | single: replicas-per-host; rkt attribute |
+ | | | |
+ | | | A positive integer specifying the number of |
+ | | | container instances allowed to run on a |
+ | | | single node |
+ +-------------------+------------------------------------+---------------------------------------------------+
+ | promoted-max | 0 | .. index:: |
+ | | | single: docker; attribute, promoted-max |
+ | | | single: attribute; promoted-max (docker) |
+ | | | single: promoted-max; docker attribute |
+ | | | single: podman; attribute, promoted-max |
+ | | | single: attribute; promoted-max (podman) |
+ | | | single: promoted-max; podman attribute |
+ | | | single: rkt; attribute, promoted-max |
+ | | | single: attribute; promoted-max (rkt) |
+ | | | single: promoted-max; rkt attribute |
+ | | | |
+ | | | A non-negative integer that, if positive, |
+ | | | indicates that the containerized service |
+ | | | should be treated as a promotable service, |
+ | | | with this many replicas allowed to run the |
+ | | | service in the promoted role |
+ +-------------------+------------------------------------+---------------------------------------------------+
+ | network | | .. index:: |
+ | | | single: docker; attribute, network |
+ | | | single: attribute; network (docker) |
+ | | | single: network; docker attribute |
+ | | | single: podman; attribute, network |
+ | | | single: attribute; network (podman) |
+ | | | single: network; podman attribute |
+ | | | single: rkt; attribute, network |
+ | | | single: attribute; network (rkt) |
+ | | | single: network; rkt attribute |
+ | | | |
+ | | | If specified, this will be passed to the |
+ | | | ``docker run``, ``podman run``, or |
+ | | | ``rkt run`` command as the network setting |
+ | | | for the container. |
+ +-------------------+------------------------------------+---------------------------------------------------+
+ | run-command | ``/usr/sbin/pacemaker-remoted`` if | .. index:: |
+ | | bundle contains a **primitive**, | single: docker; attribute, run-command |
+ | | otherwise none | single: attribute; run-command (docker) |
+ | | | single: run-command; docker attribute |
+ | | | single: podman; attribute, run-command |
+ | | | single: attribute; run-command (podman) |
+ | | | single: run-command; podman attribute |
+ | | | single: rkt; attribute, run-command |
+ | | | single: attribute; run-command (rkt) |
+ | | | single: run-command; rkt attribute |
+ | | | |
+ | | | This command will be run inside the container |
+ | | | when launching it ("PID 1"). If the bundle |
+ | | | contains a **primitive**, this command *must* |
+ | | | start ``pacemaker-remoted`` (but could, for |
+ | | | example, be a script that does other stuff, too). |
+ +-------------------+------------------------------------+---------------------------------------------------+
+ | options | | .. index:: |
+ | | | single: docker; attribute, options |
+ | | | single: attribute; options (docker) |
+ | | | single: options; docker attribute |
+ | | | single: podman; attribute, options |
+ | | | single: attribute; options (podman) |
+ | | | single: options; podman attribute |
+ | | | single: rkt; attribute, options |
+ | | | single: attribute; options (rkt) |
+ | | | single: options; rkt attribute |
+ | | | |
+ | | | Extra command-line options to pass to the |
+ | | | ``docker run``, ``podman run``, or ``rkt run`` |
+ | | | command |
+ +-------------------+------------------------------------+---------------------------------------------------+
+
+.. note::
+
+ Considerations when using cluster configurations or container images from
+ Pacemaker 1.1:
+
+ * If the container image has a pre-2.0.0 version of Pacemaker, set ``run-command``
+ to ``/usr/sbin/pacemaker_remoted`` (note the underbar instead of dash).
+
+ * ``masters`` is accepted as an alias for ``promoted-max``, but is deprecated since
+ 2.0.0, and support for it will be removed in a future version.
+
+Bundle Network Properties
+_________________________
+
+A bundle may optionally contain one ``<network>`` element.
+
+.. index::
+ pair: XML element; network
+ single: bundle; network
+
+.. table:: **XML attributes of a network Element**
+ :widths: 2 1 5
+
+ +----------------+---------+------------------------------------------------------------+
+ | Attribute | Default | Description |
+ +================+=========+============================================================+
+ | add-host | TRUE | .. index:: |
+ | | | single: network; attribute, add-host |
+ | | | single: attribute; add-host (network) |
+ | | | single: add-host; network attribute |
+ | | | |
+ | | | If TRUE, and ``ip-range-start`` is used, Pacemaker will |
+ | | | automatically ensure that ``/etc/hosts`` inside the |
+ | | | containers has entries for each |
+ | | | :ref:`replica name <s-resource-bundle-note-replica-names>` |
+ | | | and its assigned IP. |
+ +----------------+---------+------------------------------------------------------------+
+ | ip-range-start | | .. index:: |
+ | | | single: network; attribute, ip-range-start |
+ | | | single: attribute; ip-range-start (network) |
+ | | | single: ip-range-start; network attribute |
+ | | | |
+ | | | If specified, Pacemaker will create an implicit |
+ | | | ``ocf:heartbeat:IPaddr2`` resource for each container |
+ | | | instance, starting with this IP address, using up to |
+ | | | ``replicas`` sequential addresses. These addresses can be |
+ | | | used from the host's network to reach the service inside |
+ | | | the container, though the addresses are not visible |
+ | | | within the containers themselves. Only IPv4 addresses |
+ | | | are currently supported. |
+ +----------------+---------+------------------------------------------------------------+
+ | host-netmask | 32 | .. index:: |
+ | | | single: network; attribute; host-netmask |
+ | | | single: attribute; host-netmask (network) |
+ | | | single: host-netmask; network attribute |
+ | | | |
+ | | | If ``ip-range-start`` is specified, the IP addresses |
+ | | | are created with this CIDR netmask (as a number of bits). |
+ +----------------+---------+------------------------------------------------------------+
+ | host-interface | | .. index:: |
+ | | | single: network; attribute; host-interface |
+ | | | single: attribute; host-interface (network) |
+ | | | single: host-interface; network attribute |
+ | | | |
+ | | | If ``ip-range-start`` is specified, the IP addresses are |
+ | | | created on this host interface (by default, it will be |
+ | | | determined from the IP address). |
+ +----------------+---------+------------------------------------------------------------+
+ | control-port | 3121 | .. index:: |
+ | | | single: network; attribute; control-port |
+ | | | single: attribute; control-port (network) |
+ | | | single: control-port; network attribute |
+ | | | |
+ | | | If the bundle contains a ``primitive``, the cluster will |
+ | | | use this integer TCP port for communication with |
+ | | | Pacemaker Remote inside the container. Changing this is |
+ | | | useful when the container is unable to listen on the |
+ | | | default port, for example, when the container uses the |
+ | | | host's network rather than ``ip-range-start`` (in which |
+ | | | case ``replicas-per-host`` must be 1), or when the bundle |
+ | | | may run on a Pacemaker Remote node that is already |
+ | | | listening on the default port. Any ``PCMK_remote_port`` |
+ | | | environment variable set on the host or in the container |
+ | | | is ignored for bundle connections. |
+ +----------------+---------+------------------------------------------------------------+
+
+.. _s-resource-bundle-note-replica-names:
+
+.. note::
+
+ Replicas are named by the bundle id plus a dash and an integer counter starting
+ with zero. For example, if a bundle named **httpd-bundle** has **replicas=2**, its
+ containers will be named **httpd-bundle-0** and **httpd-bundle-1**.
+
+.. index::
+ pair: XML element; port-mapping
+
+Additionally, a ``network`` element may optionally contain one or more
+``port-mapping`` elements.
+
+.. table:: **Attributes of a port-mapping Element**
+ :widths: 2 1 5
+
+ +---------------+-------------------+------------------------------------------------------+
+ | Attribute | Default | Description |
+ +===============+===================+======================================================+
+ | id | | .. index:: |
+ | | | single: port-mapping; attribute, id |
+ | | | single: attribute; id (port-mapping) |
+ | | | single: id; port-mapping attribute |
+ | | | |
+ | | | A unique name for the port mapping (required) |
+ +---------------+-------------------+------------------------------------------------------+
+ | port | | .. index:: |
+ | | | single: port-mapping; attribute, port |
+ | | | single: attribute; port (port-mapping) |
+ | | | single: port; port-mapping attribute |
+ | | | |
+ | | | If this is specified, connections to this TCP port |
+ | | | number on the host network (on the container's |
+ | | | assigned IP address, if ``ip-range-start`` is |
+ | | | specified) will be forwarded to the container |
+ | | | network. Exactly one of ``port`` or ``range`` |
+ | | | must be specified in a ``port-mapping``. |
+ +---------------+-------------------+------------------------------------------------------+
+ | internal-port | value of ``port`` | .. index:: |
+ | | | single: port-mapping; attribute, internal-port |
+ | | | single: attribute; internal-port (port-mapping) |
+ | | | single: internal-port; port-mapping attribute |
+ | | | |
+ | | | If ``port`` and this are specified, connections |
+ | | | to ``port`` on the host's network will be |
+ | | | forwarded to this port on the container network. |
+ +---------------+-------------------+------------------------------------------------------+
+ | range | | .. index:: |
+ | | | single: port-mapping; attribute, range |
+ | | | single: attribute; range (port-mapping) |
+ | | | single: range; port-mapping attribute |
+ | | | |
+ | | | If this is specified, connections to these TCP |
+ | | | port numbers (expressed as *first_port*-*last_port*) |
+ | | | on the host network (on the container's assigned IP |
+ | | | address, if ``ip-range-start`` is specified) will |
+ | | | be forwarded to the same ports in the container |
+ | | | network. Exactly one of ``port`` or ``range`` |
+ | | | must be specified in a ``port-mapping``. |
+ +---------------+-------------------+------------------------------------------------------+
+
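+For example, a sketch of a ``network`` element forwarding both a single port
+and a port range (all values hypothetical):
+
+.. code-block:: xml
+
+   <network ip-range-start="192.168.122.131">
+     <port-mapping id="httpd-port" port="80" internal-port="8080"/>
+     <port-mapping id="httpd-range" range="8081-8090"/>
+   </network>
+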
+.. note::
+
+ If the bundle contains a ``primitive``, Pacemaker will automatically map the
+ ``control-port``, so it is not necessary to specify that port in a
+ ``port-mapping``.
+
+.. index::
+ pair: XML element; storage
+ pair: XML element; storage-mapping
+ single: bundle; storage
+
+.. _s-bundle-storage:
+
+Bundle Storage Properties
+_________________________
+
+A bundle may optionally contain one ``storage`` element. A ``storage`` element
+has no properties of its own, but may contain one or more ``storage-mapping``
+elements.
+
+.. table:: **Attributes of a storage-mapping Element**
+ :widths: 2 1 5
+
+ +-----------------+---------+-------------------------------------------------------------+
+ | Attribute | Default | Description |
+ +=================+=========+=============================================================+
+ | id | | .. index:: |
+ | | | single: storage-mapping; attribute, id |
+ | | | single: attribute; id (storage-mapping) |
+ | | | single: id; storage-mapping attribute |
+ | | | |
+ | | | A unique name for the storage mapping (required) |
+ +-----------------+---------+-------------------------------------------------------------+
+ | source-dir | | .. index:: |
+ | | | single: storage-mapping; attribute, source-dir |
+ | | | single: attribute; source-dir (storage-mapping) |
+ | | | single: source-dir; storage-mapping attribute |
+ | | | |
+ | | | The absolute path on the host's filesystem that will be |
+ | | | mapped into the container. Exactly one of ``source-dir`` |
+ | | | and ``source-dir-root`` must be specified in a |
+ | | | ``storage-mapping``. |
+ +-----------------+---------+-------------------------------------------------------------+
+ | source-dir-root | | .. index:: |
+ | | | single: storage-mapping; attribute, source-dir-root |
+ | | | single: attribute; source-dir-root (storage-mapping) |
+ | | | single: source-dir-root; storage-mapping attribute |
+ | | | |
+ | | | The start of a path on the host's filesystem that will |
+ | | | be mapped into the container, using a different |
+ | | | subdirectory on the host for each container instance. |
+ | | | The subdirectory will be named the same as the |
+ | | | :ref:`replica name <s-resource-bundle-note-replica-names>`. |
+ | | | Exactly one of ``source-dir`` and ``source-dir-root`` |
+ | | | must be specified in a ``storage-mapping``. |
+ +-----------------+---------+-------------------------------------------------------------+
+ | target-dir | | .. index:: |
+ | | | single: storage-mapping; attribute, target-dir |
+ | | | single: attribute; target-dir (storage-mapping) |
+ | | | single: target-dir; storage-mapping attribute |
+ | | | |
+ | | | The path name within the container where the host |
+ | | | storage will be mapped (required) |
+ +-----------------+---------+-------------------------------------------------------------+
+ | options | | .. index:: |
+ | | | single: storage-mapping; attribute, options |
+ | | | single: attribute; options (storage-mapping) |
+ | | | single: options; storage-mapping attribute |
+ | | | |
+ | | | A comma-separated list of file system mount |
+ | | | options to use when mapping the storage |
+ +-----------------+---------+-------------------------------------------------------------+
+
+.. note::
+
+ Pacemaker does not define the behavior if the source directory does not already
+ exist on the host. However, it is expected that the container technology and/or
+ its resource agent will create the source directory in that case.
+
+.. note::
+
+ If the bundle contains a ``primitive``,
+ Pacemaker will automatically map the equivalent of
+ ``source-dir=/etc/pacemaker/authkey target-dir=/etc/pacemaker/authkey``
+ and ``source-dir-root=/var/log/pacemaker/bundles target-dir=/var/log`` into the
+ container, so it is not necessary to specify those paths in a
+ ``storage-mapping``.
+
+.. important::
+
+ The ``PCMK_authkey_location`` environment variable must not be set to anything
+ other than the default of ``/etc/pacemaker/authkey`` on any node in the cluster.
+
+.. important::
+
+ If SELinux is used in enforcing mode on the host, you must ensure the container
+ is allowed to use any storage you mount into it. For Docker and podman bundles,
+ adding "Z" to the mount options will create a container-specific label for the
+ mount that allows the container access.
+
+.. index::
+ single: bundle; primitive
+
+Bundle Primitive
+________________
+
+A bundle may optionally contain one :ref:`primitive <primitive-resource>`
+resource. The primitive may have operations, instance attributes, and
+meta-attributes defined, as usual.
+
+If a bundle contains a primitive resource, the container image must include
+the Pacemaker Remote daemon, and at least one of ``ip-range-start`` or
+``control-port`` must be configured in the bundle. Pacemaker will create an
+implicit **ocf:pacemaker:remote** resource for the connection, launch
+Pacemaker Remote within the container, and monitor and manage the primitive
+resource via Pacemaker Remote.
+
+If the bundle has more than one container instance (replica), the primitive
+resource will function as an implicit :ref:`clone <s-resource-clone>` -- a
+:ref:`promotable clone <s-resource-promotable>` if the bundle has ``promoted-max``
+greater than zero.
+
+.. note::
+
+ If you want to pass environment variables to a bundle's Pacemaker Remote
+ connection or primitive, you have two options:
+
+ * Environment variables whose value is the same regardless of the underlying host
+ may be set using the container element's ``options`` attribute.
+ * If you want variables to have host-specific values, you can use the
+ :ref:`storage-mapping <s-bundle-storage>` element to map a file on the host as
+ ``/etc/pacemaker/pcmk-init.env`` in the container *(since 2.0.3)*.
+ Pacemaker Remote will parse this file as a shell-like format, with
+ variables set as NAME=VALUE, ignoring blank lines and comments starting
+ with "#".
+
+.. important::
+
+ When a bundle has a ``primitive``, Pacemaker on all cluster nodes must be able to
+ contact Pacemaker Remote inside the bundle's containers.
+
+ * The containers must have an accessible network (for example, ``network`` should
+ not be set to "none" with a ``primitive``).
+ * The default, using a distinct network space inside the container, works in
+ combination with ``ip-range-start``. Any firewall must allow access from all
+ cluster nodes to the ``control-port`` on the container IPs.
+ * If the container shares the host's network space (for example, by setting
+ ``network`` to "host"), a unique ``control-port`` should be specified for each
+ bundle. Any firewall must allow access from all cluster nodes to the
+ ``control-port`` on all cluster and remote node IPs.
+
+.. index::
+ single: bundle; node attributes
+
+.. _s-bundle-attributes:
+
+Bundle Node Attributes
+______________________
+
+If the bundle has a ``primitive``, the primitive's resource agent may want to set
+node attributes such as :ref:`promotion scores <s-promotion-scores>`. However, with
+containers, it is not apparent which node should get the attribute.
+
+If the container uses shared storage that is the same no matter which node the
+container is hosted on, then it is appropriate to use the promotion score on the
+bundle node itself.
+
+On the other hand, if the container uses storage exported from the underlying host,
+then it may be more appropriate to use the promotion score on the underlying host.
+
+Since this depends on the particular situation, the
+``container-attribute-target`` resource meta-attribute allows the user to specify
+which approach to use. If it is set to ``host``, then user-defined node attributes
+will be checked on the underlying host. If it is anything else, the local node
+(in this case the bundle node) is used as usual.
+
+This only applies to user-defined attributes; the cluster will always check the
+local node for cluster-defined attributes such as ``#uname``.
+
+If ``container-attribute-target`` is ``host``, the cluster will pass additional
+environment variables to the primitive's resource agent that allow it to set
+node attributes appropriately: ``CRM_meta_container_attribute_target`` (identical
+to the meta-attribute value) and ``CRM_meta_physical_host`` (the name of the
+underlying host).
+
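+As a sketch (reusing the hypothetical **httpd-bundle** from earlier), the
+meta-attribute is set like any other resource meta-attribute:
+
+.. code-block:: xml
+
+   <bundle id="httpd-bundle">
+     <meta_attributes id="httpd-bundle-meta">
+       <nvpair id="httpd-bundle-attr-target"
+               name="container-attribute-target" value="host"/>
+     </meta_attributes>
+     <podman image="pcmk:http" replicas="3"/>
+     <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
+   </bundle>
+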
+.. note::
+
+ When called by a resource agent, the ``attrd_updater`` and ``crm_attribute``
+ commands will automatically check those environment variables and set
+ attributes appropriately.
+
+.. index::
+ single: bundle; meta-attributes
+
+Bundle Meta-Attributes
+______________________
+
+Any meta-attribute set on a bundle will be inherited by the bundle's
+primitive and any resources implicitly created by Pacemaker for the bundle.
+
+This includes options such as ``priority``, ``target-role``, and ``is-managed``. See
+:ref:`resource_options` for more information.
+
+Bundles support clone meta-attributes including ``notify``, ``ordered``, and
+``interleave``.
+
+Limitations of Bundles
+______________________
+
+Restarting Pacemaker while a bundle is unmanaged or the cluster is in
+maintenance mode may cause the bundle to fail.
+
+Bundles may not be explicitly cloned or included in groups. This includes the
+bundle's primitive and any resources implicitly created by Pacemaker for the
+bundle. (If ``replicas`` is greater than 1, the bundle will behave like a clone
+implicitly.)
+
+Bundles do not have instance attributes, utilization attributes, or operations,
+though a bundle's primitive may have them.
+
+A bundle with a primitive can run on a Pacemaker Remote node only if the bundle
+uses a distinct ``control-port``.
+
+.. [#] Of course, the service must support running multiple instances.
+
+.. [#] Docker is a trademark of Docker, Inc. No endorsement by or association with
+ Docker, Inc. is implied.
diff --git a/doc/sphinx/Pacemaker_Explained/alerts.rst b/doc/sphinx/Pacemaker_Explained/alerts.rst
new file mode 100644
index 0000000..1d02187
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/alerts.rst
@@ -0,0 +1,257 @@
+.. index::
+ single: alert
+ single: resource; alert
+ single: node; alert
+ single: fencing; alert
+ pair: XML element; alert
+ pair: XML element; alerts
+
+Alerts
+------
+
+*Alerts* may be configured to take some external action when a cluster event
+occurs (node failure, resource starting or stopping, etc.).
+
+
+.. index::
+ pair: alert; agent
+
+Alert Agents
+############
+
+As with resource agents, the cluster calls an external program (an
+*alert agent*) to handle alerts. The cluster passes information about the event
+to the agent via environment variables. Agents can do anything desired with
+this information (send an e-mail, log to a file, update a monitoring system,
+etc.).
+
+.. topic:: Simple alert configuration
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="my-alert" path="/path/to/my-script.sh" />
+ </alerts>
+ </configuration>
+
+In the example above, the cluster will call ``my-script.sh`` for each event.
+
+Multiple alert agents may be configured; the cluster will call all of them for
+each event.
+
+Alert agents will be called only on cluster nodes. They will be called for
+events involving Pacemaker Remote nodes, but they will never be called *on*
+those nodes.
+
+For more information about sample alert agents provided by Pacemaker and about
+developing custom alert agents, see the *Pacemaker Administration* document.
+
+
+.. index::
+ single: alert; recipient
+ pair: XML element; recipient
+
+Alert Recipients
+################
+
+Usually, alerts are directed towards a recipient. Thus, each alert may be
+additionally configured with one or more recipients. The cluster will call the
+agent separately for each recipient.
+
+.. topic:: Alert configuration with recipient
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="my-alert" path="/path/to/my-script.sh">
+ <recipient id="my-alert-recipient" value="some-address"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+In the above example, the cluster will call ``my-script.sh`` for each event,
+passing the recipient ``some-address`` as an environment variable.
+
+The recipient may be anything the alert agent can recognize -- an IP address,
+an e-mail address, a file name, whatever the particular agent supports.
+
+
+.. index::
+ single: alert; meta-attributes
+ single: meta-attribute; alert meta-attributes
+
+Alert Meta-Attributes
+#####################
+
+As with resources, meta-attributes can be configured for alerts to change
+whether and how Pacemaker calls them.
+
+.. table:: **Meta-Attributes of an Alert**
+ :class: longtable
+ :widths: 1 1 3
+
+ +------------------+---------------+-----------------------------------------------------+
+ | Meta-Attribute | Default | Description |
+ +==================+===============+=====================================================+
+ | enabled | true | .. index:: |
+ | | | single: alert; meta-attribute, enabled |
+ | | | single: meta-attribute; enabled (alert) |
+ | | | single: enabled; alert meta-attribute |
+ | | | |
+ | | | If false for an alert, the alert will not be used. |
+ | | | If true for an alert and false for a particular |
+ | | | recipient of that alert, that recipient will not be |
+ | | | used. *(since 2.1.6)* |
+ +------------------+---------------+-----------------------------------------------------+
+ | timestamp-format | %H:%M:%S.%06N | .. index:: |
+ | | | single: alert; meta-attribute, timestamp-format |
+ | | | single: meta-attribute; timestamp-format (alert) |
+ | | | single: timestamp-format; alert meta-attribute |
+ | | | |
+ | | | Format the cluster will use when sending the |
+ | | | event's timestamp to the agent. This is a string as |
+ | | | used with the ``date(1)`` command. |
+ +------------------+---------------+-----------------------------------------------------+
+ | timeout | 30s | .. index:: |
+ | | | single: alert; meta-attribute, timeout |
+ | | | single: meta-attribute; timeout (alert) |
+ | | | single: timeout; alert meta-attribute |
+ | | | |
+ | | | If the alert agent does not complete within this |
+ | | | amount of time, it will be terminated. |
+ +------------------+---------------+-----------------------------------------------------+
+
+Meta-attributes can be configured per alert and/or per recipient.
+
+.. topic:: Alert configuration with meta-attributes
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="my-alert" path="/path/to/my-script.sh">
+ <meta_attributes id="my-alert-attributes">
+ <nvpair id="my-alert-attributes-timeout" name="timeout"
+ value="15s"/>
+ </meta_attributes>
+ <recipient id="my-alert-recipient1" value="someuser@example.com">
+ <meta_attributes id="my-alert-recipient1-attributes">
+ <nvpair id="my-alert-recipient1-timestamp-format"
+ name="timestamp-format" value="%D %H:%M"/>
+ </meta_attributes>
+ </recipient>
+ <recipient id="my-alert-recipient2" value="otheruser@example.com">
+ <meta_attributes id="my-alert-recipient2-attributes">
+ <nvpair id="my-alert-recipient2-timestamp-format"
+ name="timestamp-format" value="%c"/>
+ </meta_attributes>
+ </recipient>
+ </alert>
+ </alerts>
+ </configuration>
+
+In the above example, ``my-script.sh`` will be called twice for each
+event, with each call using a 15-second timeout. One call will be passed the
+recipient ``someuser@example.com`` and a timestamp in the format ``%D %H:%M``,
+while the other call will be passed the recipient ``otheruser@example.com`` and
+a timestamp in the format ``%c``.
+
+
+.. index::
+ single: alert; instance attributes
+ single: instance attribute; alert instance attributes
+
+Alert Instance Attributes
+#########################
+
+As with resource agents, agent-specific configuration values may be configured
+as instance attributes. These will be passed to the agent as additional
+environment variables. The number, names, and allowed values of these instance
+attributes are completely up to the particular agent.
+
+.. topic:: Alert configuration with instance attributes
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="my-alert" path="/path/to/my-script.sh">
+ <meta_attributes id="my-alert-attributes">
+ <nvpair id="my-alert-attributes-timeout" name="timeout"
+ value="15s"/>
+ </meta_attributes>
+ <instance_attributes id="my-alert-options">
+ <nvpair id="my-alert-options-debug" name="debug"
+ value="false"/>
+ </instance_attributes>
+ <recipient id="my-alert-recipient1"
+ value="someuser@example.com"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+
+.. index::
+ single: alert; filters
+ pair: XML element; select
+ pair: XML element; select_nodes
+ pair: XML element; select_fencing
+ pair: XML element; select_resources
+ pair: XML element; select_attributes
+ pair: XML element; attribute
+
+Alert Filters
+#############
+
+By default, an alert agent will be called for node events, fencing events, and
+resource events. An agent may choose to ignore certain types of events, but
+there is still the overhead of calling it for those events. To eliminate that
+overhead, you may select which types of events the agent should receive.
+
+.. topic:: Alert configuration to receive only node events and fencing events
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="my-alert" path="/path/to/my-script.sh">
+ <select>
+ <select_nodes />
+ <select_fencing />
+ </select>
+ <recipient id="my-alert-recipient1"
+ value="someuser@example.com"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+The possible options within ``<select>`` are ``<select_nodes>``,
+``<select_fencing>``, ``<select_resources>``, and ``<select_attributes>``.
+
+With ``<select_attributes>`` (the only event type not enabled by default), the
+agent will receive alerts when a node attribute changes. If you wish the agent
+to be called only when certain attributes change, you can configure that as well.
+
+.. topic:: Alert configuration to be called when certain node attributes change
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="my-alert" path="/path/to/my-script.sh">
+ <select>
+ <select_attributes>
+ <attribute id="alert-standby" name="standby" />
+ <attribute id="alert-shutdown" name="shutdown" />
+ </select_attributes>
+ </select>
+ <recipient id="my-alert-recipient1" value="someuser@example.com"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+Node attribute alerts are currently considered experimental. Alerts may be
+limited to attributes set via ``attrd_updater``, and agents may be called
+multiple times with the same attribute value.
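+
+For example, assuming the configuration above and a node named ``node1``,
+updating one of the selected attributes with ``attrd_updater`` should trigger
+the alert:
+
+.. code-block:: none
+
+ # attrd_updater --node node1 --name standby --update true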
diff --git a/doc/sphinx/Pacemaker_Explained/ap-samples.rst b/doc/sphinx/Pacemaker_Explained/ap-samples.rst
new file mode 100644
index 0000000..641affc
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/ap-samples.rst
@@ -0,0 +1,148 @@
+Sample Configurations
+---------------------
+
+Empty
+#####
+
+.. topic:: An Empty Configuration
+
+ .. code-block:: xml
+
+ <cib crm_feature_set="3.0.7" validate-with="pacemaker-1.2" admin_epoch="1" epoch="0" num_updates="0">
+ <configuration>
+ <crm_config/>
+ <nodes/>
+ <resources/>
+ <constraints/>
+ </configuration>
+ <status/>
+ </cib>
+
+Simple
+######
+
+.. topic:: A simple configuration with two nodes, some cluster options and a resource
+
+ .. code-block:: xml
+
+ <cib crm_feature_set="3.0.7" validate-with="pacemaker-1.2" admin_epoch="1" epoch="0" num_updates="0">
+ <configuration>
+ <crm_config>
+ <cluster_property_set id="cib-bootstrap-options">
+ <nvpair id="option-1" name="symmetric-cluster" value="true"/>
+ <nvpair id="option-2" name="no-quorum-policy" value="stop"/>
+ <nvpair id="option-3" name="stonith-enabled" value="0"/>
+ </cluster_property_set>
+ </crm_config>
+ <nodes>
+ <node id="xxx" uname="c001n01" type="normal"/>
+ <node id="yyy" uname="c001n02" type="normal"/>
+ </nodes>
+ <resources>
+ <primitive id="myAddr" class="ocf" provider="heartbeat" type="IPaddr">
+ <operations>
+ <op id="myAddr-monitor" name="monitor" interval="300s"/>
+ </operations>
+ <instance_attributes id="myAddr-params">
+ <nvpair id="myAddr-ip" name="ip" value="192.0.2.10"/>
+ </instance_attributes>
+ </primitive>
+ </resources>
+ <constraints>
+ <rsc_location id="myAddr-prefer" rsc="myAddr" node="c001n01" score="INFINITY"/>
+ </constraints>
+ <rsc_defaults>
+ <meta_attributes id="rsc_defaults-options">
+ <nvpair id="rsc-default-1" name="resource-stickiness" value="100"/>
+ <nvpair id="rsc-default-2" name="migration-threshold" value="10"/>
+ </meta_attributes>
+ </rsc_defaults>
+ <op_defaults>
+ <meta_attributes id="op_defaults-options">
+ <nvpair id="op-default-1" name="timeout" value="30s"/>
+ </meta_attributes>
+ </op_defaults>
+ </configuration>
+ <status/>
+ </cib>
+
+In the above example, we have one resource (an IP address) that the cluster
+checks every five minutes. It will run on host ``c001n01`` until either the
+resource fails 10 times or the host shuts down.
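+
+To experiment with a sample configuration such as this one, one way to load it
+into a *test* cluster is with ``cibadmin`` (the file name here is
+hypothetical); note that ``--replace`` overwrites the existing CIB:
+
+.. code-block:: none
+
+ # cibadmin --replace --xml-file sample.xml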
+
+Advanced Configuration
+######################
+
+.. topic:: An advanced configuration with groups, clones and STONITH
+
+ .. code-block:: xml
+
+ <cib crm_feature_set="3.0.7" validate-with="pacemaker-1.2" admin_epoch="1" epoch="0" num_updates="0">
+ <configuration>
+ <crm_config>
+ <cluster_property_set id="cib-bootstrap-options">
+ <nvpair id="option-1" name="symmetric-cluster" value="true"/>
+ <nvpair id="option-2" name="no-quorum-policy" value="stop"/>
+ <nvpair id="option-3" name="stonith-enabled" value="true"/>
+ </cluster_property_set>
+ </crm_config>
+ <nodes>
+ <node id="xxx" uname="c001n01" type="normal"/>
+ <node id="yyy" uname="c001n02" type="normal"/>
+ <node id="zzz" uname="c001n03" type="normal"/>
+ </nodes>
+ <resources>
+ <primitive id="myAddr" class="ocf" provider="heartbeat" type="IPaddr">
+ <operations>
+ <op id="myAddr-monitor" name="monitor" interval="300s"/>
+ </operations>
+ <instance_attributes id="myAddr-attrs">
+ <nvpair id="myAddr-attr-1" name="ip" value="192.0.2.10"/>
+ </instance_attributes>
+ </primitive>
+ <group id="myGroup">
+ <primitive id="database" class="lsb" type="oracle">
+ <operations>
+ <op id="database-monitor" name="monitor" interval="300s"/>
+ </operations>
+ </primitive>
+ <primitive id="webserver" class="lsb" type="apache">
+ <operations>
+ <op id="webserver-monitor" name="monitor" interval="300s"/>
+ </operations>
+ </primitive>
+ </group>
+ <clone id="STONITH">
+ <meta_attributes id="stonith-options">
+ <nvpair id="stonith-option-1" name="globally-unique" value="false"/>
+ </meta_attributes>
+ <primitive id="stonithclone" class="stonith" type="external/ssh">
+ <operations>
+ <op id="stonith-op-mon" name="monitor" interval="5s"/>
+ </operations>
+ <instance_attributes id="stonith-attrs">
+ <nvpair id="stonith-attr-1" name="hostlist" value="c001n01,c001n02"/>
+ </instance_attributes>
+ </primitive>
+ </clone>
+ </resources>
+ <constraints>
+ <rsc_location id="myAddr-prefer" rsc="myAddr" node="c001n01"
+ score="INFINITY"/>
+ <rsc_colocation id="group-with-ip" rsc="myGroup" with-rsc="myAddr"
+ score="INFINITY"/>
+ </constraints>
+ <op_defaults>
+ <meta_attributes id="op_defaults-options">
+ <nvpair id="op-default-1" name="timeout" value="30s"/>
+ </meta_attributes>
+ </op_defaults>
+ <rsc_defaults>
+ <meta_attributes id="rsc_defaults-options">
+ <nvpair id="rsc-default-1" name="resource-stickiness" value="100"/>
+ <nvpair id="rsc-default-2" name="migration-threshold" value="10"/>
+ </meta_attributes>
+ </rsc_defaults>
+ </configuration>
+ <status/>
+ </cib>
diff --git a/doc/sphinx/Pacemaker_Explained/constraints.rst b/doc/sphinx/Pacemaker_Explained/constraints.rst
new file mode 100644
index 0000000..ab34c9f
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/constraints.rst
@@ -0,0 +1,1106 @@
+.. index::
+ single: constraint
+ single: resource; constraint
+
+.. _constraints:
+
+Resource Constraints
+--------------------
+
+.. index::
+ single: resource; score
+ single: node; score
+
+Scores
+######
+
+Scores of all kinds are integral to how the cluster works.
+Practically everything from moving a resource to deciding which
+resource to stop in a degraded cluster is achieved by manipulating
+scores in some way.
+
+Scores are calculated per resource and node. Any node with a
+negative score for a resource can't run that resource. The cluster
+places a resource on the node with the highest score for it.
+
+Infinity Math
+_____________
+
+Pacemaker implements **INFINITY** (or equivalently, **+INFINITY**) internally as a
+score of 1,000,000. Addition and subtraction with it follow these three basic
+rules:
+
+* Any value + **INFINITY** = **INFINITY**
+
+* Any value - **INFINITY** = -**INFINITY**
+
+* **INFINITY** - **INFINITY** = **-INFINITY**
+
+.. note::
+
+ What if you want to use a score higher than 1,000,000? Typically this possibility
+ arises when someone wants to base the score on some external metric that might
+ go above 1,000,000.
+
+ The short answer is you can't.
+
+ The long answer is that it is sometimes possible to work around this limitation
+ creatively. You may be able to set the score to some computed value based on
+ the external metric rather than use the metric directly. For nodes, you can
+ store the metric as a node attribute, and query the attribute when computing
+ the score (possibly as part of a custom resource agent).
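+
+As a sketch of that workaround, suppose an external process stores a suitably
+scaled value in a node attribute named ``my-metric-score`` on each node (the
+names here are hypothetical). A rule-based location constraint can then use
+the attribute's value directly as the score:
+
+.. topic:: Using a node attribute value as a score (sketch)
+
+ .. code-block:: xml
+
+ <rsc_location id="loc-by-metric" rsc="Webserver">
+ <rule id="loc-by-metric-rule" score-attribute="my-metric-score">
+ <expression id="loc-by-metric-expr" attribute="my-metric-score" operation="defined"/>
+ </rule>
+ </rsc_location>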
+
+.. _location-constraint:
+
+.. index::
+ single: location constraint
+ single: constraint; location
+
+Deciding Which Nodes a Resource Can Run On
+##########################################
+
+*Location constraints* tell the cluster which nodes a resource can run on.
+
+There are two alternative strategies. One way is to say that, by default,
+resources can run anywhere, and then the location constraints specify nodes
+that are not allowed (an *opt-out* cluster). The other way is to start with
+nothing able to run anywhere, and use location constraints to selectively
+enable allowed nodes (an *opt-in* cluster).
+
+Whether you should choose opt-in or opt-out depends on your
+personal preference and the make-up of your cluster. If most of your
+resources can run on most of the nodes, then an opt-out arrangement is
+likely to result in a simpler configuration. On the other hand, if
+most resources can only run on a small subset of nodes, an opt-in
+configuration might be simpler.
+
+.. index::
+ pair: XML element; rsc_location
+ single: constraint; rsc_location
+
+Location Properties
+___________________
+
+.. table:: **Attributes of a rsc_location Element**
+ :class: longtable
+ :widths: 1 1 4
+
+ +--------------------+---------+----------------------------------------------------------------------------------------------+
+ | Attribute | Default | Description |
+ +====================+=========+==============================================================================================+
+ | id | | .. index:: |
+ | | | single: rsc_location; attribute, id |
+ | | | single: attribute; id (rsc_location) |
+ | | | single: id; rsc_location attribute |
+ | | | |
+ | | | A unique name for the constraint (required) |
+ +--------------------+---------+----------------------------------------------------------------------------------------------+
+ | rsc | | .. index:: |
+ | | | single: rsc_location; attribute, rsc |
+ | | | single: attribute; rsc (rsc_location) |
+ | | | single: rsc; rsc_location attribute |
+ | | | |
+ | | | The name of the resource to which this constraint |
+ | | | applies. A location constraint must either have a |
+ | | | ``rsc``, have a ``rsc-pattern``, or contain at |
+ | | | least one resource set. |
+ +--------------------+---------+----------------------------------------------------------------------------------------------+
+ | rsc-pattern | | .. index:: |
+ | | | single: rsc_location; attribute, rsc-pattern |
+ | | | single: attribute; rsc-pattern (rsc_location) |
+ | | | single: rsc-pattern; rsc_location attribute |
+ | | | |
+ | | | A pattern matching the names of resources to which |
+ | | | this constraint applies. The syntax is the same as |
+ | | | `POSIX <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04>`_ |
+ | | | extended regular expressions, with the addition of an |
+ | | | initial ``!`` indicating that resources *not* matching |
+ | | | the pattern are selected. If the regular expression |
+ | | | contains submatches, and the constraint is governed by |
+ | | | a :ref:`rule <rules>`, the submatches can be |
+ | | | referenced as ``%1`` through ``%9`` in the rule's |
+ | | | ``score-attribute`` or a rule expression's ``attribute`` |
+ | | | (see :ref:`s-rsc-pattern-rules`). A location constraint |
+ | | | must either have a ``rsc``, have a ``rsc-pattern``, or |
+ | | | contain at least one resource set. |
+ +--------------------+---------+----------------------------------------------------------------------------------------------+
+ | node | | .. index:: |
+ | | | single: rsc_location; attribute, node |
+ | | | single: attribute; node (rsc_location) |
+ | | | single: node; rsc_location attribute |
+ | | | |
+ | | | The name of the node to which this constraint applies. |
+ | | | A location constraint must either have a ``node`` and |
+ | | | ``score``, or contain at least one rule. |
+ +--------------------+---------+----------------------------------------------------------------------------------------------+
+ | score | | .. index:: |
+ | | | single: rsc_location; attribute, score |
+ | | | single: attribute; score (rsc_location) |
+ | | | single: score; rsc_location attribute |
+ | | | |
+ | | | Positive values indicate a preference for running the |
+ | | | affected resource(s) on ``node`` -- the higher the value, |
+ | | | the stronger the preference. Negative values indicate |
+ | | | the resource(s) should avoid this node (a value of |
+ | | | **-INFINITY** changes "should" to "must"). A location |
+ | | | constraint must either have a ``node`` and ``score``, |
+ | | | or contain at least one rule. |
+ +--------------------+---------+----------------------------------------------------------------------------------------------+
+ | resource-discovery | always | .. index:: |
+ | | | single: rsc_location; attribute, resource-discovery |
+ | | | single: attribute; resource-discovery (rsc_location) |
+ | | | single: resource-discovery; rsc_location attribute |
+ | | | |
+ | | | Whether Pacemaker should perform resource discovery |
+ | | | (that is, check whether the resource is already running) |
+ | | | for this resource on this node. This should normally be |
+ | | | left as the default, so that rogue instances of a |
+ | | | service can be stopped when they are running where they |
+ | | | are not supposed to be. However, there are two |
+ | | | situations where disabling resource discovery is a good |
+ | | | idea: when a service is not installed on a node, |
+ | | | discovery might return an error (properly written OCF |
+ | | | agents will not, so this is usually only seen with other |
+ | | | agent types); and when Pacemaker Remote is used to scale |
+ | | | a cluster to hundreds of nodes, limiting resource |
+ | | | discovery to allowed nodes can significantly boost |
+ | | | performance. |
+ | | | |
+ | | | * ``always:`` Always perform resource discovery for |
+ | | | the specified resource on this node. |
+ | | | |
+ | | | * ``never:`` Never perform resource discovery for the |
+ | | | specified resource on this node. This option should |
+ | | | generally be used with a -INFINITY score, although |
+ | | | that is not strictly required. |
+ | | | |
+ | | | * ``exclusive:`` Perform resource discovery for the |
+ | | | specified resource only on this node (and other nodes |
+ | | | similarly marked as ``exclusive``). Multiple location |
+ | | | constraints using ``exclusive`` discovery for the |
+ | | | same resource across different nodes create a subset |
+ | | | of nodes to which resource discovery is limited. If a |
+ | | | resource is marked for ``exclusive`` discovery on one |
+ | | | or more nodes, that resource is only allowed to be |
+ | | | placed within that subset of nodes. |
+ +--------------------+---------+----------------------------------------------------------------------------------------------+
+
+.. warning::
+
+ Setting ``resource-discovery`` to ``never`` or ``exclusive`` removes Pacemaker's
+ ability to detect and stop unwanted instances of a service running
+ where it's not supposed to be. It is up to the system administrator (you!)
+ to make sure that the service can *never* be active on nodes without
+ ``resource-discovery`` (such as by leaving the relevant software uninstalled).
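+
+As a sketch, a single constraint that both bans a resource from a node and
+disables discovery there (the resource and node names are hypothetical) could
+look like this:
+
+.. topic:: Location constraint that disables resource discovery on a node
+
+ .. code-block:: xml
+
+ <rsc_location id="ban-db-from-node3" rsc="Database" node="node3"
+ score="-INFINITY" resource-discovery="never"/>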
+
+.. index::
+ single: Asymmetrical Clusters
+ single: Opt-In Clusters
+
+Asymmetrical "Opt-In" Clusters
+______________________________
+
+To create an opt-in cluster, start by preventing resources from running anywhere
+by default:
+
+.. code-block:: none
+
+ # crm_attribute --name symmetric-cluster --update false
+
+Then start enabling nodes. The following fragment says that the web
+server prefers **sles-1**, the database prefers **sles-2** and both can
+fail over to **sles-3** if their most preferred node fails.
+
+.. topic:: Opt-in location constraints for two resources
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/>
+ <rsc_location id="loc-2" rsc="Webserver" node="sles-3" score="0"/>
+ <rsc_location id="loc-3" rsc="Database" node="sles-2" score="200"/>
+ <rsc_location id="loc-4" rsc="Database" node="sles-3" score="0"/>
+ </constraints>
+
+.. index::
+ single: Symmetrical Clusters
+ single: Opt-Out Clusters
+
+Symmetrical "Opt-Out" Clusters
+______________________________
+
+To create an opt-out cluster, start by allowing resources to run
+anywhere by default:
+
+.. code-block:: none
+
+ # crm_attribute --name symmetric-cluster --update true
+
+Then start disabling nodes. The following fragment is the equivalent
+of the above opt-in configuration.
+
+.. topic:: Opt-out location constraints for two resources
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/>
+ <rsc_location id="loc-2-do-not-run" rsc="Webserver" node="sles-2" score="-INFINITY"/>
+ <rsc_location id="loc-3-do-not-run" rsc="Database" node="sles-1" score="-INFINITY"/>
+ <rsc_location id="loc-4" rsc="Database" node="sles-2" score="200"/>
+ </constraints>
+
+.. _node-score-equal:
+
+What if Two Nodes Have the Same Score
+_____________________________________
+
+If two nodes have the same score, then the cluster will choose one.
+This choice may seem random and may not be what was intended; however,
+the cluster was not given enough information to know any better.
+
+.. topic:: Constraints where a resource prefers two nodes equally
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="INFINITY"/>
+ <rsc_location id="loc-2" rsc="Webserver" node="sles-2" score="INFINITY"/>
+ <rsc_location id="loc-3" rsc="Database" node="sles-1" score="500"/>
+ <rsc_location id="loc-4" rsc="Database" node="sles-2" score="300"/>
+ <rsc_location id="loc-5" rsc="Database" node="sles-2" score="200"/>
+ </constraints>
+
+In the example above, assuming no other constraints and an inactive
+cluster, **Webserver** would probably be placed on **sles-1** and **Database** on
+**sles-2**. The cluster would likely place **Webserver** based on the node's
+uname and **Database** based on the desire to spread the resource load
+evenly across the cluster. However, other factors can also be involved
+in more complex configurations.
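+
+If a placement decision like this surprises you, one way to inspect the scores
+the cluster actually computed is to run ``crm_simulate`` against the live
+cluster:
+
+.. code-block:: none
+
+ # crm_simulate --live-check --show-scores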
+
+.. _s-rsc-pattern:
+
+Specifying locations using pattern matching
+___________________________________________
+
+A location constraint can affect all resources whose IDs match a given pattern.
+The following example bans resources named **ip-httpd**, **ip-asterisk**,
+**ip-gateway**, etc., from **node1**.
+
+.. topic:: Location constraint banning all resources matching a pattern from one node
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_location id="ban-ips-from-node1" rsc-pattern="ip-.*" node="node1" score="-INFINITY"/>
+ </constraints>
+
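+As noted in the table earlier, an initial ``!`` selects resources that do
+*not* match the pattern. For example, the following (hypothetical) constraint
+keeps every resource whose ID does not start with **ip-** off of **node1**:
+
+.. topic:: Location constraint banning all resources not matching a pattern from one node
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_location id="ban-non-ips-from-node1" rsc-pattern="!ip-.*" node="node1" score="-INFINITY"/>
+ </constraints>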
+
+.. index::
+ single: constraint; ordering
+ single: resource; start order
+
+
+.. _s-resource-ordering:
+
+Specifying the Order in which Resources Should Start/Stop
+#########################################################
+
+*Ordering constraints* tell the cluster the order in which certain
+resource actions should occur.
+
+.. important::
+
+ Ordering constraints affect *only* the ordering of resource actions;
+ they do *not* require that the resources be placed on the
+ same node. If you want resources to be started on the same node
+ *and* in a specific order, you need both an ordering constraint *and*
+ a colocation constraint (see :ref:`s-resource-colocation`), or
+ alternatively, a group (see :ref:`group-resources`).
+
+.. index::
+ pair: XML element; rsc_order
+ pair: constraint; rsc_order
+
+Ordering Properties
+___________________
+
+.. table:: **Attributes of a rsc_order Element**
+ :class: longtable
+ :widths: 1 2 4
+
+ +--------------+----------------------------+-------------------------------------------------------------------+
+ | Field | Default | Description |
+ +==============+============================+===================================================================+
+ | id | | .. index:: |
+ | | | single: rsc_order; attribute, id |
+ | | | single: attribute; id (rsc_order) |
+ | | | single: id; rsc_order attribute |
+ | | | |
+ | | | A unique name for the constraint |
+ +--------------+----------------------------+-------------------------------------------------------------------+
+ | first | | .. index:: |
+ | | | single: rsc_order; attribute, first |
+ | | | single: attribute; first (rsc_order) |
+ | | | single: first; rsc_order attribute |
+ | | | |
+ | | | Name of the resource that the ``then`` resource |
+ | | | depends on |
+ +--------------+----------------------------+-------------------------------------------------------------------+
+ | then | | .. index:: |
+ | | | single: rsc_order; attribute, then |
+ | | | single: attribute; then (rsc_order) |
+ | | | single: then; rsc_order attribute |
+ | | | |
+ | | | Name of the dependent resource |
+ +--------------+----------------------------+-------------------------------------------------------------------+
+ | first-action | start | .. index:: |
+ | | | single: rsc_order; attribute, first-action |
+ | | | single: attribute; first-action (rsc_order) |
+ | | | single: first-action; rsc_order attribute |
+ | | | |
+ | | | The action that the ``first`` resource must complete |
+ | | | before ``then-action`` can be initiated for the ``then`` |
+ | | | resource. Allowed values: ``start``, ``stop``, |
+ | | | ``promote``, ``demote``. |
+ +--------------+----------------------------+-------------------------------------------------------------------+
+ | then-action | value of ``first-action`` | .. index:: |
+ | | | single: rsc_order; attribute, then-action |
+ | | | single: attribute; then-action (rsc_order) |
+ | | | single: then-action; rsc_order attribute |
+ | | | |
+ | | | The action that the ``then`` resource can execute only |
+ | | | after the ``first-action`` on the ``first`` resource has |
+ | | | completed. Allowed values: ``start``, ``stop``, |
+ | | | ``promote``, ``demote``. |
+ +--------------+----------------------------+-------------------------------------------------------------------+
+ | kind | Mandatory | .. index:: |
+ | | | single: rsc_order; attribute, kind |
+ | | | single: attribute; kind (rsc_order) |
+ | | | single: kind; rsc_order attribute |
+ | | | |
+ | | | How to enforce the constraint. Allowed values: |
+ | | | |
+ | | | * ``Mandatory:`` ``then-action`` will never be initiated |
+ | | | for the ``then`` resource unless and until ``first-action`` |
+ | | | successfully completes for the ``first`` resource. |
+ | | | |
+ | | | * ``Optional:`` The constraint applies only if both specified |
+ | | | resource actions are scheduled in the same transition |
+ | | | (that is, in response to the same cluster state). This |
+ | | | means that ``then-action`` is allowed on the ``then`` |
+ | | | resource regardless of the state of the ``first`` resource, |
+ | | | but if both actions happen to be scheduled at the same time, |
+ | | | they will be ordered. |
+ | | | |
+ | | | * ``Serialize:`` Ensure that the specified actions are never |
+ | | | performed concurrently for the specified resources. |
+ | | | Either ``first-action`` or ``then-action`` can be executed |
+ | | | first, but one must complete before the other can be initiated. |
+ | | | An example use case is when resource start-up puts a high load |
+ | | | on the host. |
+ +--------------+----------------------------+-------------------------------------------------------------------+
+ | symmetrical | TRUE for ``Mandatory`` and | .. index:: |
+ | | ``Optional`` kinds. FALSE | single: rsc_order; attribute, symmetrical |
+ | | for ``Serialize`` kind. | single: attribute; symmetrical (rsc)order) |
+ | | | single: symmetrical; rsc_order attribute |
+ | | | |
+ | | | If true, the reverse of the constraint applies for the |
+ | | | opposite action (for example, if B starts after A starts, |
+ | | | then B stops before A stops). ``Serialize`` orders cannot |
+ | | | be symmetrical. |
+ +--------------+----------------------------+-------------------------------------------------------------------+
+
+``Promote`` and ``demote`` apply to :ref:`promotable <s-resource-promotable>`
+clone resources.
+
+Optional and mandatory ordering
+_______________________________
+
+Here is an example of ordering constraints where **Database** *must* start before
+**Webserver**, and **IP** *should* start before **Webserver** if they both need to be
+started:
+
+.. topic:: Optional and mandatory ordering constraints
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_order id="order-1" first="IP" then="Webserver" kind="Optional"/>
+ <rsc_order id="order-2" first="Database" then="Webserver" kind="Mandatory" />
+ </constraints>
+
+Because the above example lets ``symmetrical`` default to TRUE, **Webserver**
+must be stopped before **Database** can be stopped, and **Webserver** should be
+stopped before **IP** if they both need to be stopped.
+
+.. index::
+ single: colocation
+ single: constraint; colocation
+ single: resource; location relative to other resources
+
+.. _s-resource-colocation:
+
+Placing Resources Relative to other Resources
+#############################################
+
+*Colocation constraints* tell the cluster that the location of one resource
+depends on the location of another one.
+
+Colocation has an important side-effect: it affects the order in which
+resources are assigned to a node. Think about it: You can't place A relative to
+B unless you know where B is [#]_.
+
+So when you are creating colocation constraints, it is important to
+consider whether you should colocate A with B, or B with A.
+
+.. important::
+
+ Colocation constraints affect *only* the placement of resources; they do *not*
+ require that the resources be started in a particular order. If you want
+ resources to be started on the same node *and* in a specific order, you need
+ both an ordering constraint (see :ref:`s-resource-ordering`) *and* a colocation
+ constraint, or alternatively, a group (see :ref:`group-resources`).
+
+.. index::
+ pair: XML element; rsc_colocation
+ single: constraint; rsc_colocation
+
+Colocation Properties
+_____________________
+
+.. table:: **Attributes of a rsc_colocation Constraint**
+ :class: longtable
+ :widths: 2 2 5
+
+ +----------------+----------------+--------------------------------------------------------+
+ | Field | Default | Description |
+ +================+================+========================================================+
+ | id | | .. index:: |
+ | | | single: rsc_colocation; attribute, id |
+ | | | single: attribute; id (rsc_colocation) |
+ | | | single: id; rsc_colocation attribute |
+ | | | |
+ | | | A unique name for the constraint (required). |
+ +----------------+----------------+--------------------------------------------------------+
+ | rsc | | .. index:: |
+ | | | single: rsc_colocation; attribute, rsc |
+ | | | single: attribute; rsc (rsc_colocation) |
+ | | | single: rsc; rsc_colocation attribute |
+ | | | |
+ | | | The name of a resource that should be located |
+ | | | relative to ``with-rsc``. A colocation constraint must |
+ | | | either contain at least one |
+ | | | :ref:`resource set <s-resource-sets>`, or specify both |
+ | | | ``rsc`` and ``with-rsc``. |
+ +----------------+----------------+--------------------------------------------------------+
+ | with-rsc | | .. index:: |
+ | | | single: rsc_colocation; attribute, with-rsc |
+ | | | single: attribute; with-rsc (rsc_colocation) |
+ | | | single: with-rsc; rsc_colocation attribute |
+ | | | |
+ | | | The name of the resource used as the colocation |
+ | | | target. The cluster will decide where to put this |
+ | | | resource first and then decide where to put ``rsc``. |
+ | | | A colocation constraint must either contain at least |
+ | | | one :ref:`resource set <s-resource-sets>`, or specify |
+ | | | both ``rsc`` and ``with-rsc``. |
+ +----------------+----------------+--------------------------------------------------------+
+ | node-attribute | #uname | .. index:: |
+ | | | single: rsc_colocation; attribute, node-attribute |
+ | | | single: attribute; node-attribute (rsc_colocation) |
+ | | | single: node-attribute; rsc_colocation attribute |
+ | | | |
+ | | | If ``rsc`` and ``with-rsc`` are specified, this node |
+ | | | attribute must be the same on the node running ``rsc`` |
+ | | | and the node running ``with-rsc`` for the constraint |
+ | | | to be satisfied. (For details, see |
+ | | | :ref:`s-coloc-attribute`.) |
+ +----------------+----------------+--------------------------------------------------------+
+ | score | 0 | .. index:: |
+ | | | single: rsc_colocation; attribute, score |
+ | | | single: attribute; score (rsc_colocation) |
+ | | | single: score; rsc_colocation attribute |
+ | | | |
+ | | | Positive values indicate the resources should run on |
+ | | | the same node. Negative values indicate the resources |
+ | | | should run on different nodes. Values of |
+ | | | +/- ``INFINITY`` change "should" to "must". |
+ +----------------+----------------+--------------------------------------------------------+
+ | rsc-role | Started | .. index:: |
+ | | | single: clone; colocation constraint, rsc-role |
+ | | | single: colocation constraint; rsc-role (clone) |
+ | | | single: rsc-role; clone colocation constraint |
+ | | | |
+ | | | If ``rsc`` and ``with-rsc`` are specified, and ``rsc`` |
+ | | | is a :ref:`promotable clone <s-resource-promotable>`, |
+ | | | the constraint applies only to ``rsc`` instances in |
+ | | | this role. Allowed values: ``Started``, ``Promoted``, |
+ | | | ``Unpromoted``. For details, see |
+ | | | :ref:`promotable-clone-constraints`. |
+ +----------------+----------------+--------------------------------------------------------+
+ | with-rsc-role | Started | .. index:: |
+ | | | single: clone; colocation constraint, with-rsc-role |
+ | | | single: colocation constraint; with-rsc-role (clone) |
+ | | | single: with-rsc-role; clone colocation constraint |
+ | | | |
+ | | | If ``rsc`` and ``with-rsc`` are specified, and |
+ | | | ``with-rsc`` is a |
+ | | | :ref:`promotable clone <s-resource-promotable>`, the |
+ | | | constraint applies only to ``with-rsc`` instances in |
+ | | | this role. Allowed values: ``Started``, ``Promoted``, |
+ | | | ``Unpromoted``. For details, see |
+ | | | :ref:`promotable-clone-constraints`. |
+ +----------------+----------------+--------------------------------------------------------+
+ | influence | value of | .. index:: |
+ | | ``critical`` | single: rsc_colocation; attribute, influence |
+ | | meta-attribute | single: attribute; influence (rsc_colocation) |
+ | | for ``rsc`` | single: influence; rsc_colocation attribute |
+ | | | |
+ | | | Whether to consider the location preferences of |
+ | | | ``rsc`` when ``with-rsc`` is already active. Allowed |
+ | | | values: ``true``, ``false``. For details, see |
+ | | | :ref:`s-coloc-influence`. *(since 2.1.0)* |
+ +----------------+----------------+--------------------------------------------------------+
+
+Mandatory Placement
+___________________
+
+Mandatory placement occurs when the constraint's score is
+**+INFINITY** or **-INFINITY**. In such cases, if the constraint can't be
+satisfied, then the **rsc** resource is not permitted to run. For
+``score=INFINITY``, this includes cases where the ``with-rsc`` resource is
+not active.
+
+If you need resource **A** to always run on the same machine as
+resource **B**, you would add the following constraint:
+
+.. topic:: Mandatory colocation constraint for two resources
+
+ .. code-block:: xml
+
+ <rsc_colocation id="colocate" rsc="A" with-rsc="B" score="INFINITY"/>
+
+Remember, because **INFINITY** was used, if **B** can't run on any
+of the cluster nodes (for whatever reason) then **A** will not
+be allowed to run. Whether **A** is running or not has no effect on **B**.
+
+Alternatively, you may want the opposite -- that **A** *cannot*
+run on the same machine as **B**. In this case, use ``score="-INFINITY"``.
+
+.. topic:: Mandatory anti-colocation constraint for two resources
+
+ .. code-block:: xml
+
+ <rsc_colocation id="anti-colocate" rsc="A" with-rsc="B" score="-INFINITY"/>
+
+Again, by specifying **-INFINITY**, the constraint is binding. So if the
+only place left to run is where **B** already is, then **A** may not run anywhere.
+
+As with **INFINITY**, **B** can run even if **A** is stopped. However, in this
+case **A** also can run if **B** is stopped, because it still meets the
+constraint of **A** and **B** not running on the same node.
+
+Advisory Placement
+__________________
+
+If mandatory placement is about "must" and "must not", then advisory
+placement is the "I'd prefer if" alternative.
+
+For colocation constraints with scores greater than **-INFINITY** and less than
+**INFINITY**, the cluster will try to accommodate your wishes, but may ignore
+them if other factors outweigh the colocation score. Those factors might
+include other constraints, resource stickiness, failure thresholds, whether
+other resources would be prevented from being active, etc.
+
+.. topic:: Advisory colocation constraint for two resources
+
+ .. code-block:: xml
+
+ <rsc_colocation id="colocate-maybe" rsc="A" with-rsc="B" score="500"/>
+
+.. _s-coloc-attribute:
+
+Colocation by Node Attribute
+____________________________
+
+The ``node-attribute`` property of a colocation constraint allows you to express
+the requirement, "these resources must be on similar nodes".
+
+As an example, imagine that you have two Storage Area Networks (SANs) that are
+not controlled by the cluster, and each node is connected to one or the other.
+You may have two resources **r1** and **r2** such that **r2** needs to use the same
+SAN as **r1**, but doesn't necessarily have to be on the same exact node.
+In such a case, you could define a :ref:`node attribute <node_attributes>` named
+**san**, with the value **san1** or **san2** on each node as appropriate. Then, you
+could colocate **r2** with **r1** using ``node-attribute`` set to **san**.
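+
+A sketch of that configuration, using the names from this example:
+
+.. topic:: Colocation by node attribute
+
+ .. code-block:: xml
+
+ <rsc_colocation id="r2-with-r1-san" rsc="r2" with-rsc="r1"
+ score="INFINITY" node-attribute="san"/>
+
+The **san** attribute itself could be set permanently on each node with a
+command such as (the node name is hypothetical):
+
+.. code-block:: none
+
+ # crm_attribute --node node1 --name san --update san1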
+
+.. _s-coloc-influence:
+
+Colocation Influence
+____________________
+
+By default, if A is colocated with B, the cluster will take into account A's
+preferences when deciding where to place B, to maximize the chance that both
+resources can run.
+
+For a detailed look at exactly how this occurs, see
+`Colocation Explained <http://clusterlabs.org/doc/Colocation_Explained.pdf>`_.
+
+However, if ``influence`` is set to ``false`` in the colocation constraint,
+this will happen only if B is inactive and needing to be started. If B is
+already active, A's preferences will have no effect on placing B.
+
+An example of what effect this would have and when it would be desirable would
+be a nonessential reporting tool colocated with a resource-intensive service
+that takes a long time to start. If the reporting tool fails enough times to
+reach its migration threshold, by default the cluster will want to move both
+resources to another node if possible. Setting ``influence`` to ``false`` on
+the colocation constraint would mean that the reporting tool would be stopped
+in this situation instead, to avoid forcing the service to move.
+
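+A sketch of such a constraint (the resource names are hypothetical):
+
+.. topic:: Colocation constraint that does not influence the target's placement
+
+ .. code-block:: xml
+
+ <rsc_colocation id="reporting-with-service" rsc="reporting-tool"
+ with-rsc="big-service" score="INFINITY" influence="false"/>
+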
+The ``critical`` resource meta-attribute is a convenient way to specify the
+default for all colocation constraints and groups involving a particular
+resource.
+
+.. note::
+
+ If a noncritical resource is a member of a group, all later members of the
+ group will be treated as noncritical, even if they are marked as (or left to
+ default to) critical.
+
+
+.. _s-resource-sets:
+
+Resource Sets
+#############
+
+.. index::
+ single: constraint; resource set
+ single: resource; resource set
+
+*Resource sets* allow multiple resources to be affected by a single constraint.
+
+.. topic:: A set of 3 resources
+
+ .. code-block:: xml
+
+ <resource_set id="resource-set-example">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ <resource_ref id="C"/>
+ </resource_set>
+
+Resource sets are valid inside ``rsc_location``, ``rsc_order``
+(see :ref:`s-resource-sets-ordering`), ``rsc_colocation``
+(see :ref:`s-resource-sets-colocation`), and ``rsc_ticket``
+(see :ref:`ticket-constraints`) constraints.
+
+A resource set has a number of properties that can be set, though not all
+have an effect in all contexts.
+
+.. index::
+ pair: XML element; resource_set
+
+.. table:: **Attributes of a resource_set Element**
+ :class: longtable
+ :widths: 2 2 5
+
+ +-------------+------------------+--------------------------------------------------------+
+ | Field | Default | Description |
+ +=============+==================+========================================================+
+ | id | | .. index:: |
+ | | | single: resource_set; attribute, id |
+ | | | single: attribute; id (resource_set) |
+ | | | single: id; resource_set attribute |
+ | | | |
+ | | | A unique name for the set (required) |
+ +-------------+------------------+--------------------------------------------------------+
+ | sequential | true | .. index:: |
+ | | | single: resource_set; attribute, sequential |
+ | | | single: attribute; sequential (resource_set) |
+ | | | single: sequential; resource_set attribute |
+ | | | |
+ | | | Whether the members of the set must be acted on in |
+ | | | order. Meaningful within ``rsc_order`` and |
+ | | | ``rsc_colocation``. |
+ +-------------+------------------+--------------------------------------------------------+
+ | require-all | true | .. index:: |
+ | | | single: resource_set; attribute, require-all |
+ | | | single: attribute; require-all (resource_set) |
+ | | | single: require-all; resource_set attribute |
+ | | | |
+ | | | Whether all members of the set must be active before |
+ | | | continuing. With the current implementation, the |
+ | | | cluster may continue even if only one member of the |
+ | | | set is started, but if more than one member of the set |
+ | | | is starting at the same time, the cluster will still |
+ | | | wait until all of those have started before continuing |
+ | | | (this may change in future versions). Meaningful |
+ | | | within ``rsc_order``. |
+ +-------------+------------------+--------------------------------------------------------+
+ | role | | .. index:: |
+ | | | single: resource_set; attribute, role |
+ | | | single: attribute; role (resource_set) |
+ | | | single: role; resource_set attribute |
+ | | | |
+ | | | The constraint applies only to resource set members |
+ | | | that are :ref:`s-resource-promotable` in this |
+ | | | role. Meaningful within ``rsc_location``, |
+ | | | ``rsc_colocation`` and ``rsc_ticket``. |
+ | | | Allowed values: ``Started``, ``Promoted``, |
+ | | | ``Unpromoted``. For details, see |
+ | | | :ref:`promotable-clone-constraints`. |
+ +-------------+------------------+--------------------------------------------------------+
+ | action | value of | .. index:: |
+ | | ``first-action`` | single: resource_set; attribute, action |
+ | | in the enclosing | single: attribute; action (resource_set) |
+ | | ordering | single: action; resource_set attribute |
+ | | constraint | |
+ | | | The action that applies to *all members* of the set. |
+ | | | Meaningful within ``rsc_order``. Allowed values: |
+ | | | ``start``, ``stop``, ``promote``, ``demote``. |
+ +-------------+------------------+--------------------------------------------------------+
+ | score | | .. index:: |
+ | | | single: resource_set; attribute, score |
+ | | | single: attribute; score (resource_set) |
+ | | | single: score; resource_set attribute |
+ | | | |
+ | | | *Advanced use only.* Use a specific score for this |
+ | | | set within the constraint. |
+ +-------------+------------------+--------------------------------------------------------+
+
+.. _s-resource-sets-ordering:
+
+Ordering Sets of Resources
+##########################
+
+A common situation is for an administrator to create a chain of ordered
+resources, such as:
+
+.. topic:: A chain of ordered resources
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_order id="order-1" first="A" then="B" />
+ <rsc_order id="order-2" first="B" then="C" />
+ <rsc_order id="order-3" first="C" then="D" />
+ </constraints>
+
+.. topic:: Visual representation of the four resources' start order for the above constraints
+
+ .. image:: images/resource-set.png
+ :alt: Ordered set
+
+Ordered Set
+___________
+
+To simplify this situation, :ref:`s-resource-sets` can be used within ordering
+constraints:
+
+.. topic:: A chain of ordered resources expressed as a set
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_order id="order-1">
+ <resource_set id="ordered-set-example" sequential="true">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ </resource_set>
+ </rsc_order>
+ </constraints>
+
+While the set-based format is not less verbose, it is significantly easier to
+get right and maintain.
+
+.. important::
+
+ If you use a higher-level tool, pay attention to how it exposes this
+ functionality. Depending on the tool, creating a set **A B** may be equivalent to
+ **A then B**, or **B then A**.
+
+Ordering Multiple Sets
+______________________
+
+The syntax can be expanded to allow sets of resources to be ordered relative to
+each other, where the members of each individual set may be ordered or
+unordered (controlled by the ``sequential`` property). In the example below, **A**
+and **B** can both start in parallel, as can **C** and **D**, however **C** and
+**D** can only start once *both* **A** *and* **B** are active.
+
+.. topic:: Ordered sets of unordered resources
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_order id="order-1">
+ <resource_set id="ordered-set-1" sequential="false">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ </resource_set>
+ <resource_set id="ordered-set-2" sequential="false">
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ </resource_set>
+ </rsc_order>
+ </constraints>
+
+.. topic:: Visual representation of the start order for two ordered sets of
+ unordered resources
+
+ .. image:: images/two-sets.png
+ :alt: Two ordered sets
+
+Of course either set -- or both sets -- of resources can also be internally
+ordered (by setting ``sequential="true"``) and there is no limit to the number
+of sets that can be specified.
+
+.. topic:: Advanced use of set ordering - Three ordered sets, two of which are
+ internally unordered
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_order id="order-1">
+ <resource_set id="ordered-set-1" sequential="false">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ </resource_set>
+ <resource_set id="ordered-set-2" sequential="true">
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ </resource_set>
+ <resource_set id="ordered-set-3" sequential="false">
+ <resource_ref id="E"/>
+ <resource_ref id="F"/>
+ </resource_set>
+ </rsc_order>
+ </constraints>
+
+.. topic:: Visual representation of the start order for the three sets defined above
+
+ .. image:: images/three-sets.png
+ :alt: Three ordered sets
+
+.. important::
+
+ An ordered set with ``sequential=false`` makes sense only if there is another
+ set in the constraint. Otherwise, the constraint has no effect.
+
+Resource Set OR Logic
+_____________________
+
+The unordered set logic discussed so far has all been "AND" logic. To
+illustrate this, take the three-set figure in the previous section. Those sets
+can be expressed as **(A and B) then (C) then (D) then (E and F)**.
+
+Say, for example, we want to change the first set, **(A and B)**, to use "OR"
+logic, so the sets look like this: **(A or B) then (C) then (D) then (E and F)**.
+This functionality can be achieved through the use of the ``require-all``
+option. This option defaults to TRUE, which is why "AND" logic is used by
+default. Setting ``require-all=false`` means only one resource in the set
+needs to be started before continuing on to the next set.
+
+.. topic:: Resource Set "OR" logic: Three ordered sets, where the first set is
+ internally unordered with "OR" logic
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_order id="order-1">
+ <resource_set id="ordered-set-1" sequential="false" require-all="false">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ </resource_set>
+ <resource_set id="ordered-set-2" sequential="true">
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ </resource_set>
+ <resource_set id="ordered-set-3" sequential="false">
+ <resource_ref id="E"/>
+ <resource_ref id="F"/>
+ </resource_set>
+ </rsc_order>
+ </constraints>
+
+.. important::
+
+ An ordered set with ``require-all=false`` makes sense only in conjunction with
+ ``sequential=false``. Think of it like this: ``sequential=false`` modifies the set
+ to be an unordered set using "AND" logic by default, and adding
+ ``require-all=false`` flips the unordered set's "AND" logic to "OR" logic.
+
+.. _s-resource-sets-colocation:
+
+Colocating Sets of Resources
+############################
+
+Another common situation is for an administrator to create a set of
+colocated resources.
+
+The simplest way to do this is to define a resource group (see
+:ref:`group-resources`), but that cannot always accurately express the desired
+relationships. For example, maybe the resources do not need to be ordered.
+
+Another way would be to define each relationship as an individual constraint,
+but that causes a difficult-to-follow constraint explosion as the number of
+resources and combinations grow.
+
+.. topic:: Colocation chain as individual constraints, where A is placed first,
+ then B, then C, then D
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_colocation id="coloc-1" rsc="D" with-rsc="C" score="INFINITY"/>
+ <rsc_colocation id="coloc-2" rsc="C" with-rsc="B" score="INFINITY"/>
+ <rsc_colocation id="coloc-3" rsc="B" with-rsc="A" score="INFINITY"/>
+ </constraints>
+
+To express complicated relationships with a simplified syntax [#]_,
+:ref:`resource sets <s-resource-sets>` can be used within colocation constraints.
+
+.. topic:: Equivalent colocation chain expressed using **resource_set**
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_colocation id="coloc-1" score="INFINITY" >
+ <resource_set id="colocated-set-example" sequential="true">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ </resource_set>
+ </rsc_colocation>
+ </constraints>
+
+.. note::
+
+ Within a ``resource_set``, the resources are listed in the order they are
+ *placed*, which is the reverse of the order in which they are *colocated*.
+ In the above example, resource **A** is placed before resource **B**, which is
+ the same as saying resource **B** is colocated with resource **A**.
+
+As with individual constraints, a resource that can't be active prevents any
+resource that must be colocated with it from being active. In both of the
+previous examples, if **B** is unable to run, then both **C** and by inference **D**
+must remain stopped.
+
+.. important::
+
+ If you use a higher-level tool, pay attention to how it exposes this
+ functionality. Depending on the tool, creating a set **A B** may be equivalent to
+ **A with B**, or **B with A**.
+
+Resource sets can also be used to tell the cluster that entire *sets* of
+resources must be colocated relative to each other, while the individual
+members within any one set may or may not be colocated relative to each other
+(determined by the set's ``sequential`` property).
+
+In the following example, resources **B**, **C**, and **D** will each be colocated
+with **A** (which will be placed first). **A** must be able to run in order for any
+of the resources to run, but any of **B**, **C**, or **D** may be stopped without
+affecting any of the others.
+
+.. topic:: Using colocated sets to specify a shared dependency
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_colocation id="coloc-1" score="INFINITY" >
+ <resource_set id="colocated-set-2" sequential="false">
+ <resource_ref id="B"/>
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ </resource_set>
+ <resource_set id="colocated-set-1" sequential="true">
+ <resource_ref id="A"/>
+ </resource_set>
+ </rsc_colocation>
+ </constraints>
+
+.. note::
+
+ Pay close attention to the order in which resources and sets are listed.
+ While the members of any one sequential set are placed first to last (i.e., the
+ colocation dependency is last with first), multiple sets are placed last to
+ first (i.e., the colocation dependency is first with last).
+
+.. important::
+
+ A colocated set with ``sequential="false"`` makes sense only if there is
+ another set in the constraint. Otherwise, the constraint has no effect.
+
+There is no inherent limit to the number and size of the sets used.
+The only thing that matters is that in order for any member of one set
+in the constraint to be active, all members of sets listed after it must also
+be active (and naturally on the same node); and if a set has ``sequential="true"``,
+then in order for one member of that set to be active, all members listed
+before it must also be active.
+
+If desired, you can restrict the dependency to instances of promotable clone
+resources that are in a specific role, using the set's ``role`` property.
+
+.. topic:: Colocation in which the members of the middle set have no
+ interdependencies, and the last set listed applies only to promoted
+ instances
+
+ .. code-block:: xml
+
+ <constraints>
+ <rsc_colocation id="coloc-1" score="INFINITY" >
+ <resource_set id="colocated-set-1" sequential="true">
+ <resource_ref id="F"/>
+ <resource_ref id="G"/>
+ </resource_set>
+ <resource_set id="colocated-set-2" sequential="false">
+ <resource_ref id="C"/>
+ <resource_ref id="D"/>
+ <resource_ref id="E"/>
+ </resource_set>
+ <resource_set id="colocated-set-3" sequential="true" role="Promoted">
+ <resource_ref id="A"/>
+ <resource_ref id="B"/>
+ </resource_set>
+ </rsc_colocation>
+ </constraints>
+
+.. topic:: Visual representation of the above example (resources are placed from
+ left to right)
+
+ .. image:: ../shared/images/pcmk-colocated-sets.png
+ :alt: Colocation chain
+
+.. note::
+
+ Unlike ordered sets, colocated sets do not use the ``require-all`` option.
+
+
+External Resource Dependencies
+##############################
+
+Sometimes, a resource will depend on services that are not managed by the
+cluster. An example might be a resource that requires a file system that is
+not managed by the cluster but mounted by systemd at boot time.
+
+To accommodate this, the ``pacemaker`` systemd service depends on a normally empty
+target called ``resource-agents-deps.target``. The system administrator may
+create a unit drop-in for that target specifying the dependencies, to ensure
+that the services are started before Pacemaker starts and stopped after
+Pacemaker stops.
+
+Typically, this is accomplished by placing a unit file in the
+``/etc/systemd/system/resource-agents-deps.target.d`` directory, with directives
+such as ``Requires`` and ``After`` specifying the dependencies as needed.
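+
+For example, a drop-in along these lines (the mount unit name is hypothetical)
+would ensure that a systemd-managed file system is mounted before Pacemaker
+starts and unmounted only after it stops:
+
+.. code-block:: none
+
+ # /etc/systemd/system/resource-agents-deps.target.d/srv-mount.conf
+ [Unit]
+ Requires=srv.mount
+ After=srv.mount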
+
+
+.. [#] While the human brain is sophisticated enough to read the constraint
+ in any order and choose the correct one depending on the situation,
+ the cluster is not quite so smart. Yet.
+
+.. [#] which is not the same as saying easy to follow
diff --git a/doc/sphinx/Pacemaker_Explained/fencing.rst b/doc/sphinx/Pacemaker_Explained/fencing.rst
new file mode 100644
index 0000000..109b4da
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/fencing.rst
@@ -0,0 +1,1298 @@
+.. index::
+ single: fencing
+ single: STONITH
+
+.. _fencing:
+
+Fencing
+-------
+
+What Is Fencing?
+################
+
+*Fencing* is the ability to make a node unable to run resources, even when that
+node is unresponsive to cluster commands.
+
+Fencing is also known as *STONITH*, an acronym for "Shoot The Other Node In The
+Head", since the most common fencing method is cutting power to the node.
+Another method is "fabric fencing", cutting the node's access to some
+capability required to run resources (such as network access or a shared disk).
+
+.. index::
+ single: fencing; why necessary
+
+Why Is Fencing Necessary?
+#########################
+
+Fencing protects your data from being corrupted by malfunctioning nodes or
+unintentional concurrent access to shared resources.
+
+Fencing protects against the "split brain" failure scenario, where cluster
+nodes have lost the ability to reliably communicate with each other but are
+still able to run resources. If the cluster just assumed that uncommunicative
+nodes were down, then multiple instances of a resource could be started on
+different nodes.
+
+The effect of split brain depends on the resource type. For example, an IP
+address brought up on two hosts on a network will cause packets to randomly be
+sent to one or the other host, rendering the IP useless. For a database or
+clustered file system, the effect could be much more severe, causing data
+corruption or divergence.
+
+Fencing is also used when a resource cannot otherwise be stopped. If a
+resource fails to stop on a node, it cannot be started on a different node
+without risking the same type of conflict as split-brain. Fencing the
+original node ensures the resource can be safely started elsewhere.
+
+Users may also configure the ``on-fail`` property of an :ref:`operation` or the
+``loss-policy`` property of
+:ref:`ticket constraints <ticket-constraints>` to ``fence``, in which
+case the cluster will fence the resource's node if the operation fails or the
+ticket is lost.
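+
+For example, a monitor operation that triggers fencing of the node on failure
+might be declared like this sketch (``my-rsc`` is a hypothetical resource):
+
+.. code-block:: xml
+
+   <operations>
+     <op id="my-rsc-monitor" name="monitor" interval="10s" on-fail="fence"/>
+   </operations>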
+
+.. index::
+ single: fencing; device
+
+Fence Devices
+#############
+
+A *fence device* or *fencing device* is a special type of resource that
+provides the means to fence a node.
+
+Examples of fencing devices include intelligent power switches and IPMI devices
+that accept SNMP commands to cut power to a node, and iSCSI controllers that
+allow SCSI reservations to be used to cut a node's access to a shared disk.
+
+Since fencing devices will be used to recover from loss of networking
+connectivity to other nodes, it is essential that they do not rely on the same
+network as the cluster itself; otherwise, that network becomes a single point
+of failure.
+
+Since loss of a node due to power outage is indistinguishable from loss of
+network connectivity to that node, it is also essential that at least one fence
+device for a node does not share power with that node. For example, an on-board
+IPMI controller that shares power with its host should not be used as the sole
+fencing device for that host.
+
+Since fencing is used to isolate malfunctioning nodes, no fence device should
+rely on its target functioning properly. This includes, for example, devices
+that ssh into a node and issue a shutdown command (such devices might be
+suitable for testing, but never for production).
+
+.. index::
+ single: fencing; agent
+
+Fence Agents
+############
+
+A *fence agent* or *fencing agent* is a ``stonith``-class resource agent.
+
+The fence agent standard provides commands (such as ``off`` and ``reboot``)
+that the cluster can use to fence nodes. As with other resource agent classes,
+this allows a layer of abstraction so that Pacemaker doesn't need any knowledge
+about specific fencing technologies -- that knowledge is isolated in the agent.
+
+Pacemaker supports two fence agent standards, both inherited from
+no-longer-active projects:
+
+* Red Hat Cluster Suite (RHCS) style: These are typically installed in
+ ``/usr/sbin`` with names starting with ``fence_``.
+
+* Linux-HA style: These typically have names starting with ``external/``.
+  Pacemaker can support these agents using the ``fence_legacy`` RHCS-style
+ agent as a wrapper, *if* support was enabled when Pacemaker was built, which
+ requires the ``cluster-glue`` library.
+
+When a Fence Device Can Be Used
+###############################
+
+Fencing devices do not actually "run" like most services. Typically, they just
+provide an interface for sending commands to an external device.
+
+Additionally, fencing may be initiated by Pacemaker, by other cluster-aware
+software such as DRBD or DLM, or manually by an administrator, at any point in
+the cluster life cycle, including before any resources have been started.
+
+To accommodate this, Pacemaker does not require the fence device resource to be
+"started" in order to be used. Whether a fence device is started or not
+determines whether a node runs any recurring monitor for the device, and gives
+the node a slight preference for being chosen to execute fencing using that
+device.
+
+By default, any node can execute any fencing device. If a fence device is
+disabled by setting its ``target-role`` to ``Stopped``, then no node can use
+that device. If a location constraint with a negative score prevents a specific
+node from "running" a fence device, then that node will never be chosen to
+execute fencing using the device. A node may fence itself, but the cluster will
+choose that only if no other nodes can do the fencing.
+
+A common configuration scenario is to have one fence device per target node.
+In such a case, users often configure anti-location constraints so that
+the target node does not monitor its own device.
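+
+For example, the following sketch (assuming a fence device resource named
+``fence-pcmk-1`` whose only target is node ``pcmk-1``) prevents the device
+from running on, and thus being monitored from, its own target:
+
+.. code-block:: xml
+
+   <rsc_location id="loc-fence-pcmk-1" rsc="fence-pcmk-1"
+                 node="pcmk-1" score="-INFINITY"/>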
+
+Limitations of Fencing Resources
+################################
+
+Fencing resources have certain limitations that other resource classes don't:
+
+* They may have only one set of meta-attributes and one set of instance
+ attributes.
+* If :ref:`rules` are used to determine fencing resource options, these
+ might be evaluated only when first read, meaning that later changes to the
+ rules will have no effect. Therefore, it is better to avoid confusion and not
+ use rules at all with fencing resources.
+
+These limitations could be revisited if there is sufficient user demand.
+
+.. index::
+ single: fencing; special instance attributes
+
+.. _fencing-attributes:
+
+Special Meta-Attributes for Fencing Resources
+#############################################
+
+The table below lists special resource meta-attributes that may be set for any
+fencing resource.
+
+.. table:: **Additional Properties of Fencing Resources**
+ :widths: 2 1 2 4
+
+
+ +----------------------+---------+--------------------+----------------------------------------+
+ | Field | Type | Default | Description |
+ +======================+=========+====================+========================================+
+ | provides | string | | .. index:: |
+ | | | | single: provides |
+ | | | | |
+ | | | | Any special capability provided by the |
+ | | | | fence device. Currently, only one such |
+ | | | | capability is meaningful: |
+ | | | | :ref:`unfencing <unfencing>`. |
+ +----------------------+---------+--------------------+----------------------------------------+
+
+Special Instance Attributes for Fencing Resources
+#################################################
+
+The table below lists special instance attributes that may be set for any
+fencing resource (*not* meta-attributes, even though they are interpreted by
+Pacemaker rather than the fence agent). These are also listed in the man page
+for ``pacemaker-fenced``.
+
+.. Not_Yet_Implemented:
+
+ +----------------------+---------+--------------------+----------------------------------------+
+ | priority | integer | 0 | .. index:: |
+ | | | | single: priority |
+ | | | | |
+ | | | | The priority of the fence device. |
+ | | | | Devices are tried in order of highest |
+ | | | | priority to lowest. |
+ +----------------------+---------+--------------------+----------------------------------------+
+
+.. table:: **Additional Properties of Fencing Resources**
+ :class: longtable
+ :widths: 2 1 2 4
+
+ +----------------------+---------+--------------------+----------------------------------------+
+ | Field | Type | Default | Description |
+ +======================+=========+====================+========================================+
+ | stonith-timeout | time | | .. index:: |
+ | | | | single: stonith-timeout |
+ | | | | |
+ | | | | This is not used by Pacemaker (see the |
+ | | | | ``pcmk_reboot_timeout``, |
+ | | | | ``pcmk_off_timeout``, etc. properties |
+ | | | | instead), but it may be used by |
+ | | | | Linux-HA fence agents. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_host_map | string | | .. index:: |
+ | | | | single: pcmk_host_map |
+ | | | | |
+ | | | | A mapping of node names to ports |
+ | | | | for devices that do not understand |
+ | | | | the node names. |
+ | | | | |
+ | | | | Example: ``node1:1;node2:2,3`` tells |
+ | | | | the cluster to use port 1 for |
+ | | | | ``node1`` and ports 2 and 3 for |
+ | | | | ``node2``. If ``pcmk_host_check`` is |
+ | | | | explicitly set to ``static-list``, |
+ | | | | either this or ``pcmk_host_list`` must |
+ | | | | be set. The port portion of the map |
+ | | | | may contain special characters such as |
+ | | | | spaces if preceded by a backslash |
+ | | | | *(since 2.1.2)*. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_host_list | string | | .. index:: |
+ | | | | single: pcmk_host_list |
+ | | | | |
+ | | | | A list of machines controlled by this |
+ | | | | device. If ``pcmk_host_check`` is |
+ | | | | explicitly set to ``static-list``, |
+ | | | | either this or ``pcmk_host_map`` must |
+ | | | | be set. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_host_check | string | Value appropriate | .. index:: |
+ | | | to other | single: pcmk_host_check |
+ | | | parameters (see | |
+ | | | "Default Check | The method Pacemaker should use to |
+ | | | Type" below) | determine which nodes can be targeted |
+ | | | | by this device. Allowed values: |
+ | | | | |
+ | | | | * ``static-list:`` targets are listed |
+ | | | | in the ``pcmk_host_list`` or |
+ | | | | ``pcmk_host_map`` attribute |
+ | | | | * ``dynamic-list:`` query the device |
+ | | | | via the agent's ``list`` action |
+ | | | | * ``status:`` query the device via the |
+ | | | | agent's ``status`` action |
+ | | | | * ``none:`` assume the device can |
+ | | | | fence any node |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_delay_max | time | 0s | .. index:: |
+ | | | | single: pcmk_delay_max |
+ | | | | |
+ | | | | Enable a delay of no more than the |
+ | | | | time specified before executing |
+ | | | | fencing actions. Pacemaker derives the |
+ | | | | overall delay by taking the value of |
+ | | | | pcmk_delay_base and adding a random |
+ | | | | delay value such that the sum is kept |
+ | | | | below this maximum. This is sometimes |
+ | | | | used in two-node clusters to ensure |
+ | | | | that the nodes don't fence each other |
+ | | | | at the same time. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_delay_base | time | 0s | .. index:: |
+ | | | | single: pcmk_delay_base |
+ | | | | |
+ | | | | Enable a static delay before executing |
+ | | | | fencing actions. This can be used, for |
+ | | | | example, in two-node clusters to |
+ | | | | ensure that the nodes don't fence each |
+ | | | | other, by having separate fencing |
+ | | | | resources with different values. The |
+ | | | | node that is fenced with the shorter |
+ | | | | delay will lose a fencing race. The |
+ | | | | overall delay introduced by pacemaker |
+ | | | | is derived from this value plus a |
+ | | | | random delay such that the sum is kept |
+ | | | | below the maximum delay. A single |
+ | | | | device can have different delays per |
+ | | | | node using a host map *(since 2.1.2)*, |
+ | | | | for example ``node1:0s;node2:5s.`` |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_action_limit | integer | 1 | .. index:: |
+ | | | | single: pcmk_action_limit |
+ | | | | |
+ | | | | The maximum number of actions that can |
+ | | | | be performed in parallel on this |
+ | | | | device. A value of -1 means unlimited. |
+ | | | | Node fencing actions initiated by the |
+ | | | | cluster (as opposed to an administrator|
+ | | | | running the ``stonith_admin`` tool or |
+ | | | | the fencer running recurring device |
+ | | | | monitors and ``status`` and ``list`` |
+ | | | | commands) are additionally subject to |
+ | | | | the ``concurrent-fencing`` cluster |
+ | | | | property. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_host_argument | string | ``port`` otherwise | .. index:: |
+ | | | ``plug`` if | single: pcmk_host_argument |
+ | | | supported | |
+ | | | according to the | *Advanced use only.* Which parameter |
+ | | | metadata of the | should be supplied to the fence agent |
+ | | | fence agent | to identify the node to be fenced. |
+ | | | | Some devices support neither the |
+ | | | | standard ``plug`` nor the deprecated |
+ | | | | ``port`` parameter, or may provide |
+ | | | | additional ones. Use this to specify |
+ | | | | an alternate, device-specific |
+ | | | | parameter. A value of ``none`` tells |
+ | | | | the cluster not to supply any |
+ | | | | additional parameters. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_reboot_action | string | reboot | .. index:: |
+ | | | | single: pcmk_reboot_action |
+ | | | | |
+ | | | | *Advanced use only.* The command to |
+ | | | | send to the resource agent in order to |
+ | | | | reboot a node. Some devices do not |
+ | | | | support the standard commands or may |
+ | | | | provide additional ones. Use this to |
+ | | | | specify an alternate, device-specific |
+ | | | | command. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_reboot_timeout | time | 60s | .. index:: |
+ | | | | single: pcmk_reboot_timeout |
+ | | | | |
+ | | | | *Advanced use only.* Specify an |
+ | | | | alternate timeout to use for |
+ | | | | ``reboot`` actions instead of the |
+ | | | | value of ``stonith-timeout``. Some |
+ | | | | devices need much more or less time to |
+ | | | | complete than normal. Use this to |
+ | | | | specify an alternate, device-specific |
+ | | | | timeout. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_reboot_retries | integer | 2 | .. index:: |
+ | | | | single: pcmk_reboot_retries |
+ | | | | |
+ | | | | *Advanced use only.* The maximum |
+ | | | | number of times to retry the |
+ | | | | ``reboot`` command within the timeout |
+ | | | | period. Some devices do not support |
+ | | | | multiple connections, and operations |
+ | | | | may fail if the device is busy with |
+ | | | | another task, so Pacemaker will |
+ | | | | automatically retry the operation, if |
+ | | | | there is time remaining. Use this |
+ | | | | option to alter the number of times |
+ | | | | Pacemaker retries before giving up. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_off_action | string | off | .. index:: |
+ | | | | single: pcmk_off_action |
+ | | | | |
+ | | | | *Advanced use only.* The command to |
+ | | | | send to the resource agent in order to |
+ | | | | shut down a node. Some devices do not |
+ | | | | support the standard commands or may |
+ | | | | provide additional ones. Use this to |
+ | | | | specify an alternate, device-specific |
+ | | | | command. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_off_timeout | time | 60s | .. index:: |
+ | | | | single: pcmk_off_timeout |
+ | | | | |
+ | | | | *Advanced use only.* Specify an |
+ | | | | alternate timeout to use for |
+ | | | | ``off`` actions instead of the |
+ | | | | value of ``stonith-timeout``. Some |
+ | | | | devices need much more or less time to |
+ | | | | complete than normal. Use this to |
+ | | | | specify an alternate, device-specific |
+ | | | | timeout. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_off_retries | integer | 2 | .. index:: |
+ | | | | single: pcmk_off_retries |
+ | | | | |
+ | | | | *Advanced use only.* The maximum |
+ | | | | number of times to retry the |
+ | | | | ``off`` command within the timeout |
+ | | | | period. Some devices do not support |
+ | | | | multiple connections, and operations |
+ | | | | may fail if the device is busy with |
+ | | | | another task, so Pacemaker will |
+ | | | | automatically retry the operation, if |
+ | | | | there is time remaining. Use this |
+ | | | | option to alter the number of times |
+ | | | | Pacemaker retries before giving up. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_list_action | string | list | .. index:: |
+ | | | | single: pcmk_list_action |
+ | | | | |
+ | | | | *Advanced use only.* The command to |
+ | | | | send to the resource agent in order to |
+ | | | | list nodes. Some devices do not |
+ | | | | support the standard commands or may |
+ | | | | provide additional ones. Use this to |
+ | | | | specify an alternate, device-specific |
+ | | | | command. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_list_timeout | time | 60s | .. index:: |
+ | | | | single: pcmk_list_timeout |
+ | | | | |
+ | | | | *Advanced use only.* Specify an |
+ | | | | alternate timeout to use for |
+ | | | | ``list`` actions instead of the |
+ | | | | value of ``stonith-timeout``. Some |
+ | | | | devices need much more or less time to |
+ | | | | complete than normal. Use this to |
+ | | | | specify an alternate, device-specific |
+ | | | | timeout. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_list_retries | integer | 2 | .. index:: |
+ | | | | single: pcmk_list_retries |
+ | | | | |
+ | | | | *Advanced use only.* The maximum |
+ | | | | number of times to retry the |
+ | | | | ``list`` command within the timeout |
+ | | | | period. Some devices do not support |
+ | | | | multiple connections, and operations |
+ | | | | may fail if the device is busy with |
+ | | | | another task, so Pacemaker will |
+ | | | | automatically retry the operation, if |
+ | | | | there is time remaining. Use this |
+ | | | | option to alter the number of times |
+ | | | | Pacemaker retries before giving up. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_monitor_action | string | monitor | .. index:: |
+ | | | | single: pcmk_monitor_action |
+ | | | | |
+ | | | | *Advanced use only.* The command to |
+ | | | | send to the resource agent in order to |
+ | | | | report extended status. Some devices do|
+ | | | | not support the standard commands or |
+ | | | | may provide additional ones. Use this |
+ | | | | to specify an alternate, |
+ | | | | device-specific command. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_monitor_timeout | time | 60s | .. index:: |
+ | | | | single: pcmk_monitor_timeout |
+ | | | | |
+ | | | | *Advanced use only.* Specify an |
+ | | | | alternate timeout to use for |
+ | | | | ``monitor`` actions instead of the |
+ | | | | value of ``stonith-timeout``. Some |
+ | | | | devices need much more or less time to |
+ | | | | complete than normal. Use this to |
+ | | | | specify an alternate, device-specific |
+ | | | | timeout. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_monitor_retries | integer | 2 | .. index:: |
+ | | | | single: pcmk_monitor_retries |
+ | | | | |
+ | | | | *Advanced use only.* The maximum |
+ | | | | number of times to retry the |
+ | | | | ``monitor`` command within the timeout |
+ | | | | period. Some devices do not support |
+ | | | | multiple connections, and operations |
+ | | | | may fail if the device is busy with |
+ | | | | another task, so Pacemaker will |
+ | | | | automatically retry the operation, if |
+ | | | | there is time remaining. Use this |
+ | | | | option to alter the number of times |
+ | | | | Pacemaker retries before giving up. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_status_action | string | status | .. index:: |
+ | | | | single: pcmk_status_action |
+ | | | | |
+ | | | | *Advanced use only.* The command to |
+ | | | | send to the resource agent in order to |
+ | | | | report status. Some devices do |
+ | | | | not support the standard commands or |
+ | | | | may provide additional ones. Use this |
+ | | | | to specify an alternate, |
+ | | | | device-specific command. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_status_timeout | time | 60s | .. index:: |
+ | | | | single: pcmk_status_timeout |
+ | | | | |
+ | | | | *Advanced use only.* Specify an |
+ | | | | alternate timeout to use for |
+ | | | | ``status`` actions instead of the |
+ | | | | value of ``stonith-timeout``. Some |
+ | | | | devices need much more or less time to |
+ | | | | complete than normal. Use this to |
+ | | | | specify an alternate, device-specific |
+ | | | | timeout. |
+ +----------------------+---------+--------------------+----------------------------------------+
+ | pcmk_status_retries | integer | 2 | .. index:: |
+ | | | | single: pcmk_status_retries |
+ | | | | |
+ | | | | *Advanced use only.* The maximum |
+ | | | | number of times to retry the |
+ | | | | ``status`` command within the timeout |
+ | | | | period. Some devices do not support |
+ | | | | multiple connections, and operations |
+ | | | | may fail if the device is busy with |
+ | | | | another task, so Pacemaker will |
+ | | | | automatically retry the operation, if |
+ | | | | there is time remaining. Use this |
+ | | | | option to alter the number of times |
+ | | | | Pacemaker retries before giving up. |
+ +----------------------+---------+--------------------+----------------------------------------+
+
+Default Check Type
+##################
+
+If the user does not explicitly configure ``pcmk_host_check`` for a fence
+device, a default value appropriate to other configured parameters will be
+used:
+
+* If either ``pcmk_host_list`` or ``pcmk_host_map`` is configured,
+ ``static-list`` will be used;
+* otherwise, if the fence device supports the ``list`` action, and the first
+ attempt at using ``list`` succeeds, ``dynamic-list`` will be used;
+* otherwise, if the fence device supports the ``status`` action, ``status``
+ will be used;
+* otherwise, ``none`` will be used.
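+
+To override the default, set ``pcmk_host_check`` explicitly as an instance
+attribute of the fence device, for example:
+
+.. code-block:: xml
+
+   <nvpair id="Fencing-host-check" name="pcmk_host_check" value="static-list"/>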
+
+.. index::
+ single: unfencing
+ single: fencing; unfencing
+
+.. _unfencing:
+
+Unfencing
+#########
+
+With fabric fencing (such as cutting network or shared disk access rather than
+power), it is expected that the cluster will fence the node, and then a system
+administrator must manually investigate what went wrong, correct any issues
+found, then reboot (or restart the cluster services on) the node.
+
+Once the node reboots and rejoins the cluster, some fabric fencing devices
+require an explicit command to restore the node's access. This capability is
+called *unfencing* and is typically implemented as the fence agent's ``on``
+command.
+
+If any cluster resource has ``requires`` set to ``unfencing``, then that
+resource will not be probed or started on a node until that node has been
+unfenced.
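+
+As a sketch (the agent choice and resource names here are illustrative, and
+the agents' own required parameters are omitted), a fabric fence device that
+provides unfencing, and a resource that requires it, might look like:
+
+.. code-block:: xml
+
+   <primitive id="fabric-fencing" class="stonith" type="fence_scsi">
+     <meta_attributes id="fabric-fencing-meta">
+       <nvpair id="fabric-fencing-provides" name="provides" value="unfencing"/>
+     </meta_attributes>
+   </primitive>
+   <primitive id="shared-fs" class="ocf" provider="heartbeat" type="Filesystem">
+     <meta_attributes id="shared-fs-meta">
+       <nvpair id="shared-fs-requires" name="requires" value="unfencing"/>
+     </meta_attributes>
+   </primitive>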
+
+Fencing and Quorum
+##################
+
+In general, a cluster partition may execute fencing only if the partition has
+quorum, and the ``stonith-enabled`` cluster property is set to true. However,
+there are exceptions:
+
+* The requirements apply only to fencing initiated by Pacemaker. If an
+ administrator initiates fencing using the ``stonith_admin`` command, or an
+ external application such as DLM initiates fencing using Pacemaker's C API,
+ the requirements do not apply.
+
+* A cluster partition without quorum is allowed to fence any active member of
+ that partition. As a corollary, this allows a ``no-quorum-policy`` of
+ ``suicide`` to work.
+
+* If the ``no-quorum-policy`` cluster property is set to ``ignore``, then
+ quorum is not required to execute fencing of any node.
+
+Fencing Timeouts
+################
+
+Fencing timeouts are complicated, since a single fencing operation can involve
+many steps, each of which may have a separate timeout.
+
+Fencing may be initiated in one of several ways:
+
+* An administrator may initiate fencing using the ``stonith_admin`` tool,
+ which has a ``--timeout`` option (defaulting to 2 minutes) that will be used
+ as the fence operation timeout.
+
+* An external application such as DLM may initiate fencing using the Pacemaker
+ C API. The application will specify the fence operation timeout in this case,
+ which might or might not be configurable by the user.
+
+* The cluster may initiate fencing itself. In this case, the
+ ``stonith-timeout`` cluster property (defaulting to 1 minute) will be used as
+ the fence operation timeout.
+
+However fencing is initiated, the initiator contacts Pacemaker's fencer
+(``pacemaker-fenced``) to request fencing. This connection and request have
+their own timeout, separate from the fencing operation timeout, but usually
+complete very quickly.
+
+The fencer will contact all fencers in the cluster to ask what devices they
+have available to fence the target node. The fence operation timeout will be
+used as the timeout for each of these queries.
+
+Once a fencing device has been selected, the fencer will check whether any
+action-specific timeout has been configured for the device, to use instead of
+the fence operation timeout. For example, if ``stonith-timeout`` is 60 seconds,
+but the fencing device has ``pcmk_reboot_timeout`` configured as 90 seconds,
+then a timeout of 90 seconds will be used for reboot actions using that device.
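+
+Such an action-specific timeout is configured as an instance attribute of the
+fence device, for example (the id is illustrative):
+
+.. code-block:: xml
+
+   <nvpair id="Fencing-reboot-timeout" name="pcmk_reboot_timeout" value="90s"/>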
+
+A device may have retries configured, in which case the timeout applies across
+all attempts. For example, if a device has ``pcmk_reboot_retries`` configured
+as 2, and the first reboot attempt fails, the second attempt will only have
+whatever time is remaining in the action timeout after subtracting how much
+time the first attempt used. This means that if the first attempt fails due to
+using the entire timeout, no further attempts will be made. There is currently
+no way to configure a per-attempt timeout.
+
+If more than one device is required to fence a target, whether due to failure
+of the first device or a fencing topology with multiple devices configured for
+the target, each device will have its own separate action timeout.
+
+For all of the above timeouts, the fencer will generally multiply the
+configured value by 1.2 to get an actual value to use, to account for time
+needed by the fencer's own processing.
+
+Separate from the fencer's timeouts, some fence agents have internal timeouts
+for individual steps of their fencing process. These agents often have
+parameters to configure these timeouts, such as ``login-timeout``,
+``shell-timeout``, or ``power-timeout``. Many such agents also have a
+``disable-timeout`` parameter to ignore their internal timeouts and just let
+Pacemaker handle the timeout. This causes a difference in retry behavior.
+If ``disable-timeout`` is not set, and the agent hits one of its internal
+timeouts, it will report that as a failure to Pacemaker, which can then retry.
+If ``disable-timeout`` is set, and Pacemaker hits a timeout for the agent, then
+there will be no time remaining, and no retry will be done.
+
+Fence Devices Dependent on Other Resources
+##########################################
+
+In some cases, a fence device may require some other cluster resource (such as
+an IP address) to be active in order to function properly.
+
+This is obviously undesirable in general: fencing may be required when the
+depended-on resource is not active, or fencing may be required because the node
+running the depended-on resource is no longer responding.
+
+However, this may be acceptable under certain conditions:
+
+* The dependent fence device should not be able to target any node that is
+ allowed to run the depended-on resource.
+
+* The depended-on resource should not be disabled during production operation.
+
+* The ``concurrent-fencing`` cluster property should be set to ``true``, as
+  shown in the example after this list.
+ Otherwise, if both the node running the depended-on resource and some node
+ targeted by the dependent fence device need to be fenced, the fencing of the
+ node running the depended-on resource might be ordered first, making the
+ second fencing impossible and blocking further recovery. With concurrent
+ fencing, the dependent fence device might fail at first due to the
+ depended-on resource being unavailable, but it will be retried and eventually
+ succeed once the resource is brought back up.
+
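+The ``concurrent-fencing`` property can be set like any other cluster
+property, for example:
+
+.. code-block:: none
+
+   # crm_attribute --type crm_config --name concurrent-fencing --update true
+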
+Even under those conditions, there is one unlikely problem scenario. The DC
+always schedules fencing of itself after any other fencing needed, to avoid
+unnecessary repeated DC elections. If the dependent fence device targets the
+DC, and both the DC and a different node running the depended-on resource need
+to be fenced, the DC fencing will always fail and block further recovery. Note,
+however, that losing a DC node entirely causes some other node to become DC and
+schedule the fencing, so this is only a risk when a stop or other operation
+with ``on-fail`` set to ``fence`` fails on the DC.
+
+.. index::
+ single: fencing; configuration
+
+Configuring Fencing
+###################
+
+Higher-level tools can provide simpler interfaces to this process, but this is
+how you could configure a fence device using Pacemaker's command-line tools.
+
+#. Find the correct driver:
+
+ .. code-block:: none
+
+ # stonith_admin --list-installed
+
+ .. note::
+
+ You may have to install packages to make fence agents available on your
+ host. Searching your available packages for ``fence-`` is usually
+ helpful. Ensure the packages providing the fence agents you require are
+ installed on every cluster node.
+
+#. Find the required parameters associated with the device
+ (replacing ``$AGENT_NAME`` with the name obtained from the previous step):
+
+ .. code-block:: none
+
+ # stonith_admin --metadata --agent $AGENT_NAME
+
+#. Create a file called ``stonith.xml`` containing a primitive resource
+ with a class of ``stonith``, a type equal to the agent name obtained earlier,
+ and a parameter for each of the values returned in the previous step.
+
+#. If the device does not know how to fence nodes based on their uname,
+ you may also need to set the special ``pcmk_host_map`` parameter. See
+ :ref:`fencing-attributes` for details.
+
+#. If the device does not support the ``list`` command, you may also need
+ to set the special ``pcmk_host_list`` and/or ``pcmk_host_check``
+ parameters. See :ref:`fencing-attributes` for details.
+
+#. If the device does not expect the target to be specified with the
+ ``port`` parameter, you may also need to set the special
+ ``pcmk_host_argument`` parameter. See :ref:`fencing-attributes` for details.
+
+#. Upload it into the CIB using cibadmin:
+
+ .. code-block:: none
+
+ # cibadmin --create --scope resources --xml-file stonith.xml
+
+#. Set ``stonith-enabled`` to true:
+
+ .. code-block:: none
+
+ # crm_attribute --type crm_config --name stonith-enabled --update true
+
+#. Once the stonith resource is running, you can test it by executing the
+ following, replacing ``$NODE_NAME`` with the name of the node to fence
+ (although you might want to stop the cluster on that machine first):
+
+ .. code-block:: none
+
+ # stonith_admin --reboot $NODE_NAME
+
+
+Example Fencing Configuration
+_____________________________
+
+For this example, we assume we have a cluster node, ``pcmk-1``, whose IPMI
+controller is reachable at the IP address 192.0.2.1. The IPMI controller uses
+the username ``testuser`` and the password ``abc123``.
+
+#. Looking at what's installed, we may see a variety of available agents:
+
+ .. code-block:: none
+
+ # stonith_admin --list-installed
+
+ .. code-block:: none
+
+ (... some output omitted ...)
+ fence_idrac
+ fence_ilo3
+ fence_ilo4
+ fence_ilo5
+ fence_imm
+ fence_ipmilan
+ (... some output omitted ...)
+
+   Perhaps after reading some man pages and doing some Internet searches, we
+   might decide ``fence_ipmilan`` is our best choice.
+
+#. Next, we would check what parameters ``fence_ipmilan`` provides:
+
+ .. code-block:: none
+
+ # stonith_admin --metadata -a fence_ipmilan
+
+ .. code-block:: xml
+
+ <resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI">
+ <symlink name="fence_ilo3" shortdesc="Fence agent for HP iLO3"/>
+ <symlink name="fence_ilo4" shortdesc="Fence agent for HP iLO4"/>
+ <symlink name="fence_ilo5" shortdesc="Fence agent for HP iLO5"/>
+ <symlink name="fence_imm" shortdesc="Fence agent for IBM Integrated Management Module"/>
+ <symlink name="fence_idrac" shortdesc="Fence agent for Dell iDRAC"/>
+ <longdesc>fence_ipmilan is an I/O Fencing agentwhich can be used with machines controlled by IPMI.This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option.</longdesc>
+ <vendor-url/>
+ <parameters>
+ <parameter name="action" unique="0" required="0">
+ <getopt mixed="-o, --action=[action]"/>
+ <content type="string" default="reboot"/>
+ <shortdesc lang="en">Fencing action</shortdesc>
+ </parameter>
+ <parameter name="auth" unique="0" required="0">
+ <getopt mixed="-A, --auth=[auth]"/>
+ <content type="select">
+ <option value="md5"/>
+ <option value="password"/>
+ <option value="none"/>
+ </content>
+ <shortdesc lang="en">IPMI Lan Auth type.</shortdesc>
+ </parameter>
+ <parameter name="cipher" unique="0" required="0">
+ <getopt mixed="-C, --cipher=[cipher]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Ciphersuite to use (same as ipmitool -C parameter)</shortdesc>
+ </parameter>
+ <parameter name="hexadecimal_kg" unique="0" required="0">
+ <getopt mixed="--hexadecimal-kg=[key]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Hexadecimal-encoded Kg key for IPMIv2 authentication</shortdesc>
+ </parameter>
+ <parameter name="ip" unique="0" required="0" obsoletes="ipaddr">
+ <getopt mixed="-a, --ip=[ip]"/>
+ <content type="string"/>
+ <shortdesc lang="en">IP address or hostname of fencing device</shortdesc>
+ </parameter>
+ <parameter name="ipaddr" unique="0" required="0" deprecated="1">
+ <getopt mixed="-a, --ip=[ip]"/>
+ <content type="string"/>
+ <shortdesc lang="en">IP address or hostname of fencing device</shortdesc>
+ </parameter>
+ <parameter name="ipport" unique="0" required="0">
+ <getopt mixed="-u, --ipport=[port]"/>
+ <content type="integer" default="623"/>
+ <shortdesc lang="en">TCP/UDP port to use for connection with device</shortdesc>
+ </parameter>
+ <parameter name="lanplus" unique="0" required="0">
+ <getopt mixed="-P, --lanplus"/>
+ <content type="boolean" default="0"/>
+ <shortdesc lang="en">Use Lanplus to improve security of connection</shortdesc>
+ </parameter>
+ <parameter name="login" unique="0" required="0" deprecated="1">
+ <getopt mixed="-l, --username=[name]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Login name</shortdesc>
+ </parameter>
+ <parameter name="method" unique="0" required="0">
+ <getopt mixed="-m, --method=[method]"/>
+ <content type="select" default="onoff">
+ <option value="onoff"/>
+ <option value="cycle"/>
+ </content>
+ <shortdesc lang="en">Method to fence</shortdesc>
+ </parameter>
+ <parameter name="passwd" unique="0" required="0" deprecated="1">
+ <getopt mixed="-p, --password=[password]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Login password or passphrase</shortdesc>
+ </parameter>
+ <parameter name="passwd_script" unique="0" required="0" deprecated="1">
+ <getopt mixed="-S, --password-script=[script]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Script to run to retrieve password</shortdesc>
+ </parameter>
+ <parameter name="password" unique="0" required="0" obsoletes="passwd">
+ <getopt mixed="-p, --password=[password]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Login password or passphrase</shortdesc>
+ </parameter>
+ <parameter name="password_script" unique="0" required="0" obsoletes="passwd_script">
+ <getopt mixed="-S, --password-script=[script]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Script to run to retrieve password</shortdesc>
+ </parameter>
+ <parameter name="plug" unique="0" required="0" obsoletes="port">
+ <getopt mixed="-n, --plug=[ip]"/>
+ <content type="string"/>
+ <shortdesc lang="en">IP address or hostname of fencing device (together with --port-as-ip)</shortdesc>
+ </parameter>
+ <parameter name="port" unique="0" required="0" deprecated="1">
+ <getopt mixed="-n, --plug=[ip]"/>
+ <content type="string"/>
+ <shortdesc lang="en">IP address or hostname of fencing device (together with --port-as-ip)</shortdesc>
+ </parameter>
+ <parameter name="privlvl" unique="0" required="0">
+ <getopt mixed="-L, --privlvl=[level]"/>
+ <content type="select" default="administrator">
+ <option value="callback"/>
+ <option value="user"/>
+ <option value="operator"/>
+ <option value="administrator"/>
+ </content>
+ <shortdesc lang="en">Privilege level on IPMI device</shortdesc>
+ </parameter>
+ <parameter name="target" unique="0" required="0">
+ <getopt mixed="--target=[targetaddress]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Bridge IPMI requests to the remote target address</shortdesc>
+ </parameter>
+ <parameter name="username" unique="0" required="0" obsoletes="login">
+ <getopt mixed="-l, --username=[name]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Login name</shortdesc>
+ </parameter>
+ <parameter name="quiet" unique="0" required="0">
+ <getopt mixed="-q, --quiet"/>
+ <content type="boolean"/>
+ <shortdesc lang="en">Disable logging to stderr. Does not affect --verbose or --debug-file or logging to syslog.</shortdesc>
+ </parameter>
+ <parameter name="verbose" unique="0" required="0">
+ <getopt mixed="-v, --verbose"/>
+ <content type="boolean"/>
+ <shortdesc lang="en">Verbose mode</shortdesc>
+ </parameter>
+ <parameter name="debug" unique="0" required="0" deprecated="1">
+ <getopt mixed="-D, --debug-file=[debugfile]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Write debug information to given file</shortdesc>
+ </parameter>
+ <parameter name="debug_file" unique="0" required="0" obsoletes="debug">
+ <getopt mixed="-D, --debug-file=[debugfile]"/>
+ <content type="string"/>
+ <shortdesc lang="en">Write debug information to given file</shortdesc>
+ </parameter>
+ <parameter name="version" unique="0" required="0">
+ <getopt mixed="-V, --version"/>
+ <content type="boolean"/>
+ <shortdesc lang="en">Display version information and exit</shortdesc>
+ </parameter>
+ <parameter name="help" unique="0" required="0">
+ <getopt mixed="-h, --help"/>
+ <content type="boolean"/>
+ <shortdesc lang="en">Display help and exit</shortdesc>
+ </parameter>
+ <parameter name="delay" unique="0" required="0">
+ <getopt mixed="--delay=[seconds]"/>
+ <content type="second" default="0"/>
+ <shortdesc lang="en">Wait X seconds before fencing is started</shortdesc>
+ </parameter>
+ <parameter name="ipmitool_path" unique="0" required="0">
+ <getopt mixed="--ipmitool-path=[path]"/>
+ <content type="string" default="/usr/bin/ipmitool"/>
+ <shortdesc lang="en">Path to ipmitool binary</shortdesc>
+ </parameter>
+ <parameter name="login_timeout" unique="0" required="0">
+ <getopt mixed="--login-timeout=[seconds]"/>
+ <content type="second" default="5"/>
+ <shortdesc lang="en">Wait X seconds for cmd prompt after login</shortdesc>
+ </parameter>
+ <parameter name="port_as_ip" unique="0" required="0">
+ <getopt mixed="--port-as-ip"/>
+ <content type="boolean"/>
+ <shortdesc lang="en">Make "port/plug" to be an alias to IP address</shortdesc>
+ </parameter>
+ <parameter name="power_timeout" unique="0" required="0">
+ <getopt mixed="--power-timeout=[seconds]"/>
+ <content type="second" default="20"/>
+ <shortdesc lang="en">Test X seconds for status change after ON/OFF</shortdesc>
+ </parameter>
+ <parameter name="power_wait" unique="0" required="0">
+ <getopt mixed="--power-wait=[seconds]"/>
+ <content type="second" default="2"/>
+ <shortdesc lang="en">Wait X seconds after issuing ON/OFF</shortdesc>
+ </parameter>
+ <parameter name="shell_timeout" unique="0" required="0">
+ <getopt mixed="--shell-timeout=[seconds]"/>
+ <content type="second" default="3"/>
+ <shortdesc lang="en">Wait X seconds for cmd prompt after issuing command</shortdesc>
+ </parameter>
+ <parameter name="retry_on" unique="0" required="0">
+ <getopt mixed="--retry-on=[attempts]"/>
+ <content type="integer" default="1"/>
+ <shortdesc lang="en">Count of attempts to retry power on</shortdesc>
+ </parameter>
+ <parameter name="sudo" unique="0" required="0" deprecated="1">
+ <getopt mixed="--use-sudo"/>
+ <content type="boolean"/>
+ <shortdesc lang="en">Use sudo (without password) when calling 3rd party software</shortdesc>
+ </parameter>
+ <parameter name="use_sudo" unique="0" required="0" obsoletes="sudo">
+ <getopt mixed="--use-sudo"/>
+ <content type="boolean"/>
+ <shortdesc lang="en">Use sudo (without password) when calling 3rd party software</shortdesc>
+ </parameter>
+ <parameter name="sudo_path" unique="0" required="0">
+ <getopt mixed="--sudo-path=[path]"/>
+ <content type="string" default="/usr/bin/sudo"/>
+ <shortdesc lang="en">Path to sudo binary</shortdesc>
+ </parameter>
+ </parameters>
+ <actions>
+ <action name="on" automatic="0"/>
+ <action name="off"/>
+ <action name="reboot"/>
+ <action name="status"/>
+ <action name="monitor"/>
+ <action name="metadata"/>
+ <action name="manpage"/>
+ <action name="validate-all"/>
+ <action name="diag"/>
+ <action name="stop" timeout="20s"/>
+ <action name="start" timeout="20s"/>
+ </actions>
+ </resource-agent>
+
+ Once we've decided what parameter values we think we need, it is a good idea
+ to run the fence agent's status action manually, to verify that our values
+ work correctly:
+
+ .. code-block:: none
+
+ # fence_ipmilan --lanplus -a 192.0.2.1 -l testuser -p abc123 -o status
+
+ Chassis Power is on
+
+#. Based on that, we might create a fencing resource configuration like this in
+ ``stonith.xml`` (or any file name, just use the same name with ``cibadmin``
+ later):
+
+ .. code-block:: xml
+
+ <primitive id="Fencing-pcmk-1" class="stonith" type="fence_ipmilan" >
+ <instance_attributes id="Fencing-params" >
+ <nvpair id="Fencing-lanplus" name="lanplus" value="1" />
+ <nvpair id="Fencing-ip" name="ip" value="192.0.2.1" />
+          <nvpair id="Fencing-password" name="password" value="abc123" />
+          <nvpair id="Fencing-username" name="username" value="testuser" />
+ </instance_attributes>
+ <operations >
+ <op id="Fencing-monitor-10m" interval="10m" name="monitor" timeout="300s" />
+ </operations>
+ </primitive>
+
+ .. note::
+
+ Even though the man page shows that the ``action`` parameter is
+ supported, we do not provide that in the resource configuration.
+ Pacemaker will supply an appropriate action whenever the fence device
+ must be used.
+
+#. In this case, we don't need to configure ``pcmk_host_map`` because
+ ``fence_ipmilan`` ignores the target node name and instead uses its
+ ``ip`` parameter to know how to contact the IPMI controller.
+
+#. We do need to let Pacemaker know which cluster node can be fenced by this
+ device, since ``fence_ipmilan`` doesn't support the ``list`` action. Add
+ a line like this to the agent's instance attributes:
+
+ .. code-block:: xml
+
+ <nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="pcmk-1" />
+
+#. We don't need to configure ``pcmk_host_argument`` since ``ip`` is all the
+ fence agent needs (it ignores the target name).
+
+#. Make the configuration active:
+
+ .. code-block:: none
+
+ # cibadmin --create --scope resources --xml-file stonith.xml
+
+#. Set ``stonith-enabled`` to true (this only has to be done once):
+
+ .. code-block:: none
+
+ # crm_attribute --type crm_config --name stonith-enabled --update true
+
+#. Since our cluster is still in testing, we can reboot ``pcmk-1`` without
+ bothering anyone, so we'll test our fencing configuration by running this
+ from one of the other cluster nodes:
+
+ .. code-block:: none
+
+ # stonith_admin --reboot pcmk-1
+
+ Then we will verify that the node did, in fact, reboot.
+
+We can repeat that process to create a separate fencing resource for each node.
+
+With some other fence device types, a single fencing resource can be used for
+all nodes. In fact, we could do that with ``fence_ipmilan``, using the
+``port_as_ip`` parameter along with ``pcmk_host_map``, as sketched below.
+Either approach is fine.
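+
+For illustration, a single-device variant might look like the following sketch
+(assuming a second node, ``pcmk-2``, whose IPMI controller is reachable at
+192.0.2.2 with the same credentials):
+
+.. code-block:: xml
+
+   <primitive id="Fencing" class="stonith" type="fence_ipmilan" >
+     <instance_attributes id="Fencing-params" >
+       <nvpair id="Fencing-lanplus" name="lanplus" value="1" />
+       <nvpair id="Fencing-port-as-ip" name="port_as_ip" value="1" />
+       <nvpair id="Fencing-host-map" name="pcmk_host_map"
+               value="pcmk-1:192.0.2.1;pcmk-2:192.0.2.2" />
+       <nvpair id="Fencing-username" name="username" value="testuser" />
+       <nvpair id="Fencing-password" name="password" value="abc123" />
+     </instance_attributes>
+   </primitive>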
+
+.. index::
+ single: fencing; topology
+ single: fencing-topology
+ single: fencing-level
+
+Fencing Topologies
+##################
+
+Pacemaker supports fencing nodes with multiple devices through a feature called
+*fencing topologies*. Fencing topologies may be used to provide alternative
+devices in case one fails, or to require multiple devices to all be executed
+successfully in order to consider the node successfully fenced, or even a
+combination of the two.
+
+Create the individual devices as you normally would, then define one or more
+``fencing-level`` entries in the ``fencing-topology`` section of the
+configuration.
+
+* Each fencing level is attempted in order of ascending ``index``. Allowed
+ values are 1 through 9.
+* If a device fails, processing terminates for the current level. No further
+ devices in that level are exercised, and the next level is attempted instead.
+* If the operation succeeds for all the listed devices in a level, the level is
+ deemed to have passed.
+* The operation is finished when a level has passed (success), or all levels
+ have been attempted (failed).
+* If the operation failed, the next step is determined by the scheduler and/or
+ the controller.
+
+Some possible uses of topologies include:
+
+* Try on-board IPMI, then an intelligent power switch if that fails
+* Try fabric fencing of both disk and network, then fall back to power fencing
+ if either fails
+* Wait up to a certain time for a kernel dump to complete, then cut power to
+ the node
+
+.. table:: **Attributes of a fencing-level Element**
+ :class: longtable
+ :widths: 1 4
+
+ +------------------+-----------------------------------------------------------------------------------------+
+ | Attribute | Description |
+ +==================+=========================================================================================+
+ | id | .. index:: |
+ | | pair: fencing-level; id |
+ | | |
+ | | A unique name for this element (required) |
+ +------------------+-----------------------------------------------------------------------------------------+
+ | target | .. index:: |
+ | | pair: fencing-level; target |
+ | | |
+ | | The name of a single node to which this level applies |
+ +------------------+-----------------------------------------------------------------------------------------+
+ | target-pattern | .. index:: |
+ | | pair: fencing-level; target-pattern |
+ | | |
+ | | An extended regular expression (as defined in `POSIX |
+ | | <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04>`_) |
+ | | matching the names of nodes to which this level applies |
+ +------------------+-----------------------------------------------------------------------------------------+
+ | target-attribute | .. index:: |
+ | | pair: fencing-level; target-attribute |
+ | | |
+ | | The name of a node attribute that is set (to ``target-value``) for nodes to which this |
+ | | level applies |
+ +------------------+-----------------------------------------------------------------------------------------+
+ | target-value | .. index:: |
+ | | pair: fencing-level; target-value |
+ | | |
+ | | The node attribute value (of ``target-attribute``) that is set for nodes to which this |
+ | | level applies |
+ +------------------+-----------------------------------------------------------------------------------------+
+ | index | .. index:: |
+ | | pair: fencing-level; index |
+ | | |
+ | | The order in which to attempt the levels. Levels are attempted in ascending order |
+ | | *until one succeeds*. Valid values are 1 through 9. |
+ +------------------+-----------------------------------------------------------------------------------------+
+ | devices | .. index:: |
+ | | pair: fencing-level; devices |
+ | | |
+ | | A comma-separated list of devices that must all be tried for this level |
+ +------------------+-----------------------------------------------------------------------------------------+
+
+.. note:: **Fencing topology with different devices for different nodes**
+
+ .. code-block:: xml
+
+ <cib crm_feature_set="3.6.0" validate-with="pacemaker-3.5" admin_epoch="1" epoch="0" num_updates="0">
+ <configuration>
+ ...
+ <fencing-topology>
+ <!-- For pcmk-1, try poison-pill and fail back to power -->
+ <fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
+ <fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>
+
+ <!-- For pcmk-2, try disk and network, and fail back to power -->
+ <fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
+ <fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
+ </fencing-topology>
+ ...
+        </configuration>
+ <status/>
+ </cib>
+
+Example Dual-Layer, Dual-Device Fencing Topologies
+__________________________________________________
+
+The following example illustrates an advanced use of ``fencing-topology`` in a
+cluster with the following properties:
+
+* 2 nodes (prod-mysql1 and prod-mysql2)
+* the nodes have IPMI controllers reachable at 192.0.2.1 and 192.0.2.2
+* the nodes each have two independent Power Supply Units (PSUs) connected to
+ two independent Power Distribution Units (PDUs) reachable at 198.51.100.1
+ (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
+* fencing via the IPMI controller uses the ``fence_ipmilan`` agent (1 fence device
+ per controller, with each device targeting a separate node)
+* fencing via the PDUs uses the ``fence_apc_snmp`` agent (1 fence device per
+ PDU, with both devices targeting both nodes)
+* a random delay is used to lessen the chance of a "death match"
+* fencing topology is set to try IPMI fencing first then dual PDU fencing if
+ that fails
+
+In a node failure scenario, Pacemaker will first select ``fence_ipmilan`` to
+try to kill the faulty node. Using the fencing topology, if that method fails,
+it will then move on to selecting ``fence_apc_snmp`` twice (once for the first
+PDU, then again for the second PDU).
+
+The fence action is considered successful only if both PDUs report the required
+status. If either of them fails, fencing loops back to the first fencing
+method, ``fence_ipmilan``, and so on, until the node is fenced or the fencing
+action is cancelled.
+
+.. note:: **First fencing method: single IPMI device per target**
+
+   Each cluster node has its own dedicated IPMI controller that can be contacted
+ for fencing using the following primitives:
+
+ .. code-block:: xml
+
+ <primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
+ <instance_attributes id="fence_prod-mysql1_ipmi-instance_attributes">
+ <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.1"/>
+ <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-login" name="login" value="fencing"/>
+ <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
+ <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
+ <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
+ <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
+ </instance_attributes>
+ </primitive>
+ <primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
+ <instance_attributes id="fence_prod-mysql2_ipmi-instance_attributes">
+ <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.2"/>
+ <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-login" name="login" value="fencing"/>
+ <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
+ <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
+ <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
+ <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
+ </instance_attributes>
+ </primitive>
+
+.. note:: **Second fencing method: dual PDU devices**
+
+ Each cluster node also has 2 distinct power supplies controlled by 2
+ distinct PDUs:
+
+ * Node 1: PDU 1 port 10 and PDU 2 port 10
+ * Node 2: PDU 1 port 11 and PDU 2 port 11
+
+ The matching fencing agents are configured as follows:
+
+ .. code-block:: xml
+
+ <primitive class="stonith" id="fence_apc1" type="fence_apc_snmp">
+ <instance_attributes id="fence_apc1-instance_attributes">
+ <nvpair id="fence_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
+ <nvpair id="fence_apc1-instance_attributes-login" name="login" value="fencing"/>
+ <nvpair id="fence_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
+          <nvpair id="fence_apc1-instance_attributes-pcmk_host_map"
+                  name="pcmk_host_map" value="prod-mysql1:10;prod-mysql2:11"/>
+ <nvpair id="fence_apc1-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
+ </instance_attributes>
+ </primitive>
+ <primitive class="stonith" id="fence_apc2" type="fence_apc_snmp">
+ <instance_attributes id="fence_apc2-instance_attributes">
+ <nvpair id="fence_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
+ <nvpair id="fence_apc2-instance_attributes-login" name="login" value="fencing"/>
+ <nvpair id="fence_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
+          <nvpair id="fence_apc2-instance_attributes-pcmk_host_map"
+                  name="pcmk_host_map" value="prod-mysql1:10;prod-mysql2:11"/>
+ <nvpair id="fence_apc2-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
+ </instance_attributes>
+ </primitive>
+
+.. note:: **Fencing topology**
+
+ Now that all the fencing resources are defined, it's time to create the
+ right topology. We want to first fence using IPMI and if that does not work,
+ fence both PDUs to effectively and surely kill the node.
+
+ .. code-block:: xml
+
+ <fencing-topology>
+ <fencing-level id="level-1-1" target="prod-mysql1" index="1" devices="fence_prod-mysql1_ipmi" />
+ <fencing-level id="level-1-2" target="prod-mysql1" index="2" devices="fence_apc1,fence_apc2" />
+ <fencing-level id="level-2-1" target="prod-mysql2" index="1" devices="fence_prod-mysql2_ipmi" />
+ <fencing-level id="level-2-2" target="prod-mysql2" index="2" devices="fence_apc1,fence_apc2" />
+ </fencing-topology>
+
+ In ``fencing-topology``, the lowest ``index`` value for a target determines
+ its first fencing method.
+
+Remapping Reboots
+#################
+
+When the cluster needs to reboot a node, whether because ``stonith-action`` is
+``reboot`` or because a reboot was requested externally (such as by
+``stonith_admin --reboot``), it will remap that to other commands in two cases:
+
+* If the chosen fencing device does not support the ``reboot`` command, the
+ cluster will ask it to perform ``off`` instead.
+
+* If a fencing topology level with multiple devices must be executed, the
+ cluster will ask all the devices to perform ``off``, then ask the devices to
+ perform ``on``.
+
+To understand the second case, consider the example of a node with redundant
+power supplies connected to intelligent power switches. Rebooting one switch
+and then the other would have no effect on the node. Turning both switches off,
+and then on, actually reboots the node.
+
+In such a case, the fencing operation will be treated as successful as long as
+the ``off`` commands succeed, because then it is safe for the cluster to
+recover any resources that were on the node. Timeouts and errors in the ``on``
+phase will be logged but ignored.
+
+When a reboot operation is remapped, any action-specific timeout for the
+remapped action will be used (for example, ``pcmk_off_timeout`` will be used
+when executing the ``off`` command, not ``pcmk_reboot_timeout``).
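+
+For example, a longer timeout for the remapped ``off`` action could be set as
+an instance attribute on a fence device (a minimal sketch; the value and the
+reuse of the ``fence_prod-mysql1_ipmi`` device are illustrative):
+
+.. code-block:: xml
+
+   <primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
+     <instance_attributes id="fence_prod-mysql1_ipmi-instance_attributes">
+       <!-- hypothetical: allow the remapped "off" command up to 120 seconds -->
+       <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_off_timeout"
+               name="pcmk_off_timeout" value="120s"/>
+     </instance_attributes>
+   </primitive>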
diff --git a/doc/sphinx/Pacemaker_Explained/images/resource-set.png b/doc/sphinx/Pacemaker_Explained/images/resource-set.png
new file mode 100644
index 0000000..fbed8b8
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/images/resource-set.png
Binary files differ
diff --git a/doc/sphinx/Pacemaker_Explained/images/three-sets.png b/doc/sphinx/Pacemaker_Explained/images/three-sets.png
new file mode 100644
index 0000000..feda36e
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/images/three-sets.png
Binary files differ
diff --git a/doc/sphinx/Pacemaker_Explained/images/two-sets.png b/doc/sphinx/Pacemaker_Explained/images/two-sets.png
new file mode 100644
index 0000000..b84b5f4
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/images/two-sets.png
Binary files differ
diff --git a/doc/sphinx/Pacemaker_Explained/index.rst b/doc/sphinx/Pacemaker_Explained/index.rst
new file mode 100644
index 0000000..de2ddd9
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/index.rst
@@ -0,0 +1,41 @@
+Pacemaker Explained
+===================
+
+*Configuring Pacemaker Clusters*
+
+
+Abstract
+--------
+This document definitively explains Pacemaker's features and capabilities,
+particularly the XML syntax used in Pacemaker's Cluster Information Base (CIB).
+
+
+Table of Contents
+-----------------
+
+.. toctree::
+ :maxdepth: 3
+ :numbered:
+
+ intro
+ options
+ nodes
+ resources
+ constraints
+ fencing
+ alerts
+ rules
+ advanced-options
+ advanced-resources
+ reusing-configuration
+ utilization
+ acls
+ status
+ multi-site-clusters
+ ap-samples
+
+Index
+-----
+
+* :ref:`genindex`
+* :ref:`search`
diff --git a/doc/sphinx/Pacemaker_Explained/intro.rst b/doc/sphinx/Pacemaker_Explained/intro.rst
new file mode 100644
index 0000000..a1240c3
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/intro.rst
@@ -0,0 +1,22 @@
+Introduction
+------------
+
+The Scope of this Document
+##########################
+
+This document is intended to be an exhaustive reference for configuring
+Pacemaker. To achieve this, it focuses on the XML syntax used to configure the
+CIB.
+
+For those who are allergic to XML, multiple higher-level front-ends
+(both command-line and GUI) are available. These tools will not be covered
+in this document, though the concepts explained here should make the
+functionality of these tools more easily understood.
+
+Users may be interested in other parts of the
+`Pacemaker documentation set <https://www.clusterlabs.org/pacemaker/doc/>`_,
+such as *Clusters from Scratch*, a step-by-step guide to setting up an
+example cluster, and *Pacemaker Administration*, a guide to maintaining a
+cluster.
+
+.. include:: ../shared/pacemaker-intro.rst
diff --git a/doc/sphinx/Pacemaker_Explained/multi-site-clusters.rst b/doc/sphinx/Pacemaker_Explained/multi-site-clusters.rst
new file mode 100644
index 0000000..59d3f93
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/multi-site-clusters.rst
@@ -0,0 +1,341 @@
+Multi-Site Clusters and Tickets
+-------------------------------
+
+Apart from local clusters, Pacemaker also supports multi-site clusters.
+That means you can have multiple, geographically dispersed sites, each with a
+local cluster. Failover between these clusters can be coordinated
+manually by the administrator, or automatically by a higher-level entity called
+a *Cluster Ticket Registry (CTR)*.
+
+Challenges for Multi-Site Clusters
+##################################
+
+Typically, multi-site environments are too far apart to support
+synchronous communication and data replication between the sites.
+That leads to significant challenges:
+
+- How do we make sure that a cluster site is up and running?
+
+- How do we make sure that resources are only started once?
+
+- How do we make sure that quorum can be reached between the different
+ sites and a split-brain scenario avoided?
+
+- How do we manage failover between sites?
+
+- How do we deal with high latency in case of resources that need to be
+ stopped?
+
+In the following sections, learn how to meet these challenges.
+
+Conceptual Overview
+###################
+
+Multi-site clusters can be considered as “overlay” clusters where
+each cluster site corresponds to a cluster node in a traditional cluster.
+The overlay cluster can be managed by a CTR in order to
+guarantee that any cluster resource will be active
+on no more than one cluster site. This is achieved by using
+*tickets* that are treated as failover domains between cluster
+sites, in case a site goes down.
+
+The following sections explain the individual components and mechanisms
+that were introduced for multi-site clusters in more detail.
+
+Ticket
+______
+
+Tickets are, essentially, cluster-wide attributes. A ticket grants the
+right to run certain resources on a specific cluster site. Resources can
+be bound to a certain ticket by ``rsc_ticket`` constraints. Only if the
+ticket is available at a site can the respective resources be started there.
+Conversely, if the ticket is revoked, the resources depending on that
+ticket must be stopped.
+
+The ticket thus is similar to a *site quorum*, i.e. the permission to
+manage/own resources associated with that site. (One can also think of the
+current ``have-quorum`` flag as a special, cluster-wide ticket that is
+granted in case of node majority.)
+
+Tickets can be granted and revoked either manually by administrators
+(which could be the default for classic enterprise clusters), or via
+the automated CTR mechanism described below.
+
+A ticket can only be owned by one site at a time. Initially, none
+of the sites has a ticket. Each ticket must be granted once by the cluster
+administrator.
+
+The presence or absence of tickets for a site is stored in the CIB as a
+cluster status. With regard to a certain ticket, there are only two states
+for a site: ``true`` (the site has the ticket) or ``false`` (the site does
+not have the ticket). The absence of a certain ticket (during the initial
+state of the multi-site cluster) is the same as the value ``false``.
+
+Dead Man Dependency
+___________________
+
+A site can only activate resources safely if it can be sure that the
+other site has deactivated them. However after a ticket is revoked, it can
+take a long time until all resources depending on that ticket are stopped
+"cleanly", especially in case of cascaded resources. To cut that process
+short, the concept of a *Dead Man Dependency* was introduced.
+
+When a dead man dependency is in force and a ticket is revoked from a site, the
+nodes that are hosting dependent resources are fenced. This considerably speeds
+up the recovery process of the cluster and makes sure that resources can be
+migrated more quickly.
+
+This can be configured by specifying a ``loss-policy="fence"`` in
+``rsc_ticket`` constraints.
+
+Cluster Ticket Registry
+_______________________
+
+A CTR is a coordinated group of network daemons that automatically handles
+granting, revoking, and timing out tickets (instead of the administrator
+revoking the ticket somewhere, waiting for everything to stop, and then
+granting it on the desired site).
+
+Pacemaker does not implement its own CTR, but interoperates with external
+software designed for that purpose (similar to how resource and fencing agents
+are not directly part of pacemaker).
+
+Participating clusters run the CTR daemons, which connect to each other, exchange
+information about their connectivity, and vote on which site gets which
+tickets.
+
+A ticket is granted to a site only once the CTR is sure that the ticket
+has been relinquished by the previous owner, implemented via a timer in most
+scenarios. If a site loses connection to its peers, its tickets time out and
+recovery occurs. After the connection timeout plus the recovery timeout has
+passed, the other sites are allowed to re-acquire the ticket and start the
+resources again.
+
+This can also be thought of as a "quorum server", except that it is not
+a single quorum ticket, but several.
+
+Configuration Replication
+_________________________
+
+As usual, the CIB is synchronized within each cluster, but it is *not* synchronized
+across cluster sites of a multi-site cluster. You must therefore configure
+the resources that are to be highly available across the multi-site cluster
+on every site separately.
+
+.. _ticket-constraints:
+
+Configuring Ticket Dependencies
+###############################
+
+The **rsc_ticket** constraint lets you specify the resources depending on a certain
+ticket. Together with the constraint, you can set a **loss-policy** that defines
+what should happen to the respective resources if the ticket is revoked.
+
+The attribute **loss-policy** can have the following values:
+
+* ``fence:`` Fence the nodes that are running the relevant resources.
+
+* ``stop:`` Stop the relevant resources.
+
+* ``freeze:`` Do nothing to the relevant resources.
+
+* ``demote:`` Demote relevant resources that are running in the promoted role.
+
+.. topic:: Constraint that fences node if ``ticketA`` is revoked
+
+ .. code-block:: xml
+
+ <rsc_ticket id="rsc1-req-ticketA" rsc="rsc1" ticket="ticketA" loss-policy="fence"/>
+
+The example above creates a constraint with the ID ``rsc1-req-ticketA``. It
+defines that the resource ``rsc1`` depends on ``ticketA`` and that the node running
+the resource should be fenced if ``ticketA`` is revoked.
+
+If resource ``rsc1`` were a promotable resource, you might want to configure
+that only being in the promoted role depends on ``ticketA``. With the following
+configuration, ``rsc1`` will be demoted if ``ticketA`` is revoked:
+
+.. topic:: Constraint that demotes ``rsc1`` if ``ticketA`` is revoked
+
+ .. code-block:: xml
+
+ <rsc_ticket id="rsc1-req-ticketA" rsc="rsc1" rsc-role="Promoted" ticket="ticketA" loss-policy="demote"/>
+
+You can create multiple **rsc_ticket** constraints to let multiple resources
+depend on the same ticket. However, **rsc_ticket** also supports resource sets
+(see :ref:`s-resource-sets`), so one can easily list all the resources in one
+**rsc_ticket** constraint instead.
+
+.. topic:: Ticket constraint for multiple resources
+
+ .. code-block:: xml
+
+ <rsc_ticket id="resources-dep-ticketA" ticket="ticketA" loss-policy="fence">
+ <resource_set id="resources-dep-ticketA-0" role="Started">
+ <resource_ref id="rsc1"/>
+ <resource_ref id="group1"/>
+ <resource_ref id="clone1"/>
+ </resource_set>
+ <resource_set id="resources-dep-ticketA-1" role="Promoted">
+ <resource_ref id="ms1"/>
+ </resource_set>
+ </rsc_ticket>
+
+In the example above, there are two resource sets, so we can list resources
+with different roles in a single ``rsc_ticket`` constraint. There's no dependency
+between the two resource sets, and there's no dependency among the
+resources within a resource set. Each of the resources just depends on
+``ticketA``.
+
+Referencing resource templates in ``rsc_ticket`` constraints, and even
+referencing them within resource sets, is also supported.
+
+If you want other resources to depend on further tickets, create as many
+constraints as necessary with ``rsc_ticket``.
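+
+For example, a second (hypothetical) resource ``rsc2`` could be bound to a
+different ticket ``ticketB``:
+
+.. topic:: Constraint binding ``rsc2`` to ``ticketB``
+
+   .. code-block:: xml
+
+      <rsc_ticket id="rsc2-req-ticketB" rsc="rsc2" ticket="ticketB" loss-policy="stop"/>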
+
+Managing Multi-Site Clusters
+############################
+
+Granting and Revoking Tickets Manually
+______________________________________
+
+You can grant tickets to sites or revoke them from sites manually.
+If you want to re-distribute a ticket, you should wait for
+the dependent resources to stop cleanly at the previous site before you
+grant the ticket to the new site.
+
+Use the **crm_ticket** command line tool to grant and revoke tickets.
+
+To grant a ticket to this site:
+
+ .. code-block:: none
+
+ # crm_ticket --ticket ticketA --grant
+
+To revoke a ticket from this site:
+
+ .. code-block:: none
+
+ # crm_ticket --ticket ticketA --revoke
+
+.. important::
+
+ If you are managing tickets manually, use the **crm_ticket** command with
+ great care, because it cannot check whether the same ticket is already
+ granted elsewhere.
+
+Granting and Revoking Tickets via a Cluster Ticket Registry
+___________________________________________________________
+
+We will use `Booth <https://github.com/ClusterLabs/booth>`_ here as an example of
+software that can be used with pacemaker as a Cluster Ticket Registry. Booth
+implements the `Raft <http://en.wikipedia.org/wiki/Raft_%28computer_science%29>`_
+algorithm to guarantee the distributed consensus among different
+cluster sites, and manages the ticket distribution (and thus the failover
+process between sites).
+
+Each of the participating clusters and *arbitrators* runs the Booth daemon
+**boothd**.
+
+An *arbitrator* is the multi-site equivalent of a quorum-only node in a local
+cluster. If you have a setup with an even number of sites,
+you need an additional instance to reach consensus about decisions such
+as failover of resources across sites. In this case, add one or more
+arbitrators running at additional sites. Arbitrators are single machines
+that run a booth instance in a special mode. An arbitrator is especially
+important for a two-site scenario; otherwise, there is no way for one site
+to distinguish between a network failure between it and the other site, and
+a failure of the other site.
+
+The most common multi-site scenario is probably a multi-site cluster with two
+sites and a single arbitrator on a third site. However, technically, there are
+no limitations with regards to the number of sites and the number of
+arbitrators involved.
+
+**Boothd** at each site connects to its peers running at the other sites and
+exchanges connectivity details. Once a ticket is granted to a site, the
+booth mechanism will manage the ticket automatically: If the site which
+holds the ticket is out of service, the booth daemons will vote which
+of the other sites will get the ticket. To protect against brief
+connection failures, sites that lose the vote (either explicitly or
+implicitly by being disconnected from the voting body) need to
+relinquish the ticket after a time-out. This ensures that a
+ticket will only be re-distributed after it has been relinquished by the
+previous site. The resources that depend on that ticket will fail over
+to the new site holding the ticket. The nodes that have run the
+resources before will be treated according to the **loss-policy** you set
+within the **rsc_ticket** constraint.
+
+Before booth can manage a certain ticket within the multi-site cluster,
+you initially need to grant it to a site manually via the **booth** command-line
+tool. After you have initially granted a ticket to a site, **boothd**
+will take over and manage the ticket automatically.
+
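+For example, the initial grant might look like this (a sketch; the ticket
+name is illustrative, and the exact options can vary between Booth versions):
+
+   .. code-block:: none
+
+      # booth grant ticketA
+      # booth list
+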
+.. important::
+
+ The **booth** command-line tool can be used to grant, list, or
+ revoke tickets and can be run on any machine where **boothd** is running.
+ If you are managing tickets via Booth, use only **booth** for manual
+ intervention, not **crm_ticket**. That ensures the same ticket
+ will only be owned by one cluster site at a time.
+
+Booth Requirements
+~~~~~~~~~~~~~~~~~~
+
+* All clusters that will be part of the multi-site cluster must be based on
+ Pacemaker.
+
+* Booth must be installed on all cluster nodes and on all arbitrators that will
+ be part of the multi-site cluster.
+
+* Nodes belonging to the same cluster site should be synchronized via NTP. However,
+ time synchronization is not required between the individual cluster sites.
+
+General Management of Tickets
+_____________________________
+
+Display information about tickets:
+
+ .. code-block:: none
+
+ # crm_ticket --info
+
+Or you can monitor them with:
+
+ .. code-block:: none
+
+ # crm_mon --tickets
+
+Display the ``rsc_ticket`` constraints that apply to a ticket:
+
+ .. code-block:: none
+
+ # crm_ticket --ticket ticketA --constraints
+
+When you want to perform maintenance or a manual switch-over of a ticket,
+revoking the ticket directly would trigger the loss policies. If
+``loss-policy="fence"``, the dependent resources could not be stopped or
+demoted gracefully, and even unrelated resources could be affected.
+
+The proper way is to put the ticket in *standby* mode first:
+
+ .. code-block:: none
+
+ # crm_ticket --ticket ticketA --standby
+
+Then the dependent resources will be stopped or demoted gracefully without
+triggering the loss policies.
+
+If you have finished the maintenance and want to activate the ticket again,
+you can run:
+
+ .. code-block:: none
+
+ # crm_ticket --ticket ticketA --activate
+
+For more information
+####################
+
+* `SUSE's Geo Clustering quick start <https://www.suse.com/documentation/sle-ha-geo-12/art_ha_geo_quick/data/art_ha_geo_quick.html>`_
+
+* `Booth <https://github.com/ClusterLabs/booth>`_
diff --git a/doc/sphinx/Pacemaker_Explained/nodes.rst b/doc/sphinx/Pacemaker_Explained/nodes.rst
new file mode 100644
index 0000000..6fcadb3
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/nodes.rst
@@ -0,0 +1,441 @@
+Cluster Nodes
+-------------
+
+Defining a Cluster Node
+_______________________
+
+Each cluster node will have an entry in the ``nodes`` section containing at
+least an ID and a name. A cluster node's ID is defined by the cluster layer
+(Corosync).
+
+.. topic:: **Example Corosync cluster node entry**
+
+ .. code-block:: xml
+
+ <node id="101" uname="pcmk-1"/>
+
+In normal circumstances, the admin should let the cluster populate this
+information automatically from the cluster layer.
+
+
+.. _node_name:
+
+Where Pacemaker Gets the Node Name
+##################################
+
+The name that Pacemaker uses for a node in the configuration does not have to
+be the same as its local hostname. Pacemaker uses the following for a Corosync
+node's name, in order of most preferred first:
+
+* The value of ``name`` in the ``nodelist`` section of ``corosync.conf``
+* The value of ``ring0_addr`` in the ``nodelist`` section of ``corosync.conf``
+* The local hostname (value of ``uname -n``)
+
+If the cluster is running, the ``crm_node -n`` command will display the local
+node's name as used by the cluster.
+
+If a Corosync ``nodelist`` is used, ``crm_node --name-for-id`` with a Corosync
+node ID will display the name used by the node with the given Corosync
+``nodeid``, for example:
+
+.. code-block:: none
+
+   # crm_node --name-for-id 2
+
+
+.. index::
+ single: node; attribute
+ single: node attribute
+
+.. _node_attributes:
+
+Node Attributes
+_______________
+
+Pacemaker allows node-specific values to be specified using *node attributes*.
+A node attribute has a name, and may have a distinct value for each node.
+
+Node attributes come in two types, *permanent* and *transient*. Permanent node
+attributes are kept within the ``node`` entry, and keep their values even if
+the cluster restarts on a node. Transient node attributes are kept in the CIB's
+``status`` section, and go away when the cluster stops on the node.
+
+While certain node attributes have specific meanings to the cluster, they are
+mainly intended to allow administrators and resource agents to track any
+information desired.
+
+For example, an administrator might choose to define node attributes for how
+much RAM and disk space each node has, which OS each uses, or which server room
+rack each node is in.
+
+Users can configure :ref:`rules` that use node attributes to affect where
+resources are placed.
+
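+For example, a location constraint rule could prefer nodes with enough memory
+(a minimal sketch, assuming a resource named ``db`` and an
+administrator-maintained node attribute named ``ram``, in megabytes):
+
+.. topic:: Example rule using a hypothetical ``ram`` node attribute
+
+   .. code-block:: xml
+
+      <rsc_location id="loc-db-big-ram" rsc="db">
+        <rule id="loc-db-big-ram-rule" score="INFINITY">
+          <expression id="loc-db-big-ram-expr" attribute="ram"
+                      operation="gte" type="number" value="4096"/>
+        </rule>
+      </rsc_location>
+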
+Setting and querying node attributes
+####################################
+
+Node attributes can be set and queried using the ``crm_attribute`` and
+``attrd_updater`` commands, so that the user does not have to deal with XML
+configuration directly.
+
+Here is an example command to set a permanent node attribute, and the XML
+configuration that would be generated:
+
+.. topic:: **Result of using crm_attribute to specify which kernel pcmk-1 is running**
+
+ .. code-block:: none
+
+ # crm_attribute --type nodes --node pcmk-1 --name kernel --update $(uname -r)
+
+ .. code-block:: xml
+
+ <node id="1" uname="pcmk-1">
+ <instance_attributes id="nodes-1-attributes">
+ <nvpair id="nodes-1-kernel" name="kernel" value="3.10.0-862.14.4.el7.x86_64"/>
+ </instance_attributes>
+ </node>
+
+To read back the value that was just set:
+
+.. code-block:: none
+
+ # crm_attribute --type nodes --node pcmk-1 --name kernel --query
+ scope=nodes name=kernel value=3.10.0-862.14.4.el7.x86_64
+
+The ``--type nodes`` indicates that this is a permanent node attribute;
+``--type status`` would indicate a transient node attribute.
+
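+Transient node attributes can also be managed with ``attrd_updater``. For
+example (the attribute name and value are illustrative):
+
+.. code-block:: none
+
+   # attrd_updater --name my-transient-attr --update somevalue --node pcmk-1
+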
+Special node attributes
+#######################
+
+Certain node attributes have special meaning to the cluster.
+
+Node attribute names beginning with ``#`` are considered reserved for these
+special attributes. Some special attributes do not start with ``#``, for
+historical reasons.
+
+Certain special attributes are set automatically by the cluster, should never
+be modified directly, and can be used only within :ref:`rules`; these are
+listed under
+:ref:`built-in node attributes <node-attribute-expressions-special>`.
+
+For true/false values, the cluster considers a value of "1", "y", "yes", "on",
+or "true" (case-insensitively) to be true, "0", "n", "no", "off", "false", or
+unset to be false, and anything else to be an error.
+
+.. table:: **Node attributes with special significance**
+ :class: longtable
+ :widths: 1 2
+
+ +----------------------------+-----------------------------------------------------+
+ | Name | Description |
+ +============================+=====================================================+
+ | fail-count-* | .. index:: |
+ | | pair: node attribute; fail-count |
+ | | |
+ | | Attributes whose names start with |
+ | | ``fail-count-`` are managed by the cluster |
+ | | to track how many times particular resource |
+ | | operations have failed on this node. These |
+ | | should be queried and cleared via the |
+ | | ``crm_failcount`` or |
+ | | ``crm_resource --cleanup`` commands rather |
+ | | than directly. |
+ +----------------------------+-----------------------------------------------------+
+ | last-failure-* | .. index:: |
+ | | pair: node attribute; last-failure |
+ | | |
+ | | Attributes whose names start with |
+ | | ``last-failure-`` are managed by the cluster |
+ | | to track when particular resource operations |
+ | | have most recently failed on this node. |
+ | | These should be cleared via the |
+ | | ``crm_failcount`` or |
+ | | ``crm_resource --cleanup`` commands rather |
+ | | than directly. |
+ +----------------------------+-----------------------------------------------------+
+ | maintenance | .. index:: |
+ | | pair: node attribute; maintenance |
+ | | |
+ | | Similar to the ``maintenance-mode`` |
+ | | :ref:`cluster option <cluster_options>`, but |
+ | | for a single node. If true, resources will |
+ | | not be started or stopped on the node, |
+ | | resources and individual clone instances |
+ | | running on the node will become unmanaged, |
+ | | and any recurring operations for those will |
+ | | be cancelled. |
+ | | |
+ | | **Warning:** Restarting pacemaker on a node that is |
+ | | in single-node maintenance mode will likely |
+ | | lead to undesirable effects. If |
+ | | ``maintenance`` is set as a transient |
+ | | attribute, it will be erased when |
+ | | Pacemaker is stopped, which will |
+ | | immediately take the node out of |
+ | | maintenance mode and likely get it |
+ | | fenced. Even if permanent, if Pacemaker |
+ | | is restarted, any resources active on the |
+ | | node will have their local history erased |
+ | | when the node rejoins, so the cluster |
+ | | will no longer consider them running on |
+ | | the node and thus will consider them |
+ | | managed again, leading them to be started |
+ | | elsewhere. This behavior might be |
+ | | improved in a future release. |
+ +----------------------------+-----------------------------------------------------+
+ | probe_complete | .. index:: |
+ | | pair: node attribute; probe_complete |
+ | | |
+ | | This is managed by the cluster to detect |
+ | | when nodes need to be reprobed, and should |
+ | | never be used directly. |
+ +----------------------------+-----------------------------------------------------+
+ | resource-discovery-enabled | .. index:: |
+ | | pair: node attribute; resource-discovery-enabled |
+ | | |
+ | | If the node is a remote node, fencing is enabled, |
+ | | and this attribute is explicitly set to false |
+ | | (unset means true in this case), resource discovery |
+ | | (probes) will not be done on this node. This is |
+ | | highly discouraged; the ``resource-discovery`` |
+ | | location constraint property is preferred for this |
+ | | purpose. |
+ +----------------------------+-----------------------------------------------------+
+ | shutdown | .. index:: |
+ | | pair: node attribute; shutdown |
+ | | |
+ | | This is managed by the cluster to orchestrate the |
+ | | shutdown of a node, and should never be used |
+ | | directly. |
+ +----------------------------+-----------------------------------------------------+
+ | site-name | .. index:: |
+ | | pair: node attribute; site-name |
+ | | |
+ | | If set, this will be used as the value of the |
+ | | ``#site-name`` node attribute used in rules. (If |
+ | | not set, the value of the ``cluster-name`` cluster |
+ | | option will be used as ``#site-name`` instead.) |
+ +----------------------------+-----------------------------------------------------+
+ | standby | .. index:: |
+ | | pair: node attribute; standby |
+ | | |
+ | | If true, the node is in standby mode. This is |
+ | | typically set and queried via the ``crm_standby`` |
+ | | command rather than directly. |
+ +----------------------------+-----------------------------------------------------+
+ | terminate | .. index:: |
+ | | pair: node attribute; terminate |
+ | | |
+ | | If the value is true or begins with any nonzero |
+ | | number, the node will be fenced. This is typically |
+ | | set by tools rather than directly. |
+ +----------------------------+-----------------------------------------------------+
+ | #digests-* | .. index:: |
+ | | pair: node attribute; #digests |
+ | | |
+ | | Attributes whose names start with ``#digests-`` are |
+ | | managed by the cluster to detect when |
+ | | :ref:`unfencing` needs to be redone, and should |
+ | | never be used directly. |
+ +----------------------------+-----------------------------------------------------+
+ | #node-unfenced | .. index:: |
+ | | pair: node attribute; #node-unfenced |
+ | | |
+ | | When the node was last unfenced (as seconds since |
+ | | the epoch). This is managed by the cluster and |
+ | | should never be used directly. |
+ +----------------------------+-----------------------------------------------------+
+
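+For example, a resource's fail count on a node could be inspected and then
+cleared like this (the resource and node names are illustrative):
+
+.. code-block:: none
+
+   # crm_failcount --query --resource rsc1 --node node1
+   # crm_resource --cleanup --resource rsc1 --node node1
+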
+.. index::
+ single: node; health
+
+.. _node-health:
+
+Tracking Node Health
+____________________
+
+A node may be functioning adequately as far as cluster membership is concerned,
+and yet be "unhealthy" in some respect that makes it an undesirable location
+for resources. For example, a disk drive may be reporting SMART errors, or the
+CPU may be highly loaded.
+
+Pacemaker offers a way to automatically move resources off unhealthy nodes.
+
+.. index::
+ single: node attribute; health
+
+Node Health Attributes
+######################
+
+Pacemaker will treat any node attribute whose name starts with ``#health`` as
+an indicator of node health. Node health attributes may have one of the
+following values:
+
+.. table:: **Allowed Values for Node Health Attributes**
+ :widths: 1 4
+
+ +------------+--------------------------------------------------------------+
+ | Value | Intended significance |
+ +============+==============================================================+
+ | ``red`` | .. index:: |
+ | | single: red; node health attribute value |
+ | | single: node attribute; health (red) |
+ | | |
+ | | This indicator is unhealthy |
+ +------------+--------------------------------------------------------------+
+ | ``yellow`` | .. index:: |
+ | | single: yellow; node health attribute value |
+ | | single: node attribute; health (yellow) |
+ | | |
+ | | This indicator is becoming unhealthy |
+ +------------+--------------------------------------------------------------+
+ | ``green`` | .. index:: |
+ | | single: green; node health attribute value |
+ | | single: node attribute; health (green) |
+ | | |
+ | | This indicator is healthy |
+ +------------+--------------------------------------------------------------+
+ | *integer* | .. index:: |
+ | | single: score; node health attribute value |
+ | | single: node attribute; health (score) |
+ | | |
+ | | A numeric score to apply to all resources on this node (0 or |
+ | | positive is healthy, negative is unhealthy) |
+ +------------+--------------------------------------------------------------+
+
+
+.. index::
+ pair: cluster option; node-health-strategy
+
+Node Health Strategy
+####################
+
+Pacemaker assigns a node health score to each node, as the sum of the values of
+all its node health attributes. This score will be used as a location
+constraint applied to this node for all resources.
+
+The ``node-health-strategy`` cluster option controls how Pacemaker responds to
+changes in node health attributes, and how it translates ``red``, ``yellow``,
+and ``green`` to scores.
+
+Allowed values are:
+
+.. table:: **Node Health Strategies**
+ :widths: 1 4
+
+ +----------------+----------------------------------------------------------+
+ | Value | Effect |
+ +================+==========================================================+
+ | none | .. index:: |
+ | | single: node-health-strategy; none |
+ | | single: none; node-health-strategy value |
+ | | |
+ | | Do not track node health attributes at all. |
+ +----------------+----------------------------------------------------------+
+ | migrate-on-red | .. index:: |
+ | | single: node-health-strategy; migrate-on-red |
+ | | single: migrate-on-red; node-health-strategy value |
+ | | |
+ | | Assign the value of ``-INFINITY`` to ``red``, and 0 to |
+ | | ``yellow`` and ``green``. This will cause all resources |
+ | | to move off the node if any attribute is ``red``. |
+ +----------------+----------------------------------------------------------+
+ | only-green | .. index:: |
+ | | single: node-health-strategy; only-green |
+ | | single: only-green; node-health-strategy value |
+ | | |
+ | | Assign the value of ``-INFINITY`` to ``red`` and |
+ | | ``yellow``, and 0 to ``green``. This will cause all |
+ | | resources to move off the node if any attribute is |
+ | | ``red`` or ``yellow``. |
+ +----------------+----------------------------------------------------------+
+ | progressive | .. index:: |
+ | | single: node-health-strategy; progressive |
+ | | single: progressive; node-health-strategy value |
+ | | |
+ | | Assign the value of the ``node-health-red`` cluster |
+ | | option to ``red``, the value of ``node-health-yellow`` |
+ | | to ``yellow``, and the value of ``node-health-green`` to |
+ | | ``green``. Each node is additionally assigned a score of |
+ | | ``node-health-base`` (this allows resources to start |
+ | | even if some attributes are ``yellow``). This strategy |
+ | | gives the administrator finer control over how important |
+ | | each value is. |
+ +----------------+----------------------------------------------------------+
+ | custom | .. index:: |
+ | | single: node-health-strategy; custom |
+ | | single: custom; node-health-strategy value |
+ | | |
+ | | Track node health attributes using the same values as |
+ | | ``progressive`` for ``red``, ``yellow``, and ``green``, |
+ | | but do not take them into account. The administrator is |
+ | | expected to implement a policy by defining :ref:`rules` |
+ | | referencing node health attributes. |
+ +----------------+----------------------------------------------------------+
+
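+For example, with the ``custom`` strategy, an administrator could ban a
+resource from nodes whose ``ocf:pacemaker:HealthSMART`` indicator is ``red``
+(a sketch; the resource name is illustrative):
+
+.. topic:: Example rule banning a resource when ``#health-smart`` is red
+
+   .. code-block:: xml
+
+      <rsc_location id="ban-bad-smart" rsc="my-resource">
+        <rule id="ban-bad-smart-rule" score="-INFINITY">
+          <expression id="ban-bad-smart-expr" attribute="#health-smart"
+                      operation="eq" value="red"/>
+        </rule>
+      </rsc_location>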
+
+Exempting a Resource from Health Restrictions
+#############################################
+
+If you want a resource to be able to run on a node even if its health score
+would otherwise prevent it, set the resource's ``allow-unhealthy-nodes``
+meta-attribute to ``true`` *(available since 2.1.3)*.
+
+This is particularly useful for node health agents, to allow them to detect
+when the node becomes healthy again. If you configure a health agent without
+this setting, then the health agent will be banned from an unhealthy node,
+and you will have to investigate and clear the health attribute manually once
+it is healthy to allow resources on the node again.
+
+If you want the meta-attribute to apply to a clone, it must be set on the clone
+itself, not on the resource being cloned.
+
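+For example (a minimal sketch, reusing the ``HealthIOWait`` clone shown
+later in this chapter):
+
+.. topic:: Example clone exempted from health restrictions
+
+   .. code-block:: xml
+
+      <clone id="resHealthIOWait-clone">
+        <meta_attributes id="resHealthIOWait-clone-meta_attributes">
+          <nvpair id="resHealthIOWait-clone-meta_attributes-allow-unhealthy-nodes"
+                  name="allow-unhealthy-nodes" value="true"/>
+        </meta_attributes>
+        <primitive class="ocf" id="HealthIOWait" provider="pacemaker" type="HealthIOWait"/>
+      </clone>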
+
+Configuring Node Health Agents
+##############################
+
+Since Pacemaker calculates node health based on node attributes, any method
+that sets node attributes may be used to measure node health. The most common
+are resource agents and custom daemons.
+
+Pacemaker provides examples that can be used directly or as a basis for custom
+code. The ``ocf:pacemaker:HealthCPU``, ``ocf:pacemaker:HealthIOWait``, and
+``ocf:pacemaker:HealthSMART`` resource agents set node health attributes based
+on CPU and disk status.
+
+To take advantage of this feature, add the resource to your cluster (generally
+as a cloned resource with a recurring monitor action, to continually check the
+health of all nodes). For example:
+
+.. topic:: Example HealthIOWait resource configuration
+
+ .. code-block:: xml
+
+ <clone id="resHealthIOWait-clone">
+ <primitive class="ocf" id="HealthIOWait" provider="pacemaker" type="HealthIOWait">
+ <instance_attributes id="resHealthIOWait-instance_attributes">
+ <nvpair id="resHealthIOWait-instance_attributes-red_limit" name="red_limit" value="30"/>
+ <nvpair id="resHealthIOWait-instance_attributes-yellow_limit" name="yellow_limit" value="10"/>
+ </instance_attributes>
+ <operations>
+ <op id="resHealthIOWait-monitor-interval-5" interval="5" name="monitor" timeout="5"/>
+ <op id="resHealthIOWait-start-interval-0s" interval="0s" name="start" timeout="10s"/>
+ <op id="resHealthIOWait-stop-interval-0s" interval="0s" name="stop" timeout="10s"/>
+ </operations>
+ </primitive>
+ </clone>
+
+The resource agents use ``attrd_updater`` to set proper status for each node
+running this resource, as a node attribute whose name starts with ``#health``
+(for ``HealthIOWait``, the node attribute is named ``#health-iowait``).
+
+When a node is no longer faulty, you can force the cluster to make it available
+to take resources without waiting for the next monitor, by setting the node
+health attribute to green. For example:
+
+.. topic:: **Force node1 to be marked as healthy**
+
+ .. code-block:: none
+
+ # attrd_updater --name "#health-iowait" --update "green" --node "node1"
diff --git a/doc/sphinx/Pacemaker_Explained/options.rst b/doc/sphinx/Pacemaker_Explained/options.rst
new file mode 100644
index 0000000..ee0511c
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/options.rst
@@ -0,0 +1,622 @@
+Cluster-Wide Configuration
+--------------------------
+
+.. index::
+ pair: XML element; cib
+ pair: XML element; configuration
+
+Configuration Layout
+####################
+
+The cluster is defined by the Cluster Information Base (CIB), which uses XML
+notation. The simplest CIB, an empty one, looks like this:
+
+.. topic:: An empty configuration
+
+ .. code-block:: xml
+
+ <cib crm_feature_set="3.6.0" validate-with="pacemaker-3.5" epoch="1" num_updates="0" admin_epoch="0">
+ <configuration>
+ <crm_config/>
+ <nodes/>
+ <resources/>
+ <constraints/>
+ </configuration>
+ <status/>
+ </cib>
+
+The empty configuration above contains the major sections that make up a CIB:
+
+* ``cib``: The entire CIB is enclosed with a ``cib`` element. Certain
+ fundamental settings are defined as attributes of this element.
+
+ * ``configuration``: This section -- the primary focus of this document --
+ contains traditional configuration information such as what resources the
+ cluster serves and the relationships among them.
+
+ * ``crm_config``: cluster-wide configuration options
+
+ * ``nodes``: the machines that host the cluster
+
+ * ``resources``: the services run by the cluster
+
+ * ``constraints``: indications of how resources should be placed
+
+ * ``status``: This section contains the history of each resource on each
+ node. Based on this data, the cluster can construct the complete current
+ state of the cluster. The authoritative source for this section is the
+ local executor (pacemaker-execd process) on each cluster node, and the
+ cluster will occasionally repopulate the entire section. For this reason,
+ it is never written to disk, and administrators are advised against
+ modifying it in any way.
+
+In this document, configuration settings will be described as properties or
+options based on how they are defined in the CIB:
+
+* Properties are XML attributes of an XML element.
+
+* Options are name-value pairs expressed as ``nvpair`` child elements of an XML
+ element.
+
+Normally, you will use command-line tools that abstract the XML, so the
+distinction will be unimportant; both properties and options are cluster
+settings you can tweak.
+
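+For example, in the following fragment, ``validate-with`` is a property of the
+``cib`` element, while ``stonith-enabled`` is an option (a sketch; the values
+are illustrative):
+
+.. topic:: A property and an option
+
+   .. code-block:: xml
+
+      <cib validate-with="pacemaker-3.5" epoch="2" num_updates="0" admin_epoch="0">
+        <configuration>
+          <crm_config>
+            <cluster_property_set id="cib-bootstrap-options">
+              <nvpair id="cib-bootstrap-options-stonith-enabled"
+                      name="stonith-enabled" value="true"/>
+            </cluster_property_set>
+          </crm_config>
+        </configuration>
+      </cib>
+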
+CIB Properties
+##############
+
+Certain settings are defined by CIB properties (that is, attributes of the
+``cib`` tag) rather than with the rest of the cluster configuration in the
+``configuration`` section.
+
+The reason is simply a matter of parsing. These options are used by the
+configuration database which is, by design, mostly ignorant of the content it
+holds. So the decision was made to place them in an easy-to-find location.
+
+.. table:: **CIB Properties**
+ :class: longtable
+ :widths: 1 3
+
+ +------------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +==================+===========================================================+
+ | admin_epoch | .. index:: |
+ | | pair: admin_epoch; cib |
+ | | |
+ | | When a node joins the cluster, the cluster performs a |
+ | | check to see which node has the best configuration. It |
+ | | asks the node with the highest (``admin_epoch``, |
+ | | ``epoch``, ``num_updates``) tuple to replace the |
+ | | configuration on all the nodes -- which makes setting |
+ | | them, and setting them correctly, very important. |
+ | | ``admin_epoch`` is never modified by the cluster; you can |
+ | | use this to make the configurations on any inactive nodes |
+ | | obsolete. |
+ | | |
+ | | **Warning:** Never set this value to zero. In such cases, |
+ | | the cluster cannot tell the difference between your |
+ | | configuration and the "empty" one used when nothing is |
+ | | found on disk. |
+ +------------------+-----------------------------------------------------------+
+ | epoch | .. index:: |
+ | | pair: epoch; cib |
+ | | |
+ | | The cluster increments this every time the configuration |
+ | | is updated (usually by the administrator). |
+ +------------------+-----------------------------------------------------------+
+ | num_updates | .. index:: |
+ | | pair: num_updates; cib |
+ | | |
+ | | The cluster increments this every time the configuration |
+ | | or status is updated (usually by the cluster) and resets |
+ | | it to 0 when epoch changes. |
+ +------------------+-----------------------------------------------------------+
+ | validate-with | .. index:: |
+ | | pair: validate-with; cib |
+ | | |
+ | | Determines the type of XML validation that will be done |
+ | | on the configuration. If set to ``none``, the cluster |
+ | | will not verify that updates conform to the DTD (nor |
+ | | reject ones that don't). |
+ +------------------+-----------------------------------------------------------+
+ | cib-last-written | .. index:: |
+ | | pair: cib-last-written; cib |
+ | | |
+ | | Indicates when the configuration was last written to |
+ | | disk. Maintained by the cluster; for informational |
+ | | purposes only. |
+ +------------------+-----------------------------------------------------------+
+ | have-quorum | .. index:: |
+ | | pair: have-quorum; cib |
+ | | |
+ | | Indicates if the cluster has quorum. If false, this may |
+ | | mean that the cluster cannot start resources or fence |
+ | | other nodes (see ``no-quorum-policy`` below). Maintained |
+ | | by the cluster. |
+ +------------------+-----------------------------------------------------------+
+ | dc-uuid | .. index:: |
+ | | pair: dc-uuid; cib |
+ | | |
+ | | Indicates which cluster node is the current leader. Used |
+ | | by the cluster when placing resources and determining the |
+ | | order of some events. Maintained by the cluster. |
+ +------------------+-----------------------------------------------------------+
+
+.. _cluster_options:
+
+Cluster Options
+###############
+
+Cluster options, as you might expect, control how the cluster behaves when
+confronted with various situations.
+
+They are grouped into sets within the ``crm_config`` section. In advanced
+configurations, there may be more than one set. (This will be described later
+in the chapter on :ref:`rules` where we will show how to have the cluster use
+different sets of options during working hours than during weekends.) For now,
+we will describe the simple case where each option is present at most once.
+
+You can obtain an up-to-date list of cluster options, including their default
+values, by running the ``man pacemaker-schedulerd`` and
+``man pacemaker-controld`` commands.
+
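+Cluster options are typically set with the ``crm_attribute`` command, which
+modifies the ``crm_config`` section by default. For example (the value is
+illustrative; see the table below for the option's meaning):
+
+.. code-block:: none
+
+   # crm_attribute --name no-quorum-policy --update freeze
+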
+.. table:: **Cluster Options**
+ :class: longtable
+ :widths: 2 1 4
+
+ +---------------------------+---------+----------------------------------------------------+
+ | Option | Default | Description |
+ +===========================+=========+====================================================+
+ | cluster-name | | .. index:: |
+ | | | pair: cluster option; cluster-name |
+ | | | |
+ | | | An (optional) name for the cluster as a whole. |
+ | | | This is mostly for users' convenience for use |
+ | | | as desired in administration, but this can be |
+ | | | used in the Pacemaker configuration in |
+ | | | :ref:`rules` (as the ``#cluster-name`` |
+ | | | :ref:`node attribute |
+   |                           |         | <node-attribute-expressions-special>`). It may     |
+ | | | also be used by higher-level tools when |
+ | | | displaying cluster information, and by |
+ | | | certain resource agents (for example, the |
+ | | | ``ocf:heartbeat:GFS2`` agent stores the |
+ | | | cluster name in filesystem meta-data). |
+ +---------------------------+---------+----------------------------------------------------+
+ | dc-version | | .. index:: |
+ | | | pair: cluster option; dc-version |
+ | | | |
+ | | | Version of Pacemaker on the cluster's DC. |
+ | | | Determined automatically by the cluster. Often |
+ | | | includes the hash which identifies the exact |
+ | | | Git changeset it was built from. Used for |
+ | | | diagnostic purposes. |
+ +---------------------------+---------+----------------------------------------------------+
+ | cluster-infrastructure | | .. index:: |
+ | | | pair: cluster option; cluster-infrastructure |
+ | | | |
+ | | | The messaging stack on which Pacemaker is |
+ | | | currently running. Determined automatically by |
+ | | | the cluster. Used for informational and |
+ | | | diagnostic purposes. |
+ +---------------------------+---------+----------------------------------------------------+
+ | no-quorum-policy | stop | .. index:: |
+ | | | pair: cluster option; no-quorum-policy |
+ | | | |
+ | | | What to do when the cluster does not have |
+ | | | quorum. Allowed values: |
+ | | | |
+ | | | * ``ignore:`` continue all resource management |
+ | | | * ``freeze:`` continue resource management, but |
+ | | | don't recover resources from nodes not in the |
+ | | | affected partition |
+ | | | * ``stop:`` stop all resources in the affected |
+ | | | cluster partition |
+ | | | * ``demote:`` demote promotable resources and |
+ | | | stop all other resources in the affected |
+ | | | cluster partition *(since 2.0.5)* |
+ | | | * ``suicide:`` fence all nodes in the affected |
+ | | | cluster partition |
+ +---------------------------+---------+----------------------------------------------------+
+ | batch-limit | 0 | .. index:: |
+ | | | pair: cluster option; batch-limit |
+ | | | |
+ | | | The maximum number of actions that the cluster |
+ | | | may execute in parallel across all nodes. The |
+ | | | "correct" value will depend on the speed and |
+ | | | load of your network and cluster nodes. If zero, |
+ | | | the cluster will impose a dynamically calculated |
+ | | | limit only when any node has high load. If -1, the |
+ | | | cluster will not impose any limit. |
+ +---------------------------+---------+----------------------------------------------------+
+ | migration-limit | -1 | .. index:: |
+ | | | pair: cluster option; migration-limit |
+ | | | |
+ | | | The number of |
+ | | | :ref:`live migration <live-migration>` actions |
+ | | | that the cluster is allowed to execute in |
+ | | | parallel on a node. A value of -1 means |
+ | | | unlimited. |
+ +---------------------------+---------+----------------------------------------------------+
+ | symmetric-cluster | true | .. index:: |
+ | | | pair: cluster option; symmetric-cluster |
+ | | | |
+ | | | Whether resources can run on any node by default |
+ | | | (if false, a resource is allowed to run on a |
+ | | | node only if a |
+ | | | :ref:`location constraint <location-constraint>` |
+ | | | enables it) |
+ +---------------------------+---------+----------------------------------------------------+
+ | stop-all-resources | false | .. index:: |
+ | | | pair: cluster option; stop-all-resources |
+ | | | |
+ | | | Whether all resources should be disallowed from |
+ | | | running (can be useful during maintenance) |
+ +---------------------------+---------+----------------------------------------------------+
+ | stop-orphan-resources | true | .. index:: |
+ | | | pair: cluster option; stop-orphan-resources |
+ | | | |
+ | | | Whether resources that have been deleted from |
+ | | | the configuration should be stopped. This value |
+ | | | takes precedence over ``is-managed`` (that is, |
+ | | | even unmanaged resources will be stopped when |
+   |                           |         | orphaned if this value is ``true``)                |
+ +---------------------------+---------+----------------------------------------------------+
+ | stop-orphan-actions | true | .. index:: |
+ | | | pair: cluster option; stop-orphan-actions |
+ | | | |
+ | | | Whether recurring :ref:`operations <operation>` |
+ | | | that have been deleted from the configuration |
+ | | | should be cancelled |
+ +---------------------------+---------+----------------------------------------------------+
+ | start-failure-is-fatal | true | .. index:: |
+ | | | pair: cluster option; start-failure-is-fatal |
+ | | | |
+ | | | Whether a failure to start a resource on a |
+ | | | particular node prevents further start attempts |
+   |                           |         | on that node. If ``false``, the cluster will       |
+ | | | decide whether the node is still eligible based |
+ | | | on the resource's current failure count and |
+ | | | :ref:`migration-threshold <failure-handling>`. |
+ +---------------------------+---------+----------------------------------------------------+
+ | enable-startup-probes | true | .. index:: |
+ | | | pair: cluster option; enable-startup-probes |
+ | | | |
+ | | | Whether the cluster should check the |
+ | | | pre-existing state of resources when the cluster |
+ | | | starts |
+ +---------------------------+---------+----------------------------------------------------+
+ | maintenance-mode | false | .. index:: |
+ | | | pair: cluster option; maintenance-mode |
+ | | | |
+ | | | Whether the cluster should refrain from |
+ | | | monitoring, starting and stopping resources |
+ +---------------------------+---------+----------------------------------------------------+
+ | stonith-enabled | true | .. index:: |
+ | | | pair: cluster option; stonith-enabled |
+ | | | |
+ | | | Whether the cluster is allowed to fence nodes |
+ | | | (for example, failed nodes and nodes with |
+   |                           |         | resources that can't be stopped).                  |
+ | | | |
+ | | | If true, at least one fence device must be |
+ | | | configured before resources are allowed to run. |
+ | | | |
+ | | | If false, unresponsive nodes are immediately |
+ | | | assumed to be running no resources, and resource |
+ | | | recovery on online nodes starts without any |
+ | | | further protection (which can mean *data loss* |
+ | | | if the unresponsive node still accesses shared |
+ | | | storage, for example). See also the |
+ | | | :ref:`requires <requires>` resource |
+ | | | meta-attribute. |
+ +---------------------------+---------+----------------------------------------------------+
+ | stonith-action | reboot | .. index:: |
+ | | | pair: cluster option; stonith-action |
+ | | | |
+ | | | Action the cluster should send to the fence agent |
+ | | | when a node must be fenced. Allowed values are |
+ | | | ``reboot``, ``off``, and (for legacy agents only) |
+ | | | ``poweroff``. |
+ +---------------------------+---------+----------------------------------------------------+
+ | stonith-timeout | 60s | .. index:: |
+ | | | pair: cluster option; stonith-timeout |
+ | | | |
+ | | | How long to wait for ``on``, ``off``, and |
+ | | | ``reboot`` fence actions to complete by default. |
+ +---------------------------+---------+----------------------------------------------------+
+ | stonith-max-attempts | 10 | .. index:: |
+ | | | pair: cluster option; stonith-max-attempts |
+ | | | |
+ | | | How many times fencing can fail for a target |
+ | | | before the cluster will no longer immediately |
+ | | | re-attempt it. |
+ +---------------------------+---------+----------------------------------------------------+
+ | stonith-watchdog-timeout | 0 | .. index:: |
+ | | | pair: cluster option; stonith-watchdog-timeout |
+ | | | |
+ | | | If nonzero, and the cluster detects |
+ | | | ``have-watchdog`` as ``true``, then watchdog-based |
+ | | | self-fencing will be performed via SBD when |
+ | | | fencing is required, without requiring a fencing |
+ | | | resource explicitly configured. |
+ | | | |
+ | | | If this is set to a positive value, unseen nodes |
+ | | | are assumed to self-fence within this much time. |
+ | | | |
+ | | | **Warning:** It must be ensured that this value is |
+ | | | larger than the ``SBD_WATCHDOG_TIMEOUT`` |
+ | | | environment variable on all nodes. Pacemaker |
+ | | | verifies the settings individually on all nodes |
+ | | | and prevents startup or shuts down if configured |
+ | | | wrongly on the fly. It is strongly recommended |
+ | | | that ``SBD_WATCHDOG_TIMEOUT`` be set to the same |
+ | | | value on all nodes. |
+ | | | |
+ | | | If this is set to a negative value, and |
+ | | | ``SBD_WATCHDOG_TIMEOUT`` is set, twice that value |
+ | | | will be used. |
+ | | | |
+ | | | **Warning:** In this case, it is essential (and |
+ | | | currently not verified by pacemaker) that |
+ | | | ``SBD_WATCHDOG_TIMEOUT`` is set to the same |
+ | | | value on all nodes. |
+ +---------------------------+---------+----------------------------------------------------+
+ | concurrent-fencing | false | .. index:: |
+ | | | pair: cluster option; concurrent-fencing |
+ | | | |
+ | | | Whether the cluster is allowed to initiate |
+ | | | multiple fence actions concurrently. Fence actions |
+ | | | initiated externally, such as via the |
+ | | | ``stonith_admin`` tool or an application such as |
+ | | | DLM, or by the fencer itself such as recurring |
+ | | | device monitors and ``status`` and ``list`` |
+ | | | commands, are not limited by this option. |
+ +---------------------------+---------+----------------------------------------------------+
+ | fence-reaction | stop | .. index:: |
+ | | | pair: cluster option; fence-reaction |
+ | | | |
+ | | | How should a cluster node react if notified of its |
+ | | | own fencing? A cluster node may receive |
+ | | | notification of its own fencing if fencing is |
+ | | | misconfigured, or if fabric fencing is in use that |
+ | | | doesn't cut cluster communication. Allowed values |
+ | | | are ``stop`` to attempt to immediately stop |
+ | | | pacemaker and stay stopped, or ``panic`` to |
+ | | | attempt to immediately reboot the local node, |
+ | | | falling back to stop on failure. The default is |
+ | | | likely to be changed to ``panic`` in a future |
+ | | | release. *(since 2.0.3)* |
+ +---------------------------+---------+----------------------------------------------------+
+ | priority-fencing-delay | 0 | .. index:: |
+ | | | pair: cluster option; priority-fencing-delay |
+ | | | |
+ | | | Apply this delay to any fencing targeting the lost |
+ | | | nodes with the highest total resource priority in |
+ | | | case we don't have the majority of the nodes in |
+ | | | our cluster partition, so that the more |
+ | | | significant nodes potentially win any fencing |
+ | | | match (especially meaningful in a split-brain of a |
+ | | | 2-node cluster). A promoted resource instance |
+ | | | takes the resource's priority plus 1 if the |
+ | | | resource's priority is not 0. Any static or random |
+ | | | delays introduced by ``pcmk_delay_base`` and |
+ | | | ``pcmk_delay_max`` configured for the |
+ | | | corresponding fencing resources will be added to |
+ | | | this delay. This delay should be significantly |
+ | | | greater than (safely twice) the maximum delay from |
+ | | | those parameters. *(since 2.0.4)* |
+ +---------------------------+---------+----------------------------------------------------+
+ | cluster-delay | 60s | .. index:: |
+ | | | pair: cluster option; cluster-delay |
+ | | | |
+ | | | Estimated maximum round-trip delay over the |
+ | | | network (excluding action execution). If the DC |
+ | | | requires an action to be executed on another node, |
+ | | | it will consider the action failed if it does not |
+ | | | get a response from the other node in this time |
+ | | | (after considering the action's own timeout). The |
+ | | | "correct" value will depend on the speed and load |
+ | | | of your network and cluster nodes. |
+ +---------------------------+---------+----------------------------------------------------+
+ | dc-deadtime | 20s | .. index:: |
+ | | | pair: cluster option; dc-deadtime |
+ | | | |
+ | | | How long to wait for a response from other nodes |
+ | | | during startup. The "correct" value will depend on |
+ | | | the speed/load of your network and the type of |
+ | | | switches used. |
+ +---------------------------+---------+----------------------------------------------------+
+ | cluster-ipc-limit | 500 | .. index:: |
+ | | | pair: cluster option; cluster-ipc-limit |
+ | | | |
+ | | | The maximum IPC message backlog before one cluster |
+ | | | daemon will disconnect another. This is of use in |
+ | | | large clusters, for which a good value is the |
+ | | | number of resources in the cluster multiplied by |
+ | | | the number of nodes. The default of 500 is also |
+ | | | the minimum. Raise this if you see |
+ | | | "Evicting client" messages for cluster daemon PIDs |
+ | | | in the logs. |
+ +---------------------------+---------+----------------------------------------------------+
+ | pe-error-series-max | -1 | .. index:: |
+ | | | pair: cluster option; pe-error-series-max |
+ | | | |
+ | | | The number of scheduler inputs resulting in errors |
+ | | | to save. Used when reporting problems. A value of |
+ | | | -1 means unlimited (report all), and 0 means none. |
+ +---------------------------+---------+----------------------------------------------------+
+ | pe-warn-series-max | 5000 | .. index:: |
+ | | | pair: cluster option; pe-warn-series-max |
+ | | | |
+ | | | The number of scheduler inputs resulting in |
+ | | | warnings to save. Used when reporting problems. A |
+ | | | value of -1 means unlimited (report all), and 0 |
+ | | | means none. |
+ +---------------------------+---------+----------------------------------------------------+
+ | pe-input-series-max | 4000 | .. index:: |
+ | | | pair: cluster option; pe-input-series-max |
+ | | | |
+ | | | The number of "normal" scheduler inputs to save. |
+ | | | Used when reporting problems. A value of -1 means |
+ | | | unlimited (report all), and 0 means none. |
+ +---------------------------+---------+----------------------------------------------------+
+ | enable-acl | false | .. index:: |
+ | | | pair: cluster option; enable-acl |
+ | | | |
+ | | | Whether :ref:`acl` should be used to authorize |
+ | | | modifications to the CIB |
+ +---------------------------+---------+----------------------------------------------------+
+ | placement-strategy | default | .. index:: |
+ | | | pair: cluster option; placement-strategy |
+ | | | |
+ | | | How the cluster should allocate resources to nodes |
+ | | | (see :ref:`utilization`). Allowed values are |
+ | | | ``default``, ``utilization``, ``balanced``, and |
+ | | | ``minimal``. |
+ +---------------------------+---------+----------------------------------------------------+
+ | node-health-strategy | none | .. index:: |
+ | | | pair: cluster option; node-health-strategy |
+ | | | |
+ | | | How the cluster should react to node health |
+ | | | attributes (see :ref:`node-health`). Allowed values|
+ | | | are ``none``, ``migrate-on-red``, ``only-green``, |
+ | | | ``progressive``, and ``custom``. |
+ +---------------------------+---------+----------------------------------------------------+
+ | node-health-base | 0 | .. index:: |
+ | | | pair: cluster option; node-health-base |
+ | | | |
+ | | | The base health score assigned to a node. Only |
+ | | | used when ``node-health-strategy`` is |
+ | | | ``progressive``. |
+ +---------------------------+---------+----------------------------------------------------+
+ | node-health-green | 0 | .. index:: |
+ | | | pair: cluster option; node-health-green |
+ | | | |
+ | | | The score to use for a node health attribute whose |
+ | | | value is ``green``. Only used when |
+ | | | ``node-health-strategy`` is ``progressive`` or |
+ | | | ``custom``. |
+ +---------------------------+---------+----------------------------------------------------+
+ | node-health-yellow | 0 | .. index:: |
+ | | | pair: cluster option; node-health-yellow |
+ | | | |
+ | | | The score to use for a node health attribute whose |
+ | | | value is ``yellow``. Only used when |
+ | | | ``node-health-strategy`` is ``progressive`` or |
+ | | | ``custom``. |
+ +---------------------------+---------+----------------------------------------------------+
+ | node-health-red | 0 | .. index:: |
+ | | | pair: cluster option; node-health-red |
+ | | | |
+ | | | The score to use for a node health attribute whose |
+ | | | value is ``red``. Only used when |
+ | | | ``node-health-strategy`` is ``progressive`` or |
+ | | | ``custom``. |
+ +---------------------------+---------+----------------------------------------------------+
+ | cluster-recheck-interval | 15min | .. index:: |
+ | | | pair: cluster option; cluster-recheck-interval |
+ | | | |
+ | | | Pacemaker is primarily event-driven, and looks |
+ | | | ahead to know when to recheck the cluster for |
+ | | | failure timeouts and most time-based rules |
+ | | | *(since 2.0.3)*. However, it will also recheck the |
+ | | | cluster after this amount of inactivity. This has |
+ | | | two goals: rules with ``date_spec`` are only |
+ | | | guaranteed to be checked this often, and it also |
+ | | | serves as a fail-safe for some kinds of scheduler |
+ | | | bugs. A value of 0 disables this polling; positive |
+ | | | values are a time interval. |
+ +---------------------------+---------+----------------------------------------------------+
+ | shutdown-lock | false | .. index:: |
+ | | | pair: cluster option; shutdown-lock |
+ | | | |
+ | | | The default of false allows active resources to be |
+ | | | recovered elsewhere when their node is cleanly |
+ | | | shut down, which is what the vast majority of |
+ | | | users will want. However, some users prefer to |
+ | | | make resources highly available only for failures, |
+ | | | with no recovery for clean shutdowns. If this |
+ | | | option is true, resources active on a node when it |
+ | | | is cleanly shut down are kept "locked" to that |
+ | | | node (not allowed to run elsewhere) until they |
+ | | | start again on that node after it rejoins (or for |
+ | | | at most ``shutdown-lock-limit``, if set). Stonith |
+ | | | resources and Pacemaker Remote connections are |
+ | | | never locked. Clone and bundle instances and the |
+ | | | promoted role of promotable clones are currently |
+ | | | never locked, though support could be added in a |
+ | | | future release. Locks may be manually cleared |
+ | | | using the ``--refresh`` option of ``crm_resource`` |
+ | | | (both the resource and node must be specified; |
+ | | | this works with remote nodes if their connection |
+ | | | resource's ``target-role`` is set to ``Stopped``, |
+ | | | but not if Pacemaker Remote is stopped on the |
+ | | | remote node without disabling the connection |
+ | | | resource). *(since 2.0.4)* |
+ +---------------------------+---------+----------------------------------------------------+
+ | shutdown-lock-limit | 0 | .. index:: |
+ | | | pair: cluster option; shutdown-lock-limit |
+ | | | |
+ | | | If ``shutdown-lock`` is true, and this is set to a |
+ | | | nonzero time duration, locked resources will be |
+ | | | allowed to start after this much time has passed |
+ | | | since the node shutdown was initiated, even if the |
+ | | | node has not rejoined. (This works with remote |
+ | | | nodes only if their connection resource's |
+ | | | ``target-role`` is set to ``Stopped``.) |
+ | | | *(since 2.0.4)* |
+ +---------------------------+---------+----------------------------------------------------+
+ | remove-after-stop | false | .. index:: |
+ | | | pair: cluster option; remove-after-stop |
+ | | | |
+ | | | *Deprecated* Should the cluster remove |
+ | | | resources from Pacemaker's executor after they are |
+ | | | stopped? Values other than the default are, at |
+ | | | best, poorly tested and potentially dangerous. |
+ | | | This option is deprecated and will be removed in a |
+ | | | future release. |
+ +---------------------------+---------+----------------------------------------------------+
+ | startup-fencing | true | .. index:: |
+ | | | pair: cluster option; startup-fencing |
+ | | | |
+ | | | *Advanced Use Only:* Should the cluster fence |
+ | | | unseen nodes at start-up? Setting this to false is |
+ | | | unsafe, because the unseen nodes could be active |
+ | | | and running resources but unreachable. |
+ +---------------------------+---------+----------------------------------------------------+
+ | election-timeout | 2min | .. index:: |
+ | | | pair: cluster option; election-timeout |
+ | | | |
+ | | | *Advanced Use Only:* If you need to adjust this |
+ | | | value, it probably indicates the presence of a bug.|
+ +---------------------------+---------+----------------------------------------------------+
+ | shutdown-escalation | 20min | .. index:: |
+ | | | pair: cluster option; shutdown-escalation |
+ | | | |
+ | | | *Advanced Use Only:* If you need to adjust this |
+ | | | value, it probably indicates the presence of a bug.|
+ +---------------------------+---------+----------------------------------------------------+
+ | join-integration-timeout | 3min | .. index:: |
+ | | | pair: cluster option; join-integration-timeout |
+ | | | |
+ | | | *Advanced Use Only:* If you need to adjust this |
+ | | | value, it probably indicates the presence of a bug.|
+ +---------------------------+---------+----------------------------------------------------+
+ | join-finalization-timeout | 30min | .. index:: |
+ | | | pair: cluster option; join-finalization-timeout |
+ | | | |
+ | | | *Advanced Use Only:* If you need to adjust this |
+ | | | value, it probably indicates the presence of a bug.|
+ +---------------------------+---------+----------------------------------------------------+
+ | transition-delay | 0s | .. index:: |
+ | | | pair: cluster option; transition-delay |
+ | | | |
+ | | | *Advanced Use Only:* Delay cluster recovery for |
+ | | | the configured interval to allow for additional or |
+ | | | related events to occur. This can be useful if |
+ | | | your configuration is sensitive to the order in |
+ | | | which ping updates arrive. Enabling this option |
+ | | | will slow down cluster recovery under all |
+ | | | conditions. |
+ +---------------------------+---------+----------------------------------------------------+
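+
+As a quick illustration of adjusting one of these options, a value can be set
+from the command line with ``crm_attribute``; a minimal sketch, with the
+option name and value chosen arbitrarily for the example:
+
+.. code-block:: none
+
+   # crm_attribute --name cluster-recheck-interval --update 5min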
diff --git a/doc/sphinx/Pacemaker_Explained/resources.rst b/doc/sphinx/Pacemaker_Explained/resources.rst
new file mode 100644
index 0000000..3b7520f
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/resources.rst
@@ -0,0 +1,1074 @@
+.. _resource:
+
+Cluster Resources
+-----------------
+
+.. _s-resource-primitive:
+
+What is a Cluster Resource?
+###########################
+
+.. index::
+ single: resource
+
+A *resource* is a service managed by Pacemaker. The simplest type of resource,
+a *primitive*, is described in this chapter. More complex forms, such as groups
+and clones, are described in later chapters.
+
+Every primitive has a *resource agent* that provides Pacemaker a standardized
+interface for managing the service. This allows Pacemaker to be agnostic about
+the services it manages. Pacemaker doesn't need to understand how the service
+works because it relies on the resource agent to do the right thing when asked.
+
+Every resource has a *class* specifying the standard that its resource agent
+follows, and a *type* identifying the specific service being managed.
+
+
+.. _s-resource-supported:
+
+.. index::
+ single: resource; class
+
+Resource Classes
+################
+
+Pacemaker supports several classes, or standards, of resource agents:
+
+* OCF
+* LSB
+* Systemd
+* Service
+* Fencing
+* Nagios *(deprecated since 2.1.6)*
+* Upstart *(deprecated since 2.1.0)*
+
+
+.. index::
+ single: resource; OCF
+ single: OCF; resources
+ single: Open Cluster Framework; resources
+
+Open Cluster Framework
+______________________
+
+The Open Cluster Framework (OCF) Resource Agent API is a ClusterLabs
+standard for managing services. It is the preferred class, since it was
+designed specifically for use in a Pacemaker cluster.
+
+OCF agents are scripts that support a variety of actions including ``start``,
+``stop``, and ``monitor``. They may accept parameters, making them more
+flexible than other classes. The number and purpose of parameters is left to
+the agent, which advertises them via the ``meta-data`` action.
+
+Unlike other classes, OCF agents have a *provider* as well as a class and type.
+
+For more information, see the "Resource Agents" chapter of *Pacemaker
+Administration* and the `OCF standard
+<https://github.com/ClusterLabs/OCF-spec/tree/main/ra>`_.
+
+
+.. _s-resource-supported-systemd:
+
+.. index::
+ single: Resource; Systemd
+ single: Systemd; resources
+
+Systemd
+_______
+
+Most Linux distributions use `Systemd
+<http://www.freedesktop.org/wiki/Software/systemd>`_ for system initialization
+and service management. *Unit files* specify how to manage services and are
+usually provided by the distribution.
+
+Pacemaker can manage systemd services. Simply create a resource with
+``systemd`` as the resource class and the unit file name as the resource type.
+Do *not* run ``systemctl enable`` on the unit.
+
+.. important::
+
+ Make sure that any systemd services to be controlled by the cluster are
+ *not* enabled to start at boot.
+
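+For illustration, a minimal sketch of such a resource definition, where
+``httpd`` stands in for whatever unit file name applies on your nodes:
+
+.. code-block:: xml
+
+   <primitive id="web-server" class="systemd" type="httpd"/>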
+
+.. index::
+ single: resource; LSB
+ single: LSB; resources
+ single: Linux Standard Base; resources
+
+Linux Standard Base
+___________________
+
+*LSB* resource agents, also known as `SysV-style init scripts
+<https://en.wikipedia.org/wiki/Init#SysV-style>`_, are scripts that provide
+start, stop, and status actions for a service.
+
+They are provided by some operating system distributions. If a full path is not
+given, they are assumed to be located in a directory specified when your
+Pacemaker software was built (usually ``/etc/init.d``).
+
+In order to be used with Pacemaker, they must conform to the `LSB specification
+<http://refspecs.linux-foundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html>`_
+as it relates to init scripts.
+
+.. warning::
+
+ Some LSB scripts do not fully comply with the standard. For details on how
+ to check whether your script is LSB-compatible, see the "Resource Agents"
+   chapter of *Pacemaker Administration*. Common problems include:
+
+ * Not implementing the ``status`` action
+ * Not observing the correct exit status codes
+ * Starting a started resource returns an error
+ * Stopping a stopped resource returns an error
+
+.. important::
+
+ Make sure the host is *not* configured to start any LSB services at boot
+ that will be controlled by the cluster.
+
+
+.. index::
+ single: Resource; System Services
+ single: System Service; resources
+
+System Services
+_______________
+
+Since there are various types of system services (``systemd``,
+``upstart``, and ``lsb``), Pacemaker supports a special ``service`` alias which
+intelligently figures out which one applies to a given cluster node.
+
+This is particularly useful when the cluster contains a mix of
+``systemd``, ``upstart``, and ``lsb``.
+
+In order, Pacemaker will try to find the named service as:
+
+* an LSB init script
+* a Systemd unit file
+* an Upstart job
+
+
+.. index::
+ single: Resource; STONITH
+ single: STONITH; resources
+
+STONITH
+_______
+
+The ``stonith`` class is used for managing fencing devices, discussed later in
+:ref:`fencing`.
+
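+As a quick sketch, a fencing device is declared like any other primitive, with
+``stonith`` as the class; ``fence_ipmilan`` here is just an example agent
+name, and a real configuration would also need device parameters, which are
+omitted:
+
+.. code-block:: xml
+
+   <primitive id="fence-node1" class="stonith" type="fence_ipmilan"/>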
+
+.. index::
+ single: Resource; Nagios Plugins
+ single: Nagios Plugins; resources
+
+Nagios Plugins
+______________
+
+Nagios Plugins are a way to monitor services. Pacemaker can use these as
+resources, to react to a change in the service's status.
+
+To use plugins as resources, Pacemaker must have been built with support, and
+OCF-style meta-data for the plugins must be installed on nodes that can run
+them. Meta-data for several common plugins is provided by the
+`nagios-agents-metadata <https://github.com/ClusterLabs/nagios-agents-metadata>`_
+project.
+
+The supported parameters for such a resource are the same as the plugin's
+long options.
+
+Start and monitor actions for plugin resources are implemented as invoking the
+plugin. A plugin result of "OK" (0) is treated as success, a result of "WARN"
+(1) is treated as a successful but degraded service, and any other result is
+considered a failure.
+
+Re-running a plugin does not change the status of the underlying service, so
+restarting a plugin resource after a failure accomplishes nothing by itself.
+Using plugin resources alone therefore does not make sense with ``on-fail``
+set to (or left at its default of) ``restart``. Another value could make
+sense, for example, if you want to fence or put into standby any node that
+cannot reach some external service.
+
+A more common use case for plugin resources is to configure them with a
+``container`` meta-attribute set to the name of another resource that actually
+makes the service available, such as a virtual machine or container.
+
+With ``container`` set, the plugin resource will automatically be colocated
+with the containing resource and ordered after it, and the containing resource
+will be considered failed if the plugin resource fails. This allows monitoring
+of a service inside a virtual machine or container, with recovery of the
+virtual machine or container if the service fails.
+
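+A hypothetical sketch of this arrangement, assuming a ``check_tcp`` plugin
+whose long options include ``hostname`` and ``port``, and an existing virtual
+machine resource named ``vm-web``:
+
+.. code-block:: xml
+
+   <primitive id="vm-web-check" class="nagios" type="check_tcp">
+     <instance_attributes id="vm-web-check-params">
+       <nvpair id="vm-web-check-hostname" name="hostname" value="192.0.2.10"/>
+       <nvpair id="vm-web-check-port" name="port" value="80"/>
+     </instance_attributes>
+     <meta_attributes id="vm-web-check-meta">
+       <nvpair id="vm-web-check-container" name="container" value="vm-web"/>
+     </meta_attributes>
+   </primitive>
+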
+.. warning::
+
+ Nagios support is deprecated in Pacemaker. Support will be dropped entirely
+ at the next major release of Pacemaker.
+
+ For monitoring a service inside a virtual machine or container, the
+ recommended alternative is to configure the virtual machine as a guest node
+ or the container as a :ref:`bundle <s-resource-bundle>`. For other use
+ cases, or when the virtual machine or container image cannot be modified,
+ the recommended alternative is to write a custom OCF agent for the service
+ (which may even call the Nagios plugin as part of its status action).
+
+
+.. index::
+ single: Resource; Upstart
+ single: Upstart; resources
+
+Upstart
+_______
+
+Some Linux distributions previously used `Upstart
+<https://upstart.ubuntu.com/>`_ for system initialization and service
+management. Pacemaker is able to manage services using Upstart if the local
+system supports them and support was enabled when your Pacemaker software was
+built.
+
+The *jobs* that specify how services are managed are usually provided by the
+operating system distribution.
+
+.. important::
+
+ Make sure the host is *not* configured to start any Upstart services at boot
+ that will be controlled by the cluster.
+
+.. warning::
+
+ Upstart support is deprecated in Pacemaker. Upstart is no longer actively
+ maintained, and test platforms for it are no longer readily usable. Support
+ will be dropped entirely at the next major release of Pacemaker.
+
+
+.. _primitive-resource:
+
+Resource Properties
+###################
+
+These values tell the cluster which resource agent to use for the resource,
+where to find that resource agent, and what standard it conforms to.
+
+.. table:: **Properties of a Primitive Resource**
+ :widths: 1 4
+
+ +-------------+------------------------------------------------------------------+
+ | Field | Description |
+ +=============+==================================================================+
+ | id | .. index:: |
+ | | single: id; resource |
+ | | single: resource; property, id |
+ | | |
+ | | Your name for the resource |
+ +-------------+------------------------------------------------------------------+
+ | class | .. index:: |
+ | | single: class; resource |
+ | | single: resource; property, class |
+ | | |
+ | | The standard the resource agent conforms to. Allowed values: |
+ | | ``lsb``, ``ocf``, ``service``, ``stonith``, ``systemd``, |
+ | | ``nagios`` *(deprecated since 2.1.6)*, and ``upstart`` |
+ | | *(deprecated since 2.1.0)* |
+ +-------------+------------------------------------------------------------------+
+ | description | .. index:: |
+ | | single: description; resource |
+ | | single: resource; property, description |
+ | | |
+ | | A description of the Resource Agent, intended for local use. |
+ | | E.g. ``IP address for website`` |
+ +-------------+------------------------------------------------------------------+
+ | type | .. index:: |
+ | | single: type; resource |
+ | | single: resource; property, type |
+ | | |
+ | | The name of the Resource Agent you wish to use. E.g. |
+ | | ``IPaddr`` or ``Filesystem`` |
+ +-------------+------------------------------------------------------------------+
+ | provider | .. index:: |
+ | | single: provider; resource |
+ | | single: resource; property, provider |
+ | | |
+ | | The OCF spec allows multiple vendors to supply the same resource |
+ | | agent. To use the OCF resource agents supplied by the Heartbeat |
+ | | project, you would specify ``heartbeat`` here. |
+ +-------------+------------------------------------------------------------------+
+
+The XML definition of a resource can be queried with the **crm_resource** tool.
+For example:
+
+.. code-block:: none
+
+ # crm_resource --resource Email --query-xml
+
+might produce:
+
+.. topic:: A system resource definition
+
+ .. code-block:: xml
+
+ <primitive id="Email" class="service" type="exim"/>
+
+.. note::
+
+   One of the main drawbacks of system service (LSB, systemd or
+   Upstart) resources is that they do not allow any parameters!
+
+.. topic:: An OCF resource definition
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <instance_attributes id="Public-IP-params">
+ <nvpair id="Public-IP-ip" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+.. _resource_options:
+
+Resource Options
+################
+
+Resources have two types of options: *meta-attributes* and *instance attributes*.
+Meta-attributes apply to any type of resource, while instance attributes
+are specific to each resource agent.
+
+Resource Meta-Attributes
+________________________
+
+Meta-attributes are used by the cluster to decide how a resource should
+behave and can be easily set using the ``--meta`` option of the
+**crm_resource** command.
+
+.. table:: **Meta-attributes of a Primitive Resource**
+ :class: longtable
+ :widths: 2 2 3
+
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | Field | Default | Description |
+ +============================+==================================+======================================================+
+ | priority | 0 | .. index:: |
+ | | | single: priority; resource option |
+ | | | single: resource; option, priority |
+ | | | |
+ | | | If not all resources can be active, the cluster |
+ | | | will stop lower priority resources in order to |
+ | | | keep higher priority ones active. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | critical | true | .. index:: |
+ | | | single: critical; resource option |
+ | | | single: resource; option, critical |
+ | | | |
+ | | | Use this value as the default for ``influence`` in |
+ | | | all :ref:`colocation constraints |
+ | | | <s-resource-colocation>` involving this resource, |
+ | | | as well as the implicit colocation constraints |
+ | | | created if this resource is in a :ref:`group |
+ | | | <group-resources>`. For details, see |
+ | | | :ref:`s-coloc-influence`. *(since 2.1.0)* |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | target-role | Started | .. index:: |
+ | | | single: target-role; resource option |
+ | | | single: resource; option, target-role |
+ | | | |
+ | | | What state should the cluster attempt to keep this |
+ | | | resource in? Allowed values: |
+ | | | |
+ | | | * ``Stopped:`` Force the resource to be stopped |
+ | | | * ``Started:`` Allow the resource to be started |
+ | | | (and in the case of :ref:`promotable clone |
+ | | | resources <s-resource-promotable>`, promoted |
+ | | | if appropriate) |
+ | | | * ``Unpromoted:`` Allow the resource to be started, |
+ | | | but only in the unpromoted role if the resource is |
+ | | | :ref:`promotable <s-resource-promotable>` |
+ | | | * ``Promoted:`` Equivalent to ``Started`` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | is-managed | TRUE | .. index:: |
+ | | | single: is-managed; resource option |
+ | | | single: resource; option, is-managed |
+ | | | |
+ | | | Is the cluster allowed to start and stop |
+ | | | the resource? Allowed values: ``true``, ``false`` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | maintenance | FALSE | .. index:: |
+ | | | single: maintenance; resource option |
+ | | | single: resource; option, maintenance |
+ | | | |
+ | | | Similar to the ``maintenance-mode`` |
+ | | | :ref:`cluster option <cluster_options>`, but for |
+ | | | a single resource. If true, the resource will not |
+ | | | be started, stopped, or monitored on any node. This |
+ | | | differs from ``is-managed`` in that monitors will |
+ | | | not be run. Allowed values: ``true``, ``false`` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | resource-stickiness | 1 for individual clone | .. _resource-stickiness: |
+ | | instances, 0 for all | |
+ | | other resources | .. index:: |
+ | | | single: resource-stickiness; resource option |
+ | | | single: resource; option, resource-stickiness |
+ | | | |
+ | | | A score that will be added to the current node when |
+ | | | a resource is already active. This allows running |
+ | | | resources to stay where they are, even if they |
+ | | | would be placed elsewhere if they were being |
+ | | | started from a stopped state. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | requires | ``quorum`` for resources | .. _requires: |
+ | | with a ``class`` of ``stonith``, | |
+ | | otherwise ``unfencing`` if | .. index:: |
+ | | unfencing is active in the | single: requires; resource option |
+ | | cluster, otherwise ``fencing`` | single: resource; option, requires |
+ | | if ``stonith-enabled`` is true, | |
+ | | otherwise ``quorum`` | Conditions under which the resource can be |
+ | | | started. Allowed values: |
+ | | | |
+ | | | * ``nothing:`` can always be started |
+ | | | * ``quorum:`` The cluster can only start this |
+ | | | resource if a majority of the configured nodes |
+ | | | are active |
+ | | | * ``fencing:`` The cluster can only start this |
+ | | | resource if a majority of the configured nodes |
+ | | | are active *and* any failed or unknown nodes |
+ | | | have been :ref:`fenced <fencing>` |
+ | | | * ``unfencing:`` The cluster can only start this |
+ | | | resource if a majority of the configured nodes |
+ | | | are active *and* any failed or unknown nodes have |
+ | | | been fenced *and* only on nodes that have been |
+ | | | :ref:`unfenced <unfencing>` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | migration-threshold | INFINITY | .. index:: |
+ | | | single: migration-threshold; resource option |
+ | | | single: resource; option, migration-threshold |
+ | | | |
+ | | | How many failures may occur for this resource on |
+ | | | a node, before this node is marked ineligible to |
+ | | | host this resource. A value of 0 indicates that this |
+ | | | feature is disabled (the node will never be marked |
+   |                            |                                  | ineligible); by contrast, the cluster treats         |
+ | | | INFINITY (the default) as a very large but finite |
+ | | | number. This option has an effect only if the |
+ | | | failed operation specifies ``on-fail`` as |
+ | | | ``restart`` (the default), and additionally for |
+ | | | failed ``start`` operations, if the cluster |
+ | | | property ``start-failure-is-fatal`` is ``false``. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | failure-timeout | 0 | .. index:: |
+ | | | single: failure-timeout; resource option |
+ | | | single: resource; option, failure-timeout |
+ | | | |
+ | | | How many seconds to wait before acting as if the |
+ | | | failure had not occurred, and potentially allowing |
+ | | | the resource back to the node on which it failed. |
+ | | | A value of 0 indicates that this feature is |
+ | | | disabled. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | multiple-active | stop_start | .. index:: |
+ | | | single: multiple-active; resource option |
+ | | | single: resource; option, multiple-active |
+ | | | |
+ | | | What should the cluster do if it ever finds the |
+ | | | resource active on more than one node? Allowed |
+ | | | values: |
+ | | | |
+ | | | * ``block``: mark the resource as unmanaged |
+ | | | * ``stop_only``: stop all active instances and |
+ | | | leave them that way |
+ | | | * ``stop_start``: stop all active instances and |
+ | | | start the resource in one location only |
+ | | | * ``stop_unexpected``: stop all active instances |
+ | | | except where the resource should be active (this |
+ | | | should be used only when extra instances are not |
+ | | | expected to disrupt existing instances, and the |
+ | | | resource agent's monitor of an existing instance |
+ | | | is capable of detecting any problems that could be |
+ | | | caused; note that any resources ordered after this |
+ | | | will still need to be restarted) *(since 2.1.3)* |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | allow-migrate | TRUE for ocf:pacemaker:remote | Whether the cluster should try to "live migrate" |
+ | | resources, FALSE otherwise | this resource when it needs to be moved (see |
+ | | | :ref:`live-migration`) |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | allow-unhealthy-nodes | FALSE | Whether the resource should be able to run on a node |
+ | | | even if the node's health score would otherwise |
+ | | | prevent it (see :ref:`node-health`) *(since 2.1.3)* |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | container-attribute-target | | Specific to bundle resources; see |
+ | | | :ref:`s-bundle-attributes` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | remote-node | | The name of the Pacemaker Remote guest node this |
+ | | | resource is associated with, if any. If |
+ | | | specified, this both enables the resource as a |
+ | | | guest node and defines the unique name used to |
+ | | | identify the guest node. The guest must be |
+ | | | configured to run the Pacemaker Remote daemon |
+ | | | when it is started. **WARNING:** This value |
+ | | | cannot overlap with any resource or node IDs. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | remote-port | 3121 | If ``remote-node`` is specified, the port on the |
+ | | | guest used for its Pacemaker Remote connection. |
+ | | | The Pacemaker Remote daemon on the guest must |
+ | | | be configured to listen on this port. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | remote-addr | value of ``remote-node`` | If ``remote-node`` is specified, the IP |
+ | | | address or hostname used to connect to the |
+ | | | guest via Pacemaker Remote. The Pacemaker Remote |
+ | | | daemon on the guest must be configured to accept |
+ | | | connections on this address. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | remote-connect-timeout | 60s | If ``remote-node`` is specified, how long before |
+ | | | a pending guest connection will time out. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+
+As an example of setting resource options, if you performed the following
+commands on an LSB Email resource:
+
+.. code-block:: none
+
+ # crm_resource --meta --resource Email --set-parameter priority --parameter-value 100
+ # crm_resource -m -r Email -p multiple-active -v block
+
+the resulting resource definition might be:
+
+.. topic:: An LSB resource with cluster options
+
+ .. code-block:: xml
+
+ <primitive id="Email" class="lsb" type="exim">
+ <meta_attributes id="Email-meta_attributes">
+ <nvpair id="Email-meta_attributes-priority" name="priority" value="100"/>
+ <nvpair id="Email-meta_attributes-multiple-active" name="multiple-active" value="block"/>
+ </meta_attributes>
+ </primitive>
+
+In addition to the cluster-defined meta-attributes described above, you may
+also configure arbitrary meta-attributes of your own choosing. Most commonly,
+this would be done for use in :ref:`rules <rules>`. For example, an IT department
+might define a custom meta-attribute to indicate which company department each
+resource is intended for. To reduce the chance of name collisions with
+cluster-defined meta-attributes added in the future, it is recommended to use
+a unique, organization-specific prefix for such attributes.
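+
+For instance, a hypothetical sketch of such a custom meta-attribute, with both
+the name and the value invented for illustration:
+
+.. code-block:: xml
+
+   <nvpair id="Email-meta-dept" name="acme-department" value="accounting"/>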
+
+.. _s-resource-defaults:
+
+Setting Global Defaults for Resource Meta-Attributes
+____________________________________________________
+
+To set a default value for a resource option, add it to the
+``rsc_defaults`` section with ``crm_attribute``. For example,
+
+.. code-block:: none
+
+ # crm_attribute --type rsc_defaults --name is-managed --update false
+
+would prevent the cluster from starting or stopping any of the
+resources in the configuration (unless of course the individual
+resources were specifically enabled by having their ``is-managed`` set to
+``true``).
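+
+In the CIB, the result is an ordinary ``meta_attributes`` block inside the
+``rsc_defaults`` section, roughly like the following sketch (the ``id``
+values are tool-generated and may differ):
+
+.. code-block:: xml
+
+   <rsc_defaults>
+     <meta_attributes id="rsc_defaults-options">
+       <nvpair id="rsc_defaults-options-is-managed" name="is-managed" value="false"/>
+     </meta_attributes>
+   </rsc_defaults>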
+
+Resource Instance Attributes
+____________________________
+
+The resource agents of some resource classes (``lsb``, ``systemd``, and
+``upstart`` *not* among them) can be given parameters that determine how they
+behave and which instance of a service they control.
+
+If your resource agent supports parameters, you can add them with the
+``crm_resource`` command. For example,
+
+.. code-block:: none
+
+ # crm_resource --resource Public-IP --set-parameter ip --parameter-value 192.0.2.2
+
+would create an entry in the resource like this:
+
+.. topic:: An example OCF resource with instance attributes
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+For an OCF resource, the result would be an environment variable
+called ``OCF_RESKEY_ip`` with a value of ``192.0.2.2``.
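+
+For example, a shell-based agent would typically read this parameter along
+the following lines (a generic sketch, not taken from any particular agent):
+
+.. code-block:: none
+
+   # inside the agent: fall back to a default if the parameter is unset
+   ip="${OCF_RESKEY_ip:-127.0.0.1}"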
+
+The list of instance attributes supported by an OCF resource agent can be
+found by calling the resource agent with the ``meta-data`` command.
+The output contains an XML description of all the supported
+attributes, their purpose and default values.
+
+.. topic:: Displaying the metadata for the Dummy resource agent template
+
+ .. code-block:: none
+
+ # export OCF_ROOT=/usr/lib/ocf
+ # $OCF_ROOT/resource.d/pacemaker/Dummy meta-data
+
+ .. code-block:: xml
+
+ <?xml version="1.0"?>
+ <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
+ <resource-agent name="Dummy" version="2.0">
+ <version>1.1</version>
+
+ <longdesc lang="en">
+ This is a dummy OCF resource agent. It does absolutely nothing except keep track
+ of whether it is running or not, and can be configured so that actions fail or
+ take a long time. Its purpose is primarily for testing, and to serve as a
+ template for resource agent writers.
+ </longdesc>
+ <shortdesc lang="en">Example stateless resource agent</shortdesc>
+
+ <parameters>
+ <parameter name="state" unique-group="state">
+ <longdesc lang="en">
+ Location to store the resource state in.
+ </longdesc>
+ <shortdesc lang="en">State file</shortdesc>
+ <content type="string" default="/var/run/Dummy-RESOURCE_ID.state" />
+ </parameter>
+
+ <parameter name="passwd" reloadable="1">
+ <longdesc lang="en">
+ Fake password field
+ </longdesc>
+ <shortdesc lang="en">Password</shortdesc>
+ <content type="string" default="" />
+ </parameter>
+
+ <parameter name="fake" reloadable="1">
+ <longdesc lang="en">
+ Fake attribute that can be changed to cause a reload
+ </longdesc>
+ <shortdesc lang="en">Fake attribute that can be changed to cause a reload</shortdesc>
+ <content type="string" default="dummy" />
+ </parameter>
+
+ <parameter name="op_sleep" reloadable="1">
+ <longdesc lang="en">
+ Number of seconds to sleep during operations. This can be used to test how
+ the cluster reacts to operation timeouts.
+ </longdesc>
+ <shortdesc lang="en">Operation sleep duration in seconds.</shortdesc>
+ <content type="string" default="0" />
+ </parameter>
+
+ <parameter name="fail_start_on" reloadable="1">
+ <longdesc lang="en">
+ Start, migrate_from, and reload-agent actions will return failure if running on
+ the host specified here, but the resource will run successfully anyway (future
+ monitor calls will find it running). This can be used to test on-fail=ignore.
+ </longdesc>
+ <shortdesc lang="en">Report bogus start failure on specified host</shortdesc>
+ <content type="string" default="" />
+ </parameter>
+ <parameter name="envfile" reloadable="1">
+ <longdesc lang="en">
+ If this is set, the environment will be dumped to this file for every call.
+ </longdesc>
+ <shortdesc lang="en">Environment dump file</shortdesc>
+ <content type="string" default="" />
+ </parameter>
+
+ </parameters>
+
+ <actions>
+ <action name="start" timeout="20s" />
+ <action name="stop" timeout="20s" />
+ <action name="monitor" timeout="20s" interval="10s" depth="0"/>
+ <action name="reload" timeout="20s" />
+ <action name="reload-agent" timeout="20s" />
+ <action name="migrate_to" timeout="20s" />
+ <action name="migrate_from" timeout="20s" />
+ <action name="validate-all" timeout="20s" />
+ <action name="meta-data" timeout="5s" />
+ </actions>
+ </resource-agent>
+
+.. index::
+ single: resource; action
+ single: resource; operation
+
+.. _operation:
+
+Resource Operations
+###################
+
+*Operations* are actions the cluster can perform on a resource by calling the
+resource agent. Resource agents must support certain common operations such as
+start, stop, and monitor, and may implement any others.
+
+Operations may be explicitly configured for two purposes: to override defaults
+for options (such as timeout) that the cluster will use whenever it initiates
+the operation, and to run an operation on a recurring basis (for example, to
+monitor the resource for failure).
+
+.. topic:: An OCF resource with a non-default start timeout
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="Public-IP-start" name="start" timeout="60s"/>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+Pacemaker identifies operations by a combination of name and interval, so this
+combination must be unique for each resource. That is, you should not configure
+two operations for the same resource with the same name and interval.
+
+.. _operation_properties:
+
+Operation Properties
+____________________
+
+Operation properties may be specified directly in the ``op`` element as
+XML attributes, or in a separate ``meta_attributes`` block as ``nvpair`` elements.
+XML attributes take precedence over ``nvpair`` elements if both are specified.
+
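+For illustration, a sketch of the same operation ``timeout`` expressed both
+ways; the ``nvpair`` value would be ignored here, because the XML attribute
+takes precedence:
+
+.. code-block:: xml
+
+   <op id="Public-IP-monitor" name="monitor" interval="60s" timeout="30s">
+     <meta_attributes id="Public-IP-monitor-meta">
+       <nvpair id="Public-IP-monitor-timeout" name="timeout" value="10s"/>
+     </meta_attributes>
+   </op>
+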
+.. table:: **Properties of an Operation**
+ :class: longtable
+ :widths: 1 2 3
+
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | Field | Default | Description |
+ +================+===================================+=====================================================+
+ | id | | .. index:: |
+ | | | single: id; action property |
+ | | | single: action; property, id |
+ | | | |
+ | | | A unique name for the operation. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | name | | .. index:: |
+ | | | single: name; action property |
+ | | | single: action; property, name |
+ | | | |
+ | | | The action to perform. This can be any action |
+ | | | supported by the agent; common values include |
+ | | | ``monitor``, ``start``, and ``stop``. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | interval | 0 | .. index:: |
+ | | | single: interval; action property |
+ | | | single: action; property, interval |
+ | | | |
+ | | | How frequently (in seconds) to perform the |
+ | | | operation. A value of 0 means "when needed". |
+ | | | A positive value defines a *recurring action*, |
+ | | | which is typically used with |
+ | | | :ref:`monitor <s-resource-monitoring>`. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | timeout | | .. index:: |
+ | | | single: timeout; action property |
+ | | | single: action; property, timeout |
+ | | | |
+ | | | How long to wait before declaring the action |
+ | | | has failed |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | on-fail | Varies by action: | .. index:: |
+ | | | single: on-fail; action property |
+ | | * ``stop``: ``fence`` if | single: action; property, on-fail |
+ | | ``stonith-enabled`` is true | |
+ | | or ``block`` otherwise | The action to take if this action ever fails. |
+ | | * ``demote``: ``on-fail`` of the | Allowed values: |
+ | | ``monitor`` action with | |
+ | | ``role`` set to ``Promoted``, | * ``ignore:`` Pretend the resource did not fail. |
+ | | if present, enabled, and | * ``block:`` Don't perform any further operations |
+ | | configured to a value other | on the resource. |
+ | | than ``demote``, or ``restart`` | * ``stop:`` Stop the resource and do not start |
+ | | otherwise | it elsewhere. |
+ | | * all other actions: ``restart`` | * ``demote:`` Demote the resource, without a |
+ | | | full restart. This is valid only for ``promote`` |
+ | | | actions, and for ``monitor`` actions with both |
+ | | | a nonzero ``interval`` and ``role`` set to |
+ | | | ``Promoted``; for any other action, a |
+ | | | configuration error will be logged, and the |
+ | | | default behavior will be used. *(since 2.0.5)* |
+ | | | * ``restart:`` Stop the resource and start it |
+ | | | again (possibly on a different node). |
+ | | | * ``fence:`` STONITH the node on which the |
+ | | | resource failed. |
+ | | | * ``standby:`` Move *all* resources away from the |
+ | | | node on which the resource failed. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | enabled | TRUE | .. index:: |
+ | | | single: enabled; action property |
+ | | | single: action; property, enabled |
+ | | | |
+ | | | If ``false``, ignore this operation definition. |
+ | | | This is typically used to pause a particular |
+ | | | recurring ``monitor`` operation; for instance, it |
+ | | | can complement the respective resource being |
+ | | | unmanaged (``is-managed=false``), as this alone |
+ | | | will :ref:`not block any configured monitoring |
+ | | | <s-monitoring-unmanaged>`. Disabling the operation |
+ | | | does not suppress all actions of the given type. |
+ | | | Allowed values: ``true``, ``false``. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | record-pending | TRUE | .. index:: |
+ | | | single: record-pending; action property |
+ | | | single: action; property, record-pending |
+ | | | |
+ | | | If ``true``, the intention to perform the operation |
+ | | | is recorded so that GUIs and CLI tools can indicate |
+ | | | that an operation is in progress. This is best set |
+ | | | as an *operation default* |
+ | | | (see :ref:`s-operation-defaults`). Allowed values: |
+ | | | ``true``, ``false``. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | role | | .. index:: |
+ | | | single: role; action property |
+ | | | single: action; property, role |
+ | | | |
+ | | | Run the operation only on node(s) that the cluster |
+ | | | thinks should be in the specified role. This only |
+ | | | makes sense for recurring ``monitor`` operations. |
+ | | | Allowed (case-sensitive) values: ``Stopped``, |
+ | | | ``Started``, and in the case of :ref:`promotable |
+ | | | clone resources <s-resource-promotable>`, |
+ | | | ``Unpromoted`` and ``Promoted``. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+
+.. note::
+
+ When ``on-fail`` is set to ``demote``, recovery from failure by a successful
+ demote causes the cluster to recalculate whether and where a new instance
+ should be promoted. The node with the failure is eligible, so if promotion
+ scores have not changed, it will be promoted again.
+
+ There is no direct equivalent of ``migration-threshold`` for the promoted
+ role, but the same effect can be achieved with a location constraint using a
+ :ref:`rule <rules>` with a node attribute expression for the resource's fail
+ count.
+
+ For example, to immediately ban the promoted role from a node with any
+ failed promote or promoted instance monitor:
+
+ .. code-block:: xml
+
+ <rsc_location id="loc1" rsc="my_primitive">
+ <rule id="rule1" score="-INFINITY" role="Promoted" boolean-op="or">
+ <expression id="expr1" attribute="fail-count-my_primitive#promote_0"
+ operation="gte" value="1"/>
+ <expression id="expr2" attribute="fail-count-my_primitive#monitor_10000"
+ operation="gte" value="1"/>
+ </rule>
+ </rsc_location>
+
+ This example assumes that there is a promotable clone of the ``my_primitive``
+ resource (note that the primitive name, not the clone name, is used in the
+ rule), and that there is a recurring 10-second-interval monitor configured for
+ the promoted role (fail count attributes specify the interval in
+ milliseconds).
+
+.. _s-resource-monitoring:
+
+Monitoring Resources for Failure
+________________________________
+
+When Pacemaker first starts a resource, it runs one-time ``monitor`` operations
+(referred to as *probes*) to ensure the resource is running where it's
+supposed to be, and not running where it's not supposed to be. (This behavior
+can be affected by the ``resource-discovery`` location constraint property.)
+
+Other than those initial probes, Pacemaker will *not* (by default) check that
+the resource continues to stay healthy [#]_. You must configure ``monitor``
+operations explicitly to perform these checks.
+
+.. topic:: An OCF resource with a recurring health check
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="Public-IP-start" name="start" timeout="60s"/>
+ <op id="Public-IP-monitor" name="monitor" interval="60s"/>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+By default, a ``monitor`` operation will ensure that the resource is running
+where it is supposed to. The ``role`` operation property can be used for
+further checking.
+
+For example, if a resource has one ``monitor`` operation with
+``interval=10 role=Started`` and a second ``monitor`` operation with
+``interval=11 role=Stopped``, the cluster will run the first monitor on any nodes
+it thinks *should* be running the resource, and the second monitor on any nodes
+that it thinks *should not* be running the resource (for the truly paranoid,
+who want to know when an administrator manually starts a service by mistake).
+
+.. note::
+
+ Currently, monitors with ``role=Stopped`` are not implemented for
+ :ref:`clone <s-resource-clone>` resources.
+
+.. _s-monitoring-unmanaged:
+
+Monitoring Resources When Administration is Disabled
+____________________________________________________
+
+Recurring ``monitor`` operations behave differently under various administrative
+settings:
+
+* When a resource is unmanaged (by setting ``is-managed=false``): No monitors
+ will be stopped.
+
+ If the unmanaged resource is stopped on a node where the cluster thinks it
+ should be running, the cluster will detect and report that it is not, but it
+ will not consider the monitor failed, and will not try to start the resource
+ until it is managed again.
+
+  Starting the unmanaged resource on a different node is strongly discouraged.
+  It will at least cause the cluster to consider the resource failed, and it
+  may require the resource's ``target-role`` to be set to ``Stopped`` and then
+  ``Started`` in order to recover it.
+
+* When a resource is put into maintenance mode (by setting
+ ``maintenance=true``): The resource will be marked as unmanaged. (This
+ overrides ``is-managed=true``.)
+
+ Additionally, all monitor operations will be stopped, except those specifying
+ ``role`` as ``Stopped`` (which will be newly initiated if appropriate). As
+ with unmanaged resources in general, starting a resource on a node other than
+ where the cluster expects it to be will cause problems.
+
+* When a node is put into standby: All resources will be moved away from the
+ node, and all ``monitor`` operations will be stopped on the node, except those
+ specifying ``role`` as ``Stopped`` (which will be newly initiated if
+ appropriate).
+
+* When a node is put into maintenance mode: All resources that are active on the
+ node will be marked as in maintenance mode. See above for more details.
+
+* When the cluster is put into maintenance mode: All resources in the cluster
+ will be marked as in maintenance mode. See above for more details.
+
+A resource is in maintenance mode if the cluster, the node where the resource
+is active, or the resource itself is configured to be in maintenance mode. If a
+resource is in maintenance mode, then it is also unmanaged. However, if a
+resource is unmanaged, it is not necessarily in maintenance mode.
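+
+As a command-line example of the settings discussed above, a single resource
+could be placed into maintenance mode using the same ``crm_resource --meta``
+pattern shown earlier in this chapter:
+
+.. code-block:: none
+
+   # crm_resource --meta --resource Email --set-parameter maintenance --parameter-value true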
+
+.. _s-operation-defaults:
+
+Setting Global Defaults for Operations
+______________________________________
+
+You can change the global default values for operation properties
+in a given cluster. These are defined in an ``op_defaults`` section
+of the CIB's ``configuration`` section, and can be set with
+``crm_attribute``. For example,
+
+.. code-block:: none
+
+ # crm_attribute --type op_defaults --name timeout --update 20s
+
+would default each operation's ``timeout`` to 20 seconds. If an
+operation's definition also includes a value for ``timeout``, then that
+value would be used for that operation instead.
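+
+In the CIB, this default lands in the ``op_defaults`` section, roughly like
+the following sketch (the ``id`` values are tool-generated and may differ):
+
+.. code-block:: xml
+
+   <op_defaults>
+     <meta_attributes id="op_defaults-options">
+       <nvpair id="op_defaults-options-timeout" name="timeout" value="20s"/>
+     </meta_attributes>
+   </op_defaults>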
+
+When Implicit Operations Take a Long Time
+_________________________________________
+
+The cluster will always perform a number of implicit operations: ``start``,
+``stop``, and a non-recurring ``monitor`` operation used at startup to check
+whether the resource is already active. If one of these is taking too long,
+you can create an explicit entry for it and specify a longer timeout.
+
+.. topic:: An OCF resource with custom timeouts for its implicit actions
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="public-ip-startup" name="monitor" interval="0" timeout="90s"/>
+ <op id="public-ip-start" name="start" interval="0" timeout="180s"/>
+ <op id="public-ip-stop" name="stop" interval="0" timeout="15min"/>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+Multiple Monitor Operations
+___________________________
+
+Provided no two operations (for a single resource) have the same name
+and interval, you can have as many ``monitor`` operations as you like.
+In this way, you can do a superficial health check every minute and
+progressively more intense ones at longer intervals.
+
+To tell the resource agent what kind of check to perform, you need to
+provide each monitor with a different value for a common parameter.
+The OCF standard creates a special parameter called ``OCF_CHECK_LEVEL``
+for this purpose and dictates that it is "made available to the
+resource agent without the normal ``OCF_RESKEY`` prefix".
+
+Whatever name you choose, you can specify it by adding an
+``instance_attributes`` block to the ``op`` tag. It is up to each
+resource agent to look for the parameter and decide how to use it.
+
+.. topic:: An OCF resource with two recurring health checks, performing
+ different levels of checks specified via ``OCF_CHECK_LEVEL``.
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="public-ip-health-60" name="monitor" interval="60">
+ <instance_attributes id="params-public-ip-depth-60">
+ <nvpair id="public-ip-depth-60" name="OCF_CHECK_LEVEL" value="10"/>
+ </instance_attributes>
+ </op>
+ <op id="public-ip-health-300" name="monitor" interval="300">
+ <instance_attributes id="params-public-ip-depth-300">
+ <nvpair id="public-ip-depth-300" name="OCF_CHECK_LEVEL" value="20"/>
+ </instance_attributes>
+ </op>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-level" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+Disabling a Monitor Operation
+_____________________________
+
+The easiest way to stop a recurring monitor is to just delete it.
+However, there can be times when you only want to disable it
+temporarily. In such cases, simply add ``enabled=false`` to the
+operation's definition.
+
+.. topic:: Example of an OCF resource with a disabled health check
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="public-ip-check" name="monitor" interval="60s" enabled="false"/>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+This can be achieved from the command line by executing:
+
+.. code-block:: none
+
+ # cibadmin --modify --xml-text '<op id="public-ip-check" enabled="false"/>'
+
+Once you've done whatever you needed to do, you can then re-enable it with
+
+.. code-block:: none
+
+ # cibadmin --modify --xml-text '<op id="public-ip-check" enabled="true"/>'
+
+.. [#] Currently, anyway. Automatic monitoring operations may be added in a future
+ version of Pacemaker.
diff --git a/doc/sphinx/Pacemaker_Explained/reusing-configuration.rst b/doc/sphinx/Pacemaker_Explained/reusing-configuration.rst
new file mode 100644
index 0000000..0f34f84
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/reusing-configuration.rst
@@ -0,0 +1,415 @@
+Reusing Parts of the Configuration
+----------------------------------
+
+Pacemaker provides multiple ways to simplify the configuration XML by reusing
+parts of it in multiple places.
+
+Besides simplifying the XML, this also allows you to manipulate multiple
+configuration elements with a single reference.
+
+Reusing Resource Definitions
+############################
+
+If you want to create lots of resources with similar configurations, defining a
+*resource template* simplifies the task. Once defined, it can be referenced in
+primitives or in certain types of constraints.
+
+Configuring Resources with Templates
+____________________________________
+
+The primitives referencing the template will inherit all meta-attributes,
+instance attributes, utilization attributes, and operations defined in the
+template. You can also define specific attributes and operations for any of
+the primitives. If any of these are defined in both the template and the
+primitive, the values defined in the primitive take precedence over the ones
+defined in the template.
+
+Hence, resource templates help to reduce the amount of configuration work.
+If any changes are needed, they can be done to the template definition and
+will take effect globally in all resource definitions referencing that
+template.
+
+Resource templates have a syntax similar to that of primitives.
+
+.. topic:: Resource template for a migratable Xen virtual machine
+
+ .. code-block:: xml
+
+ <template id="vm-template" class="ocf" provider="heartbeat" type="Xen">
+ <meta_attributes id="vm-template-meta_attributes">
+ <nvpair id="vm-template-meta_attributes-allow-migrate" name="allow-migrate" value="true"/>
+ </meta_attributes>
+ <utilization id="vm-template-utilization">
+ <nvpair id="vm-template-utilization-memory" name="memory" value="512"/>
+ </utilization>
+ <operations>
+ <op id="vm-template-monitor-15s" interval="15s" name="monitor" timeout="60s"/>
+ <op id="vm-template-start-0" interval="0" name="start" timeout="60s"/>
+ </operations>
+ </template>
+
+Once you define a resource template, you can use it in primitives by specifying the
+``template`` property.
+
+.. topic:: Xen primitive resource using a resource template
+
+ .. code-block:: xml
+
+ <primitive id="vm1" template="vm-template">
+ <instance_attributes id="vm1-instance_attributes">
+ <nvpair id="vm1-instance_attributes-name" name="name" value="vm1"/>
+ <nvpair id="vm1-instance_attributes-xmfile" name="xmfile" value="/etc/xen/shared-vm/vm1"/>
+ </instance_attributes>
+ </primitive>
+
+In the example above, the new primitive ``vm1`` will inherit everything from
+``vm-template``. The equivalent of the above two examples would be:
+
+.. topic:: Equivalent Xen primitive resource not using a resource template
+
+ .. code-block:: xml
+
+ <primitive id="vm1" class="ocf" provider="heartbeat" type="Xen">
+ <meta_attributes id="vm-template-meta_attributes">
+ <nvpair id="vm-template-meta_attributes-allow-migrate" name="allow-migrate" value="true"/>
+ </meta_attributes>
+ <utilization id="vm-template-utilization">
+ <nvpair id="vm-template-utilization-memory" name="memory" value="512"/>
+ </utilization>
+ <operations>
+ <op id="vm-template-monitor-15s" interval="15s" name="monitor" timeout="60s"/>
+ <op id="vm-template-start-0" interval="0" name="start" timeout="60s"/>
+ </operations>
+ <instance_attributes id="vm1-instance_attributes">
+ <nvpair id="vm1-instance_attributes-name" name="name" value="vm1"/>
+ <nvpair id="vm1-instance_attributes-xmfile" name="xmfile" value="/etc/xen/shared-vm/vm1"/>
+ </instance_attributes>
+ </primitive>
+
+If you want to override some attributes or operations, add them to the
+particular primitive's definition.
+
+.. topic:: Xen resource overriding template values
+
+ .. code-block:: xml
+
+ <primitive id="vm2" template="vm-template">
+ <meta_attributes id="vm2-meta_attributes">
+ <nvpair id="vm2-meta_attributes-allow-migrate" name="allow-migrate" value="false"/>
+ </meta_attributes>
+ <utilization id="vm2-utilization">
+ <nvpair id="vm2-utilization-memory" name="memory" value="1024"/>
+ </utilization>
+ <instance_attributes id="vm2-instance_attributes">
+ <nvpair id="vm2-instance_attributes-name" name="name" value="vm2"/>
+ <nvpair id="vm2-instance_attributes-xmfile" name="xmfile" value="/etc/xen/shared-vm/vm2"/>
+ </instance_attributes>
+ <operations>
+ <op id="vm2-monitor-30s" interval="30s" name="monitor" timeout="120s"/>
+ <op id="vm2-stop-0" interval="0" name="stop" timeout="60s"/>
+ </operations>
+ </primitive>
+
+In the example above, the new primitive ``vm2`` overrides some of the
+template's values. Its ``monitor`` operation has a longer ``timeout`` and
+``interval``, and the primitive has an additional ``stop`` operation.
+
+To see the resulting definition of a resource, run:
+
+.. code-block:: none
+
+ # crm_resource --query-xml --resource vm2
+
+To see the raw definition of a resource in the CIB, run:
+
+.. code-block:: none
+
+ # crm_resource --query-xml-raw --resource vm2
+
+Using Templates in Constraints
+______________________________
+
+A resource template can be referenced in the following types of constraints:
+
+- ``order`` constraints (see :ref:`s-resource-ordering`)
+- ``colocation`` constraints (see :ref:`s-resource-colocation`)
+- ``rsc_ticket`` constraints (for multi-site clusters as described in :ref:`ticket-constraints`)
+
+Resource templates referenced in constraints stand for all primitives derived
+from that template; that is, the constraint applies to all primitive resources
+referencing the resource template. Referencing resource templates in
+constraints is an alternative to resource sets and can simplify the cluster
+configuration considerably.
+
+For example, given the example templates earlier in this chapter:
+
+.. code-block:: xml
+
+ <rsc_colocation id="vm-template-colo-base-rsc" rsc="vm-template" rsc-role="Started" with-rsc="base-rsc" score="INFINITY"/>
+
+would colocate all VMs with ``base-rsc`` and is the equivalent of the following constraint configuration:
+
+.. code-block:: xml
+
+ <rsc_colocation id="vm-colo-base-rsc" score="INFINITY">
+ <resource_set id="vm-colo-base-rsc-0" sequential="false" role="Started">
+ <resource_ref id="vm1"/>
+ <resource_ref id="vm2"/>
+ </resource_set>
+ <resource_set id="vm-colo-base-rsc-1">
+ <resource_ref id="base-rsc"/>
+ </resource_set>
+ </rsc_colocation>
+
+.. note::
+
+ In a colocation constraint, only one template may be referenced from either
+ ``rsc`` or ``with-rsc``; the other reference must be a regular resource.
+
+Using Templates in Resource Sets
+________________________________
+
+Resource templates can also be referenced in resource sets.
+
+For example, given the example templates earlier in this section:
+
+.. code-block:: xml
+
+ <rsc_order id="order1" score="INFINITY">
+ <resource_set id="order1-0">
+ <resource_ref id="base-rsc"/>
+ <resource_ref id="vm-template"/>
+ <resource_ref id="top-rsc"/>
+ </resource_set>
+ </rsc_order>
+
+is the equivalent of the following constraint using a sequential resource set:
+
+.. code-block:: xml
+
+ <rsc_order id="order1" score="INFINITY">
+ <resource_set id="order1-0">
+ <resource_ref id="base-rsc"/>
+ <resource_ref id="vm1"/>
+ <resource_ref id="vm2"/>
+ <resource_ref id="top-rsc"/>
+ </resource_set>
+ </rsc_order>
+
+Or, if the resources referencing the template can run in parallel, then:
+
+.. code-block:: xml
+
+ <rsc_order id="order2" score="INFINITY">
+ <resource_set id="order2-0">
+ <resource_ref id="base-rsc"/>
+ </resource_set>
+ <resource_set id="order2-1" sequential="false">
+ <resource_ref id="vm-template"/>
+ </resource_set>
+ <resource_set id="order2-2">
+ <resource_ref id="top-rsc"/>
+ </resource_set>
+ </rsc_order>
+
+is the equivalent of the following constraint configuration:
+
+.. code-block:: xml
+
+ <rsc_order id="order2" score="INFINITY">
+ <resource_set id="order2-0">
+ <resource_ref id="base-rsc"/>
+ </resource_set>
+ <resource_set id="order2-1" sequential="false">
+ <resource_ref id="vm1"/>
+ <resource_ref id="vm2"/>
+ </resource_set>
+ <resource_set id="order2-2">
+ <resource_ref id="top-rsc"/>
+ </resource_set>
+ </rsc_order>
+
+.. _s-reusing-config-elements:
+
+Reusing Rules, Options and Sets of Operations
+#############################################
+
+Sometimes a number of constraints need to use the same set of rules,
+and resources need to set the same options and parameters. To
+simplify this situation, you can refer to an existing object using an
+``id-ref`` instead of an ``id``.
+
+So if for one resource you have:
+
+.. code-block:: xml
+
+ <rsc_location id="WebServer-connectivity" rsc="Webserver">
+ <rule id="ping-prefer-rule" score-attribute="pingd" >
+ <expression id="ping-prefer" attribute="pingd" operation="defined"/>
+ </rule>
+ </rsc_location>
+
+Then, instead of duplicating the rule for all your other resources, you can
+specify:
+
+.. topic:: **Referencing rules from other constraints**
+
+ .. code-block:: xml
+
+ <rsc_location id="WebDB-connectivity" rsc="WebDB">
+ <rule id-ref="ping-prefer-rule"/>
+ </rsc_location>
+
+.. important::
+
+   The cluster will insist that the ``rule`` exists somewhere. Attempting
+   to add a reference to a nonexistent rule will cause a validation
+   failure, as will attempting to remove a ``rule`` that is referenced
+   elsewhere.
+
+The same principle applies for ``meta_attributes`` and
+``instance_attributes`` as illustrated in the example below:
+
+.. topic:: Referencing attributes, options, and operations from other resources
+
+ .. code-block:: xml
+
+ <primitive id="mySpecialRsc" class="ocf" type="Special" provider="me">
+ <instance_attributes id="mySpecialRsc-attrs" score="1" >
+ <nvpair id="default-interface" name="interface" value="eth0"/>
+ <nvpair id="default-port" name="port" value="9999"/>
+ </instance_attributes>
+ <meta_attributes id="mySpecialRsc-options">
+ <nvpair id="failure-timeout" name="failure-timeout" value="5m"/>
+ <nvpair id="migration-threshold" name="migration-threshold" value="1"/>
+ <nvpair id="stickiness" name="resource-stickiness" value="0"/>
+ </meta_attributes>
+ <operations id="health-checks">
+          <op id="health-check-monitor-60s" name="monitor" interval="60s"/>
+          <op id="health-check-monitor-30min" name="monitor" interval="30min"/>
+ </operations>
+ </primitive>
+      <primitive id="myOtherRsc" class="ocf" type="Other" provider="me">
+ <instance_attributes id-ref="mySpecialRsc-attrs"/>
+ <meta_attributes id-ref="mySpecialRsc-options"/>
+ <operations id-ref="health-checks"/>
+ </primitive>
+
+``id-ref`` can similarly be used with ``resource_set`` (in any constraint type),
+``nvpair``, and ``operations``.
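+
+For example, a ``resource_set`` defined in one constraint can be reused in
+another. The following sketch assumes two hypothetical resources ``A`` and
+``B`` that should be both colocated and ordered:
+
+.. topic:: Referencing a resource set from another constraint
+
+   .. code-block:: xml
+
+      <!-- "A" and "B" are hypothetical resource IDs -->
+      <rsc_colocation id="coloc-AB" score="INFINITY">
+        <resource_set id="coloc-AB-set">
+          <resource_ref id="A"/>
+          <resource_ref id="B"/>
+        </resource_set>
+      </rsc_colocation>
+      <rsc_order id="order-AB" kind="Mandatory">
+        <resource_set id-ref="coloc-AB-set"/>
+      </rsc_order>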
+
+Tagging Configuration Elements
+##############################
+
+Pacemaker allows you to *tag* any configuration element that has an XML ID.
+
+The main purpose of tagging is to support higher-level user interface tools;
+Pacemaker itself only uses tags within constraints. Therefore, what you can
+do with tags mostly depends on the tools you use.
+
+Configuring Tags
+________________
+
+A tag is simply a named list of XML IDs.
+
+.. topic:: Tag referencing three resources
+
+ .. code-block:: xml
+
+ <tags>
+ <tag id="all-vms">
+ <obj_ref id="vm1"/>
+ <obj_ref id="vm2"/>
+ <obj_ref id="vm3"/>
+ </tag>
+ </tags>
+
+What you can do with this new tag depends on what your higher-level tools
+support. For example, a tool might allow you to enable or disable all of
+the tagged resources at once, or show the status of just the tagged
+resources.
+
+A single configuration element can be listed in any number of tags.
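+
+For example, the following sketch (with hypothetical resource IDs) lists
+``vm1`` in two tags at once:
+
+.. topic:: One resource listed in two tags
+
+   .. code-block:: xml
+
+      <!-- vm1 appears in both tags; the IDs are illustrative -->
+      <tags>
+        <tag id="all-vms">
+          <obj_ref id="vm1"/>
+          <obj_ref id="vm2"/>
+        </tag>
+        <tag id="important-rscs">
+          <obj_ref id="vm1"/>
+          <obj_ref id="shared-storage"/>
+        </tag>
+      </tags>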
+
+Using Tags in Constraints and Resource Sets
+___________________________________________
+
+Pacemaker itself only uses tags in constraints. If you supply a tag name
+instead of a resource name in any constraint, the constraint will apply to
+all resources listed in that tag.
+
+.. topic:: Constraint using a tag
+
+ .. code-block:: xml
+
+ <rsc_order id="order1" first="storage" then="all-vms" kind="Mandatory" />
+
+In the example above, assuming the ``all-vms`` tag is defined as in the previous
+example, the constraint will behave the same as:
+
+.. topic:: Equivalent constraints without tags
+
+ .. code-block:: xml
+
+ <rsc_order id="order1-1" first="storage" then="vm1" kind="Mandatory" />
+ <rsc_order id="order1-2" first="storage" then="vm2" kind="Mandatory" />
+ <rsc_order id="order1-3" first="storage" then="vm3" kind="Mandatory" />
+
+A tag may be used directly in the constraint, or indirectly by being
+listed in a :ref:`resource set <s-resource-sets>` used in the constraint.
+When used in a resource set, an expanded tag will honor the set's
+``sequential`` property.
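+
+For example, the following sketch (assuming the ``all-vms`` tag defined above
+and a hypothetical ``storage`` resource) orders all tagged VMs after
+``storage``, allowing the VMs themselves to start in parallel:
+
+.. topic:: Tag referenced in a resource set
+
+   .. code-block:: xml
+
+      <!-- "storage" is a hypothetical resource ID -->
+      <rsc_order id="order-vms-after-storage" kind="Mandatory">
+        <resource_set id="order-vms-0">
+          <resource_ref id="storage"/>
+        </resource_set>
+        <resource_set id="order-vms-1" sequential="false">
+          <resource_ref id="all-vms"/>
+        </resource_set>
+      </rsc_order>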
+
+Filtering With Tags
+___________________
+
+The ``crm_mon`` tool displays a great deal of information about the state of
+the cluster. On large or complicated clusters, this can make it difficult to
+find the one piece of information you are interested in. The ``--resource=``
+and ``--node=`` command-line options can be used to filter results. In their
+most basic usage, these options take a single resource or node name. However,
+they can also be supplied with a tag name to display several objects at once.
+
+For instance, given the following CIB section:
+
+.. code-block:: xml
+
+ <resources>
+ <primitive class="stonith" id="Fencing" type="fence_xvm"/>
+ <primitive class="ocf" id="dummy" provider="pacemaker" type="Dummy"/>
+ <group id="inactive-group">
+ <primitive class="ocf" id="inactive-dummy-1" provider="pacemaker" type="Dummy"/>
+ <primitive class="ocf" id="inactive-dummy-2" provider="pacemaker" type="Dummy"/>
+ </group>
+ <clone id="inactive-clone">
+ <primitive id="inactive-dhcpd" class="lsb" type="dhcpd"/>
+ </clone>
+ </resources>
+ <tags>
+ <tag id="inactive-rscs">
+ <obj_ref id="inactive-group"/>
+ <obj_ref id="inactive-clone"/>
+ </tag>
+ </tags>
+
+Running ``crm_mon --resource=inactive-rscs -r`` would produce the following
+output:
+
+.. code-block:: none
+
+ Cluster Summary:
+ * Stack: corosync
+ * Current DC: cluster02 (version 2.0.4-1.e97f9675f.git.el7-e97f9675f) - partition with quorum
+ * Last updated: Tue Oct 20 16:09:01 2020
+ * Last change: Tue May 5 12:04:36 2020 by hacluster via crmd on cluster01
+ * 5 nodes configured
+ * 27 resource instances configured (4 DISABLED)
+
+ Node List:
+ * Online: [ cluster01 cluster02 ]
+
+ Full List of Resources:
+ * Clone Set: inactive-clone [inactive-dhcpd] (disabled):
+ * Stopped (disabled): [ cluster01 cluster02 ]
+ * Resource Group: inactive-group (disabled):
+ * inactive-dummy-1 (ocf::pacemaker:Dummy): Stopped (disabled)
+ * inactive-dummy-2 (ocf::pacemaker:Dummy): Stopped (disabled)
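+
+The ``--node=`` option works the same way: if a tag's ``obj_ref`` entries list
+node IDs, supplying that tag name (here, a hypothetical ``rack1`` tag) limits
+the display to those nodes:
+
+.. code-block:: none
+
+   # crm_mon --node=rack1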
diff --git a/doc/sphinx/Pacemaker_Explained/rules.rst b/doc/sphinx/Pacemaker_Explained/rules.rst
new file mode 100644
index 0000000..e9d85e0
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/rules.rst
@@ -0,0 +1,1021 @@
+.. index::
+ single: rule
+
+.. _rules:
+
+Rules
+-----
+
+Rules can be used to make your configuration more dynamic, allowing values to
+change depending on the time or the value of a node attribute. Examples of
+things rules are useful for:
+
+* Set a higher value for :ref:`resource-stickiness <resource-stickiness>`
+ during working hours, to minimize downtime, and a lower value on weekends, to
+ allow resources to move to their most preferred locations when people aren't
+ around to notice.
+
+* Automatically place the cluster into maintenance mode during a scheduled
+ maintenance window.
+
+* Assign certain nodes and resources to a particular department via custom
+ node attributes and meta-attributes, and add a single location constraint
+ that restricts the department's resources to run only on those nodes.
+
+Each constraint type or property set that supports rules may contain one or more
+``rule`` elements specifying conditions under which the constraint or properties
+take effect. Examples later in this chapter will make this clearer.
+
+.. index::
+ pair: XML element; rule
+
+Rule Properties
+###############
+
+.. table:: **Attributes of a rule Element**
+ :widths: 1 1 3
+
+ +-----------------+-------------+-------------------------------------------+
+ | Attribute | Default | Description |
+ +=================+=============+===========================================+
+ | id | | .. index:: |
+ | | | pair: rule; id |
+ | | | |
+ | | | A unique name for this element (required) |
+ +-----------------+-------------+-------------------------------------------+
+ | role | ``Started`` | .. index:: |
+ | | | pair: rule; role |
+ | | | |
+ | | | The rule is in effect only when the |
+ | | | resource is in the specified role. |
+ | | | Allowed values are ``Started``, |
+ | | | ``Unpromoted``, and ``Promoted``. A rule |
+ | | | with a ``role`` of ``Promoted`` cannot |
+ | | | determine the initial location of a clone |
+ | | | instance and will only affect which of |
+ | | | the active instances will be promoted. |
+ +-----------------+-------------+-------------------------------------------+
+ | score | | .. index:: |
+ | | | pair: rule; score |
+ | | | |
+ | | | If this rule is used in a location |
+ | | | constraint and evaluates to true, apply |
+ | | | this score to the constraint. Only one of |
+ | | | ``score`` and ``score-attribute`` may be |
+ | | | used. |
+ +-----------------+-------------+-------------------------------------------+
+ | score-attribute | | .. index:: |
+ | | | pair: rule; score-attribute |
+ | | | |
+ | | | If this rule is used in a location |
+ | | | constraint and evaluates to true, use the |
+ | | | value of this node attribute as the score |
+ | | | to apply to the constraint. Only one of |
+ | | | ``score`` and ``score-attribute`` may be |
+ | | | used. |
+ +-----------------+-------------+-------------------------------------------+
+ | boolean-op | ``and`` | .. index:: |
+ | | | pair: rule; boolean-op |
+ | | | |
+ | | | If this rule contains more than one |
+ | | | condition, a value of ``and`` specifies |
+ | | | that the rule evaluates to true only if |
+ | | | all conditions are true, and a value of |
+ | | | ``or`` specifies that the rule evaluates |
+ | | | to true if any condition is true. |
+ +-----------------+-------------+-------------------------------------------+
+
+A ``rule`` element must contain one or more conditions. A condition may be an
+``expression`` element, a ``date_expression`` element, or another ``rule`` element.
+
+
+.. index::
+ single: rule; node attribute expression
+ single: node attribute; rule expression
+ pair: XML element; expression
+
+.. _node_attribute_expressions:
+
+Node Attribute Expressions
+##########################
+
+Expressions are rule conditions based on the values of node attributes.
+
+.. table:: **Attributes of an expression Element**
+ :class: longtable
+ :widths: 1 2 3
+
+ +--------------+---------------------------------+-------------------------------------------+
+ | Attribute | Default | Description |
+ +==============+=================================+===========================================+
+ | id | | .. index:: |
+ | | | pair: expression; id |
+ | | | |
+ | | | A unique name for this element (required) |
+ +--------------+---------------------------------+-------------------------------------------+
+ | attribute | | .. index:: |
+ | | | pair: expression; attribute |
+ | | | |
+ | | | The node attribute to test (required) |
+ +--------------+---------------------------------+-------------------------------------------+
+ | type | The default type for | .. index:: |
+ | | ``lt``, ``gt``, ``lte``, and | pair: expression; type |
+ | | ``gte`` operations is ``number``| |
+ | | if either value contains a | How the node attributes should be |
+ | | decimal point character, or | compared. Allowed values are ``string``, |
+ | | ``integer`` otherwise. The | ``integer`` *(since 2.0.5)*, ``number``, |
+ | | default type for all other | and ``version``. ``integer`` truncates |
+ | | operations is ``string``. If a | floating-point values if necessary before |
+ | | numeric parse fails for either | performing a 64-bit integer comparison. |
+ | | value, then the values are | ``number`` performs a double-precision |
+ | | compared as type ``string``. | floating-point comparison |
+ | | | *(32-bit integer before 2.0.5)*. |
+ +--------------+---------------------------------+-------------------------------------------+
+ | operation | | .. index:: |
+ | | | pair: expression; operation |
+ | | | |
+ | | | The comparison to perform (required). |
+ | | | Allowed values: |
+ | | | |
+ | | | * ``lt:`` True if the node attribute value|
+ | | | is less than the comparison value |
+ | | | * ``gt:`` True if the node attribute value|
+ | | | is greater than the comparison value |
+ | | | * ``lte:`` True if the node attribute |
+ | | | value is less than or equal to the |
+ | | | comparison value |
+ | | | * ``gte:`` True if the node attribute |
+ | | | value is greater than or equal to the |
+ | | | comparison value |
+ | | | * ``eq:`` True if the node attribute value|
+ | | | is equal to the comparison value |
+ | | | * ``ne:`` True if the node attribute value|
+ | | | is not equal to the comparison value |
+ | | | * ``defined:`` True if the node has the |
+ | | | named attribute |
+ | | | * ``not_defined:`` True if the node does |
+ | | | not have the named attribute |
+ +--------------+---------------------------------+-------------------------------------------+
+ | value | | .. index:: |
+ | | | pair: expression; value |
+ | | | |
+ | | | User-supplied value for comparison |
+ | | | (required for operations other than |
+ | | | ``defined`` and ``not_defined``) |
+ +--------------+---------------------------------+-------------------------------------------+
+ | value-source | ``literal`` | .. index:: |
+ | | | pair: expression; value-source |
+ | | | |
+ | | | How the ``value`` is derived. Allowed |
+ | | | values: |
+ | | | |
+ | | | * ``literal``: ``value`` is a literal |
+ | | | string to compare against |
+ | | | * ``param``: ``value`` is the name of a |
+ | | | resource parameter to compare against |
+ | | | (only valid in location constraints) |
+ | | | * ``meta``: ``value`` is the name of a |
+ | | | resource meta-attribute to compare |
+ | | | against (only valid in location |
+ | | | constraints) |
+ +--------------+---------------------------------+-------------------------------------------+
+
+.. _node-attribute-expressions-special:
+
+In addition to custom node attributes defined by the administrator, the cluster
+defines special, built-in node attributes for each node that can also be used
+in rule expressions.
+
+.. table:: **Built-in Node Attributes**
+ :widths: 1 4
+
+ +---------------+-----------------------------------------------------------+
+ | Name | Value |
+ +===============+===========================================================+
+ | #uname | :ref:`Node name <node_name>` |
+ +---------------+-----------------------------------------------------------+
+ | #id | Node ID |
+ +---------------+-----------------------------------------------------------+
+ | #kind | Node type. Possible values are ``cluster``, ``remote``, |
+ | | and ``container``. Kind is ``remote`` for Pacemaker Remote|
+ | | nodes created with the ``ocf:pacemaker:remote`` resource, |
+ | | and ``container`` for Pacemaker Remote guest nodes and |
+ | | bundle nodes |
+ +---------------+-----------------------------------------------------------+
+ | #is_dc | ``true`` if this node is the cluster's Designated |
+ | | Controller (DC), ``false`` otherwise |
+ +---------------+-----------------------------------------------------------+
+ | #cluster-name | The value of the ``cluster-name`` cluster property, if set|
+ +---------------+-----------------------------------------------------------+
+ | #site-name | The value of the ``site-name`` node attribute, if set, |
+ | | otherwise identical to ``#cluster-name`` |
+ +---------------+-----------------------------------------------------------+
+ | #role | The role the relevant promotable clone resource has on |
+ | | this node. Valid only within a rule for a location |
+ | | constraint for a promotable clone resource. |
+ +---------------+-----------------------------------------------------------+
+
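+For example, the following sketch of a rule (for use inside a location
+constraint) would keep a resource off Pacemaker Remote nodes created with the
+``ocf:pacemaker:remote`` resource:
+
+.. topic:: Rule using the built-in ``#kind`` node attribute
+
+   .. code-block:: xml
+
+      <!-- the IDs here are illustrative -->
+      <rule id="ban-remote-rule" score="-INFINITY">
+        <expression id="ban-remote-expr" attribute="#kind"
+                    operation="eq" value="remote"/>
+      </rule>
+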
+.. Add_to_above_table_if_released:
+
+ +---------------+-----------------------------------------------------------+
+ | #ra-version | The installed version of the resource agent on the node, |
+ | | as defined by the ``version`` attribute of the |
+ | | ``resource-agent`` tag in the agent's metadata. Valid only|
+ | | within rules controlling resource options. This can be |
+ | | useful during rolling upgrades of a backward-incompatible |
+ | | resource agent. *(since x.x.x)* |
+
+
+.. index::
+ single: rule; date/time expression
+ pair: XML element; date_expression
+
+Date/Time Expressions
+#####################
+
+Date/time expressions are rule conditions based (as the name suggests) on the
+current date and time.
+
+A ``date_expression`` element may optionally contain a ``date_spec`` or
+``duration`` element depending on the context.
+
+.. table:: **Attributes of a date_expression Element**
+ :widths: 1 4
+
+ +---------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +===============+===========================================================+
+ | id | .. index:: |
+ | | pair: id; date_expression |
+ | | |
+ | | A unique name for this element (required) |
+ +---------------+-----------------------------------------------------------+
+ | start | .. index:: |
+ | | pair: start; date_expression |
+ | | |
+ | | A date/time conforming to the |
+ | | `ISO8601 <https://en.wikipedia.org/wiki/ISO_8601>`_ |
+ | | specification. May be used when ``operation`` is |
+ | | ``in_range`` (in which case at least one of ``start`` or |
+ | | ``end`` must be specified) or ``gt`` (in which case |
+ | | ``start`` is required). |
+ +---------------+-----------------------------------------------------------+
+ | end | .. index:: |
+ | | pair: end; date_expression |
+ | | |
+ | | A date/time conforming to the |
+ | | `ISO8601 <https://en.wikipedia.org/wiki/ISO_8601>`_ |
+ | | specification. May be used when ``operation`` is |
+ | | ``in_range`` (in which case at least one of ``start`` or |
+ | | ``end`` must be specified) or ``lt`` (in which case |
+ | | ``end`` is required). |
+ +---------------+-----------------------------------------------------------+
+ | operation | .. index:: |
+ | | pair: operation; date_expression |
+ | | |
+ | | Compares the current date/time with the start and/or end |
+ | | date, depending on the context. Allowed values: |
+ | | |
+ | | * ``gt:`` True if the current date/time is after ``start``|
+ | | * ``lt:`` True if the current date/time is before ``end`` |
+ | | * ``in_range:`` True if the current date/time is after |
+ | | ``start`` (if specified) and before either ``end`` (if |
+ | | specified) or ``start`` plus the value of the |
+ | | ``duration`` element (if one is contained in the |
+ | | ``date_expression``). If both ``end`` and ``duration`` |
+ | | are specified, ``duration`` is ignored. |
+ | | * ``date_spec:`` True if the current date/time matches |
+ | | the specification given in the contained ``date_spec`` |
+ | | element (described below) |
+ +---------------+-----------------------------------------------------------+
+
+
+.. note:: There is no ``eq``, ``neq``, ``gte``, or ``lte`` operation, since
+ they would be valid only for a single second.
+
+
+.. index::
+ single: date specification
+ pair: XML element; date_spec
+
+Date Specifications
+___________________
+
+A ``date_spec`` element is used to create a cron-like expression relating
+to time. Each field can contain a single number or range. Any field not
+supplied is ignored.
+
+.. table:: **Attributes of a date_spec Element**
+ :widths: 1 3
+
+ +---------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +===============+===========================================================+
+ | id | .. index:: |
+ | | pair: id; date_spec |
+ | | |
+ | | A unique name for this element (required) |
+ +---------------+-----------------------------------------------------------+
+ | seconds | .. index:: |
+ | | pair: seconds; date_spec |
+ | | |
+ | | Allowed values: 0-59 |
+ +---------------+-----------------------------------------------------------+
+ | minutes | .. index:: |
+ | | pair: minutes; date_spec |
+ | | |
+ | | Allowed values: 0-59 |
+ +---------------+-----------------------------------------------------------+
+ | hours | .. index:: |
+ | | pair: hours; date_spec |
+ | | |
+ | | Allowed values: 0-23 (where 0 is midnight and 23 is |
+ | | 11 p.m.) |
+ +---------------+-----------------------------------------------------------+
+ | monthdays | .. index:: |
+ | | pair: monthdays; date_spec |
+ | | |
+ | | Allowed values: 1-31 (depending on month and year) |
+ +---------------+-----------------------------------------------------------+
+ | weekdays | .. index:: |
+ | | pair: weekdays; date_spec |
+ | | |
+ | | Allowed values: 1-7 (where 1 is Monday and 7 is Sunday) |
+ +---------------+-----------------------------------------------------------+
+ | yeardays | .. index:: |
+ | | pair: yeardays; date_spec |
+ | | |
+ | | Allowed values: 1-366 (depending on the year) |
+ +---------------+-----------------------------------------------------------+
+ | months | .. index:: |
+ | | pair: months; date_spec |
+ | | |
+ | | Allowed values: 1-12 |
+ +---------------+-----------------------------------------------------------+
+ | weeks | .. index:: |
+ | | pair: weeks; date_spec |
+ | | |
+ | | Allowed values: 1-53 (depending on weekyear) |
+ +---------------+-----------------------------------------------------------+
+ | years | .. index:: |
+ | | pair: years; date_spec |
+ | | |
+ | | Year according to the Gregorian calendar |
+ +---------------+-----------------------------------------------------------+
+ | weekyears | .. index:: |
+ | | pair: weekyears; date_spec |
+ | | |
+ | | Year in which the week started; for example, 1 January |
+ | | 2005 can be specified in ISO 8601 as "2005-001 Ordinal", |
+ | | "2005-01-01 Gregorian" or "2004-W53-6 Weekly" and thus |
+ | | would match ``years="2005"`` or ``weekyears="2004"`` |
+ +---------------+-----------------------------------------------------------+
+ | moon | .. index:: |
+ | | pair: moon; date_spec |
+ | | |
+ | | Allowed values are 0-7 (where 0 is the new moon and 4 is |
+ | | full moon). *(deprecated since 2.1.6)* |
+ +---------------+-----------------------------------------------------------+
+
+For example, ``monthdays="1"`` matches the first day of every month, and
+``hours="09-17"`` matches the hours between 9 a.m. and 5 p.m. (inclusive).
+
+At this time, multiple ranges (e.g. ``weekdays="1,2"`` or ``weekdays="1-2,5-6"``)
+are not supported.
+
+.. note:: Pacemaker can calculate when evaluation of a ``date_expression`` with
+ an ``operation`` of ``gt``, ``lt``, or ``in_range`` will next change,
+ and schedule a cluster re-check for that time. However, it does not
+ do this for ``date_spec``. Instead, it evaluates the ``date_spec``
+ whenever a cluster re-check naturally happens via a cluster event or
+ the ``cluster-recheck-interval`` cluster option.
+
+ For example, if you have a ``date_spec`` enabling a resource from 9
+ a.m. to 5 p.m., and ``cluster-recheck-interval`` has been set to 5
+ minutes, then sometime between 9 a.m. and 9:05 a.m. the cluster would
+ notice that it needs to start the resource, and sometime between 5
+ p.m. and 5:05 p.m. it would realize that it needs to stop the
+ resource. The timing of the actual start and stop actions will
+ further depend on factors such as any other actions the cluster may
+ need to perform first, and the load of the machine.
+
+
+.. index::
+ single: duration
+ pair: XML element; duration
+
+Durations
+_________
+
+A ``duration`` is used to calculate a value for ``end`` when one is not
+supplied to ``in_range`` operations. It contains one or more attributes, each
+containing a single number. Any attribute not supplied is ignored.
+
+.. table:: **Attributes of a duration Element**
+ :widths: 1 3
+
+ +---------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +===============+===========================================================+
+ | id | .. index:: |
+ | | pair: id; duration |
+ | | |
+ | | A unique name for this element (required) |
+ +---------------+-----------------------------------------------------------+
+ | seconds | .. index:: |
+ | | pair: seconds; duration |
+ | | |
+ | | This many seconds will be added to the total duration |
+ +---------------+-----------------------------------------------------------+
+ | minutes | .. index:: |
+ | | pair: minutes; duration |
+ | | |
+ | | This many minutes will be added to the total duration |
+ +---------------+-----------------------------------------------------------+
+ | hours | .. index:: |
+ | | pair: hours; duration |
+ | | |
+ | | This many hours will be added to the total duration |
+ +---------------+-----------------------------------------------------------+
+ | days | .. index:: |
+ | | pair: days; duration |
+ | | |
+ | | This many days will be added to the total duration |
+ +---------------+-----------------------------------------------------------+
+ | weeks | .. index:: |
+ | | pair: weeks; duration |
+ | | |
+ | | This many weeks will be added to the total duration |
+ +---------------+-----------------------------------------------------------+
+ | months | .. index:: |
+ | | pair: months; duration |
+ | | |
+ | | This many months will be added to the total duration |
+ +---------------+-----------------------------------------------------------+
+ | years | .. index:: |
+ | | pair: years; duration |
+ | | |
+ | | This many years will be added to the total duration |
+ +---------------+-----------------------------------------------------------+
+
+
+Example Time-Based Expressions
+______________________________
+
+A small sample of how time-based expressions can be used:
+
+.. topic:: True if now is any time in the year 2005
+
+ .. code-block:: xml
+
+ <rule id="rule1" score="INFINITY">
+ <date_expression id="date_expr1" start="2005-001" operation="in_range">
+ <duration id="duration1" years="1"/>
+ </date_expression>
+ </rule>
+
+ or equivalently:
+
+ .. code-block:: xml
+
+ <rule id="rule2" score="INFINITY">
+ <date_expression id="date_expr2" operation="date_spec">
+ <date_spec id="date_spec2" years="2005"/>
+ </date_expression>
+ </rule>
+
+.. topic:: 9 a.m. to 5 p.m. Monday through Friday
+
+ .. code-block:: xml
+
+ <rule id="rule3" score="INFINITY">
+ <date_expression id="date_expr3" operation="date_spec">
+ <date_spec id="date_spec3" hours="9-16" weekdays="1-5"/>
+ </date_expression>
+ </rule>
+
+ Note that the ``16`` matches all the way through ``16:59:59``, because the
+ numeric value of the hour still matches.
+
+.. topic:: 9 a.m. to 5 p.m. Monday through Friday or anytime Saturday
+
+ .. code-block:: xml
+
+ <rule id="rule4" score="INFINITY" boolean-op="or">
+ <date_expression id="date_expr4-1" operation="date_spec">
+ <date_spec id="date_spec4-1" hours="9-16" weekdays="1-5"/>
+ </date_expression>
+ <date_expression id="date_expr4-2" operation="date_spec">
+ <date_spec id="date_spec4-2" weekdays="6"/>
+ </date_expression>
+ </rule>
+
+.. topic:: 9 a.m. to 5 p.m. or 9 p.m. to 12 a.m. Monday through Friday
+
+ .. code-block:: xml
+
+ <rule id="rule5" score="INFINITY" boolean-op="and">
+ <rule id="rule5-nested1" score="INFINITY" boolean-op="or">
+ <date_expression id="date_expr5-1" operation="date_spec">
+ <date_spec id="date_spec5-1" hours="9-16"/>
+ </date_expression>
+ <date_expression id="date_expr5-2" operation="date_spec">
+ <date_spec id="date_spec5-2" hours="21-23"/>
+ </date_expression>
+ </rule>
+ <date_expression id="date_expr5-3" operation="date_spec">
+ <date_spec id="date_spec5-3" weekdays="1-5"/>
+ </date_expression>
+ </rule>
+
+.. topic:: Mondays in March 2005
+
+ .. code-block:: xml
+
+ <rule id="rule6" score="INFINITY" boolean-op="and">
+ <date_expression id="date_expr6-1" operation="date_spec">
+ <date_spec id="date_spec6" weekdays="1"/>
+ </date_expression>
+ <date_expression id="date_expr6-2" operation="in_range"
+ start="2005-03-01" end="2005-04-01"/>
+ </rule>
+
+ .. note:: Because no time is specified with the above dates, 00:00:00 is
+ implied. This means that the range includes all of 2005-03-01 but
+ none of 2005-04-01. You may wish to write ``end`` as
+ ``"2005-03-31T23:59:59"`` to avoid confusion.
+
+
+.. index::
+ single: rule; resource expression
+ single: resource; rule expression
+ pair: XML element; rsc_expression
+
+Resource Expressions
+####################
+
+An ``rsc_expression`` *(since 2.0.5)* is a rule condition based on a resource
+agent's properties. This rule is only valid within an ``rsc_defaults`` or
+``op_defaults`` context. None of the matching attributes of ``class``,
+``provider``, and ``type`` are required. If one is omitted, all values of that
+attribute will match. For instance, omitting ``type`` means every type will
+match.
+
+.. table:: **Attributes of a rsc_expression Element**
+ :widths: 1 3
+
+ +---------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +===============+===========================================================+
+ | id | .. index:: |
+ | | pair: id; rsc_expression |
+ | | |
+ | | A unique name for this element (required) |
+ +---------------+-----------------------------------------------------------+
+ | class | .. index:: |
+ | | pair: class; rsc_expression |
+ | | |
+ | | The standard name to be matched against resource agents |
+ +---------------+-----------------------------------------------------------+
+ | provider | .. index:: |
+ | | pair: provider; rsc_expression |
+ | | |
+ | | If given, the vendor to be matched against resource |
+ | | agents (only relevant when ``class`` is ``ocf``) |
+ +---------------+-----------------------------------------------------------+
+ | type | .. index:: |
+ | | pair: type; rsc_expression |
+ | | |
+ | | The name of the resource agent to be matched |
+ +---------------+-----------------------------------------------------------+
+
+Example Resource-Based Expressions
+__________________________________
+
+A small sample of how resource-based expressions can be used:
+
+.. topic:: True for all ``ocf:heartbeat:IPaddr2`` resources
+
+ .. code-block:: xml
+
+ <rule id="rule1" score="INFINITY">
+ <rsc_expression id="rule_expr1" class="ocf" provider="heartbeat" type="IPaddr2"/>
+ </rule>
+
+.. topic:: Provider doesn't apply to non-OCF resources
+
+ .. code-block:: xml
+
+ <rule id="rule2" score="INFINITY">
+ <rsc_expression id="rule_expr2" class="stonith" type="fence_xvm"/>
+ </rule>
+
+
+.. index::
+ single: rule; operation expression
+ single: operation; rule expression
+ pair: XML element; op_expression
+
+Operation Expressions
+#####################
+
+An ``op_expression`` *(since 2.0.5)* is a rule condition based on an action of
+some resource agent. This rule is only valid within an ``op_defaults`` context.
+
+.. table:: **Attributes of an op_expression Element**
+ :widths: 1 3
+
+ +---------------+-----------------------------------------------------------+
+ | Attribute | Description |
+ +===============+===========================================================+
+ | id | .. index:: |
+ | | pair: id; op_expression |
+ | | |
+ | | A unique name for this element (required) |
+ +---------------+-----------------------------------------------------------+
+ | name | .. index:: |
+ | | pair: name; op_expression |
+ | | |
+ | | The action name to match against. This can be any action |
+ | | supported by the resource agent; common values include |
+ | | ``monitor``, ``start``, and ``stop`` (required). |
+ +---------------+-----------------------------------------------------------+
+ | interval | .. index:: |
+ | | pair: interval; op_expression |
+ | | |
+ | | The interval of the action to match against. If not given,|
+ | | only the name attribute will be used to match. |
+ +---------------+-----------------------------------------------------------+
+
+Example Operation-Based Expressions
+___________________________________
+
+A small sample of how operation-based expressions can be used:
+
+.. topic:: True for all monitor actions
+
+ .. code-block:: xml
+
+ <rule id="rule1" score="INFINITY">
+ <op_expression id="rule_expr1" name="monitor"/>
+ </rule>
+
+.. topic:: True for all monitor actions with a 10 second interval
+
+ .. code-block:: xml
+
+ <rule id="rule2" score="INFINITY">
+ <op_expression id="rule_expr2" name="monitor" interval="10s"/>
+ </rule>
+
+
+.. index::
+ pair: location constraint; rule
+
+Using Rules to Determine Resource Location
+##########################################
+
+A location constraint may contain one or more top-level rules. The cluster will
+act as if there is a separate location constraint for each rule that evaluates
+to true.
+
+Consider the following simple location constraint:
+
+.. topic:: Prevent resource ``webserver`` from running on node ``node3``
+
+ .. code-block:: xml
+
+ <rsc_location id="ban-apache-on-node3" rsc="webserver"
+ score="-INFINITY" node="node3"/>
+
+The same constraint can be more verbosely written using a rule:
+
+.. topic:: Prevent resource ``webserver`` from running on node ``node3`` using a rule
+
+ .. code-block:: xml
+
+ <rsc_location id="ban-apache-on-node3" rsc="webserver">
+ <rule id="ban-apache-rule" score="-INFINITY">
+ <expression id="ban-apache-expr" attribute="#uname"
+ operation="eq" value="node3"/>
+ </rule>
+ </rsc_location>
+
+The advantage of using the expanded form is that one could add more expressions
+(for example, limiting the constraint to certain days of the week), or activate
+the constraint by some node attribute other than node name.
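+
+For example, the following sketch (the IDs are illustrative) combines the
+node-name expression with a ``date_expression``, described earlier in this
+chapter, so that the ban applies only on weekdays:
+
+.. topic:: Ban ``webserver`` from ``node3`` on weekdays only
+
+   .. code-block:: xml
+
+      <rsc_location id="ban-apache-on-node3-weekdays" rsc="webserver">
+        <rule id="ban-apache-weekdays-rule" score="-INFINITY" boolean-op="and">
+          <expression id="ban-apache-weekdays-expr" attribute="#uname"
+                      operation="eq" value="node3"/>
+          <date_expression id="ban-apache-weekdays-date" operation="date_spec">
+            <date_spec id="ban-apache-weekdays-spec" weekdays="1-5"/>
+          </date_expression>
+        </rule>
+      </rsc_location>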
+
+Location Rules Based on Other Node Properties
+_____________________________________________
+
+The expanded form allows us to match on node properties other than its name.
+If we rated each machine's CPU power such that the cluster had the following
+nodes section:
+
+.. topic:: Sample node section with node attributes
+
+ .. code-block:: xml
+
+ <nodes>
+ <node id="uuid1" uname="c001n01" type="normal">
+ <instance_attributes id="uuid1-custom_attrs">
+ <nvpair id="uuid1-cpu_mips" name="cpu_mips" value="1234"/>
+ </instance_attributes>
+ </node>
+ <node id="uuid2" uname="c001n02" type="normal">
+ <instance_attributes id="uuid2-custom_attrs">
+ <nvpair id="uuid2-cpu_mips" name="cpu_mips" value="5678"/>
+ </instance_attributes>
+ </node>
+ </nodes>
+
+then we could prevent resources from running on underpowered machines with this
+rule:
+
+.. topic:: Rule using a node attribute (to be used inside a location constraint)
+
+ .. code-block:: xml
+
+ <rule id="need-more-power-rule" score="-INFINITY">
+ <expression id="need-more-power-expr" attribute="cpu_mips"
+ operation="lt" value="3000"/>
+ </rule>
+
+Using ``score-attribute`` Instead of ``score``
+______________________________________________
+
+When using ``score-attribute`` instead of ``score``, each node matched by the
+rule has its score adjusted differently, according to its value for the named
+node attribute. Thus, in the previous example, if a rule inside a location
+constraint for a resource used ``score-attribute="cpu_mips"``, ``c001n01``
+would have its preference to run the resource increased by ``1234`` whereas
+``c001n02`` would have its preference increased by ``5678``.
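+
+A sketch of such a rule, reusing the ``cpu_mips`` attribute from the example
+above (again for use inside a location constraint):
+
+.. topic:: Rule using ``score-attribute``
+
+   .. code-block:: xml
+
+      <rule id="prefer-more-power-rule" score-attribute="cpu_mips">
+        <expression id="prefer-more-power-expr" attribute="cpu_mips"
+                    operation="defined"/>
+      </rule>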
+
+
+.. _s-rsc-pattern-rules:
+
+Specifying location scores using pattern submatches
+___________________________________________________
+
+Location constraints may use ``rsc-pattern`` to apply the constraint to all
+resources whose IDs match the given pattern (see :ref:`s-rsc-pattern`). The
+pattern may contain up to 9 submatches in parentheses, whose values may be used
+as ``%1`` through ``%9`` in a rule's ``score-attribute`` or a rule expression's
+``attribute``.
+
+As an example, the following configuration (only relevant parts are shown)
+gives the resources **server-httpd** and **ip-httpd** a preference of 100 on
+**node1** and 50 on **node2**, and **ip-gateway** a preference of -100 on
+**node1** and 200 on **node2**.
+
+.. topic:: Location constraint using submatches
+
+ .. code-block:: xml
+
+ <nodes>
+ <node id="1" uname="node1">
+ <instance_attributes id="node1-attrs">
+ <nvpair id="node1-prefer-httpd" name="prefer-httpd" value="100"/>
+ <nvpair id="node1-prefer-gateway" name="prefer-gateway" value="-100"/>
+ </instance_attributes>
+ </node>
+ <node id="2" uname="node2">
+ <instance_attributes id="node2-attrs">
+ <nvpair id="node2-prefer-httpd" name="prefer-httpd" value="50"/>
+ <nvpair id="node2-prefer-gateway" name="prefer-gateway" value="200"/>
+ </instance_attributes>
+ </node>
+ </nodes>
+ <resources>
+ <primitive id="server-httpd" class="ocf" provider="heartbeat" type="apache"/>
+ <primitive id="ip-httpd" class="ocf" provider="heartbeat" type="IPaddr2"/>
+ <primitive id="ip-gateway" class="ocf" provider="heartbeat" type="IPaddr2"/>
+ </resources>
+ <constraints>
+ <!-- The following constraint says that for any resource whose name
+ starts with "server-" or "ip-", that resource's preference for a
+ node is the value of the node attribute named "prefer-" followed
+ by the part of the resource name after "server-" or "ip-",
+ wherever such a node attribute is defined.
+ -->
+ <rsc_location id="location1" rsc-pattern="(server|ip)-(.*)">
+ <rule id="location1-rule1" score-attribute="prefer-%2">
+ <expression id="location1-rule1-expression1" attribute="prefer-%2" operation="defined"/>
+ </rule>
+ </rsc_location>
+ </constraints>
+
+
+.. index::
+ pair: cluster option; rule
+ pair: instance attribute; rule
+ pair: meta-attribute; rule
+ pair: resource defaults; rule
+ pair: operation defaults; rule
+ pair: node attribute; rule
+
+Using Rules to Define Options
+#############################
+
+Rules may be used to control a variety of options:
+
+* :ref:`Cluster options <cluster_options>` (``cluster_property_set`` elements)
+* :ref:`Node attributes <node_attributes>` (``instance_attributes`` or
+ ``utilization`` elements inside a ``node`` element)
+* :ref:`Resource options <resource_options>` (``utilization``,
+ ``meta_attributes``, or ``instance_attributes`` elements inside a resource
+ definition element or ``op`` , ``rsc_defaults``, ``op_defaults``, or
+ ``template`` element)
+* :ref:`Operation properties <operation_properties>` (``meta_attributes``
+ elements inside an ``op`` or ``op_defaults`` element)
+
+.. note::
+
+ Attribute-based expressions for meta-attributes can only be used within
+ ``operations`` and ``op_defaults``. They will not work with resource
+ configuration or ``rsc_defaults``. Additionally, attribute-based
+ expressions cannot be used with cluster options.
+
+Using Rules to Control Resource Options
+_______________________________________
+
+Often some cluster nodes will be different from their peers. Sometimes,
+these differences -- e.g. the location of a binary or the names of network
+interfaces -- require resources to be configured differently depending
+on the machine they're hosted on.
+
+By defining multiple ``instance_attributes`` objects for the resource and
+adding a rule to each, we can easily handle these special cases.
+
+In the example below, ``mySpecialRsc`` will use eth1 and port 9999 when run on
+``node1``, eth2 and port 8888 on ``node2`` and default to eth0 and port 9999
+for all other nodes.
+
+.. topic:: Defining different resource options based on the node name
+
+ .. code-block:: xml
+
+ <primitive id="mySpecialRsc" class="ocf" type="Special" provider="me">
+ <instance_attributes id="special-node1" score="3">
+ <rule id="node1-special-case" score="INFINITY" >
+ <expression id="node1-special-case-expr" attribute="#uname"
+ operation="eq" value="node1"/>
+ </rule>
+ <nvpair id="node1-interface" name="interface" value="eth1"/>
+ </instance_attributes>
+ <instance_attributes id="special-node2" score="2" >
+ <rule id="node2-special-case" score="INFINITY">
+ <expression id="node2-special-case-expr" attribute="#uname"
+ operation="eq" value="node2"/>
+ </rule>
+ <nvpair id="node2-interface" name="interface" value="eth2"/>
+ <nvpair id="node2-port" name="port" value="8888"/>
+ </instance_attributes>
+ <instance_attributes id="defaults" score="1" >
+ <nvpair id="default-interface" name="interface" value="eth0"/>
+ <nvpair id="default-port" name="port" value="9999"/>
+ </instance_attributes>
+ </primitive>
+
+The order in which ``instance_attributes`` objects are evaluated is determined
+by their score (highest to lowest). If not supplied, the score defaults to
+zero. Objects with an equal score are processed in their listed order. If an
+``instance_attributes`` object has no rule, or a ``rule`` that evaluates to
+``true``, then its values are used for any parameter the resource does not yet
+have a value for.
+
+For example, given the configuration above, if the resource is placed on
+``node1``:
+
+* ``special-node1`` has the highest score (3) and so is evaluated first; its
+ rule evaluates to ``true``, so ``interface`` is set to ``eth1``.
+* ``special-node2`` is evaluated next with score 2, but its rule evaluates to
+ ``false``, so it is ignored.
+* ``defaults`` is evaluated last with score 1, and has no rule, so its values
+ are examined; ``interface`` is already defined, so the value here is not
+ used, but ``port`` is not yet defined, so ``port`` is set to ``9999``.
+
+Using Rules to Control Resource Defaults
+________________________________________
+
+Rules can be used for resource and operation defaults. The following example
+illustrates how to set a different ``resource-stickiness`` value during and
+outside work hours. This allows resources to automatically move back to their
+most preferred hosts, but at a time that (in theory) does not interfere with
+business activities.
+
+.. topic:: Change ``resource-stickiness`` during working hours
+
+ .. code-block:: xml
+
+ <rsc_defaults>
+ <meta_attributes id="core-hours" score="2">
+ <rule id="core-hour-rule" score="0">
+ <date_expression id="nine-to-five-Mon-to-Fri" operation="date_spec">
+ <date_spec id="nine-to-five-Mon-to-Fri-spec" hours="9-16" weekdays="1-5"/>
+ </date_expression>
+ </rule>
+ <nvpair id="core-stickiness" name="resource-stickiness" value="INFINITY"/>
+ </meta_attributes>
+ <meta_attributes id="after-hours" score="1" >
+ <nvpair id="after-stickiness" name="resource-stickiness" value="0"/>
+ </meta_attributes>
+ </rsc_defaults>
+
+Rules may be used similarly in ``instance_attributes`` or ``utilization``
+blocks.
+
+Any single block may directly contain only a single rule, but that rule may
+itself contain any number of rules.
+
+``rsc_expression`` and ``op_expression`` blocks may additionally be used to
+set defaults on either a single resource or across an entire class of resources
+with a single rule. ``rsc_expression`` may be used to select resource agents
+within both ``rsc_defaults`` and ``op_defaults``, while ``op_expression`` may
+only be used within ``op_defaults``. If multiple rules succeed for a given
+resource agent, the last one specified will be the one that takes effect. As
+with any other rule, boolean operations may be used to make more complicated
+expressions.
+
+.. topic:: Default all IPaddr2 resources to stopped
+
+ .. code-block:: xml
+
+ <rsc_defaults>
+ <meta_attributes id="op-target-role">
+ <rule id="op-target-role-rule" score="INFINITY">
+ <rsc_expression id="op-target-role-expr" class="ocf" provider="heartbeat"
+ type="IPaddr2"/>
+ </rule>
+ <nvpair id="op-target-role-nvpair" name="target-role" value="Stopped"/>
+ </meta_attributes>
+ </rsc_defaults>
+
+.. topic:: Default all monitor action timeouts to 7 seconds
+
+ .. code-block:: xml
+
+ <op_defaults>
+ <meta_attributes id="op-monitor-defaults">
+ <rule id="op-monitor-default-rule" score="INFINITY">
+ <op_expression id="op-monitor-default-expr" name="monitor"/>
+ </rule>
+ <nvpair id="op-monitor-timeout" name="timeout" value="7s"/>
+ </meta_attributes>
+ </op_defaults>
+
+.. topic:: Default the timeout on all 10-second-interval monitor actions on ``IPaddr2`` resources to 8 seconds
+
+ .. code-block:: xml
+
+ <op_defaults>
+ <meta_attributes id="op-monitor-and">
+ <rule id="op-monitor-and-rule" score="INFINITY">
+ <rsc_expression id="op-monitor-and-rsc-expr" class="ocf" provider="heartbeat"
+ type="IPaddr2"/>
+ <op_expression id="op-monitor-and-op-expr" name="monitor" interval="10s"/>
+ </rule>
+ <nvpair id="op-monitor-and-timeout" name="timeout" value="8s"/>
+ </meta_attributes>
+ </op_defaults>
+
+
+.. index::
+ pair: rule; cluster option
+
+Using Rules to Control Cluster Options
+______________________________________
+
+Controlling cluster options is achieved in much the same manner as specifying
+different resource options on different nodes.
+
+The following example illustrates how to set ``maintenance-mode`` during a
+scheduled maintenance window. This will keep the cluster running but not
+monitor, start, or stop resources during this time.
+
+.. topic:: Schedule a maintenance window for 9 to 11 p.m. CDT Sept. 20, 2019
+
+ .. code-block:: xml
+
+ <crm_config>
+ <cluster_property_set id="cib-bootstrap-options">
+ <nvpair id="bootstrap-stonith-enabled" name="stonith-enabled" value="1"/>
+ </cluster_property_set>
+ <cluster_property_set id="normal-set" score="10">
+ <nvpair id="normal-maintenance-mode" name="maintenance-mode" value="false"/>
+ </cluster_property_set>
+ <cluster_property_set id="maintenance-window-set" score="1000">
+ <nvpair id="maintenance-nvpair1" name="maintenance-mode" value="true"/>
+ <rule id="maintenance-rule1" score="INFINITY">
+ <date_expression id="maintenance-date1" operation="in_range"
+ start="2019-09-20 21:00:00 -05:00" end="2019-09-20 23:00:00 -05:00"/>
+ </rule>
+ </cluster_property_set>
+ </crm_config>
+
+.. important:: The ``cluster_property_set`` with an ``id`` set to
+ "cib-bootstrap-options" will *always* have the highest priority,
+ regardless of any scores. Therefore, rules in another
+ ``cluster_property_set`` can never take effect for any
+ properties listed in the bootstrap set.
diff --git a/doc/sphinx/Pacemaker_Explained/status.rst b/doc/sphinx/Pacemaker_Explained/status.rst
new file mode 100644
index 0000000..2d7dd7e
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/status.rst
@@ -0,0 +1,372 @@
+.. index::
+ single: status
+   pair: XML element; status
+
+Status -- Here be dragons
+-------------------------
+
+Most users never need to understand the contents of the status section
+and can be happy with the output from ``crm_mon``.
+
+However, for those with a curious inclination, this section attempts to
+provide an overview of its contents.
+
+.. index::
+ single: node; status
+
+Node Status
+###########
+
+In addition to the cluster's configuration, the CIB holds an
+up-to-date representation of each cluster node in the ``status`` section.
+
+.. topic:: A bare-bones status entry for a healthy node **cl-virt-1**
+
+ .. code-block:: xml
+
+ <node_state id="1" uname="cl-virt-1" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
+ <transient_attributes id="1"/>
+ <lrm id="1"/>
+ </node_state>
+
+Users are strongly advised *not* to modify any part of a node's
+state *directly*. The cluster will periodically regenerate the entire
+section from authoritative sources, so any changes should be done
+with the tools appropriate to those sources.
+
+.. table:: **Authoritative Sources for State Information**
+ :widths: 1 1
+
+ +----------------------+----------------------+
+ | CIB Object | Authoritative Source |
+ +======================+======================+
+ | node_state | pacemaker-controld |
+ +----------------------+----------------------+
+ | transient_attributes | pacemaker-attrd |
+ +----------------------+----------------------+
+ | lrm | pacemaker-execd |
+ +----------------------+----------------------+
+
+The fields used in the ``node_state`` objects are named as they are largely
+for historical reasons, rooted in Pacemaker's origins as the resource manager
+for the older Heartbeat project. They have remained unchanged to preserve
+compatibility with older versions.
+
+.. table:: **Node Status Fields**
+ :widths: 1 3
+
+ +------------------+----------------------------------------------------------+
+ | Field | Description |
+ +==================+==========================================================+
+ | id | .. index: |
+ | | single: id; node status |
+ | | single: node; status, id |
+ | | |
+ | | Unique identifier for the node. Corosync-based clusters |
+ | | use a numeric counter. |
+ +------------------+----------------------------------------------------------+
+ | uname | .. index:: |
+ | | single: uname; node status |
+ | | single: node; status, uname |
+ | | |
+ | | The node's name as known by the cluster |
+ +------------------+----------------------------------------------------------+
+ | in_ccm | .. index:: |
+ | | single: in_ccm; node status |
+ | | single: node; status, in_ccm |
+ | | |
+   |                  | Is the node a member at the cluster communication layer? |
+ | | Allowed values: ``true``, ``false``. |
+ +------------------+----------------------------------------------------------+
+ | crmd | .. index:: |
+ | | single: crmd; node status |
+ | | single: node; status, crmd |
+ | | |
+   |                  | Is the node a member at the Pacemaker layer? Allowed     |
+ | | values: ``online``, ``offline``. |
+ +------------------+----------------------------------------------------------+
+ | crm-debug-origin | .. index:: |
+ | | single: crm-debug-origin; node status |
+ | | single: node; status, crm-debug-origin |
+ | | |
+ | | The name of the source function that made the most |
+ | | recent change (for debugging purposes). |
+ +------------------+----------------------------------------------------------+
+ | join | .. index:: |
+ | | single: join; node status |
+ | | single: node; status, join |
+ | | |
+ | | Does the node participate in hosting resources? |
+   |                  | Allowed values: ``down``, ``pending``, ``member``,       |
+   |                  | ``banned``.                                              |
+ +------------------+----------------------------------------------------------+
+ | expected | .. index:: |
+ | | single: expected; node status |
+ | | single: node; status, expected |
+ | | |
+ | | Expected value for ``join``. |
+ +------------------+----------------------------------------------------------+
+
+The cluster uses these fields to determine whether, at the node level, the
+node is healthy or is in a failed state and needs to be fenced.
+
+Transient Node Attributes
+#########################
+
+Like regular :ref:`node_attributes`, the name/value
+pairs listed in the ``transient_attributes`` section help to describe the
+node. However they are forgotten by the cluster when the node goes offline.
+This can be useful, for instance, when you want a node to be in standby mode
+(not able to run resources) just until the next reboot.
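+
+For example, a command along the following lines (a sketch; substitute your
+node name) would use a transient attribute to hold a node in standby only
+until its next reboot:
+
+.. code-block:: none
+
+   # crm_attribute --node cl-virt-1 --name standby --update on --lifetime reboot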
+
+In addition to any values the administrator sets, the cluster will
+also store information about failed resources here.
+
+.. topic:: A set of transient node attributes for node **cl-virt-1**
+
+ .. code-block:: xml
+
+ <transient_attributes id="cl-virt-1">
+ <instance_attributes id="status-cl-virt-1">
+ <nvpair id="status-cl-virt-1-pingd" name="pingd" value="3"/>
+ <nvpair id="status-cl-virt-1-probe_complete" name="probe_complete" value="true"/>
+ <nvpair id="status-cl-virt-1-fail-count-pingd:0.monitor_30000" name="fail-count-pingd:0#monitor_30000" value="1"/>
+ <nvpair id="status-cl-virt-1-last-failure-pingd:0" name="last-failure-pingd:0" value="1239009742"/>
+ </instance_attributes>
+ </transient_attributes>
+
+In the above example, we can see that a monitor on the ``pingd:0`` resource has
+failed once, at 09:22:22 UTC 6 April 2009. [#]_
+
+We also see that the node is connected to three **pingd** peers and that
+all known resources have been checked for on this machine (``probe_complete``).
+
+.. index::
+ single: Operation History
+
+Operation History
+#################
+
+A node's resource history is held in the ``lrm_resources`` tag (a child
+of the ``lrm`` tag). The information stored here is sufficient for the
+cluster to stop the resource safely if it is removed from the
+``configuration`` section. Specifically, the resource's
+``id``, ``class``, ``type`` and ``provider`` are stored.
+
+.. topic:: A record of the ``apcstonith`` resource
+
+ .. code-block:: xml
+
+ <lrm_resource id="apcstonith" type="fence_apc_snmp" class="stonith"/>
+
+Additionally, we store the last job for every combination of
+``resource``, ``action`` and ``interval``. The concatenation of the values in
+this tuple is used to create the id of the ``lrm_rsc_op`` object.
+
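+For example, the recurring 30-second monitor on the ``pingd:0`` clone instance
+shown later in this chapter is stored as an ``lrm_rsc_op`` with the id
+``pingd:0_monitor_30000``.
+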
+.. table:: **Contents of an lrm_rsc_op job**
+ :class: longtable
+ :widths: 1 3
+
+ +------------------+----------------------------------------------------------+
+ | Field | Description |
+ +==================+==========================================================+
+ | id | .. index:: |
+ | | single: id; action status |
+ | | single: action; status, id |
+ | | |
+ | | Identifier for the job, constructed from the resource's |
+ | | ``id`` and the job's ``operation`` and ``interval``. |
+ +------------------+----------------------------------------------------------+
+ | call-id | .. index:: |
+ | | single: call-id; action status |
+ | | single: action; status, call-id |
+ | | |
+ | | The job's ticket number. Used as a sort key to determine |
+ | | the order in which the jobs were executed. |
+ +------------------+----------------------------------------------------------+
+ | operation | .. index:: |
+ | | single: operation; action status |
+ | | single: action; status, operation |
+ | | |
+ | | The action the resource agent was invoked with. |
+ +------------------+----------------------------------------------------------+
+ | interval | .. index:: |
+ | | single: interval; action status |
+ | | single: action; status, interval |
+ | | |
+ | | The frequency, in milliseconds, at which the operation |
+ | | will be repeated. A one-off job is indicated by 0. |
+ +------------------+----------------------------------------------------------+
+ | op-status | .. index:: |
+ | | single: op-status; action status |
+ | | single: action; status, op-status |
+ | | |
+ | | The job's status. Generally this will be either 0 (done) |
+ | | or -1 (pending). It is rarely consulted directly, as |
+ | | ``rc-code`` is normally used instead. |
+ +------------------+----------------------------------------------------------+
+ | rc-code | .. index:: |
+ | | single: rc-code; action status |
+ | | single: action; status, rc-code |
+ | | |
+ | | The job's result. Refer to the *Resource Agents* chapter |
+ | | of *Pacemaker Administration* for details on what the |
+ | | values here mean and how they are interpreted. |
+ +------------------+----------------------------------------------------------+
+ | last-rc-change | .. index:: |
+ | | single: last-rc-change; action status |
+ | | single: action; status, last-rc-change |
+ | | |
+ | | Machine-local date/time, in seconds since epoch, at |
+ | | which the job first returned the current value of |
+ | | ``rc-code``. For diagnostic purposes. |
+ +------------------+----------------------------------------------------------+
+ | exec-time | .. index:: |
+ | | single: exec-time; action status |
+ | | single: action; status, exec-time |
+ | | |
+ | | Time, in milliseconds, that the job was running for. |
+ | | For diagnostic purposes. |
+ +------------------+----------------------------------------------------------+
+ | queue-time | .. index:: |
+ | | single: queue-time; action status |
+ | | single: action; status, queue-time |
+ | | |
+ | | Time, in seconds, that the job was queued for in the |
+ | | local executor. For diagnostic purposes. |
+ +------------------+----------------------------------------------------------+
+ | crm_feature_set | .. index:: |
+ | | single: crm_feature_set; action status |
+ | | single: action; status, crm_feature_set |
+ | | |
+ | | The feature set version which this job description |
+ | | conforms to. Used when processing ``op-digest``. |
+ +------------------+----------------------------------------------------------+
+ | transition-key | .. index:: |
+ | | single: transition-key; action status |
+ | | single: action; status, transition-key |
+ | | |
+ | | A concatenation of the job's graph action number, the |
+ | | graph number, the expected result and the UUID of the |
+ | | controller instance that scheduled it. This is used to |
+ | | construct ``transition-magic`` (below). |
+ +------------------+----------------------------------------------------------+
+ | transition-magic | .. index:: |
+ | | single: transition-magic; action status |
+ | | single: action; status, transition-magic |
+ | | |
+ | | A concatenation of the job's ``op-status``, ``rc-code`` |
+ | | and ``transition-key``. Guaranteed to be unique for the |
+ | | life of the cluster (which ensures it is part of CIB |
+ | | update notifications) and contains all the information |
+ | | needed for the controller to correctly analyze and |
+ | | process the completed job. Most importantly, the |
+ | | decomposed elements tell the controller if the job |
+ | | entry was expected and whether it failed. |
+ +------------------+----------------------------------------------------------+
+ | op-digest | .. index:: |
+ | | single: op-digest; action status |
+ | | single: action; status, op-digest |
+ | | |
+ | | An MD5 sum representing the parameters passed to the |
+ | | job. Used to detect changes to the configuration, to |
+ | | restart resources if necessary. |
+ +------------------+----------------------------------------------------------+
+ | crm-debug-origin | .. index:: |
+ | | single: crm-debug-origin; action status |
+ | | single: action; status, crm-debug-origin |
+ | | |
+ | | The origin of the current values. For diagnostic |
+ | | purposes. |
+ +------------------+----------------------------------------------------------+
+
+Simple Operation History Example
+________________________________
+
+.. topic:: A monitor operation (determines current state of the ``apcstonith`` resource)
+
+ .. code-block:: xml
+
+ <lrm_resource id="apcstonith" type="fence_apc_snmp" class="stonith">
+ <lrm_rsc_op id="apcstonith_monitor_0" operation="monitor" call-id="2"
+ rc-code="7" op-status="0" interval="0"
+ crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
+ op-digest="2e3da9274d3550dc6526fb24bfcbcba0"
+ transition-key="22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"
+ transition-magic="0:7;22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"
+ last-rc-change="1239008085" exec-time="10" queue-time="0"/>
+ </lrm_resource>
+
+In the above example, the job is a non-recurring monitor operation for the
+``apcstonith`` resource, often referred to as a "probe".
+
+The cluster schedules probes for every configured resource on a node when
+the node first starts, in order to determine the resource's current state
+before it takes any further action.
+
+From the ``transition-key``, we can see that this was the 22nd action of
+the 2nd graph produced by this instance of the controller
+(2668bbeb-06d5-40f9-936d-24cb7f87006a).
+
+The third field of the ``transition-key`` contains a 7, which indicates
+that the job expects to find the resource inactive. By looking at the ``rc-code``
+property, we see that this was the case.
+
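+Laid out field by field, the key and magic from this example break down as
+follows (an annotated sketch based on the field descriptions above, not actual
+tool output):
+
+.. code-block:: none
+
+   transition-key = 22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a
+       22          graph action number
+       2           graph (transition) number
+       7           expected result (rc-code)
+       2668bbeb... UUID of the controller instance
+
+   transition-magic = 0:7;22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a
+       0           op-status (done)
+       7           rc-code (not running)
+       22:2:7:...  transition-key (as above)
+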
+As that is the only job recorded for this node, we can conclude that
+the cluster started the resource elsewhere.
+
+Complex Operation History Example
+_________________________________
+
+.. topic:: Resource history of a ``pingd`` clone with multiple jobs
+
+ .. code-block:: xml
+
+ <lrm_resource id="pingd:0" type="pingd" class="ocf" provider="pacemaker">
+ <lrm_rsc_op id="pingd:0_monitor_30000" operation="monitor" call-id="34"
+ rc-code="0" op-status="0" interval="30000"
+ crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
+ transition-key="10:11:0:2668bbeb-06d5-40f9-936d-24cb7f87006a"
+ last-rc-change="1239009741" exec-time="10" queue-time="0"/>
+ <lrm_rsc_op id="pingd:0_stop_0" operation="stop"
+ crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" call-id="32"
+ rc-code="0" op-status="0" interval="0"
+ transition-key="11:11:0:2668bbeb-06d5-40f9-936d-24cb7f87006a"
+ last-rc-change="1239009741" exec-time="10" queue-time="0"/>
+ <lrm_rsc_op id="pingd:0_start_0" operation="start" call-id="33"
+ rc-code="0" op-status="0" interval="0"
+ crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
+ transition-key="31:11:0:2668bbeb-06d5-40f9-936d-24cb7f87006a"
+ last-rc-change="1239009741" exec-time="10" queue-time="0" />
+ <lrm_rsc_op id="pingd:0_monitor_0" operation="monitor" call-id="3"
+ rc-code="0" op-status="0" interval="0"
+ crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
+ transition-key="23:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"
+ last-rc-change="1239008085" exec-time="20" queue-time="0"/>
+ </lrm_resource>
+
+When more than one job record exists, it is important to first sort
+them by ``call-id`` before interpreting them.
+
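+One way to pull these records out of a running cluster for closer inspection
+is to query the live CIB by XPath; for example (a sketch assuming
+``cibadmin``'s ``--xpath`` option and the ``pingd:0`` resource from the
+example above):
+
+.. code-block:: none
+
+   # cibadmin --query --xpath "//lrm_resource[@id='pingd:0']"
+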
+Once sorted, the above example can be summarized as:
+
+#. A non-recurring monitor operation returning 7 (not running), with a ``call-id`` of 3
+#. A stop operation returning 0 (success), with a ``call-id`` of 32
+#. A start operation returning 0 (success), with a ``call-id`` of 33
+#. A recurring monitor returning 0 (success), with a ``call-id`` of 34
+
+The cluster processes each job record to build up a picture of the
+resource's state. After the first and second entries, it is
+considered stopped, and after the third it is considered active.
+
+Based on the last operation, we can tell that the resource is
+currently active.
+
+Additionally, from the presence of a ``stop`` operation with a lower
+``call-id`` than that of the ``start`` operation, we can conclude that the
+resource has been restarted. Specifically, this occurred as part of
+actions 11 and 31 of transition 11 from the controller instance with the key
+``2668bbeb...``. This information can be helpful for locating the
+relevant section of the logs when looking for the source of a failure.
+
+.. [#] You can use the standard ``date`` command to print a human-readable version
+ of any seconds-since-epoch value, for example ``date -d @1239009742``.
diff --git a/doc/sphinx/Pacemaker_Explained/utilization.rst b/doc/sphinx/Pacemaker_Explained/utilization.rst
new file mode 100644
index 0000000..93c67cd
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/utilization.rst
@@ -0,0 +1,264 @@
+.. _utilization:
+
+Utilization and Placement Strategy
+----------------------------------
+
+Pacemaker decides where to place a resource according to the resource
+allocation scores on every node. The resource will be allocated to the
+node where the resource has the highest score.
+
+If the resource allocation scores on all the nodes are equal, then under the
+default placement strategy, Pacemaker chooses the node with the fewest
+allocated resources, to balance the load. If the number of resources on each
+node is equal, the first eligible node listed in the CIB will be chosen to run
+the resource.
+
+Often, in real-world situations, different resources use significantly
+different proportions of a node's capacities (memory, I/O, etc.).
+We cannot balance the load ideally just according to the number of resources
+allocated to a node. Moreover, if resources are placed such that their combined
+requirements exceed the provided capacity, they may fail to start entirely or
+may run with degraded performance.
+
+To take these factors into account, Pacemaker allows you to configure:
+
+#. The capacity a certain node provides.
+
+#. The capacity a certain resource requires.
+
+#. An overall strategy for placement of resources.
+
+Utilization attributes
+######################
+
+To configure the capacity that a node provides or a resource requires,
+you can use *utilization attributes* in ``node`` and ``resource`` objects.
+You can name utilization attributes according to your preferences and define as
+many name/value pairs as your configuration needs. However, the attributes'
+values must be integers.
+
+.. topic:: Specifying CPU and RAM capacities of two nodes
+
+ .. code-block:: xml
+
+ <node id="node1" type="normal" uname="node1">
+ <utilization id="node1-utilization">
+ <nvpair id="node1-utilization-cpu" name="cpu" value="2"/>
+ <nvpair id="node1-utilization-memory" name="memory" value="2048"/>
+ </utilization>
+ </node>
+ <node id="node2" type="normal" uname="node2">
+ <utilization id="node2-utilization">
+ <nvpair id="node2-utilization-cpu" name="cpu" value="4"/>
+ <nvpair id="node2-utilization-memory" name="memory" value="4096"/>
+ </utilization>
+ </node>
+
+.. topic:: Specifying CPU and RAM consumed by several resources
+
+ .. code-block:: xml
+
+ <primitive id="rsc-small" class="ocf" provider="pacemaker" type="Dummy">
+ <utilization id="rsc-small-utilization">
+ <nvpair id="rsc-small-utilization-cpu" name="cpu" value="1"/>
+ <nvpair id="rsc-small-utilization-memory" name="memory" value="1024"/>
+ </utilization>
+ </primitive>
+ <primitive id="rsc-medium" class="ocf" provider="pacemaker" type="Dummy">
+ <utilization id="rsc-medium-utilization">
+ <nvpair id="rsc-medium-utilization-cpu" name="cpu" value="2"/>
+ <nvpair id="rsc-medium-utilization-memory" name="memory" value="2048"/>
+ </utilization>
+ </primitive>
+ <primitive id="rsc-large" class="ocf" provider="pacemaker" type="Dummy">
+ <utilization id="rsc-large-utilization">
+ <nvpair id="rsc-large-utilization-cpu" name="cpu" value="3"/>
+ <nvpair id="rsc-large-utilization-memory" name="memory" value="3072"/>
+ </utilization>
+ </primitive>
+
+A node is considered eligible for a resource if it has sufficient free
+capacity to satisfy the resource's requirements. The nature of the required
+or provided capacities is completely irrelevant to Pacemaker -- it just makes
+sure that all capacity requirements of a resource are satisfied before placing
+a resource on a node.
+
+Utilization attributes used on a node object can also be *transient* *(since 2.1.6)*.
+These attributes are added to a ``transient_attributes`` section for the node
+and are forgotten by the cluster when the node goes offline. The ``attrd_updater``
+tool can be used to set these attributes.
+
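+For example, a transient ``cpu`` attribute like the one shown below might be
+set with something along these lines (assuming ``attrd_updater``'s
+``--utilization`` option is available in your version):
+
+.. code-block:: none
+
+   # attrd_updater --node cluster-1 --name cpu --update 1 --utilization
+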
+.. topic:: Transient utilization attribute for node cluster-1
+
+ .. code-block:: xml
+
+ <transient_attributes id="cluster-1">
+ <utilization id="status-cluster-1">
+ <nvpair id="status-cluster-1-cpu" name="cpu" value="1"/>
+ </utilization>
+ </transient_attributes>
+
+.. note::
+
+ Utilization is supported for bundles *(since 2.1.3)*, but only for bundles
+ with an inner primitive. Any resource utilization values should be specified
+ for the inner primitive, but any priority meta-attribute should be specified
+ for the outer bundle.
+
+
+Placement Strategy
+##################
+
+After you have configured the capacities your nodes provide and the
+capacities your resources require, you need to set the ``placement-strategy``
+in the global cluster options; otherwise, the capacity configurations have
+*no effect*.
+
+Four values are available for the ``placement-strategy``:
+
+* **default**
+
+ Utilization values are not taken into account at all.
+ Resources are allocated according to allocation scores. If scores are equal,
+ resources are evenly distributed across nodes.
+
+* **utilization**
+
+ Utilization values are taken into account *only* when deciding whether a node
+ is considered eligible (i.e. whether it has sufficient free capacity to satisfy
+ the resource's requirements). Load-balancing is still done based on the
+ number of resources allocated to a node.
+
+* **balanced**
+
+ Utilization values are taken into account when deciding whether a node
+ is eligible to serve a resource *and* when load-balancing, so an attempt is
+ made to spread the resources in a way that optimizes resource performance.
+
+* **minimal**
+
+ Utilization values are taken into account *only* when deciding whether a node
+ is eligible to serve a resource. For load-balancing, an attempt is made to
+ concentrate the resources on as few nodes as possible, thereby enabling
+ possible power savings on the remaining nodes.
+
+Set ``placement-strategy`` with ``crm_attribute``:
+
+ .. code-block:: none
+
+ # crm_attribute --name placement-strategy --update balanced
+
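+The setting can be confirmed afterwards with
+``crm_attribute --name placement-strategy --query``.
+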
+Now Pacemaker will ensure that the load from your resources is distributed
+evenly throughout the cluster, without the need for convoluted sets of
+colocation constraints.
+
+Allocation Details
+##################
+
+Which node is preferred to get consumed first when allocating resources?
+________________________________________________________________________
+
+* The node with the highest node weight gets consumed first. Node weight
+ is a score maintained by the cluster to represent node health.
+
+* If multiple nodes have the same node weight:
+
+ * If ``placement-strategy`` is ``default`` or ``utilization``,
+ the node that has the least number of allocated resources gets consumed first.
+
+ * If their numbers of allocated resources are equal,
+ the first eligible node listed in the CIB gets consumed first.
+
+ * If ``placement-strategy`` is ``balanced``,
+ the node that has the most free capacity gets consumed first.
+
+ * If the free capacities of the nodes are equal,
+ the node that has the least number of allocated resources gets consumed first.
+
+ * If their numbers of allocated resources are equal,
+ the first eligible node listed in the CIB gets consumed first.
+
+ * If ``placement-strategy`` is ``minimal``,
+ the first eligible node listed in the CIB gets consumed first.
+
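+As an example of these rules: with the ``balanced`` strategy, equal node
+weights, and no resources allocated yet, ``node2`` from the earlier example
+(4 cpu, 4096 memory) would be consumed before ``node1`` (2 cpu, 2048 memory),
+since it has more free capacity of both types.
+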
+Which node has more free capacity?
+__________________________________
+
+If only one type of utilization attribute has been defined, free capacity
+is a simple numeric comparison.
+
+If multiple types of utilization attributes have been defined, then
+the node that is numerically highest in the most attribute types
+has the most free capacity. For example:
+
+* If ``nodeA`` has more free ``cpus``, and ``nodeB`` has more free ``memory``,
+ then their free capacities are equal.
+
+* If ``nodeA`` has more free ``cpus``, while ``nodeB`` has more free ``memory``
+ and ``storage``, then ``nodeB`` has more free capacity.
+
+Which resource is preferred to be assigned first?
+_________________________________________________
+
+* The resource that has the highest ``priority`` (see :ref:`resource_options`) gets
+ allocated first.
+
+* If their priorities are equal, check whether they are already running. The
+ resource that has the highest score on the node where it's running gets allocated
+ first, to prevent resource shuffling.
+
+* If the scores above are equal or the resources are not running, the resource
+ that has the highest score on the preferred node gets allocated first.
+
+* If the scores above are equal, the first runnable resource listed in the CIB
+ gets allocated first.
+
+Limitations and Workarounds
+###########################
+
+The type of problem Pacemaker is dealing with here is known as the
+`knapsack problem <http://en.wikipedia.org/wiki/Knapsack_problem>`_ and falls into
+the `NP-complete <http://en.wikipedia.org/wiki/NP-complete>`_ category of computer
+science problems -- a fancy way of saying "it takes a really long time
+to solve exactly".
+
+Clearly, in an HA cluster, it's not acceptable to spend minutes, let alone hours
+or days, finding an optimal solution while services remain unavailable.
+
+So instead of trying to solve the problem completely, Pacemaker uses a
+*best effort* algorithm for determining which node should host a particular
+service. This means it arrives at a solution much faster than traditional
+linear programming algorithms, but at the price of potentially leaving some
+services stopped.
+
+In the contrived example at the start of this chapter:
+
+* ``rsc-small`` would be allocated to ``node1``
+
+* ``rsc-medium`` would be allocated to ``node2``
+
+* ``rsc-large`` would remain inactive
+
+Which is not ideal.
+
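+An optimal solution does exist here: ``rsc-large`` and ``rsc-small`` together
+fit on ``node2`` (3+1 cpu, 3072+1024 memory), leaving ``node1`` free for
+``rsc-medium``, so all three resources could run. The best-effort algorithm
+simply does not search long enough to find that packing.
+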
+There are various approaches to dealing with the limitations of
+Pacemaker's placement strategy:
+
+* **Ensure you have sufficient physical capacity.**
+
+ It might sound obvious, but if the physical capacity of your nodes is (close to)
+ maxed out by the cluster under normal conditions, then failover isn't going to
+ go well. Even without the utilization feature, you'll start hitting timeouts and
+ getting secondary failures.
+
+* **Build some buffer into the capabilities advertised by the nodes.**
+
+ Advertise slightly more resources than the nodes physically have, on the
+ (usually valid) assumption that a resource will not use 100% of the configured
+ amount of CPU, memory and so forth *all* the time. This practice is sometimes
+ called *overcommit*.
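+
+ For example, a node with two physical cores might advertise ``cpu=3`` if the
+ resources sharing it rarely peak at the same time.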
+
+* **Specify resource priorities.**
+
+ If the cluster is going to sacrifice services, it should be the ones you care
+ about (comparatively) the least. Ensure that resource priorities are properly set
+ so that your most important resources are scheduled first.