author    Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-17 06:53:20 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-17 06:53:20 +0000
commit    e5a812082ae033afb1eed82c0f2df3d0f6bdc93f (patch)
tree      a6716c9275b4b413f6c9194798b34b91affb3cc7 /doc/sphinx/Pacemaker_Explained/resources.rst
parent    Initial commit. (diff)
Adding upstream version 2.1.6. (upstream/2.1.6)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/sphinx/Pacemaker_Explained/resources.rst')
-rw-r--r--  doc/sphinx/Pacemaker_Explained/resources.rst  1074
1 files changed, 1074 insertions, 0 deletions
diff --git a/doc/sphinx/Pacemaker_Explained/resources.rst b/doc/sphinx/Pacemaker_Explained/resources.rst
new file mode 100644
index 0000000..3b7520f
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Explained/resources.rst
@@ -0,0 +1,1074 @@
+.. _resource:
+
+Cluster Resources
+-----------------
+
+.. _s-resource-primitive:
+
+What is a Cluster Resource?
+###########################
+
+.. index::
+ single: resource
+
+A *resource* is a service managed by Pacemaker. The simplest type of resource,
+a *primitive*, is described in this chapter. More complex forms, such as groups
+and clones, are described in later chapters.
+
+Every primitive has a *resource agent* that provides Pacemaker a standardized
+interface for managing the service. This allows Pacemaker to be agnostic about
+the services it manages. Pacemaker doesn't need to understand how the service
+works because it relies on the resource agent to do the right thing when asked.
+
+Every resource has a *class* specifying the standard that its resource agent
+follows, and a *type* identifying the specific service being managed.
+
+
+.. _s-resource-supported:
+
+.. index::
+ single: resource; class
+
+Resource Classes
+################
+
+Pacemaker supports several classes, or standards, of resource agents:
+
+* OCF
+* LSB
+* Systemd
+* Service
+* Fencing
+* Nagios *(deprecated since 2.1.6)*
+* Upstart *(deprecated since 2.1.0)*
+
+
+.. index::
+ single: resource; OCF
+ single: OCF; resources
+ single: Open Cluster Framework; resources
+
+Open Cluster Framework
+______________________
+
+The Open Cluster Framework (OCF) Resource Agent API is a ClusterLabs
+standard for managing services. It is the preferred class, since it is
+specifically designed for use in a Pacemaker cluster.
+
+OCF agents are scripts that support a variety of actions including ``start``,
+``stop``, and ``monitor``. They may accept parameters, making them more
+flexible than other classes. The number and purpose of parameters are left to
+the agent, which advertises them via the ``meta-data`` action.
+
+Unlike other classes, OCF agents have a *provider* as well as a class and type.
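+
+The standards, OCF providers, and agents available on a node, as well as an
+agent's meta-data, can be listed with ``crm_resource``. A brief sketch (the
+``ocf:heartbeat:IPaddr2`` agent is assumed to be installed):
+
+.. code-block:: none
+
+   # crm_resource --list-standards
+   # crm_resource --list-ocf-providers
+   # crm_resource --list-agents ocf:heartbeat
+   # crm_resource --show-metadata ocf:heartbeat:IPaddr2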
+
+For more information, see the "Resource Agents" chapter of *Pacemaker
+Administration* and the `OCF standard
+<https://github.com/ClusterLabs/OCF-spec/tree/main/ra>`_.
+
+
+.. _s-resource-supported-systemd:
+
+.. index::
+ single: Resource; Systemd
+ single: Systemd; resources
+
+Systemd
+_______
+
+Most Linux distributions use `Systemd
+<http://www.freedesktop.org/wiki/Software/systemd>`_ for system initialization
+and service management. *Unit files* specify how to manage services and are
+usually provided by the distribution.
+
+Pacemaker can manage systemd services. Simply create a resource with
+``systemd`` as the resource class and the unit file name as the resource type.
+Do *not* run ``systemctl enable`` on the unit.
+
+.. important::
+
+ Make sure that any systemd services to be controlled by the cluster are
+ *not* enabled to start at boot.
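+
+For example, a unit file named ``httpd.service`` (an assumed name; use
+whatever unit exists on your nodes) could be managed with:
+
+.. topic:: A systemd resource definition
+
+   .. code-block:: xml
+
+      <primitive id="Website" class="systemd" type="httpd"/>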
+
+
+.. index::
+ single: resource; LSB
+ single: LSB; resources
+ single: Linux Standard Base; resources
+
+Linux Standard Base
+___________________
+
+*LSB* resource agents, also known as `SysV-style init scripts
+<https://en.wikipedia.org/wiki/Init#SysV-style_init_scripts>`_, are scripts that
+provide start, stop, and status actions for a service.
+
+They are provided by some operating system distributions. If a full path is not
+given, they are assumed to be located in a directory specified when your
+Pacemaker software was built (usually ``/etc/init.d``).
+
+In order to be used with Pacemaker, they must conform to the `LSB specification
+<http://refspecs.linux-foundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html>`_
+as it relates to init scripts.
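+
+For example, an init script installed as ``/etc/init.d/postfix`` (a
+hypothetical name) could be configured as:
+
+.. topic:: An LSB resource definition
+
+   .. code-block:: xml
+
+      <primitive id="Mail" class="lsb" type="postfix"/>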
+
+.. warning::
+
+   Some LSB scripts do not fully comply with the standard. For details on how
+   to check whether your script is LSB-compatible, see the "Resource Agents"
+   chapter of *Pacemaker Administration*. Common problems include:
+
+   * Not implementing the ``status`` action
+   * Not observing the correct exit status codes
+   * Returning an error when starting an already-started resource
+   * Returning an error when stopping an already-stopped resource
+
+.. important::
+
+ Make sure the host is *not* configured to start any LSB services at boot
+ that will be controlled by the cluster.
+
+
+.. index::
+ single: Resource; System Services
+ single: System Service; resources
+
+System Services
+_______________
+
+Since there are various types of system services (``systemd``,
+``upstart``, and ``lsb``), Pacemaker supports a special ``service`` alias that
+figures out which one applies to a given cluster node.
+
+This is particularly useful when the cluster contains a mix of nodes using
+``systemd``, ``upstart``, and ``lsb``.
+
+In order, Pacemaker will try to find the named service as:
+
+* an LSB init script
+* a Systemd unit file
+* an Upstart job
+
+
+.. index::
+ single: Resource; STONITH
+ single: STONITH; resources
+
+STONITH
+_______
+
+The ``stonith`` class is used for managing fencing devices, discussed later in
+:ref:`fencing`.
+
+
+.. index::
+ single: Resource; Nagios Plugins
+ single: Nagios Plugins; resources
+
+Nagios Plugins
+______________
+
+Nagios Plugins are a way to monitor services. Pacemaker can use these as
+resources, to react to a change in the service's status.
+
+To use plugins as resources, Pacemaker must have been built with support for
+them, and OCF-style meta-data for the plugins must be installed on nodes that
+can run them. Meta-data for several common plugins is provided by the
+`nagios-agents-metadata <https://github.com/ClusterLabs/nagios-agents-metadata>`_
+project.
+
+The supported parameters for such a resource are the same as the long options
+of the plugin.
+
+Start and monitor actions for plugin resources are implemented as invoking the
+plugin. A plugin result of "OK" (0) is treated as success, a result of "WARN"
+(1) is treated as a successful but degraded service, and any other result is
+considered a failure.
+
+A plugin resource will not change its status after recovery by restarting the
+plugin, so using plugin resources alone makes little sense with ``on-fail``
+set to (or left at its default of) ``restart``. Another value could make
+sense, for example, if you want to fence or standby nodes that cannot reach
+some external service.
+
+A more common use case for plugin resources is to configure them with a
+``container`` meta-attribute set to the name of another resource that actually
+makes the service available, such as a virtual machine or container.
+
+With ``container`` set, the plugin resource will automatically be colocated
+with the containing resource and ordered after it, and the containing resource
+will be considered failed if the plugin resource fails. This allows monitoring
+of a service inside a virtual machine or container, with recovery of the
+virtual machine or container if the service fails.
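+
+For example, a TCP check of a web service running inside a virtual machine
+could look like the following sketch (the ``check_tcp`` plugin and the
+``vm1`` resource are assumptions for illustration):
+
+.. topic:: A Nagios plugin resource monitoring a service inside a virtual machine
+
+   .. code-block:: xml
+
+      <primitive id="vm1-web" class="nagios" type="check_tcp">
+        <instance_attributes id="vm1-web-params">
+          <nvpair id="vm1-web-hostname" name="hostname" value="192.0.2.5"/>
+          <nvpair id="vm1-web-port" name="port" value="80"/>
+        </instance_attributes>
+        <meta_attributes id="vm1-web-meta">
+          <nvpair id="vm1-web-container" name="container" value="vm1"/>
+        </meta_attributes>
+      </primitive>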
+
+.. warning::
+
+ Nagios support is deprecated in Pacemaker. Support will be dropped entirely
+ at the next major release of Pacemaker.
+
+ For monitoring a service inside a virtual machine or container, the
+ recommended alternative is to configure the virtual machine as a guest node
+ or the container as a :ref:`bundle <s-resource-bundle>`. For other use
+ cases, or when the virtual machine or container image cannot be modified,
+ the recommended alternative is to write a custom OCF agent for the service
+ (which may even call the Nagios plugin as part of its status action).
+
+
+.. index::
+ single: Resource; Upstart
+ single: Upstart; resources
+
+Upstart
+_______
+
+Some Linux distributions previously used `Upstart
+<https://upstart.ubuntu.com/>`_ for system initialization and service
+management. Pacemaker is able to manage services using Upstart if the local
+system supports it and support was enabled when your Pacemaker software was
+built.
+
+The *jobs* that specify how services are managed are usually provided by the
+operating system distribution.
+
+.. important::
+
+ Make sure the host is *not* configured to start any Upstart services at boot
+ that will be controlled by the cluster.
+
+.. warning::
+
+ Upstart support is deprecated in Pacemaker. Upstart is no longer actively
+ maintained, and test platforms for it are no longer readily usable. Support
+ will be dropped entirely at the next major release of Pacemaker.
+
+
+.. _primitive-resource:
+
+Resource Properties
+###################
+
+These values tell the cluster which resource agent to use for the resource,
+where to find that resource agent, and what standard it conforms to.
+
+.. table:: **Properties of a Primitive Resource**
+ :widths: 1 4
+
+ +-------------+------------------------------------------------------------------+
+ | Field | Description |
+ +=============+==================================================================+
+ | id | .. index:: |
+ | | single: id; resource |
+ | | single: resource; property, id |
+ | | |
+ | | Your name for the resource |
+ +-------------+------------------------------------------------------------------+
+ | class | .. index:: |
+ | | single: class; resource |
+ | | single: resource; property, class |
+ | | |
+ | | The standard the resource agent conforms to. Allowed values: |
+ | | ``lsb``, ``ocf``, ``service``, ``stonith``, ``systemd``, |
+ | | ``nagios`` *(deprecated since 2.1.6)*, and ``upstart`` |
+ | | *(deprecated since 2.1.0)* |
+ +-------------+------------------------------------------------------------------+
+ | description | .. index:: |
+ | | single: description; resource |
+ | | single: resource; property, description |
+ | | |
+ | | A description of the Resource Agent, intended for local use. |
+ | | E.g. ``IP address for website`` |
+ +-------------+------------------------------------------------------------------+
+ | type | .. index:: |
+ | | single: type; resource |
+ | | single: resource; property, type |
+ | | |
+ | | The name of the Resource Agent you wish to use. E.g. |
+ | | ``IPaddr`` or ``Filesystem`` |
+ +-------------+------------------------------------------------------------------+
+ | provider | .. index:: |
+ | | single: provider; resource |
+ | | single: resource; property, provider |
+ | | |
+ | | The OCF spec allows multiple vendors to supply the same resource |
+ | | agent. To use the OCF resource agents supplied by the Heartbeat |
+ | | project, you would specify ``heartbeat`` here. |
+ +-------------+------------------------------------------------------------------+
+
+The XML definition of a resource can be queried with the **crm_resource** tool.
+For example:
+
+.. code-block:: none
+
+ # crm_resource --resource Email --query-xml
+
+might produce:
+
+.. topic:: A system resource definition
+
+ .. code-block:: xml
+
+ <primitive id="Email" class="service" type="exim"/>
+
+.. note::
+
+   One of the main drawbacks of system service (LSB, systemd or
+   Upstart) resources is that they do not allow any parameters!
+
+.. topic:: An OCF resource definition
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <instance_attributes id="Public-IP-params">
+ <nvpair id="Public-IP-ip" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+.. _resource_options:
+
+Resource Options
+################
+
+Resources have two types of options: *meta-attributes* and *instance attributes*.
+Meta-attributes apply to any type of resource, while instance attributes
+are specific to each resource agent.
+
+Resource Meta-Attributes
+________________________
+
+Meta-attributes are used by the cluster to decide how a resource should
+behave and can be easily set using the ``--meta`` option of the
+**crm_resource** command.
+
+.. table:: **Meta-attributes of a Primitive Resource**
+ :class: longtable
+ :widths: 2 2 3
+
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | Field | Default | Description |
+ +============================+==================================+======================================================+
+ | priority | 0 | .. index:: |
+ | | | single: priority; resource option |
+ | | | single: resource; option, priority |
+ | | | |
+ | | | If not all resources can be active, the cluster |
+ | | | will stop lower priority resources in order to |
+ | | | keep higher priority ones active. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | critical | true | .. index:: |
+ | | | single: critical; resource option |
+ | | | single: resource; option, critical |
+ | | | |
+ | | | Use this value as the default for ``influence`` in |
+ | | | all :ref:`colocation constraints |
+ | | | <s-resource-colocation>` involving this resource, |
+ | | | as well as the implicit colocation constraints |
+ | | | created if this resource is in a :ref:`group |
+ | | | <group-resources>`. For details, see |
+ | | | :ref:`s-coloc-influence`. *(since 2.1.0)* |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | target-role | Started | .. index:: |
+ | | | single: target-role; resource option |
+ | | | single: resource; option, target-role |
+ | | | |
+ | | | What state should the cluster attempt to keep this |
+ | | | resource in? Allowed values: |
+ | | | |
+ | | | * ``Stopped:`` Force the resource to be stopped |
+ | | | * ``Started:`` Allow the resource to be started |
+ | | | (and in the case of :ref:`promotable clone |
+ | | | resources <s-resource-promotable>`, promoted |
+ | | | if appropriate) |
+ | | | * ``Unpromoted:`` Allow the resource to be started, |
+ | | | but only in the unpromoted role if the resource is |
+ | | | :ref:`promotable <s-resource-promotable>` |
+ | | | * ``Promoted:`` Equivalent to ``Started`` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | is-managed | TRUE | .. index:: |
+ | | | single: is-managed; resource option |
+ | | | single: resource; option, is-managed |
+ | | | |
+ | | | Is the cluster allowed to start and stop |
+ | | | the resource? Allowed values: ``true``, ``false`` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | maintenance | FALSE | .. index:: |
+ | | | single: maintenance; resource option |
+ | | | single: resource; option, maintenance |
+ | | | |
+ | | | Similar to the ``maintenance-mode`` |
+ | | | :ref:`cluster option <cluster_options>`, but for |
+ | | | a single resource. If true, the resource will not |
+ | | | be started, stopped, or monitored on any node. This |
+ | | | differs from ``is-managed`` in that monitors will |
+ | | | not be run. Allowed values: ``true``, ``false`` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | resource-stickiness | 1 for individual clone | .. _resource-stickiness: |
+ | | instances, 0 for all | |
+ | | other resources | .. index:: |
+ | | | single: resource-stickiness; resource option |
+ | | | single: resource; option, resource-stickiness |
+ | | | |
+ | | | A score that will be added to the current node when |
+ | | | a resource is already active. This allows running |
+ | | | resources to stay where they are, even if they |
+ | | | would be placed elsewhere if they were being |
+ | | | started from a stopped state. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | requires | ``quorum`` for resources | .. _requires: |
+ | | with a ``class`` of ``stonith``, | |
+ | | otherwise ``unfencing`` if | .. index:: |
+ | | unfencing is active in the | single: requires; resource option |
+ | | cluster, otherwise ``fencing`` | single: resource; option, requires |
+ | | if ``stonith-enabled`` is true, | |
+ | | otherwise ``quorum`` | Conditions under which the resource can be |
+ | | | started. Allowed values: |
+ | | | |
+ | | | * ``nothing:`` can always be started |
+ | | | * ``quorum:`` The cluster can only start this |
+ | | | resource if a majority of the configured nodes |
+ | | | are active |
+ | | | * ``fencing:`` The cluster can only start this |
+ | | | resource if a majority of the configured nodes |
+ | | | are active *and* any failed or unknown nodes |
+ | | | have been :ref:`fenced <fencing>` |
+ | | | * ``unfencing:`` The cluster can only start this |
+ | | | resource if a majority of the configured nodes |
+ | | | are active *and* any failed or unknown nodes have |
+ | | | been fenced *and* only on nodes that have been |
+ | | | :ref:`unfenced <unfencing>` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | migration-threshold | INFINITY | .. index:: |
+ | | | single: migration-threshold; resource option |
+ | | | single: resource; option, migration-threshold |
+ | | | |
+ | | | How many failures may occur for this resource on |
+ | | | a node, before this node is marked ineligible to |
+ | | | host this resource. A value of 0 indicates that this |
+ | | | feature is disabled (the node will never be marked |
+   |                            |                                  | ineligible); by contrast, the cluster treats         |
+ | | | INFINITY (the default) as a very large but finite |
+ | | | number. This option has an effect only if the |
+ | | | failed operation specifies ``on-fail`` as |
+ | | | ``restart`` (the default), and additionally for |
+ | | | failed ``start`` operations, if the cluster |
+ | | | property ``start-failure-is-fatal`` is ``false``. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | failure-timeout | 0 | .. index:: |
+ | | | single: failure-timeout; resource option |
+ | | | single: resource; option, failure-timeout |
+ | | | |
+ | | | How many seconds to wait before acting as if the |
+ | | | failure had not occurred, and potentially allowing |
+ | | | the resource back to the node on which it failed. |
+ | | | A value of 0 indicates that this feature is |
+ | | | disabled. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | multiple-active | stop_start | .. index:: |
+ | | | single: multiple-active; resource option |
+ | | | single: resource; option, multiple-active |
+ | | | |
+ | | | What should the cluster do if it ever finds the |
+ | | | resource active on more than one node? Allowed |
+ | | | values: |
+ | | | |
+ | | | * ``block``: mark the resource as unmanaged |
+ | | | * ``stop_only``: stop all active instances and |
+ | | | leave them that way |
+ | | | * ``stop_start``: stop all active instances and |
+ | | | start the resource in one location only |
+ | | | * ``stop_unexpected``: stop all active instances |
+ | | | except where the resource should be active (this |
+ | | | should be used only when extra instances are not |
+ | | | expected to disrupt existing instances, and the |
+ | | | resource agent's monitor of an existing instance |
+ | | | is capable of detecting any problems that could be |
+ | | | caused; note that any resources ordered after this |
+ | | | will still need to be restarted) *(since 2.1.3)* |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | allow-migrate | TRUE for ocf:pacemaker:remote | Whether the cluster should try to "live migrate" |
+ | | resources, FALSE otherwise | this resource when it needs to be moved (see |
+ | | | :ref:`live-migration`) |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | allow-unhealthy-nodes | FALSE | Whether the resource should be able to run on a node |
+ | | | even if the node's health score would otherwise |
+ | | | prevent it (see :ref:`node-health`) *(since 2.1.3)* |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | container-attribute-target | | Specific to bundle resources; see |
+ | | | :ref:`s-bundle-attributes` |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | remote-node | | The name of the Pacemaker Remote guest node this |
+ | | | resource is associated with, if any. If |
+ | | | specified, this both enables the resource as a |
+ | | | guest node and defines the unique name used to |
+ | | | identify the guest node. The guest must be |
+ | | | configured to run the Pacemaker Remote daemon |
+ | | | when it is started. **WARNING:** This value |
+ | | | cannot overlap with any resource or node IDs. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | remote-port | 3121 | If ``remote-node`` is specified, the port on the |
+ | | | guest used for its Pacemaker Remote connection. |
+ | | | The Pacemaker Remote daemon on the guest must |
+ | | | be configured to listen on this port. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | remote-addr | value of ``remote-node`` | If ``remote-node`` is specified, the IP |
+ | | | address or hostname used to connect to the |
+ | | | guest via Pacemaker Remote. The Pacemaker Remote |
+ | | | daemon on the guest must be configured to accept |
+ | | | connections on this address. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+ | remote-connect-timeout | 60s | If ``remote-node`` is specified, how long before |
+ | | | a pending guest connection will time out. |
+ +----------------------------+----------------------------------+------------------------------------------------------+
+
+As an example of setting resource options, if you performed the following
+commands on an LSB Email resource:
+
+.. code-block:: none
+
+ # crm_resource --meta --resource Email --set-parameter priority --parameter-value 100
+ # crm_resource -m -r Email -p multiple-active -v block
+
+the resulting resource definition might be:
+
+.. topic:: An LSB resource with cluster options
+
+ .. code-block:: xml
+
+ <primitive id="Email" class="lsb" type="exim">
+ <meta_attributes id="Email-meta_attributes">
+ <nvpair id="Email-meta_attributes-priority" name="priority" value="100"/>
+ <nvpair id="Email-meta_attributes-multiple-active" name="multiple-active" value="block"/>
+ </meta_attributes>
+ </primitive>
+
+In addition to the cluster-defined meta-attributes described above, you may
+also configure arbitrary meta-attributes of your own choosing. Most commonly,
+this would be done for use in :ref:`rules <rules>`. For example, an IT department
+might define a custom meta-attribute to indicate which company department each
+resource is intended for. To reduce the chance of name collisions with
+cluster-defined meta-attributes added in the future, it is recommended to use
+a unique, organization-specific prefix for such attributes.
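+
+Custom meta-attributes are set like any other. In this sketch,
+``examplecorp-dept`` is a hypothetical organization-specific attribute name:
+
+.. code-block:: none
+
+   # crm_resource --meta --resource Email --set-parameter examplecorp-dept --parameter-value finance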
+
+.. _s-resource-defaults:
+
+Setting Global Defaults for Resource Meta-Attributes
+____________________________________________________
+
+To set a default value for a resource option, add it to the
+``rsc_defaults`` section with ``crm_attribute``. For example,
+
+.. code-block:: none
+
+ # crm_attribute --type rsc_defaults --name is-managed --update false
+
+would prevent the cluster from starting or stopping any of the
+resources in the configuration (unless of course the individual
+resources were specifically enabled by having their ``is-managed`` set to
+``true``).
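+
+The currently configured default can be checked with the same tool. A
+resource's own ``is-managed`` setting always overrides the default:
+
+.. code-block:: none
+
+   # crm_attribute --type rsc_defaults --name is-managed --query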
+
+Resource Instance Attributes
+____________________________
+
+The resource agents of some resource classes (``lsb``, ``systemd``, and
+``upstart`` *not* among them) can be given parameters, which determine how
+they behave and which instance of a service they control.
+
+If your resource agent supports parameters, you can add them with the
+``crm_resource`` command. For example,
+
+.. code-block:: none
+
+ # crm_resource --resource Public-IP --set-parameter ip --parameter-value 192.0.2.2
+
+would create an entry in the resource like this:
+
+.. topic:: An example OCF resource with instance attributes
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+For an OCF resource, the result would be an environment variable
+called ``OCF_RESKEY_ip`` with a value of ``192.0.2.2``.
+
+The list of instance attributes supported by an OCF resource agent can be
+found by calling the resource agent with the ``meta-data`` command.
+The output contains an XML description of all the supported
+attributes, their purpose and default values.
+
+.. topic:: Displaying the metadata for the Dummy resource agent template
+
+ .. code-block:: none
+
+ # export OCF_ROOT=/usr/lib/ocf
+ # $OCF_ROOT/resource.d/pacemaker/Dummy meta-data
+
+ .. code-block:: xml
+
+ <?xml version="1.0"?>
+ <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
+ <resource-agent name="Dummy" version="2.0">
+ <version>1.1</version>
+
+ <longdesc lang="en">
+ This is a dummy OCF resource agent. It does absolutely nothing except keep track
+ of whether it is running or not, and can be configured so that actions fail or
+ take a long time. Its purpose is primarily for testing, and to serve as a
+ template for resource agent writers.
+ </longdesc>
+ <shortdesc lang="en">Example stateless resource agent</shortdesc>
+
+ <parameters>
+ <parameter name="state" unique-group="state">
+ <longdesc lang="en">
+ Location to store the resource state in.
+ </longdesc>
+ <shortdesc lang="en">State file</shortdesc>
+ <content type="string" default="/var/run/Dummy-RESOURCE_ID.state" />
+ </parameter>
+
+ <parameter name="passwd" reloadable="1">
+ <longdesc lang="en">
+ Fake password field
+ </longdesc>
+ <shortdesc lang="en">Password</shortdesc>
+ <content type="string" default="" />
+ </parameter>
+
+ <parameter name="fake" reloadable="1">
+ <longdesc lang="en">
+ Fake attribute that can be changed to cause a reload
+ </longdesc>
+ <shortdesc lang="en">Fake attribute that can be changed to cause a reload</shortdesc>
+ <content type="string" default="dummy" />
+ </parameter>
+
+ <parameter name="op_sleep" reloadable="1">
+ <longdesc lang="en">
+ Number of seconds to sleep during operations. This can be used to test how
+ the cluster reacts to operation timeouts.
+ </longdesc>
+ <shortdesc lang="en">Operation sleep duration in seconds.</shortdesc>
+ <content type="string" default="0" />
+ </parameter>
+
+ <parameter name="fail_start_on" reloadable="1">
+ <longdesc lang="en">
+ Start, migrate_from, and reload-agent actions will return failure if running on
+ the host specified here, but the resource will run successfully anyway (future
+ monitor calls will find it running). This can be used to test on-fail=ignore.
+ </longdesc>
+ <shortdesc lang="en">Report bogus start failure on specified host</shortdesc>
+ <content type="string" default="" />
+ </parameter>
+ <parameter name="envfile" reloadable="1">
+ <longdesc lang="en">
+ If this is set, the environment will be dumped to this file for every call.
+ </longdesc>
+ <shortdesc lang="en">Environment dump file</shortdesc>
+ <content type="string" default="" />
+ </parameter>
+
+ </parameters>
+
+ <actions>
+ <action name="start" timeout="20s" />
+ <action name="stop" timeout="20s" />
+ <action name="monitor" timeout="20s" interval="10s" depth="0"/>
+ <action name="reload" timeout="20s" />
+ <action name="reload-agent" timeout="20s" />
+ <action name="migrate_to" timeout="20s" />
+ <action name="migrate_from" timeout="20s" />
+ <action name="validate-all" timeout="20s" />
+ <action name="meta-data" timeout="5s" />
+ </actions>
+ </resource-agent>
+
+.. index::
+ single: resource; action
+ single: resource; operation
+
+.. _operation:
+
+Resource Operations
+###################
+
+*Operations* are actions the cluster can perform on a resource by calling the
+resource agent. Resource agents must support certain common operations such as
+start, stop, and monitor, and may implement any others.
+
+Operations may be explicitly configured for two purposes: to override defaults
+for options (such as timeout) that the cluster will use whenever it initiates
+the operation, and to run an operation on a recurring basis (for example, to
+monitor the resource for failure).
+
+.. topic:: An OCF resource with a non-default start timeout
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="Public-IP-start" name="start" timeout="60s"/>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+Pacemaker identifies operations by a combination of name and interval, so this
+combination must be unique for each resource. That is, you should not configure
+two operations for the same resource with the same name and interval.
+
+.. _operation_properties:
+
+Operation Properties
+____________________
+
+Operation properties may be specified directly in the ``op`` element as
+XML attributes, or in a separate ``meta_attributes`` block as ``nvpair`` elements.
+XML attributes take precedence over ``nvpair`` elements if both are specified.
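+
+For example, the following two forms are equivalent sketches of a 60-second
+``start`` timeout (when both are present, the XML attribute wins):
+
+.. topic:: Equivalent ways of specifying an operation property
+
+   .. code-block:: xml
+
+      <op id="Public-IP-start" name="start" timeout="60s"/>
+
+      <op id="Public-IP-start" name="start">
+        <meta_attributes id="Public-IP-start-meta">
+          <nvpair id="Public-IP-start-timeout" name="timeout" value="60s"/>
+        </meta_attributes>
+      </op>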
+
+.. table:: **Properties of an Operation**
+ :class: longtable
+ :widths: 1 2 3
+
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | Field | Default | Description |
+ +================+===================================+=====================================================+
+ | id | | .. index:: |
+ | | | single: id; action property |
+ | | | single: action; property, id |
+ | | | |
+ | | | A unique name for the operation. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | name | | .. index:: |
+ | | | single: name; action property |
+ | | | single: action; property, name |
+ | | | |
+ | | | The action to perform. This can be any action |
+ | | | supported by the agent; common values include |
+ | | | ``monitor``, ``start``, and ``stop``. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | interval | 0 | .. index:: |
+ | | | single: interval; action property |
+ | | | single: action; property, interval |
+ | | | |
+ | | | How frequently (in seconds) to perform the |
+ | | | operation. A value of 0 means "when needed". |
+ | | | A positive value defines a *recurring action*, |
+ | | | which is typically used with |
+ | | | :ref:`monitor <s-resource-monitoring>`. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | timeout | | .. index:: |
+ | | | single: timeout; action property |
+ | | | single: action; property, timeout |
+ | | | |
+ | | | How long to wait before declaring the action |
+ | | | has failed |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | on-fail | Varies by action: | .. index:: |
+ | | | single: on-fail; action property |
+ | | * ``stop``: ``fence`` if | single: action; property, on-fail |
+ | | ``stonith-enabled`` is true | |
+ | | or ``block`` otherwise | The action to take if this action ever fails. |
+ | | * ``demote``: ``on-fail`` of the | Allowed values: |
+ | | ``monitor`` action with | |
+ | | ``role`` set to ``Promoted``, | * ``ignore:`` Pretend the resource did not fail. |
+ | | if present, enabled, and | * ``block:`` Don't perform any further operations |
+ | | configured to a value other | on the resource. |
+ | | than ``demote``, or ``restart`` | * ``stop:`` Stop the resource and do not start |
+ | | otherwise | it elsewhere. |
+ | | * all other actions: ``restart`` | * ``demote:`` Demote the resource, without a |
+ | | | full restart. This is valid only for ``promote`` |
+ | | | actions, and for ``monitor`` actions with both |
+ | | | a nonzero ``interval`` and ``role`` set to |
+ | | | ``Promoted``; for any other action, a |
+ | | | configuration error will be logged, and the |
+ | | | default behavior will be used. *(since 2.0.5)* |
+ | | | * ``restart:`` Stop the resource and start it |
+ | | | again (possibly on a different node). |
+ | | | * ``fence:`` STONITH the node on which the |
+ | | | resource failed. |
+ | | | * ``standby:`` Move *all* resources away from the |
+ | | | node on which the resource failed. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | enabled | TRUE | .. index:: |
+ | | | single: enabled; action property |
+ | | | single: action; property, enabled |
+ | | | |
+ | | | If ``false``, ignore this operation definition. |
+ | | | This is typically used to pause a particular |
+ | | | recurring ``monitor`` operation; for instance, it |
+ | | | can complement the respective resource being |
+ | | | unmanaged (``is-managed=false``), as this alone |
+ | | | will :ref:`not block any configured monitoring |
+ | | | <s-monitoring-unmanaged>`. Disabling the operation |
+ | | | does not suppress all actions of the given type. |
+ | | | Allowed values: ``true``, ``false``. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | record-pending | TRUE | .. index:: |
+ | | | single: record-pending; action property |
+ | | | single: action; property, record-pending |
+ | | | |
+ | | | If ``true``, the intention to perform the operation |
+ | | | is recorded so that GUIs and CLI tools can indicate |
+ | | | that an operation is in progress. This is best set |
+ | | | as an *operation default* |
+ | | | (see :ref:`s-operation-defaults`). Allowed values: |
+ | | | ``true``, ``false``. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+ | role | | .. index:: |
+ | | | single: role; action property |
+ | | | single: action; property, role |
+ | | | |
+ | | | Run the operation only on node(s) that the cluster |
+ | | | thinks should be in the specified role. This only |
+ | | | makes sense for recurring ``monitor`` operations. |
+ | | | Allowed (case-sensitive) values: ``Stopped``, |
+ | | | ``Started``, and in the case of :ref:`promotable |
+ | | | clone resources <s-resource-promotable>`, |
+ | | | ``Unpromoted`` and ``Promoted``. |
+ +----------------+-----------------------------------+-----------------------------------------------------+
+
+.. note::
+
+ When ``on-fail`` is set to ``demote``, recovery from failure by a successful
+ demote causes the cluster to recalculate whether and where a new instance
+ should be promoted. The node with the failure is eligible, so if promotion
+ scores have not changed, it will be promoted again.
+
+ There is no direct equivalent of ``migration-threshold`` for the promoted
+ role, but the same effect can be achieved with a location constraint using a
+ :ref:`rule <rules>` with a node attribute expression for the resource's fail
+ count.
+
+ For example, to immediately ban the promoted role from a node with any
+ failed promote or promoted instance monitor:
+
+ .. code-block:: xml
+
+ <rsc_location id="loc1" rsc="my_primitive">
+ <rule id="rule1" score="-INFINITY" role="Promoted" boolean-op="or">
+ <expression id="expr1" attribute="fail-count-my_primitive#promote_0"
+ operation="gte" value="1"/>
+ <expression id="expr2" attribute="fail-count-my_primitive#monitor_10000"
+ operation="gte" value="1"/>
+ </rule>
+ </rsc_location>
+
+ This example assumes that there is a promotable clone of the ``my_primitive``
+ resource (note that the primitive name, not the clone name, is used in the
+ rule), and that there is a recurring 10-second-interval monitor configured for
+ the promoted role (fail count attributes specify the interval in
+ milliseconds).
+
+.. _s-resource-monitoring:
+
+Monitoring Resources for Failure
+________________________________
+
+When Pacemaker first starts a resource, it runs one-time ``monitor`` operations
+(referred to as *probes*) to ensure the resource is running where it's
+supposed to be, and not running where it's not supposed to be. (This behavior
+can be affected by the ``resource-discovery`` location constraint property.)
+
+Other than those initial probes, Pacemaker will *not* (by default) check that
+the resource continues to stay healthy [#]_. You must configure ``monitor``
+operations explicitly to perform these checks.
+
+.. topic:: An OCF resource with a recurring health check
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="Public-IP-start" name="start" timeout="60s"/>
+ <op id="Public-IP-monitor" name="monitor" interval="60s"/>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+By default, a ``monitor`` operation will ensure that the resource is running
+where it is supposed to be. The ``target-role`` property can be used for
+further checking.
+
+For example, if a resource has one ``monitor`` operation with
+``interval=10 role=Started`` and a second ``monitor`` operation with
+``interval=11 role=Stopped``, the cluster will run the first monitor on any nodes
+it thinks *should* be running the resource, and the second monitor on any nodes
+that it thinks *should not* be running the resource (for the truly paranoid,
+who want to know when an administrator manually starts a service by mistake).
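+
+Such a configuration might look like the following sketch (intervals of 10
+and 11 seconds are used only to keep each name-plus-interval combination
+unique):
+
+.. topic:: Recurring monitors for both the started and stopped roles
+
+   .. code-block:: xml
+
+      <op id="Public-IP-monitor-started" name="monitor" interval="10" role="Started"/>
+      <op id="Public-IP-monitor-stopped" name="monitor" interval="11" role="Stopped"/>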
+
+.. note::
+
+ Currently, monitors with ``role=Stopped`` are not implemented for
+ :ref:`clone <s-resource-clone>` resources.
+
+.. _s-monitoring-unmanaged:
+
+Monitoring Resources When Administration is Disabled
+____________________________________________________
+
+Recurring ``monitor`` operations behave differently under various administrative
+settings:
+
+* When a resource is unmanaged (by setting ``is-managed=false``): No monitors
+ will be stopped.
+
+ If the unmanaged resource is stopped on a node where the cluster thinks it
+ should be running, the cluster will detect and report that it is not, but it
+ will not consider the monitor failed, and will not try to start the resource
+ until it is managed again.
+
+ Starting the unmanaged resource on a different node is strongly discouraged
+ and will at least cause the cluster to consider the resource failed, and
+ may require the resource's ``target-role`` to be set to ``Stopped`` then
+ ``Started`` to be recovered.
+
+* When a resource is put into maintenance mode (by setting
+ ``maintenance=true``): The resource will be marked as unmanaged. (This
+ overrides ``is-managed=true``.)
+
+ Additionally, all monitor operations will be stopped, except those specifying
+ ``role`` as ``Stopped`` (which will be newly initiated if appropriate). As
+ with unmanaged resources in general, starting a resource on a node other than
+ where the cluster expects it to be will cause problems.
+
+* When a node is put into standby: All resources will be moved away from the
+ node, and all ``monitor`` operations will be stopped on the node, except those
+ specifying ``role`` as ``Stopped`` (which will be newly initiated if
+ appropriate).
+
+* When a node is put into maintenance mode: All resources that are active on the
+ node will be marked as in maintenance mode. See above for more details.
+
+* When the cluster is put into maintenance mode: All resources in the cluster
+ will be marked as in maintenance mode. See above for more details.
+
+A resource is in maintenance mode if the cluster, the node where the resource
+is active, or the resource itself is configured to be in maintenance mode. If a
+resource is in maintenance mode, then it is also unmanaged. However, if a
+resource is unmanaged, it is not necessarily in maintenance mode.
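+
+For example, a single resource could be put into maintenance mode (and taken
+back out of it) with:
+
+.. code-block:: none
+
+   # crm_resource --meta --resource Email --set-parameter maintenance --parameter-value true
+   # crm_resource --meta --resource Email --delete-parameter maintenance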
+
+.. _s-operation-defaults:
+
+Setting Global Defaults for Operations
+______________________________________
+
+You can change the global default values for operation properties
+in a given cluster. These are defined in an ``op_defaults`` section
+of the CIB's ``configuration`` section, and can be set with
+``crm_attribute``. For example,
+
+.. code-block:: none
+
+ # crm_attribute --type op_defaults --name timeout --update 20s
+
+would default each operation's ``timeout`` to 20 seconds. If an
+operation's definition also includes a value for ``timeout``, then that
+value would be used for that operation instead.
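+
+The resulting CIB entry would look something like this sketch:
+
+.. topic:: An ``op_defaults`` section setting a default timeout
+
+   .. code-block:: xml
+
+      <op_defaults>
+        <meta_attributes id="op_defaults-options">
+          <nvpair id="op_defaults-timeout" name="timeout" value="20s"/>
+        </meta_attributes>
+      </op_defaults>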
+
+When Implicit Operations Take a Long Time
+_________________________________________
+
+The cluster will always perform a number of implicit operations: ``start``,
+``stop`` and a non-recurring ``monitor`` operation used at startup to check
+whether the resource is already active. If one of these is taking too long,
+then you can create an entry for it and specify a longer timeout.
+
+.. topic:: An OCF resource with custom timeouts for its implicit actions
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="public-ip-startup" name="monitor" interval="0" timeout="90s"/>
+ <op id="public-ip-start" name="start" interval="0" timeout="180s"/>
+ <op id="public-ip-stop" name="stop" interval="0" timeout="15min"/>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+Multiple Monitor Operations
+___________________________
+
+Provided no two operations (for a single resource) have the same name
+and interval, you can have as many ``monitor`` operations as you like.
+In this way, you can do a superficial health check every minute and
+progressively more intense ones at longer intervals.
+
+To tell the resource agent what kind of check to perform, you need to
+provide each monitor with a different value for a common parameter.
+The OCF standard creates a special parameter called ``OCF_CHECK_LEVEL``
+for this purpose and dictates that it is "made available to the
+resource agent without the normal ``OCF_RESKEY`` prefix".
+
+Whatever name you choose, you can specify it by adding an
+``instance_attributes`` block to the ``op`` tag. It is up to each
+resource agent to look for the parameter and decide how to use it.
+
+.. topic:: An OCF resource with two recurring health checks, performing
+ different levels of checks specified via ``OCF_CHECK_LEVEL``.
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="public-ip-health-60" name="monitor" interval="60">
+ <instance_attributes id="params-public-ip-depth-60">
+ <nvpair id="public-ip-depth-60" name="OCF_CHECK_LEVEL" value="10"/>
+ </instance_attributes>
+ </op>
+ <op id="public-ip-health-300" name="monitor" interval="300">
+ <instance_attributes id="params-public-ip-depth-300">
+ <nvpair id="public-ip-depth-300" name="OCF_CHECK_LEVEL" value="20"/>
+ </instance_attributes>
+ </op>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-level" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+Disabling a Monitor Operation
+_____________________________
+
+The easiest way to stop a recurring monitor is to just delete it.
+However, there can be times when you only want to disable it
+temporarily. In such cases, simply add ``enabled=false`` to the
+operation's definition.
+
+.. topic:: Example of an OCF resource with a disabled health check
+
+ .. code-block:: xml
+
+ <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
+ <operations>
+ <op id="public-ip-check" name="monitor" interval="60s" enabled="false"/>
+ </operations>
+ <instance_attributes id="params-public-ip">
+ <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
+ </instance_attributes>
+ </primitive>
+
+This can be achieved from the command line by executing:
+
+.. code-block:: none
+
+ # cibadmin --modify --xml-text '<op id="public-ip-check" enabled="false"/>'
+
+Once you've done whatever you needed to do, you can then re-enable it with
+
+.. code-block:: none
+
+ # cibadmin --modify --xml-text '<op id="public-ip-check" enabled="true"/>'
+
+.. [#] Currently, anyway. Automatic monitoring operations may be added in a future
+ version of Pacemaker.