Diffstat (limited to 'doc/sphinx/Pacemaker_Administration')
-rw-r--r--  doc/sphinx/Pacemaker_Administration/agents.rst           443
-rw-r--r--  doc/sphinx/Pacemaker_Administration/alerts.rst           311
-rw-r--r--  doc/sphinx/Pacemaker_Administration/cluster.rst           21
-rw-r--r--  doc/sphinx/Pacemaker_Administration/configuring.rst      278
-rw-r--r--  doc/sphinx/Pacemaker_Administration/index.rst             36
-rw-r--r--  doc/sphinx/Pacemaker_Administration/installing.rst         9
-rw-r--r--  doc/sphinx/Pacemaker_Administration/intro.rst             21
-rw-r--r--  doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst        441
-rw-r--r--  doc/sphinx/Pacemaker_Administration/tools.rst            562
-rw-r--r--  doc/sphinx/Pacemaker_Administration/troubleshooting.rst  123
-rw-r--r--  doc/sphinx/Pacemaker_Administration/upgrading.rst        534
11 files changed, 2779 insertions, 0 deletions
diff --git a/doc/sphinx/Pacemaker_Administration/agents.rst b/doc/sphinx/Pacemaker_Administration/agents.rst
new file mode 100644
index 0000000..e5b17e2
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/agents.rst
@@ -0,0 +1,443 @@
+.. index::
+ single: resource agent
+
+Resource Agents
+---------------
+
+
+Action Completion
+#################
+
+If one resource depends on another resource via constraints, the cluster will
+interpret an expected result as sufficient to continue with dependent actions.
+This may cause timing issues if the resource agent start returns before the
+service is not only launched but fully ready to perform its function, or if the
+resource agent stop returns before the service has fully released all its
+claims on system resources. At a minimum, the start or stop should not return
+before a status command would return the expected (started or stopped) result.
+
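+As an illustration, a start action in a shell-based agent can poll its own
+monitor logic before reporting success, rather than returning as soon as the
+daemon has been launched. This is only a sketch; the daemon name and helper
+function are hypothetical:
+
+.. code-block:: none
+
+ big_app_start() {
+     /usr/sbin/big-appd --daemonize || return 1   # hypothetical daemon; 1 = OCF_ERR_GENERIC
+     # Do not report success until a monitor call would report "running".
+     # If the service never becomes ready, the cluster's action timeout
+     # will eventually abort this loop.
+     while ! big_app_monitor; do
+         sleep 1
+     done
+     return 0                                      # OCF_SUCCESS
+ }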
+
+.. index::
+ single: OCF resource agent
+ single: resource agent; OCF
+
+OCF Resource Agents
+###################
+
+.. index::
+ single: OCF resource agent; location
+
+Location of Custom Scripts
+__________________________
+
+OCF Resource Agents are found in ``/usr/lib/ocf/resource.d/$PROVIDER``
+
+When creating your own agents, you are encouraged to create a new directory
+under ``/usr/lib/ocf/resource.d/`` so that they are not confused with (or
+overwritten by) the agents shipped by existing providers.
+
+So, for example, if you choose the provider name of big-corp and want a new
+resource named big-app, you would create a resource agent called
+``/usr/lib/ocf/resource.d/big-corp/big-app`` and define a resource:
+
+.. code-block:: xml
+
+ <primitive id="custom-app" class="ocf" provider="big-corp" type="big-app"/>
+
+
+.. index::
+ single: OCF resource agent; action
+
+Actions
+_______
+
+All OCF resource agents are required to implement the following actions.
+
+.. table:: **Required Actions for OCF Agents**
+
+ +--------------+-------------+------------------------------------------------+
+ | Action | Description | Instructions |
+ +==============+=============+================================================+
+ | start | Start the | .. index:: |
+ | | resource | single: OCF resource agent; start |
+ | | | single: start action |
+ | | | |
+ | | | Return 0 on success and an appropriate |
+ | | | error code otherwise. Must not report |
+ | | | success until the resource is fully |
+ | | | active. |
+ +--------------+-------------+------------------------------------------------+
+ | stop | Stop the | .. index:: |
+ | | resource | single: OCF resource agent; stop |
+ | | | single: stop action |
+ | | | |
+ | | | Return 0 on success and an appropriate |
+ | | | error code otherwise. Must not report |
+ | | | success until the resource is fully |
+ | | | stopped. |
+ +--------------+-------------+------------------------------------------------+
+ | monitor | Check the | .. index:: |
+ | | resource's | single: OCF resource agent; monitor |
+ | | state | single: monitor action |
+ | | | |
+ | | | Exit 0 if the resource is running, 7 |
+ | | | if it is stopped, and any other OCF |
+ | | | exit code if it is failed. NOTE: The |
+ | | | monitor script should test the state |
+ | | | of the resource on the local machine |
+ | | | only. |
+ +--------------+-------------+------------------------------------------------+
+ | meta-data | Describe | .. index:: |
+ | | the | single: OCF resource agent; meta-data |
+ | | resource | single: meta-data action |
+ | | | |
+ | | | Provide information about this |
+ | | | resource in the XML format defined by |
+ | | | the OCF standard. Exit with 0. NOTE: |
+ | | | This is *not* required to be performed |
+ | | | as root. |
+ +--------------+-------------+------------------------------------------------+
+
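+Putting the required actions together, a minimal shell-based agent is
+essentially a dispatcher over these actions that exits with the codes described
+in this chapter. The sketch below omits the actual service management and the
+meta-data XML body; the ``big_app_*`` helper names are hypothetical:
+
+.. code-block:: none
+
+ #!/bin/sh
+ case "$1" in
+     start)     big_app_start ;;       # must not return 0 until fully active
+     stop)      big_app_stop ;;        # must not return 0 until fully stopped
+     monitor)   big_app_monitor ;;     # 0 = running, 7 = stopped, other = failed
+     meta-data) big_app_metadata; exit 0 ;;
+     *)         exit 3 ;;              # OCF_ERR_UNIMPLEMENTED
+ esac
+ exit $?
+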
+OCF resource agents may optionally implement additional actions. Some are used
+only with advanced resource types such as clones.
+
+.. table:: **Optional Actions for OCF Resource Agents**
+
+ +--------------+-------------+------------------------------------------------+
+ | Action | Description | Instructions |
+ +==============+=============+================================================+
+ | validate-all | This should | .. index:: |
+ | | validate | single: OCF resource agent; validate-all |
+ | | the | single: validate-all action |
+ | | instance | |
+ | | parameters | Return 0 if parameters are valid, 2 if |
+ | | provided. | not valid, and 6 if resource is not |
+ | | | configured. |
+ +--------------+-------------+------------------------------------------------+
+ | promote | Bring the | .. index:: |
+ | | local | single: OCF resource agent; promote |
+ | | instance of | single: promote action |
+ | | a promotable| |
+ | | clone | Return 0 on success |
+ | | resource to | |
+ | | the promoted| |
+ | | role. | |
+ +--------------+-------------+------------------------------------------------+
+ | demote | Bring the | .. index:: |
+ | | local | single: OCF resource agent; demote |
+ | | instance of | single: demote action |
+ | | a promotable| |
+ | | clone | Return 0 on success |
+ | | resource to | |
+ | | the | |
+ | | unpromoted | |
+ | | role. | |
+ +--------------+-------------+------------------------------------------------+
+ | notify | Used by the | .. index:: |
+ | | cluster to | single: OCF resource agent; notify |
+ | | send | single: notify action |
+ | | the agent | |
+ | | pre- and | Must not fail. Must exit with 0 |
+ | | post- | |
+ | | notification| |
+ | | events | |
+ | | telling the | |
+ | | resource | |
+ | | what has | |
+ | | happened and| |
+ | | will happen.| |
+ +--------------+-------------+------------------------------------------------+
+ | reload | Reload the | .. index:: |
+ | | service's | single: OCF resource agent; reload |
+ | | own | single: reload action |
+ | | config. | |
+ | | | Not used by Pacemaker |
+ +--------------+-------------+------------------------------------------------+
+ | reload-agent | Make | .. index:: |
+ | | effective | single: OCF resource agent; reload-agent |
+ | | any changes | single: reload-agent action |
+ | | in instance | |
+ | | parameters | This is used when the agent can handle a |
+ | | marked as | change in some of its parameters more |
+ | | reloadable | efficiently than stopping and starting the |
+ | | in the | resource. |
+ | | agent's | |
+ | | meta-data. | |
+ +--------------+-------------+------------------------------------------------+
+ | recover | Restart the | .. index:: |
+ | | service. | single: OCF resource agent; recover |
+ | | | single: recover action |
+ | | | |
+ | | | Not used by Pacemaker |
+ +--------------+-------------+------------------------------------------------+
+
+.. important::
+
+ If you create a new OCF resource agent, use `ocf-tester` to verify that the
+ agent complies with the OCF standard properly.
+
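+For example, testing the hypothetical agent created earlier might look like the
+following (``-n`` names the test and ``-o`` passes any required instance
+parameters; check the ``ocf-tester`` man page on your system for details):
+
+.. code-block:: none
+
+ # ocf-tester -n big-app-test -o someparam=somevalue \
+     /usr/lib/ocf/resource.d/big-corp/big-app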
+
+.. index::
+ single: OCF resource agent; return code
+
+How are OCF Return Codes Interpreted?
+_____________________________________
+
+The first thing the cluster does is to check the return code against
+the expected result. If the result does not match the expected value,
+then the operation is considered to have failed, and recovery action is
+initiated.
+
+There are three types of failure recovery:
+
+.. table:: **Types of recovery performed by the cluster**
+
+ +-------+--------------------------------------------+--------------------------------------+
+ | Type | Description | Action Taken by the Cluster |
+ +=======+============================================+======================================+
+ | soft | .. index:: | Restart the resource or move it to a |
+ | | single: OCF resource agent; soft error | new location |
+ | | | |
+ | | A transient error occurred | |
+ +-------+--------------------------------------------+--------------------------------------+
+ | hard | .. index:: | Move the resource elsewhere and |
+ | | single: OCF resource agent; hard error | prevent it from being retried on the |
+ | | | current node |
+ | | A non-transient error that | |
+ | | may be specific to the | |
+ | | current node | |
+ +-------+--------------------------------------------+--------------------------------------+
+ | fatal | .. index:: | Stop the resource and prevent it |
+ | | single: OCF resource agent; fatal error | from being started on any cluster |
+ | | | node |
+ | | A non-transient error that | |
+ | | will be common to all | |
+ | | cluster nodes (e.g. a bad | |
+ | | configuration was specified) | |
+ +-------+--------------------------------------------+--------------------------------------+
+
+.. _ocf_return_codes:
+
+OCF Return Codes
+________________
+
+The following table outlines the different OCF return codes and the type of
+recovery the cluster will initiate when a failure code is received. Although
+counterintuitive, even actions that return 0 (aka. ``OCF_SUCCESS``) can be
+considered to have failed, if 0 was not the expected return value.
+
+.. table:: **OCF Exit Codes and their Recovery Types**
+
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | Exit | OCF Alias | Description | Recovery |
+ | Code | | | |
+ +=======+=======================+===================================================+==========+
+ | 0 | OCF_SUCCESS | .. index:: | soft |
+ | | | single: OCF_SUCCESS | |
+ | | | single: OCF return code; OCF_SUCCESS | |
+ | | | pair: OCF return code; 0 | |
+ | | | | |
+ | | | Success. The command completed successfully. | |
+ | | | This is the expected result for all start, | |
+ | | | stop, promote and demote commands. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 1 | OCF_ERR_GENERIC | .. index:: | soft |
+ | | | single: OCF_ERR_GENERIC | |
+ | | | single: OCF return code; OCF_ERR_GENERIC | |
+ | | | pair: OCF return code; 1 | |
+ | | | | |
+ | | | Generic "there was a problem" error code. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 2 | OCF_ERR_ARGS | .. index:: | hard |
+ | | | single: OCF_ERR_ARGS | |
+ | | | single: OCF return code; OCF_ERR_ARGS | |
+ | | | pair: OCF return code; 2 | |
+ | | | | |
+ | | | The resource's parameter values are not valid on | |
+ | | | this machine (for example, a value refers to a | |
+ | | | file not found on the local host). | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 3 | OCF_ERR_UNIMPLEMENTED | .. index:: | hard |
+ | | | single: OCF_ERR_UNIMPLEMENTED | |
+ | | | single: OCF return code; OCF_ERR_UNIMPLEMENTED | |
+ | | | pair: OCF return code; 3 | |
+ | | | | |
+ | | | The requested action is not implemented. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 4 | OCF_ERR_PERM | .. index:: | hard |
+ | | | single: OCF_ERR_PERM | |
+ | | | single: OCF return code; OCF_ERR_PERM | |
+ | | | pair: OCF return code; 4 | |
+ | | | | |
+ | | | The resource agent does not have | |
+ | | | sufficient privileges to complete the task. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 5 | OCF_ERR_INSTALLED | .. index:: | hard |
+ | | | single: OCF_ERR_INSTALLED | |
+ | | | single: OCF return code; OCF_ERR_INSTALLED | |
+ | | | pair: OCF return code; 5 | |
+ | | | | |
+ | | | The tools required by the resource are | |
+ | | | not installed on this machine. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 6 | OCF_ERR_CONFIGURED | .. index:: | fatal |
+ | | | single: OCF_ERR_CONFIGURED | |
+ | | | single: OCF return code; OCF_ERR_CONFIGURED | |
+ | | | pair: OCF return code; 6 | |
+ | | | | |
+ | | | The resource's parameter values are inherently | |
+ | | | invalid (for example, a required parameter was | |
+ | | | not given). | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 7 | OCF_NOT_RUNNING | .. index:: | N/A |
+ | | | single: OCF_NOT_RUNNING | |
+ | | | single: OCF return code; OCF_NOT_RUNNING | |
+ | | | pair: OCF return code; 7 | |
+ | | | | |
+ | | | The resource is safely stopped. This should only | |
+ | | | be returned by monitor actions, not stop actions. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 8 | OCF_RUNNING_PROMOTED | .. index:: | soft |
+ | | | single: OCF_RUNNING_PROMOTED | |
+ | | | single: OCF return code; OCF_RUNNING_PROMOTED | |
+ | | | pair: OCF return code; 8 | |
+ | | | | |
+ | | | The resource is running in the promoted role. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 9 | OCF_FAILED_PROMOTED | .. index:: | soft |
+ | | | single: OCF_FAILED_PROMOTED | |
+ | | | single: OCF return code; OCF_FAILED_PROMOTED | |
+ | | | pair: OCF return code; 9 | |
+ | | | | |
+ | | | The resource is (or might be) in the promoted | |
+ | | | role but has failed. The resource will be | |
+ | | | demoted, stopped and then started (and possibly | |
+ | | | promoted) again. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 190 | OCF_DEGRADED | .. index:: | none |
+ | | | single: OCF_DEGRADED | |
+ | | | single: OCF return code; OCF_DEGRADED | |
+ | | | pair: OCF return code; 190 | |
+ | | | | |
+ | | | The resource is properly active, but in such a | |
+ | | | condition that future failures are more likely. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | 191 | OCF_DEGRADED_PROMOTED | .. index:: | none |
+ | | | single: OCF_DEGRADED_PROMOTED | |
+ | | | single: OCF return code; OCF_DEGRADED_PROMOTED | |
+ | | | pair: OCF return code; 191 | |
+ | | | | |
+ | | | The resource is properly active in the promoted | |
+ | | | role, but in such a condition that future | |
+ | | | failures are more likely. | |
+ +-------+-----------------------+---------------------------------------------------+----------+
+ | other | *none* | Custom error code. | soft |
+ +-------+-----------------------+---------------------------------------------------+----------+
+
+Exceptions to the recovery handling described above:
+
+* Probes (non-recurring monitor actions) that find a resource active
+ (or in the promoted role) will not result in recovery action unless it is
+ also found active elsewhere.
+* The recovery action taken when a resource is found active more than
+ once is determined by the resource's ``multiple-active`` property.
+* Recurring actions that return ``OCF_ERR_UNIMPLEMENTED``
+ do not cause any type of recovery.
+* Actions that return one of the "degraded" codes will be treated the same as
+ if they had returned success, but status output will indicate that the
+ resource is degraded.
+
+
+.. index::
+ single: resource agent; LSB
+ single: LSB resource agent
+ single: init script
+
+LSB Resource Agents (Init Scripts)
+##################################
+
+LSB Compliance
+______________
+
+The relevant part of the
+`LSB specifications <http://refspecs.linuxfoundation.org/lsb.shtml>`_
+includes a description of all the return codes listed here.
+
+Assuming `some_service` is configured correctly and currently
+inactive, the following sequence will help you determine if it is
+LSB-compatible:
+
+#. Start (stopped):
+
+ .. code-block:: none
+
+ # /etc/init.d/some_service start ; echo "result: $?"
+
+ * Did the service start?
+ * Did the echo command print ``result: 0`` (in addition to the init script's
+ usual output)?
+
+#. Status (running):
+
+ .. code-block:: none
+
+ # /etc/init.d/some_service status ; echo "result: $?"
+
+ * Did the script accept the command?
+ * Did the script indicate the service was running?
+ * Did the echo command print ``result: 0`` (in addition to the init script's
+ usual output)?
+
+#. Start (running):
+
+ .. code-block:: none
+
+ # /etc/init.d/some_service start ; echo "result: $?"
+
+ * Is the service still running?
+ * Did the echo command print ``result: 0`` (in addition to the init
+ script's usual output)?
+
+#. Stop (running):
+
+ .. code-block:: none
+
+ # /etc/init.d/some_service stop ; echo "result: $?"
+
+ * Was the service stopped?
+ * Did the echo command print ``result: 0`` (in addition to the init
+ script's usual output)?
+
+#. Status (stopped):
+
+ .. code-block:: none
+
+ # /etc/init.d/some_service status ; echo "result: $?"
+
+ * Did the script accept the command?
+ * Did the script indicate the service was not running?
+ * Did the echo command print ``result: 3`` (in addition to the init
+ script's usual output)?
+
+#. Stop (stopped):
+
+ .. code-block:: none
+
+ # /etc/init.d/some_service stop ; echo "result: $?"
+
+ * Is the service still stopped?
+ * Did the echo command print ``result: 0`` (in addition to the init
+ script's usual output)?
+
+#. Status (failed):
+
+ This step is not readily testable and relies on manual inspection of the script.
+
+ The script can use one of the error codes (other than 3) listed in the
+ LSB spec to indicate that it is active but failed. This tells the
+ cluster that before moving the resource to another node, it needs to
+ stop it on the existing one first.
+
+If the answer to any of the above questions is no, then the script is not
+LSB-compliant. Your options are then to either fix the script or write an OCF
+agent based on the existing script.
diff --git a/doc/sphinx/Pacemaker_Administration/alerts.rst b/doc/sphinx/Pacemaker_Administration/alerts.rst
new file mode 100644
index 0000000..c0f54c6
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/alerts.rst
@@ -0,0 +1,311 @@
+.. index::
+ single: alert; agents
+
+Alert Agents
+------------
+
+.. index::
+ single: alert; sample agents
+
+Using the Sample Alert Agents
+#############################
+
+Pacemaker provides several sample alert agents, installed in
+``/usr/share/pacemaker/alerts`` by default.
+
+While these sample scripts may be copied and used as-is, they are provided
+mainly as templates to be edited to suit your purposes. See their source code
+for the full set of instance attributes they support.
+
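+For example, to use one of the samples, copy it somewhere the cluster nodes can
+execute it and make it executable (the exact sample file names may vary by
+version and distribution; check the directory on your system):
+
+.. code-block:: none
+
+ # cp /usr/share/pacemaker/alerts/alert_snmp.sh.sample /path/to/alert_snmp.sh
+ # chmod 755 /path/to/alert_snmp.sh
+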
+.. topic:: Sending cluster events as SNMP v2c traps
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="snmp_alert" path="/path/to/alert_snmp.sh">
+ <instance_attributes id="config_for_alert_snmp">
+ <nvpair id="trap_node_states" name="trap_node_states"
+ value="all"/>
+ </instance_attributes>
+ <meta_attributes id="config_for_timestamp">
+ <nvpair id="ts_fmt" name="timestamp-format"
+ value="%Y-%m-%d,%H:%M:%S.%01N"/>
+ </meta_attributes>
+ <recipient id="snmp_destination" value="192.168.1.2"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+.. note:: **SNMP alert agent attributes**
+
+ The ``timestamp-format`` meta-attribute should always be set to
+ ``%Y-%m-%d,%H:%M:%S.%01N`` when using the SNMP agent, to match the SNMP
+ standard.
+
+ The SNMP agent provides a number of instance attributes in addition to the
+ one used in the example above. The most useful are ``trap_version``, which
+ defaults to ``2c``, and ``trap_community``, which defaults to ``public``.
+ See the source code for more details.
+
+.. topic:: Sending cluster events as SNMP v3 traps
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="snmp_alert" path="/path/to/alert_snmp.sh">
+ <instance_attributes id="config_for_alert_snmp">
+ <nvpair id="trap_node_states" name="trap_node_states"
+ value="all"/>
+ <nvpair id="trap_version" name="trap_version" value="3"/>
+ <nvpair id="trap_community" name="trap_community" value=""/>
+ <nvpair id="trap_options" name="trap_options"
+ value="-l authNoPriv -a MD5 -u testuser -A secret1"/>
+ </instance_attributes>
+ <meta_attributes id="config_for_timestamp">
+ <nvpair id="ts_fmt" name="timestamp-format"
+ value="%Y-%m-%d,%H:%M:%S.%01N"/>
+ </meta_attributes>
+ <recipient id="snmp_destination" value="192.168.1.2"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+.. note:: **SNMP v3 trap configuration**
+
+ To use SNMP v3, ``trap_version`` must be set to ``3``. ``trap_community``
+ will be ignored.
+
+ The example above uses the ``trap_options`` instance attribute to override
+ the security level, authentication protocol, authentication user, and
+ authentication password from snmp.conf. These will be passed to the snmptrap
+ command. Passing the password on the command line is considered insecure;
+ specify authentication and privacy options suitable for your environment.
+
+.. topic:: Sending cluster events as e-mails
+
+ .. code-block:: xml
+
+ <configuration>
+ <alerts>
+ <alert id="smtp_alert" path="/path/to/alert_smtp.sh">
+ <instance_attributes id="config_for_alert_smtp">
+ <nvpair id="email_sender" name="email_sender"
+ value="donotreply@example.com"/>
+ </instance_attributes>
+ <recipient id="smtp_destination" value="admin@example.com"/>
+ </alert>
+ </alerts>
+ </configuration>
+
+
+.. index::
+ single: alert; agent development
+
+Writing an Alert Agent
+######################
+
+.. index::
+ single: alert; environment variables
+ single: environment variable; alert agents
+
+.. table:: **Environment variables passed to alert agents**
+ :class: longtable
+ :widths: 1 3
+
+ +---------------------------+----------------------------------------------------------------+
+ | Environment Variable | Description |
+ +===========================+================================================================+
+ | CRM_alert_kind | .. index:: |
+ | | single:environment variable; CRM_alert_kind |
+ | | single:CRM_alert_kind |
+ | | |
+ | | The type of alert (``node``, ``fencing``, ``resource``, or |
+ | | ``attribute``) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_node | .. index:: |
+ | | single:environment variable; CRM_alert_node |
+ | | single:CRM_alert_node |
+ | | |
+ | | Name of affected node |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_node_sequence | .. index:: |
+ | | single:environment variable; CRM_alert_sequence |
+ | | single:CRM_alert_sequence |
+ | | |
+ | | A sequence number increased whenever an alert is being issued |
+ | | on the local node, which can be used to reference the order in |
+ | | which alerts have been issued by Pacemaker. An alert for an |
+ | | event that happened later in time reliably has a higher |
+ | | sequence number than alerts for earlier events. |
+ | | |
+ | | Be aware that this number has no cluster-wide meaning. |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_recipient | .. index:: |
+ | | single:environment variable; CRM_alert_recipient |
+ | | single:CRM_alert_recipient |
+ | | |
+ | | The configured recipient |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_timestamp | .. index:: |
+ | | single:environment variable; CRM_alert_timestamp |
+ | | single:CRM_alert_timestamp |
+ | | |
+ | | A timestamp created prior to executing the agent, in the |
+ | | format specified by the ``timestamp-format`` meta-attribute. |
+ | | This allows the agent to have a reliable, high-precision time |
+ | | of when the event occurred, regardless of when the agent |
+ | | itself was invoked (which could potentially be delayed due to |
+ | | system load, etc.). |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_timestamp_epoch | .. index:: |
+ | | single:environment variable; CRM_alert_timestamp_epoch |
+ | | single:CRM_alert_timestamp_epoch |
+ | | |
+ | | The same time as ``CRM_alert_timestamp``, expressed as the |
+ | | integer number of seconds since January 1, 1970. This (along |
+ | | with ``CRM_alert_timestamp_usec``) can be useful for alert |
+ | | agents that need to format time in a specific way rather than |
+ | | let the user configure it. |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_timestamp_usec | .. index:: |
+ | | single:environment variable; CRM_alert_timestamp_usec |
+ | | single:CRM_alert_timestamp_usec |
+ | | |
+ | | The same time as ``CRM_alert_timestamp``, expressed as the |
+ | | integer number of microseconds since |
+ | | ``CRM_alert_timestamp_epoch``. |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_version | .. index:: |
+ | | single:environment variable; CRM_alert_version |
+ | | single:CRM_alert_version |
+ | | |
+ | | The version of Pacemaker sending the alert |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_desc | .. index:: |
+ | | single:environment variable; CRM_alert_desc |
+ | | single:CRM_alert_desc |
+ | | |
+ | | Detail about event. For ``node`` alerts, this is the node's |
+ | | current state (``member`` or ``lost``). For ``fencing`` |
+ | | alerts, this is a summary of the requested fencing operation, |
+ | | including origin, target, and fencing operation error code, if |
+ | | any. For ``resource`` alerts, this is a readable string |
+ | | equivalent of ``CRM_alert_status``. |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_nodeid | .. index:: |
+ | | single:environment variable; CRM_alert_nodeid |
+ | | single:CRM_alert_nodeid |
+ | | |
+ | | ID of node whose status changed (provided with ``node`` alerts |
+ | | only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_rc | .. index:: |
+ | | single:environment variable; CRM_alert_rc |
+ | | single:CRM_alert_rc |
+ | | |
+ | | The numerical return code of the fencing or resource operation |
+ | | (provided with ``fencing`` and ``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_task | .. index:: |
+ | | single:environment variable; CRM_alert_task |
+ | | single:CRM_alert_task |
+ | | |
+ | | The requested fencing or resource operation (provided with |
+ | | ``fencing`` and ``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_exec_time | .. index:: |
+ | | single:environment variable; CRM_alert_exec_time |
+ | | single:CRM_alert_exec_time |
+ | | |
+ | | The (wall-clock) time, in milliseconds, that it took to |
+ | | execute the action. If the action timed out, |
+ | | ``CRM_alert_status`` will be 2, ``CRM_alert_desc`` will be |
+ | | "Timed Out", and this value will be the action timeout. May |
+ | | not be supported on all platforms. (``resource`` alerts only) |
+ | | *(since 2.0.1)* |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_interval | .. index:: |
+ | | single:environment variable; CRM_alert_interval |
+ | | single:CRM_alert_interval |
+ | | |
+ | | The interval of the resource operation (``resource`` alerts |
+ | | only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_rsc | .. index:: |
+ | | single:environment variable; CRM_alert_rsc |
+ | | single:CRM_alert_rsc |
+ | | |
+ | | The name of the affected resource (``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_status | .. index:: |
+ | | single:environment variable; CRM_alert_status |
+ | | single:CRM_alert_status |
+ | | |
+ | | A numerical code used by Pacemaker to represent the operation |
+ | | result (``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_target_rc | .. index:: |
+ | | single:environment variable; CRM_alert_target_rc |
+ | | single:CRM_alert_target_rc |
+ | | |
+ | | The expected numerical return code of the operation |
+ | | (``resource`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_attribute_name | .. index:: |
+ | | single:environment variable; CRM_alert_attribute_name |
+ | | single:CRM_alert_attribute_name |
+ | | |
+ | | The name of the node attribute that changed (``attribute`` |
+ | | alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+ | CRM_alert_attribute_value | .. index:: |
+ | | single:environment variable; CRM_alert_attribute_value |
+ | | single:CRM_alert_attribute_value |
+ | | |
+ | | The new value of the node attribute that changed |
+ | | (``attribute`` alerts only) |
+ +---------------------------+----------------------------------------------------------------+
+
+Special concerns when writing alert agents:
+
+* Alert agents may be called with no recipient (if none is configured),
+ so the agent must be able to handle this situation, even if it
+ only exits in that case. (Users may modify the configuration in
+ stages, and add a recipient later.)
+
+* If more than one recipient is configured for an alert, the alert agent will
+ be called once per recipient. If an agent is not able to run concurrently, it
+ should be configured with only a single recipient. The agent is free,
+ however, to interpret the recipient as a list.
+
+* When a cluster event occurs, all alerts are fired off at the same time as
+ separate processes. Depending on how many alerts and recipients are
+ configured, and on what is done within the alert agents,
+ a significant load burst may occur. The agent could be written to take
+ this into consideration, for example by queueing resource-intensive actions
+ into some other instance, instead of directly executing them.
+
+* Alert agents are run as the ``hacluster`` user, which has a minimal set
+ of permissions. If an agent requires additional privileges, it is
+ recommended to configure ``sudo`` to allow the agent to run the necessary
+ commands as another user with the appropriate privileges.
+
+* As always, take care to validate and sanitize user-configured parameters,
+ such as ``CRM_alert_timestamp`` (whose content is specified by the
+ user-configured ``timestamp-format``), ``CRM_alert_recipient``, and all
+ instance attributes. Mostly this is needed simply to protect against
+ configuration errors, but if some user can modify the CIB without having
+ ``hacluster``-level access to the cluster nodes, it is a potential security
+ concern as well, and sanitizing guards against the possibility of code
+ injection.
+
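+The following is a minimal sketch of a file-based alert agent illustrating the
+environment variables above and the concerns just listed. It treats the
+configured recipient (if any) as the target file, falling back to a
+hypothetical default path that would have to be writable by ``hacluster``:
+
+.. code-block:: none
+
+ #!/bin/sh
+ # Minimal sketch: append one line per event to a log file.
+ logfile="/var/log/pcmk_alert_example.log"     # hypothetical fallback path
+
+ # A recipient may not be configured; fall back to the default file.
+ dest="${CRM_alert_recipient:-$logfile}"
+
+ case "$CRM_alert_kind" in
+     node)      msg="Node $CRM_alert_node is now $CRM_alert_desc" ;;
+     fencing)   msg="Fencing: $CRM_alert_desc (rc=$CRM_alert_rc)" ;;
+     resource)  msg="Resource $CRM_alert_rsc: $CRM_alert_task $CRM_alert_desc" ;;
+     attribute) msg="$CRM_alert_attribute_name=$CRM_alert_attribute_value on $CRM_alert_node" ;;
+     *)         msg="Unhandled alert kind: $CRM_alert_kind" ;;
+ esac
+
+ echo "$CRM_alert_timestamp $msg" >> "$dest"
+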
+.. note:: **ocf:pacemaker:ClusterMon compatibility**
+
+ The alerts interface is designed to be backward compatible with the external
+ scripts interface used by the ``ocf:pacemaker:ClusterMon`` resource, which
+ is now deprecated. To preserve this compatibility, the environment variables
+ passed to alert agents are available prepended with ``CRM_notify_``
+ as well as ``CRM_alert_``. One break in compatibility is that ``ClusterMon``
+ ran external scripts as the ``root`` user, while alert agents are run as the
+ ``hacluster`` user.
diff --git a/doc/sphinx/Pacemaker_Administration/cluster.rst b/doc/sphinx/Pacemaker_Administration/cluster.rst
new file mode 100644
index 0000000..3713733
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/cluster.rst
@@ -0,0 +1,21 @@
+.. index::
+ single: cluster layer
+
+The Cluster Layer
+-----------------
+
+Pacemaker utilizes an underlying cluster layer for two purposes:
+
+* obtaining quorum
+* messaging between nodes
+
+.. index::
+ single: cluster layer; Corosync
+ single: Corosync
+
+Currently, only Corosync 2 and later is supported for this layer.
+
+This document assumes you have configured the cluster nodes in Corosync
+already. High-level cluster management tools are available that can configure
+Corosync for you. If you want the lower-level details, see the
+`Corosync documentation <https://corosync.github.io/corosync/>`_.
diff --git a/doc/sphinx/Pacemaker_Administration/configuring.rst b/doc/sphinx/Pacemaker_Administration/configuring.rst
new file mode 100644
index 0000000..415dd81
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/configuring.rst
@@ -0,0 +1,278 @@
+.. index::
+ single: configuration
+ single: CIB
+
+Configuring Pacemaker
+---------------------
+
+Pacemaker's configuration, the CIB, is stored in XML format. Cluster
+administrators have multiple options for modifying the configuration either via
+the XML, or at a more abstract (and easier for humans to understand) level.
+
+Pacemaker reacts to configuration changes as soon as they are saved.
+Pacemaker's command-line tools and most higher-level tools provide the ability
+to batch changes together and commit them at once, rather than make a series of
+small changes, which avoids the unnecessary actions Pacemaker might otherwise
+take in response to each change individually.
+
+Pacemaker tracks revisions to the configuration and will reject any update
+older than the current revision. Thus, it is a good idea to serialize all
+changes to the configuration. Avoid attempting simultaneous changes, whether on
+the same node or different nodes, and whether manually or using some automated
+configuration tool.
+
+.. note::
+
+ It is not necessary to update the configuration on all cluster nodes.
+ Pacemaker immediately synchronizes changes to all active members of the
+ cluster. To reduce bandwidth, the cluster only broadcasts the incremental
+ updates that result from your changes and uses checksums to ensure that each
+ copy is consistent.
+
+
+Configuration Using Higher-level Tools
+######################################
+
+Most users will benefit from using higher-level tools provided by projects
+separate from Pacemaker. Some of the most commonly used include the crm shell,
+hawk, and pcs. [#]_
+
+See those projects' documentation for details on how to configure Pacemaker
+using them.
+
+
+Configuration Using Pacemaker's Command-Line Tools
+##################################################
+
+Pacemaker provides lower-level, command-line tools to manage the cluster. Most
+configuration tasks can be performed with these tools, without needing any XML
+knowledge.
+
+For example, to enable STONITH, one could run:
+
+.. code-block:: none
+
+ # crm_attribute --name stonith-enabled --update 1
+
+Or, to check whether **node1** is allowed to run resources, there is:
+
+.. code-block:: none
+
+ # crm_standby --query --node node1
+
+Or, to change the failure threshold of **my-test-rsc**, one can use:
+
+.. code-block:: none
+
+ # crm_resource -r my-test-rsc --set-parameter migration-threshold --parameter-value 3 --meta
+
+Examples of using these tools for specific cases will be given throughout this
+document where appropriate. See the man pages for further details.
+
+See :ref:`cibadmin` for how to edit the CIB using XML.
+
+See :ref:`crm_shadow` for a way to make a series of changes, then commit them
+all at once to the live cluster.
+
+
+.. index::
+ single: configuration; CIB properties
+ single: CIB; properties
+ single: CIB property
+
+Working with CIB Properties
+___________________________
+
+Although the XML attributes of the ``cib`` element itself (shown in the example
+below) can be written to by the user, in most cases the cluster will overwrite
+any values specified by the user with the "correct" ones.
+
+To change the ones that can be specified by the user, for example
+``admin_epoch``, one should use:
+
+.. code-block:: none
+
+ # cibadmin --modify --xml-text '<cib admin_epoch="42"/>'
+
+A complete set of CIB properties will look something like this:
+
+.. topic:: XML attributes set for a cib element
+
+ .. code-block:: xml
+
+ <cib crm_feature_set="3.0.7" validate-with="pacemaker-1.2"
+ admin_epoch="42" epoch="116" num_updates="1"
+ cib-last-written="Mon Jan 12 15:46:39 2015" update-origin="rhel7-1"
+ update-client="crm_attribute" have-quorum="1" dc-uuid="1">
+
+
+.. index::
+ single: configuration; cluster options
+
+Querying and Setting Cluster Options
+____________________________________
+
+Cluster options can be queried and modified using the ``crm_attribute`` tool.
+To get the current value of ``cluster-delay``, you can run:
+
+.. code-block:: none
+
+ # crm_attribute --query --name cluster-delay
+
+which is more simply written as
+
+.. code-block:: none
+
+ # crm_attribute -G -n cluster-delay
+
+If a value is found, you'll see a result like this:
+
+.. code-block:: none
+
+ # crm_attribute -G -n cluster-delay
+ scope=crm_config name=cluster-delay value=60s
+
+If no value is found, the tool will display an error:
+
+.. code-block:: none
+
+ # crm_attribute -G -n clusta-deway
+ scope=crm_config name=clusta-deway value=(null)
+ Error performing operation: No such device or address
+
+To use a different value (for example, 30 seconds), simply run:
+
+.. code-block:: none
+
+ # crm_attribute --name cluster-delay --update 30s
+
+To go back to the cluster's default value, you can delete the value, for example:
+
+.. code-block:: none
+
+ # crm_attribute --name cluster-delay --delete
+ Deleted crm_config option: id=cib-bootstrap-options-cluster-delay name=cluster-delay
+
+
+When Options are Listed More Than Once
+______________________________________
+
+If you ever see something like the following, it means that the option you're
+modifying is present more than once.
+
+.. topic:: Deleting an option that is listed twice
+
+ .. code-block:: none
+
+ # crm_attribute --name batch-limit --delete
+
+ Please choose from one of the matches below and supply the 'id' with --id
+ Multiple attributes match name=batch-limit in crm_config:
+ Value: 50 (set=cib-bootstrap-options, id=cib-bootstrap-options-batch-limit)
+ Value: 100 (set=custom, id=custom-batch-limit)
+
+In such cases, follow the on-screen instructions to perform the requested
+action. To determine which value is currently being used by the cluster, refer
+to the "Rules" chapter of *Pacemaker Explained*.
+
+
+.. index::
+ single: configuration; remote
+
+.. _remote_connection:
+
+Connecting from a Remote Machine
+################################
+
+Provided Pacemaker is installed on a machine, it is possible to connect to the
+cluster even if the machine itself is not part of the cluster. To do this, one
+simply sets up a number of environment variables and runs the same commands as
+when working on a cluster node.
+
+.. table:: **Environment Variables Used to Connect to Remote Instances of the CIB**
+
+ +----------------------+-----------+------------------------------------------------+
+ | Environment Variable | Default | Description |
+ +======================+===========+================================================+
+ | CIB_user | $USER | .. index:: |
+ | | | single: CIB_user |
+ | | | single: environment variable; CIB_user |
+ | | | |
+ | | | The user to connect as. Needs to be |
+ | | | part of the ``haclient`` group on |
+ | | | the target host. |
+ +----------------------+-----------+------------------------------------------------+
+ | CIB_passwd | | .. index:: |
+ | | | single: CIB_passwd |
+ | | | single: environment variable; CIB_passwd |
+ | | | |
+ | | | The user's password. Read from the |
+ | | | command line if unset. |
+ +----------------------+-----------+------------------------------------------------+
+ | CIB_server | localhost | .. index:: |
+ | | | single: CIB_server |
+ | | | single: environment variable; CIB_server |
+ | | | |
+ | | | The host to contact |
+ +----------------------+-----------+------------------------------------------------+
+ | CIB_port | | .. index:: |
+ | | | single: CIB_port |
+ | | | single: environment variable; CIB_port |
+ | | | |
+ | | | The port on which to contact the server; |
+ | | | required. |
+ +----------------------+-----------+------------------------------------------------+
+ | CIB_encrypted | TRUE | .. index:: |
+ | | | single: CIB_encrypted |
+ | | | single: environment variable; CIB_encrypted |
+ | | | |
+ | | | Whether to encrypt network traffic |
+ +----------------------+-----------+------------------------------------------------+
+
+So, if **c001n01** is an active cluster node and is listening on port 1234
+for connections, and **someuser** is a member of the **haclient** group,
+then the following would prompt for **someuser**'s password and return
+the cluster's current configuration:
+
+.. code-block:: none
+
+ # export CIB_port=1234; export CIB_server=c001n01; export CIB_user=someuser;
+ # cibadmin -Q
+
+For security reasons, the cluster does not listen for remote connections by
+default. If you wish to allow remote access, you need to set the
+``remote-tls-port`` (encrypted) or ``remote-clear-port`` (unencrypted) CIB
+properties (i.e., those kept in the ``cib`` tag, like ``num_updates`` and
+``epoch``).
+
+.. table:: **Extra top-level CIB properties for remote access**
+
+ +----------------------+-----------+------------------------------------------------------+
+ | CIB Property | Default | Description |
+ +======================+===========+======================================================+
+ | remote-tls-port | | .. index:: |
+ | | | single: remote-tls-port |
+ | | | single: CIB property; remote-tls-port |
+ | | | |
+ | | | Listen for encrypted remote connections |
+ | | | on this port. |
+ +----------------------+-----------+------------------------------------------------------+
+ | remote-clear-port | | .. index:: |
+ | | | single: remote-clear-port |
+ | | | single: CIB property; remote-clear-port |
+ | | | |
+ | | | Listen for plaintext remote connections |
+ | | | on this port. |
+ +----------------------+-----------+------------------------------------------------------+
+
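+For example, following the ``cibadmin --modify`` pattern shown earlier,
+encrypted remote connections could be allowed on port 1234 (an arbitrary
+example port) like this:
+
+.. code-block:: none
+
+ # cibadmin --modify --xml-text '<cib remote-tls-port="1234"/>'
+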
+.. important::
+
+ The Pacemaker version on the administration host must be the same or greater
+ than the version(s) on the cluster nodes. Otherwise, it may not have the
+ schema files necessary to validate the CIB.
+
+
+.. rubric:: Footnotes
+
+.. [#] For a list, see "Configuration Tools" at
+ https://clusterlabs.org/components.html
diff --git a/doc/sphinx/Pacemaker_Administration/index.rst b/doc/sphinx/Pacemaker_Administration/index.rst
new file mode 100644
index 0000000..327ad31
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/index.rst
@@ -0,0 +1,36 @@
+Pacemaker Administration
+========================
+
+*Managing Pacemaker Clusters*
+
+
+Abstract
+--------
+This document has instructions and tips for system administrators who
+manage high-availability clusters using Pacemaker.
+
+
+Table of Contents
+-----------------
+
+.. toctree::
+ :maxdepth: 3
+ :numbered:
+
+ intro
+ installing
+ cluster
+ configuring
+ tools
+ troubleshooting
+ upgrading
+ alerts
+ agents
+ pcs-crmsh
+
+
+Index
+-----
+
+* :ref:`genindex`
+* :ref:`search`
diff --git a/doc/sphinx/Pacemaker_Administration/installing.rst b/doc/sphinx/Pacemaker_Administration/installing.rst
new file mode 100644
index 0000000..44a3f5f
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/installing.rst
@@ -0,0 +1,9 @@
+Installing Cluster Software
+---------------------------
+
+.. index:: installation
+
+Most major Linux distributions have pacemaker packages in their standard
+package repositories, or the software can be built from source code.
+See the `Install wiki page <https://wiki.clusterlabs.org/wiki/Install>`_
+for details.
diff --git a/doc/sphinx/Pacemaker_Administration/intro.rst b/doc/sphinx/Pacemaker_Administration/intro.rst
new file mode 100644
index 0000000..067e293
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/intro.rst
@@ -0,0 +1,21 @@
+Introduction
+------------
+
+The Scope of this Document
+##########################
+
+The purpose of this document is to help system administrators learn how to
+manage a Pacemaker cluster.
+
+System administrators may be interested in other parts of the
+`Pacemaker documentation set <https://www.clusterlabs.org/pacemaker/doc/>`_
+such as *Clusters from Scratch*, a step-by-step guide to setting up an example
+cluster, and *Pacemaker Explained*, an exhaustive reference for cluster
+configuration.
+
+Multiple higher-level tools (both command-line and GUI) are available to
+simplify cluster management. However, this document focuses on the lower-level
+command-line tools that come with Pacemaker itself. The concepts are applicable
+to the higher-level tools, though the syntax would differ.
+
+.. include:: ../shared/pacemaker-intro.rst
diff --git a/doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst b/doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst
new file mode 100644
index 0000000..61ab4e6
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/pcs-crmsh.rst
@@ -0,0 +1,441 @@
+Quick Comparison of pcs and crm shell
+-------------------------------------
+
+``pcs`` and ``crm shell`` are two popular higher-level command-line interfaces
+to Pacemaker. Each has its own syntax; this chapter gives a quick comparison of
+how to accomplish the same tasks using either one. Some examples also show the
+equivalent command using low-level Pacemaker command-line tools.
+
+These examples show the simplest syntax; see the respective man pages for all
+possible options.
+
+Show Cluster Configuration and Status
+#####################################
+
+.. topic:: Show Configuration (Raw XML)
+
+ .. code-block:: none
+
+ crmsh # crm configure show xml
+ pcs # pcs cluster cib
+ pacemaker # cibadmin -Q
+
+.. topic:: Show Configuration (Human-friendly)
+
+ .. code-block:: none
+
+ crmsh # crm configure show
+ pcs # pcs config
+
+.. topic:: Show Cluster Status
+
+ .. code-block:: none
+
+ crmsh # crm status
+ pcs # pcs status
+ pacemaker # crm_mon -1
+
+Manage Nodes
+############
+
+.. topic:: Put node "pcmk-1" in standby mode
+
+ .. code-block:: none
+
+ crmsh # crm node standby pcmk-1
+ pcs-0.9 # pcs cluster standby pcmk-1
+ pcs-0.10 # pcs node standby pcmk-1
+ pacemaker # crm_standby -N pcmk-1 -v on
+
+.. topic:: Remove node "pcmk-1" from standby mode
+
+ .. code-block:: none
+
+ crmsh # crm node online pcmk-1
+ pcs-0.9 # pcs cluster unstandby pcmk-1
+ pcs-0.10 # pcs node unstandby pcmk-1
+ pacemaker # crm_standby -N pcmk-1 -v off
+
+Manage Cluster Properties
+#########################
+
+.. topic:: Set the "stonith-enabled" cluster property to "false"
+
+ .. code-block:: none
+
+ crmsh # crm configure property stonith-enabled=false
+ pcs # pcs property set stonith-enabled=false
+ pacemaker # crm_attribute -n stonith-enabled -v false
+
+Show Resource Agent Information
+###############################
+
+.. topic:: List Resource Agent (RA) Classes
+
+ .. code-block:: none
+
+ crmsh # crm ra classes
+ pcs # pcs resource standards
+ pacemaker # crm_resource --list-standards
+
+.. topic:: List Available Resource Agents (RAs) by Standard
+
+ .. code-block:: none
+
+ crmsh # crm ra list ocf
+ pcs # pcs resource agents ocf
+ pacemaker # crm_resource --list-agents ocf
+
+.. topic:: List Available Resource Agents (RAs) by OCF Provider
+
+ .. code-block:: none
+
+ crmsh # crm ra list ocf pacemaker
+ pcs # pcs resource agents ocf:pacemaker
+ pacemaker # crm_resource --list-agents ocf:pacemaker
+
+.. topic:: List Available Resource Agent Parameters
+
+ .. code-block:: none
+
+ crmsh # crm ra info IPaddr2
+ pcs # pcs resource describe IPaddr2
+ pacemaker # crm_resource --show-metadata ocf:heartbeat:IPaddr2
+
+You can also use the full ``class:provider:type`` format with crmsh and pcs if
+multiple RAs with the same name are available.
+
+.. topic:: Show Available Fence Agent Parameters
+
+ .. code-block:: none
+
+ crmsh # crm ra info stonith:fence_ipmilan
+ pcs # pcs stonith describe fence_ipmilan
+
+Manage Resources
+################
+
+.. topic:: Create a Resource
+
+ .. code-block:: none
+
+ crmsh # crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
+ params ip=192.168.122.120 cidr_netmask=24 \
+ op monitor interval=30s
+ pcs # pcs resource create ClusterIP IPaddr2 ip=192.168.122.120 cidr_netmask=24
+
+pcs determines the standard and provider (``ocf:heartbeat``) automatically
+since ``IPaddr2`` is unique, and automatically creates operations (including
+monitor) based on the agent's meta-data.
+
+.. topic:: Show Configuration of All Resources
+
+ .. code-block:: none
+
+ crmsh # crm configure show
+ pcs-0.9 # pcs resource show --full
+ pcs-0.10 # pcs resource config
+
+.. topic:: Show Configuration of One Resource
+
+ .. code-block:: none
+
+ crmsh # crm configure show ClusterIP
+ pcs-0.9 # pcs resource show ClusterIP
+ pcs-0.10 # pcs resource config ClusterIP
+
+.. topic:: Show Configuration of Fencing Resources
+
+ .. code-block:: none
+
+ crmsh # crm resource status
+ pcs-0.9 # pcs stonith show --full
+ pcs-0.10 # pcs stonith config
+
+.. topic:: Start a Resource
+
+ .. code-block:: none
+
+ crmsh # crm resource start ClusterIP
+ pcs # pcs resource enable ClusterIP
+ pacemaker # crm_resource -r ClusterIP --set-parameter target-role --meta -v Started
+
+.. topic:: Stop a Resource
+
+ .. code-block:: none
+
+ crmsh # crm resource stop ClusterIP
+ pcs # pcs resource disable ClusterIP
+ pacemaker # crm_resource -r ClusterIP --set-parameter target-role --meta -v Stopped
+
+.. topic:: Remove a Resource
+
+ .. code-block:: none
+
+ crmsh # crm configure delete ClusterIP
+ pcs # pcs resource delete ClusterIP
+
+.. topic:: Modify a Resource's Instance Parameters
+
+ .. code-block:: none
+
+ crmsh # crm resource param ClusterIP set clusterip_hash=sourceip
+ pcs # pcs resource update ClusterIP clusterip_hash=sourceip
+ pacemaker # crm_resource -r ClusterIP --set-parameter clusterip_hash -v sourceip
+
+crmsh also has an `edit` command which edits the simplified CIB syntax
+(same commands as the command line) via a configurable text editor.
+
+.. topic:: Modify a Resource's Instance Parameters Interactively
+
+ .. code-block:: none
+
+ crmsh # crm configure edit ClusterIP
+
+Using the interactive shell mode of crmsh, multiple changes can be
+edited and verified before committing to the live configuration:
+
+.. topic:: Make Multiple Configuration Changes Interactively
+
+ .. code-block:: none
+
+ crmsh # crm configure
+ crmsh # edit
+ crmsh # verify
+ crmsh # commit
+
+.. topic:: Delete a Resource's Instance Parameters
+
+ .. code-block:: none
+
+ crmsh # crm resource param ClusterIP delete nic
+ pcs # pcs resource update ClusterIP nic=
+ pacemaker # crm_resource -r ClusterIP --delete-parameter nic
+
+.. topic:: List Current Resource Defaults
+
+ .. code-block:: none
+
+ crmsh # crm configure show type:rsc_defaults
+ pcs # pcs resource defaults
+ pacemaker # cibadmin -Q --scope rsc_defaults
+
+.. topic:: Set Resource Defaults
+
+ .. code-block:: none
+
+ crmsh # crm configure rsc_defaults resource-stickiness=100
+ pcs # pcs resource defaults resource-stickiness=100
+
+.. topic:: List Current Operation Defaults
+
+ .. code-block:: none
+
+ crmsh # crm configure show type:op_defaults
+ pcs # pcs resource op defaults
+ pacemaker # cibadmin -Q --scope op_defaults
+
+.. topic:: Set Operation Defaults
+
+ .. code-block:: none
+
+ crmsh # crm configure op_defaults timeout=240s
+ pcs # pcs resource op defaults timeout=240s
+
+.. topic:: Enable Resource Agent Tracing for a Resource
+
+ .. code-block:: none
+
+ crmsh # crm resource trace Website
+
+.. topic:: Clear Fail Counts for a Resource
+
+ .. code-block:: none
+
+ crmsh # crm resource cleanup Website
+ pcs # pcs resource cleanup Website
+ pacemaker # crm_resource --cleanup -r Website
+
+.. topic:: Create a Clone Resource
+
+ .. code-block:: none
+
+ crmsh # crm configure clone WebIP ClusterIP meta globally-unique=true clone-max=2 clone-node-max=2
+ pcs # pcs resource clone ClusterIP globally-unique=true clone-max=2 clone-node-max=2
+
+.. topic:: Create a Promotable Clone Resource
+
+ .. code-block:: none
+
+ crmsh # crm configure ms WebDataClone WebData \
+ meta master-max=1 master-node-max=1 \
+ clone-max=2 clone-node-max=1 notify=true
+ pcs-0.9 # pcs resource master WebDataClone WebData \
+ master-max=1 master-node-max=1 \
+ clone-max=2 clone-node-max=1 notify=true
+ pcs-0.10 # pcs resource promotable WebData WebDataClone \
+ promoted-max=1 promoted-node-max=1 \
+ clone-max=2 clone-node-max=1 notify=true
+
+pcs will generate the clone name automatically if it is omitted from the
+command line.
+
+
+Manage Constraints
+##################
+
+.. topic:: Create a Colocation Constraint
+
+ .. code-block:: none
+
+ crmsh # crm configure colocation website-with-ip INFINITY: WebSite ClusterIP
+ pcs # pcs constraint colocation add ClusterIP with WebSite INFINITY
+
+.. topic:: Create a Colocation Constraint Based on Role
+
+ .. code-block:: none
+
+ crmsh # crm configure colocation another-ip-with-website inf: AnotherIP WebSite:Master
+ pcs # pcs constraint colocation add Started AnotherIP with Promoted WebSite INFINITY
+
+.. topic:: Create an Ordering Constraint
+
+ .. code-block:: none
+
+ crmsh # crm configure order apache-after-ip mandatory: ClusterIP WebSite
+ pcs # pcs constraint order ClusterIP then WebSite
+
+.. topic:: Create an Ordering Constraint Based on Role
+
+ .. code-block:: none
+
+ crmsh # crm configure order ip-after-website Mandatory: WebSite:Master AnotherIP
+ pcs # pcs constraint order promote WebSite then start AnotherIP
+
+.. topic:: Create a Location Constraint
+
+ .. code-block:: none
+
+ crmsh # crm configure location prefer-pcmk-1 WebSite 50: pcmk-1
+ pcs # pcs constraint location WebSite prefers pcmk-1=50
+
+.. topic:: Create a Location Constraint Based on Role
+
+ .. code-block:: none
+
+ crmsh # crm configure location prefer-pcmk-1 WebSite rule role=Master 50: \#uname eq pcmk-1
+ pcs # pcs constraint location WebSite rule role=Promoted 50 \#uname eq pcmk-1
+
+.. topic:: Move a Resource to a Specific Node (by Creating a Location Constraint)
+
+ .. code-block:: none
+
+ crmsh # crm resource move WebSite pcmk-1
+ pcs # pcs resource move WebSite pcmk-1
+ pacemaker # crm_resource -r WebSite --move -N pcmk-1
+
+.. topic:: Move a Resource Away from Its Current Node (by Creating a Location Constraint)
+
+ .. code-block:: none
+
+ crmsh # crm resource ban Website pcmk-2
+ pcs # pcs resource ban Website pcmk-2
+ pacemaker # crm_resource -r WebSite --move
+
+.. topic:: Remove any Constraints Created by Moving a Resource
+
+ .. code-block:: none
+
+ crmsh # crm resource unmove WebSite
+ pcs # pcs resource clear WebSite
+ pacemaker # crm_resource -r WebSite --clear
+
+Advanced Configuration
+######################
+
+Manipulate Configuration Elements by Type
+_________________________________________
+
+.. topic:: List Constraints with IDs
+
+ .. code-block:: none
+
+ pcs # pcs constraint list --full
+
+.. topic:: Remove Constraint by ID
+
+ .. code-block:: none
+
+ pcs # pcs constraint remove cli-ban-Website-on-pcmk-1
+ crmsh # crm configure remove cli-ban-Website-on-pcmk-1
+
+crmsh's `show` and `edit` commands can be used to manage resources and
+constraints by type:
+
+.. topic:: Show Configuration Elements
+
+ .. code-block:: none
+
+ crmsh # crm configure show type:primitive
+ crmsh # crm configure edit type:colocation
+
+Batch Changes
+_____________
+
+.. topic:: Make Multiple Changes and Apply Together
+
+ .. code-block:: none
+
+ crmsh # crm
+ crmsh # cib new drbd_cfg
+ crmsh # configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
+ op monitor interval=60s
+ crmsh # configure ms WebDataClone WebData meta master-max=1 master-node-max=1 \
+ clone-max=2 clone-node-max=1 notify=true
+ crmsh # cib commit drbd_cfg
+ crmsh # quit
+
+ pcs # pcs cluster cib drbd_cfg
+ pcs # pcs -f drbd_cfg resource create WebData ocf:linbit:drbd drbd_resource=wwwdata \
+ op monitor interval=60s
+ pcs-0.9 # pcs -f drbd_cfg resource master WebDataClone WebData \
+ master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
+ pcs-0.10 # pcs -f drbd_cfg resource promotable WebData WebDataClone \
+ promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true
+ pcs # pcs cluster cib-push drbd_cfg
+
+Template Creation
+_________________
+
+.. topic:: Create Resource Template Based on Existing Primitives of Same Type
+
+ .. code-block:: none
+
+ crmsh # crm configure assist template ClusterIP AdminIP
+
+Log Analysis
+____________
+
+.. topic:: Show Information About Recent Cluster Events
+
+ .. code-block:: none
+
+ crmsh # crm history
+ crmsh # peinputs
+ crmsh # transition pe-input-10
+ crmsh # transition log pe-input-10
+
+Configuration Scripts
+_____________________
+
+.. topic:: Script Multiple-step Cluster Configurations
+
+ .. code-block:: none
+
+ crmsh # crm script show apache
+ crmsh # crm script run apache \
+ id=WebSite \
+ install=true \
+ virtual-ip:ip=192.168.0.15 \
+ database:id=WebData \
+ database:install=true
diff --git a/doc/sphinx/Pacemaker_Administration/tools.rst b/doc/sphinx/Pacemaker_Administration/tools.rst
new file mode 100644
index 0000000..5a6044d
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/tools.rst
@@ -0,0 +1,562 @@
+.. index:: command-line tool
+
+Using Pacemaker Command-Line Tools
+----------------------------------
+
+.. index::
+ single: command-line tool; output format
+
+.. _cmdline_output:
+
+Controlling Command Line Output
+###############################
+
+Some of the Pacemaker command-line utilities have been converted to a new
+output system. Among these tools are ``crm_mon`` and ``stonith_admin``. This
+is an ongoing project, and more tools will be converted over time. This system
+lets you control the formatting of output with ``--output-as=`` and the
+destination of output with ``--output-to=``.
+
+The available formats vary by tool, but at least plain text and XML are
+supported by all tools that use the new system. The default format is plain
+text. The default destination is stdout but can be redirected to any file.
+Some formats support command line options for changing the style of the output.
+For instance:
+
+.. code-block:: none
+
+ # crm_mon --help-output
+ Usage:
+ crm_mon [OPTION?]
+
+ Provides a summary of cluster's current state.
+
+ Outputs varying levels of detail in a number of different formats.
+
+ Output Options:
+ --output-as=FORMAT Specify output format as one of: console (default), html, text, xml
+ --output-to=DEST Specify file name for output (or "-" for stdout)
+ --html-cgi Add text needed to use output in a CGI program
+ --html-stylesheet=URI Link to an external CSS stylesheet
+ --html-title=TITLE Page title
+ --text-fancy Use more highly formatted output
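+
+For example, a one-shot cluster status snapshot could be written as XML to a
+file of your choosing (the file name here is only an example):
+
+.. code-block:: none
+
+   # crm_mon -1 --output-as=xml --output-to=/tmp/cluster-status.xml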
+
+.. index::
+ single: crm_mon
+ single: command-line tool; crm_mon
+
+.. _crm_mon:
+
+Monitor a Cluster with crm_mon
+##############################
+
+The ``crm_mon`` utility displays the current state of an active cluster. It can
+show the cluster status organized by node or by resource, and can be used in
+either single-shot or dynamically updating mode. It can also display operations
+performed and information about failures.
+
+Using this tool, you can examine the state of the cluster for irregularities,
+and see how it responds when you cause or simulate failures.
+
+See the manual page or the output of ``crm_mon --help`` for a full description
+of its many options.
+
+.. topic:: Sample output from crm_mon -1
+
+ .. code-block:: none
+
+ Cluster Summary:
+ * Stack: corosync
+ * Current DC: node2 (version 2.0.0-1) - partition with quorum
+ * Last updated: Mon Jan 29 12:18:42 2018
+ * Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3
+ * 5 nodes configured
+ * 2 resources configured
+
+ Node List:
+ * Online: [ node1 node2 node3 node4 node5 ]
+
+ * Active resources:
+ * Fencing (stonith:fence_xvm): Started node1
+ * IP (ocf:heartbeat:IPaddr2): Started node2
+
+.. topic:: Sample output from crm_mon -n -1
+
+ .. code-block:: none
+
+ Cluster Summary:
+ * Stack: corosync
+ * Current DC: node2 (version 2.0.0-1) - partition with quorum
+ * Last updated: Mon Jan 29 12:21:48 2018
+ * Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3
+ * 5 nodes configured
+ * 2 resources configured
+
+ * Node List:
+ * Node node1: online
+ * Fencing (stonith:fence_xvm): Started
+ * Node node2: online
+ * IP (ocf:heartbeat:IPaddr2): Started
+ * Node node3: online
+ * Node node4: online
+ * Node node5: online
+
+As mentioned in an earlier chapter, the DC is the node where decisions are
+made. The cluster elects a node to be DC as needed. The only significance of
+the choice of DC to an administrator is that its logs will have the most
+information about why decisions were made.
+
+.. index::
+ pair: crm_mon; CSS
+
+.. _crm_mon_css:
+
+Styling crm_mon HTML output
+___________________________
+
+Various parts of ``crm_mon``'s HTML output have a CSS class associated with
+them. Not everything does, but some of the most interesting portions do. In
+the following example, the status of each node has an ``online`` class and the
+details of each resource have an ``rsc-ok`` class.
+
+.. code-block:: html
+
+ <h2>Node List</h2>
+ <ul>
+ <li>
+ <span>Node: cluster01</span><span class="online"> online</span>
+ </li>
+ <li><ul><li><span class="rsc-ok">ping (ocf::pacemaker:ping): Started</span></li></ul></li>
+ <li>
+ <span>Node: cluster02</span><span class="online"> online</span>
+ </li>
+ <li><ul><li><span class="rsc-ok">ping (ocf::pacemaker:ping): Started</span></li></ul></li>
+ </ul>
+
+By default, a stylesheet for styling these classes is included in the head of
+the HTML output. The portions of that stylesheet relevant to the above example
+are:
+
+.. code-block:: css
+
+ <style>
+ .online { color: green }
+ .rsc-ok { color: green }
+ </style>
+
+If you want to override some or all of the styling, simply create your own
+stylesheet, place it on a web server, and pass ``--html-stylesheet=<URL>``
+to ``crm_mon``. The link is added after the default stylesheet, so your
+changes take precedence. You don't need to duplicate the entire default.
+Only include what you want to change.
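+
+For example, to write a one-shot HTML status page that pulls in a custom
+stylesheet (the output path and stylesheet URL below are only placeholders):
+
+.. code-block:: none
+
+   # crm_mon -1 --output-as=html --output-to=/var/www/html/cluster.html \
+       --html-stylesheet=https://example.com/pacemaker.css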
+
+.. index::
+ single: cibadmin
+ single: command-line tool; cibadmin
+
+.. _cibadmin:
+
+Edit the CIB XML with cibadmin
+##############################
+
+The most flexible tool for modifying the configuration is Pacemaker's
+``cibadmin`` command. With ``cibadmin``, you can query, add, remove, update
+or replace any part of the configuration. All changes take effect immediately,
+so there is no need to perform a reload-like operation.
+
+The simplest way of using ``cibadmin`` is to use it to save the current
+configuration to a temporary file, edit that file with your favorite
+text or XML editor, and then upload the revised configuration.
+
+.. topic:: Safely using an editor to modify the cluster configuration
+
+ .. code-block:: none
+
+ # cibadmin --query > tmp.xml
+ # vi tmp.xml
+ # cibadmin --replace --xml-file tmp.xml
+
+Some of the better XML editors can make use of a RELAX NG schema to
+help make sure any changes you make are valid. The schema describing
+the configuration can be found in ``pacemaker.rng``, which may be
+deployed in a location such as ``/usr/share/pacemaker`` depending on your
+operating system distribution and how you installed the software.
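+
+For example, if the schema is installed under ``/usr/share/pacemaker``, you
+could validate the edited file before uploading it (adjust the path for your
+distribution):
+
+.. code-block:: none
+
+   # xmllint --relaxng /usr/share/pacemaker/pacemaker.rng tmp.xml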
+
+If you want to modify just one section of the configuration, you can
+query and replace just that section to avoid modifying any others.
+
+.. topic:: Safely using an editor to modify only the resources section
+
+ .. code-block:: none
+
+ # cibadmin --query --scope resources > tmp.xml
+ # vi tmp.xml
+ # cibadmin --replace --scope resources --xml-file tmp.xml
+
+To quickly delete a part of the configuration, identify the object you wish to
+delete by XML tag and id. For example, you might search the CIB for all
+STONITH-related configuration:
+
+.. topic:: Searching for STONITH-related configuration items
+
+ .. code-block:: none
+
+ # cibadmin --query | grep stonith
+ <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
+ <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="1"/>
+ <primitive id="child_DoFencing" class="stonith" type="external/vmware">
+ <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
+ <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
+ <lrm_resource id="child_DoFencing:1" type="external/vmware" class="stonith">
+ <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
+ <lrm_resource id="child_DoFencing:2" type="external/vmware" class="stonith">
+ <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
+ <lrm_resource id="child_DoFencing:3" type="external/vmware" class="stonith">
+
+If you wanted to delete the ``primitive`` tag with id ``child_DoFencing``,
+you would run:
+
+.. code-block:: none
+
+ # cibadmin --delete --xml-text '<primitive id="child_DoFencing"/>'
+
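+Similarly, a single existing element can be updated in place with
+``--modify``. As a sketch, this would change the ``stonith-action`` value
+found above:
+
+.. code-block:: none
+
+   # cibadmin --modify --xml-text '<nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="off"/>'
+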
+See the cibadmin man page for more options.
+
+.. warning::
+
+ Never edit the live ``cib.xml`` file directly. Pacemaker will detect such
+ changes and refuse to use the configuration.
+
+
+.. index::
+ single: crm_shadow
+ single: command-line tool; crm_shadow
+
+.. _crm_shadow:
+
+Batch Configuration Changes with crm_shadow
+###########################################
+
+Often, it is desirable to preview the effects of a series of configuration
+changes before updating the live configuration all at once. For this purpose,
+``crm_shadow`` creates a "shadow" copy of the configuration and arranges for
+all the command-line tools to use it.
+
+To begin, simply invoke ``crm_shadow --create`` with a name of your choice,
+and follow the simple on-screen instructions. Shadow copies are identified with
+a name to make it possible to have more than one.
+
+.. warning::
+
+ Read this section and the on-screen instructions carefully; failure to do so
+ could result in destroying the cluster's active configuration!
+
+.. topic:: Creating and displaying the active sandbox
+
+ .. code-block:: none
+
+ # crm_shadow --create test
+ Setting up shadow instance
+ Type Ctrl-D to exit the crm_shadow shell
+ shadow[test]:
+ shadow[test] # crm_shadow --which
+ test
+
+From this point on, all cluster commands will automatically use the shadow copy
+instead of talking to the cluster's active configuration. Once you have
+finished experimenting, you can either make the changes active via the
+``--commit`` option, or discard them using the ``--delete`` option. Again, be
+sure to follow the on-screen instructions carefully!
+
+For a full list of ``crm_shadow`` options and commands, invoke it with the
+``--help`` option.
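+
+For example, to push a shadow copy named ``test`` to the live cluster once you
+are satisfied with it:
+
+.. code-block:: none
+
+   shadow[test] # crm_shadow --commit test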
+
+.. topic:: Use sandbox to make multiple changes all at once, discard them, and verify real configuration is untouched
+
+ .. code-block:: none
+
+ shadow[test] # crm_failcount -r rsc_c001n01 -G
+ scope=status name=fail-count-rsc_c001n01 value=0
+ shadow[test] # crm_standby --node c001n02 -v on
+ shadow[test] # crm_standby --node c001n02 -G
+ scope=nodes name=standby value=on
+
+ shadow[test] # cibadmin --erase --force
+ shadow[test] # cibadmin --query
+ <cib crm_feature_set="3.0.14" validate-with="pacemaker-3.0" epoch="112" num_updates="2" admin_epoch="0" cib-last-written="Mon Jan 8 23:26:47 2018" update-origin="rhel7-1" update-client="crm_node" update-user="root" have-quorum="1" dc-uuid="1">
+ <configuration>
+ <crm_config/>
+ <nodes/>
+ <resources/>
+ <constraints/>
+ </configuration>
+ <status/>
+ </cib>
+ shadow[test] # crm_shadow --delete test --force
+ Now type Ctrl-D to exit the crm_shadow shell
+ shadow[test] # exit
+ # crm_shadow --which
+ No active shadow configuration defined
+ # cibadmin -Q
+ <cib crm_feature_set="3.0.14" validate-with="pacemaker-3.0" epoch="110" num_updates="2" admin_epoch="0" cib-last-written="Mon Jan 8 23:26:47 2018" update-origin="rhel7-1" update-client="crm_node" update-user="root" have-quorum="1">
+ <configuration>
+ <crm_config>
+ <cluster_property_set id="cib-bootstrap-options">
+ <nvpair id="cib-bootstrap-1" name="stonith-enabled" value="1"/>
+ <nvpair id="cib-bootstrap-2" name="pe-input-series-max" value="30000"/>
+
+See the next section, :ref:`crm_simulate`, for how to test your changes before
+committing them to the live cluster.
+
+
+.. index::
+ single: crm_simulate
+ single: command-line tool; crm_simulate
+
+.. _crm_simulate:
+
+Simulate Cluster Activity with crm_simulate
+###########################################
+
+The command-line tool ``crm_simulate`` shows the results of the same logic
+the cluster itself uses to respond to a particular cluster configuration and
+status.
+
+As always, the man page is the primary documentation, and should be consulted
+for further details. This section aims for a better conceptual explanation and
+practical examples.
+
+Replaying cluster decision-making logic
+_______________________________________
+
+At any given time, one node in a Pacemaker cluster will be elected DC, and that
+node will run Pacemaker's scheduler to make decisions.
+
+Each time decisions need to be made (a "transition"), the DC will have log
+messages like "Calculated transition ... saving inputs in ..." with a file
+name. You can grab the named file and replay the cluster logic to see why
+particular decisions were made. The file contains the live cluster
+configuration at that moment, so you can also look at it directly to see the
+value of node attributes, etc., at that time.
+
+The simplest usage is (replacing $FILENAME with the actual file name):
+
+.. topic:: Simulate cluster response to a given CIB
+
+ .. code-block:: none
+
+ # crm_simulate --simulate --xml-file $FILENAME
+
+That will show the cluster state when the process started, the actions that
+need to be taken ("Transition Summary"), and the resulting cluster state if the
+actions succeed. Most actions will have a brief description of why they were
+required.
+
+The transition inputs may be compressed. ``crm_simulate`` can handle these
+compressed files directly, though if you want to edit the file, you'll need to
+uncompress it first.
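+
+For example, transition inputs are typically bzip2-compressed, so a copy could
+be uncompressed for editing like this (the file name is only an example):
+
+.. code-block:: none
+
+   # bzip2 -dk /var/lib/pacemaker/pengine/pe-input-1463.bz2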
+
+You can do the same simulation for the live cluster configuration at the
+current moment. This is useful mainly when using ``crm_shadow`` to create a
+sandbox version of the CIB; the ``--live-check`` option will use the shadow CIB
+if one is in effect.
+
+.. topic:: Simulate cluster response to current live CIB or shadow CIB
+
+ .. code-block:: none
+
+ # crm_simulate --simulate --live-check
+
+
+Why decisions were made
+_______________________
+
+Getting further insight into the "why" becomes user-unfriendly very quickly.
+If you add the ``--show-scores`` option, you will also see all the scores that
+went into the decision-making. The node with the highest cumulative score for a
+resource will run it. You can look for ``-INFINITY`` scores in particular to
+see where complete bans came into effect.
+
+You can also add ``-VVVV`` to get more detailed messages about what's happening
+under the hood. You can even add up to two more V's, but that's usually useful
+only if you're a masochist or tracing through the source code.
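+
+As a sketch combining the options above with a saved transition input:
+
+.. code-block:: none
+
+   # crm_simulate --simulate --xml-file $FILENAME --show-scores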
+
+
+Visualizing the action sequence
+_______________________________
+
+Another handy feature is the ability to generate a visual graph of the actions
+needed, using the ``--save-dotfile`` option. This relies on the separate
+Graphviz [#]_ project.
+
+.. topic:: Generate a visual graph of cluster actions from a saved CIB
+
+ .. code-block:: none
+
+ # crm_simulate --simulate --xml-file $FILENAME --save-dotfile $FILENAME.dot
+ # dot $FILENAME.dot -Tsvg > $FILENAME.svg
+
+``$FILENAME.dot`` will contain a GraphViz representation of the cluster's
+response to your changes, including all actions with their ordering
+dependencies.
+
+``$FILENAME.svg`` will be the same information in a standard graphical format
+that you can view in your browser or other app of choice. You could, of course,
+use other ``dot`` options to generate other formats.
+
+How to interpret the graphical output:
+
+ * Bubbles indicate actions, and arrows indicate ordering dependencies
+ * Resource actions have text of the form
+ ``<RESOURCE>_<ACTION>_<INTERVAL_IN_MS> <NODE>`` indicating that the
+ specified action will be executed for the specified resource on the
+ specified node, once if interval is 0 or at specified recurring interval
+ otherwise
+ * Actions with black text will be sent to the executor (that is, the
+ appropriate agent will be invoked)
+ * Actions with orange text are "pseudo" actions that the cluster uses
+ internally for ordering but require no real activity
+ * Actions with a solid green border are part of the transition (that is, the
+ cluster will attempt to execute them in the given order -- though a
+ transition can be interrupted by action failure or new events)
+ * Dashed arrows indicate dependencies that are not present in the transition
+ graph
+ * Actions with a dashed border will not be executed. If the dashed border is
+ blue, the cluster does not feel the action needs to be executed. If the
+ dashed border is red, the cluster would like to execute the action but
+ cannot. Any actions depending on an action with a dashed border will not be
+ able to execute.
+ * Loops should not happen, and should be reported as a bug if found.
+
+.. topic:: Small Cluster Transition
+
+ .. image:: ../shared/images/Policy-Engine-small.png
+ :alt: An example transition graph as represented by Graphviz
+ :align: center
+
+In the above example, it appears that a new node, ``pcmk-2``, has come online
+and that the cluster is checking to make sure ``rsc1``, ``rsc2`` and ``rsc3``
+are not already running there (indicated by the ``rscN_monitor_0`` entries).
+Once it did that, and assuming the resources were not active there, it would
+have liked to stop ``rsc1`` and ``rsc2`` on ``pcmk-1`` and move them to
+``pcmk-2``. However, there appears to be some problem and the cluster cannot or
+is not permitted to perform the stop actions, which implies it also cannot
+perform the start actions. For some reason, the cluster does not want to start
+``rsc3`` anywhere.
+
+.. topic:: Complex Cluster Transition
+
+ .. image:: ../shared/images/Policy-Engine-big.png
+ :alt: Complex transition graph that you're not expected to be able to read
+ :align: center
+
+
+What-if scenarios
+_________________
+
+You can make changes to the saved or shadow CIB and simulate it again, to see
+how Pacemaker would react differently. You can edit the XML by hand, use
+command-line tools such as ``cibadmin`` with either a shadow CIB or the
+``CIB_file`` environment variable set to the filename, or use higher-level tool
+support (see the man pages of the specific tool you're using for how to perform
+actions on a saved CIB file rather than the live CIB).
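+
+As a minimal sketch (the node name is only an example), the following puts a
+node into standby mode in a saved CIB and then replays the scheduler logic
+against the modified file:
+
+.. code-block:: none
+
+   # CIB_file=$FILENAME crm_standby --node node3 -v on
+   # crm_simulate --simulate --xml-file $FILENAME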
+
+You can also inject node failures and/or action failures into the simulation;
+see the ``crm_simulate`` man page for more details.
+
+This capability is useful when using a shadow CIB to edit the configuration.
+Before committing the changes to the live cluster with ``crm_shadow --commit``,
+you can use ``crm_simulate`` to see how the cluster will react to the changes.
+
+.. _crm_attribute:
+
+.. index::
+ single: attrd_updater
+ single: command-line tool; attrd_updater
+ single: crm_attribute
+ single: command-line tool; crm_attribute
+
+Manage Node Attributes, Cluster Options and Defaults with crm_attribute and attrd_updater
+#########################################################################################
+
+``crm_attribute`` and ``attrd_updater`` are confusingly similar tools with subtle
+differences.
+
+``attrd_updater`` can query and update node attributes. ``crm_attribute`` can query
+and update not only node attributes, but also cluster options, resource
+defaults, and operation defaults.
+
+To understand the differences, it helps to understand the various types of node
+attribute.
+
+.. table:: **Types of Node Attributes**
+
+ +-----------+----------+-------------------+------------------+----------------+----------------+
+ | Type | Recorded | Recorded in | Survive full | Manageable by | Manageable by |
+ | | in CIB? | attribute manager | cluster restart? | crm_attribute? | attrd_updater? |
+ | | | memory? | | | |
+ +===========+==========+===================+==================+================+================+
+ | permanent | yes | no | yes | yes | no |
+ +-----------+----------+-------------------+------------------+----------------+----------------+
+ | transient | yes | yes | no | yes | yes |
+ +-----------+----------+-------------------+------------------+----------------+----------------+
+ | private | no | yes | no | no | yes |
+ +-----------+----------+-------------------+------------------+----------------+----------------+
+
+As you can see from the table above, ``crm_attribute`` can manage permanent and
+transient node attributes, while ``attrd_updater`` can manage transient and
+private node attributes.
+
+The difference between the two tools lies mainly in *how* they update node
+attributes: ``attrd_updater`` always contacts the Pacemaker attribute manager
+directly, while ``crm_attribute`` will contact the attribute manager only for
+transient node attributes, and will instead modify the CIB directly for
+permanent node attributes (and for transient node attributes when unable to
+contact the attribute manager).
+
+By contacting the attribute manager directly, ``attrd_updater`` can change
+an attribute's "dampening" (whether changes are immediately flushed to the CIB
+or after a specified amount of time, to minimize disk writes for frequent
+changes), set private node attributes (which are never written to the CIB), and
+set attributes for nodes that don't yet exist.
+
+By modifying the CIB directly, ``crm_attribute`` can set permanent node
+attributes (which are only in the CIB and not managed by the attribute
+manager), and can be used with saved CIB files and shadow CIBs.
+
+However a transient node attribute is set, it is synchronized between the CIB
+and the attribute manager, on all nodes.
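+
+As a rough sketch (the node name, attribute name, and values are only
+examples), the first command below sets a permanent node attribute, the second
+sets the same attribute as a transient one, and the third sets a private
+attribute, which only ``attrd_updater`` can do:
+
+.. code-block:: none
+
+   # crm_attribute --node node1 --name site --update raleigh --lifetime forever
+   # crm_attribute --node node1 --name site --update raleigh --lifetime reboot
+   # attrd_updater --node node1 --name tokens --update 10 --private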
+
+
+.. index::
+ single: crm_failcount
+ single: command-line tool; crm_failcount
+ single: crm_node
+ single: command-line tool; crm_node
+ single: crm_report
+ single: command-line tool; crm_report
+ single: crm_standby
+ single: command-line tool; crm_standby
+ single: crm_verify
+ single: command-line tool; crm_verify
+ single: stonith_admin
+ single: command-line tool; stonith_admin
+
+Other Commonly Used Tools
+#########################
+
+Other command-line tools include:
+
+* ``crm_failcount``: query or delete resource fail counts
+* ``crm_node``: manage cluster nodes
+* ``crm_report``: generate a detailed cluster report for bug submissions
+* ``crm_resource``: manage cluster resources
+* ``crm_standby``: manage standby status of nodes
+* ``crm_verify``: validate a CIB
+* ``stonith_admin``: manage fencing devices
+
+See the manual pages for details.
+
+.. rubric:: Footnotes
+
+.. [#] Graph visualization software. See http://www.graphviz.org/ for details.
diff --git a/doc/sphinx/Pacemaker_Administration/troubleshooting.rst b/doc/sphinx/Pacemaker_Administration/troubleshooting.rst
new file mode 100644
index 0000000..22c9dc8
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/troubleshooting.rst
@@ -0,0 +1,123 @@
+.. index:: troubleshooting
+
+Troubleshooting Cluster Problems
+--------------------------------
+
+.. index:: logging, pacemaker.log
+
+Logging
+#######
+
+Pacemaker by default logs messages of ``notice`` severity and higher to the
+system log, and messages of ``info`` severity and higher to the detail log,
+which by default is ``/var/log/pacemaker/pacemaker.log``.
+
+Logging options can be controlled via environment variables at Pacemaker
+start-up. Where these are set varies by operating system (often
+``/etc/sysconfig/pacemaker`` or ``/etc/default/pacemaker``). See the comments
+in that file for details.
+
+Because cluster problems are often highly complex, involving multiple machines,
+cluster daemons, and managed services, Pacemaker logs rather verbosely to
+provide as much context as possible. It is an ongoing priority to make these
+logs more user-friendly, but by necessity there is a lot of obscure, low-level
+information that can make them difficult to follow.
+
+The default log rotation configuration shipped with Pacemaker (typically
+installed in ``/etc/logrotate.d/pacemaker``) rotates the log when it reaches
+100MB in size, or weekly, whichever comes first.
+
+If you configure debug or (Heaven forbid) trace-level logging, the logs can
+grow enormous quite quickly. Because rotated logs are by default named with the
+year, month, and day only, this can cause name collisions if your logs exceed
+100MB in a single day. You can add ``dateformat -%Y%m%d-%H`` to the rotation
+configuration to avoid this.
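+
+As a sketch (not the verbatim shipped file), the relevant logrotate
+configuration might then look something like this:
+
+.. code-block:: none
+
+   /var/log/pacemaker/pacemaker.log {
+           weekly
+           maxsize 100M
+           rotate 6
+           compress
+           missingok
+           notifempty
+           dateext
+           dateformat -%Y%m%d-%H
+   }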
+
+Reading the Logs
+################
+
+When troubleshooting, first check the system log or journal for errors or
+warnings from Pacemaker components (conveniently, they will all have
+"pacemaker" in their logged process name). For example:
+
+.. code-block:: none
+
+ # grep 'pacemaker.*\(error\|warning\)' /var/log/messages
+ Mar 29 14:04:19 node1 pacemaker-controld[86636]: error: Result of monitor operation for rn2 on node1: Timed Out after 45s (Remote executor did not respond)
+
+If that doesn't give sufficient information, next look at the ``notice`` level
+messages from ``pacemaker-controld``. These will show changes in the state of
+cluster nodes. On the DC, this will also show resource actions attempted. For
+example:
+
+.. code-block:: none
+
+ # grep 'pacemaker-controld.*notice:' /var/log/messages
+ ... output skipped for brevity ...
+ Mar 29 14:05:36 node1 pacemaker-controld[86636]: notice: Node rn2 state is now lost
+ ... more output skipped for brevity ...
+ Mar 29 14:12:17 node1 pacemaker-controld[86636]: notice: Initiating stop operation rsc1_stop_0 on node4
+ ... more output skipped for brevity ...
+
+Of course, you can use other tools besides ``grep`` to search the logs.
+
+
+.. index:: transition
+
+Transitions
+###########
+
+A key concept in understanding how a Pacemaker cluster functions is a
+*transition*. A transition is a set of actions that need to be taken to bring
+the cluster from its current state to the desired state (as expressed by the
+configuration).
+
+Whenever a relevant event happens (a node joining or leaving the cluster,
+a resource failing, etc.), the controller will ask the scheduler to recalculate
+the status of the cluster, which generates a new transition. The controller
+then performs the actions in the transition in the proper order.
+
+Each transition can be identified in the DC's logs by a line like:
+
+.. code-block:: none
+
+ notice: Calculated transition 19, saving inputs in /var/lib/pacemaker/pengine/pe-input-1463.bz2
+
+The file listed as the "inputs" is a snapshot of the cluster configuration and
+state at that moment (the CIB). This file can help determine why particular
+actions were scheduled. The ``crm_simulate`` command, described in
+:ref:`crm_simulate`, can be used to replay the file.
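+
+For example, to replay the transition above:
+
+.. code-block:: none
+
+   # crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-1463.bz2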
+
+The log messages immediately before the "saving inputs" message will include
+any actions that the scheduler thinks need to be done.
+
+
+Node Failures
+#############
+
+When a node fails, and looking at errors and warnings doesn't give an obvious
+explanation, try to answer questions like the following based on log messages:
+
+* When and what was the last successful message on the node itself, or about
+ that node in the other nodes' logs?
+* Did pacemaker-controld on the other nodes notice the node leave?
+* Did pacemaker-controld on the DC invoke the scheduler and schedule a new
+ transition?
+* Did the transition include fencing the failed node?
+* Was fencing attempted?
+* Did fencing succeed?
+
+Resource Failures
+#################
+
+When a resource fails, and looking at errors and warnings doesn't give an
+obvious explanation, try to answer questions like the following based on log
+messages:
+
+* Did pacemaker-controld record the result of the failed resource action?
+* What was the failed action's execution status and exit status?
+* What code in the resource agent could result in those status codes?
+* Did pacemaker-controld on the DC invoke the scheduler and schedule a new
+ transition?
+* Did the new transition include recovery of the resource?
+* Were the recovery actions initiated, and what were their results?
diff --git a/doc/sphinx/Pacemaker_Administration/upgrading.rst b/doc/sphinx/Pacemaker_Administration/upgrading.rst
new file mode 100644
index 0000000..1ca2a4e
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/upgrading.rst
@@ -0,0 +1,534 @@
+.. index:: upgrade
+
+Upgrading a Pacemaker Cluster
+-----------------------------
+
+.. index:: version
+
+Pacemaker Versioning
+####################
+
+Pacemaker has an overall release version, plus separate version numbers for
+certain internal components.
+
+.. index::
+ single: version; release
+
+* **Pacemaker release version:** This version consists of three numbers
+ (*x.y.z*).
+
+ The major version number (the *x* in *x.y.z*) increases when at least some
+ rolling upgrades are not possible from the previous major version. For example,
+ a rolling upgrade from 1.0.8 to 1.1.15 should always be supported, but a
+ rolling upgrade from 1.0.8 to 2.0.0 may not be possible.
+
+ The minor version (the *y* in *x.y.z*) increases when there are significant
+ changes in cluster default behavior, tool behavior, and/or the API interface
+ (for software that utilizes Pacemaker libraries). The main benefit is to alert
+ you to pay closer attention to the release notes, to see if you might be
+ affected.
+
+ The release counter (the *z* in *x.y.z*) is increased with all public releases
+ of Pacemaker, which typically include both bug fixes and new features.
+
+.. index::
+ single: feature set
+ single: version; feature set
+
+* **CRM feature set:** This version number applies to the communication between
+ full cluster nodes, and is used to avoid problems in mixed-version clusters.
+
+ The major version number increases when nodes with different versions would not
+ work (rolling upgrades are not allowed). The minor version number increases
+ when mixed-version clusters are allowed only during rolling upgrades. The
+ minor-minor version number is ignored, but allows resource agents to detect
+ cluster support for various features. [#]_
+
+  Pacemaker ensures that the longest-running node is the cluster's DC, so that
+  new features are not enabled until all nodes are upgraded to support them.
+
+.. index::
+ single: version; Pacemaker Remote protocol
+
+* **Pacemaker Remote protocol version:** This version applies to communication
+ between a Pacemaker Remote node and the cluster. It increases when an older
+ cluster node would have problems hosting the connection to a newer
+ Pacemaker Remote node. To avoid these problems, Pacemaker Remote nodes will
+ accept connections only from cluster nodes with the same or newer
+ Pacemaker Remote protocol version.
+
+ Unlike with CRM feature set differences between full cluster nodes,
+ mixed Pacemaker Remote protocol versions between Pacemaker Remote nodes and
+ full cluster nodes are fine, as long as the Pacemaker Remote nodes have the
+ older version. This can be useful, for example, to host a legacy application
+ in an older operating system version used as a Pacemaker Remote node.
+
+.. index::
+ single: version; XML schema
+
+* **XML schema version:** Pacemaker's configuration syntax, that is, what is
+  allowed in the Cluster Information Base (CIB), has its own version. This
+  allows the configuration syntax to evolve over time while still allowing
+  clusters with older configurations to work without change.
+
+
+.. index::
+ single: upgrade; methods
+
+Upgrading Cluster Software
+##########################
+
+There are three approaches to upgrading a cluster, each with advantages and
+disadvantages.
+
+.. table:: **Upgrade Methods**
+
+ +---------------------------------------------------+----------+----------+--------+---------+----------+----------+
+ | Method | Available| Can be | Service| Service | Exercises| Allows |
+ | | between | used with| outage | recovery| failover | change of|
+ | | all | Pacemaker| during | during | logic | messaging|
+ | | versions | Remote | upgrade| upgrade | | layer |
+ | | | nodes | | | | [#]_ |
+ +===================================================+==========+==========+========+=========+==========+==========+
+ | Complete cluster shutdown | yes | yes | always | N/A | no | yes |
+ +---------------------------------------------------+----------+----------+--------+---------+----------+----------+
+ | Rolling (node by node) | no | yes | always | yes | yes | no |
+ | | | | [#]_ | | | |
+ +---------------------------------------------------+----------+----------+--------+---------+----------+----------+
+ | Detach and reattach | yes | no | only | no | no | yes |
+ | | | | due to | | | |
+ | | | | failure| | | |
+ +---------------------------------------------------+----------+----------+--------+---------+----------+----------+
+
+
+.. index::
+ single: upgrade; shutdown
+
+Complete Cluster Shutdown
+_________________________
+
+In this scenario, one shuts down all cluster nodes and resources,
+then upgrades all the nodes before restarting the cluster.
+
+#. On each node:
+
+   a. Shut down the cluster software (pacemaker and the messaging layer).
+ #. Upgrade the Pacemaker software. This may also include upgrading the
+ messaging layer and/or the underlying operating system.
+ #. Check the configuration with the ``crm_verify`` tool.
+
+#. On each node:
+
+ a. Start the cluster software.
+
+Currently, only Corosync version 2 and greater is supported as the cluster
+layer, but if another stack is supported in the future, the stack does not
+need to be the same one as before the upgrade.
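+
+How to stop and start "the cluster software" depends on your distribution. On
+systemd-based systems, a minimal sketch (assuming the usual ``corosync`` and
+``pacemaker`` unit names) would be:
+
+.. code-block:: none
+
+   # systemctl stop pacemaker corosync
+   # systemctl start corosync pacemaker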
+
+One variation of this approach is to build a new cluster on new hosts.
+This allows the new version to be tested beforehand, and minimizes downtime by
+having the new nodes ready to be placed in production as soon as the old nodes
+are shut down.
+
+
+.. index::
+ single: upgrade; rolling upgrade
+
+Rolling (node by node)
+______________________
+
+In this scenario, each node is removed from the cluster, upgraded, and then
+brought back online, until all nodes are running the newest version.
+
+Special considerations when planning a rolling upgrade:
+
+* If you plan to upgrade other cluster software -- such as the messaging layer --
+ at the same time, consult that software's documentation for its compatibility
+ with a rolling upgrade.
+
+* If the major version number is changing in the Pacemaker version you are
+ upgrading to, a rolling upgrade may not be possible. Read the new version's
+  release notes (as well as the information here) for what limitations may exist.
+
+* If the CRM feature set is changing in the Pacemaker version you are upgrading
+ to, you should run a mixed-version cluster only during a small rolling
+ upgrade window. If one of the older nodes drops out of the cluster for any
+ reason, it will not be able to rejoin until it is upgraded.
+
+* If the Pacemaker Remote protocol version is changing, all cluster nodes
+ should be upgraded before upgrading any Pacemaker Remote nodes.
+
+See the ClusterLabs wiki's
+`release calendar <https://wiki.clusterlabs.org/wiki/ReleaseCalendar>`_
+to figure out whether the CRM feature set and/or Pacemaker Remote protocol
+version changed between the Pacemaker release versions in your rolling
+upgrade.
+
+To perform a rolling upgrade, on each node in turn:
+
+#. Put the node into standby mode, and wait for any active resources
+ to be moved cleanly to another node. (This step is optional, but
+ allows you to deal with any resource issues before the upgrade.)
+#. Shut down the cluster software (pacemaker and the messaging layer) on the node.
+#. Upgrade the Pacemaker software. This may also include upgrading the
+ messaging layer and/or the underlying operating system.
+#. If this is the first node to be upgraded, check the configuration
+ with the ``crm_verify`` tool.
+#. Start the messaging layer.
+ This must be the same messaging layer (currently only Corosync version 2 and
+ greater is supported) that the rest of the cluster is using.
+
+.. note::
+
+ Even if a rolling upgrade from the current version of the cluster to the
+ newest version is not directly possible, it may be possible to perform a
+ rolling upgrade in multiple steps, by upgrading to an intermediate version
+ first.
+
+.. table:: **Version Compatibility Table**
+
+ +-------------------------+---------------------------+
+ | Version being Installed | Oldest Compatible Version |
+ +=========================+===========================+
+ | Pacemaker 2.y.z | Pacemaker 1.1.11 [#]_ |
+ +-------------------------+---------------------------+
+ | Pacemaker 1.y.z | Pacemaker 1.0.0 |
+ +-------------------------+---------------------------+
+ | Pacemaker 0.7.z | Pacemaker 0.6.z |
+ +-------------------------+---------------------------+
+
+.. index::
+ single: upgrade; detach and reattach
+
+Detach and Reattach
+___________________
+
+The reattach method is a variant of a complete cluster shutdown, where the
+resources are left active and get re-detected when the cluster is restarted.
+
+This method may not be used if the cluster contains any Pacemaker Remote nodes.
+
+#. Tell the cluster to stop managing services. This is required to allow the
+ services to remain active after the cluster shuts down.
+
+ .. code-block:: none
+
+ # crm_attribute --name maintenance-mode --update true
+
+#. On each node, shut down the cluster software (pacemaker and the messaging
+ layer), and upgrade the Pacemaker software. This may also include upgrading
+ the messaging layer. While the underlying operating system may be upgraded
+ at the same time, that will be more likely to cause outages in the detached
+ services (certainly, if a reboot is required).
+#. Check the configuration with the ``crm_verify`` tool.
+#. On each node, start the cluster software.
+ Currently, only Corosync version 2 and greater is supported as the cluster
+ layer, but if another stack is supported in the future, the stack does not
+   need to be the same one as before the upgrade.
+#. Verify that the cluster re-detected all resources correctly.
+#. Allow the cluster to resume managing resources again:
+
+ .. code-block:: none
+
+ # crm_attribute --name maintenance-mode --delete
+
+.. note::
+
+ While the goal of the detach-and-reattach method is to avoid disturbing
+ running services, resources may still move after the upgrade if any
+ resource's location is governed by a rule based on transient node
+ attributes. Transient node attributes are erased when the node leaves the
+ cluster. A common example is using the ``ocf:pacemaker:ping`` resource to
+ set a node attribute used to locate other resources.
+
+.. index::
+ pair: upgrade; CIB
+
+Upgrading the Configuration
+###########################
+
+The CIB schema version can change from one Pacemaker version to another.
+
+After cluster software is upgraded, the cluster will continue to use the older
+schema version that it was previously using. This can be useful, for example,
+when administrators have written tools that modify the configuration, and are
+based on the older syntax. [#]_
+
+However, when using an older syntax, new features may be unavailable, and there
+is a performance impact, since the cluster must do a non-persistent
+configuration upgrade before each transition. So while using the old syntax is
+possible, it is not advisable to continue using it indefinitely.
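+
+To see which schema version the configuration currently uses, check the
+``validate-with`` property on the ``cib`` element, for example:
+
+.. code-block:: none
+
+   # cibadmin --query | grep -o 'validate-with="[^"]*"'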
+
+Even if you wish to continue using the old syntax, it is a good idea to
+follow the upgrade procedure outlined below, except for the last step, to ensure
+that the new software has no problems with your existing configuration (since it
+will perform much the same task internally).
+
+If you are brave, it is sufficient simply to run ``cibadmin --upgrade``.
+
+A more cautious approach would proceed like this:
+
+#. Create a shadow copy of the configuration. The later commands will
+ automatically operate on this copy, rather than the live configuration.
+
+ .. code-block:: none
+
+ # crm_shadow --create shadow
+
+.. index::
+ single: configuration; verify
+
+#. Verify the configuration is valid with the new software (which may be
+ stricter about syntax mistakes, or may have dropped support for deprecated
+ features):
+
+ .. code-block:: none
+
+ # crm_verify --live-check
+
+#. Fix any errors or warnings.
+#. Perform the upgrade:
+
+ .. code-block:: none
+
+ # cibadmin --upgrade
+
+#. If this step fails, there are three main possibilities:
+
+ a. The configuration was not valid to start with (did you do steps 2 and
+ 3?).
+ #. The transformation failed; `report a bug <https://bugs.clusterlabs.org/>`_.
+ #. The transformation was successful but produced an invalid result.
+
+ If the result of the transformation is invalid, you may see a number of
+ errors from the validation library. If these are not helpful, visit the
+ `Validation FAQ wiki page <https://wiki.clusterlabs.org/wiki/Validation_FAQ>`_
+ and/or try the manual upgrade procedure described below.
+
+#. Check the changes:
+
+ .. code-block:: none
+
+ # crm_shadow --diff
+
+ If at this point there is anything about the upgrade that you wish to
+ fine-tune (for example, to change some of the automatic IDs), now is the
+ time to do so:
+
+ .. code-block:: none
+
+ # crm_shadow --edit
+
+ This will open the configuration in your favorite editor (whichever is
+ specified by the standard ``$EDITOR`` environment variable).
+
+#. Preview how the cluster will react:
+
+ .. code-block:: none
+
+ # crm_simulate --live-check --save-dotfile shadow.dot -S
+ # dot -Tsvg shadow.dot -o shadow.svg
+
+ You can then view shadow.svg with any compatible image viewer or web
+ browser. Verify that either no resource actions will occur or that you are
+ happy with any that are scheduled. If the output contains actions you do
+ not expect (possibly due to changes to the score calculations), you may need
+ to make further manual changes. See :ref:`crm_simulate` for further details
+ on how to interpret the output of ``crm_simulate`` and ``dot``.
+
+#. Upload the changes:
+
+ .. code-block:: none
+
+ # crm_shadow --commit shadow --force
+
+ In the unlikely event this step fails, please report a bug.
+
+.. note::
+
+ It is also possible to perform the configuration upgrade steps manually:
+
+ #. Locate the ``upgrade*.xsl`` conversion scripts provided with the source
+ code. These will often be installed in a location such as
+ ``/usr/share/pacemaker``, or may be obtained from the
+ `source repository <https://github.com/ClusterLabs/pacemaker/tree/main/xml>`_.
+
+ #. Run the conversion scripts that apply to your older version, for example:
+
+ .. code-block:: none
+
+ # xsltproc /path/to/upgrade06.xsl config06.xml > config10.xml
+
+ #. Locate the ``pacemaker.rng`` script (from the same location as the xsl
+ files).
+ #. Check the XML validity:
+
+ .. code-block:: none
+
+ # xmllint --relaxng /path/to/pacemaker.rng config10.xml
+
+ The advantage of this method is that it can be performed without the cluster
+ running, and any validation errors are often more informative.
+
+
+What Changed in 2.1
+###################
+
+The Pacemaker 2.1 release is fully backward-compatible in both the CIB XML and
+the C API. Highlights:
+
+* Pacemaker now supports the **OCF Resource Agent API version 1.1**.
+ Most notably, the ``Master`` and ``Slave`` role names have been renamed to
+ ``Promoted`` and ``Unpromoted``.
+
+* Pacemaker now supports colocations where the dependent resource does not
+ affect the primary resource's placement (via a new ``influence`` colocation
+ constraint option and ``critical`` resource meta-attribute). This is intended
+ for cases where a less-important resource must be colocated with an essential
+ resource, but it is preferred to leave the less-important resource stopped if
+ it fails, rather than move both resources.
+
+* If Pacemaker is built with libqb 2.0 or later, the detail log will use
+ **millisecond-resolution timestamps**.
+
+* In addition to crm_mon and stonith_admin, the crmadmin, crm_resource,
+ crm_simulate, and crm_verify commands now support the ``--output-as`` and
+ ``--output-to`` options, including **XML output** (which scripts and
+ higher-level tools are strongly recommended to use instead of trying to parse
+ the text output, which may change from release to release).
+
+For a detailed list of changes, see the release notes and the
+`Pacemaker 2.1 Changes <https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes>`_
+page on the ClusterLabs wiki.
+
+
+What Changed in 2.0
+###################
+
+The main goal of the 2.0 release was to remove support for deprecated syntax,
+along with some small changes in default configuration behavior and tool
+behavior. Highlights:
+
+* Only Corosync version 2 and greater is now supported as the underlying
+ cluster layer. Support for Heartbeat and Corosync 1 (including CMAN) is
+ removed.
+
+* The Pacemaker detail log file is now stored in
+ ``/var/log/pacemaker/pacemaker.log`` by default.
+
+* The record-pending cluster property now defaults to true, which
+ allows status tools such as crm_mon to show operations that are in
+ progress.
+
+* Support for a number of deprecated build options, environment variables,
+ and configuration settings has been removed.
+
+* The ``master`` tag has been deprecated in favor of using the ``clone`` tag
+ with the new ``promotable`` meta-attribute set to ``true``. "Master/slave"
+ clone resources are now referred to as "promotable" clone resources.
+
+* The public API for Pacemaker libraries that software applications can use
+ has changed significantly.
+
+For a detailed list of changes, see the release notes and the
+`Pacemaker 2.0 Changes <https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes>`_
+page on the ClusterLabs wiki.
+
+
+What Changed in 1.0
+###################
+
+New
+___
+
+* Failure timeouts.
+* New section for resource and operation defaults.
+* Tool for making offline configuration changes.
+* ``Rules``, ``instance_attributes``, ``meta_attributes`` and sets of
+ operations can be defined once and referenced in multiple places.
+* The CIB now accepts XPath-based create/modify/delete operations. See
+ ``cibadmin --help``.
+* Multi-dimensional colocation and ordering constraints.
+* The ability to connect to the CIB from non-cluster machines.
+* Allow recurring actions to be triggered at known times.
+
+
+Changed
+_______
+
+* Syntax
+
+ * All resource and cluster options now use dashes (-) instead of underscores
+ (_)
+ * ``master_slave`` was renamed to ``master``
+ * The ``attributes`` container tag was removed
+ * The operation field ``pre-req`` has been renamed ``requires``
+  * All operations must have an ``interval``; ``start``/``stop`` must have it
+ set to zero
+
+* The ``stonith-enabled`` option now defaults to true.
+* The cluster will refuse to start resources if ``stonith-enabled`` is true (or
+ unset) and no STONITH resources have been defined
+* The attributes of colocation and ordering constraints were renamed for
+ clarity.
+* ``resource-failure-stickiness`` has been replaced by ``migration-threshold``.
+* The parameters for command-line tools have been made consistent
+* Switched to 'RelaxNG' schema validation and 'libxml2' parser
+
+ * id fields are now XML IDs which have the following limitations:
+
+ * id's cannot contain colons (:)
+ * id's cannot begin with a number
+ * id's must be globally unique (not just unique for that tag)
+
+ * Some fields (such as those in constraints that refer to resources) are
+ IDREFs.
+
+ This means that they must reference existing resources or objects in
+ order for the configuration to be valid. Removing an object which is
+ referenced elsewhere will therefore fail.
+
+ * The CIB representation, from which a MD5 digest is calculated to verify
+ CIBs on the nodes, has changed.
+
+ This means that every CIB update will require a full refresh on any
+ upgraded nodes until the cluster is fully upgraded to 1.0. This will result
+ in significant performance degradation and it is therefore highly
+ inadvisable to run a mixed 1.0/0.6 cluster for any longer than absolutely
+ necessary.
+
+* Ping node information no longer needs to be added to ``ha.cf``. Simply
+ include the lists of hosts in your ping resource(s).
+
+
+Removed
+_______
+
+
+* Syntax
+
+ * It is no longer possible to set resource meta options as top-level
+ attributes. Use meta-attributes instead.
+ * Resource and operation defaults are no longer read from ``crm_config``.
+
+.. rubric:: Footnotes
+
+.. [#] Before CRM feature set 3.1.0 (Pacemaker 2.0.0), the minor-minor version
+ number was treated the same as the minor version.
+
+.. [#] Currently, Corosync version 2 and greater is the only supported cluster
+ stack, but other stacks have been supported by past versions, and may be
+ supported by future versions.
+
+.. [#] Any active resources will be moved off the node being upgraded, so there
+ will be at least a brief outage unless all resources can be migrated
+ "live".
+
+.. [#] Rolling upgrades from Pacemaker 1.1.z to 2.y.z are possible only if the
+ cluster uses corosync version 2 or greater as its messaging layer, and
+ the Cluster Information Base (CIB) uses schema 1.0 or higher in its
+ ``validate-with`` property.
+
+.. [#] As of Pacemaker 2.0.0, only schema versions pacemaker-1.0 and higher
+ are supported (excluding pacemaker-1.1, which was a special case).