diff --git a/doc/sphinx/Pacemaker_Administration/upgrading.rst b/doc/sphinx/Pacemaker_Administration/upgrading.rst
new file mode 100644
index 0000000..1ca2a4e
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/upgrading.rst
@@ -0,0 +1,534 @@
+.. index:: upgrade
+
+Upgrading a Pacemaker Cluster
+-----------------------------
+
+.. index:: version
+
+Pacemaker Versioning
+####################
+
+Pacemaker has an overall release version, plus separate version numbers for
+certain internal components.
+
+.. index::
+ single: version; release
+
+* **Pacemaker release version:** This version consists of three numbers
+ (*x.y.z*).
+
+ The major version number (the *x* in *x.y.z*) increases when at least some
+ rolling upgrades are not possible from the previous major version. For example,
+ a rolling upgrade from 1.0.8 to 1.1.15 should always be supported, but a
+ rolling upgrade from 1.0.8 to 2.0.0 may not be possible.
+
+ The minor version (the *y* in *x.y.z*) increases when there are significant
+ changes in cluster default behavior, tool behavior, and/or the API
+ (for software that uses Pacemaker libraries). The main benefit is to alert
+ you to pay closer attention to the release notes, to see if you might be
+ affected.
+
+ The release counter (the *z* in *x.y.z*) is increased with all public releases
+ of Pacemaker, which typically include both bug fixes and new features.
+
+.. index::
+ single: feature set
+ single: version; feature set
+
+* **CRM feature set:** This version number applies to the communication between
+ full cluster nodes, and is used to avoid problems in mixed-version clusters.
+
+ The major version number increases when nodes with different versions would not
+ work together (rolling upgrades are not allowed). The minor version number increases
+ when mixed-version clusters are allowed only during rolling upgrades. The
+ minor-minor version number is ignored, but allows resource agents to detect
+ cluster support for various features. [#]_
+
+ Pacemaker ensures that the longest-running node is the cluster's DC, so that
+ new features are not enabled until all nodes are upgraded to support them.
+
+.. index::
+ single: version; Pacemaker Remote protocol
+
+* **Pacemaker Remote protocol version:** This version applies to communication
+ between a Pacemaker Remote node and the cluster. It increases when an older
+ cluster node would have problems hosting the connection to a newer
+ Pacemaker Remote node. To avoid these problems, Pacemaker Remote nodes will
+ accept connections only from cluster nodes with the same or newer
+ Pacemaker Remote protocol version.
+
+ Unlike with CRM feature set differences between full cluster nodes,
+ mixed Pacemaker Remote protocol versions between Pacemaker Remote nodes and
+ full cluster nodes are fine, as long as the Pacemaker Remote nodes have the
+ older version. This can be useful, for example, to host a legacy application
+ in an older operating system version used as a Pacemaker Remote node.
+
+.. index::
+ single: version; XML schema
+
+* **XML schema version:** Pacemaker’s configuration syntax — what's allowed in
+ the Cluster Information Base (CIB) — has its own version. This allows
+ the configuration syntax to evolve over time while still allowing clusters
+ with older configurations to work without change.
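+
+The release version, CRM feature set, and CIB schema version of a running
+cluster can all be checked with standard Pacemaker commands:
+``pacemakerd --version`` prints the release version, ``pacemakerd --features``
+additionally reports the CRM feature set, and the first line of
+``cibadmin --query`` output includes the CIB's ``validate-with`` schema
+version. For example:
+
+.. code-block:: none
+
+ # pacemakerd --version
+ # pacemakerd --features
+ # cibadmin --query | head -n 1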
+
+
+.. index::
+ single: upgrade; methods
+
+Upgrading Cluster Software
+##########################
+
+There are three approaches to upgrading a cluster, each with advantages and
+disadvantages.
+
+.. table:: **Upgrade Methods**
+
+ +---------------------------------------------------+----------+----------+--------+---------+----------+----------+
+ | Method | Available| Can be | Service| Service | Exercises| Allows |
+ | | between | used with| outage | recovery| failover | change of|
+ | | all | Pacemaker| during | during | logic | messaging|
+ | | versions | Remote | upgrade| upgrade | | layer |
+ | | | nodes | | | | [#]_ |
+ +===================================================+==========+==========+========+=========+==========+==========+
+ | Complete cluster shutdown | yes | yes | always | N/A | no | yes |
+ +---------------------------------------------------+----------+----------+--------+---------+----------+----------+
+ | Rolling (node by node) | no | yes | always | yes | yes | no |
+ | | | | [#]_ | | | |
+ +---------------------------------------------------+----------+----------+--------+---------+----------+----------+
+ | Detach and reattach | yes | no | only | no | no | yes |
+ | | | | due to | | | |
+ | | | | failure| | | |
+ +---------------------------------------------------+----------+----------+--------+---------+----------+----------+
+
+
+.. index::
+ single: upgrade; shutdown
+
+Complete Cluster Shutdown
+_________________________
+
+In this scenario, one shuts down all cluster nodes and resources,
+then upgrades all the nodes before restarting the cluster.
+
+#. On each node:
+
+ a. Shut down the cluster software (pacemaker and the messaging layer).
+ #. Upgrade the Pacemaker software. This may also include upgrading the
+ messaging layer and/or the underlying operating system.
+ #. Check the configuration with the ``crm_verify`` tool.
+
+#. On each node:
+
+ a. Start the cluster software.
+
+Currently, only Corosync version 2 and greater is supported as the cluster
+layer, but if another stack is supported in the future, the new stack does not
+need to be the same one that was used before the upgrade.
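+
+As an illustration, the per-node steps on a systemd-based host using Corosync
+might look like the following sketch (package manager commands and the CIB
+file location vary by distribution):
+
+.. code-block:: none
+
+ # systemctl stop pacemaker corosync
+ # dnf upgrade pacemaker corosync
+ # crm_verify --xml-file /var/lib/pacemaker/cib/cib.xml
+
+followed, once every node has been upgraded, by:
+
+.. code-block:: none
+
+ # systemctl start corosync pacemaker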
+
+One variation of this approach is to build a new cluster on new hosts.
+This allows the new version to be tested beforehand, and minimizes downtime by
+having the new nodes ready to be placed in production as soon as the old nodes
+are shut down.
+
+
+.. index::
+ single: upgrade; rolling upgrade
+
+Rolling (node by node)
+______________________
+
+In this scenario, each node is removed from the cluster, upgraded, and then
+brought back online, until all nodes are running the newest version.
+
+Special considerations when planning a rolling upgrade:
+
+* If you plan to upgrade other cluster software -- such as the messaging layer --
+ at the same time, consult that software's documentation for its compatibility
+ with a rolling upgrade.
+
+* If the major version number is changing in the Pacemaker version you are
+ upgrading to, a rolling upgrade may not be possible. Read the new version's
+ release notes (as well as the information here) for what limitations may exist.
+
+* If the CRM feature set is changing in the Pacemaker version you are upgrading
+ to, you should run a mixed-version cluster only during a small rolling
+ upgrade window. If one of the older nodes drops out of the cluster for any
+ reason, it will not be able to rejoin until it is upgraded.
+
+* If the Pacemaker Remote protocol version is changing, all cluster nodes
+ should be upgraded before upgrading any Pacemaker Remote nodes.
+
+See the ClusterLabs wiki's
+`release calendar <https://wiki.clusterlabs.org/wiki/ReleaseCalendar>`_
+to figure out whether the CRM feature set and/or Pacemaker Remote protocol
+ version changed between the Pacemaker release versions in your rolling
+upgrade.
+
+To perform a rolling upgrade, on each node in turn:
+
+#. Put the node into standby mode, and wait for any active resources
+ to be moved cleanly to another node. (This step is optional, but
+ allows you to deal with any resource issues before the upgrade.)
+#. Shut down the cluster software (pacemaker and the messaging layer) on the node.
+#. Upgrade the Pacemaker software. This may also include upgrading the
+ messaging layer and/or the underlying operating system.
+#. If this is the first node to be upgraded, check the configuration
+ with the ``crm_verify`` tool.
+#. Start the messaging layer.
+ This must be the same messaging layer (currently only Corosync version 2 and
+ greater is supported) that the rest of the cluster is using.
+#. Start Pacemaker. If the node was put into standby mode in the first step,
+ take it out of standby once it has rejoined the cluster.
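+
+As a minimal sketch, one node's pass through this procedure on a systemd-based
+host might look like the following, where ``pcmk-1`` is a hypothetical node
+name and the package manager command varies by distribution (run ``crm_mon``
+and wait until the node's resources have moved before stopping the cluster
+software):
+
+.. code-block:: none
+
+ # crm_attribute --node pcmk-1 --name standby --update on
+ # crm_mon --one-shot
+ # systemctl stop pacemaker corosync
+ # dnf upgrade pacemaker corosync
+ # systemctl start corosync pacemaker
+ # crm_attribute --node pcmk-1 --name standby --delete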
+
+.. note::
+
+ Even if a rolling upgrade from the current version of the cluster to the
+ newest version is not directly possible, it may be possible to perform a
+ rolling upgrade in multiple steps, by upgrading to an intermediate version
+ first.
+
+.. table:: **Version Compatibility Table**
+
+ +-------------------------+---------------------------+
+ | Version being Installed | Oldest Compatible Version |
+ +=========================+===========================+
+ | Pacemaker 2.y.z | Pacemaker 1.1.11 [#]_ |
+ +-------------------------+---------------------------+
+ | Pacemaker 1.y.z | Pacemaker 1.0.0 |
+ +-------------------------+---------------------------+
+ | Pacemaker 0.7.z | Pacemaker 0.6.z |
+ +-------------------------+---------------------------+
+
+.. index::
+ single: upgrade; detach and reattach
+
+Detach and Reattach
+___________________
+
+The reattach method is a variant of a complete cluster shutdown, where the
+resources are left active and get re-detected when the cluster is restarted.
+
+This method may not be used if the cluster contains any Pacemaker Remote nodes.
+
+#. Tell the cluster to stop managing services. This is required to allow the
+ services to remain active after the cluster shuts down.
+
+ .. code-block:: none
+
+ # crm_attribute --name maintenance-mode --update true
+
+#. On each node, shut down the cluster software (pacemaker and the messaging
+ layer), and upgrade the Pacemaker software. This may also include upgrading
+ the messaging layer. While the underlying operating system may be upgraded
+ at the same time, that will be more likely to cause outages in the detached
+ services (certainly, if a reboot is required).
+#. Check the configuration with the ``crm_verify`` tool.
+#. On each node, start the cluster software.
+ Currently, only Corosync version 2 and greater is supported as the cluster
+ layer, but if another stack is supported in the future, the new stack does not
+ need to be the same one that was used before the upgrade.
+#. Verify that the cluster re-detected all resources correctly.
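+
+ A one-shot status display such as ``crm_mon --one-shot`` is a convenient way
+ to check this; the resources that were active before the upgrade should be
+ shown as started, with no unexpected failed actions:
+
+ .. code-block:: none
+
+ # crm_mon --one-shot
+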
+#. Allow the cluster to resume managing resources again:
+
+ .. code-block:: none
+
+ # crm_attribute --name maintenance-mode --delete
+
+.. note::
+
+ While the goal of the detach-and-reattach method is to avoid disturbing
+ running services, resources may still move after the upgrade if any
+ resource's location is governed by a rule based on transient node
+ attributes. Transient node attributes are erased when the node leaves the
+ cluster. A common example is using the ``ocf:pacemaker:ping`` resource to
+ set a node attribute used to locate other resources.
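+
+ A minimal sketch of such a rule, using the agent's default attribute name
+ ``pingd`` (the resource and constraint IDs here are illustrative):
+
+ .. code-block:: none
+
+ <rsc_location id="app-needs-connectivity" rsc="app">
+ <rule id="app-needs-connectivity-rule" score="-INFINITY" boolean-op="or">
+ <expression id="app-ping-undefined" attribute="pingd" operation="not_defined"/>
+ <expression id="app-ping-low" attribute="pingd" operation="lte" value="0"/>
+ </rule>
+ </rsc_location>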
+
+.. index::
+ pair: upgrade; CIB
+
+Upgrading the Configuration
+###########################
+
+The CIB schema version can change from one Pacemaker version to another.
+
+After cluster software is upgraded, the cluster will continue to use the older
+schema version that it was previously using. This can be useful, for example,
+when administrators have written tools that modify the configuration, and are
+based on the older syntax. [#]_
+
+However, when using an older syntax, new features may be unavailable, and there
+is a performance impact, since the cluster must do a non-persistent
+configuration upgrade before each transition. So while using the old syntax is
+possible, it is not advisable to continue using it indefinitely.
+
+Even if you wish to continue using the old syntax, it is a good idea to
+follow the upgrade procedure outlined below, except for the last step, to ensure
+that the new software has no problems with your existing configuration (since it
+will perform much the same task internally).
+
+If you are brave, it is sufficient simply to run ``cibadmin --upgrade``.
+
+A more cautious approach would proceed like this:
+
+#. Create a shadow copy of the configuration. The later commands will
+ automatically operate on this copy, rather than the live configuration.
+
+ .. code-block:: none
+
+ # crm_shadow --create shadow
+
+.. index::
+ single: configuration; verify
+
+#. Verify the configuration is valid with the new software (which may be
+ stricter about syntax mistakes, or may have dropped support for deprecated
+ features):
+
+ .. code-block:: none
+
+ # crm_verify --live-check
+
+#. Fix any errors or warnings.
+#. Perform the upgrade:
+
+ .. code-block:: none
+
+ # cibadmin --upgrade
+
+#. If this step fails, there are three main possibilities:
+
+ a. The configuration was not valid to start with (did you do steps 2 and
+ 3?).
+ #. The transformation failed; `report a bug <https://bugs.clusterlabs.org/>`_.
+ #. The transformation was successful but produced an invalid result.
+
+ If the result of the transformation is invalid, you may see a number of
+ errors from the validation library. If these are not helpful, visit the
+ `Validation FAQ wiki page <https://wiki.clusterlabs.org/wiki/Validation_FAQ>`_
+ and/or try the manual upgrade procedure described below.
+
+#. Check the changes:
+
+ .. code-block:: none
+
+ # crm_shadow --diff
+
+ If at this point there is anything about the upgrade that you wish to
+ fine-tune (for example, to change some of the automatic IDs), now is the
+ time to do so:
+
+ .. code-block:: none
+
+ # crm_shadow --edit
+
+ This will open the configuration in your favorite editor (whichever is
+ specified by the standard ``$EDITOR`` environment variable).
+
+#. Preview how the cluster will react:
+
+ .. code-block:: none
+
+ # crm_simulate --live-check --save-dotfile shadow.dot -S
+ # dot -Tsvg shadow.dot -o shadow.svg
+
+ You can then view shadow.svg with any compatible image viewer or web
+ browser. Verify that either no resource actions will occur or that you are
+ happy with any that are scheduled. If the output contains actions you do
+ not expect (possibly due to changes to the score calculations), you may need
+ to make further manual changes. See :ref:`crm_simulate` for further details
+ on how to interpret the output of ``crm_simulate`` and ``dot``.
+
+#. Upload the changes:
+
+ .. code-block:: none
+
+ # crm_shadow --commit shadow --force
+
+ In the unlikely event this step fails, please report a bug.
+
+.. note::
+
+ It is also possible to perform the configuration upgrade steps manually:
+
+ #. Locate the ``upgrade*.xsl`` conversion scripts provided with the source
+ code. These will often be installed in a location such as
+ ``/usr/share/pacemaker``, or may be obtained from the
+ `source repository <https://github.com/ClusterLabs/pacemaker/tree/main/xml>`_.
+
+ #. Run the conversion scripts that apply to your older version, for example:
+
+ .. code-block:: none
+
+ # xsltproc /path/to/upgrade06.xsl config06.xml > config10.xml
+
+ #. Locate the ``pacemaker.rng`` file (from the same location as the xsl
+ files).
+ #. Check the XML validity:
+
+ .. code-block:: none
+
+ # xmllint --relaxng /path/to/pacemaker.rng config10.xml
+
+ The advantage of this method is that it can be performed without the cluster
+ running, and any validation errors are often more informative.
+
+
+What Changed in 2.1
+###################
+
+The Pacemaker 2.1 release is fully backward-compatible in both the CIB XML and
+the C API. Highlights:
+
+* Pacemaker now supports the **OCF Resource Agent API version 1.1**.
+ Most notably, the ``Master`` and ``Slave`` role names have been renamed to
+ ``Promoted`` and ``Unpromoted``.
+
+* Pacemaker now supports colocations where the dependent resource does not
+ affect the primary resource's placement (via a new ``influence`` colocation
+ constraint option and ``critical`` resource meta-attribute). This is intended
+ for cases where a less-important resource must be colocated with an essential
+ resource, but it is preferred to leave the less-important resource stopped if
+ it fails, rather than move both resources.
+
+* If Pacemaker is built with libqb 2.0 or later, the detail log will use
+ **millisecond-resolution timestamps**.
+
+* In addition to crm_mon and stonith_admin, the crmadmin, crm_resource,
+ crm_simulate, and crm_verify commands now support the ``--output-as`` and
+ ``--output-to`` options, including **XML output** (which scripts and
+ higher-level tools are strongly recommended to use instead of trying to parse
+ the text output, which may change from release to release).
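+
+ For example, a script or monitoring tool could request machine-readable
+ status instead of parsing text output (the output file name here is
+ arbitrary):
+
+ .. code-block:: none
+
+ # crm_mon --one-shot --output-as=xml
+ # crm_verify --live-check --output-as=xml --output-to=verify.xml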
+
+For a detailed list of changes, see the release notes and the
+`Pacemaker 2.1 Changes <https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes>`_
+page on the ClusterLabs wiki.
+
+
+What Changed in 2.0
+###################
+
+The main goal of the 2.0 release was to remove support for deprecated syntax.
+It also includes some small changes in default configuration behavior and tool
+behavior. Highlights:
+
+* Only Corosync version 2 and greater is now supported as the underlying
+ cluster layer. Support for Heartbeat and Corosync 1 (including CMAN) is
+ removed.
+
+* The Pacemaker detail log file is now stored in
+ ``/var/log/pacemaker/pacemaker.log`` by default.
+
+* The record-pending cluster property now defaults to true, which
+ allows status tools such as crm_mon to show operations that are in
+ progress.
+
+* Support for a number of deprecated build options, environment variables,
+ and configuration settings has been removed.
+
+* The ``master`` tag has been deprecated in favor of using the ``clone`` tag
+ with the new ``promotable`` meta-attribute set to ``true``. "Master/slave"
+ clone resources are now referred to as "promotable" clone resources (see the
+ example after this list).
+
+* The public API for Pacemaker libraries that software applications can use
+ has changed significantly.
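+
+As an illustration of the promotable clone syntax, a resource previously
+configured with the ``master`` tag can now be expressed as follows (this
+sketch uses the ``ocf:pacemaker:Stateful`` test agent; the IDs are arbitrary):
+
+.. code-block:: none
+
+ <clone id="stateful-clone">
+ <meta_attributes id="stateful-clone-meta">
+ <nvpair id="stateful-clone-promotable" name="promotable" value="true"/>
+ </meta_attributes>
+ <primitive id="stateful" class="ocf" provider="pacemaker" type="Stateful"/>
+ </clone>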
+
+For a detailed list of changes, see the release notes and the
+`Pacemaker 2.0 Changes <https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes>`_
+page on the ClusterLabs wiki.
+
+
+What Changed in 1.0
+###################
+
+New
+___
+
+* Failure timeouts.
+* New section for resource and operation defaults.
+* Tool for making offline configuration changes.
+* ``Rules``, ``instance_attributes``, ``meta_attributes`` and sets of
+ operations can be defined once and referenced in multiple places.
+* The CIB now accepts XPath-based create/modify/delete operations. See
+ ``cibadmin --help``.
+* Multi-dimensional colocation and ordering constraints.
+* The ability to connect to the CIB from non-cluster machines.
+* Allow recurring actions to be triggered at known times.
+
+
+Changed
+_______
+
+* Syntax
+
+ * All resource and cluster options now use dashes (-) instead of underscores
+ (_)
+ * ``master_slave`` was renamed to ``master``
+ * The ``attributes`` container tag was removed
+ * The operation field ``pre-req`` has been renamed ``requires``
+ * All operations must have an ``interval``; ``start``/``stop`` must have it
+ set to zero
+
+* The ``stonith-enabled`` option now defaults to true.
+* The cluster will refuse to start resources if ``stonith-enabled`` is true (or
+ unset) and no STONITH resources have been defined.
+* The attributes of colocation and ordering constraints were renamed for
+ clarity.
+* ``resource-failure-stickiness`` has been replaced by ``migration-threshold``.
+* The parameters for command-line tools have been made consistent.
+* Switched to 'RelaxNG' schema validation and 'libxml2' parser.
+
+ * ``id`` fields are now XML IDs, which have the following limitations:
+
+ * IDs cannot contain colons (:)
+ * IDs cannot begin with a number
+ * IDs must be globally unique (not just unique for that tag)
+
+ * Some fields (such as those in constraints that refer to resources) are
+ IDREFs.
+
+ This means that they must reference existing resources or objects in
+ order for the configuration to be valid. Removing an object which is
+ referenced elsewhere will therefore fail.
+
+ * The CIB representation, from which an MD5 digest is calculated to verify
+ CIBs on the nodes, has changed.
+
+ This means that every CIB update will require a full refresh on any
+ upgraded nodes until the cluster is fully upgraded to 1.0. This will result
+ in significant performance degradation and it is therefore highly
+ inadvisable to run a mixed 1.0/0.6 cluster for any longer than absolutely
+ necessary.
+
+* Ping node information no longer needs to be added to ``ha.cf``. Simply
+ include the lists of hosts in your ping resource(s).
+
+
+Removed
+_______
+
+
+* Syntax
+
+ * It is no longer possible to set resource meta options as top-level
+ attributes. Use meta-attributes instead.
+ * Resource and operation defaults are no longer read from ``crm_config``.
+
+.. rubric:: Footnotes
+
+.. [#] Before CRM feature set 3.1.0 (Pacemaker 2.0.0), the minor-minor version
+ number was treated the same as the minor version.
+
+.. [#] Currently, Corosync version 2 and greater is the only supported cluster
+ stack, but other stacks have been supported by past versions, and may be
+ supported by future versions.
+
+.. [#] Any active resources will be moved off the node being upgraded, so there
+ will be at least a brief outage unless all resources can be migrated
+ "live".
+
+.. [#] Rolling upgrades from Pacemaker 1.1.z to 2.y.z are possible only if the
+ cluster uses corosync version 2 or greater as its messaging layer, and
+ the Cluster Information Base (CIB) uses schema 1.0 or higher in its
+ ``validate-with`` property.
+
+.. [#] As of Pacemaker 2.0.0, only schema versions pacemaker-1.0 and higher
+ are supported (excluding pacemaker-1.1, which was a special case).