Diffstat (limited to 'doc/sphinx/Pacemaker_Administration/upgrading.rst')
-rw-r--r-- | doc/sphinx/Pacemaker_Administration/upgrading.rst | 534
1 file changed, 534 insertions, 0 deletions

diff --git a/doc/sphinx/Pacemaker_Administration/upgrading.rst b/doc/sphinx/Pacemaker_Administration/upgrading.rst
new file mode 100644
index 0000000..1ca2a4e
--- /dev/null
+++ b/doc/sphinx/Pacemaker_Administration/upgrading.rst

.. index:: upgrade

Upgrading a Pacemaker Cluster
-----------------------------

.. index:: version

Pacemaker Versioning
####################

Pacemaker has an overall release version, plus separate version numbers for
certain internal components.

.. index::
   single: version; release

* **Pacemaker release version:** This version consists of three numbers
  (*x.y.z*).

  The major version number (the *x* in *x.y.z*) increases when at least some
  rolling upgrades are not possible from the previous major version. For
  example, a rolling upgrade from 1.0.8 to 1.1.15 should always be supported,
  but a rolling upgrade from 1.0.8 to 2.0.0 may not be possible.

  The minor version (the *y* in *x.y.z*) increases when there are significant
  changes in cluster default behavior, tool behavior, and/or the API interface
  (for software that utilizes Pacemaker libraries). The main benefit is to
  alert you to pay closer attention to the release notes, to see whether you
  might be affected.

  The release counter (the *z* in *x.y.z*) is increased with all public
  releases of Pacemaker, which typically include both bug fixes and new
  features.

.. index::
   single: feature set
   single: version; feature set

* **CRM feature set:** This version number applies to the communication
  between full cluster nodes, and is used to avoid problems in mixed-version
  clusters.

  The major version number increases when nodes with different versions would
  not work together (rolling upgrades are not allowed). The minor version
  number increases when mixed-version clusters are allowed only during rolling
  upgrades. The minor-minor version number is ignored, but allows resource
  agents to detect cluster support for various features. [#]_

  Pacemaker ensures that the longest-running node is the cluster's DC. This
  ensures new features are not enabled until all nodes are upgraded to support
  them.

.. index::
   single: version; Pacemaker Remote protocol

* **Pacemaker Remote protocol version:** This version applies to communication
  between a Pacemaker Remote node and the cluster. It increases when an older
  cluster node would have problems hosting the connection to a newer
  Pacemaker Remote node. To avoid these problems, Pacemaker Remote nodes will
  accept connections only from cluster nodes with the same or newer
  Pacemaker Remote protocol version.

  Unlike with CRM feature set differences between full cluster nodes,
  mixed Pacemaker Remote protocol versions between Pacemaker Remote nodes and
  full cluster nodes are fine, as long as the Pacemaker Remote nodes have the
  older version. This can be useful, for example, to host a legacy application
  in an older operating system version used as a Pacemaker Remote node.

.. index::
   single: version; XML schema

* **XML schema version:** Pacemaker's configuration syntax (what is allowed in
  the Cluster Information Base, or CIB) has its own version. This allows the
  configuration syntax to evolve over time while still allowing clusters with
  older configurations to work without change.
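
As a practical starting point, some of these versions can be checked from the
command line on a running node. This is only a sketch; the exact output format
varies between Pacemaker releases:

.. code-block:: none

   # pacemakerd --version
   # cibadmin --query | grep validate-with

The first command reports the Pacemaker release version installed on the node,
and the second shows the XML schema version (the ``validate-with`` attribute)
currently used by the cluster's configuration.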

.. index::
   single: upgrade; methods

Upgrading Cluster Software
##########################

There are three approaches to upgrading a cluster, each with advantages and
disadvantages.

.. table:: **Upgrade Methods**

   +----------------------------+-----------+-----------+---------+----------+-----------+-----------+
   | Method                     | Available | Can be    | Service | Service  | Exercises | Allows    |
   |                            | between   | used with | outage  | recovery | failover  | change of |
   |                            | all       | Pacemaker | during  | during   | logic     | messaging |
   |                            | versions  | Remote    | upgrade | upgrade  |           | layer     |
   |                            |           | nodes     |         |          |           | [#]_      |
   +============================+===========+===========+=========+==========+===========+===========+
   | Complete cluster shutdown  | yes       | yes       | always  | N/A      | no        | yes       |
   +----------------------------+-----------+-----------+---------+----------+-----------+-----------+
   | Rolling (node by node)     | no        | yes       | always  | yes      | yes       | no        |
   |                            |           |           | [#]_    |          |           |           |
   +----------------------------+-----------+-----------+---------+----------+-----------+-----------+
   | Detach and reattach        | yes       | no        | only    | no       | no        | yes       |
   |                            |           |           | due to  |          |           |           |
   |                            |           |           | failure |          |           |           |
   +----------------------------+-----------+-----------+---------+----------+-----------+-----------+


.. index::
   single: upgrade; shutdown

Complete Cluster Shutdown
_________________________

In this scenario, one shuts down all cluster nodes and resources, then
upgrades all the nodes before restarting the cluster.

#. On each node:

   a. Shut down the cluster software (pacemaker and the messaging layer).
   #. Upgrade the Pacemaker software. This may also include upgrading the
      messaging layer and/or the underlying operating system.
   #. Check the configuration with the ``crm_verify`` tool.

#. On each node:

   a. Start the cluster software.

Currently, only Corosync version 2 and greater is supported as the cluster
layer, but if another stack is supported in the future, the new stack does not
need to be the same one that was used before the upgrade.

One variation of this approach is to build a new cluster on new hosts. This
allows the new version to be tested beforehand, and minimizes downtime by
having the new nodes ready to be placed in production as soon as the old nodes
are shut down.


.. index::
   single: upgrade; rolling upgrade

Rolling (node by node)
______________________

In this scenario, each node is removed from the cluster, upgraded, and then
brought back online, until all nodes are running the newest version.

Special considerations when planning a rolling upgrade:

* If you plan to upgrade other cluster software -- such as the messaging
  layer -- at the same time, consult that software's documentation for its
  compatibility with a rolling upgrade.

* If the major version number is changing in the Pacemaker version you are
  upgrading to, a rolling upgrade may not be possible. Read the new version's
  release notes (as well as the information here) for what limitations may
  exist.

* If the CRM feature set is changing in the Pacemaker version you are
  upgrading to, you should run a mixed-version cluster only during a small
  rolling upgrade window. If one of the older nodes drops out of the cluster
  for any reason, it will not be able to rejoin until it is upgraded. (A way
  to compare feature sets before you start is sketched after this list.)

* If the Pacemaker Remote protocol version is changing, all cluster nodes
  should be upgraded before upgrading any Pacemaker Remote nodes.
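
If you have the new packages installed on a test machine, you can compare the
CRM feature sets of the old and new versions before committing to a rolling
upgrade. This is only a rough sketch; the exact wording of the output differs
between releases, but the feature set is printed along with the version:

.. code-block:: none

   # pacemakerd --features

Run this once with the currently installed packages on a cluster node and once
with the new packages on the test machine, then compare the reported feature
sets.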

See the ClusterLabs wiki's
`release calendar <https://wiki.clusterlabs.org/wiki/ReleaseCalendar>`_
to figure out whether the CRM feature set and/or Pacemaker Remote protocol
version changed between the Pacemaker release versions in your rolling
upgrade.

To perform a rolling upgrade, on each node in turn:

#. Put the node into standby mode, and wait for any active resources
   to be moved cleanly to another node. (This step is optional, but
   allows you to deal with any resource issues before the upgrade.)
#. Shut down the cluster software (pacemaker and the messaging layer) on the
   node.
#. Upgrade the Pacemaker software. This may also include upgrading the
   messaging layer and/or the underlying operating system.
#. If this is the first node to be upgraded, check the configuration
   with the ``crm_verify`` tool.
#. Start the cluster software (the messaging layer, then Pacemaker) on the
   node. The messaging layer must be the same one (currently only Corosync
   version 2 and greater is supported) that the rest of the cluster is using.

.. note::

   Even if a rolling upgrade from the current version of the cluster to the
   newest version is not directly possible, it may be possible to perform a
   rolling upgrade in multiple steps, by upgrading to an intermediate version
   first.

.. table:: **Version Compatibility Table**

   +-------------------------+---------------------------+
   | Version being Installed | Oldest Compatible Version |
   +=========================+===========================+
   | Pacemaker 2.y.z         | Pacemaker 1.1.11 [#]_     |
   +-------------------------+---------------------------+
   | Pacemaker 1.y.z         | Pacemaker 1.0.0           |
   +-------------------------+---------------------------+
   | Pacemaker 0.7.z         | Pacemaker 0.6.z           |
   +-------------------------+---------------------------+


.. index::
   single: upgrade; detach and reattach

Detach and Reattach
___________________

The reattach method is a variant of a complete cluster shutdown, where the
resources are left active and get re-detected when the cluster is restarted.

This method may not be used if the cluster contains any Pacemaker Remote
nodes.

#. Tell the cluster to stop managing services. This is required to allow the
   services to remain active after the cluster shuts down.

   .. code-block:: none

      # crm_attribute --name maintenance-mode --update true

#. On each node, shut down the cluster software (pacemaker and the messaging
   layer), and upgrade the Pacemaker software. This may also include upgrading
   the messaging layer. While the underlying operating system may be upgraded
   at the same time, that will be more likely to cause outages in the detached
   services (certainly, if a reboot is required).
#. Check the configuration with the ``crm_verify`` tool.
#. On each node, start the cluster software.
   Currently, only Corosync version 2 and greater is supported as the cluster
   layer, but if another stack is supported in the future, the new stack does
   not need to be the same one that was used before the upgrade.
#. Verify that the cluster re-detected all resources correctly (one way to do
   this is sketched after this list).
#. Allow the cluster to resume managing resources again:

   .. code-block:: none

      # crm_attribute --name maintenance-mode --delete
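
One way to verify that all resources were re-detected (the next-to-last step
above) is to look at a one-shot status snapshot and confirm that every
resource is shown as started on the node where it was already running, with no
failed actions listed. This is only a sketch; any status tool you normally use
will work equally well:

.. code-block:: none

   # crm_mon --one-shot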

.. note::

   While the goal of the detach-and-reattach method is to avoid disturbing
   running services, resources may still move after the upgrade if any
   resource's location is governed by a rule based on transient node
   attributes. Transient node attributes are erased when the node leaves the
   cluster. A common example is using the ``ocf:pacemaker:ping`` resource to
   set a node attribute used to locate other resources.

.. index::
   pair: upgrade; CIB

Upgrading the Configuration
###########################

The CIB schema version can change from one Pacemaker version to another.

After cluster software is upgraded, the cluster will continue to use the older
schema version that it was previously using. This can be useful, for example,
when administrators have written tools that modify the configuration and are
based on the older syntax. [#]_

However, when using an older syntax, new features may be unavailable, and
there is a performance impact, since the cluster must do a non-persistent
configuration upgrade before each transition. So while using the old syntax is
possible, it is not advisable to continue using it indefinitely.

Even if you wish to continue using the old syntax, it is a good idea to follow
the upgrade procedure outlined below, except for the last step, to ensure that
the new software has no problems with your existing configuration (since it
will perform much the same task internally).

If you are brave, it is sufficient simply to run ``cibadmin --upgrade``.

A more cautious approach would proceed like this:

#. Create a shadow copy of the configuration. The later commands will
   automatically operate on this copy, rather than the live configuration.

   .. code-block:: none

      # crm_shadow --create shadow

   .. index::
      single: configuration; verify

#. Verify the configuration is valid with the new software (which may be
   stricter about syntax mistakes, or may have dropped support for deprecated
   features):

   .. code-block:: none

      # crm_verify --live-check

#. Fix any errors or warnings.
#. Perform the upgrade:

   .. code-block:: none

      # cibadmin --upgrade

#. If this step fails, there are three main possibilities:

   a. The configuration was not valid to start with (did you do steps 2 and
      3?).
   #. The transformation failed; `report a bug <https://bugs.clusterlabs.org/>`_.
   #. The transformation was successful but produced an invalid result.

   If the result of the transformation is invalid, you may see a number of
   errors from the validation library. If these are not helpful, visit the
   `Validation FAQ wiki page <https://wiki.clusterlabs.org/wiki/Validation_FAQ>`_
   and/or try the manual upgrade procedure described below.

#. Check the changes:

   .. code-block:: none

      # crm_shadow --diff

   If at this point there is anything about the upgrade that you wish to
   fine-tune (for example, to change some of the automatic IDs), now is the
   time to do so:

   .. code-block:: none

      # crm_shadow --edit

   This will open the configuration in your favorite editor (whichever is
   specified by the standard ``$EDITOR`` environment variable).

#. Preview how the cluster will react:

   .. code-block:: none

      # crm_simulate --live-check --save-dotfile shadow.dot -S
      # dot -Tsvg shadow.dot -o shadow.svg

   You can then view shadow.svg with any compatible image viewer or web
   browser. Verify that either no resource actions will occur or that you are
   happy with any that are scheduled. If the output contains actions you do
   not expect (possibly due to changes to the score calculations), you may
   need to make further manual changes. See :ref:`crm_simulate` for further
   details on how to interpret the output of ``crm_simulate`` and ``dot``.

#. Upload the changes:

   .. code-block:: none

      # crm_shadow --commit shadow --force

   In the unlikely event this step fails, please report a bug.

.. note::

   It is also possible to perform the configuration upgrade steps manually:

   #. Locate the ``upgrade*.xsl`` conversion scripts provided with the source
      code. These will often be installed in a location such as
      ``/usr/share/pacemaker``, or may be obtained from the
      `source repository <https://github.com/ClusterLabs/pacemaker/tree/main/xml>`_.

   #. Run the conversion scripts that apply to your older version, for
      example:

      .. code-block:: none

         # xsltproc /path/to/upgrade06.xsl config06.xml > config10.xml

   #. Locate the ``pacemaker.rng`` script (from the same location as the xsl
      files).
   #. Check the XML validity:

      .. code-block:: none

         # xmllint --relaxng /path/to/pacemaker.rng config10.xml

   The advantage of this method is that it can be performed without the
   cluster running, and any validation errors are often more informative.


What Changed in 2.1
###################

The Pacemaker 2.1 release is fully backward-compatible in both the CIB XML and
the C API. Highlights:

* Pacemaker now supports the **OCF Resource Agent API version 1.1**.
  Most notably, the ``Master`` and ``Slave`` role names have been renamed to
  ``Promoted`` and ``Unpromoted``.

* Pacemaker now supports colocations where the dependent resource does not
  affect the primary resource's placement (via a new ``influence`` colocation
  constraint option and ``critical`` resource meta-attribute). This is
  intended for cases where a less-important resource must be colocated with an
  essential resource, but it is preferred to leave the less-important resource
  stopped if it fails, rather than move both resources.

* If Pacemaker is built with libqb 2.0 or later, the detail log will use
  **millisecond-resolution timestamps**.

* In addition to ``crm_mon`` and ``stonith_admin``, the ``crmadmin``,
  ``crm_resource``, ``crm_simulate``, and ``crm_verify`` commands now support
  the ``--output-as`` and ``--output-to`` options, including **XML output**
  (which scripts and higher-level tools are strongly recommended to use
  instead of trying to parse the text output, which may change from release to
  release).

For a detailed list of changes, see the release notes and the
`Pacemaker 2.1 Changes <https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes>`_
page on the ClusterLabs wiki.


What Changed in 2.0
###################

The main goal of the 2.0 release was to remove support for deprecated syntax,
along with some small changes in default configuration behavior and tool
behavior. Highlights:

* Only Corosync version 2 and greater is now supported as the underlying
  cluster layer. Support for Heartbeat and Corosync 1 (including CMAN) is
  removed.

* The Pacemaker detail log file is now stored in
  ``/var/log/pacemaker/pacemaker.log`` by default.

* The record-pending cluster property now defaults to true, which allows
  status tools such as ``crm_mon`` to show operations that are in progress.

* Support for a number of deprecated build options, environment variables,
  and configuration settings has been removed.

* The ``master`` tag has been deprecated in favor of using the ``clone`` tag
  with the new ``promotable`` meta-attribute set to ``true`` (see the sketch
  after this list). "Master/slave" clone resources are now referred to as
  "promotable" clone resources.

* The public API for Pacemaker libraries that software applications can use
  has changed significantly.
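
As a rough illustration of the promotable clone change mentioned above, the
``promotable`` meta-attribute can be set on an existing clone with
``crm_resource``. The resource name below is only a placeholder; higher-level
tools such as crmsh and pcs have their own syntax for creating promotable
clones, so consult their documentation if you use them:

.. code-block:: none

   # crm_resource --resource my-stateful-clone --meta \
       --set-parameter promotable --parameter-value true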

For a detailed list of changes, see the release notes and the
`Pacemaker 2.0 Changes <https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes>`_
page on the ClusterLabs wiki.


What Changed in 1.0
###################

New
___

* Failure timeouts.
* New section for resource and operation defaults.
* Tool for making offline configuration changes.
* ``Rules``, ``instance_attributes``, ``meta_attributes`` and sets of
  operations can be defined once and referenced in multiple places.
* The CIB now accepts XPath-based create/modify/delete operations. See
  ``cibadmin --help``.
* Multi-dimensional colocation and ordering constraints.
* The ability to connect to the CIB from non-cluster machines.
* Allow recurring actions to be triggered at known times.


Changed
_______

* Syntax

  * All resource and cluster options now use dashes (-) instead of underscores
    (_).
  * ``master_slave`` was renamed to ``master``.
  * The ``attributes`` container tag was removed.
  * The operation field ``pre-req`` has been renamed ``requires``.
  * All operations must have an ``interval``, and ``start``/``stop`` must have
    it set to zero.

* The ``stonith-enabled`` option now defaults to true.
* The cluster will refuse to start resources if ``stonith-enabled`` is true
  (or unset) and no STONITH resources have been defined.
* The attributes of colocation and ordering constraints were renamed for
  clarity.
* ``resource-failure-stickiness`` has been replaced by
  ``migration-threshold``.
* The parameters for command-line tools have been made consistent.
* Switched to 'RelaxNG' schema validation and 'libxml2' parser.

  * id fields are now XML IDs, which have the following limitations:

    * IDs cannot contain colons (:).
    * IDs cannot begin with a number.
    * IDs must be globally unique (not just unique for that tag).

  * Some fields (such as those in constraints that refer to resources) are
    IDREFs.

    This means that they must reference existing resources or objects in order
    for the configuration to be valid. Removing an object which is referenced
    elsewhere will therefore fail.

  * The CIB representation, from which an MD5 digest is calculated to verify
    CIBs on the nodes, has changed.

    This means that every CIB update will require a full refresh on any
    upgraded nodes until the cluster is fully upgraded to 1.0. This will
    result in significant performance degradation, and it is therefore highly
    inadvisable to run a mixed 1.0/0.6 cluster for any longer than absolutely
    necessary.

* Ping node information no longer needs to be added to ``ha.cf``. Simply
  include the lists of hosts in your ping resource(s).


Removed
_______

* Syntax

  * It is no longer possible to set resource meta options as top-level
    attributes. Use meta-attributes instead.
  * Resource and operation defaults are no longer read from ``crm_config``.

.. rubric:: Footnotes

.. [#] Before CRM feature set 3.1.0 (Pacemaker 2.0.0), the minor-minor version
   number was treated the same as the minor version.

.. [#] Currently, Corosync version 2 and greater is the only supported cluster
   stack, but other stacks have been supported by past versions, and may be
   supported by future versions.

.. [#] Any active resources will be moved off the node being upgraded, so
   there will be at least a brief outage unless all resources can be migrated
   "live".

.. [#] Rolling upgrades from Pacemaker 1.1.z to 2.y.z are possible only if the
   cluster uses Corosync version 2 or greater as its messaging layer, and the
   Cluster Information Base (CIB) uses schema 1.0 or higher in its
   ``validate-with`` property.

.. [#] As of Pacemaker 2.0.0, only schema versions pacemaker-1.0 and higher
   are supported (excluding pacemaker-1.1, which was a special case).