diff options
Diffstat (limited to '')
-rw-r--r-- | doc/cephadm/upgrade.rst | 295 |
1 files changed, 295 insertions, 0 deletions
diff --git a/doc/cephadm/upgrade.rst b/doc/cephadm/upgrade.rst new file mode 100644 index 000000000..e0a9f610a --- /dev/null +++ b/doc/cephadm/upgrade.rst @@ -0,0 +1,295 @@ +============== +Upgrading Ceph +============== + +Cephadm can safely upgrade Ceph from one bugfix release to the next. For +example, you can upgrade from v15.2.0 (the first Octopus release) to the next +point release, v15.2.1. + +The automated upgrade process follows Ceph best practices. For example: + +* The upgrade order starts with managers, monitors, then other daemons. +* Each daemon is restarted only after Ceph indicates that the cluster + will remain available. + +.. note:: + + The Ceph cluster health status is likely to switch to + ``HEALTH_WARNING`` during the upgrade. + +.. note:: + + In case a host of the cluster is offline, the upgrade is paused. + + +Starting the upgrade +==================== + +.. note:: + .. note:: + `Staggered Upgrade`_ of the mons/mgrs may be necessary to have access + to this new feature. + + Cephadm by default reduces `max_mds` to `1`. This can be disruptive for large + scale CephFS deployments because the cluster cannot quickly reduce active MDS(s) + to `1` and a single active MDS cannot easily handle the load of all clients + even for a short time. Therefore, to upgrade MDS(s) without reducing `max_mds`, + the `fail_fs` option can to be set to `true` (default value is `false`) prior + to initiating the upgrade: + + .. prompt:: bash # + + ceph config set mgr mgr/orchestrator/fail_fs true + + This would: + #. Fail CephFS filesystems, bringing active MDS daemon(s) to + `up:standby` state. + + #. Upgrade MDS daemons safely. + + #. Bring CephFS filesystems back up, bringing the state of active + MDS daemon(s) from `up:standby` to `up:active`. + +Before you use cephadm to upgrade Ceph, verify that all hosts are currently online and that your cluster is healthy by running the following command: + +.. prompt:: bash # + + ceph -s + +To upgrade (or downgrade) to a specific release, run the following command: + +.. prompt:: bash # + + ceph orch upgrade start --ceph-version <version> + +For example, to upgrade to v16.2.6, run the following command: + +.. prompt:: bash # + + ceph orch upgrade start --ceph-version 16.2.6 + +.. note:: + + From version v16.2.6 the Docker Hub registry is no longer used, so if you use Docker you have to point it to the image in the quay.io registry: + +.. prompt:: bash # + + ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6 + + +Monitoring the upgrade +====================== + +Determine (1) whether an upgrade is in progress and (2) which version the +cluster is upgrading to by running the following command: + +.. prompt:: bash # + + ceph orch upgrade status + +Watching the progress bar during a Ceph upgrade +----------------------------------------------- + +During the upgrade, a progress bar is visible in the ceph status output. It +looks like this: + +.. code-block:: console + + # ceph -s + + [...] + progress: + Upgrade to docker.io/ceph/ceph:v15.2.1 (00h 20m 12s) + [=======.....................] (time remaining: 01h 43m 31s) + +Watching the cephadm log during an upgrade +------------------------------------------ + +Watch the cephadm log by running the following command: + +.. prompt:: bash # + + ceph -W cephadm + + +Canceling an upgrade +==================== + +You can stop the upgrade process at any time by running the following command: + +.. prompt:: bash # + + ceph orch upgrade stop + +Post upgrade actions +==================== + +In case the new version is based on ``cephadm``, once done with the upgrade the user +has to update the ``cephadm`` package (or ceph-common package in case the user +doesn't use ``cephadm shell``) to a version compatible with the new version. + +Potential problems +================== + +There are a few health alerts that can arise during the upgrade process. + +UPGRADE_NO_STANDBY_MGR +---------------------- + +This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an +active standby manager daemon. In order to proceed with the upgrade, Ceph +requires an active standby manager daemon (which you can think of in this +context as "a second manager"). + +You can ensure that Cephadm is configured to run 2 (or more) managers by +running the following command: + +.. prompt:: bash # + + ceph orch apply mgr 2 # or more + +You can check the status of existing mgr daemons by running the following +command: + +.. prompt:: bash # + + ceph orch ps --daemon-type mgr + +If an existing mgr daemon has stopped, you can try to restart it by running the +following command: + +.. prompt:: bash # + + ceph orch daemon restart <name> + +UPGRADE_FAILED_PULL +------------------- + +This alert (``UPGRADE_FAILED_PULL``) means that Ceph was unable to pull the +container image for the target version. This can happen if you specify a +version or container image that does not exist (e.g. "1.2.3"), or if the +container registry can not be reached by one or more hosts in the cluster. + +To cancel the existing upgrade and to specify a different target version, run +the following commands: + +.. prompt:: bash # + + ceph orch upgrade stop + ceph orch upgrade start --ceph-version <version> + + +Using customized container images +================================= + +For most users, upgrading requires nothing more complicated than specifying the +Ceph version number to upgrade to. In such cases, cephadm locates the specific +Ceph container image to use by combining the ``container_image_base`` +configuration option (default: ``docker.io/ceph/ceph``) with a tag of +``vX.Y.Z``. + +But it is possible to upgrade to an arbitrary container image, if that's what +you need. For example, the following command upgrades to a development build: + +.. prompt:: bash # + + ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name + +For more information about available container images, see :ref:`containers`. + +Staggered Upgrade +================= + +Some users may prefer to upgrade components in phases rather than all at once. +The upgrade command, starting in 16.2.11 and 17.2.1 allows parameters +to limit which daemons are upgraded by a single upgrade command. The options in +include ``daemon_types``, ``services``, ``hosts`` and ``limit``. ``daemon_types`` +takes a comma-separated list of daemon types and will only upgrade daemons of those +types. ``services`` is mutually exclusive with ``daemon_types``, only takes services +of one type at a time (e.g. can't provide an OSD and RGW service at the same time), and +will only upgrade daemons belonging to those services. ``hosts`` can be combined +with ``daemon_types`` or ``services`` or provided on its own. The ``hosts`` parameter +follows the same format as the command line options for :ref:`orchestrator-cli-placement-spec`. +``limit`` takes an integer > 0 and provides a numerical limit on the number of +daemons cephadm will upgrade. ``limit`` can be combined with any of the other +parameters. For example, if you specify to upgrade daemons of type osd on host +Host1 with ``limit`` set to 3, cephadm will upgrade (up to) 3 osd daemons on +Host1. + +Example: specifying daemon types and hosts: + +.. prompt:: bash # + + ceph orch upgrade start --image <image-name> --daemon-types mgr,mon --hosts host1,host2 + +Example: specifying services and using limit: + +.. prompt:: bash # + + ceph orch upgrade start --image <image-name> --services rgw.example1,rgw.example2 --limit 2 + +.. note:: + + Cephadm strictly enforces an order to the upgrade of daemons that is still present + in staggered upgrade scenarios. The current upgrade ordering is + ``mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs``. + If you specify parameters that would upgrade daemons out of order, the upgrade + command will block and note which daemons will be missed if you proceed. + +.. note:: + + Upgrade commands with limiting parameters will validate the options before beginning the + upgrade, which may require pulling the new container image. Do not be surprised + if the upgrade start command takes a while to return when limiting parameters are provided. + +.. note:: + + In staggered upgrade scenarios (when a limiting parameter is provided) monitoring + stack daemons including Prometheus and node-exporter are refreshed after the Manager + daemons have been upgraded. Do not be surprised if Manager upgrades thus take longer + than expected. Note that the versions of monitoring stack daemons may not change between + Ceph releases, in which case they are only redeployed. + +Upgrading to a version that supports staggered upgrade from one that doesn't +---------------------------------------------------------------------------- + +While upgrading from a version that already supports staggered upgrades the process +simply requires providing the necessary arguments. However, if you wish to upgrade +to a version that supports staggered upgrade from one that does not, there is a +workaround. It requires first manually upgrading the Manager daemons and then passing +the limiting parameters as usual. + +.. warning:: + Make sure you have multiple running mgr daemons before attempting this procedure. + +To start with, determine which Manager is your active one and which are standby. This +can be done in a variety of ways such as looking at the ``ceph -s`` output. Then, +manually upgrade each standby mgr daemon with: + +.. prompt:: bash # + + ceph orch daemon redeploy mgr.example1.abcdef --image <new-image-name> + +.. note:: + + If you are on a very early version of cephadm (early Octopus) the ``orch daemon redeploy`` + command may not have the ``--image`` flag. In that case, you must manually set the + Manager container image ``ceph config set mgr container_image <new-image-name>`` and then + redeploy the Manager ``ceph orch daemon redeploy mgr.example1.abcdef`` + +At this point, a Manager fail over should allow us to have the active Manager be one +running the new version. + +.. prompt:: bash # + + ceph mgr fail + +Verify the active Manager is now one running the new version. To complete the Manager +upgrading: + +.. prompt:: bash # + + ceph orch upgrade start --image <new-image-name> --daemon-types mgr + +You should now have all your Manager daemons on the new version and be able to +specify the limiting parameters for the rest of the upgrade. |