summaryrefslogtreecommitdiffstats
path: root/doc/security/CVE-2021-20288.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/security/CVE-2021-20288.rst')
-rw-r--r--doc/security/CVE-2021-20288.rst183
1 files changed, 183 insertions, 0 deletions
diff --git a/doc/security/CVE-2021-20288.rst b/doc/security/CVE-2021-20288.rst
new file mode 100644
index 000000000..fa3b073cb
--- /dev/null
+++ b/doc/security/CVE-2021-20288.rst
@@ -0,0 +1,183 @@
+.. _CVE-2021-20288:
+
+CVE-2021-20288: Unauthorized global_id reuse in cephx
+=====================================================
+
+* `NIST information page <https://nvd.nist.gov/vuln/detail/CVE-2021-20288>`_
+
+Summary
+-------
+
+Ceph was not ensuring that reconnecting/renewing clients were
+presenting an existing ticket when reclaiming their global_id value.
+An attacker that was able to authenticate could claim a global_id in
+use by a different client and potentially disrupt
+other cluster services.
+
+Background
+----------
+
+Each authenticated client or daemon in Ceph is assigned a numeric
+global_id identifier. That value is assumed to be unique across the
+cluster. When clients reconnect to the monitor (e.g., due to a
+network disconnection) or renew their ticket, they are supposed to
+present their old ticket to prove prior possession of their global_id
+so that it can be reclaimed and thus remain constant over the lifetime
+of that client instance.
+
+Ceph was not correctly checking that the old ticket was valid, allowing
+an arbitrary global_id to be reclaimed, even if it was in use by another
+active client in the system.
+
+Attacker Requirements
+---------------------
+
+Any potential attacker must:
+
+* have a valid authentication key for the cluster
+* know or guess the global_id of another client
+* run a modified version of the Ceph client code to reclaim another client's global_id
+* construct appropriate client messages or requests to disrupt service or exploit
+ Ceph daemon assumptions about global_id uniqueness
+
+Impact
+------
+
+Confidentiality Impact
+______________________
+
+None
+
+Integrity Impact
+________________
+
+Partial. An attacker could potentially exploit assumptions around
+global_id uniqueness to disrupt other clients' access or disrupt
+Ceph daemons.
+
+Availability Impact
+___________________
+
+High. An attacker could potentially exploit assumptions around
+global_id uniqueness to disrupt other clients' access or disrupt
+Ceph daemons.
+
+Access Complexity
+_________________
+
+High. The client must make use of modified client code in order to
+exploit specific assumptions in the behavior of other Ceph daemons.
+
+Authentication
+______________
+
+Yes. The attacker must also be authenticated and have access to the
+same services as a client it is wishing to impersonate or disrupt.
+
+Gained Access
+_____________
+
+Partial. An attacker can partially impersonate another client.
+
+Affected versions
+-----------------
+
+All prior versions of Ceph monitors fail to ensure that global_id reclaim
+attempts are authentic.
+
+In addition, all user-space daemons and clients starting from Luminous v12.2.0
+were failing to securely reclaim their global_id following commit a2eb6ae3fb57
+("mon/monclient: hunt for multiple monitor in parallel").
+
+All versions of the Linux kernel client properly authenticate.
+
+Fixed versions
+--------------
+
+* Pacific v16.2.1 (and later)
+* Octopus v15.2.11 (and later)
+* Nautilus v14.2.20 (and later)
+
+
+Fix details
+-----------
+
+#. Patched monitors now properly require that clients securely reclaim
+ their global_id when the ``auth_allow_insecure_global_id_reclaim``
+ is ``false``. Initially, by default, this option is set to
+ ``true`` so that existing clients can continue to function without
+ disruption until all clients have been upgraded. When this option
+ is set to false, then an unpatched client will not be able to reconnect
+ to the cluster after an intermittent network disruption breaking
+ its connect to a monitor, or be able to renew its authentication
+ ticket when it times out (by default, after 72 hours).
+
+ Patched monitors raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED``
+ health alert if ``auth_allow_insecure_global_id_reclaim`` is enabled.
+ This health alert can be muted with::
+
+ ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w
+
+ Although it is not recommended, the alert can also be disabled with::
+
+ ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
+
+#. Patched monitors can disconnect new clients right after they have
+ authenticated (forcing them to reconnect and reclaim) in order to
+ determine whether they securely reclaim global_ids. This allows
+ the cluster and users to discover quickly whether clients would be
+ affected by requiring secure global_id reclaim: most clients will
+ report an authentication error immediately. This behavior can be
+ disabled by setting ``auth_expose_insecure_global_id_reclaim`` to
+ ``false``::
+
+ ceph config set mon auth_expose_insecure_global_id_reclaim false
+
+#. Patched monitors will raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health
+ alert for any clients or daemons that are not securely reclaiming their
+ global_id. These clients should be upgraded before disabling the
+ ``auth_allow_insecure_global_id_reclaim`` option to avoid disrupting
+ client access.
+
+ By default (if ``auth_expose_insecure_global_id_reclaim`` has not
+ been disabled), clients' failure to securely reclaim global_id will
+ immediately be exposed and raise this health alert.
+ However, if ``auth_expose_insecure_global_id_reclaim`` has been
+ disabled, this alert will not be triggered for a client until it is
+ forced to reconnect to a monitor (e.g., due to a network disruption)
+ or the client renews its authentication ticket (by default, after
+ 72 hours).
+
+#. The default time-to-live (TTL) for authentication tickets has been increased
+ from 12 hours to 72 hours. Because we previously were not ensuring that
+ a client's prior ticket was valid when reclaiming their global_id, a client
+ could tolerate a network outage that lasted longer than the ticket TTL and still
+ reclaim its global_id. Once the cluster starts requiring secure global_id reclaim,
+ a client that is disconnected for longer than the TTL may fail to reclaim its global_id,
+ fail to reauthenticate, and be unable to continue communicating with the cluster
+ until it is restarted. The default TTL was increased to minimize the impact of this
+ change on users.
+
+
+Recommendations
+---------------
+
+#. Users should upgrade to a patched version of Ceph at their earliest
+ convenience.
+
+#. Users should upgrade any unpatched clients at their earliest
+ convenience. By default, these clients can be easily identified by
+ checking the ``ceph health detail`` output for the
+ ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` alert.
+
+#. If all clients cannot be upgraded immediately, the health alerts can be
+ temporarily muted with::
+
+ ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w # 1 week
+ ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w # 1 week
+
+#. After all clients have been updated and the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM``
+ alert is no longer present, the cluster should be set to prevent insecure
+ global_id reclaim with::
+
+ ceph config set mon auth_allow_insecure_global_id_reclaim false