1 files changed, 183 insertions, 0 deletions
diff --git a/doc/security/CVE-2021-20288.rst b/doc/security/CVE-2021-20288.rst
new file mode 100644
index 000000000..fa3b073cb
--- /dev/null
+++ b/doc/security/CVE-2021-20288.rst
@@ -0,0 +1,183 @@
+.. _CVE-2021-20288:
+
+CVE-2021-20288: Unauthorized global_id reuse in cephx
+=====================================================
+
+* `NIST information page <https://nvd.nist.gov/vuln/detail/CVE-2021-20288>`_
+
+Summary
+-------
+
+Ceph did not ensure that reconnecting or renewing clients presented
+an existing ticket when reclaiming their global_id value.
+An attacker who was able to authenticate could claim a global_id in
+use by a different client and potentially disrupt
+other cluster services.
+
+Background
+----------
+
+Each authenticated client or daemon in Ceph is assigned a numeric
+global_id identifier. That value is assumed to be unique across the
+cluster.  When clients reconnect to the monitor (e.g., due to a
+network disconnection) or renew their ticket, they are supposed to
+present their old ticket to prove prior possession of their global_id
+so that it can be reclaimed and thus remain constant over the lifetime
+of that client instance.
+
+Ceph was not correctly checking that the old ticket was valid, allowing
+an arbitrary global_id to be reclaimed, even if it was in use by another
+active client in the system.
+
+Attacker Requirements
+---------------------
+
+Any potential attacker must:
+
+* have a valid authentication key for the cluster
+* know or guess the global_id of another client
+* run a modified version of the Ceph client code to reclaim another client's global_id
+* construct appropriate client messages or requests to disrupt service or exploit
+  Ceph daemon assumptions about global_id uniqueness
+
+Impact
+------
+
+Confidentiality Impact
+______________________
+
+None
+
+Integrity Impact
+________________
+
+Partial.  An attacker could potentially exploit assumptions around
+global_id uniqueness to disrupt other clients' access or disrupt
+Ceph daemons.
+
+Availability Impact
+___________________
+
+High.  An attacker could potentially exploit assumptions around
+global_id uniqueness to disrupt other clients' access or disrupt
+Ceph daemons.
+
+Access Complexity
+_________________
+
+High.  The attacker must use modified client code in order to
+exploit specific assumptions in the behavior of other Ceph daemons.
+
+Authentication
+______________
+
+Yes.  The attacker must be authenticated and have access to the
+same services as the client it wishes to impersonate or disrupt.
+
+Gained Access
+_____________
+
+Partial.  An attacker can partially impersonate another client.
+
+Affected versions
+-----------------
+
+All versions of the Ceph monitor prior to the fixed versions listed below fail
+to ensure that global_id reclaim attempts are authentic.
+
+In addition, all user-space daemons and clients starting from Luminous v12.2.0
+were failing to securely reclaim their global_id following commit a2eb6ae3fb57
+("mon/monclient: hunt for multiple monitor in parallel").
+
+All versions of the Linux kernel client properly authenticate.
+
+Fixed versions
+--------------
+
+* Pacific v16.2.1 (and later)
+* Octopus v15.2.11 (and later)
+* Nautilus v14.2.20 (and later)
+
+
+Fix details
+-----------
+
+#. Patched monitors now properly require that clients securely reclaim
+   their global_id when the ``auth_allow_insecure_global_id_reclaim``
+   option is ``false``.  Initially, by default, this option is set to
+   ``true`` so that existing clients can continue to function without
+   disruption until all clients have been upgraded.  When this option
+   is set to ``false``, an unpatched client will not be able to reconnect
+   to the cluster after an intermittent network disruption breaks
+   its connection to a monitor, or to renew its authentication
+   ticket when it times out (by default, after 72 hours).
+
+   Patched monitors raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED``
+   health alert if ``auth_allow_insecure_global_id_reclaim`` is enabled.
+   This health alert can be muted with::
+
+     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w
+
+   Although it is not recommended, the alert can also be disabled with::
+
+     ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
+
+#. Patched monitors can disconnect new clients right after they have
+   authenticated (forcing them to reconnect and reclaim) in order to
+   determine whether they securely reclaim global_ids.  This allows
+   the cluster and users to discover quickly whether clients would be
+   affected by requiring secure global_id reclaim: most clients will
+   report an authentication error immediately.  This behavior can be
+   disabled by setting ``auth_expose_insecure_global_id_reclaim`` to
+   ``false``::
+
+     ceph config set mon auth_expose_insecure_global_id_reclaim false
+
+#. Patched monitors will raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health
+   alert for any clients or daemons that are not securely reclaiming their
+   global_id.  These clients should be upgraded before disabling the
+   ``auth_allow_insecure_global_id_reclaim`` option to avoid disrupting
+   client access.
+
+   By default (if ``auth_expose_insecure_global_id_reclaim`` has not
+   been disabled), clients' failure to securely reclaim global_id will
+   immediately be exposed and raise this health alert.
+   However, if ``auth_expose_insecure_global_id_reclaim`` has been
+   disabled, this alert will not be triggered for a client until it is
+   forced to reconnect to a monitor (e.g., due to a network disruption)
+   or the client renews its authentication ticket (by default, after
+   72 hours).
+
+#. The default time-to-live (TTL) for authentication tickets has been increased
+   from 12 hours to 72 hours.  Because we previously were not ensuring that
+   a client's prior ticket was valid when reclaiming its global_id, a client
+   could tolerate a network outage that lasted longer than the ticket TTL and still
+   reclaim its global_id.  Once the cluster starts requiring secure global_id reclaim,
+   a client that is disconnected for longer than the TTL may fail to reclaim its global_id,
+   fail to reauthenticate, and be unable to continue communicating with the cluster
+   until it is restarted.  The default TTL was increased to minimize the impact of this
+   change on users.
+
+
+Recommendations
+---------------
+
+#. Users should upgrade to a patched version of Ceph at their earliest
+   convenience.
+
+#. Users should upgrade any unpatched clients at their earliest
+   convenience.  By default, these clients can be easily identified by
+   checking the ``ceph health detail`` output for the
+   ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` alert.
+
+#. If not all clients can be upgraded immediately, the health alerts can be
+   temporarily muted with::
+
+     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w  # 1 week
+     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w  # 1 week
+
+#. After all clients have been updated and the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM``
+   alert is no longer present, the cluster should be set to prevent insecure
+   global_id reclaim with::
+
+     ceph config set mon auth_allow_insecure_global_id_reclaim false