Diffstat (limited to '')
-rw-r--r-- | doc/security/CVE-2021-20288.rst | 183 |
1 files changed, 183 insertions, 0 deletions
diff --git a/doc/security/CVE-2021-20288.rst b/doc/security/CVE-2021-20288.rst
new file mode 100644
index 000000000..fa3b073cb
--- /dev/null
+++ b/doc/security/CVE-2021-20288.rst
@@ -0,0 +1,183 @@
+.. _CVE-2021-20288:
+
+CVE-2021-20288: Unauthorized global_id reuse in cephx
+=====================================================
+
+* `NIST information page <https://nvd.nist.gov/vuln/detail/CVE-2021-20288>`_
+
+Summary
+-------
+
+Ceph was not ensuring that reconnecting/renewing clients were
+presenting an existing ticket when reclaiming their global_id value.
+An attacker who was able to authenticate could claim a global_id in
+use by a different client and potentially disrupt
+other cluster services.
+
+Background
+----------
+
+Each authenticated client or daemon in Ceph is assigned a numeric
+global_id identifier. That value is assumed to be unique across the
+cluster. When clients reconnect to the monitor (e.g., due to a
+network disconnection) or renew their ticket, they are supposed to
+present their old ticket to prove prior possession of their global_id
+so that it can be reclaimed and thus remain constant over the lifetime
+of that client instance.
+
+Ceph was not correctly checking that the old ticket was valid, allowing
+an arbitrary global_id to be reclaimed, even if it was in use by another
+active client in the system.
+
+Attacker Requirements
+---------------------
+
+Any potential attacker must:
+
+* have a valid authentication key for the cluster
+* know or guess the global_id of another client
+* run a modified version of the Ceph client code to reclaim another client's global_id
+* construct appropriate client messages or requests to disrupt service or exploit
+  Ceph daemon assumptions about global_id uniqueness
+
+Impact
+------
+
+Confidentiality Impact
+______________________
+
+None
+
+Integrity Impact
+________________
+
+Partial. An attacker could potentially exploit assumptions around
+global_id uniqueness to disrupt other clients' access or disrupt
+Ceph daemons.
+
+Availability Impact
+___________________
+
+High. An attacker could potentially exploit assumptions around
+global_id uniqueness to disrupt other clients' access or disrupt
+Ceph daemons.
+
+Access Complexity
+_________________
+
+High. The client must make use of modified client code in order to
+exploit specific assumptions in the behavior of other Ceph daemons.
+
+Authentication
+______________
+
+Yes. The attacker must also be authenticated and have access to the
+same services as a client it wishes to impersonate or disrupt.
+
+Gained Access
+_____________
+
+Partial. An attacker can partially impersonate another client.
+
+Affected versions
+-----------------
+
+All prior versions of Ceph monitors fail to ensure that global_id reclaim
+attempts are authentic.
+
+In addition, all user-space daemons and clients starting from Luminous v12.2.0
+were failing to securely reclaim their global_id following commit a2eb6ae3fb57
+("mon/monclient: hunt for multiple monitor in parallel").
+
+All versions of the Linux kernel client properly authenticate.
+
+Fixed versions
+--------------
+
+* Pacific v16.2.1 (and later)
+* Octopus v15.2.11 (and later)
+* Nautilus v14.2.20 (and later)
+
+
+Fix details
+-----------
+
+#. Patched monitors now properly require that clients securely reclaim
+   their global_id when the ``auth_allow_insecure_global_id_reclaim``
+   option is ``false``. Initially, by default, this option is set to
+   ``true`` so that existing clients can continue to function without
+   disruption until all clients have been upgraded. When this option
+   is set to ``false``, an unpatched client will not be able to reconnect
+   to the cluster after an intermittent network disruption breaks
+   its connection to a monitor, or be able to renew its authentication
+   ticket when it times out (by default, after 72 hours).
+
+   Patched monitors raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED``
+   health alert if ``auth_allow_insecure_global_id_reclaim`` is enabled.
+   This health alert can be muted with::
+
+      ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w
+
+   Although it is not recommended, the alert can also be disabled with::
+
+      ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
+
+#. Patched monitors can disconnect new clients right after they have
+   authenticated (forcing them to reconnect and reclaim) in order to
+   determine whether they securely reclaim global_ids. This allows
+   the cluster and users to discover quickly whether clients would be
+   affected by requiring secure global_id reclaim: most clients will
+   report an authentication error immediately. This behavior can be
+   disabled by setting ``auth_expose_insecure_global_id_reclaim`` to
+   ``false``::
+
+      ceph config set mon auth_expose_insecure_global_id_reclaim false
+
+#. Patched monitors will raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health
+   alert for any clients or daemons that are not securely reclaiming their
+   global_id. These clients should be upgraded before disabling the
+   ``auth_allow_insecure_global_id_reclaim`` option to avoid disrupting
+   client access.
+
+   By default (if ``auth_expose_insecure_global_id_reclaim`` has not
+   been disabled), clients' failure to securely reclaim global_id will
+   immediately be exposed and raise this health alert.
+   However, if ``auth_expose_insecure_global_id_reclaim`` has been
+   disabled, this alert will not be triggered for a client until it is
+   forced to reconnect to a monitor (e.g., due to a network disruption)
+   or the client renews its authentication ticket (by default, after
+   72 hours).
+
+#. The default time-to-live (TTL) for authentication tickets has been increased
+   from 12 hours to 72 hours. Because we previously were not ensuring that
+   a client's prior ticket was valid when reclaiming its global_id, a client
+   could tolerate a network outage that lasted longer than the ticket TTL and still
+   reclaim its global_id. Once the cluster starts requiring secure global_id reclaim,
+   a client that is disconnected for longer than the TTL may fail to reclaim its global_id,
+   fail to reauthenticate, and be unable to continue communicating with the cluster
+   until it is restarted. The default TTL was increased to minimize the impact of this
+   change on users.
+
+
+Recommendations
+---------------
+
+#. Users should upgrade to a patched version of Ceph at their earliest
+   convenience.
+
+#. Users should upgrade any unpatched clients at their earliest
+   convenience. By default, these clients can be easily identified by
+   checking the ``ceph health detail`` output for the
+   ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` alert.
+
+#. If all clients cannot be upgraded immediately, the health alerts can be
+   temporarily muted with::
+
+      ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w   # 1 week
+      ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w   # 1 week
+
+#. After all clients have been updated and the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM``
+   alert is no longer present, the cluster should be set to prevent insecure
+   global_id reclaim with::
+
+      ceph config set mon auth_allow_insecure_global_id_reclaim false
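
The last two recommendations can be combined so that insecure reclaim is only disabled once the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` alert has cleared. The following is a minimal shell sketch, not part of the patch: ``has_insecure_clients`` and ``enforce_secure_reclaim`` are hypothetical helper names, and the sketch assumes only that the alert code appears verbatim somewhere in the ``ceph health detail`` text. The ``ceph`` commands themselves are the ones given in the advisory.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not shipped with Ceph.

# Reads `ceph health detail` text on stdin and succeeds (exit 0) if the
# AUTH_INSECURE_GLOBAL_ID_RECLAIM alert code appears. The trailing
# alternation keeps the pattern from also matching the separate
# AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED alert.
has_insecure_clients() {
    grep -Eq 'AUTH_INSECURE_GLOBAL_ID_RECLAIM([^_A-Z]|$)'
}

# Reads health text on stdin and reports whether it looks safe to require
# secure reclaim. The config change from the advisory is run only when the
# caller passes --apply, so the default is a dry run.
enforce_secure_reclaim() {
    if has_insecure_clients; then
        echo "insecure clients still present; not changing config"
        return 1
    fi
    echo "no insecure clients reported"
    if [ "${1:-}" = "--apply" ]; then
        ceph config set mon auth_allow_insecure_global_id_reclaim false
    fi
}
```

A dry run would be ``ceph health detail | enforce_secure_reclaim``; adding ``--apply`` performs the change. If the alert has been muted, or ``auth_expose_insecure_global_id_reclaim`` has been disabled, unpatched clients may not show up right away, so a clean result here is a hint rather than proof that every client has been upgraded.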