Adding upstream version 6.6.15.upstream/6.6.15

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-11 08:27:49 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-11 08:27:49 +0000
commit: ace9429bb58fd418f0c81d4c2835699bddf6bde6 (patch)
tree: b2d64bc10158fdd5497876388cd68142ca374ed3 /Documentation/security
parent: Initial commit. (diff)
download: linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.tar.xz
linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.zip
23 files changed, 4969 insertions, 0 deletions
diff --git a/Documentation/security/IMA-templates.rst b/Documentation/security/IMA-templates.rst
new file mode 100644
index 0000000000..15b4add314
--- /dev/null
+++ b/Documentation/security/IMA-templates.rst
@@ -0,0 +1,111 @@
+=================================
+IMA Template Management Mechanism
+=================================
+
+
+Introduction
+============
+
+The original ``ima`` template is fixed length, containing the filedata hash
+and pathname. The filedata hash is limited to 20 bytes (md5/sha1).
+The pathname is a null terminated string, limited to 255 characters.
+To overcome these limitations and to add additional file metadata, it is
+necessary to extend the current version of IMA by defining additional
+templates. For example, information that could be possibly reported are
+the inode UID/GID or the LSM labels either of the inode and of the process
+that is accessing it.
+
+However, the main problem to introduce this feature is that, each time
+a new template is defined, the functions that generate and display
+the measurements list would include the code for handling a new format
+and, thus, would significantly grow over the time.
+
+The proposed solution solves this problem by separating the template
+management from the remaining IMA code. The core of this solution is the
+definition of two new data structures: a template descriptor, to determine
+which information should be included in the measurement list; a template
+field, to generate and display data of a given type.
+
+Managing templates with these structures is very simple. To support
+a new data type, developers define the field identifier and implement
+two functions, init() and show(), respectively to generate and display
+measurement entries. Defining a new template descriptor requires
+specifying the template format (a string of field identifiers separated
+by the ``|`` character) through the ``ima_template_fmt`` kernel command line
+parameter. At boot time, IMA initializes the chosen template descriptor
+by translating the format into an array of template fields structures taken
+from the set of the supported ones.
+
+After the initialization step, IMA will call ``ima_alloc_init_template()``
+(new function defined within the patches for the new template management
+mechanism) to generate a new measurement entry by using the template
+descriptor chosen through the kernel configuration or through the newly
+introduced ``ima_template`` and ``ima_template_fmt`` kernel command line parameters.
+It is during this phase that the advantages of the new architecture are
+clearly shown: the latter function will not contain specific code to handle
+a given template but, instead, it simply calls the ``init()`` method of the template
+fields associated to the chosen template descriptor and store the result
+(pointer to allocated data and data length) in the measurement entry structure.
+
+The same mechanism is employed to display measurements entries.
+The functions ``ima[_ascii]_measurements_show()`` retrieve, for each entry,
+the template descriptor used to produce that entry and call the show()
+method for each item of the array of template fields structures.
+
+
+
+Supported Template Fields and Descriptors
+=========================================
+
+In the following, there is the list of supported template fields
+``('<identifier>': description)``, that can be used to define new template
+descriptors by adding their identifier to the format string
+(support for more data types will be added later):
+
+ - 'd': the digest of the event (i.e. the digest of a measured file),
+   calculated with the SHA1 or MD5 hash algorithm;
+ - 'n': the name of the event (i.e. the file name), with size up to 255 bytes;
+ - 'd-ng': the digest of the event, calculated with an arbitrary hash
+   algorithm (field format: <hash algo>:digest);
+ - 'd-ngv2': same as d-ng, but prefixed with the "ima" or "verity" digest type
+   (field format: <digest type>:<hash algo>:digest);
+ - 'd-modsig': the digest of the event without the appended modsig;
+ - 'n-ng': the name of the event, without size limitations;
+ - 'sig': the file signature, based on either the file's/fsverity's digest[1],
+   or the EVM portable signature, if 'security.ima' contains a file hash.
+ - 'modsig' the appended file signature;
+ - 'buf': the buffer data that was used to generate the hash without size limitations;
+ - 'evmsig': the EVM portable signature;
+ - 'iuid': the inode UID;
+ - 'igid': the inode GID;
+ - 'imode': the inode mode;
+ - 'xattrnames': a list of xattr names (separated by ``|``), only if the xattr is
+    present;
+ - 'xattrlengths': a list of xattr lengths (u32), only if the xattr is present;
+ - 'xattrvalues': a list of xattr values;
+
+
+Below, there is the list of defined template descriptors:
+
+ - "ima": its format is ``d|n``;
+ - "ima-ng" (default): its format is ``d-ng|n-ng``;
+ - "ima-ngv2": its format is ``d-ngv2|n-ng``;
+ - "ima-sig": its format is ``d-ng|n-ng|sig``;
+ - "ima-sigv2": its format is ``d-ngv2|n-ng|sig``;
+ - "ima-buf": its format is ``d-ng|n-ng|buf``;
+ - "ima-modsig": its format is ``d-ng|n-ng|sig|d-modsig|modsig``;
+ - "evm-sig": its format is ``d-ng|n-ng|evmsig|xattrnames|xattrlengths|xattrvalues|iuid|igid|imode``;
+
+
+Use
+===
+
+To specify the template descriptor to be used to generate measurement entries,
+currently the following methods are supported:
+
+ - select a template descriptor among those supported in the kernel
+   configuration (``ima-ng`` is the default choice);
+ - specify a template descriptor name from the kernel command line through
+   the ``ima_template=`` parameter;
+ - register a new template descriptor with custom format through the kernel
+   command line parameter ``ima_template_fmt=``.
diff --git a/Documentation/security/SCTP.rst b/Documentation/security/SCTP.rst
new file mode 100644
index 0000000000..b73eb764a0
--- /dev/null
+++ b/Documentation/security/SCTP.rst
@@ -0,0 +1,344 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+SCTP
+====
+
+SCTP LSM Support
+================
+
+Security Hooks
+--------------
+
+For security module support, three SCTP specific hooks have been implemented::
+
+    security_sctp_assoc_request()
+    security_sctp_bind_connect()
+    security_sctp_sk_clone()
+    security_sctp_assoc_established()
+
+The usage of these hooks are described below with the SELinux implementation
+described in the `SCTP SELinux Support`_ chapter.
+
+
+security_sctp_assoc_request()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Passes the ``@asoc`` and ``@chunk->skb`` of the association INIT packet to the
+security module. Returns 0 on success, error on failure.
+::
+
+    @asoc - pointer to sctp association structure.
+    @skb - pointer to skbuff of association packet.
+
+
+security_sctp_bind_connect()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Passes one or more ipv4/ipv6 addresses to the security module for validation
+based on the ``@optname`` that will result in either a bind or connect
+service as shown in the permission check tables below.
+Returns 0 on success, error on failure.
+::
+
+    @sk      - Pointer to sock structure.
+    @optname - Name of the option to validate.
+    @address - One or more ipv4 / ipv6 addresses.
+    @addrlen - The total length of address(s). This is calculated on each
+               ipv4 or ipv6 address using sizeof(struct sockaddr_in) or
+               sizeof(struct sockaddr_in6).
+
+  ------------------------------------------------------------------
+  |                     BIND Type Checks                           |
+  |       @optname             |         @address contains         |
+  |----------------------------|-----------------------------------|
+  | SCTP_SOCKOPT_BINDX_ADD     | One or more ipv4 / ipv6 addresses |
+  | SCTP_PRIMARY_ADDR          | Single ipv4 or ipv6 address       |
+  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address       |
+  ------------------------------------------------------------------
+
+  ------------------------------------------------------------------
+  |                   CONNECT Type Checks                          |
+  |       @optname             |         @address contains         |
+  |----------------------------|-----------------------------------|
+  | SCTP_SOCKOPT_CONNECTX      | One or more ipv4 / ipv6 addresses |
+  | SCTP_PARAM_ADD_IP          | One or more ipv4 / ipv6 addresses |
+  | SCTP_SENDMSG_CONNECT       | Single ipv4 or ipv6 address       |
+  | SCTP_PARAM_SET_PRIMARY     | Single ipv4 or ipv6 address       |
+  ------------------------------------------------------------------
+
+A summary of the ``@optname`` entries is as follows::
+
+    SCTP_SOCKOPT_BINDX_ADD - Allows additional bind addresses to be
+                             associated after (optionally) calling
+                             bind(3).
+                             sctp_bindx(3) adds a set of bind
+                             addresses on a socket.
+
+    SCTP_SOCKOPT_CONNECTX - Allows the allocation of multiple
+                            addresses for reaching a peer
+                            (multi-homed).
+                            sctp_connectx(3) initiates a connection
+                            on an SCTP socket using multiple
+                            destination addresses.
+
+    SCTP_SENDMSG_CONNECT  - Initiate a connection that is generated by a
+                            sendmsg(2) or sctp_sendmsg(3) on a new asociation.
+
+    SCTP_PRIMARY_ADDR     - Set local primary address.
+
+    SCTP_SET_PEER_PRIMARY_ADDR - Request peer sets address as
+                                 association primary.
+
+    SCTP_PARAM_ADD_IP          - These are used when Dynamic Address
+    SCTP_PARAM_SET_PRIMARY     - Reconfiguration is enabled as explained below.
+
+
+To support Dynamic Address Reconfiguration the following parameters must be
+enabled on both endpoints (or use the appropriate **setsockopt**\(2))::
+
+    /proc/sys/net/sctp/addip_enable
+    /proc/sys/net/sctp/addip_noauth_enable
+
+then the following *_PARAM_*'s are sent to the peer in an
+ASCONF chunk when the corresponding ``@optname``'s are present::
+
+          @optname                      ASCONF Parameter
+         ----------                    ------------------
+    SCTP_SOCKOPT_BINDX_ADD     ->   SCTP_PARAM_ADD_IP
+    SCTP_SET_PEER_PRIMARY_ADDR ->   SCTP_PARAM_SET_PRIMARY
+
+
+security_sctp_sk_clone()
+~~~~~~~~~~~~~~~~~~~~~~~~
+Called whenever a new socket is created by **accept**\(2)
+(i.e. a TCP style socket) or when a socket is 'peeled off' e.g userspace
+calls **sctp_peeloff**\(3).
+::
+
+    @asoc - pointer to current sctp association structure.
+    @sk - pointer to current sock structure.
+    @newsk - pointer to new sock structure.
+
+
+security_sctp_assoc_established()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Called when a COOKIE ACK is received, and the peer secid will be
+saved into ``@asoc->peer_secid`` for client::
+
+    @asoc - pointer to sctp association structure.
+    @skb - pointer to skbuff of the COOKIE ACK packet.
+
+
+Security Hooks used for Association Establishment
+-------------------------------------------------
+
+The following diagram shows the use of ``security_sctp_bind_connect()``,
+``security_sctp_assoc_request()``, ``security_sctp_assoc_established()`` when
+establishing an association.
+::
+
+      SCTP endpoint "A"                                SCTP endpoint "Z"
+      =================                                =================
+    sctp_sf_do_prm_asoc()
+ Association setup can be initiated
+ by a connect(2), sctp_connectx(3),
+ sendmsg(2) or sctp_sendmsg(3).
+ These will result in a call to
+ security_sctp_bind_connect() to
+ initiate an association to
+ SCTP peer endpoint "Z".
+         INIT --------------------------------------------->
+                                                   sctp_sf_do_5_1B_init()
+                                                 Respond to an INIT chunk.
+                                             SCTP peer endpoint "A" is asking
+                                             for a temporary association.
+                                             Call security_sctp_assoc_request()
+                                             to set the peer label if first
+                                             association.
+                                             If not first association, check
+                                             whether allowed, IF so send:
+          <----------------------------------------------- INIT ACK
+          |                                  ELSE audit event and silently
+          |                                       discard the packet.
+          |
+    COOKIE ECHO ------------------------------------------>
+                                                  sctp_sf_do_5_1D_ce()
+                                             Respond to an COOKIE ECHO chunk.
+                                             Confirm the cookie and create a
+                                             permanent association.
+                                             Call security_sctp_assoc_request() to
+                                             do the same as for INIT chunk Response.
+          <------------------------------------------- COOKIE ACK
+          |                                               |
+    sctp_sf_do_5_1E_ca                                    |
+ Call security_sctp_assoc_established()                   |
+ to set the peer label.                                   |
+          |                                               |
+          |                               If SCTP_SOCKET_TCP or peeled off
+          |                               socket security_sctp_sk_clone() is
+          |                               called to clone the new socket.
+          |                                               |
+      ESTABLISHED                                    ESTABLISHED
+          |                                               |
+    ------------------------------------------------------------------
+    |                     Association Established                    |
+    ------------------------------------------------------------------
+
+
+SCTP SELinux Support
+====================
+
+Security Hooks
+--------------
+
+The `SCTP LSM Support`_ chapter above describes the following SCTP security
+hooks with the SELinux specifics expanded below::
+
+    security_sctp_assoc_request()
+    security_sctp_bind_connect()
+    security_sctp_sk_clone()
+    security_sctp_assoc_established()
+
+
+security_sctp_assoc_request()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Passes the ``@asoc`` and ``@chunk->skb`` of the association INIT packet to the
+security module. Returns 0 on success, error on failure.
+::
+
+    @asoc - pointer to sctp association structure.
+    @skb - pointer to skbuff of association packet.
+
+The security module performs the following operations:
+     IF this is the first association on ``@asoc->base.sk``, then set the peer
+     sid to that in ``@skb``. This will ensure there is only one peer sid
+     assigned to ``@asoc->base.sk`` that may support multiple associations.
+
+     ELSE validate the ``@asoc->base.sk peer_sid`` against the ``@skb peer sid``
+     to determine whether the association should be allowed or denied.
+
+     Set the sctp ``@asoc sid`` to socket's sid (from ``asoc->base.sk``) with
+     MLS portion taken from ``@skb peer sid``. This will be used by SCTP
+     TCP style sockets and peeled off connections as they cause a new socket
+     to be generated.
+
+     If IP security options are configured (CIPSO/CALIPSO), then the ip
+     options are set on the socket.
+
+
+security_sctp_bind_connect()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Checks permissions required for ipv4/ipv6 addresses based on the ``@optname``
+as follows::
+
+  ------------------------------------------------------------------
+  |                   BIND Permission Checks                       |
+  |       @optname             |         @address contains         |
+  |----------------------------|-----------------------------------|
+  | SCTP_SOCKOPT_BINDX_ADD     | One or more ipv4 / ipv6 addresses |
+  | SCTP_PRIMARY_ADDR          | Single ipv4 or ipv6 address       |
+  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address       |
+  ------------------------------------------------------------------
+
+  ------------------------------------------------------------------
+  |                 CONNECT Permission Checks                      |
+  |       @optname             |         @address contains         |
+  |----------------------------|-----------------------------------|
+  | SCTP_SOCKOPT_CONNECTX      | One or more ipv4 / ipv6 addresses |
+  | SCTP_PARAM_ADD_IP          | One or more ipv4 / ipv6 addresses |
+  | SCTP_SENDMSG_CONNECT       | Single ipv4 or ipv6 address       |
+  | SCTP_PARAM_SET_PRIMARY     | Single ipv4 or ipv6 address       |
+  ------------------------------------------------------------------
+
+
+`SCTP LSM Support`_ gives a summary of the ``@optname``
+entries and also describes ASCONF chunk processing when Dynamic Address
+Reconfiguration is enabled.
+
+
+security_sctp_sk_clone()
+~~~~~~~~~~~~~~~~~~~~~~~~
+Called whenever a new socket is created by **accept**\(2) (i.e. a TCP style
+socket) or when a socket is 'peeled off' e.g userspace calls
+**sctp_peeloff**\(3). ``security_sctp_sk_clone()`` will set the new
+sockets sid and peer sid to that contained in the ``@asoc sid`` and
+``@asoc peer sid`` respectively.
+::
+
+    @asoc - pointer to current sctp association structure.
+    @sk - pointer to current sock structure.
+    @newsk - pointer to new sock structure.
+
+
+security_sctp_assoc_established()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Called when a COOKIE ACK is received where it sets the connection's peer sid
+to that in ``@skb``::
+
+    @asoc - pointer to sctp association structure.
+    @skb - pointer to skbuff of the COOKIE ACK packet.
+
+
+Policy Statements
+-----------------
+The following class and permissions to support SCTP are available within the
+kernel::
+
+    class sctp_socket inherits socket { node_bind }
+
+whenever the following policy capability is enabled::
+
+    policycap extended_socket_class;
+
+SELinux SCTP support adds the ``name_connect`` permission for connecting
+to a specific port type and the ``association`` permission that is explained
+in the section below.
+
+If userspace tools have been updated, SCTP will support the ``portcon``
+statement as shown in the following example::
+
+    portcon sctp 1024-1036 system_u:object_r:sctp_ports_t:s0
+
+
+SCTP Peer Labeling
+------------------
+An SCTP socket will only have one peer label assigned to it. This will be
+assigned during the establishment of the first association. Any further
+associations on this socket will have their packet peer label compared to
+the sockets peer label, and only if they are different will the
+``association`` permission be validated. This is validated by checking the
+socket peer sid against the received packets peer sid to determine whether
+the association should be allowed or denied.
+
+NOTES:
+   1) If peer labeling is not enabled, then the peer context will always be
+      ``SECINITSID_UNLABELED`` (``unlabeled_t`` in Reference Policy).
+
+   2) As SCTP can support more than one transport address per endpoint
+      (multi-homing) on a single socket, it is possible to configure policy
+      and NetLabel to provide different peer labels for each of these. As the
+      socket peer label is determined by the first associations transport
+      address, it is recommended that all peer labels are consistent.
+
+   3) **getpeercon**\(3) may be used by userspace to retrieve the sockets peer
+      context.
+
+   4) While not SCTP specific, be aware when using NetLabel that if a label
+      is assigned to a specific interface, and that interface 'goes down',
+      then the NetLabel service will remove the entry. Therefore ensure that
+      the network startup scripts call **netlabelctl**\(8) to set the required
+      label (see **netlabel-config**\(8) helper script for details).
+
+   5) The NetLabel SCTP peer labeling rules apply as discussed in the following
+      set of posts tagged "netlabel" at: https://www.paul-moore.com/blog/t.
+
+   6) CIPSO is only supported for IPv4 addressing: ``socket(AF_INET, ...)``
+      CALIPSO is only supported for IPv6 addressing: ``socket(AF_INET6, ...)``
+
+      Note the following when testing CIPSO/CALIPSO:
+         a) CIPSO will send an ICMP packet if an SCTP packet cannot be
+            delivered because of an invalid label.
+         b) CALIPSO does not send an ICMP packet, just silently discards it.
+
+   7) IPSEC is not supported as RFC 3554 - sctp/ipsec support has not been
+      implemented in userspace (**racoon**\(8) or **ipsec_pluto**\(8)),
+      although the kernel supports SCTP/IPSEC.
diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst
new file mode 100644
index 0000000000..357328d566
--- /dev/null
+++ b/Documentation/security/credentials.rst
@@ -0,0 +1,564 @@
+====================
+Credentials in Linux
+====================
+
+By: David Howells <dhowells@redhat.com>
+
+.. contents:: :local:
+
+Overview
+========
+
+There are several parts to the security check performed by Linux when one
+object acts upon another:
+
+ 1. Objects.
+
+     Objects are things in the system that may be acted upon directly by
+     userspace programs.  Linux has a variety of actionable objects, including:
+
+	- Tasks
+	- Files/inodes
+	- Sockets
+	- Message queues
+	- Shared memory segments
+	- Semaphores
+	- Keys
+
+     As a part of the description of all these objects there is a set of
+     credentials.  What's in the set depends on the type of object.
+
+ 2. Object ownership.
+
+     Amongst the credentials of most objects, there will be a subset that
+     indicates the ownership of that object.  This is used for resource
+     accounting and limitation (disk quotas and task rlimits for example).
+
+     In a standard UNIX filesystem, for instance, this will be defined by the
+     UID marked on the inode.
+
+ 3. The objective context.
+
+     Also amongst the credentials of those objects, there will be a subset that
+     indicates the 'objective context' of that object.  This may or may not be
+     the same set as in (2) - in standard UNIX files, for instance, this is the
+     defined by the UID and the GID marked on the inode.
+
+     The objective context is used as part of the security calculation that is
+     carried out when an object is acted upon.
+
+ 4. Subjects.
+
+     A subject is an object that is acting upon another object.
+
+     Most of the objects in the system are inactive: they don't act on other
+     objects within the system.  Processes/tasks are the obvious exception:
+     they do stuff; they access and manipulate things.
+
+     Objects other than tasks may under some circumstances also be subjects.
+     For instance an open file may send SIGIO to a task using the UID and EUID
+     given to it by a task that called ``fcntl(F_SETOWN)`` upon it.  In this case,
+     the file struct will have a subjective context too.
+
+ 5. The subjective context.
+
+     A subject has an additional interpretation of its credentials.  A subset
+     of its credentials forms the 'subjective context'.  The subjective context
+     is used as part of the security calculation that is carried out when a
+     subject acts.
+
+     A Linux task, for example, has the FSUID, FSGID and the supplementary
+     group list for when it is acting upon a file - which are quite separate
+     from the real UID and GID that normally form the objective context of the
+     task.
+
+ 6. Actions.
+
+     Linux has a number of actions available that a subject may perform upon an
+     object.  The set of actions available depends on the nature of the subject
+     and the object.
+
+     Actions include reading, writing, creating and deleting files; forking or
+     signalling and tracing tasks.
+
+ 7. Rules, access control lists and security calculations.
+
+     When a subject acts upon an object, a security calculation is made.  This
+     involves taking the subjective context, the objective context and the
+     action, and searching one or more sets of rules to see whether the subject
+     is granted or denied permission to act in the desired manner on the
+     object, given those contexts.
+
+     There are two main sources of rules:
+
+     a. Discretionary access control (DAC):
+
+	 Sometimes the object will include sets of rules as part of its
+	 description.  This is an 'Access Control List' or 'ACL'.  A Linux
+	 file may supply more than one ACL.
+
+	 A traditional UNIX file, for example, includes a permissions mask that
+	 is an abbreviated ACL with three fixed classes of subject ('user',
+	 'group' and 'other'), each of which may be granted certain privileges
+	 ('read', 'write' and 'execute' - whatever those map to for the object
+	 in question).  UNIX file permissions do not allow the arbitrary
+	 specification of subjects, however, and so are of limited use.
+
+	 A Linux file might also sport a POSIX ACL.  This is a list of rules
+	 that grants various permissions to arbitrary subjects.
+
+     b. Mandatory access control (MAC):
+
+	 The system as a whole may have one or more sets of rules that get
+	 applied to all subjects and objects, regardless of their source.
+	 SELinux and Smack are examples of this.
+
+	 In the case of SELinux and Smack, each object is given a label as part
+	 of its credentials.  When an action is requested, they take the
+	 subject label, the object label and the action and look for a rule
+	 that says that this action is either granted or denied.
+
+
+Types of Credentials
+====================
+
+The Linux kernel supports the following types of credentials:
+
+ 1. Traditional UNIX credentials.
+
+	- Real User ID
+	- Real Group ID
+
+     The UID and GID are carried by most, if not all, Linux objects, even if in
+     some cases it has to be invented (FAT or CIFS files for example, which are
+     derived from Windows).  These (mostly) define the objective context of
+     that object, with tasks being slightly different in some cases.
+
+	- Effective, Saved and FS User ID
+	- Effective, Saved and FS Group ID
+	- Supplementary groups
+
+     These are additional credentials used by tasks only.  Usually, an
+     EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
+     will be used as the objective.  For tasks, it should be noted that this is
+     not always true.
+
+ 2. Capabilities.
+
+	- Set of permitted capabilities
+	- Set of inheritable capabilities
+	- Set of effective capabilities
+	- Capability bounding set
+
+     These are only carried by tasks.  They indicate superior capabilities
+     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
+     These are manipulated implicitly by changes to the traditional UNIX
+     credentials, but can also be manipulated directly by the ``capset()``
+     system call.
+
+     The permitted capabilities are those caps that the process might grant
+     itself to its effective or permitted sets through ``capset()``.  This
+     inheritable set might also be so constrained.
+
+     The effective capabilities are the ones that a task is actually allowed to
+     make use of itself.
+
+     The inheritable capabilities are the ones that may get passed across
+     ``execve()``.
+
+     The bounding set limits the capabilities that may be inherited across
+     ``execve()``, especially when a binary is executed that will execute as
+     UID 0.
+
+ 3. Secure management flags (securebits).
+
+     These are only carried by tasks.  These govern the way the above
+     credentials are manipulated and inherited over certain operations such as
+     execve().  They aren't used directly as objective or subjective
+     credentials.
+
+ 4. Keys and keyrings.
+
+     These are only carried by tasks.  They carry and cache security tokens
+     that don't fit into the other standard UNIX credentials.  They are for
+     making such things as network filesystem keys available to the file
+     accesses performed by processes, without the necessity of ordinary
+     programs having to know about security details involved.
+
+     Keyrings are a special type of key.  They carry sets of other keys and can
+     be searched for the desired key.  Each process may subscribe to a number
+     of keyrings:
+
+	Per-thread keying
+	Per-process keyring
+	Per-session keyring
+
+     When a process accesses a key, if not already present, it will normally be
+     cached on one of these keyrings for future accesses to find.
+
+     For more information on using keys, see ``Documentation/security/keys/*``.
+
+ 5. LSM
+
+     The Linux Security Module allows extra controls to be placed over the
+     operations that a task may do.  Currently Linux supports several LSM
+     options.
+
+     Some work by labelling the objects in a system and then applying sets of
+     rules (policies) that say what operations a task with one label may do to
+     an object with another label.
+
+ 6. AF_KEY
+
+     This is a socket-based approach to credential management for networking
+     stacks [RFC 2367].  It isn't discussed by this document as it doesn't
+     interact directly with task and file credentials; rather it keeps system
+     level credentials.
+
+
+When a file is opened, part of the opening task's subjective context is
+recorded in the file struct created.  This allows operations using that file
+struct to use those credentials instead of the subjective context of the task
+that issued the operation.  An example of this would be a file opened on a
+network filesystem where the credentials of the opened file should be presented
+to the server, regardless of who is actually doing a read or a write upon it.
+
+
+File Markings
+=============
+
+Files on disk or obtained over the network may have annotations that form the
+objective security context of that file.  Depending on the type of filesystem,
+this may include one or more of the following:
+
+ * UNIX UID, GID, mode;
+ * Windows user ID;
+ * Access control list;
+ * LSM security label;
+ * UNIX exec privilege escalation bits (SUID/SGID);
+ * File capabilities exec privilege escalation bits.
+
+These are compared to the task's subjective security context, and certain
+operations allowed or disallowed as a result.  In the case of execve(), the
+privilege escalation bits come into play, and may allow the resulting process
+extra privileges, based on the annotations on the executable file.
+
+
+Task Credentials
+================
+
+In Linux, all of a task's credentials are held in (uid, gid) or through
+(groups, keys, LSM security) a refcounted structure of type 'struct cred'.
+Each task points to its credentials by a pointer called 'cred' in its
+task_struct.
+
+Once a set of credentials has been prepared and committed, it may not be
+changed, barring the following exceptions:
+
+ 1. its reference count may be changed;
+
+ 2. the reference count on the group_info struct it points to may be changed;
+
+ 3. the reference count on the security data it points to may be changed;
+
+ 4. the reference count on any keyrings it points to may be changed;
+
+ 5. any keyrings it points to may be revoked, expired or have their security
+    attributes changed; and
+
+ 6. the contents of any keyrings to which it points may be changed (the whole
+    point of keyrings being a shared set of credentials, modifiable by anyone
+    with appropriate access).
+
+To alter anything in the cred struct, the copy-and-replace principle must be
+adhered to.  First take a copy, then alter the copy and then use RCU to change
+the task pointer to make it point to the new copy.  There are wrappers to aid
+with this (see below).
+
+A task may only alter its _own_ credentials; it is no longer permitted for a
+task to alter another's credentials.  This means the ``capset()`` system call
+is no longer permitted to take any PID other than the one of the current
+process. Also ``keyctl_instantiate()`` and ``keyctl_negate()`` functions no
+longer permit attachment to process-specific keyrings in the requesting
+process as the instantiating process may need to create them.
+
+
+Immutable Credentials
+---------------------
+
+Once a set of credentials has been made public (by calling ``commit_creds()``
+for example), it must be considered immutable, barring two exceptions:
+
+ 1. The reference count may be altered.
+
+ 2. While the keyring subscriptions of a set of credentials may not be
+    changed, the keyrings subscribed to may have their contents altered.
+
+To catch accidental credential alteration at compile time, struct task_struct
+has _const_ pointers to its credential sets, as does struct file.  Furthermore,
+certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
+pointers, thus rendering casts unnecessary, but require to temporarily ditch
+the const qualification to be able to alter the reference count.
+
+
+Accessing Task Credentials
+--------------------------
+
+A task being able to alter only its own credentials permits the current process
+to read or replace its own credentials without the need for any form of locking
+-- which simplifies things greatly.  It can just call::
+
+	const struct cred *current_cred()
+
+to get a pointer to its credentials structure, and it doesn't have to release
+it afterwards.
+
+There are convenience wrappers for retrieving specific aspects of a task's
+credentials (the value is simply returned in each case)::
+
+	uid_t current_uid(void)		Current's real UID
+	gid_t current_gid(void)		Current's real GID
+	uid_t current_euid(void)	Current's effective UID
+	gid_t current_egid(void)	Current's effective GID
+	uid_t current_fsuid(void)	Current's file access UID
+	gid_t current_fsgid(void)	Current's file access GID
+	kernel_cap_t current_cap(void)	Current's effective capabilities
+	struct user_struct *current_user(void)  Current's user account
+
+There are also convenience wrappers for retrieving specific associated pairs of
+a task's credentials::
+
+	void current_uid_gid(uid_t *, gid_t *);
+	void current_euid_egid(uid_t *, gid_t *);
+	void current_fsuid_fsgid(uid_t *, gid_t *);
+
+which return these pairs of values through their arguments after retrieving
+them from the current task's credentials.
+
+
+In addition, there is a function for obtaining a reference on the current
+process's current set of credentials::
+
+	const struct cred *get_current_cred(void);
+
+and functions for getting references to one of the credentials that don't
+actually live in struct cred::
+
+	struct user_struct *get_current_user(void);
+	struct group_info *get_current_groups(void);
+
+which get references to the current process's user accounting structure and
+supplementary groups list respectively.
+
+Once a reference has been obtained, it must be released with ``put_cred()``,
+``free_uid()`` or ``put_group_info()`` as appropriate.
+
+
+Accessing Another Task's Credentials
+------------------------------------
+
+While a task may access its own credentials without the need for locking, the
+same is not true of a task wanting to access another task's credentials.  It
+must use the RCU read lock and ``rcu_dereference()``.
+
+The ``rcu_dereference()`` is wrapped by::
+
+	const struct cred *__task_cred(struct task_struct *task);
+
+This should be used inside the RCU read lock, as in the following example::
+
+	void foo(struct task_struct *t, struct foo_data *f)
+	{
+		const struct cred *tcred;
+		...
+		rcu_read_lock();
+		tcred = __task_cred(t);
+		f->uid = tcred->uid;
+		f->gid = tcred->gid;
+		f->groups = get_group_info(tcred->groups);
+		rcu_read_unlock();
+		...
+	}
+
+Should it be necessary to hold another task's credentials for a long period of
+time, and possibly to sleep while doing so, then the caller should get a
+reference on them using::
+
+	const struct cred *get_task_cred(struct task_struct *task);
+
+This does all the RCU magic inside of it.  The caller must call put_cred() on
+the credentials so obtained when they're finished with.
+
+.. note::
+   The result of ``__task_cred()`` should not be passed directly to
+   ``get_cred()`` as this may race with ``commit_cred()``.
+
+There are a couple of convenience functions to access bits of another task's
+credentials, hiding the RCU magic from the caller::
+
+	uid_t task_uid(task)		Task's real UID
+	uid_t task_euid(task)		Task's effective UID
+
+If the caller is holding the RCU read lock at the time anyway, then::
+
+	__task_cred(task)->uid
+	__task_cred(task)->euid
+
+should be used instead.  Similarly, if multiple aspects of a task's credentials
+need to be accessed, RCU read lock should be used, ``__task_cred()`` called,
+the result stored in a temporary pointer and then the credential aspects called
+from that before dropping the lock.  This prevents the potentially expensive
+RCU magic from being invoked multiple times.
+
+Should some other single aspect of another task's credentials need to be
+accessed, then this can be used::
+
+	task_cred_xxx(task, member)
+
+where 'member' is a non-pointer member of the cred struct.  For instance::
+
+	uid_t task_cred_xxx(task, suid);
+
+will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
+magic.  This may not be used for pointer members as what they point to may
+disappear the moment the RCU read lock is dropped.
+
+
+Altering Credentials
+--------------------
+
+As previously mentioned, a task may only alter its own credentials, and may not
+alter those of another task.  This means that it doesn't need to use any
+locking to alter its own credentials.
+
+To alter the current process's credentials, a function should first prepare a
+new set of credentials by calling::
+
+	struct cred *prepare_creds(void);
+
+this locks current->cred_replace_mutex and then allocates and constructs a
+duplicate of the current process's credentials, returning with the mutex still
+held if successful.  It returns NULL if not successful (out of memory).
+
+The mutex prevents ``ptrace()`` from altering the ptrace state of a process
+while security checks on credentials construction and changing is taking place
+as the ptrace state may alter the outcome, particularly in the case of
+``execve()``.
+
+The new credentials set should be altered appropriately, and any security
+checks and hooks done.  Both the current and the proposed sets of credentials
+are available for this purpose as current_cred() will return the current set
+still at this point.
+
+When replacing the group list, the new list must be sorted before it
+is added to the credential, as a binary search is used to test for
+membership.  In practice, this means groups_sort() should be
+called before set_groups() or set_current_groups().
+groups_sort() must not be called on a ``struct group_list`` which
+is shared as it may permute elements as part of the sorting process
+even if the array is already sorted.
+
+When the credential set is ready, it should be committed to the current process
+by calling::
+
+	int commit_creds(struct cred *new);
+
+This will alter various aspects of the credentials and the process, giving the
+LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to
+actually commit the new credentials to ``current->cred``, it will release
+``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it
+will notify the scheduler and others of the changes.
+
+This function is guaranteed to return 0, so that it can be tail-called at the
+end of such functions as ``sys_setresuid()``.
+
+Note that this function consumes the caller's reference to the new credentials.
+The caller should _not_ call ``put_cred()`` on the new credentials afterwards.
+
+Furthermore, once this function has been called on a new set of credentials,
+those credentials may _not_ be changed further.
+
+
+Should the security checks fail or some other error occur after
+``prepare_creds()`` has been called, then the following function should be
+invoked::
+
+	void abort_creds(struct cred *new);
+
+This releases the lock on ``current->cred_replace_mutex`` that
+``prepare_creds()`` got and then releases the new credentials.
+
+
+A typical credentials alteration function would look something like this::
+
+	int alter_suid(uid_t suid)
+	{
+		struct cred *new;
+		int ret;
+
+		new = prepare_creds();
+		if (!new)
+			return -ENOMEM;
+
+		new->suid = suid;
+		ret = security_alter_suid(new);
+		if (ret < 0) {
+			abort_creds(new);
+			return ret;
+		}
+
+		return commit_creds(new);
+	}
+
+
+Managing Credentials
+--------------------
+
+There are some functions to help manage credentials:
+
+ - ``void put_cred(const struct cred *cred);``
+
+     This releases a reference to the given set of credentials.  If the
+     reference count reaches zero, the credentials will be scheduled for
+     destruction by the RCU system.
+
+ - ``const struct cred *get_cred(const struct cred *cred);``
+
+     This gets a reference on a live set of credentials, returning a pointer to
+     that set of credentials.
+
+ - ``struct cred *get_new_cred(struct cred *cred);``
+
+     This gets a reference on a set of credentials that is under construction
+     and is thus still mutable, returning a pointer to that set of credentials.
+
+
+Open File Credentials
+=====================
+
+When a new file is opened, a reference is obtained on the opening task's
+credentials and this is attached to the file struct as ``f_cred`` in place of
+``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
+``file->f_gid`` should now access ``file->f_cred->fsuid`` and
+``file->f_cred->fsgid``.
+
+It is safe to access ``f_cred`` without the use of RCU or locking because the
+pointer will not change over the lifetime of the file struct, and nor will the
+contents of the cred struct pointed to, barring the exceptions listed above
+(see the Task Credentials section).
+
+To avoid "confused deputy" privilege escalation attacks, access control checks
+during subsequent operations on an opened file should use these credentials
+instead of "current"'s credentials, as the file may have been passed to a more
+privileged process.
+
+Overriding the VFS's Use of Credentials
+=======================================
+
+Under some circumstances it is desirable to override the credentials used by
+the VFS, and that can be done by calling into such as ``vfs_mkdir()`` with a
+different set of credentials.  This is done in the following places:
+
+ * ``sys_faccessat()``.
+ * ``do_coredump()``.
+ * nfs4recover.c.
diff --git a/Documentation/security/digsig.rst b/Documentation/security/digsig.rst
new file mode 100644
index 0000000000..de035f2821
--- /dev/null
+++ b/Documentation/security/digsig.rst
@@ -0,0 +1,101 @@
+==================================
+Digital Signature Verification API
+==================================
+
+:Author: Dmitry Kasatkin
+:Date: 06.10.2011
+
+
+.. CONTENTS
+
+   1. Introduction
+   2. API
+   3. User-space utilities
+
+
+Introduction
+============
+
+Digital signature verification API provides a method to verify digital signature.
+Currently digital signatures are used by the IMA/EVM integrity protection subsystem.
+
+Digital signature verification is implemented using cut-down kernel port of
+GnuPG multi-precision integers (MPI) library. The kernel port provides
+memory allocation errors handling, has been refactored according to kernel
+coding style, and checkpatch.pl reported errors and warnings have been fixed.
+
+Public key and signature consist of header and MPIs::
+
+	struct pubkey_hdr {
+		uint8_t		version;	/* key format version */
+		time_t		timestamp;	/* key made, always 0 for now */
+		uint8_t		algo;
+		uint8_t		nmpi;
+		char		mpi[0];
+	} __packed;
+
+	struct signature_hdr {
+		uint8_t		version;	/* signature format version */
+		time_t		timestamp;	/* signature made */
+		uint8_t		algo;
+		uint8_t		hash;
+		uint8_t		keyid[8];
+		uint8_t		nmpi;
+		char		mpi[0];
+	} __packed;
+
+keyid equals to SHA1[12-19] over the total key content.
+Signature header is used as an input to generate a signature.
+Such approach insures that key or signature header could not be changed.
+It protects timestamp from been changed and can be used for rollback
+protection.
+
+API
+===
+
+API currently includes only 1 function::
+
+	digsig_verify() - digital signature verification with public key
+
+
+	/**
+	* digsig_verify() - digital signature verification with public key
+	* @keyring:	keyring to search key in
+	* @sig:	digital signature
+	* @sigen:	length of the signature
+	* @data:	data
+	* @datalen:	length of the data
+	* @return:	0 on success, -EINVAL otherwise
+	*
+	* Verifies data integrity against digital signature.
+	* Currently only RSA is supported.
+	* Normally hash of the content is used as a data for this function.
+	*
+	*/
+	int digsig_verify(struct key *keyring, const char *sig, int siglen,
+			  const char *data, int datalen);
+
+User-space utilities
+====================
+
+The signing and key management utilities evm-utils provide functionality
+to generate signatures, to load keys into the kernel keyring.
+Keys can be in PEM or converted to the kernel format.
+When the key is added to the kernel keyring, the keyid defines the name
+of the key: 5D2B05FC633EE3E8 in the example below.
+
+Here is example output of the keyctl utility::
+
+	$ keyctl show
+	Session Keyring
+	-3 --alswrv      0     0  keyring: _ses
+	603976250 --alswrv      0    -1   \_ keyring: _uid.0
+	817777377 --alswrv      0     0       \_ user: kmk
+	891974900 --alswrv      0     0       \_ encrypted: evm-key
+	170323636 --alswrv      0     0       \_ keyring: _module
+	548221616 --alswrv      0     0       \_ keyring: _ima
+	128198054 --alswrv      0     0       \_ keyring: _evm
+
+	$ keyctl list 128198054
+	1 key in keyring:
+	620789745: --alswrv     0     0 user: 5D2B05FC633EE3E8
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
new file mode 100644
index 0000000000..6ed8d2fa6f
--- /dev/null
+++ b/Documentation/security/index.rst
@@ -0,0 +1,20 @@
+======================
+Security Documentation
+======================
+
+.. toctree::
+   :maxdepth: 1
+
+   credentials
+   IMA-templates
+   keys/index
+   lsm
+   lsm-development
+   sak
+   SCTP
+   self-protection
+   siphash
+   tpm/index
+   digsig
+   landlock
+   secrets/index
diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst
new file mode 100644
index 0000000000..326b8a9738
--- /dev/null
+++ b/Documentation/security/keys/core.rst
@@ -0,0 +1,1849 @@
+============================
+Kernel Key Retention Service
+============================
+
+This service allows cryptographic keys, authentication tokens, cross-domain
+user mappings, and similar to be cached in the kernel for the use of
+filesystems and other kernel services.
+
+Keyrings are permitted; these are a special type of key that can hold links to
+other keys. Processes each have three standard keyring subscriptions that a
+kernel service can search for relevant keys.
+
+The key service can be configured on by enabling:
+
+	"Security options"/"Enable access key retention support" (CONFIG_KEYS)
+
+This document has the following sections:
+
+.. contents:: :local:
+
+
+Key Overview
+============
+
+In this context, keys represent units of cryptographic data, authentication
+tokens, keyrings, etc.. These are represented in the kernel by struct key.
+
+Each key has a number of attributes:
+
+	- A serial number.
+	- A type.
+	- A description (for matching a key in a search).
+	- Access control information.
+	- An expiry time.
+	- A payload.
+	- State.
+
+
+  *  Each key is issued a serial number of type key_serial_t that is unique for
+     the lifetime of that key. All serial numbers are positive non-zero 32-bit
+     integers.
+
+     Userspace programs can use a key's serial numbers as a way to gain access
+     to it, subject to permission checking.
+
+  *  Each key is of a defined "type". Types must be registered inside the
+     kernel by a kernel service (such as a filesystem) before keys of that type
+     can be added or used. Userspace programs cannot define new types directly.
+
+     Key types are represented in the kernel by struct key_type. This defines a
+     number of operations that can be performed on a key of that type.
+
+     Should a type be removed from the system, all the keys of that type will
+     be invalidated.
+
+  *  Each key has a description. This should be a printable string. The key
+     type provides an operation to perform a match between the description on a
+     key and a criterion string.
+
+  *  Each key has an owner user ID, a group ID and a permissions mask. These
+     are used to control what a process may do to a key from userspace, and
+     whether a kernel service will be able to find the key.
+
+  *  Each key can be set to expire at a specific time by the key type's
+     instantiation function. Keys can also be immortal.
+
+  *  Each key can have a payload. This is a quantity of data that represent the
+     actual "key". In the case of a keyring, this is a list of keys to which
+     the keyring links; in the case of a user-defined key, it's an arbitrary
+     blob of data.
+
+     Having a payload is not required; and the payload can, in fact, just be a
+     value stored in the struct key itself.
+
+     When a key is instantiated, the key type's instantiation function is
+     called with a blob of data, and that then creates the key's payload in
+     some way.
+
+     Similarly, when userspace wants to read back the contents of the key, if
+     permitted, another key type operation will be called to convert the key's
+     attached payload back into a blob of data.
+
+  *  Each key can be in one of a number of basic states:
+
+      *  Uninstantiated. The key exists, but does not have any data attached.
+     	 Keys being requested from userspace will be in this state.
+
+      *  Instantiated. This is the normal state. The key is fully formed, and
+	 has data attached.
+
+      *  Negative. This is a relatively short-lived state. The key acts as a
+	 note saying that a previous call out to userspace failed, and acts as
+	 a throttle on key lookups. A negative key can be updated to a normal
+	 state.
+
+      *  Expired. Keys can have lifetimes set. If their lifetime is exceeded,
+	 they traverse to this state. An expired key can be updated back to a
+	 normal state.
+
+      *  Revoked. A key is put in this state by userspace action. It can't be
+	 found or operated upon (apart from by unlinking it).
+
+      *  Dead. The key's type was unregistered, and so the key is now useless.
+
+Keys in the last three states are subject to garbage collection.  See the
+section on "Garbage collection".
+
+
+Key Service Overview
+====================
+
+The key service provides a number of features besides keys:
+
+  *  The key service defines three special key types:
+
+     (+) "keyring"
+
+	 Keyrings are special keys that contain a list of other keys. Keyring
+	 lists can be modified using various system calls. Keyrings should not
+	 be given a payload when created.
+
+     (+) "user"
+
+	 A key of this type has a description and a payload that are arbitrary
+	 blobs of data. These can be created, updated and read by userspace,
+	 and aren't intended for use by kernel services.
+
+     (+) "logon"
+
+	 Like a "user" key, a "logon" key has a payload that is an arbitrary
+	 blob of data. It is intended as a place to store secrets which are
+	 accessible to the kernel but not to userspace programs.
+
+	 The description can be arbitrary, but must be prefixed with a non-zero
+	 length string that describes the key "subclass". The subclass is
+	 separated from the rest of the description by a ':'. "logon" keys can
+	 be created and updated from userspace, but the payload is only
+	 readable from kernel space.
+
+  *  Each process subscribes to three keyrings: a thread-specific keyring, a
+     process-specific keyring, and a session-specific keyring.
+
+     The thread-specific keyring is discarded from the child when any sort of
+     clone, fork, vfork or execve occurs. A new keyring is created only when
+     required.
+
+     The process-specific keyring is replaced with an empty one in the child on
+     clone, fork, vfork unless CLONE_THREAD is supplied, in which case it is
+     shared. execve also discards the process's process keyring and creates a
+     new one.
+
+     The session-specific keyring is persistent across clone, fork, vfork and
+     execve, even when the latter executes a set-UID or set-GID binary. A
+     process can, however, replace its current session keyring with a new one
+     by using PR_JOIN_SESSION_KEYRING. It is permitted to request an anonymous
+     new one, or to attempt to create or join one of a specific name.
+
+     The ownership of the thread keyring changes when the real UID and GID of
+     the thread changes.
+
+  *  Each user ID resident in the system holds two special keyrings: a user
+     specific keyring and a default user session keyring. The default session
+     keyring is initialised with a link to the user-specific keyring.
+
+     When a process changes its real UID, if it used to have no session key, it
+     will be subscribed to the default session key for the new UID.
+
+     If a process attempts to access its session key when it doesn't have one,
+     it will be subscribed to the default for its current UID.
+
+  *  Each user has two quotas against which the keys they own are tracked. One
+     limits the total number of keys and keyrings, the other limits the total
+     amount of description and payload space that can be consumed.
+
+     The user can view information on this and other statistics through procfs
+     files.  The root user may also alter the quota limits through sysctl files
+     (see the section "New procfs files").
+
+     Process-specific and thread-specific keyrings are not counted towards a
+     user's quota.
+
+     If a system call that modifies a key or keyring in some way would put the
+     user over quota, the operation is refused and error EDQUOT is returned.
+
+  *  There's a system call interface by which userspace programs can create and
+     manipulate keys and keyrings.
+
+  *  There's a kernel interface by which services can register types and search
+     for keys.
+
+  *  There's a way for the a search done from the kernel to call back to
+     userspace to request a key that can't be found in a process's keyrings.
+
+  *  An optional filesystem is available through which the key database can be
+     viewed and manipulated.
+
+
+Key Access Permissions
+======================
+
+Keys have an owner user ID, a group access ID, and a permissions mask. The mask
+has up to eight bits each for possessor, user, group and other access. Only
+six of each set of eight bits are defined. These permissions granted are:
+
+  *  View
+
+     This permits a key or keyring's attributes to be viewed - including key
+     type and description.
+
+  *  Read
+
+     This permits a key's payload to be viewed or a keyring's list of linked
+     keys.
+
+  *  Write
+
+     This permits a key's payload to be instantiated or updated, or it allows a
+     link to be added to or removed from a keyring.
+
+  *  Search
+
+     This permits keyrings to be searched and keys to be found. Searches can
+     only recurse into nested keyrings that have search permission set.
+
+  *  Link
+
+     This permits a key or keyring to be linked to. To create a link from a
+     keyring to a key, a process must have Write permission on the keyring and
+     Link permission on the key.
+
+  *  Set Attribute
+
+     This permits a key's UID, GID and permissions mask to be changed.
+
+For changing the ownership, group ID or permissions mask, being the owner of
+the key or having the sysadmin capability is sufficient.
+
+
+SELinux Support
+===============
+
+The security class "key" has been added to SELinux so that mandatory access
+controls can be applied to keys created within various contexts.  This support
+is preliminary, and is likely to change quite significantly in the near future.
+Currently, all of the basic permissions explained above are provided in SELinux
+as well; SELinux is simply invoked after all basic permission checks have been
+performed.
+
+The value of the file /proc/self/attr/keycreate influences the labeling of
+newly-created keys.  If the contents of that file correspond to an SELinux
+security context, then the key will be assigned that context.  Otherwise, the
+key will be assigned the current context of the task that invoked the key
+creation request.  Tasks must be granted explicit permission to assign a
+particular context to newly-created keys, using the "create" permission in the
+key security class.
+
+The default keyrings associated with users will be labeled with the default
+context of the user if and only if the login programs have been instrumented to
+properly initialize keycreate during the login process.  Otherwise, they will
+be labeled with the context of the login program itself.
+
+Note, however, that the default keyrings associated with the root user are
+labeled with the default kernel context, since they are created early in the
+boot process, before root has a chance to log in.
+
+The keyrings associated with new threads are each labeled with the context of
+their associated thread, and both session and process keyrings are handled
+similarly.
+
+
+New ProcFS Files
+================
+
+Two files have been added to procfs by which an administrator can find out
+about the status of the key service:
+
+  *  /proc/keys
+
+     This lists the keys that are currently viewable by the task reading the
+     file, giving information about their type, description and permissions.
+     It is not possible to view the payload of the key this way, though some
+     information about it may be given.
+
+     The only keys included in the list are those that grant View permission to
+     the reading process whether or not it possesses them.  Note that LSM
+     security checks are still performed, and may further filter out keys that
+     the current process is not authorised to view.
+
+     The contents of the file look like this::
+
+	SERIAL   FLAGS  USAGE EXPY PERM     UID   GID   TYPE      DESCRIPTION: SUMMARY
+	00000001 I-----    39 perm 1f3f0000     0     0 keyring   _uid_ses.0: 1/4
+	00000002 I-----     2 perm 1f3f0000     0     0 keyring   _uid.0: empty
+	00000007 I-----     1 perm 1f3f0000     0     0 keyring   _pid.1: empty
+	0000018d I-----     1 perm 1f3f0000     0     0 keyring   _pid.412: empty
+	000004d2 I--Q--     1 perm 1f3f0000    32    -1 keyring   _uid.32: 1/4
+	000004d3 I--Q--     3 perm 1f3f0000    32    -1 keyring   _uid_ses.32: empty
+	00000892 I--QU-     1 perm 1f000000     0     0 user      metal:copper: 0
+	00000893 I--Q-N     1  35s 1f3f0000     0     0 user      metal:silver: 0
+	00000894 I--Q--     1  10h 003f0000     0     0 user      metal:gold: 0
+
+     The flags are::
+
+	I	Instantiated
+	R	Revoked
+	D	Dead
+	Q	Contributes to user's quota
+	U	Under construction by callback to userspace
+	N	Negative key
+
+
+  *  /proc/key-users
+
+     This file lists the tracking data for each user that has at least one key
+     on the system.  Such data includes quota information and statistics::
+
+	[root@andromeda root]# cat /proc/key-users
+	0:     46 45/45 1/100 13/10000
+	29:     2 2/2 2/100 40/10000
+	32:     2 2/2 2/100 40/10000
+	38:     2 2/2 2/100 40/10000
+
+     The format of each line is::
+
+	<UID>:			User ID to which this applies
+	<usage>			Structure refcount
+	<inst>/<keys>		Total number of keys and number instantiated
+	<keys>/<max>		Key count quota
+	<bytes>/<max>		Key size quota
+
+
+Four new sysctl files have been added also for the purpose of controlling the
+quota limits on keys:
+
+  *  /proc/sys/kernel/keys/root_maxkeys
+     /proc/sys/kernel/keys/root_maxbytes
+
+     These files hold the maximum number of keys that root may have and the
+     maximum total number of bytes of data that root may have stored in those
+     keys.
+
+  *  /proc/sys/kernel/keys/maxkeys
+     /proc/sys/kernel/keys/maxbytes
+
+     These files hold the maximum number of keys that each non-root user may
+     have and the maximum total number of bytes of data that each of those
+     users may have stored in their keys.
+
+Root may alter these by writing each new limit as a decimal number string to
+the appropriate file.
+
+
+Userspace System Call Interface
+===============================
+
+Userspace can manipulate keys directly through three new syscalls: add_key,
+request_key and keyctl. The latter provides a number of functions for
+manipulating keys.
+
+When referring to a key directly, userspace programs should use the key's
+serial number (a positive 32-bit integer). However, there are some special
+values available for referring to special keys and keyrings that relate to the
+process making the call::
+
+	CONSTANT			VALUE	KEY REFERENCED
+	==============================	======	===========================
+	KEY_SPEC_THREAD_KEYRING		-1	thread-specific keyring
+	KEY_SPEC_PROCESS_KEYRING	-2	process-specific keyring
+	KEY_SPEC_SESSION_KEYRING	-3	session-specific keyring
+	KEY_SPEC_USER_KEYRING		-4	UID-specific keyring
+	KEY_SPEC_USER_SESSION_KEYRING	-5	UID-session keyring
+	KEY_SPEC_GROUP_KEYRING		-6	GID-specific keyring
+	KEY_SPEC_REQKEY_AUTH_KEY	-7	assumed request_key()
+						  authorisation key
+
+
+The main syscalls are:
+
+  *  Create a new key of given type, description and payload and add it to the
+     nominated keyring::
+
+	key_serial_t add_key(const char *type, const char *desc,
+			     const void *payload, size_t plen,
+			     key_serial_t keyring);
+
+     If a key of the same type and description as that proposed already exists
+     in the keyring, this will try to update it with the given payload, or it
+     will return error EEXIST if that function is not supported by the key
+     type. The process must also have permission to write to the key to be able
+     to update it. The new key will have all user permissions granted and no
+     group or third party permissions.
+
+     Otherwise, this will attempt to create a new key of the specified type and
+     description, and to instantiate it with the supplied payload and attach it
+     to the keyring. In this case, an error will be generated if the process
+     does not have permission to write to the keyring.
+
+     If the key type supports it, if the description is NULL or an empty
+     string, the key type will try and generate a description from the content
+     of the payload.
+
+     The payload is optional, and the pointer can be NULL if not required by
+     the type. The payload is plen in size, and plen can be zero for an empty
+     payload.
+
+     A new keyring can be generated by setting type "keyring", the keyring name
+     as the description (or NULL) and setting the payload to NULL.
+
+     User defined keys can be created by specifying type "user". It is
+     recommended that a user defined key's description by prefixed with a type
+     ID and a colon, such as "krb5tgt:" for a Kerberos 5 ticket granting
+     ticket.
+
+     Any other type must have been registered with the kernel in advance by a
+     kernel service such as a filesystem.
+
+     The ID of the new or updated key is returned if successful.
+
+
+  *  Search the process's keyrings for a key, potentially calling out to
+     userspace to create it::
+
+	key_serial_t request_key(const char *type, const char *description,
+				 const char *callout_info,
+				 key_serial_t dest_keyring);
+
+     This function searches all the process's keyrings in the order thread,
+     process, session for a matching key. This works very much like
+     KEYCTL_SEARCH, including the optional attachment of the discovered key to
+     a keyring.
+
+     If a key cannot be found, and if callout_info is not NULL, then
+     /sbin/request-key will be invoked in an attempt to obtain a key. The
+     callout_info string will be passed as an argument to the program.
+
+     To link a key into the destination keyring the key must grant link
+     permission on the key to the caller and the keyring must grant write
+     permission.
+
+     See also Documentation/security/keys/request-key.rst.
+
+
+The keyctl syscall functions are:
+
+  *  Map a special key ID to a real key ID for this process::
+
+	key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id,
+			    int create);
+
+     The special key specified by "id" is looked up (with the key being created
+     if necessary) and the ID of the key or keyring thus found is returned if
+     it exists.
+
+     If the key does not yet exist, the key will be created if "create" is
+     non-zero; and the error ENOKEY will be returned if "create" is zero.
+
+
+  *  Replace the session keyring this process subscribes to with a new one::
+
+	key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name);
+
+     If name is NULL, an anonymous keyring is created attached to the process
+     as its session keyring, displacing the old session keyring.
+
+     If name is not NULL, if a keyring of that name exists, the process
+     attempts to attach it as the session keyring, returning an error if that
+     is not permitted; otherwise a new keyring of that name is created and
+     attached as the session keyring.
+
+     To attach to a named keyring, the keyring must have search permission for
+     the process's ownership.
+
+     The ID of the new session keyring is returned if successful.
+
+
+  *  Update the specified key::
+
+	long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload,
+		    size_t plen);
+
+     This will try to update the specified key with the given payload, or it
+     will return error EOPNOTSUPP if that function is not supported by the key
+     type. The process must also have permission to write to the key to be able
+     to update it.
+
+     The payload is of length plen, and may be absent or empty as for
+     add_key().
+
+
+  *  Revoke a key::
+
+	long keyctl(KEYCTL_REVOKE, key_serial_t key);
+
+     This makes a key unavailable for further operations. Further attempts to
+     use the key will be met with error EKEYREVOKED, and the key will no longer
+     be findable.
+
+
+  *  Change the ownership of a key::
+
+	long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid);
+
+     This function permits a key's owner and group ID to be changed. Either one
+     of uid or gid can be set to -1 to suppress that change.
+
+     Only the superuser can change a key's owner to something other than the
+     key's current owner. Similarly, only the superuser can change a key's
+     group ID to something other than the calling process's group ID or one of
+     its group list members.
+
+
+  *  Change the permissions mask on a key::
+
+	long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm);
+
+     This function permits the owner of a key or the superuser to change the
+     permissions mask on a key.
+
+     Only bits the available bits are permitted; if any other bits are set,
+     error EINVAL will be returned.
+
+
+  *  Describe a key::
+
+	long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer,
+		    size_t buflen);
+
+     This function returns a summary of the key's attributes (but not its
+     payload data) as a string in the buffer provided.
+
+     Unless there's an error, it always returns the amount of data it could
+     produce, even if that's too big for the buffer, but it won't copy more
+     than requested to userspace. If the buffer pointer is NULL then no copy
+     will take place.
+
+     A process must have view permission on the key for this function to be
+     successful.
+
+     If successful, a string is placed in the buffer in the following format::
+
+	<type>;<uid>;<gid>;<perm>;<description>
+
+     Where type and description are strings, uid and gid are decimal, and perm
+     is hexadecimal. A NUL character is included at the end of the string if
+     the buffer is sufficiently big.
+
+     This can be parsed with::
+
+	sscanf(buffer, "%[^;];%d;%d;%o;%s", type, &uid, &gid, &mode, desc);
+
+
+  *  Clear out a keyring::
+
+	long keyctl(KEYCTL_CLEAR, key_serial_t keyring);
+
+     This function clears the list of keys attached to a keyring. The calling
+     process must have write permission on the keyring, and it must be a
+     keyring (or else error ENOTDIR will result).
+
+     This function can also be used to clear special kernel keyrings if they
+     are appropriately marked if the user has CAP_SYS_ADMIN capability.  The
+     DNS resolver cache keyring is an example of this.
+
+
+  *  Link a key into a keyring::
+
+	long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key);
+
+     This function creates a link from the keyring to the key. The process must
+     have write permission on the keyring and must have link permission on the
+     key.
+
+     Should the keyring not be a keyring, error ENOTDIR will result; and if the
+     keyring is full, error ENFILE will result.
+
+     The link procedure checks the nesting of the keyrings, returning ELOOP if
+     it appears too deep or EDEADLK if the link would introduce a cycle.
+
+     Any links within the keyring to keys that match the new key in terms of
+     type and description will be discarded from the keyring as the new one is
+     added.
+
+
+  *  Move a key from one keyring to another::
+
+	long keyctl(KEYCTL_MOVE,
+		    key_serial_t id,
+		    key_serial_t from_ring_id,
+		    key_serial_t to_ring_id,
+		    unsigned int flags);
+
+     Move the key specified by "id" from the keyring specified by
+     "from_ring_id" to the keyring specified by "to_ring_id".  If the two
+     keyrings are the same, nothing is done.
+
+     "flags" can have KEYCTL_MOVE_EXCL set in it to cause the operation to fail
+     with EEXIST if a matching key exists in the destination keyring, otherwise
+     such a key will be replaced.
+
+     A process must have link permission on the key for this function to be
+     successful and write permission on both keyrings.  Any errors that can
+     occur from KEYCTL_LINK also apply on the destination keyring here.
+
+
+  *  Unlink a key or keyring from another keyring::
+
+	long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key);
+
+     This function looks through the keyring for the first link to the
+     specified key, and removes it if found. Subsequent links to that key are
+     ignored. The process must have write permission on the keyring.
+
+     If the keyring is not a keyring, error ENOTDIR will result; and if the key
+     is not present, error ENOENT will be the result.
+
+
+  *  Search a keyring tree for a key::
+
+	key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring,
+			    const char *type, const char *description,
+			    key_serial_t dest_keyring);
+
+     This searches the keyring tree headed by the specified keyring until a key
+     is found that matches the type and description criteria. Each keyring is
+     checked for keys before recursion into its children occurs.
+
+     The process must have search permission on the top level keyring, or else
+     error EACCES will result. Only keyrings that the process has search
+     permission on will be recursed into, and only keys and keyrings for which
+     a process has search permission can be matched. If the specified keyring
+     is not a keyring, ENOTDIR will result.
+
+     If the search succeeds, the function will attempt to link the found key
+     into the destination keyring if one is supplied (non-zero ID). All the
+     constraints applicable to KEYCTL_LINK apply in this case too.
+
+     Error ENOKEY, EKEYREVOKED or EKEYEXPIRED will be returned if the search
+     fails. On success, the resulting key ID will be returned.
+
+
+  *  Read the payload data from a key::
+
+	long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer,
+		    size_t buflen);
+
+     This function attempts to read the payload data from the specified key
+     into the buffer. The process must have read permission on the key to
+     succeed.
+
+     The returned data will be processed for presentation by the key type. For
+     instance, a keyring will return an array of key_serial_t entries
+     representing the IDs of all the keys to which it is subscribed. The user
+     defined key type will return its data as is. If a key type does not
+     implement this function, error EOPNOTSUPP will result.
+
+     If the specified buffer is too small, then the size of the buffer required
+     will be returned.  Note that in this case, the contents of the buffer may
+     have been overwritten in some undefined way.
+
+     Otherwise, on success, the function will return the amount of data copied
+     into the buffer.
+
+  *  Instantiate a partially constructed key::
+
+	long keyctl(KEYCTL_INSTANTIATE, key_serial_t key,
+		    const void *payload, size_t plen,
+		    key_serial_t keyring);
+	long keyctl(KEYCTL_INSTANTIATE_IOV, key_serial_t key,
+		    const struct iovec *payload_iov, unsigned ioc,
+		    key_serial_t keyring);
+
+     If the kernel calls back to userspace to complete the instantiation of a
+     key, userspace should use this call to supply data for the key before the
+     invoked process returns, or else the key will be marked negative
+     automatically.
+
+     The process must have write access on the key to be able to instantiate
+     it, and the key must be uninstantiated.
+
+     If a keyring is specified (non-zero), the key will also be linked into
+     that keyring, however all the constraints applying in KEYCTL_LINK apply in
+     this case too.
+
+     The payload and plen arguments describe the payload data as for add_key().
+
+     The payload_iov and ioc arguments describe the payload data in an iovec
+     array instead of a single buffer.
+
+
+  *  Negatively instantiate a partially constructed key::
+
+	long keyctl(KEYCTL_NEGATE, key_serial_t key,
+		    unsigned timeout, key_serial_t keyring);
+	long keyctl(KEYCTL_REJECT, key_serial_t key,
+		    unsigned timeout, unsigned error, key_serial_t keyring);
+
+     If the kernel calls back to userspace to complete the instantiation of a
+     key, userspace should use this call mark the key as negative before the
+     invoked process returns if it is unable to fulfill the request.
+
+     The process must have write access on the key to be able to instantiate
+     it, and the key must be uninstantiated.
+
+     If a keyring is specified (non-zero), the key will also be linked into
+     that keyring, however all the constraints applying in KEYCTL_LINK apply in
+     this case too.
+
+     If the key is rejected, future searches for it will return the specified
+     error code until the rejected key expires.  Negating the key is the same
+     as rejecting the key with ENOKEY as the error code.
+
+
+  *  Set the default request-key destination keyring::
+
+	long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl);
+
+     This sets the default keyring to which implicitly requested keys will be
+     attached for this thread. reqkey_defl should be one of these constants::
+
+	CONSTANT				VALUE	NEW DEFAULT KEYRING
+	======================================	======	=======================
+	KEY_REQKEY_DEFL_NO_CHANGE		-1	No change
+	KEY_REQKEY_DEFL_DEFAULT			0	Default[1]
+	KEY_REQKEY_DEFL_THREAD_KEYRING		1	Thread keyring
+	KEY_REQKEY_DEFL_PROCESS_KEYRING		2	Process keyring
+	KEY_REQKEY_DEFL_SESSION_KEYRING		3	Session keyring
+	KEY_REQKEY_DEFL_USER_KEYRING		4	User keyring
+	KEY_REQKEY_DEFL_USER_SESSION_KEYRING	5	User session keyring
+	KEY_REQKEY_DEFL_GROUP_KEYRING		6	Group keyring
+
+     The old default will be returned if successful and error EINVAL will be
+     returned if reqkey_defl is not one of the above values.
+
+     The default keyring can be overridden by the keyring indicated to the
+     request_key() system call.
+
+     Note that this setting is inherited across fork/exec.
+
+     [1] The default is: the thread keyring if there is one, otherwise
+     the process keyring if there is one, otherwise the session keyring if
+     there is one, otherwise the user default session keyring.
+
+
+  *  Set the timeout on a key::
+
+	long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout);
+
+     This sets or clears the timeout on a key. The timeout can be 0 to clear
+     the timeout or a number of seconds to set the expiry time that far into
+     the future.
+
+     The process must have attribute modification access on a key to set its
+     timeout. Timeouts may not be set with this function on negative, revoked
+     or expired keys.
+
+
+  *  Assume the authority granted to instantiate a key::
+
+	long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key);
+
+     This assumes or divests the authority required to instantiate the
+     specified key. Authority can only be assumed if the thread has the
+     authorisation key associated with the specified key in its keyrings
+     somewhere.
+
+     Once authority is assumed, searches for keys will also search the
+     requester's keyrings using the requester's security label, UID, GID and
+     groups.
+
+     If the requested authority is unavailable, error EPERM will be returned,
+     likewise if the authority has been revoked because the target key is
+     already instantiated.
+
+     If the specified key is 0, then any assumed authority will be divested.
+
+     The assumed authoritative key is inherited across fork and exec.
+
+
+  *  Get the LSM security context attached to a key::
+
+	long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
+		    size_t buflen)
+
+     This function returns a string that represents the LSM security context
+     attached to a key in the buffer provided.
+
+     Unless there's an error, it always returns the amount of data it could
+     produce, even if that's too big for the buffer, but it won't copy more
+     than requested to userspace. If the buffer pointer is NULL then no copy
+     will take place.
+
+     A NUL character is included at the end of the string if the buffer is
+     sufficiently big.  This is included in the returned count.  If no LSM is
+     in force then an empty string will be returned.
+
+     A process must have view permission on the key for this function to be
+     successful.
+
+
+  *  Install the calling process's session keyring on its parent::
+
+	long keyctl(KEYCTL_SESSION_TO_PARENT);
+
+     This functions attempts to install the calling process's session keyring
+     on to the calling process's parent, replacing the parent's current session
+     keyring.
+
+     The calling process must have the same ownership as its parent, the
+     keyring must have the same ownership as the calling process, the calling
+     process must have LINK permission on the keyring and the active LSM module
+     mustn't deny permission, otherwise error EPERM will be returned.
+
+     Error ENOMEM will be returned if there was insufficient memory to complete
+     the operation, otherwise 0 will be returned to indicate success.
+
+     The keyring will be replaced next time the parent process leaves the
+     kernel and resumes executing userspace.
+
+
+  *  Invalidate a key::
+
+	long keyctl(KEYCTL_INVALIDATE, key_serial_t key);
+
+     This function marks a key as being invalidated and then wakes up the
+     garbage collector.  The garbage collector immediately removes invalidated
+     keys from all keyrings and deletes the key when its reference count
+     reaches zero.
+
+     Keys that are marked invalidated become invisible to normal key operations
+     immediately, though they are still visible in /proc/keys until deleted
+     (they're marked with an 'i' flag).
+
+     A process must have search permission on the key for this function to be
+     successful.
+
+  *  Compute a Diffie-Hellman shared secret or public key::
+
+	long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params,
+		    char *buffer, size_t buflen, struct keyctl_kdf_params *kdf);
+
+     The params struct contains serial numbers for three keys::
+
+	 - The prime, p, known to both parties
+	 - The local private key
+	 - The base integer, which is either a shared generator or the
+	   remote public key
+
+     The value computed is::
+
+	result = base ^ private (mod prime)
+
+     If the base is the shared generator, the result is the local
+     public key.  If the base is the remote public key, the result is
+     the shared secret.
+
+     If the parameter kdf is NULL, the following applies:
+
+	 - The buffer length must be at least the length of the prime, or zero.
+
+	 - If the buffer length is nonzero, the length of the result is
+	   returned when it is successfully calculated and copied in to the
+	   buffer. When the buffer length is zero, the minimum required
+	   buffer length is returned.
+
+     The kdf parameter allows the caller to apply a key derivation function
+     (KDF) on the Diffie-Hellman computation where only the result
+     of the KDF is returned to the caller. The KDF is characterized with
+     struct keyctl_kdf_params as follows:
+
+	 - ``char *hashname`` specifies the NUL terminated string identifying
+	   the hash used from the kernel crypto API and applied for the KDF
+	   operation. The KDF implementation complies with SP800-56A as well
+	   as with SP800-108 (the counter KDF).
+
+	 - ``char *otherinfo`` specifies the OtherInfo data as documented in
+	   SP800-56A section 5.8.1.2. The length of the buffer is given with
+	   otherinfolen. The format of OtherInfo is defined by the caller.
+	   The otherinfo pointer may be NULL if no OtherInfo shall be used.
+
+     This function will return error EOPNOTSUPP if the key type is not
+     supported, error ENOKEY if the key could not be found, or error
+     EACCES if the key is not readable by the caller. In addition, the
+     function will return EMSGSIZE when the parameter kdf is non-NULL
+     and either the buffer length or the OtherInfo length exceeds the
+     allowed length.
+
+
+  *  Restrict keyring linkage::
+
+	long keyctl(KEYCTL_RESTRICT_KEYRING, key_serial_t keyring,
+		    const char *type, const char *restriction);
+
+     An existing keyring can restrict linkage of additional keys by evaluating
+     the contents of the key according to a restriction scheme.
+
+     "keyring" is the key ID for an existing keyring to apply a restriction
+     to. It may be empty or may already have keys linked. Existing linked keys
+     will remain in the keyring even if the new restriction would reject them.
+
+     "type" is a registered key type.
+
+     "restriction" is a string describing how key linkage is to be restricted.
+     The format varies depending on the key type, and the string is passed to
+     the lookup_restriction() function for the requested type.  It may specify
+     a method and relevant data for the restriction such as signature
+     verification or constraints on key payload. If the requested key type is
+     later unregistered, no keys may be added to the keyring after the key type
+     is removed.
+
+     To apply a keyring restriction the process must have Set Attribute
+     permission and the keyring must not be previously restricted.
+
+     One application of restricted keyrings is to verify X.509 certificate
+     chains or individual certificate signatures using the asymmetric key type.
+     See Documentation/crypto/asymmetric-keys.rst for specific restrictions
+     applicable to the asymmetric key type.
+
+
+  *  Query an asymmetric key::
+
+	long keyctl(KEYCTL_PKEY_QUERY,
+		    key_serial_t key_id, unsigned long reserved,
+		    const char *params,
+		    struct keyctl_pkey_query *info);
+
+     Get information about an asymmetric key.  Specific algorithms and
+     encodings may be queried by using the ``params`` argument.  This is a
+     string containing a space- or tab-separated string of key-value pairs.
+     Currently supported keys include ``enc`` and ``hash``.  The information
+     is returned in the keyctl_pkey_query struct::
+
+	__u32	supported_ops;
+	__u32	key_size;
+	__u16	max_data_size;
+	__u16	max_sig_size;
+	__u16	max_enc_size;
+	__u16	max_dec_size;
+	__u32	__spare[10];
+
+     ``supported_ops`` contains a bit mask of flags indicating which ops are
+     supported.  This is constructed from a bitwise-OR of::
+
+	KEYCTL_SUPPORTS_{ENCRYPT,DECRYPT,SIGN,VERIFY}
+
+     ``key_size`` indicated the size of the key in bits.
+
+     ``max_*_size`` indicate the maximum sizes in bytes of a blob of data to be
+     signed, a signature blob, a blob to be encrypted and a blob to be
+     decrypted.
+
+     ``__spare[]`` must be set to 0.  This is intended for future use to hand
+     over one or more passphrases needed unlock a key.
+
+     If successful, 0 is returned.  If the key is not an asymmetric key,
+     EOPNOTSUPP is returned.
+
+
+  *  Encrypt, decrypt, sign or verify a blob using an asymmetric key::
+
+	long keyctl(KEYCTL_PKEY_ENCRYPT,
+		    const struct keyctl_pkey_params *params,
+		    const char *info,
+		    const void *in,
+		    void *out);
+
+	long keyctl(KEYCTL_PKEY_DECRYPT,
+		    const struct keyctl_pkey_params *params,
+		    const char *info,
+		    const void *in,
+		    void *out);
+
+	long keyctl(KEYCTL_PKEY_SIGN,
+		    const struct keyctl_pkey_params *params,
+		    const char *info,
+		    const void *in,
+		    void *out);
+
+	long keyctl(KEYCTL_PKEY_VERIFY,
+		    const struct keyctl_pkey_params *params,
+		    const char *info,
+		    const void *in,
+		    const void *in2);
+
+     Use an asymmetric key to perform a public-key cryptographic operation a
+     blob of data.  For encryption and verification, the asymmetric key may
+     only need the public parts to be available, but for decryption and signing
+     the private parts are required also.
+
+     The parameter block pointed to by params contains a number of integer
+     values::
+
+	__s32		key_id;
+	__u32		in_len;
+	__u32		out_len;
+	__u32		in2_len;
+
+     ``key_id`` is the ID of the asymmetric key to be used.  ``in_len`` and
+     ``in2_len`` indicate the amount of data in the in and in2 buffers and
+     ``out_len`` indicates the size of the out buffer as appropriate for the
+     above operations.
+
+     For a given operation, the in and out buffers are used as follows::
+
+	Operation ID		in,in_len	out,out_len	in2,in2_len
+	=======================	===============	===============	===============
+	KEYCTL_PKEY_ENCRYPT	Raw data	Encrypted data	-
+	KEYCTL_PKEY_DECRYPT	Encrypted data	Raw data	-
+	KEYCTL_PKEY_SIGN	Raw data	Signature	-
+	KEYCTL_PKEY_VERIFY	Raw data	-		Signature
+
+     ``info`` is a string of key=value pairs that supply supplementary
+     information.  These include:
+
+	``enc=<encoding>`` The encoding of the encrypted/signature blob.  This
+			can be "pkcs1" for RSASSA-PKCS1-v1.5 or
+			RSAES-PKCS1-v1.5; "pss" for "RSASSA-PSS"; "oaep" for
+			"RSAES-OAEP".  If omitted or is "raw", the raw output
+			of the encryption function is specified.
+
+	``hash=<algo>``	If the data buffer contains the output of a hash
+			function and the encoding includes some indication of
+			which hash function was used, the hash function can be
+			specified with this, eg. "hash=sha256".
+
+     The ``__spare[]`` space in the parameter block must be set to 0.  This is
+     intended, amongst other things, to allow the passing of passphrases
+     required to unlock a key.
+
+     If successful, encrypt, decrypt and sign all return the amount of data
+     written into the output buffer.  Verification returns 0 on success.
+
+
+  *  Watch a key or keyring for changes::
+
+	long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd,
+		    const struct watch_notification_filter *filter);
+
+     This will set or remove a watch for changes on the specified key or
+     keyring.
+
+     "key" is the ID of the key to be watched.
+
+     "queue_fd" is a file descriptor referring to an open pipe which
+     manages the buffer into which notifications will be delivered.
+
+     "filter" is either NULL to remove a watch or a filter specification to
+     indicate what events are required from the key.
+
+     See Documentation/core-api/watch_queue.rst for more information.
+
+     Note that only one watch may be emplaced for any particular { key,
+     queue_fd } combination.
+
+     Notification records look like::
+
+	struct key_notification {
+		struct watch_notification watch;
+		__u32	key_id;
+		__u32	aux;
+	};
+
+     In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be
+     one of::
+
+	NOTIFY_KEY_INSTANTIATED
+	NOTIFY_KEY_UPDATED
+	NOTIFY_KEY_LINKED
+	NOTIFY_KEY_UNLINKED
+	NOTIFY_KEY_CLEARED
+	NOTIFY_KEY_REVOKED
+	NOTIFY_KEY_INVALIDATED
+	NOTIFY_KEY_SETATTR
+
+     Where these indicate a key being instantiated/rejected, updated, a link
+     being made in a keyring, a link being removed from a keyring, a keyring
+     being cleared, a key being revoked, a key being invalidated or a key
+     having one of its attributes changed (user, group, perm, timeout,
+     restriction).
+
+     If a watched key is deleted, a basic watch_notification will be issued
+     with "type" set to WATCH_TYPE_META and "subtype" set to
+     watch_meta_removal_notification.  The watchpoint ID will be set in the
+     "info" field.
+
+     This needs to be configured by enabling:
+
+	"Provide key/keyring change notifications" (KEY_NOTIFICATIONS)
+
+
+Kernel Services
+===============
+
+The kernel services for key management are fairly simple to deal with. They can
+be broken down into two areas: keys and key types.
+
+Dealing with keys is fairly straightforward. Firstly, the kernel service
+registers its type, then it searches for a key of that type. It should retain
+the key as long as it has need of it, and then it should release it. For a
+filesystem or device file, a search would probably be performed during the open
+call, and the key released upon close. How to deal with conflicting keys due to
+two different users opening the same file is left to the filesystem author to
+solve.
+
+To access the key manager, the following header must be #included::
+
+	<linux/key.h>
+
+Specific key types should have a header file under include/keys/ that should be
+used to access that type.  For keys of type "user", for example, that would be::
+
+	<keys/user-type.h>
+
+Note that there are two different types of pointers to keys that may be
+encountered:
+
+  *  struct key *
+
+     This simply points to the key structure itself. Key structures will be at
+     least four-byte aligned.
+
+  *  key_ref_t
+
+     This is equivalent to a ``struct key *``, but the least significant bit is set
+     if the caller "possesses" the key. By "possession" it is meant that the
+     calling processes has a searchable link to the key from one of its
+     keyrings. There are three functions for dealing with these::
+
+	key_ref_t make_key_ref(const struct key *key, bool possession);
+
+	struct key *key_ref_to_ptr(const key_ref_t key_ref);
+
+	bool is_key_possessed(const key_ref_t key_ref);
+
+     The first function constructs a key reference from a key pointer and
+     possession information (which must be true or false).
+
+     The second function retrieves the key pointer from a reference and the
+     third retrieves the possession flag.
+
+When accessing a key's payload contents, certain precautions must be taken to
+prevent access vs modification races. See the section "Notes on accessing
+payload contents" for more information.
+
+ *  To search for a key, call::
+
+	struct key *request_key(const struct key_type *type,
+				const char *description,
+				const char *callout_info);
+
+    This is used to request a key or keyring with a description that matches
+    the description specified according to the key type's match_preparse()
+    method. This permits approximate matching to occur. If callout_string is
+    not NULL, then /sbin/request-key will be invoked in an attempt to obtain
+    the key from userspace. In that case, callout_string will be passed as an
+    argument to the program.
+
+    Should the function fail error ENOKEY, EKEYEXPIRED or EKEYREVOKED will be
+    returned.
+
+    If successful, the key will have been attached to the default keyring for
+    implicitly obtained request-key keys, as set by KEYCTL_SET_REQKEY_KEYRING.
+
+    See also Documentation/security/keys/request-key.rst.
+
+
+ *  To search for a key in a specific domain, call::
+
+	struct key *request_key_tag(const struct key_type *type,
+				    const char *description,
+				    struct key_tag *domain_tag,
+				    const char *callout_info);
+
+    This is identical to request_key(), except that a domain tag may be
+    specifies that causes search algorithm to only match keys matching that
+    tag.  The domain_tag may be NULL, specifying a global domain that is
+    separate from any nominated domain.
+
+
+ *  To search for a key, passing auxiliary data to the upcaller, call::
+
+	struct key *request_key_with_auxdata(const struct key_type *type,
+					     const char *description,
+					     struct key_tag *domain_tag,
+					     const void *callout_info,
+					     size_t callout_len,
+					     void *aux);
+
+    This is identical to request_key_tag(), except that the auxiliary data is
+    passed to the key_type->request_key() op if it exists, and the
+    callout_info is a blob of length callout_len, if given (the length may be
+    0).
+
+
+ *  To search for a key under RCU conditions, call::
+
+	struct key *request_key_rcu(const struct key_type *type,
+				    const char *description,
+				    struct key_tag *domain_tag);
+
+    which is similar to request_key_tag() except that it does not check for
+    keys that are under construction and it will not call out to userspace to
+    construct a key if it can't find a match.
+
+
+ *  When it is no longer required, the key should be released using::
+
+	void key_put(struct key *key);
+
+    Or::
+
+	void key_ref_put(key_ref_t key_ref);
+
+    These can be called from interrupt context. If CONFIG_KEYS is not set then
+    the argument will not be parsed.
+
+
+ *  Extra references can be made to a key by calling one of the following
+    functions::
+
+	struct key *__key_get(struct key *key);
+	struct key *key_get(struct key *key);
+
+    Keys so references will need to be disposed of by calling key_put() when
+    they've been finished with.  The key pointer passed in will be returned.
+
+    In the case of key_get(), if the pointer is NULL or CONFIG_KEYS is not set
+    then the key will not be dereferenced and no increment will take place.
+
+
+ *  A key's serial number can be obtained by calling::
+
+	key_serial_t key_serial(struct key *key);
+
+    If key is NULL or if CONFIG_KEYS is not set then 0 will be returned (in the
+    latter case without parsing the argument).
+
+
+ *  If a keyring was found in the search, this can be further searched by::
+
+	key_ref_t keyring_search(key_ref_t keyring_ref,
+				 const struct key_type *type,
+				 const char *description,
+				 bool recurse)
+
+    This searches the specified keyring only (recurse == false) or keyring tree
+    (recurse == true) specified for a matching key. Error ENOKEY is returned
+    upon failure (use IS_ERR/PTR_ERR to determine). If successful, the returned
+    key will need to be released.
+
+    The possession attribute from the keyring reference is used to control
+    access through the permissions mask and is propagated to the returned key
+    reference pointer if successful.
+
+
+ *  A keyring can be created by::
+
+	struct key *keyring_alloc(const char *description, uid_t uid, gid_t gid,
+				  const struct cred *cred,
+				  key_perm_t perm,
+				  struct key_restriction *restrict_link,
+				  unsigned long flags,
+				  struct key *dest);
+
+    This creates a keyring with the given attributes and returns it.  If dest
+    is not NULL, the new keyring will be linked into the keyring to which it
+    points.  No permission checks are made upon the destination keyring.
+
+    Error EDQUOT can be returned if the keyring would overload the quota (pass
+    KEY_ALLOC_NOT_IN_QUOTA in flags if the keyring shouldn't be accounted
+    towards the user's quota).  Error ENOMEM can also be returned.
+
+    If restrict_link is not NULL, it should point to a structure that contains
+    the function that will be called each time an attempt is made to link a
+    key into the new keyring.  The structure may also contain a key pointer
+    and an associated key type.  The function is called to check whether a key
+    may be added into the keyring or not.  The key type is used by the garbage
+    collector to clean up function or data pointers in this structure if the
+    given key type is unregistered.  Callers of key_create_or_update() within
+    the kernel can pass KEY_ALLOC_BYPASS_RESTRICTION to suppress the check.
+    An example of using this is to manage rings of cryptographic keys that are
+    set up when the kernel boots where userspace is also permitted to add keys
+    - provided they can be verified by a key the kernel already has.
+
+    When called, the restriction function will be passed the keyring being
+    added to, the key type, the payload of the key being added, and data to be
+    used in the restriction check.  Note that when a new key is being created,
+    this is called between payload preparsing and actual key creation.  The
+    function should return 0 to allow the link or an error to reject it.
+
+    A convenience function, restrict_link_reject, exists to always return
+    -EPERM to in this case.
+
+
+ *  To check the validity of a key, this function can be called::
+
+	int validate_key(struct key *key);
+
+    This checks that the key in question hasn't expired or and hasn't been
+    revoked. Should the key be invalid, error EKEYEXPIRED or EKEYREVOKED will
+    be returned. If the key is NULL or if CONFIG_KEYS is not set then 0 will be
+    returned (in the latter case without parsing the argument).
+
+
+ *  To register a key type, the following function should be called::
+
+	int register_key_type(struct key_type *type);
+
+    This will return error EEXIST if a type of the same name is already
+    present.
+
+
+ *  To unregister a key type, call::
+
+	void unregister_key_type(struct key_type *type);
+
+
+Under some circumstances, it may be desirable to deal with a bundle of keys.
+The facility provides access to the keyring type for managing such a bundle::
+
+	struct key_type key_type_keyring;
+
+This can be used with a function such as request_key() to find a specific
+keyring in a process's keyrings.  A keyring thus found can then be searched
+with keyring_search().  Note that it is not possible to use request_key() to
+search a specific keyring, so using keyrings in this way is of limited utility.
+
+
+Notes On Accessing Payload Contents
+===================================
+
+The simplest payload is just data stored in key->payload directly.  In this
+case, there's no need to indulge in RCU or locking when accessing the payload.
+
+More complex payload contents must be allocated and pointers to them set in the
+key->payload.data[] array.  One of the following ways must be selected to
+access the data:
+
+  1) Unmodifiable key type.
+
+     If the key type does not have a modify method, then the key's payload can
+     be accessed without any form of locking, provided that it's known to be
+     instantiated (uninstantiated keys cannot be "found").
+
+  2) The key's semaphore.
+
+     The semaphore could be used to govern access to the payload and to control
+     the payload pointer. It must be write-locked for modifications and would
+     have to be read-locked for general access. The disadvantage of doing this
+     is that the accessor may be required to sleep.
+
+  3) RCU.
+
+     RCU must be used when the semaphore isn't already held; if the semaphore
+     is held then the contents can't change under you unexpectedly as the
+     semaphore must still be used to serialise modifications to the key. The
+     key management code takes care of this for the key type.
+
+     However, this means using::
+
+	rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock()
+
+     to read the pointer, and::
+
+	rcu_dereference() ... rcu_assign_pointer() ... call_rcu()
+
+     to set the pointer and dispose of the old contents after a grace period.
+     Note that only the key type should ever modify a key's payload.
+
+     Furthermore, an RCU controlled payload must hold a struct rcu_head for the
+     use of call_rcu() and, if the payload is of variable size, the length of
+     the payload. key->datalen cannot be relied upon to be consistent with the
+     payload just dereferenced if the key's semaphore is not held.
+
+     Note that key->payload.data[0] has a shadow that is marked for __rcu
+     usage.  This is called key->payload.rcu_data0.  The following accessors
+     wrap the RCU calls to this element:
+
+     a) Set or change the first payload pointer::
+
+		rcu_assign_keypointer(struct key *key, void *data);
+
+     b) Read the first payload pointer with the key semaphore held::
+
+		[const] void *dereference_key_locked([const] struct key *key);
+
+	 Note that the return value will inherit its constness from the key
+	 parameter.  Static analysis will give an error if it things the lock
+	 isn't held.
+
+     c) Read the first payload pointer with the RCU read lock held::
+
+		const void *dereference_key_rcu(const struct key *key);
+
+
+Defining a Key Type
+===================
+
+A kernel service may want to define its own key type. For instance, an AFS
+filesystem might want to define a Kerberos 5 ticket key type. To do this, it
+author fills in a key_type struct and registers it with the system.
+
+Source files that implement key types should include the following header file::
+
+	<linux/key-type.h>
+
+The structure has a number of fields, some of which are mandatory:
+
+  *  ``const char *name``
+
+     The name of the key type. This is used to translate a key type name
+     supplied by userspace into a pointer to the structure.
+
+
+  *  ``size_t def_datalen``
+
+     This is optional - it supplies the default payload data length as
+     contributed to the quota. If the key type's payload is always or almost
+     always the same size, then this is a more efficient way to do things.
+
+     The data length (and quota) on a particular key can always be changed
+     during instantiation or update by calling::
+
+	int key_payload_reserve(struct key *key, size_t datalen);
+
+     With the revised data length. Error EDQUOT will be returned if this is not
+     viable.
+
+
+  *  ``int (*vet_description)(const char *description);``
+
+     This optional method is called to vet a key description.  If the key type
+     doesn't approve of the key description, it may return an error, otherwise
+     it should return 0.
+
+
+  *  ``int (*preparse)(struct key_preparsed_payload *prep);``
+
+     This optional method permits the key type to attempt to parse payload
+     before a key is created (add key) or the key semaphore is taken (update or
+     instantiate key).  The structure pointed to by prep looks like::
+
+	struct key_preparsed_payload {
+		char		*description;
+		union key_payload payload;
+		const void	*data;
+		size_t		datalen;
+		size_t		quotalen;
+		time_t		expiry;
+	};
+
+     Before calling the method, the caller will fill in data and datalen with
+     the payload blob parameters; quotalen will be filled in with the default
+     quota size from the key type; expiry will be set to TIME_T_MAX and the
+     rest will be cleared.
+
+     If a description can be proposed from the payload contents, that should be
+     attached as a string to the description field.  This will be used for the
+     key description if the caller of add_key() passes NULL or "".
+
+     The method can attach anything it likes to payload.  This is merely passed
+     along to the instantiate() or update() operations.  If set, the expiry
+     time will be applied to the key if it is instantiated from this data.
+
+     The method should return 0 if successful or a negative error code
+     otherwise.
+
+
+  *  ``void (*free_preparse)(struct key_preparsed_payload *prep);``
+
+     This method is only required if the preparse() method is provided,
+     otherwise it is unused.  It cleans up anything attached to the description
+     and payload fields of the key_preparsed_payload struct as filled in by the
+     preparse() method.  It will always be called after preparse() returns
+     successfully, even if instantiate() or update() succeed.
+
+
+  *  ``int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);``
+
+     This method is called to attach a payload to a key during construction.
+     The payload attached need not bear any relation to the data passed to this
+     function.
+
+     The prep->data and prep->datalen fields will define the original payload
+     blob.  If preparse() was supplied then other fields may be filled in also.
+
+     If the amount of data attached to the key differs from the size in
+     keytype->def_datalen, then key_payload_reserve() should be called.
+
+     This method does not have to lock the key in order to attach a payload.
+     The fact that KEY_FLAG_INSTANTIATED is not set in key->flags prevents
+     anything else from gaining access to the key.
+
+     It is safe to sleep in this method.
+
+     generic_key_instantiate() is provided to simply copy the data from
+     prep->payload.data[] to key->payload.data[], with RCU-safe assignment on
+     the first element.  It will then clear prep->payload.data[] so that the
+     free_preparse method doesn't release the data.
+
+
+  *  ``int (*update)(struct key *key, const void *data, size_t datalen);``
+
+     If this type of key can be updated, then this method should be provided.
+     It is called to update a key's payload from the blob of data provided.
+
+     The prep->data and prep->datalen fields will define the original payload
+     blob.  If preparse() was supplied then other fields may be filled in also.
+
+     key_payload_reserve() should be called if the data length might change
+     before any changes are actually made. Note that if this succeeds, the type
+     is committed to changing the key because it's already been altered, so all
+     memory allocation must be done first.
+
+     The key will have its semaphore write-locked before this method is called,
+     but this only deters other writers; any changes to the key's payload must
+     be made under RCU conditions, and call_rcu() must be used to dispose of
+     the old payload.
+
+     key_payload_reserve() should be called before the changes are made, but
+     after all allocations and other potentially failing function calls are
+     made.
+
+     It is safe to sleep in this method.
+
+
+  *  ``int (*match_preparse)(struct key_match_data *match_data);``
+
+     This method is optional.  It is called when a key search is about to be
+     performed.  It is given the following structure::
+
+	struct key_match_data {
+		bool (*cmp)(const struct key *key,
+			    const struct key_match_data *match_data);
+		const void	*raw_data;
+		void		*preparsed;
+		unsigned	lookup_type;
+	};
+
+     On entry, raw_data will be pointing to the criteria to be used in matching
+     a key by the caller and should not be modified.  ``(*cmp)()`` will be pointing
+     to the default matcher function (which does an exact description match
+     against raw_data) and lookup_type will be set to indicate a direct lookup.
+
+     The following lookup_type values are available:
+
+       *  KEYRING_SEARCH_LOOKUP_DIRECT - A direct lookup hashes the type and
+      	  description to narrow down the search to a small number of keys.
+
+       *  KEYRING_SEARCH_LOOKUP_ITERATE - An iterative lookup walks all the
+      	  keys in the keyring until one is matched.  This must be used for any
+      	  search that's not doing a simple direct match on the key description.
+
+     The method may set cmp to point to a function of its choice that does some
+     other form of match, may set lookup_type to KEYRING_SEARCH_LOOKUP_ITERATE
+     and may attach something to the preparsed pointer for use by ``(*cmp)()``.
+     ``(*cmp)()`` should return true if a key matches and false otherwise.
+
+     If preparsed is set, it may be necessary to use the match_free() method to
+     clean it up.
+
+     The method should return 0 if successful or a negative error code
+     otherwise.
+
+     It is permitted to sleep in this method, but ``(*cmp)()`` may not sleep as
+     locks will be held over it.
+
+     If match_preparse() is not provided, keys of this type will be matched
+     exactly by their description.
+
+
+  *  ``void (*match_free)(struct key_match_data *match_data);``
+
+     This method is optional.  If given, it called to clean up
+     match_data->preparsed after a successful call to match_preparse().
+
+
+  *  ``void (*revoke)(struct key *key);``
+
+     This method is optional.  It is called to discard part of the payload
+     data upon a key being revoked.  The caller will have the key semaphore
+     write-locked.
+
+     It is safe to sleep in this method, though care should be taken to avoid
+     a deadlock against the key semaphore.
+
+
+  *  ``void (*destroy)(struct key *key);``
+
+     This method is optional. It is called to discard the payload data on a key
+     when it is being destroyed.
+
+     This method does not need to lock the key to access the payload; it can
+     consider the key as being inaccessible at this time. Note that the key's
+     type may have been changed before this function is called.
+
+     It is not safe to sleep in this method; the caller may hold spinlocks.
+
+
+  *  ``void (*describe)(const struct key *key, struct seq_file *p);``
+
+     This method is optional. It is called during /proc/keys reading to
+     summarise a key's description and payload in text form.
+
+     This method will be called with the RCU read lock held. rcu_dereference()
+     should be used to read the payload pointer if the payload is to be
+     accessed. key->datalen cannot be trusted to stay consistent with the
+     contents of the payload.
+
+     The description will not change, though the key's state may.
+
+     It is not safe to sleep in this method; the RCU read lock is held by the
+     caller.
+
+
+  *  ``long (*read)(const struct key *key, char __user *buffer, size_t buflen);``
+
+     This method is optional. It is called by KEYCTL_READ to translate the
+     key's payload into something a blob of data for userspace to deal with.
+     Ideally, the blob should be in the same format as that passed in to the
+     instantiate and update methods.
+
+     If successful, the blob size that could be produced should be returned
+     rather than the size copied.
+
+     This method will be called with the key's semaphore read-locked. This will
+     prevent the key's payload changing. It is not necessary to use RCU locking
+     when accessing the key's payload. It is safe to sleep in this method, such
+     as might happen when the userspace buffer is accessed.
+
+
+  *  ``int (*request_key)(struct key_construction *cons, const char *op, void *aux);``
+
+     This method is optional.  If provided, request_key() and friends will
+     invoke this function rather than upcalling to /sbin/request-key to operate
+     upon a key of this type.
+
+     The aux parameter is as passed to request_key_async_with_auxdata() and
+     similar or is NULL otherwise.  Also passed are the construction record for
+     the key to be operated upon and the operation type (currently only
+     "create").
+
+     This method is permitted to return before the upcall is complete, but the
+     following function must be called under all circumstances to complete the
+     instantiation process, whether or not it succeeds, whether or not there's
+     an error::
+
+	void complete_request_key(struct key_construction *cons, int error);
+
+     The error parameter should be 0 on success, -ve on error.  The
+     construction record is destroyed by this action and the authorisation key
+     will be revoked.  If an error is indicated, the key under construction
+     will be negatively instantiated if it wasn't already instantiated.
+
+     If this method returns an error, that error will be returned to the
+     caller of request_key*().  complete_request_key() must be called prior to
+     returning.
+
+     The key under construction and the authorisation key can be found in the
+     key_construction struct pointed to by cons:
+
+      *  ``struct key *key;``
+
+     	 The key under construction.
+
+      *  ``struct key *authkey;``
+
+     	 The authorisation key.
+
+
+  *  ``struct key_restriction *(*lookup_restriction)(const char *params);``
+
+     This optional method is used to enable userspace configuration of keyring
+     restrictions. The restriction parameter string (not including the key type
+     name) is passed in, and this method returns a pointer to a key_restriction
+     structure containing the relevant functions and data to evaluate each
+     attempted key link operation. If there is no match, -EINVAL is returned.
+
+
+  *  ``asym_eds_op`` and ``asym_verify_signature``::
+
+       int (*asym_eds_op)(struct kernel_pkey_params *params,
+			  const void *in, void *out);
+       int (*asym_verify_signature)(struct kernel_pkey_params *params,
+				    const void *in, const void *in2);
+
+     These methods are optional.  If provided the first allows a key to be
+     used to encrypt, decrypt or sign a blob of data, and the second allows a
+     key to verify a signature.
+
+     In all cases, the following information is provided in the params block::
+
+	struct kernel_pkey_params {
+		struct key	*key;
+		const char	*encoding;
+		const char	*hash_algo;
+		char		*info;
+		__u32		in_len;
+		union {
+			__u32	out_len;
+			__u32	in2_len;
+		};
+		enum kernel_pkey_operation op : 8;
+	};
+
+     This includes the key to be used; a string indicating the encoding to use
+     (for instance, "pkcs1" may be used with an RSA key to indicate
+     RSASSA-PKCS1-v1.5 or RSAES-PKCS1-v1.5 encoding or "raw" if no encoding);
+     the name of the hash algorithm used to generate the data for a signature
+     (if appropriate); the sizes of the input and output (or second input)
+     buffers; and the ID of the operation to be performed.
+
+     For a given operation ID, the input and output buffers are used as
+     follows::
+
+	Operation ID		in,in_len	out,out_len	in2,in2_len
+	=======================	===============	===============	===============
+	kernel_pkey_encrypt	Raw data	Encrypted data	-
+	kernel_pkey_decrypt	Encrypted data	Raw data	-
+	kernel_pkey_sign	Raw data	Signature	-
+	kernel_pkey_verify	Raw data	-		Signature
+
+     asym_eds_op() deals with encryption, decryption and signature creation as
+     specified by params->op.  Note that params->op is also set for
+     asym_verify_signature().
+
+     Encrypting and signature creation both take raw data in the input buffer
+     and return the encrypted result in the output buffer.  Padding may have
+     been added if an encoding was set.  In the case of signature creation,
+     depending on the encoding, the padding created may need to indicate the
+     digest algorithm - the name of which should be supplied in hash_algo.
+
+     Decryption takes encrypted data in the input buffer and returns the raw
+     data in the output buffer.  Padding will get checked and stripped off if
+     an encoding was set.
+
+     Verification takes raw data in the input buffer and the signature in the
+     second input buffer and checks that the one matches the other.  Padding
+     will be validated.  Depending on the encoding, the digest algorithm used
+     to generate the raw data may need to be indicated in hash_algo.
+
+     If successful, asym_eds_op() should return the number of bytes written
+     into the output buffer.  asym_verify_signature() should return 0.
+
+     A variety of errors may be returned, including EOPNOTSUPP if the operation
+     is not supported; EKEYREJECTED if verification fails; ENOPKG if the
+     required crypto isn't available.
+
+
+  *  ``asym_query``::
+
+       int (*asym_query)(const struct kernel_pkey_params *params,
+			 struct kernel_pkey_query *info);
+
+     This method is optional.  If provided it allows information about the
+     public or asymmetric key held in the key to be determined.
+
+     The parameter block is as for asym_eds_op() and co. but in_len and out_len
+     are unused.  The encoding and hash_algo fields should be used to reduce
+     the returned buffer/data sizes as appropriate.
+
+     If successful, the following information is filled in::
+
+	struct kernel_pkey_query {
+		__u32		supported_ops;
+		__u32		key_size;
+		__u16		max_data_size;
+		__u16		max_sig_size;
+		__u16		max_enc_size;
+		__u16		max_dec_size;
+	};
+
+     The supported_ops field will contain a bitmask indicating what operations
+     are supported by the key, including encryption of a blob, decryption of a
+     blob, signing a blob and verifying the signature on a blob.  The following
+     constants are defined for this::
+
+	KEYCTL_SUPPORTS_{ENCRYPT,DECRYPT,SIGN,VERIFY}
+
+     The key_size field is the size of the key in bits.  max_data_size and
+     max_sig_size are the maximum raw data and signature sizes for creation and
+     verification of a signature; max_enc_size and max_dec_size are the maximum
+     raw data and signature sizes for encryption and decryption.  The
+     max_*_size fields are measured in bytes.
+
+     If successful, 0 will be returned.  If the key doesn't support this,
+     EOPNOTSUPP will be returned.
+
+
+Request-Key Callback Service
+============================
+
+To create a new key, the kernel will attempt to execute the following command
+line::
+
+	/sbin/request-key create <key> <uid> <gid> \
+		<threadring> <processring> <sessionring> <callout_info>
+
+<key> is the key being constructed, and the three keyrings are the process
+keyrings from the process that caused the search to be issued. These are
+included for two reasons:
+
+   1  There may be an authentication token in one of the keyrings that is
+      required to obtain the key, eg: a Kerberos Ticket-Granting Ticket.
+
+   2  The new key should probably be cached in one of these rings.
+
+This program should set it UID and GID to those specified before attempting to
+access any more keys. It may then look around for a user specific process to
+hand the request off to (perhaps a path held in placed in another key by, for
+example, the KDE desktop manager).
+
+The program (or whatever it calls) should finish construction of the key by
+calling KEYCTL_INSTANTIATE or KEYCTL_INSTANTIATE_IOV, which also permits it to
+cache the key in one of the keyrings (probably the session ring) before
+returning.  Alternatively, the key can be marked as negative with KEYCTL_NEGATE
+or KEYCTL_REJECT; this also permits the key to be cached in one of the
+keyrings.
+
+If it returns with the key remaining in the unconstructed state, the key will
+be marked as being negative, it will be added to the session keyring, and an
+error will be returned to the key requestor.
+
+Supplementary information may be provided from whoever or whatever invoked this
+service. This will be passed as the <callout_info> parameter. If no such
+information was made available, then "-" will be passed as this parameter
+instead.
+
+
+Similarly, the kernel may attempt to update an expired or a soon to expire key
+by executing::
+
+	/sbin/request-key update <key> <uid> <gid> \
+		<threadring> <processring> <sessionring>
+
+In this case, the program isn't required to actually attach the key to a ring;
+the rings are provided for reference.
+
+
+Garbage Collection
+==================
+
+Dead keys (for which the type has been removed) will be automatically unlinked
+from those keyrings that point to them and deleted as soon as possible by a
+background garbage collector.
+
+Similarly, revoked and expired keys will be garbage collected, but only after a
+certain amount of time has passed.  This time is set as a number of seconds in::
+
+	/proc/sys/kernel/keys/gc_delay
diff --git a/Documentation/security/keys/ecryptfs.rst b/Documentation/security/keys/ecryptfs.rst
new file mode 100644
index 0000000000..0e2be0a6bb
--- /dev/null
+++ b/Documentation/security/keys/ecryptfs.rst
@@ -0,0 +1,73 @@
+==========================================
+Encrypted keys for the eCryptfs filesystem
+==========================================
+
+ECryptfs is a stacked filesystem which transparently encrypts and decrypts each
+file using a randomly generated File Encryption Key (FEK).
+
+Each FEK is in turn encrypted with a File Encryption Key Encryption Key (FEKEK)
+either in kernel space or in user space with a daemon called 'ecryptfsd'.  In
+the former case the operation is performed directly by the kernel CryptoAPI
+using a key, the FEKEK, derived from a user prompted passphrase;  in the latter
+the FEK is encrypted by 'ecryptfsd' with the help of external libraries in order
+to support other mechanisms like public key cryptography, PKCS#11 and TPM based
+operations.
+
+The data structure defined by eCryptfs to contain information required for the
+FEK decryption is called authentication token and, currently, can be stored in a
+kernel key of the 'user' type, inserted in the user's session specific keyring
+by the userspace utility 'mount.ecryptfs' shipped with the package
+'ecryptfs-utils'.
+
+The 'encrypted' key type has been extended with the introduction of the new
+format 'ecryptfs' in order to be used in conjunction with the eCryptfs
+filesystem.  Encrypted keys of the newly introduced format store an
+authentication token in its payload with a FEKEK randomly generated by the
+kernel and protected by the parent master key.
+
+In order to avoid known-plaintext attacks, the datablob obtained through
+commands 'keyctl print' or 'keyctl pipe' does not contain the overall
+authentication token, which content is well known, but only the FEKEK in
+encrypted form.
+
+The eCryptfs filesystem may really benefit from using encrypted keys in that the
+required key can be securely generated by an Administrator and provided at boot
+time after the unsealing of a 'trusted' key in order to perform the mount in a
+controlled environment.  Another advantage is that the key is not exposed to
+threats of malicious software, because it is available in clear form only at
+kernel level.
+
+Usage::
+
+   keyctl add encrypted name "new ecryptfs key-type:master-key-name keylen" ring
+   keyctl add encrypted name "load hex_blob" ring
+   keyctl update keyid "update key-type:master-key-name"
+
+Where::
+
+	name:= '<16 hexadecimal characters>'
+	key-type:= 'trusted' | 'user'
+	keylen:= 64
+
+
+Example of encrypted key usage with the eCryptfs filesystem:
+
+Create an encrypted key "1000100010001000" of length 64 bytes with format
+'ecryptfs' and save it using a previously loaded user key "test"::
+
+    $ keyctl add encrypted 1000100010001000 "new ecryptfs user:test 64" @u
+    19184530
+
+    $ keyctl print 19184530
+    ecryptfs user:test 64 490045d4bfe48c99f0d465fbbbb79e7500da954178e2de0697
+    dd85091f5450a0511219e9f7cd70dcd498038181466f78ac8d4c19504fcc72402bfc41c2
+    f253a41b7507ccaa4b2b03fff19a69d1cc0b16e71746473f023a95488b6edfd86f7fdd40
+    9d292e4bacded1258880122dd553a661
+
+    $ keyctl pipe 19184530 > ecryptfs.blob
+
+Mount an eCryptfs filesystem using the created encrypted key "1000100010001000"
+into the '/secret' directory::
+
+    $ mount -i -t ecryptfs -oecryptfs_sig=1000100010001000,\
+      ecryptfs_cipher=aes,ecryptfs_key_bytes=32 /secret /secret
diff --git a/Documentation/security/keys/index.rst b/Documentation/security/keys/index.rst
new file mode 100644
index 0000000000..647d58f258
--- /dev/null
+++ b/Documentation/security/keys/index.rst
@@ -0,0 +1,11 @@
+===========
+Kernel Keys
+===========
+
+.. toctree::
+   :maxdepth: 1
+
+   core
+   ecryptfs
+   request-key
+   trusted-encrypted
diff --git a/Documentation/security/keys/request-key.rst b/Documentation/security/keys/request-key.rst
new file mode 100644
index 0000000000..35f2296b70
--- /dev/null
+++ b/Documentation/security/keys/request-key.rst
@@ -0,0 +1,207 @@
+===================
+Key Request Service
+===================
+
+The key request service is part of the key retention service (refer to
+Documentation/security/keys/core.rst).  This document explains more fully how
+the requesting algorithm works.
+
+The process starts by either the kernel requesting a service by calling
+``request_key*()``::
+
+	struct key *request_key(const struct key_type *type,
+				const char *description,
+				const char *callout_info);
+
+or::
+
+	struct key *request_key_tag(const struct key_type *type,
+				    const char *description,
+				    const struct key_tag *domain_tag,
+				    const char *callout_info);
+
+or::
+
+	struct key *request_key_with_auxdata(const struct key_type *type,
+					     const char *description,
+					     const struct key_tag *domain_tag,
+					     const char *callout_info,
+					     size_t callout_len,
+					     void *aux);
+
+or::
+
+	struct key *request_key_rcu(const struct key_type *type,
+				    const char *description,
+				    const struct key_tag *domain_tag);
+
+Or by userspace invoking the request_key system call::
+
+	key_serial_t request_key(const char *type,
+				 const char *description,
+				 const char *callout_info,
+				 key_serial_t dest_keyring);
+
+The main difference between the access points is that the in-kernel interface
+does not need to link the key to a keyring to prevent it from being immediately
+destroyed.  The kernel interface returns a pointer directly to the key, and
+it's up to the caller to destroy the key.
+
+The request_key_tag() call is like the in-kernel request_key(), except that it
+also takes a domain tag that allows keys to be separated by namespace and
+killed off as a group.
+
+The request_key_with_auxdata() calls is like the request_key_tag() call, except
+that they permit auxiliary data to be passed to the upcaller (the default is
+NULL).  This is only useful for those key types that define their own upcall
+mechanism rather than using /sbin/request-key.
+
+The request_key_rcu() call is like the request_key_tag() call, except that it
+doesn't check for keys that are under construction and doesn't attempt to
+construct missing keys.
+
+The userspace interface links the key to a keyring associated with the process
+to prevent the key from going away, and returns the serial number of the key to
+the caller.
+
+
+The following example assumes that the key types involved don't define their
+own upcall mechanisms.  If they do, then those should be substituted for the
+forking and execution of /sbin/request-key.
+
+
+The Process
+===========
+
+A request proceeds in the following manner:
+
+  1) Process A calls request_key() [the userspace syscall calls the kernel
+     interface].
+
+  2) request_key() searches the process's subscribed keyrings to see if there's
+     a suitable key there.  If there is, it returns the key.  If there isn't,
+     and callout_info is not set, an error is returned.  Otherwise the process
+     proceeds to the next step.
+
+  3) request_key() sees that A doesn't have the desired key yet, so it creates
+     two things:
+
+      a) An uninstantiated key U of requested type and description.
+
+      b) An authorisation key V that refers to key U and notes that process A
+     	 is the context in which key U should be instantiated and secured, and
+     	 from which associated key requests may be satisfied.
+
+  4) request_key() then forks and executes /sbin/request-key with a new session
+     keyring that contains a link to auth key V.
+
+  5) /sbin/request-key assumes the authority associated with key U.
+
+  6) /sbin/request-key execs an appropriate program to perform the actual
+     instantiation.
+
+  7) The program may want to access another key from A's context (say a
+     Kerberos TGT key).  It just requests the appropriate key, and the keyring
+     search notes that the session keyring has auth key V in its bottom level.
+
+     This will permit it to then search the keyrings of process A with the
+     UID, GID, groups and security info of process A as if it was process A,
+     and come up with key W.
+
+  8) The program then does what it must to get the data with which to
+     instantiate key U, using key W as a reference (perhaps it contacts a
+     Kerberos server using the TGT) and then instantiates key U.
+
+  9) Upon instantiating key U, auth key V is automatically revoked so that it
+     may not be used again.
+
+  10) The program then exits 0 and request_key() deletes key V and returns key
+      U to the caller.
+
+This also extends further.  If key W (step 7 above) didn't exist, key W would
+be created uninstantiated, another auth key (X) would be created (as per step
+3) and another copy of /sbin/request-key spawned (as per step 4); but the
+context specified by auth key X will still be process A, as it was in auth key
+V.
+
+This is because process A's keyrings can't simply be attached to
+/sbin/request-key at the appropriate places because (a) execve will discard two
+of them, and (b) it requires the same UID/GID/Groups all the way through.
+
+
+Negative Instantiation And Rejection
+====================================
+
+Rather than instantiating a key, it is possible for the possessor of an
+authorisation key to negatively instantiate a key that's under construction.
+This is a short duration placeholder that causes any attempt at re-requesting
+the key while it exists to fail with error ENOKEY if negated or the specified
+error if rejected.
+
+This is provided to prevent excessive repeated spawning of /sbin/request-key
+processes for a key that will never be obtainable.
+
+Should the /sbin/request-key process exit anything other than 0 or die on a
+signal, the key under construction will be automatically negatively
+instantiated for a short amount of time.
+
+
+The Search Algorithm
+====================
+
+A search of any particular keyring proceeds in the following fashion:
+
+  1) When the key management code searches for a key (keyring_search_rcu) it
+     firstly calls key_permission(SEARCH) on the keyring it's starting with,
+     if this denies permission, it doesn't search further.
+
+  2) It considers all the non-keyring keys within that keyring and, if any key
+     matches the criteria specified, calls key_permission(SEARCH) on it to see
+     if the key is allowed to be found.  If it is, that key is returned; if
+     not, the search continues, and the error code is retained if of higher
+     priority than the one currently set.
+
+  3) It then considers all the keyring-type keys in the keyring it's currently
+     searching.  It calls key_permission(SEARCH) on each keyring, and if this
+     grants permission, it recurses, executing steps (2) and (3) on that
+     keyring.
+
+The process stops immediately a valid key is found with permission granted to
+use it.  Any error from a previous match attempt is discarded and the key is
+returned.
+
+When request_key() is invoked, if CONFIG_KEYS_REQUEST_CACHE=y, a per-task
+one-key cache is first checked for a match.
+
+When search_process_keyrings() is invoked, it performs the following searches
+until one succeeds:
+
+  1) If extant, the process's thread keyring is searched.
+
+  2) If extant, the process's process keyring is searched.
+
+  3) The process's session keyring is searched.
+
+  4) If the process has assumed the authority associated with a request_key()
+     authorisation key then:
+
+      a) If extant, the calling process's thread keyring is searched.
+
+      b) If extant, the calling process's process keyring is searched.
+
+      c) The calling process's session keyring is searched.
+
+The moment one succeeds, all pending errors are discarded and the found key is
+returned.  If CONFIG_KEYS_REQUEST_CACHE=y, then that key is placed in the
+per-task cache, displacing the previous key.  The cache is cleared on exit or
+just prior to resumption of userspace.
+
+Only if all these fail does the whole thing fail with the highest priority
+error.  Note that several errors may have come from LSM.
+
+The error priority is::
+
+	EKEYREVOKED > EKEYEXPIRED > ENOKEY
+
+EACCES/EPERM are only returned on a direct search of a specific keyring where
+the basal keyring does not grant Search permission.
diff --git a/Documentation/security/keys/trusted-encrypted.rst b/Documentation/security/keys/trusted-encrypted.rst
new file mode 100644
index 0000000000..9bc9db8ec6
--- /dev/null
+++ b/Documentation/security/keys/trusted-encrypted.rst
@@ -0,0 +1,428 @@
+==========================
+Trusted and Encrypted Keys
+==========================
+
+Trusted and Encrypted Keys are two new key types added to the existing kernel
+key ring service.  Both of these new types are variable length symmetric keys,
+and in both cases all keys are created in the kernel, and user space sees,
+stores, and loads only encrypted blobs.  Trusted Keys require the availability
+of a Trust Source for greater security, while Encrypted Keys can be used on any
+system. All user level blobs, are displayed and loaded in hex ASCII for
+convenience, and are integrity verified.
+
+
+Trust Source
+============
+
+A trust source provides the source of security for Trusted Keys.  This
+section lists currently supported trust sources, along with their security
+considerations.  Whether or not a trust source is sufficiently safe depends
+on the strength and correctness of its implementation, as well as the threat
+environment for a specific use case.  Since the kernel doesn't know what the
+environment is, and there is no metric of trust, it is dependent on the
+consumer of the Trusted Keys to determine if the trust source is sufficiently
+safe.
+
+  *  Root of trust for storage
+
+     (1) TPM (Trusted Platform Module: hardware device)
+
+         Rooted to Storage Root Key (SRK) which never leaves the TPM that
+         provides crypto operation to establish root of trust for storage.
+
+     (2) TEE (Trusted Execution Environment: OP-TEE based on Arm TrustZone)
+
+         Rooted to Hardware Unique Key (HUK) which is generally burnt in on-chip
+         fuses and is accessible to TEE only.
+
+     (3) CAAM (Cryptographic Acceleration and Assurance Module: IP on NXP SoCs)
+
+         When High Assurance Boot (HAB) is enabled and the CAAM is in secure
+         mode, trust is rooted to the OTPMK, a never-disclosed 256-bit key
+         randomly generated and fused into each SoC at manufacturing time.
+         Otherwise, a common fixed test key is used instead.
+
+  *  Execution isolation
+
+     (1) TPM
+
+         Fixed set of operations running in isolated execution environment.
+
+     (2) TEE
+
+         Customizable set of operations running in isolated execution
+         environment verified via Secure/Trusted boot process.
+
+     (3) CAAM
+
+         Fixed set of operations running in isolated execution environment.
+
+  * Optional binding to platform integrity state
+
+     (1) TPM
+
+         Keys can be optionally sealed to specified PCR (integrity measurement)
+         values, and only unsealed by the TPM, if PCRs and blob integrity
+         verifications match. A loaded Trusted Key can be updated with new
+         (future) PCR values, so keys are easily migrated to new PCR values,
+         such as when the kernel and initramfs are updated. The same key can
+         have many saved blobs under different PCR values, so multiple boots are
+         easily supported.
+
+     (2) TEE
+
+         Relies on Secure/Trusted boot process for platform integrity. It can
+         be extended with TEE based measured boot process.
+
+     (3) CAAM
+
+         Relies on the High Assurance Boot (HAB) mechanism of NXP SoCs
+         for platform integrity.
+
+  *  Interfaces and APIs
+
+     (1) TPM
+
+         TPMs have well-documented, standardized interfaces and APIs.
+
+     (2) TEE
+
+         TEEs have well-documented, standardized client interface and APIs. For
+         more details refer to ``Documentation/staging/tee.rst``.
+
+     (3) CAAM
+
+         Interface is specific to silicon vendor.
+
+  *  Threat model
+
+     The strength and appropriateness of a particular trust source for a given
+     purpose must be assessed when using them to protect security-relevant data.
+
+
+Key Generation
+==============
+
+Trusted Keys
+------------
+
+New keys are created from random numbers. They are encrypted/decrypted using
+a child key in the storage key hierarchy. Encryption and decryption of the
+child key must be protected by a strong access control policy within the
+trust source. The random number generator in use differs according to the
+selected trust source:
+
+  *  TPM: hardware device based RNG
+
+     Keys are generated within the TPM. Strength of random numbers may vary
+     from one device manufacturer to another.
+
+  *  TEE: OP-TEE based on Arm TrustZone based RNG
+
+     RNG is customizable as per platform needs. It can either be direct output
+     from platform specific hardware RNG or a software based Fortuna CSPRNG
+     which can be seeded via multiple entropy sources.
+
+  *  CAAM: Kernel RNG
+
+     The normal kernel random number generator is used. To seed it from the
+     CAAM HWRNG, enable CRYPTO_DEV_FSL_CAAM_RNG_API and ensure the device
+     is probed.
+
+Users may override this by specifying ``trusted.rng=kernel`` on the kernel
+command-line to override the used RNG with the kernel's random number pool.
+
+Encrypted Keys
+--------------
+
+Encrypted keys do not depend on a trust source, and are faster, as they use AES
+for encryption/decryption. New keys are created either from kernel-generated
+random numbers or user-provided decrypted data, and are encrypted/decrypted
+using a specified ‘master’ key. The ‘master’ key can either be a trusted-key or
+user-key type. The main disadvantage of encrypted keys is that if they are not
+rooted in a trusted key, they are only as secure as the user key encrypting
+them. The master user key should therefore be loaded in as secure a way as
+possible, preferably early in boot.
+
+
+Usage
+=====
+
+Trusted Keys usage: TPM
+-----------------------
+
+TPM 1.2: By default, trusted keys are sealed under the SRK, which has the
+default authorization value (20 bytes of 0s).  This can be set at takeownership
+time with the TrouSerS utility: "tpm_takeownership -u -z".
+
+TPM 2.0: The user must first create a storage key and make it persistent, so the
+key is available after reboot. This can be done using the following commands.
+
+With the IBM TSS 2 stack::
+
+  #> tsscreateprimary -hi o -st
+  Handle 80000000
+  #> tssevictcontrol -hi o -ho 80000000 -hp 81000001
+
+Or with the Intel TSS 2 stack::
+
+  #> tpm2_createprimary --hierarchy o -G rsa2048 -c key.ctxt
+  [...]
+  #> tpm2_evictcontrol -c key.ctxt 0x81000001
+  persistentHandle: 0x81000001
+
+Usage::
+
+    keyctl add trusted name "new keylen [options]" ring
+    keyctl add trusted name "load hex_blob [pcrlock=pcrnum]" ring
+    keyctl update key "update [options]"
+    keyctl print keyid
+
+    options:
+       keyhandle=    ascii hex value of sealing key
+                       TPM 1.2: default 0x40000000 (SRK)
+                       TPM 2.0: no default; must be passed every time
+       keyauth=	     ascii hex auth for sealing key default 0x00...i
+                     (40 ascii zeros)
+       blobauth=     ascii hex auth for sealed data default 0x00...
+                     (40 ascii zeros)
+       pcrinfo=	     ascii hex of PCR_INFO or PCR_INFO_LONG (no default)
+       pcrlock=	     pcr number to be extended to "lock" blob
+       migratable=   0|1 indicating permission to reseal to new PCR values,
+                     default 1 (resealing allowed)
+       hash=         hash algorithm name as a string. For TPM 1.x the only
+                     allowed value is sha1. For TPM 2.x the allowed values
+                     are sha1, sha256, sha384, sha512 and sm3-256.
+       policydigest= digest for the authorization policy. must be calculated
+                     with the same hash algorithm as specified by the 'hash='
+                     option.
+       policyhandle= handle to an authorization policy session that defines the
+                     same policy and with the same hash algorithm as was used to
+                     seal the key.
+
+"keyctl print" returns an ascii hex copy of the sealed key, which is in standard
+TPM_STORED_DATA format.  The key length for new keys are always in bytes.
+Trusted Keys can be 32 - 128 bytes (256 - 1024 bits), the upper limit is to fit
+within the 2048 bit SRK (RSA) keylength, with all necessary structure/padding.
+
+Trusted Keys usage: TEE
+-----------------------
+
+Usage::
+
+    keyctl add trusted name "new keylen" ring
+    keyctl add trusted name "load hex_blob" ring
+    keyctl print keyid
+
+"keyctl print" returns an ASCII hex copy of the sealed key, which is in format
+specific to TEE device implementation.  The key length for new keys is always
+in bytes. Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
+
+Trusted Keys usage: CAAM
+------------------------
+
+Usage::
+
+    keyctl add trusted name "new keylen" ring
+    keyctl add trusted name "load hex_blob" ring
+    keyctl print keyid
+
+"keyctl print" returns an ASCII hex copy of the sealed key, which is in a
+CAAM-specific format.  The key length for new keys is always in bytes.
+Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
+
+Encrypted Keys usage
+--------------------
+
+The decrypted portion of encrypted keys can contain either a simple symmetric
+key or a more complex structure. The format of the more complex structure is
+application specific, which is identified by 'format'.
+
+Usage::
+
+    keyctl add encrypted name "new [format] key-type:master-key-name keylen"
+        ring
+    keyctl add encrypted name "new [format] key-type:master-key-name keylen
+        decrypted-data" ring
+    keyctl add encrypted name "load hex_blob" ring
+    keyctl update keyid "update key-type:master-key-name"
+
+Where::
+
+	format:= 'default | ecryptfs | enc32'
+	key-type:= 'trusted' | 'user'
+
+Examples of trusted and encrypted key usage
+-------------------------------------------
+
+Create and save a trusted key named "kmk" of length 32 bytes.
+
+Note: When using a TPM 2.0 with a persistent key with handle 0x81000001,
+append 'keyhandle=0x81000001' to statements between quotes, such as
+"new 32 keyhandle=0x81000001".
+
+::
+
+    $ keyctl add trusted kmk "new 32" @u
+    440502848
+
+    $ keyctl show
+    Session Keyring
+           -3 --alswrv    500   500  keyring: _ses
+     97833714 --alswrv    500    -1   \_ keyring: _uid.500
+    440502848 --alswrv    500   500       \_ trusted: kmk
+
+    $ keyctl print 440502848
+    0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915
+    3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b
+    27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722
+    a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec
+    d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d
+    dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0
+    f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
+    e4a8aea2b607ec96931e6f4d4fe563ba
+
+    $ keyctl pipe 440502848 > kmk.blob
+
+Load a trusted key from the saved blob::
+
+    $ keyctl add trusted kmk "load `cat kmk.blob`" @u
+    268728824
+
+    $ keyctl print 268728824
+    0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915
+    3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b
+    27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722
+    a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec
+    d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d
+    dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0
+    f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
+    e4a8aea2b607ec96931e6f4d4fe563ba
+
+Reseal (TPM specific) a trusted key under new PCR values::
+
+    $ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`"
+    $ keyctl print 268728824
+    010100000000002c0002800093c35a09b70fff26e7a98ae786c641e678ec6ffb6b46d805
+    77c8a6377aed9d3219c6dfec4b23ffe3000001005d37d472ac8a44023fbb3d18583a4f73
+    d3a076c0858f6f1dcaa39ea0f119911ff03f5406df4f7f27f41da8d7194f45c9f4e00f2e
+    df449f266253aa3f52e55c53de147773e00f0f9aca86c64d94c95382265968c354c5eab4
+    9638c5ae99c89de1e0997242edfb0b501744e11ff9762dfd951cffd93227cc513384e7e6
+    e782c29435c7ec2edafaa2f4c1fe6e7a781b59549ff5296371b42133777dcc5b8b971610
+    94bc67ede19e43ddb9dc2baacad374a36feaf0314d700af0a65c164b7082401740e489c9
+    7ef6a24defe4846104209bf0c3eced7fa1a672ed5b125fc9d8cd88b476a658a4434644ef
+    df8ae9a178e9f83ba9f08d10fa47e4226b98b0702f06b3b8
+
+
+The initial consumer of trusted keys is EVM, which at boot time needs a high
+quality symmetric key for HMAC protection of file metadata. The use of a
+trusted key provides strong guarantees that the EVM key has not been
+compromised by a user level problem, and when sealed to a platform integrity
+state, protects against boot and offline attacks. Create and save an
+encrypted key "evm" using the above trusted key "kmk":
+
+option 1: omitting 'format'::
+
+    $ keyctl add encrypted evm "new trusted:kmk 32" @u
+    159771175
+
+option 2: explicitly defining 'format' as 'default'::
+
+    $ keyctl add encrypted evm "new default trusted:kmk 32" @u
+    159771175
+
+    $ keyctl print 159771175
+    default trusted:kmk 32 2375725ad57798846a9bbd240de8906f006e66c03af53b1b3
+    82dbbc55be2a44616e4959430436dc4f2a7a9659aa60bb4652aeb2120f149ed197c564e0
+    24717c64 5972dcb82ab2dde83376d82b2e3c09ffc
+
+    $ keyctl pipe 159771175 > evm.blob
+
+Load an encrypted key "evm" from saved blob::
+
+    $ keyctl add encrypted evm "load `cat evm.blob`" @u
+    831684262
+
+    $ keyctl print 831684262
+    default trusted:kmk 32 2375725ad57798846a9bbd240de8906f006e66c03af53b1b3
+    82dbbc55be2a44616e4959430436dc4f2a7a9659aa60bb4652aeb2120f149ed197c564e0
+    24717c64 5972dcb82ab2dde83376d82b2e3c09ffc
+
+Instantiate an encrypted key "evm" using user-provided decrypted data::
+
+    $ evmkey=$(dd if=/dev/urandom bs=1 count=32 | xxd -c32 -p)
+    $ keyctl add encrypted evm "new default user:kmk 32 $evmkey" @u
+    794890253
+
+    $ keyctl print 794890253
+    default user:kmk 32 2375725ad57798846a9bbd240de8906f006e66c03af53b1b382d
+    bbc55be2a44616e4959430436dc4f2a7a9659aa60bb4652aeb2120f149ed197c564e0247
+    17c64 5972dcb82ab2dde83376d82b2e3c09ffc
+
+Other uses for trusted and encrypted keys, such as for disk and file encryption
+are anticipated.  In particular the new format 'ecryptfs' has been defined
+in order to use encrypted keys to mount an eCryptfs filesystem.  More details
+about the usage can be found in the file
+``Documentation/security/keys/ecryptfs.rst``.
+
+Another new format 'enc32' has been defined in order to support encrypted keys
+with payload size of 32 bytes. This will initially be used for nvdimm security
+but may expand to other usages that require 32 bytes payload.
+
+
+TPM 2.0 ASN.1 Key Format
+------------------------
+
+The TPM 2.0 ASN.1 key format is designed to be easily recognisable,
+even in binary form (fixing a problem we had with the TPM 1.2 ASN.1
+format) and to be extensible for additions like importable keys and
+policy::
+
+    TPMKey ::= SEQUENCE {
+        type		OBJECT IDENTIFIER
+        emptyAuth	[0] EXPLICIT BOOLEAN OPTIONAL
+        parent		INTEGER
+        pubkey		OCTET STRING
+        privkey		OCTET STRING
+    }
+
+type is what distinguishes the key even in binary form since the OID
+is provided by the TCG to be unique and thus forms a recognizable
+binary pattern at offset 3 in the key.  The OIDs currently made
+available are::
+
+    2.23.133.10.1.3 TPM Loadable key.  This is an asymmetric key (Usually
+                    RSA2048 or Elliptic Curve) which can be imported by a
+                    TPM2_Load() operation.
+
+    2.23.133.10.1.4 TPM Importable Key.  This is an asymmetric key (Usually
+                    RSA2048 or Elliptic Curve) which can be imported by a
+                    TPM2_Import() operation.
+
+    2.23.133.10.1.5 TPM Sealed Data.  This is a set of data (up to 128
+                    bytes) which is sealed by the TPM.  It usually
+                    represents a symmetric key and must be unsealed before
+                    use.
+
+The trusted key code only uses the TPM Sealed Data OID.
+
+emptyAuth is true if the key has well known authorization "".  If it
+is false or not present, the key requires an explicit authorization
+phrase.  This is used by most user space consumers to decide whether
+to prompt for a password.
+
+parent represents the parent key handle, either in the 0x81 MSO space,
+like 0x81000001 for the RSA primary storage key.  Userspace programmes
+also support specifying the primary handle in the 0x40 MSO space.  If
+this happens the Elliptic Curve variant of the primary key using the
+TCG defined template will be generated on the fly into a volatile
+object and used as the parent.  The current kernel code only supports
+the 0x81 MSO form.
+
+pubkey is the binary representation of TPM2B_PRIVATE excluding the
+initial TPM2B header, which can be reconstructed from the ASN.1 octet
+string length.
+
+privkey is the binary representation of TPM2B_PUBLIC excluding the
+initial TPM2B header which can be reconstructed from the ASN.1 octed
+string length.
diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
new file mode 100644
index 0000000000..36f26501fd
--- /dev/null
+++ b/Documentation/security/landlock.rst
@@ -0,0 +1,129 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+.. Copyright © 2019-2020 ANSSI
+
+==================================
+Landlock LSM: kernel documentation
+==================================
+
+:Author: Mickaël Salaün
+:Date: December 2022
+
+Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
+harden a whole system, this feature should be available to any process,
+including unprivileged ones.  Because such process may be compromised or
+backdoored (i.e. untrusted), Landlock's features must be safe to use from the
+kernel and other processes point of view.  Landlock's interface must therefore
+expose a minimal attack surface.
+
+Landlock is designed to be usable by unprivileged processes while following the
+system security policy enforced by other access control mechanisms (e.g. DAC,
+LSM).  Indeed, a Landlock rule shall not interfere with other access-controls
+enforced on the system, only add more restrictions.
+
+Any user can enforce Landlock rulesets on their processes.  They are merged and
+evaluated according to the inherited ones in a way that ensures that only more
+constraints can be added.
+
+User space documentation can be found here:
+Documentation/userspace-api/landlock.rst.
+
+Guiding principles for safe access controls
+===========================================
+
+* A Landlock rule shall be focused on access control on kernel objects instead
+  of syscall filtering (i.e. syscall arguments), which is the purpose of
+  seccomp-bpf.
+* To avoid multiple kinds of side-channel attacks (e.g. leak of security
+  policies, CPU-based attacks), Landlock rules shall not be able to
+  programmatically communicate with user space.
+* Kernel access check shall not slow down access request from unsandboxed
+  processes.
+* Computation related to Landlock operations (e.g. enforcing a ruleset) shall
+  only impact the processes requesting them.
+* Resources (e.g. file descriptors) directly obtained from the kernel by a
+  sandboxed process shall retain their scoped accesses (at the time of resource
+  acquisition) whatever process use them.
+  Cf. `File descriptor access rights`_.
+
+Design choices
+==============
+
+Inode access rights
+-------------------
+
+All access rights are tied to an inode and what can be accessed through it.
+Reading the content of a directory does not imply to be allowed to read the
+content of a listed inode.  Indeed, a file name is local to its parent
+directory, and an inode can be referenced by multiple file names thanks to
+(hard) links.  Being able to unlink a file only has a direct impact on the
+directory, not the unlinked inode.  This is the reason why
+``LANDLOCK_ACCESS_FS_REMOVE_FILE`` or ``LANDLOCK_ACCESS_FS_REFER`` are not
+allowed to be tied to files but only to directories.
+
+File descriptor access rights
+-----------------------------
+
+Access rights are checked and tied to file descriptors at open time.  The
+underlying principle is that equivalent sequences of operations should lead to
+the same results, when they are executed under the same Landlock domain.
+
+Taking the ``LANDLOCK_ACCESS_FS_TRUNCATE`` right as an example, it may be
+allowed to open a file for writing without being allowed to
+:manpage:`ftruncate` the resulting file descriptor if the related file
+hierarchy doesn't grant such access right.  The following sequences of
+operations have the same semantic and should then have the same result:
+
+* ``truncate(path);``
+* ``int fd = open(path, O_WRONLY); ftruncate(fd); close(fd);``
+
+Similarly to file access modes (e.g. ``O_RDWR``), Landlock access rights
+attached to file descriptors are retained even if they are passed between
+processes (e.g. through a Unix domain socket).  Such access rights will then be
+enforced even if the receiving process is not sandboxed by Landlock.  Indeed,
+this is required to keep a consistent access control over the whole system, and
+this avoids unattended bypasses through file descriptor passing (i.e. confused
+deputy attack).
+
+Tests
+=====
+
+Userspace tests for backward compatibility, ptrace restrictions and filesystem
+support can be found here: `tools/testing/selftests/landlock/`_.
+
+Kernel structures
+=================
+
+Object
+------
+
+.. kernel-doc:: security/landlock/object.h
+    :identifiers:
+
+Filesystem
+----------
+
+.. kernel-doc:: security/landlock/fs.h
+    :identifiers:
+
+Ruleset and domain
+------------------
+
+A domain is a read-only ruleset tied to a set of subjects (i.e. tasks'
+credentials).  Each time a ruleset is enforced on a task, the current domain is
+duplicated and the ruleset is imported as a new layer of rules in the new
+domain.  Indeed, once in a domain, each rule is tied to a layer level.  To
+grant access to an object, at least one rule of each layer must allow the
+requested action on the object.  A task can then only transit to a new domain
+that is the intersection of the constraints from the current domain and those
+of a ruleset provided by the task.
+
+The definition of a subject is implicit for a task sandboxing itself, which
+makes the reasoning much easier and helps avoid pitfalls.
+
+.. kernel-doc:: security/landlock/ruleset.h
+    :identifiers:
+
+.. Links
+.. _tools/testing/selftests/landlock/:
+   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/landlock/
diff --git a/Documentation/security/lsm-development.rst b/Documentation/security/lsm-development.rst
new file mode 100644
index 0000000000..5895e529da
--- /dev/null
+++ b/Documentation/security/lsm-development.rst
@@ -0,0 +1,17 @@
+=================================
+Linux Security Module Development
+=================================
+
+Based on https://lore.kernel.org/r/20071026073721.618b4778@laptopd505.fenrus.org,
+a new LSM is accepted into the kernel when its intent (a description of
+what it tries to protect against and in what cases one would expect to
+use it) has been appropriately documented in ``Documentation/admin-guide/LSM/``.
+This allows an LSM's code to be easily compared to its goals, and so
+that end users and distros can make a more informed decision about which
+LSMs suit their requirements.
+
+For extensive documentation on the available LSM hook interfaces, please
+see ``security/security.c`` and associated structures:
+
+.. kernel-doc:: security/security.c
+   :export:
diff --git a/Documentation/security/lsm.rst b/Documentation/security/lsm.rst
new file mode 100644
index 0000000000..c20c7c72e2
--- /dev/null
+++ b/Documentation/security/lsm.rst
@@ -0,0 +1,131 @@
+========================================================
+Linux Security Modules: General Security Hooks for Linux
+========================================================
+
+:Author: Stephen Smalley
+:Author: Timothy Fraser
+:Author: Chris Vance
+
+.. note::
+
+   The APIs described in this book are outdated.
+
+Introduction
+============
+
+In March 2001, the National Security Agency (NSA) gave a presentation
+about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel Summit.
+SELinux is an implementation of flexible and fine-grained
+nondiscretionary access controls in the Linux kernel, originally
+implemented as its own particular kernel patch. Several other security
+projects (e.g. RSBAC, Medusa) have also developed flexible access
+control architectures for the Linux kernel, and various projects have
+developed particular access control models for Linux (e.g. LIDS, DTE,
+SubDomain). Each project has developed and maintained its own kernel
+patch to support its security needs.
+
+In response to the NSA presentation, Linus Torvalds made a set of
+remarks that described a security framework he would be willing to
+consider for inclusion in the mainstream Linux kernel. He described a
+general framework that would provide a set of security hooks to control
+operations on kernel objects and a set of opaque security fields in
+kernel data structures for maintaining security attributes. This
+framework could then be used by loadable kernel modules to implement any
+desired model of security. Linus also suggested the possibility of
+migrating the Linux capabilities code into such a module.
+
+The Linux Security Modules (LSM) project was started by WireX to develop
+such a framework. LSM was a joint development effort by several security
+projects, including Immunix, SELinux, SGI and Janus, and several
+individuals, including Greg Kroah-Hartman and James Morris, to develop a
+Linux kernel patch that implements this framework. The work was
+incorporated in the mainstream in December of 2003. This technical
+report provides an overview of the framework and the capabilities
+security module.
+
+LSM Framework
+=============
+
+The LSM framework provides a general kernel framework to support
+security modules. In particular, the LSM framework is primarily focused
+on supporting access control modules, although future development is
+likely to address other security needs such as sandboxing. By itself, the
+framework does not provide any additional security; it merely provides
+the infrastructure to support security modules. The LSM framework is
+optional, requiring `CONFIG_SECURITY` to be enabled. The capabilities
+logic is implemented as a security module.
+This capabilities module is discussed further in
+`LSM Capabilities Module`_.
+
+The LSM framework includes security fields in kernel data structures and
+calls to hook functions at critical points in the kernel code to
+manage the security fields and to perform access control.
+It also adds functions for registering security modules.
+An interface `/sys/kernel/security/lsm` reports a comma separated list
+of security modules that are active on the system.
+
+The LSM security fields are simply ``void*`` pointers.
+The data is referred to as a blob, which may be managed by
+the framework or by the individual security modules that use it.
+Security blobs that are used by more than one security module are
+typically managed by the framework.
+For process and
+program execution security information, security fields are included in
+:c:type:`struct task_struct <task_struct>` and
+:c:type:`struct cred <cred>`.
+For filesystem
+security information, a security field is included in :c:type:`struct
+super_block <super_block>`. For pipe, file, and socket security
+information, security fields are included in :c:type:`struct inode
+<inode>` and :c:type:`struct file <file>`.
+For System V IPC security information,
+security fields were added to :c:type:`struct kern_ipc_perm
+<kern_ipc_perm>` and :c:type:`struct msg_msg
+<msg_msg>`; additionally, the definitions for :c:type:`struct
+msg_msg <msg_msg>`, struct msg_queue, and struct shmid_kernel
+were moved to header files (``include/linux/msg.h`` and
+``include/linux/shm.h`` as appropriate) to allow the security modules to
+use these definitions.
+
+For packet and
+network device security information, security fields were added to
+:c:type:`struct sk_buff <sk_buff>` and
+:c:type:`struct scm_cookie <scm_cookie>`.
+Unlike the other security module data, the data used here is a
+32-bit integer. The security modules are required to map or otherwise
+associate these values with real security attributes.
+
+LSM hooks are maintained in lists. A list is maintained for each
+hook, and the hooks are called in the order specified by CONFIG_LSM.
+Detailed documentation for each hook is
+included in the `security/security.c` source file.
+
+The LSM framework provides for a close approximation of
+general security module stacking. It defines
+security_add_hooks() to which each security module passes a
+:c:type:`struct security_hooks_list <security_hooks_list>`,
+which are added to the lists.
+The LSM framework does not provide a mechanism for removing hooks that
+have been registered. The SELinux security module has implemented
+a way to remove itself, however the feature has been deprecated.
+
+The hooks can be viewed as falling into two major
+categories: hooks that are used to manage the security fields and hooks
+that are used to perform access control. Examples of the first category
+of hooks include the security_inode_alloc() and security_inode_free()
+These hooks are used to allocate
+and free security structures for inode objects.
+An example of the second category of hooks
+is the security_inode_permission() hook.
+This hook checks permission when accessing an inode.
+
+LSM Capabilities Module
+=======================
+
+The POSIX.1e capabilities logic is maintained as a security module
+stored in the file ``security/commoncap.c``. The capabilities
+module uses the order field of the :c:type:`lsm_info` description
+to identify it as the first security module to be registered.
+The capabilities security module does not use the general security
+blobs, unlike other modules. The reasons are historical and are
+based on overhead, complexity and performance concerns.
diff --git a/Documentation/security/sak.rst b/Documentation/security/sak.rst
new file mode 100644
index 0000000000..260e1d3687
--- /dev/null
+++ b/Documentation/security/sak.rst
@@ -0,0 +1,91 @@
+=========================================
+Linux Secure Attention Key (SAK) handling
+=========================================
+
+:Date: 18 March 2001
+:Author: Andrew Morton
+
+An operating system's Secure Attention Key is a security tool which is
+provided as protection against trojan password capturing programs.  It
+is an undefeatable way of killing all programs which could be
+masquerading as login applications.  Users need to be taught to enter
+this key sequence before they log in to the system.
+
+From the PC keyboard, Linux has two similar but different ways of
+providing SAK.  One is the ALT-SYSRQ-K sequence.  You shouldn't use
+this sequence.  It is only available if the kernel was compiled with
+sysrq support.
+
+The proper way of generating a SAK is to define the key sequence using
+``loadkeys``.  This will work whether or not sysrq support is compiled
+into the kernel.
+
+SAK works correctly when the keyboard is in raw mode.  This means that
+once defined, SAK will kill a running X server.  If the system is in
+run level 5, the X server will restart.  This is what you want to
+happen.
+
+What key sequence should you use? Well, CTRL-ALT-DEL is used to reboot
+the machine.  CTRL-ALT-BACKSPACE is magical to the X server.  We'll
+choose CTRL-ALT-PAUSE.
+
+In your rc.sysinit (or rc.local) file, add the command::
+
+	echo "control alt keycode 101 = SAK" | /bin/loadkeys
+
+And that's it!  Only the superuser may reprogram the SAK key.
+
+
+.. note::
+
+  1. Linux SAK is said to be not a "true SAK" as is required by
+     systems which implement C2 level security.  This author does not
+     know why.
+
+
+  2. On the PC keyboard, SAK kills all applications which have
+     /dev/console opened.
+
+     Unfortunately this includes a number of things which you don't
+     actually want killed.  This is because these applications are
+     incorrectly holding /dev/console open.  Be sure to complain to your
+     Linux distributor about this!
+
+     You can identify processes which will be killed by SAK with the
+     command::
+
+	# ls -l /proc/[0-9]*/fd/* | grep console
+	l-wx------    1 root     root           64 Mar 18 00:46 /proc/579/fd/0 -> /dev/console
+
+     Then::
+
+	# ps aux|grep 579
+	root       579  0.0  0.1  1088  436 ?        S    00:43   0:00 gpm -t ps/2
+
+     So ``gpm`` will be killed by SAK.  This is a bug in gpm.  It should
+     be closing standard input.  You can work around this by finding the
+     initscript which launches gpm and changing it thusly:
+
+     Old::
+
+	daemon gpm
+
+     New::
+
+	daemon gpm < /dev/null
+
+     Vixie cron also seems to have this problem, and needs the same treatment.
+
+     Also, one prominent Linux distribution has the following three
+     lines in its rc.sysinit and rc scripts::
+
+	exec 3<&0
+	exec 4>&1
+	exec 5>&2
+
+     These commands cause **all** daemons which are launched by the
+     initscripts to have file descriptors 3, 4 and 5 attached to
+     /dev/console.  So SAK kills them all.  A workaround is to simply
+     delete these lines, but this may cause system management
+     applications to malfunction - test everything well.
+
diff --git a/Documentation/security/secrets/coco.rst b/Documentation/security/secrets/coco.rst
new file mode 100644
index 0000000000..a1210db8ec
--- /dev/null
+++ b/Documentation/security/secrets/coco.rst
@@ -0,0 +1,103 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
+Confidential Computing secrets
+==============================
+
+This document describes how Confidential Computing secret injection is handled
+from the firmware to the operating system, in the EFI driver and the efi_secret
+kernel module.
+
+
+Introduction
+============
+
+Confidential Computing (coco) hardware such as AMD SEV (Secure Encrypted
+Virtualization) allows guest owners to inject secrets into the VMs
+memory without the host/hypervisor being able to read them.  In SEV,
+secret injection is performed early in the VM launch process, before the
+guest starts running.
+
+The efi_secret kernel module allows userspace applications to access these
+secrets via securityfs.
+
+
+Secret data flow
+================
+
+The guest firmware may reserve a designated memory area for secret injection,
+and publish its location (base GPA and length) in the EFI configuration table
+under a ``LINUX_EFI_COCO_SECRET_AREA_GUID`` entry
+(``adf956ad-e98c-484c-ae11-b51c7d336447``).  This memory area should be marked
+by the firmware as ``EFI_RESERVED_TYPE``, and therefore the kernel should not
+be use it for its own purposes.
+
+During the VM's launch, the virtual machine manager may inject a secret to that
+area.  In AMD SEV and SEV-ES this is performed using the
+``KVM_SEV_LAUNCH_SECRET`` command (see [sev]_).  The structure of the injected
+Guest Owner secret data should be a GUIDed table of secret values; the binary
+format is described in ``drivers/virt/coco/efi_secret/efi_secret.c`` under
+"Structure of the EFI secret area".
+
+On kernel start, the kernel's EFI driver saves the location of the secret area
+(taken from the EFI configuration table) in the ``efi.coco_secret`` field.
+Later it checks if the secret area is populated: it maps the area and checks
+whether its content begins with ``EFI_SECRET_TABLE_HEADER_GUID``
+(``1e74f542-71dd-4d66-963e-ef4287ff173b``).  If the secret area is populated,
+the EFI driver will autoload the efi_secret kernel module, which exposes the
+secrets to userspace applications via securityfs.  The details of the
+efi_secret filesystem interface are in [secrets-coco-abi]_.
+
+
+Application usage example
+=========================
+
+Consider a guest performing computations on encrypted files.  The Guest Owner
+provides the decryption key (= secret) using the secret injection mechanism.
+The guest application reads the secret from the efi_secret filesystem and
+proceeds to decrypt the files into memory and then performs the needed
+computations on the content.
+
+In this example, the host can't read the files from the disk image
+because they are encrypted.  Host can't read the decryption key because
+it is passed using the secret injection mechanism (= secure channel).
+Host can't read the decrypted content from memory because it's a
+confidential (memory-encrypted) guest.
+
+Here is a simple example for usage of the efi_secret module in a guest
+to which an EFI secret area with 4 secrets was injected during launch::
+
+	# ls -la /sys/kernel/security/secrets/coco
+	total 0
+	drwxr-xr-x 2 root root 0 Jun 28 11:54 .
+	drwxr-xr-x 3 root root 0 Jun 28 11:54 ..
+	-r--r----- 1 root root 0 Jun 28 11:54 736870e5-84f0-4973-92ec-06879ce3da0b
+	-r--r----- 1 root root 0 Jun 28 11:54 83c83f7f-1356-4975-8b7e-d3a0b54312c6
+	-r--r----- 1 root root 0 Jun 28 11:54 9553f55d-3da2-43ee-ab5d-ff17f78864d2
+	-r--r----- 1 root root 0 Jun 28 11:54 e6f5a162-d67f-4750-a67c-5d065f2a9910
+
+	# hd /sys/kernel/security/secrets/coco/e6f5a162-d67f-4750-a67c-5d065f2a9910
+	00000000  74 68 65 73 65 2d 61 72  65 2d 74 68 65 2d 6b 61  |these-are-the-ka|
+	00000010  74 61 2d 73 65 63 72 65  74 73 00 01 02 03 04 05  |ta-secrets......|
+	00000020  06 07                                             |..|
+	00000022
+
+	# rm /sys/kernel/security/secrets/coco/e6f5a162-d67f-4750-a67c-5d065f2a9910
+
+	# ls -la /sys/kernel/security/secrets/coco
+	total 0
+	drwxr-xr-x 2 root root 0 Jun 28 11:55 .
+	drwxr-xr-x 3 root root 0 Jun 28 11:54 ..
+	-r--r----- 1 root root 0 Jun 28 11:54 736870e5-84f0-4973-92ec-06879ce3da0b
+	-r--r----- 1 root root 0 Jun 28 11:54 83c83f7f-1356-4975-8b7e-d3a0b54312c6
+	-r--r----- 1 root root 0 Jun 28 11:54 9553f55d-3da2-43ee-ab5d-ff17f78864d2
+
+
+References
+==========
+
+See [sev-api-spec]_ for more info regarding SEV ``LAUNCH_SECRET`` operation.
+
+.. [sev] Documentation/virt/kvm/x86/amd-memory-encryption.rst
+.. [secrets-coco-abi] Documentation/ABI/testing/securityfs-secrets-coco
+.. [sev-api-spec] https://www.amd.com/system/files/TechDocs/55766_SEV-KM_API_Specification.pdf
diff --git a/Documentation/security/secrets/index.rst b/Documentation/security/secrets/index.rst
new file mode 100644
index 0000000000..ced34e9c43
--- /dev/null
+++ b/Documentation/security/secrets/index.rst
@@ -0,0 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Secrets documentation
+=====================
+
+.. toctree::
+
+   coco
diff --git a/Documentation/security/self-protection.rst b/Documentation/security/self-protection.rst
new file mode 100644
index 0000000000..910668e665
--- /dev/null
+++ b/Documentation/security/self-protection.rst
@@ -0,0 +1,316 @@
+======================
+Kernel Self-Protection
+======================
+
+Kernel self-protection is the design and implementation of systems and
+structures within the Linux kernel to protect against security flaws in
+the kernel itself. This covers a wide range of issues, including removing
+entire classes of bugs, blocking security flaw exploitation methods,
+and actively detecting attack attempts. Not all topics are explored in
+this document, but it should serve as a reasonable starting point and
+answer any frequently asked questions. (Patches welcome, of course!)
+
+In the worst-case scenario, we assume an unprivileged local attacker
+has arbitrary read and write access to the kernel's memory. In many
+cases, bugs being exploited will not provide this level of access,
+but with systems in place that defend against the worst case we'll
+cover the more limited cases as well. A higher bar, and one that should
+still be kept in mind, is protecting the kernel against a _privileged_
+local attacker, since the root user has access to a vastly increased
+attack surface. (Especially when they have the ability to load arbitrary
+kernel modules.)
+
+The goals for successful self-protection systems would be that they
+are effective, on by default, require no opt-in by developers, have no
+performance impact, do not impede kernel debugging, and have tests. It
+is uncommon that all these goals can be met, but it is worth explicitly
+mentioning them, since these aspects need to be explored, dealt with,
+and/or accepted.
+
+
+Attack Surface Reduction
+========================
+
+The most fundamental defense against security exploits is to reduce the
+areas of the kernel that can be used to redirect execution. This ranges
+from limiting the exposed APIs available to userspace, making in-kernel
+APIs hard to use incorrectly, minimizing the areas of writable kernel
+memory, etc.
+
+Strict kernel memory permissions
+--------------------------------
+
+When all of kernel memory is writable, it becomes trivial for attacks
+to redirect execution flow. To reduce the availability of these targets
+the kernel needs to protect its memory with a tight set of permissions.
+
+Executable code and read-only data must not be writable
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Any areas of the kernel with executable memory must not be writable.
+While this obviously includes the kernel text itself, we must consider
+all additional places too: kernel modules, JIT memory, etc. (There are
+temporary exceptions to this rule to support things like instruction
+alternatives, breakpoints, kprobes, etc. If these must exist in a
+kernel, they are implemented in a way where the memory is temporarily
+made writable during the update, and then returned to the original
+permissions.)
+
+In support of this are ``CONFIG_STRICT_KERNEL_RWX`` and
+``CONFIG_STRICT_MODULE_RWX``, which seek to make sure that code is not
+writable, data is not executable, and read-only data is neither writable
+nor executable.
+
+Most architectures have these options on by default and not user selectable.
+For some architectures like arm that wish to have these be selectable,
+the architecture Kconfig can select ARCH_OPTIONAL_KERNEL_RWX to enable
+a Kconfig prompt. ``CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT`` determines
+the default setting when ARCH_OPTIONAL_KERNEL_RWX is enabled.
+
+Function pointers and sensitive variables must not be writable
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Vast areas of kernel memory contain function pointers that are looked
+up by the kernel and used to continue execution (e.g. descriptor/vector
+tables, file/network/etc operation structures, etc). The number of these
+variables must be reduced to an absolute minimum.
+
+Many such variables can be made read-only by setting them "const"
+so that they live in the .rodata section instead of the .data section
+of the kernel, gaining the protection of the kernel's strict memory
+permissions as described above.
+
+For variables that are initialized once at ``__init`` time, these can
+be marked with the ``__ro_after_init`` attribute.
+
+What remains are variables that are updated rarely (e.g. GDT). These
+will need another infrastructure (similar to the temporary exceptions
+made to kernel code mentioned above) that allow them to spend the rest
+of their lifetime read-only. (For example, when being updated, only the
+CPU thread performing the update would be given uninterruptible write
+access to the memory.)
+
+Segregation of kernel memory from userspace memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The kernel must never execute userspace memory. The kernel must also never
+access userspace memory without explicit expectation to do so. These
+rules can be enforced either by support of hardware-based restrictions
+(x86's SMEP/SMAP, ARM's PXN/PAN) or via emulation (ARM's Memory Domains).
+By blocking userspace memory in this way, execution and data parsing
+cannot be passed to trivially-controlled userspace memory, forcing
+attacks to operate entirely in kernel memory.
+
+Reduced access to syscalls
+--------------------------
+
+One trivial way to eliminate many syscalls for 64-bit systems is building
+without ``CONFIG_COMPAT``. However, this is rarely a feasible scenario.
+
+The "seccomp" system provides an opt-in feature made available to
+userspace, which provides a way to reduce the number of kernel entry
+points available to a running process. This limits the breadth of kernel
+code that can be reached, possibly reducing the availability of a given
+bug to an attack.
+
+An area of improvement would be creating viable ways to keep access to
+things like compat, user namespaces, BPF creation, and perf limited only
+to trusted processes. This would keep the scope of kernel entry points
+restricted to the more regular set of normally available to unprivileged
+userspace.
+
+Restricting access to kernel modules
+------------------------------------
+
+The kernel should never allow an unprivileged user the ability to
+load specific kernel modules, since that would provide a facility to
+unexpectedly extend the available attack surface. (The on-demand loading
+of modules via their predefined subsystems, e.g. MODULE_ALIAS_*, is
+considered "expected" here, though additional consideration should be
+given even to these.) For example, loading a filesystem module via an
+unprivileged socket API is nonsense: only the root or physically local
+user should trigger filesystem module loading. (And even this can be up
+for debate in some scenarios.)
+
+To protect against even privileged users, systems may need to either
+disable module loading entirely (e.g. monolithic kernel builds or
+modules_disabled sysctl), or provide signed modules (e.g.
+``CONFIG_MODULE_SIG_FORCE``, or dm-crypt with LoadPin), to keep from having
+root load arbitrary kernel code via the module loader interface.
+
+
+Memory integrity
+================
+
+There are many memory structures in the kernel that are regularly abused
+to gain execution control during an attack, By far the most commonly
+understood is that of the stack buffer overflow in which the return
+address stored on the stack is overwritten. Many other examples of this
+kind of attack exist, and protections exist to defend against them.
+
+Stack buffer overflow
+---------------------
+
+The classic stack buffer overflow involves writing past the expected end
+of a variable stored on the stack, ultimately writing a controlled value
+to the stack frame's stored return address. The most widely used defense
+is the presence of a stack canary between the stack variables and the
+return address (``CONFIG_STACKPROTECTOR``), which is verified just before
+the function returns. Other defenses include things like shadow stacks.
+
+Stack depth overflow
+--------------------
+
+A less well understood attack is using a bug that triggers the
+kernel to consume stack memory with deep function calls or large stack
+allocations. With this attack it is possible to write beyond the end of
+the kernel's preallocated stack space and into sensitive structures. Two
+important changes need to be made for better protections: moving the
+sensitive thread_info structure elsewhere, and adding a faulting memory
+hole at the bottom of the stack to catch these overflows.
+
+Heap memory integrity
+---------------------
+
+The structures used to track heap free lists can be sanity-checked during
+allocation and freeing to make sure they aren't being used to manipulate
+other memory areas.
+
+Counter integrity
+-----------------
+
+Many places in the kernel use atomic counters to track object references
+or perform similar lifetime management. When these counters can be made
+to wrap (over or under) this traditionally exposes a use-after-free
+flaw. By trapping atomic wrapping, this class of bug vanishes.
+
+Size calculation overflow detection
+-----------------------------------
+
+Similar to counter overflow, integer overflows (usually size calculations)
+need to be detected at runtime to kill this class of bug, which
+traditionally leads to being able to write past the end of kernel buffers.
+
+
+Probabilistic defenses
+======================
+
+While many protections can be considered deterministic (e.g. read-only
+memory cannot be written to), some protections provide only statistical
+defense, in that an attack must gather enough information about a
+running system to overcome the defense. While not perfect, these do
+provide meaningful defenses.
+
+Canaries, blinding, and other secrets
+-------------------------------------
+
+It should be noted that things like the stack canary discussed earlier
+are technically statistical defenses, since they rely on a secret value,
+and such values may become discoverable through an information exposure
+flaw.
+
+Blinding literal values for things like JITs, where the executable
+contents may be partially under the control of userspace, need a similar
+secret value.
+
+It is critical that the secret values used must be separate (e.g.
+different canary per stack) and high entropy (e.g. is the RNG actually
+working?) in order to maximize their success.
+
+Kernel Address Space Layout Randomization (KASLR)
+-------------------------------------------------
+
+Since the location of kernel memory is almost always instrumental in
+mounting a successful attack, making the location non-deterministic
+raises the difficulty of an exploit. (Note that this in turn makes
+the value of information exposures higher, since they may be used to
+discover desired memory locations.)
+
+Text and module base
+~~~~~~~~~~~~~~~~~~~~
+
+By relocating the physical and virtual base address of the kernel at
+boot-time (``CONFIG_RANDOMIZE_BASE``), attacks needing kernel code will be
+frustrated. Additionally, offsetting the module loading base address
+means that even systems that load the same set of modules in the same
+order every boot will not share a common base address with the rest of
+the kernel text.
+
+Stack base
+~~~~~~~~~~
+
+If the base address of the kernel stack is not the same between processes,
+or even not the same between syscalls, targets on or beyond the stack
+become more difficult to locate.
+
+Dynamic memory base
+~~~~~~~~~~~~~~~~~~~
+
+Much of the kernel's dynamic memory (e.g. kmalloc, vmalloc, etc) ends up
+being relatively deterministic in layout due to the order of early-boot
+initializations. If the base address of these areas is not the same
+between boots, targeting them is frustrated, requiring an information
+exposure specific to the region.
+
+Structure layout
+~~~~~~~~~~~~~~~~
+
+By performing a per-build randomization of the layout of sensitive
+structures, attacks must either be tuned to known kernel builds or expose
+enough kernel memory to determine structure layouts before manipulating
+them.
+
+
+Preventing Information Exposures
+================================
+
+Since the locations of sensitive structures are the primary target for
+attacks, it is important to defend against exposure of both kernel memory
+addresses and kernel memory contents (since they may contain kernel
+addresses or other sensitive things like canary values).
+
+Kernel addresses
+----------------
+
+Printing kernel addresses to userspace leaks sensitive information about
+the kernel memory layout. Care should be exercised when using any printk
+specifier that prints the raw address, currently %px, %p[ad], (and %p[sSb]
+in certain circumstances [*]).  Any file written to using one of these
+specifiers should be readable only by privileged processes.
+
+Kernels 4.14 and older printed the raw address using %p. As of 4.15-rc1
+addresses printed with the specifier %p are hashed before printing.
+
+[*] If KALLSYMS is enabled and symbol lookup fails, the raw address is
+printed. If KALLSYMS is not enabled the raw address is printed.
+
+Unique identifiers
+------------------
+
+Kernel memory addresses must never be used as identifiers exposed to
+userspace. Instead, use an atomic counter, an idr, or similar unique
+identifier.
+
+Memory initialization
+---------------------
+
+Memory copied to userspace must always be fully initialized. If not
+explicitly memset(), this will require changes to the compiler to make
+sure structure holes are cleared.
+
+Memory poisoning
+----------------
+
+When releasing memory, it is best to poison the contents, to avoid reuse
+attacks that rely on the old contents of memory. E.g., clear stack on a
+syscall return (``CONFIG_GCC_PLUGIN_STACKLEAK``), wipe heap memory on a
+free. This frustrates many uninitialized variable attacks, stack content
+exposures, heap content exposures, and use-after-free attacks.
+
+Destination tracking
+--------------------
+
+To help kill classes of bugs that result in kernel addresses being
+written to userspace, the destination of writes needs to be tracked. If
+the buffer is destined for userspace (e.g. seq_file backed ``/proc`` files),
+it should automatically censor sensitive values.
diff --git a/Documentation/security/siphash.rst b/Documentation/security/siphash.rst
new file mode 100644
index 0000000000..023bd95c74
--- /dev/null
+++ b/Documentation/security/siphash.rst
@@ -0,0 +1,199 @@
+===========================
+SipHash - a short input PRF
+===========================
+
+:Author: Written by Jason A. Donenfeld <jason@zx2c4.com>
+
+SipHash is a cryptographically secure PRF -- a keyed hash function -- that
+performs very well for short inputs, hence the name. It was designed by
+cryptographers Daniel J. Bernstein and Jean-Philippe Aumasson. It is intended
+as a replacement for some uses of: `jhash`, `md5_transform`, `sha1_transform`,
+and so forth.
+
+SipHash takes a secret key filled with randomly generated numbers and either
+an input buffer or several input integers. It spits out an integer that is
+indistinguishable from random. You may then use that integer as part of secure
+sequence numbers, secure cookies, or mask it off for use in a hash table.
+
+Generating a key
+================
+
+Keys should always be generated from a cryptographically secure source of
+random numbers, either using get_random_bytes or get_random_once::
+
+	siphash_key_t key;
+	get_random_bytes(&key, sizeof(key));
+
+If you're not deriving your key from here, you're doing it wrong.
+
+Using the functions
+===================
+
+There are two variants of the function, one that takes a list of integers, and
+one that takes a buffer::
+
+	u64 siphash(const void *data, size_t len, const siphash_key_t *key);
+
+And::
+
+	u64 siphash_1u64(u64, const siphash_key_t *key);
+	u64 siphash_2u64(u64, u64, const siphash_key_t *key);
+	u64 siphash_3u64(u64, u64, u64, const siphash_key_t *key);
+	u64 siphash_4u64(u64, u64, u64, u64, const siphash_key_t *key);
+	u64 siphash_1u32(u32, const siphash_key_t *key);
+	u64 siphash_2u32(u32, u32, const siphash_key_t *key);
+	u64 siphash_3u32(u32, u32, u32, const siphash_key_t *key);
+	u64 siphash_4u32(u32, u32, u32, u32, const siphash_key_t *key);
+
+If you pass the generic siphash function something of a constant length, it
+will constant fold at compile-time and automatically choose one of the
+optimized functions.
+
+Hashtable key function usage::
+
+	struct some_hashtable {
+		DECLARE_HASHTABLE(hashtable, 8);
+		siphash_key_t key;
+	};
+
+	void init_hashtable(struct some_hashtable *table)
+	{
+		get_random_bytes(&table->key, sizeof(table->key));
+	}
+
+	static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
+	{
+		return &table->hashtable[siphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
+	}
+
+You may then iterate like usual over the returned hash bucket.
+
+Security
+========
+
+SipHash has a very high security margin, with its 128-bit key. So long as the
+key is kept secret, it is impossible for an attacker to guess the outputs of
+the function, even if being able to observe many outputs, since 2^128 outputs
+is significant.
+
+Linux implements the "2-4" variant of SipHash.
+
+Struct-passing Pitfalls
+=======================
+
+Often times the XuY functions will not be large enough, and instead you'll
+want to pass a pre-filled struct to siphash. When doing this, it's important
+to always ensure the struct has no padding holes. The easiest way to do this
+is to simply arrange the members of the struct in descending order of size,
+and to use offsetofend() instead of sizeof() for getting the size. For
+performance reasons, if possible, it's probably a good thing to align the
+struct to the right boundary. Here's an example::
+
+	const struct {
+		struct in6_addr saddr;
+		u32 counter;
+		u16 dport;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.counter = counter,
+		.dport = dport
+	};
+	u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret);
+
+Resources
+=========
+
+Read the SipHash paper if you're interested in learning more:
+https://131002.net/siphash/siphash.pdf
+
+-------------------------------------------------------------------------------
+
+===============================================
+HalfSipHash - SipHash's insecure younger cousin
+===============================================
+
+:Author: Written by Jason A. Donenfeld <jason@zx2c4.com>
+
+On the off-chance that SipHash is not fast enough for your needs, you might be
+able to justify using HalfSipHash, a terrifying but potentially useful
+possibility. HalfSipHash cuts SipHash's rounds down from "2-4" to "1-3" and,
+even scarier, uses an easily brute-forcable 64-bit key (with a 32-bit output)
+instead of SipHash's 128-bit key. However, this may appeal to some
+high-performance `jhash` users.
+
+HalfSipHash support is provided through the "hsiphash" family of functions.
+
+.. warning::
+   Do not ever use the hsiphash functions except for as a hashtable key
+   function, and only then when you can be absolutely certain that the outputs
+   will never be transmitted out of the kernel. This is only remotely useful
+   over `jhash` as a means of mitigating hashtable flooding denial of service
+   attacks.
+
+On 64-bit kernels, the hsiphash functions actually implement SipHash-1-3, a
+reduced-round variant of SipHash, instead of HalfSipHash-1-3. This is because in
+64-bit code, SipHash-1-3 is no slower than HalfSipHash-1-3, and can be faster.
+Note, this does *not* mean that in 64-bit kernels the hsiphash functions are the
+same as the siphash ones, or that they are secure; the hsiphash functions still
+use a less secure reduced-round algorithm and truncate their outputs to 32
+bits.
+
+Generating a hsiphash key
+=========================
+
+Keys should always be generated from a cryptographically secure source of
+random numbers, either using get_random_bytes or get_random_once::
+
+	hsiphash_key_t key;
+	get_random_bytes(&key, sizeof(key));
+
+If you're not deriving your key from here, you're doing it wrong.
+
+Using the hsiphash functions
+============================
+
+There are two variants of the function, one that takes a list of integers, and
+one that takes a buffer::
+
+	u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key);
+
+And::
+
+	u32 hsiphash_1u32(u32, const hsiphash_key_t *key);
+	u32 hsiphash_2u32(u32, u32, const hsiphash_key_t *key);
+	u32 hsiphash_3u32(u32, u32, u32, const hsiphash_key_t *key);
+	u32 hsiphash_4u32(u32, u32, u32, u32, const hsiphash_key_t *key);
+
+If you pass the generic hsiphash function something of a constant length, it
+will constant fold at compile-time and automatically choose one of the
+optimized functions.
+
+Hashtable key function usage
+============================
+
+::
+
+	struct some_hashtable {
+		DECLARE_HASHTABLE(hashtable, 8);
+		hsiphash_key_t key;
+	};
+
+	void init_hashtable(struct some_hashtable *table)
+	{
+		get_random_bytes(&table->key, sizeof(table->key));
+	}
+
+	static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
+	{
+		return &table->hashtable[hsiphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
+	}
+
+You may then iterate like usual over the returned hash bucket.
+
+Performance
+===========
+
+hsiphash() is roughly 3 times slower than jhash(). For many replacements, this
+will not be a problem, as the hashtable lookup isn't the bottleneck. And in
+general, this is probably a good sacrifice to make for the security and DoS
+resistance of hsiphash().
diff --git a/Documentation/security/tpm/index.rst b/Documentation/security/tpm/index.rst
new file mode 100644
index 0000000000..fc40e9f23c
--- /dev/null
+++ b/Documentation/security/tpm/index.rst
@@ -0,0 +1,10 @@
+=====================================
+Trusted Platform Module documentation
+=====================================
+
+.. toctree::
+
+   tpm_event_log
+   tpm_vtpm_proxy
+   xen-tpmfront
+   tpm_ftpm_tee
diff --git a/Documentation/security/tpm/tpm_event_log.rst b/Documentation/security/tpm/tpm_event_log.rst
new file mode 100644
index 0000000000..f00f7a1d5e
--- /dev/null
+++ b/Documentation/security/tpm/tpm_event_log.rst
@@ -0,0 +1,55 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+TPM Event Log
+=============
+
+This document briefly describes what TPM log is and how it is handed
+over from the preboot firmware to the operating system.
+
+Introduction
+============
+
+The preboot firmware maintains an event log that gets new entries every
+time something gets hashed by it to any of the PCR registers. The events
+are segregated by their type and contain the value of the hashed PCR
+register. Typically, the preboot firmware will hash the components to
+who execution is to be handed over or actions relevant to the boot
+process.
+
+The main application for this is remote attestation and the reason why
+it is useful is nicely put in the very first section of [1]:
+
+"Attestation is used to provide information about the platform’s state
+to a challenger. However, PCR contents are difficult to interpret;
+therefore, attestation is typically more useful when the PCR contents
+are accompanied by a measurement log. While not trusted on their own,
+the measurement log contains a richer set of information than do the PCR
+contents. The PCR contents are used to provide the validation of the
+measurement log."
+
+UEFI event log
+==============
+
+UEFI provided event log has a few somewhat weird quirks.
+
+Before calling ExitBootServices() Linux EFI stub copies the event log to
+a custom configuration table defined by the stub itself. Unfortunately,
+the events generated by ExitBootServices() don't end up in the table.
+
+The firmware provides so called final events configuration table to sort
+out this issue. Events gets mirrored to this table after the first time
+EFI_TCG2_PROTOCOL.GetEventLog() gets called.
+
+This introduces another problem: nothing guarantees that it is not called
+before the Linux EFI stub gets to run. Thus, it needs to calculate and save the
+final events table size while the stub is still running to the custom
+configuration table so that the TPM driver can later on skip these events when
+concatenating two halves of the event log from the custom configuration table
+and the final events table.
+
+References
+==========
+
+- [1] https://trustedcomputinggroup.org/resource/pc-client-specific-platform-firmware-profile-specification/
+- [2] The final concatenation is done in drivers/char/tpm/eventlog/efi.c
diff --git a/Documentation/security/tpm/tpm_ftpm_tee.rst b/Documentation/security/tpm/tpm_ftpm_tee.rst
new file mode 100644
index 0000000000..8c2bae16e3
--- /dev/null
+++ b/Documentation/security/tpm/tpm_ftpm_tee.rst
@@ -0,0 +1,27 @@
+=============================================
+Firmware TPM Driver
+=============================================
+
+This document describes the firmware Trusted Platform Module (fTPM)
+device driver.
+
+Introduction
+============
+
+This driver is a shim for firmware implemented in ARM's TrustZone
+environment. The driver allows programs to interact with the TPM in the same
+way they would interact with a hardware TPM.
+
+Design
+======
+
+The driver acts as a thin layer that passes commands to and from a TPM
+implemented in firmware. The driver itself doesn't contain much logic and is
+used more like a dumb pipe between firmware and kernel/userspace.
+
+The firmware itself is based on the following paper:
+https://www.microsoft.com/en-us/research/wp-content/uploads/2017/06/ftpm1.pdf
+
+When the driver is loaded it will expose ``/dev/tpmX`` character devices to
+userspace which will enable userspace to communicate with the firmware TPM
+through this device.
diff --git a/Documentation/security/tpm/tpm_vtpm_proxy.rst b/Documentation/security/tpm/tpm_vtpm_proxy.rst
new file mode 100644
index 0000000000..ea08e76b17
--- /dev/null
+++ b/Documentation/security/tpm/tpm_vtpm_proxy.rst
@@ -0,0 +1,50 @@
+=============================================
+Virtual TPM Proxy Driver for Linux Containers
+=============================================
+
+| Authors:
+| Stefan Berger <stefanb@linux.vnet.ibm.com>
+
+This document describes the virtual Trusted Platform Module (vTPM)
+proxy device driver for Linux containers.
+
+Introduction
+============
+
+The goal of this work is to provide TPM functionality to each Linux
+container. This allows programs to interact with a TPM in a container
+the same way they interact with a TPM on the physical system. Each
+container gets its own unique, emulated, software TPM.
+
+Design
+======
+
+To make an emulated software TPM available to each container, the container
+management stack needs to create a device pair consisting of a client TPM
+character device ``/dev/tpmX`` (with X=0,1,2...) and a 'server side' file
+descriptor. The former is moved into the container by creating a character
+device with the appropriate major and minor numbers while the file descriptor
+is passed to the TPM emulator. Software inside the container can then send
+TPM commands using the character device and the emulator will receive the
+commands via the file descriptor and use it for sending back responses.
+
+To support this, the virtual TPM proxy driver provides a device ``/dev/vtpmx``
+that is used to create device pairs using an ioctl. The ioctl takes as
+an input flags for configuring the device. The flags  for example indicate
+whether TPM 1.2 or TPM 2 functionality is supported by the TPM emulator.
+The result of the ioctl are the file descriptor for the 'server side'
+as well as the major and minor numbers of the character device that was created.
+Besides that the number of the TPM character device is returned. If for
+example ``/dev/tpm10`` was created, the number (``dev_num``) 10 is returned.
+
+Once the device has been created, the driver will immediately try to talk
+to the TPM. All commands from the driver can be read from the file descriptor
+returned by the ioctl. The commands should be responded to immediately.
+
+UAPI
+====
+
+.. kernel-doc:: include/uapi/linux/vtpm_proxy.h
+
+.. kernel-doc:: drivers/char/tpm/tpm_vtpm_proxy.c
+   :functions: vtpmx_ioc_new_dev
diff --git a/Documentation/security/tpm/xen-tpmfront.rst b/Documentation/security/tpm/xen-tpmfront.rst
new file mode 100644
index 0000000000..31c67522f2
--- /dev/null
+++ b/Documentation/security/tpm/xen-tpmfront.rst
@@ -0,0 +1,124 @@
+=============================
+Virtual TPM interface for Xen
+=============================
+
+Authors: Matthew Fioravante (JHUAPL), Daniel De Graaf (NSA)
+
+This document describes the virtual Trusted Platform Module (vTPM) subsystem for
+Xen. The reader is assumed to have familiarity with building and installing Xen,
+Linux, and a basic understanding of the TPM and vTPM concepts.
+
+Introduction
+------------
+
+The goal of this work is to provide a TPM functionality to a virtual guest
+operating system (in Xen terms, a DomU).  This allows programs to interact with
+a TPM in a virtual system the same way they interact with a TPM on the physical
+system.  Each guest gets its own unique, emulated, software TPM.  However, each
+of the vTPM's secrets (Keys, NVRAM, etc) are managed by a vTPM Manager domain,
+which seals the secrets to the Physical TPM.  If the process of creating each of
+these domains (manager, vTPM, and guest) is trusted, the vTPM subsystem extends
+the chain of trust rooted in the hardware TPM to virtual machines in Xen. Each
+major component of vTPM is implemented as a separate domain, providing secure
+separation guaranteed by the hypervisor. The vTPM domains are implemented in
+mini-os to reduce memory and processor overhead.
+
+This mini-os vTPM subsystem was built on top of the previous vTPM work done by
+IBM and Intel corporation.
+
+
+Design Overview
+---------------
+
+The architecture of vTPM is described below::
+
+  +------------------+
+  |    Linux DomU    | ...
+  |       |  ^       |
+  |       v  |       |
+  |   xen-tpmfront   |
+  +------------------+
+          |  ^
+          v  |
+  +------------------+
+  | mini-os/tpmback  |
+  |       |  ^       |
+  |       v  |       |
+  |  vtpm-stubdom    | ...
+  |       |  ^       |
+  |       v  |       |
+  | mini-os/tpmfront |
+  +------------------+
+          |  ^
+          v  |
+  +------------------+
+  | mini-os/tpmback  |
+  |       |  ^       |
+  |       v  |       |
+  | vtpmmgr-stubdom  |
+  |       |  ^       |
+  |       v  |       |
+  | mini-os/tpm_tis  |
+  +------------------+
+          |  ^
+          v  |
+  +------------------+
+  |   Hardware TPM   |
+  +------------------+
+
+* Linux DomU:
+	       The Linux based guest that wants to use a vTPM. There may be
+	       more than one of these.
+
+* xen-tpmfront.ko:
+		    Linux kernel virtual TPM frontend driver. This driver
+                    provides vTPM access to a Linux-based DomU.
+
+* mini-os/tpmback:
+		    Mini-os TPM backend driver. The Linux frontend driver
+		    connects to this backend driver to facilitate communications
+		    between the Linux DomU and its vTPM. This driver is also
+		    used by vtpmmgr-stubdom to communicate with vtpm-stubdom.
+
+* vtpm-stubdom:
+		 A mini-os stub domain that implements a vTPM. There is a
+		 one to one mapping between running vtpm-stubdom instances and
+                 logical vtpms on the system. The vTPM Platform Configuration
+                 Registers (PCRs) are normally all initialized to zero.
+
+* mini-os/tpmfront:
+		     Mini-os TPM frontend driver. The vTPM mini-os domain
+		     vtpm-stubdom uses this driver to communicate with
+		     vtpmmgr-stubdom. This driver is also used in mini-os
+		     domains such as pv-grub that talk to the vTPM domain.
+
+* vtpmmgr-stubdom:
+		    A mini-os domain that implements the vTPM manager. There is
+		    only one vTPM manager and it should be running during the
+		    entire lifetime of the machine.  This domain regulates
+		    access to the physical TPM on the system and secures the
+		    persistent state of each vTPM.
+
+* mini-os/tpm_tis:
+		    Mini-os TPM version 1.2 TPM Interface Specification (TIS)
+                    driver. This driver used by vtpmmgr-stubdom to talk directly to
+                    the hardware TPM. Communication is facilitated by mapping
+                    hardware memory pages into vtpmmgr-stubdom.
+
+* Hardware TPM:
+		The physical TPM that is soldered onto the motherboard.
+
+
+Integration With Xen
+--------------------
+
+Support for the vTPM driver was added in Xen using the libxl toolstack in Xen
+4.3.  See the Xen documentation (docs/misc/vtpm.txt) for details on setting up
+the vTPM and vTPM Manager stub domains.  Once the stub domains are running, a
+vTPM device is set up in the same manner as a disk or network device in the
+domain's configuration file.
+
+In order to use features such as IMA that require a TPM to be loaded prior to
+the initrd, the xen-tpmfront driver must be compiled in to the kernel.  If not
+using such features, the driver can be compiled as a module and will be loaded
+as usual.
author	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-11 08:27:49 +0000
committer	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-11 08:27:49 +0000
commit	ace9429bb58fd418f0c81d4c2835699bddf6bde6 (patch)
tree	b2d64bc10158fdd5497876388cd68142ca374ed3 /Documentation/security
parent	Initial commit. (diff)
download	linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.tar.xz linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.zip