summaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems/fscrypt.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems/fscrypt.rst')
-rw-r--r--Documentation/filesystems/fscrypt.rst1367
1 files changed, 1367 insertions, 0 deletions
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
new file mode 100644
index 000000000..5ba5817c1
--- /dev/null
+++ b/Documentation/filesystems/fscrypt.rst
@@ -0,0 +1,1367 @@
+=====================================
+Filesystem-level encryption (fscrypt)
+=====================================
+
+Introduction
+============
+
+fscrypt is a library which filesystems can hook into to support
+transparent encryption of files and directories.
+
+Note: "fscrypt" in this document refers to the kernel-level portion,
+implemented in ``fs/crypto/``, as opposed to the userspace tool
+`fscrypt <https://github.com/google/fscrypt>`_. This document only
+covers the kernel-level portion. For command-line examples of how to
+use encryption, see the documentation for the userspace tool `fscrypt
+<https://github.com/google/fscrypt>`_. Also, it is recommended to use
+the fscrypt userspace tool, or other existing userspace tools such as
+`fscryptctl <https://github.com/google/fscryptctl>`_ or `Android's key
+management system
+<https://source.android.com/security/encryption/file-based>`_, over
+using the kernel's API directly. Using existing tools reduces the
+chance of introducing your own security bugs. (Nevertheless, for
+completeness this documentation covers the kernel's API anyway.)
+
+Unlike dm-crypt, fscrypt operates at the filesystem level rather than
+at the block device level. This allows it to encrypt different files
+with different keys and to have unencrypted files on the same
+filesystem. This is useful for multi-user systems where each user's
+data-at-rest needs to be cryptographically isolated from the others.
+However, except for filenames, fscrypt does not encrypt filesystem
+metadata.
+
+Unlike eCryptfs, which is a stacked filesystem, fscrypt is integrated
+directly into supported filesystems --- currently ext4, F2FS, and
+UBIFS. This allows encrypted files to be read and written without
+caching both the decrypted and encrypted pages in the pagecache,
+thereby nearly halving the memory used and bringing it in line with
+unencrypted files. Similarly, half as many dentries and inodes are
+needed. eCryptfs also limits encrypted filenames to 143 bytes,
+causing application compatibility issues; fscrypt allows the full 255
+bytes (NAME_MAX). Finally, unlike eCryptfs, the fscrypt API can be
+used by unprivileged users, with no need to mount anything.
+
+fscrypt does not support encrypting files in-place. Instead, it
+supports marking an empty directory as encrypted. Then, after
+userspace provides the key, all regular files, directories, and
+symbolic links created in that directory tree are transparently
+encrypted.
+
+Threat model
+============
+
+Offline attacks
+---------------
+
+Provided that userspace chooses a strong encryption key, fscrypt
+protects the confidentiality of file contents and filenames in the
+event of a single point-in-time permanent offline compromise of the
+block device content. fscrypt does not protect the confidentiality of
+non-filename metadata, e.g. file sizes, file permissions, file
+timestamps, and extended attributes. Also, the existence and location
+of holes (unallocated blocks which logically contain all zeroes) in
+files is not protected.
+
+fscrypt is not guaranteed to protect confidentiality or authenticity
+if an attacker is able to manipulate the filesystem offline prior to
+an authorized user later accessing the filesystem.
+
+Online attacks
+--------------
+
+fscrypt (and storage encryption in general) can only provide limited
+protection, if any at all, against online attacks. In detail:
+
+Side-channel attacks
+~~~~~~~~~~~~~~~~~~~~
+
+fscrypt is only resistant to side-channel attacks, such as timing or
+electromagnetic attacks, to the extent that the underlying Linux
+Cryptographic API algorithms or inline encryption hardware are. If a
+vulnerable algorithm is used, such as a table-based implementation of
+AES, it may be possible for an attacker to mount a side channel attack
+against the online system. Side channel attacks may also be mounted
+against applications consuming decrypted data.
+
+Unauthorized file access
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+After an encryption key has been added, fscrypt does not hide the
+plaintext file contents or filenames from other users on the same
+system. Instead, existing access control mechanisms such as file mode
+bits, POSIX ACLs, LSMs, or namespaces should be used for this purpose.
+
+(For the reasoning behind this, understand that while the key is
+added, the confidentiality of the data, from the perspective of the
+system itself, is *not* protected by the mathematical properties of
+encryption but rather only by the correctness of the kernel.
+Therefore, any encryption-specific access control checks would merely
+be enforced by kernel *code* and therefore would be largely redundant
+with the wide variety of access control mechanisms already available.)
+
+Kernel memory compromise
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+An attacker who compromises the system enough to read from arbitrary
+memory, e.g. by mounting a physical attack or by exploiting a kernel
+security vulnerability, can compromise all encryption keys that are
+currently in use.
+
+However, fscrypt allows encryption keys to be removed from the kernel,
+which may protect them from later compromise.
+
+In more detail, the FS_IOC_REMOVE_ENCRYPTION_KEY ioctl (or the
+FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl) can wipe a master
+encryption key from kernel memory. If it does so, it will also try to
+evict all cached inodes which had been "unlocked" using the key,
+thereby wiping their per-file keys and making them once again appear
+"locked", i.e. in ciphertext or encrypted form.
+
+However, these ioctls have some limitations:
+
+- Per-file keys for in-use files will *not* be removed or wiped.
+ Therefore, for maximum effect, userspace should close the relevant
+ encrypted files and directories before removing a master key, as
+ well as kill any processes whose working directory is in an affected
+ encrypted directory.
+
+- The kernel cannot magically wipe copies of the master key(s) that
+ userspace might have as well. Therefore, userspace must wipe all
+ copies of the master key(s) it makes as well; normally this should
+ be done immediately after FS_IOC_ADD_ENCRYPTION_KEY, without waiting
+ for FS_IOC_REMOVE_ENCRYPTION_KEY. Naturally, the same also applies
+ to all higher levels in the key hierarchy. Userspace should also
+ follow other security precautions such as mlock()ing memory
+ containing keys to prevent it from being swapped out.
+
+- In general, decrypted contents and filenames in the kernel VFS
+ caches are freed but not wiped. Therefore, portions thereof may be
+ recoverable from freed memory, even after the corresponding key(s)
+ were wiped. To partially solve this, you can set
+ CONFIG_PAGE_POISONING=y in your kernel config and add page_poison=1
+ to your kernel command line. However, this has a performance cost.
+
+- Secret keys might still exist in CPU registers, in crypto
+ accelerator hardware (if used by the crypto API to implement any of
+ the algorithms), or in other places not explicitly considered here.
+
+Limitations of v1 policies
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+v1 encryption policies have some weaknesses with respect to online
+attacks:
+
+- There is no verification that the provided master key is correct.
+ Therefore, a malicious user can temporarily associate the wrong key
+ with another user's encrypted files to which they have read-only
+ access. Because of filesystem caching, the wrong key will then be
+ used by the other user's accesses to those files, even if the other
+ user has the correct key in their own keyring. This violates the
+ meaning of "read-only access".
+
+- A compromise of a per-file key also compromises the master key from
+ which it was derived.
+
+- Non-root users cannot securely remove encryption keys.
+
+All the above problems are fixed with v2 encryption policies. For
+this reason among others, it is recommended to use v2 encryption
+policies on all new encrypted directories.
+
+Key hierarchy
+=============
+
+Master Keys
+-----------
+
+Each encrypted directory tree is protected by a *master key*. Master
+keys can be up to 64 bytes long, and must be at least as long as the
+greater of the security strength of the contents and filenames
+encryption modes being used. For example, if any AES-256 mode is
+used, the master key must be at least 256 bits, i.e. 32 bytes. A
+stricter requirement applies if the key is used by a v1 encryption
+policy and AES-256-XTS is used; such keys must be 64 bytes.
+
+To "unlock" an encrypted directory tree, userspace must provide the
+appropriate master key. There can be any number of master keys, each
+of which protects any number of directory trees on any number of
+filesystems.
+
+Master keys must be real cryptographic keys, i.e. indistinguishable
+from random bytestrings of the same length. This implies that users
+**must not** directly use a password as a master key, zero-pad a
+shorter key, or repeat a shorter key. Security cannot be guaranteed
+if userspace makes any such error, as the cryptographic proofs and
+analysis would no longer apply.
+
+Instead, users should generate master keys either using a
+cryptographically secure random number generator, or by using a KDF
+(Key Derivation Function). The kernel does not do any key stretching;
+therefore, if userspace derives the key from a low-entropy secret such
+as a passphrase, it is critical that a KDF designed for this purpose
+be used, such as scrypt, PBKDF2, or Argon2.
+
+Key derivation function
+-----------------------
+
+With one exception, fscrypt never uses the master key(s) for
+encryption directly. Instead, they are only used as input to a KDF
+(Key Derivation Function) to derive the actual keys.
+
+The KDF used for a particular master key differs depending on whether
+the key is used for v1 encryption policies or for v2 encryption
+policies. Users **must not** use the same key for both v1 and v2
+encryption policies. (No real-world attack is currently known on this
+specific case of key reuse, but its security cannot be guaranteed
+since the cryptographic proofs and analysis would no longer apply.)
+
+For v1 encryption policies, the KDF only supports deriving per-file
+encryption keys. It works by encrypting the master key with
+AES-128-ECB, using the file's 16-byte nonce as the AES key. The
+resulting ciphertext is used as the derived key. If the ciphertext is
+longer than needed, then it is truncated to the needed length.
+
+For v2 encryption policies, the KDF is HKDF-SHA512. The master key is
+passed as the "input keying material", no salt is used, and a distinct
+"application-specific information string" is used for each distinct
+key to be derived. For example, when a per-file encryption key is
+derived, the application-specific information string is the file's
+nonce prefixed with "fscrypt\\0" and a context byte. Different
+context bytes are used for other types of derived keys.
+
+HKDF-SHA512 is preferred to the original AES-128-ECB based KDF because
+HKDF is more flexible, is nonreversible, and evenly distributes
+entropy from the master key. HKDF is also standardized and widely
+used by other software, whereas the AES-128-ECB based KDF is ad-hoc.
+
+Per-file encryption keys
+------------------------
+
+Since each master key can protect many files, it is necessary to
+"tweak" the encryption of each file so that the same plaintext in two
+files doesn't map to the same ciphertext, or vice versa. In most
+cases, fscrypt does this by deriving per-file keys. When a new
+encrypted inode (regular file, directory, or symlink) is created,
+fscrypt randomly generates a 16-byte nonce and stores it in the
+inode's encryption xattr. Then, it uses a KDF (as described in `Key
+derivation function`_) to derive the file's key from the master key
+and nonce.
+
+Key derivation was chosen over key wrapping because wrapped keys would
+require larger xattrs which would be less likely to fit in-line in the
+filesystem's inode table, and there didn't appear to be any
+significant advantages to key wrapping. In particular, currently
+there is no requirement to support unlocking a file with multiple
+alternative master keys or to support rotating master keys. Instead,
+the master keys may be wrapped in userspace, e.g. as is done by the
+`fscrypt <https://github.com/google/fscrypt>`_ tool.
+
+DIRECT_KEY policies
+-------------------
+
+The Adiantum encryption mode (see `Encryption modes and usage`_) is
+suitable for both contents and filenames encryption, and it accepts
+long IVs --- long enough to hold both an 8-byte logical block number
+and a 16-byte per-file nonce. Also, the overhead of each Adiantum key
+is greater than that of an AES-256-XTS key.
+
+Therefore, to improve performance and save memory, for Adiantum a
+"direct key" configuration is supported. When the user has enabled
+this by setting FSCRYPT_POLICY_FLAG_DIRECT_KEY in the fscrypt policy,
+per-file encryption keys are not used. Instead, whenever any data
+(contents or filenames) is encrypted, the file's 16-byte nonce is
+included in the IV. Moreover:
+
+- For v1 encryption policies, the encryption is done directly with the
+ master key. Because of this, users **must not** use the same master
+ key for any other purpose, even for other v1 policies.
+
+- For v2 encryption policies, the encryption is done with a per-mode
+ key derived using the KDF. Users may use the same master key for
+ other v2 encryption policies.
+
+IV_INO_LBLK_64 policies
+-----------------------
+
+When FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 is set in the fscrypt policy,
+the encryption keys are derived from the master key, encryption mode
+number, and filesystem UUID. This normally results in all files
+protected by the same master key sharing a single contents encryption
+key and a single filenames encryption key. To still encrypt different
+files' data differently, inode numbers are included in the IVs.
+Consequently, shrinking the filesystem may not be allowed.
+
+This format is optimized for use with inline encryption hardware
+compliant with the UFS standard, which supports only 64 IV bits per
+I/O request and may have only a small number of keyslots.
+
+IV_INO_LBLK_32 policies
+-----------------------
+
+IV_INO_LBLK_32 policies work like IV_INO_LBLK_64, except that for
+IV_INO_LBLK_32, the inode number is hashed with SipHash-2-4 (where the
+SipHash key is derived from the master key) and added to the file
+logical block number mod 2^32 to produce a 32-bit IV.
+
+This format is optimized for use with inline encryption hardware
+compliant with the eMMC v5.2 standard, which supports only 32 IV bits
+per I/O request and may have only a small number of keyslots. This
+format results in some level of IV reuse, so it should only be used
+when necessary due to hardware limitations.
+
+Key identifiers
+---------------
+
+For master keys used for v2 encryption policies, a unique 16-byte "key
+identifier" is also derived using the KDF. This value is stored in
+the clear, since it is needed to reliably identify the key itself.
+
+Dirhash keys
+------------
+
+For directories that are indexed using a secret-keyed dirhash over the
+plaintext filenames, the KDF is also used to derive a 128-bit
+SipHash-2-4 key per directory in order to hash filenames. This works
+just like deriving a per-file encryption key, except that a different
+KDF context is used. Currently, only casefolded ("case-insensitive")
+encrypted directories use this style of hashing.
+
+Encryption modes and usage
+==========================
+
+fscrypt allows one encryption mode to be specified for file contents
+and one encryption mode to be specified for filenames. Different
+directory trees are permitted to use different encryption modes.
+Currently, the following pairs of encryption modes are supported:
+
+- AES-256-XTS for contents and AES-256-CTS-CBC for filenames
+- AES-128-CBC for contents and AES-128-CTS-CBC for filenames
+- Adiantum for both contents and filenames
+- AES-256-XTS for contents and AES-256-HCTR2 for filenames (v2 policies only)
+
+If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
+
+AES-128-CBC was added only for low-powered embedded devices with
+crypto accelerators such as CAAM or CESA that do not support XTS. To
+use AES-128-CBC, CONFIG_CRYPTO_ESSIV and CONFIG_CRYPTO_SHA256 (or
+another SHA-256 implementation) must be enabled so that ESSIV can be
+used.
+
+Adiantum is a (primarily) stream cipher-based mode that is fast even
+on CPUs without dedicated crypto instructions. It's also a true
+wide-block mode, unlike XTS. It can also eliminate the need to derive
+per-file encryption keys. However, it depends on the security of two
+primitives, XChaCha12 and AES-256, rather than just one. See the
+paper "Adiantum: length-preserving encryption for entry-level
+processors" (https://eprint.iacr.org/2018/720.pdf) for more details.
+To use Adiantum, CONFIG_CRYPTO_ADIANTUM must be enabled. Also, fast
+implementations of ChaCha and NHPoly1305 should be enabled, e.g.
+CONFIG_CRYPTO_CHACHA20_NEON and CONFIG_CRYPTO_NHPOLY1305_NEON for ARM.
+
+AES-256-HCTR2 is another true wide-block encryption mode that is intended for
+use on CPUs with dedicated crypto instructions. AES-256-HCTR2 has the property
+that a bitflip in the plaintext changes the entire ciphertext. This property
+makes it desirable for filename encryption since initialization vectors are
+reused within a directory. For more details on AES-256-HCTR2, see the paper
+"Length-preserving encryption with HCTR2"
+(https://eprint.iacr.org/2021/1441.pdf). To use AES-256-HCTR2,
+CONFIG_CRYPTO_HCTR2 must be enabled. Also, fast implementations of XCTR and
+POLYVAL should be enabled, e.g. CRYPTO_POLYVAL_ARM64_CE and
+CRYPTO_AES_ARM64_CE_BLK for ARM64.
+
+New encryption modes can be added relatively easily, without changes
+to individual filesystems. However, authenticated encryption (AE)
+modes are not currently supported because of the difficulty of dealing
+with ciphertext expansion.
+
+Contents encryption
+-------------------
+
+For file contents, each filesystem block is encrypted independently.
+Starting from Linux kernel 5.5, encryption of filesystems with block
+size less than system's page size is supported.
+
+Each block's IV is set to the logical block number within the file as
+a little endian number, except that:
+
+- With CBC mode encryption, ESSIV is also used. Specifically, each IV
+ is encrypted with AES-256 where the AES-256 key is the SHA-256 hash
+ of the file's data encryption key.
+
+- With `DIRECT_KEY policies`_, the file's nonce is appended to the IV.
+ Currently this is only allowed with the Adiantum encryption mode.
+
+- With `IV_INO_LBLK_64 policies`_, the logical block number is limited
+ to 32 bits and is placed in bits 0-31 of the IV. The inode number
+ (which is also limited to 32 bits) is placed in bits 32-63.
+
+- With `IV_INO_LBLK_32 policies`_, the logical block number is limited
+ to 32 bits and is placed in bits 0-31 of the IV. The inode number
+ is then hashed and added mod 2^32.
+
+Note that because file logical block numbers are included in the IVs,
+filesystems must enforce that blocks are never shifted around within
+encrypted files, e.g. via "collapse range" or "insert range".
+
+Filenames encryption
+--------------------
+
+For filenames, each full filename is encrypted at once. Because of
+the requirements to retain support for efficient directory lookups and
+filenames of up to 255 bytes, the same IV is used for every filename
+in a directory.
+
+However, each encrypted directory still uses a unique key, or
+alternatively has the file's nonce (for `DIRECT_KEY policies`_) or
+inode number (for `IV_INO_LBLK_64 policies`_) included in the IVs.
+Thus, IV reuse is limited to within a single directory.
+
+With CTS-CBC, the IV reuse means that when the plaintext filenames share a
+common prefix at least as long as the cipher block size (16 bytes for AES), the
+corresponding encrypted filenames will also share a common prefix. This is
+undesirable. Adiantum and HCTR2 do not have this weakness, as they are
+wide-block encryption modes.
+
+All supported filenames encryption modes accept any plaintext length
+>= 16 bytes; cipher block alignment is not required. However,
+filenames shorter than 16 bytes are NUL-padded to 16 bytes before
+being encrypted. In addition, to reduce leakage of filename lengths
+via their ciphertexts, all filenames are NUL-padded to the next 4, 8,
+16, or 32-byte boundary (configurable). 32 is recommended since this
+provides the best confidentiality, at the cost of making directory
+entries consume slightly more space. Note that since NUL (``\0``) is
+not otherwise a valid character in filenames, the padding will never
+produce duplicate plaintexts.
+
+Symbolic link targets are considered a type of filename and are
+encrypted in the same way as filenames in directory entries, except
+that IV reuse is not a problem as each symlink has its own inode.
+
+User API
+========
+
+Setting an encryption policy
+----------------------------
+
+FS_IOC_SET_ENCRYPTION_POLICY
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FS_IOC_SET_ENCRYPTION_POLICY ioctl sets an encryption policy on an
+empty directory or verifies that a directory or regular file already
+has the specified encryption policy. It takes in a pointer to
+struct fscrypt_policy_v1 or struct fscrypt_policy_v2, defined as
+follows::
+
+ #define FSCRYPT_POLICY_V1 0
+ #define FSCRYPT_KEY_DESCRIPTOR_SIZE 8
+ struct fscrypt_policy_v1 {
+ __u8 version;
+ __u8 contents_encryption_mode;
+ __u8 filenames_encryption_mode;
+ __u8 flags;
+ __u8 master_key_descriptor[FSCRYPT_KEY_DESCRIPTOR_SIZE];
+ };
+ #define fscrypt_policy fscrypt_policy_v1
+
+ #define FSCRYPT_POLICY_V2 2
+ #define FSCRYPT_KEY_IDENTIFIER_SIZE 16
+ struct fscrypt_policy_v2 {
+ __u8 version;
+ __u8 contents_encryption_mode;
+ __u8 filenames_encryption_mode;
+ __u8 flags;
+ __u8 __reserved[4];
+ __u8 master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE];
+ };
+
+This structure must be initialized as follows:
+
+- ``version`` must be FSCRYPT_POLICY_V1 (0) if
+ struct fscrypt_policy_v1 is used or FSCRYPT_POLICY_V2 (2) if
+ struct fscrypt_policy_v2 is used. (Note: we refer to the original
+ policy version as "v1", though its version code is really 0.)
+ For new encrypted directories, use v2 policies.
+
+- ``contents_encryption_mode`` and ``filenames_encryption_mode`` must
+ be set to constants from ``<linux/fscrypt.h>`` which identify the
+ encryption modes to use. If unsure, use FSCRYPT_MODE_AES_256_XTS
+ (1) for ``contents_encryption_mode`` and FSCRYPT_MODE_AES_256_CTS
+ (4) for ``filenames_encryption_mode``.
+
+- ``flags`` contains optional flags from ``<linux/fscrypt.h>``:
+
+ - FSCRYPT_POLICY_FLAGS_PAD_*: The amount of NUL padding to use when
+ encrypting filenames. If unsure, use FSCRYPT_POLICY_FLAGS_PAD_32
+ (0x3).
+ - FSCRYPT_POLICY_FLAG_DIRECT_KEY: See `DIRECT_KEY policies`_.
+ - FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64: See `IV_INO_LBLK_64
+ policies`_.
+ - FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32: See `IV_INO_LBLK_32
+ policies`_.
+
+ v1 encryption policies only support the PAD_* and DIRECT_KEY flags.
+ The other flags are only supported by v2 encryption policies.
+
+ The DIRECT_KEY, IV_INO_LBLK_64, and IV_INO_LBLK_32 flags are
+ mutually exclusive.
+
+- For v2 encryption policies, ``__reserved`` must be zeroed.
+
+- For v1 encryption policies, ``master_key_descriptor`` specifies how
+ to find the master key in a keyring; see `Adding keys`_. It is up
+ to userspace to choose a unique ``master_key_descriptor`` for each
+ master key. The e4crypt and fscrypt tools use the first 8 bytes of
+ ``SHA-512(SHA-512(master_key))``, but this particular scheme is not
+ required. Also, the master key need not be in the keyring yet when
+ FS_IOC_SET_ENCRYPTION_POLICY is executed. However, it must be added
+ before any files can be created in the encrypted directory.
+
+ For v2 encryption policies, ``master_key_descriptor`` has been
+ replaced with ``master_key_identifier``, which is longer and cannot
+ be arbitrarily chosen. Instead, the key must first be added using
+ `FS_IOC_ADD_ENCRYPTION_KEY`_. Then, the ``key_spec.u.identifier``
+ the kernel returned in the struct fscrypt_add_key_arg must
+ be used as the ``master_key_identifier`` in
+ struct fscrypt_policy_v2.
+
+If the file is not yet encrypted, then FS_IOC_SET_ENCRYPTION_POLICY
+verifies that the file is an empty directory. If so, the specified
+encryption policy is assigned to the directory, turning it into an
+encrypted directory. After that, and after providing the
+corresponding master key as described in `Adding keys`_, all regular
+files, directories (recursively), and symlinks created in the
+directory will be encrypted, inheriting the same encryption policy.
+The filenames in the directory's entries will be encrypted as well.
+
+Alternatively, if the file is already encrypted, then
+FS_IOC_SET_ENCRYPTION_POLICY validates that the specified encryption
+policy exactly matches the actual one. If they match, then the ioctl
+returns 0. Otherwise, it fails with EEXIST. This works on both
+regular files and directories, including nonempty directories.
+
+When a v2 encryption policy is assigned to a directory, it is also
+required that either the specified key has been added by the current
+user or that the caller has CAP_FOWNER in the initial user namespace.
+(This is needed to prevent a user from encrypting their data with
+another user's key.) The key must remain added while
+FS_IOC_SET_ENCRYPTION_POLICY is executing. However, if the new
+encrypted directory does not need to be accessed immediately, then the
+key can be removed right away afterwards.
+
+Note that the ext4 filesystem does not allow the root directory to be
+encrypted, even if it is empty. Users who want to encrypt an entire
+filesystem with one key should consider using dm-crypt instead.
+
+FS_IOC_SET_ENCRYPTION_POLICY can fail with the following errors:
+
+- ``EACCES``: the file is not owned by the process's uid, nor does the
+ process have the CAP_FOWNER capability in a namespace with the file
+ owner's uid mapped
+- ``EEXIST``: the file is already encrypted with an encryption policy
+ different from the one specified
+- ``EINVAL``: an invalid encryption policy was specified (invalid
+ version, mode(s), or flags; or reserved bits were set); or a v1
+ encryption policy was specified but the directory has the casefold
+ flag enabled (casefolding is incompatible with v1 policies).
+- ``ENOKEY``: a v2 encryption policy was specified, but the key with
+ the specified ``master_key_identifier`` has not been added, nor does
+ the process have the CAP_FOWNER capability in the initial user
+ namespace
+- ``ENOTDIR``: the file is unencrypted and is a regular file, not a
+ directory
+- ``ENOTEMPTY``: the file is unencrypted and is a nonempty directory
+- ``ENOTTY``: this type of filesystem does not implement encryption
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+ support for filesystems, or the filesystem superblock has not
+ had encryption enabled on it. (For example, to use encryption on an
+ ext4 filesystem, CONFIG_FS_ENCRYPTION must be enabled in the
+ kernel config, and the superblock must have had the "encrypt"
+ feature flag enabled using ``tune2fs -O encrypt`` or ``mkfs.ext4 -O
+ encrypt``.)
+- ``EPERM``: this directory may not be encrypted, e.g. because it is
+ the root directory of an ext4 filesystem
+- ``EROFS``: the filesystem is readonly
+
+Getting an encryption policy
+----------------------------
+
+Two ioctls are available to get a file's encryption policy:
+
+- `FS_IOC_GET_ENCRYPTION_POLICY_EX`_
+- `FS_IOC_GET_ENCRYPTION_POLICY`_
+
+The extended (_EX) version of the ioctl is more general and is
+recommended to use when possible. However, on older kernels only the
+original ioctl is available. Applications should try the extended
+version, and if it fails with ENOTTY fall back to the original
+version.
+
+FS_IOC_GET_ENCRYPTION_POLICY_EX
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FS_IOC_GET_ENCRYPTION_POLICY_EX ioctl retrieves the encryption
+policy, if any, for a directory or regular file. No additional
+permissions are required beyond the ability to open the file. It
+takes in a pointer to struct fscrypt_get_policy_ex_arg,
+defined as follows::
+
+ struct fscrypt_get_policy_ex_arg {
+ __u64 policy_size; /* input/output */
+ union {
+ __u8 version;
+ struct fscrypt_policy_v1 v1;
+ struct fscrypt_policy_v2 v2;
+ } policy; /* output */
+ };
+
+The caller must initialize ``policy_size`` to the size available for
+the policy struct, i.e. ``sizeof(arg.policy)``.
+
+On success, the policy struct is returned in ``policy``, and its
+actual size is returned in ``policy_size``. ``policy.version`` should
+be checked to determine the version of policy returned. Note that the
+version code for the "v1" policy is actually 0 (FSCRYPT_POLICY_V1).
+
+FS_IOC_GET_ENCRYPTION_POLICY_EX can fail with the following errors:
+
+- ``EINVAL``: the file is encrypted, but it uses an unrecognized
+ encryption policy version
+- ``ENODATA``: the file is not encrypted
+- ``ENOTTY``: this type of filesystem does not implement encryption,
+ or this kernel is too old to support FS_IOC_GET_ENCRYPTION_POLICY_EX
+ (try FS_IOC_GET_ENCRYPTION_POLICY instead)
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+ support for this filesystem, or the filesystem superblock has not
+ had encryption enabled on it
+- ``EOVERFLOW``: the file is encrypted and uses a recognized
+ encryption policy version, but the policy struct does not fit into
+ the provided buffer
+
+Note: if you only need to know whether a file is encrypted or not, on
+most filesystems it is also possible to use the FS_IOC_GETFLAGS ioctl
+and check for FS_ENCRYPT_FL, or to use the statx() system call and
+check for STATX_ATTR_ENCRYPTED in stx_attributes.
+
+FS_IOC_GET_ENCRYPTION_POLICY
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FS_IOC_GET_ENCRYPTION_POLICY ioctl can also retrieve the
+encryption policy, if any, for a directory or regular file. However,
+unlike `FS_IOC_GET_ENCRYPTION_POLICY_EX`_,
+FS_IOC_GET_ENCRYPTION_POLICY only supports the original policy
+version. It takes in a pointer directly to struct fscrypt_policy_v1
+rather than struct fscrypt_get_policy_ex_arg.
+
+The error codes for FS_IOC_GET_ENCRYPTION_POLICY are the same as those
+for FS_IOC_GET_ENCRYPTION_POLICY_EX, except that
+FS_IOC_GET_ENCRYPTION_POLICY also returns ``EINVAL`` if the file is
+encrypted using a newer encryption policy version.
+
+Getting the per-filesystem salt
+-------------------------------
+
+Some filesystems, such as ext4 and F2FS, also support the deprecated
+ioctl FS_IOC_GET_ENCRYPTION_PWSALT. This ioctl retrieves a randomly
+generated 16-byte value stored in the filesystem superblock. This
+value is intended to used as a salt when deriving an encryption key
+from a passphrase or other low-entropy user credential.
+
+FS_IOC_GET_ENCRYPTION_PWSALT is deprecated. Instead, prefer to
+generate and manage any needed salt(s) in userspace.
+
+Getting a file's encryption nonce
+---------------------------------
+
+Since Linux v5.7, the ioctl FS_IOC_GET_ENCRYPTION_NONCE is supported.
+On encrypted files and directories it gets the inode's 16-byte nonce.
+On unencrypted files and directories, it fails with ENODATA.
+
+This ioctl can be useful for automated tests which verify that the
+encryption is being done correctly. It is not needed for normal use
+of fscrypt.
+
+Adding keys
+-----------
+
+FS_IOC_ADD_ENCRYPTION_KEY
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FS_IOC_ADD_ENCRYPTION_KEY ioctl adds a master encryption key to
+the filesystem, making all files on the filesystem which were
+encrypted using that key appear "unlocked", i.e. in plaintext form.
+It can be executed on any file or directory on the target filesystem,
+but using the filesystem's root directory is recommended. It takes in
+a pointer to struct fscrypt_add_key_arg, defined as follows::
+
+ struct fscrypt_add_key_arg {
+ struct fscrypt_key_specifier key_spec;
+ __u32 raw_size;
+ __u32 key_id;
+ __u32 __reserved[8];
+ __u8 raw[];
+ };
+
+ #define FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR 1
+ #define FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER 2
+
+ struct fscrypt_key_specifier {
+ __u32 type; /* one of FSCRYPT_KEY_SPEC_TYPE_* */
+ __u32 __reserved;
+ union {
+ __u8 __reserved[32]; /* reserve some extra space */
+ __u8 descriptor[FSCRYPT_KEY_DESCRIPTOR_SIZE];
+ __u8 identifier[FSCRYPT_KEY_IDENTIFIER_SIZE];
+ } u;
+ };
+
+ struct fscrypt_provisioning_key_payload {
+ __u32 type;
+ __u32 __reserved;
+ __u8 raw[];
+ };
+
+struct fscrypt_add_key_arg must be zeroed, then initialized
+as follows:
+
+- If the key is being added for use by v1 encryption policies, then
+ ``key_spec.type`` must contain FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR, and
+ ``key_spec.u.descriptor`` must contain the descriptor of the key
+ being added, corresponding to the value in the
+ ``master_key_descriptor`` field of struct fscrypt_policy_v1.
+ To add this type of key, the calling process must have the
+ CAP_SYS_ADMIN capability in the initial user namespace.
+
+ Alternatively, if the key is being added for use by v2 encryption
+ policies, then ``key_spec.type`` must contain
+ FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER, and ``key_spec.u.identifier`` is
+ an *output* field which the kernel fills in with a cryptographic
+ hash of the key. To add this type of key, the calling process does
+ not need any privileges. However, the number of keys that can be
+ added is limited by the user's quota for the keyrings service (see
+ ``Documentation/security/keys/core.rst``).
+
+- ``raw_size`` must be the size of the ``raw`` key provided, in bytes.
+ Alternatively, if ``key_id`` is nonzero, this field must be 0, since
+ in that case the size is implied by the specified Linux keyring key.
+
+- ``key_id`` is 0 if the raw key is given directly in the ``raw``
+ field. Otherwise ``key_id`` is the ID of a Linux keyring key of
+ type "fscrypt-provisioning" whose payload is
+ struct fscrypt_provisioning_key_payload whose ``raw`` field contains
+ the raw key and whose ``type`` field matches ``key_spec.type``.
+ Since ``raw`` is variable-length, the total size of this key's
+ payload must be ``sizeof(struct fscrypt_provisioning_key_payload)``
+ plus the raw key size. The process must have Search permission on
+ this key.
+
+ Most users should leave this 0 and specify the raw key directly.
+ The support for specifying a Linux keyring key is intended mainly to
+ allow re-adding keys after a filesystem is unmounted and re-mounted,
+ without having to store the raw keys in userspace memory.
+
+- ``raw`` is a variable-length field which must contain the actual
+ key, ``raw_size`` bytes long. Alternatively, if ``key_id`` is
+ nonzero, then this field is unused.
+
+For v2 policy keys, the kernel keeps track of which user (identified
+by effective user ID) added the key, and only allows the key to be
+removed by that user --- or by "root", if they use
+`FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS`_.
+
+However, if another user has added the key, it may be desirable to
+prevent that other user from unexpectedly removing it. Therefore,
+FS_IOC_ADD_ENCRYPTION_KEY may also be used to add a v2 policy key
+*again*, even if it's already added by other user(s). In this case,
+FS_IOC_ADD_ENCRYPTION_KEY will just install a claim to the key for the
+current user, rather than actually add the key again (but the raw key
+must still be provided, as a proof of knowledge).
+
+FS_IOC_ADD_ENCRYPTION_KEY returns 0 if either the key or a claim to
+the key was either added or already exists.
+
+FS_IOC_ADD_ENCRYPTION_KEY can fail with the following errors:
+
+- ``EACCES``: FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR was specified, but the
+ caller does not have the CAP_SYS_ADMIN capability in the initial
+ user namespace; or the raw key was specified by Linux key ID but the
+ process lacks Search permission on the key.
+- ``EDQUOT``: the key quota for this user would be exceeded by adding
+ the key
+- ``EINVAL``: invalid key size or key specifier type, or reserved bits
+ were set
+- ``EKEYREJECTED``: the raw key was specified by Linux key ID, but the
+ key has the wrong type
+- ``ENOKEY``: the raw key was specified by Linux key ID, but no key
+ exists with that ID
+- ``ENOTTY``: this type of filesystem does not implement encryption
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+ support for this filesystem, or the filesystem superblock has not
+ had encryption enabled on it
+
+Legacy method
+~~~~~~~~~~~~~
+
+For v1 encryption policies, a master encryption key can also be
+provided by adding it to a process-subscribed keyring, e.g. to a
+session keyring, or to a user keyring if the user keyring is linked
+into the session keyring.
+
+This method is deprecated (and not supported for v2 encryption
+policies) for several reasons. First, it cannot be used in
+combination with FS_IOC_REMOVE_ENCRYPTION_KEY (see `Removing keys`_),
+so for removing a key a workaround such as keyctl_unlink() in
+combination with ``sync; echo 2 > /proc/sys/vm/drop_caches`` would
+have to be used. Second, it doesn't match the fact that the
+locked/unlocked status of encrypted files (i.e. whether they appear to
+be in plaintext form or in ciphertext form) is global. This mismatch
+has caused much confusion as well as real problems when processes
+running under different UIDs, such as a ``sudo`` command, need to
+access encrypted files.
+
+Nevertheless, to add a key to one of the process-subscribed keyrings,
+the add_key() system call can be used (see:
+``Documentation/security/keys/core.rst``). The key type must be
+"logon"; keys of this type are kept in kernel memory and cannot be
+read back by userspace. The key description must be "fscrypt:"
+followed by the 16-character lower case hex representation of the
+``master_key_descriptor`` that was set in the encryption policy. The
+key payload must conform to the following structure::
+
+ #define FSCRYPT_MAX_KEY_SIZE 64
+
+ struct fscrypt_key {
+ __u32 mode;
+ __u8 raw[FSCRYPT_MAX_KEY_SIZE];
+ __u32 size;
+ };
+
+``mode`` is ignored; just set it to 0. The actual key is provided in
+``raw`` with ``size`` indicating its size in bytes. That is, the
+bytes ``raw[0..size-1]`` (inclusive) are the actual key.
+
+The key description prefix "fscrypt:" may alternatively be replaced
+with a filesystem-specific prefix such as "ext4:". However, the
+filesystem-specific prefixes are deprecated and should not be used in
+new programs.
+
+Removing keys
+-------------
+
+Two ioctls are available for removing a key that was added by
+`FS_IOC_ADD_ENCRYPTION_KEY`_:
+
+- `FS_IOC_REMOVE_ENCRYPTION_KEY`_
+- `FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS`_
+
+These two ioctls differ only in cases where v2 policy keys are added
+or removed by non-root users.
+
+These ioctls don't work on keys that were added via the legacy
+process-subscribed keyrings mechanism.
+
+Before using these ioctls, read the `Kernel memory compromise`_
+section for a discussion of the security goals and limitations of
+these ioctls.
+
+FS_IOC_REMOVE_ENCRYPTION_KEY
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FS_IOC_REMOVE_ENCRYPTION_KEY ioctl removes a claim to a master
+encryption key from the filesystem, and possibly removes the key
+itself. It can be executed on any file or directory on the target
+filesystem, but using the filesystem's root directory is recommended.
+It takes in a pointer to struct fscrypt_remove_key_arg, defined
+as follows::
+
+ struct fscrypt_remove_key_arg {
+ struct fscrypt_key_specifier key_spec;
+ #define FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY 0x00000001
+ #define FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS 0x00000002
+ __u32 removal_status_flags; /* output */
+ __u32 __reserved[5];
+ };
+
+This structure must be zeroed, then initialized as follows:
+
+- The key to remove is specified by ``key_spec``:
+
+ - To remove a key used by v1 encryption policies, set
+ ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR and fill
+ in ``key_spec.u.descriptor``. To remove this type of key, the
+ calling process must have the CAP_SYS_ADMIN capability in the
+ initial user namespace.
+
+ - To remove a key used by v2 encryption policies, set
+ ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER and fill
+ in ``key_spec.u.identifier``.
+
+For v2 policy keys, this ioctl is usable by non-root users. However,
+to make this possible, it actually just removes the current user's
+claim to the key, undoing a single call to FS_IOC_ADD_ENCRYPTION_KEY.
+Only after all claims are removed is the key really removed.
+
+For example, if FS_IOC_ADD_ENCRYPTION_KEY was called with uid 1000,
+then the key will be "claimed" by uid 1000, and
+FS_IOC_REMOVE_ENCRYPTION_KEY will only succeed as uid 1000. Or, if
+both uids 1000 and 2000 added the key, then for each uid
+FS_IOC_REMOVE_ENCRYPTION_KEY will only remove their own claim. Only
+once *both* are removed is the key really removed. (Think of it like
+unlinking a file that may have hard links.)
+
+If FS_IOC_REMOVE_ENCRYPTION_KEY really removes the key, it will also
+try to "lock" all files that had been unlocked with the key. It won't
+lock files that are still in-use, so this ioctl is expected to be used
+in cooperation with userspace ensuring that none of the files are
+still open. However, if necessary, this ioctl can be executed again
+later to retry locking any remaining files.
+
+FS_IOC_REMOVE_ENCRYPTION_KEY returns 0 if either the key was removed
+(but may still have files remaining to be locked), the user's claim to
+the key was removed, or the key was already removed but had files
+remaining to be the locked so the ioctl retried locking them. In any
+of these cases, ``removal_status_flags`` is filled in with the
+following informational status flags:
+
+- ``FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY``: set if some file(s)
+ are still in-use. Not guaranteed to be set in the case where only
+ the user's claim to the key was removed.
+- ``FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS``: set if only the
+ user's claim to the key was removed, not the key itself
+
+FS_IOC_REMOVE_ENCRYPTION_KEY can fail with the following errors:
+
+- ``EACCES``: The FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR key specifier type
+ was specified, but the caller does not have the CAP_SYS_ADMIN
+ capability in the initial user namespace
+- ``EINVAL``: invalid key specifier type, or reserved bits were set
+- ``ENOKEY``: the key object was not found at all, i.e. it was never
+ added in the first place or was already fully removed including all
+ files locked; or, the user does not have a claim to the key (but
+ someone else does).
+- ``ENOTTY``: this type of filesystem does not implement encryption
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+ support for this filesystem, or the filesystem superblock has not
+ had encryption enabled on it
+
+FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS is exactly the same as
+`FS_IOC_REMOVE_ENCRYPTION_KEY`_, except that for v2 policy keys, the
+ALL_USERS version of the ioctl will remove all users' claims to the
+key, not just the current user's. I.e., the key itself will always be
+removed, no matter how many users have added it. This difference is
+only meaningful if non-root users are adding and removing keys.
+
+Because of this, FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS also requires
+"root", namely the CAP_SYS_ADMIN capability in the initial user
+namespace. Otherwise it will fail with EACCES.
+
+Getting key status
+------------------
+
+FS_IOC_GET_ENCRYPTION_KEY_STATUS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl retrieves the status of a
+master encryption key. It can be executed on any file or directory on
+the target filesystem, but using the filesystem's root directory is
+recommended. It takes in a pointer to
+struct fscrypt_get_key_status_arg, defined as follows::
+
+ struct fscrypt_get_key_status_arg {
+ /* input */
+ struct fscrypt_key_specifier key_spec;
+ __u32 __reserved[6];
+
+ /* output */
+ #define FSCRYPT_KEY_STATUS_ABSENT 1
+ #define FSCRYPT_KEY_STATUS_PRESENT 2
+ #define FSCRYPT_KEY_STATUS_INCOMPLETELY_REMOVED 3
+ __u32 status;
+ #define FSCRYPT_KEY_STATUS_FLAG_ADDED_BY_SELF 0x00000001
+ __u32 status_flags;
+ __u32 user_count;
+ __u32 __out_reserved[13];
+ };
+
+The caller must zero all input fields, then fill in ``key_spec``:
+
+ - To get the status of a key for v1 encryption policies, set
+ ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR and fill
+ in ``key_spec.u.descriptor``.
+
+ - To get the status of a key for v2 encryption policies, set
+ ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER and fill
+ in ``key_spec.u.identifier``.
+
+On success, 0 is returned and the kernel fills in the output fields:
+
+- ``status`` indicates whether the key is absent, present, or
+ incompletely removed. Incompletely removed means that the master
+ secret has been removed, but some files are still in use; i.e.,
+ `FS_IOC_REMOVE_ENCRYPTION_KEY`_ returned 0 but set the informational
+ status flag FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY.
+
+- ``status_flags`` can contain the following flags:
+
+ - ``FSCRYPT_KEY_STATUS_FLAG_ADDED_BY_SELF`` indicates that the key
+ has added by the current user. This is only set for keys
+ identified by ``identifier`` rather than by ``descriptor``.
+
+- ``user_count`` specifies the number of users who have added the key.
+ This is only set for keys identified by ``identifier`` rather than
+ by ``descriptor``.
+
+FS_IOC_GET_ENCRYPTION_KEY_STATUS can fail with the following errors:
+
+- ``EINVAL``: invalid key specifier type, or reserved bits were set
+- ``ENOTTY``: this type of filesystem does not implement encryption
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+ support for this filesystem, or the filesystem superblock has not
+ had encryption enabled on it
+
+Among other use cases, FS_IOC_GET_ENCRYPTION_KEY_STATUS can be useful
+for determining whether the key for a given encrypted directory needs
+to be added before prompting the user for the passphrase needed to
+derive the key.
+
+FS_IOC_GET_ENCRYPTION_KEY_STATUS can only get the status of keys in
+the filesystem-level keyring, i.e. the keyring managed by
+`FS_IOC_ADD_ENCRYPTION_KEY`_ and `FS_IOC_REMOVE_ENCRYPTION_KEY`_. It
+cannot get the status of a key that has only been added for use by v1
+encryption policies using the legacy mechanism involving
+process-subscribed keyrings.
+
+Access semantics
+================
+
+With the key
+------------
+
+With the encryption key, encrypted regular files, directories, and
+symlinks behave very similarly to their unencrypted counterparts ---
+after all, the encryption is intended to be transparent. However,
+astute users may notice some differences in behavior:
+
+- Unencrypted files, or files encrypted with a different encryption
+ policy (i.e. different key, modes, or flags), cannot be renamed or
+ linked into an encrypted directory; see `Encryption policy
+ enforcement`_. Attempts to do so will fail with EXDEV. However,
+ encrypted files can be renamed within an encrypted directory, or
+ into an unencrypted directory.
+
+ Note: "moving" an unencrypted file into an encrypted directory, e.g.
+ with the `mv` program, is implemented in userspace by a copy
+ followed by a delete. Be aware that the original unencrypted data
+ may remain recoverable from free space on the disk; prefer to keep
+ all files encrypted from the very beginning. The `shred` program
+ may be used to overwrite the source files but isn't guaranteed to be
+ effective on all filesystems and storage devices.
+
+- Direct I/O is supported on encrypted files only under some
+ circumstances. For details, see `Direct I/O support`_.
+
+- The fallocate operations FALLOC_FL_COLLAPSE_RANGE and
+ FALLOC_FL_INSERT_RANGE are not supported on encrypted files and will
+ fail with EOPNOTSUPP.
+
+- Online defragmentation of encrypted files is not supported. The
+ EXT4_IOC_MOVE_EXT and F2FS_IOC_MOVE_RANGE ioctls will fail with
+ EOPNOTSUPP.
+
+- The ext4 filesystem does not support data journaling with encrypted
+ regular files. It will fall back to ordered data mode instead.
+
+- DAX (Direct Access) is not supported on encrypted files.
+
+- The maximum length of an encrypted symlink is 2 bytes shorter than
+ the maximum length of an unencrypted symlink. For example, on an
+ EXT4 filesystem with a 4K block size, unencrypted symlinks can be up
+ to 4095 bytes long, while encrypted symlinks can only be up to 4093
+ bytes long (both lengths excluding the terminating null).
+
+Note that mmap *is* supported. This is possible because the pagecache
+for an encrypted file contains the plaintext, not the ciphertext.
+
+Without the key
+---------------
+
+Some filesystem operations may be performed on encrypted regular
+files, directories, and symlinks even before their encryption key has
+been added, or after their encryption key has been removed:
+
+- File metadata may be read, e.g. using stat().
+
+- Directories may be listed, in which case the filenames will be
+ listed in an encoded form derived from their ciphertext. The
+ current encoding algorithm is described in `Filename hashing and
+ encoding`_. The algorithm is subject to change, but it is
+ guaranteed that the presented filenames will be no longer than
+ NAME_MAX bytes, will not contain the ``/`` or ``\0`` characters, and
+ will uniquely identify directory entries.
+
+ The ``.`` and ``..`` directory entries are special. They are always
+ present and are not encrypted or encoded.
+
+- Files may be deleted. That is, nondirectory files may be deleted
+ with unlink() as usual, and empty directories may be deleted with
+ rmdir() as usual. Therefore, ``rm`` and ``rm -r`` will work as
+ expected.
+
+- Symlink targets may be read and followed, but they will be presented
+ in encrypted form, similar to filenames in directories. Hence, they
+ are unlikely to point to anywhere useful.
+
+Without the key, regular files cannot be opened or truncated.
+Attempts to do so will fail with ENOKEY. This implies that any
+regular file operations that require a file descriptor, such as
+read(), write(), mmap(), fallocate(), and ioctl(), are also forbidden.
+
+Also without the key, files of any type (including directories) cannot
+be created or linked into an encrypted directory, nor can a name in an
+encrypted directory be the source or target of a rename, nor can an
+O_TMPFILE temporary file be created in an encrypted directory. All
+such operations will fail with ENOKEY.
+
+It is not currently possible to backup and restore encrypted files
+without the encryption key. This would require special APIs which
+have not yet been implemented.
+
+Encryption policy enforcement
+=============================
+
+After an encryption policy has been set on a directory, all regular
+files, directories, and symbolic links created in that directory
+(recursively) will inherit that encryption policy. Special files ---
+that is, named pipes, device nodes, and UNIX domain sockets --- will
+not be encrypted.
+
+Except for those special files, it is forbidden to have unencrypted
+files, or files encrypted with a different encryption policy, in an
+encrypted directory tree. Attempts to link or rename such a file into
+an encrypted directory will fail with EXDEV. This is also enforced
+during ->lookup() to provide limited protection against offline
+attacks that try to disable or downgrade encryption in known locations
+where applications may later write sensitive data. It is recommended
+that systems implementing a form of "verified boot" take advantage of
+this by validating all top-level encryption policies prior to access.
+
+Inline encryption support
+=========================
+
+By default, fscrypt uses the kernel crypto API for all cryptographic
+operations (other than HKDF, which fscrypt partially implements
+itself). The kernel crypto API supports hardware crypto accelerators,
+but only ones that work in the traditional way where all inputs and
+outputs (e.g. plaintexts and ciphertexts) are in memory. fscrypt can
+take advantage of such hardware, but the traditional acceleration
+model isn't particularly efficient and fscrypt hasn't been optimized
+for it.
+
+Instead, many newer systems (especially mobile SoCs) have *inline
+encryption hardware* that can encrypt/decrypt data while it is on its
+way to/from the storage device. Linux supports inline encryption
+through a set of extensions to the block layer called *blk-crypto*.
+blk-crypto allows filesystems to attach encryption contexts to bios
+(I/O requests) to specify how the data will be encrypted or decrypted
+in-line. For more information about blk-crypto, see
+:ref:`Documentation/block/inline-encryption.rst <inline_encryption>`.
+
+On supported filesystems (currently ext4 and f2fs), fscrypt can use
+blk-crypto instead of the kernel crypto API to encrypt/decrypt file
+contents. To enable this, set CONFIG_FS_ENCRYPTION_INLINE_CRYPT=y in
+the kernel configuration, and specify the "inlinecrypt" mount option
+when mounting the filesystem.
+
+Note that the "inlinecrypt" mount option just specifies to use inline
+encryption when possible; it doesn't force its use. fscrypt will
+still fall back to using the kernel crypto API on files where the
+inline encryption hardware doesn't have the needed crypto capabilities
+(e.g. support for the needed encryption algorithm and data unit size)
+and where blk-crypto-fallback is unusable. (For blk-crypto-fallback
+to be usable, it must be enabled in the kernel configuration with
+CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y.)
+
+Currently fscrypt always uses the filesystem block size (which is
+usually 4096 bytes) as the data unit size. Therefore, it can only use
+inline encryption hardware that supports that data unit size.
+
+Inline encryption doesn't affect the ciphertext or other aspects of
+the on-disk format, so users may freely switch back and forth between
+using "inlinecrypt" and not using "inlinecrypt".
+
+Direct I/O support
+==================
+
+For direct I/O on an encrypted file to work, the following conditions
+must be met (in addition to the conditions for direct I/O on an
+unencrypted file):
+
+* The file must be using inline encryption. Usually this means that
+ the filesystem must be mounted with ``-o inlinecrypt`` and inline
+ encryption hardware must be present. However, a software fallback
+ is also available. For details, see `Inline encryption support`_.
+
+* The I/O request must be fully aligned to the filesystem block size.
+ This means that the file position the I/O is targeting, the lengths
+ of all I/O segments, and the memory addresses of all I/O buffers
+ must be multiples of this value. Note that the filesystem block
+ size may be greater than the logical block size of the block device.
+
+If either of the above conditions is not met, then direct I/O on the
+encrypted file will fall back to buffered I/O.
+
+Implementation details
+======================
+
+Encryption context
+------------------
+
+An encryption policy is represented on-disk by
+struct fscrypt_context_v1 or struct fscrypt_context_v2. It is up to
+individual filesystems to decide where to store it, but normally it
+would be stored in a hidden extended attribute. It should *not* be
+exposed by the xattr-related system calls such as getxattr() and
+setxattr() because of the special semantics of the encryption xattr.
+(In particular, there would be much confusion if an encryption policy
+were to be added to or removed from anything other than an empty
+directory.) These structs are defined as follows::
+
+ #define FSCRYPT_FILE_NONCE_SIZE 16
+
+ #define FSCRYPT_KEY_DESCRIPTOR_SIZE 8
+ struct fscrypt_context_v1 {
+ u8 version;
+ u8 contents_encryption_mode;
+ u8 filenames_encryption_mode;
+ u8 flags;
+ u8 master_key_descriptor[FSCRYPT_KEY_DESCRIPTOR_SIZE];
+ u8 nonce[FSCRYPT_FILE_NONCE_SIZE];
+ };
+
+ #define FSCRYPT_KEY_IDENTIFIER_SIZE 16
+ struct fscrypt_context_v2 {
+ u8 version;
+ u8 contents_encryption_mode;
+ u8 filenames_encryption_mode;
+ u8 flags;
+ u8 __reserved[4];
+ u8 master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE];
+ u8 nonce[FSCRYPT_FILE_NONCE_SIZE];
+ };
+
+The context structs contain the same information as the corresponding
+policy structs (see `Setting an encryption policy`_), except that the
+context structs also contain a nonce. The nonce is randomly generated
+by the kernel and is used as KDF input or as a tweak to cause
+different files to be encrypted differently; see `Per-file encryption
+keys`_ and `DIRECT_KEY policies`_.
+
+Data path changes
+-----------------
+
+When inline encryption is used, filesystems just need to associate
+encryption contexts with bios to specify how the block layer or the
+inline encryption hardware will encrypt/decrypt the file contents.
+
+When inline encryption isn't used, filesystems must encrypt/decrypt
+the file contents themselves, as described below:
+
+For the read path (->read_folio()) of regular files, filesystems can
+read the ciphertext into the page cache and decrypt it in-place. The
+page lock must be held until decryption has finished, to prevent the
+page from becoming visible to userspace prematurely.
+
+For the write path (->writepage()) of regular files, filesystems
+cannot encrypt data in-place in the page cache, since the cached
+plaintext must be preserved. Instead, filesystems must encrypt into a
+temporary buffer or "bounce page", then write out the temporary
+buffer. Some filesystems, such as UBIFS, already use temporary
+buffers regardless of encryption. Other filesystems, such as ext4 and
+F2FS, have to allocate bounce pages specially for encryption.
+
+Filename hashing and encoding
+-----------------------------
+
+Modern filesystems accelerate directory lookups by using indexed
+directories. An indexed directory is organized as a tree keyed by
+filename hashes. When a ->lookup() is requested, the filesystem
+normally hashes the filename being looked up so that it can quickly
+find the corresponding directory entry, if any.
+
+With encryption, lookups must be supported and efficient both with and
+without the encryption key. Clearly, it would not work to hash the
+plaintext filenames, since the plaintext filenames are unavailable
+without the key. (Hashing the plaintext filenames would also make it
+impossible for the filesystem's fsck tool to optimize encrypted
+directories.) Instead, filesystems hash the ciphertext filenames,
+i.e. the bytes actually stored on-disk in the directory entries. When
+asked to do a ->lookup() with the key, the filesystem just encrypts
+the user-supplied name to get the ciphertext.
+
+Lookups without the key are more complicated. The raw ciphertext may
+contain the ``\0`` and ``/`` characters, which are illegal in
+filenames. Therefore, readdir() must base64url-encode the ciphertext
+for presentation. For most filenames, this works fine; on ->lookup(),
+the filesystem just base64url-decodes the user-supplied name to get
+back to the raw ciphertext.
+
+However, for very long filenames, base64url encoding would cause the
+filename length to exceed NAME_MAX. To prevent this, readdir()
+actually presents long filenames in an abbreviated form which encodes
+a strong "hash" of the ciphertext filename, along with the optional
+filesystem-specific hash(es) needed for directory lookups. This
+allows the filesystem to still, with a high degree of confidence, map
+the filename given in ->lookup() back to a particular directory entry
+that was previously listed by readdir(). See
+struct fscrypt_nokey_name in the source for more details.
+
+Note that the precise way that filenames are presented to userspace
+without the key is subject to change in the future. It is only meant
+as a way to temporarily present valid filenames so that commands like
+``rm -r`` work as expected on encrypted directories.
+
+Tests
+=====
+
+To test fscrypt, use xfstests, which is Linux's de facto standard
+filesystem test suite. First, run all the tests in the "encrypt"
+group on the relevant filesystem(s). One can also run the tests
+with the 'inlinecrypt' mount option to test the implementation for
+inline encryption support. For example, to test ext4 and
+f2fs encryption using `kvm-xfstests
+<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::
+
+ kvm-xfstests -c ext4,f2fs -g encrypt
+ kvm-xfstests -c ext4,f2fs -g encrypt -m inlinecrypt
+
+UBIFS encryption can also be tested this way, but it should be done in
+a separate command, and it takes some time for kvm-xfstests to set up
+emulated UBI volumes::
+
+ kvm-xfstests -c ubifs -g encrypt
+
+No tests should fail. However, tests that use non-default encryption
+modes (e.g. generic/549 and generic/550) will be skipped if the needed
+algorithms were not built into the kernel's crypto API. Also, tests
+that access the raw block device (e.g. generic/399, generic/548,
+generic/549, generic/550) will be skipped on UBIFS.
+
+Besides running the "encrypt" group tests, for ext4 and f2fs it's also
+possible to run most xfstests with the "test_dummy_encryption" mount
+option. This option causes all new files to be automatically
+encrypted with a dummy key, without having to make any API calls.
+This tests the encrypted I/O paths more thoroughly. To do this with
+kvm-xfstests, use the "encrypt" filesystem configuration::
+
+ kvm-xfstests -c ext4/encrypt,f2fs/encrypt -g auto
+ kvm-xfstests -c ext4/encrypt,f2fs/encrypt -g auto -m inlinecrypt
+
+Because this runs many more tests than "-g encrypt" does, it takes
+much longer to run; so also consider using `gce-xfstests
+<https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md>`_
+instead of kvm-xfstests::
+
+ gce-xfstests -c ext4/encrypt,f2fs/encrypt -g auto
+ gce-xfstests -c ext4/encrypt,f2fs/encrypt -g auto -m inlinecrypt