From 399644e47874bff147afb19c89228901ac39340e Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Mon, 15 Apr 2024 21:40:15 +0200
Subject: Adding upstream version 6.05.01.

Signed-off-by: Daniel Baumann
---
 man2/memfd_secret.2 | 204 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 204 insertions(+)
 create mode 100644 man2/memfd_secret.2

diff --git a/man2/memfd_secret.2 b/man2/memfd_secret.2
new file mode 100644
index 0000000..fcc39f6
--- /dev/null
+++ b/man2/memfd_secret.2
@@ -0,0 +1,204 @@
+.\" Copyright (c) 2021, IBM Corporation.
+.\" Written by Mike Rapoport
+.\"
+.\" Based on memfd_create(2) man page
+.\" Copyright (C) 2014 Michael Kerrisk
+.\" and Copyright (C) 2014 David Herrmann
+.\"
+.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\"
+.TH memfd_secret 2 2023-03-30 "Linux man-pages 6.05.01"
+.SH NAME
+memfd_secret \- create an anonymous RAM-based file
+to access secret memory regions
+.SH LIBRARY
+Standard C library
+.RI ( libc ", " \-lc )
+.SH SYNOPSIS
+.nf
+.PP
+.BR "#include <sys/syscall.h>" "      /* Definition of " SYS_* " constants */"
+.B #include <unistd.h>
+.PP
+.BI "int syscall(SYS_memfd_secret, unsigned int " flags );
+.fi
+.PP
+.IR Note :
+glibc provides no wrapper for
+.BR memfd_secret (),
+necessitating the use of
+.BR syscall (2).
+.SH DESCRIPTION
+.BR memfd_secret ()
+creates an anonymous RAM-based file and returns a file descriptor
+that refers to it.
+The file provides a way to create and access memory regions
+with stronger protection than usual RAM-based files and
+anonymous memory mappings.
+Once all open references to the file are closed,
+it is automatically released.
+The initial size of the file is set to 0.
+Following the call, the file size should be set using
+.BR ftruncate (2).
+.PP
+The memory areas backing the file created with
+.BR memfd_secret (2)
+are visible only to the processes that have access to the file descriptor.
+The memory region is removed from the kernel page tables
+and only the page tables of the processes holding the file descriptor
+map the corresponding physical memory.
+(Thus, the pages in the region can't be accessed by the kernel itself,
+so that, for example, pointers to the region can't be passed to
+system calls.)
+.PP
+The following values may be bitwise ORed in
+.I flags
+to control the behavior of
+.BR memfd_secret ():
+.TP
+.B FD_CLOEXEC
+Set the close-on-exec flag on the new file descriptor,
+which causes the region to be removed from the process on
+.BR execve (2).
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2).
+.PP
+As its return value,
+.BR memfd_secret ()
+returns a new file descriptor that refers to an anonymous file.
+This file descriptor is opened for both reading and writing
+.RB ( O_RDWR )
+and
+.B O_LARGEFILE
+is set for the file descriptor.
+.PP
+With respect to
+.BR fork (2)
+and
+.BR execve (2),
+the usual semantics apply for the file descriptor created by
+.BR memfd_secret ().
+A copy of the file descriptor is inherited by the child produced by
+.BR fork (2)
+and refers to the same file.
+The file descriptor is preserved across
+.BR execve (2),
+unless the close-on-exec flag has been set.
+.PP
+The memory region is locked into memory in the same way as with
+.BR mlock (2),
+so that it will never be written into swap,
+and hibernation is inhibited for as long as any
+.BR memfd_secret ()
+descriptions exist.
+However, the implementation of
+.BR memfd_secret ()
+will not try to populate the whole range during the
+.BR mmap (2)
+call that attaches the region into the process's address space;
+instead, the pages are only actually allocated
+as they are faulted in.
+The amount of memory allowed for memory mappings
+of the file descriptor obeys the same rules as
+.BR mlock (2)
+and cannot exceed
+.BR RLIMIT_MEMLOCK .
+.SH RETURN VALUE
+On success,
+.BR memfd_secret ()
+returns a new file descriptor.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EINVAL
+.I flags
+included unknown bits.
+.TP
+.B EMFILE
+The per-process limit on the number of open file descriptors has been reached.
+.TP
+.B ENFILE
+The system-wide limit on the total number of open files has been reached.
+.TP
+.B ENOMEM
+There was insufficient memory to create a new anonymous file.
+.TP
+.B ENOSYS
+.BR memfd_secret ()
+is not implemented on this architecture,
+or has not been enabled on the kernel command-line with
+.BR secretmem.enable =1.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.14.
+.SH NOTES
+The
+.BR memfd_secret ()
+system call is designed to allow a user-space process
+to create a range of memory that is inaccessible to anybody else -
+kernel included.
+There is no 100% guarantee that the kernel won't be able to access
+memory ranges backed by
+.BR memfd_secret ()
+in any circumstances, but nevertheless,
+it is much harder to exfiltrate data from these regions.
+.PP
+.BR memfd_secret ()
+provides the following protections:
+.IP \[bu] 3
+Enhanced protection
+(in conjunction with all the other in-kernel attack prevention systems)
+against ROP attacks.
+Absence of any in-kernel primitive for accessing memory backed by
+.BR memfd_secret ()
+means that a one-gadget ROP attack
+can't work to perform data exfiltration.
+The attacker would need to find enough ROP gadgets
+to reconstruct the missing page table entries,
+which significantly increases the difficulty of the attack,
+especially when other protections like the kernel stack size limit
+and address space layout randomization are in place.
+.IP \[bu]
+Prevent cross-process user-space memory exposures.
+Once a region for a
+.BR memfd_secret ()
+memory mapping is allocated,
+the user can't accidentally pass it into the kernel
+to be transmitted somewhere.
+The memory pages in this region cannot be accessed via the direct map
+and they are disallowed in get_user_pages.
+.IP \[bu]
+Harden against exploited kernel flaws.
+In order to access memory areas backed by
+.BR memfd_secret (),
+a kernel-side attack would need to
+either walk the page tables and create new ones,
+or spawn a new privileged user-space process to perform
+secrets exfiltration using
+.BR ptrace (2).
+.PP
+The way
+.BR memfd_secret ()
+allocates and locks the memory may impact overall system performance,
+therefore the system call is disabled by default and only available
+if the system administrator turned it on using the
+"secretmem.enable=y" kernel parameter.
+.PP
+To prevent potential data leaks of memory regions backed by
+.BR memfd_secret ()
+from a hibernation image,
+hibernation is prevented when there are active
+.BR memfd_secret ()
+users.
+.SH SEE ALSO
+.BR fcntl (2),
+.BR ftruncate (2),
+.BR mlock (2),
+.BR memfd_create (2),
+.BR mmap (2),
+.BR setrlimit (2)
-- 
cgit v1.2.3
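
The call sequence the page describes (create the descriptor through syscall(2), size it with ftruncate(2), attach it with mmap(2)) can be sketched in C as below. This is a minimal illustration and not part of the patch. The helper name secret_roundtrip() and its return convention are hypothetical. The fallback value 447 for SYS_memfd_secret is an assumption that holds on x86-64 and arm64 and is only used when the installed headers lack the constant. The sketch passes 0 for flags and maps with MAP_SHARED, on the assumption that private mappings of the descriptor are rejected.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: headers older than Linux 5.14 may not define the constant;
 * 447 is the syscall number on x86-64 and arm64. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/*
 * Hypothetical helper: copy `secret` into a memfd_secret() mapping and
 * read it back from user space.
 * Returns 0 on success, 1 if memfd_secret() itself fails (for example
 * ENOSYS when the feature is disabled), and -1 on any later failure.
 */
int secret_roundtrip(const char *secret)
{
    size_t len = strlen(secret) + 1;
    long page = sysconf(_SC_PAGESIZE);

    /* Keep the sketch to a single page, well below RLIMIT_MEMLOCK defaults. */
    if (page <= 0 || len > (size_t)page)
        return -1;

    /* glibc provides no wrapper, so go through syscall(2). */
    int fd = (int)syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return 1;

    /* The file starts with size 0; set its size before mapping it. */
    if (ftruncate(fd, (off_t)page) == -1) {
        close(fd);
        return -1;
    }

    /* Pages are allocated lazily, as they are faulted in. */
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, secret, len);
    int ok = (memcmp(p, secret, len) == 0);

    /* Dropping the mapping and the descriptor releases the file. */
    munmap(p, (size_t)page);
    close(fd);
    return ok ? 0 : -1;
}
```

A caller would treat a return value of 1 as "secret memory not available on this system" and fall back to ordinary memory locked with mlock(2), or refuse to continue, depending on policy.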