Diffstat (limited to 'man2/userfaultfd.2')
-rw-r--r-- | man2/userfaultfd.2 | 104 |
1 files changed, 56 insertions, 48 deletions
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2 index 82903c6..27f4b69 100644 --- a/man2/userfaultfd.2 +++ b/man2/userfaultfd.2 @@ -4,7 +4,7 @@ .\" .\" SPDX-License-Identifier: Linux-man-pages-copyleft .\" -.TH userfaultfd 2 2023-05-03 "Linux man-pages 6.05.01" +.TH userfaultfd 2 2024-02-12 "Linux man-pages 6.7" .SH NAME userfaultfd \- create a file descriptor for handling page faults in user space .SH LIBRARY @@ -16,10 +16,10 @@ Standard C library .BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */" .BR "#include <linux/userfaultfd.h>" " /* Definition of " UFFD_* " constants */" .B #include <unistd.h> -.PP +.P .BI "int syscall(SYS_userfaultfd, int " flags ); .fi -.PP +.P .IR Note : glibc provides no wrapper for .BR userfaultfd (), @@ -32,7 +32,7 @@ handling to a user-space application, and returns a file descriptor that refers to the new object. The new userfaultfd object is configured using .BR ioctl (2). -.PP +.P Once the userfaultfd object is configured, the application can use .BR read (2) to receive userfaultfd notifications. @@ -41,7 +41,7 @@ depending on the value of .I flags used for the creation of the userfaultfd or subsequent calls to .BR fcntl (2). -.PP +.P The following values may be bitwise ORed in .I flags to change the behavior of @@ -69,12 +69,12 @@ When a kernel-originated fault was triggered on the registered range with this userfaultfd, a .B SIGBUS signal will be delivered. -.PP +.P When the last file descriptor referring to a userfaultfd object is closed, all memory ranges that were registered with the object are unregistered and unread events are flushed. .\" -.PP +.P Userfaultfd supports three modes of registration: .TP .BR UFFDIO_REGISTER_MODE_MISSING " (since Linux 4.10)" @@ -111,9 +111,9 @@ The faulted thread will be stopped from execution until user-space write-unprotects the page using an .B UFFDIO_WRITEPROTECT ioctl. -.PP +.P Multiple modes can be enabled at the same time for the same memory range. 
-.PP +.P Since Linux 4.14, a userfaultfd page-fault notification can selectively embed faulting thread ID information into the notification. One needs to enable this feature explicitly using the @@ -132,7 +132,7 @@ them using the operations described in .BR ioctl_userfaultfd (2). When servicing the page fault events, the fault-handling thread can trigger a wake-up for the sleeping thread. -.PP +.P It is possible for the faulting threads and the fault-handling threads to run in the context of different processes. In this case, these threads may belong to different programs, @@ -142,7 +142,7 @@ In such non-cooperative mode, the process that monitors userfaultfd and handles page faults needs to be aware of the changes in the virtual memory layout of the faulting process to avoid memory corruption. -.PP +.P Since Linux 4.11, userfaultfd can also notify the fault-handling threads about changes in the virtual memory layout of the faulting process. @@ -166,7 +166,7 @@ soon as the userfaultfd manager executes The userfaultfd manager should carefully synchronize calls to .B UFFDIO_COPY with the processing of events. -.PP +.P The current asynchronous model of the event delivery is optimal for single threaded non-cooperative userfaultfd manager implementations. .\" Regarding the preceding sentence, Mike Rapoport says: @@ -174,13 +174,13 @@ single threaded non-cooperative userfaultfd manager implementations. .\" problematic for multi-threaded monitor. I even suspect that it would be .\" impossible to ensure synchronization between page faults and non-page .\" fault events in multi-threaded monitor. -.\" .PP +.\" .P .\" FIXME elaborate about non-cooperating mode, describe its limitations .\" for kernels before Linux 4.11, features added in Linux 4.11 .\" and limitations remaining in Linux 4.11 .\" Maybe it's worth adding a dedicated sub-section... .\" -.PP +.P Since Linux 5.7, userfaultfd is able to do synchronous page dirty tracking using the new write-protect register mode. 
One should check against the feature bit @@ -200,14 +200,15 @@ the application must enable it using the .B UFFDIO_API .BR ioctl (2) operation. -This operation allows a handshake between the kernel and user space -to determine the API version and supported features. +This operation allows a two-step handshake between the kernel and user space +to determine what API version and features the kernel supports, +and then to enable those features user space wants. This operation must be performed before any of the other .BR ioctl (2) operations described below (or those operations fail with the .B EINVAL error). -.PP +.P After a successful .B UFFDIO_API operation, @@ -221,14 +222,14 @@ operation, a page fault occurring in the requested memory range, and satisfying the mode defined at the registration time, will be forwarded by the kernel to the user-space application. -The application can then use the -.B UFFDIO_COPY , -.B UFFDIO_ZEROPAGE , +The application can then use various (e.g., +.BR UFFDIO_COPY , +.BR UFFDIO_ZEROPAGE , or -.B UFFDIO_CONTINUE +.BR UFFDIO_CONTINUE ) .BR ioctl (2) operations to resolve the page fault. -.PP +.P Since Linux 4.14, if the application sets the .B UFFD_FEATURE_SIGBUS feature bit using the @@ -247,23 +248,23 @@ accesses. For example, this feature can be useful for applications that want to prevent the kernel from automatically allocating pages and filling holes in sparse files when the hole is accessed through a memory mapping. -.PP +.P The .B UFFD_FEATURE_SIGBUS feature is implicitly inherited through .BR fork (2) if used in combination with .BR UFFD_FEATURE_FORK . -.PP +.P Details of the various .BR ioctl (2) operations can be found in .BR ioctl_userfaultfd (2). -.PP +.P Since Linux 4.11, events other than page-fault may enabled during .B UFFDIO_API operation. -.PP +.P Up to Linux 4.11, userfaultfd can be used only with anonymous private memory mappings. 
Since Linux 4.11, @@ -276,13 +277,13 @@ The user needs to first check availability of this feature using ioctl against the feature bit .B UFFD_FEATURE_PAGEFAULT_FLAG_WP before using this feature. -.PP +.P Since Linux 5.19, the write-protection mode was also supported on shmem and hugetlbfs memory types. It can be detected with the feature bit .BR UFFD_FEATURE_WP_HUGETLBFS_SHMEM . -.PP +.P To register with userfaultfd write-protect mode, the user needs to initiate the .B UFFDIO_REGISTER ioctl with mode @@ -300,7 +301,7 @@ registered, user-space will receive any notification when a missing page is written. Instead, user-space will receive a write-protect page-fault notification only when an existing but write-protected page got written. -.PP +.P After the .B UFFDIO_REGISTER ioctl completed with @@ -312,7 +313,7 @@ where .I uffdio_writeprotect.mode should be set to .BR UFFDIO_WRITEPROTECT_MODE_WP . -.PP +.P When a write-protect event happens, user-space will receive a page-fault notification whose .I uffd_msg.pagefault.flags @@ -325,7 +326,7 @@ write-protect notifications will always have the bit set along with the .B UFFD_PAGEFAULT_FLAG_WP bit. -.PP +.P To resolve a write-protection page fault, the user should initiate another .B UFFDIO_WRITEPROTECT ioctl, whose @@ -351,14 +352,14 @@ since Linux 5.13, or .B UFFD_FEATURE_MINOR_SHMEM since Linux 5.14. -.PP +.P To register with userfaultfd minor fault mode, the user needs to initiate the .B UFFDIO_REGISTER ioctl with mode .B UFFD_REGISTER_MODE_MINOR set. -.PP +.P When a minor fault occurs, user-space will receive a page-fault notification whose @@ -366,7 +367,7 @@ whose will have the .B UFFD_PAGEFAULT_FLAG_MINOR flag set. -.PP +.P To resolve a minor page fault, the handler should decide whether or not the existing page contents need to be modified first. @@ -382,7 +383,7 @@ ioctl, which installs the page table entries and (by default) wakes up the faulting thread(s). 
-.PP +.P Minor fault mode supports only hugetlbfs-backed (since Linux 5.13) and shmem-backed (since Linux 5.14) memory. .\" @@ -393,7 +394,7 @@ from the userfaultfd file descriptor returns one or more .I uffd_msg structures, each of which describes a page-fault event or an event required for the non-cooperative userfaultfd usage: -.PP +.P .in +4n .EX struct uffd_msg { @@ -430,7 +431,7 @@ struct uffd_msg { } __packed; .EE .in -.PP +.P If multiple events are available and the supplied buffer is large enough, .BR read (2) returns as many events as will fit in the supplied buffer. @@ -442,7 +443,7 @@ structure, the .BR read (2) fails with the error .BR EINVAL . -.PP +.P The fields set in the .I uffd_msg structure are as follows: @@ -532,7 +533,7 @@ If this flag is set, then the fault was a minor fault. .TP .B UFFD_PAGEFAULT_FLAG_WRITE If this flag is set, then the fault was a write fault. -.PP +.P If neither .B UFFD_PAGEFAULT_FLAG_WP nor @@ -569,7 +570,7 @@ or unmapped The end address of the memory range that was freed using .BR madvise (2) or unmapped -.PP +.P A .BR read (2) on a userfaultfd file descriptor can fail with the following errors: @@ -579,7 +580,7 @@ The userfaultfd object has not yet been enabled using the .B UFFDIO_API .BR ioctl (2) operation -.PP +.P If the .B O_NONBLOCK flag is enabled in the associated open file description, @@ -636,7 +637,7 @@ has the value 0. Linux. .SH HISTORY Linux 4.3. -.PP +.P Support for hugetlbfs and shared memory areas and non-page-fault events was added in Linux 4.11 .SH NOTES @@ -666,7 +667,7 @@ The program creates two threads, one of which acts as the page-fault handler for the process, for the pages in a demand-page zero region created using .BR mmap (2). -.PP +.P The program takes one command-line argument, which is the number of pages that will be created in a mapping whose page faults will be handled via userfaultfd. @@ -678,13 +679,13 @@ and registers the address range of that mapping using the operation. 
The program then creates a second thread that will perform the task of handling page faults. -.PP +.P The main thread then walks through the pages of the mapping fetching bytes from successive pages. Because the pages have not yet been accessed, the first access of a byte in each page will trigger a page-fault event on the userfaultfd file descriptor. -.PP +.P Each of the page-fault events is handled by the second thread, which sits in a loop processing input from the userfaultfd file descriptor. In each loop iteration, the second thread first calls @@ -699,9 +700,9 @@ the faulting region using the .B UFFDIO_COPY .BR ioctl (2) operation. -.PP +.P The following is an example of what we see when running the program: -.PP +.P .in +4n .EX $ \fB./userfaultfd_demo 3\fP @@ -879,6 +880,13 @@ main(int argc, char *argv[]) if (uffd == \-1) err(EXIT_FAILURE, "userfaultfd"); \& + /* NOTE: Two-step feature handshake is not needed here, since this + example doesn't require any specific features. +\& + Programs that *do* should call UFFDIO_API twice: once with + `features = 0` to detect features supported by this kernel, and + again with the subset of features the program actually wants to + enable. */ uffdio_api.api = UFFD_API; uffdio_api.features = 0; if (ioctl(uffd, UFFDIO_API, &uffdio_api) == \-1) @@ -938,6 +946,6 @@ main(int argc, char *argv[]) .BR ioctl_userfaultfd (2), .BR madvise (2), .BR mmap (2) -.PP +.P .I Documentation/admin\-guide/mm/userfaultfd.rst in the Linux kernel source tree |
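
The comment added in the last hunk describes the two-step feature handshake only in words. A minimal sketch of what it could look like follows. The function names `select_features()` and `open_uffd_with_features()` are hypothetical and are not part of the patch or of any library. The sketch also makes one assumption that the patch does not state: it runs the probe on a throwaway descriptor and the enabling call on a fresh one, so that it does not rely on a given kernel accepting a second UFFDIO_API on the same descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: the subset of 'wanted' feature bits that the
   kernel reported as supported. */
uint64_t
select_features(uint64_t supported, uint64_t wanted)
{
    return supported & wanted;
}

/* Hypothetical helper: returns a userfaultfd descriptor with the
   supported subset of 'wanted' enabled, or -1 with errno set.
   On success, *enabled holds the feature bits that were requested
   from the kernel in step 2. */
int
open_uffd_with_features(uint64_t wanted, uint64_t *enabled)
{
    int                probe, uffd;
    struct uffdio_api  api;

    assert(enabled != NULL);

    /* Step 1: call UFFDIO_API with features = 0 so that the kernel
       reports the features it supports.  Assumption of this sketch:
       the probe uses its own descriptor, which is closed afterwards. */
    probe = (int) syscall(SYS_userfaultfd, O_CLOEXEC);
    if (probe == -1)
        return -1;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = 0;
    if (ioctl(probe, UFFDIO_API, &api) == -1) {
        close(probe);
        return -1;
    }
    close(probe);

    *enabled = select_features(api.features, wanted);

    /* Step 2: call UFFDIO_API on a fresh descriptor, asking only for
       the features that are both wanted and supported. */
    uffd = (int) syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd == -1)
        return -1;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = *enabled;
    if (ioctl(uffd, UFFDIO_API, &api) == -1) {
        close(uffd);
        return -1;
    }

    return uffd;
}
```

A caller that needs, say, UFFD_FEATURE_THREAD_ID would pass that bit as `wanted` and then compare `*enabled` against it to learn whether the feature is available before relying on it. The call can also fail outright (for example, with EPERM where unprivileged use of userfaultfd is restricted), so the return value should be checked as in the example program.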