diff options
Diffstat (limited to 'man2/mlock.2')
-rw-r--r-- | man2/mlock.2 | 507 |
1 files changed, 507 insertions, 0 deletions
diff --git a/man2/mlock.2 b/man2/mlock.2 new file mode 100644 index 0000000..1efe3dd --- /dev/null +++ b/man2/mlock.2 @@ -0,0 +1,507 @@ +.\" Copyright (C) Michael Kerrisk, 2004 +.\" using some material drawn from earlier man pages +.\" written by Thomas Kuhn, Copyright 1996 +.\" +.\" SPDX-License-Identifier: GPL-2.0-or-later +.\" +.TH mlock 2 2023-04-08 "Linux man-pages 6.05.01" +.SH NAME +mlock, mlock2, munlock, mlockall, munlockall \- lock and unlock memory +.SH LIBRARY +Standard C library +.RI ( libc ", " \-lc ) +.SH SYNOPSIS +.nf +.B #include <sys/mman.h> +.PP +.BI "int mlock(const void " addr [. len "], size_t " len ); +.BI "int mlock2(const void " addr [. len "], size_t " len ", \ +unsigned int " flags ); +.BI "int munlock(const void " addr [. len "], size_t " len ); +.PP +.BI "int mlockall(int " flags ); +.B int munlockall(void); +.fi +.SH DESCRIPTION +.BR mlock (), +.BR mlock2 (), +and +.BR mlockall () +lock part or all of the calling process's virtual address +space into RAM, preventing that memory from being paged to the +swap area. +.PP +.BR munlock () +and +.BR munlockall () +perform the converse operation, +unlocking part or all of the calling process's virtual address space, +so that pages in the specified virtual address range +can be swapped out again if required by the kernel memory manager. +.PP +Memory locking and unlocking are performed in units of whole pages. +.SS mlock(), mlock2(), and munlock() +.BR mlock () +locks pages in the address range starting at +.I addr +and continuing for +.I len +bytes. +All pages that contain a part of the specified address range are +guaranteed to be resident in RAM when the call returns successfully; +the pages are guaranteed to stay in RAM until later unlocked. +.PP +.BR mlock2 () +.\" commit a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e +.\" commit de60f5f10c58d4f34b68622442c0e04180367f3f +.\" commit b0f205c2a3082dd9081f9a94e50658c5fa906ff1 +also locks pages in the specified range starting at +.I addr +and continuing for +.I len +bytes. +However, the state of the pages contained in that range after the call +returns successfully will depend on the value in the +.I flags +argument. +.PP +The +.I flags +argument can be either 0 or the following constant: +.TP +.B MLOCK_ONFAULT +Lock pages that are currently resident and mark the entire range so +that the remaining nonresident pages are locked when they are populated +by a page fault. +.PP +If +.I flags +is 0, +.BR mlock2 () +behaves exactly the same as +.BR mlock (). +.PP +.BR munlock () +unlocks pages in the address range starting at +.I addr +and continuing for +.I len +bytes. +After this call, all pages that contain a part of the specified +memory range can be moved to external swap space again by the kernel. +.SS mlockall() and munlockall() +.BR mlockall () +locks all pages mapped into the address space of the +calling process. +This includes the pages of the code, data, and stack +segment, as well as shared libraries, user space kernel data, shared +memory, and memory-mapped files. +All mapped pages are guaranteed +to be resident in RAM when the call returns successfully; +the pages are guaranteed to stay in RAM until later unlocked. +.PP +The +.I flags +argument is constructed as the bitwise OR of one or more of the +following constants: +.TP +.B MCL_CURRENT +Lock all pages which are currently mapped into the address space of +the process. +.TP +.B MCL_FUTURE +Lock all pages which will become mapped into the address space of the +process in the future. +These could be, for instance, new pages required +by a growing heap and stack as well as new memory-mapped files or +shared memory regions. +.TP +.BR MCL_ONFAULT " (since Linux 4.4)" +Used together with +.BR MCL_CURRENT , +.BR MCL_FUTURE , +or both. +Mark all current (with +.BR MCL_CURRENT ) +or future (with +.BR MCL_FUTURE ) +mappings to lock pages when they are faulted in. +When used with +.BR MCL_CURRENT , +all present pages are locked, but +.BR mlockall () +will not fault in non-present pages. +When used with +.BR MCL_FUTURE , +all future mappings will be marked to lock pages when they are faulted +in, but they will not be populated by the lock when the mapping is +created. +.B MCL_ONFAULT +must be used with either +.B MCL_CURRENT +or +.B MCL_FUTURE +or both. +.PP +If +.B MCL_FUTURE +has been specified, then a later system call (e.g., +.BR mmap (2), +.BR sbrk (2), +.BR malloc (3)), +may fail if it would cause the number of locked bytes to exceed +the permitted maximum (see below). +In the same circumstances, stack growth may likewise fail: +the kernel will deny stack expansion and deliver a +.B SIGSEGV +signal to the process. +.PP +.BR munlockall () +unlocks all pages mapped into the address space of the +calling process. +.SH RETURN VALUE +On success, these system calls return 0. +On error, \-1 is returned, +.I errno +is set to indicate the error, +and no changes are made to any locks in the +address space of the process. +.SH ERRORS +.\"SVr4 documents an additional EAGAIN error code. +.TP +.B EAGAIN +.RB ( mlock (), +.BR mlock2 (), +and +.BR munlock ()) +Some or all of the specified address range could not be locked. +.TP +.B EINVAL +.RB ( mlock (), +.BR mlock2 (), +and +.BR munlock ()) +The result of the addition +.IR addr + len +was less than +.I addr +(e.g., the addition may have resulted in an overflow). +.TP +.B EINVAL +.RB ( mlock2 ()) +Unknown \fIflags\fP were specified. +.TP +.B EINVAL +.RB ( mlockall ()) +Unknown \fIflags\fP were specified or +.B MCL_ONFAULT +was specified without either +.B MCL_FUTURE +or +.BR MCL_CURRENT . +.TP +.B EINVAL +(Not on Linux) +.I addr +was not a multiple of the page size. +.TP +.B ENOMEM +.RB ( mlock (), +.BR mlock2 (), +and +.BR munlock ()) +Some of the specified address range does not correspond to mapped +pages in the address space of the process. +.TP +.B ENOMEM +.RB ( mlock (), +.BR mlock2 (), +and +.BR munlock ()) +Locking or unlocking a region would result in the total number of +mappings with distinct attributes (e.g., locked versus unlocked) +exceeding the allowed maximum. +.\" I.e., the number of VMAs would exceed the 64kB maximum +(For example, unlocking a range in the middle of a currently locked +mapping would result in three mappings: +two locked mappings at each end and an unlocked mapping in the middle.) +.TP +.B ENOMEM +(Linux 2.6.9 and later) the caller had a nonzero +.B RLIMIT_MEMLOCK +soft resource limit, but tried to lock more memory than the limit +permitted. +This limit is not enforced if the process is privileged +.RB ( CAP_IPC_LOCK ). +.TP +.B ENOMEM +(Linux 2.4 and earlier) the calling process tried to lock more than +half of RAM. +.\" In the case of mlock(), this check is somewhat buggy: it doesn't +.\" take into account whether the to-be-locked range overlaps with +.\" already locked pages. Thus, suppose we allocate +.\" (num_physpages / 4 + 1) of memory, and lock those pages once using +.\" mlock(), and then lock the *same* page range a second time. +.\" In the case, the second mlock() call will fail, since the check +.\" calculates that the process is trying to lock (num_physpages / 2 + 2) +.\" pages, which of course is not true. (MTK, Nov 04, kernel 2.4.28) +.TP +.B EPERM +The caller is not privileged, but needs privilege +.RB ( CAP_IPC_LOCK ) +to perform the requested operation. +.TP +.B EPERM +.RB ( munlockall ()) +(Linux 2.6.8 and earlier) The caller was not privileged +.RB ( CAP_IPC_LOCK ). +.SH VERSIONS +.SS Linux +Under Linux, +.BR mlock (), +.BR mlock2 (), +and +.BR munlock () +automatically round +.I addr +down to the nearest page boundary. +However, the POSIX.1 specification of +.BR mlock () +and +.BR munlock () +allows an implementation to require that +.I addr +is page aligned, so portable applications should ensure this. +.PP +The +.I VmLck +field of the Linux-specific +.IR /proc/ pid /status +file shows how many kilobytes of memory the process with ID +.I PID +has locked using +.BR mlock (), +.BR mlock2 (), +.BR mlockall (), +and +.BR mmap (2) +.BR MAP_LOCKED . +.SH STANDARDS +.TP +.BR mlock () +.TQ +.BR munlock () +.TQ +.BR mlockall () +.TQ +.BR munlockall () +POSIX.1-2008. +.TP +.BR mlock2 () +Linux. +.PP +On POSIX systems on which +.BR mlock () +and +.BR munlock () +are available, +.B _POSIX_MEMLOCK_RANGE +is defined in \fI<unistd.h>\fP and the number of bytes in a page +can be determined from the constant +.B PAGESIZE +(if defined) in \fI<limits.h>\fP or by calling +.IR sysconf(_SC_PAGESIZE) . +.PP +On POSIX systems on which +.BR mlockall () +and +.BR munlockall () +are available, +.B _POSIX_MEMLOCK +is defined in \fI<unistd.h>\fP to a value greater than 0. +(See also +.BR sysconf (3).) +.\" POSIX.1-2001: It shall be defined to -1 or 0 or 200112L. +.\" -1: unavailable, 0: ask using sysconf(). +.\" glibc defines it to 1. +.SH HISTORY +.TP +.BR mlock () +.TQ +.BR munlock () +.TQ +.BR mlockall () +.TQ +.BR munlockall () +POSIX.1-2001, POSIX.1-2008, SVr4. +.TP +.BR mlock2 () +Linux 4.4, +glibc 2.27. +.SH NOTES +Memory locking has two main applications: real-time algorithms and +high-security data processing. +Real-time applications require +deterministic timing, and, like scheduling, paging is one major cause +of unexpected program execution delays. +Real-time applications will +usually also switch to a real-time scheduler with +.BR sched_setscheduler (2). +Cryptographic security software often handles critical bytes like +passwords or secret keys as data structures. +As a result of paging, +these secrets could be transferred onto a persistent swap store medium, +where they might be accessible to the enemy long after the security +software has erased the secrets in RAM and terminated. +(But be aware that the suspend mode on laptops and some desktop +computers will save a copy of the system's RAM to disk, regardless +of memory locks.) +.PP +Real-time processes that are using +.BR mlockall () +to prevent delays on page faults should reserve enough +locked stack pages before entering the time-critical section, +so that no page fault can be caused by function calls. +This can be achieved by calling a function that allocates a +sufficiently large automatic variable (an array) and writes to the +memory occupied by this array in order to touch these stack pages. +This way, enough pages will be mapped for the stack and can be +locked into RAM. +The dummy writes ensure that not even copy-on-write +page faults can occur in the critical section. +.PP +Memory locks are not inherited by a child created via +.BR fork (2) +and are automatically removed (unlocked) during an +.BR execve (2) +or when the process terminates. +The +.BR mlockall () +.B MCL_FUTURE +and +.B MCL_FUTURE | MCL_ONFAULT +settings are not inherited by a child created via +.BR fork (2) +and are cleared during an +.BR execve (2). +.PP +Note that +.BR fork (2) +will prepare the address space for a copy-on-write operation. +The consequence is that any write access that follows will cause +a page fault that in turn may cause high latencies for a real-time process. +Therefore, it is crucial not to invoke +.BR fork (2) +after an +.BR mlockall () +or +.BR mlock () +operation\[em]not even from a thread which runs at a low priority within +a process which also has a thread running at elevated priority. +.PP +The memory lock on an address range is automatically removed +if the address range is unmapped via +.BR munmap (2). +.PP +Memory locks do not stack, that is, pages which have been locked several times +by calls to +.BR mlock (), +.BR mlock2 (), +or +.BR mlockall () +will be unlocked by a single call to +.BR munlock () +for the corresponding range or by +.BR munlockall (). +Pages which are mapped to several locations or by several processes stay +locked into RAM as long as they are locked at least at one location or by +at least one process. +.PP +If a call to +.BR mlockall () +which uses the +.B MCL_FUTURE +flag is followed by another call that does not specify this flag, the +changes made by the +.B MCL_FUTURE +call will be lost. +.PP +The +.BR mlock2 () +.B MLOCK_ONFAULT +flag and the +.BR mlockall () +.B MCL_ONFAULT +flag allow efficient memory locking for applications that deal with +large mappings where only a (small) portion of pages in the mapping are touched. +In such cases, locking all of the pages in a mapping would incur +a significant penalty for memory locking. +.SS Limits and permissions +In Linux 2.6.8 and earlier, +a process must be privileged +.RB ( CAP_IPC_LOCK ) +in order to lock memory and the +.B RLIMIT_MEMLOCK +soft resource limit defines a limit on how much memory the process may lock. +.PP +Since Linux 2.6.9, no limits are placed on the amount of memory +that a privileged process can lock and the +.B RLIMIT_MEMLOCK +soft resource limit instead defines a limit on how much memory an +unprivileged process may lock. +.SH BUGS +In Linux 4.8 and earlier, +a bug in the kernel's accounting of locked memory for unprivileged processes +(i.e., without +.BR CAP_IPC_LOCK ) +meant that if the region specified by +.I addr +and +.I len +overlapped an existing lock, +then the already locked bytes in the overlapping region were counted twice +when checking against the limit. +Such double accounting could incorrectly calculate a "total locked memory" +value for the process that exceeded the +.B RLIMIT_MEMLOCK +limit, with the result that +.BR mlock () +and +.BR mlock2 () +would fail on requests that should have succeeded. +This bug was fixed +.\" commit 0cf2f6f6dc605e587d2c1120f295934c77e810e8 +in Linux 4.9. +.PP +In Linux 2.4 series of kernels up to and including Linux 2.4.17, +a bug caused the +.BR mlockall () +.B MCL_FUTURE +flag to be inherited across a +.BR fork (2). +This was rectified in Linux 2.4.18. +.PP +Since Linux 2.6.9, if a privileged process calls +.I mlockall(MCL_FUTURE) +and later drops privileges (loses the +.B CAP_IPC_LOCK +capability by, for example, +setting its effective UID to a nonzero value), +then subsequent memory allocations (e.g., +.BR mmap (2), +.BR brk (2)) +will fail if the +.B RLIMIT_MEMLOCK +resource limit is encountered. +.\" See the following LKML thread: +.\" http://marc.theaimsgroup.com/?l=linux-kernel&m=113801392825023&w=2 +.\" "Rationale for RLIMIT_MEMLOCK" +.\" 23 Jan 2006 +.SH SEE ALSO +.BR mincore (2), +.BR mmap (2), +.BR setrlimit (2), +.BR shmctl (2), +.BR sysconf (3), +.BR proc (5), +.BR capabilities (7) |