From 3d08cd331c1adcf0d917392f7e527b3f00511748 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Fri, 24 May 2024 06:52:22 +0200 Subject: Merging upstream version 6.8. Signed-off-by: Daniel Baumann --- man/man2/mbind.2 | 521 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 521 insertions(+) create mode 100644 man/man2/mbind.2 (limited to 'man/man2/mbind.2') diff --git a/man/man2/mbind.2 b/man/man2/mbind.2 new file mode 100644 index 0000000..67c958d --- /dev/null +++ b/man/man2/mbind.2 @@ -0,0 +1,521 @@ +.\" SPDX-License-Identifier: Linux-man-pages-copyleft-var +.\" +.\" Copyright 2003,2004 Andi Kleen, SuSE Labs. +.\" and Copyright 2007 Lee Schermerhorn, Hewlett Packard +.\" +.\" 2006-02-03, mtk, substantial wording changes and other improvements +.\" 2007-08-27, Lee Schermerhorn +.\" more precise specification of behavior. +.\" +.\" FIXME +.\" Linux 3.8 added MPOL_MF_LAZY, which needs to be documented. +.\" Does it also apply for move_pages()? +.\" +.\" commit b24f53a0bea38b266d219ee651b22dba727c44ae +.\" Author: Lee Schermerhorn +.\" Date: Thu Oct 25 14:16:32 2012 +0200 +.\" +.TH mbind 2 2024-05-02 "Linux man-pages (unreleased)" +.SH NAME +mbind \- set memory policy for a memory range +.SH LIBRARY +NUMA (Non-Uniform Memory Access) policy library +.RI ( libnuma ", " \-lnuma ) +.SH SYNOPSIS +.nf +.B "#include <numaif.h>" +.P +.BI "long mbind(void " addr [. len "], unsigned long " len ", int " mode , +.BI " const unsigned long " nodemask [(. maxnode " + ULONG_WIDTH - 1)" +.B " / ULONG_WIDTH]," +.BI " unsigned long " maxnode ", unsigned int " flags ); +.fi +.SH DESCRIPTION +.BR mbind () +sets the NUMA memory policy, +which consists of a policy mode and zero or more nodes, +for the memory range starting with +.I addr +and continuing for +.I len +bytes. +The memory policy defines from which node memory is allocated.
+.P +If the memory range specified by the +.IR addr " and " len +arguments includes an "anonymous" region of memory\[em]that is +a region of memory created using the +.BR mmap (2) +system call with the +.BR MAP_ANONYMOUS " flag\[em]or" +a memory-mapped file, mapped using the +.BR mmap (2) +system call with the +.B MAP_PRIVATE +flag, pages will be allocated according to the specified +policy only when the application writes (stores) to the page. +For anonymous regions, an initial read access will use a shared +page in the kernel containing all zeros. +For a file mapped with +.BR MAP_PRIVATE , +an initial read access will allocate pages according to the +memory policy of the thread that causes the page to be allocated. +This may not be the thread that called +.BR mbind (). +.P +The specified policy will be ignored for any +.B MAP_SHARED +mappings in the specified memory range. +Rather, the pages will be allocated according to the memory policy +of the thread that caused the page to be allocated. +Again, this may not be the thread that called +.BR mbind (). +.P +If the specified memory range includes a shared memory region +created using the +.BR shmget (2) +system call and attached using the +.BR shmat (2) +system call, +pages allocated for the anonymous or shared memory region will +be allocated according to the policy specified, regardless of which +process attached to the shared memory segment causes the allocation. +If, however, the shared memory region was created with the +.B SHM_HUGETLB +flag, +the huge pages will be allocated according to the policy specified +only if the page allocation is caused by the process that calls +.BR mbind () +for that region. +.P +By default, +.BR mbind () +has an effect only for new allocations; if the pages inside +the range have already been touched before setting the policy, +then the policy has no effect. +This default behavior may be overridden by the +.B MPOL_MF_MOVE +and +.B MPOL_MF_MOVE_ALL +flags described below.
+.P +The +.I mode +argument must specify one of +.BR MPOL_DEFAULT , +.BR MPOL_BIND , +.BR MPOL_INTERLEAVE , +.BR MPOL_WEIGHTED_INTERLEAVE , +.BR MPOL_PREFERRED , +or +.B MPOL_LOCAL +(which are described in detail below). +All policy modes except +.B MPOL_DEFAULT +require the caller to specify the node or nodes to which the mode applies, +via the +.I nodemask +argument. +.P +The +.I mode +argument may also include an optional +.IR "mode flag" . +The supported +.I "mode flags" +are: +.TP +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.15)" +.\" commit bda420b985054a3badafef23807c4b4fa38a3dff +.\" commit 6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c +When +.I mode +is +.BR MPOL_BIND , +enable kernel NUMA balancing for the task if it is supported by the kernel. +If the flag isn't supported by the kernel, or is used with +.I mode +other than +.BR MPOL_BIND , +\-1 is returned and +.I errno +is set to +.BR EINVAL . +.TP +.BR MPOL_F_STATIC_NODES " (since Linux 2.6.26)" +A nonempty +.I nodemask +specifies physical node IDs. +Linux does not remap the +.I nodemask +when the thread moves to a different cpuset context, +nor when the set of nodes allowed by the thread's +current cpuset context changes. +.TP +.BR MPOL_F_RELATIVE_NODES " (since Linux 2.6.26)" +A nonempty +.I nodemask +specifies node IDs that are relative to the set of +node IDs allowed by the thread's current cpuset. +.P +.I nodemask +points to a bit mask of nodes containing up to +.I maxnode +bits. +The bit mask size is rounded to the next multiple of +.IR "sizeof(unsigned long)" , +but the kernel will use bits only up to +.IR maxnode . +A NULL value of +.I nodemask +or a +.I maxnode +value of zero specifies the empty set of nodes. +If the value of +.I maxnode +is zero, +the +.I nodemask +argument is ignored.
+Where a +.I nodemask +is required, it must contain at least one node that is on-line, +allowed by the thread's current cpuset context +(unless the +.B MPOL_F_STATIC_NODES +mode flag is specified), +and contains memory. +.P +The +.I mode +argument must include one of the following values: +.TP +.B MPOL_DEFAULT +This mode requests that any nondefault policy be removed, +restoring default behavior. +When applied to a range of memory via +.BR mbind (), +this means to use the thread memory policy, +which may have been set with +.BR set_mempolicy (2). +If the mode of the thread memory policy is also +.BR MPOL_DEFAULT , +the system-wide default policy will be used. +The system-wide default policy allocates +pages on the node of the CPU that triggers the allocation. +For +.BR MPOL_DEFAULT , +the +.I nodemask +and +.I maxnode +arguments must specify the empty set of nodes. +.TP +.B MPOL_BIND +This mode specifies a strict policy that restricts memory allocation to +the nodes specified in +.IR nodemask . +If +.I nodemask +specifies more than one node, page allocations will come from +the node with sufficient free memory that is closest to +the node where the allocation takes place. +Pages will not be allocated from any node not specified in the +.IR nodemask . +(Before Linux 2.6.26, +.\" commit 19770b32609b6bf97a3dece2529089494cbfc549 +page allocations came from +the node with the lowest numeric node ID first, until that node +contained no free memory. +Allocations then came from the node with the next highest +node ID specified in +.I nodemask +and so forth, until none of the specified nodes contained free memory.) +.TP +.B MPOL_INTERLEAVE +This mode specifies that page allocations be interleaved across the +set of nodes specified in +.IR nodemask . +This optimizes for bandwidth instead of latency +by spreading out pages and memory accesses to those pages across +multiple nodes.
+To be effective, the memory area should be fairly large, +at least 1\ MB, with a fairly uniform access pattern. +Accesses to a single page of the area will still be limited to +the memory bandwidth of a single node. +.TP +.BR MPOL_WEIGHTED_INTERLEAVE " (since Linux 6.9)" +.\" commit fa3bea4e1f8202d787709b7e3654eb0a99aed758 +This mode interleaves page allocations across the nodes specified in +.I nodemask +according to the weights in +.IR /sys/kernel/mm/mempolicy/weighted_interleave . +For example, if bits 0, 2, and 5 are set in +.IR nodemask , +and the contents of +.IR /sys/kernel/mm/mempolicy/weighted_interleave/node0 , +.IR /sys/ .\|.\|. /node2 , +and +.IR /sys/ .\|.\|. /node5 +are 4, 7, and 9, respectively, +then pages in this region will be allocated on nodes 0, 2, and 5 +in a 4:7:9 ratio. +.TP +.B MPOL_PREFERRED +This mode sets the preferred node for allocation. +The kernel will try to allocate pages from this +node first and fall back to other nodes if the +preferred node is low on free memory. +If +.I nodemask +specifies more than one node ID, the first node in the +mask will be selected as the preferred node. +If the +.I nodemask +and +.I maxnode +arguments specify the empty set, then the memory is allocated on +the node of the CPU that triggered the allocation. +.TP +.BR MPOL_LOCAL " (since Linux 3.8)" +.\" commit 479e2802d09f1e18a97262c4c6f8f17ae5884bd8 +.\" commit f2a07f40dbc603c15f8b06e6ec7f768af67b424f +This mode specifies "local allocation"; the memory is allocated on +the node of the CPU that triggered the allocation (the "local node"). +The +.I nodemask +and +.I maxnode +arguments must specify the empty set. +If the "local node" is low on free memory, +the kernel will try to allocate memory from other nodes. +The kernel will allocate memory from the "local node" +whenever memory for this node is available. +If the "local node" is not allowed by the thread's current cpuset context, +the kernel will try to allocate memory from other nodes.
+The kernel will allocate memory from the "local node" whenever +it becomes allowed by the thread's current cpuset context. +By contrast, +.B MPOL_DEFAULT +reverts to the memory policy of the thread (which may be set via +.BR set_mempolicy (2)); +that policy may be something other than "local allocation". +.P +If +.B MPOL_MF_STRICT +is passed in +.I flags +and +.I mode +is not +.BR MPOL_DEFAULT , +then the call fails with the error +.B EIO +if the existing pages in the memory range don't follow the policy. +.\" According to the kernel code, the following is not true +.\" --Lee Schermerhorn +.\" In Linux 2.6.16 or later the kernel will also try to move pages +.\" to the requested node with this flag. +.P +If +.B MPOL_MF_MOVE +is specified in +.IR flags , +then the kernel will attempt to move all the existing pages +in the memory range so that they follow the policy. +Pages that are shared with other processes will not be moved. +If +.B MPOL_MF_STRICT +is also specified, then the call fails with the error +.B EIO +if some pages could not be moved. +If the +.B MPOL_INTERLEAVE +policy was specified, +pages already residing on the specified nodes +will not be moved such that they are interleaved. +.P +If +.B MPOL_MF_MOVE_ALL +is passed in +.IR flags , +then the kernel will attempt to move all existing pages in the memory range +regardless of whether other processes use the pages. +The calling thread must be privileged +.RB ( CAP_SYS_NICE ) +to use this flag. +If +.B MPOL_MF_STRICT +is also specified, then the call fails with the error +.B EIO +if some pages could not be moved. +If the +.B MPOL_INTERLEAVE +policy was specified, +pages already residing on the specified nodes +will not be moved such that they are interleaved. +.\" --------------------------------------------------------------- +.SH RETURN VALUE +On success, +.BR mbind () +returns 0; +on error, \-1 is returned and +.I errno +is set to indicate the error. 
+.\" --------------------------------------------------------------- +.SH ERRORS +.\" I think I got all of the error returns. --Lee Schermerhorn +.TP +.B EFAULT +Part or all of the memory range specified by +.I nodemask +and +.I maxnode +points outside your accessible address space. +Or, there was an unmapped hole in the memory range specified by +.I addr +and +.IR len . +.TP +.B EINVAL +An invalid value was specified for +.I flags +or +.IR mode ; +or +.I addr + len +was less than +.IR addr ; +or +.I addr +is not a multiple of the system page size. +Or, +.I mode +is +.B MPOL_DEFAULT +and +.I nodemask +specified a nonempty set; +or +.I mode +is +.B MPOL_BIND +or +.B MPOL_INTERLEAVE +and +.I nodemask +is empty. +Or, +.I maxnode +exceeds a kernel-imposed limit. +.\" As at 2.6.23, this limit is "a page worth of bits", e.g., +.\" 8 * 4096 bits, assuming a 4kB page size. +Or, +.I nodemask +specifies one or more node IDs that are +greater than the maximum supported node ID. +Or, none of the node IDs specified by +.I nodemask +are on-line and allowed by the thread's current cpuset context, +or none of the specified nodes contain memory. +Or, the +.I mode +argument specified both +.B MPOL_F_STATIC_NODES +and +.BR MPOL_F_RELATIVE_NODES . +.TP +.B EIO +.B MPOL_MF_STRICT +was specified and an existing page was already on a node +that does not follow the policy; +or +.B MPOL_MF_MOVE +or +.B MPOL_MF_MOVE_ALL +was specified and the kernel was unable to move all existing +pages in the range. +.TP +.B ENOMEM +Insufficient kernel memory was available. +.TP +.B EPERM +The +.I flags +argument included the +.B MPOL_MF_MOVE_ALL +flag and the caller does not have the +.B CAP_SYS_NICE +privilege. +.\" --------------------------------------------------------------- +.SH STANDARDS +Linux. +.SH HISTORY +Linux 2.6.7. +.P +Support for huge page policy was added with Linux 2.6.16.
+For interleave policy to be effective on huge page mappings the +policied memory needs to be tens of megabytes or larger. +.P +Before Linux 5.7, +.\" commit dcf1763546d76c372f3136c8d6b2b6e77f140cf0 +.B MPOL_MF_STRICT +was ignored on huge page mappings. +.P +.B MPOL_MF_MOVE +and +.B MPOL_MF_MOVE_ALL +are available only on Linux 2.6.16 and later. +.SH NOTES +For information on library support, see +.BR numa (7). +.P +NUMA policy is not supported on a memory-mapped file range +that was mapped with the +.B MAP_SHARED +flag. +.P +The +.B MPOL_DEFAULT +mode can have different effects for +.BR mbind () +and +.BR set_mempolicy (2). +When +.B MPOL_DEFAULT +is specified for +.BR set_mempolicy (2), +the thread's memory policy reverts to the system default policy +or local allocation. +When +.B MPOL_DEFAULT +is specified for a range of memory using +.BR mbind (), +any pages subsequently allocated for that range will use +the thread's memory policy, as set by +.BR set_mempolicy (2). +This effectively removes the explicit policy from the +specified range, "falling back" to a possibly nondefault +policy. +To select explicit "local allocation" for a memory range, +specify a +.I mode +of +.B MPOL_LOCAL +or +.B MPOL_PREFERRED +with an empty set of nodes. +This method will work for +.BR set_mempolicy (2), +as well. +.SH SEE ALSO +.BR get_mempolicy (2), +.BR getcpu (2), +.BR mmap (2), +.BR set_mempolicy (2), +.BR shmat (2), +.BR shmget (2), +.BR numa (3), +.BR cpuset (7), +.BR numa (7), +.BR numactl (8) -- cgit v1.2.3
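The page added by this patch describes nodemask as a bit mask of up to maxnode bits, stored in an array of (maxnode + ULONG_WIDTH - 1) / ULONG_WIDTH unsigned long words. The following is a minimal C sketch of that layout only. The helper names (nodemask_words, nodemask_alloc, nodemask_set, nodemask_isset) and the BITS_PER_ULONG macro are hypothetical illustrations, not part of libnuma or the kernel interface, and the sketch deliberately does not call mbind() itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Bits in one unsigned long; stands in for ULONG_WIDTH from the SYNOPSIS
 * (hypothetical macro, defined here so the sketch builds with pre-C23 compilers). */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed to hold maxnode bits,
 * i.e. (maxnode + ULONG_WIDTH - 1) / ULONG_WIDTH. */
static size_t nodemask_words(unsigned long maxnode)
{
    return (maxnode + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
}

/* Allocate a zeroed mask for maxnode bits.
 * Returns NULL for maxnode == 0 (the empty set) or on allocation failure. */
static unsigned long *nodemask_alloc(unsigned long maxnode)
{
    size_t words = nodemask_words(maxnode);

    return words ? calloc(words, sizeof(unsigned long)) : NULL;
}

/* Mark node as a member of the mask; node must be below maxnode. */
static void nodemask_set(unsigned long *mask, unsigned long maxnode,
                         unsigned long node)
{
    assert(mask != NULL && node < maxnode);
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}

/* Return 1 if node is set in the mask, 0 otherwise
 * (including for a NULL mask or an out-of-range node). */
static int nodemask_isset(const unsigned long *mask, unsigned long maxnode,
                          unsigned long node)
{
    if (mask == NULL || node >= maxnode)
        return 0;
    return (int)((mask[node / BITS_PER_ULONG] >> (node % BITS_PER_ULONG)) & 1UL);
}
```

A caller could, for instance, allocate a mask with maxnode set to 6, set nodes 0, 2, and 5 (the nodes used in the MPOL_WEIGHTED_INTERLEAVE example), and pass the array and maxnode as the nodemask and maxnode arguments of mbind(). Whether those nodes are on-line and allowed by the current cpuset is still checked by the kernel, so the call can fail with EINVAL as described under ERRORS.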