Diffstat (limited to 'man2/set_mempolicy.2')
-rw-r--r-- man2/set_mempolicy.2 | 325
1 files changed, 325 insertions, 0 deletions
diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
new file mode 100644
index 0000000..a7f561d
--- /dev/null
+++ b/man2/set_mempolicy.2
@@ -0,0 +1,325 @@
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft-var
+.\"
+.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
+.\" and Copyright 2007 Lee Schermerhorn, Hewlett Packard
+.\"
+.\" 2006-02-03, mtk, substantial wording changes and other improvements
+.\" 2007-08-27, Lee Schermerhorn <Lee.Schermerhorn@hp.com>
+.\" more precise specification of behavior.
+.\"
+.TH set_mempolicy 2 2023-07-16 "Linux man-pages 6.05.01"
+.SH NAME
+set_mempolicy \- set default NUMA memory policy for a thread and its children
+.SH LIBRARY
+NUMA (Non-Uniform Memory Access) policy library
+.RI ( libnuma ", " \-lnuma )
+.SH SYNOPSIS
+.nf
+.B "#include <numaif.h>"
+.PP
+.BI "long set_mempolicy(int " mode ", const unsigned long *" nodemask ,
+.BI " unsigned long " maxnode );
+.fi
+.SH DESCRIPTION
+.BR set_mempolicy ()
+sets the NUMA memory policy of the calling thread,
+which consists of a policy mode and zero or more nodes,
+to the values specified by the
+.IR mode ,
+.IR nodemask ,
+and
+.I maxnode
+arguments.
+.PP
+A NUMA machine has different
+memory controllers with different distances to specific CPUs.
+The memory policy defines from which node memory is allocated for
+the thread.
+.PP
+This system call defines the default policy for the thread.
+The thread policy governs allocation of pages in the process's
+address space outside of memory ranges
+controlled by a more specific policy set by
+.BR mbind (2).
+The thread default policy also controls allocation of any pages for
+memory-mapped files mapped using the
+.BR mmap (2)
+call with the
+.B MAP_PRIVATE
+flag and that are only read from (loaded) by the thread
+and of memory-mapped files mapped using the
+.BR mmap (2)
+call with the
+.B MAP_SHARED
+flag, regardless of the access type.
+The policy is applied only when a new page is allocated
+for the thread.
+For anonymous memory this is when the page is first
+touched by the thread.
+.PP
+The
+.I mode
+argument must specify one of
+.BR MPOL_DEFAULT ,
+.BR MPOL_BIND ,
+.BR MPOL_INTERLEAVE ,
+.BR MPOL_PREFERRED ,
+or
+.B MPOL_LOCAL
+(which are described in detail below).
+All modes except
+.B MPOL_DEFAULT
+require the caller to specify the node or nodes to which the mode applies,
+via the
+.I nodemask
+argument.
+.PP
+The
+.I mode
+argument may also include an optional
+.IR "mode flag" .
+The supported
+.I "mode flags"
+are:
+.TP
+.BR MPOL_F_NUMA_BALANCING " (since Linux 5.12)"
+.\" commit bda420b985054a3badafef23807c4b4fa38a3dff
+When
+.I mode
+is
+.BR MPOL_BIND ,
+enable the kernel NUMA balancing for the task if it is supported by the kernel.
+If the flag isn't supported by the kernel, or is used with a
+.I mode
+other than
+.BR MPOL_BIND ,
+\-1 is returned and
+.I errno
+is set to
+.BR EINVAL .
+.TP
+.BR MPOL_F_RELATIVE_NODES " (since Linux 2.6.26)"
+A nonempty
+.I nodemask
+specifies node IDs that are relative to the
+set of node IDs allowed by the process's current cpuset.
+.TP
+.BR MPOL_F_STATIC_NODES " (since Linux 2.6.26)"
+A nonempty
+.I nodemask
+specifies physical node IDs.
+Linux will not remap the
+.I nodemask
+when the process moves to a different cpuset context,
+nor when the set of nodes allowed by the process's
+current cpuset context changes.
+.PP
+.I nodemask
+points to a bit mask of node IDs that contains up to
+.I maxnode
+bits.
+The bit mask size is rounded to the next multiple of
+.IR "sizeof(unsigned long)" ,
+but the kernel will use bits only up to
+.IR maxnode .
+A NULL value of
+.I nodemask
+or a
+.I maxnode
+value of zero specifies the empty set of nodes.
+If the value of
+.I maxnode
+is zero,
+the
+.I nodemask
+argument is ignored.
+.PP
+Where a
+.I nodemask
+is required, it must contain at least one node that is on-line,
+allowed by the process's current cpuset context
+(unless the
+.B MPOL_F_STATIC_NODES
+mode flag is specified),
+and contains memory.
+If the
+.B MPOL_F_STATIC_NODES
+flag is set in
+.I mode
+and a required
+.I nodemask
+contains no nodes that are allowed by the process's current cpuset context,
+the memory policy reverts to
+.IR "local allocation" .
+This effectively overrides the specified policy until the process's
+cpuset context includes one or more of the nodes specified by
+.IR nodemask .
+.PP
+The
+.I mode
+argument must include one of the following values:
+.TP
+.B MPOL_DEFAULT
+This mode specifies that any nondefault thread memory policy be removed,
+so that the memory policy "falls back" to the system default policy.
+The system default policy is "local allocation"\[em]that is,
+allocate memory on the node of the CPU that triggered the allocation.
+.I nodemask
+must be specified as NULL.
+If the "local node" contains no free memory, the system will
+attempt to allocate memory from a "nearby" node.
+.TP
+.B MPOL_BIND
+This mode defines a strict policy that restricts memory allocation to the
+nodes specified in
+.IR nodemask .
+If
+.I nodemask
+specifies more than one node, page allocations will come from
+the node with the lowest numeric node ID first, until that node
+contains no free memory.
+Allocations will then come from the node with the next highest
+node ID specified in
+.I nodemask
+and so forth, until none of the specified nodes contain free memory.
+Pages will not be allocated from any node not specified in the
+.IR nodemask .
+.TP
+.B MPOL_INTERLEAVE
+This mode interleaves page allocations across the nodes specified in
+.I nodemask
+in numeric node ID order.
+This optimizes for bandwidth instead of latency
+by spreading out pages and memory accesses to those pages across
+multiple nodes.
+However, accesses to a single page will still be limited to
+the memory bandwidth of a single node.
+.\" NOTE: the following sentence doesn't make sense in the context
+.\" of set_mempolicy() -- no memory area specified.
+.\" To be effective the memory area should be fairly large,
+.\" at least 1 MB or bigger.
+.TP
+.B MPOL_PREFERRED
+This mode sets the preferred node for allocation.
+The kernel will try to allocate pages from this node first
+and fall back to "nearby" nodes if the preferred node is low on free
+memory.
+If
+.I nodemask
+specifies more than one node ID, the first node in the
+mask will be selected as the preferred node.
+If the
+.I nodemask
+and
+.I maxnode
+arguments specify the empty set, then the policy
+specifies "local allocation"
+(like the system default policy discussed above).
+.TP
+.BR MPOL_LOCAL " (since Linux 3.8)"
+.\" commit 479e2802d09f1e18a97262c4c6f8f17ae5884bd8
+.\" commit f2a07f40dbc603c15f8b06e6ec7f768af67b424f
+This mode specifies "local allocation"; the memory is allocated on
+the node of the CPU that triggered the allocation (the "local node").
+The
+.I nodemask
+and
+.I maxnode
+arguments must specify the empty set.
+If the "local node" is low on free memory,
+the kernel will try to allocate memory from other nodes.
+The kernel will allocate memory from the "local node"
+whenever memory for this node is available.
+If the "local node" is not allowed by the process's current cpuset context,
+the kernel will try to allocate memory from other nodes.
+The kernel will allocate memory from the "local node" whenever
+it becomes allowed by the process's current cpuset context.
+.PP
+The thread memory policy is preserved across an
+.BR execve (2),
+and is inherited by child threads created using
+.BR fork (2)
+or
+.BR clone (2).
+.SH RETURN VALUE
+On success,
+.BR set_mempolicy ()
+returns 0;
+on error, \-1 is returned and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EFAULT
+Part or all of the memory range specified by
+.I nodemask
+and
+.I maxnode
+points outside your accessible address space.
+.TP
+.B EINVAL
+.I mode
+is invalid.
+Or,
+.I mode
+is
+.B MPOL_DEFAULT
+and
+.I nodemask
+is nonempty,
+or
+.I mode
+is
+.B MPOL_BIND
+or
+.B MPOL_INTERLEAVE
+and
+.I nodemask
+is empty.
+Or,
+.I maxnode
+specifies more than a page worth of bits.
+Or,
+.I nodemask
+specifies one or more node IDs that are
+greater than the maximum supported node ID.
+Or, none of the node IDs specified by
+.I nodemask
+are on-line and allowed by the process's current cpuset context,
+or none of the specified nodes contain memory.
+Or, the
+.I mode
+argument specified both
+.B MPOL_F_STATIC_NODES
+and
+.BR MPOL_F_RELATIVE_NODES .
+Or, the
+.B MPOL_F_NUMA_BALANCING
+flag isn't supported by the kernel, or is used with a
+.I mode
+other than
+.BR MPOL_BIND .
+.TP
+.B ENOMEM
+Insufficient kernel memory was available.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 2.6.7.
+.SH NOTES
+Memory policy is not remembered if the page is swapped out.
+When such a page is paged back in, it will use the policy of
+the thread or memory range that is in effect at the time the
+page is allocated.
+.PP
+For information on library support, see
+.BR numa (7).
+.SH SEE ALSO
+.BR get_mempolicy (2),
+.BR getcpu (2),
+.BR mbind (2),
+.BR mmap (2),
+.BR numa (3),
+.BR cpuset (7),
+.BR numa (7),
+.BR numactl (8)
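
Reviewer sketch (not part of the patch above): the DESCRIPTION explains that nodemask is a bit mask of node IDs holding up to maxnode bits, and that MPOL_DEFAULT takes an empty node set. The C fragment below is one possible way to build such a mask and pass it to the system call. It is a minimal sketch under stated assumptions, not the libnuma implementation: every `demo_`/`DEMO_` name is hypothetical, the mask capacity of four words is arbitrary, the mode values are copied from the kernel's `<linux/mempolicy.h>`, and the raw `syscall(2)` wrapper is used instead of `<numaif.h>` so the fragment builds without libnuma.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mode values as defined in the kernel's <linux/mempolicy.h>; repeated
 * here (assumption of this sketch) so it builds without <numaif.h>. */
#define DEMO_MPOL_DEFAULT   0
#define DEMO_MPOL_PREFERRED 1

/* Hypothetical capacity chosen for this sketch: four words of node bits. */
#define DEMO_MASK_WORDS    4
#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_MAXNODE       (DEMO_MASK_WORDS * DEMO_BITS_PER_WORD)

/* Set the bit for `node` in a mask of DEMO_MASK_WORDS unsigned longs. */
void demo_nodemask_set(unsigned long *mask, unsigned int node)
{
    assert(node < DEMO_MAXNODE);
    mask[node / DEMO_BITS_PER_WORD] |= 1UL << (node % DEMO_BITS_PER_WORD);
}

/* Return 1 if the bit for `node` is set in the mask, 0 otherwise. */
int demo_nodemask_isset(const unsigned long *mask, unsigned int node)
{
    assert(node < DEMO_MAXNODE);
    return (int)((mask[node / DEMO_BITS_PER_WORD]
                  >> (node % DEMO_BITS_PER_WORD)) & 1UL);
}

/* Ask the kernel to prefer `node` for the calling thread's allocations.
 * Returns 0 on success, or -1 with errno set. */
long demo_prefer_node(unsigned int node)
{
    unsigned long mask[DEMO_MASK_WORDS] = {0};

    demo_nodemask_set(mask, node);
    return syscall(SYS_set_mempolicy, DEMO_MPOL_PREFERRED, mask,
                   (unsigned long)DEMO_MAXNODE);
}

/* Remove any nondefault thread policy: MPOL_DEFAULT with an empty node
 * set (NULL nodemask, maxnode of zero). */
long demo_reset_policy(void)
{
    return syscall(SYS_set_mempolicy, DEMO_MPOL_DEFAULT, NULL, 0UL);
}
```

A caller would typically invoke `demo_prefer_node(0)`, check for a return value of -1, and inspect `errno` against the ERRORS list. The call can fail on kernels built without NUMA support or inside restricted sandboxes, so the sketch does not assume success.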