summaryrefslogtreecommitdiffstats
path: root/man/man2/mbind.2
diff options
context:
space:
mode:
Diffstat (limited to 'man/man2/mbind.2')
-rw-r--r--man/man2/mbind.2521
1 files changed, 521 insertions, 0 deletions
diff --git a/man/man2/mbind.2 b/man/man2/mbind.2
new file mode 100644
index 0000000..67c958d
--- /dev/null
+++ b/man/man2/mbind.2
@@ -0,0 +1,521 @@
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft-var
+.\"
+.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
+.\" and Copyright 2007 Lee Schermerhorn, Hewlett Packard
+.\"
+.\" 2006-02-03, mtk, substantial wording changes and other improvements
+.\" 2007-08-27, Lee Schermerhorn <Lee.Schermerhorn@hp.com>
+.\" more precise specification of behavior.
+.\"
+.\" FIXME
+.\" Linux 3.8 added MPOL_MF_LAZY, which needs to be documented.
+.\" Does it also apply for move_pages()?
+.\"
+.\" commit b24f53a0bea38b266d219ee651b22dba727c44ae
+.\" Author: Lee Schermerhorn <lee.schermerhorn@hp.com>
+.\" Date: Thu Oct 25 14:16:32 2012 +0200
+.\"
+.TH mbind 2 2024-05-02 "Linux man-pages (unreleased)"
+.SH NAME
+mbind \- set memory policy for a memory range
+.SH LIBRARY
+NUMA (Non-Uniform Memory Access) policy library
+.RI ( libnuma ", " \-lnuma )
+.SH SYNOPSIS
+.nf
+.B "#include <numaif.h>"
+.P
+.BI "long mbind(void " addr [. len "], unsigned long " len ", int " mode ,
+.BI " const unsigned long " nodemask [(. maxnode " + ULONG_WIDTH - 1)"
+.B " / ULONG_WIDTH],"
+.BI " unsigned long " maxnode ", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+.BR mbind ()
+sets the NUMA memory policy,
+which consists of a policy mode and zero or more nodes,
+for the memory range starting with
+.I addr
+and continuing for
+.I len
+bytes.
+The memory policy defines from which node memory is allocated.
+.P
+If the memory range specified by the
+.IR addr " and " len
+arguments includes an "anonymous" region of memory\[em]that is
+a region of memory created using the
+.BR mmap (2)
+system call with the
+.BR MAP_ANONYMOUS \[em]or
+a memory-mapped file, mapped using the
+.BR mmap (2)
+system call with the
+.B MAP_PRIVATE
+flag, pages will be allocated only according to the specified
+policy when the application writes (stores) to the page.
+For anonymous regions, an initial read access will use a shared
+page in the kernel containing all zeros.
+For a file mapped with
+.BR MAP_PRIVATE ,
+an initial read access will allocate pages according to the
+memory policy of the thread that causes the page to be allocated.
+This may not be the thread that called
+.BR mbind ().
+.P
+The specified policy will be ignored for any
+.B MAP_SHARED
+mappings in the specified memory range.
+Rather the pages will be allocated according to the memory policy
+of the thread that caused the page to be allocated.
+Again, this may not be the thread that called
+.BR mbind ().
+.P
+If the specified memory range includes a shared memory region
+created using the
+.BR shmget (2)
+system call and attached using the
+.BR shmat (2)
+system call,
+pages allocated for the anonymous or shared memory region will
+be allocated according to the policy specified, regardless of which
+process attached to the shared memory segment causes the allocation.
+If, however, the shared memory region was created with the
+.B SHM_HUGETLB
+flag,
+the huge pages will be allocated according to the policy specified
+only if the page allocation is caused by the process that calls
+.BR mbind ()
+for that region.
+.P
+By default,
+.BR mbind ()
+has an effect only for new allocations; if the pages inside
+the range have been already touched before setting the policy,
+then the policy has no effect.
+This default behavior may be overridden by the
+.B MPOL_MF_MOVE
+and
+.B MPOL_MF_MOVE_ALL
+flags described below.
+.P
+The
+.I mode
+argument must specify one of
+.BR MPOL_DEFAULT ,
+.BR MPOL_BIND ,
+.BR MPOL_INTERLEAVE ,
+.BR MPOL_WEIGHTED_INTERLEAVE ,
+.BR MPOL_PREFERRED ,
+or
+.B MPOL_LOCAL
+(which are described in detail below).
+All policy modes except
+.B MPOL_DEFAULT
+require the caller to specify the node or nodes to which the mode applies,
+via the
+.I nodemask
+argument.
+.P
+The
+.I mode
+argument may also include an optional
+.IR "mode flag" .
+The supported
+.I "mode flags"
+are:
+.TP
+.BR MPOL_F_NUMA_BALANCING " (since Linux 5.15)"
+.\" commit bda420b985054a3badafef23807c4b4fa38a3dff
+.\" commit 6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c
+When
+.I mode
+is
+.BR MPOL_BIND ,
+enable the kernel NUMA balancing for the task if it is supported by the kernel.
+If the flag isn't supported by the kernel, or is used with
+.I mode
+other than
+.BR MPOL_BIND ,
+\-1 is returned and
+.I errno
+is set to
+.BR EINVAL .
+.TP
+.BR MPOL_F_STATIC_NODES " (since Linux-2.6.26)"
+A nonempty
+.I nodemask
+specifies physical node IDs.
+Linux does not remap the
+.I nodemask
+when the thread moves to a different cpuset context,
+nor when the set of nodes allowed by the thread's
+current cpuset context changes.
+.TP
+.BR MPOL_F_RELATIVE_NODES " (since Linux-2.6.26)"
+A nonempty
+.I nodemask
+specifies node IDs that are relative to the set of
+node IDs allowed by the thread's current cpuset.
+.P
+.I nodemask
+points to a bit mask of nodes containing up to
+.I maxnode
+bits.
+The bit mask size is rounded to the next multiple of
+.IR "sizeof(unsigned long)" ,
+but the kernel will use bits only up to
+.IR maxnode .
+A NULL value of
+.I nodemask
+or a
+.I maxnode
+value of zero specifies the empty set of nodes.
+If the value of
+.I maxnode
+is zero,
+the
+.I nodemask
+argument is ignored.
+Where a
+.I nodemask
+is required, it must contain at least one node that is on-line,
+allowed by the thread's current cpuset context
+(unless the
+.B MPOL_F_STATIC_NODES
+mode flag is specified),
+and contains memory.
+.P
+The
+.I mode
+argument must include one of the following values:
+.TP
+.B MPOL_DEFAULT
+This mode requests that any nondefault policy be removed,
+restoring default behavior.
+When applied to a range of memory via
+.BR mbind (),
+this means to use the thread memory policy,
+which may have been set with
+.BR set_mempolicy (2).
+If the mode of the thread memory policy is also
+.BR MPOL_DEFAULT ,
+the system-wide default policy will be used.
+The system-wide default policy allocates
+pages on the node of the CPU that triggers the allocation.
+For
+.BR MPOL_DEFAULT ,
+the
+.I nodemask
+and
+.I maxnode
+arguments must be specify the empty set of nodes.
+.TP
+.B MPOL_BIND
+This mode specifies a strict policy that restricts memory allocation to
+the nodes specified in
+.IR nodemask .
+If
+.I nodemask
+specifies more than one node, page allocations will come from
+the node with sufficient free memory that is closest to
+the node where the allocation takes place.
+Pages will not be allocated from any node not specified in the
+IR nodemask .
+(Before Linux 2.6.26,
+.\" commit 19770b32609b6bf97a3dece2529089494cbfc549
+page allocations came from
+the node with the lowest numeric node ID first, until that node
+contained no free memory.
+Allocations then came from the node with the next highest
+node ID specified in
+.I nodemask
+and so forth, until none of the specified nodes contained free memory.)
+.TP
+.B MPOL_INTERLEAVE
+This mode specifies that page allocations be interleaved across the
+set of nodes specified in
+.IR nodemask .
+This optimizes for bandwidth instead of latency
+by spreading out pages and memory accesses to those pages across
+multiple nodes.
+To be effective the memory area should be fairly large,
+at least 1\ MB or bigger with a fairly uniform access pattern.
+Accesses to a single page of the area will still be limited to
+the memory bandwidth of a single node.
+.TP
+.BR MPOL_WEIGHTED_INTERLEAVE " (since Linux 6.9)"
+.\" commit fa3bea4e1f8202d787709b7e3654eb0a99aed758
+This mode interleaves page allocations across the nodes specified in
+.I nodemask
+according to the weights in
+.IR /sys/kernel/mm/mempolicy/weighted_interleave .
+For example, if bits 0, 2, and 5 are set in
+.IR nodemask ,
+and the contents of
+.IR /sys/kernel/mm/mempolicy/weighted_interleave/node0 ,
+.IR /sys/ .\|.\|. /node2 ,
+and
+.IR /sys/ .\|.\|. /node5
+are 4, 7, and 9, respectively,
+then pages in this region will be allocated on nodes 0, 2, and 5
+in a 4:7:9 ratio.
+.TP
+.B MPOL_PREFERRED
+This mode sets the preferred node for allocation.
+The kernel will try to allocate pages from this
+node first and fall back to other nodes if the
+preferred nodes is low on free memory.
+If
+.I nodemask
+specifies more than one node ID, the first node in the
+mask will be selected as the preferred node.
+If the
+.I nodemask
+and
+.I maxnode
+arguments specify the empty set, then the memory is allocated on
+the node of the CPU that triggered the allocation.
+.TP
+.BR MPOL_LOCAL " (since Linux 3.8)"
+.\" commit 479e2802d09f1e18a97262c4c6f8f17ae5884bd8
+.\" commit f2a07f40dbc603c15f8b06e6ec7f768af67b424f
+This mode specifies "local allocation"; the memory is allocated on
+the node of the CPU that triggered the allocation (the "local node").
+The
+.I nodemask
+and
+.I maxnode
+arguments must specify the empty set.
+If the "local node" is low on free memory,
+the kernel will try to allocate memory from other nodes.
+The kernel will allocate memory from the "local node"
+whenever memory for this node is available.
+If the "local node" is not allowed by the thread's current cpuset context,
+the kernel will try to allocate memory from other nodes.
+The kernel will allocate memory from the "local node" whenever
+it becomes allowed by the thread's current cpuset context.
+By contrast,
+.B MPOL_DEFAULT
+reverts to the memory policy of the thread (which may be set via
+.BR set_mempolicy (2));
+that policy may be something other than "local allocation".
+.P
+If
+.B MPOL_MF_STRICT
+is passed in
+.I flags
+and
+.I mode
+is not
+.BR MPOL_DEFAULT ,
+then the call fails with the error
+.B EIO
+if the existing pages in the memory range don't follow the policy.
+.\" According to the kernel code, the following is not true
+.\" --Lee Schermerhorn
+.\" In Linux 2.6.16 or later the kernel will also try to move pages
+.\" to the requested node with this flag.
+.P
+If
+.B MPOL_MF_MOVE
+is specified in
+.IR flags ,
+then the kernel will attempt to move all the existing pages
+in the memory range so that they follow the policy.
+Pages that are shared with other processes will not be moved.
+If
+.B MPOL_MF_STRICT
+is also specified, then the call fails with the error
+.B EIO
+if some pages could not be moved.
+If the
+.B MPOL_INTERLEAVE
+policy was specified,
+pages already residing on the specified nodes
+will not be moved such that they are interleaved.
+.P
+If
+.B MPOL_MF_MOVE_ALL
+is passed in
+.IR flags ,
+then the kernel will attempt to move all existing pages in the memory range
+regardless of whether other processes use the pages.
+The calling thread must be privileged
+.RB ( CAP_SYS_NICE )
+to use this flag.
+If
+.B MPOL_MF_STRICT
+is also specified, then the call fails with the error
+.B EIO
+if some pages could not be moved.
+If the
+.B MPOL_INTERLEAVE
+policy was specified,
+pages already residing on the specified nodes
+will not be moved such that they are interleaved.
+.\" ---------------------------------------------------------------
+.SH RETURN VALUE
+On success,
+.BR mbind ()
+returns 0;
+on error, \-1 is returned and
+.I errno
+is set to indicate the error.
+.\" ---------------------------------------------------------------
+.SH ERRORS
+.\" I think I got all of the error returns. --Lee Schermerhorn
+.TP
+.B EFAULT
+Part or all of the memory range specified by
+.I nodemask
+and
+.I maxnode
+points outside your accessible address space.
+Or, there was an unmapped hole in the specified memory range specified by
+.I addr
+and
+.IR len .
+.TP
+.B EINVAL
+An invalid value was specified for
+.I flags
+or
+.IR mode ;
+or
+.I addr + len
+was less than
+.IR addr ;
+or
+.I addr
+is not a multiple of the system page size.
+Or,
+.I mode
+is
+.B MPOL_DEFAULT
+and
+.I nodemask
+specified a nonempty set;
+or
+.I mode
+is
+.B MPOL_BIND
+or
+.B MPOL_INTERLEAVE
+and
+.I nodemask
+is empty.
+Or,
+.I maxnode
+exceeds a kernel-imposed limit.
+.\" As at 2.6.23, this limit is "a page worth of bits", e.g.,
+.\" 8 * 4096 bits, assuming a 4kB page size.
+Or,
+.I nodemask
+specifies one or more node IDs that are
+greater than the maximum supported node ID.
+Or, none of the node IDs specified by
+.I nodemask
+are on-line and allowed by the thread's current cpuset context,
+or none of the specified nodes contain memory.
+Or, the
+.I mode
+argument specified both
+.B MPOL_F_STATIC_NODES
+and
+.BR MPOL_F_RELATIVE_NODES .
+.TP
+.B EIO
+.B MPOL_MF_STRICT
+was specified and an existing page was already on a node
+that does not follow the policy;
+or
+.B MPOL_MF_MOVE
+or
+.B MPOL_MF_MOVE_ALL
+was specified and the kernel was unable to move all existing
+pages in the range.
+.TP
+.B ENOMEM
+Insufficient kernel memory was available.
+.TP
+.B EPERM
+The
+.I flags
+argument included the
+.B MPOL_MF_MOVE_ALL
+flag and the caller does not have the
+.B CAP_SYS_NICE
+privilege.
+.\" ---------------------------------------------------------------
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 2.6.7.
+.P
+Support for huge page policy was added with Linux 2.6.16.
+For interleave policy to be effective on huge page mappings the
+policied memory needs to be tens of megabytes or larger.
+.P
+Before Linux 5.7.
+.\" commit dcf1763546d76c372f3136c8d6b2b6e77f140cf0
+.B MPOL_MF_STRICT
+was ignored on huge page mappings.
+.P
+.B MPOL_MF_MOVE
+and
+.B MPOL_MF_MOVE_ALL
+are available only on Linux 2.6.16 and later.
+.SH NOTES
+For information on library support, see
+.BR numa (7).
+.P
+NUMA policy is not supported on a memory-mapped file range
+that was mapped with the
+.B MAP_SHARED
+flag.
+.P
+The
+.B MPOL_DEFAULT
+mode can have different effects for
+.BR mbind ()
+and
+.BR set_mempolicy (2).
+When
+.B MPOL_DEFAULT
+is specified for
+.BR set_mempolicy (2),
+the thread's memory policy reverts to the system default policy
+or local allocation.
+When
+.B MPOL_DEFAULT
+is specified for a range of memory using
+.BR mbind (),
+any pages subsequently allocated for that range will use
+the thread's memory policy, as set by
+.BR set_mempolicy (2).
+This effectively removes the explicit policy from the
+specified range, "falling back" to a possibly nondefault
+policy.
+To select explicit "local allocation" for a memory range,
+specify a
+.I mode
+of
+.B MPOL_LOCAL
+or
+.B MPOL_PREFERRED
+with an empty set of nodes.
+This method will work for
+.BR set_mempolicy (2),
+as well.
+.SH SEE ALSO
+.BR get_mempolicy (2),
+.BR getcpu (2),
+.BR mmap (2),
+.BR set_mempolicy (2),
+.BR shmat (2),
+.BR shmget (2),
+.BR numa (3),
+.BR cpuset (7),
+.BR numa (7),
+.BR numactl (8)