summaryrefslogtreecommitdiffstats
path: root/man2/madvise.2
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--man2/madvise.2898
1 files changed, 898 insertions, 0 deletions
diff --git a/man2/madvise.2 b/man2/madvise.2
new file mode 100644
index 0000000..5782574
--- /dev/null
+++ b/man2/madvise.2
@@ -0,0 +1,898 @@
+.\" Copyright (C) 2001 David Gómez <davidge@jazzfree.com>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.\" Based on comments from mm/filemap.c. Last modified on 10-06-2001
+.\" Modified, 25 Feb 2002, Michael Kerrisk, <mtk.manpages@gmail.com>
+.\" Added notes on MADV_DONTNEED
+.\" 2010-06-19, mtk, Added documentation of MADV_MERGEABLE and
+.\" MADV_UNMERGEABLE
+.\" 2010-06-15, Andi Kleen, Add documentation of MADV_HWPOISON.
+.\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE.
+.\" 2011-09-18, Doug Goldstein <cardoe@cardoe.com>
+.\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE
+.\"
+.TH madvise 2 2023-04-03 "Linux man-pages 6.05.01"
+.SH NAME
+madvise \- give advice about use of memory
+.SH LIBRARY
+Standard C library
+.RI ( libc ", " \-lc )
+.SH SYNOPSIS
+.nf
+.B #include <sys/mman.h>
+.PP
+.BI "int madvise(void " addr [. length "], size_t " length ", int " advice );
+.fi
+.PP
+.RS -4
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.RE
+.PP
+.BR madvise ():
+.nf
+ Since glibc 2.19:
+ _DEFAULT_SOURCE
+ Up to and including glibc 2.19:
+ _BSD_SOURCE
+.fi
+.SH DESCRIPTION
+The
+.BR madvise ()
+system call is used to give advice or directions to the kernel
+about the address range beginning at address
+.I addr
+and with size
+.IR length .
+.BR madvise ()
+only operates on whole pages, therefore
+.I addr
+must be page-aligned.
+The value of
+.I length
+is rounded up to a multiple of page size.
+In most cases,
+the goal of such advice is to improve system or application performance.
+.PP
+Initially, the system call supported a set of "conventional"
+.I advice
+values, which are also available on several other implementations.
+(Note, though, that
+.BR madvise ()
+is not specified in POSIX.)
+Subsequently, a number of Linux-specific
+.I advice
+values have been added.
+.\"
+.\" ======================================================================
+.\"
+.SS Conventional advice values
+The
+.I advice
+values listed below
+allow an application to tell the kernel how it expects to use
+some mapped or shared memory areas, so that the kernel can choose
+appropriate read-ahead and caching techniques.
+These
+.I advice
+values do not influence the semantics of the application
+(except in the case of
+.BR MADV_DONTNEED ),
+but may influence its performance.
+All of the
+.I advice
+values listed here have analogs in the POSIX-specified
+.BR posix_madvise (3)
+function, and the values have the same meanings, with the exception of
+.BR MADV_DONTNEED .
+.PP
+The advice is indicated in the
+.I advice
+argument, which is one of the following:
+.TP
+.B MADV_NORMAL
+No special treatment.
+This is the default.
+.TP
+.B MADV_RANDOM
+Expect page references in random order.
+(Hence, read ahead may be less useful than normally.)
+.TP
+.B MADV_SEQUENTIAL
+Expect page references in sequential order.
+(Hence, pages in the given range can be aggressively read ahead,
+and may be freed soon after they are accessed.)
+.TP
+.B MADV_WILLNEED
+Expect access in the near future.
+(Hence, it might be a good idea to read some pages ahead.)
+.TP
+.B MADV_DONTNEED
+Do not expect access in the near future.
+(For the time being, the application is finished with the given range,
+so the kernel can free resources associated with it.)
+.IP
+After a successful
+.B MADV_DONTNEED
+operation,
+the semantics of memory access in the specified region are changed:
+subsequent accesses of pages in the range will succeed, but will result
+in either repopulating the memory contents from the
+up-to-date contents of the underlying mapped file
+(for shared file mappings, shared anonymous mappings,
+and shmem-based techniques such as System V shared memory segments)
+or zero-fill-on-demand pages for anonymous private mappings.
+.IP
+Note that, when applied to shared mappings,
+.B MADV_DONTNEED
+might not lead to immediate freeing of the pages in the range.
+The kernel is free to delay freeing the pages until an appropriate moment.
+The resident set size (RSS) of the calling process will be immediately
+reduced however.
+.IP
+.B MADV_DONTNEED
+cannot be applied to locked pages, or
+.B VM_PFNMAP
+pages.
+(Pages marked with the kernel-internal
+.B VM_PFNMAP
+.\" http://lwn.net/Articles/162860/
+flag are special memory areas that are not managed
+by the virtual memory subsystem.
+Such pages are typically created by device drivers that
+map the pages into user space.)
+.IP
+Support for Huge TLB pages was added in Linux v5.18.
+Addresses within a mapping backed by Huge TLB pages must be aligned
+to the underlying Huge TLB page size,
+and the range length is rounded up
+to a multiple of the underlying Huge TLB page size.
+.\"
+.\" ======================================================================
+.\"
+.SS Linux-specific advice values
+The following Linux-specific
+.I advice
+values have no counterparts in the POSIX-specified
+.BR posix_madvise (3),
+and may or may not have counterparts in the
+.BR madvise ()
+interface available on other implementations.
+Note that some of these operations change the semantics of memory accesses.
+.TP
+.BR MADV_REMOVE " (since Linux 2.6.16)"
+.\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349
+Free up a given range of pages
+and its associated backing store.
+This is equivalent to punching a hole in the corresponding
+range of the backing store (see
+.BR fallocate (2)).
+Subsequent accesses in the specified address range will see
+data with a value of zero.
+.\" Databases want to use this feature to drop a section of their
+.\" bufferpool (shared memory segments) - without writing back to
+.\" disk/swap space. This feature is also useful for supporting
+.\" hot-plug memory on UML.
+.IP
+The specified address range must be mapped shared and writable.
+This flag cannot be applied to locked pages, or
+.B VM_PFNMAP
+pages.
+.IP
+In the initial implementation, only
+.BR tmpfs (5)
+supported
+.BR MADV_REMOVE ;
+but since Linux 3.5,
+.\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17
+any filesystem which supports the
+.BR fallocate (2)
+.B FALLOC_FL_PUNCH_HOLE
+mode also supports
+.BR MADV_REMOVE .
+Filesystems which do not support
+.B MADV_REMOVE
+fail with the error
+.BR EOPNOTSUPP .
+.IP
+Support for the Huge TLB filesystem was added in Linux v4.3.
+.TP
+.BR MADV_DONTFORK " (since Linux 2.6.16)"
+.\" commit f822566165dd46ff5de9bf895cfa6c51f53bb0c4
+.\" See http://lwn.net/Articles/171941/
+Do not make the pages in this range available to the child after a
+.BR fork (2).
+This is useful to prevent copy-on-write semantics from changing
+the physical location of a page if the parent writes to it after a
+.BR fork (2).
+(Such page relocations cause problems for hardware that
+DMAs into the page.)
+.\" [PATCH] madvise MADV_DONTFORK/MADV_DOFORK
+.\" Currently, copy-on-write may change the physical address of
+.\" a page even if the user requested that the page is pinned in
+.\" memory (either by mlock or by get_user_pages). This happens
+.\" if the process forks meanwhile, and the parent writes to that
+.\" page. As a result, the page is orphaned: in case of
+.\" get_user_pages, the application will never see any data hardware
+.\" DMA's into this page after the COW. In case of mlock'd memory,
+.\" the parent is not getting the realtime/security benefits of mlock.
+.\"
+.\" In particular, this affects the Infiniband modules which do DMA from
+.\" and into user pages all the time.
+.\"
+.\" This patch adds madvise options to control whether memory range is
+.\" inherited across fork. Useful e.g. for when hardware is doing DMA
+.\" from/into these pages. Could also be useful to an application
+.\" wanting to speed up its forks by cutting large areas out of
+.\" consideration.
+.\"
+.\" SEE ALSO: http://lwn.net/Articles/171941/
+.\" "Tweaks to madvise() and posix_fadvise()", 14 Feb 2006
+.TP
+.BR MADV_DOFORK " (since Linux 2.6.16)"
+Undo the effect of
+.BR MADV_DONTFORK ,
+restoring the default behavior, whereby a mapping is inherited across
+.BR fork (2).
+.TP
+.BR MADV_HWPOISON " (since Linux 2.6.32)"
+.\" commit 9893e49d64a4874ea67849ee2cfbf3f3d6817573
+Poison the pages in the range specified by
+.I addr
+and
+.I length
+and handle subsequent references to those pages
+like a hardware memory corruption.
+This operation is available only for privileged
+.RB ( CAP_SYS_ADMIN )
+processes.
+This operation may result in the calling process receiving a
+.B SIGBUS
+and the page being unmapped.
+.IP
+This feature is intended for testing of memory error-handling code;
+it is available only if the kernel was configured with
+.BR CONFIG_MEMORY_FAILURE .
+.TP
+.BR MADV_MERGEABLE " (since Linux 2.6.32)"
+.\" commit f8af4da3b4c14e7267c4ffb952079af3912c51c5
+Enable Kernel Samepage Merging (KSM) for the pages in the range specified by
+.I addr
+and
+.IR length .
+The kernel regularly scans those areas of user memory that have
+been marked as mergeable,
+looking for pages with identical content.
+These are replaced by a single write-protected page (which is automatically
+copied if a process later wants to update the content of the page).
+KSM merges only private anonymous pages (see
+.BR mmap (2)).
+.IP
+The KSM feature is intended for applications that generate many
+instances of the same data (e.g., virtualization systems such as KVM).
+It can consume a lot of processing power; use with care.
+See the Linux kernel source file
+.I Documentation/admin\-guide/mm/ksm.rst
+for more details.
+.IP
+The
+.B MADV_MERGEABLE
+and
+.B MADV_UNMERGEABLE
+operations are available only if the kernel was configured with
+.BR CONFIG_KSM .
+.TP
+.BR MADV_UNMERGEABLE " (since Linux 2.6.32)"
+Undo the effect of an earlier
+.B MADV_MERGEABLE
+operation on the specified address range;
+KSM unmerges whatever pages it had merged in the address range specified by
+.I addr
+and
+.IR length .
+.TP
+.BR MADV_SOFT_OFFLINE " (since Linux 2.6.33)"
+.\" commit afcf938ee0aac4ef95b1a23bac704c6fbeb26de6
+Soft offline the pages in the range specified by
+.I addr
+and
+.IR length .
+The memory of each page in the specified range is preserved
+(i.e., when next accessed, the same content will be visible,
+but in a new physical page frame),
+and the original page is offlined
+(i.e., no longer used, and taken out of normal memory management).
+The effect of the
+.B MADV_SOFT_OFFLINE
+operation is invisible to (i.e., does not change the semantics of)
+the calling process.
+.IP
+This feature is intended for testing of memory error-handling code;
+it is available only if the kernel was configured with
+.BR CONFIG_MEMORY_FAILURE .
+.TP
+.BR MADV_HUGEPAGE " (since Linux 2.6.38)"
+.\" commit 0af4e98b6b095c74588af04872f83d333c958c32
+.\" http://lwn.net/Articles/358904/
+.\" https://lwn.net/Articles/423584/
+Enable Transparent Huge Pages (THP) for pages in the range specified by
+.I addr
+and
+.IR length .
+The kernel will regularly scan the areas marked as huge page candidates
+to replace them with huge pages.
+The kernel will also allocate huge pages directly when the region is
+naturally aligned to the huge page size (see
+.BR posix_memalign (2)).
+.IP
+This feature is primarily aimed at applications that use large mappings of
+data and access large regions of that memory at a time (e.g., virtualization
+systems such as QEMU).
+It can very easily waste memory (e.g., a 2\ MB mapping that only ever accesses
+1 byte will result in 2\ MB of wired memory instead of one 4\ KB page).
+See the Linux kernel source file
+.I Documentation/admin\-guide/mm/transhuge.rst
+for more details.
+.IP
+Most common kernels configurations provide
+.BR MADV_HUGEPAGE -style
+behavior by default, and thus
+.B MADV_HUGEPAGE
+is normally not necessary.
+It is mostly intended for embedded systems, where
+.BR MADV_HUGEPAGE -style
+behavior may not be enabled by default in the kernel.
+On such systems,
+this flag can be used in order to selectively enable THP.
+Whenever
+.B MADV_HUGEPAGE
+is used, it should always be in regions of memory with
+an access pattern that the developer knows in advance won't risk
+to increase the memory footprint of the application when transparent
+hugepages are enabled.
+.IP
+.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
+Since Linux 5.4,
+automatic scan of eligible areas and replacement by huge pages works with
+private anonymous pages (see
+.BR mmap (2)),
+shmem pages,
+and file-backed pages.
+For all memory types,
+memory may only be replaced by huge pages on hugepage-aligned boundaries.
+For file-mapped memory
+\[em]including tmpfs (see
+.BR tmpfs (2))\[em]
+the mapping must also be naturally hugepage-aligned within the file.
+Additionally,
+for file-backed,
+non-tmpfs memory,
+the file must not be open for write and the mapping must be executable.
+.IP
+The VMA must not be marked
+.BR VM_NOHUGEPAGE ,
+.BR VM_HUGETLB ,
+.BR VM_IO ,
+.BR VM_DONTEXPAND ,
+.BR VM_MIXEDMAP ,
+or
+.BR VM_PFNMAP ,
+nor can it be stack memory or backed by a DAX-enabled device
+(unless the DAX device is hot-plugged as System RAM).
+The process must also not have
+.B PR_SET_THP_DISABLE
+set (see
+.BR prctl (2)).
+.IP
+The
+.BR MADV_HUGEPAGE ,
+.BR MADV_NOHUGEPAGE ,
+and
+.B MADV_COLLAPSE
+operations are available only if the kernel was configured with
+.B CONFIG_TRANSPARENT_HUGEPAGE
+and file/shmem memory is only supported if the kernel was configured with
+.BR CONFIG_READ_ONLY_THP_FOR_FS .
+.TP
+.BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
+Ensures that memory in the address range specified by
+.I addr
+and
+.I length
+will not be backed by transparent hugepages.
+.TP
+.BR MADV_COLLAPSE " (since Linux 6.1)"
+.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
+.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
+Perform a best-effort synchronous collapse of
+the native pages mapped by the memory range
+into Transparent Huge Pages (THPs).
+.B MADV_COLLAPSE
+operates on the current state of memory of the calling process and
+makes no persistent changes or guarantees on how pages will be mapped,
+constructed,
+or faulted in the future.
+.IP
+.B MADV_COLLAPSE
+supports private anonymous pages (see
+.BR mmap (2)),
+shmem pages,
+and file-backed pages.
+See
+.B MADV_HUGEPAGE
+for general information on memory requirements for THP.
+If the range provided spans multiple VMAs,
+the semantics of the collapse over each VMA is independent from the others.
+If collapse of a given huge page-aligned/sized region fails,
+the operation may continue to attempt collapsing
+the remainder of the specified memory.
+.B MADV_COLLAPSE
+will automatically clamp the provided range to be hugepage-aligned.
+.IP
+All non-resident pages covered by the range
+will first be swapped/faulted-in,
+before being copied onto a freshly allocated hugepage.
+If the native pages compose the same PTE-mapped hugepage,
+and are suitably aligned,
+allocation of a new hugepage may be elided and
+collapse may happen in-place.
+Unmapped pages will have their data directly initialized to 0
+in the new hugepage.
+However,
+for every eligible hugepage-aligned/sized region to be collapsed,
+at least one page must currently be backed by physical memory.
+.IP
+.B MADV_COLLAPSE
+is independent of any sysfs
+(see
+.BR sysfs (5))
+setting under
+.IR /sys/kernel/mm/transparent_hugepage ,
+both in terms of determining THP eligibility,
+and allocation semantics.
+See Linux kernel source file
+.I Documentation/admin\-guide/mm/transhuge.rst
+for more information.
+.B MADV_COLLAPSE
+also ignores
+.B huge=
+tmpfs mount when operating on tmpfs files.
+Allocation for the new hugepage may enter direct reclaim and/or compaction,
+regardless of VMA flags
+(though
+.B VM_NOHUGEPAGE
+is still respected).
+.IP
+When the system has multiple NUMA nodes,
+the hugepage will be allocated from
+the node providing the most native pages.
+.IP
+If all hugepage-sized/aligned regions covered by the provided range were
+either successfully collapsed,
+or were already PMD-mapped THPs,
+this operation will be deemed successful.
+Note that this doesn't guarantee anything about
+other possible mappings of the memory.
+In the event multiple hugepage-aligned/sized areas fail to collapse,
+only the most-recently\[en]failed code will be set in
+.IR errno .
+.TP
+.BR MADV_DONTDUMP " (since Linux 3.4)"
+.\" commit 909af768e88867016f427264ae39d27a57b6a8ed
+.\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
+Exclude from a core dump those pages in the range specified by
+.I addr
+and
+.IR length .
+This is useful in applications that have large areas of memory
+that are known not to be useful in a core dump.
+The effect of
+.B MADV_DONTDUMP
+takes precedence over the bit mask that is set via the
+.IR /proc/ pid /coredump_filter
+file (see
+.BR core (5)).
+.TP
+.BR MADV_DODUMP " (since Linux 3.4)"
+Undo the effect of an earlier
+.BR MADV_DONTDUMP .
+.TP
+.BR MADV_FREE " (since Linux 4.5)"
+The application no longer requires the pages in the range specified by
+.I addr
+and
+.IR len .
+The kernel can thus free these pages,
+but the freeing could be delayed until memory pressure occurs.
+For each of the pages that has been marked to be freed
+but has not yet been freed,
+the free operation will be canceled if the caller writes into the page.
+After a successful
+.B MADV_FREE
+operation, any stale data (i.e., dirty, unwritten pages) will be lost
+when the kernel frees the pages.
+However, subsequent writes to pages in the range will succeed
+and then kernel cannot free those dirtied pages,
+so that the caller can always see just written data.
+If there is no subsequent write,
+the kernel can free the pages at any time.
+Once pages in the range have been freed, the caller will
+see zero-fill-on-demand pages upon subsequent page references.
+.IP
+The
+.B MADV_FREE
+operation
+can be applied only to private anonymous pages (see
+.BR mmap (2)).
+Before Linux 4.12,
+.\" commit 93e06c7a645343d222c9a838834a51042eebbbf7
+when freeing pages on a swapless system,
+the pages in the given range are freed instantly,
+regardless of memory pressure.
+.TP
+.BR MADV_WIPEONFORK " (since Linux 4.14)"
+.\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
+Present the child process with zero-filled memory in this range after a
+.BR fork (2).
+This is useful in forking servers in order to ensure
+that sensitive per-process data
+(for example, PRNG seeds, cryptographic secrets, and so on)
+is not handed to child processes.
+.IP
+The
+.B MADV_WIPEONFORK
+operation can be applied only to private anonymous pages (see
+.BR mmap (2)).
+.IP
+Within the child created by
+.BR fork (2),
+the
+.B MADV_WIPEONFORK
+setting remains in place on the specified address range.
+This setting is cleared during
+.BR execve (2).
+.TP
+.BR MADV_KEEPONFORK " (since Linux 4.14)"
+.\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
+Undo the effect of an earlier
+.BR MADV_WIPEONFORK .
+.TP
+.BR MADV_COLD " (since Linux 5.4)"
+.\" commit 9c276cc65a58faf98be8e56962745ec99ab87636
+Deactivate a given range of pages.
+This will make the pages a more probable
+reclaim target should there be a memory pressure.
+This is a nondestructive operation.
+The advice might be ignored for some pages in the range when it is not
+applicable.
+.TP
+.BR MADV_PAGEOUT " (since Linux 5.4)"
+.\" commit 1a4e58cce84ee88129d5d49c064bd2852b481357
+Reclaim a given range of pages.
+This is done to free up memory occupied by these pages.
+If a page is anonymous, it will be swapped out.
+If a page is file-backed and dirty, it will be written back to the backing
+storage.
+The advice might be ignored for some pages in the range when it is not
+applicable.
+.TP
+.BR MADV_POPULATE_READ " (since Linux 5.14)"
+"Populate (prefault) page tables readable,
+faulting in all pages in the range just as if manually reading from each page;
+however,
+avoid the actual memory access that would have been performed after handling
+the fault.
+.IP
+In contrast to
+.BR MAP_POPULATE ,
+.B MADV_POPULATE_READ
+does not hide errors,
+can be applied to (parts of) existing mappings and will always populate
+(prefault) page tables readable.
+One example use case is prefaulting a file mapping,
+reading all file content from disk;
+however,
+pages won't be dirtied and consequently won't have to be written back to disk
+when evicting the pages from memory.
+.IP
+Depending on the underlying mapping,
+map the shared zeropage,
+preallocate memory or read the underlying file;
+files with holes might or might not preallocate blocks.
+If populating fails,
+a
+.B SIGBUS
+signal is not generated; instead, an error is returned.
+.IP
+If
+.B MADV_POPULATE_READ
+succeeds,
+all page tables have been populated (prefaulted) readable once.
+If
+.B MADV_POPULATE_READ
+fails,
+some page tables might have been populated.
+.IP
+.B MADV_POPULATE_READ
+cannot be applied to mappings without read permissions
+and special mappings,
+for example,
+mappings marked with kernel-internal flags such as
+.B VM_PFNMAP
+or
+.BR VM_IO ,
+or secret memory regions created using
+.BR memfd_secret(2) .
+.IP
+Note that with
+.BR MADV_POPULATE_READ ,
+the process can be killed at any moment when the system runs out of memory.
+.TP
+.BR MADV_POPULATE_WRITE " (since Linux 5.14)"
+Populate (prefault) page tables writable,
+faulting in all pages in the range just as if manually writing to each
+each page;
+however,
+avoid the actual memory access that would have been performed after handling
+the fault.
+.IP
+In contrast to
+.BR MAP_POPULATE ,
+MADV_POPULATE_WRITE does not hide errors,
+can be applied to (parts of) existing mappings and will always populate
+(prefault) page tables writable.
+One example use case is preallocating memory,
+breaking any CoW (Copy on Write).
+.IP
+Depending on the underlying mapping,
+preallocate memory or read the underlying file;
+files with holes will preallocate blocks.
+If populating fails,
+a
+.B SIGBUS
+signal is not generated; instead, an error is returned.
+.IP
+If
+.B MADV_POPULATE_WRITE
+succeeds,
+all page tables have been populated (prefaulted) writable once.
+If
+.B MADV_POPULATE_WRITE
+fails,
+some page tables might have been populated.
+.IP
+.B MADV_POPULATE_WRITE
+cannot be applied to mappings without write permissions
+and special mappings,
+for example,
+mappings marked with kernel-internal flags such as
+.B VM_PFNMAP
+or
+.BR VM_IO ,
+or secret memory regions created using
+.BR memfd_secret(2) .
+.IP
+Note that with
+.BR MADV_POPULATE_WRITE ,
+the process can be killed at any moment when the system runs out of memory.
+.SH RETURN VALUE
+On success,
+.BR madvise ()
+returns zero.
+On error, it returns \-1 and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EACCES
+.I advice
+is
+.BR MADV_REMOVE ,
+but the specified address range is not a shared writable mapping.
+.TP
+.B EAGAIN
+A kernel resource was temporarily unavailable.
+.TP
+.B EBADF
+The map exists, but the area maps something that isn't a file.
+.TP
+.B EBUSY
+(for
+.BR MADV_COLLAPSE )
+Could not charge hugepage to cgroup: cgroup limit exceeded.
+.TP
+.B EFAULT
+.I advice
+is
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE ,
+and populating (prefaulting) page tables failed because a
+.B SIGBUS
+would have been generated on actual memory access and the reason is not a
+HW poisoned page
+(HW poisoned pages can,
+for example,
+be created using the
+.B MADV_HWPOISON
+flag described elsewhere in this page).
+.TP
+.B EINVAL
+.I addr
+is not page-aligned or
+.I length
+is negative.
+.\" .I length
+.\" is zero,
+.TP
+.B EINVAL
+.I advice
+is not a valid.
+.TP
+.B EINVAL
+.I advice
+is
+.B MADV_COLD
+or
+.B MADV_PAGEOUT
+and the specified address range includes locked, Huge TLB pages, or
+.B VM_PFNMAP
+pages.
+.TP
+.B EINVAL
+.I advice
+is
+.B MADV_DONTNEED
+or
+.B MADV_REMOVE
+and the specified address range includes locked, Huge TLB pages, or
+.B VM_PFNMAP
+pages.
+.TP
+.B EINVAL
+.I advice
+is
+.B MADV_MERGEABLE
+or
+.BR MADV_UNMERGEABLE ,
+but the kernel was not configured with
+.BR CONFIG_KSM .
+.TP
+.B EINVAL
+.I advice
+is
+.B MADV_FREE
+or
+.B MADV_WIPEONFORK
+but the specified address range includes file, Huge TLB,
+.BR MAP_SHARED ,
+or
+.B VM_PFNMAP
+ranges.
+.TP
+.B EINVAL
+.I advice
+is
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE ,
+but the specified address range includes ranges with insufficient permissions
+or special mappings,
+for example,
+mappings marked with kernel-internal flags such a
+.B VM_IO
+or
+.BR VM_PFNMAP ,
+or secret memory regions created using
+.BR memfd_secret(2) .
+.TP
+.B EIO
+(for
+.BR MADV_WILLNEED )
+Paging in this area would exceed the process's
+maximum resident set size.
+.TP
+.B ENOMEM
+(for
+.BR MADV_WILLNEED )
+Not enough memory: paging in failed.
+.TP
+.B ENOMEM
+(for
+.BR MADV_COLLAPSE )
+Not enough memory: could not allocate hugepage.
+.TP
+.B ENOMEM
+Addresses in the specified range are not currently
+mapped, or are outside the address space of the process.
+.TP
+.B ENOMEM
+.I advice
+is
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE ,
+and populating (prefaulting) page tables failed because there was not enough
+memory.
+.TP
+.B EPERM
+.I advice
+is
+.BR MADV_HWPOISON ,
+but the caller does not have the
+.B CAP_SYS_ADMIN
+capability.
+.TP
+.B EHWPOISON
+.I advice
+is
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE ,
+and populating (prefaulting) page tables failed because a HW poisoned page
+(HW poisoned pages can,
+for example,
+be created using the
+.B MADV_HWPOISON
+flag described elsewhere in this page)
+was encountered.
+.SH VERSIONS
+Versions of this system call, implementing a wide variety of
+.I advice
+values, exist on many other implementations.
+Other implementations typically implement at least the flags listed
+above under
+.IR "Conventional advice flags" ,
+albeit with some variation in semantics.
+.PP
+POSIX.1-2001 describes
+.BR posix_madvise (3)
+with constants
+.BR POSIX_MADV_NORMAL ,
+.BR POSIX_MADV_RANDOM ,
+.BR POSIX_MADV_SEQUENTIAL ,
+.BR POSIX_MADV_WILLNEED ,
+and
+.BR POSIX_MADV_DONTNEED ,
+and so on, with behavior close to the similarly named flags listed above.
+.SS Linux
+The Linux implementation requires that the address
+.I addr
+be page-aligned, and allows
+.I length
+to be zero.
+If there are some parts of the specified address range
+that are not mapped, the Linux version of
+.BR madvise ()
+ignores them and applies the call to the rest (but returns
+.B ENOMEM
+from the system call, as it should).
+.PP
+.I madvise(0,\ 0,\ advice)
+will return zero iff
+.I advice
+is supported by the kernel and can be relied on to probe for support.
+.SH STANDARDS
+None.
+.SH HISTORY
+First appeared in 4.4BSD.
+.PP
+Since Linux 3.18,
+.\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
+support for this system call is optional,
+depending on the setting of the
+.B CONFIG_ADVISE_SYSCALLS
+configuration option.
+.SH SEE ALSO
+.BR getrlimit (2),
+.BR memfd_secret (2),
+.BR mincore (2),
+.BR mmap (2),
+.BR mprotect (2),
+.BR msync (2),
+.BR munmap (2),
+.BR prctl (2),
+.BR process_madvise (2),
+.BR posix_madvise (3),
+.BR core (5)