author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-24 04:52:22 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-24 04:52:22 +0000 |
commit | 3d08cd331c1adcf0d917392f7e527b3f00511748 | |
tree | 312f0d1e1632f48862f044b8bb87e602dcffb5f9 /man/man2/madvise.2 | |
parent | Adding debian version 6.7-2. | |
Merging upstream version 6.8.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'man/man2/madvise.2')
-rw-r--r-- | man/man2/madvise.2 | 898 |
1 files changed, 898 insertions, 0 deletions
diff --git a/man/man2/madvise.2 b/man/man2/madvise.2 new file mode 100644 index 0000000..d08ca71 --- /dev/null +++ b/man/man2/madvise.2 @@ -0,0 +1,898 @@ +.\" Copyright (C) 2001 David Gómez <davidge@jazzfree.com> +.\" +.\" SPDX-License-Identifier: Linux-man-pages-copyleft +.\" +.\" Based on comments from mm/filemap.c. Last modified on 10-06-2001 +.\" Modified, 25 Feb 2002, Michael Kerrisk, <mtk.manpages@gmail.com> +.\" Added notes on MADV_DONTNEED +.\" 2010-06-19, mtk, Added documentation of MADV_MERGEABLE and +.\" MADV_UNMERGEABLE +.\" 2010-06-15, Andi Kleen, Add documentation of MADV_HWPOISON. +.\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE. +.\" 2011-09-18, Doug Goldstein <cardoe@cardoe.com> +.\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE +.\" +.TH madvise 2 2024-05-02 "Linux man-pages (unreleased)" +.SH NAME +madvise \- give advice about use of memory +.SH LIBRARY +Standard C library +.RI ( libc ", " \-lc ) +.SH SYNOPSIS +.nf +.B #include <sys/mman.h> +.P +.BI "int madvise(void " addr [. length "], size_t " length ", int " advice ); +.fi +.P +.RS -4 +Feature Test Macro Requirements for glibc (see +.BR feature_test_macros (7)): +.RE +.P +.BR madvise (): +.nf + Since glibc 2.19: + _DEFAULT_SOURCE + Up to and including glibc 2.19: + _BSD_SOURCE +.fi +.SH DESCRIPTION +The +.BR madvise () +system call is used to give advice or directions to the kernel +about the address range beginning at address +.I addr +and with size +.IR length . +.BR madvise () +only operates on whole pages, therefore +.I addr +must be page-aligned. +The value of +.I length +is rounded up to a multiple of page size. +In most cases, +the goal of such advice is to improve system or application performance. +.P +Initially, the system call supported a set of "conventional" +.I advice +values, which are also available on several other implementations. +(Note, though, that +.BR madvise () +is not specified in POSIX.) 
+Subsequently, a number of Linux-specific +.I advice +values have been added. +.\" +.\" ====================================================================== +.\" +.SS Conventional advice values +The +.I advice +values listed below +allow an application to tell the kernel how it expects to use +some mapped or shared memory areas, so that the kernel can choose +appropriate read-ahead and caching techniques. +These +.I advice +values do not influence the semantics of the application +(except in the case of +.BR MADV_DONTNEED ), +but may influence its performance. +All of the +.I advice +values listed here have analogs in the POSIX-specified +.BR posix_madvise (3) +function, and the values have the same meanings, with the exception of +.BR MADV_DONTNEED . +.P +The advice is indicated in the +.I advice +argument, which is one of the following: +.TP +.B MADV_NORMAL +No special treatment. +This is the default. +.TP +.B MADV_RANDOM +Expect page references in random order. +(Hence, read ahead may be less useful than normally.) +.TP +.B MADV_SEQUENTIAL +Expect page references in sequential order. +(Hence, pages in the given range can be aggressively read ahead, +and may be freed soon after they are accessed.) +.TP +.B MADV_WILLNEED +Expect access in the near future. +(Hence, it might be a good idea to read some pages ahead.) +.TP +.B MADV_DONTNEED +Do not expect access in the near future. +(For the time being, the application is finished with the given range, +so the kernel can free resources associated with it.) 
+.IP +After a successful +.B MADV_DONTNEED +operation, +the semantics of memory access in the specified region are changed: +subsequent accesses of pages in the range will succeed, but will result +in either repopulating the memory contents from the +up-to-date contents of the underlying mapped file +(for shared file mappings, shared anonymous mappings, +and shmem-based techniques such as System V shared memory segments) +or zero-fill-on-demand pages for anonymous private mappings. +.IP +Note that, when applied to shared mappings, +.B MADV_DONTNEED +might not lead to immediate freeing of the pages in the range. +The kernel is free to delay freeing the pages until an appropriate moment. +The resident set size (RSS) of the calling process will be immediately +reduced however. +.IP +.B MADV_DONTNEED +cannot be applied to locked pages, or +.B VM_PFNMAP +pages. +(Pages marked with the kernel-internal +.B VM_PFNMAP +.\" http://lwn.net/Articles/162860/ +flag are special memory areas that are not managed +by the virtual memory subsystem. +Such pages are typically created by device drivers that +map the pages into user space.) +.IP +Support for Huge TLB pages was added in Linux v5.18. +Addresses within a mapping backed by Huge TLB pages must be aligned +to the underlying Huge TLB page size, +and the range length is rounded up +to a multiple of the underlying Huge TLB page size. +.\" +.\" ====================================================================== +.\" +.SS Linux-specific advice values +The following Linux-specific +.I advice +values have no counterparts in the POSIX-specified +.BR posix_madvise (3), +and may or may not have counterparts in the +.BR madvise () +interface available on other implementations. +Note that some of these operations change the semantics of memory accesses. +.TP +.BR MADV_REMOVE " (since Linux 2.6.16)" +.\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349 +Free up a given range of pages +and its associated backing store. 
+This is equivalent to punching a hole in the corresponding +range of the backing store (see +.BR fallocate (2)). +Subsequent accesses in the specified address range will see +data with a value of zero. +.\" Databases want to use this feature to drop a section of their +.\" bufferpool (shared memory segments) - without writing back to +.\" disk/swap space. This feature is also useful for supporting +.\" hot-plug memory on UML. +.IP +The specified address range must be mapped shared and writable. +This flag cannot be applied to locked pages, or +.B VM_PFNMAP +pages. +.IP +In the initial implementation, only +.BR tmpfs (5) +supported +.BR MADV_REMOVE ; +but since Linux 3.5, +.\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17 +any filesystem which supports the +.BR fallocate (2) +.B FALLOC_FL_PUNCH_HOLE +mode also supports +.BR MADV_REMOVE . +Filesystems which do not support +.B MADV_REMOVE +fail with the error +.BR EOPNOTSUPP . +.IP +Support for the Huge TLB filesystem was added in Linux v4.3. +.TP +.BR MADV_DONTFORK " (since Linux 2.6.16)" +.\" commit f822566165dd46ff5de9bf895cfa6c51f53bb0c4 +.\" See http://lwn.net/Articles/171941/ +Do not make the pages in this range available to the child after a +.BR fork (2). +This is useful to prevent copy-on-write semantics from changing +the physical location of a page if the parent writes to it after a +.BR fork (2). +(Such page relocations cause problems for hardware that +DMAs into the page.) +.\" [PATCH] madvise MADV_DONTFORK/MADV_DOFORK +.\" Currently, copy-on-write may change the physical address of +.\" a page even if the user requested that the page is pinned in +.\" memory (either by mlock or by get_user_pages). This happens +.\" if the process forks meanwhile, and the parent writes to that +.\" page. As a result, the page is orphaned: in case of +.\" get_user_pages, the application will never see any data hardware +.\" DMA's into this page after the COW. 
In case of mlock'd memory, +.\" the parent is not getting the realtime/security benefits of mlock. +.\" +.\" In particular, this affects the Infiniband modules which do DMA from +.\" and into user pages all the time. +.\" +.\" This patch adds madvise options to control whether memory range is +.\" inherited across fork. Useful e.g. for when hardware is doing DMA +.\" from/into these pages. Could also be useful to an application +.\" wanting to speed up its forks by cutting large areas out of +.\" consideration. +.\" +.\" SEE ALSO: http://lwn.net/Articles/171941/ +.\" "Tweaks to madvise() and posix_fadvise()", 14 Feb 2006 +.TP +.BR MADV_DOFORK " (since Linux 2.6.16)" +Undo the effect of +.BR MADV_DONTFORK , +restoring the default behavior, whereby a mapping is inherited across +.BR fork (2). +.TP +.BR MADV_HWPOISON " (since Linux 2.6.32)" +.\" commit 9893e49d64a4874ea67849ee2cfbf3f3d6817573 +Poison the pages in the range specified by +.I addr +and +.I length +and handle subsequent references to those pages +like a hardware memory corruption. +This operation is available only for privileged +.RB ( CAP_SYS_ADMIN ) +processes. +This operation may result in the calling process receiving a +.B SIGBUS +and the page being unmapped. +.IP +This feature is intended for testing of memory error-handling code; +it is available only if the kernel was configured with +.BR CONFIG_MEMORY_FAILURE . +.TP +.BR MADV_MERGEABLE " (since Linux 2.6.32)" +.\" commit f8af4da3b4c14e7267c4ffb952079af3912c51c5 +Enable Kernel Samepage Merging (KSM) for the pages in the range specified by +.I addr +and +.IR length . +The kernel regularly scans those areas of user memory that have +been marked as mergeable, +looking for pages with identical content. +These are replaced by a single write-protected page (which is automatically +copied if a process later wants to update the content of the page). +KSM merges only private anonymous pages (see +.BR mmap (2)). 
+.IP +The KSM feature is intended for applications that generate many +instances of the same data (e.g., virtualization systems such as KVM). +It can consume a lot of processing power; use with care. +See the Linux kernel source file +.I Documentation/admin\-guide/mm/ksm.rst +for more details. +.IP +The +.B MADV_MERGEABLE +and +.B MADV_UNMERGEABLE +operations are available only if the kernel was configured with +.BR CONFIG_KSM . +.TP +.BR MADV_UNMERGEABLE " (since Linux 2.6.32)" +Undo the effect of an earlier +.B MADV_MERGEABLE +operation on the specified address range; +KSM unmerges whatever pages it had merged in the address range specified by +.I addr +and +.IR length . +.TP +.BR MADV_SOFT_OFFLINE " (since Linux 2.6.33)" +.\" commit afcf938ee0aac4ef95b1a23bac704c6fbeb26de6 +Soft offline the pages in the range specified by +.I addr +and +.IR length . +The memory of each page in the specified range is preserved +(i.e., when next accessed, the same content will be visible, +but in a new physical page frame), +and the original page is offlined +(i.e., no longer used, and taken out of normal memory management). +The effect of the +.B MADV_SOFT_OFFLINE +operation is invisible to (i.e., does not change the semantics of) +the calling process. +.IP +This feature is intended for testing of memory error-handling code; +it is available only if the kernel was configured with +.BR CONFIG_MEMORY_FAILURE . +.TP +.BR MADV_HUGEPAGE " (since Linux 2.6.38)" +.\" commit 0af4e98b6b095c74588af04872f83d333c958c32 +.\" http://lwn.net/Articles/358904/ +.\" https://lwn.net/Articles/423584/ +Enable Transparent Huge Pages (THP) for pages in the range specified by +.I addr +and +.IR length . +The kernel will regularly scan the areas marked as huge page candidates +to replace them with huge pages. +The kernel will also allocate huge pages directly when the region is +naturally aligned to the huge page size (see +.BR posix_memalign (2)). 
+.IP +This feature is primarily aimed at applications that use large mappings of +data and access large regions of that memory at a time (e.g., virtualization +systems such as QEMU). +It can very easily waste memory (e.g., a 2\ MB mapping that only ever accesses +1 byte will result in 2\ MB of wired memory instead of one 4\ KB page). +See the Linux kernel source file +.I Documentation/admin\-guide/mm/transhuge.rst +for more details. +.IP +Most common kernels configurations provide +.BR MADV_HUGEPAGE -style +behavior by default, and thus +.B MADV_HUGEPAGE +is normally not necessary. +It is mostly intended for embedded systems, where +.BR MADV_HUGEPAGE -style +behavior may not be enabled by default in the kernel. +On such systems, +this flag can be used in order to selectively enable THP. +Whenever +.B MADV_HUGEPAGE +is used, it should always be in regions of memory with +an access pattern that the developer knows in advance won't risk +to increase the memory footprint of the application when transparent +hugepages are enabled. +.IP +.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0 +Since Linux 5.4, +automatic scan of eligible areas and replacement by huge pages works with +private anonymous pages (see +.BR mmap (2)), +shmem pages, +and file-backed pages. +For all memory types, +memory may only be replaced by huge pages on hugepage-aligned boundaries. +For file-mapped memory +\[em]including tmpfs (see +.BR tmpfs (2))\[em] +the mapping must also be naturally hugepage-aligned within the file. +Additionally, +for file-backed, +non-tmpfs memory, +the file must not be open for write and the mapping must be executable. +.IP +The VMA must not be marked +.BR VM_NOHUGEPAGE , +.BR VM_HUGETLB , +.BR VM_IO , +.BR VM_DONTEXPAND , +.BR VM_MIXEDMAP , +or +.BR VM_PFNMAP , +nor can it be stack memory or backed by a DAX-enabled device +(unless the DAX device is hot-plugged as System RAM). +The process must also not have +.B PR_SET_THP_DISABLE +set (see +.BR prctl (2)). 
+.IP +The +.BR MADV_HUGEPAGE , +.BR MADV_NOHUGEPAGE , +and +.B MADV_COLLAPSE +operations are available only if the kernel was configured with +.B CONFIG_TRANSPARENT_HUGEPAGE +and file/shmem memory is only supported if the kernel was configured with +.BR CONFIG_READ_ONLY_THP_FOR_FS . +.TP +.BR MADV_NOHUGEPAGE " (since Linux 2.6.38)" +Ensures that memory in the address range specified by +.I addr +and +.I length +will not be backed by transparent hugepages. +.TP +.BR MADV_COLLAPSE " (since Linux 6.1)" +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77 +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321 +Perform a best-effort synchronous collapse of +the native pages mapped by the memory range +into Transparent Huge Pages (THPs). +.B MADV_COLLAPSE +operates on the current state of memory of the calling process and +makes no persistent changes or guarantees on how pages will be mapped, +constructed, +or faulted in the future. +.IP +.B MADV_COLLAPSE +supports private anonymous pages (see +.BR mmap (2)), +shmem pages, +and file-backed pages. +See +.B MADV_HUGEPAGE +for general information on memory requirements for THP. +If the range provided spans multiple VMAs, +the semantics of the collapse over each VMA is independent from the others. +If collapse of a given huge page-aligned/sized region fails, +the operation may continue to attempt collapsing +the remainder of the specified memory. +.B MADV_COLLAPSE +will automatically clamp the provided range to be hugepage-aligned. +.IP +All non-resident pages covered by the range +will first be swapped/faulted-in, +before being copied onto a freshly allocated hugepage. +If the native pages compose the same PTE-mapped hugepage, +and are suitably aligned, +allocation of a new hugepage may be elided and +collapse may happen in-place. +Unmapped pages will have their data directly initialized to 0 +in the new hugepage. 
+However, +for every eligible hugepage-aligned/sized region to be collapsed, +at least one page must currently be backed by physical memory. +.IP +.B MADV_COLLAPSE +is independent of any sysfs +(see +.BR sysfs (5)) +setting under +.IR /sys/kernel/mm/transparent_hugepage , +both in terms of determining THP eligibility, +and allocation semantics. +See Linux kernel source file +.I Documentation/admin\-guide/mm/transhuge.rst +for more information. +.B MADV_COLLAPSE +also ignores +.B huge= +tmpfs mount when operating on tmpfs files. +Allocation for the new hugepage may enter direct reclaim and/or compaction, +regardless of VMA flags +(though +.B VM_NOHUGEPAGE +is still respected). +.IP +When the system has multiple NUMA nodes, +the hugepage will be allocated from +the node providing the most native pages. +.IP +If all hugepage-sized/aligned regions covered by the provided range were +either successfully collapsed, +or were already PMD-mapped THPs, +this operation will be deemed successful. +Note that this doesn't guarantee anything about +other possible mappings of the memory. +In the event multiple hugepage-aligned/sized areas fail to collapse, +only the most-recently\[en]failed code will be set in +.IR errno . +.TP +.BR MADV_DONTDUMP " (since Linux 3.4)" +.\" commit 909af768e88867016f427264ae39d27a57b6a8ed +.\" commit accb61fe7bb0f5c2a4102239e4981650f9048519 +Exclude from a core dump those pages in the range specified by +.I addr +and +.IR length . +This is useful in applications that have large areas of memory +that are known not to be useful in a core dump. +The effect of +.B MADV_DONTDUMP +takes precedence over the bit mask that is set via the +.IR /proc/ pid /coredump_filter +file (see +.BR core (5)). +.TP +.BR MADV_DODUMP " (since Linux 3.4)" +Undo the effect of an earlier +.BR MADV_DONTDUMP . +.TP +.BR MADV_FREE " (since Linux 4.5)" +The application no longer requires the pages in the range specified by +.I addr +and +.IR len . 
+The kernel can thus free these pages, +but the freeing could be delayed until memory pressure occurs. +For each of the pages that has been marked to be freed +but has not yet been freed, +the free operation will be canceled if the caller writes into the page. +After a successful +.B MADV_FREE +operation, any stale data (i.e., dirty, unwritten pages) will be lost +when the kernel frees the pages. +However, subsequent writes to pages in the range will succeed +and then kernel cannot free those dirtied pages, +so that the caller can always see just written data. +If there is no subsequent write, +the kernel can free the pages at any time. +Once pages in the range have been freed, the caller will +see zero-fill-on-demand pages upon subsequent page references. +.IP +The +.B MADV_FREE +operation +can be applied only to private anonymous pages (see +.BR mmap (2)). +Before Linux 4.12, +.\" commit 93e06c7a645343d222c9a838834a51042eebbbf7 +when freeing pages on a swapless system, +the pages in the given range are freed instantly, +regardless of memory pressure. +.TP +.BR MADV_WIPEONFORK " (since Linux 4.14)" +.\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19 +Present the child process with zero-filled memory in this range after a +.BR fork (2). +This is useful in forking servers in order to ensure +that sensitive per-process data +(for example, PRNG seeds, cryptographic secrets, and so on) +is not handed to child processes. +.IP +The +.B MADV_WIPEONFORK +operation can be applied only to private anonymous pages (see +.BR mmap (2)). +.IP +Within the child created by +.BR fork (2), +the +.B MADV_WIPEONFORK +setting remains in place on the specified address range. +This setting is cleared during +.BR execve (2). +.TP +.BR MADV_KEEPONFORK " (since Linux 4.14)" +.\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19 +Undo the effect of an earlier +.BR MADV_WIPEONFORK . 
+.TP +.BR MADV_COLD " (since Linux 5.4)" +.\" commit 9c276cc65a58faf98be8e56962745ec99ab87636 +Deactivate a given range of pages. +This will make the pages a more probable +reclaim target should there be a memory pressure. +This is a nondestructive operation. +The advice might be ignored for some pages in the range when it is not +applicable. +.TP +.BR MADV_PAGEOUT " (since Linux 5.4)" +.\" commit 1a4e58cce84ee88129d5d49c064bd2852b481357 +Reclaim a given range of pages. +This is done to free up memory occupied by these pages. +If a page is anonymous, it will be swapped out. +If a page is file-backed and dirty, it will be written back to the backing +storage. +The advice might be ignored for some pages in the range when it is not +applicable. +.TP +.BR MADV_POPULATE_READ " (since Linux 5.14)" +"Populate (prefault) page tables readable, +faulting in all pages in the range just as if manually reading from each page; +however, +avoid the actual memory access that would have been performed after handling +the fault. +.IP +In contrast to +.BR MAP_POPULATE , +.B MADV_POPULATE_READ +does not hide errors, +can be applied to (parts of) existing mappings and will always populate +(prefault) page tables readable. +One example use case is prefaulting a file mapping, +reading all file content from disk; +however, +pages won't be dirtied and consequently won't have to be written back to disk +when evicting the pages from memory. +.IP +Depending on the underlying mapping, +map the shared zeropage, +preallocate memory or read the underlying file; +files with holes might or might not preallocate blocks. +If populating fails, +a +.B SIGBUS +signal is not generated; instead, an error is returned. +.IP +If +.B MADV_POPULATE_READ +succeeds, +all page tables have been populated (prefaulted) readable once. +If +.B MADV_POPULATE_READ +fails, +some page tables might have been populated. 
+.IP +.B MADV_POPULATE_READ +cannot be applied to mappings without read permissions +and special mappings, +for example, +mappings marked with kernel-internal flags such as +.B VM_PFNMAP +or +.BR VM_IO , +or secret memory regions created using +.BR memfd_secret(2) . +.IP +Note that with +.BR MADV_POPULATE_READ , +the process can be killed at any moment when the system runs out of memory. +.TP +.BR MADV_POPULATE_WRITE " (since Linux 5.14)" +Populate (prefault) page tables writable, +faulting in all pages in the range just as if manually writing to each +each page; +however, +avoid the actual memory access that would have been performed after handling +the fault. +.IP +In contrast to +.BR MAP_POPULATE , +MADV_POPULATE_WRITE does not hide errors, +can be applied to (parts of) existing mappings and will always populate +(prefault) page tables writable. +One example use case is preallocating memory, +breaking any CoW (Copy on Write). +.IP +Depending on the underlying mapping, +preallocate memory or read the underlying file; +files with holes will preallocate blocks. +If populating fails, +a +.B SIGBUS +signal is not generated; instead, an error is returned. +.IP +If +.B MADV_POPULATE_WRITE +succeeds, +all page tables have been populated (prefaulted) writable once. +If +.B MADV_POPULATE_WRITE +fails, +some page tables might have been populated. +.IP +.B MADV_POPULATE_WRITE +cannot be applied to mappings without write permissions +and special mappings, +for example, +mappings marked with kernel-internal flags such as +.B VM_PFNMAP +or +.BR VM_IO , +or secret memory regions created using +.BR memfd_secret(2) . +.IP +Note that with +.BR MADV_POPULATE_WRITE , +the process can be killed at any moment when the system runs out of memory. +.SH RETURN VALUE +On success, +.BR madvise () +returns zero. +On error, it returns \-1 and +.I errno +is set to indicate the error. 
+.SH ERRORS +.TP +.B EACCES +.I advice +is +.BR MADV_REMOVE , +but the specified address range is not a shared writable mapping. +.TP +.B EAGAIN +A kernel resource was temporarily unavailable. +.TP +.B EBADF +The map exists, but the area maps something that isn't a file. +.TP +.B EBUSY +(for +.BR MADV_COLLAPSE ) +Could not charge hugepage to cgroup: cgroup limit exceeded. +.TP +.B EFAULT +.I advice +is +.B MADV_POPULATE_READ +or +.BR MADV_POPULATE_WRITE , +and populating (prefaulting) page tables failed because a +.B SIGBUS +would have been generated on actual memory access and the reason is not a +HW poisoned page +(HW poisoned pages can, +for example, +be created using the +.B MADV_HWPOISON +flag described elsewhere in this page). +.TP +.B EINVAL +.I addr +is not page-aligned or +.I length +is negative. +.\" .I length +.\" is zero, +.TP +.B EINVAL +.I advice +is not a valid. +.TP +.B EINVAL +.I advice +is +.B MADV_COLD +or +.B MADV_PAGEOUT +and the specified address range includes locked, Huge TLB pages, or +.B VM_PFNMAP +pages. +.TP +.B EINVAL +.I advice +is +.B MADV_DONTNEED +or +.B MADV_REMOVE +and the specified address range includes locked, Huge TLB pages, or +.B VM_PFNMAP +pages. +.TP +.B EINVAL +.I advice +is +.B MADV_MERGEABLE +or +.BR MADV_UNMERGEABLE , +but the kernel was not configured with +.BR CONFIG_KSM . +.TP +.B EINVAL +.I advice +is +.B MADV_FREE +or +.B MADV_WIPEONFORK +but the specified address range includes file, Huge TLB, +.BR MAP_SHARED , +or +.B VM_PFNMAP +ranges. +.TP +.B EINVAL +.I advice +is +.B MADV_POPULATE_READ +or +.BR MADV_POPULATE_WRITE , +but the specified address range includes ranges with insufficient permissions +or special mappings, +for example, +mappings marked with kernel-internal flags such a +.B VM_IO +or +.BR VM_PFNMAP , +or secret memory regions created using +.BR memfd_secret(2) . +.TP +.B EIO +(for +.BR MADV_WILLNEED ) +Paging in this area would exceed the process's +maximum resident set size. 
+.TP +.B ENOMEM +(for +.BR MADV_WILLNEED ) +Not enough memory: paging in failed. +.TP +.B ENOMEM +(for +.BR MADV_COLLAPSE ) +Not enough memory: could not allocate hugepage. +.TP +.B ENOMEM +Addresses in the specified range are not currently +mapped, or are outside the address space of the process. +.TP +.B ENOMEM +.I advice +is +.B MADV_POPULATE_READ +or +.BR MADV_POPULATE_WRITE , +and populating (prefaulting) page tables failed because there was not enough +memory. +.TP +.B EPERM +.I advice +is +.BR MADV_HWPOISON , +but the caller does not have the +.B CAP_SYS_ADMIN +capability. +.TP +.B EHWPOISON +.I advice +is +.B MADV_POPULATE_READ +or +.BR MADV_POPULATE_WRITE , +and populating (prefaulting) page tables failed because a HW poisoned page +(HW poisoned pages can, +for example, +be created using the +.B MADV_HWPOISON +flag described elsewhere in this page) +was encountered. +.SH VERSIONS +Versions of this system call, implementing a wide variety of +.I advice +values, exist on many other implementations. +Other implementations typically implement at least the flags listed +above under +.IR "Conventional advice flags" , +albeit with some variation in semantics. +.P +POSIX.1-2001 describes +.BR posix_madvise (3) +with constants +.BR POSIX_MADV_NORMAL , +.BR POSIX_MADV_RANDOM , +.BR POSIX_MADV_SEQUENTIAL , +.BR POSIX_MADV_WILLNEED , +and +.BR POSIX_MADV_DONTNEED , +and so on, with behavior close to the similarly named flags listed above. +.SS Linux +The Linux implementation requires that the address +.I addr +be page-aligned, and allows +.I length +to be zero. +If there are some parts of the specified address range +that are not mapped, the Linux version of +.BR madvise () +ignores them and applies the call to the rest (but returns +.B ENOMEM +from the system call, as it should). +.P +.I madvise(0,\ 0,\ advice) +will return zero iff +.I advice +is supported by the kernel and can be relied on to probe for support. +.SH STANDARDS +None. 
+.SH HISTORY +First appeared in 4.4BSD. +.P +Since Linux 3.18, +.\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb +support for this system call is optional, +depending on the setting of the +.B CONFIG_ADVISE_SYSCALLS +configuration option. +.SH SEE ALSO +.BR getrlimit (2), +.BR memfd_secret (2), +.BR mincore (2), +.BR mmap (2), +.BR mprotect (2), +.BR msync (2), +.BR munmap (2), +.BR prctl (2), +.BR process_madvise (2), +.BR posix_madvise (3), +.BR core (5)
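
The page added by this patch describes two behaviors that are easy to check from user space: the `madvise(0, 0, advice)` support probe from the Linux subsection, and the zero-fill-on-demand result of `MADV_DONTNEED` on a private anonymous mapping. The following is a minimal sketch of both, assuming Linux with glibc. It is not part of the commit, and the helper names `advice_supported` and `dontneed_zero_fills` are hypothetical, chosen only for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* feature test macro named in the SYNOPSIS */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: probe for an advice value with a zero-length call.
 * Per the page, madvise(0, 0, advice) returns zero iff the kernel
 * supports `advice`. Returns 1 if supported, 0 otherwise. */
int advice_supported(int advice)
{
    return madvise(NULL, 0, advice) == 0;
}

/* Hypothetical helper: dirty one private anonymous page, discard it with
 * MADV_DONTNEED, and return the first byte read back afterwards.
 * Returns -1 if any step fails. */
int dontneed_zero_fills(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = (size_t)page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);       /* fault the page in and dirty it */

    /* mmap() returns a page-aligned address, as madvise() requires. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int first = p[0];           /* the access succeeds and repopulates the page */
    munmap(p, len);
    return first;
}
```

Called from a program's `main()`, `advice_supported(MADV_NORMAL)` should report support on any kernel that provides the system call, and `dontneed_zero_fills()` should return 0, because the page described for anonymous private mappings is zero-filled on the next access. The same probe could be used for Linux-specific values such as `MADV_COLLAPSE` before relying on them.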