diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:41:07 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:41:07 +0000 |
commit | 3af6d22bb3850ab2bac67287e3a3d3b0e32868e5 (patch) | |
tree | 3ee7a3ec64525911fa865bb984c86d997d855527 /man5/proc_sys_vm.5 | |
parent | Adding debian version 6.05.01-1. (diff) | |
download | manpages-3af6d22bb3850ab2bac67287e3a3d3b0e32868e5.tar.xz manpages-3af6d22bb3850ab2bac67287e3a3d3b0e32868e5.zip |
Merging upstream version 6.7.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'man5/proc_sys_vm.5')
-rw-r--r-- | man5/proc_sys_vm.5 | 420 |
1 files changed, 420 insertions, 0 deletions
diff --git a/man5/proc_sys_vm.5 b/man5/proc_sys_vm.5 new file mode 100644 index 0000000..cbd5cf1 --- /dev/null +++ b/man5/proc_sys_vm.5 @@ -0,0 +1,420 @@ +.\" Copyright (C) 1994, 1995, Daniel Quinlan <quinlan@yggdrasil.com> +.\" Copyright (C) 2002-2008, 2017, Michael Kerrisk <mtk.manpages@gmail.com> +.\" Copyright (C) , Andries Brouwer <aeb@cwi.nl> +.\" Copyright (C) 2023, Alejandro Colomar <alx@kernel.org> +.\" +.\" SPDX-License-Identifier: GPL-3.0-or-later +.\" +.TH proc_sys_vm 5 2023-09-30 "Linux man-pages 6.7" +.SH NAME +/proc/sys/vm/ \- virtual memory subsystem +.SH DESCRIPTION +.TP +.I /proc/sys/vm/ +This directory contains files for memory management tuning, buffer, and +cache management. +.TP +.IR /proc/sys/vm/admin_reserve_kbytes " (since Linux 3.10)" +.\" commit 4eeab4f5580d11bffedc697684b91b0bca0d5009 +This file defines the amount of free memory (in KiB) on the system that +should be reserved for users with the capability +.BR CAP_SYS_ADMIN . +.IP +The default value in this file is the minimum of [3% of free pages, 8MiB] +expressed as KiB. +The default is intended to provide enough for the superuser +to log in and kill a process, if necessary, +under the default overcommit 'guess' mode (i.e., 0 in +.IR /proc/sys/vm/overcommit_memory ). +.IP +Systems running in "overcommit never" mode (i.e., 2 in +.IR /proc/sys/vm/overcommit_memory ) +should increase the value in this file to account +for the full virtual memory size of the programs used to recover (e.g., +.BR login (1) +.BR ssh (1), +and +.BR top (1)) +Otherwise, the superuser may not be able to log in to recover the system. +For example, on x86-64 a suitable value is 131072 (128MiB reserved). +.IP +Changing the value in this file takes effect whenever +an application requests memory. +.TP +.IR /proc/sys/vm/compact_memory " (since Linux 2.6.35)" +When 1 is written to this file, all zones are compacted such that free +memory is available in contiguous blocks where possible. +The effect of this action can be seen by examining +.IR /proc/buddyinfo . +.IP +Present only if the kernel was configured with +.BR CONFIG_COMPACTION . +.TP +.IR /proc/sys/vm/drop_caches " (since Linux 2.6.16)" +Writing to this file causes the kernel to drop clean caches, dentries, and +inodes from memory, causing that memory to become free. +This can be useful for memory management testing and +performing reproducible filesystem benchmarks. +Because writing to this file causes the benefits of caching to be lost, +it can degrade overall system performance. +.IP +To free pagecache, use: +.IP +.in +4n +.EX +echo 1 > /proc/sys/vm/drop_caches +.EE +.in +.IP +To free dentries and inodes, use: +.IP +.in +4n +.EX +echo 2 > /proc/sys/vm/drop_caches +.EE +.in +.IP +To free pagecache, dentries, and inodes, use: +.IP +.in +4n +.EX +echo 3 > /proc/sys/vm/drop_caches +.EE +.in +.IP +Because writing to this file is a nondestructive operation and dirty objects +are not freeable, the +user should run +.BR sync (1) +first. +.TP +.IR /proc/sys/vm/sysctl_hugetlb_shm_group " (since Linux 2.6.7)" +This writable file contains a group ID that is allowed +to allocate memory using huge pages. +If a process has a filesystem group ID or any supplementary group ID that +matches this group ID, +then it can make huge-page allocations without holding the +.B CAP_IPC_LOCK +capability; see +.BR memfd_create (2), +.BR mmap (2), +and +.BR shmget (2). +.TP +.IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)" +.\" The following is from Documentation/filesystems/proc.txt +If nonzero, this disables the new 32-bit memory-mapping layout; +the kernel will use the legacy (2.4) layout for all processes. +.TP +.IR /proc/sys/vm/memory_failure_early_kill " (since Linux 2.6.32)" +.\" The following is based on the text in Documentation/sysctl/vm.txt +Control how to kill processes when an uncorrected memory error +(typically a 2-bit error in a memory module) +that cannot be handled by the kernel +is detected in the background by hardware. +In some cases (like the page still having a valid copy on disk), +the kernel will handle the failure +transparently without affecting any applications. +But if there is no other up-to-date copy of the data, +it will kill processes to prevent any data corruptions from propagating. +.IP +The file has one of the following values: +.RS +.TP +.B 1 +Kill all processes that have the corrupted-and-not-reloadable page mapped +as soon as the corruption is detected. +Note that this is not supported for a few types of pages, +such as kernel internally +allocated data or the swap cache, but works for the majority of user pages. +.TP +.B 0 +Unmap the corrupted page from all processes and kill a process +only if it tries to access the page. +.RE +.IP +The kill is performed using a +.B SIGBUS +signal with +.I si_code +set to +.BR BUS_MCEERR_AO . +Processes can handle this if they want to; see +.BR sigaction (2) +for more details. +.IP +This feature is active only on architectures/platforms with advanced machine +check handling and depends on the hardware capabilities. +.IP +Applications can override the +.I memory_failure_early_kill +setting individually with the +.BR prctl (2) +.B PR_MCE_KILL +operation. +.IP +Present only if the kernel was configured with +.BR CONFIG_MEMORY_FAILURE . +.TP +.IR /proc/sys/vm/memory_failure_recovery " (since Linux 2.6.32)" +.\" The following is based on the text in Documentation/sysctl/vm.txt +Enable memory failure recovery (when supported by the platform). +.RS +.TP +.B 1 +Attempt recovery. +.TP +.B 0 +Always panic on a memory failure. +.RE +.IP +Present only if the kernel was configured with +.BR CONFIG_MEMORY_FAILURE . +.TP +.IR /proc/sys/vm/oom_dump_tasks " (since Linux 2.6.25)" +.\" The following is from Documentation/sysctl/vm.txt +Enables a system-wide task dump (excluding kernel threads) to be +produced when the kernel performs an OOM-killing. +The dump includes the following information +for each task (thread, process): +thread ID, real user ID, thread group ID (process ID), +virtual memory size, resident set size, +the CPU that the task is scheduled on, +oom_adj score (see the description of +.IR /proc/ pid /oom_adj ), +and command name. +This is helpful to determine why the OOM-killer was invoked +and to identify the rogue task that caused it. +.IP +If this contains the value zero, this information is suppressed. +On very large systems with thousands of tasks, +it may not be feasible to dump the memory state information for each one. +Such systems should not be forced to incur a performance penalty in +OOM situations when the information may not be desired. +.IP +If this is set to nonzero, this information is shown whenever the +OOM-killer actually kills a memory-hogging task. +.IP +The default value is 0. +.TP +.IR /proc/sys/vm/oom_kill_allocating_task " (since Linux 2.6.24)" +.\" The following is from Documentation/sysctl/vm.txt +This enables or disables killing the OOM-triggering task in +out-of-memory situations. +.IP +If this is set to zero, the OOM-killer will scan through the entire +tasklist and select a task based on heuristics to kill. +This normally selects a rogue memory-hogging task that +frees up a large amount of memory when killed. +.IP +If this is set to nonzero, the OOM-killer simply kills the task that +triggered the out-of-memory condition. +This avoids a possibly expensive tasklist scan. +.IP +If +.I /proc/sys/vm/panic_on_oom +is nonzero, it takes precedence over whatever value is used in +.IR /proc/sys/vm/oom_kill_allocating_task . +.IP +The default value is 0. +.TP +.IR /proc/sys/vm/overcommit_kbytes " (since Linux 3.14)" +.\" commit 49f0ce5f92321cdcf741e35f385669a421013cb7 +This writable file provides an alternative to +.I /proc/sys/vm/overcommit_ratio +for controlling the +.I CommitLimit +when +.I /proc/sys/vm/overcommit_memory +has the value 2. +It allows the amount of memory overcommitting to be specified as +an absolute value (in kB), +rather than as a percentage, as is done with +.IR overcommit_ratio . +This allows for finer-grained control of +.I CommitLimit +on systems with extremely large memory sizes. +.IP +Only one of +.I overcommit_kbytes +or +.I overcommit_ratio +can have an effect: +if +.I overcommit_kbytes +has a nonzero value, then it is used to calculate +.IR CommitLimit , +otherwise +.I overcommit_ratio +is used. +Writing a value to either of these files causes the +value in the other file to be set to zero. +.TP +.I /proc/sys/vm/overcommit_memory +This file contains the kernel virtual memory accounting mode. +Values are: +.RS +.IP +0: heuristic overcommit (this is the default) +.br +1: always overcommit, never check +.br +2: always check, never overcommit +.RE +.IP +In mode 0, calls of +.BR mmap (2) +with +.B MAP_NORESERVE +are not checked, and the default check is very weak, +leading to the risk of getting a process "OOM-killed". +.IP +In mode 1, the kernel pretends there is always enough memory, +until memory actually runs out. +One use case for this mode is scientific computing applications +that employ large sparse arrays. +Before Linux 2.6.0, any nonzero value implies mode 1. +.IP +In mode 2 (available since Linux 2.6), the total virtual address space +that can be allocated +.RI ( CommitLimit +in +.IR /proc/meminfo ) +is calculated as +.IP +.in +4n +.EX +CommitLimit = (total_RAM \- total_huge_TLB) * + overcommit_ratio / 100 + total_swap +.EE +.in +.IP +where: +.RS +.IP \[bu] 3 +.I total_RAM +is the total amount of RAM on the system; +.IP \[bu] +.I total_huge_TLB +is the amount of memory set aside for huge pages; +.IP \[bu] +.I overcommit_ratio +is the value in +.IR /proc/sys/vm/overcommit_ratio ; +and +.IP \[bu] +.I total_swap +is the amount of swap space. +.RE +.IP +For example, on a system with 16 GB of physical RAM, 16 GB +of swap, no space dedicated to huge pages, and an +.I overcommit_ratio +of 50, this formula yields a +.I CommitLimit +of 24 GB. +.IP +Since Linux 3.14, if the value in +.I /proc/sys/vm/overcommit_kbytes +is nonzero, then +.I CommitLimit +is instead calculated as: +.IP +.in +4n +.EX +CommitLimit = overcommit_kbytes + total_swap +.EE +.in +.IP +See also the description of +.I /proc/sys/vm/admin_reserve_kbytes +and +.IR /proc/sys/vm/user_reserve_kbytes . +.TP +.IR /proc/sys/vm/overcommit_ratio " (since Linux 2.6.0)" +This writable file defines a percentage by which memory +can be overcommitted. +The default value in the file is 50. +See the description of +.IR /proc/sys/vm/overcommit_memory . +.TP +.IR /proc/sys/vm/panic_on_oom " (since Linux 2.6.18)" +.\" The following is adapted from Documentation/sysctl/vm.txt +This enables or disables a kernel panic in +an out-of-memory situation. +.IP +If this file is set to the value 0, +the kernel's OOM-killer will kill some rogue process. +Usually, the OOM-killer is able to kill a rogue process and the +system will survive. +.IP +If this file is set to the value 1, +then the kernel normally panics when out-of-memory happens. +However, if a process limits allocations to certain nodes +using memory policies +.RB ( mbind (2) +.BR MPOL_BIND ) +or cpusets +.RB ( cpuset (7)) +and those nodes reach memory exhaustion status, +one process may be killed by the OOM-killer. +No panic occurs in this case: +because other nodes' memory may be free, +this means the system as a whole may not have reached +an out-of-memory situation yet. +.IP +If this file is set to the value 2, +the kernel always panics when an out-of-memory condition occurs. +.IP +The default value is 0. +1 and 2 are for failover of clustering. +Select either according to your policy of failover. +.TP +.I /proc/sys/vm/swappiness +.\" The following is from Documentation/sysctl/vm.txt +The value in this file controls how aggressively the kernel will swap +memory pages. +Higher values increase aggressiveness, lower values +decrease aggressiveness. +The default value is 60. +.TP +.IR /proc/sys/vm/user_reserve_kbytes " (since Linux 3.10)" +.\" commit c9b1d0981fcce3d9976d7b7a56e4e0503bc610dd +Specifies an amount of memory (in KiB) to reserve for user processes. +This is intended to prevent a user from starting a single memory hogging +process, such that they cannot recover (kill the hog). +The value in this file has an effect only when +.I /proc/sys/vm/overcommit_memory +is set to 2 ("overcommit never" mode). +In this case, the system reserves an amount of memory that is the minimum +of [3% of current process size, +.IR user_reserve_kbytes ]. +.IP +The default value in this file is the minimum of [3% of free pages, 128MiB] +expressed as KiB. +.IP +If the value in this file is set to zero, +then a user will be allowed to allocate all free memory with a single process +(minus the amount reserved by +.IR /proc/sys/vm/admin_reserve_kbytes ). +Any subsequent attempts to execute a command will result in +"fork: Cannot allocate memory". +.IP +Changing the value in this file takes effect whenever +an application requests memory. +.TP +.IR /proc/sys/vm/unprivileged_userfaultfd " (since Linux 5.2)" +.\" cefdca0a86be517bc390fc4541e3674b8e7803b0 +This (writable) file exposes a flag that controls whether +unprivileged processes are allowed to employ +.BR userfaultfd (2). +If this file has the value 1, then unprivileged processes may use +.BR userfaultfd (2). +If this file has the value 0, then only processes that have the +.B CAP_SYS_PTRACE +capability may employ +.BR userfaultfd (2). +The default value in this file is 1. +.SH SEE ALSO +.BR proc (5), +.BR proc_sys (5) |