diff options
Diffstat (limited to 'Documentation/sysctl/kernel.txt')
-rw-r--r-- | Documentation/sysctl/kernel.txt | 1156 |
1 files changed, 1156 insertions, 0 deletions
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt new file mode 100644 index 000000000..db1676525 --- /dev/null +++ b/Documentation/sysctl/kernel.txt @@ -0,0 +1,1156 @@ +Documentation for /proc/sys/kernel/* kernel version 2.2.10 + (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> + (c) 2009, Shen Feng<shen@cn.fujitsu.com> + +For general info and legal blurb, please look in README. + +============================================================== + +This file contains documentation for the sysctl files in +/proc/sys/kernel/ and is valid for Linux kernel version 2.2. + +The files in this directory can be used to tune and monitor +miscellaneous and general things in the operation of the Linux +kernel. Since some of the files _can_ be used to screw up your +system, it is advisable to read both documentation and source +before actually making adjustments. + +Currently, these files might (depending on your configuration) +show up in /proc/sys/kernel: + +- acct +- acpi_video_flags +- auto_msgmni +- bootloader_type [ X86 only ] +- bootloader_version [ X86 only ] +- callhome [ S390 only ] +- cap_last_cap +- core_pattern +- core_pipe_limit +- core_uses_pid +- ctrl-alt-del +- dmesg_restrict +- domainname +- hostname +- hotplug +- hardlockup_all_cpu_backtrace +- hardlockup_panic +- hung_task_panic +- hung_task_check_count +- hung_task_timeout_secs +- hung_task_check_interval_secs +- hung_task_warnings +- hyperv_record_panic_msg +- kexec_load_disabled +- kptr_restrict +- l2cr [ PPC only ] +- modprobe ==> Documentation/debugging-modules.txt +- modules_disabled +- msg_next_id [ sysv ipc ] +- msgmax +- msgmnb +- msgmni +- nmi_watchdog +- osrelease +- ostype +- overflowgid +- overflowuid +- panic +- panic_on_oops +- panic_on_stackoverflow +- panic_on_unrecovered_nmi +- panic_on_warn +- panic_on_rcu_stall +- perf_cpu_time_max_percent +- perf_event_paranoid +- perf_event_max_stack +- perf_event_mlock_kb +- perf_event_max_contexts_per_stack +- pid_max +- powersave-nap [ PPC only ] +- printk +- printk_delay +- printk_ratelimit +- printk_ratelimit_burst +- pty ==> Documentation/filesystems/devpts.txt +- randomize_va_space +- real-root-dev ==> Documentation/admin-guide/initrd.rst +- reboot-cmd [ SPARC only ] +- rtsig-max +- rtsig-nr +- seccomp/ ==> Documentation/userspace-api/seccomp_filter.rst +- sem +- sem_next_id [ sysv ipc ] +- sg-big-buff [ generic SCSI device (sg) ] +- shm_next_id [ sysv ipc ] +- shm_rmid_forced +- shmall +- shmmax [ sysv ipc ] +- shmmni +- softlockup_all_cpu_backtrace +- soft_watchdog +- stop-a [ SPARC only ] +- sysrq ==> Documentation/admin-guide/sysrq.rst +- sysctl_writes_strict +- tainted +- threads-max +- unprivileged_bpf_disabled +- unknown_nmi_panic +- watchdog +- watchdog_thresh +- version + +============================================================== + +acct: + +highwater lowwater frequency + +If BSD-style process accounting is enabled these values control +its behaviour. If free space on filesystem where the log lives +goes below <lowwater>% accounting suspends. If free space gets +above <highwater>% accounting resumes. <Frequency> determines +how often do we check the amount of free space (value is in +seconds). Default: +4 2 30 +That is, suspend accounting if there left <= 2% free; resume it +if we got >=4%; consider information about amount of free space +valid for 30 seconds. + +============================================================== + +acpi_video_flags: + +flags + +See Doc*/kernel/power/video.txt, it allows mode of video boot to be +set during run time. + +============================================================== + +auto_msgmni: + +This variable has no effect and may be removed in future kernel +releases. Reading it always returns 0. +Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni +upon memory add/remove or upon ipc namespace creation/removal. +Echoing "1" into this file enabled msgmni automatic recomputing. +Echoing "0" turned it off. auto_msgmni default value was 1. + + +============================================================== + +bootloader_type: + +x86 bootloader identification + +This gives the bootloader type number as indicated by the bootloader, +shifted left by 4, and OR'd with the low four bits of the bootloader +version. The reason for this encoding is that this used to match the +type_of_loader field in the kernel header; the encoding is kept for +backwards compatibility. That is, if the full bootloader type number +is 0x15 and the full version number is 0x234, this file will contain +the value 340 = 0x154. + +See the type_of_loader and ext_loader_type fields in +Documentation/x86/boot.txt for additional information. + +============================================================== + +bootloader_version: + +x86 bootloader version + +The complete bootloader version number. In the example above, this +file will contain the value 564 = 0x234. + +See the type_of_loader and ext_loader_ver fields in +Documentation/x86/boot.txt for additional information. + +============================================================== + +callhome: + +Controls the kernel's callhome behavior in case of a kernel panic. + +The s390 hardware allows an operating system to send a notification +to a service organization (callhome) in case of an operating system panic. + +When the value in this file is 0 (which is the default behavior) +nothing happens in case of a kernel panic. If this value is set to "1" +the complete kernel oops message is send to the IBM customer service +organization in case the mainframe the Linux operating system is running +on has a service contract with IBM. + +============================================================== + +cap_last_cap + +Highest valid capability of the running kernel. Exports +CAP_LAST_CAP from the kernel. + +============================================================== + +core_pattern: + +core_pattern is used to specify a core dumpfile pattern name. +. max length 128 characters; default value is "core" +. core_pattern is used as a pattern template for the output filename; + certain string patterns (beginning with '%') are substituted with + their actual values. +. backward compatibility with core_uses_pid: + If core_pattern does not include "%p" (default does not) + and core_uses_pid is set, then .PID will be appended to + the filename. +. corename format specifiers: + %<NUL> '%' is dropped + %% output one '%' + %p pid + %P global pid (init PID namespace) + %i tid + %I global tid (init PID namespace) + %u uid (in initial user namespace) + %g gid (in initial user namespace) + %d dump mode, matches PR_SET_DUMPABLE and + /proc/sys/fs/suid_dumpable + %s signal number + %t UNIX time of dump + %h hostname + %e executable filename (may be shortened) + %E executable path + %<OTHER> both are dropped +. If the first character of the pattern is a '|', the kernel will treat + the rest of the pattern as a command to run. The core dump will be + written to the standard input of that program instead of to a file. + +============================================================== + +core_pipe_limit: + +This sysctl is only applicable when core_pattern is configured to pipe +core files to a user space helper (when the first character of +core_pattern is a '|', see above). When collecting cores via a pipe +to an application, it is occasionally useful for the collecting +application to gather data about the crashing process from its +/proc/pid directory. In order to do this safely, the kernel must wait +for the collecting process to exit, so as not to remove the crashing +processes proc files prematurely. This in turn creates the +possibility that a misbehaving userspace collecting process can block +the reaping of a crashed process simply by never exiting. This sysctl +defends against that. It defines how many concurrent crashing +processes may be piped to user space applications in parallel. If +this value is exceeded, then those crashing processes above that value +are noted via the kernel log and their cores are skipped. 0 is a +special value, indicating that unlimited processes may be captured in +parallel, but that no waiting will take place (i.e. the collecting +process is not guaranteed access to /proc/<crashing pid>/). This +value defaults to 0. + +============================================================== + +core_uses_pid: + +The default coredump filename is "core". By setting +core_uses_pid to 1, the coredump filename becomes core.PID. +If core_pattern does not include "%p" (default does not) +and core_uses_pid is set, then .PID will be appended to +the filename. + +============================================================== + +ctrl-alt-del: + +When the value in this file is 0, ctrl-alt-del is trapped and +sent to the init(1) program to handle a graceful restart. +When, however, the value is > 0, Linux's reaction to a Vulcan +Nerve Pinch (tm) will be an immediate reboot, without even +syncing its dirty buffers. + +Note: when a program (like dosemu) has the keyboard in 'raw' +mode, the ctrl-alt-del is intercepted by the program before it +ever reaches the kernel tty layer, and it's up to the program +to decide what to do with it. + +============================================================== + +dmesg_restrict: + +This toggle indicates whether unprivileged users are prevented +from using dmesg(8) to view messages from the kernel's log buffer. +When dmesg_restrict is set to (0) there are no restrictions. When +dmesg_restrict is set set to (1), users must have CAP_SYSLOG to use +dmesg(8). + +The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the +default value of dmesg_restrict. + +============================================================== + +domainname & hostname: + +These files can be used to set the NIS/YP domainname and the +hostname of your box in exactly the same way as the commands +domainname and hostname, i.e.: +# echo "darkstar" > /proc/sys/kernel/hostname +# echo "mydomain" > /proc/sys/kernel/domainname +has the same effect as +# hostname "darkstar" +# domainname "mydomain" + +Note, however, that the classic darkstar.frop.org has the +hostname "darkstar" and DNS (Internet Domain Name Server) +domainname "frop.org", not to be confused with the NIS (Network +Information Service) or YP (Yellow Pages) domainname. These two +domain names are in general different. For a detailed discussion +see the hostname(1) man page. + +============================================================== +hardlockup_all_cpu_backtrace: + +This value controls the hard lockup detector behavior when a hard +lockup condition is detected as to whether or not to gather further +debug information. If enabled, arch-specific all-CPU stack dumping +will be initiated. + +0: do nothing. This is the default behavior. + +1: on detection capture more debug information. +============================================================== + +hardlockup_panic: + +This parameter can be used to control whether the kernel panics +when a hard lockup is detected. + + 0 - don't panic on hard lockup + 1 - panic on hard lockup + +See Documentation/lockup-watchdogs.txt for more information. This can +also be set using the nmi_watchdog kernel parameter. + +============================================================== + +hotplug: + +Path for the hotplug policy agent. +Default value is "/sbin/hotplug". + +============================================================== + +hung_task_panic: + +Controls the kernel's behavior when a hung task is detected. +This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. + +0: continue operation. This is the default behavior. + +1: panic immediately. + +============================================================== + +hung_task_check_count: + +The upper bound on the number of tasks that are checked. +This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. + +============================================================== + +hung_task_timeout_secs: + +When a task in D state did not get scheduled +for more than this value report a warning. +This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. + +0: means infinite timeout - no checking done. +Possible values to set are in range {0..LONG_MAX/HZ}. + +============================================================== + +hung_task_check_interval_secs: + +Hung task check interval. If hung task checking is enabled +(see hung_task_timeout_secs), the check is done every +hung_task_check_interval_secs seconds. +This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. + +0 (default): means use hung_task_timeout_secs as checking interval. +Possible values to set are in range {0..LONG_MAX/HZ}. + +============================================================== + +hung_task_warnings: + +The maximum number of warnings to report. During a check interval +if a hung task is detected, this value is decreased by 1. +When this value reaches 0, no more warnings will be reported. +This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. + +-1: report an infinite number of warnings. + +============================================================== + +hyperv_record_panic_msg: + +Controls whether the panic kmsg data should be reported to Hyper-V. + +0: do not report panic kmsg data. + +1: report the panic kmsg data. This is the default behavior. + +============================================================== + +kexec_load_disabled: + +A toggle indicating if the kexec_load syscall has been disabled. This +value defaults to 0 (false: kexec_load enabled), but can be set to 1 +(true: kexec_load disabled). Once true, kexec can no longer be used, and +the toggle cannot be set back to false. This allows a kexec image to be +loaded before disabling the syscall, allowing a system to set up (and +later use) an image without it being altered. Generally used together +with the "modules_disabled" sysctl. + +============================================================== + +kptr_restrict: + +This toggle indicates whether restrictions are placed on +exposing kernel addresses via /proc and other interfaces. + +When kptr_restrict is set to 0 (the default) the address is hashed before +printing. (This is the equivalent to %p.) + +When kptr_restrict is set to (1), kernel pointers printed using the %pK +format specifier will be replaced with 0's unless the user has CAP_SYSLOG +and effective user and group ids are equal to the real ids. This is +because %pK checks are done at read() time rather than open() time, so +if permissions are elevated between the open() and the read() (e.g via +a setuid binary) then %pK will not leak kernel pointers to unprivileged +users. Note, this is a temporary solution only. The correct long-term +solution is to do the permission checks at open() time. Consider removing +world read permissions from files that use %pK, and using dmesg_restrict +to protect against uses of %pK in dmesg(8) if leaking kernel pointer +values to unprivileged users is a concern. + +When kptr_restrict is set to (2), kernel pointers printed using +%pK will be replaced with 0's regardless of privileges. + +============================================================== + +l2cr: (PPC only) + +This flag controls the L2 cache of G3 processor boards. If +0, the cache is disabled. Enabled if nonzero. + +============================================================== + +modules_disabled: + +A toggle value indicating if modules are allowed to be loaded +in an otherwise modular kernel. This toggle defaults to off +(0), but can be set true (1). Once true, modules can be +neither loaded nor unloaded, and the toggle cannot be set back +to false. Generally used with the "kexec_load_disabled" toggle. + +============================================================== + +msg_next_id, sem_next_id, and shm_next_id: + +These three toggles allows to specify desired id for next allocated IPC +object: message, semaphore or shared memory respectively. + +By default they are equal to -1, which means generic allocation logic. +Possible values to set are in range {0..INT_MAX}. + +Notes: +1) kernel doesn't guarantee, that new object will have desired id. So, +it's up to userspace, how to handle an object with "wrong" id. +2) Toggle with non-default value will be set back to -1 by kernel after +successful IPC object allocation. If an IPC object allocation syscall +fails, it is undefined if the value remains unmodified or is reset to -1. + +============================================================== + +nmi_watchdog: + +This parameter can be used to control the NMI watchdog +(i.e. the hard lockup detector) on x86 systems. + + 0 - disable the hard lockup detector + 1 - enable the hard lockup detector + +The hard lockup detector monitors each CPU for its ability to respond to +timer interrupts. The mechanism utilizes CPU performance counter registers +that are programmed to generate Non-Maskable Interrupts (NMIs) periodically +while a CPU is busy. Hence, the alternative name 'NMI watchdog'. + +The NMI watchdog is disabled by default if the kernel is running as a guest +in a KVM virtual machine. This default can be overridden by adding + + nmi_watchdog=1 + +to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst). + +============================================================== + +numa_balancing + +Enables/disables automatic page fault based NUMA memory +balancing. Memory is moved automatically to nodes +that access it often. + +Enables/disables automatic NUMA memory balancing. On NUMA machines, there +is a performance penalty if remote memory is accessed by a CPU. When this +feature is enabled the kernel samples what task thread is accessing memory +by periodically unmapping pages and later trapping a page fault. At the +time of the page fault, it is determined if the data being accessed should +be migrated to a local memory node. + +The unmapping of pages and trapping faults incur additional overhead that +ideally is offset by improved memory locality but there is no universal +guarantee. If the target workload is already bound to NUMA nodes then this +feature should be disabled. Otherwise, if the system overhead from the +feature is too high then the rate the kernel samples for NUMA hinting +faults may be controlled by the numa_balancing_scan_period_min_ms, +numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, +numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls. + +============================================================== + +numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, +numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb + +Automatic NUMA balancing scans tasks address space and unmaps pages to +detect if pages are properly placed or if the data should be migrated to a +memory node local to where the task is running. Every "scan delay" the task +scans the next "scan size" number of pages in its address space. When the +end of the address space is reached the scanner restarts from the beginning. + +In combination, the "scan delay" and "scan size" determine the scan rate. +When "scan delay" decreases, the scan rate increases. The scan delay and +hence the scan rate of every task is adaptive and depends on historical +behaviour. If pages are properly placed then the scan delay increases, +otherwise the scan delay decreases. The "scan size" is not adaptive but +the higher the "scan size", the higher the scan rate. + +Higher scan rates incur higher system overhead as page faults must be +trapped and potentially data must be migrated. However, the higher the scan +rate, the more quickly a tasks memory is migrated to a local node if the +workload pattern changes and minimises performance impact due to remote +memory accesses. These sysctls control the thresholds for scan delays and +the number of pages scanned. + +numa_balancing_scan_period_min_ms is the minimum time in milliseconds to +scan a tasks virtual memory. It effectively controls the maximum scanning +rate for each task. + +numa_balancing_scan_delay_ms is the starting "scan delay" used for a task +when it initially forks. + +numa_balancing_scan_period_max_ms is the maximum time in milliseconds to +scan a tasks virtual memory. It effectively controls the minimum scanning +rate for each task. + +numa_balancing_scan_size_mb is how many megabytes worth of pages are +scanned for a given scan. + +============================================================== + +osrelease, ostype & version: + +# cat osrelease +2.1.88 +# cat ostype +Linux +# cat version +#5 Wed Feb 25 21:49:24 MET 1998 + +The files osrelease and ostype should be clear enough. Version +needs a little more clarification however. The '#5' means that +this is the fifth kernel built from this source base and the +date behind it indicates the time the kernel was built. +The only way to tune these values is to rebuild the kernel :-) + +============================================================== + +overflowgid & overflowuid: + +if your architecture did not always support 32-bit UIDs (i.e. arm, +i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to +applications that use the old 16-bit UID/GID system calls, if the +actual UID or GID would exceed 65535. + +These sysctls allow you to change the value of the fixed UID and GID. +The default is 65534. + +============================================================== + +panic: + +The value in this file represents the number of seconds the kernel +waits before rebooting on a panic. When you use the software watchdog, +the recommended setting is 60. + +============================================================== + +panic_on_io_nmi: + +Controls the kernel's behavior when a CPU receives an NMI caused by +an IO error. + +0: try to continue operation (default) + +1: panic immediately. The IO error triggered an NMI. This indicates a + serious system condition which could result in IO data corruption. + Rather than continuing, panicking might be a better choice. Some + servers issue this sort of NMI when the dump button is pushed, + and you can use this option to take a crash dump. + +============================================================== + +panic_on_oops: + +Controls the kernel's behaviour when an oops or BUG is encountered. + +0: try to continue operation + +1: panic immediately. If the `panic' sysctl is also non-zero then the + machine will be rebooted. + +============================================================== + +panic_on_stackoverflow: + +Controls the kernel's behavior when detecting the overflows of +kernel, IRQ and exception stacks except a user stack. +This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled. + +0: try to continue operation. + +1: panic immediately. + +============================================================== + +panic_on_unrecovered_nmi: + +The default Linux behaviour on an NMI of either memory or unknown is +to continue operation. For many environments such as scientific +computing it is preferable that the box is taken out and the error +dealt with than an uncorrected parity/ECC error get propagated. + +A small number of systems do generate NMI's for bizarre random reasons +such as power management so the default is off. That sysctl works like +the existing panic controls already in that directory. + +============================================================== + +panic_on_warn: + +Calls panic() in the WARN() path when set to 1. This is useful to avoid +a kernel rebuild when attempting to kdump at the location of a WARN(). + +0: only WARN(), default behaviour. + +1: call panic() after printing out WARN() location. + +============================================================== + +panic_on_rcu_stall: + +When set to 1, calls panic() after RCU stall detection messages. This +is useful to define the root cause of RCU stalls using a vmcore. + +0: do not panic() when RCU stall takes place, default behavior. + +1: panic() after printing RCU stall messages. + +============================================================== + +perf_cpu_time_max_percent: + +Hints to the kernel how much CPU time it should be allowed to +use to handle perf sampling events. If the perf subsystem +is informed that its samples are exceeding this limit, it +will drop its sampling frequency to attempt to reduce its CPU +usage. + +Some perf sampling happens in NMIs. If these samples +unexpectedly take too long to execute, the NMIs can become +stacked up next to each other so much that nothing else is +allowed to execute. + +0: disable the mechanism. Do not monitor or correct perf's + sampling rate no matter how CPU time it takes. + +1-100: attempt to throttle perf's sample rate to this + percentage of CPU. Note: the kernel calculates an + "expected" length of each sample event. 100 here means + 100% of that expected length. Even if this is set to + 100, you may still see sample throttling if this + length is exceeded. Set to 0 if you truly do not care + how much CPU is consumed. + +============================================================== + +perf_event_paranoid: + +Controls use of the performance events system by unprivileged +users (without CAP_SYS_ADMIN). The default value is 2. + + -1: Allow use of (almost) all events by all users + Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK +>=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN + Disallow raw tracepoint access by users without CAP_SYS_ADMIN +>=1: Disallow CPU event access by users without CAP_SYS_ADMIN +>=2: Disallow kernel profiling by users without CAP_SYS_ADMIN + +============================================================== + +perf_event_max_stack: + +Controls maximum number of stack frames to copy for (attr.sample_type & +PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using +'perf record -g' or 'perf trace --call-graph fp'. + +This can only be done when no events are in use that have callchains +enabled, otherwise writing to this file will return -EBUSY. + +The default value is 127. + +============================================================== + +perf_event_mlock_kb: + +Control size of per-cpu ring buffer not counted agains mlock limit. + +The default value is 512 + 1 page + +============================================================== + +perf_event_max_contexts_per_stack: + +Controls maximum number of stack frame context entries for +(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for +instance, when using 'perf record -g' or 'perf trace --call-graph fp'. + +This can only be done when no events are in use that have callchains +enabled, otherwise writing to this file will return -EBUSY. + +The default value is 8. + +============================================================== + +pid_max: + +PID allocation wrap value. When the kernel's next PID value +reaches this value, it wraps back to a minimum PID value. +PIDs of value pid_max or larger are not allocated. + +============================================================== + +ns_last_pid: + +The last pid allocated in the current (the one task using this sysctl +lives in) pid namespace. When selecting a pid for a next task on fork +kernel tries to allocate a number starting from this one. + +============================================================== + +powersave-nap: (PPC only) + +If set, Linux-PPC will use the 'nap' mode of powersaving, +otherwise the 'doze' mode will be used. + +============================================================== + +printk: + +The four values in printk denote: console_loglevel, +default_message_loglevel, minimum_console_loglevel and +default_console_loglevel respectively. + +These values influence printk() behavior when printing or +logging error messages. See 'man 2 syslog' for more info on +the different loglevels. + +- console_loglevel: messages with a higher priority than + this will be printed to the console +- default_message_loglevel: messages without an explicit priority + will be printed with this priority +- minimum_console_loglevel: minimum (highest) value to which + console_loglevel can be set +- default_console_loglevel: default value for console_loglevel + +============================================================== + +printk_delay: + +Delay each printk message in printk_delay milliseconds + +Value from 0 - 10000 is allowed. + +============================================================== + +printk_ratelimit: + +Some warning messages are rate limited. printk_ratelimit specifies +the minimum length of time between these messages (in jiffies), by +default we allow one every 5 seconds. + +A value of 0 will disable rate limiting. + +============================================================== + +printk_ratelimit_burst: + +While long term we enforce one message per printk_ratelimit +seconds, we do allow a burst of messages to pass through. +printk_ratelimit_burst specifies the number of messages we can +send before ratelimiting kicks in. + +============================================================== + +printk_devkmsg: + +Control the logging to /dev/kmsg from userspace: + +ratelimit: default, ratelimited +on: unlimited logging to /dev/kmsg from userspace +off: logging to /dev/kmsg disabled + +The kernel command line parameter printk.devkmsg= overrides this and is +a one-time setting until next reboot: once set, it cannot be changed by +this sysctl interface anymore. + +pty +=== + +See Documentation/filesystems/devpts.rst. + + +random +====== + +This is a directory, with the following entries: + +* ``boot_id``: a UUID generated the first time this is retrieved, and + unvarying after that; + +* ``uuid``: a UUID generated every time this is retrieved (this can + thus be used to generate UUIDs at will); + +* ``entropy_avail``: the pool's entropy count, in bits; + +* ``poolsize``: the entropy pool size, in bits; + +* ``urandom_min_reseed_secs``: obsolete (used to determine the minimum + number of seconds between urandom pool reseeding). This file is + writable for compatibility purposes, but writing to it has no effect + on any RNG behavior; + +* ``write_wakeup_threshold``: when the entropy count drops below this + (as a number of bits), processes waiting to write to ``/dev/random`` + are woken up. This file is writable for compatibility purposes, but + writing to it has no effect on any RNG behavior. + + +randomize_va_space +================== + +This option can be used to select the type of process address +space randomization that is used in the system, for architectures +that support this feature. + +0 - Turn the process address space randomization off. This is the + default for architectures that do not support this feature anyways, + and kernels that are booted with the "norandmaps" parameter. + +1 - Make the addresses of mmap base, stack and VDSO page randomized. + This, among other things, implies that shared libraries will be + loaded to random addresses. Also for PIE-linked binaries, the + location of code start is randomized. This is the default if the + CONFIG_COMPAT_BRK option is enabled. + +2 - Additionally enable heap randomization. This is the default if + CONFIG_COMPAT_BRK is disabled. + + There are a few legacy applications out there (such as some ancient + versions of libc.so.5 from 1996) that assume that brk area starts + just after the end of the code+bss. These applications break when + start of the brk area is randomized. There are however no known + non-legacy applications that would be broken this way, so for most + systems it is safe to choose full randomization. + + Systems with ancient and/or broken binaries should be configured + with CONFIG_COMPAT_BRK enabled, which excludes the heap from process + address space randomization. + +============================================================== + +reboot-cmd: (Sparc only) + +??? This seems to be a way to give an argument to the Sparc +ROM/Flash boot loader. Maybe to tell it what to do after +rebooting. ??? + +============================================================== + +rtsig-max & rtsig-nr: + +The file rtsig-max can be used to tune the maximum number +of POSIX realtime (queued) signals that can be outstanding +in the system. + +rtsig-nr shows the number of RT signals currently queued. + +============================================================== + +sched_schedstats: + +Enables/disables scheduler statistics. Enabling this feature +incurs a small amount of overhead in the scheduler but is +useful for debugging and performance tuning. + +============================================================== + +sg-big-buff: + +This file shows the size of the generic SCSI (sg) buffer. +You can't tune it just yet, but you could change it on +compile time by editing include/scsi/sg.h and changing +the value of SG_BIG_BUFF. + +There shouldn't be any reason to change this value. If +you can come up with one, you probably know what you +are doing anyway :) + +============================================================== + +shmall: + +This parameter sets the total amount of shared memory pages that +can be used system wide. Hence, SHMALL should always be at least +ceil(shmmax/PAGE_SIZE). + +If you are not sure what the default PAGE_SIZE is on your Linux +system, you can run the following command: + +# getconf PAGE_SIZE + +============================================================== + +shmmax: + +This value can be used to query and set the run time limit +on the maximum shared memory segment size that can be created. +Shared memory segments up to 1Gb are now supported in the +kernel. This value defaults to SHMMAX. + +============================================================== + +shm_rmid_forced: + +Linux lets you set resource limits, including how much memory one +process can consume, via setrlimit(2). Unfortunately, shared memory +segments are allowed to exist without association with any process, and +thus might not be counted against any resource limits. If enabled, +shared memory segments are automatically destroyed when their attach +count becomes zero after a detach or a process termination. It will +also destroy segments that were created, but never attached to, on exit +from the process. The only use left for IPC_RMID is to immediately +destroy an unattached segment. Of course, this breaks the way things are +defined, so some applications might stop working. Note that this +feature will do you no good unless you also configure your resource +limits (in particular, RLIMIT_AS and RLIMIT_NPROC). Most systems don't +need this. + +Note that if you change this from 0 to 1, already created segments +without users and with a dead originative process will be destroyed. + +============================================================== + +sysctl_writes_strict: + +Control how file position affects the behavior of updating sysctl values +via the /proc/sys interface: + + -1 - Legacy per-write sysctl value handling, with no printk warnings. + Each write syscall must fully contain the sysctl value to be + written, and multiple writes on the same sysctl file descriptor + will rewrite the sysctl value, regardless of file position. + 0 - Same behavior as above, but warn about processes that perform writes + to a sysctl file descriptor when the file position is not 0. + 1 - (default) Respect file position when writing sysctl strings. Multiple + writes will append to the sysctl value buffer. Anything past the max + length of the sysctl value buffer will be ignored. Writes to numeric + sysctl entries must always be at file position 0 and the value must + be fully contained in the buffer sent in the write syscall. + +============================================================== + +softlockup_all_cpu_backtrace: + +This value controls the soft lockup detector thread's behavior +when a soft lockup condition is detected as to whether or not +to gather further debug information. If enabled, each cpu will +be issued an NMI and instructed to capture stack trace. + +This feature is only applicable for architectures which support +NMI. + +0: do nothing. This is the default behavior. + +1: on detection capture more debug information. + +============================================================== + +soft_watchdog + +This parameter can be used to control the soft lockup detector. + + 0 - disable the soft lockup detector + 1 - enable the soft lockup detector + +The soft lockup detector monitors CPUs for threads that are hogging the CPUs +without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads +from running. The mechanism depends on the CPUs ability to respond to timer +interrupts which are needed for the 'watchdog/N' threads to be woken up by +the watchdog timer function, otherwise the NMI watchdog - if enabled - can +detect a hard lockup condition. + +============================================================== + +tainted: + +Non-zero if the kernel has been tainted. Numeric values, which can be +ORed together. The letters are seen in "Tainted" line of Oops reports. + + 1 (P): A module with a non-GPL license has been loaded, this + includes modules with no license. + Set by modutils >= 2.4.9 and module-init-tools. + 2 (F): A module was force loaded by insmod -f. + Set by modutils >= 2.4.9 and module-init-tools. + 4 (S): Unsafe SMP processors: SMP with CPUs not designed for SMP. + 8 (R): A module was forcibly unloaded from the system by rmmod -f. + 16 (M): A hardware machine check error occurred on the system. + 32 (B): A bad page was discovered on the system. + 64 (U): The user has asked that the system be marked "tainted". This + could be because they are running software that directly modifies + the hardware, or for other reasons. + 128 (D): The system has died. + 256 (A): The ACPI DSDT has been overridden with one supplied by the user + instead of using the one provided by the hardware. + 512 (W): A kernel warning has occurred. + 1024 (C): A module from drivers/staging was loaded. + 2048 (I): The system is working around a severe firmware bug. + 4096 (O): An out-of-tree module has been loaded. + 8192 (E): An unsigned module has been loaded in a kernel supporting module + signature. + 16384 (L): A soft lockup has previously occurred on the system. + 32768 (K): The kernel has been live patched. + 65536 (X): Auxiliary taint, defined and used by for distros. +131072 (T): The kernel was built with the struct randomization plugin. + +============================================================== + +threads-max + +This value controls the maximum number of threads that can be created +using fork(). + +During initialization the kernel sets this value such that even if the +maximum number of threads is created, the thread structures occupy only +a part (1/8th) of the available RAM pages. + +The minimum value that can be written to threads-max is 20. +The maximum value that can be written to threads-max is given by the +constant FUTEX_TID_MASK (0x3fffffff). +If a value outside of this range is written to threads-max an error +EINVAL occurs. + +The value written is checked against the available RAM pages. If the +thread structures would occupy too much (more than 1/8th) of the +available RAM pages threads-max is reduced accordingly. + +============================================================== + +unprivileged_bpf_disabled: + +Writing 1 to this entry will disable unprivileged calls to bpf(); +once disabled, calling bpf() without CAP_SYS_ADMIN will return +-EPERM. Once set to 1, this can't be cleared from the running kernel +anymore. + +Writing 2 to this entry will also disable unprivileged calls to bpf(), +however, an admin can still change this setting later on, if needed, by +writing 0 or 1 to this entry. + +If BPF_UNPRIV_DEFAULT_OFF is enabled in the kernel config, then this +entry will default to 2 instead of 0. + + 0 - Unprivileged calls to bpf() are enabled + 1 - Unprivileged calls to bpf() are disabled without recovery + 2 - Unprivileged calls to bpf() are disabled + +============================================================== + +unknown_nmi_panic: + +The value in this file affects behavior of handling NMI. When the +value is non-zero, unknown NMI is trapped and then panic occurs. At +that time, kernel debugging information is displayed on console. + +NMI switch that most IA32 servers have fires unknown NMI up, for +example. If a system hangs up, try pressing the NMI switch. + +============================================================== + +watchdog: + +This parameter can be used to disable or enable the soft lockup detector +_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time. + + 0 - disable both lockup detectors + 1 - enable both lockup detectors + +The soft lockup detector and the NMI watchdog can also be disabled or +enabled individually, using the soft_watchdog and nmi_watchdog parameters. +If the watchdog parameter is read, for example by executing + + cat /proc/sys/kernel/watchdog + +the output of this command (0 or 1) shows the logical OR of soft_watchdog +and nmi_watchdog. + +============================================================== + +watchdog_cpumask: + +This value can be used to control on which cpus the watchdog may run. +The default cpumask is all possible cores, but if NO_HZ_FULL is +enabled in the kernel config, and cores are specified with the +nohz_full= boot argument, those cores are excluded by default. +Offline cores can be included in this mask, and if the core is later +brought online, the watchdog will be started based on the mask value. + +Typically this value would only be touched in the nohz_full case +to re-enable cores that by default were not running the watchdog, +if a kernel lockup was suspected on those cores. + +The argument value is the standard cpulist format for cpumasks, +so for example to enable the watchdog on cores 0, 2, 3, and 4 you +might say: + + echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask + +============================================================== + +watchdog_thresh: + +This value can be used to control the frequency of hrtimer and NMI +events and the soft and hard lockup thresholds. The default threshold +is 10 seconds. + +The softlockup threshold is (2 * watchdog_thresh). Setting this +tunable to zero will disable lockup detection altogether. + +============================================================== |