Diffstat (limited to 'Documentation/admin-guide')
30 files changed, 995 insertions, 256 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index ff456871bf..ca7d9402f6 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -551,6 +551,7 @@ memory.stat file includes following statistics: event happens each time a page is unaccounted from the cgroup. swap # of bytes of swap usage + swapcached # of bytes of swap cached in memory dirty # of bytes that are waiting to get written back to the disk. writeback # of bytes of file/anon cache that are queued for syncing to disk. diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index b26b5274ea..3f85254f3c 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -210,6 +210,35 @@ cgroup v2 currently supports the following mount options. relying on the original semantics (e.g. specifying bogusly high 'bypass' protection values at higher tree levels). + memory_hugetlb_accounting + Count HugeTLB memory usage towards the cgroup's overall + memory usage for the memory controller (for the purpose of + statistics reporting and memory protection). This is a new + behavior that could regress existing setups, so it must be + explicitly opted in with this mount option. + + A few caveats to keep in mind: + + * There is no HugeTLB pool management involved in the memory + controller. The pre-allocated pool does not belong to anyone. + Specifically, when a new HugeTLB folio is allocated to + the pool, it is not accounted for from the perspective of the + memory controller. It is only charged to a cgroup when it is + actually used (e.g. at page fault time). Host memory + overcommit management has to consider this when configuring + hard limits. In general, HugeTLB pool management should be + done via other mechanisms (such as the HugeTLB controller). 
+ * Failure to charge a HugeTLB folio to the memory controller + results in SIGBUS. This could happen even if the HugeTLB pool + still has pages available (but the cgroup limit is hit and + the reclaim attempt fails). + * Charging HugeTLB memory towards the memory controller affects + memory protection and reclaim dynamics. Any userspace tuning + (e.g. of the low and min limits) needs to take this into account. + * HugeTLB pages utilized while this option is not selected + will not be tracked by the memory controller (even if cgroup + v2 is remounted later on). + Organizing Processes and Threads -------------------------------- @@ -364,6 +393,13 @@ constraint, a threaded controller must be able to handle competition between threads in a non-leaf cgroup and its child cgroups. Each threaded controller defines how such competitions are handled. +Currently, the following controllers are threaded and can be enabled +in a threaded cgroup: + +- cpu +- cpuset +- perf_event +- pids [Un]populated Notification -------------------------- @@ -1532,6 +1568,15 @@ PAGE_SIZE multiple when read back. collapsing an existing range of pages. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE is not set. + thp_swpout (npn) + Number of transparent hugepages which were swapped out in one + piece without splitting. + + thp_swpout_fallback (npn) + Number of transparent hugepages which were split before swapout. + This usually happens because the kernel failed to allocate + contiguous swap space for the huge page. + memory.numa_stat A read-only nested-keyed file which exists on non-root cgroups. @@ -2023,7 +2068,7 @@ IO Priority ~~~~~~~~~~~ A single attribute controls the behavior of the I/O priority cgroup policy, -namely the blkio.prio.class attribute. The following values are accepted for +namely the io.prio.class attribute. 
The following values are accepted for that attribute: no-change @@ -2052,9 +2097,11 @@ The following numerical values are associated with the I/O priority policies: +----------------+---+ | no-change | 0 | +----------------+---+ -| rt-to-be | 2 | +| promote-to-rt | 1 | +----------------+---+ -| all-to-idle | 3 | +| restrict-to-be | 2 | ++----------------+---+ +| idle | 3 | +----------------+---+ The numerical value that corresponds to each I/O priority class is as follows: @@ -2074,7 +2121,7 @@ The algorithm to set the I/O priority class for a request is as follows: - If I/O priority class policy is promote-to-rt, change the request I/O priority class to IOPRIO_CLASS_RT and change the request I/O priority level to 4. -- If I/O priorityt class is not promote-to-rt, translate the I/O priority +- If I/O priority class policy is not promote-to-rt, translate the I/O priority class policy into a number, then change the request I/O priority class into the maximum of the I/O priority class policy number and the numerical I/O priority class. @@ -2226,6 +2273,49 @@ Cpuset Interface Files Its value will be affected by memory nodes hotplug events. + cpuset.cpus.exclusive + A read-write multiple values file which exists on non-root + cpuset-enabled cgroups. + + It lists all the exclusive CPUs that are allowed to be used + to create a new cpuset partition. Its value is not used + unless the cgroup becomes a valid partition root. See the + "cpuset.cpus.partition" section below for a description of what + a cpuset partition is. + + When the cgroup becomes a partition root, the actual exclusive + CPUs that are allocated to that partition are listed in + "cpuset.cpus.exclusive.effective" which may be different + from "cpuset.cpus.exclusive". If "cpuset.cpus.exclusive" + has previously been set, "cpuset.cpus.exclusive.effective" + is always a subset of it. + + Users can manually set it to a value that is different from + "cpuset.cpus". 
The only constraint in setting it is that the + list of CPUs must be exclusive with respect to its siblings. + + For a parent cgroup, any one of its exclusive CPUs can only + be distributed to at most one of its child cgroups. Having an + exclusive CPU appearing in two or more of its child cgroups is + not allowed (the exclusivity rule). A value that violates the + exclusivity rule will be rejected with a write error. + + The root cgroup is a partition root and all its available CPUs + are in its exclusive CPU set. + + cpuset.cpus.exclusive.effective + A read-only multiple values file which exists on all non-root + cpuset-enabled cgroups. + + This file shows the effective set of exclusive CPUs that + can be used to create a partition root. The content of this + file will always be a subset of "cpuset.cpus" and its parent's + "cpuset.cpus.exclusive.effective" if its parent is not the root + cgroup. It will also be a subset of "cpuset.cpus.exclusive" + if it is set. If "cpuset.cpus.exclusive" is not set, it is + treated as having an implicit value of "cpuset.cpus" in the + formation of a local partition. + cpuset.cpus.partition A read-write single value file which exists on non-root cpuset-enabled cgroups. This flag is owned by the parent cgroup @@ -2239,26 +2329,41 @@ Cpuset Interface Files "isolated" Partition root without load balancing ========== ===================================== - The root cgroup is always a partition root and its state - cannot be changed. All other non-root cgroups start out as - "member". + A cpuset partition is a collection of cpuset-enabled cgroups with + a partition root at the top of the hierarchy and its descendants + except those that are separate partition roots themselves and + their descendants. A partition has exclusive access to the + set of exclusive CPUs allocated to it. Other cgroups outside + of that partition cannot use any CPUs in that set. + + There are two types of partitions - local and remote. 
A local + partition is one whose parent cgroup is also a valid partition + root. A remote partition is one whose parent cgroup is not a + valid partition root itself. Writing to "cpuset.cpus.exclusive" + is optional for the creation of a local partition as its + "cpuset.cpus.exclusive" file will assume an implicit value that + is the same as "cpuset.cpus" if it is not set. Writing the + proper "cpuset.cpus.exclusive" values down the cgroup hierarchy + before the target partition root is mandatory for the creation + of a remote partition. + + Currently, a remote partition cannot be created under a local + partition. All the ancestors of a remote partition root except + the root cgroup cannot be a partition root. + + The root cgroup is always a partition root and its state cannot + be changed. All other non-root cgroups start out as "member". When set to "root", the current cgroup is the root of a new - partition or scheduling domain that comprises itself and all - its descendants except those that are separate partition roots - themselves and their descendants. + partition or scheduling domain. The set of exclusive CPUs is + determined by the value of its "cpuset.cpus.exclusive.effective". - When set to "isolated", the CPUs in that partition root will + When set to "isolated", the CPUs in that partition will be in an isolated state without any load balancing from the scheduler. Tasks placed in such a partition with multiple CPUs should be carefully distributed and bound to each of the individual CPUs for optimal performance. - The value shown in "cpuset.cpus.effective" of a partition root - is the CPUs that the partition root can dedicate to a potential - new child partition root. The new child subtracts available - CPUs from its parent "cpuset.cpus.effective". - A partition root ("root" or "isolated") can be in one of the two possible states - valid or invalid. 
An invalid partition root is in a degraded state where some state information may @@ -2281,37 +2386,33 @@ Cpuset Interface Files In the case of an invalid partition root, a descriptive string on why the partition is invalid is included within parentheses. - For a partition root to become valid, the following conditions + For a local partition root to be valid, the following conditions must be met. - 1) The "cpuset.cpus" is exclusive with its siblings , i.e. they - are not shared by any of its siblings (exclusivity rule). - 2) The parent cgroup is a valid partition root. - 3) The "cpuset.cpus" is not empty and must contain at least - one of the CPUs from parent's "cpuset.cpus", i.e. they overlap. - 4) The "cpuset.cpus.effective" cannot be empty unless there is + 1) The parent cgroup is a valid partition root. + 2) The "cpuset.cpus.exclusive.effective" file cannot be empty, + though it may contain offline CPUs. + 3) The "cpuset.cpus.effective" cannot be empty unless there is no task associated with this partition. - External events like hotplug or changes to "cpuset.cpus" can - cause a valid partition root to become invalid and vice versa. - Note that a task cannot be moved to a cgroup with empty - "cpuset.cpus.effective". + For a remote partition root to be valid, all the above conditions + except the first one must be met. - For a valid partition root with the sibling cpu exclusivity - rule enabled, changes made to "cpuset.cpus" that violate the - exclusivity rule will invalidate the partition as well as its - sibling partitions with conflicting cpuset.cpus values. So - care must be taking in changing "cpuset.cpus". + External events like hotplug or changes to "cpuset.cpus" or + "cpuset.cpus.exclusive" can cause a valid partition root to + become invalid and vice versa. Note that a task cannot be + moved to a cgroup with empty "cpuset.cpus.effective". 
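As a rough illustration of the local partition rules above, the following shell sketch writes a CPU list to "cpuset.cpus", requests the "root" partition state, and reads the state back to see whether the kernel accepted it. The helper name ``make_partition`` and the cgroup path in the usage note are hypothetical; only the file names come from this document. It assumes the parent cgroup is already a valid partition root with the cpuset controller enabled, and that the caller has write access to the cgroup directory.

```shell
#!/bin/sh
# make_partition is a hypothetical helper, not part of any cgroup tooling.
#   $1: cgroup directory
#   $2: CPU list to assign (for example "2-3")
make_partition() {
    cg=$1
    cpus=$2

    # Assign the CPUs. For a local partition, an unset
    # "cpuset.cpus.exclusive" implicitly takes the value of "cpuset.cpus".
    printf '%s\n' "$cpus" > "$cg/cpuset.cpus" || return 1

    # Ask for the cgroup to become a partition root.
    printf 'root\n' > "$cg/cpuset.cpus.partition" || return 1

    # Read the state back. For an invalid partition root the kernel
    # includes a descriptive reason within parentheses.
    cat "$cg/cpuset.cpus.partition"
}
```

A call might look like ``make_partition /sys/fs/cgroup/mypart 2-3``, where the path is only an example. Reading the file back rather than trusting the write matters here, because a partition root can be accepted and still end up in the invalid state.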
A valid non-root parent partition may distribute out all its CPUs - to its child partitions when there is no task associated with it. + to its child local partitions when there is no task associated + with it. - Care must be taken to change a valid partition root to - "member" as all its child partitions, if present, will become + Care must be taken to change a valid partition root to "member" + as all its child local partitions, if present, will become invalid causing disruption to tasks running in those child partitions. These inactivated partitions could be recovered if their parent is switched back to a partition root with a proper - set of "cpuset.cpus". + value in "cpuset.cpus" or "cpuset.cpus.exclusive". Poll and inotify events are triggered whenever the state of "cpuset.cpus.partition" changes. That includes changes caused @@ -2321,6 +2422,11 @@ Cpuset Interface Files to "cpuset.cpus.partition" without the need to do continuous polling. + A user can pre-configure certain CPUs to an isolated state + with load balancing disabled at boot time with the "isolcpus" + kernel boot command line option. If those CPUs are to be put + into a partition, they have to be used in an isolated partition. + Device controller ----------------- diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst index 0b3d39c610..0c526dac84 100644 --- a/Documentation/admin-guide/dynamic-debug-howto.rst +++ b/Documentation/admin-guide/dynamic-debug-howto.rst @@ -259,7 +259,7 @@ Debug Messages at Module Initialization Time When ``modprobe foo`` is called, modprobe scans ``/proc/cmdline`` for ``foo.params``, strips ``foo.``, and passes them to the kernel along with -params given in modprobe args or ``/etc/modprob.d/*.conf`` files, +params given in modprobe args or ``/etc/modprobe.d/*.conf`` files, in the following order: 1. 
parameters given via ``/etc/modprobe.d/*.conf``:: diff --git a/Documentation/admin-guide/efi-stub.rst b/Documentation/admin-guide/efi-stub.rst index b24e7c40d8..090f3a185e 100644 --- a/Documentation/admin-guide/efi-stub.rst +++ b/Documentation/admin-guide/efi-stub.rst @@ -15,7 +15,7 @@ between architectures is in drivers/firmware/efi/libstub. For arm64, there is no compressed kernel support, so the Image itself masquerades as a PE/COFF image and the EFI stub is linked into the -kernel. The arm64 EFI stub lives in arch/arm64/kernel/efi-entry.S +kernel. The arm64 EFI stub lives in drivers/firmware/efi/libstub/arm64.c and drivers/firmware/efi/libstub/arm64-stub.c. By using the EFI boot stub it's possible to boot a Linux kernel diff --git a/Documentation/admin-guide/hw-vuln/mds.rst b/Documentation/admin-guide/hw-vuln/mds.rst index 48ca0bd856..48c7b0b72a 100644 --- a/Documentation/admin-guide/hw-vuln/mds.rst +++ b/Documentation/admin-guide/hw-vuln/mds.rst @@ -102,9 +102,19 @@ The possible values in this file are: * - 'Vulnerable' - The processor is vulnerable, but no mitigation enabled * - 'Vulnerable: Clear CPU buffers attempted, no microcode' - - The processor is vulnerable but microcode is not updated. - - The mitigation is enabled on a best effort basis. See :ref:`vmwerv` + - The processor is vulnerable but microcode is not updated. The + mitigation is enabled on a best effort basis. + + If the processor is vulnerable but the availability of the microcode + based mitigation mechanism is not advertised via CPUID, the kernel + selects a best effort mitigation mode. This mode invokes the mitigation + instructions without a guarantee that they clear the CPU buffers. + + This is done to address virtualization scenarios where the host has the + microcode update applied, but the hypervisor is not yet updated to + expose the CPUID to the guest. If the host has updated microcode the + protection takes effect; otherwise a few CPU cycles are wasted + pointlessly. 
* - 'Mitigation: Clear CPU buffers' - The processor is vulnerable and the CPU buffer clearing mitigation is enabled. @@ -119,24 +129,6 @@ to the above information: 'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown ======================== ============================================ -.. _vmwerv: - -Best effort mitigation mode -^^^^^^^^^^^^^^^^^^^^^^^^^^^ - - If the processor is vulnerable, but the availability of the microcode based - mitigation mechanism is not advertised via CPUID the kernel selects a best - effort mitigation mode. This mode invokes the mitigation instructions - without a guarantee that they clear the CPU buffers. - - This is done to address virtualization scenarios where the host has the - microcode update applied, but the hypervisor is not yet updated to expose - the CPUID to the guest. If the host has updated microcode the protection - takes effect otherwise a few cpu cycles are wasted pointlessly. - - The state in the mds sysfs file reflects this situation accordingly. - - Mitigation mechanism ------------------------- diff --git a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst index c98fd11907..1302fd1b55 100644 --- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst +++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst @@ -225,8 +225,19 @@ The possible values in this file are: * - 'Vulnerable' - The processor is vulnerable, but no mitigation enabled * - 'Vulnerable: Clear CPU buffers attempted, no microcode' - - The processor is vulnerable, but microcode is not updated. The + - The processor is vulnerable but microcode is not updated. The mitigation is enabled on a best effort basis. + + If the processor is vulnerable but the availability of the microcode + based mitigation mechanism is not advertised via CPUID, the kernel + selects a best effort mitigation mode. 
This mode invokes the mitigation + instructions without a guarantee that they clear the CPU buffers. + + This is done to address virtualization scenarios where the host has the + microcode update applied, but the hypervisor is not yet updated to + expose the CPUID to the guest. If the host has updated microcode the + protection takes effect; otherwise a few CPU cycles are wasted + pointlessly. * - 'Mitigation: Clear CPU buffers' - The processor is vulnerable and the CPU buffer clearing mitigation is enabled. diff --git a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst index 014167ef8d..444f84e22a 100644 --- a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst +++ b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst @@ -98,7 +98,19 @@ The possible values in this file are: * - 'Vulnerable' - The CPU is affected by this vulnerability and the microcode and kernel mitigation are not applied. * - 'Vulnerable: Clear CPU buffers attempted, no microcode' - - The system tries to clear the buffers but the microcode might not support the operation. + - The processor is vulnerable but microcode is not updated. The + mitigation is enabled on a best effort basis. + + If the processor is vulnerable but the availability of the microcode + based mitigation mechanism is not advertised via CPUID, the kernel + selects a best effort mitigation mode. This mode invokes the mitigation + instructions without a guarantee that they clear the CPU buffers. + + This is done to address virtualization scenarios where the host has the + microcode update applied, but the hypervisor is not yet updated to + expose the CPUID to the guest. If the host has updated microcode the + protection takes effect; otherwise a few CPU cycles are wasted + pointlessly. * - 'Mitigation: Clear CPU buffers' - The microcode has been updated to clear the buffers. TSX is still enabled. 
* - 'Mitigation: TSX disabled' @@ -106,25 +118,6 @@ The possible values in this file are: * - 'Not affected' - The CPU is not affected by this issue. -.. _ucode_needed: - -Best effort mitigation mode -^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -If the processor is vulnerable, but the availability of the microcode-based -mitigation mechanism is not advertised via CPUID the kernel selects a best -effort mitigation mode. This mode invokes the mitigation instructions -without a guarantee that they clear the CPU buffers. - -This is done to address virtualization scenarios where the host has the -microcode update applied, but the hypervisor is not yet updated to expose the -CPUID to the guest. If the host has updated microcode the protection takes -effect; otherwise a few CPU cycles are wasted pointlessly. - -The state in the tsx_async_abort sysfs file reflects this situation -accordingly. - - Mitigation mechanism -------------------- diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index a748e7eb44..5762e7477a 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -17,7 +17,7 @@ You can use common commands, such as cp, scp or makedumpfile to copy the memory image to a dump file on the local disk, or across the network to a remote system. -Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, +Kdump and kexec are currently supported on the x86, x86_64, ppc64, s390x, arm and arm64 architectures. When the system kernel boots, it reserves a small section of memory for @@ -113,7 +113,7 @@ There are two possible methods of using Kdump. 2) Or use the system kernel binary itself as dump-capture kernel and there is no need to build a separate dump-capture kernel. This is possible only with the architectures which support a relocatable kernel. 
As - of today, i386, x86_64, ppc64, ia64, arm and arm64 architectures support + of today, i386, x86_64, ppc64, arm and arm64 architectures support relocatable kernel. Building a relocatable kernel is advantageous from the point of view that @@ -236,24 +236,6 @@ Dump-capture kernel config options (Arch Dependent, ppc64) Make and install the kernel and its modules. -Dump-capture kernel config options (Arch Dependent, ia64) ----------------------------------------------------------- - -- No specific options are required to create a dump-capture kernel - for ia64, other than those specified in the arch independent section - above. This means that it is possible to use the system kernel - as a dump-capture kernel if desired. - - The crashkernel region can be automatically placed by the system - kernel at runtime. This is done by specifying the base address as 0, - or omitting it all together:: - - crashkernel=256M@0 - - or:: - - crashkernel=256M - Dump-capture kernel config options (Arch Dependent, arm) ---------------------------------------------------------- @@ -348,11 +330,6 @@ Boot into System Kernel On ppc64, use "crashkernel=128M@32M". - On ia64, 256M@256M is a generous value that typically works. - The region may be automatically placed on ia64, see the - dump-capture kernel config option notes above. - If use sparse memory, the size should be rounded to GRANULE boundaries. - On s390x, typically use "crashkernel=xxM". The value of xx is dependent on the memory consumption of the kdump system. In general this is not dependent on the memory size of the production system. @@ -383,10 +360,6 @@ For ppc64: - Use vmlinux -For ia64: - - - Use vmlinux or vmlinuz.gz - For s390x: - Use image or bzImage @@ -428,14 +401,10 @@ to load dump-capture kernel:: --initrd=<initrd-for-dump-capture-kernel> \ --append="root=<root-dev> <arch-specific-options>" -Please note, that --args-linux does not need to be specified for ia64. 
-It is planned to make this a no-op on that architecture, but for now -it should be omitted - Following are the arch specific command line options to be used while loading dump-capture kernel. -For i386, x86_64 and ia64: +For i386 and x86_64: "1 irqpoll nr_cpus=1 reset_devices" diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst index 599e8d3bcb..78e4d2e7ba 100644 --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst @@ -413,36 +413,6 @@ of a higher page table lookup overhead, and also consumes more page table space per process. Used to check whether PAE was enabled in the crash kernel when converting virtual addresses to physical addresses. -ia64 -==== - -pgdat_list|(pgdat_list, MAX_NUMNODES) -------------------------------------- - -pg_data_t array storing all NUMA nodes information. MAX_NUMNODES -indicates the number of the nodes. - -node_memblk|(node_memblk, NR_NODE_MEMBLKS) ------------------------------------------- - -List of node memory chunks. Filled when parsing the SRAT table to obtain -information about memory nodes. NR_NODE_MEMBLKS indicates the number of -node memory chunks. - -These values are used to compute the number of nodes the crashed kernel used. - -node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size) ----------------------------------------------------------------- - -The size of a struct node_memblk_s and the offsets of the -node_memblk_s's members. Used to compute the number of nodes. - -PGTABLE_3|PGTABLE_4 -------------------- - -User-space tools need to know whether the crash kernel was in 3-level or -4-level paging mode. Used to distinguish the page table. 
- ARM64 ===== diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 41644336e3..b72e2049c4 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -580,6 +580,10 @@ named mounts. Specifying both "all" and "named" disables all v1 hierarchies. + cgroup_favordynmods= [KNL] Enable or Disable favordynmods. + Format: { "true" | "false" } + Defaults to the value of CONFIG_CGROUP_FAVOR_DYNMODS. + cgroup.memory= [KNL] Pass options to the cgroup memory controller. Format: <string> nosocket -- Disable socket memory accounting. @@ -1331,6 +1335,7 @@ earlyprintk=dbgp[debugController#] earlyprintk=pciserial[,force],bus:device.function[,baudrate] earlyprintk=xdbc[xhciController#] + earlyprintk=bios earlyprintk is useful when the kernel crashes before the normal console is initialized. It is not enabled by @@ -1361,6 +1366,8 @@ The sclp output can only be used on s390. + The bios output can only be used on SuperH. + The optional "force" to "pciserial" enables use of a PCI device even when its classcode is not of the UART class. @@ -1449,7 +1456,7 @@ See comment before function elanfreq_setup() in arch/x86/kernel/cpu/cpufreq/elanfreq.c. - elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390] + elfcorehdr=[size[KMG]@]offset[KMG] [PPC,SH,X86,S390] Specifies physical address of start of kernel core image elf header and optionally the size. Generally kexec loader will pass this option to capture kernel. @@ -1512,12 +1519,6 @@ floppy= [HW] See Documentation/admin-guide/blockdev/floppy.rst. - force_pal_cache_flush - [IA-64] Avoid check_sal_cache_flush which may hang on - buggy SAL_CACHE_FLUSH implementations. Using this - parameter will force ia64_sal_cache_flush to call - ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. - forcepae [X86-32] Forcefully enable Physical Address Extension (PAE). 
Many Pentium M systems disable PAE but may have a @@ -1893,6 +1894,12 @@ 0 -- machine default 1 -- force brightness inversion + ia32_emulation= [X86-64] + Format: <bool> + When true, allows loading 32-bit programs and executing 32-bit + syscalls, essentially overriding IA32_EMULATION_DEFAULT_DISABLED at + boot time. When false, unconditionally disables IA32 emulation. + icn= [HW,ISDN] Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]] @@ -2220,7 +2227,7 @@ forcing Dual Address Cycle for PCI cards supporting greater than 32-bit addressing. - iommu.strict= [ARM64, X86] Configure TLB invalidation behaviour + iommu.strict= [ARM64, X86, S390] Configure TLB invalidation behaviour Format: { "0" | "1" } 0 - Lazy mode. Request that DMA unmap operations use deferred @@ -2913,6 +2920,38 @@ to extract confidential information from the kernel are also disabled. + locktorture.acq_writer_lim= [KNL] + Set the time limit in jiffies for a lock + acquisition. Acquisitions exceeding this limit + will result in a splat once they do complete. + + locktorture.bind_readers= [KNL] + Specify the list of CPUs to which the readers are + to be bound. + + locktorture.bind_writers= [KNL] + Specify the list of CPUs to which the writers are + to be bound. + + locktorture.call_rcu_chains= [KNL] + Specify the number of self-propagating call_rcu() + chains to set up. These are used to ensure that + there is a high probability of an RCU grace period + in progress at any given time. Defaults to 0, + which disables these call_rcu() chains. + + locktorture.long_hold= [KNL] + Specify the duration in milliseconds for the + occasional long-duration lock hold time. Defaults + to 100 milliseconds. Select 0 to disable. + + locktorture.nested_locks= [KNL] + Specify the maximum lock nesting depth that + locktorture is to exercise, up to a limit of 8 + (MAX_NESTED_LOCKS). Specify zero to disable. + Note that this parameter is ineffective on types + of locks that do not support nested acquisition. 
+ locktorture.nreaders_stress= [KNL] Set the number of locking read-acquisition kthreads. Defaults to being automatically set based on the @@ -2928,6 +2967,25 @@ Set time (s) between CPU-hotplug operations, or zero to disable CPU-hotplug testing. + locktorture.rt_boost= [KNL] + Do periodic testing of real-time lock priority + boosting. Select 0 to disable, 1 to boost + only rt_mutex, and 2 to boost unconditionally. + Defaults to 2, which might seem to be an + odd choice, but which should be harmless for + non-real-time spinlocks, due to their disabling + of preemption. Note that non-realtime mutexes + disable boosting. + + locktorture.rt_boost_factor= [KNL] + Number that determines how often and for how + long priority boosting is exercised. This is + scaled down by the number of writers, so that the + number of boosts per unit time remains roughly + constant as the number of writers increases. + On the other hand, the duration of each boost + increases with the number of writers. + locktorture.shuffle_interval= [KNL] Set task-shuffle interval (jiffies). Shuffling tasks allows some CPUs to go into dyntick-idle @@ -2950,13 +3008,13 @@ locktorture.torture_type= [KNL] Specify the locking implementation to test. + locktorture.verbose= [KNL] + Enable additional printk() statements. + locktorture.writer_fifo= [KNL] Run the write-side locktorture kthreads at sched_set_fifo() real-time priority. - locktorture.verbose= [KNL] - Enable additional printk() statements. - logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver Format: <irq> @@ -3275,6 +3333,11 @@ mga= [HW,DRM] + microcode.force_minrev= [X86] + Format: <bool> + Enable or disable the microcode minimal revision + enforcement for the runtime microcode loader. + min_addr=nn[KMG] [KNL,BOOT,IA-64] All physical memory below this physical address is ignored. @@ -3533,6 +3596,13 @@ [NFS] set the TCP port on which the NFSv4 callback channel should listen. 
+ nfs.delay_retrans= + [NFS] specifies the number of times the NFSv4 client + retries the request before returning an EAGAIN error, + after a reply of NFS4ERR_DELAY from the server. + Only applies if the softerr mount option is enabled, + and the specified value is >= 0. + nfs.enable_ino64= [NFS] enable 64-bit inode numbers. If zero, the NFS client will fake up a 32-bit inode @@ -4769,6 +4839,13 @@ Set maximum number of finished RCU callbacks to process in one batch. + rcutree.do_rcu_barrier= [KNL] + Request a call to rcu_barrier(). This is + throttled so that userspace tests can safely + hammer on the sysfs variable if they so choose. + If triggered before the RCU grace-period machinery + is fully active, this will error out with EAGAIN. + rcutree.dump_tree= [KNL] Dump the structure of the rcu_node combining tree out at early boot. This is used for diagnostic @@ -5225,6 +5302,12 @@ Dump ftrace buffer after reporting RCU CPU stall warning. + rcupdate.rcu_cpu_stall_notifiers= [KNL] + Provide RCU CPU stall notifiers, but see the + warnings in the RCU_CPU_STALL_NOTIFIER Kconfig + option's help text. TL;DR: You almost certainly + do not want rcupdate.rcu_cpu_stall_notifiers. + rcupdate.rcu_cpu_stall_suppress= [KNL] Suppress RCU CPU stall warning messages. @@ -5422,6 +5505,12 @@ test until boot completes in order to avoid interference. + refscale.lookup_instances= [KNL] + Number of data elements to use for the forms of + SLAB_TYPESAFE_BY_RCU testing. A negative number + is negated and multiplied by nr_cpu_ids, while + zero specifies nr_cpu_ids. + refscale.loops= [KNL] Set the number of loops over the synchronization primitive under test. Increasing this number @@ -5611,9 +5700,10 @@ s390_iommu= [HW,S390] Set s390 IOTLB flushing mode strict - With strict flushing every unmap operation will result in - an IOTLB flush. Default is lazy flushing before reuse, - which is faster. + With strict flushing every unmap operation will result + in an IOTLB flush. 
Default is lazy flushing before + reuse, which is faster. Deprecated, equivalent to + iommu.strict=1. s390_iommu_aperture= [KNL,S390] Specifies the size of the per device DMA address space diff --git a/Documentation/admin-guide/laptops/thinkpad-acpi.rst b/Documentation/admin-guide/laptops/thinkpad-acpi.rst index e27a1c3f63..98d3040101 100644 --- a/Documentation/admin-guide/laptops/thinkpad-acpi.rst +++ b/Documentation/admin-guide/laptops/thinkpad-acpi.rst @@ -53,6 +53,7 @@ detailed description): - Lap mode sensor - Setting keyboard language - WWAN Antenna type + - Auxmac A compatibility table by model and feature is maintained on the web site, http://ibm-acpi.sf.net/. I appreciate any success or failure @@ -1511,6 +1512,25 @@ Currently 2 antenna types are supported as mentioned below: The property is read-only. If the platform doesn't have support the sysfs class is not created. +Auxmac +------ + +sysfs: auxmac + +Some newer Thinkpads have a feature called MAC Address Pass-through. This +feature is implemented by the system firmware to provide a system unique MAC, +that can override a dock or USB ethernet dongle MAC, when connected to a +network. This property enables user-space to easily determine the MAC address +if the feature is enabled. + +The values of this auxiliary MAC are: + + cat /sys/devices/platform/thinkpad_acpi/auxmac + +If the feature is disabled, the value will be 'disabled'. + +This property is read-only. + Adaptive keyboard ----------------- diff --git a/Documentation/admin-guide/media/mgb4.rst b/Documentation/admin-guide/media/mgb4.rst new file mode 100644 index 0000000000..2977f74d7e --- /dev/null +++ b/Documentation/admin-guide/media/mgb4.rst @@ -0,0 +1,374 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +==================== +mgb4 sysfs interface +==================== + +The mgb4 driver provides a sysfs interface, that is used to configure video +stream related parameters (some of them must be set properly before the v4l2 +device can be opened) and obtain the video device/stream status. + +There are two types of parameters - global / PCI card related, found under +``/sys/class/video4linux/videoX/device`` and module specific found under +``/sys/class/video4linux/videoX``. + + +Global (PCI card) parameters +============================ + +**module_type** (R): + Module type. + + | 0 - No module present + | 1 - FPDL3 + | 2 - GMSL + +**module_version** (R): + Module version number. Zero in case of a missing module. + +**fw_type** (R): + Firmware type. + + | 1 - FPDL3 + | 2 - GMSL + +**fw_version** (R): + Firmware version number. + +**serial_number** (R): + Card serial number. The format is:: + + PRODUCT-REVISION-SERIES-SERIAL + + where each component is a 8b number. + + +Common FPDL3/GMSL input parameters +================================== + +**input_id** (R): + Input number ID, zero based. + +**oldi_lane_width** (RW): + Number of deserializer output lanes. + + | 0 - single + | 1 - dual (default) + +**color_mapping** (RW): + Mapping of the incoming bits in the signal to the colour bits of the pixels. + + | 0 - OLDI/JEIDA + | 1 - SPWG/VESA (default) + +**link_status** (R): + Video link status. If the link is locked, chips are properly connected and + communicating at the same speed and protocol. The link can be locked without + an active video stream. + + A value of 0 is equivalent to the V4L2_IN_ST_NO_SYNC flag of the V4L2 + VIDIOC_ENUMINPUT status bits. + + | 0 - unlocked + | 1 - locked + +**stream_status** (R): + Video stream status. A stream is detected if the link is locked, the input + pixel clock is running and the DE signal is moving. 
+ + A value of 0 is equivalent to the V4L2_IN_ST_NO_SIGNAL flag of the V4L2 + VIDIOC_ENUMINPUT status bits. + + | 0 - not detected + | 1 - detected + +**video_width** (R): + Video stream width. This is the actual width as detected by the HW. + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in the width + field of the v4l2_bt_timings struct. + +**video_height** (R): + Video stream height. This is the actual height as detected by the HW. + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in the height + field of the v4l2_bt_timings struct. + +**vsync_status** (R): + The type of VSYNC pulses as detected by the video format detector. + + The value is equivalent to the flags returned by VIDIOC_QUERY_DV_TIMINGS in + the polarities field of the v4l2_bt_timings struct. + + | 0 - active low + | 1 - active high + | 2 - not available + +**hsync_status** (R): + The type of HSYNC pulses as detected by the video format detector. + + The value is equivalent to the flags returned by VIDIOC_QUERY_DV_TIMINGS in + the polarities field of the v4l2_bt_timings struct. + + | 0 - active low + | 1 - active high + | 2 - not available + +**vsync_gap_length** (RW): + If the incoming video signal does not contain synchronization VSYNC and + HSYNC pulses, these must be generated internally in the FPGA to achieve + the correct frame ordering. This value indicates, how many "empty" pixels + (pixels with deasserted Data Enable signal) are necessary to generate the + internal VSYNC pulse. + +**hsync_gap_length** (RW): + If the incoming video signal does not contain synchronization VSYNC and + HSYNC pulses, these must be generated internally in the FPGA to achieve + the correct frame ordering. This value indicates, how many "empty" pixels + (pixels with deasserted Data Enable signal) are necessary to generate the + internal HSYNC pulse. The value must be greater than 1 and smaller than + vsync_gap_length. 
+ +**pclk_frequency** (R): + Input pixel clock frequency in kHz. + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in + the pixelclock field of the v4l2_bt_timings struct. + + *Note: The frequency_range parameter must be set properly first to get + a valid frequency here.* + +**hsync_width** (R): + Width of the HSYNC signal in PCLK clock ticks. + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in + the hsync field of the v4l2_bt_timings struct. + +**vsync_width** (R): + Width of the VSYNC signal in PCLK clock ticks. + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in + the vsync field of the v4l2_bt_timings struct. + +**hback_porch** (R): + Number of PCLK pulses between deassertion of the HSYNC signal and the first + valid pixel in the video line (marked by DE=1). + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in + the hbackporch field of the v4l2_bt_timings struct. + +**hfront_porch** (R): + Number of PCLK pulses between the end of the last valid pixel in the video + line (marked by DE=1) and assertion of the HSYNC signal. + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in + the hfrontporch field of the v4l2_bt_timings struct. + +**vback_porch** (R): + Number of video lines between deassertion of the VSYNC signal and the video + line with the first valid pixel (marked by DE=1). + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in + the vbackporch field of the v4l2_bt_timings struct. + +**vfront_porch** (R): + Number of video lines between the end of the last valid pixel line (marked + by DE=1) and assertion of the VSYNC signal. + + The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in + the vfrontporch field of the v4l2_bt_timings struct. + +**frequency_range** (RW) + PLL frequency range of the OLDI input clock generator. 
The PLL frequency is + derived from the Pixel Clock Frequency (PCLK) and is equal to PCLK if + oldi_lane_width is set to "single" and PCLK/2 if oldi_lane_width is set to + "dual". + + | 0 - PLL < 50MHz (default) + | 1 - PLL >= 50MHz + + *Note: This parameter can not be changed while the input v4l2 device is + open.* + + +Common FPDL3/GMSL output parameters +=================================== + +**output_id** (R): + Output number ID, zero based. + +**video_source** (RW): + Output video source. If set to 0 or 1, the source is the corresponding card + input and the v4l2 output devices are disabled. If set to 2 or 3, the source + is the corresponding v4l2 video output device. The default is + the corresponding v4l2 output, i.e. 2 for OUT1 and 3 for OUT2. + + | 0 - input 0 + | 1 - input 1 + | 2 - v4l2 output 0 + | 3 - v4l2 output 1 + + *Note: This parameter can not be changed while ANY of the input/output v4l2 + devices is open.* + +**display_width** (RW): + Display width. There is no autodetection of the connected display, so the + proper value must be set before the start of streaming. The default width + is 1280. + + *Note: This parameter can not be changed while the output v4l2 device is + open.* + +**display_height** (RW): + Display height. There is no autodetection of the connected display, so the + proper value must be set before the start of streaming. The default height + is 640. + + *Note: This parameter can not be changed while the output v4l2 device is + open.* + +**frame_rate** (RW): + Output video frame rate in frames per second. The default frame rate is + 60Hz. + +**hsync_polarity** (RW): + HSYNC signal polarity. + + | 0 - active low (default) + | 1 - active high + +**vsync_polarity** (RW): + VSYNC signal polarity. + + | 0 - active low (default) + | 1 - active high + +**de_polarity** (RW): + DE signal polarity. + + | 0 - active low + | 1 - active high (default) + +**pclk_frequency** (RW): + Output pixel clock frequency. 
Allowed values are between 25000-190000(kHz) + and there is a non-linear stepping between two consecutive allowed + frequencies. The driver finds the nearest allowed frequency to the given + value and sets it. When reading this property, you get the exact + frequency set by the driver. The default frequency is 70000kHz. + + *Note: This parameter can not be changed while the output v4l2 device is + open.* + +**hsync_width** (RW): + Width of the HSYNC signal in pixels. The default value is 16. + +**vsync_width** (RW): + Width of the VSYNC signal in video lines. The default value is 2. + +**hback_porch** (RW): + Number of PCLK pulses between deassertion of the HSYNC signal and the first + valid pixel in the video line (marked by DE=1). The default value is 32. + +**hfront_porch** (RW): + Number of PCLK pulses between the end of the last valid pixel in the video + line (marked by DE=1) and assertion of the HSYNC signal. The default value + is 32. + +**vback_porch** (RW): + Number of video lines between deassertion of the VSYNC signal and the video + line with the first valid pixel (marked by DE=1). The default value is 2. + +**vfront_porch** (RW): + Number of video lines between the end of the last valid pixel line (marked + by DE=1) and assertion of the VSYNC signal. The default value is 2. + + +FPDL3 specific input parameters +=============================== + +**fpdl3_input_width** (RW): + Number of deserializer input lines. + + | 0 - auto (default) + | 1 - single + | 2 - dual + +FPDL3 specific output parameters +================================ + +**fpdl3_output_width** (RW): + Number of serializer output lines. + + | 0 - auto (default) + | 1 - single + | 2 - dual + +GMSL specific input parameters +============================== + +**gmsl_mode** (RW): + GMSL speed mode. + + | 0 - 12Gb/s (default) + | 1 - 6Gb/s + | 2 - 3Gb/s + | 3 - 1.5Gb/s + +**gmsl_stream_id** (RW): + The GMSL multi-stream contains up to four video streams. 
This parameter + selects which stream is captured by the video input. The value is the + zero-based index of the stream. The default stream id is 0. + + *Note: This parameter can not be changed while the input v4l2 device is + open.* + +**gmsl_fec** (RW): + GMSL Forward Error Correction (FEC). + + | 0 - disabled + | 1 - enabled (default) + + +==================== +mgb4 mtd partitions +==================== + +The mgb4 driver creates a MTD device with two partitions: + - mgb4-fw.X - FPGA firmware. + - mgb4-data.X - Factory settings, e.g. card serial number. + +The *mgb4-fw* partition is writable and is used for FW updates, *mgb4-data* is +read-only. The *X* attached to the partition name represents the card number. +Depending on the CONFIG_MTD_PARTITIONED_MASTER kernel configuration, you may +also have a third partition named *mgb4-flash* available in the system. This +partition represents the whole, unpartitioned, card's FLASH memory and one should +not fiddle with it... + +==================== +mgb4 iio (triggers) +==================== + +The mgb4 driver creates an Industrial I/O (IIO) device that provides trigger and +signal level status capability. The following scan elements are available: + +**activity**: + The trigger levels and pending status. + + | bit 1 - trigger 1 pending + | bit 2 - trigger 2 pending + | bit 5 - trigger 1 level + | bit 6 - trigger 2 level + +**timestamp**: + The trigger event timestamp. + +The iio device can operate either in "raw" mode where you can fetch the signal +levels (activity bits 5 and 6) using sysfs access or in triggered buffer mode. +In the triggered buffer mode you can follow the signal level changes (activity +bits 1 and 2) using the iio device in /dev. If you enable the timestamps, you +will also get the exact trigger event time that can be matched to a video frame +(every mgb4 video frame has a timestamp with the same clock source). 
+ +*Note: although the activity sample always contains all the status bits, it makes +no sense to get the pending bits in raw mode or the level bits in the triggered +buffer mode - the values do not represent valid data in such case.* diff --git a/Documentation/admin-guide/media/pci-cardlist.rst b/Documentation/admin-guide/media/pci-cardlist.rst index 42528795d4..7d8e3c8987 100644 --- a/Documentation/admin-guide/media/pci-cardlist.rst +++ b/Documentation/admin-guide/media/pci-cardlist.rst @@ -77,6 +77,7 @@ ipu3-cio2 Intel ipu3-cio2 driver ivtv Conexant cx23416/cx23415 MPEG encoder/decoder ivtvfb Conexant cx23415 framebuffer mantis MANTIS based cards +mgb4 Digiteq Automotive MGB4 frame grabber mxb Siemens-Nixdorf 'Multimedia eXtension Board' netup-unidvb NetUP Universal DVB card ngene Micronas nGene diff --git a/Documentation/admin-guide/media/v4l-drivers.rst b/Documentation/admin-guide/media/v4l-drivers.rst index 1c41f87c39..61283d67ce 100644 --- a/Documentation/admin-guide/media/v4l-drivers.rst +++ b/Documentation/admin-guide/media/v4l-drivers.rst @@ -17,6 +17,7 @@ Video4Linux (V4L) driver-specific documentation imx7 ipu3 ivtv + mgb4 omap3isp omap4_camera philips diff --git a/Documentation/admin-guide/media/visl.rst b/Documentation/admin-guide/media/visl.rst index 7d2dc78341..4328c6c72d 100644 --- a/Documentation/admin-guide/media/visl.rst +++ b/Documentation/admin-guide/media/visl.rst @@ -78,7 +78,7 @@ The trace events are defined on a per-codec basis, e.g.: .. code-block:: bash - $ ls /sys/kernel/debug/tracing/events/ | grep visl + $ ls /sys/kernel/tracing/events/ | grep visl visl_fwht_controls visl_h264_controls visl_hevc_controls @@ -90,13 +90,13 @@ For example, in order to dump HEVC SPS data: .. code-block:: bash - $ echo 1 > /sys/kernel/debug/tracing/events/visl_hevc_controls/v4l2_ctrl_hevc_sps/enable + $ echo 1 > /sys/kernel/tracing/events/visl_hevc_controls/v4l2_ctrl_hevc_sps/enable The SPS data will be dumped to the trace buffer, i.e.: .. 
code-block:: bash - $ cat /sys/kernel/debug/tracing/trace + $ cat /sys/kernel/tracing/trace video_parameter_set_id 0 seq_parameter_set_id 0 pic_width_in_luma_samples 1920 diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 8da1b72818..da94feb97e 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -20,18 +20,18 @@ DAMON provides below interfaces for different users. you can write and use your personalized DAMON sysfs wrapper programs that reads/writes the sysfs files instead of you. The `DAMON user space tool <https://github.com/awslabs/damo>`_ is one example of such programs. -- *debugfs interface. (DEPRECATED!)* - :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface - <sysfs_interface>`. This is deprecated, so users should move to the - :ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot - move, please report your usecase to damon@lists.linux.dev and - linux-mm@kvack.org. - *Kernel Space Programming Interface.* :doc:`This </mm/damon/api>` is for kernel space programmers. Using this, users can utilize every feature of DAMON most flexibly and efficiently by writing kernel space DAMON application programs for you. You can even extend DAMON for various address spaces. For detail, please refer to the interface :doc:`document </mm/damon/api>`. +- *debugfs interface. (DEPRECATED!)* + :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface + <sysfs_interface>`. This is deprecated, so users should move to the + :ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot + move, please report your usecase to damon@lists.linux.dev and + linux-mm@kvack.org. .. _sysfs_interface: @@ -76,7 +76,7 @@ comma (","). :: │ │ │ │ │ │ │ │ ... │ │ │ │ │ │ ... 
│ │ │ │ │ schemes/nr_schemes - │ │ │ │ │ │ 0/action + │ │ │ │ │ │ 0/action,apply_interval_us │ │ │ │ │ │ │ access_pattern/ │ │ │ │ │ │ │ │ sz/min,max │ │ │ │ │ │ │ │ nr_accesses/min,max @@ -105,14 +105,12 @@ having the root permission could use this directory. kdamonds/ --------- -The monitoring-related information including request specifications and results -are called DAMON context. DAMON executes each context with a kernel thread -called kdamond, and multiple kdamonds could run in parallel. - Under the ``admin`` directory, one directory, ``kdamonds``, which has files for -controlling the kdamonds exist. In the beginning, this directory has only one -file, ``nr_kdamonds``. Writing a number (``N``) to the file creates the number -of child directories named ``0`` to ``N-1``. Each directory represents each +controlling the kdamonds (refer to +:ref:`design <damon_design_execution_model_and_data_structures>` for more +details) exists. In the beginning, this directory has only one file, +``nr_kdamonds``. Writing a number (``N``) to the file creates the number of +child directories named ``0`` to ``N-1``. Each directory represents each kdamond. kdamonds/<N>/ @@ -150,9 +148,10 @@ kdamonds/<N>/contexts/ In the beginning, this directory has only one file, ``nr_contexts``. Writing a number (``N``) to the file creates the number of child directories named as -``0`` to ``N-1``. Each directory represents each monitoring context. At the -moment, only one context per kdamond is supported, so only ``0`` or ``1`` can -be written to the file. +``0`` to ``N-1``. Each directory represents each monitoring context (refer to +:ref:`design <damon_design_execution_model_and_data_structures>` for more +details). At the moment, only one context per kdamond is supported, so only +``0`` or ``1`` can be written to the file. .. 
_sysfs_contexts: @@ -270,8 +269,8 @@ schemes/<N>/ ------------ In each scheme directory, five directories (``access_pattern``, ``quotas``, -``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and one file -(``action``) exist. +``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and two files +(``action`` and ``apply_interval_us``) exist. The ``action`` file is for setting and getting the scheme's :ref:`action <damon_design_damos_action>`. The keywords that can be written to and read @@ -297,6 +296,9 @@ Note that support of each action depends on the running DAMON operations set - ``stat``: Do nothing but count the statistics. Supported by all operations sets. +The ``apply_interval_us`` file is for setting and getting the scheme's +:ref:`apply_interval <damon_design_damos>` in microseconds. + schemes/<N>/access_pattern/ --------------------------- @@ -392,7 +394,7 @@ pages of all memory cgroups except ``/having_care_already``.:: echo N > 1/matching Note that ``anon`` and ``memcg`` filters are currently supported only when -``paddr`` `implementation <sysfs_contexts>` is being used. +``paddr`` :ref:`implementation <sysfs_contexts>` is being used. Also, memory regions that are filtered out by ``addr`` or ``target`` filters are not counted as the scheme has tried to those, while regions that filtered @@ -430,9 +432,9 @@ that reading it returns the total size of the scheme tried regions, and creates directories named integer starting from ``0`` under this directory. Each directory contains files exposing detailed information about each of the memory region that the corresponding scheme's ``action`` has tried to be applied under -this directory, during next :ref:`aggregation interval -<sysfs_monitoring_attrs>`. The information includes address range, -``nr_accesses``, and ``age`` of the region. +this directory, during next :ref:`apply interval <damon_design_damos>` of the +corresponding scheme. 
The information includes address range, ``nr_accesses``, +and ``age`` of the region. Writing ``update_schemes_tried_bytes`` to the relevant ``kdamonds/<N>/state`` file will only update the ``total_bytes`` file, and will not create the @@ -495,6 +497,62 @@ Please note that it's highly recommended to use user space tools like `damo <https://github.com/awslabs/damo>`_ rather than manually reading and writing the files as above. Above is only for an example. +.. _tracepoint: + +Tracepoints for Monitoring Results +================================== + +Users can get the monitoring results via the :ref:`tried_regions +<sysfs_schemes_tried_regions>`. The interface is useful for getting a +snapshot, but it could be inefficient for fully recording all the monitoring +results. For that purpose, two tracepoints, namely ``damon:damon_aggregated`` +and ``damon:damos_before_apply``, are provided. ``damon:damon_aggregated`` +provides the whole monitoring results, while ``damon:damos_before_apply`` +provides the monitoring results for regions that each DAMON-based Operation +Scheme (:ref:`DAMOS <damon_design_damos>`) is going to be applied to. Hence, +``damon:damos_before_apply`` is more useful for recording internal behavior of +DAMOS, or DAMOS target access +:ref:`pattern <damon_design_damos_access_pattern>` based query-like efficient +monitoring results recording. + +While the monitoring is turned on, you could record the tracepoint events and +show results using tracepoint supporting tools like ``perf``. For example:: + + # echo on > monitor_on + # perf record -e damon:damon_aggregated & + # sleep 5 + # kill 9 $(pidof perf) + # echo off > monitor_on + # perf script + kdamond.0 46568 [027] 79357.842179: damon:damon_aggregated: target_id=0 nr_regions=11 122509119488-135708762112: 0 864 + [...] + +Each line of the perf script output represents one monitoring region. The +first five fields are the same as in other tracepoint outputs. 
The sixth field +(``target_id=X``) shows the ID of the monitoring target of the region. The +seventh field (``nr_regions=X``) shows the total number of monitoring regions +for the target. The eighth field (``X-Y:``) shows the start (``X``) and end +(``Y``) addresses of the region in bytes. The ninth field (``X``) shows the +``nr_accesses`` of the region (refer to +:ref:`design <damon_design_region_based_sampling>` for more details of the +counter). Finally, the tenth field (``X``) shows the ``age`` of the region +(refer to :ref:`design <damon_design_age_tracking>` for more details of the +counter). + +If the event was ``damon:damos_before_apply``, the ``perf script`` output would +be somewhat like below:: + + kdamond.0 47293 [000] 80801.060214: damon:damos_before_apply: ctx_idx=0 scheme_idx=0 target_idx=0 nr_regions=11 121932607488-135128711168: 0 136 + [...] + +Each line of the output represents one monitoring region to which a DAMON-based +Operation Scheme was about to be applied at the traced time. The first five +fields are as usual. It shows the index of the DAMON context (``ctx_idx=X``) +of the scheme in the list of the contexts of the context's kdamond, the index +of the scheme (``scheme_idx=X``) in the list of the schemes of the context, in +addition to the output of ``damon_aggregated`` tracepoint. + + .. _debugfs_interface: debugfs Interface (DEPRECATED!) @@ -790,23 +848,3 @@ directory by putting the name of the context to the ``rm_contexts`` file. :: Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on`` files are in the root directory only. - - -.. _tracepoint: - -Tracepoint for Monitoring Results -================================= - -Users can get the monitoring results via the :ref:`tried_regions -<sysfs_schemes_tried_regions>` or a tracepoint, ``damon:damon_aggregated``. -While the tried regions directory is useful for getting a snapshot, the -tracepoint is useful for getting a full record of the results. 
While the -monitoring is turned on, you could record the tracepoint events and show -results using tracepoint supporting tools like ``perf``. For example:: - - # echo on > monitor_on - # perf record -e damon:damon_aggregated & - # sleep 5 - # kill 9 $(pidof perf) - # echo off > monitor_on - # perf script diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst index 776f244bda..e59231ac6b 100644 --- a/Documentation/admin-guide/mm/ksm.rst +++ b/Documentation/admin-guide/mm/ksm.rst @@ -155,6 +155,15 @@ stable_node_chains_prune_millisecs scan. It's a noop if not a single KSM page hit the ``max_page_sharing`` yet. +smart_scan + Historically KSM checked every candidate page for each scan. It did + not take into account historic information. When smart scan is + enabled, pages that have previously not been de-duplicated get + skipped. How often these pages are skipped depends on how often + de-duplication has already been tried and failed. By default this + optimization is enabled. The ``pages_skipped`` metric shows how + effective the setting is. + The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``: general_profit @@ -169,6 +178,8 @@ pages_unshared how many pages unique but repeatedly checked for merging pages_volatile how many pages changing too fast to be placed in a tree +pages_skipped + how many pages did the "smart" page scanning algorithm skip full_scans how many times all mergeable areas have been scanned stable_node_chains diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index cfe034cf1e..098f14d83e 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -33,7 +33,7 @@ used to expose persistent memory, other performance-differentiated memory and reserved memory regions as ordinary system RAM to Linux. 
Linux only supports memory hot(un)plug on selected 64 bit architectures, such as -x86_64, arm64, ppc64, s390x and ia64. +x86_64, arm64, ppc64 and s390x. Memory Hot(Un)Plug Granularity ------------------------------ @@ -75,7 +75,7 @@ Memory hotunplug consists of two phases: (1) Offlining memory blocks (2) Removing the memory from Linux -In the fist phase, memory is "hidden" from the page allocator again, for +In the first phase, memory is "hidden" from the page allocator again, for example, by migrating busy memory to other memory locations and removing all relevant free pages from the page allocator After this phase, the memory is no longer visible in memory statistics of the system. @@ -250,15 +250,15 @@ Observing the State of Memory Blocks The state (online/offline/going-offline) of a memory block can be observed either via:: - % cat /sys/device/system/memory/memoryXXX/state + % cat /sys/devices/system/memory/memoryXXX/state Or alternatively (1/0) via:: - % cat /sys/device/system/memory/memoryXXX/online + % cat /sys/devices/system/memory/memoryXXX/online For an online memory block, the managing zone can be observed via:: - % cat /sys/device/system/memory/memoryXXX/valid_zones + % cat /sys/devices/system/memory/memoryXXX/valid_zones Configuring Memory Hot(Un)Plug ============================== @@ -326,7 +326,7 @@ however, a memory block might span memory holes. A memory block spanning memory holes cannot be offlined. For example, assume 1 GiB memory block size. 
A device for a memory starting at -0x100000000 is ``/sys/device/system/memory/memory4``:: +0x100000000 is ``/sys/devices/system/memory/memory4``:: (0x100000000 / 1Gib = 4) diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst index c8f380271c..fe17cf2104 100644 --- a/Documentation/admin-guide/mm/pagemap.rst +++ b/Documentation/admin-guide/mm/pagemap.rst @@ -227,3 +227,92 @@ Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is always 12 at most architectures). Since Linux 3.11 their meaning changes after first clear of soft-dirty bits. Since Linux 4.2 they are used for flags unconditionally. + +Pagemap Scan IOCTL +================== + +The ``PAGEMAP_SCAN`` IOCTL on the pagemap file can be used to get or optionally +clear the info about page table entries. The following operations are supported +in this IOCTL: + +- Scan the address range and get the memory ranges matching the provided criteria. + This is performed when the output buffer is specified. +- Write-protect the pages. The ``PM_SCAN_WP_MATCHING`` is used to write-protect + the pages of interest. The ``PM_SCAN_CHECK_WPASYNC`` aborts the operation if + non-Async Write Protected pages are found. The ``PM_SCAN_WP_MATCHING`` can be + used with or without ``PM_SCAN_CHECK_WPASYNC``. +- Both of those operations can be combined into one atomic operation where we can + get and write protect the pages as well. + +The following page flags are currently supported: + +- ``PAGE_IS_WPALLOWED`` - Page has async-write-protection enabled +- ``PAGE_IS_WRITTEN`` - Page has been written to since it was write protected +- ``PAGE_IS_FILE`` - Page is file backed +- ``PAGE_IS_PRESENT`` - Page is present in the memory +- ``PAGE_IS_SWAPPED`` - Page is swapped out +- ``PAGE_IS_PFNZERO`` - Page has zero PFN +- ``PAGE_IS_HUGE`` - Page is THP or Hugetlb backed + +The ``struct pm_scan_arg`` is used as the argument of the IOCTL. + + 1. 
The size of the ``struct pm_scan_arg`` must be specified in the ``size`` + field. This field will be helpful in recognizing the structure if extensions + are done later. + 2. The flags can be specified in the ``flags`` field. The ``PM_SCAN_WP_MATCHING`` + and ``PM_SCAN_CHECK_WPASYNC`` are the only added flags at this time. The get + operation is optionally performed depending upon if the output buffer is + provided or not. + 3. The range is specified through ``start`` and ``end``. + 4. The walk can abort before visiting the complete range, for example when the + user buffer becomes full. The walk ending address is reported in ``walk_end``. + 5. The output buffer of ``struct page_region`` array and size is specified in + ``vec`` and ``vec_len``. + 6. The optional maximum requested pages are specified in the ``max_pages``. + 7. The masks are specified in ``category_mask``, ``category_anyof_mask``, + ``category_inverted`` and ``return_mask``. + +Find pages which have been written and WP them as well:: + + struct pm_scan_arg arg = { + .size = sizeof(arg), + .flags = PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC, + .. + .category_mask = PAGE_IS_WRITTEN, + .return_mask = PAGE_IS_WRITTEN, + }; + +Find pages which have been written, are file backed, not swapped and either +present or huge:: + + struct pm_scan_arg arg = { + .size = sizeof(arg), + .flags = 0, + .. + .category_mask = PAGE_IS_WRITTEN | PAGE_IS_SWAPPED, + .category_inverted = PAGE_IS_SWAPPED, + .category_anyof_mask = PAGE_IS_PRESENT | PAGE_IS_HUGE, + .return_mask = PAGE_IS_WRITTEN | PAGE_IS_SWAPPED | + PAGE_IS_PRESENT | PAGE_IS_HUGE, + }; + +The ``PAGE_IS_WRITTEN`` flag can be considered as a better-performing alternative +of soft-dirty flag. It doesn't get affected by VMA merging of the kernel and hence +the user can find the true soft-dirty pages in case of normal pages. (There may +still be extra dirty pages reported for THP or Hugetlb pages.) 
+ +"PAGE_IS_WRITTEN" category is used with uffd write protect-enabled ranges to +implement memory dirty tracking in userspace: + + 1. The userfaultfd file descriptor is created with ``userfaultfd`` syscall. + 2. The ``UFFD_FEATURE_WP_UNPOPULATED`` and ``UFFD_FEATURE_WP_ASYNC`` features + are set by ``UFFDIO_API`` IOCTL. + 3. The memory range is registered with ``UFFDIO_REGISTER_MODE_WP`` mode + through ``UFFDIO_REGISTER`` IOCTL. + 4. Then any part of the registered memory or the whole memory region must + be write protected using ``PAGEMAP_SCAN`` IOCTL with flag ``PM_SCAN_WP_MATCHING`` + or the ``UFFDIO_WRITEPROTECT`` IOCTL can be used. Both of these perform the + same operation. The former is better in terms of performance. + 5. Now the ``PAGEMAP_SCAN`` IOCTL can be used to either just find pages which + have been written to since they were last marked and/or optionally write protect + the pages as well. diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 4349a8c2b9..203e26da5f 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -244,6 +244,41 @@ write-protected (so future writes will also result in a WP fault). These ioctls support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP`` respectively) to configure the mapping this way. +If the userfaultfd context has ``UFFD_FEATURE_WP_ASYNC`` feature bit set, +any vma registered with write-protection will work in async mode rather +than the default sync mode. + +In async mode, there will be no message generated when a write operation +happens, meanwhile the write-protection will be resolved automatically by +the kernel. It can be seen as a more accurate version of soft-dirty +tracking and it can be different in a few ways: + + - The dirty result will not be affected by vma changes (e.g. vma + merging) because the dirty is only tracked by the pte. 
+ + - It supports range operations by default, so one can enable tracking on + any range of memory as long as it is page aligned. + + - Dirty information will not get lost if the pte was zapped due to + various reasons (e.g. during split of a shmem transparent huge page). + + - Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit + set; dirty when uffd-wp bit cleared), it has different semantics on + some of the memory operations. For example: ``MADV_DONTNEED`` on + anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as + dirtying of memory by dropping uffd-wp bit during the procedure. + +The user app can collect the "written/dirty" status by looking up the +uffd-wp bit for the pages of interest in /proc/pagemap. + +The page will not be tracked by uffd-wp async mode until the page is +explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode +flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set. Trying to resolve a page fault +that was tracked by async mode userfaultfd-wp is invalid. + +When userfaultfd-wp async mode is used alone, it can be applied to all +kinds of memory. + Memory Poisioning Emulation --------------------------- diff --git a/Documentation/admin-guide/module-signing.rst b/Documentation/admin-guide/module-signing.rst index 2898b27032..a8667a7774 100644 --- a/Documentation/admin-guide/module-signing.rst +++ b/Documentation/admin-guide/module-signing.rst @@ -28,10 +28,10 @@ trusted userspace bits. This facility uses X.509 ITU-T standard certificates to encode the public keys involved. The signatures are not themselves encoded in any industrial standard -type. The facility currently only supports the RSA public key encryption -standard (though it is pluggable and permits others to be used). The possible -hash algorithms that can be used are SHA-1, SHA-224, SHA-256, SHA-384, and -SHA-512 (the algorithm is selected by data in the signature). +type.
The built-in facility currently only supports the RSA & NIST P-384 ECDSA +public key signing standard (though it is pluggable and permits others to be +used). The possible hash algorithms that can be used are SHA-2 and SHA-3 of +sizes 256, 384, and 512 (the algorithm is selected by data in the signature). ========================== @@ -81,11 +81,12 @@ This has a number of options available: sign the modules with: =============================== ========================================== - ``CONFIG_MODULE_SIG_SHA1`` :menuselection:`Sign modules with SHA-1` - ``CONFIG_MODULE_SIG_SHA224`` :menuselection:`Sign modules with SHA-224` ``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256` ``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384` ``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512` + ``CONFIG_MODULE_SIG_SHA3_256`` :menuselection:`Sign modules with SHA3-256` + ``CONFIG_MODULE_SIG_SHA3_384`` :menuselection:`Sign modules with SHA3-384` + ``CONFIG_MODULE_SIG_SHA3_512`` :menuselection:`Sign modules with SHA3-512` =============================== ========================================== The algorithm selected here will also be built into the kernel (rather @@ -145,6 +146,10 @@ into vmlinux) using parameters in the:: file (which is also generated if it does not already exist). +One can select between RSA (``MODULE_SIG_KEY_TYPE_RSA``) and ECDSA +(``MODULE_SIG_KEY_TYPE_ECDSA``) to generate either RSA 4k or NIST +P-384 keypair. + It is strongly recommended that you provide your own x509.genkey file. Most notably, in the x509.genkey file, the req_distinguished_name section diff --git a/Documentation/admin-guide/perf/ampere_cspmu.rst b/Documentation/admin-guide/perf/ampere_cspmu.rst new file mode 100644 index 0000000000..94f93f5aee --- /dev/null +++ b/Documentation/admin-guide/perf/ampere_cspmu.rst @@ -0,0 +1,29 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +============================================ +Ampere SoC Performance Monitoring Unit (PMU) +============================================ + +Ampere SoC PMU is a generic PMU IP that follows Arm CoreSight PMU architecture. +Therefore, the driver is implemented as a submodule of arm_cspmu driver. At the +first phase it's used for counting MCU events on AmpereOne. + + +MCU PMU events +-------------- + +The PMU driver supports setting filters for "rank", "bank", and "threshold". +Note, that the filters are per PMU instance rather than per event. + + +Example for perf tool use:: + + / # perf list ampere + + ampere_mcu_pmu_0/act_sent/ [Kernel PMU event] + <...> + ampere_mcu_pmu_1/rd_sent/ [Kernel PMU event] + <...> + + / # perf stat -a -e ampere_mcu_pmu_0/act_sent,bank=5,rank=3,threshold=2/,ampere_mcu_pmu_1/rd_sent/ \ + sleep 1 diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index f60be04e4e..a2e6f2c811 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -22,3 +22,4 @@ Performance monitor support nvidia-pmu meson-ddr-pmu cxl + ampere_cspmu diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst index b799a43da6..39bd6ecce7 100644 --- a/Documentation/admin-guide/pm/intel_idle.rst +++ b/Documentation/admin-guide/pm/intel_idle.rst @@ -170,7 +170,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the ``MWAIT`` instruction is not allowed to be used, so the initialization of ``intel_idle`` will fail. -Apart from that there are four module parameters recognized by ``intel_idle`` +Apart from that there are five module parameters recognized by ``intel_idle`` itself that can be set via the kernel command line (they cannot be updated via sysfs, so that is the only way to change their values). @@ -216,6 +216,21 @@ are ignored). 
The idle states disabled this way can be enabled (on a per-CPU basis) from user space via ``sysfs``. +The ``ibrs_off`` module parameter is a boolean flag (defaults to +false). If set, it is used to control if IBRS (Indirect Branch Restricted +Speculation) should be turned off when the CPU enters an idle state. +This flag does not affect CPUs that use Enhanced IBRS which can remain +on with little performance impact. + +For some CPUs, IBRS will be selected as mitigation for Spectre v2 and Retbleed +security vulnerabilities by default. Leaving the IBRS mode on while idling may +have a performance impact on its sibling CPU. The IBRS mode will be turned off +by default when the CPU enters into a deep idle state, but not in some +shallower ones. Setting the ``ibrs_off`` module parameter will force the IBRS +mode to off when the CPU is in any one of the available idle states. This may +help performance of a sibling CPU at the expense of a slightly higher wakeup +latency for the idle CPU. + .. _intel-idle-core-and-package-idle-states: diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst index 2d22ead952..1bb2a1c292 100644 --- a/Documentation/admin-guide/pstore-blk.rst +++ b/Documentation/admin-guide/pstore-blk.rst @@ -76,7 +76,7 @@ kmsg_size ~~~~~~~~~ The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4. -It's optional if you do not care oops/panic log. +It's optional if you do not care about the oops/panic log. There are multiple chunks for oops/panic front-end depending on the remaining space except other pstore front-ends. @@ -88,7 +88,7 @@ pmsg_size ~~~~~~~~~ The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4. -It's optional if you do not care pmsg log. +It's optional if you do not care about the pmsg log. Unlike oops/panic front-end, there is only one chunk for pmsg front-end. @@ -100,7 +100,7 @@ console_size ~~~~~~~~~~~~ The chunk size in KB for console front-end. 
It **MUST** be a multiple of 4. -It's optional if you do not care console log. +It's optional if you do not care about the console log. Similar to pmsg front-end, there is only one chunk for console front-end. @@ -111,7 +111,7 @@ ftrace_size ~~~~~~~~~~~ The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4. -It's optional if you do not care console log. +It's optional if you do not care about the ftrace log. Similar to oops front-end, there are multiple chunks for ftrace front-end depending on the count of cpu processors. Each chunk size is equal to diff --git a/Documentation/admin-guide/spkguide.txt b/Documentation/admin-guide/spkguide.txt index 74ea7f3919..0d5965138f 100644 --- a/Documentation/admin-guide/spkguide.txt +++ b/Documentation/admin-guide/spkguide.txt @@ -7,7 +7,7 @@ Last modified on Mon Sep 27 14:26:31 2010 Document version 1.3 Copyright (c) 2005 Gene Collins -Copyright (c) 2008 Samuel Thibault +Copyright (c) 2008, 2023 Samuel Thibault Copyright (c) 2009, 2010 the Speakup Team Permission is granted to copy, distribute and/or modify this document @@ -83,8 +83,7 @@ spkout -- Speak Out txprt -- Transport dummy -- Plain text terminal -Note: Speakup does * NOT * support usb connections! Speakup also does * -NOT * support the internal Tripletalk! +Note: Speakup does * NOT * support the internal Tripletalk! Speakup does support two other synthesizers, but because they work in conjunction with other software, they must be loaded as modules after @@ -94,6 +93,12 @@ These are as follows: decpc -- DecTalk PC (not available at boot up) soft -- One of several software synthesizers (not available at boot up) +By default speakup looks for the synthesizer on the ttyS0 serial port. This can +be changed with the device parameter of the modules, for instance for +DoubleTalk LT: + +speakup_ltlk.dev=ttyUSB0 + See the sections on loading modules and software synthesizers later in this manual for further details. 
It should be noted here that the speakup.synth boot parameter will have no effect if Speakup has been diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst index a321b84ecc..47499a1742 100644 --- a/Documentation/admin-guide/sysctl/fs.rst +++ b/Documentation/admin-guide/sysctl/fs.rst @@ -42,16 +42,16 @@ pre-allocation or re-sizing of any kernel data structures. dentry-state ------------ -This file shows the values in ``struct dentry_stat``, as defined in -``linux/include/linux/dcache.h``:: +This file shows the values in ``struct dentry_stat_t``, as defined in +``fs/dcache.c``:: struct dentry_stat_t dentry_stat { - int nr_dentry; - int nr_unused; - int age_limit; /* age in seconds */ - int want_pages; /* pages requested by system */ - int nr_negative; /* # of unused negative dentries */ - int dummy; /* Reserved for future use */ + long nr_dentry; + long nr_unused; + long age_limit; /* age in seconds */ + long want_pages; /* pages requested by system */ + long nr_negative; /* # of unused negative dentries */ + long dummy; /* Reserved for future use */ }; Dentries are dynamically allocated and deallocated. diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index cf33de56da..6584a1f9bf 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -436,7 +436,7 @@ ignore-unaligned-usertrap On architectures where unaligned accesses cause traps, and where this feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``; -currently, ``arc``, ``ia64`` and ``loongarch``), controls whether all +currently, ``arc`` and ``loongarch``), controls whether all unaligned traps are logged. = ============================================================= @@ -445,10 +445,7 @@ unaligned traps are logged. setting. = ============================================================= -See also `unaligned-trap`_ and `unaligned-dump-stack`_. 
On ``ia64``, -this allows system administrators to override the -``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded. - +See also `unaligned-trap`_. io_uring_disabled ================= @@ -1182,7 +1179,8 @@ automatically on platforms where it can run (that is, platforms with asymmetric CPU topologies and having an Energy Model available). If your platform happens to meet the requirements for EAS but you do not want to use it, change -this value to 0. +this value to 0. On Non-EAS platforms, write operation fails and +read doesn't return anything. task_delayacct =============== @@ -1538,22 +1536,6 @@ See Documentation/admin-guide/kernel-parameters.rst and Documentation/trace/boottime-trace.rst. -.. _unaligned-dump-stack: - -unaligned-dump-stack (ia64) -=========================== - -When logging unaligned accesses, controls whether the stack is -dumped. - -= =================================================== -0 Do not dump the stack. This is the default setting. -1 Dump the stack. -= =================================================== - -See also `ignore-unaligned-usertrap`_. - - unaligned-trap ============== diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 4877563241..c7525942f1 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -71,6 +71,7 @@ two flavors of JITs, the newer eBPF JIT currently supported on: - s390x - riscv64 - riscv32 + - loongarch64 And the older cBPF JIT supported on the following archs: diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 45ba1f4dc0..c59889de12 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -742,8 +742,8 @@ overcommit_memory This value contains a flag that enables memory overcommitment. -When this flag is 0, the kernel attempts to estimate the amount -of free memory left when userspace requests more memory. 
+When this flag is 0, the kernel compares the userspace memory request +size against total memory plus swap and rejects obvious overcommits. When this flag is 1, the kernel pretends there is always enough memory until it actually runs out. |