Diffstat (limited to 'man7/cgroups.7')
-rw-r--r-- | man7/cgroups.7 | 1914 |
1 files changed, 0 insertions, 1914 deletions
diff --git a/man7/cgroups.7 b/man7/cgroups.7 deleted file mode 100644 index e2c3ec2..0000000 --- a/man7/cgroups.7 +++ /dev/null @@ -1,1914 +0,0 @@ -.\" Copyright (C) 2015 Serge Hallyn <serge@hallyn.com> -.\" and Copyright (C) 2016, 2017 Michael Kerrisk <mtk.manpages@gmail.com> -.\" -.\" SPDX-License-Identifier: Linux-man-pages-copyleft -.\" -.TH cgroups 7 2024-03-05 "Linux man-pages 6.7" -.SH NAME -cgroups \- Linux control groups -.SH DESCRIPTION -Control groups, usually referred to as cgroups, -are a Linux kernel feature which allow processes to -be organized into hierarchical groups whose usage of -various types of resources can then be limited and monitored. -The kernel's cgroup interface is provided through -a pseudo-filesystem called cgroupfs. -Grouping is implemented in the core cgroup kernel code, -while resource tracking and limits are implemented in -a set of per-resource-type subsystems (memory, CPU, and so on). -.\" -.SS Terminology -A -.I cgroup -is a collection of processes that are bound to a set of -limits or parameters defined via the cgroup filesystem. -.P -A -.I subsystem -is a kernel component that modifies the behavior of -the processes in a cgroup. -Various subsystems have been implemented, making it possible to do things -such as limiting the amount of CPU time and memory available to a cgroup, -accounting for the CPU time used by a cgroup, -and freezing and resuming execution of the processes in a cgroup. -Subsystems are sometimes also known as -.I resource controllers -(or simply, controllers). -.P -The cgroups for a controller are arranged in a -.IR hierarchy . -This hierarchy is defined by creating, removing, and -renaming subdirectories within the cgroup filesystem. -At each level of the hierarchy, attributes (e.g., limits) can be defined. -The limits, control, and accounting provided by cgroups generally have -effect throughout the subhierarchy underneath the cgroup where the -attributes are defined. 
-Thus, for example, the limits placed on -a cgroup at a higher level in the hierarchy cannot be exceeded -by descendant cgroups. -.\" -.SS Cgroups version 1 and version 2 -The initial release of the cgroups implementation was in Linux 2.6.24. -Over time, various cgroup controllers have been added -to allow the management of various types of resources. -However, the development of these controllers was largely uncoordinated, -with the result that many inconsistencies arose between controllers -and management of the cgroup hierarchies became rather complex. -A longer description of these problems can be found in the kernel -source file -.I Documentation/admin\-guide/cgroup\-v2.rst -(or -.I Documentation/cgroup\-v2.txt -in Linux 4.17 and earlier). -.P -Because of the problems with the initial cgroups implementation -(cgroups version 1), -starting in Linux 3.10, work began on a new, -orthogonal implementation to remedy these problems. -Initially marked experimental, and hidden behind the -.I "\-o\ __DEVEL__sane_behavior" -mount option, the new version (cgroups version 2) -was eventually made official with the release of Linux 4.5. -Differences between the two versions are described in the text below. -The file -.IR cgroup.sane_behavior , -present in cgroups v1, is a relic of this mount option. -The file always reports "0" and is only retained for backward compatibility. -.P -Although cgroups v2 is intended as a replacement for cgroups v1, -the older system continues to exist -(and for compatibility reasons is unlikely to be removed). -Currently, cgroups v2 implements only a subset of the controllers -available in cgroups v1. -The two systems are implemented so that both v1 controllers and -v2 controllers can be mounted on the same system. -Thus, for example, it is possible to use those controllers -that are supported under version 2, -while also using version 1 controllers -where version 2 does not yet support those controllers. 
-The only restriction here is that a controller can't be simultaneously -employed in both a cgroups v1 hierarchy and in the cgroups v2 hierarchy. -.\" -.SH CGROUPS VERSION 1 -Under cgroups v1, each controller may be mounted against a separate -cgroup filesystem that provides its own hierarchical organization of the -processes on the system. -It is also possible to comount multiple (or even all) cgroups v1 controllers -against the same cgroup filesystem, meaning that the comounted controllers -manage the same hierarchical organization of processes. -.P -For each mounted hierarchy, -the directory tree mirrors the control group hierarchy. -Each control group is represented by a directory, with each of its child -control cgroups represented as a child directory. -For instance, -.I /user/joe/1.session -represents control group -.IR 1.session , -which is a child of cgroup -.IR joe , -which is a child of -.IR /user . -Under each cgroup directory is a set of files which can be read or -written to, reflecting resource limits and a few general cgroup -properties. -.\" -.SS Tasks (threads) versus processes -In cgroups v1, a distinction is drawn between -.I processes -and -.IR tasks . -In this view, a process can consist of multiple tasks -(more commonly called threads, from a user-space perspective, -and called such in the remainder of this man page). -In cgroups v1, it is possible to independently manipulate -the cgroup memberships of the threads in a process. -.P -The cgroups v1 ability to split threads across different cgroups -caused problems in some cases. -For example, it made no sense for the -.I memory -controller, -since all of the threads of a process share a single address space. -Because of these problems, -the ability to independently manipulate the cgroup memberships -of the threads in a process was removed in the initial cgroups v2 -implementation, and subsequently restored in a more limited form -(see the discussion of "thread mode" below). 
-.\" -.SS Mounting v1 controllers -The use of cgroups requires a kernel built with the -.B CONFIG_CGROUP -option. -In addition, each of the v1 controllers has an associated -configuration option that must be set in order to employ that controller. -.P -In order to use a v1 controller, -it must be mounted against a cgroup filesystem. -The usual place for such mounts is under a -.BR tmpfs (5) -filesystem mounted at -.IR /sys/fs/cgroup . -Thus, one might mount the -.I cpu -controller as follows: -.P -.in +4n -.EX -mount \-t cgroup \-o cpu none /sys/fs/cgroup/cpu -.EE -.in -.P -It is possible to comount multiple controllers against the same hierarchy. -For example, here the -.I cpu -and -.I cpuacct -controllers are comounted against a single hierarchy: -.P -.in +4n -.EX -mount \-t cgroup \-o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct -.EE -.in -.P -Comounting controllers has the effect that a process is in the same cgroup for -all of the comounted controllers. -Separately mounting controllers allows a process to -be in cgroup -.I /foo1 -for one controller while being in -.I /foo2/foo3 -for another. -.P -It is possible to comount all v1 controllers against the same hierarchy: -.P -.in +4n -.EX -mount \-t cgroup \-o all cgroup /sys/fs/cgroup -.EE -.in -.P -(One can achieve the same result by omitting -.IR "\-o all" , -since it is the default if no controllers are explicitly specified.) -.P -It is not possible to mount the same controller -against multiple cgroup hierarchies. -For example, it is not possible to mount both the -.I cpu -and -.I cpuacct -controllers against one hierarchy, and to mount the -.I cpu -controller alone against another hierarchy. -It is possible to create multiple mount with exactly -the same set of comounted controllers. -However, in this case all that results is multiple mount points -providing a view of the same hierarchy. 
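The v1 mounting rules described above can be modelled in a few lines of Python. This is only a sketch of the rules as this page states them, not of the kernel's implementation. The function name classify_v1_mount and its three return strings are invented for illustration, and the sketch assumes that the controllers are always named explicitly (it does not model the "all" default or named hierarchies).

```python
def classify_v1_mount(existing, requested):
    """Classify a request to mount a set of cgroups v1 controllers.

    existing  -- iterable of controller sets, one per mounted v1 hierarchy
    requested -- controllers named in the -o option of the new mount
    """
    requested = frozenset(requested)
    for mounted in map(frozenset, existing):
        if requested == mounted:
            # Exactly the same comounted set: the new mount point is
            # just another view of the hierarchy that already exists.
            return "same-hierarchy"
        if requested & mounted:
            # A controller cannot be attached to two different hierarchies.
            return "conflict"
    return "new-hierarchy"
```

With cpu and cpuacct already comounted, classify_v1_mount([{"cpu", "cpuacct"}], {"cpu"}) reports a conflict, while requesting {"cpu", "cpuacct"} again reports the same hierarchy.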
-.P -Note that on many systems, the v1 controllers are automatically mounted under -.IR /sys/fs/cgroup ; -in particular, -.BR systemd (1) -automatically creates such mounts. -.\" -.SS Unmounting v1 controllers -A mounted cgroup filesystem can be unmounted using the -.BR umount (8) -command, as in the following example: -.P -.in +4n -.EX -umount /sys/fs/cgroup/pids -.EE -.in -.P -.IR "But note well" : -a cgroup filesystem is unmounted only if it is not busy, -that is, it has no child cgroups. -If this is not the case, then the only effect of the -.BR umount (8) -is to make the mount invisible. -Thus, to ensure that the mount is really removed, -one must first remove all child cgroups, -which in turn can be done only after all member processes -have been moved from those cgroups to the root cgroup. -.\" -.SS Cgroups version 1 controllers -Each of the cgroups version 1 controllers is governed -by a kernel configuration option (listed below). -Additionally, the availability of the cgroups feature is governed by the -.B CONFIG_CGROUPS -kernel configuration option. -.TP -.IR cpu " (since Linux 2.6.24; " \fBCONFIG_CGROUP_SCHED\fP ) -Cgroups can be guaranteed a minimum number of "CPU shares" -when a system is busy. -This does not limit a cgroup's CPU usage if the CPUs are not busy. -For further information, see -.I Documentation/scheduler/sched\-design\-CFS.rst -(or -.I Documentation/scheduler/sched\-design\-CFS.txt -in Linux 5.2 and earlier). -.IP -In Linux 3.2, -this controller was extended to provide CPU "bandwidth" control. -If the kernel is configured with -.BR CONFIG_CFS_BANDWIDTH , -then within each scheduling period -(defined via a file in the cgroup directory), it is possible to define -an upper limit on the CPU time allocated to the processes in a cgroup. -This upper limit applies even if there is no other competition for the CPU. 
-Further information can be found in the kernel source file -.I Documentation/scheduler/sched\-bwc.rst -(or -.I Documentation/scheduler/sched\-bwc.txt -in Linux 5.2 and earlier). -.TP -.IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP ) -This provides accounting for CPU usage by groups of processes. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/cpuacct.rst -(or -.I Documentation/cgroup\-v1/cpuacct.txt -in Linux 5.2 and earlier). -.TP -.IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP ) -This cgroup can be used to bind the processes in a cgroup to -a specified set of CPUs and NUMA nodes. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/cpusets.rst -(or -.I Documentation/cgroup\-v1/cpusets.txt -in Linux 5.2 and earlier). -. -.TP -.IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP ) -The memory controller supports reporting and limiting of process memory, kernel -memory, and swap used by cgroups. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/memory.rst -(or -.I Documentation/cgroup\-v1/memory.txt -in Linux 5.2 and earlier). -.TP -.IR devices " (since Linux 2.6.26; " \fBCONFIG_CGROUP_DEVICE\fP ) -This supports controlling which processes may create (mknod) devices as -well as open them for reading or writing. -The policies may be specified as allow-lists and deny-lists. -Hierarchy is enforced, so new rules must not -violate existing rules for the target or ancestor cgroups. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/devices.rst -(or -.I Documentation/cgroup\-v1/devices.txt -in Linux 5.2 and earlier). -.TP -.IR freezer " (since Linux 2.6.28; " \fBCONFIG_CGROUP_FREEZER\fP ) -The -.I freezer -cgroup can suspend and restore (resume) all processes in a cgroup. 
-Freezing a cgroup -.I /A -also causes its children, for example, processes in -.IR /A/B , -to be frozen. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/freezer\-subsystem.rst -(or -.I Documentation/cgroup\-v1/freezer\-subsystem.txt -in Linux 5.2 and earlier). -.TP -.IR net_cls " (since Linux 2.6.29; " \fBCONFIG_CGROUP_NET_CLASSID\fP ) -This places a classid, specified for the cgroup, on network packets -created by a cgroup. -These classids can then be used in firewall rules, -as well as used to shape traffic using -.BR tc (8). -This applies only to packets -leaving the cgroup, not to traffic arriving at the cgroup. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/net_cls.rst -(or -.I Documentation/cgroup\-v1/net_cls.txt -in Linux 5.2 and earlier). -.TP -.IR blkio " (since Linux 2.6.33; " \fBCONFIG_BLK_CGROUP\fP ) -The -.I blkio -cgroup controls and limits access to specified block devices by -applying IO control in the form of throttling and upper limits against leaf -nodes and intermediate nodes in the storage hierarchy. -.IP -Two policies are available. -The first is a proportional-weight time-based division -of disk implemented with CFQ. -This is in effect for leaf nodes using CFQ. -The second is a throttling policy which specifies -upper I/O rate limits on a device. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/blkio\-controller.rst -(or -.I Documentation/cgroup\-v1/blkio\-controller.txt -in Linux 5.2 and earlier). -.TP -.IR perf_event " (since Linux 2.6.39; " \fBCONFIG_CGROUP_PERF\fP ) -This controller allows -.I perf -monitoring of the set of processes grouped in a cgroup. 
-.IP -Further information can be found in the kernel source files -.TP -.IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP ) -This allows priorities to be specified, per network interface, for cgroups. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/net_prio.rst -(or -.I Documentation/cgroup\-v1/net_prio.txt -in Linux 5.2 and earlier). -.TP -.IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP ) -This supports limiting the use of huge pages by cgroups. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/hugetlb.rst -(or -.I Documentation/cgroup\-v1/hugetlb.txt -in Linux 5.2 and earlier). -.TP -.IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP ) -This controller permits limiting the number of process that may be created -in a cgroup (and its descendants). -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/pids.rst -(or -.I Documentation/cgroup\-v1/pids.txt -in Linux 5.2 and earlier). -.TP -.IR rdma " (since Linux 4.11; " \fBCONFIG_CGROUP_RDMA\fP ) -The RDMA controller permits limiting the use of -RDMA/IB-specific resources per cgroup. -.IP -Further information can be found in the kernel source file -.I Documentation/admin\-guide/cgroup\-v1/rdma.rst -(or -.I Documentation/cgroup\-v1/rdma.txt -in Linux 5.2 and earlier). -.\" -.SS Creating cgroups and moving processes -A cgroup filesystem initially contains a single root cgroup, '/', -which all processes belong to. -A new cgroup is created by creating a directory in the cgroup filesystem: -.P -.in +4n -.EX -mkdir /sys/fs/cgroup/cpu/cg1 -.EE -.in -.P -This creates a new empty cgroup. -.P -A process may be moved to this cgroup by writing its PID into the cgroup's -.I cgroup.procs -file: -.P -.in +4n -.EX -echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs -.EE -.in -.P -Only one PID at a time should be written to this file. 
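The same operation can be performed from a program. The following Python sketch mirrors the shell command shown above, writing one PID per write to cgroup.procs. The helper name move_processes is hypothetical, and the sketch assumes that the caller has permission to write to the file.

```python
import os


def move_processes(cgroup_dir, pids):
    """Move each PID into the cgroup at cgroup_dir; return the count moved."""
    procs_path = os.path.join(cgroup_dir, "cgroup.procs")
    moved = 0
    for pid in pids:
        # Open, write, and close once per PID, so that each write
        # carries exactly one PID, as "echo PID > file" does in the shell.
        with open(procs_path, "w") as procs:
            procs.write(f"{int(pid)}\n")
        moved += 1
    return moved
```

For example, move_processes("/sys/fs/cgroup/cpu/cg1", [os.getpid()]) would move the calling process into the cg1 cgroup created above.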
-.P -Writing the value 0 to a -.I cgroup.procs -file causes the writing process to be moved to the corresponding cgroup. -.P -When writing a PID into the -.IR cgroup.procs , -all threads in the process are moved into the new cgroup at once. -.P -Within a hierarchy, a process can be a member of exactly one cgroup. -Writing a process's PID to a -.I cgroup.procs -file automatically removes it from the cgroup of -which it was previously a member. -.P -The -.I cgroup.procs -file can be read to obtain a list of the processes that are -members of a cgroup. -The returned list of PIDs is not guaranteed to be in order. -Nor is it guaranteed to be free of duplicates. -(For example, a PID may be recycled while reading from the list.) -.P -In cgroups v1, an individual thread can be moved to -another cgroup by writing its thread ID -(i.e., the kernel thread ID returned by -.BR clone (2) -and -.BR gettid (2)) -to the -.I tasks -file in a cgroup directory. -This file can be read to discover the set of threads -that are members of the cgroup. -.\" -.SS Removing cgroups -To remove a cgroup, -it must first have no child cgroups and contain no (nonzombie) processes. -So long as that is the case, one can simply -remove the corresponding directory pathname. -Note that files in a cgroup directory cannot and need not be -removed. -.\" -.SS Cgroups v1 release notification -Two files can be used to determine whether the kernel provides -notifications when a cgroup becomes empty. -A cgroup is considered to be empty when it contains no child -cgroups and no member processes. -.P -A special file in the root directory of each cgroup hierarchy, -.IR release_agent , -can be used to register the pathname of a program that may be invoked when -a cgroup in the hierarchy becomes empty. -The pathname of the newly empty cgroup (relative to the cgroup mount point) -is provided as the sole command-line argument when the -.I release_agent -program is invoked. 
-The -.I release_agent -program might remove the cgroup directory, -or perhaps repopulate it with a process. -.P -The default value of the -.I release_agent -file is empty, meaning that no release agent is invoked. -.P -The content of the -.I release_agent -file can also be specified via a mount option when the -cgroup filesystem is mounted: -.P -.in +4n -.EX -mount \-o release_agent=pathname ... -.EE -.in -.P -Whether or not the -.I release_agent -program is invoked when a particular cgroup becomes empty is determined -by the value in the -.I notify_on_release -file in the corresponding cgroup directory. -If this file contains the value 0, then the -.I release_agent -program is not invoked. -If it contains the value 1, the -.I release_agent -program is invoked. -The default value for this file in the root cgroup is 0. -At the time when a new cgroup is created, -the value in this file is inherited from the corresponding file -in the parent cgroup. -.\" -.SS Cgroup v1 named hierarchies -In cgroups v1, -it is possible to mount a cgroup hierarchy that has no attached controllers: -.P -.in +4n -.EX -mount \-t cgroup \-o none,name=somename none /some/mount/point -.EE -.in -.P -Multiple instances of such hierarchies can be mounted; -each hierarchy must have a unique name. -The only purpose of such hierarchies is to track processes. -(See the discussion of release notification below.) -An example of this is the -.I name=systemd -cgroup hierarchy that is used by -.BR systemd (1) -to track services and user sessions. -.P -Since Linux 5.0, the -.I cgroup_no_v1 -kernel boot option (described below) can be used to disable cgroup v1 -named hierarchies, by specifying -.IR cgroup_no_v1=named . -.\" -.SH CGROUPS VERSION 2 -In cgroups v2, -all mounted controllers reside in a single unified hierarchy. 
-While (different) controllers may be simultaneously -mounted under the v1 and v2 hierarchies, -it is not possible to mount the same controller simultaneously -under both the v1 and the v2 hierarchies. -.P -The new behaviors in cgroups v2 are summarized here, -and in some cases elaborated in the following subsections. -.IP \[bu] 3 -Cgroups v2 provides a unified hierarchy against -which all controllers are mounted. -.IP \[bu] -"Internal" processes are not permitted. -With the exception of the root cgroup, processes may reside -only in leaf nodes (cgroups that do not themselves contain child cgroups). -The details are somewhat more subtle than this, and are described below. -.IP \[bu] -Active cgroups must be specified via the files -.I cgroup.controllers -and -.IR cgroup.subtree_control . -.IP \[bu] -The -.I tasks -file has been removed. -In addition, the -.I cgroup.clone_children -file that is employed by the -.I cpuset -controller has been removed. -.IP \[bu] -An improved mechanism for notification of empty cgroups is provided by the -.I cgroup.events -file. -.P -For more changes, see the -.I Documentation/admin\-guide/cgroup\-v2.rst -file in the kernel source -(or -.I Documentation/cgroup\-v2.txt -in Linux 4.17 and earlier). -. -.P -Some of the new behaviors listed above saw subsequent modification with -the addition in Linux 4.14 of "thread mode" (described below). -.\" -.SS Cgroups v2 unified hierarchy -In cgroups v1, the ability to mount different controllers -against different hierarchies was intended to allow great flexibility -for application design. -In practice, though, -the flexibility turned out to be less useful than expected, -and in many cases added complexity. -Therefore, in cgroups v2, -all available controllers are mounted against a single hierarchy. 
-The available controllers are automatically mounted, -meaning that it is not necessary (or possible) to specify the controllers -when mounting the cgroup v2 filesystem using a command such as the following: -.P -.in +4n -.EX -mount \-t cgroup2 none /mnt/cgroup2 -.EE -.in -.P -A cgroup v2 controller is available only if it is not currently in use -via a mount against a cgroup v1 hierarchy. -Or, to put things another way, it is not possible to employ -the same controller against both a v1 hierarchy and the unified v2 hierarchy. -This means that it may be necessary first to unmount a v1 controller -(as described above) before that controller is available in v2. -Since -.BR systemd (1) -makes heavy use of some v1 controllers by default, -it can in some cases be simpler to boot the system with -selected v1 controllers disabled. -To do this, specify the -.I cgroup_no_v1=list -option on the kernel boot command line; -.I list -is a comma-separated list of the names of the controllers to disable, -or the word -.I all -to disable all v1 controllers. -(This situation is correctly handled by -.BR systemd (1), -which falls back to operating without the specified controllers.) -.P -Note that on many modern systems, -.BR systemd (1) -automatically mounts the -.I cgroup2 -filesystem at -.I /sys/fs/cgroup/unified -during the boot process. -.\" -.SS Cgroups v2 mount options -The following options -.RI ( mount\~\-o ) -can be specified when mounting the group v2 filesystem: -.TP -.IR nsdelegate " (since Linux 4.15)" -Treat cgroup namespaces as delegation boundaries. -For details, see below. -.TP -.IR memory_localevents " (since Linux 5.2)" -.\" commit 9852ae3fe5293264f01c49f2571ef7688f7823ce -The -.I memory.events -should show statistics only for the cgroup itself, -and not for any descendant cgroups. -This was the behavior before Linux 5.2. 
-Starting in Linux 5.2, -the default behavior is to include statistics for descendant cgroups in -.IR memory.events , -and this mount option can be used to revert to the legacy behavior. -This option is system wide and can be set on mount or -modified through remount only from the initial mount namespace; -it is silently ignored in noninitial namespaces. -.\" -.SS Cgroups v2 controllers -The following controllers, documented in the kernel source file -.I Documentation/admin\-guide/cgroup\-v2.rst -(or -.I Documentation/cgroup\-v2.txt -in Linux 4.17 and earlier), -are supported in cgroups version 2: -.TP -.IR cpu " (since Linux 4.15)" -This is the successor to the version 1 -.I cpu -and -.I cpuacct -controllers. -.TP -.IR cpuset " (since Linux 5.0)" -This is the successor of the version 1 -.I cpuset -controller. -.TP -.IR freezer " (since Linux 5.2)" -.\" commit 76f969e8948d82e78e1bc4beb6b9465908e74873 -This is the successor of the version 1 -.I freezer -controller. -.TP -.IR hugetlb " (since Linux 5.6)" -This is the successor of the version 1 -.I hugetlb -controller. -.TP -.IR io " (since Linux 4.5)" -This is the successor of the version 1 -.I blkio -controller. -.TP -.IR memory " (since Linux 4.5)" -This is the successor of the version 1 -.I memory -controller. -.TP -.IR perf_event " (since Linux 4.11)" -This is the same as the version 1 -.I perf_event -controller. -.TP -.IR pids " (since Linux 4.5)" -This is the same as the version 1 -.I pids -controller. -.TP -.IR rdma " (since Linux 4.11)" -This is the same as the version 1 -.I rdma -controller. -.P -There is no direct equivalent of the -.I net_cls -and -.I net_prio -controllers from cgroups version 1. -Instead, support has been added to -.BR iptables (8) -to allow eBPF filters that hook on cgroup v2 pathnames to make decisions -about network traffic on a per-cgroup basis. 
-.P -The v2 -.I devices -controller provides no interface files; -instead, device control is gated by attaching an eBPF -.RB ( BPF_CGROUP_DEVICE ) -program to a v2 cgroup. -.\" -.SS Cgroups v2 subtree control -Each cgroup in the v2 hierarchy contains the following two files: -.TP -.I cgroup.controllers -This read-only file exposes a list of the controllers that are -.I available -in this cgroup. -The contents of this file match the contents of the -.I cgroup.subtree_control -file in the parent cgroup. -.TP -.I cgroup.subtree_control -This is a list of controllers that are -.I active -.RI ( enabled ) -in the cgroup. -The set of controllers in this file is a subset of the set in the -.I cgroup.controllers -of this cgroup. -The set of active controllers is modified by writing strings to this file -containing space-delimited controller names, -each preceded by '+' (to enable a controller) -or '\-' (to disable a controller), as in the following example: -.IP -.in +4n -.EX -echo \[aq]+pids \-memory\[aq] > x/y/cgroup.subtree_control -.EE -.in -.IP -An attempt to enable a controller -that is not present in -.I cgroup.controllers -leads to an -.B ENOENT -error when writing to the -.I cgroup.subtree_control -file. -.P -Because the list of controllers in -.I cgroup.subtree_control -is a subset of those -.IR cgroup.controllers , -a controller that has been disabled in one cgroup in the hierarchy -can never be re-enabled in the subtree below that cgroup. -.P -A cgroup's -.I cgroup.subtree_control -file determines the set of controllers that are exercised in the -.I child -cgroups. -When a controller (e.g., -.IR pids ) -is present in the -.I cgroup.subtree_control -file of a parent cgroup, -then the corresponding controller-interface files (e.g., -.IR pids.max ) -are automatically created in the children of that cgroup -and can be used to exert resource control in the child cgroups. 
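The effect of a write to cgroup.subtree_control can be illustrated with a small user-space model. The Python sketch below is not the kernel's code. The function name apply_subtree_control is invented, and rejecting a malformed token with ValueError is a choice made for this sketch only. The ENOENT case follows the description above.

```python
import errno


def apply_subtree_control(available, enabled, request):
    """Return the new set of enabled controllers after one write.

    available -- controllers listed in this cgroup's cgroup.controllers
    enabled   -- controllers currently in cgroup.subtree_control
    request   -- the string written, for example "+pids -memory"
    """
    result = set(enabled)
    for token in request.split():
        sign, name = token[0], token[1:]
        if sign not in ("+", "-") or not name:
            raise ValueError(f"malformed token: {token!r}")
        if sign == "+":
            if name not in available:
                # Enabling a controller that is absent from
                # cgroup.controllers fails with ENOENT.
                raise OSError(errno.ENOENT, f"controller not available: {name}")
            result.add(name)
        else:
            result.discard(name)
    # The caller's set is replaced only if every token was accepted.
    return result
```

Applied to the example above, with cpu, memory, and pids available and memory currently enabled, the request "+pids -memory" yields the set containing only pids.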
-.\" -.SS Cgroups v2 \[dq]no internal processes\[dq] rule -Cgroups v2 enforces a so-called "no internal processes" rule. -Roughly speaking, this rule means that, -with the exception of the root cgroup, processes may reside -only in leaf nodes (cgroups that do not themselves contain child cgroups). -This avoids the need to decide how to partition resources between -processes which are members of cgroup A and processes in child cgroups of A. -.P -For instance, if cgroup -.I /cg1/cg2 -exists, then a process may reside in -.IR /cg1/cg2 , -but not in -.IR /cg1 . -This is to avoid an ambiguity in cgroups v1 -with respect to the delegation of resources between processes in -.I /cg1 -and its child cgroups. -The recommended approach in cgroups v2 is to create a subdirectory called -.I leaf -for any nonleaf cgroup which should contain processes, but no child cgroups. -Thus, processes which previously would have gone into -.I /cg1 -would now go into -.IR /cg1/leaf . -This has the advantage of making explicit -the relationship between processes in -.I /cg1/leaf -and -.IR /cg1 's -other children. -.P -The "no internal processes" rule is in fact more subtle than stated above. -More precisely, the rule is that a (nonroot) cgroup can't both -(1) have member processes, and -(2) distribute resources into child cgroups\[em]that is, have a nonempty -.I cgroup.subtree_control -file. -Thus, it -.I is -possible for a cgroup to have both member processes and child cgroups, -but before controllers can be enabled for that cgroup, -the member processes must be moved out of the cgroup -(e.g., perhaps into the child cgroups). -.P -With the Linux 4.14 addition of "thread mode" (described below), -the "no internal processes" rule has been relaxed in some cases. 
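The precise form of the rule can be written as a predicate. The following Python sketch takes the state of one cgroup and reports whether it would break the rule as stated above. The function name is hypothetical, and the sketch deliberately ignores the relaxations introduced by thread mode.

```python
def violates_no_internal_processes(is_root, member_pids, subtree_control):
    """Report whether a cgroup breaks the "no internal processes" rule.

    is_root         -- True for the root cgroup, which is exempt
    member_pids     -- PIDs read from the cgroup's cgroup.procs
    subtree_control -- contents of the cgroup's cgroup.subtree_control
    """
    if is_root:
        return False
    distributes_resources = bool(subtree_control.split())
    return bool(member_pids) and distributes_resources
```

Note that a nonroot cgroup with member processes and child cgroups, but an empty cgroup.subtree_control, does not violate the rule: violates_no_internal_processes(False, [1234], "") is false.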
-.\" -.SS Cgroups v2 cgroup.events file -Each nonroot cgroup in the v2 hierarchy contains a read-only file, -.IR cgroup.events , -whose contents are key-value pairs -(delimited by newline characters, with the key and value separated by spaces) -providing state information about the cgroup: -.P -.in +4n -.EX -$ \fBcat mygrp/cgroup.events\fP -populated 1 -frozen 0 -.EE -.in -.P -The following keys may appear in this file: -.TP -.I populated -The value of this key is either 1, -if this cgroup or any of its descendants has member processes, -or otherwise 0. -.TP -.IR frozen " (since Linux 5.2)" -.\" commit 76f969e8948d82e78e1bc4beb6b9465908e7487 -The value of this key is 1 if this cgroup is currently frozen, -or 0 if it is not. -.P -The -.I cgroup.events -file can be monitored, in order to receive notification when the value of -one of its keys changes. -Such monitoring can be done using -.BR inotify (7), -which notifies changes as -.B IN_MODIFY -events, or -.BR poll (2), -which notifies changes by returning the -.B POLLPRI -and -.B POLLERR -bits in the -.I revents -field. -.\" -.SS Cgroup v2 release notification -Cgroups v2 provides a new mechanism for obtaining notification -when a cgroup becomes empty. -The cgroups v1 -.I release_agent -and -.I notify_on_release -files are removed, and replaced by the -.I populated -key in the -.I cgroup.events -file. -This key either has the value 0, -meaning that the cgroup (and its descendants) -contain no (nonzombie) member processes, -or 1, meaning that the cgroup (or one of its descendants) -contains member processes. -.P -The cgroups v2 release-notification mechanism -offers the following advantages over the cgroups v1 -.I release_agent -mechanism: -.IP \[bu] 3 -It allows for cheaper notification, -since a single process can monitor multiple -.I cgroup.events -files (using the techniques described earlier). -By contrast, the cgroups v1 mechanism requires the expense of creating -a process for each notification. 
-.IP \[bu] -Notification for different cgroup subhierarchies can be delegated -to different processes. -By contrast, the cgroups v1 mechanism allows only one release agent -for an entire hierarchy. -.\" -.SS Cgroups v2 cgroup.stat file -.\" commit ec39225cca42c05ac36853d11d28f877fde5c42e -Each cgroup in the v2 hierarchy contains a read-only -.I cgroup.stat -file (first introduced in Linux 4.14) -that consists of lines containing key-value pairs. -The following keys currently appear in this file: -.TP -.I nr_descendants -This is the total number of visible (i.e., living) descendant cgroups -underneath this cgroup. -.TP -.I nr_dying_descendants -This is the total number of dying descendant cgroups -underneath this cgroup. -A cgroup enters the dying state after being deleted. -It remains in that state for an undefined period -(which will depend on system load) -while resources are freed before the cgroup is destroyed. -Note that the presence of some cgroups in the dying state is normal, -and is not indicative of any problem. -.IP -A process can't be made a member of a dying cgroup, -and a dying cgroup can't be brought back to life. -.\" -.SS Limiting the number of descendant cgroups -Each cgroup in the v2 hierarchy contains the following files, -which can be used to view and set limits on the number -of descendant cgroups under that cgroup: -.TP -.IR cgroup.max.depth " (since Linux 4.14)" -.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1 -This file defines a limit on the depth of nesting of descendant cgroups. -A value of 0 in this file means that no descendant cgroups can be created. -An attempt to create a descendant whose nesting level exceeds -the limit fails -.RI ( mkdir (2) -fails with the error -.BR EAGAIN ). -.IP -Writing the string -.I \[dq]max\[dq] -to this file means that no limit is imposed. -The default value in this file is -.IR \[dq]max\[dq] . 
-.TP -.IR cgroup.max.descendants " (since Linux 4.14)" -.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1 -This file defines a limit on the number of live descendant cgroups that -this cgroup may have. -An attempt to create more descendants than allowed by the limit fails -.RI ( mkdir (2) -fails with the error -.BR EAGAIN ). -.IP -Writing the string -.I \[dq]max\[dq] -to this file means that no limit is imposed. -The default value in this file is -.IR \[dq]max\[dq] . -.\" -.SH CGROUPS DELEGATION: DELEGATING A HIERARCHY TO A LESS PRIVILEGED USER -In the context of cgroups, -delegation means passing management of some subtree -of the cgroup hierarchy to a nonprivileged user. -Cgroups v1 provides support for delegation based on file permissions -in the cgroup hierarchy but with less strict containment rules than v2 -(as noted below). -Cgroups v2 supports delegation with containment by explicit design. -The focus of the discussion in this section is on delegation in cgroups v2, -with some differences for cgroups v1 noted along the way. -.P -Some terminology is required in order to describe delegation. -A -.I delegater -is a privileged user (i.e., root) who owns a parent cgroup. -A -.I delegatee -is a nonprivileged user who will be granted the permissions needed -to manage some subhierarchy under that parent cgroup, -known as the -.IR "delegated subtree" . -.P -To perform delegation, -the delegater makes certain directories and files writable by the delegatee, -typically by changing the ownership of the objects to be the user ID -of the delegatee. -Assuming that we want to delegate the hierarchy rooted at (say) -.I /dlgt_grp -and that there are not yet any child cgroups under that cgroup, -the ownership of the following is changed to the user ID of the delegatee: -.TP -.I /dlgt_grp -Changing the ownership of the root of the subtree means that any new -cgroups created under the subtree (and the files they contain) -will also be owned by the delegatee. 
-.TP -.I /dlgt_grp/cgroup.procs -Changing the ownership of this file means that the delegatee -can move processes into the root of the delegated subtree. -.TP -.IR /dlgt_grp/cgroup.subtree_control " (cgroups v2 only)" -Changing the ownership of this file means that the delegatee -can enable controllers (that are present in -.IR /dlgt_grp/cgroup.controllers ) -in order to further redistribute resources at lower levels in the subtree. -(As an alternative to changing the ownership of this file, -the delegater might instead add selected controllers to this file.) -.TP -.IR /dlgt_grp/cgroup.threads " (cgroups v2 only)" -Changing the ownership of this file is necessary if a threaded subtree -is being delegated (see the description of "thread mode", below). -This permits the delegatee to write thread IDs to the file. -(The ownership of this file can also be changed when delegating -a domain subtree, but currently this serves no purpose, -since, as described below, it is not possible to move a thread between -domain cgroups by writing its thread ID to the -.I cgroup.threads -file.) -.IP -In cgroups v1, the corresponding file that should instead be delegated is the -.I tasks -file. -.P -The delegater should -.I not -change the ownership of any of the controller interfaces files (e.g., -.IR pids.max , -.IR memory.high ) -in -.IR dlgt_grp . -Those files are used from the next level above the delegated subtree -in order to distribute resources into the subtree, -and the delegatee should not have permission to change -the resources that are distributed into the delegated subtree. -.P -See also the discussion of the -.I /sys/kernel/cgroup/delegate -file in NOTES for information about further delegatable files in cgroups v2. 
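The ownership changes listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original page: the helper names delegation_targets and delegate are hypothetical, the default file list simply mirrors the cgroups v2 files named above (on a real system it could instead be read from /sys/kernel/cgroup/delegate), and the chown step would have to be run by the delegater (root).

```python
import os

# Files named in the text above as delegatable in cgroups v2.
V2_DELEGATABLE = ("cgroup.procs", "cgroup.subtree_control", "cgroup.threads")


def delegation_targets(cgroup_dir, delegatable=V2_DELEGATABLE):
    """Return the paths whose ownership is handed to the delegatee."""
    return [cgroup_dir] + [os.path.join(cgroup_dir, f) for f in delegatable]


def delegate(cgroup_dir, uid):
    """Hypothetical helper: chown the subtree root and its delegatable files.

    Controller interface files (e.g. pids.max) are deliberately left alone,
    since they are used from the level above to distribute resources.
    """
    for path in delegation_targets(cgroup_dir):
        os.chown(path, uid, -1)  # -1 leaves the group ID unchanged
```

For a cgroups v1 hierarchy, the same sketch could be called with a file list containing tasks in place of cgroup.threads, as the text above notes.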
-.P -After the aforementioned steps have been performed, -the delegatee can create child cgroups within the delegated subtree -(the cgroup subdirectories and the files they contain -will be owned by the delegatee) -and move processes between cgroups in the subtree. -If some controllers are present in -.IR dlgt_grp/cgroup.subtree_control , -or the ownership of that file was passed to the delegatee, -the delegatee can also control the further redistribution -of the corresponding resources into the delegated subtree. -.\" -.SS Cgroups v2 delegation: nsdelegate and cgroup namespaces -Starting with Linux 4.13, -.\" commit 5136f6365ce3eace5a926e10f16ed2a233db5ba9 -there is a second way to perform cgroup delegation in the cgroups v2 hierarchy. -This is done by mounting or remounting the cgroup v2 filesystem with the -.I nsdelegate -mount option. -For example, if the cgroup v2 filesystem has already been mounted, -we can remount it with the -.I nsdelegate -option as follows: -.P -.in +4n -.EX -mount \-t cgroup2 \-o remount,nsdelegate \e - none /sys/fs/cgroup/unified -.EE -.in -.\" -.\" Alternatively, we could boot the kernel with the options: -.\" -.\" cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller -.\" -.\" The effect of the latter option is to prevent systemd from employing -.\" its "hybrid" cgroup mode, where it tries to make use of cgroups v2. -.P -The effect of this mount option is to cause cgroup namespaces -to automatically become delegation boundaries. -More specifically, -the following restrictions apply for processes inside the cgroup namespace: -.IP \[bu] 3 -Writes to controller interface files in the root directory of the namespace -will fail with the error -.BR EPERM . -Processes inside the cgroup namespace can still write to delegatable -files in the root directory of the cgroup namespace such as -.I cgroup.procs -and -.IR cgroup.subtree_control , -and can create subhierarchy underneath the root directory. 
-.IP \[bu] -Attempts to migrate processes across the namespace boundary are denied -(with the error -.BR ENOENT ). -Processes inside the cgroup namespace can still -(subject to the containment rules described below) -move processes between cgroups -.I within -the subhierarchy under the namespace root. -.P -The ability to define cgroup namespaces as delegation boundaries -makes cgroup namespaces more useful. -To understand why, suppose that we already have one cgroup hierarchy -that has been delegated to a nonprivileged user, -.IR cecilia , -using the older delegation technique described above. -Suppose further that -.I cecilia -wanted to further delegate a subhierarchy -under the existing delegated hierarchy. -(For example, the delegated hierarchy might be associated with -an unprivileged container run by -.IR cecilia .) -Even if a cgroup namespace was employed, -because both hierarchies are owned by the unprivileged user -.IR cecilia , -the following illegitimate actions could be performed: -.IP \[bu] 3 -A process in the inferior hierarchy could change the -resource controller settings in the root directory of that hierarchy. -(These resource controller settings are intended to allow control to -be exercised from the -.I parent -cgroup; -a process inside the child cgroup should not be allowed to modify them.) -.IP \[bu] -A process inside the inferior hierarchy could move processes -into and out of the inferior hierarchy if the cgroups in the -superior hierarchy were somehow visible. -.P -Employing the -.I nsdelegate -mount option prevents both of these possibilities. -.P -The -.I nsdelegate -mount option only has an effect when performed in -the initial mount namespace; -in other mount namespaces, the option is silently ignored. -.P -.IR Note : -On some systems, -.BR systemd (1) -automatically mounts the cgroup v2 filesystem. 
-In order to experiment with the -.I nsdelegate -operation, it may be useful to boot the kernel with -the following command-line options: -.P -.in +4n -.EX -cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller -.EE -.in -.P -These options cause the kernel to boot with the cgroups v1 controllers -disabled (meaning that the controllers are available in the v2 hierarchy), -and tell -.BR systemd (1) -not to mount and use the cgroup v2 hierarchy, -so that the v2 hierarchy can be manually mounted -with the desired options after boot-up. -.\" -.SS Cgroup delegation containment rules -Some delegation -.I containment rules -ensure that the delegatee can move processes between cgroups within the -delegated subtree, -but can't move processes from outside the delegated subtree into -the subtree or vice versa. -A nonprivileged process (i.e., the delegatee) can write the PID of -a "target" process into a -.I cgroup.procs -file only if all of the following are true: -.IP \[bu] 3 -The writer has write permission on the -.I cgroup.procs -file in the destination cgroup. -.IP \[bu] -The writer has write permission on the -.I cgroup.procs -file in the nearest common ancestor of the source and destination cgroups. -Note that in some cases, -the nearest common ancestor may be the source or destination cgroup itself. -This requirement is not enforced for cgroups v1 hierarchies, -with the consequence that containment in v1 is less strict than in v2. -(For example, in cgroups v1 the user that owns two distinct -delegated subhierarchies can move a process between the hierarchies.) -.IP \[bu] -If the cgroup v2 filesystem was mounted with the -.I nsdelegate -option, the writer must be able to see the source and destination cgroups -from its cgroup namespace. -.IP \[bu] -In cgroups v1: -the effective UID of the writer (i.e., the delegatee) matches the -real user ID or the saved set-user-ID of the target process. 
-Before Linux 4.11, -.\" commit 576dd464505fc53d501bb94569db76f220104d28 -this requirement also applied in cgroups v2. -(This was a historical requirement inherited from cgroups v1 -that was later deemed unnecessary, -since the other rules suffice for containment in cgroups v2.) -.P -.IR Note : -one consequence of these delegation containment rules is that the -unprivileged delegatee can't place the first process into -the delegated subtree; -instead, the delegater must place the first process -(a process owned by the delegatee) into the delegated subtree. -.\" -.SH CGROUPS VERSION 2 THREAD MODE -Among the restrictions imposed by cgroups v2 that were not present -in cgroups v1 are the following: -.IP \[bu] 3 -.IR "No thread-granularity control" : -all of the threads of a process must be in the same cgroup. -.IP \[bu] -.IR "No internal processes" : -a cgroup can't both have member processes and -exercise controllers on child cgroups. -.P -Both of these restrictions were added because -the lack of these restrictions had caused problems -in cgroups v1. -In particular, the cgroups v1 ability to allow thread-level granularity -for cgroup membership made no sense for some controllers. -(A notable example was the -.I memory -controller: since threads share an address space, -it made no sense to split threads across different -.I memory -cgroups.) -.P -Notwithstanding the initial design decision in cgroups v2, -there were use cases for certain controllers, notably the -.I cpu -controller, -for which thread-level granularity of control was meaningful and useful. -To accommodate such use cases, Linux 4.14 added -.I "thread mode" -for cgroups v2. -.P -Thread mode allows the following: -.IP \[bu] 3 -The creation of -.I threaded subtrees -in which the threads of a process may -be spread across cgroups inside the tree. -(A threaded subtree may contain multiple multithreaded processes.) 
-.IP \[bu] -The concept of -.IR "threaded controllers" , -which can distribute resources across the cgroups in a threaded subtree. -.IP \[bu] -A relaxation of the "no internal processes rule", -so that, within a threaded subtree, -a cgroup can both contain member threads and -exercise resource control over child cgroups. -.P -With the addition of thread mode, -each nonroot cgroup now contains a new file, -.IR cgroup.type , -that exposes, and in some circumstances can be used to change, -the "type" of a cgroup. -This file contains one of the following type values: -.TP -.I domain -This is a normal v2 cgroup that provides process-granularity control. -If a process is a member of this cgroup, -then all threads of the process are (by definition) in the same cgroup. -This is the default cgroup type, -and provides the same behavior that was provided for -cgroups in the initial cgroups v2 implementation. -.TP -.I threaded -This cgroup is a member of a threaded subtree. -Threads can be added to this cgroup, -and controllers can be enabled for the cgroup. -.TP -.I domain threaded -This is a domain cgroup that serves as the root of a threaded subtree. -This cgroup type is also known as "threaded root". -.TP -.I domain invalid -This is a cgroup inside a threaded subtree -that is in an "invalid" state. -Processes can't be added to the cgroup, -and controllers can't be enabled for the cgroup. -The only thing that can be done with this cgroup (other than deleting it) -is to convert it to a -.I threaded -cgroup by writing the string -.I \[dq]threaded\[dq] -to the -.I cgroup.type -file. 
-.IP -The rationale for the existence of this "interim" type -during the creation of a threaded subtree -(rather than the kernel simply immediately converting all cgroups -under the threaded root to the type -.IR threaded ) -is to allow for -possible future extensions to the thread mode model. -.\" -.SS Threaded versus domain controllers -With the addition of thread mode, -cgroups v2 now distinguishes two types of resource controllers: -.IP \[bu] 3 -.I Threaded -.\" In the kernel source, look for ".threaded[ \t]*= true" in -.\" initializations of struct cgroup_subsys -controllers: these controllers support thread granularity for -resource control and can be enabled inside threaded subtrees, -with the result that the corresponding controller-interface files -appear inside the cgroups in the threaded subtree. -As at Linux 4.19, the following controllers are threaded: -.IR cpu , -.IR perf_event , -and -.IR pids . -.IP \[bu] -.I Domain -controllers: these controllers support only process granularity -for resource control. -From the perspective of a domain controller, -all threads of a process are always in the same cgroup. -Domain controllers can't be enabled inside a threaded subtree. -.\" -.SS Creating a threaded subtree -There are two pathways that lead to the creation of a threaded subtree. -The first pathway proceeds as follows: -.IP (1) 5 -We write the string -.I \[dq]threaded\[dq] -to the -.I cgroup.type -file of a cgroup -.I y/z -that currently has the type -.IR domain . -This has the following effects: -.RS -.IP \[bu] 3 -The type of the cgroup -.I y/z -becomes -.IR threaded . -.IP \[bu] -The type of the parent cgroup, -.IR y , -becomes -.IR "domain threaded" . -The parent cgroup is the root of a threaded subtree -(also known as the "threaded root"). 
-.IP \[bu] -All other cgroups under -.I y -that were not already of type -.I threaded -(because they were inside already existing threaded subtrees -under the new threaded root) -are converted to type -.IR "domain invalid" . -Any subsequently created cgroups under -.I y -will also have the type -.IR "domain invalid" . -.RE -.IP (2) -We write the string -.I \[dq]threaded\[dq] -to each of the -.I domain invalid -cgroups under -.IR y , -in order to convert them to the type -.IR threaded . -As a consequence of this step, all cgroups under the threaded root -now have the type -.I threaded -and the threaded subtree is now fully usable. -The requirement to write -.I \[dq]threaded\[dq] -to each of these cgroups is somewhat cumbersome, -but allows for possible future extensions to the thread-mode model. -.P -The second way of creating a threaded subtree is as follows: -.IP (1) 5 -In an existing cgroup, -.IR z , -that currently has the type -.IR domain , -we (1.1) enable one or more threaded controllers and -(1.2) make a process a member of -.IR z . -(These two steps can be done in either order.) -This has the following consequences: -.RS -.IP \[bu] 3 -The type of -.I z -becomes -.IR "domain threaded" . -.IP \[bu] -All of the descendant cgroups of -.I z -that were not already of type -.I threaded -are converted to type -.IR "domain invalid" . -.RE -.IP (2) -As before, we make the threaded subtree usable by writing the string -.I \[dq]threaded\[dq] -to each of the -.I domain invalid -cgroups under -.IR z , -in order to convert them to the type -.IR threaded . -.P -One of the consequences of the above pathways to creating a threaded subtree -is that the threaded root cgroup can be a parent only to -.I threaded -(and -.IR "domain invalid" ) -cgroups. -The threaded root cgroup can't be a parent of a -.I domain -cgroup, and a -.I threaded -cgroup -can't have a sibling that is a -.I domain -cgroup. 
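Step (2) of either pathway above can be sketched as follows. This is a minimal illustration, not the page's own method: the function name finish_threaded_subtree is hypothetical, and the sketch assumes that the cgroup v2 filesystem is mounted and that the caller may write to the cgroup.type files involved. Because os.walk visits a directory before its subdirectories, parents are converted before children, which matches the top-down conversion rule given elsewhere on this page.

```python
import os


def finish_threaded_subtree(threaded_root):
    """Write "threaded" to every "domain invalid" cgroup below threaded_root.

    Returns the list of cgroup directories that were converted.
    """
    converted = []
    # os.walk yields a directory before its subdirectories (top-down),
    # so a parent is converted before any of its children.
    for dirpath, dirnames, _ in os.walk(threaded_root):
        dirnames.sort()  # deterministic order among siblings
        if dirpath == threaded_root:
            continue  # the threaded root itself is not converted
        type_file = os.path.join(dirpath, "cgroup.type")
        with open(type_file) as f:
            current = f.read().strip()
        if current == "domain invalid":
            with open(type_file, "w") as f:
                f.write("threaded")
            converted.append(dirpath)
    return converted
```

Cgroups that already have the type threaded are skipped, since writing the string to them would have no effect.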
-.\" -.SS Using a threaded subtree -Within a threaded subtree, threaded controllers can be enabled -in each subgroup whose type has been changed to -.IR threaded ; -upon doing so, the corresponding controller interface files -appear in the children of that cgroup. -.P -A process can be moved into a threaded subtree by writing its PID to the -.I cgroup.procs -file in one of the cgroups inside the tree. -This has the effect of making all of the threads -in the process members of the corresponding cgroup -and makes the process a member of the threaded subtree. -The threads of the process can then be spread across -the threaded subtree by writing their thread IDs (see -.BR gettid (2)) -to the -.I cgroup.threads -files in different cgroups inside the subtree. -The threads of a process must all reside in the same threaded subtree. -.P -As with writing to -.IR cgroup.procs , -some containment rules apply when writing to the -.I cgroup.threads -file: -.IP \[bu] 3 -The writer must have write permission on the -.I cgroup.threads -file in the destination cgroup. -.IP \[bu] -The writer must have write permission on the -.I cgroup.procs -file in the common ancestor of the source and destination cgroups. -(In some cases, -the common ancestor may be the source or destination cgroup itself.) -.IP \[bu] -The source and destination cgroups must be in the same threaded subtree. -(Outside a threaded subtree, an attempt to move a thread by writing -its thread ID to the -.I cgroup.threads -file in a different -.I domain -cgroup fails with the error -.BR EOPNOTSUPP .) -.P -The -.I cgroup.threads -file is present in each cgroup (including -.I domain -cgroups) and can be read in order to discover the set of threads -that is present in the cgroup. -The set of thread IDs obtained when reading this file -is not guaranteed to be ordered or free of duplicates. -.P -The -.I cgroup.procs -file in the threaded root shows the PIDs of all processes -that are members of the threaded subtree. 
-The -.I cgroup.procs -files in the other cgroups in the subtree are not readable. -.P -Domain controllers can't be enabled in a threaded subtree; -no controller-interface files appear inside the cgroups underneath the -threaded root. -From the point of view of a domain controller, -threaded subtrees are invisible: -a multithreaded process inside a threaded subtree appears to a domain -controller as a process that resides in the threaded root cgroup. -.P -Within a threaded subtree, the "no internal processes" rule does not apply: -a cgroup can both contain member processes (or threads) -and exercise controllers on child cgroups. -.\" -.SS Rules for writing to cgroup.type and creating threaded subtrees -A number of rules apply when writing to the -.I cgroup.type -file: -.IP \[bu] 3 -Only the string -.I \[dq]threaded\[dq] -may be written. -In other words, the only explicit transition that is possible is to convert a -.I domain -cgroup to type -.IR threaded . -.IP \[bu] -The effect of writing -.I \[dq]threaded\[dq] -depends on the current value in -.IR cgroup.type , -as follows: -.RS -.IP \[bu] 3 -.I domain -or -.IR "domain threaded" : -start the creation of a threaded subtree -(whose root is the parent of this cgroup) via -the first of the pathways described above; -.IP \[bu] -.IR "domain\ invalid" : -convert this cgroup (which is inside a threaded subtree) to a usable (i.e., -.IR threaded ) -state; -.IP \[bu] -.IR threaded : -no effect (a "no-op"). -.RE -.IP \[bu] -We can't write to a -.I cgroup.type -file if the parent's type is -.IR "domain invalid" . -In other words, the cgroups of a threaded subtree must be converted to the -.I threaded -state in a top-down manner. -.P -There are also some constraints that must be satisfied -in order to create a threaded subtree rooted at the cgroup -.IR x : -.IP \[bu] 3 -There can be no member processes in the descendant cgroups of -.IR x . -(The cgroup -.I x -can itself have member processes.) 
-.IP \[bu] -No domain controllers may be enabled in -.IR x 's -.I cgroup.subtree_control -file. -.P -If any of the above constraints is violated, then an attempt to write -.I \[dq]threaded\[dq] -to a -.I cgroup.type -file fails with the error -.BR ENOTSUP . -.\" -.SS The \[dq]domain threaded\[dq] cgroup type -According to the pathways described above, -the type of a cgroup can change to -.I domain threaded -in either of the following cases: -.IP \[bu] 3 -The string -.I \[dq]threaded\[dq] -is written to a child cgroup. -.IP \[bu] -A threaded controller is enabled inside the cgroup and -a process is made a member of the cgroup. -.P -A -.I domain threaded -cgroup, -.IR x , -can revert to the type -.I domain -if the above conditions no longer hold true\[em]that is, if all -.I threaded -child cgroups of -.I x -are removed and either -.I x -no longer has threaded controllers enabled or -no longer has member processes. -.P -When a -.I domain threaded -cgroup -.I x -reverts to the type -.IR domain : -.IP \[bu] 3 -All -.I domain invalid -descendants of -.I x -that are not in lower-level threaded subtrees revert to the type -.IR domain . -.IP \[bu] -The root cgroups in any lower-level threaded subtrees revert to the type -.IR "domain threaded" . -.\" -.SS Exceptions for the root cgroup -The root cgroup of the v2 hierarchy is treated exceptionally: -it can be the parent of both -.I domain -and -.I threaded -cgroups. -If the string -.I \[dq]threaded\[dq] -is written to the -.I cgroup.type -file of one of the children of the root cgroup, then -.IP \[bu] 3 -The type of that cgroup becomes -.IR threaded . -.IP \[bu] -The type of any descendants of that cgroup that -are not part of lower-level threaded subtrees changes to -.IR "domain invalid" . -.P -Note that in this case, there is no cgroup whose type becomes -.IR "domain threaded" . -(Notionally, the root cgroup can be considered as the threaded root -for the cgroup whose type was changed to -.IR threaded .) 
-.P -The aim of this exceptional treatment for the root cgroup is to -allow a threaded cgroup that employs the -.I cpu -controller to be placed as high as possible in the hierarchy, -so as to minimize the (small) cost of traversing the cgroup hierarchy. -.\" -.SS The cgroups v2 \[dq]cpu\[dq] controller and realtime threads -As at Linux 4.19, the cgroups v2 -.I cpu -controller does not support control of realtime threads -(specifically threads scheduled under any of the policies -.BR SCHED_FIFO , -.BR SCHED_RR , -and -.BR SCHED_DEADLINE ; -see -.BR sched (7)). -Therefore, the -.I cpu -controller can be enabled in the root cgroup only -if all realtime threads are in the root cgroup. -(If there are realtime threads in nonroot cgroups, then a -.BR write (2) -of the string -.I \[dq]+cpu\[dq] -to the -.I cgroup.subtree_control -file fails with the error -.BR EINVAL .) -.P -On some systems, -.BR systemd (1) -places certain realtime threads in nonroot cgroups in the v2 hierarchy. -On such systems, -these threads must first be moved to the root cgroup before the -.I cpu -controller can be enabled. -.\" -.SH ERRORS -The following errors can occur for -.BR mount (2): -.TP -.B EBUSY -An attempt to mount a cgroup version 1 filesystem specified neither the -.I name= -option (to mount a named hierarchy) nor a controller name (or -.IR all ). -.SH NOTES -A child process created via -.BR fork (2) -inherits its parent's cgroup memberships. -A process's cgroup memberships are preserved across -.BR execve (2). -.P -The -.BR clone3 (2) -.B CLONE_INTO_CGROUP -flag can be used to create a child process that begins its life in -a different version 2 cgroup from the parent process. -.\" -.SS /proc files -.TP -.IR /proc/cgroups " (since Linux 2.6.24)" -This file contains information about the controllers -that are compiled into the kernel. 
-An example of the contents of this file (reformatted for readability) -is the following: -.IP -.in +4n -.EX -#subsys_name hierarchy num_cgroups enabled -cpuset 4 1 1 -cpu 8 1 1 -cpuacct 8 1 1 -blkio 6 1 1 -memory 3 1 1 -devices 10 84 1 -freezer 7 1 1 -net_cls 9 1 1 -perf_event 5 1 1 -net_prio 9 1 1 -hugetlb 0 1 0 -pids 2 1 1 -.EE -.in -.IP -The fields in this file are, from left to right: -.RS -.IP [1] 5 -The name of the controller. -.IP [2] -The unique ID of the cgroup hierarchy on which this controller is mounted. -If multiple cgroups v1 controllers are bound to the same hierarchy, -then each will show the same hierarchy ID in this field. -The value in this field will be 0 if: -.RS -.IP \[bu] 3 -the controller is not mounted on a cgroups v1 hierarchy; -.IP \[bu] -the controller is bound to the cgroups v2 single unified hierarchy; or -.IP \[bu] -the controller is disabled (see below). -.RE -.IP [3] -The number of control groups in this hierarchy using this controller. -.IP [4] -This field contains the value 1 if this controller is enabled, -or 0 if it has been disabled (via the -.I cgroup_disable -kernel command-line boot parameter). -.RE -.TP -.IR /proc/ pid /cgroup " (since Linux 2.6.24)" -This file describes control groups to which the process -with the corresponding PID belongs. -The displayed information differs for -cgroups version 1 and version 2 hierarchies. -.IP -For each cgroup hierarchy of which the process is a member, -there is one entry containing three colon-separated fields: -.IP -.in +4n -.EX -hierarchy\-ID:controller\-list:cgroup\-path -.EE -.in -.IP -For example: -.IP -.in +4n -.EX -5:cpuacct,cpu,cpuset:/daemons -.EE -.in -.IP -The colon-separated fields are, from left to right: -.RS -.IP [1] 5 -For cgroups version 1 hierarchies, -this field contains a unique hierarchy ID number -that can be matched to a hierarchy ID in -.IR /proc/cgroups . -For the cgroups version 2 hierarchy, this field contains the value 0. 
-.IP [2] -For cgroups version 1 hierarchies, -this field contains a comma-separated list of the controllers -bound to the hierarchy. -For the cgroups version 2 hierarchy, this field is empty. -.IP [3] -This field contains the pathname of the control group in the hierarchy -to which the process belongs. -This pathname is relative to the mount point of the hierarchy. -.RE -.\" -.SS /sys/kernel/cgroup files -.TP -.IR /sys/kernel/cgroup/delegate " (since Linux 4.15)" -.\" commit 01ee6cfb1483fe57c9cbd8e73817dfbf9bacffd3 -This file exports a list of the cgroups v2 files -(one per line) that are delegatable -(i.e., whose ownership should be changed to the user ID of the delegatee). -In the future, the set of delegatable files may change or grow, -and this file provides a way for the kernel to inform -user-space applications of which files must be delegated. -As at Linux 4.15, one sees the following when inspecting this file: -.IP -.in +4n -.EX -$ \fBcat /sys/kernel/cgroup/delegate\fP -cgroup.procs -cgroup.subtree_control -cgroup.threads -.EE -.in -.TP -.IR /sys/kernel/cgroup/features " (since Linux 4.15)" -.\" commit 5f2e673405b742be64e7c3604ed4ed3ac14f35ce -Over time, the set of cgroups v2 features that are provided by the -kernel may change or grow, -or some features may not be enabled by default. -This file provides a way for user-space applications to discover what -features the running kernel supports and has enabled. -Features are listed one per line: -.IP -.in +4n -.EX -$ \fBcat /sys/kernel/cgroup/features\fP -nsdelegate -memory_localevents -.EE -.in -.IP -The entries that can appear in this file are: -.RS -.TP -.IR memory_localevents " (since Linux 5.2)" -The kernel supports the -.I memory_localevents -mount option. -.TP -.IR nsdelegate " (since Linux 4.15)" -The kernel supports the -.I nsdelegate -mount option. 
-.TP -.IR memory_recursiveprot " (since Linux 5.7)" -.\" commit 8a931f801340c2be10552c7b5622d5f4852f3a36 -The kernel supports the -.I memory_recursiveprot -mount option. -.RE -.SH SEE ALSO -.BR prlimit (1), -.BR systemd (1), -.BR systemd\-cgls (1), -.BR systemd\-cgtop (1), -.BR clone (2), -.BR ioprio_set (2), -.BR perf_event_open (2), -.BR setrlimit (2), -.BR cgroup_namespaces (7), -.BR cpuset (7), -.BR namespaces (7), -.BR sched (7), -.BR user_namespaces (7) -.P -The kernel source file -.IR Documentation/admin\-guide/cgroup\-v2.rst .