Diffstat (limited to 'man7/user_namespaces.7')
-rw-r--r-- | man7/user_namespaces.7 | 124 |
1 files changed, 62 insertions, 62 deletions
diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7 index 0c29f93..94e0785 100644 --- a/man7/user_namespaces.7 +++ b/man7/user_namespaces.7 @@ -4,13 +4,13 @@ .\" SPDX-License-Identifier: Linux-man-pages-copyleft .\" .\" -.TH user_namespaces 7 2023-05-03 "Linux man-pages 6.05.01" +.TH user_namespaces 7 2024-02-25 "Linux man-pages 6.7" .SH NAME user_namespaces \- overview of Linux user namespaces .SH DESCRIPTION For an overview of namespaces, see .BR namespaces (7). -.PP +.P User namespaces isolate security-related identifiers and attributes, in particular, user IDs and group IDs (see @@ -46,7 +46,7 @@ or with the .B CLONE_NEWUSER flag. -.PP +.P The kernel imposes (since Linux 3.11) a limit of 32 nested levels of .\" commit 8742f229b635bf1c1c84a3dfe5e47c814c20b5c8 user namespaces. @@ -57,7 +57,7 @@ or .BR clone (2) that would cause this limit to be exceeded fail with the error .BR EUSERS . -.PP +.P Each process is a member of exactly one user namespace. A process created via .BR fork (2) @@ -72,7 +72,7 @@ if it has the .B CAP_SYS_ADMIN in that namespace; upon doing so, it gains a full set of capabilities in that namespace. -.PP +.P A call to .BR clone (2) or @@ -84,14 +84,14 @@ flag makes the new child process (for or the caller (for .BR unshare (2)) a member of the new user namespace created by the call. -.PP +.P The .B NS_GET_PARENT .BR ioctl (2) operation can be used to discover the parental relationship between user namespaces; see .BR ioctl_ns (2). -.PP +.P A task that changes one of its effective IDs will have its dumpability reset to the value in .IR /proc/sys/fs/suid_dumpable . @@ -134,7 +134,7 @@ and user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace). 
-.PP +.P Note that a call to .BR execve (2) will cause a process's capabilities to be recalculated in the usual way (see @@ -144,7 +144,7 @@ unless the process has a user ID of 0 within the namespace, or the executable file has a nonempty inheritable capabilities mask, the process will lose all capabilities. See the discussion of user and group ID mappings, below. -.PP +.P A call to .BR clone (2) or @@ -172,7 +172,7 @@ retaining its user namespace membership by using a pair of .BR setns (2) calls to move to another user namespace and then return to its original user namespace. -.PP +.P The rules for determining whether or not a process has a capability in a particular user namespace are as follows: .IP \[bu] 3 @@ -230,7 +230,7 @@ In other words, having a capability in a user namespace permits a process to perform privileged operations on resources that are governed by (nonuser) namespaces owned by (associated with) the user namespace (see the next subsection). -.PP +.P On the other hand, there are many privileged operations that affect resources that are not associated with any namespace type, for example, changing the system (i.e., calendar) time (governed by @@ -242,14 +242,14 @@ and creating a device (governed by Only a process with privileges in the .I initial user namespace can perform such operations. 
-.PP +.P Holding .B CAP_SYS_ADMIN within the user namespace that owns a process's mount namespace allows that process to create bind mounts and mount the following types of filesystems: .\" fs_flags = FS_USERNS_MOUNT in kernel sources -.PP +.P .RS 4 .PD 0 .IP \[bu] 3 @@ -281,7 +281,7 @@ and mount the following types of filesystems: (since Linux 5.11) .PD .RE -.PP +.P Holding .B CAP_SYS_ADMIN within the user namespace that owns a process's cgroup namespace @@ -289,9 +289,9 @@ allows (since Linux 4.6) that process to the mount the cgroup version 2 filesystem and cgroup version 1 named hierarchies (i.e., cgroup filesystems mounted with the -.I """none,name=""" +.I \[dq]none,name=\[dq] option). -.PP +.P Holding .B CAP_SYS_ADMIN within the user namespace that owns a process's PID namespace @@ -299,7 +299,7 @@ allows (since Linux 3.8) that process to mount .I /proc filesystems. -.PP +.P Note, however, that mounting block-based filesystems can be done only by a process that holds .B CAP_SYS_ADMIN @@ -312,14 +312,14 @@ Starting in Linux 3.8, unprivileged processes can create user namespaces, and the other types of namespaces can be created with just the .B CAP_SYS_ADMIN capability in the caller's user namespace. -.PP +.P When a nonuser namespace is created, it is owned by the user namespace in which the creating process was a member at the time of the creation of the namespace. Privileged operations on resources governed by the nonuser namespace require that the process has the necessary capabilities in the user namespace that owns the nonuser namespace. -.PP +.P If .B CLONE_NEWUSER is specified along with other @@ -336,7 +336,7 @@ or caller privileges over the remaining namespaces created by the call. Thus, it is possible for an unprivileged caller to specify this combination of flags. 
-.PP +.P When a new namespace (other than a user namespace) is created via .BR clone (2) or @@ -358,7 +358,7 @@ the process's UTS namespace, and check whether the process has the required capability .RB ( CAP_SYS_ADMIN ) in that user namespace. -.PP +.P The .B NS_GET_USERNS .BR ioctl (2) @@ -383,13 +383,13 @@ inside the user namespace for the process .IR pid . These files can be read to view the mappings in a user namespace and written to (once) to define the mappings. -.PP +.P The description in the following paragraphs explains the details for .IR uid_map ; .I gid_map is exactly the same, but each instance of "user ID" is replaced by "group ID". -.PP +.P The .I uid_map file exposes the mapping of user IDs from the user namespace @@ -403,7 +403,7 @@ will potentially see different values when reading from a particular .I uid_map file, depending on the user ID mappings for the user namespaces of the reading processes. -.PP +.P Each line in the .I uid_map file specifies a 1-to-1 mapping of a range of contiguous @@ -448,14 +448,14 @@ that created this user namespace. .IP (3) The length of the range of user IDs that is mapped between the two user namespaces. -.PP +.P System calls that return user IDs (group IDs)\[em]for example, .BR getuid (2), .BR getgid (2), and the credential fields in the structure returned by .BR stat (2)\[em]return the user ID (group ID) mapped into the caller's user namespace. -.PP +.P When a process accesses a file, its user and group IDs are mapped into the initial user namespace for the purpose of permission checking and assigning IDs when creating a file. @@ -463,7 +463,7 @@ When a process retrieves file user and group IDs via .BR stat (2), the IDs are mapped in the opposite direction, to produce values relative to the process user and group ID mappings. -.PP +.P The initial user namespace has no parent namespace, but, for consistency, the kernel provides dummy user and group ID mapping files for this namespace. 
@@ -472,14 +472,14 @@ Looking at the file .RI ( gid_map is the same) from a shell in the initial namespace shows: -.PP +.P .in +4n .EX $ \fBcat /proc/$$/uid_map\fP 0 0 4294967295 .EE .in -.PP +.P This mapping tells us that the range starting at user ID 0 in this namespace maps to a range starting at 0 in the (nonexistent) parent namespace, @@ -512,7 +512,7 @@ file in a user namespace fails with the error Similar rules apply for .I gid_map files. -.PP +.P The lines written to .I uid_map .RI ( gid_map ) @@ -552,10 +552,10 @@ Linux 3.9 and later fix this limitation, allowing any valid set of nonoverlapping maps. .IP \[bu] At least one line must be written to the file. -.PP +.P Writes that violate the above rules fail with the error .BR EINVAL . -.PP +.P In order for a process to write to the .IR /proc/ pid /uid_map .RI ( /proc/ pid /gid_map ) @@ -661,7 +661,7 @@ file (see below) before writing to .IR gid_map . .RE .RE -.PP +.P Writes that violate the above rules fail with the error .BR EPERM . .\" @@ -674,13 +674,13 @@ it is possible to create project ID mappings for a user namespace. .BR setquota (8) and .BR quotactl (2).) -.PP +.P Project ID mappings are defined by writing to the .IR /proc/ pid /projid_map file (present since .\" commit f76d207a66c3a53defea67e7d36c3eb1b7d6d61d Linux 3.7). -.PP +.P The validity rules for writing to the .IR /proc/ pid /projid_map file are as for writing to the @@ -689,7 +689,7 @@ file; violation of these rules causes .BR write (2) to fail with the error .BR EINVAL . -.PP +.P The permission rules for writing to the .IR /proc/ pid /projid_map file are as follows: @@ -701,7 +701,7 @@ or be in the parent user namespace of the process .IP \[bu] The mapped project IDs must in turn have a mapping in the parent user namespace. 
-.PP +.P Violation of these rules causes .BR write (2) to fail with the error @@ -722,7 +722,7 @@ and .I gid_map files have been written, only the mapped values may be used in system calls that change user and group IDs. -.PP +.P For user IDs, the relevant system calls include .BR setuid (2), .BR setfsuid (2), @@ -736,7 +736,7 @@ For group IDs, the relevant system calls include .BR setresgid (2), and .BR setgroups (2). -.PP +.P Writing .RI \[dq] deny \[dq] to the @@ -784,7 +784,7 @@ file (and regardless of the process's capabilities), calls to are also not permitted if .IR /proc/ pid /gid_map has not yet been set. -.PP +.P A privileged process (one with the .B CAP_SYS_ADMIN capability in the namespace) may write either of the strings @@ -800,7 +800,7 @@ Writing the string .RI \[dq] deny \[dq] prevents any process in the user namespace from employing .BR setgroups (2). -.PP +.P The essence of the restrictions described in the preceding paragraph is that it is permitted to write to .IR /proc/ pid /setgroups @@ -819,10 +819,10 @@ a process can transition only from being disallowed to .BR setgroups (2) being allowed. -.PP +.P The default value of this file in the initial user namespace is .RI \[dq] allow \[dq]. -.PP +.P Once .IR /proc/ pid /gid_map has been written to @@ -837,11 +837,11 @@ to .IR /proc/ pid /setgroups (the write fails with the error .BR EPERM ). -.PP +.P A child user namespace inherits the .IR /proc/ pid /setgroups setting from its parent. -.PP +.P If the .I setgroups file has the value @@ -855,7 +855,7 @@ to the file) in this user namespace. .BR EPERM .) This restriction also propagates down to all child user namespaces of this user namespace. -.PP +.P The .IR /proc/ pid /setgroups file was added in Linux 3.19, @@ -913,7 +913,7 @@ and .I /proc/sys/kernel/overflowgid in .BR proc (5). 
-.PP +.P The cases where unmapped IDs are mapped in this fashion include system calls that return user IDs .RB ( getuid (2), @@ -941,7 +941,7 @@ credentials written to the process accounting file (see .BR acct (5)), and credentials returned with POSIX message queue notifications (see .BR mq_notify (3)). -.PP +.P There is one notable case where unmapped user and group IDs are .I not .\" from_kuid(), from_kgid() @@ -978,7 +978,7 @@ These capabilities are: .BR CAP_FOWNER , and .BR CAP_FSETID . -.PP +.P Within a user namespace, these capabilities allow a process to bypass the rules if the process has the relevant capability over the file, @@ -988,7 +988,7 @@ the process has the relevant effective capability in its user namespace; and .IP \[bu] the file's user ID and group ID both have valid mappings in the user namespace. -.PP +.P The .B CAP_FOWNER capability is treated somewhat exceptionally: @@ -1057,7 +1057,7 @@ User namespaces require support in a range of subsystems across the kernel. When an unsupported subsystem is configured into the kernel, it is not possible to configure user namespaces support. -.PP +.P As at Linux 3.8, most relevant subsystems supported user namespaces, but a number of filesystems did not have the infrastructure needed to map user and group IDs between user namespaces. @@ -1077,9 +1077,9 @@ The comments and .IR usage () function inside the program provide a full explanation of the program. The following shell session demonstrates its use. 
-.PP +.P First, we look at the run-time environment: -.PP +.P .in +4n .EX $ \fBuname \-rs\fP # Need Linux 3.8 or later @@ -1090,7 +1090,7 @@ $ \fBid \-g\fP 1000 .EE .in -.PP +.P Now start a new shell in new user .RI ( \-U ), mount @@ -1102,29 +1102,29 @@ namespaces, with user ID and group ID .RI ( \-G ) 1000 mapped to 0 inside the user namespace: -.PP +.P .in +4n .EX $ \fB./userns_child_exec \-p \-m \-U \-M \[aq]0 1000 1\[aq] \-G \[aq]0 1000 1\[aq] bash\fP .EE .in -.PP +.P The shell has PID 1, because it is the first process in the new PID namespace: -.PP +.P .in +4n .EX bash$ \fBecho $$\fP 1 .EE .in -.PP +.P Mounting a new .I /proc filesystem and listing all of the processes visible in the new PID namespace shows that the shell can't see any processes outside the PID namespace: -.PP +.P .in +4n .EX bash$ \fBmount \-t proc proc /proc\fP @@ -1134,10 +1134,10 @@ bash$ \fBps ax\fP 22 pts/3 R+ 0:00 ps ax .EE .in -.PP +.P Inside the user namespace, the shell has user and group ID 0, and a full set of permitted and effective capabilities: -.PP +.P .in +4n .EX bash$ \fBcat /proc/$$/status | egrep \[aq]\[ha][UG]id\[aq]\fP @@ -1464,6 +1464,6 @@ main(int argc, char *argv[]) .BR credentials (7), .BR namespaces (7), .BR pid_namespaces (7) -.PP +.P The kernel source file .IR Documentation/admin\-guide/namespaces/resource\-control.rst . |
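Reviewer note: most hunks in this diff replace a standalone .PP paragraph macro with .P. The sketch below shows one way such a mechanical substitution could be reproduced and checked. It is an illustration only, not the tool used to produce this commit; the function name replace_pp_with_p and the sample text are hypothetical. It does not cover the other changes in the diff (the .TH date and version line, and the \[dq] quoting change).

```python
import re

# Matches a line consisting solely of the .PP request (optional trailing blanks).
# Other requests that merely start with ".P", such as ".PD 0", are left alone.
_PP_LINE = re.compile(r"^\.PP[ \t]*$", re.MULTILINE)


def replace_pp_with_p(text: str) -> str:
    """Return text with every standalone .PP line rewritten as .P (hypothetical helper)."""
    return _PP_LINE.sub(".P", text)


# Small made-up sample in the style of the man page source.
sample = ".BR namespaces (7).\n.PP\nUser namespaces isolate\n.PD 0\n"
converted = replace_pp_with_p(sample)
print(converted)
```

Anchoring the pattern to a whole line keeps the substitution from touching .PD or any text that happens to contain the characters ".PP" mid-line.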