Diffstat (limited to 'man7/user_namespaces.7')
-rw-r--r-- man7/user_namespaces.7 124
1 files changed, 62 insertions, 62 deletions
diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7
index 0c29f93..94e0785 100644
--- a/man7/user_namespaces.7
+++ b/man7/user_namespaces.7
@@ -4,13 +4,13 @@
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.\"
-.TH user_namespaces 7 2023-05-03 "Linux man-pages 6.05.01"
+.TH user_namespaces 7 2024-02-25 "Linux man-pages 6.7"
.SH NAME
user_namespaces \- overview of Linux user namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
-.PP
+.P
User namespaces isolate security-related identifiers and attributes,
in particular,
user IDs and group IDs (see
@@ -46,7 +46,7 @@ or
with the
.B CLONE_NEWUSER
flag.
-.PP
+.P
The kernel imposes (since Linux 3.11) a limit of 32 nested levels of
.\" commit 8742f229b635bf1c1c84a3dfe5e47c814c20b5c8
user namespaces.
@@ -57,7 +57,7 @@ or
.BR clone (2)
that would cause this limit to be exceeded fail with the error
.BR EUSERS .
-.PP
+.P
Each process is a member of exactly one user namespace.
A process created via
.BR fork (2)
@@ -72,7 +72,7 @@ if it has the
.B CAP_SYS_ADMIN
in that namespace;
upon doing so, it gains a full set of capabilities in that namespace.
-.PP
+.P
A call to
.BR clone (2)
or
@@ -84,14 +84,14 @@ flag makes the new child process (for
or the caller (for
.BR unshare (2))
a member of the new user namespace created by the call.
-.PP
+.P
The
.B NS_GET_PARENT
.BR ioctl (2)
operation can be used to discover the parental relationship
between user namespaces; see
.BR ioctl_ns (2).
-.PP
+.P
A task that changes one of its effective IDs
will have its dumpability reset to the value in
.IR /proc/sys/fs/suid_dumpable .
@@ -134,7 +134,7 @@ and
user namespace,
even if the new namespace is created or joined by the root user
(i.e., a process with user ID 0 in the root namespace).
-.PP
+.P
Note that a call to
.BR execve (2)
will cause a process's capabilities to be recalculated in the usual way (see
@@ -144,7 +144,7 @@ unless the process has a user ID of 0 within the namespace,
or the executable file has a nonempty inheritable capabilities mask,
the process will lose all capabilities.
See the discussion of user and group ID mappings, below.
-.PP
+.P
A call to
.BR clone (2)
or
@@ -172,7 +172,7 @@ retaining its user namespace membership by using a pair of
.BR setns (2)
calls to move to another user namespace and then return to
its original user namespace.
-.PP
+.P
The rules for determining whether or not a process has a capability
in a particular user namespace are as follows:
.IP \[bu] 3
@@ -230,7 +230,7 @@ In other words, having a capability in a user namespace permits a process
to perform privileged operations on resources that are governed by (nonuser)
namespaces owned by (associated with) the user namespace
(see the next subsection).
-.PP
+.P
On the other hand, there are many privileged operations that affect
resources that are not associated with any namespace type,
for example, changing the system (i.e., calendar) time (governed by
@@ -242,14 +242,14 @@ and creating a device (governed by
Only a process with privileges in the
.I initial
user namespace can perform such operations.
-.PP
+.P
Holding
.B CAP_SYS_ADMIN
within the user namespace that owns a process's mount namespace
allows that process to create bind mounts
and mount the following types of filesystems:
.\" fs_flags = FS_USERNS_MOUNT in kernel sources
-.PP
+.P
.RS 4
.PD 0
.IP \[bu] 3
@@ -281,7 +281,7 @@ and mount the following types of filesystems:
(since Linux 5.11)
.PD
.RE
-.PP
+.P
Holding
.B CAP_SYS_ADMIN
within the user namespace that owns a process's cgroup namespace
@@ -289,9 +289,9 @@ allows (since Linux 4.6)
that process to the mount the cgroup version 2 filesystem and
cgroup version 1 named hierarchies
(i.e., cgroup filesystems mounted with the
-.I """none,name="""
+.I \[dq]none,name=\[dq]
option).
-.PP
+.P
Holding
.B CAP_SYS_ADMIN
within the user namespace that owns a process's PID namespace
@@ -299,7 +299,7 @@ allows (since Linux 3.8)
that process to mount
.I /proc
filesystems.
-.PP
+.P
Note, however, that mounting block-based filesystems can be done
only by a process that holds
.B CAP_SYS_ADMIN
@@ -312,14 +312,14 @@ Starting in Linux 3.8, unprivileged processes can create user namespaces,
and the other types of namespaces can be created with just the
.B CAP_SYS_ADMIN
capability in the caller's user namespace.
-.PP
+.P
When a nonuser namespace is created,
it is owned by the user namespace in which the creating process
was a member at the time of the creation of the namespace.
Privileged operations on resources governed by the nonuser namespace
require that the process has the necessary capabilities
in the user namespace that owns the nonuser namespace.
-.PP
+.P
If
.B CLONE_NEWUSER
is specified along with other
@@ -336,7 +336,7 @@ or caller
privileges over the remaining namespaces created by the call.
Thus, it is possible for an unprivileged caller to specify this combination
of flags.
-.PP
+.P
When a new namespace (other than a user namespace) is created via
.BR clone (2)
or
@@ -358,7 +358,7 @@ the process's UTS namespace, and check whether the process has the
required capability
.RB ( CAP_SYS_ADMIN )
in that user namespace.
-.PP
+.P
The
.B NS_GET_USERNS
.BR ioctl (2)
@@ -383,13 +383,13 @@ inside the user namespace for the process
.IR pid .
These files can be read to view the mappings in a user namespace and
written to (once) to define the mappings.
-.PP
+.P
The description in the following paragraphs explains the details for
.IR uid_map ;
.I gid_map
is exactly the same,
but each instance of "user ID" is replaced by "group ID".
-.PP
+.P
The
.I uid_map
file exposes the mapping of user IDs from the user namespace
@@ -403,7 +403,7 @@ will potentially see different values when reading from a particular
.I uid_map
file, depending on the user ID mappings for the user namespaces
of the reading processes.
-.PP
+.P
Each line in the
.I uid_map
file specifies a 1-to-1 mapping of a range of contiguous
@@ -448,14 +448,14 @@ that created this user namespace.
.IP (3)
The length of the range of user IDs that is mapped between the two
user namespaces.
-.PP
+.P
System calls that return user IDs (group IDs)\[em]for example,
.BR getuid (2),
.BR getgid (2),
and the credential fields in the structure returned by
.BR stat (2)\[em]return
the user ID (group ID) mapped into the caller's user namespace.
-.PP
+.P
When a process accesses a file, its user and group IDs
are mapped into the initial user namespace for the purpose of permission
checking and assigning IDs when creating a file.
@@ -463,7 +463,7 @@ When a process retrieves file user and group IDs via
.BR stat (2),
the IDs are mapped in the opposite direction,
to produce values relative to the process user and group ID mappings.
-.PP
+.P
The initial user namespace has no parent namespace,
but, for consistency, the kernel provides dummy user and group
ID mapping files for this namespace.
@@ -472,14 +472,14 @@ Looking at the
file
.RI ( gid_map
is the same) from a shell in the initial namespace shows:
-.PP
+.P
.in +4n
.EX
$ \fBcat /proc/$$/uid_map\fP
0 0 4294967295
.EE
.in
-.PP
+.P
This mapping tells us
that the range starting at user ID 0 in this namespace
maps to a range starting at 0 in the (nonexistent) parent namespace,
@@ -512,7 +512,7 @@ file in a user namespace fails with the error
Similar rules apply for
.I gid_map
files.
-.PP
+.P
The lines written to
.I uid_map
.RI ( gid_map )
@@ -552,10 +552,10 @@ Linux 3.9 and later
fix this limitation, allowing any valid set of nonoverlapping maps.
.IP \[bu]
At least one line must be written to the file.
-.PP
+.P
Writes that violate the above rules fail with the error
.BR EINVAL .
-.PP
+.P
In order for a process to write to the
.IR /proc/ pid /uid_map
.RI ( /proc/ pid /gid_map )
@@ -661,7 +661,7 @@ file (see below) before writing to
.IR gid_map .
.RE
.RE
-.PP
+.P
Writes that violate the above rules fail with the error
.BR EPERM .
.\"
@@ -674,13 +674,13 @@ it is possible to create project ID mappings for a user namespace.
.BR setquota (8)
and
.BR quotactl (2).)
-.PP
+.P
Project ID mappings are defined by writing to the
.IR /proc/ pid /projid_map
file (present since
.\" commit f76d207a66c3a53defea67e7d36c3eb1b7d6d61d
Linux 3.7).
-.PP
+.P
The validity rules for writing to the
.IR /proc/ pid /projid_map
file are as for writing to the
@@ -689,7 +689,7 @@ file; violation of these rules causes
.BR write (2)
to fail with the error
.BR EINVAL .
-.PP
+.P
The permission rules for writing to the
.IR /proc/ pid /projid_map
file are as follows:
@@ -701,7 +701,7 @@ or be in the parent user namespace of the process
.IP \[bu]
The mapped project IDs must in turn have a mapping
in the parent user namespace.
-.PP
+.P
Violation of these rules causes
.BR write (2)
to fail with the error
@@ -722,7 +722,7 @@ and
.I gid_map
files have been written, only the mapped values may be used in
system calls that change user and group IDs.
-.PP
+.P
For user IDs, the relevant system calls include
.BR setuid (2),
.BR setfsuid (2),
@@ -736,7 +736,7 @@ For group IDs, the relevant system calls include
.BR setresgid (2),
and
.BR setgroups (2).
-.PP
+.P
Writing
.RI \[dq] deny \[dq]
to the
@@ -784,7 +784,7 @@ file (and regardless of the process's capabilities), calls to
are also not permitted if
.IR /proc/ pid /gid_map
has not yet been set.
-.PP
+.P
A privileged process (one with the
.B CAP_SYS_ADMIN
capability in the namespace) may write either of the strings
@@ -800,7 +800,7 @@ Writing the string
.RI \[dq] deny \[dq]
prevents any process in the user namespace from employing
.BR setgroups (2).
-.PP
+.P
The essence of the restrictions described in the preceding
paragraph is that it is permitted to write to
.IR /proc/ pid /setgroups
@@ -819,10 +819,10 @@ a process can transition only from
being disallowed to
.BR setgroups (2)
being allowed.
-.PP
+.P
The default value of this file in the initial user namespace is
.RI \[dq] allow \[dq].
-.PP
+.P
Once
.IR /proc/ pid /gid_map
has been written to
@@ -837,11 +837,11 @@ to
.IR /proc/ pid /setgroups
(the write fails with the error
.BR EPERM ).
-.PP
+.P
A child user namespace inherits the
.IR /proc/ pid /setgroups
setting from its parent.
-.PP
+.P
If the
.I setgroups
file has the value
@@ -855,7 +855,7 @@ to the file) in this user namespace.
.BR EPERM .)
This restriction also propagates down to all child user namespaces of
this user namespace.
-.PP
+.P
The
.IR /proc/ pid /setgroups
file was added in Linux 3.19,
@@ -913,7 +913,7 @@ and
.I /proc/sys/kernel/overflowgid
in
.BR proc (5).
-.PP
+.P
The cases where unmapped IDs are mapped in this fashion include
system calls that return user IDs
.RB ( getuid (2),
@@ -941,7 +941,7 @@ credentials written to the process accounting file (see
.BR acct (5)),
and credentials returned with POSIX message queue notifications (see
.BR mq_notify (3)).
-.PP
+.P
There is one notable case where unmapped user and group IDs are
.I not
.\" from_kuid(), from_kgid()
@@ -978,7 +978,7 @@ These capabilities are:
.BR CAP_FOWNER ,
and
.BR CAP_FSETID .
-.PP
+.P
Within a user namespace,
these capabilities allow a process to bypass the rules
if the process has the relevant capability over the file,
@@ -988,7 +988,7 @@ the process has the relevant effective capability in its user namespace; and
.IP \[bu]
the file's user ID and group ID both have valid mappings
in the user namespace.
-.PP
+.P
The
.B CAP_FOWNER
capability is treated somewhat exceptionally:
@@ -1057,7 +1057,7 @@ User namespaces require support in a range of subsystems across
the kernel.
When an unsupported subsystem is configured into the kernel,
it is not possible to configure user namespaces support.
-.PP
+.P
As at Linux 3.8, most relevant subsystems supported user namespaces,
but a number of filesystems did not have the infrastructure needed
to map user and group IDs between user namespaces.
@@ -1077,9 +1077,9 @@ The comments and
.IR usage ()
function inside the program provide a full explanation of the program.
The following shell session demonstrates its use.
-.PP
+.P
First, we look at the run-time environment:
-.PP
+.P
.in +4n
.EX
$ \fBuname \-rs\fP # Need Linux 3.8 or later
@@ -1090,7 +1090,7 @@ $ \fBid \-g\fP
1000
.EE
.in
-.PP
+.P
Now start a new shell in new user
.RI ( \-U ),
mount
@@ -1102,29 +1102,29 @@ namespaces, with user ID
and group ID
.RI ( \-G )
1000 mapped to 0 inside the user namespace:
-.PP
+.P
.in +4n
.EX
$ \fB./userns_child_exec \-p \-m \-U \-M \[aq]0 1000 1\[aq] \-G \[aq]0 1000 1\[aq] bash\fP
.EE
.in
-.PP
+.P
The shell has PID 1, because it is the first process in the new
PID namespace:
-.PP
+.P
.in +4n
.EX
bash$ \fBecho $$\fP
1
.EE
.in
-.PP
+.P
Mounting a new
.I /proc
filesystem and listing all of the processes visible
in the new PID namespace shows that the shell can't see
any processes outside the PID namespace:
-.PP
+.P
.in +4n
.EX
bash$ \fBmount \-t proc proc /proc\fP
@@ -1134,10 +1134,10 @@ bash$ \fBps ax\fP
22 pts/3 R+ 0:00 ps ax
.EE
.in
-.PP
+.P
Inside the user namespace, the shell has user and group ID 0,
and a full set of permitted and effective capabilities:
-.PP
+.P
.in +4n
.EX
bash$ \fBcat /proc/$$/status | egrep \[aq]\[ha][UG]id\[aq]\fP
@@ -1464,6 +1464,6 @@ main(int argc, char *argv[])
.BR credentials (7),
.BR namespaces (7),
.BR pid_namespaces (7)
-.PP
+.P
The kernel source file
.IR Documentation/admin\-guide/namespaces/resource\-control.rst .
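
The page touched by this patch describes each uid_map (gid_map) line as three fields: the start of the ID range inside the namespace, the start of the range it maps to outside, and the length of the range. The following is a minimal sketch of that translation, not code taken from the man page. The function names (parse_id_map, map_id_to_parent, map_id_from_parent) are hypothetical, and the sketch assumes the simple case where the second field is an ID in the parent user namespace.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples.

    Hypothetical helper: assumes whitespace-separated decimal fields,
    one mapping per line, as in the 'cat /proc/$$/uid_map' output.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
        inside, outside, length = (int(f) for f in fields)
        entries.append((inside, outside, length))
    return entries


def map_id_to_parent(entries, inside_id):
    """Translate an ID inside the namespace to the outside ID, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


def map_id_from_parent(entries, outside_id):
    """Translate an outside ID to the ID inside the namespace, or None if unmapped."""
    for inside, outside, length in entries:
        if outside <= outside_id < outside + length:
            return inside + (outside_id - outside)
    return None


if __name__ == "__main__":
    # Mapping used in the shell session: ID 1000 outside appears as 0 inside.
    child = parse_id_map("0 1000 1\n")
    print(map_id_to_parent(child, 0))       # 1000
    print(map_id_from_parent(child, 1000))  # 0
    print(map_id_to_parent(child, 1))       # None (no mapping for ID 1)
```

On a Linux system the same helper could be fed the contents of /proc/self/uid_map. An ID with no mapping returns None here; the kernel's handling of unmapped IDs (the overflow IDs) is outside the scope of this sketch.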