diff options
Diffstat (limited to 'man7/mount_namespaces.7')
-rw-r--r-- | man7/mount_namespaces.7 | 1371 |
1 files changed, 1371 insertions, 0 deletions
diff --git a/man7/mount_namespaces.7 b/man7/mount_namespaces.7 new file mode 100644 index 0000000..0ce2fee --- /dev/null +++ b/man7/mount_namespaces.7 @@ -0,0 +1,1371 @@ +'\" t +.\" Copyright (c) 2016, 2019, 2021 by Michael Kerrisk <mtk.manpages@gmail.com> +.\" +.\" SPDX-License-Identifier: Linux-man-pages-copyleft +.\" +.\" +.TH mount_namespaces 7 2023-05-03 "Linux man-pages 6.05.01" +.SH NAME +mount_namespaces \- overview of Linux mount namespaces +.SH DESCRIPTION +For an overview of namespaces, see +.BR namespaces (7). +.PP +Mount namespaces provide isolation of the list of mounts seen +by the processes in each namespace instance. +Thus, the processes in each of the mount namespace instances +will see distinct single-directory hierarchies. +.PP +The views provided by the +.IR /proc/ pid /mounts , +.IR /proc/ pid /mountinfo , +and +.IR /proc/ pid /mountstats +files (all described in +.BR proc (5)) +correspond to the mount namespace in which the process with the PID +.I pid +resides. +(All of the processes that reside in the same mount namespace +will see the same view in these files.) +.PP +A new mount namespace is created using either +.BR clone (2) +or +.BR unshare (2) +with the +.B CLONE_NEWNS +flag. +When a new mount namespace is created, +its mount list is initialized as follows: +.IP \[bu] 3 +If the namespace is created using +.BR clone (2), +the mount list of the child's namespace is a copy +of the mount list in the parent process's mount namespace. +.IP \[bu] +If the namespace is created using +.BR unshare (2), +the mount list of the new namespace is a copy of +the mount list in the caller's previous mount namespace. +.PP +Subsequent modifications to the mount list +.RB ( mount (2) +and +.BR umount (2)) +in either mount namespace will not (by default) affect the +mount list seen in the other namespace +(but see the following discussion of shared subtrees). +.\" +.SH SHARED SUBTREES +After the implementation of mount namespaces was completed, +experience showed that the isolation that they provided was, +in some cases, too great. +For example, in order to make a newly loaded optical disk +available in all mount namespaces, +a mount operation was required in each namespace. +For this use case, and others, +the shared subtree feature was introduced in Linux 2.6.15. +This feature allows for automatic, controlled propagation of +.BR mount (2) +and +.BR umount (2) +.I events +between namespaces +(or, more precisely, between the mounts that are members of a +.I peer group +that are propagating events to one another). +.PP +Each mount is marked (via +.BR mount (2)) +as having one of the following +.IR "propagation types" : +.TP +.B MS_SHARED +This mount shares events with members of a peer group. +.BR mount (2) +and +.BR umount (2) +events immediately under this mount will propagate +to the other mounts that are members of the peer group. +.I Propagation +here means that the same +.BR mount (2) +or +.BR umount (2) +will automatically occur +under all of the other mounts in the peer group. +Conversely, +.BR mount (2) +and +.BR umount (2) +events that take place under +peer mounts will propagate to this mount. +.TP +.B MS_PRIVATE +This mount is private; it does not have a peer group. +.BR mount (2) +and +.BR umount (2) +events do not propagate into or out of this mount. +.TP +.B MS_SLAVE +.BR mount (2) +and +.BR umount (2) +events propagate into this mount from +a (master) shared peer group. +.BR mount (2) +and +.BR umount (2) +events under this mount do not propagate to any peer. +.IP +Note that a mount can be the slave of another peer group +while at the same time sharing +.BR mount (2) +and +.BR umount (2) +events +with a peer group of which it is a member. +(More precisely, one peer group can be the slave of another peer group.) +.TP +.B MS_UNBINDABLE +This is like a private mount, +and in addition this mount can't be bind mounted. +Attempts to bind mount this mount +.RB ( mount (2) +with the +.B MS_BIND +flag) will fail. +.IP +When a recursive bind mount +.RB ( mount (2) +with the +.B MS_BIND +and +.B MS_REC +flags) is performed on a directory subtree, +any bind mounts within the subtree are automatically pruned +(i.e., not replicated) +when replicating that subtree to produce the target subtree. +.PP +For a discussion of the propagation type assigned to a new mount, +see NOTES. +.PP +The propagation type is a per-mount-point setting; +some mounts may be marked as shared +(with each shared mount being a member of a distinct peer group), +while others are private +(or slaved or unbindable). +.PP +Note that a mount's propagation type determines whether +.BR mount (2) +and +.BR umount (2) +of mounts +.I immediately under +the mount are propagated. +Thus, the propagation type does not affect propagation of events for +grandchildren and further removed descendant mounts. +What happens if the mount itself is unmounted is determined by +the propagation type that is in effect for the +.I parent +of the mount. +.PP +Members are added to a +.I peer group +when a mount is marked as shared and either: +.IP (a) 5 +the mount is replicated during the creation of a new mount namespace; or +.IP (b) +a new bind mount is created from the mount. +.PP +In both of these cases, the new mount joins the peer group +of which the existing mount is a member. +.PP +A new peer group is also created when a child mount is created under +an existing mount that is marked as shared. +In this case, the new child mount is also marked as shared and +the resulting peer group consists of all the mounts +that are replicated under the peers of parent mounts. +.PP +A mount ceases to be a member of a peer group when either +the mount is explicitly unmounted, +or when the mount is implicitly unmounted because a mount namespace is removed +(because it has no more member processes). +.PP +The propagation type of the mounts in a mount namespace +can be discovered via the "optional fields" exposed in +.IR /proc/ pid /mountinfo . +(See +.BR proc (5) +for details of this file.) +The following tags can appear in the optional fields +for a record in that file: +.TP +.I shared:X +This mount is shared in peer group +.IR X . +Each peer group has a unique ID that is automatically +generated by the kernel, +and all mounts in the same peer group will show the same ID. +(These IDs are assigned starting from the value 1, +and may be recycled when a peer group ceases to have any members.) +.TP +.I master:X +This mount is a slave to shared peer group +.IR X . +.TP +.IR propagate_from:X " (since Linux 2.6.26)" +.\" commit 97e7e0f71d6d948c25f11f0a33878d9356d9579e +This mount is a slave and receives propagation from shared peer group +.IR X . +This tag will always appear in conjunction with a +.I master:X +tag. +Here, +.I X +is the closest dominant peer group under the process's root directory. +If +.I X +is the immediate master of the mount, +or if there is no dominant peer group under the same root, +then only the +.I master:X +field is present and not the +.I propagate_from:X +field. +For further details, see below. +.TP +.I unbindable +This is an unbindable mount. +.PP +If none of the above tags is present, then this is a private mount. +.SS MS_SHARED and MS_PRIVATE example +Suppose that on a terminal in the initial mount namespace, +we mark one mount as shared and another as private, +and then view the mounts in +.IR /proc/self/mountinfo : +.PP +.in +4n +.EX +sh1# \fBmount \-\-make\-shared /mntS\fP +sh1# \fBmount \-\-make\-private /mntP\fP +sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +77 61 8:17 / /mntS rw,relatime shared:1 +83 61 8:15 / /mntP rw,relatime +.EE +.in +.PP +From the +.I /proc/self/mountinfo +output, we see that +.I /mntS +is a shared mount in peer group 1, and that +.I /mntP +has no optional tags, indicating that it is a private mount. +The first two fields in each record in this file are the unique +ID for this mount, and the mount ID of the parent mount. +We can further inspect this file to see that the parent mount of +.I /mntS +and +.I /mntP +is the root directory, +.IR / , +which is mounted as private: +.PP +.in +4n +.EX +sh1# \fBcat /proc/self/mountinfo | awk \[aq]$1 == 61\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +61 0 8:2 / / rw,relatime +.EE +.in +.PP +On a second terminal, +we create a new mount namespace where we run a second shell +and inspect the mounts: +.PP +.in +4n +.EX +$ \fBPS1=\[aq]sh2# \[aq] sudo unshare \-m \-\-propagation unchanged sh\fP +sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +222 145 8:17 / /mntS rw,relatime shared:1 +225 145 8:15 / /mntP rw,relatime +.EE +.in +.PP +The new mount namespace received a copy of the initial mount namespace's +mounts. +These new mounts maintain the same propagation types, +but have unique mount IDs. +(The +.I \-\-propagation\~unchanged +option prevents +.BR unshare (1) +from marking all mounts as private when creating a new mount namespace, +.\" Since util-linux 2.27 +which it does by default.) +.PP +In the second terminal, we then create submounts under each of +.I /mntS +and +.I /mntP +and inspect the set-up: +.PP +.in +4n +.EX +sh2# \fBmkdir /mntS/a\fP +sh2# \fBmount /dev/sdb6 /mntS/a\fP +sh2# \fBmkdir /mntP/b\fP +sh2# \fBmount /dev/sdb7 /mntP/b\fP +sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +222 145 8:17 / /mntS rw,relatime shared:1 +225 145 8:15 / /mntP rw,relatime +178 222 8:22 / /mntS/a rw,relatime shared:2 +230 225 8:23 / /mntP/b rw,relatime +.EE +.in +.PP +From the above, it can be seen that +.I /mntS/a +was created as shared (inheriting this setting from its parent mount) and +.I /mntP/b +was created as a private mount. +.PP +Returning to the first terminal and inspecting the set-up, +we see that the new mount created under the shared mount +.I /mntS +propagated to its peer mount (in the initial mount namespace), +but the new mount created under the private mount +.I /mntP +did not propagate: +.PP +.in +4n +.EX +sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +77 61 8:17 / /mntS rw,relatime shared:1 +83 61 8:15 / /mntP rw,relatime +179 77 8:22 / /mntS/a rw,relatime shared:2 +.EE +.in +.\" +.SS MS_SLAVE example +Making a mount a slave allows it to receive propagated +.BR mount (2) +and +.BR umount (2) +events from a master shared peer group, +while preventing it from propagating events to that master. +This is useful if we want to (say) receive a mount event when +an optical disk is mounted in the master shared peer group +(in another mount namespace), +but want to prevent +.BR mount (2) +and +.BR umount (2) +events under the slave mount +from having side effects in other namespaces. +.PP +We can demonstrate the effect of slaving by first marking +two mounts as shared in the initial mount namespace: +.PP +.in +4n +.EX +sh1# \fBmount \-\-make\-shared /mntX\fP +sh1# \fBmount \-\-make\-shared /mntY\fP +sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +132 83 8:23 / /mntX rw,relatime shared:1 +133 83 8:22 / /mntY rw,relatime shared:2 +.EE +.in +.PP +On a second terminal, +we create a new mount namespace and inspect the mounts: +.PP +.in +4n +.EX +sh2# \fBunshare \-m \-\-propagation unchanged sh\fP +sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +168 167 8:23 / /mntX rw,relatime shared:1 +169 167 8:22 / /mntY rw,relatime shared:2 +.EE +.in +.PP +In the new mount namespace, we then mark one of the mounts as a slave: +.PP +.in +4n +.EX +sh2# \fBmount \-\-make\-slave /mntY\fP +sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +168 167 8:23 / /mntX rw,relatime shared:1 +169 167 8:22 / /mntY rw,relatime master:2 +.EE +.in +.PP +From the above output, we see that +.I /mntY +is now a slave mount that is receiving propagation events from +the shared peer group with the ID 2. +.PP +Continuing in the new namespace, we create submounts under each of +.I /mntX +and +.IR /mntY : +.PP +.in +4n +.EX +sh2# \fBmkdir /mntX/a\fP +sh2# \fBmount /dev/sda3 /mntX/a\fP +sh2# \fBmkdir /mntY/b\fP +sh2# \fBmount /dev/sda5 /mntY/b\fP +.EE +.in +.PP +When we inspect the state of the mounts in the new mount namespace, +we see that +.I /mntX/a +was created as a new shared mount +(inheriting the "shared" setting from its parent mount) and +.I /mntY/b +was created as a private mount: +.PP +.in +4n +.EX +sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +168 167 8:23 / /mntX rw,relatime shared:1 +169 167 8:22 / /mntY rw,relatime master:2 +173 168 8:3 / /mntX/a rw,relatime shared:3 +175 169 8:5 / /mntY/b rw,relatime +.EE +.in +.PP +Returning to the first terminal (in the initial mount namespace), +we see that the mount +.I /mntX/a +propagated to the peer (the shared +.IR /mntX ), +but the mount +.I /mntY/b +was not propagated: +.PP +.in +4n +.EX +sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +132 83 8:23 / /mntX rw,relatime shared:1 +133 83 8:22 / /mntY rw,relatime shared:2 +174 132 8:3 / /mntX/a rw,relatime shared:3 +.EE +.in +.PP +Now we create a new mount under +.I /mntY +in the first shell: +.PP +.in +4n +.EX +sh1# \fBmkdir /mntY/c\fP +sh1# \fBmount /dev/sda1 /mntY/c\fP +sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +132 83 8:23 / /mntX rw,relatime shared:1 +133 83 8:22 / /mntY rw,relatime shared:2 +174 132 8:3 / /mntX/a rw,relatime shared:3 +178 133 8:1 / /mntY/c rw,relatime shared:4 +.EE +.in +.PP +When we examine the mounts in the second mount namespace, +we see that in this case the new mount has been propagated +to the slave mount, +and that the new mount is itself a slave mount (to peer group 4): +.PP +.in +4n +.EX +sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +168 167 8:23 / /mntX rw,relatime shared:1 +169 167 8:22 / /mntY rw,relatime master:2 +173 168 8:3 / /mntX/a rw,relatime shared:3 +175 169 8:5 / /mntY/b rw,relatime +179 169 8:1 / /mntY/c rw,relatime master:4 +.EE +.in +.\" +.SS MS_UNBINDABLE example +One of the primary purposes of unbindable mounts is to avoid +the "mount explosion" problem when repeatedly performing bind mounts +of a higher-level subtree at a lower-level mount. +The problem is illustrated by the following shell session. +.PP +Suppose we have a system with the following mounts: +.PP +.in +4n +.EX +# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP +/dev/sda1 on / +/dev/sdb6 on /mntX +/dev/sdb7 on /mntY +.EE +.in +.PP +Suppose furthermore that we wish to recursively bind mount +the root directory under several users' home directories. +We do this for the first user, and inspect the mounts: +.PP +.in +4n +.EX +# \fBmount \-\-rbind / /home/cecilia/\fP +# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP +/dev/sda1 on / +/dev/sdb6 on /mntX +/dev/sdb7 on /mntY +/dev/sda1 on /home/cecilia +/dev/sdb6 on /home/cecilia/mntX +/dev/sdb7 on /home/cecilia/mntY +.EE +.in +.PP +When we repeat this operation for the second user, +we start to see the explosion problem: +.PP +.in +4n +.EX +# \fBmount \-\-rbind / /home/henry\fP +# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP +/dev/sda1 on / +/dev/sdb6 on /mntX +/dev/sdb7 on /mntY +/dev/sda1 on /home/cecilia +/dev/sdb6 on /home/cecilia/mntX +/dev/sdb7 on /home/cecilia/mntY +/dev/sda1 on /home/henry +/dev/sdb6 on /home/henry/mntX +/dev/sdb7 on /home/henry/mntY +/dev/sda1 on /home/henry/home/cecilia +/dev/sdb6 on /home/henry/home/cecilia/mntX +/dev/sdb7 on /home/henry/home/cecilia/mntY +.EE +.in +.PP +Under +.IR /home/henry , +we have not only recursively added the +.I /mntX +and +.I /mntY +mounts, but also the recursive mounts of those directories under +.I /home/cecilia +that were created in the previous step. +Upon repeating the step for a third user, +it becomes obvious that the explosion is exponential in nature: +.PP +.in +4n +.EX +# \fBmount \-\-rbind / /home/otto\fP +# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP +/dev/sda1 on / +/dev/sdb6 on /mntX +/dev/sdb7 on /mntY +/dev/sda1 on /home/cecilia +/dev/sdb6 on /home/cecilia/mntX +/dev/sdb7 on /home/cecilia/mntY +/dev/sda1 on /home/henry +/dev/sdb6 on /home/henry/mntX +/dev/sdb7 on /home/henry/mntY +/dev/sda1 on /home/henry/home/cecilia +/dev/sdb6 on /home/henry/home/cecilia/mntX +/dev/sdb7 on /home/henry/home/cecilia/mntY +/dev/sda1 on /home/otto +/dev/sdb6 on /home/otto/mntX +/dev/sdb7 on /home/otto/mntY +/dev/sda1 on /home/otto/home/cecilia +/dev/sdb6 on /home/otto/home/cecilia/mntX +/dev/sdb7 on /home/otto/home/cecilia/mntY +/dev/sda1 on /home/otto/home/henry +/dev/sdb6 on /home/otto/home/henry/mntX +/dev/sdb7 on /home/otto/home/henry/mntY +/dev/sda1 on /home/otto/home/henry/home/cecilia +/dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX +/dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY +.EE +.in +.PP +The mount explosion problem in the above scenario can be avoided +by making each of the new mounts unbindable. +The effect of doing this is that recursive mounts of the root +directory will not replicate the unbindable mounts. +We make such a mount for the first user: +.PP +.in +4n +.EX +# \fBmount \-\-rbind \-\-make\-unbindable / /home/cecilia\fP +.EE +.in +.PP +Before going further, we show that unbindable mounts are indeed unbindable: +.PP +.in +4n +.EX +# \fBmkdir /mntZ\fP +# \fBmount \-\-bind /home/cecilia /mntZ\fP +mount: wrong fs type, bad option, bad superblock on /home/cecilia, + missing codepage or helper program, or other error +\& + In some cases useful info is found in syslog \- try + dmesg | tail or so. +.EE +.in +.PP +Now we create unbindable recursive bind mounts for the other two users: +.PP +.in +4n +.EX +# \fBmount \-\-rbind \-\-make\-unbindable / /home/henry\fP +# \fBmount \-\-rbind \-\-make\-unbindable / /home/otto\fP +.EE +.in +.PP +Upon examining the list of mounts, +we see there has been no explosion of mounts, +because the unbindable mounts were not replicated +under each user's directory: +.PP +.in +4n +.EX +# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP +/dev/sda1 on / +/dev/sdb6 on /mntX +/dev/sdb7 on /mntY +/dev/sda1 on /home/cecilia +/dev/sdb6 on /home/cecilia/mntX +/dev/sdb7 on /home/cecilia/mntY +/dev/sda1 on /home/henry +/dev/sdb6 on /home/henry/mntX +/dev/sdb7 on /home/henry/mntY +/dev/sda1 on /home/otto +/dev/sdb6 on /home/otto/mntX +/dev/sdb7 on /home/otto/mntY +.EE +.in +.\" +.SS Propagation type transitions +The following table shows the effect that applying a new propagation type +(i.e., +.IR mount\~\-\-make\-xxxx ) +has on the existing propagation type of a mount. +The rows correspond to existing propagation types, +and the columns are the new propagation settings. +For reasons of space, "private" is abbreviated as "priv" and +"unbindable" as "unbind". +.TS +lb2 lb2 lb2 lb2 lb1 +lb | l l l l l. + make-shared make-slave make-priv make-unbind +_ +shared shared slave/priv [1] priv unbind +slave slave+shared slave [2] priv unbind +slave+shared slave+shared slave priv unbind +private shared priv [2] priv unbind +unbindable shared unbind [2] priv unbind +.TE +.sp 1 +Note the following details to the table: +.IP [1] 4 +If a shared mount is the only mount in its peer group, +making it a slave automatically makes it private. +.IP [2] +Slaving a nonshared mount has no effect on the mount. +.\" +.SS Bind (MS_BIND) semantics +Suppose that the following command is performed: +.PP +.in +4n +.EX +mount \-\-bind A/a B/b +.EE +.in +.PP +Here, +.I A +is the source mount, +.I B +is the destination mount, +.I a +is a subdirectory path under the mount point +.IR A , +and +.I b +is a subdirectory path under the mount point +.IR B . +The propagation type of the resulting mount, +.IR B/b , +depends on the propagation types of the mounts +.I A +and +.IR B , +and is summarized in the following table. +.PP +.TS +lb2 lb1 lb2 lb2 lb2 lb0 +lb2 lb1 lb2 lb2 lb2 lb0 +lb lb | l l l l l. + source(A) + shared private slave unbind +_ +dest(B) shared shared shared slave+shared invalid + nonshared shared private slave invalid +.TE +.sp 1 +Note that a recursive bind of a subtree follows the same semantics +as for a bind operation on each mount in the subtree. +(Unbindable mounts are automatically pruned at the target mount point.) +.PP +For further details, see +.I Documentation/filesystems/sharedsubtree.rst +in the kernel source tree. +.\" +.SS Move (MS_MOVE) semantics +Suppose that the following command is performed: +.PP +.in +4n +.EX +mount \-\-move A B/b +.EE +.in +.PP +Here, +.I A +is the source mount, +.I B +is the destination mount, and +.I b +is a subdirectory path under the mount point +.IR B . +The propagation type of the resulting mount, +.IR B/b , +depends on the propagation types of the mounts +.I A +and +.IR B , +and is summarized in the following table. +.PP +.TS +lb2 lb1 lb2 lb2 lb2 lb0 +lb2 lb1 lb2 lb2 lb2 lb0 +lb lb | l l l l l. + source(A) + shared private slave unbind +_ +dest(B) shared shared shared slave+shared invalid + nonshared shared private slave unbindable +.TE +.sp 1 +Note: moving a mount that resides under a shared mount is invalid. +.PP +For further details, see +.I Documentation/filesystems/sharedsubtree.rst +in the kernel source tree. +.\" +.SS Mount semantics +Suppose that we use the following command to create a mount: +.PP +.in +4n +.EX +mount device B/b +.EE +.in +.PP +Here, +.I B +is the destination mount, and +.I b +is a subdirectory path under the mount point +.IR B . +The propagation type of the resulting mount, +.IR B/b , +follows the same rules as for a bind mount, +where the propagation type of the source mount +is considered always to be private. +.\" +.SS Unmount semantics +Suppose that we use the following command to tear down a mount: +.PP +.in +4n +.EX +umount A +.EE +.in +.PP +Here, +.I A +is a mount on +.IR B/b , +where +.I B +is the parent mount and +.I b +is a subdirectory path under the mount point +.IR B . +If +.B B +is shared, then all most-recently-mounted mounts at +.I b +on mounts that receive propagation from mount +.I B +and do not have submounts under them are unmounted. +.\" +.SS The /proc/ pid /mountinfo "propagate_from" tag +The +.I propagate_from:X +tag is shown in the optional fields of a +.IR /proc/ pid /mountinfo +record in cases where a process can't see a slave's immediate master +(i.e., the pathname of the master is not reachable from +the filesystem root directory) +and so cannot determine the +chain of propagation between the mounts it can see. +.PP +In the following example, we first create a two-link master-slave chain +between the mounts +.IR /mnt , +.IR /tmp/etc , +and +.IR /mnt/tmp/etc . +Then the +.BR chroot (1) +command is used to make the +.I /tmp/etc +mount point unreachable from the root directory, +creating a situation where the master of +.I /mnt/tmp/etc +is not reachable from the (new) root directory of the process. +.PP +First, we bind mount the root directory onto +.I /mnt +and then bind mount +.I /proc +at +.I /mnt/proc +so that after the later +.BR chroot (1) +the +.BR proc (5) +filesystem remains visible at the correct location +in the chroot-ed environment. +.PP +.in +4n +.EX +# \fBmkdir \-p /mnt/proc\fP +# \fBmount \-\-bind / /mnt\fP +# \fBmount \-\-bind /proc /mnt/proc\fP +.EE +.in +.PP +Next, we ensure that the +.I /mnt +mount is a shared mount in a new peer group (with no peers): +.PP +.in +4n +.EX +# \fBmount \-\-make\-private /mnt\fP # Isolate from any previous peer group +# \fBmount \-\-make\-shared /mnt\fP +# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +239 61 8:2 / /mnt ... shared:102 +248 239 0:4 / /mnt/proc ... shared:5 +.EE +.in +.PP +Next, we bind mount +.I /mnt/etc +onto +.IR /tmp/etc : +.PP +.in +4n +.EX +# \fBmkdir \-p /tmp/etc\fP +# \fBmount \-\-bind /mnt/etc /tmp/etc\fP +# \fBcat /proc/self/mountinfo | egrep \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +239 61 8:2 / /mnt ... shared:102 +248 239 0:4 / /mnt/proc ... shared:5 +267 40 8:2 /etc /tmp/etc ... shared:102 +.EE +.in +.PP +Initially, these two mounts are in the same peer group, +but we then make the +.I /tmp/etc +a slave of +.IR /mnt/etc , +and then make +.I /tmp/etc +shared as well, +so that it can propagate events to the next slave in the chain: +.PP +.in +4n +.EX +# \fBmount \-\-make\-slave /tmp/etc\fP +# \fBmount \-\-make\-shared /tmp/etc\fP +# \fBcat /proc/self/mountinfo | egrep \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +239 61 8:2 / /mnt ... shared:102 +248 239 0:4 / /mnt/proc ... shared:5 +267 40 8:2 /etc /tmp/etc ... shared:105 master:102 +.EE +.in +.PP +Then we bind mount +.I /tmp/etc +onto +.IR /mnt/tmp/etc . +Again, the two mounts are initially in the same peer group, +but we then make +.I /mnt/tmp/etc +a slave of +.IR /tmp/etc : +.PP +.in +4n +.EX +# \fBmkdir \-p /mnt/tmp/etc\fP +# \fBmount \-\-bind /tmp/etc /mnt/tmp/etc\fP +# \fBmount \-\-make\-slave /mnt/tmp/etc\fP +# \fBcat /proc/self/mountinfo | egrep \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP +239 61 8:2 / /mnt ... shared:102 +248 239 0:4 / /mnt/proc ... shared:5 +267 40 8:2 /etc /tmp/etc ... shared:105 master:102 +273 239 8:2 /etc /mnt/tmp/etc ... master:105 +.EE +.in +.PP +From the above, we see that +.I /mnt +is the master of the slave +.IR /tmp/etc , +which in turn is the master of the slave +.IR /mnt/tmp/etc . +.PP +We then +.BR chroot (1) +to the +.I /mnt +directory, which renders the mount with ID 267 unreachable +from the (new) root directory: +.PP +.in +4n +.EX +# \fBchroot /mnt\fP +.EE +.in +.PP +When we examine the state of the mounts inside the chroot-ed environment, +we see the following: +.PP +.in +4n +.EX +# \fBcat /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP +239 61 8:2 / / ... shared:102 +248 239 0:4 / /proc ... shared:5 +273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102 +.EE +.in +.PP +Above, we see that the mount with ID 273 +is a slave whose master is the peer group 105. +The mount point for that master is unreachable, and so a +.I propagate_from +tag is displayed, indicating that the closest dominant peer group +(i.e., the nearest reachable mount in the slave chain) +is the peer group with the ID 102 (corresponding to the +.I /mnt +mount point before the +.BR chroot (1) +was performed). +.\" +.SH STANDARDS +Linux. +.SH HISTORY +Linux 2.4.19. +.\" +.SH NOTES +The propagation type assigned to a new mount depends +on the propagation type of the parent mount. +If the mount has a parent (i.e., it is a non-root mount +point) and the propagation type of the parent is +.BR MS_SHARED , +then the propagation type of the new mount is also +.BR MS_SHARED . +Otherwise, the propagation type of the new mount is +.BR MS_PRIVATE . +.PP +Notwithstanding the fact that the default propagation type +for new mount is in many cases +.BR MS_PRIVATE , +.B MS_SHARED +is typically more useful. +For this reason, +.BR systemd (1) +automatically remounts all mounts as +.B MS_SHARED +on system startup. +Thus, on most modern systems, the default propagation type is in practice +.BR MS_SHARED . +.PP +Since, when one uses +.BR unshare (1) +to create a mount namespace, +the goal is commonly to provide full isolation of the mounts +in the new namespace, +.BR unshare (1) +(since +.I util\-linux +2.27) in turn reverses the step performed by +.BR systemd (1), +by making all mounts private in the new namespace. +That is, +.BR unshare (1) +performs the equivalent of the following in the new mount namespace: +.PP +.in +4n +.EX +mount \-\-make\-rprivate / +.EE +.in +.PP +To prevent this, one can use the +.I \-\-propagation\~unchanged +option to +.BR unshare (1). +.PP +An application that creates a new mount namespace directly using +.BR clone (2) +or +.BR unshare (2) +may desire to prevent propagation of mount events to other mount namespaces +(as is done by +.BR unshare (1)). +This can be done by changing the propagation type of +mounts in the new namespace to either +.B MS_SLAVE +or +.BR MS_PRIVATE , +using a call such as the following: +.PP +.in +4n +.EX +mount(NULL, "/", MS_SLAVE | MS_REC, NULL); +.EE +.in +.PP +For a discussion of propagation types when moving mounts +.RB ( MS_MOVE ) +and creating bind mounts +.RB ( MS_BIND ), +see +.IR Documentation/filesystems/sharedsubtree.rst . +.\" +.\" ============================================================ +.\" +.SS Restrictions on mount namespaces +Note the following points with respect to mount namespaces: +.IP [1] 4 +Each mount namespace has an owner user namespace. +As explained above, when a new mount namespace is created, +its mount list is initialized as a copy of the mount list +of another mount namespace. +If the new namespace and the namespace from which the mount list +was copied are owned by different user namespaces, +then the new mount namespace is considered +.IR "less privileged" . +.IP [2] +When creating a less privileged mount namespace, +shared mounts are reduced to slave mounts. +This ensures that mappings performed in less +privileged mount namespaces will not propagate to more privileged +mount namespaces. +.IP [3] +Mounts that come as a single unit from a more privileged mount namespace are +locked together and may not be separated in a less privileged mount +namespace. +(The +.BR unshare (2) +.B CLONE_NEWNS +operation brings across all of the mounts from the original +mount namespace as a single unit, +and recursive mounts that propagate between +mount namespaces propagate as a single unit.) +.IP +In this context, "may not be separated" means that the mounts +are locked so that they may not be individually unmounted. +Consider the following example: +.IP +.in +4n +.EX +$ \fBsudo sh\fP +# \fBmount \-\-bind /dev/null /etc/shadow\fP +# \fBcat /etc/shadow\fP # Produces no output +.EE +.in +.IP +The above steps, performed in a more privileged mount namespace, +have created a bind mount that +obscures the contents of the shadow password file, +.IR /etc/shadow . +For security reasons, it should not be possible to +.BR umount (2) +that mount in a less privileged mount namespace, +since that would reveal the contents of +.IR /etc/shadow . +.IP +Suppose we now create a new mount namespace +owned by a new user namespace. +The new mount namespace will inherit copies of all of the mounts +from the previous mount namespace. +However, those mounts will be locked because the new mount namespace +is less privileged. +Consequently, an attempt to +.BR umount (2) +the mount fails as show +in the following step: +.IP +.in +4n +.EX +# \fBunshare \-\-user \-\-map\-root\-user \-\-mount \e\fP + \fBstrace \-o /tmp/log \e\fP + \fBumount /mnt/dir\fP +umount: /etc/shadow: not mounted. +# \fBgrep \[aq]\[ha]umount\[aq] /tmp/log\fP +umount2("/etc/shadow", 0) = \-1 EINVAL (Invalid argument) +.EE +.in +.IP +The error message from +.BR mount (8) +is a little confusing, but the +.BR strace (1) +output reveals that the underlying +.BR umount2 (2) +system call failed with the error +.BR EINVAL , +which is the error that the kernel returns to indicate that +the mount is locked. +.IP +Note, however, that it is possible to stack (and unstack) a +mount on top of one of the inherited locked mounts in a +less privileged mount namespace: +.IP +.in +4n +.EX +# \fBecho \[aq]aaaaa\[aq] > /tmp/a\fP # File to mount onto /etc/shadow +# \fBunshare \-\-user \-\-map\-root\-user \-\-mount \e\fP + \fBsh \-c \[aq]mount \-\-bind /tmp/a /etc/shadow; cat /etc/shadow\[aq]\fP +aaaaa +# \fBumount /etc/shadow\fP +.EE +.in +.IP +The final +.BR umount (8) +command above, which is performed in the initial mount namespace, +makes the original +.I /etc/shadow +file once more visible in that namespace. +.IP [4] +Following on from point [3], +note that it is possible to +.BR umount (2) +an entire subtree of mounts that +propagated as a unit into a less privileged mount namespace, +as illustrated in the following example. +.IP +First, we create new user and mount namespaces using +.BR unshare (1). +In the new mount namespace, +the propagation type of all mounts is set to private. +We then create a shared bind mount at +.IR /mnt , +and a small hierarchy of mounts underneath that mount. +.IP +.in +4n +.EX +$ \fBPS1=\[aq]ns1# \[aq] sudo unshare \-\-user \-\-map\-root\-user \e\fP + \fB\-\-mount \-\-propagation private bash\fP +ns1# \fBecho $$\fP # We need the PID of this shell later +778501 +ns1# \fBmount \-\-make\-shared \-\-bind /mnt /mnt\fP +ns1# \fBmkdir /mnt/x\fP +ns1# \fBmount \-\-make\-private \-t tmpfs none /mnt/x\fP +ns1# \fBmkdir /mnt/x/y\fP +ns1# \fBmount \-\-make\-private \-t tmpfs none /mnt/x/y\fP +ns1# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP +986 83 8:5 /mnt /mnt rw,relatime shared:344 +989 986 0:56 / /mnt/x rw,relatime +990 989 0:57 / /mnt/x/y rw,relatime +.EE +.in +.IP +Continuing in the same shell session, +we then create a second shell in a new user namespace and a new +(less privileged) mount namespace and +check the state of the propagated mounts rooted at +.IR /mnt . +.IP +.in +4n +.EX +ns1# \fBPS1=\[aq]ns2# \[aq] unshare \-\-user \-\-map\-root\-user \e\fP + \fB\-\-mount \-\-propagation unchanged bash\fP +ns2# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP +1239 1204 8:5 /mnt /mnt rw,relatime master:344 +1240 1239 0:56 / /mnt/x rw,relatime +1241 1240 0:57 / /mnt/x/y rw,relatime +.EE +.in +.IP +Of note in the above output is that the propagation type of the mount +.I /mnt +has been reduced to slave, as explained in point [2]. +This means that submount events will propagate from the master +.I /mnt +in "ns1", but propagation will not occur in the opposite direction. +.IP +From a separate terminal window, we then use +.BR nsenter (1) +to enter the mount and user namespaces corresponding to "ns1". +In that terminal window, we then recursively bind mount +.I /mnt/x +at the location +.IR /mnt/ppp . +.IP +.in +4n +.EX +$ \fBPS1=\[aq]ns3# \[aq] sudo nsenter \-t 778501 \-\-user \-\-mount\fP +ns3# \fBmount \-\-rbind \-\-make\-private /mnt/x /mnt/ppp\fP +ns3# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP +986 83 8:5 /mnt /mnt rw,relatime shared:344 +989 986 0:56 / /mnt/x rw,relatime +990 989 0:57 / /mnt/x/y rw,relatime +1242 986 0:56 / /mnt/ppp rw,relatime +1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518 +.EE +.in +.IP +Because the propagation type of the parent mount, +.IR /mnt , +was shared, the recursive bind mount propagated a small subtree of +mounts under the slave mount +.I /mnt +into "ns2", +as can be verified by executing the following command in that shell session: +.IP +.in +4n +.EX +ns2# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP +1239 1204 8:5 /mnt /mnt rw,relatime master:344 +1240 1239 0:56 / /mnt/x rw,relatime +1241 1240 0:57 / /mnt/x/y rw,relatime +1244 1239 0:56 / /mnt/ppp rw,relatime +1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518 +.EE +.in +.IP +While it is not possible to +.BR umount (2) +a part of the propagated subtree +.RI ( /mnt/ppp/y ) +in "ns2", +it is possible to +.BR umount (2) +the entire subtree, +as shown by the following commands: +.IP +.in +4n +.EX +ns2# \fBumount /mnt/ppp/y\fP +umount: /mnt/ppp/y: not mounted. +ns2# \fBumount \-l /mnt/ppp | sed \[aq]s/ \- .*//\[aq]\fP # Succeeds... +ns2# \fBgrep /mnt /proc/self/mountinfo\fP +1239 1204 8:5 /mnt /mnt rw,relatime master:344 +1240 1239 0:56 / /mnt/x rw,relatime +1241 1240 0:57 / /mnt/x/y rw,relatime +.EE +.in +.IP [5] +The +.BR mount (2) +flags +.BR MS_RDONLY , +.BR MS_NOSUID , +.BR MS_NOEXEC , +and the "atime" flags +.RB ( MS_NOATIME , +.BR MS_NODIRATIME , +.BR MS_RELATIME ) +settings become locked +.\" commit 9566d6742852c527bf5af38af5cbb878dad75705 +.\" Author: Eric W. Biederman <ebiederm@xmission.com> +.\" Date: Mon Jul 28 17:26:07 2014 -0700 +.\" +.\" mnt: Correct permission checks in do_remount +.\" +when propagated from a more privileged to +a less privileged mount namespace, +and may not be changed in the less privileged mount namespace. +.IP +This point is illustrated in the following example where, +in a more privileged mount namespace, +we create a bind mount that is marked as read-only. +For security reasons, +it should not be possible to make the mount writable in +a less privileged mount namespace, and indeed the kernel prevents this: +.IP +.in +4n +.EX +$ \fBsudo mkdir /mnt/dir\fP +$ \fBsudo mount \-\-bind \-o ro /some/path /mnt/dir\fP +$ \fBsudo unshare \-\-user \-\-map\-root\-user \-\-mount \e\fP + \fBmount \-o remount,rw /mnt/dir\fP +mount: /mnt/dir: permission denied. +.EE +.in +.IP [6] +.\" (As of 3.18-rc1 (in Al Viro's 2014-08-30 vfs.git#for-next tree)) +A file or directory that is a mount point in one namespace that is not +a mount point in another namespace, may be renamed, unlinked, or removed +.RB ( rmdir (2)) +in the mount namespace in which it is not a mount point +(subject to the usual permission checks). +Consequently, the mount point is removed in the mount namespace +where it was a mount point. +.IP +Previously (before Linux 3.18), +.\" mtk: The change was in Linux 3.18, I think, with this commit: +.\" commit 8ed936b5671bfb33d89bc60bdcc7cf0470ba52fe +.\" Author: Eric W. Biederman <ebiederman@twitter.com> +.\" Date: Tue Oct 1 18:33:48 2013 -0700 +.\" +.\" vfs: Lazily remove mounts on unlinked files and directories. +attempting to unlink, rename, or remove a file or directory +that was a mount point in another mount namespace would result in the error +.BR EBUSY . +That behavior had technical problems of enforcement (e.g., for NFS) +and permitted denial-of-service attacks against more privileged users +(i.e., preventing individual files from being updated +by bind mounting on top of them). +.SH EXAMPLES +See +.BR pivot_root (2). +.SH SEE ALSO +.BR unshare (1), +.BR clone (2), +.BR mount (2), +.BR mount_setattr (2), +.BR pivot_root (2), +.BR setns (2), +.BR umount (2), +.BR unshare (2), +.BR proc (5), +.BR namespaces (7), +.BR user_namespaces (7), +.BR findmnt (8), +.BR mount (8), +.BR pam_namespace (8), +.BR pivot_root (8), +.BR umount (8) +.PP +.I Documentation/filesystems/sharedsubtree.rst +in the kernel source tree. |