summaryrefslogtreecommitdiffstats
path: root/upstream/opensuse-tumbleweed/man7/mount_namespaces.7
diff options
context:
space:
mode:
Diffstat (limited to 'upstream/opensuse-tumbleweed/man7/mount_namespaces.7')
-rw-r--r--upstream/opensuse-tumbleweed/man7/mount_namespaces.71371
1 files changed, 1371 insertions, 0 deletions
diff --git a/upstream/opensuse-tumbleweed/man7/mount_namespaces.7 b/upstream/opensuse-tumbleweed/man7/mount_namespaces.7
new file mode 100644
index 00000000..0ce2feef
--- /dev/null
+++ b/upstream/opensuse-tumbleweed/man7/mount_namespaces.7
@@ -0,0 +1,1371 @@
+'\" t
+.\" Copyright (c) 2016, 2019, 2021 by Michael Kerrisk <mtk.manpages@gmail.com>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.\"
+.TH mount_namespaces 7 2023-05-03 "Linux man-pages 6.05.01"
+.SH NAME
+mount_namespaces \- overview of Linux mount namespaces
+.SH DESCRIPTION
+For an overview of namespaces, see
+.BR namespaces (7).
+.PP
+Mount namespaces provide isolation of the list of mounts seen
+by the processes in each namespace instance.
+Thus, the processes in each of the mount namespace instances
+will see distinct single-directory hierarchies.
+.PP
+The views provided by the
+.IR /proc/ pid /mounts ,
+.IR /proc/ pid /mountinfo ,
+and
+.IR /proc/ pid /mountstats
+files (all described in
+.BR proc (5))
+correspond to the mount namespace in which the process with the PID
+.I pid
+resides.
+(All of the processes that reside in the same mount namespace
+will see the same view in these files.)
+.PP
+A new mount namespace is created using either
+.BR clone (2)
+or
+.BR unshare (2)
+with the
+.B CLONE_NEWNS
+flag.
+When a new mount namespace is created,
+its mount list is initialized as follows:
+.IP \[bu] 3
+If the namespace is created using
+.BR clone (2),
+the mount list of the child's namespace is a copy
+of the mount list in the parent process's mount namespace.
+.IP \[bu]
+If the namespace is created using
+.BR unshare (2),
+the mount list of the new namespace is a copy of
+the mount list in the caller's previous mount namespace.
+.PP
+Subsequent modifications to the mount list
+.RB ( mount (2)
+and
+.BR umount (2))
+in either mount namespace will not (by default) affect the
+mount list seen in the other namespace
+(but see the following discussion of shared subtrees).
+.\"
+.SH SHARED SUBTREES
+After the implementation of mount namespaces was completed,
+experience showed that the isolation that they provided was,
+in some cases, too great.
+For example, in order to make a newly loaded optical disk
+available in all mount namespaces,
+a mount operation was required in each namespace.
+For this use case, and others,
+the shared subtree feature was introduced in Linux 2.6.15.
+This feature allows for automatic, controlled propagation of
+.BR mount (2)
+and
+.BR umount (2)
+.I events
+between namespaces
+(or, more precisely, between the mounts that are members of a
+.I peer group
+that are propagating events to one another).
+.PP
+Each mount is marked (via
+.BR mount (2))
+as having one of the following
+.IR "propagation types" :
+.TP
+.B MS_SHARED
+This mount shares events with members of a peer group.
+.BR mount (2)
+and
+.BR umount (2)
+events immediately under this mount will propagate
+to the other mounts that are members of the peer group.
+.I Propagation
+here means that the same
+.BR mount (2)
+or
+.BR umount (2)
+will automatically occur
+under all of the other mounts in the peer group.
+Conversely,
+.BR mount (2)
+and
+.BR umount (2)
+events that take place under
+peer mounts will propagate to this mount.
+.TP
+.B MS_PRIVATE
+This mount is private; it does not have a peer group.
+.BR mount (2)
+and
+.BR umount (2)
+events do not propagate into or out of this mount.
+.TP
+.B MS_SLAVE
+.BR mount (2)
+and
+.BR umount (2)
+events propagate into this mount from
+a (master) shared peer group.
+.BR mount (2)
+and
+.BR umount (2)
+events under this mount do not propagate to any peer.
+.IP
+Note that a mount can be the slave of another peer group
+while at the same time sharing
+.BR mount (2)
+and
+.BR umount (2)
+events
+with a peer group of which it is a member.
+(More precisely, one peer group can be the slave of another peer group.)
+.TP
+.B MS_UNBINDABLE
+This is like a private mount,
+and in addition this mount can't be bind mounted.
+Attempts to bind mount this mount
+.RB ( mount (2)
+with the
+.B MS_BIND
+flag) will fail.
+.IP
+When a recursive bind mount
+.RB ( mount (2)
+with the
+.B MS_BIND
+and
+.B MS_REC
+flags) is performed on a directory subtree,
+any bind mounts within the subtree are automatically pruned
+(i.e., not replicated)
+when replicating that subtree to produce the target subtree.
+.PP
+For a discussion of the propagation type assigned to a new mount,
+see NOTES.
+.PP
+The propagation type is a per-mount-point setting;
+some mounts may be marked as shared
+(with each shared mount being a member of a distinct peer group),
+while others are private
+(or slaved or unbindable).
+.PP
+Note that a mount's propagation type determines whether
+.BR mount (2)
+and
+.BR umount (2)
+of mounts
+.I immediately under
+the mount are propagated.
+Thus, the propagation type does not affect propagation of events for
+grandchildren and further removed descendant mounts.
+What happens if the mount itself is unmounted is determined by
+the propagation type that is in effect for the
+.I parent
+of the mount.
+.PP
+Members are added to a
+.I peer group
+when a mount is marked as shared and either:
+.IP (a) 5
+the mount is replicated during the creation of a new mount namespace; or
+.IP (b)
+a new bind mount is created from the mount.
+.PP
+In both of these cases, the new mount joins the peer group
+of which the existing mount is a member.
+.PP
+A new peer group is also created when a child mount is created under
+an existing mount that is marked as shared.
+In this case, the new child mount is also marked as shared and
+the resulting peer group consists of all the mounts
+that are replicated under the peers of parent mounts.
+.PP
+A mount ceases to be a member of a peer group when either
+the mount is explicitly unmounted,
+or when the mount is implicitly unmounted because a mount namespace is removed
+(because it has no more member processes).
+.PP
+The propagation type of the mounts in a mount namespace
+can be discovered via the "optional fields" exposed in
+.IR /proc/ pid /mountinfo .
+(See
+.BR proc (5)
+for details of this file.)
+The following tags can appear in the optional fields
+for a record in that file:
+.TP
+.I shared:X
+This mount is shared in peer group
+.IR X .
+Each peer group has a unique ID that is automatically
+generated by the kernel,
+and all mounts in the same peer group will show the same ID.
+(These IDs are assigned starting from the value 1,
+and may be recycled when a peer group ceases to have any members.)
+.TP
+.I master:X
+This mount is a slave to shared peer group
+.IR X .
+.TP
+.IR propagate_from:X " (since Linux 2.6.26)"
+.\" commit 97e7e0f71d6d948c25f11f0a33878d9356d9579e
+This mount is a slave and receives propagation from shared peer group
+.IR X .
+This tag will always appear in conjunction with a
+.I master:X
+tag.
+Here,
+.I X
+is the closest dominant peer group under the process's root directory.
+If
+.I X
+is the immediate master of the mount,
+or if there is no dominant peer group under the same root,
+then only the
+.I master:X
+field is present and not the
+.I propagate_from:X
+field.
+For further details, see below.
+.TP
+.I unbindable
+This is an unbindable mount.
+.PP
+If none of the above tags is present, then this is a private mount.
+.SS MS_SHARED and MS_PRIVATE example
+Suppose that on a terminal in the initial mount namespace,
+we mark one mount as shared and another as private,
+and then view the mounts in
+.IR /proc/self/mountinfo :
+.PP
+.in +4n
+.EX
+sh1# \fBmount \-\-make\-shared /mntS\fP
+sh1# \fBmount \-\-make\-private /mntP\fP
+sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+77 61 8:17 / /mntS rw,relatime shared:1
+83 61 8:15 / /mntP rw,relatime
+.EE
+.in
+.PP
+From the
+.I /proc/self/mountinfo
+output, we see that
+.I /mntS
+is a shared mount in peer group 1, and that
+.I /mntP
+has no optional tags, indicating that it is a private mount.
+The first two fields in each record in this file are the unique
+ID for this mount, and the mount ID of the parent mount.
+We can further inspect this file to see that the parent mount of
+.I /mntS
+and
+.I /mntP
+is the root directory,
+.IR / ,
+which is mounted as private:
+.PP
+.in +4n
+.EX
+sh1# \fBcat /proc/self/mountinfo | awk \[aq]$1 == 61\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+61 0 8:2 / / rw,relatime
+.EE
+.in
+.PP
+On a second terminal,
+we create a new mount namespace where we run a second shell
+and inspect the mounts:
+.PP
+.in +4n
+.EX
+$ \fBPS1=\[aq]sh2# \[aq] sudo unshare \-m \-\-propagation unchanged sh\fP
+sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+222 145 8:17 / /mntS rw,relatime shared:1
+225 145 8:15 / /mntP rw,relatime
+.EE
+.in
+.PP
+The new mount namespace received a copy of the initial mount namespace's
+mounts.
+These new mounts maintain the same propagation types,
+but have unique mount IDs.
+(The
+.I \-\-propagation\~unchanged
+option prevents
+.BR unshare (1)
+from marking all mounts as private when creating a new mount namespace,
+.\" Since util-linux 2.27
+which it does by default.)
+.PP
+In the second terminal, we then create submounts under each of
+.I /mntS
+and
+.I /mntP
+and inspect the set-up:
+.PP
+.in +4n
+.EX
+sh2# \fBmkdir /mntS/a\fP
+sh2# \fBmount /dev/sdb6 /mntS/a\fP
+sh2# \fBmkdir /mntP/b\fP
+sh2# \fBmount /dev/sdb7 /mntP/b\fP
+sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+222 145 8:17 / /mntS rw,relatime shared:1
+225 145 8:15 / /mntP rw,relatime
+178 222 8:22 / /mntS/a rw,relatime shared:2
+230 225 8:23 / /mntP/b rw,relatime
+.EE
+.in
+.PP
+From the above, it can be seen that
+.I /mntS/a
+was created as shared (inheriting this setting from its parent mount) and
+.I /mntP/b
+was created as a private mount.
+.PP
+Returning to the first terminal and inspecting the set-up,
+we see that the new mount created under the shared mount
+.I /mntS
+propagated to its peer mount (in the initial mount namespace),
+but the new mount created under the private mount
+.I /mntP
+did not propagate:
+.PP
+.in +4n
+.EX
+sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+77 61 8:17 / /mntS rw,relatime shared:1
+83 61 8:15 / /mntP rw,relatime
+179 77 8:22 / /mntS/a rw,relatime shared:2
+.EE
+.in
+.\"
+.SS MS_SLAVE example
+Making a mount a slave allows it to receive propagated
+.BR mount (2)
+and
+.BR umount (2)
+events from a master shared peer group,
+while preventing it from propagating events to that master.
+This is useful if we want to (say) receive a mount event when
+an optical disk is mounted in the master shared peer group
+(in another mount namespace),
+but want to prevent
+.BR mount (2)
+and
+.BR umount (2)
+events under the slave mount
+from having side effects in other namespaces.
+.PP
+We can demonstrate the effect of slaving by first marking
+two mounts as shared in the initial mount namespace:
+.PP
+.in +4n
+.EX
+sh1# \fBmount \-\-make\-shared /mntX\fP
+sh1# \fBmount \-\-make\-shared /mntY\fP
+sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+132 83 8:23 / /mntX rw,relatime shared:1
+133 83 8:22 / /mntY rw,relatime shared:2
+.EE
+.in
+.PP
+On a second terminal,
+we create a new mount namespace and inspect the mounts:
+.PP
+.in +4n
+.EX
+sh2# \fBunshare \-m \-\-propagation unchanged sh\fP
+sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+168 167 8:23 / /mntX rw,relatime shared:1
+169 167 8:22 / /mntY rw,relatime shared:2
+.EE
+.in
+.PP
+In the new mount namespace, we then mark one of the mounts as a slave:
+.PP
+.in +4n
+.EX
+sh2# \fBmount \-\-make\-slave /mntY\fP
+sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+168 167 8:23 / /mntX rw,relatime shared:1
+169 167 8:22 / /mntY rw,relatime master:2
+.EE
+.in
+.PP
+From the above output, we see that
+.I /mntY
+is now a slave mount that is receiving propagation events from
+the shared peer group with the ID 2.
+.PP
+Continuing in the new namespace, we create submounts under each of
+.I /mntX
+and
+.IR /mntY :
+.PP
+.in +4n
+.EX
+sh2# \fBmkdir /mntX/a\fP
+sh2# \fBmount /dev/sda3 /mntX/a\fP
+sh2# \fBmkdir /mntY/b\fP
+sh2# \fBmount /dev/sda5 /mntY/b\fP
+.EE
+.in
+.PP
+When we inspect the state of the mounts in the new mount namespace,
+we see that
+.I /mntX/a
+was created as a new shared mount
+(inheriting the "shared" setting from its parent mount) and
+.I /mntY/b
+was created as a private mount:
+.PP
+.in +4n
+.EX
+sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+168 167 8:23 / /mntX rw,relatime shared:1
+169 167 8:22 / /mntY rw,relatime master:2
+173 168 8:3 / /mntX/a rw,relatime shared:3
+175 169 8:5 / /mntY/b rw,relatime
+.EE
+.in
+.PP
+Returning to the first terminal (in the initial mount namespace),
+we see that the mount
+.I /mntX/a
+propagated to the peer (the shared
+.IR /mntX ),
+but the mount
+.I /mntY/b
+was not propagated:
+.PP
+.in +4n
+.EX
+sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+132 83 8:23 / /mntX rw,relatime shared:1
+133 83 8:22 / /mntY rw,relatime shared:2
+174 132 8:3 / /mntX/a rw,relatime shared:3
+.EE
+.in
+.PP
+Now we create a new mount under
+.I /mntY
+in the first shell:
+.PP
+.in +4n
+.EX
+sh1# \fBmkdir /mntY/c\fP
+sh1# \fBmount /dev/sda1 /mntY/c\fP
+sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+132 83 8:23 / /mntX rw,relatime shared:1
+133 83 8:22 / /mntY rw,relatime shared:2
+174 132 8:3 / /mntX/a rw,relatime shared:3
+178 133 8:1 / /mntY/c rw,relatime shared:4
+.EE
+.in
+.PP
+When we examine the mounts in the second mount namespace,
+we see that in this case the new mount has been propagated
+to the slave mount,
+and that the new mount is itself a slave mount (to peer group 4):
+.PP
+.in +4n
+.EX
+sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+168 167 8:23 / /mntX rw,relatime shared:1
+169 167 8:22 / /mntY rw,relatime master:2
+173 168 8:3 / /mntX/a rw,relatime shared:3
+175 169 8:5 / /mntY/b rw,relatime
+179 169 8:1 / /mntY/c rw,relatime master:4
+.EE
+.in
+.\"
+.SS MS_UNBINDABLE example
+One of the primary purposes of unbindable mounts is to avoid
+the "mount explosion" problem when repeatedly performing bind mounts
+of a higher-level subtree at a lower-level mount.
+The problem is illustrated by the following shell session.
+.PP
+Suppose we have a system with the following mounts:
+.PP
+.in +4n
+.EX
+# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
+/dev/sda1 on /
+/dev/sdb6 on /mntX
+/dev/sdb7 on /mntY
+.EE
+.in
+.PP
+Suppose furthermore that we wish to recursively bind mount
+the root directory under several users' home directories.
+We do this for the first user, and inspect the mounts:
+.PP
+.in +4n
+.EX
+# \fBmount \-\-rbind / /home/cecilia/\fP
+# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
+/dev/sda1 on /
+/dev/sdb6 on /mntX
+/dev/sdb7 on /mntY
+/dev/sda1 on /home/cecilia
+/dev/sdb6 on /home/cecilia/mntX
+/dev/sdb7 on /home/cecilia/mntY
+.EE
+.in
+.PP
+When we repeat this operation for the second user,
+we start to see the explosion problem:
+.PP
+.in +4n
+.EX
+# \fBmount \-\-rbind / /home/henry\fP
+# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
+/dev/sda1 on /
+/dev/sdb6 on /mntX
+/dev/sdb7 on /mntY
+/dev/sda1 on /home/cecilia
+/dev/sdb6 on /home/cecilia/mntX
+/dev/sdb7 on /home/cecilia/mntY
+/dev/sda1 on /home/henry
+/dev/sdb6 on /home/henry/mntX
+/dev/sdb7 on /home/henry/mntY
+/dev/sda1 on /home/henry/home/cecilia
+/dev/sdb6 on /home/henry/home/cecilia/mntX
+/dev/sdb7 on /home/henry/home/cecilia/mntY
+.EE
+.in
+.PP
+Under
+.IR /home/henry ,
+we have not only recursively added the
+.I /mntX
+and
+.I /mntY
+mounts, but also the recursive mounts of those directories under
+.I /home/cecilia
+that were created in the previous step.
+Upon repeating the step for a third user,
+it becomes obvious that the explosion is exponential in nature:
+.PP
+.in +4n
+.EX
+# \fBmount \-\-rbind / /home/otto\fP
+# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
+/dev/sda1 on /
+/dev/sdb6 on /mntX
+/dev/sdb7 on /mntY
+/dev/sda1 on /home/cecilia
+/dev/sdb6 on /home/cecilia/mntX
+/dev/sdb7 on /home/cecilia/mntY
+/dev/sda1 on /home/henry
+/dev/sdb6 on /home/henry/mntX
+/dev/sdb7 on /home/henry/mntY
+/dev/sda1 on /home/henry/home/cecilia
+/dev/sdb6 on /home/henry/home/cecilia/mntX
+/dev/sdb7 on /home/henry/home/cecilia/mntY
+/dev/sda1 on /home/otto
+/dev/sdb6 on /home/otto/mntX
+/dev/sdb7 on /home/otto/mntY
+/dev/sda1 on /home/otto/home/cecilia
+/dev/sdb6 on /home/otto/home/cecilia/mntX
+/dev/sdb7 on /home/otto/home/cecilia/mntY
+/dev/sda1 on /home/otto/home/henry
+/dev/sdb6 on /home/otto/home/henry/mntX
+/dev/sdb7 on /home/otto/home/henry/mntY
+/dev/sda1 on /home/otto/home/henry/home/cecilia
+/dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX
+/dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY
+.EE
+.in
+.PP
+The mount explosion problem in the above scenario can be avoided
+by making each of the new mounts unbindable.
+The effect of doing this is that recursive mounts of the root
+directory will not replicate the unbindable mounts.
+We make such a mount for the first user:
+.PP
+.in +4n
+.EX
+# \fBmount \-\-rbind \-\-make\-unbindable / /home/cecilia\fP
+.EE
+.in
+.PP
+Before going further, we show that unbindable mounts are indeed unbindable:
+.PP
+.in +4n
+.EX
+# \fBmkdir /mntZ\fP
+# \fBmount \-\-bind /home/cecilia /mntZ\fP
+mount: wrong fs type, bad option, bad superblock on /home/cecilia,
+ missing codepage or helper program, or other error
+\&
+ In some cases useful info is found in syslog \- try
+ dmesg | tail or so.
+.EE
+.in
+.PP
+Now we create unbindable recursive bind mounts for the other two users:
+.PP
+.in +4n
+.EX
+# \fBmount \-\-rbind \-\-make\-unbindable / /home/henry\fP
+# \fBmount \-\-rbind \-\-make\-unbindable / /home/otto\fP
+.EE
+.in
+.PP
+Upon examining the list of mounts,
+we see there has been no explosion of mounts,
+because the unbindable mounts were not replicated
+under each user's directory:
+.PP
+.in +4n
+.EX
+# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
+/dev/sda1 on /
+/dev/sdb6 on /mntX
+/dev/sdb7 on /mntY
+/dev/sda1 on /home/cecilia
+/dev/sdb6 on /home/cecilia/mntX
+/dev/sdb7 on /home/cecilia/mntY
+/dev/sda1 on /home/henry
+/dev/sdb6 on /home/henry/mntX
+/dev/sdb7 on /home/henry/mntY
+/dev/sda1 on /home/otto
+/dev/sdb6 on /home/otto/mntX
+/dev/sdb7 on /home/otto/mntY
+.EE
+.in
+.\"
+.SS Propagation type transitions
+The following table shows the effect that applying a new propagation type
+(i.e.,
+.IR mount\~\-\-make\-xxxx )
+has on the existing propagation type of a mount.
+The rows correspond to existing propagation types,
+and the columns are the new propagation settings.
+For reasons of space, "private" is abbreviated as "priv" and
+"unbindable" as "unbind".
+.TS
+lb2 lb2 lb2 lb2 lb1
+lb | l l l l l.
+ make-shared make-slave make-priv make-unbind
+_
+shared shared slave/priv [1] priv unbind
+slave slave+shared slave [2] priv unbind
+slave+shared slave+shared slave priv unbind
+private shared priv [2] priv unbind
+unbindable shared unbind [2] priv unbind
+.TE
+.sp 1
+Note the following details to the table:
+.IP [1] 4
+If a shared mount is the only mount in its peer group,
+making it a slave automatically makes it private.
+.IP [2]
+Slaving a nonshared mount has no effect on the mount.
+.\"
+.SS Bind (MS_BIND) semantics
+Suppose that the following command is performed:
+.PP
+.in +4n
+.EX
+mount \-\-bind A/a B/b
+.EE
+.in
+.PP
+Here,
+.I A
+is the source mount,
+.I B
+is the destination mount,
+.I a
+is a subdirectory path under the mount point
+.IR A ,
+and
+.I b
+is a subdirectory path under the mount point
+.IR B .
+The propagation type of the resulting mount,
+.IR B/b ,
+depends on the propagation types of the mounts
+.I A
+and
+.IR B ,
+and is summarized in the following table.
+.PP
+.TS
+lb2 lb1 lb2 lb2 lb2 lb0
+lb2 lb1 lb2 lb2 lb2 lb0
+lb lb | l l l l l.
+ source(A)
+ shared private slave unbind
+_
+dest(B) shared shared shared slave+shared invalid
+ nonshared shared private slave invalid
+.TE
+.sp 1
+Note that a recursive bind of a subtree follows the same semantics
+as for a bind operation on each mount in the subtree.
+(Unbindable mounts are automatically pruned at the target mount point.)
+.PP
+For further details, see
+.I Documentation/filesystems/sharedsubtree.rst
+in the kernel source tree.
+.\"
+.SS Move (MS_MOVE) semantics
+Suppose that the following command is performed:
+.PP
+.in +4n
+.EX
+mount \-\-move A B/b
+.EE
+.in
+.PP
+Here,
+.I A
+is the source mount,
+.I B
+is the destination mount, and
+.I b
+is a subdirectory path under the mount point
+.IR B .
+The propagation type of the resulting mount,
+.IR B/b ,
+depends on the propagation types of the mounts
+.I A
+and
+.IR B ,
+and is summarized in the following table.
+.PP
+.TS
+lb2 lb1 lb2 lb2 lb2 lb0
+lb2 lb1 lb2 lb2 lb2 lb0
+lb lb | l l l l l.
+ source(A)
+ shared private slave unbind
+_
+dest(B) shared shared shared slave+shared invalid
+ nonshared shared private slave unbindable
+.TE
+.sp 1
+Note: moving a mount that resides under a shared mount is invalid.
+.PP
+For further details, see
+.I Documentation/filesystems/sharedsubtree.rst
+in the kernel source tree.
+.\"
+.SS Mount semantics
+Suppose that we use the following command to create a mount:
+.PP
+.in +4n
+.EX
+mount device B/b
+.EE
+.in
+.PP
+Here,
+.I B
+is the destination mount, and
+.I b
+is a subdirectory path under the mount point
+.IR B .
+The propagation type of the resulting mount,
+.IR B/b ,
+follows the same rules as for a bind mount,
+where the propagation type of the source mount
+is considered always to be private.
+.\"
+.SS Unmount semantics
+Suppose that we use the following command to tear down a mount:
+.PP
+.in +4n
+.EX
+umount A
+.EE
+.in
+.PP
+Here,
+.I A
+is a mount on
+.IR B/b ,
+where
+.I B
+is the parent mount and
+.I b
+is a subdirectory path under the mount point
+.IR B .
+If
+.B B
+is shared, then all most-recently-mounted mounts at
+.I b
+on mounts that receive propagation from mount
+.I B
+and do not have submounts under them are unmounted.
+.\"
+.SS The /proc/ pid /mountinfo "propagate_from" tag
+The
+.I propagate_from:X
+tag is shown in the optional fields of a
+.IR /proc/ pid /mountinfo
+record in cases where a process can't see a slave's immediate master
+(i.e., the pathname of the master is not reachable from
+the filesystem root directory)
+and so cannot determine the
+chain of propagation between the mounts it can see.
+.PP
+In the following example, we first create a two-link master-slave chain
+between the mounts
+.IR /mnt ,
+.IR /tmp/etc ,
+and
+.IR /mnt/tmp/etc .
+Then the
+.BR chroot (1)
+command is used to make the
+.I /tmp/etc
+mount point unreachable from the root directory,
+creating a situation where the master of
+.I /mnt/tmp/etc
+is not reachable from the (new) root directory of the process.
+.PP
+First, we bind mount the root directory onto
+.I /mnt
+and then bind mount
+.I /proc
+at
+.I /mnt/proc
+so that after the later
+.BR chroot (1)
+the
+.BR proc (5)
+filesystem remains visible at the correct location
+in the chroot-ed environment.
+.PP
+.in +4n
+.EX
+# \fBmkdir \-p /mnt/proc\fP
+# \fBmount \-\-bind / /mnt\fP
+# \fBmount \-\-bind /proc /mnt/proc\fP
+.EE
+.in
+.PP
+Next, we ensure that the
+.I /mnt
+mount is a shared mount in a new peer group (with no peers):
+.PP
+.in +4n
+.EX
+# \fBmount \-\-make\-private /mnt\fP # Isolate from any previous peer group
+# \fBmount \-\-make\-shared /mnt\fP
+# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+239 61 8:2 / /mnt ... shared:102
+248 239 0:4 / /mnt/proc ... shared:5
+.EE
+.in
+.PP
+Next, we bind mount
+.I /mnt/etc
+onto
+.IR /tmp/etc :
+.PP
+.in +4n
+.EX
+# \fBmkdir \-p /tmp/etc\fP
+# \fBmount \-\-bind /mnt/etc /tmp/etc\fP
+# \fBcat /proc/self/mountinfo | egrep \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+239 61 8:2 / /mnt ... shared:102
+248 239 0:4 / /mnt/proc ... shared:5
+267 40 8:2 /etc /tmp/etc ... shared:102
+.EE
+.in
+.PP
+Initially, these two mounts are in the same peer group,
+but we then make the
+.I /tmp/etc
+a slave of
+.IR /mnt/etc ,
+and then make
+.I /tmp/etc
+shared as well,
+so that it can propagate events to the next slave in the chain:
+.PP
+.in +4n
+.EX
+# \fBmount \-\-make\-slave /tmp/etc\fP
+# \fBmount \-\-make\-shared /tmp/etc\fP
+# \fBcat /proc/self/mountinfo | egrep \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+239 61 8:2 / /mnt ... shared:102
+248 239 0:4 / /mnt/proc ... shared:5
+267 40 8:2 /etc /tmp/etc ... shared:105 master:102
+.EE
+.in
+.PP
+Then we bind mount
+.I /tmp/etc
+onto
+.IR /mnt/tmp/etc .
+Again, the two mounts are initially in the same peer group,
+but we then make
+.I /mnt/tmp/etc
+a slave of
+.IR /tmp/etc :
+.PP
+.in +4n
+.EX
+# \fBmkdir \-p /mnt/tmp/etc\fP
+# \fBmount \-\-bind /tmp/etc /mnt/tmp/etc\fP
+# \fBmount \-\-make\-slave /mnt/tmp/etc\fP
+# \fBcat /proc/self/mountinfo | egrep \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
+239 61 8:2 / /mnt ... shared:102
+248 239 0:4 / /mnt/proc ... shared:5
+267 40 8:2 /etc /tmp/etc ... shared:105 master:102
+273 239 8:2 /etc /mnt/tmp/etc ... master:105
+.EE
+.in
+.PP
+From the above, we see that
+.I /mnt
+is the master of the slave
+.IR /tmp/etc ,
+which in turn is the master of the slave
+.IR /mnt/tmp/etc .
+.PP
+We then
+.BR chroot (1)
+to the
+.I /mnt
+directory, which renders the mount with ID 267 unreachable
+from the (new) root directory:
+.PP
+.in +4n
+.EX
+# \fBchroot /mnt\fP
+.EE
+.in
+.PP
+When we examine the state of the mounts inside the chroot-ed environment,
+we see the following:
+.PP
+.in +4n
+.EX
+# \fBcat /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
+239 61 8:2 / / ... shared:102
+248 239 0:4 / /proc ... shared:5
+273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102
+.EE
+.in
+.PP
+Above, we see that the mount with ID 273
+is a slave whose master is the peer group 105.
+The mount point for that master is unreachable, and so a
+.I propagate_from
+tag is displayed, indicating that the closest dominant peer group
+(i.e., the nearest reachable mount in the slave chain)
+is the peer group with the ID 102 (corresponding to the
+.I /mnt
+mount point before the
+.BR chroot (1)
+was performed).
+.\"
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 2.4.19.
+.\"
+.SH NOTES
+The propagation type assigned to a new mount depends
+on the propagation type of the parent mount.
+If the mount has a parent (i.e., it is a non-root mount
+point) and the propagation type of the parent is
+.BR MS_SHARED ,
+then the propagation type of the new mount is also
+.BR MS_SHARED .
+Otherwise, the propagation type of the new mount is
+.BR MS_PRIVATE .
+.PP
+Notwithstanding the fact that the default propagation type
+for new mount is in many cases
+.BR MS_PRIVATE ,
+.B MS_SHARED
+is typically more useful.
+For this reason,
+.BR systemd (1)
+automatically remounts all mounts as
+.B MS_SHARED
+on system startup.
+Thus, on most modern systems, the default propagation type is in practice
+.BR MS_SHARED .
+.PP
+Since, when one uses
+.BR unshare (1)
+to create a mount namespace,
+the goal is commonly to provide full isolation of the mounts
+in the new namespace,
+.BR unshare (1)
+(since
+.I util\-linux
+2.27) in turn reverses the step performed by
+.BR systemd (1),
+by making all mounts private in the new namespace.
+That is,
+.BR unshare (1)
+performs the equivalent of the following in the new mount namespace:
+.PP
+.in +4n
+.EX
+mount \-\-make\-rprivate /
+.EE
+.in
+.PP
+To prevent this, one can use the
+.I \-\-propagation\~unchanged
+option to
+.BR unshare (1).
+.PP
+An application that creates a new mount namespace directly using
+.BR clone (2)
+or
+.BR unshare (2)
+may desire to prevent propagation of mount events to other mount namespaces
+(as is done by
+.BR unshare (1)).
+This can be done by changing the propagation type of
+mounts in the new namespace to either
+.B MS_SLAVE
+or
+.BR MS_PRIVATE ,
+using a call such as the following:
+.PP
+.in +4n
+.EX
+mount(NULL, "/", MS_SLAVE | MS_REC, NULL);
+.EE
+.in
+.PP
+For a discussion of propagation types when moving mounts
+.RB ( MS_MOVE )
+and creating bind mounts
+.RB ( MS_BIND ),
+see
+.IR Documentation/filesystems/sharedsubtree.rst .
+.\"
+.\" ============================================================
+.\"
+.SS Restrictions on mount namespaces
+Note the following points with respect to mount namespaces:
+.IP [1] 4
+Each mount namespace has an owner user namespace.
+As explained above, when a new mount namespace is created,
+its mount list is initialized as a copy of the mount list
+of another mount namespace.
+If the new namespace and the namespace from which the mount list
+was copied are owned by different user namespaces,
+then the new mount namespace is considered
+.IR "less privileged" .
+.IP [2]
+When creating a less privileged mount namespace,
+shared mounts are reduced to slave mounts.
+This ensures that mappings performed in less
+privileged mount namespaces will not propagate to more privileged
+mount namespaces.
+.IP [3]
+Mounts that come as a single unit from a more privileged mount namespace are
+locked together and may not be separated in a less privileged mount
+namespace.
+(The
+.BR unshare (2)
+.B CLONE_NEWNS
+operation brings across all of the mounts from the original
+mount namespace as a single unit,
+and recursive mounts that propagate between
+mount namespaces propagate as a single unit.)
+.IP
+In this context, "may not be separated" means that the mounts
+are locked so that they may not be individually unmounted.
+Consider the following example:
+.IP
+.in +4n
+.EX
+$ \fBsudo sh\fP
+# \fBmount \-\-bind /dev/null /etc/shadow\fP
+# \fBcat /etc/shadow\fP # Produces no output
+.EE
+.in
+.IP
+The above steps, performed in a more privileged mount namespace,
+have created a bind mount that
+obscures the contents of the shadow password file,
+.IR /etc/shadow .
+For security reasons, it should not be possible to
+.BR umount (2)
+that mount in a less privileged mount namespace,
+since that would reveal the contents of
+.IR /etc/shadow .
+.IP
+Suppose we now create a new mount namespace
+owned by a new user namespace.
+The new mount namespace will inherit copies of all of the mounts
+from the previous mount namespace.
+However, those mounts will be locked because the new mount namespace
+is less privileged.
+Consequently, an attempt to
+.BR umount (2)
+the mount fails as show
+in the following step:
+.IP
+.in +4n
+.EX
+# \fBunshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
+ \fBstrace \-o /tmp/log \e\fP
+ \fBumount /mnt/dir\fP
+umount: /etc/shadow: not mounted.
+# \fBgrep \[aq]\[ha]umount\[aq] /tmp/log\fP
+umount2("/etc/shadow", 0) = \-1 EINVAL (Invalid argument)
+.EE
+.in
+.IP
+The error message from
+.BR mount (8)
+is a little confusing, but the
+.BR strace (1)
+output reveals that the underlying
+.BR umount2 (2)
+system call failed with the error
+.BR EINVAL ,
+which is the error that the kernel returns to indicate that
+the mount is locked.
+.IP
+Note, however, that it is possible to stack (and unstack) a
+mount on top of one of the inherited locked mounts in a
+less privileged mount namespace:
+.IP
+.in +4n
+.EX
+# \fBecho \[aq]aaaaa\[aq] > /tmp/a\fP # File to mount onto /etc/shadow
+# \fBunshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
+ \fBsh \-c \[aq]mount \-\-bind /tmp/a /etc/shadow; cat /etc/shadow\[aq]\fP
+aaaaa
+# \fBumount /etc/shadow\fP
+.EE
+.in
+.IP
+The final
+.BR umount (8)
+command above, which is performed in the initial mount namespace,
+makes the original
+.I /etc/shadow
+file once more visible in that namespace.
+.IP [4]
+Following on from point [3],
+note that it is possible to
+.BR umount (2)
+an entire subtree of mounts that
+propagated as a unit into a less privileged mount namespace,
+as illustrated in the following example.
+.IP
+First, we create new user and mount namespaces using
+.BR unshare (1).
+In the new mount namespace,
+the propagation type of all mounts is set to private.
+We then create a shared bind mount at
+.IR /mnt ,
+and a small hierarchy of mounts underneath that mount.
+.IP
+.in +4n
+.EX
+$ \fBPS1=\[aq]ns1# \[aq] sudo unshare \-\-user \-\-map\-root\-user \e\fP
+ \fB\-\-mount \-\-propagation private bash\fP
+ns1# \fBecho $$\fP # We need the PID of this shell later
+778501
+ns1# \fBmount \-\-make\-shared \-\-bind /mnt /mnt\fP
+ns1# \fBmkdir /mnt/x\fP
+ns1# \fBmount \-\-make\-private \-t tmpfs none /mnt/x\fP
+ns1# \fBmkdir /mnt/x/y\fP
+ns1# \fBmount \-\-make\-private \-t tmpfs none /mnt/x/y\fP
+ns1# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
+986 83 8:5 /mnt /mnt rw,relatime shared:344
+989 986 0:56 / /mnt/x rw,relatime
+990 989 0:57 / /mnt/x/y rw,relatime
+.EE
+.in
+.IP
+Continuing in the same shell session,
+we then create a second shell in a new user namespace and a new
+(less privileged) mount namespace and
+check the state of the propagated mounts rooted at
+.IR /mnt .
+.IP
+.in +4n
+.EX
+ns1# \fBPS1=\[aq]ns2# \[aq] unshare \-\-user \-\-map\-root\-user \e\fP
+ \fB\-\-mount \-\-propagation unchanged bash\fP
+ns2# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
+1239 1204 8:5 /mnt /mnt rw,relatime master:344
+1240 1239 0:56 / /mnt/x rw,relatime
+1241 1240 0:57 / /mnt/x/y rw,relatime
+.EE
+.in
+.IP
+Of note in the above output is that the propagation type of the mount
+.I /mnt
+has been reduced to slave, as explained in point [2].
+This means that submount events will propagate from the master
+.I /mnt
+in "ns1", but propagation will not occur in the opposite direction.
+.IP
+From a separate terminal window, we then use
+.BR nsenter (1)
+to enter the mount and user namespaces corresponding to "ns1".
+In that terminal window, we then recursively bind mount
+.I /mnt/x
+at the location
+.IR /mnt/ppp .
+.IP
+.in +4n
+.EX
+$ \fBPS1=\[aq]ns3# \[aq] sudo nsenter \-t 778501 \-\-user \-\-mount\fP
+ns3# \fBmount \-\-rbind \-\-make\-private /mnt/x /mnt/ppp\fP
+ns3# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
+986 83 8:5 /mnt /mnt rw,relatime shared:344
+989 986 0:56 / /mnt/x rw,relatime
+990 989 0:57 / /mnt/x/y rw,relatime
+1242 986 0:56 / /mnt/ppp rw,relatime
+1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518
+.EE
+.in
+.IP
+Because the propagation type of the parent mount,
+.IR /mnt ,
+was shared, the recursive bind mount propagated a small subtree of
+mounts under the slave mount
+.I /mnt
+into "ns2",
+as can be verified by executing the following command in that shell session:
+.IP
+.in +4n
+.EX
+ns2# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
+1239 1204 8:5 /mnt /mnt rw,relatime master:344
+1240 1239 0:56 / /mnt/x rw,relatime
+1241 1240 0:57 / /mnt/x/y rw,relatime
+1244 1239 0:56 / /mnt/ppp rw,relatime
+1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518
+.EE
+.in
+.IP
+While it is not possible to
+.BR umount (2)
+a part of the propagated subtree
+.RI ( /mnt/ppp/y )
+in "ns2",
+it is possible to
+.BR umount (2)
+the entire subtree,
+as shown by the following commands:
+.IP
+.in +4n
+.EX
+ns2# \fBumount /mnt/ppp/y\fP
+umount: /mnt/ppp/y: not mounted.
+ns2# \fBumount \-l /mnt/ppp | sed \[aq]s/ \- .*//\[aq]\fP # Succeeds...
+ns2# \fBgrep /mnt /proc/self/mountinfo\fP
+1239 1204 8:5 /mnt /mnt rw,relatime master:344
+1240 1239 0:56 / /mnt/x rw,relatime
+1241 1240 0:57 / /mnt/x/y rw,relatime
+.EE
+.in
+.IP [5]
+The
+.BR mount (2)
+flags
+.BR MS_RDONLY ,
+.BR MS_NOSUID ,
+.BR MS_NOEXEC ,
+and the "atime" flags
+.RB ( MS_NOATIME ,
+.BR MS_NODIRATIME ,
+.BR MS_RELATIME )
+settings become locked
+.\" commit 9566d6742852c527bf5af38af5cbb878dad75705
+.\" Author: Eric W. Biederman <ebiederm@xmission.com>
+.\" Date: Mon Jul 28 17:26:07 2014 -0700
+.\"
+.\" mnt: Correct permission checks in do_remount
+.\"
+when propagated from a more privileged to
+a less privileged mount namespace,
+and may not be changed in the less privileged mount namespace.
+.IP
+This point is illustrated in the following example where,
+in a more privileged mount namespace,
+we create a bind mount that is marked as read-only.
+For security reasons,
+it should not be possible to make the mount writable in
+a less privileged mount namespace, and indeed the kernel prevents this:
+.IP
+.in +4n
+.EX
+$ \fBsudo mkdir /mnt/dir\fP
+$ \fBsudo mount \-\-bind \-o ro /some/path /mnt/dir\fP
+$ \fBsudo unshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
+ \fBmount \-o remount,rw /mnt/dir\fP
+mount: /mnt/dir: permission denied.
+.EE
+.in
+.IP [6]
+.\" (As of 3.18-rc1 (in Al Viro's 2014-08-30 vfs.git#for-next tree))
+A file or directory that is a mount point in one namespace that is not
+a mount point in another namespace, may be renamed, unlinked, or removed
+.RB ( rmdir (2))
+in the mount namespace in which it is not a mount point
+(subject to the usual permission checks).
+Consequently, the mount point is removed in the mount namespace
+where it was a mount point.
+.IP
+Previously (before Linux 3.18),
+.\" mtk: The change was in Linux 3.18, I think, with this commit:
+.\" commit 8ed936b5671bfb33d89bc60bdcc7cf0470ba52fe
+.\" Author: Eric W. Biederman <ebiederman@twitter.com>
+.\" Date: Tue Oct 1 18:33:48 2013 -0700
+.\"
+.\" vfs: Lazily remove mounts on unlinked files and directories.
+attempting to unlink, rename, or remove a file or directory
+that was a mount point in another mount namespace would result in the error
+.BR EBUSY .
+That behavior had technical problems of enforcement (e.g., for NFS)
+and permitted denial-of-service attacks against more privileged users
+(i.e., preventing individual files from being updated
+by bind mounting on top of them).
+.SH EXAMPLES
+See
+.BR pivot_root (2).
+.SH SEE ALSO
+.BR unshare (1),
+.BR clone (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR pivot_root (2),
+.BR setns (2),
+.BR umount (2),
+.BR unshare (2),
+.BR proc (5),
+.BR namespaces (7),
+.BR user_namespaces (7),
+.BR findmnt (8),
+.BR mount (8),
+.BR pam_namespace (8),
+.BR pivot_root (8),
+.BR umount (8)
+.PP
+.I Documentation/filesystems/sharedsubtree.rst
+in the kernel source tree.