.\" Copyright (C) 2003 Andries Brouwer (aeb@cwi.nl)
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.TH path_resolution 7 2024-05-02 "Linux man-pages 6.8"
.SH NAME
path_resolution \- how a pathname is resolved to a file
.SH DESCRIPTION
Some UNIX/Linux system calls take one or more filenames as parameters.
A filename (or pathname) is resolved as follows.
.SS Step 1: start of the resolution process
If the pathname starts with the \[aq]/\[aq] character, the starting lookup
directory is the root directory of the calling process.
A process inherits its root directory from its parent.
Usually this will be the root directory of the file hierarchy.
A process may get a different root directory by use of the
.BR chroot (2)
system call, or may temporarily use a different root directory by using
.BR openat2 (2)
with the
.B RESOLVE_IN_ROOT
flag set.
.P
A process may get an entirely private mount namespace if
it, or one of its ancestors, was started by an invocation of the
.BR clone (2)
system call that had the
.B CLONE_NEWNS
flag set.
This completes the handling of the leading \[aq]/\[aq] of the pathname.
.P
If the pathname does not start with the \[aq]/\[aq] character, the starting
lookup directory of the resolution process is the current working directory of
the process \[em] or in the case of
.BR openat (2)-style
system calls, the
.I dfd
argument (or the current working directory if
.B AT_FDCWD
is passed as the
.I dfd
argument).
The current working directory is inherited from the parent, and can
be changed by use of the
.BR chdir (2)
system call.
.P
Pathnames starting with a \[aq]/\[aq] character are called absolute pathnames.
Pathnames not starting with a \[aq]/\[aq] are called relative pathnames.
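.P
As an illustration only, the distinction can be expressed as a one-line test.
The helper name
.I path_is_absolute
is hypothetical and is not part of any library.
.P
.in +4n
.EX
```c
#include <stdbool.h>

/* Hypothetical helper: true if resolution starts at the root
   directory of the calling process, false if it starts at the
   current working directory (or at dfd for openat(2)-style calls). */
bool
path_is_absolute(const char *pathname)
{
    return pathname[0] == '/';
}
```
.EE
.in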
.SS Step 2: walk along the path
Set the current lookup directory to the starting lookup directory.
Now, for each nonfinal component of the pathname, where a component
is a substring delimited by \[aq]/\[aq] characters, this component is looked up
in the current lookup directory.
.P
If the process does not have search permission on
the current lookup directory,
an
.B EACCES
error is returned ("Permission denied").
.P
If the component is not found, an
.B ENOENT
error is returned
("No such file or directory").
.P
If the component is found, but is neither a directory nor a symbolic link,
an
.B ENOTDIR
error is returned ("Not a directory").
.P
If the component is found and is a directory, we set the
current lookup directory to that directory, and go to the
next component.
.P
If the component is found and is a symbolic link,
we first resolve this symbolic link
(with the current lookup directory
as starting lookup directory).
Upon error, that error is returned.
If the result is not a directory, an
.B ENOTDIR
error is returned.
If the resolution of the symbolic link is successful and returns a directory,
we set the current lookup directory to that directory, and go to
the next component.
Note that the resolution process here can involve recursion if the
prefix ('dirname') component of a pathname contains a filename
that is a symbolic link that resolves to a directory (where the
prefix component of that directory may contain a symbolic link, and so on).
In order to protect the kernel against stack overflow, and also
to protect against denial of service, there are limits on the
maximum recursion depth, and on the maximum number of symbolic links
followed.
An
.B ELOOP
error is returned when the maximum is
exceeded ("Too many levels of symbolic links").
.P
.\"
.\" presently: max recursion depth during symlink resolution: 5
.\" max total number of symbolic links followed: 40
.\" _POSIX_SYMLOOP_MAX is 8
As currently implemented on Linux, the maximum number
.\" MAXSYMLINKS is 40
of symbolic links that will be followed while resolving a pathname is 40.
Before Linux 2.6.18, the limit on the recursion depth was 5.
Starting with Linux 2.6.18, this limit
.\" MAX_NESTED_LINKS
was raised to 8.
In Linux 4.2,
.\" commit 894bc8c4662ba9daceafe943a5ba0dd407da5cd3
the kernel's pathname-resolution code
was reworked to eliminate the use of recursion,
so that the only limit that remains is the maximum of 40
resolutions for the entire pathname.
.P
The resolution of symbolic links during this stage can be blocked by using
.BR openat2 (2),
with the
.B RESOLVE_NO_SYMLINKS
flag set.
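.P
The notion of a component used in this step can be sketched as follows.
This is a minimal user-space illustration, not the kernel's implementation;
the function name
.I count_components
is hypothetical,
and treating a run of consecutive slashes as a single delimiter
is an assumption that the text above does not spell out.
.P
.in +4n
.EX
```c
#include <stddef.h>
#include <string.h>

/* Count the components of a pathname, where a component is a
   nonempty substring delimited by '/' characters.
   Assumption: a run of slashes acts as one delimiter. */
size_t
count_components(const char *pathname)
{
    size_t n = 0;
    const char *p = pathname;

    for (;;) {
        p += strspn(p, "/");        /* skip delimiters */
        if (*p == 0)
            break;                  /* end of pathname */
        p += strcspn(p, "/");       /* skip one component */
        n++;
    }
    return n;
}
```
.EE
.in
.P
With this sketch, the pathname "/usr//lib/" has two components,
and both "/" and the empty string have none.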
.SS Step 3: find the final entry
The lookup of the final component of the pathname goes just like
that of all other components, as described in the previous step,
with two differences: (i) the final component need not be a
directory (at least as far as the path resolution process is
concerned\[em]it may have to be a directory, or a nondirectory, because of
the requirements of the specific system call), and (ii) it
is not necessarily an error if the component is not found\[em]maybe
we are just creating it.
The details on the treatment
of the final entry are described in the manual pages of the specific
system calls.
.SS "\&. and .."
By convention, every directory has the entries "." and "..",
which refer to the directory itself and to its parent directory,
respectively.
.P
The path resolution process will assume that these entries have
their conventional meanings, regardless of whether they are
actually present in the physical filesystem.
.P
One cannot walk up past the root: "/.." is the same as "/".
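.P
The last rule can be checked from user space by comparing
device and inode numbers.
The following is a sketch only; the helper name
.I same_file
is hypothetical.
.P
.in +4n
.EX
```c
#include <stdbool.h>
#include <sys/stat.h>

/* Return true if both pathnames resolve to the same file, judged
   by device and inode number; false otherwise or on error. */
bool
same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```
.EE
.in
.P
Given the rule above, a call such as
.I same_file("/", "/..")
is expected to return true.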
.SS Mount points
After a
.I mount dev path
command, the pathname "path" refers to
the root of the filesystem hierarchy on the device "dev", and no
longer to whatever it referred to earlier.
.P
One can walk out of a mounted filesystem: "path/.." refers to
the parent directory of "path",
outside of the filesystem hierarchy on "dev".
.P
Traversal of mount points can be blocked by using
.BR openat2 (2),
with the
.B RESOLVE_NO_XDEV
flag set (though note that this also restricts bind mount traversal).
.SS Trailing slashes
If a pathname ends in a \[aq]/\[aq], that forces resolution of the preceding
component as in Step 2:
the component preceding the slash either exists and resolves to a directory
or it names a directory that is to be created
immediately after the pathname is resolved.
Otherwise, a trailing \[aq]/\[aq] is ignored.
.SS Final symbolic link
If the last component of a pathname is a symbolic link, then it
depends on the system call whether the file referred to will be
the symbolic link or the result of path resolution on its contents.
For example, the system call
.BR lstat (2)
will operate on the symbolic link,
while
.BR stat (2)
operates on the file pointed to by the symbolic link.
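.P
A program can use this difference to find out whether the final
component is itself a symbolic link.
The sketch below assumes nothing beyond
.BR lstat (2);
the helper name
.I final_is_symlink
is hypothetical.
.P
.in +4n
.EX
```c
#include <stdbool.h>
#include <sys/stat.h>

/* Return true if the final component of pathname is itself a
   symbolic link.  lstat(2) does not follow a final symbolic
   link, so S_ISLNK() can be true for its result. */
bool
final_is_symlink(const char *pathname)
{
    struct stat sb;

    return lstat(pathname, &sb) == 0 && S_ISLNK(sb.st_mode);
}
```
.EE
.in
.P
Replacing
.BR lstat (2)
with
.BR stat (2)
in this sketch would make it always return false,
because the link would be followed before the file type is examined.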
.SS Length limit
There is a maximum length for pathnames.
If the pathname (or some
intermediate pathname obtained while resolving symbolic links)
is too long, an
.B ENAMETOOLONG
error is returned ("File name too long").
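.P
The error can be provoked deliberately, as in the following sketch.
The helper name
.I overlong_path_errno
is hypothetical, and the length of 8192 bytes is an assumption chosen
to exceed the Linux pathname limit of 4096 bytes.
.P
.in +4n
.EX
```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

#define LONG_LEN 8192   /* assumed to exceed the pathname limit */

/* Try to resolve a relative pathname made of LONG_LEN 'a'
   characters and return the resulting errno value
   (0 if stat(2) unexpectedly succeeds). */
int
overlong_path_errno(void)
{
    static char buf[LONG_LEN + 1];
    struct stat sb;

    memset(buf, 'a', LONG_LEN);
    buf[LONG_LEN] = 0;
    if (stat(buf, &sb) == 0)
        return 0;
    return errno;
}
```
.EE
.in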
.SS Empty pathname
In the original UNIX, the empty pathname referred to the current directory.
Nowadays POSIX decrees that an empty pathname must not be resolved
successfully.
Linux returns
.B ENOENT
in this case.
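.P
A short sketch of this behavior follows; the helper name
.I empty_path_errno
is hypothetical.
.P
.in +4n
.EX
```c
#include <errno.h>
#include <sys/stat.h>

/* Try to resolve the empty pathname and return the resulting
   errno value (0 if stat(2) unexpectedly succeeds). */
int
empty_path_errno(void)
{
    struct stat sb;

    if (stat("", &sb) == 0)
        return 0;
    return errno;
}
```
.EE
.in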
.SS Permissions
The permission bits of a file consist of three groups of three bits; see
.BR chmod (1)
and
.BR stat (2).
The first group of three is used when the effective user ID of
the calling process equals the owner ID of the file.
The second group
of three is used when the group ID of the file either equals the
effective group ID of the calling process, or is one of the
supplementary group IDs of the calling process (as set by
.BR setgroups (2)).
When neither holds, the third group is used.
.P
Of the three bits used, the first bit determines read permission,
the second write permission, and the last execute permission
in case of ordinary files, or search permission in case of directories.
.P
Linux uses the fsuid instead of the effective user ID in permission checks.
Ordinarily the fsuid will equal the effective user ID, but the fsuid can be
changed by the system call
.BR setfsuid (2).
.P
(Here "fsuid" stands for something like "filesystem user ID".
The concept was required for the implementation of a user space
NFS server at a time when processes could send a signal to a process
with the same effective user ID.
It is obsolete now.
Nobody should use
.BR setfsuid (2).)
.P
Similarly, Linux uses the fsgid ("filesystem group ID")
instead of the effective group ID.
See
.BR setfsgid (2).
.\" FIXME . say something about filesystem mounted read-only ?
.SS Bypassing permission checks: superuser and capabilities
On a traditional UNIX system, the superuser
.RI ( root ,
user ID 0) is all-powerful, and bypasses all permissions restrictions
when accessing files.
.\" (but for exec at least one x bit must be set) -- AEB
.\" but there is variation across systems on this point: for
.\" example, HP-UX and Tru64 are as described by AEB. However,
.\" on some implementations (e.g., Solaris, FreeBSD),
.\" access(X_OK) by superuser will report success, regardless
.\" of the file's execute permission bits. -- MTK (Oct 05)
.P
On Linux, superuser privileges are divided into capabilities (see
.BR capabilities (7)).
Two capabilities are relevant for file permissions checks:
.B CAP_DAC_OVERRIDE
and
.BR CAP_DAC_READ_SEARCH .
(A process has these capabilities if its fsuid is 0.)
.P
The
.B CAP_DAC_OVERRIDE
capability overrides all permission checking,
but grants execute permission only when at least one
of the file's three execute permission bits is set.
.P
The
.B CAP_DAC_READ_SEARCH
capability grants read and search permission
on directories, and read permission on ordinary files.
.\" FIXME . say something about immutable files
.\" FIXME . say something about ACLs
.SH SEE ALSO
.BR readlink (2),
.BR capabilities (7),
.BR credentials (7),
.BR symlink (7)