.\" Copyright (C) 2019 Aleksa Sarai <cyphar@cyphar.com>
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.TH openat2 2 2023-02-05 "Linux man-pages 6.03"
.SH NAME
openat2 \- open and possibly create a file (extended)
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.BR "#include <fcntl.h>" \
"          /* Definition of " O_* " and " S_* " constants */"
.BR "#include <linux/openat2.h>" "  /* Definition of " RESOLVE_* " constants */"
.BR "#include <sys/syscall.h>" "    /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.PP
.BI "long syscall(SYS_openat2, int " dirfd ", const char *" pathname ,
.BI "             struct open_how *" how ", size_t " size );
.fi
.PP
.IR Note :
glibc provides no wrapper for
.BR openat2 (),
necessitating the use of
.BR syscall (2).
.SH DESCRIPTION
The
.BR openat2 ()
system call is an extension of
.BR openat (2)
and provides a superset of its functionality.
.PP
The
.BR openat2 ()
system call opens the file specified by
.IR pathname .
If the specified file does not exist, it may optionally (if
.B O_CREAT
is specified in
.IR how.flags )
be created.
.PP
As with
.BR openat (2),
if
.I pathname
is a relative pathname, then it is interpreted relative to the
directory referred to by the file descriptor
.I dirfd
(or the current working directory of the calling process, if
.I dirfd
is the special value
.BR AT_FDCWD ).
If
.I pathname
is an absolute pathname, then
.I dirfd
is ignored (unless
.I how.resolve
contains
.BR RESOLVE_IN_ROOT ,
in which case
.I pathname
is resolved relative to
.IR dirfd ).
.PP
Rather than taking a single
.I flags
argument, an extensible structure (\fIhow\fP) is passed to allow for
future extensions.
The
.I size
argument must be specified as
.IR "sizeof(struct open_how)" .
.\"
.SS The open_how structure
The
.I how
argument specifies how
.I pathname
should be opened, and acts as a superset of the
.I flags
and
.I mode
arguments to
.BR openat (2).
This argument is a pointer to an
.I open_how
structure,
described in
.BR open_how (2type).
.PP
Any future extensions to
.BR openat2 ()
will be implemented as new fields appended to the
.I open_how
structure,
with a zero value in a new field resulting in the kernel behaving
as though that extension field were not present.
Therefore, the caller
.I must
zero-fill this structure on
initialization.
(See the "Extensibility" section of the
.B NOTES
for more detail on why this is necessary.)
.PP
The fields of the
.I open_how
structure are as follows:
.TP
.I flags
This field specifies
the file creation and file status flags to use when opening the file.
All of the
.B O_*
flags defined for
.BR openat (2)
are valid
.BR openat2 ()
flag values.
.IP
Whereas
.BR openat (2)
ignores unknown bits in its
.I flags
argument,
.BR openat2 ()
returns an error if unknown or conflicting flags are specified in
.IR how.flags .
.TP
.I mode
This field specifies the
mode for the new file, with identical semantics to the
.I mode
argument of
.BR openat (2).
.IP
Whereas
.BR openat (2)
ignores bits other than those in the range
.I 07777
in its
.I mode
argument,
.BR openat2 ()
returns an error if
.I how.mode
contains bits other than
.IR 07777 .
Similarly, an error is returned if
.BR openat2 ()
is called with a nonzero
.I how.mode
and
.I how.flags
does not contain
.B O_CREAT
or
.BR O_TMPFILE .
.TP
.I resolve
This is a bit-mask of flags that modify the way in which
.B all
components of
.I pathname
will be resolved.
(See
.BR path_resolution (7)
for background information.)
.IP
The primary use case for these flags is to allow trusted programs to restrict
how untrusted paths (or paths inside untrusted directories) are resolved.
The full list of
.I resolve
flags is as follows:
.RS
.TP
.B RESOLVE_BENEATH
.\" commit adb21d2b526f7f196b2f3fdca97d80ba05dd14a0
Do not permit the path resolution to succeed if any component of the resolution
is not a descendant of the directory indicated by
.IR dirfd .
This causes absolute symbolic links (and absolute values of
.IR pathname )
to be rejected.
.IP
Currently, this flag also disables magic-link resolution (see below).
However, this may change in the future.
Therefore, to ensure that magic links are not resolved,
the caller should explicitly specify
.BR RESOLVE_NO_MAGICLINKS .
.TP
.B RESOLVE_IN_ROOT
.\" commit 8db52c7e7ee1bd861b6096fcafc0fe7d0f24a994
Treat the directory referred to by
.I dirfd
as the root directory while resolving
.IR pathname .
Absolute symbolic links are interpreted relative to
.IR dirfd .
If a prefix component of
.I pathname
equates to
.IR dirfd ,
then an immediately following
.I ..\&
component likewise equates to
.I dirfd
(just as
.I /..\&
is traditionally equivalent to
.IR / ).
If
.I pathname
is an absolute path, it is also interpreted relative to
.IR dirfd .
.IP
The effect of this flag is as though the calling process had used
.BR chroot (2)
to (temporarily) modify its root directory (to the directory
referred to by
.IR dirfd ).
However, unlike
.BR chroot (2)
(which changes the filesystem root permanently for a process),
.B RESOLVE_IN_ROOT
allows a program to efficiently restrict path resolution on a per-open basis.
.IP
Currently, this flag also disables magic-link resolution.
However, this may change in the future.
Therefore, to ensure that magic links are not resolved,
the caller should explicitly specify
.BR RESOLVE_NO_MAGICLINKS .
.TP
.B RESOLVE_NO_MAGICLINKS
.\" commit 278121417a72d87fb29dd8c48801f80821e8f75a
Disallow all magic-link resolution during path resolution.
.IP
Magic links are symbolic link-like objects that are most notably found in
.BR proc (5);
examples include
.IR /proc/ pid /exe
and
.IR /proc/ pid /fd/* .
(See
.BR symlink (7)
for more details.)
.IP
Unknowingly opening magic links can be risky for some applications.
Examples of such risks include the following:
.RS
.IP \[bu] 3
If the process opening a pathname is a controlling process that
currently has no controlling terminal (see
.BR credentials (7)),
then opening a magic link inside
.IR /proc/ pid /fd
that happens to refer to a terminal
would cause the process to acquire a controlling terminal.
.IP \[bu]
.\" From https://lwn.net/Articles/796868/:
.\"     The presence of this flag will prevent a path lookup operation
.\"     from traversing through one of these magic links, thus blocking
.\"     (for example) attempts to escape from a container via a /proc
.\"     entry for an open file descriptor.
In a containerized environment,
a magic link inside
.I /proc
may refer to an object outside the container,
and thus may provide a means to escape from the container.
.RE
.IP
Because of such risks,
an application may prefer to disable magic link resolution using the
.B RESOLVE_NO_MAGICLINKS
flag.
.IP
If the trailing component (i.e., basename) of
.I pathname
is a magic link,
.I how.resolve
contains
.BR RESOLVE_NO_MAGICLINKS ,
and
.I how.flags
contains both
.B O_PATH
and
.BR O_NOFOLLOW ,
then an
.B O_PATH
file descriptor referencing the magic link will be returned.
.TP
.B RESOLVE_NO_SYMLINKS
.\" commit 278121417a72d87fb29dd8c48801f80821e8f75a
Disallow resolution of symbolic links during path resolution.
This option implies
.BR RESOLVE_NO_MAGICLINKS .
.IP
If the trailing component (i.e., basename) of
.I pathname
is a symbolic link,
.I how.resolve
contains
.BR RESOLVE_NO_SYMLINKS ,
and
.I how.flags
contains both
.B O_PATH
and
.BR O_NOFOLLOW ,
then an
.B O_PATH
file descriptor referencing the symbolic link will be returned.
.IP
Note that the effect of the
.B RESOLVE_NO_SYMLINKS
flag,
which affects the treatment of symbolic links in all of the components of
.IR pathname ,
differs from the effect of the
.B O_NOFOLLOW
file creation flag (in
.IR how.flags ),
which affects the handling of symbolic links only in the final component of
.IR pathname .
.IP
Applications that employ the
.B RESOLVE_NO_SYMLINKS
flag are encouraged to make its use configurable
(unless it is used for a specific security purpose),
as symbolic links are very widely used by end-users.
Setting this flag indiscriminately\[em]i.e.,
for purposes not specifically related to security\[em]for all uses of
.BR openat2 ()
may result in spurious errors on previously functional systems.
This may occur if, for example,
a system pathname that is used by an application is modified
(e.g., in a new distribution release)
so that a pathname component (now) contains a symbolic link.
.TP
.B RESOLVE_NO_XDEV
.\" commit 72ba29297e1439efaa54d9125b866ae9d15df339
Disallow traversal of mount points during path resolution (including all bind
mounts).
Consequently,
.I pathname
must either be on the same mount as the directory referred to by
.IR dirfd ,
or on the same mount as the current working directory if
.I dirfd
is specified as
.BR AT_FDCWD .
.IP
Applications that employ the
.B RESOLVE_NO_XDEV
flag are encouraged to make its use configurable (unless it is
used for a specific security purpose),
as bind mounts are widely used by end-users.
Setting this flag indiscriminately\[em]i.e.,
for purposes not specifically related to security\[em]for all uses of
.BR openat2 ()
may result in spurious errors on previously functional systems.
This may occur if, for example,
a system pathname that is used by an application is modified
(e.g., in a new distribution release)
so that a pathname component (now) contains a bind mount.
.TP
.B RESOLVE_CACHED
Make the open operation fail unless all path components are already present
in the kernel's lookup cache.
If any kind of revalidation or I/O is needed to satisfy the lookup,
.BR openat2 ()
fails with the error
.BR EAGAIN .
This is useful in providing a fast-path open that can be performed without
resorting to thread offload, or other mechanisms that an application might
use to offload slower operations.
.RE
.IP
If any bits other than those listed above are set in
.IR how.resolve ,
an error is returned.
.SH RETURN VALUE
On success, a new file descriptor is returned.
On error, \-1 is returned, and
.I errno
is set to indicate the error.
.SH ERRORS
The set of errors returned by
.BR openat2 ()
includes all of the errors returned by
.BR openat (2),
as well as the following additional errors:
.TP
.B E2BIG
An extension that this kernel does not support was specified in
.IR how .
(See the "Extensibility" section of
.B NOTES
for more detail on how extensions are handled.)
.TP
.B EAGAIN
.I how.resolve
contains either
.B RESOLVE_IN_ROOT
or
.BR RESOLVE_BENEATH ,
and the kernel could not ensure that a ".." component didn't escape (due to a
race condition or potential attack).
The caller may choose to retry the
.BR openat2 ()
call.
.TP
.B EAGAIN
.B RESOLVE_CACHED
was set, and the open operation cannot be performed using only cached
information.
The caller should retry without
.B RESOLVE_CACHED
set in
.IR how.resolve .
.TP
.B EINVAL
An unknown flag or invalid value was specified in
.IR how .
.TP
.B EINVAL
.I how.mode
is nonzero, but
.I how.flags
does not contain
.B O_CREAT
or
.BR O_TMPFILE .
.TP
.B EINVAL
.I size
was smaller than any known version of
.IR "struct open_how" .
.TP
.B ELOOP
.I how.resolve
contains
.BR RESOLVE_NO_SYMLINKS ,
and one of the path components was a symbolic link (or magic link).
.TP
.B ELOOP
.I how.resolve
contains
.BR RESOLVE_NO_MAGICLINKS ,
and one of the path components was a magic link.
.TP
.B EXDEV
.I how.resolve
contains either
.B RESOLVE_IN_ROOT
or
.BR RESOLVE_BENEATH ,
and an escape from the root during path resolution was detected.
.TP
.B EXDEV
.I how.resolve
contains
.BR RESOLVE_NO_XDEV ,
and a path component crosses a mount point.
.SH VERSIONS
.BR openat2 ()
first appeared in Linux 5.6.
.\" commit fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179
.SH STANDARDS
This system call is Linux-specific.
.PP
The semantics of
.B RESOLVE_BENEATH
were modeled after FreeBSD's
.BR O_BENEATH .
.SH NOTES
.SS Extensibility
In order to allow for future extensibility,
.BR openat2 ()
requires the user-space application to specify the size of the
.I open_how
structure that it is passing.
By providing this information, it is possible for
.BR openat2 ()
to provide both forwards- and backwards-compatibility, with
.I size
acting as an implicit version number.
(Because new extension fields will always
be appended, the structure size will always increase.)
This extensibility design is very similar to other system calls such as
.BR sched_setattr (2),
.BR perf_event_open (2),
and
.BR clone3 (2).
.PP
If we let
.I usize
be the size of the structure as specified by the user-space application, and
.I ksize
be the size of the structure which the kernel supports, then there are
three cases to consider:
.IP \[bu] 3
If
.I ksize
equals
.IR usize ,
then there is no version mismatch and
.I how
can be used verbatim.
.IP \[bu]
If
.I ksize
is larger than
.IR usize ,
then there are some extension fields that the kernel supports
which the user-space application
is unaware of.
Because a zero value in any added extension field signifies a no-op,
the kernel
treats all of the extension fields not provided by the user-space application
as having zero values.
This provides backwards-compatibility.
.IP \[bu]
If
.I ksize
is smaller than
.IR usize ,
then there are some extension fields which the user-space application
is aware of but which the kernel does not support.
Because any extension field must have its zero values signify a no-op,
the kernel can
safely ignore the unsupported extension fields if they are all-zero.
If any unsupported extension fields are nonzero, then \-1 is returned and
.I errno
is set to
.BR E2BIG .
This provides forwards-compatibility.
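.PP
The three cases can be modeled in user space as follows.
This is only an illustrative sketch of the rules described above,
not the kernel's code;
the function name
.I model_copy_struct
and its buffer arguments are hypothetical.
.PP
.in +4n
.EX
```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space model (not kernel code) of how a structure of usize
 * bytes is accepted by a kernel that knows ksize bytes.
 * Returns 0 and fills kbuf on success, or E2BIG on failure.
 */
static int model_copy_struct(void *kbuf, size_t ksize,
                             const void *ubuf, size_t usize)
{
    const unsigned char *u = ubuf;
    size_t common = usize < ksize ? usize : ksize;

    /* usize > ksize: unknown trailing fields must be all-zero. */
    for (size_t i = ksize; i < usize; i++)
        if (u[i] != 0)
            return E2BIG;

    /* usize < ksize: fields the caller did not supply read as zero. */
    memset(kbuf, 0, ksize);
    memcpy(kbuf, u, common);
    return 0;
}
```
.EE
.in
.PP
When
.I usize
equals
.IR ksize ,
the loop does not run and the copy is verbatim,
which corresponds to the first case.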
.PP
Because the definition of
.I struct open_how
may change in the future (with new fields being added when system headers are
updated), user-space applications should zero-fill
.I struct open_how
to ensure that recompiling the program with new headers will not result in
spurious errors at runtime.
The simplest way is to use a designated
initializer:
.PP
.in +4n
.EX
struct open_how how = { .flags = O_RDWR,
                        .resolve = RESOLVE_IN_ROOT };
.EE
.in
.PP
or explicitly using
.BR memset (3)
or similar:
.PP
.in +4n
.EX
struct open_how how;
memset(&how, 0, sizeof(how));
how.flags = O_RDWR;
how.resolve = RESOLVE_IN_ROOT;
.EE
.in
.PP
A user-space application that wishes to determine which extensions
the running kernel supports can do so by conducting a binary search on
.I size
with a structure which has every byte nonzero (to find the largest value
which doesn't produce an error of
.BR E2BIG ).
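.PP
Such a search might be sketched as follows.
All names here are hypothetical;
.I model_too_big
is a stand-in for a kernel whose structure is 24 bytes long,
so that the search can be exercised without a system call,
and the 4096-byte probe buffer is an assumption about a sensible upper bound.
.PP
.in +4n
.EX
```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe type: returns nonzero if size draws E2BIG. */
typedef int (*too_big_fn)(size_t size);

/*
 * Largest size in [lo, hi] for which probe() reports no E2BIG,
 * assuming lo is accepted and acceptance is monotonic in size.
 */
static size_t largest_accepted_size(too_big_fn probe, size_t lo, size_t hi)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;    /* round up */

        if (probe(mid))
            hi = mid - 1;
        else
            lo = mid;
    }
    return lo;
}

/* Stand-in probe: models a kernel that knows a 24-byte structure. */
static int model_too_big(size_t size)
{
    return size > 24;
}

/* Probe the running kernel with a structure of all-nonzero bytes. */
static int openat2_too_big(size_t size)
{
    unsigned char buf[4096];

    if (size > sizeof(buf))
        return 1;
    memset(buf, 0xff, sizeof(buf));
    /* The call is expected to fail; only the E2BIG case matters. */
    return syscall(SYS_openat2, -1, "", buf, size) == -1
           && errno == E2BIG;
}
```
.EE
.in
.PP
A caller would pass
.I openat2_too_big
as the probe on a live system.
Note that on a kernel without
.BR openat2 ()
every probe fails with
.B ENOSYS
rather than
.BR E2BIG ,
so the presence of the system call should be checked separately.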
.SH SEE ALSO
.BR openat (2),
.BR open_how (2type),
.BR path_resolution (7),
.BR symlink (7)