author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
commit | fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch) | |
tree | ce1e3bce06471410239a6f41282e328770aa404a /upstream/debian-bookworm/man2/open.2 | |
parent | Initial commit. (diff) | |
Adding upstream version 4.22.0. (upstream/4.22.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/debian-bookworm/man2/open.2')
-rw-r--r-- | upstream/debian-bookworm/man2/open.2 | 1939 |
1 files changed, 1939 insertions, 0 deletions
diff --git a/upstream/debian-bookworm/man2/open.2 b/upstream/debian-bookworm/man2/open.2 new file mode 100644 index 00000000..19115a37 --- /dev/null +++ b/upstream/debian-bookworm/man2/open.2 @@ -0,0 +1,1939 @@ +.\" This manpage is Copyright (C) 1992 Drew Eckhardt; +.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson. +.\" and Copyright (C) 2008 Greg Banks +.\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com> +.\" +.\" SPDX-License-Identifier: Linux-man-pages-copyleft +.\" +.\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu> +.\" Modified 1994-08-21 by Michael Haardt +.\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl> +.\" Modified 1996-05-13 by Thomas Koenig +.\" Modified 1996-12-20 by Michael Haardt +.\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl> +.\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk> +.\" Modified 1999-06-03 by Michael Haardt +.\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com> +.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com> +.\" 2004-12-08, mtk, reordered flags list alphabetically +.\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME +.\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits +.\" 2008-01-03, mtk, with input from Trond Myklebust +.\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi> +.\" Rewrite description of O_EXCL. +.\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail +.\" on O_DIRECT. +.\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode +.\" +.\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and +.\" O_TTYINIT. Eventually these may need to be documented. 
--mtk +.\" +.TH open 2 2023-02-05 "Linux man-pages 6.03" +.SH NAME +open, openat, creat \- open and possibly create a file +.SH LIBRARY +Standard C library +.RI ( libc ", " \-lc ) +.SH SYNOPSIS +.nf +.B #include <fcntl.h> +.PP +.BI "int open(const char *" pathname ", int " flags ); +.BI "int open(const char *" pathname ", int " flags ", mode_t " mode ); +.PP +.BI "int creat(const char *" pathname ", mode_t " mode ); +.PP +.BI "int openat(int " dirfd ", const char *" pathname ", int " flags ); +.BI "int openat(int " dirfd ", const char *" pathname ", int " flags \ +", mode_t " mode ); +.PP +/* Documented separately, in \fBopenat2\fP(2): */ +.BI "int openat2(int " dirfd ", const char *" pathname , +.BI " const struct open_how *" how ", size_t " size ");" +.fi +.PP +.RS -4 +Feature Test Macro Requirements for glibc (see +.BR feature_test_macros (7)): +.RE +.PP +.BR openat (): +.nf + Since glibc 2.10: + _POSIX_C_SOURCE >= 200809L + Before glibc 2.10: + _ATFILE_SOURCE +.fi +.SH DESCRIPTION +The +.BR open () +system call opens the file specified by +.IR pathname . +If the specified file does not exist, +it may optionally (if +.B O_CREAT +is specified in +.IR flags ) +be created by +.BR open (). +.PP +The return value of +.BR open () +is a file descriptor, a small, nonnegative integer that is an index +to an entry in the process's table of open file descriptors. +The file descriptor is used +in subsequent system calls +.RB ( read "(2), " write "(2), " lseek "(2), " fcntl (2), +etc.) to refer to the open file. +The file descriptor returned by a successful call will be +the lowest-numbered file descriptor not currently open for the process. +.PP +By default, the new file descriptor is set to remain open across an +.BR execve (2) +(i.e., the +.B FD_CLOEXEC +file descriptor flag described in +.BR fcntl (2) +is initially disabled); the +.B O_CLOEXEC +flag, described below, can be used to change this default. 
+The file offset is set to the beginning of the file (see +.BR lseek (2)). +.PP +A call to +.BR open () +creates a new +.IR "open file description" , +an entry in the system-wide table of open files. +The open file description records the file offset and the file status flags +(see below). +A file descriptor is a reference to an open file description; +this reference is unaffected if +.I pathname +is subsequently removed or modified to refer to a different file. +For further details on open file descriptions, see NOTES. +.PP +The argument +.I flags +must include one of the following +.IR "access modes" : +.BR O_RDONLY ", " O_WRONLY ", or " O_RDWR . +These request opening the file read-only, write-only, or read/write, +respectively. +.PP +In addition, zero or more file creation flags and file status flags +can be +.RI bitwise- or 'd +in +.IR flags . +The +.I file creation flags +are +.BR O_CLOEXEC , +.BR O_CREAT , +.BR O_DIRECTORY , +.BR O_EXCL , +.BR O_NOCTTY , +.BR O_NOFOLLOW , +.BR O_TMPFILE , +and +.BR O_TRUNC . +The +.I file status flags +are all of the remaining flags listed below. +.\" SUSv4 divides the flags into: +.\" * Access mode +.\" * File creation +.\" * File status +.\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW) +.\" though it's not clear what the difference between "other" and +.\" "File creation" flags is. I raised an Aardvark to see if this +.\" can be clarified in SUSv4; 10 Oct 2008. +.\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67 +.\" TC1 (balloted in 2013), resolved this, so that those three constants +.\" are also categorized" as file status flags. +.\" +The distinction between these two groups of flags is that +the file creation flags affect the semantics of the open operation itself, +while the file status flags affect the semantics of subsequent I/O operations. +The file status flags can be retrieved and (in some cases) +modified; see +.BR fcntl (2) +for details. 
+.PP +The full list of file creation flags and file status flags is as follows: +.TP +.B O_APPEND +The file is opened in append mode. +Before each +.BR write (2), +the file offset is positioned at the end of the file, +as if with +.BR lseek (2). +The modification of the file offset and the write operation +are performed as a single atomic step. +.IP +.B O_APPEND +may lead to corrupted files on NFS filesystems if more than one process +appends data to a file at once. +.\" For more background, see +.\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946 +.\" http://nfs.sourceforge.net/ +This is because NFS does not support +appending to a file, so the client kernel has to simulate it, which +can't be done without a race condition. +.TP +.B O_ASYNC +Enable signal-driven I/O: +generate a signal +.RB ( SIGIO +by default, but this can be changed via +.BR fcntl (2)) +when input or output becomes possible on this file descriptor. +This feature is available only for terminals, pseudoterminals, +sockets, and (since Linux 2.6) pipes and FIFOs. +See +.BR fcntl (2) +for further details. +See also BUGS, below. +.TP +.BR O_CLOEXEC " (since Linux 2.6.23)" +.\" NOTE! several other man pages refer to this text +Enable the close-on-exec flag for the new file descriptor. +.\" FIXME . for later review when Issue 8 is one day released... +.\" POSIX proposes to fix many APIs that provide hidden FDs +.\" http://austingroupbugs.net/tag_view_page.php?tag_id=8 +.\" http://austingroupbugs.net/view.php?id=368 +Specifying this flag permits a program to avoid additional +.BR fcntl (2) +.B F_SETFD +operations to set the +.B FD_CLOEXEC +flag. 
+.IP +Note that the use of this flag is essential in some multithreaded programs, +because using a separate +.BR fcntl (2) +.B F_SETFD +operation to set the +.B FD_CLOEXEC +flag does not suffice to avoid race conditions +where one thread opens a file descriptor and +attempts to set its close-on-exec flag using +.BR fcntl (2) +at the same time as another thread does a +.BR fork (2) +plus +.BR execve (2). +Depending on the order of execution, +the race may lead to the file descriptor returned by +.BR open () +being unintentionally leaked to the program executed by the child process +created by +.BR fork (2). +(This kind of race is in principle possible for any system call +that creates a file descriptor whose close-on-exec flag should be set, +and various other Linux system calls provide an equivalent of the +.B O_CLOEXEC +flag to deal with this problem.) +.\" This flag fixes only one form of the race condition; +.\" The race can also occur with, for example, file descriptors +.\" returned by accept(), pipe(), etc. +.TP +.B O_CREAT +If +.I pathname +does not exist, create it as a regular file. +.IP +The owner (user ID) of the new file is set to the effective user ID +of the process. +.IP +The group ownership (group ID) of the new file is set either to +the effective group ID of the process (System V semantics) +or to the group ID of the parent directory (BSD semantics). +On Linux, the behavior depends on whether the +set-group-ID mode bit is set on the parent directory: +if that bit is set, then BSD semantics apply; +otherwise, System V semantics apply. +For some filesystems, the behavior also depends on the +.I bsdgroups +and +.I sysvgroups +mount options described in +.BR mount (8). +.\" As at Linux 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and +.\" XFS (since Linux 2.6.14). +.IP +The +.I mode +argument specifies the file mode bits to be applied when a new file is created. 
+If neither +.B O_CREAT +nor +.B O_TMPFILE +is specified in +.IR flags , +then +.I mode +is ignored (and can thus be specified as 0, or simply omitted). +The +.I mode +argument +.B must +be supplied if +.B O_CREAT +or +.B O_TMPFILE +is specified in +.IR flags ; +if it is not supplied, +some arbitrary bytes from the stack will be applied as the file mode. +.IP +The effective mode is modified by the process's +.I umask +in the usual way: in the absence of a default ACL, the mode of the +created file is +.IR "(mode\ &\ \[ti]umask)" . +.IP +Note that +.I mode +applies only to future accesses of the +newly created file; the +.BR open () +call that creates a read-only file may well return a read/write +file descriptor. +.IP +The following symbolic constants are provided for +.IR mode : +.RS +.TP 9 +.B S_IRWXU +00700 user (file owner) has read, write, and execute permission +.TP +.B S_IRUSR +00400 user has read permission +.TP +.B S_IWUSR +00200 user has write permission +.TP +.B S_IXUSR +00100 user has execute permission +.TP +.B S_IRWXG +00070 group has read, write, and execute permission +.TP +.B S_IRGRP +00040 group has read permission +.TP +.B S_IWGRP +00020 group has write permission +.TP +.B S_IXGRP +00010 group has execute permission +.TP +.B S_IRWXO +00007 others have read, write, and execute permission +.TP +.B S_IROTH +00004 others have read permission +.TP +.B S_IWOTH +00002 others have write permission +.TP +.B S_IXOTH +00001 others have execute permission +.RE +.IP +According to POSIX, the effect when other bits are set in +.I mode +is unspecified. +On Linux, the following bits are also honored in +.IR mode : +.RS +.TP 9 +.B S_ISUID +0004000 set-user-ID bit +.TP +.B S_ISGID +0002000 set-group-ID bit (see +.BR inode (7)). +.TP +.B S_ISVTX +0001000 sticky bit (see +.BR inode (7)). +.RE +.TP +.BR O_DIRECT " (since Linux 2.4.10)" +Try to minimize cache effects of the I/O to and from this file. 
+In general this will degrade performance, but it is useful in +special situations, such as when applications do their own caching. +File I/O is done directly to/from user-space buffers. +The +.B O_DIRECT +flag on its own makes an effort to transfer data synchronously, +but does not give the guarantees of the +.B O_SYNC +flag that data and necessary metadata are transferred. +To guarantee synchronous I/O, +.B O_SYNC +must be used in addition to +.BR O_DIRECT . +See NOTES below for further discussion. +.IP +A semantically similar (but deprecated) interface for block devices +is described in +.BR raw (8). +.TP +.B O_DIRECTORY +If \fIpathname\fP is not a directory, cause the open to fail. +.\" But see the following and its replies: +.\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2 +.\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail +.\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored. +This flag was added in Linux 2.1.126, to +avoid denial-of-service problems if +.BR opendir (3) +is called on a +FIFO or tape device. +.TP +.B O_DSYNC +Write operations on the file will complete according to the requirements of +synchronized I/O +.I data +integrity completion. +.IP +By the time +.BR write (2) +(and similar) +return, the output data +has been transferred to the underlying hardware, +along with any file metadata that would be required to retrieve that data +(i.e., as though each +.BR write (2) +was followed by a call to +.BR fdatasync (2)). +.IR "See NOTES below" . +.TP +.B O_EXCL +Ensure that this call creates the file: +if this flag is specified in conjunction with +.BR O_CREAT , +and +.I pathname +already exists, then +.BR open () +fails with the error +.BR EEXIST . +.IP +When these two flags are specified, symbolic links are not followed: +.\" POSIX.1-2001 explicitly requires this behavior. +if +.I pathname +is a symbolic link, then +.BR open () +fails regardless of where the symbolic link points. 
+.IP +In general, the behavior of +.B O_EXCL +is undefined if it is used without +.BR O_CREAT . +There is one exception: on Linux 2.6 and later, +.B O_EXCL +can be used without +.B O_CREAT +if +.I pathname +refers to a block device. +If the block device is in use by the system (e.g., mounted), +.BR open () +fails with the error +.BR EBUSY . +.IP +On NFS, +.B O_EXCL +is supported only when using NFSv3 or later on kernel 2.6 or later. +In NFS environments where +.B O_EXCL +support is not provided, programs that rely on it +for performing locking tasks will contain a race condition. +Portable programs that want to perform atomic file locking using a lockfile, +and need to avoid reliance on NFS support for +.BR O_EXCL , +can create a unique file on +the same filesystem (e.g., incorporating hostname and PID), and use +.BR link (2) +to make a link to the lockfile. +If +.BR link (2) +returns 0, the lock is successful. +Otherwise, use +.BR stat (2) +on the unique file to check if its link count has increased to 2, +in which case the lock is also successful. +.TP +.B O_LARGEFILE +(LFS) +Allow files whose sizes cannot be represented in an +.I off_t +(but can be represented in an +.IR off64_t ) +to be opened. +The +.B _LARGEFILE64_SOURCE +macro must be defined +(before including +.I any +header files) +in order to obtain this definition. +Setting the +.B _FILE_OFFSET_BITS +feature test macro to 64 (rather than using +.BR O_LARGEFILE ) +is the preferred +method of accessing large files on 32-bit systems (see +.BR feature_test_macros (7)). +.TP +.BR O_NOATIME " (since Linux 2.6.8)" +Do not update the file last access time +.RI ( st_atime +in the inode) +when the file is +.BR read (2). +.IP +This flag can be employed only if one of the following conditions is true: +.RS +.IP \[bu] 3 +The effective UID of the process +.\" Strictly speaking: the filesystem UID +matches the owner UID of the file. 
+.IP \[bu] +The calling process has the +.B CAP_FOWNER +capability in its user namespace and +the owner UID of the file has a mapping in the namespace. +.RE +.IP +This flag is intended for use by indexing or backup programs, +where its use can significantly reduce the amount of disk activity. +This flag may not be effective on all filesystems. +One example is NFS, where the server maintains the access time. +.\" The O_NOATIME flag also affects the treatment of st_atime +.\" by mmap() and readdir(2), MTK, Dec 04. +.TP +.B O_NOCTTY +If +.I pathname +refers to a terminal device\[em]see +.BR tty (4)\[em]it +will not become the process's controlling terminal even if the +process does not have one. +.TP +.B O_NOFOLLOW +If the trailing component (i.e., basename) of +.I pathname +is a symbolic link, then the open fails, with the error +.BR ELOOP . +Symbolic links in earlier components of the pathname will still be +followed. +(Note that the +.B ELOOP +error that can occur in this case is indistinguishable from the case where +an open fails because there are too many symbolic links found +while resolving components in the prefix part of the pathname.) +.IP +This flag is a FreeBSD extension, which was added in Linux 2.1.126, +and has subsequently been standardized in POSIX.1-2008. +.IP +See also +.B O_PATH +below. +.\" The headers from glibc 2.0.100 and later include a +.\" definition of this flag; \fIkernels before Linux 2.1.126 will ignore it if +.\" used\fP. +.TP +.BR O_NONBLOCK " or " O_NDELAY +When possible, the file is opened in nonblocking mode. +Neither the +.BR open () +nor any subsequent I/O operations on the file descriptor which is +returned will cause the calling process to wait. 
+.IP +Note that the setting of this flag has no effect on the operation of +.BR poll (2), +.BR select (2), +.BR epoll (7), +and similar, +since those interfaces merely inform the caller about whether +a file descriptor is "ready", +meaning that an I/O operation performed on +the file descriptor with the +.B O_NONBLOCK +flag +.I clear +would not block. +.IP +Note that this flag has no effect for regular files and block devices; +that is, I/O operations will (briefly) block when device activity +is required, regardless of whether +.B O_NONBLOCK +is set. +Since +.B O_NONBLOCK +semantics might eventually be implemented, +applications should not depend upon blocking behavior +when specifying this flag for regular files and block devices. +.IP +For the handling of FIFOs (named pipes), see also +.BR fifo (7). +For a discussion of the effect of +.B O_NONBLOCK +in conjunction with mandatory file locks and with file leases, see +.BR fcntl (2). +.TP +.BR O_PATH " (since Linux 2.6.39)" +.\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd +.\" commit 326be7b484843988afe57566b627fb7a70beac56 +.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d +.\" +.\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496 +.\" Subject: Re: [PATCH] open(2): document O_PATH +.\" Newsgroups: gmane.linux.man, gmane.linux.kernel +.\" +Obtain a file descriptor that can be used for two purposes: +to indicate a location in the filesystem tree and +to perform operations that act purely at the file descriptor level. +The file itself is not opened, and other file operations (e.g., +.BR read (2), +.BR write (2), +.BR fchmod (2), +.BR fchown (2), +.BR fgetxattr (2), +.BR ioctl (2), +.BR mmap (2)) +fail with the error +.BR EBADF . +.IP +The following operations +.I can +be performed on the resulting file descriptor: +.RS +.IP \[bu] 3 +.BR close (2). +.IP \[bu] +.BR fchdir (2), +if the file descriptor refers to a directory +(since Linux 3.5). 
+.\" commit 332a2e1244bd08b9e3ecd378028513396a004a24 +.IP \[bu] +.BR fstat (2) +(since Linux 3.6). +.IP \[bu] +.\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2 +.BR fstatfs (2) +(since Linux 3.12). +.\" fstatfs(): commit 9d05746e7b16d8565dddbe3200faa1e669d23bbf +.IP \[bu] +Duplicating the file descriptor +.RB ( dup (2), +.BR fcntl (2) +.BR F_DUPFD , +etc.). +.IP \[bu] +Getting and setting file descriptor flags +.RB ( fcntl (2) +.B F_GETFD +and +.BR F_SETFD ). +.IP \[bu] +Retrieving open file status flags using the +.BR fcntl (2) +.B F_GETFL +operation: the returned flags will include the bit +.BR O_PATH . +.IP \[bu] +Passing the file descriptor as the +.I dirfd +argument of +.BR openat () +and the other "*at()" system calls. +This includes +.BR linkat (2) +with +.B AT_EMPTY_PATH +(or via procfs using +.BR AT_SYMLINK_FOLLOW ) +even if the file is not a directory. +.IP \[bu] +Passing the file descriptor to another process via a UNIX domain socket +(see +.B SCM_RIGHTS +in +.BR unix (7)). +.RE +.IP +When +.B O_PATH +is specified in +.IR flags , +flag bits other than +.BR O_CLOEXEC , +.BR O_DIRECTORY , +and +.B O_NOFOLLOW +are ignored. +.IP +Opening a file or directory with the +.B O_PATH +flag requires no permissions on the object itself +(but does require execute permission on the directories in the path prefix). +Depending on the subsequent operation, +a check for suitable file permissions may be performed (e.g., +.BR fchdir (2) +requires execute permission on the directory referred to +by its file descriptor argument). +By contrast, +obtaining a reference to a filesystem object by opening it with the +.B O_RDONLY +flag requires that the caller have read permission on the object, +even when the subsequent operation (e.g., +.BR fchdir (2), +.BR fstat (2)) +does not require read permission on the object. 
+.IP +If +.I pathname +is a symbolic link and the +.B O_NOFOLLOW +flag is also specified, +then the call returns a file descriptor referring to the symbolic link. +This file descriptor can be used as the +.I dirfd +argument in calls to +.BR fchownat (2), +.BR fstatat (2), +.BR linkat (2), +and +.BR readlinkat (2) +with an empty pathname to have the calls operate on the symbolic link. +.IP +If +.I pathname +refers to an automount point that has not yet been triggered, so no +other filesystem is mounted on it, then the call returns a file +descriptor referring to the automount directory without triggering a mount. +.BR fstatfs (2) +can then be used to determine if it is, in fact, an untriggered +automount point +.RB ( ".f_type == AUTOFS_SUPER_MAGIC" ). +.IP +One use of +.B O_PATH +for regular files is to provide the equivalent of POSIX.1's +.B O_EXEC +functionality. +This permits us to open a file for which we have execute +permission but not read permission, and then execute that file, +with steps something like the following: +.IP +.in +4n +.EX +char buf[PATH_MAX]; +fd = open("some_prog", O_PATH); +snprintf(buf, PATH_MAX, "/proc/self/fd/%d", fd); +execl(buf, "some_prog", (char *) NULL); +.EE +.in +.IP +An +.B O_PATH +file descriptor can also be passed as the argument of +.BR fexecve (3). +.TP +.B O_SYNC +Write operations on the file will complete according to the requirements of +synchronized I/O +.I file +integrity completion +(by contrast with the +synchronized I/O +.I data +integrity completion +provided by +.BR O_DSYNC .) +.IP +By the time +.BR write (2) +(or similar) +returns, the output data and associated file metadata +have been transferred to the underlying hardware +(i.e., as though each +.BR write (2) +was followed by a call to +.BR fsync (2)). +.IR "See NOTES below" . 
+.TP +.BR O_TMPFILE " (since Linux 3.11)" +.\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e +.\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e +.\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd +Create an unnamed temporary regular file. +The +.I pathname +argument specifies a directory; +an unnamed inode will be created in that directory's filesystem. +Anything written to the resulting file will be lost when +the last file descriptor is closed, unless the file is given a name. +.IP +.B O_TMPFILE +must be specified with one of +.B O_RDWR +or +.B O_WRONLY +and, optionally, +.BR O_EXCL . +If +.B O_EXCL +is not specified, then +.BR linkat (2) +can be used to link the temporary file into the filesystem, making it +permanent, using code like the following: +.IP +.in +4n +.EX +char path[PATH_MAX]; +fd = open("/path/to/dir", O_TMPFILE | O_RDWR, + S_IRUSR | S_IWUSR); + +/* File I/O on \[aq]fd\[aq]... */ + +linkat(fd, "", AT_FDCWD, "/path/for/file", AT_EMPTY_PATH); + +/* If the caller doesn\[aq]t have the CAP_DAC_READ_SEARCH + capability (needed to use AT_EMPTY_PATH with linkat(2)), + and there is a proc(5) filesystem mounted, then the + linkat(2) call above can be replaced with: + +snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd); +linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file", + AT_SYMLINK_FOLLOW); +*/ +.EE +.in +.IP +In this case, +the +.BR open () +.I mode +argument determines the file permission mode, as with +.BR O_CREAT . +.IP +Specifying +.B O_EXCL +in conjunction with +.B O_TMPFILE +prevents a temporary file from being linked into the filesystem +in the above manner. +(Note that the meaning of +.B O_EXCL +in this case is different from the meaning of +.B O_EXCL +otherwise.) 
+.IP +There are two main use cases for +.\" Inspired by http://lwn.net/Articles/559147/ +.BR O_TMPFILE : +.RS +.IP \[bu] 3 +Improved +.BR tmpfile (3) +functionality: race-free creation of temporary files that +(1) are automatically deleted when closed; +(2) can never be reached via any pathname; +(3) are not subject to symlink attacks; and +(4) do not require the caller to devise unique names. +.IP \[bu] +Creating a file that is initially invisible, which is then populated +with data and adjusted to have appropriate filesystem attributes +.RB ( fchown (2), +.BR fchmod (2), +.BR fsetxattr (2), +etc.) +before being atomically linked into the filesystem +in a fully formed state (using +.BR linkat (2) +as described above). +.RE +.IP +.B O_TMPFILE +requires support by the underlying filesystem; +only a subset of Linux filesystems provide that support. +In the initial implementation, support was provided in +the ext2, ext3, ext4, UDF, Minix, and tmpfs filesystems. +.\" To check for support, grep for "tmpfile" in kernel sources +Support for other filesystems has subsequently been added as follows: +XFS (Linux 3.15); +.\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788 +.\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe +Btrfs (Linux 3.16); +.\" commit ef3b9af50bfa6a1f02cd7b3f5124b712b1ba3e3c +F2FS (Linux 3.16); +.\" commit 50732df02eefb39ab414ef655979c2c9b64ad21c +and ubifs (Linux 4.9) +.TP +.B O_TRUNC +If the file already exists and is a regular file and the access mode allows +writing (i.e., is +.B O_RDWR +or +.BR O_WRONLY ) +it will be truncated to length 0. +If the file is a FIFO or terminal device file, the +.B O_TRUNC +flag is ignored. +Otherwise, the effect of +.B O_TRUNC +is unspecified. +.SS creat() +A call to +.BR creat () +is equivalent to calling +.BR open () +with +.I flags +equal to +.BR O_CREAT|O_WRONLY|O_TRUNC . +.SS openat() +The +.BR openat () +system call operates in exactly the same way as +.BR open (), +except for the differences described here. 
+.PP +The +.I dirfd +argument is used in conjunction with the +.I pathname +argument as follows: +.IP \[bu] 3 +If the pathname given in +.I pathname +is absolute, then +.I dirfd +is ignored. +.IP \[bu] +If the pathname given in +.I pathname +is relative and +.I dirfd +is the special value +.BR AT_FDCWD , +then +.I pathname +is interpreted relative to the current working +directory of the calling process (like +.BR open ()). +.IP \[bu] +If the pathname given in +.I pathname +is relative, then it is interpreted relative to the directory +referred to by the file descriptor +.I dirfd +(rather than relative to the current working directory of +the calling process, as is done by +.BR open () +for a relative pathname). +In this case, +.I dirfd +must be a directory that was opened for reading +.RB ( O_RDONLY ) +or using the +.B O_PATH +flag. +.PP +If the pathname given in +.I pathname +is relative, and +.I dirfd +is not a valid file descriptor, an error +.RB ( EBADF ) +results. +(Specifying an invalid file descriptor number in +.I dirfd +can be used as a means to ensure that +.I pathname +is absolute.) +.\" +.SS openat2(2) +The +.BR openat2 (2) +system call is an extension of +.BR openat (), +and provides a superset of the features of +.BR openat (). +It is documented separately, in +.BR openat2 (2). +.SH RETURN VALUE +On success, +.BR open (), +.BR openat (), +and +.BR creat () +return the new file descriptor (a nonnegative integer). +On error, \-1 is returned and +.I errno +is set to indicate the error. +.SH ERRORS +.BR open (), +.BR openat (), +and +.BR creat () +can fail with the following errors: +.TP +.B EACCES +The requested access to the file is not allowed, or search permission +is denied for one of the directories in the path prefix of +.IR pathname , +or the file did not exist yet and write access to the parent directory +is not allowed. +(See also +.BR path_resolution (7).) 
+.TP +.B EACCES +.\" commit 30aba6656f61ed44cba445a3c0d38b296fa9e8f5 +Where +.B O_CREAT +is specified, the +.I protected_fifos +or +.I protected_regular +sysctl is enabled, the file already exists and is a FIFO or regular file, the +owner of the file is neither the current user nor the owner of the +containing directory, and the containing directory is both world- or +group-writable and sticky. +For details, see the descriptions of +.I /proc/sys/fs/protected_fifos +and +.I /proc/sys/fs/protected_regular +in +.BR proc (5). +.TP +.B EBADF +.RB ( openat ()) +.I pathname +is relative but +.I dirfd +is neither +.B AT_FDCWD +nor a valid file descriptor. +.TP +.B EBUSY +.B O_EXCL +was specified in +.I flags +and +.I pathname +refers to a block device that is in use by the system (e.g., it is mounted). +.TP +.B EDQUOT +Where +.B O_CREAT +is specified, the file does not exist, and the user's quota of disk +blocks or inodes on the filesystem has been exhausted. +.TP +.B EEXIST +.I pathname +already exists and +.BR O_CREAT " and " O_EXCL +were used. +.TP +.B EFAULT +.I pathname +points outside your accessible address space. +.TP +.B EFBIG +See +.BR EOVERFLOW . +.TP +.B EINTR +While blocked waiting to complete an open of a slow device +(e.g., a FIFO; see +.BR fifo (7)), +the call was interrupted by a signal handler; see +.BR signal (7). +.TP +.B EINVAL +The filesystem does not support the +.B O_DIRECT +flag. +See +.B NOTES +for more information. +.TP +.B EINVAL +Invalid value in +.\" In particular, __O_TMPFILE instead of O_TMPFILE +.IR flags . +.TP +.B EINVAL +.B O_TMPFILE +was specified in +.IR flags , +but neither +.B O_WRONLY +nor +.B O_RDWR +was specified. +.TP +.B EINVAL +.B O_CREAT +was specified in +.I flags +and the final component ("basename") of the new file's +.I pathname +is invalid +(e.g., it contains characters not permitted by the underlying filesystem). 
+.TP +.B EINVAL +The final component ("basename") of +.I pathname +is invalid +(e.g., it contains characters not permitted by the underlying filesystem). +.TP +.B EISDIR +.I pathname +refers to a directory and the access requested involved writing +(that is, +.B O_WRONLY +or +.B O_RDWR +is set). +.TP +.B EISDIR +.I pathname +refers to an existing directory, +.B O_TMPFILE +and one of +.B O_WRONLY +or +.B O_RDWR +were specified in +.IR flags , +but this kernel version does not provide the +.B O_TMPFILE +functionality. +.TP +.B ELOOP +Too many symbolic links were encountered in resolving +.IR pathname . +.TP +.B ELOOP +.I pathname +was a symbolic link, and +.I flags +specified +.B O_NOFOLLOW +but not +.BR O_PATH . +.TP +.B EMFILE +The per-process limit on the number of open file descriptors has been reached +(see the description of +.B RLIMIT_NOFILE +in +.BR getrlimit (2)). +.TP +.B ENAMETOOLONG +.I pathname +was too long. +.TP +.B ENFILE +The system-wide limit on the total number of open files has been reached. +.TP +.B ENODEV +.I pathname +refers to a device special file and no corresponding device exists. +(This is a Linux kernel bug; in this situation +.B ENXIO +must be returned.) +.TP +.B ENOENT +.B O_CREAT +is not set and the named file does not exist. +.TP +.B ENOENT +A directory component in +.I pathname +does not exist or is a dangling symbolic link. +.TP +.B ENOENT +.I pathname +refers to a nonexistent directory, +.B O_TMPFILE +and one of +.B O_WRONLY +or +.B O_RDWR +were specified in +.IR flags , +but this kernel version does not provide the +.B O_TMPFILE +functionality. +.TP +.B ENOMEM +The named file is a FIFO, +but memory for the FIFO buffer can't be allocated because +the per-user hard limit on memory allocation for pipes has been reached +and the caller is not privileged; see +.BR pipe (7). +.TP +.B ENOMEM +Insufficient kernel memory was available. 
+.TP +.B ENOSPC +.I pathname +was to be created but the device containing +.I pathname +has no room for the new file. +.TP +.B ENOTDIR +A component used as a directory in +.I pathname +is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and +.I pathname +was not a directory. +.TP +.B ENOTDIR +.RB ( openat ()) +.I pathname +is a relative pathname and +.I dirfd +is a file descriptor referring to a file other than a directory. +.TP +.B ENXIO +.BR O_NONBLOCK " | " O_WRONLY +is set, the named file is a FIFO, and +no process has the FIFO open for reading. +.TP +.B ENXIO +The file is a device special file and no corresponding device exists. +.TP +.B ENXIO +The file is a UNIX domain socket. +.TP +.B EOPNOTSUPP +The filesystem containing +.I pathname +does not support +.BR O_TMPFILE . +.TP +.B EOVERFLOW +.I pathname +refers to a regular file that is too large to be opened. +The usual scenario here is that an application compiled +on a 32-bit platform without +.I \-D_FILE_OFFSET_BITS=64 +tried to open a file whose size exceeds +.I (1<<31)\-1 +bytes; +see also +.B O_LARGEFILE +above. +This is the error specified by POSIX.1; +before Linux 2.6.24, Linux gave the error +.B EFBIG +for this case. +.\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253 +.\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW" +.\" Reported 2006-10-03 +.TP +.B EPERM +The +.B O_NOATIME +flag was specified, but the effective user ID of the caller +.\" Strictly speaking, it's the filesystem UID... (MTK) +did not match the owner of the file and the caller was not privileged. +.TP +.B EPERM +The operation was prevented by a file seal; see +.BR fcntl (2). +.TP +.B EROFS +.I pathname +refers to a file on a read-only filesystem and write access was +requested. +.TP +.B ETXTBSY +.I pathname +refers to an executable image which is currently being executed and +write access was requested. 
+.TP +.B ETXTBSY +.I pathname +refers to a file that is currently in use as a swap file, and the +.B O_TRUNC +flag was specified. +.TP +.B ETXTBSY +.I pathname +refers to a file that is currently being read by the kernel (e.g., for +module/firmware loading), and write access was requested. +.TP +.B EWOULDBLOCK +The +.B O_NONBLOCK +flag was specified, and an incompatible lease was held on the file +(see +.BR fcntl (2)). +.SH VERSIONS +.BR openat () +was added in Linux 2.6.16; +library support was added in glibc 2.4. +.SH STANDARDS +.BR open (), +.BR creat () +SVr4, 4.3BSD, POSIX.1-2001, POSIX.1-2008. +.PP +.BR openat (): +POSIX.1-2008. +.PP +.BR openat2 (2) +is Linux-specific. +.PP +The +.BR O_DIRECT , +.BR O_NOATIME , +.BR O_PATH , +and +.B O_TMPFILE +flags are Linux-specific. +One must define +.B _GNU_SOURCE +to obtain their definitions. +.PP +The +.BR O_CLOEXEC , +.BR O_DIRECTORY , +and +.B O_NOFOLLOW +flags are not specified in POSIX.1-2001, +but are specified in POSIX.1-2008. +Since glibc 2.12, one can obtain their definitions by defining either +.B _POSIX_C_SOURCE +with a value greater than or equal to 200809L or +.B _XOPEN_SOURCE +with a value greater than or equal to 700. +In glibc 2.11 and earlier, one obtains the definitions by defining +.BR _GNU_SOURCE . +.PP +As noted in +.BR feature_test_macros (7), +feature test macros such as +.BR _POSIX_C_SOURCE , +.BR _XOPEN_SOURCE , +and +.B _GNU_SOURCE +must be defined before including +.I any +header files. +.SH NOTES +Under Linux, the +.B O_NONBLOCK +flag is sometimes used in cases where one wants to open +but does not necessarily have the intention to read or write. +For example, +this may be used to open a device in order to get a file descriptor +for use with +.BR ioctl (2). +.PP +The (undefined) effect of +.B O_RDONLY | O_TRUNC +varies among implementations. +On many systems the file is actually truncated. 
+.\" Linux 2.0, 2.5: truncate +.\" Solaris 5.7, 5.8: truncate +.\" Irix 6.5: truncate +.\" Tru64 5.1B: truncate +.\" HP-UX 11.22: truncate +.\" FreeBSD 4.7: truncate +.PP +Note that +.BR open () +can open device special files, but +.BR creat () +cannot create them; use +.BR mknod (2) +instead. +.PP +If the file is newly created, its +.IR st_atime , +.IR st_ctime , +.I st_mtime +fields +(respectively, time of last access, time of last status change, and +time of last modification; see +.BR stat (2)) +are set +to the current time, and so are the +.I st_ctime +and +.I st_mtime +fields of the +parent directory. +Otherwise, if the file is modified because of the +.B O_TRUNC +flag, its +.I st_ctime +and +.I st_mtime +fields are set to the current time. +.PP +The files in the +.I /proc/[pid]/fd +directory show the open file descriptors of the process with the PID +.IR pid . +The files in the +.I /proc/[pid]/fdinfo +directory show even more information about these file descriptors. +See +.BR proc (5) +for further details of both of these directories. +.PP +The Linux header file +.B <asm/fcntl.h> +doesn't define +.BR O_ASYNC ; +the (BSD-derived) +.B FASYNC +synonym is defined instead. +.\" +.\" +.SS Open file descriptions +The term open file description is the one used by POSIX to refer to the +entries in the system-wide table of open files. +In other contexts, this object is +variously also called an "open file object", +a "file handle", an "open file table entry", +or\[em]in kernel-developer parlance\[em]a +.IR "struct file" . +.PP +When a file descriptor is duplicated (using +.BR dup (2) +or similar), +the duplicate refers to the same open file description +as the original file descriptor, +and the two file descriptors consequently share +the file offset and file status flags. 
+Such sharing can also occur between processes: +a child process created via +.BR fork (2) +inherits duplicates of its parent's file descriptors, +and those duplicates refer to the same open file descriptions. +.PP +Each +.BR open () +of a file creates a new open file description; +thus, there may be multiple open file descriptions +corresponding to a file inode. +.PP +On Linux, one can use the +.BR kcmp (2) +.B KCMP_FILE +operation to test whether two file descriptors +(in the same process or in two different processes) +refer to the same open file description. +.\" +.\" +.SS Synchronized I/O +The POSIX.1-2008 "synchronized I/O" option +specifies different variants of synchronized I/O, +and specifies the +.BR open () +flags +.BR O_SYNC , +.BR O_DSYNC , +and +.B O_RSYNC +for controlling the behavior. +Regardless of whether an implementation supports this option, +it must at least support the use of +.B O_SYNC +for regular files. +.PP +Linux implements +.B O_SYNC +and +.BR O_DSYNC , +but not +.BR O_RSYNC . +Somewhat incorrectly, glibc defines +.B O_RSYNC +to have the same value as +.BR O_SYNC . +.RB ( O_RSYNC +is defined in the Linux header file +.I <asm/fcntl.h> +on HP PA-RISC, but it is not used.) +.PP +.B O_SYNC +provides synchronized I/O +.I file +integrity completion, +meaning write operations will flush data and all associated metadata +to the underlying hardware. +.B O_DSYNC +provides synchronized I/O +.I data +integrity completion, +meaning write operations will flush data +to the underlying hardware, +but will only flush metadata updates that are required +to allow a subsequent read operation to complete successfully. +Data integrity completion can reduce the number of disk operations +that are required for applications that don't need the guarantees +of file integrity completion. 
+.PP +To understand the difference between the two types of completion, +consider two pieces of file metadata: +the file last modification timestamp +.RI ( st_mtime ) +and the file length. +All write operations will update the last file modification timestamp, +but only writes that add data to the end of the +file will change the file length. +The last modification timestamp is not needed to ensure that +a read completes successfully, but the file length is. +Thus, +.B O_DSYNC +would only guarantee to flush updates to the file length metadata +(whereas +.B O_SYNC +would also always flush the last modification timestamp metadata). +.PP +Before Linux 2.6.33, Linux implemented only the +.B O_SYNC +flag for +.BR open (). +However, when that flag was specified, +most filesystems actually provided the equivalent of synchronized I/O +.I data +integrity completion (i.e., +.B O_SYNC +was actually implemented as the equivalent of +.BR O_DSYNC ). +.PP +Since Linux 2.6.33, proper +.B O_SYNC +support is provided. +However, to ensure backward binary compatibility, +.B O_DSYNC +was defined with the same value as the historical +.BR O_SYNC , +and +.B O_SYNC +was defined as a new (two-bit) flag value that includes the +.B O_DSYNC +flag value. +This ensures that applications compiled against +new headers get at least +.B O_DSYNC +semantics before Linux 2.6.33. +.\" +.SS C library/kernel differences +Since glibc 2.26, +the glibc wrapper function for +.BR open () +employs the +.BR openat () +system call, rather than the kernel's +.BR open () +system call. +For certain architectures, this is also true before glibc 2.26. +.\" +.SS NFS +There are many infelicities in the protocol underlying NFS, affecting +amongst others +.BR O_SYNC " and " O_NDELAY . +.PP +On NFS filesystems with UID mapping enabled, +.BR open () +may +return a file descriptor but, for example, +.BR read (2) +requests are denied +with +.BR EACCES . 
+This is because the client performs +.BR open () +by checking the +permissions, but UID mapping is performed by the server upon +read and write requests. +.\" +.\" +.SS FIFOs +Opening the read or write end of a FIFO blocks until the other +end is also opened (by another process or thread). +See +.BR fifo (7) +for further details. +.\" +.\" +.SS File access mode +Unlike the other values that can be specified in +.IR flags , +the +.I "access mode" +values +.BR O_RDONLY ", " O_WRONLY ", and " O_RDWR +do not specify individual bits. +Rather, they define the low order two bits of +.IR flags , +and are defined respectively as 0, 1, and 2. +In other words, the combination +.B "O_RDONLY | O_WRONLY" +is a logical error, and certainly does not have the same meaning as +.BR O_RDWR . +.PP +Linux reserves the special, nonstandard access mode 3 (binary 11) in +.I flags +to mean: +check for read and write permission on the file and return a file descriptor +that can't be used for reading or writing. +This nonstandard access mode is used by some Linux drivers to return a +file descriptor that is to be used only for device-specific +.BR ioctl (2) +operations. 
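Because the access mode is a two-bit field rather than a set of independent bits, testing it with a bitwise AND against O_RDONLY (which is 0) never works. A minimal sketch of the usual approach follows; it relies on the standard O_ACCMODE mask from <fcntl.h>, which this section does not name, and the two helper names are hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>

/* True if the access mode encoded in flags permits reading. */
bool flags_allow_read(int flags)
{
    int mode = flags & O_ACCMODE;      /* isolate the access mode field */
    return mode == O_RDONLY || mode == O_RDWR;
}

/* True if the access mode encoded in flags permits writing. */
bool flags_allow_write(int flags)
{
    int mode = flags & O_ACCMODE;
    return mode == O_WRONLY || mode == O_RDWR;
}
```

The same helpers can be applied to the value returned by fcntl(2) F_GETFL for an already open file descriptor. With the Linux-specific access mode 3, both helpers return false, which matches the description above of a descriptor usable for neither reading nor writing.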
+.\" See for example util-linux's disk-utils/setfdprm.c +.\" For some background on access mode 3, see +.\" http://thread.gmane.org/gmane.linux.kernel/653123 +.\" "[RFC] correct flags to f_mode conversion in __dentry_open" +.\" LKML, 12 Mar 2008 +.\" +.\" +.SS Rationale for openat() and other "directory file descriptor" APIs +.BR openat () +and the other system calls and library functions that take +a directory file descriptor argument +(i.e., +.BR execveat (2), +.BR faccessat (2), +.BR fanotify_mark (2), +.BR fchmodat (2), +.BR fchownat (2), +.BR fspick (2), +.BR fstatat (2), +.BR futimesat (2), +.BR linkat (2), +.BR mkdirat (2), +.BR mknodat (2), +.BR mount_setattr (2), +.BR move_mount (2), +.BR name_to_handle_at (2), +.BR open_tree (2), +.BR openat2 (2), +.BR readlinkat (2), +.BR renameat (2), +.BR renameat2 (2), +.BR statx (2), +.BR symlinkat (2), +.BR unlinkat (2), +.BR utimensat (2), +.BR mkfifoat (3), +and +.BR scandirat (3)) +address two problems with the older interfaces that preceded them. +Here, the explanation is in terms of the +.BR openat () +call, but the rationale is analogous for the other interfaces. +.PP +First, +.BR openat () +allows an application to avoid race conditions that could +occur when using +.BR open () +to open files in directories other than the current working directory. +These race conditions result from the fact that some component +of the directory prefix given to +.BR open () +could be changed in parallel with the call to +.BR open (). +Suppose, for example, that we wish to create the file +.I dir1/dir2/xxx.dep +if the file +.I dir1/dir2/xxx +exists. +The problem is that between the existence check and the file-creation step, +.I dir1 +or +.I dir2 +(which might be symbolic links) +could be modified to point to a different location. +Such races can be avoided by +opening a file descriptor for the target directory, +and then specifying that file descriptor as the +.I dirfd +argument of (say) +.BR fstatat (2) +and +.BR openat (). 
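The xxx.dep scenario just described can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name create_dep_if_exists(), its parameters, and the 0644 mode are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: if "name" exists in the directory dirpath,
 * create "depname" in that same directory.
 * Returns a file descriptor for the new file, or -1 on error.
 */
int create_dep_if_exists(const char *dirpath, const char *name,
                         const char *depname)
{
    /* Resolve the directory prefix exactly once. */
    int dirfd = open(dirpath, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd == -1)
        return -1;

    int fd = -1;
    struct stat st;

    /* Both steps are relative to dirfd, so later changes to the
       components of dirpath cannot redirect them. */
    if (fstatat(dirfd, name, &st, 0) == 0)
        fd = openat(dirfd, depname,
                    O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0644);

    close(dirfd);
    return fd;
}
```

A call such as create_dep_if_exists("dir1/dir2", "xxx", "xxx.dep") performs the existence check and the creation against the same directory, even if dir1 or dir2 is replaced in between.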
+The use of the +.I dirfd +file descriptor also has other benefits: +.IP \[bu] 3 +the file descriptor is a stable reference to the directory, +even if the directory is renamed; and +.IP \[bu] +the open file descriptor prevents the underlying filesystem from +being dismounted, +just as when a process has a current working directory on a filesystem. +.PP +Second, +.BR openat () +allows the implementation of a per-thread "current working +directory", via file descriptor(s) maintained by the application. +(This functionality can also be obtained by tricks based +on the use of +.IR /proc/self/fd/ dirfd, +but less efficiently.) +.PP +The +.I dirfd +argument for these APIs can be obtained by using +.BR open () +or +.BR openat () +to open a directory (with either the +.B O_RDONLY +or the +.B O_PATH +flag). +Alternatively, such a file descriptor can be obtained by applying +.BR dirfd (3) +to a directory stream created using +.BR opendir (3). +.PP +When these APIs are given a +.I dirfd +argument of +.B AT_FDCWD +or the specified pathname is absolute, +then they handle their pathname argument in the same way as +the corresponding conventional APIs. +However, in this case, several of the APIs have a +.I flags +argument that provides access to functionality that is not available with +the corresponding conventional APIs. +.\" +.\" +.SS O_DIRECT +The +.B O_DIRECT +flag may impose alignment restrictions on the length and address +of user-space buffers and the file offset of I/Os. +In Linux alignment +restrictions vary by filesystem and kernel version and might be +absent entirely. +The handling of misaligned +.B O_DIRECT +I/Os also varies; +they can either fail with +.B EINVAL +or fall back to buffered I/O. +.PP +Since Linux 6.1, +.B O_DIRECT +support and alignment restrictions for a file can be queried using +.BR statx (2), +using the +.B STATX_DIOALIGN +flag. +Support for +.B STATX_DIOALIGN +varies by filesystem; +see +.BR statx (2). 
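A minimal sketch of such a query is shown below. It assumes headers recent enough to define STATX_DIOALIGN and the stx_dio_mem_align and stx_dio_offset_align fields (see statx(2)); when they are not available at compile time, the sketch simply reports that no information is available. The function name query_dio_align() and its return convention are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: query direct I/O alignment for path.
 * Returns 1 if the kernel reported alignment information
 * (values stored in *mem_align and *offset_align),
 * 0 if no information was reported, and -1 if statx() failed.
 */
int query_dio_align(const char *path, unsigned int *mem_align,
                    unsigned int *offset_align)
{
#ifdef STATX_DIOALIGN
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) == -1)
        return -1;

    /* The kernel sets the bit in stx_mask only if it filled in
       the fields for this file. */
    if (!(stx.stx_mask & STATX_DIOALIGN))
        return 0;

    *mem_align = stx.stx_dio_mem_align;
    *offset_align = stx.stx_dio_offset_align;
    return 1;
#else
    (void) path;
    (void) mem_align;
    (void) offset_align;
    return 0;               /* headers predate STATX_DIOALIGN */
#endif
}
```

When the function returns 1, statx(2) should be consulted for the exact meaning of the reported values, including how a file that does not support direct I/O is indicated.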
+.PP +Some filesystems provide their own interfaces for querying +.B O_DIRECT +alignment restrictions, +for example the +.B XFS_IOC_DIOINFO +operation in +.BR xfsctl (3). +.B STATX_DIOALIGN +should be used instead when it is available. +.PP +If none of the above is available, +then direct I/O support and alignment restrictions +can only be assumed from known characteristics of the filesystem, +the individual file, +the underlying storage device(s), +and the kernel version. +In Linux 2.4, +most filesystems based on block devices require that +the file offset and the length and memory address of all I/O segments +be multiples of the filesystem block size +(typically 4096 bytes). +In Linux 2.6.0, +this was relaxed to the logical block size of the block device +(typically 512 bytes). +A block device's logical block size can be determined using the +.BR ioctl (2) +.B BLKSSZGET +operation or from the shell using the command: +.PP +.in +4n +.EX +blockdev \-\-getss +.EE +.in +.PP +.B O_DIRECT +I/Os should never be run concurrently with the +.BR fork (2) +system call, +if the memory buffer is a private mapping +(i.e., any mapping created with the +.BR mmap (2) +.B MAP_PRIVATE +flag; +this includes memory allocated on the heap and statically allocated buffers). +Any such I/Os, whether submitted via an asynchronous I/O interface or from +another thread in the process, +should be completed before +.BR fork (2) +is called. +Failure to do so can result in data corruption and undefined behavior in +parent and child processes. +This restriction does not apply when the memory buffer for the +.B O_DIRECT +I/Os was created using +.BR shmat (2) +or +.BR mmap (2) +with the +.B MAP_SHARED +flag. +Nor does this restriction apply when the memory buffer has been advised as +.B MADV_DONTFORK +with +.BR madvise (2), +ensuring that it will not be available +to the child after +.BR fork (2). 
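One way to follow the MADV_DONTFORK advice above is to allocate the I/O buffer as an anonymous private mapping and advise it immediately. The following is a sketch under stated assumptions: the helper names alloc_dio_buffer() and free_dio_buffer() are hypothetical, and the caller is assumed to pass a length that also satisfies the alignment rules of the target file.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate a page-aligned buffer of len bytes
 * that will not be available to a child after fork(2).
 * Returns NULL on failure.
 */
void *alloc_dio_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Exclude the mapping from any future child process. */
    if (madvise(buf, len, MADV_DONTFORK) == -1) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Release a buffer obtained from alloc_dio_buffer(). */
void free_dio_buffer(void *buf, size_t len)
{
    munmap(buf, len);
}
```

Using mmap(2) rather than malloc(3) gives a region that starts on a page boundary, which madvise(2) requires and which typically also satisfies the memory alignment needed for O_DIRECT.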
+.PP +The +.B O_DIRECT +flag was introduced in SGI IRIX, where it has alignment +restrictions similar to those of Linux 2.4. +IRIX also has a +.BR fcntl (2) +call to query appropriate alignments and sizes. +FreeBSD 4.x introduced +a flag of the same name, but without alignment restrictions. +.PP +.B O_DIRECT +support was added in Linux 2.4.10. +Older Linux kernels simply ignore this flag. +Some filesystems may not implement the flag, in which case +.BR open () +fails with the error +.B EINVAL +if it is used. +.PP +Applications should avoid mixing +.B O_DIRECT +and normal I/O to the same file, +and especially to overlapping byte regions in the same file. +Even when the filesystem correctly handles the coherency issues in +this situation, overall I/O throughput is likely to be slower than +using either mode alone. +Likewise, applications should avoid mixing +.BR mmap (2) +of files with direct I/O to the same files. +.PP +The behavior of +.B O_DIRECT +with NFS differs from that on local filesystems. +Older kernels, or +kernels configured in certain ways, may not support this combination. +The NFS protocol does not support passing the flag to the server, so +.B O_DIRECT +I/O will bypass the page cache only on the client; the server may +still cache the I/O. +The client asks the server to make the I/O +synchronous to preserve the synchronous semantics of +.BR O_DIRECT . +Some servers will perform poorly under these circumstances, especially +if the I/O size is small. +Some servers may also be configured to +lie to clients about the I/O having reached stable storage; this +will avoid the performance penalty at some risk to data integrity +in the event of server power failure. +The Linux NFS client places no alignment restrictions on +.B O_DIRECT +I/O. +.PP +In summary, +.B O_DIRECT +is a potentially powerful tool that should be used with caution. +It is recommended that applications treat use of +.B O_DIRECT +as a performance option which is disabled by default.
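Treating O_DIRECT as an opt-in performance option might look like the sketch below, which falls back to buffered I/O when open() fails with EINVAL. The function name open_maybe_direct() and its parameters are hypothetical, and the sketch assumes that flags contains neither O_CREAT nor O_TMPFILE, since no mode argument is passed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper: open path, using O_DIRECT only when requested
 * and supported. *is_direct reports which kind of descriptor was
 * returned. Returns the file descriptor, or -1 on error.
 */
int open_maybe_direct(const char *path, int flags, bool want_direct,
                      bool *is_direct)
{
#ifdef O_DIRECT
    if (want_direct) {
        int fd = open(path, flags | O_DIRECT);
        if (fd != -1) {
            *is_direct = true;
            return fd;
        }
        if (errno != EINVAL)
            return -1;      /* a real error, not missing support */
    }
#else
    (void) want_direct;     /* headers do not define O_DIRECT */
#endif
    *is_direct = false;
    return open(path, flags);
}
```

A caller that receives a descriptor with *is_direct set still has to honor the alignment restrictions discussed earlier for each read or write.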
+.SH BUGS +Currently, it is not possible to enable signal-driven +I/O by specifying +.B O_ASYNC +when calling +.BR open (); +use +.BR fcntl (2) +to enable this flag. +.\" FIXME . Check bugzilla report on open(O_ASYNC) +.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993 +.PP +One must check for two different error codes, +.B EISDIR +and +.BR ENOENT , +when trying to determine whether the kernel supports +.B O_TMPFILE +functionality. +.PP +When both +.B O_CREAT +and +.B O_DIRECTORY +are specified in +.I flags +and the file specified by +.I pathname +does not exist, +.BR open () +will create a regular file (i.e., +.B O_DIRECTORY +is ignored). +.SH SEE ALSO +.BR chmod (2), +.BR chown (2), +.BR close (2), +.BR dup (2), +.BR fcntl (2), +.BR link (2), +.BR lseek (2), +.BR mknod (2), +.BR mmap (2), +.BR mount (2), +.BR open_by_handle_at (2), +.BR openat2 (2), +.BR read (2), +.BR socket (2), +.BR stat (2), +.BR umask (2), +.BR unlink (2), +.BR write (2), +.BR fopen (3), +.BR acl (5), +.BR fifo (7), +.BR inode (7), +.BR path_resolution (7), +.BR symlink (7)