Diffstat (limited to 'man/man2/fcntl.2')
-rw-r--r-- | man/man2/fcntl.2 | 2113 |
1 files changed, 2113 insertions, 0 deletions
diff --git a/man/man2/fcntl.2 b/man/man2/fcntl.2 new file mode 100644 index 0000000..9f5e197 --- /dev/null +++ b/man/man2/fcntl.2 @@ -0,0 +1,2113 @@ +.\" This manpage is Copyright (C) 1992 Drew Eckhardt; +.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson; +.\" and Copyright (C) 1998 Jamie Lokier; +.\" and Copyright (C) 2002-2010, 2014 Michael Kerrisk; +.\" and Copyright (C) 2014 Jeff Layton +.\" and Copyright (C) 2014 David Herrmann +.\" and Copyright (C) 2017 Jens Axboe +.\" +.\" SPDX-License-Identifier: Linux-man-pages-copyleft +.\" +.\" Modified 1993-07-24 by Rik Faith <faith@cs.unc.edu> +.\" Modified 1995-09-26 by Andries Brouwer <aeb@cwi.nl> +.\" and again on 960413 and 980804 and 981223. +.\" Modified 1998-12-11 by Jamie Lokier <jamie@imbolc.ucc.ie> +.\" Applied correction by Christian Ehrhardt - aeb, 990712 +.\" Modified 2002-04-23 by Michael Kerrisk <mtk.manpages@gmail.com> +.\" Added note on F_SETFL and O_DIRECT +.\" Complete rewrite + expansion of material on file locking +.\" Incorporated description of F_NOTIFY, drawing on +.\" Stephen Rothwell's notes in Documentation/dnotify.txt. +.\" Added description of F_SETLEASE and F_GETLEASE +.\" Corrected and polished, aeb, 020527. +.\" Modified 2004-03-03 by Michael Kerrisk <mtk.manpages@gmail.com> +.\" Modified description of file leases: fixed some errors of detail +.\" Replaced the term "lease contestant" by "lease breaker" +.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com> +.\" Added notes on capability requirements +.\" Modified 2004-12-08, added O_NOATIME after note from Martin Pool +.\" 2004-12-10, mtk, noted F_GETOWN bug after suggestion from aeb. +.\" 2005-04-08 Jamie Lokier <jamie@shareable.org>, mtk +.\" Described behavior of F_SETOWN/F_SETSIG in +.\" multithreaded processes, and generally cleaned +.\" up the discussion of F_SETOWN. 
+.\" 2005-05-20, Johannes Nicolai <johannes.nicolai@hpi.uni-potsdam.de>, +.\" mtk: Noted F_SETOWN bug for socket file descriptor in Linux 2.4 +.\" and earlier. Added text on permissions required to send signal. +.\" 2009-09-30, Michael Kerrisk +.\" Note obsolete F_SETOWN behavior with threads. +.\" Document F_SETOWN_EX and F_GETOWN_EX +.\" 2010-06-17, Michael Kerrisk +.\" Document F_SETPIPE_SZ and F_GETPIPE_SZ. +.\" 2014-07-08, David Herrmann <dh.herrmann@gmail.com> +.\" Document F_ADD_SEALS and F_GET_SEALS +.\" 2017-06-26, Jens Axboe <axboe@kernel.dk> +.\" Document F_{GET,SET}_RW_HINT and F_{GET,SET}_FILE_RW_HINT +.\" +.TH fcntl 2 2024-05-02 "Linux man-pages (unreleased)" +.SH NAME +fcntl \- manipulate file descriptor +.SH LIBRARY +Standard C library +.RI ( libc ", " \-lc ) +.SH SYNOPSIS +.nf +.B #include <fcntl.h> +.P +.BI "int fcntl(int " fd ", int " op ", ... /* " arg " */ );" +.fi +.SH DESCRIPTION +.BR fcntl () +performs one of the operations described below on the open file descriptor +.IR fd . +The operation is determined by +.IR op . +.P +.BR fcntl () +can take an optional third argument. +Whether or not this argument is required is determined by +.IR op . +The required argument type is indicated in parentheses after each +.I op +name (in most cases, the required type is +.IR int , +and we identify the argument using the name +.IR arg ), +or +.I void +is specified if the argument is not required. +.P +Certain of the operations below are supported only since a particular +Linux kernel version. +The preferred method of checking whether the host kernel supports +a particular operation is to invoke +.BR fcntl () +with the desired +.I op +value and then test whether the call failed with +.BR EINVAL , +indicating that the kernel does not recognize this value. +.SS Duplicating a file descriptor +.TP +.BR F_DUPFD " (\fIint\fP)" +Duplicate the file descriptor +.I fd +using the lowest-numbered available file descriptor greater than or equal to +.IR arg . 
+This is different from +.BR dup2 (2), +which uses exactly the file descriptor specified. +.IP +On success, the new file descriptor is returned. +.IP +See +.BR dup (2) +for further details. +.TP +.BR F_DUPFD_CLOEXEC " (\fIint\fP; since Linux 2.6.24)" +As for +.BR F_DUPFD , +but additionally set the +close-on-exec flag for the duplicate file descriptor. +Specifying this flag permits a program to avoid an additional +.BR fcntl () +.B F_SETFD +operation to set the +.B FD_CLOEXEC +flag. +For an explanation of why this flag is useful, +see the description of +.B O_CLOEXEC +in +.BR open (2). +.SS File descriptor flags +The following operations manipulate the flags associated with +a file descriptor. +Currently, only one such flag is defined: +.BR FD_CLOEXEC , +the close-on-exec flag. +If the +.B FD_CLOEXEC +bit is set, +the file descriptor will automatically be closed during a successful +.BR execve (2). +(If the +.BR execve (2) +fails, the file descriptor is left open.) +If the +.B FD_CLOEXEC +bit is not set, the file descriptor will remain open across an +.BR execve (2). +.TP +.BR F_GETFD " (\fIvoid\fP)" +Return (as the function result) the file descriptor flags; +.I arg +is ignored. +.TP +.BR F_SETFD " (\fIint\fP)" +Set the file descriptor flags to the value specified by +.IR arg . +.P +In multithreaded programs, using +.BR fcntl () +.B F_SETFD +to set the close-on-exec flag at the same time as another thread performs a +.BR fork (2) +plus +.BR execve (2) +is vulnerable to a race condition that may unintentionally leak +the file descriptor to the program executed in the child process. +See the discussion of the +.B O_CLOEXEC +flag in +.BR open (2) +for details and a remedy to the problem. +.SS File status flags +Each open file description has certain associated status flags, +initialized by +.BR open (2) +.\" or +.\" .BR creat (2), +and possibly modified by +.BR fcntl (). 
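The descriptor-flag operations described above (F_DUPFD_CLOEXEC, F_GETFD, and F_SETFD) can be sketched as follows. This is a minimal illustration, not part of the manual page; the helper names dup_cloexec, is_cloexec, and set_cloexec are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Duplicate fd onto the lowest free descriptor >= min_fd, with the
   close-on-exec flag set in the same call (no separate F_SETFD).
   Returns the new descriptor, or -1 on error. */
int dup_cloexec(int fd, int min_fd)
{
    assert(min_fd >= 0);
    return fcntl(fd, F_DUPFD_CLOEXEC, min_fd);
}

/* Return 1 if FD_CLOEXEC is set on fd, 0 if it is clear, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Set (on != 0) or clear (on == 0) FD_CLOEXEC by reading the current
   descriptor flags, changing one bit, and writing them back. */
int set_cloexec(int fd, int on)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags);
}
```

Note that set_cloexec() is still subject to the multithreaded race described above; only the atomic forms (F_DUPFD_CLOEXEC, or O_CLOEXEC at open time) avoid it.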
+Duplicated file descriptors +(made with +.BR dup (2), +.BR fcntl (F_DUPFD), +.BR fork (2), +etc.) refer to the same open file description, and thus +share the same file status flags. +.P +The file status flags and their semantics are described in +.BR open (2). +.TP +.BR F_GETFL " (\fIvoid\fP)" +Return (as the function result) +the file access mode and the file status flags; +.I arg +is ignored. +.TP +.BR F_SETFL " (\fIint\fP)" +Set the file status flags to the value specified by +.IR arg . +File access mode +.RB ( O_RDONLY ", " O_WRONLY ", " O_RDWR ) +and file creation flags +(i.e., +.BR O_CREAT ", " O_EXCL ", " O_NOCTTY ", " O_TRUNC ) +in +.I arg +are ignored. +On Linux, this operation can change only the +.BR O_APPEND , +.BR O_ASYNC , +.BR O_DIRECT , +.BR O_NOATIME , +and +.B O_NONBLOCK +flags. +It is not possible to change the +.B O_DSYNC +and +.B O_SYNC +flags; see BUGS, below. +.SS Advisory record locking +Linux implements traditional ("process-associated") UNIX record locks, +as standardized by POSIX. +For a Linux-specific alternative with better semantics, +see the discussion of open file description locks below. +.P +.BR F_SETLK , +.BR F_SETLKW , +and +.B F_GETLK +are used to acquire, release, and test for the existence of record +locks (also known as byte-range, file-segment, or file-region locks). +The third argument, +.IR lock , +is a pointer to a structure that has at least the following fields +(in unspecified order). +.P +.in +4n +.EX +struct flock { + ... + short l_type; /* Type of lock: F_RDLCK, + F_WRLCK, F_UNLCK */ + short l_whence; /* How to interpret l_start: + SEEK_SET, SEEK_CUR, SEEK_END */ + off_t l_start; /* Starting offset for lock */ + off_t l_len; /* Number of bytes to lock */ + pid_t l_pid; /* PID of process blocking our lock + (set by F_GETLK and F_OFD_GETLK) */ + ... +}; +.EE +.in +.P +The +.IR l_whence ", " l_start ", and " l_len +fields of this structure specify the range of bytes we wish to lock. 
+Bytes past the end of the file may be locked, +but not bytes before the start of the file. +.P +.I l_start +is the starting offset for the lock, and is interpreted +relative to either: +the start of the file (if +.I l_whence +is +.BR SEEK_SET ); +the current file offset (if +.I l_whence +is +.BR SEEK_CUR ); +or the end of the file (if +.I l_whence +is +.BR SEEK_END ). +In the final two cases, +.I l_start +can be a negative number provided the +offset does not lie before the start of the file. +.P +.I l_len +specifies the number of bytes to be locked. +If +.I l_len +is positive, then the range to be locked covers bytes +.I l_start +up to and including +.IR l_start + l_len \-1. +Specifying 0 for +.I l_len +has the special meaning: lock all bytes starting at the +location specified by +.IR l_whence " and " l_start +through to the end of file, no matter how large the file grows. +.P +POSIX.1-2001 allows (but does not require) +an implementation to support a negative +.I l_len +value; if +.I l_len +is negative, the interval described by +.I lock +covers bytes +.IR l_start + l_len +up to and including +.IR l_start \-1. +This is supported since Linux 2.4.21 and Linux 2.5.49. +.P +The +.I l_type +field can be used to place a read +.RB ( F_RDLCK ) +or a write +.RB ( F_WRLCK ) +lock on a file. +Any number of processes may hold a read lock (shared lock) +on a file region, but only one process may hold a write lock +(exclusive lock). +An exclusive lock excludes all other locks, +both shared and exclusive. +A single process can hold only one type of lock on a file region; +if a new lock is applied to an already-locked region, +then the existing lock is converted to the new lock type. +(Such conversions may involve splitting, shrinking, or coalescing with +an existing lock if the byte range specified by the new lock does not +precisely coincide with the range of the existing lock.) 
+.TP +.BR F_SETLK " (\fIstruct flock *\fP)" +Acquire a lock (when +.I l_type +is +.B F_RDLCK +or +.BR F_WRLCK ) +or release a lock (when +.I l_type +is +.BR F_UNLCK ) +on the bytes specified by the +.IR l_whence ", " l_start ", and " l_len +fields of +.IR lock . +If a conflicting lock is held by another process, +this call returns \-1 and sets +.I errno +to +.B EACCES +or +.BR EAGAIN . +(The error returned in this case differs across implementations, +so POSIX requires a portable application to check for both errors.) +.TP +.BR F_SETLKW " (\fIstruct flock *\fP)" +As for +.BR F_SETLK , +but if a conflicting lock is held on the file, then wait for that +lock to be released. +If a signal is caught while waiting, then the call is interrupted +and (after the signal handler has returned) +returns immediately (with return value \-1 and +.I errno +set to +.BR EINTR ; +see +.BR signal (7)). +.TP +.BR F_GETLK " (\fIstruct flock *\fP)" +On input to this call, +.I lock +describes a lock we would like to place on the file. +If the lock could be placed, +.BR fcntl () +does not actually place it, but returns +.B F_UNLCK +in the +.I l_type +field of +.I lock +and leaves the other fields of the structure unchanged. +.IP +If one or more incompatible locks would prevent +this lock being placed, then +.BR fcntl () +returns details about one of those locks in the +.IR l_type ", " l_whence ", " l_start ", and " l_len +fields of +.IR lock . +If the conflicting lock is a traditional (process-associated) record lock, +then the +.I l_pid +field is set to the PID of the process holding that lock. +If the conflicting lock is an open file description lock, then +.I l_pid +is set to \-1. +Note that the returned information +may already be out of date by the time the caller inspects it. +.P +In order to place a read lock, +.I fd +must be open for reading. +In order to place a write lock, +.I fd +must be open for writing. +To place both types of lock, open a file read-write. 
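As a rough sketch of how F_SETLK and F_GETLK described above might be wrapped, consider the following. The function names try_lock and lock_holder are hypothetical, and offsets are assumed to be absolute (SEEK_SET).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to place (F_RDLCK, F_WRLCK) or release (F_UNLCK) a record lock
   on len bytes starting at absolute offset start, without blocking.
   Returns 0 on success, 1 if another process holds a conflicting lock,
   -1 on any other error (errno is left set). */
int try_lock(int fd, short type, off_t start, off_t len)
{
    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 0;
    /* Portable code checks for both errors, as noted above. */
    return (errno == EACCES || errno == EAGAIN) ? 1 : -1;
}

/* Ask whether a write lock could be placed on the range.
   Returns 0 if nothing conflicts, otherwise the l_pid reported for one
   conflicting lock (which is -1 for an open file description lock).
   Also returns -1 if fcntl() itself fails. */
pid_t lock_holder(int fd, off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}
```

A caller would typically open the file read-write, call try_lock(fd, F_WRLCK, 0, 100), and treat a return value of 1 as "busy, retry later". The answer from lock_holder() is only a snapshot and may be stale by the time it is inspected.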
+.P +When placing locks with +.BR F_SETLKW , +the kernel detects +.IR deadlocks , +whereby two or more processes have their +lock requests mutually blocked by locks held by the other processes. +For example, suppose process A holds a write lock on byte 100 of a file, +and process B holds a write lock on byte 200. +If each process then attempts to lock the byte already +locked by the other process using +.BR F_SETLKW , +then, without deadlock detection, +both processes would remain blocked indefinitely. +When the kernel detects such deadlocks, +it causes one of the blocking lock requests to immediately fail with the error +.BR EDEADLK ; +an application that encounters such an error should release +some of its locks to allow other applications to proceed before +attempting to regain the locks that it requires. +Circular deadlocks involving more than two processes are also detected. +Note, however, that there are limitations to the kernel's +deadlock-detection algorithm; see BUGS. +.P +As well as being removed by an explicit +.BR F_UNLCK , +record locks are automatically released when the process terminates. +.P +Record locks are not inherited by a child created via +.BR fork (2), +but are preserved across an +.BR execve (2). +.P +Because of the buffering performed by the +.BR stdio (3) +library, the use of record locking with routines in that package +should be avoided; use +.BR read (2) +and +.BR write (2) +instead. +.P +The record locks described above are associated with the process +(unlike the open file description locks described below). +This has some unfortunate consequences: +.IP \[bu] 3 +If a process closes +.I any +file descriptor referring to a file, +then all of the process's locks on that file are released, +regardless of the file descriptor(s) on which the locks were obtained. +.\" (Additional file descriptors referring to the same file +.\" may have been obtained by calls to +.\" .BR open "(2), " dup "(2), " dup2 "(2), or " fcntl ().) 
+This is bad: it means that a process can lose its locks on +a file such as +.I /etc/passwd +or +.I /etc/mtab +when for some reason a library function decides to open, read, +and close the same file. +.IP \[bu] +The threads in a process share locks. +In other words, +a multithreaded program can't use record locking to ensure +that threads don't simultaneously access the same region of a file. +.P +Open file description locks solve both of these problems. +.SS Open file description locks (non-POSIX) +Open file description locks are advisory byte-range locks whose operation is +in most respects identical to the traditional record locks described above. +This lock type is Linux-specific, +and available since Linux 3.15. +(There is a proposal with the Austin Group +.\" FIXME . Review progress into POSIX +.\" http://austingroupbugs.net/view.php?id=768 +to include this lock type in the next revision of POSIX.1.) +For an explanation of open file descriptions, see +.BR open (2). +.P +The principal difference between the two lock types +is that whereas traditional record locks +are associated with a process, +open file description locks are associated with the +open file description on which they are acquired, +much like locks acquired with +.BR flock (2). +Consequently (and unlike traditional advisory record locks), +open file description locks are inherited across +.BR fork (2) +(and +.BR clone (2) +with +.BR CLONE_FILES ), +and are only automatically released on the last close +of the open file description, +instead of being released on any close of the file. +.P +Conflicting lock combinations +(i.e., a read lock and a write lock or two write locks) +where one lock is an open file description lock and the other +is a traditional record lock conflict +even when they are acquired by the same process on the same file descriptor. 
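The ownership rules above can be observed within a single process. The following sketch uses a hypothetical helper, try_lock_op, that accepts either F_SETLK or the F_OFD_SETLK operation of this section; it assumes glibc, where the F_OFD_* constants require _GNU_SOURCE.

```c
#define _GNU_SOURCE             /* for F_OFD_SETLK in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to place or release a lock with the given non-blocking operation
   (F_SETLK for a traditional record lock, F_OFD_SETLK for an open file
   description lock) on len bytes at absolute offset start.
   Returns 0 on success, 1 on a conflicting lock, -1 on other errors. */
int try_lock_op(int fd, int op, short type, off_t start, off_t len)
{
    assert(op == F_SETLK || op == F_OFD_SETLK);
    struct flock fl = {0};      /* l_pid is left at zero, which the
                                   F_OFD_* operations require */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, op, &fl) == 0)
        return 0;
    return (errno == EAGAIN || errno == EACCES) ? 1 : -1;
}
```

With two descriptors a and b obtained from separate open(2) calls on the same file, a write lock placed via try_lock_op(a, F_OFD_SETLK, F_WRLCK, 0, 10) is expected to make both try_lock_op(b, F_OFD_SETLK, F_RDLCK, 5, 1) and try_lock_op(a, F_SETLK, F_WRLCK, 0, 10) report a conflict, even though all three calls come from the same process.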
+.P +Open file description locks placed via the same open file description +(i.e., via the same file descriptor, +or via a duplicate of the file descriptor created by +.BR fork (2), +.BR dup (2), +.BR fcntl () +.BR F_DUPFD , +and so on) are always compatible: +if a new lock is placed on an already locked region, +then the existing lock is converted to the new lock type. +(Such conversions may result in splitting, shrinking, or coalescing with +an existing lock as discussed above.) +.P +On the other hand, open file description locks may conflict with +each other when they are acquired via different open file descriptions. +Thus, the threads in a multithreaded program can use +open file description locks to synchronize access to a file region +by having each thread perform its own +.BR open (2) +on the file and applying locks via the resulting file descriptor. +.P +As with traditional advisory locks, the third argument to +.BR fcntl (), +.IR lock , +is a pointer to an +.I flock +structure. +By contrast with traditional record locks, the +.I l_pid +field of that structure must be set to zero +when using the operations described below. +.P +The operations for working with open file description locks are analogous +to those used with traditional locks: +.TP +.BR F_OFD_SETLK " (\fIstruct flock *\fP)" +Acquire an open file description lock (when +.I l_type +is +.B F_RDLCK +or +.BR F_WRLCK ) +or release an open file description lock (when +.I l_type +is +.BR F_UNLCK ) +on the bytes specified by the +.IR l_whence ", " l_start ", and " l_len +fields of +.IR lock . +If a conflicting lock is held by another process, +this call returns \-1 and sets +.I errno +to +.BR EAGAIN . +.TP +.BR F_OFD_SETLKW " (\fIstruct flock *\fP)" +As for +.BR F_OFD_SETLK , +but if a conflicting lock is held on the file, then wait for that lock to be +released. 
+If a signal is caught while waiting, then the call is interrupted +and (after the signal handler has returned) returns immediately +(with return value \-1 and +.I errno +set to +.BR EINTR ; +see +.BR signal (7)). +.TP +.BR F_OFD_GETLK " (\fIstruct flock *\fP)" +On input to this call, +.I lock +describes an open file description lock we would like to place on the file. +If the lock could be placed, +.BR fcntl () +does not actually place it, but returns +.B F_UNLCK +in the +.I l_type +field of +.I lock +and leaves the other fields of the structure unchanged. +If one or more incompatible locks would prevent this lock being placed, +then details about one of these locks are returned via +.IR lock , +as described above for +.BR F_GETLK . +.P +In the current implementation, +.\" commit 57b65325fe34ec4c917bc4e555144b4a94d9e1f7 +no deadlock detection is performed for open file description locks. +(This contrasts with process-associated record locks, +for which the kernel does perform deadlock detection.) +.\" +.SS Mandatory locking +.IR Warning : +the Linux implementation of mandatory locking is unreliable. +See BUGS below. +Because of these bugs, +and the fact that the feature is believed to be little used, +since Linux 4.5, mandatory locking has been made an optional feature, +governed by a configuration option +.RB ( CONFIG_MANDATORY_FILE_LOCKING ). +This feature is no longer supported at all in Linux 5.15 and above. +.P +By default, both traditional (process-associated) and open file description +record locks are advisory. +Advisory locks are not enforced and are useful only between +cooperating processes. +.P +Both lock types can also be mandatory. +Mandatory locks are enforced for all processes. +If a process tries to perform an incompatible access (e.g., +.BR read (2) +or +.BR write (2)) +on a file region that has an incompatible mandatory lock, +then the result depends upon whether the +.B O_NONBLOCK +flag is enabled for its open file description. 
+If the +.B O_NONBLOCK +flag is not enabled, then +the system call is blocked until the lock is removed +or converted to a mode that is compatible with the access. +If the +.B O_NONBLOCK +flag is enabled, then the system call fails with the error +.BR EAGAIN . +.P +To make use of mandatory locks, mandatory locking must be enabled +both on the filesystem that contains the file to be locked, +and on the file itself. +Mandatory locking is enabled on a filesystem +using the "\-o mand" option to +.BR mount (8), +or the +.B MS_MANDLOCK +flag for +.BR mount (2). +Mandatory locking is enabled on a file by disabling +group execute permission on the file and enabling the set-group-ID +permission bit (see +.BR chmod (1) +and +.BR chmod (2)). +.P +Mandatory locking is not specified by POSIX. +Some other systems also support mandatory locking, +although the details of how to enable it vary across systems. +.\" +.SS Lost locks +When an advisory lock is obtained on a networked filesystem such as +NFS it is possible that the lock might get lost. +This may happen due to administrative action on the server, or due to a +network partition (i.e., loss of network connectivity with the server) +which lasts long enough for the server to assume +that the client is no longer functioning. +.P +When the filesystem determines that a lock has been lost, future +.BR read (2) +or +.BR write (2) +requests may fail with the error +.BR EIO . +This error will persist until the lock is removed or the file +descriptor is closed. +Since Linux 3.12, +.\" commit ef1820f9be27b6ad158f433ab38002ab8131db4d +this happens at least for NFSv4 (including all minor versions). +.P +Some versions of UNIX send a signal +.RB ( SIGLOST ) +in this circumstance. +Linux does not define this signal, and does not provide any +asynchronous notification of lost locks. 
+.\" +.SS Managing signals +.BR F_GETOWN , +.BR F_SETOWN , +.BR F_GETOWN_EX , +.BR F_SETOWN_EX , +.BR F_GETSIG , +and +.B F_SETSIG +are used to manage I/O availability signals: +.TP +.BR F_GETOWN " (\fIvoid\fP)" +Return (as the function result) +the process ID or process group ID currently receiving +.B SIGIO +and +.B SIGURG +signals for events on file descriptor +.IR fd . +Process IDs are returned as positive values; +process group IDs are returned as negative values (but see BUGS below). +.I arg +is ignored. +.TP +.BR F_SETOWN " (\fIint\fP)" +Set the process ID or process group ID that will receive +.B SIGIO +and +.B SIGURG +signals for events on the file descriptor +.IR fd . +The target process or process group ID is specified in +.IR arg . +A process ID is specified as a positive value; +a process group ID is specified as a negative value. +Most commonly, the calling process specifies itself as the owner +(that is, +.I arg +is specified as +.BR getpid (2)). +.IP +As well as setting the file descriptor owner, +one must also enable generation of signals on the file descriptor. +This is done by using the +.BR fcntl () +.B F_SETFL +operation to set the +.B O_ASYNC +file status flag on the file descriptor. +Subsequently, a +.B SIGIO +signal is sent whenever input or output becomes possible +on the file descriptor. +The +.BR fcntl () +.B F_SETSIG +operation can be used to obtain delivery of a signal other than +.BR SIGIO . +.IP +Sending a signal to the owner process (group) specified by +.B F_SETOWN +is subject to the same permissions checks as are described for +.BR kill (2), +where the sending process is the one that employs +.B F_SETOWN +(but see BUGS below). +If this permission check fails, then the signal is +silently discarded. +.IR Note : +The +.B F_SETOWN +operation records the caller's credentials at the time of the +.BR fcntl () +call, +and it is these saved credentials that are used for the permission checks. 
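The two steps described above for F_SETOWN (naming an owner, then enabling O_ASYNC with F_SETFL) might be combined as in this sketch. The function name enable_sigio is hypothetical, and the caller is assumed to have installed a SIGIO handler or blocked the signal beforehand, since the default action for SIGIO terminates the process.

```c
#define _GNU_SOURCE             /* for O_ASYNC in glibc */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Make the calling process the owner of fd, then enable signal-driven
   I/O so that SIGIO is sent when input or output becomes possible.
   Returns 0 on success, -1 on error (errno is left set). */
int enable_sigio(int fd)
{
    assert(fd >= 0);

    /* Step 1: direct SIGIO/SIGURG for this descriptor to ourselves. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    /* Step 2: add O_ASYNC to the existing file status flags. */
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}
```

Setting the owner before enabling O_ASYNC is a design choice in this sketch: it avoids a window in which signals are generated while no owner has been named.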
+.IP +If the file descriptor +.I fd +refers to a socket, +.B F_SETOWN +also selects +the recipient of +.B SIGURG +signals that are delivered when out-of-band +data arrives on that socket. +.RB ( SIGURG +is sent in any situation where +.BR select (2) +would report the socket as having an "exceptional condition".) +.\" The following appears to be rubbish. It doesn't seem to +.\" be true according to the kernel source, and I can write +.\" a program that gets a terminal-generated SIGIO even though +.\" it is not the foreground process group of the terminal. +.\" -- MTK, 8 Apr 05 +.\" +.\" If the file descriptor +.\" .I fd +.\" refers to a terminal device, then SIGIO +.\" signals are sent to the foreground process group of the terminal. +.IP +The following was true in Linux 2.6.x up to and including Linux 2.6.11: +.RS +.IP +If a nonzero value is given to +.B F_SETSIG +in a multithreaded process running with a threading library +that supports thread groups (e.g., NPTL), +then a positive value given to +.B F_SETOWN +has a different meaning: +.\" The relevant place in the (2.6) kernel source is the +.\" 'switch' in fs/fcntl.c::send_sigio_to_task() -- MTK, Apr 2005 +instead of being a process ID identifying a whole process, +it is a thread ID identifying a specific thread within a process. +Consequently, it may be necessary to pass +.B F_SETOWN +the result of +.BR gettid (2) +instead of +.BR getpid (2) +to get sensible results when +.B F_SETSIG +is used. +(In current Linux threading implementations, +a main thread's thread ID is the same as its process ID. +This means that a single-threaded program can equally use +.BR gettid (2) +or +.BR getpid (2) +in this scenario.) +Note, however, that the statements in this paragraph do not apply +to the +.B SIGURG +signal generated for out-of-band data on a socket: +this signal is always sent to either a process or a process group, +depending on the value given to +.BR F_SETOWN . 
+.\" send_sigurg()/send_sigurg_to_task() bypasses +.\" kill_fasync()/send_sigio()/send_sigio_to_task() +.\" to directly call send_group_sig_info() +.\" -- MTK, Apr 2005 (kernel 2.6.11) +.RE +.IP +The above behavior was accidentally dropped in Linux 2.6.12, +and won't be restored. +From Linux 2.6.32 onward, use +.B F_SETOWN_EX +to target +.B SIGIO +and +.B SIGURG +signals at a particular thread. +.TP +.BR F_GETOWN_EX " (\fIstruct f_owner_ex *\fP) (since Linux 2.6.32)" +Return the current file descriptor owner settings +as defined by a previous +.B F_SETOWN_EX +operation. +The information is returned in the structure pointed to by +.IR arg , +which has the following form: +.IP +.in +4n +.EX +struct f_owner_ex { + int type; + pid_t pid; +}; +.EE +.in +.IP +The +.I type +field will have one of the values +.BR F_OWNER_TID , +.BR F_OWNER_PID , +or +.BR F_OWNER_PGRP . +The +.I pid +field is a positive integer representing a thread ID, process ID, +or process group ID. +See +.B F_SETOWN_EX +for more details. +.TP +.BR F_SETOWN_EX " (\fIstruct f_owner_ex *\fP) (since Linux 2.6.32)" +This operation performs a similar task to +.BR F_SETOWN . +It allows the caller to direct I/O availability signals +to a specific thread, process, or process group. +The caller specifies the target of signals via +.IR arg , +which is a pointer to a +.I f_owner_ex +structure. +The +.I type +field has one of the following values, which define how +.I pid +is interpreted: +.RS +.TP +.B F_OWNER_TID +Send the signal to the thread whose thread ID +(the value returned by a call to +.BR clone (2) +or +.BR gettid (2)) +is specified in +.IR pid . +.TP +.B F_OWNER_PID +Send the signal to the process whose ID +is specified in +.IR pid . +.TP +.B F_OWNER_PGRP +Send the signal to the process group whose ID +is specified in +.IR pid . +(Note that, unlike with +.BR F_SETOWN , +a process group ID is specified as a positive value here.) 
+.RE +.TP +.BR F_GETSIG " (\fIvoid\fP)" +Return (as the function result) +the signal sent when input or output becomes possible. +A value of zero means +.B SIGIO +is sent. +Any other value (including +.BR SIGIO ) +is the +signal sent instead, and in this case additional info is available to +the signal handler if installed with +.BR SA_SIGINFO . +.I arg +is ignored. +.TP +.BR F_SETSIG " (\fIint\fP)" +Set the signal sent when input or output becomes possible +to the value given in +.IR arg . +A value of zero means to send the default +.B SIGIO +signal. +Any other value (including +.BR SIGIO ) +is the signal to send instead, and in this case additional info +is available to the signal handler if installed with +.BR SA_SIGINFO . +.\" +.\" The following was true only up until Linux 2.6.11: +.\" +.\" Additionally, passing a nonzero value to +.\" .B F_SETSIG +.\" changes the signal recipient from a whole process to a specific thread +.\" within a process. +.\" See the description of +.\" .B F_SETOWN +.\" for more details. +.IP +By using +.B F_SETSIG +with a nonzero value, and setting +.B SA_SIGINFO +for the +signal handler (see +.BR sigaction (2)), +extra information about I/O events is passed to +the handler in a +.I siginfo_t +structure. +If the +.I si_code +field indicates the source is +.BR SI_SIGIO , +the +.I si_fd +field gives the file descriptor associated with the event. +Otherwise, +there is no indication which file descriptors are pending, and you +should use the usual mechanisms +.RB ( select (2), +.BR poll (2), +.BR read (2) +with +.B O_NONBLOCK +set etc.) to determine which file descriptors are available for I/O. +.IP +Note that the file descriptor provided in +.I si_fd +is the one that was specified during the +.B F_SETSIG +operation. +This can lead to an unusual corner case. 
+If the file descriptor is duplicated +.RB ( dup (2) +or similar), and the original file descriptor is closed, +then I/O events will continue to be generated, but the +.I si_fd +field will contain the number of the now closed file descriptor. +.IP +By selecting a real time signal (value >= +.BR SIGRTMIN ), +multiple I/O events may be queued using the same signal numbers. +(Queuing is dependent on available memory.) +Extra information is available +if +.B SA_SIGINFO +is set for the signal handler, as above. +.IP +Note that Linux imposes a limit on the +number of real-time signals that may be queued to a +process (see +.BR getrlimit (2) +and +.BR signal (7)) +and if this limit is reached, then the kernel reverts to +delivering +.BR SIGIO , +and this signal is delivered to the entire +process rather than to a specific thread. +.\" See fs/fcntl.c::send_sigio_to_task() (2.4/2.6) sources -- MTK, Apr 05 +.P +Using these mechanisms, a program can implement fully asynchronous I/O +without using +.BR select (2) +or +.BR poll (2) +most of the time. +.P +The use of +.B O_ASYNC +is specific to BSD and Linux. +The only use of +.B F_GETOWN +and +.B F_SETOWN +specified in POSIX.1 is in conjunction with the use of the +.B SIGURG +signal on sockets. +(POSIX does not specify the +.B SIGIO +signal.) +.BR F_GETOWN_EX , +.BR F_SETOWN_EX , +.BR F_GETSIG , +and +.B F_SETSIG +are Linux-specific. +POSIX has asynchronous I/O and the +.I aio_sigevent +structure to achieve similar things; these are also available +in Linux as part of the GNU C Library (glibc). +.SS Leases +.B F_SETLEASE +and +.B F_GETLEASE +(Linux 2.4 onward) are used to establish a new lease, +and retrieve the current lease, on the open file description +referred to by the file descriptor +.IR fd . 
+A file lease provides a mechanism whereby the process holding +the lease (the "lease holder") is notified (via delivery of a signal) +when a process (the "lease breaker") tries to +.BR open (2) +or +.BR truncate (2) +the file referred to by that file descriptor. +.TP +.BR F_SETLEASE " (\fIint\fP)" +Set or remove a file lease according to which of the following +values is specified in the integer +.IR arg : +.RS +.TP +.B F_RDLCK +Take out a read lease. +This will cause the calling process to be notified when +the file is opened for writing or is truncated. +.\" The following became true in Linux 2.6.10: +.\" See the man-pages-2.09 Changelog for further info. +A read lease can be placed only on a file descriptor that +is opened read-only. +.TP +.B F_WRLCK +Take out a write lease. +This will cause the caller to be notified when +the file is opened for reading or writing or is truncated. +A write lease may be placed on a file only if there are no +other open file descriptors for the file. +.TP +.B F_UNLCK +Remove our lease from the file. +.RE +.P +Leases are associated with an open file description (see +.BR open (2)). +This means that duplicate file descriptors (created by, for example, +.BR fork (2) +or +.BR dup (2)) +refer to the same lease, and this lease may be modified +or released using any of these descriptors. +Furthermore, the lease is released by either an explicit +.B F_UNLCK +operation on any of these duplicate file descriptors, or when all +such file descriptors have been closed. +.P +Leases may be taken out only on regular files. +An unprivileged process may take out a lease only on a file whose +UID (owner) matches the filesystem UID of the process. +A process with the +.B CAP_LEASE +capability may take out leases on arbitrary files. 
+.TP +.BR F_GETLEASE " (\fIvoid\fP)" +Indicates what type of lease is associated with the file descriptor +.I fd +by returning either +.BR F_RDLCK ", " F_WRLCK ", or " F_UNLCK , +indicating, respectively, a read lease, a write lease, or no lease. +.I arg +is ignored. +.P +When a process (the "lease breaker") performs an +.BR open (2) +or +.BR truncate (2) +that conflicts with a lease established via +.BR F_SETLEASE , +the system call is blocked by the kernel and +the kernel notifies the lease holder by sending it a signal +.RB ( SIGIO +by default). +The lease holder should respond to receipt of this signal by doing +whatever cleanup is required in preparation for the file to be +accessed by another process (e.g., flushing cached buffers) and +then either remove or downgrade its lease. +A lease is removed by performing an +.B F_SETLEASE +operation specifying +.I arg +as +.BR F_UNLCK . +If the lease holder currently holds a write lease on the file, +and the lease breaker is opening the file for reading, +then it is sufficient for the lease holder to downgrade +the lease to a read lease. +This is done by performing an +.B F_SETLEASE +operation specifying +.I arg +as +.BR F_RDLCK . +.P +If the lease holder fails to downgrade or remove the lease within +the number of seconds specified in +.IR /proc/sys/fs/lease\-break\-time , +then the kernel forcibly removes or downgrades the lease holder's lease. +.P +Once a lease break has been initiated, +.B F_GETLEASE +returns the target lease type (either +.B F_RDLCK +or +.BR F_UNLCK , +depending on what would be compatible with the lease breaker) +until the lease holder voluntarily downgrades or removes the lease or +the kernel forcibly does so after the lease break timer expires. +.P +Once the lease has been voluntarily or forcibly removed or downgraded, +and assuming the lease breaker has not unblocked its system call, +the kernel permits the lease breaker's system call to proceed.
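The take, query, and release cycle described above can be sketched roughly as follows. This is only an illustration, not part of the original page: the helper name try_read_lease() is hypothetical, the caller is assumed to pass a descriptor opened read-only on a regular file it owns, and a real lease holder would release or downgrade the lease in response to the notification signal rather than immediately.

```c
#define _GNU_SOURCE             /* needed for F_SETLEASE and F_GETLEASE */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): take a read lease on
 * fd, which must be open read-only, query it with F_GETLEASE, and then
 * release it again.
 * Returns the lease type reported while the lease was held (F_RDLCK is
 * expected), or -1 if an fcntl() call failed.
 */
int
try_read_lease(int fd)
{
    int held;

    if (fcntl(fd, F_SETLEASE, F_RDLCK) == -1) {
        perror("F_SETLEASE(F_RDLCK)");
        return -1;
    }

    held = fcntl(fd, F_GETLEASE);
    if (held == -1)
        perror("F_GETLEASE");

    /* A real lease holder would do this only after being signaled,
       once any cached state has been flushed. */
    if (fcntl(fd, F_SETLEASE, F_UNLCK) == -1) {
        perror("F_SETLEASE(F_UNLCK)");
        return -1;
    }

    return held;
}
```

Whether the lease can be taken depends on the filesystem and on file ownership, so callers should be prepared for the helper to return \-1.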
+.P +If the lease breaker's blocked +.BR open (2) +or +.BR truncate (2) +is interrupted by a signal handler, +then the system call fails with the error +.BR EINTR , +but the other steps still occur as described above. +If the lease breaker is killed by a signal while blocked in +.BR open (2) +or +.BR truncate (2), +then the other steps still occur as described above. +If the lease breaker specifies the +.B O_NONBLOCK +flag when calling +.BR open (2), +then the call immediately fails with the error +.BR EWOULDBLOCK , +but the other steps still occur as described above. +.P +The default signal used to notify the lease holder is +.BR SIGIO , +but this can be changed using the +.B F_SETSIG +operation to +.BR fcntl (). +If a +.B F_SETSIG +operation is performed (even one specifying +.BR SIGIO ), +and the signal +handler is established using +.BR SA_SIGINFO , +then the handler will receive a +.I siginfo_t +structure as its second argument, and the +.I si_fd +field of this argument will hold the file descriptor of the leased file +that has been accessed by another process. +(This is useful if the caller holds leases against multiple files.) +.SS File and directory change notification (dnotify) +.TP +.BR F_NOTIFY " (\fIint\fP)" +(Linux 2.4 onward) +Provide notification when the directory referred to by +.I fd +or any of the files that it contains is changed. +The events to be notified are specified in +.IR arg , +which is a bit mask specified by ORing together zero or more of +the following bits: +.P +.RS +.PD 0 +.TP +.B DN_ACCESS +A file was accessed +.RB ( read (2), +.BR pread (2), +.BR readv (2), +and similar) +.TP +.B DN_MODIFY +A file was modified +.RB ( write (2), +.BR pwrite (2), +.BR writev (2), +.BR truncate (2), +.BR ftruncate (2), +and similar). +.TP +.B DN_CREATE +A file was created +.RB ( open (2), +.BR creat (2), +.BR mknod (2), +.BR mkdir (2), +.BR link (2), +.BR symlink (2), +.BR rename (2) +into this directory). 
+.TP +.B DN_DELETE +A file was unlinked +.RB ( unlink (2), +.BR rename (2) +to another directory, +.BR rmdir (2)). +.TP +.B DN_RENAME +A file was renamed within this directory +.RB ( rename (2)). +.TP +.B DN_ATTRIB +The attributes of a file were changed +.RB ( chown (2), +.BR chmod (2), +.BR utime (2), +.BR utimensat (2), +and similar). +.PD +.RE +.IP +(In order to obtain these definitions, the +.B _GNU_SOURCE +feature test macro must be defined before including +.I any +header files.) +.IP +Directory notifications are normally "one-shot", and the application +must reregister to receive further notifications. +Alternatively, if +.B DN_MULTISHOT +is included in +.IR arg , +then notification will remain in effect until explicitly removed. +.IP +.\" The following does seem a poor API-design choice... +A series of +.B F_NOTIFY +requests is cumulative, with the events in +.I arg +being added to the set already monitored. +To disable notification of all events, make an +.B F_NOTIFY +call specifying +.I arg +as 0. +.IP +Notification occurs via delivery of a signal. +The default signal is +.BR SIGIO , +but this can be changed using the +.B F_SETSIG +operation to +.BR fcntl (). +(Note that +.B SIGIO +is one of the nonqueuing standard signals; +switching to the use of a real-time signal means that +multiple notifications can be queued to the process.) +In the latter case, the signal handler receives a +.I siginfo_t +structure as its second argument (if the handler was +established using +.BR SA_SIGINFO ) +and the +.I si_fd +field of this structure contains the file descriptor which +generated the notification (useful when establishing notification +on multiple directories). +.IP +Especially when using +.BR DN_MULTISHOT , +a real time signal should be used for notification, +so that multiple notifications can be queued. 
+.IP +.B NOTE: +New applications should use the +.I inotify +interface (available since Linux 2.6.13), +which provides a much superior interface for obtaining notifications of +filesystem events. +See +.BR inotify (7). +.SS Changing the capacity of a pipe +.TP +.BR F_SETPIPE_SZ " (\fIint\fP; since Linux 2.6.35)" +Change the capacity of the pipe referred to by +.I fd +to be at least +.I arg +bytes. +An unprivileged process can adjust the pipe capacity to any value +between the system page size and the limit defined in +.I /proc/sys/fs/pipe\-max\-size +(see +.BR proc (5)). +Attempts to set the pipe capacity below the page size are silently +rounded up to the page size. +Attempts by an unprivileged process to set the pipe capacity above the limit in +.I /proc/sys/fs/pipe\-max\-size +yield the error +.BR EPERM ; +a privileged process +.RB ( CAP_SYS_RESOURCE ) +can override the limit. +.IP +When allocating the buffer for the pipe, +the kernel may use a capacity larger than +.IR arg , +if that is convenient for the implementation. +(In the current implementation, +the allocation is the next higher power-of-two page-size multiple +of the requested size.) +The actual capacity (in bytes) that is set is returned as the function result. +.IP +Attempting to set the pipe capacity smaller than the amount +of buffer space currently used to store data produces the error +.BR EBUSY . +.IP +Note that because of the way the pages of the pipe buffer +are employed when data is written to the pipe, +the number of bytes that can be written may be less than the nominal size, +depending on the size of the writes. +.TP +.BR F_GETPIPE_SZ " (\fIvoid\fP; since Linux 2.6.35)" +Return (as the function result) the capacity of the pipe referred to by +.IR fd . +.\" +.SS File Sealing +File seals limit the set of allowed operations on a given file. +For each seal that is set on a file, +a specific set of operations will fail with +.B EPERM +on this file from now on. +The file is said to be sealed. 
+The default set of seals depends on the type of the underlying +file and filesystem. +For an overview of file sealing, a discussion of its purpose, +and some code examples, see +.BR memfd_create (2). +.P +Currently, +file seals can be applied only to a file descriptor returned by +.BR memfd_create (2) +(if the +.B MFD_ALLOW_SEALING +was employed). +On other filesystems, all +.BR fcntl () +operations that operate on seals will return +.BR EINVAL . +.P +Seals are a property of an inode. +Thus, all open file descriptors referring to the same inode share +the same set of seals. +Furthermore, seals can never be removed, only added. +.TP +.BR F_ADD_SEALS " (\fIint\fP; since Linux 3.17)" +Add the seals given in the bit-mask argument +.I arg +to the set of seals of the inode referred to by the file descriptor +.IR fd . +Seals cannot be removed again. +Once this call succeeds, the seals are enforced by the kernel immediately. +If the current set of seals includes +.B F_SEAL_SEAL +(see below), then this call will be rejected with +.BR EPERM . +Adding a seal that is already set is a no-op, in case +.B F_SEAL_SEAL +is not set already. +In order to place a seal, the file descriptor +.I fd +must be writable. +.TP +.BR F_GET_SEALS " (\fIvoid\fP; since Linux 3.17)" +Return (as the function result) the current set of seals +of the inode referred to by +.IR fd . +If no seals are set, 0 is returned. +If the file does not support sealing, \-1 is returned and +.I errno +is set to +.BR EINVAL . +.P +The following seals are available: +.TP +.B F_SEAL_SEAL +If this seal is set, any further call to +.BR fcntl () +with +.B F_ADD_SEALS +fails with the error +.BR EPERM . +Therefore, this seal prevents any modifications to the set of seals itself. +If the initial set of seals of a file includes +.BR F_SEAL_SEAL , +then this effectively causes the set of seals to be constant and locked. +.TP +.B F_SEAL_SHRINK +If this seal is set, the file in question cannot be reduced in size. 
+This affects +.BR open (2) +with the +.B O_TRUNC +flag as well as +.BR truncate (2) +and +.BR ftruncate (2). +Those calls fail with +.B EPERM +if you try to shrink the file in question. +Increasing the file size is still possible. +.TP +.B F_SEAL_GROW +If this seal is set, the size of the file in question cannot be increased. +This affects +.BR write (2) +beyond the end of the file, +.BR truncate (2), +.BR ftruncate (2), +and +.BR fallocate (2). +These calls fail with +.B EPERM +if you use them to increase the file size. +If you keep the size or shrink it, those calls still work as expected. +.TP +.B F_SEAL_WRITE +If this seal is set, you cannot modify the contents of the file. +Note that shrinking or growing the size of the file is +still possible and allowed. +.\" One or more other seals are typically used with F_SEAL_WRITE +.\" because, given a file with the F_SEAL_WRITE seal set, then, +.\" while it would no longer be possible to (say) write zeros into +.\" the last 100 bytes of a file, it would still be possible +.\" to (say) shrink the file by 100 bytes using ftruncate(), and +.\" then increase the file size by 100 bytes, which would have +.\" the effect of replacing the last hundred bytes by zeros. +.\" +Thus, this seal is normally used in combination with one of the other seals. +This seal affects +.BR write (2) +and +.BR fallocate (2) +(only in combination with the +.B FALLOC_FL_PUNCH_HOLE +flag). +Those calls fail with +.B EPERM +if this seal is set. +Furthermore, trying to create new shared, writable memory-mappings via +.BR mmap (2) +will also fail with +.BR EPERM . +.IP +Using the +.B F_ADD_SEALS +operation to set the +.B F_SEAL_WRITE +seal fails with +.B EBUSY +if any writable, shared mapping exists. +Such mappings must be unmapped before you can add this seal. +Furthermore, if there are any asynchronous I/O operations +.RB ( io_submit (2)) +pending on the file, +all outstanding writes will be discarded. 
+.TP +.BR F_SEAL_FUTURE_WRITE " (since Linux 5.1)" +The effect of this seal is similar to +.BR F_SEAL_WRITE , +but the contents of the file can still be modified via +shared writable mappings that were created prior to the seal being set. +Any attempt to create a new writable mapping on the file via +.BR mmap (2) +will fail with +.BR EPERM . +Likewise, an attempt to write to the file via +.BR write (2) +will fail with +.BR EPERM . +.IP +Using this seal, +one process can create a memory buffer that it can continue to modify +while sharing that buffer on a "read-only" basis with other processes. +.\" +.SS File read/write hints +Write lifetime hints can be used to inform the kernel about the relative +expected lifetime of writes on a given inode or +via a particular open file description. +(See +.BR open (2) +for an explanation of open file descriptions.) +In this context, the term "write lifetime" means +the expected time the data will live on media, before +being overwritten or erased. +.P +An application may use the different hint values specified below to +separate writes into different write classes, +so that multiple users or applications running on a single storage back-end +can aggregate their I/O patterns in a consistent manner. +However, there are no functional semantics implied by these flags, +and different I/O classes can use the write lifetime hints +in arbitrary ways, so long as the hints are used consistently. +.P +The following operations can be applied to the file descriptor, +.IR fd : +.TP +.BR F_GET_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)" +Returns the value of the read/write hint associated with the underlying inode +referred to by +.IR fd . +.TP +.BR F_SET_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)" +Sets the read/write hint value associated with the +underlying inode referred to by +.IR fd . +This hint persists until either it is explicitly modified or +the underlying filesystem is unmounted. 
+.TP +.BR F_GET_FILE_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)" +Returns the value of the read/write hint associated with +the open file description referred to by +.IR fd . +.TP +.BR F_SET_FILE_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)" +Sets the read/write hint value associated with the open file description +referred to by +.IR fd . +.P +If an open file description has not been assigned a read/write hint, +then it shall use the value assigned to the inode, if any. +.P +The following read/write +hints are valid since Linux 4.13: +.TP +.B RWH_WRITE_LIFE_NOT_SET +No specific hint has been set. +This is the default value. +.TP +.B RWH_WRITE_LIFE_NONE +No specific write lifetime is associated with this file or inode. +.TP +.B RWH_WRITE_LIFE_SHORT +Data written to this inode or via this open file description +is expected to have a short lifetime. +.TP +.B RWH_WRITE_LIFE_MEDIUM +Data written to this inode or via this open file description +is expected to have a lifetime longer than +data written with +.BR RWH_WRITE_LIFE_SHORT . +.TP +.B RWH_WRITE_LIFE_LONG +Data written to this inode or via this open file description +is expected to have a lifetime longer than +data written with +.BR RWH_WRITE_LIFE_MEDIUM . +.TP +.B RWH_WRITE_LIFE_EXTREME +Data written to this inode or via this open file description +is expected to have a lifetime longer than +data written with +.BR RWH_WRITE_LIFE_LONG . +.P +All the write-specific hints are relative to each other, +and no individual absolute meaning should be attributed to them. +.SH RETURN VALUE +For a successful call, the return value depends on the operation: +.TP +.B F_DUPFD +The new file descriptor. +.TP +.B F_GETFD +Value of file descriptor flags. +.TP +.B F_GETFL +Value of file status flags. +.TP +.B F_GETLEASE +Type of lease held on file descriptor. +.TP +.B F_GETOWN +Value of file descriptor owner. 
+.TP +.B F_GETSIG +Value of signal sent when read or write becomes possible, or zero +for traditional +.B SIGIO +behavior. +.TP +.B F_GETPIPE_SZ +.TQ +.B F_SETPIPE_SZ +The pipe capacity. +.TP +.B F_GET_SEALS +A bit mask identifying the seals that have been set +for the inode referred to by +.IR fd . +.TP +All other operations +Zero. +.P +On error, \-1 is returned, and +.I errno +is set to indicate the error. +.SH ERRORS +.TP +.BR EACCES " or " EAGAIN +Operation is prohibited by locks held by other processes. +.TP +.B EAGAIN +The operation is prohibited because the file has been memory-mapped by +another process. +.TP +.B EBADF +.I fd +is not an open file descriptor +.TP +.B EBADF +.I op +is +.B F_SETLK +or +.B F_SETLKW +and the file descriptor open mode doesn't match with the +type of lock requested. +.TP +.B EBUSY +.I op +is +.B F_SETPIPE_SZ +and the new pipe capacity specified in +.I arg +is smaller than the amount of buffer space currently +used to store data in the pipe. +.TP +.B EBUSY +.I op +is +.BR F_ADD_SEALS , +.I arg +includes +.BR F_SEAL_WRITE , +and there exists a writable, shared mapping on the file referred to by +.IR fd . +.TP +.B EDEADLK +It was detected that the specified +.B F_SETLKW +operation would cause a deadlock. +.TP +.B EFAULT +.I lock +is outside your accessible address space. +.TP +.B EINTR +.I op +is +.B F_SETLKW +or +.B F_OFD_SETLKW +and the operation was interrupted by a signal; see +.BR signal (7). +.TP +.B EINTR +.I op +is +.BR F_GETLK , +.BR F_SETLK , +.BR F_OFD_GETLK , +or +.BR F_OFD_SETLK , +and the operation was interrupted by a signal before the lock was checked or +acquired. +Most likely when locking a remote file (e.g., locking over +NFS), but can sometimes happen locally. +.TP +.B EINVAL +The value specified in +.I op +is not recognized by this kernel. +.TP +.B EINVAL +.I op +is +.B F_ADD_SEALS +and +.I arg +includes an unrecognized sealing bit. 
+.TP +.B EINVAL +.I op +is +.B F_ADD_SEALS +or +.B F_GET_SEALS +and the filesystem containing the inode referred to by +.I fd +does not support sealing. +.TP +.B EINVAL +.I op +is +.B F_DUPFD +and +.I arg +is negative or is greater than the maximum allowable value +(see the discussion of +.B RLIMIT_NOFILE +in +.BR getrlimit (2)). +.TP +.B EINVAL +.I op +is +.B F_SETSIG +and +.I arg +is not an allowable signal number. +.TP +.B EINVAL +.I op +is +.BR F_OFD_SETLK , +.BR F_OFD_SETLKW , +or +.BR F_OFD_GETLK , +and +.I l_pid +was not specified as zero. +.TP +.B EMFILE +.I op +is +.B F_DUPFD +and the per-process limit on the number of open file descriptors +has been reached. +.TP +.B ENOLCK +Too many segment locks open, lock table is full, or a remote locking +protocol failed (e.g., locking over NFS). +.TP +.B ENOTDIR +.B F_NOTIFY +was specified in +.IR op , +but +.I fd +does not refer to a directory. +.TP +.B EPERM +.I op +is +.B F_SETPIPE_SZ +and the soft or hard user pipe limit has been reached; see +.BR pipe (7). +.TP +.B EPERM +Attempted to clear the +.B O_APPEND +flag on a file that has the append-only attribute set. +.TP +.B EPERM +.I op +was +.BR F_ADD_SEALS , +but +.I fd +was not open for writing +or the current set of seals on the file already includes +.BR F_SEAL_SEAL . +.SH STANDARDS +POSIX.1-2008. +.P +.BR F_GETOWN_EX , +.BR F_SETOWN_EX , +.BR F_SETPIPE_SZ , +.BR F_GETPIPE_SZ , +.BR F_GETSIG , +.BR F_SETSIG , +.BR F_NOTIFY , +.BR F_GETLEASE , +and +.B F_SETLEASE +are Linux-specific. +(Define the +.B _GNU_SOURCE +macro to obtain these definitions.) +.\" .P +.\" SVr4 documents additional EIO, ENOLINK and EOVERFLOW error conditions. +.P +.BR F_OFD_SETLK , +.BR F_OFD_SETLKW , +and +.B F_OFD_GETLK +are Linux-specific (and one must define +.B _GNU_SOURCE +to obtain their definitions), +but work is being done to have them included in the next version of POSIX.1. +.P +.B F_ADD_SEALS +and +.B F_GET_SEALS +are Linux-specific. +.\" FIXME . 
Once glibc adds support, add a note about FTM requirements +.SH HISTORY +SVr4, 4.3BSD, POSIX.1-2001. +.P +Only the operations +.BR F_DUPFD , +.BR F_GETFD , +.BR F_SETFD , +.BR F_GETFL , +.BR F_SETFL , +.BR F_GETLK , +.BR F_SETLK , +and +.B F_SETLKW +are specified in POSIX.1-2001. +.P +.B F_GETOWN +and +.B F_SETOWN +are specified in POSIX.1-2001. +(To get their definitions, define either +.\" .BR _BSD_SOURCE , +.\" or +.B _XOPEN_SOURCE +with the value 500 or greater, or +.B _POSIX_C_SOURCE +with the value 200809L or greater.) +.P +.B F_DUPFD_CLOEXEC +is specified in POSIX.1-2008. +(To get this definition, define +.B _POSIX_C_SOURCE +with the value 200809L or greater, or +.B _XOPEN_SOURCE +with the value 700 or greater.) +.SH NOTES +The errors returned by +.BR dup2 (2) +are different from those returned by +.BR F_DUPFD . +.\" +.SS File locking +The original Linux +.BR fcntl () +system call was not designed to handle large file offsets +(in the +.I flock +structure). +Consequently, an +.BR fcntl64 () +system call was added in Linux 2.4. +The newer system call employs a different structure for file locking, +.IR flock64 , +and corresponding operations, +.BR F_GETLK64 , +.BR F_SETLK64 , +and +.BR F_SETLKW64 . +However, these details can be ignored by applications using glibc, whose +.BR fcntl () +wrapper function transparently employs the more recent system call +where it is available. +.\" +.SS Record locks +Since Linux 2.0, there is no interaction between the types of lock +placed by +.BR flock (2) +and +.BR fcntl (). +.P +Several systems have more fields in +.I "struct flock" +such as, for example, +.I l_sysid +(to identify the machine where the lock is held). +.\" e.g., Solaris 8 documents this field in fcntl(2), and Irix 6.5 +.\" documents it in fcntl(5). mtk, May 2007 +.\" Also, FreeBSD documents it (Apr 2014). 
+Clearly, +.I l_pid +alone is not going to be very useful if the process holding the lock +may live on a different machine; +on Linux, while present on some architectures (such as MIPS32), +this field is not used. +.SS Record locking and NFS +Before Linux 3.12, if an NFSv4 client +loses contact with the server for a period of time +(defined as more than 90 seconds with no communication), +.\" +.\" Neil Brown: With NFSv3 the failure mode is the reverse. If +.\" the server loses contact with a client then any lock stays in place +.\" indefinitely ("why can't I read my mail"... I remember it well). +.\" +it might lose and regain a lock without ever being aware of the fact. +(The period of time after which contact is assumed lost is known as +the NFSv4 leasetime. +On a Linux NFS server, this can be determined by looking at +.IR /proc/fs/nfsd/nfsv4leasetime , +which expresses the period in seconds. +The default value for this file is 90.) +.\" +.\" Jeff Layton: +.\" Note that this is not a firm timeout. The server runs a job +.\" periodically to clean out expired stateful objects, and it's likely +.\" that there is some time (maybe even up to another whole lease period) +.\" between when the timeout expires and the job actually runs. If the +.\" client gets a RENEW in there within that window, its lease will be +.\" renewed and its state preserved.
+.\" +This scenario potentially risks data corruption, +since another process might acquire a lock in the intervening period +and perform file I/O. +.P +Since Linux 3.12, +.\" commit ef1820f9be27b6ad158f433ab38002ab8131db4d +if an NFSv4 client loses contact with the server, +any I/O to the file by a process which "thinks" it holds +a lock will fail until that process closes and reopens the file. +A kernel parameter, +.IR nfs.recover_lost_locks , +can be set to 1 to obtain the pre-3.12 behavior, +whereby the client will attempt to recover lost locks +when contact is reestablished with the server. +Because of the attendant risk of data corruption, +.\" commit f6de7a39c181dfb8a2c534661a53c73afb3081cd +this parameter defaults to 0 (disabled). +.SH BUGS +.SS F_SETFL +It is not possible to use +.B F_SETFL +to change the state of the +.B O_DSYNC +and +.B O_SYNC +flags. +.\" FIXME . According to POSIX.1-2001, O_SYNC should also be modifiable +.\" via fcntl(2), but currently Linux does not permit this +.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5994 +Attempts to change the state of these flags are silently ignored. +.SS F_GETOWN +A limitation of the Linux system call conventions on some +architectures (notably i386) means that if a (negative) +process group ID to be returned by +.B F_GETOWN +falls in the range \-1 to \-4095, then the return value is wrongly +interpreted by glibc as an error in the system call; +.\" glibc source: sysdeps/unix/sysv/linux/i386/sysdep.h +that is, the return value of +.BR fcntl () +will be \-1, and +.I errno +will contain the (positive) process group ID. +The Linux-specific +.B F_GETOWN_EX +operation avoids this problem. +.\" mtk, Dec 04: some limited testing on alpha and ia64 seems to +.\" indicate that ANY negative PGID value will cause F_GETOWN +.\" to misinterpret the return as an error. Some other architectures +.\" seem to have the same range check as i386. 
+Since glibc 2.11, glibc makes the kernel +.B F_GETOWN +problem invisible by implementing +.B F_GETOWN +using +.BR F_GETOWN_EX . +.SS F_SETOWN +In Linux 2.4 and earlier, there is a bug that can occur +when an unprivileged process uses +.B F_SETOWN +to specify the owner +of a socket file descriptor +as a process (group) other than the caller. +In this case, +.BR fcntl () +can return \-1 with +.I errno +set to +.BR EPERM , +even when the owner process (group) is one that the caller +has permission to send signals to. +Despite this error return, the file descriptor owner is set, +and signals will be sent to the owner. +.\" +.SS Deadlock detection +The deadlock-detection algorithm employed by the kernel when dealing with +.B F_SETLKW +requests can yield both +false negatives (failures to detect deadlocks, +leaving a set of deadlocked processes blocked indefinitely) +and false positives +.RB ( EDEADLK +errors when there is no deadlock). +For example, +the kernel limits the lock depth of its dependency search to 10 steps, +meaning that circular deadlock chains that exceed +that size will not be detected. +In addition, the kernel may falsely indicate a deadlock +when two or more processes created using the +.BR clone (2) +.B CLONE_FILES +flag place locks that appear (to the kernel) to conflict.
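Because EDEADLK may be a false positive, a caller of F_SETLKW may want to report it separately from other failures. The following is a minimal sketch under that assumption. The helper name lock_whole_file() is hypothetical, and retrying unconditionally on EINTR is a design choice that would not suit a program that uses a signal (for example, from alarm(2)) to bound the wait.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): block until a record
 * lock of the given type (F_RDLCK or F_WRLCK) covering the whole file
 * is acquired on fd.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int
lock_whole_file(int fd, short type)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,             /* 0 means "through end of file" */
    };

    for (;;) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;

        if (errno == EINTR)
            continue;           /* interrupted by a signal handler: retry */

        if (errno == EDEADLK) {
            int saved = errno;  /* fprintf() may change errno */

            fprintf(stderr, "F_SETLKW: deadlock reported "
                    "(possibly a false positive)\n");
            errno = saved;
        }
        return -1;
    }
}
```

A false negative cannot be detected this way; the call simply remains blocked.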
+.\" +.SS Mandatory locking +The Linux implementation of mandatory locking +is subject to race conditions which render it unreliable: +.\" http://marc.info/?l=linux-kernel&m=119013491707153&w=2 +.\" +.\" Reconfirmed by Jeff Layton +.\" From: Jeff Layton <jlayton <at> redhat.com> +.\" Subject: Re: Status of fcntl() mandatory locking +.\" Newsgroups: gmane.linux.file-systems +.\" Date: 2014-04-28 10:07:57 GMT +.\" http://thread.gmane.org/gmane.linux.file-systems/84481/focus=84518 +a +.BR write (2) +call that overlaps with a lock may modify data after the mandatory lock is +acquired; +a +.BR read (2) +call that overlaps with a lock may detect changes to data that were made +only after a write lock was acquired. +Similar races exist between mandatory locks and +.BR mmap (2). +It is therefore inadvisable to rely on mandatory locking. +.SH SEE ALSO +.BR dup2 (2), +.BR flock (2), +.BR open (2), +.BR socket (2), +.BR lockf (3), +.BR capabilities (7), +.BR feature_test_macros (7), +.BR lslocks (8) +.P +.IR locks.txt , +.IR mandatory\-locking.txt , +and +.I dnotify.txt +in the Linux kernel source directory +.I Documentation/filesystems/ +(on older kernels, these files are directly under the +.I Documentation/ +directory, and +.I mandatory\-locking.txt +is called +.IR mandatory.txt )
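As an illustration of the F_SETPIPE_SZ and F_GETPIPE_SZ operations described under "Changing the capacity of a pipe", the sketch below requests a capacity and reports what the kernel actually set. It is not part of the original page, and the helper name set_pipe_capacity() is hypothetical. The returned capacity may be larger than the request because the kernel may round the value up.

```c
#define _GNU_SOURCE             /* needed for F_SETPIPE_SZ and F_GETPIPE_SZ */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): request a capacity of
 * at least `want` bytes for the pipe referred to by fd.
 * Returns the capacity the kernel actually set, or -1 on error (for
 * example, EPERM above pipe-max-size for an unprivileged process, or
 * EBUSY if the pipe currently holds more data than `want`).
 */
int
set_pipe_capacity(int fd, int want)
{
    int before, actual;

    before = fcntl(fd, F_GETPIPE_SZ);
    if (before == -1) {
        perror("F_GETPIPE_SZ");
        return -1;
    }

    /* On success, the return value is the capacity that was set. */
    actual = fcntl(fd, F_SETPIPE_SZ, want);
    if (actual == -1) {
        perror("F_SETPIPE_SZ");
        return -1;
    }

    printf("pipe capacity: %d -> %d bytes (requested %d)\n",
           before, actual, want);
    return actual;
}
```

Either end of the pipe can be passed as fd, since both descriptors refer to the same pipe.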