Diffstat (limited to 'man/man2/fcntl.2')
-rw-r--r-- man/man2/fcntl.2 2113
1 files changed, 2113 insertions, 0 deletions
diff --git a/man/man2/fcntl.2 b/man/man2/fcntl.2
new file mode 100644
index 0000000..9f5e197
--- /dev/null
+++ b/man/man2/fcntl.2
@@ -0,0 +1,2113 @@
+.\" This manpage is Copyright (C) 1992 Drew Eckhardt;
+.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson;
+.\" and Copyright (C) 1998 Jamie Lokier;
+.\" and Copyright (C) 2002-2010, 2014 Michael Kerrisk;
+.\" and Copyright (C) 2014 Jeff Layton
+.\" and Copyright (C) 2014 David Herrmann
+.\" and Copyright (C) 2017 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.\" Modified 1993-07-24 by Rik Faith <faith@cs.unc.edu>
+.\" Modified 1995-09-26 by Andries Brouwer <aeb@cwi.nl>
+.\" and again on 960413 and 980804 and 981223.
+.\" Modified 1998-12-11 by Jamie Lokier <jamie@imbolc.ucc.ie>
+.\" Applied correction by Christian Ehrhardt - aeb, 990712
+.\" Modified 2002-04-23 by Michael Kerrisk <mtk.manpages@gmail.com>
+.\" Added note on F_SETFL and O_DIRECT
+.\" Complete rewrite + expansion of material on file locking
+.\" Incorporated description of F_NOTIFY, drawing on
+.\" Stephen Rothwell's notes in Documentation/dnotify.txt.
+.\" Added description of F_SETLEASE and F_GETLEASE
+.\" Corrected and polished, aeb, 020527.
+.\" Modified 2004-03-03 by Michael Kerrisk <mtk.manpages@gmail.com>
+.\" Modified description of file leases: fixed some errors of detail
+.\" Replaced the term "lease contestant" by "lease breaker"
+.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
+.\" Added notes on capability requirements
+.\" Modified 2004-12-08, added O_NOATIME after note from Martin Pool
+.\" 2004-12-10, mtk, noted F_GETOWN bug after suggestion from aeb.
+.\" 2005-04-08 Jamie Lokier <jamie@shareable.org>, mtk
+.\" Described behavior of F_SETOWN/F_SETSIG in
+.\" multithreaded processes, and generally cleaned
+.\" up the discussion of F_SETOWN.
+.\" 2005-05-20, Johannes Nicolai <johannes.nicolai@hpi.uni-potsdam.de>,
+.\" mtk: Noted F_SETOWN bug for socket file descriptor in Linux 2.4
+.\" and earlier. Added text on permissions required to send signal.
+.\" 2009-09-30, Michael Kerrisk
+.\" Note obsolete F_SETOWN behavior with threads.
+.\" Document F_SETOWN_EX and F_GETOWN_EX
+.\" 2010-06-17, Michael Kerrisk
+.\" Document F_SETPIPE_SZ and F_GETPIPE_SZ.
+.\" 2014-07-08, David Herrmann <dh.herrmann@gmail.com>
+.\" Document F_ADD_SEALS and F_GET_SEALS
+.\" 2017-06-26, Jens Axboe <axboe@kernel.dk>
+.\" Document F_{GET,SET}_RW_HINT and F_{GET,SET}_FILE_RW_HINT
+.\"
+.TH fcntl 2 2024-05-02 "Linux man-pages (unreleased)"
+.SH NAME
+fcntl \- manipulate file descriptor
+.SH LIBRARY
+Standard C library
+.RI ( libc ", " \-lc )
+.SH SYNOPSIS
+.nf
+.B #include <fcntl.h>
+.P
+.BI "int fcntl(int " fd ", int " op ", ... /* " arg " */ );"
+.fi
+.SH DESCRIPTION
+.BR fcntl ()
+performs one of the operations described below on the open file descriptor
+.IR fd .
+The operation is determined by
+.IR op .
+.P
+.BR fcntl ()
+can take an optional third argument.
+Whether or not this argument is required is determined by
+.IR op .
+The required argument type is indicated in parentheses after each
+.I op
+name (in most cases, the required type is
+.IR int ,
+and we identify the argument using the name
+.IR arg ),
+or
+.I void
+is specified if the argument is not required.
+.P
+Certain of the operations below are supported only since a particular
+Linux kernel version.
+The preferred method of checking whether the host kernel supports
+a particular operation is to invoke
+.BR fcntl ()
+with the desired
+.I op
+value and then test whether the call failed with
+.BR EINVAL ,
+indicating that the kernel does not recognize this value.
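
As a minimal sketch of the probing approach described above (the helper name `fcntl_op_recognized` is hypothetical, not part of any library), the following C function treats `EINVAL` as "operation not recognized". It assumes the operation being probed takes no argument or ignores it, and it really does perform the operation, so it is best suited to side-effect-free queries such as `F_GETFD`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: return 0 if fcntl() rejects 'op' with EINVAL,
 * 1 otherwise. Caveats: EINVAL may also signal a bad argument for a
 * recognized operation, and any other failure (e.g., EBADF) is counted
 * as "recognized" here.
 */
int fcntl_op_recognized(int fd, int op)
{
    /* Pass an explicit 0 so the variadic argument is well defined. */
    if (fcntl(fd, op, 0) == -1 && errno == EINVAL)
        return 0;
    return 1;
}
```

A caller might probe once at startup and fall back to an older mechanism when the function returns 0.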
+.SS Duplicating a file descriptor
+.TP
+.BR F_DUPFD " (\fIint\fP)"
+Duplicate the file descriptor
+.I fd
+using the lowest-numbered available file descriptor greater than or equal to
+.IR arg .
+This is different from
+.BR dup2 (2),
+which uses exactly the file descriptor specified.
+.IP
+On success, the new file descriptor is returned.
+.IP
+See
+.BR dup (2)
+for further details.
+.TP
+.BR F_DUPFD_CLOEXEC " (\fIint\fP; since Linux 2.6.24)"
+As for
+.BR F_DUPFD ,
+but additionally set the
+close-on-exec flag for the duplicate file descriptor.
+Specifying this flag permits a program to avoid an additional
+.BR fcntl ()
+.B F_SETFD
+operation to set the
+.B FD_CLOEXEC
+flag.
+For an explanation of why this flag is useful,
+see the description of
+.B O_CLOEXEC
+in
+.BR open (2).
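
A short sketch of the duplication operations above, assuming a Linux system with glibc (the wrapper name `dup_cloexec_from` is illustrative only). It duplicates a descriptor onto the lowest free number at or above a chosen minimum and sets close-on-exec in the same call, so no separate `F_SETFD` step is needed.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/*
 * Illustrative wrapper: duplicate 'fd' onto the lowest available
 * descriptor >= min_fd, with FD_CLOEXEC set on the duplicate.
 * Returns the new descriptor, or -1 with errno set.
 */
int dup_cloexec_from(int fd, int min_fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, min_fd);
}
```

Only the duplicate carries the flag; the original descriptor's flags are left as they were.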
+.SS File descriptor flags
+The following operations manipulate the flags associated with
+a file descriptor.
+Currently, only one such flag is defined:
+.BR FD_CLOEXEC ,
+the close-on-exec flag.
+If the
+.B FD_CLOEXEC
+bit is set,
+the file descriptor will automatically be closed during a successful
+.BR execve (2).
+(If the
+.BR execve (2)
+fails, the file descriptor is left open.)
+If the
+.B FD_CLOEXEC
+bit is not set, the file descriptor will remain open across an
+.BR execve (2).
+.TP
+.BR F_GETFD " (\fIvoid\fP)"
+Return (as the function result) the file descriptor flags;
+.I arg
+is ignored.
+.TP
+.BR F_SETFD " (\fIint\fP)"
+Set the file descriptor flags to the value specified by
+.IR arg .
+.P
+In multithreaded programs, using
+.BR fcntl ()
+.B F_SETFD
+to set the close-on-exec flag at the same time as another thread performs a
+.BR fork (2)
+plus
+.BR execve (2)
+is vulnerable to a race condition that may unintentionally leak
+the file descriptor to the program executed in the child process.
+See the discussion of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for details and a remedy to the problem.
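
The usual read-modify-write pattern for the descriptor flags can be sketched as follows (the helper name `set_cloexec` is hypothetical). As the text above notes, this two-step sequence is not atomic, so in multithreaded programs it remains exposed to the fork-plus-exec race; it is shown only to illustrate `F_GETFD` and `F_SETFD`.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/*
 * Hypothetical helper: turn the close-on-exec flag on or off while
 * preserving any other descriptor flags. Returns 0 on success, -1 on
 * error. Not atomic with respect to fork()/execve() in other threads.
 */
int set_cloexec(int fd, int on)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    if (on)
        flags |= FD_CLOEXEC;
    else
        flags &= ~FD_CLOEXEC;
    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}
```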
+.SS File status flags
+Each open file description has certain associated status flags,
+initialized by
+.BR open (2)
+.\" or
+.\" .BR creat (2),
+and possibly modified by
+.BR fcntl ().
+Duplicated file descriptors
+(made with
+.BR dup (2),
+.BR fcntl (F_DUPFD),
+.BR fork (2),
+etc.) refer to the same open file description, and thus
+share the same file status flags.
+.P
+The file status flags and their semantics are described in
+.BR open (2).
+.TP
+.BR F_GETFL " (\fIvoid\fP)"
+Return (as the function result)
+the file access mode and the file status flags;
+.I arg
+is ignored.
+.TP
+.BR F_SETFL " (\fIint\fP)"
+Set the file status flags to the value specified by
+.IR arg .
+File access mode
+.RB ( O_RDONLY ", " O_WRONLY ", " O_RDWR )
+and file creation flags
+(i.e.,
+.BR O_CREAT ", " O_EXCL ", " O_NOCTTY ", " O_TRUNC )
+in
+.I arg
+are ignored.
+On Linux, this operation can change only the
+.BR O_APPEND ,
+.BR O_ASYNC ,
+.BR O_DIRECT ,
+.BR O_NOATIME ,
+and
+.B O_NONBLOCK
+flags.
+It is not possible to change the
+.B O_DSYNC
+and
+.B O_SYNC
+flags; see BUGS, below.
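
Because `F_SETFL` replaces the whole set of changeable status flags, callers normally fetch the current value first. A minimal sketch, assuming a hypothetical helper named `set_status_flag`:

```c
#define _GNU_SOURCE
#include <fcntl.h>

/*
 * Hypothetical helper: set or clear one file status flag (for example
 * O_NONBLOCK or O_APPEND) without disturbing the others.
 * Returns 0 on success, -1 on error.
 */
int set_status_flag(int fd, int flag, int on)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    if (on)
        flags |= flag;
    else
        flags &= ~flag;
    /* Access mode and creation flags in 'flags' are ignored by F_SETFL. */
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}
```

Since the flags belong to the open file description, the change is visible through every duplicate of the descriptor.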
+.SS Advisory record locking
+Linux implements traditional ("process-associated") UNIX record locks,
+as standardized by POSIX.
+For a Linux-specific alternative with better semantics,
+see the discussion of open file description locks below.
+.P
+.BR F_SETLK ,
+.BR F_SETLKW ,
+and
+.B F_GETLK
+are used to acquire, release, and test for the existence of record
+locks (also known as byte-range, file-segment, or file-region locks).
+The third argument,
+.IR lock ,
+is a pointer to a structure that has at least the following fields
+(in unspecified order).
+.P
+.in +4n
+.EX
+struct flock {
+ ...
+ short l_type; /* Type of lock: F_RDLCK,
+ F_WRLCK, F_UNLCK */
+ short l_whence; /* How to interpret l_start:
+ SEEK_SET, SEEK_CUR, SEEK_END */
+ off_t l_start; /* Starting offset for lock */
+ off_t l_len; /* Number of bytes to lock */
+ pid_t l_pid; /* PID of process blocking our lock
+ (set by F_GETLK and F_OFD_GETLK) */
+ ...
+};
+.EE
+.in
+.P
+The
+.IR l_whence ", " l_start ", and " l_len
+fields of this structure specify the range of bytes we wish to lock.
+Bytes past the end of the file may be locked,
+but not bytes before the start of the file.
+.P
+.I l_start
+is the starting offset for the lock, and is interpreted
+relative to either:
+the start of the file (if
+.I l_whence
+is
+.BR SEEK_SET );
+the current file offset (if
+.I l_whence
+is
+.BR SEEK_CUR );
+or the end of the file (if
+.I l_whence
+is
+.BR SEEK_END ).
+In the final two cases,
+.I l_start
+can be a negative number provided the
+offset does not lie before the start of the file.
+.P
+.I l_len
+specifies the number of bytes to be locked.
+If
+.I l_len
+is positive, then the range to be locked covers bytes
+.I l_start
+up to and including
+.IR l_start + l_len \-1.
+Specifying 0 for
+.I l_len
+has the special meaning: lock all bytes starting at the
+location specified by
+.IR l_whence " and " l_start
+through to the end of file, no matter how large the file grows.
+.P
+POSIX.1-2001 allows (but does not require)
+an implementation to support a negative
+.I l_len
+value; if
+.I l_len
+is negative, the interval described by
+.I lock
+covers bytes
+.IR l_start + l_len
+up to and including
+.IR l_start \-1.
+This is supported since Linux 2.4.21 and Linux 2.5.49.
+.P
+The
+.I l_type
+field can be used to place a read
+.RB ( F_RDLCK )
+or a write
+.RB ( F_WRLCK )
+lock on a file.
+Any number of processes may hold a read lock (shared lock)
+on a file region, but only one process may hold a write lock
+(exclusive lock).
+An exclusive lock excludes all other locks,
+both shared and exclusive.
+A single process can hold only one type of lock on a file region;
+if a new lock is applied to an already-locked region,
+then the existing lock is converted to the new lock type.
+(Such conversions may involve splitting, shrinking, or coalescing with
+an existing lock if the byte range specified by the new lock does not
+precisely coincide with the range of the existing lock.)
+.TP
+.BR F_SETLK " (\fIstruct flock *\fP)"
+Acquire a lock (when
+.I l_type
+is
+.B F_RDLCK
+or
+.BR F_WRLCK )
+or release a lock (when
+.I l_type
+is
+.BR F_UNLCK )
+on the bytes specified by the
+.IR l_whence ", " l_start ", and " l_len
+fields of
+.IR lock .
+If a conflicting lock is held by another process,
+this call returns \-1 and sets
+.I errno
+to
+.B EACCES
+or
+.BR EAGAIN .
+(The error returned in this case differs across implementations,
+so POSIX requires a portable application to check for both errors.)
+.TP
+.BR F_SETLKW " (\fIstruct flock *\fP)"
+As for
+.BR F_SETLK ,
+but if a conflicting lock is held on the file, then wait for that
+lock to be released.
+If a signal is caught while waiting, then the call is interrupted
+and (after the signal handler has returned)
+returns immediately (with return value \-1 and
+.I errno
+set to
+.BR EINTR ;
+see
+.BR signal (7)).
+.TP
+.BR F_GETLK " (\fIstruct flock *\fP)"
+On input to this call,
+.I lock
+describes a lock we would like to place on the file.
+If the lock could be placed,
+.BR fcntl ()
+does not actually place it, but returns
+.B F_UNLCK
+in the
+.I l_type
+field of
+.I lock
+and leaves the other fields of the structure unchanged.
+.IP
+If one or more incompatible locks would prevent
+this lock being placed, then
+.BR fcntl ()
+returns details about one of those locks in the
+.IR l_type ", " l_whence ", " l_start ", and " l_len
+fields of
+.IR lock .
+If the conflicting lock is a traditional (process-associated) record lock,
+then the
+.I l_pid
+field is set to the PID of the process holding that lock.
+If the conflicting lock is an open file description lock, then
+.I l_pid
+is set to \-1.
+Note that the returned information
+may already be out of date by the time the caller inspects it.
+.P
+In order to place a read lock,
+.I fd
+must be open for reading.
+In order to place a write lock,
+.I fd
+must be open for writing.
+To place both types of lock, open a file read-write.
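
The `F_SETLK` and `F_GETLK` operations described above might be wrapped as in the following sketch. The helper names `try_lock_range` and `probe_lock` are hypothetical, and offsets are taken relative to the start of the file (`SEEK_SET`) for simplicity.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: place (F_RDLCK/F_WRLCK) or release (F_UNLCK) a
 * traditional record lock without blocking.
 * Returns 0 on success, 1 if another process holds a conflicting lock,
 * -1 on any other error.
 */
int try_lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;            /* 0 means "through end of file" */

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 0;
    /* Portable code checks for both errors. */
    if (errno == EACCES || errno == EAGAIN)
        return 1;
    return -1;
}

/*
 * Hypothetical helper: ask whether a lock could be placed.
 * Returns 1 if it could, 0 if it is blocked (storing the holder's PID,
 * or -1 for an open file description lock, in *holder), -1 on error.
 * The answer may be stale by the time the caller acts on it.
 */
int probe_lock(int fd, short type, off_t start, off_t len, pid_t *holder)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    if (fl.l_type == F_UNLCK)
        return 1;
    *holder = fl.l_pid;
    return 0;
}
```

Note that a process never conflicts with its own record locks, so `probe_lock` reports a range as lockable even when the caller itself already holds a lock there.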
+.P
+When placing locks with
+.BR F_SETLKW ,
+the kernel detects
+.IR deadlocks ,
+whereby two or more processes have their
+lock requests mutually blocked by locks held by the other processes.
+For example, suppose process A holds a write lock on byte 100 of a file,
+and process B holds a write lock on byte 200.
+If each process then attempts to lock the byte already
+locked by the other process using
+.BR F_SETLKW ,
+then, without deadlock detection,
+both processes would remain blocked indefinitely.
+When the kernel detects such deadlocks,
+it causes one of the blocking lock requests to immediately fail with the error
+.BR EDEADLK ;
+an application that encounters such an error should release
+some of its locks to allow other applications to proceed before
+attempting to regain the locks that it requires.
+Circular deadlocks involving more than two processes are also detected.
+Note, however, that there are limitations to the kernel's
+deadlock-detection algorithm; see BUGS.
+.P
+As well as being removed by an explicit
+.BR F_UNLCK ,
+record locks are automatically released when the process terminates.
+.P
+Record locks are not inherited by a child created via
+.BR fork (2),
+but are preserved across an
+.BR execve (2).
+.P
+Because of the buffering performed by the
+.BR stdio (3)
+library, the use of record locking with routines in that package
+should be avoided; use
+.BR read (2)
+and
+.BR write (2)
+instead.
+.P
+The record locks described above are associated with the process
+(unlike the open file description locks described below).
+This has some unfortunate consequences:
+.IP \[bu] 3
+If a process closes
+.I any
+file descriptor referring to a file,
+then all of the process's locks on that file are released,
+regardless of the file descriptor(s) on which the locks were obtained.
+.\" (Additional file descriptors referring to the same file
+.\" may have been obtained by calls to
+.\" .BR open "(2), " dup "(2), " dup2 "(2), or " fcntl ().)
+This is bad: it means that a process can lose its locks on
+a file such as
+.I /etc/passwd
+or
+.I /etc/mtab
+when for some reason a library function decides to open, read,
+and close the same file.
+.IP \[bu]
+The threads in a process share locks.
+In other words,
+a multithreaded program can't use record locking to ensure
+that threads don't simultaneously access the same region of a file.
+.P
+Open file description locks solve both of these problems.
+.SS Open file description locks (non-POSIX)
+Open file description locks are advisory byte-range locks whose operation is
+in most respects identical to the traditional record locks described above.
+This lock type is Linux-specific,
+and available since Linux 3.15.
+(There is a proposal with the Austin Group
+.\" FIXME . Review progress into POSIX
+.\" http://austingroupbugs.net/view.php?id=768
+to include this lock type in the next revision of POSIX.1.)
+For an explanation of open file descriptions, see
+.BR open (2).
+.P
+The principal difference between the two lock types
+is that whereas traditional record locks
+are associated with a process,
+open file description locks are associated with the
+open file description on which they are acquired,
+much like locks acquired with
+.BR flock (2).
+Consequently (and unlike traditional advisory record locks),
+open file description locks are inherited across
+.BR fork (2)
+(and
+.BR clone (2)
+with
+.BR CLONE_FILES ),
+and are only automatically released on the last close
+of the open file description,
+instead of being released on any close of the file.
+.P
+Conflicting lock combinations
+(i.e., a read lock and a write lock or two write locks)
+where one lock is an open file description lock and the other
+is a traditional record lock conflict
+even when they are acquired by the same process on the same file descriptor.
+.P
+Open file description locks placed via the same open file description
+(i.e., via the same file descriptor,
+or via a duplicate of the file descriptor created by
+.BR fork (2),
+.BR dup (2),
+.BR fcntl ()
+.BR F_DUPFD ,
+and so on) are always compatible:
+if a new lock is placed on an already locked region,
+then the existing lock is converted to the new lock type.
+(Such conversions may result in splitting, shrinking, or coalescing with
+an existing lock as discussed above.)
+.P
+On the other hand, open file description locks may conflict with
+each other when they are acquired via different open file descriptions.
+Thus, the threads in a multithreaded program can use
+open file description locks to synchronize access to a file region
+by having each thread perform its own
+.BR open (2)
+on the file and applying locks via the resulting file descriptor.
+.P
+As with traditional advisory locks, the third argument to
+.BR fcntl (),
+.IR lock ,
+is a pointer to an
+.I flock
+structure.
+By contrast with traditional record locks, the
+.I l_pid
+field of that structure must be set to zero
+when using the operations described below.
+.P
+The operations for working with open file description locks are analogous
+to those used with traditional locks:
+.TP
+.BR F_OFD_SETLK " (\fIstruct flock *\fP)"
+Acquire an open file description lock (when
+.I l_type
+is
+.B F_RDLCK
+or
+.BR F_WRLCK )
+or release an open file description lock (when
+.I l_type
+is
+.BR F_UNLCK )
+on the bytes specified by the
+.IR l_whence ", " l_start ", and " l_len
+fields of
+.IR lock .
+If a conflicting lock is held on the file,
+this call returns \-1 and sets
+.I errno
+to
+.BR EAGAIN .
+.TP
+.BR F_OFD_SETLKW " (\fIstruct flock *\fP)"
+As for
+.BR F_OFD_SETLK ,
+but if a conflicting lock is held on the file, then wait for that lock to be
+released.
+If a signal is caught while waiting, then the call is interrupted
+and (after the signal handler has returned) returns immediately
+(with return value \-1 and
+.I errno
+set to
+.BR EINTR ;
+see
+.BR signal (7)).
+.TP
+.BR F_OFD_GETLK " (\fIstruct flock *\fP)"
+On input to this call,
+.I lock
+describes an open file description lock we would like to place on the file.
+If the lock could be placed,
+.BR fcntl ()
+does not actually place it, but returns
+.B F_UNLCK
+in the
+.I l_type
+field of
+.I lock
+and leaves the other fields of the structure unchanged.
+If one or more incompatible locks would prevent this lock being placed,
+then details about one of these locks are returned via
+.IR lock ,
+as described above for
+.BR F_GETLK .
+.P
+In the current implementation,
+.\" commit 57b65325fe34ec4c917bc4e555144b4a94d9e1f7
+no deadlock detection is performed for open file description locks.
+(This contrasts with process-associated record locks,
+for which the kernel does perform deadlock detection.)
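
A sketch of how open file description locks behave within a single process, assuming Linux 3.15 or later and a filesystem that supports them (the helper name `ofd_try_lock` is hypothetical). The structure is zeroed so that `l_pid` is 0, as these operations require.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: place or release an open file description lock
 * without blocking. Returns 0 on success, 1 if a conflicting lock is
 * held (EAGAIN), -1 on any other error.
 */
int ofd_try_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));    /* also sets l_pid to 0 */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_OFD_SETLK, &fl) == 0)
        return 0;
    return errno == EAGAIN ? 1 : -1;
}
```

If the same file is opened twice, a write lock taken through the first descriptor causes a write-lock attempt through the second to return 1, even in the same process, whereas a request through a `dup(2)` of the first descriptor simply converts the existing lock.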
+.\"
+.SS Mandatory locking
+.IR Warning :
+the Linux implementation of mandatory locking is unreliable.
+See BUGS below.
+Because of these bugs,
+and the fact that the feature is believed to be little used,
+since Linux 4.5, mandatory locking has been made an optional feature,
+governed by a configuration option
+.RB ( CONFIG_MANDATORY_FILE_LOCKING ).
+This feature is no longer supported at all in Linux 5.15 and above.
+.P
+By default, both traditional (process-associated) and open file description
+record locks are advisory.
+Advisory locks are not enforced and are useful only between
+cooperating processes.
+.P
+Both lock types can also be mandatory.
+Mandatory locks are enforced for all processes.
+If a process tries to perform an incompatible access (e.g.,
+.BR read (2)
+or
+.BR write (2))
+on a file region that has an incompatible mandatory lock,
+then the result depends upon whether the
+.B O_NONBLOCK
+flag is enabled for its open file description.
+If the
+.B O_NONBLOCK
+flag is not enabled, then
+the system call is blocked until the lock is removed
+or converted to a mode that is compatible with the access.
+If the
+.B O_NONBLOCK
+flag is enabled, then the system call fails with the error
+.BR EAGAIN .
+.P
+To make use of mandatory locks, mandatory locking must be enabled
+both on the filesystem that contains the file to be locked,
+and on the file itself.
+Mandatory locking is enabled on a filesystem
+using the "\-o mand" option to
+.BR mount (8),
+or the
+.B MS_MANDLOCK
+flag for
+.BR mount (2).
+Mandatory locking is enabled on a file by disabling
+group execute permission on the file and enabling the set-group-ID
+permission bit (see
+.BR chmod (1)
+and
+.BR chmod (2)).
+.P
+Mandatory locking is not specified by POSIX.
+Some other systems also support mandatory locking,
+although the details of how to enable it vary across systems.
+.\"
+.SS Lost locks
+When an advisory lock is obtained on a networked filesystem such as
+NFS it is possible that the lock might get lost.
+This may happen due to administrative action on the server, or due to a
+network partition (i.e., loss of network connectivity with the server)
+which lasts long enough for the server to assume
+that the client is no longer functioning.
+.P
+When the filesystem determines that a lock has been lost, future
+.BR read (2)
+or
+.BR write (2)
+requests may fail with the error
+.BR EIO .
+This error will persist until the lock is removed or the file
+descriptor is closed.
+Since Linux 3.12,
+.\" commit ef1820f9be27b6ad158f433ab38002ab8131db4d
+this happens at least for NFSv4 (including all minor versions).
+.P
+Some versions of UNIX send a signal
+.RB ( SIGLOST )
+in this circumstance.
+Linux does not define this signal, and does not provide any
+asynchronous notification of lost locks.
+.\"
+.SS Managing signals
+.BR F_GETOWN ,
+.BR F_SETOWN ,
+.BR F_GETOWN_EX ,
+.BR F_SETOWN_EX ,
+.BR F_GETSIG ,
+and
+.B F_SETSIG
+are used to manage I/O availability signals:
+.TP
+.BR F_GETOWN " (\fIvoid\fP)"
+Return (as the function result)
+the process ID or process group ID currently receiving
+.B SIGIO
+and
+.B SIGURG
+signals for events on file descriptor
+.IR fd .
+Process IDs are returned as positive values;
+process group IDs are returned as negative values (but see BUGS below).
+.I arg
+is ignored.
+.TP
+.BR F_SETOWN " (\fIint\fP)"
+Set the process ID or process group ID that will receive
+.B SIGIO
+and
+.B SIGURG
+signals for events on the file descriptor
+.IR fd .
+The target process or process group ID is specified in
+.IR arg .
+A process ID is specified as a positive value;
+a process group ID is specified as a negative value.
+Most commonly, the calling process specifies itself as the owner
+(that is,
+.I arg
+is specified as
+.BR getpid (2)).
+.IP
+As well as setting the file descriptor owner,
+one must also enable generation of signals on the file descriptor.
+This is done by using the
+.BR fcntl ()
+.B F_SETFL
+operation to set the
+.B O_ASYNC
+file status flag on the file descriptor.
+Subsequently, a
+.B SIGIO
+signal is sent whenever input or output becomes possible
+on the file descriptor.
+The
+.BR fcntl ()
+.B F_SETSIG
+operation can be used to obtain delivery of a signal other than
+.BR SIGIO .
+.IP
+Sending a signal to the owner process (group) specified by
+.B F_SETOWN
+is subject to the same permissions checks as are described for
+.BR kill (2),
+where the sending process is the one that employs
+.B F_SETOWN
+(but see BUGS below).
+If this permission check fails, then the signal is
+silently discarded.
+.IR Note :
+The
+.B F_SETOWN
+operation records the caller's credentials at the time of the
+.BR fcntl ()
+call,
+and it is these saved credentials that are used for the permission checks.
+.IP
+If the file descriptor
+.I fd
+refers to a socket,
+.B F_SETOWN
+also selects
+the recipient of
+.B SIGURG
+signals that are delivered when out-of-band
+data arrives on that socket.
+.RB ( SIGURG
+is sent in any situation where
+.BR select (2)
+would report the socket as having an "exceptional condition".)
+.\" The following appears to be rubbish. It doesn't seem to
+.\" be true according to the kernel source, and I can write
+.\" a program that gets a terminal-generated SIGIO even though
+.\" it is not the foreground process group of the terminal.
+.\" -- MTK, 8 Apr 05
+.\"
+.\" If the file descriptor
+.\" .I fd
+.\" refers to a terminal device, then SIGIO
+.\" signals are sent to the foreground process group of the terminal.
+.IP
+The following was true in Linux 2.6.x up to and including Linux 2.6.11:
+.RS
+.IP
+If a nonzero value is given to
+.B F_SETSIG
+in a multithreaded process running with a threading library
+that supports thread groups (e.g., NPTL),
+then a positive value given to
+.B F_SETOWN
+has a different meaning:
+.\" The relevant place in the (2.6) kernel source is the
+.\" 'switch' in fs/fcntl.c::send_sigio_to_task() -- MTK, Apr 2005
+instead of being a process ID identifying a whole process,
+it is a thread ID identifying a specific thread within a process.
+Consequently, it may be necessary to pass
+.B F_SETOWN
+the result of
+.BR gettid (2)
+instead of
+.BR getpid (2)
+to get sensible results when
+.B F_SETSIG
+is used.
+(In current Linux threading implementations,
+a main thread's thread ID is the same as its process ID.
+This means that a single-threaded program can equally use
+.BR gettid (2)
+or
+.BR getpid (2)
+in this scenario.)
+Note, however, that the statements in this paragraph do not apply
+to the
+.B SIGURG
+signal generated for out-of-band data on a socket:
+this signal is always sent to either a process or a process group,
+depending on the value given to
+.BR F_SETOWN .
+.\" send_sigurg()/send_sigurg_to_task() bypasses
+.\" kill_fasync()/send_sigio()/send_sigio_to_task()
+.\" to directly call send_group_sig_info()
+.\" -- MTK, Apr 2005 (kernel 2.6.11)
+.RE
+.IP
+The above behavior was accidentally dropped in Linux 2.6.12,
+and won't be restored.
+From Linux 2.6.32 onward, use
+.B F_SETOWN_EX
+to target
+.B SIGIO
+and
+.B SIGURG
+signals at a particular thread.
+.TP
+.BR F_GETOWN_EX " (\fIstruct f_owner_ex *\fP) (since Linux 2.6.32)"
+Return the current file descriptor owner settings
+as defined by a previous
+.B F_SETOWN_EX
+operation.
+The information is returned in the structure pointed to by
+.IR arg ,
+which has the following form:
+.IP
+.in +4n
+.EX
+struct f_owner_ex {
+ int type;
+ pid_t pid;
+};
+.EE
+.in
+.IP
+The
+.I type
+field will have one of the values
+.BR F_OWNER_TID ,
+.BR F_OWNER_PID ,
+or
+.BR F_OWNER_PGRP .
+The
+.I pid
+field is a positive integer representing a thread ID, process ID,
+or process group ID.
+See
+.B F_SETOWN_EX
+for more details.
+.TP
+.BR F_SETOWN_EX " (\fIstruct f_owner_ex *\fP) (since Linux 2.6.32)"
+This operation performs a similar task to
+.BR F_SETOWN .
+It allows the caller to direct I/O availability signals
+to a specific thread, process, or process group.
+The caller specifies the target of signals via
+.IR arg ,
+which is a pointer to an
+.I f_owner_ex
+structure.
+The
+.I type
+field has one of the following values, which define how
+.I pid
+is interpreted:
+.RS
+.TP
+.B F_OWNER_TID
+Send the signal to the thread whose thread ID
+(the value returned by a call to
+.BR clone (2)
+or
+.BR gettid (2))
+is specified in
+.IR pid .
+.TP
+.B F_OWNER_PID
+Send the signal to the process whose ID
+is specified in
+.IR pid .
+.TP
+.B F_OWNER_PGRP
+Send the signal to the process group whose ID
+is specified in
+.IR pid .
+(Note that, unlike with
+.BR F_SETOWN ,
+a process group ID is specified as a positive value here.)
+.RE
+.TP
+.BR F_GETSIG " (\fIvoid\fP)"
+Return (as the function result)
+the signal sent when input or output becomes possible.
+A value of zero means
+.B SIGIO
+is sent.
+Any other value (including
+.BR SIGIO )
+is the
+signal sent instead, and in this case additional info is available to
+the signal handler if installed with
+.BR SA_SIGINFO .
+.I arg
+is ignored.
+.TP
+.BR F_SETSIG " (\fIint\fP)"
+Set the signal sent when input or output becomes possible
+to the value given in
+.IR arg .
+A value of zero means to send the default
+.B SIGIO
+signal.
+Any other value (including
+.BR SIGIO )
+is the signal to send instead, and in this case additional info
+is available to the signal handler if installed with
+.BR SA_SIGINFO .
+.\"
+.\" The following was true only up until Linux 2.6.11:
+.\"
+.\" Additionally, passing a nonzero value to
+.\" .B F_SETSIG
+.\" changes the signal recipient from a whole process to a specific thread
+.\" within a process.
+.\" See the description of
+.\" .B F_SETOWN
+.\" for more details.
+.IP
+By using
+.B F_SETSIG
+with a nonzero value, and setting
+.B SA_SIGINFO
+for the
+signal handler (see
+.BR sigaction (2)),
+extra information about I/O events is passed to
+the handler in a
+.I siginfo_t
+structure.
+If the
+.I si_code
+field indicates the source is
+.BR SI_SIGIO ,
+the
+.I si_fd
+field gives the file descriptor associated with the event.
+Otherwise,
+there is no indication which file descriptors are pending, and you
+should use the usual mechanisms
+.RB ( select (2),
+.BR poll (2),
+.BR read (2)
+with
+.B O_NONBLOCK
+set, etc.) to determine which file descriptors are available for I/O.
+.IP
+Note that the file descriptor provided in
+.I si_fd
+is the one that was specified during the
+.B F_SETSIG
+operation.
+This can lead to an unusual corner case.
+If the file descriptor is duplicated
+.RB ( dup (2)
+or similar), and the original file descriptor is closed,
+then I/O events will continue to be generated, but the
+.I si_fd
+field will contain the number of the now closed file descriptor.
+.IP
+By selecting a real-time signal (value >=
+.BR SIGRTMIN ),
+multiple I/O events may be queued using the same signal numbers.
+(Queuing is dependent on available memory.)
+Extra information is available
+if
+.B SA_SIGINFO
+is set for the signal handler, as above.
+.IP
+Note that Linux imposes a limit on the
+number of real-time signals that may be queued to a
+process (see
+.BR getrlimit (2)
+and
+.BR signal (7))
+and if this limit is reached, then the kernel reverts to
+delivering
+.BR SIGIO ,
+and this signal is delivered to the entire
+process rather than to a specific thread.
+.\" See fs/fcntl.c::send_sigio_to_task() (2.4/2.6) sources -- MTK, Apr 05
+.P
+Using these mechanisms, a program can implement fully asynchronous I/O
+without using
+.BR select (2)
+or
+.BR poll (2)
+most of the time.
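
The pieces described in this section (owner, `O_ASYNC`, `F_SETSIG`, and an `SA_SIGINFO` handler) could be combined roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the names `enable_io_signal`, `on_io_ready`, and `ready_fd` are hypothetical, the process is assumed to be single-threaded, and the handler merely records `si_fd`.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Last descriptor reported by the handler; -1 until a signal arrives. */
static volatile sig_atomic_t last_ready_fd = -1;

/* SA_SIGINFO handler: si_fd is filled in because F_SETSIG was used. */
static void on_io_ready(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    last_ready_fd = info->si_fd;
}

/*
 * Hypothetical helper: request that 'signo' be sent to this process
 * whenever I/O becomes possible on 'fd'. Returns 0 on success, -1 on
 * error.
 */
int enable_io_signal(int fd, int signo)
{
    struct sigaction sa;
    int flags;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_io_ready;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)   /* who gets the signal */
        return -1;
    if (fcntl(fd, F_SETSIG, signo) == -1)      /* which signal */
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    /* Enable signal generation on the open file description. */
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}

/* Accessor for the descriptor most recently reported by the handler. */
int ready_fd(void)
{
    return last_ready_fd;
}
```

Choosing a real-time signal such as `SIGRTMIN` for `signo` lets multiple events queue; a robust program would still handle the fallback to plain `SIGIO` when the queue limit is reached.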
+.P
+The use of
+.B O_ASYNC
+is specific to BSD and Linux.
+The only use of
+.B F_GETOWN
+and
+.B F_SETOWN
+specified in POSIX.1 is in conjunction with the use of the
+.B SIGURG
+signal on sockets.
+(POSIX does not specify the
+.B SIGIO
+signal.)
+.BR F_GETOWN_EX ,
+.BR F_SETOWN_EX ,
+.BR F_GETSIG ,
+and
+.B F_SETSIG
+are Linux-specific.
+POSIX has asynchronous I/O and the
+.I aio_sigevent
+structure to achieve similar things; these are also available
+in Linux as part of the GNU C Library (glibc).
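
For the Linux-specific `F_SETOWN_EX` and `F_GETOWN_EX` operations, a minimal sketch that directs I/O signals at the calling thread and reads the setting back might look like this. The helper names are hypothetical, and the thread ID is obtained with a raw `gettid` system call on the assumption that the C library may not provide a wrapper.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the calling thread the recipient of
 * I/O availability signals for 'fd'. Returns 0 on success, -1 on error.
 */
int direct_signals_to_this_thread(int fd)
{
    struct f_owner_ex owner;

    owner.type = F_OWNER_TID;
    owner.pid = (pid_t)syscall(SYS_gettid);
    return fcntl(fd, F_SETOWN_EX, &owner) == -1 ? -1 : 0;
}

/* Hypothetical helper: fetch the current owner settings for 'fd'. */
int get_owner(int fd, struct f_owner_ex *owner)
{
    return fcntl(fd, F_GETOWN_EX, owner) == -1 ? -1 : 0;
}
```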
+.SS Leases
+.B F_SETLEASE
+and
+.B F_GETLEASE
+(Linux 2.4 onward) are used to establish a new lease,
+and retrieve the current lease, on the open file description
+referred to by the file descriptor
+.IR fd .
+A file lease provides a mechanism whereby the process holding
+the lease (the "lease holder") is notified (via delivery of a signal)
+when a process (the "lease breaker") tries to
+.BR open (2)
+or
+.BR truncate (2)
+the file referred to by that file descriptor.
+.TP
+.BR F_SETLEASE " (\fIint\fP)"
+Set or remove a file lease according to which of the following
+values is specified in the integer
+.IR arg :
+.RS
+.TP
+.B F_RDLCK
+Take out a read lease.
+This will cause the calling process to be notified when
+the file is opened for writing or is truncated.
+.\" The following became true in Linux 2.6.10:
+.\" See the man-pages-2.09 Changelog for further info.
+A read lease can be placed only on a file descriptor that
+is opened read-only.
+.TP
+.B F_WRLCK
+Take out a write lease.
+This will cause the caller to be notified when
+the file is opened for reading or writing or is truncated.
+A write lease may be placed on a file only if there are no
+other open file descriptors for the file.
+.TP
+.B F_UNLCK
+Remove our lease from the file.
+.RE
+.P
+Leases are associated with an open file description (see
+.BR open (2)).
+This means that duplicate file descriptors (created by, for example,
+.BR fork (2)
+or
+.BR dup (2))
+refer to the same lease, and this lease may be modified
+or released using any of these descriptors.
+Furthermore, the lease is released by either an explicit
+.B F_UNLCK
+operation on any of these duplicate file descriptors, or when all
+such file descriptors have been closed.
+.P
+Leases may be taken out only on regular files.
+An unprivileged process may take out a lease only on a file whose
+UID (owner) matches the filesystem UID of the process.
+A process with the
+.B CAP_LEASE
+capability may take out leases on arbitrary files.
+.TP
+.BR F_GETLEASE " (\fIvoid\fP)"
+Indicates what type of lease is associated with the file descriptor
+.I fd
+by returning either
+.BR F_RDLCK ", " F_WRLCK ", or " F_UNLCK ,
+indicating, respectively, a read lease, a write lease, or no lease.
+.I arg
+is ignored.
+.P
+When a process (the "lease breaker") performs an
+.BR open (2)
+or
+.BR truncate (2)
+that conflicts with a lease established via
+.BR F_SETLEASE ,
+the system call is blocked by the kernel and
+the kernel notifies the lease holder by sending it a signal
+.RB ( SIGIO
+by default).
+The lease holder should respond to receipt of this signal by doing
+whatever cleanup is required in preparation for the file to be
+accessed by another process (e.g., flushing cached buffers) and
+then either remove or downgrade its lease.
+A lease is removed by performing an
+.B F_SETLEASE
+operation specifying
+.I arg
+as
+.BR F_UNLCK .
+If the lease holder currently holds a write lease on the file,
+and the lease breaker is opening the file for reading,
+then it is sufficient for the lease holder to downgrade
+the lease to a read lease.
+This is done by performing an
+.B F_SETLEASE
+operation specifying
+.I arg
+as
+.BR F_RDLCK .
+.P
+If the lease holder fails to downgrade or remove the lease within
+the number of seconds specified in
+.IR /proc/sys/fs/lease\-break\-time ,
+then the kernel forcibly removes or downgrades the lease holder's lease.
+.P
+Once a lease break has been initiated,
+.B F_GETLEASE
+returns the target lease type (either
+.B F_RDLCK
+or
+.BR F_UNLCK ,
+depending on what would be compatible with the lease breaker)
+until the lease holder voluntarily downgrades or removes the lease or
+the kernel forcibly does so after the lease break timer expires.
+.P
+Once the lease has been voluntarily or forcibly removed or downgraded,
+and assuming the lease breaker has not unblocked its system call,
+the kernel permits the lease breaker's system call to proceed.
+.P
+If the lease breaker's blocked
+.BR open (2)
+or
+.BR truncate (2)
+is interrupted by a signal handler,
+then the system call fails with the error
+.BR EINTR ,
+but the other steps still occur as described above.
+If the lease breaker is killed by a signal while blocked in
+.BR open (2)
+or
+.BR truncate (2),
+then the other steps still occur as described above.
+If the lease breaker specifies the
+.B O_NONBLOCK
+flag when calling
+.BR open (2),
+then the call immediately fails with the error
+.BR EWOULDBLOCK ,
+but the other steps still occur as described above.
+.P
+The default signal used to notify the lease holder is
+.BR SIGIO ,
+but this can be changed using the
+.B F_SETSIG
+operation to
+.BR fcntl ().
+If an
+.B F_SETSIG
+operation is performed (even one specifying
+.BR SIGIO ),
+and the signal
+handler is established using
+.BR SA_SIGINFO ,
+then the handler will receive a
+.I siginfo_t
+structure as its second argument, and the
+.I si_fd
+field of this argument will hold the file descriptor of the leased file
+that has been accessed by another process.
+(This is useful if the caller holds leases against multiple files.)
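+.P
+The lease protocol described above might be sketched as follows.
+This is a minimal illustration under stated assumptions, not a
+definitive implementation: the names
+.IR take_read_lease (),
+.IR release_lease (),
+and
+.I broken_fd
+are hypothetical, and the choice of
+.B SIGRTMIN
+as the notification signal is arbitrary.

```c
#define _GNU_SOURCE     /* Needed for F_SETLEASE, F_GETLEASE, and F_SETSIG */
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Set by the handler to the descriptor whose lease is being broken. */
static volatile sig_atomic_t broken_fd = -1;

static void on_lease_break(int sig, siginfo_t *si, void *ctx)
{
    (void) sig;
    (void) ctx;
    broken_fd = si->si_fd;      /* Valid because F_SETSIG was used */
}

/* Take a read lease on fd, which must be open read-only, and request
   lease-break notification via SIGRTMIN (an arbitrary choice).
   Returns 0 on success, -1 on error with errno set. */
int take_read_lease(int fd)
{
    struct sigaction sa;

    sa.sa_sigaction = on_lease_break;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, NULL) == -1)
        return -1;
    if (fcntl(fd, F_SETSIG, SIGRTMIN) == -1)
        return -1;
    return fcntl(fd, F_SETLEASE, F_RDLCK);
}

/* To be called once the signal has arrived and any cleanup is done:
   removing the lease lets the blocked lease breaker proceed. */
int release_lease(int fd)
{
    return fcntl(fd, F_SETLEASE, F_UNLCK);
}
```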
+.SS File and directory change notification (dnotify)
+.TP
+.BR F_NOTIFY " (\fIint\fP)"
+(Linux 2.4 onward)
+Provide notification when the directory referred to by
+.I fd
+or any of the files that it contains is changed.
+The events to be notified are specified in
+.IR arg ,
+which is a bit mask specified by ORing together zero or more of
+the following bits:
+.P
+.RS
+.PD 0
+.TP
+.B DN_ACCESS
+A file was accessed
+.RB ( read (2),
+.BR pread (2),
+.BR readv (2),
+and similar).
+.TP
+.B DN_MODIFY
+A file was modified
+.RB ( write (2),
+.BR pwrite (2),
+.BR writev (2),
+.BR truncate (2),
+.BR ftruncate (2),
+and similar).
+.TP
+.B DN_CREATE
+A file was created
+.RB ( open (2),
+.BR creat (2),
+.BR mknod (2),
+.BR mkdir (2),
+.BR link (2),
+.BR symlink (2),
+.BR rename (2)
+into this directory).
+.TP
+.B DN_DELETE
+A file was unlinked
+.RB ( unlink (2),
+.BR rename (2)
+to another directory,
+.BR rmdir (2)).
+.TP
+.B DN_RENAME
+A file was renamed within this directory
+.RB ( rename (2)).
+.TP
+.B DN_ATTRIB
+The attributes of a file were changed
+.RB ( chown (2),
+.BR chmod (2),
+.BR utime (2),
+.BR utimensat (2),
+and similar).
+.PD
+.RE
+.IP
+(In order to obtain these definitions, the
+.B _GNU_SOURCE
+feature test macro must be defined before including
+.I any
+header files.)
+.IP
+Directory notifications are normally "one-shot", and the application
+must reregister to receive further notifications.
+Alternatively, if
+.B DN_MULTISHOT
+is included in
+.IR arg ,
+then notification will remain in effect until explicitly removed.
+.IP
+.\" The following does seem a poor API-design choice...
+A series of
+.B F_NOTIFY
+requests is cumulative, with the events in
+.I arg
+being added to the set already monitored.
+To disable notification of all events, make an
+.B F_NOTIFY
+call specifying
+.I arg
+as 0.
+.IP
+Notification occurs via delivery of a signal.
+The default signal is
+.BR SIGIO ,
+but this can be changed using the
+.B F_SETSIG
+operation to
+.BR fcntl ().
+(Note that
+.B SIGIO
+is one of the nonqueuing standard signals;
+switching to the use of a real-time signal means that
+multiple notifications can be queued to the process.)
+In the latter case, the signal handler receives a
+.I siginfo_t
+structure as its second argument (if the handler was
+established using
+.BR SA_SIGINFO )
+and the
+.I si_fd
+field of this structure contains the file descriptor which
+generated the notification (useful when establishing notification
+on multiple directories).
+.IP
+Especially when using
+.BR DN_MULTISHOT ,
+a real-time signal should be used for notification,
+so that multiple notifications can be queued.
+.IP
+.B NOTE:
+New applications should use the
+.I inotify
+interface (available since Linux 2.6.13),
+which provides a much superior interface for obtaining notifications of
+filesystem events.
+See
+.BR inotify (7).
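+.P
+For existing code that still relies on dnotify,
+the registration steps described above could look roughly like this.
+The names
+.IR watch_dir (),
+.IR on_dir_event (),
+and
+.I notified_fd
+are hypothetical, and the signal number
+.B SIGRTMIN+1
+and the chosen event mask are assumptions made for the sake of the sketch.

```c
#define _GNU_SOURCE     /* Needed for F_NOTIFY, DN_*, and F_SETSIG */
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Descriptor of the directory that last generated a notification. */
static volatile sig_atomic_t notified_fd = -1;

static void on_dir_event(int sig, siginfo_t *si, void *ctx)
{
    (void) sig;
    (void) ctx;
    notified_fd = si->si_fd;
}

/* Watch the directory open on dirfd for file creation and deletion.
   Returns 0 on success, -1 on error with errno set. */
int watch_dir(int dirfd)
{
    struct sigaction sa;

    sa.sa_sigaction = on_dir_event;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN + 1, &sa, NULL) == -1)
        return -1;

    /* A real-time signal allows multiple notifications to be queued. */
    if (fcntl(dirfd, F_SETSIG, SIGRTMIN + 1) == -1)
        return -1;

    /* DN_MULTISHOT keeps the registration in place after each event. */
    return fcntl(dirfd, F_NOTIFY, DN_CREATE | DN_DELETE | DN_MULTISHOT);
}
```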
+.SS Changing the capacity of a pipe
+.TP
+.BR F_SETPIPE_SZ " (\fIint\fP; since Linux 2.6.35)"
+Change the capacity of the pipe referred to by
+.I fd
+to be at least
+.I arg
+bytes.
+An unprivileged process can adjust the pipe capacity to any value
+between the system page size and the limit defined in
+.I /proc/sys/fs/pipe\-max\-size
+(see
+.BR proc (5)).
+Attempts to set the pipe capacity below the page size are silently
+rounded up to the page size.
+Attempts by an unprivileged process to set the pipe capacity above the limit in
+.I /proc/sys/fs/pipe\-max\-size
+yield the error
+.BR EPERM ;
+a privileged process
+.RB ( CAP_SYS_RESOURCE )
+can override the limit.
+.IP
+When allocating the buffer for the pipe,
+the kernel may use a capacity larger than
+.IR arg ,
+if that is convenient for the implementation.
+(In the current implementation,
+the allocation is the next higher power-of-two page-size multiple
+of the requested size.)
+The actual capacity (in bytes) that is set is returned as the function result.
+.IP
+Attempting to set the pipe capacity smaller than the amount
+of buffer space currently used to store data produces the error
+.BR EBUSY .
+.IP
+Note that because of the way the pages of the pipe buffer
+are employed when data is written to the pipe,
+the number of bytes that can be written may be less than the nominal size,
+depending on the size of the writes.
+.TP
+.BR F_GETPIPE_SZ " (\fIvoid\fP; since Linux 2.6.35)"
+Return (as the function result) the capacity of the pipe referred to by
+.IR fd .
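+.P
+As a small sketch of how these two operations combine,
+the hypothetical helper
+.IR grow_pipe ()
+below enlarges a pipe only when its current capacity is too small,
+and returns whatever capacity the kernel actually chose.

```c
#define _GNU_SOURCE     /* Needed for F_SETPIPE_SZ and F_GETPIPE_SZ */
#include <fcntl.h>
#include <unistd.h>

/* Ensure that the pipe referred to by pipefd can hold at least
   want bytes. Returns the resulting capacity in bytes, or -1 on
   error with errno set (for example, EPERM above the limit). */
int grow_pipe(int pipefd, int want)
{
    int cur = fcntl(pipefd, F_GETPIPE_SZ);

    if (cur == -1)
        return -1;
    if (cur >= want)
        return cur;             /* Already large enough */

    /* The kernel may round the request up; the return value
       is the capacity that was actually set. */
    return fcntl(pipefd, F_SETPIPE_SZ, want);
}
```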
+.\"
+.SS File Sealing
+File seals limit the set of allowed operations on a given file.
+For each seal that is set on a file,
+a specific set of operations will fail with
+.B EPERM
+on this file from now on.
+The file is said to be sealed.
+The default set of seals depends on the type of the underlying
+file and filesystem.
+For an overview of file sealing, a discussion of its purpose,
+and some code examples, see
+.BR memfd_create (2).
+.P
+Currently,
+file seals can be applied only to a file descriptor returned by
+.BR memfd_create (2)
+(if the
+.B MFD_ALLOW_SEALING
+flag was employed).
+On other filesystems, all
+.BR fcntl ()
+operations that operate on seals will return
+.BR EINVAL .
+.P
+Seals are a property of an inode.
+Thus, all open file descriptors referring to the same inode share
+the same set of seals.
+Furthermore, seals can never be removed, only added.
+.TP
+.BR F_ADD_SEALS " (\fIint\fP; since Linux 3.17)"
+Add the seals given in the bit-mask argument
+.I arg
+to the set of seals of the inode referred to by the file descriptor
+.IR fd .
+Seals cannot be removed again.
+Once this call succeeds, the seals are enforced by the kernel immediately.
+If the current set of seals includes
+.B F_SEAL_SEAL
+(see below), then this call will be rejected with
+.BR EPERM .
+Adding a seal that is already set is a no-op, provided that
+.B F_SEAL_SEAL
+is not already set.
+In order to place a seal, the file descriptor
+.I fd
+must be writable.
+.TP
+.BR F_GET_SEALS " (\fIvoid\fP; since Linux 3.17)"
+Return (as the function result) the current set of seals
+of the inode referred to by
+.IR fd .
+If no seals are set, 0 is returned.
+If the file does not support sealing, \-1 is returned and
+.I errno
+is set to
+.BR EINVAL .
+.P
+The following seals are available:
+.TP
+.B F_SEAL_SEAL
+If this seal is set, any further call to
+.BR fcntl ()
+with
+.B F_ADD_SEALS
+fails with the error
+.BR EPERM .
+Therefore, this seal prevents any modifications to the set of seals itself.
+If the initial set of seals of a file includes
+.BR F_SEAL_SEAL ,
+then this effectively causes the set of seals to be constant and locked.
+.TP
+.B F_SEAL_SHRINK
+If this seal is set, the file in question cannot be reduced in size.
+This affects
+.BR open (2)
+with the
+.B O_TRUNC
+flag as well as
+.BR truncate (2)
+and
+.BR ftruncate (2).
+Those calls fail with
+.B EPERM
+if you try to shrink the file in question.
+Increasing the file size is still possible.
+.TP
+.B F_SEAL_GROW
+If this seal is set, the size of the file in question cannot be increased.
+This affects
+.BR write (2)
+beyond the end of the file,
+.BR truncate (2),
+.BR ftruncate (2),
+and
+.BR fallocate (2).
+These calls fail with
+.B EPERM
+if you use them to increase the file size.
+If you keep the size or shrink it, those calls still work as expected.
+.TP
+.B F_SEAL_WRITE
+If this seal is set, you cannot modify the contents of the file.
+Note that shrinking or growing the size of the file is
+still possible and allowed.
+.\" One or more other seals are typically used with F_SEAL_WRITE
+.\" because, given a file with the F_SEAL_WRITE seal set, then,
+.\" while it would no longer be possible to (say) write zeros into
+.\" the last 100 bytes of a file, it would still be possible
+.\" to (say) shrink the file by 100 bytes using ftruncate(), and
+.\" then increase the file size by 100 bytes, which would have
+.\" the effect of replacing the last hundred bytes by zeros.
+.\"
+Thus, this seal is normally used in combination with one of the other seals.
+This seal affects
+.BR write (2)
+and
+.BR fallocate (2)
+(only in combination with the
+.B FALLOC_FL_PUNCH_HOLE
+flag).
+Those calls fail with
+.B EPERM
+if this seal is set.
+Furthermore, trying to create new shared, writable memory-mappings via
+.BR mmap (2)
+will also fail with
+.BR EPERM .
+.IP
+Using the
+.B F_ADD_SEALS
+operation to set the
+.B F_SEAL_WRITE
+seal fails with
+.B EBUSY
+if any writable, shared mapping exists.
+Such mappings must be unmapped before you can add this seal.
+Furthermore, if there are any asynchronous I/O operations
+.RB ( io_submit (2))
+pending on the file,
+all outstanding writes will be discarded.
+.TP
+.BR F_SEAL_FUTURE_WRITE " (since Linux 5.1)"
+The effect of this seal is similar to
+.BR F_SEAL_WRITE ,
+but the contents of the file can still be modified via
+shared writable mappings that were created prior to the seal being set.
+Any attempt to create a new writable mapping on the file via
+.BR mmap (2)
+will fail with
+.BR EPERM .
+Likewise, an attempt to write to the file via
+.BR write (2)
+will fail with
+.BR EPERM .
+.IP
+Using this seal,
+one process can create a memory buffer that it can continue to modify
+while sharing that buffer on a "read-only" basis with other processes.
+.\"
+.SS File read/write hints
+Write lifetime hints can be used to inform the kernel about the relative
+expected lifetime of writes on a given inode or
+via a particular open file description.
+(See
+.BR open (2)
+for an explanation of open file descriptions.)
+In this context, the term "write lifetime" means
+the expected time the data will live on media, before
+being overwritten or erased.
+.P
+An application may use the different hint values specified below to
+separate writes into different write classes,
+so that multiple users or applications running on a single storage back-end
+can aggregate their I/O patterns in a consistent manner.
+However, there are no functional semantics implied by these flags,
+and different I/O classes can use the write lifetime hints
+in arbitrary ways, so long as the hints are used consistently.
+.P
+The following operations can be applied to the file descriptor,
+.IR fd :
+.TP
+.BR F_GET_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)"
+Returns the value of the read/write hint associated with the underlying inode
+referred to by
+.IR fd .
+.TP
+.BR F_SET_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)"
+Sets the read/write hint value associated with the
+underlying inode referred to by
+.IR fd .
+This hint persists until either it is explicitly modified or
+the underlying filesystem is unmounted.
+.TP
+.BR F_GET_FILE_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)"
+Returns the value of the read/write hint associated with
+the open file description referred to by
+.IR fd .
+.TP
+.BR F_SET_FILE_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)"
+Sets the read/write hint value associated with the open file description
+referred to by
+.IR fd .
+.P
+If an open file description has not been assigned a read/write hint,
+then it shall use the value assigned to the inode, if any.
+.P
+The following read/write
+hints are valid since Linux 4.13:
+.TP
+.B RWH_WRITE_LIFE_NOT_SET
+No specific hint has been set.
+This is the default value.
+.TP
+.B RWH_WRITE_LIFE_NONE
+No specific write lifetime is associated with this file or inode.
+.TP
+.B RWH_WRITE_LIFE_SHORT
+Data written to this inode or via this open file description
+is expected to have a short lifetime.
+.TP
+.B RWH_WRITE_LIFE_MEDIUM
+Data written to this inode or via this open file description
+is expected to have a lifetime longer than
+data written with
+.BR RWH_WRITE_LIFE_SHORT .
+.TP
+.B RWH_WRITE_LIFE_LONG
+Data written to this inode or via this open file description
+is expected to have a lifetime longer than
+data written with
+.BR RWH_WRITE_LIFE_MEDIUM .
+.TP
+.B RWH_WRITE_LIFE_EXTREME
+Data written to this inode or via this open file description
+is expected to have a lifetime longer than
+data written with
+.BR RWH_WRITE_LIFE_LONG .
+.P
+All the write-specific hints are relative to each other,
+and no individual absolute meaning should be attributed to them.
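+.P
+Because the hint is passed through a pointer to a
+.IR uint64_t ,
+a call might look like the sketch below.
+The helper names
+.IR mark_short_lived ()
+and
+.IR get_write_hint ()
+are hypothetical; the fallback constants are an assumption intended for
+C libraries whose headers lack these definitions,
+with values taken from the kernel's
+.IR <linux/fcntl.h> .

```c
#define _GNU_SOURCE     /* Needed for F_SET_RW_HINT and RWH_* where provided */
#include <fcntl.h>
#include <stdint.h>

/* Fallback definitions (values as in <linux/fcntl.h>) for C libraries
   whose headers do not provide them. */
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT 1035
#define F_SET_RW_HINT 1036
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

/* Tell the kernel that data written to the inode behind fd is
   expected to be short-lived. Returns 0 on success, -1 on error. */
int mark_short_lived(int fd)
{
    uint64_t hint = RWH_WRITE_LIFE_SHORT;

    return fcntl(fd, F_SET_RW_HINT, &hint);
}

/* Retrieve the inode's current hint into *hint.
   Returns 0 on success, -1 on error. */
int get_write_hint(int fd, uint64_t *hint)
{
    return fcntl(fd, F_GET_RW_HINT, hint);
}
```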
+.SH RETURN VALUE
+For a successful call, the return value depends on the operation:
+.TP
+.B F_DUPFD
+The new file descriptor.
+.TP
+.B F_GETFD
+Value of file descriptor flags.
+.TP
+.B F_GETFL
+Value of file status flags.
+.TP
+.B F_GETLEASE
+Type of lease held on file descriptor.
+.TP
+.B F_GETOWN
+Value of file descriptor owner.
+.TP
+.B F_GETSIG
+Value of signal sent when read or write becomes possible, or zero
+for traditional
+.B SIGIO
+behavior.
+.TP
+.B F_GETPIPE_SZ
+.TQ
+.B F_SETPIPE_SZ
+The pipe capacity.
+.TP
+.B F_GET_SEALS
+A bit mask identifying the seals that have been set
+for the inode referred to by
+.IR fd .
+.TP
+All other operations
+Zero.
+.P
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.BR EACCES " or " EAGAIN
+Operation is prohibited by locks held by other processes.
+.TP
+.B EAGAIN
+The operation is prohibited because the file has been memory-mapped by
+another process.
+.TP
+.B EBADF
+.I fd
+is not an open file descriptor.
+.TP
+.B EBADF
+.I op
+is
+.B F_SETLK
+or
+.B F_SETLKW
+and the file descriptor open mode doesn't match the
+type of lock requested.
+.TP
+.B EBUSY
+.I op
+is
+.B F_SETPIPE_SZ
+and the new pipe capacity specified in
+.I arg
+is smaller than the amount of buffer space currently
+used to store data in the pipe.
+.TP
+.B EBUSY
+.I op
+is
+.BR F_ADD_SEALS ,
+.I arg
+includes
+.BR F_SEAL_WRITE ,
+and there exists a writable, shared mapping on the file referred to by
+.IR fd .
+.TP
+.B EDEADLK
+It was detected that the specified
+.B F_SETLKW
+operation would cause a deadlock.
+.TP
+.B EFAULT
+.I lock
+is outside your accessible address space.
+.TP
+.B EINTR
+.I op
+is
+.B F_SETLKW
+or
+.B F_OFD_SETLKW
+and the operation was interrupted by a signal; see
+.BR signal (7).
+.TP
+.B EINTR
+.I op
+is
+.BR F_GETLK ,
+.BR F_SETLK ,
+.BR F_OFD_GETLK ,
+or
+.BR F_OFD_SETLK ,
+and the operation was interrupted by a signal before the lock was checked or
+acquired.
+This is most likely when locking a remote file (e.g., locking over
+NFS), but it can sometimes happen locally.
+.TP
+.B EINVAL
+The value specified in
+.I op
+is not recognized by this kernel.
+.TP
+.B EINVAL
+.I op
+is
+.B F_ADD_SEALS
+and
+.I arg
+includes an unrecognized sealing bit.
+.TP
+.B EINVAL
+.I op
+is
+.B F_ADD_SEALS
+or
+.B F_GET_SEALS
+and the filesystem containing the inode referred to by
+.I fd
+does not support sealing.
+.TP
+.B EINVAL
+.I op
+is
+.B F_DUPFD
+and
+.I arg
+is negative or is greater than the maximum allowable value
+(see the discussion of
+.B RLIMIT_NOFILE
+in
+.BR getrlimit (2)).
+.TP
+.B EINVAL
+.I op
+is
+.B F_SETSIG
+and
+.I arg
+is not an allowable signal number.
+.TP
+.B EINVAL
+.I op
+is
+.BR F_OFD_SETLK ,
+.BR F_OFD_SETLKW ,
+or
+.BR F_OFD_GETLK ,
+and
+.I l_pid
+was not specified as zero.
+.TP
+.B EMFILE
+.I op
+is
+.B F_DUPFD
+and the per-process limit on the number of open file descriptors
+has been reached.
+.TP
+.B ENOLCK
+Too many segment locks open, lock table is full, or a remote locking
+protocol failed (e.g., locking over NFS).
+.TP
+.B ENOTDIR
+.B F_NOTIFY
+was specified in
+.IR op ,
+but
+.I fd
+does not refer to a directory.
+.TP
+.B EPERM
+.I op
+is
+.B F_SETPIPE_SZ
+and the soft or hard user pipe limit has been reached; see
+.BR pipe (7).
+.TP
+.B EPERM
+Attempted to clear the
+.B O_APPEND
+flag on a file that has the append-only attribute set.
+.TP
+.B EPERM
+.I op
+is
+.BR F_ADD_SEALS ,
+but
+.I fd
+is not open for writing
+or the current set of seals on the file already includes
+.BR F_SEAL_SEAL .
+.SH STANDARDS
+POSIX.1-2008.
+.P
+.BR F_GETOWN_EX ,
+.BR F_SETOWN_EX ,
+.BR F_SETPIPE_SZ ,
+.BR F_GETPIPE_SZ ,
+.BR F_GETSIG ,
+.BR F_SETSIG ,
+.BR F_NOTIFY ,
+.BR F_GETLEASE ,
+and
+.B F_SETLEASE
+are Linux-specific.
+(Define the
+.B _GNU_SOURCE
+macro to obtain these definitions.)
+.\" .P
+.\" SVr4 documents additional EIO, ENOLINK and EOVERFLOW error conditions.
+.P
+.BR F_OFD_SETLK ,
+.BR F_OFD_SETLKW ,
+and
+.B F_OFD_GETLK
+are Linux-specific (and one must define
+.B _GNU_SOURCE
+to obtain their definitions),
+but work is being done to have them included in the next version of POSIX.1.
+.P
+.B F_ADD_SEALS
+and
+.B F_GET_SEALS
+are Linux-specific.
+.\" FIXME . Once glibc adds support, add a note about FTM requirements
+.SH HISTORY
+SVr4, 4.3BSD, POSIX.1-2001.
+.P
+Only the operations
+.BR F_DUPFD ,
+.BR F_GETFD ,
+.BR F_SETFD ,
+.BR F_GETFL ,
+.BR F_SETFL ,
+.BR F_GETLK ,
+.BR F_SETLK ,
+and
+.B F_SETLKW
+are specified in POSIX.1-2001.
+.P
+.B F_GETOWN
+and
+.B F_SETOWN
+are specified in POSIX.1-2001.
+(To get their definitions, define either
+.\" .BR _BSD_SOURCE ,
+.\" or
+.B _XOPEN_SOURCE
+with the value 500 or greater, or
+.B _POSIX_C_SOURCE
+with the value 200809L or greater.)
+.P
+.B F_DUPFD_CLOEXEC
+is specified in POSIX.1-2008.
+(To get this definition, define
+.B _POSIX_C_SOURCE
+with the value 200809L or greater, or
+.B _XOPEN_SOURCE
+with the value 700 or greater.)
+.SH NOTES
+The errors returned by
+.BR dup2 (2)
+are different from those returned by
+.BR F_DUPFD .
+.\"
+.SS File locking
+The original Linux
+.BR fcntl ()
+system call was not designed to handle large file offsets
+(in the
+.I flock
+structure).
+Consequently, an
+.BR fcntl64 ()
+system call was added in Linux 2.4.
+The newer system call employs a different structure for file locking,
+.IR flock64 ,
+and corresponding operations,
+.BR F_GETLK64 ,
+.BR F_SETLK64 ,
+and
+.BR F_SETLKW64 .
+However, these details can be ignored by applications using glibc, whose
+.BR fcntl ()
+wrapper function transparently employs the more recent system call
+where it is available.
+.\"
+.SS Record locks
+Since Linux 2.0, there is no interaction between the types of lock
+placed by
+.BR flock (2)
+and
+.BR fcntl ().
+.P
+Several systems have more fields in
+.I "struct flock"
+such as, for example,
+.I l_sysid
+(to identify the machine where the lock is held).
+.\" e.g., Solaris 8 documents this field in fcntl(2), and Irix 6.5
+.\" documents it in fcntl(5). mtk, May 2007
+.\" Also, FreeBSD documents it (Apr 2014).
+Clearly,
+.I l_pid
+alone is not going to be very useful if the process holding the lock
+may live on a different machine;
+on Linux, while present on some architectures (such as MIPS32),
+this field is not used.
+.SS Record locking and NFS
+Before Linux 3.12, if an NFSv4 client
+loses contact with the server for a period of time
+(defined as more than 90 seconds with no communication),
+.\"
+.\" Neil Brown: With NFSv3 the failure mode is the reverse. If
+.\" the server loses contact with a client then any lock stays in place
+.\" indefinitely ("why can't I read my mail"... I remember it well).
+.\"
+it might lose and regain a lock without ever being aware of the fact.
+(The period of time after which contact is assumed lost is known as
+the NFSv4 leasetime.
+On a Linux NFS server, this can be determined by looking at
+.IR /proc/fs/nfsd/nfsv4leasetime ,
+which expresses the period in seconds.
+The default value for this file is 90.)
+.\"
+.\" Jeff Layton:
+.\" Note that this is not a firm timeout. The server runs a job
+.\" periodically to clean out expired stateful objects, and it's likely
+.\" that there is some time (maybe even up to another whole lease period)
+.\" between when the timeout expires and the job actually runs. If the
+.\" client gets a RENEW in there within that window, its lease will be
+.\" renewed and its state preserved.
+.\"
+This scenario potentially risks data corruption,
+since another process might acquire a lock in the intervening period
+and perform file I/O.
+.P
+Since Linux 3.12,
+.\" commit ef1820f9be27b6ad158f433ab38002ab8131db4d
+if an NFSv4 client loses contact with the server,
+any I/O to the file by a process which "thinks" it holds
+a lock will fail until that process closes and reopens the file.
+A kernel parameter,
+.IR nfs.recover_lost_locks ,
+can be set to 1 to obtain the pre-3.12 behavior,
+whereby the client will attempt to recover lost locks
+when contact is reestablished with the server.
+Because of the attendant risk of data corruption,
+.\" commit f6de7a39c181dfb8a2c534661a53c73afb3081cd
+this parameter defaults to 0 (disabled).
+.SH BUGS
+.SS F_SETFL
+It is not possible to use
+.B F_SETFL
+to change the state of the
+.B O_DSYNC
+and
+.B O_SYNC
+flags.
+.\" FIXME . According to POSIX.1-2001, O_SYNC should also be modifiable
+.\" via fcntl(2), but currently Linux does not permit this
+.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5994
+Attempts to change the state of these flags are silently ignored.
+.SS F_GETOWN
+A limitation of the Linux system call conventions on some
+architectures (notably i386) means that if a (negative)
+process group ID to be returned by
+.B F_GETOWN
+falls in the range \-1 to \-4095, then the return value is wrongly
+interpreted by glibc as an error in the system call;
+.\" glibc source: sysdeps/unix/sysv/linux/i386/sysdep.h
+that is, the return value of
+.BR fcntl ()
+will be \-1, and
+.I errno
+will contain the (positive) process group ID.
+The Linux-specific
+.B F_GETOWN_EX
+operation avoids this problem.
+.\" mtk, Dec 04: some limited testing on alpha and ia64 seems to
+.\" indicate that ANY negative PGID value will cause F_GETOWN
+.\" to misinterpret the return as an error. Some other architectures
+.\" seem to have the same range check as i386.
+Since glibc 2.11, glibc makes the kernel
+.B F_GETOWN
+problem invisible by implementing
+.B F_GETOWN
+using
+.BR F_GETOWN_EX .
+.SS F_SETOWN
+In Linux 2.4 and earlier, there is a bug that can occur
+when an unprivileged process uses
+.B F_SETOWN
+to specify the owner
+of a socket file descriptor
+as a process (group) other than the caller.
+In this case,
+.BR fcntl ()
+can return \-1 with
+.I errno
+set to
+.BR EPERM ,
+even when the owner process (group) is one that the caller
+has permission to send signals to.
+Despite this error return, the file descriptor owner is set,
+and signals will be sent to the owner.
+.\"
+.SS Deadlock detection
+The deadlock-detection algorithm employed by the kernel when dealing with
+.B F_SETLKW
+requests can yield both
+false negatives (failures to detect deadlocks,
+leaving a set of deadlocked processes blocked indefinitely)
+and false positives
+.RB ( EDEADLK
+errors when there is no deadlock).
+For example,
+the kernel limits the lock depth of its dependency search to 10 steps,
+meaning that circular deadlock chains that exceed
+that size will not be detected.
+In addition, the kernel may falsely indicate a deadlock
+when two or more processes created using the
+.BR clone (2)
+.B CLONE_FILES
+flag place locks that appear (to the kernel) to conflict.
+.\"
+.SS Mandatory locking
+The Linux implementation of mandatory locking
+is subject to race conditions which render it unreliable:
+.\" http://marc.info/?l=linux-kernel&m=119013491707153&w=2
+.\"
+.\" Reconfirmed by Jeff Layton
+.\" From: Jeff Layton <jlayton <at> redhat.com>
+.\" Subject: Re: Status of fcntl() mandatory locking
+.\" Newsgroups: gmane.linux.file-systems
+.\" Date: 2014-04-28 10:07:57 GMT
+.\" http://thread.gmane.org/gmane.linux.file-systems/84481/focus=84518
+a
+.BR write (2)
+call that overlaps with a lock may modify data after the mandatory lock is
+acquired;
+a
+.BR read (2)
+call that overlaps with a lock may detect changes to data that were made
+only after a write lock was acquired.
+Similar races exist between mandatory locks and
+.BR mmap (2).
+It is therefore inadvisable to rely on mandatory locking.
+.SH SEE ALSO
+.BR dup2 (2),
+.BR flock (2),
+.BR open (2),
+.BR socket (2),
+.BR lockf (3),
+.BR capabilities (7),
+.BR feature_test_macros (7),
+.BR lslocks (8)
+.P
+.IR locks.txt ,
+.IR mandatory\-locking.txt ,
+and
+.I dnotify.txt
+in the Linux kernel source directory
+.I Documentation/filesystems/
+(on older kernels, these files are directly under the
+.I Documentation/
+directory, and
+.I mandatory\-locking.txt
+is called
+.IR mandatory.txt )