Diffstat (limited to 'man2/futex.2')
-rw-r--r-- | man2/futex.2 | 1976 |
1 files changed, 0 insertions, 1976 deletions
diff --git a/man2/futex.2 b/man2/futex.2 deleted file mode 100644 index 2ff300b..0000000 --- a/man2/futex.2 +++ /dev/null @@ -1,1976 +0,0 @@ -.\" Page by b.hubert -.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de> -.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com> -.\" -.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE) -.\" may be freely modified and distributed -.\" %%%LICENSE_END -.\" -.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com) -.\" added ERRORS section. -.\" -.\" Modified 2004-06-17 mtk -.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE -.\" -.\" FIXME Still to integrate are some points from Torvald Riegel's mail of -.\" 2015-01-23: -.\" http://thread.gmane.org/gmane.linux.kernel/1703405/focus=7977 -.\" -.\" FIXME Do we need to add some text regarding Torvald Riegel's 2015-01-24 mail -.\" http://thread.gmane.org/gmane.linux.kernel/1703405/focus=1873242 -.\" -.TH futex 2 2023-10-31 "Linux man-pages 6.7" -.SH NAME -futex \- fast user-space locking -.SH LIBRARY -Standard C library -.RI ( libc ", " \-lc ) -.SH SYNOPSIS -.nf -.P -.BR "#include <linux/futex.h>" " /* Definition of " FUTEX_* " constants */" -.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */" -.B #include <unistd.h> -.P -.BI "long syscall(SYS_futex, uint32_t *" uaddr ", int " futex_op \ -", uint32_t " val , -.BI " const struct timespec *" timeout , \ -" \fR /* or: \fBuint32_t \fIval2\fP */" -.BI " uint32_t *" uaddr2 ", uint32_t " val3 ); -.fi -.P -.IR Note : -glibc provides no wrapper for -.BR futex (), -necessitating the use of -.BR syscall (2). -.SH DESCRIPTION -The -.BR futex () -system call provides a method for waiting until a certain condition becomes -true. -It is typically used as a blocking construct in the context of -shared-memory synchronization. -When using futexes, the majority of -the synchronization operations are performed in user space. 
-A user-space program employs the -.BR futex () -system call only when it is likely that the program has to block for -a longer time until the condition becomes true. -Other -.BR futex () -operations can be used to wake any processes or threads waiting -for a particular condition. -.P -A futex is a 32-bit value\[em]referred to below as a -.IR "futex word" \[em]whose -address is supplied to the -.BR futex () -system call. -(Futexes are 32 bits in size on all platforms, including 64-bit systems.) -All futex operations are governed by this value. -In order to share a futex between processes, -the futex is placed in a region of shared memory, -created using (for example) -.BR mmap (2) -or -.BR shmat (2). -(Thus, the futex word may have different -virtual addresses in different processes, -but these addresses all refer to the same location in physical memory.) -In a multithreaded program, it is sufficient to place the futex word -in a global variable shared by all threads. -.P -When executing a futex operation that requests to block a thread, -the kernel will block only if the futex word has the value that the -calling thread supplied (as one of the arguments of the -.BR futex () -call) as the expected value of the futex word. -The loading of the futex word's value, -the comparison of that value with the expected value, -and the actual blocking will happen atomically and will be totally ordered -with respect to concurrent operations performed by other threads -on the same futex word. -.\" Notes from Darren Hart (Dec 2015): -.\" Totally ordered with respect futex operations refers to semantics -.\" of the ACQUIRE/RELEASE operations and how they impact ordering of -.\" memory reads and writes. The kernel futex operations are protected -.\" by spinlocks, which ensure that all operations are serialized -.\" with respect to one another. -.\" -.\" This is a lot to attempt to define in this document. 
Perhaps a -.\" reference to linux/Documentation/memory-barriers.txt as a footnote -.\" would be sufficient? Or perhaps for this manual, "serialized" would -.\" be sufficient, with a footnote regarding "totally ordered" and a -.\" pointer to the memory-barrier documentation? -Thus, the futex word is used to connect the synchronization in user space -with the implementation of blocking by the kernel. -Analogously to an atomic -compare-and-exchange operation that potentially changes shared memory, -blocking via a futex is an atomic compare-and-block operation. -.\" FIXME(Torvald Riegel): -.\" Eventually we want to have some text in NOTES to satisfy -.\" the reference in the following sentence -.\" See NOTES for a detailed specification of -.\" the synchronization semantics. -.P -One use of futexes is for implementing locks. -The state of the lock (i.e., acquired or not acquired) -can be represented as an atomically accessed flag in shared memory. -In the uncontended case, -a thread can access or modify the lock state with atomic instructions, -for example atomically changing it from not acquired to acquired -using an atomic compare-and-exchange instruction. -(Such instructions are performed entirely in user mode, -and the kernel maintains no information about the lock state.) -On the other hand, a thread may be unable to acquire a lock because -it is already acquired by another thread. -It then may pass the lock's flag as a futex word and the value -representing the acquired state as the expected value to a -.BR futex () -wait operation. -This -.BR futex () -operation will block if and only if the lock is still acquired -(i.e., the value in the futex word still matches the "acquired state"). -When releasing the lock, a thread has to first reset the -lock state to not acquired and then execute a futex -operation that wakes threads blocked on the lock flag used as a futex word -(this can be further optimized to avoid unnecessary wake-ups). 
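The lock protocol described above can be sketched in C as follows. This is a minimal illustration, not the implementation used by any particular library. The names futex_call(), sketch_lock(), and sketch_unlock() are hypothetical, and so is the convention that 0 means "not acquired" and 1 means "acquired". The sketch also assumes the GCC/Clang __atomic builtins and omits the optimization that avoids unnecessary wake-ups.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: glibc provides no futex() wrapper, so use syscall(2). */
static long futex_call(uint32_t *uaddr, int futex_op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, futex_op, val, NULL, NULL, 0);
}

/* Assumed lock-word convention: 0 = not acquired, 1 = acquired. */
static void sketch_lock(uint32_t *word)
{
    for (;;) {
        uint32_t expected = 0;

        /* Uncontended path: user-space compare-and-exchange, no system call. */
        if (__atomic_compare_exchange_n(word, &expected, 1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return;

        /* Contended path: sleep only if the word still reads 1 ("acquired").
           Whatever the outcome (wake-up, EAGAIN, EINTR), retry the acquire. */
        futex_call(word, FUTEX_WAIT, 1);
    }
}

static void sketch_unlock(uint32_t *word)
{
    /* First reset the state to "not acquired", then wake one waiter. */
    __atomic_store_n(word, 0, __ATOMIC_RELEASE);
    futex_call(word, FUTEX_WAKE, 1);
}
```

Because the unlock path always issues FUTEX_WAKE, this sketch makes a system call on every release, even when nobody is waiting. A real lock would usually track contention in the futex word to skip that call.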
-See -.BR futex (7) -for more detail on how to use futexes. -.P -Besides the basic wait and wake-up futex functionality, there are further -futex operations aimed at supporting more complex use cases. -.P -Note that -no explicit initialization or destruction is necessary to use futexes; -the kernel maintains a futex -(i.e., the kernel-internal implementation artifact) -only while operations such as -.BR FUTEX_WAIT , -described below, are being performed on a particular futex word. -.\" -.SS Arguments -The -.I uaddr -argument points to the futex word. -On all platforms, futexes are four-byte -integers that must be aligned on a four-byte boundary. -The operation to perform on the futex is specified in the -.I futex_op -argument; -.I val -is a value whose meaning and purpose depends on -.IR futex_op . -.P -The remaining arguments -.RI ( timeout , -.IR uaddr2 , -and -.IR val3 ) -are required only for certain of the futex operations described below. -Where one of these arguments is not required, it is ignored. -.P -For several blocking operations, the -.I timeout -argument is a pointer to a -.I timespec -structure that specifies a timeout for the operation. -However, notwithstanding the prototype shown above, for some operations, -the least significant four bytes of this argument are instead -used as an integer whose meaning is determined by the operation. -For these operations, the kernel casts the -.I timeout -value first to -.IR "unsigned long", -then to -.IR uint32_t , -and in the remainder of this page, this argument is referred to as -.I val2 -when interpreted in this fashion. -.P -Where it is required, the -.I uaddr2 -argument is a pointer to a second futex word that is employed -by the operation. -.P -The interpretation of the final integer argument, -.IR val3 , -depends on the operation. 
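As an illustration of the argument conventions above, the following sketch wraps the raw system call twice: once with the fourth argument passed as a timeout pointer, and once with it passed as the integer val2. The names futex_raw(), futex_raw_val2(), and requeue_sketch() are hypothetical. FUTEX_CMP_REQUEUE is used as the example of an operation that reads val2. Returning a negated errno value is a choice made for this sketch only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Fourth argument interpreted as a pointer to a timespec (may be NULL). */
static long futex_raw(uint32_t *uaddr, int futex_op, uint32_t val,
                      const struct timespec *timeout,
                      uint32_t *uaddr2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}

/* Fourth argument interpreted as the integer val2. */
static long futex_raw_val2(uint32_t *uaddr, int futex_op, uint32_t val,
                           uint32_t val2, uint32_t *uaddr2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val,
                   (unsigned long) val2, uaddr2, val3);
}

/* Wake at most one waiter on 'from' and requeue up to INT_MAX others to
   'to', but only if *from still holds 'expected'.
   Returns the kernel's count on success, or a negated errno value. */
static int requeue_sketch(uint32_t *from, uint32_t *to, uint32_t expected)
{
    long rc = futex_raw_val2(from, FUTEX_CMP_REQUEUE, 1, INT_MAX, to, expected);

    return rc == -1 ? -errno : (int) rc;
}
```

A plain uint32_t object already satisfies the four-byte alignment requirement on Linux, so no explicit alignment attribute is needed in this sketch.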
-.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.SS Futex operations -The -.I futex_op -argument consists of two parts: -a command that specifies the operation to be performed, -bitwise ORed with zero or more options that -modify the behaviour of the operation. -The options that may be included in -.I futex_op -are as follows: -.TP -.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)" -.\" commit 34f01cc1f512fa783302982776895c73714ebbc2 -This option bit can be employed with all futex operations. -It tells the kernel that the futex is process-private and not shared -with another process (i.e., it is being used for synchronization -only between threads of the same process). -This allows the kernel to make some additional performance optimizations. -.\" I.e., It allows the kernel choose the fast path for validating -.\" the user-space address and avoids expensive VMA lookups, -.\" taking reference counts on file backing store, and so on. -.IP -As a convenience, -.I <linux/futex.h> -defines a set of constants with the suffix -.B _PRIVATE -that are equivalents of all of the operations listed below, -.\" except the obsolete FUTEX_FD, for which the "private" flag was -.\" meaningless -but with the -.B FUTEX_PRIVATE_FLAG -ORed into the constant value. -Thus, there are -.BR FUTEX_WAIT_PRIVATE , -.BR FUTEX_WAKE_PRIVATE , -and so on. -.TP -.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)" -.\" commit 1acdac104668a0834cfa267de9946fac7764d486 -This option bit can be employed only with the -.BR FUTEX_WAIT_BITSET , -.BR FUTEX_WAIT_REQUEUE_PI , -(since Linux 4.5) -.\" commit 337f13046ff03717a9e99675284a817527440a49 -.BR FUTEX_WAIT , -and -(since Linux 5.14) -.\" commit bf22a6976897977b0a3f1aeba6823c959fc4fdae -.B FUTEX_LOCK_PI2 -operations. -.IP -If this option is set, the kernel measures the -.I timeout -against the -.B CLOCK_REALTIME -clock. -.IP -If this option is not set, the kernel measures the -.I timeout -against the -.B CLOCK_MONOTONIC -clock. 
-.P -The operation specified in -.I futex_op -is one of the following: -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_WAIT " (since Linux 2.6.0)" -.\" Strictly speaking, since some time in Linux 2.5.x -This operation tests that the value at the -futex word pointed to by the address -.I uaddr -still contains the expected value -.IR val , -and if so, then sleeps waiting for a -.B FUTEX_WAKE -operation on the futex word. -The load of the value of the futex word is an atomic memory -access (i.e., using atomic machine instructions of the respective -architecture). -This load, the comparison with the expected value, and -starting to sleep are performed atomically -.\" FIXME: Torvald, I think we may need to add some explanation of -.\" "totally ordered" here. -and totally ordered -with respect to other futex operations on the same futex word. -If the thread starts to sleep, -it is considered a waiter on this futex word. -If the futex value does not match -.IR val , -then the call fails immediately with the error -.BR EAGAIN . -.IP -The purpose of the comparison with the expected value is to prevent lost -wake-ups. -If another thread changed the value of the futex word after the -calling thread decided to block based on the prior value, -and if the other thread executed a -.B FUTEX_WAKE -operation (or similar wake-up) after the value change and before this -.B FUTEX_WAIT -operation, then the calling thread will observe the -value change and will not start to sleep. -.IP -If the -.I timeout -is not NULL, the structure it points to specifies a -timeout for the wait. -(This interval will be rounded up to the system clock granularity, -and is guaranteed not to expire early.) -The timeout is by default measured according to the -.B CLOCK_MONOTONIC -clock, but, since Linux 4.5, the -.B CLOCK_REALTIME -clock can be selected by specifying -.B FUTEX_CLOCK_REALTIME -in -.IR futex_op . 
-If -.I timeout -is NULL, the call blocks indefinitely. -.IP -.IR Note : -for -.BR FUTEX_WAIT , -.I timeout -is interpreted as a -.I relative -value. -This differs from other futex operations, where -.I timeout -is interpreted as an absolute value. -To obtain the equivalent of -.B FUTEX_WAIT -with an absolute timeout, employ -.B FUTEX_WAIT_BITSET -with -.I val3 -specified as -.BR FUTEX_BITSET_MATCH_ANY . -.IP -The arguments -.I uaddr2 -and -.I val3 -are ignored. -.\" FIXME . (Torvald) I think we should remove this. Or maybe adapt to a -.\" different example. -.\" -.\" For -.\" .BR futex (7), -.\" this call is executed if decrementing the count gave a negative value -.\" (indicating contention), -.\" and will sleep until another process or thread releases -.\" the futex and executes the -.\" .B FUTEX_WAKE -.\" operation. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_WAKE " (since Linux 2.6.0)" -.\" Strictly speaking, since Linux 2.5.x -This operation wakes at most -.I val -of the waiters that are waiting (e.g., inside -.BR FUTEX_WAIT ) -on the futex word at the address -.IR uaddr . -Most commonly, -.I val -is specified as either 1 (wake up a single waiter) or -.B INT_MAX -(wake up all waiters). -No guarantee is provided about which waiters are awoken -(e.g., a waiter with a higher scheduling priority is not guaranteed -to be awoken in preference to a waiter with a lower priority). -.IP -The arguments -.IR timeout , -.IR uaddr2 , -and -.I val3 -are ignored. -.\" FIXME . (Torvald) I think we should remove this. Or maybe adapt to -.\" a different example. -.\" -.\" For -.\" .BR futex (7), -.\" this is executed if incrementing the count showed that -.\" there were waiters, -.\" once the futex value has been set to 1 -.\" (indicating that it is available). -.\" -.\" How does "incrementing the count show that there were waiters"? 
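The FUTEX_WAIT and FUTEX_WAKE behaviour described above can be exercised with a small sketch such as the following. The helper names wait_relative_ms() and wake_waiters() are hypothetical. The sketch assumes that an expired timeout is reported as ETIMEDOUT, as listed in the ERRORS section of the full page.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Block on 'word' for at most 'ms' milliseconds, but only if it still holds
   'expected'. For FUTEX_WAIT the timeout is relative.
   Returns 0 after a wake-up, otherwise the errno value
   (EAGAIN on a value mismatch, ETIMEDOUT when the interval expires). */
static int wait_relative_ms(uint32_t *word, uint32_t expected, long ms)
{
    struct timespec rel = {
        .tv_sec  = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };

    if (syscall(SYS_futex, word, FUTEX_WAIT, expected, &rel, NULL, 0) == -1)
        return errno;
    return 0;
}

/* Wake at most 'how_many' waiters; pass INT_MAX to wake all of them.
   Returns the number of waiters that were woken. */
static long wake_waiters(uint32_t *word, int how_many)
{
    return syscall(SYS_futex, word, FUTEX_WAKE, how_many, NULL, NULL, 0);
}
```

With no other thread involved, a wait on a mismatched value returns EAGAIN immediately, a wait on a matching value times out, and a wake reports zero waiters.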
-.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)" -.\" Strictly speaking, from Linux 2.5.x to Linux 2.6.25 -This operation creates a file descriptor that is associated with -the futex at -.IR uaddr . -The caller must close the returned file descriptor after use. -When another process or thread performs a -.B FUTEX_WAKE -on the futex word, the file descriptor indicates as being readable with -.BR select (2), -.BR poll (2), -and -.BR epoll (7) -.IP -The file descriptor can be used to obtain asynchronous notifications: if -.I val -is nonzero, then, when another process or thread executes a -.BR FUTEX_WAKE , -the caller will receive the signal number that was passed in -.IR val . -.IP -The arguments -.IR timeout , -.IR uaddr2 , -and -.I val3 -are ignored. -.IP -Because it was inherently racy, -.B FUTEX_FD -has been removed -.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80 -from Linux 2.6.26 onward. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_REQUEUE " (since Linux 2.6.0)" -This operation performs the same task as -.B FUTEX_CMP_REQUEUE -(see below), except that no check is made using the value in -.IR val3 . -(The argument -.I val3 -is ignored.) -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)" -This operation first checks whether the location -.I uaddr -still contains the value -.IR val3 . -If not, the operation fails with the error -.BR EAGAIN . -Otherwise, the operation wakes up a maximum of -.I val -waiters that are waiting on the futex at -.IR uaddr . -If there are more than -.I val -waiters, then the remaining waiters are removed -from the wait queue of the source futex at -.I uaddr -and added to the wait queue of the target futex at -.IR uaddr2 . 
-The -.I val2 -argument specifies an upper limit on the number of waiters -that are requeued to the futex at -.IR uaddr2 . -.IP -.\" FIXME(Torvald) Is the following correct? Or is just the decision -.\" which threads to wake or requeue part of the atomic operation? -The load from -.I uaddr -is an atomic memory access (i.e., using atomic machine instructions of -the respective architecture). -This load, the comparison with -.IR val3 , -and the requeueing of any waiters are performed atomically and totally -ordered with respect to other operations on the same futex word. -.\" Notes from a f2f conversation with Thomas Gleixner (Aug 2015): ### -.\" The operation is serialized with respect to operations on both -.\" source and target futex. No other waiter can enqueue itself -.\" for waiting and no other waiter can dequeue itself because of -.\" a timeout or signal. -.IP -Typical values to specify for -.I val -are 0 or 1. -(Specifying -.B INT_MAX -is not useful, because it would make the -.B FUTEX_CMP_REQUEUE -operation equivalent to -.BR FUTEX_WAKE .) -The limit value specified via -.I val2 -is typically either 1 or -.BR INT_MAX . -(Specifying the argument as 0 is not useful, because it would make the -.B FUTEX_CMP_REQUEUE -operation equivalent to -.BR FUTEX_WAIT .) -.IP -The -.B FUTEX_CMP_REQUEUE -operation was added as a replacement for the earlier -.BR FUTEX_REQUEUE . -The difference is that the check of the value at -.I uaddr -can be used to ensure that requeueing happens only under certain -conditions, which allows race conditions to be avoided in certain use cases. -.\" But, as Rich Felker points out, there remain valid use cases for -.\" FUTEX_REQUEUE, for example, when the calling thread is requeuing -.\" the target(s) to a lock that the calling thread owns -.\" From: Rich Felker <dalias@libc.org> -.\" Date: Wed, 29 Oct 2014 22:43:17 -0400 -.\" To: Darren Hart <dvhart@infradead.org> -.\" CC: libc-alpha@sourceware.org, ... 
-.\" Subject: Re: Add futex wrapper to glibc? -.IP -Both -.B FUTEX_REQUEUE -and -.B FUTEX_CMP_REQUEUE -can be used to avoid "thundering herd" wake-ups that could occur when using -.B FUTEX_WAKE -in cases where all of the waiters that are woken need to acquire -another futex. -Consider the following scenario, -where multiple waiter threads are waiting on B, -a wait queue implemented using a futex: -.IP -.in +4n -.EX -lock(A) -while (!check_value(V)) { - unlock(A); - block_on(B); - lock(A); -}; -unlock(A); -.EE -.in -.IP -If a waker thread used -.BR FUTEX_WAKE , -then all waiters waiting on B would be woken up, -and they would all try to acquire lock A. -However, waking all of the threads in this manner would be pointless because -all except one of the threads would immediately block on lock A again. -By contrast, a requeue operation wakes just one waiter and moves -the other waiters to lock A, -and when the woken waiter unlocks A then the next waiter can proceed. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_WAKE_OP " (since Linux 2.6.14)" -.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721 -.\" Author: Jakub Jelinek <jakub@redhat.com> -.\" Date: Tue Sep 6 15:16:25 2005 -0700 -.\" FIXME. (Torvald) The glibc condvar implementation is currently being -.\" revised (e.g., to not use an internal lock anymore). -.\" It is probably more future-proof to remove this paragraph. -.\" [Torvald, do you have an update here?] -This operation was added to support some user-space use cases -where more than one futex must be handled at the same time. -The most notable example is the implementation of -.BR pthread_cond_signal (3), -which requires operations on two futexes, -the one used to implement the mutex and the one used in the implementation -of the wait queue associated with the condition variable. -.B FUTEX_WAKE_OP -allows such cases to be implemented without leading to -high rates of contention and context switching. 
-.IP -The -.B FUTEX_WAKE_OP -operation is equivalent to executing the following code atomically -and totally ordered with respect to other futex operations on -any of the two supplied futex words: -.IP -.in +4n -.EX -uint32_t oldval = *(uint32_t *) uaddr2; -*(uint32_t *) uaddr2 = oldval \fIop\fP \fIoparg\fP; -futex(uaddr, FUTEX_WAKE, val, 0, 0, 0); -if (oldval \fIcmp\fP \fIcmparg\fP) - futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0); -.EE -.in -.IP -In other words, -.B FUTEX_WAKE_OP -does the following: -.RS -.IP \[bu] 3 -saves the original value of the futex word at -.I uaddr2 -and performs an operation to modify the value of the futex at -.IR uaddr2 ; -this is an atomic read-modify-write memory access (i.e., using atomic -machine instructions of the respective architecture) -.IP \[bu] -wakes up a maximum of -.I val -waiters on the futex for the futex word at -.IR uaddr ; -and -.IP \[bu] -dependent on the results of a test of the original value of the -futex word at -.IR uaddr2 , -wakes up a maximum of -.I val2 -waiters on the futex for the futex word at -.IR uaddr2 . -.RE -.IP -The operation and comparison that are to be performed are encoded -in the bits of the argument -.IR val3 . -Pictorially, the encoding is: -.IP -.in +4n -.EX -+---+---+-----------+-----------+ -|op |cmp| oparg | cmparg | -+---+---+-----------+-----------+ - 4 4 12 12 <== # of bits -.EE -.in -.IP -Expressed in code, the encoding is: -.IP -.in +4n -.EX -#define FUTEX_OP(op, oparg, cmp, cmparg) \e - (((op & 0xf) << 28) | \e - ((cmp & 0xf) << 24) | \e - ((oparg & 0xfff) << 12) | \e - (cmparg & 0xfff)) -.EE -.in -.IP -In the above, -.I op -and -.I cmp -are each one of the codes listed below. -The -.I oparg -and -.I cmparg -components are literal numeric values, except as noted below. 
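To make the bit layout concrete, here is a small sketch that packs and unpacks the four fields exactly as the FUTEX_OP macro above does. The names encode_wake_op(), decode_wake_op(), and struct wake_op_fields are hypothetical helpers for illustration. They are not part of <linux/futex.h>.

```c
#include <stdint.h>

/* Field layout, most significant bits first:
   op (4 bits) | cmp (4 bits) | oparg (12 bits) | cmparg (12 bits). */
static uint32_t encode_wake_op(uint32_t op, uint32_t oparg,
                               uint32_t cmp, uint32_t cmparg)
{
    return ((op & 0xf) << 28) |
           ((cmp & 0xf) << 24) |
           ((oparg & 0xfff) << 12) |
           (cmparg & 0xfff);
}

struct wake_op_fields {
    uint32_t op;
    uint32_t cmp;
    uint32_t oparg;
    uint32_t cmparg;
};

/* Inverse of encode_wake_op(): split a val3 value back into its fields. */
static struct wake_op_fields decode_wake_op(uint32_t val3)
{
    struct wake_op_fields f = {
        .op     = (val3 >> 28) & 0xf,
        .cmp    = (val3 >> 24) & 0xf,
        .oparg  = (val3 >> 12) & 0xfff,
        .cmparg = val3 & 0xfff,
    };

    return f;
}
```

For example, op code 1 with oparg 1, combined with cmp code 4 and cmparg 0, encodes to 0x14001000. Arguments wider than 12 bits are silently truncated by the masks, which is worth keeping in mind when choosing oparg and cmparg.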
-.IP -The -.I op -component has one of the following values: -.IP -.in +4n -.EX -FUTEX_OP_SET 0 /* uaddr2 = oparg; */ -FUTEX_OP_ADD 1 /* uaddr2 += oparg; */ -FUTEX_OP_OR 2 /* uaddr2 |= oparg; */ -FUTEX_OP_ANDN 3 /* uaddr2 &= \[ti]oparg; */ -FUTEX_OP_XOR 4 /* uaddr2 \[ha]= oparg; */ -.EE -.in -.IP -In addition, bitwise ORing the following value into -.I op -causes -.I (1\~<<\~oparg) -to be used as the operand: -.IP -.in +4n -.EX -FUTEX_OP_ARG_SHIFT 8 /* Use (1 << oparg) as operand */ -.EE -.in -.IP -The -.I cmp -field is one of the following: -.IP -.in +4n -.EX -FUTEX_OP_CMP_EQ 0 /* if (oldval == cmparg) wake */ -FUTEX_OP_CMP_NE 1 /* if (oldval != cmparg) wake */ -FUTEX_OP_CMP_LT 2 /* if (oldval < cmparg) wake */ -FUTEX_OP_CMP_LE 3 /* if (oldval <= cmparg) wake */ -FUTEX_OP_CMP_GT 4 /* if (oldval > cmparg) wake */ -FUTEX_OP_CMP_GE 5 /* if (oldval >= cmparg) wake */ -.EE -.in -.IP -The return value of -.B FUTEX_WAKE_OP -is the sum of the number of waiters woken on the futex -.I uaddr -plus the number of waiters woken on the futex -.IR uaddr2 . -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)" -.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d -This operation is like -.B FUTEX_WAIT -except that -.I val3 -is used to provide a 32-bit bit mask to the kernel. -This bit mask, in which at least one bit must be set, -is stored in the kernel-internal state of the waiter. -See the description of -.B FUTEX_WAKE_BITSET -for further details. -.IP -If -.I timeout -is not NULL, the structure it points to specifies -an absolute timeout for the wait operation. -If -.I timeout -is NULL, the operation can block indefinitely. -.IP -The -.I uaddr2 -argument is ignored. 
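The absolute-timeout behaviour of FUTEX_WAIT_BITSET can be sketched as below. The helper name wait_until_monotonic() is hypothetical. The sketch assumes that FUTEX_CLOCK_REALTIME is not set, so the deadline is measured against CLOCK_MONOTONIC. It also assumes that a zero bit mask is rejected with EINVAL and an expired deadline is reported as ETIMEDOUT, per the ERRORS section of the full page.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Wait on 'word' until an absolute CLOCK_MONOTONIC deadline that lies
   'ms_from_now' milliseconds in the future, provided the word still holds
   'expected'. 'bitset' must have at least one bit set.
   Returns 0 after a wake-up, otherwise the errno value. */
static int wait_until_monotonic(uint32_t *word, uint32_t expected,
                                long ms_from_now, uint32_t bitset)
{
    struct timespec deadline;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec  += ms_from_now / 1000;
    deadline.tv_nsec += (ms_from_now % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {   /* normalize the nanoseconds */
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    if (syscall(SYS_futex, word, FUTEX_WAIT_BITSET, expected,
                &deadline, NULL, bitset) == -1)
        return errno;
    return 0;
}
```

Passing FUTEX_BITSET_MATCH_ANY as the bit mask gives the equivalent of FUTEX_WAIT with an absolute timeout, which is convenient when a wait has to be restarted after an interruption without recomputing the remaining interval.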
-.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)" -.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d -This operation is the same as -.B FUTEX_WAKE -except that the -.I val3 -argument is used to provide a 32-bit bit mask to the kernel. -This bit mask, in which at least one bit must be set, -is used to select which waiters should be woken up. -The selection is done by a bitwise AND of the "wake" bit mask -(i.e., the value in -.IR val3 ) -and the bit mask which is stored in the kernel-internal -state of the waiter (the "wait" bit mask that is set using -.BR FUTEX_WAIT_BITSET ). -All of the waiters for which the result of the AND is nonzero are woken up; -the remaining waiters are left sleeping. -.IP -The effect of -.B FUTEX_WAIT_BITSET -and -.B FUTEX_WAKE_BITSET -is to allow selective wake-ups among multiple waiters that are blocked -on the same futex. -However, note that, depending on the use case, -employing this bit-mask multiplexing feature on a -futex can be less efficient than simply using multiple futexes, -because employing bit-mask multiplexing requires the kernel -to check all waiters on a futex, -including those that are not interested in being woken up -(i.e., they do not have the relevant bit set in their "wait" bit mask). -.\" According to http://locklessinc.com/articles/futex_cheat_sheet/: -.\" -.\" "The original reason for the addition of these extensions -.\" was to improve the performance of pthread read-write locks -.\" in glibc. However, the pthreads library no longer uses the -.\" same locking algorithm, and these extensions are not used -.\" without the bitset parameter being all ones. 
-.\" -.\" The page goes on to note that the FUTEX_WAIT_BITSET operation -.\" is nevertheless used (with a bit mask of all ones) in order to -.\" obtain the absolute timeout functionality that is useful -.\" for efficiently implementing Pthreads APIs (which use absolute -.\" timeouts); FUTEX_WAIT provides only relative timeouts. -.IP -The constant -.BR FUTEX_BITSET_MATCH_ANY , -which corresponds to all 32 bits set in the bit mask, can be used as the -.I val3 -argument for -.B FUTEX_WAIT_BITSET -and -.BR FUTEX_WAKE_BITSET . -Other than differences in the handling of the -.I timeout -argument, the -.B FUTEX_WAIT -operation is equivalent to -.B FUTEX_WAIT_BITSET -with -.I val3 -specified as -.BR FUTEX_BITSET_MATCH_ANY ; -that is, allow a wake-up by any waker. -The -.B FUTEX_WAKE -operation is equivalent to -.B FUTEX_WAKE_BITSET -with -.I val3 -specified as -.BR FUTEX_BITSET_MATCH_ANY ; -that is, wake up any waiter(s). -.IP -The -.I uaddr2 -and -.I timeout -arguments are ignored. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.SS Priority-inheritance futexes -Linux supports priority-inheritance (PI) futexes in order to handle -priority-inversion problems that can be encountered with -normal futex locks. -Priority inversion is the problem that occurs when a high-priority -task is blocked waiting to acquire a lock held by a low-priority task, -while tasks at an intermediate priority continuously preempt -the low-priority task from the CPU. -Consequently, the low-priority task makes no progress toward -releasing the lock, and the high-priority task remains blocked. -.P -Priority inheritance is a mechanism for dealing with -the priority-inversion problem. 
-With this mechanism, when a high-priority task becomes blocked -by a lock held by a low-priority task, -the priority of the low-priority task is temporarily raised -to that of the high-priority task, -so that it is not preempted by any intermediate level tasks, -and can thus make progress toward releasing the lock. -To be effective, priority inheritance must be transitive, -meaning that if a high-priority task blocks on a lock -held by a lower-priority task that is itself blocked by a lock -held by another intermediate-priority task -(and so on, for chains of arbitrary length), -then both of those tasks -(or more generally, all of the tasks in a lock chain) -have their priorities raised to be the same as the high-priority task. -.P -From a user-space perspective, -what makes a futex PI-aware is a policy agreement (described below) -between user space and the kernel about the value of the futex word, -coupled with the use of the PI-futex operations described below. -(Unlike the other futex operations described above, -the PI-futex operations are designed -for the implementation of very specific IPC mechanisms.) -.\" -.\" Quoting Darren Hart: -.\" These opcodes paired with the PI futex value policy (described below) -.\" defines a "futex" as PI aware. These were created very specifically -.\" in support of PI pthread_mutexes, so it makes a lot more sense to -.\" talk about a PI aware pthread_mutex, than a PI aware futex, since -.\" there is a lot of policy and scaffolding that has to be built up -.\" around it to use it properly (this is what a PI pthread_mutex is). -.P -.\" mtk: The following text is drawn from the Hart/Guniguntala paper -.\" (listed in SEE ALSO), but I have reworded some pieces -.\" significantly. -.\" -The PI-futex operations described below differ from the other -futex operations in that they impose policy on the use of the value of the -futex word: -.IP \[bu] 3 -If the lock is not acquired, the futex word's value shall be 0. 
-.IP \[bu] -If the lock is acquired, the futex word's value shall -be the thread ID (TID; -see -.BR gettid (2)) -of the owning thread. -.IP \[bu] -If the lock is owned and there are threads contending for the lock, -then the -.B FUTEX_WAITERS -bit shall be set in the futex word's value; in other words, this value is: -.IP -.in +4n -.EX -FUTEX_WAITERS | TID -.EE -.in -.IP -(Note that is invalid for a PI futex word to have no owner and -.B FUTEX_WAITERS -set.) -.P -With this policy in place, -a user-space application can acquire an unacquired -lock or release a lock using atomic instructions executed in user mode -(e.g., a compare-and-swap operation such as -.I cmpxchg -on the x86 architecture). -Acquiring a lock simply consists of using compare-and-swap to atomically -set the futex word's value to the caller's TID if its previous value was 0. -Releasing a lock requires using compare-and-swap to set the futex word's -value to 0 if the previous value was the expected TID. -.P -If a futex is already acquired (i.e., has a nonzero value), -waiters must employ the -.B FUTEX_LOCK_PI -or -.B FUTEX_LOCK_PI2 -operations to acquire the lock. -If other threads are waiting for the lock, then the -.B FUTEX_WAITERS -bit is set in the futex value; -in this case, the lock owner must employ the -.B FUTEX_UNLOCK_PI -operation to release the lock. -.P -In the cases where callers are forced into the kernel -(i.e., required to perform a -.BR futex () -call), -they then deal directly with a so-called RT-mutex, -a kernel locking mechanism which implements the required -priority-inheritance semantics. -After the RT-mutex is acquired, the futex value is updated accordingly, -before the calling thread returns to user space. -.P -It is important to note -.\" tglx (July 2015): -.\" If there are multiple waiters on a pi futex then a wake pi operation -.\" will wake the first waiter and hand over the lock to this waiter. 
This -.\" includes handing over the rtmutex which represents the futex in the -.\" kernel. The strict requirement is that the futex owner and the rtmutex -.\" owner must be the same, except for the update period which is -.\" serialized by the futex internal locking. That means the kernel must -.\" update the user-space value prior to returning to user space -that the kernel will update the futex word's value prior -to returning to user space. -(This prevents the possibility of the futex word's value ending -up in an invalid state, such as having an owner but the value being 0, -or having waiters but not having the -.B FUTEX_WAITERS -bit set.) -.P -If a futex has an associated RT-mutex in the kernel -(i.e., there are blocked waiters) -and the owner of the futex/RT-mutex dies unexpectedly, -then the kernel cleans up the RT-mutex and hands it over to the next waiter. -This in turn requires that the user-space value is updated accordingly. -To indicate that this is required, the kernel sets the -.B FUTEX_OWNER_DIED -bit in the futex word along with the thread ID of the new owner. -User space can detect this situation via the presence of the -.B FUTEX_OWNER_DIED -bit and is then responsible for cleaning up the stale state left over by -the dead owner. -.\" tglx (July 2015): -.\" The FUTEX_OWNER_DIED bit can also be set on uncontended futexes, where -.\" the kernel has no state associated. This happens via the robust futex -.\" mechanism. In that case the futex value will be set to -.\" FUTEX_OWNER_DIED. The robust futex mechanism is also available for non -.\" PI futexes. -.P -PI futexes are operated on by specifying one of the values listed below in -.IR futex_op . -Note that the PI futex operations must be used as paired operations -and are subject to some additional requirements: -.IP \[bu] 3 -.BR FUTEX_LOCK_PI , -.BR FUTEX_LOCK_PI2 , -and -.B FUTEX_TRYLOCK_PI -pair with -.BR FUTEX_UNLOCK_PI . 
-.B FUTEX_UNLOCK_PI -must be called only on a futex owned by the calling thread, -as defined by the value policy, otherwise the error -.B EPERM -results. -.IP \[bu] -.B FUTEX_WAIT_REQUEUE_PI -pairs with -.BR FUTEX_CMP_REQUEUE_PI . -This must be performed from a non-PI futex to a distinct PI futex -(or the error -.B EINVAL -results). -Additionally, -.I val -(the number of waiters to be woken) must be 1 -(or the error -.B EINVAL -results). -.P -The PI futex operations are as follows: -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_LOCK_PI " (since Linux 2.6.18)" -.\" commit c87e2837be82df479a6bae9f155c43516d2feebc -This operation is used after an attempt to acquire -the lock via an atomic user-mode instruction failed -because the futex word has a nonzero value\[em]specifically, -because it contained the (PID-namespace-specific) TID of the lock owner. -.IP -The operation checks the value of the futex word at the address -.IR uaddr . -If the value is 0, then the kernel tries to atomically set -the futex value to the caller's TID. -If the futex word's value is nonzero, -the kernel atomically sets the -.B FUTEX_WAITERS -bit, which signals the futex owner that it cannot unlock the futex in -user space atomically by setting the futex value to 0. -.\" tglx (July 2015): -.\" The operation here is similar to the FUTEX_WAIT logic. When the user -.\" space atomic acquire does not succeed because the futex value was non -.\" zero, then the waiter goes into the kernel, takes the kernel internal -.\" lock and retries the acquisition under the lock. If the acquisition -.\" does not succeed either, then it sets the FUTEX_WAITERS bit, to signal -.\" the lock owner that it needs to go into the kernel. Here is the pseudo -.\" code: -.\" -.\" lock(kernel_lock); -.\" retry: -.\" -.\" /* -.\" * Owner might have unlocked in user space before we -.\" * were able to set the waiter bit. 
-.\" */ -.\" if (atomic_acquire(futex) == SUCCESS) { -.\" unlock(kernel_lock()); -.\" return 0; -.\" } -.\" -.\" /* -.\" * Owner might have unlocked after the above atomic_acquire() -.\" * attempt. -.\" */ -.\" if (atomic_set_waiters_bit(futex) != SUCCESS) -.\" goto retry; -.\" -.\" queue_waiter(); -.\" unlock(kernel_lock); -.\" block(); -.\" -After that, the kernel: -.RS -.IP (1) 5 -Tries to find the thread which is associated with the owner TID. -.IP (2) -Creates or reuses kernel state on behalf of the owner. -(If this is the first waiter, there is no kernel state for this -futex, so kernel state is created by locking the RT-mutex -and the futex owner is made the owner of the RT-mutex. -If there are existing waiters, then the existing state is reused.) -.IP (3) -Attaches the waiter to the futex -(i.e., the waiter is enqueued on the RT-mutex waiter list). -.RE -.IP -If more than one waiter exists, -the enqueueing of the waiter is in descending priority order. -(For information on priority ordering, see the discussion of the -.BR SCHED_DEADLINE , -.BR SCHED_FIFO , -and -.B SCHED_RR -scheduling policies in -.BR sched (7).) -The owner inherits either the waiter's CPU bandwidth -(if the waiter is scheduled under the -.B SCHED_DEADLINE -policy) or the waiter's priority (if the waiter is scheduled under the -.B SCHED_RR -or -.B SCHED_FIFO -policy). -.\" August 2015: -.\" mtk: If the realm is restricted purely to SCHED_OTHER (SCHED_NORMAL) -.\" processes, does the nice value come into play also? -.\" -.\" tglx: No. SCHED_OTHER/NORMAL tasks are handled in FIFO order -This inheritance follows the lock chain in the case of nested locking -.\" (i.e., task 1 blocks on lock A, held by task 2, -.\" while task 2 blocks on lock B, held by task 3) -and performs deadlock detection. -.IP -The -.I timeout -argument provides a timeout for the lock attempt. 
-If -.I timeout -is not NULL, the structure it points to specifies -an absolute timeout, measured against the -.B CLOCK_REALTIME -clock. -.\" 2016-07-07 response from Thomas Gleixner on LKML: -.\" From: Thomas Gleixner <tglx@linutronix.de> -.\" Date: 6 July 2016 at 20:57 -.\" Subject: Re: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op -.\" -.\" On Thu, 23 Jun 2016, Michael Kerrisk (man-pages) wrote: -.\" > On 06/23/2016 08:28 PM, Darren Hart wrote: -.\" > > And as a follow-on, what is the reason for FUTEX_LOCK_PI only using -.\" > > CLOCK_REALTIME? It seems reasonable to me that a user may want to wait a -.\" > > specific amount of time, regardless of wall time. -.\" > -.\" > Yes, that's another weird inconsistency. -.\" -.\" The reason is that phtread_mutex_timedlock() uses absolute timeouts based on -.\" CLOCK_REALTIME. glibc folks asked to make that the default behaviour back -.\" then when we added LOCK_PI. -If -.I timeout -is NULL, the operation will block indefinitely. -.IP -The -.IR uaddr2 , -.IR val , -and -.I val3 -arguments are ignored. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_LOCK_PI2 " (since Linux 5.14)" -.\" commit bf22a6976897977b0a3f1aeba6823c959fc4fdae -This operation is the same as -.BR FUTEX_LOCK_PI , -except that the clock against which -.I timeout -is measured is selectable. -By default, the (absolute) timeout specified in -.I timeout -is measured against the -.B CLOCK_MONOTONIC -clock, but if the -.B FUTEX_CLOCK_REALTIME -flag is specified in -.IR futex_op , -then the timeout is measured against the -.B CLOCK_REALTIME -clock. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)" -.\" commit c87e2837be82df479a6bae9f155c43516d2feebc -This operation tries to acquire the lock at -.IR uaddr . -It is invoked when a user-space atomic acquire did not -succeed because the futex word was not 0. 
-.IP -Because the kernel has access to more state information than user space, -acquisition of the lock might succeed if performed by the -kernel in cases where the futex word -(i.e., the state information accessible to user space) contains stale state -.RB ( FUTEX_WAITERS -and/or -.BR FUTEX_OWNER_DIED ). -This can happen when the owner of the futex died. -User space cannot handle this condition in a race-free manner, -but the kernel can fix this up and acquire the futex. -.\" Paraphrasing a f2f conversation with Thomas Gleixner about the -.\" above point (Aug 2015): ### -.\" There is a rare possibility of a race condition involving an -.\" uncontended futex with no owner, but with waiters. The -.\" kernel-user-space contract is that if a futex is nonzero, you must -.\" go into kernel. The futex was owned by a task, and that task dies -.\" but there are no waiters, so the futex value is non zero. -.\" Therefore, the next locker has to go into the kernel, -.\" so that the kernel has a chance to clean up. (CMPXCHG on zero -.\" in user space would fail, so kernel has to clean up.) -.\" Darren Hart (Oct 2015): -.\" The trylock in the kernel has more state, so it can independently -.\" verify the flags that user space must trust implicitly. -.IP -The -.IR uaddr2 , -.IR val , -.IR timeout , -and -.I val3 -arguments are ignored. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)" -.\" commit c87e2837be82df479a6bae9f155c43516d2feebc -This operation wakes the top priority waiter that is waiting in -.B FUTEX_LOCK_PI -or -.B FUTEX_LOCK_PI2 -on the futex address provided by the -.I uaddr -argument. -.IP -This is called when the user-space value at -.I uaddr -cannot be changed atomically from a TID (of the owner) to 0. -.IP -The -.IR uaddr2 , -.IR val , -.IR timeout , -and -.I val3 -arguments are ignored.
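The value policy and the FUTEX_LOCK_PI / FUTEX_UNLOCK_PI pairing described above can be sketched roughly as follows. This is only an illustrative sketch, not the glibc implementation: the helper names futex_pi(), pi_lock(), and pi_unlock() are hypothetical, the futex word is assumed to be declared as _Atomic uint32_t, and robust-futex handling (FUTEX_OWNER_DIED), timeouts, and recursive locking are deliberately left out.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: issue a PI futex operation on 'uaddr'.
   val, uaddr2, and val3 are ignored by FUTEX_LOCK_PI and
   FUTEX_UNLOCK_PI; a NULL timeout means "block indefinitely". */
static long
futex_pi(_Atomic uint32_t *uaddr, int futex_op)
{
    return syscall(SYS_futex, uaddr, futex_op, 0, NULL, NULL, 0);
}

/* Hypothetical helper: acquire the PI lock.
   Returns 0 on success, or -1 with errno set by the kernel. */
static int
pi_lock(_Atomic uint32_t *futexp)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    /* Fast path: an unowned lock has the value 0, so try 0 -> TID
       entirely in user space. */
    if (atomic_compare_exchange_strong(futexp, &expected, tid))
        return 0;

    /* Slow path: the word was nonzero, so the kernel must set
       FUTEX_WAITERS and queue the caller on the RT-mutex. */
    return (int) futex_pi(futexp, FUTEX_LOCK_PI);
}

/* Hypothetical helper: release the PI lock held by the caller.
   Returns 0 on success, or -1 with errno set (e.g., EPERM). */
static int
pi_unlock(_Atomic uint32_t *futexp)
{
    uint32_t expected = (uint32_t) syscall(SYS_gettid);

    /* Fast path: TID -> 0. This fails whenever any extra bit such as
       FUTEX_WAITERS is set, because the word then differs from the
       bare TID. */
    if (atomic_compare_exchange_strong(futexp, &expected, 0))
        return 0;

    /* Slow path: there are waiters; the kernel hands the lock over. */
    return (int) futex_pi(futexp, FUTEX_UNLOCK_PI);
}
```

In the uncontended case neither helper enters the kernel. A caller would typically place the futex word in memory shared by all contending threads, initialize it to 0, and bracket the critical section with pi_lock() and pi_unlock().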
-.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)" -.\" commit 52400ba946759af28442dee6265c5c0180ac7122 -This operation is a PI-aware variant of -.BR FUTEX_CMP_REQUEUE . -It requeues waiters that are blocked via -.B FUTEX_WAIT_REQUEUE_PI -on -.I uaddr -from a non-PI source futex -.RI ( uaddr ) -to a PI target futex -.RI ( uaddr2 ). -.IP -As with -.BR FUTEX_CMP_REQUEUE , -this operation wakes up a maximum of -.I val -waiters that are waiting on the futex at -.IR uaddr . -However, for -.BR FUTEX_CMP_REQUEUE_PI , -.I val -is required to be 1 -(since the main point is to avoid a thundering herd). -The remaining waiters are removed from the wait queue of the source futex at -.I uaddr -and added to the wait queue of the target futex at -.IR uaddr2 . -.IP -The -.I val2 -.\" val2 is the cap on the number of requeued waiters. -.\" In the glibc pthread_cond_broadcast() implementation, this argument -.\" is specified as INT_MAX, and for pthread_cond_signal() it is 0. -and -.I val3 -arguments serve the same purposes as for -.BR FUTEX_CMP_REQUEUE . -.\" -.\" The page at http://locklessinc.com/articles/futex_cheat_sheet/ -.\" notes that "priority-inheritance Futex to priority-inheritance -.\" Futex requeues are currently unsupported". However, probably -.\" the page does not need to say nothing about this, since -.\" Thomas Gleixner commented (July 2015): "they never will be -.\" supported because they make no sense at all" -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.TP -.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)" -.\" commit 52400ba946759af28442dee6265c5c0180ac7122 -.\" -Wait on a non-PI futex at -.I uaddr -and potentially be requeued (via a -.B FUTEX_CMP_REQUEUE_PI -operation in another task) onto a PI futex at -.IR uaddr2 . -The wait operation on -.I uaddr -is the same as for -.BR FUTEX_WAIT . 
-.IP -The waiter can be removed from the wait on -.I uaddr -without requeueing on -.I uaddr2 -via a -.B FUTEX_WAKE -operation in another task. -In this case, the -.B FUTEX_WAIT_REQUEUE_PI -operation fails with the error -.BR EAGAIN . -.IP -If -.I timeout -is not NULL, the structure it points to specifies -an absolute timeout for the wait operation. -If -.I timeout -is NULL, the operation can block indefinitely. -.IP -The -.I val3 -argument is ignored. -.IP -The -.B FUTEX_WAIT_REQUEUE_PI -and -.B FUTEX_CMP_REQUEUE_PI -were added to support a fairly specific use case: -support for priority-inheritance-aware POSIX threads condition variables. -The idea is that these operations should always be paired, -in order to ensure that user space and the kernel remain in sync. -Thus, in the -.B FUTEX_WAIT_REQUEUE_PI -operation, the user-space application pre-specifies the target -of the requeue that takes place in the -.B FUTEX_CMP_REQUEUE_PI -operation. -.\" -.\" Darren Hart notes that a patch to allow glibc to fully support -.\" PI-aware pthreads condition variables has not yet been accepted into -.\" glibc. The story is complex, and can be found at -.\" https://sourceware.org/bugzilla/show_bug.cgi?id=11588 -.\" Darren notes that in the meantime, the patch is shipped with various -.\" PREEMPT_RT-enabled Linux systems. -.\" -.\" Related to the preceding, Darren proposed that somewhere, man-pages -.\" should document the following point: -.\" -.\" While the Linux kernel, since Linux 2.6.31, supports requeueing of -.\" priority-inheritance (PI) aware mutexes via the -.\" FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI futex operations, -.\" the glibc implementation does not yet take full advantage of this. -.\" Specifically, the condvar internal data lock remains a non-PI aware -.\" mutex, regardless of the type of the pthread_mutex associated with -.\" the condvar. 
This can lead to an unbounded priority inversion on -.\" the internal data lock even when associating a PI aware -.\" pthread_mutex with a condvar during a pthread_cond*_wait -.\" operation. For this reason, it is not recommended to rely on -.\" priority inheritance when using pthread condition variables. -.\" -.\" The problem is that the obvious location for this text is -.\" the pthread_cond*wait(3) man page. However, such a man page -.\" does not currently exist. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.SH RETURN VALUE -In the event of an error (and assuming that -.BR futex () -was invoked via -.BR syscall (2)), -all operations return \-1 and set -.I errno -to indicate the error. -.P -The return value on success depends on the operation, -as described in the following list: -.TP -.B FUTEX_WAIT -Returns 0 if the caller was woken up. -Note that a wake-up can also be caused by common futex usage patterns -in unrelated code that happened to have previously used the futex word's -memory location (e.g., typical futex-based implementations of -Pthreads mutexes can cause this under some conditions). -Therefore, callers should always conservatively assume that a return -value of 0 can mean a spurious wake-up, and use the futex word's value -(i.e., the user-space synchronization scheme) -to decide whether to continue to block or not. -.TP -.B FUTEX_WAKE -Returns the number of waiters that were woken up. -.TP -.B FUTEX_FD -Returns the new file descriptor associated with the futex. -.TP -.B FUTEX_REQUEUE -Returns the number of waiters that were woken up. -.TP -.B FUTEX_CMP_REQUEUE -Returns the total number of waiters that were woken up or -requeued to the futex for the futex word at -.IR uaddr2 . -If this value is greater than -.IR val , -then the difference is the number of waiters requeued to the futex for the -futex word at -.IR uaddr2 . -.TP -.B FUTEX_WAKE_OP -Returns the total number of waiters that were woken up. 
-This is the sum of the woken waiters on the two futexes for -the futex words at -.I uaddr -and -.IR uaddr2 . -.TP -.B FUTEX_WAIT_BITSET -Returns 0 if the caller was woken up. -See -.B FUTEX_WAIT -for how to interpret this correctly in practice. -.TP -.B FUTEX_WAKE_BITSET -Returns the number of waiters that were woken up. -.TP -.B FUTEX_LOCK_PI -Returns 0 if the futex was successfully locked. -.TP -.B FUTEX_LOCK_PI2 -Returns 0 if the futex was successfully locked. -.TP -.B FUTEX_TRYLOCK_PI -Returns 0 if the futex was successfully locked. -.TP -.B FUTEX_UNLOCK_PI -Returns 0 if the futex was successfully unlocked. -.TP -.B FUTEX_CMP_REQUEUE_PI -Returns the total number of waiters that were woken up or -requeued to the futex for the futex word at -.IR uaddr2 . -If this value is greater than -.IR val , -then the difference is the number of waiters requeued to the futex for -the futex word at -.IR uaddr2 . -.TP -.B FUTEX_WAIT_REQUEUE_PI -Returns 0 if the caller was successfully requeued to the futex for -the futex word at -.IR uaddr2 . -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.SH ERRORS -.TP -.B EACCES -No read access to the memory of a futex word. -.TP -.B EAGAIN -.RB ( FUTEX_WAIT , -.BR FUTEX_WAIT_BITSET , -.BR FUTEX_WAIT_REQUEUE_PI ) -The value pointed to by -.I uaddr -was not equal to the expected value -.I val -at the time of the call. -.IP -.BR Note : -on Linux, the symbolic names -.B EAGAIN -and -.B EWOULDBLOCK -(both of which appear in different parts of the kernel futex code) -have the same value. -.TP -.B EAGAIN -.RB ( FUTEX_CMP_REQUEUE , -.BR FUTEX_CMP_REQUEUE_PI ) -The value pointed to by -.I uaddr -is not equal to the expected value -.IR val3 . -.TP -.B EAGAIN -.RB ( FUTEX_LOCK_PI , -.BR FUTEX_LOCK_PI2 , -.BR FUTEX_TRYLOCK_PI , -.BR FUTEX_CMP_REQUEUE_PI ) -The futex owner thread ID of -.I uaddr -(for -.BR FUTEX_CMP_REQUEUE_PI : -.IR uaddr2 ) -is about to exit, -but has not yet handled the internal state cleanup.
-Try again. -.TP -.B EDEADLK -.RB ( FUTEX_LOCK_PI , -.BR FUTEX_LOCK_PI2 , -.BR FUTEX_TRYLOCK_PI , -.BR FUTEX_CMP_REQUEUE_PI ) -The futex word at -.I uaddr -is already locked by the caller. -.TP -.B EDEADLK -.\" FIXME . I see that kernel/locking/rtmutex.c uses EDEADLK in some -.\" places, and EDEADLOCK in others. On almost all architectures -.\" these constants are synonymous. Is there a reason that both -.\" names are used? -.\" -.\" tglx (July 2015): "No. We should probably fix that." -.\" -.RB ( FUTEX_CMP_REQUEUE_PI ) -While requeueing a waiter to the PI futex for the futex word at -.IR uaddr2 , -the kernel detected a deadlock. -.TP -.B EFAULT -A required pointer argument (i.e., -.IR uaddr , -.IR uaddr2 , -or -.IR timeout ) -did not point to a valid user-space address. -.TP -.B EINTR -A -.B FUTEX_WAIT -or -.B FUTEX_WAIT_BITSET -operation was interrupted by a signal (see -.BR signal (7)). -Before Linux 2.6.22, this error could also be returned for -a spurious wakeup; since Linux 2.6.22, this no longer happens. -.TP -.B EINVAL -The operation in -.I futex_op -is one of those that employs a timeout, but the supplied -.I timeout -argument was invalid -.RI ( tv_sec -was less than zero, or -.I tv_nsec -was not less than 1,000,000,000). -.TP -.B EINVAL -The operation specified in -.I futex_op -employs one or both of the pointers -.I uaddr -and -.IR uaddr2 , -but one of these does not point to a valid object\[em]that is, -the address is not four-byte-aligned. -.TP -.B EINVAL -.RB ( FUTEX_WAIT_BITSET , -.BR FUTEX_WAKE_BITSET ) -The bit mask supplied in -.I val3 -is zero. -.TP -.B EINVAL -.RB ( FUTEX_CMP_REQUEUE_PI ) -.I uaddr -equals -.I uaddr2 -(i.e., an attempt was made to requeue to the same futex). -.TP -.B EINVAL -.RB ( FUTEX_FD ) -The signal number supplied in -.I val -is invalid. 
-.TP -.B EINVAL -.RB ( FUTEX_WAKE , -.BR FUTEX_WAKE_OP , -.BR FUTEX_WAKE_BITSET , -.BR FUTEX_REQUEUE , -.BR FUTEX_CMP_REQUEUE ) -The kernel detected an inconsistency between the user-space state at -.I uaddr -and the kernel state\[em]that is, it detected a waiter which waits in -.B FUTEX_LOCK_PI -or -.B FUTEX_LOCK_PI2 -on -.IR uaddr . -.TP -.B EINVAL -.RB ( FUTEX_LOCK_PI , -.BR FUTEX_LOCK_PI2 , -.BR FUTEX_TRYLOCK_PI , -.BR FUTEX_UNLOCK_PI ) -The kernel detected an inconsistency between the user-space state at -.I uaddr -and the kernel state. -This indicates either state corruption -or that the kernel found a waiter on -.I uaddr -which is waiting via -.B FUTEX_WAIT -or -.BR FUTEX_WAIT_BITSET . -.TP -.B EINVAL -.RB ( FUTEX_CMP_REQUEUE_PI ) -The kernel detected an inconsistency between the user-space state at -.I uaddr2 -and the kernel state; -.\" From a conversation with Thomas Gleixner (Aug 2015): ### -.\" The kernel sees: I have non PI state for a futex you tried to -.\" tell me was PI -that is, the kernel detected a waiter which waits via -.B FUTEX_WAIT -or -.B FUTEX_WAIT_BITSET -on -.IR uaddr2 . -.TP -.B EINVAL -.RB ( FUTEX_CMP_REQUEUE_PI ) -The kernel detected an inconsistency between the user-space state at -.I uaddr -and the kernel state; -that is, the kernel detected a waiter which waits via -.B FUTEX_WAIT -or -.B FUTEX_WAIT_BITSET -on -.IR uaddr . -.TP -.B EINVAL -.RB ( FUTEX_CMP_REQUEUE_PI ) -The kernel detected an inconsistency between the user-space state at -.I uaddr -and the kernel state; -that is, the kernel detected a waiter which waits on -.I uaddr -via -.B FUTEX_LOCK_PI -or -.B FUTEX_LOCK_PI2 -(instead of -.BR FUTEX_WAIT_REQUEUE_PI ). -.TP -.B EINVAL -.RB ( FUTEX_CMP_REQUEUE_PI ) -.\" This deals with the case: -.\" wait_requeue_pi(A, B); -.\" requeue_pi(A, C); -An attempt was made to requeue a waiter to a futex other than that -specified by the matching -.B FUTEX_WAIT_REQUEUE_PI -call for that waiter. 
-.TP -.B EINVAL -.RB ( FUTEX_CMP_REQUEUE_PI ) -The -.I val -argument is not 1. -.TP -.B EINVAL -Invalid argument. -.TP -.B ENFILE -.RB ( FUTEX_FD ) -The system-wide limit on the total number of open files has been reached. -.TP -.B ENOMEM -.RB ( FUTEX_LOCK_PI , -.BR FUTEX_LOCK_PI2 , -.BR FUTEX_TRYLOCK_PI , -.BR FUTEX_CMP_REQUEUE_PI ) -The kernel could not allocate memory to hold state information. -.TP -.B ENOSYS -Invalid operation specified in -.IR futex_op . -.TP -.B ENOSYS -The -.B FUTEX_CLOCK_REALTIME -option was specified in -.IR futex_op , -but the accompanying operation was neither -.BR FUTEX_WAIT , -.BR FUTEX_WAIT_BITSET , -.BR FUTEX_WAIT_REQUEUE_PI , -nor -.BR FUTEX_LOCK_PI2 . -.TP -.B ENOSYS -.RB ( FUTEX_LOCK_PI , -.BR FUTEX_LOCK_PI2 , -.BR FUTEX_TRYLOCK_PI , -.BR FUTEX_UNLOCK_PI , -.BR FUTEX_CMP_REQUEUE_PI , -.BR FUTEX_WAIT_REQUEUE_PI ) -A run-time check determined that the operation is not available. -The PI-futex operations are not implemented on all architectures and -are not supported on some CPU variants. -.TP -.B EPERM -.RB ( FUTEX_LOCK_PI , -.BR FUTEX_LOCK_PI2 , -.BR FUTEX_TRYLOCK_PI , -.BR FUTEX_CMP_REQUEUE_PI ) -The caller is not allowed to attach itself to the futex at -.I uaddr -(for -.BR FUTEX_CMP_REQUEUE_PI : -the futex at -.IR uaddr2 ). -(This may be caused by a state corruption in user space.) -.TP -.B EPERM -.RB ( FUTEX_UNLOCK_PI ) -The caller does not own the lock represented by the futex word. -.TP -.B ESRCH -.RB ( FUTEX_LOCK_PI , -.BR FUTEX_LOCK_PI2 , -.BR FUTEX_TRYLOCK_PI , -.BR FUTEX_CMP_REQUEUE_PI ) -The thread ID in the futex word at -.I uaddr -does not exist. -.TP -.B ESRCH -.RB ( FUTEX_CMP_REQUEUE_PI ) -The thread ID in the futex word at -.I uaddr2 -does not exist. -.TP -.B ETIMEDOUT -The operation in -.I futex_op -employed the timeout specified in -.IR timeout , -and the timeout expired before the operation completed. -.\" -.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" -.\" -.SH STANDARDS -Linux. 
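Looking back at the RETURN VALUE and ERRORS sections, the advice for FUTEX_WAIT (treat a return of 0 as a possibly spurious wake-up, and distinguish EAGAIN, EINTR, and ETIMEDOUT) might be applied as in the following sketch. The helper name wait_while_equal() is hypothetical, and the futex word is assumed to be declared as _Atomic uint32_t. The sketch also assumes that restarting the full relative timeout after a wake-up or a signal is acceptable; a stricter caller would track the remaining time itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: block for as long as *futexp == unwanted.
   'rel_timeout' is the relative timeout passed to FUTEX_WAIT, or NULL
   to block indefinitely.
   Returns 0 once the value differs from 'unwanted', or -1 with errno
   set (e.g., ETIMEDOUT) if the wait failed. */
static int
wait_while_equal(_Atomic uint32_t *futexp, uint32_t unwanted,
                 const struct timespec *rel_timeout)
{
    /* The futex word, not the return value of futex(), decides
       whether to keep blocking. */
    while (atomic_load(futexp) == unwanted) {
        long s = syscall(SYS_futex, futexp, FUTEX_WAIT, unwanted,
                         rel_timeout, NULL, 0);

        if (s == 0)
            continue;       /* Woken, possibly spuriously: re-check */
        if (errno == EAGAIN)
            continue;       /* Word changed before we slept: re-check */
        if (errno == EINTR)
            continue;       /* Interrupted by a signal: re-check */

        return -1;          /* ETIMEDOUT or an unexpected error */
    }

    return 0;
}
```

The loop condition re-reads the futex word after every return from the kernel, so a wake-up caused by unrelated code cannot make the caller proceed while the word still holds the unwanted value.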
-.SH HISTORY -Linux 2.6.0. -.P -Initial futex support was merged in Linux 2.5.7 but with different -semantics from what was described above. -A four-argument system call with the semantics -described in this page was introduced in Linux 2.5.40. -A fifth argument was added in Linux 2.5.70, -and a sixth argument was added in Linux 2.6.7. -.SH EXAMPLES -The program below demonstrates use of futexes in a program where a parent -process and a child process use a pair of futexes located inside a -shared anonymous mapping to synchronize access to a shared resource: -the terminal. -The two processes each write -.I nloops -(a command-line argument that defaults to 5 if omitted) -messages to the terminal and employ a synchronization protocol -that ensures that they alternate in writing messages. -Upon running this program we see output such as the following: -.P -.in +4n -.EX -$ \fB./futex_demo\fP -Parent (18534) 0 -Child (18535) 0 -Parent (18534) 1 -Child (18535) 1 -Parent (18534) 2 -Child (18535) 2 -Parent (18534) 3 -Child (18535) 3 -Parent (18534) 4 -Child (18535) 4 -.EE -.in -.SS Program source -\& -.\" SRC BEGIN (futex.c) -.EX -/* futex_demo.c -\& - Usage: futex_demo [nloops] - (Default: 5) -\& - Demonstrate the use of futexes in a program where parent and child - use a pair of futexes located inside a shared anonymous mapping to - synchronize access to a shared resource: the terminal. The two - processes each write \[aq]num\-loops\[aq] messages to the terminal and employ - a synchronization protocol that ensures that they alternate in - writing messages. 
-*/ -#define _GNU_SOURCE -#include <err.h> -#include <errno.h> -#include <linux/futex.h> -#include <stdatomic.h> -#include <stdint.h> -#include <stdio.h> -#include <stdlib.h> -#include <sys/mman.h> -#include <sys/syscall.h> -#include <sys/time.h> -#include <sys/wait.h> -#include <unistd.h> -\& -static uint32_t *futex1, *futex2, *iaddr; -\& -static int -futex(uint32_t *uaddr, int futex_op, uint32_t val, - const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3) -{ - return syscall(SYS_futex, uaddr, futex_op, val, - timeout, uaddr2, val3); -} -\& -/* Acquire the futex pointed to by \[aq]futexp\[aq]: wait for its value to - become 1, and then set the value to 0. */ -\& -static void -fwait(uint32_t *futexp) -{ - long s; - uint32_t one; -\& - /* atomic_compare_exchange_strong(ptr, oldval, newval) - atomically performs the equivalent of: -\& - if (*ptr == *oldval) - *ptr = newval; - else - *oldval = *ptr; -\& - It returns true if the test yielded true and *ptr was updated. - Because a failed exchange overwrites *oldval, the expected - value must be writable and is reset before each attempt. */ -\& - while (1) { -\& - /* Is the futex available? */ - one = 1; - if (atomic_compare_exchange_strong(futexp, &one, 0)) - break; /* Yes */ -\& - /* Futex is not available; wait. */ -\& - s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0); - if (s == \-1 && errno != EAGAIN) - err(EXIT_FAILURE, "futex\-FUTEX_WAIT"); - } -} -\& -/* Release the futex pointed to by \[aq]futexp\[aq]: if the futex currently - has the value 0, set its value to 1 and then wake any futex waiters, - so that if the peer is blocked in fwait(), it can proceed. */ -\& -static void -fpost(uint32_t *futexp) -{ - long s; - uint32_t zero = 0; -\& - /* atomic_compare_exchange_strong() was described - in comments above. */ -\& - if (atomic_compare_exchange_strong(futexp, &zero, 1)) { - s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0); - if (s == \-1) - err(EXIT_FAILURE, "futex\-FUTEX_WAKE"); - } -} -\& -int -main(int argc, char *argv[]) -{ - pid_t childPid; - unsigned int nloops; -\& - setbuf(stdout, NULL); -\& - nloops = (argc > 1) ?
atoi(argv[1]) : 5; -\& - /* Create a shared anonymous mapping that will hold the futexes. - Since the futexes are being shared between processes, we - subsequently use the "shared" futex operations (i.e., not the - ones suffixed "_PRIVATE"). */ -\& - iaddr = mmap(NULL, sizeof(*iaddr) * 2, PROT_READ | PROT_WRITE, - MAP_ANONYMOUS | MAP_SHARED, \-1, 0); - if (iaddr == MAP_FAILED) - err(EXIT_FAILURE, "mmap"); -\& - futex1 = &iaddr[0]; - futex2 = &iaddr[1]; -\& - *futex1 = 0; /* State: unavailable */ - *futex2 = 1; /* State: available */ -\& - /* Create a child process that inherits the shared anonymous - mapping. */ -\& - childPid = fork(); - if (childPid == \-1) - err(EXIT_FAILURE, "fork"); -\& - if (childPid == 0) { /* Child */ - for (unsigned int j = 0; j < nloops; j++) { - fwait(futex1); - printf("Child (%jd) %u\en", (intmax_t) getpid(), j); - fpost(futex2); - } -\& - exit(EXIT_SUCCESS); - } -\& - /* Parent falls through to here. */ -\& - for (unsigned int j = 0; j < nloops; j++) { - fwait(futex2); - printf("Parent (%jd) %u\en", (intmax_t) getpid(), j); - fpost(futex1); - } -\& - wait(NULL); -\& - exit(EXIT_SUCCESS); -} -.EE -.\" SRC END -.SH SEE ALSO -.ad l -.BR get_robust_list (2), -.BR restart_syscall (2), -.BR pthread_mutexattr_getprotocol (3), -.BR futex (7), -.BR sched (7) -.P -The following kernel source files: -.IP \[bu] 3 -.I Documentation/pi\-futex.txt -.IP \[bu] -.I Documentation/futex\-requeue\-pi.txt -.IP \[bu] -.I Documentation/locking/rt\-mutex.txt -.IP \[bu] -.I Documentation/locking/rt\-mutex\-design.txt -.IP \[bu] -.I Documentation/robust\-futex\-ABI.txt -.P -Franke, H., Russell, R., and Kirkwood, M., 2002. -\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP -(from proceedings of the Ottawa Linux Symposium 2002), -.br -.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002\-pages\-479\-495.pdf -.UE -.P -Hart, D., 2009. \fIA futex overview and update\fP, -.UR http://lwn.net/Articles/360699/ -.UE -.P -Hart, D.\& and Guniguntala, D., 2009.
-\fIRequeue-PI: Making glibc Condvars PI-Aware\fP -(from proceedings of the 2009 Real-Time Linux Workshop), -.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf -.UE -.P -Drepper, U., 2011. \fIFutexes Are Tricky\fP, -.UR http://www.akkadia.org/drepper/futex.pdf -.UE -.P -Futex example library, futex\-*.tar.bz2 at -.br -.UR https://mirrors.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/ -.UE -.\" -.\" FIXME(Torvald) We should probably refer to the glibc code here, in -.\" particular the glibc-internal futex wrapper functions that are -.\" WIP, and the generic pthread_mutex_t and perhaps condvar -.\" implementations. |