diff options
Diffstat (limited to 'man7/epoll.7')
-rw-r--r-- | man7/epoll.7 | 610 |
1 files changed, 0 insertions, 610 deletions
diff --git a/man7/epoll.7 b/man7/epoll.7 deleted file mode 100644 index a93bc0b..0000000 --- a/man7/epoll.7 +++ /dev/null @@ -1,610 +0,0 @@ -.\" Copyright (C) 2003 Davide Libenzi -.\" -.\" SPDX-License-Identifier: GPL-2.0-or-later -.\" -.\" Davide Libenzi <davidel@xmailserver.org> -.\" -.TH epoll 7 2023-10-31 "Linux man-pages 6.7" -.SH NAME -epoll \- I/O event notification facility -.SH SYNOPSIS -.nf -.B #include <sys/epoll.h> -.fi -.SH DESCRIPTION -The -.B epoll -API performs a similar task to -.BR poll (2): -monitoring multiple file descriptors to see if I/O is possible on any of them. -The -.B epoll -API can be used either as an edge-triggered or a level-triggered -interface and scales well to large numbers of watched file descriptors. -.P -The central concept of the -.B epoll -API is the -.B epoll -.IR instance , -an in-kernel data structure which, from a user-space perspective, -can be considered as a container for two lists: -.IP \[bu] 3 -The -.I interest -list (sometimes also called the -.B epoll -set): the set of file descriptors that the process has registered -an interest in monitoring. -.IP \[bu] -The -.I ready -list: the set of file descriptors that are "ready" for I/O. -The ready list is a subset of -(or, more precisely, a set of references to) -the file descriptors in the interest list. -The ready list is dynamically populated -by the kernel as a result of I/O activity on those file descriptors. -.P -The following system calls are provided to -create and manage an -.B epoll -instance: -.IP \[bu] 3 -.BR epoll_create (2) -creates a new -.B epoll -instance and returns a file descriptor referring to that instance. -(The more recent -.BR epoll_create1 (2) -extends the functionality of -.BR epoll_create (2).) -.IP \[bu] -Interest in particular file descriptors is then registered via -.BR epoll_ctl (2), -which adds items to the interest list of the -.B epoll -instance. -.IP \[bu] -.BR epoll_wait (2) -waits for I/O events, -blocking the calling thread if no events are currently available. -(This system call can be thought of as fetching items from -the ready list of the -.B epoll -instance.) -.\" -.SS Level-triggered and edge-triggered -The -.B epoll -event distribution interface is able to behave both as edge-triggered -(ET) and as level-triggered (LT). -The difference between the two mechanisms -can be described as follows. -Suppose that -this scenario happens: -.IP (1) 5 -The file descriptor that represents the read side of a pipe -.RI ( rfd ) -is registered on the -.B epoll -instance. -.IP (2) -A pipe writer writes 2\ kB of data on the write side of the pipe. -.IP (3) -A call to -.BR epoll_wait (2) -is done that will return -.I rfd -as a ready file descriptor. -.IP (4) -The pipe reader reads 1\ kB of data from -.IR rfd . -.IP (5) -A call to -.BR epoll_wait (2) -is done. -.P -If the -.I rfd -file descriptor has been added to the -.B epoll -interface using the -.B EPOLLET -(edge-triggered) -flag, the call to -.BR epoll_wait (2) -done in step -.B 5 -will probably hang despite the available data still present in the file -input buffer; -meanwhile the remote peer might be expecting a response based on the -data it already sent. -The reason for this is that edge-triggered mode -delivers events only when changes occur on the monitored file descriptor. -So, in step -.B 5 -the caller might end up waiting for some data that is already present inside -the input buffer. -In the above example, an event on -.I rfd -will be generated because of the write done in -.B 2 -and the event is consumed in -.BR 3 . -Since the read operation done in -.B 4 -does not consume the whole buffer data, the call to -.BR epoll_wait (2) -done in step -.B 5 -might block indefinitely. -.P -An application that employs the -.B EPOLLET -flag should use nonblocking file descriptors to avoid having a blocking -read or write starve a task that is handling multiple file descriptors. -The suggested way to use -.B epoll -as an edge-triggered -.RB ( EPOLLET ) -interface is as follows: -.IP (1) 5 -with nonblocking file descriptors; and -.IP (2) -by waiting for an event only after -.BR read (2) -or -.BR write (2) -return -.BR EAGAIN . -.P -By contrast, when used as a level-triggered interface -(the default, when -.B EPOLLET -is not specified), -.B epoll -is simply a faster -.BR poll (2), -and can be used wherever the latter is used since it shares the -same semantics. -.P -Since even with edge-triggered -.BR epoll , -multiple events can be generated upon receipt of multiple chunks of data, -the caller has the option to specify the -.B EPOLLONESHOT -flag, to tell -.B epoll -to disable the associated file descriptor after the receipt of an event with -.BR epoll_wait (2). -When the -.B EPOLLONESHOT -flag is specified, -it is the caller's responsibility to rearm the file descriptor using -.BR epoll_ctl (2) -with -.BR EPOLL_CTL_MOD . -.P -If multiple threads -(or processes, if child processes have inherited the -.B epoll -file descriptor across -.BR fork (2)) -are blocked in -.BR epoll_wait (2) -waiting on the same epoll file descriptor and a file descriptor -in the interest list that is marked for edge-triggered -.RB ( EPOLLET ) -notification becomes ready, -just one of the threads (or processes) is awoken from -.BR epoll_wait (2). -This provides a useful optimization for avoiding "thundering herd" wake-ups -in some scenarios. -.\" -.SS Interaction with autosleep -If the system is in -.B autosleep -mode via -.I /sys/power/autosleep -and an event happens which wakes the device from sleep, the device -driver will keep the device awake only until that event is queued. -To keep the device awake until the event has been processed, -it is necessary to use the -.BR epoll_ctl (2) -.B EPOLLWAKEUP -flag. -.P -When the -.B EPOLLWAKEUP -flag is set in the -.B events -field for a -.IR "struct epoll_event" , -the system will be kept awake from the moment the event is queued, -through the -.BR epoll_wait (2) -call which returns the event until the subsequent -.BR epoll_wait (2) -call. -If the event should keep the system awake beyond that time, -then a separate -.I wake_lock -should be taken before the second -.BR epoll_wait (2) -call. -.SS /proc interfaces -The following interfaces can be used to limit the amount of -kernel memory consumed by epoll: -.\" Following was added in Linux 2.6.28, but them removed in Linux 2.6.29 -.\" .TP -.\" .IR /proc/sys/fs/epoll/max_user_instances " (since Linux 2.6.28)" -.\" This specifies an upper limit on the number of epoll instances -.\" that can be created per real user ID. -.TP -.IR /proc/sys/fs/epoll/max_user_watches " (since Linux 2.6.28)" -This specifies a limit on the total number of -file descriptors that a user can register across -all epoll instances on the system. -The limit is per real user ID. -Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, -and roughly 160 bytes on a 64-bit kernel. -Currently, -.\" Linux 2.6.29 (in Linux 2.6.28, the default was 1/32 of lowmem) -the default value for -.I max_user_watches -is 1/25 (4%) of the available low memory, -divided by the registration cost in bytes. -.SS Example for suggested usage -While the usage of -.B epoll -when employed as a level-triggered interface does have the same -semantics as -.BR poll (2), -the edge-triggered usage requires more clarification to avoid stalls -in the application event loop. -In this example, listener is a -nonblocking socket on which -.BR listen (2) -has been called. -The function -.I do_use_fd() -uses the new ready file descriptor until -.B EAGAIN -is returned by either -.BR read (2) -or -.BR write (2). -An event-driven state machine application should, after having received -.BR EAGAIN , -record its current state so that at the next call to -.I do_use_fd() -it will continue to -.BR read (2) -or -.BR write (2) -from where it stopped before. -.P -.in +4n -.EX -#define MAX_EVENTS 10 -struct epoll_event ev, events[MAX_EVENTS]; -int listen_sock, conn_sock, nfds, epollfd; -\& -/* Code to set up listening socket, \[aq]listen_sock\[aq], - (socket(), bind(), listen()) omitted. */ -\& -epollfd = epoll_create1(0); -if (epollfd == \-1) { - perror("epoll_create1"); - exit(EXIT_FAILURE); -} -\& -ev.events = EPOLLIN; -ev.data.fd = listen_sock; -if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == \-1) { - perror("epoll_ctl: listen_sock"); - exit(EXIT_FAILURE); -} -\& -for (;;) { - nfds = epoll_wait(epollfd, events, MAX_EVENTS, \-1); - if (nfds == \-1) { - perror("epoll_wait"); - exit(EXIT_FAILURE); - } -\& - for (n = 0; n < nfds; ++n) { - if (events[n].data.fd == listen_sock) { - conn_sock = accept(listen_sock, - (struct sockaddr *) &addr, &addrlen); - if (conn_sock == \-1) { - perror("accept"); - exit(EXIT_FAILURE); - } - setnonblocking(conn_sock); - ev.events = EPOLLIN | EPOLLET; - ev.data.fd = conn_sock; - if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock, - &ev) == \-1) { - perror("epoll_ctl: conn_sock"); - exit(EXIT_FAILURE); - } - } else { - do_use_fd(events[n].data.fd); - } - } -} -.EE -.in -.P -When used as an edge-triggered interface, for performance reasons, it is -possible to add the file descriptor inside the -.B epoll -interface -.RB ( EPOLL_CTL_ADD ) -once by specifying -.RB ( EPOLLIN | EPOLLOUT ). -This allows you to avoid -continuously switching between -.B EPOLLIN -and -.B EPOLLOUT -calling -.BR epoll_ctl (2) -with -.BR EPOLL_CTL_MOD . -.SS Questions and answers -.IP \[bu] 3 -What is the key used to distinguish the file descriptors registered in an -interest list? -.IP -The key is the combination of the file descriptor number and -the open file description -(also known as an "open file handle", -the kernel's internal representation of an open file). -.IP \[bu] -What happens if you register the same file descriptor on an -.B epoll -instance twice? -.IP -You will probably get -.BR EEXIST . -However, it is possible to add a duplicate -.RB ( dup (2), -.BR dup2 (2), -.BR fcntl (2) -.BR F_DUPFD ) -file descriptor to the same -.B epoll -instance. -.\" But a file descriptor duplicated by fork(2) can't be added to the -.\" set, because the [file *, fd] pair is already in the epoll set. -.\" That is a somewhat ugly inconsistency. On the one hand, a child process -.\" cannot add the duplicate file descriptor to the epoll set. (In every -.\" other case that I can think of, file descriptors duplicated by fork have -.\" similar semantics to file descriptors duplicated by dup() and friends.) On -.\" the other hand, the very fact that the child has a duplicate of the -.\" file descriptor means that even if the parent closes its file descriptor, -.\" then epoll_wait() in the parent will continue to receive notifications for -.\" that file descriptor because of the duplicated file descriptor in the child. -.\" -.\" See http://thread.gmane.org/gmane.linux.kernel/596462/ -.\" "epoll design problems with common fork/exec patterns" -.\" -.\" mtk, Feb 2008 -This can be a useful technique for filtering events, -if the duplicate file descriptors are registered with different -.I events -masks. -.IP \[bu] -Can two -.B epoll -instances wait for the same file descriptor? -If so, are events reported to both -.B epoll -file descriptors? -.IP -Yes, and events would be reported to both. -However, careful programming may be needed to do this correctly. -.IP \[bu] -Is the -.B epoll -file descriptor itself poll/epoll/selectable? -.IP -Yes. -If an -.B epoll -file descriptor has events waiting, then it will -indicate as being readable. -.IP \[bu] -What happens if one attempts to put an -.B epoll -file descriptor into its own file descriptor set? -.IP -The -.BR epoll_ctl (2) -call fails -.RB ( EINVAL ). -However, you can add an -.B epoll -file descriptor inside another -.B epoll -file descriptor set. -.IP \[bu] -Can I send an -.B epoll -file descriptor over a UNIX domain socket to another process? -.IP -Yes, but it does not make sense to do this, since the receiving process -would not have copies of the file descriptors in the interest list. -.IP \[bu] -Will closing a file descriptor cause it to be removed from all -.B epoll -interest lists? -.IP -Yes, but be aware of the following point. -A file descriptor is a reference to an open file description (see -.BR open (2)). -Whenever a file descriptor is duplicated via -.BR dup (2), -.BR dup2 (2), -.BR fcntl (2) -.BR F_DUPFD , -or -.BR fork (2), -a new file descriptor referring to the same open file description is -created. -An open file description continues to exist until all -file descriptors referring to it have been closed. -.IP -A file descriptor is removed from an -interest list only after all the file descriptors referring to the underlying -open file description have been closed. -This means that even after a file descriptor that is part of an -interest list has been closed, -events may be reported for that file descriptor if other file -descriptors referring to the same underlying file description remain open. -To prevent this happening, -the file descriptor must be explicitly removed from the interest list (using -.BR epoll_ctl (2) -.BR EPOLL_CTL_DEL ) -before it is duplicated. -Alternatively, -the application must ensure that all file descriptors are closed -(which may be difficult if file descriptors were duplicated -behind the scenes by library functions that used -.BR dup (2) -or -.BR fork (2)). -.IP \[bu] -If more than one event occurs between -.BR epoll_wait (2) -calls, are they combined or reported separately? -.IP -They will be combined. -.IP \[bu] -Does an operation on a file descriptor affect the -already collected but not yet reported events? -.IP -You can do two operations on an existing file descriptor. -Remove would be meaningless for -this case. -Modify will reread available I/O. -.IP \[bu] -Do I need to continuously read/write a file descriptor -until -.B EAGAIN -when using the -.B EPOLLET -flag (edge-triggered behavior)? -.IP -Receiving an event from -.BR epoll_wait (2) -should suggest to you that such -file descriptor is ready for the requested I/O operation. -You must consider it ready until the next (nonblocking) -read/write yields -.BR EAGAIN . -When and how you will use the file descriptor is entirely up to you. -.IP -For packet/token-oriented files (e.g., datagram socket, -terminal in canonical mode), -the only way to detect the end of the read/write I/O space -is to continue to read/write until -.BR EAGAIN . -.IP -For stream-oriented files (e.g., pipe, FIFO, stream socket), the -condition that the read/write I/O space is exhausted can also be detected by -checking the amount of data read from / written to the target file -descriptor. -For example, if you call -.BR read (2) -by asking to read a certain amount of data and -.BR read (2) -returns a lower number of bytes, you -can be sure of having exhausted the read I/O space for the file -descriptor. -The same is true when writing using -.BR write (2). -(Avoid this latter technique if you cannot guarantee that -the monitored file descriptor always refers to a stream-oriented file.) -.SS Possible pitfalls and ways to avoid them -.IP \[bu] 3 -.B Starvation (edge-triggered) -.IP -If there is a large amount of I/O space, -it is possible that by trying to drain -it the other files will not get processed causing starvation. -(This problem is not specific to -.BR epoll .) -.IP -The solution is to maintain a ready list -and mark the file descriptor as ready -in its associated data structure, thereby allowing the application to -remember which files need to be processed but still round robin amongst -all the ready files. -This also supports ignoring subsequent events you -receive for file descriptors that are already ready. -.IP \[bu] -.B If using an event cache... -.IP -If you use an event cache or store all the file descriptors returned from -.BR epoll_wait (2), -then make sure to provide a way to mark -its closure dynamically (i.e., caused by -a previous event's processing). -Suppose you receive 100 events from -.BR epoll_wait (2), -and in event #47 a condition causes event #13 to be closed. -If you remove the structure and -.BR close (2) -the file descriptor for event #13, then your -event cache might still say there are events waiting for that -file descriptor causing confusion. -.IP -One solution for this is to call, during the processing of event 47, -.BR epoll_ctl ( EPOLL_CTL_DEL ) -to delete file descriptor 13 and -.BR close (2), -then mark its associated -data structure as removed and link it to a cleanup list. -If you find another -event for file descriptor 13 in your batch processing, -you will discover the file descriptor had been -previously removed and there will be no confusion. -.SH VERSIONS -Some other systems provide similar mechanisms; -for example, -FreeBSD has -.IR kqueue , -and Solaris has -.IR /dev/poll . -.SH STANDARDS -Linux. -.SH HISTORY -Linux 2.5.44. -.\" Its interface should be finalized in Linux 2.5.66. -glibc 2.3.2. -.SH NOTES -The set of file descriptors that is being monitored via -an epoll file descriptor can be viewed via the entry for -the epoll file descriptor in the process's -.IR /proc/ pid /fdinfo -directory. -See -.BR proc (5) -for further details. -.P -The -.BR kcmp (2) -.B KCMP_EPOLL_TFD -operation can be used to test whether a file descriptor -is present in an epoll instance. -.SH SEE ALSO -.BR epoll_create (2), -.BR epoll_create1 (2), -.BR epoll_ctl (2), -.BR epoll_wait (2), -.BR poll (2), -.BR select (2) |