summaryrefslogtreecommitdiffstats
path: root/man7/epoll.7
diff options
context:
space:
mode:
Diffstat (limited to 'man7/epoll.7')
-rw-r--r--man7/epoll.7610
1 files changed, 0 insertions, 610 deletions
diff --git a/man7/epoll.7 b/man7/epoll.7
deleted file mode 100644
index a93bc0b..0000000
--- a/man7/epoll.7
+++ /dev/null
@@ -1,610 +0,0 @@
-.\" Copyright (C) 2003 Davide Libenzi
-.\"
-.\" SPDX-License-Identifier: GPL-2.0-or-later
-.\"
-.\" Davide Libenzi <davidel@xmailserver.org>
-.\"
-.TH epoll 7 2023-10-31 "Linux man-pages 6.7"
-.SH NAME
-epoll \- I/O event notification facility
-.SH SYNOPSIS
-.nf
-.B #include <sys/epoll.h>
-.fi
-.SH DESCRIPTION
-The
-.B epoll
-API performs a similar task to
-.BR poll (2):
-monitoring multiple file descriptors to see if I/O is possible on any of them.
-The
-.B epoll
-API can be used either as an edge-triggered or a level-triggered
-interface and scales well to large numbers of watched file descriptors.
-.P
-The central concept of the
-.B epoll
-API is the
-.B epoll
-.IR instance ,
-an in-kernel data structure which, from a user-space perspective,
-can be considered as a container for two lists:
-.IP \[bu] 3
-The
-.I interest
-list (sometimes also called the
-.B epoll
-set): the set of file descriptors that the process has registered
-an interest in monitoring.
-.IP \[bu]
-The
-.I ready
-list: the set of file descriptors that are "ready" for I/O.
-The ready list is a subset of
-(or, more precisely, a set of references to)
-the file descriptors in the interest list.
-The ready list is dynamically populated
-by the kernel as a result of I/O activity on those file descriptors.
-.P
-The following system calls are provided to
-create and manage an
-.B epoll
-instance:
-.IP \[bu] 3
-.BR epoll_create (2)
-creates a new
-.B epoll
-instance and returns a file descriptor referring to that instance.
-(The more recent
-.BR epoll_create1 (2)
-extends the functionality of
-.BR epoll_create (2).)
-.IP \[bu]
-Interest in particular file descriptors is then registered via
-.BR epoll_ctl (2),
-which adds items to the interest list of the
-.B epoll
-instance.
-.IP \[bu]
-.BR epoll_wait (2)
-waits for I/O events,
-blocking the calling thread if no events are currently available.
-(This system call can be thought of as fetching items from
-the ready list of the
-.B epoll
-instance.)
-.\"
-.SS Level-triggered and edge-triggered
-The
-.B epoll
-event distribution interface is able to behave both as edge-triggered
-(ET) and as level-triggered (LT).
-The difference between the two mechanisms
-can be described as follows.
-Suppose that
-this scenario happens:
-.IP (1) 5
-The file descriptor that represents the read side of a pipe
-.RI ( rfd )
-is registered on the
-.B epoll
-instance.
-.IP (2)
-A pipe writer writes 2\ kB of data on the write side of the pipe.
-.IP (3)
-A call to
-.BR epoll_wait (2)
-is done that will return
-.I rfd
-as a ready file descriptor.
-.IP (4)
-The pipe reader reads 1\ kB of data from
-.IR rfd .
-.IP (5)
-A call to
-.BR epoll_wait (2)
-is done.
-.P
-If the
-.I rfd
-file descriptor has been added to the
-.B epoll
-interface using the
-.B EPOLLET
-(edge-triggered)
-flag, the call to
-.BR epoll_wait (2)
-done in step
-.B 5
-will probably hang despite the available data still present in the file
-input buffer;
-meanwhile the remote peer might be expecting a response based on the
-data it already sent.
-The reason for this is that edge-triggered mode
-delivers events only when changes occur on the monitored file descriptor.
-So, in step
-.B 5
-the caller might end up waiting for some data that is already present inside
-the input buffer.
-In the above example, an event on
-.I rfd
-will be generated because of the write done in
-.B 2
-and the event is consumed in
-.BR 3 .
-Since the read operation done in
-.B 4
-does not consume the whole buffer data, the call to
-.BR epoll_wait (2)
-done in step
-.B 5
-might block indefinitely.
-.P
-An application that employs the
-.B EPOLLET
-flag should use nonblocking file descriptors to avoid having a blocking
-read or write starve a task that is handling multiple file descriptors.
-The suggested way to use
-.B epoll
-as an edge-triggered
-.RB ( EPOLLET )
-interface is as follows:
-.IP (1) 5
-with nonblocking file descriptors; and
-.IP (2)
-by waiting for an event only after
-.BR read (2)
-or
-.BR write (2)
-return
-.BR EAGAIN .
-.P
-By contrast, when used as a level-triggered interface
-(the default, when
-.B EPOLLET
-is not specified),
-.B epoll
-is simply a faster
-.BR poll (2),
-and can be used wherever the latter is used since it shares the
-same semantics.
-.P
-Since even with edge-triggered
-.BR epoll ,
-multiple events can be generated upon receipt of multiple chunks of data,
-the caller has the option to specify the
-.B EPOLLONESHOT
-flag, to tell
-.B epoll
-to disable the associated file descriptor after the receipt of an event with
-.BR epoll_wait (2).
-When the
-.B EPOLLONESHOT
-flag is specified,
-it is the caller's responsibility to rearm the file descriptor using
-.BR epoll_ctl (2)
-with
-.BR EPOLL_CTL_MOD .
-.P
-If multiple threads
-(or processes, if child processes have inherited the
-.B epoll
-file descriptor across
-.BR fork (2))
-are blocked in
-.BR epoll_wait (2)
-waiting on the same epoll file descriptor and a file descriptor
-in the interest list that is marked for edge-triggered
-.RB ( EPOLLET )
-notification becomes ready,
-just one of the threads (or processes) is awoken from
-.BR epoll_wait (2).
-This provides a useful optimization for avoiding "thundering herd" wake-ups
-in some scenarios.
-.\"
-.SS Interaction with autosleep
-If the system is in
-.B autosleep
-mode via
-.I /sys/power/autosleep
-and an event happens which wakes the device from sleep, the device
-driver will keep the device awake only until that event is queued.
-To keep the device awake until the event has been processed,
-it is necessary to use the
-.BR epoll_ctl (2)
-.B EPOLLWAKEUP
-flag.
-.P
-When the
-.B EPOLLWAKEUP
-flag is set in the
-.B events
-field for a
-.IR "struct epoll_event" ,
-the system will be kept awake from the moment the event is queued,
-through the
-.BR epoll_wait (2)
-call which returns the event until the subsequent
-.BR epoll_wait (2)
-call.
-If the event should keep the system awake beyond that time,
-then a separate
-.I wake_lock
-should be taken before the second
-.BR epoll_wait (2)
-call.
-.SS /proc interfaces
-The following interfaces can be used to limit the amount of
-kernel memory consumed by epoll:
-.\" Following was added in Linux 2.6.28, but them removed in Linux 2.6.29
-.\" .TP
-.\" .IR /proc/sys/fs/epoll/max_user_instances " (since Linux 2.6.28)"
-.\" This specifies an upper limit on the number of epoll instances
-.\" that can be created per real user ID.
-.TP
-.IR /proc/sys/fs/epoll/max_user_watches " (since Linux 2.6.28)"
-This specifies a limit on the total number of
-file descriptors that a user can register across
-all epoll instances on the system.
-The limit is per real user ID.
-Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel,
-and roughly 160 bytes on a 64-bit kernel.
-Currently,
-.\" Linux 2.6.29 (in Linux 2.6.28, the default was 1/32 of lowmem)
-the default value for
-.I max_user_watches
-is 1/25 (4%) of the available low memory,
-divided by the registration cost in bytes.
-.SS Example for suggested usage
-While the usage of
-.B epoll
-when employed as a level-triggered interface does have the same
-semantics as
-.BR poll (2),
-the edge-triggered usage requires more clarification to avoid stalls
-in the application event loop.
-In this example, listener is a
-nonblocking socket on which
-.BR listen (2)
-has been called.
-The function
-.I do_use_fd()
-uses the new ready file descriptor until
-.B EAGAIN
-is returned by either
-.BR read (2)
-or
-.BR write (2).
-An event-driven state machine application should, after having received
-.BR EAGAIN ,
-record its current state so that at the next call to
-.I do_use_fd()
-it will continue to
-.BR read (2)
-or
-.BR write (2)
-from where it stopped before.
-.P
-.in +4n
-.EX
-#define MAX_EVENTS 10
-struct epoll_event ev, events[MAX_EVENTS];
-int listen_sock, conn_sock, nfds, epollfd;
-\&
-/* Code to set up listening socket, \[aq]listen_sock\[aq],
- (socket(), bind(), listen()) omitted. */
-\&
-epollfd = epoll_create1(0);
-if (epollfd == \-1) {
- perror("epoll_create1");
- exit(EXIT_FAILURE);
-}
-\&
-ev.events = EPOLLIN;
-ev.data.fd = listen_sock;
-if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == \-1) {
- perror("epoll_ctl: listen_sock");
- exit(EXIT_FAILURE);
-}
-\&
-for (;;) {
- nfds = epoll_wait(epollfd, events, MAX_EVENTS, \-1);
- if (nfds == \-1) {
- perror("epoll_wait");
- exit(EXIT_FAILURE);
- }
-\&
- for (n = 0; n < nfds; ++n) {
- if (events[n].data.fd == listen_sock) {
- conn_sock = accept(listen_sock,
- (struct sockaddr *) &addr, &addrlen);
- if (conn_sock == \-1) {
- perror("accept");
- exit(EXIT_FAILURE);
- }
- setnonblocking(conn_sock);
- ev.events = EPOLLIN | EPOLLET;
- ev.data.fd = conn_sock;
- if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
- &ev) == \-1) {
- perror("epoll_ctl: conn_sock");
- exit(EXIT_FAILURE);
- }
- } else {
- do_use_fd(events[n].data.fd);
- }
- }
-}
-.EE
-.in
-.P
-When used as an edge-triggered interface, for performance reasons, it is
-possible to add the file descriptor inside the
-.B epoll
-interface
-.RB ( EPOLL_CTL_ADD )
-once by specifying
-.RB ( EPOLLIN | EPOLLOUT ).
-This allows you to avoid
-continuously switching between
-.B EPOLLIN
-and
-.B EPOLLOUT
-calling
-.BR epoll_ctl (2)
-with
-.BR EPOLL_CTL_MOD .
-.SS Questions and answers
-.IP \[bu] 3
-What is the key used to distinguish the file descriptors registered in an
-interest list?
-.IP
-The key is the combination of the file descriptor number and
-the open file description
-(also known as an "open file handle",
-the kernel's internal representation of an open file).
-.IP \[bu]
-What happens if you register the same file descriptor on an
-.B epoll
-instance twice?
-.IP
-You will probably get
-.BR EEXIST .
-However, it is possible to add a duplicate
-.RB ( dup (2),
-.BR dup2 (2),
-.BR fcntl (2)
-.BR F_DUPFD )
-file descriptor to the same
-.B epoll
-instance.
-.\" But a file descriptor duplicated by fork(2) can't be added to the
-.\" set, because the [file *, fd] pair is already in the epoll set.
-.\" That is a somewhat ugly inconsistency. On the one hand, a child process
-.\" cannot add the duplicate file descriptor to the epoll set. (In every
-.\" other case that I can think of, file descriptors duplicated by fork have
-.\" similar semantics to file descriptors duplicated by dup() and friends.) On
-.\" the other hand, the very fact that the child has a duplicate of the
-.\" file descriptor means that even if the parent closes its file descriptor,
-.\" then epoll_wait() in the parent will continue to receive notifications for
-.\" that file descriptor because of the duplicated file descriptor in the child.
-.\"
-.\" See http://thread.gmane.org/gmane.linux.kernel/596462/
-.\" "epoll design problems with common fork/exec patterns"
-.\"
-.\" mtk, Feb 2008
-This can be a useful technique for filtering events,
-if the duplicate file descriptors are registered with different
-.I events
-masks.
-.IP \[bu]
-Can two
-.B epoll
-instances wait for the same file descriptor?
-If so, are events reported to both
-.B epoll
-file descriptors?
-.IP
-Yes, and events would be reported to both.
-However, careful programming may be needed to do this correctly.
-.IP \[bu]
-Is the
-.B epoll
-file descriptor itself poll/epoll/selectable?
-.IP
-Yes.
-If an
-.B epoll
-file descriptor has events waiting, then it will
-indicate as being readable.
-.IP \[bu]
-What happens if one attempts to put an
-.B epoll
-file descriptor into its own file descriptor set?
-.IP
-The
-.BR epoll_ctl (2)
-call fails
-.RB ( EINVAL ).
-However, you can add an
-.B epoll
-file descriptor inside another
-.B epoll
-file descriptor set.
-.IP \[bu]
-Can I send an
-.B epoll
-file descriptor over a UNIX domain socket to another process?
-.IP
-Yes, but it does not make sense to do this, since the receiving process
-would not have copies of the file descriptors in the interest list.
-.IP \[bu]
-Will closing a file descriptor cause it to be removed from all
-.B epoll
-interest lists?
-.IP
-Yes, but be aware of the following point.
-A file descriptor is a reference to an open file description (see
-.BR open (2)).
-Whenever a file descriptor is duplicated via
-.BR dup (2),
-.BR dup2 (2),
-.BR fcntl (2)
-.BR F_DUPFD ,
-or
-.BR fork (2),
-a new file descriptor referring to the same open file description is
-created.
-An open file description continues to exist until all
-file descriptors referring to it have been closed.
-.IP
-A file descriptor is removed from an
-interest list only after all the file descriptors referring to the underlying
-open file description have been closed.
-This means that even after a file descriptor that is part of an
-interest list has been closed,
-events may be reported for that file descriptor if other file
-descriptors referring to the same underlying file description remain open.
-To prevent this happening,
-the file descriptor must be explicitly removed from the interest list (using
-.BR epoll_ctl (2)
-.BR EPOLL_CTL_DEL )
-before it is duplicated.
-Alternatively,
-the application must ensure that all file descriptors are closed
-(which may be difficult if file descriptors were duplicated
-behind the scenes by library functions that used
-.BR dup (2)
-or
-.BR fork (2)).
-.IP \[bu]
-If more than one event occurs between
-.BR epoll_wait (2)
-calls, are they combined or reported separately?
-.IP
-They will be combined.
-.IP \[bu]
-Does an operation on a file descriptor affect the
-already collected but not yet reported events?
-.IP
-You can do two operations on an existing file descriptor.
-Remove would be meaningless for
-this case.
-Modify will reread available I/O.
-.IP \[bu]
-Do I need to continuously read/write a file descriptor
-until
-.B EAGAIN
-when using the
-.B EPOLLET
-flag (edge-triggered behavior)?
-.IP
-Receiving an event from
-.BR epoll_wait (2)
-should suggest to you that such
-file descriptor is ready for the requested I/O operation.
-You must consider it ready until the next (nonblocking)
-read/write yields
-.BR EAGAIN .
-When and how you will use the file descriptor is entirely up to you.
-.IP
-For packet/token-oriented files (e.g., datagram socket,
-terminal in canonical mode),
-the only way to detect the end of the read/write I/O space
-is to continue to read/write until
-.BR EAGAIN .
-.IP
-For stream-oriented files (e.g., pipe, FIFO, stream socket), the
-condition that the read/write I/O space is exhausted can also be detected by
-checking the amount of data read from / written to the target file
-descriptor.
-For example, if you call
-.BR read (2)
-by asking to read a certain amount of data and
-.BR read (2)
-returns a lower number of bytes, you
-can be sure of having exhausted the read I/O space for the file
-descriptor.
-The same is true when writing using
-.BR write (2).
-(Avoid this latter technique if you cannot guarantee that
-the monitored file descriptor always refers to a stream-oriented file.)
-.SS Possible pitfalls and ways to avoid them
-.IP \[bu] 3
-.B Starvation (edge-triggered)
-.IP
-If there is a large amount of I/O space,
-it is possible that by trying to drain
-it the other files will not get processed causing starvation.
-(This problem is not specific to
-.BR epoll .)
-.IP
-The solution is to maintain a ready list
-and mark the file descriptor as ready
-in its associated data structure, thereby allowing the application to
-remember which files need to be processed but still round robin amongst
-all the ready files.
-This also supports ignoring subsequent events you
-receive for file descriptors that are already ready.
-.IP \[bu]
-.B If using an event cache...
-.IP
-If you use an event cache or store all the file descriptors returned from
-.BR epoll_wait (2),
-then make sure to provide a way to mark
-its closure dynamically (i.e., caused by
-a previous event's processing).
-Suppose you receive 100 events from
-.BR epoll_wait (2),
-and in event #47 a condition causes event #13 to be closed.
-If you remove the structure and
-.BR close (2)
-the file descriptor for event #13, then your
-event cache might still say there are events waiting for that
-file descriptor causing confusion.
-.IP
-One solution for this is to call, during the processing of event 47,
-.BR epoll_ctl ( EPOLL_CTL_DEL )
-to delete file descriptor 13 and
-.BR close (2),
-then mark its associated
-data structure as removed and link it to a cleanup list.
-If you find another
-event for file descriptor 13 in your batch processing,
-you will discover the file descriptor had been
-previously removed and there will be no confusion.
-.SH VERSIONS
-Some other systems provide similar mechanisms;
-for example,
-FreeBSD has
-.IR kqueue ,
-and Solaris has
-.IR /dev/poll .
-.SH STANDARDS
-Linux.
-.SH HISTORY
-Linux 2.5.44.
-.\" Its interface should be finalized in Linux 2.5.66.
-glibc 2.3.2.
-.SH NOTES
-The set of file descriptors that is being monitored via
-an epoll file descriptor can be viewed via the entry for
-the epoll file descriptor in the process's
-.IR /proc/ pid /fdinfo
-directory.
-See
-.BR proc (5)
-for further details.
-.P
-The
-.BR kcmp (2)
-.B KCMP_EPOLL_TFD
-operation can be used to test whether a file descriptor
-is present in an epoll instance.
-.SH SEE ALSO
-.BR epoll_create (2),
-.BR epoll_create1 (2),
-.BR epoll_ctl (2),
-.BR epoll_wait (2),
-.BR poll (2),
-.BR select (2)