Diffstat (limited to 'man7/socket.7')
-rw-r--r-- man7/socket.7 1266
1 files changed, 1266 insertions, 0 deletions
diff --git a/man7/socket.7 b/man7/socket.7
new file mode 100644
index 0000000..2cc24d9
--- /dev/null
+++ b/man7/socket.7
@@ -0,0 +1,1266 @@
+'\" t
+.\" SPDX-License-Identifier: Linux-man-pages-1-para
+.\"
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" and copyright (c) 1999 Matthew Wilcox.
+.\"
+.\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
+.\" Added description of SO_ACCEPTCONN
+.\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
+.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
+.\" Added notes on capability requirements
+.\" A few small grammar fixes
+.\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
+.\" Documented SO_DOMAIN and SO_PROTOCOL.
+.\"
+.\" FIXME
+.\" The following are not yet documented:
+.\"
+.\" SO_PEERNAME (2.4?)
+.\" get only
+.\" Seems to do something similar to getpeername(), but then
+.\" why is it necessary / how does it differ?
+.\"
+.\" SO_TIMESTAMPING (2.6.30)
+.\" Documentation/networking/timestamping.txt
+.\" commit cb9eff097831007afb30d64373f29d99825d0068
+.\" Author: Patrick Ohly <patrick.ohly@intel.com>
+.\"
+.\" SO_WIFI_STATUS (3.3)
+.\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
+.\" Author: Johannes Berg <johannes.berg@intel.com>
+.\" Also: SCM_WIFI_STATUS
+.\"
+.\" SO_NOFCS (3.4)
+.\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
+.\" Author: Ben Greear <greearb@candelatech.com>
+.\"
+.\" SO_GET_FILTER (3.8)
+.\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
+.\" Author: Pavel Emelyanov <xemul@parallels.com>
+.\"
+.\" SO_MAX_PACING_RATE (3.13)
+.\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
+.\" Author: Eric Dumazet <edumazet@google.com>
+.\"
+.\" SO_BPF_EXTENSIONS (3.14)
+.\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
+.\" Author: Michal Sekletar <msekleta@redhat.com>
+.\"
+.TH socket 7 2023-07-15 "Linux man-pages 6.05.01"
+.SH NAME
+socket \- Linux socket interface
+.SH SYNOPSIS
+.nf
+.B #include <sys/socket.h>
+.PP
+.IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
+.fi
+.SH DESCRIPTION
+This manual page describes the Linux networking socket layer user
+interface.
+The BSD-compatible sockets
+are the uniform interface
+between the user process and the network protocol stacks in the kernel.
+The protocol modules are grouped into
+.I protocol families
+such as
+.BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
+and
+.I socket types
+such as
+.B SOCK_STREAM
+or
+.BR SOCK_DGRAM .
+See
+.BR socket (2)
+for more information on families and types.
+.SS Socket-layer functions
+These functions are used by the user process to send or receive packets
+and to do other socket operations.
+For more information, see their respective manual pages.
+.PP
+.BR socket (2)
+creates a socket,
+.BR connect (2)
+connects a socket to a remote socket address,
+the
+.BR bind (2)
+function binds a socket to a local socket address,
+.BR listen (2)
+tells the socket that new connections shall be accepted, and
+.BR accept (2)
+is used to get a new socket with a new incoming connection.
+.BR socketpair (2)
+returns two connected anonymous sockets (implemented only for a few
+local families like
+.BR AF_UNIX ).
+.PP
+.BR send (2),
+.BR sendto (2),
+and
+.BR sendmsg (2)
+send data over a socket, and
+.BR recv (2),
+.BR recvfrom (2),
+and
+.BR recvmsg (2)
+receive data from a socket.
+.BR poll (2)
+and
+.BR select (2)
+wait for arriving data or a readiness to send data.
+In addition, the standard I/O operations like
+.BR write (2),
+.BR writev (2),
+.BR sendfile (2),
+.BR read (2),
+and
+.BR readv (2)
+can be used to read and write data.
+.PP
+.BR getsockname (2)
+returns the local socket address and
+.BR getpeername (2)
+returns the remote socket address.
+.BR getsockopt (2)
+and
+.BR setsockopt (2)
+are used to get or set socket layer or protocol options.
+.BR ioctl (2)
+can be used to set or read some other options.
+.PP
+.BR close (2)
+is used to close a socket.
+.BR shutdown (2)
+closes parts of a full-duplex socket connection.
+.PP
+Seeking, or calling
+.BR pread (2)
+or
+.BR pwrite (2)
+with a nonzero position is not supported on sockets.
+.PP
+It is possible to do nonblocking I/O on sockets by setting the
+.B O_NONBLOCK
+flag on a socket file descriptor using
+.BR fcntl (2).
+Then all operations that would block will (usually)
+return with
+.B EAGAIN
+(operation should be retried later);
+.BR connect (2)
+will return an
+.B EINPROGRESS
+error.
+The user can then wait for various events via
+.BR poll (2)
+or
+.BR select (2).
+.TS
+tab(:) allbox;
+c s s
+l l lx.
+I/O events
+Event:Poll flag:Occurrence
+Read:POLLIN:T{
+New data arrived.
+T}
+Read:POLLIN:T{
+A connection setup has been completed
+(for connection-oriented sockets)
+T}
+Read:POLLHUP:T{
+A disconnection request has been initiated by the other end.
+T}
+Read:POLLHUP:T{
+A connection is broken (only for connection-oriented protocols).
+When the socket is written to,
+.B SIGPIPE
+is also sent.
+T}
+Write:POLLOUT:T{
+Socket has enough send buffer space for writing new data.
+T}
+Read/Write:T{
+POLLIN |
+.br
+POLLOUT
+T}:T{
+An outgoing
+.BR connect (2)
+finished.
+T}
+Read/Write:POLLERR:T{
+An asynchronous error occurred.
+T}
+Read/Write:POLLHUP:T{
+The other end has shut down one direction.
+T}
+Exception:POLLPRI:T{
+Urgent data arrived.
+.B SIGURG
+is sent then.
+T}
+.\" FIXME . The following is not true currently:
+.\" It is no I/O event when the connection
+.\" is broken from the local end using
+.\" .BR shutdown (2)
+.\" or
+.\" .BR close (2).
+.TE
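+.PP
+As a minimal sketch of the nonblocking pattern described above
+(the helper name
+.I connect_nonblock
+is hypothetical, and error reporting is abbreviated),
+a program might start a
+.BR connect (2),
+wait for
+.B POLLOUT
+with
+.BR poll (2),
+and then read
+.B SO_ERROR
+to learn the outcome:
+.PP
+.in +4n
+.EX
+#include <errno.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sys/socket.h>
+\&
+/* Hypothetical helper: returns 0 on success, \-1 on failure. */
+static int
+connect_nonblock(int fd, const struct sockaddr *addr, socklen_t len,
+                 int timeout_ms)
+{
+    int            err, flags;
+    socklen_t      elen = sizeof(err);
+    struct pollfd  pfd = { .fd = fd, .events = POLLOUT };
+\&
+    flags = fcntl(fd, F_GETFL);
+    if (flags == \-1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == \-1)
+        return \-1;
+\&
+    if (connect(fd, addr, len) == 0)
+        return 0;               /* Connected immediately */
+    if (errno != EINPROGRESS)
+        return \-1;
+\&
+    /* Wait until the connection attempt has finished. */
+    if (poll(&pfd, 1, timeout_ms) <= 0)
+        return \-1;              /* Timeout or poll() error */
+\&
+    /* SO_ERROR reports whether the connection attempt succeeded. */
+    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) == \-1)
+        return \-1;
+    if (err != 0) {
+        errno = err;
+        return \-1;
+    }
+    return 0;
+}
+.EE
+.in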
+.PP
+An alternative to
+.BR poll (2)
+and
+.BR select (2)
+is to let the kernel inform the application about events
+via a
+.B SIGIO
+signal.
+For that the
+.B O_ASYNC
+flag must be set on a socket file descriptor via
+.BR fcntl (2)
+and a valid signal handler for
+.B SIGIO
+must be installed via
+.BR sigaction (2).
+See the
+.I Signals
+discussion below.
+.SS Socket address structures
+Each socket domain has its own format for socket addresses,
+with a domain-specific address structure.
+Each of these structures begins with an
+integer "family" field (typed as
+.IR sa_family_t )
+that indicates the type of the address structure.
+This allows
+the various system calls (e.g.,
+.BR connect (2),
+.BR bind (2),
+.BR accept (2),
+.BR getsockname (2),
+.BR getpeername (2)),
+which are generic to all socket domains,
+to determine the domain of a particular socket address.
+.PP
+To allow any type of socket address to be passed to
+interfaces in the sockets API,
+the type
+.I struct sockaddr
+is defined.
+The purpose of this type is purely to allow casting of
+domain-specific socket address types to a "generic" type,
+so as to avoid compiler warnings about type mismatches in
+calls to the sockets API.
+.PP
+In addition, the sockets API provides the data type
+.IR "struct sockaddr_storage" .
+This type
+is suitable to accommodate all supported domain-specific socket
+address structures; it is large enough and is aligned properly.
+(In particular, it is large enough to hold
+IPv6 socket addresses.)
+The structure includes the following field, which can be used to identify
+the type of socket address actually stored in the structure:
+.PP
+.in +4n
+.EX
+    sa_family_t ss_family;
+.EE
+.in
+.PP
+The
+.I sockaddr_storage
+structure is useful in programs that must handle socket addresses
+in a generic way
+(e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
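+.PP
+For instance, a server that accepts both IPv4 and IPv6 connections
+might receive the peer address into a
+.I sockaddr_storage
+structure and then inspect
+.I ss_family
+before casting.
+The following is only a sketch;
+the function name
+.I peer_port
+is hypothetical:
+.PP
+.in +4n
+.EX
+#include <arpa/inet.h>
+#include <netinet/in.h>
+#include <sys/socket.h>
+\&
+/* Hypothetical helper: accept a connection on lfd, store the new
+   socket in *cfd, and return the peer port in host byte order.
+   Returns \-1 on error or for an unexpected address family. */
+static int
+peer_port(int lfd, int *cfd)
+{
+    struct sockaddr_storage  ss;
+    socklen_t                len = sizeof(ss);
+\&
+    *cfd = accept(lfd, (struct sockaddr *) &ss, &len);
+    if (*cfd == \-1)
+        return \-1;
+\&
+    switch (ss.ss_family) {
+    case AF_INET:
+        return ntohs(((struct sockaddr_in *) &ss)\->sin_port);
+    case AF_INET6:
+        return ntohs(((struct sockaddr_in6 *) &ss)\->sin6_port);
+    default:
+        return \-1;
+    }
+}
+.EE
+.in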
+.SS Socket options
+The socket options listed below can be set by using
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2)
+with the socket level set to
+.B SOL_SOCKET
+for all sockets.
+Unless otherwise noted,
+.I optval
+is a pointer to an
+.IR int .
+.\" FIXME .
+.\" In the list below, the text used to describe argument types
+.\" for each socket option should be more consistent
+.\"
+.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
+.\" W R Stevens, UNPv1
+.TP
+.B SO_ACCEPTCONN
+Returns a value indicating whether or not this socket has been marked
+to accept connections with
+.BR listen (2).
+The value 0 indicates that this is not a listening socket;
+the value 1 indicates that this is a listening socket.
+This socket option is read-only.
+.TP
+.BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
+Attach a classic BPF
+.RB ( SO_ATTACH_FILTER )
+or an extended BPF
+.RB ( SO_ATTACH_BPF )
+program to the socket for use as a filter of incoming packets.
+A packet will be dropped if the filter program returns zero.
+If the filter program returns a
+nonzero value which is less than the packet's data length,
+the packet will be truncated to the length returned.
+If the value returned by the filter is greater than or equal to the
+packet's data length, the packet is allowed to proceed unmodified.
+.IP
+The argument for
+.B SO_ATTACH_FILTER
+is a
+.I sock_fprog
+structure, defined in
+.IR <linux/filter.h> :
+.IP
+.in +4n
+.EX
+struct sock_fprog {
+    unsigned short      len;
+    struct sock_filter *filter;
+};
+.EE
+.in
+.IP
+The argument for
+.B SO_ATTACH_BPF
+is a file descriptor returned by the
+.BR bpf (2)
+system call and must refer to a program of type
+.BR BPF_PROG_TYPE_SOCKET_FILTER .
+.IP
+These options may be set multiple times for a given socket,
+each time replacing the previous filter program.
+The classic and extended versions may be called on the same socket,
+but the previous filter will always be replaced such that a socket
+never has more than one filter defined.
+.IP
+Both classic and extended BPF are explained in the kernel source file
+.IR Documentation/networking/filter.txt .
+.TP
+.BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
+For use with the
+.B SO_REUSEPORT
+option, these options allow the user to set a classic BPF
+.RB ( SO_ATTACH_REUSEPORT_CBPF )
+or an extended BPF
+.RB ( SO_ATTACH_REUSEPORT_EBPF )
+program which defines how packets are assigned to
+the sockets in the reuseport group (that is, all sockets which have
+.B SO_REUSEPORT
+set and are using the same local address to receive packets).
+.IP
+The BPF program must return an index between 0 and N\-1 representing
+the socket which should receive the packet
+(where N is the number of sockets in the group).
+If the BPF program returns an invalid index,
+socket selection will fall back to the plain
+.B SO_REUSEPORT
+mechanism.
+.IP
+Sockets are numbered in the order in which they are added to the group
+(that is, the order of
+.BR bind (2)
+calls for UDP sockets or the order of
+.BR listen (2)
+calls for TCP sockets).
+New sockets added to a reuseport group will inherit the BPF program.
+When a socket is removed from a reuseport group (via
+.BR close (2)),
+the last socket in the group will be moved into the closed socket's
+position.
+.IP
+These options may be set repeatedly at any time on any socket in the group
+to replace the current BPF program used by all sockets in the group.
+.IP
+.B SO_ATTACH_REUSEPORT_CBPF
+takes the same argument type as
+.B SO_ATTACH_FILTER
+and
+.B SO_ATTACH_REUSEPORT_EBPF
+takes the same argument type as
+.BR SO_ATTACH_BPF .
+.IP
+UDP support for this feature is available since Linux 4.5;
+TCP support is available since Linux 4.6.
+.TP
+.B SO_BINDTODEVICE
+Bind this socket to a particular device like \[lq]eth0\[rq],
+as specified in the passed interface name.
+If the
+name is an empty string or the option length is zero, the socket device
+binding is removed.
+The passed option is a variable-length null-terminated
+interface name string with the maximum size of
+.BR IFNAMSIZ .
+If a socket is bound to an interface,
+only packets received from that particular interface are processed by the
+socket.
+Note that this works only for some socket types, particularly
+.B AF_INET
+sockets.
+It is not supported for packet sockets (use normal
+.BR bind (2)
+there).
+.IP
+Before Linux 3.8,
+this socket option could be set, but could not be retrieved with
+.BR getsockopt (2).
+Since Linux 3.8, it is readable.
+The
+.I optlen
+argument should contain the buffer size available
+to receive the device name and is recommended to be
+.B IFNAMSIZ
+bytes.
+The real device name length is reported back in the
+.I optlen
+argument.
+.TP
+.B SO_BROADCAST
+Set or get the broadcast flag.
+When enabled, datagram sockets are allowed to send
+packets to a broadcast address.
+This option has no effect on stream-oriented sockets.
+.TP
+.B SO_BSDCOMPAT
+Enable BSD bug-to-bug compatibility.
+This is used by the UDP protocol module in Linux 2.0 and 2.2.
+If enabled, ICMP errors received for a UDP socket will not be passed
+to the user program.
+In later kernel versions, support for this option has been phased out:
+Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
+(printk()) if a program uses this option.
+Linux 2.0 also enabled BSD bug-to-bug compatibility
+options (random header changing, skipping of the broadcast flag) for raw
+sockets with this option, but that was removed in Linux 2.2.
+.TP
+.B SO_DEBUG
+Enable socket debugging.
+Allowed only for processes with the
+.B CAP_NET_ADMIN
+capability or an effective user ID of 0.
+.TP
+.BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
+These two options, which are synonyms,
+may be used to remove the classic or extended BPF
+program attached to a socket with either
+.B SO_ATTACH_FILTER
+or
+.BR SO_ATTACH_BPF .
+The option value is ignored.
+.TP
+.BR SO_DOMAIN " (since Linux 2.6.32)"
+Retrieves the socket domain as an integer, returning a value such as
+.BR AF_INET6 .
+See
+.BR socket (2)
+for details.
+This socket option is read-only.
+.TP
+.B SO_ERROR
+Get and clear the pending socket error.
+This socket option is read-only.
+Expects an integer.
+.TP
+.B SO_DONTROUTE
+Don't send via a gateway; send only to directly connected hosts.
+The same effect can be achieved by setting the
+.B MSG_DONTROUTE
+flag on a socket
+.BR send (2)
+operation.
+Expects an integer boolean flag.
+.TP
+.BR SO_INCOMING_CPU " (gettable since Linux 3.19, settable since Linux 4.4)"
+.\" getsockopt 2c8c56e15df3d4c2af3d656e44feb18789f75837
+.\" setsockopt 70da268b569d32a9fddeea85dc18043de9d89f89
+Sets or gets the CPU affinity of a socket.
+Expects an integer flag.
+.IP
+.in +4n
+.EX
+int cpu = 1;
+setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu,
+           sizeof(cpu));
+.EE
+.in
+.IP
+Because all of the packets for a single stream
+(i.e., all packets for the same 4-tuple)
+arrive on the single RX queue that is associated with a particular CPU,
+the typical use case is to employ one listening process per RX queue,
+with the incoming flow being handled by a listener
+on the same CPU that is handling the RX queue.
+This provides optimal NUMA behavior and keeps CPU caches hot.
+.\"
+.\" From an email conversation with Eric Dumazet:
+.\" >> Note that setting the option is not supported if SO_REUSEPORT is used.
+.\" >
+.\" > Please define "not supported". Does this yield an API diagnostic?
+.\" > If so, what is it?
+.\" >
+.\" >> Socket will be selected from an array, either by a hash or BPF program
+.\" >> that has no access to this information.
+.\" >
+.\" > Sorry -- I'm lost here. How does this comment relate to the proposed
+.\" > man page text above?
+.\"
+.\" Simply that :
+.\"
+.\" If an application uses both SO_INCOMING_CPU and SO_REUSEPORT, then
+.\" SO_REUSEPORT logic, selecting the socket to receive the packet, ignores
+.\" SO_INCOMING_CPU setting.
+.TP
+.BR SO_INCOMING_NAPI_ID " (gettable since Linux 4.12)"
+.\" getsockopt 6d4339028b350efbf87c61e6d9e113e5373545c9
+Returns a system-level unique ID called NAPI ID that is associated
+with an RX queue on which the last packet associated with that
+socket was received.
+.IP
+This can be used by an application to split the incoming flows among worker
+threads based on the RX queue on which the packets associated with the
+flows are received.
+It allows each worker thread to be associated with
+a NIC HW receive queue and service all the connection
+requests received on that RX queue.
+This mapping between an app thread and
+a HW NIC queue streamlines the
+flow of data from the NIC to the application.
+.TP
+.B SO_KEEPALIVE
+Enable sending of keep-alive messages on connection-oriented sockets.
+Expects an integer boolean flag.
+.TP
+.B SO_LINGER
+Sets or gets the
+.B SO_LINGER
+option.
+The argument is a
+.I linger
+structure.
+.IP
+.in +4n
+.EX
+struct linger {
+    int l_onoff;    /* linger active */
+    int l_linger;   /* how many seconds to linger for */
+};
+.EE
+.in
+.IP
+When enabled, a
+.BR close (2)
+or
+.BR shutdown (2)
+will not return until all queued messages for the socket have been
+successfully sent or the linger timeout has been reached.
+Otherwise,
+the call returns immediately and the closing is done in the background.
+When the socket is closed as part of
+.BR exit (2),
+it always lingers in the background.
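+.IP
+As an illustrative sketch
+(the 5-second timeout is an arbitrary choice, and
+.I fd
+is assumed to be an open connection-oriented socket),
+lingering could be enabled like this:
+.IP
+.in +4n
+.EX
+struct linger lg = {
+    .l_onoff  = 1,      /* Enable lingering */
+    .l_linger = 5,      /* Wait up to 5 seconds */
+};
+setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
+.EE
+.in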
+.TP
+.B SO_LOCK_FILTER
+.\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
+When set, this option will prevent
+changing the filters associated with the socket.
+These filters include any set using the socket options
+.BR SO_ATTACH_FILTER ,
+.BR SO_ATTACH_BPF ,
+.BR SO_ATTACH_REUSEPORT_CBPF ,
+and
+.BR SO_ATTACH_REUSEPORT_EBPF .
+.IP
+The typical use case is for a privileged process to set up a raw socket
+(an operation that requires the
+.B CAP_NET_RAW
+capability), apply a restrictive filter, set the
+.B SO_LOCK_FILTER
+option,
+and then either drop its privileges or pass the socket file descriptor
+to an unprivileged process via a UNIX domain socket.
+.IP
+Once the
+.B SO_LOCK_FILTER
+option has been enabled, attempts to change or remove the filter
+attached to a socket, or to disable the
+.B SO_LOCK_FILTER
+option will fail with the error
+.BR EPERM .
+.TP
+.BR SO_MARK " (since Linux 2.6.25)"
+.\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
+.\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
+Set the mark for each packet sent through this socket
+(similar to the netfilter MARK target but socket-based).
+Changing the mark can be used for mark-based
+routing without netfilter or for packet filtering.
+Setting this option requires the
+.B CAP_NET_ADMIN
+capability.
+.TP
+.B SO_OOBINLINE
+If this option is enabled,
+out-of-band data is directly placed into the receive data stream.
+Otherwise, out-of-band data is passed only when the
+.B MSG_OOB
+flag is set during receiving.
+.\" don't document it because it can do too much harm.
+.\".B SO_NO_CHECK
+.\" The kernel has support for the SO_NO_CHECK socket
+.\" option (boolean: 0 == default, calculate checksum on xmit,
+.\" 1 == do not calculate checksum on xmit).
+.\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
+.\" On Linux UDP checksums are essentially free and there's no reason
+.\" to turn them off and it would disable another safety line.
+.\" That is why I didn't document the option.
+.TP
+.B SO_PASSCRED
+Enable or disable the receiving of the
+.B SCM_CREDENTIALS
+control message.
+For more information, see
+.BR unix (7).
+.TP
+.B SO_PASSSEC
+Enable or disable the receiving of the
+.B SCM_SECURITY
+control message.
+For more information, see
+.BR unix (7).
+.TP
+.BR SO_PEEK_OFF " (since Linux 3.4)"
+.\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
+This option, which is currently supported only for
+.BR unix (7)
+sockets, sets the value of the "peek offset" for the
+.BR recv (2)
+system call when used with the
+.B MSG_PEEK
+flag.
+.IP
+When this option is set to a negative value
+(it is set to \-1 for all new sockets),
+traditional behavior is provided:
+.BR recv (2)
+with the
+.B MSG_PEEK
+flag will peek data from the front of the queue.
+.IP
+When the option is set to a value greater than or equal to zero,
+then the next peek at data queued in the socket will occur at
+the byte offset specified by the option value.
+At the same time, the "peek offset" will be
+incremented by the number of bytes that were peeked from the queue,
+so that a subsequent peek will return the next data in the queue.
+.IP
+If data is removed from the front of the queue via a call to
+.BR recv (2)
+(or similar) without the
+.B MSG_PEEK
+flag, the "peek offset" will be decreased by the number of bytes removed.
+In other words, receiving data without the
+.B MSG_PEEK
+flag will cause the "peek offset" to be adjusted to maintain
+the correct relative position in the queued data,
+so that a subsequent peek will retrieve the data that would have been
+retrieved had the data not been removed.
+.IP
+For datagram sockets, if the "peek offset" points to the middle of a packet,
+the data returned will be marked with the
+.B MSG_TRUNC
+flag.
+.IP
+The following example serves to illustrate the use of
+.BR SO_PEEK_OFF .
+Suppose a stream socket has the following queued input data:
+.IP
+.in +4n
+.EX
+aabbccddeeff
+.EE
+.in
+.IP
+The following sequence of
+.BR recv (2)
+calls would have the effect noted in the comments:
+.IP
+.in +4n
+.EX
+int ov = 4; // Set peek offset to 4
+setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));
+\&
+recv(fd, buf, 2, MSG_PEEK); // Peeks "cc"; offset set to 6
+recv(fd, buf, 2, MSG_PEEK); // Peeks "dd"; offset set to 8
+recv(fd, buf, 2, 0); // Reads "aa"; offset set to 6
+recv(fd, buf, 2, MSG_PEEK); // Peeks "ee"; offset set to 8
+.EE
+.in
+.TP
+.B SO_PEERCRED
+Return the credentials of the peer process connected to this socket.
+For further details, see
+.BR unix (7).
+.TP
+.BR SO_PEERSEC " (since Linux 2.6.2)"
+Return the security context of the peer socket connected to this socket.
+For further details, see
+.BR unix (7)
+and
+.BR ip (7).
+.TP
+.B SO_PRIORITY
+Set the protocol-defined priority for all packets to be sent on
+this socket.
+Linux uses this value to order the networking queues:
+packets with a higher priority may be processed first depending
+on the selected device queueing discipline.
+.\" For
+.\" .BR ip (7),
+.\" this also sets the IP type-of-service (TOS) field for outgoing packets.
+Setting a priority outside the range 0 to 6 requires the
+.B CAP_NET_ADMIN
+capability.
+.TP
+.BR SO_PROTOCOL " (since Linux 2.6.32)"
+Retrieves the socket protocol as an integer, returning a value such as
+.BR IPPROTO_SCTP .
+See
+.BR socket (2)
+for details.
+This socket option is read-only.
+.TP
+.B SO_RCVBUF
+Sets or gets the maximum socket receive buffer in bytes.
+The kernel doubles this value (to allow space for bookkeeping overhead)
+when it is set using
+.\" Most (all?) other implementations do not do this -- MTK, Dec 05
+.BR setsockopt (2),
+and this doubled value is returned by
+.BR getsockopt (2).
+.\" The following thread on LKML is quite informative:
+.\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
+.\" 17 July 2012
+.\" http://thread.gmane.org/gmane.linux.kernel/1328935
+The default value is set by the
+.I /proc/sys/net/core/rmem_default
+file, and the maximum allowed value is set by the
+.I /proc/sys/net/core/rmem_max
+file.
+The minimum (doubled) value for this option is 256.
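+.IP
+The doubling can be observed by reading the value back.
+In this sketch, the requested size of 65536 is arbitrary,
+.I fd
+is assumed to be an open socket,
+and the value reported depends on the
+.I rmem_max
+limit:
+.IP
+.in +4n
+.EX
+int        req = 65536, eff;
+socklen_t  len = sizeof(eff);
+\&
+setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
+getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &eff, &len);
+/* eff now holds twice the size that the kernel accepted */
+.EE
+.in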
+.TP
+.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
+Using this socket option, a privileged
+.RB ( CAP_NET_ADMIN )
+process can perform the same task as
+.BR SO_RCVBUF ,
+but the
+.I rmem_max
+limit can be overridden.
+.TP
+.BR SO_RCVLOWAT " and " SO_SNDLOWAT
+Specify the minimum number of bytes in the buffer until the socket layer
+will pass the data to the protocol
+.RB ( SO_SNDLOWAT )
+or the user on receiving
+.RB ( SO_RCVLOWAT ).
+These two values are initialized to 1.
+.B SO_SNDLOWAT
+is not changeable on Linux
+.RB ( setsockopt (2)
+fails with the error
+.BR ENOPROTOOPT ).
+.B SO_RCVLOWAT
+is changeable
+only since Linux 2.4.
+.IP
+Before Linux 2.6.28
+.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
+.BR select (2),
+.BR poll (2),
+and
+.BR epoll (7)
+did not respect the
+.B SO_RCVLOWAT
+setting on Linux,
+and indicated a socket as readable when even a single byte of data
+was available.
+A subsequent read from the socket would then block until
+.B SO_RCVLOWAT
+bytes are available.
+Since Linux 2.6.28,
+.\" commit c7004482e8dcb7c3c72666395cfa98a216a4fb70
+.BR select (2),
+.BR poll (2),
+and
+.BR epoll (7)
+indicate a socket as readable only if at least
+.B SO_RCVLOWAT
+bytes are available.
+.TP
+.BR SO_RCVTIMEO " and " SO_SNDTIMEO
+.\" Not implemented in Linux 2.0.
+.\" Implemented in Linux 2.1.11 for getsockopt: always return a zero struct.
+.\" Implemented in Linux 2.3.41 for setsockopt, and actually used.
+Specify the receiving or sending timeouts until reporting an error.
+The argument is a
+.IR "struct timeval" .
+If an input or output function blocks for this period of time, and
+data has been sent or received, the return value of that function
+will be the amount of data transferred; if no data has been transferred
+and the timeout has been reached, then \-1 is returned with
+.I errno
+set to
+.B EAGAIN
+or
+.BR EWOULDBLOCK ,
+.\" in fact to EAGAIN
+or
+.B EINPROGRESS
+(for
+.BR connect (2))
+just as if the socket was specified to be nonblocking.
+If the timeout is set to zero (the default),
+then the operation will never time out.
+Timeouts only have effect for system calls that perform socket I/O (e.g.,
+.BR accept (2),
+.BR connect (2),
+.BR read (2),
+.BR recvmsg (2),
+.BR send (2),
+.BR sendmsg (2));
+timeouts have no effect for
+.BR select (2),
+.BR poll (2),
+.BR epoll_wait (2),
+and so on.
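+.IP
+A sketch of a receive with a timeout follows.
+The 2.5-second value is arbitrary,
+.I fd
+is assumed to be an open socket, and
+.I buf
+is assumed to be a character array:
+.IP
+.in +4n
+.EX
+struct timeval  tv = { .tv_sec = 2, .tv_usec = 500000 };
+ssize_t         n;
+\&
+setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+\&
+n = recv(fd, buf, sizeof(buf), 0);
+if (n == \-1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+    /* No data arrived before the timeout expired */
+}
+.EE
+.in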
+.TP
+.B SO_REUSEADDR
+.\" commit c617f398edd4db2b8567a28e899a88f8f574798d
+.\" https://lwn.net/Articles/542629/
+Indicates that the rules used in validating addresses supplied in a
+.BR bind (2)
+call should allow reuse of local addresses.
+For
+.B AF_INET
+sockets this
+means that a socket may bind, except when there
+is an active listening socket bound to the address.
+When the listening socket is bound to
+.B INADDR_ANY
+with a specific port then it is not possible
+to bind to this port for any local address.
+Argument is an integer boolean flag.
+.TP
+.BR SO_REUSEPORT " (since Linux 3.9)"
+Permits multiple
+.B AF_INET
+or
+.B AF_INET6
+sockets to be bound to an identical socket address.
+This option must be set on each socket (including the first socket)
+prior to calling
+.BR bind (2)
+on the socket.
+To prevent port hijacking,
+all of the processes binding to the same address must have the same
+effective UID.
+This option can be employed with both TCP and UDP sockets.
+.IP
+For TCP sockets, this option allows
+.BR accept (2)
+load distribution in a multi-threaded server to be improved by
+using a distinct listener socket for each thread.
+This provides improved load distribution as compared
+to traditional techniques such as using a single
+.BR accept (2)ing
+thread that distributes connections,
+or having multiple threads that compete to
+.BR accept (2)
+from the same socket.
+.IP
+For UDP sockets,
+the use of this option can provide better distribution
+of incoming datagrams to multiple processes (or threads) as compared
+to the traditional technique of having multiple processes
+compete to receive datagrams on the same socket.
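+.IP
+The ordering requirement can be sketched as follows
+(port 8080 is an arbitrary example, and error checking is omitted);
+each thread or process would run the same sequence
+to obtain its own listening socket:
+.IP
+.in +4n
+.EX
+int                 one = 1;
+int                 fd = socket(AF_INET, SOCK_STREAM, 0);
+struct sockaddr_in  sa = {
+    .sin_family = AF_INET,
+    .sin_port   = htons(8080),
+    .sin_addr   = { .s_addr = htonl(INADDR_ANY) },
+};
+\&
+/* Must be set before bind(2), on every socket in the group */
+setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
+bind(fd, (struct sockaddr *) &sa, sizeof(sa));
+listen(fd, SOMAXCONN);
+.EE
+.in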
+.TP
+.BR SO_RXQ_OVFL " (since Linux 2.6.33)"
+.\" commit 3b885787ea4112eaa80945999ea0901bf742707f
+Indicates that an unsigned 32-bit value ancillary message (cmsg)
+should be attached to received skbs indicating
+the number of packets dropped by the socket since its creation.
+.TP
+.BR SO_SELECT_ERR_QUEUE " (since Linux 3.10)"
+.\" commit 7d4c04fc170087119727119074e72445f2bb192b
+.\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
+When this option is set on a socket,
+an error condition on a socket also causes notification via the
+.I exceptfds
+set of
+.BR select (2).
+Similarly,
+.BR poll (2)
+also returns a
+.B POLLPRI
+whenever a
+.B POLLERR
+event is returned.
+.\" It does not affect wake up.
+.IP
+Background: this option was added when waking up on an error condition
+occurred only via the
+.I readfds
+and
+.I writefds
+sets of
+.BR select (2).
+The option was added to allow monitoring for error conditions via the
+.I exceptfds
+argument without simultaneously having to receive notifications (via
+.IR readfds )
+for regular data that can be read from the socket.
+After changes in Linux 4.16,
+.\" commit 6e5d58fdc9bedd0255a8
+.\" ("skbuff: Fix not waking applications when errors are enqueued")
+the use of this flag to achieve the desired notifications
+is no longer necessary.
+This option is nevertheless retained for backwards compatibility.
+.TP
+.B SO_SNDBUF
+Sets or gets the maximum socket send buffer in bytes.
+The kernel doubles this value (to allow space for bookkeeping overhead)
+when it is set using
+.\" Most (all?) other implementations do not do this -- MTK, Dec 05
+.\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
+.BR setsockopt (2),
+and this doubled value is returned by
+.BR getsockopt (2).
+The default value is set by the
+.I /proc/sys/net/core/wmem_default
+file and the maximum allowed value is set by the
+.I /proc/sys/net/core/wmem_max
+file.
+The minimum (doubled) value for this option is 2048.
+.TP
+.BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
+Using this socket option, a privileged
+.RB ( CAP_NET_ADMIN )
+process can perform the same task as
+.BR SO_SNDBUF ,
+but the
+.I wmem_max
+limit can be overridden.
+.TP
+.B SO_TIMESTAMP
+Enable or disable the receiving of the
+.B SO_TIMESTAMP
+control message.
+The timestamp control message is sent with level
+.B SOL_SOCKET
+and a
+.I cmsg_type
+of
+.BR SCM_TIMESTAMP .
+The
+.I cmsg_data
+field is a
+.I "struct timeval"
+indicating the
+reception time of the last packet passed to the user in this call.
+See
+.BR cmsg (3)
+for details on control messages.
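+.IP
+The following sketch shows one way to enable the option
+and extract the timestamp from a received datagram.
+It assumes that
+.I fd
+is an open datagram socket;
+the buffer size is arbitrary and error checking is omitted:
+.IP
+.in +4n
+.EX
+int             on = 1;
+char            data[2048];
+struct timeval  tv;
+union {         /* Ensures suitable alignment, as in cmsg(3) */
+    char            buf[CMSG_SPACE(sizeof(struct timeval))];
+    struct cmsghdr  align;
+} ctrl;
+struct iovec    iov = { .iov_base = data, .iov_len = sizeof(data) };
+struct msghdr   msg = {
+    .msg_iov        = &iov,
+    .msg_iovlen     = 1,
+    .msg_control    = ctrl.buf,
+    .msg_controllen = sizeof(ctrl.buf),
+};
+\&
+setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));
+recvmsg(fd, &msg, 0);
+\&
+for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
+     c = CMSG_NXTHDR(&msg, c)) {
+    if (c\->cmsg_level == SOL_SOCKET
+        && c\->cmsg_type == SCM_TIMESTAMP)
+        memcpy(&tv, CMSG_DATA(c), sizeof(tv));
+}
+.EE
+.in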
+.TP
+.BR SO_TIMESTAMPNS " (since Linux 2.6.22)"
+.\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
+Enable or disable the receiving of the
+.B SO_TIMESTAMPNS
+control message.
+The timestamp control message is sent with level
+.B SOL_SOCKET
+and a
+.I cmsg_type
+of
+.BR SCM_TIMESTAMPNS .
+The
+.I cmsg_data
+field is a
+.I "struct timespec"
+indicating the
+reception time of the last packet passed to the user in this call.
+The clock used for the timestamp is
+.BR CLOCK_REALTIME .
+See
+.BR cmsg (3)
+for details on control messages.
+.IP
+A socket cannot mix
+.B SO_TIMESTAMP
+and
+.BR SO_TIMESTAMPNS :
+the two modes are mutually exclusive.
+.TP
+.B SO_TYPE
+Gets the socket type as an integer (e.g.,
+.BR SOCK_STREAM ).
+This socket option is read-only.
+.TP
+.BR SO_BUSY_POLL " (since Linux 3.11)"
+Sets the approximate time in microseconds to busy poll on a blocking receive
+when there is no data.
+Increasing this value requires
+.BR CAP_NET_ADMIN .
+The default for this option is controlled by the
+.I /proc/sys/net/core/busy_read
+file.
+.IP
+The value in the
+.I /proc/sys/net/core/busy_poll
+file determines how long
+.BR select (2)
+and
+.BR poll (2)
+will busy poll when they operate on sockets with
+.B SO_BUSY_POLL
+set and no events to report are found.
+.IP
+In both cases,
+busy polling will only be done when the socket last received data
+from a network device that supports this option.
+.IP
+While busy polling may improve latency of some applications,
+care must be taken when using it since this will increase
+both CPU utilization and power usage.
+.SS Signals
+When writing onto a connection-oriented socket that has been shut down
+(by the local or the remote end)
+.B SIGPIPE
+is sent to the writing process and
+.B EPIPE
+is returned.
+The signal is not sent when the write call
+specified the
+.B MSG_NOSIGNAL
+flag.
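+.PP
+For example, a program that prefers an error return to a signal
+might write as follows (a sketch;
+.IR fd ,
+.IR buf ,
+and
+.I len
+are assumed to be supplied by the caller):
+.PP
+.in +4n
+.EX
+if (send(fd, buf, len, MSG_NOSIGNAL) == \-1 && errno == EPIPE) {
+    /* The connection has been shut down; no SIGPIPE was sent */
+}
+.EE
+.in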
+.PP
+When requested with the
+.B FIOSETOWN
+.BR fcntl (2)
+or
+.B SIOCSPGRP
+.BR ioctl (2),
+.B SIGIO
+is sent when an I/O event occurs.
+It is possible to use
+.BR poll (2)
+or
+.BR select (2)
+in the signal handler to find out which socket the event occurred on.
+An alternative (since Linux 2.2) is to set a real-time signal using the
+.B F_SETSIG
+.BR fcntl (2);
+the handler of the real-time signal will be called with
+the file descriptor in the
+.I si_fd
+field of its
+.IR siginfo_t .
+See
+.BR fcntl (2)
+for more information.
+.PP
+Under some circumstances (e.g., multiple processes accessing a
+single socket), the condition that caused the
+.B SIGIO
+may have already disappeared when the process reacts to the signal.
+If this happens, the process should wait again because Linux
+will resend the signal later.
+.\" .SS Ancillary messages
+.SS /proc interfaces
+The core socket networking parameters can be accessed
+via files in the directory
+.IR /proc/sys/net/core/ .
+.TP
+.I rmem_default
+contains the default setting in bytes of the socket receive buffer.
+.TP
+.I rmem_max
+contains the maximum socket receive buffer size in bytes which a user may
+set by using the
+.B SO_RCVBUF
+socket option.
+.TP
+.I wmem_default
+contains the default setting in bytes of the socket send buffer.
+.TP
+.I wmem_max
+contains the maximum socket send buffer size in bytes which a user may
+set by using the
+.B SO_SNDBUF
+socket option.
+.TP
+.IR message_cost " and " message_burst
+configure the token bucket filter used to rate-limit warning messages
+caused by external network events.
+.TP
+.I netdev_max_backlog
+Maximum number of packets in the global input queue.
+.TP
+.I optmem_max
+Maximum length of ancillary data and user control data like the iovecs
+per socket.
+.\" netdev_fastroute is not documented because it is experimental
+.SS Ioctls
+These operations can be accessed using
+.BR ioctl (2):
+.PP
+.in +4n
+.EX
+.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
+.EE
+.in
+.TP
+.B SIOCGSTAMP
+Return a
+.I struct timeval
+with the receive timestamp of the last packet passed to the user.
+This is useful for accurate round trip time measurements.
+See
+.BR setitimer (2)
+for a description of
+.IR "struct timeval" .
+.\"
+This ioctl should be used only if the socket options
+.B SO_TIMESTAMP
+and
+.B SO_TIMESTAMPNS
+are not set on the socket.
+Otherwise, it returns the timestamp of the
+last packet that was received while
+.B SO_TIMESTAMP
+and
+.B SO_TIMESTAMPNS
+were not set, or it fails if no such packet has been received
+(i.e.,
+.BR ioctl (2)
+returns \-1 with
+.I errno
+set to
+.BR ENOENT ).
+.TP
+.B SIOCSPGRP
+Set the process or process group that is to receive
+.B SIGIO
+or
+.B SIGURG
+signals when I/O becomes possible or urgent data is available.
+The argument is a pointer to a
+.IR pid_t .
+For further details, see the description of
+.B F_SETOWN
+in
+.BR fcntl (2).
+.TP
+.B FIOASYNC
+Change the
+.B O_ASYNC
+flag to enable or disable asynchronous I/O mode of the socket.
+Asynchronous I/O mode means that the
+.B SIGIO
+signal or the signal set with
+.B F_SETSIG
+is raised when a new I/O event occurs.
+.IP
+The argument is a pointer to an integer used as a boolean flag.
+(This operation is synonymous with the use of
+.BR fcntl (2)
+to set the
+.B O_ASYNC
+flag.)
+.\"
+.TP
+.B SIOCGPGRP
+Get the current process or process group that receives
+.B SIGIO
+or
+.B SIGURG
+signals,
+or 0
+when none is set.
+.PP
+Valid
+.BR fcntl (2)
+operations:
+.TP
+.B FIOGETOWN
+The same as the
+.B SIOCGPGRP
+.BR ioctl (2).
+.TP
+.B FIOSETOWN
+The same as the
+.B SIOCSPGRP
+.BR ioctl (2).
+.SH VERSIONS
+.B SO_BINDTODEVICE
+was introduced in Linux 2.0.30.
+.B SO_PASSCRED
+is new in Linux 2.2.
+The
+.I /proc
+interfaces were introduced in Linux 2.2.
+.B SO_RCVTIMEO
+and
+.B SO_SNDTIMEO
+are supported since Linux 2.3.41.
+Earlier, timeouts were fixed to
+a protocol-specific setting, and could not be read or written.
+.SH NOTES
+Linux assumes that half of the send/receive buffer is used for internal
+kernel structures; thus the values in the corresponding
+.I /proc
+files are twice what can be observed on the wire.
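+.PP
+The bookkeeping overhead can be observed from user space.
+The following minimal sketch requests a receive buffer size with
+.B SO_RCVBUF
+and returns the size the kernel reports afterward;
+on Linux the reported value is normally twice the request,
+provided the request lies within the system limits.
+The helper name
+.I rcvbuf_after_request
+is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` on fd and
   return the size the kernel reports afterward, or -1 on error. */
int
rcvbuf_after_request(int fd, int bytes)
{
    int reported;
    socklen_t len = sizeof(reported);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &reported, &len) == -1)
        return -1;
    return reported;
}
```

+.PP
+The exact result depends on
+.I rmem_max
+and on the kernel's minimum buffer size,
+so portable code should not rely on a particular ratio.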
+.PP
+With the
+.B SO_REUSEADDR
+option, Linux allows port reuse only
+when this option was set both in the previous program that performed a
+.BR bind (2)
+to the port and in the program that wants to reuse the port.
+This differs from some implementations (e.g., FreeBSD)
+where only the later program needs to set the
+.B SO_REUSEADDR
+option.
+Typically this difference is invisible, since, for example, a server
+program is designed to always set this option.
+.\" .SH AUTHORS
+.\" This man page was written by Andi Kleen.
+.SH SEE ALSO
+.BR wireshark (1),
+.BR bpf (2),
+.BR connect (2),
+.BR getsockopt (2),
+.BR setsockopt (2),
+.BR socket (2),
+.BR pcap (3),
+.BR address_families (7),
+.BR capabilities (7),
+.BR ddp (7),
+.BR ip (7),
+.BR ipv6 (7),
+.BR packet (7),
+.BR tcp (7),
+.BR udp (7),
+.BR unix (7),
+.BR tcpdump (8)