path: root/upstream/opensuse-tumbleweed/man7/tcp.7
Diffstat (limited to 'upstream/opensuse-tumbleweed/man7/tcp.7')
-rw-r--r-- upstream/opensuse-tumbleweed/man7/tcp.7 1585
1 files changed, 1585 insertions, 0 deletions
diff --git a/upstream/opensuse-tumbleweed/man7/tcp.7 b/upstream/opensuse-tumbleweed/man7/tcp.7
new file mode 100644
index 00000000..6cc9bbab
--- /dev/null
+++ b/upstream/opensuse-tumbleweed/man7/tcp.7
@@ -0,0 +1,1585 @@
+.\" SPDX-License-Identifier: Linux-man-pages-1-para
+.\"
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
+.\" Note also that many pieces are drawn from the kernel source file
+.\" Documentation/networking/ip-sysctl.txt.
+.\"
+.\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>.
+.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer
+.\" Updated details of interaction of TCP_CORK and TCP_NODELAY.
+.\"
+.\" 2008-11-21, mtk, many, many updates.
+.\" The descriptions of /proc files and socket options should now
+.\" be more or less up to date and complete as at Linux 2.6.27
+.\" (other than the remaining FIXMEs in the page source below).
+.\"
+.\" FIXME The following need to be documented
+.\" TCP_MD5SIG (2.6.20)
+.\" commit cfb6eeb4c860592edd123fdea908d23c6ad1c7dc
+.\" Author was yoshfuji@linux-ipv6.org
+.\" Needs CONFIG_TCP_MD5SIG
+.\" From net/inet/Kconfig:
+.\" bool "TCP: MD5 Signature Option support (RFC2385) (EXPERIMENTAL)"
+.\" RFC2385 specifies a method of giving MD5 protection to TCP sessions.
+.\" Its main (only?) use is to protect BGP sessions between core routers
+.\" on the Internet.
+.\"
+.\" There is a TCP_MD5SIG option documented in FreeBSD's tcp(4),
+.\" but probably many details are different on Linux
+.\" http://thread.gmane.org/gmane.linux.network/47490
+.\" http://www.daemon-systems.org/man/tcp.4.html
+.\" http://article.gmane.org/gmane.os.netbsd.devel.network/3767/match=tcp_md5sig+freebsd
+.\"
+.\" TCP_COOKIE_TRANSACTIONS (2.6.33)
+.\" commit 519855c508b9a17878c0977a3cdefc09b59b30df
+.\" Author: William Allen Simpson <william.allen.simpson@gmail.com>
+.\" commit e56fb50f2b7958b931c8a2fc0966061b3f3c8f3a
+.\" Author: William Allen Simpson <william.allen.simpson@gmail.com>
+.\"
+.\" REMOVED in Linux 3.10
+.\" commit 1a2c6181c4a1922021b4d7df373bba612c3e5f04
+.\" Author: Christoph Paasch <christoph.paasch@uclouvain.be>
+.\"
+.\" TCP_THIN_LINEAR_TIMEOUTS (2.6.34)
+.\" commit 36e31b0af58728071e8023cf8e20c5166b700717
+.\" Author: Andreas Petlund <apetlund@simula.no>
+.\"
+.\" TCP_THIN_DUPACK (2.6.34)
+.\" commit 7e38017557bc0b87434d184f8804cadb102bb903
+.\" Author: Andreas Petlund <apetlund@simula.no>
+.\"
+.\" TCP_REPAIR (3.5)
+.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37
+.\" Author: Pavel Emelyanov <xemul@parallels.com>
+.\" See also
+.\" http://criu.org/TCP_connection
+.\" https://lwn.net/Articles/495304/
+.\"
+.\" TCP_REPAIR_QUEUE (3.5)
+.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37
+.\" Author: Pavel Emelyanov <xemul@parallels.com>
+.\"
+.\" TCP_QUEUE_SEQ (3.5)
+.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37
+.\" Author: Pavel Emelyanov <xemul@parallels.com>
+.\"
+.\" TCP_REPAIR_OPTIONS (3.5)
+.\" commit b139ba4e90dccbf4cd4efb112af96a5c9e0b098c
+.\" Author: Pavel Emelyanov <xemul@parallels.com>
+.\"
+.\" TCP_FASTOPEN (3.6)
+.\" (Fast Open server side implementation completed in Linux 3.7)
+.\" http://lwn.net/Articles/508865/
+.\"
+.\" TCP_TIMESTAMP (3.9)
+.\" commit 93be6ce0e91b6a94783e012b1857a347a5e6e9f2
+.\" Author: Andrey Vagin <avagin@openvz.org>
+.\"
+.\" TCP_NOTSENT_LOWAT (3.12)
+.\" commit c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
+.\" Author: Eric Dumazet <edumazet@google.com>
+.\"
+.\" TCP_CC_INFO (4.1)
+.\" commit 6e9250f59ef9efb932c84850cd221f22c2a03c4a
+.\" Author: Eric Dumazet <edumazet@google.com>
+.\"
+.\" TCP_SAVE_SYN, TCP_SAVED_SYN (4.2)
+.\" commit cd8ae85299d54155702a56811b2e035e63064d3d
+.\" Author: Eric Dumazet <edumazet@google.com>
+.\"
+.TH tcp 7 2023-07-15 "Linux man-pages 6.05.01"
+.SH NAME
+tcp \- TCP protocol
+.SH SYNOPSIS
+.nf
+.B #include <sys/socket.h>
+.B #include <netinet/in.h>
+.B #include <netinet/tcp.h>
+.PP
+.IB tcp_socket " = socket(AF_INET, SOCK_STREAM, 0);"
+.fi
+.SH DESCRIPTION
+This is an implementation of the TCP protocol defined in
+RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK
+extensions.
+It provides a reliable, stream-oriented,
+full-duplex connection between two sockets on top of
+.BR ip (7),
+for both v4 and v6 versions.
+TCP guarantees that the data arrives in order and
+retransmits lost packets.
+It generates and checks a per-packet checksum to catch
+transmission errors.
+TCP does not preserve record boundaries.
+.PP
+A newly created TCP socket has no remote or local address and is not
+fully specified.
+To create an outgoing TCP connection use
+.BR connect (2)
+to establish a connection to another TCP socket.
+To receive new incoming connections, first
+.BR bind (2)
+the socket to a local address and port and then call
+.BR listen (2)
+to put the socket into the listening state.
+After that a new socket for each incoming connection can be accepted using
+.BR accept (2).
+A socket which has had
+.BR accept (2)
+or
+.BR connect (2)
+successfully called on it is fully specified and may transmit data.
+Data cannot be transmitted on listening or not yet connected sockets.
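The call sequence described above can be sketched in a few lines. This is an illustrative sketch rather than part of the original page: it uses Python's `socket` module, which wraps the same system calls, and the helper name `loopback_echo` is hypothetical. It assumes the loopback interface is available.

```python
import socket


def loopback_echo(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection; return what the accepted side read."""
    # Passive side: bind(2) to a local address, then listen(2).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the kernel pick a free port
    listener.listen(1)
    addr = listener.getsockname()

    # Active side: connect(2) to the listening socket.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(addr)

    # accept(2) returns a new, fully specified socket for this connection.
    conn, _peer = listener.accept()

    client.sendall(payload)
    client.shutdown(socket.SHUT_WR)  # no more data from the client

    # TCP is a byte stream: read until end-of-stream instead of assuming
    # that one send() corresponds to one recv().
    received = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        received += chunk

    for s in (conn, client, listener):
        s.close()
    return received
```

The read loop reflects the earlier statement that TCP does not preserve record boundaries: the receiver may see the payload in one piece or in several.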
+.PP
+Linux supports RFC\ 1323 TCP high performance
+extensions.
+These include Protection Against Wrapped
+Sequence Numbers (PAWS), Window Scaling and Timestamps.
+Window scaling allows the use
+of large (> 64\ kB) TCP windows in order to support links with high
+latency or bandwidth.
+To make use of them, the send and receive buffer sizes must be increased.
+They can be set globally with the
+.I /proc/sys/net/ipv4/tcp_wmem
+and
+.I /proc/sys/net/ipv4/tcp_rmem
+files, or on individual sockets by using the
+.B SO_SNDBUF
+and
+.B SO_RCVBUF
+socket options with the
+.BR setsockopt (2)
+call.
+.PP
+The maximum sizes for socket buffers declared via the
+.B SO_SNDBUF
+and
+.B SO_RCVBUF
+mechanisms are limited by the values in the
+.I /proc/sys/net/core/rmem_max
+and
+.I /proc/sys/net/core/wmem_max
+files.
+Note that TCP actually allocates twice the size of
+the buffer requested in the
+.BR setsockopt (2)
+call, and so a succeeding
+.BR getsockopt (2)
+call will not return the same size of buffer as requested in the
+.BR setsockopt (2)
+call.
+TCP uses the extra space for administrative purposes and internal
+kernel structures, and the
+.I /proc
+file values reflect the
+larger sizes compared to the actual TCP windows.
+On individual connections, the socket buffer size must be set prior to the
+.BR listen (2)
+or
+.BR connect (2)
+calls in order to have it take effect.
+See
+.BR socket (7)
+for more information.
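One way to observe the difference between the requested and the reported buffer size is to set `SO_RCVBUF` and read it back. The following is a sketch only, with a hypothetical helper name; the value returned depends on the operating system and on the `rmem_max` limit, so no particular number should be relied upon.

```python
import socket


def request_rcvbuf(requested: int) -> int:
    """Ask for a receive buffer of `requested` bytes; return what the kernel reports."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Set the size before listen() or connect(), as the text requires.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        s.close()


if __name__ == "__main__":
    asked = 16384
    print(f"requested {asked} bytes, kernel reports {request_rcvbuf(asked)} bytes")
```

On a Linux system the reported value is expected to be about twice the request, provided the request is within the configured limits.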
+.PP
+TCP supports urgent data.
+Urgent data is used to signal the
+receiver that some important message is part of the data
+stream and that it should be processed as soon as possible.
+To send urgent data specify the
+.B MSG_OOB
+option to
+.BR send (2).
+When urgent data is received, the kernel sends a
+.B SIGURG
+signal to the process or process group that has been set as the
+socket "owner" using the
+.B SIOCSPGRP
+or
+.B FIOSETOWN
+ioctls (or the POSIX.1-specified
+.BR fcntl (2)
+.B F_SETOWN
+operation).
+When the
+.B SO_OOBINLINE
+socket option is enabled, urgent data is put into the normal
+data stream (a program can test for its location using the
+.B SIOCATMARK
+ioctl described below),
+otherwise it can be received only when the
+.B MSG_OOB
+flag is set for
+.BR recv (2)
+or
+.BR recvmsg (2).
+.PP
+When out-of-band data is present,
+.BR select (2)
+indicates the file descriptor as having an exceptional condition and
+.BR poll (2)
+indicates a
+.B POLLPRI
+event.
+.PP
+Linux 2.4 introduced a number of changes for improved
+throughput and scaling, as well as enhanced functionality.
+Some of these features include support for zero-copy
+.BR sendfile (2),
+Explicit Congestion Notification, new
+management of TIME_WAIT sockets, keep-alive socket options
+and support for Duplicate SACK extensions.
+.SS Address formats
+TCP is built on top of IP (see
+.BR ip (7)).
+The address formats defined by
+.BR ip (7)
+apply to TCP.
+TCP supports point-to-point communication only;
+broadcasting and multicasting are not
+supported.
+.SS /proc interfaces
+System-wide TCP parameter settings can be accessed by files in the directory
+.IR /proc/sys/net/ipv4/ .
+In addition, most IP
+.I /proc
+interfaces also apply to TCP; see
+.BR ip (7).
+Variables described as
+.I Boolean
+take an integer value, with a nonzero value ("true") meaning that
+the corresponding option is enabled, and a zero value ("false")
+meaning that the option is disabled.
+.TP
+.IR tcp_abc " (Integer; default: 0; Linux 2.6.15 to Linux 3.8)"
+.\" Since Linux 2.6.15; removed in Linux 3.9
+.\" commit ca2eb5679f8ddffff60156af42595df44a315ef0
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+Control the Appropriate Byte Count (ABC), defined in RFC 3465.
+ABC is a way of increasing the congestion window
+.RI ( cwnd )
+more slowly in response to partial acknowledgements.
+Possible values are:
+.RS
+.TP
+.B 0
+increase
+.I cwnd
+once per acknowledgement (no ABC)
+.TP
+.B 1
+increase
+.I cwnd
+once per acknowledgement of full sized segment
+.TP
+.B 2
+allow
+.I cwnd
+to increase by two if the acknowledgement covers
+two segments, to compensate for delayed acknowledgements.
+.RE
+.TP
+.IR tcp_abort_on_overflow " (Boolean; default: disabled; since Linux 2.4)"
+.\" Since Linux 2.3.41
+Enable resetting connections if the listening service is too
+slow and unable to keep up and accept them.
+When this option is disabled (the default), a connection can still
+recover if the overflow was caused by a burst.
+Enable this option
+.I only
+if you are really sure that the listening daemon
+cannot be tuned to accept connections faster.
+Enabling this option can harm the clients of your server.
+.TP
+.IR tcp_adv_win_scale " (integer; default: 2; since Linux 2.4)"
+.\" Since Linux 2.4.0-test7
+Count buffering overhead as
+.IR "bytes/2\[ha]tcp_adv_win_scale" ,
+if
+.I tcp_adv_win_scale
+is greater than 0; or
+.IR "bytes\-bytes/2\[ha](\-tcp_adv_win_scale)" ,
+if
+.I tcp_adv_win_scale
+is less than or equal to zero.
+.IP
+The socket receive buffer space is shared between the
+application and kernel.
+TCP maintains part of the buffer as
+the TCP window; this is the size of the receive window
+advertised to the other end.
+The rest of the space is used
+as the "application" buffer, used to isolate the network
+from scheduling and application latencies.
+The
+.I tcp_adv_win_scale
+default value of 2 implies that the space
+used for the application buffer is one fourth that of the total.
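The overhead formula can be checked with a little arithmetic. The function below is a sketch that models only the formula as stated here, not the kernel's implementation; the name `buffering_overhead` is hypothetical and integer division is an assumption.

```python
def buffering_overhead(nbytes: int, tcp_adv_win_scale: int) -> int:
    """Bytes counted as buffering overhead for a receive buffer of nbytes."""
    if tcp_adv_win_scale > 0:
        # bytes / 2^tcp_adv_win_scale
        return nbytes >> tcp_adv_win_scale
    # bytes - bytes / 2^(-tcp_adv_win_scale)
    return nbytes - (nbytes >> -tcp_adv_win_scale)


if __name__ == "__main__":
    total = 87380
    overhead = buffering_overhead(total, 2)
    print(f"overhead {overhead} of {total} bytes, window {total - overhead}")
```

With the default value of 2, one fourth of the buffer is counted as overhead, which matches the "one fourth" figure given above.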
+.TP
+.IR tcp_allowed_congestion_control " (String; default: see text; since Linux 2.4.20)"
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+Show/set the congestion control algorithm choices available to unprivileged
+processes (see the description of the
+.B TCP_CONGESTION
+socket option).
+The items in the list are separated by white space and
+terminated by a newline character.
+The list is a subset of those listed in
+.IR tcp_available_congestion_control .
+The default value for this list is "reno" plus the default setting of
+.IR tcp_congestion_control .
+.TP
+.IR tcp_autocorking " (Boolean; default: enabled; since Linux 3.14)"
+.\" commit f54b311142a92ea2e42598e347b84e1655caf8e3
+.\" Text heavily based on Documentation/networking/ip-sysctl.txt
+If this option is enabled, the kernel tries to coalesce small writes
+(from consecutive
+.BR write (2)
+and
+.BR sendmsg (2)
+calls) as much as possible,
+in order to decrease the total number of sent packets.
+Coalescing is done if at least one prior packet for the flow
+is waiting in Qdisc queues or device transmit queue.
+Applications can still use the
+.B TCP_CORK
+socket option to obtain optimal behavior
+when they know how/when to uncork their sockets.
+.TP
+.IR tcp_available_congestion_control " (String; read-only; since Linux 2.4.20)"
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+Show a list of the congestion-control algorithms
+that are registered.
+The items in the list are separated by white space and
+terminated by a newline character.
+This list is a limiting set for the list in
+.IR tcp_allowed_congestion_control .
+More congestion-control algorithms may be available as modules,
+but not loaded.
+.TP
+.IR tcp_app_win " (integer; default: 31; since Linux 2.4)"
+.\" Since Linux 2.4.0-test7
+This variable defines how many
+bytes of the TCP window are reserved for buffering overhead.
+.IP
+Specifically, max(\fIwindow/2\[ha]tcp_app_win\fP, mss) bytes in the window
+are reserved for the application buffer.
+A value of 0 implies that no amount is reserved.
+.\"
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_base_mss " (Integer; default: 512; since Linux 2.6.17)"
+The initial value of
+.I search_low
+to be used by the packetization layer Path MTU discovery (MTU probing).
+If MTU probing is enabled,
+this is the initial MSS used by the connection.
+.\"
+.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_bic " (Boolean; default: disabled; Linux 2.4.27/2.6.6 to Linux 2.6.13)"
+Enable BIC TCP congestion control algorithm.
+BIC-TCP is a sender-side-only change that ensures a linear RTT
+fairness under large windows while offering both scalability and
+bounded TCP-friendliness.
+The protocol combines two schemes
+called additive increase and binary search increase.
+When the congestion window is large, additive increase with a large
+increment ensures linear RTT fairness as well as good scalability.
+Under small congestion windows, binary search
+increase provides TCP friendliness.
+.\"
+.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_bic_low_window " (integer; default: 14; Linux 2.4.27/2.6.6 to Linux 2.6.13)"
+Set the threshold window (in packets) where BIC TCP starts to
+adjust the congestion window.
+Below this threshold BIC TCP behaves the same as the default TCP Reno.
+.\"
+.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_bic_fast_convergence " (Boolean; default: enabled; Linux 2.4.27/2.6.6 to Linux 2.6.13)"
+Force BIC TCP to more quickly respond to changes in congestion window.
+Allows two flows sharing the same connection to converge more rapidly.
+.TP
+.IR tcp_congestion_control " (String; default: see text; since Linux 2.4.13)"
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+Set the default congestion-control algorithm to be used for new connections.
+The algorithm "reno" is always available,
+but additional choices may be available depending on kernel configuration.
+The default value for this file is set as part of kernel configuration.
+.TP
+.IR tcp_dma_copybreak " (integer; default: 4096; since Linux 2.6.24)"
+Lower limit, in bytes, of the size of socket reads that will be
+offloaded to a DMA copy engine, if one is present in the system
+and the kernel was configured with the
+.B CONFIG_NET_DMA
+option.
+.TP
+.IR tcp_dsack " (Boolean; default: enabled; since Linux 2.4)"
+.\" Since Linux 2.4.0-test7
+Enable RFC\ 2883 TCP Duplicate SACK support.
+.TP
+.IR tcp_fastopen " (Bitmask; default: 0x1; since Linux 3.7)"
+Enables RFC\~7413 Fast Open support.
+The flag is used as a bitmap with the following values:
+.RS
+.TP
+.B 0x1
+Enables client side Fast Open support
+.TP
+.B 0x2
+Enables server side Fast Open support
+.TP
+.B 0x4
+Allows client side to transmit data in SYN without Fast Open option
+.TP
+.B 0x200
+Allows server side to accept SYN data without Fast Open option
+.TP
+.B 0x400
+Enables Fast Open on all listeners without
+.B TCP_FASTOPEN
+socket option
+.RE
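Because the value is a bitmask, several of the flags above can be combined. The sketch below decodes a value into the descriptions listed; the table and the function name `decode_tcp_fastopen` are illustrative and cover only the bits documented here.

```python
# Bits as listed in the description of tcp_fastopen above.
FASTOPEN_FLAGS = {
    0x1: "client side Fast Open",
    0x2: "server side Fast Open",
    0x4: "client may send data in SYN without Fast Open option",
    0x200: "server accepts SYN data without Fast Open option",
    0x400: "Fast Open on all listeners without TCP_FASTOPEN",
}


def decode_tcp_fastopen(value: int) -> list[str]:
    """Return the descriptions of all documented bits set in value."""
    return [desc for bit, desc in FASTOPEN_FLAGS.items() if value & bit]


if __name__ == "__main__":
    for flag in decode_tcp_fastopen(0x3):
        print(flag)
```

For example, a value of 0x3 enables both the client and the server side, while the default of 0x1 enables the client side only.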
+.TP
+.IR tcp_fastopen_key " (since Linux 3.7)"
+Set server side RFC\~7413 Fast Open key to generate Fast Open cookie
+when server side Fast Open support is enabled.
+.TP
+.IR tcp_ecn " (Integer; default: see below; since Linux 2.4)"
+.\" Since Linux 2.4.0-test7
+Enable RFC\ 3168 Explicit Congestion Notification.
+.IP
+This file can have one of the following values:
+.RS
+.TP
+.B 0
+Disable ECN.
+Neither initiate nor accept ECN.
+This was the default up to and including Linux 2.6.30.
+.TP
+.B 1
+Enable ECN when requested by incoming connections and also
+request ECN on outgoing connection attempts.
+.TP
+.B 2
+.\" commit 255cac91c3c9ce7dca7713b93ab03c75b7902e0e
+Enable ECN when requested by incoming connections,
+but do not request ECN on outgoing connections.
+This value is supported, and is the default, since Linux 2.6.31.
+.RE
+.IP
+When enabled, connectivity to some destinations could be affected
+due to older, misbehaving middle boxes along the path, causing
+connections to be dropped.
+However, to facilitate and encourage deployment with option 1, and
+to work around such buggy equipment, the
+.B tcp_ecn_fallback
+option has been introduced.
+.TP
+.IR tcp_ecn_fallback " (Boolean; default: enabled; since Linux 4.1)"
+.\" commit 492135557dc090a1abb2cfbe1a412757e3ed68ab
+Enable RFC\ 3168, Section 6.1.1.1. fallback.
+When enabled, outgoing ECN-setup SYNs that time out within the
+normal SYN retransmission timeout will be resent with CWR and
+ECE cleared.
+.TP
+.IR tcp_fack " (integer; default: see below; since Linux 2.2)"
+.\" Since Linux 2.1.92
+Enable TCP Forward Acknowledgement support.
+
+Prior to Linux 4.11, this option was enabled by default.
+In Linux 4.11, it was disabled by default in favor of RACK (see
+.IR tcp_recovery ).
+In Linux 4.15 it was deprecated entirely and its value is ignored.
+.TP
+.IR tcp_recovery " (integer; default: 0x1; since Linux 4.4)"
+Enable various experimental loss recovery features.
+
+This field is a bitmap to enable various loss recovery features.
+.RS
+.IP 0x1
+enables the RACK loss detection for fast detection of lost
+retransmissions and tail drops. It also subsumes and disables
+RFC\ 6675 recovery for SACK connections. (Since Linux 4.4)
+.IP 0x2
+makes RACK's reordering window static (min_rtt/4). (Since
+Linux 4.15)
+.IP 0x4
+disables RACK's DUPACK threshold heuristic (Since Linux
+4.18).
+.RE
+.TP
+.IR tcp_fin_timeout " (integer; default: 60; since Linux 2.2)"
+.\" Since Linux 2.1.53
+This specifies how many seconds to wait for a final FIN packet before the
+socket is forcibly closed.
+This is strictly a violation of the TCP specification,
+but required to prevent denial-of-service attacks.
+In Linux 2.2, the default value was 180.
+.\"
+.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_frto " (integer; default: see below; since Linux 2.4.21/2.6)"
+.\" Since Linux 2.4.21/2.5.43
+Enable F-RTO, an enhanced recovery algorithm for TCP retransmission
+timeouts (RTOs).
+It is particularly beneficial in wireless environments
+where packet loss is typically due to random radio interference
+rather than intermediate router congestion.
+See RFC 4138 for more details.
+.IP
+This file can have one of the following values:
+.RS
+.TP
+.B 0
+Disabled.
+This was the default up to and including Linux 2.6.23.
+.TP
+.B 1
+The basic version of the F-RTO algorithm is enabled.
+.TP
+.B 2
+.\" commit c96fd3d461fa495400df24be3b3b66f0e0b152f9
+Enable SACK-enhanced F-RTO if flow uses SACK.
+The basic version can also be used when
+SACK is in use, though in that case scenarios exist where F-RTO
+interacts badly with the packet counting of the SACK-enabled TCP flow.
+This value is the default since Linux 2.6.24.
+.RE
+.IP
+Before Linux 2.6.22, this parameter was a Boolean value,
+supporting just values 0 and 1 above.
+.TP
+.IR tcp_frto_response " (integer; default: 0; since Linux 2.6.22)"
+When F-RTO has detected that a TCP retransmission timeout was spurious
+(i.e., the timeout would have been avoided had TCP set a
+longer retransmission timeout),
+TCP has several options concerning what to do next.
+Possible values are:
+.RS
+.TP
+.B 0
+Rate halving based; a smooth and conservative response,
+results in halved congestion window
+.RI ( cwnd )
+and slow-start threshold
+.RI ( ssthresh )
+after one RTT.
+.TP
+.B 1
+Very conservative response; not recommended because, although
+valid, it interacts poorly with the rest of Linux TCP; halves
+.I cwnd
+and
+.I ssthresh
+immediately.
+.TP
+.B 2
+Aggressive response; undoes congestion-control measures
+that are now known to be unnecessary
+(ignoring the possibility of a lost retransmission that would require
+TCP to be more cautious);
+.I cwnd
+and
+.I ssthresh
+are restored to the values prior to timeout.
+.RE
+.TP
+.IR tcp_keepalive_intvl " (integer; default: 75; since Linux 2.4)"
+.\" Since Linux 2.3.18
+The number of seconds between TCP keep-alive probes.
+.TP
+.IR tcp_keepalive_probes " (integer; default: 9; since Linux 2.2)"
+.\" Since Linux 2.1.43
+The maximum number of TCP keep-alive probes to send
+before giving up and killing the connection if
+no response is obtained from the other end.
+.TP
+.IR tcp_keepalive_time " (integer; default: 7200; since Linux 2.2)"
+.\" Since Linux 2.1.43
+The number of seconds a connection needs to be idle
+before TCP begins sending out keep-alive probes.
+Keep-alives are sent only when the
+.B SO_KEEPALIVE
+socket option is enabled.
+The default value is 7200 seconds (2 hours).
+An idle connection is terminated after
+approximately an additional 11 minutes (9 probes sent at
+75-second intervals) when keep-alive is enabled.
+.IP
+Note that underlying connection tracking mechanisms and
+application timeouts may be much shorter.
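The figures quoted above follow from the three keep-alive parameters. As a worked example (a sketch; the function name is hypothetical and the model ignores timer granularity):

```python
def keepalive_detection_time(time_s: int = 7200, probes: int = 9, intvl_s: int = 75) -> int:
    """Seconds from the last activity until an unresponsive peer is given up on."""
    # Idle period before the first probe, plus one interval per unanswered probe.
    return time_s + probes * intvl_s


if __name__ == "__main__":
    probing = 9 * 75
    print(f"probing phase: {probing} s ({probing / 60:.2f} minutes)")
    print(f"total with defaults: {keepalive_detection_time()} s")
```

With the defaults, the probing phase lasts 675 seconds (a little over 11 minutes), for a total of 7875 seconds after the connection goes idle.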
+.\"
+.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_low_latency " (Boolean; default: disabled; since Linux 2.4.21/2.6; \
+obsolete since Linux 4.14)"
+.\" Since Linux 2.4.21/2.5.60
+If enabled, the TCP stack makes decisions that prefer lower
+latency as opposed to higher throughput.
+If this option is disabled, then higher throughput is preferred.
+An example of an application where this default should be
+changed would be a Beowulf compute cluster.
+Since Linux 4.14,
+.\" commit b6690b14386698ce2c19309abad3f17656bdfaea
+this file still exists, but its value is ignored.
+.TP
+.IR tcp_max_orphans " (integer; default: see below; since Linux 2.4)"
+.\" Since Linux 2.3.41
+The maximum number of orphaned (not attached to any user file
+handle) TCP sockets allowed in the system.
+When this number is exceeded,
+the orphaned connection is reset and a warning is printed.
+This limit exists only to prevent simple denial-of-service attacks.
+Lowering this limit is not recommended.
+Network conditions might require you to increase the number of
+orphans allowed, but note that each orphan can eat up to \[ti]64\ kB
+of unswappable memory.
+The default initial value is set equal to the kernel parameter NR_FILE.
+This initial default is adjusted depending on the memory in the system.
+.TP
+.IR tcp_max_syn_backlog " (integer; default: see below; since Linux 2.2)"
+.\" Since Linux 2.1.53
+The maximum number of queued connection requests which have
+still not received an acknowledgement from the connecting client.
+If this number is exceeded, the kernel will begin
+dropping requests.
+The default value of 256 is increased to
+1024 when the memory present in the system is adequate or
+greater (>= 128\ MB), and reduced to 128 for those systems with
+very low memory (<= 32\ MB).
+.IP
+Before Linux 2.6.20,
+.\" commit 72a3effaf633bcae9034b7e176bdbd78d64a71db
+it was recommended that if this needed to be increased above 1024,
+the size of the SYNACK hash table
+.RB ( TCP_SYNQ_HSIZE )
+in
+.I include/net/tcp.h
+should be modified to keep
+.IP
+.in +4n
+.EX
+TCP_SYNQ_HSIZE * 16 <= tcp_max_syn_backlog
+.EE
+.in
+.IP
+and the kernel should be
+recompiled.
+In Linux 2.6.20, the fixed sized
+.B TCP_SYNQ_HSIZE
+was removed in favor of dynamic sizing.
+.TP
+.IR tcp_max_tw_buckets " (integer; default: see below; since Linux 2.4)"
+.\" Since Linux 2.3.41
+The maximum number of sockets in TIME_WAIT state allowed in
+the system.
+This limit exists only to prevent simple denial-of-service attacks.
+The default value of NR_FILE*2 is adjusted
+depending on the memory in the system.
+If this number is
+exceeded, the socket is closed and a warning is printed.
+.TP
+.IR tcp_moderate_rcvbuf " (Boolean; default: enabled; since Linux 2.4.17/2.6.7)"
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+If enabled, TCP performs receive buffer auto-tuning,
+attempting to automatically size the buffer (no greater than
+.IR tcp_rmem[2] )
+to match the size required by the path for full throughput.
+.TP
+.IR tcp_mem " (since Linux 2.4)"
+.\" Since Linux 2.4.0-test7
+This is a vector of 3 integers: [low, pressure, high].
+These bounds, measured in units of the system page size,
+are used by TCP to track its memory usage.
+The defaults are calculated at boot time from the amount of
+available memory.
+(TCP can only use
+.I "low memory"
+for this, which is limited to around 900 megabytes on 32-bit systems.
+64-bit systems do not suffer this limitation.)
+.RS
+.TP
+.I low
+TCP doesn't regulate its memory allocation when the number
+of pages it has allocated globally is below this number.
+.TP
+.I pressure
+When the amount of memory allocated by TCP
+exceeds this number of pages, TCP moderates its memory consumption.
+This memory pressure state is exited
+once the number of pages allocated falls below
+the
+.I low
+mark.
+.TP
+.I high
+The maximum number of pages, globally, that TCP will allocate.
+This value overrides any other limits imposed by the kernel.
+.RE
+.TP
+.IR tcp_mtu_probing " (integer; default: 0; since Linux 2.6.17)"
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+This parameter controls TCP Packetization-Layer Path MTU Discovery.
+The following values may be assigned to the file:
+.RS
+.TP
+.B 0
+Disabled
+.TP
+.B 1
+Disabled by default, enabled when an ICMP black hole detected
+.TP
+.B 2
+Always enabled, use initial MSS of
+.IR tcp_base_mss .
+.RE
+.TP
+.IR tcp_no_metrics_save " (Boolean; default: disabled; since Linux 2.6.6)"
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+By default, TCP saves various connection metrics in the route cache
+when the connection closes, so that connections established in the
+near future can use these to set initial conditions.
+Usually, this increases overall performance,
+but it may sometimes cause performance degradation.
+If
+.I tcp_no_metrics_save
+is enabled, TCP will not cache metrics on closing connections.
+.TP
+.IR tcp_orphan_retries " (integer; default: 8; since Linux 2.4)"
+.\" Since Linux 2.3.41
+The maximum number of attempts made to probe the other
+end of a connection which has been closed by our end.
+.TP
+.IR tcp_reordering " (integer; default: 3; since Linux 2.4)"
+.\" Since Linux 2.4.0-test7
+The maximum a packet can be reordered in a TCP packet stream
+without TCP assuming packet loss and going into slow start.
+It is not advisable to change this number.
+This is a packet reordering detection metric designed to
+minimize unnecessary back off and retransmits provoked by
+reordering of packets on a connection.
+.TP
+.IR tcp_retrans_collapse " (Boolean; default: enabled; since Linux 2.2)"
+.\" Since Linux 2.1.96
+Try to send full-sized packets during retransmit.
+.TP
+.IR tcp_retries1 " (integer; default: 3; since Linux 2.2)"
+.\" Since Linux 2.1.43
+The number of times TCP will attempt to retransmit a
+packet on an established connection normally,
+without the extra effort of getting the network layers involved.
+Once we exceed this number of
+retransmits, we first have the network layer
+update the route if possible before each new retransmit.
+The default is the RFC specified minimum of 3.
+.TP
+.IR tcp_retries2 " (integer; default: 15; since Linux 2.2)"
+.\" Since Linux 2.1.43
+The maximum number of times a TCP packet is retransmitted
+in established state before giving up.
+The default value is 15, which corresponds to a duration of
+approximately 13 to 30 minutes, depending
+on the retransmission timeout.
+The RFC\ 1122 specified
+minimum limit of 100 seconds is typically deemed too short.
+.TP
+.IR tcp_rfc1337 " (Boolean; default: disabled; since Linux 2.2)"
+.\" Since Linux 2.1.90
+Enable TCP behavior conformant with RFC\ 1337.
+When disabled,
+if a RST is received in TIME_WAIT state, we close
+the socket immediately without waiting for the end
+of the TIME_WAIT period.
+.TP
+.IR tcp_rmem " (since Linux 2.4)"
+.\" Since Linux 2.4.0-test7
+This is a vector of 3 integers: [min, default, max].
+These parameters are used by TCP to regulate receive buffer sizes.
+TCP dynamically adjusts the size of the
+receive buffer from the defaults listed below, in the range
+of these values, depending on memory available in the system.
+.RS
+.TP
+.I min
+minimum size of the receive buffer used by each TCP socket.
+The default value is the system page size.
+(On Linux 2.4, the default value is 4\ kB, lowered to
+.B PAGE_SIZE
+bytes in low-memory systems.)
+This value
+is used to ensure that in memory pressure mode,
+allocations below this size will still succeed.
+This is not
+used to bound the size of the receive buffer declared
+using
+.B SO_RCVBUF
+on a socket.
+.TP
+.I default
+the default size of the receive buffer for a TCP socket.
+This value overwrites the initial default buffer size from
+the generic global
+.I net.core.rmem_default
+defined for all protocols.
+The default value is 87380 bytes.
+(On Linux 2.4, this will be lowered to 43689 in low-memory systems.)
+If larger receive buffer sizes are desired, this value should
+be increased (to affect all sockets).
+To employ large TCP windows, the
+.I net.ipv4.tcp_window_scaling
+must be enabled (default).
+.TP
+.I max
+the maximum size of the receive buffer used by each TCP socket.
+This value does not override the global
+.IR net.core.rmem_max .
+This is not used to limit the size of the receive buffer declared using
+.B SO_RCVBUF
+on a socket.
+The default value is calculated using the formula
+.IP
+.in +4n
+.EX
+max(87380, min(4\ MB, \fItcp_mem\fP[1]*PAGE_SIZE/128))
+.EE
+.in
+.IP
+(On Linux 2.4, the default is 87380*2 bytes,
+lowered to 87380 in low-memory systems).
+.RE
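The formula for the default maximum can be evaluated directly. This sketch assumes that "4 MB" means 4 * 1024 * 1024 bytes and uses integer division; the function name and the sample inputs are hypothetical.

```python
def default_tcp_rmem_max(tcp_mem_pressure_pages: int, page_size: int = 4096) -> int:
    """Evaluate max(87380, min(4 MB, tcp_mem[1] * PAGE_SIZE / 128))."""
    four_mb = 4 * 1024 * 1024
    scaled = tcp_mem_pressure_pages * page_size // 128
    return max(87380, min(four_mb, scaled))


if __name__ == "__main__":
    for pages in (1_000, 100_000, 1_000_000):
        print(pages, default_tcp_rmem_max(pages))
```

Small values of `tcp_mem[1]` leave the result at the 87380-byte floor, and large values are capped at 4 MB.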
+.TP
+.IR tcp_sack " (Boolean; default: enabled; since Linux 2.2)"
+.\" Since Linux 2.1.36
+Enable RFC\ 2018 TCP Selective Acknowledgements.
+.TP
+.IR tcp_slow_start_after_idle " (Boolean; default: enabled; since Linux 2.6.18)"
+.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
+If enabled, provide RFC 2861 behavior and time out the congestion
+window after an idle period.
+An idle period is defined as the current RTO (retransmission timeout).
+If disabled, the congestion window will not
+be timed out after an idle period.
+.TP
+.IR tcp_stdurg " (Boolean; default: disabled; since Linux 2.2)"
+.\" Since Linux 2.1.44
+If this option is enabled, then use the RFC\ 1122 interpretation
+of the TCP urgent-pointer field.
+.\" RFC 793 was ambiguous in its specification of the meaning of the
+.\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular
+.\" resolution of this ambiguity (unfortunately the "wrong" one).
+According to this interpretation, the urgent pointer points
+to the last byte of urgent data.
+If this option is disabled, then use the BSD-compatible interpretation of
+the urgent pointer:
+the urgent pointer points to the first byte after the urgent data.
+Enabling this option may lead to interoperability problems.
+.TP
+.IR tcp_syn_retries " (integer; default: 6; since Linux 2.2)"
+.\" Since Linux 2.1.38
+The maximum number of times initial SYNs for an active TCP
+connection attempt will be retransmitted.
+This value should not be higher than 255.
+The default value is 6, which corresponds to retrying for up to
+approximately 127 seconds.
+Before Linux 3.7,
+.\" commit 6c9ff979d1921e9fd05d89e1383121c2503759b9
+the default value was 5, which
+(in conjunction with calculation based on other kernel parameters)
+corresponded to approximately 180 seconds.
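+.IP
+The 127-second figure can be reproduced under the assumption
+(not stated above) that the initial retransmission timeout is 1 second
+and doubles after each unanswered SYN,
+so that an attempt with 6 retransmissions is abandoned
+after the sum of seven timeouts:
+.IP
+.in +4n
+.EX
+1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds
+.EE
+.in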
+.TP
+.IR tcp_synack_retries " (integer; default: 5; since Linux 2.2)"
+.\" Since Linux 2.1.38
+The maximum number of times a SYN/ACK segment
+for a passive TCP connection will be retransmitted.
+This number should not be higher than 255.
+.TP
+.IR tcp_syncookies " (integer; default: 1; since Linux 2.2)"
+.\" Since Linux 2.1.43
+Enable TCP syncookies.
+The kernel must be compiled with
+.BR CONFIG_SYN_COOKIES .
+The syncookies feature attempts to protect a
+socket from a SYN flood attack.
+This should be used as a last resort, if at all.
+This is a violation of the TCP protocol,
+and conflicts with other areas of TCP such as TCP extensions.
+It can cause problems for clients and relays.
+It is not recommended as a tuning mechanism for heavily
+loaded servers to help with overloaded or misconfigured conditions.
+For recommended alternatives see
+.IR tcp_max_syn_backlog ,
+.IR tcp_synack_retries ,
+and
+.IR tcp_abort_on_overflow .
+Set to one of the following values:
+.RS
+.TP
+.B 0
+Disable TCP syncookies.
+.TP
+.B 1
+Send out syncookies when the SYN backlog queue of a socket overflows.
+.TP
+.B 2
+(since Linux 3.12)
+.\" commit 5ad37d5deee1ff7150a2d0602370101de158ad86
+Send out syncookies unconditionally.
+This can be useful for network testing.
+.RE
+.TP
+.IR tcp_timestamps " (integer; default: 1; since Linux 2.2)"
+.\" Since Linux 2.1.36
+Set to one of the following values to enable or disable RFC\ 1323
+TCP timestamps:
+.RS
+.TP
+.B 0
+Disable timestamps.
+.TP
+.B 1
+Enable timestamps as defined in RFC\ 1323 and use a random offset for
+each connection rather than only using the current time.
+.TP
+.B 2
+As for the value 1, but without random offsets.
+.\" commit 25429d7b7dca01dc4f17205de023a30ca09390d0
+Setting
+.I tcp_timestamps
+to this value is meaningful since Linux 4.10.
+.RE
+.TP
+.IR tcp_tso_win_divisor " (integer; default: 3; since Linux 2.6.9)"
+This parameter controls what percentage of the congestion window
+can be consumed by a single TCP Segmentation Offload (TSO) frame.
+The setting of this parameter is a tradeoff between burstiness and
+building larger TSO frames.
+.TP
+.IR tcp_tw_recycle " (Boolean; default: disabled; Linux 2.4 to Linux 4.11)"
+.\" Since Linux 2.3.15
+.\" removed in Linux 4.12; commit 4396e46187ca5070219b81773c4e65088dac50cc
+Enable fast recycling of TIME_WAIT sockets.
+Enabling this option is
+not recommended as the remote IP may not use monotonically increasing
+timestamps (devices behind NAT, devices with per-connection timestamp
+offsets).
+See RFC 1323 (PAWS) and RFC 6191.
+.\"
+.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_tw_reuse " (Boolean; default: disabled; since Linux 2.4.19/2.6)"
+.\" Since Linux 2.4.19/2.5.43
+Allow reuse of TIME_WAIT sockets for new connections when it is
+safe from the protocol viewpoint.
+It should not be changed without the advice or request of technical experts.
+.\"
+.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_vegas_cong_avoid " (Boolean; default: disabled; Linux 2.2 to Linux 2.6.13)"
+.\" Since Linux 2.1.8; removed in Linux 2.6.13
+Enable TCP Vegas congestion avoidance algorithm.
+TCP Vegas is a sender-side-only change to TCP that anticipates
+the onset of congestion by estimating the bandwidth.
+TCP Vegas adjusts the sending rate by modifying the congestion window.
+TCP Vegas should provide less packet loss, but it is
+not as aggressive as TCP Reno.
+.\"
+.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.IR tcp_westwood " (Boolean; default: disabled; Linux 2.4.26/2.6.3 to Linux 2.6.13)"
+Enable TCP Westwood+ congestion control algorithm.
+TCP Westwood+ is a sender-side-only modification of the TCP Reno
+protocol stack that optimizes the performance of TCP congestion control.
+It is based on end-to-end bandwidth estimation to set
+congestion window and slow start threshold after a congestion episode.
+Using this estimation, TCP Westwood+ adaptively sets a
+slow start threshold and a congestion window which takes into
+account the bandwidth used at the time congestion is experienced.
+TCP Westwood+ significantly increases fairness with respect to
+TCP Reno in wired networks and throughput over wireless links.
+.TP
+.IR tcp_window_scaling " (Boolean; default: enabled; since Linux 2.2)"
+.\" Since Linux 2.1.36
+Enable RFC\ 1323 TCP window scaling.
+This feature allows the use of a large window
+(> 64\ kB) on a TCP connection, should the other end support it.
+Normally, the 16-bit window length field in the TCP header
+limits the window size to less than 64\ kB.
+If larger windows are desired, applications can increase the size of
+their socket buffers and the window scaling option will be employed.
+If
+.I tcp_window_scaling
+is disabled, TCP will not negotiate the use of window
+scaling with the other end during connection setup.
+.TP
+.IR tcp_wmem " (since Linux 2.4)"
+.\" Since Linux 2.4.0-test7
+This is a vector of 3 integers: [min, default, max].
+These parameters are used by TCP to regulate send buffer sizes.
+TCP dynamically adjusts the size of the send buffer from the
+default values listed below, in the range of these values,
+depending on memory available.
+.RS
+.TP
+.I min
+Minimum size of the send buffer used by each TCP socket.
+The default value is the system page size.
+(On Linux 2.4, the default value is 4\ kB.)
+This value is used to ensure that in memory pressure mode,
+allocations below this size will still succeed.
+This is not used to bound the size of the send buffer declared using
+.B SO_SNDBUF
+on a socket.
+.TP
+.I default
+The default size of the send buffer for a TCP socket.
+This value overwrites the initial default buffer size from
+the generic global
+.I /proc/sys/net/core/wmem_default
+defined for all protocols.
+The default value is 16\ kB.
+.\" True in Linux 2.4 and 2.6
+If larger send buffer sizes are desired, this value
+should be increased (to affect all sockets).
+To employ large TCP windows,
+.I /proc/sys/net/ipv4/tcp_window_scaling
+must be set to a nonzero value (the default).
+.TP
+.I max
+The maximum size of the send buffer used by each TCP socket.
+This value does not override the value in
+.IR /proc/sys/net/core/wmem_max .
+This is not used to limit the size of the send buffer declared using
+.B SO_SNDBUF
+on a socket.
+The default value is calculated using the formula
+.IP
+.in +4n
+.EX
+max(65536, min(4\ MB, \fItcp_mem\fP[1]*PAGE_SIZE/128))
+.EE
+.in
+.IP
+(On Linux 2.4, the default value is 128\ kB,
+lowered to 64\ kB on low-memory systems.)
+.RE
+.TP
+.IR tcp_workaround_signed_windows " (Boolean; default: disabled; since Linux 2.6.26)"
+If enabled, assume that no receipt of a window-scaling option means that the
+remote TCP is broken and treats the window as a signed quantity.
+If disabled, assume that the remote TCP is not broken even if we do
+not receive a window scaling option from it.
+.SS Socket options
+To set or get a TCP socket option, call
+.BR getsockopt (2)
+to read or
+.BR setsockopt (2)
+to write the option with the option level argument set to
+.BR IPPROTO_TCP .
+Unless otherwise noted,
+.I optval
+is a pointer to an
+.IR int .
+.\" or SOL_TCP on Linux
+In addition,
+most
+.B IPPROTO_IP
+socket options are valid on TCP sockets.
+For more information see
+.BR ip (7).
+.PP
+Following is a list of TCP-specific socket options.
+For details of some other socket options that are also applicable
+for TCP sockets, see
+.BR socket (7).
+.TP
+.BR TCP_CONGESTION " (since Linux 2.6.13)"
+.\" commit 5f8ef48d240963093451bcf83df89f1a1364f51d
+.\" Author: Stephen Hemminger <shemminger@osdl.org>
+The argument for this option is a string.
+This option allows the caller to set the TCP congestion control
+algorithm to be used, on a per-socket basis.
+Unprivileged processes are restricted to choosing one of the algorithms in
+.I tcp_allowed_congestion_control
+(described above).
+Privileged processes
+.RB ( CAP_NET_ADMIN )
+can choose from any of the available congestion-control algorithms
+(see the description of
+.I tcp_available_congestion_control
+above).
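+.IP
+As an illustration, the following sketch selects an algorithm for one socket
+and reads back the name that the kernel reports.
+The helper name
+.I set_congestion_control
+is hypothetical, and the sketch assumes that the requested algorithm
+(for example, "reno") is available on the running kernel:
+.IP
+.in +4n
+.EX
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <string.h>
+#include <sys/socket.h>
+
+/* Hypothetical helper: returns 0 on success, or \-1 with errno set.
+   The caller supplies the buffer "actual" for the name read back;
+   16 bytes is assumed to be enough for any algorithm name. */
+static int
+set_congestion_control(int fd, const char *name, char *actual,
+                       socklen_t actual_len)
+{
+    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
+                   name, strlen(name)) == \-1)
+        return \-1;
+
+    /* Read the algorithm name back to confirm what is in use. */
+    return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
+                      actual, &actual_len);
+}
+.EE
+.in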
+.TP
+.BR TCP_CORK " (since Linux 2.2)"
+.\" precisely: since Linux 2.1.127
+If set, don't send out partial frames.
+All queued partial frames are sent when the option is cleared again.
+This is useful for prepending headers before calling
+.BR sendfile (2),
+or for throughput optimization.
+As currently implemented, there is a 200 millisecond ceiling on the time
+for which output is corked by
+.BR TCP_CORK .
+If this ceiling is reached, then queued data is automatically transmitted.
+This option can be combined with
+.B TCP_NODELAY
+only since Linux 2.5.71.
+This option should not be used in code intended to be portable.
+.TP
+.BR TCP_DEFER_ACCEPT " (since Linux 2.4)"
+.\" Precisely: since Linux 2.3.38
+.\" Useful references:
+.\" http://www.techrepublic.com/article/take-advantage-of-tcp-ip-options-to-optimize-data-transmission/
+.\" http://unix.stackexchange.com/questions/94104/real-world-use-of-tcp-defer-accept
+Allow a listener to be awakened only when data arrives on the socket.
+Takes an integer value (seconds); this can
+bound the maximum number of attempts TCP will make to
+complete the connection.
+This option should not be used in code intended to be portable.
+.TP
+.BR TCP_INFO " (since Linux 2.4)"
+Used to collect information about this socket.
+The kernel returns a \fIstruct tcp_info\fP as defined in the file
+.IR /usr/include/linux/tcp.h .
+This option should not be used in code intended to be portable.
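+.IP
+A minimal sketch of reading this structure follows.
+It assumes that
+.I fd
+is a connected TCP socket and uses the glibc header
+.IR <netinet/tcp.h> ,
+which provides its own definition of
+.IR "struct tcp_info" ;
+the helper name
+.I print_rtt
+is hypothetical:
+.IP
+.in +4n
+.EX
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <stdio.h>
+#include <sys/socket.h>
+
+/* Hypothetical helper: print the smoothed round-trip time.
+   Returns 0 on success, or \-1 with errno set. */
+static int
+print_rtt(int fd)
+{
+    struct tcp_info info;
+    socklen_t len = sizeof(info);
+
+    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == \-1)
+        return \-1;
+
+    /* tcpi_rtt and tcpi_rttvar are reported in microseconds. */
+    printf("rtt=%u us, rttvar=%u us\[rs]n",
+           info.tcpi_rtt, info.tcpi_rttvar);
+    return 0;
+}
+.EE
+.in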
+.TP
+.BR TCP_KEEPCNT " (since Linux 2.4)"
+.\" Precisely: since Linux 2.3.18
+The maximum number of keepalive probes TCP should send
+before dropping the connection.
+This option should not be
+used in code intended to be portable.
+.TP
+.BR TCP_KEEPIDLE " (since Linux 2.4)"
+.\" Precisely: since Linux 2.3.18
+The time (in seconds) the connection needs to remain idle
+before TCP starts sending keepalive probes, if the socket
+option
+.B SO_KEEPALIVE
+has been set on this socket.
+This option should not be used in code intended to be portable.
+.TP
+.BR TCP_KEEPINTVL " (since Linux 2.4)"
+.\" Precisely: since Linux 2.3.18
+The time (in seconds) between individual keepalive probes.
+This option should not be used in code intended to be portable.
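+.IP
+The following sketch enables
+.B SO_KEEPALIVE
+and then sets
+.BR TCP_KEEPIDLE ,
+.BR TCP_KEEPINTVL ,
+and
+.B TCP_KEEPCNT
+on one socket.
+The values (60 seconds idle, 10 seconds between probes, 5 probes)
+are illustrative rather than recommended, and the helper name
+.I enable_keepalive
+is hypothetical:
+.IP
+.in +4n
+.EX
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <sys/socket.h>
+
+/* Hypothetical helper: returns 0 on success, or \-1 with errno set. */
+static int
+enable_keepalive(int fd)
+{
+    int on = 1, idle = 60, intvl = 10, cnt = 5;
+
+    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE,
+                   &on, sizeof(on)) == \-1)
+        return \-1;
+    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
+                   &idle, sizeof(idle)) == \-1)
+        return \-1;
+    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
+                   &intvl, sizeof(intvl)) == \-1)
+        return \-1;
+    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
+                      &cnt, sizeof(cnt));
+}
+.EE
+.in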
+.TP
+.BR TCP_LINGER2 " (since Linux 2.4)"
+.\" Precisely: since Linux 2.3.41
+The lifetime of orphaned FIN_WAIT2 state sockets.
+This option can be used to override the system-wide setting in the file
+.I /proc/sys/net/ipv4/tcp_fin_timeout
+for this socket.
+This is not to be confused with the
+.BR socket (7)
+level option
+.BR SO_LINGER .
+This option should not be used in code intended to be portable.
+.TP
+.B TCP_MAXSEG
+.\" Present in Linux 1.0
+The maximum segment size for outgoing TCP packets.
+In Linux 2.2 and earlier, and in Linux 2.6.28 and later,
+if this option is set before connection establishment, it also
+changes the MSS value announced to the other end in the initial packet.
+Values greater than the (eventual) interface MTU have no effect.
+TCP will also impose
+its minimum and maximum bounds over the value provided.
+.TP
+.B TCP_NODELAY
+.\" Present in Linux 1.0
+If set, disable the Nagle algorithm.
+This means that segments
+are always sent as soon as possible, even if there is only a
+small amount of data.
+When not set, data is buffered until there
+is a sufficient amount to send out, thereby avoiding the
+frequent sending of small packets, which results in poor
+utilization of the network.
+This option is overridden by
+.BR TCP_CORK ;
+however, setting this option forces an explicit flush of
+pending output, even if
+.B TCP_CORK
+is currently set.
+.TP
+.BR TCP_QUICKACK " (since Linux 2.4.4)"
+Enable quickack mode if set or disable quickack
+mode if cleared.
+In quickack mode, acks are sent
+immediately, rather than delayed if needed in accordance
+with normal TCP operation.
+This flag is not permanent;
+it only enables a switch to or from quickack mode.
+Subsequent operation of the TCP protocol will
+once again enter/leave quickack mode depending on
+internal protocol processing and factors such as
+delayed ack timeouts occurring and data transfer.
+This option should not be used in code intended to be
+portable.
+.TP
+.BR TCP_SYNCNT " (since Linux 2.4)"
+.\" Precisely: since Linux 2.3.18
+Set the number of SYN retransmits that TCP should send before
+aborting the attempt to connect.
+It cannot exceed 255.
+This option should not be used in code intended to be portable.
+.TP
+.BR TCP_USER_TIMEOUT " (since Linux 2.6.37)"
+.\" commit dca43c75e7e545694a9dd6288553f55c53e2a3a3
+.\" Author: Jerry Chu <hkchu@google.com>
+.\" The following text taken nearly verbatim from Jerry Chu's (excellent)
+.\" commit message.
+.\"
+This option takes an
+.I unsigned int
+as an argument.
+When the value is greater than 0,
+it specifies the maximum amount of time in milliseconds that transmitted
+data may remain unacknowledged, or buffered data may remain untransmitted
+(due to zero window size) before TCP will forcibly close the
+corresponding connection and return
+.B ETIMEDOUT
+to the application.
+If the option value is specified as 0,
+TCP will use the system default.
+.IP
+Increasing user timeouts allows a TCP connection to survive extended
+periods without end-to-end connectivity.
+Decreasing user timeouts
+allows applications to "fail fast", if so desired.
+Otherwise, failure may take up to 20 minutes with
+the current system defaults in a normal WAN environment.
+.IP
+This option can be set during any state of a TCP connection,
+but is effective only during the synchronized states of a connection
+(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, and LAST-ACK).
+Moreover, when used with the TCP keepalive
+.RB ( SO_KEEPALIVE )
+option,
+.B TCP_USER_TIMEOUT
+will override keepalive to determine when to close a
+connection due to keepalive failure.
+.IP
+The option has no effect on when TCP retransmits a packet,
+nor when a keepalive probe is sent.
+.IP
+This option, like many others, will be inherited by the socket returned by
+.BR accept (2),
+if it was set on the listening socket.
+.IP
+Further details on the user timeout feature can be found in
+RFC\ 793 and RFC\ 5482 ("TCP User Timeout Option").
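+.IP
+A minimal sketch of setting the option follows;
+the helper name
+.I set_user_timeout
+is hypothetical:
+.IP
+.in +4n
+.EX
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <sys/socket.h>
+
+/* Hypothetical helper: set the user timeout, in milliseconds.
+   A value of 0 selects the system default.
+   Returns 0 on success, or \-1 with errno set. */
+static int
+set_user_timeout(int fd, unsigned int timeout_ms)
+{
+    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
+                      &timeout_ms, sizeof(timeout_ms));
+}
+.EE
+.in
+.IP
+For example, an application that wants to "fail fast" after roughly
+30 seconds could call set_user_timeout(fd, 30000).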
+.TP
+.BR TCP_WINDOW_CLAMP " (since Linux 2.4)"
+.\" Precisely: since Linux 2.3.41
+Bound the size of the advertised window to this value.
+The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2.
+This option should not be used in code intended to be
+portable.
+.TP
+.BR TCP_FASTOPEN " (since Linux 3.6)"
+This option enables Fast Open (RFC\~7413) on the listener socket.
+The value specifies the maximum length of pending SYNs
+(similar to the backlog argument in
+.BR listen (2)).
+Once enabled,
+the listener socket grants a TCP Fast Open cookie
+in response to an incoming SYN that carries the TCP Fast Open option.
+.IP
+More importantly, it accepts the data in a SYN with a valid Fast Open cookie
+and responds with a SYN-ACK acknowledging both the data and the SYN sequence.
+.BR accept (2)
+returns a socket that is available for read and write
+when the handshake has not completed yet.
+Thus the data exchange can commence before the handshake completes.
+This option requires enabling the server-side support on sysctl
+.I net.ipv4.tcp_fastopen
+(see above).
+For TCP Fast Open client-side support,
+see
+.BR send (2)
+.B MSG_FASTOPEN
+or
+.B TCP_FASTOPEN_CONNECT
+below.
+.TP
+.BR TCP_FASTOPEN_CONNECT " (since Linux 4.11)"
+This option enables an alternative way to perform Fast Open
+on the active side (client).
+When this option is enabled,
+.BR connect (2)
+behaves differently depending on
+whether a Fast Open cookie is available for the destination.
+.IP
+If a cookie is not available (i.e. first contact to the destination),
+.BR connect (2)
+behaves as usual by sending a SYN immediately,
+except the SYN would include an empty Fast Open cookie option
+to solicit a cookie.
+.IP
+If a cookie is available,
+.BR connect (2)
+would return 0 immediately but the SYN transmission is deferred.
+A subsequent
+.BR write (2)
+or
+.BR sendmsg (2)
+would trigger a SYN with data plus cookie in the Fast Open option.
+In other words,
+the actual connect operation is deferred until data is supplied.
+.IP
+.B Note:
+While this option is designed for convenience,
+enabling it does change the behaviors and certain system calls might set
+different
+.I errno
+values.
+With cookie present,
+.BR write (2)
+or
+.BR sendmsg (2)
+must be called right after
+.BR connect (2)
+in order to send out SYN+data, complete the three-way handshake (3WHS),
+and establish the connection.
+Calling
+.BR read (2)
+right after
+.BR connect (2)
+without
+.BR write (2)
+will cause the blocking socket to be blocked forever.
+.IP
+The application should either set the
+.B TCP_FASTOPEN_CONNECT
+socket option before
+.BR write (2)
+or
+.BR sendmsg (2),
+or call
+.BR sendto (2)
+or
+.BR sendmsg (2)
+with the
+.B MSG_FASTOPEN
+flag directly,
+but not both on the same connection.
+.IP
+Here is the typical call flow with this new option:
+.IP
+.in +4n
+.EX
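+int s, one = 1;    /* addr, buf, and len are assumed to be set up */
+
+s = socket(AF_INET, SOCK_STREAM, 0);
+setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT, &one, sizeof(one));
+connect(s, (struct sockaddr *) &addr, sizeof(addr));
+write(s, buf, len);    /* write() should always follow connect()
+                        * in order to trigger SYN to go out. */
+/* ... further read(s, ...) and write(s, ...) calls ... */
+close(s);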
+.EE
+.in
+.SS Sockets API
+TCP provides limited support for out-of-band data,
+in the form of (a single byte of) urgent data.
+In Linux this means if the other end sends newer out-of-band
+data the older urgent data is inserted as normal data into
+the stream (even when
+.B SO_OOBINLINE
+is not set).
+This differs from BSD-based stacks.
+.PP
+Linux uses the BSD-compatible interpretation of the urgent
+pointer field by default.
+This violates RFC\ 1122, but is
+required for interoperability with other stacks.
+It can be changed via
+.IR /proc/sys/net/ipv4/tcp_stdurg .
+.PP
+It is possible to peek at out-of-band data using the
+.BR recv (2)
+.B MSG_PEEK
+flag.
+.PP
+Since Linux 2.4, Linux supports the use of
+.B MSG_TRUNC
+in the
+.I flags
+argument of
+.BR recv (2)
+(and
+.BR recvmsg (2)).
+This flag causes the received bytes of data to be discarded,
+rather than passed back in a caller-supplied buffer.
+Since Linux 2.4.4,
+.B MSG_TRUNC
+also has this effect when used in conjunction with
+.B MSG_OOB
+to receive out-of-band data.
+.SS Ioctls
+The following
+.BR ioctl (2)
+calls return information in
+.IR value .
+The correct syntax is:
+.PP
+.RS
+.nf
+.BI int " value";
+.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");"
+.fi
+.RE
+.PP
+.I ioctl_type
+is one of the following:
+.TP
+.B SIOCINQ
+Returns the amount of queued unread data in the receive buffer.
+The socket must not be in LISTEN state, otherwise an error
+.RB ( EINVAL )
+is returned.
+.B SIOCINQ
+is defined in
+.IR <linux/sockios.h> .
+.\" FIXME https://www.sourceware.org/bugzilla/show_bug.cgi?id=12002,
+.\" filed 2010-09-10, may cause SIOCINQ to be defined in glibc headers
+Alternatively,
+you can use the synonymous
+.BR FIONREAD ,
+defined in
+.IR <sys/ioctl.h> .
+.TP
+.B SIOCATMARK
+Returns true (i.e.,
+.I value
+is nonzero) if the inbound data stream is at the urgent mark.
+.IP
+If the
+.B SO_OOBINLINE
+socket option is set, and
+.B SIOCATMARK
+returns true, then the
+next read from the socket will return the urgent data.
+If the
+.B SO_OOBINLINE
+socket option is not set, and
+.B SIOCATMARK
+returns true, then the
+next read from the socket will return the bytes following
+the urgent data (to actually read the urgent data requires the
+.BR recv (2)
+.B MSG_OOB
+flag).
+.IP
+Note that a read never reads across the urgent mark.
+If an application is informed of the presence of urgent data via
+.BR select (2)
+(using the
+.I exceptfds
+argument) or through delivery of a
+.B SIGURG
+signal,
+then it can advance up to the mark using a loop which repeatedly tests
+.B SIOCATMARK
+and performs a read (requesting any number of bytes) as long as
+.B SIOCATMARK
+returns false.
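+.IP
+The loop described above might be sketched as follows.
+The helper name
+.I skip_to_mark
+is hypothetical, and the sketch simply discards the data that
+precedes the mark:
+.IP
+.in +4n
+.EX
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+/* Hypothetical helper: discard input up to the urgent mark.
+   Returns 0 once at the mark, or \-1 on error or end of file. */
+static int
+skip_to_mark(int fd)
+{
+    char buf[512];
+    int atmark;
+
+    for (;;) {
+        if (ioctl(fd, SIOCATMARK, &atmark) == \-1)
+            return \-1;
+        if (atmark)
+            return 0;
+        /* A read never crosses the mark, so any size is safe. */
+        if (read(fd, buf, sizeof(buf)) <= 0)
+            return \-1;
+    }
+}
+.EE
+.in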
+.TP
+.B SIOCOUTQ
+Returns the amount of unsent data in the socket send queue.
+The socket must not be in LISTEN state, otherwise an error
+.RB ( EINVAL )
+is returned.
+.B SIOCOUTQ
+is defined in
+.IR <linux/sockios.h> .
+.\" FIXME . https://www.sourceware.org/bugzilla/show_bug.cgi?id=12002,
+.\" filed 2010-09-10, may cause SIOCOUTQ to be defined in glibc headers
+Alternatively,
+you can use the synonymous
+.BR TIOCOUTQ ,
+defined in
+.IR <sys/ioctl.h> .
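+.IP
+As a sketch of the calling convention shown at the start of this section,
+the following hypothetical helper,
+.IR send_queue_bytes ,
+returns the value that the kernel reports for the send queue:
+.IP
+.in +4n
+.EX
+#include <linux/sockios.h>
+#include <sys/ioctl.h>
+
+/* Hypothetical helper: returns the SIOCOUTQ value,
+   or \-1 with errno set (for example, EINVAL in LISTEN state). */
+static int
+send_queue_bytes(int tcp_socket)
+{
+    int value;
+
+    if (ioctl(tcp_socket, SIOCOUTQ, &value) == \-1)
+        return \-1;
+    return value;
+}
+.EE
+.in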
+.SS Error handling
+When a network error occurs, TCP tries to resend the packet.
+If it doesn't succeed after some time, either
+.B ETIMEDOUT
+or the last received error on this connection is reported.
+.PP
+Some applications require a quicker error notification.
+This can be enabled with the
+.B IPPROTO_IP
+level
+.B IP_RECVERR
+socket option.
+When this option is enabled, all incoming
+errors are immediately passed to the user program.
+Use this option with care \[em] it makes TCP less tolerant to routing
+changes and other normal network conditions.
+.SH ERRORS
+.TP
+.B EAFNOTSUPPORT
+Passed socket address type in
+.I sin_family
+was not
+.BR AF_INET .
+.TP
+.B EPIPE
+The other end closed the socket unexpectedly or a write is
+executed on a socket that has been shut down for writing.
+.TP
+.B ETIMEDOUT
+The other end didn't acknowledge retransmitted data after some time.
+.PP
+Any errors defined for
+.BR ip (7)
+or the generic socket layer may also be returned for TCP.
+.SH VERSIONS
+Support for Explicit Congestion Notification, zero-copy
+.BR sendfile (2),
+reordering, and some SACK extensions
+(DSACK) was introduced in Linux 2.4.
+Support for forward acknowledgement (FACK), TIME_WAIT recycling,
+and per-connection keepalive socket options was introduced in Linux 2.3.
+.SH BUGS
+Not all errors are documented.
+.PP
+IPv6 is not described.
+.\" Only a single Linux kernel version is described
+.\" Info for 2.2 was lost. Should be added again,
+.\" or put into a separate page.
+.\" .SH AUTHORS
+.\" This man page was originally written by Andi Kleen.
+.\" It was updated for 2.4 by Nivedita Singhvi with input from
+.\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt
+.\" document.
+.SH SEE ALSO
+.BR accept (2),
+.BR bind (2),
+.BR connect (2),
+.BR getsockopt (2),
+.BR listen (2),
+.BR recvmsg (2),
+.BR sendfile (2),
+.BR sendmsg (2),
+.BR socket (2),
+.BR ip (7),
+.BR socket (7)
+.PP
+The kernel source file
+.IR Documentation/networking/ip\-sysctl.txt .
+.PP
+RFC\ 793 for the TCP specification.
+.br
+RFC\ 1122 for the TCP requirements and a description of the Nagle algorithm.
+.br
+RFC\ 1323 for TCP timestamp and window scaling options.
+.br
+RFC\ 1337 for a description of TIME_WAIT assassination hazards.
+.br
+RFC\ 3168 for a description of Explicit Congestion Notification.
+.br
+RFC\ 2581 for TCP congestion control algorithms.
+.br
+RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK.