diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:40:15 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:40:15 +0000 |
commit | 399644e47874bff147afb19c89228901ac39340e (patch) | |
tree | 1c4c0b733f4c16b5783b41bebb19194a9ef62ad1 /man7/tcp.7 | |
parent | Initial commit. (diff) | |
download | manpages-ac8a94b90d5cf454cd6648203aaf1c44d642788f.tar.xz manpages-ac8a94b90d5cf454cd6648203aaf1c44d642788f.zip |
Adding upstream version 6.05.01.upstream/6.05.01
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'man7/tcp.7')
-rw-r--r-- | man7/tcp.7 | 1563 |
1 files changed, 1563 insertions, 0 deletions
diff --git a/man7/tcp.7 b/man7/tcp.7 new file mode 100644 index 0000000..aec6b78 --- /dev/null +++ b/man7/tcp.7 @@ -0,0 +1,1563 @@ +.\" SPDX-License-Identifier: Linux-man-pages-1-para +.\" +.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>. +.\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com> +.\" Note also that many pieces are drawn from the kernel source file +.\" Documentation/networking/ip-sysctl.txt. +.\" +.\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>. +.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer +.\" Updated details of interaction of TCP_CORK and TCP_NODELAY. +.\" +.\" 2008-11-21, mtk, many, many updates. +.\" The descriptions of /proc files and socket options should now +.\" be more or less up to date and complete as at Linux 2.6.27 +.\" (other than the remaining FIXMEs in the page source below). +.\" +.\" FIXME The following need to be documented +.\" TCP_MD5SIG (2.6.20) +.\" commit cfb6eeb4c860592edd123fdea908d23c6ad1c7dc +.\" Author was yoshfuji@linux-ipv6.org +.\" Needs CONFIG_TCP_MD5SIG +.\" From net/inet/Kconfig: +.\" bool "TCP: MD5 Signature Option support (RFC2385) (EXPERIMENTAL)" +.\" RFC2385 specifies a method of giving MD5 protection to TCP sessions. +.\" Its main (only?) use is to protect BGP sessions between core routers +.\" on the Internet. +.\" +.\" There is a TCP_MD5SIG option documented in FreeBSD's tcp(4), +.\" but probably many details are different on Linux +.\" http://thread.gmane.org/gmane.linux.network/47490 +.\" http://www.daemon-systems.org/man/tcp.4.html +.\" http://article.gmane.org/gmane.os.netbsd.devel.network/3767/match=tcp_md5sig+freebsd +.\" +.\" TCP_COOKIE_TRANSACTIONS (2.6.33) +.\" commit 519855c508b9a17878c0977a3cdefc09b59b30df +.\" Author: William Allen Simpson <william.allen.simpson@gmail.com> +.\" commit e56fb50f2b7958b931c8a2fc0966061b3f3c8f3a +.\" Author: William Allen Simpson <william.allen.simpson@gmail.com> +.\" +.\" REMOVED in Linux 3.10 +.\" commit 1a2c6181c4a1922021b4d7df373bba612c3e5f04 +.\" Author: Christoph Paasch <christoph.paasch@uclouvain.be> +.\" +.\" TCP_THIN_LINEAR_TIMEOUTS (2.6.34) +.\" commit 36e31b0af58728071e8023cf8e20c5166b700717 +.\" Author: Andreas Petlund <apetlund@simula.no> +.\" +.\" TCP_THIN_DUPACK (2.6.34) +.\" commit 7e38017557bc0b87434d184f8804cadb102bb903 +.\" Author: Andreas Petlund <apetlund@simula.no> +.\" +.\" TCP_REPAIR (3.5) +.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 +.\" Author: Pavel Emelyanov <xemul@parallels.com> +.\" See also +.\" http://criu.org/TCP_connection +.\" https://lwn.net/Articles/495304/ +.\" +.\" TCP_REPAIR_QUEUE (3.5) +.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 +.\" Author: Pavel Emelyanov <xemul@parallels.com> +.\" +.\" TCP_QUEUE_SEQ (3.5) +.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 +.\" Author: Pavel Emelyanov <xemul@parallels.com> +.\" +.\" TCP_REPAIR_OPTIONS (3.5) +.\" commit b139ba4e90dccbf4cd4efb112af96a5c9e0b098c +.\" Author: Pavel Emelyanov <xemul@parallels.com> +.\" +.\" TCP_FASTOPEN (3.6) +.\" (Fast Open server side implementation completed in Linux 3.7) +.\" http://lwn.net/Articles/508865/ +.\" +.\" TCP_TIMESTAMP (3.9) +.\" commit 93be6ce0e91b6a94783e012b1857a347a5e6e9f2 +.\" Author: Andrey Vagin <avagin@openvz.org> +.\" +.\" TCP_NOTSENT_LOWAT (3.12) +.\" commit c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 +.\" Author: Eric Dumazet <edumazet@google.com> +.\" +.\" TCP_CC_INFO (4.1) +.\" commit 6e9250f59ef9efb932c84850cd221f22c2a03c4a +.\" Author: Eric Dumazet <edumazet@google.com> +.\" +.\" TCP_SAVE_SYN, TCP_SAVED_SYN (4.2) +.\" commit cd8ae85299d54155702a56811b2e035e63064d3d +.\" Author: Eric Dumazet <edumazet@google.com> +.\" +.TH tcp 7 2023-07-15 "Linux man-pages 6.05.01" +.SH NAME +tcp \- TCP protocol +.SH SYNOPSIS +.nf +.B #include <sys/socket.h> +.B #include <netinet/in.h> +.B #include <netinet/tcp.h> +.PP +.IB tcp_socket " = socket(AF_INET, SOCK_STREAM, 0);" +.fi +.SH DESCRIPTION +This is an implementation of the TCP protocol defined in +RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK +extensions. +It provides a reliable, stream-oriented, +full-duplex connection between two sockets on top of +.BR ip (7), +for both v4 and v6 versions. +TCP guarantees that the data arrives in order and +retransmits lost packets. +It generates and checks a per-packet checksum to catch +transmission errors. +TCP does not preserve record boundaries. +.PP +A newly created TCP socket has no remote or local address and is not +fully specified. +To create an outgoing TCP connection use +.BR connect (2) +to establish a connection to another TCP socket. +To receive new incoming connections, first +.BR bind (2) +the socket to a local address and port and then call +.BR listen (2) +to put the socket into the listening state. +After that a new socket for each incoming connection can be accepted using +.BR accept (2). +A socket which has had +.BR accept (2) +or +.BR connect (2) +successfully called on it is fully specified and may transmit data. +Data cannot be transmitted on listening or not yet connected sockets. +.PP +Linux supports RFC\ 1323 TCP high performance +extensions. +These include Protection Against Wrapped +Sequence Numbers (PAWS), Window Scaling and Timestamps. +Window scaling allows the use +of large (> 64\ kB) TCP windows in order to support links with high +latency or bandwidth. +To make use of them, the send and receive buffer sizes must be increased. +They can be set globally with the +.I /proc/sys/net/ipv4/tcp_wmem +and +.I /proc/sys/net/ipv4/tcp_rmem +files, or on individual sockets by using the +.B SO_SNDBUF +and +.B SO_RCVBUF +socket options with the +.BR setsockopt (2) +call. +.PP +The maximum sizes for socket buffers declared via the +.B SO_SNDBUF +and +.B SO_RCVBUF +mechanisms are limited by the values in the +.I /proc/sys/net/core/rmem_max +and +.I /proc/sys/net/core/wmem_max +files. +Note that TCP actually allocates twice the size of +the buffer requested in the +.BR setsockopt (2) +call, and so a succeeding +.BR getsockopt (2) +call will not return the same size of buffer as requested in the +.BR setsockopt (2) +call. +TCP uses the extra space for administrative purposes and internal +kernel structures, and the +.I /proc +file values reflect the +larger sizes compared to the actual TCP windows. +On individual connections, the socket buffer size must be set prior to the +.BR listen (2) +or +.BR connect (2) +calls in order to have it take effect. +See +.BR socket (7) +for more information. +.PP +TCP supports urgent data. +Urgent data is used to signal the +receiver that some important message is part of the data +stream and that it should be processed as soon as possible. +To send urgent data specify the +.B MSG_OOB +option to +.BR send (2). +When urgent data is received, the kernel sends a +.B SIGURG +signal to the process or process group that has been set as the +socket "owner" using the +.B SIOCSPGRP +or +.B FIOSETOWN +ioctls (or the POSIX.1-specified +.BR fcntl (2) +.B F_SETOWN +operation). +When the +.B SO_OOBINLINE +socket option is enabled, urgent data is put into the normal +data stream (a program can test for its location using the +.B SIOCATMARK +ioctl described below), +otherwise it can be received only when the +.B MSG_OOB +flag is set for +.BR recv (2) +or +.BR recvmsg (2). +.PP +When out-of-band data is present, +.BR select (2) +indicates the file descriptor as having an exceptional condition and +.I poll (2) +indicates a +.B POLLPRI +event. +.PP +Linux 2.4 introduced a number of changes for improved +throughput and scaling, as well as enhanced functionality. +Some of these features include support for zero-copy +.BR sendfile (2), +Explicit Congestion Notification, new +management of TIME_WAIT sockets, keep-alive socket options +and support for Duplicate SACK extensions. +.SS Address formats +TCP is built on top of IP (see +.BR ip (7)). +The address formats defined by +.BR ip (7) +apply to TCP. +TCP supports point-to-point communication only; +broadcasting and multicasting are not +supported. +.SS /proc interfaces +System-wide TCP parameter settings can be accessed by files in the directory +.IR /proc/sys/net/ipv4/ . +In addition, most IP +.I /proc +interfaces also apply to TCP; see +.BR ip (7). +Variables described as +.I Boolean +take an integer value, with a nonzero value ("true") meaning that +the corresponding option is enabled, and a zero value ("false") +meaning that the option is disabled. +.TP +.IR tcp_abc " (Integer; default: 0; Linux 2.6.15 to Linux 3.8)" +.\" Since Linux 2.6.15; removed in Linux 3.9 +.\" commit ca2eb5679f8ddffff60156af42595df44a315ef0 +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +Control the Appropriate Byte Count (ABC), defined in RFC 3465. +ABC is a way of increasing the congestion window +.RI ( cwnd ) +more slowly in response to partial acknowledgements. +Possible values are: +.RS +.TP +.B 0 +increase +.I cwnd +once per acknowledgement (no ABC) +.TP +.B 1 +increase +.I cwnd +once per acknowledgement of full sized segment +.TP +.B 2 +allow increase +.I cwnd +by two if acknowledgement is +of two segments to compensate for delayed acknowledgements. +.RE +.TP +.IR tcp_abort_on_overflow " (Boolean; default: disabled; since Linux 2.4)" +.\" Since Linux 2.3.41 +Enable resetting connections if the listening service is too +slow and unable to keep up and accept them. +It means that if overflow occurred due +to a burst, the connection will recover. +Enable this option +.I only +if you are really sure that the listening daemon +cannot be tuned to accept connections faster. +Enabling this option can harm the clients of your server. +.TP +.IR tcp_adv_win_scale " (integer; default: 2; since Linux 2.4)" +.\" Since Linux 2.4.0-test7 +Count buffering overhead as +.IR "bytes/2\[ha]tcp_adv_win_scale" , +if +.I tcp_adv_win_scale +is greater than 0; or +.IR "bytes\-bytes/2\[ha](\-tcp_adv_win_scale)" , +if +.I tcp_adv_win_scale +is less than or equal to zero. +.IP +The socket receive buffer space is shared between the +application and kernel. +TCP maintains part of the buffer as +the TCP window, this is the size of the receive window +advertised to the other end. +The rest of the space is used +as the "application" buffer, used to isolate the network +from scheduling and application latencies. +The +.I tcp_adv_win_scale +default value of 2 implies that the space +used for the application buffer is one fourth that of the total. +.TP +.IR tcp_allowed_congestion_control " (String; default: see text; since Linux 2.4.20)" +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +Show/set the congestion control algorithm choices available to unprivileged +processes (see the description of the +.B TCP_CONGESTION +socket option). +The items in the list are separated by white space and +terminated by a newline character. +The list is a subset of those listed in +.IR tcp_available_congestion_control . +The default value for this list is "reno" plus the default setting of +.IR tcp_congestion_control . +.TP +.IR tcp_autocorking " (Boolean; default: enabled; since Linux 3.14)" +.\" commit f54b311142a92ea2e42598e347b84e1655caf8e3 +.\" Text heavily based on Documentation/networking/ip-sysctl.txt +If this option is enabled, the kernel tries to coalesce small writes +(from consecutive +.BR write (2) +and +.BR sendmsg (2) +calls) as much as possible, +in order to decrease the total number of sent packets. +Coalescing is done if at least one prior packet for the flow +is waiting in Qdisc queues or device transmit queue. +Applications can still use the +.B TCP_CORK +socket option to obtain optimal behavior +when they know how/when to uncork their sockets. +.TP +.IR tcp_available_congestion_control " (String; read-only; since Linux 2.4.20)" +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +Show a list of the congestion-control algorithms +that are registered. +The items in the list are separated by white space and +terminated by a newline character. +This list is a limiting set for the list in +.IR tcp_allowed_congestion_control . +More congestion-control algorithms may be available as modules, +but not loaded. +.TP +.IR tcp_app_win " (integer; default: 31; since Linux 2.4)" +.\" Since Linux 2.4.0-test7 +This variable defines how many +bytes of the TCP window are reserved for buffering overhead. +.IP +A maximum of (\fIwindow/2\[ha]tcp_app_win\fP, mss) bytes in the window +are reserved for the application buffer. +A value of 0 implies that no amount is reserved. +.\" +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_base_mss " (Integer; default: 512; since Linux 2.6.17)" +The initial value of +.I search_low +to be used by the packetization layer Path MTU discovery (MTU probing). +If MTU probing is enabled, +this is the initial MSS used by the connection. +.\" +.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_bic " (Boolean; default: disabled; Linux 2.4.27/2.6.6 to Linux 2.6.13)" +Enable BIC TCP congestion control algorithm. +BIC-TCP is a sender-side-only change that ensures a linear RTT +fairness under large windows while offering both scalability and +bounded TCP-friendliness. +The protocol combines two schemes +called additive increase and binary search increase. +When the congestion window is large, additive increase with a large +increment ensures linear RTT fairness as well as good scalability. +Under small congestion windows, binary search +increase provides TCP friendliness. +.\" +.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_bic_low_window " (integer; default: 14; Linux 2.4.27/2.6.6 to Linux 2.6.13)" +Set the threshold window (in packets) where BIC TCP starts to +adjust the congestion window. +Below this threshold BIC TCP behaves the same as the default TCP Reno. +.\" +.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_bic_fast_convergence " (Boolean; default: enabled; Linux 2.4.27/2.6.6 to Linux 2.6.13)" +Force BIC TCP to more quickly respond to changes in congestion window. +Allows two flows sharing the same connection to converge more rapidly. +.TP +.IR tcp_congestion_control " (String; default: see text; since Linux 2.4.13)" +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +Set the default congestion-control algorithm to be used for new connections. +The algorithm "reno" is always available, +but additional choices may be available depending on kernel configuration. +The default value for this file is set as part of kernel configuration. +.TP +.IR tcp_dma_copybreak " (integer; default: 4096; since Linux 2.6.24)" +Lower limit, in bytes, of the size of socket reads that will be +offloaded to a DMA copy engine, if one is present in the system +and the kernel was configured with the +.B CONFIG_NET_DMA +option. +.TP +.IR tcp_dsack " (Boolean; default: enabled; since Linux 2.4)" +.\" Since Linux 2.4.0-test7 +Enable RFC\ 2883 TCP Duplicate SACK support. +.TP +.IR tcp_fastopen " (Bitmask; default: 0x1; since Linux 3.7)" +Enables RFC\~7413 Fast Open support. +The flag is used as a bitmap with the following values: +.RS +.TP +.B 0x1 +Enables client side Fast Open support +.TP +.B 0x2 +Enables server side Fast Open support +.TP +.B 0x4 +Allows client side to transmit data in SYN without Fast Open option +.TP +.B 0x200 +Allows server side to accept SYN data without Fast Open option +.TP +.B 0x400 +Enables Fast Open on all listeners without +.B TCP_FASTOPEN +socket option +.RE +.TP +.IR tcp_fastopen_key " (since Linux 3.7)" +Set server side RFC\~7413 Fast Open key to generate Fast Open cookie +when server side Fast Open support is enabled. +.TP +.IR tcp_ecn " (Integer; default: see below; since Linux 2.4)" +.\" Since Linux 2.4.0-test7 +Enable RFC\ 3168 Explicit Congestion Notification. +.IP +This file can have one of the following values: +.RS +.TP +.B 0 +Disable ECN. +Neither initiate nor accept ECN. +This was the default up to and including Linux 2.6.30. +.TP +.B 1 +Enable ECN when requested by incoming connections and also +request ECN on outgoing connection attempts. +.TP +.B 2 +.\" commit 255cac91c3c9ce7dca7713b93ab03c75b7902e0e +Enable ECN when requested by incoming connections, +but do not request ECN on outgoing connections. +This value is supported, and is the default, since Linux 2.6.31. +.RE +.IP +When enabled, connectivity to some destinations could be affected +due to older, misbehaving middle boxes along the path, causing +connections to be dropped. +However, to facilitate and encourage deployment with option 1, and +to work around such buggy equipment, the +.B tcp_ecn_fallback +option has been introduced. +.TP +.IR tcp_ecn_fallback " (Boolean; default: enabled; since Linux 4.1)" +.\" commit 492135557dc090a1abb2cfbe1a412757e3ed68ab +Enable RFC\ 3168, Section 6.1.1.1. fallback. +When enabled, outgoing ECN-setup SYNs that time out within the +normal SYN retransmission timeout will be resent with CWR and +ECE cleared. +.TP +.IR tcp_fack " (Boolean; default: enabled; since Linux 2.2)" +.\" Since Linux 2.1.92 +Enable TCP Forward Acknowledgement support. +.TP +.IR tcp_fin_timeout " (integer; default: 60; since Linux 2.2)" +.\" Since Linux 2.1.53 +This specifies how many seconds to wait for a final FIN packet before the +socket is forcibly closed. +This is strictly a violation of the TCP specification, +but required to prevent denial-of-service attacks. +In Linux 2.2, the default value was 180. +.\" +.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_frto " (integer; default: see below; since Linux 2.4.21/2.6)" +.\" Since Linux 2.4.21/2.5.43 +Enable F-RTO, an enhanced recovery algorithm for TCP retransmission +timeouts (RTOs). +It is particularly beneficial in wireless environments +where packet loss is typically due to random radio interference +rather than intermediate router congestion. +See RFC 4138 for more details. +.IP +This file can have one of the following values: +.RS +.TP +.B 0 +Disabled. +This was the default up to and including Linux 2.6.23. +.TP +.B 1 +The basic version F-RTO algorithm is enabled. +.TP +.B 2 +.\" commit c96fd3d461fa495400df24be3b3b66f0e0b152f9 +Enable SACK-enhanced F-RTO if flow uses SACK. +The basic version can be used also when +SACK is in use though in that case scenario(s) exists where F-RTO +interacts badly with the packet counting of the SACK-enabled TCP flow. +This value is the default since Linux 2.6.24. +.RE +.IP +Before Linux 2.6.22, this parameter was a Boolean value, +supporting just values 0 and 1 above. +.TP +.IR tcp_frto_response " (integer; default: 0; since Linux 2.6.22)" +When F-RTO has detected that a TCP retransmission timeout was spurious +(i.e., the timeout would have been avoided had TCP set a +longer retransmission timeout), +TCP has several options concerning what to do next. +Possible values are: +.RS +.TP +.B 0 +Rate halving based; a smooth and conservative response, +results in halved congestion window +.RI ( cwnd ) +and slow-start threshold +.RI ( ssthresh ) +after one RTT. +.TP +.B 1 +Very conservative response; not recommended because even +though being valid, it interacts poorly with the rest of Linux TCP; halves +.I cwnd +and +.I ssthresh +immediately. +.TP +.B 2 +Aggressive response; undoes congestion-control measures +that are now known to be unnecessary +(ignoring the possibility of a lost retransmission that would require +TCP to be more cautious); +.I cwnd +and +.I ssthresh +are restored to the values prior to timeout. +.RE +.TP +.IR tcp_keepalive_intvl " (integer; default: 75; since Linux 2.4)" +.\" Since Linux 2.3.18 +The number of seconds between TCP keep-alive probes. +.TP +.IR tcp_keepalive_probes " (integer; default: 9; since Linux 2.2)" +.\" Since Linux 2.1.43 +The maximum number of TCP keep-alive probes to send +before giving up and killing the connection if +no response is obtained from the other end. +.TP +.IR tcp_keepalive_time " (integer; default: 7200; since Linux 2.2)" +.\" Since Linux 2.1.43 +The number of seconds a connection needs to be idle +before TCP begins sending out keep-alive probes. +Keep-alives are sent only when the +.B SO_KEEPALIVE +socket option is enabled. +The default value is 7200 seconds (2 hours). +An idle connection is terminated after +approximately an additional 11 minutes (9 probes an interval +of 75 seconds apart) when keep-alive is enabled. +.IP +Note that underlying connection tracking mechanisms and +application timeouts may be much shorter. +.\" +.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_low_latency " (Boolean; default: disabled; since Linux 2.4.21/2.6; \ +obsolete since Linux 4.14)" +.\" Since Linux 2.4.21/2.5.60 +If enabled, the TCP stack makes decisions that prefer lower +latency as opposed to higher throughput. +It this option is disabled, then higher throughput is preferred. +An example of an application where this default should be +changed would be a Beowulf compute cluster. +Since Linux 4.14, +.\" commit b6690b14386698ce2c19309abad3f17656bdfaea +this file still exists, but its value is ignored. +.TP +.IR tcp_max_orphans " (integer; default: see below; since Linux 2.4)" +.\" Since Linux 2.3.41 +The maximum number of orphaned (not attached to any user file +handle) TCP sockets allowed in the system. +When this number is exceeded, +the orphaned connection is reset and a warning is printed. +This limit exists only to prevent simple denial-of-service attacks. +Lowering this limit is not recommended. +Network conditions might require you to increase the number of +orphans allowed, but note that each orphan can eat up to \[ti]64\ kB +of unswappable memory. +The default initial value is set equal to the kernel parameter NR_FILE. +This initial default is adjusted depending on the memory in the system. +.TP +.IR tcp_max_syn_backlog " (integer; default: see below; since Linux 2.2)" +.\" Since Linux 2.1.53 +The maximum number of queued connection requests which have +still not received an acknowledgement from the connecting client. +If this number is exceeded, the kernel will begin +dropping requests. +The default value of 256 is increased to +1024 when the memory present in the system is adequate or +greater (>= 128\ MB), and reduced to 128 for those systems with +very low memory (<= 32\ MB). +.IP +Before Linux 2.6.20, +.\" commit 72a3effaf633bcae9034b7e176bdbd78d64a71db +it was recommended that if this needed to be increased above 1024, +the size of the SYNACK hash table +.RB ( TCP_SYNQ_HSIZE ) +in +.I include/net/tcp.h +should be modified to keep +.IP +.in +4n +.EX +TCP_SYNQ_HSIZE * 16 <= tcp_max_syn_backlog +.EE +.in +.IP +and the kernel should be +recompiled. +In Linux 2.6.20, the fixed sized +.B TCP_SYNQ_HSIZE +was removed in favor of dynamic sizing. +.TP +.IR tcp_max_tw_buckets " (integer; default: see below; since Linux 2.4)" +.\" Since Linux 2.3.41 +The maximum number of sockets in TIME_WAIT state allowed in +the system. +This limit exists only to prevent simple denial-of-service attacks. +The default value of NR_FILE*2 is adjusted +depending on the memory in the system. +If this number is +exceeded, the socket is closed and a warning is printed. +.TP +.IR tcp_moderate_rcvbuf " (Boolean; default: enabled; since Linux 2.4.17/2.6.7)" +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +If enabled, TCP performs receive buffer auto-tuning, +attempting to automatically size the buffer (no greater than +.IR tcp_rmem[2] ) +to match the size required by the path for full throughput. +.TP +.IR tcp_mem " (since Linux 2.4)" +.\" Since Linux 2.4.0-test7 +This is a vector of 3 integers: [low, pressure, high]. +These bounds, measured in units of the system page size, +are used by TCP to track its memory usage. +The defaults are calculated at boot time from the amount of +available memory. +(TCP can only use +.I "low memory" +for this, which is limited to around 900 megabytes on 32-bit systems. +64-bit systems do not suffer this limitation.) +.RS +.TP +.I low +TCP doesn't regulate its memory allocation when the number +of pages it has allocated globally is below this number. +.TP +.I pressure +When the amount of memory allocated by TCP +exceeds this number of pages, TCP moderates its memory consumption. +This memory pressure state is exited +once the number of pages allocated falls below +the +.I low +mark. +.TP +.I high +The maximum number of pages, globally, that TCP will allocate. +This value overrides any other limits imposed by the kernel. +.RE +.TP +.IR tcp_mtu_probing " (integer; default: 0; since Linux 2.6.17)" +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +This parameter controls TCP Packetization-Layer Path MTU Discovery. +The following values may be assigned to the file: +.RS +.TP +.B 0 +Disabled +.TP +.B 1 +Disabled by default, enabled when an ICMP black hole detected +.TP +.B 2 +Always enabled, use initial MSS of +.IR tcp_base_mss . +.RE +.TP +.IR tcp_no_metrics_save " (Boolean; default: disabled; since Linux 2.6.6)" +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +By default, TCP saves various connection metrics in the route cache +when the connection closes, so that connections established in the +near future can use these to set initial conditions. +Usually, this increases overall performance, +but it may sometimes cause performance degradation. +If +.I tcp_no_metrics_save +is enabled, TCP will not cache metrics on closing connections. +.TP +.IR tcp_orphan_retries " (integer; default: 8; since Linux 2.4)" +.\" Since Linux 2.3.41 +The maximum number of attempts made to probe the other +end of a connection which has been closed by our end. +.TP +.IR tcp_reordering " (integer; default: 3; since Linux 2.4)" +.\" Since Linux 2.4.0-test7 +The maximum a packet can be reordered in a TCP packet stream +without TCP assuming packet loss and going into slow start. +It is not advisable to change this number. +This is a packet reordering detection metric designed to +minimize unnecessary back off and retransmits provoked by +reordering of packets on a connection. +.TP +.IR tcp_retrans_collapse " (Boolean; default: enabled; since Linux 2.2)" +.\" Since Linux 2.1.96 +Try to send full-sized packets during retransmit. +.TP +.IR tcp_retries1 " (integer; default: 3; since Linux 2.2)" +.\" Since Linux 2.1.43 +The number of times TCP will attempt to retransmit a +packet on an established connection normally, +without the extra effort of getting the network layers involved. +Once we exceed this number of +retransmits, we first have the network layer +update the route if possible before each new retransmit. +The default is the RFC specified minimum of 3. +.TP +.IR tcp_retries2 " (integer; default: 15; since Linux 2.2)" +.\" Since Linux 2.1.43 +The maximum number of times a TCP packet is retransmitted +in established state before giving up. +The default value is 15, which corresponds to a duration of +approximately between 13 to 30 minutes, depending +on the retransmission timeout. +The RFC\ 1122 specified +minimum limit of 100 seconds is typically deemed too short. +.TP +.IR tcp_rfc1337 " (Boolean; default: disabled; since Linux 2.2)" +.\" Since Linux 2.1.90 +Enable TCP behavior conformant with RFC\ 1337. +When disabled, +if a RST is received in TIME_WAIT state, we close +the socket immediately without waiting for the end +of the TIME_WAIT period. +.TP +.IR tcp_rmem " (since Linux 2.4)" +.\" Since Linux 2.4.0-test7 +This is a vector of 3 integers: [min, default, max]. +These parameters are used by TCP to regulate receive buffer sizes. +TCP dynamically adjusts the size of the +receive buffer from the defaults listed below, in the range +of these values, depending on memory available in the system. +.RS +.TP +.I min +minimum size of the receive buffer used by each TCP socket. +The default value is the system page size. +(On Linux 2.4, the default value is 4\ kB, lowered to +.B PAGE_SIZE +bytes in low-memory systems.) +This value +is used to ensure that in memory pressure mode, +allocations below this size will still succeed. +This is not +used to bound the size of the receive buffer declared +using +.B SO_RCVBUF +on a socket. +.TP +.I default +the default size of the receive buffer for a TCP socket. +This value overwrites the initial default buffer size from +the generic global +.I net.core.rmem_default +defined for all protocols. +The default value is 87380 bytes. +(On Linux 2.4, this will be lowered to 43689 in low-memory systems.) +If larger receive buffer sizes are desired, this value should +be increased (to affect all sockets). +To employ large TCP windows, the +.I net.ipv4.tcp_window_scaling +must be enabled (default). +.TP +.I max +the maximum size of the receive buffer used by each TCP socket. +This value does not override the global +.IR net.core.rmem_max . +This is not used to limit the size of the receive buffer declared using +.B SO_RCVBUF +on a socket. +The default value is calculated using the formula +.IP +.in +4n +.EX +max(87380, min(4\ MB, \fItcp_mem\fP[1]*PAGE_SIZE/128)) +.EE +.in +.IP +(On Linux 2.4, the default is 87380*2 bytes, +lowered to 87380 in low-memory systems). +.RE +.TP +.IR tcp_sack " (Boolean; default: enabled; since Linux 2.2)" +.\" Since Linux 2.1.36 +Enable RFC\ 2018 TCP Selective Acknowledgements. +.TP +.IR tcp_slow_start_after_idle " (Boolean; default: enabled; since Linux 2.6.18)" +.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt +If enabled, provide RFC 2861 behavior and time out the congestion +window after an idle period. +An idle period is defined as the current RTO (retransmission timeout). +If disabled, the congestion window will not +be timed out after an idle period. +.TP +.IR tcp_stdurg " (Boolean; default: disabled; since Linux 2.2)" +.\" Since Linux 2.1.44 +If this option is enabled, then use the RFC\ 1122 interpretation +of the TCP urgent-pointer field. +.\" RFC 793 was ambiguous in its specification of the meaning of the +.\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular +.\" resolution of this ambiguity (unfortunately the "wrong" one). +According to this interpretation, the urgent pointer points +to the last byte of urgent data. +If this option is disabled, then use the BSD-compatible interpretation of +the urgent pointer: +the urgent pointer points to the first byte after the urgent data. +Enabling this option may lead to interoperability problems. +.TP +.IR tcp_syn_retries " (integer; default: 6; since Linux 2.2)" +.\" Since Linux 2.1.38 +The maximum number of times initial SYNs for an active TCP +connection attempt will be retransmitted. +This value should not be higher than 255. +The default value is 6, which corresponds to retrying for up to +approximately 127 seconds. +Before Linux 3.7, +.\" commit 6c9ff979d1921e9fd05d89e1383121c2503759b9 +the default value was 5, which +(in conjunction with calculation based on other kernel parameters) +corresponded to approximately 180 seconds. +.TP +.IR tcp_synack_retries " (integer; default: 5; since Linux 2.2)" +.\" Since Linux 2.1.38 +The maximum number of times a SYN/ACK segment +for a passive TCP connection will be retransmitted. +This number should not be higher than 255. +.TP +.IR tcp_syncookies " (integer; default: 1; since Linux 2.2)" +.\" Since Linux 2.1.43 +Enable TCP syncookies. +The kernel must be compiled with +.BR CONFIG_SYN_COOKIES . +The syncookies feature attempts to protect a +socket from a SYN flood attack. +This should be used as a last resort, if at all. +This is a violation of the TCP protocol, +and conflicts with other areas of TCP such as TCP extensions. +It can cause problems for clients and relays. +It is not recommended as a tuning mechanism for heavily +loaded servers to help with overloaded or misconfigured conditions. +For recommended alternatives see +.IR tcp_max_syn_backlog , +.IR tcp_synack_retries , +and +.IR tcp_abort_on_overflow . +Set to one of the following values: +.RS +.TP +.B 0 +Disable TCP syncookies. +.TP +.B 1 +Send out syncookies when the syn backlog queue of a socket overflows. +.TP +.B 2 +(since Linux 3.12) +.\" commit 5ad37d5deee1ff7150a2d0602370101de158ad86 +Send out syncookies unconditionally. +This can be useful for network testing. +.RE +.TP +.IR tcp_timestamps " (integer; default: 1; since Linux 2.2)" +.\" Since Linux 2.1.36 +Set to one of the following values to enable or disable RFC\ 1323 +TCP timestamps: +.RS +.TP +.B 0 +Disable timestamps. +.TP +.B 1 +Enable timestamps as defined in RFC1323 and use random offset for +each connection rather than only using the current time. +.TP +.B 2 +As for the value 1, but without random offsets. +.\" commit 25429d7b7dca01dc4f17205de023a30ca09390d0 +Setting +.I tcp_timestamps +to this value is meaningful since Linux 4.10. +.RE +.TP +.IR tcp_tso_win_divisor " (integer; default: 3; since Linux 2.6.9)" +This parameter controls what percentage of the congestion window +can be consumed by a single TCP Segmentation Offload (TSO) frame. +The setting of this parameter is a tradeoff between burstiness and +building larger TSO frames. +.TP +.IR tcp_tw_recycle " (Boolean; default: disabled; Linux 2.4 to Linux 4.11)" +.\" Since Linux 2.3.15 +.\" removed in Linux 4.12; commit 4396e46187ca5070219b81773c4e65088dac50cc +Enable fast recycling of TIME_WAIT sockets. +Enabling this option is +not recommended as the remote IP may not use monotonically increasing +timestamps (devices behind NAT, devices with per-connection timestamp +offsets). +See RFC 1323 (PAWS) and RFC 6191. +.\" +.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_tw_reuse " (Boolean; default: disabled; since Linux 2.4.19/2.6)" +.\" Since Linux 2.4.19/2.5.43 +Allow to reuse TIME_WAIT sockets for new connections when it is +safe from protocol viewpoint. +It should not be changed without advice/request of technical experts. +.\" +.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_vegas_cong_avoid " (Boolean; default: disabled; Linux 2.2 to Linux 2.6.13)" +.\" Since Linux 2.1.8; removed in Linux 2.6.13 +Enable TCP Vegas congestion avoidance algorithm. +TCP Vegas is a sender-side-only change to TCP that anticipates +the onset of congestion by estimating the bandwidth. +TCP Vegas adjusts the sending rate by modifying the congestion window. +TCP Vegas should provide less packet loss, but it is +not as aggressive as TCP Reno. +.\" +.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.IR tcp_westwood " (Boolean; default: disabled; Linux 2.4.26/2.6.3 to Linux 2.6.13)" +Enable TCP Westwood+ congestion control algorithm. +TCP Westwood+ is a sender-side-only modification of the TCP Reno +protocol stack that optimizes the performance of TCP congestion control. +It is based on end-to-end bandwidth estimation to set +congestion window and slow start threshold after a congestion episode. +Using this estimation, TCP Westwood+ adaptively sets a +slow start threshold and a congestion window which takes into +account the bandwidth used at the time congestion is experienced. +TCP Westwood+ significantly increases fairness with respect to +TCP Reno in wired networks and throughput over wireless links. +.TP +.IR tcp_window_scaling " (Boolean; default: enabled; since Linux 2.2)" +.\" Since Linux 2.1.36 +Enable RFC\ 1323 TCP window scaling. +This feature allows the use of a large window +(> 64\ kB) on a TCP connection, should the other end support it. +Normally, the 16 bit window length field in the TCP header +limits the window size to less than 64\ kB. +If larger windows are desired, applications can increase the size of +their socket buffers and the window scaling option will be employed. +If +.I tcp_window_scaling +is disabled, TCP will not negotiate the use of window +scaling with the other end during connection setup. +.TP +.IR tcp_wmem " (since Linux 2.4)" +.\" Since Linux 2.4.0-test7 +This is a vector of 3 integers: [min, default, max]. +These parameters are used by TCP to regulate send buffer sizes. +TCP dynamically adjusts the size of the send buffer from the +default values listed below, in the range of these values, +depending on memory available. +.RS +.TP +.I min +Minimum size of the send buffer used by each TCP socket. +The default value is the system page size. +(On Linux 2.4, the default value is 4\ kB.) +This value is used to ensure that in memory pressure mode, +allocations below this size will still succeed. +This is not used to bound the size of the send buffer declared using +.B SO_SNDBUF +on a socket. +.TP +.I default +The default size of the send buffer for a TCP socket. +This value overwrites the initial default buffer size from +the generic global +.I /proc/sys/net/core/wmem_default +defined for all protocols. +The default value is 16\ kB. +.\" True in Linux 2.4 and 2.6 +If larger send buffer sizes are desired, this value +should be increased (to affect all sockets). +To employ large TCP windows, the +.I /proc/sys/net/ipv4/tcp_window_scaling +must be set to a nonzero value (default). +.TP +.I max +The maximum size of the send buffer used by each TCP socket. +This value does not override the value in +.IR /proc/sys/net/core/wmem_max . +This is not used to limit the size of the send buffer declared using +.B SO_SNDBUF +on a socket. +The default value is calculated using the formula +.IP +.in +4n +.EX +max(65536, min(4\ MB, \fItcp_mem\fP[1]*PAGE_SIZE/128)) +.EE +.in +.IP +(On Linux 2.4, the default value is 128\ kB, +lowered 64\ kB depending on low-memory systems.) +.RE +.TP +.IR tcp_workaround_signed_windows " (Boolean; default: disabled; since Linux 2.6.26)" +If enabled, assume that no receipt of a window-scaling option means that the +remote TCP is broken and treats the window as a signed quantity. +If disabled, assume that the remote TCP is not broken even if we do +not receive a window scaling option from it. +.SS Socket options +To set or get a TCP socket option, call +.BR getsockopt (2) +to read or +.BR setsockopt (2) +to write the option with the option level argument set to +.BR IPPROTO_TCP . +Unless otherwise noted, +.I optval +is a pointer to an +.IR int . +.\" or SOL_TCP on Linux +In addition, +most +.B IPPROTO_IP +socket options are valid on TCP sockets. +For more information see +.BR ip (7). +.PP +Following is a list of TCP-specific socket options. +For details of some other socket options that are also applicable +for TCP sockets, see +.BR socket (7). +.TP +.BR TCP_CONGESTION " (since Linux 2.6.13)" +.\" commit 5f8ef48d240963093451bcf83df89f1a1364f51d +.\" Author: Stephen Hemminger <shemminger@osdl.org> +The argument for this option is a string. +This option allows the caller to set the TCP congestion control +algorithm to be used, on a per-socket basis. +Unprivileged processes are restricted to choosing one of the algorithms in +.I tcp_allowed_congestion_control +(described above). +Privileged processes +.RB ( CAP_NET_ADMIN ) +can choose from any of the available congestion-control algorithms +(see the description of +.I tcp_available_congestion_control +above). +.TP +.BR TCP_CORK " (since Linux 2.2)" +.\" precisely: since Linux 2.1.127 +If set, don't send out partial frames. +All queued partial frames are sent when the option is cleared again. +This is useful for prepending headers before calling +.BR sendfile (2), +or for throughput optimization. +As currently implemented, there is a 200 millisecond ceiling on the time +for which output is corked by +.BR TCP_CORK . +If this ceiling is reached, then queued data is automatically transmitted. +This option can be combined with +.B TCP_NODELAY +only since Linux 2.5.71. +This option should not be used in code intended to be portable. +.TP +.BR TCP_DEFER_ACCEPT " (since Linux 2.4)" +.\" Precisely: since Linux 2.3.38 +.\" Useful references: +.\" http://www.techrepublic.com/article/take-advantage-of-tcp-ip-options-to-optimize-data-transmission/ +.\" http://unix.stackexchange.com/questions/94104/real-world-use-of-tcp-defer-accept +Allow a listener to be awakened only when data arrives on the socket. +Takes an integer value (seconds), this can +bound the maximum number of attempts TCP will make to +complete the connection. +This option should not be used in code intended to be portable. +.TP +.BR TCP_INFO " (since Linux 2.4)" +Used to collect information about this socket. +The kernel returns a \fIstruct tcp_info\fP as defined in the file +.IR /usr/include/linux/tcp.h . +This option should not be used in code intended to be portable. +.TP +.BR TCP_KEEPCNT " (since Linux 2.4)" +.\" Precisely: since Linux 2.3.18 +The maximum number of keepalive probes TCP should send +before dropping the connection. +This option should not be +used in code intended to be portable. +.TP +.BR TCP_KEEPIDLE " (since Linux 2.4)" +.\" Precisely: since Linux 2.3.18 +The time (in seconds) the connection needs to remain idle +before TCP starts sending keepalive probes, if the socket +option +.B SO_KEEPALIVE +has been set on this socket. +This option should not be used in code intended to be portable. +.TP +.BR TCP_KEEPINTVL " (since Linux 2.4)" +.\" Precisely: since Linux 2.3.18 +The time (in seconds) between individual keepalive probes. +This option should not be used in code intended to be portable. +.TP +.BR TCP_LINGER2 " (since Linux 2.4)" +.\" Precisely: since Linux 2.3.41 +The lifetime of orphaned FIN_WAIT2 state sockets. +This option can be used to override the system-wide setting in the file +.I /proc/sys/net/ipv4/tcp_fin_timeout +for this socket. +This is not to be confused with the +.BR socket (7) +level option +.BR SO_LINGER . +This option should not be used in code intended to be portable. +.TP +.B TCP_MAXSEG +.\" Present in Linux 1.0 +The maximum segment size for outgoing TCP packets. +In Linux 2.2 and earlier, and in Linux 2.6.28 and later, +if this option is set before connection establishment, it also +changes the MSS value announced to the other end in the initial packet. +Values greater than the (eventual) interface MTU have no effect. +TCP will also impose +its minimum and maximum bounds over the value provided. +.TP +.B TCP_NODELAY +.\" Present in Linux 1.0 +If set, disable the Nagle algorithm. +This means that segments +are always sent as soon as possible, even if there is only a +small amount of data. +When not set, data is buffered until there +is a sufficient amount to send out, thereby avoiding the +frequent sending of small packets, which results in poor +utilization of the network. +This option is overridden by +.BR TCP_CORK ; +however, setting this option forces an explicit flush of +pending output, even if +.B TCP_CORK +is currently set. +.TP +.BR TCP_QUICKACK " (since Linux 2.4.4)" +Enable quickack mode if set or disable quickack +mode if cleared. +In quickack mode, acks are sent +immediately, rather than delayed if needed in accordance +to normal TCP operation. +This flag is not permanent, +it only enables a switch to or from quickack mode. +Subsequent operation of the TCP protocol will +once again enter/leave quickack mode depending on +internal protocol processing and factors such as +delayed ack timeouts occurring and data transfer. +This option should not be used in code intended to be +portable. +.TP +.BR TCP_SYNCNT " (since Linux 2.4)" +.\" Precisely: since Linux 2.3.18 +Set the number of SYN retransmits that TCP should send before +aborting the attempt to connect. +It cannot exceed 255. +This option should not be used in code intended to be portable. +.TP +.BR TCP_USER_TIMEOUT " (since Linux 2.6.37)" +.\" commit dca43c75e7e545694a9dd6288553f55c53e2a3a3 +.\" Author: Jerry Chu <hkchu@google.com> +.\" The following text taken nearly verbatim from Jerry Chu's (excellent) +.\" commit message. +.\" +This option takes an +.I unsigned int +as an argument. +When the value is greater than 0, +it specifies the maximum amount of time in milliseconds that transmitted +data may remain unacknowledged, or buffered data may remain untransmitted +(due to zero window size) before TCP will forcibly close the +corresponding connection and return +.B ETIMEDOUT +to the application. +If the option value is specified as 0, +TCP will use the system default. +.IP +Increasing user timeouts allows a TCP connection to survive extended +periods without end-to-end connectivity. +Decreasing user timeouts +allows applications to "fail fast", if so desired. +Otherwise, failure may take up to 20 minutes with +the current system defaults in a normal WAN environment. +.IP +This option can be set during any state of a TCP connection, +but is effective only during the synchronized states of a connection +(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, and LAST-ACK). +Moreover, when used with the TCP keepalive +.RB ( SO_KEEPALIVE ) +option, +.B TCP_USER_TIMEOUT +will override keepalive to determine when to close a +connection due to keepalive failure. +.IP +The option has no effect on when TCP retransmits a packet, +nor when a keepalive probe is sent. +.IP +This option, like many others, will be inherited by the socket returned by +.BR accept (2), +if it was set on the listening socket. +.IP +Further details on the user timeout feature can be found in +RFC\ 793 and RFC\ 5482 ("TCP User Timeout Option"). +.TP +.BR TCP_WINDOW_CLAMP " (since Linux 2.4)" +.\" Precisely: since Linux 2.3.41 +Bound the size of the advertised window to this value. +The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2. +This option should not be used in code intended to be +portable. +.TP +.BR TCP_FASTOPEN " (since Linux 3.6)" +This option enables Fast Open (RFC\~7413) on the listener socket. +The value specifies the maximum length of pending SYNs +(similar to the backlog argument in +.BR listen (2)). +Once enabled, +the listener socket grants the TCP Fast Open cookie +on incoming SYN with TCP Fast Open option. +.IP +More importantly it accepts the data in SYN with a valid Fast Open cookie +and responds SYN-ACK acknowledging both the data and the SYN sequence. +.BR accept (2) +returns a socket that is available for read and write +when the handshake has not completed yet. +Thus the data exchange can commence before the handshake completes. +This option requires enabling the server-side support on sysctl +.I net.ipv4.tcp_fastopen +(see above). +For TCP Fast Open client-side support, +see +.BR send (2) +.B MSG_FASTOPEN +or +.B TCP_FASTOPEN_CONNECT +below. +.TP +.BR TCP_FASTOPEN_CONNECT " (since Linux 4.11)" +This option enables an alternative way to perform Fast Open +on the active side (client). +When this option is enabled, +.BR connect (2) +would behave differently depending on +if a Fast Open cookie is available for the destination. +.IP +If a cookie is not available (i.e. first contact to the destination), +.BR connect (2) +behaves as usual by sending a SYN immediately, +except the SYN would include an empty Fast Open cookie option +to solicit a cookie. +.IP +If a cookie is available, +.BR connect (2) +would return 0 immediately but the SYN transmission is deferred. +A subsequent +.BR write (2) +or +.BR sendmsg (2) +would trigger a SYN with data plus cookie in the Fast Open option. +In other words, +the actual connect operation is deferred until data is supplied. +.IP +.B Note: +While this option is designed for convenience, +enabling it does change the behaviors and certain system calls might set +different +.I errno +values. +With cookie present, +.BR write (2) +or +.BR sendmsg (2) +must be called right after +.BR connect (2) +in order to send out SYN+data to complete 3WHS and establish connection. +Calling +.BR read (2) +right after +.BR connect (2) +without +.BR write (2) +will cause the blocking socket to be blocked forever. +.IP +The application should either set +.B TCP_FASTOPEN_CONNECT +socket option before +.BR write (2) +or +.BR sendmsg (2), +or call +.BR write (2) +or +.BR sendmsg (2) +with +.B MSG_FASTOPEN +flag directly, +instead of both on the same connection. +.IP +Here is the typical call flow with this new option: +.IP +.in +4n +.EX +s = socket(); +setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT, 1, ...); +connect(s); +write(s); /* write() should always follow connect() + * in order to trigger SYN to go out. */ +read(s)/write(s); +/* ... */ +close(s); +.EE +.in +.SS Sockets API +TCP provides limited support for out-of-band data, +in the form of (a single byte of) urgent data. +In Linux this means if the other end sends newer out-of-band +data the older urgent data is inserted as normal data into +the stream (even when +.B SO_OOBINLINE +is not set). +This differs from BSD-based stacks. +.PP +Linux uses the BSD compatible interpretation of the urgent +pointer field by default. +This violates RFC\ 1122, but is +required for interoperability with other stacks. +It can be changed via +.IR /proc/sys/net/ipv4/tcp_stdurg . +.PP +It is possible to peek at out-of-band data using the +.BR recv (2) +.B MSG_PEEK +flag. +.PP +Since Linux 2.4, Linux supports the use of +.B MSG_TRUNC +in the +.I flags +argument of +.BR recv (2) +(and +.BR recvmsg (2)). +This flag causes the received bytes of data to be discarded, +rather than passed back in a caller-supplied buffer. +Since Linux 2.4.4, +.B MSG_TRUNC +also has this effect when used in conjunction with +.B MSG_OOB +to receive out-of-band data. +.SS Ioctls +The following +.BR ioctl (2) +calls return information in +.IR value . +The correct syntax is: +.PP +.RS +.nf +.BI int " value"; +.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");" +.fi +.RE +.PP +.I ioctl_type +is one of the following: +.TP +.B SIOCINQ +Returns the amount of queued unread data in the receive buffer. +The socket must not be in LISTEN state, otherwise an error +.RB ( EINVAL ) +is returned. +.B SIOCINQ +is defined in +.IR <linux/sockios.h> . +.\" FIXME https://www.sourceware.org/bugzilla/show_bug.cgi?id=12002, +.\" filed 2010-09-10, may cause SIOCINQ to be defined in glibc headers +Alternatively, +you can use the synonymous +.BR FIONREAD , +defined in +.IR <sys/ioctl.h> . +.TP +.B SIOCATMARK +Returns true (i.e., +.I value +is nonzero) if the inbound data stream is at the urgent mark. +.IP +If the +.B SO_OOBINLINE +socket option is set, and +.B SIOCATMARK +returns true, then the +next read from the socket will return the urgent data. +If the +.B SO_OOBINLINE +socket option is not set, and +.B SIOCATMARK +returns true, then the +next read from the socket will return the bytes following +the urgent data (to actually read the urgent data requires the +.B recv(MSG_OOB) +flag). +.IP +Note that a read never reads across the urgent mark. +If an application is informed of the presence of urgent data via +.BR select (2) +(using the +.I exceptfds +argument) or through delivery of a +.B SIGURG +signal, +then it can advance up to the mark using a loop which repeatedly tests +.B SIOCATMARK +and performs a read (requesting any number of bytes) as long as +.B SIOCATMARK +returns false. +.TP +.B SIOCOUTQ +Returns the amount of unsent data in the socket send queue. +The socket must not be in LISTEN state, otherwise an error +.RB ( EINVAL ) +is returned. +.B SIOCOUTQ +is defined in +.IR <linux/sockios.h> . +.\" FIXME . https://www.sourceware.org/bugzilla/show_bug.cgi?id=12002, +.\" filed 2010-09-10, may cause SIOCOUTQ to be defined in glibc headers +Alternatively, +you can use the synonymous +.BR TIOCOUTQ , +defined in +.IR <sys/ioctl.h> . +.SS Error handling +When a network error occurs, TCP tries to resend the packet. +If it doesn't succeed after some time, either +.B ETIMEDOUT +or the last received error on this connection is reported. +.PP +Some applications require a quicker error notification. +This can be enabled with the +.B IPPROTO_IP +level +.B IP_RECVERR +socket option. +When this option is enabled, all incoming +errors are immediately passed to the user program. +Use this option with care \[em] it makes TCP less tolerant to routing +changes and other normal network conditions. +.SH ERRORS +.TP +.B EAFNOTSUPPORT +Passed socket address type in +.I sin_family +was not +.BR AF_INET . +.TP +.B EPIPE +The other end closed the socket unexpectedly or a read is +executed on a shut down socket. +.TP +.B ETIMEDOUT +The other end didn't acknowledge retransmitted data after some time. +.PP +Any errors defined for +.BR ip (7) +or the generic socket layer may also be returned for TCP. +.SH VERSIONS +Support for Explicit Congestion Notification, zero-copy +.BR sendfile (2), +reordering support and some SACK extensions +(DSACK) were introduced in Linux 2.4. +Support for forward acknowledgement (FACK), TIME_WAIT recycling, +and per-connection keepalive socket options were introduced in Linux 2.3. +.SH BUGS +Not all errors are documented. +.PP +IPv6 is not described. +.\" Only a single Linux kernel version is described +.\" Info for 2.2 was lost. Should be added again, +.\" or put into a separate page. +.\" .SH AUTHORS +.\" This man page was originally written by Andi Kleen. +.\" It was updated for 2.4 by Nivedita Singhvi with input from +.\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt +.\" document. +.SH SEE ALSO +.BR accept (2), +.BR bind (2), +.BR connect (2), +.BR getsockopt (2), +.BR listen (2), +.BR recvmsg (2), +.BR sendfile (2), +.BR sendmsg (2), +.BR socket (2), +.BR ip (7), +.BR socket (7) +.PP +The kernel source file +.IR Documentation/networking/ip\-sysctl.txt . +.PP +RFC\ 793 for the TCP specification. +.br +RFC\ 1122 for the TCP requirements and a description of the Nagle algorithm. +.br +RFC\ 1323 for TCP timestamp and window scaling options. +.br +RFC\ 1337 for a description of TIME_WAIT assassination hazards. +.br +RFC\ 3168 for a description of Explicit Congestion Notification. +.br +RFC\ 2581 for TCP congestion control algorithms. +.br +RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK. |