author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-24 04:52:22 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-24 04:52:22 +0000 |
commit | 3d08cd331c1adcf0d917392f7e527b3f00511748 (patch) | |
tree | 312f0d1e1632f48862f044b8bb87e602dcffb5f9 /man7/tcp.7 | |
parent | Adding debian version 6.7-2. (diff) | |
Merging upstream version 6.8.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'man7/tcp.7')
-rw-r--r-- | man7/tcp.7 | 1563 |
1 files changed, 0 insertions, 1563 deletions
diff --git a/man7/tcp.7 b/man7/tcp.7 deleted file mode 100644 index 2b1d333..0000000 --- a/man7/tcp.7 +++ /dev/null @@ -1,1563 +0,0 @@ -.\" SPDX-License-Identifier: Linux-man-pages-1-para -.\" -.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>. -.\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com> -.\" Note also that many pieces are drawn from the kernel source file -.\" Documentation/networking/ip-sysctl.txt. -.\" -.\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>. -.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer -.\" Updated details of interaction of TCP_CORK and TCP_NODELAY. -.\" -.\" 2008-11-21, mtk, many, many updates. -.\" The descriptions of /proc files and socket options should now -.\" be more or less up to date and complete as at Linux 2.6.27 -.\" (other than the remaining FIXMEs in the page source below). -.\" -.\" FIXME The following need to be documented -.\" TCP_MD5SIG (2.6.20) -.\" commit cfb6eeb4c860592edd123fdea908d23c6ad1c7dc -.\" Author was yoshfuji@linux-ipv6.org -.\" Needs CONFIG_TCP_MD5SIG -.\" From net/inet/Kconfig: -.\" bool "TCP: MD5 Signature Option support (RFC2385) (EXPERIMENTAL)" -.\" RFC2385 specifies a method of giving MD5 protection to TCP sessions. -.\" Its main (only?) use is to protect BGP sessions between core routers -.\" on the Internet. 
-.\" -.\" There is a TCP_MD5SIG option documented in FreeBSD's tcp(4), -.\" but probably many details are different on Linux -.\" http://thread.gmane.org/gmane.linux.network/47490 -.\" http://www.daemon-systems.org/man/tcp.4.html -.\" http://article.gmane.org/gmane.os.netbsd.devel.network/3767/match=tcp_md5sig+freebsd -.\" -.\" TCP_COOKIE_TRANSACTIONS (2.6.33) -.\" commit 519855c508b9a17878c0977a3cdefc09b59b30df -.\" Author: William Allen Simpson <william.allen.simpson@gmail.com> -.\" commit e56fb50f2b7958b931c8a2fc0966061b3f3c8f3a -.\" Author: William Allen Simpson <william.allen.simpson@gmail.com> -.\" -.\" REMOVED in Linux 3.10 -.\" commit 1a2c6181c4a1922021b4d7df373bba612c3e5f04 -.\" Author: Christoph Paasch <christoph.paasch@uclouvain.be> -.\" -.\" TCP_THIN_LINEAR_TIMEOUTS (2.6.34) -.\" commit 36e31b0af58728071e8023cf8e20c5166b700717 -.\" Author: Andreas Petlund <apetlund@simula.no> -.\" -.\" TCP_THIN_DUPACK (2.6.34) -.\" commit 7e38017557bc0b87434d184f8804cadb102bb903 -.\" Author: Andreas Petlund <apetlund@simula.no> -.\" -.\" TCP_REPAIR (3.5) -.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 -.\" Author: Pavel Emelyanov <xemul@parallels.com> -.\" See also -.\" http://criu.org/TCP_connection -.\" https://lwn.net/Articles/495304/ -.\" -.\" TCP_REPAIR_QUEUE (3.5) -.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 -.\" Author: Pavel Emelyanov <xemul@parallels.com> -.\" -.\" TCP_QUEUE_SEQ (3.5) -.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 -.\" Author: Pavel Emelyanov <xemul@parallels.com> -.\" -.\" TCP_REPAIR_OPTIONS (3.5) -.\" commit b139ba4e90dccbf4cd4efb112af96a5c9e0b098c -.\" Author: Pavel Emelyanov <xemul@parallels.com> -.\" -.\" TCP_FASTOPEN (3.6) -.\" (Fast Open server side implementation completed in Linux 3.7) -.\" http://lwn.net/Articles/508865/ -.\" -.\" TCP_TIMESTAMP (3.9) -.\" commit 93be6ce0e91b6a94783e012b1857a347a5e6e9f2 -.\" Author: Andrey Vagin <avagin@openvz.org> -.\" -.\" TCP_NOTSENT_LOWAT (3.12) -.\" commit 
c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 -.\" Author: Eric Dumazet <edumazet@google.com> -.\" -.\" TCP_CC_INFO (4.1) -.\" commit 6e9250f59ef9efb932c84850cd221f22c2a03c4a -.\" Author: Eric Dumazet <edumazet@google.com> -.\" -.\" TCP_SAVE_SYN, TCP_SAVED_SYN (4.2) -.\" commit cd8ae85299d54155702a56811b2e035e63064d3d -.\" Author: Eric Dumazet <edumazet@google.com> -.\" -.TH tcp 7 2023-10-31 "Linux man-pages 6.7" -.SH NAME -tcp \- TCP protocol -.SH SYNOPSIS -.nf -.B #include <sys/socket.h> -.B #include <netinet/in.h> -.B #include <netinet/tcp.h> -.P -.IB tcp_socket " = socket(AF_INET, SOCK_STREAM, 0);" -.fi -.SH DESCRIPTION -This is an implementation of the TCP protocol defined in -RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK -extensions. -It provides a reliable, stream-oriented, -full-duplex connection between two sockets on top of -.BR ip (7), -for both v4 and v6 versions. -TCP guarantees that the data arrives in order and -retransmits lost packets. -It generates and checks a per-packet checksum to catch -transmission errors. -TCP does not preserve record boundaries. -.P -A newly created TCP socket has no remote or local address and is not -fully specified. -To create an outgoing TCP connection use -.BR connect (2) -to establish a connection to another TCP socket. -To receive new incoming connections, first -.BR bind (2) -the socket to a local address and port and then call -.BR listen (2) -to put the socket into the listening state. -After that a new socket for each incoming connection can be accepted using -.BR accept (2). -A socket which has had -.BR accept (2) -or -.BR connect (2) -successfully called on it is fully specified and may transmit data. -Data cannot be transmitted on listening or not yet connected sockets. -.P -Linux supports RFC\ 1323 TCP high performance -extensions. -These include Protection Against Wrapped -Sequence Numbers (PAWS), Window Scaling and Timestamps. 
-Window scaling allows the use -of large (> 64\ kB) TCP windows in order to support links with high -latency or bandwidth. -To make use of them, the send and receive buffer sizes must be increased. -They can be set globally with the -.I /proc/sys/net/ipv4/tcp_wmem -and -.I /proc/sys/net/ipv4/tcp_rmem -files, or on individual sockets by using the -.B SO_SNDBUF -and -.B SO_RCVBUF -socket options with the -.BR setsockopt (2) -call. -.P -The maximum sizes for socket buffers declared via the -.B SO_SNDBUF -and -.B SO_RCVBUF -mechanisms are limited by the values in the -.I /proc/sys/net/core/rmem_max -and -.I /proc/sys/net/core/wmem_max -files. -Note that TCP actually allocates twice the size of -the buffer requested in the -.BR setsockopt (2) -call, and so a succeeding -.BR getsockopt (2) -call will not return the same size of buffer as requested in the -.BR setsockopt (2) -call. -TCP uses the extra space for administrative purposes and internal -kernel structures, and the -.I /proc -file values reflect the -larger sizes compared to the actual TCP windows. -On individual connections, the socket buffer size must be set prior to the -.BR listen (2) -or -.BR connect (2) -calls in order to have it take effect. -See -.BR socket (7) -for more information. -.P -TCP supports urgent data. -Urgent data is used to signal the -receiver that some important message is part of the data -stream and that it should be processed as soon as possible. -To send urgent data specify the -.B MSG_OOB -option to -.BR send (2). -When urgent data is received, the kernel sends a -.B SIGURG -signal to the process or process group that has been set as the -socket "owner" using the -.B SIOCSPGRP -or -.B FIOSETOWN -ioctls (or the POSIX.1-specified -.BR fcntl (2) -.B F_SETOWN -operation). 
-When the -.B SO_OOBINLINE -socket option is enabled, urgent data is put into the normal -data stream (a program can test for its location using the -.B SIOCATMARK -ioctl described below), -otherwise it can be received only when the -.B MSG_OOB -flag is set for -.BR recv (2) -or -.BR recvmsg (2). -.P -When out-of-band data is present, -.BR select (2) -indicates the file descriptor as having an exceptional condition and -.I poll (2) -indicates a -.B POLLPRI -event. -.P -Linux 2.4 introduced a number of changes for improved -throughput and scaling, as well as enhanced functionality. -Some of these features include support for zero-copy -.BR sendfile (2), -Explicit Congestion Notification, new -management of TIME_WAIT sockets, keep-alive socket options -and support for Duplicate SACK extensions. -.SS Address formats -TCP is built on top of IP (see -.BR ip (7)). -The address formats defined by -.BR ip (7) -apply to TCP. -TCP supports point-to-point communication only; -broadcasting and multicasting are not -supported. -.SS /proc interfaces -System-wide TCP parameter settings can be accessed by files in the directory -.IR /proc/sys/net/ipv4/ . -In addition, most IP -.I /proc -interfaces also apply to TCP; see -.BR ip (7). -Variables described as -.I Boolean -take an integer value, with a nonzero value ("true") meaning that -the corresponding option is enabled, and a zero value ("false") -meaning that the option is disabled. -.TP -.IR tcp_abc " (Integer; default: 0; Linux 2.6.15 to Linux 3.8)" -.\" Since Linux 2.6.15; removed in Linux 3.9 -.\" commit ca2eb5679f8ddffff60156af42595df44a315ef0 -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -Control the Appropriate Byte Count (ABC), defined in RFC 3465. -ABC is a way of increasing the congestion window -.RI ( cwnd ) -more slowly in response to partial acknowledgements. 
-Possible values are: -.RS -.TP -.B 0 -increase -.I cwnd -once per acknowledgement (no ABC) -.TP -.B 1 -increase -.I cwnd -once per acknowledgement of full sized segment -.TP -.B 2 -allow increase -.I cwnd -by two if acknowledgement is -of two segments to compensate for delayed acknowledgements. -.RE -.TP -.IR tcp_abort_on_overflow " (Boolean; default: disabled; since Linux 2.4)" -.\" Since Linux 2.3.41 -Enable resetting connections if the listening service is too -slow and unable to keep up and accept them. -It means that if overflow occurred due -to a burst, the connection will recover. -Enable this option -.I only -if you are really sure that the listening daemon -cannot be tuned to accept connections faster. -Enabling this option can harm the clients of your server. -.TP -.IR tcp_adv_win_scale " (integer; default: 2; since Linux 2.4)" -.\" Since Linux 2.4.0-test7 -Count buffering overhead as -.IR "bytes/2\[ha]tcp_adv_win_scale" , -if -.I tcp_adv_win_scale -is greater than 0; or -.IR "bytes\-bytes/2\[ha](\-tcp_adv_win_scale)" , -if -.I tcp_adv_win_scale -is less than or equal to zero. -.IP -The socket receive buffer space is shared between the -application and kernel. -TCP maintains part of the buffer as -the TCP window, this is the size of the receive window -advertised to the other end. -The rest of the space is used -as the "application" buffer, used to isolate the network -from scheduling and application latencies. -The -.I tcp_adv_win_scale -default value of 2 implies that the space -used for the application buffer is one fourth that of the total. -.TP -.IR tcp_allowed_congestion_control " (String; default: see text; since Linux 2.4.20)" -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -Show/set the congestion control algorithm choices available to unprivileged -processes (see the description of the -.B TCP_CONGESTION -socket option). 
-The items in the list are separated by white space and -terminated by a newline character. -The list is a subset of those listed in -.IR tcp_available_congestion_control . -The default value for this list is "reno" plus the default setting of -.IR tcp_congestion_control . -.TP -.IR tcp_autocorking " (Boolean; default: enabled; since Linux 3.14)" -.\" commit f54b311142a92ea2e42598e347b84e1655caf8e3 -.\" Text heavily based on Documentation/networking/ip-sysctl.txt -If this option is enabled, the kernel tries to coalesce small writes -(from consecutive -.BR write (2) -and -.BR sendmsg (2) -calls) as much as possible, -in order to decrease the total number of sent packets. -Coalescing is done if at least one prior packet for the flow -is waiting in Qdisc queues or device transmit queue. -Applications can still use the -.B TCP_CORK -socket option to obtain optimal behavior -when they know how/when to uncork their sockets. -.TP -.IR tcp_available_congestion_control " (String; read-only; since Linux 2.4.20)" -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -Show a list of the congestion-control algorithms -that are registered. -The items in the list are separated by white space and -terminated by a newline character. -This list is a limiting set for the list in -.IR tcp_allowed_congestion_control . -More congestion-control algorithms may be available as modules, -but not loaded. -.TP -.IR tcp_app_win " (integer; default: 31; since Linux 2.4)" -.\" Since Linux 2.4.0-test7 -This variable defines how many -bytes of the TCP window are reserved for buffering overhead. -.IP -A maximum of (\fIwindow/2\[ha]tcp_app_win\fP, mss) bytes in the window -are reserved for the application buffer. -A value of 0 implies that no amount is reserved. 
-.\" -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_base_mss " (Integer; default: 512; since Linux 2.6.17)" -The initial value of -.I search_low -to be used by the packetization layer Path MTU discovery (MTU probing). -If MTU probing is enabled, -this is the initial MSS used by the connection. -.\" -.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_bic " (Boolean; default: disabled; Linux 2.4.27/2.6.6 to Linux 2.6.13)" -Enable BIC TCP congestion control algorithm. -BIC-TCP is a sender-side-only change that ensures a linear RTT -fairness under large windows while offering both scalability and -bounded TCP-friendliness. -The protocol combines two schemes -called additive increase and binary search increase. -When the congestion window is large, additive increase with a large -increment ensures linear RTT fairness as well as good scalability. -Under small congestion windows, binary search -increase provides TCP friendliness. -.\" -.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_bic_low_window " (integer; default: 14; Linux 2.4.27/2.6.6 to Linux 2.6.13)" -Set the threshold window (in packets) where BIC TCP starts to -adjust the congestion window. -Below this threshold BIC TCP behaves the same as the default TCP Reno. -.\" -.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_bic_fast_convergence " (Boolean; default: enabled; Linux 2.4.27/2.6.6 to Linux 2.6.13)" -Force BIC TCP to more quickly respond to changes in congestion window. -Allows two flows sharing the same connection to converge more rapidly. -.TP -.IR tcp_congestion_control " (String; default: see text; since Linux 2.4.13)" -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -Set the default congestion-control algorithm to be used for new connections. 
-The algorithm "reno" is always available, -but additional choices may be available depending on kernel configuration. -The default value for this file is set as part of kernel configuration. -.TP -.IR tcp_dma_copybreak " (integer; default: 4096; since Linux 2.6.24)" -Lower limit, in bytes, of the size of socket reads that will be -offloaded to a DMA copy engine, if one is present in the system -and the kernel was configured with the -.B CONFIG_NET_DMA -option. -.TP -.IR tcp_dsack " (Boolean; default: enabled; since Linux 2.4)" -.\" Since Linux 2.4.0-test7 -Enable RFC\ 2883 TCP Duplicate SACK support. -.TP -.IR tcp_fastopen " (Bitmask; default: 0x1; since Linux 3.7)" -Enables RFC\~7413 Fast Open support. -The flag is used as a bitmap with the following values: -.RS -.TP -.B 0x1 -Enables client side Fast Open support -.TP -.B 0x2 -Enables server side Fast Open support -.TP -.B 0x4 -Allows client side to transmit data in SYN without Fast Open option -.TP -.B 0x200 -Allows server side to accept SYN data without Fast Open option -.TP -.B 0x400 -Enables Fast Open on all listeners without -.B TCP_FASTOPEN -socket option -.RE -.TP -.IR tcp_fastopen_key " (since Linux 3.7)" -Set server side RFC\~7413 Fast Open key to generate Fast Open cookie -when server side Fast Open support is enabled. -.TP -.IR tcp_ecn " (Integer; default: see below; since Linux 2.4)" -.\" Since Linux 2.4.0-test7 -Enable RFC\ 3168 Explicit Congestion Notification. -.IP -This file can have one of the following values: -.RS -.TP -.B 0 -Disable ECN. -Neither initiate nor accept ECN. -This was the default up to and including Linux 2.6.30. -.TP -.B 1 -Enable ECN when requested by incoming connections and also -request ECN on outgoing connection attempts. -.TP -.B 2 -.\" commit 255cac91c3c9ce7dca7713b93ab03c75b7902e0e -Enable ECN when requested by incoming connections, -but do not request ECN on outgoing connections. -This value is supported, and is the default, since Linux 2.6.31. 
-.RE -.IP -When enabled, connectivity to some destinations could be affected -due to older, misbehaving middle boxes along the path, causing -connections to be dropped. -However, to facilitate and encourage deployment with option 1, and -to work around such buggy equipment, the -.B tcp_ecn_fallback -option has been introduced. -.TP -.IR tcp_ecn_fallback " (Boolean; default: enabled; since Linux 4.1)" -.\" commit 492135557dc090a1abb2cfbe1a412757e3ed68ab -Enable RFC\ 3168, Section 6.1.1.1. fallback. -When enabled, outgoing ECN-setup SYNs that time out within the -normal SYN retransmission timeout will be resent with CWR and -ECE cleared. -.TP -.IR tcp_fack " (Boolean; default: enabled; since Linux 2.2)" -.\" Since Linux 2.1.92 -Enable TCP Forward Acknowledgement support. -.TP -.IR tcp_fin_timeout " (integer; default: 60; since Linux 2.2)" -.\" Since Linux 2.1.53 -This specifies how many seconds to wait for a final FIN packet before the -socket is forcibly closed. -This is strictly a violation of the TCP specification, -but required to prevent denial-of-service attacks. -In Linux 2.2, the default value was 180. -.\" -.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_frto " (integer; default: see below; since Linux 2.4.21/2.6)" -.\" Since Linux 2.4.21/2.5.43 -Enable F-RTO, an enhanced recovery algorithm for TCP retransmission -timeouts (RTOs). -It is particularly beneficial in wireless environments -where packet loss is typically due to random radio interference -rather than intermediate router congestion. -See RFC 4138 for more details. -.IP -This file can have one of the following values: -.RS -.TP -.B 0 -Disabled. -This was the default up to and including Linux 2.6.23. -.TP -.B 1 -The basic version F-RTO algorithm is enabled. -.TP -.B 2 -.\" commit c96fd3d461fa495400df24be3b3b66f0e0b152f9 -Enable SACK-enhanced F-RTO if flow uses SACK. 
-The basic version can be used also when -SACK is in use though in that case scenario(s) exists where F-RTO -interacts badly with the packet counting of the SACK-enabled TCP flow. -This value is the default since Linux 2.6.24. -.RE -.IP -Before Linux 2.6.22, this parameter was a Boolean value, -supporting just values 0 and 1 above. -.TP -.IR tcp_frto_response " (integer; default: 0; since Linux 2.6.22)" -When F-RTO has detected that a TCP retransmission timeout was spurious -(i.e., the timeout would have been avoided had TCP set a -longer retransmission timeout), -TCP has several options concerning what to do next. -Possible values are: -.RS -.TP -.B 0 -Rate halving based; a smooth and conservative response, -results in halved congestion window -.RI ( cwnd ) -and slow-start threshold -.RI ( ssthresh ) -after one RTT. -.TP -.B 1 -Very conservative response; not recommended because even -though being valid, it interacts poorly with the rest of Linux TCP; halves -.I cwnd -and -.I ssthresh -immediately. -.TP -.B 2 -Aggressive response; undoes congestion-control measures -that are now known to be unnecessary -(ignoring the possibility of a lost retransmission that would require -TCP to be more cautious); -.I cwnd -and -.I ssthresh -are restored to the values prior to timeout. -.RE -.TP -.IR tcp_keepalive_intvl " (integer; default: 75; since Linux 2.4)" -.\" Since Linux 2.3.18 -The number of seconds between TCP keep-alive probes. -.TP -.IR tcp_keepalive_probes " (integer; default: 9; since Linux 2.2)" -.\" Since Linux 2.1.43 -The maximum number of TCP keep-alive probes to send -before giving up and killing the connection if -no response is obtained from the other end. -.TP -.IR tcp_keepalive_time " (integer; default: 7200; since Linux 2.2)" -.\" Since Linux 2.1.43 -The number of seconds a connection needs to be idle -before TCP begins sending out keep-alive probes. -Keep-alives are sent only when the -.B SO_KEEPALIVE -socket option is enabled. 
-The default value is 7200 seconds (2 hours). -An idle connection is terminated after -approximately an additional 11 minutes (9 probes an interval -of 75 seconds apart) when keep-alive is enabled. -.IP -Note that underlying connection tracking mechanisms and -application timeouts may be much shorter. -.\" -.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_low_latency " (Boolean; default: disabled; since Linux 2.4.21/2.6; \ -obsolete since Linux 4.14)" -.\" Since Linux 2.4.21/2.5.60 -If enabled, the TCP stack makes decisions that prefer lower -latency as opposed to higher throughput. -It this option is disabled, then higher throughput is preferred. -An example of an application where this default should be -changed would be a Beowulf compute cluster. -Since Linux 4.14, -.\" commit b6690b14386698ce2c19309abad3f17656bdfaea -this file still exists, but its value is ignored. -.TP -.IR tcp_max_orphans " (integer; default: see below; since Linux 2.4)" -.\" Since Linux 2.3.41 -The maximum number of orphaned (not attached to any user file -handle) TCP sockets allowed in the system. -When this number is exceeded, -the orphaned connection is reset and a warning is printed. -This limit exists only to prevent simple denial-of-service attacks. -Lowering this limit is not recommended. -Network conditions might require you to increase the number of -orphans allowed, but note that each orphan can eat up to \[ti]64\ kB -of unswappable memory. -The default initial value is set equal to the kernel parameter NR_FILE. -This initial default is adjusted depending on the memory in the system. -.TP -.IR tcp_max_syn_backlog " (integer; default: see below; since Linux 2.2)" -.\" Since Linux 2.1.53 -The maximum number of queued connection requests which have -still not received an acknowledgement from the connecting client. -If this number is exceeded, the kernel will begin -dropping requests. 
-The default value of 256 is increased to -1024 when the memory present in the system is adequate or -greater (>= 128\ MB), and reduced to 128 for those systems with -very low memory (<= 32\ MB). -.IP -Before Linux 2.6.20, -.\" commit 72a3effaf633bcae9034b7e176bdbd78d64a71db -it was recommended that if this needed to be increased above 1024, -the size of the SYNACK hash table -.RB ( TCP_SYNQ_HSIZE ) -in -.I include/net/tcp.h -should be modified to keep -.IP -.in +4n -.EX -TCP_SYNQ_HSIZE * 16 <= tcp_max_syn_backlog -.EE -.in -.IP -and the kernel should be -recompiled. -In Linux 2.6.20, the fixed sized -.B TCP_SYNQ_HSIZE -was removed in favor of dynamic sizing. -.TP -.IR tcp_max_tw_buckets " (integer; default: see below; since Linux 2.4)" -.\" Since Linux 2.3.41 -The maximum number of sockets in TIME_WAIT state allowed in -the system. -This limit exists only to prevent simple denial-of-service attacks. -The default value of NR_FILE*2 is adjusted -depending on the memory in the system. -If this number is -exceeded, the socket is closed and a warning is printed. -.TP -.IR tcp_moderate_rcvbuf " (Boolean; default: enabled; since Linux 2.4.17/2.6.7)" -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -If enabled, TCP performs receive buffer auto-tuning, -attempting to automatically size the buffer (no greater than -.IR tcp_rmem[2] ) -to match the size required by the path for full throughput. -.TP -.IR tcp_mem " (since Linux 2.4)" -.\" Since Linux 2.4.0-test7 -This is a vector of 3 integers: [low, pressure, high]. -These bounds, measured in units of the system page size, -are used by TCP to track its memory usage. -The defaults are calculated at boot time from the amount of -available memory. -(TCP can only use -.I "low memory" -for this, which is limited to around 900 megabytes on 32-bit systems. -64-bit systems do not suffer this limitation.) 
-.RS -.TP -.I low -TCP doesn't regulate its memory allocation when the number -of pages it has allocated globally is below this number. -.TP -.I pressure -When the amount of memory allocated by TCP -exceeds this number of pages, TCP moderates its memory consumption. -This memory pressure state is exited -once the number of pages allocated falls below -the -.I low -mark. -.TP -.I high -The maximum number of pages, globally, that TCP will allocate. -This value overrides any other limits imposed by the kernel. -.RE -.TP -.IR tcp_mtu_probing " (integer; default: 0; since Linux 2.6.17)" -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -This parameter controls TCP Packetization-Layer Path MTU Discovery. -The following values may be assigned to the file: -.RS -.TP -.B 0 -Disabled -.TP -.B 1 -Disabled by default, enabled when an ICMP black hole detected -.TP -.B 2 -Always enabled, use initial MSS of -.IR tcp_base_mss . -.RE -.TP -.IR tcp_no_metrics_save " (Boolean; default: disabled; since Linux 2.6.6)" -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -By default, TCP saves various connection metrics in the route cache -when the connection closes, so that connections established in the -near future can use these to set initial conditions. -Usually, this increases overall performance, -but it may sometimes cause performance degradation. -If -.I tcp_no_metrics_save -is enabled, TCP will not cache metrics on closing connections. -.TP -.IR tcp_orphan_retries " (integer; default: 8; since Linux 2.4)" -.\" Since Linux 2.3.41 -The maximum number of attempts made to probe the other -end of a connection which has been closed by our end. -.TP -.IR tcp_reordering " (integer; default: 3; since Linux 2.4)" -.\" Since Linux 2.4.0-test7 -The maximum a packet can be reordered in a TCP packet stream -without TCP assuming packet loss and going into slow start. -It is not advisable to change this number. 
-This is a packet reordering detection metric designed to -minimize unnecessary back off and retransmits provoked by -reordering of packets on a connection. -.TP -.IR tcp_retrans_collapse " (Boolean; default: enabled; since Linux 2.2)" -.\" Since Linux 2.1.96 -Try to send full-sized packets during retransmit. -.TP -.IR tcp_retries1 " (integer; default: 3; since Linux 2.2)" -.\" Since Linux 2.1.43 -The number of times TCP will attempt to retransmit a -packet on an established connection normally, -without the extra effort of getting the network layers involved. -Once we exceed this number of -retransmits, we first have the network layer -update the route if possible before each new retransmit. -The default is the RFC specified minimum of 3. -.TP -.IR tcp_retries2 " (integer; default: 15; since Linux 2.2)" -.\" Since Linux 2.1.43 -The maximum number of times a TCP packet is retransmitted -in established state before giving up. -The default value is 15, which corresponds to a duration of -approximately between 13 to 30 minutes, depending -on the retransmission timeout. -The RFC\ 1122 specified -minimum limit of 100 seconds is typically deemed too short. -.TP -.IR tcp_rfc1337 " (Boolean; default: disabled; since Linux 2.2)" -.\" Since Linux 2.1.90 -Enable TCP behavior conformant with RFC\ 1337. -When disabled, -if a RST is received in TIME_WAIT state, we close -the socket immediately without waiting for the end -of the TIME_WAIT period. -.TP -.IR tcp_rmem " (since Linux 2.4)" -.\" Since Linux 2.4.0-test7 -This is a vector of 3 integers: [min, default, max]. -These parameters are used by TCP to regulate receive buffer sizes. -TCP dynamically adjusts the size of the -receive buffer from the defaults listed below, in the range -of these values, depending on memory available in the system. -.RS -.TP -.I min -minimum size of the receive buffer used by each TCP socket. -The default value is the system page size. 
-(On Linux 2.4, the default value is 4\ kB, lowered to -.B PAGE_SIZE -bytes in low-memory systems.) -This value -is used to ensure that in memory pressure mode, -allocations below this size will still succeed. -This is not -used to bound the size of the receive buffer declared -using -.B SO_RCVBUF -on a socket. -.TP -.I default -the default size of the receive buffer for a TCP socket. -This value overwrites the initial default buffer size from -the generic global -.I net.core.rmem_default -defined for all protocols. -The default value is 87380 bytes. -(On Linux 2.4, this will be lowered to 43689 in low-memory systems.) -If larger receive buffer sizes are desired, this value should -be increased (to affect all sockets). -To employ large TCP windows, the -.I net.ipv4.tcp_window_scaling -must be enabled (default). -.TP -.I max -the maximum size of the receive buffer used by each TCP socket. -This value does not override the global -.IR net.core.rmem_max . -This is not used to limit the size of the receive buffer declared using -.B SO_RCVBUF -on a socket. -The default value is calculated using the formula -.IP -.in +4n -.EX -max(87380, min(4\ MB, \fItcp_mem\fP[1]*PAGE_SIZE/128)) -.EE -.in -.IP -(On Linux 2.4, the default is 87380*2 bytes, -lowered to 87380 in low-memory systems). -.RE -.TP -.IR tcp_sack " (Boolean; default: enabled; since Linux 2.2)" -.\" Since Linux 2.1.36 -Enable RFC\ 2018 TCP Selective Acknowledgements. -.TP -.IR tcp_slow_start_after_idle " (Boolean; default: enabled; since Linux 2.6.18)" -.\" The following is from Linux 2.6.28-rc4: Documentation/networking/ip-sysctl.txt -If enabled, provide RFC 2861 behavior and time out the congestion -window after an idle period. -An idle period is defined as the current RTO (retransmission timeout). -If disabled, the congestion window will not -be timed out after an idle period. 
-.TP -.IR tcp_stdurg " (Boolean; default: disabled; since Linux 2.2)" -.\" Since Linux 2.1.44 -If this option is enabled, then use the RFC\ 1122 interpretation -of the TCP urgent-pointer field. -.\" RFC 793 was ambiguous in its specification of the meaning of the -.\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular -.\" resolution of this ambiguity (unfortunately the "wrong" one). -According to this interpretation, the urgent pointer points -to the last byte of urgent data. -If this option is disabled, then use the BSD-compatible interpretation of -the urgent pointer: -the urgent pointer points to the first byte after the urgent data. -Enabling this option may lead to interoperability problems. -.TP -.IR tcp_syn_retries " (integer; default: 6; since Linux 2.2)" -.\" Since Linux 2.1.38 -The maximum number of times initial SYNs for an active TCP -connection attempt will be retransmitted. -This value should not be higher than 255. -The default value is 6, which corresponds to retrying for up to -approximately 127 seconds. -Before Linux 3.7, -.\" commit 6c9ff979d1921e9fd05d89e1383121c2503759b9 -the default value was 5, which -(in conjunction with calculation based on other kernel parameters) -corresponded to approximately 180 seconds. -.TP -.IR tcp_synack_retries " (integer; default: 5; since Linux 2.2)" -.\" Since Linux 2.1.38 -The maximum number of times a SYN/ACK segment -for a passive TCP connection will be retransmitted. -This number should not be higher than 255. -.TP -.IR tcp_syncookies " (integer; default: 1; since Linux 2.2)" -.\" Since Linux 2.1.43 -Enable TCP syncookies. -The kernel must be compiled with -.BR CONFIG_SYN_COOKIES . -The syncookies feature attempts to protect a -socket from a SYN flood attack. -This should be used as a last resort, if at all. -This is a violation of the TCP protocol, -and conflicts with other areas of TCP such as TCP extensions. -It can cause problems for clients and relays. 
-It is not recommended as a tuning mechanism for heavily -loaded servers to help with overloaded or misconfigured conditions. -For recommended alternatives see -.IR tcp_max_syn_backlog , -.IR tcp_synack_retries , -and -.IR tcp_abort_on_overflow . -Set to one of the following values: -.RS -.TP -.B 0 -Disable TCP syncookies. -.TP -.B 1 -Send out syncookies when the syn backlog queue of a socket overflows. -.TP -.B 2 -(since Linux 3.12) -.\" commit 5ad37d5deee1ff7150a2d0602370101de158ad86 -Send out syncookies unconditionally. -This can be useful for network testing. -.RE -.TP -.IR tcp_timestamps " (integer; default: 1; since Linux 2.2)" -.\" Since Linux 2.1.36 -Set to one of the following values to enable or disable RFC\ 1323 -TCP timestamps: -.RS -.TP -.B 0 -Disable timestamps. -.TP -.B 1 -Enable timestamps as defined in RFC1323 and use random offset for -each connection rather than only using the current time. -.TP -.B 2 -As for the value 1, but without random offsets. -.\" commit 25429d7b7dca01dc4f17205de023a30ca09390d0 -Setting -.I tcp_timestamps -to this value is meaningful since Linux 4.10. -.RE -.TP -.IR tcp_tso_win_divisor " (integer; default: 3; since Linux 2.6.9)" -This parameter controls what percentage of the congestion window -can be consumed by a single TCP Segmentation Offload (TSO) frame. -The setting of this parameter is a tradeoff between burstiness and -building larger TSO frames. -.TP -.IR tcp_tw_recycle " (Boolean; default: disabled; Linux 2.4 to Linux 4.11)" -.\" Since Linux 2.3.15 -.\" removed in Linux 4.12; commit 4396e46187ca5070219b81773c4e65088dac50cc -Enable fast recycling of TIME_WAIT sockets. -Enabling this option is -not recommended as the remote IP may not use monotonically increasing -timestamps (devices behind NAT, devices with per-connection timestamp -offsets). -See RFC 1323 (PAWS) and RFC 6191. 
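The figure of approximately 127 seconds quoted for tcp_syn_retries above can be reproduced with a short calculation. The sketch below assumes an initial retransmission timeout of 1 second that doubles after every transmission. The function name syn_giveup_seconds and the rto_cap parameter are hypothetical and only model the back-off; they are not a kernel interface.

```c
#include <assert.h>

/*
 * Hypothetical model of SYN back-off: total time, in seconds, from the
 * first SYN until the connection attempt is abandoned.
 *
 * retries      number of SYN retransmissions (tcp_syn_retries)
 * initial_rto  assumed first timeout, in seconds
 * rto_cap      assumed upper bound on a single timeout, in seconds
 */
static unsigned long
syn_giveup_seconds(unsigned int retries, unsigned long initial_rto,
                   unsigned long rto_cap)
{
    unsigned long total = 0;
    unsigned long rto = initial_rto;
    unsigned int i;

    assert(initial_rto > 0 && rto_cap >= initial_rto);

    /* One wait after the original SYN, plus one after each retransmission. */
    for (i = 0; i <= retries; i++) {
        total += rto;
        /* Double the timeout, but never beyond the cap. */
        rto = (rto > rto_cap / 2) ? rto_cap : rto * 2;
    }
    return total;
}
```

Under these assumptions, six retries give 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds, because the longest single wait (64 seconds) stays under a cap of, say, 120 seconds.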
-.\" -.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_tw_reuse " (Boolean; default: disabled; since Linux 2.4.19/2.6)" -.\" Since Linux 2.4.19/2.5.43 -Allow reuse of TIME_WAIT sockets for new connections when it is -safe from the protocol viewpoint. -It should not be changed without the advice or request of technical experts. -.\" -.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_vegas_cong_avoid " (Boolean; default: disabled; Linux 2.2 to Linux 2.6.13)" -.\" Since Linux 2.1.8; removed in Linux 2.6.13 -Enable the TCP Vegas congestion avoidance algorithm. -TCP Vegas is a sender-side-only change to TCP that anticipates -the onset of congestion by estimating the bandwidth. -TCP Vegas adjusts the sending rate by modifying the congestion window. -TCP Vegas should provide less packet loss, but it is -not as aggressive as TCP Reno. -.\" -.\" The following is from Linux 2.6.12: Documentation/networking/ip-sysctl.txt -.TP -.IR tcp_westwood " (Boolean; default: disabled; Linux 2.4.26/2.6.3 to Linux 2.6.13)" -Enable the TCP Westwood+ congestion control algorithm. -TCP Westwood+ is a sender-side-only modification of the TCP Reno -protocol stack that optimizes the performance of TCP congestion control. -It is based on end-to-end bandwidth estimation to set the -congestion window and slow start threshold after a congestion episode. -Using this estimation, TCP Westwood+ adaptively sets a -slow start threshold and a congestion window which take into -account the bandwidth used at the time congestion is experienced. -TCP Westwood+ significantly increases fairness with respect to -TCP Reno in wired networks and throughput over wireless links. -.TP -.IR tcp_window_scaling " (Boolean; default: enabled; since Linux 2.2)" -.\" Since Linux 2.1.36 -Enable RFC\ 1323 TCP window scaling. -This feature allows the use of a large window -(> 64\ kB) on a TCP connection, should the other end support it. 
-Normally, the 16 bit window length field in the TCP header -limits the window size to less than 64\ kB. -If larger windows are desired, applications can increase the size of -their socket buffers and the window scaling option will be employed. -If -.I tcp_window_scaling -is disabled, TCP will not negotiate the use of window -scaling with the other end during connection setup. -.TP -.IR tcp_wmem " (since Linux 2.4)" -.\" Since Linux 2.4.0-test7 -This is a vector of 3 integers: [min, default, max]. -These parameters are used by TCP to regulate send buffer sizes. -TCP dynamically adjusts the size of the send buffer from the -default values listed below, in the range of these values, -depending on memory available. -.RS -.TP -.I min -Minimum size of the send buffer used by each TCP socket. -The default value is the system page size. -(On Linux 2.4, the default value is 4\ kB.) -This value is used to ensure that in memory pressure mode, -allocations below this size will still succeed. -This is not used to bound the size of the send buffer declared using -.B SO_SNDBUF -on a socket. -.TP -.I default -The default size of the send buffer for a TCP socket. -This value overwrites the initial default buffer size from -the generic global -.I /proc/sys/net/core/wmem_default -defined for all protocols. -The default value is 16\ kB. -.\" True in Linux 2.4 and 2.6 -If larger send buffer sizes are desired, this value -should be increased (to affect all sockets). -To employ large TCP windows, the -.I /proc/sys/net/ipv4/tcp_window_scaling -must be set to a nonzero value (default). -.TP -.I max -The maximum size of the send buffer used by each TCP socket. -This value does not override the value in -.IR /proc/sys/net/core/wmem_max . -This is not used to limit the size of the send buffer declared using -.B SO_SNDBUF -on a socket. 
-The default value is calculated using the formula -.IP -.in +4n -.EX -max(65536, min(4\ MB, \fItcp_mem\fP[1]*PAGE_SIZE/128)) -.EE -.in -.IP -(On Linux 2.4, the default value is 128\ kB, -lowered to 64\ kB on low-memory systems.) -.RE -.TP -.IR tcp_workaround_signed_windows " (Boolean; default: disabled; since Linux 2.6.26)" -If enabled, assume that the absence of a window-scaling option means that the -remote TCP is broken and treats the window as a signed quantity. -If disabled, assume that the remote TCP is not broken even if we do -not receive a window scaling option from it. -.SS Socket options -To set or get a TCP socket option, call -.BR getsockopt (2) -to read or -.BR setsockopt (2) -to write the option with the option level argument set to -.BR IPPROTO_TCP . -Unless otherwise noted, -.I optval -is a pointer to an -.IR int . -.\" or SOL_TCP on Linux -In addition, -most -.B IPPROTO_IP -socket options are valid on TCP sockets. -For more information see -.BR ip (7). -.P -The following is a list of TCP-specific socket options. -For details of some other socket options that are also applicable -to TCP sockets, see -.BR socket (7). -.TP -.BR TCP_CONGESTION " (since Linux 2.6.13)" -.\" commit 5f8ef48d240963093451bcf83df89f1a1364f51d -.\" Author: Stephen Hemminger <shemminger@osdl.org> -The argument for this option is a string. -This option allows the caller to set the TCP congestion control -algorithm to be used, on a per-socket basis. -Unprivileged processes are restricted to choosing one of the algorithms in -.I tcp_allowed_congestion_control -(described above). -Privileged processes -.RB ( CAP_NET_ADMIN ) -can choose from any of the available congestion-control algorithms -(see the description of -.I tcp_available_congestion_control -above). -.TP -.BR TCP_CORK " (since Linux 2.2)" -.\" precisely: since Linux 2.1.127 -If set, don't send out partial frames. -All queued partial frames are sent when the option is cleared again. 
-This is useful for prepending headers before calling -.BR sendfile (2), -or for throughput optimization. -As currently implemented, there is a 200 millisecond ceiling on the time -for which output is corked by -.BR TCP_CORK . -If this ceiling is reached, then queued data is automatically transmitted. -This option can be combined with -.B TCP_NODELAY -only since Linux 2.5.71. -This option should not be used in code intended to be portable. -.TP -.BR TCP_DEFER_ACCEPT " (since Linux 2.4)" -.\" Precisely: since Linux 2.3.38 -.\" Useful references: -.\" http://www.techrepublic.com/article/take-advantage-of-tcp-ip-options-to-optimize-data-transmission/ -.\" http://unix.stackexchange.com/questions/94104/real-world-use-of-tcp-defer-accept -Allow a listener to be awakened only when data arrives on the socket. -The option takes an integer value (seconds), which can -bound the maximum number of attempts TCP will make to -complete the connection. -This option should not be used in code intended to be portable. -.TP -.BR TCP_INFO " (since Linux 2.4)" -Used to collect information about this socket. -The kernel returns a \fIstruct tcp_info\fP as defined in the file -.IR /usr/include/linux/tcp.h . -This option should not be used in code intended to be portable. -.TP -.BR TCP_KEEPCNT " (since Linux 2.4)" -.\" Precisely: since Linux 2.3.18 -The maximum number of keepalive probes TCP should send -before dropping the connection. -This option should not be -used in code intended to be portable. -.TP -.BR TCP_KEEPIDLE " (since Linux 2.4)" -.\" Precisely: since Linux 2.3.18 -The time (in seconds) the connection needs to remain idle -before TCP starts sending keepalive probes, if the socket -option -.B SO_KEEPALIVE -has been set on this socket. -This option should not be used in code intended to be portable. -.TP -.BR TCP_KEEPINTVL " (since Linux 2.4)" -.\" Precisely: since Linux 2.3.18 -The time (in seconds) between individual keepalive probes. 
-This option should not be used in code intended to be portable. -.TP -.BR TCP_LINGER2 " (since Linux 2.4)" -.\" Precisely: since Linux 2.3.41 -The lifetime of orphaned FIN_WAIT2 state sockets. -This option can be used to override the system-wide setting in the file -.I /proc/sys/net/ipv4/tcp_fin_timeout -for this socket. -This is not to be confused with the -.BR socket (7) -level option -.BR SO_LINGER . -This option should not be used in code intended to be portable. -.TP -.B TCP_MAXSEG -.\" Present in Linux 1.0 -The maximum segment size for outgoing TCP packets. -In Linux 2.2 and earlier, and in Linux 2.6.28 and later, -if this option is set before connection establishment, it also -changes the MSS value announced to the other end in the initial packet. -Values greater than the (eventual) interface MTU have no effect. -TCP will also impose -its minimum and maximum bounds over the value provided. -.TP -.B TCP_NODELAY -.\" Present in Linux 1.0 -If set, disable the Nagle algorithm. -This means that segments -are always sent as soon as possible, even if there is only a -small amount of data. -When not set, data is buffered until there -is a sufficient amount to send out, thereby avoiding the -frequent sending of small packets, which results in poor -utilization of the network. -This option is overridden by -.BR TCP_CORK ; -however, setting this option forces an explicit flush of -pending output, even if -.B TCP_CORK -is currently set. -.TP -.BR TCP_QUICKACK " (since Linux 2.4.4)" -Enable quickack mode if set or disable quickack -mode if cleared. -In quickack mode, acks are sent -immediately, rather than delayed if needed in accordance -to normal TCP operation. -This flag is not permanent, -it only enables a switch to or from quickack mode. -Subsequent operation of the TCP protocol will -once again enter/leave quickack mode depending on -internal protocol processing and factors such as -delayed ack timeouts occurring and data transfer. 
-This option should not be used in code intended to be -portable. -.TP -.BR TCP_SYNCNT " (since Linux 2.4)" -.\" Precisely: since Linux 2.3.18 -Set the number of SYN retransmits that TCP should send before -aborting the attempt to connect. -It cannot exceed 255. -This option should not be used in code intended to be portable. -.TP -.BR TCP_USER_TIMEOUT " (since Linux 2.6.37)" -.\" commit dca43c75e7e545694a9dd6288553f55c53e2a3a3 -.\" Author: Jerry Chu <hkchu@google.com> -.\" The following text taken nearly verbatim from Jerry Chu's (excellent) -.\" commit message. -.\" -This option takes an -.I unsigned int -as an argument. -When the value is greater than 0, -it specifies the maximum amount of time in milliseconds that transmitted -data may remain unacknowledged, or buffered data may remain untransmitted -(due to zero window size) before TCP will forcibly close the -corresponding connection and return -.B ETIMEDOUT -to the application. -If the option value is specified as 0, -TCP will use the system default. -.IP -Increasing user timeouts allows a TCP connection to survive extended -periods without end-to-end connectivity. -Decreasing user timeouts -allows applications to "fail fast", if so desired. -Otherwise, failure may take up to 20 minutes with -the current system defaults in a normal WAN environment. -.IP -This option can be set during any state of a TCP connection, -but is effective only during the synchronized states of a connection -(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, and LAST-ACK). -Moreover, when used with the TCP keepalive -.RB ( SO_KEEPALIVE ) -option, -.B TCP_USER_TIMEOUT -will override keepalive to determine when to close a -connection due to keepalive failure. -.IP -The option has no effect on when TCP retransmits a packet, -nor when a keepalive probe is sent. -.IP -This option, like many others, will be inherited by the socket returned by -.BR accept (2), -if it was set on the listening socket. 
-.IP -Further details on the user timeout feature can be found in -RFC\ 793 and RFC\ 5482 ("TCP User Timeout Option"). -.TP -.BR TCP_WINDOW_CLAMP " (since Linux 2.4)" -.\" Precisely: since Linux 2.3.41 -Bound the size of the advertised window to this value. -The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2. -This option should not be used in code intended to be -portable. -.TP -.BR TCP_FASTOPEN " (since Linux 3.6)" -This option enables Fast Open (RFC\~7413) on the listener socket. -The value specifies the maximum length of pending SYNs -(similar to the backlog argument in -.BR listen (2)). -Once enabled, -the listener socket grants the TCP Fast Open cookie -on an incoming SYN that carries the TCP Fast Open option. -.IP -More importantly, it accepts the data in a SYN with a valid Fast Open cookie -and responds with a SYN-ACK acknowledging both the data and the SYN sequence. -.BR accept (2) -returns a socket that is available for read and write -even when the handshake has not yet completed. -Thus the data exchange can commence before the handshake completes. -This option requires enabling the server-side support on sysctl -.I net.ipv4.tcp_fastopen -(see above). -For TCP Fast Open client-side support, -see -.BR send (2) -.B MSG_FASTOPEN -or -.B TCP_FASTOPEN_CONNECT -below. -.TP -.BR TCP_FASTOPEN_CONNECT " (since Linux 4.11)" -This option enables an alternative way to perform Fast Open -on the active side (client). -When this option is enabled, -.BR connect (2) -behaves differently depending on -whether a Fast Open cookie is available for the destination. -.IP -If a cookie is not available (i.e., on first contact with the destination), -.BR connect (2) -behaves as usual by sending a SYN immediately, -except that the SYN includes an empty Fast Open cookie option -to solicit a cookie. -.IP -If a cookie is available, -.BR connect (2) -returns 0 immediately, but the SYN transmission is deferred. 
-A subsequent -.BR write (2) -or -.BR sendmsg (2) -would trigger a SYN with data plus cookie in the Fast Open option. -In other words, -the actual connect operation is deferred until data is supplied. -.IP -.B Note: -While this option is designed for convenience, -enabling it does change the behaviors and certain system calls might set -different -.I errno -values. -With cookie present, -.BR write (2) -or -.BR sendmsg (2) -must be called right after -.BR connect (2) -in order to send out SYN+data to complete 3WHS and establish connection. -Calling -.BR read (2) -right after -.BR connect (2) -without -.BR write (2) -will cause the blocking socket to be blocked forever. -.IP -The application should either set -.B TCP_FASTOPEN_CONNECT -socket option before -.BR write (2) -or -.BR sendmsg (2), -or call -.BR write (2) -or -.BR sendmsg (2) -with -.B MSG_FASTOPEN -flag directly, -instead of both on the same connection. -.IP -Here is the typical call flow with this new option: -.IP -.in +4n -.EX -s = socket(); -setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT, 1, ...); -connect(s); -write(s); /* write() should always follow connect() - * in order to trigger SYN to go out. */ -read(s)/write(s); -/* ... */ -close(s); -.EE -.in -.SS Sockets API -TCP provides limited support for out-of-band data, -in the form of (a single byte of) urgent data. -In Linux this means if the other end sends newer out-of-band -data the older urgent data is inserted as normal data into -the stream (even when -.B SO_OOBINLINE -is not set). -This differs from BSD-based stacks. -.P -Linux uses the BSD compatible interpretation of the urgent -pointer field by default. -This violates RFC\ 1122, but is -required for interoperability with other stacks. -It can be changed via -.IR /proc/sys/net/ipv4/tcp_stdurg . -.P -It is possible to peek at out-of-band data using the -.BR recv (2) -.B MSG_PEEK -flag. 
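The integer-valued options in the Socket options section above share one calling pattern: pass IPPROTO_TCP as the level and a pointer to an int as optval. The following is a minimal sketch of that pattern. The helper names tcp_set_int, tcp_get_int and demo_nodelay are hypothetical, and TCP_NODELAY is used only because it needs no connected peer.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wrapper: set an int-valued option at level IPPROTO_TCP. */
static int
tcp_set_int(int fd, int optname, int value)
{
    return setsockopt(fd, IPPROTO_TCP, optname, &value, sizeof(value));
}

/* Hypothetical wrapper: read an int-valued option at level IPPROTO_TCP. */
static int
tcp_get_int(int fd, int optname, int *value)
{
    socklen_t len = sizeof(*value);

    assert(value != NULL);
    return getsockopt(fd, IPPROTO_TCP, optname, value, &len);
}

/*
 * Enable TCP_NODELAY on a fresh socket and read it back.
 * Returns 1 if the option reads back as set, 0 if not, -1 on error.
 */
static int
demo_nodelay(void)
{
    int fd, value = 0, result = -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (tcp_set_int(fd, TCP_NODELAY, 1) == 0 &&
        tcp_get_int(fd, TCP_NODELAY, &value) == 0)
        result = (value != 0);

    close(fd);
    return result;
}
```

Options whose argument is not an int, such as TCP_CONGESTION (a string) or TCP_INFO (a struct tcp_info), need a different optval type and length, so these wrappers would not apply to them.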
-.P -Since Linux 2.4, Linux supports the use of -.B MSG_TRUNC -in the -.I flags -argument of -.BR recv (2) -(and -.BR recvmsg (2)). -This flag causes the received bytes of data to be discarded, -rather than passed back in a caller-supplied buffer. -Since Linux 2.4.4, -.B MSG_TRUNC -also has this effect when used in conjunction with -.B MSG_OOB -to receive out-of-band data. -.SS Ioctls -The following -.BR ioctl (2) -calls return information in -.IR value . -The correct syntax is: -.P -.RS -.nf -.BI int " value"; -.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");" -.fi -.RE -.P -.I ioctl_type -is one of the following: -.TP -.B SIOCINQ -Returns the amount of queued unread data in the receive buffer. -The socket must not be in LISTEN state, otherwise an error -.RB ( EINVAL ) -is returned. -.B SIOCINQ -is defined in -.IR <linux/sockios.h> . -.\" FIXME https://www.sourceware.org/bugzilla/show_bug.cgi?id=12002, -.\" filed 2010-09-10, may cause SIOCINQ to be defined in glibc headers -Alternatively, -you can use the synonymous -.BR FIONREAD , -defined in -.IR <sys/ioctl.h> . -.TP -.B SIOCATMARK -Returns true (i.e., -.I value -is nonzero) if the inbound data stream is at the urgent mark. -.IP -If the -.B SO_OOBINLINE -socket option is set, and -.B SIOCATMARK -returns true, then the -next read from the socket will return the urgent data. -If the -.B SO_OOBINLINE -socket option is not set, and -.B SIOCATMARK -returns true, then the -next read from the socket will return the bytes following -the urgent data (to actually read the urgent data requires the -.B recv(MSG_OOB) -flag). -.IP -Note that a read never reads across the urgent mark. 
-If an application is informed of the presence of urgent data via -.BR select (2) -(using the -.I exceptfds -argument) or through delivery of a -.B SIGURG -signal, -then it can advance up to the mark using a loop which repeatedly tests -.B SIOCATMARK -and performs a read (requesting any number of bytes) as long as -.B SIOCATMARK -returns false. -.TP -.B SIOCOUTQ -Returns the amount of unsent data in the socket send queue. -The socket must not be in LISTEN state, otherwise an error -.RB ( EINVAL ) -is returned. -.B SIOCOUTQ -is defined in -.IR <linux/sockios.h> . -.\" FIXME . https://www.sourceware.org/bugzilla/show_bug.cgi?id=12002, -.\" filed 2010-09-10, may cause SIOCOUTQ to be defined in glibc headers -Alternatively, -you can use the synonymous -.BR TIOCOUTQ , -defined in -.IR <sys/ioctl.h> . -.SS Error handling -When a network error occurs, TCP tries to resend the packet. -If it doesn't succeed after some time, either -.B ETIMEDOUT -or the last received error on this connection is reported. -.P -Some applications require a quicker error notification. -This can be enabled with the -.B IPPROTO_IP -level -.B IP_RECVERR -socket option. -When this option is enabled, all incoming -errors are immediately passed to the user program. -Use this option with care \[em] it makes TCP less tolerant to routing -changes and other normal network conditions. -.SH ERRORS -.TP -.B EAFNOTSUPPORT -Passed socket address type in -.I sin_family -was not -.BR AF_INET . -.TP -.B EPIPE -The other end closed the socket unexpectedly or a read is -executed on a shut down socket. -.TP -.B ETIMEDOUT -The other end didn't acknowledge retransmitted data after some time. -.P -Any errors defined for -.BR ip (7) -or the generic socket layer may also be returned for TCP. -.SH VERSIONS -Support for Explicit Congestion Notification, zero-copy -.BR sendfile (2), -reordering support and some SACK extensions -(DSACK) were introduced in Linux 2.4. 
-Support for forward acknowledgement (FACK), TIME_WAIT recycling, -and per-connection keepalive socket options were introduced in Linux 2.3. -.SH BUGS -Not all errors are documented. -.P -IPv6 is not described. -.\" Only a single Linux kernel version is described -.\" Info for 2.2 was lost. Should be added again, -.\" or put into a separate page. -.\" .SH AUTHORS -.\" This man page was originally written by Andi Kleen. -.\" It was updated for 2.4 by Nivedita Singhvi with input from -.\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt -.\" document. -.SH SEE ALSO -.BR accept (2), -.BR bind (2), -.BR connect (2), -.BR getsockopt (2), -.BR listen (2), -.BR recvmsg (2), -.BR sendfile (2), -.BR sendmsg (2), -.BR socket (2), -.BR ip (7), -.BR socket (7) -.P -The kernel source file -.IR Documentation/networking/ip\-sysctl.txt . -.P -RFC\ 793 for the TCP specification. -.br -RFC\ 1122 for the TCP requirements and a description of the Nagle algorithm. -.br -RFC\ 1323 for TCP timestamp and window scaling options. -.br -RFC\ 1337 for a description of TIME_WAIT assassination hazards. -.br -RFC\ 3168 for a description of Explicit Congestion Notification. -.br -RFC\ 2581 for TCP congestion control algorithms. -.br -RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK. |