author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-07-24 09:54:23 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-07-24 09:54:44 +0000 |
commit | 836b47cb7e99a977c5a23b059ca1d0b5065d310e (patch) | |
tree | 1604da8f482d02effa033c94a84be42bc0c848c3 /src/health/guides/tcp | |
parent | Releasing debian version 1.44.3-2. (diff) | |
Merging upstream version 1.46.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/health/guides/tcp')
-rw-r--r-- | src/health/guides/tcp/10s_ipv4_tcp_resets_received.md | 67 |
-rw-r--r-- | src/health/guides/tcp/10s_ipv4_tcp_resets_sent.md | 43 |
-rw-r--r-- | src/health/guides/tcp/1m_ipv4_tcp_resets_received.md | 41 |
-rw-r--r-- | src/health/guides/tcp/1m_ipv4_tcp_resets_sent.md | 37 |
-rw-r--r-- | src/health/guides/tcp/1m_tcp_accept_queue_drops.md | 30 |
-rw-r--r-- | src/health/guides/tcp/1m_tcp_accept_queue_overflows.md | 35 |
-rw-r--r-- | src/health/guides/tcp/1m_tcp_syn_queue_cookies.md | 39 |
-rw-r--r-- | src/health/guides/tcp/1m_tcp_syn_queue_drops.md | 22 |
-rw-r--r-- | src/health/guides/tcp/tcp_connections.md | 51 |
-rw-r--r-- | src/health/guides/tcp/tcp_memory.md | 50 |
-rw-r--r-- | src/health/guides/tcp/tcp_orphans.md | 48 |
11 files changed, 463 insertions, 0 deletions
diff --git a/src/health/guides/tcp/10s_ipv4_tcp_resets_received.md b/src/health/guides/tcp/10s_ipv4_tcp_resets_received.md new file mode 100644 index 000000000..c17954f2d --- /dev/null +++ b/src/health/guides/tcp/10s_ipv4_tcp_resets_received.md @@ -0,0 +1,67 @@ +### Understand the alert + +A TCP reset is an abrupt closure of the session. It causes the resources allocated to the connection to be released immediately and all other information about the connection to be erased. + +The Netdata Agent monitors the average number of received TCP RESETS over the last 10 seconds. A high number can indicate a port scan or that a service running on the system has crashed. It can also be a result of a high number of sent TCP RESETS, or indicate a SYN reset attack. + +### More about TCP Resets + +TCP uses a three-way handshake to establish a reliable connection. The connection is full duplex, and both sides synchronize (SYN) and acknowledge (ACK) each other. The exchange of these four flags +is performed in three steps: SYN, SYN-ACK, and ACK. + +When an unexpected TCP packet arrives at a host, that host usually responds by sending a reset packet back on the same connection. A reset packet is one with no payload and with the RST bit set in the TCP header flags. There are a few circumstances in which a TCP packet might not be expected. The most common cases are: + +1. A TCP packet received on a port that is not open. +2. An aborting connection. +3. Half-open connections. +4. TIME-WAIT assassination. +5. The listening endpoint's queue is full. +6. A TCP buffer overflow. + +In short, a TCP reset usually occurs when a system receives data that doesn't agree with its view of the connection. + +### Troubleshoot the alert + +- Use tcpdump to capture the traffic and use Wireshark to inspect the network packets. Stop the capture after an observation period of 60 seconds to 5 minutes. 
The following command creates a dump file, readable by Wireshark, that contains all the TCP packets with the RST flag set. + ``` + tcpdump -i any 'tcp[tcpflags] & (tcp-rst) == (tcp-rst)' -s 65535 -w output.pcap + ``` + +- Countermeasure against malicious TCP resets + +SYN cookies are a technique used to resist IP address spoofing attacks. In particular, the use of SYN cookies allows a server to avoid dropping connections when the SYN queue fills up. + +Enable SYN cookies in Linux: + + 1. Check whether SYN cookies are enabled on your system. If the value is 1, they are enabled; if not, proceed to step 2. + ``` + cat /proc/sys/net/ipv4/tcp_syncookies + ``` + + 2. Set `net.ipv4.tcp_syncookies=1` in `/etc/sysctl.conf`. + + 3. Apply the configuration. + ``` + sysctl -p + ``` + +Enable SYN cookies in FreeBSD: + + 1. Check whether SYN cookies are enabled on your system. If the value is 1, they are enabled; if not, proceed to step 2. + ``` + sysctl net.inet.tcp.syncookies_only + ``` + + 2. Set `net.inet.tcp.syncookies_only=1` in `/etc/sysctl.conf`. + + 3. Apply the configuration. + ``` + /etc/rc.d/sysctl reload + ``` + +The use of SYN cookies does not break any protocol specifications, and therefore should be compatible with all TCP implementations. There are, however, a few caveats that take effect when SYN cookies are in use. + +### Useful resources + +1. [TCP reset explanation](https://www.pico.net/kb/what-is-a-tcp-reset-rst/) +2. [TCP 3-way handshake on Wikipedia](https://en.wikipedia.org/wiki/Handshaking) diff --git a/src/health/guides/tcp/10s_ipv4_tcp_resets_sent.md b/src/health/guides/tcp/10s_ipv4_tcp_resets_sent.md new file mode 100644 index 000000000..9a941694e --- /dev/null +++ b/src/health/guides/tcp/10s_ipv4_tcp_resets_sent.md @@ -0,0 +1,43 @@ +### Understand the alert + +A TCP reset is an abrupt closure of the session. 
It causes the resources allocated to the connection to be released immediately and all other information about the connection to be erased. + +The Netdata Agent monitors the average number of sent TCP RESETS over the last 10 seconds. A high number can indicate a port scan or that a service running on the system has crashed. It can also indicate a SYN reset attack. + +### More about TCP Resets + +TCP uses a three-way handshake to establish a reliable connection. The connection is full duplex, and both sides synchronize (SYN) and acknowledge (ACK) each other. The exchange of these four flags +is performed in three steps: SYN, SYN-ACK, and ACK. + +When an unexpected TCP packet arrives at a host, that host usually responds by sending a reset packet back on the same connection. A reset packet is one with no payload and with the RST bit set in the TCP header flags. There are a few circumstances in which a TCP packet might not be expected. The most common cases are: + +1. A TCP packet received on a port that is not open. +2. An aborting connection. +3. Half-open connections. +4. TIME-WAIT assassination. +5. The listening endpoint's queue is full. +6. A TCP buffer overflow. + +In short, a TCP reset usually occurs when a system receives data that doesn't agree with its view of the connection. + +When your system cannot establish a connection, it retries the number of times set in `net.ipv4.tcp_syn_retries`. + +### Troubleshoot the alert + +- Use tcpdump to capture the traffic and use Wireshark to inspect the network packets. Stop the capture after an observation period of 60 seconds to 5 minutes. The following command creates a dump file, readable by Wireshark, that contains all the TCP packets with the RST flag set. + ``` + tcpdump -i any 'tcp[tcpflags] & (tcp-rst) == (tcp-rst)' -s 65535 -w output.pcap + ``` + +- Identify which application sends TCP resets + +1. 
Check the instances of `RST` events of the TCP protocol. Wireshark also displays the ports on which the two systems tried to establish the TCP connection (XXXXXX -> XXXXXX). +2. To check which application is using this port, run the following command: + ``` + lsof -i:XXXXXX -P -n + ``` +### Useful resources + +1. [TCP reset explanation](https://www.pico.net/kb/what-is-a-tcp-reset-rst/) +2. [TCP 3-way handshake on Wikipedia](https://en.wikipedia.org/wiki/Handshaking) +3. [Read more about Wireshark here](https://www.wireshark.org/)
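
As a companion to the tcpdump and `lsof` steps in the guide above, here is a minimal sketch (not part of the commit) that counts which source `address.port` pairs appear most often on RST packets in tcpdump's text output. The function name `count_rst_sources` is hypothetical, and the sketch assumes the usual `SRC > DST: Flags [R...]` line layout that tcpdump prints for TCP packets.

```shell
# Hypothetical helper: reads tcpdump text output on stdin and prints
# "<count> <source address.port>" for packets whose flags start with R,
# most frequent source first.
count_rst_sources() {
    awk '
        /Flags \[R/ {
            # The source endpoint is the field just before the ">" separator.
            for (i = 2; i <= NF; i++) {
                if ($i == ">") { count[$(i - 1)]++; break }
            }
        }
        END { for (src in count) print count[src], src }
    ' | sort -rn
}
```

One possible use is `tcpdump -nn -r output.pcap | count_rst_sources`, after which the port of the busiest local source can be passed to `lsof -i:PORT -P -n`.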
\ No newline at end of file diff --git a/src/health/guides/tcp/1m_ipv4_tcp_resets_received.md b/src/health/guides/tcp/1m_ipv4_tcp_resets_received.md new file mode 100644 index 000000000..89f01f3cb --- /dev/null +++ b/src/health/guides/tcp/1m_ipv4_tcp_resets_received.md @@ -0,0 +1,41 @@ +### Understand the alert + +This alert, `1m_ipv4_tcp_resets_received`, calculates the average number of TCP RESETS received (`AttemptFails`) over the last minute on your system. If you receive this alert, it means that the number of received TCP RESETS has increased, which might indicate a problem with your networked applications or servers. + +### What does TCP RESET mean? + +`TCP RESET` is a signal that is sent from one connection end to the other when an ongoing connection is immediately terminated without an orderly close. This usually happens when a networked application encounters an issue, such as an incorrect connection request, invalid data packet, or a closed port. + +### Troubleshoot the alert + +1. Identify the top consumers of TCP RESETS: + + You can use the `ss` utility to list the TCP sockets and their states: + + ``` + sudo ss -tan + ``` + + Check the `State` column to see which sockets have a `CLOSE-WAIT`, `FIN-WAIT`, `TIME-WAIT`, or `LAST-ACK` status. Sockets in these states are often associated with a high number of TCP RESETS. + +2. Check the logs of the concerned applications: + + If you have identified the problematic applications or servers, inspect their logs for any error messages, warnings, or unusual activity related to network connection issues. + +3. Inspect the system logs: + + Check the system logs, such as `/var/log/syslog` on Linux or `/var/log/messages` on FreeBSD, for any network-related issues. This could help you find possible reasons for the increased number of TCP RESETS. + +4. Monitor and diagnose network issues: + + Use tools like `tcpdump`, `wireshark`, or `iftop` to capture packets and observe network traffic. 
This can help you identify patterns that may be causing the increased number of TCP RESETS. + +5. Check for resource constraints: + + Ensure that your system's resources, such as CPU, memory, and disk space, are not under heavy load or reaching their limits. High resource usage could cause networked applications to behave unexpectedly, resulting in an increased number of TCP RESETS. + +### Useful resources + +1. [ss Utility - Investigate Network Connections & Sockets](https://www.binarytides.com/linux-ss-command/) +2. [Wireshark - A Network Protocol Analyzer](https://www.wireshark.org/) +3. [Monitoring Network Traffic with iftop](https://www.tecmint.com/iftop-linux-network-bandwidth-monitoring-tool/) diff --git a/src/health/guides/tcp/1m_ipv4_tcp_resets_sent.md b/src/health/guides/tcp/1m_ipv4_tcp_resets_sent.md new file mode 100644 index 000000000..fa052e6bb --- /dev/null +++ b/src/health/guides/tcp/1m_ipv4_tcp_resets_sent.md @@ -0,0 +1,37 @@ +### Understand the alert + +This alert calculates the average number of TCP resets (`OutRsts`) sent by the host over the last minute. If you receive this alert, it means that your system is experiencing an unusually high rate of TCP resets, which might signal connection issues or potential attacks. + +### What is a TCP reset? + +A TCP reset (or RST packet) is a signal used in the Transmission Control Protocol (TCP) to abruptly close an active connection between two devices. It can be sent by either the client or server to inform the other party that they should consider the connection terminated. + +### Why are high numbers of TCP resets a concern? + +When there's a high rate of TCP resets sent by a host, it generally indicates problems in communication with other devices or services. This could be due to network latency, misconfigured firewalls, or aggressive timeouts causing connections to break. 
In some cases, it could also signal a potential Denial of Service (DoS) attack, where an attacker sends multiple resets to disrupt a service or network. + +### Troubleshoot the alert + +- Check the network performance + + Investigate if there are any network latency issues or congestion in your system. You can use tools like `ping`, `traceroute`, or `mtr` to check the network quality and connectivity to other hosts. + +- Analyze packet captures for communication issues + + Use a packet capture tool like `tcpdump` or `Wireshark` to capture and analyze network traffic during the period of high resets. Look for patterns or specific connections that are frequently terminated with a reset. This could help pinpoint misconfigured services, firewalls, or devices causing the issue. + +- Check firewall settings + + Ensure that your firewall settings are properly configured to allow necessary connections and not aggressively closing them. Look for rules related to connection timeouts, max connections, and SYN flood protection to see if they might be causing the resets. + +- Review system logs for errors + + Check system and application logs for any error messages or events that correlate to the time of the alert. This might give you more information about the cause of the issue. + +- Monitor for potential attacks + + If the above steps don't help determine the cause, consider monitoring your network and system for potential DoS attacks. Implement security measures such as rate-limiting and access control to protect your services and network from malicious traffic. + +### Useful resources + +1. 
[TCP Connection Resets and How to Troubleshoot Them](https://blog.wireshark.org/tcp/connection/resets/troubleshoot/) diff --git a/src/health/guides/tcp/1m_tcp_accept_queue_drops.md b/src/health/guides/tcp/1m_tcp_accept_queue_drops.md new file mode 100644 index 000000000..5926d24c9 --- /dev/null +++ b/src/health/guides/tcp/1m_tcp_accept_queue_drops.md @@ -0,0 +1,30 @@ +### Understand the alert + +This alert presents the average number of dropped packets in the TCP accept queue over the last sixty seconds. If it is raised, then the system is dropping incoming TCP connections. This could also be an indication of accept queue overflow, low memory, security issues, no route to a destination, etc. +- This alert gets raised to warning when the value is greater than 1 and less than 5. +- If the number of queue drops over the last minute exceeds 5, then the alert gets raised to critical. + +### TCP Accept Queue Drops + +The accept queue holds fully established TCP connections waiting to be handled by the listening application. It overflows when the server application fails to accept new connections at the rate they are coming in. + +### Troubleshooting Section + +- Check for queue overflows. + +If you receive this alert, then you can cross-check its results with the `1m_tcp_accept_queue_overflows` alert. If that alert is also in a warning or critical state, then the accept queue is overflowing. To fix that you can do the following: + +1. Open the `/etc/sysctl.conf` file and look for the entry `net.ipv4.tcp_max_syn_backlog`. + `tcp_max_syn_backlog` is the maximum number of remembered connection requests (SYN_RECV) that have not yet received an acknowledgment from the connecting client. +2. If the entry does not exist, then append the following default entry to the file: `net.ipv4.tcp_max_syn_backlog=1280`. Otherwise, adjust the limit to suit your needs. +3. 
Save your changes and run: + ``` + sysctl -p + ``` + +Note: Netdata strongly suggests knowing exactly what values you need before making system changes. + +### Useful resources + +1. [ip-sysctl.txt](https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt) +2. [Transmission Control Protocol](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) diff --git a/src/health/guides/tcp/1m_tcp_accept_queue_overflows.md b/src/health/guides/tcp/1m_tcp_accept_queue_overflows.md new file mode 100644 index 000000000..7c5ddf0f5 --- /dev/null +++ b/src/health/guides/tcp/1m_tcp_accept_queue_overflows.md @@ -0,0 +1,35 @@ +### Understand the alert + +This alert presents the average number of overflows in the TCP accept queue over the last minute. + +- This alert gets raised in a warning state when the value is greater than 1 and less than 5. +- If the overflow average exceeds 5 in the last minute, then the alert gets raised in the critical state. + +### What is the Accept queue? + +The accept queue holds fully established TCP connections waiting to be handled by the listening application. It overflows when the server application fails to accept new connections at the rate they are coming in. + +### This alert might also indicate a SYN flood. + +A SYN flood is a form of denial-of-service attack in which an attacker rapidly initiates a connection to a server without finalizing the connection. The server has to spend resources waiting for half-opened connections, which can consume enough resources to make the system unresponsive to legitimate traffic. + +### Troubleshooting Section + +Increase the queue length + +1. Open the `/etc/sysctl.conf` file and look for the entry `net.ipv4.tcp_max_syn_backlog`. + `tcp_max_syn_backlog` is the maximum number of remembered connection requests (SYN_RECV) that have not yet received an acknowledgment from the connecting client. +2. If the entry does not exist, you can append the following default entry to the file: `net.ipv4.tcp_max_syn_backlog=1280`. 
Otherwise, adjust the limit to suit your needs. +3. Save your changes and run: + ``` + sysctl -p + ``` + +Note: Netdata strongly suggests knowing exactly what values you need before making system changes. + +### Useful resources + +1. [SYN Floods](https://en.wikipedia.org/wiki/SYN_flood) +2. [ip-sysctl.txt](https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt) +3. [Transmission Control Protocol](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) + diff --git a/src/health/guides/tcp/1m_tcp_syn_queue_cookies.md b/src/health/guides/tcp/1m_tcp_syn_queue_cookies.md new file mode 100644 index 000000000..8dafb9f41 --- /dev/null +++ b/src/health/guides/tcp/1m_tcp_syn_queue_cookies.md @@ -0,0 +1,39 @@ +### Understand the alert + +This alert presents the average number of SYN cookies sent because the TCP SYN queue was full over the last sixty seconds. Receiving this alert means that the incoming traffic is excessive. SYN queue cookies are used to resist any potential SYN flood attacks. + +This alert is raised to warning when the average exceeds 1 and will enter critical when the value exceeds an average of 5 sent SYN cookies in sixty seconds. + +### What are SYN Queue Cookies? + +The SYN Queue stores inbound SYN packets (specifically: `struct inet_request_sock`). It is responsible for sending out SYN+ACK packets and retrying them on timeout. After transmitting the SYN+ACK, the SYN Queue waits for an ACK packet from the client - the last packet in the three-way-handshake. All received ACK packets must first be matched against the fully established connection table, and only then against data in the relevant SYN Queue. On SYN Queue match, the kernel removes the item from the SYN Queue, successfully creates a full connection (specifically: `struct inet_sock`), and adds it to the Accept Queue. + +### SYN flood + +This alert likely indicates a SYN flood. 
+ +A SYN flood is a form of denial-of-service attack in which an attacker rapidly initiates a connection to a server without finalizing the connection. The server has to spend resources waiting for half-opened connections, which can consume enough resources to make the system unresponsive to legitimate traffic. + +### Troubleshoot the alert + +If the traffic is legitimate, then increase the limit of the SYN queue. + +Only do this once you have determined that the traffic is legitimate; the limit is raised through configuration, as follows. + +*(If the traffic is not legitimate, then this is not safe! You will expose more resources to an attacker.)* + +1. Open the `/etc/sysctl.conf` file and look for the entry `net.core.somaxconn`. This value will affect both SYN and accept queue limits on newer Linux systems. +2. Set the value accordingly, for example `net.core.somaxconn=128`. The default is 128 on kernels before Linux 5.4 and 4096 from 5.4 onward. If the entry doesn't exist, append it to the file. +3. Save your changes and run this command to apply them. + ``` + sysctl -p + ``` +Note: Netdata strongly suggests knowing exactly what values you need before making system changes. + +### Useful resources + +1. [SYN packet handling](https://blog.cloudflare.com/syn-packet-handling-in-the-wild/) +2. [SYN Floods](https://en.wikipedia.org/wiki/SYN_flood) +3. [SYN Cookies](https://en.wikipedia.org/wiki/SYN_cookies) +4. [ip-sysctl.txt](https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt) +5. 
[Transmission Control Protocol](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) diff --git a/src/health/guides/tcp/1m_tcp_syn_queue_drops.md b/src/health/guides/tcp/1m_tcp_syn_queue_drops.md new file mode 100644 index 000000000..c29d86d77 --- /dev/null +++ b/src/health/guides/tcp/1m_tcp_syn_queue_drops.md @@ -0,0 +1,22 @@ +### Understand the alert + +This alert indicates that the average number of SYN requests dropped due to the TCP SYN queue being full has exceeded a specific threshold in the last minute. A high number of dropped SYN requests may indicate a SYN flood attack, causing the system to become unresponsive to legitimate traffic. + +### Troubleshoot the alert + +1. **Monitor incoming traffic**: Analyze the incoming network traffic to determine if there is a sudden surge in SYN requests, which might indicate a SYN flood attack. Use tools like `tcpdump`, `iftop`, or `nload` to monitor network traffic. + +2. **Check system resources**: Inspect the system's CPU and memory usage to ensure there are enough resources available to handle incoming connections. High resource usage might lead to dropped SYN requests. + +3. **Enable SYN cookies**: If the traffic is legitimate, consider enabling SYN cookies (on Linux, `net.ipv4.tcp_syncookies=1`) to help mitigate the impact of a SYN flood attack. + +4. **Adjust SYN queue settings**: Increase the SYN queue size by adjusting the `net.core.somaxconn` and `net.ipv4.tcp_max_syn_backlog` sysctl parameters. Make sure to set these values according to your system's capacity and traffic requirements. + +5. **Implement traffic filtering**: Use traffic filtering techniques such as rate limiting, IP blocking, or firewall rules to mitigate the impact of SYN flood attacks. + +### Useful resources + +1. [SYN packet handling](https://blog.cloudflare.com/syn-packet-handling-in-the-wild/) +2. [SYN Floods](https://en.wikipedia.org/wiki/SYN_flood) +3. [SYN Cookies](https://en.wikipedia.org/wiki/SYN_cookies) +4. 
[ip-sysctl.txt](https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt) diff --git a/src/health/guides/tcp/tcp_connections.md b/src/health/guides/tcp/tcp_connections.md new file mode 100644 index 000000000..849a05ac2 --- /dev/null +++ b/src/health/guides/tcp/tcp_connections.md @@ -0,0 +1,51 @@ +### Understand the alert + +This alert is related to the percentage of used IPv4 TCP connections. If you receive this alert, it means that your system has high TCP connections utilization, and you might be approaching the limit of maximum connections. + +### What does high IPv4 TCP connections utilization mean? + +When the number of IPv4 TCP connections gets too high, the system's ability to establish new connections decreases. This is because there are limitations due to resources such as memory or system settings. High utilization could lead to connection-related issues or service interruptions. + +### Troubleshoot the alert + +1. Check current TCP connections: + + To see the current number of TCP connections, you can use the `ss` or `netstat` command: + + ``` + ss -t | grep ESTAB | wc -l + ``` + + or + + ``` + netstat -ant | grep ESTABLISHED | wc -l + ``` + +2. Identify connections with high usage: + + To list the connections with their state (e.g., ESTABLISHED, LISTEN), use the following command: + + ``` + ss -tan + ``` + + Look for connections with a high number of ESTABLISHED connections, as these may be contributing to the high utilization. + +3. Inspect running processes to identify potential culprits: + + You can use the `lsof` command to list all open files and the processes that are using them: + + ``` + sudo lsof -iTCP + ``` + + Look for processes with a high number of open files, as these are likely responsible for the increased TCP connections utilization. + +4. Take action: + + Once you have identified the processes contributing to high TCP connections utilization, you can take appropriate action. 
This may involve optimizing the application, adjusting system settings, or optimizing hardware resources. + +### Useful resources + +1. [Linux lsof command tutorial](https://www.howtoforge.com/linux-lsof-command/) diff --git a/src/health/guides/tcp/tcp_memory.md b/src/health/guides/tcp/tcp_memory.md new file mode 100644 index 000000000..99223c224 --- /dev/null +++ b/src/health/guides/tcp/tcp_memory.md @@ -0,0 +1,50 @@ +### Understand the alert + +This alert is triggered when the TCP memory usage on your system is higher than the allowed limit. High TCP memory utilization can cause applications to become unresponsive and result in poor system performance. + +### Troubleshoot the alert + +To resolve the TCP memory alert, you can follow these steps: + +1. Verify the current TCP memory usage: + + Check the current values of TCP memory buffers by running the following command: + + ``` + cat /proc/sys/net/ipv4/tcp_mem + ``` + + The output consists of three values: low, pressure (memory pressure), and high (memory limit). + +2. Monitor system performance: + + Use the `vmstat` command to monitor the system's performance and understand the memory consumption in detail: + + ``` + vmstat 5 + ``` + + This will display the system's statistics every 5 seconds. Pay attention to the `si` and `so` columns, which represent swap-ins and swap-outs. High values in these columns may indicate memory pressure on the system. + +3. Identify high memory-consuming processes: + + Use the `top` command to identify processes that consume the most memory: + + ``` + top -o %MEM + ``` + + Look for processes with high memory usage and determine if they are necessary for your system. If they are not, consider stopping or killing these processes to free up memory. + +4. Increase the TCP memory: + + Follow the steps mentioned in the provided guide to increase the TCP memory. This includes: + + - Increase the `tcp_mem` bounds using the `sysctl` command. 
+ - Verify the change and test it with the same workload that triggered the alarm originally. + - If the change works, make it permanent by adding the new values to `/etc/sysctl.conf`. + - Reload the sysctl settings with `sysctl -p`. + +### Useful resources + +1. [man pages of tcp](https://man7.org/linux/man-pages/man7/tcp.7.html) diff --git a/src/health/guides/tcp/tcp_orphans.md b/src/health/guides/tcp/tcp_orphans.md new file mode 100644 index 000000000..d7dd35a87 --- /dev/null +++ b/src/health/guides/tcp/tcp_orphans.md @@ -0,0 +1,48 @@ +### Understand the alert + +This alert indicates that your system is experiencing high IPv4 TCP socket utilization, specifically orphaned sockets. Orphaned connections are those not attached to any user file handle. When these connections exceed the limit, they are reset immediately. The warning state is triggered when the percentage of used orphan IPv4 TCP sockets exceeds 25%, and the critical state is triggered when the value exceeds 50%. + +### Troubleshoot the alert + +- Check the orphan socket limit and current usage + +To check the configured limit on orphan sockets, run the following command. The current number of orphan sockets is reported in the `orphan` field of the `TCP:` line in `/proc/net/sockstat`. + + ``` + cat /proc/sys/net/ipv4/tcp_max_orphans + ``` + +- Identify the processes causing high orphan socket usage + +To identify the processes causing high orphan socket usage, you can use the `ss` command: + + ``` + sudo ss -tan state time-wait state close-wait + ``` + + Look for peers or ports that account for a large number of these sockets and investigate the related processes. + +- Increase the orphan socket limit + +If you need to increase the orphan socket limit to accommodate legitimate connections, you can update the value in the `/proc/sys/net/ipv4/tcp_max_orphans` file. 
Run the following as root, replacing `{DESIRED_AMOUNT}` with the new limit: + + ``` + echo {DESIRED_AMOUNT} > /proc/sys/net/ipv4/tcp_max_orphans + ``` + + Consider the kernel's penalty factor for orphan sockets (usually 2x or 4x) when determining the appropriate limit. 
+ + **Note**: Be cautious when making system changes and ensure you understand the implications of updating these settings. + +- Review and optimize application behavior + +Investigate the applications generating a high number of orphan sockets and consider optimizing their behavior. This may involve updating application settings or code to better manage network connections. + +- Monitor your system + +Keep an eye on your system's orphan socket usage, particularly during peak hours. Adjust the limit as needed to accommodate legitimate connections. + +### Useful resources + +1. [Network Sockets](https://en.wikipedia.org/wiki/Network_socket) +2. [Linux-admins.com - Troubleshooting Out of Socket Memory](http://www.linux-admins.net/2013/01/troubleshooting-out-of-socket-memory.html)
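
To make the orphan-usage percentage in the tcp_orphans guide above concrete, the following sketch (not part of the commit) computes it from the `orphan` field of the `TCP:` line in `/proc/net/sockstat` and a limit passed as an argument. The function name `orphan_pct` is hypothetical, and dividing the current orphan count by `tcp_max_orphans` is an assumption about how the percentage is derived.

```shell
# Hypothetical helper: reads /proc/net/sockstat-style text on stdin and
# takes the orphan limit as $1; prints the orphan count and its share of
# the limit.
orphan_pct() {
    awk -v max="$1" '
        $1 == "TCP:" {
            # The TCP: line is a list of "name value" pairs.
            for (i = 2; i < NF; i += 2) {
                if ($i == "orphan") orphans = $(i + 1)
            }
        }
        END {
            if (max > 0) {
                printf "%d orphans, %.1f%% of limit %d\n", orphans, 100 * orphans / max, max
            }
        }
    '
}
```

On a Linux host this could be invoked as `orphan_pct "$(cat /proc/sys/net/ipv4/tcp_max_orphans)" < /proc/net/sockstat`, and the result compared with the 25% and 50% thresholds mentioned in the guide.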
\ No newline at end of file