diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-19 02:57:58 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-19 02:57:58 +0000 |
commit | be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97 (patch) | |
tree | 9754ff1ca740f6346cf8483ec915d4054bc5da2d /health/guides/net | |
parent | Initial commit. (diff) | |
download | netdata-upstream.tar.xz netdata-upstream.zip |
Adding upstream version 1.44.3.upstream/1.44.3upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'health/guides/net')
-rw-r--r-- | health/guides/net/10min_fifo_errors.md | 42 | ||||
-rw-r--r-- | health/guides/net/10min_netisr_backlog_exceeded.md | 56 | ||||
-rw-r--r-- | health/guides/net/10s_received_packets_storm.md | 23 | ||||
-rw-r--r-- | health/guides/net/1m_received_packets_rate.md | 45 | ||||
-rw-r--r-- | health/guides/net/1m_received_traffic_overflow.md | 24 | ||||
-rw-r--r-- | health/guides/net/1m_sent_traffic_overflow.md | 23 | ||||
-rw-r--r-- | health/guides/net/inbound_packets_dropped.md | 58 | ||||
-rw-r--r-- | health/guides/net/inbound_packets_dropped_ratio.md | 52 | ||||
-rw-r--r-- | health/guides/net/interface_inbound_errors.md | 36 | ||||
-rw-r--r-- | health/guides/net/interface_outbound_errors.md | 42 | ||||
-rw-r--r-- | health/guides/net/interface_speed.md | 44 | ||||
-rw-r--r-- | health/guides/net/outbound_packets_dropped.md | 57 | ||||
-rw-r--r-- | health/guides/net/outbound_packets_dropped_ratio.md | 27 |
13 files changed, 529 insertions, 0 deletions
diff --git a/health/guides/net/10min_fifo_errors.md b/health/guides/net/10min_fifo_errors.md new file mode 100644 index 00000000..845ae6af --- /dev/null +++ b/health/guides/net/10min_fifo_errors.md @@ -0,0 +1,42 @@ +### Understand the alert + +Between the IP stack and the Network Interface Controller (NIC) lies the driver queue. This queue is typically implemented as a FIFO ring buffer into the memory space allocated by the driver. The NIC receive frames and place them into memory as skb_buff data structures (SocKet Buffer). We can have queues (ingress queues) and transmitted (egress queues) but these queues do not contain any actual packet data. Each queue has a pointer to the devices associated with it, and to the skb_buff data structures that store the ingress/egress packets. The number of frames this queue can handle is limited. Queues fill up when an interface receives packets faster than kernel can process them. + +Netdata monitors the number of FIFO errors (number of times an overflow occurs in the ring buffer) for a specific network interface in the last 10 minutes. This alarm is triggered when the NIC is not able to handle the peak load of incoming/outgoing packets with the current ring buffer size. + +Not all NICs support FIFO queue operations. + +### More about SKB + +The SocKet Buffer (SKB), is the most fundamental data structure in the Linux networking code. Every packet sent or received is handled using this data structure. This is a large struct containing all the control information required for the packet (datagram, cell, etc). + +The struct sk_buff has the following fields to point to the specific network layer headers: + +- transport_header (previously called h) – This field points to layer 4, the transport layer (and can include tcp header or udp header or + icmp header, and more) + +- network_header (previously called nh) – This field points to layer 3, the network layer (and can include ip header or ipv6 header or arp + header). + +- mac_header (previously called mac) – This field points to layer 2, the link layer. + +- skb_network_header(skb), skb_transport_header(skb) and skb_mac_header(skb) - These return pointer to the header. + +### Troubleshoot the alert + +- Update the ring buffer size + +1. To view the maximum RX ring buffer size: + + ``` + ethtool -g enp1s0 + ``` + +2. If the values in the Pre-set maximums section are higher than in the Current hardware settings section, increase RX (or TX) ring buffer: + + ``` + enp1s0 rx 4080 + ``` + +3. Verify the change to make sure that you no longer receive the alarm when running the same workload. To make this permanently, you must consult your distribution guides. + diff --git a/health/guides/net/10min_netisr_backlog_exceeded.md b/health/guides/net/10min_netisr_backlog_exceeded.md new file mode 100644 index 00000000..d40d2c9a --- /dev/null +++ b/health/guides/net/10min_netisr_backlog_exceeded.md @@ -0,0 +1,56 @@ +### Understand the alert + +The `10min_netisr_backlog_exceeded` alert occurs when the `netisr_maxqlen` queue within FreeBSD's network kernel dispatch service reaches its maximum capacity. This queue stores packets received by interfaces and waiting to be processed by the destined subsystems or userland applications. When the queue is full, the system drops new packets. This alert indicates that the average number of dropped packets in the last minute has exceeded the netisr queue length. + +### Troubleshoot the alert + +1. **Increase the netisr_maxqlen value** + + a. Check the current value: + + ``` + root@netdata~ # sysctl net.route.netisr_maxqlen + net.route.netisr_maxqlen: 256 + ``` + + b. Increase the value by a factor of 4: + + ``` + root@netdata~ # sysctl -w net.route.netisr_maxqlen=1024 + ``` + + c. Verify the change and test with the same workload that triggered the alarm originally: + + ``` + root@netdata~ # sysctl net.route.netisr_maxqlen + net.route.netisr_maxqlen: 1024 + ``` + + d. If the change works for your system, make it permanent by adding this entry, `net.route.netisr_maxqlen=1024`, to `/etc/sysctl.conf`. + + e. Reload the sysctl settings: + + ``` + root@netdata~ # /etc/rc.d/sysctl reload + ``` + +2. **Monitor the system** + + After increasing the `netisr_maxqlen` value, continue to monitor your system's dropped packet statistics using tools like `netstat` to determine if the queue backlog situation has improved. If you are still experiencing high packet drop rates, you may need to further increase the `netisr_maxqlen` value, or explore other optimizations for your networking stack. + +3. **Check hardware and system resources** + + In some cases, overloaded or underpowered hardware may cause issues with packet processing. Ensure that your hardware (network cards, switches, routers, etc.) is performing optimally, and that your system has enough CPU and RAM resources to handle the traffic load. + +4. **Network traffic analysis** + + Analyze your network traffic using tools like `tcpdump`, `iftop`, or `iptraf` to identify specific traffic patterns or types causing the backlog issue. This analysis can help you optimize your network infrastructure or take actions to reduce unnecessary traffic. + +5. **Update FreeBSD version** + + Ensure that your FreeBSD system is up to date, as newer kernel versions may include performance improvements and optimizations for packet processing. Updating to a newer version might help resolve netisr backlog issues. + +### Useful resources + +1. [FreeBSD Performance Tuning](https://calomel.org/freebsd_network_tuning.html) +2. [FreeBSD Handbook: Tuning Kernel Limits](https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/configtuning-kernel-limits.html) diff --git a/health/guides/net/10s_received_packets_storm.md b/health/guides/net/10s_received_packets_storm.md new file mode 100644 index 00000000..29e1f534 --- /dev/null +++ b/health/guides/net/10s_received_packets_storm.md @@ -0,0 +1,23 @@ +### Understand the alert + +This alert is triggered when there is a significant increase in the number of received packets within a 10-second interval. It indicates a potential packet storm, which may cause network congestion, dropped packets, and reduced performance. + +### Troubleshoot the alert + +1. **Check network utilization**: Monitor network utilization on the affected interface to identify potential bottlenecks, high bandwidth usage, or network saturation. + +2. **Identify the source**: Determine the source of the increased packet rate. This may be caused by a misconfigured application, a faulty network device, or a Denial of Service (DoS) attack. + +3. **Inspect network devices**: Check network devices such as routers, switches, and firewalls for potential issues, misconfigurations, or firmware updates that may resolve the problem. + +4. **Verify application behavior**: Ensure that the applications running on your network are behaving as expected and not generating excessive traffic. + +5. **Implement rate limiting**: If the packet storm is caused by a specific application or service, consider implementing rate limiting to control the number of packets being sent. + +6. **Monitor network security**: Check for signs of a DoS attack or other security threats, and take appropriate action to mitigate the risk. + +### Useful resources + +1. [Wireshark User's Guide](https://www.wireshark.org/docs/wsug_html_chunked/) +2. [Tcpdump Manual Page](https://www.tcpdump.org/manpages/tcpdump.1.html) +3. [Iperf - Network Bandwidth Measurement Tool](https://iperf.fr/) diff --git a/health/guides/net/1m_received_packets_rate.md b/health/guides/net/1m_received_packets_rate.md new file mode 100644 index 00000000..891e8bf3 --- /dev/null +++ b/health/guides/net/1m_received_packets_rate.md @@ -0,0 +1,45 @@ +### Understand the alert + +1m_received_packets_rate alert indicates the average number of packets received by the network interface on your system over the last minute. If you receive this alert, it signifies higher than usual network traffic incoming. + +### What do received packets mean? + +A received packet is a unit of data that is transmitted through the network interface to your system. Higher received packets rate means an increase in incoming network traffic to your system. It could be due to legitimate usage or could signal a potential issue such as a network misconfiguration, an attack, or a system malfunction. + +### Troubleshoot the alert + +1. Analyze the network throughput: Use the `nload` or `iftop` command to check the incoming traffic on your system's network interfaces. These commands display the current network traffic and will help you monitor the incoming data. + + ``` + sudo nload <network_interface> // or + sudo iftop -i <network_interface> + ``` + + Replace `<network_interface>` with your network interface (e.g., eth0). + +2. Check for specific processes consuming unusually high network bandwidth: Use the `netstat` command combined with `grep` to filter the results and find processes with high network traffic. + + ``` + sudo netstat -tunap | grep <network_interface> + ``` + + Replace `<network_interface>` with your network interface (e.g., eth0). + +3. Identify host-consuming bandwidth: After identifying the processes consuming a high network, you can trace back their respective hosts. Use the `tcpdump` command to capture live network traffic and analyze it for specific IP addresses causing the high packets rate. + + ``` + sudo tcpdump -n -i <network_interface> -c 100 + ``` + + Replace `<network_interface>` with your network interface (e.g., eth0). + +4. Mitigate the issue: Depending on the root cause, apply appropriate remedial actions. This may include: + - Adjusting application/service configuration to reduce network traffic + - Updating firewall rules to block undesired sources/IPs + - Ensuring network devices are appropriately configured + - Addressing system overload issues that hamper network performance + +### Useful resources + +1. [nload - Monitor Linux Network Traffic and Bandwidth Usage in Real Time](https://www.tecmint.com/nload-monitor-linux-network-traffic-bandwidth-usage/) +2. [An Introduction to the ss Command](http://www.binarytides.com/linux-ss-command/) diff --git a/health/guides/net/1m_received_traffic_overflow.md b/health/guides/net/1m_received_traffic_overflow.md new file mode 100644 index 00000000..270dd892 --- /dev/null +++ b/health/guides/net/1m_received_traffic_overflow.md @@ -0,0 +1,24 @@ +### Understand the alert + +Network interfaces are categorized primarily on the bandwidth they can operate (1 Gbps, 10 Gbps, etc). High network utilization occurs when the volume of data on a network link approaches the capacity of the link. Netdata agent +calculates the average outbound utilization for a specific network interface over the last minute. High outbound utilization increases latency and packet loss because packet bursts are buffered + +This alarm may indicate either network congestion or malicious activity. + +### Troubleshoot the alert + +- Prioritize important traffic + +Quality of service (QoS) is the use of routing prioritization to control traffic and ensure the performance of critical applications. QoS works best when low-priority traffic exists that can be dropped when congestion occurs. The higher-priority traffic must fit within the bandwidth limitations of the link or path. + +- Add more bandwidth + + - For **Cloud infrastructures**, adding bandwidth might be easy. It depends on your cloud infrastracture and your cloud provider. Some of them either offer you the service to upgrade machines to a higher bandwidth rate or upgrade you machine to a more powerful one with higher bandwidth rate. + + - For **Bare-metal** machines, you will need either a hardware upgrade or the addition of a network card using link aggregation to combine multiple network connections in parallel (e.g LACP). + +### Useful resources + +- [FireQOS](https://firehol.org/tutorial/fireqos-new-user/) is a traffic shaping helper. It has a very simple shell scripting language to express traffic shaping. + +- [`tcconfig`](https://tcconfig.readthedocs.io/en/latest/index.html) is a command wrapper that makes it easy to set up traffic control of network bandwidth/latency/packet-loss/packet-corruption/etc.
\ No newline at end of file diff --git a/health/guides/net/1m_sent_traffic_overflow.md b/health/guides/net/1m_sent_traffic_overflow.md new file mode 100644 index 00000000..376d578c --- /dev/null +++ b/health/guides/net/1m_sent_traffic_overflow.md @@ -0,0 +1,23 @@ +### Understand the alert + +Network interfaces are categorized primarily on the bandwidth rate at which they can operate (1 Gbps, 10 Gbps, etc). High network utilization occurs when the volume of data on a network link approaches the capacity of the link. Netdata agent calculates the average outbound utilization for a specific network interface over the last minute. High outbound utilization increases latency and packet loss because packet bursts are buffered. + +This alarm may indicate either a network congestion or malicious activity. + +### Troubleshoot the alert + +- Prioritize important traffic + +Quality of service (QoS) is the use of mechanisms or technologies to control traffic and ensure the performance of critical applications. QoS works best when low-priority traffic exists that can be dropped when congestion occurs. The higher-priority traffic must fit within the bandwidth limitations of the link or path. + +- Add more bandwidth + + - For **Cloud infrastructures**, adding bandwidth might be easy. It depends on your cloud infrastracture and your cloud provider. Some of them either offer you the service to upgrade machines to a higher bandwidth rate or upgrade you machine to a more powerful one with higher bandwidth rate. + + - For **Bare-metal** machines, you will need either a hardware upgrade or the addition of a network card using link aggregation to combine multiple network connections in parallel (e.g LACP). + +### Useful resources + +- [FireQOS](https://firehol.org/tutorial/fireqos-new-user/) is a traffic shaping helper. It has a very simple shell scripting language to express traffic shaping. + +- [`tcconfig`](https://tcconfig.readthedocs.io/en/latest/index.html) is a command wrapper that makes it easy to set up traffic control of network bandwidth/latency/packet-loss/packet-corruption/etc.
\ No newline at end of file diff --git a/health/guides/net/inbound_packets_dropped.md b/health/guides/net/inbound_packets_dropped.md new file mode 100644 index 00000000..e2519630 --- /dev/null +++ b/health/guides/net/inbound_packets_dropped.md @@ -0,0 +1,58 @@ +### Understand the alert + +This alert is triggered when the number of inbound dropped packets for a network interface exceeds a specified threshold during the last 10 minutes. A dropped packet means that the network device could not process the packet, hence it was discarded. + +### What are the common causes of dropped packets? + +1. Network Congestion: When the network traffic is too high, the buffer may overflow before the device can process the packets, causing some packets to be dropped. +2. Link Layer Errors: Packets can be dropped due to errors in the link layer causing frames to be corrupted. +3. Insufficient Resources: The network interface may fail to process incoming packets due to a lack of memory or CPU resources. + +### Troubleshoot the alert + +1. Check the overall system resources + + Run the `vmstat` command to get a report about your system statistics. + + ``` + vmstat 1 + ``` + + Check if the CPU or memory usage is high. If either is near full utilization, consider upgrading system resources or managing the load more efficiently. + +2. Check network interface statistics + + Run the `ifconfig` command to get more information on the network interface. + + ``` + ifconfig <INTERFACE> + ``` + + Look for the `RX dropped` field to confirm the number of dropped packets. + +3. Monitor network traffic + + Use `iftop` or `nload` to monitor the network traffic in real time. If you don't have these tools, install them: + + ``` + sudo apt install iftop nload + ``` + + ``` + iftop -i <INTERFACE> + nload <INTERFACE> + ``` + + Identify if there is unusually high traffic on the network interface. + +4. Check logs for any related errors + + Check the system logs for any errors related to the network interface or driver: + + ``` + sudo dmesg | grep -i "eth0" + sudo journalctl -u networking.service + ``` + + If you find any errors, you can research the specific problem and apply the necessary fixes. + diff --git a/health/guides/net/inbound_packets_dropped_ratio.md b/health/guides/net/inbound_packets_dropped_ratio.md new file mode 100644 index 00000000..7bc9ed8e --- /dev/null +++ b/health/guides/net/inbound_packets_dropped_ratio.md @@ -0,0 +1,52 @@ +### Understand the alert + +Packet drops indicate that your system received some packets but could not process them. A sizeable amount of packet drops can consume significant amount of resources in your system. Some reasons that packets drops occurred in your system could be: + +- Your system receives packets with bad VLAN tags. +- The packets you are receiving are using a protocol that is unknown to your system. +- You receive IPv6 packets, but your system is not configured for IPv6. + +All these packets consume resources until being dropped (and for a short period after). For example, your NIC stores them in a ring-buffer until they are forwarded to the destined subsystem or userland application for further process. + +Netdata calculates the ratio of inbound dropped packets for your wired network interface over the last 10 minutes. + +### Identify VLANs in your interface + +There are cases in which traffic is routed to your host due to the existence of multiple VLAN in your network. + +1. Identify VLAN tagged packet in your interface. + +``` +tcpdump -i <your_interface> -nn -e vlan +``` + +2. Monitor the output of the `tcpdump`, identify VLANs which may exist. If no output is displayed, your interface probably uses traditional ethernet frames. + +3. Depending on your network topology, you may consider removing unnecessary VLANs from the switch trunk port toward your host. + +### Update the ring buffer size on your interface + +1. To view the maximum RX ring buffer size: + + ``` + ethtool -g enp1s0 + ``` + +2. If the values in the Pre-set maximums section are higher than in the current hardware settings section, increase RX + ring buffer: + + ``` + enp1s0 rx 4080 + ``` + +3. Verify the change to make sure that you no longer receive the alarm when running the same workload. To make this + permanently, you must consult your distribution guides. + + +### Inspect the packets your network interface receives + +Wireshark is a free and open-source packet analyzer. It is used for network troubleshooting, analysis, software and communications protocol development. + +### Useful resources + +[Read more about Wireshark here](https://www.wireshark.org/)
\ No newline at end of file diff --git a/health/guides/net/interface_inbound_errors.md b/health/guides/net/interface_inbound_errors.md new file mode 100644 index 00000000..6c8bcfcd --- /dev/null +++ b/health/guides/net/interface_inbound_errors.md @@ -0,0 +1,36 @@ +- Troubleshoot errors related to network congestion + +Network congestion can cause packets to be dropped, leading to interface inbound errors. To determine if congestion is the issue, you can monitor the network for any signs of excessive workload or high utilization rates. + +1. Use `ifconfig` to check the network interface utilization: + ``` + ifconfig <your_interface> + ``` + +2. Check the network switch/router logs for any indication of high utilization, errors or warnings. + +3. Use monitoring tools like `iftop`, `nload`, or `iptraf` to monitor network traffic and identify any bottle-necks or usage spikes. + +If you find that congestion is causing the inbound errors, consider ways to alleviate the issue including upgrading your network infrastructure or load balancing the traffic. + +- Troubleshoot errors caused by faulty network equipment + +Faulty network devices, such as switches and routers, can introduce errors in packets. To identify the cause, you should review the logs and statistics of any network devices in the path of the communication between the sender and this system. + +1. Check the logs of the network equipment for any indications of errors, problems or unusual behavior. + +2. Review the error counters and statistics of the network equipment to identify any trends or issues. + +3. Consider replacing or upgrading faulty equipment if it is found to be responsible for inbound errors. + +- Troubleshoot errors caused by software or configuration issues + +Incorrect configurations or software issues can also contribute to interface inbound errors. Some steps to troubleshoot these potential causes are: + +1. Review the system logs for any errors or warnings related to the network subsystem. + +2. Ensure that the network interface is configured correctly, and proper drivers are installed and up-to-date. + +3. Examine the system's firewall and security settings to verify that there are no inappropriate blockings or restrictions that may be causing the errors. + +In conclusion, by following these troubleshooting steps, you should be able to identify and resolve the cause of interface inbound errors on your FreeBSD system. Remember to monitor the situation regularly and address any new issues that may arise to ensure a stable and efficient networking environment.
\ No newline at end of file diff --git a/health/guides/net/interface_outbound_errors.md b/health/guides/net/interface_outbound_errors.md new file mode 100644 index 00000000..194d8aba --- /dev/null +++ b/health/guides/net/interface_outbound_errors.md @@ -0,0 +1,42 @@ +### Understand the alert + +This alert is triggered when there is a high number of outbound errors on a specific network interface in the last 10 minutes on a FreeBSD system. When you receive this alert, it means that the network interface is facing transmission-related issues, such as aborted, carrier, FIFO, heartbeat, or window errors. + +### Troubleshoot the alert + +1. Identify the network interface with the problem + Use `ifconfig` to get a list of all network interfaces and their error count: + ``` + ifconfig -a + ``` + Check the "Oerrs" (Outbound errors) field for each interface to find the one with the issue. + +2. Check the interface speed and duplex settings + The speed and duplex settings may mismatch between the network interface and the network equipment (like switches and routers) that it is connected to. Use `ifconfig` or `ethtool` to check these settings. + + With `ifconfig`: + ``` + ifconfig <interface_name> + ``` + + If required, adjust the speed and duplex settings using `ifconfig`: + ``` + ifconfig <interface_name> media <media_type> + ``` + `<media_type>` can be one of the following: 10baseT/UTP, 100baseTX, 1000baseTX, etc., and can include half-duplex or full-duplex. + Example: + ``` + ifconfig em0 media 1000baseTX mediaopt full-duplex + ``` + Ensure both the network interface and the connected device use the same settings. + +3. Check network cables and devices + Check the physical connections of the network cable to both the network interface and the network equipment it connects to. Replace the network cable if necessary. Additionally, verify if the issue is related to the connected network equipment (switches and routers). + +4. Analyze network traffic + Use tools like `tcpdump` or `Wireshark` to analyze the network traffic on the affected interface. This can give you insights into the root cause of the errors and help in troubleshooting device or network-related issues. + +### Useful resources + +1. [FreeBSD ifconfig man page](https://www.freebsd.org/cgi/man.cgi?ifconfig(8)) +2. [FreeBSD Handbook - Configuring the Network](https://www.freebsd.org/doc/handbook/config-network-setup.html) diff --git a/health/guides/net/interface_speed.md b/health/guides/net/interface_speed.md new file mode 100644 index 00000000..89f967c5 --- /dev/null +++ b/health/guides/net/interface_speed.md @@ -0,0 +1,44 @@ +### Understand the alert + +This alert indicates the current speed of the network interface `${label:device}`. If you receive this alert, it means that there is a significant change or reduction in the speed of your network interface. + +### What does interface speed mean? + +Interface speed refers to the maximum throughput an interface (network card or adapter) can support in terms of transmitting and receiving data. It is measured in Megabits per second (Mbit/s) and determines the performance of a network connection. + +### Troubleshoot the alert + +- Check the network interface speed. + +To see the interface speed and other information about the network interface, run the following command in the terminal: + +``` +ethtool ${label:device} +``` + +Replace `${label:device}` with your network interface name, e.g., `eth0` or `enp2s0`. + +- Confirm if there is a network congestion issue. + +High network traffic or congestion might cause reduced interface speed. Use the `iftop` utility to monitor the traffic on the network interface. If you don't have `iftop` installed, then [install it](https://www.binarytides.com/linux-commands-monitor-network/). + +Run the following command in the terminal: + +``` +sudo iftop -i ${label:device} +``` + +Replace `${label:device}` with your network interface name. + +- Verify cable connections and quality. + +Physical cable issues might cause reduced speed in the network interface. Check the connections and quality of the cables connecting your system to the network devices such as routers, switches, or hubs. + +- Update network drivers. + +Outdated network drivers can also lead to reduced speed in the network interface. Update the network drivers to the latest version to avoid any compatibility issues or performance degradations. + +- Check for EMI (Electromagnetic Interference). + +Network cables and devices located near power cables or electronic devices producing electromagnetic fields might experience reduced network interface speed. Make sure that your network cables and devices are not in proximity to potential sources of EMI. + diff --git a/health/guides/net/outbound_packets_dropped.md b/health/guides/net/outbound_packets_dropped.md new file mode 100644 index 00000000..49291d1d --- /dev/null +++ b/health/guides/net/outbound_packets_dropped.md @@ -0,0 +1,57 @@ +### Understand the alert + +This alert tracks the number of dropped outbound packets on a specific network interface (`${label:device}`) within the last 10 minutes. If you receive this alert, it means that your system has experienced dropped outbound packets in the monitored network interface, which might indicate network congestion or other issues affecting network performance. + +### What are dropped packets? + +Dropped packets refer to network packets that are discarded or lost within a computer network during transmission. In general, this can be caused by various factors, such as network congestion, faulty hardware, misconfigured devices, or packet errors. + +### Troubleshoot the alert + +1. Identify the affected network interface: + +Check the alert message for the `${label:device}` placeholder. It indicates the network interface experiencing the dropped outbound packets. + +2. Verify network congestion or excessive traffic: + +Excessive traffic or network congestion can lead to dropped packets. To check network traffic, use the `nload` tool. If it isn't installed, you can follow the instructions given [here](https://www.howtoforge.com/tutorial/install-nload-on-linux/). + +```bash +nload ${label:device} +``` + +This will display the current network bandwidth usage on the specified interface. Look for unusually high or fluctuating usage patterns, which could indicate congestion or excessive traffic. + +3. Verify hardware issues: + +Check the network interface and related hardware components (such as the network card, cables, and switches) for visible damage, loose connections, or other issues. Replace any defective components as needed. + +4. Check network interface configuration: + +Review your network interface configuration to ensure that it is correctly set up. To do this, you can use the `ip` or `ifconfig` command. For example: + +```bash +ip addr show ${label:device} +``` + +or + +```bash +ifconfig ${label:device} +``` + +Verify that the IP address, subnet mask, and other network settings match your network configuration. + +5. Check system logs for networking errors: + +Review your system logs to identify any networking error messages that might provide more information on the cause of the dropped packets. + +```bash +grep -i "error" /var/log/syslog | grep "${label:device}" +``` + +6. Monitor your network for packet errors using tools like `tcpdump` or `wireshark`. + +### Useful resources + +1. [How to monitor network bandwidth and traffic in Linux](https://www.binarytides.com/linux-commands-monitor-network/) diff --git a/health/guides/net/outbound_packets_dropped_ratio.md b/health/guides/net/outbound_packets_dropped_ratio.md new file mode 100644 index 00000000..9b90a97b --- /dev/null +++ b/health/guides/net/outbound_packets_dropped_ratio.md @@ -0,0 +1,27 @@ +### Understand the alert + +When we want to investigate the outbound traffic, the journey of a network packet starts at the application layer. + +Data are written (commonly) to a socket by a user program. The programmer may (raw sockets) or may not (datagram and stream sockets) have the possibility of absolute control over the data which is being sent through the network. The kernel will take the data which is written in a socket queue and allocate the necessary socket buffers. The kernel will try to forward the packets to their destination encapsulating the routing metadata (headers, checksums, fragmentation information) for each packet through a network interface. + +The Netdata Agent calculates the ratio of outbound dropped packets for a specific network interface over the last 10 minutes. Receiving this alarm means that packets were dropped on their way to transmission. + +This alert is triggered in warning state when the ratio of outbound dropped packets for a specific network interface over the last 10 minutes is more than 2%. + +The main reasons of outbound packet drops are: + +1. Link congestion +2. Overburdened devices +3. Defective hardware +4. Faulty network configuration +5. Restricted access from firewall rules + +### Troubleshoot the alert: + +Inspect the packets your network interface sends using Wireshark. + +Wireshark is a free and open-source packet analyzer. It is used for network troubleshooting, analysis, software and communications protocol development. + +### Useful resources + +[Read more about Wireshark here](https://www.wireshark.org/)
\ No newline at end of file |