diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-03-09 13:19:22 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-03-09 13:19:22 +0000 |
commit | c21c3b0befeb46a51b6bf3758ffa30813bea0ff0 (patch) | |
tree | 9754ff1ca740f6346cf8483ec915d4054bc5da2d /health/guides/portcheck | |
parent | Adding upstream version 1.43.2. (diff) | |
download | netdata-upstream/1.44.3.tar.xz netdata-upstream/1.44.3.zip |
Adding upstream version 1.44.3.upstream/1.44.3
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
3 files changed, 105 insertions, 0 deletions
diff --git a/health/guides/portcheck/portcheck_connection_fails.md b/health/guides/portcheck/portcheck_connection_fails.md new file mode 100644 index 000000000..781cf7a01 --- /dev/null +++ b/health/guides/portcheck/portcheck_connection_fails.md @@ -0,0 +1,32 @@ +### Understand the alert + +This alert indicates that too many connections are failing to a specific TCP endpoint in the last 5 minutes. It suggests that the monitored service on that endpoint is most likely down, unreachable, or access is being denied by firewall/security rules. + +### Troubleshoot the alert + +1. Check the service + Investigate if the service at the endpoint (specific IP and port) is running as expected. Inspect service logs for issues, error messages, or indications of a shutdown event. + +2. Test the endpoint + Try to establish a connection to the flagged endpoint using tools like `telnet`, `curl`, or `nc`. These tools provide real-time feedback that can help identify problems with the endpoint: + + Example using `telnet`: + ``` + telnet IP_ADDRESS PORT_NUMBER + ``` + +3. Examine firewall and security group rules + Verify if there are any recent changes or newly added firewall/security group rules that might be causing the connectivity issues. Look for any rules that could be blocking the monitored port specifically or the IP range. + +4. Inspect network connectivity + Check the network connectivity between the Netdata Agent and the monitored endpoint. Ensure there are no intermittent network failures or high latency affecting the communication between the two. + +5. Examine the alert configuration + Validate the alert configuration in the `netdata.conf` file to confirm that the alert thresholds and monitored percentage of failed connections are set appropriately. + +6. Check resource utilization + High resource utilization might affect the availability of the monitored endpoint. Check if the system hosting the service has enough resources available (CPU, memory, and storage) to serve incoming requests. + +### Useful resources + +1. [How to use netcat (nc) command: Examples for network testing/debugging](https://www.nixcraft.com/t/how-to-use-netcat-nc-command-examples-for-network-testing-debugging/3332) diff --git a/health/guides/portcheck/portcheck_connection_timeouts.md b/health/guides/portcheck/portcheck_connection_timeouts.md new file mode 100644 index 000000000..5386f1509 --- /dev/null +++ b/health/guides/portcheck/portcheck_connection_timeouts.md @@ -0,0 +1,41 @@ +### Understand the alert + +The `portcheck_connection_timeouts` alert calculates the average ratio of connection timeouts when trying to connect to a TCP endpoint over the last 5 minutes. If you receive this alert, it means that the monitored TCP endpoint is unreachable, potentially due to networking issues or an overloaded host/service. + +This alert triggers a warning state when the ratio of timeouts is between 10-40% and a critical state if the ratio is greater than 40%. + +### Troubleshoot the alert + +1. Check the network connectivity + - Use the `ping` command to check network connectivity between your system and the monitored TCP endpoint. + ``` + ping <tcp_endpoint_ip> + ``` + If the connectivity is intermittent or not established, it indicates network issues. Reach out to your network administrator for assistance. + +2. Check the status of the monitored TCP service + - Identify the service running on the monitored TCP endpoint by checking the port number. + - Use the `netstat` command to check the service status: + + ``` + netstat -tnlp | grep <port_number> + ``` + If the service is not running or unresponsive, restart the service or investigate further into the application logs for any issues. + +3. Verify the load on the TCP endpoint host + - Connect to the host and analyze its resource consumption (CPU, memory, disk I/O, and network bandwidth) with tools like `top`, `vmstat`, `iostat`, and `iftop`. + - Identify resource-consuming processes or applications and apply corrective measures (kill/restart the process, allocate more resources, etc.). + +4. Examine the firewall rules and security groups + - Ensure that there are no blocking rules or security groups for your incoming connections to the TCP endpoint. + - If required, update the rules or create new allow rules for the required ports and IP addresses. + +5. Check the Netdata configuration + - Review the Netdata configuration file `/etc/netdata/netdata.conf` to ensure the `portcheck` plugin settings are correctly configured for monitoring the TCP endpoint. + - If necessary, update and restart the Netdata agent. + +### Useful resources + +1. [Netstat Command in Linux](https://www.tecmint.com/20-netstat-commands-for-linux-network-management/) +2. [Iostat Command Usage and Examples](https://www.thomas-krenn.com/en/wiki/Iostat_command_usage_and_examples) +3. [Iftop Guide](https://www.tecmint.com/iftop-linux-network-bandwidth-monitoring-tool/) diff --git a/health/guides/portcheck/portcheck_service_reachable.md b/health/guides/portcheck/portcheck_service_reachable.md new file mode 100644 index 000000000..550db585e --- /dev/null +++ b/health/guides/portcheck/portcheck_service_reachable.md @@ -0,0 +1,32 @@ +### Understand the alert + +This alert checks if a particular TCP service on a specified host and port is reachable. If the average percentage of successful checks within the last minute is below 75%, it triggers an alert indicating the TCP service is not functioning properly. + +### Troubleshoot the alert + +- Verify if the problem is network-related or service-related + + 1. Check if the host and port are correct and the service is configured to listen on that specific port. + + 2. Use `ping` or `traceroute` to diagnose the connectivity issues between your machine and the host. + + 3. Use `telnet` or `nc` to check if the specific port on the host is reachable. For example, `telnet example.com port_number` or `nc example.com port_number`. + + 4. Check the network configuration, firewall settings, and routing rules on both the local machine and the target host. + +- Check if the TCP service is running and functioning properly + + 1. Check the service logs for any errors or issues that may prevent it from working correctly. + + 2. Restart the service and monitor its behavior. + + 3. Investigate if there are any recent changes in the service configuration or updates that may cause the issue. + + 4. Monitor system resources such as CPU, memory, and disk usage to ensure they are not causing any performance bottlenecks. + +- Optimize the service configuration + + 1. Review the service's performance-related configurations and fine-tune them, if necessary. + + 2. Check if there are any optimizations or best practices that can be applied to boost the service performance and reliability. + |