From 58daab21cd043e1dc37024a7f99b396788372918 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sat, 9 Mar 2024 14:19:48 +0100 Subject: Merging upstream version 1.44.3. Signed-off-by: Daniel Baumann --- health/guides/web_log/1m_bad_requests.md | 21 ++++++++++ health/guides/web_log/1m_internal_errors.md | 24 +++++++++++ health/guides/web_log/1m_successful.md | 23 +++++++++++ health/guides/web_log/web_log_10m_response_time.md | 42 +++++++++++++++++++ health/guides/web_log/web_log_1m_bad_requests.md | 27 ++++++++++++ .../guides/web_log/web_log_1m_internal_errors.md | 31 ++++++++++++++ health/guides/web_log/web_log_1m_redirects.md | 22 ++++++++++ health/guides/web_log/web_log_1m_requests.md | 31 ++++++++++++++ health/guides/web_log/web_log_1m_successful.md | 23 +++++++++++ health/guides/web_log/web_log_1m_total_requests.md | 36 ++++++++++++++++ health/guides/web_log/web_log_1m_unmatched.md | 25 +++++++++++ health/guides/web_log/web_log_5m_requests_ratio.md | 34 +++++++++++++++ health/guides/web_log/web_log_5m_successful.md | 36 ++++++++++++++++ health/guides/web_log/web_log_5m_successful_old.md | 29 +++++++++++++ health/guides/web_log/web_log_web_slow.md | 48 ++++++++++++++++++++++ 15 files changed, 452 insertions(+) create mode 100644 health/guides/web_log/1m_bad_requests.md create mode 100644 health/guides/web_log/1m_internal_errors.md create mode 100644 health/guides/web_log/1m_successful.md create mode 100644 health/guides/web_log/web_log_10m_response_time.md create mode 100644 health/guides/web_log/web_log_1m_bad_requests.md create mode 100644 health/guides/web_log/web_log_1m_internal_errors.md create mode 100644 health/guides/web_log/web_log_1m_redirects.md create mode 100644 health/guides/web_log/web_log_1m_requests.md create mode 100644 health/guides/web_log/web_log_1m_successful.md create mode 100644 health/guides/web_log/web_log_1m_total_requests.md create mode 100644 health/guides/web_log/web_log_1m_unmatched.md create mode 100644 health/guides/web_log/web_log_5m_requests_ratio.md create mode 100644 health/guides/web_log/web_log_5m_successful.md create mode 100644 health/guides/web_log/web_log_5m_successful_old.md create mode 100644 health/guides/web_log/web_log_web_slow.md (limited to 'health/guides/web_log') diff --git a/health/guides/web_log/1m_bad_requests.md b/health/guides/web_log/1m_bad_requests.md new file mode 100644 index 000000000..d8702b244 --- /dev/null +++ b/health/guides/web_log/1m_bad_requests.md @@ -0,0 +1,21 @@ +### Understand the alert + +This alert is triggered when the ratio of client error HTTP requests (4xx class status codes, excluding 401) within the last minute is higher than normal. Client errors indicate that the issue is on the client's side, such as incorrect requests or invalid URLs. + +### Troubleshoot the alert + +1. **Analyze response codes**: Identify the specific HTTP response codes your web server is sending to clients. Use the Netdata dashboard and inspect the `detailed_response_codes` chart for your web server to track the error codes being sent. + +2. **Check server logs**: Review the web server logs (e.g., access.log and error.log) to identify any issues, patterns, or errors causing the increase in client errors. These logs can typically be found under `/var/log/{nginx, apache2}/{access.log, error.log}`. + +3. **Verify application behavior**: Check the behavior of applications running on your web server to ensure they are not generating incorrect URLs or causing issues with client requests. + +4. **Identify broken links**: If there is a high number of 404 errors, use a broken link checker tool to identify and fix any dead links on your website or other websites that redirect to your website. + +5. **Monitor server performance**: Keep an eye on the web server's performance metrics to ensure that changes in client errors do not negatively impact server performance or resource usage. + +### Useful resources + +1. [RFC 2616 - HTTP/1.1 Status Code Definitions](https://datatracker.ietf.org/doc/html/rfc2616#section-10.4) +2. [Mozilla - HTTP Status Codes - Client Error Responses](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#client_error_responses) +3. [Broken Link Checker Tools](https://www.google.com/search?q=broken+link+checker) diff --git a/health/guides/web_log/1m_internal_errors.md b/health/guides/web_log/1m_internal_errors.md new file mode 100644 index 000000000..64a1ce081 --- /dev/null +++ b/health/guides/web_log/1m_internal_errors.md @@ -0,0 +1,24 @@ +### Understand the alert + +This alert indicates that there has been an increase in the number of HTTP 5XX server errors in the last minute. These errors typically indicate a problem with the server's ability to process requests, such as misconfigurations, overloaded resources, or other server-side issues. + +### Troubleshoot the alert + +1. **Inspect server logs**: Check the server error logs for any error messages, warnings, or unusual patterns. For Apache and Nginx, the error logs are usually found under `/var/log/{apache2, nginx}/error.log`. Analyze the logs to identify potential issues with the server, such as misconfigurations or resource limitations. + +2. **Check .htaccess file**: If you're using Apache, examine the `.htaccess` file for any misconfigurations or incorrect settings. Ensure that the directives in the file are valid and properly formatted. If necessary, temporarily disable the `.htaccess` file to see if it resolves the issue. + +3. **Review server resources**: Monitor the server's CPU, RAM, and disk usage to determine if the server is experiencing resource limitations. High resource usage can lead to server errors, as the server may be unable to handle incoming requests. Consider upgrading your server resources or optimizing the server for better performance. + +4. **Examine server software**: Check for any issues with the server software, such as outdated versions, security vulnerabilities, or software bugs. Update your server software to the latest version and apply any necessary patches to resolve potential issues. + +5. **Monitor third-party services**: If your server relies on third-party services or APIs, verify that these services are functioning correctly. Server errors may occur if your server is unable to communicate with these services or if they are experiencing downtime. + +6. **Test server functionality**: Use tools such as `curl` or web browser developer tools to send HTTP requests to your server and examine the responses. This can help you identify specific issues with the server, such as incorrect response headers or missing resources. + +### Useful resources + +1. [Apache HTTP Server Documentation](https://httpd.apache.org/docs/) +2. [Nginx Documentation](https://nginx.org/en/docs/) +3. [Mozilla Developer Network - HTTP Status Codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) + diff --git a/health/guides/web_log/1m_successful.md b/health/guides/web_log/1m_successful.md new file mode 100644 index 000000000..abe790086 --- /dev/null +++ b/health/guides/web_log/1m_successful.md @@ -0,0 +1,23 @@ +### Understand the alert + +This alert is triggered when the percentage of successful HTTP requests (1xx, 2xx, 304, 401 response codes) within the last minute falls below a certain threshold. A warning state occurs when the success rate is below 85%, and a critical state occurs when it falls below 75%. This alert can indicate a malfunction in your web server's services, malicious activity towards your website, or broken links. + +### Troubleshoot the alert + +1. **Analyze response codes**: Identify the specific HTTP response codes your web server is sending to clients. Use the Netdata dashboard and inspect the `detailed_response_codes` chart for your web server to track the error codes being sent. + +2. **Check server logs**: Review the web server logs to identify any issues, patterns, or errors causing the decrease in successful requests. Investigate any unusual or unexpected response codes. + +3. **Inspect application logs**: Check the logs of applications running on your web server for any errors or issues that might be affecting the success rate of HTTP requests. + +4. **Verify server resources**: Ensure your server has adequate resources (CPU, RAM, disk space) to handle the workload, as resource limitations can impact the success rate of HTTP requests. + +5. **Review server configuration**: Check your web server's configuration for any misconfigurations, incorrect permissions, or improper settings that may be causing the issue. + +6. **Monitor security**: Look for signs of malicious activity, such as a high number of requests from a specific IP address or a sudden spike in requests. Implement security measures, such as rate limiting, IP blocking, or Web Application Firewalls (WAF), if necessary. + +### Useful resources + +1. [HTTP status codes on Mozilla](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) +2. [Apache HTTP Server Documentation](https://httpd.apache.org/docs/) +3. [Nginx Documentation](https://nginx.org/en/docs/) diff --git a/health/guides/web_log/web_log_10m_response_time.md b/health/guides/web_log/web_log_10m_response_time.md new file mode 100644 index 000000000..603482a9b --- /dev/null +++ b/health/guides/web_log/web_log_10m_response_time.md @@ -0,0 +1,42 @@ +### Understand the alert + +This alert calculates the average `HTTP response time` of your web server over the last 10 minutes. If you receive this alert, it means that the `latency` of your web server has increased, and might be affecting the user experience. + +### What does HTTP response time mean? + +`HTTP response time` is a measure of the time it takes for your web server to process a request and deliver the corresponding response to the client. A high response time can lead to slow loading pages, indicating that your server is struggling to handle the requests or there are issues with the network. + +### Troubleshoot the alert + +1. **Check the server load**: A high server load can cause increased latency. Check the server load using tools like `top`, `htop`, or `glances`. If server load is high, consider optimizing your server, offloading some services to a separate server, or scaling up your infrastructure. + + ``` + top + ``` + +2. **Analyze the web server logs**: Look for patterns or specific requests that may be causing the increased latency. This can be achieved by parsing logs and correlating the response time with requests. For example, for Apache logs: + + ``` + sudo cat /var/log/apache2/access.log | awk '{print $NF " " $0}' | sort -nr | head -n 10 + ``` + + For Nginx logs: + + ``` + sudo cat /var/log/nginx/access.log | awk '{print $NF " " $0}' | sort -nr | head -n 10 + ``` + +3. **Network issues**: Check if there are any issues with the network connecting your server to the clients, such as high latency, packet loss or a high number of dropped packets. You can use the `traceroute` command to diagnose any network-related issues. + + ``` + traceroute example.com + ``` + +4. **Review your server's configuration**: Check your web server's configuration for any issues, misconfigurations, or suboptimal settings that may be causing the high response time. + +5. **Monitoring and profiling**: Use application monitoring tools like New Relic, AppDynamics, or Dynatrace to get detailed insights about the response time and locate any bottlenecks or problematic requests. + +### Useful resources + +1. [How to Optimize Nginx Performance](https://calomel.org/nginx.html) +2. [Apache Performance Tuning](https://httpd.apache.org/docs/2.4/misc/perf-tuning.html) diff --git a/health/guides/web_log/web_log_1m_bad_requests.md b/health/guides/web_log/web_log_1m_bad_requests.md new file mode 100644 index 000000000..a296c90e6 --- /dev/null +++ b/health/guides/web_log/web_log_1m_bad_requests.md @@ -0,0 +1,27 @@ +### Understand the alert + +HTTP response status codes indicate whether a specific HTTP request has been successfully completed or not. + +The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server should include an entity containing an explanation of +the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. + +The Netdata Agent calculates the ratio of client error HTTP requests over the last minute. This metric does not include the 401 errors. + + +### Troubleshoot the alert + +To identify the HTTP response code your web server sends back: + +1. Open the Netdata dashboard. +2. Inspect the `detailed_response_codes` chart for your web server. This chart keeps track of exactly what error codes your web server sends out. + +You should also check server logs for more details about how the server is handling the requests. For example, web servers such as Apache or Nginx produce two files called access.log and error.log (by default under `/var/log/{nginx, apache2}/{access.log, error.log}`) + +3. Troubleshoot 404 codes on the server side + +The 404 requests indicate outdated links on your website or in other websites that redirect to your website. To check for dead links on your on website, use a `broken link checker` software periodically. + +### Useful resources + +1. [https://datatracker.ietf.org/doc/html/rfc2616#section-10.4](https://datatracker.ietf.org/doc/html/rfc2616#section-10.4) +2. [https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#client_error_responses](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#client_error_responses) \ No newline at end of file diff --git a/health/guides/web_log/web_log_1m_internal_errors.md b/health/guides/web_log/web_log_1m_internal_errors.md new file mode 100644 index 000000000..6eff7c68a --- /dev/null +++ b/health/guides/web_log/web_log_1m_internal_errors.md @@ -0,0 +1,31 @@ +### Understand the alert + +This alert is generated by the Netdata Agent when monitoring web server logs. This alert is triggered when the web server has experienced an unusually high number of internal errors (HTTP status codes 5xx) within the last minute. Internal errors indicate that there is an issue with the server or the application running on it, which is causing the server to fail in processing client requests. + +### Troubleshoot the alert + +1. **Check the web server logs**: Inspect the web server logs to identify the specific internal errors and any patterns that might be causing the issue. Depending on the web server you are using (e.g., Apache, Nginx, etc.), the log files will be located in different directories. You can usually find the logs in the following locations: + + - Apache: `/var/log/apache2/` (Debian/Ubuntu) or `/var/log/httpd/` (RHEL/CentOS) + - Nginx: `/var/log/nginx/` + + To view the logs in real-time, you can use the `tail` command: + + ``` + tail -f /path/to/your/log/directory/access.log + ``` + +2. **Analyze the application logs**: If you have an application running on the web server (e.g., PHP, Node.js, Python), check the application logs for any errors or issues that might be causing the internal errors. + +3. **Verify server resources**: Ensure that your server has enough resources (CPU, RAM, disk space) to handle the current workload. High resource utilization can lead to internal errors. You can use Netdata's dashboard to monitor the server resources in real-time. + +4. **Check server configuration**: Review the web server's configuration files for any misconfigurations or settings that may be causing the issue. For example, incorrect permissions, wrong file paths, or improper configurations can lead to internal errors. + +5. **Inspect application code**: Review your application code to identify any bugs, memory leaks, or issues that could be causing the internal errors. If you recently deployed new code or made changes, consider rolling back to a previous version to see if the issue persists. + +6. **Monitor web server metrics**: Keep an eye on the web server's metrics, such as response times and request rates, to identify any performance bottlenecks or potential issues that may be causing the internal errors. + +### Useful resources + +1. [Server errors on Datatracker](https://datatracker.ietf.org/doc/html/rfc2616#section-10.5) +2. [HTTP server errors on Mozilla](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#server_error_responses) diff --git a/health/guides/web_log/web_log_1m_redirects.md b/health/guides/web_log/web_log_1m_redirects.md new file mode 100644 index 000000000..663f04f5f --- /dev/null +++ b/health/guides/web_log/web_log_1m_redirects.md @@ -0,0 +1,22 @@ +### Understand the alert + +HTTP response status codes indicate whether a specific HTTP request has been successfully completed or not. + +The 3XX class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required may be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. A client SHOULD detect infinite redirection loops, since such loops generate network traffic for each redirection. + +The Netdata Agent calculates the ratio of redirection HTTP requests over the last minute. This metric does not include the "304 Not modified" message. + +### Troubleshoot the alert + +You can identify exactly what HTTP response code your web server send back to your clients, by opening the Netdata dashboard and inspecting the `detailed_response_codes` chart for your web server. This chart keeps +track of exactly what error codes your web server sends out. + +You should also check the server error logs. For example, web servers such as Apache or Nginx produce and error logs, by default under `/var/log/{nginx, apache2}/{access.log, error.log}` + +### Useful resources + +1. [3XX codes in the HTTP protocol](https://datatracker.ietf.org/doc/html/rfc2616#section-10.3) + +2. [HTTP redirection messages on Mozilla](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#redirection_messages) + + diff --git a/health/guides/web_log/web_log_1m_requests.md b/health/guides/web_log/web_log_1m_requests.md new file mode 100644 index 000000000..230aa8c8e --- /dev/null +++ b/health/guides/web_log/web_log_1m_requests.md @@ -0,0 +1,31 @@ +### Understand the alert + +This alert monitors the number of HTTP requests received by your web server in the last minute. If you receive this alert, it means that there is an increase in the workload on your web server. + +### What does the number of HTTP requests mean? + +HTTP requests are messages sent by clients (like web browsers) to the server to request various resources, such as web pages, images, scripts, and more. An increase in the number of HTTP requests means that there are more clients accessing your web server, which can result in increased resource usage, decreased response times, or potential overloading. + +### Troubleshoot the alert + +1. Determine if the increase in requests is legitimate or malicious: + + - Review traffic logs to see if the increase in requests is coming from legitimate users or search engine bots, or if it is potentially malicious traffic resulting from bots, crawlers, or DDoS attacks. + +2. Analyze server logs for anomalies or abnormal request patterns: + + - Look for sudden spikes, repeating requests, or any other suspicious patterns in the server logs. You may use tools like `grep`, `awk`, or web server-specific log analyzers to help with this. + +3. Check server resources and response times: + + - Monitor your server's CPU, memory, and disk usage to see if the increased requests are causing resource strains or degradations in server performance. + - Use tools like `top`, `htop`, `vmstat`, or monitoring applications for your specific web server software (e.g., `apachetop` for Apache) to help identify the source of the problem. + +4. Optimize web server performance: + + - If you find that the increase in requests is legitimate, consider optimizing the web server by enabling caching, improving database query performance, or upgrading hardware and server resources to handle the increased demand. + +5. Implement security measures: + + - If you have determined that the increase in requests is coming from malicious sources, consider implementing security measures such as rate-limiting, IP blocking, or configuring a Web Application Firewall (WAF). + diff --git a/health/guides/web_log/web_log_1m_successful.md b/health/guides/web_log/web_log_1m_successful.md new file mode 100644 index 000000000..b97515388 --- /dev/null +++ b/health/guides/web_log/web_log_1m_successful.md @@ -0,0 +1,23 @@ +### Understand the alert + +HTTP response status codes indicate whether a specific HTTP request has been successfully completed or not. + +The Netdata Agent calculates the ratio of successful HTTP requests over the last minute. These requests consist of 1xx, 2xx, 304, 401 response codes. You receive this alert in warning when the percentage of successful requests is less than 85% and in critical when it is below 75%. This alert can indicate: + +- A malfunction in the services of your web server +- Malicious activity towards your website +- Broken links towards your servers. + +In most cases, Netdata will send you another alert indicating high incidences of "abnormal" HTTP requests code, for example you could also receive the `web_log_1m_bad_requests` alert. + +### Troubleshoot the alert + +There are a number of reasons triggering this alert. All of them could eventually cause bad user experience with your web services. + +Identify exactly what HTTP response code your web server sent back to your clients. + +Open the Netdata dashboard and inspect the `detailed_response_codes` chart for your web server. This chart keeps track of exactly what error codes your web server sends out. + +### Useful resources + +1. [HTTP status codes on Mozilla](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) \ No newline at end of file diff --git a/health/guides/web_log/web_log_1m_total_requests.md b/health/guides/web_log/web_log_1m_total_requests.md new file mode 100644 index 000000000..c867cfbf6 --- /dev/null +++ b/health/guides/web_log/web_log_1m_total_requests.md @@ -0,0 +1,36 @@ +### Understand the alert + +This alert calculates the total number of HTTP requests received by the web server in the last minute. If you receive this alert, it means that your web server is experiencing an increase in workload, which might affect its performance or availability. + +### What does an increase in workload mean? + +An increase in workload means that your web server is handling more traffic than usual, or there might be an unexpected spike in the number of HTTP requests received. This might be because of a variety of reasons, like marketing campaigns, product promotions, or even a sudden surge in user demand. + +### Troubleshoot the alert + +1. Analyze web traffic logs + + To understand the reason behind the increased workload, the first step is to analyze the web server traffic logs. Look for any patterns, specific time intervals, or specific user agents that are contributing to the high number of requests. + +2. Check the web server performance + + Monitoring web server performance metrics like CPU usage, memory usage, and disk space can provide insight into the resource utilization. Use tools like `top`, `vmstat`, `iostat`, and `free` for this assessment. + +3. Monitor response times + + Checking the response time statistics, like average response time and peak response time, can help to understand if the server is struggling to serve the high number of requests. Tools like `apachetop` or `logstash` can be used to track this information. + +4. Evaluate server scaling options + + If none of the previous steps help to identify or resolve the issue, it might be time to consider scaling options. If the server is unable to handle the increased workload, vertically or horizontally scaling the system can help. + +5. Investigate application-level issues + + Application-level issues might also be the reason for high web server traffic. Profiling the web application, checking for slow database queries, or inefficient scripts can help to identify and resolve performance issues. + +### Useful resources + +1. [Analyzing Web server logs with ApacheTop](https://www.howtoforge.com/how-to-analyze-apache-web-server-logs-apachetop) +2. [Logstash Guide: Analyzing Logs](https://www.elastic.co/guide/en/logstash/current/logstash-intro.html) +3. [Web Application Performance Monitoring with New Relic](https://newrelic.com/platform/web-application-monitoring) +4. [Vertically or Horizontally Scaling Your Web Server](https://www.digitalocean.com/community/tutorials/5-common-server-setups-for-your-web-application) \ No newline at end of file diff --git a/health/guides/web_log/web_log_1m_unmatched.md b/health/guides/web_log/web_log_1m_unmatched.md new file mode 100644 index 000000000..933316493 --- /dev/null +++ b/health/guides/web_log/web_log_1m_unmatched.md @@ -0,0 +1,25 @@ +### Understand the alert + +In a webserver, all activity should be monitored. By default, most of the webservers log activity in an `access.log` file. The access log is a list of all requests for individual files that people or bots have requested from a website. Log File strings include notes about their requests for the HTML files and their embedded graphic images, along with any other associated files that are transmitted. + +The Netdata Agent calculates the percentage of unparsed log lines over the last minute. These are entries in the log file that didn't match in any of the common pattern operations (1XX, 2XX, etc) of the webserver. This can indicate an abnormal activity on your web server, or that your server is performing operations that you cannot monitor with the Agent. + +Web servers like NGINX and Apache2 give you the ability to modify the log patterns for each request. If you have done that, you also need to adjust the Netdata Agent to parse those patterns. + +### Troubleshoot the alert + +- Create a custom log format job + +You must create a new job in the `web_log` collector for your Agent. + +1. See how you can [configure this collector](https://learn.netdata.cloud/docs/agent/collectors/python.d.plugin/web_log#configuration) + +2. Follow the job template specified in the [default web_log.conf file](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/web_log/web_log.conf#L53-L86), focus on the lines [83:85](https://github.com/netdata/netdata/blob/e6d9fbc4a53f1d35363e9b342231bb11627bafbd/collectors/python.d.plugin/web_log/web_log.conf#L83-L85) where you can see how you define a `custom_log_format`. + +3. Restart the Netdata Agent + ``` + systemctl restart netdata + ``` + + + diff --git a/health/guides/web_log/web_log_5m_requests_ratio.md b/health/guides/web_log/web_log_5m_requests_ratio.md new file mode 100644 index 000000000..e2cf46f16 --- /dev/null +++ b/health/guides/web_log/web_log_5m_requests_ratio.md @@ -0,0 +1,34 @@ +### Understand the alert + +The `web_log_5m_requests_ratio` alert indicates that there is a significant increase in the number of successful HTTP requests to your web server in the last 5 minutes compared to the previous 5 minutes. This alert is important for monitoring sudden traffic surges, which can potentially overload your server. + +### Troubleshoot the alert + +1. Check the source of the increased traffic + Use web server logs to determine the source of the increased traffic. Identify if the requests are coming from a specific IP address, group of IP addresses, or even bots. + + For example, for Nginx, you can check the log files at `/var/log/nginx/access.log`. For Apache, the logs can be found at `/var/log/apache2/access.log`. + +2. Analyze the requests + Look at the type of requests (GET, POST, etc.) and the requested resources (URLs). This analysis can help you understand if the increase in traffic is legitimate or if it's due to an issue like a DDoS attack or a web crawler. + +3. Monitor server performance + Use monitoring tools like `top`, `iotop`, or Netdata itself to check your server's performance metrics. Keep an eye on CPU, RAM, and disk usage to ensure that the server is not getting overloaded. + +4. Optimize server resources and configuration + If you find that the traffic increase is legitimate and your server is struggling to handle the load, consider optimizing your server resources and configuration. Techniques include: + + - Increasing server resources (CPU, RAM, disk) + - Using a caching mechanism + - Load balancing and scaling out your infrastructure + - User connection rate limiting and request throttling + +5. Mitigate potential attacks + If the analysis reveals that the increase in traffic is due to a DDoS attack, implement mitigation strategies like firewalls, IP blocking, or using a web application firewall (WAF). Ensure that you have a robust security system in place to protect your server from such attacks. + +### Useful resources + +1. [How to Manage Sudden Traffic Surges and Server Overload](https://www.nginx.com/blog/how-to-manage-sudden-traffic-surges-server-overload/) +2. [Attacks on Network Infrastructure](https://www.cloudflare.com/learning/ddos/ddos-attacks/) +3. [Using Nginx to Rate Limit IP Addresses](https://calomel.org/nginx.html) +4. [Setting up a Super Fast Apache Server with Cache](https://hostadvice.com/how-to/how-to-configure-apache-web-server-cache-on-ubuntu/) \ No newline at end of file diff --git a/health/guides/web_log/web_log_5m_successful.md b/health/guides/web_log/web_log_5m_successful.md new file mode 100644 index 000000000..5c5b2c4e6 --- /dev/null +++ b/health/guides/web_log/web_log_5m_successful.md @@ -0,0 +1,36 @@ +### Understand the alert + +This alert monitors the average number of successful HTTP requests per second, over the last 5 minutes (`web_log.type_requests`). If you receive this alert, it means that there has been a significant change in the number of successful HTTP requests to your web server. + +### What does successful HTTP request mean? + +A successful HTTP request is one that receives a response with an HTTP status code in the range of `200-299`. In other words, these requests have been processed correctly by the web server and returned the expected results to the client. + +### Troubleshoot the alert + +1. Check your web server logs + + Inspect your web server logs for any abnormal activity or issues that might have led to increased or decreased successful HTTP requests. Depending on your web server (e.g., Apache, Nginx), the location of the logs will vary. + +2. Analyze the type of requests + + Check the logs for request types (e.g., GET, POST, PUT, DELETE) and their corresponding distribution during the time of the alert. This might help you identify a pattern or source of the issue. + +3. Monitor web server resources + + Use monitoring tools like `top`, `htop`, or `glances` to check the resource usage of your web server during the alert period. High resource usage may indicate that your server is struggling to handle the load, causing an abnormal number of successful HTTP requests. + +4. Verify client connections + + Investigate the IP addresses and user agents that are making a significant number of requests during the alert period. If there's a spike in requests from a single or a few IPs, it could be a sign of a coordinated attack, excessive crawling, or other unexpected behavior. + +5. Check your web application + + Make sure that your web application is functioning well and generating the expected response for clients, which can impact successful HTTP requests. + +### Useful resources + +1. [Apache Log Files](https://httpd.apache.org/docs/current/logs.html) +2. [Nginx Log Files](https://nginx.org/en/docs/ngx_core_module.html#error_log) +3. [Introduction to Identifying Security Vulnerabilities in Web Servers](https://www.acunetix.com/blog/articles/introduction-identifying-security-vulnerabilities-web-servers) +4. [Web Application Performance Analysis and Monitoring](https://www.site24x7.com/learning/web-application-performance.html) \ No newline at end of file diff --git a/health/guides/web_log/web_log_5m_successful_old.md b/health/guides/web_log/web_log_5m_successful_old.md new file mode 100644 index 000000000..bbee58a42 --- /dev/null +++ b/health/guides/web_log/web_log_5m_successful_old.md @@ -0,0 +1,29 @@ +### Understand the alert + +This alert, `web_log_5m_successful_old`, calculates the average number of successful HTTP requests per second for the 5 minutes starting 10 minutes ago. If you receive this alert, it means that there might be a significant change in the number of requests your web server is serving. + +### What does the alert mean? + +The alert is useful for understanding the workload on your web server based on historical request data. It helps to ensure that the web server is functioning as expected and can handle the current number of users without negatively impacting their experience. + +### Troubleshoot the alert + +To troubleshoot this alert, follow these steps: + +1. **Check the current number of successful HTTP requests** to compare with the historical data of the alert. You can use Netdata's web dashboard to see the current requests rate in real-time. If the number of requests has increased significantly, it might indicate a potential issue. + +2. **Identify any potential issues or errors on your web server.** Check the server's error logs for any signs of abnormal behavior or error messages. This can help you determine if there are any underlying issues causing the increase in requests. + +3. **Analyze the user traffic** to understand the cause of the increase in successful requests. This could be caused by a sudden spike in website visitors, a DDoS attack, or the introduction of new and popular content on your website. You can use tools like Google Analytics or server access logs to get detailed information about user traffic. + +4. **Review server resources and performance** to ensure the web server has adequate resources to handle the request load. If the number of requests is higher than usual, check the server's CPU usage, memory usage, and network bandwidth to ensure optimal performance. + +5. **Evaluate server configuration** to check for any misconfigurations, outdated software, or resource limitations that may impact the handling of requests. Update or adjust configurations as necessary to improve the web server's performance. + +6. **Monitor and take necessary actions** based on your findings. If the increase in successful requests is a result of legitimate traffic, ensure that your web server can handle the extra load. If the traffic is malicious or the result of an attack, consider implementing security measures like rate-limiting or blocking IPs. + +### Useful resources + +1. [Monitoring Web Server Performance with Netdata](https://www.netdata.cloud/webserver-monitoring/) +2. [How to Analyze Access Logs](https://www.scalyr.com/blog/analyze-access-logs/) +3. [Optimizing Web Server Performance](https://www.keycdn.com/blog/web-server-performance) diff --git a/health/guides/web_log/web_log_web_slow.md b/health/guides/web_log/web_log_web_slow.md new file mode 100644 index 000000000..7ed3ebe1f --- /dev/null +++ b/health/guides/web_log/web_log_web_slow.md @@ -0,0 +1,48 @@ +### Understand the alert + +The `web_log_web_slow` alert is triggered when the average HTTP response time of your web server (NGINX, Apache) has increased over the last minute. It indicates that your web server's performance might be affected, resulting in slow response times for client requests. + +### Troubleshoot the alert + +There are several factors that can cause slow web server performance. To troubleshoot the `web_log_web_slow` alert, examine the following areas: + +1. **Monitor web server utilization:** + + Use monitoring tools like `top`, `htop`, or `glances` to check the CPU, memory, and traffic utilization of your web server. If you find high resource usage, consider taking action to address the issue: + - Increase your server's resources (CPU, memory) or move to a more powerful machine. + - Adjust the web server configuration to use more worker processes or threads. + - Implement load balancing across multiple web servers to distribute the traffic load. + +2. **Optimize databases:** + + Slow database performance can directly impact web server response times. Monitor and optimize your database to improve response speeds: + - Check for slow or inefficient queries and optimize them. + - Regularly clean and optimize your database by removing outdated or unnecessary data, and by using tools like `mysqlcheck` or `pg_dump`. + - Enable database caching for faster results on recurring queries. + +3. **Configure caching:** + + Implement browser or server-side caching to reduce the load on your web server and speed up content delivery: + - Enable browser caching using proper cache-control headers in your server configuration. + - Implement server-side caching with tools like Varnish or use full-page caching in your web server (NGINX FastCGI cache, Apache mod_cache). + +4. **Examine web server logs:** + + Analyze your web server logs to identify specific requests or resources that may be causing slow responses. Tools like `goaccess` or `awstats` can help you analyze web server logs and identify issues: + - Check for slow request URIs or resources and optimize them. + - Identify slow third-party services, such as CDNs, external APIs, or database connections, and troubleshoot these connections as needed. + +5. **Optimize web server configuration:** + + Review your web server's configuration settings to ensure optimal performance: + - Ensure that your web server is using the latest stable version for performance improvements and security updates. + - Disable unnecessary modules or features to reduce resource usage. + - Review and optimize settings related to timeouts, buffer sizes, and compression for better performance. + +### Useful resources + +1. [Optimizing NGINX for Performance](https://easyengine.io/tutorials/nginx/performance/) +2. [Apache Performance Tuning](https://httpd.apache.org/docs/2.4/misc/perf-tuning.html) +3. [Top 10 MySQL Performance Tuning Tips](https://www.databasejournal.com/features/mysql/top-10-mysql-performance-tuning-tips.html) +4. [10 Tips for Optimal PostgreSQL Performance](https://www.digitalocean.com/community/tutorials/10-tips-for-optimizing-postgresql-performance-on-a-digitalocean-droplet) +5. [A Beginner's Guide to HTTP Cache Headers](https://www.keycdn.com/blog/http-cache-headers) \ No newline at end of file -- cgit v1.2.3