Diffstat:
 -rw-r--r--  health/guides/anomalies/anomalies_anomaly_flags.md          | 30 +
 -rw-r--r--  health/guides/anomalies/anomalies_anomaly_probabilities.md  | 30 +
2 files changed, 60 insertions, 0 deletions
diff --git a/health/guides/anomalies/anomalies_anomaly_flags.md b/health/guides/anomalies/anomalies_anomaly_flags.md
new file mode 100644
index 00000000..d4ffa164
--- /dev/null
+++ b/health/guides/anomalies/anomalies_anomaly_flags.md
@@ -0,0 +1,30 @@
+### Understand the alert
+
+This alert, `anomalies_anomaly_flags`, is triggered when the Netdata Agent detects more than 10 anomalies in the past 2 minutes. Anomalies are events or observations that differ significantly from the majority of the data, raising suspicions about potential issues.
+
+### What does an anomaly mean?
+
+An anomaly is an unusual pattern, behavior, or event in your system's operations. These occurrences are typically unexpected and can be either positive or negative. In the context of this alert, the anomalies are most likely related to performance issues, such as a sudden spike in CPU usage, disk I/O, or network activity.
+
+### Troubleshoot the alert
+
+1. Identify the source of the anomalies:
+
+   To understand the cause of these anomalies, examine the charts in the Netdata dashboard for potential performance issues. Look for sudden spikes, drops, or other irregular patterns in CPU usage, memory usage, disk I/O, and network activity.
+
+2. Check for any application or system errors:
+
+   Review system and application log files to detect any errors or warnings that may be related to the anomalies. Be sure to check the logs of your applications, services, and databases for error messages or unusual behavior.
+
+3. Monitor resource usage:
+
+   You can use the Anomalies tab in Netdata to dive deeper into what could be triggering anomalies in your infrastructure.
+
+4. Adjust thresholds or address the underlying issue:
+
+   If the anomalies are due to normal variations in your system's operation or expected spikes in resource usage, consider adjusting the threshold for this alert to avoid false positives. If the anomalies indicate an actual problem or point to a misconfiguration, take appropriate action to address the root cause.
+
+5. Observe the results:
+
+   After implementing changes or adjustments, continue monitoring the system with Netdata and other tools to ensure the anomalies are resolved and do not persist.
+
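Step 4 of the guide above suggests raising the alert threshold when the flagged anomalies turn out to be normal variation. A rough sketch of what that looks like in practice is shown below; the file name `health.d/anomalies.conf`, the stanza fields, and the new threshold value are assumptions about a typical Netdata install, not part of this change, so verify them against your installation:

```sh
# Open the health configuration that defines the anomaly alerts.
# /etc/netdata is the default config directory; adjust if yours differs.
cd /etc/netdata
sudo ./edit-config health.d/anomalies.conf

# Inside the file, look for a stanza similar to the one below and raise
# the warning threshold (exact fields and defaults may vary by version):
#
#   template: anomalies_anomaly_flags
#         on: anomalies.anomaly
#     lookup: sum -2m foreach *
#       warn: $this > 20      # example: raised from 10 to cut false positives
#
# Apply the change without restarting the Agent:
sudo netdatacli reload-health
```
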
diff --git a/health/guides/anomalies/anomalies_anomaly_probabilities.md b/health/guides/anomalies/anomalies_anomaly_probabilities.md
new file mode 100644
index 00000000..cea04a43
--- /dev/null
+++ b/health/guides/anomalies/anomalies_anomaly_probabilities.md
@@ -0,0 +1,30 @@
+### Understand the alert
+
+This alert, `anomalies_anomaly_probabilities`, is generated by the Netdata Agent when the average anomaly probability over the last 2 minutes exceeds 50%. An anomaly probability is a value calculated by the machine learning (ML) component in Netdata, aiming to detect unusual events or behavior in system metrics.
+
+### What is anomaly probability?
+
+Anomaly probability is a percentage calculated by Netdata's ML feature that represents the probability of an observed metric value being an anomaly. A higher anomaly probability indicates a higher chance that the system's behavior is deviating from its historical patterns or expected behavior.
+
+### What does an average anomaly probability above 50% mean?
+
+An average anomaly probability above 50% indicates that there may be unusual events, metrics, or behavior in your monitored system. This does not necessarily indicate an issue, but it points to suspicious deviations in the system metrics that are worth investigating.
+
+### Troubleshoot the alert
+
+1. Investigate the unusual events or behavior
+
+   The first step is to identify the metric(s) or series of metric values that are causing the alert. Look for changes in the monitored metrics, or a combination of metrics, that deviate significantly from their historical patterns.
+
+2. Check system performance and resource usage
+
+   Use the Overview and Anomalies tabs to explore the metrics that could be contributing to the anomalies.
+
+3. Inspect system logs
+
+   System logs can provide valuable information about unusual events or behaviors. Check system logs using tools like `journalctl`, `dmesg`, or `tail` for any error messages, warnings, or critical events that might be related to the anomaly.
+
+4. Review the alert settings
+
+   In some cases, the alert may be caused by overly strict or sensitive settings, leading to false positives. Review the settings and consider adjusting the anomaly probability threshold, if necessary.
+
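Step 3 of the second guide mentions `journalctl`, `dmesg`, and `tail` without showing invocations. A minimal sketch of how they might be combined to inspect the window around the alert follows; the time range, priority filter, and log path are illustrative assumptions, not values taken from this change:

```sh
# System journal entries from the alert window, errors and worse only
journalctl --since "10 minutes ago" -p err

# Recent kernel messages with human-readable timestamps
# (useful for spotting OOM kills, disk errors, NIC resets)
dmesg --ctime | tail -n 50

# Follow an application log while the anomaly is ongoing
# (path is an example; point it at your application's actual log file)
tail -f /var/log/syslog
```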