# Below are some examples of using the `anomaly-bit` option to define alerts based on anomaly
# rates rather than raw metric values. You can read more about the anomaly-bit and Netdata's
# native anomaly detection here:
# https://learn.netdata.cloud/docs/agent/ml#anomaly-bit---100--anomalous-0--normal
# The examples below are commented out; uncomment and adjust them as desired to enable them.
# node level anomaly rate example
# https://learn.netdata.cloud/docs/agent/ml#node-anomaly-rate
# if node level anomaly rate is between 1-5% then warning (pick your own threshold that works best via trial and error).
# if node level anomaly rate is above 5% then critical (pick your own threshold that works best via trial and error).
# template: ml_1min_node_ar
# on: anomaly_detection.anomaly_rate
# os: linux
# hosts: *
# lookup: average -1m foreach anomaly_rate
# calc: $this
# units: %
# every: 30s
# warn: $this > (($status >= $WARNING) ? (1) : (5))
# crit: $this > (($status == $CRITICAL) ? (5) : (100))
# info: rolling 1min node level anomaly rate
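# a rough reading of the warn line above, assuming Netdata's usual hysteresis idiom
# ($status >= $WARNING holds while the alert is already WARNING or CRITICAL); the
# sample values are illustrative only, not taken from a real node:
#   status CLEAR,   $this = 3   -> compared against 5 -> stays CLEAR
#   status CLEAR,   $this = 6   -> compared against 5 -> raises WARNING
#   status WARNING, $this = 3   -> compared against 1 -> stays WARNING
#   status WARNING, $this = 0.5 -> compared against 1 -> clears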
# alert per dimension example
# if anomaly rate is between 5-20% then warning (pick your own threshold that works best via trial and error).
# if anomaly rate is above 20% then critical (pick your own threshold that works best via trial and error).
# template: ml_5min_cpu_dims
# on: system.cpu
# os: linux
# hosts: *
# lookup: average -5m anomaly-bit foreach *
# calc: $this
# units: %
# every: 30s
# warn: $this > (($status >= $WARNING) ? (5) : (20))
# crit: $this > (($status == $CRITICAL) ? (20) : (100))
# info: rolling 5min anomaly rate for each system.cpu dimension
# alert per chart example
# if anomaly rate is between 5-20% then warning (pick your own threshold that works best via trial and error).
# if anomaly rate is above 20% then critical (pick your own threshold that works best via trial and error).
# template: ml_5min_cpu_chart
# on: system.cpu
# os: linux
# hosts: *
# lookup: average -5m anomaly-bit of *
# calc: $this
# units: %
# every: 30s
# warn: $this > (($status >= $WARNING) ? (5) : (20))
# crit: $this > (($status == $CRITICAL) ? (20) : (100))
# info: rolling 5min anomaly rate for system.cpu chart