path: root/src/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
Diffstat (limited to '')
-rw-r--r-- src/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb (renamed from ml/notebooks/netdata_anomaly_detection_deepdive.ipynb) 18
1 files changed, 9 insertions, 9 deletions
diff --git a/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb b/src/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
index 14e4366bb..f939b1317 100644
--- a/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
+++ b/src/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
@@ -19,7 +19,7 @@
}
},
"source": [
- "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/netdata/netdata/blob/master/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb)"
+ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/netdata/netdata/blob/master/src/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb)"
]
},
{
@@ -30,7 +30,7 @@
}
},
"source": [
- "This notebook will walk through a simplified python based implementation of the C & C++ code in [`netdata/netdata/ml/`](https://github.com/netdata/netdata/tree/master/ml) used to power the [anomaly detection capabilities](https://github.com/netdata/netdata/blob/master/ml/README.md) of the Netdata agent.\n",
+ "This notebook will walk through a simplified python based implementation of the C & C++ code in [`netdata/netdata/ml/`](https://github.com/netdata/netdata/tree/master/src/ml) used to power the [anomaly detection capabilities](https://github.com/netdata/netdata/blob/master/src/ml/README.md) of the Netdata agent.\n",
"\n",
"The main goal here is to help interested users learn more about how the machine learning works under the hood. If you just want to get started by enabling ml on your agent you can check out these [simple configuration steps](https://learn.netdata.cloud/docs/agent/ml#configuration). \n",
"\n",
@@ -155,7 +155,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "A full list of all the anomaly detection configuration parameters, and descriptions of each, can be found in the [configuration](https://github.com/netdata/netdata/blob/master/ml/README.md#configuration) section of the [ml readme](https://github.com/netdata/netdata/blob/master/ml/README.md).\n",
+ "A full list of all the anomaly detection configuration parameters, and descriptions of each, can be found in the [configuration](https://github.com/netdata/netdata/blob/master/src/ml/README.md#configuration) section of the [ml readme](https://github.com/netdata/netdata/blob/master/src/ml/README.md).\n",
"\n",
"Below we will focus on some basic params to decide what data to pull and the main ml params of importance in understanding how it all works.\n",
"\n",
@@ -169,13 +169,13 @@
"- `num_samples_to_lag`: The number of previous values to also include in our feature vector.\n",
"\n",
"#### anomaly score related parameters:\n",
- "- `dimension_anomaly_score_threshold`: The threshold on the anomaly score, above which the data it considered anomalous and the [anomaly bit](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-bit) is set to 1 (its actually set to 100 in reality but this just to make it behave more like a rate when aggregated in the netdata agent api). By default this is `0.99` which means anything with an anomaly score above 99% is considered anomalous. Decreasing this threshold makes the model more sensitive and will leave to more anomaly bits, increasing it does the opposite.\n",
+ "- `dimension_anomaly_score_threshold`: The threshold on the anomaly score, above which the data it considered anomalous and the [anomaly bit](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-bit) is set to 1 (its actually set to 100 in reality but this just to make it behave more like a rate when aggregated in the netdata agent api). By default this is `0.99` which means anything with an anomaly score above 99% is considered anomalous. Decreasing this threshold makes the model more sensitive and will leave to more anomaly bits, increasing it does the opposite.\n",
"\n",
"#### model parameters:\n",
"- `n_clusters_per_dimension`: This is the number of clusters to fit for each model, by default it is set to 2 such that 2 cluster [centroids](https://en.wikipedia.org/wiki/Centroid) will be fit for each model.\n",
"- `max_iterations`: The maximum number of iterations the fitting of the clusters is allowed to take. In reality the clustering will converge a lot sooner than this.\n",
"\n",
- "**Note**: There is much more detailed discussion of all there configuration parameters in the [\"Configuration\"](https://github.com/netdata/netdata/blob/master/ml/README.md#configuration) section of the ml readme."
+ "**Note**: There is much more detailed discussion of all there configuration parameters in the [\"Configuration\"](https://github.com/netdata/netdata/blob/master/src/ml/README.md#configuration) section of the ml readme."
]
},
{
@@ -397,7 +397,7 @@
"source": [
"In the plot below it should be clear that the light yellow section of the data has been messed with and is now \"anomalous\" or \"strange looking\" in comparison to all the data that comes before it. \n",
"\n",
- "Our goal now is to create some sort of [anomaly score](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-score) that can easily capture this."
+ "Our goal now is to create some sort of [anomaly score](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-score) that can easily capture this."
]
},
{
@@ -444,7 +444,7 @@
"\n",
"In this notebook we will just use good old [kmeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) from [scikit-learn](https://scikit-learn.org/stable/index.html). \n",
"\n",
- "In reality the Netdata Agent uses the awesome [dlib](https://github.com/davisking/dlib) c++ library and the [`find_clusters_using_kmeans`](http://dlib.net/ml.html#find_clusters_using_kmeans) function along with a few others. You can see the Netdata KMeans code [here](https://github.com/netdata/netdata/blob/master/ml/kmeans/KMeans.cc).\n",
+ "In reality the Netdata Agent uses the awesome [dlib](https://github.com/davisking/dlib) c++ library and the [`find_clusters_using_kmeans`](http://dlib.net/ml.html#find_clusters_using_kmeans) function along with a few others. You can see the Netdata KMeans code [here](https://github.com/netdata/netdata/blob/master/src/ml/kmeans/KMeans.cc).\n",
"\n",
"The code below:\n",
"\n",
@@ -780,7 +780,7 @@
"source": [
"Now that we have our raw data, our anomaly scores, and our anomaly bits - we can plot this all side by side to get a clear picture of how it all works together.\n",
"\n",
- "In the plots below we see that during the light yellow \"anomalous\" period the \"[anomaly scores](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-score)\" get elevated to such an extend that many \"[anomaly bits](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-bit)\" start flipping from 0 to 1 and essentially \"turning on\" to signal potentially anomalous data."
+ "In the plots below we see that during the light yellow \"anomalous\" period the \"[anomaly scores](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-score)\" get elevated to such an extend that many \"[anomaly bits](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-bit)\" start flipping from 0 to 1 and essentially \"turning on\" to signal potentially anomalous data."
]
},
{
@@ -883,7 +883,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "The last concept to introduce now is the \"[anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-rate)\" which is really just an average over \"anomaly bits\".\n",
+ "The last concept to introduce now is the \"[anomaly rate](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-rate)\" which is really just an average over \"anomaly bits\".\n",
"\n",
"For example, in the next cell we will just average all the anomaly bits across the light yellow window of time to find the anomaly rate for the metric within this window. "
]
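
The notebook text touched by this diff describes how an anomaly score becomes an anomaly bit (via `dimension_anomaly_score_threshold`, default `0.99`) and how the anomaly rate is an average over those bits. A minimal sketch of that relationship is below. It is not the notebook's or the agent's code: the function names and the sample scores are hypothetical, the scores are assumed to be already normalised to the range 0 to 1, and the strict `>` comparison is an assumption based on the phrase "above which".

```python
from statistics import mean


def anomaly_bits(scores, threshold=0.99):
    """Return 1 for each score strictly above the threshold, else 0.

    The strict comparison is an assumption; the text only says "above".
    """
    return [1 if score > threshold else 0 for score in scores]


def anomaly_rate(bits):
    """Average of the anomaly bits over a window (0.0 for an empty window)."""
    return mean(bits) if bits else 0.0


# Hypothetical, already-normalised anomaly scores for one window of time.
scores = [0.10, 0.35, 0.995, 1.0, 0.20, 0.999, 0.50, 0.98]

bits = anomaly_bits(scores)   # default threshold of 0.99
rate = anomaly_rate(bits)     # fraction of the window flagged as anomalous

# A lower threshold is more sensitive, so it flags at least as many points.
sensitive_bits = anomaly_bits(scores, threshold=0.9)

print(bits, rate, sum(sensitive_bits))
```

With the sample scores, lowering the threshold from `0.99` to `0.9` flags one additional point (`0.98`), which matches the description that decreasing the threshold makes the model more sensitive.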