diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2019-10-13 08:37:32 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2019-10-13 08:38:18 +0000 |
commit | ca540a730c0b880922e86074f994a95b8d413bea (patch) | |
tree | 1364a1b82cfcc68f51aabf9b2545e6a06059d6bb /backends | |
parent | Releasing debian version 1.17.1-1. (diff) | |
download | netdata-ca540a730c0b880922e86074f994a95b8d413bea.tar.xz netdata-ca540a730c0b880922e86074f994a95b8d413bea.zip |
Merging upstream version 1.18.0.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r-- | backends/README.md | 189 | ||||
-rw-r--r-- | backends/WALKTHROUGH.md | 293 | ||||
-rw-r--r-- | backends/aws_kinesis/README.md | 21 | ||||
-rw-r--r-- | backends/mongodb/README.md | 14 | ||||
-rw-r--r-- | backends/opentsdb/README.md | 18 | ||||
-rw-r--r-- | backends/prometheus/README.md | 140 | ||||
-rw-r--r-- | backends/prometheus/remote_write/README.md | 16 |
7 files changed, 360 insertions, 331 deletions
diff --git a/backends/README.md b/backends/README.md index f93e60f56..ac0847dca 100644 --- a/backends/README.md +++ b/backends/README.md @@ -1,26 +1,25 @@ # Metrics long term archiving -Netdata supports backends for archiving the metrics, or providing long term dashboards, -using Grafana or other tools, like this: +Netdata supports backends for archiving the metrics, or providing long term dashboards, using Grafana or other tools, +like this: ![image](https://cloud.githubusercontent.com/assets/2662304/20649711/29f182ba-b4ce-11e6-97c8-ab2c0ab59833.png) -Since Netdata collects thousands of metrics per server per second, which would easily congest any backend -server when several Netdata servers are sending data to it, Netdata allows sending metrics at a lower -frequency, by resampling them. +Since Netdata collects thousands of metrics per server per second, which would easily congest any backend server when +several Netdata servers are sending data to it, Netdata allows sending metrics at a lower frequency, by resampling them. -So, although Netdata collects metrics every second, it can send to the backend servers averages or sums every -X seconds (though, it can send them per second if you need it to). +So, although Netdata collects metrics every second, it can send to the backend servers averages or sums every X seconds +(though, it can send them per second if you need it to). ## features 1. Supported backends - - **graphite** (`plaintext interface`, used by **Graphite**, **InfluxDB**, **KairosDB**, - **Blueflood**, **ElasticSearch** via logstash tcp input and the graphite codec, etc) + - **graphite** (`plaintext interface`, used by **Graphite**, **InfluxDB**, **KairosDB**, **Blueflood**, + **ElasticSearch** via logstash tcp input and the graphite codec, etc) - metrics are sent to the backend server as `prefix.hostname.chart.dimension`. `prefix` is - configured below, `hostname` is the hostname of the machine (can also be configured). + metrics are sent to the backend server as `prefix.hostname.chart.dimension`. `prefix` is configured below, + `hostname` is the hostname of the machine (can also be configured). - **opentsdb** (`telnet or HTTP interfaces`, used by **OpenTSDB**, **InfluxDB**, **KairosDB**, etc) @@ -33,12 +32,12 @@ X seconds (though, it can send them per second if you need it to). - **prometheus** is described at [prometheus page](prometheus/) since it pulls data from Netdata. - **prometheus remote write** (a binary snappy-compressed protocol buffer encoding over HTTP used by - **Elasticsearch**, **Gnocchi**, **Graphite**, **InfluxDB**, **Kafka**, **OpenTSDB**, - **PostgreSQL/TimescaleDB**, **Splunk**, **VictoriaMetrics**, - and a lot of other [storage providers](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage)) + **Elasticsearch**, **Gnocchi**, **Graphite**, **InfluxDB**, **Kafka**, **OpenTSDB**, **PostgreSQL/TimescaleDB**, + **Splunk**, **VictoriaMetrics**, and a lot of other [storage + providers](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage)) - metrics are labeled in the format, which is used by Netdata for the [plaintext prometheus protocol](prometheus/). - Notes on using the remote write backend are [here](prometheus/remote_write/). + metrics are labeled in the format, which is used by Netdata for the [plaintext prometheus + protocol](prometheus/). Notes on using the remote write backend are [here](prometheus/remote_write/). - **AWS Kinesis Data Streams** @@ -54,32 +53,37 @@ X seconds (though, it can send them per second if you need it to). 4. Netdata supports three modes of operation for all backends: - - `as-collected` sends to backends the metrics as they are collected, in the units they are collected. - So, counters are sent as counters and gauges are sent as gauges, much like all data collectors do. - For example, to calculate CPU utilization in this format, you need to know how to convert kernel ticks to percentage. + - `as-collected` sends to backends the metrics as they are collected, in the units they are collected. So, + counters are sent as counters and gauges are sent as gauges, much like all data collectors do. For example, to + calculate CPU utilization in this format, you need to know how to convert kernel ticks to percentage. - - `average` sends to backends normalized metrics from the Netdata database. - In this mode, all metrics are sent as gauges, in the units Netdata uses. This abstracts data collection - and simplifies visualization, but you will not be able to copy and paste queries from other sources to convert units. - For example, CPU utilization percentage is calculated by Netdata, so Netdata will convert ticks to percentage and - send the average percentage to the backend. + - `average` sends to backends normalized metrics from the Netdata database. In this mode, all metrics are sent as + gauges, in the units Netdata uses. This abstracts data collection and simplifies visualization, but you will not + be able to copy and paste queries from other sources to convert units. For example, CPU utilization percentage + is calculated by Netdata, so Netdata will convert ticks to percentage and send the average percentage to the + backend. - - `sum` or `volume`: the sum of the interpolated values shown on the Netdata graphs is sent to the backend. - So, if Netdata is configured to send data to the backend every 10 seconds, the sum of the 10 values shown on the + - `sum` or `volume`: the sum of the interpolated values shown on the Netdata graphs is sent to the backend. So, if + Netdata is configured to send data to the backend every 10 seconds, the sum of the 10 values shown on the Netdata charts will be used. -Time-series databases suggest to collect the raw values (`as-collected`). If you plan to invest on building your monitoring around a time-series database and you already know (or you will invest in learning) how to convert units and normalize the metrics in Grafana or other visualization tools, we suggest to use `as-collected`. + Time-series databases suggest to collect the raw values (`as-collected`). If you plan to invest on building your + monitoring around a time-series database and you already know (or you will invest in learning) how to convert units + and normalize the metrics in Grafana or other visualization tools, we suggest to use `as-collected`. -If, on the other hand, you just need long term archiving of Netdata metrics and you plan to mainly work with Netdata, we suggest to use `average`. It decouples visualization from data collection, so it will generally be a lot simpler. Furthermore, if you use `average`, the charts shown in the back-end will match exactly what you see in Netdata, which is not necessarily true for the other modes of operation. + If, on the other hand, you just need long term archiving of Netdata metrics and you plan to mainly work with + Netdata, we suggest to use `average`. It decouples visualization from data collection, so it will generally be a lot + simpler. Furthermore, if you use `average`, the charts shown in the back-end will match exactly what you see in + Netdata, which is not necessarily true for the other modes of operation. 5. This code is smart enough, not to slow down Netdata, independently of the speed of the backend server. ## configuration -In `/etc/netdata/netdata.conf` you should have something like this (if not download the latest version -of `netdata.conf` from your Netdata): +In `/etc/netdata/netdata.conf` you should have something like this (if not download the latest version of `netdata.conf` +from your Netdata): -``` +```conf [backend] enabled = yes | no type = graphite | opentsdb:telnet | opentsdb:http | opentsdb:https | prometheus_remote_write | json | kinesis | mongodb @@ -98,92 +102,87 @@ of `netdata.conf` from your Netdata): - `enabled = yes | no`, enables or disables sending data to a backend -- `type = graphite | opentsdb:telnet | opentsdb:http | opentsdb:https | json | kinesis | mongodb`, selects the backend type +- `type = graphite | opentsdb:telnet | opentsdb:http | opentsdb:https | json | kinesis | mongodb`, selects the backend + type -- `destination = host1 host2 host3 ...`, accepts **a space separated list** of hostnames, - IPs (IPv4 and IPv6) and ports to connect to. - Netdata will use the **first available** to send the metrics. +- `destination = host1 host2 host3 ...`, accepts **a space separated list** of hostnames, IPs (IPv4 and IPv6) and + ports to connect to. Netdata will use the **first available** to send the metrics. The format of each item in this list, is: `[PROTOCOL:]IP[:PORT]`. `PROTOCOL` can be `udp` or `tcp`. `tcp` is the default and only supported by the current backends. - `IP` can be `XX.XX.XX.XX` (IPv4), or `[XX:XX...XX:XX]` (IPv6). - For IPv6 you can to enclose the IP in `[]` to separate it from the port. + `IP` can be `XX.XX.XX.XX` (IPv4), or `[XX:XX...XX:XX]` (IPv6). For IPv6 you can to enclose the IP in `[]` to + separate it from the port. `PORT` can be a number of a service name. If omitted, the default port for the backend will be used (graphite = 2003, opentsdb = 4242). Example IPv4: -``` +```conf destination = 10.11.14.2:4242 10.11.14.3:4242 10.11.14.4:4242 ``` Example IPv6 and IPv4 together: -``` +```conf destination = [ffff:...:0001]:2003 10.11.12.1:2003 ``` - When multiple servers are defined, Netdata will try the next one when the first one fails. This allows - you to load-balance different servers: give your backend servers in different order on each Netdata. + When multiple servers are defined, Netdata will try the next one when the first one fails. This allows you to + load-balance different servers: give your backend servers in different order on each Netdata. - Netdata also ships [`nc-backend.sh`](nc-backend.sh), - a script that can be used as a fallback backend to save the metrics to disk and push them to the - time-series database when it becomes available again. It can also be used to monitor / trace / debug - the metrics Netdata generates. + Netdata also ships [`nc-backend.sh`](nc-backend.sh), a script that can be used as a fallback backend to save the + metrics to disk and push them to the time-series database when it becomes available again. It can also be used to + monitor / trace / debug the metrics Netdata generates. For kinesis backend `destination` should be set to an AWS region (for example, `us-east-1`). The MongoDB backend doesn't use the `destination` option for its configuration. It uses the `mongodb.conf` - [configuration file](mongodb/README.md) instead. + [configuration file](../backends/mongodb/) instead. -- `data source = as collected`, or `data source = average`, or `data source = sum`, selects the kind of - data that will be sent to the backend. +- `data source = as collected`, or `data source = average`, or `data source = sum`, selects the kind of data that will + be sent to the backend. -- `hostname = my-name`, is the hostname to be used for sending data to the backend server. By default - this is `[global].hostname`. +- `hostname = my-name`, is the hostname to be used for sending data to the backend server. By default this is + `[global].hostname`. - `prefix = Netdata`, is the prefix to add to all metrics. -- `update every = 10`, is the number of seconds between sending data to the backend. Netdata will add - some randomness to this number, to prevent stressing the backend server when many Netdata servers send - data to the same backend. This randomness does not affect the quality of the data, only the time they - are sent. - -- `buffer on failures = 10`, is the number of iterations (each iteration is `[backend].update every` seconds) - to buffer data, when the backend is not available. If the backend fails to receive the data after that - many failures, data loss on the backend is expected (Netdata will also log it). - -- `timeout ms = 20000`, is the timeout in milliseconds to wait for the backend server to process the data. - By default this is `2 * update_every * 1000`. - -- `send hosts matching = localhost *` includes one or more space separated patterns, using `*` as wildcard - (any number of times within each pattern). The patterns are checked against the hostname (the localhost - is always checked as `localhost`), allowing us to filter which hosts will be sent to the backend when - this Netdata is a central Netdata aggregating multiple hosts. A pattern starting with `!` gives a - negative match. So to match all hosts named `*db*` except hosts containing `*slave*`, use - `!*slave* *db*` (so, the order is important: the first pattern matching the hostname will be used - positive - or negative). - -- `send charts matching = *` includes one or more space separated patterns, using `*` as wildcard (any - number of times within each pattern). The patterns are checked against both chart id and chart name. - A pattern starting with `!` gives a negative match. So to match all charts named `apps.*` - except charts ending in `*reads`, use `!*reads apps.*` (so, the order is important: the first pattern - matching the chart id or the chart name will be used - positive or negative). - -- `send names instead of ids = yes | no` controls the metric names Netdata should send to backend. - Netdata supports names and IDs for charts and dimensions. Usually IDs are unique identifiers as read - by the system and names are human friendly labels (also unique). Most charts and metrics have the same - ID and name, but in several cases they are different: disks with device-mapper, interrupts, QoS classes, - statsd synthetic charts, etc. - -- `host tags = list of TAG=VALUE` defines tags that should be appended on all metrics for the given host. - These are currently only sent to opentsdb and prometheus. Please use the appropriate format for each - time-series db. For example opentsdb likes them like `TAG1=VALUE1 TAG2=VALUE2`, but prometheus like - `tag1="value1",tag2="value2"`. Host tags are mirrored with database replication (streaming of metrics - between Netdata servers). +- `update every = 10`, is the number of seconds between sending data to the backend. Netdata will add some randomness + to this number, to prevent stressing the backend server when many Netdata servers send data to the same backend. + This randomness does not affect the quality of the data, only the time they are sent. + +- `buffer on failures = 10`, is the number of iterations (each iteration is `[backend].update every` seconds) to + buffer data, when the backend is not available. If the backend fails to receive the data after that many failures, + data loss on the backend is expected (Netdata will also log it). + +- `timeout ms = 20000`, is the timeout in milliseconds to wait for the backend server to process the data. By default + this is `2 * update_every * 1000`. + +- `send hosts matching = localhost *` includes one or more space separated patterns, using `*` as wildcard (any number + of times within each pattern). The patterns are checked against the hostname (the localhost is always checked as + `localhost`), allowing us to filter which hosts will be sent to the backend when this Netdata is a central Netdata + aggregating multiple hosts. A pattern starting with `!` gives a negative match. So to match all hosts named `*db*` + except hosts containing `*slave*`, use `!*slave* *db*` (so, the order is important: the first pattern matching the + hostname will be used - positive or negative). + +- `send charts matching = *` includes one or more space separated patterns, using `*` as wildcard (any number of times + within each pattern). The patterns are checked against both chart id and chart name. A pattern starting with `!` + gives a negative match. So to match all charts named `apps.*` except charts ending in `*reads`, use `!*reads + apps.*` (so, the order is important: the first pattern matching the chart id or the chart name will be used - + positive or negative). + +- `send names instead of ids = yes | no` controls the metric names Netdata should send to backend. Netdata supports + names and IDs for charts and dimensions. Usually IDs are unique identifiers as read by the system and names are + human friendly labels (also unique). Most charts and metrics have the same ID and name, but in several cases they + are different: disks with device-mapper, interrupts, QoS classes, statsd synthetic charts, etc. + +- `host tags = list of TAG=VALUE` defines tags that should be appended on all metrics for the given host. These are + currently only sent to opentsdb and prometheus. Please use the appropriate format for each time-series db. For + example opentsdb likes them like `TAG1=VALUE1 TAG2=VALUE2`, but prometheus like `tag1="value1",tag2="value2"`. Host + tags are mirrored with database replication (streaming of metrics between Netdata servers). ## monitoring operation @@ -194,16 +193,15 @@ Netdata provides 5 charts: 2. **Buffered data size**, the amount of data (in KB) Netdata added the buffer. -3. ~~**Backend latency**, the time the backend server needed to process the data Netdata sent. - If there was a re-connection involved, this includes the connection time.~~ - (this chart has been removed, because it only measures the time Netdata needs to give the data - to the O/S - since the backend servers do not ack the reception, Netdata does not have any means - to measure this properly). +3. ~~**Backend latency**, the time the backend server needed to process the data Netdata sent. If there was a + re-connection involved, this includes the connection time.~~ (this chart has been removed, because it only measures + the time Netdata needs to give the data to the O/S - since the backend servers do not ack the reception, Netdata + does not have any means to measure this properly). 4. **Backend operations**, the number of operations performed by Netdata. -5. **Backend thread CPU usage**, the CPU resources consumed by the Netdata thread, that is responsible - for sending the metrics to the backend server. +5. **Backend thread CPU usage**, the CPU resources consumed by the Netdata thread, that is responsible for sending the + metrics to the backend server. ![image](https://cloud.githubusercontent.com/assets/2662304/20463536/eb196084-af3d-11e6-8ee5-ddbd3b4d8449.png) @@ -216,7 +214,8 @@ Netdata adds 4 alarms: 1. `backend_last_buffering`, number of seconds since the last successful buffering of backend data 2. `backend_metrics_sent`, percentage of metrics sent to the backend server 3. `backend_metrics_lost`, number of metrics lost due to repeating failures to contact the backend server -4. ~~`backend_slow`, the percentage of time between iterations needed by the backend time to process the data sent by Netdata~~ (this was misleading and has been removed). +4. ~~`backend_slow`, the percentage of time between iterations needed by the backend time to process the data sent by + Netdata~~ (this was misleading and has been removed). ![image](https://cloud.githubusercontent.com/assets/2662304/20463779/a46ed1c2-af43-11e6-91a5-07ca4533cac3.png) diff --git a/backends/WALKTHROUGH.md b/backends/WALKTHROUGH.md index 19f4ac0e1..c6461db46 100644 --- a/backends/WALKTHROUGH.md +++ b/backends/WALKTHROUGH.md @@ -2,123 +2,102 @@ ## Intro -In this article I will walk you through the basics of getting Netdata, -Prometheus and Grafana all working together and monitoring your application -servers. This article will be using docker on your local workstation. We will be -working with docker in an ad-hoc way, launching containers that run ‘/bin/bash’ -and attaching a TTY to them. I use docker here in a purely academic fashion and -do not condone running Netdata in a container. I pick this method so individuals -without cloud accounts or access to VMs can try this out and for it’s speed of -deployment. +In this article I will walk you through the basics of getting Netdata, Prometheus and Grafana all working together and +monitoring your application servers. This article will be using docker on your local workstation. We will be working +with docker in an ad-hoc way, launching containers that run ‘/bin/bash’ and attaching a TTY to them. I use docker here +in a purely academic fashion and do not condone running Netdata in a container. I pick this method so individuals +without cloud accounts or access to VMs can try this out and for it’s speed of deployment. ## Why Netdata, Prometheus, and Grafana -Some time ago I was introduced to Netdata by a coworker. We were attempting to -troubleshoot python code which seemed to be bottlenecked. I was instantly -impressed by the amount of metrics Netdata exposes to you. I quickly added -Netdata to my set of go-to tools when troubleshooting systems performance. - -Some time ago, even later, I was introduced to Prometheus. Prometheus is a -monitoring application which flips the normal architecture around and polls -rest endpoints for its metrics. This architectural change greatly simplifies -and decreases the time necessary to begin monitoring your applications. -Compared to current monitoring solutions the time spent on designing the -infrastructure is greatly reduced. Running a single Prometheus server per -application becomes feasible with the help of Grafana. - -Grafana has been the go to graphing tool for… some time now. It’s awesome, -anyone that has used it knows it’s awesome. We can point Grafana at Prometheus -and use Prometheus as a data source. This allows a pretty simple overall -monitoring architecture: Install Netdata on your application servers, point -Prometheus at Netdata, and then point Grafana at Prometheus. - -I’m omitting an important ingredient in this stack in order to keep this tutorial -simple and that is service discovery. My personal preference is to use Consul. -Prometheus can plug into consul and automatically begin to scrape new hosts that -register a Netdata client with Consul. - -At the end of this tutorial you will understand how each technology fits -together to create a modern monitoring stack. This stack will offer you -visibility into your application and systems performance. +Some time ago I was introduced to Netdata by a coworker. We were attempting to troubleshoot python code which seemed to +be bottlenecked. I was instantly impressed by the amount of metrics Netdata exposes to you. I quickly added Netdata to +my set of go-to tools when troubleshooting systems performance. + +Some time ago, even later, I was introduced to Prometheus. Prometheus is a monitoring application which flips the normal +architecture around and polls rest endpoints for its metrics. This architectural change greatly simplifies and decreases +the time necessary to begin monitoring your applications. Compared to current monitoring solutions the time spent on +designing the infrastructure is greatly reduced. Running a single Prometheus server per application becomes feasible +with the help of Grafana. + +Grafana has been the go to graphing tool for… some time now. It’s awesome, anyone that has used it knows it’s awesome. +We can point Grafana at Prometheus and use Prometheus as a data source. This allows a pretty simple overall monitoring +architecture: Install Netdata on your application servers, point Prometheus at Netdata, and then point Grafana at +Prometheus. + +I’m omitting an important ingredient in this stack in order to keep this tutorial simple and that is service discovery. +My personal preference is to use Consul. Prometheus can plug into consul and automatically begin to scrape new hosts +that register a Netdata client with Consul. + +At the end of this tutorial you will understand how each technology fits together to create a modern monitoring stack. +This stack will offer you visibility into your application and systems performance. ## Getting Started - Netdata -To begin let’s create our container which we will install Netdata on. We need -to run a container, forward the necessary port that Netdata listens on, and -attach a tty so we can interact with the bash shell on the container. But -before we do this we want name resolution between the two containers to work. -In order to accomplish this we will create a user-defined network and attach -both containers to this network. The first command we should run is: +To begin let’s create our container which we will install Netdata on. We need to run a container, forward the necessary +port that Netdata listens on, and attach a tty so we can interact with the bash shell on the container. But before we do +this we want name resolution between the two containers to work. In order to accomplish this we will create a +user-defined network and attach both containers to this network. The first command we should run is: ```sh docker network create --driver bridge netdata-tutorial ``` -With this user-defined network created we can now launch our container we will -install Netdata on and point it to this network. +With this user-defined network created we can now launch our container we will install Netdata on and point it to this +network. ```sh docker run -it --name netdata --hostname netdata --network=netdata-tutorial -p 19999:19999 centos:latest '/bin/bash' ``` -This command creates an interactive tty session (-it), gives the container both -a name in relation to the docker daemon and a hostname (this is so you know what -container is which when working in the shells and docker maps hostname -resolution to this container), forwards the local port 19999 to the container’s -port 19999 (-p 19999:19999), sets the command to run (/bin/bash) and then -chooses the base container images (centos:latest). After running this you should -be sitting inside the shell of the container. +This command creates an interactive tty session (-it), gives the container both a name in relation to the docker daemon +and a hostname (this is so you know what container is which when working in the shells and docker maps hostname +resolution to this container), forwards the local port 19999 to the container’s port 19999 (-p 19999:19999), sets the +command to run (/bin/bash) and then chooses the base container images (centos:latest). After running this you should be +sitting inside the shell of the container. -After we have entered the shell we can install Netdata. This process could not -be easier. If you take a look at [this link](../packaging/installer/#installation), the Netdata devs give us -several one-liners to install Netdata. I have not had any issues with these one -liners and their bootstrapping scripts so far (If you guys run into anything do -share). Run the following command in your container. +After we have entered the shell we can install Netdata. This process could not be easier. If you take a look at [this +link](../packaging/installer/#installation), the Netdata devs give us several one-liners to install Netdata. I have not +had any issues with these one liners and their bootstrapping scripts so far (If you guys run into anything do share). +Run the following command in your container. ```sh bash <(curl -Ss https://my-netdata.io/kickstart.sh) --dont-wait ``` -After the install completes you should be able to hit the Netdata dashboard at -<http://localhost:19999/> (replace localhost if you’re doing this on a VM or have -the docker container hosted on a machine not on your local system). If this is -your first time using Netdata I suggest you take a look around. The amount of -time I’ve spent digging through /proc and calculating my own metrics has been -greatly reduced by this tool. Take it all in. +After the install completes you should be able to hit the Netdata dashboard at <http://localhost:19999/> (replace +localhost if you’re doing this on a VM or have the docker container hosted on a machine not on your local system). If +this is your first time using Netdata I suggest you take a look around. The amount of time I’ve spent digging through +/proc and calculating my own metrics has been greatly reduced by this tool. Take it all in. Next I want to draw your attention to a particular endpoint. Navigate to -<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> In your -browser. This is the endpoint which publishes all the metrics in a format which -Prometheus understands. Let’s take a look at one of these metrics. -`netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} -0.0831255 1501271696000` This metric is representing several things which I will -go in more details in the section on prometheus. For now understand that this -metric: `netdata_system_cpu_percentage_average` has several labels: [chart, -family, dimension]. This corresponds with the first cpu chart you see on the -Netdata dashboard. +<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> In your browser. This is the endpoint which +publishes all the metrics in a format which Prometheus understands. Let’s take a look at one of these metrics. +`netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0831255 1501271696000` This +metric is representing several things which I will go in more details in the section on prometheus. For now understand +that this metric: `netdata_system_cpu_percentage_average` has several labels: (chart, family, dimension). This +corresponds with the first cpu chart you see on the Netdata dashboard. ![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%204.00.45%20PM.png) -This CHART is called ‘system.cpu’, The FAMILY is cpu, and the DIMENSION we are -observing is “system”. You can begin to draw links between the charts in Netdata -to the prometheus metrics format in this manner. +This CHART is called ‘system.cpu’, The FAMILY is cpu, and the DIMENSION we are observing is “system”. You can begin to +draw links between the charts in Netdata to the prometheus metrics format in this manner. ## Prometheus -We will be installing prometheus in a container for purpose of demonstration. -While prometheus does have an official container I would like to walk through -the install process and setup on a fresh container. This will allow anyone +We will be installing prometheus in a container for purpose of demonstration. While prometheus does have an official +container I would like to walk through the install process and setup on a fresh container. This will allow anyone reading to migrate this tutorial to a VM or Server of any sort. -Let’s start another container in the same fashion as we did the Netdata -container. +Let’s start another container in the same fashion as we did the Netdata container. ```sh docker run -it --name prometheus --hostname prometheus --network=netdata-tutorial -p 9090:9090 centos:latest '/bin/bash' ``` -This should drop you into a shell once again. Once there quickly install your favorite editor as we will be editing files later in this tutorial. +This should drop you into a shell once again. Once there quickly install your favorite editor as we will be editing +files later in this tutorial. ```sh yum install vim -y @@ -139,39 +118,33 @@ mkdir /opt/prometheus sudo tar -xvf /tmp/prometheus-*linux-amd64.tar.gz -C /opt/prometheus --strip=1 ``` -This should get prometheus installed into the container. Let’s test that we can run prometheus and connect to it’s web interface. +This should get prometheus installed into the container. Let’s test that we can run prometheus and connect to it’s web +interface. ```sh /opt/prometheus/prometheus ``` -Now attempt to go to <http://localhost:9090/>. You should be presented with the -prometheus homepage. This is a good point to talk about Prometheus’s data model -which can be viewed here: <https://prometheus.io/docs/concepts/data_model/> As -explained we have two key elements in Prometheus metrics. We have the ‘metric’ -and its ‘labels’. Labels allow for granularity between metrics. Let’s use our -previous example to further explain. +Now attempt to go to <http://localhost:9090/>. You should be presented with the prometheus homepage. This is a good +point to talk about Prometheus’s data model which can be viewed here: <https://prometheus.io/docs/concepts/data_model/> +As explained we have two key elements in Prometheus metrics. We have the ‘metric’ and its ‘labels’. Labels allow for +granularity between metrics. Let’s use our previous example to further explain. -``` +```conf netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0831255 1501271696000 ``` -Here our metric is -‘netdata_system_cpu_percentage_average’ and our labels are ‘chart’, ‘family’, -and ‘dimension. The last two values constitute the actual metric value for the -metric type (gauge, counter, etc…). We can begin graphing system metrics with -this information, but first we need to hook up Prometheus to poll Netdata stats. - -Let’s move our attention to Prometheus’s configuration. Prometheus gets it -config from the file located (in our example) at -`/opt/prometheus/prometheus.yml`. I won’t spend an extensive amount of time -going over the configuration values documented here: -<https://prometheus.io/docs/operating/configuration/>. We will be adding a new -“job” under the “scrape_configs”. Let’s make the “scrape_configs” section look -like this (we can use the dns name Netdata due to the custom user-defined -network we created in docker beforehand). - -```yml +Here our metric is ‘netdata_system_cpu_percentage_average’ and our labels are ‘chart’, ‘family’, and ‘dimension. The +last two values constitute the actual metric value for the metric type (gauge, counter, etc…). We can begin graphing +system metrics with this information, but first we need to hook up Prometheus to poll Netdata stats. + +Let’s move our attention to Prometheus’s configuration. Prometheus gets it config from the file located (in our example) +at `/opt/prometheus/prometheus.yml`. I won’t spend an extensive amount of time going over the configuration values +documented here: <https://prometheus.io/docs/operating/configuration/>. We will be adding a new“job” under the +“scrape_configs”. Let’s make the “scrape_configs” section look like this (we can use the dns name Netdata due to the +custom user-defined network we created in docker beforehand). + +```yaml scrape_configs: # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. - job_name: 'prometheus' @@ -192,84 +165,66 @@ scrape_configs: - targets: ['netdata:19999'] ``` -Let’s start prometheus once again by running `/opt/prometheus/prometheus`. If we - -now navigate to prometheus at ‘<http://localhost:9090/targets’> we should see our - -target being successfully scraped. If we now go back to the Prometheus’s -homepage and begin to type ‘netdata\_’ Prometheus should auto complete metrics -it is now scraping. +Let’s start prometheus once again by running `/opt/prometheus/prometheus`. If we now navigate to prometheus at +‘<http://localhost:9090/targets’> we should see our target being successfully scraped. If we now go back to the +Prometheus’s homepage and begin to type ‘netdata\_’ Prometheus should auto complete metrics it is now scraping. ![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%205.13.43%20PM.png) -Let’s now start exploring how we can graph some metrics. Back in our NetData -container lets get the CPU spinning with a pointless busy loop. On the shell do -the following: +Let’s now start exploring how we can graph some metrics. Back in our NetData container lets get the CPU spinning with a +pointless busy loop. On the shell do the following: -``` +```sh [root@netdata /]# while true; do echo "HOT HOT HOT CPU"; done ``` -Our NetData cpu graph should be showing some activity. Let’s represent this in -Prometheus. In order to do this let’s keep our metrics page open for reference: -<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> We are -setting out to graph the data in the CPU chart so let’s search for “system.cpu” -in the metrics page above. We come across a section of metrics with the first -comments `# COMMENT homogeneous chart "system.cpu", context "system.cpu", family -"cpu", units "percentage"` Followed by the metrics. This is a good start now let -us drill down to the specific metric we would like to graph. +Our NetData cpu graph should be showing some activity. Let’s represent this in Prometheus. In order to do this let’s +keep our metrics page open for reference: <http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> We are +setting out to graph the data in the CPU chart so let’s search for “system.cpu”in the metrics page above. We come across +a section of metrics with the first comments `# COMMENT homogeneous chart "system.cpu", context "system.cpu", family +"cpu", units "percentage"` Followed by the metrics. This is a good start now let us drill down to the specific metric we +would like to graph. -``` +```conf # COMMENT netdata_system_cpu_percentage_average: dimension "system", value is percentage, gauge, dt 1501275951 to 1501275951 inclusive netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0000000 1501275951000 ``` -Here we learn that the metric name we care about is -‘netdata_system_cpu_percentage_average’ so throw this into Prometheus and see -what we get. We should see something similar to this (I shut off my busy loop) +Here we learn that the metric name we care about is‘netdata_system_cpu_percentage_average’ so throw this into Prometheus +and see what we get. We should see something similar to this (I shut off my busy loop) ![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%205.47.53%20PM.png) -This is a good step toward what we want. Also make note that Prometheus will tag -on an ‘instance’ label for us which corresponds to our statically defined job in -the configuration file. This allows us to tailor our queries to specific -instances. Now we need to isolate the dimension we want in our query. To do this -let us refine the query slightly. Let’s query the dimension also. Place this -into our query text box. -`netdata_system_cpu_percentage_average{dimension="system"}` We now wind up with -the following graph. +This is a good step toward what we want. Also make note that Prometheus will tag on an ‘instance’ label for us which +corresponds to our statically defined job in the configuration file. This allows us to tailor our queries to specific +instances. Now we need to isolate the dimension we want in our query. To do this let us refine the query slightly. Let’s +query the dimension also. Place this into our query text box. +`netdata_system_cpu_percentage_average{dimension="system"}` We now wind up with the following graph. ![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%205.54.40%20PM.png) -Awesome, this is exactly what we wanted. If you haven’t caught on yet we can -emulate entire charts from NetData by using the `chart` dimension. If you’d like -you can combine the ‘chart’ and ‘instance’ dimension to create per-instance -charts. Let’s give this a try: -`netdata_system_cpu_percentage_average{chart="system.cpu", instance="netdata:19999"}` - -This is the basics of using Prometheus to query NetData. I’d advise everyone at -this point to read [this page](../backends/prometheus/#using-netdata-with-prometheus). -The key point here is that NetData can export metrics from its internal DB or -can send metrics “as-collected” by specifying the ‘source=as-collected’ url -parameter like so. -<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes&types=yes&source=as-collected> -If you choose to use this method you will need to use Prometheus's set of -functions here: <https://prometheus.io/docs/querying/functions/> to obtain useful -metrics as you are now dealing with raw counters from the system. For example -you will have to use the `irate()` function over a counter to get that metric's -rate per second. If your graphing needs are met by using the metrics returned by -NetData's internal database (not specifying any source= url parameter) then use -that. If you find limitations then consider re-writing your queries using the -raw data and using Prometheus functions to get the desired chart. +Awesome, this is exactly what we wanted. If you haven’t caught on yet we can emulate entire charts from NetData by using +the `chart` dimension. If you’d like you can combine the ‘chart’ and ‘instance’ dimension to create per-instance charts. +Let’s give this a try: `netdata_system_cpu_percentage_average{chart="system.cpu", instance="netdata:19999"}` + +This is the basics of using Prometheus to query NetData. I’d advise everyone at this point to read [this +page](../backends/prometheus/#using-netdata-with-prometheus). The key point here is that NetData can export metrics from +its internal DB or can send metrics “as-collected” by specifying the ‘source=as-collected’ url parameter like so. +<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes&types=yes&source=as-collected> If you choose to use +this method you will need to use Prometheus's set of functions here: <https://prometheus.io/docs/querying/functions/> to +obtain useful metrics as you are now dealing with raw counters from the system. For example you will have to use the +`irate()` function over a counter to get that metric's rate per second. If your graphing needs are met by using the +metrics returned by NetData's internal database (not specifying any source= url parameter) then use that. If you find +limitations then consider re-writing your queries using the raw data and using Prometheus functions to get the desired +chart. ## Grafana -Finally we make it to grafana. This is the easiest part in my opinion. This time -we will actually run the official grafana docker container as all configuration -we need to do is done via the GUI. Let’s run the following command: +Finally we make it to grafana. This is the easiest part in my opinion. This time we will actually run the official +grafana docker container as all configuration we need to do is done via the GUI. Let’s run the following command: -``` +```sh docker run -i -p 3000:3000 --network=netdata-tutorial grafana/grafana ``` @@ -277,26 +232,22 @@ This will get grafana running at ‘<http://localhost:3000/’> Let’s go there login using the credentials Admin:Admin. -The first thing we want to do is click ‘Add data source’. Let’s make it look -like the following screenshot +The first thing we want to do is click ‘Add data source’. Let’s make it look like the following screenshot ![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%206.36.55%20PM.png) -With this completed let’s graph! Create a new Dashboard by clicking on the top -left Grafana Icon and create a new graph in that dashboard. Fill in the query -like we did above and save. +With this completed let’s graph! Create a new Dashboard by clicking on the top left Grafana Icon and create a new graph +in that dashboard. Fill in the query like we did above and save. ![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%206.39.38%20PM.png) ## Conclusion -There you have it, a complete systems monitoring stack which is very easy to -deploy. From here I would begin to understand how Prometheus and a service -discovery mechanism such as Consul can play together nicely. My current prod -deployments automatically register Netdata services into Consul and Prometheus -automatically begins to scrape them. Once achieved you do not have to think -about the monitoring system until Prometheus cannot keep up with your scale. -Once this happens there are options presented in the Prometheus documentation -for solving this. Hope this was helpful, happy monitoring. +There you have it, a complete systems monitoring stack which is very easy to deploy. From here I would begin to +understand how Prometheus and a service discovery mechanism such as Consul can play together nicely. My current prod +deployments automatically register Netdata services into Consul and Prometheus automatically begins to scrape them. Once +achieved you do not have to think about the monitoring system until Prometheus cannot keep up with your scale. Once this +happens there are options presented in the Prometheus documentation for solving this. Hope this was helpful, happy +monitoring. [![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2FWALKTHROUGH&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/aws_kinesis/README.md b/backends/aws_kinesis/README.md index a6529f237..051ac9637 100644 --- a/backends/aws_kinesis/README.md +++ b/backends/aws_kinesis/README.md @@ -2,11 +2,17 @@ ## Prerequisites -To use AWS Kinesis as a backend AWS SDK for C++ should be [installed](https://docs.aws.amazon.com/en_us/sdk-for-cpp/v1/developer-guide/setup.html) first. `libcrypto`, `libssl`, and `libcurl` are also required to compile Netdata with Kinesis support enabled. Next, Netdata should be re-installed from the source. The installer will detect that the required libraries are now available. +To use AWS Kinesis as a backend AWS SDK for C++ should be +[installed](https://docs.aws.amazon.com/en_us/sdk-for-cpp/v1/developer-guide/setup.html) first. `libcrypto`, `libssl`, +and `libcurl` are also required to compile Netdata with Kinesis support enabled. Next, Netdata should be re-installed +from the source. The installer will detect that the required libraries are now available. -If the AWS SDK for C++ is being installed from source, it is useful to set `-DBUILD_ONLY="kinesis"`. Otherwise, the building process could take a very long time. Take a note, that the default installation path for the libraries is `/usr/local/lib64`. Many Linux distributions don't include this path as the default one for a library search, so it is advisable to use the following options to `cmake` while building the AWS SDK: +If the AWS SDK for C++ is being installed from source, it is useful to set `-DBUILD_ONLY="kinesis"`. Otherwise, the +building process could take a very long time. Take a note, that the default installation path for the libraries is +`/usr/local/lib64`. Many Linux distributions don't include this path as the default one for a library search, so it is +advisable to use the following options to `cmake` while building the AWS SDK: -``` +```sh cmake -DCMAKE_INSTALL_LIBDIR=/usr/lib -DCMAKE_INSTALL_INCLUDEDIR=/usr/include -DBUILD_SHARED_LIBS=OFF -DBUILD_ONLY=kinesis <aws-sdk-cpp sources> ``` @@ -14,7 +20,7 @@ cmake -DCMAKE_INSTALL_LIBDIR=/usr/lib -DCMAKE_INSTALL_INCLUDEDIR=/usr/include -D To enable data sending to the kinesis backend set the following options in `netdata.conf`: -``` +```conf [backend] enabled = yes type = kinesis @@ -25,7 +31,7 @@ set the `destination` option to an AWS region. In the Netdata configuration directory run `./edit-config aws_kinesis.conf` and set AWS credentials and stream name: -``` +```yaml # AWS credentials aws_access_key_id = your_access_key_id aws_secret_access_key = your_secret_access_key @@ -34,8 +40,9 @@ aws_secret_access_key = your_secret_access_key stream name = your_stream_name ``` -Alternatively, AWS credentials can be set for the *netdata* user using AWS SDK for C++ [standard methods](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html). +Alternatively, AWS credentials can be set for the `netdata` user using AWS SDK for C++ [standard methods](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html). -A partition key for every record is computed automatically by Netdata with the purpose to distribute records across available shards evenly. +A partition key for every record is computed automatically by Netdata with the purpose to distribute records across +available shards evenly. [![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2Faws_kinesis%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/mongodb/README.md b/backends/mongodb/README.md index 7538fe8be..890afd178 100644 --- a/backends/mongodb/README.md +++ b/backends/mongodb/README.md @@ -2,21 +2,24 @@ ## Prerequisites -To use MongoDB as a backend, `libmongoc` 1.7.0 or higher should be [installed](http://mongoc.org/libmongoc/current/installing.html) first. Next, Netdata should be re-installed from the source. The installer will detect that the required libraries are now available. +To use MongoDB as a backend, `libmongoc` 1.7.0 or higher should be +[installed](http://mongoc.org/libmongoc/current/installing.html) first. Next, Netdata should be re-installed from the +source. The installer will detect that the required libraries are now available. ## Configuration To enable data sending to the MongoDB backend set the following options in `netdata.conf`: -``` +```conf [backend] enabled = yes type = mongodb ``` -In the Netdata configuration directory run `./edit-config mongodb.conf` and set [MongoDB URI](https://docs.mongodb.com/manual/reference/connection-string/), database name, and collection name: +In the Netdata configuration directory run `./edit-config mongodb.conf` and set [MongoDB +URI](https://docs.mongodb.com/manual/reference/connection-string/), database name, and collection name: -``` +```yaml # URI uri = mongodb://<hostname> @@ -27,6 +30,7 @@ database = your_database_name collection = your_collection_name ``` -The default socket timeout depends on the backend update interval. The timeout is 500 ms shorter than the interval (but not less than 1000 ms). You can alter the timeout using the `sockettimeoutms` MongoDB URI option. +The default socket timeout depends on the backend update interval. The timeout is 500 ms shorter than the interval (but +not less than 1000 ms). You can alter the timeout using the `sockettimeoutms` MongoDB URI option. [![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2Fmongodb%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/opentsdb/README.md b/backends/opentsdb/README.md index ab1f08bd3..5c04868c1 100644 --- a/backends/opentsdb/README.md +++ b/backends/opentsdb/README.md @@ -1,25 +1,31 @@ # OpenTSDB with HTTP -Netdata can easily communicate with OpenTSDB using HTTP API. To enable this channel, set the following options in your `netdata.conf`: +Netdata can easily communicate with OpenTSDB using HTTP API. To enable this channel, set the following options in your +`netdata.conf`: -``` +```conf [backend] type = opentsdb:http destination = localhost:4242 ``` -In this example, OpenTSDB is running with its default port, which is `4242`. If you run OpenTSDB on a different port, change the `destination = localhost:4242` line accordingly. +In this example, OpenTSDB is running with its default port, which is `4242`. If you run OpenTSDB on a different port, +change the `destination = localhost:4242` line accordingly. ## HTTPS -As of [v1.16.0](https://github.com/netdata/netdata/releases/tag/v1.16.0), Netdata can send metrics to OpenTSDB using TLS/SSL. Unfortunately, OpenTDSB does not support encrypted connections, so you will have to configure a reverse proxy to enable HTTPS communication between Netdata and OpenTSBD. You can set up a reverse proxy with [Nginx](../../docs/Running-behind-nginx.md). +As of [v1.16.0](https://github.com/netdata/netdata/releases/tag/v1.16.0), Netdata can send metrics to OpenTSDB using +TLS/SSL. Unfortunately, OpenTDSB does not support encrypted connections, so you will have to configure a reverse proxy +to enable HTTPS communication between Netdata and OpenTSBD. You can set up a reverse proxy with +[Nginx](../../docs/Running-behind-nginx.md). After your proxy is configured, make the following changes to `netdata.conf`: -``` +```conf [backend] type = opentsdb:https destination = localhost:8082 ``` -In this example, we used the port `8082` for our reverse proxy. If your reverse proxy listens on a different port, change the `destination = localhost:8082` line accordingly. +In this example, we used the port `8082` for our reverse proxy. If your reverse proxy listens on a different port, +change the `destination = localhost:8082` line accordingly. diff --git a/backends/prometheus/README.md b/backends/prometheus/README.md index 0a4be27e3..e234efe94 100644 --- a/backends/prometheus/README.md +++ b/backends/prometheus/README.md @@ -1,15 +1,19 @@ # Using Netdata with Prometheus -> IMPORTANT: the format Netdata sends metrics to prometheus has changed since Netdata v1.7. The new prometheus backend for Netdata supports a lot more features and is aligned to the development of the rest of the Netdata backends. +> IMPORTANT: the format Netdata sends metrics to prometheus has changed since Netdata v1.7. The new prometheus backend +> for Netdata supports a lot more features and is aligned to the development of the rest of the Netdata backends. -Prometheus is a distributed monitoring system which offers a very simple setup along with a robust data model. Recently Netdata added support for Prometheus. I'm going to quickly show you how to install both Netdata and prometheus on the same server. We can then use grafana pointed at Prometheus to obtain long term metrics Netdata offers. I'm assuming we are starting at a fresh ubuntu shell (whether you'd like to follow along in a VM or a cloud instance is up to you). +Prometheus is a distributed monitoring system which offers a very simple setup along with a robust data model. Recently +Netdata added support for Prometheus. I'm going to quickly show you how to install both Netdata and prometheus on the +same server. We can then use grafana pointed at Prometheus to obtain long term metrics Netdata offers. I'm assuming we +are starting at a fresh ubuntu shell (whether you'd like to follow along in a VM or a cloud instance is up to you). ## Installing Netdata and prometheus ### Installing Netdata -There are number of ways to install Netdata according to [Installation](../../packaging/installer/#installation)\ -The suggested way of installing the latest Netdata and keep it upgrade automatically. Using one line installation: +There are number of ways to install Netdata according to [Installation](../../packaging/installer/). The suggested way +of installing the latest Netdata and keep it upgrade automatically. Using one line installation: ```sh bash <(curl -Ss https://my-netdata.io/kickstart.sh) @@ -17,7 +21,7 @@ bash <(curl -Ss https://my-netdata.io/kickstart.sh) At this point we should have Netdata listening on port 19999. Attempt to take your browser here: -``` +```sh http://your.netdata.ip:19999 ``` @@ -25,7 +29,10 @@ _(replace `your.netdata.ip` with the IP or hostname of the server running Netdat ### Installing Prometheus -In order to install prometheus we are going to introduce our own systemd startup script along with an example of prometheus.yaml configuration. Prometheus needs to be pointed to your server at a specific target url for it to scrape Netdata's api. Prometheus is always a pull model meaning Netdata is the passive client within this architecture. Prometheus always initiates the connection with Netdata. +In order to install prometheus we are going to introduce our own systemd startup script along with an example of +prometheus.yaml configuration. Prometheus needs to be pointed to your server at a specific target url for it to scrape +Netdata's api. Prometheus is always a pull model meaning Netdata is the passive client within this architecture. +Prometheus always initiates the connection with Netdata. #### Download Prometheus @@ -113,7 +120,10 @@ scrape_configs: #### Install nodes.yml -The following is completely optional, it will enable Prometheus to generate alerts from some NetData sources. Tweak the values to your own needs. We will use the following `nodes.yml` file below. Save it at `/opt/prometheus/nodes.yml`, and add a _- "nodes.yml"_ entry under the _rule_files:_ section in the example prometheus.yml file above. +The following is completely optional, it will enable Prometheus to generate alerts from some NetData sources. Tweak the +values to your own needs. We will use the following `nodes.yml` file below. Save it at `/opt/prometheus/nodes.yml`, and +add a _- "nodes.yml"_ entry under the _rule_files:_ section in the example prometheus.yml file above. + ```yaml groups: - name: nodes @@ -156,7 +166,7 @@ groups: Save this service file as `/etc/systemd/system/prometheus.service`: -``` +```sh [Unit] Description=Prometheus Server AssertPathExists=/opt/prometheus @@ -183,13 +193,15 @@ sudo systemctl enable prometheus Prometheus should now start and listen on port 9090. Attempt to head there with your browser. -If everything is working correctly when you fetch `http://your.prometheus.ip:9090` you will see a 'Status' tab. Click this and click on 'targets' We should see the Netdata host as a scraped target. +If everything is working correctly when you fetch `http://your.prometheus.ip:9090` you will see a 'Status' tab. Click +this and click on 'targets' We should see the Netdata host as a scraped target. -- - - +--- ## Netdata support for prometheus -> IMPORTANT: the format Netdata sends metrics to prometheus has changed since Netdata v1.6. The new format allows easier queries for metrics and supports both `as collected` and normalized metrics. +> IMPORTANT: the format Netdata sends metrics to prometheus has changed since Netdata v1.6. The new format allows easier +> queries for metrics and supports both `as collected` and normalized metrics. Before explaining the changes, we have to understand the key differences between Netdata and prometheus. @@ -203,7 +215,8 @@ Each chart in Netdata has several properties (common to all its metrics): - `chart_name` - a more human friendly name for `chart_id`, also unique. -- `context` - this is the template of the chart. All disk I/O charts have the same context, all mysql requests charts have the same context, etc. This is used for alarm templates to match all the charts they should be attached to. +- `context` - this is the template of the chart. All disk I/O charts have the same context, all mysql requests charts + have the same context, etc. This is used for alarm templates to match all the charts they should be attached to. - `family` groups a set of charts together. It is used as the submenu of the dashboard. @@ -211,32 +224,52 @@ Each chart in Netdata has several properties (common to all its metrics): #### dimensions -Then each Netdata chart contains metrics called `dimensions`. All the dimensions of a chart have the same units of measurement, and are contextually in the same category (ie. the metrics for disk bandwidth are `read` and `write` and they are both in the same chart). +Then each Netdata chart contains metrics called `dimensions`. All the dimensions of a chart have the same units of +measurement, and are contextually in the same category (ie. the metrics for disk bandwidth are `read` and `write` and +they are both in the same chart). ### Netdata data source Netdata can send metrics to prometheus from 3 data sources: -- `as collected` or `raw` - this data source sends the metrics to prometheus as they are collected. No conversion is done by Netdata. The latest value for each metric is just given to prometheus. This is the most preferred method by prometheus, but it is also the harder to work with. To work with this data source, you will need to understand how to get meaningful values out of them. +- `as collected` or `raw` - this data source sends the metrics to prometheus as they are collected. No conversion is + done by Netdata. The latest value for each metric is just given to prometheus. This is the most preferred method by + prometheus, but it is also the harder to work with. To work with this data source, you will need to understand how + to get meaningful values out of them. - The format of the metrics is: `CONTEXT{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. + The format of the metrics is: `CONTEXT{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. - If the metric is a counter (`incremental` in Netdata lingo), `_total` is appended the context. + If the metric is a counter (`incremental` in Netdata lingo), `_total` is appended the context. - Unlike prometheus, Netdata allows each dimension of a chart to have a different algorithm and conversion constants (`multiplier` and `divisor`). In this case, that the dimensions of a charts are heterogeneous, Netdata will use this format: `CONTEXT_DIMENSION{chart="CHART",family="FAMILY"}` + Unlike prometheus, Netdata allows each dimension of a chart to have a different algorithm and conversion constants + (`multiplier` and `divisor`). In this case, that the dimensions of a charts are heterogeneous, Netdata will use this + format: `CONTEXT_DIMENSION{chart="CHART",family="FAMILY"}` -- `average` - this data source uses the Netdata database to send the metrics to prometheus as they are presented on the Netdata dashboard. So, all the metrics are sent as gauges, at the units they are presented in the Netdata dashboard charts. This is the easiest to work with. +- `average` - this data source uses the Netdata database to send the metrics to prometheus as they are presented on + the Netdata dashboard. So, all the metrics are sent as gauges, at the units they are presented in the Netdata + dashboard charts. This is the easiest to work with. - The format of the metrics is: `CONTEXT_UNITS_average{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. + The format of the metrics is: `CONTEXT_UNITS_average{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. - When this source is used, Netdata keeps track of the last access time for each prometheus server fetching the metrics. This last access time is used at the subsequent queries of the same prometheus server to identify the time-frame the `average` will be calculated. So, no matter how frequently prometheus scrapes Netdata, it will get all the database data. To identify each prometheus server, Netdata uses by default the IP of the client fetching the metrics. If there are multiple prometheus servers fetching data from the same Netdata, using the same IP, each prometheus server can append `server=NAME` to the URL. Netdata will use this `NAME` to uniquely identify the prometheus server. + When this source is used, Netdata keeps track of the last access time for each prometheus server fetching the + metrics. This last access time is used at the subsequent queries of the same prometheus server to identify the + time-frame the `average` will be calculated. + + So, no matter how frequently prometheus scrapes Netdata, it will get all the database data. + To identify each prometheus server, Netdata uses by default the IP of the client fetching the metrics. + + If there are multiple prometheus servers fetching data from the same Netdata, using the same IP, each prometheus + server can append `server=NAME` to the URL. Netdata will use this `NAME` to uniquely identify the prometheus server. - `sum` or `volume`, is like `average` but instead of averaging the values, it sums them. - The format of the metrics is: `CONTEXT_UNITS_sum{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. - All the other operations are the same with `average`. + The format of the metrics is: `CONTEXT_UNITS_sum{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. All the + other operations are the same with `average`. + + To change the data source to `sum` or `as-collected` you need to provide the `source` parameter in the request URL. + e.g.: `http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus&help=yes&source=as-collected` -Keep in mind that early versions of Netdata were sending the metrics as: `CHART_DIMENSION{}`. + Keep in mind that early versions of Netdata were sending the metrics as: `CHART_DIMENSION{}`. ### Querying Metrics @@ -248,7 +281,9 @@ _(replace `your.netdata.ip` with the ip or hostname of your Netdata server)_ Netdata will respond with all the metrics it sends to prometheus. -If you search that page for `"system.cpu"` you will find all the metrics Netdata is exporting to prometheus for this chart. `system.cpu` is the chart name on the Netdata dashboard (on the Netdata dashboard all charts have a text heading such as : `Total CPU utilization (system.cpu)`. What we are interested here in the chart name: `system.cpu`). +If you search that page for `"system.cpu"` you will find all the metrics Netdata is exporting to prometheus for this +chart. `system.cpu` is the chart name on the Netdata dashboard (on the Netdata dashboard all charts have a text heading +such as : `Total CPU utilization (system.cpu)`. What we are interested here in the chart name: `system.cpu`). Searching for `"system.cpu"` reveals: @@ -278,7 +313,9 @@ netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension= _(Netdata response for `system.cpu` with source=`average`)_ -In `average` or `sum` data sources, all values are normalized and are reported to prometheus as gauges. Now, use the 'expression' text form in prometheus. Begin to type the metrics we are looking for: `netdata_system_cpu`. You should see that the text form begins to auto-fill as prometheus knows about this metric. +In `average` or `sum` data sources, all values are normalized and are reported to prometheus as gauges. Now, use the +'expression' text form in prometheus. Begin to type the metrics we are looking for: `netdata_system_cpu`. You should see +that the text form begins to auto-fill as prometheus knows about this metric. If the data source was `as collected`, the response would be: @@ -312,7 +349,9 @@ For more information check prometheus documentation. ### Streaming data from upstream hosts -The `format=prometheus` parameter only exports the host's Netdata metrics. If you are using the master/slave functionality of Netdata this ignores any upstream hosts - so you should consider using the below in your **prometheus.yml**: +The `format=prometheus` parameter only exports the host's Netdata metrics. If you are using the master/slave +functionality of Netdata this ignores any upstream hosts - so you should consider using the below in your +**prometheus.yml**: ```yaml metrics_path: '/api/v1/allmetrics' @@ -321,31 +360,38 @@ The `format=prometheus` parameter only exports the host's Netdata metrics. If y honor_labels: true ``` -This will report all upstream host data, and `honor_labels` will make Prometheus take note of the instance names provided. +This will report all upstream host data, and `honor_labels` will make Prometheus take note of the instance names +provided. ### Timestamps -To pass the metrics through prometheus pushgateway, Netdata supports the option `×tamps=no` to send the metrics without timestamps. +To pass the metrics through prometheus pushgateway, Netdata supports the option `×tamps=no` to send the metrics +without timestamps. ## Netdata host variables -Netdata collects various system configuration metrics, like the max number of TCP sockets supported, the max number of files allowed system-wide, various IPC sizes, etc. These metrics are not exposed to prometheus by default. +Netdata collects various system configuration metrics, like the max number of TCP sockets supported, the max number of +files allowed system-wide, various IPC sizes, etc. These metrics are not exposed to prometheus by default. To expose them, append `variables=yes` to the Netdata URL. ### TYPE and HELP -To save bandwidth, and because prometheus does not use them anyway, `# TYPE` and `# HELP` lines are suppressed. If wanted they can be re-enabled via `types=yes` and `help=yes`, e.g. `/api/v1/allmetrics?format=prometheus&types=yes&help=yes` +To save bandwidth, and because prometheus does not use them anyway, `# TYPE` and `# HELP` lines are suppressed. If +wanted they can be re-enabled via `types=yes` and `help=yes`, e.g. +`/api/v1/allmetrics?format=prometheus&types=yes&help=yes` ### Names and IDs -Netdata supports names and IDs for charts and dimensions. Usually IDs are unique identifiers as read by the system and names are human friendly labels (also unique). +Netdata supports names and IDs for charts and dimensions. Usually IDs are unique identifiers as read by the system and +names are human friendly labels (also unique). -Most charts and metrics have the same ID and name, but in several cases they are different: disks with device-mapper, interrupts, QoS classes, statsd synthetic charts, etc. +Most charts and metrics have the same ID and name, but in several cases they are different: disks with device-mapper, +interrupts, QoS classes, statsd synthetic charts, etc. The default is controlled in `netdata.conf`: -``` +```conf [backend] send names instead of ids = yes | no ``` @@ -359,18 +405,21 @@ You can overwrite it from prometheus, by appending to the URL: Netdata can filter the metrics it sends to prometheus with this setting: -``` +```conf [backend] send charts matching = * ``` -This settings accepts a space separated list of patterns to match the **charts** to be sent to prometheus. Each pattern can use `*` as wildcard, any number of times (e.g `*a*b*c*` is valid). Patterns starting with `!` give a negative match (e.g `!*.bad users.* groups.*` will send all the users and groups except `bad` user and `bad` group). The order is important: the first match (positive or negative) left to right, is used. +This settings accepts a space separated list of patterns to match the **charts** to be sent to prometheus. Each pattern +can use `*` as wildcard, any number of times (e.g `*a*b*c*` is valid). Patterns starting with `!` give a negative match +(e.g `!*.bad users.* groups.*` will send all the users and groups except `bad` user and `bad` group). The order is +important: the first match (positive or negative) left to right, is used. ### Changing the prefix of Netdata metrics Netdata sends all metrics prefixed with `netdata_`. You can change this in `netdata.conf`, like this: -``` +```conf [backend] prefix = netdata ``` @@ -379,16 +428,23 @@ It can also be changed from the URL, by appending `&prefix=netdata`. ### Metric Units -The default source `average` adds the unit of measurement to the name of each metric (e.g. `_KiB_persec`). -To hide the units and get the same metric names as with the other sources, append to the URL `&hideunits=yes`. +The default source `average` adds the unit of measurement to the name of each metric (e.g. `_KiB_persec`). To hide the +units and get the same metric names as with the other sources, append to the URL `&hideunits=yes`. -The units were standardized in v1.12, with the effect of changing the metric names. -To get the metric names as they were before v1.12, append to the URL `&oldunits=yes` +The units were standardized in v1.12, with the effect of changing the metric names. To get the metric names as they were +before v1.12, append to the URL `&oldunits=yes` ### Accuracy of `average` and `sum` data sources -When the data source is set to `average` or `sum`, Netdata remembers the last access of each client accessing prometheus metrics and uses this last access time to respond with the `average` or `sum` of all the entries in the database since that. This means that prometheus servers are not losing data when they access Netdata with data source = `average` or `sum`. +When the data source is set to `average` or `sum`, Netdata remembers the last access of each client accessing prometheus +metrics and uses this last access time to respond with the `average` or `sum` of all the entries in the database since +that. This means that prometheus servers are not losing data when they access Netdata with data source = `average` or +`sum`. -To uniquely identify each prometheus server, Netdata uses the IP of the client accessing the metrics. If however the IP is not good enough for identifying a single prometheus server (e.g. when prometheus servers are accessing Netdata through a web proxy, or when multiple prometheus servers are NATed to a single IP), each prometheus may append `&server=NAME` to the URL. This `NAME` is used by Netdata to uniquely identify each prometheus server and keep track of its last access time. +To uniquely identify each prometheus server, Netdata uses the IP of the client accessing the metrics. If however the IP +is not good enough for identifying a single prometheus server (e.g. when prometheus servers are accessing Netdata +through a web proxy, or when multiple prometheus servers are NATed to a single IP), each prometheus may append +`&server=NAME` to the URL. This `NAME` is used by Netdata to uniquely identify each prometheus server and keep track of +its last access time. [![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2Fprometheus%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/prometheus/remote_write/README.md b/backends/prometheus/remote_write/README.md index 8af6f4d1d..009ded608 100644 --- a/backends/prometheus/remote_write/README.md +++ b/backends/prometheus/remote_write/README.md @@ -2,26 +2,32 @@ ## Prerequisites -To use the prometheus remote write API with [storage providers](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage) [protobuf](https://developers.google.com/protocol-buffers/) and [snappy](https://github.com/google/snappy) libraries should be installed first. Next, Netdata should be re-installed from the source. The installer will detect that the required libraries and utilities are now available. +To use the prometheus remote write API with [storage +providers](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage) +[protobuf](https://developers.google.com/protocol-buffers/) and [snappy](https://github.com/google/snappy) libraries +should be installed first. Next, Netdata should be re-installed from the source. The installer will detect that the +required libraries and utilities are now available. ## Configuration An additional option in the backend configuration section is available for the remote write backend: -``` +```conf [backend] remote write URL path = /receive ``` -The default value is `/receive`. `remote write URL path` is used to set an endpoint path for the remote write protocol. For example, if your endpoint is `http://example.domain:example_port/storage/read` you should set +The default value is `/receive`. `remote write URL path` is used to set an endpoint path for the remote write protocol. +For example, if your endpoint is `http://example.domain:example_port/storage/read` you should set -``` +```conf [backend] destination = example.domain:example_port remote write URL path = /storage/read ``` -`buffered` and `lost` dimensions in the Netdata Backend Data Size operation monitoring chart estimate uncompressed buffer size on failures. +`buffered` and `lost` dimensions in the Netdata Backend Data Size operation monitoring chart estimate uncompressed +buffer size on failures. ## Notes |