diff options
Diffstat (limited to '')
38 files changed, 4793 insertions, 0 deletions
diff --git a/backends/Makefile.am b/backends/Makefile.am new file mode 100644 index 0000000..dace013 --- /dev/null +++ b/backends/Makefile.am @@ -0,0 +1,22 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in + +SUBDIRS = \ + graphite \ + json \ + opentsdb \ + prometheus \ + aws_kinesis \ + mongodb \ + $(NULL) + +dist_noinst_DATA = \ + README.md \ + WALKTHROUGH.md \ + $(NULL) + +dist_noinst_SCRIPTS = \ + nc-backend.sh \ + $(NULL) diff --git a/backends/README.md b/backends/README.md new file mode 100644 index 0000000..8d53fd6 --- /dev/null +++ b/backends/README.md @@ -0,0 +1,236 @@ +<!-- +title: "Metrics long term archiving" +custom_edit_url: https://github.com/netdata/netdata/edit/master/backends/README.md +--> + +# Metrics long term archiving + +> ⚠️ The backends system is now deprecated in favor of the [exporting engine](/exporting/README.md). + +Netdata supports backends for archiving the metrics, or providing long term dashboards, using Grafana or other tools, +like this: + +![image](https://cloud.githubusercontent.com/assets/2662304/20649711/29f182ba-b4ce-11e6-97c8-ab2c0ab59833.png) + +Since Netdata collects thousands of metrics per server per second, which would easily congest any backend server when +several Netdata servers are sending data to it, Netdata allows sending metrics at a lower frequency, by resampling them. + +So, although Netdata collects metrics every second, it can send to the backend servers averages or sums every X seconds +(though, it can send them per second if you need it to). + +## features + +1. Supported backends + + - **graphite** (`plaintext interface`, used by **Graphite**, **InfluxDB**, **KairosDB**, **Blueflood**, + **ElasticSearch** via logstash tcp input and the graphite codec, etc) + + metrics are sent to the backend server as `prefix.hostname.chart.dimension`. `prefix` is configured below, + `hostname` is the hostname of the machine (can also be configured). + + - **opentsdb** (`telnet or HTTP interfaces`, used by **OpenTSDB**, **InfluxDB**, **KairosDB**, etc) + + metrics are sent to opentsdb as `prefix.chart.dimension` with tag `host=hostname`. + + - **json** document DBs + + metrics are sent to a document db, `JSON` formatted. + + - **prometheus** is described at [prometheus page](/backends/prometheus/README.md) since it pulls data from + Netdata. + + - **prometheus remote write** (a binary snappy-compressed protocol buffer encoding over HTTP used by + **Elasticsearch**, **Gnocchi**, **Graphite**, **InfluxDB**, **Kafka**, **OpenTSDB**, **PostgreSQL/TimescaleDB**, + **Splunk**, **VictoriaMetrics**, and a lot of other [storage + providers](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage)) + + metrics are labeled in the format, which is used by Netdata for the [plaintext prometheus + protocol](/backends/prometheus/README.md). Notes on using the remote write backend are [here](/backends/prometheus/remote_write/README.md). + + - **TimescaleDB** via [community-built connector](/backends/TIMESCALE.md) that takes JSON streams from a Netdata + client and writes them to a TimescaleDB table. + + - **AWS Kinesis Data Streams** + + metrics are sent to the service in `JSON` format. + + - **MongoDB** + + metrics are sent to the database in `JSON` format. + +2. Only one backend may be active at a time. + +3. Netdata can filter metrics (at the chart level), to send only a subset of the collected metrics. + +4. Netdata supports three modes of operation for all backends: + + - `as-collected` sends to backends the metrics as they are collected, in the units they are collected. So, + counters are sent as counters and gauges are sent as gauges, much like all data collectors do. For example, to + calculate CPU utilization in this format, you need to know how to convert kernel ticks to percentage. + + - `average` sends to backends normalized metrics from the Netdata database. In this mode, all metrics are sent as + gauges, in the units Netdata uses. This abstracts data collection and simplifies visualization, but you will not + be able to copy and paste queries from other sources to convert units. For example, CPU utilization percentage + is calculated by Netdata, so Netdata will convert ticks to percentage and send the average percentage to the + backend. + + - `sum` or `volume`: the sum of the interpolated values shown on the Netdata graphs is sent to the backend. So, if + Netdata is configured to send data to the backend every 10 seconds, the sum of the 10 values shown on the + Netdata charts will be used. + + Time-series databases suggest to collect the raw values (`as-collected`). If you plan to invest on building your + monitoring around a time-series database and you already know (or you will invest in learning) how to convert units + and normalize the metrics in Grafana or other visualization tools, we suggest to use `as-collected`. + + If, on the other hand, you just need long term archiving of Netdata metrics and you plan to mainly work with + Netdata, we suggest to use `average`. It decouples visualization from data collection, so it will generally be a lot + simpler. Furthermore, if you use `average`, the charts shown in the back-end will match exactly what you see in + Netdata, which is not necessarily true for the other modes of operation. + +5. This code is smart enough, not to slow down Netdata, independently of the speed of the backend server. + +## configuration + +In `/etc/netdata/netdata.conf` you should have something like this (if not download the latest version of `netdata.conf` +from your Netdata): + +```conf +[backend] + enabled = yes | no + type = graphite | opentsdb:telnet | opentsdb:http | opentsdb:https | prometheus_remote_write | json | kinesis | mongodb + host tags = list of TAG=VALUE + destination = space separated list of [PROTOCOL:]HOST[:PORT] - the first working will be used, or a region for kinesis + data source = average | sum | as collected + prefix = Netdata + hostname = my-name + update every = 10 + buffer on failures = 10 + timeout ms = 20000 + send charts matching = * + send hosts matching = localhost * + send names instead of ids = yes +``` + +- `enabled = yes | no`, enables or disables sending data to a backend + +- `type = graphite | opentsdb:telnet | opentsdb:http | opentsdb:https | json | kinesis | mongodb`, selects the backend + type + +- `destination = host1 host2 host3 ...`, accepts **a space separated list** of hostnames, IPs (IPv4 and IPv6) and + ports to connect to. Netdata will use the **first available** to send the metrics. + + The format of each item in this list, is: `[PROTOCOL:]IP[:PORT]`. + + `PROTOCOL` can be `udp` or `tcp`. `tcp` is the default and only supported by the current backends. + + `IP` can be `XX.XX.XX.XX` (IPv4), or `[XX:XX...XX:XX]` (IPv6). For IPv6 you can to enclose the IP in `[]` to + separate it from the port. + + `PORT` can be a number of a service name. If omitted, the default port for the backend will be used + (graphite = 2003, opentsdb = 4242). + + Example IPv4: + +```conf + destination = 10.11.14.2:4242 10.11.14.3:4242 10.11.14.4:4242 +``` + + Example IPv6 and IPv4 together: + +```conf + destination = [ffff:...:0001]:2003 10.11.12.1:2003 +``` + + When multiple servers are defined, Netdata will try the next one when the first one fails. This allows you to + load-balance different servers: give your backend servers in different order on each Netdata. + + Netdata also ships `nc-backend.sh`, a script that can be used as a fallback backend to save the + metrics to disk and push them to the time-series database when it becomes available again. It can also be used to + monitor / trace / debug the metrics Netdata generates. + + For kinesis backend `destination` should be set to an AWS region (for example, `us-east-1`). + + The MongoDB backend doesn't use the `destination` option for its configuration. It uses the `mongodb.conf` + [configuration file](/backends/mongodb/README.md) instead. + +- `data source = as collected`, or `data source = average`, or `data source = sum`, selects the kind of data that will + be sent to the backend. + +- `hostname = my-name`, is the hostname to be used for sending data to the backend server. By default this is + `[global].hostname`. + +- `prefix = Netdata`, is the prefix to add to all metrics. + +- `update every = 10`, is the number of seconds between sending data to the backend. Netdata will add some randomness + to this number, to prevent stressing the backend server when many Netdata servers send data to the same backend. + This randomness does not affect the quality of the data, only the time they are sent. + +- `buffer on failures = 10`, is the number of iterations (each iteration is `[backend].update every` seconds) to + buffer data, when the backend is not available. If the backend fails to receive the data after that many failures, + data loss on the backend is expected (Netdata will also log it). + +- `timeout ms = 20000`, is the timeout in milliseconds to wait for the backend server to process the data. By default + this is `2 * update_every * 1000`. + +- `send hosts matching = localhost *` includes one or more space separated patterns, using `*` as wildcard (any number + of times within each pattern). The patterns are checked against the hostname (the localhost is always checked as + `localhost`), allowing us to filter which hosts will be sent to the backend when this Netdata is a central Netdata + aggregating multiple hosts. A pattern starting with `!` gives a negative match. So to match all hosts named `*db*` + except hosts containing `*child*`, use `!*child* *db*` (so, the order is important: the first pattern + matching the hostname will be used - positive or negative). + +- `send charts matching = *` includes one or more space separated patterns, using `*` as wildcard (any number of times + within each pattern). The patterns are checked against both chart id and chart name. A pattern starting with `!` + gives a negative match. So to match all charts named `apps.*` except charts ending in `*reads`, use `!*reads + apps.*` (so, the order is important: the first pattern matching the chart id or the chart name will be used - + positive or negative). + +- `send names instead of ids = yes | no` controls the metric names Netdata should send to backend. Netdata supports + names and IDs for charts and dimensions. Usually IDs are unique identifiers as read by the system and names are + human friendly labels (also unique). Most charts and metrics have the same ID and name, but in several cases they + are different: disks with device-mapper, interrupts, QoS classes, statsd synthetic charts, etc. + +- `host tags = list of TAG=VALUE` defines tags that should be appended on all metrics for the given host. These are + currently only sent to graphite, json, opentsdb and prometheus. Please use the appropriate format for each + time-series db. For example opentsdb likes them like `TAG1=VALUE1 TAG2=VALUE2`, but prometheus like `tag1="value1", + tag2="value2"`. Host tags are mirrored with database replication (streaming of metrics between Netdata servers). + + Starting from Netdata v1.20 the host tags are parsed in accordance with a configured backend type and stored as + host labels so that they can be reused in API responses and exporting connectors. The parsing is supported for + graphite, json, opentsdb, and prometheus (default) backend types. You can check how the host tags were parsed using + the /api/v1/info API call. + +## monitoring operation + +Netdata provides 5 charts: + +1. **Buffered metrics**, the number of metrics Netdata added to the buffer for dispatching them to the + backend server. + +2. **Buffered data size**, the amount of data (in KB) Netdata added the buffer. + +3. ~~**Backend latency**, the time the backend server needed to process the data Netdata sent. If there was a + re-connection involved, this includes the connection time.~~ (this chart has been removed, because it only measures + the time Netdata needs to give the data to the O/S - since the backend servers do not ack the reception, Netdata + does not have any means to measure this properly). + +4. **Backend operations**, the number of operations performed by Netdata. + +5. **Backend thread CPU usage**, the CPU resources consumed by the Netdata thread, that is responsible for sending the + metrics to the backend server. + +![image](https://cloud.githubusercontent.com/assets/2662304/20463536/eb196084-af3d-11e6-8ee5-ddbd3b4d8449.png) + +## alarms + +Netdata adds 4 alarms: + +1. `backend_last_buffering`, number of seconds since the last successful buffering of backend data +2. `backend_metrics_sent`, percentage of metrics sent to the backend server +3. `backend_metrics_lost`, number of metrics lost due to repeating failures to contact the backend server +4. ~~`backend_slow`, the percentage of time between iterations needed by the backend time to process the data sent by + Netdata~~ (this was misleading and has been removed). + +![image](https://cloud.githubusercontent.com/assets/2662304/20463779/a46ed1c2-af43-11e6-91a5-07ca4533cac3.png) + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/TIMESCALE.md b/backends/TIMESCALE.md new file mode 100644 index 0000000..05a3c3b --- /dev/null +++ b/backends/TIMESCALE.md @@ -0,0 +1,57 @@ +<!-- +title: "Writing metrics to TimescaleDB" +custom_edit_url: https://github.com/netdata/netdata/edit/master/backends/TIMESCALE.md +--> + +# Writing metrics to TimescaleDB + +Thanks to Netdata's community of developers and system administrators, and Mahlon Smith +([GitHub](https://github.com/mahlonsmith)/[Website](http://www.martini.nu/)) in particular, Netdata now supports +archiving metrics directly to TimescaleDB. + +What's TimescaleDB? Here's how their team defines the project on their [GitHub page](https://github.com/timescale/timescaledb): + +> TimescaleDB is an open-source database designed to make SQL scalable for time-series data. It is engineered up from +> PostgreSQL, providing automatic partitioning across time and space (partitioning key), as well as full SQL support. + +## Quickstart + +To get started archiving metrics to TimescaleDB right away, check out Mahlon's [`netdata-timescale-relay` +repository](https://github.com/mahlonsmith/netdata-timescale-relay) on GitHub. + +This small program takes JSON streams from a Netdata client and writes them to a PostgreSQL (aka TimescaleDB) table. +You'll run this program in parallel with Netdata, and after a short [configuration +process](https://github.com/mahlonsmith/netdata-timescale-relay#configuration), your metrics should start populating +TimescaleDB. + +Finally, another member of Netdata's community has built a project that quickly launches Netdata, TimescaleDB, and +Grafana in easy-to-manage Docker containers. Rune Juhl Jacobsen's +[project](https://github.com/runejuhl/grafana-timescaledb) uses a `Makefile` to create everything, which makes it +perfect for testing and experimentation. + +## Netdata↔TimescaleDB in action + +Aside from creating incredible contributions to Netdata, Mahlon works at [LAIKA](https://www.laika.com/), an +Oregon-based animation studio that's helped create acclaimed films like _Coraline_ and _Kubo and the Two Strings_. + +As part of his work to maintain the company's infrastructure of render farms, workstations, and virtual machines, he's +using Netdata, `netdata-timescale-relay`, and TimescaleDB to store Netdata metrics alongside other data from other +sources. + +> LAIKA is a long-time PostgreSQL user and added TimescaleDB to their infrastructure in 2018 to help manage and store +> their IT metrics and time-series data. So far, the tool has been in production at LAIKA for over a year and helps them +> with their use case of time-based logging, where they record over 8 million metrics an hour for netdata content alone. + +By archiving Netdata metrics to a backend like TimescaleDB, LAIKA can consolidate metrics data from distributed machines +efficiently. Mahlon can then correlate Netdata metrics with other sources directly in TimescaleDB. + +And, because LAIKA will soon be storing years worth of Netdata metrics data in TimescaleDB, they can analyze long-term +metrics as their films move from concept to final cut. + +Read the full blog post from LAIKA at the [TimescaleDB +blog](https://blog.timescale.com/blog/writing-it-metrics-from-netdata-to-timescaledb/amp/). + +Thank you to Mahlon, Rune, TimescaleDB, and the members of the Netdata community that requested and then built this +backend connection between Netdata and TimescaleDB! + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2FTIMESCALE&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/WALKTHROUGH.md b/backends/WALKTHROUGH.md new file mode 100644 index 0000000..76dd62f --- /dev/null +++ b/backends/WALKTHROUGH.md @@ -0,0 +1,258 @@ +<!-- +title: "Netdata, Prometheus, Grafana stack" +custom_edit_url: https://github.com/netdata/netdata/edit/master/backends/WALKTHROUGH.md +--> + +# Netdata, Prometheus, Grafana stack + +## Intro + +In this article I will walk you through the basics of getting Netdata, Prometheus and Grafana all working together and +monitoring your application servers. This article will be using docker on your local workstation. We will be working +with docker in an ad-hoc way, launching containers that run ‘/bin/bash’ and attaching a TTY to them. I use docker here +in a purely academic fashion and do not condone running Netdata in a container. I pick this method so individuals +without cloud accounts or access to VMs can try this out and for it’s speed of deployment. + +## Why Netdata, Prometheus, and Grafana + +Some time ago I was introduced to Netdata by a coworker. We were attempting to troubleshoot python code which seemed to +be bottlenecked. I was instantly impressed by the amount of metrics Netdata exposes to you. I quickly added Netdata to +my set of go-to tools when troubleshooting systems performance. + +Some time ago, even later, I was introduced to Prometheus. Prometheus is a monitoring application which flips the normal +architecture around and polls rest endpoints for its metrics. This architectural change greatly simplifies and decreases +the time necessary to begin monitoring your applications. Compared to current monitoring solutions the time spent on +designing the infrastructure is greatly reduced. Running a single Prometheus server per application becomes feasible +with the help of Grafana. + +Grafana has been the go to graphing tool for… some time now. It’s awesome, anyone that has used it knows it’s awesome. +We can point Grafana at Prometheus and use Prometheus as a data source. This allows a pretty simple overall monitoring +architecture: Install Netdata on your application servers, point Prometheus at Netdata, and then point Grafana at +Prometheus. + +I’m omitting an important ingredient in this stack in order to keep this tutorial simple and that is service discovery. +My personal preference is to use Consul. Prometheus can plug into consul and automatically begin to scrape new hosts +that register a Netdata client with Consul. + +At the end of this tutorial you will understand how each technology fits together to create a modern monitoring stack. +This stack will offer you visibility into your application and systems performance. + +## Getting Started - Netdata + +To begin let’s create our container which we will install Netdata on. We need to run a container, forward the necessary +port that Netdata listens on, and attach a tty so we can interact with the bash shell on the container. But before we do +this we want name resolution between the two containers to work. In order to accomplish this we will create a +user-defined network and attach both containers to this network. The first command we should run is: + +```sh +docker network create --driver bridge netdata-tutorial +``` + +With this user-defined network created we can now launch our container we will install Netdata on and point it to this +network. + +```sh +docker run -it --name netdata --hostname netdata --network=netdata-tutorial -p 19999:19999 centos:latest '/bin/bash' +``` + +This command creates an interactive tty session (-it), gives the container both a name in relation to the docker daemon +and a hostname (this is so you know what container is which when working in the shells and docker maps hostname +resolution to this container), forwards the local port 19999 to the container’s port 19999 (-p 19999:19999), sets the +command to run (/bin/bash) and then chooses the base container images (centos:latest). After running this you should be +sitting inside the shell of the container. + +After we have entered the shell we can install Netdata. This process could not be easier. If you take a look at [this +link](../packaging/installer/README.md), the Netdata devs give us several one-liners to install Netdata. I have not had +any issues with these one liners and their bootstrapping scripts so far (If you guys run into anything do share). Run +the following command in your container. + +```sh +bash <(curl -Ss https://my-netdata.io/kickstart.sh) --dont-wait +``` + +After the install completes you should be able to hit the Netdata dashboard at <http://localhost:19999/> (replace +localhost if you’re doing this on a VM or have the docker container hosted on a machine not on your local system). If +this is your first time using Netdata I suggest you take a look around. The amount of time I’ve spent digging through +/proc and calculating my own metrics has been greatly reduced by this tool. Take it all in. + +Next I want to draw your attention to a particular endpoint. Navigate to +<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> In your browser. This is the endpoint which +publishes all the metrics in a format which Prometheus understands. Let’s take a look at one of these metrics. +`netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0831255 1501271696000` This +metric is representing several things which I will go in more details in the section on prometheus. For now understand +that this metric: `netdata_system_cpu_percentage_average` has several labels: (chart, family, dimension). This +corresponds with the first cpu chart you see on the Netdata dashboard. + +![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%204.00.45%20PM.png) + +This CHART is called ‘system.cpu’, The FAMILY is cpu, and the DIMENSION we are observing is “system”. You can begin to +draw links between the charts in Netdata to the prometheus metrics format in this manner. + +## Prometheus + +We will be installing prometheus in a container for purpose of demonstration. While prometheus does have an official +container I would like to walk through the install process and setup on a fresh container. This will allow anyone +reading to migrate this tutorial to a VM or Server of any sort. + +Let’s start another container in the same fashion as we did the Netdata container. + +```sh +docker run -it --name prometheus --hostname prometheus +--network=netdata-tutorial -p 9090:9090 centos:latest '/bin/bash' +``` + +This should drop you into a shell once again. Once there quickly install your favorite editor as we will be editing +files later in this tutorial. + +```sh +yum install vim -y +``` + +Prometheus provides a tarball of their latest stable versions [here](https://prometheus.io/download/). + +Let’s download the latest version and install into your container. + +```sh +cd /tmp && curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest \ +| grep "browser_download_url.*linux-amd64.tar.gz" \ +| cut -d '"' -f 4 \ +| wget -qi - + +mkdir /opt/prometheus + +sudo tar -xvf /tmp/prometheus-*linux-amd64.tar.gz -C /opt/prometheus --strip=1 +``` + +This should get prometheus installed into the container. Let’s test that we can run prometheus and connect to it’s web +interface. + +```sh +/opt/prometheus/prometheus +``` + +Now attempt to go to <http://localhost:9090/>. You should be presented with the prometheus homepage. This is a good +point to talk about Prometheus’s data model which can be viewed here: <https://prometheus.io/docs/concepts/data_model/> +As explained we have two key elements in Prometheus metrics. We have the ‘metric’ and its ‘labels’. Labels allow for +granularity between metrics. Let’s use our previous example to further explain. + +```conf +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0831255 1501271696000 +``` + +Here our metric is ‘netdata_system_cpu_percentage_average’ and our labels are ‘chart’, ‘family’, and ‘dimension. The +last two values constitute the actual metric value for the metric type (gauge, counter, etc…). We can begin graphing +system metrics with this information, but first we need to hook up Prometheus to poll Netdata stats. + +Let’s move our attention to Prometheus’s configuration. Prometheus gets it config from the file located (in our example) +at `/opt/prometheus/prometheus.yml`. I won’t spend an extensive amount of time going over the configuration values +documented here: <https://prometheus.io/docs/operating/configuration/>. We will be adding a new“job” under the +“scrape_configs”. Let’s make the “scrape_configs” section look like this (we can use the dns name Netdata due to the +custom user-defined network we created in docker beforehand). + +```yaml +scrape_configs: + # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. + - job_name: 'prometheus' + + # metrics_path defaults to '/metrics' + # scheme defaults to 'http'. + + static_configs: + - targets: ['localhost:9090'] + + - job_name: 'netdata' + + metrics_path: /api/v1/allmetrics + params: + format: [ prometheus ] + + static_configs: + - targets: ['netdata:19999'] +``` + +Let’s start prometheus once again by running `/opt/prometheus/prometheus`. If we now navigate to prometheus at +‘<http://localhost:9090/targets’> we should see our target being successfully scraped. If we now go back to the +Prometheus’s homepage and begin to type ‘netdata\_’ Prometheus should auto complete metrics it is now scraping. + +![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%205.13.43%20PM.png) + +Let’s now start exploring how we can graph some metrics. Back in our NetData container lets get the CPU spinning with a +pointless busy loop. On the shell do the following: + +```sh +[root@netdata /]# while true; do echo "HOT HOT HOT CPU"; done +``` + +Our NetData cpu graph should be showing some activity. Let’s represent this in Prometheus. In order to do this let’s +keep our metrics page open for reference: <http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> We are +setting out to graph the data in the CPU chart so let’s search for “system.cpu”in the metrics page above. We come across +a section of metrics with the first comments `# COMMENT homogeneous chart "system.cpu", context "system.cpu", family +"cpu", units "percentage"` Followed by the metrics. This is a good start now let us drill down to the specific metric we +would like to graph. + +```conf +# COMMENT +netdata_system_cpu_percentage_average: dimension "system", value is percentage, gauge, dt 1501275951 to 1501275951 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0000000 1501275951000 +``` + +Here we learn that the metric name we care about is‘netdata_system_cpu_percentage_average’ so throw this into Prometheus +and see what we get. We should see something similar to this (I shut off my busy loop) + +![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%205.47.53%20PM.png) + +This is a good step toward what we want. Also make note that Prometheus will tag on an ‘instance’ label for us which +corresponds to our statically defined job in the configuration file. This allows us to tailor our queries to specific +instances. Now we need to isolate the dimension we want in our query. To do this let us refine the query slightly. Let’s +query the dimension also. Place this into our query text box. +`netdata_system_cpu_percentage_average{dimension="system"}` We now wind up with the following graph. + +![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%205.54.40%20PM.png) + +Awesome, this is exactly what we wanted. If you haven’t caught on yet we can emulate entire charts from NetData by using +the `chart` dimension. If you’d like you can combine the ‘chart’ and ‘instance’ dimension to create per-instance charts. +Let’s give this a try: `netdata_system_cpu_percentage_average{chart="system.cpu", instance="netdata:19999"}` + +This is the basics of using Prometheus to query NetData. I’d advise everyone at this point to read [this +page](../backends/prometheus/#using-netdata-with-prometheus). The key point here is that NetData can export metrics from +its internal DB or can send metrics “as-collected” by specifying the ‘source=as-collected’ url parameter like so. +<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes&types=yes&source=as-collected> If you choose to use +this method you will need to use Prometheus's set of functions here: <https://prometheus.io/docs/querying/functions/> to +obtain useful metrics as you are now dealing with raw counters from the system. For example you will have to use the +`irate()` function over a counter to get that metric's rate per second. If your graphing needs are met by using the +metrics returned by NetData's internal database (not specifying any source= url parameter) then use that. If you find +limitations then consider re-writing your queries using the raw data and using Prometheus functions to get the desired +chart. + +## Grafana + +Finally we make it to grafana. This is the easiest part in my opinion. This time we will actually run the official +grafana docker container as all configuration we need to do is done via the GUI. Let’s run the following command: + +```sh +docker run -i -p 3000:3000 --network=netdata-tutorial grafana/grafana +``` + +This will get grafana running at ‘<http://localhost:3000/’> Let’s go there and + +login using the credentials Admin:Admin. + +The first thing we want to do is click ‘Add data source’. Let’s make it look like the following screenshot + +![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%206.36.55%20PM.png) + +With this completed let’s graph! Create a new Dashboard by clicking on the top left Grafana Icon and create a new graph +in that dashboard. Fill in the query like we did above and save. + +![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%206.39.38%20PM.png) + +## Conclusion + +There you have it, a complete systems monitoring stack which is very easy to deploy. From here I would begin to +understand how Prometheus and a service discovery mechanism such as Consul can play together nicely. My current prod +deployments automatically register Netdata services into Consul and Prometheus automatically begins to scrape them. Once +achieved you do not have to think about the monitoring system until Prometheus cannot keep up with your scale. Once this +happens there are options presented in the Prometheus documentation for solving this. Hope this was helpful, happy +monitoring. + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2FWALKTHROUGH&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/aws_kinesis/Makefile.am b/backends/aws_kinesis/Makefile.am new file mode 100644 index 0000000..1fec72c --- /dev/null +++ b/backends/aws_kinesis/Makefile.am @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in + +dist_noinst_DATA = \ + README.md \ + $(NULL) + +dist_libconfig_DATA = \ + aws_kinesis.conf \ + $(NULL) diff --git a/backends/aws_kinesis/README.md b/backends/aws_kinesis/README.md new file mode 100644 index 0000000..a2b6825 --- /dev/null +++ b/backends/aws_kinesis/README.md @@ -0,0 +1,53 @@ +<!-- +title: "Using Netdata with AWS Kinesis Data Streams" +custom_edit_url: https://github.com/netdata/netdata/edit/master/backends/aws_kinesis/README.md +--> + +# Using Netdata with AWS Kinesis Data Streams + +## Prerequisites + +To use AWS Kinesis as a backend AWS SDK for C++ should be +[installed](https://docs.aws.amazon.com/en_us/sdk-for-cpp/v1/developer-guide/setup.html) first. `libcrypto`, `libssl`, +and `libcurl` are also required to compile Netdata with Kinesis support enabled. Next, Netdata should be re-installed +from the source. The installer will detect that the required libraries are now available. + +If the AWS SDK for C++ is being installed from source, it is useful to set `-DBUILD_ONLY="kinesis"`. Otherwise, the +building process could take a very long time. Take a note, that the default installation path for the libraries is +`/usr/local/lib64`. Many Linux distributions don't include this path as the default one for a library search, so it is +advisable to use the following options to `cmake` while building the AWS SDK: + +```sh +cmake -DCMAKE_INSTALL_LIBDIR=/usr/lib -DCMAKE_INSTALL_INCLUDEDIR=/usr/include -DBUILD_SHARED_LIBS=OFF -DBUILD_ONLY=kinesis <aws-sdk-cpp sources> +``` + +## Configuration + +To enable data sending to the kinesis backend set the following options in `netdata.conf`: + +```conf +[backend] + enabled = yes + type = kinesis + destination = us-east-1 +``` + +set the `destination` option to an AWS region. + +In the Netdata configuration directory run `./edit-config aws_kinesis.conf` and set AWS credentials and stream name: + +```yaml +# AWS credentials +aws_access_key_id = your_access_key_id +aws_secret_access_key = your_secret_access_key + +# destination stream +stream name = your_stream_name +``` + +Alternatively, AWS credentials can be set for the `netdata` user using AWS SDK for C++ [standard methods](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html). + +A partition key for every record is computed automatically by Netdata with the purpose to distribute records across +available shards evenly. + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2Faws_kinesis%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/aws_kinesis/aws_kinesis.c b/backends/aws_kinesis/aws_kinesis.c new file mode 100644 index 0000000..b1ea478 --- /dev/null +++ b/backends/aws_kinesis/aws_kinesis.c @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#define BACKENDS_INTERNALS +#include "aws_kinesis.h" + +#define CONFIG_FILE_LINE_MAX ((CONFIG_MAX_NAME + CONFIG_MAX_VALUE + 1024) * 2) + +// ---------------------------------------------------------------------------- +// kinesis backend + +// read the aws_kinesis.conf file +int read_kinesis_conf(const char *path, char **access_key_id_p, char **secret_access_key_p, char **stream_name_p) +{ + char *access_key_id = *access_key_id_p; + char *secret_access_key = *secret_access_key_p; + char *stream_name = *stream_name_p; + + if(unlikely(access_key_id)) freez(access_key_id); + if(unlikely(secret_access_key)) freez(secret_access_key); + if(unlikely(stream_name)) freez(stream_name); + access_key_id = NULL; + secret_access_key = NULL; + stream_name = NULL; + + int line = 0; + + char filename[FILENAME_MAX + 1]; + snprintfz(filename, FILENAME_MAX, "%s/aws_kinesis.conf", path); + + char buffer[CONFIG_FILE_LINE_MAX + 1], *s; + + debug(D_BACKEND, "BACKEND: opening config file '%s'", filename); + + FILE *fp = fopen(filename, "r"); + if(!fp) { + return 1; + } + + while(fgets(buffer, CONFIG_FILE_LINE_MAX, fp) != NULL) { + buffer[CONFIG_FILE_LINE_MAX] = '\0'; + line++; + + s = trim(buffer); + if(!s || *s == '#') { + debug(D_BACKEND, "BACKEND: ignoring line %d of file '%s', it is empty.", line, filename); + continue; + } + + char *name = s; + char *value = strchr(s, '='); + if(unlikely(!value)) { + error("BACKEND: ignoring line %d ('%s') of file '%s', there is no = in it.", line, s, filename); + continue; + } + *value = '\0'; + value++; + + name = trim(name); + value = trim(value); + + if(unlikely(!name || *name == '#')) { + error("BACKEND: ignoring line %d of file '%s', name is empty.", line, filename); + continue; + } + + if(!value) + value = ""; + else + value = strip_quotes(value); + + if(name[0] == 'a' && name[4] == 'a' && !strcmp(name, "aws_access_key_id")) { + access_key_id = strdupz(value); + } + else if(name[0] == 'a' && name[4] == 's' && !strcmp(name, "aws_secret_access_key")) { + secret_access_key = strdupz(value); + } + else if(name[0] == 's' && !strcmp(name, "stream name")) { + stream_name = strdupz(value); + } + } + + fclose(fp); + + if(unlikely(!stream_name || !*stream_name)) { + error("BACKEND: stream name is a mandatory Kinesis parameter but it is not configured"); + return 1; + } + + *access_key_id_p = access_key_id; + *secret_access_key_p = secret_access_key; + *stream_name_p = stream_name; + + return 0; +} diff --git a/backends/aws_kinesis/aws_kinesis.conf b/backends/aws_kinesis/aws_kinesis.conf new file mode 100644 index 0000000..cc54b5f --- /dev/null +++ b/backends/aws_kinesis/aws_kinesis.conf @@ -0,0 +1,10 @@ +# AWS Kinesis Data Streams backend configuration +# +# All options in this file are mandatory + +# AWS credentials +aws_access_key_id = +aws_secret_access_key = + +# destination stream +stream name =
\ No newline at end of file diff --git a/backends/aws_kinesis/aws_kinesis.h b/backends/aws_kinesis/aws_kinesis.h new file mode 100644 index 0000000..50a4631 --- /dev/null +++ b/backends/aws_kinesis/aws_kinesis.h @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_BACKEND_KINESIS_H +#define NETDATA_BACKEND_KINESIS_H + +#include "backends/backends.h" +#include "aws_kinesis_put_record.h" + +#define KINESIS_PARTITION_KEY_MAX 256 +#define KINESIS_RECORD_MAX 1024 * 1024 + +extern int read_kinesis_conf(const char *path, char **auth_key_id_p, char **secure_key_p, char **stream_name_p); + +#endif //NETDATA_BACKEND_KINESIS_H diff --git a/backends/aws_kinesis/aws_kinesis_put_record.cc b/backends/aws_kinesis/aws_kinesis_put_record.cc new file mode 100644 index 0000000..a8ba4aa --- /dev/null +++ b/backends/aws_kinesis/aws_kinesis_put_record.cc @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include <aws/core/Aws.h> +#include <aws/core/client/ClientConfiguration.h> +#include <aws/core/auth/AWSCredentials.h> +#include <aws/core/utils/Outcome.h> +#include <aws/kinesis/KinesisClient.h> +#include <aws/kinesis/model/PutRecordRequest.h> +#include "aws_kinesis_put_record.h" + +using namespace Aws; + +static SDKOptions options; + +static Kinesis::KinesisClient *client; + +struct request_outcome { + Kinesis::Model::PutRecordOutcomeCallable future_outcome; + size_t data_len; +}; + +static Vector<request_outcome> request_outcomes; + +void backends_kinesis_init(const char *region, const char *access_key_id, const char *secret_key, const long timeout) { + InitAPI(options); + + Client::ClientConfiguration config; + + config.region = region; + config.requestTimeoutMs = timeout; + config.connectTimeoutMs = timeout; + + if(access_key_id && *access_key_id && secret_key && *secret_key) { + client = New<Kinesis::KinesisClient>("client", Auth::AWSCredentials(access_key_id, secret_key), config); + } else { + client = New<Kinesis::KinesisClient>("client", config); + } +} + +void backends_kinesis_shutdown() { + Delete(client); + + ShutdownAPI(options); +} + +int backends_kinesis_put_record(const char *stream_name, const char *partition_key, + const char *data, size_t data_len) { + Kinesis::Model::PutRecordRequest request; + + request.SetStreamName(stream_name); + request.SetPartitionKey(partition_key); + request.SetData(Utils::ByteBuffer((unsigned char*) data, data_len)); + + request_outcomes.push_back({client->PutRecordCallable(request), data_len}); + + return 0; +} + +int backends_kinesis_get_result(char *error_message, size_t *sent_bytes, size_t *lost_bytes) { + Kinesis::Model::PutRecordOutcome outcome; + *sent_bytes = 0; + *lost_bytes = 0; + + for(auto request_outcome = request_outcomes.begin(); request_outcome != request_outcomes.end(); ) { + std::future_status status = request_outcome->future_outcome.wait_for(std::chrono::microseconds(100)); + + if(status == std::future_status::ready || status == std::future_status::deferred) { + outcome = request_outcome->future_outcome.get(); + *sent_bytes += request_outcome->data_len; + + if(!outcome.IsSuccess()) { + *lost_bytes += request_outcome->data_len; + outcome.GetError().GetMessage().copy(error_message, ERROR_LINE_MAX); + } + + request_outcomes.erase(request_outcome); + } else { + ++request_outcome; + } + } + + if(*lost_bytes) { + return 1; + } + + return 0; +}
\ No newline at end of file diff --git a/backends/aws_kinesis/aws_kinesis_put_record.h b/backends/aws_kinesis/aws_kinesis_put_record.h new file mode 100644 index 0000000..fa3d034 --- /dev/null +++ b/backends/aws_kinesis/aws_kinesis_put_record.h @@ -0,0 +1,25 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_BACKEND_KINESIS_PUT_RECORD_H +#define NETDATA_BACKEND_KINESIS_PUT_RECORD_H + +#define ERROR_LINE_MAX 1023 + +#ifdef __cplusplus +extern "C" { +#endif + +void backends_kinesis_init(const char *region, const char *access_key_id, const char *secret_key, const long timeout); + +void backends_kinesis_shutdown(); + +int backends_kinesis_put_record(const char *stream_name, const char *partition_key, + const char *data, size_t data_len); + +int backends_kinesis_get_result(char *error_message, size_t *sent_bytes, size_t *lost_bytes); + +#ifdef __cplusplus +} +#endif + +#endif //NETDATA_BACKEND_KINESIS_PUT_RECORD_H diff --git a/backends/backends.c b/backends/backends.c new file mode 100644 index 0000000..6bf583e --- /dev/null +++ b/backends/backends.c @@ -0,0 +1,1243 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#define BACKENDS_INTERNALS +#include "backends.h" + +// ---------------------------------------------------------------------------- +// How backends work in netdata: +// +// 1. There is an independent thread that runs at the required interval +// (for example, once every 10 seconds) +// +// 2. Every time it wakes, it calls the backend formatting functions to build +// a buffer of data. This is a very fast, memory only operation. +// +// 3. If the buffer already includes data, the new data are appended. +// If the buffer becomes too big, because the data cannot be sent, a +// log is written and the buffer is discarded. +// +// 4. Then it tries to send all the data. It blocks until all the data are sent +// or the socket returns an error. +// If the time required for this is above the interval, it starts skipping +// intervals, but the calculated values include the entire database, without +// gaps (it remembers the timestamps and continues from where it stopped). +// +// 5. repeats the above forever. +// + +const char *global_backend_prefix = "netdata"; +int global_backend_update_every = 10; +BACKEND_OPTIONS global_backend_options = BACKEND_SOURCE_DATA_AVERAGE | BACKEND_OPTION_SEND_NAMES; +const char *global_backend_source = NULL; + +// ---------------------------------------------------------------------------- +// helper functions for backends + +size_t backend_name_copy(char *d, const char *s, size_t usable) { + size_t n; + + for(n = 0; *s && n < usable ; d++, s++, n++) { + char c = *s; + + if(c != '.' && !isalnum(c)) *d = '_'; + else *d = c; + } + *d = '\0'; + + return n; +} + +// calculate the SUM or AVERAGE of a dimension, for any timeframe +// may return NAN if the database does not have any value in the give timeframe + +calculated_number backend_calculate_value_from_stored_data( + RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap + , time_t *first_timestamp // the first point of the database used in this response + , time_t *last_timestamp // the timestamp that should be reported to backend +) { + RRDHOST *host = st->rrdhost; + (void)host; + + // find the edges of the rrd database for this chart + time_t first_t = rd->state->query_ops.oldest_time(rd); + time_t last_t = rd->state->query_ops.latest_time(rd); + time_t update_every = st->update_every; + struct rrddim_query_handle handle; + storage_number n; + + // step back a little, to make sure we have complete data collection + // for all metrics + after -= update_every * 2; + before -= update_every * 2; + + // align the time-frame + after = after - (after % update_every); + before = before - (before % update_every); + + // for before, loose another iteration + // the latest point will be reported the next time + before -= update_every; + + if(unlikely(after > before)) + // this can happen when update_every > before - after + after = before; + + if(unlikely(after < first_t)) + after = first_t; + + if(unlikely(before > last_t)) + before = last_t; + + if(unlikely(before < first_t || after > last_t)) { + // the chart has not been updated in the wanted timeframe + debug(D_BACKEND, "BACKEND: %s.%s.%s: aligned timeframe %lu to %lu is outside the chart's database range %lu to %lu", + host->hostname, st->id, rd->id, + (unsigned long)after, (unsigned long)before, + (unsigned long)first_t, (unsigned long)last_t + ); + return NAN; + } + + *first_timestamp = after; + *last_timestamp = before; + + size_t counter = 0; + calculated_number sum = 0; + +/* + long start_at_slot = rrdset_time2slot(st, before), + stop_at_slot = rrdset_time2slot(st, after), + slot, stop_now = 0; + + for(slot = start_at_slot; !stop_now ; slot--) { + + if(unlikely(slot < 0)) slot = st->entries - 1; + if(unlikely(slot == stop_at_slot)) stop_now = 1; + + storage_number n = rd->values[slot]; + + if(unlikely(!does_storage_number_exist(n))) { + // not collected + continue; + } + + calculated_number value = unpack_storage_number(n); + sum += value; + + counter++; + } +*/ + for(rd->state->query_ops.init(rd, &handle, after, before) ; !rd->state->query_ops.is_finished(&handle) ; ) { + time_t curr_t; + n = rd->state->query_ops.next_metric(&handle, &curr_t); + + if(unlikely(!does_storage_number_exist(n))) { + // not collected + continue; + } + + calculated_number value = unpack_storage_number(n); + sum += value; + + counter++; + } + rd->state->query_ops.finalize(&handle); + if(unlikely(!counter)) { + debug(D_BACKEND, "BACKEND: %s.%s.%s: no values stored in database for range %lu to %lu", + host->hostname, st->id, rd->id, + (unsigned long)after, (unsigned long)before + ); + return NAN; + } + + if(unlikely(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_SUM)) + return sum; + + return sum / (calculated_number)counter; +} + + +// discard a response received by a backend +// after logging a simple of it to error.log + +int discard_response(BUFFER *b, const char *backend) { + char sample[1024]; + const char *s = buffer_tostring(b); + char *d = sample, *e = &sample[sizeof(sample) - 1]; + + for(; *s && d < e ;s++) { + char c = *s; + if(unlikely(!isprint(c))) c = ' '; + *d++ = c; + } + *d = '\0'; + + info("BACKEND: received %zu bytes from %s backend. Ignoring them. Sample: '%s'", buffer_strlen(b), backend, sample); + buffer_flush(b); + return 0; +} + + +// ---------------------------------------------------------------------------- +// the backend thread + +static SIMPLE_PATTERN *charts_pattern = NULL; +static SIMPLE_PATTERN *hosts_pattern = NULL; + +inline int backends_can_send_rrdset(BACKEND_OPTIONS backend_options, RRDSET *st) { + RRDHOST *host = st->rrdhost; + (void)host; + + if(unlikely(rrdset_flag_check(st, RRDSET_FLAG_BACKEND_IGNORE))) + return 0; + + if(unlikely(!rrdset_flag_check(st, RRDSET_FLAG_BACKEND_SEND))) { + // we have not checked this chart + if(simple_pattern_matches(charts_pattern, st->id) || simple_pattern_matches(charts_pattern, st->name)) + rrdset_flag_set(st, RRDSET_FLAG_BACKEND_SEND); + else { + rrdset_flag_set(st, RRDSET_FLAG_BACKEND_IGNORE); + debug(D_BACKEND, "BACKEND: not sending chart '%s' of host '%s', because it is disabled for backends.", st->id, host->hostname); + return 0; + } + } + + if(unlikely(!rrdset_is_available_for_backends(st))) { + debug(D_BACKEND, "BACKEND: not sending chart '%s' of host '%s', because it is not available for backends.", st->id, host->hostname); + return 0; + } + + if(unlikely(st->rrd_memory_mode == RRD_MEMORY_MODE_NONE && !(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED))) { + debug(D_BACKEND, "BACKEND: not sending chart '%s' of host '%s' because its memory mode is '%s' and the backend requires database access.", st->id, host->hostname, rrd_memory_mode_name(host->rrd_memory_mode)); + return 0; + } + + return 1; +} + +inline BACKEND_OPTIONS backend_parse_data_source(const char *source, BACKEND_OPTIONS backend_options) { + if(!strcmp(source, "raw") || !strcmp(source, "as collected") || !strcmp(source, "as-collected") || !strcmp(source, "as_collected") || !strcmp(source, "ascollected")) { + backend_options |= BACKEND_SOURCE_DATA_AS_COLLECTED; + backend_options &= ~(BACKEND_OPTIONS_SOURCE_BITS ^ BACKEND_SOURCE_DATA_AS_COLLECTED); + } + else if(!strcmp(source, "average")) { + backend_options |= BACKEND_SOURCE_DATA_AVERAGE; + backend_options &= ~(BACKEND_OPTIONS_SOURCE_BITS ^ BACKEND_SOURCE_DATA_AVERAGE); + } + else if(!strcmp(source, "sum") || !strcmp(source, "volume")) { + backend_options |= BACKEND_SOURCE_DATA_SUM; + backend_options &= ~(BACKEND_OPTIONS_SOURCE_BITS ^ BACKEND_SOURCE_DATA_SUM); + } + else { + error("BACKEND: invalid data source method '%s'.", source); + } + + return backend_options; +} + +static void backends_main_cleanup(void *ptr) { + struct netdata_static_thread *static_thread = (struct netdata_static_thread *)ptr; + static_thread->enabled = NETDATA_MAIN_THREAD_EXITING; + + info("cleaning up..."); + + static_thread->enabled = NETDATA_MAIN_THREAD_EXITED; +} + +/** + * Set Kinesis variables + * + * Set the variables necessary to work with this specific backend. + * + * @param default_port the default port of the backend + * @param brc function called to check the result. + * @param brf function called to format the message to the backend + */ +void backend_set_kinesis_variables(int *default_port, + backend_response_checker_t brc, + backend_request_formatter_t brf) +{ + (void)default_port; +#ifndef HAVE_KINESIS + (void)brc; + (void)brf; +#endif + +#if HAVE_KINESIS + *brc = process_json_response; + if (BACKEND_OPTIONS_DATA_SOURCE(global_backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED) + *brf = backends_format_dimension_collected_json_plaintext; + else + *brf = backends_format_dimension_stored_json_plaintext; +#endif +} + +/** + * Set Prometheus variables + * + * Set the variables necessary to work with this specific backend. + * + * @param default_port the default port of the backend + * @param brc function called to check the result. + * @param brf function called to format the message to the backend + */ +void backend_set_prometheus_variables(int *default_port, + backend_response_checker_t brc, + backend_request_formatter_t brf) +{ + (void)default_port; + (void)brf; +#ifndef ENABLE_PROMETHEUS_REMOTE_WRITE + (void)brc; +#endif + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + *brc = backends_process_prometheus_remote_write_response; +#endif /* ENABLE_PROMETHEUS_REMOTE_WRITE */ +} + +/** + * Set MongoDB variables + * + * Set the variables necessary to work with this specific backend. + * + * @param default_port the default port of the backend + * @param brc function called to check the result. + * @param brf function called to format the message to the backend + */ +void backend_set_mongodb_variables(int *default_port, + backend_response_checker_t brc, + backend_request_formatter_t brf) +{ + (void)default_port; +#ifndef HAVE_MONGOC + (void)brc; + (void)brf; +#endif + +#if HAVE_MONGOC + *brc = process_json_response; + if(BACKEND_OPTIONS_DATA_SOURCE(global_backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED) + *brf = backends_format_dimension_collected_json_plaintext; + else + *brf = backends_format_dimension_stored_json_plaintext; +#endif +} + +/** + * Set JSON variables + * + * Set the variables necessary to work with this specific backend. + * + * @param default_port the default port of the backend + * @param brc function called to check the result. + * @param brf function called to format the message to the backend + */ +void backend_set_json_variables(int *default_port, + backend_response_checker_t brc, + backend_request_formatter_t brf) +{ + *default_port = 5448; + *brc = process_json_response; + + if (BACKEND_OPTIONS_DATA_SOURCE(global_backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED) + *brf = backends_format_dimension_collected_json_plaintext; + else + *brf = backends_format_dimension_stored_json_plaintext; +} + +/** + * Set OpenTSDB HTTP variables + * + * Set the variables necessary to work with this specific backend. + * + * @param default_port the default port of the backend + * @param brc function called to check the result. + * @param brf function called to format the message to the backend + */ +void backend_set_opentsdb_http_variables(int *default_port, + backend_response_checker_t brc, + backend_request_formatter_t brf) +{ + *default_port = 4242; + *brc = process_opentsdb_response; + + if(BACKEND_OPTIONS_DATA_SOURCE(global_backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED) + *brf = backends_format_dimension_collected_opentsdb_http; + else + *brf = backends_format_dimension_stored_opentsdb_http; + +} + +/** + * Set OpenTSDB Telnet variables + * + * Set the variables necessary to work with this specific backend. + * + * @param default_port the default port of the backend + * @param brc function called to check the result. + * @param brf function called to format the message to the backend + */ +void backend_set_opentsdb_telnet_variables(int *default_port, + backend_response_checker_t brc, + backend_request_formatter_t brf) +{ + *default_port = 4242; + *brc = process_opentsdb_response; + + if(BACKEND_OPTIONS_DATA_SOURCE(global_backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED) + *brf = backends_format_dimension_collected_opentsdb_telnet; + else + *brf = backends_format_dimension_stored_opentsdb_telnet; +} + +/** + * Set Graphite variables + * + * Set the variables necessary to work with this specific backend. + * + * @param default_port the default port of the backend + * @param brc function called to check the result. + * @param brf function called to format the message to the backend + */ +void backend_set_graphite_variables(int *default_port, + backend_response_checker_t brc, + backend_request_formatter_t brf) +{ + *default_port = 2003; + *brc = process_graphite_response; + + if(BACKEND_OPTIONS_DATA_SOURCE(global_backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED) + *brf = backends_format_dimension_collected_graphite_plaintext; + else + *brf = backends_format_dimension_stored_graphite_plaintext; +} + +/** + * Select Type + * + * Select the backedn type based in the user input + * + * @param type is the string that defines the backend type + * + * @return It returns the backend id. + */ +BACKEND_TYPE backend_select_type(const char *type) { + if(!strcmp(type, "graphite") || !strcmp(type, "graphite:plaintext")) { + return BACKEND_TYPE_GRAPHITE; + } + else if(!strcmp(type, "opentsdb") || !strcmp(type, "opentsdb:telnet")) { + return BACKEND_TYPE_OPENTSDB_USING_TELNET; + } + else if(!strcmp(type, "opentsdb:http") || !strcmp(type, "opentsdb:https")) { + return BACKEND_TYPE_OPENTSDB_USING_HTTP; + } + else if (!strcmp(type, "json") || !strcmp(type, "json:plaintext")) { + return BACKEND_TYPE_JSON; + } + else if (!strcmp(type, "prometheus_remote_write")) { + return BACKEND_TYPE_PROMETHEUS_REMOTE_WRITE; + } + else if (!strcmp(type, "kinesis") || !strcmp(type, "kinesis:plaintext")) { + return BACKEND_TYPE_KINESIS; + } + else if (!strcmp(type, "mongodb") || !strcmp(type, "mongodb:plaintext")) { + return BACKEND_TYPE_MONGODB; + } + + return BACKEND_TYPE_UNKNOWN; +} + +/** + * Backend main + * + * The main thread used to control the backedns. + * + * @param ptr a pointer to netdata_static_structure. + * + * @return It always return NULL. + */ +void *backends_main(void *ptr) { + netdata_thread_cleanup_push(backends_main_cleanup, ptr); + + int default_port = 0; + int sock = -1; + BUFFER *b = buffer_create(1), *response = buffer_create(1); + int (*backend_request_formatter)(BUFFER *, const char *, RRDHOST *, const char *, RRDSET *, RRDDIM *, time_t, time_t, BACKEND_OPTIONS) = NULL; + int (*backend_response_checker)(BUFFER *) = NULL; + +#if HAVE_KINESIS + int do_kinesis = 0; + char *kinesis_auth_key_id = NULL, *kinesis_secure_key = NULL, *kinesis_stream_name = NULL; +#endif + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + int do_prometheus_remote_write = 0; + BUFFER *http_request_header = NULL; +#endif + +#if HAVE_MONGOC + int do_mongodb = 0; + char *mongodb_uri = NULL; + char *mongodb_database = NULL; + char *mongodb_collection = NULL; + + // set the default socket timeout in ms + int32_t mongodb_default_socket_timeout = (int32_t)(global_backend_update_every >= 2)?(global_backend_update_every * MSEC_PER_SEC - 500):1000; + +#endif + +#ifdef ENABLE_HTTPS + struct netdata_ssl opentsdb_ssl = {NULL , NETDATA_SSL_START}; +#endif + + // ------------------------------------------------------------------------ + // collect configuration options + + struct timeval timeout = { + .tv_sec = 0, + .tv_usec = 0 + }; + int enabled = config_get_boolean(CONFIG_SECTION_BACKEND, "enabled", 0); + const char *source = config_get(CONFIG_SECTION_BACKEND, "data source", "average"); + const char *type = config_get(CONFIG_SECTION_BACKEND, "type", "graphite"); + const char *destination = config_get(CONFIG_SECTION_BACKEND, "destination", "localhost"); + global_backend_prefix = config_get(CONFIG_SECTION_BACKEND, "prefix", "netdata"); + const char *hostname = config_get(CONFIG_SECTION_BACKEND, "hostname", localhost->hostname); + global_backend_update_every = (int)config_get_number(CONFIG_SECTION_BACKEND, "update every", global_backend_update_every); + int buffer_on_failures = (int)config_get_number(CONFIG_SECTION_BACKEND, "buffer on failures", 10); + long timeoutms = config_get_number(CONFIG_SECTION_BACKEND, "timeout ms", global_backend_update_every * 2 * 1000); + + if(config_get_boolean(CONFIG_SECTION_BACKEND, "send names instead of ids", (global_backend_options & BACKEND_OPTION_SEND_NAMES))) + global_backend_options |= BACKEND_OPTION_SEND_NAMES; + else + global_backend_options &= ~BACKEND_OPTION_SEND_NAMES; + + charts_pattern = simple_pattern_create(config_get(CONFIG_SECTION_BACKEND, "send charts matching", "*"), NULL, SIMPLE_PATTERN_EXACT); + hosts_pattern = simple_pattern_create(config_get(CONFIG_SECTION_BACKEND, "send hosts matching", "localhost *"), NULL, SIMPLE_PATTERN_EXACT); + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + const char *remote_write_path = config_get(CONFIG_SECTION_BACKEND, "remote write URL path", "/receive"); +#endif + + // ------------------------------------------------------------------------ + // validate configuration options + // and prepare for sending data to our backend + + global_backend_options = backend_parse_data_source(source, global_backend_options); + global_backend_source = source; + + if(timeoutms < 1) { + error("BACKEND: invalid timeout %ld ms given. Assuming %d ms.", timeoutms, global_backend_update_every * 2 * 1000); + timeoutms = global_backend_update_every * 2 * 1000; + } + timeout.tv_sec = (timeoutms * 1000) / 1000000; + timeout.tv_usec = (timeoutms * 1000) % 1000000; + + if(!enabled || global_backend_update_every < 1) + goto cleanup; + + // ------------------------------------------------------------------------ + // select the backend type + BACKEND_TYPE work_type = backend_select_type(type); + if (work_type == BACKEND_TYPE_UNKNOWN) { + error("BACKEND: Unknown backend type '%s'", type); + goto cleanup; + } + + switch (work_type) { + case BACKEND_TYPE_OPENTSDB_USING_HTTP: { +#ifdef ENABLE_HTTPS + if (!strcmp(type, "opentsdb:https")) { + security_start_ssl(NETDATA_SSL_CONTEXT_EXPORTING); + } +#endif + backend_set_opentsdb_http_variables(&default_port,&backend_response_checker,&backend_request_formatter); + break; + } + case BACKEND_TYPE_PROMETHEUS_REMOTE_WRITE: { +#if ENABLE_PROMETHEUS_REMOTE_WRITE + do_prometheus_remote_write = 1; + + http_request_header = buffer_create(1); + backends_init_write_request(); +#else + error("BACKEND: Prometheus remote write support isn't compiled"); +#endif // ENABLE_PROMETHEUS_REMOTE_WRITE + backend_set_prometheus_variables(&default_port,&backend_response_checker,&backend_request_formatter); + break; + } + case BACKEND_TYPE_KINESIS: { +#if HAVE_KINESIS + do_kinesis = 1; + + if(unlikely(read_kinesis_conf(netdata_configured_user_config_dir, &kinesis_auth_key_id, &kinesis_secure_key, &kinesis_stream_name))) { + error("BACKEND: kinesis backend type is set but cannot read its configuration from %s/aws_kinesis.conf", netdata_configured_user_config_dir); + goto cleanup; + } + + backends_kinesis_init(destination, kinesis_auth_key_id, kinesis_secure_key, timeout.tv_sec * 1000 + timeout.tv_usec / 1000); +#else + error("BACKEND: AWS Kinesis support isn't compiled"); +#endif // HAVE_KINESIS + backend_set_kinesis_variables(&default_port,&backend_response_checker,&backend_request_formatter); + break; + } + case BACKEND_TYPE_MONGODB: { +#if HAVE_MONGOC + if(unlikely(read_mongodb_conf(netdata_configured_user_config_dir, + &mongodb_uri, + &mongodb_database, + &mongodb_collection))) { + error("BACKEND: mongodb backend type is set but cannot read its configuration from %s/mongodb.conf", + netdata_configured_user_config_dir); + goto cleanup; + } + + if(likely(!backends_mongodb_init(mongodb_uri, mongodb_database, mongodb_collection, mongodb_default_socket_timeout))) { + backend_set_mongodb_variables(&default_port, &backend_response_checker, &backend_request_formatter); + do_mongodb = 1; + } + else { + error("BACKEND: cannot initialize MongoDB backend"); + goto cleanup; + } +#else + error("BACKEND: MongoDB support isn't compiled"); +#endif // HAVE_MONGOC + break; + } + case BACKEND_TYPE_GRAPHITE: { + backend_set_graphite_variables(&default_port,&backend_response_checker,&backend_request_formatter); + break; + } + case BACKEND_TYPE_OPENTSDB_USING_TELNET: { + backend_set_opentsdb_telnet_variables(&default_port,&backend_response_checker,&backend_request_formatter); + break; + } + case BACKEND_TYPE_JSON: { + backend_set_json_variables(&default_port,&backend_response_checker,&backend_request_formatter); + break; + } + case BACKEND_TYPE_UNKNOWN: { + break; + } + default: { + break; + } + } + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + if((backend_request_formatter == NULL && !do_prometheus_remote_write) || backend_response_checker == NULL) { +#else + if(backend_request_formatter == NULL || backend_response_checker == NULL) { +#endif + error("BACKEND: backend is misconfigured - disabling it."); + goto cleanup; + } + + +// ------------------------------------------------------------------------ +// prepare the charts for monitoring the backend operation + + struct rusage thread; + + collected_number + chart_buffered_metrics = 0, + chart_lost_metrics = 0, + chart_sent_metrics = 0, + chart_buffered_bytes = 0, + chart_received_bytes = 0, + chart_sent_bytes = 0, + chart_receptions = 0, + chart_transmission_successes = 0, + chart_transmission_failures = 0, + chart_data_lost_events = 0, + chart_lost_bytes = 0, + chart_backend_reconnects = 0; + // chart_backend_latency = 0; + + RRDSET *chart_metrics = rrdset_create_localhost("netdata", "backend_metrics", NULL, "backend", NULL, "Netdata Buffered Metrics", "metrics", "backends", NULL, 130600, global_backend_update_every, RRDSET_TYPE_LINE); + rrddim_add(chart_metrics, "buffered", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_metrics, "lost", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_metrics, "sent", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE); + + RRDSET *chart_bytes = rrdset_create_localhost("netdata", "backend_bytes", NULL, "backend", NULL, "Netdata Backend Data Size", "KiB", "backends", NULL, 130610, global_backend_update_every, RRDSET_TYPE_AREA); + rrddim_add(chart_bytes, "buffered", NULL, 1, 1024, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_bytes, "lost", NULL, 1, 1024, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_bytes, "sent", NULL, 1, 1024, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_bytes, "received", NULL, 1, 1024, RRD_ALGORITHM_ABSOLUTE); + + RRDSET *chart_ops = rrdset_create_localhost("netdata", "backend_ops", NULL, "backend", NULL, "Netdata Backend Operations", "operations", "backends", NULL, 130630, global_backend_update_every, RRDSET_TYPE_LINE); + rrddim_add(chart_ops, "write", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_ops, "discard", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_ops, "reconnect", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_ops, "failure", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(chart_ops, "read", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE); + + /* + * this is misleading - we can only measure the time we need to send data + * this time is not related to the time required for the data to travel to + * the backend database and the time that server needed to process them + * + * issue #1432 and https://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-2.html + * + RRDSET *chart_latency = rrdset_create_localhost("netdata", "backend_latency", NULL, "backend", NULL, "Netdata Backend Latency", "ms", "backends", NULL, 130620, global_backend_update_every, RRDSET_TYPE_AREA); + rrddim_add(chart_latency, "latency", NULL, 1, 1000, RRD_ALGORITHM_ABSOLUTE); + */ + + RRDSET *chart_rusage = rrdset_create_localhost("netdata", "backend_thread_cpu", NULL, "backend", NULL, "NetData Backend Thread CPU usage", "milliseconds/s", "backends", NULL, 130630, global_backend_update_every, RRDSET_TYPE_STACKED); + rrddim_add(chart_rusage, "user", NULL, 1, 1000, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(chart_rusage, "system", NULL, 1, 1000, RRD_ALGORITHM_INCREMENTAL); + + + // ------------------------------------------------------------------------ + // prepare the backend main loop + + info("BACKEND: configured ('%s' on '%s' sending '%s' data, every %d seconds, as host '%s', with prefix '%s')", type, destination, source, global_backend_update_every, hostname, global_backend_prefix); + send_statistics("BACKEND_START", "OK", type); + + usec_t step_ut = global_backend_update_every * USEC_PER_SEC; + time_t after = now_realtime_sec(); + int failures = 0; + heartbeat_t hb; + heartbeat_init(&hb); + + while(!netdata_exit) { + + // ------------------------------------------------------------------------ + // Wait for the next iteration point. + + heartbeat_next(&hb, step_ut); + time_t before = now_realtime_sec(); + debug(D_BACKEND, "BACKEND: preparing buffer for timeframe %lu to %lu", (unsigned long)after, (unsigned long)before); + + // ------------------------------------------------------------------------ + // add to the buffer the data we need to send to the backend + + netdata_thread_disable_cancelability(); + + size_t count_hosts = 0; + size_t count_charts_total = 0; + size_t count_dims_total = 0; + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + if(do_prometheus_remote_write) + backends_clear_write_request(); +#endif + rrd_rdlock(); + RRDHOST *host; + rrdhost_foreach_read(host) { + if(unlikely(!rrdhost_flag_check(host, RRDHOST_FLAG_BACKEND_SEND|RRDHOST_FLAG_BACKEND_DONT_SEND))) { + char *name = (host == localhost)?"localhost":host->hostname; + if (!hosts_pattern || simple_pattern_matches(hosts_pattern, name)) { + rrdhost_flag_set(host, RRDHOST_FLAG_BACKEND_SEND); + info("enabled backend for host '%s'", name); + } + else { + rrdhost_flag_set(host, RRDHOST_FLAG_BACKEND_DONT_SEND); + info("disabled backend for host '%s'", name); + } + } + + if(unlikely(!rrdhost_flag_check(host, RRDHOST_FLAG_BACKEND_SEND))) + continue; + + rrdhost_rdlock(host); + + count_hosts++; + size_t count_charts = 0; + size_t count_dims = 0; + size_t count_dims_skipped = 0; + + const char *__hostname = (host == localhost)?hostname:host->hostname; + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + if(do_prometheus_remote_write) { + backends_rrd_stats_remote_write_allmetrics_prometheus( + host + , __hostname + , global_backend_prefix + , global_backend_options + , after + , before + , &count_charts + , &count_dims + , &count_dims_skipped + ); + chart_buffered_metrics += count_dims; + } + else +#endif + { + RRDSET *st; + rrdset_foreach_read(st, host) { + if(likely(backends_can_send_rrdset(global_backend_options, st))) { + rrdset_rdlock(st); + + count_charts++; + + RRDDIM *rd; + rrddim_foreach_read(rd, st) { + if (likely(rd->last_collected_time.tv_sec >= after)) { + chart_buffered_metrics += backend_request_formatter(b, global_backend_prefix, host, __hostname, st, rd, after, before, global_backend_options); + count_dims++; + } + else { + debug(D_BACKEND, "BACKEND: not sending dimension '%s' of chart '%s' from host '%s', its last data collection (%lu) is not within our timeframe (%lu to %lu)", rd->id, st->id, __hostname, (unsigned long)rd->last_collected_time.tv_sec, (unsigned long)after, (unsigned long)before); + count_dims_skipped++; + } + } + + rrdset_unlock(st); + } + } + } + + debug(D_BACKEND, "BACKEND: sending host '%s', metrics of %zu dimensions, of %zu charts. Skipped %zu dimensions.", __hostname, count_dims, count_charts, count_dims_skipped); + count_charts_total += count_charts; + count_dims_total += count_dims; + + rrdhost_unlock(host); + } + rrd_unlock(); + + netdata_thread_enable_cancelability(); + + debug(D_BACKEND, "BACKEND: buffer has %zu bytes, added metrics for %zu dimensions, of %zu charts, from %zu hosts", buffer_strlen(b), count_dims_total, count_charts_total, count_hosts); + + // ------------------------------------------------------------------------ + + chart_buffered_bytes = (collected_number)buffer_strlen(b); + + // reset the monitoring chart counters + chart_received_bytes = + chart_sent_bytes = + chart_sent_metrics = + chart_lost_metrics = + chart_receptions = + chart_transmission_successes = + chart_transmission_failures = + chart_data_lost_events = + chart_lost_bytes = + chart_backend_reconnects = 0; + // chart_backend_latency = 0; + + if(unlikely(netdata_exit)) break; + + //fprintf(stderr, "\nBACKEND BEGIN:\n%s\nBACKEND END\n", buffer_tostring(b)); + //fprintf(stderr, "after = %lu, before = %lu\n", after, before); + + // prepare for the next iteration + // to add incrementally data to buffer + after = before; + +#if HAVE_KINESIS + if(do_kinesis) { + unsigned long long partition_key_seq = 0; + + size_t buffer_len = buffer_strlen(b); + size_t sent = 0; + + while(sent < buffer_len) { + char partition_key[KINESIS_PARTITION_KEY_MAX + 1]; + snprintf(partition_key, KINESIS_PARTITION_KEY_MAX, "netdata_%llu", partition_key_seq++); + size_t partition_key_len = strnlen(partition_key, KINESIS_PARTITION_KEY_MAX); + + const char *first_char = buffer_tostring(b) + sent; + + size_t record_len = 0; + + // split buffer into chunks of maximum allowed size + if(buffer_len - sent < KINESIS_RECORD_MAX - partition_key_len) { + record_len = buffer_len - sent; + } + else { + record_len = KINESIS_RECORD_MAX - partition_key_len; + while(*(first_char + record_len) != '\n' && record_len) record_len--; + } + + char error_message[ERROR_LINE_MAX + 1] = ""; + + debug(D_BACKEND, "BACKEND: backends_kinesis_put_record(): dest = %s, id = %s, key = %s, stream = %s, partition_key = %s, \ + buffer = %zu, record = %zu", destination, kinesis_auth_key_id, kinesis_secure_key, kinesis_stream_name, + partition_key, buffer_len, record_len); + + backends_kinesis_put_record(kinesis_stream_name, partition_key, first_char, record_len); + + sent += record_len; + chart_transmission_successes++; + + size_t sent_bytes = 0, lost_bytes = 0; + + if(unlikely(backends_kinesis_get_result(error_message, &sent_bytes, &lost_bytes))) { + // oops! we couldn't send (all or some of the) data + error("BACKEND: %s", error_message); + error("BACKEND: failed to write data to database backend '%s'. Willing to write %zu bytes, wrote %zu bytes.", + destination, sent_bytes, sent_bytes - lost_bytes); + + chart_transmission_failures++; + chart_data_lost_events++; + chart_lost_bytes += lost_bytes; + + // estimate the number of lost metrics + chart_lost_metrics += (collected_number)(chart_buffered_metrics + * (buffer_len && (lost_bytes > buffer_len) ? (double)lost_bytes / buffer_len : 1)); + + break; + } + else { + chart_receptions++; + } + + if(unlikely(netdata_exit)) break; + } + + chart_sent_bytes += sent; + if(likely(sent == buffer_len)) + chart_sent_metrics = chart_buffered_metrics; + + buffer_flush(b); + } else +#endif /* HAVE_KINESIS */ + +#if HAVE_MONGOC + if(do_mongodb) { + size_t buffer_len = buffer_strlen(b); + size_t sent = 0; + + while(sent < buffer_len) { + const char *first_char = buffer_tostring(b); + + debug(D_BACKEND, "BACKEND: backends_mongodb_insert(): uri = %s, database = %s, collection = %s, \ + buffer = %zu", mongodb_uri, mongodb_database, mongodb_collection, buffer_len); + + if(likely(!backends_mongodb_insert((char *)first_char, (size_t)chart_buffered_metrics))) { + sent += buffer_len; + chart_transmission_successes++; + chart_receptions++; + } + else { + // oops! we couldn't send (all or some of the) data + error("BACKEND: failed to write data to database backend '%s'. Willing to write %zu bytes, wrote %zu bytes.", + mongodb_uri, buffer_len, 0UL); + + chart_transmission_failures++; + chart_data_lost_events++; + chart_lost_bytes += buffer_len; + + // estimate the number of lost metrics + chart_lost_metrics += (collected_number)chart_buffered_metrics; + + break; + } + + if(unlikely(netdata_exit)) break; + } + + chart_sent_bytes += sent; + if(likely(sent == buffer_len)) + chart_sent_metrics = chart_buffered_metrics; + + buffer_flush(b); + } else +#endif /* HAVE_MONGOC */ + + { + + // ------------------------------------------------------------------------ + // if we are connected, receive a response, without blocking + + if(likely(sock != -1)) { + errno = 0; + + // loop through to collect all data + while(sock != -1 && errno != EWOULDBLOCK) { + buffer_need_bytes(response, 4096); + + ssize_t r; +#ifdef ENABLE_HTTPS + if(opentsdb_ssl.conn && !opentsdb_ssl.flags) { + r = SSL_read(opentsdb_ssl.conn, &response->buffer[response->len], response->size - response->len); + } else { + r = recv(sock, &response->buffer[response->len], response->size - response->len, MSG_DONTWAIT); + } +#else + r = recv(sock, &response->buffer[response->len], response->size - response->len, MSG_DONTWAIT); +#endif + if(likely(r > 0)) { + // we received some data + response->len += r; + chart_received_bytes += r; + chart_receptions++; + } + else if(r == 0) { + error("BACKEND: '%s' closed the socket", destination); + close(sock); + sock = -1; + } + else { + // failed to receive data + if(errno != EAGAIN && errno != EWOULDBLOCK) { + error("BACKEND: cannot receive data from backend '%s'.", destination); + } + } + } + + // if we received data, process them + if(buffer_strlen(response)) + backend_response_checker(response); + } + + // ------------------------------------------------------------------------ + // if we are not connected, connect to a backend server + + if(unlikely(sock == -1)) { + // usec_t start_ut = now_monotonic_usec(); + size_t reconnects = 0; + + sock = connect_to_one_of(destination, default_port, &timeout, &reconnects, NULL, 0); +#ifdef ENABLE_HTTPS + if(sock != -1) { + if(netdata_exporting_ctx) { + if(!opentsdb_ssl.conn) { + opentsdb_ssl.conn = SSL_new(netdata_exporting_ctx); + if(!opentsdb_ssl.conn) { + error("Failed to allocate SSL structure %d.", sock); + opentsdb_ssl.flags = NETDATA_SSL_NO_HANDSHAKE; + } + } else { + SSL_clear(opentsdb_ssl.conn); + } + } + + if(opentsdb_ssl.conn) { + if(SSL_set_fd(opentsdb_ssl.conn, sock) != 1) { + error("Failed to set the socket to the SSL on socket fd %d.", host->rrdpush_sender_socket); + opentsdb_ssl.flags = NETDATA_SSL_NO_HANDSHAKE; + } else { + opentsdb_ssl.flags = NETDATA_SSL_HANDSHAKE_COMPLETE; + SSL_set_connect_state(opentsdb_ssl.conn); + int err = SSL_connect(opentsdb_ssl.conn); + if (err != 1) { + err = SSL_get_error(opentsdb_ssl.conn, err); + error("SSL cannot connect with the server: %s ", ERR_error_string((long)SSL_get_error(opentsdb_ssl.conn, err), NULL)); + opentsdb_ssl.flags = NETDATA_SSL_NO_HANDSHAKE; + } //TODO: check certificate here + } + } + } +#endif + chart_backend_reconnects += reconnects; + // chart_backend_latency += now_monotonic_usec() - start_ut; + } + + if(unlikely(netdata_exit)) break; + + // ------------------------------------------------------------------------ + // if we are connected, send our buffer to the backend server + + if(likely(sock != -1)) { + size_t len = buffer_strlen(b); + // usec_t start_ut = now_monotonic_usec(); + int flags = 0; + #ifdef MSG_NOSIGNAL + flags += MSG_NOSIGNAL; + #endif + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + if(do_prometheus_remote_write) { + size_t data_size = backends_get_write_request_size(); + + if(unlikely(!data_size)) { + error("BACKEND: write request size is out of range"); + continue; + } + + buffer_flush(b); + buffer_need_bytes(b, data_size); + if(unlikely(backends_pack_write_request(b->buffer, &data_size))) { + error("BACKEND: cannot pack write request"); + continue; + } + b->len = data_size; + chart_buffered_bytes = (collected_number)buffer_strlen(b); + + buffer_flush(http_request_header); + buffer_sprintf(http_request_header, + "POST %s HTTP/1.1\r\n" + "Host: %s\r\n" + "Accept: */*\r\n" + "Content-Length: %zu\r\n" + "Content-Type: application/x-www-form-urlencoded\r\n\r\n", + remote_write_path, + destination, + data_size + ); + + len = buffer_strlen(http_request_header); + send(sock, buffer_tostring(http_request_header), len, flags); + + len = data_size; + } +#endif + + ssize_t written; +#ifdef ENABLE_HTTPS + if(opentsdb_ssl.conn && !opentsdb_ssl.flags) { + written = SSL_write(opentsdb_ssl.conn, buffer_tostring(b), len); + } else { + written = send(sock, buffer_tostring(b), len, flags); + } +#else + written = send(sock, buffer_tostring(b), len, flags); +#endif + + // chart_backend_latency += now_monotonic_usec() - start_ut; + if(written != -1 && (size_t)written == len) { + // we sent the data successfully + chart_transmission_successes++; + chart_sent_bytes += written; + chart_sent_metrics = chart_buffered_metrics; + + // reset the failures count + failures = 0; + + // empty the buffer + buffer_flush(b); + } + else { + // oops! we couldn't send (all or some of the) data + error("BACKEND: failed to write data to database backend '%s'. Willing to write %zu bytes, wrote %zd bytes. Will re-connect.", destination, len, written); + chart_transmission_failures++; + + if(written != -1) + chart_sent_bytes += written; + + // increment the counter we check for data loss + failures++; + + // close the socket - we will re-open it next time + close(sock); + sock = -1; + } + } + else { + error("BACKEND: failed to update database backend '%s'", destination); + chart_transmission_failures++; + + // increment the counter we check for data loss + failures++; + } + } + + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + if(do_prometheus_remote_write && failures) { + (void) buffer_on_failures; + failures = 0; + chart_lost_bytes = chart_buffered_bytes = backends_get_write_request_size(); // estimated write request size + chart_data_lost_events++; + chart_lost_metrics = chart_buffered_metrics; + } else +#endif + if(failures > buffer_on_failures) { + // too bad! we are going to lose data + chart_lost_bytes += buffer_strlen(b); + error("BACKEND: reached %d backend failures. Flushing buffers to protect this host - this results in data loss on back-end server '%s'", failures, destination); + buffer_flush(b); + failures = 0; + chart_data_lost_events++; + chart_lost_metrics = chart_buffered_metrics; + } + + if(unlikely(netdata_exit)) break; + + // ------------------------------------------------------------------------ + // update the monitoring charts + + if(likely(chart_ops->counter_done)) rrdset_next(chart_ops); + rrddim_set(chart_ops, "read", chart_receptions); + rrddim_set(chart_ops, "write", chart_transmission_successes); + rrddim_set(chart_ops, "discard", chart_data_lost_events); + rrddim_set(chart_ops, "failure", chart_transmission_failures); + rrddim_set(chart_ops, "reconnect", chart_backend_reconnects); + rrdset_done(chart_ops); + + if(likely(chart_metrics->counter_done)) rrdset_next(chart_metrics); + rrddim_set(chart_metrics, "buffered", chart_buffered_metrics); + rrddim_set(chart_metrics, "lost", chart_lost_metrics); + rrddim_set(chart_metrics, "sent", chart_sent_metrics); + rrdset_done(chart_metrics); + + if(likely(chart_bytes->counter_done)) rrdset_next(chart_bytes); + rrddim_set(chart_bytes, "buffered", chart_buffered_bytes); + rrddim_set(chart_bytes, "lost", chart_lost_bytes); + rrddim_set(chart_bytes, "sent", chart_sent_bytes); + rrddim_set(chart_bytes, "received", chart_received_bytes); + rrdset_done(chart_bytes); + + /* + if(likely(chart_latency->counter_done)) rrdset_next(chart_latency); + rrddim_set(chart_latency, "latency", chart_backend_latency); + rrdset_done(chart_latency); + */ + + getrusage(RUSAGE_THREAD, &thread); + if(likely(chart_rusage->counter_done)) rrdset_next(chart_rusage); + rrddim_set(chart_rusage, "user", thread.ru_utime.tv_sec * 1000000ULL + thread.ru_utime.tv_usec); + rrddim_set(chart_rusage, "system", thread.ru_stime.tv_sec * 1000000ULL + thread.ru_stime.tv_usec); + rrdset_done(chart_rusage); + + if(likely(buffer_strlen(b) == 0)) + chart_buffered_metrics = 0; + + if(unlikely(netdata_exit)) break; + } + +cleanup: +#if HAVE_KINESIS + if(do_kinesis) { + backends_kinesis_shutdown(); + freez(kinesis_auth_key_id); + freez(kinesis_secure_key); + freez(kinesis_stream_name); + } +#endif + +#if ENABLE_PROMETHEUS_REMOTE_WRITE + buffer_free(http_request_header); + if(do_prometheus_remote_write) + backends_protocol_buffers_shutdown(); +#endif + +#if HAVE_MONGOC + if(do_mongodb) { + backends_mongodb_cleanup(); + freez(mongodb_uri); + freez(mongodb_database); + freez(mongodb_collection); + } +#endif + + if(sock != -1) + close(sock); + + buffer_free(b); + buffer_free(response); + +#ifdef ENABLE_HTTPS + if(netdata_exporting_ctx) { + if(opentsdb_ssl.conn) { + SSL_free(opentsdb_ssl.conn); + } + } +#endif + + netdata_thread_cleanup_pop(1); + return NULL; +} diff --git a/backends/backends.h b/backends/backends.h new file mode 100644 index 0000000..2f4efd9 --- /dev/null +++ b/backends/backends.h @@ -0,0 +1,97 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_BACKENDS_H +#define NETDATA_BACKENDS_H 1 + +#include "daemon/common.h" + +typedef enum backend_options { + BACKEND_OPTION_NONE = 0, + + BACKEND_SOURCE_DATA_AS_COLLECTED = (1 << 0), + BACKEND_SOURCE_DATA_AVERAGE = (1 << 1), + BACKEND_SOURCE_DATA_SUM = (1 << 2), + + BACKEND_OPTION_SEND_NAMES = (1 << 16) +} BACKEND_OPTIONS; + +typedef enum backend_types { + BACKEND_TYPE_UNKNOWN, // Invalid type + BACKEND_TYPE_GRAPHITE, // Send plain text to Graphite + BACKEND_TYPE_OPENTSDB_USING_TELNET, // Send data to OpenTSDB using telnet API + BACKEND_TYPE_OPENTSDB_USING_HTTP, // Send data to OpenTSDB using HTTP API + BACKEND_TYPE_JSON, // Stores the data using JSON. + BACKEND_TYPE_PROMETHEUS_REMOTE_WRITE, // The user selected to use Prometheus backend + BACKEND_TYPE_KINESIS, // Send message to AWS Kinesis + BACKEND_TYPE_MONGODB, // Send data to MongoDB collection + BACKEND_TYPE_NUM // Number of backend types +} BACKEND_TYPE; + +typedef int (**backend_response_checker_t)(BUFFER *); +typedef int (**backend_request_formatter_t)(BUFFER *, const char *, RRDHOST *, const char *, RRDSET *, RRDDIM *, time_t, time_t, BACKEND_OPTIONS); + +#define BACKEND_OPTIONS_SOURCE_BITS (BACKEND_SOURCE_DATA_AS_COLLECTED|BACKEND_SOURCE_DATA_AVERAGE|BACKEND_SOURCE_DATA_SUM) +#define BACKEND_OPTIONS_DATA_SOURCE(backend_options) (backend_options & BACKEND_OPTIONS_SOURCE_BITS) + +extern int global_backend_update_every; +extern BACKEND_OPTIONS global_backend_options; +extern const char *global_backend_source; +extern const char *global_backend_prefix; + +extern void *backends_main(void *ptr); +BACKEND_TYPE backend_select_type(const char *type); + +extern BACKEND_OPTIONS backend_parse_data_source(const char *source, BACKEND_OPTIONS backend_options); + +#ifdef BACKENDS_INTERNALS + +extern int backends_can_send_rrdset(BACKEND_OPTIONS backend_options, RRDSET *st); +extern calculated_number backend_calculate_value_from_stored_data( + RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap + , time_t *first_timestamp // the timestamp of the first point used in this response + , time_t *last_timestamp // the timestamp that should be reported to backend +); + +extern size_t backend_name_copy(char *d, const char *s, size_t usable); +extern int discard_response(BUFFER *b, const char *backend); + +static inline char *strip_quotes(char *str) { + if(*str == '"' || *str == '\'') { + char *s; + + str++; + + s = str; + while(*s) s++; + if(s != str) s--; + + if(*s == '"' || *s == '\'') *s = '\0'; + } + + return str; +} + +#endif // BACKENDS_INTERNALS + +#include "backends/prometheus/backend_prometheus.h" +#include "backends/graphite/graphite.h" +#include "backends/json/json.h" +#include "backends/opentsdb/opentsdb.h" + +#if HAVE_KINESIS +#include "backends/aws_kinesis/aws_kinesis.h" +#endif + +#if ENABLE_PROMETHEUS_REMOTE_WRITE +#include "backends/prometheus/remote_write/remote_write.h" +#endif + +#if HAVE_MONGOC +#include "backends/mongodb/mongodb.h" +#endif + +#endif /* NETDATA_BACKENDS_H */ diff --git a/backends/graphite/Makefile.am b/backends/graphite/Makefile.am new file mode 100644 index 0000000..babdcf0 --- /dev/null +++ b/backends/graphite/Makefile.am @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in diff --git a/backends/graphite/graphite.c b/backends/graphite/graphite.c new file mode 100644 index 0000000..f75a93a --- /dev/null +++ b/backends/graphite/graphite.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#define BACKENDS_INTERNALS +#include "graphite.h" + +// ---------------------------------------------------------------------------- +// graphite backend + +int backends_format_dimension_collected_graphite_plaintext( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +) { + (void)host; + (void)after; + (void)before; + + char chart_name[RRD_ID_LENGTH_MAX + 1]; + char dimension_name[RRD_ID_LENGTH_MAX + 1]; + backend_name_copy(chart_name, (backend_options & BACKEND_OPTION_SEND_NAMES && st->name)?st->name:st->id, RRD_ID_LENGTH_MAX); + backend_name_copy(dimension_name, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name)?rd->name:rd->id, RRD_ID_LENGTH_MAX); + + buffer_sprintf( + b + , "%s.%s.%s.%s%s%s " COLLECTED_NUMBER_FORMAT " %llu\n" + , prefix + , hostname + , chart_name + , dimension_name + , (host->tags)?";":"" + , (host->tags)?host->tags:"" + , rd->last_collected_value + , (unsigned long long)rd->last_collected_time.tv_sec + ); + + return 1; +} + +int backends_format_dimension_stored_graphite_plaintext( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +) { + (void)host; + + char chart_name[RRD_ID_LENGTH_MAX + 1]; + char dimension_name[RRD_ID_LENGTH_MAX + 1]; + backend_name_copy(chart_name, (backend_options & BACKEND_OPTION_SEND_NAMES && st->name)?st->name:st->id, RRD_ID_LENGTH_MAX); + backend_name_copy(dimension_name, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name)?rd->name:rd->id, RRD_ID_LENGTH_MAX); + + time_t first_t = after, last_t = before; + calculated_number value = backend_calculate_value_from_stored_data(st, rd, after, before, backend_options, &first_t, &last_t); + + if(!isnan(value)) { + + buffer_sprintf( + b + , "%s.%s.%s.%s%s%s " CALCULATED_NUMBER_FORMAT " %llu\n" + , prefix + , hostname + , chart_name + , dimension_name + , (host->tags)?";":"" + , (host->tags)?host->tags:"" + , value + , (unsigned long long) last_t + ); + + return 1; + } + return 0; +} + +int process_graphite_response(BUFFER *b) { + return discard_response(b, "graphite"); +} + + diff --git a/backends/graphite/graphite.h b/backends/graphite/graphite.h new file mode 100644 index 0000000..498a7fc --- /dev/null +++ b/backends/graphite/graphite.h @@ -0,0 +1,35 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + + +#ifndef NETDATA_BACKEND_GRAPHITE_H +#define NETDATA_BACKEND_GRAPHITE_H + +#include "backends/backends.h" + +extern int backends_format_dimension_collected_graphite_plaintext( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +); + +extern int backends_format_dimension_stored_graphite_plaintext( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +); + +extern int process_graphite_response(BUFFER *b); + +#endif //NETDATA_BACKEND_GRAPHITE_H diff --git a/backends/json/Makefile.am b/backends/json/Makefile.am new file mode 100644 index 0000000..babdcf0 --- /dev/null +++ b/backends/json/Makefile.am @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in diff --git a/backends/json/json.c b/backends/json/json.c new file mode 100644 index 0000000..0c7cc73 --- /dev/null +++ b/backends/json/json.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#define BACKENDS_INTERNALS +#include "json.h" + +// ---------------------------------------------------------------------------- +// json backend + +int backends_format_dimension_collected_json_plaintext( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +) { + (void)host; + (void)after; + (void)before; + (void)backend_options; + + const char *tags_pre = "", *tags_post = "", *tags = host->tags; + if(!tags) tags = ""; + + if(*tags) { + if(*tags == '{' || *tags == '[' || *tags == '"') { + tags_pre = "\"host_tags\":"; + tags_post = ","; + } + else { + tags_pre = "\"host_tags\":\""; + tags_post = "\","; + } + } + + buffer_sprintf(b, "{" + "\"prefix\":\"%s\"," + "\"hostname\":\"%s\"," + "%s%s%s" + + "\"chart_id\":\"%s\"," + "\"chart_name\":\"%s\"," + "\"chart_family\":\"%s\"," + "\"chart_context\": \"%s\"," + "\"chart_type\":\"%s\"," + "\"units\": \"%s\"," + + "\"id\":\"%s\"," + "\"name\":\"%s\"," + "\"value\":" COLLECTED_NUMBER_FORMAT "," + + "\"timestamp\": %llu}\n", + prefix, + hostname, + tags_pre, tags, tags_post, + + st->id, + st->name, + st->family, + st->context, + st->type, + st->units, + + rd->id, + rd->name, + rd->last_collected_value, + + (unsigned long long) rd->last_collected_time.tv_sec + ); + + return 1; +} + +int backends_format_dimension_stored_json_plaintext( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +) { + (void)host; + + time_t first_t = after, last_t = before; + calculated_number value = backend_calculate_value_from_stored_data(st, rd, after, before, backend_options, &first_t, &last_t); + + if(!isnan(value)) { + const char *tags_pre = "", *tags_post = "", *tags = host->tags; + if(!tags) tags = ""; + + if(*tags) { + if(*tags == '{' || *tags == '[' || *tags == '"') { + tags_pre = "\"host_tags\":"; + tags_post = ","; + } + else { + tags_pre = "\"host_tags\":\""; + tags_post = "\","; + } + } + + buffer_sprintf(b, "{" + "\"prefix\":\"%s\"," + "\"hostname\":\"%s\"," + "%s%s%s" + + "\"chart_id\":\"%s\"," + "\"chart_name\":\"%s\"," + "\"chart_family\":\"%s\"," + "\"chart_context\": \"%s\"," + "\"chart_type\":\"%s\"," + "\"units\": \"%s\"," + + "\"id\":\"%s\"," + "\"name\":\"%s\"," + "\"value\":" CALCULATED_NUMBER_FORMAT "," + + "\"timestamp\": %llu}\n", + prefix, + hostname, + tags_pre, tags, tags_post, + + st->id, + st->name, + st->family, + st->context, + st->type, + st->units, + + rd->id, + rd->name, + value, + + (unsigned long long) last_t + ); + + return 1; + } + return 0; +} + +int process_json_response(BUFFER *b) { + return discard_response(b, "json"); +} + + diff --git a/backends/json/json.h b/backends/json/json.h new file mode 100644 index 0000000..78ac376 --- /dev/null +++ b/backends/json/json.h @@ -0,0 +1,34 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_BACKEND_JSON_H +#define NETDATA_BACKEND_JSON_H + +#include "backends/backends.h" + +extern int backends_format_dimension_collected_json_plaintext( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +); + +extern int backends_format_dimension_stored_json_plaintext( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +); + +extern int process_json_response(BUFFER *b); + +#endif //NETDATA_BACKEND_JSON_H diff --git a/backends/mongodb/Makefile.am b/backends/mongodb/Makefile.am new file mode 100644 index 0000000..161784b --- /dev/null +++ b/backends/mongodb/Makefile.am @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in + +dist_noinst_DATA = \ + README.md \ + $(NULL) diff --git a/backends/mongodb/README.md b/backends/mongodb/README.md new file mode 100644 index 0000000..7c7996e --- /dev/null +++ b/backends/mongodb/README.md @@ -0,0 +1,41 @@ +<!-- +title: "MongoDB backend" +custom_edit_url: https://github.com/netdata/netdata/edit/master/backends/mongodb/README.md +--> + +# MongoDB backend + +## Prerequisites + +To use MongoDB as a backend, `libmongoc` 1.7.0 or higher should be +[installed](http://mongoc.org/libmongoc/current/installing.html) first. Next, Netdata should be re-installed from the +source. The installer will detect that the required libraries are now available. + +## Configuration + +To enable data sending to the MongoDB backend set the following options in `netdata.conf`: + +```conf +[backend] + enabled = yes + type = mongodb +``` + +In the Netdata configuration directory run `./edit-config mongodb.conf` and set [MongoDB +URI](https://docs.mongodb.com/manual/reference/connection-string/), database name, and collection name: + +```yaml +# URI +uri = mongodb://<hostname> + +# database name +database = your_database_name + +# collection name +collection = your_collection_name +``` + +The default socket timeout depends on the backend update interval. The timeout is 500 ms shorter than the interval (but +not less than 1000 ms). You can alter the timeout using the `sockettimeoutms` MongoDB URI option. + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2Fmongodb%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/mongodb/mongodb.c b/backends/mongodb/mongodb.c new file mode 100644 index 0000000..d0527a7 --- /dev/null +++ b/backends/mongodb/mongodb.c @@ -0,0 +1,189 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#define BACKENDS_INTERNALS +#include "mongodb.h" +#include <mongoc.h> + +#define CONFIG_FILE_LINE_MAX ((CONFIG_MAX_NAME + CONFIG_MAX_VALUE + 1024) * 2) + +static mongoc_client_t *mongodb_client; +static mongoc_collection_t *mongodb_collection; + +int backends_mongodb_init(const char *uri_string, + const char *database_string, + const char *collection_string, + int32_t default_socket_timeout) { + mongoc_uri_t *uri; + bson_error_t error; + + mongoc_init(); + + uri = mongoc_uri_new_with_error(uri_string, &error); + if(unlikely(!uri)) { + error("BACKEND: failed to parse URI: %s. Error message: %s", uri_string, error.message); + return 1; + } + + int32_t socket_timeout = mongoc_uri_get_option_as_int32(uri, MONGOC_URI_SOCKETTIMEOUTMS, default_socket_timeout); + if(!mongoc_uri_set_option_as_int32(uri, MONGOC_URI_SOCKETTIMEOUTMS, socket_timeout)) { + error("BACKEND: failed to set %s to the value %d", MONGOC_URI_SOCKETTIMEOUTMS, socket_timeout); + return 1; + }; + + mongodb_client = mongoc_client_new_from_uri(uri); + if(unlikely(!mongodb_client)) { + error("BACKEND: failed to create a new client"); + return 1; + } + + if(!mongoc_client_set_appname(mongodb_client, "netdata")) { + error("BACKEND: failed to set client appname"); + }; + + mongodb_collection = mongoc_client_get_collection(mongodb_client, database_string, collection_string); + + mongoc_uri_destroy(uri); + + return 0; +} + +void backends_free_bson(bson_t **insert, size_t n_documents) { + size_t i; + + for(i = 0; i < n_documents; i++) + bson_destroy(insert[i]); + + free(insert); +} + +int backends_mongodb_insert(char *data, size_t n_metrics) { + bson_t **insert = calloc(n_metrics, sizeof(bson_t *)); + bson_error_t error; + char *start = data, *end = data; + size_t n_documents = 0; + + while(*end && n_documents <= n_metrics) { + while(*end && *end != '\n') end++; + + if(likely(*end)) { + *end = '\0'; + end++; + } + else { + break; + } + + insert[n_documents] = bson_new_from_json((const uint8_t *)start, -1, &error); + + if(unlikely(!insert[n_documents])) { + error("BACKEND: %s", error.message); + backends_free_bson(insert, n_documents); + return 1; + } + + start = end; + + n_documents++; + } + + if(unlikely(!mongoc_collection_insert_many(mongodb_collection, (const bson_t **)insert, n_documents, NULL, NULL, &error))) { + error("BACKEND: %s", error.message); + backends_free_bson(insert, n_documents); + return 1; + } + + backends_free_bson(insert, n_documents); + + return 0; +} + +void backends_mongodb_cleanup() { + mongoc_collection_destroy(mongodb_collection); + mongoc_client_destroy(mongodb_client); + mongoc_cleanup(); + + return; +} + +int read_mongodb_conf(const char *path, char **uri_p, char **database_p, char **collection_p) { + char *uri = *uri_p; + char *database = *database_p; + char *collection = *collection_p; + + if(unlikely(uri)) freez(uri); + if(unlikely(database)) freez(database); + if(unlikely(collection)) freez(collection); + uri = NULL; + database = NULL; + collection = NULL; + + int line = 0; + + char filename[FILENAME_MAX + 1]; + snprintfz(filename, FILENAME_MAX, "%s/mongodb.conf", path); + + char buffer[CONFIG_FILE_LINE_MAX + 1], *s; + + debug(D_BACKEND, "BACKEND: opening config file '%s'", filename); + + FILE *fp = fopen(filename, "r"); + if(!fp) { + return 1; + } + + while(fgets(buffer, CONFIG_FILE_LINE_MAX, fp) != NULL) { + buffer[CONFIG_FILE_LINE_MAX] = '\0'; + line++; + + s = trim(buffer); + if(!s || *s == '#') { + debug(D_BACKEND, "BACKEND: ignoring line %d of file '%s', it is empty.", line, filename); + continue; + } + + char *name = s; + char *value = strchr(s, '='); + if(unlikely(!value)) { + error("BACKEND: ignoring line %d ('%s') of file '%s', there is no = in it.", line, s, filename); + continue; + } + *value = '\0'; + value++; + + name = trim(name); + value = trim(value); + + if(unlikely(!name || *name == '#')) { + error("BACKEND: ignoring line %d of file '%s', name is empty.", line, filename); + continue; + } + + if(!value) + value = ""; + else + value = strip_quotes(value); + + if(name[0] == 'u' && !strcmp(name, "uri")) { + uri = strdupz(value); + } + else if(name[0] == 'd' && !strcmp(name, "database")) { + database = strdupz(value); + } + else if(name[0] == 'c' && !strcmp(name, "collection")) { + collection = strdupz(value); + } + } + + fclose(fp); + + if(unlikely(!collection || !*collection)) { + error("BACKEND: collection name is a mandatory MongoDB parameter, but it is not configured"); + return 1; + } + + *uri_p = uri; + *database_p = database; + *collection_p = collection; + + return 0; +} diff --git a/backends/mongodb/mongodb.conf b/backends/mongodb/mongodb.conf new file mode 100644 index 0000000..11ea6ef --- /dev/null +++ b/backends/mongodb/mongodb.conf @@ -0,0 +1,12 @@ +# MongoDB backend configuration +# +# All options in this file are mandatory + +# URI +uri = + +# database name +database = + +# collection name +collection = diff --git a/backends/mongodb/mongodb.h b/backends/mongodb/mongodb.h new file mode 100644 index 0000000..cae9b09 --- /dev/null +++ b/backends/mongodb/mongodb.h @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_BACKEND_MONGODB_H +#define NETDATA_BACKEND_MONGODB_H + +#include "backends/backends.h" + +extern int backends_mongodb_init(const char *uri_string, const char *database_string, const char *collection_string, const int32_t socket_timeout); + +extern int backends_mongodb_insert(char *data, size_t n_metrics); + +extern void backends_mongodb_cleanup(); + +extern int read_mongodb_conf(const char *path, char **uri_p, char **database_p, char **collection_p); + +#endif //NETDATA_BACKEND_MONGODB_H diff --git a/backends/nc-backend.sh b/backends/nc-backend.sh new file mode 100755 index 0000000..7280f86 --- /dev/null +++ b/backends/nc-backend.sh @@ -0,0 +1,158 @@ +#!/usr/bin/env bash + +# SPDX-License-Identifier: GPL-3.0-or-later + +# This is a simple backend database proxy, written in BASH, using the nc command. +# Run the script without any parameters for help. + +MODE="${1}" +MY_PORT="${2}" +BACKEND_HOST="${3}" +BACKEND_PORT="${4}" +FILE="${NETDATA_NC_BACKEND_DIR-/tmp}/netdata-nc-backend-${MY_PORT}" + +log() { + logger --stderr --id=$$ --tag "netdata-nc-backend" "${*}" +} + +mync() { + local ret + + log "Running: nc ${*}" + nc "${@}" + ret=$? + + log "nc stopped with return code ${ret}." + + return ${ret} +} + +listen_save_replay_forever() { + local file="${1}" port="${2}" real_backend_host="${3}" real_backend_port="${4}" ret delay=1 started ended + + while true + do + log "Starting nc to listen on port ${port} and save metrics to ${file}" + + started=$(date +%s) + mync -l -p "${port}" | tee -a -p --output-error=exit "${file}" + ended=$(date +%s) + + if [ -s "${file}" ] + then + if [ ! -z "${real_backend_host}" ] && [ ! -z "${real_backend_port}" ] + then + log "Attempting to send the metrics to the real backend at ${real_backend_host}:${real_backend_port}" + + mync "${real_backend_host}" "${real_backend_port}" <"${file}" + ret=$? + + if [ ${ret} -eq 0 ] + then + log "Successfuly sent the metrics to ${real_backend_host}:${real_backend_port}" + mv "${file}" "${file}.old" + touch "${file}" + else + log "Failed to send the metrics to ${real_backend_host}:${real_backend_port} (nc returned ${ret}) - appending more data to ${file}" + fi + else + log "No backend configured - appending more data to ${file}" + fi + fi + + # prevent a CPU hungry infinite loop + # if nc cannot listen to port + if [ $((ended - started)) -lt 5 ] + then + log "nc has been stopped too fast." + delay=30 + else + delay=1 + fi + + log "Waiting ${delay} seconds before listening again for data." + sleep ${delay} + done +} + +if [ "${MODE}" = "start" ] + then + + # start the listener, in exclusive mode + # only one can use the same file/port at a time + { + flock -n 9 + # shellcheck disable=SC2181 + if [ $? -ne 0 ] + then + log "Cannot get exclusive lock on file ${FILE}.lock - Am I running multiple times?" + exit 2 + fi + + # save our PID to the lock file + echo "$$" >"${FILE}.lock" + + listen_save_replay_forever "${FILE}" "${MY_PORT}" "${BACKEND_HOST}" "${BACKEND_PORT}" + ret=$? + + log "listener exited." + exit ${ret} + + } 9>>"${FILE}.lock" + + # we can only get here if ${FILE}.lock cannot be created + log "Cannot create file ${FILE}." + exit 3 + +elif [ "${MODE}" = "stop" ] + then + + { + flock -n 9 + # shellcheck disable=SC2181 + if [ $? -ne 0 ] + then + pid=$(<"${FILE}".lock) + log "Killing process ${pid}..." + kill -TERM "-${pid}" + exit 0 + fi + + log "File ${FILE}.lock has been locked by me but it shouldn't. Is a collector running?" + exit 4 + + } 9<"${FILE}.lock" + + log "File ${FILE}.lock does not exist. Is a collector running?" + exit 5 + +else + + cat <<EOF +Usage: + + "${0}" start|stop PORT [BACKEND_HOST BACKEND_PORT] + + PORT The port this script will listen + (configure netdata to use this as a second backend) + + BACKEND_HOST The real backend host + BACKEND_PORT The real backend port + + This script can act as fallback backend for netdata. + It will receive metrics from netdata, save them to + ${FILE} + and once netdata reconnects to the real-backend, this script + will push all metrics collected to the real-backend too and + wait for a failure to happen again. + + Only one netdata can connect to this script at a time. + If you need fallback for multiple netdata, run this script + multiple times with different ports. + + You can run me in the background with this: + + screen -d -m "${0}" start PORT [BACKEND_HOST BACKEND_PORT] +EOF + exit 1 +fi diff --git a/backends/opentsdb/Makefile.am b/backends/opentsdb/Makefile.am new file mode 100644 index 0000000..babdcf0 --- /dev/null +++ b/backends/opentsdb/Makefile.am @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in diff --git a/backends/opentsdb/README.md b/backends/opentsdb/README.md new file mode 100644 index 0000000..5ba7b12 --- /dev/null +++ b/backends/opentsdb/README.md @@ -0,0 +1,38 @@ +<!-- +title: "OpenTSDB with HTTP" +custom_edit_url: https://github.com/netdata/netdata/edit/master/backends/opentsdb/README.md +--> + +# OpenTSDB with HTTP + +Netdata can easily communicate with OpenTSDB using HTTP API. To enable this channel, set the following options in your +`netdata.conf`: + +```conf +[backend] + type = opentsdb:http + destination = localhost:4242 +``` + +In this example, OpenTSDB is running with its default port, which is `4242`. If you run OpenTSDB on a different port, +change the `destination = localhost:4242` line accordingly. + +## HTTPS + +As of [v1.16.0](https://github.com/netdata/netdata/releases/tag/v1.16.0), Netdata can send metrics to OpenTSDB using +TLS/SSL. Unfortunately, OpenTDSB does not support encrypted connections, so you will have to configure a reverse proxy +to enable HTTPS communication between Netdata and OpenTSDB. You can set up a reverse proxy with +[Nginx](/docs/Running-behind-nginx.md). + +After your proxy is configured, make the following changes to `netdata.conf`: + +```conf +[backend] + type = opentsdb:https + destination = localhost:8082 +``` + +In this example, we used the port `8082` for our reverse proxy. If your reverse proxy listens on a different port, +change the `destination = localhost:8082` line accordingly. + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2Fopentsdb%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)]() diff --git a/backends/opentsdb/opentsdb.c b/backends/opentsdb/opentsdb.c new file mode 100644 index 0000000..965b4c0 --- /dev/null +++ b/backends/opentsdb/opentsdb.c @@ -0,0 +1,205 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#define BACKENDS_INTERNALS +#include "opentsdb.h" + +// ---------------------------------------------------------------------------- +// opentsdb backend + +int backends_format_dimension_collected_opentsdb_telnet( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +) { + (void)host; + (void)after; + (void)before; + + char chart_name[RRD_ID_LENGTH_MAX + 1]; + char dimension_name[RRD_ID_LENGTH_MAX + 1]; + backend_name_copy(chart_name, (backend_options & BACKEND_OPTION_SEND_NAMES && st->name)?st->name:st->id, RRD_ID_LENGTH_MAX); + backend_name_copy(dimension_name, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name)?rd->name:rd->id, RRD_ID_LENGTH_MAX); + + buffer_sprintf( + b + , "put %s.%s.%s %llu " COLLECTED_NUMBER_FORMAT " host=%s%s%s\n" + , prefix + , chart_name + , dimension_name + , (unsigned long long)rd->last_collected_time.tv_sec + , rd->last_collected_value + , hostname + , (host->tags)?" ":"" + , (host->tags)?host->tags:"" + ); + + return 1; +} + +int backends_format_dimension_stored_opentsdb_telnet( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +) { + (void)host; + + time_t first_t = after, last_t = before; + calculated_number value = backend_calculate_value_from_stored_data(st, rd, after, before, backend_options, &first_t, &last_t); + + char chart_name[RRD_ID_LENGTH_MAX + 1]; + char dimension_name[RRD_ID_LENGTH_MAX + 1]; + backend_name_copy(chart_name, (backend_options & BACKEND_OPTION_SEND_NAMES && st->name)?st->name:st->id, RRD_ID_LENGTH_MAX); + backend_name_copy(dimension_name, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name)?rd->name:rd->id, RRD_ID_LENGTH_MAX); + + if(!isnan(value)) { + + buffer_sprintf( + b + , "put %s.%s.%s %llu " CALCULATED_NUMBER_FORMAT " host=%s%s%s\n" + , prefix + , chart_name + , dimension_name + , (unsigned long long) last_t + , value + , hostname + , (host->tags)?" ":"" + , (host->tags)?host->tags:"" + ); + + return 1; + } + + return 0; +} + +int process_opentsdb_response(BUFFER *b) { + return discard_response(b, "opentsdb"); +} + +static inline void opentsdb_build_message(BUFFER *b, char *message, const char *hostname, int length) { + buffer_sprintf( + b + , "POST /api/put HTTP/1.1\r\n" + "Host: %s\r\n" + "Content-Type: application/json\r\n" + "Content-Length: %d\r\n" + "\r\n" + "%s" + , hostname + , length + , message + ); +} + +int backends_format_dimension_collected_opentsdb_http( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +) { + (void)host; + (void)after; + (void)before; + + char message[1024]; + char chart_name[RRD_ID_LENGTH_MAX + 1]; + char dimension_name[RRD_ID_LENGTH_MAX + 1]; + backend_name_copy(chart_name, (backend_options & BACKEND_OPTION_SEND_NAMES && st->name)?st->name:st->id, RRD_ID_LENGTH_MAX); + backend_name_copy(dimension_name, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name)?rd->name:rd->id, RRD_ID_LENGTH_MAX); + + int length = snprintfz(message + , sizeof(message) + , "{" + " \"metric\": \"%s.%s.%s\"," + " \"timestamp\": %llu," + " \"value\": "COLLECTED_NUMBER_FORMAT "," + " \"tags\": {" + " \"host\": \"%s%s%s\"" + " }" + "}" + , prefix + , chart_name + , dimension_name + , (unsigned long long)rd->last_collected_time.tv_sec + , rd->last_collected_value + , hostname + , (host->tags)?" ":"" + , (host->tags)?host->tags:"" + ); + + if(length > 0) { + opentsdb_build_message(b, message, hostname, length); + } + + return 1; +} + +int backends_format_dimension_stored_opentsdb_http( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +) { + (void)host; + + time_t first_t = after, last_t = before; + calculated_number value = backend_calculate_value_from_stored_data(st, rd, after, before, backend_options, &first_t, &last_t); + + if(!isnan(value)) { + char chart_name[RRD_ID_LENGTH_MAX + 1]; + char dimension_name[RRD_ID_LENGTH_MAX + 1]; + backend_name_copy(chart_name, (backend_options & BACKEND_OPTION_SEND_NAMES && st->name)?st->name:st->id, RRD_ID_LENGTH_MAX); + backend_name_copy(dimension_name, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name)?rd->name:rd->id, RRD_ID_LENGTH_MAX); + + char message[1024]; + int length = snprintfz(message + , sizeof(message) + , "{" + " \"metric\": \"%s.%s.%s\"," + " \"timestamp\": %llu," + " \"value\": "CALCULATED_NUMBER_FORMAT "," + " \"tags\": {" + " \"host\": \"%s%s%s\"" + " }" + "}" + , prefix + , chart_name + , dimension_name + , (unsigned long long)last_t + , value + , hostname + , (host->tags)?" ":"" + , (host->tags)?host->tags:"" + ); + + if(length > 0) { + opentsdb_build_message(b, message, hostname, length); + } + + return 1; + } + + return 0; +} diff --git a/backends/opentsdb/opentsdb.h b/backends/opentsdb/opentsdb.h new file mode 100644 index 0000000..87d9c5c --- /dev/null +++ b/backends/opentsdb/opentsdb.h @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_BACKEND_OPENTSDB_H +#define NETDATA_BACKEND_OPENTSDB_H + +#include "backends/backends.h" + +extern int backends_format_dimension_collected_opentsdb_telnet( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +); + +extern int backends_format_dimension_stored_opentsdb_telnet( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +); + +extern int process_opentsdb_response(BUFFER *b); + +int backends_format_dimension_collected_opentsdb_http( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +); + +int backends_format_dimension_stored_opentsdb_http( + BUFFER *b // the buffer to write data to + , const char *prefix // the prefix to use + , RRDHOST *host // the host this chart comes from + , const char *hostname // the hostname (to override host->hostname) + , RRDSET *st // the chart + , RRDDIM *rd // the dimension + , time_t after // the start timestamp + , time_t before // the end timestamp + , BACKEND_OPTIONS backend_options // BACKEND_SOURCE_* bitmap +); + +#endif //NETDATA_BACKEND_OPENTSDB_H diff --git a/backends/prometheus/Makefile.am b/backends/prometheus/Makefile.am new file mode 100644 index 0000000..334fca8 --- /dev/null +++ b/backends/prometheus/Makefile.am @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in + +SUBDIRS = \ + remote_write \ + $(NULL) + +dist_noinst_DATA = \ + README.md \ + $(NULL) diff --git a/backends/prometheus/README.md b/backends/prometheus/README.md new file mode 100644 index 0000000..10275fa --- /dev/null +++ b/backends/prometheus/README.md @@ -0,0 +1,457 @@ +<!-- +title: "Using Netdata with Prometheus" +custom_edit_url: https://github.com/netdata/netdata/edit/master/backends/prometheus/README.md +--> + +# Using Netdata with Prometheus + +> IMPORTANT: the format Netdata sends metrics to prometheus has changed since Netdata v1.7. The new prometheus backend +> for Netdata supports a lot more features and is aligned to the development of the rest of the Netdata backends. + +Prometheus is a distributed monitoring system which offers a very simple setup along with a robust data model. Recently +Netdata added support for Prometheus. I'm going to quickly show you how to install both Netdata and prometheus on the +same server. We can then use grafana pointed at Prometheus to obtain long term metrics Netdata offers. I'm assuming we +are starting at a fresh ubuntu shell (whether you'd like to follow along in a VM or a cloud instance is up to you). + +## Installing Netdata and prometheus + +### Installing Netdata + +There are number of ways to install Netdata according to [Installation](/packaging/installer/README.md). The suggested way +of installing the latest Netdata and keep it upgrade automatically. Using one line installation: + +```sh +bash <(curl -Ss https://my-netdata.io/kickstart.sh) +``` + +At this point we should have Netdata listening on port 19999. Attempt to take your browser here: + +```sh +http://your.netdata.ip:19999 +``` + +_(replace `your.netdata.ip` with the IP or hostname of the server running Netdata)_ + +### Installing Prometheus + +In order to install prometheus we are going to introduce our own systemd startup script along with an example of +prometheus.yaml configuration. Prometheus needs to be pointed to your server at a specific target url for it to scrape +Netdata's api. Prometheus is always a pull model meaning Netdata is the passive client within this architecture. +Prometheus always initiates the connection with Netdata. + +#### Download Prometheus + +```sh +cd /tmp && curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest \ +| grep "browser_download_url.*linux-amd64.tar.gz" \ +| cut -d '"' -f 4 \ +| wget -qi - +``` + +#### Create prometheus system user + +```sh +sudo useradd -r prometheus +``` + +#### Create prometheus directory + +```sh +sudo mkdir /opt/prometheus +sudo chown prometheus:prometheus /opt/prometheus +``` + +#### Untar prometheus directory + +```sh +sudo tar -xvf /tmp/prometheus-*linux-amd64.tar.gz -C /opt/prometheus --strip=1 +``` + +#### Install prometheus.yml + +We will use the following `prometheus.yml` file. Save it at `/opt/prometheus/prometheus.yml`. + +Make sure to replace `your.netdata.ip` with the IP or hostname of the host running Netdata. + +```yaml +# my global config +global: + scrape_interval: 5s # Set the scrape interval to every 5 seconds. Default is every 1 minute. + evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute. + # scrape_timeout is set to the global default (10s). + + # Attach these labels to any time series or alerts when communicating with + # external systems (federation, remote storage, Alertmanager). + external_labels: + monitor: 'codelab-monitor' + +# Load rules once and periodically evaluate them according to the global 'evaluation_interval'. +rule_files: + # - "first.rules" + # - "second.rules" + +# A scrape configuration containing exactly one endpoint to scrape: +# Here it's Prometheus itself. +scrape_configs: + # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. + - job_name: 'prometheus' + + # metrics_path defaults to '/metrics' + # scheme defaults to 'http'. + + static_configs: + - targets: ['0.0.0.0:9090'] + + - job_name: 'netdata-scrape' + + metrics_path: '/api/v1/allmetrics' + params: + # format: prometheus | prometheus_all_hosts + # You can use `prometheus_all_hosts` if you want Prometheus to set the `instance` to your hostname instead of IP + format: [prometheus] + # + # sources: as-collected | raw | average | sum | volume + # default is: average + #source: [as-collected] + # + # server name for this prometheus - the default is the client IP + # for Netdata to uniquely identify it + #server: ['prometheus1'] + honor_labels: true + + static_configs: + - targets: ['{your.netdata.ip}:19999'] +``` + +#### Install nodes.yml + +The following is completely optional, it will enable Prometheus to generate alerts from some NetData sources. Tweak the +values to your own needs. We will use the following `nodes.yml` file below. Save it at `/opt/prometheus/nodes.yml`, and +add a _- "nodes.yml"_ entry under the _rule_files:_ section in the example prometheus.yml file above. + +```yaml +groups: +- name: nodes + + rules: + - alert: node_high_cpu_usage_70 + expr: avg(rate(netdata_cpu_cpu_percentage_average{dimension="idle"}[1m])) by (job) > 70 + for: 1m + annotations: + description: '{{ $labels.job }} on ''{{ $labels.job }}'' CPU usage is at {{ humanize $value }}%.' + summary: CPU alert for container node '{{ $labels.job }}' + + - alert: node_high_memory_usage_70 + expr: 100 / sum(netdata_system_ram_MB_average) by (job) + * sum(netdata_system_ram_MB_average{dimension=~"free|cached"}) by (job) < 30 + for: 1m + annotations: + description: '{{ $labels.job }} memory usage is {{ humanize $value}}%.' + summary: Memory alert for container node '{{ $labels.job }}' + + - alert: node_low_root_filesystem_space_20 + expr: 100 / sum(netdata_disk_space_GB_average{family="/"}) by (job) + * sum(netdata_disk_space_GB_average{family="/",dimension=~"avail|cached"}) by (job) < 20 + for: 1m + annotations: + description: '{{ $labels.job }} root filesystem space is {{ humanize $value}}%.' + summary: Root filesystem alert for container node '{{ $labels.job }}' + + - alert: node_root_filesystem_fill_rate_6h + expr: predict_linear(netdata_disk_space_GB_average{family="/",dimension=~"avail|cached"}[1h], 6 * 3600) < 0 + for: 1h + labels: + severity: critical + annotations: + description: Container node {{ $labels.job }} root filesystem is going to fill up in 6h. + summary: Disk fill alert for Swarm node '{{ $labels.job }}' +``` + +#### Install prometheus.service + +Save this service file as `/etc/systemd/system/prometheus.service`: + +```sh +[Unit] +Description=Prometheus Server +AssertPathExists=/opt/prometheus + +[Service] +Type=simple +WorkingDirectory=/opt/prometheus +User=prometheus +Group=prometheus +ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --log.level=info +ExecReload=/bin/kill -SIGHUP $MAINPID +ExecStop=/bin/kill -SIGINT $MAINPID + +[Install] +WantedBy=multi-user.target +``` + +##### Start Prometheus + +```sh +sudo systemctl start prometheus +sudo systemctl enable prometheus +``` + +Prometheus should now start and listen on port 9090. Attempt to head there with your browser. + +If everything is working correctly when you fetch `http://your.prometheus.ip:9090` you will see a 'Status' tab. Click +this and click on 'targets' We should see the Netdata host as a scraped target. + +--- + +## Netdata support for prometheus + +> IMPORTANT: the format Netdata sends metrics to prometheus has changed since Netdata v1.6. The new format allows easier +> queries for metrics and supports both `as collected` and normalized metrics. + +Before explaining the changes, we have to understand the key differences between Netdata and prometheus. + +### understanding Netdata metrics + +#### charts + +Each chart in Netdata has several properties (common to all its metrics): + +- `chart_id` - uniquely identifies a chart. + +- `chart_name` - a more human friendly name for `chart_id`, also unique. + +- `context` - this is the template of the chart. All disk I/O charts have the same context, all mysql requests charts + have the same context, etc. This is used for alarm templates to match all the charts they should be attached to. + +- `family` groups a set of charts together. It is used as the submenu of the dashboard. + +- `units` is the units for all the metrics attached to the chart. + +#### dimensions + +Then each Netdata chart contains metrics called `dimensions`. All the dimensions of a chart have the same units of +measurement, and are contextually in the same category (ie. the metrics for disk bandwidth are `read` and `write` and +they are both in the same chart). + +### Netdata data source + +Netdata can send metrics to prometheus from 3 data sources: + +- `as collected` or `raw` - this data source sends the metrics to prometheus as they are collected. No conversion is + done by Netdata. The latest value for each metric is just given to prometheus. This is the most preferred method by + prometheus, but it is also the harder to work with. To work with this data source, you will need to understand how + to get meaningful values out of them. + + The format of the metrics is: `CONTEXT{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. + + If the metric is a counter (`incremental` in Netdata lingo), `_total` is appended the context. + + Unlike prometheus, Netdata allows each dimension of a chart to have a different algorithm and conversion constants + (`multiplier` and `divisor`). In this case, that the dimensions of a charts are heterogeneous, Netdata will use this + format: `CONTEXT_DIMENSION{chart="CHART",family="FAMILY"}` + +- `average` - this data source uses the Netdata database to send the metrics to prometheus as they are presented on + the Netdata dashboard. So, all the metrics are sent as gauges, at the units they are presented in the Netdata + dashboard charts. This is the easiest to work with. + + The format of the metrics is: `CONTEXT_UNITS_average{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. + + When this source is used, Netdata keeps track of the last access time for each prometheus server fetching the + metrics. This last access time is used at the subsequent queries of the same prometheus server to identify the + time-frame the `average` will be calculated. + + So, no matter how frequently prometheus scrapes Netdata, it will get all the database data. + To identify each prometheus server, Netdata uses by default the IP of the client fetching the metrics. + + If there are multiple prometheus servers fetching data from the same Netdata, using the same IP, each prometheus + server can append `server=NAME` to the URL. Netdata will use this `NAME` to uniquely identify the prometheus server. + +- `sum` or `volume`, is like `average` but instead of averaging the values, it sums them. + + The format of the metrics is: `CONTEXT_UNITS_sum{chart="CHART",family="FAMILY",dimension="DIMENSION"}`. All the + other operations are the same with `average`. + + To change the data source to `sum` or `as-collected` you need to provide the `source` parameter in the request URL. + e.g.: `http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus&help=yes&source=as-collected` + + Keep in mind that early versions of Netdata were sending the metrics as: `CHART_DIMENSION{}`. + +### Querying Metrics + +Fetch with your web browser this URL: + +`http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus&help=yes` + +_(replace `your.netdata.ip` with the ip or hostname of your Netdata server)_ + +Netdata will respond with all the metrics it sends to prometheus. + +If you search that page for `"system.cpu"` you will find all the metrics Netdata is exporting to prometheus for this +chart. `system.cpu` is the chart name on the Netdata dashboard (on the Netdata dashboard all charts have a text heading +such as : `Total CPU utilization (system.cpu)`. What we are interested here in the chart name: `system.cpu`). + +Searching for `"system.cpu"` reveals: + +```sh +# COMMENT homogeneous chart "system.cpu", context "system.cpu", family "cpu", units "percentage" +# COMMENT netdata_system_cpu_percentage_average: dimension "guest_nice", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="guest_nice"} 0.0000000 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "guest", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="guest"} 1.7837326 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "steal", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="steal"} 0.0000000 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "softirq", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="softirq"} 0.5275442 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "irq", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="irq"} 0.2260836 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "user", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="user"} 2.3362762 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "system", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 1.7961062 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "nice", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="nice"} 0.0000000 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "iowait", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="iowait"} 0.9671802 1500066662000 +# COMMENT netdata_system_cpu_percentage_average: dimension "idle", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive +netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="idle"} 92.3630770 1500066662000 +``` + +_(Netdata response for `system.cpu` with source=`average`)_ + +In `average` or `sum` data sources, all values are normalized and are reported to prometheus as gauges. Now, use the +'expression' text form in prometheus. Begin to type the metrics we are looking for: `netdata_system_cpu`. You should see +that the text form begins to auto-fill as prometheus knows about this metric. + +If the data source was `as collected`, the response would be: + +```sh +# COMMENT homogeneous chart "system.cpu", context "system.cpu", family "cpu", units "percentage" +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "guest_nice", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="guest_nice"} 0 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "guest", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="guest"} 63945 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "steal", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="steal"} 0 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "softirq", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="softirq"} 8295 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "irq", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="irq"} 4079 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "user", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="user"} 116488 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "system", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="system"} 35084 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "nice", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="nice"} 505 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "iowait", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="iowait"} 23314 1500066716438 +# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "idle", value * 1 / 1 delta gives percentage (counter) +netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="idle"} 918470 1500066716438 +``` + +_(Netdata response for `system.cpu` with source=`as-collected`)_ + +For more information check prometheus documentation. + +### Streaming data from upstream hosts + +The `format=prometheus` parameter only exports the host's Netdata metrics. If you are using the parent-child +functionality of Netdata this ignores any upstream hosts - so you should consider using the below in your +**prometheus.yml**: + +```yaml + metrics_path: '/api/v1/allmetrics' + params: + format: [prometheus_all_hosts] + honor_labels: true +``` + +This will report all upstream host data, and `honor_labels` will make Prometheus take note of the instance names +provided. + +### Timestamps + +To pass the metrics through prometheus pushgateway, Netdata supports the option `×tamps=no` to send the metrics +without timestamps. + +## Netdata host variables + +Netdata collects various system configuration metrics, like the max number of TCP sockets supported, the max number of +files allowed system-wide, various IPC sizes, etc. These metrics are not exposed to prometheus by default. + +To expose them, append `variables=yes` to the Netdata URL. + +### TYPE and HELP + +To save bandwidth, and because prometheus does not use them anyway, `# TYPE` and `# HELP` lines are suppressed. If +wanted they can be re-enabled via `types=yes` and `help=yes`, e.g. +`/api/v1/allmetrics?format=prometheus&types=yes&help=yes` + +Note that if enabled, the `# TYPE` and `# HELP` lines are repeated for every occurrence of a metric, which goes against the Prometheus documentation's [specification for these lines](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#comments-help-text-and-type-information). + +### Names and IDs + +Netdata supports names and IDs for charts and dimensions. Usually IDs are unique identifiers as read by the system and +names are human friendly labels (also unique). + +Most charts and metrics have the same ID and name, but in several cases they are different: disks with device-mapper, +interrupts, QoS classes, statsd synthetic charts, etc. + +The default is controlled in `netdata.conf`: + +```conf +[backend] + send names instead of ids = yes | no +``` + +You can overwrite it from prometheus, by appending to the URL: + +- `&names=no` to get IDs (the old behaviour) +- `&names=yes` to get names + +### Filtering metrics sent to prometheus + +Netdata can filter the metrics it sends to prometheus with this setting: + +```conf +[backend] + send charts matching = * +``` + +This settings accepts a space separated list of patterns to match the **charts** to be sent to prometheus. Each pattern +can use `*` as wildcard, any number of times (e.g `*a*b*c*` is valid). Patterns starting with `!` give a negative match +(e.g `!*.bad users.* groups.*` will send all the users and groups except `bad` user and `bad` group). The order is +important: the first match (positive or negative) left to right, is used. + +### Changing the prefix of Netdata metrics + +Netdata sends all metrics prefixed with `netdata_`. You can change this in `netdata.conf`, like this: + +```conf +[backend] + prefix = netdata +``` + +It can also be changed from the URL, by appending `&prefix=netdata`. + +### Metric Units + +The default source `average` adds the unit of measurement to the name of each metric (e.g. `_KiB_persec`). To hide the +units and get the same metric names as with the other sources, append to the URL `&hideunits=yes`. + +The units were standardized in v1.12, with the effect of changing the metric names. To get the metric names as they were +before v1.12, append to the URL `&oldunits=yes` + +### Accuracy of `average` and `sum` data sources + +When the data source is set to `average` or `sum`, Netdata remembers the last access of each client accessing prometheus +metrics and uses this last access time to respond with the `average` or `sum` of all the entries in the database since +that. This means that prometheus servers are not losing data when they access Netdata with data source = `average` or +`sum`. + +To uniquely identify each prometheus server, Netdata uses the IP of the client accessing the metrics. If however the IP +is not good enough for identifying a single prometheus server (e.g. when prometheus servers are accessing Netdata +through a web proxy, or when multiple prometheus servers are NATed to a single IP), each prometheus may append +`&server=NAME` to the URL. This `NAME` is used by Netdata to uniquely identify each prometheus server and keep track of +its last access time. + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2Fprometheus%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/prometheus/backend_prometheus.c b/backends/prometheus/backend_prometheus.c new file mode 100644 index 0000000..a3ecf16 --- /dev/null +++ b/backends/prometheus/backend_prometheus.c @@ -0,0 +1,797 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#define BACKENDS_INTERNALS +#include "backend_prometheus.h" + +// ---------------------------------------------------------------------------- +// PROMETHEUS +// /api/v1/allmetrics?format=prometheus and /api/v1/allmetrics?format=prometheus_all_hosts + +static struct prometheus_server { + const char *server; + uint32_t hash; + RRDHOST *host; + time_t last_access; + struct prometheus_server *next; +} *prometheus_server_root = NULL; + +static inline time_t prometheus_server_last_access(const char *server, RRDHOST *host, time_t now) { + static netdata_mutex_t prometheus_server_root_mutex = NETDATA_MUTEX_INITIALIZER; + + uint32_t hash = simple_hash(server); + + netdata_mutex_lock(&prometheus_server_root_mutex); + + struct prometheus_server *ps; + for(ps = prometheus_server_root; ps ;ps = ps->next) { + if (host == ps->host && hash == ps->hash && !strcmp(server, ps->server)) { + time_t last = ps->last_access; + ps->last_access = now; + netdata_mutex_unlock(&prometheus_server_root_mutex); + return last; + } + } + + ps = callocz(1, sizeof(struct prometheus_server)); + ps->server = strdupz(server); + ps->hash = hash; + ps->host = host; + ps->last_access = now; + ps->next = prometheus_server_root; + prometheus_server_root = ps; + + netdata_mutex_unlock(&prometheus_server_root_mutex); + return 0; +} + +static inline size_t backends_prometheus_name_copy(char *d, const char *s, size_t usable) { + size_t n; + + for(n = 0; *s && n < usable ; d++, s++, n++) { + register char c = *s; + + if(!isalnum(c)) *d = '_'; + else *d = c; + } + *d = '\0'; + + return n; +} + +static inline size_t backends_prometheus_label_copy(char *d, const char *s, size_t usable) { + size_t n; + + // make sure we can escape one character without overflowing the buffer + usable--; + + for(n = 0; *s && n < usable ; d++, s++, n++) { + register char c = *s; + + if(unlikely(c == '"' || c == '\\' || c == '\n')) { + *d++ = '\\'; + n++; + } + *d = c; + } + *d = '\0'; + + return n; +} + +static inline char *backends_prometheus_units_copy(char *d, const char *s, size_t usable, int showoldunits) { + const char *sorig = s; + char *ret = d; + size_t n; + + // Fix for issue 5227 + if (unlikely(showoldunits)) { + static struct { + const char *newunit; + uint32_t hash; + const char *oldunit; + } units[] = { + {"KiB/s", 0, "kilobytes/s"} + , {"MiB/s", 0, "MB/s"} + , {"GiB/s", 0, "GB/s"} + , {"KiB" , 0, "KB"} + , {"MiB" , 0, "MB"} + , {"GiB" , 0, "GB"} + , {"inodes" , 0, "Inodes"} + , {"percentage" , 0, "percent"} + , {"faults/s" , 0, "page faults/s"} + , {"KiB/operation", 0, "kilobytes per operation"} + , {"milliseconds/operation", 0, "ms per operation"} + , {NULL, 0, NULL} + }; + static int initialized = 0; + int i; + + if(unlikely(!initialized)) { + for (i = 0; units[i].newunit; i++) + units[i].hash = simple_hash(units[i].newunit); + initialized = 1; + } + + uint32_t hash = simple_hash(s); + for(i = 0; units[i].newunit ; i++) { + if(unlikely(hash == units[i].hash && !strcmp(s, units[i].newunit))) { + // info("matched extension for filename '%s': '%s'", filename, last_dot); + s=units[i].oldunit; + sorig = s; + break; + } + } + } + *d++ = '_'; + for(n = 1; *s && n < usable ; d++, s++, n++) { + register char c = *s; + + if(!isalnum(c)) *d = '_'; + else *d = c; + } + + if(n == 2 && sorig[0] == '%') { + n = 0; + d = ret; + s = "_percent"; + for( ; *s && n < usable ; n++) *d++ = *s++; + } + else if(n > 3 && sorig[n-3] == '/' && sorig[n-2] == 's') { + n = n - 2; + d -= 2; + s = "_persec"; + for( ; *s && n < usable ; n++) *d++ = *s++; + } + + *d = '\0'; + + return ret; +} + + +#define PROMETHEUS_ELEMENT_MAX 256 +#define PROMETHEUS_LABELS_MAX 1024 +#define PROMETHEUS_VARIABLE_MAX 256 + +#define PROMETHEUS_LABELS_MAX_NUMBER 128 + +struct host_variables_callback_options { + RRDHOST *host; + BUFFER *wb; + BACKEND_OPTIONS backend_options; + BACKENDS_PROMETHEUS_OUTPUT_OPTIONS output_options; + const char *prefix; + const char *labels; + time_t now; + int host_header_printed; + char name[PROMETHEUS_VARIABLE_MAX+1]; +}; + +static int print_host_variables(RRDVAR *rv, void *data) { + struct host_variables_callback_options *opts = data; + + if(rv->options & (RRDVAR_OPTION_CUSTOM_HOST_VAR|RRDVAR_OPTION_CUSTOM_CHART_VAR)) { + if(!opts->host_header_printed) { + opts->host_header_printed = 1; + + if(opts->output_options & BACKENDS_PROMETHEUS_OUTPUT_HELP) { + buffer_sprintf(opts->wb, "\n# COMMENT global host and chart variables\n"); + } + } + + calculated_number value = rrdvar2number(rv); + if(isnan(value) || isinf(value)) { + if(opts->output_options & BACKENDS_PROMETHEUS_OUTPUT_HELP) + buffer_sprintf(opts->wb, "# COMMENT variable \"%s\" is %s. Skipped.\n", rv->name, (isnan(value))?"NAN":"INF"); + + return 0; + } + + char *label_pre = ""; + char *label_post = ""; + if(opts->labels && *opts->labels) { + label_pre = "{"; + label_post = "}"; + } + + backends_prometheus_name_copy(opts->name, rv->name, sizeof(opts->name)); + + if(opts->output_options & BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS) + buffer_sprintf(opts->wb + , "%s_%s%s%s%s " CALCULATED_NUMBER_FORMAT " %llu\n" + , opts->prefix + , opts->name + , label_pre + , opts->labels + , label_post + , value + , opts->now * 1000ULL + ); + else + buffer_sprintf(opts->wb, "%s_%s%s%s%s " CALCULATED_NUMBER_FORMAT "\n" + , opts->prefix + , opts->name + , label_pre + , opts->labels + , label_post + , value + ); + + return 1; + } + + return 0; +} + +static void rrd_stats_api_v1_charts_allmetrics_prometheus(RRDHOST *host, BUFFER *wb, const char *prefix, BACKEND_OPTIONS backend_options, time_t after, time_t before, int allhosts, BACKENDS_PROMETHEUS_OUTPUT_OPTIONS output_options) { + rrdhost_rdlock(host); + + char hostname[PROMETHEUS_ELEMENT_MAX + 1]; + backends_prometheus_label_copy(hostname, host->hostname, PROMETHEUS_ELEMENT_MAX); + + char labels[PROMETHEUS_LABELS_MAX + 1] = ""; + if(allhosts) { + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS) + buffer_sprintf(wb, "netdata_info{instance=\"%s\",application=\"%s\",version=\"%s\"} 1 %llu\n", hostname, host->program_name, host->program_version, now_realtime_usec() / USEC_PER_MS); + else + buffer_sprintf(wb, "netdata_info{instance=\"%s\",application=\"%s\",version=\"%s\"} 1\n", hostname, host->program_name, host->program_version); + + if(host->tags && *(host->tags)) { + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS) { + buffer_sprintf(wb, "netdata_host_tags_info{instance=\"%s\",%s} 1 %llu\n", hostname, host->tags, now_realtime_usec() / USEC_PER_MS); + + // deprecated, exists only for compatibility with older queries + buffer_sprintf(wb, "netdata_host_tags{instance=\"%s\",%s} 1 %llu\n", hostname, host->tags, now_realtime_usec() / USEC_PER_MS); + } + else { + buffer_sprintf(wb, "netdata_host_tags_info{instance=\"%s\",%s} 1\n", hostname, host->tags); + + // deprecated, exists only for compatibility with older queries + buffer_sprintf(wb, "netdata_host_tags{instance=\"%s\",%s} 1\n", hostname, host->tags); + } + + } + + snprintfz(labels, PROMETHEUS_LABELS_MAX, ",instance=\"%s\"", hostname); + } + else { + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS) + buffer_sprintf(wb, "netdata_info{instance=\"%s\",application=\"%s\",version=\"%s\"} 1 %llu\n", hostname, host->program_name, host->program_version, now_realtime_usec() / USEC_PER_MS); + else + buffer_sprintf(wb, "netdata_info{instance=\"%s\",application=\"%s\",version=\"%s\"} 1\n", hostname, host->program_name, host->program_version); + + if(host->tags && *(host->tags)) { + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS) { + buffer_sprintf(wb, "netdata_host_tags_info{%s} 1 %llu\n", host->tags, now_realtime_usec() / USEC_PER_MS); + + // deprecated, exists only for compatibility with older queries + buffer_sprintf(wb, "netdata_host_tags{%s} 1 %llu\n", host->tags, now_realtime_usec() / USEC_PER_MS); + } + else { + buffer_sprintf(wb, "netdata_host_tags_info{%s} 1\n", host->tags); + + // deprecated, exists only for compatibility with older queries + buffer_sprintf(wb, "netdata_host_tags{%s} 1\n", host->tags); + } + } + } + + // send custom variables set for the host + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_VARIABLES){ + struct host_variables_callback_options opts = { + .host = host, + .wb = wb, + .labels = (labels[0] == ',')?&labels[1]:labels, + .backend_options = backend_options, + .output_options = output_options, + .prefix = prefix, + .now = now_realtime_sec(), + .host_header_printed = 0 + }; + foreach_host_variable_callback(host, print_host_variables, &opts); + } + + // for each chart + RRDSET *st; + rrdset_foreach_read(st, host) { + char chart[PROMETHEUS_ELEMENT_MAX + 1]; + char context[PROMETHEUS_ELEMENT_MAX + 1]; + char family[PROMETHEUS_ELEMENT_MAX + 1]; + char units[PROMETHEUS_ELEMENT_MAX + 1] = ""; + + backends_prometheus_label_copy(chart, (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && st->name)?st->name:st->id, PROMETHEUS_ELEMENT_MAX); + backends_prometheus_label_copy(family, st->family, PROMETHEUS_ELEMENT_MAX); + backends_prometheus_name_copy(context, st->context, PROMETHEUS_ELEMENT_MAX); + + if(likely(backends_can_send_rrdset(backend_options, st))) { + rrdset_rdlock(st); + + int as_collected = (BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED); + int homogeneous = 1; + if(as_collected) { + if(rrdset_flag_check(st, RRDSET_FLAG_HOMOGENEOUS_CHECK)) + rrdset_update_heterogeneous_flag(st); + + if(rrdset_flag_check(st, RRDSET_FLAG_HETEROGENEOUS)) + homogeneous = 0; + } + else { + if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AVERAGE && !(output_options & BACKENDS_PROMETHEUS_OUTPUT_HIDEUNITS)) + backends_prometheus_units_copy(units, st->units, PROMETHEUS_ELEMENT_MAX, output_options & BACKENDS_PROMETHEUS_OUTPUT_OLDUNITS); + } + + if(unlikely(output_options & BACKENDS_PROMETHEUS_OUTPUT_HELP)) + buffer_sprintf(wb, "\n# COMMENT %s chart \"%s\", context \"%s\", family \"%s\", units \"%s\"\n" + , (homogeneous)?"homogeneous":"heterogeneous" + , (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && st->name) ? st->name : st->id + , st->context + , st->family + , st->units + ); + + // for each dimension + RRDDIM *rd; + rrddim_foreach_read(rd, st) { + if(rd->collections_counter && !rrddim_flag_check(rd, RRDDIM_FLAG_OBSOLETE)) { + char dimension[PROMETHEUS_ELEMENT_MAX + 1]; + char *suffix = ""; + + if (as_collected) { + // we need as-collected / raw data + + if(unlikely(rd->last_collected_time.tv_sec < after)) + continue; + + const char *t = "gauge", *h = "gives"; + if(rd->algorithm == RRD_ALGORITHM_INCREMENTAL || + rd->algorithm == RRD_ALGORITHM_PCENT_OVER_DIFF_TOTAL) { + t = "counter"; + h = "delta gives"; + suffix = "_total"; + } + + if(homogeneous) { + // all the dimensions of the chart, has the same algorithm, multiplier and divisor + // we add all dimensions as labels + + backends_prometheus_label_copy(dimension, (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && rd->name) ? rd->name : rd->id, PROMETHEUS_ELEMENT_MAX); + + if(unlikely(output_options & BACKENDS_PROMETHEUS_OUTPUT_HELP)) + buffer_sprintf(wb + , "# COMMENT %s_%s%s: chart \"%s\", context \"%s\", family \"%s\", dimension \"%s\", value * " COLLECTED_NUMBER_FORMAT " / " COLLECTED_NUMBER_FORMAT " %s %s (%s)\n" + , prefix + , context + , suffix + , (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && st->name) ? st->name : st->id + , st->context + , st->family + , (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && rd->name) ? rd->name : rd->id + , rd->multiplier + , rd->divisor + , h + , st->units + , t + ); + + if(unlikely(output_options & BACKENDS_PROMETHEUS_OUTPUT_TYPES)) + buffer_sprintf(wb, "# TYPE %s_%s%s %s\n" + , prefix + , context + , suffix + , t + ); + + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS) + buffer_sprintf(wb + , "%s_%s%s{chart=\"%s\",family=\"%s\",dimension=\"%s\"%s} " COLLECTED_NUMBER_FORMAT " %llu\n" + , prefix + , context + , suffix + , chart + , family + , dimension + , labels + , rd->last_collected_value + , timeval_msec(&rd->last_collected_time) + ); + else + buffer_sprintf(wb + , "%s_%s%s{chart=\"%s\",family=\"%s\",dimension=\"%s\"%s} " COLLECTED_NUMBER_FORMAT "\n" + , prefix + , context + , suffix + , chart + , family + , dimension + , labels + , rd->last_collected_value + ); + } + else { + // the dimensions of the chart, do not have the same algorithm, multiplier or divisor + // we create a metric per dimension + + backends_prometheus_name_copy(dimension, (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && rd->name) ? rd->name : rd->id, PROMETHEUS_ELEMENT_MAX); + + if(unlikely(output_options & BACKENDS_PROMETHEUS_OUTPUT_HELP)) + buffer_sprintf(wb + , "# COMMENT %s_%s_%s%s: chart \"%s\", context \"%s\", family \"%s\", dimension \"%s\", value * " COLLECTED_NUMBER_FORMAT " / " COLLECTED_NUMBER_FORMAT " %s %s (%s)\n" + , prefix + , context + , dimension + , suffix + , (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && st->name) ? st->name : st->id + , st->context + , st->family + , (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && rd->name) ? rd->name : rd->id + , rd->multiplier + , rd->divisor + , h + , st->units + , t + ); + + if(unlikely(output_options & BACKENDS_PROMETHEUS_OUTPUT_TYPES)) + buffer_sprintf(wb, "# TYPE %s_%s_%s%s %s\n" + , prefix + , context + , dimension + , suffix + , t + ); + + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS) + buffer_sprintf(wb + , "%s_%s_%s%s{chart=\"%s\",family=\"%s\"%s} " COLLECTED_NUMBER_FORMAT " %llu\n" + , prefix + , context + , dimension + , suffix + , chart + , family + , labels + , rd->last_collected_value + , timeval_msec(&rd->last_collected_time) + ); + else + buffer_sprintf(wb + , "%s_%s_%s%s{chart=\"%s\",family=\"%s\"%s} " COLLECTED_NUMBER_FORMAT "\n" + , prefix + , context + , dimension + , suffix + , chart + , family + , labels + , rd->last_collected_value + ); + } + } + else { + // we need average or sum of the data + + time_t first_t = after, last_t = before; + calculated_number value = backend_calculate_value_from_stored_data(st, rd, after, before, backend_options, &first_t, &last_t); + + if(!isnan(value) && !isinf(value)) { + + if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AVERAGE) + suffix = "_average"; + else if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_SUM) + suffix = "_sum"; + + backends_prometheus_label_copy(dimension, (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && rd->name) ? rd->name : rd->id, PROMETHEUS_ELEMENT_MAX); + + if (unlikely(output_options & BACKENDS_PROMETHEUS_OUTPUT_HELP)) + buffer_sprintf(wb, "# COMMENT %s_%s%s%s: dimension \"%s\", value is %s, gauge, dt %llu to %llu inclusive\n" + , prefix + , context + , units + , suffix + , (output_options & BACKENDS_PROMETHEUS_OUTPUT_NAMES && rd->name) ? rd->name : rd->id + , st->units + , (unsigned long long)first_t + , (unsigned long long)last_t + ); + + if (unlikely(output_options & BACKENDS_PROMETHEUS_OUTPUT_TYPES)) + buffer_sprintf(wb, "# TYPE %s_%s%s%s gauge\n" + , prefix + , context + , units + , suffix + ); + + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS) + buffer_sprintf(wb, "%s_%s%s%s{chart=\"%s\",family=\"%s\",dimension=\"%s\"%s} " CALCULATED_NUMBER_FORMAT " %llu\n" + , prefix + , context + , units + , suffix + , chart + , family + , dimension + , labels + , value + , last_t * MSEC_PER_SEC + ); + else + buffer_sprintf(wb, "%s_%s%s%s{chart=\"%s\",family=\"%s\",dimension=\"%s\"%s} " CALCULATED_NUMBER_FORMAT "\n" + , prefix + , context + , units + , suffix + , chart + , family + , dimension + , labels + , value + ); + } + } + } + } + + rrdset_unlock(st); + } + } + + rrdhost_unlock(host); +} + +#if ENABLE_PROMETHEUS_REMOTE_WRITE +inline static void remote_write_split_words(char *str, char **words, int max_words) { + char *s = str; + int i = 0; + + while(*s && i < max_words - 1) { + while(*s && isspace(*s)) s++; // skip spaces to the begining of a tag name + + if(*s) + words[i] = s; + + while(*s && !isspace(*s) && *s != '=') s++; // find the end of the tag name + + if(*s != '=') { + words[i] = NULL; + break; + } + *s = '\0'; + s++; + i++; + + while(*s && isspace(*s)) s++; // skip spaces to the begining of a tag value + + if(*s && *s == '"') s++; // strip an opening quote + if(*s) + words[i] = s; + + while(*s && !isspace(*s) && *s != ',') s++; // find the end of the tag value + + if(*s && *s != ',') { + words[i] = NULL; + break; + } + if(s != words[i] && *(s - 1) == '"') *(s - 1) = '\0'; // strip a closing quote + if(*s != '\0') { + *s = '\0'; + s++; + i++; + } + } +} + +void backends_rrd_stats_remote_write_allmetrics_prometheus( + RRDHOST *host + , const char *__hostname + , const char *prefix + , BACKEND_OPTIONS backend_options + , time_t after + , time_t before + , size_t *count_charts + , size_t *count_dims + , size_t *count_dims_skipped +) { + char hostname[PROMETHEUS_ELEMENT_MAX + 1]; + backends_prometheus_label_copy(hostname, __hostname, PROMETHEUS_ELEMENT_MAX); + + backends_add_host_info("netdata_info", hostname, host->program_name, host->program_version, now_realtime_usec() / USEC_PER_MS); + + if(host->tags && *(host->tags)) { + char tags[PROMETHEUS_LABELS_MAX + 1]; + strncpy(tags, host->tags, PROMETHEUS_LABELS_MAX); + char *words[PROMETHEUS_LABELS_MAX_NUMBER] = {NULL}; + int i; + + remote_write_split_words(tags, words, PROMETHEUS_LABELS_MAX_NUMBER); + + backends_add_host_info("netdata_host_tags_info", hostname, NULL, NULL, now_realtime_usec() / USEC_PER_MS); + + for(i = 0; words[i] != NULL && words[i + 1] != NULL && (i + 1) < PROMETHEUS_LABELS_MAX_NUMBER; i += 2) { + backends_add_tag(words[i], words[i + 1]); + } + } + + // for each chart + RRDSET *st; + rrdset_foreach_read(st, host) { + char chart[PROMETHEUS_ELEMENT_MAX + 1]; + char context[PROMETHEUS_ELEMENT_MAX + 1]; + char family[PROMETHEUS_ELEMENT_MAX + 1]; + char units[PROMETHEUS_ELEMENT_MAX + 1] = ""; + + backends_prometheus_label_copy(chart, (backend_options & BACKEND_OPTION_SEND_NAMES && st->name)?st->name:st->id, PROMETHEUS_ELEMENT_MAX); + backends_prometheus_label_copy(family, st->family, PROMETHEUS_ELEMENT_MAX); + backends_prometheus_name_copy(context, st->context, PROMETHEUS_ELEMENT_MAX); + + if(likely(backends_can_send_rrdset(backend_options, st))) { + rrdset_rdlock(st); + + (*count_charts)++; + + int as_collected = (BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED); + int homogeneous = 1; + if(as_collected) { + if(rrdset_flag_check(st, RRDSET_FLAG_HOMOGENEOUS_CHECK)) + rrdset_update_heterogeneous_flag(st); + + if(rrdset_flag_check(st, RRDSET_FLAG_HETEROGENEOUS)) + homogeneous = 0; + } + else { + if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AVERAGE) + backends_prometheus_units_copy(units, st->units, PROMETHEUS_ELEMENT_MAX, 0); + } + + // for each dimension + RRDDIM *rd; + rrddim_foreach_read(rd, st) { + if(rd->collections_counter && !rrddim_flag_check(rd, RRDDIM_FLAG_OBSOLETE)) { + char name[PROMETHEUS_LABELS_MAX + 1]; + char dimension[PROMETHEUS_ELEMENT_MAX + 1]; + char *suffix = ""; + + if (as_collected) { + // we need as-collected / raw data + + if(unlikely(rd->last_collected_time.tv_sec < after)) { + debug(D_BACKEND, "BACKEND: not sending dimension '%s' of chart '%s' from host '%s', its last data collection (%lu) is not within our timeframe (%lu to %lu)", rd->id, st->id, __hostname, (unsigned long)rd->last_collected_time.tv_sec, (unsigned long)after, (unsigned long)before); + (*count_dims_skipped)++; + continue; + } + + if(homogeneous) { + // all the dimensions of the chart, has the same algorithm, multiplier and divisor + // we add all dimensions as labels + + backends_prometheus_label_copy(dimension, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name) ? rd->name : rd->id, PROMETHEUS_ELEMENT_MAX); + snprintf(name, PROMETHEUS_LABELS_MAX, "%s_%s%s", prefix, context, suffix); + + backends_add_metric(name, chart, family, dimension, hostname, rd->last_collected_value, timeval_msec(&rd->last_collected_time)); + (*count_dims)++; + } + else { + // the dimensions of the chart, do not have the same algorithm, multiplier or divisor + // we create a metric per dimension + + backends_prometheus_name_copy(dimension, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name) ? rd->name : rd->id, PROMETHEUS_ELEMENT_MAX); + snprintf(name, PROMETHEUS_LABELS_MAX, "%s_%s_%s%s", prefix, context, dimension, suffix); + + backends_add_metric(name, chart, family, NULL, hostname, rd->last_collected_value, timeval_msec(&rd->last_collected_time)); + (*count_dims)++; + } + } + else { + // we need average or sum of the data + + time_t first_t = after, last_t = before; + calculated_number value = backend_calculate_value_from_stored_data(st, rd, after, before, backend_options, &first_t, &last_t); + + if(!isnan(value) && !isinf(value)) { + + if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AVERAGE) + suffix = "_average"; + else if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_SUM) + suffix = "_sum"; + + backends_prometheus_label_copy(dimension, (backend_options & BACKEND_OPTION_SEND_NAMES && rd->name) ? rd->name : rd->id, PROMETHEUS_ELEMENT_MAX); + snprintf(name, PROMETHEUS_LABELS_MAX, "%s_%s%s%s", prefix, context, units, suffix); + + backends_add_metric(name, chart, family, dimension, hostname, value, last_t * MSEC_PER_SEC); + (*count_dims)++; + } + } + } + } + + rrdset_unlock(st); + } + } +} +#endif /* ENABLE_PROMETHEUS_REMOTE_WRITE */ + +static inline time_t prometheus_preparation(RRDHOST *host, BUFFER *wb, BACKEND_OPTIONS backend_options, const char *server, time_t now, BACKENDS_PROMETHEUS_OUTPUT_OPTIONS output_options) { + if(!server || !*server) server = "default"; + + time_t after = prometheus_server_last_access(server, host, now); + + int first_seen = 0; + if(!after) { + after = now - global_backend_update_every; + first_seen = 1; + } + + if(after > now) { + // oops! this should never happen + after = now - global_backend_update_every; + } + + if(output_options & BACKENDS_PROMETHEUS_OUTPUT_HELP) { + char *mode; + if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AS_COLLECTED) + mode = "as collected"; + else if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_AVERAGE) + mode = "average"; + else if(BACKEND_OPTIONS_DATA_SOURCE(backend_options) == BACKEND_SOURCE_DATA_SUM) + mode = "sum"; + else + mode = "unknown"; + + buffer_sprintf(wb, "# COMMENT netdata \"%s\" to %sprometheus \"%s\", source \"%s\", last seen %lu %s, time range %lu to %lu\n\n" + , host->hostname + , (first_seen)?"FIRST SEEN ":"" + , server + , mode + , (unsigned long)((first_seen)?0:(now - after)) + , (first_seen)?"never":"seconds ago" + , (unsigned long)after, (unsigned long)now + ); + } + + return after; +} + +void backends_rrd_stats_api_v1_charts_allmetrics_prometheus_single_host(RRDHOST *host, BUFFER *wb, const char *server, const char *prefix, BACKEND_OPTIONS backend_options, BACKENDS_PROMETHEUS_OUTPUT_OPTIONS output_options) { + time_t before = now_realtime_sec(); + + // we start at the point we had stopped before + time_t after = prometheus_preparation(host, wb, backend_options, server, before, output_options); + + rrd_stats_api_v1_charts_allmetrics_prometheus(host, wb, prefix, backend_options, after, before, 0, output_options); +} + +void backends_rrd_stats_api_v1_charts_allmetrics_prometheus_all_hosts(RRDHOST *host, BUFFER *wb, const char *server, const char *prefix, BACKEND_OPTIONS backend_options, BACKENDS_PROMETHEUS_OUTPUT_OPTIONS output_options) { + time_t before = now_realtime_sec(); + + // we start at the point we had stopped before + time_t after = prometheus_preparation(host, wb, backend_options, server, before, output_options); + + rrd_rdlock(); + rrdhost_foreach_read(host) { + rrd_stats_api_v1_charts_allmetrics_prometheus(host, wb, prefix, backend_options, after, before, 1, output_options); + } + rrd_unlock(); +} + +#if ENABLE_PROMETHEUS_REMOTE_WRITE +int backends_process_prometheus_remote_write_response(BUFFER *b) { + if(unlikely(!b)) return 1; + + const char *s = buffer_tostring(b); + int len = buffer_strlen(b); + + // do nothing with HTTP responses 200 or 204 + + while(!isspace(*s) && len) { + s++; + len--; + } + s++; + len--; + + if(likely(len > 4 && (!strncmp(s, "200 ", 4) || !strncmp(s, "204 ", 4)))) + return 0; + else + return discard_response(b, "prometheus remote write"); +} +#endif diff --git a/backends/prometheus/backend_prometheus.h b/backends/prometheus/backend_prometheus.h new file mode 100644 index 0000000..8c14ddc --- /dev/null +++ b/backends/prometheus/backend_prometheus.h @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_BACKEND_PROMETHEUS_H +#define NETDATA_BACKEND_PROMETHEUS_H 1 + +#include "backends/backends.h" + +typedef enum backends_prometheus_output_flags { + BACKENDS_PROMETHEUS_OUTPUT_NONE = 0, + BACKENDS_PROMETHEUS_OUTPUT_HELP = (1 << 0), + BACKENDS_PROMETHEUS_OUTPUT_TYPES = (1 << 1), + BACKENDS_PROMETHEUS_OUTPUT_NAMES = (1 << 2), + BACKENDS_PROMETHEUS_OUTPUT_TIMESTAMPS = (1 << 3), + BACKENDS_PROMETHEUS_OUTPUT_VARIABLES = (1 << 4), + BACKENDS_PROMETHEUS_OUTPUT_OLDUNITS = (1 << 5), + BACKENDS_PROMETHEUS_OUTPUT_HIDEUNITS = (1 << 6) +} BACKENDS_PROMETHEUS_OUTPUT_OPTIONS; + +extern void backends_rrd_stats_api_v1_charts_allmetrics_prometheus_single_host(RRDHOST *host, BUFFER *wb, const char *server, const char *prefix, BACKEND_OPTIONS backend_options, BACKENDS_PROMETHEUS_OUTPUT_OPTIONS output_options); +extern void backends_rrd_stats_api_v1_charts_allmetrics_prometheus_all_hosts(RRDHOST *host, BUFFER *wb, const char *server, const char *prefix, BACKEND_OPTIONS backend_options, BACKENDS_PROMETHEUS_OUTPUT_OPTIONS output_options); + +#if ENABLE_PROMETHEUS_REMOTE_WRITE +extern void backends_rrd_stats_remote_write_allmetrics_prometheus( + RRDHOST *host + , const char *__hostname + , const char *prefix + , BACKEND_OPTIONS backend_options + , time_t after + , time_t before + , size_t *count_charts + , size_t *count_dims + , size_t *count_dims_skipped +); +extern int backends_process_prometheus_remote_write_response(BUFFER *b); +#endif + +#endif //NETDATA_BACKEND_PROMETHEUS_H diff --git a/backends/prometheus/remote_write/Makefile.am b/backends/prometheus/remote_write/Makefile.am new file mode 100644 index 0000000..d049ef4 --- /dev/null +++ b/backends/prometheus/remote_write/Makefile.am @@ -0,0 +1,14 @@ +# SPDX-License-Identifier: GPL-3.0-or-later + +AUTOMAKE_OPTIONS = subdir-objects +MAINTAINERCLEANFILES = $(srcdir)/Makefile.in + +CLEANFILES = \ + remote_write.pb.cc \ + remote_write.pb.h \ + $(NULL) + +dist_noinst_DATA = \ + remote_write.proto \ + README.md \ + $(NULL) diff --git a/backends/prometheus/remote_write/README.md b/backends/prometheus/remote_write/README.md new file mode 100644 index 0000000..b83575e --- /dev/null +++ b/backends/prometheus/remote_write/README.md @@ -0,0 +1,41 @@ +<!-- +title: "Prometheus remote write backend" +custom_edit_url: https://github.com/netdata/netdata/edit/master/backends/prometheus/remote_write/README.md +--> + +# Prometheus remote write backend + +## Prerequisites + +To use the prometheus remote write API with [storage +providers](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage) +[protobuf](https://developers.google.com/protocol-buffers/) and [snappy](https://github.com/google/snappy) libraries +should be installed first. Next, Netdata should be re-installed from the source. The installer will detect that the +required libraries and utilities are now available. + +## Configuration + +An additional option in the backend configuration section is available for the remote write backend: + +```conf +[backend] + remote write URL path = /receive +``` + +The default value is `/receive`. `remote write URL path` is used to set an endpoint path for the remote write protocol. +For example, if your endpoint is `http://example.domain:example_port/storage/read` you should set + +```conf +[backend] + destination = example.domain:example_port + remote write URL path = /storage/read +``` + +`buffered` and `lost` dimensions in the Netdata Backend Data Size operation monitoring chart estimate uncompressed +buffer size on failures. + +## Notes + +The remote write backend does not support `buffer on failures` + +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fbackends%2Fprometheus%2Fremote_write%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/backends/prometheus/remote_write/remote_write.cc b/backends/prometheus/remote_write/remote_write.cc new file mode 100644 index 0000000..9448595 --- /dev/null +++ b/backends/prometheus/remote_write/remote_write.cc @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include <snappy.h> +#include "remote_write.pb.h" +#include "remote_write.h" + +using namespace prometheus; + +static google::protobuf::Arena arena; +static WriteRequest *write_request; + +void backends_init_write_request() { + GOOGLE_PROTOBUF_VERIFY_VERSION; + write_request = google::protobuf::Arena::CreateMessage<WriteRequest>(&arena); +} + +void backends_clear_write_request() { + write_request->clear_timeseries(); +} + +void backends_add_host_info(const char *name, const char *instance, const char *application, const char *version, const int64_t timestamp) { + TimeSeries *timeseries; + Sample *sample; + Label *label; + + timeseries = write_request->add_timeseries(); + + label = timeseries->add_labels(); + label->set_name("__name__"); + label->set_value(name); + + label = timeseries->add_labels(); + label->set_name("instance"); + label->set_value(instance); + + if(application) { + label = timeseries->add_labels(); + label->set_name("application"); + label->set_value(application); + } + + if(version) { + label = timeseries->add_labels(); + label->set_name("version"); + label->set_value(version); + } + + sample = timeseries->add_samples(); + sample->set_value(1); + sample->set_timestamp(timestamp); +} + +// adds tag to the last created timeseries +void backends_add_tag(char *tag, char *value) { + TimeSeries *timeseries; + Label *label; + + timeseries = write_request->mutable_timeseries(write_request->timeseries_size() - 1); + + label = timeseries->add_labels(); + label->set_name(tag); + label->set_value(value); +} + +void backends_add_metric(const char *name, const char *chart, const char *family, const char *dimension, const char *instance, const double value, const int64_t timestamp) { + TimeSeries *timeseries; + Sample *sample; + Label *label; + + timeseries = write_request->add_timeseries(); + + label = timeseries->add_labels(); + label->set_name("__name__"); + label->set_value(name); + + label = timeseries->add_labels(); + label->set_name("chart"); + label->set_value(chart); + + label = timeseries->add_labels(); + label->set_name("family"); + label->set_value(family); + + if(dimension) { + label = timeseries->add_labels(); + label->set_name("dimension"); + label->set_value(dimension); + } + + label = timeseries->add_labels(); + label->set_name("instance"); + label->set_value(instance); + + sample = timeseries->add_samples(); + sample->set_value(value); + sample->set_timestamp(timestamp); +} + +size_t backends_get_write_request_size(){ +#if GOOGLE_PROTOBUF_VERSION < 3001000 + size_t size = (size_t)snappy::MaxCompressedLength(write_request->ByteSize()); +#else + size_t size = (size_t)snappy::MaxCompressedLength(write_request->ByteSizeLong()); +#endif + + return (size < INT_MAX)?size:0; +} + +int backends_pack_write_request(char *buffer, size_t *size) { + std::string uncompressed_write_request; + if(write_request->SerializeToString(&uncompressed_write_request) == false) return 1; + + snappy::RawCompress(uncompressed_write_request.data(), uncompressed_write_request.size(), buffer, size); + + return 0; +} + +void backends_protocol_buffers_shutdown() { + google::protobuf::ShutdownProtobufLibrary(); +} diff --git a/backends/prometheus/remote_write/remote_write.h b/backends/prometheus/remote_write/remote_write.h new file mode 100644 index 0000000..1307d72 --- /dev/null +++ b/backends/prometheus/remote_write/remote_write.h @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_BACKEND_PROMETHEUS_REMOTE_WRITE_H +#define NETDATA_BACKEND_PROMETHEUS_REMOTE_WRITE_H + +#ifdef __cplusplus +extern "C" { +#endif + +void backends_init_write_request(); + +void backends_clear_write_request(); + +void backends_add_host_info(const char *name, const char *instance, const char *application, const char *version, const int64_t timestamp); + +void backends_add_tag(char *tag, char *value); + +void backends_add_metric(const char *name, const char *chart, const char *family, const char *dimension, const char *instance, const double value, const int64_t timestamp); + +size_t backends_get_write_request_size(); + +int backends_pack_write_request(char *buffer, size_t *size); + +void backends_protocol_buffers_shutdown(); + +#ifdef __cplusplus +} +#endif + +#endif //NETDATA_BACKEND_PROMETHEUS_REMOTE_WRITE_H diff --git a/backends/prometheus/remote_write/remote_write.proto b/backends/prometheus/remote_write/remote_write.proto new file mode 100644 index 0000000..dfde254 --- /dev/null +++ b/backends/prometheus/remote_write/remote_write.proto @@ -0,0 +1,29 @@ +syntax = "proto3"; +package prometheus; + +option cc_enable_arenas = true; + +import "google/protobuf/descriptor.proto"; + +message WriteRequest { + repeated TimeSeries timeseries = 1 [(nullable) = false]; +} + +message TimeSeries { + repeated Label labels = 1 [(nullable) = false]; + repeated Sample samples = 2 [(nullable) = false]; +} + +message Label { + string name = 1; + string value = 2; +} + +message Sample { + double value = 1; + int64 timestamp = 2; +} + +extend google.protobuf.FieldOptions { + bool nullable = 65001; +} |