Diffstat (limited to 'src/streaming')
-rw-r--r-- src/streaming/README.md | 138
-rw-r--r-- src/streaming/h2o-common.h (renamed from src/streaming/common.h) | 0
-rw-r--r-- src/streaming/protocol/command-begin-set-end.c | 126
-rw-r--r-- src/streaming/protocol/command-chart-definition.c | 206
-rw-r--r-- src/streaming/protocol/command-claimed_id.c | 78
-rw-r--r-- src/streaming/protocol/command-function.c | 20
-rw-r--r-- src/streaming/protocol/command-host-labels.c | 25
-rw-r--r-- src/streaming/protocol/command-host-variables.c | 52
-rw-r--r-- src/streaming/protocol/command-nodeid.c | 128
-rw-r--r-- src/streaming/protocol/commands.c | 58
-rw-r--r-- src/streaming/protocol/commands.h | 41
-rw-r--r-- src/streaming/receiver.c | 673
-rw-r--r-- src/streaming/receiver.h | 93
-rw-r--r-- src/streaming/replication.c | 9
-rw-r--r-- src/streaming/replication.h | 6
-rw-r--r-- src/streaming/rrdhost-status.c | 355
-rw-r--r-- src/streaming/rrdhost-status.h | 161
-rw-r--r-- src/streaming/rrdpush.c | 1418
-rw-r--r-- src/streaming/rrdpush.h | 761
-rw-r--r-- src/streaming/sender-commit.c | 168
-rw-r--r-- src/streaming/sender-connect.c | 741
-rw-r--r-- src/streaming/sender-destinations.c | 143
-rw-r--r-- src/streaming/sender-destinations.h | 38
-rw-r--r-- src/streaming/sender-execute.c | 294
-rw-r--r-- src/streaming/sender-internals.h | 48
-rw-r--r-- src/streaming/sender.c | 1412
-rw-r--r-- src/streaming/sender.h | 169
-rw-r--r-- src/streaming/stream-capabilities.c | 169
-rw-r--r-- src/streaming/stream-capabilities.h | 100
-rw-r--r-- src/streaming/stream-compression/brotli.c (renamed from src/streaming/compression_brotli.c) | 2
-rw-r--r-- src/streaming/stream-compression/brotli.h (renamed from src/streaming/compression_brotli.h) | 0
-rw-r--r-- src/streaming/stream-compression/compression.c (renamed from src/streaming/compression.c) | 34
-rw-r--r-- src/streaming/stream-compression/compression.h (renamed from src/streaming/compression.h) | 12
-rw-r--r-- src/streaming/stream-compression/gzip.c (renamed from src/streaming/compression_gzip.c) | 2
-rw-r--r-- src/streaming/stream-compression/gzip.h (renamed from src/streaming/compression_gzip.h) | 0
-rw-r--r-- src/streaming/stream-compression/lz4.c (renamed from src/streaming/compression_lz4.c) | 2
-rw-r--r-- src/streaming/stream-compression/lz4.h (renamed from src/streaming/compression_lz4.h) | 0
-rw-r--r-- src/streaming/stream-compression/zstd.c (renamed from src/streaming/compression_zstd.c) | 2
-rw-r--r-- src/streaming/stream-compression/zstd.h (renamed from src/streaming/compression_zstd.h) | 0
-rw-r--r-- src/streaming/stream-conf.c | 137
-rw-r--r-- src/streaming/stream-conf.h | 28
-rw-r--r-- src/streaming/stream-handshake.c | 53
-rw-r--r-- src/streaming/stream-handshake.h | 82
-rw-r--r-- src/streaming/stream-path.c | 353
-rw-r--r-- src/streaming/stream-path.h | 54
-rw-r--r-- src/streaming/stream.conf | 73
46 files changed, 4730 insertions, 3734 deletions
diff --git a/src/streaming/README.md b/src/streaming/README.md
index fe4e01bae..74b5691d0 100644
--- a/src/streaming/README.md
+++ b/src/streaming/README.md
@@ -30,6 +30,8 @@ node**. This file is automatically generated by Netdata the first time it is sta
#### `[stream]` section
+This section is used by the sending Netdata.
+
| Setting | Default | Description |
|-------------------------------------------------|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `enabled` | `no` | Whether this node streams metrics to any parent. Change to `yes` to enable streaming. |
@@ -38,34 +40,62 @@ node**. This file is automatically generated by Netdata the first time it is sta
| `CApath` | `/etc/ssl/certs/` | The directory where known certificates are found. Defaults to OpenSSL's default path. |
| `CAfile` | `/etc/ssl/certs/cert.pem` | Add a parent node certificate to the list of known certificates in `CAPath`. |
| `api key` | | The `API_KEY` to use as the child node. |
-| `timeout seconds` | `60` | The timeout to connect and send metrics to a parent. |
+| `timeout` | `1m` | The timeout to connect and send metrics to a parent. |
| `default port` | `19999` | The port to use if `destination` does not specify one. |
| [`send charts matching`](#send-charts-matching) | `*` | A space-separated list of [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) to filter which charts are streamed. [Read more →](#send-charts-matching) |
| `buffer size bytes` | `10485760` | The size of the buffer to use when sending metrics. The default `10485760` equals a buffer of 10MB, which is good for 60 seconds of data. Increase this if you expect latencies higher than that. The buffer is flushed on reconnect. |
-| `reconnect delay seconds` | `5` | How long to wait until retrying to connect to the parent node. |
+| `reconnect delay` | `5s` | How long to wait until retrying to connect to the parent node. |
| `initial clock resync iterations` | `60` | Sync the clock of charts for how many seconds when starting. |
| `parent using h2o` | `no` | Set to `yes` if you are connecting to the parent through its h2o webserver/port. Currently there is no reason to set this to `yes` unless you are testing the new h2o based netdata webserver. When production ready this will be set to `yes` as default. |
-### `[API_KEY]` and `[MACHINE_GUID]` sections
-
-| Setting | Default | Description |
-|-----------------------------------------------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `enabled` | `no` | Whether this API KEY enabled or disabled. |
-| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more →](#allow-from) |
-| `default history` | `3600` | The default amount of child metrics history to retain when using the `ram` memory mode. |
-| [`default memory mode`](#default-memory-mode) | `ram` | The [database](/src/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, or `none`. [Read more →](#default-memory-mode) |
-| `health enabled by default` | `auto` | Whether alerts and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alerts when the child is connected. `yes` enables alerts always, and `no` disables alerts. |
-| `default postpone alarms on connect seconds` | `60` | Postpone alerts and notifications for a period of time after the child connects. |
-| `default health log history` | `432000` | History of health log events (in seconds) kept in the database. |
-| `default proxy enabled` | | Route metrics through a proxy. |
-| `default proxy destination` | | Space-separated list of `IP:PORT` for proxies. |
-| `default proxy api key` | | The `API_KEY` of the proxy. |
-| `default send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). |
-| `enable compression` | `yes` | Enable/disable stream compression. |
-| `enable replication` | `yes` | Enable/disable replication. |
-| `seconds to replicate` | `86400` | How many seconds of data to replicate from each child at a time |
-| `seconds per replication step` | `600` | The duration we want to replicate per each replication step. |
-| `is ephemeral node` | `no` | Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable after the specified duration of "cleanup ephemeral hosts after secs" from the time of the node's last connection. |
+### `[API_KEY]` sections
+
+This section defines an API key for other agents to connect to this Netdata.
+
+| Setting | Default | Description |
+|------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `enabled` | `no` | Whether this API key is enabled or disabled. |
+| `type` | `api` | This section defines an API key. |
+| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more →](#allow-from) |
+| `retention` | `1h` | The default amount of child metrics history to retain when using the `ram` db. |
+| [`db`](#default-memory-mode) | `dbengine` | The [database](/src/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, or `none`. [Read more →](#default-memory-mode) |
+| `health enabled by default` | `auto` | Whether alerts and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alerts when the child is connected. `yes` enables alerts always, and `no` disables alerts. |
+| `postpone alerts on connect` | `1m` | Postpone alerts and notifications for a period of time after the child connects. |
+| `health log retention` | `5d` | How long health log events are kept in the database. |
+| `proxy enabled` | | Route metrics through a proxy. |
+| `proxy destination` | | Space-separated list of `IP:PORT` for proxies. |
+| `proxy api key` | | The `API_KEY` of the proxy. |
+| `send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). |
+| `enable compression` | `yes` | Enable/disable stream compression. |
+| `enable replication` | `yes` | Enable/disable replication. |
+| `replication period` | `1d` | Limits the maximum window that will be replicated from each child. |
+| `replication step` | `10m` | The duration we want to replicate per each replication step. |
+| `is ephemeral node` | `no` | Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable after the specified duration of "cleanup ephemeral hosts after" from the time of the node's last connection. |
+
+
+### `[MACHINE_GUID]` sections
+
+This section is about customizing configuration for specific agents. It allows many agents to share the same API key, while providing customizability per remote agent.
+
+| Setting | Default | Description |
+|------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `enabled` | `no` | Whether this MACHINE_GUID is enabled or disabled. |
+| `type` | `machine` | This section defines the configuration for a specific agent. |
+| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more →](#allow-from) |
+| `retention` | `3600` | The default amount of child metrics history to retain when using the `ram` db. |
+| [`db`](#default-memory-mode) | `dbengine` | The [database](/src/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, or `none`. [Read more →](#default-memory-mode) |
+| `health enabled` | `auto` | Whether alerts and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alerts when the child is connected. `yes` enables alerts always, and `no` disables alerts. |
+| `postpone alerts on connect` | `1m` | Postpone alerts and notifications for a period of time after the child connects. |
+| `health log retention` | `5d` | How long health log events are kept in the database. |
+| `proxy enabled` | | Route metrics through a proxy. |
+| `proxy destination` | | Space-separated list of `IP:PORT` for proxies. |
+| `proxy api key` | | The `API_KEY` of the proxy. |
+| `send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). |
+| `enable compression` | `yes` | Enable/disable stream compression. |
+| `enable replication` | `yes` | Enable/disable replication. |
+| `replication period` | `1d` | Limits the maximum window that will be replicated from each child. |
+| `replication step` | `10m` | The duration we want to replicate per each replication step. |
+| `is ephemeral node` | `no` | Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable after the specified duration of "cleanup ephemeral hosts after" from the time of the node's last connection. |
#### `destination`
@@ -81,7 +111,7 @@ the following format: `[PROTOCOL:]HOST[%INTERFACE][:PORT][:SSL]`.
To enable TCP streaming to a parent node at `203.0.113.0` on port `20000` and with TLS/SSL encryption:
-```conf
+```text
[stream]
destination = tcp:203.0.113.0:20000:SSL
```
@@ -95,14 +125,14 @@ The default is a single wildcard `*`, which streams all charts.
To send only a few charts, list them explicitly, or list a group using a wildcard. To send _only_ the `apps.cpu` chart
and charts with contexts beginning with `system.`:
-```conf
+```text
[stream]
send charts matching = apps.cpu system.*
```
To send all but a few charts, use `!` to create a negative match. To send _all_ charts _but_ `apps.cpu`:
-```conf
+```text
[stream]
send charts matching = !apps.cpu *
```
@@ -116,14 +146,14 @@ The default is `*`, which accepts all requests including the `API_KEY`.
To allow from only a specific IP address:
-```conf
+```text
[API_KEY]
allow from = 203.0.113.10
```
To allow all IPs starting with `10.*`, except `10.1.2.3`:
-```conf
+```text
[API_KEY]
allow from = !10.1.2.3 10.*
```
@@ -131,7 +161,7 @@ To allow all IPs starting with `10.*`, except `10.1.2.3`:
> If you set specific IP addresses here, and also use the `allow connections` setting in the `[web]` section of
> `netdata.conf`, be sure to add the IP address there so that it can access the API port.
-#### `default memory mode`
+#### `db`
The [database](/src/database/README.md) to use for all nodes using this `API_KEY`.
Valid settings are `dbengine`, `ram`, or `none`.
@@ -142,19 +172,15 @@ Valid settings are `dbengine`, `ram`, , or `none`.
streaming configurations that use ephemeral nodes.
- `none`: No database.
-When using `default memory mode = dbengine`, the parent node creates a separate instance of the TSDB to store metrics
-from child nodes. The [size of _each_ instance is configurable](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md) with the `page
-cache size` and `dbengine multihost disk space` settings in the `[global]` section in `netdata.conf`.
-
### `netdata.conf`
-| Setting | Default | Description |
-|--------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `[global]` section | | |
-| `memory mode` | `dbengine` | Determines the [database type](/src/database/README.md) to be used on that node. Other options settings include `none`, and `ram`. `none` disables the database at this host. This also disables alerts and notifications, as those can't run without a database. |
-| `[web]` section | | |
-| `mode` | `static-threaded` | Determines the [web server](/src/web/server/README.md) type. The other option is `none`, which disables the dashboard, API, and registry. |
-| `accept a streaming request every seconds` | `0` | Set a limit on how often a parent node accepts streaming requests from child nodes. `0` equals no limit. If this is set, you may see `... too busy to accept new streaming request. Will be allowed in X secs` in Netdata's `error.log`. |
+| Setting | Default | Description |
+|------------------------------------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `[db]` section | | |
+| `mode` | `dbengine` | Determines the [database type](/src/database/README.md) to be used on that node. Other options settings include `none`, and `ram`. `none` disables the database at this host. This also disables alerts and notifications, as those can't run without a database. |
+| `[web]` section | | |
+| `mode` | `static-threaded` | Determines the [web server](/src/web/server/README.md) type. The other option is `none`, which disables the dashboard, API, and registry. |
+| `accept a streaming request every` | `off` | Set a limit on how often a parent node accepts streaming requests from child nodes. `0` equals no limit. If this is set, you may see `... too busy to accept new streaming request. Will be allowed in X secs` in Netdata's `error.log`. |
### Basic use cases
@@ -175,16 +201,16 @@ with the `[MACHINE_GUID]` section.
For example, the metrics streamed from only the child node with `MACHINE_GUID` are saved in memory, not using the
default `dbengine` as specified by the `API_KEY`, and alerts are disabled.
-```conf
+```text
[API_KEY]
enabled = yes
- default memory mode = dbengine
- health enabled by default = auto
+ db = dbengine
+ health enabled = auto
allow from = *
[MACHINE_GUID]
enabled = yes
- memory mode = ram
+ db = ram
health enabled = no
```
@@ -405,7 +431,7 @@ In the following example, the proxy receives metrics from a child node using the
`66666666-7777-8888-9999-000000000000`, then stores metrics using `dbengine`. It then uses the `API_KEY` of
`11111111-2222-3333-4444-555555555555` to proxy those same metrics on to a parent node at `203.0.113.0`.
-```conf
+```text
[stream]
enabled = yes
destination = 203.0.113.0
@@ -413,7 +439,7 @@ In the following example, the proxy receives metrics from a child node using the
[66666666-7777-8888-9999-000000000000]
enabled = yes
- default memory mode = dbengine
+ db = dbengine
```
### Ephemeral nodes
@@ -423,13 +449,13 @@ metrics to any number of permanently-running parent nodes.
On the parent, set the following in `stream.conf`:
-```conf
+```text
[11111111-2222-3333-4444-555555555555]
# enable/disable this API key
enabled = yes
# one hour of data for each of the child nodes
- default history = 3600
+ history = 1h
# do not save child metrics on disk
default memory = ram
@@ -455,9 +481,9 @@ On the child nodes, set the following in `stream.conf`:
In addition, edit `netdata.conf` on each child node to disable the database and alerts.
```bash
-[global]
+[db]
# disable the local database
- memory mode = none
+ db = none
[health]
# disable health checks
@@ -471,16 +497,16 @@ This replication process ensures data continuity even if child nodes temporarily
Replication is enabled by default in Netdata, but you can customize the replication behavior by modifying the `[API_KEY]` section of the `stream.conf` file. Here's an example configuration:
-```conf
+```text
[11111111-2222-3333-4444-555555555555]
# Enable replication for all hosts using this api key. Default: yes.
enable replication = yes
- # How many seconds of data to replicate from each child at a time. Default: a day (86400 seconds).
- seconds to replicate = 86400
+ # How many seconds of data to replicate from each child at a time. Default: a day.
+ replication period = 1d
- # The duration we want to replicate per each replication step. Default: 600 seconds (10 minutes).
- seconds per replication step = 600
+ # The duration we want to replicate per each replication step. Default: 10 minutes.
+ replication step = 10m
```
You can monitor the replication process in two ways:
@@ -597,9 +623,9 @@ ERROR : STREAM_SENDER[CHILD HOSTNAME] : STREAM child HOSTNAME [send to PARENT HO
### Stream charts wrong
Chart data needs to be consistent between child and parent nodes. If there are differences between chart data on
-a parent and a child, such as gaps in metrics collection, it most often means your child's `memory mode`
+a parent and a child, such as gaps in metrics collection, it most often means your child's `[db].db` setting
does not match the parent's. To learn more about the different ways Netdata can store metrics, and thus keep chart
-data consistent, read our [memory mode documentation](/src/database/README.md).
+data consistent, read our [db documentation](/src/database/README.md).
### Forbidding access
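Reviewer note: the README hunks above replace integer-second settings (`60`, `5`, `3600`, `432000`, `86400`, `600`) with duration strings (`1m`, `5s`, `1h`, `5d`, `1d`, `10m`). The sketch below shows how such a value could map back to seconds. It is a minimal illustration that assumes only the suffixes seen in this diff (`s`, `m`, `h`, `d`); the function name `sketch_duration_to_seconds` is hypothetical and this is not Netdata's actual parser.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper, not Netdata's parser: converts "90", "5s", "1m", "1h"
// or "1d" into seconds. A bare number is treated as seconds, which mirrors
// the old integer-only settings.
static bool sketch_duration_to_seconds(const char *text, long *seconds_out) {
    if (!text || !isdigit((unsigned char)*text))
        return false; // empty, negative, or non-numeric input

    char *end = NULL;
    long value = strtol(text, &end, 10);

    long multiplier;
    switch (*end) {
        case '\0': multiplier = 1; break;            // bare number: seconds
        case 's':  multiplier = 1; end++; break;
        case 'm':  multiplier = 60; end++; break;
        case 'h':  multiplier = 3600; end++; break;
        case 'd':  multiplier = 86400; end++; break;
        default:   return false;                     // unknown unit
    }

    if (*end != '\0')
        return false; // trailing characters such as "1mx"

    *seconds_out = value * multiplier;
    return true;
}
```

With this mapping, `5d` corresponds to the previous `432000` and `10m` to the previous `600`, so the new defaults in the tables are the same durations expressed differently. Overflow handling is left out of the sketch.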
diff --git a/src/streaming/common.h b/src/streaming/h2o-common.h
index b7292f4d0..b7292f4d0 100644
--- a/src/streaming/common.h
+++ b/src/streaming/h2o-common.h
diff --git a/src/streaming/protocol/command-begin-set-end.c b/src/streaming/protocol/command-begin-set-end.c
new file mode 100644
index 000000000..17daef776
--- /dev/null
+++ b/src/streaming/protocol/command-begin-set-end.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "commands.h"
+#include "plugins.d/pluginsd_internals.h"
+
+static void rrdpush_send_chart_metrics(BUFFER *wb, RRDSET *st, struct sender_state *s __maybe_unused, RRDSET_FLAGS flags) {
+ buffer_fast_strcat(wb, "BEGIN \"", 7);
+ buffer_fast_strcat(wb, rrdset_id(st), string_strlen(st->id));
+ buffer_fast_strcat(wb, "\" ", 2);
+
+ if(st->last_collected_time.tv_sec > st->rrdpush.sender.resync_time_s)
+ buffer_print_uint64(wb, st->usec_since_last_update);
+ else
+ buffer_fast_strcat(wb, "0", 1);
+
+ buffer_fast_strcat(wb, "\n", 1);
+
+ RRDDIM *rd;
+ rrddim_foreach_read(rd, st) {
+ if(unlikely(!rrddim_check_updated(rd)))
+ continue;
+
+ if(likely(rrddim_check_upstream_exposed_collector(rd))) {
+ buffer_fast_strcat(wb, "SET \"", 5);
+ buffer_fast_strcat(wb, rrddim_id(rd), string_strlen(rd->id));
+ buffer_fast_strcat(wb, "\" = ", 4);
+ buffer_print_int64(wb, rd->collector.collected_value);
+ buffer_fast_strcat(wb, "\n", 1);
+ }
+ else {
+ internal_error(true, "STREAM: 'host:%s/chart:%s/dim:%s' flag 'exposed' is updated but not exposed",
+ rrdhost_hostname(st->rrdhost), rrdset_id(st), rrddim_id(rd));
+ // we will include it in the next iteration
+ rrddim_metadata_updated(rd);
+ }
+ }
+ rrddim_foreach_done(rd);
+
+ if(unlikely(flags & RRDSET_FLAG_UPSTREAM_SEND_VARIABLES))
+ rrdvar_print_to_streaming_custom_chart_variables(st, wb);
+
+ buffer_fast_strcat(wb, "END\n", 4);
+}
+
+void rrdset_push_metrics_v1(RRDSET_STREAM_BUFFER *rsb, RRDSET *st) {
+ RRDHOST *host = st->rrdhost;
+ rrdpush_send_chart_metrics(rsb->wb, st, host->sender, rsb->rrdset_flags);
+}
+
+void rrddim_push_metrics_v2(RRDSET_STREAM_BUFFER *rsb, RRDDIM *rd, usec_t point_end_time_ut, NETDATA_DOUBLE n, SN_FLAGS flags) {
+ if(!rsb->wb || !rsb->v2 || !netdata_double_isnumber(n) || !does_storage_number_exist(flags))
+ return;
+
+ bool with_slots = stream_has_capability(rsb, STREAM_CAP_SLOTS) ? true : false;
+ NUMBER_ENCODING integer_encoding = stream_has_capability(rsb, STREAM_CAP_IEEE754) ? NUMBER_ENCODING_BASE64 : NUMBER_ENCODING_HEX;
+ NUMBER_ENCODING doubles_encoding = stream_has_capability(rsb, STREAM_CAP_IEEE754) ? NUMBER_ENCODING_BASE64 : NUMBER_ENCODING_DECIMAL;
+ BUFFER *wb = rsb->wb;
+ time_t point_end_time_s = (time_t)(point_end_time_ut / USEC_PER_SEC);
+ if(unlikely(rsb->last_point_end_time_s != point_end_time_s)) {
+
+ if(unlikely(rsb->begin_v2_added))
+ buffer_fast_strcat(wb, PLUGINSD_KEYWORD_END_V2 "\n", sizeof(PLUGINSD_KEYWORD_END_V2) - 1 + 1);
+
+ buffer_fast_strcat(wb, PLUGINSD_KEYWORD_BEGIN_V2, sizeof(PLUGINSD_KEYWORD_BEGIN_V2) - 1);
+
+ if(with_slots) {
+ buffer_fast_strcat(wb, " "PLUGINSD_KEYWORD_SLOT":", sizeof(PLUGINSD_KEYWORD_SLOT) - 1 + 2);
+ buffer_print_uint64_encoded(wb, integer_encoding, rd->rrdset->rrdpush.sender.chart_slot);
+ }
+
+ buffer_fast_strcat(wb, " '", 2);
+ buffer_fast_strcat(wb, rrdset_id(rd->rrdset), string_strlen(rd->rrdset->id));
+ buffer_fast_strcat(wb, "' ", 2);
+ buffer_print_uint64_encoded(wb, integer_encoding, rd->rrdset->update_every);
+ buffer_fast_strcat(wb, " ", 1);
+ buffer_print_uint64_encoded(wb, integer_encoding, point_end_time_s);
+ buffer_fast_strcat(wb, " ", 1);
+ if(point_end_time_s == rsb->wall_clock_time)
+ buffer_fast_strcat(wb, "#", 1);
+ else
+ buffer_print_uint64_encoded(wb, integer_encoding, rsb->wall_clock_time);
+ buffer_fast_strcat(wb, "\n", 1);
+
+ rsb->last_point_end_time_s = point_end_time_s;
+ rsb->begin_v2_added = true;
+ }
+
+ buffer_fast_strcat(wb, PLUGINSD_KEYWORD_SET_V2, sizeof(PLUGINSD_KEYWORD_SET_V2) - 1);
+
+ if(with_slots) {
+ buffer_fast_strcat(wb, " "PLUGINSD_KEYWORD_SLOT":", sizeof(PLUGINSD_KEYWORD_SLOT) - 1 + 2);
+ buffer_print_uint64_encoded(wb, integer_encoding, rd->rrdpush.sender.dim_slot);
+ }
+
+ buffer_fast_strcat(wb, " '", 2);
+ buffer_fast_strcat(wb, rrddim_id(rd), string_strlen(rd->id));
+ buffer_fast_strcat(wb, "' ", 2);
+ buffer_print_int64_encoded(wb, integer_encoding, rd->collector.last_collected_value);
+ buffer_fast_strcat(wb, " ", 1);
+
+ if((NETDATA_DOUBLE)rd->collector.last_collected_value == n)
+ buffer_fast_strcat(wb, "#", 1);
+ else
+ buffer_print_netdata_double_encoded(wb, doubles_encoding, n);
+
+ buffer_fast_strcat(wb, " ", 1);
+ buffer_print_sn_flags(wb, flags, true);
+ buffer_fast_strcat(wb, "\n", 1);
+}
+
+void rrdset_push_metrics_finished(RRDSET_STREAM_BUFFER *rsb, RRDSET *st) {
+ if(!rsb->wb)
+ return;
+
+ if(rsb->v2 && rsb->begin_v2_added) {
+ if(unlikely(rsb->rrdset_flags & RRDSET_FLAG_UPSTREAM_SEND_VARIABLES))
+ rrdvar_print_to_streaming_custom_chart_variables(st, rsb->wb);
+
+ buffer_fast_strcat(rsb->wb, PLUGINSD_KEYWORD_END_V2 "\n", sizeof(PLUGINSD_KEYWORD_END_V2) - 1 + 1);
+ }
+
+ sender_commit(st->rrdhost->sender, rsb->wb, STREAM_TRAFFIC_TYPE_DATA);
+
+ *rsb = (RRDSET_STREAM_BUFFER){ .wb = NULL, };
+}
+
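Reviewer note: `rrdpush_send_chart_metrics()` in the new file above emits the v1 text frame: a `BEGIN` line with the chart id and the microseconds since the last update, one `SET` line per updated dimension, then `END`. The sketch below reproduces that wire format with plain `snprintf` so the output is easy to inspect. `struct sketch_dim` and `sketch_format_v1_frame` are hypothetical stand-ins for `RRDDIM` and the `BUFFER` API, and the custom chart variables and the "exposed upstream" check are omitted.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical, simplified view of a dimension.
struct sketch_dim {
    const char *id;
    long long collected_value;
    int updated; // non-zero when the dimension was collected in this iteration
};

// Writes one v1 frame into out. Returns the number of bytes written,
// or -1 if out is too small. The caller passes 0 for
// usec_since_last_update while the chart is still in its resync window.
static int sketch_format_v1_frame(char *out, size_t size, const char *chart_id,
                                  unsigned long long usec_since_last_update,
                                  const struct sketch_dim *dims, size_t ndims) {
    size_t used = 0;

    int n = snprintf(out, size, "BEGIN \"%s\" %llu\n", chart_id, usec_since_last_update);
    if (n < 0 || (size_t)n >= size)
        return -1;
    used += (size_t)n;

    for (size_t i = 0; i < ndims; i++) {
        if (!dims[i].updated)
            continue; // dimensions not updated in this iteration are skipped

        n = snprintf(out + used, size - used, "SET \"%s\" = %lld\n",
                     dims[i].id, dims[i].collected_value);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, size - used, "END\n");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    used += (size_t)n;

    return (int)used;
}
```

For a chart `system.cpu` with updated dimensions `user = 42` and `system = -3`, the frame is three lines between `BEGIN "system.cpu" 1000000` and `END`. The v2 path in `rrddim_push_metrics_v2()` differs: it adds optional slot numbers, encoded integers, and a `#` shorthand for repeated values.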
diff --git a/src/streaming/protocol/command-chart-definition.c b/src/streaming/protocol/command-chart-definition.c
new file mode 100644
index 000000000..864d13242
--- /dev/null
+++ b/src/streaming/protocol/command-chart-definition.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "commands.h"
+#include "plugins.d/pluginsd_internals.h"
+
+// chart labels
+static int send_clabels_callback(const char *name, const char *value, RRDLABEL_SRC ls, void *data) {
+ BUFFER *wb = (BUFFER *)data;
+ buffer_sprintf(wb, PLUGINSD_KEYWORD_CLABEL " \"%s\" \"%s\" %d\n", name, value, ls & ~(RRDLABEL_FLAG_INTERNAL));
+ return 1;
+}
+
+static void rrdpush_send_clabels(BUFFER *wb, RRDSET *st) {
+ if (st->rrdlabels) {
+ if(rrdlabels_walkthrough_read(st->rrdlabels, send_clabels_callback, wb) > 0)
+ buffer_sprintf(wb, PLUGINSD_KEYWORD_CLABEL_COMMIT "\n");
+ }
+}
+
+// Send the current chart definition.
+// Assumes that collector thread has already called sender_start for mutex / buffer state.
+bool rrdpush_send_chart_definition(BUFFER *wb, RRDSET *st) {
+ uint32_t version = rrdset_metadata_version(st);
+
+ RRDHOST *host = st->rrdhost;
+ NUMBER_ENCODING integer_encoding = stream_has_capability(host->sender, STREAM_CAP_IEEE754) ? NUMBER_ENCODING_BASE64 : NUMBER_ENCODING_HEX;
+ bool with_slots = stream_has_capability(host->sender, STREAM_CAP_SLOTS) ? true : false;
+
+ bool replication_progress = false;
+
+ // properly set the name for the remote end to parse it
+ char *name = "";
+ if(likely(st->name)) {
+ if(unlikely(st->id != st->name)) {
+ // they differ
+ name = strchr(rrdset_name(st), '.');
+ if(name)
+ name++;
+ else
+ name = "";
+ }
+ }
+
+ buffer_fast_strcat(wb, PLUGINSD_KEYWORD_CHART, sizeof(PLUGINSD_KEYWORD_CHART) - 1);
+
+ if(with_slots) {
+ buffer_fast_strcat(wb, " "PLUGINSD_KEYWORD_SLOT":", sizeof(PLUGINSD_KEYWORD_SLOT) - 1 + 2);
+ buffer_print_uint64_encoded(wb, integer_encoding, st->rrdpush.sender.chart_slot);
+ }
+
+ // send the chart
+ buffer_sprintf(
+ wb
+ , " \"%s\" \"%s\" \"%s\" \"%s\" \"%s\" \"%s\" \"%s\" %d %d \"%s %s %s\" \"%s\" \"%s\"\n"
+ , rrdset_id(st)
+ , name
+ , rrdset_title(st)
+ , rrdset_units(st)
+ , rrdset_family(st)
+ , rrdset_context(st)
+ , rrdset_type_name(st->chart_type)
+ , st->priority
+ , st->update_every
+ , rrdset_flag_check(st, RRDSET_FLAG_OBSOLETE)?"obsolete":""
+ , rrdset_flag_check(st, RRDSET_FLAG_STORE_FIRST)?"store_first":""
+ , rrdset_flag_check(st, RRDSET_FLAG_HIDDEN)?"hidden":""
+ , rrdset_plugin_name(st)
+ , rrdset_module_name(st)
+ );
+
+ // send the chart labels
+ if (stream_has_capability(host->sender, STREAM_CAP_CLABELS))
+ rrdpush_send_clabels(wb, st);
+
+ // send the dimensions
+ RRDDIM *rd;
+ rrddim_foreach_read(rd, st) {
+ buffer_fast_strcat(wb, PLUGINSD_KEYWORD_DIMENSION, sizeof(PLUGINSD_KEYWORD_DIMENSION) - 1);
+
+ if(with_slots) {
+ buffer_fast_strcat(wb, " "PLUGINSD_KEYWORD_SLOT":", sizeof(PLUGINSD_KEYWORD_SLOT) - 1 + 2);
+ buffer_print_uint64_encoded(wb, integer_encoding, rd->rrdpush.sender.dim_slot);
+ }
+
+ buffer_sprintf(
+ wb
+ , " \"%s\" \"%s\" \"%s\" %d %d \"%s %s %s\"\n"
+ , rrddim_id(rd)
+ , rrddim_name(rd)
+ , rrd_algorithm_name(rd->algorithm)
+ , rd->multiplier
+ , rd->divisor
+ , rrddim_flag_check(rd, RRDDIM_FLAG_OBSOLETE)?"obsolete":""
+ , rrddim_option_check(rd, RRDDIM_OPTION_HIDDEN)?"hidden":""
+ , rrddim_option_check(rd, RRDDIM_OPTION_DONT_DETECT_RESETS_OR_OVERFLOWS)?"noreset":""
+ );
+ }
+ rrddim_foreach_done(rd);
+
+ // send the chart functions
+ if(stream_has_capability(host->sender, STREAM_CAP_FUNCTIONS))
+ rrd_chart_functions_expose_rrdpush(st, wb);
+
+ // send the chart local custom variables
+ rrdvar_print_to_streaming_custom_chart_variables(st, wb);
+
+ if (stream_has_capability(host->sender, STREAM_CAP_REPLICATION)) {
+ time_t db_first_time_t, db_last_time_t;
+
+ time_t now = now_realtime_sec();
+ rrdset_get_retention_of_tier_for_collected_chart(st, &db_first_time_t, &db_last_time_t, now, 0);
+
+ buffer_sprintf(wb, PLUGINSD_KEYWORD_CHART_DEFINITION_END " %llu %llu %llu\n",
+ (unsigned long long)db_first_time_t,
+ (unsigned long long)db_last_time_t,
+ (unsigned long long)now);
+
+ if(!rrdset_flag_check(st, RRDSET_FLAG_SENDER_REPLICATION_IN_PROGRESS)) {
+ rrdset_flag_set(st, RRDSET_FLAG_SENDER_REPLICATION_IN_PROGRESS);
+ rrdset_flag_clear(st, RRDSET_FLAG_SENDER_REPLICATION_FINISHED);
+ rrdhost_sender_replicating_charts_plus_one(st->rrdhost);
+ }
+ replication_progress = true;
+
+#ifdef NETDATA_LOG_REPLICATION_REQUESTS
+ internal_error(true, "REPLAY: 'host:%s/chart:%s' replication starts",
+ rrdhost_hostname(st->rrdhost), rrdset_id(st));
+#endif
+ }
+
+ sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
+
+ // we can set the exposed flag, after we commit the buffer
+ // because replication may pick it up prematurely
+ rrddim_foreach_read(rd, st) {
+ rrddim_metadata_exposed_upstream(rd, version);
+ }
+ rrddim_foreach_done(rd);
+ rrdset_metadata_exposed_upstream(st, version);
+
+ st->rrdpush.sender.resync_time_s = st->last_collected_time.tv_sec + (stream_conf_initial_clock_resync_iterations * st->update_every);
+ return replication_progress;
+}
+
+bool should_send_chart_matching(RRDSET *st, RRDSET_FLAGS flags) {
+ if(!(flags & RRDSET_FLAG_RECEIVER_REPLICATION_FINISHED))
+ return false;
+
+ if(unlikely(!(flags & (RRDSET_FLAG_UPSTREAM_SEND | RRDSET_FLAG_UPSTREAM_IGNORE)))) {
+ RRDHOST *host = st->rrdhost;
+
+ if (flags & RRDSET_FLAG_ANOMALY_DETECTION) {
+ if(ml_streaming_enabled())
+ rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_SEND);
+ else
+ rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
+ }
+ else {
+ int negative = 0, positive = 0;
+ SIMPLE_PATTERN_RESULT r;
+
+ r = simple_pattern_matches_string_extract(host->rrdpush.send.charts_matching, st->context, NULL, 0);
+ if(r == SP_MATCHED_POSITIVE) positive++;
+ else if(r == SP_MATCHED_NEGATIVE) negative++;
+
+ if(!negative) {
+ r = simple_pattern_matches_string_extract(host->rrdpush.send.charts_matching, st->name, NULL, 0);
+ if (r == SP_MATCHED_POSITIVE) positive++;
+ else if (r == SP_MATCHED_NEGATIVE) negative++;
+ }
+
+ if(!negative) {
+ r = simple_pattern_matches_string_extract(host->rrdpush.send.charts_matching, st->id, NULL, 0);
+ if (r == SP_MATCHED_POSITIVE) positive++;
+ else if (r == SP_MATCHED_NEGATIVE) negative++;
+ }
+
+ if(!negative && positive)
+ rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_SEND);
+ else
+ rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
+ }
+
+ // get the flags again, to know how to respond
+ flags = rrdset_flag_check(st, RRDSET_FLAG_UPSTREAM_SEND|RRDSET_FLAG_UPSTREAM_IGNORE);
+ }
+
+ return flags & RRDSET_FLAG_UPSTREAM_SEND;
+}
+
+// Sends the chart definition to the parent immediately; returns false when the host cannot send definitions or the chart is filtered out.
+bool rrdset_push_chart_definition_now(RRDSET *st) {
+ RRDHOST *host = st->rrdhost;
+
+ if(unlikely(!rrdhost_can_send_definitions_to_parent(host)
+ || !should_send_chart_matching(st, rrdset_flag_get(st)))) {
+ return false;
+ }
+
+ BUFFER *wb = sender_start(host->sender);
+ rrdpush_send_chart_definition(wb, st);
+ sender_thread_buffer_free();
+
+ return true;
+}
+
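Note on should_send_chart_matching(): the filter checks the chart's context, name and id in that order against the same pattern list. A negative match on any of them vetoes the chart, and at least one positive match is required to send it. The sketch below restates that rule in isolation. The matcher is a toy assumption (exact string comparison, a leading '!' marks a negative pattern, first matching pattern wins) and is not netdata's SIMPLE_PATTERN implementation; all names prefixed with toy_ are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for SIMPLE_PATTERN_RESULT; not the netdata type.
typedef enum { TOY_MATCH_NONE, TOY_MATCH_POSITIVE, TOY_MATCH_NEGATIVE } toy_match_result;

// Toy matcher (assumption): patterns are exact strings, a leading '!'
// marks a negative pattern, and the first pattern that matches decides.
static toy_match_result toy_match(const char *const *patterns, size_t n, const char *s) {
    for (size_t i = 0; i < n; i++) {
        bool negative = patterns[i][0] == '!';
        const char *p = negative ? patterns[i] + 1 : patterns[i];
        if (strcmp(p, s) == 0)
            return negative ? TOY_MATCH_NEGATIVE : TOY_MATCH_POSITIVE;
    }
    return TOY_MATCH_NONE;
}

// Same decision shape as the diff: test context, name and id in order,
// stop at the first negative match, and send only when there was no
// negative match and at least one positive match.
static bool toy_should_send(const char *const *patterns, size_t n,
                            const char *context, const char *name, const char *id) {
    const char *candidates[] = { context, name, id };
    bool positive = false;

    for (size_t i = 0; i < 3; i++) {
        toy_match_result r = toy_match(patterns, n, candidates[i]);
        if (r == TOY_MATCH_NEGATIVE)
            return false;
        if (r == TOY_MATCH_POSITIVE)
            positive = true;
    }

    return positive;
}
```

With the pattern list { "!system.cpu", "cpu", "system.load" }, a chart whose context is "cpu" but whose name is "system.cpu" is ignored, because the negative match on the name overrides the positive match on the context.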
diff --git a/src/streaming/protocol/command-claimed_id.c b/src/streaming/protocol/command-claimed_id.c
new file mode 100644
index 000000000..5392e1d3b
--- /dev/null
+++ b/src/streaming/protocol/command-claimed_id.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "commands.h"
+#include "plugins.d/pluginsd_internals.h"
+
+PARSER_RC rrdpush_receiver_pluginsd_claimed_id(char **words, size_t num_words, PARSER *parser) {
+ const char *machine_guid_str = get_word(words, num_words, 1);
+ const char *claim_id_str = get_word(words, num_words, 2);
+
+ if (!machine_guid_str || !claim_id_str) {
+ netdata_log_error("PLUGINSD: command CLAIMED_ID came malformed, machine_guid '%s', claim_id '%s'",
+ machine_guid_str ? machine_guid_str : "[unset]",
+ claim_id_str ? claim_id_str : "[unset]");
+ return PARSER_RC_ERROR;
+ }
+
+ RRDHOST *host = parser->user.host;
+
+ nd_uuid_t machine_uuid;
+ if(uuid_parse(machine_guid_str, machine_uuid)) {
+ netdata_log_error("PLUGINSD: parameter machine guid to CLAIMED_ID command is not valid UUID. "
+ "Received: '%s'.", machine_guid_str);
+ return PARSER_RC_ERROR;
+ }
+
+ nd_uuid_t claim_uuid;
+ if(strcmp(claim_id_str, "NULL") == 0)
+ uuid_clear(claim_uuid);
+
+ else if(uuid_parse(claim_id_str, claim_uuid) != 0) {
+ netdata_log_error("PLUGINSD: parameter claim id to CLAIMED_ID command is not valid UUID. "
+ "Received: '%s'.", claim_id_str);
+ return PARSER_RC_ERROR;
+ }
+
+ if(strcmp(machine_guid_str, host->machine_guid) != 0) {
+ netdata_log_error("PLUGINSD: received claim id for host '%s' but it came over the connection of '%s'",
+ machine_guid_str, host->machine_guid);
+ return PARSER_RC_OK; // the message is OK; the problem must be somewhere else
+ }
+
+ if(host == localhost) {
+ netdata_log_error("PLUGINSD: CLAIMED_ID command cannot be used to set the claimed id of localhost. "
+ "Received: '%s'.", claim_id_str);
+ return PARSER_RC_OK;
+ }
+
+ if(!uuid_is_null(claim_uuid)) {
+ uuid_copy(host->aclk.claim_id_of_origin.uuid, claim_uuid);
+ rrdpush_sender_send_claimed_id(host);
+ }
+
+ return PARSER_RC_OK;
+}
+
+void rrdpush_sender_send_claimed_id(RRDHOST *host) {
+ if(!stream_has_capability(host->sender, STREAM_CAP_CLAIM))
+ return;
+
+ if(unlikely(!rrdhost_can_send_definitions_to_parent(host)))
+ return;
+
+ BUFFER *wb = sender_start(host->sender);
+
+ char str[UUID_STR_LEN] = "";
+ ND_UUID uuid = host->aclk.claim_id_of_origin;
+ if(!UUIDiszero(uuid))
+ uuid_unparse_lower(uuid.uuid, str);
+ else
+ strncpyz(str, "NULL", sizeof(str) - 1);
+
+ buffer_sprintf(wb, PLUGINSD_KEYWORD_CLAIMED_ID " '%s' '%s'\n",
+ host->machine_guid, str);
+
+ sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
+
+ sender_thread_buffer_free();
+}
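Note on the CLAIMED_ID line: rrdpush_sender_send_claimed_id() writes the keyword followed by the machine GUID and the claim id, each in single quotes, and sends the literal NULL when the claim id is zero. A minimal sketch of that formatting follows. The keyword text "CLAIMED_ID" is inferred from the receiver's log messages, since the definition of PLUGINSD_KEYWORD_CLAIMED_ID is not part of this diff, and toy_format_claimed_id is a hypothetical helper, not a netdata function.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: builds "CLAIMED_ID '<machine guid>' '<claim id>'\n".
// A missing or empty claim id is sent as the literal string NULL,
// which the receiver in the diff treats as "no claim id".
static int toy_format_claimed_id(char *dst, size_t size,
                                 const char *machine_guid, const char *claim_id) {
    const char *claim = (claim_id && *claim_id) ? claim_id : "NULL";
    return snprintf(dst, size, "CLAIMED_ID '%s' '%s'\n", machine_guid, claim);
}
```

On the receiving side, rrdpush_receiver_pluginsd_claimed_id() reads the machine GUID as word 1 and the claim id as word 2, and only stores and forwards the claim id when it is not null.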
diff --git a/src/streaming/protocol/command-function.c b/src/streaming/protocol/command-function.c
new file mode 100644
index 000000000..d9b28eb4e
--- /dev/null
+++ b/src/streaming/protocol/command-function.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "commands.h"
+#include "plugins.d/pluginsd_internals.h"
+
+void rrdpush_send_global_functions(RRDHOST *host) {
+ if(!stream_has_capability(host->sender, STREAM_CAP_FUNCTIONS))
+ return;
+
+ if(unlikely(!rrdhost_can_send_definitions_to_parent(host)))
+ return;
+
+ BUFFER *wb = sender_start(host->sender);
+
+ rrd_global_functions_expose_rrdpush(host, wb, stream_has_capability(host->sender, STREAM_CAP_DYNCFG));
+
+ sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_FUNCTIONS);
+
+ sender_thread_buffer_free();
+}
diff --git a/src/streaming/protocol/command-host-labels.c b/src/streaming/protocol/command-host-labels.c
new file mode 100644
index 000000000..7c2a2d0dd
--- /dev/null
+++ b/src/streaming/protocol/command-host-labels.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "commands.h"
+#include "plugins.d/pluginsd_internals.h"
+
+static int send_labels_callback(const char *name, const char *value, RRDLABEL_SRC ls, void *data) {
+ BUFFER *wb = (BUFFER *)data;
+ buffer_sprintf(wb, "LABEL \"%s\" = %d \"%s\"\n", name, ls, value);
+ return 1;
+}
+
+void rrdpush_send_host_labels(RRDHOST *host) {
+ if(unlikely(!rrdhost_can_send_definitions_to_parent(host)
+ || !stream_has_capability(host->sender, STREAM_CAP_HLABELS)))
+ return;
+
+ BUFFER *wb = sender_start(host->sender);
+
+ rrdlabels_walkthrough_read(host->rrdlabels, send_labels_callback, wb);
+ buffer_sprintf(wb, "OVERWRITE %s\n", "labels");
+
+ sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
+
+ sender_thread_buffer_free();
+}
diff --git a/src/streaming/protocol/command-host-variables.c b/src/streaming/protocol/command-host-variables.c
new file mode 100644
index 000000000..83e4990d6
--- /dev/null
+++ b/src/streaming/protocol/command-host-variables.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "commands.h"
+#include "plugins.d/pluginsd_internals.h"
+
+static inline void rrdpush_sender_add_host_variable_to_buffer(BUFFER *wb, const RRDVAR_ACQUIRED *rva) {
+ buffer_sprintf(
+ wb
+ , "VARIABLE HOST %s = " NETDATA_DOUBLE_FORMAT "\n"
+ , rrdvar_name(rva)
+ , rrdvar2number(rva)
+ );
+
+ netdata_log_debug(D_STREAM, "RRDVAR pushed HOST VARIABLE %s = " NETDATA_DOUBLE_FORMAT, rrdvar_name(rva), rrdvar2number(rva));
+}
+
+void rrdpush_sender_send_this_host_variable_now(RRDHOST *host, const RRDVAR_ACQUIRED *rva) {
+ if(rrdhost_can_send_definitions_to_parent(host)) {
+ BUFFER *wb = sender_start(host->sender);
+ rrdpush_sender_add_host_variable_to_buffer(wb, rva);
+ sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
+ sender_thread_buffer_free();
+ }
+}
+
+struct custom_host_variables_callback {
+ BUFFER *wb;
+};
+
+static int rrdpush_sender_thread_custom_host_variables_callback(const DICTIONARY_ITEM *item __maybe_unused, void *rrdvar_ptr __maybe_unused, void *struct_ptr) {
+ const RRDVAR_ACQUIRED *rv = (const RRDVAR_ACQUIRED *)item;
+ struct custom_host_variables_callback *tmp = struct_ptr;
+ BUFFER *wb = tmp->wb;
+
+ rrdpush_sender_add_host_variable_to_buffer(wb, rv);
+ return 1;
+}
+
+void rrdpush_sender_thread_send_custom_host_variables(RRDHOST *host) {
+ if(rrdhost_can_send_definitions_to_parent(host)) {
+ BUFFER *wb = sender_start(host->sender);
+ struct custom_host_variables_callback tmp = {
+ .wb = wb
+ };
+ int ret = rrdvar_walkthrough_read(host->rrdvars, rrdpush_sender_thread_custom_host_variables_callback, &tmp);
+ (void)ret;
+ sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
+ sender_thread_buffer_free();
+
+ netdata_log_debug(D_STREAM, "RRDVAR sent %d VARIABLES", ret);
+ }
+}
diff --git a/src/streaming/protocol/command-nodeid.c b/src/streaming/protocol/command-nodeid.c
new file mode 100644
index 000000000..85ace83c8
--- /dev/null
+++ b/src/streaming/protocol/command-nodeid.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "commands.h"
+#include "plugins.d/pluginsd_internals.h"
+
+// the child disconnected from the parent, and it has to clear the parent's claim id
+void rrdpush_sender_clear_parent_claim_id(RRDHOST *host) {
+ host->aclk.claim_id_of_parent = UUID_ZERO;
+}
+
+// the parent sends to the child its claim id, node id and cloud url
+void rrdpush_receiver_send_node_and_claim_id_to_child(RRDHOST *host) {
+ if(host == localhost || UUIDiszero(host->node_id)) return;
+
+ spinlock_lock(&host->receiver_lock);
+ if(host->receiver && stream_has_capability(host->receiver, STREAM_CAP_NODE_ID)) {
+ char node_id_str[UUID_STR_LEN] = "";
+ uuid_unparse_lower(host->node_id.uuid, node_id_str);
+
+ CLAIM_ID claim_id = claim_id_get();
+
+ if((!claim_id_is_set(claim_id) || !aclk_online())) {
+ // the agent is not claimed or not connected, just use parent claim id
+ // to allow the connection flow.
+ // this may be zero and it is ok.
+ claim_id.uuid = host->aclk.claim_id_of_parent;
+ uuid_unparse_lower(claim_id.uuid.uuid, claim_id.str);
+ }
+
+ char buf[4096];
+ snprintfz(buf, sizeof(buf),
+ PLUGINSD_KEYWORD_NODE_ID " '%s' '%s' '%s'\n",
+ claim_id.str, node_id_str, cloud_config_url_get());
+
+ send_to_plugin(buf, __atomic_load_n(&host->receiver->parser, __ATOMIC_RELAXED));
+ }
+ spinlock_unlock(&host->receiver_lock);
+}
+
+// the sender of the child receives node id, claim id and cloud url from the receiver of the parent
+void rrdpush_sender_get_node_and_claim_id_from_parent(struct sender_state *s) {
+ char *claim_id_str = get_word(s->line.words, s->line.num_words, 1);
+ char *node_id_str = get_word(s->line.words, s->line.num_words, 2);
+ char *url = get_word(s->line.words, s->line.num_words, 3);
+
+ bool claimed = is_agent_claimed();
+ bool update_node_id = false;
+
+ ND_UUID claim_id;
+ if (uuid_parse(claim_id_str ? claim_id_str : "", claim_id.uuid) != 0) {
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM %s [send to %s] received invalid claim id '%s'",
+ rrdhost_hostname(s->host), s->connected_to,
+ claim_id_str ? claim_id_str : "(unset)");
+ return;
+ }
+
+ ND_UUID node_id;
+ if(uuid_parse(node_id_str ? node_id_str : "", node_id.uuid) != 0) {
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM %s [send to %s] received an invalid node id '%s'",
+ rrdhost_hostname(s->host), s->connected_to,
+ node_id_str ? node_id_str : "(unset)");
+ return;
+ }
+
+ if (!UUIDiszero(s->host->aclk.claim_id_of_parent) && !UUIDeq(s->host->aclk.claim_id_of_parent, claim_id))
+ nd_log(NDLS_DAEMON, NDLP_INFO,
+ "STREAM %s [send to %s] changed parent's claim id to %s",
+ rrdhost_hostname(s->host), s->connected_to,
+ claim_id_str ? claim_id_str : "(unset)");
+
+ if(!UUIDiszero(s->host->node_id) && !UUIDeq(s->host->node_id, node_id)) {
+ if(claimed) {
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM %s [send to %s] parent reports different node id '%s', but we are claimed. Ignoring it.",
+ rrdhost_hostname(s->host), s->connected_to,
+ node_id_str ? node_id_str : "(unset)");
+ return;
+ }
+ else {
+ update_node_id = true;
+ nd_log(NDLS_DAEMON, NDLP_WARNING,
+ "STREAM %s [send to %s] changed node id to %s",
+ rrdhost_hostname(s->host), s->connected_to,
+ node_id_str ? node_id_str : "(unset)");
+ }
+ }
+
+ if(!url || !*url) {
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM %s [send to %s] received an invalid cloud URL '%s'",
+ rrdhost_hostname(s->host), s->connected_to,
+ url ? url : "(unset)");
+ return;
+ }
+
+ s->host->aclk.claim_id_of_parent = claim_id;
+
+ // There are some very strange corner cases here:
+ //
+ // - Agent is claimed but offline, and it receives node_id and cloud_url from a different Netdata Cloud.
+ // - Agent is configured to talk to an on-prem Netdata Cloud, it is offline, but the parent is connected
+ // to a different Netdata Cloud.
+ //
+ // The solution below tries to get the agent online using the latest information.
+ // So, if the agent is not claimed or not connected, we inherit whatever information the parent sent,
+ // to allow the user to work with it.
+
+ if(claimed && aclk_online())
+ // we are directly claimed and connected, ignore node id and cloud url
+ return;
+
+ bool node_id_updated = false;
+ if(UUIDiszero(s->host->node_id) || update_node_id) {
+ s->host->node_id = node_id;
+ node_id_updated = true;
+ }
+
+ // we change the URL, to allow the agent dashboard to work with Netdata Cloud on-prem, if any.
+ cloud_config_url_set(url);
+
+ // send it down the line (to children)
+ rrdpush_receiver_send_node_and_claim_id_to_child(s->host);
+
+ if(node_id_updated)
+ stream_path_node_id_updated(s->host);
+}
diff --git a/src/streaming/protocol/commands.c b/src/streaming/protocol/commands.c
new file mode 100644
index 000000000..e9e16bdac
--- /dev/null
+++ b/src/streaming/protocol/commands.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "commands.h"
+
+RRDSET_STREAM_BUFFER rrdset_push_metric_initialize(RRDSET *st, time_t wall_clock_time) {
+ RRDHOST *host = st->rrdhost;
+
+ // fetch the flags we need to check with one atomic operation
+ RRDHOST_FLAGS host_flags = __atomic_load_n(&host->flags, __ATOMIC_SEQ_CST);
+
+ // check if we are not connected
+ if(unlikely(!(host_flags & RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS))) {
+
+ if(unlikely(!(host_flags & (RRDHOST_FLAG_RRDPUSH_SENDER_SPAWN | RRDHOST_FLAG_RRDPUSH_RECEIVER_DISCONNECTED))))
+ rrdpush_sender_thread_spawn(host);
+
+ if(unlikely(!(host_flags & RRDHOST_FLAG_RRDPUSH_SENDER_LOGGED_STATUS))) {
+ rrdhost_flag_set(host, RRDHOST_FLAG_RRDPUSH_SENDER_LOGGED_STATUS);
+ nd_log_daemon(NDLP_NOTICE, "STREAM %s [send]: not ready - collected metrics are not sent to parent.", rrdhost_hostname(host));
+ }
+
+ return (RRDSET_STREAM_BUFFER) { .wb = NULL, };
+ }
+ else if(unlikely(host_flags & RRDHOST_FLAG_RRDPUSH_SENDER_LOGGED_STATUS)) {
+ nd_log_daemon(NDLP_INFO, "STREAM %s [send]: sending metrics to parent...", rrdhost_hostname(host));
+ rrdhost_flag_clear(host, RRDHOST_FLAG_RRDPUSH_SENDER_LOGGED_STATUS);
+ }
+
+ if(unlikely(host_flags & RRDHOST_FLAG_GLOBAL_FUNCTIONS_UPDATED)) {
+ BUFFER *wb = sender_start(host->sender);
+ rrd_global_functions_expose_rrdpush(host, wb, stream_has_capability(host->sender, STREAM_CAP_DYNCFG));
+ sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_FUNCTIONS);
+ }
+
+ bool exposed_upstream = rrdset_check_upstream_exposed(st);
+ RRDSET_FLAGS rrdset_flags = rrdset_flag_get(st);
+ bool replication_in_progress = !(rrdset_flags & RRDSET_FLAG_SENDER_REPLICATION_FINISHED);
+
+ if(unlikely((exposed_upstream && replication_in_progress) ||
+ !should_send_chart_matching(st, rrdset_flags)))
+ return (RRDSET_STREAM_BUFFER) { .wb = NULL, };
+
+ if(unlikely(!exposed_upstream)) {
+ BUFFER *wb = sender_start(host->sender);
+ replication_in_progress = rrdpush_send_chart_definition(wb, st);
+ }
+
+ if(replication_in_progress)
+ return (RRDSET_STREAM_BUFFER) { .wb = NULL, };
+
+ return (RRDSET_STREAM_BUFFER) {
+ .capabilities = host->sender->capabilities,
+ .v2 = stream_has_capability(host->sender, STREAM_CAP_INTERPOLATED),
+ .rrdset_flags = rrdset_flags,
+ .wb = sender_start(host->sender),
+ .wall_clock_time = wall_clock_time,
+ };
+}
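Note on rrdset_push_metric_initialize(): the function returns a buffer with wb set to NULL whenever metrics must not be sent yet. The sketch below reduces that gating to a pure function, so the order of the checks is easier to follow. It is an illustration under stated assumptions: the toy_ names are hypothetical, chart_matches stands for the result of should_send_chart_matching(), and definition_starts_replication models the return value of rrdpush_send_chart_definition().

```c
#include <stdbool.h>

// Hypothetical result type: either skip this collection or send metrics.
typedef enum { TOY_PUSH_SKIP, TOY_PUSH_METRICS } toy_push_decision_t;

static toy_push_decision_t toy_push_decision(bool sender_ready,
                                             bool exposed_upstream,
                                             bool replication_finished,
                                             bool chart_matches,
                                             bool definition_starts_replication) {
    // Not connected or not ready for metrics: nothing is sent.
    if (!sender_ready)
        return TOY_PUSH_SKIP;

    bool replication_in_progress = !replication_finished;

    // A chart already exposed upstream that is still replicating,
    // or a chart filtered out by the matching rules, is skipped.
    if ((exposed_upstream && replication_in_progress) || !chart_matches)
        return TOY_PUSH_SKIP;

    // A chart not yet exposed gets its definition sent first;
    // sending the definition may itself start replication.
    if (!exposed_upstream)
        replication_in_progress = definition_starts_replication;

    return replication_in_progress ? TOY_PUSH_SKIP : TOY_PUSH_METRICS;
}
```

The point to notice is that a chart seen for the first time never carries metrics in the same call when the definition starts replication; metrics flow only once replication has finished.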
diff --git a/src/streaming/protocol/commands.h b/src/streaming/protocol/commands.h
new file mode 100644
index 000000000..81344175c
--- /dev/null
+++ b/src/streaming/protocol/commands.h
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_STREAMING_PROTCOL_COMMANDS_H
+#define NETDATA_STREAMING_PROTCOL_COMMANDS_H
+
+#include "database/rrd.h"
+#include "../rrdpush.h"
+
+typedef struct rrdset_stream_buffer {
+ STREAM_CAPABILITIES capabilities;
+ bool v2;
+ bool begin_v2_added;
+ time_t wall_clock_time;
+ RRDSET_FLAGS rrdset_flags;
+ time_t last_point_end_time_s;
+ BUFFER *wb;
+} RRDSET_STREAM_BUFFER;
+
+RRDSET_STREAM_BUFFER rrdset_push_metric_initialize(RRDSET *st, time_t wall_clock_time);
+
+void rrdpush_sender_get_node_and_claim_id_from_parent(struct sender_state *s);
+void rrdpush_receiver_send_node_and_claim_id_to_child(RRDHOST *host);
+void rrdpush_sender_clear_parent_claim_id(RRDHOST *host);
+
+void rrdpush_sender_send_claimed_id(RRDHOST *host);
+
+void rrdpush_send_global_functions(RRDHOST *host);
+void rrdpush_send_host_labels(RRDHOST *host);
+
+void rrdpush_sender_thread_send_custom_host_variables(RRDHOST *host);
+void rrdpush_sender_send_this_host_variable_now(RRDHOST *host, const RRDVAR_ACQUIRED *rva);
+
+bool rrdpush_send_chart_definition(BUFFER *wb, RRDSET *st);
+bool rrdset_push_chart_definition_now(RRDSET *st);
+bool should_send_chart_matching(RRDSET *st, RRDSET_FLAGS flags);
+
+void rrdset_push_metrics_v1(RRDSET_STREAM_BUFFER *rsb, RRDSET *st);
+void rrddim_push_metrics_v2(RRDSET_STREAM_BUFFER *rsb, RRDDIM *rd, usec_t point_end_time_ut, NETDATA_DOUBLE n, SN_FLAGS flags);
+void rrdset_push_metrics_finished(RRDSET_STREAM_BUFFER *rsb, RRDSET *st);
+
+#endif //NETDATA_STREAMING_PROTCOL_COMMANDS_H
diff --git a/src/streaming/receiver.c b/src/streaming/receiver.c
index 0c0da2121..6c15004a3 100644
--- a/src/streaming/receiver.c
+++ b/src/streaming/receiver.c
@@ -3,12 +3,13 @@
#include "rrdpush.h"
#include "web/server/h2o/http_server.h"
-extern struct config stream_config;
+// When a child disconnects, this is the maximum time we will wait
+// before we notify the cloud that the child is offline
+#define MAX_CHILD_DISC_DELAY (30000)
+#define MAX_CHILD_DISC_TOLERANCE (125 / 100)
void receiver_state_free(struct receiver_state *rpt) {
-#ifdef ENABLE_HTTPS
netdata_ssl_close(&rpt->ssl);
-#endif
if(rpt->fd != -1) {
internal_error(true, "closing socket...");
@@ -36,7 +37,7 @@ void receiver_state_free(struct receiver_state *rpt) {
freez(rpt);
}
-#include "collectors/plugins.d/pluginsd_parser.h"
+#include "plugins.d/pluginsd_parser.h"
// IMPORTANT: to add workers, you have to edit WORKER_PARSER_FIRST_JOB accordingly
#define WORKER_RECEIVER_JOB_BYTES_READ (WORKER_PARSER_FIRST_JOB - 1)
@@ -71,9 +72,7 @@ static inline int read_stream(struct receiver_state *r, char* buffer, size_t siz
errno_clear();
switch(wait_on_socket_or_cancel_with_timeout(
-#ifdef ENABLE_HTTPS
&r->ssl,
-#endif
r->fd, 0, POLLIN, NULL))
{
case 0: // data are waiting
@@ -93,14 +92,10 @@ static inline int read_stream(struct receiver_state *r, char* buffer, size_t siz
return -2;
}
-#ifdef ENABLE_HTTPS
if (SSL_connection(&r->ssl))
bytes_read = netdata_ssl_read(&r->ssl, buffer, size);
else
bytes_read = read(r->fd, buffer, size);
-#else
- bytes_read = read(r->fd, buffer, size);
-#endif
} while(bytes_read < 0 && errno == EINTR && tries--);
@@ -325,7 +320,7 @@ static size_t streaming_parser(struct receiver_state *rpt, struct plugind *cd, i
.capabilities = rpt->capabilities,
};
- parser = parser_init(&user, NULL, NULL, fd, PARSER_INPUT_SPLIT, ssl);
+ parser = parser_init(&user, fd, fd, PARSER_INPUT_SPLIT, ssl);
}
#ifdef ENABLE_H2O
@@ -336,10 +331,6 @@ static size_t streaming_parser(struct receiver_state *rpt, struct plugind *cd, i
rrd_collector_started();
- // this keeps the parser with its current value
- // so, parser needs to be allocated before pushing it
- CLEANUP_FUNCTION_REGISTER(pluginsd_process_thread_cleanup) parser_ptr = parser;
-
bool compressed_connection = rrdpush_decompression_initialize(rpt);
buffered_reader_init(&rpt->reader);
@@ -365,6 +356,9 @@ static size_t streaming_parser(struct receiver_state *rpt, struct plugind *cd, i
};
ND_LOG_STACK_PUSH(lgs);
+ __atomic_store_n(&rpt->parser, parser, __ATOMIC_RELAXED);
+ rrdpush_receiver_send_node_and_claim_id_to_child(rpt->host);
+
while(!receiver_should_stop(rpt)) {
if(!buffered_reader_next_line(&rpt->reader, buffer)) {
@@ -389,6 +383,17 @@ static size_t streaming_parser(struct receiver_state *rpt, struct plugind *cd, i
buffer->len = 0;
buffer->buffer[0] = '\0';
}
+
+ // clean up the sender buffer, because we may end up reusing an incomplete buffer
+ sender_thread_buffer_free();
+ parser->user.v2.stream_buffer.wb = NULL;
+
+ // make sure send_to_plugin() will not write any data to the socket
+ spinlock_lock(&parser->writer.spinlock);
+ parser->fd_output = -1;
+ parser->ssl_output = NULL;
+ spinlock_unlock(&parser->writer.spinlock);
+
result = parser->user.data_collections_count;
return result;
}
@@ -407,7 +412,7 @@ static bool rrdhost_set_receiver(RRDHOST *host, struct receiver_state *rpt) {
bool signal_rrdcontext = false;
bool set_this = false;
- netdata_mutex_lock(&host->receiver_lock);
+ spinlock_lock(&host->receiver_lock);
if (!host->receiver) {
rrdhost_flag_clear(host, RRDHOST_FLAG_ORPHAN);
@@ -433,7 +438,7 @@ static bool rrdhost_set_receiver(RRDHOST *host, struct receiver_state *rpt) {
}
}
- host->health_log.health_log_history = rpt->config.alarms_history;
+ host->health_log.health_log_retention_s = rpt->config.alarms_history;
// this is a test
// if(rpt->hops <= host->sender->hops)
@@ -450,7 +455,7 @@ static bool rrdhost_set_receiver(RRDHOST *host, struct receiver_state *rpt) {
set_this = true;
}
- netdata_mutex_unlock(&host->receiver_lock);
+ spinlock_unlock(&host->receiver_lock);
if(signal_rrdcontext)
rrdcontext_host_child_connected(host);
@@ -460,47 +465,56 @@ static bool rrdhost_set_receiver(RRDHOST *host, struct receiver_state *rpt) {
static void rrdhost_clear_receiver(struct receiver_state *rpt) {
RRDHOST *host = rpt->host;
- if(host) {
- bool signal_rrdcontext = false;
- netdata_mutex_lock(&host->receiver_lock);
+ if(!host) return;
+ spinlock_lock(&host->receiver_lock);
+ {
// Make sure that we detach this thread and don't kill a freshly arriving receiver
- if(host->receiver == rpt) {
+
+ if (host->receiver == rpt) {
+ spinlock_unlock(&host->receiver_lock);
+ {
+ // run all these without having the receiver lock
+
+ stream_path_child_disconnected(host);
+ rrdpush_sender_thread_stop(host, STREAM_HANDSHAKE_DISCONNECT_RECEIVER_LEFT, false);
+ rrdpush_receiver_replication_reset(host);
+ rrdcontext_host_child_disconnected(host);
+
+ if (rpt->config.health_enabled)
+ rrdcalc_child_disconnected(host);
+
+ rrdpush_reset_destinations_postpone_time(host);
+ }
+ spinlock_lock(&host->receiver_lock);
+
+ // now we have the lock again
+
__atomic_sub_fetch(&localhost->connected_children_count, 1, __ATOMIC_RELAXED);
rrdhost_flag_set(rpt->host, RRDHOST_FLAG_RRDPUSH_RECEIVER_DISCONNECTED);
host->trigger_chart_obsoletion_check = 0;
host->child_connect_time = 0;
host->child_disconnected_time = now_realtime_sec();
-
host->health.health_enabled = 0;
- rrdpush_sender_thread_stop(host, STREAM_HANDSHAKE_DISCONNECT_RECEIVER_LEFT, false);
-
- signal_rrdcontext = true;
- rrdpush_receiver_replication_reset(host);
-
+ host->rrdpush_last_receiver_exit_reason = rpt->exit.reason;
rrdhost_flag_set(host, RRDHOST_FLAG_ORPHAN);
host->receiver = NULL;
- host->rrdpush_last_receiver_exit_reason = rpt->exit.reason;
-
- if(rpt->config.health_enabled)
- rrdcalc_child_disconnected(host);
}
+ }
- netdata_mutex_unlock(&host->receiver_lock);
-
- if(signal_rrdcontext)
- rrdcontext_host_child_disconnected(host);
+ // this must be cleared with the receiver lock
+ pluginsd_process_cleanup(rpt->parser);
+ __atomic_store_n(&rpt->parser, NULL, __ATOMIC_RELAXED);
- rrdpush_reset_destinations_postpone_time(host);
- }
+ spinlock_unlock(&host->receiver_lock);
}
bool stop_streaming_receiver(RRDHOST *host, STREAM_HANDSHAKE reason) {
bool ret = false;
- netdata_mutex_lock(&host->receiver_lock);
+ spinlock_lock(&host->receiver_lock);
if(host->receiver) {
if(!host->receiver->exit.shutdown) {
@@ -514,12 +528,12 @@ bool stop_streaming_receiver(RRDHOST *host, STREAM_HANDSHAKE reason) {
int count = 2000;
while (host->receiver && count-- > 0) {
- netdata_mutex_unlock(&host->receiver_lock);
+ spinlock_unlock(&host->receiver_lock);
// let the lock for the receiver thread to exit
sleep_usec(1 * USEC_PER_MS);
- netdata_mutex_lock(&host->receiver_lock);
+ spinlock_lock(&host->receiver_lock);
}
if(host->receiver)
@@ -531,16 +545,14 @@ bool stop_streaming_receiver(RRDHOST *host, STREAM_HANDSHAKE reason) {
else
ret = true;
- netdata_mutex_unlock(&host->receiver_lock);
+ spinlock_unlock(&host->receiver_lock);
return ret;
}
static void rrdpush_send_error_on_taken_over_connection(struct receiver_state *rpt, const char *msg) {
(void) send_timeout(
-#ifdef ENABLE_HTTPS
&rpt->ssl,
-#endif
rpt->fd,
(char *)msg,
strlen(msg),
@@ -548,7 +560,7 @@ static void rrdpush_send_error_on_taken_over_connection(struct receiver_state *r
5);
}
-void rrdpush_receive_log_status(struct receiver_state *rpt, const char *msg, const char *status, ND_LOG_FIELD_PRIORITY priority) {
+static void rrdpush_receive_log_status(struct receiver_state *rpt, const char *msg, const char *status, ND_LOG_FIELD_PRIORITY priority) {
// this function may be called BEFORE we spawn the receiver thread
// so, we need to add the fields again (it does not harm)
ND_LOG_STACK lgs[] = {
@@ -582,26 +594,26 @@ static void rrdpush_receive(struct receiver_state *rpt)
rpt->config.health_enabled = health_plugin_enabled();
rpt->config.alarms_delay = 60;
- rpt->config.alarms_history = HEALTH_LOG_DEFAULT_HISTORY;
+ rpt->config.alarms_history = HEALTH_LOG_RETENTION_DEFAULT;
- rpt->config.rrdpush_enabled = (int)default_rrdpush_enabled;
- rpt->config.rrdpush_destination = default_rrdpush_destination;
- rpt->config.rrdpush_api_key = default_rrdpush_api_key;
- rpt->config.rrdpush_send_charts_matching = default_rrdpush_send_charts_matching;
+ rpt->config.rrdpush_enabled = (int)stream_conf_send_enabled;
+ rpt->config.rrdpush_destination = stream_conf_send_destination;
+ rpt->config.rrdpush_api_key = stream_conf_send_api_key;
+ rpt->config.rrdpush_send_charts_matching = stream_conf_send_charts_matching;
- rpt->config.rrdpush_enable_replication = default_rrdpush_enable_replication;
- rpt->config.rrdpush_seconds_to_replicate = default_rrdpush_seconds_to_replicate;
- rpt->config.rrdpush_replication_step = default_rrdpush_replication_step;
+ rpt->config.rrdpush_enable_replication = stream_conf_replication_enabled;
+ rpt->config.rrdpush_seconds_to_replicate = stream_conf_replication_period;
+ rpt->config.rrdpush_replication_step = stream_conf_replication_step;
- rpt->config.update_every = (int)appconfig_get_number(&stream_config, rpt->machine_guid, "update every", rpt->config.update_every);
+ rpt->config.update_every = (int)appconfig_get_duration_seconds(&stream_config, rpt->machine_guid, "update every", rpt->config.update_every);
if(rpt->config.update_every < 0) rpt->config.update_every = 1;
- rpt->config.history = (int)appconfig_get_number(&stream_config, rpt->key, "default history", rpt->config.history);
- rpt->config.history = (int)appconfig_get_number(&stream_config, rpt->machine_guid, "history", rpt->config.history);
+ rpt->config.history = (int)appconfig_get_number(&stream_config, rpt->key, "retention", rpt->config.history);
+ rpt->config.history = (int)appconfig_get_number(&stream_config, rpt->machine_guid, "retention", rpt->config.history);
if(rpt->config.history < 5) rpt->config.history = 5;
- rpt->config.mode = rrd_memory_mode_id(appconfig_get(&stream_config, rpt->key, "default memory mode", rrd_memory_mode_name(rpt->config.mode)));
- rpt->config.mode = rrd_memory_mode_id(appconfig_get(&stream_config, rpt->machine_guid, "memory mode", rrd_memory_mode_name(rpt->config.mode)));
+ rpt->config.mode = rrd_memory_mode_id(appconfig_get(&stream_config, rpt->key, "db", rrd_memory_mode_name(rpt->config.mode)));
+ rpt->config.mode = rrd_memory_mode_id(appconfig_get(&stream_config, rpt->machine_guid, "db", rrd_memory_mode_name(rpt->config.mode)));
if (unlikely(rpt->config.mode == RRD_MEMORY_MODE_DBENGINE && !dbengine_enabled)) {
netdata_log_error("STREAM '%s' [receive from %s:%s]: "
@@ -616,34 +628,34 @@ static void rrdpush_receive(struct receiver_state *rpt)
rpt->config.health_enabled = appconfig_get_boolean_ondemand(&stream_config, rpt->key, "health enabled by default", rpt->config.health_enabled);
rpt->config.health_enabled = appconfig_get_boolean_ondemand(&stream_config, rpt->machine_guid, "health enabled", rpt->config.health_enabled);
- rpt->config.alarms_delay = appconfig_get_number(&stream_config, rpt->key, "default postpone alarms on connect seconds", rpt->config.alarms_delay);
- rpt->config.alarms_delay = appconfig_get_number(&stream_config, rpt->machine_guid, "postpone alarms on connect seconds", rpt->config.alarms_delay);
+ rpt->config.alarms_delay = appconfig_get_duration_seconds(&stream_config, rpt->key, "postpone alerts on connect", rpt->config.alarms_delay);
+ rpt->config.alarms_delay = appconfig_get_duration_seconds(&stream_config, rpt->machine_guid, "postpone alerts on connect", rpt->config.alarms_delay);
- rpt->config.alarms_history = appconfig_get_number(&stream_config, rpt->key, "default health log history", rpt->config.alarms_history);
- rpt->config.alarms_history = appconfig_get_number(&stream_config, rpt->machine_guid, "health log history", rpt->config.alarms_history);
+ rpt->config.alarms_history = appconfig_get_duration_seconds(&stream_config, rpt->key, "health log retention", rpt->config.alarms_history);
+ rpt->config.alarms_history = appconfig_get_duration_seconds(&stream_config, rpt->machine_guid, "health log retention", rpt->config.alarms_history);
- rpt->config.rrdpush_enabled = appconfig_get_boolean(&stream_config, rpt->key, "default proxy enabled", rpt->config.rrdpush_enabled);
+ rpt->config.rrdpush_enabled = appconfig_get_boolean(&stream_config, rpt->key, "proxy enabled", rpt->config.rrdpush_enabled);
rpt->config.rrdpush_enabled = appconfig_get_boolean(&stream_config, rpt->machine_guid, "proxy enabled", rpt->config.rrdpush_enabled);
- rpt->config.rrdpush_destination = appconfig_get(&stream_config, rpt->key, "default proxy destination", rpt->config.rrdpush_destination);
+ rpt->config.rrdpush_destination = appconfig_get(&stream_config, rpt->key, "proxy destination", rpt->config.rrdpush_destination);
rpt->config.rrdpush_destination = appconfig_get(&stream_config, rpt->machine_guid, "proxy destination", rpt->config.rrdpush_destination);
- rpt->config.rrdpush_api_key = appconfig_get(&stream_config, rpt->key, "default proxy api key", rpt->config.rrdpush_api_key);
+ rpt->config.rrdpush_api_key = appconfig_get(&stream_config, rpt->key, "proxy api key", rpt->config.rrdpush_api_key);
rpt->config.rrdpush_api_key = appconfig_get(&stream_config, rpt->machine_guid, "proxy api key", rpt->config.rrdpush_api_key);
- rpt->config.rrdpush_send_charts_matching = appconfig_get(&stream_config, rpt->key, "default proxy send charts matching", rpt->config.rrdpush_send_charts_matching);
+ rpt->config.rrdpush_send_charts_matching = appconfig_get(&stream_config, rpt->key, "proxy send charts matching", rpt->config.rrdpush_send_charts_matching);
rpt->config.rrdpush_send_charts_matching = appconfig_get(&stream_config, rpt->machine_guid, "proxy send charts matching", rpt->config.rrdpush_send_charts_matching);
rpt->config.rrdpush_enable_replication = appconfig_get_boolean(&stream_config, rpt->key, "enable replication", rpt->config.rrdpush_enable_replication);
rpt->config.rrdpush_enable_replication = appconfig_get_boolean(&stream_config, rpt->machine_guid, "enable replication", rpt->config.rrdpush_enable_replication);
- rpt->config.rrdpush_seconds_to_replicate = appconfig_get_number(&stream_config, rpt->key, "seconds to replicate", rpt->config.rrdpush_seconds_to_replicate);
- rpt->config.rrdpush_seconds_to_replicate = appconfig_get_number(&stream_config, rpt->machine_guid, "seconds to replicate", rpt->config.rrdpush_seconds_to_replicate);
+ rpt->config.rrdpush_seconds_to_replicate = appconfig_get_duration_seconds(&stream_config, rpt->key, "replication period", rpt->config.rrdpush_seconds_to_replicate);
+ rpt->config.rrdpush_seconds_to_replicate = appconfig_get_duration_seconds(&stream_config, rpt->machine_guid, "replication period", rpt->config.rrdpush_seconds_to_replicate);
- rpt->config.rrdpush_replication_step = appconfig_get_number(&stream_config, rpt->key, "seconds per replication step", rpt->config.rrdpush_replication_step);
- rpt->config.rrdpush_replication_step = appconfig_get_number(&stream_config, rpt->machine_guid, "seconds per replication step", rpt->config.rrdpush_replication_step);
+ rpt->config.rrdpush_replication_step = appconfig_get_number(&stream_config, rpt->key, "replication step", rpt->config.rrdpush_replication_step);
+ rpt->config.rrdpush_replication_step = appconfig_get_number(&stream_config, rpt->machine_guid, "replication step", rpt->config.rrdpush_replication_step);
- rpt->config.rrdpush_compression = default_rrdpush_compression_enabled;
+ rpt->config.rrdpush_compression = stream_conf_compression_enabled;
rpt->config.rrdpush_compression = appconfig_get_boolean(&stream_config, rpt->key, "enable compression", rpt->config.rrdpush_compression);
rpt->config.rrdpush_compression = appconfig_get_boolean(&stream_config, rpt->machine_guid, "enable compression", rpt->config.rrdpush_compression);
@@ -652,7 +664,7 @@ static void rrdpush_receive(struct receiver_state *rpt)
is_ephemeral = appconfig_get_boolean(&stream_config, rpt->machine_guid, "is ephemeral node", is_ephemeral);
if(rpt->config.rrdpush_compression) {
- char *order = appconfig_get(&stream_config, rpt->key, "compression algorithms order", RRDPUSH_COMPRESSION_ALGORITHMS_ORDER);
+ const char *order = appconfig_get(&stream_config, rpt->key, "compression algorithms order", RRDPUSH_COMPRESSION_ALGORITHMS_ORDER);
order = appconfig_get(&stream_config, rpt->machine_guid, "compression algorithms order", order);
rrdpush_parse_compression_order(rpt, order);
}
@@ -730,11 +742,7 @@ static void rrdpush_receive(struct receiver_state *rpt)
, rpt->host->rrd_history_entries
, rrd_memory_mode_name(rpt->host->rrd_memory_mode)
, (rpt->config.health_enabled == CONFIG_BOOLEAN_NO)?"disabled":((rpt->config.health_enabled == CONFIG_BOOLEAN_YES)?"enabled":"auto")
-#ifdef ENABLE_HTTPS
, (rpt->ssl.conn != NULL) ? " SSL," : ""
-#else
- , ""
-#endif
);
#endif // NETDATA_INTERNAL_CHECKS
@@ -784,9 +792,7 @@ static void rrdpush_receive(struct receiver_state *rpt)
} else {
#endif
ssize_t bytes_sent = send_timeout(
-#ifdef ENABLE_HTTPS
&rpt->ssl,
-#endif
rpt->fd, initial_response, strlen(initial_response), 0, 60);
if(bytes_sent != (ssize_t)strlen(initial_response)) {
@@ -828,13 +834,9 @@ static void rrdpush_receive(struct receiver_state *rpt)
rpt, "connected and ready to receive data",
RRDPUSH_STATUS_CONNECTED, NDLP_INFO);
-#ifdef ENABLE_ACLK
// in case we have cloud connection we inform cloud
// new child connected
- if (netdata_cloud_enabled)
- aclk_host_state_update(rpt->host, 1, 1);
-#endif
-
+ schedule_node_state_update(rpt->host, 300);
rrdhost_set_is_parent_label();
if (is_ephemeral)
@@ -843,50 +845,28 @@ static void rrdpush_receive(struct receiver_state *rpt)
// let it reconnect to parent immediately
rrdpush_reset_destinations_postpone_time(rpt->host);
- size_t count = streaming_parser(rpt, &cd, rpt->fd,
-#ifdef ENABLE_HTTPS
- (rpt->ssl.conn) ? &rpt->ssl : NULL
-#else
- NULL
-#endif
- );
+ // receive data
+ size_t count = streaming_parser(rpt, &cd, rpt->fd, (rpt->ssl.conn) ? &rpt->ssl : NULL);
+ // the parser stopped
receiver_set_exit_reason(rpt, STREAM_HANDSHAKE_DISCONNECT_PARSER_EXIT, false);
{
char msg[100 + 1];
snprintfz(msg, sizeof(msg) - 1, "disconnected (completed %zu updates)", count);
- rrdpush_receive_log_status(
- rpt, msg,
- RRDPUSH_STATUS_DISCONNECTED, NDLP_WARNING);
+ rrdpush_receive_log_status(rpt, msg, RRDPUSH_STATUS_DISCONNECTED, NDLP_WARNING);
}
-#ifdef ENABLE_ACLK
// in case we have cloud connection we inform cloud
// a child disconnected
- if (netdata_cloud_enabled)
- aclk_host_state_update(rpt->host, 0, 1);
-#endif
+ STREAM_PATH tmp = rrdhost_stream_path_fetch(rpt->host);
+ uint64_t total_reboot = (tmp.start_time + tmp.shutdown_time);
+ schedule_node_state_update(rpt->host, MIN((total_reboot * MAX_CHILD_DISC_TOLERANCE), MAX_CHILD_DISC_DELAY));
cleanup:
;
}
-static void rrdpush_receiver_thread_cleanup(void *pptr) {
- struct receiver_state *rpt = CLEANUP_FUNCTION_GET_PTR(pptr);
- if(!rpt) return;
-
- netdata_log_info("STREAM '%s' [receive from [%s]:%s]: "
- "receive thread ended (task id %d)"
- , rpt->hostname ? rpt->hostname : "-"
- , rpt->client_ip ? rpt->client_ip : "-", rpt->client_port ? rpt->client_port : "-", gettid_cached());
-
- worker_unregister();
- rrdhost_clear_receiver(rpt);
- receiver_state_free(rpt);
- rrdhost_set_is_parent_label();
-}
-
static bool stream_receiver_log_capabilities(BUFFER *wb, void *ptr) {
struct receiver_state *rpt = ptr;
if(!rpt)
@@ -901,16 +881,11 @@ static bool stream_receiver_log_transport(BUFFER *wb, void *ptr) {
if(!rpt)
return false;
-#ifdef ENABLE_HTTPS
buffer_strcat(wb, SSL_connection(&rpt->ssl) ? "https" : "http");
-#else
- buffer_strcat(wb, "http");
-#endif
return true;
}
void *rrdpush_receiver_thread(void *ptr) {
- CLEANUP_FUNCTION_REGISTER(rrdpush_receiver_thread_cleanup) cleanup_ptr = ptr;
worker_register("STREAMRCV");
worker_register_job_custom_metric(WORKER_RECEIVER_JOB_BYTES_READ,
@@ -942,5 +917,469 @@ void *rrdpush_receiver_thread(void *ptr) {
, rpt->client_port);
rrdpush_receive(rpt);
+
+ netdata_log_info("STREAM '%s' [receive from [%s]:%s]: "
+ "receive thread ended (task id %d)"
+ , rpt->hostname ? rpt->hostname : "-"
+ , rpt->client_ip ? rpt->client_ip : "-", rpt->client_port ? rpt->client_port : "-", gettid_cached());
+
+ worker_unregister();
+ rrdhost_clear_receiver(rpt);
+ rrdhost_set_is_parent_label();
+ receiver_state_free(rpt);
return NULL;
}
+
+int rrdpush_receiver_permission_denied(struct web_client *w) {
+ // we always respond with the same message and error code
+ // to prevent an attacker from gaining info about the error
+ buffer_flush(w->response.data);
+ buffer_strcat(w->response.data, START_STREAMING_ERROR_NOT_PERMITTED);
+ return HTTP_RESP_UNAUTHORIZED;
+}
+
+int rrdpush_receiver_too_busy_now(struct web_client *w) {
+ // we always respond with the same message and error code
+ // to prevent an attacker from gaining info about the error
+ buffer_flush(w->response.data);
+ buffer_strcat(w->response.data, START_STREAMING_ERROR_BUSY_TRY_LATER);
+ return HTTP_RESP_SERVICE_UNAVAILABLE;
+}
+
+static void rrdpush_receiver_takeover_web_connection(struct web_client *w, struct receiver_state *rpt) {
+ rpt->fd = w->ifd;
+
+ rpt->ssl.conn = w->ssl.conn;
+ rpt->ssl.state = w->ssl.state;
+
+ w->ssl = NETDATA_SSL_UNSET_CONNECTION;
+
+ WEB_CLIENT_IS_DEAD(w);
+
+ if(web_server_mode == WEB_SERVER_MODE_STATIC_THREADED) {
+ web_client_flag_set(w, WEB_CLIENT_FLAG_DONT_CLOSE_SOCKET);
+ }
+ else {
+ if(w->ifd == w->ofd)
+ w->ifd = w->ofd = -1;
+ else
+ w->ifd = -1;
+ }
+
+ buffer_flush(w->response.data);
+}
+
+int rrdpush_receiver_thread_spawn(struct web_client *w, char *decoded_query_string, void *h2o_ctx __maybe_unused) {
+
+ if(!service_running(ABILITY_STREAMING_CONNECTIONS))
+ return rrdpush_receiver_too_busy_now(w);
+
+ struct receiver_state *rpt = callocz(1, sizeof(*rpt));
+ rpt->connected_since_s = now_realtime_sec();
+ rpt->last_msg_t = now_monotonic_sec();
+ rpt->hops = 1;
+
+ rpt->capabilities = STREAM_CAP_INVALID;
+
+#ifdef ENABLE_H2O
+ rpt->h2o_ctx = h2o_ctx;
+#endif
+
+ __atomic_add_fetch(&netdata_buffers_statistics.rrdhost_receivers, sizeof(*rpt), __ATOMIC_RELAXED);
+ __atomic_add_fetch(&netdata_buffers_statistics.rrdhost_allocations_size, sizeof(struct rrdhost_system_info), __ATOMIC_RELAXED);
+
+ rpt->system_info = callocz(1, sizeof(struct rrdhost_system_info));
+ rpt->system_info->hops = rpt->hops;
+
+ rpt->fd = -1;
+ rpt->client_ip = strdupz(w->client_ip);
+ rpt->client_port = strdupz(w->client_port);
+
+ rpt->ssl = NETDATA_SSL_UNSET_CONNECTION;
+
+ rpt->config.update_every = default_rrd_update_every;
+
+ // parse the parameters and fill rpt and rpt->system_info
+
+ while(decoded_query_string) {
+ char *value = strsep_skip_consecutive_separators(&decoded_query_string, "&");
+ if(!value || !*value) continue;
+
+ char *name = strsep_skip_consecutive_separators(&value, "=");
+ if(!name || !*name) continue;
+ if(!value || !*value) continue;
+
+ if(!strcmp(name, "key") && !rpt->key)
+ rpt->key = strdupz(value);
+
+ else if(!strcmp(name, "hostname") && !rpt->hostname)
+ rpt->hostname = strdupz(value);
+
+ else if(!strcmp(name, "registry_hostname") && !rpt->registry_hostname)
+ rpt->registry_hostname = strdupz(value);
+
+ else if(!strcmp(name, "machine_guid") && !rpt->machine_guid)
+ rpt->machine_guid = strdupz(value);
+
+ else if(!strcmp(name, "update_every"))
+ rpt->config.update_every = (int)strtoul(value, NULL, 0);
+
+ else if(!strcmp(name, "os") && !rpt->os)
+ rpt->os = strdupz(value);
+
+ else if(!strcmp(name, "timezone") && !rpt->timezone)
+ rpt->timezone = strdupz(value);
+
+ else if(!strcmp(name, "abbrev_timezone") && !rpt->abbrev_timezone)
+ rpt->abbrev_timezone = strdupz(value);
+
+ else if(!strcmp(name, "utc_offset"))
+ rpt->utc_offset = (int32_t)strtol(value, NULL, 0);
+
+ else if(!strcmp(name, "hops"))
+ rpt->hops = rpt->system_info->hops = (uint16_t) strtoul(value, NULL, 0);
+
+ else if(!strcmp(name, "ml_capable"))
+ rpt->system_info->ml_capable = strtoul(value, NULL, 0);
+
+ else if(!strcmp(name, "ml_enabled"))
+ rpt->system_info->ml_enabled = strtoul(value, NULL, 0);
+
+ else if(!strcmp(name, "mc_version"))
+ rpt->system_info->mc_version = strtoul(value, NULL, 0);
+
+ else if(!strcmp(name, "ver") && (rpt->capabilities & STREAM_CAP_INVALID))
+ rpt->capabilities = convert_stream_version_to_capabilities(strtoul(value, NULL, 0), NULL, false);
+
+ else {
+ // An old Netdata child does not have a compatible streaming protocol, map to something sane.
+ if (!strcmp(name, "NETDATA_SYSTEM_OS_NAME"))
+ name = "NETDATA_HOST_OS_NAME";
+
+ else if (!strcmp(name, "NETDATA_SYSTEM_OS_ID"))
+ name = "NETDATA_HOST_OS_ID";
+
+ else if (!strcmp(name, "NETDATA_SYSTEM_OS_ID_LIKE"))
+ name = "NETDATA_HOST_OS_ID_LIKE";
+
+ else if (!strcmp(name, "NETDATA_SYSTEM_OS_VERSION"))
+ name = "NETDATA_HOST_OS_VERSION";
+
+ else if (!strcmp(name, "NETDATA_SYSTEM_OS_VERSION_ID"))
+ name = "NETDATA_HOST_OS_VERSION_ID";
+
+ else if (!strcmp(name, "NETDATA_SYSTEM_OS_DETECTION"))
+ name = "NETDATA_HOST_OS_DETECTION";
+
+ else if(!strcmp(name, "NETDATA_PROTOCOL_VERSION") && (rpt->capabilities & STREAM_CAP_INVALID))
+ rpt->capabilities = convert_stream_version_to_capabilities(1, NULL, false);
+
+ if (unlikely(rrdhost_set_system_info_variable(rpt->system_info, name, value))) {
+ nd_log_daemon(NDLP_NOTICE, "STREAM '%s' [receive from [%s]:%s]: "
+ "request has parameter '%s' = '%s', which is not used."
+ , (rpt->hostname && *rpt->hostname) ? rpt->hostname : "-"
+ , rpt->client_ip, rpt->client_port
+ , name, value);
+ }
+ }
+ }
+
+ if (rpt->capabilities & STREAM_CAP_INVALID)
+ // no version is supplied, assume version 0;
+ rpt->capabilities = convert_stream_version_to_capabilities(0, NULL, false);
+
+ // find the program name and version
+ if(w->user_agent && w->user_agent[0]) {
+ char *t = strchr(w->user_agent, '/');
+ if(t && *t) {
+ *t = '\0';
+ t++;
+ }
+
+ rpt->program_name = strdupz(w->user_agent);
+ if(t && *t) rpt->program_version = strdupz(t);
+ }
+
+ // check if we should accept this connection
+
+ if(!rpt->key || !*rpt->key) {
+ rrdpush_receive_log_status(
+ rpt, "request without an API key, rejecting connection",
+ RRDPUSH_STATUS_NO_API_KEY, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ if(!rpt->hostname || !*rpt->hostname) {
+ rrdpush_receive_log_status(
+ rpt, "request without a hostname, rejecting connection",
+ RRDPUSH_STATUS_NO_HOSTNAME, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ if(!rpt->registry_hostname)
+ rpt->registry_hostname = strdupz(rpt->hostname);
+
+ if(!rpt->machine_guid || !*rpt->machine_guid) {
+ rrdpush_receive_log_status(
+ rpt, "request without a machine GUID, rejecting connection",
+ RRDPUSH_STATUS_NO_MACHINE_GUID, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ {
+ char buf[GUID_LEN + 1];
+
+ if (regenerate_guid(rpt->key, buf) == -1) {
+ rrdpush_receive_log_status(
+ rpt, "API key is not a valid UUID (use the command uuidgen to generate one)",
+ RRDPUSH_STATUS_INVALID_API_KEY, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ if (regenerate_guid(rpt->machine_guid, buf) == -1) {
+ rrdpush_receive_log_status(
+ rpt, "machine GUID is not a valid UUID",
+ RRDPUSH_STATUS_INVALID_MACHINE_GUID, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+ }
+
+ const char *api_key_type = appconfig_get(&stream_config, rpt->key, "type", "api");
+ if(!api_key_type || !*api_key_type) api_key_type = "unknown";
+ if(strcmp(api_key_type, "api") != 0) {
+ rrdpush_receive_log_status(
+ rpt, "API key is a machine GUID",
+ RRDPUSH_STATUS_INVALID_API_KEY, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ if(!appconfig_get_boolean(&stream_config, rpt->key, "enabled", 0)) {
+ rrdpush_receive_log_status(
+ rpt, "API key is not enabled",
+ RRDPUSH_STATUS_API_KEY_DISABLED, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ {
+ SIMPLE_PATTERN *key_allow_from = simple_pattern_create(
+ appconfig_get(&stream_config, rpt->key, "allow from", "*"),
+ NULL, SIMPLE_PATTERN_EXACT, true);
+
+ if(key_allow_from) {
+ if(!simple_pattern_matches(key_allow_from, w->client_ip)) {
+ simple_pattern_free(key_allow_from);
+
+ rrdpush_receive_log_status(
+ rpt, "API key is not allowed from this IP",
+ RRDPUSH_STATUS_NOT_ALLOWED_IP, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ simple_pattern_free(key_allow_from);
+ }
+ }
+
+ {
+ const char *machine_guid_type = appconfig_get(&stream_config, rpt->machine_guid, "type", "machine");
+ if (!machine_guid_type || !*machine_guid_type) machine_guid_type = "unknown";
+
+ if (strcmp(machine_guid_type, "machine") != 0) {
+ rrdpush_receive_log_status(
+ rpt, "machine GUID is an API key",
+ RRDPUSH_STATUS_INVALID_MACHINE_GUID, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+ }
+
+ if(!appconfig_get_boolean(&stream_config, rpt->machine_guid, "enabled", 1)) {
+ rrdpush_receive_log_status(
+ rpt, "machine GUID is not enabled",
+ RRDPUSH_STATUS_MACHINE_GUID_DISABLED, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ {
+ SIMPLE_PATTERN *machine_allow_from = simple_pattern_create(
+ appconfig_get(&stream_config, rpt->machine_guid, "allow from", "*"),
+ NULL, SIMPLE_PATTERN_EXACT, true);
+
+ if(machine_allow_from) {
+ if(!simple_pattern_matches(machine_allow_from, w->client_ip)) {
+ simple_pattern_free(machine_allow_from);
+
+ rrdpush_receive_log_status(
+ rpt, "machine GUID is not allowed from this IP",
+ RRDPUSH_STATUS_NOT_ALLOWED_IP, NDLP_WARNING);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_permission_denied(w);
+ }
+
+ simple_pattern_free(machine_allow_from);
+ }
+ }
+
+ if (strcmp(rpt->machine_guid, localhost->machine_guid) == 0) {
+
+ rrdpush_receiver_takeover_web_connection(w, rpt);
+
+ rrdpush_receive_log_status(
+ rpt, "machine GUID is my own",
+ RRDPUSH_STATUS_LOCALHOST, NDLP_DEBUG);
+
+ char initial_response[HTTP_HEADER_SIZE + 1];
+ snprintfz(initial_response, HTTP_HEADER_SIZE, "%s", START_STREAMING_ERROR_SAME_LOCALHOST);
+
+ if(send_timeout(
+ &rpt->ssl,
+ rpt->fd, initial_response, strlen(initial_response), 0, 60) != (ssize_t)strlen(initial_response)) {
+
+ nd_log_daemon(NDLP_ERR, "STREAM '%s' [receive from [%s]:%s]: "
+ "failed to reply."
+ , rpt->hostname
+ , rpt->client_ip, rpt->client_port
+ );
+ }
+
+ receiver_state_free(rpt);
+ return HTTP_RESP_OK;
+ }
+
+ if(unlikely(web_client_streaming_rate_t > 0)) {
+ static SPINLOCK spinlock = NETDATA_SPINLOCK_INITIALIZER;
+ static time_t last_stream_accepted_t = 0;
+
+ time_t now = now_realtime_sec();
+ spinlock_lock(&spinlock);
+
+ if(unlikely(last_stream_accepted_t == 0))
+ last_stream_accepted_t = now;
+
+ if(now - last_stream_accepted_t < web_client_streaming_rate_t) {
+ spinlock_unlock(&spinlock);
+
+ char msg[100 + 1];
+ snprintfz(msg, sizeof(msg) - 1,
+ "rate limit, will accept new connection in %ld secs",
+ (long)(web_client_streaming_rate_t - (now - last_stream_accepted_t)));
+
+ rrdpush_receive_log_status(
+ rpt, msg,
+ RRDPUSH_STATUS_RATE_LIMIT, NDLP_NOTICE);
+
+ receiver_state_free(rpt);
+ return rrdpush_receiver_too_busy_now(w);
+ }
+
+ last_stream_accepted_t = now;
+ spinlock_unlock(&spinlock);
+ }
+
+ /*
+ * Quick path for rejecting multiple connections. The lock taken is fine-grained - it only protects the receiver
+ * pointer within the host (if a host exists). This protects against multiple concurrent web requests hitting
+ * separate threads within the web-server and landing here. The lock guards the thread-shutdown sequence that
+ * detaches the receiver from the host. If the host is being created (first-time access) then we also use the
+ * lock to prevent a race hazard (two threads try to create the host concurrently, one wins and the other does a
+ * lookup to the now-attached structure).
+ */
+
+ {
+ time_t age = 0;
+ bool receiver_stale = false;
+ bool receiver_working = false;
+
+ rrd_rdlock();
+ RRDHOST *host = rrdhost_find_by_guid(rpt->machine_guid);
+ if (unlikely(host && rrdhost_flag_check(host, RRDHOST_FLAG_ARCHIVED))) /* Ignore archived hosts. */
+ host = NULL;
+
+ if (host) {
+ spinlock_lock(&host->receiver_lock);
+ if (host->receiver) {
+ age = now_monotonic_sec() - host->receiver->last_msg_t;
+
+ if (age < 30)
+ receiver_working = true;
+ else
+ receiver_stale = true;
+ }
+ spinlock_unlock(&host->receiver_lock);
+ }
+ rrd_rdunlock();
+
+ if (receiver_stale && stop_streaming_receiver(host, STREAM_HANDSHAKE_DISCONNECT_STALE_RECEIVER)) {
+ // we stopped the receiver
+ // we can proceed with this connection
+ receiver_stale = false;
+
+ nd_log_daemon(NDLP_NOTICE, "STREAM '%s' [receive from [%s]:%s]: "
+ "stopped previous stale receiver to accept this one."
+ , rpt->hostname
+ , rpt->client_ip, rpt->client_port
+ );
+ }
+
+ if (receiver_working || receiver_stale) {
+ // another receiver is already connected
+ // try again later
+
+ char msg[200 + 1];
+ snprintfz(msg, sizeof(msg) - 1,
+ "multiple connections for same host, "
+ "old connection was last used %ld secs ago%s",
+ (long)age, receiver_stale ? " (signaled old receiver to stop)" : " (new connection not accepted)");
+
+ rrdpush_receive_log_status(
+ rpt, msg,
+ RRDPUSH_STATUS_ALREADY_CONNECTED, NDLP_DEBUG);
+
+ // Have not set WEB_CLIENT_FLAG_DONT_CLOSE_SOCKET - caller should clean up
+ buffer_flush(w->response.data);
+ buffer_strcat(w->response.data, START_STREAMING_ERROR_ALREADY_STREAMING);
+ receiver_state_free(rpt);
+ return HTTP_RESP_CONFLICT;
+ }
+ }
+
+ rrdpush_receiver_takeover_web_connection(w, rpt);
+
+ char tag[NETDATA_THREAD_TAG_MAX + 1];
+ snprintfz(tag, NETDATA_THREAD_TAG_MAX, THREAD_TAG_STREAM_RECEIVER "[%s]", rpt->hostname);
+ tag[NETDATA_THREAD_TAG_MAX] = '\0';
+
+ rpt->thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, rrdpush_receiver_thread, (void *)rpt);
+ if(!rpt->thread) {
+ rrdpush_receive_log_status(
+ rpt, "can't create receiver thread",
+ RRDPUSH_STATUS_INTERNAL_SERVER_ERROR, NDLP_ERR);
+
+ buffer_flush(w->response.data);
+ buffer_strcat(w->response.data, "Can't handle this request");
+ receiver_state_free(rpt);
+ return HTTP_RESP_INTERNAL_SERVER_ERROR;
+ }
+
+ // prevent the caller from closing the streaming socket
+ return HTTP_RESP_OK;
+}
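The rate-limit check in rrdpush_receiver_thread_spawn() above can be read in isolation as a small pure function. The following is a minimal sketch, not code from this patch: the helper name stream_rate_limit_check() and its parameters are hypothetical, and the caller is assumed to hold the lock that protects the timestamp, as the spinlock does in the patch. It mirrors one detail of the patched code that is easy to miss: the first call only starts the clock, so the first connection after startup is also asked to wait.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

// Hypothetical helper (not part of the patch).
// Returns 0 when a new streaming connection may be accepted now,
// otherwise the number of seconds the child should wait before retrying.
// The caller is assumed to hold the lock protecting *last_accepted.
static long stream_rate_limit_check(time_t *last_accepted, time_t now, time_t min_interval) {
    assert(last_accepted != NULL);

    // a non-positive interval means rate limiting is disabled
    if (min_interval <= 0)
        return 0;

    // as in the patch, the very first call only starts the clock
    if (*last_accepted == 0)
        *last_accepted = now;

    // too soon after the last accepted connection: report the remaining wait
    if (now - *last_accepted < min_interval)
        return (long)(min_interval - (now - *last_accepted));

    // accept the connection and remember when we did
    *last_accepted = now;
    return 0;
}
```

Keeping the decision free of I/O and locking makes it straightforward to unit test; the surrounding code would still be responsible for locking, logging and building the "try later" response.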
diff --git a/src/streaming/receiver.h b/src/streaming/receiver.h
new file mode 100644
index 000000000..a1f208608
--- /dev/null
+++ b/src/streaming/receiver.h
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_RECEIVER_H
+#define NETDATA_RECEIVER_H
+
+#include "libnetdata/libnetdata.h"
+#include "database/rrd.h"
+
+struct parser;
+
+struct receiver_state {
+ RRDHOST *host;
+ pid_t tid;
+ ND_THREAD *thread;
+ int fd;
+ char *key;
+ char *hostname;
+ char *registry_hostname;
+ char *machine_guid;
+ char *os;
+ char *timezone; // Unused?
+ char *abbrev_timezone;
+ int32_t utc_offset;
+ char *client_ip; // Duplicated in pluginsd
+ char *client_port; // Duplicated in pluginsd
+ char *program_name; // Duplicated in pluginsd
+ char *program_version;
+ struct rrdhost_system_info *system_info;
+ STREAM_CAPABILITIES capabilities;
+ time_t last_msg_t;
+ time_t connected_since_s;
+
+ struct buffered_reader reader;
+
+ uint16_t hops;
+
+ struct {
+ bool shutdown; // signal the streaming parser to exit
+ STREAM_HANDSHAKE reason;
+ } exit;
+
+ struct {
+ RRD_MEMORY_MODE mode;
+ int history;
+ int update_every;
+ int health_enabled; // CONFIG_BOOLEAN_YES, CONFIG_BOOLEAN_NO, CONFIG_BOOLEAN_AUTO
+ time_t alarms_delay;
+ uint32_t alarms_history;
+ int rrdpush_enabled;
+ const char *rrdpush_api_key; // DONT FREE - it is allocated in appconfig
+ const char *rrdpush_send_charts_matching; // DONT FREE - it is allocated in appconfig
+ bool rrdpush_enable_replication;
+ time_t rrdpush_seconds_to_replicate;
+ time_t rrdpush_replication_step;
+ const char *rrdpush_destination; // DONT FREE - it is allocated in appconfig
+ unsigned int rrdpush_compression;
+ STREAM_CAPABILITIES compression_priorities[COMPRESSION_ALGORITHM_MAX];
+ } config;
+
+ NETDATA_SSL ssl;
+
+ time_t replication_first_time_t;
+
+ struct decompressor_state decompressor;
+ /*
+ struct {
+ uint32_t count;
+ STREAM_NODE_INSTANCE *array;
+ } instances;
+*/
+
+ // The parser pointer is safe to read and use, only when having the host receiver lock.
+ // Without this lock, the data pointed to by the pointer may vanish randomly.
+ // Also, since the receiver sets it when it starts, it should be read with
+ // an atomic read.
+ struct parser *parser;
+
+#ifdef ENABLE_H2O
+ void *h2o_ctx;
+#endif
+};
+
+#ifdef ENABLE_H2O
+#define is_h2o_rrdpush(x) ((x)->h2o_ctx != NULL)
+#define unless_h2o_rrdpush(x) if(!is_h2o_rrdpush(x))
+#endif
+
+int rrdpush_receiver_thread_spawn(struct web_client *w, char *decoded_query_string, void *h2o_ctx);
+
+void receiver_state_free(struct receiver_state *rpt);
+bool stop_streaming_receiver(RRDHOST *host, STREAM_HANDSHAKE reason);
+
+#endif //NETDATA_RECEIVER_H
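The last_msg_t field of struct receiver_state drives the duplicate-connection check in rrdpush_receiver_thread_spawn(): an existing receiver that has been silent for 30 seconds or more is treated as stale and may be stopped, while a more recent one blocks the new connection. The sketch below restates that classification as a standalone function. The enum, the function name and the STALE_RECEIVER_AGE_SECS macro are hypothetical names introduced for illustration; only the 30-second threshold and the monotonic-clock comparison come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

// Assumption: names the hard-coded 30-second threshold used in the patch.
#define STALE_RECEIVER_AGE_SECS 30

// Hypothetical classification of the receiver currently attached to a host.
typedef enum {
    RECEIVER_NONE,     // no receiver is attached to the host
    RECEIVER_WORKING,  // a receiver got data recently; reject the new connection
    RECEIVER_STALE,    // a receiver is attached but silent; try to stop it
} receiver_liveness_t;

// Hypothetical helper (not part of the patch).
// 'now_monotonic' and 'last_msg_t' are expected to come from the same
// monotonic clock; '*age_out' receives the silence period in seconds.
static receiver_liveness_t classify_receiver(bool has_receiver, time_t now_monotonic,
                                             time_t last_msg_t, time_t *age_out) {
    assert(age_out != NULL);
    *age_out = 0;

    if (!has_receiver)
        return RECEIVER_NONE;

    *age_out = now_monotonic - last_msg_t;
    return (*age_out < STALE_RECEIVER_AGE_SECS) ? RECEIVER_WORKING : RECEIVER_STALE;
}
```

In the patch the equivalent check runs while holding host->receiver_lock, so a real caller would take that lock around the read of last_msg_t and release it before acting on the result.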
diff --git a/src/streaming/replication.c b/src/streaming/replication.c
index 1f5aeb34c..1f2c3140d 100644
--- a/src/streaming/replication.c
+++ b/src/streaming/replication.c
@@ -612,6 +612,7 @@ static struct replication_query *replication_response_prepare(
}
void replication_response_cancel_and_finalize(struct replication_query *q) {
+ if(!q) return;
replication_query_finalize(NULL, q, false);
}
@@ -718,7 +719,7 @@ bool replication_response_execute_and_finalize(struct replication_query *q, size
struct replication_request_details {
struct {
send_command callback;
- void *data;
+ struct parser *parser;
} caller;
RRDHOST *host;
@@ -826,7 +827,7 @@ static bool send_replay_chart_cmd(struct replication_request_details *r, const c
rrdset_id(st), r->wanted.start_streaming ? "true" : "false",
(unsigned long long)r->wanted.after, (unsigned long long)r->wanted.before);
- ssize_t ret = r->caller.callback(buffer, r->caller.data);
+ ssize_t ret = r->caller.callback(buffer, r->caller.parser);
if (ret < 0) {
netdata_log_error("REPLAY ERROR: 'host:%s/chart:%s' failed to send replication request to child (error %zd)",
rrdhost_hostname(r->host), rrdset_id(r->st), ret);
@@ -836,14 +837,14 @@ static bool send_replay_chart_cmd(struct replication_request_details *r, const c
return true;
}
-bool replicate_chart_request(send_command callback, void *callback_data, RRDHOST *host, RRDSET *st,
+bool replicate_chart_request(send_command callback, struct parser *parser, RRDHOST *host, RRDSET *st,
time_t child_first_entry, time_t child_last_entry, time_t child_wall_clock_time,
time_t prev_first_entry_wanted, time_t prev_last_entry_wanted)
{
struct replication_request_details r = {
.caller = {
.callback = callback,
- .data = callback_data,
+ .parser = parser,
},
.host = host,
diff --git a/src/streaming/replication.h b/src/streaming/replication.h
index 507b7c32f..27baeaf35 100644
--- a/src/streaming/replication.h
+++ b/src/streaming/replication.h
@@ -5,6 +5,8 @@
#include "daemon/common.h"
+struct parser;
+
struct replication_query_statistics {
SPINLOCK spinlock;
size_t queries_started;
@@ -17,9 +19,9 @@ struct replication_query_statistics replication_get_query_statistics(void);
bool replicate_chart_response(RRDHOST *rh, RRDSET *rs, bool start_streaming, time_t after, time_t before);
-typedef ssize_t (*send_command)(const char *txt, void *data);
+typedef ssize_t (*send_command)(const char *txt, struct parser *parser);
-bool replicate_chart_request(send_command callback, void *callback_data,
+bool replicate_chart_request(send_command callback, struct parser *parser,
RRDHOST *rh, RRDSET *rs,
time_t child_first_entry, time_t child_last_entry, time_t child_wall_clock_time,
time_t response_first_start_time, time_t response_last_end_time);
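The change from void * to struct parser * in the send_command typedef lets the compiler reject callers that pass the wrong kind of context. The following is a minimal sketch of how a callback with the new signature can be written and invoked. The struct parser body, record_command() and send_replay_request() are hypothetical stand-ins for illustration; the real struct parser is defined elsewhere in Netdata and is only forward-declared in replication.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   // ssize_t

// Hypothetical stand-in for the real (opaque) parser structure.
struct parser {
    char last_command[128];
    size_t commands_sent;
};

// Same shape as the typedef in replication.h after this patch.
typedef ssize_t (*send_command)(const char *txt, struct parser *parser);

// Hypothetical callback: records the command instead of writing to a socket.
// Returns the number of bytes "sent", or -1 on error.
static ssize_t record_command(const char *txt, struct parser *parser) {
    if (!txt || !parser)
        return -1;

    size_t len = strlen(txt);
    if (len >= sizeof(parser->last_command))
        return -1; // the command does not fit in the buffer

    memcpy(parser->last_command, txt, len + 1); // include the terminating NUL
    parser->commands_sent++;
    return (ssize_t)len;
}

// Hypothetical caller: treats a negative return value as failure, the same
// convention send_replay_chart_cmd() uses for its callback.
static bool send_replay_request(send_command callback, struct parser *parser, const char *txt) {
    assert(callback != NULL);
    return callback(txt, parser) >= 0;
}
```

With the typed signature, passing an unrelated pointer type as the second argument produces a compiler diagnostic, which the former void * parameter would have accepted silently.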
diff --git a/src/streaming/rrdhost-status.c b/src/streaming/rrdhost-status.c
new file mode 100644
index 000000000..c34fa693e
--- /dev/null
+++ b/src/streaming/rrdhost-status.c
@@ -0,0 +1,355 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "rrdhost-status.h"
+
+const char *rrdhost_db_status_to_string(RRDHOST_DB_STATUS status) {
+ switch(status) {
+ default:
+ case RRDHOST_DB_STATUS_INITIALIZING:
+ return "initializing";
+
+ case RRDHOST_DB_STATUS_QUERYABLE:
+ return "online";
+ }
+}
+
+const char *rrdhost_db_liveness_to_string(RRDHOST_DB_LIVENESS status) {
+ switch(status) {
+ default:
+ case RRDHOST_DB_LIVENESS_STALE:
+ return "stale";
+
+ case RRDHOST_DB_LIVENESS_LIVE:
+ return "live";
+ }
+}
+
+const char *rrdhost_ingest_status_to_string(RRDHOST_INGEST_STATUS status) {
+ switch(status) {
+ case RRDHOST_INGEST_STATUS_ARCHIVED:
+ return "archived";
+
+ case RRDHOST_INGEST_STATUS_INITIALIZING:
+ return "initializing";
+
+ case RRDHOST_INGEST_STATUS_REPLICATING:
+ return "replicating";
+
+ case RRDHOST_INGEST_STATUS_ONLINE:
+ return "online";
+
+ default:
+ case RRDHOST_INGEST_STATUS_OFFLINE:
+ return "offline";
+ }
+}
+
+const char *rrdhost_ingest_type_to_string(RRDHOST_INGEST_TYPE type) {
+ switch(type) {
+ case RRDHOST_INGEST_TYPE_LOCALHOST:
+ return "localhost";
+
+ case RRDHOST_INGEST_TYPE_VIRTUAL:
+ return "virtual";
+
+ case RRDHOST_INGEST_TYPE_CHILD:
+ return "child";
+
+ default:
+ case RRDHOST_INGEST_TYPE_ARCHIVED:
+ return "archived";
+ }
+}
+
+const char *rrdhost_streaming_status_to_string(RRDHOST_STREAMING_STATUS status) {
+ switch(status) {
+ case RRDHOST_STREAM_STATUS_DISABLED:
+ return "disabled";
+
+ case RRDHOST_STREAM_STATUS_REPLICATING:
+ return "replicating";
+
+ case RRDHOST_STREAM_STATUS_ONLINE:
+ return "online";
+
+ default:
+ case RRDHOST_STREAM_STATUS_OFFLINE:
+ return "offline";
+ }
+}
+
+const char *rrdhost_ml_status_to_string(RRDHOST_ML_STATUS status) {
+ switch(status) {
+ case RRDHOST_ML_STATUS_RUNNING:
+ return "online";
+
+ case RRDHOST_ML_STATUS_OFFLINE:
+ return "offline";
+
+ default:
+ case RRDHOST_ML_STATUS_DISABLED:
+ return "disabled";
+ }
+}
+
+const char *rrdhost_ml_type_to_string(RRDHOST_ML_TYPE type) {
+ switch(type) {
+ case RRDHOST_ML_TYPE_SELF:
+ return "self";
+
+ case RRDHOST_ML_TYPE_RECEIVED:
+ return "received";
+
+ default:
+ case RRDHOST_ML_TYPE_DISABLED:
+ return "disabled";
+ }
+}
+
+const char *rrdhost_health_status_to_string(RRDHOST_HEALTH_STATUS status) {
+ switch(status) {
+ default:
+ case RRDHOST_HEALTH_STATUS_DISABLED:
+ return "disabled";
+
+ case RRDHOST_HEALTH_STATUS_INITIALIZING:
+ return "initializing";
+
+ case RRDHOST_HEALTH_STATUS_RUNNING:
+ return "online";
+ }
+}
+
+const char *rrdhost_dyncfg_status_to_string(RRDHOST_DYNCFG_STATUS status) {
+ switch(status) {
+ default:
+ case RRDHOST_DYNCFG_STATUS_UNAVAILABLE:
+ return "unavailable";
+
+ case RRDHOST_DYNCFG_STATUS_AVAILABLE:
+ return "online";
+ }
+}
+
+static NETDATA_DOUBLE rrdhost_sender_replication_completion_unsafe(RRDHOST *host, time_t now, size_t *instances) {
+ size_t charts = rrdhost_sender_replicating_charts(host);
+ NETDATA_DOUBLE completion;
+ if(!charts || !host->sender || !host->sender->replication.oldest_request_after_t)
+ completion = 100.0;
+ else if(!host->sender->replication.latest_completed_before_t || host->sender->replication.latest_completed_before_t < host->sender->replication.oldest_request_after_t)
+ completion = 0.0;
+ else {
+ time_t total = now - host->sender->replication.oldest_request_after_t;
+ time_t current = host->sender->replication.latest_completed_before_t - host->sender->replication.oldest_request_after_t;
+ completion = (NETDATA_DOUBLE) current * 100.0 / (NETDATA_DOUBLE) total;
+ }
+
+ *instances = charts;
+
+ return completion;
+}
+
+void rrdhost_status(RRDHOST *host, time_t now, RRDHOST_STATUS *s) {
+ memset(s, 0, sizeof(*s));
+
+ s->host = host;
+ s->now = now;
+
+ RRDHOST_FLAGS flags = __atomic_load_n(&host->flags, __ATOMIC_RELAXED);
+
+ // --- dyncfg ---
+
+ s->dyncfg.status = dyncfg_available_for_rrdhost(host) ? RRDHOST_DYNCFG_STATUS_AVAILABLE : RRDHOST_DYNCFG_STATUS_UNAVAILABLE;
+
+ // --- db ---
+
+ bool online = rrdhost_is_online(host);
+
+ rrdhost_retention(host, now, online, &s->db.first_time_s, &s->db.last_time_s);
+ s->db.metrics = host->rrdctx.metrics;
+ s->db.instances = host->rrdctx.instances;
+ s->db.contexts = dictionary_entries(host->rrdctx.contexts);
+ if(!s->db.first_time_s || !s->db.last_time_s || !s->db.metrics || !s->db.instances || !s->db.contexts ||
+ (flags & (RRDHOST_FLAG_PENDING_CONTEXT_LOAD)))
+ s->db.status = RRDHOST_DB_STATUS_INITIALIZING;
+ else
+ s->db.status = RRDHOST_DB_STATUS_QUERYABLE;
+
+ s->db.mode = host->rrd_memory_mode;
+
+ // --- ingest ---
+
+ s->ingest.since = MAX(host->child_connect_time, host->child_disconnected_time);
+ s->ingest.reason = (online) ? STREAM_HANDSHAKE_NEVER : host->rrdpush_last_receiver_exit_reason;
+
+ spinlock_lock(&host->receiver_lock);
+ s->ingest.hops = (host->system_info ? host->system_info->hops : (host == localhost) ? 0 : 1);
+ bool has_receiver = false;
+ if (host->receiver && !rrdhost_flag_check(host, RRDHOST_FLAG_RRDPUSH_RECEIVER_DISCONNECTED)) {
+ has_receiver = true;
+ s->ingest.replication.instances = rrdhost_receiver_replicating_charts(host);
+ s->ingest.replication.completion = host->rrdpush_receiver_replication_percent;
+ s->ingest.replication.in_progress = s->ingest.replication.instances > 0;
+
+ s->ingest.capabilities = host->receiver->capabilities;
+ s->ingest.peers = socket_peers(host->receiver->fd);
+ s->ingest.ssl = SSL_connection(&host->receiver->ssl);
+ }
+ spinlock_unlock(&host->receiver_lock);
+
+ if (online) {
+ if(s->db.status == RRDHOST_DB_STATUS_INITIALIZING)
+ s->ingest.status = RRDHOST_INGEST_STATUS_INITIALIZING;
+
+ else if (host == localhost || rrdhost_option_check(host, RRDHOST_OPTION_VIRTUAL_HOST)) {
+ s->ingest.status = RRDHOST_INGEST_STATUS_ONLINE;
+ s->ingest.since = netdata_start_time;
+ }
+
+ else if (s->ingest.replication.in_progress)
+ s->ingest.status = RRDHOST_INGEST_STATUS_REPLICATING;
+
+ else
+ s->ingest.status = RRDHOST_INGEST_STATUS_ONLINE;
+ }
+ else {
+ if (!s->ingest.since) {
+ s->ingest.status = RRDHOST_INGEST_STATUS_ARCHIVED;
+ s->ingest.since = s->db.last_time_s;
+ }
+
+ else
+ s->ingest.status = RRDHOST_INGEST_STATUS_OFFLINE;
+ }
+
+ if(host == localhost)
+ s->ingest.type = RRDHOST_INGEST_TYPE_LOCALHOST;
+ else if(has_receiver)
+ s->ingest.type = RRDHOST_INGEST_TYPE_CHILD;
+ else if(rrdhost_option_check(host, RRDHOST_OPTION_VIRTUAL_HOST))
+ s->ingest.type = RRDHOST_INGEST_TYPE_VIRTUAL;
+ else
+ s->ingest.type = RRDHOST_INGEST_TYPE_ARCHIVED;
+
+ s->ingest.id = host->rrdpush_receiver_connection_counter;
+
+ if(!s->ingest.since)
+ s->ingest.since = netdata_start_time;
+
+ if(s->ingest.status == RRDHOST_INGEST_STATUS_ONLINE)
+ s->db.liveness = RRDHOST_DB_LIVENESS_LIVE;
+ else
+ s->db.liveness = RRDHOST_DB_LIVENESS_STALE;
+
+ // --- stream ---
+
+ if (!host->sender) {
+ s->stream.status = RRDHOST_STREAM_STATUS_DISABLED;
+ s->stream.hops = s->ingest.hops + 1;
+ }
+ else {
+ sender_lock(host->sender);
+
+ s->stream.since = host->sender->last_state_since_t;
+ s->stream.peers = socket_peers(host->sender->rrdpush_sender_socket);
+ s->stream.ssl = SSL_connection(&host->sender->ssl);
+
+ memcpy(s->stream.sent_bytes_on_this_connection_per_type,
+ host->sender->sent_bytes_on_this_connection_per_type,
+ MIN(sizeof(s->stream.sent_bytes_on_this_connection_per_type),
+ sizeof(host->sender->sent_bytes_on_this_connection_per_type)));
+
+ if (rrdhost_flag_check(host, RRDHOST_FLAG_RRDPUSH_SENDER_CONNECTED)) {
+ s->stream.hops = host->sender->hops;
+ s->stream.reason = STREAM_HANDSHAKE_NEVER;
+ s->stream.capabilities = host->sender->capabilities;
+
+ s->stream.replication.completion = rrdhost_sender_replication_completion_unsafe(host, now, &s->stream.replication.instances);
+ s->stream.replication.in_progress = s->stream.replication.instances > 0;
+
+ if(s->stream.replication.in_progress)
+ s->stream.status = RRDHOST_STREAM_STATUS_REPLICATING;
+ else
+ s->stream.status = RRDHOST_STREAM_STATUS_ONLINE;
+
+ s->stream.compression = host->sender->compressor.initialized;
+ }
+ else {
+ s->stream.status = RRDHOST_STREAM_STATUS_OFFLINE;
+ s->stream.hops = s->ingest.hops + 1;
+ s->stream.reason = host->sender->exit.reason;
+ }
+
+ sender_unlock(host->sender);
+ }
+
+ s->stream.id = host->rrdpush_sender_connection_counter;
+
+ if(!s->stream.since)
+ s->stream.since = netdata_start_time;
+
+ // --- ml ---
+
+ if(ml_host_get_host_status(host, &s->ml.metrics)) {
+ s->ml.type = RRDHOST_ML_TYPE_SELF;
+
+ if(s->ingest.status == RRDHOST_INGEST_STATUS_OFFLINE || s->ingest.status == RRDHOST_INGEST_STATUS_ARCHIVED)
+ s->ml.status = RRDHOST_ML_STATUS_OFFLINE;
+ else
+ s->ml.status = RRDHOST_ML_STATUS_RUNNING;
+ }
+ else if(stream_has_capability(&s->ingest, STREAM_CAP_DATA_WITH_ML)) {
+ s->ml.type = RRDHOST_ML_TYPE_RECEIVED;
+ s->ml.status = RRDHOST_ML_STATUS_RUNNING;
+ }
+ else {
+ // does not receive ML, does not run ML
+ s->ml.type = RRDHOST_ML_TYPE_DISABLED;
+ s->ml.status = RRDHOST_ML_STATUS_DISABLED;
+ }
+
+ // --- health ---
+
+ if(host->health.health_enabled) {
+ if(flags & RRDHOST_FLAG_PENDING_HEALTH_INITIALIZATION)
+ s->health.status = RRDHOST_HEALTH_STATUS_INITIALIZING;
+ else {
+ s->health.status = RRDHOST_HEALTH_STATUS_RUNNING;
+
+ RRDCALC *rc;
+ foreach_rrdcalc_in_rrdhost_read(host, rc) {
+ if (unlikely(!rc->rrdset || !rc->rrdset->last_collected_time.tv_sec))
+ continue;
+
+ switch (rc->status) {
+ default:
+ case RRDCALC_STATUS_REMOVED:
+ break;
+
+ case RRDCALC_STATUS_CLEAR:
+ s->health.alerts.clear++;
+ break;
+
+ case RRDCALC_STATUS_WARNING:
+ s->health.alerts.warning++;
+ break;
+
+ case RRDCALC_STATUS_CRITICAL:
+ s->health.alerts.critical++;
+ break;
+
+ case RRDCALC_STATUS_UNDEFINED:
+ s->health.alerts.undefined++;
+ break;
+
+ case RRDCALC_STATUS_UNINITIALIZED:
+ s->health.alerts.uninitialized++;
+ break;
+ }
+ }
+ foreach_rrdcalc_in_rrdhost_done(rc);
+ }
+ }
+ else
+ s->health.status = RRDHOST_HEALTH_STATUS_DISABLED;
+}
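
Review note: the ingest branch of `rrdhost_status()` above can be read as a small decision function. The sketch below restates that branch in isolation so the precedence of the checks is easier to follow. It is not the code from the commit: the `sketch_` names, the input struct, and the reduction of the host and receiver state to plain booleans are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for RRDHOST_INGEST_STATUS (same order as the header). */
typedef enum {
    SKETCH_INGEST_ARCHIVED = 0,
    SKETCH_INGEST_INITIALIZING,
    SKETCH_INGEST_REPLICATING,
    SKETCH_INGEST_ONLINE,
    SKETCH_INGEST_OFFLINE,
} sketch_ingest_status_t;

/* Hypothetical flattened inputs; the real function reads these from RRDHOST. */
struct sketch_ingest_inputs {
    bool online;            /* host is currently collected or received */
    bool db_initializing;   /* db.status is still INITIALIZING */
    bool local_or_virtual;  /* host is localhost or a virtual host */
    bool replicating;       /* ingest.replication.in_progress */
    time_t since;           /* ingest.since, 0 if there was never a connection */
};

/* Mirrors the order of checks in the diff: a database that is still
 * initializing wins, then localhost/virtual hosts, then replication. */
static sketch_ingest_status_t sketch_ingest_status(const struct sketch_ingest_inputs *in) {
    if (in->online) {
        if (in->db_initializing)
            return SKETCH_INGEST_INITIALIZING;
        if (in->local_or_virtual)
            return SKETCH_INGEST_ONLINE;
        if (in->replicating)
            return SKETCH_INGEST_REPLICATING;
        return SKETCH_INGEST_ONLINE;
    }

    /* Not online: with no recorded connection time the host is archived. */
    return in->since ? SKETCH_INGEST_OFFLINE : SKETCH_INGEST_ARCHIVED;
}

/* db.liveness is LIVE only when the ingest status is exactly ONLINE. */
static bool sketch_db_is_live(sketch_ingest_status_t status) {
    return status == SKETCH_INGEST_ONLINE;
}
```

One consequence worth noting when reading the diff: a host that is still replicating is reported with stale liveness, because only `RRDHOST_INGEST_STATUS_ONLINE` maps to `RRDHOST_DB_LIVENESS_LIVE`.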
diff --git a/src/streaming/rrdhost-status.h b/src/streaming/rrdhost-status.h
new file mode 100644
index 000000000..21298e268
--- /dev/null
+++ b/src/streaming/rrdhost-status.h
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_RRDHOST_STATUS_H
+#define NETDATA_RRDHOST_STATUS_H
+
+#include "libnetdata/libnetdata.h"
+#include "stream-handshake.h"
+#include "stream-capabilities.h"
+#include "database/rrd.h"
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_DB_STATUS_INITIALIZING = 0,
+ RRDHOST_DB_STATUS_QUERYABLE,
+} RRDHOST_DB_STATUS;
+
+const char *rrdhost_db_status_to_string(RRDHOST_DB_STATUS status);
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_DB_LIVENESS_STALE = 0,
+ RRDHOST_DB_LIVENESS_LIVE,
+} RRDHOST_DB_LIVENESS;
+
+const char *rrdhost_db_liveness_to_string(RRDHOST_DB_LIVENESS status);
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_INGEST_STATUS_ARCHIVED = 0,
+ RRDHOST_INGEST_STATUS_INITIALIZING,
+ RRDHOST_INGEST_STATUS_REPLICATING,
+ RRDHOST_INGEST_STATUS_ONLINE,
+ RRDHOST_INGEST_STATUS_OFFLINE,
+} RRDHOST_INGEST_STATUS;
+
+const char *rrdhost_ingest_status_to_string(RRDHOST_INGEST_STATUS status);
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_INGEST_TYPE_LOCALHOST = 0,
+ RRDHOST_INGEST_TYPE_VIRTUAL,
+ RRDHOST_INGEST_TYPE_CHILD,
+ RRDHOST_INGEST_TYPE_ARCHIVED,
+} RRDHOST_INGEST_TYPE;
+
+const char *rrdhost_ingest_type_to_string(RRDHOST_INGEST_TYPE type);
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_STREAM_STATUS_DISABLED = 0,
+ RRDHOST_STREAM_STATUS_REPLICATING,
+ RRDHOST_STREAM_STATUS_ONLINE,
+ RRDHOST_STREAM_STATUS_OFFLINE,
+} RRDHOST_STREAMING_STATUS;
+
+const char *rrdhost_streaming_status_to_string(RRDHOST_STREAMING_STATUS status);
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_ML_STATUS_DISABLED = 0,
+ RRDHOST_ML_STATUS_OFFLINE,
+ RRDHOST_ML_STATUS_RUNNING,
+} RRDHOST_ML_STATUS;
+
+const char *rrdhost_ml_status_to_string(RRDHOST_ML_STATUS status);
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_ML_TYPE_DISABLED = 0,
+ RRDHOST_ML_TYPE_SELF,
+ RRDHOST_ML_TYPE_RECEIVED,
+} RRDHOST_ML_TYPE;
+
+const char *rrdhost_ml_type_to_string(RRDHOST_ML_TYPE type);
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_HEALTH_STATUS_DISABLED = 0,
+ RRDHOST_HEALTH_STATUS_INITIALIZING,
+ RRDHOST_HEALTH_STATUS_RUNNING,
+} RRDHOST_HEALTH_STATUS;
+
+const char *rrdhost_health_status_to_string(RRDHOST_HEALTH_STATUS status);
+
+typedef enum __attribute__((packed)) {
+ RRDHOST_DYNCFG_STATUS_UNAVAILABLE = 0,
+ RRDHOST_DYNCFG_STATUS_AVAILABLE,
+} RRDHOST_DYNCFG_STATUS;
+
+const char *rrdhost_dyncfg_status_to_string(RRDHOST_DYNCFG_STATUS status);
+
+typedef struct {
+ RRDHOST *host;
+ time_t now;
+
+ struct {
+ RRDHOST_DYNCFG_STATUS status;
+ } dyncfg;
+
+ struct {
+ RRDHOST_DB_STATUS status;
+ RRDHOST_DB_LIVENESS liveness;
+ RRD_MEMORY_MODE mode;
+ time_t first_time_s;
+ time_t last_time_s;
+ size_t metrics;
+ size_t instances;
+ size_t contexts;
+ } db;
+
+ struct {
+ RRDHOST_ML_STATUS status;
+ RRDHOST_ML_TYPE type;
+ struct ml_metrics_statistics metrics;
+ } ml;
+
+ struct {
+ size_t hops;
+ RRDHOST_INGEST_TYPE type;
+ RRDHOST_INGEST_STATUS status;
+ SOCKET_PEERS peers;
+ bool ssl;
+ STREAM_CAPABILITIES capabilities;
+ uint32_t id;
+ time_t since;
+ STREAM_HANDSHAKE reason;
+
+ struct {
+ bool in_progress;
+ NETDATA_DOUBLE completion;
+ size_t instances;
+ } replication;
+ } ingest;
+
+ struct {
+ size_t hops;
+ RRDHOST_STREAMING_STATUS status;
+ SOCKET_PEERS peers;
+ bool ssl;
+ bool compression;
+ STREAM_CAPABILITIES capabilities;
+ uint32_t id;
+ time_t since;
+ STREAM_HANDSHAKE reason;
+
+ struct {
+ bool in_progress;
+ NETDATA_DOUBLE completion;
+ size_t instances;
+ } replication;
+
+ size_t sent_bytes_on_this_connection_per_type[STREAM_TRAFFIC_TYPE_MAX];
+ } stream;
+
+ struct {
+ RRDHOST_HEALTH_STATUS status;
+ struct {
+ uint32_t undefined;
+ uint32_t uninitialized;
+ uint32_t clear;
+ uint32_t warning;
+ uint32_t critical;
+ } alerts;
+ } health;
+} RRDHOST_STATUS;
+
+void rrdhost_status(RRDHOST *host, time_t now, RRDHOST_STATUS *s);
+
+#endif //NETDATA_RRDHOST_STATUS_H
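
Review note: the header declares one `*_to_string()` helper per packed enum, but their bodies are not part of this hunk. The sketch below shows one plausible shape for such a helper. The `sketch_` names and the returned strings are assumptions, not the strings used by the commit. It relies on the GCC/Clang `packed` attribute, which shrinks the enum to the smallest integer type that can hold its values.

```c
#include <stddef.h>

/* Hypothetical copy of the enum layout; packed is a GCC/Clang extension. */
typedef enum __attribute__((packed)) {
    SKETCH_STATUS_ARCHIVED = 0,
    SKETCH_STATUS_INITIALIZING,
    SKETCH_STATUS_REPLICATING,
    SKETCH_STATUS_ONLINE,
    SKETCH_STATUS_OFFLINE,
} sketch_status_t;

/* Maps each value to a label; the labels here are illustrative only. */
static const char *sketch_status_to_string(sketch_status_t status) {
    switch (status) {
        case SKETCH_STATUS_ARCHIVED:     return "archived";
        case SKETCH_STATUS_INITIALIZING: return "initializing";
        case SKETCH_STATUS_REPLICATING:  return "replicating";
        case SKETCH_STATUS_ONLINE:       return "online";
        case SKETCH_STATUS_OFFLINE:      return "offline";
        default:                         return "unknown";
    }
}

/* Size of the packed enum, exposed so callers can check the layout. */
static size_t sketch_status_size(void) {
    return sizeof(sketch_status_t);
}
```

Keeping a `default` branch means a value outside the declared range still yields a printable label, which is useful because `RRDHOST_STATUS` is filled field by field and may be logged or serialized before every field is set.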
diff --git a/src/streaming/rrdpush.c b/src/streaming/rrdpush.c
deleted file mode 100644
index 23a86e720..000000000
--- a/src/streaming/rrdpush.c
+++ /dev/null
@@ -1,1418 +0,0 @@
-// SPDX-License-Identifier: GPL-3.0-or-later
-
-#include "rrdpush.h"
-
-/*
- * rrdpush
- *
- * 3 threads are involved for all stream operations
- *
- * 1. a random data collection thread, calling rrdset_done_push()
- * this is called for each chart.
- *
- * the output of this work is kept in a thread BUFFER
- * the sender thread is signalled via a pipe (in RRDHOST)
- *
- * 2. a sender thread running at the sending netdata
- * this is spawned automatically on the first chart to be pushed
- *
- * It tries to push the metrics to the remote netdata, as fast
- * as possible (i.e. immediately after they are collected).
- *
- * 3. a receiver thread, running at the receiving netdata
- * this is spawned automatically when the sender connects to
- * the receiver.
- *
- */
-
-struct config stream_config = {
- .first_section = NULL,
- .last_section = NULL,
- .mutex = NETDATA_MUTEX_INITIALIZER,
- .index = {
- .avl_tree = {
- .root = NULL,
- .compar = appconfig_section_compare
- },
- .rwlock = AVL_LOCK_INITIALIZER
- }
-};
-
-unsigned int default_rrdpush_enabled = 0;
-STREAM_CAPABILITIES globally_disabled_capabilities = STREAM_CAP_NONE;
-
-unsigned int default_rrdpush_compression_enabled = 1;
-char *default_rrdpush_destination = NULL;
-char *default_rrdpush_api_key = NULL;
-char *default_rrdpush_send_charts_matching = NULL;
-bool default_rrdpush_enable_replication = true;
-time_t default_rrdpush_seconds_to_replicate = 86400;
-time_t default_rrdpush_replication_step = 600;
-#ifdef ENABLE_HTTPS
-char *netdata_ssl_ca_path = NULL;
-char *netdata_ssl_ca_file = NULL;
-#endif
-
-static void load_stream_conf() {
- errno_clear();
- char *filename = strdupz_path_subpath(netdata_configured_user_config_dir, "stream.conf");
- if(!appconfig_load(&stream_config, filename, 0, NULL)) {
- nd_log_daemon(NDLP_NOTICE, "CONFIG: cannot load user config '%s'. Will try stock config.", filename);
- freez(filename);
-
- filename = strdupz_path_subpath(netdata_configured_stock_config_dir, "stream.conf");
- if(!appconfig_load(&stream_config, filename, 0, NULL))
- nd_log_daemon(NDLP_NOTICE, "CONFIG: cannot load stock config '%s'. Running with internal defaults.", filename);
- }
- freez(filename);
-}
-
-bool rrdpush_receiver_needs_dbengine() {
- struct section *co;
-
- for(co = stream_config.first_section; co; co = co->next) {
- if(strcmp(co->name, "stream") == 0)
- continue; // the first section is not relevant
-
- char *s;
-
- s = appconfig_get_by_section(co, "enabled", NULL);
- if(!s || !appconfig_test_boolean_value(s))
- continue;
-
- s = appconfig_get_by_section(co, "default memory mode", NULL);
- if(s && strcmp(s, "dbengine") == 0)
- return true;
-
- s = appconfig_get_by_section(co, "memory mode", NULL);
- if(s && strcmp(s, "dbengine") == 0)
- return true;
- }
-
- return false;
-}
-
-int rrdpush_init() {
- // --------------------------------------------------------------------
- // load stream.conf
- load_stream_conf();
-
- default_rrdpush_enabled = (unsigned int)appconfig_get_boolean(&stream_config, CONFIG_SECTION_STREAM, "enabled", default_rrdpush_enabled);
- default_rrdpush_destination = appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "destination", "");
- default_rrdpush_api_key = appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "api key", "");
- default_rrdpush_send_charts_matching = appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "send charts matching", "*");
-
- default_rrdpush_enable_replication = config_get_boolean(CONFIG_SECTION_DB, "enable replication", default_rrdpush_enable_replication);
- default_rrdpush_seconds_to_replicate = config_get_number(CONFIG_SECTION_DB, "seconds to replicate", default_rrdpush_seconds_to_replicate);
- default_rrdpush_replication_step = config_get_number(CONFIG_SECTION_DB, "seconds per replication step", default_rrdpush_replication_step);
-
- rrdhost_free_orphan_time_s = config_get_number(CONFIG_SECTION_DB, "cleanup orphan hosts after secs", rrdhost_free_orphan_time_s);
-
- default_rrdpush_compression_enabled = (unsigned int)appconfig_get_boolean(&stream_config, CONFIG_SECTION_STREAM,
- "enable compression", default_rrdpush_compression_enabled);
-
- rrdpush_compression_levels[COMPRESSION_ALGORITHM_BROTLI] = (int)appconfig_get_number(
- &stream_config, CONFIG_SECTION_STREAM, "brotli compression level",
- rrdpush_compression_levels[COMPRESSION_ALGORITHM_BROTLI]);
-
- rrdpush_compression_levels[COMPRESSION_ALGORITHM_ZSTD] = (int)appconfig_get_number(
- &stream_config, CONFIG_SECTION_STREAM, "zstd compression level",
- rrdpush_compression_levels[COMPRESSION_ALGORITHM_ZSTD]);
-
- rrdpush_compression_levels[COMPRESSION_ALGORITHM_LZ4] = (int)appconfig_get_number(
- &stream_config, CONFIG_SECTION_STREAM, "lz4 compression acceleration",
- rrdpush_compression_levels[COMPRESSION_ALGORITHM_LZ4]);
-
- rrdpush_compression_levels[COMPRESSION_ALGORITHM_GZIP] = (int)appconfig_get_number(
- &stream_config, CONFIG_SECTION_STREAM, "gzip compression level",
- rrdpush_compression_levels[COMPRESSION_ALGORITHM_GZIP]);
-
- if(default_rrdpush_enabled && (!default_rrdpush_destination || !*default_rrdpush_destination || !default_rrdpush_api_key || !*default_rrdpush_api_key)) {
- nd_log_daemon(NDLP_WARNING, "STREAM [send]: cannot enable sending thread - information is missing.");
- default_rrdpush_enabled = 0;
- }
-
-#ifdef ENABLE_HTTPS
- netdata_ssl_validate_certificate_sender = !appconfig_get_boolean(&stream_config, CONFIG_SECTION_STREAM, "ssl skip certificate verification", !netdata_ssl_validate_certificate);
-
- if(!netdata_ssl_validate_certificate_sender)
- nd_log_daemon(NDLP_NOTICE, "SSL: streaming senders will skip SSL certificates verification.");
-
- netdata_ssl_ca_path = appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "CApath", NULL);
- netdata_ssl_ca_file = appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "CAfile", NULL);
-#endif
-
- return default_rrdpush_enabled;
-}
-
-// data collection happens from multiple threads
-// each of these threads calls rrdset_done()
-// which in turn calls rrdset_done_push()
-// which uses this pipe to notify the streaming thread
-// that there are more data ready to be sent
-#define PIPE_READ 0
-#define PIPE_WRITE 1
-
-// to have the remote netdata re-sync the charts
-// to its current clock, we send for this many
-// iterations a BEGIN line without microseconds
-// this is for the first iterations of each chart
-unsigned int remote_clock_resync_iterations = 60;
-
-static inline bool should_send_chart_matching(RRDSET *st, RRDSET_FLAGS flags) {
- if(!(flags & RRDSET_FLAG_RECEIVER_REPLICATION_FINISHED))
- return false;
-
- if(unlikely(!(flags & (RRDSET_FLAG_UPSTREAM_SEND | RRDSET_FLAG_UPSTREAM_IGNORE)))) {
- RRDHOST *host = st->rrdhost;
-
- if (flags & RRDSET_FLAG_ANOMALY_DETECTION) {
- if(ml_streaming_enabled())
- rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_SEND);
- else
- rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
- }
- else if(simple_pattern_matches_string(host->rrdpush_send_charts_matching, st->id) ||
- simple_pattern_matches_string(host->rrdpush_send_charts_matching, st->name))
-
- rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_SEND);
- else
- rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
-
- // get the flags again, to know how to respond
- flags = rrdset_flag_check(st, RRDSET_FLAG_UPSTREAM_SEND|RRDSET_FLAG_UPSTREAM_IGNORE);
- }
-
- return flags & RRDSET_FLAG_UPSTREAM_SEND;
-}
-
-int configured_as_parent() {
- struct section *section = NULL;
- int is_parent = 0;
-
- appconfig_wrlock(&stream_config);
- for (section = stream_config.first_section; section; section = section->next) {
- nd_uuid_t uuid;
-
- if (uuid_parse(section->name, uuid) != -1 &&
- appconfig_get_boolean_by_section(section, "enabled", 0)) {
- is_parent = 1;
- break;
- }
- }
- appconfig_unlock(&stream_config);
-
- return is_parent;
-}
-
-// chart labels
-static int send_clabels_callback(const char *name, const char *value, RRDLABEL_SRC ls, void *data) {
- BUFFER *wb = (BUFFER *)data;
- buffer_sprintf(wb, PLUGINSD_KEYWORD_CLABEL " \"%s\" \"%s\" %d\n", name, value, ls & ~(RRDLABEL_FLAG_INTERNAL));
- return 1;
-}
-
-static void rrdpush_send_clabels(BUFFER *wb, RRDSET *st) {
- if (st->rrdlabels) {
- if(rrdlabels_walkthrough_read(st->rrdlabels, send_clabels_callback, wb) > 0)
- buffer_sprintf(wb, PLUGINSD_KEYWORD_CLABEL_COMMIT "\n");
- }
-}
-
-// Send the current chart definition.
-// Assumes that collector thread has already called sender_start for mutex / buffer state.
-static inline bool rrdpush_send_chart_definition(BUFFER *wb, RRDSET *st) {
- uint32_t version = rrdset_metadata_version(st);
-
- RRDHOST *host = st->rrdhost;
- NUMBER_ENCODING integer_encoding = stream_has_capability(host->sender, STREAM_CAP_IEEE754) ? NUMBER_ENCODING_BASE64 : NUMBER_ENCODING_HEX;
- bool with_slots = stream_has_capability(host->sender, STREAM_CAP_SLOTS) ? true : false;
-
- bool replication_progress = false;
-
- // properly set the name for the remote end to parse it
- char *name = "";
- if(likely(st->name)) {
- if(unlikely(st->id != st->name)) {
- // they differ
- name = strchr(rrdset_name(st), '.');
- if(name)
- name++;
- else
- name = "";
- }
- }
-
- buffer_fast_strcat(wb, PLUGINSD_KEYWORD_CHART, sizeof(PLUGINSD_KEYWORD_CHART) - 1);
-
- if(with_slots) {
- buffer_fast_strcat(wb, " "PLUGINSD_KEYWORD_SLOT":", sizeof(PLUGINSD_KEYWORD_SLOT) - 1 + 2);
- buffer_print_uint64_encoded(wb, integer_encoding, st->rrdpush.sender.chart_slot);
- }
-
- // send the chart
- buffer_sprintf(
- wb
- , " \"%s\" \"%s\" \"%s\" \"%s\" \"%s\" \"%s\" \"%s\" %d %d \"%s %s %s %s\" \"%s\" \"%s\"\n"
- , rrdset_id(st)
- , name
- , rrdset_title(st)
- , rrdset_units(st)
- , rrdset_family(st)
- , rrdset_context(st)
- , rrdset_type_name(st->chart_type)
- , st->priority
- , st->update_every
- , rrdset_flag_check(st, RRDSET_FLAG_OBSOLETE)?"obsolete":""
- , rrdset_flag_check(st, RRDSET_FLAG_DETAIL)?"detail":""
- , rrdset_flag_check(st, RRDSET_FLAG_STORE_FIRST)?"store_first":""
- , rrdset_flag_check(st, RRDSET_FLAG_HIDDEN)?"hidden":""
- , rrdset_plugin_name(st)
- , rrdset_module_name(st)
- );
-
- // send the chart labels
- if (stream_has_capability(host->sender, STREAM_CAP_CLABELS))
- rrdpush_send_clabels(wb, st);
-
- // send the dimensions
- RRDDIM *rd;
- rrddim_foreach_read(rd, st) {
- buffer_fast_strcat(wb, PLUGINSD_KEYWORD_DIMENSION, sizeof(PLUGINSD_KEYWORD_DIMENSION) - 1);
-
- if(with_slots) {
- buffer_fast_strcat(wb, " "PLUGINSD_KEYWORD_SLOT":", sizeof(PLUGINSD_KEYWORD_SLOT) - 1 + 2);
- buffer_print_uint64_encoded(wb, integer_encoding, rd->rrdpush.sender.dim_slot);
- }
-
- buffer_sprintf(
- wb
- , " \"%s\" \"%s\" \"%s\" %d %d \"%s %s %s\"\n"
- , rrddim_id(rd)
- , rrddim_name(rd)
- , rrd_algorithm_name(rd->algorithm)
- , rd->multiplier
- , rd->divisor
- , rrddim_flag_check(rd, RRDDIM_FLAG_OBSOLETE)?"obsolete":""
- , rrddim_option_check(rd, RRDDIM_OPTION_HIDDEN)?"hidden":""
- , rrddim_option_check(rd, RRDDIM_OPTION_DONT_DETECT_RESETS_OR_OVERFLOWS)?"noreset":""
- );
- }
- rrddim_foreach_done(rd);
-
- // send the chart functions
- if(stream_has_capability(host->sender, STREAM_CAP_FUNCTIONS))
- rrd_chart_functions_expose_rrdpush(st, wb);
-
- // send the chart local custom variables
- rrdvar_print_to_streaming_custom_chart_variables(st, wb);
-
- if (stream_has_capability(host->sender, STREAM_CAP_REPLICATION)) {
- time_t db_first_time_t, db_last_time_t;
-
- time_t now = now_realtime_sec();
- rrdset_get_retention_of_tier_for_collected_chart(st, &db_first_time_t, &db_last_time_t, now, 0);
-
- buffer_sprintf(wb, PLUGINSD_KEYWORD_CHART_DEFINITION_END " %llu %llu %llu\n",
- (unsigned long long)db_first_time_t,
- (unsigned long long)db_last_time_t,
- (unsigned long long)now);
-
- if(!rrdset_flag_check(st, RRDSET_FLAG_SENDER_REPLICATION_IN_PROGRESS)) {
- rrdset_flag_set(st, RRDSET_FLAG_SENDER_REPLICATION_IN_PROGRESS);
- rrdset_flag_clear(st, RRDSET_FLAG_SENDER_REPLICATION_FINISHED);
- rrdhost_sender_replicating_charts_plus_one(st->rrdhost);
- }
- replication_progress = true;
-
-#ifdef NETDATA_LOG_REPLICATION_REQUESTS
- internal_error(true, "REPLAY: 'host:%s/chart:%s' replication starts",
- rrdhost_hostname(st->rrdhost), rrdset_id(st));
-#endif
- }
-
- sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
-
- // we can set the exposed flag, after we commit the buffer
- // because replication may pick it up prematurely
- rrddim_foreach_read(rd, st) {
- rrddim_metadata_exposed_upstream(rd, version);
- }
- rrddim_foreach_done(rd);
- rrdset_metadata_exposed_upstream(st, version);
-
- st->rrdpush.sender.resync_time_s = st->last_collected_time.tv_sec + (remote_clock_resync_iterations * st->update_every);
- return replication_progress;
-}
-
-// sends the current chart dimensions
-static void rrdpush_send_chart_metrics(BUFFER *wb, RRDSET *st, struct sender_state *s __maybe_unused, RRDSET_FLAGS flags) {
- buffer_fast_strcat(wb, "BEGIN \"", 7);
- buffer_fast_strcat(wb, rrdset_id(st), string_strlen(st->id));
- buffer_fast_strcat(wb, "\" ", 2);
-
- if(st->last_collected_time.tv_sec > st->rrdpush.sender.resync_time_s)
- buffer_print_uint64(wb, st->usec_since_last_update);
- else
- buffer_fast_strcat(wb, "0", 1);
-
- buffer_fast_strcat(wb, "\n", 1);
-
- RRDDIM *rd;
- rrddim_foreach_read(rd, st) {
- if(unlikely(!rrddim_check_updated(rd)))
- continue;
-
- if(likely(rrddim_check_upstream_exposed_collector(rd))) {
- buffer_fast_strcat(wb, "SET \"", 5);
- buffer_fast_strcat(wb, rrddim_id(rd), string_strlen(rd->id));
- buffer_fast_strcat(wb, "\" = ", 4);
- buffer_print_int64(wb, rd->collector.collected_value);
- buffer_fast_strcat(wb, "\n", 1);
- }
- else {
- internal_error(true, "STREAM: 'host:%s/chart:%s/dim:%s' flag 'exposed' is updated but not exposed",
- rrdhost_hostname(st->rrdhost), rrdset_id(st), rrddim_id(rd));
- // we will include it in the next iteration
- rrddim_metadata_updated(rd);
- }
- }
- rrddim_foreach_done(rd);
-
- if(unlikely(flags & RRDSET_FLAG_UPSTREAM_SEND_VARIABLES))
- rrdvar_print_to_streaming_custom_chart_variables(st, wb);
-
- buffer_fast_strcat(wb, "END\n", 4);
-}
-
-static void rrdpush_sender_thread_spawn(RRDHOST *host);
-
-// Called from the internal collectors to mark a chart obsolete.
-bool rrdset_push_chart_definition_now(RRDSET *st) {
- RRDHOST *host = st->rrdhost;
-
- if(unlikely(!rrdhost_can_send_definitions_to_parent(host)
- || !should_send_chart_matching(st, rrdset_flag_get(st)))) {
- return false;
- }
-
- BUFFER *wb = sender_start(host->sender);
- rrdpush_send_chart_definition(wb, st);
- sender_thread_buffer_free();
-
- return true;
-}
-
-void rrdset_push_metrics_v1(RRDSET_STREAM_BUFFER *rsb, RRDSET *st) {
- RRDHOST *host = st->rrdhost;
- rrdpush_send_chart_metrics(rsb->wb, st, host->sender, rsb->rrdset_flags);
-}
-
-void rrddim_push_metrics_v2(RRDSET_STREAM_BUFFER *rsb, RRDDIM *rd, usec_t point_end_time_ut, NETDATA_DOUBLE n, SN_FLAGS flags) {
- if(!rsb->wb || !rsb->v2 || !netdata_double_isnumber(n) || !does_storage_number_exist(flags))
- return;
-
- bool with_slots = stream_has_capability(rsb, STREAM_CAP_SLOTS) ? true : false;
- NUMBER_ENCODING integer_encoding = stream_has_capability(rsb, STREAM_CAP_IEEE754) ? NUMBER_ENCODING_BASE64 : NUMBER_ENCODING_HEX;
- NUMBER_ENCODING doubles_encoding = stream_has_capability(rsb, STREAM_CAP_IEEE754) ? NUMBER_ENCODING_BASE64 : NUMBER_ENCODING_DECIMAL;
- BUFFER *wb = rsb->wb;
- time_t point_end_time_s = (time_t)(point_end_time_ut / USEC_PER_SEC);
- if(unlikely(rsb->last_point_end_time_s != point_end_time_s)) {
-
- if(unlikely(rsb->begin_v2_added))
- buffer_fast_strcat(wb, PLUGINSD_KEYWORD_END_V2 "\n", sizeof(PLUGINSD_KEYWORD_END_V2) - 1 + 1);
-
- buffer_fast_strcat(wb, PLUGINSD_KEYWORD_BEGIN_V2, sizeof(PLUGINSD_KEYWORD_BEGIN_V2) - 1);
-
- if(with_slots) {
- buffer_fast_strcat(wb, " "PLUGINSD_KEYWORD_SLOT":", sizeof(PLUGINSD_KEYWORD_SLOT) - 1 + 2);
- buffer_print_uint64_encoded(wb, integer_encoding, rd->rrdset->rrdpush.sender.chart_slot);
- }
-
- buffer_fast_strcat(wb, " '", 2);
- buffer_fast_strcat(wb, rrdset_id(rd->rrdset), string_strlen(rd->rrdset->id));
- buffer_fast_strcat(wb, "' ", 2);
- buffer_print_uint64_encoded(wb, integer_encoding, rd->rrdset->update_every);
- buffer_fast_strcat(wb, " ", 1);
- buffer_print_uint64_encoded(wb, integer_encoding, point_end_time_s);
- buffer_fast_strcat(wb, " ", 1);
- if(point_end_time_s == rsb->wall_clock_time)
- buffer_fast_strcat(wb, "#", 1);
- else
- buffer_print_uint64_encoded(wb, integer_encoding, rsb->wall_clock_time);
- buffer_fast_strcat(wb, "\n", 1);
-
- rsb->last_point_end_time_s = point_end_time_s;
- rsb->begin_v2_added = true;
- }
-
- buffer_fast_strcat(wb, PLUGINSD_KEYWORD_SET_V2, sizeof(PLUGINSD_KEYWORD_SET_V2) - 1);
-
- if(with_slots) {
- buffer_fast_strcat(wb, " "PLUGINSD_KEYWORD_SLOT":", sizeof(PLUGINSD_KEYWORD_SLOT) - 1 + 2);
- buffer_print_uint64_encoded(wb, integer_encoding, rd->rrdpush.sender.dim_slot);
- }
-
- buffer_fast_strcat(wb, " '", 2);
- buffer_fast_strcat(wb, rrddim_id(rd), string_strlen(rd->id));
- buffer_fast_strcat(wb, "' ", 2);
- buffer_print_int64_encoded(wb, integer_encoding, rd->collector.last_collected_value);
- buffer_fast_strcat(wb, " ", 1);
-
- if((NETDATA_DOUBLE)rd->collector.last_collected_value == n)
- buffer_fast_strcat(wb, "#", 1);
- else
- buffer_print_netdata_double_encoded(wb, doubles_encoding, n);
-
- buffer_fast_strcat(wb, " ", 1);
- buffer_print_sn_flags(wb, flags, true);
- buffer_fast_strcat(wb, "\n", 1);
-}
-
-void rrdset_push_metrics_finished(RRDSET_STREAM_BUFFER *rsb, RRDSET *st) {
- if(!rsb->wb)
- return;
-
- if(rsb->v2 && rsb->begin_v2_added) {
- if(unlikely(rsb->rrdset_flags & RRDSET_FLAG_UPSTREAM_SEND_VARIABLES))
- rrdvar_print_to_streaming_custom_chart_variables(st, rsb->wb);
-
- buffer_fast_strcat(rsb->wb, PLUGINSD_KEYWORD_END_V2 "\n", sizeof(PLUGINSD_KEYWORD_END_V2) - 1 + 1);
- }
-
- sender_commit(st->rrdhost->sender, rsb->wb, STREAM_TRAFFIC_TYPE_DATA);
-
- *rsb = (RRDSET_STREAM_BUFFER){ .wb = NULL, };
-}
-
-RRDSET_STREAM_BUFFER rrdset_push_metric_initialize(RRDSET *st, time_t wall_clock_time) {
- RRDHOST *host = st->rrdhost;
-
- // fetch the flags we need to check with one atomic operation
- RRDHOST_FLAGS host_flags = __atomic_load_n(&host->flags, __ATOMIC_SEQ_CST);
-
- // check if we are not connected
- if(unlikely(!(host_flags & RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS))) {
-
- if(unlikely(!(host_flags & (RRDHOST_FLAG_RRDPUSH_SENDER_SPAWN | RRDHOST_FLAG_RRDPUSH_RECEIVER_DISCONNECTED))))
- rrdpush_sender_thread_spawn(host);
-
- if(unlikely(!(host_flags & RRDHOST_FLAG_RRDPUSH_SENDER_LOGGED_STATUS))) {
- rrdhost_flag_set(host, RRDHOST_FLAG_RRDPUSH_SENDER_LOGGED_STATUS);
- nd_log_daemon(NDLP_NOTICE, "STREAM %s [send]: not ready - collected metrics are not sent to parent.", rrdhost_hostname(host));
- }
-
- return (RRDSET_STREAM_BUFFER) { .wb = NULL, };
- }
- else if(unlikely(host_flags & RRDHOST_FLAG_RRDPUSH_SENDER_LOGGED_STATUS)) {
- nd_log_daemon(NDLP_INFO, "STREAM %s [send]: sending metrics to parent...", rrdhost_hostname(host));
- rrdhost_flag_clear(host, RRDHOST_FLAG_RRDPUSH_SENDER_LOGGED_STATUS);
- }
-
- if(unlikely(host_flags & RRDHOST_FLAG_GLOBAL_FUNCTIONS_UPDATED)) {
- BUFFER *wb = sender_start(host->sender);
- rrd_global_functions_expose_rrdpush(host, wb, stream_has_capability(host->sender, STREAM_CAP_DYNCFG));
- sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_FUNCTIONS);
- }
-
- bool exposed_upstream = rrdset_check_upstream_exposed(st);
- RRDSET_FLAGS rrdset_flags = rrdset_flag_get(st);
- bool replication_in_progress = !(rrdset_flags & RRDSET_FLAG_SENDER_REPLICATION_FINISHED);
-
- if(unlikely((exposed_upstream && replication_in_progress) ||
- !should_send_chart_matching(st, rrdset_flags)))
- return (RRDSET_STREAM_BUFFER) { .wb = NULL, };
-
- if(unlikely(!exposed_upstream)) {
- BUFFER *wb = sender_start(host->sender);
- replication_in_progress = rrdpush_send_chart_definition(wb, st);
- }
-
- if(replication_in_progress)
- return (RRDSET_STREAM_BUFFER) { .wb = NULL, };
-
- return (RRDSET_STREAM_BUFFER) {
- .capabilities = host->sender->capabilities,
- .v2 = stream_has_capability(host->sender, STREAM_CAP_INTERPOLATED),
- .rrdset_flags = rrdset_flags,
- .wb = sender_start(host->sender),
- .wall_clock_time = wall_clock_time,
- };
-}
-
-// labels
-static int send_labels_callback(const char *name, const char *value, RRDLABEL_SRC ls, void *data) {
- BUFFER *wb = (BUFFER *)data;
- buffer_sprintf(wb, "LABEL \"%s\" = %d \"%s\"\n", name, ls, value);
- return 1;
-}
-
-void rrdpush_send_host_labels(RRDHOST *host) {
- if(unlikely(!rrdhost_can_send_definitions_to_parent(host)
- || !stream_has_capability(host->sender, STREAM_CAP_HLABELS)))
- return;
-
- BUFFER *wb = sender_start(host->sender);
-
- rrdlabels_walkthrough_read(host->rrdlabels, send_labels_callback, wb);
- buffer_sprintf(wb, "OVERWRITE %s\n", "labels");
-
- sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
-
- sender_thread_buffer_free();
-}
-
-void rrdpush_send_global_functions(RRDHOST *host) {
- if(!stream_has_capability(host->sender, STREAM_CAP_FUNCTIONS))
- return;
-
- if(unlikely(!rrdhost_can_send_definitions_to_parent(host)))
- return;
-
- BUFFER *wb = sender_start(host->sender);
-
- rrd_global_functions_expose_rrdpush(host, wb, stream_has_capability(host->sender, STREAM_CAP_DYNCFG));
-
- sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_FUNCTIONS);
-
- sender_thread_buffer_free();
-}
-
-void rrdpush_send_claimed_id(RRDHOST *host) {
- if(!stream_has_capability(host->sender, STREAM_CAP_CLAIM))
- return;
-
- if(unlikely(!rrdhost_can_send_definitions_to_parent(host)))
- return;
-
- BUFFER *wb = sender_start(host->sender);
- rrdhost_aclk_state_lock(host);
-
- buffer_sprintf(wb, "CLAIMED_ID %s %s\n", host->machine_guid, (host->aclk_state.claimed_id ? host->aclk_state.claimed_id : "NULL") );
-
- rrdhost_aclk_state_unlock(host);
- sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
-
- sender_thread_buffer_free();
-}
-
-int connect_to_one_of_destinations(
- RRDHOST *host,
- int default_port,
- struct timeval *timeout,
- size_t *reconnects_counter,
- char *connected_to,
- size_t connected_to_size,
- struct rrdpush_destinations **destination)
-{
- int sock = -1;
-
- for (struct rrdpush_destinations *d = host->destinations; d; d = d->next) {
- time_t now = now_realtime_sec();
-
- if(nd_thread_signaled_to_cancel())
- return -1;
-
- if(d->postpone_reconnection_until > now)
- continue;
-
- nd_log(NDLS_DAEMON, NDLP_DEBUG,
- "STREAM %s: connecting to '%s' (default port: %d)...",
- rrdhost_hostname(host), string2str(d->destination), default_port);
-
- if (reconnects_counter)
- *reconnects_counter += 1;
-
- d->since = now;
- d->attempts++;
- sock = connect_to_this(string2str(d->destination), default_port, timeout);
-
- if (sock != -1) {
- if (connected_to && connected_to_size)
- strncpyz(connected_to, string2str(d->destination), connected_to_size);
-
- *destination = d;
-
- // move the current item to the end of the list
- // without this, this destination will break the loop again and again
- // not advancing the destinations to find one that may work
- DOUBLE_LINKED_LIST_REMOVE_ITEM_UNSAFE(host->destinations, d, prev, next);
- DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE(host->destinations, d, prev, next);
-
- break;
- }
- }
-
- return sock;
-}
-
-struct destinations_init_tmp {
- RRDHOST *host;
- struct rrdpush_destinations *list;
- int count;
-};
-
-bool destinations_init_add_one(char *entry, void *data) {
- struct destinations_init_tmp *t = data;
-
- struct rrdpush_destinations *d = callocz(1, sizeof(struct rrdpush_destinations));
- char *colon_ssl = strstr(entry, ":SSL");
- if(colon_ssl) {
- *colon_ssl = '\0';
- d->ssl = true;
- }
- else
- d->ssl = false;
-
- d->destination = string_strdupz(entry);
-
- __atomic_add_fetch(&netdata_buffers_statistics.rrdhost_senders, sizeof(struct rrdpush_destinations), __ATOMIC_RELAXED);
-
- DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE(t->list, d, prev, next);
-
- t->count++;
- nd_log_daemon(NDLP_INFO, "STREAM: added streaming destination No %d: '%s' to host '%s'", t->count, string2str(d->destination), rrdhost_hostname(t->host));
-
- return false; // we return false, so that we will get all defined destinations
-}
-
-void rrdpush_destinations_init(RRDHOST *host) {
- if(!host->rrdpush_send_destination) return;
-
- rrdpush_destinations_free(host);
-
- struct destinations_init_tmp t = {
- .host = host,
- .list = NULL,
- .count = 0,
- };
-
- foreach_entry_in_connection_string(host->rrdpush_send_destination, destinations_init_add_one, &t);
-
- host->destinations = t.list;
-}
-
-void rrdpush_destinations_free(RRDHOST *host) {
- while (host->destinations) {
- struct rrdpush_destinations *tmp = host->destinations;
- DOUBLE_LINKED_LIST_REMOVE_ITEM_UNSAFE(host->destinations, tmp, prev, next);
- string_freez(tmp->destination);
- freez(tmp);
- __atomic_sub_fetch(&netdata_buffers_statistics.rrdhost_senders, sizeof(struct rrdpush_destinations), __ATOMIC_RELAXED);
- }
-
- host->destinations = NULL;
-}
-
-// ----------------------------------------------------------------------------
-// rrdpush sender thread
-
-// Either the receiver lost the connection or the host is being destroyed.
-// The sender mutex guards thread creation, any spurious data is wiped on reconnection.
-void rrdpush_sender_thread_stop(RRDHOST *host, STREAM_HANDSHAKE reason, bool wait) {
- if (!host->sender)
- return;
-
- sender_lock(host->sender);
-
- if(rrdhost_flag_check(host, RRDHOST_FLAG_RRDPUSH_SENDER_SPAWN)) {
-
- host->sender->exit.shutdown = true;
- host->sender->exit.reason = reason;
-
- // signal it to cancel
- nd_thread_signal_cancel(host->rrdpush_sender_thread);
- }
-
- sender_unlock(host->sender);
-
- if(wait) {
- sender_lock(host->sender);
- while(host->sender->tid) {
- sender_unlock(host->sender);
- sleep_usec(10 * USEC_PER_MS);
- sender_lock(host->sender);
- }
- sender_unlock(host->sender);
- }
-}
-
-// ----------------------------------------------------------------------------
-// rrdpush receiver thread
-
-static void rrdpush_sender_thread_spawn(RRDHOST *host) {
- sender_lock(host->sender);
-
- if(!rrdhost_flag_check(host, RRDHOST_FLAG_RRDPUSH_SENDER_SPAWN)) {
- char tag[NETDATA_THREAD_TAG_MAX + 1];
- snprintfz(tag, NETDATA_THREAD_TAG_MAX, THREAD_TAG_STREAM_SENDER "[%s]", rrdhost_hostname(host));
-
- host->rrdpush_sender_thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT,
- rrdpush_sender_thread, (void *)host->sender);
- if(!host->rrdpush_sender_thread)
- nd_log_daemon(NDLP_ERR, "STREAM %s [send]: failed to create new thread for client.", rrdhost_hostname(host));
- else
- rrdhost_flag_set(host, RRDHOST_FLAG_RRDPUSH_SENDER_SPAWN);
- }
-
- sender_unlock(host->sender);
-}
-
-int rrdpush_receiver_permission_denied(struct web_client *w) {
- // we always respond with the same message and error code
- // to prevent an attacker from gaining info about the error
- buffer_flush(w->response.data);
- buffer_strcat(w->response.data, START_STREAMING_ERROR_NOT_PERMITTED);
- return HTTP_RESP_UNAUTHORIZED;
-}
-
-int rrdpush_receiver_too_busy_now(struct web_client *w) {
- // we always respond with the same message and error code
- // to prevent an attacker from gaining info about the error
- buffer_flush(w->response.data);
- buffer_strcat(w->response.data, START_STREAMING_ERROR_BUSY_TRY_LATER);
- return HTTP_RESP_SERVICE_UNAVAILABLE;
-}
-
-static void rrdpush_receiver_takeover_web_connection(struct web_client *w, struct receiver_state *rpt) {
- rpt->fd = w->ifd;
-
-#ifdef ENABLE_HTTPS
- rpt->ssl.conn = w->ssl.conn;
- rpt->ssl.state = w->ssl.state;
-
- w->ssl = NETDATA_SSL_UNSET_CONNECTION;
-#endif
-
- WEB_CLIENT_IS_DEAD(w);
-
- if(web_server_mode == WEB_SERVER_MODE_STATIC_THREADED) {
- web_client_flag_set(w, WEB_CLIENT_FLAG_DONT_CLOSE_SOCKET);
- }
- else {
- if(w->ifd == w->ofd)
- w->ifd = w->ofd = -1;
- else
- w->ifd = -1;
- }
-
- buffer_flush(w->response.data);
-}
-
-void *rrdpush_receiver_thread(void *ptr);
-int rrdpush_receiver_thread_spawn(struct web_client *w, char *decoded_query_string, void *h2o_ctx __maybe_unused) {
-
- if(!service_running(ABILITY_STREAMING_CONNECTIONS))
- return rrdpush_receiver_too_busy_now(w);
-
- struct receiver_state *rpt = callocz(1, sizeof(*rpt));
- rpt->last_msg_t = now_monotonic_sec();
- rpt->hops = 1;
-
- rpt->capabilities = STREAM_CAP_INVALID;
-
-#ifdef ENABLE_H2O
- rpt->h2o_ctx = h2o_ctx;
-#endif
-
- __atomic_add_fetch(&netdata_buffers_statistics.rrdhost_receivers, sizeof(*rpt), __ATOMIC_RELAXED);
- __atomic_add_fetch(&netdata_buffers_statistics.rrdhost_allocations_size, sizeof(struct rrdhost_system_info), __ATOMIC_RELAXED);
-
- rpt->system_info = callocz(1, sizeof(struct rrdhost_system_info));
- rpt->system_info->hops = rpt->hops;
-
- rpt->fd = -1;
- rpt->client_ip = strdupz(w->client_ip);
- rpt->client_port = strdupz(w->client_port);
-
-#ifdef ENABLE_HTTPS
- rpt->ssl = NETDATA_SSL_UNSET_CONNECTION;
-#endif
-
- rpt->config.update_every = default_rrd_update_every;
-
- // parse the parameters and fill rpt and rpt->system_info
-
- while(decoded_query_string) {
- char *value = strsep_skip_consecutive_separators(&decoded_query_string, "&");
- if(!value || !*value) continue;
-
- char *name = strsep_skip_consecutive_separators(&value, "=");
- if(!name || !*name) continue;
- if(!value || !*value) continue;
-
- if(!strcmp(name, "key") && !rpt->key)
- rpt->key = strdupz(value);
-
- else if(!strcmp(name, "hostname") && !rpt->hostname)
- rpt->hostname = strdupz(value);
-
- else if(!strcmp(name, "registry_hostname") && !rpt->registry_hostname)
- rpt->registry_hostname = strdupz(value);
-
- else if(!strcmp(name, "machine_guid") && !rpt->machine_guid)
- rpt->machine_guid = strdupz(value);
-
- else if(!strcmp(name, "update_every"))
- rpt->config.update_every = (int)strtoul(value, NULL, 0);
-
- else if(!strcmp(name, "os") && !rpt->os)
- rpt->os = strdupz(value);
-
- else if(!strcmp(name, "timezone") && !rpt->timezone)
- rpt->timezone = strdupz(value);
-
- else if(!strcmp(name, "abbrev_timezone") && !rpt->abbrev_timezone)
- rpt->abbrev_timezone = strdupz(value);
-
- else if(!strcmp(name, "utc_offset"))
- rpt->utc_offset = (int32_t)strtol(value, NULL, 0);
-
- else if(!strcmp(name, "hops"))
- rpt->hops = rpt->system_info->hops = (uint16_t) strtoul(value, NULL, 0);
-
- else if(!strcmp(name, "ml_capable"))
- rpt->system_info->ml_capable = strtoul(value, NULL, 0);
-
- else if(!strcmp(name, "ml_enabled"))
- rpt->system_info->ml_enabled = strtoul(value, NULL, 0);
-
- else if(!strcmp(name, "mc_version"))
- rpt->system_info->mc_version = strtoul(value, NULL, 0);
-
- else if(!strcmp(name, "ver") && (rpt->capabilities & STREAM_CAP_INVALID))
- rpt->capabilities = convert_stream_version_to_capabilities(strtoul(value, NULL, 0), NULL, false);
-
- else {
- // An old Netdata child does not have a compatible streaming protocol, map to something sane.
- if (!strcmp(name, "NETDATA_SYSTEM_OS_NAME"))
- name = "NETDATA_HOST_OS_NAME";
-
- else if (!strcmp(name, "NETDATA_SYSTEM_OS_ID"))
- name = "NETDATA_HOST_OS_ID";
-
- else if (!strcmp(name, "NETDATA_SYSTEM_OS_ID_LIKE"))
- name = "NETDATA_HOST_OS_ID_LIKE";
-
- else if (!strcmp(name, "NETDATA_SYSTEM_OS_VERSION"))
- name = "NETDATA_HOST_OS_VERSION";
-
- else if (!strcmp(name, "NETDATA_SYSTEM_OS_VERSION_ID"))
- name = "NETDATA_HOST_OS_VERSION_ID";
-
- else if (!strcmp(name, "NETDATA_SYSTEM_OS_DETECTION"))
- name = "NETDATA_HOST_OS_DETECTION";
-
- else if(!strcmp(name, "NETDATA_PROTOCOL_VERSION") && (rpt->capabilities & STREAM_CAP_INVALID))
- rpt->capabilities = convert_stream_version_to_capabilities(1, NULL, false);
-
- if (unlikely(rrdhost_set_system_info_variable(rpt->system_info, name, value))) {
- nd_log_daemon(NDLP_NOTICE, "STREAM '%s' [receive from [%s]:%s]: "
- "request has parameter '%s' = '%s', which is not used."
- , (rpt->hostname && *rpt->hostname) ? rpt->hostname : "-"
- , rpt->client_ip, rpt->client_port
- , name, value);
- }
- }
- }
-
- if (rpt->capabilities & STREAM_CAP_INVALID)
- // no version is supplied, assume version 0;
- rpt->capabilities = convert_stream_version_to_capabilities(0, NULL, false);
-
- // find the program name and version
- if(w->user_agent && w->user_agent[0]) {
- char *t = strchr(w->user_agent, '/');
- if(t && *t) {
- *t = '\0';
- t++;
- }
-
- rpt->program_name = strdupz(w->user_agent);
- if(t && *t) rpt->program_version = strdupz(t);
- }
-
- // check if we should accept this connection
-
- if(!rpt->key || !*rpt->key) {
- rrdpush_receive_log_status(
- rpt, "request without an API key, rejecting connection",
- RRDPUSH_STATUS_NO_API_KEY, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- if(!rpt->hostname || !*rpt->hostname) {
- rrdpush_receive_log_status(
- rpt, "request without a hostname, rejecting connection",
- RRDPUSH_STATUS_NO_HOSTNAME, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- if(!rpt->registry_hostname)
- rpt->registry_hostname = strdupz(rpt->hostname);
-
- if(!rpt->machine_guid || !*rpt->machine_guid) {
- rrdpush_receive_log_status(
- rpt, "request without a machine GUID, rejecting connection",
- RRDPUSH_STATUS_NO_MACHINE_GUID, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- {
- char buf[GUID_LEN + 1];
-
- if (regenerate_guid(rpt->key, buf) == -1) {
- rrdpush_receive_log_status(
- rpt, "API key is not a valid UUID (use the command uuidgen to generate one)",
- RRDPUSH_STATUS_INVALID_API_KEY, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- if (regenerate_guid(rpt->machine_guid, buf) == -1) {
- rrdpush_receive_log_status(
- rpt, "machine GUID is not a valid UUID",
- RRDPUSH_STATUS_INVALID_MACHINE_GUID, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
- }
-
- const char *api_key_type = appconfig_get(&stream_config, rpt->key, "type", "api");
- if(!api_key_type || !*api_key_type) api_key_type = "unknown";
- if(strcmp(api_key_type, "api") != 0) {
- rrdpush_receive_log_status(
- rpt, "API key is a machine GUID",
- RRDPUSH_STATUS_INVALID_API_KEY, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- if(!appconfig_get_boolean(&stream_config, rpt->key, "enabled", 0)) {
- rrdpush_receive_log_status(
- rpt, "API key is not enabled",
- RRDPUSH_STATUS_API_KEY_DISABLED, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- {
- SIMPLE_PATTERN *key_allow_from = simple_pattern_create(
- appconfig_get(&stream_config, rpt->key, "allow from", "*"),
- NULL, SIMPLE_PATTERN_EXACT, true);
-
- if(key_allow_from) {
- if(!simple_pattern_matches(key_allow_from, w->client_ip)) {
- simple_pattern_free(key_allow_from);
-
- rrdpush_receive_log_status(
- rpt, "API key is not allowed from this IP",
- RRDPUSH_STATUS_NOT_ALLOWED_IP, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- simple_pattern_free(key_allow_from);
- }
- }
-
- {
- const char *machine_guid_type = appconfig_get(&stream_config, rpt->machine_guid, "type", "machine");
- if (!machine_guid_type || !*machine_guid_type) machine_guid_type = "unknown";
-
- if (strcmp(machine_guid_type, "machine") != 0) {
- rrdpush_receive_log_status(
- rpt, "machine GUID is an API key",
- RRDPUSH_STATUS_INVALID_MACHINE_GUID, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
- }
-
- if(!appconfig_get_boolean(&stream_config, rpt->machine_guid, "enabled", 1)) {
- rrdpush_receive_log_status(
- rpt, "machine GUID is not enabled",
- RRDPUSH_STATUS_MACHINE_GUID_DISABLED, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- {
- SIMPLE_PATTERN *machine_allow_from = simple_pattern_create(
- appconfig_get(&stream_config, rpt->machine_guid, "allow from", "*"),
- NULL, SIMPLE_PATTERN_EXACT, true);
-
- if(machine_allow_from) {
- if(!simple_pattern_matches(machine_allow_from, w->client_ip)) {
- simple_pattern_free(machine_allow_from);
-
- rrdpush_receive_log_status(
- rpt, "machine GUID is not allowed from this IP",
- RRDPUSH_STATUS_NOT_ALLOWED_IP, NDLP_WARNING);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_permission_denied(w);
- }
-
- simple_pattern_free(machine_allow_from);
- }
- }
-
- if (strcmp(rpt->machine_guid, localhost->machine_guid) == 0) {
-
- rrdpush_receiver_takeover_web_connection(w, rpt);
-
- rrdpush_receive_log_status(
- rpt, "machine GUID is my own",
- RRDPUSH_STATUS_LOCALHOST, NDLP_DEBUG);
-
- char initial_response[HTTP_HEADER_SIZE + 1];
- snprintfz(initial_response, HTTP_HEADER_SIZE, "%s", START_STREAMING_ERROR_SAME_LOCALHOST);
-
- if(send_timeout(
-#ifdef ENABLE_HTTPS
- &rpt->ssl,
-#endif
- rpt->fd, initial_response, strlen(initial_response), 0, 60) != (ssize_t)strlen(initial_response)) {
-
- nd_log_daemon(NDLP_ERR, "STREAM '%s' [receive from [%s]:%s]: "
- "failed to reply."
- , rpt->hostname
- , rpt->client_ip, rpt->client_port
- );
- }
-
- receiver_state_free(rpt);
- return HTTP_RESP_OK;
- }
-
- if(unlikely(web_client_streaming_rate_t > 0)) {
- static SPINLOCK spinlock = NETDATA_SPINLOCK_INITIALIZER;
- static time_t last_stream_accepted_t = 0;
-
- time_t now = now_realtime_sec();
- spinlock_lock(&spinlock);
-
- if(unlikely(last_stream_accepted_t == 0))
- last_stream_accepted_t = now;
-
- if(now - last_stream_accepted_t < web_client_streaming_rate_t) {
- spinlock_unlock(&spinlock);
-
- char msg[100 + 1];
- snprintfz(msg, sizeof(msg) - 1,
- "rate limit, will accept new connection in %ld secs",
- (long)(web_client_streaming_rate_t - (now - last_stream_accepted_t)));
-
- rrdpush_receive_log_status(
- rpt, msg,
- RRDPUSH_STATUS_RATE_LIMIT, NDLP_NOTICE);
-
- receiver_state_free(rpt);
- return rrdpush_receiver_too_busy_now(w);
- }
-
- last_stream_accepted_t = now;
- spinlock_unlock(&spinlock);
- }
-
- /*
- * Quick path for rejecting multiple connections. The lock taken is fine-grained - it only protects the receiver
- * pointer within the host (if a host exists). This protects against multiple concurrent web requests hitting
- * separate threads within the web-server and landing here. The lock guards the thread-shutdown sequence that
- * detaches the receiver from the host. If the host is being created (first time-access) then we also use the
- * lock to prevent race-hazard (two threads try to create the host concurrently, one wins and the other does a
- * lookup to the now-attached structure).
- */
-
- {
- time_t age = 0;
- bool receiver_stale = false;
- bool receiver_working = false;
-
- rrd_rdlock();
- RRDHOST *host = rrdhost_find_by_guid(rpt->machine_guid);
- if (unlikely(host && rrdhost_flag_check(host, RRDHOST_FLAG_ARCHIVED))) /* Ignore archived hosts. */
- host = NULL;
-
- if (host) {
- netdata_mutex_lock(&host->receiver_lock);
- if (host->receiver) {
- age = now_monotonic_sec() - host->receiver->last_msg_t;
-
- if (age < 30)
- receiver_working = true;
- else
- receiver_stale = true;
- }
- netdata_mutex_unlock(&host->receiver_lock);
- }
- rrd_rdunlock();
-
- if (receiver_stale && stop_streaming_receiver(host, STREAM_HANDSHAKE_DISCONNECT_STALE_RECEIVER)) {
- // we stopped the receiver
- // we can proceed with this connection
- receiver_stale = false;
-
- nd_log_daemon(NDLP_NOTICE, "STREAM '%s' [receive from [%s]:%s]: "
- "stopped previous stale receiver to accept this one."
- , rpt->hostname
- , rpt->client_ip, rpt->client_port
- );
- }
-
- if (receiver_working || receiver_stale) {
- // another receiver is already connected
- // try again later
-
- char msg[200 + 1];
- snprintfz(msg, sizeof(msg) - 1,
- "multiple connections for same host, "
- "old connection was last used %ld secs ago%s",
- age, receiver_stale ? " (signaled old receiver to stop)" : " (new connection not accepted)");
-
- rrdpush_receive_log_status(
- rpt, msg,
- RRDPUSH_STATUS_ALREADY_CONNECTED, NDLP_DEBUG);
-
- // Have not set WEB_CLIENT_FLAG_DONT_CLOSE_SOCKET - caller should clean up
- buffer_flush(w->response.data);
- buffer_strcat(w->response.data, START_STREAMING_ERROR_ALREADY_STREAMING);
- receiver_state_free(rpt);
- return HTTP_RESP_CONFLICT;
- }
- }
-
- rrdpush_receiver_takeover_web_connection(w, rpt);
-
- char tag[NETDATA_THREAD_TAG_MAX + 1];
- snprintfz(tag, NETDATA_THREAD_TAG_MAX, THREAD_TAG_STREAM_RECEIVER "[%s]", rpt->hostname);
- tag[NETDATA_THREAD_TAG_MAX] = '\0';
-
- rpt->thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, rrdpush_receiver_thread, (void *)rpt);
- if(!rpt->thread) {
- rrdpush_receive_log_status(
- rpt, "can't create receiver thread",
- RRDPUSH_STATUS_INTERNAL_SERVER_ERROR, NDLP_ERR);
-
- buffer_flush(w->response.data);
- buffer_strcat(w->response.data, "Can't handle this request");
- receiver_state_free(rpt);
- return HTTP_RESP_INTERNAL_SERVER_ERROR;
- }
-
- // prevent the caller from closing the streaming socket
- return HTTP_RESP_OK;
-}
-
-void rrdpush_reset_destinations_postpone_time(RRDHOST *host) {
- uint32_t wait = (host->sender) ? host->sender->reconnect_delay : 5;
- time_t now = now_realtime_sec();
- for (struct rrdpush_destinations *d = host->destinations; d; d = d->next)
- d->postpone_reconnection_until = now + wait;
-}
-
-static struct {
- STREAM_HANDSHAKE err;
- const char *str;
-} handshake_errors[] = {
- { STREAM_HANDSHAKE_OK_V3, "CONNECTED" },
- { STREAM_HANDSHAKE_OK_V2, "CONNECTED" },
- { STREAM_HANDSHAKE_OK_V1, "CONNECTED" },
- { STREAM_HANDSHAKE_NEVER, "" },
- { STREAM_HANDSHAKE_ERROR_BAD_HANDSHAKE, "BAD HANDSHAKE" },
- { STREAM_HANDSHAKE_ERROR_LOCALHOST, "LOCALHOST" },
- { STREAM_HANDSHAKE_ERROR_ALREADY_CONNECTED, "ALREADY CONNECTED" },
- { STREAM_HANDSHAKE_ERROR_DENIED, "DENIED" },
- { STREAM_HANDSHAKE_ERROR_SEND_TIMEOUT, "SEND TIMEOUT" },
- { STREAM_HANDSHAKE_ERROR_RECEIVE_TIMEOUT, "RECEIVE TIMEOUT" },
- { STREAM_HANDSHAKE_ERROR_INVALID_CERTIFICATE, "INVALID CERTIFICATE" },
- { STREAM_HANDSHAKE_ERROR_SSL_ERROR, "SSL ERROR" },
- { STREAM_HANDSHAKE_ERROR_CANT_CONNECT, "CANT CONNECT" },
- { STREAM_HANDSHAKE_BUSY_TRY_LATER, "BUSY TRY LATER" },
- { STREAM_HANDSHAKE_INTERNAL_ERROR, "INTERNAL ERROR" },
- { STREAM_HANDSHAKE_INITIALIZATION, "REMOTE IS INITIALIZING" },
- { STREAM_HANDSHAKE_DISCONNECT_HOST_CLEANUP, "DISCONNECTED HOST CLEANUP" },
- { STREAM_HANDSHAKE_DISCONNECT_STALE_RECEIVER, "DISCONNECTED STALE RECEIVER" },
- { STREAM_HANDSHAKE_DISCONNECT_SHUTDOWN, "DISCONNECTED SHUTDOWN REQUESTED" },
- { STREAM_HANDSHAKE_DISCONNECT_NETDATA_EXIT, "DISCONNECTED NETDATA EXIT" },
- { STREAM_HANDSHAKE_DISCONNECT_PARSER_EXIT, "DISCONNECTED PARSE ENDED" },
- {STREAM_HANDSHAKE_DISCONNECT_UNKNOWN_SOCKET_READ_ERROR, "DISCONNECTED UNKNOWN SOCKET READ ERROR" },
- { STREAM_HANDSHAKE_DISCONNECT_PARSER_FAILED, "DISCONNECTED PARSE ERROR" },
- { STREAM_HANDSHAKE_DISCONNECT_RECEIVER_LEFT, "DISCONNECTED RECEIVER LEFT" },
- { STREAM_HANDSHAKE_DISCONNECT_ORPHAN_HOST, "DISCONNECTED ORPHAN HOST" },
- { STREAM_HANDSHAKE_NON_STREAMABLE_HOST, "NON STREAMABLE HOST" },
- { STREAM_HANDSHAKE_DISCONNECT_NOT_SUFFICIENT_READ_BUFFER, "DISCONNECTED NOT SUFFICIENT READ BUFFER" },
- {STREAM_HANDSHAKE_DISCONNECT_SOCKET_EOF, "DISCONNECTED SOCKET EOF" },
- {STREAM_HANDSHAKE_DISCONNECT_SOCKET_READ_FAILED, "DISCONNECTED SOCKET READ FAILED" },
- {STREAM_HANDSHAKE_DISCONNECT_SOCKET_READ_TIMEOUT, "DISCONNECTED SOCKET READ TIMEOUT" },
- { 0, NULL },
-};
-
-const char *stream_handshake_error_to_string(STREAM_HANDSHAKE handshake_error) {
- if(handshake_error >= STREAM_HANDSHAKE_OK_V1)
- // handshake_error is the whole version / capabilities number
- return "CONNECTED";
-
- for(size_t i = 0; handshake_errors[i].str ; i++) {
- if(handshake_error == handshake_errors[i].err)
- return handshake_errors[i].str;
- }
-
- return "UNKNOWN";
-}
-
-static struct {
- STREAM_CAPABILITIES cap;
- const char *str;
-} capability_names[] = {
- {STREAM_CAP_V1, "V1" },
- {STREAM_CAP_V2, "V2" },
- {STREAM_CAP_VN, "VN" },
- {STREAM_CAP_VCAPS, "VCAPS" },
- {STREAM_CAP_HLABELS, "HLABELS" },
- {STREAM_CAP_CLAIM, "CLAIM" },
- {STREAM_CAP_CLABELS, "CLABELS" },
- {STREAM_CAP_LZ4, "LZ4" },
- {STREAM_CAP_FUNCTIONS, "FUNCTIONS" },
- {STREAM_CAP_REPLICATION, "REPLICATION" },
- {STREAM_CAP_BINARY, "BINARY" },
- {STREAM_CAP_INTERPOLATED, "INTERPOLATED" },
- {STREAM_CAP_IEEE754, "IEEE754" },
- {STREAM_CAP_DATA_WITH_ML, "ML" },
- {STREAM_CAP_DYNCFG, "DYNCFG" },
- {STREAM_CAP_SLOTS, "SLOTS" },
- {STREAM_CAP_ZSTD, "ZSTD" },
- {STREAM_CAP_GZIP, "GZIP" },
- {STREAM_CAP_BROTLI, "BROTLI" },
- {STREAM_CAP_PROGRESS, "PROGRESS" },
- {0 , NULL },
-};
-
-void stream_capabilities_to_string(BUFFER *wb, STREAM_CAPABILITIES caps) {
- for(size_t i = 0; capability_names[i].str ; i++) {
- if(caps & capability_names[i].cap) {
- buffer_strcat(wb, capability_names[i].str);
- buffer_strcat(wb, " ");
- }
- }
-}
-
-void stream_capabilities_to_json_array(BUFFER *wb, STREAM_CAPABILITIES caps, const char *key) {
- if(key)
- buffer_json_member_add_array(wb, key);
- else
- buffer_json_add_array_item_array(wb);
-
- for(size_t i = 0; capability_names[i].str ; i++) {
- if(caps & capability_names[i].cap)
- buffer_json_add_array_item_string(wb, capability_names[i].str);
- }
-
- buffer_json_array_close(wb);
-}
-
-void log_receiver_capabilities(struct receiver_state *rpt) {
- BUFFER *wb = buffer_create(100, NULL);
- stream_capabilities_to_string(wb, rpt->capabilities);
-
- nd_log_daemon(NDLP_INFO, "STREAM %s [receive from [%s]:%s]: established link with negotiated capabilities: %s",
- rrdhost_hostname(rpt->host), rpt->client_ip, rpt->client_port, buffer_tostring(wb));
-
- buffer_free(wb);
-}
-
-void log_sender_capabilities(struct sender_state *s) {
- BUFFER *wb = buffer_create(100, NULL);
- stream_capabilities_to_string(wb, s->capabilities);
-
- nd_log_daemon(NDLP_INFO, "STREAM %s [send to %s]: established link with negotiated capabilities: %s",
- rrdhost_hostname(s->host), s->connected_to, buffer_tostring(wb));
-
- buffer_free(wb);
-}
-
-STREAM_CAPABILITIES stream_our_capabilities(RRDHOST *host, bool sender) {
- STREAM_CAPABILITIES disabled_capabilities = globally_disabled_capabilities;
-
- if(host && sender) {
- // we have DATA_WITH_ML capability
- // we should remove the DATA_WITH_ML capability if our database does not have anomaly info
- // this can happen under these conditions: 1. we don't run ML, and 2. we don't receive ML
- netdata_mutex_lock(&host->receiver_lock);
-
- if(!ml_host_running(host) && !stream_has_capability(host->receiver, STREAM_CAP_DATA_WITH_ML))
- disabled_capabilities |= STREAM_CAP_DATA_WITH_ML;
-
- netdata_mutex_unlock(&host->receiver_lock);
-
- if(host->sender)
- disabled_capabilities |= host->sender->disabled_capabilities;
- }
-
- return (STREAM_CAP_V1 |
- STREAM_CAP_V2 |
- STREAM_CAP_VN |
- STREAM_CAP_VCAPS |
- STREAM_CAP_HLABELS |
- STREAM_CAP_CLAIM |
- STREAM_CAP_CLABELS |
- STREAM_CAP_FUNCTIONS |
- STREAM_CAP_REPLICATION |
- STREAM_CAP_BINARY |
- STREAM_CAP_INTERPOLATED |
- STREAM_CAP_SLOTS |
- STREAM_CAP_PROGRESS |
- STREAM_CAP_COMPRESSIONS_AVAILABLE |
- STREAM_CAP_DYNCFG |
- STREAM_CAP_IEEE754 |
- STREAM_CAP_DATA_WITH_ML |
- 0) & ~disabled_capabilities;
-}
-
-STREAM_CAPABILITIES convert_stream_version_to_capabilities(int32_t version, RRDHOST *host, bool sender) {
- STREAM_CAPABILITIES caps = 0;
-
- if(version <= 1) caps = STREAM_CAP_V1;
- else if(version < STREAM_OLD_VERSION_CLAIM) caps = STREAM_CAP_V2 | STREAM_CAP_HLABELS;
- else if(version <= STREAM_OLD_VERSION_CLAIM) caps = STREAM_CAP_VN | STREAM_CAP_HLABELS | STREAM_CAP_CLAIM;
- else if(version <= STREAM_OLD_VERSION_CLABELS) caps = STREAM_CAP_VN | STREAM_CAP_HLABELS | STREAM_CAP_CLAIM | STREAM_CAP_CLABELS;
- else if(version <= STREAM_OLD_VERSION_LZ4) caps = STREAM_CAP_VN | STREAM_CAP_HLABELS | STREAM_CAP_CLAIM | STREAM_CAP_CLABELS | STREAM_CAP_LZ4_AVAILABLE;
- else caps = version;
-
- if(caps & STREAM_CAP_VCAPS)
- caps &= ~(STREAM_CAP_V1|STREAM_CAP_V2|STREAM_CAP_VN);
-
- if(caps & STREAM_CAP_VN)
- caps &= ~(STREAM_CAP_V1|STREAM_CAP_V2);
-
- if(caps & STREAM_CAP_V2)
- caps &= ~(STREAM_CAP_V1);
-
- STREAM_CAPABILITIES common_caps = caps & stream_our_capabilities(host, sender);
-
- if(!(common_caps & STREAM_CAP_INTERPOLATED))
- // DATA WITH ML requires INTERPOLATED
- common_caps &= ~STREAM_CAP_DATA_WITH_ML;
-
- return common_caps;
-}
-
-int32_t stream_capabilities_to_vn(uint32_t caps) {
- if(caps & STREAM_CAP_LZ4) return STREAM_OLD_VERSION_LZ4;
- if(caps & STREAM_CAP_CLABELS) return STREAM_OLD_VERSION_CLABELS;
- return STREAM_OLD_VERSION_CLAIM; // if(caps & STREAM_CAP_CLAIM)
-}
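
Reviewer note on the removed `convert_stream_version_to_capabilities()`: the function maps a legacy protocol version (1 to 5) to a capability bitmask, treats any larger number as the bitmask itself, drops the older version markers when a newer one is present, and finally intersects the result with the local capabilities. The following is a minimal standalone sketch of that logic, not the Netdata implementation. The `SKETCH_`/`sketch_` names are hypothetical, the bit positions are copied from the enum in `rrdpush.h`, and the sketch assumes LZ4 is compiled in and leaves out the rule that `DATA_WITH_ML` requires `INTERPOLATED`.

```c
#include <stdint.h>

/* Hypothetical subset of the capability bits; positions follow the
 * STREAM_CAPABILITIES enum shown in rrdpush.h. */
enum {
    SKETCH_CAP_V1      = (1 << 3),
    SKETCH_CAP_V2      = (1 << 4),
    SKETCH_CAP_VN      = (1 << 5),
    SKETCH_CAP_VCAPS   = (1 << 6),
    SKETCH_CAP_HLABELS = (1 << 7),
    SKETCH_CAP_CLAIM   = (1 << 8),
    SKETCH_CAP_CLABELS = (1 << 9),
    SKETCH_CAP_LZ4     = (1 << 10),
};

/* Legacy protocol versions, as in the STREAM_OLD_VERSION_* defines. */
#define SKETCH_OLD_VERSION_CLAIM   3
#define SKETCH_OLD_VERSION_CLABELS 4
#define SKETCH_OLD_VERSION_LZ4     5

/* Everything this sketch knows about (assumption: LZ4 is available). */
#define SKETCH_ALL_CAPS (SKETCH_CAP_V1 | SKETCH_CAP_V2 | SKETCH_CAP_VN | \
                         SKETCH_CAP_VCAPS | SKETCH_CAP_HLABELS | \
                         SKETCH_CAP_CLAIM | SKETCH_CAP_CLABELS | SKETCH_CAP_LZ4)

/* Map the peer's advertised version to capabilities, then keep only the
 * bits that the local side (ours) also supports. */
static uint32_t sketch_version_to_caps(int32_t version, uint32_t ours) {
    uint32_t caps;

    if(version <= 1)
        caps = SKETCH_CAP_V1;
    else if(version < SKETCH_OLD_VERSION_CLAIM)
        caps = SKETCH_CAP_V2 | SKETCH_CAP_HLABELS;
    else if(version <= SKETCH_OLD_VERSION_CLAIM)
        caps = SKETCH_CAP_VN | SKETCH_CAP_HLABELS | SKETCH_CAP_CLAIM;
    else if(version <= SKETCH_OLD_VERSION_CLABELS)
        caps = SKETCH_CAP_VN | SKETCH_CAP_HLABELS | SKETCH_CAP_CLAIM | SKETCH_CAP_CLABELS;
    else if(version <= SKETCH_OLD_VERSION_LZ4)
        caps = SKETCH_CAP_VN | SKETCH_CAP_HLABELS | SKETCH_CAP_CLAIM | SKETCH_CAP_CLABELS | SKETCH_CAP_LZ4;
    else
        caps = (uint32_t)version; /* anything above 5 is already a bitmask */

    /* A newer negotiation marker supersedes the older ones. */
    if(caps & SKETCH_CAP_VCAPS)
        caps &= ~(uint32_t)(SKETCH_CAP_V1 | SKETCH_CAP_V2 | SKETCH_CAP_VN);

    if(caps & SKETCH_CAP_VN)
        caps &= ~(uint32_t)(SKETCH_CAP_V1 | SKETCH_CAP_V2);

    if(caps & SKETCH_CAP_V2)
        caps &= ~(uint32_t)SKETCH_CAP_V1;

    return caps & ours;
}
```

With these assumptions, `sketch_version_to_caps(4, SKETCH_ALL_CAPS)` yields `VN | HLABELS | CLAIM | CLABELS`, and passing a narrower `ours` mask removes whatever the local side does not support.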
diff --git a/src/streaming/rrdpush.h b/src/streaming/rrdpush.h
index d55a07675..55d0c296c 100644
--- a/src/streaming/rrdpush.h
+++ b/src/streaming/rrdpush.h
@@ -3,759 +3,16 @@
#ifndef NETDATA_RRDPUSH_H
#define NETDATA_RRDPUSH_H 1
-#include "libnetdata/libnetdata.h"
-#include "daemon/common.h"
-#include "web/server/web_client.h"
-#include "database/rrdfunctions.h"
-#include "database/rrd.h"
+#include "stream-handshake.h"
+#include "stream-capabilities.h"
+#include "stream-conf.h"
+#include "stream-compression/compression.h"
-#define CONNECTED_TO_SIZE 100
-#define CBUFFER_INITIAL_SIZE (16 * 1024)
-#define THREAD_BUFFER_INITIAL_SIZE (CBUFFER_INITIAL_SIZE / 2)
+#include "sender.h"
+#include "receiver.h"
-// ----------------------------------------------------------------------------
-// obsolete versions - do not use anymore
-
-#define STREAM_OLD_VERSION_CLAIM 3
-#define STREAM_OLD_VERSION_CLABELS 4
-#define STREAM_OLD_VERSION_LZ4 5
-
-// ----------------------------------------------------------------------------
-// capabilities negotiation
-
-typedef enum {
- STREAM_CAP_NONE = 0,
-
- // do not use the first 3 bits
- // they used to be versions 1, 2 and 3
- // before we introduce capabilities
-
- STREAM_CAP_V1 = (1 << 3), // v1 = the oldest protocol
- STREAM_CAP_V2 = (1 << 4), // v2 = the second version of the protocol (with host labels)
- STREAM_CAP_VN = (1 << 5), // version negotiation supported (for versions 3, 4, 5 of the protocol)
- // v3 = claiming supported
- // v4 = chart labels supported
- // v5 = lz4 compression supported
- STREAM_CAP_VCAPS = (1 << 6), // capabilities negotiation supported
- STREAM_CAP_HLABELS = (1 << 7), // host labels supported
- STREAM_CAP_CLAIM = (1 << 8), // claiming supported
- STREAM_CAP_CLABELS = (1 << 9), // chart labels supported
- STREAM_CAP_LZ4 = (1 << 10), // lz4 compression supported
- STREAM_CAP_FUNCTIONS = (1 << 11), // plugin functions supported
- STREAM_CAP_REPLICATION = (1 << 12), // replication supported
- STREAM_CAP_BINARY = (1 << 13), // streaming supports binary data
- STREAM_CAP_INTERPOLATED = (1 << 14), // streaming supports interpolated streaming of values
- STREAM_CAP_IEEE754 = (1 << 15), // streaming supports binary/hex transfer of double values
- STREAM_CAP_DATA_WITH_ML = (1 << 16), // streaming supports transferring anomaly bit
- // STREAM_CAP_DYNCFG = (1 << 17), // leave this unused for as long as possible
- STREAM_CAP_SLOTS = (1 << 18), // the sender can appoint a unique slot for each chart
- STREAM_CAP_ZSTD = (1 << 19), // ZSTD compression supported
- STREAM_CAP_GZIP = (1 << 20), // GZIP compression supported
- STREAM_CAP_BROTLI = (1 << 21), // BROTLI compression supported
- STREAM_CAP_PROGRESS = (1 << 22), // Functions PROGRESS support
- STREAM_CAP_DYNCFG = (1 << 23), // support for DYNCFG
-
- STREAM_CAP_INVALID = (1 << 30), // used as an invalid value for capabilities when this is set
- // this must be signed int, so don't use the last bit
- // needed for negotiating errors between parent and child
-} STREAM_CAPABILITIES;
-
-#ifdef ENABLE_LZ4
-#define STREAM_CAP_LZ4_AVAILABLE STREAM_CAP_LZ4
-#else
-#define STREAM_CAP_LZ4_AVAILABLE 0
-#endif // ENABLE_LZ4
-
-#ifdef ENABLE_ZSTD
-#define STREAM_CAP_ZSTD_AVAILABLE STREAM_CAP_ZSTD
-#else
-#define STREAM_CAP_ZSTD_AVAILABLE 0
-#endif // ENABLE_ZSTD
-
-#ifdef ENABLE_BROTLI
-#define STREAM_CAP_BROTLI_AVAILABLE STREAM_CAP_BROTLI
-#else
-#define STREAM_CAP_BROTLI_AVAILABLE 0
-#endif // ENABLE_BROTLI
-
-#define STREAM_CAP_COMPRESSIONS_AVAILABLE (STREAM_CAP_LZ4_AVAILABLE|STREAM_CAP_ZSTD_AVAILABLE|STREAM_CAP_BROTLI_AVAILABLE|STREAM_CAP_GZIP)
-
-extern STREAM_CAPABILITIES globally_disabled_capabilities;
-
-STREAM_CAPABILITIES stream_our_capabilities(RRDHOST *host, bool sender);
-
-#define stream_has_capability(rpt, capability) ((rpt) && ((rpt)->capabilities & (capability)) == (capability))
-
-static inline bool stream_has_more_than_one_capability_of(STREAM_CAPABILITIES caps, STREAM_CAPABILITIES mask) {
- STREAM_CAPABILITIES common = (STREAM_CAPABILITIES)(caps & mask);
- return (common & (common - 1)) != 0 && common != 0;
-}
-
-// ----------------------------------------------------------------------------
-// stream handshake
-
-#define HTTP_HEADER_SIZE 8192
-
-#define STREAMING_PROTOCOL_VERSION "1.1"
-#define START_STREAMING_PROMPT_V1 "Hit me baby, push them over..."
-#define START_STREAMING_PROMPT_V2 "Hit me baby, push them over and bring the host labels..."
-#define START_STREAMING_PROMPT_VN "Hit me baby, push them over with the version="
-
-#define START_STREAMING_ERROR_SAME_LOCALHOST "Don't hit me baby, you are trying to stream my localhost back"
-#define START_STREAMING_ERROR_ALREADY_STREAMING "This GUID is already streaming to this server"
-#define START_STREAMING_ERROR_NOT_PERMITTED "You are not permitted to access this. Check the logs for more info."
-#define START_STREAMING_ERROR_BUSY_TRY_LATER "The server is too busy now to accept this request. Try later."
-#define START_STREAMING_ERROR_INTERNAL_ERROR "The server encountered an internal error. Try later."
-#define START_STREAMING_ERROR_INITIALIZATION "The server is initializing. Try later."
-
-#define RRDPUSH_STATUS_CONNECTED "CONNECTED"
-#define RRDPUSH_STATUS_ALREADY_CONNECTED "ALREADY CONNECTED"
-#define RRDPUSH_STATUS_DISCONNECTED "DISCONNECTED"
-#define RRDPUSH_STATUS_RATE_LIMIT "RATE LIMIT TRY LATER"
-#define RRDPUSH_STATUS_INITIALIZATION_IN_PROGRESS "INITIALIZATION IN PROGRESS RETRY LATER"
-#define RRDPUSH_STATUS_INTERNAL_SERVER_ERROR "INTERNAL SERVER ERROR DROPPING CONNECTION"
-#define RRDPUSH_STATUS_DUPLICATE_RECEIVER "DUPLICATE RECEIVER DROPPING CONNECTION"
-#define RRDPUSH_STATUS_CANT_REPLY "CANT REPLY DROPPING CONNECTION"
-#define RRDPUSH_STATUS_NO_HOSTNAME "NO HOSTNAME PERMISSION DENIED"
-#define RRDPUSH_STATUS_NO_API_KEY "NO API KEY PERMISSION DENIED"
-#define RRDPUSH_STATUS_INVALID_API_KEY "INVALID API KEY PERMISSION DENIED"
-#define RRDPUSH_STATUS_NO_MACHINE_GUID "NO MACHINE GUID PERMISSION DENIED"
-#define RRDPUSH_STATUS_MACHINE_GUID_DISABLED "MACHINE GUID DISABLED PERMISSION DENIED"
-#define RRDPUSH_STATUS_INVALID_MACHINE_GUID "INVALID MACHINE GUID PERMISSION DENIED"
-#define RRDPUSH_STATUS_API_KEY_DISABLED "API KEY DISABLED PERMISSION DENIED"
-#define RRDPUSH_STATUS_NOT_ALLOWED_IP "NOT ALLOWED IP PERMISSION DENIED"
-#define RRDPUSH_STATUS_LOCALHOST "LOCALHOST PERMISSION DENIED"
-#define RRDPUSH_STATUS_PERMISSION_DENIED "PERMISSION DENIED"
-#define RRDPUSH_STATUS_BAD_HANDSHAKE "BAD HANDSHAKE"
-#define RRDPUSH_STATUS_TIMEOUT "TIMEOUT"
-#define RRDPUSH_STATUS_CANT_UPGRADE_CONNECTION "CANT UPGRADE CONNECTION"
-#define RRDPUSH_STATUS_SSL_ERROR "SSL ERROR"
-#define RRDPUSH_STATUS_INVALID_SSL_CERTIFICATE "INVALID SSL CERTIFICATE"
-#define RRDPUSH_STATUS_CANT_ESTABLISH_SSL_CONNECTION "CANT ESTABLISH SSL CONNECTION"
-
-typedef enum {
- STREAM_HANDSHAKE_OK_V3 = 3, // v3+
- STREAM_HANDSHAKE_OK_V2 = 2, // v2
- STREAM_HANDSHAKE_OK_V1 = 1, // v1
- STREAM_HANDSHAKE_NEVER = 0, // never tried to connect
- STREAM_HANDSHAKE_ERROR_BAD_HANDSHAKE = -1,
- STREAM_HANDSHAKE_ERROR_LOCALHOST = -2,
- STREAM_HANDSHAKE_ERROR_ALREADY_CONNECTED = -3,
- STREAM_HANDSHAKE_ERROR_DENIED = -4,
- STREAM_HANDSHAKE_ERROR_SEND_TIMEOUT = -5,
- STREAM_HANDSHAKE_ERROR_RECEIVE_TIMEOUT = -6,
- STREAM_HANDSHAKE_ERROR_INVALID_CERTIFICATE = -7,
- STREAM_HANDSHAKE_ERROR_SSL_ERROR = -8,
- STREAM_HANDSHAKE_ERROR_CANT_CONNECT = -9,
- STREAM_HANDSHAKE_BUSY_TRY_LATER = -10,
- STREAM_HANDSHAKE_INTERNAL_ERROR = -11,
- STREAM_HANDSHAKE_INITIALIZATION = -12,
- STREAM_HANDSHAKE_DISCONNECT_HOST_CLEANUP = -13,
- STREAM_HANDSHAKE_DISCONNECT_STALE_RECEIVER = -14,
- STREAM_HANDSHAKE_DISCONNECT_SHUTDOWN = -15,
- STREAM_HANDSHAKE_DISCONNECT_NETDATA_EXIT = -16,
- STREAM_HANDSHAKE_DISCONNECT_PARSER_EXIT = -17,
- STREAM_HANDSHAKE_DISCONNECT_UNKNOWN_SOCKET_READ_ERROR = -18,
- STREAM_HANDSHAKE_DISCONNECT_PARSER_FAILED = -19,
- STREAM_HANDSHAKE_DISCONNECT_RECEIVER_LEFT = -20,
- STREAM_HANDSHAKE_DISCONNECT_ORPHAN_HOST = -21,
- STREAM_HANDSHAKE_NON_STREAMABLE_HOST = -22,
- STREAM_HANDSHAKE_DISCONNECT_NOT_SUFFICIENT_READ_BUFFER = -23,
- STREAM_HANDSHAKE_DISCONNECT_SOCKET_EOF = -24,
- STREAM_HANDSHAKE_DISCONNECT_SOCKET_READ_FAILED = -25,
- STREAM_HANDSHAKE_DISCONNECT_SOCKET_READ_TIMEOUT = -26,
- STREAM_HANDSHAKE_ERROR_HTTP_UPGRADE = -27,
-
-} STREAM_HANDSHAKE;
-
-
-// ----------------------------------------------------------------------------
-
-typedef struct {
- char *os_name;
- char *os_id;
- char *os_version;
- char *kernel_name;
- char *kernel_version;
-} stream_encoded_t;
-
-#include "compression.h"
-
-// Thread-local storage
-// Metric transmission: collector threads asynchronously fill the buffer, sender thread uses it.
-
-typedef enum __attribute__((packed)) {
- STREAM_TRAFFIC_TYPE_REPLICATION = 0,
- STREAM_TRAFFIC_TYPE_FUNCTIONS,
- STREAM_TRAFFIC_TYPE_METADATA,
- STREAM_TRAFFIC_TYPE_DATA,
- STREAM_TRAFFIC_TYPE_DYNCFG,
-
- // terminator
- STREAM_TRAFFIC_TYPE_MAX,
-} STREAM_TRAFFIC_TYPE;
-
-typedef enum __attribute__((packed)) {
- SENDER_FLAG_OVERFLOW = (1 << 0), // The buffer has been overflown
-} SENDER_FLAGS;
-
-struct sender_state {
- RRDHOST *host;
- pid_t tid; // the thread id of the sender, from gettid_cached()
- SENDER_FLAGS flags;
- int timeout;
- int default_port;
- uint32_t reconnect_delay;
- char connected_to[CONNECTED_TO_SIZE + 1]; // We don't know which proxy we connect to, passed back from socket.c
- size_t begin;
- size_t reconnects_counter;
- size_t sent_bytes;
- size_t sent_bytes_on_this_connection;
- size_t send_attempts;
- time_t last_traffic_seen_t;
- time_t last_state_since_t; // the timestamp of the last state (online/offline) change
- size_t not_connected_loops;
- // Metrics are collected asynchronously by collector threads calling rrdset_done_push(). This can also trigger
- // the lazy creation of the sender thread - both cases (buffer access and thread creation) are guarded here.
- SPINLOCK spinlock;
- struct circular_buffer *buffer;
- char read_buffer[PLUGINSD_LINE_MAX + 1];
- ssize_t read_len;
- STREAM_CAPABILITIES capabilities;
- STREAM_CAPABILITIES disabled_capabilities;
-
- size_t sent_bytes_on_this_connection_per_type[STREAM_TRAFFIC_TYPE_MAX];
-
- int rrdpush_sender_pipe[2]; // collector to sender thread signaling
- int rrdpush_sender_socket;
-
- uint16_t hops;
-
- struct line_splitter line;
- struct compressor_state compressor;
-
-#ifdef NETDATA_LOG_STREAM_SENDER
- FILE *stream_log_fp;
-#endif
-
-#ifdef ENABLE_HTTPS
- NETDATA_SSL ssl; // structure used to encrypt the connection
-#endif
-
- struct {
- bool shutdown;
- STREAM_HANDSHAKE reason;
- } exit;
-
- struct {
- DICTIONARY *requests; // de-duplication of replication requests, per chart
- time_t oldest_request_after_t; // the timestamp of the oldest replication request
- time_t latest_completed_before_t; // the timestamp of the latest replication request
-
- struct {
- size_t pending_requests; // the currently outstanding replication requests
- size_t charts_replicating; // the number of unique charts having pending replication requests (on every request one is added and is removed when we finish it - it does not track completion of the replication for this chart)
- bool reached_max; // true when the sender buffer should not get more replication responses
- } atomic;
-
- } replication;
-
- struct {
- bool pending_data;
- size_t buffer_used_percentage; // the current utilization of the sending buffer
- usec_t last_flush_time_ut; // the last time the sender flushed the sending buffer in USEC
- time_t last_buffer_recreate_s; // true when the sender buffer should be re-created
- } atomic;
-
- struct {
- bool intercept_input;
- const char *transaction;
- const char *timeout_s;
- const char *function;
- const char *access;
- const char *source;
- BUFFER *payload;
- } functions;
-
- int parent_using_h2o;
-};
-
-#define sender_lock(sender) spinlock_lock(&(sender)->spinlock)
-#define sender_unlock(sender) spinlock_unlock(&(sender)->spinlock)
-
-#define rrdpush_sender_pipe_has_pending_data(sender) __atomic_load_n(&(sender)->atomic.pending_data, __ATOMIC_RELAXED)
-#define rrdpush_sender_pipe_set_pending_data(sender) __atomic_store_n(&(sender)->atomic.pending_data, true, __ATOMIC_RELAXED)
-#define rrdpush_sender_pipe_clear_pending_data(sender) __atomic_store_n(&(sender)->atomic.pending_data, false, __ATOMIC_RELAXED)
-
-#define rrdpush_sender_last_buffer_recreate_get(sender) __atomic_load_n(&(sender)->atomic.last_buffer_recreate_s, __ATOMIC_RELAXED)
-#define rrdpush_sender_last_buffer_recreate_set(sender, value) __atomic_store_n(&(sender)->atomic.last_buffer_recreate_s, value, __ATOMIC_RELAXED)
-
-#define rrdpush_sender_replication_buffer_full_set(sender, value) __atomic_store_n(&((sender)->replication.atomic.reached_max), value, __ATOMIC_SEQ_CST)
-#define rrdpush_sender_replication_buffer_full_get(sender) __atomic_load_n(&((sender)->replication.atomic.reached_max), __ATOMIC_SEQ_CST)
-
-#define rrdpush_sender_set_buffer_used_percent(sender, value) __atomic_store_n(&((sender)->atomic.buffer_used_percentage), value, __ATOMIC_RELAXED)
-#define rrdpush_sender_get_buffer_used_percent(sender) __atomic_load_n(&((sender)->atomic.buffer_used_percentage), __ATOMIC_RELAXED)
-
-#define rrdpush_sender_set_flush_time(sender) __atomic_store_n(&((sender)->atomic.last_flush_time_ut), now_realtime_usec(), __ATOMIC_RELAXED)
-#define rrdpush_sender_get_flush_time(sender) __atomic_load_n(&((sender)->atomic.last_flush_time_ut), __ATOMIC_RELAXED)
-
-#define rrdpush_sender_replicating_charts(sender) __atomic_load_n(&((sender)->replication.atomic.charts_replicating), __ATOMIC_RELAXED)
-#define rrdpush_sender_replicating_charts_plus_one(sender) __atomic_add_fetch(&((sender)->replication.atomic.charts_replicating), 1, __ATOMIC_RELAXED)
-#define rrdpush_sender_replicating_charts_minus_one(sender) __atomic_sub_fetch(&((sender)->replication.atomic.charts_replicating), 1, __ATOMIC_RELAXED)
-#define rrdpush_sender_replicating_charts_zero(sender) __atomic_store_n(&((sender)->replication.atomic.charts_replicating), 0, __ATOMIC_RELAXED)
-
-#define rrdpush_sender_pending_replication_requests(sender) __atomic_load_n(&((sender)->replication.atomic.pending_requests), __ATOMIC_RELAXED)
-#define rrdpush_sender_pending_replication_requests_plus_one(sender) __atomic_add_fetch(&((sender)->replication.atomic.pending_requests), 1, __ATOMIC_RELAXED)
-#define rrdpush_sender_pending_replication_requests_minus_one(sender) __atomic_sub_fetch(&((sender)->replication.atomic.pending_requests), 1, __ATOMIC_RELAXED)
-#define rrdpush_sender_pending_replication_requests_zero(sender) __atomic_store_n(&((sender)->replication.atomic.pending_requests), 0, __ATOMIC_RELAXED)
-
-/*
-typedef enum {
- STREAM_NODE_INSTANCE_FEATURE_CLOUD_ONLINE = (1 << 0),
- STREAM_NODE_INSTANCE_FEATURE_VIRTUAL_HOST = (1 << 1),
- STREAM_NODE_INSTANCE_FEATURE_HEALTH_ENABLED = (1 << 2),
- STREAM_NODE_INSTANCE_FEATURE_ML_SELF = (1 << 3),
- STREAM_NODE_INSTANCE_FEATURE_ML_RECEIVED = (1 << 4),
- STREAM_NODE_INSTANCE_FEATURE_SSL = (1 << 5),
-} STREAM_NODE_INSTANCE_FEATURES;
-
-typedef struct stream_node_instance {
- uuid_t uuid;
- STRING *agent;
- STREAM_NODE_INSTANCE_FEATURES features;
- uint32_t hops;
-
- // receiver information on that agent
- int32_t capabilities;
- uint32_t local_port;
- uint32_t remote_port;
- STRING *local_ip;
- STRING *remote_ip;
-} STREAM_NODE_INSTANCE;
-*/
-
-struct receiver_state {
- RRDHOST *host;
- pid_t tid;
- ND_THREAD *thread;
- int fd;
- char *key;
- char *hostname;
- char *registry_hostname;
- char *machine_guid;
- char *os;
- char *timezone; // Unused?
- char *abbrev_timezone;
- int32_t utc_offset;
- char *client_ip; // Duplicated in pluginsd
- char *client_port; // Duplicated in pluginsd
- char *program_name; // Duplicated in pluginsd
- char *program_version;
- struct rrdhost_system_info *system_info;
- STREAM_CAPABILITIES capabilities;
- time_t last_msg_t;
-
- struct buffered_reader reader;
-
- uint16_t hops;
-
- struct {
- bool shutdown; // signal the streaming parser to exit
- STREAM_HANDSHAKE reason;
- } exit;
-
- struct {
- RRD_MEMORY_MODE mode;
- int history;
- int update_every;
- int health_enabled; // CONFIG_BOOLEAN_YES, CONFIG_BOOLEAN_NO, CONFIG_BOOLEAN_AUTO
- time_t alarms_delay;
- uint32_t alarms_history;
- int rrdpush_enabled;
- char *rrdpush_api_key; // DONT FREE - it is allocated in appconfig
- char *rrdpush_send_charts_matching; // DONT FREE - it is allocated in appconfig
- bool rrdpush_enable_replication;
- time_t rrdpush_seconds_to_replicate;
- time_t rrdpush_replication_step;
- char *rrdpush_destination; // DONT FREE - it is allocated in appconfig
- unsigned int rrdpush_compression;
- STREAM_CAPABILITIES compression_priorities[COMPRESSION_ALGORITHM_MAX];
- } config;
-
-#ifdef ENABLE_HTTPS
- NETDATA_SSL ssl;
-#endif
-
- time_t replication_first_time_t;
-
- struct decompressor_state decompressor;
-/*
- struct {
- uint32_t count;
- STREAM_NODE_INSTANCE *array;
- } instances;
-*/
-
-#ifdef ENABLE_H2O
- void *h2o_ctx;
-#endif
-};
-
-#ifdef ENABLE_H2O
-#define is_h2o_rrdpush(x) ((x)->h2o_ctx != NULL)
-#define unless_h2o_rrdpush(x) if(!is_h2o_rrdpush(x))
-#endif
-
-struct rrdpush_destinations {
- STRING *destination;
- bool ssl;
- uint32_t attempts;
- time_t since;
- time_t postpone_reconnection_until;
- STREAM_HANDSHAKE reason;
-
- struct rrdpush_destinations *prev;
- struct rrdpush_destinations *next;
-};
-
-extern unsigned int default_rrdpush_enabled;
-extern unsigned int default_rrdpush_compression_enabled;
-extern char *default_rrdpush_destination;
-extern char *default_rrdpush_api_key;
-extern char *default_rrdpush_send_charts_matching;
-extern bool default_rrdpush_enable_replication;
-extern time_t default_rrdpush_seconds_to_replicate;
-extern time_t default_rrdpush_replication_step;
-extern unsigned int remote_clock_resync_iterations;
-
-void rrdpush_destinations_init(RRDHOST *host);
-void rrdpush_destinations_free(RRDHOST *host);
-
-BUFFER *sender_start(struct sender_state *s);
-void sender_commit(struct sender_state *s, BUFFER *wb, STREAM_TRAFFIC_TYPE type);
-int rrdpush_init();
-bool rrdpush_receiver_needs_dbengine();
-int configured_as_parent();
-
-typedef struct rrdset_stream_buffer {
- STREAM_CAPABILITIES capabilities;
- bool v2;
- bool begin_v2_added;
- time_t wall_clock_time;
- uint64_t rrdset_flags; // RRDSET_FLAGS
- time_t last_point_end_time_s;
- BUFFER *wb;
-} RRDSET_STREAM_BUFFER;
-
-RRDSET_STREAM_BUFFER rrdset_push_metric_initialize(RRDSET *st, time_t wall_clock_time);
-void rrdset_push_metrics_v1(RRDSET_STREAM_BUFFER *rsb, RRDSET *st);
-void rrdset_push_metrics_finished(RRDSET_STREAM_BUFFER *rsb, RRDSET *st);
-void rrddim_push_metrics_v2(RRDSET_STREAM_BUFFER *rsb, RRDDIM *rd, usec_t point_end_time_ut, NETDATA_DOUBLE n, SN_FLAGS flags);
-
-bool rrdset_push_chart_definition_now(RRDSET *st);
-void *rrdpush_sender_thread(void *ptr);
-void rrdpush_send_host_labels(RRDHOST *host);
-void rrdpush_send_claimed_id(RRDHOST *host);
-void rrdpush_send_global_functions(RRDHOST *host);
-
-int rrdpush_receiver_thread_spawn(struct web_client *w, char *decoded_query_string, void *h2o_ctx);
-void rrdpush_sender_thread_stop(RRDHOST *host, STREAM_HANDSHAKE reason, bool wait);
-
-void rrdpush_sender_send_this_host_variable_now(RRDHOST *host, const RRDVAR_ACQUIRED *rva);
-int connect_to_one_of_destinations(
- RRDHOST *host,
- int default_port,
- struct timeval *timeout,
- size_t *reconnects_counter,
- char *connected_to,
- size_t connected_to_size,
- struct rrdpush_destinations **destination);
-
-void rrdpush_signal_sender_to_wake_up(struct sender_state *s);
-
-void rrdpush_reset_destinations_postpone_time(RRDHOST *host);
-const char *stream_handshake_error_to_string(STREAM_HANDSHAKE handshake_error);
-void stream_capabilities_to_json_array(BUFFER *wb, STREAM_CAPABILITIES caps, const char *key);
-void rrdpush_receive_log_status(struct receiver_state *rpt, const char *msg, const char *status, ND_LOG_FIELD_PRIORITY priority);
-void log_receiver_capabilities(struct receiver_state *rpt);
-void log_sender_capabilities(struct sender_state *s);
-STREAM_CAPABILITIES convert_stream_version_to_capabilities(int32_t version, RRDHOST *host, bool sender);
-int32_t stream_capabilities_to_vn(uint32_t caps);
-void stream_capabilities_to_string(BUFFER *wb, STREAM_CAPABILITIES caps);
-
-void receiver_state_free(struct receiver_state *rpt);
-bool stop_streaming_receiver(RRDHOST *host, STREAM_HANDSHAKE reason);
-
-void sender_thread_buffer_free(void);
-
-#include "replication.h"
-
-typedef enum __attribute__((packed)) {
- RRDHOST_DB_STATUS_INITIALIZING = 0,
- RRDHOST_DB_STATUS_QUERYABLE,
-} RRDHOST_DB_STATUS;
-
-static inline const char *rrdhost_db_status_to_string(RRDHOST_DB_STATUS status) {
- switch(status) {
- default:
- case RRDHOST_DB_STATUS_INITIALIZING:
- return "initializing";
-
- case RRDHOST_DB_STATUS_QUERYABLE:
- return "online";
- }
-}
-
-typedef enum __attribute__((packed)) {
- RRDHOST_DB_LIVENESS_STALE = 0,
- RRDHOST_DB_LIVENESS_LIVE,
-} RRDHOST_DB_LIVENESS;
-
-static inline const char *rrdhost_db_liveness_to_string(RRDHOST_DB_LIVENESS status) {
- switch(status) {
- default:
- case RRDHOST_DB_LIVENESS_STALE:
- return "stale";
-
- case RRDHOST_DB_LIVENESS_LIVE:
- return "live";
- }
-}
-
-typedef enum __attribute__((packed)) {
- RRDHOST_INGEST_STATUS_ARCHIVED = 0,
- RRDHOST_INGEST_STATUS_INITIALIZING,
- RRDHOST_INGEST_STATUS_REPLICATING,
- RRDHOST_INGEST_STATUS_ONLINE,
- RRDHOST_INGEST_STATUS_OFFLINE,
-} RRDHOST_INGEST_STATUS;
-
-static inline const char *rrdhost_ingest_status_to_string(RRDHOST_INGEST_STATUS status) {
- switch(status) {
- case RRDHOST_INGEST_STATUS_ARCHIVED:
- return "archived";
-
- case RRDHOST_INGEST_STATUS_INITIALIZING:
- return "initializing";
-
- case RRDHOST_INGEST_STATUS_REPLICATING:
- return "replicating";
-
- case RRDHOST_INGEST_STATUS_ONLINE:
- return "online";
-
- default:
- case RRDHOST_INGEST_STATUS_OFFLINE:
- return "offline";
- }
-}
-
-typedef enum __attribute__((packed)) {
- RRDHOST_INGEST_TYPE_LOCALHOST = 0,
- RRDHOST_INGEST_TYPE_VIRTUAL,
- RRDHOST_INGEST_TYPE_CHILD,
- RRDHOST_INGEST_TYPE_ARCHIVED,
-} RRDHOST_INGEST_TYPE;
-
-static inline const char *rrdhost_ingest_type_to_string(RRDHOST_INGEST_TYPE type) {
- switch(type) {
- case RRDHOST_INGEST_TYPE_LOCALHOST:
- return "localhost";
-
- case RRDHOST_INGEST_TYPE_VIRTUAL:
- return "virtual";
-
- case RRDHOST_INGEST_TYPE_CHILD:
- return "child";
-
- default:
- case RRDHOST_INGEST_TYPE_ARCHIVED:
- return "archived";
- }
-}
-
-typedef enum __attribute__((packed)) {
- RRDHOST_STREAM_STATUS_DISABLED = 0,
- RRDHOST_STREAM_STATUS_REPLICATING,
- RRDHOST_STREAM_STATUS_ONLINE,
- RRDHOST_STREAM_STATUS_OFFLINE,
-} RRDHOST_STREAMING_STATUS;
-
-static inline const char *rrdhost_streaming_status_to_string(RRDHOST_STREAMING_STATUS status) {
- switch(status) {
- case RRDHOST_STREAM_STATUS_DISABLED:
- return "disabled";
-
- case RRDHOST_STREAM_STATUS_REPLICATING:
- return "replicating";
-
- case RRDHOST_STREAM_STATUS_ONLINE:
- return "online";
-
- default:
- case RRDHOST_STREAM_STATUS_OFFLINE:
- return "offline";
- }
-}
-
-typedef enum __attribute__((packed)) {
- RRDHOST_ML_STATUS_DISABLED = 0,
- RRDHOST_ML_STATUS_OFFLINE,
- RRDHOST_ML_STATUS_RUNNING,
-} RRDHOST_ML_STATUS;
-
-static inline const char *rrdhost_ml_status_to_string(RRDHOST_ML_STATUS status) {
- switch(status) {
- case RRDHOST_ML_STATUS_RUNNING:
- return "online";
-
- case RRDHOST_ML_STATUS_OFFLINE:
- return "offline";
-
- default:
- case RRDHOST_ML_STATUS_DISABLED:
- return "disabled";
- }
-}
-
-typedef enum __attribute__((packed)) {
- RRDHOST_ML_TYPE_DISABLED = 0,
- RRDHOST_ML_TYPE_SELF,
- RRDHOST_ML_TYPE_RECEIVED,
-} RRDHOST_ML_TYPE;
-
-static inline const char *rrdhost_ml_type_to_string(RRDHOST_ML_TYPE type) {
- switch(type) {
- case RRDHOST_ML_TYPE_SELF:
- return "self";
-
- case RRDHOST_ML_TYPE_RECEIVED:
- return "received";
-
- default:
- case RRDHOST_ML_TYPE_DISABLED:
- return "disabled";
- }
-}
-
-typedef enum __attribute__((packed)) {
- RRDHOST_HEALTH_STATUS_DISABLED = 0,
- RRDHOST_HEALTH_STATUS_INITIALIZING,
- RRDHOST_HEALTH_STATUS_RUNNING,
-} RRDHOST_HEALTH_STATUS;
-
-static inline const char *rrdhost_health_status_to_string(RRDHOST_HEALTH_STATUS status) {
- switch(status) {
- default:
- case RRDHOST_HEALTH_STATUS_DISABLED:
- return "disabled";
-
- case RRDHOST_HEALTH_STATUS_INITIALIZING:
- return "initializing";
-
- case RRDHOST_HEALTH_STATUS_RUNNING:
- return "online";
- }
-}
-
-typedef enum __attribute__((packed)) {
- RRDHOST_DYNCFG_STATUS_UNAVAILABLE = 0,
- RRDHOST_DYNCFG_STATUS_AVAILABLE,
-} RRDHOST_DYNCFG_STATUS;
-
-static inline const char *rrdhost_dyncfg_status_to_string(RRDHOST_DYNCFG_STATUS status) {
- switch(status) {
- default:
- case RRDHOST_DYNCFG_STATUS_UNAVAILABLE:
- return "unavailable";
-
- case RRDHOST_DYNCFG_STATUS_AVAILABLE:
- return "online";
- }
-}
-
-typedef struct rrdhost_status {
- RRDHOST *host;
- time_t now;
-
- struct {
- RRDHOST_DYNCFG_STATUS status;
- } dyncfg;
-
- struct {
- RRDHOST_DB_STATUS status;
- RRDHOST_DB_LIVENESS liveness;
- RRD_MEMORY_MODE mode;
- time_t first_time_s;
- time_t last_time_s;
- size_t metrics;
- size_t instances;
- size_t contexts;
- } db;
-
- struct {
- RRDHOST_ML_STATUS status;
- RRDHOST_ML_TYPE type;
- struct ml_metrics_statistics metrics;
- } ml;
-
- struct {
- size_t hops;
- RRDHOST_INGEST_TYPE type;
- RRDHOST_INGEST_STATUS status;
- SOCKET_PEERS peers;
- bool ssl;
- STREAM_CAPABILITIES capabilities;
- uint32_t id;
- time_t since;
- STREAM_HANDSHAKE reason;
-
- struct {
- bool in_progress;
- NETDATA_DOUBLE completion;
- size_t instances;
- } replication;
- } ingest;
-
- struct {
- size_t hops;
- RRDHOST_STREAMING_STATUS status;
- SOCKET_PEERS peers;
- bool ssl;
- bool compression;
- STREAM_CAPABILITIES capabilities;
- uint32_t id;
- time_t since;
- STREAM_HANDSHAKE reason;
-
- struct {
- bool in_progress;
- NETDATA_DOUBLE completion;
- size_t instances;
- } replication;
-
- size_t sent_bytes_on_this_connection_per_type[STREAM_TRAFFIC_TYPE_MAX];
- } stream;
-
- struct {
- RRDHOST_HEALTH_STATUS status;
- struct {
- uint32_t undefined;
- uint32_t uninitialized;
- uint32_t clear;
- uint32_t warning;
- uint32_t critical;
- } alerts;
- } health;
-} RRDHOST_STATUS;
-
-void rrdhost_status(RRDHOST *host, time_t now, RRDHOST_STATUS *s);
-bool rrdhost_state_cloud_emulation(RRDHOST *host);
-
-bool rrdpush_compression_initialize(struct sender_state *s);
-bool rrdpush_decompression_initialize(struct receiver_state *rpt);
-void rrdpush_parse_compression_order(struct receiver_state *rpt, const char *order);
-void rrdpush_select_receiver_compression_algorithm(struct receiver_state *rpt);
-void rrdpush_compression_deactivate(struct sender_state *s);
+#include "rrdhost-status.h"
+#include "protocol/commands.h"
+#include "stream-path.h"
#endif //NETDATA_RRDPUSH_H
diff --git a/src/streaming/sender-commit.c b/src/streaming/sender-commit.c
new file mode 100644
index 000000000..6ff7cb2ba
--- /dev/null
+++ b/src/streaming/sender-commit.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "sender-internals.h"
+
+static __thread BUFFER *sender_thread_buffer = NULL;
+static __thread bool sender_thread_buffer_used = false;
+static __thread time_t sender_thread_buffer_last_reset_s = 0;
+
+void sender_thread_buffer_free(void) {
+ buffer_free(sender_thread_buffer);
+ sender_thread_buffer = NULL;
+ sender_thread_buffer_used = false;
+}
+
+// Collector thread starting a transmission
+BUFFER *sender_start(struct sender_state *s) {
+ if(unlikely(sender_thread_buffer_used))
+ fatal("STREAMING: thread buffer is used multiple times concurrently.");
+
+ if(unlikely(rrdpush_sender_last_buffer_recreate_get(s) > sender_thread_buffer_last_reset_s)) {
+ if(unlikely(sender_thread_buffer && sender_thread_buffer->size > THREAD_BUFFER_INITIAL_SIZE)) {
+ buffer_free(sender_thread_buffer);
+ sender_thread_buffer = NULL;
+ }
+ }
+
+ if(unlikely(!sender_thread_buffer)) {
+ sender_thread_buffer = buffer_create(THREAD_BUFFER_INITIAL_SIZE, &netdata_buffers_statistics.buffers_streaming);
+ sender_thread_buffer_last_reset_s = rrdpush_sender_last_buffer_recreate_get(s);
+ }
+
+ sender_thread_buffer_used = true;
+ buffer_flush(sender_thread_buffer);
+ return sender_thread_buffer;
+}
+
+#define SENDER_BUFFER_ADAPT_TO_TIMES_MAX_SIZE 3
+
+// Collector thread finishing a transmission
+void sender_commit(struct sender_state *s, BUFFER *wb, STREAM_TRAFFIC_TYPE type) {
+
+ if(unlikely(wb != sender_thread_buffer))
+ fatal("STREAMING: sender is trying to commit a buffer that is not this thread's buffer.");
+
+ if(unlikely(!sender_thread_buffer_used))
+ fatal("STREAMING: sender is committing a buffer twice.");
+
+ sender_thread_buffer_used = false;
+
+ char *src = (char *)buffer_tostring(wb);
+ size_t src_len = buffer_strlen(wb);
+
+ if(unlikely(!src || !src_len))
+ return;
+
+ sender_lock(s);
+
+#ifdef NETDATA_LOG_STREAM_SENDER
+ if(type == STREAM_TRAFFIC_TYPE_METADATA) {
+ if(!s->stream_log_fp) {
+ char filename[FILENAME_MAX + 1];
+ snprintfz(filename, FILENAME_MAX, "/tmp/stream-sender-%s.txt", s->host ? rrdhost_hostname(s->host) : "unknown");
+
+ s->stream_log_fp = fopen(filename, "w");
+ }
+
+ if(s->stream_log_fp) fprintf(s->stream_log_fp, "\n--- SEND MESSAGE START: %s ----\n"
+ "%s"
+ "--- SEND MESSAGE END ----------------------------------------\n"
+ , rrdhost_hostname(s->host), src
+ );
+ }
+#endif
+
+ if(unlikely(s->buffer->max_size < (src_len + 1) * SENDER_BUFFER_ADAPT_TO_TIMES_MAX_SIZE)) {
+ netdata_log_info("STREAM %s [send to %s]: max buffer size of %zu is too small for a data message of size %zu. Increasing the max buffer size to %d times the max data message size.",
+ rrdhost_hostname(s->host), s->connected_to, s->buffer->max_size, buffer_strlen(wb) + 1, SENDER_BUFFER_ADAPT_TO_TIMES_MAX_SIZE);
+
+ s->buffer->max_size = (src_len + 1) * SENDER_BUFFER_ADAPT_TO_TIMES_MAX_SIZE;
+ }
+
+ if (s->compressor.initialized) {
+ while(src_len) {
+ size_t size_to_compress = src_len;
+
+ if(unlikely(size_to_compress > COMPRESSION_MAX_MSG_SIZE)) {
+ if (stream_has_capability(s, STREAM_CAP_BINARY))
+ size_to_compress = COMPRESSION_MAX_MSG_SIZE;
+ else {
+ if (size_to_compress > COMPRESSION_MAX_MSG_SIZE) {
+ // we need to find the last newline
+ // so that the decompressor will have a whole line to work with
+
+ const char *t = &src[COMPRESSION_MAX_MSG_SIZE];
+ while (--t >= src)
+ if (unlikely(*t == '\n'))
+ break;
+
+ if (t <= src) {
+ size_to_compress = COMPRESSION_MAX_MSG_SIZE;
+ } else
+ size_to_compress = t - src + 1;
+ }
+ }
+ }
+
+ const char *dst;
+ size_t dst_len = rrdpush_compress(&s->compressor, src, size_to_compress, &dst);
+ if (!dst_len) {
+ netdata_log_error("STREAM %s [send to %s]: COMPRESSION failed. Resetting compressor and re-trying",
+ rrdhost_hostname(s->host), s->connected_to);
+
+ rrdpush_compression_initialize(s);
+ dst_len = rrdpush_compress(&s->compressor, src, size_to_compress, &dst);
+ if(!dst_len) {
+ netdata_log_error("STREAM %s [send to %s]: COMPRESSION failed again. Deactivating compression",
+ rrdhost_hostname(s->host), s->connected_to);
+
+ worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_NO_COMPRESSION);
+ rrdpush_compression_deactivate(s);
+ rrdpush_sender_thread_close_socket(s);
+ sender_unlock(s);
+ return;
+ }
+ }
+
+ rrdpush_signature_t signature = rrdpush_compress_encode_signature(dst_len);
+
+#ifdef NETDATA_INTERNAL_CHECKS
+ // check if reversing the signature provides the same length
+ size_t decoded_dst_len = rrdpush_decompress_decode_signature((const char *)&signature, sizeof(signature));
+ if(decoded_dst_len != dst_len)
+ fatal("RRDPUSH COMPRESSION: invalid signature, original payload %zu bytes, "
+ "compressed payload length %zu bytes, but signature says payload is %zu bytes",
+ size_to_compress, dst_len, decoded_dst_len);
+#endif
+
+ if(cbuffer_add_unsafe(s->buffer, (const char *)&signature, sizeof(signature)))
+ s->flags |= SENDER_FLAG_OVERFLOW;
+ else {
+ if(cbuffer_add_unsafe(s->buffer, dst, dst_len))
+ s->flags |= SENDER_FLAG_OVERFLOW;
+ else
+ s->sent_bytes_on_this_connection_per_type[type] += dst_len + sizeof(signature);
+ }
+
+ src = src + size_to_compress;
+ src_len -= size_to_compress;
+ }
+ }
+ else if(cbuffer_add_unsafe(s->buffer, src, src_len))
+ s->flags |= SENDER_FLAG_OVERFLOW;
+ else
+ s->sent_bytes_on_this_connection_per_type[type] += src_len;
+
+ replication_recalculate_buffer_used_ratio_unsafe(s);
+
+ bool signal_sender = false;
+ if(!rrdpush_sender_pipe_has_pending_data(s)) {
+ rrdpush_sender_pipe_set_pending_data(s);
+ signal_sender = true;
+ }
+
+ sender_unlock(s);
+
+ if(signal_sender && (!stream_has_capability(s, STREAM_CAP_INTERPOLATED) || type != STREAM_TRAFFIC_TYPE_DATA))
+ rrdpush_signal_sender_to_wake_up(s);
+}
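The chunking step in `sender_commit()` above can be illustrated in isolation. The following is a minimal sketch, not Netdata code: `next_chunk_size()` and `count_chunks()` are hypothetical helpers, and `max_chunk` stands in for `COMPRESSION_MAX_MSG_SIZE`. It assumes the text-protocol case (no `STREAM_CAP_BINARY`), where a chunk should end right after the last newline that fits, so the receiver gets whole lines after decompression.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of Netdata): returns how many bytes of
// src[0..src_len) should go into the next chunk.
// If everything fits in max_chunk bytes, the whole remainder is taken.
// Otherwise the chunk ends right after the last '\n' found at indexes
// 1..max_chunk-1; if there is none, the chunk is cut at max_chunk bytes.
// A newline at index 0 is ignored, mirroring the `t <= src` check in the diff.
static size_t next_chunk_size(const char *src, size_t src_len, size_t max_chunk) {
    assert(src != NULL && max_chunk > 0);

    if (src_len <= max_chunk)
        return src_len;

    // scan backwards from the last byte allowed in this chunk
    size_t i = max_chunk;
    while (--i > 0)
        if (src[i] == '\n')
            return i + 1;

    return max_chunk;
}

// Hypothetical helper: counts the chunks a message would be split into.
// In the real code each chunk would be compressed and queued instead.
static size_t count_chunks(const char *src, size_t src_len, size_t max_chunk) {
    size_t chunks = 0;

    while (src_len) {
        size_t n = next_chunk_size(src, src_len, max_chunk);
        src += n;
        src_len -= n;
        chunks++;
    }

    return chunks;
}
```

With a limit of 4 bytes, the 8-byte message `"ab\ncd\nef"` is split into `"ab\n"`, `"cd\n"` and `"ef"`, while a message with no newline is cut at the limit.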
diff --git a/src/streaming/sender-connect.c b/src/streaming/sender-connect.c
new file mode 100644
index 000000000..ac5f392a0
--- /dev/null
+++ b/src/streaming/sender-connect.c
@@ -0,0 +1,741 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "sender-internals.h"
+
+void rrdpush_sender_thread_close_socket(struct sender_state *s) {
+ rrdhost_flag_clear(s->host, RRDHOST_FLAG_RRDPUSH_SENDER_CONNECTED | RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS);
+
+ netdata_ssl_close(&s->ssl);
+
+ if(s->rrdpush_sender_socket != -1) {
+ close(s->rrdpush_sender_socket);
+ s->rrdpush_sender_socket = -1;
+ }
+
+ // do not flush the circular buffer here
+ // this function is called sometimes with the sender lock, sometimes without the lock
+}
+
+void rrdpush_encode_variable(stream_encoded_t *se, RRDHOST *host) {
+ se->os_name = (host->system_info->host_os_name)?url_encode(host->system_info->host_os_name):strdupz("");
+ se->os_id = (host->system_info->host_os_id)?url_encode(host->system_info->host_os_id):strdupz("");
+ se->os_version = (host->system_info->host_os_version)?url_encode(host->system_info->host_os_version):strdupz("");
+ se->kernel_name = (host->system_info->kernel_name)?url_encode(host->system_info->kernel_name):strdupz("");
+ se->kernel_version = (host->system_info->kernel_version)?url_encode(host->system_info->kernel_version):strdupz("");
+}
+
+void rrdpush_clean_encoded(stream_encoded_t *se) {
+ if (se->os_name) {
+ freez(se->os_name);
+ se->os_name = NULL;
+ }
+
+ if (se->os_id) {
+ freez(se->os_id);
+ se->os_id = NULL;
+ }
+
+ if (se->os_version) {
+ freez(se->os_version);
+ se->os_version = NULL;
+ }
+
+ if (se->kernel_name) {
+ freez(se->kernel_name);
+ se->kernel_name = NULL;
+ }
+
+ if (se->kernel_version) {
+ freez(se->kernel_version);
+ se->kernel_version = NULL;
+ }
+}
+
+struct {
+ const char *response;
+ const char *status;
+ size_t length;
+ int32_t version;
+ bool dynamic;
+ const char *error;
+ int worker_job_id;
+ int postpone_reconnect_seconds;
+ ND_LOG_FIELD_PRIORITY priority;
+} stream_responses[] = {
+ {
+ .response = START_STREAMING_PROMPT_VN,
+ .length = sizeof(START_STREAMING_PROMPT_VN) - 1,
+ .status = RRDPUSH_STATUS_CONNECTED,
+ .version = STREAM_HANDSHAKE_OK_V3, // and above
+ .dynamic = true, // dynamic = we will parse the version / capabilities
+ .error = NULL,
+ .worker_job_id = 0,
+ .postpone_reconnect_seconds = 0,
+ .priority = NDLP_INFO,
+ },
+ {
+ .response = START_STREAMING_PROMPT_V2,
+ .length = sizeof(START_STREAMING_PROMPT_V2) - 1,
+ .status = RRDPUSH_STATUS_CONNECTED,
+ .version = STREAM_HANDSHAKE_OK_V2,
+ .dynamic = false,
+ .error = NULL,
+ .worker_job_id = 0,
+ .postpone_reconnect_seconds = 0,
+ .priority = NDLP_INFO,
+ },
+ {
+ .response = START_STREAMING_PROMPT_V1,
+ .length = sizeof(START_STREAMING_PROMPT_V1) - 1,
+ .status = RRDPUSH_STATUS_CONNECTED,
+ .version = STREAM_HANDSHAKE_OK_V1,
+ .dynamic = false,
+ .error = NULL,
+ .worker_job_id = 0,
+ .postpone_reconnect_seconds = 0,
+ .priority = NDLP_INFO,
+ },
+ {
+ .response = START_STREAMING_ERROR_SAME_LOCALHOST,
+ .length = sizeof(START_STREAMING_ERROR_SAME_LOCALHOST) - 1,
+ .status = RRDPUSH_STATUS_LOCALHOST,
+ .version = STREAM_HANDSHAKE_ERROR_LOCALHOST,
+ .dynamic = false,
+ .error = "remote server rejected this stream, the host we are trying to stream is its localhost",
+ .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
+ .postpone_reconnect_seconds = 60 * 60, // the IP may change, try it every hour
+ .priority = NDLP_DEBUG,
+ },
+ {
+ .response = START_STREAMING_ERROR_ALREADY_STREAMING,
+ .length = sizeof(START_STREAMING_ERROR_ALREADY_STREAMING) - 1,
+ .status = RRDPUSH_STATUS_ALREADY_CONNECTED,
+ .version = STREAM_HANDSHAKE_ERROR_ALREADY_CONNECTED,
+ .dynamic = false,
+ .error = "remote server rejected this stream, the host we are trying to stream is already streamed to it",
+ .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
+ .postpone_reconnect_seconds = 2 * 60, // 2 minutes
+ .priority = NDLP_DEBUG,
+ },
+ {
+ .response = START_STREAMING_ERROR_NOT_PERMITTED,
+ .length = sizeof(START_STREAMING_ERROR_NOT_PERMITTED) - 1,
+ .status = RRDPUSH_STATUS_PERMISSION_DENIED,
+ .version = STREAM_HANDSHAKE_ERROR_DENIED,
+ .dynamic = false,
+ .error = "remote server denied access, probably we don't have the right API key?",
+ .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
+ .postpone_reconnect_seconds = 1 * 60, // 1 minute
+ .priority = NDLP_ERR,
+ },
+ {
+ .response = START_STREAMING_ERROR_BUSY_TRY_LATER,
+ .length = sizeof(START_STREAMING_ERROR_BUSY_TRY_LATER) - 1,
+ .status = RRDPUSH_STATUS_RATE_LIMIT,
+ .version = STREAM_HANDSHAKE_BUSY_TRY_LATER,
+ .dynamic = false,
+ .error = "remote server is currently busy, we should try later",
+ .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
+ .postpone_reconnect_seconds = 2 * 60, // 2 minutes
+ .priority = NDLP_NOTICE,
+ },
+ {
+ .response = START_STREAMING_ERROR_INTERNAL_ERROR,
+ .length = sizeof(START_STREAMING_ERROR_INTERNAL_ERROR) - 1,
+ .status = RRDPUSH_STATUS_INTERNAL_SERVER_ERROR,
+ .version = STREAM_HANDSHAKE_INTERNAL_ERROR,
+ .dynamic = false,
+ .error = "remote server encountered an internal error, we should try later",
+ .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
+ .postpone_reconnect_seconds = 5 * 60, // 5 minutes
+ .priority = NDLP_CRIT,
+ },
+ {
+ .response = START_STREAMING_ERROR_INITIALIZATION,
+ .length = sizeof(START_STREAMING_ERROR_INITIALIZATION) - 1,
+ .status = RRDPUSH_STATUS_INITIALIZATION_IN_PROGRESS,
+ .version = STREAM_HANDSHAKE_INITIALIZATION,
+ .dynamic = false,
+ .error = "remote server is initializing, we should try later",
+ .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
+ .postpone_reconnect_seconds = 2 * 60, // 2 minutes
+ .priority = NDLP_NOTICE,
+ },
+
+ // terminator
+ {
+ .response = NULL,
+ .length = 0,
+ .status = RRDPUSH_STATUS_BAD_HANDSHAKE,
+ .version = STREAM_HANDSHAKE_ERROR_BAD_HANDSHAKE,
+ .dynamic = false,
+ .error = "remote node response is not understood, is it Netdata?",
+ .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
+ .postpone_reconnect_seconds = 1 * 60, // 1 minute
+ .priority = NDLP_ERR,
+ }
+};
+
+static inline bool rrdpush_sender_validate_response(RRDHOST *host, struct sender_state *s, char *http, size_t http_length) {
+ int32_t version = STREAM_HANDSHAKE_ERROR_BAD_HANDSHAKE;
+
+ int i;
+ for(i = 0; stream_responses[i].response ; i++) {
+ if(stream_responses[i].dynamic &&
+ http_length > stream_responses[i].length && http_length < (stream_responses[i].length + 30) &&
+ strncmp(http, stream_responses[i].response, stream_responses[i].length) == 0) {
+
+ version = str2i(&http[stream_responses[i].length]);
+ break;
+ }
+ else if(http_length == stream_responses[i].length && strcmp(http, stream_responses[i].response) == 0) {
+ version = stream_responses[i].version;
+
+ break;
+ }
+ }
+
+ if(version >= STREAM_HANDSHAKE_OK_V1) {
+ host->destination->reason = version;
+ host->destination->postpone_reconnection_until = now_realtime_sec() + s->reconnect_delay;
+ s->capabilities = convert_stream_version_to_capabilities(version, host, true);
+ return true;
+ }
+
+ ND_LOG_FIELD_PRIORITY priority = stream_responses[i].priority;
+ const char *error = stream_responses[i].error;
+ const char *status = stream_responses[i].status;
+ int worker_job_id = stream_responses[i].worker_job_id;
+ int delay = stream_responses[i].postpone_reconnect_seconds;
+
+ worker_is_busy(worker_job_id);
+ rrdpush_sender_thread_close_socket(s);
+ host->destination->reason = version;
+ host->destination->postpone_reconnection_until = now_realtime_sec() + delay;
+
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, status),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ char buf[RFC3339_MAX_LENGTH];
+ rfc3339_datetime_ut(buf, sizeof(buf), host->destination->postpone_reconnection_until * USEC_PER_SEC, 0, false);
+
+ nd_log(NDLS_DAEMON, priority,
+ "STREAM %s [send to %s]: %s - will retry in %d secs, at %s",
+ rrdhost_hostname(host), s->connected_to, error, delay, buf);
+
+ return false;
+}
+
+unsigned char alpn_proto_list[] = {
+ 18, 'n', 'e', 't', 'd', 'a', 't', 'a', '_', 's', 't', 'r', 'e', 'a', 'm', '/', '2', '.', '0',
+ 8, 'h', 't', 't', 'p', '/', '1', '.', '1'
+};
+
+#define CONN_UPGRADE_VAL "upgrade"
+
+static bool rrdpush_sender_connect_ssl(struct sender_state *s) {
+ RRDHOST *host = s->host;
+ bool ssl_required = host && host->destination && host->destination->ssl;
+
+ netdata_ssl_close(&host->sender->ssl);
+
+ if(!ssl_required)
+ return true;
+
+ if (netdata_ssl_open_ext(&host->sender->ssl, netdata_ssl_streaming_sender_ctx, s->rrdpush_sender_socket, alpn_proto_list, sizeof(alpn_proto_list))) {
+ if(!netdata_ssl_connect(&host->sender->ssl)) {
+ // couldn't connect
+
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_SSL_ERROR),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_SSL_ERROR);
+ rrdpush_sender_thread_close_socket(s);
+ host->destination->reason = STREAM_HANDSHAKE_ERROR_SSL_ERROR;
+ host->destination->postpone_reconnection_until = now_realtime_sec() + 5 * 60;
+ return false;
+ }
+
+ if (netdata_ssl_validate_certificate_sender &&
+ security_test_certificate(host->sender->ssl.conn)) {
+ // certificate is not valid
+
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_INVALID_SSL_CERTIFICATE),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_SSL_ERROR);
+ netdata_log_error("SSL: closing the stream connection, because the server SSL certificate is not valid.");
+ rrdpush_sender_thread_close_socket(s);
+ host->destination->reason = STREAM_HANDSHAKE_ERROR_INVALID_CERTIFICATE;
+ host->destination->postpone_reconnection_until = now_realtime_sec() + 5 * 60;
+ return false;
+ }
+
+ return true;
+ }
+
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_CANT_ESTABLISH_SSL_CONNECTION),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ netdata_log_error("SSL: failed to establish connection.");
+ return false;
+}
+
+static int rrdpush_http_upgrade_prelude(RRDHOST *host, struct sender_state *s) {
+
+ char http[HTTP_HEADER_SIZE + 1];
+ snprintfz(http, HTTP_HEADER_SIZE,
+ "GET " NETDATA_STREAM_URL HTTP_1_1 HTTP_ENDL
+ "Upgrade: " NETDATA_STREAM_PROTO_NAME HTTP_ENDL
+ "Connection: Upgrade"
+ HTTP_HDR_END);
+
+ ssize_t bytes = send_timeout(
+ &host->sender->ssl,
+ s->rrdpush_sender_socket,
+ http,
+ strlen(http),
+ 0,
+ 1000);
+
+ bytes = recv_timeout(
+ &host->sender->ssl,
+ s->rrdpush_sender_socket,
+ http,
+ HTTP_HEADER_SIZE,
+ 0,
+ 1000);
+
+ if (bytes <= 0) {
+ error_report("Error reading from remote");
+ return 1;
+ }
+
+ rbuf_t buf = rbuf_create(bytes);
+ rbuf_push(buf, http, bytes);
+
+ http_parse_ctx ctx;
+ http_parse_ctx_create(&ctx, HTTP_PARSE_INITIAL);
+ ctx.flags |= HTTP_PARSE_FLAG_DONT_WAIT_FOR_CONTENT;
+
+ int rc;
+ // while((rc = parse_http_response(buf, &ctx)) == HTTP_PARSE_NEED_MORE_DATA);
+ rc = parse_http_response(buf, &ctx);
+
+ if (rc != HTTP_PARSE_SUCCESS) {
+ error_report("Failed to parse HTTP response sent. (%d)", rc);
+ goto err_cleanup;
+ }
+ if (ctx.http_code == HTTP_RESP_MOVED_PERM) {
+ const char *hdr = get_http_header_by_name(&ctx, "location");
+ if (hdr)
+ error_report("HTTP response is %d Moved Permanently (location: \"%s\") instead of expected %d Switching Protocols.", ctx.http_code, hdr, HTTP_RESP_SWITCH_PROTO);
+ else
+ error_report("HTTP response is %d instead of expected %d Switching Protocols.", ctx.http_code, HTTP_RESP_SWITCH_PROTO);
+ goto err_cleanup;
+ }
+ if (ctx.http_code == HTTP_RESP_NOT_FOUND) {
+ error_report("HTTP response is %d instead of expected %d Switching Protocols. Parent version too old.", ctx.http_code, HTTP_RESP_SWITCH_PROTO);
+ // TODO set some flag here that will signify parent is older version
+ // and to try connection without rrdpush_http_upgrade_prelude next time
+ goto err_cleanup;
+ }
+ if (ctx.http_code != HTTP_RESP_SWITCH_PROTO) {
+ error_report("HTTP response is %d instead of expected %d Switching Protocols", ctx.http_code, HTTP_RESP_SWITCH_PROTO);
+ goto err_cleanup;
+ }
+
+ const char *hdr = get_http_header_by_name(&ctx, "connection");
+ if (!hdr) {
+ error_report("Missing \"connection\" header in reply");
+ goto err_cleanup;
+ }
+ if (strncmp(hdr, CONN_UPGRADE_VAL, strlen(CONN_UPGRADE_VAL))) {
+ error_report("Expected \"connection: " CONN_UPGRADE_VAL "\"");
+ goto err_cleanup;
+ }
+
+ hdr = get_http_header_by_name(&ctx, "upgrade");
+ if (!hdr) {
+ error_report("Missing \"upgrade\" header in reply");
+ goto err_cleanup;
+ }
+ if (strncmp(hdr, NETDATA_STREAM_PROTO_NAME, strlen(NETDATA_STREAM_PROTO_NAME))) {
+ error_report("Expected \"upgrade: " NETDATA_STREAM_PROTO_NAME "\"");
+ goto err_cleanup;
+ }
+
+ netdata_log_debug(D_STREAM, "Stream sender upgrade to \"" NETDATA_STREAM_PROTO_NAME "\" successful");
+ rbuf_free(buf);
+ http_parse_ctx_destroy(&ctx);
+ return 0;
+err_cleanup:
+ rbuf_free(buf);
+ http_parse_ctx_destroy(&ctx);
+ return 1;
+}
+
+static bool sender_send_connection_request(RRDHOST *host, int default_port, int timeout, struct sender_state *s) {
+
+ struct timeval tv = {
+ .tv_sec = timeout,
+ .tv_usec = 0
+ };
+
+ // make sure the socket is closed
+ rrdpush_sender_thread_close_socket(s);
+
+ s->rrdpush_sender_socket = connect_to_one_of_destinations(
+ host
+ , default_port
+ , &tv
+ , &s->reconnects_counter
+ , s->connected_to
+ , sizeof(s->connected_to)-1
+ , &host->destination
+ );
+
+ if(unlikely(s->rrdpush_sender_socket == -1)) {
+ // netdata_log_error("STREAM %s [send to %s]: could not connect to parent node at this time.", rrdhost_hostname(host), host->rrdpush_send_destination);
+ return false;
+ }
+
+ // netdata_log_info("STREAM %s [send to %s]: initializing communication...", rrdhost_hostname(host), s->connected_to);
+
+ // reset our capabilities to default
+ s->capabilities = stream_our_capabilities(host, true);
+
+ /* TODO: During the implementation of #7265 switch the set of variables to HOST_* and CONTAINER_* if the
+ version negotiation resulted in a high enough version.
+ */
+ stream_encoded_t se;
+ rrdpush_encode_variable(&se, host);
+
+ host->sender->hops = host->system_info->hops + 1;
+
+ char http[HTTP_HEADER_SIZE + 1];
+ int eol = snprintfz(http, HTTP_HEADER_SIZE,
+ "STREAM "
+ "key=%s"
+ "&hostname=%s"
+ "&registry_hostname=%s"
+ "&machine_guid=%s"
+ "&update_every=%d"
+ "&os=%s"
+ "&timezone=%s"
+ "&abbrev_timezone=%s"
+ "&utc_offset=%d"
+ "&hops=%d"
+ "&ml_capable=%d"
+ "&ml_enabled=%d"
+ "&mc_version=%d"
+ "&ver=%u"
+ "&NETDATA_INSTANCE_CLOUD_TYPE=%s"
+ "&NETDATA_INSTANCE_CLOUD_INSTANCE_TYPE=%s"
+ "&NETDATA_INSTANCE_CLOUD_INSTANCE_REGION=%s"
+ "&NETDATA_SYSTEM_OS_NAME=%s"
+ "&NETDATA_SYSTEM_OS_ID=%s"
+ "&NETDATA_SYSTEM_OS_ID_LIKE=%s"
+ "&NETDATA_SYSTEM_OS_VERSION=%s"
+ "&NETDATA_SYSTEM_OS_VERSION_ID=%s"
+ "&NETDATA_SYSTEM_OS_DETECTION=%s"
+ "&NETDATA_HOST_IS_K8S_NODE=%s"
+ "&NETDATA_SYSTEM_KERNEL_NAME=%s"
+ "&NETDATA_SYSTEM_KERNEL_VERSION=%s"
+ "&NETDATA_SYSTEM_ARCHITECTURE=%s"
+ "&NETDATA_SYSTEM_VIRTUALIZATION=%s"
+ "&NETDATA_SYSTEM_VIRT_DETECTION=%s"
+ "&NETDATA_SYSTEM_CONTAINER=%s"
+ "&NETDATA_SYSTEM_CONTAINER_DETECTION=%s"
+ "&NETDATA_CONTAINER_OS_NAME=%s"
+ "&NETDATA_CONTAINER_OS_ID=%s"
+ "&NETDATA_CONTAINER_OS_ID_LIKE=%s"
+ "&NETDATA_CONTAINER_OS_VERSION=%s"
+ "&NETDATA_CONTAINER_OS_VERSION_ID=%s"
+ "&NETDATA_CONTAINER_OS_DETECTION=%s"
+ "&NETDATA_SYSTEM_CPU_LOGICAL_CPU_COUNT=%s"
+ "&NETDATA_SYSTEM_CPU_FREQ=%s"
+ "&NETDATA_SYSTEM_TOTAL_RAM=%s"
+ "&NETDATA_SYSTEM_TOTAL_DISK_SIZE=%s"
+ "&NETDATA_PROTOCOL_VERSION=%s"
+ HTTP_1_1 HTTP_ENDL
+ "User-Agent: %s/%s\r\n"
+ "Accept: */*\r\n\r\n"
+ , host->rrdpush.send.api_key
+ , rrdhost_hostname(host)
+ , rrdhost_registry_hostname(host)
+ , host->machine_guid
+ , default_rrd_update_every
+ , rrdhost_os(host)
+ , rrdhost_timezone(host)
+ , rrdhost_abbrev_timezone(host)
+ , host->utc_offset
+ , host->sender->hops
+ , host->system_info->ml_capable
+ , host->system_info->ml_enabled
+ , host->system_info->mc_version
+ , s->capabilities
+ , (host->system_info->cloud_provider_type) ? host->system_info->cloud_provider_type : ""
+ , (host->system_info->cloud_instance_type) ? host->system_info->cloud_instance_type : ""
+ , (host->system_info->cloud_instance_region) ? host->system_info->cloud_instance_region : ""
+ , se.os_name
+ , se.os_id
+ , (host->system_info->host_os_id_like) ? host->system_info->host_os_id_like : ""
+ , se.os_version
+ , (host->system_info->host_os_version_id) ? host->system_info->host_os_version_id : ""
+ , (host->system_info->host_os_detection) ? host->system_info->host_os_detection : ""
+ , (host->system_info->is_k8s_node) ? host->system_info->is_k8s_node : ""
+ , se.kernel_name
+ , se.kernel_version
+ , (host->system_info->architecture) ? host->system_info->architecture : ""
+ , (host->system_info->virtualization) ? host->system_info->virtualization : ""
+ , (host->system_info->virt_detection) ? host->system_info->virt_detection : ""
+ , (host->system_info->container) ? host->system_info->container : ""
+ , (host->system_info->container_detection) ? host->system_info->container_detection : ""
+ , (host->system_info->container_os_name) ? host->system_info->container_os_name : ""
+ , (host->system_info->container_os_id) ? host->system_info->container_os_id : ""
+ , (host->system_info->container_os_id_like) ? host->system_info->container_os_id_like : ""
+ , (host->system_info->container_os_version) ? host->system_info->container_os_version : ""
+ , (host->system_info->container_os_version_id) ? host->system_info->container_os_version_id : ""
+ , (host->system_info->container_os_detection) ? host->system_info->container_os_detection : ""
+ , (host->system_info->host_cores) ? host->system_info->host_cores : ""
+ , (host->system_info->host_cpu_freq) ? host->system_info->host_cpu_freq : ""
+ , (host->system_info->host_ram_total) ? host->system_info->host_ram_total : ""
+ , (host->system_info->host_disk_space) ? host->system_info->host_disk_space : ""
+ , STREAMING_PROTOCOL_VERSION
+ , rrdhost_program_name(host)
+ , rrdhost_program_version(host)
+ );
+ http[eol] = 0x00;
+ rrdpush_clean_encoded(&se);
+
+ if(!rrdpush_sender_connect_ssl(s))
+ return false;
+
+ if (s->parent_using_h2o && rrdpush_http_upgrade_prelude(host, s)) {
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_CANT_UPGRADE_CONNECTION),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_CANT_UPGRADE_CONNECTION);
+ rrdpush_sender_thread_close_socket(s);
+ host->destination->reason = STREAM_HANDSHAKE_ERROR_HTTP_UPGRADE;
+ host->destination->postpone_reconnection_until = now_realtime_sec() + 1 * 60;
+ return false;
+ }
+
+ ssize_t len = (ssize_t)strlen(http);
+ ssize_t bytes = send_timeout(
+ &host->sender->ssl,
+ s->rrdpush_sender_socket,
+ http,
+ len,
+ 0,
+ timeout);
+
+ if(bytes <= 0) { // send failed or timed out (a timeout returns 0)
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_TIMEOUT),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_TIMEOUT);
+ rrdpush_sender_thread_close_socket(s);
+
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM %s [send to %s]: failed to send HTTP header to remote netdata.",
+ rrdhost_hostname(host), s->connected_to);
+
+ host->destination->reason = STREAM_HANDSHAKE_ERROR_SEND_TIMEOUT;
+ host->destination->postpone_reconnection_until = now_realtime_sec() + 1 * 60;
+ return false;
+ }
+
+ bytes = recv_timeout(
+ &host->sender->ssl,
+ s->rrdpush_sender_socket,
+ http,
+ HTTP_HEADER_SIZE,
+ 0,
+ timeout);
+
+ if(bytes <= 0) { // receive failed or timed out (a timeout returns 0)
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_TIMEOUT),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_TIMEOUT);
+ rrdpush_sender_thread_close_socket(s);
+
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM %s [send to %s]: remote netdata does not respond.",
+ rrdhost_hostname(host), s->connected_to);
+
+ host->destination->reason = STREAM_HANDSHAKE_ERROR_RECEIVE_TIMEOUT;
+ host->destination->postpone_reconnection_until = now_realtime_sec() + 30;
+ return false;
+ }
+
+ if(sock_setnonblock(s->rrdpush_sender_socket) < 0)
+ nd_log(NDLS_DAEMON, NDLP_WARNING,
+ "STREAM %s [send to %s]: cannot set non-blocking mode for socket.",
+ rrdhost_hostname(host), s->connected_to);
+ sock_setcloexec(s->rrdpush_sender_socket);
+
+ if(sock_enlarge_out(s->rrdpush_sender_socket) < 0)
+ nd_log(NDLS_DAEMON, NDLP_WARNING,
+ "STREAM %s [send to %s]: cannot enlarge the socket buffer.",
+ rrdhost_hostname(host), s->connected_to);
+
+ http[bytes] = '\0';
+ if(!rrdpush_sender_validate_response(host, s, http, bytes))
+ return false;
+
+ rrdpush_compression_initialize(s);
+
+ log_sender_capabilities(s);
+
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_CONNECTED),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ nd_log(NDLS_DAEMON, NDLP_DEBUG,
+ "STREAM %s: connected to %s...",
+ rrdhost_hostname(host), s->connected_to);
+
+ return true;
+}
+
+bool attempt_to_connect(struct sender_state *state) {
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_UUID(NDF_MESSAGE_ID, &streaming_to_parent_msgid),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ state->send_attempts = 0;
+
+ // reset the bytes we have sent for this session
+ state->sent_bytes_on_this_connection = 0;
+ memset(state->sent_bytes_on_this_connection_per_type, 0, sizeof(state->sent_bytes_on_this_connection_per_type));
+
+ if(sender_send_connection_request(state->host, state->default_port, state->timeout, state)) {
+ // reset the buffer, to properly send charts and metrics
+ rrdpush_sender_on_connect(state->host);
+
+ // send from the beginning
+ state->begin = 0;
+
+ // make sure the next reconnection will be immediate
+ state->not_connected_loops = 0;
+
+ // let the data collection threads know we are ready
+ rrdhost_flag_set(state->host, RRDHOST_FLAG_RRDPUSH_SENDER_CONNECTED);
+
+ rrdpush_sender_after_connect(state->host);
+
+ return true;
+ }
+
+ // we couldn't connect
+
+ // increase the failed connections counter
+ state->not_connected_loops++;
+
+ // slow re-connection on repeating errors
+ usec_t now_ut = now_monotonic_usec();
+ usec_t end_ut = now_ut + USEC_PER_SEC * state->reconnect_delay;
+ while(now_ut < end_ut) {
+ if(nd_thread_signaled_to_cancel())
+ return false;
+
+ sleep_usec(100 * USEC_PER_MS); // 100 ms per iteration, so cancellation is noticed quickly
+ now_ut = now_monotonic_usec();
+ }
+
+ return false;
+}
+
+bool rrdpush_sender_connect(struct sender_state *s) {
+ worker_is_busy(WORKER_SENDER_JOB_CONNECT);
+
+ time_t now_s = now_monotonic_sec();
+ rrdpush_sender_cbuffer_recreate_timed(s, now_s, false, true);
+ rrdpush_sender_execute_commands_cleanup(s);
+
+ rrdhost_flag_clear(s->host, RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS);
+ s->flags &= ~SENDER_FLAG_OVERFLOW;
+ s->read_len = 0;
+ s->buffer->read = 0;
+ s->buffer->write = 0;
+
+ if(!attempt_to_connect(s))
+ return false;
+
+ if(rrdhost_sender_should_exit(s))
+ return false;
+
+ s->last_traffic_seen_t = now_monotonic_sec();
+ stream_path_send_to_parent(s->host);
+ rrdpush_sender_send_claimed_id(s->host);
+ rrdpush_send_host_labels(s->host);
+ rrdpush_send_global_functions(s->host);
+ s->replication.oldest_request_after_t = 0;
+
+ rrdhost_flag_set(s->host, RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS);
+
+ nd_log(NDLS_DAEMON, NDLP_DEBUG,
+ "STREAM %s [send to %s]: enabling metrics streaming...",
+ rrdhost_hostname(s->host), s->connected_to);
+
+ return true;
+}
+
+// Either the receiver lost the connection or the host is being destroyed.
+// The sender mutex guards thread creation; any spurious data is wiped on reconnection.
+void rrdpush_sender_thread_stop(RRDHOST *host, STREAM_HANDSHAKE reason, bool wait) {
+ if (!host->sender)
+ return;
+
+ sender_lock(host->sender);
+
+ if(rrdhost_flag_check(host, RRDHOST_FLAG_RRDPUSH_SENDER_SPAWN)) {
+
+ host->sender->exit.shutdown = true;
+ host->sender->exit.reason = reason;
+
+ // signal it to cancel
+ nd_thread_signal_cancel(host->rrdpush_sender_thread);
+ }
+
+ sender_unlock(host->sender);
+
+ if(wait) {
+ sender_lock(host->sender);
+ while(host->sender->tid) {
+ sender_unlock(host->sender);
+ sleep_usec(10 * USEC_PER_MS);
+ sender_lock(host->sender);
+ }
+ sender_unlock(host->sender);
+ }
+}
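The `alpn_proto_list` array in `sender-connect.c` above is written out by hand in the length-prefixed wire format that ALPN uses: each protocol name is preceded by one byte holding its length. A minimal sketch of building the same bytes from plain strings follows. The helper name `alpn_append()` is hypothetical and is not part of this patch.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not in the patch): appends one protocol name to an
// ALPN list in wire format (one length byte followed by the name bytes).
// Returns the new used size, or 0 if the name is empty, longer than 255
// bytes, or does not fit in the remaining capacity.
static size_t alpn_append(unsigned char *list, size_t used, size_t capacity, const char *proto) {
    size_t len = strlen(proto);
    if (len == 0 || len > 255 || used + 1 + len > capacity)
        return 0;

    list[used] = (unsigned char)len;       // length prefix
    memcpy(&list[used + 1], proto, len);   // name, without a NUL terminator
    return used + 1 + len;
}
```

Appending `"netdata_stream/2.0"` and then `"http/1.1"` to an empty buffer yields 28 bytes, the same size and layout as the literal array in the patch (prefixes 18 and 8).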
diff --git a/src/streaming/sender-destinations.c b/src/streaming/sender-destinations.c
new file mode 100644
index 000000000..5e67ca039
--- /dev/null
+++ b/src/streaming/sender-destinations.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "sender-internals.h"
+
+void rrdpush_reset_destinations_postpone_time(RRDHOST *host) {
+ uint32_t wait = (host->sender) ? host->sender->reconnect_delay : 5;
+ time_t now = now_realtime_sec();
+ for (struct rrdpush_destinations *d = host->destinations; d; d = d->next)
+ d->postpone_reconnection_until = now + wait;
+}
+
+void rrdpush_sender_ssl_init(RRDHOST *host) {
+ static SPINLOCK sp = NETDATA_SPINLOCK_INITIALIZER;
+ spinlock_lock(&sp);
+
+ if(netdata_ssl_streaming_sender_ctx || !host) {
+ spinlock_unlock(&sp);
+ return;
+ }
+
+ for(struct rrdpush_destinations *d = host->destinations; d ; d = d->next) {
+ if (d->ssl) {
+ // we need to initialize SSL
+
+ netdata_ssl_initialize_ctx(NETDATA_SSL_STREAMING_SENDER_CTX);
+ ssl_security_location_for_context(netdata_ssl_streaming_sender_ctx, stream_conf_ssl_ca_file, stream_conf_ssl_ca_path);
+
+ // stop the loop
+ break;
+ }
+ }
+
+ spinlock_unlock(&sp);
+}
+
+int connect_to_one_of_destinations(
+ RRDHOST *host,
+ int default_port,
+ struct timeval *timeout,
+ size_t *reconnects_counter,
+ char *connected_to,
+ size_t connected_to_size,
+ struct rrdpush_destinations **destination)
+{
+ int sock = -1;
+
+ for (struct rrdpush_destinations *d = host->destinations; d; d = d->next) {
+ time_t now = now_realtime_sec();
+
+ if(nd_thread_signaled_to_cancel())
+ return -1;
+
+ if(d->postpone_reconnection_until > now)
+ continue;
+
+ nd_log(NDLS_DAEMON, NDLP_DEBUG,
+ "STREAM %s: connecting to '%s' (default port: %d)...",
+ rrdhost_hostname(host), string2str(d->destination), default_port);
+
+ if (reconnects_counter)
+ *reconnects_counter += 1;
+
+ d->since = now;
+ d->attempts++;
+ sock = connect_to_this(string2str(d->destination), default_port, timeout);
+
+ if (sock != -1) {
+ if (connected_to && connected_to_size)
+ strncpyz(connected_to, string2str(d->destination), connected_to_size);
+
+ *destination = d;
+
+ // move the current item to the end of the list
+ // without this, this destination will break the loop again and again
+ // not advancing the destinations to find one that may work
+ DOUBLE_LINKED_LIST_REMOVE_ITEM_UNSAFE(host->destinations, d, prev, next);
+ DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE(host->destinations, d, prev, next);
+
+ break;
+ }
+ }
+
+ return sock;
+}
+
+struct destinations_init_tmp {
+ RRDHOST *host;
+ struct rrdpush_destinations *list;
+ int count;
+};
+
+static bool destinations_init_add_one(char *entry, void *data) {
+ struct destinations_init_tmp *t = data;
+
+ struct rrdpush_destinations *d = callocz(1, sizeof(struct rrdpush_destinations));
+ char *colon_ssl = strstr(entry, ":SSL");
+ if(colon_ssl) {
+ *colon_ssl = '\0';
+ d->ssl = true;
+ }
+ else
+ d->ssl = false;
+
+ d->destination = string_strdupz(entry);
+
+ __atomic_add_fetch(&netdata_buffers_statistics.rrdhost_senders, sizeof(struct rrdpush_destinations), __ATOMIC_RELAXED);
+
+ DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE(t->list, d, prev, next);
+
+ t->count++;
+ nd_log_daemon(NDLP_INFO, "STREAM: added streaming destination No %d: '%s' to host '%s'", t->count, string2str(d->destination), rrdhost_hostname(t->host));
+
+ return false; // we return false, so that we will get all defined destinations
+}
+
+void rrdpush_destinations_init(RRDHOST *host) {
+ if(!host->rrdpush.send.destination) return;
+
+ rrdpush_destinations_free(host);
+
+ struct destinations_init_tmp t = {
+ .host = host,
+ .list = NULL,
+ .count = 0,
+ };
+
+ foreach_entry_in_connection_string(host->rrdpush.send.destination, destinations_init_add_one, &t);
+
+ host->destinations = t.list;
+}
+
+void rrdpush_destinations_free(RRDHOST *host) {
+ while (host->destinations) {
+ struct rrdpush_destinations *tmp = host->destinations;
+ DOUBLE_LINKED_LIST_REMOVE_ITEM_UNSAFE(host->destinations, tmp, prev, next);
+ string_freez(tmp->destination);
+ freez(tmp);
+ __atomic_sub_fetch(&netdata_buffers_statistics.rrdhost_senders, sizeof(struct rrdpush_destinations), __ATOMIC_RELAXED);
+ }
+
+ host->destinations = NULL;
+}
+
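`connect_to_one_of_destinations()` above skips every destination whose `postpone_reconnection_until` is still in the future and, after a successful connect, moves the chosen entry to the end of the list so later scans try the others first. The sketch below isolates those two rules. It assumes a simplified, singly linked `struct dest` in place of `struct rrdpush_destinations`; the names `first_eligible()` and `move_to_end()` are hypothetical, and no socket is opened.

```c
#include <stddef.h>
#include <time.h>

// Simplified stand-in for struct rrdpush_destinations (assumption: only the
// fields needed to illustrate the selection rule are kept).
struct dest {
    const char *name;
    time_t postpone_until;
    struct dest *next;
};

// Returns the first destination whose postpone time has passed, or NULL when
// every destination is still postponed.
static struct dest *first_eligible(struct dest *head, time_t now) {
    for (struct dest *d = head; d; d = d->next) {
        if (d->postpone_until > now)
            continue; // still backing off, try the next one
        return d;
    }
    return NULL;
}

// Moves d to the tail of the list, so the next scan reaches the other
// destinations before coming back to d.
static void move_to_end(struct dest **head, struct dest *d) {
    if (!d->next)
        return; // already the last item

    struct dest **pp = head;
    while (*pp && *pp != d)
        pp = &(*pp)->next;
    if (!*pp)
        return; // d is not in this list

    *pp = d->next; // unlink d (d->next is not NULL here)

    struct dest *tail = *pp;
    while (tail->next)
        tail = tail->next;

    tail->next = d;
    d->next = NULL;
}
```

The patch itself uses the `DOUBLE_LINKED_LIST_*_UNSAFE` macros for the rotation; the singly linked version here only keeps the sketch self-contained.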
diff --git a/src/streaming/sender-destinations.h b/src/streaming/sender-destinations.h
new file mode 100644
index 000000000..e7c72cef7
--- /dev/null
+++ b/src/streaming/sender-destinations.h
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_SENDER_DESTINATIONS_H
+#define NETDATA_SENDER_DESTINATIONS_H
+
+#include "libnetdata/libnetdata.h"
+#include "stream-handshake.h"
+#include "database/rrd.h"
+
+struct rrdpush_destinations {
+ STRING *destination;
+ bool ssl;
+ uint32_t attempts;
+ time_t since;
+ time_t postpone_reconnection_until;
+ STREAM_HANDSHAKE reason;
+
+ struct rrdpush_destinations *prev;
+ struct rrdpush_destinations *next;
+};
+
+void rrdpush_sender_ssl_init(RRDHOST *host);
+
+void rrdpush_reset_destinations_postpone_time(RRDHOST *host);
+
+void rrdpush_destinations_init(RRDHOST *host);
+void rrdpush_destinations_free(RRDHOST *host);
+
+int connect_to_one_of_destinations(
+ RRDHOST *host,
+ int default_port,
+ struct timeval *timeout,
+ size_t *reconnects_counter,
+ char *connected_to,
+ size_t connected_to_size,
+ struct rrdpush_destinations **destination);
+
+#endif //NETDATA_SENDER_DESTINATIONS_H
diff --git a/src/streaming/sender-execute.c b/src/streaming/sender-execute.c
new file mode 100644
index 000000000..e180710e9
--- /dev/null
+++ b/src/streaming/sender-execute.c
@@ -0,0 +1,294 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "sender-internals.h"
+
+struct inflight_stream_function {
+ struct sender_state *sender;
+ STRING *transaction;
+ usec_t received_ut;
+};
+
+static void stream_execute_function_callback(BUFFER *func_wb, int code, void *data) {
+ struct inflight_stream_function *tmp = data;
+ struct sender_state *s = tmp->sender;
+
+ if(rrdhost_can_send_definitions_to_parent(s->host)) {
+ BUFFER *wb = sender_start(s);
+
+ pluginsd_function_result_begin_to_buffer(wb
+ , string2str(tmp->transaction)
+ , code
+ , content_type_id2string(func_wb->content_type)
+ , func_wb->expires);
+
+ buffer_fast_strcat(wb, buffer_tostring(func_wb), buffer_strlen(func_wb));
+ pluginsd_function_result_end_to_buffer(wb);
+
+ sender_commit(s, wb, STREAM_TRAFFIC_TYPE_FUNCTIONS);
+ sender_thread_buffer_free();
+
+ internal_error(true, "STREAM %s [send to %s] FUNCTION transaction %s sending back response (%zu bytes, %"PRIu64" usec).",
+ rrdhost_hostname(s->host), s->connected_to,
+ string2str(tmp->transaction),
+ buffer_strlen(func_wb),
+ now_realtime_usec() - tmp->received_ut);
+ }
+
+ string_freez(tmp->transaction);
+ buffer_free(func_wb);
+ freez(tmp);
+}
+
+static void stream_execute_function_progress_callback(void *data, size_t done, size_t all) {
+ struct inflight_stream_function *tmp = data;
+ struct sender_state *s = tmp->sender;
+
+ if(rrdhost_can_send_definitions_to_parent(s->host)) {
+ BUFFER *wb = sender_start(s);
+
+ buffer_sprintf(wb, PLUGINSD_KEYWORD_FUNCTION_PROGRESS " '%s' %zu %zu\n",
+ string2str(tmp->transaction), done, all);
+
+ sender_commit(s, wb, STREAM_TRAFFIC_TYPE_FUNCTIONS);
+ }
+}
+
+static void execute_commands_function(struct sender_state *s, const char *command, const char *transaction, const char *timeout_s, const char *function, BUFFER *payload, const char *access, const char *source) {
+ worker_is_busy(WORKER_SENDER_JOB_FUNCTION_REQUEST);
+ nd_log(NDLS_ACCESS, NDLP_INFO, NULL);
+
+ if(!transaction || !*transaction || !timeout_s || !*timeout_s || !function || !*function) {
+ netdata_log_error("STREAM %s [send to %s] %s execution command is incomplete (transaction = '%s', timeout = '%s', function = '%s'). Ignoring it.",
+ rrdhost_hostname(s->host), s->connected_to,
+ command,
+ transaction?transaction:"(unset)",
+ timeout_s?timeout_s:"(unset)",
+ function?function:"(unset)");
+ }
+ else {
+ int timeout = str2i(timeout_s);
+ if(timeout <= 0) timeout = PLUGINS_FUNCTIONS_TIMEOUT_DEFAULT;
+
+ struct inflight_stream_function *tmp = callocz(1, sizeof(struct inflight_stream_function));
+ tmp->received_ut = now_realtime_usec();
+ tmp->sender = s;
+ tmp->transaction = string_strdupz(transaction);
+ BUFFER *wb = buffer_create(1024, &netdata_buffers_statistics.buffers_functions);
+
+ int code = rrd_function_run(s->host, wb, timeout,
+ http_access_from_hex_mapping_old_roles(access), function, false, transaction,
+ stream_execute_function_callback, tmp,
+ stream_has_capability(s, STREAM_CAP_PROGRESS) ? stream_execute_function_progress_callback : NULL,
+ stream_has_capability(s, STREAM_CAP_PROGRESS) ? tmp : NULL,
+ NULL, NULL, payload, source, true);
+
+ if(code != HTTP_RESP_OK) {
+ if (!buffer_strlen(wb))
+ rrd_call_function_error(wb, "Failed to route this request to the plugin that offered it.", code);
+ }
+ }
+}
+
+struct deferred_function {
+ const char *transaction;
+ const char *timeout_s;
+ const char *function;
+ const char *access;
+ const char *source;
+};
+
+static void execute_deferred_function(struct sender_state *s, void *data) {
+ struct deferred_function *dfd = data;
+ execute_commands_function(s, s->defer.end_keyword,
+ dfd->transaction, dfd->timeout_s,
+ dfd->function, s->defer.payload,
+ dfd->access, dfd->source);
+}
+
+static void execute_deferred_json(struct sender_state *s, void *data) {
+ const char *keyword = data;
+
+ if(strcmp(keyword, PLUGINSD_KEYWORD_STREAM_PATH) == 0)
+ stream_path_set_from_json(s->host, buffer_tostring(s->defer.payload), true);
+ else
+ nd_log(NDLS_DAEMON, NDLP_ERR, "STREAM: unknown JSON keyword '%s' with payload: %s", keyword, buffer_tostring(s->defer.payload));
+}
+
+static void cleanup_deferred_json(struct sender_state *s __maybe_unused, void *data) {
+ const char *keyword = data;
+ freez((void *)keyword);
+}
+
+static void cleanup_deferred_function(struct sender_state *s __maybe_unused, void *data) {
+ struct deferred_function *dfd = data;
+ freez((void *)dfd->transaction);
+ freez((void *)dfd->timeout_s);
+ freez((void *)dfd->function);
+ freez((void *)dfd->access);
+ freez((void *)dfd->source);
+ freez(dfd);
+}
+
+static void cleanup_deferred_data(struct sender_state *s) {
+ if(s->defer.cleanup)
+ s->defer.cleanup(s, s->defer.action_data);
+
+ buffer_free(s->defer.payload);
+ s->defer.payload = NULL;
+ s->defer.end_keyword = NULL;
+ s->defer.action = NULL;
+ s->defer.cleanup = NULL;
+ s->defer.action_data = NULL;
+}
+
+void rrdpush_sender_execute_commands_cleanup(struct sender_state *s) {
+ cleanup_deferred_data(s);
+}
+
+// This is just a placeholder until the gap filling state machine is inserted
+void rrdpush_sender_execute_commands(struct sender_state *s) {
+ worker_is_busy(WORKER_SENDER_JOB_EXECUTE);
+
+ ND_LOG_STACK lgs[] = {
+ ND_LOG_FIELD_CB(NDF_REQUEST, line_splitter_reconstruct_line, &s->line),
+ ND_LOG_FIELD_END(),
+ };
+ ND_LOG_STACK_PUSH(lgs);
+
+ char *start = s->read_buffer, *end = &s->read_buffer[s->read_len], *newline;
+ *end = '\0';
+ for( ; start < end ; start = newline + 1) {
+ newline = strchr(start, '\n');
+
+ if(!newline) {
+ if(s->defer.end_keyword) {
+ buffer_strcat(s->defer.payload, start);
+ start = end;
+ }
+ break;
+ }
+
+ *newline = '\0';
+ s->line.count++;
+
+ if(s->defer.end_keyword) {
+ if(strcmp(start, s->defer.end_keyword) == 0) {
+ s->defer.action(s, s->defer.action_data);
+ cleanup_deferred_data(s);
+ }
+ else {
+ buffer_strcat(s->defer.payload, start);
+ buffer_putc(s->defer.payload, '\n');
+ }
+
+ continue;
+ }
+
+ s->line.num_words = quoted_strings_splitter_whitespace(start, s->line.words, PLUGINSD_MAX_WORDS);
+ const char *command = get_word(s->line.words, s->line.num_words, 0);
+
+ if(command && strcmp(command, PLUGINSD_CALL_FUNCTION) == 0) {
+ char *transaction = get_word(s->line.words, s->line.num_words, 1);
+ char *timeout_s = get_word(s->line.words, s->line.num_words, 2);
+ char *function = get_word(s->line.words, s->line.num_words, 3);
+ char *access = get_word(s->line.words, s->line.num_words, 4);
+ char *source = get_word(s->line.words, s->line.num_words, 5);
+
+ execute_commands_function(s, command, transaction, timeout_s, function, NULL, access, source);
+ }
+ else if(command && strcmp(command, PLUGINSD_CALL_FUNCTION_PAYLOAD_BEGIN) == 0) {
+ char *transaction = get_word(s->line.words, s->line.num_words, 1);
+ char *timeout_s = get_word(s->line.words, s->line.num_words, 2);
+ char *function = get_word(s->line.words, s->line.num_words, 3);
+ char *access = get_word(s->line.words, s->line.num_words, 4);
+ char *source = get_word(s->line.words, s->line.num_words, 5);
+ char *content_type = get_word(s->line.words, s->line.num_words, 6);
+
+ s->defer.end_keyword = PLUGINSD_CALL_FUNCTION_PAYLOAD_END;
+ s->defer.payload = buffer_create(0, NULL);
+ s->defer.payload->content_type = content_type_string2id(content_type);
+ s->defer.action = execute_deferred_function;
+ s->defer.cleanup = cleanup_deferred_function;
+
+ struct deferred_function *dfd = callocz(1, sizeof(*dfd));
+ dfd->transaction = strdupz(transaction ? transaction : "");
+ dfd->timeout_s = strdupz(timeout_s ? timeout_s : "");
+ dfd->function = strdupz(function ? function : "");
+ dfd->access = strdupz(access ? access : "");
+ dfd->source = strdupz(source ? source : "");
+
+ s->defer.action_data = dfd;
+ }
+ else if(command && strcmp(command, PLUGINSD_CALL_FUNCTION_CANCEL) == 0) {
+ worker_is_busy(WORKER_SENDER_JOB_FUNCTION_REQUEST);
+ nd_log(NDLS_ACCESS, NDLP_DEBUG, NULL);
+
+ char *transaction = get_word(s->line.words, s->line.num_words, 1);
+ if(transaction && *transaction)
+ rrd_function_cancel(transaction);
+ }
+ else if(command && strcmp(command, PLUGINSD_CALL_FUNCTION_PROGRESS) == 0) {
+ worker_is_busy(WORKER_SENDER_JOB_FUNCTION_REQUEST);
+ nd_log(NDLS_ACCESS, NDLP_DEBUG, NULL);
+
+ char *transaction = get_word(s->line.words, s->line.num_words, 1);
+ if(transaction && *transaction)
+ rrd_function_progress(transaction);
+ }
+ else if (command && strcmp(command, PLUGINSD_KEYWORD_REPLAY_CHART) == 0) {
+ worker_is_busy(WORKER_SENDER_JOB_REPLAY_REQUEST);
+ nd_log(NDLS_ACCESS, NDLP_DEBUG, NULL);
+
+ const char *chart_id = get_word(s->line.words, s->line.num_words, 1);
+ const char *start_streaming = get_word(s->line.words, s->line.num_words, 2);
+ const char *after = get_word(s->line.words, s->line.num_words, 3);
+ const char *before = get_word(s->line.words, s->line.num_words, 4);
+
+ if (!chart_id || !start_streaming || !after || !before) {
+ netdata_log_error("STREAM %s [send to %s] %s command is incomplete"
+ " (chart=%s, start_streaming=%s, after=%s, before=%s)",
+ rrdhost_hostname(s->host), s->connected_to,
+ command,
+ chart_id ? chart_id : "(unset)",
+ start_streaming ? start_streaming : "(unset)",
+ after ? after : "(unset)",
+ before ? before : "(unset)");
+ }
+ else {
+ replication_add_request(s, chart_id,
+ strtoll(after, NULL, 0),
+ strtoll(before, NULL, 0),
+ !strcmp(start_streaming, "true")
+ );
+ }
+ }
+ else if(command && strcmp(command, PLUGINSD_KEYWORD_NODE_ID) == 0) {
+ rrdpush_sender_get_node_and_claim_id_from_parent(s);
+ }
+ else if(command && strcmp(command, PLUGINSD_KEYWORD_JSON) == 0) {
+ char *keyword = get_word(s->line.words, s->line.num_words, 1);
+
+ s->defer.end_keyword = PLUGINSD_KEYWORD_JSON_END;
+ s->defer.payload = buffer_create(0, NULL);
+ s->defer.action = execute_deferred_json;
+ s->defer.cleanup = cleanup_deferred_json;
+ s->defer.action_data = strdupz(keyword ? keyword : "");
+ }
+ else {
+ netdata_log_error("STREAM %s [send to %s] received unknown command over connection: %s",
+ rrdhost_hostname(s->host), s->connected_to, s->line.words[0]?s->line.words[0]:"(unset)");
+ }
+
+ line_splitter_reset(&s->line);
+ worker_is_busy(WORKER_SENDER_JOB_EXECUTE);
+ }
+
+ if (start < end) {
+ memmove(s->read_buffer, start, end-start);
+ s->read_len = end - start;
+ }
+ else {
+ s->read_buffer[0] = '\0';
+ s->read_len = 0;
+ }
+}
diff --git a/src/streaming/sender-internals.h b/src/streaming/sender-internals.h
new file mode 100644
index 000000000..574369afa
--- /dev/null
+++ b/src/streaming/sender-internals.h
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_SENDER_INTERNALS_H
+#define NETDATA_SENDER_INTERNALS_H
+
+#include "rrdpush.h"
+#include "h2o-common.h"
+#include "aclk/https_client.h"
+
+#define WORKER_SENDER_JOB_CONNECT 0
+#define WORKER_SENDER_JOB_PIPE_READ 1
+#define WORKER_SENDER_JOB_SOCKET_RECEIVE 2
+#define WORKER_SENDER_JOB_EXECUTE 3
+#define WORKER_SENDER_JOB_SOCKET_SEND 4
+#define WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE 5
+#define WORKER_SENDER_JOB_DISCONNECT_OVERFLOW 6
+#define WORKER_SENDER_JOB_DISCONNECT_TIMEOUT 7
+#define WORKER_SENDER_JOB_DISCONNECT_POLL_ERROR 8
+#define WORKER_SENDER_JOB_DISCONNECT_SOCKET_ERROR 9
+#define WORKER_SENDER_JOB_DISCONNECT_SSL_ERROR 10
+#define WORKER_SENDER_JOB_DISCONNECT_PARENT_CLOSED 11
+#define WORKER_SENDER_JOB_DISCONNECT_RECEIVE_ERROR 12
+#define WORKER_SENDER_JOB_DISCONNECT_SEND_ERROR 13
+#define WORKER_SENDER_JOB_DISCONNECT_NO_COMPRESSION 14
+#define WORKER_SENDER_JOB_BUFFER_RATIO 15
+#define WORKER_SENDER_JOB_BYTES_RECEIVED 16
+#define WORKER_SENDER_JOB_BYTES_SENT 17
+#define WORKER_SENDER_JOB_BYTES_COMPRESSED 18
+#define WORKER_SENDER_JOB_BYTES_UNCOMPRESSED 19
+#define WORKER_SENDER_JOB_BYTES_COMPRESSION_RATIO 20
+#define WORKER_SENDER_JOB_REPLAY_REQUEST 21
+#define WORKER_SENDER_JOB_FUNCTION_REQUEST 22
+#define WORKER_SENDER_JOB_REPLAY_DICT_SIZE 23
+#define WORKER_SENDER_JOB_DISCONNECT_CANT_UPGRADE_CONNECTION 24
+
+#if WORKER_UTILIZATION_MAX_JOB_TYPES < 25
+#error WORKER_UTILIZATION_MAX_JOB_TYPES has to be at least 25
+#endif
+
+bool attempt_to_connect(struct sender_state *state);
+void rrdpush_sender_on_connect(RRDHOST *host);
+void rrdpush_sender_after_connect(RRDHOST *host);
+void rrdpush_sender_thread_close_socket(struct sender_state *s);
+
+void rrdpush_sender_execute_commands_cleanup(struct sender_state *s);
+void rrdpush_sender_execute_commands(struct sender_state *s);
+
+#endif //NETDATA_SENDER_INTERNALS_H
diff --git a/src/streaming/sender.c b/src/streaming/sender.c
index a5fbe6044..666409b1c 100644
--- a/src/streaming/sender.c
+++ b/src/streaming/sender.c
@@ -1,257 +1,6 @@
// SPDX-License-Identifier: GPL-3.0-or-later
-#include "rrdpush.h"
-#include "common.h"
-#include "aclk/https_client.h"
-
-#define WORKER_SENDER_JOB_CONNECT 0
-#define WORKER_SENDER_JOB_PIPE_READ 1
-#define WORKER_SENDER_JOB_SOCKET_RECEIVE 2
-#define WORKER_SENDER_JOB_EXECUTE 3
-#define WORKER_SENDER_JOB_SOCKET_SEND 4
-#define WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE 5
-#define WORKER_SENDER_JOB_DISCONNECT_OVERFLOW 6
-#define WORKER_SENDER_JOB_DISCONNECT_TIMEOUT 7
-#define WORKER_SENDER_JOB_DISCONNECT_POLL_ERROR 8
-#define WORKER_SENDER_JOB_DISCONNECT_SOCKET_ERROR 9
-#define WORKER_SENDER_JOB_DISCONNECT_SSL_ERROR 10
-#define WORKER_SENDER_JOB_DISCONNECT_PARENT_CLOSED 11
-#define WORKER_SENDER_JOB_DISCONNECT_RECEIVE_ERROR 12
-#define WORKER_SENDER_JOB_DISCONNECT_SEND_ERROR 13
-#define WORKER_SENDER_JOB_DISCONNECT_NO_COMPRESSION 14
-#define WORKER_SENDER_JOB_BUFFER_RATIO 15
-#define WORKER_SENDER_JOB_BYTES_RECEIVED 16
-#define WORKER_SENDER_JOB_BYTES_SENT 17
-#define WORKER_SENDER_JOB_BYTES_COMPRESSED 18
-#define WORKER_SENDER_JOB_BYTES_UNCOMPRESSED 19
-#define WORKER_SENDER_JOB_BYTES_COMPRESSION_RATIO 20
-#define WORKER_SENDER_JOB_REPLAY_REQUEST 21
-#define WORKER_SENDER_JOB_FUNCTION_REQUEST 22
-#define WORKER_SENDER_JOB_REPLAY_DICT_SIZE 23
-#define WORKER_SENDER_JOB_DISCONNECT_CANT_UPGRADE_CONNECTION 24
-
-#if WORKER_UTILIZATION_MAX_JOB_TYPES < 25
-#error WORKER_UTILIZATION_MAX_JOB_TYPES has to be at least 25
-#endif
-
-extern struct config stream_config;
-extern char *netdata_ssl_ca_path;
-extern char *netdata_ssl_ca_file;
-
-static __thread BUFFER *sender_thread_buffer = NULL;
-static __thread bool sender_thread_buffer_used = false;
-static __thread time_t sender_thread_buffer_last_reset_s = 0;
-
-void sender_thread_buffer_free(void) {
- buffer_free(sender_thread_buffer);
- sender_thread_buffer = NULL;
- sender_thread_buffer_used = false;
-}
-
-// Collector thread starting a transmission
-BUFFER *sender_start(struct sender_state *s) {
- if(unlikely(sender_thread_buffer_used))
- fatal("STREAMING: thread buffer is used multiple times concurrently.");
-
- if(unlikely(rrdpush_sender_last_buffer_recreate_get(s) > sender_thread_buffer_last_reset_s)) {
- if(unlikely(sender_thread_buffer && sender_thread_buffer->size > THREAD_BUFFER_INITIAL_SIZE)) {
- buffer_free(sender_thread_buffer);
- sender_thread_buffer = NULL;
- }
- }
-
- if(unlikely(!sender_thread_buffer)) {
- sender_thread_buffer = buffer_create(THREAD_BUFFER_INITIAL_SIZE, &netdata_buffers_statistics.buffers_streaming);
- sender_thread_buffer_last_reset_s = rrdpush_sender_last_buffer_recreate_get(s);
- }
-
- sender_thread_buffer_used = true;
- buffer_flush(sender_thread_buffer);
- return sender_thread_buffer;
-}
-
-static inline void rrdpush_sender_thread_close_socket(RRDHOST *host);
-
-#define SENDER_BUFFER_ADAPT_TO_TIMES_MAX_SIZE 3
-
-// Collector thread finishing a transmission
-void sender_commit(struct sender_state *s, BUFFER *wb, STREAM_TRAFFIC_TYPE type) {
-
- if(unlikely(wb != sender_thread_buffer))
- fatal("STREAMING: sender is trying to commit a buffer that is not this thread's buffer.");
-
- if(unlikely(!sender_thread_buffer_used))
- fatal("STREAMING: sender is committing a buffer twice.");
-
- sender_thread_buffer_used = false;
-
- char *src = (char *)buffer_tostring(wb);
- size_t src_len = buffer_strlen(wb);
-
- if(unlikely(!src || !src_len))
- return;
-
- sender_lock(s);
-
-#ifdef NETDATA_LOG_STREAM_SENDER
- if(type == STREAM_TRAFFIC_TYPE_METADATA) {
- if(!s->stream_log_fp) {
- char filename[FILENAME_MAX + 1];
- snprintfz(filename, FILENAME_MAX, "/tmp/stream-sender-%s.txt", s->host ? rrdhost_hostname(s->host) : "unknown");
-
- s->stream_log_fp = fopen(filename, "w");
- }
-
- fprintf(s->stream_log_fp, "\n--- SEND MESSAGE START: %s ----\n"
- "%s"
- "--- SEND MESSAGE END ----------------------------------------\n"
- , rrdhost_hostname(s->host), src
- );
- }
-#endif
-
- if(unlikely(s->buffer->max_size < (src_len + 1) * SENDER_BUFFER_ADAPT_TO_TIMES_MAX_SIZE)) {
- netdata_log_info("STREAM %s [send to %s]: max buffer size of %zu is too small for a data message of size %zu. Increasing the max buffer size to %d times the max data message size.",
- rrdhost_hostname(s->host), s->connected_to, s->buffer->max_size, buffer_strlen(wb) + 1, SENDER_BUFFER_ADAPT_TO_TIMES_MAX_SIZE);
-
- s->buffer->max_size = (src_len + 1) * SENDER_BUFFER_ADAPT_TO_TIMES_MAX_SIZE;
- }
-
- if (s->compressor.initialized) {
- while(src_len) {
- size_t size_to_compress = src_len;
-
- if(unlikely(size_to_compress > COMPRESSION_MAX_MSG_SIZE)) {
- if (stream_has_capability(s, STREAM_CAP_BINARY))
- size_to_compress = COMPRESSION_MAX_MSG_SIZE;
- else {
- if (size_to_compress > COMPRESSION_MAX_MSG_SIZE) {
- // we need to find the last newline
- // so that the decompressor will have a whole line to work with
-
- const char *t = &src[COMPRESSION_MAX_MSG_SIZE];
- while (--t >= src)
- if (unlikely(*t == '\n'))
- break;
-
- if (t <= src) {
- size_to_compress = COMPRESSION_MAX_MSG_SIZE;
- } else
- size_to_compress = t - src + 1;
- }
- }
- }
-
- const char *dst;
- size_t dst_len = rrdpush_compress(&s->compressor, src, size_to_compress, &dst);
- if (!dst_len) {
- netdata_log_error("STREAM %s [send to %s]: COMPRESSION failed. Resetting compressor and re-trying",
- rrdhost_hostname(s->host), s->connected_to);
-
- rrdpush_compression_initialize(s);
- dst_len = rrdpush_compress(&s->compressor, src, size_to_compress, &dst);
- if(!dst_len) {
- netdata_log_error("STREAM %s [send to %s]: COMPRESSION failed again. Deactivating compression",
- rrdhost_hostname(s->host), s->connected_to);
-
- worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_NO_COMPRESSION);
- rrdpush_compression_deactivate(s);
- rrdpush_sender_thread_close_socket(s->host);
- sender_unlock(s);
- return;
- }
- }
-
- rrdpush_signature_t signature = rrdpush_compress_encode_signature(dst_len);
-
-#ifdef NETDATA_INTERNAL_CHECKS
- // check if reversing the signature provides the same length
- size_t decoded_dst_len = rrdpush_decompress_decode_signature((const char *)&signature, sizeof(signature));
- if(decoded_dst_len != dst_len)
- fatal("RRDPUSH COMPRESSION: invalid signature, original payload %zu bytes, "
- "compressed payload length %zu bytes, but signature says payload is %zu bytes",
- size_to_compress, dst_len, decoded_dst_len);
-#endif
-
- if(cbuffer_add_unsafe(s->buffer, (const char *)&signature, sizeof(signature)))
- s->flags |= SENDER_FLAG_OVERFLOW;
- else {
- if(cbuffer_add_unsafe(s->buffer, dst, dst_len))
- s->flags |= SENDER_FLAG_OVERFLOW;
- else
- s->sent_bytes_on_this_connection_per_type[type] += dst_len + sizeof(signature);
- }
-
- src = src + size_to_compress;
- src_len -= size_to_compress;
- }
- }
- else if(cbuffer_add_unsafe(s->buffer, src, src_len))
- s->flags |= SENDER_FLAG_OVERFLOW;
- else
- s->sent_bytes_on_this_connection_per_type[type] += src_len;
-
- replication_recalculate_buffer_used_ratio_unsafe(s);
-
- bool signal_sender = false;
- if(!rrdpush_sender_pipe_has_pending_data(s)) {
- rrdpush_sender_pipe_set_pending_data(s);
- signal_sender = true;
- }
-
- sender_unlock(s);
-
- if(signal_sender && (!stream_has_capability(s, STREAM_CAP_INTERPOLATED) || type != STREAM_TRAFFIC_TYPE_DATA))
- rrdpush_signal_sender_to_wake_up(s);
-}
-
-static inline void rrdpush_sender_add_host_variable_to_buffer(BUFFER *wb, const RRDVAR_ACQUIRED *rva) {
- buffer_sprintf(
- wb
- , "VARIABLE HOST %s = " NETDATA_DOUBLE_FORMAT "\n"
- , rrdvar_name(rva)
- , rrdvar2number(rva)
- );
-
- netdata_log_debug(D_STREAM, "RRDVAR pushed HOST VARIABLE %s = " NETDATA_DOUBLE_FORMAT, rrdvar_name(rva), rrdvar2number(rva));
-}
-
-void rrdpush_sender_send_this_host_variable_now(RRDHOST *host, const RRDVAR_ACQUIRED *rva) {
- if(rrdhost_can_send_definitions_to_parent(host)) {
- BUFFER *wb = sender_start(host->sender);
- rrdpush_sender_add_host_variable_to_buffer(wb, rva);
- sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
- sender_thread_buffer_free();
- }
-}
-
-struct custom_host_variables_callback {
- BUFFER *wb;
-};
-
-static int rrdpush_sender_thread_custom_host_variables_callback(const DICTIONARY_ITEM *item __maybe_unused, void *rrdvar_ptr __maybe_unused, void *struct_ptr) {
- const RRDVAR_ACQUIRED *rv = (const RRDVAR_ACQUIRED *)item;
- struct custom_host_variables_callback *tmp = struct_ptr;
- BUFFER *wb = tmp->wb;
-
- rrdpush_sender_add_host_variable_to_buffer(wb, rv);
- return 1;
-}
-
-static void rrdpush_sender_thread_send_custom_host_variables(RRDHOST *host) {
- if(rrdhost_can_send_definitions_to_parent(host)) {
- BUFFER *wb = sender_start(host->sender);
- struct custom_host_variables_callback tmp = {
- .wb = wb
- };
- int ret = rrdvar_walkthrough_read(host->rrdvars, rrdpush_sender_thread_custom_host_variables_callback, &tmp);
- (void)ret;
- sender_commit(host->sender, wb, STREAM_TRAFFIC_TYPE_METADATA);
- sender_thread_buffer_free();
-
- netdata_log_debug(D_STREAM, "RRDVAR sent %d VARIABLES", ret);
- }
-}
+#include "sender-internals.h"
// resets all the chart, so that their definitions
// will be resent to the central netdata
@@ -275,7 +24,7 @@ static void rrdpush_sender_thread_reset_all_charts(RRDHOST *host) {
rrdhost_sender_replicating_charts_zero(host);
}
-static void rrdpush_sender_cbuffer_recreate_timed(struct sender_state *s, time_t now_s, bool have_mutex, bool force) {
+void rrdpush_sender_cbuffer_recreate_timed(struct sender_state *s, time_t now_s, bool have_mutex, bool force) {
static __thread time_t last_reset_time_s = 0;
if(!force && now_s - last_reset_time_s < 300)
@@ -324,704 +73,24 @@ static void rrdpush_sender_charts_and_replication_reset(RRDHOST *host) {
rrdpush_sender_replicating_charts_zero(host->sender);
}
-static void rrdpush_sender_on_connect(RRDHOST *host) {
+void rrdpush_sender_on_connect(RRDHOST *host) {
rrdpush_sender_cbuffer_flush(host);
rrdpush_sender_charts_and_replication_reset(host);
}
-static void rrdpush_sender_after_connect(RRDHOST *host) {
+void rrdpush_sender_after_connect(RRDHOST *host) {
rrdpush_sender_thread_send_custom_host_variables(host);
}
-static inline void rrdpush_sender_thread_close_socket(RRDHOST *host) {
-#ifdef ENABLE_HTTPS
- netdata_ssl_close(&host->sender->ssl);
-#endif
-
- if(host->sender->rrdpush_sender_socket != -1) {
- close(host->sender->rrdpush_sender_socket);
- host->sender->rrdpush_sender_socket = -1;
- }
+static void rrdpush_sender_on_disconnect(RRDHOST *host) {
+ // we have been connected to this parent - let's cleanup
- rrdhost_flag_clear(host, RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS);
- rrdhost_flag_clear(host, RRDHOST_FLAG_RRDPUSH_SENDER_CONNECTED);
-
- // do not flush the circular buffer here
- // this function is called sometimes with the mutex lock, sometimes without the lock
rrdpush_sender_charts_and_replication_reset(host);
-}
-
-void rrdpush_encode_variable(stream_encoded_t *se, RRDHOST *host) {
- se->os_name = (host->system_info->host_os_name)?url_encode(host->system_info->host_os_name):strdupz("");
- se->os_id = (host->system_info->host_os_id)?url_encode(host->system_info->host_os_id):strdupz("");
- se->os_version = (host->system_info->host_os_version)?url_encode(host->system_info->host_os_version):strdupz("");
- se->kernel_name = (host->system_info->kernel_name)?url_encode(host->system_info->kernel_name):strdupz("");
- se->kernel_version = (host->system_info->kernel_version)?url_encode(host->system_info->kernel_version):strdupz("");
-}
-
-void rrdpush_clean_encoded(stream_encoded_t *se) {
- if (se->os_name) {
- freez(se->os_name);
- se->os_name = NULL;
- }
-
- if (se->os_id) {
- freez(se->os_id);
- se->os_id = NULL;
- }
-
- if (se->os_version) {
- freez(se->os_version);
- se->os_version = NULL;
- }
-
- if (se->kernel_name) {
- freez(se->kernel_name);
- se->kernel_name = NULL;
- }
-
- if (se->kernel_version) {
- freez(se->kernel_version);
- se->kernel_version = NULL;
- }
-}
-
-struct {
- const char *response;
- const char *status;
- size_t length;
- int32_t version;
- bool dynamic;
- const char *error;
- int worker_job_id;
- int postpone_reconnect_seconds;
- ND_LOG_FIELD_PRIORITY priority;
-} stream_responses[] = {
- {
- .response = START_STREAMING_PROMPT_VN,
- .length = sizeof(START_STREAMING_PROMPT_VN) - 1,
- .status = RRDPUSH_STATUS_CONNECTED,
- .version = STREAM_HANDSHAKE_OK_V3, // and above
- .dynamic = true, // dynamic = we will parse the version / capabilities
- .error = NULL,
- .worker_job_id = 0,
- .postpone_reconnect_seconds = 0,
- .priority = NDLP_INFO,
- },
- {
- .response = START_STREAMING_PROMPT_V2,
- .length = sizeof(START_STREAMING_PROMPT_V2) - 1,
- .status = RRDPUSH_STATUS_CONNECTED,
- .version = STREAM_HANDSHAKE_OK_V2,
- .dynamic = false,
- .error = NULL,
- .worker_job_id = 0,
- .postpone_reconnect_seconds = 0,
- .priority = NDLP_INFO,
- },
- {
- .response = START_STREAMING_PROMPT_V1,
- .length = sizeof(START_STREAMING_PROMPT_V1) - 1,
- .status = RRDPUSH_STATUS_CONNECTED,
- .version = STREAM_HANDSHAKE_OK_V1,
- .dynamic = false,
- .error = NULL,
- .worker_job_id = 0,
- .postpone_reconnect_seconds = 0,
- .priority = NDLP_INFO,
- },
- {
- .response = START_STREAMING_ERROR_SAME_LOCALHOST,
- .length = sizeof(START_STREAMING_ERROR_SAME_LOCALHOST) - 1,
- .status = RRDPUSH_STATUS_LOCALHOST,
- .version = STREAM_HANDSHAKE_ERROR_LOCALHOST,
- .dynamic = false,
- .error = "remote server rejected this stream, the host we are trying to stream is its localhost",
- .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
- .postpone_reconnect_seconds = 60 * 60, // the IP may change, try it every hour
- .priority = NDLP_DEBUG,
- },
- {
- .response = START_STREAMING_ERROR_ALREADY_STREAMING,
- .length = sizeof(START_STREAMING_ERROR_ALREADY_STREAMING) - 1,
- .status = RRDPUSH_STATUS_ALREADY_CONNECTED,
- .version = STREAM_HANDSHAKE_ERROR_ALREADY_CONNECTED,
- .dynamic = false,
- .error = "remote server rejected this stream, the host we are trying to stream is already streamed to it",
- .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
- .postpone_reconnect_seconds = 2 * 60, // 2 minutes
- .priority = NDLP_DEBUG,
- },
- {
- .response = START_STREAMING_ERROR_NOT_PERMITTED,
- .length = sizeof(START_STREAMING_ERROR_NOT_PERMITTED) - 1,
- .status = RRDPUSH_STATUS_PERMISSION_DENIED,
- .version = STREAM_HANDSHAKE_ERROR_DENIED,
- .dynamic = false,
- .error = "remote server denied access, probably we don't have the right API key?",
- .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
- .postpone_reconnect_seconds = 1 * 60, // 1 minute
- .priority = NDLP_ERR,
- },
- {
- .response = START_STREAMING_ERROR_BUSY_TRY_LATER,
- .length = sizeof(START_STREAMING_ERROR_BUSY_TRY_LATER) - 1,
- .status = RRDPUSH_STATUS_RATE_LIMIT,
- .version = STREAM_HANDSHAKE_BUSY_TRY_LATER,
- .dynamic = false,
- .error = "remote server is currently busy, we should try later",
- .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
- .postpone_reconnect_seconds = 2 * 60, // 2 minutes
- .priority = NDLP_NOTICE,
- },
- {
- .response = START_STREAMING_ERROR_INTERNAL_ERROR,
- .length = sizeof(START_STREAMING_ERROR_INTERNAL_ERROR) - 1,
- .status = RRDPUSH_STATUS_INTERNAL_SERVER_ERROR,
- .version = STREAM_HANDSHAKE_INTERNAL_ERROR,
- .dynamic = false,
- .error = "remote server is encountered an internal error, we should try later",
- .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
- .postpone_reconnect_seconds = 5 * 60, // 5 minutes
- .priority = NDLP_CRIT,
- },
- {
- .response = START_STREAMING_ERROR_INITIALIZATION,
- .length = sizeof(START_STREAMING_ERROR_INITIALIZATION) - 1,
- .status = RRDPUSH_STATUS_INITIALIZATION_IN_PROGRESS,
- .version = STREAM_HANDSHAKE_INITIALIZATION,
- .dynamic = false,
- .error = "remote server is initializing, we should try later",
- .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
- .postpone_reconnect_seconds = 2 * 60, // 2 minute
- .priority = NDLP_NOTICE,
- },
-
- // terminator
- {
- .response = NULL,
- .length = 0,
- .status = RRDPUSH_STATUS_BAD_HANDSHAKE,
- .version = STREAM_HANDSHAKE_ERROR_BAD_HANDSHAKE,
- .dynamic = false,
- .error = "remote node response is not understood, is it Netdata?",
- .worker_job_id = WORKER_SENDER_JOB_DISCONNECT_BAD_HANDSHAKE,
- .postpone_reconnect_seconds = 1 * 60, // 1 minute
- .priority = NDLP_ERR,
- }
-};
-
-static inline bool rrdpush_sender_validate_response(RRDHOST *host, struct sender_state *s, char *http, size_t http_length) {
- int32_t version = STREAM_HANDSHAKE_ERROR_BAD_HANDSHAKE;
-
- int i;
- for(i = 0; stream_responses[i].response ; i++) {
- if(stream_responses[i].dynamic &&
- http_length > stream_responses[i].length && http_length < (stream_responses[i].length + 30) &&
- strncmp(http, stream_responses[i].response, stream_responses[i].length) == 0) {
-
- version = str2i(&http[stream_responses[i].length]);
- break;
- }
- else if(http_length == stream_responses[i].length && strcmp(http, stream_responses[i].response) == 0) {
- version = stream_responses[i].version;
-
- break;
- }
- }
-
- if(version >= STREAM_HANDSHAKE_OK_V1) {
- host->destination->reason = version;
- host->destination->postpone_reconnection_until = now_realtime_sec() + s->reconnect_delay;
- s->capabilities = convert_stream_version_to_capabilities(version, host, true);
- return true;
- }
-
- ND_LOG_FIELD_PRIORITY priority = stream_responses[i].priority;
- const char *error = stream_responses[i].error;
- const char *status = stream_responses[i].status;
- int worker_job_id = stream_responses[i].worker_job_id;
- int delay = stream_responses[i].postpone_reconnect_seconds;
-
- worker_is_busy(worker_job_id);
- rrdpush_sender_thread_close_socket(host);
- host->destination->reason = version;
- host->destination->postpone_reconnection_until = now_realtime_sec() + delay;
-
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, status),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- char buf[RFC3339_MAX_LENGTH];
- rfc3339_datetime_ut(buf, sizeof(buf), host->destination->postpone_reconnection_until * USEC_PER_SEC, 0, false);
-
- nd_log(NDLS_DAEMON, priority,
- "STREAM %s [send to %s]: %s - will retry in %d secs, at %s",
- rrdhost_hostname(host), s->connected_to, error, delay, buf);
-
- return false;
-}
-
-unsigned char alpn_proto_list[] = {
- 18, 'n', 'e', 't', 'd', 'a', 't', 'a', '_', 's', 't', 'r', 'e', 'a', 'm', '/', '2', '.', '0',
- 8, 'h', 't', 't', 'p', '/', '1', '.', '1'
-};
-
-#define CONN_UPGRADE_VAL "upgrade"
-
-static bool rrdpush_sender_connect_ssl(struct sender_state *s __maybe_unused) {
-#ifdef ENABLE_HTTPS
- RRDHOST *host = s->host;
- bool ssl_required = host->destination && host->destination->ssl;
-
- netdata_ssl_close(&host->sender->ssl);
-
- if(!ssl_required)
- return true;
-
- if (netdata_ssl_open_ext(&host->sender->ssl, netdata_ssl_streaming_sender_ctx, s->rrdpush_sender_socket, alpn_proto_list, sizeof(alpn_proto_list))) {
- if(!netdata_ssl_connect(&host->sender->ssl)) {
- // couldn't connect
-
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_SSL_ERROR),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_SSL_ERROR);
- rrdpush_sender_thread_close_socket(host);
- host->destination->reason = STREAM_HANDSHAKE_ERROR_SSL_ERROR;
- host->destination->postpone_reconnection_until = now_realtime_sec() + 5 * 60;
- return false;
- }
-
- if (netdata_ssl_validate_certificate_sender &&
- security_test_certificate(host->sender->ssl.conn)) {
- // certificate is not valid
-
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_INVALID_SSL_CERTIFICATE),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_SSL_ERROR);
- netdata_log_error("SSL: closing the stream connection, because the server SSL certificate is not valid.");
- rrdpush_sender_thread_close_socket(host);
- host->destination->reason = STREAM_HANDSHAKE_ERROR_INVALID_CERTIFICATE;
- host->destination->postpone_reconnection_until = now_realtime_sec() + 5 * 60;
- return false;
- }
-
- return true;
- }
-
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_CANT_ESTABLISH_SSL_CONNECTION),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- netdata_log_error("SSL: failed to establish connection.");
- return false;
-
-#else
- // SSL is not enabled
- return true;
-#endif
-}
-
-static int rrdpush_http_upgrade_prelude(RRDHOST *host, struct sender_state *s) {
-
- char http[HTTP_HEADER_SIZE + 1];
- snprintfz(http, HTTP_HEADER_SIZE,
- "GET " NETDATA_STREAM_URL HTTP_1_1 HTTP_ENDL
- "Upgrade: " NETDATA_STREAM_PROTO_NAME HTTP_ENDL
- "Connection: Upgrade"
- HTTP_HDR_END);
-
- ssize_t bytes = send_timeout(
-#ifdef ENABLE_HTTPS
- &host->sender->ssl,
-#endif
- s->rrdpush_sender_socket,
- http,
- strlen(http),
- 0,
- 1000);
-
- bytes = recv_timeout(
-#ifdef ENABLE_HTTPS
- &host->sender->ssl,
-#endif
- s->rrdpush_sender_socket,
- http,
- HTTP_HEADER_SIZE,
- 0,
- 1000);
-
- if (bytes <= 0) {
- error_report("Error reading from remote");
- return 1;
- }
-
- rbuf_t buf = rbuf_create(bytes);
- rbuf_push(buf, http, bytes);
-
- http_parse_ctx ctx;
- http_parse_ctx_create(&ctx, HTTP_PARSE_INITIAL);
- ctx.flags |= HTTP_PARSE_FLAG_DONT_WAIT_FOR_CONTENT;
-
- int rc;
-// while((rc = parse_http_response(buf, &ctx)) == HTTP_PARSE_NEED_MORE_DATA);
- rc = parse_http_response(buf, &ctx);
-
- if (rc != HTTP_PARSE_SUCCESS) {
- error_report("Failed to parse HTTP response sent. (%d)", rc);
- goto err_cleanup;
- }
- if (ctx.http_code == HTTP_RESP_MOVED_PERM) {
- const char *hdr = get_http_header_by_name(&ctx, "location");
- if (hdr)
- error_report("HTTP response is %d Moved Permanently (location: \"%s\") instead of expected %d Switching Protocols.", ctx.http_code, hdr, HTTP_RESP_SWITCH_PROTO);
- else
- error_report("HTTP response is %d instead of expected %d Switching Protocols.", ctx.http_code, HTTP_RESP_SWITCH_PROTO);
- goto err_cleanup;
- }
- if (ctx.http_code == HTTP_RESP_NOT_FOUND) {
- error_report("HTTP response is %d instead of expected %d Switching Protocols. Parent version too old.", ctx.http_code, HTTP_RESP_SWITCH_PROTO);
- // TODO set some flag here that will signify parent is older version
- // and to try connection without rrdpush_http_upgrade_prelude next time
- goto err_cleanup;
- }
- if (ctx.http_code != HTTP_RESP_SWITCH_PROTO) {
- error_report("HTTP response is %d instead of expected %d Switching Protocols", ctx.http_code, HTTP_RESP_SWITCH_PROTO);
- goto err_cleanup;
- }
-
- const char *hdr = get_http_header_by_name(&ctx, "connection");
- if (!hdr) {
- error_report("Missing \"connection\" header in reply");
- goto err_cleanup;
- }
- if (strncmp(hdr, CONN_UPGRADE_VAL, strlen(CONN_UPGRADE_VAL))) {
- error_report("Expected \"connection: " CONN_UPGRADE_VAL "\"");
- goto err_cleanup;
- }
-
- hdr = get_http_header_by_name(&ctx, "upgrade");
- if (!hdr) {
- error_report("Missing \"upgrade\" header in reply");
- goto err_cleanup;
- }
- if (strncmp(hdr, NETDATA_STREAM_PROTO_NAME, strlen(NETDATA_STREAM_PROTO_NAME))) {
- error_report("Expected \"upgrade: " NETDATA_STREAM_PROTO_NAME "\"");
- goto err_cleanup;
- }
-
- netdata_log_debug(D_STREAM, "Stream sender upgrade to \"" NETDATA_STREAM_PROTO_NAME "\" successful");
- rbuf_free(buf);
- http_parse_ctx_destroy(&ctx);
- return 0;
-err_cleanup:
- rbuf_free(buf);
- http_parse_ctx_destroy(&ctx);
- return 1;
-}
-
-static bool rrdpush_sender_thread_connect_to_parent(RRDHOST *host, int default_port, int timeout, struct sender_state *s) {
-
- struct timeval tv = {
- .tv_sec = timeout,
- .tv_usec = 0
- };
-
- // make sure the socket is closed
- rrdpush_sender_thread_close_socket(host);
-
- s->rrdpush_sender_socket = connect_to_one_of_destinations(
- host
- , default_port
- , &tv
- , &s->reconnects_counter
- , s->connected_to
- , sizeof(s->connected_to)-1
- , &host->destination
- );
-
- if(unlikely(s->rrdpush_sender_socket == -1)) {
- // netdata_log_error("STREAM %s [send to %s]: could not connect to parent node at this time.", rrdhost_hostname(host), host->rrdpush_send_destination);
- return false;
- }
-
- // netdata_log_info("STREAM %s [send to %s]: initializing communication...", rrdhost_hostname(host), s->connected_to);
-
- // reset our capabilities to default
- s->capabilities = stream_our_capabilities(host, true);
-
- /* TODO: During the implementation of #7265 switch the set of variables to HOST_* and CONTAINER_* if the
- version negotiation resulted in a high enough version.
- */
- stream_encoded_t se;
- rrdpush_encode_variable(&se, host);
-
- host->sender->hops = host->system_info->hops + 1;
-
- char http[HTTP_HEADER_SIZE + 1];
- int eol = snprintfz(http, HTTP_HEADER_SIZE,
- "STREAM "
- "key=%s"
- "&hostname=%s"
- "&registry_hostname=%s"
- "&machine_guid=%s"
- "&update_every=%d"
- "&os=%s"
- "&timezone=%s"
- "&abbrev_timezone=%s"
- "&utc_offset=%d"
- "&hops=%d"
- "&ml_capable=%d"
- "&ml_enabled=%d"
- "&mc_version=%d"
- "&ver=%u"
- "&NETDATA_INSTANCE_CLOUD_TYPE=%s"
- "&NETDATA_INSTANCE_CLOUD_INSTANCE_TYPE=%s"
- "&NETDATA_INSTANCE_CLOUD_INSTANCE_REGION=%s"
- "&NETDATA_SYSTEM_OS_NAME=%s"
- "&NETDATA_SYSTEM_OS_ID=%s"
- "&NETDATA_SYSTEM_OS_ID_LIKE=%s"
- "&NETDATA_SYSTEM_OS_VERSION=%s"
- "&NETDATA_SYSTEM_OS_VERSION_ID=%s"
- "&NETDATA_SYSTEM_OS_DETECTION=%s"
- "&NETDATA_HOST_IS_K8S_NODE=%s"
- "&NETDATA_SYSTEM_KERNEL_NAME=%s"
- "&NETDATA_SYSTEM_KERNEL_VERSION=%s"
- "&NETDATA_SYSTEM_ARCHITECTURE=%s"
- "&NETDATA_SYSTEM_VIRTUALIZATION=%s"
- "&NETDATA_SYSTEM_VIRT_DETECTION=%s"
- "&NETDATA_SYSTEM_CONTAINER=%s"
- "&NETDATA_SYSTEM_CONTAINER_DETECTION=%s"
- "&NETDATA_CONTAINER_OS_NAME=%s"
- "&NETDATA_CONTAINER_OS_ID=%s"
- "&NETDATA_CONTAINER_OS_ID_LIKE=%s"
- "&NETDATA_CONTAINER_OS_VERSION=%s"
- "&NETDATA_CONTAINER_OS_VERSION_ID=%s"
- "&NETDATA_CONTAINER_OS_DETECTION=%s"
- "&NETDATA_SYSTEM_CPU_LOGICAL_CPU_COUNT=%s"
- "&NETDATA_SYSTEM_CPU_FREQ=%s"
- "&NETDATA_SYSTEM_TOTAL_RAM=%s"
- "&NETDATA_SYSTEM_TOTAL_DISK_SIZE=%s"
- "&NETDATA_PROTOCOL_VERSION=%s"
- HTTP_1_1 HTTP_ENDL
- "User-Agent: %s/%s\r\n"
- "Accept: */*\r\n\r\n"
- , host->rrdpush_send_api_key
- , rrdhost_hostname(host)
- , rrdhost_registry_hostname(host)
- , host->machine_guid
- , default_rrd_update_every
- , rrdhost_os(host)
- , rrdhost_timezone(host)
- , rrdhost_abbrev_timezone(host)
- , host->utc_offset
- , host->sender->hops
- , host->system_info->ml_capable
- , host->system_info->ml_enabled
- , host->system_info->mc_version
- , s->capabilities
- , (host->system_info->cloud_provider_type) ? host->system_info->cloud_provider_type : ""
- , (host->system_info->cloud_instance_type) ? host->system_info->cloud_instance_type : ""
- , (host->system_info->cloud_instance_region) ? host->system_info->cloud_instance_region : ""
- , se.os_name
- , se.os_id
- , (host->system_info->host_os_id_like) ? host->system_info->host_os_id_like : ""
- , se.os_version
- , (host->system_info->host_os_version_id) ? host->system_info->host_os_version_id : ""
- , (host->system_info->host_os_detection) ? host->system_info->host_os_detection : ""
- , (host->system_info->is_k8s_node) ? host->system_info->is_k8s_node : ""
- , se.kernel_name
- , se.kernel_version
- , (host->system_info->architecture) ? host->system_info->architecture : ""
- , (host->system_info->virtualization) ? host->system_info->virtualization : ""
- , (host->system_info->virt_detection) ? host->system_info->virt_detection : ""
- , (host->system_info->container) ? host->system_info->container : ""
- , (host->system_info->container_detection) ? host->system_info->container_detection : ""
- , (host->system_info->container_os_name) ? host->system_info->container_os_name : ""
- , (host->system_info->container_os_id) ? host->system_info->container_os_id : ""
- , (host->system_info->container_os_id_like) ? host->system_info->container_os_id_like : ""
- , (host->system_info->container_os_version) ? host->system_info->container_os_version : ""
- , (host->system_info->container_os_version_id) ? host->system_info->container_os_version_id : ""
- , (host->system_info->container_os_detection) ? host->system_info->container_os_detection : ""
- , (host->system_info->host_cores) ? host->system_info->host_cores : ""
- , (host->system_info->host_cpu_freq) ? host->system_info->host_cpu_freq : ""
- , (host->system_info->host_ram_total) ? host->system_info->host_ram_total : ""
- , (host->system_info->host_disk_space) ? host->system_info->host_disk_space : ""
- , STREAMING_PROTOCOL_VERSION
- , rrdhost_program_name(host)
- , rrdhost_program_version(host)
- );
- http[eol] = 0x00;
- rrdpush_clean_encoded(&se);
-
- if(!rrdpush_sender_connect_ssl(s))
- return false;
-
- if (s->parent_using_h2o && rrdpush_http_upgrade_prelude(host, s)) {
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_CANT_UPGRADE_CONNECTION),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_CANT_UPGRADE_CONNECTION);
- rrdpush_sender_thread_close_socket(host);
- host->destination->reason = STREAM_HANDSHAKE_ERROR_HTTP_UPGRADE;
- host->destination->postpone_reconnection_until = now_realtime_sec() + 1 * 60;
- return false;
- }
-
- ssize_t len = (ssize_t)strlen(http);
- ssize_t bytes = send_timeout(
-#ifdef ENABLE_HTTPS
- &host->sender->ssl,
-#endif
- s->rrdpush_sender_socket,
- http,
- len,
- 0,
- timeout);
-
- if(bytes <= 0) { // timeout is 0
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_TIMEOUT),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_TIMEOUT);
- rrdpush_sender_thread_close_socket(host);
-
- nd_log(NDLS_DAEMON, NDLP_ERR,
- "STREAM %s [send to %s]: failed to send HTTP header to remote netdata.",
- rrdhost_hostname(host), s->connected_to);
-
- host->destination->reason = STREAM_HANDSHAKE_ERROR_SEND_TIMEOUT;
- host->destination->postpone_reconnection_until = now_realtime_sec() + 1 * 60;
- return false;
- }
-
- bytes = recv_timeout(
-#ifdef ENABLE_HTTPS
- &host->sender->ssl,
-#endif
- s->rrdpush_sender_socket,
- http,
- HTTP_HEADER_SIZE,
- 0,
- timeout);
-
- if(bytes <= 0) { // timeout is 0
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_TIMEOUT),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_TIMEOUT);
- rrdpush_sender_thread_close_socket(host);
-
- nd_log(NDLS_DAEMON, NDLP_ERR,
- "STREAM %s [send to %s]: remote netdata does not respond.",
- rrdhost_hostname(host), s->connected_to);
-
- host->destination->reason = STREAM_HANDSHAKE_ERROR_RECEIVE_TIMEOUT;
- host->destination->postpone_reconnection_until = now_realtime_sec() + 30;
- return false;
- }
-
- if(sock_setnonblock(s->rrdpush_sender_socket) < 0)
- nd_log(NDLS_DAEMON, NDLP_WARNING,
- "STREAM %s [send to %s]: cannot set non-blocking mode for socket.",
- rrdhost_hostname(host), s->connected_to);
- sock_setcloexec(s->rrdpush_sender_socket);
-
- if(sock_enlarge_out(s->rrdpush_sender_socket) < 0)
- nd_log(NDLS_DAEMON, NDLP_WARNING,
- "STREAM %s [send to %s]: cannot enlarge the socket buffer.",
- rrdhost_hostname(host), s->connected_to);
-
- http[bytes] = '\0';
- if(!rrdpush_sender_validate_response(host, s, http, bytes))
- return false;
-
- rrdpush_compression_initialize(s);
-
- log_sender_capabilities(s);
-
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_TXT(NDF_RESPONSE_CODE, RRDPUSH_STATUS_CONNECTED),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- nd_log(NDLS_DAEMON, NDLP_DEBUG,
- "STREAM %s: connected to %s...",
- rrdhost_hostname(host), s->connected_to);
-
- return true;
-}
-
-static bool attempt_to_connect(struct sender_state *state) {
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_UUID(NDF_MESSAGE_ID, &streaming_to_parent_msgid),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- state->send_attempts = 0;
-
- // reset the bytes we have sent for this session
- state->sent_bytes_on_this_connection = 0;
- memset(state->sent_bytes_on_this_connection_per_type, 0, sizeof(state->sent_bytes_on_this_connection_per_type));
-
- if(rrdpush_sender_thread_connect_to_parent(state->host, state->default_port, state->timeout, state)) {
- // reset the buffer, to properly send charts and metrics
- rrdpush_sender_on_connect(state->host);
-
- // send from the beginning
- state->begin = 0;
-
- // make sure the next reconnection will be immediate
- state->not_connected_loops = 0;
-
- // let the data collection threads know we are ready
- rrdhost_flag_set(state->host, RRDHOST_FLAG_RRDPUSH_SENDER_CONNECTED);
-
- rrdpush_sender_after_connect(state->host);
-
- return true;
- }
-
- // we couldn't connect
-
- // increase the failed connections counter
- state->not_connected_loops++;
-
- // slow re-connection on repeating errors
- usec_t now_ut = now_monotonic_usec();
- usec_t end_ut = now_ut + USEC_PER_SEC * state->reconnect_delay;
- while(now_ut < end_ut) {
- if(nd_thread_signaled_to_cancel())
- return false;
- sleep_usec(100 * USEC_PER_MS); // seconds
- now_ut = now_monotonic_usec();
- }
-
- return false;
+ // clear the parent's claim id
+ rrdpush_sender_clear_parent_claim_id(host);
+ rrdpush_receiver_send_node_and_claim_id_to_child(host);
+ stream_path_parent_disconnected(host);
}
// TCP window is open, and we have data to transmit.
@@ -1037,14 +106,10 @@ static ssize_t attempt_to_send(struct sender_state *s) {
size_t outstanding = cbuffer_next_unsafe(s->buffer, &chunk);
netdata_log_debug(D_STREAM, "STREAM: Sending data. Buffer r=%zu w=%zu s=%zu, next chunk=%zu", cb->read, cb->write, cb->size, outstanding);
-#ifdef ENABLE_HTTPS
if(SSL_connection(&s->ssl))
ret = netdata_ssl_write(&s->ssl, chunk, outstanding);
else
ret = send(s->rrdpush_sender_socket, chunk, outstanding, MSG_DONTWAIT);
-#else
- ret = send(s->rrdpush_sender_socket, chunk, outstanding, MSG_DONTWAIT);
-#endif
if (likely(ret > 0)) {
cbuffer_remove_unsafe(s->buffer, ret);
@@ -1058,7 +123,7 @@ static ssize_t attempt_to_send(struct sender_state *s) {
worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_SEND_ERROR);
netdata_log_debug(D_STREAM, "STREAM: Send failed - closing socket...");
netdata_log_error("STREAM %s [send to %s]: failed to send metrics - closing connection - we have sent %zu bytes on this connection.", rrdhost_hostname(s->host), s->connected_to, s->sent_bytes_on_this_connection);
- rrdpush_sender_thread_close_socket(s->host);
+ rrdpush_sender_thread_close_socket(s);
}
else
netdata_log_debug(D_STREAM, "STREAM: send() returned 0 -> no error but no transmission");
@@ -1072,14 +137,10 @@ static ssize_t attempt_to_send(struct sender_state *s) {
static ssize_t attempt_read(struct sender_state *s) {
ssize_t ret;
-#ifdef ENABLE_HTTPS
if (SSL_connection(&s->ssl))
ret = netdata_ssl_read(&s->ssl, s->read_buffer + s->read_len, sizeof(s->read_buffer) - s->read_len - 1);
else
ret = recv(s->rrdpush_sender_socket, s->read_buffer + s->read_len, sizeof(s->read_buffer) - s->read_len - 1,MSG_DONTWAIT);
-#else
- ret = recv(s->rrdpush_sender_socket, s->read_buffer + s->read_len, sizeof(s->read_buffer) - s->read_len - 1,MSG_DONTWAIT);
-#endif
if (ret > 0) {
s->read_len += ret;
@@ -1089,13 +150,9 @@ static ssize_t attempt_read(struct sender_state *s) {
if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
return ret;
-#ifdef ENABLE_HTTPS
if (SSL_connection(&s->ssl))
worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_SSL_ERROR);
- else
-#endif
-
- if (ret == 0 || errno == ECONNRESET) {
+ else if (ret == 0 || errno == ECONNRESET) {
worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_PARENT_CLOSED);
netdata_log_error("STREAM %s [send to %s]: connection closed by far end.", rrdhost_hostname(s->host), s->connected_to);
}
@@ -1104,258 +161,11 @@ static ssize_t attempt_read(struct sender_state *s) {
netdata_log_error("STREAM %s [send to %s]: error during receive (%zd) - closing connection.", rrdhost_hostname(s->host), s->connected_to, ret);
}
- rrdpush_sender_thread_close_socket(s->host);
+ rrdpush_sender_thread_close_socket(s);
return ret;
}
-struct inflight_stream_function {
- struct sender_state *sender;
- STRING *transaction;
- usec_t received_ut;
-};
-
-static void stream_execute_function_callback(BUFFER *func_wb, int code, void *data) {
- struct inflight_stream_function *tmp = data;
- struct sender_state *s = tmp->sender;
-
- if(rrdhost_can_send_definitions_to_parent(s->host)) {
- BUFFER *wb = sender_start(s);
-
- pluginsd_function_result_begin_to_buffer(wb
- , string2str(tmp->transaction)
- , code
- , content_type_id2string(func_wb->content_type)
- , func_wb->expires);
-
- buffer_fast_strcat(wb, buffer_tostring(func_wb), buffer_strlen(func_wb));
- pluginsd_function_result_end_to_buffer(wb);
-
- sender_commit(s, wb, STREAM_TRAFFIC_TYPE_FUNCTIONS);
- sender_thread_buffer_free();
-
- internal_error(true, "STREAM %s [send to %s] FUNCTION transaction %s sending back response (%zu bytes, %"PRIu64" usec).",
- rrdhost_hostname(s->host), s->connected_to,
- string2str(tmp->transaction),
- buffer_strlen(func_wb),
- now_realtime_usec() - tmp->received_ut);
- }
-
- string_freez(tmp->transaction);
- buffer_free(func_wb);
- freez(tmp);
-}
-
-static void stream_execute_function_progress_callback(void *data, size_t done, size_t all) {
- struct inflight_stream_function *tmp = data;
- struct sender_state *s = tmp->sender;
-
- if(rrdhost_can_send_definitions_to_parent(s->host)) {
- BUFFER *wb = sender_start(s);
-
- buffer_sprintf(wb, PLUGINSD_KEYWORD_FUNCTION_PROGRESS " '%s' %zu %zu\n",
- string2str(tmp->transaction), done, all);
-
- sender_commit(s, wb, STREAM_TRAFFIC_TYPE_FUNCTIONS);
- }
-}
-
-static void execute_commands_function(struct sender_state *s, const char *command, const char *transaction, const char *timeout_s, const char *function, BUFFER *payload, const char *access, const char *source) {
- worker_is_busy(WORKER_SENDER_JOB_FUNCTION_REQUEST);
- nd_log(NDLS_ACCESS, NDLP_INFO, NULL);
-
- if(!transaction || !*transaction || !timeout_s || !*timeout_s || !function || !*function) {
- netdata_log_error("STREAM %s [send to %s] %s execution command is incomplete (transaction = '%s', timeout = '%s', function = '%s'). Ignoring it.",
- rrdhost_hostname(s->host), s->connected_to,
- command,
- transaction?transaction:"(unset)",
- timeout_s?timeout_s:"(unset)",
- function?function:"(unset)");
- }
- else {
- int timeout = str2i(timeout_s);
- if(timeout <= 0) timeout = PLUGINS_FUNCTIONS_TIMEOUT_DEFAULT;
-
- struct inflight_stream_function *tmp = callocz(1, sizeof(struct inflight_stream_function));
- tmp->received_ut = now_realtime_usec();
- tmp->sender = s;
- tmp->transaction = string_strdupz(transaction);
- BUFFER *wb = buffer_create(1024, &netdata_buffers_statistics.buffers_functions);
-
- int code = rrd_function_run(s->host, wb, timeout,
- http_access_from_hex_mapping_old_roles(access), function, false, transaction,
- stream_execute_function_callback, tmp,
- stream_has_capability(s, STREAM_CAP_PROGRESS) ? stream_execute_function_progress_callback : NULL,
- stream_has_capability(s, STREAM_CAP_PROGRESS) ? tmp : NULL,
- NULL, NULL, payload, source);
-
- if(code != HTTP_RESP_OK) {
- if (!buffer_strlen(wb))
- rrd_call_function_error(wb, "Failed to route request to collector", code);
- }
- }
-}
-
-static void cleanup_intercepting_input(struct sender_state *s) {
- freez((void *)s->functions.transaction);
- freez((void *)s->functions.timeout_s);
- freez((void *)s->functions.function);
- freez((void *)s->functions.access);
- freez((void *)s->functions.source);
- buffer_free(s->functions.payload);
-
- s->functions.transaction = NULL;
- s->functions.timeout_s = NULL;
- s->functions.function = NULL;
- s->functions.payload = NULL;
- s->functions.access = NULL;
- s->functions.source = NULL;
- s->functions.intercept_input = false;
-}
-
-static void execute_commands_cleanup(struct sender_state *s) {
- cleanup_intercepting_input(s);
-}
-
-// This is just a placeholder until the gap filling state machine is inserted
-void execute_commands(struct sender_state *s) {
- worker_is_busy(WORKER_SENDER_JOB_EXECUTE);
-
- ND_LOG_STACK lgs[] = {
- ND_LOG_FIELD_CB(NDF_REQUEST, line_splitter_reconstruct_line, &s->line),
- ND_LOG_FIELD_END(),
- };
- ND_LOG_STACK_PUSH(lgs);
-
- char *start = s->read_buffer, *end = &s->read_buffer[s->read_len], *newline;
- *end = '\0';
- for( ; start < end ; start = newline + 1) {
- newline = strchr(start, '\n');
-
- if(!newline) {
- if(s->functions.intercept_input) {
- buffer_strcat(s->functions.payload, start);
- start = end;
- }
- break;
- }
-
- *newline = '\0';
- s->line.count++;
-
- if(s->functions.intercept_input) {
- if(strcmp(start, PLUGINSD_CALL_FUNCTION_PAYLOAD_END) == 0) {
- execute_commands_function(s,
- PLUGINSD_CALL_FUNCTION_PAYLOAD_END,
- s->functions.transaction, s->functions.timeout_s,
- s->functions.function, s->functions.payload,
- s->functions.access, s->functions.source);
-
- cleanup_intercepting_input(s);
- }
- else {
- buffer_strcat(s->functions.payload, start);
- buffer_fast_charcat(s->functions.payload, '\n');
- }
-
- continue;
- }
-
- s->line.num_words = quoted_strings_splitter_pluginsd(start, s->line.words, PLUGINSD_MAX_WORDS);
- const char *command = get_word(s->line.words, s->line.num_words, 0);
-
- if(command && strcmp(command, PLUGINSD_CALL_FUNCTION) == 0) {
- char *transaction = get_word(s->line.words, s->line.num_words, 1);
- char *timeout_s = get_word(s->line.words, s->line.num_words, 2);
- char *function = get_word(s->line.words, s->line.num_words, 3);
- char *access = get_word(s->line.words, s->line.num_words, 4);
- char *source = get_word(s->line.words, s->line.num_words, 5);
-
- execute_commands_function(s, command, transaction, timeout_s, function, NULL, access, source);
- }
- else if(command && strcmp(command, PLUGINSD_CALL_FUNCTION_PAYLOAD_BEGIN) == 0) {
- char *transaction = get_word(s->line.words, s->line.num_words, 1);
- char *timeout_s = get_word(s->line.words, s->line.num_words, 2);
- char *function = get_word(s->line.words, s->line.num_words, 3);
- char *access = get_word(s->line.words, s->line.num_words, 4);
- char *source = get_word(s->line.words, s->line.num_words, 5);
- char *content_type = get_word(s->line.words, s->line.num_words, 6);
-
- s->functions.transaction = strdupz(transaction ? transaction : "");
- s->functions.timeout_s = strdupz(timeout_s ? timeout_s : "");
- s->functions.function = strdupz(function ? function : "");
- s->functions.access = strdupz(access ? access : "");
- s->functions.source = strdupz(source ? source : "");
- s->functions.payload = buffer_create(0, NULL);
- s->functions.payload->content_type = content_type_string2id(content_type);
- s->functions.intercept_input = true;
- }
- else if(command && strcmp(command, PLUGINSD_CALL_FUNCTION_CANCEL) == 0) {
- worker_is_busy(WORKER_SENDER_JOB_FUNCTION_REQUEST);
- nd_log(NDLS_ACCESS, NDLP_DEBUG, NULL);
-
- char *transaction = get_word(s->line.words, s->line.num_words, 1);
- if(transaction && *transaction)
- rrd_function_cancel(transaction);
- }
- else if(command && strcmp(command, PLUGINSD_CALL_FUNCTION_PROGRESS) == 0) {
- worker_is_busy(WORKER_SENDER_JOB_FUNCTION_REQUEST);
- nd_log(NDLS_ACCESS, NDLP_DEBUG, NULL);
-
- char *transaction = get_word(s->line.words, s->line.num_words, 1);
- if(transaction && *transaction)
- rrd_function_progress(transaction);
- }
- else if (command && strcmp(command, PLUGINSD_KEYWORD_REPLAY_CHART) == 0) {
- worker_is_busy(WORKER_SENDER_JOB_REPLAY_REQUEST);
- nd_log(NDLS_ACCESS, NDLP_DEBUG, NULL);
-
- const char *chart_id = get_word(s->line.words, s->line.num_words, 1);
- const char *start_streaming = get_word(s->line.words, s->line.num_words, 2);
- const char *after = get_word(s->line.words, s->line.num_words, 3);
- const char *before = get_word(s->line.words, s->line.num_words, 4);
-
- if (!chart_id || !start_streaming || !after || !before) {
- netdata_log_error("STREAM %s [send to %s] %s command is incomplete"
- " (chart=%s, start_streaming=%s, after=%s, before=%s)",
- rrdhost_hostname(s->host), s->connected_to,
- command,
- chart_id ? chart_id : "(unset)",
- start_streaming ? start_streaming : "(unset)",
- after ? after : "(unset)",
- before ? before : "(unset)");
- }
- else {
- replication_add_request(s, chart_id,
- strtoll(after, NULL, 0),
- strtoll(before, NULL, 0),
- !strcmp(start_streaming, "true")
- );
- }
- }
- else {
- netdata_log_error("STREAM %s [send to %s] received unknown command over connection: %s", rrdhost_hostname(s->host), s->connected_to, s->line.words[0]?s->line.words[0]:"(unset)");
- }
-
- line_splitter_reset(&s->line);
- worker_is_busy(WORKER_SENDER_JOB_EXECUTE);
- }
-
- if (start < end) {
- memmove(s->read_buffer, start, end-start);
- s->read_len = end - start;
- }
- else {
- s->read_buffer[0] = '\0';
- s->read_len = 0;
- }
-}
-
-struct rrdpush_sender_thread_data {
- RRDHOST *host;
- char *pipe_buffer;
-};
-
static bool rrdpush_sender_pipe_close(RRDHOST *host, int *pipe_fds, bool reopen) {
static netdata_mutex_t mutex = NETDATA_MUTEX_INITIALIZER;
@@ -1449,7 +259,7 @@ static void rrdhost_clear_sender___while_having_sender_mutex(RRDHOST *host) {
rrdpush_reset_destinations_postpone_time(host);
}
-static bool rrdhost_sender_should_exit(struct sender_state *s) {
+bool rrdhost_sender_should_exit(struct sender_state *s) {
if(unlikely(nd_thread_signaled_to_cancel())) {
if(!s->exit.reason)
s->exit.reason = STREAM_HANDSHAKE_DISCONNECT_SHUTDOWN;
@@ -1483,64 +293,6 @@ static bool rrdhost_sender_should_exit(struct sender_state *s) {
return false;
}
-static void rrdpush_sender_thread_cleanup_callback(void *pptr) {
- struct rrdpush_sender_thread_data *s = CLEANUP_FUNCTION_GET_PTR(pptr);
- if(!s) return;
-
- worker_unregister();
-
- RRDHOST *host = s->host;
-
- sender_lock(host->sender);
- netdata_log_info("STREAM %s [send]: sending thread exits %s",
- rrdhost_hostname(host),
- host->sender->exit.reason != STREAM_HANDSHAKE_NEVER ? stream_handshake_error_to_string(host->sender->exit.reason) : "");
-
- rrdpush_sender_thread_close_socket(host);
- rrdpush_sender_pipe_close(host, host->sender->rrdpush_sender_pipe, false);
- execute_commands_cleanup(host->sender);
-
- rrdhost_clear_sender___while_having_sender_mutex(host);
-
-#ifdef NETDATA_LOG_STREAM_SENDER
- if(host->sender->stream_log_fp) {
- fclose(host->sender->stream_log_fp);
- host->sender->stream_log_fp = NULL;
- }
-#endif
-
- sender_unlock(host->sender);
-
- freez(s->pipe_buffer);
- freez(s);
-}
-
-void rrdpush_initialize_ssl_ctx(RRDHOST *host __maybe_unused) {
-#ifdef ENABLE_HTTPS
- static SPINLOCK sp = NETDATA_SPINLOCK_INITIALIZER;
- spinlock_lock(&sp);
-
- if(netdata_ssl_streaming_sender_ctx || !host) {
- spinlock_unlock(&sp);
- return;
- }
-
- for(struct rrdpush_destinations *d = host->destinations; d ; d = d->next) {
- if (d->ssl) {
- // we need to initialize SSL
-
- netdata_ssl_initialize_ctx(NETDATA_SSL_STREAMING_SENDER_CTX);
- ssl_security_location_for_context(netdata_ssl_streaming_sender_ctx, netdata_ssl_ca_file, netdata_ssl_ca_path);
-
- // stop the loop
- break;
- }
- }
-
- spinlock_unlock(&sp);
-#endif
-}
-
static bool stream_sender_log_capabilities(BUFFER *wb, void *ptr) {
struct sender_state *state = ptr;
if(!state)
@@ -1555,11 +307,7 @@ static bool stream_sender_log_transport(BUFFER *wb, void *ptr) {
if(!state)
return false;
-#ifdef ENABLE_HTTPS
buffer_strcat(wb, SSL_connection(&state->ssl) ? "https" : "http");
-#else
- buffer_strcat(wb, "http");
-#endif
return true;
}
@@ -1627,9 +375,9 @@ void *rrdpush_sender_thread(void *ptr) {
worker_register_job_custom_metric(WORKER_SENDER_JOB_BYTES_COMPRESSION_RATIO, "cumulative compression savings ratio", "%", WORKER_METRIC_ABSOLUTE);
worker_register_job_custom_metric(WORKER_SENDER_JOB_REPLAY_DICT_SIZE, "replication dict entries", "entries", WORKER_METRIC_ABSOLUTE);
- if(!rrdhost_has_rrdpush_sender_enabled(s->host) || !s->host->rrdpush_send_destination ||
- !*s->host->rrdpush_send_destination || !s->host->rrdpush_send_api_key ||
- !*s->host->rrdpush_send_api_key) {
+ if(!rrdhost_has_rrdpush_sender_enabled(s->host) || !s->host->rrdpush.send.destination ||
+ !*s->host->rrdpush.send.destination || !s->host->rrdpush.send.api_key ||
+ !*s->host->rrdpush.send.api_key) {
netdata_log_error("STREAM %s [send]: thread created (task id %d), but host has streaming disabled.",
rrdhost_hostname(s->host), gettid_cached());
return NULL;
@@ -1641,12 +389,12 @@ void *rrdpush_sender_thread(void *ptr) {
return NULL;
}
- rrdpush_initialize_ssl_ctx(s->host);
+ rrdpush_sender_ssl_init(s->host);
netdata_log_info("STREAM %s [send]: thread created (task id %d)", rrdhost_hostname(s->host), gettid_cached());
- s->timeout = (int)appconfig_get_number(
- &stream_config, CONFIG_SECTION_STREAM, "timeout seconds", 600);
+ s->timeout = (int)appconfig_get_duration_seconds(
+ &stream_config, CONFIG_SECTION_STREAM, "timeout", 600);
s->default_port = (int)appconfig_get_number(
&stream_config, CONFIG_SECTION_STREAM, "default port", 19999);
@@ -1654,20 +402,19 @@ void *rrdpush_sender_thread(void *ptr) {
s->buffer->max_size = (size_t)appconfig_get_number(
&stream_config, CONFIG_SECTION_STREAM, "buffer size bytes", 1024 * 1024 * 10);
- s->reconnect_delay = (unsigned int)appconfig_get_number(
- &stream_config, CONFIG_SECTION_STREAM, "reconnect delay seconds", 5);
+ s->reconnect_delay = (unsigned int)appconfig_get_duration_seconds(
+ &stream_config, CONFIG_SECTION_STREAM, "reconnect delay", 5);
- remote_clock_resync_iterations = (unsigned int)appconfig_get_number(
+ stream_conf_initial_clock_resync_iterations = (unsigned int)appconfig_get_number(
&stream_config, CONFIG_SECTION_STREAM,
"initial clock resync iterations",
- remote_clock_resync_iterations); // TODO: REMOVE FOR SLEW / GAPFILLING
+ stream_conf_initial_clock_resync_iterations); // TODO: REMOVE FOR SLEW / GAPFILLING
s->parent_using_h2o = appconfig_get_boolean(
&stream_config, CONFIG_SECTION_STREAM, "parent using h2o", false);
// initialize rrdpush globals
- rrdhost_flag_clear(s->host, RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS);
- rrdhost_flag_clear(s->host, RRDHOST_FLAG_RRDPUSH_SENDER_CONNECTED);
+ rrdhost_flag_clear(s->host, RRDHOST_FLAG_RRDPUSH_SENDER_CONNECTED | RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS);
int pipe_buffer_size = 10 * 1024;
#ifdef F_GETPIPE_SZ
@@ -1682,12 +429,9 @@ void *rrdpush_sender_thread(void *ptr) {
return NULL;
}
- struct rrdpush_sender_thread_data *thread_data = callocz(1, sizeof(struct rrdpush_sender_thread_data));
- thread_data->pipe_buffer = mallocz(pipe_buffer_size);
- thread_data->host = s->host;
-
- CLEANUP_FUNCTION_REGISTER(rrdpush_sender_thread_cleanup_callback) cleanup_ptr = thread_data;
+ char *pipe_buffer = mallocz(pipe_buffer_size);
+ bool was_connected = false;
size_t iterations = 0;
time_t now_s = now_monotonic_sec();
while(!rrdhost_sender_should_exit(s)) {
@@ -1695,36 +439,11 @@ void *rrdpush_sender_thread(void *ptr) {
// The connection attempt blocks (after which we use the socket in nonblocking)
if(unlikely(s->rrdpush_sender_socket == -1)) {
- worker_is_busy(WORKER_SENDER_JOB_CONNECT);
-
- now_s = now_monotonic_sec();
- rrdpush_sender_cbuffer_recreate_timed(s, now_s, false, true);
- execute_commands_cleanup(s);
-
- rrdhost_flag_clear(s->host, RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS);
- s->flags &= ~SENDER_FLAG_OVERFLOW;
- s->read_len = 0;
- s->buffer->read = 0;
- s->buffer->write = 0;
-
- if(!attempt_to_connect(s))
- continue;
-
- if(rrdhost_sender_should_exit(s))
- break;
-
- now_s = s->last_traffic_seen_t = now_monotonic_sec();
- rrdpush_send_claimed_id(s->host);
- rrdpush_send_host_labels(s->host);
- rrdpush_send_global_functions(s->host);
- s->replication.oldest_request_after_t = 0;
-
- rrdhost_flag_set(s->host, RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS);
-
- nd_log(NDLS_DAEMON, NDLP_DEBUG,
- "STREAM %s [send to %s]: enabling metrics streaming...",
- rrdhost_hostname(s->host), s->connected_to);
+ if(was_connected)
+ rrdpush_sender_on_disconnect(s->host);
+ was_connected = rrdpush_sender_connect(s);
+ now_s = s->last_traffic_seen_t;
continue;
}
@@ -1738,7 +457,7 @@ void *rrdpush_sender_thread(void *ptr) {
)) {
worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_TIMEOUT);
netdata_log_error("STREAM %s [send to %s]: could not send metrics for %d seconds - closing connection - we have sent %zu bytes on this connection via %zu send attempts.", rrdhost_hostname(s->host), s->connected_to, s->timeout, s->sent_bytes_on_this_connection, s->send_attempts);
- rrdpush_sender_thread_close_socket(s->host);
+ rrdpush_sender_thread_close_socket(s);
continue;
}
@@ -1767,9 +486,9 @@ void *rrdpush_sender_thread(void *ptr) {
if(unlikely(s->rrdpush_sender_pipe[PIPE_READ] == -1)) {
if(!rrdpush_sender_pipe_close(s->host, s->rrdpush_sender_pipe, true)) {
- netdata_log_error("STREAM %s [send]: cannot create inter-thread communication pipe. Disabling streaming.",
- rrdhost_hostname(s->host));
- rrdpush_sender_thread_close_socket(s->host);
+ netdata_log_error("STREAM %s [send]: cannot create inter-thread communication pipe. "
+ "Disabling streaming.", rrdhost_hostname(s->host));
+ rrdpush_sender_thread_close_socket(s);
break;
}
}
@@ -1820,7 +539,7 @@ void *rrdpush_sender_thread(void *ptr) {
worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_POLL_ERROR);
netdata_log_error("STREAM %s [send to %s]: failed to poll(). Closing socket.", rrdhost_hostname(s->host), s->connected_to);
rrdpush_sender_pipe_close(s->host, s->rrdpush_sender_pipe, true);
- rrdpush_sender_thread_close_socket(s->host);
+ rrdpush_sender_thread_close_socket(s);
continue;
}
@@ -1839,7 +558,7 @@ void *rrdpush_sender_thread(void *ptr) {
worker_is_busy(WORKER_SENDER_JOB_PIPE_READ);
netdata_log_debug(D_STREAM, "STREAM: Data added to send buffer (current buffer chunk %zu bytes)...", outstanding);
- if (read(fds[Collector].fd, thread_data->pipe_buffer, pipe_buffer_size) == -1)
+ if (read(fds[Collector].fd, pipe_buffer, pipe_buffer_size) == -1)
netdata_log_error("STREAM %s [send to %s]: cannot read from internal pipe.", rrdhost_hostname(s->host), s->connected_to);
}
@@ -1854,7 +573,7 @@ void *rrdpush_sender_thread(void *ptr) {
}
if(unlikely(s->read_len))
- execute_commands(s);
+ rrdpush_sender_execute_commands(s);
if(unlikely(fds[Collector].revents & (POLLERR|POLLHUP|POLLNVAL))) {
char *error = NULL;
@@ -1869,7 +588,7 @@ void *rrdpush_sender_thread(void *ptr) {
if(error) {
rrdpush_sender_pipe_close(s->host, s->rrdpush_sender_pipe, true);
netdata_log_error("STREAM %s [send to %s]: restarting internal pipe: %s.",
- rrdhost_hostname(s->host), s->connected_to, error);
+ rrdhost_hostname(s->host), s->connected_to, error);
}
}
@@ -1886,8 +605,8 @@ void *rrdpush_sender_thread(void *ptr) {
if(unlikely(error)) {
worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_SOCKET_ERROR);
netdata_log_error("STREAM %s [send to %s]: restarting connection: %s - %zu bytes transmitted.",
- rrdhost_hostname(s->host), s->connected_to, error, s->sent_bytes_on_this_connection);
- rrdpush_sender_thread_close_socket(s->host);
+ rrdhost_hostname(s->host), s->connected_to, error, s->sent_bytes_on_this_connection);
+ rrdpush_sender_thread_close_socket(s);
}
}
@@ -1896,12 +615,57 @@ void *rrdpush_sender_thread(void *ptr) {
worker_is_busy(WORKER_SENDER_JOB_DISCONNECT_OVERFLOW);
errno_clear();
netdata_log_error("STREAM %s [send to %s]: buffer full (allocated %zu bytes) after sending %zu bytes. Restarting connection",
- rrdhost_hostname(s->host), s->connected_to, s->buffer->size, s->sent_bytes_on_this_connection);
- rrdpush_sender_thread_close_socket(s->host);
+ rrdhost_hostname(s->host), s->connected_to, s->buffer->size, s->sent_bytes_on_this_connection);
+ rrdpush_sender_thread_close_socket(s);
}
worker_set_metric(WORKER_SENDER_JOB_REPLAY_DICT_SIZE, (NETDATA_DOUBLE) dictionary_entries(s->replication.requests));
}
+ if(was_connected)
+ rrdpush_sender_on_disconnect(s->host);
+
+ netdata_log_info("STREAM %s [send]: sending thread exits %s",
+ rrdhost_hostname(s->host),
+ s->exit.reason != STREAM_HANDSHAKE_NEVER ? stream_handshake_error_to_string(s->exit.reason) : "");
+
+ sender_lock(s);
+ {
+ rrdpush_sender_thread_close_socket(s);
+ rrdpush_sender_pipe_close(s->host, s->rrdpush_sender_pipe, false);
+ rrdpush_sender_execute_commands_cleanup(s);
+
+ rrdhost_clear_sender___while_having_sender_mutex(s->host);
+
+#ifdef NETDATA_LOG_STREAM_SENDER
+ if (s->stream_log_fp) {
+ fclose(s->stream_log_fp);
+ s->stream_log_fp = NULL;
+ }
+#endif
+ }
+ sender_unlock(s);
+
+ freez(pipe_buffer);
+ worker_unregister();
+
return NULL;
}
+
+void rrdpush_sender_thread_spawn(RRDHOST *host) {
+ sender_lock(host->sender);
+
+ if(!rrdhost_flag_check(host, RRDHOST_FLAG_RRDPUSH_SENDER_SPAWN)) {
+ char tag[NETDATA_THREAD_TAG_MAX + 1];
+ snprintfz(tag, NETDATA_THREAD_TAG_MAX, THREAD_TAG_STREAM_SENDER "[%s]", rrdhost_hostname(host));
+
+ host->rrdpush_sender_thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT,
+ rrdpush_sender_thread, (void *)host->sender);
+ if(!host->rrdpush_sender_thread)
+ nd_log_daemon(NDLP_ERR, "STREAM %s [send]: failed to create new thread for client.", rrdhost_hostname(host));
+ else
+ rrdhost_flag_set(host, RRDHOST_FLAG_RRDPUSH_SENDER_SPAWN);
+ }
+
+ sender_unlock(host->sender);
+}
diff --git a/src/streaming/sender.h b/src/streaming/sender.h
new file mode 100644
index 000000000..94d104f5f
--- /dev/null
+++ b/src/streaming/sender.h
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_SENDER_H
+#define NETDATA_SENDER_H
+
+#include "libnetdata/libnetdata.h"
+
+#define CONNECTED_TO_SIZE 100
+
+#define CBUFFER_INITIAL_SIZE (16 * 1024)
+#define THREAD_BUFFER_INITIAL_SIZE (CBUFFER_INITIAL_SIZE / 2)
+
+typedef enum __attribute__((packed)) {
+ STREAM_TRAFFIC_TYPE_REPLICATION = 0,
+ STREAM_TRAFFIC_TYPE_FUNCTIONS,
+ STREAM_TRAFFIC_TYPE_METADATA,
+ STREAM_TRAFFIC_TYPE_DATA,
+ STREAM_TRAFFIC_TYPE_DYNCFG,
+
+ // terminator
+ STREAM_TRAFFIC_TYPE_MAX,
+} STREAM_TRAFFIC_TYPE;
+
+typedef enum __attribute__((packed)) {
+ SENDER_FLAG_OVERFLOW = (1 << 0), // The buffer has overflowed
+} SENDER_FLAGS;
+
+typedef struct {
+ char *os_name;
+ char *os_id;
+ char *os_version;
+ char *kernel_name;
+ char *kernel_version;
+} stream_encoded_t;
+
+#include "stream-handshake.h"
+#include "stream-capabilities.h"
+#include "stream-conf.h"
+#include "stream-compression/compression.h"
+
+#include "sender-destinations.h"
+
+typedef void (*rrdpush_defer_action_t)(struct sender_state *s, void *data);
+typedef void (*rrdpush_defer_cleanup_t)(struct sender_state *s, void *data);
+
+struct sender_state {
+ RRDHOST *host;
+ pid_t tid; // the thread id of the sender, from gettid_cached()
+ SENDER_FLAGS flags;
+ int timeout;
+ int default_port;
+ uint32_t reconnect_delay;
+ char connected_to[CONNECTED_TO_SIZE + 1]; // We don't know which proxy we connect to, passed back from socket.c
+ size_t begin;
+ size_t reconnects_counter;
+ size_t sent_bytes;
+ size_t sent_bytes_on_this_connection;
+ size_t send_attempts;
+ time_t last_traffic_seen_t;
+ time_t last_state_since_t; // the timestamp of the last state (online/offline) change
+ size_t not_connected_loops;
+ // Metrics are collected asynchronously by collector threads calling rrdset_done_push(). This can also trigger
+ // the lazy creation of the sender thread - both cases (buffer access and thread creation) are guarded here.
+ SPINLOCK spinlock;
+ struct circular_buffer *buffer;
+ char read_buffer[PLUGINSD_LINE_MAX + 1];
+ ssize_t read_len;
+ STREAM_CAPABILITIES capabilities;
+ STREAM_CAPABILITIES disabled_capabilities;
+
+ size_t sent_bytes_on_this_connection_per_type[STREAM_TRAFFIC_TYPE_MAX];
+
+ int rrdpush_sender_pipe[2]; // collector to sender thread signaling
+ int rrdpush_sender_socket;
+
+ uint16_t hops;
+
+ struct line_splitter line;
+ struct compressor_state compressor;
+
+#ifdef NETDATA_LOG_STREAM_SENDER
+ FILE *stream_log_fp;
+#endif
+
+ NETDATA_SSL ssl; // structure used to encrypt the connection
+
+ struct {
+ bool shutdown;
+ STREAM_HANDSHAKE reason;
+ } exit;
+
+ struct {
+ DICTIONARY *requests; // de-duplication of replication requests, per chart
+ time_t oldest_request_after_t; // the timestamp of the oldest replication request
+ time_t latest_completed_before_t; // the timestamp of the latest completed replication request
+
+ struct {
+ size_t pending_requests; // the currently outstanding replication requests
+ size_t charts_replicating; // the number of unique charts having pending replication requests (on every request one is added and is removed when we finish it - it does not track completion of the replication for this chart)
+ bool reached_max; // true when the sender buffer should not get more replication responses
+ } atomic;
+
+ } replication;
+
+ struct {
+ bool pending_data;
+ size_t buffer_used_percentage; // the current utilization of the sending buffer
+ usec_t last_flush_time_ut; // the last time the sender flushed the sending buffer in USEC
+ time_t last_buffer_recreate_s; // the timestamp (in seconds) of the last time the sender buffer was re-created
+ } atomic;
+
+ struct {
+ const char *end_keyword;
+ BUFFER *payload;
+ rrdpush_defer_action_t action;
+ rrdpush_defer_cleanup_t cleanup;
+ void *action_data;
+ } defer;
+
+ bool parent_using_h2o;
+};
+
+#define sender_lock(sender) spinlock_lock(&(sender)->spinlock)
+#define sender_unlock(sender) spinlock_unlock(&(sender)->spinlock)
+
+#define rrdpush_sender_pipe_has_pending_data(sender) __atomic_load_n(&(sender)->atomic.pending_data, __ATOMIC_RELAXED)
+#define rrdpush_sender_pipe_set_pending_data(sender) __atomic_store_n(&(sender)->atomic.pending_data, true, __ATOMIC_RELAXED)
+#define rrdpush_sender_pipe_clear_pending_data(sender) __atomic_store_n(&(sender)->atomic.pending_data, false, __ATOMIC_RELAXED)
+
+#define rrdpush_sender_last_buffer_recreate_get(sender) __atomic_load_n(&(sender)->atomic.last_buffer_recreate_s, __ATOMIC_RELAXED)
+#define rrdpush_sender_last_buffer_recreate_set(sender, value) __atomic_store_n(&(sender)->atomic.last_buffer_recreate_s, value, __ATOMIC_RELAXED)
+
+#define rrdpush_sender_replication_buffer_full_set(sender, value) __atomic_store_n(&((sender)->replication.atomic.reached_max), value, __ATOMIC_SEQ_CST)
+#define rrdpush_sender_replication_buffer_full_get(sender) __atomic_load_n(&((sender)->replication.atomic.reached_max), __ATOMIC_SEQ_CST)
+
+#define rrdpush_sender_set_buffer_used_percent(sender, value) __atomic_store_n(&((sender)->atomic.buffer_used_percentage), value, __ATOMIC_RELAXED)
+#define rrdpush_sender_get_buffer_used_percent(sender) __atomic_load_n(&((sender)->atomic.buffer_used_percentage), __ATOMIC_RELAXED)
+
+#define rrdpush_sender_set_flush_time(sender) __atomic_store_n(&((sender)->atomic.last_flush_time_ut), now_realtime_usec(), __ATOMIC_RELAXED)
+#define rrdpush_sender_get_flush_time(sender) __atomic_load_n(&((sender)->atomic.last_flush_time_ut), __ATOMIC_RELAXED)
+
+#define rrdpush_sender_replicating_charts(sender) __atomic_load_n(&((sender)->replication.atomic.charts_replicating), __ATOMIC_RELAXED)
+#define rrdpush_sender_replicating_charts_plus_one(sender) __atomic_add_fetch(&((sender)->replication.atomic.charts_replicating), 1, __ATOMIC_RELAXED)
+#define rrdpush_sender_replicating_charts_minus_one(sender) __atomic_sub_fetch(&((sender)->replication.atomic.charts_replicating), 1, __ATOMIC_RELAXED)
+#define rrdpush_sender_replicating_charts_zero(sender) __atomic_store_n(&((sender)->replication.atomic.charts_replicating), 0, __ATOMIC_RELAXED)
+
+#define rrdpush_sender_pending_replication_requests(sender) __atomic_load_n(&((sender)->replication.atomic.pending_requests), __ATOMIC_RELAXED)
+#define rrdpush_sender_pending_replication_requests_plus_one(sender) __atomic_add_fetch(&((sender)->replication.atomic.pending_requests), 1, __ATOMIC_RELAXED)
+#define rrdpush_sender_pending_replication_requests_minus_one(sender) __atomic_sub_fetch(&((sender)->replication.atomic.pending_requests), 1, __ATOMIC_RELAXED)
+#define rrdpush_sender_pending_replication_requests_zero(sender) __atomic_store_n(&((sender)->replication.atomic.pending_requests), 0, __ATOMIC_RELAXED)
+
+BUFFER *sender_start(struct sender_state *s);
+void sender_commit(struct sender_state *s, BUFFER *wb, STREAM_TRAFFIC_TYPE type);
+
+void *rrdpush_sender_thread(void *ptr);
+void rrdpush_sender_thread_stop(RRDHOST *host, STREAM_HANDSHAKE reason, bool wait);
+
+void sender_thread_buffer_free(void);
+
+void rrdpush_signal_sender_to_wake_up(struct sender_state *s);
+
+bool rrdpush_sender_connect(struct sender_state *s);
+void rrdpush_sender_cbuffer_recreate_timed(struct sender_state *s, time_t now_s, bool have_mutex, bool force);
+bool rrdhost_sender_should_exit(struct sender_state *s);
+void rrdpush_sender_thread_spawn(RRDHOST *host);
+
+#include "replication.h"
+
+#endif //NETDATA_SENDER_H
diff --git a/src/streaming/stream-capabilities.c b/src/streaming/stream-capabilities.c
new file mode 100644
index 000000000..b089e8f9d
--- /dev/null
+++ b/src/streaming/stream-capabilities.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "rrdpush.h"
+
+static STREAM_CAPABILITIES globally_disabled_capabilities = STREAM_CAP_NONE;
+
+static struct {
+ STREAM_CAPABILITIES cap;
+ const char *str;
+} capability_names[] = {
+ {STREAM_CAP_V1, "V1" },
+ {STREAM_CAP_V2, "V2" },
+ {STREAM_CAP_VN, "VN" },
+ {STREAM_CAP_VCAPS, "VCAPS" },
+ {STREAM_CAP_HLABELS, "HLABELS" },
+ {STREAM_CAP_CLAIM, "CLAIM" },
+ {STREAM_CAP_CLABELS, "CLABELS" },
+ {STREAM_CAP_LZ4, "LZ4" },
+ {STREAM_CAP_FUNCTIONS, "FUNCTIONS" },
+ {STREAM_CAP_REPLICATION, "REPLICATION" },
+ {STREAM_CAP_BINARY, "BINARY" },
+ {STREAM_CAP_INTERPOLATED, "INTERPOLATED" },
+ {STREAM_CAP_IEEE754, "IEEE754" },
+ {STREAM_CAP_DATA_WITH_ML, "ML" },
+ {STREAM_CAP_DYNCFG, "DYNCFG" },
+ {STREAM_CAP_SLOTS, "SLOTS" },
+ {STREAM_CAP_ZSTD, "ZSTD" },
+ {STREAM_CAP_GZIP, "GZIP" },
+ {STREAM_CAP_BROTLI, "BROTLI" },
+ {STREAM_CAP_PROGRESS, "PROGRESS" },
+ {STREAM_CAP_NODE_ID, "NODEID" },
+ {STREAM_CAP_PATHS, "PATHS" },
+ {0 , NULL },
+};
+
+STREAM_CAPABILITIES stream_capabilities_parse_one(const char *str) {
+ if (!str || !*str)
+ return STREAM_CAP_NONE;
+
+ for (size_t i = 0; capability_names[i].str; i++) {
+ if (strcmp(capability_names[i].str, str) == 0)
+ return capability_names[i].cap;
+ }
+
+ return STREAM_CAP_NONE;
+}
+
+void stream_capabilities_to_string(BUFFER *wb, STREAM_CAPABILITIES caps) {
+ for(size_t i = 0; capability_names[i].str ; i++) {
+ if(caps & capability_names[i].cap) {
+ buffer_strcat(wb, capability_names[i].str);
+ buffer_strcat(wb, " ");
+ }
+ }
+}
+
+void stream_capabilities_to_json_array(BUFFER *wb, STREAM_CAPABILITIES caps, const char *key) {
+ if(key)
+ buffer_json_member_add_array(wb, key);
+ else
+ buffer_json_add_array_item_array(wb);
+
+ for(size_t i = 0; capability_names[i].str ; i++) {
+ if(caps & capability_names[i].cap)
+ buffer_json_add_array_item_string(wb, capability_names[i].str);
+ }
+
+ buffer_json_array_close(wb);
+}
+
+void log_receiver_capabilities(struct receiver_state *rpt) {
+ BUFFER *wb = buffer_create(100, NULL);
+ stream_capabilities_to_string(wb, rpt->capabilities);
+
+ nd_log_daemon(NDLP_INFO, "STREAM %s [receive from [%s]:%s]: established link with negotiated capabilities: %s",
+ rrdhost_hostname(rpt->host), rpt->client_ip, rpt->client_port, buffer_tostring(wb));
+
+ buffer_free(wb);
+}
+
+void log_sender_capabilities(struct sender_state *s) {
+ BUFFER *wb = buffer_create(100, NULL);
+ stream_capabilities_to_string(wb, s->capabilities);
+
+ nd_log_daemon(NDLP_INFO, "STREAM %s [send to %s]: established link with negotiated capabilities: %s",
+ rrdhost_hostname(s->host), s->connected_to, buffer_tostring(wb));
+
+ buffer_free(wb);
+}
+
+STREAM_CAPABILITIES stream_our_capabilities(RRDHOST *host, bool sender) {
+ STREAM_CAPABILITIES disabled_capabilities = globally_disabled_capabilities;
+
+ if(host && sender) {
+ // we have DATA_WITH_ML capability
+ // we should remove the DATA_WITH_ML capability if our database does not have anomaly info
+ // this can happen under these conditions: 1. we don't run ML, and 2. we don't receive ML
+ spinlock_lock(&host->receiver_lock);
+
+ if(!ml_host_running(host) && !stream_has_capability(host->receiver, STREAM_CAP_DATA_WITH_ML))
+ disabled_capabilities |= STREAM_CAP_DATA_WITH_ML;
+
+ spinlock_unlock(&host->receiver_lock);
+
+ if(host->sender)
+ disabled_capabilities |= host->sender->disabled_capabilities;
+ }
+
+ return (STREAM_CAP_V1 |
+ STREAM_CAP_V2 |
+ STREAM_CAP_VN |
+ STREAM_CAP_VCAPS |
+ STREAM_CAP_HLABELS |
+ STREAM_CAP_CLAIM |
+ STREAM_CAP_CLABELS |
+ STREAM_CAP_FUNCTIONS |
+ STREAM_CAP_REPLICATION |
+ STREAM_CAP_BINARY |
+ STREAM_CAP_INTERPOLATED |
+ STREAM_CAP_SLOTS |
+ STREAM_CAP_PROGRESS |
+ STREAM_CAP_COMPRESSIONS_AVAILABLE |
+ STREAM_CAP_DYNCFG |
+ STREAM_CAP_NODE_ID |
+ STREAM_CAP_PATHS |
+ STREAM_CAP_IEEE754 |
+ STREAM_CAP_DATA_WITH_ML |
+ 0) & ~disabled_capabilities;
+}
+
+STREAM_CAPABILITIES convert_stream_version_to_capabilities(int32_t version, RRDHOST *host, bool sender) {
+ STREAM_CAPABILITIES caps = 0;
+
+ if(version <= 1) caps = STREAM_CAP_V1;
+ else if(version < STREAM_OLD_VERSION_CLAIM) caps = STREAM_CAP_V2 | STREAM_CAP_HLABELS;
+ else if(version <= STREAM_OLD_VERSION_CLAIM) caps = STREAM_CAP_VN | STREAM_CAP_HLABELS | STREAM_CAP_CLAIM;
+ else if(version <= STREAM_OLD_VERSION_CLABELS) caps = STREAM_CAP_VN | STREAM_CAP_HLABELS | STREAM_CAP_CLAIM | STREAM_CAP_CLABELS;
+ else if(version <= STREAM_OLD_VERSION_LZ4) caps = STREAM_CAP_VN | STREAM_CAP_HLABELS | STREAM_CAP_CLAIM | STREAM_CAP_CLABELS | STREAM_CAP_LZ4_AVAILABLE;
+ else caps = version;
+
+ if(caps & STREAM_CAP_VCAPS)
+ caps &= ~(STREAM_CAP_V1|STREAM_CAP_V2|STREAM_CAP_VN);
+
+ if(caps & STREAM_CAP_VN)
+ caps &= ~(STREAM_CAP_V1|STREAM_CAP_V2);
+
+ if(caps & STREAM_CAP_V2)
+ caps &= ~(STREAM_CAP_V1);
+
+ STREAM_CAPABILITIES common_caps = caps & stream_our_capabilities(host, sender);
+
+ if(!(common_caps & STREAM_CAP_INTERPOLATED))
+ // DATA WITH ML requires INTERPOLATED
+ common_caps &= ~STREAM_CAP_DATA_WITH_ML;
+
+ return common_caps;
+}
+
+int32_t stream_capabilities_to_vn(uint32_t caps) {
+ if(caps & STREAM_CAP_LZ4) return STREAM_OLD_VERSION_LZ4;
+ if(caps & STREAM_CAP_CLABELS) return STREAM_OLD_VERSION_CLABELS;
+ return STREAM_OLD_VERSION_CLAIM; // if(caps & STREAM_CAP_CLAIM)
+}
+
+void check_local_streaming_capabilities(void) {
+ ieee754_doubles = is_system_ieee754_double();
+ if(!ieee754_doubles)
+ globally_disabled_capabilities |= STREAM_CAP_IEEE754;
+}
diff --git a/src/streaming/stream-capabilities.h b/src/streaming/stream-capabilities.h
new file mode 100644
index 000000000..90a0e2190
--- /dev/null
+++ b/src/streaming/stream-capabilities.h
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_STREAM_CAPABILITIES_H
+#define NETDATA_STREAM_CAPABILITIES_H
+
+#include "libnetdata/libnetdata.h"
+
+// ----------------------------------------------------------------------------
+// obsolete versions - do not use anymore
+
+#define STREAM_OLD_VERSION_CLAIM 3
+#define STREAM_OLD_VERSION_CLABELS 4
+#define STREAM_OLD_VERSION_LZ4 5
+
+// ----------------------------------------------------------------------------
+// capabilities negotiation
+
+typedef enum {
+ STREAM_CAP_NONE = 0,
+
+ // do not use the first 3 bits
+ // they used to be versions 1, 2 and 3
+ // before we introduced capabilities
+
+ STREAM_CAP_V1 = (1 << 3), // v1 = the oldest protocol
+ STREAM_CAP_V2 = (1 << 4), // v2 = the second version of the protocol (with host labels)
+ STREAM_CAP_VN = (1 << 5), // version negotiation supported (for versions 3, 4, 5 of the protocol)
+ // v3 = claiming supported
+ // v4 = chart labels supported
+ // v5 = lz4 compression supported
+ STREAM_CAP_VCAPS = (1 << 6), // capabilities negotiation supported
+ STREAM_CAP_HLABELS = (1 << 7), // host labels supported
+ STREAM_CAP_CLAIM = (1 << 8), // claiming supported
+ STREAM_CAP_CLABELS = (1 << 9), // chart labels supported
+ STREAM_CAP_LZ4 = (1 << 10), // lz4 compression supported
+ STREAM_CAP_FUNCTIONS = (1 << 11), // plugin functions supported
+ STREAM_CAP_REPLICATION = (1 << 12), // replication supported
+ STREAM_CAP_BINARY = (1 << 13), // streaming supports binary data
+ STREAM_CAP_INTERPOLATED = (1 << 14), // streaming supports interpolated streaming of values
+ STREAM_CAP_IEEE754 = (1 << 15), // streaming supports binary/hex transfer of double values
+ STREAM_CAP_DATA_WITH_ML = (1 << 16), // streaming supports transferring anomaly bit
+ // STREAM_CAP_DYNCFG = (1 << 17), // leave this unused for as long as possible
+ STREAM_CAP_SLOTS = (1 << 18), // the sender can appoint a unique slot for each chart
+ STREAM_CAP_ZSTD = (1 << 19), // ZSTD compression supported
+ STREAM_CAP_GZIP = (1 << 20), // GZIP compression supported
+ STREAM_CAP_BROTLI = (1 << 21), // BROTLI compression supported
+ STREAM_CAP_PROGRESS = (1 << 22), // Functions PROGRESS support
+ STREAM_CAP_DYNCFG = (1 << 23), // support for DYNCFG
+ STREAM_CAP_NODE_ID = (1 << 24), // support for sending NODE_ID back to the child
+ STREAM_CAP_PATHS = (1 << 25), // support for sending PATHS upstream and downstream
+
+ STREAM_CAP_INVALID = (1 << 30), // used as an invalid value for capabilities when this is set
+ // this must be signed int, so don't use the last bit
+ // needed for negotiating errors between parent and child
+} STREAM_CAPABILITIES;
+
+#ifdef ENABLE_LZ4
+#define STREAM_CAP_LZ4_AVAILABLE STREAM_CAP_LZ4
+#else
+#define STREAM_CAP_LZ4_AVAILABLE 0
+#endif // ENABLE_LZ4
+
+#ifdef ENABLE_ZSTD
+#define STREAM_CAP_ZSTD_AVAILABLE STREAM_CAP_ZSTD
+#else
+#define STREAM_CAP_ZSTD_AVAILABLE 0
+#endif // ENABLE_ZSTD
+
+#ifdef ENABLE_BROTLI
+#define STREAM_CAP_BROTLI_AVAILABLE STREAM_CAP_BROTLI
+#else
+#define STREAM_CAP_BROTLI_AVAILABLE 0
+#endif // ENABLE_BROTLI
+
+#define STREAM_CAP_COMPRESSIONS_AVAILABLE (STREAM_CAP_LZ4_AVAILABLE|STREAM_CAP_ZSTD_AVAILABLE|STREAM_CAP_BROTLI_AVAILABLE|STREAM_CAP_GZIP)
+
+#define stream_has_capability(rpt, capability) ((rpt) && ((rpt)->capabilities & (capability)) == (capability))
+
+static inline bool stream_has_more_than_one_capability_of(STREAM_CAPABILITIES caps, STREAM_CAPABILITIES mask) {
+ STREAM_CAPABILITIES common = (STREAM_CAPABILITIES)(caps & mask);
+ return (common & (common - 1)) != 0 && common != 0;
+}
+
+struct sender_state;
+struct receiver_state;
+struct rrdhost;
+
+STREAM_CAPABILITIES stream_capabilities_parse_one(const char *str);
+
+void stream_capabilities_to_string(BUFFER *wb, STREAM_CAPABILITIES caps);
+void stream_capabilities_to_json_array(BUFFER *wb, STREAM_CAPABILITIES caps, const char *key);
+void log_receiver_capabilities(struct receiver_state *rpt);
+void log_sender_capabilities(struct sender_state *s);
+STREAM_CAPABILITIES convert_stream_version_to_capabilities(int32_t version, struct rrdhost *host, bool sender);
+int32_t stream_capabilities_to_vn(uint32_t caps);
+STREAM_CAPABILITIES stream_our_capabilities(struct rrdhost *host, bool sender);
+
+void check_local_streaming_capabilities(void);
+
+#endif //NETDATA_STREAM_CAPABILITIES_H
diff --git a/src/streaming/compression_brotli.c b/src/streaming/stream-compression/brotli.c
index cf52f3bca..c2c09cdc5 100644
--- a/src/streaming/compression_brotli.c
+++ b/src/streaming/stream-compression/brotli.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-3.0-or-later
-#include "compression_brotli.h"
+#include "brotli.h"
#ifdef ENABLE_BROTLI
#include <brotli/encode.h>
diff --git a/src/streaming/compression_brotli.h b/src/streaming/stream-compression/brotli.h
index 4955e5a82..4955e5a82 100644
--- a/src/streaming/compression_brotli.h
+++ b/src/streaming/stream-compression/brotli.h
diff --git a/src/streaming/compression.c b/src/streaming/stream-compression/compression.c
index a94c8a0a6..3c9930656 100644
--- a/src/streaming/compression.c
+++ b/src/streaming/stream-compression/compression.c
@@ -2,18 +2,18 @@
#include "compression.h"
-#include "compression_gzip.h"
+#include "gzip.h"
#ifdef ENABLE_LZ4
-#include "compression_lz4.h"
+#include "lz4.h"
#endif
#ifdef ENABLE_ZSTD
-#include "compression_zstd.h"
+#include "zstd.h"
#endif
#ifdef ENABLE_BROTLI
-#include "compression_brotli.h"
+#include "brotli.h"
#endif
int rrdpush_compression_levels[COMPRESSION_ALGORITHM_MAX] = {
@@ -32,7 +32,7 @@ void rrdpush_parse_compression_order(struct receiver_state *rpt, const char *ord
char *s = strdupz(order);
char *words[COMPRESSION_ALGORITHM_MAX + 100] = { NULL };
- size_t num_words = quoted_strings_splitter_pluginsd(s, words, COMPRESSION_ALGORITHM_MAX + 100);
+ size_t num_words = quoted_strings_splitter_whitespace(s, words, COMPRESSION_ALGORITHM_MAX + 100);
size_t slot = 0;
STREAM_CAPABILITIES added = STREAM_CAP_NONE;
for(size_t i = 0; i < num_words && slot < COMPRESSION_ALGORITHM_MAX ;i++) {
@@ -395,21 +395,17 @@ size_t rrdpush_decompress(struct decompressor_state *state, const char *compress
// ----------------------------------------------------------------------------
// unit test
-static inline long int my_random (void) {
- return random();
-}
-
void unittest_generate_random_name(char *dst, size_t size) {
if(size < 7)
size = 7;
- size_t len = 5 + my_random() % (size - 6);
+ size_t len = 5 + os_random32() % (size - 6);
for(size_t i = 0; i < len ; i++) {
- if(my_random() % 2 == 0)
- dst[i] = 'A' + my_random() % 26;
+ if(os_random8() % 2 == 0)
+ dst[i] = 'A' + os_random8() % 26;
else
- dst[i] = 'a' + my_random() % 26;
+ dst[i] = 'a' + os_random8() % 26;
}
dst[len] = '\0';
@@ -423,9 +419,9 @@ void unittest_generate_message(BUFFER *wb, time_t now_s, size_t counter) {
time_t point_end_time_s = now_s;
time_t wall_clock_time_s = now_s;
size_t chart_slot = counter + 1;
- size_t dimensions = 2 + my_random() % 5;
+ size_t dimensions = 2 + os_random8() % 5;
char chart[RRD_ID_LENGTH_MAX + 1] = "name";
- unittest_generate_random_name(chart, 5 + my_random() % 30);
+ unittest_generate_random_name(chart, 5 + os_random8() % 30);
buffer_fast_strcat(wb, PLUGINSD_KEYWORD_BEGIN_V2, sizeof(PLUGINSD_KEYWORD_BEGIN_V2) - 1);
@@ -451,10 +447,10 @@ void unittest_generate_message(BUFFER *wb, time_t now_s, size_t counter) {
for(size_t d = 0; d < dimensions ;d++) {
size_t dim_slot = d + 1;
char dim_id[RRD_ID_LENGTH_MAX + 1] = "dimension";
- unittest_generate_random_name(dim_id, 10 + my_random() % 20);
- int64_t last_collected_value = (my_random() % 2 == 0) ? (int64_t)(counter + d) : (int64_t)my_random();
- NETDATA_DOUBLE value = (my_random() % 2 == 0) ? (NETDATA_DOUBLE)my_random() / ((NETDATA_DOUBLE)my_random() + 1) : (NETDATA_DOUBLE)last_collected_value;
- SN_FLAGS flags = (my_random() % 1000 == 0) ? SN_FLAG_NONE : SN_FLAG_NOT_ANOMALOUS;
+ unittest_generate_random_name(dim_id, 10 + os_random8() % 20);
+ int64_t last_collected_value = (os_random8() % 2 == 0) ? (int64_t)(counter + d) : (int64_t)os_random32();
+ NETDATA_DOUBLE value = (os_random8() % 2 == 0) ? (NETDATA_DOUBLE)os_random64() / ((NETDATA_DOUBLE)os_random64() + 1) : (NETDATA_DOUBLE)last_collected_value;
+ SN_FLAGS flags = (os_random16() % 1000 == 0) ? SN_FLAG_NONE : SN_FLAG_NOT_ANOMALOUS;
buffer_fast_strcat(wb, PLUGINSD_KEYWORD_SET_V2, sizeof(PLUGINSD_KEYWORD_SET_V2) - 1);
diff --git a/src/streaming/compression.h b/src/streaming/stream-compression/compression.h
index 285fb2cf6..37f589b85 100644
--- a/src/streaming/compression.h
+++ b/src/streaming/stream-compression/compression.h
@@ -1,10 +1,10 @@
// SPDX-License-Identifier: GPL-3.0-or-later
-#include "rrdpush.h"
-
#ifndef NETDATA_RRDPUSH_COMPRESSION_H
#define NETDATA_RRDPUSH_COMPRESSION_H 1
+#include "libnetdata/libnetdata.h"
+
// signature MUST end with a newline
#if COMPRESSION_MAX_MSG_SIZE >= (COMPRESSION_MAX_CHUNK - COMPRESSION_MAX_OVERHEAD)
@@ -172,4 +172,12 @@ static inline size_t rrdpush_decompressor_get(struct decompressor_state *state,
// ----------------------------------------------------------------------------
+#include "../rrdpush.h"
+
+bool rrdpush_compression_initialize(struct sender_state *s);
+bool rrdpush_decompression_initialize(struct receiver_state *rpt);
+void rrdpush_parse_compression_order(struct receiver_state *rpt, const char *order);
+void rrdpush_select_receiver_compression_algorithm(struct receiver_state *rpt);
+void rrdpush_compression_deactivate(struct sender_state *s);
+
#endif // NETDATA_RRDPUSH_COMPRESSION_H 1
diff --git a/src/streaming/compression_gzip.c b/src/streaming/stream-compression/gzip.c
index c4ef3af05..d63e9afbe 100644
--- a/src/streaming/compression_gzip.c
+++ b/src/streaming/stream-compression/gzip.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-3.0-or-later
-#include "compression_gzip.h"
+#include "gzip.h"
#include <zlib.h>
void rrdpush_compressor_init_gzip(struct compressor_state *state) {
diff --git a/src/streaming/compression_gzip.h b/src/streaming/stream-compression/gzip.h
index 85f34bc6d..85f34bc6d 100644
--- a/src/streaming/compression_gzip.h
+++ b/src/streaming/stream-compression/gzip.h
diff --git a/src/streaming/compression_lz4.c b/src/streaming/stream-compression/lz4.c
index f5174134e..284192153 100644
--- a/src/streaming/compression_lz4.c
+++ b/src/streaming/stream-compression/lz4.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-3.0-or-later
-#include "compression_lz4.h"
+#include "lz4.h"
#ifdef ENABLE_LZ4
#include "lz4.h"
diff --git a/src/streaming/compression_lz4.h b/src/streaming/stream-compression/lz4.h
index 69f0fadcc..69f0fadcc 100644
--- a/src/streaming/compression_lz4.h
+++ b/src/streaming/stream-compression/lz4.h
diff --git a/src/streaming/compression_zstd.c b/src/streaming/stream-compression/zstd.c
index dabc044f7..0ce27c0d3 100644
--- a/src/streaming/compression_zstd.c
+++ b/src/streaming/stream-compression/zstd.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-3.0-or-later
-#include "compression_zstd.h"
+#include "zstd.h"
#ifdef ENABLE_ZSTD
#include <zstd.h>
diff --git a/src/streaming/compression_zstd.h b/src/streaming/stream-compression/zstd.h
index bfabbf89d..bfabbf89d 100644
--- a/src/streaming/compression_zstd.h
+++ b/src/streaming/stream-compression/zstd.h
diff --git a/src/streaming/stream-conf.c b/src/streaming/stream-conf.c
new file mode 100644
index 000000000..8fc9e0819
--- /dev/null
+++ b/src/streaming/stream-conf.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "stream-conf.h"
+
+struct config stream_config = APPCONFIG_INITIALIZER;
+
+bool stream_conf_send_enabled = false;
+bool stream_conf_compression_enabled = true;
+bool stream_conf_replication_enabled = true;
+
+const char *stream_conf_send_destination = NULL;
+const char *stream_conf_send_api_key = NULL;
+const char *stream_conf_send_charts_matching = "*";
+
+time_t stream_conf_replication_period = 86400;
+time_t stream_conf_replication_step = 600;
+
+const char *stream_conf_ssl_ca_path = NULL;
+const char *stream_conf_ssl_ca_file = NULL;
+
+// to make the remote netdata re-sync each chart
+// to its current clock, during the first
+// iterations of each chart we send this many
+// BEGIN lines without microseconds
+unsigned int stream_conf_initial_clock_resync_iterations = 60;
+
+static void stream_conf_load() {
+ errno_clear();
+ char *filename = filename_from_path_entry_strdupz(netdata_configured_user_config_dir, "stream.conf");
+ if(!appconfig_load(&stream_config, filename, 0, NULL)) {
+ nd_log_daemon(NDLP_NOTICE, "CONFIG: cannot load user config '%s'. Will try stock config.", filename);
+ freez(filename);
+
+ filename = filename_from_path_entry_strdupz(netdata_configured_stock_config_dir, "stream.conf");
+ if(!appconfig_load(&stream_config, filename, 0, NULL))
+ nd_log_daemon(NDLP_NOTICE, "CONFIG: cannot load stock config '%s'. Running with internal defaults.", filename);
+ }
+
+ freez(filename);
+
+ appconfig_move(&stream_config,
+ CONFIG_SECTION_STREAM, "timeout seconds",
+ CONFIG_SECTION_STREAM, "timeout");
+
+ appconfig_move(&stream_config,
+ CONFIG_SECTION_STREAM, "reconnect delay seconds",
+ CONFIG_SECTION_STREAM, "reconnect delay");
+
+ appconfig_move_everywhere(&stream_config, "default memory mode", "db");
+ appconfig_move_everywhere(&stream_config, "memory mode", "db");
+ appconfig_move_everywhere(&stream_config, "db mode", "db");
+ appconfig_move_everywhere(&stream_config, "default history", "retention");
+ appconfig_move_everywhere(&stream_config, "history", "retention");
+ appconfig_move_everywhere(&stream_config, "default proxy enabled", "proxy enabled");
+ appconfig_move_everywhere(&stream_config, "default proxy destination", "proxy destination");
+ appconfig_move_everywhere(&stream_config, "default proxy api key", "proxy api key");
+ appconfig_move_everywhere(&stream_config, "default proxy send charts matching", "proxy send charts matching");
+ appconfig_move_everywhere(&stream_config, "default health log history", "health log retention");
+ appconfig_move_everywhere(&stream_config, "health log history", "health log retention");
+ appconfig_move_everywhere(&stream_config, "seconds to replicate", "replication period");
+ appconfig_move_everywhere(&stream_config, "seconds per replication step", "replication step");
+ appconfig_move_everywhere(&stream_config, "default postpone alarms on connect seconds", "postpone alerts on connect");
+ appconfig_move_everywhere(&stream_config, "postpone alarms on connect seconds", "postpone alerts on connect");
+}
+
+bool stream_conf_receiver_needs_dbengine(void) {
+ return stream_conf_needs_dbengine(&stream_config);
+}
+
+bool stream_conf_init() {
+ // --------------------------------------------------------------------
+ // load stream.conf
+ stream_conf_load();
+
+ stream_conf_send_enabled =
+ appconfig_get_boolean(&stream_config, CONFIG_SECTION_STREAM, "enabled", stream_conf_send_enabled);
+
+ stream_conf_send_destination =
+ appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "destination", "");
+
+ stream_conf_send_api_key =
+ appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "api key", "");
+
+ stream_conf_send_charts_matching =
+ appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "send charts matching", stream_conf_send_charts_matching);
+
+ stream_conf_replication_enabled =
+ config_get_boolean(CONFIG_SECTION_DB, "enable replication", stream_conf_replication_enabled);
+
+ stream_conf_replication_period =
+ config_get_duration_seconds(CONFIG_SECTION_DB, "replication period", stream_conf_replication_period);
+
+ stream_conf_replication_step =
+ config_get_duration_seconds(CONFIG_SECTION_DB, "replication step", stream_conf_replication_step);
+
+ rrdhost_free_orphan_time_s =
+ config_get_duration_seconds(CONFIG_SECTION_DB, "cleanup orphan hosts after", rrdhost_free_orphan_time_s);
+
+ stream_conf_compression_enabled =
+ appconfig_get_boolean(&stream_config, CONFIG_SECTION_STREAM,
+ "enable compression", stream_conf_compression_enabled);
+
+ rrdpush_compression_levels[COMPRESSION_ALGORITHM_BROTLI] = (int)appconfig_get_number(
+ &stream_config, CONFIG_SECTION_STREAM, "brotli compression level",
+ rrdpush_compression_levels[COMPRESSION_ALGORITHM_BROTLI]);
+
+ rrdpush_compression_levels[COMPRESSION_ALGORITHM_ZSTD] = (int)appconfig_get_number(
+ &stream_config, CONFIG_SECTION_STREAM, "zstd compression level",
+ rrdpush_compression_levels[COMPRESSION_ALGORITHM_ZSTD]);
+
+ rrdpush_compression_levels[COMPRESSION_ALGORITHM_LZ4] = (int)appconfig_get_number(
+ &stream_config, CONFIG_SECTION_STREAM, "lz4 compression acceleration",
+ rrdpush_compression_levels[COMPRESSION_ALGORITHM_LZ4]);
+
+ rrdpush_compression_levels[COMPRESSION_ALGORITHM_GZIP] = (int)appconfig_get_number(
+ &stream_config, CONFIG_SECTION_STREAM, "gzip compression level",
+ rrdpush_compression_levels[COMPRESSION_ALGORITHM_GZIP]);
+
+ if(stream_conf_send_enabled && (!stream_conf_send_destination || !*stream_conf_send_destination || !stream_conf_send_api_key || !*stream_conf_send_api_key)) {
+ nd_log_daemon(NDLP_WARNING, "STREAM [send]: cannot enable sending thread - information is missing.");
+ stream_conf_send_enabled = false;
+ }
+
+ netdata_ssl_validate_certificate_sender = !appconfig_get_boolean(&stream_config, CONFIG_SECTION_STREAM, "ssl skip certificate verification", !netdata_ssl_validate_certificate);
+
+ if(!netdata_ssl_validate_certificate_sender)
+ nd_log_daemon(NDLP_NOTICE, "SSL: streaming senders will skip SSL certificate verification.");
+
+ stream_conf_ssl_ca_path = appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "CApath", NULL);
+ stream_conf_ssl_ca_file = appconfig_get(&stream_config, CONFIG_SECTION_STREAM, "CAfile", NULL);
+
+ return stream_conf_send_enabled;
+}
+
+bool stream_conf_configured_as_parent() {
+ return stream_conf_has_uuid_section(&stream_config);
+}
diff --git a/src/streaming/stream-conf.h b/src/streaming/stream-conf.h
new file mode 100644
index 000000000..da7a88123
--- /dev/null
+++ b/src/streaming/stream-conf.h
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_STREAM_CONF_H
+#define NETDATA_STREAM_CONF_H
+
+#include "libnetdata/libnetdata.h"
+#include "daemon/common.h"
+
+extern bool stream_conf_send_enabled;
+extern bool stream_conf_compression_enabled;
+extern bool stream_conf_replication_enabled;
+
+extern const char *stream_conf_send_destination;
+extern const char *stream_conf_send_api_key;
+extern const char *stream_conf_send_charts_matching;
+extern time_t stream_conf_replication_period;
+extern time_t stream_conf_replication_step;
+extern unsigned int stream_conf_initial_clock_resync_iterations;
+
+extern struct config stream_config;
+extern const char *stream_conf_ssl_ca_path;
+extern const char *stream_conf_ssl_ca_file;
+
+bool stream_conf_init();
+bool stream_conf_receiver_needs_dbengine();
+bool stream_conf_configured_as_parent();
+
+#endif //NETDATA_STREAM_CONF_H
diff --git a/src/streaming/stream-handshake.c b/src/streaming/stream-handshake.c
new file mode 100644
index 000000000..e338df950
--- /dev/null
+++ b/src/streaming/stream-handshake.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "rrdpush.h"
+
+static struct {
+ STREAM_HANDSHAKE err;
+ const char *str;
+} handshake_errors[] = {
+ { STREAM_HANDSHAKE_OK_V3, "CONNECTED" },
+ { STREAM_HANDSHAKE_OK_V2, "CONNECTED" },
+ { STREAM_HANDSHAKE_OK_V1, "CONNECTED" },
+ { STREAM_HANDSHAKE_NEVER, "" },
+ { STREAM_HANDSHAKE_ERROR_BAD_HANDSHAKE, "BAD HANDSHAKE" },
+ { STREAM_HANDSHAKE_ERROR_LOCALHOST, "LOCALHOST" },
+ { STREAM_HANDSHAKE_ERROR_ALREADY_CONNECTED, "ALREADY CONNECTED" },
+ { STREAM_HANDSHAKE_ERROR_DENIED, "DENIED" },
+ { STREAM_HANDSHAKE_ERROR_SEND_TIMEOUT, "SEND TIMEOUT" },
+ { STREAM_HANDSHAKE_ERROR_RECEIVE_TIMEOUT, "RECEIVE TIMEOUT" },
+ { STREAM_HANDSHAKE_ERROR_INVALID_CERTIFICATE, "INVALID CERTIFICATE" },
+ { STREAM_HANDSHAKE_ERROR_SSL_ERROR, "SSL ERROR" },
+ { STREAM_HANDSHAKE_ERROR_CANT_CONNECT, "CANT CONNECT" },
+ { STREAM_HANDSHAKE_BUSY_TRY_LATER, "BUSY TRY LATER" },
+ { STREAM_HANDSHAKE_INTERNAL_ERROR, "INTERNAL ERROR" },
+ { STREAM_HANDSHAKE_INITIALIZATION, "REMOTE IS INITIALIZING" },
+ { STREAM_HANDSHAKE_DISCONNECT_HOST_CLEANUP, "DISCONNECTED HOST CLEANUP" },
+ { STREAM_HANDSHAKE_DISCONNECT_STALE_RECEIVER, "DISCONNECTED STALE RECEIVER" },
+ { STREAM_HANDSHAKE_DISCONNECT_SHUTDOWN, "DISCONNECTED SHUTDOWN REQUESTED" },
+ { STREAM_HANDSHAKE_DISCONNECT_NETDATA_EXIT, "DISCONNECTED NETDATA EXIT" },
+ { STREAM_HANDSHAKE_DISCONNECT_PARSER_EXIT, "DISCONNECTED PARSE ENDED" },
+ {STREAM_HANDSHAKE_DISCONNECT_UNKNOWN_SOCKET_READ_ERROR, "DISCONNECTED UNKNOWN SOCKET READ ERROR" },
+ { STREAM_HANDSHAKE_DISCONNECT_PARSER_FAILED, "DISCONNECTED PARSE ERROR" },
+ { STREAM_HANDSHAKE_DISCONNECT_RECEIVER_LEFT, "DISCONNECTED RECEIVER LEFT" },
+ { STREAM_HANDSHAKE_DISCONNECT_ORPHAN_HOST, "DISCONNECTED ORPHAN HOST" },
+ { STREAM_HANDSHAKE_NON_STREAMABLE_HOST, "NON STREAMABLE HOST" },
+ { STREAM_HANDSHAKE_DISCONNECT_NOT_SUFFICIENT_READ_BUFFER, "DISCONNECTED NOT SUFFICIENT READ BUFFER" },
+ {STREAM_HANDSHAKE_DISCONNECT_SOCKET_EOF, "DISCONNECTED SOCKET EOF" },
+ {STREAM_HANDSHAKE_DISCONNECT_SOCKET_READ_FAILED, "DISCONNECTED SOCKET READ FAILED" },
+ {STREAM_HANDSHAKE_DISCONNECT_SOCKET_READ_TIMEOUT, "DISCONNECTED SOCKET READ TIMEOUT" },
+ { 0, NULL },
+};
+
+const char *stream_handshake_error_to_string(STREAM_HANDSHAKE handshake_error) {
+ if(handshake_error >= STREAM_HANDSHAKE_OK_V1)
+ // handshake_error is the whole version / capabilities number
+ return "CONNECTED";
+
+ for(size_t i = 0; handshake_errors[i].str ; i++) {
+ if(handshake_error == handshake_errors[i].err)
+ return handshake_errors[i].str;
+ }
+
+ return "UNKNOWN";
+}
diff --git a/src/streaming/stream-handshake.h b/src/streaming/stream-handshake.h
new file mode 100644
index 000000000..9b66cab97
--- /dev/null
+++ b/src/streaming/stream-handshake.h
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_STREAM_HANDSHAKE_H
+#define NETDATA_STREAM_HANDSHAKE_H
+
+#define HTTP_HEADER_SIZE 8192
+
+#define STREAMING_PROTOCOL_VERSION "1.1"
+#define START_STREAMING_PROMPT_V1 "Hit me baby, push them over..."
+#define START_STREAMING_PROMPT_V2 "Hit me baby, push them over and bring the host labels..."
+#define START_STREAMING_PROMPT_VN "Hit me baby, push them over with the version="
+
+#define START_STREAMING_ERROR_SAME_LOCALHOST "Don't hit me baby, you are trying to stream my localhost back"
+#define START_STREAMING_ERROR_ALREADY_STREAMING "This GUID is already streaming to this server"
+#define START_STREAMING_ERROR_NOT_PERMITTED "You are not permitted to access this. Check the logs for more info."
+#define START_STREAMING_ERROR_BUSY_TRY_LATER "The server is too busy now to accept this request. Try later."
+#define START_STREAMING_ERROR_INTERNAL_ERROR "The server encountered an internal error. Try later."
+#define START_STREAMING_ERROR_INITIALIZATION "The server is initializing. Try later."
+
+#define RRDPUSH_STATUS_CONNECTED "CONNECTED"
+#define RRDPUSH_STATUS_ALREADY_CONNECTED "ALREADY CONNECTED"
+#define RRDPUSH_STATUS_DISCONNECTED "DISCONNECTED"
+#define RRDPUSH_STATUS_RATE_LIMIT "RATE LIMIT TRY LATER"
+#define RRDPUSH_STATUS_INITIALIZATION_IN_PROGRESS "INITIALIZATION IN PROGRESS RETRY LATER"
+#define RRDPUSH_STATUS_INTERNAL_SERVER_ERROR "INTERNAL SERVER ERROR DROPPING CONNECTION"
+#define RRDPUSH_STATUS_DUPLICATE_RECEIVER "DUPLICATE RECEIVER DROPPING CONNECTION"
+#define RRDPUSH_STATUS_CANT_REPLY "CANT REPLY DROPPING CONNECTION"
+#define RRDPUSH_STATUS_NO_HOSTNAME "NO HOSTNAME PERMISSION DENIED"
+#define RRDPUSH_STATUS_NO_API_KEY "NO API KEY PERMISSION DENIED"
+#define RRDPUSH_STATUS_INVALID_API_KEY "INVALID API KEY PERMISSION DENIED"
+#define RRDPUSH_STATUS_NO_MACHINE_GUID "NO MACHINE GUID PERMISSION DENIED"
+#define RRDPUSH_STATUS_MACHINE_GUID_DISABLED "MACHINE GUID DISABLED PERMISSION DENIED"
+#define RRDPUSH_STATUS_INVALID_MACHINE_GUID "INVALID MACHINE GUID PERMISSION DENIED"
+#define RRDPUSH_STATUS_API_KEY_DISABLED "API KEY DISABLED PERMISSION DENIED"
+#define RRDPUSH_STATUS_NOT_ALLOWED_IP "NOT ALLOWED IP PERMISSION DENIED"
+#define RRDPUSH_STATUS_LOCALHOST "LOCALHOST PERMISSION DENIED"
+#define RRDPUSH_STATUS_PERMISSION_DENIED "PERMISSION DENIED"
+#define RRDPUSH_STATUS_BAD_HANDSHAKE "BAD HANDSHAKE"
+#define RRDPUSH_STATUS_TIMEOUT "TIMEOUT"
+#define RRDPUSH_STATUS_CANT_UPGRADE_CONNECTION "CANT UPGRADE CONNECTION"
+#define RRDPUSH_STATUS_SSL_ERROR "SSL ERROR"
+#define RRDPUSH_STATUS_INVALID_SSL_CERTIFICATE "INVALID SSL CERTIFICATE"
+#define RRDPUSH_STATUS_CANT_ESTABLISH_SSL_CONNECTION "CANT ESTABLISH SSL CONNECTION"
+
+typedef enum {
+ STREAM_HANDSHAKE_OK_V3 = 3, // v3+
+ STREAM_HANDSHAKE_OK_V2 = 2, // v2
+ STREAM_HANDSHAKE_OK_V1 = 1, // v1
+ STREAM_HANDSHAKE_NEVER = 0, // never tried to connect
+ STREAM_HANDSHAKE_ERROR_BAD_HANDSHAKE = -1,
+ STREAM_HANDSHAKE_ERROR_LOCALHOST = -2,
+ STREAM_HANDSHAKE_ERROR_ALREADY_CONNECTED = -3,
+ STREAM_HANDSHAKE_ERROR_DENIED = -4,
+ STREAM_HANDSHAKE_ERROR_SEND_TIMEOUT = -5,
+ STREAM_HANDSHAKE_ERROR_RECEIVE_TIMEOUT = -6,
+ STREAM_HANDSHAKE_ERROR_INVALID_CERTIFICATE = -7,
+ STREAM_HANDSHAKE_ERROR_SSL_ERROR = -8,
+ STREAM_HANDSHAKE_ERROR_CANT_CONNECT = -9,
+ STREAM_HANDSHAKE_BUSY_TRY_LATER = -10,
+ STREAM_HANDSHAKE_INTERNAL_ERROR = -11,
+ STREAM_HANDSHAKE_INITIALIZATION = -12,
+ STREAM_HANDSHAKE_DISCONNECT_HOST_CLEANUP = -13,
+ STREAM_HANDSHAKE_DISCONNECT_STALE_RECEIVER = -14,
+ STREAM_HANDSHAKE_DISCONNECT_SHUTDOWN = -15,
+ STREAM_HANDSHAKE_DISCONNECT_NETDATA_EXIT = -16,
+ STREAM_HANDSHAKE_DISCONNECT_PARSER_EXIT = -17,
+ STREAM_HANDSHAKE_DISCONNECT_UNKNOWN_SOCKET_READ_ERROR = -18,
+ STREAM_HANDSHAKE_DISCONNECT_PARSER_FAILED = -19,
+ STREAM_HANDSHAKE_DISCONNECT_RECEIVER_LEFT = -20,
+ STREAM_HANDSHAKE_DISCONNECT_ORPHAN_HOST = -21,
+ STREAM_HANDSHAKE_NON_STREAMABLE_HOST = -22,
+ STREAM_HANDSHAKE_DISCONNECT_NOT_SUFFICIENT_READ_BUFFER = -23,
+ STREAM_HANDSHAKE_DISCONNECT_SOCKET_EOF = -24,
+ STREAM_HANDSHAKE_DISCONNECT_SOCKET_READ_FAILED = -25,
+ STREAM_HANDSHAKE_DISCONNECT_SOCKET_READ_TIMEOUT = -26,
+ STREAM_HANDSHAKE_ERROR_HTTP_UPGRADE = -27,
+
+} STREAM_HANDSHAKE;
+
+const char *stream_handshake_error_to_string(STREAM_HANDSHAKE handshake_error);
+
+#endif //NETDATA_STREAM_HANDSHAKE_H
diff --git a/src/streaming/stream-path.c b/src/streaming/stream-path.c
new file mode 100644
index 000000000..7aad9a0bf
--- /dev/null
+++ b/src/streaming/stream-path.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "stream-path.h"
+#include "rrdpush.h"
+#include "plugins.d/pluginsd_internals.h"
+
+ENUM_STR_MAP_DEFINE(STREAM_PATH_FLAGS) = {
+ { .id = STREAM_PATH_FLAG_ACLK, .name = "aclk" },
+
+ // terminator
+ { .id = 0, .name = NULL }
+};
+
+BITMAP_STR_DEFINE_FUNCTIONS(STREAM_PATH_FLAGS, STREAM_PATH_FLAG_NONE, "");
+
+static void stream_path_clear(STREAM_PATH *p) {
+ string_freez(p->hostname);
+ p->hostname = NULL;
+ p->host_id = UUID_ZERO;
+ p->node_id = UUID_ZERO;
+ p->claim_id = UUID_ZERO;
+ p->hops = 0;
+ p->since = 0;
+ p->first_time_t = 0;
+ p->capabilities = 0;
+ p->flags = STREAM_PATH_FLAG_NONE;
+ p->start_time = 0;
+ p->shutdown_time = 0;
+}
+
+static void rrdhost_stream_path_clear_unsafe(RRDHOST *host, bool destroy) {
+ for(size_t i = 0; i < host->rrdpush.path.used ; i++)
+ stream_path_clear(&host->rrdpush.path.array[i]);
+
+ host->rrdpush.path.used = 0;
+
+ if(destroy) {
+ freez(host->rrdpush.path.array);
+ host->rrdpush.path.array = NULL;
+ host->rrdpush.path.size = 0;
+ }
+}
+
+void rrdhost_stream_path_clear(RRDHOST *host, bool destroy) {
+ spinlock_lock(&host->rrdpush.path.spinlock);
+ rrdhost_stream_path_clear_unsafe(host, destroy);
+ spinlock_unlock(&host->rrdpush.path.spinlock);
+}
+
+static void stream_path_to_json_object(BUFFER *wb, STREAM_PATH *p) {
+ buffer_json_add_array_item_object(wb);
+ buffer_json_member_add_string(wb, "hostname", string2str(p->hostname));
+ buffer_json_member_add_uuid(wb, "host_id", p->host_id.uuid);
+ buffer_json_member_add_uuid(wb, "node_id", p->node_id.uuid);
+ buffer_json_member_add_uuid(wb, "claim_id", p->claim_id.uuid);
+ buffer_json_member_add_int64(wb, "hops", p->hops);
+ buffer_json_member_add_uint64(wb, "since", p->since);
+ buffer_json_member_add_uint64(wb, "first_time_t", p->first_time_t);
+ buffer_json_member_add_uint64(wb, "start_time", p->start_time);
+ buffer_json_member_add_uint64(wb, "shutdown_time", p->shutdown_time);
+ stream_capabilities_to_json_array(wb, p->capabilities, "capabilities");
+ STREAM_PATH_FLAGS_2json(wb, "flags", p->flags);
+ buffer_json_object_close(wb);
+}
+
+static STREAM_PATH rrdhost_stream_path_self(RRDHOST *host) {
+ STREAM_PATH p = { 0 };
+
+ bool is_localhost = host == localhost || rrdhost_option_check(host, RRDHOST_OPTION_VIRTUAL_HOST);
+
+ p.hostname = string_dup(localhost->hostname);
+ p.host_id = localhost->host_id;
+ p.node_id = localhost->node_id;
+ p.claim_id = claim_id_get_uuid();
+ p.start_time = get_agent_event_time_median(EVENT_AGENT_START_TIME) / USEC_PER_MS;
+ p.shutdown_time = get_agent_event_time_median(EVENT_AGENT_SHUTDOWN_TIME) / USEC_PER_MS;
+
+ p.flags = STREAM_PATH_FLAG_NONE;
+ if(!UUIDiszero(p.claim_id))
+ p.flags |= STREAM_PATH_FLAG_ACLK;
+
+ bool has_receiver = false;
+ spinlock_lock(&host->receiver_lock);
+ if(host->receiver) {
+ has_receiver = true;
+ p.hops = (int16_t)host->receiver->hops;
+ p.since = host->receiver->connected_since_s;
+ }
+ spinlock_unlock(&host->receiver_lock);
+
+ if(!has_receiver) {
+ p.hops = (is_localhost) ? 0 : -1; // -1 for stale nodes
+ p.since = netdata_start_time;
+ }
+
+ // the following may get the receiver lock again!
+ p.capabilities = stream_our_capabilities(host, true);
+
+ rrdhost_retention(host, 0, false, &p.first_time_t, NULL);
+
+ return p;
+}
+
+STREAM_PATH rrdhost_stream_path_fetch(RRDHOST *host) {
+ STREAM_PATH p = { 0 };
+
+ spinlock_lock(&host->rrdpush.path.spinlock);
+ for (size_t i = 0; i < host->rrdpush.path.used; i++) {
+ STREAM_PATH *tmp_path = &host->rrdpush.path.array[i];
+ if(UUIDeq(host->host_id, tmp_path->host_id)) {
+ p = *tmp_path;
+ break;
+ }
+ }
+ spinlock_unlock(&host->rrdpush.path.spinlock);
+ return p;
+}
+
+void rrdhost_stream_path_to_json(BUFFER *wb, struct rrdhost *host, const char *key, bool add_version) {
+ if(add_version)
+ buffer_json_member_add_uint64(wb, "version", 1);
+
+ spinlock_lock(&host->rrdpush.path.spinlock);
+ buffer_json_member_add_array(wb, key);
+ {
+ {
+ STREAM_PATH tmp = rrdhost_stream_path_self(host);
+
+ bool found_self = false;
+ for (size_t i = 0; i < host->rrdpush.path.used; i++) {
+ STREAM_PATH *p = &host->rrdpush.path.array[i];
+ if(UUIDeq(localhost->host_id, p->host_id)) {
+ // this is us, use the current data
+ p = &tmp;
+ found_self = true;
+ }
+ stream_path_to_json_object(wb, p);
+ }
+
+ if(!found_self) {
+ // we didn't find ourselves in the list.
+ // append us.
+ stream_path_to_json_object(wb, &tmp);
+ }
+
+ stream_path_clear(&tmp);
+ }
+ }
+ buffer_json_array_close(wb); // key
+ spinlock_unlock(&host->rrdpush.path.spinlock);
+}
+
+static BUFFER *stream_path_payload(RRDHOST *host) {
+ BUFFER *wb = buffer_create(0, NULL);
+ buffer_json_initialize(wb, "\"", "\"", 0, true, BUFFER_JSON_OPTIONS_MINIFY);
+ rrdhost_stream_path_to_json(wb, host, STREAM_PATH_JSON_MEMBER, true);
+ buffer_json_finalize(wb);
+ return wb;
+}
+
+void stream_path_send_to_parent(RRDHOST *host) {
+ struct sender_state *s = host->sender;
+ if(!s || !stream_has_capability(s, STREAM_CAP_PATHS)) return;
+
+ CLEAN_BUFFER *payload = stream_path_payload(host);
+
+ BUFFER *wb = sender_start(s);
+ buffer_sprintf(wb, PLUGINSD_KEYWORD_JSON " " PLUGINSD_KEYWORD_STREAM_PATH "\n%s\n" PLUGINSD_KEYWORD_JSON_END "\n", buffer_tostring(payload));
+ sender_commit(s, wb, STREAM_TRAFFIC_TYPE_METADATA);
+}
+
+void stream_path_send_to_child(RRDHOST *host) {
+ if(host == localhost)
+ return;
+
+ CLEAN_BUFFER *payload = stream_path_payload(host);
+
+ spinlock_lock(&host->receiver_lock);
+ if(host->receiver && stream_has_capability(host->receiver, STREAM_CAP_PATHS)) {
+
+ CLEAN_BUFFER *wb = buffer_create(0, NULL);
+ buffer_sprintf(wb, PLUGINSD_KEYWORD_JSON " " PLUGINSD_KEYWORD_STREAM_PATH "\n%s\n" PLUGINSD_KEYWORD_JSON_END "\n", buffer_tostring(payload));
+ send_to_plugin(buffer_tostring(wb), __atomic_load_n(&host->receiver->parser, __ATOMIC_RELAXED));
+ }
+ spinlock_unlock(&host->receiver_lock);
+}
+
+void stream_path_child_disconnected(RRDHOST *host) {
+ rrdhost_stream_path_clear(host, true);
+}
+
+void stream_path_parent_disconnected(RRDHOST *host) {
+ spinlock_lock(&host->rrdpush.path.spinlock);
+
+ size_t cleared = 0;
+ size_t used = host->rrdpush.path.used;
+ for (size_t i = 0; i < used; i++) {
+ STREAM_PATH *p = &host->rrdpush.path.array[i];
+ if(UUIDeq(localhost->host_id, p->host_id)) {
+ host->rrdpush.path.used = i + 1;
+
+ for(size_t j = i + 1; j < used ;j++) {
+ stream_path_clear(&host->rrdpush.path.array[j]);
+ cleared++;
+ }
+
+ break;
+ }
+ }
+
+ spinlock_unlock(&host->rrdpush.path.spinlock);
+
+ if(cleared)
+ stream_path_send_to_child(host);
+}
+
+void stream_path_retention_updated(RRDHOST *host) {
+ if(!host || !localhost) return;
+ stream_path_send_to_parent(host);
+ stream_path_send_to_child(host);
+}
+
+void stream_path_node_id_updated(RRDHOST *host) {
+ if(!host || !localhost) return;
+ stream_path_send_to_parent(host);
+ stream_path_send_to_child(host);
+}
+
+// --------------------------------------------------------------------------------------------------------------------
+
+
+static bool parse_single_path(json_object *jobj, const char *path, STREAM_PATH *p, BUFFER *error) {
+ JSONC_PARSE_TXT2STRING_OR_ERROR_AND_RETURN(jobj, path, "hostname", p->hostname, error, true);
+ JSONC_PARSE_TXT2UUID_OR_ERROR_AND_RETURN(jobj, path, "host_id", p->host_id.uuid, error, true);
+ JSONC_PARSE_TXT2UUID_OR_ERROR_AND_RETURN(jobj, path, "node_id", p->node_id.uuid, error, true);
+ JSONC_PARSE_TXT2UUID_OR_ERROR_AND_RETURN(jobj, path, "claim_id", p->claim_id.uuid, error, true);
+ JSONC_PARSE_INT64_OR_ERROR_AND_RETURN(jobj, path, "hops", p->hops, error, true);
+ JSONC_PARSE_UINT64_OR_ERROR_AND_RETURN(jobj, path, "since", p->since, error, true);
+ JSONC_PARSE_UINT64_OR_ERROR_AND_RETURN(jobj, path, "first_time_t", p->first_time_t, error, true);
+ JSONC_PARSE_UINT64_OR_ERROR_AND_RETURN(jobj, path, "start_time", p->start_time, error, true);
+ JSONC_PARSE_UINT64_OR_ERROR_AND_RETURN(jobj, path, "shutdown_time", p->shutdown_time, error, true);
+ JSONC_PARSE_ARRAY_OF_TXT2BITMAP_OR_ERROR_AND_RETURN(jobj, path, "flags", STREAM_PATH_FLAGS_2id_one, p->flags, error, true);
+ JSONC_PARSE_ARRAY_OF_TXT2BITMAP_OR_ERROR_AND_RETURN(jobj, path, "capabilities", stream_capabilities_parse_one, p->capabilities, error, true);
+
+ if(!p->hostname) {
+ buffer_strcat(error, "hostname cannot be empty");
+ return false;
+ }
+
+ if(UUIDiszero(p->host_id)) {
+ buffer_strcat(error, "host_id cannot be zero");
+ return false;
+ }
+
+ if(p->hops < 0) {
+ buffer_strcat(error, "hops cannot be negative");
+ return false;
+ }
+
+ if(p->capabilities == STREAM_CAP_NONE) {
+ buffer_strcat(error, "capabilities cannot be empty");
+ return false;
+ }
+
+ if(p->since <= 0) {
+ buffer_strcat(error, "since cannot be <= 0");
+ return false;
+ }
+
+ return true;
+}
+
+static XXH128_hash_t stream_path_hash_unsafe(RRDHOST *host) {
+ if(!host->rrdpush.path.used)
+ return (XXH128_hash_t){ 0 };
+
+ return XXH3_128bits(host->rrdpush.path.array, sizeof(*host->rrdpush.path.array) * host->rrdpush.path.used);
+}
+
+static int compare_by_hops(const void *a, const void *b) {
+ const STREAM_PATH *path1 = a;
+ const STREAM_PATH *path2 = b;
+
+ if (path1->hops < path2->hops)
+ return -1;
+ else if (path1->hops > path2->hops)
+ return 1;
+
+ return 0;
+}
+
+bool stream_path_set_from_json(RRDHOST *host, const char *json, bool from_parent) {
+ if(!json || !*json)
+ return false;
+
+ CLEAN_JSON_OBJECT *jobj = json_tokener_parse(json);
+ if(!jobj) {
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM PATH: Cannot parse json: %s", json);
+ return false;
+ }
+
+ spinlock_lock(&host->rrdpush.path.spinlock);
+ XXH128_hash_t old_hash = stream_path_hash_unsafe(host);
+ rrdhost_stream_path_clear_unsafe(host, true);
+
+ CLEAN_BUFFER *error = buffer_create(0, NULL);
+
+ json_object *_jarray;
+ if (json_object_object_get_ex(jobj, STREAM_PATH_JSON_MEMBER, &_jarray) &&
+ json_object_is_type(_jarray, json_type_array)) {
+ size_t items = json_object_array_length(_jarray);
+ host->rrdpush.path.array = callocz(items, sizeof(*host->rrdpush.path.array));
+ host->rrdpush.path.size = items;
+
+ for (size_t i = 0; i < items; ++i) {
+ json_object *joption = json_object_array_get_idx(_jarray, i);
+ if (!json_object_is_type(joption, json_type_object)) {
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM PATH: Array item No %zu is not an object: %s", i, json);
+ continue;
+ }
+
+ if(!parse_single_path(joption, "", &host->rrdpush.path.array[host->rrdpush.path.used], error)) {
+ stream_path_clear(&host->rrdpush.path.array[host->rrdpush.path.used]);
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "STREAM PATH: Array item No %zu cannot be parsed: %s: %s", i, buffer_tostring(error), json);
+ }
+ else
+ host->rrdpush.path.used++;
+ }
+ }
+
+ if(host->rrdpush.path.used > 1) {
+ // sorting is required in order to support stream_path_parent_disconnected()
+ qsort(host->rrdpush.path.array, host->rrdpush.path.used,
+ sizeof(*host->rrdpush.path.array), compare_by_hops);
+ }
+
+ XXH128_hash_t new_hash = stream_path_hash_unsafe(host);
+ bool has_paths = host->rrdpush.path.used > 0; // snapshot while the spinlock is still held
+ spinlock_unlock(&host->rrdpush.path.spinlock);
+
+ if(!XXH128_isEqual(old_hash, new_hash)) {
+ if(!from_parent)
+ stream_path_send_to_parent(host);
+ // when it comes from the child, we still need to send it back to the child
+ // including our own entry in it.
+ stream_path_send_to_child(host);
+ }
+
+ return has_paths;
+}
diff --git a/src/streaming/stream-path.h b/src/streaming/stream-path.h
new file mode 100644
index 000000000..6dc323bdd
--- /dev/null
+++ b/src/streaming/stream-path.h
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_STREAM_PATH_H
+#define NETDATA_STREAM_PATH_H
+
+#include "stream-capabilities.h"
+
+#define STREAM_PATH_JSON_MEMBER "streaming_path"
+
+typedef enum __attribute__((packed)) {
+ STREAM_PATH_FLAG_NONE = 0,
+ STREAM_PATH_FLAG_ACLK = (1 << 0),
+} STREAM_PATH_FLAGS;
+
+typedef struct stream_path {
+ STRING *hostname; // the hostname of the agent
+ ND_UUID host_id; // the machine guid of the agent
+ ND_UUID node_id; // the cloud node id of the agent
+ ND_UUID claim_id; // the cloud claim id of the agent
+ time_t since; // the timestamp of the last update
+ time_t first_time_t; // the oldest timestamp in the db
+ int16_t hops; // -1 = stale node, 0 = localhost, >0 the hops count
+ STREAM_PATH_FLAGS flags; // ACLK or NONE for the moment
+ STREAM_CAPABILITIES capabilities; // streaming connection capabilities
+ uint32_t start_time; // median time in ms the agent needs to start
+ uint32_t shutdown_time; // median time in ms the agent needs to shutdown
+} STREAM_PATH;
+
+typedef struct rrdhost_stream_path {
+ SPINLOCK spinlock;
+ uint16_t size;
+ uint16_t used;
+ STREAM_PATH *array;
+} RRDHOST_STREAM_PATH;
+
+
+struct rrdhost;
+
+void stream_path_send_to_parent(struct rrdhost *host);
+void stream_path_send_to_child(struct rrdhost *host);
+
+void rrdhost_stream_path_to_json(BUFFER *wb, struct rrdhost *host, const char *key, bool add_version);
+void rrdhost_stream_path_clear(struct rrdhost *host, bool destroy);
+
+void stream_path_retention_updated(struct rrdhost *host);
+void stream_path_node_id_updated(struct rrdhost *host);
+
+void stream_path_child_disconnected(struct rrdhost *host);
+void stream_path_parent_disconnected(struct rrdhost *host);
+STREAM_PATH rrdhost_stream_path_fetch(struct rrdhost *host);
+
+bool stream_path_set_from_json(struct rrdhost *host, const char *json, bool from_parent);
+
+#endif //NETDATA_STREAM_PATH_H
diff --git a/src/streaming/stream.conf b/src/streaming/stream.conf
index 475d5eac2..659bd830d 100644
--- a/src/streaming/stream.conf
+++ b/src/streaming/stream.conf
@@ -62,32 +62,33 @@
#enable compression = yes
# The timeout to connect and send metrics
- timeout seconds = 60
+ #timeout = 1m
# If the destination line above does not specify a port, use this
- default port = 19999
+ #default port = 19999
- # filter the charts to be streamed
+ # filter the charts and contexts to be streamed
# netdata SIMPLE PATTERN:
# - space separated list of patterns (use \ to include spaces in patterns)
# - use * as wildcard, any number of times within each pattern
# - prefix a pattern with ! for a negative match (ie not stream the charts it matches)
# - the order of patterns is important (left to right)
# To send all except a few, use: !this !that * (ie append a wildcard pattern)
- send charts matching = *
+ # The pattern is matched against the context, the chart name and the chart id.
+ #send charts matching = *
# The buffer to use for sending metrics.
# 10MB is good for 60 seconds of data, so increase this if you expect latencies.
# The buffer is flushed on reconnects (this will not prevent gaps at the charts).
- buffer size bytes = 10485760
+ #buffer size bytes = 10485760
# If the connection fails, or it disconnects,
# retry after that many seconds.
- reconnect delay seconds = 5
+ #reconnect delay = 5s
# Sync the clock of the charts for that many iterations, when starting.
# It is ignored when replication is enabled
- initial clock resync iterations = 60
+ #initial clock resync iterations = 60
# -----------------------------------------------------------------------------
# 2. ON PARENT NETDATA - THE ONE THAT WILL BE RECEIVING METRICS
@@ -124,21 +125,21 @@
# will be pushing metrics using this API key.
# The metrics are received via the API port, so the same IPs
# should also be matched at netdata.conf [web].allow connections from
- allow from = *
+ #allow from = *
- # The default history in entries, for all hosts using this API key.
+ # The history in entries (for db alloc or ram), for all hosts using this API key.
# You can also set it per host below.
- # For the default db mode (dbengine), this is ignored.
- #default history = 3600
+ # For the default db (dbengine), this is ignored.
+ #retention = 3600
- # The default memory mode to be used for all hosts using this API key.
+ # The database to be used for all hosts using this API key.
# You can also set it per host below.
# If you don't set it here, the memory mode of netdata.conf will be used.
# Valid modes:
# ram keep it in RAM, don't touch the disk
# none no database at all (use this on headless proxies)
- # dbengine like a traditional database
- #default memory mode = dbengine
+ # dbengine Netdata's high performance database
+ #db = dbengine
# Shall we enable health monitoring for the hosts using this API key?
# 3 possible values:
@@ -150,18 +151,18 @@
# The default is taken from [health].enabled of netdata.conf
#health enabled by default = auto
- # postpone alarms for a short period after the sender is connected
- default postpone alarms on connect seconds = 60
+ # postpone alerts for a short period after the sender is connected
+ #postpone alerts on connect = 1m
- # seconds of health log events to keep
- #default health log history = 432000
+ # the duration to maintain health log events
+ #health log retention = 5d
# need to route metrics differently? set these.
# the defaults are the ones at the [stream] section (above)
- #default proxy enabled = yes | no
- #default proxy destination = IP:PORT IP:PORT ...
- #default proxy api key = API_KEY
- #default proxy send charts matching = *
+ #proxy enabled = yes | no
+ #proxy destination = IP:PORT IP:PORT ...
+ #proxy api key = API_KEY
+ #proxy send charts matching = *
# Stream Compression
# By default it is enabled.
@@ -176,13 +177,13 @@
#enable replication = yes
# How many seconds to replicate from each child. Default: a day
- #seconds to replicate = 86400
+ #replication period = 1d
# The duration we want to replicate per each step.
- #seconds per replication step = 600
+ #replication step = 10m
# Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable
- # after the specified duration of "cleanup ephemeral hosts after secs" (as defined in the db section of netdata.conf)
+ # after the specified duration of "cleanup ephemeral hosts after" (as defined in the db section of netdata.conf)
# from the time of the node's last connection.
#is ephemeral node = no
@@ -217,23 +218,23 @@
# The metrics are received via the API port, so the same IPs
# should also be matched at netdata.conf [web].allow connections from
# and at stream.conf [API_KEY].allow from
- allow from = *
+ #allow from = *
# The number of entries in the database.
- # This is ignored for db mode dbengine.
- #history = 3600
+ # This is ignored for db dbengine.
+ #retention = 3600
# The memory mode of the database: ram | none | dbengine
- #memory mode = dbengine
+ #db = dbengine
# Health / alarms control: yes | no | auto
#health enabled = auto
- # postpone alarms when the sender connects
- postpone alarms on connect seconds = 60
+ # postpone alerts when the sender connects
+ #postpone alerts on connect = 1m
- # seconds of health log events to keep
- #health log history = 432000
+ # the duration to maintain health log events
+ #health log retention = 5d
# need to route metrics differently?
# the defaults are the ones at the [API KEY] section
@@ -252,12 +253,12 @@
#enable replication = yes
# How many seconds to replicate from each child.
- #seconds to replicate = 86400
+ #replication period = 1d
# The duration we want to replicate per each step.
- #seconds per replication step = 600
+ #replication step = 10m
# Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable
- # after the specified duration of "cleanup ephemeral hosts after secs" (as defined in the db section of netdata.conf)
+ # after the specified duration of "cleanup ephemeral hosts after" (as defined in the db section of netdata.conf)
# from the time of the node's last connection.
#is ephemeral node = no