author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-11-25 17:33:56 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-11-25 17:34:10 +0000 |
commit | 83ba6762cc43d9db581b979bb5e3445669e46cc2 (patch) | |
tree | 2e69833b43f791ed253a7a20318b767ebe56cdb8 /src/streaming/README.md | |
parent | Releasing debian version 1.47.5-1. (diff) | |
download | netdata-83ba6762cc43d9db581b979bb5e3445669e46cc2.tar.xz netdata-83ba6762cc43d9db581b979bb5e3445669e46cc2.zip |
Merging upstream version 2.0.3+dfsg (Closes: #923993, #1042533, #1045145).
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/streaming/README.md')
-rw-r--r-- | src/streaming/README.md | 138 |
1 files changed, 82 insertions, 56 deletions
diff --git a/src/streaming/README.md b/src/streaming/README.md index fe4e01bae..74b5691d0 100644 --- a/src/streaming/README.md +++ b/src/streaming/README.md @@ -30,6 +30,8 @@ node**. This file is automatically generated by Netdata the first time it is sta #### `[stream]` section +This section is used by the sending Netdata. + | Setting | Default | Description | |-------------------------------------------------|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `enabled` | `no` | Whether this node streams metrics to any parent. Change to `yes` to enable streaming. | @@ -38,34 +40,62 @@ node**. This file is automatically generated by Netdata the first time it is sta | `CApath` | `/etc/ssl/certs/` | The directory where known certificates are found. Defaults to OpenSSL's default path. | | `CAfile` | `/etc/ssl/certs/cert.pem` | Add a parent node certificate to the list of known certificates in `CAPath`. | | `api key` | | The `API_KEY` to use as the child node. | -| `timeout seconds` | `60` | The timeout to connect and send metrics to a parent. | +| `timeout` | `1m` | The timeout to connect and send metrics to a parent. | | `default port` | `19999` | The port to use if `destination` does not specify one. | | [`send charts matching`](#send-charts-matching) | `*` | A space-separated list of [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) to filter which charts are streamed. [Read more →](#send-charts-matching) | | `buffer size bytes` | `10485760` | The size of the buffer to use when sending metrics. The default `10485760` equals a buffer of 10MB, which is good for 60 seconds of data. Increase this if you expect latencies higher than that. The buffer is flushed on reconnect. 
| -| `reconnect delay seconds` | `5` | How long to wait until retrying to connect to the parent node. | +| `reconnect delay` | `5s` | How long to wait until retrying to connect to the parent node. | | `initial clock resync iterations` | `60` | Sync the clock of charts for how many seconds when starting. | | `parent using h2o` | `no` | Set to yes if you are connecting to parent trough it's h2o webserver/port. Currently there is no reason to set this to `yes` unless you are testing the new h2o based netdata webserver. When production ready this will be set to `yes` as default. | -### `[API_KEY]` and `[MACHINE_GUID]` sections - -| Setting | Default | Description | -|-----------------------------------------------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `enabled` | `no` | Whether this API KEY enabled or disabled. | -| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more →](#allow-from) | -| `default history` | `3600` | The default amount of child metrics history to retain when using the `ram` memory mode. | -| [`default memory mode`](#default-memory-mode) | `ram` | The [database](/src/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, or `none`. [Read more →](#default-memory-mode) | -| `health enabled by default` | `auto` | Whether alerts and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alerts when the child is connected. `yes` enables alerts always, and `no` disables alerts. | -| `default postpone alarms on connect seconds` | `60` | Postpone alerts and notifications for a period of time after the child connects. 
| -| `default health log history` | `432000` | History of health log events (in seconds) kept in the database. | -| `default proxy enabled` | | Route metrics through a proxy. | -| `default proxy destination` | | Space-separated list of `IP:PORT` for proxies. | -| `default proxy api key` | | The `API_KEY` of the proxy. | -| `default send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). | -| `enable compression` | `yes` | Enable/disable stream compression. | -| `enable replication` | `yes` | Enable/disable replication. | -| `seconds to replicate` | `86400` | How many seconds of data to replicate from each child at a time | -| `seconds per replication step` | `600` | The duration we want to replicate per each replication step. | -| `is ephemeral node` | `no` | Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable after the specified duration of "cleanup ephemeral hosts after secs" from the time of the node's last connection. | +### `[API_KEY]` sections + +This section defines an API key for other agents to connect to this Netdata. + +| Setting | Default | Description | +|------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `enabled` | `no` | Whether this API KEY enabled or disabled. | +| `type` | `api` | This section defines an API key. | +| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more →](#allow-from) | +| `retention` | `1h` | The default amount of child metrics history to retain when using the `ram` db. | +| [`db`](#default-memory-mode) | `dbengine` | The [database](/src/database/README.md) to use for all nodes using this `API_KEY`. 
Valid settings are `dbengine`, `ram`, or `none`. [Read more →](#default-memory-mode) | +| `health enabled by default` | `auto` | Whether alerts and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alerts when the child is connected. `yes` enables alerts always, and `no` disables alerts. | +| `postpone alerts on connect` | `1m` | Postpone alerts and notifications for a period of time after the child connects. | +| `health log retention` | `5d` | History of health log events (in seconds) kept in the database. | +| `proxy enabled` | | Route metrics through a proxy. | +| `proxy destination` | | Space-separated list of `IP:PORT` for proxies. | +| `proxy api key` | | The `API_KEY` of the proxy. | +| `send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). | +| `enable compression` | `yes` | Enable/disable stream compression. | +| `enable replication` | `yes` | Enable/disable replication. | +| `replication period` | `1d` | Limits the maximum window that will be replicated from each child. | +| `replication step` | `10m` | The duration we want to replicate per each replication step. | +| `is ephemeral node` | `no` | Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable after the specified duration of "cleanup ephemeral hosts after" from the time of the node's last connection. | + + +### `[MACHINE_GUID]` sections + +This section is about customizing configuration for specific agents. It allows many agents to share the same API key, while providing customizability per remote agent. + +| Setting | Default | Description | +|------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `enabled` | `no` | Whether this MACHINE_GUID enabled or disabled. 
| +| `type` | `machine` | This section defines the configuration for a specific agent. | +| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more →](#allow-from) | +| `retention` | `3600` | The default amount of child metrics history to retain when using the `ram` db. | +| [`db`](#default-memory-mode) | `dbengine` | The [database](/src/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, or `none`. [Read more →](#default-memory-mode) | +| `health enabled` | `auto` | Whether alerts and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alerts when the child is connected. `yes` enables alerts always, and `no` disables alerts. | +| `postpone alerts on connect` | `1m` | Postpone alerts and notifications for a period of time after the child connects. | +| `health log retention` | `5d` | History of health log events (in seconds) kept in the database. | +| `proxy enabled` | | Route metrics through a proxy. | +| `proxy destination` | | Space-separated list of `IP:PORT` for proxies. | +| `proxy api key` | | The `API_KEY` of the proxy. | +| `send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). | +| `enable compression` | `yes` | Enable/disable stream compression. | +| `enable replication` | `yes` | Enable/disable replication. | +| `replication period` | `1d` | Limits the maximum window that will be replicated from each child. | +| `replication step` | `10m` | The duration we want to replicate per each replication step. | +| `is ephemeral node` | `no` | Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable after the specified duration of "cleanup ephemeral hosts after" from the time of the node's last connection. 
| #### `destination` @@ -81,7 +111,7 @@ the following format: `[PROTOCOL:]HOST[%INTERFACE][:PORT][:SSL]`. To enable TCP streaming to a parent node at `203.0.113.0` on port `20000` and with TLS/SSL encryption: -```conf +```text [stream] destination = tcp:203.0.113.0:20000:SSL ``` @@ -95,14 +125,14 @@ The default is a single wildcard `*`, which streams all charts. To send only a few charts, list them explicitly, or list a group using a wildcard. To send _only_ the `apps.cpu` chart and charts with contexts beginning with `system.`: -```conf +```text [stream] send charts matching = apps.cpu system.* ``` To send all but a few charts, use `!` to create a negative match. To send _all_ charts _but_ `apps.cpu`: -```conf +```text [stream] send charts matching = !apps.cpu * ``` @@ -116,14 +146,14 @@ The default is `*`, which accepts all requests including the `API_KEY`. To allow from only a specific IP address: -```conf +```text [API_KEY] allow from = 203.0.113.10 ``` To allow all IPs starting with `10.*`, except `10.1.2.3`: -```conf +```text [API_KEY] allow from = !10.1.2.3 10.* ``` @@ -131,7 +161,7 @@ To allow all IPs starting with `10.*`, except `10.1.2.3`: > If you set specific IP addresses here, and also use the `allow connections` setting in the `[web]` section of > `netdata.conf`, be sure to add the IP address there so that it can access the API port. -#### `default memory mode` +#### `db` The [database](/src/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, , or `none`. @@ -142,19 +172,15 @@ Valid settings are `dbengine`, `ram`, , or `none`. streaming configurations that use ephemeral nodes. - `none`: No database. -When using `default memory mode = dbengine`, the parent node creates a separate instance of the TSDB to store metrics -from child nodes. 
The [size of _each_ instance is configurable](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md) with the `page -cache size` and `dbengine multihost disk space` settings in the `[global]` section in `netdata.conf`. - ### `netdata.conf` -| Setting | Default | Description | -|--------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `[global]` section | | | -| `memory mode` | `dbengine` | Determines the [database type](/src/database/README.md) to be used on that node. Other options settings include `none`, and `ram`. `none` disables the database at this host. This also disables alerts and notifications, as those can't run without a database. | -| `[web]` section | | | -| `mode` | `static-threaded` | Determines the [web server](/src/web/server/README.md) type. The other option is `none`, which disables the dashboard, API, and registry. | -| `accept a streaming request every seconds` | `0` | Set a limit on how often a parent node accepts streaming requests from child nodes. `0` equals no limit. If this is set, you may see `... too busy to accept new streaming request. Will be allowed in X secs` in Netdata's `error.log`. 
| +| Setting | Default | Description | +|------------------------------------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `[db]` section | | | +| `mode` | `dbengine` | Determines the [database type](/src/database/README.md) to be used on that node. Other options settings include `none`, and `ram`. `none` disables the database at this host. This also disables alerts and notifications, as those can't run without a database. | +| `[web]` section | | | +| `mode` | `static-threaded` | Determines the [web server](/src/web/server/README.md) type. The other option is `none`, which disables the dashboard, API, and registry. | +| `accept a streaming request every` | `off` | Set a limit on how often a parent node accepts streaming requests from child nodes. `0` equals no limit. If this is set, you may see `... too busy to accept new streaming request. Will be allowed in X secs` in Netdata's `error.log`. | ### Basic use cases @@ -175,16 +201,16 @@ with the `[MACHINE_GUID]` section. For example, the metrics streamed from only the child node with `MACHINE_GUID` are saved in memory, not using the default `dbengine` as specified by the `API_KEY`, and alerts are disabled. -```conf +```text [API_KEY] enabled = yes - default memory mode = dbengine - health enabled by default = auto + db = dbengine + health enabled = auto allow from = * [MACHINE_GUID] enabled = yes - memory mode = ram + db = ram health enabled = no ``` @@ -405,7 +431,7 @@ In the following example, the proxy receives metrics from a child node using the `66666666-7777-8888-9999-000000000000`, then stores metrics using `dbengine`. 
It then uses the `API_KEY` of `11111111-2222-3333-4444-555555555555` to proxy those same metrics on to a parent node at `203.0.113.0`. -```conf +```text [stream] enabled = yes destination = 203.0.113.0 @@ -413,7 +439,7 @@ In the following example, the proxy receives metrics from a child node using the [66666666-7777-8888-9999-000000000000] enabled = yes - default memory mode = dbengine + db = dbengine ``` ### Ephemeral nodes @@ -423,13 +449,13 @@ metrics to any number of permanently-running parent nodes. On the parent, set the following in `stream.conf`: -```conf +```text [11111111-2222-3333-4444-555555555555] # enable/disable this API key enabled = yes # one hour of data for each of the child nodes - default history = 3600 + history = 1h # do not save child metrics on disk default memory = ram @@ -455,9 +481,9 @@ On the child nodes, set the following in `stream.conf`: In addition, edit `netdata.conf` on each child node to disable the database and alerts. ```bash -[global] +[db] # disable the local database - memory mode = none + db = none [health] # disable health checks @@ -471,16 +497,16 @@ This replication process ensures data continuity even if child nodes temporarily Replication is enabled by default in Netdata, but you can customize the replication behavior by modifying the `[API_KEY]` section of the `stream.conf` file. Here's an example configuration: -```conf +```text [11111111-2222-3333-4444-555555555555] # Enable replication for all hosts using this api key. Default: yes. enable replication = yes - # How many seconds of data to replicate from each child at a time. Default: a day (86400 seconds). - seconds to replicate = 86400 + # How many seconds of data to replicate from each child at a time. Default: a day. + replication period = 1d - # The duration we want to replicate per each replication step. Default: 600 seconds (10 minutes). - seconds per replication step = 600 + # The duration we want to replicate per each replication step. Default: 10 minutes. 
+ replication step = 10m ``` You can monitor the replication process in two ways: @@ -597,9 +623,9 @@ ERROR : STREAM_SENDER[CHILD HOSTNAME] : STREAM child HOSTNAME [send to PARENT HO ### Stream charts wrong Chart data needs to be consistent between child and parent nodes. If there are differences between chart data on -a parent and a child, such as gaps in metrics collection, it most often means your child's `memory mode` +a parent and a child, such as gaps in metrics collection, it most often means your child's `[db].db` setting does not match the parent's. To learn more about the different ways Netdata can store metrics, and thus keep chart -data consistent, read our [memory mode documentation](/src/database/README.md). +data consistent, read our [db documentation](/src/database/README.md). ### Forbidding access |
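
The table rows changed in this diff follow a pattern: settings that used to take a plain number of seconds now show duration strings as defaults (`60` became `1m`, `86400` became `1d`), and several keys lose a `default` prefix or a `seconds` suffix. As a rough illustration only, the Python sketch below maps the old `stream.conf` names to the new ones and converts whole seconds into the largest unit that divides them evenly. The rename table is copied from the changed table rows; the helper names `seconds_to_duration` and `migrate_setting` are hypothetical, the sketch assumes only the `s`, `m`, `h`, and `d` suffixes that appear in this diff, and none of it is part of Netdata or of this commit.

```python
from __future__ import annotations

# Old -> new stream.conf setting names, copied from the table rows changed in the diff.
RENAMED_KEYS = {
    "timeout seconds": "timeout",
    "reconnect delay seconds": "reconnect delay",
    "default history": "retention",
    "default memory mode": "db",
    "default postpone alarms on connect seconds": "postpone alerts on connect",
    "default health log history": "health log retention",
    "default proxy enabled": "proxy enabled",
    "default proxy destination": "proxy destination",
    "default proxy api key": "proxy api key",
    "default send charts matching": "send charts matching",
    "seconds to replicate": "replication period",
    "seconds per replication step": "replication step",
}

# Old keys whose documented defaults moved from a number to a duration string.
DURATION_KEYS = {
    "timeout seconds",
    "reconnect delay seconds",
    "default history",
    "default postpone alarms on connect seconds",
    "default health log history",
    "seconds to replicate",
    "seconds per replication step",
}

# Assumption: only these four suffixes are needed; they are the ones seen in the diff.
UNITS = (("d", 86400), ("h", 3600), ("m", 60), ("s", 1))


def seconds_to_duration(seconds: int) -> str:
    """Format whole seconds using the largest unit that divides them evenly."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    for suffix, size in UNITS:
        if seconds >= size and seconds % size == 0:
            return f"{seconds // size}{suffix}"
    return "0s"  # only reached when seconds == 0


def migrate_setting(key: str, value: str) -> tuple[str, str]:
    """Rewrite one (key, value) pair in the new style; unknown keys pass through."""
    new_key = RENAMED_KEYS.get(key, key)
    if key in DURATION_KEYS and value.isdigit():
        value = seconds_to_duration(int(value))
    return new_key, value


if __name__ == "__main__":
    samples = [
        ("default history", "3600"),
        ("seconds to replicate", "86400"),
        ("default memory mode", "ram"),
    ]
    for old_key, old_value in samples:
        print((old_key, old_value), "->", migrate_setting(old_key, old_value))
```

For example, `migrate_setting("seconds to replicate", "86400")` returns `("replication period", "1d")`, which matches the new default shown in the `[API_KEY]` table. Values that are not a whole number of a larger unit, such as `90`, stay in seconds (`90s`) rather than being rounded.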