From 81581f9719bc56f01d5aa08952671d65fda9867a Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Mon, 8 May 2023 18:27:08 +0200 Subject: Merging upstream version 1.39.0. Signed-off-by: Daniel Baumann --- .../metrics-storage-management/enable-streaming.md | 228 ++++++++++ .../enable-streaming.mdx | 158 ------- .../how-streaming-works.mdx | 99 ----- .../reference-streaming.mdx | 490 --------------------- 4 files changed, 228 insertions(+), 747 deletions(-) create mode 100644 docs/metrics-storage-management/enable-streaming.md delete mode 100644 docs/metrics-storage-management/enable-streaming.mdx delete mode 100644 docs/metrics-storage-management/how-streaming-works.mdx delete mode 100644 docs/metrics-storage-management/reference-streaming.mdx (limited to 'docs/metrics-storage-management') diff --git a/docs/metrics-storage-management/enable-streaming.md b/docs/metrics-storage-management/enable-streaming.md new file mode 100644 index 000000000..f54ffaeba --- /dev/null +++ b/docs/metrics-storage-management/enable-streaming.md @@ -0,0 +1,228 @@ +# How metrics streaming works + +Each node running Netdata can stream the metrics it collects, in real time, to another node. Streaming allows you to +replicate metrics data across multiple nodes, or centralize all your metrics data into a single time-series database +(TSDB). + +When one node streams metrics to another, the node receiving metrics can visualize them on the dashboard, run health checks to +[trigger alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md) and +[send notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md), and +[export](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) all metrics to an external TSDB. When Netdata streams metrics to another +Netdata, the receiving one is able to perform everything a Netdata instance is capable of. 
+ +Streaming lets you decide exactly how you want to store and maintain metrics data. While we believe Netdata's +[distributed architecture](https://github.com/netdata/netdata/blob/master/docs/store/distributed-data-architecture.md) is +ideal for speed and scale, streaming provides centralization options and high data availability. + +This document will get you started quickly with streaming. More advanced concepts and suggested production deployments +can be found in the [streaming and replication reference](https://github.com/netdata/netdata/blob/master/streaming/README.md). + +## Streaming basics + +There are three types of nodes in Netdata's streaming ecosystem. + +- **Parent**: A node, running Netdata, that receives streamed metric data. +- **Child**: A node, running Netdata, that streams metric data to one or more parents. +- **Proxy**: A node, running Netdata, that receives metric data from a child and "forwards" it on to a + separate parent node. + +Netdata uses API keys, which are just random GUIDs, to authorize the communication between child and parent nodes. We +recommend using `uuidgen` for generating API keys, which can then be used across any number of streaming connections. +Alternatively, you can generate a unique API key for each parent-child relationship. + +Once the parent node authorizes the child's API key, the child can start streaming metrics. + +It's important to note that the streaming connection uses TCP, UDP, or Unix sockets, _not HTTP_. To proxy streaming +metrics, you need to use a proxy that tunnels [OSI layer 4-7 +traffic](https://en.wikipedia.org/wiki/OSI_model#Layer_4:_Transport_Layer) without interfering with it, such as +[SOCKS](https://en.wikipedia.org/wiki/SOCKS) or Nginx's +[TCP/UDP load balancing](https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/). + +## Supported streaming configurations + +Netdata supports any combination of parent, child, and proxy nodes that you can imagine. 
Any node can act as a +parent, child, or proxy at the same time, sending or receiving streaming metrics from any number of other nodes. + +Here are a few example streaming configurations: + +- **Headless collector**: + - Child `A`, _without_ a database or web dashboard, streams metrics to parent `B`. + - `A` metrics are only available via the local Agent dashboard for `B`. + - `B` generates alarms for `A`. +- **Replication**: + - Child `A`, _with_ a database and web dashboard, streams metrics to parent `B`. + - `A` metrics are available on both local Agent dashboards, and can be stored with the same or different metrics + retention policies. + - Both `A` and `B` generate alarms. +- **Proxy**: + - Child `A`, _with or without_ a database, sends metrics to proxy `C`, also _with or without_ a database. `C` sends + metrics to parent `B`. + - Any node with a database can generate alarms. + + + +### A basic parent child setup + +![simple-parent-child](https://user-images.githubusercontent.com/43294513/232492152-11886282-29bc-401f-9577-24237e43a501.jpg) + +For a predictable number of non-ephemeral nodes, install a Netdata agent on each node and replicate its data to a +Netdata parent, preferably on a management/admin node outside your production infrastructure. +There are two variations of the basic setup: + +- When your nodes have sufficient RAM and disk IO, the Netdata agents on each node can run with the default + settings for data collection and retention. + +- When your nodes have severe RAM and disk IO limitations (e.g. Raspberry Pis), you should + [optimize the Netdata agent's performance](https://github.com/netdata/netdata/blob/master/docs/guides/configure/performance.md). 
+ +[Secure your nodes](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/secure-nodes.md) to +protect them from the internet by making their UI accessible only via an nginx proxy, with potentially different subdomains +for the parent and even each child, if necessary. + +Both children and the parent are connected to the cloud, to enable infrastructure observability, +[without transferring the collected data](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md). +Requests for data are always served by a connected Netdata agent. When both a child and a parent are connected, +the cloud will always select the parent to query for the data the user requested. + +### An advanced setup + +![Ephemeral nodes with two parents](https://user-images.githubusercontent.com/43294513/228891974-590bf0de-4e5a-46b2-a07a-7bb3dffde2bf.jpg) + +When the nodes are ephemeral, we recommend using two parents in an active-active setup, and having the children not store data at all. + +Both parents are configured on each child, so that if one is not available, the child connects to the other. + +The children in this setup are not connected to Netdata Cloud at all, as high availability is achieved with the second parent. + +## Enable streaming between nodes + +The simplest streaming configuration is **replication**, in which a child node streams its metrics in real time to a +parent node, and both nodes retain metrics in their own databases. + +To configure replication, you need two nodes, each running Netdata. First, you'll enable streaming on your parent +node, then enable streaming on your child node. When you're finished, you'll be able to see the child node's metrics in +the parent node's dashboard, quickly switch between the two dashboards, and be able to serve +[alarm notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) from either or both nodes. 
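+
+For the active-active setup described earlier, each child lists both parents in its `destination` setting, which takes a
+space-separated list and streams to the first available parent. The following is only a minimal sketch: the two parent
+addresses `203.0.113.1` and `203.0.113.2` are hypothetical placeholders, port `19999` is assumed to be the default
+streaming port, and the API key is the example value used throughout this document.
+
+```conf
+[stream]
+    enabled = yes
+    # Hypothetical parent addresses; the first available parent in this list receives the metrics.
+    destination = 203.0.113.1:19999 203.0.113.2:19999
+    # Placeholder key; each parent must authorize the key the child presents.
+    api key = 11111111-2222-3333-4444-555555555555
+```
+
+With a configuration like this, both parents would need a matching, enabled API key section in their own `stream.conf`
+so that the child is authorized whichever parent it reaches.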
+ +### Enable streaming on the parent node + +First, log onto the node that will act as the parent. + +Run `uuidgen` to create a new API key, which is a randomly-generated machine GUID the Netdata Agent uses to identify +itself while initiating a streaming connection. Copy that into a separate text file for later use. + +> Find out how to [install `uuidgen`](https://command-not-found.com/uuidgen) on your node if you don't already have it. + +Next, open `stream.conf` using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) +from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata +sudo ./edit-config stream.conf +``` + +Scroll down to the section beginning with `[API_KEY]`. Paste the API key you generated earlier between the brackets, so +that it looks like the following: + +```conf +[11111111-2222-3333-4444-555555555555] +``` + +Set `enabled` to `yes`, and `default memory mode` to `dbengine`. Leave all the other settings as their defaults. A +simplified version of the configuration, minus the commented lines, looks like the following: + +```conf +[11111111-2222-3333-4444-555555555555] + enabled = yes + default memory mode = dbengine +``` + +Save the file and close it, then restart Netdata with `sudo systemctl restart netdata`, or the [appropriate +method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + +### Enable streaming on the child node + +Connect to your child node with SSH. + +Open `stream.conf` again. Scroll down to the `[stream]` section and set `enabled` to `yes`. Paste the IP address of your +parent node at the end of the `destination` line, and paste the API key generated on the parent node onto the `api key` +line. + +Leave all the other settings as their defaults. 
A simplified version of the configuration, minus the commented lines, +looks like the following: + +```conf +[stream] + enabled = yes + destination = 203.0.113.0 + api key = 11111111-2222-3333-4444-555555555555 +``` + +Save the file and close it, then restart Netdata with `sudo systemctl restart netdata`, or the [appropriate +method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + +### Enable TLS/SSL on streaming (optional) + +While encrypting the connection between your parent and child nodes is recommended for security, it's not required to +get started. If you're not interested in encryption, skip ahead to [view streamed +metrics](#view-streamed-metrics-in-netdatas-dashboard). + +In this example, we'll use self-signed certificates. + +On the **parent** node, use OpenSSL to create the key and certificate, then use `chown` to make the new files readable +by the `netdata` user. + +```bash +sudo openssl req -newkey rsa:2048 -nodes -sha512 -x509 -days 365 -keyout /etc/netdata/ssl/key.pem -out /etc/netdata/ssl/cert.pem +sudo chown netdata:netdata /etc/netdata/ssl/cert.pem /etc/netdata/ssl/key.pem +``` + +Next, enforce TLS/SSL on the web server. Open `netdata.conf`, scroll down to the `[web]` section, and look for the `bind +to` setting. Add `^SSL=force` to turn on TLS/SSL. See the [web server +reference](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) for other TLS/SSL options. + +```conf +[web] + bind to = *=dashboard|registry|badges|management|streaming|netdata.conf^SSL=force +``` + +Next, connect to the **child** node and open `stream.conf`. Add `:SSL` to the end of the existing `destination` setting +to connect to the parent using TLS/SSL. Uncomment the `ssl skip certificate verification` line to allow the use of +self-signed certificates. 
+ +```conf +[stream] + enabled = yes + destination = 203.0.113.0:SSL + ssl skip certificate verification = yes + api key = 11111111-2222-3333-4444-555555555555 +``` + +Restart both the parent and child nodes with `sudo systemctl restart netdata`, or the [appropriate +method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system, to stream encrypted metrics using TLS/SSL. + +### View streamed metrics in Netdata Cloud + +In Netdata Cloud you should now be able to see a new parent showing up in the Home tab under "Nodes by data replication". +The replication factor for the child node has now increased to 2, meaning that its data is now highly available. + +You don't need to do anything else, as the cloud will automatically prefer to fetch data about the child from the parent +and switch to querying the child only when the parent is unavailable, or for some reason doesn't have the requested +data (e.g. the connection between parent and the child is broken). + +### View streamed metrics in Netdata's dashboard + +At this point, the child node is streaming its metrics in real time to its parent. Open the local Agent dashboard for +the parent by navigating to `http://PARENT-NODE:19999` in your browser, replacing `PARENT-NODE` with its IP address or +hostname. + +This dashboard shows parent metrics. To see child metrics, open the left-hand sidebar with the hamburger icon +![Hamburger icon](https://raw.githubusercontent.com/netdata/netdata-ui/master/src/components/icon/assets/hamburger.svg) +in the top panel. Both nodes appear under the **Replicated Nodes** menu. Click on either of the links to switch between +separate parent and child dashboards. 
+ +![Switching between parent and child dashboards](https://user-images.githubusercontent.com/1153921/110043346-761ec000-7d04-11eb-8e58-77670ba39161.gif) + +The child dashboard is also available directly at `http://PARENT-NODE:19999/host/CHILD-HOSTNAME`, which in this example +is `http://203.0.113.0:19999/host/netdata-child`. + diff --git a/docs/metrics-storage-management/enable-streaming.mdx b/docs/metrics-storage-management/enable-streaming.mdx deleted file mode 100644 index 3bcf19b40..000000000 --- a/docs/metrics-storage-management/enable-streaming.mdx +++ /dev/null @@ -1,158 +0,0 @@ ---- -title: "Enable streaming between nodes" -description: >- - "With metrics streaming enabled, you can not only replicate metrics data - into a second database, but also view dashboards and trigger alarm notifications - for multiple nodes in parallel." -type: "how-to" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx" -sidebar_label: "Enable streaming between nodes" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup" ---- - -# Enable streaming between nodes - -The simplest streaming configuration is **replication**, in which a child node streams its metrics in real time to a -parent node, and both nodes retain metrics in their own databases. - -To configure replication, you need two nodes, each running Netdata. First you'll first enable streaming on your parent -node, then enable streaming on your child node. When you're finished, you'll be able to see the child node's metrics in -the parent node's dashboard, quickly switch between the two dashboards, and be able to serve [alarm -notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) from either or both nodes. - -## Enable streaming on the parent node - -First, log onto the node that will act as the parent. 
- -Run `uuidgen` to create a new API key, which is a randomly-generated machine GUID the Netdata Agent uses to identify -itself while initiating a streaming connection. Copy that into a separate text file for later use. - -> Find out how to [install `uuidgen`](https://command-not-found.com/uuidgen) on your node if you don't already have it. - -Next, open `stream.conf` using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) -from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). - -```bash -cd /etc/netdata -sudo ./edit-config stream.conf -``` - -Scroll down to the section beginning with `[API_KEY]`. Paste the API key you generated earlier between the brackets, so -that it looks like the following: - -```conf -[11111111-2222-3333-4444-555555555555] -``` - -Set `enabled` to `yes`, and `default memory mode` to `dbengine`. Leave all the other settings as their defaults. A -simplified version of the configuration, minus the commented lines, looks like the following: - -```conf -[11111111-2222-3333-4444-555555555555] - enabled = yes - default memory mode = dbengine -``` - -Save the file and close it, then restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## Enable streaming on the child node - -Connect to your child node with SSH. - -Open `stream.conf` again. Scroll down to the `[stream]` section and set `enabled` to `yes`. Paste the IP address of your -parent node at the end of the `destination` line, and paste the API key generated on the parent node onto the `api key` -line. - -Leave all the other settings as their defaults. 
A simplified version of the configuration, minus the commented lines, -looks like the following: - -```conf -[stream] - enabled = yes - destination = 203.0.113.0 - api key = 11111111-2222-3333-4444-555555555555 -``` - -Save the file and close it, then restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## Enable TLS/SSL on streaming (optional) - -While encrypting the connection between your parent and child nodes is recommended for security, it's not required to -get started. If you're not interested in encryption, skip ahead to [view streamed -metrics](#view-streamed-metrics-in-netdatas-dashboard). - -In this example, we'll use self-signed certificates. - -On the **parent** node, use OpenSSL to create the key and certificate, then use `chown` to make the new files readable -by the `netdata` user. - -```bash -sudo openssl req -newkey rsa:2048 -nodes -sha512 -x509 -days 365 -keyout /etc/netdata/ssl/key.pem -out /etc/netdata/ssl/cert.pem -sudo chown netdata:netdata /etc/netdata/ssl/cert.pem /etc/netdata/ssl/key.pem -``` - -Next, enforce TLS/SSL on the web server. Open `netdata.conf`, scroll down to the `[web]` section, and look for the `bind -to` setting. Add `^SSL=force` to turn on TLS/SSL. See the [web server -reference](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) for other TLS/SSL options. - -```conf -[web] - bind to = *=dashboard|registry|badges|management|streaming|netdata.conf^SSL=force -``` - -Next, connect to the **child** node and open `stream.conf`. Add `:SSL` to the end of the existing `destination` setting -to connect to the parent using TLS/SSL. Uncomment the `ssl skip certificate verification` line to allow the use of -self-signed certificates. 
- -```conf -[stream] - enabled = yes - destination = 203.0.113.0:SSL - ssl skip certificate verification = yes - api key = 11111111-2222-3333-4444-555555555555 -``` - -Restart both the parent and child nodes with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system, to stream encrypted metrics using TLS/SSL. - -## View streamed metrics in Netdata's dashboard - -At this point, the child node is streaming its metrics in real time to its parent. Open the local Agent dashboard for -the parent by navigating to `http://PARENT-NODE:19999` in your browser, replacing `PARENT-NODE` with its IP address or -hostname. - -This dashboard shows parent metrics. To see child metrics, open the left-hand sidebar with the hamburger icon -![Hamburger icon](https://raw.githubusercontent.com/netdata/netdata-ui/master/src/components/icon/assets/hamburger.svg) -in the top panel. Both nodes appear under the **Replicated Nodes** menu. Click on either of the links to switch between -separate parent and child dashboards. - -![Switching between parent and child -dashboards](https://user-images.githubusercontent.com/1153921/110043346-761ec000-7d04-11eb-8e58-77670ba39161.gif) - -The child dashboard is also available directly at `http://PARENT-NODE:19999/host/CHILD-HOSTNAME`, which in this example -is `http://203.0.113.0:19999/host/netdata-child`. - -## What's next? - -Now that you have a basic streaming setup with replication, you may want to tweak the configuration to eliminate the -child database, disable the child dashboard, or enable SSL on the streaming connection between the parent and child. - -See the [streaming reference -doc](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/reference-streaming.mdx#examples) for details about -other possible configurations. 
- -When using Netdata's default TSDB (`dbengine`), the parent node maintains separate, parallel databases for itself and -every child node streaming to it. Each instance is sized identically based on the `dbengine multihost disk space` -setting in `netdata.conf`. See our doc on [changing metrics retention](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) for -details. - -### Related information & further reading - -- Streaming - - [How Netdata streams metrics](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx) - - **[Enable streaming between nodes](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx)** - - [Streaming reference](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/reference-streaming.mdx) diff --git a/docs/metrics-storage-management/how-streaming-works.mdx b/docs/metrics-storage-management/how-streaming-works.mdx deleted file mode 100644 index f181d3769..000000000 --- a/docs/metrics-storage-management/how-streaming-works.mdx +++ /dev/null @@ -1,99 +0,0 @@ ---- -title: "How metrics streaming works" -description: >- - "Netdata's real-time streaming allows you to replicate metrics data - across multiple nodes, or centralize all your metrics data into a single - time-series database (TSDB)." -type: "explanation" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx" -sidebar_label: "How metrics streaming works" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Concepts" ---- - -# How metrics streaming works - -Each node running Netdata can stream the metrics it collects, in real time, to another node. Streaming allows you to -replicate metrics data across multiple nodes, or centralize all your metrics data into a single time-series database -(TSDB). 
- -When one node streams metrics to another, the node receiving metrics can visualize them on the -[dashboard](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md), run health checks to [trigger -alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md) and [send notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md), and -[export](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) all metrics to an external TSDB. When Netdata streams metrics to another -Netdata, the receiving one is able to perform everything a Netdata instance is capable of. - -Streaming lets you decide exactly how you want to store and maintain metrics data. While we believe Netdata's -[distributed architecture](https://github.com/netdata/netdata/blob/master/docs/store/distributed-data-architecture.md) is ideal for speed and scale, streaming -provides centralization options for those who want to maintain only a single TSDB instance. - -## Streaming basics - -There are three types of nodes in Netdata's streaming ecosystem. - -- **Parent**: A node, running Netdata, that receives streamed metric data. -- **Child**: A node, running Netdata, that streams metric data to one or more parent. -- **Proxy**: A node, running Netdata, that receives metric data from a child and "forwards" them on to a - separate parent node. - -Netdata uses API keys, which are just random GUIDs, to authorize the communication between child and parent nodes. We -recommend using `uuidgen` for generating API keys, which can then be used across any number of streaming connections. -Or, you can generate unique API keys for each parent-child relationship. - -Once the parent node authorizes the child's API key, the child can start streaming metrics. - -It's important to note that the streaming connection uses TCP, UDP, or Unix sockets, _not HTTP_. 
To proxy streaming -metrics, you need to use a proxy that tunnels [OSI layer 4-7 -traffic](https://en.wikipedia.org/wiki/OSI_model#Layer_4:_Transport_Layer) without interfering with it, such as -[SOCKS](https://en.wikipedia.org/wiki/SOCKS) or Nginx's [TCP/UDP load -balancing](https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/). - -## Supported streaming configurations - -Netdata supports any combination of parent, child, and proxy nodes that you can imagine. Any node can act as both a -parent, child, or proxy at the same time, sending or receiving streaming metrics from any number of other nodes. - -Here are a few example streaming configurations: - -- **Headless collector**: - - Child `A`, _without_ a database or web dashboard, streams metrics to parent `B`. - - `A` metrics are only available via the local Agent dashboard for `B`. - - `B` generates alarms for `A`. -- **Replication**: - - Child `A`, _with_ a database and web dashboard, streams metrics to parent `B`. - - `A` metrics are available on both local Agent dashboards, and can be stored with the same or different metrics - retention policies. - - Both `A` and `B` generate alarms. -- **Proxy**: - - Child `A`, _with or without_ a database, sends metrics to proxy `C`, also _with or without_ a database. `C` sends - metrics to parent `B`. - - Any node with a database can generate alarms. - -## Viewing streamed metrics - -Parent nodes feature a **Replicated Nodes** section in the left-hand panel, which opens with the hamburger icon -![Hamburger icon](https://raw.githubusercontent.com/netdata/netdata-ui/master/src/components/icon/assets/hamburger.svg) -in the top navigation. The parent node, plus any child nodes, appear here. Click on any of the hostnames to switch -between parent and child dashboards, all served by the parent's [web server](https://github.com/netdata/netdata/blob/master/web/server/README.md). 
- -![Switching between -](https://user-images.githubusercontent.com/1153921/110043346-761ec000-7d04-11eb-8e58-77670ba39161.gif) - -Each child dashboard is also available directly at the following URL pattern: -`http://PARENT-NODE:19999/host/CHILD-HOSTNAME`. - -## What's next? - -Now that you understand the fundamentals of streaming metrics between nodes, go ahead and [enable -streaming](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx) using a simple `parent-child` relationship. For all -the details, see the [streaming reference](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/reference-streaming.mdx) doc. - -Take your streaming setup even further by [exporting metrics](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) to an external TSDB. - -### Related information & further reading - -- Streaming - - **[How Netdata streams metrics](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx)** - - [Enable streaming between nodes](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx) - - [Streaming reference](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/reference-streaming.mdx) \ No newline at end of file diff --git a/docs/metrics-storage-management/reference-streaming.mdx b/docs/metrics-storage-management/reference-streaming.mdx deleted file mode 100644 index 58c898639..000000000 --- a/docs/metrics-storage-management/reference-streaming.mdx +++ /dev/null @@ -1,490 +0,0 @@ ---- -title: "Streaming reference" -description: "Each node running Netdata can stream the metrics it collects, in real time, to another node. See all of the available settings in this reference document." 
-type: "reference" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/metrics-storage-management/reference-streaming.mdx" -sidebar_label: "Streaming reference" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "References/Configuration" ---- - -# Streaming reference - -Each node running Netdata can stream the metrics it collects, in real time, to another node. To learn more, read about -[how streaming works](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx). - -For a quickstart guide for enabling a simple `parent-child` streaming relationship, see our [stream metrics between -nodes](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx) doc. All other configuration options and scenarios are -covered in the sections below. - -## Configuration - -There are two files responsible for configuring Netdata's streaming capabilities: `stream.conf` and `netdata.conf`. - -From within your Netdata config directory (typically `/etc/netdata`), [use `edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) to -open either `stream.conf` or `netdata.conf`. - -``` -sudo ./edit-config stream.conf -sudo ./edit-config netdata.conf -``` - -## Settings - -As mentioned above, both `stream.conf` and `netdata.conf` contain settings relevant to streaming. - -### `stream.conf` - -The `stream.conf` file contains three sections. The `[stream]` section is for configuring child nodes. - -The `[API_KEY]` and `[MACHINE_GUID]` sections are both for configuring parent nodes, and share the same settings. -`[API_KEY]` settings affect every child node using that key, whereas `[MACHINE_GUID]` settings affect only the child -node with a matching GUID. - -The file `/var/lib/netdata/registry/netdata.public.unique.id` contains a random GUID that **uniquely identifies each -node**. 
This file is automatically generated by Netdata the first time it is started and remains unaltered forever. - -#### `[stream]` section - -| Setting | Default | Description | -| :---------------------------------------------- | :------------------------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `enabled` | `no` | Whether this node streams metrics to any parent. Change to `yes` to enable streaming. | -| [`destination`](#destination) | ` ` | A space-separated list of parent nodes to attempt to stream to, with the first available parent receiving metrics, using the following format: `[PROTOCOL:]HOST[%INTERFACE][:PORT][:SSL]`. [Read more →](#destination) | -| `ssl skip certificate verification` | `yes` | If you want to accept self-signed or expired certificates, set to `yes` and uncomment. | -| `CApath` | `/etc/ssl/certs/` | The directory where known certificates are found. Defaults to OpenSSL's default path. | -| `CAfile` | `/etc/ssl/certs/cert.pem` | Add a parent node certificate to the list of known certificates in `CAPath`. | -| `api key` | ` ` | The `API_KEY` to use as the child node. | -| `timeout seconds` | `60` | The timeout to connect and send metrics to a parent. | -| `default port` | `19999` | The port to use if `destination` does not specify one. | -| [`send charts matching`](#send-charts-matching) | `*` | A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) to filter which charts are streamed. [Read more →](#send-charts-matching) | -| `buffer size bytes` | `10485760` | The size of the buffer to use when sending metrics. 
The default `10485760` equals a buffer of 10MB, which is good for 60 seconds of data. Increase this if you expect latencies higher than that. The buffer is flushed on reconnect. | -| `reconnect delay seconds` | `5` | How long to wait until retrying to connect to the parent node. | -| `initial clock resync iterations` | `60` | Sync the clock of charts for how many seconds when starting. | - -### `[API_KEY]` and `[MACHINE_GUID]` sections - -| Setting | Default | Description | -| :---------------------------------------------- | :------------------------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `enabled` | `no` | Whether this API KEY enabled or disabled. | -| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more →](#allow-from) | -| `default history` | `3600` | The default amount of child metrics history to retain when using the `save`, `map`, or `ram` memory modes. | -| [`default memory mode`](#default-memory-mode) | `ram` | The [database](https://github.com/netdata/netdata/blob/master/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `map`, `save`, `ram`, or `none`. [Read more →](#default-memory-mode) | -| `health enabled by default` | `auto` | Whether alarms and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alarms when the child is connected. `yes` enables alarms always, and `no` disables alarms. 
| -| `default postpone alarms on connect seconds` | `60` | Postpone alarms and notifications for a period of time after the child connects. | -| `default proxy enabled` | ` ` | Route metrics through a proxy. | -| `default proxy destination` | ` ` | Space-separated list of `IP:PORT` for proxies. | -| `default proxy api key` | ` ` | The `API_KEY` of the proxy. | -| `default send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). | - -#### `destination` - -A space-separated list of parent nodes to attempt to stream to, with the first available parent receiving metrics, using -the following format: `[PROTOCOL:]HOST[%INTERFACE][:PORT][:SSL]`. - -- `PROTOCOL`: `tcp`, `udp`, or `unix`. Parent nodes support only `tcp` and `unix`. -- `HOST`: An IPv4 or IPv6 address, a hostname, or a Unix domain socket path. IPv6 addresses must be enclosed in brackets: - `[ip:address]`. -- `INTERFACE` (IPv6 only): The network interface to use. -- `PORT`: The port number or service name (`/etc/services`) to use. -- `SSL`: Enables TLS/SSL encryption of the streaming connection. - -To enable TCP streaming to a parent node at `203.0.113.0` on port `20000` and with TLS/SSL encryption: - -```conf -[stream] - destination = tcp:203.0.113.0:20000:SSL -``` - -#### `send charts matching` - -A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) to filter which charts are streamed. - -The default is a single wildcard `*`, which streams all charts. - -To send only a few charts, list them explicitly, or list a group using a wildcard. To send _only_ the `apps.cpu` chart -and charts with contexts beginning with `system.`: - -```conf -[stream] - send charts matching = apps.cpu system.* -``` - -To send all but a few charts, use `!` to create a negative match. 
To send _all_ charts _but_ `apps.cpu`: - -```conf -[stream] - send charts matching = !apps.cpu * -``` - -#### `allow from` - -A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) matching the IPs of nodes that -will stream metrics using this API key. The order is important, left to right, as the first positive or negative match is used. - -The default is `*`, which accepts all requests including the `API_KEY`. - -To allow from only a specific IP address: - -```conf -[API_KEY] - allow from = 203.0.113.10 -``` - -To allow all IPs starting with `10.*`, except `10.1.2.3`: - -```conf -[API_KEY] - allow from = !10.1.2.3 10.* -``` - -> If you set specific IP addresses here, and also use the `allow connections` setting in the `[web]` section of -> `netdata.conf`, be sure to add the IP address there so that it can access the API port. - -#### `default memory mode` - -The [database](https://github.com/netdata/netdata/blob/master/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, -`save`, `map`, or `none`. - -- `dbengine`: The default, recommended time-series database (TSDB) for Netdata. Stores recent metrics in memory, then - efficiently spills them to disk for long-term storage. -- `ram`: Stores metrics _only_ in memory, which means metrics are lost when Netdata stops or restarts. Ideal for - streaming configurations that use ephemeral nodes. -- `save`: Stores metrics in memory, but saves metrics to disk when Netdata stops or restarts, and loads historical - metrics on start. -- `map`: Stores metrics in memory-mapped files, like swap, with constant disk write. -- `none`: No database. - -When using `default memory mode = dbengine`, the parent node creates a separate instance of the TSDB to store metrics -from child nodes. 
The [size of _each_ instance is configurable](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) with the `page -cache size` and `dbengine multihost disk space` settings in the `[global]` section in `netdata.conf`. - -### `netdata.conf` - -| Setting | Default | Description | -| :----------------------------------------- | :---------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **`[global]` section** | | | -| `memory mode` | `dbengine` | Determines the [database type](https://github.com/netdata/netdata/blob/master/database/README.md) to be used on this node. Other options include `none`, `ram`, `save`, and `map`. `none` disables the database on this node. This also disables alarms and notifications, as those can't run without a database. | -| **`[web]` section** | | | -| `mode` | `static-threaded` | Determines the [web server](https://github.com/netdata/netdata/blob/master/web/server/README.md) type. The other option is `none`, which disables the dashboard, API, and registry. | -| `accept a streaming request every seconds` | `0` | Set a limit on how often a parent node accepts streaming requests from child nodes. `0` equals no limit. If this is set, you may see `... too busy to accept new streaming request. Will be allowed in X secs` in Netdata's `error.log`. | - -## Examples - -### Per-child settings - -While the `[API_KEY]` section applies settings for any child node using that key, you can also use per-child settings -with the `[MACHINE_GUID]` section. - -In the following example, metrics streamed from the child node with the matching `MACHINE_GUID` use the `save` memory mode instead of the -`dbengine` default set for the `API_KEY`, and alarms are disabled for that child. 
- -```conf -[API_KEY] - enabled = yes - default memory mode = dbengine - health enabled by default = auto - allow from = * - -[MACHINE_GUID] - enabled = yes - memory mode = save - health enabled = no -``` - -### Securing streaming with TLS/SSL - -Netdata does not activate TLS encryption by default. To encrypt streaming connections, you first need to [enable TLS -support](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) on the parent. With encryption enabled on the receiving side, you -need to instruct the child to use TLS/SSL as well. On the child's `stream.conf`, configure the destination as follows: - -``` -[stream] - destination = host:port:SSL -``` - -The word `SSL` appended to the end of the destination tells the child that connections must be encrypted. - -> While Netdata uses Transport Layer Security (TLS) 1.2 to encrypt communications rather than the obsolete SSL protocol, -> it's still common practice to refer to encrypted web connections as `SSL`. Many vendors, like Nginx and even Netdata -> itself, use `SSL` in configuration files, whereas documentation will always refer to encrypted communications as `TLS` -> or `TLS/SSL`. - -#### Certificate verification - -When TLS/SSL is enabled on the child, the default behavior will be to not connect with the parent unless the server's -certificate can be verified via the default chain. In case you want to avoid this check, add the following to the -child's `stream.conf` file: - -``` -[stream] - ssl skip certificate verification = yes -``` - -#### Trusted certificate - -If you've enabled [certificate verification](#certificate-verification), you might see errors from the OpenSSL library -when there's a problem with checking the certificate chain (`X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY`). More -importantly, OpenSSL will reject self-signed certificates. - -Given these known issues, you have two options. 
If you trust your certificate, you can set the options `CApath` and -`CAfile` to tell Netdata where your certificates, and the trusted certificate file, are stored. - -For more details about these options, you can read about [verify -locations](https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_load_verify_locations.html). - -Before you change your streaming configuration, you need to copy your trusted certificate to your child system and add -the certificate to OpenSSL's list. - -On most Linux distributions, the `update-ca-certificates` command searches inside the `/usr/share/ca-certificates` -directory for certificates. You should double-check by reading the `update-ca-certificates` manual (`man -update-ca-certificates`), and then change the directory in the commands below if needed. - -If you have `sudo` configured on your child system, you can use that to run the following commands. If not, you'll have -to log in as `root` to complete them. - -``` -# mkdir /usr/share/ca-certificates/netdata -# cp parent_cert.pem /usr/share/ca-certificates/netdata/parent_cert.crt -# chown -R netdata:netdata /usr/share/ca-certificates/netdata/ -``` - -First, you create a new directory to store your certificates for Netdata. Next, you need to change the extension on your -certificate from `.pem` to `.crt` so it's compatible with `update-ca-certificates`. Finally, you need to change -ownership so the user that runs Netdata can access the directory where you copied in your certificate. - -Next, edit the file `/etc/ca-certificates.conf` and add the following line: - -``` -netdata/parent_cert.crt -``` - -Now update the list of certificates by running the following, again either with `sudo` or as `root`: - -``` -# update-ca-certificates -``` - -> Some Linux distributions have different methods of updating the certificate list. For more details, please read this -> guide on [adding trusted root certificates](https://github.com/Busindre/How-to-Add-trusted-root-certificates). 
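
Before relying on a certificate for streaming, it can help to confirm that a TLS handshake with the parent actually passes verification against it. The following is a minimal Python sketch, not part of Netdata: the function names are hypothetical, and it assumes the parent accepts TLS connections on the port you pass in. It only checks the handshake; it does not start a streaming session.

```python
import socket
import ssl


def make_verifying_context(cafile=None, capath=None):
    """Build a client-side TLS context that requires a verifiable server certificate."""
    # create_default_context() turns on certificate and hostname verification.
    # When cafile or capath is given, only those locations are trusted;
    # when both are None, the system's default CA store is used.
    return ssl.create_default_context(cafile=cafile, capath=capath)


def parent_certificate_verifies(host, port, cafile=None, capath=None, timeout=5.0):
    """Return True if a TLS handshake with host:port passes certificate verification."""
    context = make_verifying_context(cafile=cafile, capath=capath)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake happens inside wrap_socket(); it raises on failure.
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLError as err:  # covers certificate and hostname mismatches
        print(f"TLS verification failed: {err}")
    except OSError as err:  # refused connections, timeouts, unreachable hosts
        print(f"could not connect: {err}")
    return False
```

For example, `parent_certificate_verifies("parent.example.com", 19999, cafile="parent_cert.pem")` (both the hostname and the file path are placeholders) returns `False` and prints the reason if the parent's certificate is not signed by, or identical to, the certificate in `parent_cert.pem`, or if the certificate does not match the hostname.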
- -Once you update your certificate list, you can set the stream parameters for Netdata to trust the parent certificate. -Open `stream.conf` for editing and change the following lines: - -``` -[stream] - CApath = /etc/ssl/certs/ - CAfile = /etc/ssl/certs/parent_cert.pem -``` - -With this configuration, the `CApath` option tells Netdata to search for trusted certificates inside `/etc/ssl/certs`. -The `CAfile` option specifies that the Netdata parent certificate is located at `/etc/ssl/certs/parent_cert.pem`. With this -configuration, you can skip using the system's entire list of certificates and use Netdata's parent certificate instead. - -#### Expected behaviors - -With the introduction of TLS/SSL, the parent-child communication behaves as shown in the table below, depending on the -following configurations: - -- **Parent TLS (Yes/No)**: Whether the `[web]` section in `netdata.conf` has `ssl key` and `ssl certificate`. -- **Parent port TLS (-/force/optional)**: Depends on whether the `[web]` section `bind to` contains a `^SSL=force` or - `^SSL=optional` directive on the port(s) used for streaming. -- **Child TLS (Yes/No)**: Whether the destination in the child's `stream.conf` has `:SSL` at the end. -- **Child skips verification (yes/no)**: Value of the child's `stream.conf` `ssl skip certificate verification` - parameter (default is no). - -| Parent TLS | Parent port TLS | Child TLS | Child skips verification | Behavior | -| :----------------- | :--------------- | :-------- | :------------- | :--------------------------------------------------------------------------------------------------------------------------------------- | -| No | - | No | no | Legacy behavior. The parent-child stream is unencrypted. | -| Yes | force | No | no | The parent rejects the child connection. 
| -| Yes | -/optional | No | no | The parent-child stream is unencrypted (expected situation for legacy child nodes and newer parent nodes) | -| Yes | -/force/optional | Yes | no | The parent-child stream is encrypted, provided that the parent has a valid TLS/SSL certificate. Otherwise, the child refuses to connect. | -| Yes | -/force/optional | Yes | yes | The parent-child stream is encrypted. | - -### Proxy - -A proxy is a node that receives metrics from a child, then streams them onward to a parent. To configure a proxy, -configure it as a receiving and a sending Netdata at the same time. - -Netdata proxies may or may not maintain a database for the metrics passing through them. When they maintain a database, -they can also run health checks (alarms and notifications) for the remote host that is streaming the metrics. - -In the following example, the proxy receives metrics from a child node using the `API_KEY` of -`66666666-7777-8888-9999-000000000000`, then stores metrics using `dbengine`. It then uses the `API_KEY` of -`11111111-2222-3333-4444-555555555555` to proxy those same metrics on to a parent node at `203.0.113.0`. - -```conf -[stream] - enabled = yes - destination = 203.0.113.0 - api key = 11111111-2222-3333-4444-555555555555 - -[66666666-7777-8888-9999-000000000000] - enabled = yes - default memory mode = dbengine -``` - -### Ephemeral nodes - -Netdata can help you monitor ephemeral nodes, such as containers in an auto-scaling infrastructure, by always streaming -metrics to any number of permanently-running parent nodes. 
- -On the parent, set the following in `stream.conf`: - -```conf -[11111111-2222-3333-4444-555555555555] - # enable/disable this API key - enabled = yes - - # one hour of data for each of the child nodes - default history = 3600 - - # do not save child metrics on disk - default memory mode = ram - - # alarm checks, only while the child is connected - health enabled by default = auto -``` - -On the child nodes, set the following in `stream.conf`: - -```conf -[stream] - # stream metrics to another Netdata - enabled = yes - - # the IP and PORT of the parent - destination = 10.11.12.13:19999 - - # the API key to use - api key = 11111111-2222-3333-4444-555555555555 -``` - -In addition, edit `netdata.conf` on each child node to disable the database and alarms. - -```conf -[global] - # disable the local database - memory mode = none - -[health] - # disable health checks - enabled = no -``` - -## Troubleshooting - -Both parent and child nodes log information to `/var/log/netdata/error.log`. - -If the child connects to the parent successfully, the parent logs something like this: - -``` -2017-03-09 09:38:52: netdata: INFO : STREAM [receive from [10.11.12.86]:38564]: new client connection. -2017-03-09 09:38:52: netdata: INFO : STREAM xxx [10.11.12.86]:38564: receive thread created (task id 27721) -2017-03-09 09:38:52: netdata: INFO : STREAM xxx [receive from [10.11.12.86]:38564]: client willing to stream metrics for host 'xxx' with machine_guid '1234567-1976-11e6-ae19-7cdd9077342a': update every = 1, history = 3600, memory mode = ram, health auto -2017-03-09 09:38:52: netdata: INFO : STREAM xxx [receive from [10.11.12.86]:38564]: initializing communication... -2017-03-09 09:38:52: netdata: INFO : STREAM xxx [receive from [10.11.12.86]:38564]: receiving metrics... -``` - -and something like this on the child: - -``` -2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: connecting... 
-2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: initializing communication... -2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: waiting response from remote netdata... -2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: established communication - sending metrics... -``` - -The following sections describe the most common issues you might encounter when connecting parent and child nodes. - -### Slow connections between parent and child - -When you have a slow connection between parent and child, Netdata raises a few different errors. Most of the -errors will appear in the child's `error.log`. - -```bash -netdata ERROR : STREAM_SENDER[CHILD HOSTNAME] : STREAM CHILD HOSTNAME [send to PARENT IP:PARENT PORT]: too many data pending - buffer is X bytes long, -Y unsent - we have sent Z bytes in total, W on this connection. Closing connection to flush the data. -``` - -On the parent side, you may see various error messages, most commonly the following: - -``` -netdata ERROR : STREAM_PARENT[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : read failed: end of file -``` - -Another common problem in slow connections is the child sending a partial message to the parent. In this case, the -parent will write the following to its `error.log`: - -``` -ERROR : STREAM_RECEIVER[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : sent command 'B' which is not known by netdata, for host 'HOSTNAME'. Disabling it. -``` - -In this example, `B` was part of a `BEGIN` message that was cut due to connection problems. - -Slow connections can also cause problems when the parent misses a message and then receives a command related to the -missed message. For example, a parent might miss a message containing the child's charts, and then doesn't know -what to do with the `SET` message that follows. 
When that happens, the parent will show a message like this: - -``` -ERROR : STREAM_RECEIVER[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : requested a SET on chart 'CHART NAME' of host 'HOSTNAME', without a dimension. Disabling it. -``` - -### Child cannot connect to parent - -When the child can't connect to a parent for any reason (misconfiguration, networking, firewalls, parent -down), you will see the following in the child's `error.log`: - -``` -ERROR : STREAM_SENDER[HOSTNAME] : Failed to connect to 'PARENT IP', port 'PARENT PORT' (errno 113, No route to host) -``` - -### 'Is this a Netdata?' - -This question appears when Netdata starts the stream and receives an unexpected response. It can happen when -the parent is using SSL and the child tries to connect using plain text. You will also see this message when -Netdata connects to another server that isn't Netdata. The complete error message will look like this: - -``` -ERROR : STREAM_SENDER[CHILD HOSTNAME] : STREAM child HOSTNAME [send to PARENT HOSTNAME:PARENT PORT]: server is not replying properly (is it a netdata?). -``` - -### Stream charts wrong - -Chart data needs to be consistent between child and parent nodes. If there are differences between chart data on -a parent and a child, such as gaps in metrics collection, it most often means your child's `memory mode` -does not match the parent's. To learn more about the different ways Netdata can store metrics, and thus keep chart -data consistent, read our [memory mode documentation](https://github.com/netdata/netdata/blob/master/database/README.md). - -### Forbidding access - -You may see errors about "forbidding access" for a number of reasons. It could be because of a slow connection between -the parent and child nodes, but it could also be due to other failures. Look in your parent's `error.log` for errors -that look like this: - -``` -STREAM [receive from [child HOSTNAME]:child IP]: `MESSAGE`. Forbidding access.
-``` - -`MESSAGE` will have one of the following patterns: - -- `request without KEY` : The request received is incomplete. `KEY` names the missing value: API key, hostname, or machine GUID. -- `API key 'VALUE' is not valid GUID`: The UUID received from the child does not have the format defined in [RFC - 4122](https://tools.ietf.org/html/rfc4122). -- `machine GUID 'VALUE' is not GUID.`: Like the previous error, but for the machine GUID. -- `API key 'VALUE' is not allowed`: The stream is using the wrong API key. -- `API key 'VALUE' is not permitted from this IP`: The IP is not allowed to stream to this parent with this API key. -- `machine GUID 'VALUE' is not allowed.`: The machine GUID that is trying to stream is not allowed. -- `Machine GUID 'VALUE' is not permitted from this IP. `: The IP does not match the pattern or IP allowed to - stream with this machine GUID. - -### Netdata could not create a stream - -The connection between parent and child is a stream. When the parent can't convert the initial connection into -a stream, it will write the following message inside `error.log`: - -``` -file descriptor given is not a valid stream -``` - -After logging this error, Netdata will close the stream. -- cgit v1.2.3