From a2d7dede737947d7c6afa20a88e1f0c64e0eb96c Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Thu, 10 Aug 2023 11:18:52 +0200
Subject: Merging upstream version 1.42.0.

Signed-off-by: Daniel Baumann
---
 docs/anonymous-statistics.md | 2 +-
 .../deployment-strategies.md | 208 ++++++++++++++++++++-
 .../add-webhook-notification-configuration.md | 4 +-
 docs/cloud/netdata-functions.md | 6 +-
 docs/configure/common-changes.md | 2 +-
 .../troubleshooting-agent-with-cloud-connection.md | 14 +-
 6 files changed, 220 insertions(+), 16 deletions(-)
 (limited to 'docs')

diff --git a/docs/anonymous-statistics.md b/docs/anonymous-statistics.md
index d8cc99689..f84989e16 100644
--- a/docs/anonymous-statistics.md
+++ b/docs/anonymous-statistics.md
@@ -23,7 +23,7 @@ We use the statistics gathered from this information for two purposes:
 
 Netdata collects usage information via two different channels:
 
-- **Agent dashboard**: We use the [PostHog JavaScript integration](https://posthog.com/docs/integrations/js-integration) (with sensitive event attributes overwritten to be anonymized) to send product usage events when you access an [Agent's dashboard](https://github.com/netdata/netdata/blob/master/web/gui/README.md).
+- **Agent dashboard**: We use the [PostHog JavaScript integration](https://posthog.com/docs/integrations/js-integration) (with sensitive event attributes overwritten to be anonymized) to send product usage events when you access an [Agent's dashboard](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/accessing-netdata-dashboards.md).
 - **Agent backend**: The `netdata` daemon executes the [`anonymous-statistics.sh`](https://github.com/netdata/netdata/blob/6469cf92724644f5facf343e4bdd76ac0551a418/daemon/anonymous-statistics.sh.in) script when Netdata starts, stops cleanly, or fails.
 
 You can opt-out from sending anonymous statistics to Netdata through three different [opt-out mechanisms](#opt-out).
diff --git a/docs/category-overview-pages/deployment-strategies.md b/docs/category-overview-pages/deployment-strategies.md
index a1d393f26..f8a68b46f 100644
--- a/docs/category-overview-pages/deployment-strategies.md
+++ b/docs/category-overview-pages/deployment-strategies.md
@@ -21,7 +21,7 @@ There are 3 components to structure your Netdata ecosystem:
 
 3. **Netdata Cloud**
 
-   Our SaaS, combining all your infrastructure, all your Netdata Agents and Parents, into one uniform, distributed, infinitely
+   Our SaaS, combining all your infrastructure, all your Netdata Agents and Parents, into one uniform, distributed,
    scalable, monitoring database, offering advanced data slicing and dicing capabilities, custom dashboards, advanced troubleshooting tools, user management, centralized management of alerts, and more.
 
@@ -30,9 +30,211 @@ The Netdata Agent is a highly modular software piece, providing data collection
 database, a query engine, health monitoring and alerts, machine learning and anomaly detection, metrics exporting to third party systems.
 
-To help our users have a complete experience of Netdata when they install it for the first time, a Netdata Agent with default configuration
+## Deployment Options Overview
+
+This section provides a quick overview of a few common deployment options. The next sections go into configuration examples and further reading.
+
+### Stand-alone Deployment
+
+To help our users have a complete experience of Netdata when they install it for the first time, a Netdata Agent with default configuration
 is a complete monitoring solution out of the box, having all these features enabled and available.
 
+The Agent will act as a _stand-alone_ Agent by default, and this is great to start out with for small setups and home labs. By [connecting each Agent to Cloud](https://github.com/netdata/netdata/blob/master/claim/README.md), you can see an overview of all your nodes, with aggregated charts and centralized alerting, without setting up a Parent.
+
+![image](https://github.com/netdata/netdata/assets/116741/6a638175-aec4-4d46-85a6-520c283ab6a8)
+
+### Parent – Child Deployment
+
+An Agent connected to a Parent is called a _Child_. It will _stream_ metrics to its Parent. The Parent can then take care of storing metrics on behalf of that node (with longer retention), handle metrics queries for showing dashboards, and provide alerting.
+
+When using Cloud, it is recommended that just the Parent is connected to Cloud. Child Agents can then be configured to have short retention, in RAM instead of on Disk, and have alerting and other features disabled. Because they don't need to connect to Cloud themselves, those children can then be further secured by not allowing outbound traffic.
+
+![image](https://github.com/netdata/netdata/assets/116741/cb65698d-a6b7-43ee-a2d1-c30d0a46f084)
+
+This setup allows for leaner Child nodes and is good for setups with more than a handful of nodes. Metrics data remains accessible if the Child node is temporarily unavailable or decommissioned, although there is no failover in case the Parent becomes unavailable.
+
+
+### Active–Active Parent Deployment
+
+For high availability, Parents can be configured to stream data for their children between them, and keep the data sets in sync. Child Agents are configured with the addresses of both Parent Agents, but will only stream to one of them at a time. When that Parent becomes unavailable, it reconnects to another. When the first Parent becomes available again, that Parent will catch up by receiving the backlog from the second.
+
+With both Parent Agents connected to Cloud, Cloud will route queries to either Parent transparently, depending on their availability. Alerts trigger on either Parent will stream to Cloud, and Cloud will deduplicate and debounce state changes to prevent spurious notifications.
+
+![image](https://github.com/netdata/netdata/assets/116741/6ae2b10c-7f7d-4503-aac4-0a9381c6f80b)
+
+
+## Configuration Details
+
+### Stand-alone Deployment
+
+The stand-alone setup is configured out of the box with reasonable defaults, but please consult our [configuration documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/cheatsheet.md) for details, including the overview of [common configuration changes](https://github.com/netdata/netdata/blob/master/docs/configure/common-changes.md).
+
+### Parent – Child Deployment
+
+For setups involving Child and Parent Agents, the Agents need to be configured for [_streaming_](https://github.com/netdata/netdata/blob/master/streaming/README.md), through the configuration file `stream.conf`. This will instruct the Child to stream data to the Parent and the Parent to accept streaming connections for one or more Child Agents. To secure this connection, both need set up a shared API key (to replace the string `API_KEY` in the examples below). Additionally, the Child is configured with one or more addresses of Parent Agents (`PARENT_IP_ADDRESS`).
+
+An API key is a key created with `uuidgen` and is used for authentication and/or customization in the Parent side. I.e. a Child will stream using the API key, and a Parent is configured to accept connections from Child, but can also apply different options for children by using multiple different API keys. The easiest setup uses just one API key for all Child Agents.
+
+#### Child config
+
+As mentioned above, the recommendation is to not claim the Child to Cloud directly during your setup, avoiding establishing an [ACLK](https://github.com/netdata/netdata/blob/master/aclk/README.md) connection.
+
+To reduce the footprint of the Netdata Agent on your production system, some capabilities can be switched OFF on the Child and kept ON on the Parent. In this example, Machine Learning and Alerting are disabled in the Child, so that the Parent can take the load. We also use RAM instead of disk to store metrics with limited retention, covering temporary network issues.
+
+##### netdata.conf
+
+On the child node, edit `netdata.conf` by using the edit-config script: `/etc/netdata/edit-config netdata.conf` set the following parameters:
+
+```yaml
+[db]
+    # https://learn.netdata.cloud/docs/agent/database
+    # none = no retention, ram = some retention in ram
+    mode = ram
+    # The retention in seconds.
+    # This provides some tolerance to the time the child has to find a parent in
+    # order to transfer the data. For IoT this can be lowered to 120.
+    retention = 1200
+    # The granularity of metrics, in seconds.
+    # You may increase this to lower CPU resources.
+    update every = 1
+[ml]
+    # Disable Machine Learning
+    enabled = no
+[health]
+    # Disable Health Checks (Alerting)
+    enabled = no
+[web]
+    # Disable remote access to the local dashboard
+    bind to = lo
+[plugins]
+    # Uncomment the following line to disable all external plugins on extreme
+    # IoT cases by default.
+    # enable running new plugins = no
+```
+
+##### stream.conf
+
+To edit `stream.conf`, again use the edit-config script: `/etc/netdata/edit-config stream.conf`.
+
+Set the following parameters:
+
+```yaml
+[stream]
+    # Stream metrics to another Netdata
+    enabled = yes
+    # The IP and PORT of the parent
+    destination = PARENT_IP_ADDRESS:19999
+    # The shared API key, generated by uuidgen
+    api key = API_KEY
+```
+
+#### Parent config
+
+For the Parent, besides setting up streaming, the example will also provide an example configuration of multiple [tiers](https://github.com/netdata/netdata/blob/master/database/engine/README.md#tiering) of metrics [storage](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md), for 10 children, with about 2k metrics each.
+
+- 1s granularity at tier 0 for 1 week
+- 1m granularity at tier 1 for 1 month
+- 1h granularity at tier 2 for 1 year
+
+Requiring:
+
+- 25GB of disk
+- 3.5GB of RAM (2.5GB under pressure)
+
+##### netdata.conf
+
+On the Parent, edit `netdata.conf` with `/etc/netdata/edit-config netdata.conf` and set the following parameters:
+
+```yaml
+[db]
+    mode = dbengine
+    storage tiers = 3
+    # To allow memory pressure to offload index from ram
+    dbengine page descriptors in file mapped memory = yes
+    # storage tier 0
+    update every = 1
+    dbengine multihost disk space MB = 12000
+    dbengine page cache size MB = 1400
+    # storage tier 1
+    dbengine tier 1 page cache size MB = 512
+    dbengine tier 1 multihost disk space MB = 4096
+    dbengine tier 1 update every iterations = 60
+    dbengine tier 1 backfill = new
+    # storage tier 2
+    dbengine tier 2 page cache size MB = 128
+    dbengine tier 2 multihost disk space MB = 2048
+    dbengine tier 2 update every iterations = 60
+    dbengine tier 2 backfill = new
+[ml]
+    # Enabled by default
+    # enabled = yes
+[health]
+    # Enabled by default
+    # enabled = yes
+[web]
+    # Enabled by default
+    # bind to = *
+```
+
+##### stream.conf
+
+On the Parent node, edit `stream.conf` with `/etc/netdata/edit-config stream.conf`, and then set the following parameters:
+
+```yaml
+[API_KEY]
+    # Accept metrics streaming from other Agents with the specified API key
+    enabled = yes
+```
+
+### Active–Active Parent Deployment
+
+In order to setup active–active streaming between Parent 1 and Parent 2, Parent 1 needs to be instructed to stream data to Parent 2 and Parent 2 to stream data to Parent 1. The Child Agents need to be configured with the addresses of both Parent Agents. The Agent will only connect to one Parent at a time, falling back to the next if the previous failed. These examples use the same API key between Parent Agents as for connections from Child Agents.
+
+On both Netdata Parent and all Child Agents, edit `stream.conf` with `/etc/netdata/edit-config stream.conf`:
+
+##### stream.conf on Parent 1
+
+```yaml
+[stream]
+    # Stream metrics to another Netdata
+    enabled = yes
+    # The IP and PORT of Parent 2
+    destination = PARENT_2_IP_ADDRESS:19999
+    # This is the API key for the outgoing connection to Parent 2
+    api key = API_KEY
+[API_KEY]
+    # Accept metrics streams from Parent 2 and Child Agents
+    enabled = yes
+```
+
+##### stream.conf on Parent 2
+
+```yaml
+[stream]
+    # Stream metrics to another Netdata
+    enabled = yes
+    # The IP and PORT of Parent 1
+    destination = PARENT_1_IP_ADDRESS:19999
+    api key = API_KEY
+[API_KEY]
+    # Accept metrics streams from Parent 1 and Child Agents
+    enabled = yes
+```
+
+##### stream.conf on Child Agents
+
+```yaml
+[stream]
+    # Stream metrics to another Netdata
+    enabled = yes
+    # The IP and PORT of the parent
+    destination = PARENT_1_IP_ADDRESS:19999 PARENT_2_IP_ADDRESS:19999
+    # The shared API key, generated by uuidgen
+    api key = API_KEY
+```
+
+## Further Reading
+
 We strongly recommend the following configuration changes for production deployments:
 
 1. Understand Netdata's [security and privacy design](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md) and
@@ -47,7 +249,7 @@ We strongly recommend the following configuration changes for production deploym
    - Increase data retention.
    - Make your data highly available.
 
-3. [Optimize the Netdata Agents system utilization and performance](https://github.com/netdata/netdata/edit/master/docs/guides/configure/performance.md)
+3. [Optimize the Netdata Agents system utilization and performance](https://github.com/netdata/netdata/blob/master/docs/guides/configure/performance.md)
 
    To save valuable system resources, especially when running on weak IoT devices.
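
The `stream.conf` fragments added by the deployment-strategies hunk above differ only in the destination list and the API key, so they can be rendered from a small helper. The following is a minimal sketch, not part of the patch: the function names (`new_api_key`, `child_stream_conf`, `parent_stream_conf`) are hypothetical, the addresses in the usage note are placeholders, and `uuid.uuid4()` is assumed to be an acceptable stand-in for the `uuidgen` command the documentation mentions.

```python
import uuid
from typing import Optional, Sequence


def new_api_key() -> str:
    """Return a random UUID string (assumed equivalent to running `uuidgen`)."""
    return str(uuid.uuid4())


def child_stream_conf(parents: Sequence[str], api_key: str, port: int = 19999) -> str:
    """Render a [stream] section that lists every Parent as a destination."""
    if not parents:
        raise ValueError("at least one Parent address is required")
    # Multiple destinations are space-separated, as in the Child example above.
    destination = " ".join(f"{host}:{port}" for host in parents)
    return (
        "[stream]\n"
        "    enabled = yes\n"
        f"    destination = {destination}\n"
        f"    api key = {api_key}\n"
    )


def parent_stream_conf(api_key: str, peer: Optional[str] = None, port: int = 19999) -> str:
    """Render a Parent's stream.conf; `peer` is the other Parent of an active-active pair."""
    sections = []
    if peer is not None:
        # Outgoing stream to the peer Parent, using the same shared key.
        sections.append(child_stream_conf([peer], api_key, port))
    # Section named after the key: accept incoming streams that present it.
    sections.append(f"[{api_key}]\n    enabled = yes\n")
    return "".join(sections)
```

For example, `child_stream_conf(["10.0.0.1", "10.0.0.2"], new_api_key())` produces the Child fragment with both Parents listed, and `parent_stream_conf(key, peer="10.0.0.2")` produces the Parent 1 fragment for an active-active pair.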
diff --git a/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md b/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md
index 21d1b6ed8..012b0478f 100644
--- a/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md
+++ b/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md
@@ -140,7 +140,7 @@ server {
     ssl_client_certificate /path/to/Netdata_CA.pem;
 
     location / {
-        if ($ssl_client_s_dn !~ "CN=api.netdata.cloud") {
+        if ($ssl_client_s_dn !~ "CN=app.netdata.cloud") {
             return 403;
         }
         # ... existing location configuration ...
@@ -158,7 +158,7 @@ Listen 443
     SSLCACertificateFile "/path/to/Netdata_CA.pem"
-    Require expr "%{SSL_CLIENT_S_DN_CN} == 'api.netdata.cloud'"
+    Require expr "%{SSL_CLIENT_S_DN_CN} == 'app.netdata.cloud'"
     # ... existing directory configuration ...
 ```
diff --git a/docs/cloud/netdata-functions.md b/docs/cloud/netdata-functions.md
index 8e9415eb3..949c8b4cc 100644
--- a/docs/cloud/netdata-functions.md
+++ b/docs/cloud/netdata-functions.md
@@ -36,8 +36,10 @@ functions - [plugins.d](https://github.com/netdata/netdata/blob/master/collector
 | ebpf_thread | Controller for eBPF threads. | [ebpf.plugin](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md) |
 
 If you have ideas or requests for other functions:
-* open a [Feature request](https://github.com/netdata/netdata-cloud/issues/new?assignees=&labels=feature+request%2Cneeds+triage&template=FEAT_REQUEST.yml&title=%5BFeat%5D%3A+) on Netdata Cloud repo
-* engage with our community on the [Netdata Discord server](https://discord.com/invite/mPZ6WZKKG2).
+* Participate in the relevant [GitHub discussion](https://github.com/netdata/netdata/discussions/14412)
+* Open a [feature request](https://github.com/netdata/netdata-cloud/issues/new?assignees=&labels=feature+request%2Cneeds+triage&template=FEAT_REQUEST.yml&title=%5BFeat%5D%3A+) on Netdata Cloud repo
+* Join the Netdata community on [Discord](https://discord.com/invite/mPZ6WZKKG2) and let us know.
+
 #### How do functions work with streaming?
 
 Via streaming, the definitions of functions are transmitted to a parent node so it knows all the functions available on
diff --git a/docs/configure/common-changes.md b/docs/configure/common-changes.md
index f171e49e2..61e5d4c8d 100644
--- a/docs/configure/common-changes.md
+++ b/docs/configure/common-changes.md
@@ -20,7 +20,7 @@ directory.
 
 ## Change dashboards and visualizations
 
-The Netdata Agent's [local dashboard](https://github.com/netdata/netdata/blob/master/web/gui/README.md), accessible
+The Netdata Agent's [local dashboard](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/accessing-netdata-dashboards.md), accessible
 at `http://NODE:19999` is highly configurable. If you use Netdata Cloud for
 [infrastructure monitoring](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md), you
diff --git a/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md b/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md
index ad747cb76..9c69ee915 100644
--- a/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md
+++ b/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md
@@ -19,17 +19,17 @@ If the claiming process fails, the node will not appear at all in Netdata Cloud.
 First ensure that you:
 
 - Use the newest possible stable or nightly version of the agent (at least v1.32).
-- Your node can successfully issue an HTTPS request to https://api.netdata.cloud
+- Your node can successfully issue an HTTPS request to https://app.netdata.cloud
 
 Other possible causes differ between kickstart installations and Docker installations.
 
 ### Verify your node can access Netdata Cloud
 
-If you run either `curl` or `wget` to do an HTTPS request to https://api.netdata.cloud, you should get
+If you run either `curl` or `wget` to do an HTTPS request to https://app.netdata.cloud, you should get
 back a 404 response. If you do not, check your network connectivity, domain resolution, and firewall settings for outbound connections.
 
-If your firewall is configured to completely prevent outbound connections, you need to whitelist `api.netdata.cloud` and `mqtt.netdata.cloud`. If you can't whitelist domains in your firewall, you can whitelist the IPs that the hostnames resolve to, but keep in mind that they can change without any notice.
+If your firewall is configured to completely prevent outbound connections, you need to whitelist `app.netdata.cloud` and `mqtt.netdata.cloud`. If you can't whitelist domains in your firewall, you can whitelist the IPs that the hostnames resolve to, but keep in mind that they can change without any notice.
 
 If you use an outbound proxy, you need to [take some extra steps](
 https://github.com/netdata/netdata/blob/master/claim/README.md#connect-through-a-proxy).
@@ -100,7 +100,7 @@ process, but not have enough time to establish the ACLK connection.
 
 ### Verify that your firewall allows websockets
 
-The agent initiates an SSL connection to `api.netdata.cloud` and then upgrades that connection to use secure
+The agent initiates an SSL connection to `app.netdata.cloud` and then upgrades that connection to use secure
 websockets. Some firewalls completely prevent the use of websockets, even for outbound connections.
 
 ## Previously connected agent that can no longer connect
@@ -110,7 +110,7 @@ that it is currently not connected.
 
 ### Verify that network connectivity is still possible
 
-Verify that you can still issue HTTPS requests to api.netdata.cloud and that no firewall or proxy changes were made.
+Verify that you can still issue HTTPS requests to app.netdata.cloud and that no firewall or proxy changes were made.
 
 ### Verify that the claiming info is persisted
 
@@ -124,7 +124,7 @@ work this way, as we have unique node identification information under `/var/lib
 
 ### Verify that your IP is not blocked by Netdata Cloud
 
-Most of the nodes change IPs dynamically. It is possible that your current IP has been restricted from accessing `api.netdata.cloud` due to security concerns, usually because it was spamming Netdata Coud with too many
+Most of the nodes change IPs dynamically. It is possible that your current IP has been restricted from accessing `app.netdata.cloud` due to security concerns, usually because it was spamming Netdata Coud with too many
 failed requests (old versions of the agent).
 
 To verify this:
@@ -135,7 +135,7 @@ To verify this:
    sudo netdatacli aclk-state | grep "Banned By Cloud"
    ```
 
-   The output will contain a line indicating if the IP is banned from `api.netdata.cloud`:
+   The output will contain a line indicating if the IP is banned from `app.netdata.cloud`:
 
    ```bash
    Banned By Cloud: yes
-- 
cgit v1.2.3
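
The ban check in the last hunk can also be scripted rather than read by eye. The following is a rough sketch, not part of the patch. It assumes the `netdatacli aclk-state` output contains a line of the form `Banned By Cloud: yes`, as shown above. The `no` value is an assumption, and the helper names `banned_by_cloud` and `read_aclk_state` are hypothetical.

```python
import subprocess
from typing import Optional


def banned_by_cloud(aclk_state: str) -> Optional[bool]:
    """Return True/False from the 'Banned By Cloud' line, or None if it is absent."""
    for line in aclk_state.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Banned By Cloud":
            # Anything other than "yes" is treated as not banned (assumption).
            return value.strip().lower() == "yes"
    return None


def read_aclk_state() -> str:
    """Run `netdatacli aclk-state` and return its output.

    The guide runs the command with sudo, so this may need elevated privileges.
    """
    result = subprocess.run(
        ["netdatacli", "aclk-state"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Keeping the parsing separate from the command invocation means the check can be exercised against saved output, for example `banned_by_cloud("Banned By Cloud: yes")`, without a running Agent.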