Diffstat (limited to 'docs/netdata-agent')
26 files changed, 3437 insertions, 0 deletions
diff --git a/docs/netdata-agent/README.md b/docs/netdata-agent/README.md new file mode 100644 index 000000000..75bd4898e --- /dev/null +++ b/docs/netdata-agent/README.md @@ -0,0 +1,84 @@ +# Netdata Agent + +The Netdata Agent is the main building block in a Netdata ecosystem. It is installed on all monitored systems to monitor system components, containers and applications. + +The Netdata Agent is an **observability pipeline in a box** that can either operate standalone, or blend into a bigger pipeline made up of more Netdata Agents (Children and Parents). + +## Distributed Observability Pipeline + +The following graph shows the Netdata observability pipeline. + +The pipeline is extended by creating Metrics Observability Centralization Points that are linked together (`from a remote Netdata`, `to a remote Netdata`), so that all installed Netdata Agents become one vast integrated observability pipeline. + +```mermaid +stateDiagram-v2 + classDef userFeature fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:yellow + classDef usedByNC fill:#090,color:white,font-weight:bold,stroke-width:2px,stroke:yellow + Local --> Discover + Local: Local Netdata + [*] --> Detect: from a remote Netdata + Others: 3rd party time-series DBs + Detect: Detect Anomalies + Dashboard:::userFeature + Dashboard: Netdata Dashboards + 3rdDashboard:::userFeature + 3rdDashboard: 3rd party Dashboards + Notifications:::userFeature + Notifications: Alert Notifications + Alerts: Alert Transitions + Discover --> Collect + Collect --> Detect + Store: Store + Store: Time-Series Database + Detect --> Store + Store --> Learn + Store --> Check + Store --> Query + Store --> Score + Store --> Stream + Store --> Export + Query --> Visualize + Score --> Visualize + Check --> Alerts + Learn --> Detect: trained ML models + Alerts --> Notifications + Stream --> [*]: to a remote Netdata + Export --> Others + Others --> 3rdDashboard + Visualize --> Dashboard + Score:::usedByNC + Query:::usedByNC + 
Alerts:::usedByNC +``` + +1. **Discover**: auto-detect metric sources on localhost, auto-discover metric sources on Kubernetes. +2. **Collect**: query data sources to collect metric samples, using the optimal protocol for each data source. 800+ integrations supported, including dozens of native application protocols, OpenMetrics and StatsD. +3. **Detect Anomalies**: use the trained machine learning models for each metric to detect, in real time, whether each collected sample is an outlier (an anomaly). +4. **Store**: keep collected samples and their anomaly status, in the time-series database (database mode `dbengine`) or a ring buffer (database modes `ram` and `alloc`). +5. **Learn**: train multiple machine learning models for each metric collected, learning behaviors and patterns for detecting anomalies. +6. **Check**: a health engine, triggering alerts and sending notifications. Netdata comes with hundreds of alert configurations that are automatically attached to metrics when they get collected, detecting errors, common configuration errors and performance issues. +7. **Query**: a query engine for querying time-series data. +8. **Score**: a scoring engine for comparing and correlating metrics. +9. **Stream**: a mechanism to connect Netdata agents and build Metrics Centralization Points (Netdata Parents). +10. **Visualize**: Netdata's fully automated dashboards for all metrics. +11. **Export**: export metric samples to 3rd party time-series databases, enabling the use of 3rd party tools for visualization, like Grafana. + +## Comparison to other observability solutions + +1. **One moving part**: Other monitoring solutions require maintaining metrics exporters, time-series databases, and visualization engines. Netdata has everything integrated into one package, even when [Metrics Centralization Points](/docs/observability-centralization-points/metrics-centralization-points/README.md) are required, making deployment and maintenance a lot simpler. + +2. 
**Automation**: Netdata is designed to automate most of the process of setting up and running an observability solution. It is designed to instantly provide comprehensive dashboards and fully automated alerts, with zero configuration. + +3. **High Fidelity Monitoring**: Netdata was born from our need to kill the console for observability. So, it provides metrics and logs at the same granularity and fidelity as console tools, but also comes with tools that go beyond metrics and logs, to provide a holistic view of the monitored infrastructure (e.g. check [Top Monitoring](/docs/top-monitoring-netdata-functions.md)). + +4. **Minimal impact on monitored systems and applications**: Netdata has been designed to have a minimal impact on the monitored systems and their applications. There are [independent studies](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf) reporting that Netdata excels in CPU usage, RAM utilization, Execution Time and the impact Netdata has on monitored applications and containers. + +5. **Energy efficiency**: [The University of Amsterdam conducted research on the energy efficiency of monitoring tools](https://twitter.com/IMalavolta/status/1734208439096676680). They tested Netdata, Prometheus, ELK, among other tools. The study concluded that **Netdata is the most energy efficient monitoring tool**. + +## Dashboard Versions + +The Netdata agents (Standalone, Children and Parents) **share the dashboard** of Netdata Cloud. However, when the user is logged in and the Netdata agent is connected to Netdata Cloud, the following are enabled (which are otherwise disabled): + +1. **Access to Sensitive Data**: Some data, like systemd-journal logs and several [Top Monitoring](/docs/top-monitoring-netdata-functions.md) features expose sensitive data, like IPs, ports, process command lines and more. 
To access all these when the dashboard is served directly from a Netdata agent, Netdata Cloud is required to verify that the user accessing the dashboard has the required permissions. + +2. **Dynamic Configuration**: Netdata agents are configured via configuration files, manually or through some provisioning system. The latest Netdata includes a feature that allows users to change some of the configuration (collectors, alerts) via the dashboard. This feature is only available to users of a paid Netdata Cloud plan. diff --git a/docs/netdata-agent/backup-and-restore-an-agent.md b/docs/netdata-agent/backup-and-restore-an-agent.md new file mode 100644 index 000000000..d17cad604 --- /dev/null +++ b/docs/netdata-agent/backup-and-restore-an-agent.md @@ -0,0 +1,70 @@ +# Backing up a Netdata Agent + +> **Note** +> +> Users are responsible for backing up, recovering, and ensuring their data's availability because Netdata stores data locally on each system due to its decentralized architecture. + +## Introduction + +When preparing to back up a Netdata Agent, it is worth considering that there are different kinds of data that you may wish to back up independently or all together: + +| Data type | Description | Location | +|---------------------|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------| +| Agent configuration | Files controlling configuration of the Netdata Agent | [config directory](/docs/netdata-agent/configuration/README.md) | +| Metrics | Database files | /var/cache/netdata | +| Identity | Claim token, API key and some other files | /var/lib/netdata | + + +## Scenarios + +### Backing up to restore data in case of a node failure + +In this standard scenario, you are backing up your Netdata Agent in case of a node failure or data corruption so that the metrics and the configuration can be recovered. 
The purpose is not to back up or restore the application itself. + +1. Verify that the directory paths in the table above contain the information you expect. + + > **Note** + > The specific paths may vary depending on installation method, Operating System, and whether it is a Docker/Kubernetes deployment. + +2. It is recommended that you [stop the Netdata Agent](/docs/netdata-agent/start-stop-restart.md) when backing up the Metrics/database files. + Backing up the Agent configuration and Identity folders is straightforward as they should not be changing very frequently. + +3. When using a backup tool such as `tar`, run the backup as _root_ or as the _netdata_ user so that it can access all the files in the directories. + + ``` + sudo tar -cvpzf netdata_backup.tar.gz /etc/netdata/ /var/cache/netdata /var/lib/netdata + ``` + + Stopping the Netdata Agent is typically necessary to back up its database files. + +If you want to minimize the gap in metrics caused by stopping the Netdata Agent, consider implementing a backup job or script that follows this sequence: + +- Back up the Agent configuration and Identity directories +- Stop the Netdata service +- Back up the database files +- Restart the Netdata Agent + +### Restoring Netdata + +1. Ensure that the Netdata agent is installed and is [stopped](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) + + If you plan to deploy the Agent and restore a backup on top of it, then you might find it helpful to use the [`--dont-start-it`](/packaging/installer/methods/kickstart.md#other-options) option upon installation. + + ``` + wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && sh /tmp/netdata-kickstart.sh --dont-start-it + ``` + + > **Note** + > If you are going to restore the database files then you should first ensure that the Metrics directory is empty. + > + > ``` + > sudo rm -Rf /var/cache/netdata + > ``` + +2. 
Restore the backup from the archive + + ``` + sudo tar -xvpzf /path/to/netdata_backup.tar.gz -C / + ``` + +3. [Start the Netdata agent](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) diff --git a/docs/netdata-agent/configuration/README.md b/docs/netdata-agent/configuration/README.md new file mode 100644 index 000000000..097fb9310 --- /dev/null +++ b/docs/netdata-agent/configuration/README.md @@ -0,0 +1,43 @@ +# Netdata Agent Configuration + +The main Netdata agent configuration is `netdata.conf`. + +## The Netdata config directory + +On most Linux systems, by using our [recommended one-line installation](/packaging/installer/README.md#install-on-linux-with-one-line-installer), the **Netdata config +directory** will be `/etc/netdata/`. The config directory contains several configuration files with the `.conf` extension, a +few directories, and a shell script named `edit-config`. + +> Some operating systems will use `/opt/netdata/etc/netdata/` as the config directory. If you're not sure where yours +> is, navigate to `http://NODE:19999/netdata.conf` in your browser, replacing `NODE` with the IP address or hostname of +> your node, and find the `# config directory = ` setting. The value listed is the config directory for your system. + +All of Netdata's documentation assumes that your config directory is at `/etc/netdata`, and that you're running any scripts from inside that directory. + + +## edit `netdata.conf` + +To edit `netdata.conf`, run this on your terminal: + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` + +Your editor will open. 
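If you are unsure which config directory applies before running `edit-config`, the lookup described above can be scripted. The following is a minimal sketch, assuming the Agent listens on `localhost:19999` and that `netdata.conf` reports the directory on a `config directory = ` line (possibly commented out). The helper name `config_dir_from_conf` is hypothetical and not part of Netdata.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Netdata): read a netdata.conf dump on
# stdin and print the value of the first "config directory" setting, whether
# or not the line is commented out.
config_dir_from_conf() {
    sed -n 's/^[[:space:]]*#\{0,1\}[[:space:]]*config directory[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Example usage (assumes a running Agent on localhost):
#   curl -sS http://localhost:19999/netdata.conf | config_dir_from_conf
```

Reading from stdin keeps the helper independent of how the file is fetched, so the same function works on a saved copy of `netdata.conf`.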
+ +## downloading `netdata.conf` + +The running version of `netdata.conf` can be downloaded from a running Netdata agent, at this URL: + +``` +http://agent-ip:19999/netdata.conf +``` + +You can save and use this version, using these commands: + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +curl -ksSLo /tmp/netdata.conf.new http://localhost:19999/netdata.conf && sudo mv -i /tmp/netdata.conf.new netdata.conf +``` + diff --git a/docs/netdata-agent/configuration/anonymous-telemetry-events.md b/docs/netdata-agent/configuration/anonymous-telemetry-events.md new file mode 100644 index 000000000..b943ea9a3 --- /dev/null +++ b/docs/netdata-agent/configuration/anonymous-telemetry-events.md @@ -0,0 +1,103 @@ +<!-- +title: "Anonymous telemetry events" +custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/anonymous-telemetry-events.md +sidebar_label: "Anonymous telemetry events" +learn_status: "Published" +learn_rel_path: "Configuration" +--> + +# Anonymous telemetry events + +By default, Netdata collects anonymous usage information from the open-source monitoring agent. For agent events such as start, stop, and crash, we use our own cloud function in GCP. For frontend telemetry (pageviews, etc.) on the agent dashboard itself, we use the open-source +product analytics platform [PostHog](https://github.com/PostHog/posthog). + +We are strongly committed to your [data privacy](https://netdata.cloud/privacy/). + +We use the statistics gathered from this information for two purposes: + +1. **Quality assurance**, to help us understand if Netdata behaves as expected, and to help us classify repeated + issues with certain distributions or environments. + +2. **Usage statistics**, to help us interpret how people use the Netdata agent in real-world environments, and to help + us identify how our development/design decisions influence the community. 
+ +Netdata collects usage information via two different channels: + +- **Agent dashboard**: We use the [PostHog JavaScript integration](https://posthog.com/docs/integrations/js-integration) (with sensitive event attributes overwritten to be anonymized) to send product usage events when you access an [Agent's dashboard](/docs/dashboards-and-charts/README.md). +- **Agent backend**: The `netdata` daemon executes the [`anonymous-statistics.sh`](https://github.com/netdata/netdata/blob/6469cf92724644f5facf343e4bdd76ac0551a418/daemon/anonymous-statistics.sh.in) script when Netdata starts, stops cleanly, or fails. + +You can opt-out from sending anonymous statistics to Netdata through three different [opt-out mechanisms](#opt-out). + +## Agent Dashboard - PostHog JavaScript + +When you kick off an Agent dashboard session by visiting `http://NODE:19999`, Netdata initializes a PostHog session and masks various event attributes. + +_Note_: You can see the relevant code in the [dashboard repository](https://github.com/netdata/dashboard/blob/master/src/domains/global/sagas.ts#L107) where the `window.posthog.register()` call is made. + +```JavaScript +window.posthog.register({ + distinct_id: machineGuid, + $ip: "127.0.0.1", + $current_url: "agent dashboard", + $pathname: "netdata-dashboard", + $host: "dashboard.netdata.io", +}) +``` + +In the above snippet a Netdata PostHog session is initialized and the `ip`, `current_url`, `pathname` and `host` attributes are set to constant values for all events that may be sent during the session. This way, information like the IP or hostname of the Agent will not be sent as part of the product usage event data. + +We have configured the dashboard to trigger the PostHog JavaScript code only when the variable `anonymous_statistics` is true. The value of this +variable is controlled via the [opt-out mechanism](#opt-out). 
+ +## Agent Backend - Anonymous Statistics Script + +Every time the daemon is started or stopped and every time a fatal condition is encountered, Netdata uses the anonymous +statistics script to collect system information and send it to the Netdata telemetry cloud function via an HTTP call. The information collected for all +events is: + +- Netdata version +- OS name, version, id, id_like +- Kernel name, version, architecture +- Virtualization technology +- Containerization technology + +Furthermore, the FATAL event sends the Netdata process & thread name, along with the source code function, source code +filename and source code line number of the fatal error. + +Starting with v1.21, we additionally collect information about: + +- Failures to build the dependencies required to use Cloud features. +- Unavailability of Cloud features in an agent. +- Failures to connect to the Cloud in case the [connection process](/src/claim/README.md) has been completed. This includes error codes + to inform the Netdata team about the reason why the connection failed. + +To see exactly what is collected and how, you can review the script template `daemon/anonymous-statistics.sh.in`. The +template is converted to a bash script called `anonymous-statistics.sh`, installed under the Netdata `plugins +directory`, which is usually `/usr/libexec/netdata/plugins.d`. + +## Opt-out + +You can opt out of sending anonymous statistics to Netdata through three different opt-out mechanisms: + +**Create a file called `.opt-out-from-anonymous-statistics`.** This empty file, stored in your Netdata configuration +directory (usually `/etc/netdata`), immediately stops the statistics script from running, and works with any type of +installation, including manual, offline, and macOS installations. Create the file by running `touch +.opt-out-from-anonymous-statistics` from your Netdata configuration directory. 
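The file-based opt-out can be sketched as a small shell helper. This is only an illustration, assuming the config directory is `/etc/netdata` unless another path is passed in. The function name `opt_out_of_telemetry` is hypothetical, and on a real node the command typically needs root privileges.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Netdata): create the empty opt-out
# marker file in the given Netdata config directory (default: /etc/netdata).
opt_out_of_telemetry() {
    dir="${1:-/etc/netdata}"
    # Refuse to guess if the directory does not exist.
    [ -d "$dir" ] || { echo "config directory not found: $dir" >&2; return 1; }
    touch "$dir/.opt-out-from-anonymous-statistics"
}

# Example usage (run as root on a real node):
#   opt_out_of_telemetry /etc/netdata
```

Passing the directory as an argument covers installations that use `/opt/netdata/etc/netdata` instead.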
+ +**Pass the option `--disable-telemetry` to any of the installer scripts in the [installation +docs](/packaging/installer/README.md).** You can append this option during the initial installation or a manual +update. You can also export the environment variable `DISABLE_TELEMETRY` with a non-zero or non-empty value +(e.g: `export DISABLE_TELEMETRY=1`). + +When using Docker, **set your `DISABLE_TELEMETRY` environment variable to `1`.** You can set this variable with the following +command: `export DISABLE_TELEMETRY=1`. When creating a container using Netdata's [Docker +image](/packaging/docker/README.md#create-a-new-netdata-agent-container) for the first time, this variable will disable +the anonymous statistics script inside of the container. + +Each of these opt-out processes does the following: + +- Prevents the daemon from executing the anonymous statistics script. +- Forces the anonymous statistics script to exit immediately. +- Stops the PostHog JavaScript snippet, which remains on the dashboard, from firing and sending any data to the Netdata PostHog. + + diff --git a/docs/netdata-agent/configuration/cheatsheet.md b/docs/netdata-agent/configuration/cheatsheet.md new file mode 100644 index 000000000..3e1428694 --- /dev/null +++ b/docs/netdata-agent/configuration/cheatsheet.md @@ -0,0 +1,215 @@ +# Useful management and configuration actions + +Below you will find some of the most common actions that one can take while using Netdata. You can use this page as a quick reference for installing Netdata, connecting a node to the Cloud, properly editing the configuration, accessing Netdata's API, and more! 
+ +### Install Netdata + +```bash +wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && sh /tmp/netdata-kickstart.sh + +# Or, if you have cURL but not wget (such as on macOS): +curl https://get.netdata.cloud/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh +``` + +#### Connect a node to Netdata Cloud + +To do so, sign in to Netdata Cloud, on your Space under the Nodes tab, click `Add Nodes` and paste the provided command into your node’s terminal and run it. +You can also copy the Claim token and pass it to the installation script with `--claim-token` and re-run it. + +### Configuration + +**Netdata's config directory** is `/etc/netdata/` but in some operating systems it might be `/opt/netdata/etc/netdata/`. +Look for the `# config directory =` line over at `http://NODE_IP:19999/netdata.conf` to find your config directory. + +From within that directory you can run `sudo ./edit-config netdata.conf` **to edit Netdata's configuration.** +You can edit other config files too, by specifying their filename after `./edit-config`. +You are expected to use this method in all following configuration changes. + +<!-- #### Edit Netdata's other config files (examples): + +- `$ sudo ./edit-config apps_groups.conf` +- `$ sudo ./edit-config ebpf.conf` +- `$ sudo ./edit-config health.d/load.conf` +- `$ sudo ./edit-config go.d/prometheus.conf` + +#### View the running Netdata configuration: `http://NODE:19999/netdata.conf` + +> Replace `NODE` with the IP address or hostname of your node. Often `localhost`. + +## Metrics collection & retention + +You can tweak your settings in the netdata.conf file. +📄 [Find your netdata.conf file](/src/daemon/config/README.md) + +Open a new terminal and navigate to the netdata.conf file. 
Use the edit-config script to make changes: `sudo ./edit-config netdata.conf` + +The most popular settings to change are: + +#### Increase metrics retention (4GiB) + +``` +sudo ./edit-config netdata.conf +``` + +``` +[global] + dbengine multihost disk space = 4096 +``` + +#### Reduce the collection frequency (every 5 seconds) + +``` +sudo ./edit-config netdata.conf +``` + +``` +[global] + update every = 5 +``` --> + +--- + +#### Enable/disable plugins (groups of collectors) + +```bash +sudo ./edit-config netdata.conf +``` + +```conf +[plugins] + go.d = yes # enabled + node.d = no # disabled +``` + +#### Enable/disable specific collectors + +```bash +sudo ./edit-config go.d.conf # edit a plugin's config +``` + +```yaml +modules: + activemq: no # disabled + cockroachdb: yes # enabled +``` + +#### Edit a collector's config + +```bash +sudo ./edit-config go.d/mysql.conf +``` + +### Alerts & notifications + +<!-- #### Add a new alert + +``` +sudo touch health.d/example-alert.conf +sudo ./edit-config health.d/example-alert.conf +``` --> +After any change, reload the Netdata health configuration: + +```bash +netdatacli reload-health +#or if that command doesn't work on your installation, use: +killall -USR2 netdata +``` + +#### Configure a specific alert + +```bash +sudo ./edit-config health.d/example-alert.conf +``` + +#### Silence a specific alert + +```bash +sudo ./edit-config health.d/example-alert.conf +``` + +``` + to: silent +``` + +<!-- #### Disable alerts and notifications + +```conf +[health] + enabled = no +``` --> + +--- + +### Manage the daemon + +| Intent | Action | +|:----------------------------|------------------------------------------------------------:| +| Start Netdata | `$ sudo service netdata start` | +| Stop Netdata | `$ sudo service netdata stop` | +| Restart Netdata | `$ sudo service netdata restart` | +| Reload health configuration | `$ sudo netdatacli reload-health` `$ killall -USR2 netdata` | +| View error logs | `less 
/var/log/netdata/error.log` | +| View collectors logs | `less /var/log/netdata/collector.log` | + +#### Change the port Netdata listens to (example, set it to port 39999) + +```conf +[web] +default port = 39999 +``` + +### See metrics and dashboards + +#### Netdata Cloud: `https://app.netdata.cloud` + +#### Local dashboard: `https://NODE:19999` + +> Replace `NODE` with the IP address or hostname of your node. Often `localhost`. + +### Access the Netdata API + +You can access the API like this: `http://NODE:19999/api/VERSION/REQUEST`. +If you want to take a look at all the API requests, check our API page at <https://learn.netdata.cloud/api> +<!-- +## Interact with charts + +| Intent | Action | +| -------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Stop a chart from updating | `click` | +| Zoom | **Cloud** <br/> use the `zoom in` and `zoom out` buttons on any chart (upper right corner) <br/><br/> **Agent**<br/>`SHIFT` or `ALT` + `mouse scrollwheel` <br/> `SHIFT` or `ALT` + `two-finger pinch` (touchscreen) <br/> `SHIFT` or `ALT` + `two-finger scroll` (touchscreen) | +| Zoom to a specific timeframe | **Cloud**<br/>use the `select and zoom` button on any chart and then do a `mouse selection` <br/><br/> **Agent**<br/>`SHIFT` + `mouse selection` | +| Pan forward or back in time | `click` & `drag` <br/> `touch` & `drag` (touchpad/touchscreen) | +| Select a certain timeframe | `ALT` + `mouse selection` <br/> WIP need to evaluate this `command?` + `mouse selection` (macOS) | +| Reset to default auto refreshing state | `double click` | --> + +<!-- ## Dashboards + +#### Disable the local dashboard + +Use the `edit-config` script to edit the `netdata.conf` file. 
+ +``` +[web] +mode = none +``` --> + +<!-- #### Opt out from anonymous statistics + +``` +sudo touch .opt-out-from-anonymous-statistics +``` --> + +<!-- ## Understanding the dashboard + +**Charts**: A visualization displaying one or more collected/calculated metrics in a time series. Charts are generated +by collectors. + +**Dimensions**: Any value shown on a chart, which can be raw or calculated values, such as percentages, averages, +minimums, maximums, and more. + +**Families**: One instance of a monitored hardware or software resource that needs to be monitored and displayed +separately from similar instances. Example, disks named +**sda**, **sdb**, **sdc**, and so on. + +**Contexts**: A grouping of charts based on the types of metrics collected and visualized. +**disk.io**, **disk.ops**, and **disk.backlog** are all contexts. --> diff --git a/docs/netdata-agent/configuration/common-configuration-changes.md b/docs/netdata-agent/configuration/common-configuration-changes.md new file mode 100644 index 000000000..e9d8abadc --- /dev/null +++ b/docs/netdata-agent/configuration/common-configuration-changes.md @@ -0,0 +1,148 @@ +# Common configuration changes + +The Netdata Agent requires no configuration upon installation to collect thousands of per-second metrics from most +systems, containers, and applications, but there are hundreds of settings to tweak if you want to exercise more control +over your monitoring platform. + +This document assumes familiarity with +using [`edit-config`](/docs/netdata-agent/configuration/README.md) from the Netdata config +directory. + +## Change dashboards and visualizations + +The Netdata Agent's [local dashboard](/docs/dashboards-and-charts/README.md), accessible +at `http://NODE:19999` is highly configurable. 
If +you use [Netdata Cloud](/docs/netdata-cloud/README.md) +for infrastructure monitoring, you +will see many of these +changes reflected in those visualizations due to the way Netdata Cloud proxies metric data and metadata to your browser. + +### Increase the long-term metrics retention period + +Read our doc +on [increasing long-term metrics storage](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md) +for details, including a +[calculator](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics) +to help you determine the exact settings for your desired retention period. + +### Reduce the data collection frequency + +Change `update every` in +the [`[global]` section](/src/daemon/config/README.md#global-section-options) +of `netdata.conf` so +that it is greater than `1`. An `update every` of `5` means the Netdata Agent enforces a _minimum_ collection frequency +of 5 seconds. + +```conf +[global] + update every = 5 +``` + +Every collector and plugin has its own `update every` setting, which you can also change in the `go.d.conf`, +`python.d.conf` or `charts.d.conf` files, or in individual collector configuration files. If the `update +every` for an individual collector is less than the global, the Netdata Agent uses the global setting. See +the [enable or configure a collector](/src/collectors/REFERENCE.md#enable-and-disable-a-specific-collection-module) +doc for details. + +### Disable a collector or plugin + +Turn off entire plugins in +the [`[plugins]` section](/src/daemon/config/README.md#plugins-section-options) +of +`netdata.conf`. + +To disable specific collectors, open `go.d.conf`, `python.d.conf` or `charts.d.conf` and find the line +for that specific module. Uncomment the line and change its value to `no`. 
+ +## Modify alerts and notifications + +Netdata's health monitoring watchdog uses hundreds of preconfigured health entities, with intelligent thresholds, to +generate warning and critical alerts for most production systems and their applications without configuration. However, +each alert and notification method is completely customizable. + +### Add a new alert + +To create a new alert configuration file, create an empty file, with a filename that ends in `.conf`, in the +`health.d/` directory. The Netdata Agent loads any valid alert configuration file ending in `.conf` in that directory. +Next, edit the new file with `edit-config`. For example, with a file called `example-alert.conf`: + +```bash +sudo touch health.d/example-alert.conf +sudo ./edit-config health.d/example-alert.conf +``` + +Or, append your new alert to an existing file by editing a relevant existing file in the `health.d/` directory. + +Read more about [configuring alerts](/src/health/REFERENCE.md) to +get started, and see +the [health monitoring reference](/src/health/REFERENCE.md) for a full listing +of options available in health entities. + +### Configure a specific alert + +Tweak existing alerts by editing files in the `health.d/` directory. For example, edit `health.d/cpu.conf` to change how +the Agent responds to anomalies related to CPU utilization. + +To see which configuration file you need to edit to configure a specific +alert, [view your active alerts](/docs/dashboards-and-charts/alerts-tab.md) in +Netdata Cloud or the local Agent dashboard and look for the **source** line. For example, it might +read `source 4@/usr/lib/netdata/conf.d/health.d/cpu.conf`. + +Because the source path contains `health.d/cpu.conf`, run `sudo ./edit-config health.d/cpu.conf` from the config directory to configure that alert. + +### Disable a specific alert + +Open the configuration file for that alert and set the `to` line to `silent`. 
+ +```conf +template: disk_fill_rate + on: disk.space + lookup: max -1s at -30m unaligned of avail + calc: ($this - $avail) / (30 * 60) + every: 15s + to: silent +``` + +### Turn off all alerts and notifications + +Set `enabled` to `no` in +the [`[health]`](/src/daemon/config/README.md#health-section-options) +section of `netdata.conf`. + +### Enable alert notifications + +Open `health_alarm_notify.conf` for editing. First, read the [enabling notifications](/docs/alerts-and-notifications/notifications/README.md#netdata-agent) doc +for an example of the process using Slack, then +click on the link to your preferred notification method to find documentation for that specific endpoint. + +## Improve node security + +While the Netdata Agent is both [open and secure by design](https://www.netdata.cloud/blog/netdata-agent-dashboard/), we +recommend every user take some action to administer and secure their nodes. + +Learn more about the available options in the [security design documentation](/docs/security-and-privacy-design/README.md). + +## Reduce resource usage + +Read +our [performance optimization guide](/docs/netdata-agent/configuration/optimize-the-netdata-agents-performance.md) +for a long list of specific changes +that can reduce the Netdata Agent's CPU/memory footprint and IO requirements. + +## Organize nodes with host labels + +Beginning with v1.20, Netdata accepts user-defined **host labels**. These labels are sent during streaming, exporting, +and as metadata to Netdata Cloud, and help you organize the metrics coming from complex infrastructure. Host labels are +defined in the section `[host labels]`. + +For a quick introduction, read +the [host label guide](/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts.md). + +The following restrictions apply to host label names: + +- Names cannot start with `_`, but it can be present in other parts of the name. +- Names only accept alphabet letters, numbers, dots, and dashes. 
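The name rules above can be checked before adding a label to `[host labels]`. The following is a minimal sketch of such a check, not Netdata's own validation: the function name `valid_label_name` is hypothetical, and the pattern is one interpretation of the two rules (letters, numbers, dots and dashes, plus `_` anywhere except the first character).

```shell
#!/bin/sh
# Hypothetical check for host label names, based on the rules listed above:
# - the first character may be a letter, digit, dot, or dash (not "_")
# - later characters may additionally be "_"
# Returns 0 if the name passes, non-zero otherwise.
valid_label_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9.-][A-Za-z0-9._-]*$'
}

# Example usage:
#   valid_label_name "location" && echo "name looks valid"
```

The helper only reports success or failure through its exit status, so it can be used directly in an `if` or `&&` chain in a provisioning script.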
+ +The policy for values is more flexible, but you can not use exclamation marks (`!`), whitespaces (` `), single quotes +(`'`), double quotes (`"`), or asterisks (`*`), because they are used to compare label values in health alerts and +templates. diff --git a/docs/netdata-agent/configuration/dynamic-configuration.md b/docs/netdata-agent/configuration/dynamic-configuration.md new file mode 100644 index 000000000..7064abf9a --- /dev/null +++ b/docs/netdata-agent/configuration/dynamic-configuration.md @@ -0,0 +1,62 @@ +# Dynamic Configuration Manager + +**Netdata Cloud paid subscription required.** + +The Dynamic Configuration Manager allows direct configuration of collectors and alerts through the Netdata UI. This feature allows users to: + +- **Create, test, and deploy configurations** for one or more nodes directly within the UI. +- **Eliminate the need for manual command-line edits and node access**, enhancing workflow efficiency. + +**Cloud Connection and Security**: Nodes using Dynamic Configuration Manager require a connection to Netdata Cloud. This ensures proper permission handling and data security. + +> **Info** +> +> To understand what actions users can perform based on their role, refer to the [Role Based Access documentation](/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md#dynamic-configuration-manager). + +## Collectors + +### Module + +A module represents a specific data collector, such as Apache, MySQL, or Redis. Think of modules as templates for data collection. + +Each module can have multiple jobs, which are unique configurations of that template tailored to your specific needs. + +You can manage individual modules using the following actions: + +| Action | Description | +|--------------------|---------------------------------------------------------------------------------------------------------------------------| +| **Add job** | Create new configuration instances (jobs) for a particular module. 
| +| **Enable/Disable** | Disabling a module deactivates all currently running jobs and prevents any future jobs from being created for that module. | + +### Job + +A job represents a running instance of a module with a specific configuration. Think of it as a customized data collection task based on a module template. + +Every job has a designated "source type" indicating its origin: + +- **Stock**: Pre-installed with Netdata and provides basic data collection for common services. +- **User**: Originates from user-created files on the node. +- **Discovered**: Automatically generated by Netdata upon discovering a service running on the node. +- **Dynamic Configuration**: Created and managed using the Dynamic Configuration Manager. + +You can manage individual jobs using the following actions: + +| Action | Description | +|--------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **Restart** | Restart a job's data collection. This is useful if a job is in a "Failed" state. Upon restart, a notification with the failure message will be displayed. | +| **Remove** | Delete a job configuration entirely. Note that only jobs created through Dynamic Configuration can be removed. Other job types originate from files on the node and cannot be deleted here. | +| **Enable/Disable** | Control the job's activity. Disabling a running job stops data collection. | +| **Edit** | Modify an existing job's configuration. | +| **Test** | Validate newly created or edited configurations before applying them permanently. | + +## Health + +Each entry in the Health tab contains an Alert template, which is then used to create Alerts. + +The functionality in the main view is the same as with the [Collectors tab](#collectors). + +### Health entry configuration + +You can create new configurations for both templates and individual Alerts.
+ +Each template can have multiple items which resemble Alerts that either apply to a certain [instance](/docs/dashboards-and-charts/netdata-charts.md#instances-dropdown), or all instances under a specific [context](/docs/dashboards-and-charts/netdata-charts.md#contexts) diff --git a/docs/netdata-agent/configuration/optimize-the-netdata-agents-performance.md b/docs/netdata-agent/configuration/optimize-the-netdata-agents-performance.md new file mode 100644 index 000000000..6acbd4977 --- /dev/null +++ b/docs/netdata-agent/configuration/optimize-the-netdata-agents-performance.md @@ -0,0 +1,266 @@ +# How to optimize the Netdata Agent's performance + +We designed the Netdata Agent to be incredibly lightweight, even when it's collecting a few thousand dimensions every +second and visualizing that data into hundreds of charts. However, the default settings of the Netdata Agent are not +optimized for performance, but for a simple, standalone setup. We want the first install to give you something you can +run without any configuration. Most of the settings and options are enabled, since we want you to experience the full +thing. + +By default, Netdata will automatically detect applications running on the node it is installed to start collecting +metrics in real-time, has health monitoring enabled to evaluate alerts and trains Machine Learning (ML) models for each +metric, to detect anomalies. + +This document describes the resources required for the various default capabilities and the strategies to optimize +Netdata for production use. + +## Summary of performance optimizations + +The following table summarizes the effect of each optimization on the CPU, RAM and Disk IO utilization in production. 
+ +| Optimization | CPU | RAM | Disk IO | +|-------------------------------------------------------------------------------------------------------------------------------|--------------------|--------------------|--------------------| +| [Use streaming and replication](#use-streaming-and-replication) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | +| [Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | +| [Reduce data collection frequency](#reduce-collection-frequency) | :heavy_check_mark: | | :heavy_check_mark: | +| [Change how long Netdata stores metrics](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md) | | :heavy_check_mark: | :heavy_check_mark: | +| [Use a different metric storage database](/src/database/README.md) | | :heavy_check_mark: | :heavy_check_mark: | +| [Disable machine learning](#disable-machine-learning) | :heavy_check_mark: | | | +| [Use a reverse proxy](#run-netdata-behind-a-proxy) | :heavy_check_mark: | | | +| [Disable/lower gzip compression for the agent dashboard](#disablelower-gzip-compression-for-the-dashboard) | :heavy_check_mark: | | | + +## Resources required by a default Netdata installation + +Netdata's performance is primarily affected by **data collection/retention** and **clients accessing data**. + +You can configure almost all aspects of data collection/retention, and certain aspects of clients accessing data. + +### CPU consumption + +Expect about: + +- 1-3% of a single core for the netdata core +- 1-3% of a single core for the various collectors (e.g. go.d.plugin, apps.plugin) +- 5-10% of a single core, when ML training runs + +Your experience may vary depending on the number of metrics collected, the collectors enabled and the specific +environment they run on, i.e. the work they have to do to collect these metrics. 
+ +As a general rule, for modern hardware and VMs, the total CPU consumption of a standalone Netdata installation, +including all its components, should be about 5 - 15% of a single core. For example, on an 8-core server this is only +about 0.6% - 1.9% of the total CPU capacity, depending on the CPU characteristics. + +The Netdata Agent runs with the lowest +possible [process scheduling policy](/src/daemon/README.md#netdata-process-scheduling-policy), +which is `nice 19`, and uses the `idle` process scheduler. Together, these settings ensure that the Agent only gets CPU +resources when the node has CPU resources to spare. If the node reaches 100% CPU utilization, the Agent is the first +process to give up CPU time, so that your applications get any available resources. + +To reduce CPU usage, you can take one or a combination of the following actions: + +1. [Disable machine learning](#disable-machine-learning), +2. [Use streaming and replication](#use-streaming-and-replication), +3. [Reduce the data collection frequency](#reduce-collection-frequency), +4. [Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors), +5. [Use a reverse proxy](#run-netdata-behind-a-proxy), +6. [Disable/lower gzip compression for the agent dashboard](#disablelower-gzip-compression-for-the-dashboard). + +### Memory consumption + +The memory footprint of Netdata is mainly influenced by the number of metrics concurrently being collected. Expect about +150MB of RAM for a typical 64-bit server collecting about 2000 to 3000 metrics. + +To estimate and control memory consumption, you can take one or a combination of the following actions: + +1. [Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors) +2. [Change how long Netdata stores metrics](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md) +3. [Use a different metric storage database](/src/database/README.md).
+ +### Disk footprint and I/O + +By default, Netdata should not use more than 1GB of disk space, most of which is dedicated to storing metric data and +metadata. For typical installations collecting 2000 - 3000 metrics, this storage should provide a few days of +high-resolution retention (per second), about a month of mid-resolution retention (per minute) and more than a year of +low-resolution retention (per hour). + +Netdata spreads I/O operations across time. For typical standalone installations there should be a few write operations +every 5-10 seconds of a few kilobytes each, occasionally up to 1MB. In addition, under heavy load, collectors that +require disk I/O may stop and show gaps in charts. + +You can optimize each aspect of your disk footprint as described below. + +To configure retention, you can: + +1. [Change how long Netdata stores metrics](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md). + +To control disk I/O, you can: + +1. [Use a different metric storage database](/src/database/README.md). + +To minimize the deployment's impact on the production system, you can: + +1. [Use streaming and replication](#use-streaming-and-replication) +2. [Reduce the data collection frequency](#reduce-collection-frequency) +3. [Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors). + +## Use streaming and replication + +For all production environments, parent Netdata nodes outside the production infrastructure should be receiving all +collected data from children Netdata nodes running on the production infrastructure, +using [streaming and replication](/docs/observability-centralization-points/README.md). + +### Disable health checks on the child nodes + +When you set up streaming, we recommend you run your health checks on the parent. This saves resources on the children +and makes it easier to configure or disable alerts and agent notifications.
+ +The parents by default run health checks for each child, as long as the child is connected (the details are +in `stream.conf`). On the child nodes you should add to `netdata.conf` the following: + +```conf +[health] + enabled = no +``` + +### Use memory mode ram for the child nodes + +See [using a different metric storage database](/src/database/README.md). + +## Disable unneeded plugins or collectors + +If you know that you don't need an [entire plugin or a specific +collector](/src/collectors/README.md#collector-architecture-and-terminology), +you can disable any of them. Keep in mind that if a plugin/collector has nothing to do, it simply shuts down and does +not consume system resources. You will only improve the Agent's performance by disabling plugins/collectors that are +actively collecting metrics. + +Open `netdata.conf` and scroll down to the `[plugins]` section. To disable any plugin, uncomment it and set the value to +`no`. For example, to explicitly keep the `proc` and `go.d` plugins enabled while disabling `python.d` and `charts.d`. + +```conf +[plugins] + proc = yes + python.d = no + charts.d = no + go.d = yes +``` + +Disable specific collectors by opening their respective plugin configuration files, uncommenting the line for the +collector, and setting its value to `no`. + +```bash +sudo ./edit-config go.d.conf +sudo ./edit-config python.d.conf +sudo ./edit-config charts.d.conf +``` + +For example, to disable a few Python collectors: + +```conf +modules: + apache: no + dockerd: no + fail2ban: no +``` + +## Reduce collection frequency + +The fastest way to improve the Agent's resource utilization is to reduce how often it collects metrics. + +### Global + +If you don't need per-second metrics, or if the Netdata Agent uses a lot of CPU even when no one is viewing that node's +dashboard, [configure the Agent](/docs/netdata-agent/configuration/README.md) to collect +metrics less often. + +Open `netdata.conf` and edit the `update every` setting. 
The default is `1`, meaning that the Agent collects metrics +every second. + +If you change this to `2`, Netdata enforces a minimum `update every` setting of 2 seconds, and collects metrics every +other second, which will effectively halve CPU utilization. Set this to `5` or `10` to collect metrics every 5 or 10 +seconds, respectively. + +```conf +[global] + update every = 5 +``` + +### Specific plugin or collector + +Every collector and plugin has its own `update every` setting, which you can also change in the `go.d.conf`, +`python.d.conf`, or `charts.d.conf` files, or in individual collector configuration files. If the `update +every` for an individual collector is less than the global, the Netdata Agent uses the global setting. See +the [collectors configuration reference](/src/collectors/REFERENCE.md) for +details. + +To reduce the frequency of +an [internal_plugin/collector](/src/collectors/README.md#collector-architecture-and-terminology), +open `netdata.conf` and find the appropriate section. For example, to reduce the frequency of the `apps` plugin, which +collects and visualizes metrics on application resource utilization: + +```conf +[plugin:apps] + update every = 5 +``` + +To [configure an individual collector](/src/collectors/REFERENCE.md#configure-a-collector), +open its specific configuration file with `edit-config` and look for the `update_every` setting. For example, to reduce +the frequency of the `nginx` collector, run `sudo ./edit-config go.d/nginx.conf`: + +```conf +# [ GLOBAL ] +update_every: 10 +``` + +## Lower memory usage for metrics retention + +See how +to [change how long Netdata stores metrics](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md). 
+ +## Use a different metric storage database + +Consider [using a different metric storage database](/src/database/README.md) +when running Netdata on IoT devices, and for children in a parent-child setup based +on [streaming and replication](/docs/observability-centralization-points/README.md). + +## Disable machine learning + +Automated anomaly detection is a powerful tool, but we recommend enabling it only on Netdata parents that sit +outside your production infrastructure, or if you have CPU and memory to spare. You can disable ML with the following: + +```conf +[ml] + enabled = no +``` + +## Run Netdata behind a proxy + +A dedicated web server like nginx provides more robustness than the Agent's +internal [web server](/src/web/README.md). +Nginx can handle more concurrent connections, reuse idle connections, and use fast gzip compression to reduce payloads. + +For details on installing another web server as a proxy for the local Agent dashboard, +see [reverse proxies](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/README.md). + +## Disable/lower gzip compression for the dashboard + +If you choose not to run the Agent behind nginx, you can disable or lower the gzip compression of the Agent's web server. +While gzip compression reduces the size of the HTML/CSS/JS payload, it uses additional CPU while a user is +looking at the local Agent dashboard.
+ +To disable gzip compression, open `netdata.conf` and find the `[web]` section: + +```conf +[web] + enable gzip compression = no +``` + +Or to lower the default compression level: + +```conf +[web] + enable gzip compression = yes + gzip compression level = 1 +``` + diff --git a/docs/netdata-agent/configuration/optimizing-metrics-database/README.md b/docs/netdata-agent/configuration/optimizing-metrics-database/README.md new file mode 100644 index 000000000..fdbd3b690 --- /dev/null +++ b/docs/netdata-agent/configuration/optimizing-metrics-database/README.md @@ -0,0 +1,3 @@ +# Optimizing Metrics Database Overview + +This section contains documentation to help you understand how the metrics DB works, understand the key features and configure them to suit your needs.
\ No newline at end of file diff --git a/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md b/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md new file mode 100644 index 000000000..8d940a730 --- /dev/null +++ b/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md @@ -0,0 +1,122 @@ +# Change how long Netdata stores metrics + +Netdata offers a granular approach to data retention, allowing you to manage storage based on both **time** and **disk +space**. This provides greater control and helps you optimize storage usage for your specific needs. + +**Default Retention Limits**: + +| Tier | Resolution | Time Limit | Size Limit | +|:----:|:-------------------:|:----------:|:----------:| +| 0 | high (per second) | 14 days | 1 GiB | +| 1 | middle (per minute) | 3 months | 1 GiB | +| 2 | low (per hour) | 2 years | 1 GiB | + +With these defaults, Netdata requires approximately 4 GiB of storage space (including metadata). + +## Retention Settings + +> **In a parent-child setup**, these settings manage the shared storage space utilized by the Netdata parent agent for +> storing metrics collected by both the parent and its child nodes. + +You can fine-tune retention for each tier by setting a time limit or size limit. Setting a limit to 0 disables it, +allowing for no time-based deletion for that tier or using all available space, respectively. This enables various +retention strategies as shown in the table below: + +| Setting | Retention Behavior | +|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------| +| Size Limit = 0, Time Limit > 0 | **Time-based only:** data is stored for a specific duration regardless of disk usage. 
| +| Time Limit = 0, Size Limit > 0 | **Space-based only:** data is stored until it reaches a certain amount of disk space, regardless of time. | +| Time Limit > 0, Size Limit > 0 | **Combined time and space limits:** data is deleted once it reaches either the time limit or the disk space limit, whichever comes first. | + +You can change these limits in `netdata.conf`: + +``` +[db] + mode = dbengine + storage tiers = 3 + + # Tier 0, per second data. Set to 0 for no limit. + dbengine tier 0 disk space MB = 1024 + dbengine tier 0 retention days = 14 + + # Tier 1, per minute data. Set to 0 for no limit. + dbengine tier 1 disk space MB = 1024 + dbengine tier 1 retention days = 90 + + # Tier 2, per hour data. Set to 0 for no limit. + dbengine tier 2 disk space MB = 1024 + dbengine tier 2 retention days = 730 +``` + +## Monitoring Retention Utilization + +Netdata provides a visual representation of storage utilization for both time and space limits across all tiers within +the 'dbengine retention' subsection of the 'Netdata Monitoring' section on the dashboard. This chart shows exactly how +your storage space (disk space limits) and time (time limits) are used for metric retention. + +## Legacy configuration + +### v1.45.6 and prior + +Netdata versions prior to v1.46.0 relied on disk space-based retention only. + +**Default Retention Limits**: + +| Tier | Resolution | Size Limit | +|:----:|:-------------------:|:----------:| +| 0 | high (per second) | 256 MB | +| 1 | middle (per minute) | 128 MB | +| 2 | low (per hour) | 64 MB | + +You can change these limits in `netdata.conf`: + +``` +[db] + mode = dbengine + storage tiers = 3 + + # Tier 0, per second data + dbengine multihost disk space MB = 256 + + # Tier 1, per minute data + dbengine tier 1 multihost disk space MB = 1024 + + # Tier 2, per hour data + dbengine tier 2 multihost disk space MB = 1024 +``` + +### v1.35.1 and prior + +These versions of the Agent do not support tiers.
You could change the metric retention for the parent and +all of its children only with the `dbengine multihost disk space MB` setting. This setting accounts for the space allocated +to the parent node and all of its children. + +To configure the database engine, look for the `page cache size MB` and `dbengine multihost disk space MB` settings in +the `[db]` section of your `netdata.conf`. + +```conf +[db] + dbengine page cache size MB = 32 + dbengine multihost disk space MB = 256 +``` + +### v1.23.2 and prior + +_For Netdata Agents earlier than v1.23.2_, the Agent on the parent node uses one dbengine instance for itself, and +another instance for every child node it receives metrics from. If you had four child nodes streaming to it, you would have five +instances in total (`1 parent + 4 child nodes = 5 instances`). + +The Agent allocates resources for each instance separately using the `dbengine disk space MB` (**deprecated**) setting. +If `dbengine disk space MB` (**deprecated**) is set to the default `256`, each instance is given 256 MiB in disk space, +which means the total disk space required to store all instances is, +roughly, `256 MiB * (1 parent + 4 child nodes) = 1280 MiB`. + +#### Backward compatibility + +All existing metrics belonging to child nodes are automatically converted to legacy dbengine instances and the localhost +metrics are transferred to the multihost dbengine instance. + +All new child nodes are automatically transferred to the multihost dbengine instance and share its page cache and disk +space. If you want to migrate a child node from its legacy dbengine instance to the multihost dbengine instance, you +must delete the instance's directory, which is located in `/var/cache/netdata/MACHINE_GUID/dbengine`, after stopping the +Agent.
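As a quick check of the per-instance sizing described for v1.23.2 and prior, the arithmetic can be sketched in shell. This is only an illustration: `legacy_dbengine_total_mib` is a hypothetical helper, not a Netdata command, and it assumes every instance (the parent plus each child) receives the same `dbengine disk space MB` allocation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Netdata): estimates the total disk space,
# in MiB, used by legacy per-node dbengine instances on a parent.
#   $1 = value of the deprecated `dbengine disk space MB` setting
#   $2 = number of child nodes streaming to the parent
legacy_dbengine_total_mib() {
    local per_instance_mib="$1"
    local child_nodes="$2"
    # One instance for the parent itself, plus one per child node.
    echo $(( per_instance_mib * (1 + child_nodes) ))
}

# The example from the text: default 256 MiB per instance, four children.
legacy_dbengine_total_mib 256 4   # prints 1280
```

Treat the result as a rough lower bound when planning disk space on a legacy parent.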
diff --git a/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts.md b/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts.md new file mode 100644 index 000000000..b0094a60f --- /dev/null +++ b/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts.md @@ -0,0 +1,253 @@ +# Organize systems, metrics, and alerts + +When you use Netdata to monitor and troubleshoot an entire infrastructure, you need sophisticated ways of keeping everything organized. +Netdata lets you organize your observability infrastructure with Spaces, Rooms, virtual nodes, host labels, and metric labels. + +## Spaces and Rooms + +[Spaces](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#netdata-cloud-spaces) are used for organization-level or infrastructure-level +grouping of nodes and people. A node can only appear in a single space, while people can have access to multiple spaces. + +The [Rooms](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#netdata-cloud-rooms) in a space bring together nodes and people in +collaboration areas. Rooms can also be used for fine-tuned +[role based access control](/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md). + +## Virtual nodes + +Netdata’s virtual nodes functionality allows you to define nodes in configuration files and have them be treated as regular nodes +in all of the UI, dashboards, tabs, filters etc. For example, you can create a virtual node for each of your Windows machines +and monitor them as discrete entities. Virtual nodes can help you simplify your infrastructure monitoring and focus on the +individual node that matters.
+ +To define your windows server as a virtual node you need to: + + * Define virtual nodes in `/etc/netdata/vnodes/vnodes.conf` + + ```yaml + - hostname: win_server1 + guid: <value> + ``` + Just remember to use a valid guid (On Linux you can use `uuidgen` command to generate one, on Windows just use the `[guid]::NewGuid()` command in PowerShell) + + * Add the vnode config to the data collection job. e.g. in `go.d/windows.conf`: + ```yaml + jobs: + - name: win_server1 + vnode: win_server1 + url: http://203.0.113.10:9182/metrics + ``` + +## Host labels + +Host labels can be extremely useful when: + +- You need alerts that adapt to the system's purpose +- You need properly-labeled metrics archiving so you can sort, correlate, and mash-up your data to your heart's content. +- You need to keep tabs on ephemeral Docker containers in a Kubernetes cluster. + +Let's take a peek into how to create host labels and apply them across a few of Netdata's features to give you more +organization power over your infrastructure. + +### Default labels + +When Netdata starts, it captures relevant information about the system and converts them into automatically generated +host labels. You can use these to logically organize your systems via health entities, exporting metrics, +parent-child status, and more. + +They capture the following: + +- Kernel version +- Operating system name and version +- CPU architecture, system cores, CPU frequency, RAM, and disk space +- Whether Netdata is running inside of a container, and if so, the OS and hardware details about the container's host +- Whether Netdata is running inside K8s node +- What virtualization layer the system runs on top of, if any +- Whether the system is a streaming parent or child + +If you want to organize your systems without manually creating host labels, try the automatic labels in some of the +features below. You can see them under `http://HOST-IP:19999/api/v1/info`, beginning with an underscore `_`. +```json +{ + ... 
+ "host_labels": { + "_is_k8s_node": "false", + "_is_parent": "false", + ... +``` + +### Custom labels + +Host labels are defined in `netdata.conf`. To create host labels, open that file using `edit-config`. + +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config netdata.conf +``` + +Create a new `[host labels]` section defining a new host label and its value for the system in question. Make sure not +to violate any of the [host label naming rules](/docs/netdata-agent/configuration/common-configuration-changes.md#organize-nodes-with-host-labels). + +```conf +[host labels] + type = webserver + location = us-seattle + installed = 20200218 +``` + +Once you've written a few host labels, you need to enable them. Instead of restarting the entire Netdata service, you +can reload labels using the helpful `netdatacli` tool: + +```bash +netdatacli reload-labels +``` + +Your host labels will now be enabled. You can double-check these by using `curl http://HOST-IP:19999/api/v1/info` to +read the status of your agent. For example, from a VPS system running Debian 10: + +```json +{ + ... + "host_labels": { + "_is_k8s_node": "false", + "_is_parent": "false", + "_virt_detection": "systemd-detect-virt", + "_container_detection": "none", + "_container": "unknown", + "_virtualization": "kvm", + "_architecture": "x86_64", + "_kernel_version": "4.19.0-6-amd64", + "_os_version": "10 (buster)", + "_os_name": "Debian GNU/Linux", + "type": "webserver", + "location": "us-seattle", + "installed": "20200218" + }, + ... +} +``` + + +### Host labels in streaming + +You may have noticed the `_is_parent` automatic label above; a matching `_is_child` label also exists. Host labels are +streamed from a child to its parent node, which concentrates an entire infrastructure's OS, hardware, container, +and virtualization information in one place: the parent.
+ +Now, if you'd like to remind yourself of how much RAM a certain child node has, you can access +`http://localhost:19999/host/CHILD_HOSTNAME/api/v1/info` and reference the automatically-generated host labels from the +child system. It's a vastly simplified way of accessing critical information about your infrastructure. + +> ⚠️ Because automatic labels for child nodes are accessible via API calls, and contain sensitive information like +> kernel and operating system versions, you should secure streaming connections with SSL. See the [streaming +> documentation](/src/streaming/README.md#securing-streaming-communications) for details. You may also want to use +> [access lists](/src/web/server/README.md#access-lists) or [expose the API only to LAN/localhost +> connections](/docs/netdata-agent/securing-netdata-agents.md#expose-netdata-only-in-a-private-lan). + +You can also use `_is_parent`, `_is_child`, and any other host labels in both health entities and metrics +exporting. Speaking of which... + +### Host labels in alerts + +You can use host labels to logically organize your systems by their type, purpose, or location, and then apply specific +alerts to them. 
+ +For example, let's use the configuration example from earlier: + +```conf +[host labels] + type = webserver + location = us-seattle + installed = 20200218 +``` + +You could now create a new health entity (checking if disk space will run out soon) that applies only to hosts +labeled `webserver`: + +```yaml + template: disk_fill_rate + on: disk.space + lookup: max -1s at -30m unaligned of avail + calc: ($this - $avail) / (30 * 60) + every: 15s + host labels: type = webserver +``` + +Or, by using one of the automatic labels, to target only systems running a specific OS: + +```yaml + host labels: _os_name = Debian* +``` + +In a streaming configuration where a parent node is triggering alerts for its child nodes, you could create health +entities that apply only to child nodes: + +```yaml + host labels: _is_child = true +``` + +Or when ephemeral Docker nodes are involved: + +```yaml + host labels: _container = docker +``` + +Of course, there are many more possibilities for intuitively organizing your systems with host labels. See the [health +documentation](/src/health/REFERENCE.md#alert-line-host-labels) for more details, and then get creative! + +### Host labels in metrics exporting + +If you have enabled any metrics exporting via our experimental [exporters](/src/exporting/README.md), any new host +labels you created manually are sent to the destination database alongside metrics. You can change this behavior by +editing `exporting.conf`, and you can even send automatically-generated labels along with exported metrics.
+ +```conf +[exporting:global] +enabled = yes +send configured labels = yes +send automatic labels = no +``` + +You can also change this behavior per exporting connection: + +```conf +[opentsdb:my_instance3] +enabled = yes +destination = localhost:4242 +data source = sum +update every = 10 +send charts matching = system.cpu +send configured labels = no +send automatic labels = yes +``` + +By applying labels to exported metrics, you can more easily parse historical metrics with the labels applied. To learn +more about exporting, read the [documentation](/src/exporting/README.md). + +## Metric labels + +The Netdata aggregate charts allow you to filter and group metrics based on label name-value pairs. + +All go.d plugin collectors support the specification of labels at the "collection job" level. Some collectors come with out-of-the-box +labels (e.g. generic Prometheus collector, Kubernetes, Docker and more). You can also add your own custom labels by configuring +the data collection jobs. + +For example, suppose we have a single Netdata agent, collecting data from two remote Apache web servers, located in different data centers. +The web servers are load balanced and provide access to the service "Payments". + +You can define the following in `go.d.conf`, to be able to group the web requests by service or location: + +``` +jobs: + - name: mywebserver1 + url: http://host1/server-status?auto + labels: + service: "Payments" + location: "Atlanta" + - name: mywebserver2 + url: http://host2/server-status?auto + labels: + service: "Payments" + location: "New York" +``` + +Of course, you may define as many custom label/value pairs as you like, in as many data collection jobs as you need.
diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/README.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/README.md new file mode 100644 index 000000000..00fe63af1 --- /dev/null +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/README.md @@ -0,0 +1,34 @@ +# Running the Netdata Agent behind a reverse proxy + +If you need to access a Netdata agent's user interface or API in a production environment we recommend you put Netdata behind +another web server and secure access to the dashboard via SSL, user authentication and firewall rules. + +A dedicated web server also provides more robustness and capabilities than the Agent's [internal web server](/src/web/README.md). + +We have documented running behind +[nginx](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md), +[Apache](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md), +[HAProxy](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-haproxy.md), +[Lighttpd](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md), +[Caddy](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-caddy.md), +and [H2O](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md). +If you prefer a different web server, we suggest you follow the documentation for nginx and tell us how you did it + by adding your own "Running behind webserverX" document. + +When you run Netdata behind a reverse proxy, we recommend you firewall protect all your Netdata servers, so that only the web server IP will be allowed to directly access Netdata. 
To do this, run this on each of your servers (or use your firewall manager): + +```sh +PROXY_IP="1.2.3.4" +iptables -t filter -I INPUT -p tcp --dport 19999 \! -s ${PROXY_IP} -m conntrack --ctstate NEW -j DROP +``` + +The above will prevent anyone except your web server from accessing a Netdata dashboard running on the host. + +You can also use `netdata.conf`: + +``` +[web] + allow connections from = localhost 1.2.3.4 +``` + +Of course, you can add more IPs. diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md new file mode 100644 index 000000000..1f7274d5c --- /dev/null +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md @@ -0,0 +1,363 @@ +# Netdata via Apache's mod_proxy + +Below you can find instructions for configuring an apache server to: + +1. Proxy a single Netdata via an HTTP and HTTPS virtual host. +2. Dynamically proxy any number of Netdata servers. +3. Add user authentication. +4. Adjust Netdata settings to get optimal results. + +## Requirements + +Make sure your apache has `mod_proxy` and `mod_proxy_http` installed and enabled. + +On Debian/Ubuntu systems, install apache, which already includes the two modules, using: + +```sh +sudo apt-get install apache2 +``` + +Enable them: + +```sh +sudo a2enmod proxy +sudo a2enmod proxy_http +``` + +Also, enable the rewrite module: + +```sh +sudo a2enmod rewrite +``` +## Netdata on an existing virtual host + +On any **existing** and already **working** apache virtual host, you can redirect requests for URL `/netdata/` to one or more Netdata servers. + +### Proxy one Netdata, running on the same server apache runs + +Add the following on top of any existing virtual host. It will allow you to access Netdata as `http://virtual.host/netdata/`. 
+ +```conf +<VirtualHost *:80> + + RewriteEngine On + ProxyRequests Off + ProxyPreserveHost On + + <Proxy *> + Require all granted + </Proxy> + + # Local Netdata server accessed with '/netdata/', at localhost:19999 + ProxyPass "/netdata/" "http://localhost:19999/" connectiontimeout=5 timeout=30 keepalive=on + ProxyPassReverse "/netdata/" "http://localhost:19999/" + + # if the user did not give the trailing /, add it + # for HTTP (if the virtualhost is HTTP, use this) + RewriteRule ^/netdata$ http://%{HTTP_HOST}/netdata/ [L,R=301] + # for HTTPS (if the virtualhost is HTTPS, use this) + #RewriteRule ^/netdata$ https://%{HTTP_HOST}/netdata/ [L,R=301] + + # rest of virtual host config here + +</VirtualHost> +``` + +### Proxy multiple Netdata running on multiple servers + +Add the following on top of any existing virtual host. It will allow you to access multiple Netdata as `http://virtual.host/netdata/HOSTNAME/`, where `HOSTNAME` is the hostname of any other Netdata server you have (to access the `localhost` Netdata, use `http://virtual.host/netdata/localhost/`). + +```conf +<VirtualHost *:80> + + RewriteEngine On + ProxyRequests Off + ProxyPreserveHost On + + <Proxy *> + Require all granted + </Proxy> + + # proxy any host, on port 19999 + ProxyPassMatch "^/netdata/([A-Za-z0-9\._-]+)/(.*)" "http://$1:19999/$2" connectiontimeout=5 timeout=30 keepalive=on + + # make sure the user did not forget to add a trailing / + # for HTTP (if the virtualhost is HTTP, use this) + RewriteRule "^/netdata/([A-Za-z0-9\._-]+)$" http://%{HTTP_HOST}/netdata/$1/ [L,R=301] + # for HTTPS (if the virtualhost is HTTPS, use this) + RewriteRule "^/netdata/([A-Za-z0-9\._-]+)$" https://%{HTTP_HOST}/netdata/$1/ [L,R=301] + + # rest of virtual host config here + +</VirtualHost> +``` + +> IMPORTANT<br/> +> The above config allows your apache users to connect to port 19999 on any server on your network. 
+ +If you want to control the servers your users can connect to, replace the `ProxyPassMatch` line with the following. This allows only `server1`, `server2`, `server3` and `server4`. + +``` + ProxyPassMatch "^/netdata/(server1|server2|server3|server4)/(.*)" "http://$1:19999/$2" connectiontimeout=5 timeout=30 keepalive=on +``` + +## Netdata on a dedicated virtual host + +You can proxy Netdata through apache, using a dedicated apache virtual host. + +Create a new apache site: + +```sh +nano /etc/apache2/sites-available/netdata.conf +``` + +with this content: + +```conf +<VirtualHost *:80> + ProxyRequests Off + ProxyPreserveHost On + + ServerName netdata.domain.tld + + <Proxy *> + Require all granted + </Proxy> + + ProxyPass "/" "http://localhost:19999/" connectiontimeout=5 timeout=30 keepalive=on + ProxyPassReverse "/" "http://localhost:19999/" + + ErrorLog ${APACHE_LOG_DIR}/netdata-error.log + CustomLog ${APACHE_LOG_DIR}/netdata-access.log combined +</VirtualHost> +``` + +Enable the VirtualHost: + +```sh +sudo a2ensite netdata.conf && service apache2 reload +``` + +## Netdata proxy in Plesk + +_Assuming the main goal is to make Netdata running in HTTPS._ + +1. Make a subdomain for Netdata on which you enable and force HTTPS - You can use a free Let's Encrypt certificate +2. Go to "Apache & nginx Settings", and in the following section, add: + +```conf +RewriteEngine on +RewriteRule (.*) http://localhost:19999/$1 [P,L] +``` + +3. Optional: If your server is remote, then just replace "localhost" with your actual hostname or IP, it just works. + +Repeat the operation for as many servers as you need. + +## Enable Basic Auth + +If you wish to add an authentication (user/password) to access your Netdata, do these: + +Install the package `apache2-utils`. On Debian/Ubuntu run `sudo apt-get install apache2-utils`. 
+ +Then, generate a password for user `netdata`, using `htpasswd -c /etc/apache2/.htpasswd netdata`. + +**Apache 2.2 Example:**\ +Modify the virtual host with these: + +```conf + # replace the <Proxy *> section + <Proxy *> + Order deny,allow + Allow from all + </Proxy> + + # add a <Location /netdata/> section + <Location /netdata/> + AuthType Basic + AuthName "Protected site" + AuthUserFile /etc/apache2/.htpasswd + Require valid-user + Order deny,allow + Allow from all + </Location> +``` + +Specify `Location /` if Netdata is running on a dedicated virtual host. + +**Apache 2.4 (dedicated virtual host) Example:** + +```conf +<VirtualHost *:80> + RewriteEngine On + ProxyRequests Off + ProxyPreserveHost On + + ServerName netdata.domain.tld + + <Proxy *> + AllowOverride None + AuthType Basic + AuthName "Protected site" + AuthUserFile /etc/apache2/.htpasswd + Require valid-user + </Proxy> + + ProxyPass "/" "http://localhost:19999/" connectiontimeout=5 timeout=30 keepalive=on + ProxyPassReverse "/" "http://localhost:19999/" + + ErrorLog ${APACHE_LOG_DIR}/netdata-error.log + CustomLog ${APACHE_LOG_DIR}/netdata-access.log combined +</VirtualHost> +``` + +Note: Changes are applied by reloading or restarting Apache. + +## Configuration of Content Security Policy + +If you want to enable CSP within your Apache, you should consider some special requirements of the headers. Modify your configuration like this: + +``` + Header always set Content-Security-Policy "default-src http: 'unsafe-inline' 'self' 'unsafe-eval'; script-src http: 'unsafe-inline' 'self' 'unsafe-eval'; style-src http: 'self' 'unsafe-inline'" +``` + +Note: Changes are applied by reloading or restarting Apache. + +## Using Netdata with Apache's `mod_evasive` module + +The `mod_evasive` Apache module helps system administrators protect their web server from brute force and distributed +denial-of-service (DDoS) attacks. 
+ +Because Netdata sends a request to the web server for every chart update, it's normal to create 20-30 requests per +second, per client. If you're using `mod_evasive` on your Apache web server, this volume of requests will trigger the +module's protection, and your dashboard will become unresponsive. You may even begin to see 403 errors. + +To mitigate this issue, you will need to change the value of the `DOSPageCount` option in your `mod_evasive.conf` file, +which can typically be found at `/etc/httpd/conf.d/mod_evasive.conf` or `/etc/apache2/mods-enabled/evasive.conf`. + +The `DOSPageCount` option sets the limit of the number of requests from a single IP address for the same page per page +interval, which is usually 1 second. The default value is `2` requests per second. Clearly, Netdata's typical usage will +exceed that threshold, and `mod_evasive` will add your IP address to a blocklist. + +Our users have found success by setting `DOSPageCount` to `30`. Try this, and raise the value if you continue to see 403 +errors while accessing the dashboard. + +```conf +DOSPageCount 30 +``` + +Restart Apache with `sudo systemctl restart apache2`, or the appropriate method to restart services on your system, to +reload its configuration with your new values. + +### Virtual host + +To adjust the `DOSPageCount` for a specific virtual host, open your virtual host config, which can be found at +`/etc/httpd/conf/sites-available/my-domain.conf` or `/etc/apache2/sites-available/my-domain.conf` and add the +following: + +```conf +<VirtualHost *:80> + ... + # Increase the DOSPageCount to prevent 403 errors and IP addresses being blocked. + <IfModule mod_evasive20.c> + DOSPageCount 30 + </IfModule> +</VirtualHost> +``` + +See issues [#2011](https://github.com/netdata/netdata/issues/2011) and +[#7658](https://github.com/netdata/netdata/issues/7568) for more information. + +# Netdata configuration + +You might edit `/etc/netdata/netdata.conf` to optimize your setup a bit. 
For applying these changes you need to restart Netdata. + +## Response compression + +If you plan to use Netdata exclusively via apache, you can gain some performance by preventing double compression of its output (Netdata compresses its response, apache re-compresses it) by editing `/etc/netdata/netdata.conf` and setting: + +``` +[web] + enable gzip compression = no +``` + +Once you disable compression at Netdata (and restart it), please verify you receive compressed responses from apache (it is important to receive compressed responses - the charts will be more snappy). + +## Limit direct access to Netdata + +You would also need to instruct Netdata to listen only on `localhost`, `127.0.0.1` or `::1`. + +``` +[web] + bind to = localhost +``` + +or + +``` +[web] + bind to = 127.0.0.1 +``` + +or + +``` +[web] + bind to = ::1 +``` + + + +You can also use a unix domain socket. This will also provide a faster route between apache and Netdata: + +``` +[web] + bind to = unix:/tmp/netdata.sock +``` + +Apache 2.4.24+ can not read from `/tmp` so create your socket in `/var/run/netdata` + +``` +[web] + bind to = unix:/var/run/netdata/netdata.sock +``` + +_note: Netdata v1.8+ support unix domain sockets_ + +At the apache side, prepend the 2nd argument to `ProxyPass` with `unix:/tmp/netdata.sock|`, like this: + +``` +ProxyPass "/netdata/" "unix:/tmp/netdata.sock|http://localhost:19999/" connectiontimeout=5 timeout=30 keepalive=on +``` + + + +If your apache server is not on localhost, you can set: + +``` +[web] + bind to = * + allow connections from = IP_OF_APACHE_SERVER +``` + +*note: Netdata v1.9+ support `allow connections from`* + +`allow connections from` accepts [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) to match against the connection IP address. + +## Prevent the double access.log + +apache logs accesses and Netdata logs them too. 
You can prevent Netdata from generating its access log, by setting this in `/etc/netdata/netdata.conf`: + +``` +[logs] + access = off +``` + +## Troubleshooting mod_proxy + +Make sure the requests reach Netdata, by examining `/var/log/netdata/access.log`. + +1. if the requests do not reach Netdata, your apache does not forward them. +2. if the requests reach Netdata but the URLs are wrong, you have not re-written them properly. + + diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-caddy.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-caddy.md new file mode 100644 index 000000000..b7608b309 --- /dev/null +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-caddy.md @@ -0,0 +1,38 @@ +<!-- +title: "Netdata via Caddy" +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/Running-behind-caddy.md" +sidebar_label: "Netdata via Caddy" +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Configuration/Secure your nodes" +--> + +# Netdata via Caddy + +To run Netdata via [Caddy v2 proxying,](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy) set your Caddyfile up like this: + +```caddyfile +netdata.domain.tld { + reverse_proxy localhost:19999 +} +``` + +Other directives can be added between the curly brackets as needed. + +To run Netdata in a subfolder: + +```caddyfile +netdata.domain.tld { + handle_path /netdata/* { + reverse_proxy localhost:19999 + } +} +``` + +## limit direct access to Netdata + +You would also need to instruct Netdata to listen only to `127.0.0.1` or `::1`. + +To limit access to Netdata only from localhost, set `bind socket to IP = 127.0.0.1` or `bind socket to IP = ::1` in `/etc/netdata/netdata.conf`. 
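+
+In current `netdata.conf` layouts the listen address is usually set with `bind to` under `[web]`, which is the form the other reverse proxy guides in this set use. A minimal sketch, assuming Caddy runs on the same host as Netdata and that your Agent version accepts this option name:
+
+```conf
+[web]
+    # assumption: Caddy and Netdata share a host, so loopback is enough
+    bind to = 127.0.0.1 ::1
+```
+
+Restart Netdata after the change, then confirm that port 19999 is no longer reachable from other machines.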
+ + diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md new file mode 100644 index 000000000..276b72e8b --- /dev/null +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md @@ -0,0 +1,187 @@ +<!-- +title: "Running Netdata behind H2O" +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md" +sidebar_label: "Running Netdata behind H2O" +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Configuration/Secure your nodes" +--> + +# Running Netdata behind H2O + +[H2O](https://h2o.examp1e.net/) is a new generation HTTP server that provides quicker responses to users with less CPU utilization when compared to older generations of web servers. + +It is notable for having much simpler configuration than many popular HTTP servers, low resource requirements, and integrated native support for many things that other HTTP servers may need special setup to use. + +## Why H2O + +- Sane configuration defaults mean that typical configurations are very minimalistic and easy to work with. + +- Native support for HTTP/2 provides improved performance when accessing the Netdata dashboard remotely. + +- Password protect access to the Netdata dashboard without requiring Netdata Cloud. + +## H2O configuration file + +On most systems, the H2O configuration is found under `/etc/h2o`. H2O uses [YAML 1.1](https://yaml.org/spec/1.1/), with a few special extensions, for its configuration files, with the main configuration file being `/etc/h2o/h2o.conf`. + +You can edit the H2O configuration file with Nano, Vim or any other text editor you are comfortable with. 
+ +After making changes to the configuration files, perform the following: + +- Test the configuration with `h2o -m test -c /etc/h2o/h2o.conf` + +- Restart H2O to apply the changes with `/etc/init.d/h2o restart` or `service h2o restart` + +## Ways to access Netdata via H2O + +### As a virtual host + +With this method instead of `SERVER_IP_ADDRESS:19999`, the Netdata dashboard can be accessed via a human-readable URL such as `netdata.example.com` used in the configuration below. + +```yaml +hosts: + netdata.example.com: + listen: + port: 80 + paths: + /: + proxy.preserve-host: ON + proxy.reverse.url: http://127.0.0.1:19999 +``` + +### As a subfolder of an existing virtual host + +This method is recommended when Netdata is to be served from a subfolder (or directory). +In this case, the virtual host `netdata.example.com` already exists and Netdata has to be accessed via `netdata.example.com/netdata/`. + +```yaml +hosts: + netdata.example.com: + listen: + port: 80 + paths: + /netdata: + redirect: + status: 301 + url: /netdata/ + /netdata/: + proxy.preserve-host: ON + proxy.reverse.url: http://127.0.0.1:19999 +``` + +### As a subfolder for multiple Netdata servers, via one H2O instance + +This is the recommended configuration when one H2O instance will be used to manage multiple Netdata servers via subfolders. + +```yaml +hosts: + netdata.example.com: + listen: + port: 80 + paths: + /netdata/server1: + redirect: + status: 301 + url: /netdata/server1/ + /netdata/server1/: + proxy.preserve-host: ON + proxy.reverse.url: http://198.51.100.1:19999 + /netdata/server2: + redirect: + status: 301 + url: /netdata/server2/ + /netdata/server2/: + proxy.preserve-host: ON + proxy.reverse.url: http://198.51.100.2:19999 +``` + +Of course you can add as many backend servers as you like. 
+ +Using the above, you access Netdata on the backend servers, like this: + +- `http://netdata.example.com/netdata/server1/` to reach Netdata on `198.51.100.1:19999` +- `http://netdata.example.com/netdata/server2/` to reach Netdata on `198.51.100.2:19999` + +### Encrypt the communication between H2O and Netdata + +In case Netdata's web server has been [configured to use TLS](/src/web/server/README.md#enabling-tls-support), it is +necessary to specify inside the H2O configuration that the final destination is using TLS. To do this, change the +`http://` on the `proxy.reverse.url` line in your H2O configuration to `https://`. + +### Enable authentication + +Create an authentication file to enable basic authentication via H2O. This secures your Netdata dashboard. + +If you don't have an authentication file, you can use the following command: + +```sh +printf "yourusername:$(openssl passwd -apr1)" > /etc/h2o/passwords +``` + +And then add a basic authentication handler to each path definition: + +```yaml +hosts: + netdata.example.com: + listen: + port: 80 + paths: + /: + mruby.handler: | + require "htpasswd.rb" + Htpasswd.new("/etc/h2o/passwords", "netdata.example.com") + proxy.preserve-host: ON + proxy.reverse.url: http://127.0.0.1:19999 +``` + +For more information on using basic authentication with H2O, see [their official documentation](https://h2o.examp1e.net/configure/basic_auth.html). + +## Limit direct access to Netdata + +If your H2O server is on `localhost`, you can use this to ensure external access is only possible through H2O: + +``` +[web] + bind to = 127.0.0.1 ::1 +``` + + + +You can also use a unix domain socket. 
This will provide faster communication between H2O and Netdata as well: + +``` +[web] + bind to = unix:/run/netdata/netdata.sock +``` + +In the H2O configuration, use a line like the following to connect to Netdata via the unix socket: + +```yaml +proxy.reverse.url: http://[unix:/run/netdata/netdata.sock] +``` + + + +If your H2O server is not on localhost, you can set: + +``` +[web] + bind to = * + allow connections from = IP_OF_H2O_SERVER +``` + +*note: Netdata v1.9+ support `allow connections from`* + +`allow connections from` accepts [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) to match against +the connection IP address. + +## Prevent the double access.log + +H2O logs accesses and Netdata logs them too. You can prevent Netdata from generating its access log, by setting +this in `/etc/netdata/netdata.conf`: + +``` +[logs] + access = off +``` diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-haproxy.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-haproxy.md new file mode 100644 index 000000000..9d2aff670 --- /dev/null +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-haproxy.md @@ -0,0 +1,297 @@ +<!-- +title: "Netdata via HAProxy" +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-haproxy.md" +sidebar_label: "Netdata via HAProxy" +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Configuration/Secure your nodes" +--> + +# Netdata via HAProxy + +> HAProxy is a free, very fast and reliable solution offering high availability, load balancing, +> and proxying for TCP and HTTP-based applications. It is particularly suited for very high traffic websites +> and powers quite a number of the world's most visited ones. 
+ +If Netdata is running on a host running HAProxy, rather than connecting to Netdata from a port number, a domain name can +be pointed at HAProxy, and HAProxy can redirect connections to the Netdata port. This can make it possible to connect to +Netdata at `https://example.com` or `https://example.com/netdata/`, which is a much nicer experience than +`http://example.com:19999`. + +To proxy requests from [HAProxy](https://github.com/haproxy/haproxy) to Netdata, +the following configuration can be used: + +## Default Configuration + +For all examples, set the mode to `http` + +```conf +defaults + mode http +``` + +## Simple Configuration + +A simple example where the base URL, say `http://example.com`, is used with no subpath: + +### Frontend + +Create a frontend to receive the request. + +```conf +frontend http_frontend + ## HTTP ipv4 and ipv6 on all ips ## + bind :::80 v4v6 + + default_backend netdata_backend +``` + +### Backend + +Create the Netdata backend which will send requests to port `19999`. + +```conf +backend netdata_backend + option forwardfor + server netdata_local 127.0.0.1:19999 + + http-request set-header Host %[src] + http-request set-header X-Forwarded-For %[src] + http-request set-header X-Forwarded-Port %[dst_port] + http-request set-header Connection "keep-alive" +``` + +## Configuration with subpath + +An example where the base URL is used with a subpath `/netdata/`: + +### Frontend + +To use a subpath, create an ACL, which will set a variable based on the subpath. + +```conf +frontend http_frontend + ## HTTP ipv4 and ipv6 on all ips ## + bind :::80 v4v6 + + # URL begins with /netdata + acl is_netdata url_beg /netdata + + # if trailing slash is missing, redirect to /netdata/ + http-request redirect scheme https drop-query append-slash if is_netdata ! 
{ path_beg /netdata/ } + + ## Backends ## + use_backend netdata_backend if is_netdata + + # Other requests go here (optional) + # put netdata_backend here if no others are used + default_backend www_backend +``` + +### Backend + +Same as simple example, except remove `/netdata/` with regex. + +```conf +backend netdata_backend + option forwardfor + server netdata_local 127.0.0.1:19999 + + http-request set-path %[path,regsub(^/netdata/,/)] + + http-request set-header Host %[src] + http-request set-header X-Forwarded-For %[src] + http-request set-header X-Forwarded-Port %[dst_port] + http-request set-header Connection "keep-alive" +``` + +## Using TLS communication + +TLS can be used by adding port `443` and a cert to the frontend. +This example will only use Netdata if host matches example.com (replace with your domain). + +### Frontend + +This frontend uses a certificate list. + +```conf +frontend https_frontend + ## HTTP ## + bind :::80 v4v6 + # Redirect all HTTP traffic to HTTPS with 301 redirect + redirect scheme https code 301 if !{ ssl_fc } + + ## HTTPS ## + # Bind to all v4/v6 addresses, use a list of certs in file + bind :::443 v4v6 ssl crt-list /etc/letsencrypt/certslist.txt + + ## ACL ## + # Optionally check host for Netdata + acl is_example_host hdr_sub(host) -i example.com + + ## Backends ## + use_backend netdata_backend if is_example_host + # Other requests go here (optional) + default_backend www_backend +``` + +In the cert list file place a mapping from a certificate file to the domain used: + +`/etc/letsencrypt/certslist.txt`: + +```txt +example.com /etc/letsencrypt/live/example.com/example.com.pem +``` + +The file `/etc/letsencrypt/live/example.com/example.com.pem` should contain the certificate chain and +private key (in that order) concatenated into a single `.pem` file: + +```sh +cat /etc/letsencrypt/live/example.com/fullchain.pem \ + /etc/letsencrypt/live/example.com/privkey.pem > \ + /etc/letsencrypt/live/example.com/example.com.pem +``` + +### Backend + +Same as 
simple, except set protocol `https`. + +```conf +backend netdata_backend + option forwardfor + server netdata_local 127.0.0.1:19999 + + http-request add-header X-Forwarded-Proto https + http-request set-header Host %[src] + http-request set-header X-Forwarded-For %[src] + http-request set-header X-Forwarded-Port %[dst_port] + http-request set-header Connection "keep-alive" +``` + +## Enable authentication + +To use basic HTTP Authentication, create an authentication list: + +```conf +# HTTP Auth +userlist basic-auth-list + group is-admin + # Plaintext password + user admin password passwordhere groups is-admin +``` + +You can create a hashed password using the `mkpasswd` utility. + +```sh + printf "passwordhere" | mkpasswd --stdin --method=sha-256 +$5$l7Gk0VPIpKO$f5iEcxvjfdF11khw.utzSKqP7W.0oq8wX9nJwPLwzy1 +``` + +Replace `passwordhere` with the hash: + +```conf +user admin password $5$l7Gk0VPIpKO$f5iEcxvjfdF11khw.utzSKqP7W.0oq8wX9nJwPLwzy1 groups is-admin +``` + +Now add at the top of the backend: + +```conf +acl devops-auth http_auth_group(basic-auth-list) is-admin +http-request auth realm netdata_local unless devops-auth +``` + +## Full Example + +Full example configuration with HTTP auth over TLS with subpath: + +```conf +global + maxconn 20000 + + log /dev/log local0 + log /dev/log local1 notice + user haproxy + group haproxy + pidfile /run/haproxy.pid + + stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners + stats timeout 30s + daemon + + tune.ssl.default-dh-param 4096 # Max size of DHE key + + # Default ciphers to use on SSL-enabled listening sockets. 
+ ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS + ssl-default-bind-options no-sslv3 + +defaults + log global + mode http + option httplog + option dontlognull + timeout connect 5000 + timeout client 50000 + timeout server 50000 + errorfile 400 /etc/haproxy/errors/400.http + errorfile 403 /etc/haproxy/errors/403.http + errorfile 408 /etc/haproxy/errors/408.http + errorfile 500 /etc/haproxy/errors/500.http + errorfile 502 /etc/haproxy/errors/502.http + errorfile 503 /etc/haproxy/errors/503.http + errorfile 504 /etc/haproxy/errors/504.http + +frontend https_frontend + ## HTTP ## + bind :::80 v4v6 + # Redirect all HTTP traffic to HTTPS with 301 redirect + redirect scheme https code 301 if !{ ssl_fc } + + ## HTTPS ## + # Bind to all v4/v6 addresses, use a list of certs in file + bind :::443 v4v6 ssl crt-list /etc/letsencrypt/certslist.txt + + ## ACL ## + # Optionally check host for Netdata + acl is_example_host hdr_sub(host) -i example.com + acl is_netdata url_beg /netdata + + http-request redirect scheme https drop-query append-slash if is_netdata ! 
{ path_beg /netdata/ } + + ## Backends ## + use_backend netdata_backend if is_example_host is_netdata + default_backend www_backend + +# HTTP Auth +userlist basic-auth-list + group is-admin + # Hashed password + user admin password $5$l7Gk0VPIpKO$f5iEcxvjfdF11khw.utzSKqP7W.0oq8wX9nJwPLwzy1 groups is-admin + +## Default server(s) (optional)## +backend www_backend + mode http + balance roundrobin + timeout connect 5s + timeout server 30s + timeout queue 30s + + http-request add-header X-Forwarded-Proto https + server other_server 111.111.111.111:80 check + +backend netdata_backend + acl devops-auth http_auth_group(basic-auth-list) is-admin + http-request auth realm netdata_local unless devops-auth + + option forwardfor + server netdata_local 127.0.0.1:19999 + + http-request set-path %[path,regsub(^/netdata/,/)] + + http-request add-header X-Forwarded-Proto https + http-request set-header Host %[src] + http-request set-header X-Forwarded-For %[src] + http-request set-header X-Forwarded-Port %[dst_port] + http-request set-header Connection "keep-alive" +``` + + diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md new file mode 100644 index 000000000..637bc0642 --- /dev/null +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md @@ -0,0 +1,75 @@ +<!-- +title: "Netdata via lighttpd v1.4.x" +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md" +sidebar_label: "Netdata via lighttpd v1.4.x" +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Configuration/Secure your nodes" +--> + +# Netdata via lighttpd v1.4.x + +Here is a config for accessing Netdata in a suburl via lighttpd 1.4.46 and newer: + 
+```txt +$HTTP["url"] =~ "^/netdata/" { + proxy.server = ( "" => ("netdata" => ( "host" => "127.0.0.1", "port" => 19999 ))) + proxy.header = ( "map-urlpath" => ( "/netdata/" => "/") ) +} +``` + +If you have an older lighttpd you have to use a chain (such as below), as explained [at this stackoverflow answer](http://stackoverflow.com/questions/14536554/lighttpd-configuration-to-proxy-rewrite-from-one-domain-to-another). + +```txt +$HTTP["url"] =~ "^/netdata/" { + proxy.server = ( "" => ("" => ( "host" => "127.0.0.1", "port" => 19998 ))) +} + +$SERVER["socket"] == ":19998" { + url.rewrite-once = ( "^/netdata(.*)$" => "/$1" ) + proxy.server = ( "" => ( "" => ( "host" => "127.0.0.1", "port" => 19999 ))) +} +``` + + + +If the only thing the server is exposing via the web is Netdata (and thus no suburl rewriting is required), +then you can get away with just + +``` +proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 19999 ))) +``` + +Though if it's public facing you might then want to put some authentication on it. htdigest support +looks like: + +``` +auth.backend = "htdigest" +auth.backend.htdigest.userfile = "/etc/lighttpd/lighttpd.htdigest" +auth.require = ( "" => ( "method" => "digest", + "realm" => "netdata", + "require" => "valid-user" + ) + ) +``` + +Other auth methods, and more info on htdigest, can be found in lighttpd's [mod_auth docs](http://redmine.lighttpd.net/projects/lighttpd/wiki/Docs_ModAuth). + + + +It seems that lighttpd (or some versions of it) fails to proxy compressed web responses. +To solve this issue, disable web response compression in Netdata. + +Open `/etc/netdata/netdata.conf` and set in [global]\: + +``` +enable web responses gzip compression = no +``` + +## limit direct access to Netdata + +You would also need to instruct Netdata to listen only to `127.0.0.1` or `::1`. + +To limit access to Netdata only from localhost, set `bind socket to IP = 127.0.0.1` or `bind socket to IP = ::1` in `/etc/netdata/netdata.conf`. 
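+
+Put together, the two Netdata-side changes above might look like the following in newer `netdata.conf` layouts. This is a sketch that borrows the `[web]` option names from the Apache guide in this set; check them against your Agent version, since the names quoted above appear to come from older releases.
+
+```conf
+[web]
+    # lighttpd may fail to proxy compressed responses
+    enable gzip compression = no
+    # assumption: lighttpd runs on the same host, so loopback is enough
+    bind to = 127.0.0.1
+```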
+ + diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md new file mode 100644 index 000000000..f2dd137dd --- /dev/null +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md @@ -0,0 +1,282 @@ +# Running Netdata behind Nginx + +## Intro + +[Nginx](https://nginx.org/en/) is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server used to host websites and applications of all sizes. + +The software is known for its low impact on memory resources, high scalability, and its modular, event-driven architecture which can offer secure, predictable performance. + +## Why Nginx + +- By default, Nginx is fast and lightweight out of the box. + +- Nginx is used and useful in cases when you want to access different instances of Netdata from a single server. + +- Password-protect access to Netdata, until distributed authentication is implemented via the Netdata cloud Sign In mechanism. + +- A proxy was necessary to encrypt the communication to Netdata, until v1.16.0, which provided TLS (HTTPS) support. + +## Nginx configuration file + +All Nginx configurations can be found in the `/etc/nginx/` directory. The main configuration file is `/etc/nginx/nginx.conf`. Website or app-specific configurations can be found in the `/etc/nginx/site-available/` directory. + +Configuration options in Nginx are known as directives. Directives are organized into groups known as blocks or contexts. The two terms can be used interchangeably. + +Depending on your installation source, you’ll find an example configuration file at `/etc/nginx/conf.d/default.conf` or `etc/nginx/sites-enabled/default`, in some cases you may have to manually create the `sites-available` and `sites-enabled` directories. 
+ +You can edit the Nginx configuration file with Nano, Vim or any other text editors you are comfortable with. + +After making changes to the configuration files: + +- Test Nginx configuration with `nginx -t`. + +- Restart Nginx to effect the change with `/etc/init.d/nginx restart` or `service nginx restart`. + +## Ways to access Netdata via Nginx + +### As a virtual host + +With this method instead of `SERVER_IP_ADDRESS:19999`, the Netdata dashboard can be accessed via a human-readable URL such as `netdata.example.com` used in the configuration below. + +```conf +upstream backend { + # the Netdata server + server 127.0.0.1:19999; + keepalive 1024; +} + +server { + # nginx listens to this + listen 80; + # uncomment the line if you want nginx to listen on IPv6 address + #listen [::]:80; + + # the virtual host name of this + server_name netdata.example.com; + + location / { + proxy_set_header X-Forwarded-Host $host; + proxy_set_header X-Forwarded-Server $host; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_pass http://backend; + proxy_http_version 1.1; + proxy_pass_request_headers on; + proxy_set_header Connection "keep-alive"; + proxy_store off; + } +} +``` + +### As a subfolder to an existing virtual host + +This method is recommended when Netdata is to be served from a subfolder (or directory). +In this case, the virtual host `netdata.example.com` already exists and Netdata has to be accessed via `netdata.example.com/netdata/`. 
+ +```conf +upstream netdata { + server 127.0.0.1:19999; + keepalive 64; +} + +server { + listen 80; + # uncomment the line if you want nginx to listen on IPv6 address + #listen [::]:80; + + # the virtual host name of this subfolder should be exposed + #server_name netdata.example.com; + + location = /netdata { + return 301 /netdata/; + } + + location ~ /netdata/(?<ndpath>.*) { + proxy_redirect off; + proxy_set_header Host $host; + + proxy_set_header X-Forwarded-Host $host; + proxy_set_header X-Forwarded-Server $host; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_http_version 1.1; + proxy_pass_request_headers on; + proxy_set_header Connection "keep-alive"; + proxy_store off; + proxy_pass http://netdata/$ndpath$is_args$args; + + gzip on; + gzip_proxied any; + gzip_types *; + } +} +``` + +### As a subfolder for multiple Netdata servers, via one Nginx + +This is the recommended configuration when one Nginx will be used to manage multiple Netdata servers via subfolders. 
+ +```conf +upstream backend-server1 { + server 10.1.1.103:19999; + keepalive 64; +} +upstream backend-server2 { + server 10.1.1.104:19999; + keepalive 64; +} + +server { + listen 80; + # uncomment the line if you want nginx to listen on IPv6 address + #listen [::]:80; + + # the virtual host name of this subfolder should be exposed + #server_name netdata.example.com; + + location ~ /netdata/(?<behost>.*?)/(?<ndpath>.*) { + proxy_set_header X-Forwarded-Host $host; + proxy_set_header X-Forwarded-Server $host; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_http_version 1.1; + proxy_pass_request_headers on; + proxy_set_header Connection "keep-alive"; + proxy_store off; + proxy_pass http://backend-$behost/$ndpath$is_args$args; + + gzip on; + gzip_proxied any; + gzip_types *; + } + + # make sure there is a trailing slash at the browser + # or the URLs will be wrong + location ~ /netdata/(?<behost>.*) { + return 301 /netdata/$behost/; + } +} +``` + +Of course you can add as many backend servers as you like. + +Using the above, you access Netdata on the backend servers, like this: + +- `http://netdata.example.com/netdata/server1/` to reach `backend-server1` +- `http://netdata.example.com/netdata/server2/` to reach `backend-server2` + +### Encrypt the communication between Nginx and Netdata + +In case Netdata's web server has been [configured to use TLS](/src/web/server/README.md#enabling-tls-support), it is +necessary to specify inside the Nginx configuration that the final destination is using TLS. To do this, please, append +the following parameters in your `nginx.conf` + +```conf +proxy_set_header X-Forwarded-Proto https; +proxy_pass https://localhost:19999; +``` + +Optionally it is also possible to [enable TLS/SSL on Nginx](http://nginx.org/en/docs/http/configuring_https_servers.html), this way the user will encrypt not only the communication between Nginx and Netdata but also between the user and Nginx. 
+ +If Nginx is not configured as described here, you will probably receive the error `SSL_ERROR_RX_RECORD_TOO_LONG`. + +### Enable authentication + +Create an authentication file to enable basic authentication via Nginx, this secures your Netdata dashboard. + +If you don't have an authentication file, you can use the following command: + +```sh +printf "yourusername:$(openssl passwd -apr1)" > /etc/nginx/passwords +``` + +And then enable the authentication inside your server directive: + +```conf +server { + # ... + auth_basic "Protected"; + auth_basic_user_file passwords; + # ... +} +``` + +## Limit direct access to Netdata + +If your Nginx is on `localhost`, you can use this to protect your Netdata: + +``` +[web] + bind to = 127.0.0.1 ::1 +``` + +You can also use a unix domain socket. This will also provide a faster route between Nginx and Netdata: + +``` +[web] + bind to = unix:/var/run/netdata/netdata.sock +``` + +*note: Netdata v1.8+ support unix domain sockets* + +At the Nginx side, use something like this to use the same unix domain socket: + +```conf +upstream backend { + server unix:/var/run/netdata/netdata.sock; + keepalive 64; +} +``` + + +If your Nginx server is not on localhost, you can set: + +``` +[web] + bind to = * + allow connections from = IP_OF_NGINX_SERVER +``` + +*note: Netdata v1.9+ support `allow connections from`* + +`allow connections from` accepts [Netdata simple patterns](/src/libnetdata/simple_pattern/README.md) to match against the +connection IP address. + +## Prevent the double access.log + +Nginx logs accesses and Netdata logs them too. You can prevent Netdata from generating its access log, by setting this in `/etc/netdata/netdata.conf`: + +``` +[logs] + access = off +``` + +## Use gzip compression + +By default, netdata compresses its responses. You can have nginx do that instead, with the following options in the `location /` block: + +```conf + location / { + ... 
+ gzip on; + gzip_proxied any; + gzip_types *; + } +``` + +To disable Netdata's gzip compression, open `netdata.conf` and in the `[web]` section put: + +```conf +[web] + enable gzip compression = no +``` + +## SELinux + +If you get an 502 Bad Gateway error you might check your Nginx error log: + +```sh +# cat /var/log/nginx/error.log: +2016/09/09 12:34:05 [crit] 5731#5731: *1 connect() to 127.0.0.1:19999 failed (13: Permission denied) while connecting to upstream, client: 1.2.3.4, server: netdata.example.com, request: "GET / HTTP/2.0", upstream: "http://127.0.0.1:19999/", host: "netdata.example.com" +``` + +If you see something like the above, chances are high that SELinux prevents nginx from connecting to the backend server. To fix that, just use this policy: `setsebool -P httpd_can_network_connect true`. + + diff --git a/docs/netdata-agent/securing-netdata-agents.md b/docs/netdata-agent/securing-netdata-agents.md new file mode 100644 index 000000000..4f6ff4094 --- /dev/null +++ b/docs/netdata-agent/securing-netdata-agents.md @@ -0,0 +1,177 @@ +# Securing Netdata Agents
+
+Netdata is a monitoring system. It should be protected, the same way you protect all your admin apps. We assume Netdata
+will be installed privately, for your eyes only.
+
+Upon installation, the Netdata Agent serves the **local dashboard** at port `19999`. If the node is accessible from the
+internet at large, anyone can access the dashboard and your node's metrics at `http://NODE:19999`. We made this decision
+so that the local dashboard is immediately accessible to users, and so that we don't dictate how professionals set up
+and secure their infrastructures.
+
+Viewers will be able to get some information about the system Netdata is running on. This information is everything the dashboard
+provides. The dashboard includes a list of the services each system runs (the legends of the charts under the `Systemd Services`
+section), the applications running (the legends of the charts under the `Applications` section), the disks of the system and
+their names, the user accounts of the system that are running processes (the `Users` and `User Groups` section of the dashboard),
+the network interfaces and their names (not the IPs) and detailed information about the performance of the system and its applications.
+
+This information is not sensitive (meaning that it is not your business data), but **it is important for possible attackers**.
+It gives them clues about what to check and what to try, and in the case of a DDoS against your applications, it tells them
+whether the attack is working.
+
+Also, viewers could use Netdata itself to stress your servers. Although the Netdata daemon runs unprivileged, with the minimum
+process priority (scheduling priority `idle` - lower than nice 19) and adjusts its OutOfMemory (OOM) score to 1000 (so that it
+will be first to be killed by the kernel if the system starves for memory), some pressure can be applied on your systems if
+someone attempts a DDoS against Netdata.
+
+Instead of dictating how to secure your infrastructure, we give you many options to establish security best practices
+that align with your goals and your organization's standards.
+
+- [Disable the local dashboard](#disable-the-local-dashboard): **Simplest and recommended method** for those who have
+ added nodes to Netdata Cloud and view dashboards and metrics there.
+
+- [Expose Netdata only in a private LAN](#expose-netdata-only-in-a-private-lan). Simplest and recommended method for those who do not use Netdata Cloud.
+
+- [Fine-grained access control](#fine-grained-access-control): Allow local dashboard access from
+ only certain IP addresses, such as a trusted static IP or connections from behind a management LAN. Full support for Netdata Cloud.
+
+- [Use a reverse proxy (authenticating web server in proxy mode)](#use-an-authenticating-web-server-in-proxy-mode): Password-protect
+ a local dashboard and enable TLS to secure it. Full support for Netdata Cloud.
+
+- [Use Netdata parents as Web Application Firewalls](#use-netdata-parents-as-web-application-firewalls)
+
+- [Other methods](#other-methods) lists some less common methods of protecting Netdata.
+
+## Disable the local dashboard
+
+This is the _recommended method for those who have connected their nodes to Netdata Cloud_ and prefer viewing real-time
+metrics using the Room Overview, Nodes tab, and Cloud dashboards.
+
+You can disable the local dashboard (and API) but retain the encrypted Agent-Cloud link
+([ACLK](/src/aclk/README.md)) that
+allows you to stream metrics on demand from your nodes via the Netdata Cloud interface. This change mitigates all
+concerns about revealing metrics and system design to the internet at large, while keeping all the functionality you
+need to view metrics and troubleshoot issues with Netdata Cloud.
+
+Open `netdata.conf` with `./edit-config netdata.conf`. Scroll down to the `[web]` section, find the
+`mode = static-threaded` setting, and change it to `none`.
+
+```conf
+[web]
+ mode = none
+```
+
+Save and close the editor, then [restart your Agent](/packaging/installer/README.md#maintaining-a-netdata-agent-installation)
+using `sudo systemctl restart netdata`. If you try to visit the local dashboard at `http://NODE:19999` again, the
+connection will fail because that node no longer serves its local dashboard.
+
+> See the [configuration basics doc](/docs/netdata-agent/configuration/README.md) for details on how to find
+> `netdata.conf` and use `edit-config`.
+
+## Expose Netdata only in a private LAN
+
+If your organisation has a private administration and management LAN, you can bind Netdata to this network interface on all your servers.
+This is done in `netdata.conf` with these settings:
+
+```
+[web]
+ bind to = 10.1.1.1:19999 localhost:19999
+```
+
+You can bind Netdata to multiple IPs and ports. If you use hostnames, Netdata will resolve them and use all the IPs
+(in the above example `localhost` usually resolves to both `127.0.0.1` and `::1`).
+
+**This is the best and the suggested way to protect Netdata**. Your systems **should** have a private administration and management
+LAN, so that all management tasks are performed without any possibility of them being exposed on the internet.
+
+For cloud based installations, if your cloud provider does not provide such a private LAN (or if you use multiple providers),
+you can create a virtual management and administration LAN with tools like `tincd` or `gvpe`. These tools create a mesh VPN
+allowing all servers to communicate securely and privately. Your administration stations join this mesh VPN to get access to
+management and administration tasks on all your cloud servers.
+
+For `gvpe` we have developed a [simple provisioning tool](https://github.com/netdata/netdata-demo-site/tree/master/gvpe) you
+may find handy (it includes statically compiled `gvpe` binaries for Linux and FreeBSD, and also a script to compile `gvpe`
+on your macOS system). We use this to create a management and administration LAN for all Netdata demo sites (spread all over
+the internet using multiple hosting providers).
+
+## Fine-grained access control
+
+If you want to keep using the local dashboard, but don't want it exposed to the internet, you can restrict access with
+[access lists](/src/web/server/README.md#access-lists). This method also fully retains the ability to stream metrics
+on-demand through Netdata Cloud.
+
+The `allow connections from` setting helps you allow only certain IP addresses or FQDN/hostnames, such as a trusted
+static IP, only `localhost`, or connections from behind a management LAN.
+
+By default, this setting is `localhost *`. This setting allows connections from `localhost` in addition to _all_
+connections, using the `*` wildcard. You can change this setting using Netdata's [simple
+patterns](/src/libnetdata/simple_pattern/README.md).
+
+```conf
+[web]
+ # Allow only localhost connections
+ allow connections from = localhost
+
+ # Allow only from management LAN running on `10.X.X.X`
+ allow connections from = 10.*
+
+ # Allow connections only from a specific FQDN/hostname
+ allow connections from = example*
+```
+
+The `allow connections from` setting is global and restricts access to the dashboard, badges, streaming, API, and
+`netdata.conf`, but you can also set each of those access lists more granularly if you choose:
+
+```conf
+[web]
+ allow connections from = localhost *
+ allow dashboard from = localhost *
+ allow badges from = *
+ allow streaming from = *
+ allow netdata.conf from = localhost fd* 10.* 192.168.* 172.16.* 172.17.* 172.18.* 172.19.* 172.20.* 172.21.* 172.22.* 172.23.* 172.24.* 172.25.* 172.26.* 172.27.* 172.28.* 172.29.* 172.30.* 172.31.*
+ allow management from = localhost
+```
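+
+As a sketch of how these granular lists might be combined, the fragment below keeps the dashboard on a management LAN while
+accepting streaming only from a separate subnet. The `10.1.1.*` and `10.2.*` ranges are assumptions chosen for illustration;
+substitute the ranges of your own networks.
+
+```conf
+[web]
+    # hypothetical ranges: 10.1.1.* is the management LAN, 10.2.* hosts the child nodes
+    allow connections from = localhost 10.1.1.* 10.2.*
+    allow dashboard from = localhost 10.1.1.*
+    allow streaming from = 10.2.*
+    allow netdata.conf from = localhost
+    allow management from = localhost
+```
+
+Since `allow connections from` is global, a client has to match it as well as the list of the feature it is trying to reach.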
+
+See the [web server](/src/web/server/README.md#access-lists) docs for additional details about access lists. You can take
+access lists one step further by [enabling TLS](/src/web/server/README.md#enabling-tls-support) to encrypt data from the local
+dashboard in transit. The connection to Netdata Cloud is always secured with TLS.
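+
+A minimal sketch of the TLS settings in `netdata.conf`, assuming a key and certificate already exist at the paths shown
+(both paths are placeholders, not defaults you can rely on):
+
+```conf
+[web]
+    # placeholder paths: point these at your own key and certificate
+    ssl key = /etc/netdata/ssl/key.pem
+    ssl certificate = /etc/netdata/ssl/cert.pem
+```
+
+The linked web server documentation covers the remaining TLS options.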
+
+## Use an authenticating web server in proxy mode
+
+Use one web server to provide authentication in front of **all your Netdata servers**. You will access all of them with
+URLs like `http://{HOST}/netdata/{NETDATA_HOSTNAME}/`, and authentication will be shared among them (you sign in once for all your servers).
+Instructions are provided on how to set the proxy configuration to have Netdata run behind
+[nginx](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md),
+[HAproxy](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-haproxy.md),
+[Apache](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md),
+[lighttpd](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md),
+[caddy](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-caddy.md), and
+[H2O](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md).
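+
+When the proxy runs on the same host as the Agent, it usually makes sense to stop Netdata from listening on public
+interfaces, so that the proxy is the only way in. A sketch of that setting in `netdata.conf`, assuming the proxy
+connects over the loopback interface:
+
+```conf
+[web]
+    # accept connections only from the local proxy
+    bind to = 127.0.0.1 ::1
+```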
+
+## Use Netdata parents as Web Application Firewalls
+
+The Netdata Agents you install on your production systems do not need direct access to the Internet. Even when you use
+Netdata Cloud, you can appoint one or more Netdata Parents to act as border gateways or application firewalls, isolating
+your production systems from the rest of the world. Netdata
+Parents receive metric data from Netdata Agents or other Netdata Parents on one side, and serve most queries using their own
+copy of the data to satisfy dashboard requests on the other side.
+
+For more information see [Streaming and replication](/docs/observability-centralization-points/README.md).
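+
+A rough sketch of the streaming setup this implies, using `stream.conf` on both sides. The API key, the parent address
+`10.1.1.1`, and the `10.*` range are made-up values for illustration; generate your own key (for example with `uuidgen`)
+and use the addresses of your own network.
+
+```conf
+# stream.conf on a child (production system)
+[stream]
+    enabled = yes
+    destination = 10.1.1.1:19999
+    api key = 11111111-2222-3333-4444-555555555555
+
+# stream.conf on the parent
+[11111111-2222-3333-4444-555555555555]
+    enabled = yes
+    allow from = 10.*
+```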
+
+## Other methods
+
+Of course, there are many more methods you could use to protect Netdata:
+
+- Bind Netdata to localhost and use `ssh -L 19998:127.0.0.1:19999 remote.netdata.ip` to forward connections of local port 19998 to remote port 19999.
+This way you can ssh to a Netdata server and then use `http://127.0.0.1:19998/` on your computer to access the remote Netdata dashboard.
+
+- If you are always under a static IP, you can use the access lists described in [Fine-grained access control](#fine-grained-access-control)
+to allow direct access to your Netdata servers without authentication, from all your static IPs.
+
+- Install all your Netdata in **headless data collector** mode, forwarding all metrics in real-time to a parent
+ Netdata server, which will be protected with authentication using an nginx server running locally at the parent
+ Netdata server. This requires more resources (you will need a bigger parent Netdata server), but does not require
+ any firewall changes, since all the child Netdata servers will not be listening for incoming connections.
diff --git a/docs/netdata-agent/sizing-netdata-agents/README.md b/docs/netdata-agent/sizing-netdata-agents/README.md new file mode 100644 index 000000000..3ba346f7a --- /dev/null +++ b/docs/netdata-agent/sizing-netdata-agents/README.md @@ -0,0 +1,89 @@ +# Sizing Netdata Agents + +Netdata automatically adjusts its resources utilization based on the workload offered to it. + +This is a map of how Netdata **features impact resources utilization**: + +| Feature | CPU | RAM | Disk I/O | Disk Space | Retention | Bandwidth | +|-----------------------------:|:---:|:---:|:--------:|:----------:|:---------:|:---------:| +| Metrics collected | X | X | X | X | X | - | +| Samples collection frequency | X | - | X | X | X | - | +| Database mode and tiers | - | X | X | X | X | - | +| Machine learning | X | X | - | - | - | - | +| Streaming | X | X | - | - | - | X | + +1. **Metrics collected**: The number of metrics collected affects almost every aspect of resources utilization. + + When you need to lower the resources used by Netdata, this is an obvious first step. + +2. **Samples collection frequency**: By default Netdata collects metrics with 1-second granularity, unless the metrics collected are not updated that frequently, in which case Netdata collects them at the frequency they are updated. This is controlled per data collection job. + + Lowering the data collection frequency from every-second to every-2-seconds, will make Netdata use half the CPU utilization. So, CPU utilization is proportional to the data collection frequency. + +3. **Database Mode and Tiers**: By default Netdata stores metrics in 3 database tiers: high-resolution, mid-resolution, low-resolution. All database tiers are updated in parallel during data collection, and depending on the query duration Netdata may consult one or more tiers to optimize the resources required to satisfy it. + + The number of database tiers affects the memory requirements of Netdata. 
Going from 3-tiers to 1-tier, will make Netdata use half the memory. Of course metrics retention will also be limited to 1 tier. + +4. **Machine Learning**: By default Netdata trains multiple machine learning models for every metric collected, to learn its behavior and detect anomalies. Machine Learning is a CPU intensive process and affects the overall CPU utilization of Netdata. + +5. **Streaming Compression**: When using Netdata in Parent-Child configurations to create Metrics Centralization Points, the compression algorithm used greatly affects CPU utilization and bandwidth consumption. + + Netdata supports multiple streaming compression algorithms, allowing the optimization of either CPU utilization or Network Bandwidth. The default algorithm `zstd` provides the best balance among them. + +## Minimizing the resources used by Netdata Agents + +To minimize the resources used by Netdata Agents, we suggest configuring Netdata Parents for centralizing metric samples, and disabling most of the features on Netdata Children. This will provide minimal resources utilization at the edge, while all the features of Netdata are available at the Netdata Parents. + +The following guides provide instructions on how to do this. + +## Maximizing the scale of Netdata Parents + +Netdata Parents automatically size resource utilization based on the workload they receive. The only possible option for improving query performance is to dedicate more RAM to them, by increasing their cache efficiency. + +Check [RAM Requirements](/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md) for more information. + +## Innovations Netdata has for optimal performance and scalability + +The following are some of the innovations the open-source Netdata agent has, that contribute to its excellent performance, and scalability. + +1. **Minimal disk I/O** + + When Netdata saves data on-disk, it stores them at their final place, eliminating the need to reorganize this data. 
+ + Netdata is organizing its data structures in such a way that samples are committed to disk as evenly as possible across time, without affecting its memory requirements. + + Furthermore, Netdata Agents use direct-I/O for saving and loading metric samples. This prevents Netdata from polluting system caches with metric data. Netdata maintains its own caches for this data. + + All these features make Netdata a nice partner and a polite citizen for production applications running on the same systems Netdata runs on. + +2. **4 bytes per sample uncompressed** + + To achieve optimal memory and disk footprint, Netdata uses a custom 32-bit floating point number. This floating point number is used to store the samples collected, together with their anomaly bit. The database of Netdata is fixed-step, so it has predefined slots for every sample, allowing Netdata to store timestamps once every several hundred samples, minimizing both its memory requirements and the disk footprint. + + The final disk footprint of Netdata varies due to compression efficiency. It is usually about 0.6 bytes per sample for the high-resolution tier (per-second), 6 bytes per sample for the mid-resolution tier (per-minute) and 18 bytes per sample for the low-resolution tier (per-hour). + +3. **Query priorities** + + Alerting, Machine Learning, Streaming and Replication rely on metric queries. When multiple queries are running in parallel, Netdata assigns priorities to all of them, favoring interactive queries over background tasks. This means that queries do not compete equally for resources. Machine learning or replication may slow down when interactive queries are running and the system starves for resources. + +4. **A pointer per label** + + Apart from metric samples, metric labels and their cardinality are the biggest memory consumer, especially in highly ephemeral environments, like Kubernetes. Netdata uses a single pointer for any label key-value pair that is reused. 
Keys and values are also deduplicated, providing the best possible memory footprint for metric labels. + +5. **Streaming Protocol** + + The streaming protocol of Netdata allows minimizing the resources consumed on production systems by delegating features of to other Netdata agents (Parents), without compromising monitoring fidelity or responsiveness, enabling the creation of a highly distributed observability platform. + +## Netdata vs Prometheus + +Netdata outperforms Prometheus in every aspect. -35% CPU Utilization, -49% RAM usage, -12% network bandwidth, -98% disk I/O, -75% in disk footprint for high resolution data, while providing more than a year of retention. + +Read the [full comparison here](https://blog.netdata.cloud/netdata-vs-prometheus-performance-analysis/). + +## Energy Efficiency + +University of Amsterdam contacted a research on the impact monitoring systems have on docker based systems. + +The study found that Netdata excels in CPU utilization, RAM usage, Execution Time and concluded that **Netdata is the most energy efficient tool**. + +Read the [full study here](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf). diff --git a/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md b/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md new file mode 100644 index 000000000..092c8da16 --- /dev/null +++ b/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md @@ -0,0 +1,47 @@ +# Bandwidth Requirements + +## On Production Systems, Standalone Netdata + +Standalone Netdata may use network bandwidth under the following conditions: + +1. You configured data collection jobs that are fetching data from remote systems. There is no such jobs enabled by default. +2. You use the dashboard of the Netdata. +3. [Netdata Cloud communication](#netdata-cloud-communication) (see below). 
+ +## On Metrics Centralization Points, between Netdata Children & Parents + +Netdata supports multiple compression algorithms for streaming communication. Netdata Children offer all their compression algorithms when connecting to a Netdata Parent, and the Netdata Parent decides which one to use based on algorithms availability and user configuration. + +| Algorithm | Best for | +|:---------:|:-----------------------------------------------------------------------------------------------------------------------------------:| +| `zstd` | The best balance between CPU utilization and compression efficiency. This is the default. | +| `lz4` | The fastest of the algorithms. Use this when CPU utilization is more important than bandwidth. | +| `gzip` | The best compression efficiency, at the expense of CPU utilization. Use this when bandwidth is more important than CPU utilization. | +| `brotli` | The most CPU intensive algorithm, providing the best compression. | + +The expected bandwidth consumption using `zstd` for 1 million samples per second is 84 Mbps, or 10.5 MiB/s. + +The order compression algorithms is selected is configured in `stream.conf`, per `[API KEY]`, like this: + +``` + compression algorithms order = zstd lz4 brotli gzip +``` + +The first available algorithm on both the Netdata Child and the Netdata Parent, from left to right, is chosen. + +Compression can also be disabled in `stream.conf` at either Netdata Children or Netdata Parents. + +## Netdata Cloud Communication + +When Netdata Agents connect to Netdata Cloud, they communicate metadata of the metrics being collected, but they do not stream the samples collected for each metric. + +The information transferred to Netdata Cloud is: + +1. Information and **metadata about the system itself**, like its hostname, architecture, virtualization technologies used and generally labels associated with the system. +2. Information about the **running data collection plugins, modules and jobs**. +3. 
Information about the **metrics available and their retention**. +4. Information about the **configured alerts and their transitions**. + +This is not a constant stream of information. Netdata Agents update Netdata Cloud only about status changes on all the above (e.g. an alert being triggered, or a metric stopped being collected). So, there is an initial handshake and exchange of information when Netdata starts, and then there only updates when required. + +Of course, when you view Netdata Cloud dashboards that need to query the database a Netdata agent maintains, this query is forwarded to an agent that can satisfy it. This means that Netdata Cloud receives metric samples only when a user is accessing a dashboard and the samples transferred are usually aggregations to allow rendering the dashboards. diff --git a/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md b/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md new file mode 100644 index 000000000..021a35fb2 --- /dev/null +++ b/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md @@ -0,0 +1,65 @@ +# CPU Requirements + +Netdata's CPU consumption is affected by the following factors: + +1. The number of metrics collected +2. The frequency metrics are collected +3. Machine Learning +4. Streaming compression (streaming of metrics to Netdata Parents) +5. Database Mode + +## On Production Systems, Netdata Children + +On production systems, where Netdata is running with default settings, monitoring the system it is installed at and its containers and applications, CPU utilization should usually be about 1% to 5% of a single CPU core. + +This includes 3 database tiers, machine learning, per-second data collection, alerts, and streaming to a Netdata Parent. 
+ +## On Metrics Centralization Points, Netdata Parents + +On Metrics Centralization Points, Netdata Parents running on modern server hardware, we **estimate CPU utilization per million of samples collected per second**: + +| Feature | Depends On | Expected Utilization | Key Reasons | +|:-----------------:|:---------------------------------------------------:|:----------------------------------------------------------------:|:-------------------------------------------------------------------------:| +| Metrics Ingestion | Number of samples received per second | 2 CPU cores per million of samples per second | Decompress and decode received messages, update database. | +| Metrics re-streaming| Number of samples resent per second | 2 CPU cores per million of samples per second | Encode and compress messages towards Netdata Parent. | +| Machine Learning | Number of unique time-series concurrently collected | 2 CPU cores per million of unique metrics concurrently collected | Train machine learning models, query existing models to detect anomalies. | + +We recommend keeping the total CPU utilization below 60% when a Netdata Parent is steadily ingesting metrics, training machine learning models and running health checks. This will leave enough CPU resources available for queries. + +## I want to minimize CPU utilization. What should I do? + +You can control Netdata's CPU utilization with these parameters: + +1. **Data collection frequency**: Going from per-second metrics to every-2-seconds metrics will half the CPU utilization of Netdata. +2. **Number of metrics collected**: Netdata by default collects every metric available on the systems it runs. Review the metrics collected and disable data collection plugins and modules not needed. +3. **Machine Learning**: Disable machine learning to save CPU cycles. +4. **Number of database tiers**: Netdata updates database tiers in parallel, during data collection. This affects both CPU utilization and memory requirements. +5. 
**Database Mode**: The default database mode is `dbengine`, which compresses and commits data to disk. If you have a Netdata Parent where metrics are aggregated and saved to disk and there is a reliable connection between the Netdata you want to optimize and its Parent, switch to database mode `ram` or `alloc`. This disables saving to disk, so your Netdata will also not use any disk I/O. + +## I see increased CPU consumption when a busy Netdata Parent starts, why? + +When a Netdata Parent starts and Netdata children get connected to it, there are several operations that temporarily affect CPU utilization, network bandwidth and disk I/O. + +The general flow looks like this: + +1. **Back-filling of higher tiers**: Usually this means calculating the aggregates of the last hour of `tier2` and of the last minute of `tier1`, ensuring that higher tiers reflect all the information `tier0` has. If Netdata was stopped abnormally (e.g. due to a system failure or crash), higher tiers may have to be back-filled for longer durations. +2. **Metadata synchronization**: The metadata of all metrics each Netdata Child maintains are negotiated between the Child and the Parent and are synchronized. +3. **Replication**: If the Parent is missing samples the Child has, these samples are transferred to the Parent before transferring new samples. +4. Once all these finish, the normal **streaming of new metric samples** starts. +5. At the same time, **machine learning** initializes, loads saved trained models and prepares anomaly detection. +6. After a few moments the **health engine starts checking metrics** for triggering alerts. + +The above process is per metric. So, while one metric back-fills, another replicates and a third one streams. 
+ +At the same time: + +- the compression algorithm learns the patterns of the data exchanged and optimizes its dictionaries for optimal compression and CPU utilization, +- the database engine adjusts the page size of each metric, so that samples are committed to disk as evenly as possible across time. + +So, when looking for the "steady CPU consumption during ingestion" of a busy Netdata Parent, we recommend letting it stabilize for a few hours before checking. + +Keep in mind that Netdata has been designed so that even if during the initialization phase and the connection of hundreds of Netdata Children the system lacks CPU resources, the Netdata Parent will complete all the operations and eventually enter a steady CPU consumption during ingestion, without affecting the quality of the metrics stored. So, it is ok if during initialization of a busy Netdata Parent, CPU consumption spikes to 100%. + +Important: the above initialization process is not as intense when new nodes get connected to a Netdata Parent for the first time (e.g. ephemeral nodes), since several of the steps involved are not required. + +Especially for the cases where children disconnect and reconnect to the Parent due to network related issues (i.e. both the Netdata Child and the Netdata Parent have not been restarted and less than 1 hour has passed since the last disconnection), the re-negotiation phase is minimal and metrics instantly enter the normal streaming phase. diff --git a/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md b/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md new file mode 100644 index 000000000..d9e879cb6 --- /dev/null +++ b/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md @@ -0,0 +1,131 @@ +# Disk Requirements & Retention + +## Database Modes and Tiers + +Netdata comes with 3 database modes: + +1. `dbengine`: the default high-performance multi-tier database of Netdata. 
Metric samples are cached in memory and are saved to disk in multiple tiers, with compression. +2. `ram`: metric samples are stored in ring buffers in memory, with increments of 1024 samples. Metric samples are not committed to disk. Kernel Samepage Merging (KSM) can be used to deduplicate Netdata's memory. +3. `alloc`: metric samples are stored in ring buffers in memory, with flexible increments. Metric samples are not committed to disk. + +## `ram` and `alloc` + +Modes `ram` and `alloc` can help when Netdata should not introduce any disk I/O at all. In both of these modes, metric samples exist only in memory, and only while they are collected. + +When Netdata is configured to stream its metrics to a Metrics Observability Centralization Point (a Netdata Parent), metric samples are forwarded in real-time to that Netdata Parent. The ring buffers available in these modes are used to cache the collected samples for some time, in case there are network issues, or the Netdata Parent is restarted for maintenance. + +The memory required per sample in these modes is 4 bytes: + +- `ram` mode uses `mmap()` behind the scenes, and can be incremented in steps of 1024 samples (4KiB). Mode `ram` allows the use of the Linux kernel memory deduplication feature (Kernel Samepage Merging, or KSM) to deduplicate Netdata ring buffers and save memory. +- `alloc` mode can be sized for any number of samples per metric. KSM cannot be used in this mode. + +To configure database mode `ram` or `alloc`, in `netdata.conf`, set the following: + +- `[db].mode` to either `ram` or `alloc`. +- `[db].retention` to the number of samples the ring buffers should maintain. For `ram`, if the value set is not a multiple of 1024, the next multiple of 1024 will be used. + +## `dbengine` + +`dbengine` supports up to 5 tiers. 
By default, 3 tiers are used, like this: + +| Tier | Resolution | Uncompressed Sample Size | Usually On Disk | +|:--------:|:--------------------------------------------------------------------------------------------:|:------------------------:|:---------------:| +| `tier0` | native resolution (metrics collected per-second are stored per-second) | 4 bytes | 0.6 bytes | +| `tier1` | 60 iterations of `tier0`, so when metrics are collected per-second, this tier is per-minute. | 16 bytes | 6 bytes | +| `tier2` | 60 iterations of `tier1`, so when metrics are collected per-second, this tier is per-hour. | 16 bytes | 18 bytes | + +Data are saved to disk compressed, so the actual size on disk varies depending on compression efficiency. + +`dbengine` tiers are overlapping, so higher tiers include a down-sampled version of the samples in lower tiers: + +```mermaid +gantt + dateFormat YYYY-MM-DD + tickInterval 1week + axisFormat + todayMarker off + tier0, 14d :a1, 2023-12-24, 7d + tier1, 60d :a2, 2023-12-01, 30d + tier2, 365d :a3, 2023-11-02, 59d +``` + +## Disk Space and Metrics Retention + +You can find information about the current disk utilization of a Netdata Parent, at <http://agent-ip:19999/api/v2/info>. 
The output of this endpoint is like this: + +```json +{ + // more information about the agent + // then, near the end: + "db_size": [ + { + "tier": 0, + "metrics": 43070, + "samples": 88078162001, + "disk_used": 41156409552, + "disk_max": 41943040000, + "disk_percent": 98.1245269, + "from": 1705033983, + "to": 1708856640, + "retention": 3822657, + "expected_retention": 3895720, + "currently_collected_metrics": 27424 + }, + { + "tier": 1, + "metrics": 72987, + "samples": 5155155269, + "disk_used": 20585157180, + "disk_max": 20971520000, + "disk_percent": 98.1576785, + "from": 1698287340, + "to": 1708856640, + "retention": 10569300, + "expected_retention": 10767675, + "currently_collected_metrics": 27424 + }, + { + "tier": 2, + "metrics": 148234, + "samples": 314919121, + "disk_used": 5957346684, + "disk_max": 10485760000, + "disk_percent": 56.8136853, + "from": 1667808000, + "to": 1708856640, + "retention": 41048640, + "expected_retention": 72251324, + "currently_collected_metrics": 27424 + } + ] +} +``` + +In this example: + +- `tier` is the database tier. +- `metrics` is the number of unique time-series in the database. +- `samples` is the number of samples in the database. +- `disk_used` is the currently used disk space in bytes. +- `disk_max` is the configured max disk space in bytes. +- `disk_percent` is the current disk space utilization for this tier. +- `from` is the first (oldest) timestamp in the database for this tier. +- `to` is the latest (newest) timestamp in the database for this tier. +- `retention` is the current retention of the database for this tier, in seconds (divide by 3600 for hours, divide by 86400 for days). +- `expected_retention` is the expected retention in seconds when `disk_percent` will be 100 (divide by 3600 for hours, divide by 86400 for days). +- `currently_collected_metrics` is the number of unique time-series currently being collected for this tier. 
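The fields described above are enough to derive retention in days, free disk space and the average on-disk size of a sample. The following is a minimal Python sketch, not part of Netdata: the helper name `summarize_tier` is hypothetical, and the input is a subset of the `tier` 0 entry from the example output above.

```python
import json

# Subset of the tier 0 entry, copied from the example response above.
TIER0 = json.loads("""
{
  "tier": 0,
  "samples": 88078162001,
  "disk_used": 41156409552,
  "disk_max": 41943040000,
  "disk_percent": 98.1245269,
  "retention": 3822657,
  "expected_retention": 3895720
}
""")


def summarize_tier(entry):
    """Derive human-friendly figures from one `db_size` entry (hypothetical helper)."""
    return {
        "tier": entry["tier"],
        # bytes -> GiB
        "disk_used_gib": entry["disk_used"] / 2**30,
        # whatever is not used is still free
        "disk_free_percent": 100.0 - entry["disk_percent"],
        # seconds -> days
        "retention_days": entry["retention"] / 86400,
        "expected_retention_days": entry["expected_retention"] / 86400,
        # average compressed size of one sample on disk
        "bytes_per_sample": entry["disk_used"] / entry["samples"],
    }


if __name__ == "__main__":
    for key, value in summarize_tier(TIER0).items():
        if isinstance(value, float):
            print(f"{key}: {value:.2f}")
        else:
            print(f"{key}: {value}")
```

The same function can be applied to every element of `db_size`; small differences against hand-made summaries come down to rounding.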
+ +So, for our example above: + +| Tier | # Of Metrics | # Of Samples | Disk Used | Disk Free | Current Retention | Expected Retention | Sample Size | +|-----:|-------------:|--------------:|----------:|----------:|------------------:|-------------------:|------------:| +| 0 | 43.1K | 88.1 billion | 38.4Gi | 1.88% | 44.2 days | 45.0 days | 0.46 B | +| 1 | 73.0K | 5.2 billion | 19.2Gi | 1.84% | 122.3 days | 124.6 days | 3.99 B | +| 2 | 148.3K | 315.0 million | 5.6Gi | 43.19% | 475.1 days | 836.2 days | 18.91 B | + +To configure retention, in `netdata.conf`, set the following: + +- `[db].mode` to `dbengine`. +- `[db].dbengine multihost disk space MB`, this is the max disk size for `tier0`. The default is 256MiB. +- `[db].dbengine tier 1 multihost disk space MB`, this is the max disk space for `tier1`. The default is 50% of `tier0`. +- `[db].dbengine tier 2 multihost disk space MB`, this is the max disk space for `tier2`. The default is 50% of `tier1`. diff --git a/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md b/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md new file mode 100644 index 000000000..8d8522517 --- /dev/null +++ b/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md @@ -0,0 +1,60 @@ +# RAM Requirements + +With the default configuration of database tiers, Netdata should need about 16KiB per unique metric collected, independently of the data collection frequency. + +Netdata supports memory ballooning and automatically sizes and limits the memory used, based on the metrics concurrently being collected. + +## On Production Systems, Netdata Children + +With default settings, Netdata should run with 100MB to 200MB of RAM, depending on the number of metrics being collected. + +This number can be lowered by limiting the number of database tiers or switching database modes. For more information check [Disk Requirements and Retention](/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md). 
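As a rough illustration of the 16KiB-per-metric rule of thumb stated at the top of this page, the arithmetic can be written out in Python. This is only a sketch: the function name `estimate_db_memory_mib` and the metric counts in the loop are illustrative assumptions, and the result does not include any additionally configured caches.

```python
KIB_PER_METRIC = 16  # rule of thumb from this page, with default database tiers


def estimate_db_memory_mib(unique_metrics, kib_per_metric=KIB_PER_METRIC):
    """Approximate memory in MiB for a number of concurrently collected metrics."""
    return unique_metrics * kib_per_metric / 1024


if __name__ == "__main__":
    # Illustrative metric counts, not measurements.
    for metrics in (5_000, 10_000, 1_000_000):
        mib = estimate_db_memory_mib(metrics)
        print(f"{metrics:>9} metrics -> about {mib:.0f} MiB")
```

Treat the output as an order-of-magnitude estimate for capacity planning, not as a guarantee of actual memory usage.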
+ +## On Metrics Centralization Points, Netdata Parents + +The general formula, with the default configuration of database tiers, is: + +``` +memory = UNIQUE_METRICS x 16KiB + CONFIGURED_CACHES +``` + +The default `CONFIGURED_CACHES` is 32MiB. + +For 1 million concurrently collected time-series (independently of their data collection frequency), the memory required is: + +``` +UNIQUE_METRICS = 1000000 +CONFIGURED_CACHES = 32MiB + +(UNIQUE_METRICS * 16KiB / 1024 in MiB) + CONFIGURED_CACHES = +( 1000000 * 16KiB / 1024 in MiB) + 32 MiB = +15657 MiB = +about 16 GiB +``` + +There are 2 cache sizes that can be configured in `netdata.conf`: + +1. `[db].dbengine page cache size MB`: this is the main cache that keeps metrics data in memory. When data are not found in it, the extent cache is consulted, and if not found in that either, they are loaded from disk. +2. `[db].dbengine extent cache size MB`: this is the compressed extent cache. It keeps in memory compressed data blocks, as they appear on disk, to avoid reading them again. Data found in the extent cache but not in the main cache have to be uncompressed to be queried. + +Both of them are dynamically adjusted to use some of the total memory computed above. The configuration in `netdata.conf` allows providing additional memory to them, increasing their caching efficiency. + +## I have a Netdata Parent that is also a systemd-journal logs centralization point, what should I know? + +Logs usually require significantly more disk space and I/O bandwidth than metrics. For optimal performance we recommend storing metrics and logs on separate, independent disks. + +Netdata uses direct-I/O for its database, so that it does not pollute the system caches with its own data. We want Netdata to be a nice citizen when it runs side-by-side with production applications, so this was required to guarantee that Netdata does not affect the operation of databases or other sensitive applications running on the same servers. 
+ +To optimize disk I/O, Netdata maintains its own private caches. The default settings of these caches are automatically adjusted to the minimum required size for acceptable metrics query performance. + +`systemd-journal`, on the other hand, relies on operating system caches for improving the query performance of logs. When the system lacks free memory, querying logs leads to increased disk I/O. + +If you are experiencing slow responses and increased disk reads when metrics queries run, we suggest dedicating some more RAM to Netdata. + +We frequently see that the following strategy gives the best results: + +1. Start the Netdata Parent, send all the load you expect it to have and let it stabilize for a few hours. Netdata will now use the minimum memory it believes is required for smooth operation. +2. Check the available system memory. +3. Set the page cache in `netdata.conf` to use 1/3 of the available memory. + +This will allow Netdata queries to have more caches, while leaving plenty of available memory for logs and the operating system. diff --git a/docs/netdata-agent/start-stop-restart.md b/docs/netdata-agent/start-stop-restart.md new file mode 100644 index 000000000..6fbe18d31 --- /dev/null +++ b/docs/netdata-agent/start-stop-restart.md @@ -0,0 +1,153 @@ +# Start, stop, or restart the Netdata Agent + +When you install the Netdata Agent, the [daemon](/src/daemon/README.md) is +configured to start at boot and stop at restart/shutdown. + +You will most often need to _restart_ the Agent to load new or edited configuration files. +[Health configuration](#reload-health-configuration) files are the only exception, as they can be reloaded without restarting +the entire Agent. + +Stopping or restarting the Netdata Agent will cause gaps in stored metrics until the `netdata` process initiates +collectors and the database engine. + +## Using `systemctl`, `service`, or `init.d` + +This is the recommended way to start, stop, or restart the Netdata daemon. 
+ +- To **start** Netdata, run `sudo systemctl start netdata`. +- To **stop** Netdata, run `sudo systemctl stop netdata`. +- To **restart** Netdata, run `sudo systemctl restart netdata`. + +If the above commands fail, or you know that you're using a non-systemd system, try using the `service` command: + +- **service**: `sudo service netdata start`, `sudo service netdata stop`, `sudo service netdata restart` + +## Using `netdata` + +Use the `netdata` command, typically located at `/usr/sbin/netdata`, to start the Netdata daemon. + +```bash +sudo netdata +``` + +If you start the daemon this way, close it with `sudo killall netdata`. + +## Using `netdatacli` + +The Netdata Agent also comes with a [CLI tool](/src/cli/README.md) capable of performing shutdowns. Start the Agent back up +using your preferred method listed above. + +```bash +sudo netdatacli shutdown-agent +``` + +## Netdata MSI installations + +Netdata provides an installer for Windows using WSL. On those installations, from a Windows terminal (e.g. the Command Prompt or Windows PowerShell), you can: + +- Start Netdata, by running `start-netdata` +- Stop Netdata, by running `stop-netdata` +- Restart Netdata, by running `restart-netdata` + +## Reload health configuration + +You do not need to restart the Netdata Agent between changes to health configuration files, such as specific health +entities. Instead, use [`netdatacli`](#using-netdatacli) and the `reload-health` option to prevent gaps in metrics +collection. + +```bash +sudo netdatacli reload-health +``` + +If `netdatacli` doesn't work on your system, send a `SIGUSR2` signal to the daemon, which reloads health configuration +without restarting the entire process. + +```bash +killall -USR2 netdata +``` + +## Force stop stalled or unresponsive `netdata` processes + +In rare cases, the Netdata Agent may stall or not properly close sockets, preventing a new process from starting. 
In +these cases, try the following three commands: + +```bash +sudo systemctl stop netdata +sudo killall netdata +ps aux| grep netdata +``` + +The output of `ps aux` should show no `netdata` or associated processes running. You can now start the Netdata Agent +again with `service netdata start`, or the appropriate method for your system. + +## Starting Netdata at boot + +In the `system` directory you can find scripts and configurations for the +various distros. + +### systemd + +The installer already installs `netdata.service` if it detects a systemd system. + +To install `netdata.service` by hand, run: + +```sh +# stop Netdata +killall netdata + +# copy netdata.service to systemd +cp system/netdata.service /etc/systemd/system/ + +# let systemd know there is a new service +systemctl daemon-reload + +# enable Netdata at boot +systemctl enable netdata + +# start Netdata +systemctl start netdata +``` + +### init.d + +In the system directory you can find `netdata-lsb`. Copy it to the proper place according to your distribution +documentation. For Ubuntu, this can be done via running the following commands as root. + +```sh +# copy the Netdata startup file to /etc/init.d +cp system/netdata-lsb /etc/init.d/netdata + +# make sure it is executable +chmod +x /etc/init.d/netdata + +# enable it +update-rc.d netdata defaults +``` + +### openrc (gentoo) + +In the `system` directory you can find `netdata-openrc`. Copy it to the proper +place according to your distribution documentation. + +### CentOS / Red Hat Enterprise Linux + +For older versions of RHEL/CentOS that don't have systemd, an init script is included in the system directory. This can +be installed by running the following commands as root. 
+ +```sh +# copy the Netdata startup file to /etc/init.d +cp system/netdata-init-d /etc/init.d/netdata + +# make sure it is executable +chmod +x /etc/init.d/netdata + +# enable it +chkconfig --add netdata +``` + +_There has been some recent work on the init script, see PR +<https://github.com/netdata/netdata/pull/403>_ + +### other systems + +You can start Netdata by running it from `/etc/rc.local` or equivalent. diff --git a/docs/netdata-agent/versions-and-platforms.md b/docs/netdata-agent/versions-and-platforms.md new file mode 100644 index 000000000..14dc393b5 --- /dev/null +++ b/docs/netdata-agent/versions-and-platforms.md @@ -0,0 +1,70 @@ +# Netdata Agent Versions & Platforms + +Netdata is evolving rapidly and new features are added at a constant pace. Therefore, we have a frequent release cadence to deliver all these features to users as soon as possible. + +Netdata Agents are available in 2 versions: + +| Release Channel | Release Frequency | Support Policy & Features | Support Duration | Backwards Compatibility | +|:---------------:|:---------------------------------------------:|:---------------------------------------------------------:|:----------------------------------------:|:---------------------------------------------------------------------------------:| +| Stable | At most once per month, usually every 45 days | Receiving bug fixes and security updates between releases | Up to the 2nd stable release after them | Previous configuration semantics and data are supported by newer releases | +| Nightly | Every night at 00:00 UTC | Latest pre-released features | Up to the 2nd nightly release after them | Configuration and data of unreleased features may change between nightly releases | + +> "Support Duration" defines the time we consider the release as actively used by users in production systems, so that all features of Netdata should be working like the day they were released. 
However, after the latest release, previous releases stop receiving bug fixes and security updates. All users are advised to update to the latest release to get the latest bug fixes. + +## Binary Distribution Packages + +Binary distribution packages are provided by Netdata, via CI integration, for the following platforms and architectures: + +| Platform | Platform Versions | Released Packages Architecture | Format | +|:-----------------------:|:--------------------------------:|:------------------------------------------------:|:------------:| +| Docker under Linux | 19.03 and later | `x86_64`, `i386`, `ARMv7`, `AArch64`, `POWER8+` | docker image | +| Static Builds | - | `x86_64`, `ARMv6`, `ARMv7`, `AArch64`, `POWER8+` | .gz.run | +| Alma Linux | 8.x, 9.x | `x86_64`, `AArch64` | RPM | +| Amazon Linux | 2, 2023 | `x86_64`, `AArch64` | RPM | +| Centos | 7.x | `x86_64` | RPM | +| Debian | 10.x, 11.x, 12.x | `x86_64`, `i386`, `ARMv7`, `AArch64` | DEB | +| Fedora | 37, 38, 39 | `x86_64`, `AArch64` | RPM | +| OpenSUSE | Leap 15.4, Leap 15.5, Tumbleweed | `x86_64`, `AArch64` | RPM | +| Oracle Linux | 8.x, 9.x | `x86_64`, `AArch64` | RPM | +| Redhat Enterprise Linux | 7.x | `x86_64` | RPM | +| Redhat Enterprise Linux | 8.x, 9.x | `x86_64`, `AArch64` | RPM | +| Ubuntu | 20.04, 22.04, 23.10 | `x86_64`, `i386`, `ARMv7`, `AArch64` | DEB | + +> IMPORTANT: Linux distributions frequently provide binary packages of Netdata. However, the packages you will find in the distributions' repositories may be outdated, incomplete, missing significant features or completely broken. We recommend using the packages we provide. 
+ +## Third-party Supported Binary Packages + +The following distributions always provide the latest stable version of Netdata: + +| Platform | Platform Versions | Released Packages Architecture | +|:----------:|:-----------------:|:------------------------------------:| +| Arch Linux | Latest | All the Arch supported architectures | +| MacOS Brew | Latest | All the Brew supported architectures | + + +## Builds from Source + +We guarantee Netdata builds from source for the platforms for which we provide automated binary packages. These platforms are automatically checked via our CI, and fixes are always applied to allow merging new code into the nightly versions. + +The following builds from source should usually work, although we don't regularly monitor if there are issues: + +| Platform | Platform Versions | +|:-----------------------------------:|:--------------------------:| +| Linux Distributions | Latest unreleased versions | +| FreeBSD and derivatives | 13-STABLE | +| Gentoo and derivatives | Latest | +| Arch Linux and derivatives | latest from AUR | +| MacOS | 11, 12, 13 | +| Linux under Microsoft Windows (WSL) | Latest | + +## Static Builds and Unsupported Linux Versions + +The static builds of Netdata can be used on any Linux platform of the supported architectures. The only requirement these static builds have is a working Linux kernel, any version. Everything else required for Netdata to run is inside the package itself. + +Static builds usually miss certain features that require operating-system support and cannot be provided in a generic way. These features include: + +- IPMI hardware sensors support +- systemd-journal features +- eBPF related features + +When platforms are removed from the [Binary Distribution Packages](/packaging/makeself/README.md) list, they default to installing or updating Netdata using a static build. This may mean that after platforms become EOL, Netdata on them may lose some of its features. 
We recommend upgrading the operating system before it becomes EOL, to continue using all the features of Netdata. |