Diffstat
15 files changed, 94 insertions, 850 deletions
diff --git a/docs/guides/collect-apache-nginx-web-logs.md b/docs/developer-and-contributor-corner/collect-apache-nginx-web-logs.md index 055219935..55af82fb7 100644 --- a/docs/guides/collect-apache-nginx-web-logs.md +++ b/docs/developer-and-contributor-corner/collect-apache-nginx-web-logs.md @@ -8,7 +8,7 @@ You can use the [LTSV log format](http://ltsv.org/), track TLS and cipher usage, ever. In one test on a system with SSD storage, the collector consistently parsed the logs for 200,000 requests in 200ms, using ~30% of a single core. -The [web_log](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/weblog/README.md) collector is currently compatible +The [web_log](/src/go/plugin/go.d/modules/weblog/README.md) collector is currently compatible with [Nginx](https://nginx.org/en/) and [Apache](https://httpd.apache.org/). This guide will walk you through using the new Go-based web log collector to turn the logs these web servers @@ -82,7 +82,7 @@ jobs: ``` Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system. Netdata should pick up your web server's access log and +method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system. Netdata should pick up your web server's access log and begin showing real-time charts! ### Custom log formats and fields @@ -91,7 +91,7 @@ The web log collector is capable of parsing custom Nginx and Apache log formats leave that topic for a separate guide. We do have [extensive -documentation](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/weblog/README.md#custom-log-format) on how +documentation](/src/go/plugin/go.d/modules/weblog/README.md#custom-log-format) on how to build custom parsing for Nginx and Apache logs. 
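To make the custom-format pointer above concrete, here is a rough sketch of what a `go.d/web_log.conf` job with a custom parser could look like. The key names (`log_type`, `csv_config`, `format`) and the field list are assumptions for illustration — verify them against the custom-log-format documentation linked above before using:

```yaml
# Illustrative sketch only -- key names and fields are assumptions,
# check the web_log custom-log-format docs for the exact schema.
jobs:
  - name: nginx_custom
    path: /var/log/nginx/access.log
    log_type: csv
    csv_config:
      format: '$remote_addr - - [$time_local] "$request" $status $body_bytes_sent'
```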
## Tweak web log collector alerts @@ -109,4 +109,4 @@ You can also edit this file directly with `edit-config`: ``` For more information about editing the defaults or writing new alert entities, see our -[health monitoring documentation](https://github.com/netdata/netdata/blob/master/src/health/README.md). +[health monitoring documentation](/src/health/README.md). diff --git a/docs/guides/collect-unbound-metrics.md b/docs/developer-and-contributor-corner/collect-unbound-metrics.md index 5467592a0..ac997b7f9 100644 --- a/docs/guides/collect-unbound-metrics.md +++ b/docs/developer-and-contributor-corner/collect-unbound-metrics.md @@ -58,9 +58,7 @@ configuring the collector. You may not need to do any more configuration to have Netdata collect your Unbound metrics. If you followed the steps above to enable `remote-control` and make your Unbound files readable by Netdata, that should -be enough. Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system. You should see Unbound metrics in your Netdata -dashboard! +be enough. Restart Netdata with `sudo systemctl restart netdata`, or the appropriate method for your system. You should see Unbound metrics in your Netdata dashboard! ![Some charts showing Unbound metrics in real-time](https://user-images.githubusercontent.com/1153921/69659974-93160f00-103c-11ea-88e6-27e9efcf8c0d.png) @@ -93,7 +91,7 @@ jobs: tls_skip_verify: yes tls_cert: /path/to/unbound_control.pem tls_key: /path/to/unbound_control.key - + - name: local address: 127.0.0.1:8953 cumulative: yes @@ -101,16 +99,15 @@ jobs: ``` Netdata will attempt to read `unbound.conf` to get the appropriate `address`, `cumulative`, `use_tls`, `tls_cert`, and -`tls_key` parameters. +`tls_key` parameters. 
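For reference, the Unbound-side prerequisite mentioned above lives in `unbound.conf`. A minimal sketch of the `remote-control` section, assuming the default certificate paths produced by `unbound-control-setup`:

```yaml
# unbound.conf -- remote-control sketch; paths assume unbound-control-setup defaults
remote-control:
    control-enable: yes
    control-interface: 127.0.0.1
    control-port: 8953
    server-key-file: "/etc/unbound/unbound_server.key"
    server-cert-file: "/etc/unbound/unbound_server.pem"
    control-key-file: "/etc/unbound/unbound_control.key"
    control-cert-file: "/etc/unbound/unbound_control.pem"
```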
-Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system. +Restart Netdata with `sudo systemctl restart netdata`, or the appropriate method for your system. ### Manual setup for a remote Unbound server Collecting metrics from remote Unbound servers requires manual configuration. There are too many possibilities to cover all remote connections here, but the [default `unbound.conf` -file](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/config/go.d/unbound.conf) contains a few useful examples: +file](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/config/go.d/unbound.conf) contains a few useful examples: ```yaml jobs: @@ -132,11 +129,11 @@ jobs: ``` To see all the available options, see the default [unbound.conf -file](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/config/go.d/unbound.conf). +file](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/config/go.d/unbound.conf). ## What's next? -Now that you're collecting metrics from your Unbound servers, let us know how it's working for you! There's always room for improvement or refinement based on real-world use cases. Feel free to [file an issue](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml) with your thoughts. diff --git a/docs/guides/monitor/kubernetes-k8s-netdata.md b/docs/developer-and-contributor-corner/kubernetes-k8s-netdata.md index 982c35e79..011aac8da 100644 --- a/docs/guides/monitor/kubernetes-k8s-netdata.md +++ b/docs/developer-and-contributor-corner/kubernetes-k8s-netdata.md @@ -38,7 +38,7 @@ To follow this tutorial, you need: - A free Netdata Cloud account.
[Sign up](https://app.netdata.cloud/sign-up?cloudRoute=/spaces) if you don't have one already. - A working cluster running Kubernetes v1.9 or newer, with a Netdata deployment and connected parent/child nodes. See - our [Kubernetes deployment process](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/kubernetes.md) for details on deployment and + our [Kubernetes deployment process](/packaging/installer/methods/kubernetes.md) for details on deployment and connecting to Cloud. - The [`kubectl`](https://kubernetes.io/docs/reference/kubectl/overview/) command line tool, within [one minor version difference](https://kubernetes.io/docs/tasks/tools/install-kubectl/#before-you-begin) of your cluster, on an @@ -93,13 -93,7 @@ The Netdata Helm chart deploys and enables everything you need for monitoring Ku Netdata and connect your cluster's nodes, you're ready to check out the visualizations **with zero configuration**. To get started, [sign in](https://app.netdata.cloud/sign-in?cloudRoute=/spaces) to your Netdata Cloud account. Head over -to the War Room you connected your cluster to, if not **General**. - -Netdata Cloud is already visualizing your Kubernetes metrics, streamed in real-time from each node, in the -[Overview](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md): - -![Netdata's Kubernetes monitoring -dashboard](https://user-images.githubusercontent.com/1153921/109037415-eafc5500-7687-11eb-8773-9b95941e3328.png) +to the Room you connected your cluster to, if not **General**. Let's walk through monitoring each layer of a Kubernetes cluster using the Overview as our framework. @@ -118,9 +112,6 @@ cluster](https://user-images.githubusercontent.com/1153921/109042169-19c8fa00-76 For example, the chart above shows a spike in the CPU utilization from `rabbitmq` every minute or so, along with a baseline CPU utilization of 10-15% across the cluster.
-Read about the [Overview](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) and some best practices on [viewing -an overview of your infrastructure](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) for details on using composite charts to -drill down into per-node performance metrics. ## Pod and container metrics @@ -132,7 +123,7 @@ visualizations](https://user-images.githubusercontent.com/1153921/109049195-349f ### Health map -The first visualization is the [health map](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md#health-map), +The first visualization is the [health map](/docs/dashboards-and-charts/kubernetes-tab.md#health-map), which places each container into its own box, then varies the intensity of their color to visualize the resource utilization. By default, the health map shows the **average CPU utilization as a percentage of the configured limit** for every container in your cluster. @@ -146,7 +137,7 @@ Let's explore the most colorful box by hovering over it. container](https://user-images.githubusercontent.com/1153921/109049544-a8417980-7695-11eb-80a7-109b4a645a27.png) The **Context** tab shows `rabbitmq-5bb66bb6c9-6xr5b` as the container's image name, which means this container is -running a [RabbitMQ](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/rabbitmq/README.md) workload. +running a [RabbitMQ](/src/go/plugin/go.d/modules/rabbitmq/README.md) workload. Click the **Metrics** tab to see real-time metrics from that container. Unsurprisingly, it shows a spike in CPU utilization at regular intervals. @@ -165,7 +156,7 @@ different namespaces. 
![Time-series Kubernetes monitoring in Netdata Cloud](https://user-images.githubusercontent.com/1153921/109075210-126a1680-76b6-11eb-918d-5acdcdac152d.png) -Each composite chart has a [definition bar](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#definition-bar) +Each composite chart has a [definition bar](/docs/dashboards-and-charts/netdata-charts.md#definition-bar) for complete customization. For example, grouping the top chart by `k8s_container_name` reveals new information. ![Changing time-series charts](https://user-images.githubusercontent.com/1153921/109075212-139b4380-76b6-11eb-836f-939482ae55fc.png) @@ -175,20 +166,20 @@ for complete customization. For example, grouping the top chart by `k8s_containe Netdata has a [service discovery plugin](https://github.com/netdata/agent-service-discovery), which discovers and creates configuration files for [compatible services](https://github.com/netdata/helmchart#service-discovery-and-supported-services) and any endpoints covered by -our [generic Prometheus collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/prometheus/README.md). +our [generic Prometheus collector](/src/go/plugin/go.d/modules/prometheus/README.md). Netdata uses these files to collect metrics from any compatible application as they run _inside_ of a pod. Service discovery happens without manual intervention as pods are created, destroyed, or moved between nodes. Service metrics show up on the Overview as well, beneath the **Kubernetes** section, and are labeled according to the service in question. 
For example, the **RabbitMQ** section has numerous charts from the [`rabbitmq` -collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/rabbitmq/README.md): +collector](/src/go/plugin/go.d/modules/rabbitmq/README.md): ![Finding service discovery metrics](https://user-images.githubusercontent.com/1153921/109054511-2eac8a00-769b-11eb-97f1-da93acb4b5fe.png) > The robot-shop cluster has more supported services, such as MySQL, which are not visible with zero configuration. This > is usually because of services running on non-default ports, using non-default names, or required passwords. Read up -> on [configuring service discovery](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/kubernetes.md#configure-service-discovery) to collect +> on [configuring service discovery](/packaging/installer/methods/kubernetes.md#configure-service-discovery) to collect > more service metrics. Service metrics are essential to infrastructure monitoring, as they're the best indicator of the end-user experience, @@ -202,7 +193,7 @@ Netdata also automatically collects metrics from two essential Kubernetes proces The **k8s kubelet** section visualizes metrics from the Kubernetes agent responsible for managing every pod on a given node. This also happens without any configuration thanks to the [kubelet -collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/k8s_kubelet/README.md). +collector](/src/go/plugin/go.d/modules/k8s_kubelet/README.md). Monitoring each node's kubelet can be invaluable when diagnosing issues with your Kubernetes cluster. For example, you can see if the number of running containers/pods has dropped, which could signal a fault or crash in a particular @@ -218,7 +209,7 @@ configuration-related errors, and the actual vs. 
desired numbers of volumes, plu The **k8s kube-proxy** section displays metrics about the network proxy that runs on each node in your Kubernetes cluster. kube-proxy lets pods communicate with each other and accept sessions from outside your cluster. Its metrics are collected by the [kube-proxy -collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/k8s_kubeproxy/README.md). +collector](/src/go/plugin/go.d/modules/k8s_kubeproxy/README.md). With Netdata, you can monitor how often your k8s proxies are syncing proxy rules between nodes. Dramatic changes in these figures could indicate an anomaly in your cluster that's worthy of further investigation. @@ -238,9 +229,9 @@ clusters of all sizes. - [Netdata Helm chart](https://github.com/netdata/helmchart) - [Netdata service discovery](https://github.com/netdata/agent-service-discovery) - [Netdata Agent 路 `kubelet` - collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/k8s_kubelet/README.md) + collector](/src/go/plugin/go.d/modules/k8s_kubelet/README.md) - [Netdata Agent 路 `kube-proxy` - collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/k8s_kubeproxy/README.md) -- [Netdata Agent 路 `cgroups.plugin`](https://github.com/netdata/netdata/blob/master/src/collectors/cgroups.plugin/README.md) + collector](/src/go/plugin/go.d/modules/k8s_kubeproxy/README.md) +- [Netdata Agent 路 `cgroups.plugin`](/src/collectors/cgroups.plugin/README.md) diff --git a/docs/guides/monitor/lamp-stack.md b/docs/developer-and-contributor-corner/lamp-stack.md index cc649dba9..2df5a7167 100644 --- a/docs/guides/monitor/lamp-stack.md +++ b/docs/developer-and-contributor-corner/lamp-stack.md @@ -51,7 +51,7 @@ To follow this tutorial, you need: ## Install the Netdata Agent If you don't have the free, open-source Netdata monitoring agent installed on your node yet, get started with a [single -kickstart 
command](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md): +kickstart command](/packaging/installer/README.md): <OneLineInstallWget/> @@ -61,15 +61,15 @@ replacing `NODE` with the hostname or IP address of your system. ## Enable hardware and Linux system monitoring -There's nothing you need to do to enable [system monitoring](https://github.com/netdata/netdata/blob/master/docs/collect/system-metrics.md) and Linux monitoring with +There's nothing you need to do to enable system monitoring and Linux monitoring with the Netdata Agent, which autodetects metrics from CPUs, memory, disks, networking devices, and Linux processes like systemd without any configuration. If you're using containers, Netdata automatically collects resource utilization -metrics from each using the [cgroups data collector](https://github.com/netdata/netdata/blob/master/src/collectors/cgroups.plugin/README.md). +metrics from each using the [cgroups data collector](/src/collectors/cgroups.plugin/README.md). ## Enable Apache monitoring Let's begin by configuring Apache to work with Netdata's [Apache data -collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/apache/README.md). +collector](/src/go/plugin/go.d/modules/apache/README.md). Actually, there's nothing for you to do to enable Apache monitoring with Netdata. @@ -80,7 +80,7 @@ metrics](https://httpd.apache.org/docs/2.4/mod/mod_status.html), which is just _ ## Enable web log monitoring The Netdata Agent also comes with a [web log -collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/weblog/README.md), which reads Apache's access +collector](/src/go/plugin/go.d/modules/weblog/README.md), which reads Apache's access log file, processes each line, and converts them into per-second metrics. On Debian systems, it reads the file at `/var/log/apache2/access.log`. @@ -93,7 +93,7 @@ monitoring. 
Because your MySQL database is password-protected, you do need to tell MySQL to allow the `netdata` user to connect without a password. Netdata's [MySQL data -collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/mysql/README.md) collects metrics in _read-only_ +collector](/src/go/plugin/go.d/modules/mysql/README.md) collects metrics in _read-only_ mode, without being able to alter or affect operations in any way. First, log into the MySQL shell. Then, run the following three commands, one at a time: @@ -105,15 +105,15 @@ FLUSH PRIVILEGES; ``` Run `sudo systemctl restart netdata`, or the [appropriate alternative for your -system](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation), to collect dozens of metrics every second for robust MySQL monitoring. ## Enable PHP monitoring Unlike Apache or MySQL, PHP isn't a service that you can monitor directly, unless you instrument a PHP-based application -with [StatsD](https://github.com/netdata/netdata/blob/master/src/collectors/statsd.plugin/README.md). +with [StatsD](/src/collectors/statsd.plugin/README.md). However, if you use [PHP-FPM](https://php-fpm.org/) in your LAMP stack, you can monitor that process with our [PHP-FPM -data collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/phpfpm/README.md). +data collector](/src/go/plugin/go.d/modules/phpfpm/README.md).
Open your PHP-FPM configuration for editing, replacing `7.4` with your version of PHP: @@ -159,12 +159,12 @@ If the Netdata Agent isn't already open in your browser, open a new tab and navi > If you [signed up](https://app.netdata.cloud/sign-up?cloudRoute=/spaces) for Netdata Cloud earlier, you can also view > the exact same LAMP stack metrics there, plus additional features, like drag-and-drop custom dashboards. Be sure to -> [connect your node](https://github.com/netdata/netdata/blob/master/src/claim/README.md) to start streaming metrics to your browser through Netdata Cloud. +> [connect your node](/src/claim/README.md) to start streaming metrics to your browser through Netdata Cloud. Netdata automatically organizes all metrics and charts onto a single page for easy navigation. Peek at gauges to see overall system performance, then scroll down to see more. Click-and-drag with your mouse to pan _all_ charts back and forth through different time intervals, or hold `SHIFT` and use the scrollwheel (or two-finger scroll) to zoom in and -out. Check out our doc on [interacting with charts](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) for all the details. +out. Check out our doc on [interacting with charts](/docs/dashboards-and-charts/netdata-charts.md) for all the details. ![The Netdata dashboard](https://user-images.githubusercontent.com/1153921/109520555-98e17800-7a69-11eb-86ec-16f689da4527.png) @@ -197,15 +197,15 @@ Here's a quick reference for what charts you might want to focus on after settin The Netdata Agent comes with hundreds of pre-configured alerts to help you keep tabs on your system, including 19 alerts designed for smarter LAMP stack monitoring. -Click the 🔔 icon in the top navigation to [see active alerts](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alerts.md).
The **Active** tabs +Click the 馃敂 icon in the top navigation to [see active alerts](/docs/dashboards-and-charts/alerts-tab.md). The **Active** tabs shows any alerts currently triggered, while the **All** tab displays a list of _every_ pre-configured alert. The ![An example of LAMP stack alerts](https://user-images.githubusercontent.com/1153921/109524120-5883f900-7a6d-11eb-830e-0e7baaa28163.png) -[Tweak alerts](https://github.com/netdata/netdata/blob/master/src/health/REFERENCE.md) based on your infrastructure monitoring needs, and to see these alerts +[Tweak alerts](/src/health/REFERENCE.md) based on your infrastructure monitoring needs, and to see these alerts in other places, like your inbox or a Slack channel, [enable a notification -method](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md). +method](/docs/alerts-and-notifications/notifications/README.md). ## What's next? @@ -215,7 +215,7 @@ services. The per-second metrics granularity means you have the most accurate in any LAMP-related issues. Another powerful way to monitor the availability of a LAMP stack is the [`httpcheck` -collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/httpcheck/README.md), which pings a web server at +collector](/src/go/plugin/go.d/modules/httpcheck/README.md), which pings a web server at a regular interval and tells you whether if and how quickly it's responding. The `response_match` option also lets you monitor when the web server's response isn't what you expect it to be, which might happen if PHP-FPM crashes, for example. @@ -225,14 +225,14 @@ we're not covering it here, but it _does_ work in a single-node setup. Just don' node crashed. 
If you're planning on managing more than one node, or want to take advantage of advanced features, like finding the -source of issues faster with [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md), +source of issues faster with [Metric Correlations](/docs/metric-correlations.md), [sign up](https://app.netdata.cloud/sign-up?cloudRoute=/spaces) for a free Netdata Cloud account. ### Related reference documentation -- [Netdata Agent · Get started](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) -- [Netdata Agent · Apache data collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/apache/README.md) -- [Netdata Agent · Web log collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/weblog/README.md) -- [Netdata Agent · MySQL data collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/mysql/README.md) -- [Netdata Agent · PHP-FPM data collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/phpfpm/README.md) +- [Netdata Agent · Get started](/packaging/installer/README.md) +- [Netdata Agent · Apache data collector](/src/go/plugin/go.d/modules/apache/README.md) +- [Netdata Agent · Web log collector](/src/go/plugin/go.d/modules/weblog/README.md) +- [Netdata Agent · MySQL data collector](/src/go/plugin/go.d/modules/mysql/README.md) +- [Netdata Agent · PHP-FPM data collector](/src/go/plugin/go.d/modules/phpfpm/README.md) diff --git a/docs/guides/monitor-cockroachdb.md b/docs/developer-and-contributor-corner/monitor-cockroachdb.md index 9d4d3ea03..f0db12cc4 100644 --- a/docs/guides/monitor-cockroachdb.md +++ b/docs/developer-and-contributor-corner/monitor-cockroachdb.md @@ -11,7 +11,7 @@ learn_rel_path: "Miscellaneous" [CockroachDB](https://github.com/cockroachdb/cockroach) is an open-source project that brings SQL databases into
scalable, disaster-resilient cloud deployments. Thanks to -a [new CockroachDB collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/cockroachdb/README.md) +a [new CockroachDB collector](/src/go/plugin/go.d/modules/cockroachdb/README.md) released in [v1.20](https://blog.netdata.cloud/posts/release-1.20/), you can now monitor any number of CockroachDB databases with maximum granularity using Netdata. Collect more than 50 unique metrics and put them on interactive visualizations @@ -38,7 +38,7 @@ display them on the dashboard. If your CockroachDB instance is accessible through `http://localhost:8080/` or `http://127.0.0.1:8080`, your setup is complete. Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, and refresh your browser. You should see CockroachDB +method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, and refresh your browser. You should see CockroachDB metrics in your Netdata dashboard! <figure> @@ -115,4 +115,4 @@ cd /etc/netdata/ # Replace with your Netdata configuration directory, if not /et ./edit-config health.d/cockroachdb.conf # You may need to use `sudo` for write privileges ``` -For more information about editing the defaults or writing new alert entities, see our documentation on [configuring health alerts](https://github.com/netdata/netdata/blob/master/src/health/REFERENCE.md). +For more information about editing the defaults or writing new alert entities, see our documentation on [configuring health alerts](/src/health/REFERENCE.md). 
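If autodetection doesn't kick in — for instance, when CockroachDB's Admin UI listens on a non-default address — a job can be defined by hand. A minimal sketch of `go.d/cockroachdb.conf`, assuming CockroachDB's `_status/vars` metrics endpoint on the default port 8080:

```yaml
# Sketch only -- adjust the address/port to match your deployment.
jobs:
  - name: local
    url: http://127.0.0.1:8080/_status/vars
```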
diff --git a/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md b/docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md index 728606c83..91d2a2ef2 100644 --- a/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md +++ b/docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md @@ -12,7 +12,7 @@ learn_rel_path: "Operations" When trying to troubleshoot or debug a finicky application, there's no such thing as too much information. At Netdata, we developed programs that connect to the [_extended Berkeley Packet Filter_ (eBPF) virtual -machine](https://github.com/netdata/netdata/blob/master/src/collectors/ebpf.plugin/README.md) to help you see exactly how specific applications are interacting with the +machine](/src/collectors/ebpf.plugin/README.md) to help you see exactly how specific applications are interacting with the Linux kernel. With these charts, you can root out bugs, discover optimizations, diagnose memory leaks, and much more. This means you can see exactly how often, and in what volume, the application creates processes, opens files, writes to @@ -29,7 +29,7 @@ To start troubleshooting an application with eBPF metrics, you need to ensure yo displays those metrics independent from any other process. You can use the `apps_groups.conf` file to configure which applications appear in charts generated by -[`apps.plugin`](https://github.com/netdata/netdata/blob/master/src/collectors/apps.plugin/README.md). Once you edit this file and create a new group for the application +[`apps.plugin`](/src/collectors/apps.plugin/README.md). Once you edit this file and create a new group for the application you want to monitor, you can see how it's interacting with the Linux kernel via real-time eBPF metrics. Let's assume you have an application that runs on the process `custom-app`. 
To monitor eBPF metrics for that application @@ -61,12 +61,12 @@ dev: custom-app ``` Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, to begin seeing metrics for this particular +method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, to begin seeing metrics for this particular group+process. You can also add additional processes to the same group. You can set up `apps_groups.conf` to show more precise eBPF metrics for any application or service running on your system, even if it's a standard package like Redis, Apache, or any other [application/service Netdata collects -from](https://github.com/netdata/netdata/blob/master/src/collectors/COLLECTORS.md). +from](/src/collectors/COLLECTORS.md). ```conf # ----------------------------------------------------------------------------- @@ -86,7 +86,7 @@ to show other charts that will help you debug and troubleshoot how it interacts ## Configure the eBPF collector to monitor errors -The eBPF collector has [two possible modes](https://github.com/netdata/netdata/blob/master/src/collectors/ebpf.plugin/README.md#ebpf-load-mode): `entry` and `return`. The default +The eBPF collector has [two possible modes](/src/collectors/ebpf.plugin/README.md#ebpf-load-mode): `entry` and `return`. The default is `entry`, and only monitors calls to kernel functions, but the `return` also monitors and charts _whether these calls return in error_. @@ -110,7 +110,7 @@ Replace `entry` with `return`: ``` Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system.
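Put together, the relevant fragment of `ebpf.d.conf` after that edit would read roughly as follows — the section and key names are inferred from the load-mode documentation referenced above, so double-check them against your installed file:

```conf
# Sketch of ebpf.d.conf after switching modes; verify against your installed file.
[global]
    ebpf load mode = return
```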
## Get familiar with per-application eBPF metrics and charts @@ -122,7 +122,7 @@ Pay particular attention to the charts in the **ebpf file**, **ebpf syscall**, * sub-sections. These charts are populated by low-level Linux kernel metrics thanks to eBPF, and showcase the volume of calls to open/close files, call functions like `do_fork`, IO activity on the VFS, and much more. -See the [eBPF collector documentation](https://github.com/netdata/netdata/blob/master/src/collectors/ebpf.plugin/README.md#integration-with-appsplugin) for the full list +See the [eBPF collector documentation](/src/collectors/ebpf.plugin/README.md#integration-with-appsplugin) for the full list of per-application charts. Let's show some examples of how you can first identify normal eBPF patterns, then use that knowledge to identify @@ -239,16 +239,16 @@ same application on multiple systems and want to correlate how it performs on ea findings with someone else on your team. If you don't already have a Netdata Cloud account, go [sign in](https://app.netdata.cloud) and get started for free. -You can also read how to [monitor your infrastructure with Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md) to understand the key features that it has to offer. +You can also read how to [monitor your infrastructure with Netdata Cloud](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md) to understand the key features that it has to offer. -Once you've added one or more nodes to a Space in Netdata Cloud, you can see aggregated eBPF metrics in the [Overview -dashboard](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) under the same **Applications** or **eBPF** sections that you -find on the local Agent dashboard. 
Or, [create new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) using eBPF metrics +Once you've added one or more nodes to a Space in Netdata Cloud, you can see aggregated eBPF metrics in the Overview +dashboard under the same **Applications** or **eBPF** sections that you +find on the local Agent dashboard. Or, [create new dashboards](/docs/dashboards-and-charts/dashboards-tab.md) using eBPF metrics from any number of distributed nodes to see how your application interacts with multiple Linux kernels on multiple Linux systems. Now that you can see eBPF metrics in Netdata Cloud, you can [invite your -team](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/organize-your-infrastrucutre-invite-your-team.md#invite-your-team) and share your findings with others. +team](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#invite-your-team) and share your findings with others. diff --git a/docs/guides/monitor-hadoop-cluster.md b/docs/developer-and-contributor-corner/monitor-hadoop-cluster.md index b536e0fa0..98bf3d21f 100644 --- a/docs/guides/monitor-hadoop-cluster.md +++ b/docs/developer-and-contributor-corner/monitor-hadoop-cluster.md @@ -27,8 +27,8 @@ alternative, like the guide available from For more specifics on the collection modules used in this guide, read the respective pages in our documentation: -- [HDFS](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/hdfs/README.md) -- [Zookeeper](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/zookeeper/README.md) +- [HDFS](/src/go/plugin/go.d/modules/hdfs/README.md) +- [Zookeeper](/src/go/plugin/go.d/modules/zookeeper/README.md) ## Set up your HDFS and Zookeeper installations @@ -164,7 +164,7 @@ jobs: address : 203.0.113.10:2182 ``` -Finally, [restart 
Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation). +Finally, [restart Netdata](/packaging/installer/README.md#maintaining-a-netdata-agent-installation). ```sh sudo systemctl restart netdata @@ -188,4 +188,4 @@ sudo /etc/netdata/edit-config health.d/zookeeper.conf ``` For more information about editing the defaults or writing new alert entities, see our -[health monitoring documentation](https://github.com/netdata/netdata/blob/master/src/health/README.md). +[health monitoring documentation](/src/health/README.md). diff --git a/docs/guides/monitor/pi-hole-raspberry-pi.md b/docs/developer-and-contributor-corner/pi-hole-raspberry-pi.md index 1e76cc096..df6bb0809 100644 --- a/docs/guides/monitor/pi-hole-raspberry-pi.md +++ b/docs/developer-and-contributor-corner/pi-hole-raspberry-pi.md @@ -81,7 +81,7 @@ service](https://discourse.pi-hole.net/t/how-do-i-configure-my-devices-to-use-pi finished setting up Pi-hole at this point. As far as configuring Netdata to monitor Pi-hole metrics, there's nothing you actually need to do. Netdata's [Pi-hole -collector](https://github.com/netdata/netdata/blob/master/src/go/collectors/go.d.plugin/modules/pihole/README.md) will autodetect the new service +collector](/src/go/plugin/go.d/modules/pihole/README.md) will autodetect the new service running on your Raspberry Pi and immediately start collecting metrics every second. Restart Netdata with `sudo systemctl restart netdata`, which will then recognize that Pi-hole is running and start a @@ -100,14 +100,12 @@ part of your system might affect another. ![The Netdata dashboard in action](https://user-images.githubusercontent.com/1153921/80827388-b9fee100-8b98-11ea-8f60-0d7824667cd3.gif) -If you're completely new to Netdata, look at the [Introduction](https://github.com/netdata/netdata/blob/master/docs/getting-started/introduction.md) section for a walkthrough of all its features. 
For a more expedited tour, see the [get started documentation](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md). - ### Enable temperature sensor monitoring You need to manually enable Netdata's built-in [temperature sensor -collector](https://github.com/netdata/netdata/blob/master/src/collectors/charts.d.plugin/sensors/README.md) to start collecting metrics. +collector](/src/collectors/charts.d.plugin/sensors/README.md) to start collecting metrics. -> Netdata uses a few plugins to manage its [collectors](https://github.com/netdata/netdata/blob/master/src/collectors/REFERENCE.md), each using a different language: Go, +> Netdata uses a few plugins to manage its [collectors](/src/collectors/REFERENCE.md), each using a different language: Go, > Python, Node.js, and Bash. While our Go collectors are undergoing the most active development, we still support the > other languages. In this case, you need to enable a temperature sensor collector that's written in Bash. @@ -125,7 +123,7 @@ Raspberry Pi temperature sensor monitoring. ### Storing historical metrics on your Raspberry Pi By default, Netdata allocates 256 MiB in disk space to store historical metrics inside the [database -engine](https://github.com/netdata/netdata/blob/master/src/database/engine/README.md). On the Raspberry Pi used for this guide, Netdata collects 1,500 metrics every +engine](/src/database/engine/README.md). On the Raspberry Pi used for this guide, Netdata collects 1,500 metrics every second, which equates to storing 3.5 days worth of historical metrics. You can increase this allocation by editing `netdata.conf` and increasing the `dbengine multihost disk space` setting to @@ -137,6 +135,6 @@ more than 256. 
``` Use our [database sizing -calculator](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics) -and the [Database configuration documentation](https://github.com/netdata/netdata/blob/master/src/database/README.md) to help you determine the right +calculator](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics) +and the [Database configuration documentation](/src/database/README.md) to help you determine the right setting for your Raspberry Pi. diff --git a/docs/guides/monitor/process.md b/docs/developer-and-contributor-corner/process.md index af36aefa1..2902a24f6 100644 --- a/docs/guides/monitor/process.md +++ b/docs/developer-and-contributor-corner/process.md @@ -37,16 +37,16 @@ With Netdata's process monitoring, you can: ## Prerequisites -- One or more Linux nodes running [Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) +- One or more Linux nodes running [Netdata](/packaging/installer/README.md) - A general understanding of how - to [configure the Netdata Agent](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) + to [configure the Netdata Agent](/docs/netdata-agent/configuration/README.md) using `edit-config`. - A Netdata Cloud account. [Sign up](https://app.netdata.cloud) if you don't have one already. ## How does Netdata do process monitoring? The Netdata Agent already knows to look for hundreds -of [standard applications that we support via collectors](https://github.com/netdata/netdata/blob/master/src/collectors/COLLECTORS.md), +of [standard applications that we support via collectors](/src/collectors/COLLECTORS.md), and groups them based on their purpose. Let's say you want to monitor a MySQL database using its process. 
The Netdata Agent already knows to look for processes with the string `mysqld` in their @@ -55,12 +55,12 @@ process-specific charts. The process and groups settings are used by two unique and powerful collectors. -[**`apps.plugin`**](https://github.com/netdata/netdata/blob/master/src/collectors/apps.plugin/README.md) looks at the Linux +[**`apps.plugin`**](/src/collectors/apps.plugin/README.md) looks at the Linux process tree every second, much like `top` or `ps fax`, and collects resource utilization information on every running process. It then automatically adds a layer of meaningful visualization on top of these metrics, and creates per-process/application charts. -[**`ebpf.plugin`**](https://github.com/netdata/netdata/blob/master/src/collectors/ebpf.plugin/README.md): Netdata's extended +[**`ebpf.plugin`**](/src/collectors/ebpf.plugin/README.md): Netdata's extended Berkeley Packet Filter (eBPF) collector monitors Linux kernel-level metrics for file descriptors, virtual filesystem IO, and process management, and then hands process-specific metrics over to `apps.plugin` for visualization. The eBPF collector also collects and visualizes @@ -130,7 +130,7 @@ aware of hundreds of processes, and collects metrics from them automatically. But, if you want to change the grouping behavior, add an application that isn't yet supported in the Netdata Agent, or monitor a custom application, you need to edit the `apps_groups.conf` configuration file. -Navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) and +Navigate to your [Netdata config directory](/docs/netdata-agent/configuration/README.md) and use `edit-config` to edit the file. ```bash @@ -146,7 +146,7 @@ others, and groups them into `sql`. 
That makes sense, since all these processes sql: mysqld* mariad* postgres* postmaster* oracle_* ora_* sqlservr ``` -These groups are then reflected as [dimensions](https://github.com/netdata/netdata/blob/master/src/web/README.md#dimensions) +These groups are then reflected as [dimensions](/src/web/README.md#dimensions) within Netdata's charts. ![An example per-process CPU utilization chart in Netdata @@ -180,7 +180,7 @@ sql: mariad* postmaster* oracle_* ora_* sqlservr ``` Restart Netdata with `sudo systemctl restart netdata`, or -the [appropriate method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, to start collecting utilization metrics +the [appropriate method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, to start collecting utilization metrics from your application. Time to [visualize your process metrics](#visualize-process-metrics). ### Custom applications @@ -207,7 +207,7 @@ custom-app: custom-app ``` Restart Netdata with `sudo systemctl restart netdata`, or -the [appropriate method](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, to start collecting utilization metrics +the [appropriate method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system, to start collecting utilization metrics from your application. 
## Visualize process metrics

diff --git a/docs/guides/python-collector.md b/docs/developer-and-contributor-corner/python-collector.md
index 4dd6d2c4c..0b7aa96a6 100644
--- a/docs/guides/python-collector.md
+++ b/docs/developer-and-contributor-corner/python-collector.md
@@ -1,8 +1,8 @@
# Develop a custom data collector in Python

-The Netdata Agent uses [data collectors](https://github.com/netdata/netdata/blob/master/src/collectors/README.md) to
+The Netdata Agent uses [data collectors](/src/collectors/README.md) to
fetch metrics from hundreds of system, container, and service endpoints. While the Netdata team and community have built
-[powerful collectors](https://github.com/netdata/netdata/blob/master/src/collectors/COLLECTORS.md) for most system, container,
+[powerful collectors](/src/collectors/COLLECTORS.md) for most system, container,
and service/application endpoints, some custom applications can't be monitored by default.

In this tutorial, you'll learn how to leverage the [Python programming language](https://www.python.org/) to build a
@@ -22,7 +22,7 @@ want to make it available for other users, you should create the pull request in
## What you need to get started

 - A physical or virtual Linux system, which we'll call a _node_.
 - A working [installation of Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) monitoring agent.
 + A working [installation of the Netdata](/packaging/installer/README.md) monitoring agent.
### Quick start

@@ -33,7 +33,7 @@ For a quick start, you can look at the Netdata
(as opposed to having to install Netdata from source again with your new changes) you can copy the relevant file to
where Netdata expects it, then either restart Netdata with `sudo systemctl restart netdata` so it picks up and uses the
updated collector, or run the updated collector in debug mode by following a process like the one below (this assumes you have
-[installed Netdata from a GitHub fork](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/manual.md) you
+[installed Netdata from a GitHub fork](/packaging/installer/methods/manual.md) you
have made to do your development on).

```bash
@@ -73,7 +73,7 @@ The basic elements of a Netdata collector are:

- `get_data()`: The basic function of the plugin which will return to Netdata the correct values.

**Note**: All names are better explained in the
-[External Plugins Documentation](https://github.com/netdata/netdata/blob/master/src/collectors/plugins.d/README.md).
+[External Plugins Documentation](/src/collectors/plugins.d/README.md).
Parameters like `priority` and `update_every` mentioned in that documentation are handled by the `python.d.plugin`,
not by each collection module.

@@ -117,7 +117,7 @@ context, charttype]`, where:
  that is `A.B`, with `A` being the name of the collector, and `B` being the name of the specific metric.
- `charttype`: Either `line`, `area`, or `stacked`. If null, `line` is the default value.

-You can read more about `family` and `context` in the [web dashboard](https://github.com/netdata/netdata/blob/master/src/web/README.md#families) doc.
+You can read more about `family` and `context` in the [web dashboard](/src/web/README.md#families) doc.

Once the chart has been defined, you should define the dimensions of the chart. Dimensions are basically the metrics
to be represented in this chart and each chart can have more than one dimension.
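As a sketch of the chart format described above, here is what a `CHARTS` definition could look like for a hypothetical `weather` collector with one humidity metric (the names are illustrative, not from this guide); the `options` list follows the `[name, title, units, family, context, charttype]` order:

```python
# Hypothetical CHARTS definition for an imaginary "weather" collector.
# 'options' order: [name, title, units, family, context, charttype].
CHARTS = {
    'humidity': {
        'options': [None, 'Humidity', 'percentage', 'weather_station', 'weather.humidity', None],
        'lines': [
            ['humid']  # one dimension, fed by get_data() under the key 'humid'
        ]
    }
}

# charttype is null here, so it falls back to the 'line' default described above.
chart_type = CHARTS['humidity']['options'][5] or 'line'
print(chart_type)
```

The context `weather.humidity` follows the `A.B` convention above, with `A` as the collector name and `B` as the metric name.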
In order to define the dimensions, the @@ -410,7 +410,7 @@ ORDER = [ ] ``` -[Restart Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) with `sudo systemctl restart netdata` to see the new humidity +[Restart Netdata](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) with `sudo systemctl restart netdata` to see the new humidity chart: ![A snapshot of the modified chart](https://i.imgur.com/XOeCBmg.png) @@ -467,7 +467,7 @@ ORDER = [ ] ``` -[Restart Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation) with `sudo systemctl restart netdata` to see the new +[Restart Netdata](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) with `sudo systemctl restart netdata` to see the new min/max/average temperature chart with multiple dimensions: ![A snapshot of the modified chart](https://i.imgur.com/g7E8lnG.png) @@ -521,7 +521,7 @@ variables and inform the user about the defaults. For example, take a look at th [GitHub](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/example/example.conf). You can read more about the configuration file on the [`python.d.plugin` -documentation](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/README.md). +documentation](/src/collectors/python.d.plugin/README.md). You can find the source code for the above examples on [GitHub](https://github.com/papajohn-uop/netdata). diff --git a/docs/guides/monitor/raspberry-pi-anomaly-detection.md b/docs/developer-and-contributor-corner/raspberry-pi-anomaly-detection.md index 3c56ac79a..41cf007eb 100644 --- a/docs/guides/monitor/raspberry-pi-anomaly-detection.md +++ b/docs/developer-and-contributor-corner/raspberry-pi-anomaly-detection.md @@ -6,7 +6,7 @@ We love IoT and edge at Netdata, we also love machine learning. 
Even better if we can combine the two to ease the pain of monitoring increasingly complex systems.

We recently explored what might be involved in enabling our Python-based [anomalies
-collector](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/anomalies/README.md) on a Raspberry Pi. To our delight, it's actually quite
+collector](/src/collectors/python.d.plugin/anomalies/README.md) on a Raspberry Pi. To our delight, it's actually quite
straightforward!

Read on to learn all the steps and enable unsupervised anomaly detection on your own Raspberry Pi(s).

@@ -17,14 +17,14 @@ Read on to learn all the steps and enable unsupervised anomaly detection on your

- A Raspberry Pi running Raspbian, which we'll call a _node_.
- The [open-source Netdata](https://github.com/netdata/netdata) monitoring agent. If you don't have it installed on your
-  node yet, [get started now](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md).
+  node yet, [get started now](/packaging/installer/README.md).

## Install dependencies

First make sure Netdata is using Python 3 when it runs Python-based data collectors.

-Next, open `netdata.conf` using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files)
-from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration.md#the-netdata-config-directory). Scroll down to the
+Next, open `netdata.conf` using [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-netdataconf)
+from within the [Netdata config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). Scroll down to the
`[plugin:python.d]` section to pass in the `-ppython3` command option.
```conf @@ -53,7 +53,7 @@ LLVM_CONFIG=llvm-config-9 pip3 install --user llvmlite numpy==1.20.1 netdata-pan ## Enable the anomalies collector -Now you're ready to enable the collector and [restart Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation). +Now you're ready to enable the collector and [restart Netdata](/packaging/installer/README.md#maintaining-a-netdata-agent-installation). ```bash sudo ./edit-config python.d.conf @@ -75,7 +75,7 @@ centralized cloud somewhere) is the resource utilization impact of running a mon With the default configuration, the anomalies collector uses about 6.5% of CPU at each run. During the retraining step, CPU utilization jumps to between 20-30% for a few seconds, but you can [configure -retraining](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/anomalies/README.md#configuration) to happen less often if you wish. +retraining](/src/collectors/python.d.plugin/anomalies/README.md#configuration) to happen less often if you wish. ![CPU utilization of anomaly detection on the Raspberry Pi](https://user-images.githubusercontent.com/1153921/110149718-9d749c00-7d9b-11eb-9af8-46e2032cd1d0.png) diff --git a/docs/guides/configure/performance.md b/docs/guides/configure/performance.md deleted file mode 100644 index e1b32778e..000000000 --- a/docs/guides/configure/performance.md +++ /dev/null @@ -1,266 +0,0 @@ -# How to optimize the Netdata Agent's performance - -We designed the Netdata Agent to be incredibly lightweight, even when it's collecting a few thousand dimensions every -second and visualizing that data into hundreds of charts. However, the default settings of the Netdata Agent are not -optimized for performance, but for a simple, standalone setup. We want the first install to give you something you can -run without any configuration. Most of the settings and options are enabled, since we want you to experience the full -thing. 
-
-By default, Netdata automatically detects applications running on the node it is installed on and starts collecting
-their metrics in real time, runs health monitoring to evaluate alerts, and trains Machine Learning (ML) models for each
-metric to detect anomalies.
-
-This document describes the resources required for the various default capabilities and the strategies to optimize
-Netdata for production use.
-
-## Summary of performance optimizations
-
-The following table summarizes the effect of each optimization on the CPU, RAM and Disk IO utilization in production.
-
-| Optimization | CPU | RAM | Disk IO |
-|-------------------------------------------------------------------------------------------------------------------------------|--------------------|--------------------|--------------------|
-| [Use streaming and replication](#use-streaming-and-replication) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| [Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| [Reduce data collection frequency](#reduce-collection-frequency) | :heavy_check_mark: | | :heavy_check_mark: |
-| [Change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) | | :heavy_check_mark: | :heavy_check_mark: |
-| [Use a different metric storage database](https://github.com/netdata/netdata/blob/master/src/database/README.md) | | :heavy_check_mark: | :heavy_check_mark: |
-| [Disable machine learning](#disable-machine-learning) | :heavy_check_mark: | | |
-| [Use a reverse proxy](#run-netdata-behind-a-proxy) | :heavy_check_mark: | | |
-| [Disable/lower gzip compression for the agent dashboard](#disablelower-gzip-compression-for-the-dashboard) | :heavy_check_mark: | | |
-
-## Resources required by a default Netdata installation
-
-Netdata's performance is primarily affected by **data collection/retention** and
**clients accessing data**.
-
-You can configure almost all aspects of data collection/retention, and certain aspects of clients accessing data.
-
-### CPU consumption
-
-Expect about:
-
-- 1-3% of a single core for the netdata core
-- 1-3% of a single core for the various collectors (e.g. go.d.plugin, apps.plugin)
-- 5-10% of a single core, when ML training runs
-
-Your experience may vary depending on the number of metrics collected, the collectors enabled and the specific
-environment they run on, i.e. the work they have to do to collect these metrics.
-
-As a general rule, for modern hardware and VMs, the total CPU consumption of a standalone Netdata installation,
-including all its components, should be below 5 - 15% of a single core. For example, on an 8-core server it will use
-only 0.6% - 1.8% of the total CPU capacity, depending on the CPU characteristics.
-
-The Netdata Agent runs with the lowest
-possible [process scheduling policy](https://github.com/netdata/netdata/blob/master/src/daemon/README.md#netdata-process-scheduling-policy),
-which is `nice 19`, and uses the `idle` process scheduler. Together, these settings ensure that the Agent only gets CPU
-resources when the node has CPU resources to spare. If the node reaches 100% CPU utilization, the Agent is stopped first
-to ensure your applications get any available resources.
-
-To reduce CPU usage you can (either one or a combination of the following actions):
-
-1. [Disable machine learning](#disable-machine-learning)
-2. [Use streaming and replication](#use-streaming-and-replication)
-3. [Reduce the data collection frequency](#reduce-collection-frequency)
-4. [Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors)
-5. [Use a reverse proxy](#run-netdata-behind-a-proxy)
-6. [Disable/lower gzip compression for the agent dashboard](#disablelower-gzip-compression-for-the-dashboard).
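The per-core figures above translate into total-capacity numbers with simple arithmetic; a quick sketch of the 8-core example from the text:

```python
# Convert "percent of a single core" into "percent of total CPU capacity"
# for the 8-core server example above.
cores = 8
low, high = 5.0, 15.0  # percent of one core

low_total = low / cores    # -> the ~0.6% quoted above
high_total = high / cores  # -> the ~1.8% quoted above
print(f"{low_total:.1f}% - {high_total:.1f}% of total capacity")
```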
-
-### Memory consumption
-
-The memory footprint of Netdata is mainly influenced by the number of metrics concurrently being collected. Expect about
-150MB of RAM for a typical 64-bit server collecting about 2000 to 3000 metrics.
-
-To estimate and control memory consumption, you can (either one or a combination of the following actions):
-
-1. [Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors)
-2. [Change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md)
-3. [Use a different metric storage database](https://github.com/netdata/netdata/blob/master/src/database/README.md).
-
-### Disk footprint and I/O
-
-By default, Netdata should not use more than 1GB of disk space, most of which is dedicated to storing metric data and
-metadata. For typical installations collecting 2000 - 3000 metrics, this storage should provide a few days of
-high-resolution retention (per second), about a month of mid-resolution retention (per minute) and more than a year of
-low-resolution retention (per hour).
-
-Netdata spreads I/O operations across time. For typical standalone installations there should be a few write operations
-every 5-10 seconds of a few kilobytes each, occasionally up to 1MB. In addition, under heavy load, collectors that
-require disk I/O may stop and show gaps in charts.
-
-To configure retention, you can:
-
-1. [Change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md).
-
-To control disk I/O:
-
-1. [Use a different metric storage database](https://github.com/netdata/netdata/blob/master/src/database/README.md).
-
-Minimize deployment impact on the production system by optimizing disk footprint:
-
-1. [Using streaming and replication](#use-streaming-and-replication)
-2.
[Reduce the data collection frequency](#reduce-collection-frequency) -3. [Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors). - -## Use streaming and replication - -For all production environments, parent Netdata nodes outside the production infrastructure should be receiving all -collected data from children Netdata nodes running on the production infrastructure, -using [streaming and replication](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md). - -### Disable health checks on the child nodes - -When you set up streaming, we recommend you run your health checks on the parent. This saves resources on the children -and makes it easier to configure or disable alerts and agent notifications. - -The parents by default run health checks for each child, as long as the child is connected (the details are -in `stream.conf`). On the child nodes you should add to `netdata.conf` the following: - -```conf -[health] - enabled = no -``` - -### Use memory mode ram for the child nodes - -See [using a different metric storage database](https://github.com/netdata/netdata/blob/master/src/database/README.md). - -## Disable unneeded plugins or collectors - -If you know that you don't need an [entire plugin or a specific -collector](https://github.com/netdata/netdata/blob/master/src/collectors/README.md#collector-architecture-and-terminology), -you can disable any of them. Keep in mind that if a plugin/collector has nothing to do, it simply shuts down and does -not consume system resources. You will only improve the Agent's performance by disabling plugins/collectors that are -actively collecting metrics. - -Open `netdata.conf` and scroll down to the `[plugins]` section. To disable any plugin, uncomment it and set the value to -`no`. For example, to explicitly keep the `proc` and `go.d` plugins enabled while disabling `python.d` and `charts.d`. 
- -```conf -[plugins] - proc = yes - python.d = no - charts.d = no - go.d = yes -``` - -Disable specific collectors by opening their respective plugin configuration files, uncommenting the line for the -collector, and setting its value to `no`. - -```bash -sudo ./edit-config go.d.conf -sudo ./edit-config python.d.conf -sudo ./edit-config charts.d.conf -``` - -For example, to disable a few Python collectors: - -```conf -modules: - apache: no - dockerd: no - fail2ban: no -``` - -## Reduce collection frequency - -The fastest way to improve the Agent's resource utilization is to reduce how often it collects metrics. - -### Global - -If you don't need per-second metrics, or if the Netdata Agent uses a lot of CPU even when no one is viewing that node's -dashboard, [configure the Agent](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) to collect -metrics less often. - -Open `netdata.conf` and edit the `update every` setting. The default is `1`, meaning that the Agent collects metrics -every second. - -If you change this to `2`, Netdata enforces a minimum `update every` setting of 2 seconds, and collects metrics every -other second, which will effectively halve CPU utilization. Set this to `5` or `10` to collect metrics every 5 or 10 -seconds, respectively. - -```conf -[global] - update every = 5 -``` - -### Specific plugin or collector - -Every collector and plugin has its own `update every` setting, which you can also change in the `go.d.conf`, -`python.d.conf`, or `charts.d.conf` files, or in individual collector configuration files. If the `update -every` for an individual collector is less than the global, the Netdata Agent uses the global setting. See -the [collectors configuration reference](https://github.com/netdata/netdata/blob/master/src/collectors/REFERENCE.md) for -details. 
- -To reduce the frequency of -an [internal_plugin/collector](https://github.com/netdata/netdata/blob/master/src/collectors/README.md#collector-architecture-and-terminology), -open `netdata.conf` and find the appropriate section. For example, to reduce the frequency of the `apps` plugin, which -collects and visualizes metrics on application resource utilization: - -```conf -[plugin:apps] - update every = 5 -``` - -To [configure an individual collector](https://github.com/netdata/netdata/blob/master/src/collectors/REFERENCE.md#configure-a-collector), -open its specific configuration file with `edit-config` and look for the `update_every` setting. For example, to reduce -the frequency of the `nginx` collector, run `sudo ./edit-config go.d/nginx.conf`: - -```conf -# [ GLOBAL ] -update_every: 10 -``` - -## Lower memory usage for metrics retention - -See how -to [change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md). - -## Use a different metric storage database - -Consider [using a different metric storage database](https://github.com/netdata/netdata/blob/master/src/database/README.md) -when running Netdata on IoT devices, and for children in a parent-child set up based -on [streaming and replication](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md). - -## Disable machine learning - -Automated anomaly detection may be a powerful tool, but we recommend it to only be enabled on Netdata parents that sit -outside your production infrastructure, or if you have cpu and memory to spare. You can disable ML with the following: - -```conf -[ml] - enabled = no -``` - -## Run Netdata behind a proxy - -A dedicated web server like nginx provides more robustness than the Agent's -internal [web server](https://github.com/netdata/netdata/blob/master/src/web/README.md). 
-Nginx can handle more concurrent connections, reuse idle connections, and use fast gzip compression to reduce payloads. - -For details on installing another web server as a proxy for the local Agent dashboard, -see [reverse proxies](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/reverse-proxies.md). - -## Disable/lower gzip compression for the dashboard - -If you choose not to run the Agent behind Nginx, you can disable or lower the Agent's web server's gzip compression. -While gzip compression does reduce the size of the HTML/CSS/JS payload, it does use additional CPU while a user is -looking at the local Agent dashboard. - -To disable gzip compression, open `netdata.conf` and find the `[web]` section: - -```conf -[web] - enable gzip compression = no -``` - -Or to lower the default compression level: - -```conf -[web] - enable gzip compression = yes - gzip compression level = 1 -``` - diff --git a/docs/guides/monitor/anomaly-detection.md b/docs/guides/monitor/anomaly-detection.md deleted file mode 100644 index bc19a4f28..000000000 --- a/docs/guides/monitor/anomaly-detection.md +++ /dev/null @@ -1,76 +0,0 @@ -<!-- -title: "Machine learning (ML) powered anomaly detection" -sidebar_label: "Machine learning (ML) powered anomaly detection" -description: "Detect anomalies in any system, container, or application in your infrastructure with machine learning and the open-source Netdata Agent." 
-image: /img/seo/guides/monitor/anomaly-detection.png -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/anomaly-detection.md -learn_status: "Published" -learn_rel_path: "Operations" ---> - -# Machine learning (ML) powered anomaly detection - - -## Overview - -As of [`v1.32.0`](https://github.com/netdata/netdata/releases/tag/v1.32.0), Netdata comes with some ML powered [anomaly detection](https://en.wikipedia.org/wiki/Anomaly_detection) capabilities built into it and available to use out of the box, with zero configuration required (ML was enabled by default in `v1.35.0-29-nightly` in [this PR](https://github.com/netdata/netdata/pull/13158), previously it required a one line config change). - -This means that in addition to collecting raw value metrics, the Netdata agent will also produce an [`anomaly-bit`](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-bit---100--anomalous-0--normal) every second which will be `100` when recent raw metric values are considered anomalous by Netdata and `0` when they look normal. Once we aggregate beyond one second intervals this aggregated `anomaly-bit` becomes an ["anomaly rate"](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-rate---averageanomaly-bit). - -To be as concrete as possible, the below api call shows how to access the raw anomaly bit of the `system.cpu` chart from the [london.my-netdata.io](https://london.my-netdata.io) Netdata demo server. Passing `options=anomaly-bit` returns the anomaly bit instead of the raw metric value. 
-
-```
-https://london.my-netdata.io/api/v1/data?chart=system.cpu&options=anomaly-bit
-```
-
-If we aggregate the above to just one point by adding `points=1`, we get an "[Anomaly Rate](https://github.com/netdata/netdata/blob/master/src/ml/README.md#anomaly-rate---averageanomaly-bit)":
-
-```
-https://london.my-netdata.io/api/v1/data?chart=system.cpu&options=anomaly-bit&points=1
-```
-
-The fundamentals of Netdata's anomaly detection approach and implementation are covered in much more detail in the [agent ML documentation](https://github.com/netdata/netdata/blob/master/src/ml/README.md).
-
-This guide will explain how to get started using these ML-based anomaly detection capabilities within Netdata.
-
-## Anomaly Advisor
-
-The [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md) is the flagship anomaly detection feature within Netdata. In the "Anomalies" tab of Netdata you will see an overall "Anomaly Rate" chart that aggregates the node-level anomaly rate for all nodes in a space. The aim of this chart is to make it easy to quickly spot periods of time where the overall "[node anomaly rate](https://github.com/netdata/netdata/blob/master/src/ml/README.md#node-anomaly-rate)" is elevated in some unusual way, and which node or nodes this relates to.
-
-![image](https://user-images.githubusercontent.com/2178292/175928290-490dd8b9-9c55-4724-927e-e145cb1cc837.png)
-
-Once an area on the Anomaly Rate chart is highlighted, Netdata appends a "heatmap" to the bottom of the screen that shows which metrics were most anomalous in the highlighted timeframe. Each row in the heatmap consists of an anomaly rate sparkline graph that can be expanded to reveal the raw underlying metric chart for that dimension.
-
-![image](https://user-images.githubusercontent.com/2178292/175929162-02c8fe69-cc4f-4cf4-9b3a-a5e559a6feca.png)
-
-## Embedded Anomaly Rate Charts
-
-Charts in both the [Overview](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) and [single node dashboard](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#jump-to-single-node-dashboards) tabs also expose the underlying anomaly rates for each dimension, so users can easily see whether Netdata considers the raw metrics anomalous.
-
-Pressing the anomalies icon (next to the information icon in the chart header) will expand the anomaly rate chart, making it easy to see how the anomaly rate for any individual dimension corresponds to the raw underlying data. In the example below we can see that the spike in `system.pgpgio|in` corresponded to the anomaly rate for that dimension jumping to 100% for a short period of time until the spike passed.
-
-![image](https://user-images.githubusercontent.com/2178292/175933078-5dd951ff-7709-4bb9-b4be-34199afb3945.png)
-
-## Anomaly Rate Based Alerts
-
-It is possible to use the `anomaly-bit` when defining traditional alerts within Netdata. The `anomaly-bit` is just another `options` parameter that can be passed as part of an alert line lookup.
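Before wiring the anomaly bit into an alert lookup, it can help to see the aggregation the bit relies on. As the guide describes, the anomaly rate is simply the average of the per-second anomaly bits. A minimal sketch of that arithmetic in Python (a hypothetical helper for illustration, not Netdata code):

```python
def anomaly_rate(bits):
    """Average per-second anomaly bits (100 = anomalous, 0 = normal)
    into an anomaly rate, expressed as a percentage."""
    if not bits:
        return 0.0
    return sum(bits) / len(bits)

# A 10-second window in which 3 seconds were flagged anomalous:
print(anomaly_rate([0, 0, 100, 100, 100, 0, 0, 0, 0, 0]))  # 30.0
```

This is the same averaging the API performs when you add `points=1` to a query with `options=anomaly-bit`.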
-
-You can see some example ML-based alert configurations below:
-
-- [Anomaly rate based CPU dimensions alert](https://github.com/netdata/netdata/blob/master/src/health/REFERENCE.md#example-8---anomaly-rate-based-cpu-dimensions-alert)
-- [Anomaly rate based CPU chart alert](https://github.com/netdata/netdata/blob/master/src/health/REFERENCE.md#example-9---anomaly-rate-based-cpu-chart-alert)
-- [Anomaly rate based node level alert](https://github.com/netdata/netdata/blob/master/src/health/REFERENCE.md#example-10---anomaly-rate-based-node-level-alert)
-- More examples in the [`/health/health.d/ml.conf`](https://github.com/netdata/netdata/blob/master/src/health/health.d/ml.conf) file that ships with the agent.
-
-## Learn More
-
-Check out the resources below to learn more about how Netdata is approaching ML:
-
-- [Agent ML documentation](https://github.com/netdata/netdata/blob/master/src/ml/README.md).
-- [Anomaly Advisor documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md).
-- [Metric Correlations documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md).
-- Anomaly Advisor [launch blog post](https://www.netdata.cloud/blog/introducing-anomaly-advisor-unsupervised-anomaly-detection-in-netdata/).
-- Netdata Approach to ML [blog post](https://www.netdata.cloud/blog/our-approach-to-machine-learning/).
-- `area/ml` related [GitHub Discussions](https://github.com/netdata/netdata/discussions?discussions_q=label%3Aarea%2Fml).
-- Netdata Machine Learning Meetup [deck](https://docs.google.com/presentation/d/1rfSxktg2av2k-eMwMbjN0tXeo76KC33iBaxerYinovs/edit?usp=sharing) and [YouTube recording](https://www.youtube.com/watch?v=eJGWZHVQdNU).
-- Netdata Anomaly Advisor [YouTube Playlist](https://youtube.com/playlist?list=PL-P-gAHfL2KPeUcCKmNHXC-LX-FfdO43j).
diff --git a/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md b/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md
deleted file mode 100644
index 0c9962ba2..000000000
--- a/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md
+++ /dev/null
@@ -1,147 +0,0 @@
-# Troubleshoot Agent-Cloud connectivity issues
-
-Learn how to troubleshoot connectivity issues leading to agents not appearing at all in Netdata Cloud, or
-appearing with a status other than `live`.
-
-After installing an agent with the claiming token provided by Netdata Cloud, you should see charts from that node on
-Netdata Cloud within seconds. If you don't see charts, check whether the node appears in the list of nodes
-(Nodes tab, top right Node filter, or Manage Nodes screen). If your node does not appear in the list, or it appears with a status other than "Live", this guide will help you troubleshoot what's happening.
-
-The most common explanations for connectivity issues fall into one of the following three categories:
-
-- If the node does not appear at all in Netdata Cloud, [the claiming process was unsuccessful](#the-claiming-process-was-unsuccessful).
-- If the node appears in Netdata Cloud but is in the "Unseen" state, [the Agent was claimed but can not connect](#the-agent-was-claimed-but-can-not-connect).
-- If the node appears in Netdata Cloud as "Offline" or "Stale", it is a [previously connected agent that can no longer connect](#previously-connected-agent-that-can-no-longer-connect).
-
-## The claiming process was unsuccessful
-
-If the claiming process fails, the node will not appear at all in Netdata Cloud.
-
-First ensure that:
-- You are using the newest possible stable or nightly version of the agent (at least v1.32).
-- Your node can successfully issue an HTTPS request to https://app.netdata.cloud
-
-Other possible causes differ between kickstart installations and Docker installations.
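The HTTPS prerequisite above can also be checked programmatically. A sketch using only Python's standard library — the "404 means reachable" interpretation comes from the guide itself; treat this as an illustrative helper, not an official diagnostic tool:

```python
import urllib.error
import urllib.request

def cloud_reachable(url="https://app.netdata.cloud", timeout=10):
    """Return True if the Cloud endpoint answers at all.
    Per the guide, a 404 response means the host is reachable —
    there is simply no page at the root path."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True  # any 2xx response also proves reachability
    except urllib.error.HTTPError as e:
        return e.code == 404  # the expected "reachable" answer
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, firewall block, or proxy problem

# cloud_reachable() would attempt the live check from this node.
```

A `False` result points at network connectivity, domain resolution, or firewall settings, as described in the next section.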
-
-### Verify your node can access Netdata Cloud
-
-If you use either `curl` or `wget` to make an HTTPS request to https://app.netdata.cloud, you should get
-back a 404 response. If you do not, check your network connectivity, domain resolution,
-and firewall settings for outbound connections.
-
-If your firewall is configured to completely prevent outbound connections, you need to whitelist `app.netdata.cloud` and `mqtt.netdata.cloud`. If you can't whitelist domains in your firewall, you can whitelist the IPs that the hostnames resolve to, but keep in mind that they can change without any notice.
-
-If you use an outbound proxy, you need to [take some extra steps](https://github.com/netdata/netdata/blob/master/src/claim/README.md#connect-through-a-proxy).
-
-### Troubleshoot claiming with kickstart.sh
-
-Claiming is done by executing `netdata-claim.sh`, a script that is usually located under `${INSTALL_PREFIX}/netdata/usr/sbin/netdata-claim.sh`. Possible error conditions we have identified are:
-
-- No script found at all in any of our search paths.
-- The path where the claiming script should be does not exist.
-- The path exists, but is not a file.
-- The path is a file, but is not executable.
-
-Check the output of the kickstart script for any reported claiming errors, and verify that the claiming script exists
-and can be executed.
-
-### Troubleshoot claiming with Docker
-
-First verify that the `NETDATA_CLAIM_TOKEN` parameter is correctly configured, and then check for any errors during
-initialization of the container.
-
-The most common issue we have seen when claiming nodes in Docker is [running on older hosts with seccomp enabled](https://github.com/netdata/netdata/blob/master/src/claim/README.md#known-issues-on-older-hosts-with-seccomp-enabled).
-
-## The Agent was claimed but can not connect
-
-Agents that appear on the cloud with state "Unseen" have been successfully claimed, but have never
-been able to establish an ACLK connection.
-
-Agents that appear with state "Offline" or "Stale" were able to connect at some point, but are currently not
-connected. The difference between the two is that "Stale" nodes had some of their data replicated to a
-parent node that is still connected.
-
-### Verify that the agent is running
-
-#### Troubleshoot connection establishment with kickstart.sh
-
-The kickstart script will install/update your Agent and then try to claim the node to the Cloud
-(if tokens are provided). To complete the second part, the Agent must be running. On some platforms,
-the Netdata service cannot be enabled by default and you must enable it manually, using the following steps:
-
-1. Check if the Agent is running:
-
-   ```bash
-   systemctl status netdata
-   ```
-
-   The expected output should contain info like this:
-
-   ```bash
-   Active: active (running) since Wed 2022-07-06 12:25:02 EEST; 1h 40min ago
-   ```
-
-2. Enable and start the Netdata service.
-
-   ```bash
-   systemctl enable netdata
-   systemctl start netdata
-   ```
-
-3. Retry the kickstart claiming process.
-
-> ### Note
->
-> In some cases, a simple restart of the Agent can fix the issue.
-> Read more about [Starting, Stopping and Restarting the Agent](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#maintaining-a-netdata-agent-installation).
-
-#### Troubleshoot connection establishment with Docker
-
-If a Netdata container exits or is killed before it properly starts, it may be able to complete the claiming
-process, but not have enough time to establish the ACLK connection.
-
-### Verify that your firewall allows websockets
-
-The agent initiates an SSL connection to `app.netdata.cloud` and then upgrades that connection to use secure
-websockets. Some firewalls completely prevent the use of websockets, even for outbound connections.
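A firewall that blocks the websocket upgrade will often still allow the initial TLS handshake, so confirming the handshake is only a first step — but a handshake failure immediately rules out DNS, routing, and basic TLS problems. A sketch with Python's standard library (it checks the TLS handshake only, not the websocket upgrade itself; the hostname is the one named in this guide):

```python
import socket
import ssl

def tls_handshake_ok(host, port=443, timeout=10):
    """Attempt a TLS handshake; failure here points at DNS,
    firewall, or proxy issues rather than the websocket upgrade."""
    try:
        context = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() is not None  # e.g. "TLSv1.3"
    except OSError:  # covers gaierror, timeouts, and ssl.SSLError
        return False

# tls_handshake_ok("app.netdata.cloud") would run the live check.
```

If the handshake succeeds but the Agent still cannot connect, suspect a firewall or middlebox that blocks the websocket upgrade specifically.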
-
-## Previously connected agent that can no longer connect
-
-The states "Offline" and "Stale" suggest that the agent was able to connect at some point in the past, but
-that it is currently not connected.
-
-### Verify that network connectivity is still possible
-
-Verify that you can still issue HTTPS requests to app.netdata.cloud and that no firewall or proxy changes were made.
-
-### Verify that the claiming info is persisted
-
-If you use Docker, verify that the contents of `/var/lib/netdata` are preserved across container restarts, using a persistent volume.
-
-### Verify that the claiming info is not cloned
-
-A relatively common case we have seen, especially with VMs, is two or more nodes sharing the same credentials.
-This happens if you claim a node in a VM and then create an image based on that node. Netdata can't work properly
-this way, as the node identification information under `/var/lib/netdata` must be unique to each node.
-
-### Verify that your IP is not blocked by Netdata Cloud
-
-Most nodes change IPs dynamically. It is possible that your current IP has been restricted from accessing `app.netdata.cloud` due to security concerns, usually because it was spamming Netdata Cloud with too many
-failed requests (an issue with old versions of the agent).
-
-To verify this:
-
-1. Check the Agent's `aclk-state`.
-
-   ```bash
-   sudo netdatacli aclk-state | grep "Banned By Cloud"
-   ```
-
-   The output will contain a line indicating if the IP is banned from `app.netdata.cloud`:
-
-   ```bash
-   Banned By Cloud: yes
-   ```
-
-2. 
If your node's IP is banned, you can:
-
-  - Contact our team to whitelist your IP by submitting a ticket in the [Netdata forum](https://community.netdata.cloud/)
-  - Change your node's IP
diff --git a/docs/guides/using-host-labels.md b/docs/guides/using-host-labels.md
deleted file mode 100644
index 961f4b2d7..000000000
--- a/docs/guides/using-host-labels.md
+++ /dev/null
@@ -1,253 +0,0 @@
-# Organize systems, metrics, and alerts
-
-When you use Netdata to monitor and troubleshoot an entire infrastructure, you need sophisticated ways of keeping everything organized.
-Netdata allows you to organize your observability infrastructure with spaces, war rooms, virtual nodes, host labels, and metric labels.
-
-## Spaces and war rooms
-
-[Spaces](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/organize-your-infrastrucutre-invite-your-team.md#netdata-cloud-spaces) are used for organization-level or infrastructure-level
-grouping of nodes and people. A node can only appear in a single space, while people can have access to multiple spaces.
-
-The [war rooms](https://github.com/netdata/netdata/edit/master/docs/cloud/war-rooms.md) in a space bring together nodes and people in
-collaboration areas. War rooms can also be used for fine-tuned
-[role based access control](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md).
-
-## Virtual nodes
-
-Netdata's virtual nodes functionality allows you to define nodes in configuration files and have them treated as regular nodes
-across the UI: dashboards, tabs, filters, etc. For example, you can create a virtual node for each of your Windows machines
-and monitor them as discrete entities. Virtual nodes can help you simplify your infrastructure monitoring and focus on the
-individual node that matters.
-
-To define your Windows server as a virtual node, you need to:
-
- * Define virtual nodes in `/etc/netdata/vnodes/vnodes.conf`
-
-   ```yaml
-    - hostname: win_server1
-      guid: <value>
-   ```
-   Just remember to use a valid GUID (on Linux you can use the `uuidgen` command to generate one; on Windows, use the `[guid]::NewGuid()` command in PowerShell)
-
- * Add the vnode config to the data collection job, e.g. in `go.d/windows.conf`:
-   ```yaml
-    jobs:
-      - name: win_server1
-        vnode: win_server1
-        url: http://203.0.113.10:9182/metrics
-   ```
-
-## Host labels
-
-Host labels can be extremely useful when:
-
-- You need alerts that adapt to the system's purpose.
-- You need properly-labeled metrics archiving so you can sort, correlate, and mash-up your data to your heart's content.
-- You need to keep tabs on ephemeral Docker containers in a Kubernetes cluster.
-
-Let's take a peek into how to create host labels and apply them across a few of Netdata's features to give you more
-organizational power over your infrastructure.
-
-### Default labels
-
-When Netdata starts, it captures relevant information about the system and converts it into automatically generated
-host labels. You can use these to logically organize your systems via health entities, exporting metrics,
-parent-child status, and more.
-
-They capture the following:
-
-- Kernel version
-- Operating system name and version
-- CPU architecture, system cores, CPU frequency, RAM, and disk space
-- Whether Netdata is running inside of a container, and if so, the OS and hardware details about the container's host
-- Whether Netdata is running inside a K8s node
-- What virtualization layer the system runs on top of, if any
-- Whether the system is a streaming parent or child
-
-If you want to organize your systems without manually creating host labels, try the automatic labels in some of the
-features below. You can see them under `http://HOST-IP:19999/api/v1/info`, beginning with an underscore `_`.
-```json
-{
-  ... 
- "host_labels": { - "_is_k8s_node": "false", - "_is_parent": "false", - ... -``` - -### Custom labels - -Host labels are defined in `netdata.conf`. To create host labels, open that file using `edit-config`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config netdata.conf -``` - -Create a new `[host labels]` section defining a new host label and its value for the system in question. Make sure not -to violate any of the [host label naming rules](https://github.com/netdata/netdata/blob/master/docs/configure/common-changes.md#organize-nodes-with-host-labels). - -```conf -[host labels] - type = webserver - location = us-seattle - installed = 20200218 -``` - -Once you've written a few host labels, you need to enable them. Instead of restarting the entire Netdata service, you -can reload labels using the helpful `netdatacli` tool: - -```bash -netdatacli reload-labels -``` - -Your host labels will now be enabled. You can double-check these by using `curl http://HOST-IP:19999/api/v1/info` to -read the status of your agent. For example, from a VPS system running Debian 10: - -```json -{ - ... - "host_labels": { - "_is_k8s_node": "false", - "_is_parent": "false", - "_virt_detection": "systemd-detect-virt", - "_container_detection": "none", - "_container": "unknown", - "_virtualization": "kvm", - "_architecture": "x86_64", - "_kernel_version": "4.19.0-6-amd64", - "_os_version": "10 (buster)", - "_os_name": "Debian GNU/Linux", - "type": "webserver", - "location": "seattle", - "installed": "20200218" - }, - ... -} -``` - - -### Host labels in streaming - -You may have noticed the `_is_parent` and `_is_child` automatic labels from above. Host labels are also now -streamed from a child to its parent node, which concentrates an entire infrastructure's OS, hardware, container, -and virtualization information in one place: the parent. 
-
-Now, if you'd like to remind yourself of how much RAM a certain child node has, you can access
-`http://localhost:19999/host/CHILD_HOSTNAME/api/v1/info` and reference the automatically-generated host labels from the
-child system. It's a vastly simplified way of accessing critical information about your infrastructure.
-
-> ⚠️ Because automatic labels for child nodes are accessible via API calls, and contain sensitive information like
-> kernel and operating system versions, you should secure streaming connections with SSL. See the [streaming
-> documentation](https://github.com/netdata/netdata/blob/master/src/streaming/README.md#securing-streaming-communications) for details. You may also want to use
-> [access lists](https://github.com/netdata/netdata/blob/master/src/web/server/README.md#access-lists) or [expose the API only to LAN/localhost
-> connections](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/secure-nodes.md#expose-netdata-only-in-a-private-lan).
-
-You can also use `_is_parent`, `_is_child`, and any other host labels in both health entities and metrics
-exporting. Speaking of which...
-
-### Host labels in alerts
-
-You can use host labels to logically organize your systems by their type, purpose, or location, and then apply specific
-alerts to them.
-
-For example, let's use the configuration example from earlier:
-
-```conf
-[host labels]
-    type = webserver
-    location = us-seattle
-    installed = 20200218
-```
-
-You could now create a new health entity (checking if disk space will run out soon) that applies only to any host
-labeled `webserver`:
-
-```yaml
-    template: disk_fill_rate
-          on: disk.space
-      lookup: max -1s at -30m unaligned of avail
-        calc: ($this - $avail) / (30 * 60)
-       every: 15s
- host labels: type = webserver
-```
-
-Or, by using one of the automatic labels, for only webserver systems running a specific OS:
-
-```yaml
- host labels: _os_name = Debian*
-```
-
-In a streaming configuration where a parent node is triggering alerts for its child nodes, you could create health
-entities that apply only to child nodes:
-
-```yaml
- host labels: _is_child = true
-```
-
-Or when ephemeral Docker nodes are involved:
-
-```yaml
- host labels: _container = docker
-```
-
-Of course, there are many more possibilities for intuitively organizing your systems with host labels. See the [health
-documentation](https://github.com/netdata/netdata/blob/master/src/health/REFERENCE.md#alert-line-host-labels) for more details, and then get creative!
-
-### Host labels in metrics exporting
-
-If you have enabled any metrics exporting via our experimental [exporters](https://github.com/netdata/netdata/blob/master/src/exporting/README.md), any new host
-labels you created manually are sent to the destination database alongside metrics. You can change this behavior by
-editing `exporting.conf`, and you can even send automatically-generated labels on with exported metrics.
-
-```conf
-[exporting:global]
-enabled = yes
-send configured labels = yes
-send automatic labels = no
-```
-
-You can also change this behavior per exporting connection:
-
-```conf
-[opentsdb:my_instance3]
-enabled = yes
-destination = localhost:4242
-data source = sum
-update every = 10
-send charts matching = system.cpu
-send configured labels = no
-send automatic labels = yes
-```
-
-By applying labels to exported metrics, you can more easily parse historical metrics with the labels applied. To learn
-more about exporting, read the [documentation](https://github.com/netdata/netdata/blob/master/src/exporting/README.md).
-
-## Metric labels
-
-The Netdata aggregate charts allow you to filter and group metrics based on label name-value pairs.
-
-All go.d plugin collectors support the specification of labels at the "collection job" level. Some collectors come with out-of-the-box
-labels (e.g. the generic Prometheus collector, Kubernetes, Docker, and more), but you can also add your own custom labels by configuring
-the data collection jobs.
-
-For example, suppose we have a single Netdata agent collecting data from two remote Apache web servers, located in different data centers.
-The web servers are load balanced and provide access to the service "Payments".
-
-You can define the following in `go.d.conf`, to be able to group the web requests by service or location:
-
-```yaml
-jobs:
-  - name: mywebserver1
-    url: http://host1/server-status?auto
-    labels:
-      service: "Payments"
-      location: "Atlanta"
-  - name: mywebserver2
-    url: http://host2/server-status?auto
-    labels:
-      service: "Payments"
-      location: "New York"
-```
-
-Of course, you may define as many custom label/value pairs as you like, in as many data collection jobs as you need.
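To picture what those label pairs buy you, here is a small sketch of the kind of grouping the aggregate charts perform. The data and helper are hypothetical (Netdata does this internally); the label values mirror the two jobs above:

```python
from collections import defaultdict

# Hypothetical request totals from the two labelled jobs above.
samples = [
    {"labels": {"service": "Payments", "location": "Atlanta"}, "requests": 120},
    {"labels": {"service": "Payments", "location": "New York"}, "requests": 80},
]

def sum_by_label(samples, label):
    """Sum the metric across all jobs sharing a value for one label."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample["labels"][label]] += sample["requests"]
    return dict(totals)

print(sum_by_label(samples, "service"))   # {'Payments': 200}
print(sum_by_label(samples, "location"))  # {'Atlanta': 120, 'New York': 80}
```

Grouping by `service` merges the load-balanced servers into one series, while grouping by `location` keeps the data centers apart — the same pivot you get when filtering aggregate charts by label.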