author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2023-05-08 16:27:08 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2023-05-08 16:27:08 +0000 |
commit | 81581f9719bc56f01d5aa08952671d65fda9867a (patch) | |
tree | 0f5c6b6138bf169c23c9d24b1fc0a3521385cb18 /docs | |
parent | Releasing debian version 1.38.1-1. (diff) | |
download | netdata-81581f9719bc56f01d5aa08952671d65fda9867a.tar.xz netdata-81581f9719bc56f01d5aa08952671d65fda9867a.zip |
Merging upstream version 1.39.0.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'docs')
139 files changed, 3207 insertions, 10050 deletions
diff --git a/docs/Add-more-charts-to-netdata.md b/docs/Add-more-charts-to-netdata.md deleted file mode 100644 index 35a89fba0..000000000 --- a/docs/Add-more-charts-to-netdata.md +++ /dev/null @@ -1,13 +0,0 @@ -<!-- -title: "Add more charts to Netdata" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/Add-more-charts-to-netdata.md ---> - -# Add more charts to Netdata - -This file has been deprecated. Please see our [collectors docs](https://github.com/netdata/netdata/blob/master/collectors/README.md) for more information. - -## Available data collection modules - -See the [list of supported collectors](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) to see all the sources Netdata can collect metrics -from. diff --git a/docs/Demo-Sites.md b/docs/Demo-Sites.md index 5c4d1018f..1fd0d4192 100644 --- a/docs/Demo-Sites.md +++ b/docs/Demo-Sites.md @@ -1,12 +1,17 @@ <!-- -title: "Demo sites" +title: "Live demos" date: 2020-03-26 custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/Demo-Sites.md +sidebar_label: "Live demos" +learn_status: "Published" +learn_topic_type: "Getting started" +learn_rel_path: "Getting started" +sidebar_position: "90" --> -# Demo sites +# Live demos -You can also view live demos of Netdata at **https://app.netdata.cloud/spaces/netdata-demo** +See the live Netdata Cloud demo with rooms for specific use cases at **https://app.netdata.cloud/spaces/netdata-demo** | Location | Netdata demo URL | 60 mins reqs | VM donated by | | :------------------ | :-------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| :------------------------------------------------- | 
diff --git a/docs/Donations-netdata-has-received.md b/docs/Donations-netdata-has-received.md deleted file mode 100644 index a8623c5db..000000000 --- a/docs/Donations-netdata-has-received.md +++ /dev/null @@ -1,29 +0,0 @@ -<!-- -title: "Donations" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/Donations-netdata-has-received.md ---> - -# Donations - -This is a list of the donations we have received for Netdata (sorted alphabetically on their name): - -| what donated|related links|who donated|description of the donation| -|-----------:|:-----------:|:---------:|:--------------------------| -| Packages Distribution|-|**[PackageCloud.io](https://packagecloud.io/)**|**PackageCloud.io** donated to a free open-source subscription to their awesome Package Distribution services.| -| Cross Browser Testing|-|**[BrowserStack.com](https://www.browserstack.com/)**|**BrowserStack.com** donated a free subscription to their awesome Browser Testing services (all three of them: Live, Screenshots, Responsive).| -| Cloud VM|[cdn77.my-netdata.io](http://cdn77.my-netdata.io)|**[CDN77.com](https://www.cdn77.com/)**|**CDN77.com** donated a VM with 2 CPU cores, 4GB RAM and 20GB HD, on their excellent CDN network.| -| Localization Management|[Netdata localization project](https://crowdin.com/project/netdata) (check issue [#279](https://github.com/netdata/netdata/issues/279))|**[Crowdin.com](https://crowdin.com/)**|**Crowdin.com** donated an open source license to their Localization Management Platform.| -| Cloud VMs|[london.my-netdata.io](https://london.my-netdata.io) (Several VMs)|**[DigitalOcean.com](https://www.digitalocean.com/)**|**DigitalOcean.com** donated 1000 USD to be used in their excellent Cloud Computing services. 
Many thanks to [Justin Paine](https://github.com/xxdesmus) for making this happen.| -| Development IDE|-|**[JetBrains.com](https://www.jetbrains.com/)**|**JetBrains.com** donated an open source license for 4 developers for 1 year, to their excellent IDEs.| -| Cloud VM|[octopuscs.my-netdata.io](https://octopuscs.my-netdata.io)|**[OctopusCS.com](https://octopuscs.com/)**|**OctopusCS.com** donated a VM with 4 CPU cores, 16GB RAM and 50GB HD in their excellent Cloud Computing services.| -| Cloud VM|[stackscale.my-netdata.io](https://stackscale.my-netdata.io)|**[stackscale.com](https://www.stackscale.com/)**|**StackScale.com** donated a VM with 4 CPU cores, 16GB RAM and 100GB HD in their excellent Cloud Computing services.| - -Thank you! - ---- - -**Do you want to donate?** We are thirsty for on-line services that can help us make Netdata better. We also try to build a network of demo sites (VMs) that can help us show the full potential of Netdata. - -Please contact me at costa@tsaousis.gr. - - diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index e3b915617..000000000 --- a/docs/README.md +++ /dev/null @@ -1,17 +0,0 @@ -<!-- -title: "Read documentation on <https://learn.netdata.cloud>" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/README.md ---> - -# Read documentation on <https://learn.netdata.cloud> - -Welcome to the Netdata documentation! While you can read Netdata documentation here, or throughout the Netdata -repository, our intention is that these pages are read on [learn.netdata.cloud](https://learn.netdata.cloud). - -Links between documentation pages will work fine here, but the formatting may not be perfect, as our documentation site -uses a few extra Markdown features that GitHub doesn't support natively. Other things might be missing or look less than -perfect. - -Now get out there and build an exceptional infrastructure. 
- - diff --git a/docs/Running-behind-apache.md b/docs/Running-behind-apache.md index d152306ff..045bb676e 100644 --- a/docs/Running-behind-apache.md +++ b/docs/Running-behind-apache.md @@ -1,13 +1,4 @@ -<!-- -title: "Netdata via apache's mod_proxy" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/Running-behind-apache.md" -sidebar_label: "Netdata via apache's mod_proxy" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Expose local dashboard through proxy" ---> - -# Netdata via apache's mod_proxy +# Netdata via Apache's mod_proxy Below you can find instructions for configuring an apache server to: @@ -38,13 +29,11 @@ Also, enable the rewrite module: ```sh sudo a2enmod rewrite ``` - - ## Netdata on an existing virtual host On any **existing** and already **working** apache virtual host, you can redirect requests for URL `/netdata/` to one or more Netdata servers. -### proxy one Netdata, running on the same server apache runs +### Proxy one Netdata, running on the same server apache runs Add the following on top of any existing virtual host. It will allow you to access Netdata as `http://virtual.host/netdata/`. @@ -74,7 +63,7 @@ Add the following on top of any existing virtual host. It will allow you to acce </VirtualHost> ``` -### proxy multiple Netdata running on multiple servers +### Proxy multiple Netdata running on multiple servers Add the following on top of any existing virtual host. It will allow you to access multiple Netdata as `http://virtual.host/netdata/HOSTNAME/`, where `HOSTNAME` is the hostname of any other Netdata server you have (to access the `localhost` Netdata, use `http://virtual.host/netdata/localhost/`). @@ -355,7 +344,7 @@ If your apache server is not on localhost, you can set: `allow connections from` accepts [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) to match against the connection IP address. 
-## prevent the double access.log +## Prevent the double access.log apache logs accesses and Netdata logs them too. You can prevent Netdata from generating its access log, by setting this in `/etc/netdata/netdata.conf`: diff --git a/docs/Running-behind-caddy.md b/docs/Running-behind-caddy.md index d7d61375b..b7608b309 100644 --- a/docs/Running-behind-caddy.md +++ b/docs/Running-behind-caddy.md @@ -4,7 +4,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/Running-be sidebar_label: "Netdata via Caddy" learn_status: "Published" learn_topic_type: "Tasks" -learn_rel_path: "Setup/Expose local dashboard through proxy" +learn_rel_path: "Configuration/Secure your nodes" --> # Netdata via Caddy diff --git a/docs/Running-behind-h2o.md b/docs/Running-behind-h2o.md index 8a1e22b2f..deadc91cb 100644 --- a/docs/Running-behind-h2o.md +++ b/docs/Running-behind-h2o.md @@ -4,7 +4,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/Running-be sidebar_label: "Running Netdata behind H2O" learn_status: "Published" learn_topic_type: "Tasks" -learn_rel_path: "Setup/Expose local dashboard through proxy" +learn_rel_path: "Configuration/Secure your nodes" --> # Running Netdata behind H2O diff --git a/docs/Running-behind-haproxy.md b/docs/Running-behind-haproxy.md index f87eaa1fe..4c9c32cc4 100644 --- a/docs/Running-behind-haproxy.md +++ b/docs/Running-behind-haproxy.md @@ -4,7 +4,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/Running-be sidebar_label: "Netdata via HAProxy" learn_status: "Published" learn_topic_type: "Tasks" -learn_rel_path: "Setup/Expose local dashboard through proxy" +learn_rel_path: "Configuration/Secure your nodes" --> # Netdata via HAProxy diff --git a/docs/Running-behind-lighttpd.md b/docs/Running-behind-lighttpd.md index 6350b474b..d1d9acc3e 100644 --- a/docs/Running-behind-lighttpd.md +++ b/docs/Running-behind-lighttpd.md @@ -4,7 +4,7 @@ custom_edit_url: 
"https://github.com/netdata/netdata/edit/master/docs/Running-be sidebar_label: "Netdata via lighttpd v1.4.x" learn_status: "Published" learn_topic_type: "Tasks" -learn_rel_path: "Setup/Expose local dashboard through proxy" +learn_rel_path: "Configuration/Secure your nodes" --> # Netdata via lighttpd v1.4.x diff --git a/docs/Running-behind-nginx.md b/docs/Running-behind-nginx.md index a94f4058d..842a9c326 100644 --- a/docs/Running-behind-nginx.md +++ b/docs/Running-behind-nginx.md @@ -1,12 +1,3 @@ -<!-- -title: "Running Netdata behind Nginx" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/Running-behind-nginx.md" -sidebar_label: "Running Netdata behind Nginx" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup/Expose local dashboard through proxy" ---> - # Running Netdata behind Nginx ## Intro @@ -51,7 +42,7 @@ With this method instead of `SERVER_IP_ADDRESS:19999`, the Netdata dashboard can upstream backend { # the Netdata server server 127.0.0.1:19999; - keepalive 64; + keepalive 1024; } server { @@ -216,8 +207,6 @@ If your Nginx is on `localhost`, you can use this to protect your Netdata: bind to = 127.0.0.1 ::1 ``` - - You can also use a unix domain socket. This will also provide a faster route between Nginx and Netdata: ``` @@ -259,6 +248,26 @@ Nginx logs accesses and Netdata logs them too. You can prevent Netdata from gene access log = none ``` +## Use gzip compression + +By default, netdata compresses its responses. You can have nginx do that instead, with the following options in the `location /` block: + +```conf + location / { + ... 
+ gzip on; + gzip_proxied any; + gzip_types *; + } +``` + +To disable Netdata's gzip compression, open `netdata.conf` and in the `[web]` section put: + +```conf +[web] + enable gzip compression = no +``` + ## SELinux If you get an 502 Bad Gateway error you might check your Nginx error log: diff --git a/docs/a-github-star-is-important.md b/docs/a-github-star-is-important.md deleted file mode 100644 index 22659ea6f..000000000 --- a/docs/a-github-star-is-important.md +++ /dev/null @@ -1,24 +0,0 @@ -<!-- -title: "A GitHub star is important" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/a-github-star-is-important.md ---> - -# A GitHub star is important - -**GitHub stars** allow Netdata to expand its reach, its community, especially attract people with skills willing to -contribute to it. - -Compared to its first release, Netdata is now **twice as fast**, has all its bugs settled and a lot more functionality. -This happened because a lot of people find it useful, use it daily at home and work, **rely on it** and **contribute to -it**. - -**GitHub stars** also **motivate** us. They state that you find our work **useful**. They give us strength to continue, -to work **harder** to make it even **better**. - -So, give Netdata a **GitHub star**, at the top right of this page. - -Thank you! - -Costa Tsaousis - - diff --git a/docs/agent-cloud.md b/docs/agent-cloud.md deleted file mode 100644 index b5b996617..000000000 --- a/docs/agent-cloud.md +++ /dev/null @@ -1,78 +0,0 @@ -<!-- -title: "Use the Agent with Netdata Cloud" -date: 2020-05-04 -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/agent-cloud.md ---> - -# Use the Agent with Netdata Cloud - -While the Netdata Agent is an enormously powerful _distributed_ health monitoring and performance troubleshooting tool, -many of its users need to monitor dozens or hundreds of systems at the same time. 
That's why we built Netdata Cloud, a -hosted web interface that gives you real-time visibility into your entire infrastructure. - -There are two main ways to use your Agent(s) with Netdata Cloud. You can use both these methods simultaneously, or just -one, based on your needs: - -- Use Netdata Cloud's web interface for monitoring an entire infrastructure, with any number of Agents, in one - centralized dashboard. -- Use **Visited nodes** to quickly navigate between the dashboards of nodes you've recently visited. - -## Monitor an infrastructure with Netdata Cloud - -We designed Netdata Cloud to help you see health and performance metrics, plus active alarms, in a single interface. -Here's what a small infrastructure might look like: - -![Animated GIF of Netdata Cloud](https://user-images.githubusercontent.com/1153921/80828986-1ebb3b00-8b9b-11ea-957f-2c8d0d009e44.gif) - -[Read more about Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx) to better -understand how it gives you real-time -visibility into your entire infrastructure, and why you might consider using it. - -Next, [get started in 5 minutes](https://github.com/netdata/netdata/blob/master/docs/cloud/get-started.mdx), or read our -[connection to Cloud reference](https://github.com/netdata/netdata/blob/master/claim/README.md) for a complete -investigation of Cloud's security and encryption features, plus instructions for Docker containers. - -## Navigate between dashboards with Visited nodes - -If you don't want to use Netdata Cloud's web interface, you can still connect multiple nodes through the **Visited -nodes** menu, which appears on the left-hand side of the dashboard. - -You can use the Visited nodes menu to navigate between the dashboards of many different Agent-monitored systems quickly. - -To add nodes to your Visited nodes menu, you first need to navigate to that node's dashboard, then click the **Sign in** -button at the top of the dashboard. 
On the screen that appears, which states your node is requesting access to your -Netdata Cloud account, sign in with your preferred method. - -Cloud redirects you back to your node's dashboard, which is now connected to your Netdata Cloud account. You can now see -the Visited nodes menu, which is populated by a single node. - -![An Agent's dashboard with the Visited nodes menu](https://user-images.githubusercontent.com/1153921/80830383-b6ba2400-8b9d-11ea-9eb2-379c7eccd22f.png) - -If you previously went through the Cloud onboarding process to create a Space and War Room, you will also see these in -the Visited Nodes menu. You can click on your Space or any of your War Rooms to navigate to Netdata Cloud and continue -monitoring your infrastructure from there. - -![A Agent's dashboard with the Visited nodes menu, plus Spaces and War Rooms](https://user-images.githubusercontent.com/1153921/80830382-b6218d80-8b9d-11ea-869c-1170b95eeb4a.png) - -To add more Agents to your Visited nodes menu, visit them and sign in again. This process connects that node to your -Cloud account and further populates the menu. - -Once you've added more than one node, you can use the menu to switch between various dashboards without remembering IP -addresses or hostnames or saving bookmarks for every node you want to monitor. - -![Switching between dashboards with Visited nodes](https://user-images.githubusercontent.com/1153921/80831018-e158ac80-8b9e-11ea-882e-1d82cdc028cd.gif) - -## What's next? - -The Agent-Cloud integration is highly adaptable to the needs of any infrastructure or user. If you want to learn more -about how you might want to use or configure Cloud, we recommend the following: - -- Get an overview of Cloud's features by - reading [Cloud documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx). 
-- Follow the - 5-minute [get started with Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx) - guide to finish - onboarding and connect your first nodes. -- Better understand how agents connect securely to the Cloud - with [connect agent to Cloud](https://github.com/netdata/netdata/blob/master/claim/README.md) and - [Agent-Cloud link](https://github.com/netdata/netdata/blob/master/aclk/README.md) documentation. diff --git a/docs/anonymous-statistics.md b/docs/anonymous-statistics.md index 13eb465c6..512cd02d3 100644 --- a/docs/anonymous-statistics.md +++ b/docs/anonymous-statistics.md @@ -1,9 +1,12 @@ <!-- -title: "Anonymous statistics" +title: "Anonymous telemetry events" custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/anonymous-statistics.md +sidebar_label: "Anonymous telemetry events" +learn_status: "Published" +learn_rel_path: "Configuration" --> -# Anonymous statistics +# Anonymous telemetry events By default, Netdata collects anonymous usage information from the open-source monitoring agent using the open-source product analytics platform [PostHog](https://github.com/PostHog/posthog). We use their [cloud enterprise platform](https://posthog.com/product). @@ -97,9 +100,4 @@ Each of these opt-out processes does the following: - Forces the anonymous statistics script to exit immediately. - Stops the PostHog JavaScript snippet, which remains on the dashboard, from firing and sending any data to the Netdata PostHog. -## Migration from Google Analytics and Google Tag Manager. - -Prior to v1.29.4 we used Google Analytics to capture this information. This led to discomfort with some of our users in sending any product usage data to a third party like Google. 
It was also not even that useful in terms of generating the insights we needed to help catch bugs early and find opportunities for product improvement as Google Analytics does not allow its users access to the raw underlying data without paying a significant amount of money which would be infeasible for a project like Netdata. - -While we migrate fully away from Google Analytics to PostHog there maybe be a small period of time where we run both in parallel before we remove all Google Analytics related code. This is to ensure we can fully test and validate the Netdata PostHog implementation before fully defaulting to it. diff --git a/docs/category-overview-pages/deployment-strategies.md b/docs/category-overview-pages/deployment-strategies.md new file mode 100644 index 000000000..a1d393f26 --- /dev/null +++ b/docs/category-overview-pages/deployment-strategies.md @@ -0,0 +1,66 @@ +# Deployment strategies + +Netdata can be used to monitor all kinds of infrastructure, from stand-alone tiny IoT devices to complex hybrid setups +combining on-premise and cloud infrastructure, mixing bare-metal servers, virtual machines and containers. + +There are 3 components to structure your Netdata ecosystem: + +1. **Netdata Agents** + To monitor the physical or virtual nodes of your infrastructure, including all applications and containers running on them. + + Netdata Agents are Open-Source, licensed under GPL v3+. + +2. **Netdata Parents** + To create data centralization points within your infrastructure, to offload Netdata Agents functions from your production + systems, to provide high-availability of your data, increased data retention and isolation of your nodes. + + Netdata Parents are implemented using the Netdata Agent software. Any Netdata Agent can be an Agent for a node and a Parent + for other Agents, at the same time. + + It is recommended to set up multiple Netdata Parents. They will all seamlessly be integrated by Netdata Cloud into one monitoring solution. + + +3. 
**Netdata Cloud** + Our SaaS, combining all your infrastructure, all your Netdata Agents and Parents, into one uniform, distributed, infinitely + scalable, monitoring database, offering advanced data slicing and dicing capabilities, custom dashboards, advanced troubleshooting + tools, user management, centralized management of alerts, and more. + + +The Netdata Agent is a highly modular software piece, providing data collection via numerous plugins, an in-house crafted time-series +database, a query engine, health monitoring and alerts, machine learning and anomaly detection, metrics exporting to third party systems. + + +To help our users have a complete experience of Netdata when they install it for the first time, a Netdata Agent with default configuration +is a complete monitoring solution out of the box, having all these features enabled and available. + +We strongly recommend the following configuration changes for production deployments: + +1. Understand Netdata's [security and privacy design](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md) and + [secure your nodes](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/secure-nodes.md) + + To safeguard your infrastructure and comply with your organization's security policies. + +2. Set up [streaming and replication](https://github.com/netdata/netdata/blob/master/streaming/README.md) to: + + - Offload Netdata Agents running on production systems and free system resources for the production applications running on them. + - Isolate production systems from the rest of the world and improve security. + - Increase data retention. + - Make your data highly available. + +3. [Optimize the Netdata Agents system utilization and performance](https://github.com/netdata/netdata/edit/master/docs/guides/configure/performance.md) + + To save valuable system resources, especially when running on weak IoT devices. + +We also suggest that you: + +1. 
[Use Netdata Cloud to access the dashboards](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md) + + For increased security, user management and access to our latest tools for advanced dashboarding and troubleshooting. + +2. [Change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) + + To control Netdata's memory use, when you have a lot of ephemeral metrics. + +3. [Use host labels](https://github.com/netdata/netdata/blob/master/docs/guides/using-host-labels.md) + + To organize systems, metrics, and alarms. diff --git a/docs/category-overview-pages/installation-overview.md b/docs/category-overview-pages/installation-overview.md new file mode 100644 index 000000000..e60dd442c --- /dev/null +++ b/docs/category-overview-pages/installation-overview.md @@ -0,0 +1,10 @@ +# Installation + +In this category you can find instructions on all the possible ways you can install Netdata on the +[supported platforms](https://github.com/netdata/netdata/blob/master/packaging/PLATFORM_SUPPORT.md). + +If this is your first time using Netdata, we recommend that you first start with the +[quick installation guide](https://github.com/netdata/netdata/edit/master/packaging/installer/README.md) and then +go into the more advanced options available to you. 
+ + diff --git a/docs/category-overview-pages/integrations-overview.md b/docs/category-overview-pages/integrations-overview.md new file mode 100644 index 000000000..6fa2f50af --- /dev/null +++ b/docs/category-overview-pages/integrations-overview.md @@ -0,0 +1,31 @@ +<!-- +title: "Integrations" +sidebar_label: "Integrations" +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/category-overview-pages/integrations-overview.md" +description: "Available integrations in Netdata" +learn_status: "Published" +learn_rel_path: "Integrations" +sidebar_position: 60 +--> + +# Integrations + +Netdata's ability to monitor out of the box every potentially useful aspect of a node's operation is unparalleled. +But Netdata also provides out of the box, meaningful charts and alerts for hundreds of applications, with the ability +to be easily extended to monitor anything. See the full list of Netdata's capabilities and how you can extend them in the +[supported collectors list](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md). + +Our out of the box alerts were created by expert professionals and have been validated on the field, countless times. +Use them to trigger [alert notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) +either centrally, via the +[Cloud alert notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md) +, or by configuring individual +[agent notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md). + +We designed Netdata with interoperability in mind. The Agent collects thousands of metrics every second, and then what +you do with them is up to you. 
You can +[store metrics in the database engine](https://github.com/netdata/netdata/blob/master/database/README.md), +or send them to another time series database for long-term storage or further analysis using +Netdata's [exporting engine](https://github.com/netdata/netdata/edit/master/exporting/README.md). + + diff --git a/docs/category-overview-pages/misc-overview.md b/docs/category-overview-pages/misc-overview.md new file mode 100644 index 000000000..e0c1cc0d1 --- /dev/null +++ b/docs/category-overview-pages/misc-overview.md @@ -0,0 +1,19 @@ +<!-- +title: "Miscellaneous material" +sidebar_label: "Miscellaneous" +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/category-overview-pages/misc-overview.md" +description: "Available integrations in Netdata" +learn_status: "Published" +learn_rel_path: "Miscellaneous" +sidebar_position: 110 +--> + +# Miscellaneous material + +This section contains temporary material that no longer belongs in our official documentation, and will +be moved to other locations. We keep it here to make it accessible while we create the new articles. + + + + + diff --git a/docs/category-overview-pages/reverse-proxies.md b/docs/category-overview-pages/reverse-proxies.md new file mode 100644 index 000000000..07c8b9bd5 --- /dev/null +++ b/docs/category-overview-pages/reverse-proxies.md @@ -0,0 +1,34 @@ +# Running Netdata behind a reverse proxy + +If you need to access a Netdata agent's user interface or API in a production environment we recommend you put Netdata behind +another web server and secure access to the dashboard via SSL, user authentication and firewall rules. + +A dedicated web server also provides more robustness and capabilities than the Agent's [internal web server](https://github.com/netdata/netdata/blob/master/web/README.md). 
+ +We have documented running behind +[nginx](https://github.com/netdata/netdata/blob/master/docs/Running-behind-nginx.md), +[Apache](https://github.com/netdata/netdata/blob/master/docs/Running-behind-apache.md), +[HAProxy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-haproxy.md), +[Lighttpd](https://github.com/netdata/netdata/blob/master/docs/Running-behind-lighttpd.md), +[Caddy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-caddy.md), +and [H2O](https://github.com/netdata/netdata/blob/master/docs/Running-behind-h2o.md). +If you prefer a different web server, we suggest you follow the documentation for nginx and tell us how you did it + by adding your own "Running behind webserverX" document. + +When you run Netdata behind a reverse proxy, we recommend you firewall protect all your Netdata servers, so that only the web server IP will be allowed to directly access Netdata. To do this, run this on each of your servers (or use your firewall manager): + +```sh +PROXY_IP="1.2.3.4" +iptables -t filter -I INPUT -p tcp --dport 19999 \! -s ${PROXY_IP} -m conntrack --ctstate NEW -j DROP +``` + +The above will prevent anyone except your web server to access a Netdata dashboard running on the host. + +You can also use `netdata.conf`: + +``` +[web] + allow connections from = localhost 1.2.3.4 +``` + +Of course, you can add more IPs. diff --git a/docs/category-overview-pages/secure-nodes.md b/docs/category-overview-pages/secure-nodes.md new file mode 100644 index 000000000..33e205f00 --- /dev/null +++ b/docs/category-overview-pages/secure-nodes.md @@ -0,0 +1,177 @@ +# Secure your nodes
+
+Netdata is a monitoring system. It should be protected, the same way you protect all your admin apps. We assume Netdata
+will be installed privately, for your eyes only.
+
+Upon installation, the Netdata Agent serves the **local dashboard** at port `19999`. If the node is accessible to the
+internet at large, anyone can access the dashboard and your node's metrics at `http://NODE:19999`. We made this decision
+so that the local dashboard was immediately accessible to users, and so that we don't dictate how professionals set up
+and secure their infrastructures.
+
+Viewers will be able to get some information about the system Netdata is running on. This information is everything the dashboard
+provides. The dashboard includes a list of the services each system runs (the legends of the charts under the `Systemd Services`
+section), the applications running (the legends of the charts under the `Applications` section), the disks of the system and
+their names, the user accounts of the system that are running processes (the `Users` and `User Groups` section of the dashboard),
+the network interfaces and their names (not the IPs) and detailed information about the performance of the system and its applications.
+
+This information is not sensitive (meaning that it is not your business data), but **it is important for possible attackers**.
+It will give them clues on what to check and what to try, and in the case of a DDoS against your applications, it will tell
+them whether their attack is working.
+
+Also, viewers could use Netdata itself to stress your servers. Although the Netdata daemon runs unprivileged, with the minimum
+process priority (scheduling priority `idle` - lower than nice 19) and adjusts its OutOfMemory (OOM) score to 1000 (so that it
+will be first to be killed by the kernel if the system starves for memory), some pressure can be applied on your systems if
+someone attempts a DDoS against Netdata.
+
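The OOM setting described above can be inspected on a running Linux system. The following is a minimal sketch, assuming a Linux `/proc` filesystem; `oom_adj_of` is a hypothetical helper name used for illustration, not a Netdata command.

```shell
# Print the OOM score adjustment the kernel holds for a process (Linux only).
# oom_adj_of is a hypothetical helper for illustration, not part of Netdata.
# Usage: oom_adj_of PID
oom_adj_of() {
    cat "/proc/$1/oom_score_adj"
}
```

Assuming a single `netdata` process is running, `oom_adj_of "$(pidof netdata)"` should print the adjustment described above, and `chrt -p "$(pidof netdata)"` reports the scheduling policy of the daemon.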
+Instead of dictating how to secure your infrastructure, we give you many options to establish security best practices
+that align with your goals and your organization's standards.
+
+- [Disable the local dashboard](#disable-the-local-dashboard): **Simplest and recommended method** for those who have
+ added nodes to Netdata Cloud and view dashboards and metrics there.
+
+- [Expose Netdata only in a private LAN](#expose-netdata-only-in-a-private-lan): Simplest and recommended method for those who do not use Netdata Cloud.
+
+- [Fine-grained access control](#fine-grained-access-control): Allow local dashboard access from
+ only certain IP addresses, such as a trusted static IP or connections from behind a management LAN. Full support for Netdata Cloud.
+
+- [Use a reverse proxy (authenticating web server in proxy mode)](#use-an-authenticating-web-server-in-proxy-mode): Password-protect
+ a local dashboard and enable TLS to secure it. Full support for Netdata Cloud.
+
+- [Use Netdata parents as Web Application Firewalls](#use-netdata-parents-as-web-application-firewalls)
+
+- [Other methods](#other-methods): Lists some less common methods of protecting Netdata.
+
+## Disable the local dashboard
+
+This is the _recommended method for those who have connected their nodes to Netdata Cloud_ and prefer viewing real-time
+metrics using the War Room Overview, Nodes tab, and Cloud dashboards.
+
+You can disable the local dashboard (and API) but retain the encrypted Agent-Cloud link
+([ACLK](https://github.com/netdata/netdata/blob/master/aclk/README.md)) that
+allows you to stream metrics on demand from your nodes via the Netdata Cloud interface. This change mitigates all
+concerns about revealing metrics and system design to the internet at large, while keeping all the functionality you
+need to view metrics and troubleshoot issues with Netdata Cloud.
+
+Open `netdata.conf` with `./edit-config netdata.conf`. Scroll down to the `[web]` section, find the
+`mode = static-threaded` setting, and change it to `none`.
+
+```conf
+[web]
+ mode = none
+```
+
+Save and close the editor, then [restart your Agent](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md)
+using `sudo systemctl restart netdata`. If you try to visit the local dashboard at `http://NODE:19999` again, the
+connection will fail because that node no longer serves its local dashboard.
+
+> See the [configuration basics doc](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) for details on how to find
+> `netdata.conf` and use `edit-config`.
+
+## Expose Netdata only in a private LAN
+
+If your organisation has a private administration and management LAN, you can bind Netdata to this network interface on all your servers.
+This is done in `netdata.conf` with these settings:
+
+```conf
+[web]
+ bind to = 10.1.1.1:19999 localhost:19999
+```
+
+You can bind Netdata to multiple IPs and ports. If you use hostnames, Netdata will resolve them and use all the IPs
+(in the above example `localhost` usually resolves to both `127.0.0.1` and `::1`).
+
+**This is the best and the suggested way to protect Netdata**. Your systems **should** have a private administration and management
+LAN, so that all management tasks are performed without any possibility of them being exposed on the internet.
+
+For cloud based installations, if your cloud provider does not provide such a private LAN (or if you use multiple providers),
+you can create a virtual management and administration LAN with tools like `tincd` or `gvpe`. These tools create a mesh VPN
+allowing all servers to communicate securely and privately. Your administration stations join this mesh VPN to get access to
+management and administration tasks on all your cloud servers.
+
+For `gvpe` we have developed a [simple provisioning tool](https://github.com/netdata/netdata-demo-site/tree/master/gvpe) you
+may find handy (it includes statically compiled `gvpe` binaries for Linux and FreeBSD, and also a script to compile `gvpe`
+on your macOS system). We use this to create a management and administration LAN for all Netdata demo sites (spread all over
+the internet using multiple hosting providers).
+
+## Fine-grained access control
+
+If you want to keep using the local dashboard, but don't want it exposed to the internet, you can restrict access with
+[access lists](https://github.com/netdata/netdata/blob/master/web/server/README.md#access-lists). This method also fully
+retains the ability to stream metrics on-demand through Netdata Cloud.
+
+The `allow connections from` setting helps you allow only certain IP addresses or FQDN/hostnames, such as a trusted
+static IP, only `localhost`, or connections from behind a management LAN.
+
+By default, this setting is `localhost *`. This setting allows connections from `localhost` in addition to _all_
+connections, using the `*` wildcard. You can change this setting using Netdata's [simple
+patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md).
+
+```conf
+[web]
+ # Allow only localhost connections
+ allow connections from = localhost
+
+ # Allow only from management LAN running on `10.X.X.X`
+ allow connections from = 10.*
+
+ # Allow connections only from FQDNs/hostnames that start with `example`
+ allow connections from = example*
+```
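+
+Simple patterns also support negation, which helps carve an exception out of a broader rule. The fragment below is a
+minimal sketch, assuming the left-to-right, first-match-wins evaluation described in the simple patterns doc. The
+addresses are placeholders and do not come from a real deployment.
+
+```conf
+[web]
+ # Hypothetical example: deny one host on the management LAN, allow the rest of `10.X.X.X`
+ allow connections from = localhost !10.1.2.3 10.*
+```
+
+Because evaluation stops at the first match, the negative pattern has to come before the wider `10.*` pattern.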
+
+The `allow connections from` setting is global and restricts access to the dashboard, badges, streaming, API, and
+`netdata.conf`, but you can also set each of those access lists more granularly if you choose:
+
+```conf
+[web]
+ allow connections from = localhost *
+ allow dashboard from = localhost *
+ allow badges from = *
+ allow streaming from = *
+ allow netdata.conf from = localhost fd* 10.* 192.168.* 172.16.* 172.17.* 172.18.* 172.19.* 172.20.* 172.21.* 172.22.* 172.23.* 172.24.* 172.25.* 172.26.* 172.27.* 172.28.* 172.29.* 172.30.* 172.31.*
+ allow management from = localhost
+```
+
+See the [web server](https://github.com/netdata/netdata/blob/master/web/server/README.md#access-lists) docs for additional details
+about access lists. You can take access lists one step further by
+[enabling TLS](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) to encrypt data from the
+local dashboard in transit. The connection to Netdata Cloud is always secured with TLS.
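+
+As a minimal sketch of that TLS setup, assuming a key and certificate already exist at the paths shown (the paths are
+placeholders; the web server docs remain the authoritative reference for these options):
+
+```conf
+[web]
+ # Placeholder paths: point these at your own key and certificate
+ ssl key = /etc/netdata/ssl/key.pem
+ ssl certificate = /etc/netdata/ssl/cert.pem
+```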
+
+## Use an authenticating web server in proxy mode
+
+Use one web server to provide authentication in front of **all your Netdata servers**. So, you will be accessing all your Netdata servers with
+URLs like `http://{HOST}/netdata/{NETDATA_HOSTNAME}/` and authentication will be shared among all of them (you will sign in once for all your servers).
+Instructions are provided on how to set the proxy configuration to have Netdata run behind
+[nginx](https://github.com/netdata/netdata/blob/master/docs/Running-behind-nginx.md),
+[HAProxy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-haproxy.md),
+[Apache](https://github.com/netdata/netdata/blob/master/docs/Running-behind-apache.md),
+[lighttpd](https://github.com/netdata/netdata/blob/master/docs/Running-behind-lighttpd.md),
+[Caddy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-caddy.md), and
+[H2O](https://github.com/netdata/netdata/blob/master/docs/Running-behind-h2o.md).
+
+## Use Netdata parents as Web Application Firewalls
+
+The Netdata Agents you install on your production systems do not need direct access to the Internet. Even when you use
+Netdata Cloud, you can appoint one or more Netdata Parents to act as border gateways or application firewalls, isolating
+your production systems from the rest of the world. Netdata Parents receive metric data from Netdata Agents or other
+Netdata Parents on one side, and serve most queries using their own copy of the data to satisfy dashboard requests on
+the other side.
+
+For more information see [Streaming and replication](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md).
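+
+The fragments below are a minimal sketch of such a setup, assuming one Parent reachable at the placeholder address
+`10.1.1.1` and a placeholder API key; substitute a UUID you generate yourself. The first two fragments belong to the
+production node (the child), the last one to the Parent.
+
+```conf
+# stream.conf on the child: push metrics to the Parent (address and key are placeholders)
+[stream]
+ enabled = yes
+ destination = 10.1.1.1:19999
+ api key = 11111111-2222-3333-4444-555555555555
+
+# netdata.conf on the child: stop serving the local dashboard and API
+[web]
+ mode = none
+
+# stream.conf on the Parent: accept children that present the same key
+[11111111-2222-3333-4444-555555555555]
+ enabled = yes
+```
+
+With this layout only the Parent listens for incoming connections, so access controls can be concentrated there.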
+
+## Other methods
+
+Of course, there are many more methods you could use to protect Netdata:
+
+- Bind Netdata to localhost and use `ssh -L 19998:127.0.0.1:19999 remote.netdata.ip` to forward connections of local port 19998 to remote port 19999.
+This way you can ssh to a Netdata server and then use `http://127.0.0.1:19998/` on your computer to access the remote Netdata dashboard.
+
+- If you always connect from a static IP, you can use the access lists described above to allow direct access to your Netdata servers without authentication,
+from all your static IPs.
+
+- Install all your Netdata Agents in **headless data collector** mode, forwarding all metrics in real-time to a parent
+ Netdata server, which will be protected with authentication using an nginx server running locally at the parent
+ Netdata server. This requires more resources (you will need a bigger parent Netdata server), but does not require
+ any firewall changes, since all the child Netdata servers will not be listening for incoming connections.
diff --git a/docs/category-overview-pages/troubleshooting-overview.md b/docs/category-overview-pages/troubleshooting-overview.md new file mode 100644 index 000000000..60406edd6 --- /dev/null +++ b/docs/category-overview-pages/troubleshooting-overview.md @@ -0,0 +1,5 @@ +# Troubleshooting and machine learning + +In this section you can learn about Netdata's advanced tools that can assist you in troubleshooting issues with +your infrastructure, to facilitate the identification of a root cause. + diff --git a/docs/category-overview-pages/visualizations-overview.md b/docs/category-overview-pages/visualizations-overview.md new file mode 100644 index 000000000..d07af062c --- /dev/null +++ b/docs/category-overview-pages/visualizations-overview.md @@ -0,0 +1,4 @@ +# Visualizations, charts and dashboards + +In this section you can learn about the various ways Netdata visualizes the collected metrics at an infrastructure level with Netdata Cloud +and at a single node level, with the Netdata Agent Dashboard. diff --git a/docs/cloud/alerts-notifications/add-discord-notification.md b/docs/cloud/alerts-notifications/add-discord-notification.md index 386e6035e..d1769f0e2 100644 --- a/docs/cloud/alerts-notifications/add-discord-notification.md +++ b/docs/cloud/alerts-notifications/add-discord-notification.md @@ -1,17 +1,8 @@ -<!-- -title: "Add Discord notification configuration" -sidebar_label: "Add Discord notification configuration" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-discord-notification-configuration.md" -sidebar_position: "1" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Alerts" -learn_docs_purpose: "Instructions on how to add notification configuration for Discord" ---> +# Add Discord notification configuration From the Netdata Cloud UI, you can manage your space's notification settings and enable the configuration to deliver notifications on Discord. 
-#### Prerequisites +## Prerequisites To enable Discord notifications you need: @@ -19,7 +10,7 @@ To enable Discord notifications you need: - Access to the space as an **administrator** - Have a Discord server able to receive webhook integrations. For mode details check [how to configure this on Discord](#settings-on-discord) -#### Steps +## Steps 1. Click on the **Space settings** cog (located above your profile icon) 1. Click on the **Notification** tab @@ -35,9 +26,9 @@ To enable Discord notifications you need: - Webhook URL - URL provided on Discord for the channel you want to receive your notifications. For more details check [how to configure this on Discord](#settings-on-discord) - Thread name - if the Discord channel is a **Forum channel** you will need to provide the thread name as well -#### Settings on Discord +## Settings on Discord -#### Enable webhook integrations on Discord server +## Enable webhook integrations on Discord server To enable the webhook integrations on Discord you need: 1. Go to *Integrations** under your **Server Settings @@ -51,9 +42,3 @@ To enable the webhook integrations on Discord you need: ![image](https://user-images.githubusercontent.com/82235632/214092713-d16389e3-080f-4e1c-b150-c0fccbf4570e.png) For more details please read this article from Discord: [Intro to Webhooks](https://support.discord.com/hc/en-us/articles/228383668). - -#### Related topics - -- [Alerts Configuration](https://github.com/netdata/netdata/blob/master/health/README.md) -- [Alert Notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) -- [Manage notification methods](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md)
\ No newline at end of file diff --git a/docs/cloud/alerts-notifications/add-opsgenie-notification-configuration.md b/docs/cloud/alerts-notifications/add-opsgenie-notification-configuration.md new file mode 100644 index 000000000..28e526c90 --- /dev/null +++ b/docs/cloud/alerts-notifications/add-opsgenie-notification-configuration.md @@ -0,0 +1,37 @@ +# Add Opsgenie notification configuration + +From the Cloud interface, you can manage your space's notification settings and from these you can add a specific configuration to get notifications delivered on Opsgenie. + +## Prerequisites + +To add Opsgenie notification configurations you need + +- A Netdata Cloud account +- Access to the space as an **administrator** +- Space on **Business** plan or higher +- Have a permission to add new integrations in Opsgenie. + +## Steps + +1. Click on the **Space settings** cog (located above your profile icon) +1. Click on the **Notification** tab +1. Click on the **+ Add configuration** button (near the top-right corner of your screen) +1. On the **Opsgenie** card click on **+ Add** +1. A modal will be presented to you to enter the required details to enable the configuration: + 1. **Notification settings** are Netdata specific settings + - Configuration name - you can optionally provide a name for your configuration you can easily refer to it + - Rooms - by specifying a list of Rooms you are select to which nodes or areas of your infrastructure you want to be notified using this configuration + - Notification - you specify which notifications you want to be notified using this configuration: All Alerts and unreachable, All Alerts, Critical only + 1. **Integration configuration** are the specific notification integration required settings, which vary by notification method. For Opsgenie: + - API Key - a key provided on Opsgenie for the channel you want to receive your notifications. 
For more details check [how to configure this on Opsgenie](#settings-on-opsgenie) + +## Settings on Opsgenie + +To enable the Netdata integration on Opsgenie you need: +1. Go to integrations tab of your team, click **Add integration**. + + ![image](https://user-images.githubusercontent.com/93676586/230361479-cb73919c-452d-47ec-8066-ed99be5f05e2.png) + +1. Pick **API** from available integrations. Copy your API Key and press **Save Integration**. + +1. Paste copied API key into the corresponding field in **Integration configuration** section of Opsgenie modal window in Netdata.
\ No newline at end of file diff --git a/docs/cloud/alerts-notifications/add-pagerduty-notification-configuration.md b/docs/cloud/alerts-notifications/add-pagerduty-notification-configuration.md index 6e47cfd9c..64880ebe3 100644 --- a/docs/cloud/alerts-notifications/add-pagerduty-notification-configuration.md +++ b/docs/cloud/alerts-notifications/add-pagerduty-notification-configuration.md @@ -1,26 +1,17 @@ -<!-- -title: "Add PagerDuty notification configuration" -sidebar_label: "Add PagerDuty notification configuration" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-pagerduty-notification-configuration.md" -sidebar_position: "1" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Alerts" -learn_docs_purpose: "Instructions on how to add notification configuration for PagerDuty" ---> +# Add PagerDuty notification configuration -From the Cloud interface, you can manage your space's notification settings and from these you can add specific configuration to get notifications delivered on PagerDuty. +From the Cloud interface, you can manage your space's notification settings and from these you can add a specific configuration to get notifications delivered on PagerDuty. -#### Prerequisites +## Prerequisites To add PagerDuty notification configurations you need - A Cloud account - Access to the space as and **administrator** -- Space will needs to be on **Business** plan or higher +- Space needs to be on **Business** plan or higher - Have a PagerDuty service to receive events, for mode details check [how to configure this on PagerDuty](#settings-on-pagerduty) -#### Steps +## Steps 1. Click on the **Space settings** cog (located above your profile icon) 1. Click on the **Notification** tab @@ -34,9 +25,9 @@ To add PagerDuty notification configurations you need 1. **Integration configuration** are the specific notification integration required settings, which vary by notification method. 
For PagerDuty: - Integration Key - is a 32 character key provided by PagerDuty to receive events on your service. For more details check [how to configure this on PagerDuty](#settings-on-pagerduty) -#### Settings on PagerDuty +## Settings on PagerDuty -#### Enable webhook integrations on PagerDuty +## Enable webhook integrations on PagerDuty To enable the webhook integrations on PagerDuty you need: 1. Create a service to receive events from your services directory page: @@ -49,12 +40,4 @@ To enable the webhook integrations on PagerDuty you need: 1. Once the service is created you will be redirected to its configuration page, where you can copy the **integration key**, that you will need need to add to your notification configuration on Netdata UI: - ![image](https://user-images.githubusercontent.com/2930882/214255916-0d2e53d5-87cc-408a-9f5b-0308a3262d5c.png) - - -#### Related topics - -- [Alerts Configuration](https://github.com/netdata/netdata/blob/master/health/README.md) -- [Alert Notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) -- [Manage notification methods](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md)
\ No newline at end of file diff --git a/docs/cloud/alerts-notifications/add-slack-notification-configuration.md b/docs/cloud/alerts-notifications/add-slack-notification-configuration.md index d8d6185fe..99bb2d5b5 100644 --- a/docs/cloud/alerts-notifications/add-slack-notification-configuration.md +++ b/docs/cloud/alerts-notifications/add-slack-notification-configuration.md @@ -1,26 +1,17 @@ -<!-- -title: "Add Slack notification configuration" -sidebar_label: "Add Slack notification configuration" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-slack-notification-configuration.md" -sidebar_position: "1" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Alerts" -learn_docs_purpose: "Instructions on how to add notification configuration for Slack" ---> +# Add Slack notification configuration -From the Cloud interface, you can manage your space's notification settings and from these you can add specific configuration to get notifications delivered on Slack. +From the Cloud interface, you can manage your space's notification settings and from these you can add a specific configuration to get notifications delivered on Slack. -#### Prerequisites +## Prerequisites To add discord notification configurations you need - A Netdata Cloud account - Access to the space as an **administrator** -- Space will needs to be on **Business** plan or higher +- Space needs to be on **Business** plan or higher - Have a Slack app on your workspace to receive the webhooks, for mode details check [how to configure this on Slack](#settings-on-slack) -#### Steps +## Steps 1. Click on the **Space settings** cog (located above your profile icon) 1. Click on the **Notification** tab @@ -34,7 +25,7 @@ To add discord notification configurations you need 1. **Integration configuration** are the specific notification integration required settings, which vary by notification method. 
For Slack: - Webhook URL - URL provided on Slack for the channel you want to receive your notifications. For more details check [how to configure this on Slack](#settings-on-slack) -#### Settings on Slack +## Settings on Slack To enable the webhook integrations on Slack you need: 1. Create an app to receive webhook integrations. Check [Create an app](https://api.slack.com/apps?new_app=1) from Slack documentation for further details @@ -54,10 +45,3 @@ To enable the webhook integrations on Slack you need: ![image](https://user-images.githubusercontent.com/82235632/214104412-13aaeced-1b40-4894-85f6-9db0eb35c584.png) For more details please check Slacks's article [Incoming webhooks for Slack](https://slack.com/help/articles/115005265063-Incoming-webhooks-for-Slack). - - -#### Related topics - -- [Alerts Configuration](https://github.com/netdata/netdata/blob/master/health/README.md) -- [Alert Notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) -- [Manage notification methods](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md)
\ No newline at end of file diff --git a/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md b/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md index e6d042339..0140c30fd 100644 --- a/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md +++ b/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md @@ -1,17 +1,8 @@ -<!-- -title: "Add webhook notification configuration" -sidebar_label: "Add webhook notification configuration" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md" -sidebar_position: "1" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Alerts" -learn_docs_purpose: "Instructions on how to add notification configuration for webhook" ---> +# Add webhook notification configuration -From the Cloud interface, you can manage your space's notification settings and from these you can add specific configuration to get notifications delivered on a webhook using a predefined schema. +From the Cloud interface, you can manage your space's notification settings and from these you can add a specific configuration to get notifications delivered on a webhook using a predefined schema. -#### Prerequisites +## Prerequisites To add discord notification configurations you need @@ -20,7 +11,7 @@ To add discord notification configurations you need - Space needs to be on **Pro** plan or higher - Have an app that allows you to receive webhooks following a predefined schema, for mode details check [how to create the webhook service](#webhook-service) -#### Steps +## Steps 1. Click on the **Space settings** cog (located above your profile icon) 1. Click on the **Notification** tab @@ -34,16 +25,16 @@ To add discord notification configurations you need 1. **Integration configuration** are the specific notification integration required settings, which vary by notification method. 
For webhook: - Webhook URL - webhook URL is the url of the service that Netdata will send notifications to. In order to keep the communication secured, we only accept HTTPS urls. Check [how to create the webhook service](#webhook-service). - Extra headers - these are optional key-value pairs that you can set to be included in the HTTP requests sent to the webhook URL. For mode details check [Extra headers](#extra-headers) - - Authorization Mechanism - Netdata webhook integration supports 3 different authorization mechanisms. For mode details check [Authorization mechanism](#authorization-mechanism): + - Authentication Mechanism - Netdata webhook integration supports 3 different authentication mechanisms. For mode details check [Authentication mechanisms](#authentication-mechanisms): - Mutual TLS (recommended) - default authentication mechanism used if no other method is selected. - Basic - the client sends a request with an Authorization header that includes a base64-encoded string in the format **username:password**. These will settings will be required inputs. - Bearer - the client sends a request with an Authorization header that includes a **bearer token**. This setting will be a required input. -#### Webhook service +## Webhook service A webhook integration allows your application to receive real-time alerts from Netdata by sending HTTP requests to a specified URL. In this document, we'll go over the steps to set up a generic webhook integration, including adding headers, and implementing different types of authorization mechanisms. -##### Netdata webhook integration +### Netdata webhook integration A webhook integration is a way for one service to notify another service about events that occur within it. This is done by sending an HTTP POST request to a specified URL (known as the "webhook URL") when an event occurs. 
@@ -59,16 +50,17 @@ The notification content sent to the destination service will be a JSON object h | chart | string | The chart associated with the alert. | | context | string | The chart context. | | space | string | The space where the node that raised the alert is assigned. | +| rooms | object[object(string,string)] | Object with list of rooms names and urls where the node belongs to. | | family | string | Context family. | | class | string | Classification of the alert, e.g. "Error". | | severity | string | Alert severity, can be one of "warning", "critical" or "clear". | | date | string | Date of the alert in ISO8601 format. | | duration | string | Duration the alert has been raised. | -| critical_count | integer | umber of critical alerts currently existing on the same node. | -| warning_count | integer | Number of warning alerts currently existing on the same node. | +| additional_active_critical_alerts | integer | Number of additional critical alerts currently existing on the same node. | +| additional_active_warning_alerts | integer | Number of additional warning alerts currently existing on the same node. | | alarm_url | string | Netdata Cloud URL for this alarm. | -##### Extra headers +### Extra headers When setting up a webhook integration, the user can specify a set of headers to be included in the HTTP requests sent to the webhook URL. @@ -78,28 +70,165 @@ By default, the following headers will be sent in the HTTP request |:-------------------------------:|-----------------------------| | Content-Type | application/json | -##### Authorization mechanism +### Authentication mechanisms -Netdata webhook integration supports 3 different authorization mechanisms: +Netdata webhook integration supports 3 different authentication mechanisms: -1. Mutual TLS (recommended) +#### Mutual TLS authentication (recommended) -In mutual Transport Layer Security (mTLS) authorization, the client and the server authenticate each other using X.509 certificates. 
This ensures that the client is connecting to the intended server, and that the server is only accepting connections from authorized clients. - -To take advantage of mutual TLS, you can configure your server to verify Netdata's client certificate. To do that you need to download our [CA certificate file](http://localhost) and configure your server to use it as the +In mutual Transport Layer Security (mTLS) authentication, the client and the server authenticate each other using X.509 certificates. This ensures that the client is connecting to the intended server, and that the server is only accepting connections from authorized clients. This is the default authentication mechanism used if no other method is selected. -2. Basic +To take advantage of mutual TLS, you can configure your server to verify Netdata's client certificate. In order to achieve this, the Netdata client sending the notification supports mutual TLS (mTLS) to identify itself with a client certificate that your server can validate. + +The steps to perform this validation are as follows: + +- Store Netdata CA certificate on a file in your disk. 
The content of this file should be: + +<details> + <summary>Netdata CA certificate</summary> + +``` +-----BEGIN CERTIFICATE----- +MIIF0jCCA7qgAwIBAgIUDV0rS5jXsyNX33evHEQOwn9fPo0wDQYJKoZIhvcNAQEN +BQAwgYAxCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpDYWxpZm9ybmlhMRYwFAYDVQQH +Ew1TYW4gRnJhbmNpc2NvMRYwFAYDVQQKEw1OZXRkYXRhLCBJbmMuMRIwEAYDVQQL +EwlDbG91ZCBTUkUxGDAWBgNVBAMTD05ldGRhdGEgUm9vdCBDQTAeFw0yMzAyMjIx +MjQzMDBaFw0zMzAyMTkxMjQzMDBaMIGAMQswCQYDVQQGEwJVUzETMBEGA1UECBMK +Q2FsaWZvcm5pYTEWMBQGA1UEBxMNU2FuIEZyYW5jaXNjbzEWMBQGA1UEChMNTmV0 +ZGF0YSwgSW5jLjESMBAGA1UECxMJQ2xvdWQgU1JFMRgwFgYDVQQDEw9OZXRkYXRh +IFJvb3QgQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQCwIg7z3R++ +ppQYYVVoMIDlhWO3qVTMsAQoJYEvVa6fqaImUBLW/k19LUaXgUJPohB7gBp1pkjs +QfY5dBo8iFr7MDHtyiAFjcQV181sITTMBEJwp77R4slOXCvrreizhTt1gvf4S1zL +qeHBYWEgH0RLrOAqD0jkOHwewVouO0k3Wf2lEbCq3qRk2HeDvkv0LR7sFC+dDms8 +fDHqb/htqhk+FAJELGRqLeaFq1Z5Eq1/9dk4SIeHgK5pdYqsjpBzOTmocgriw6he +s7F3dOec1ZZdcBEAxOjbYt4e58JwuR81cWAVMmyot5JNCzYVL9e5Vc5n22qt2dmc +Tzw2rLOPt9pT5bzbmyhcDuNg2Qj/5DySAQ+VQysx91BJRXyUimqE7DwQyLhpQU72 +jw29lf2RHdCPNmk8J1TNropmpz/aI7rkperPugdOmxzP55i48ECbvDF4Wtazi+l+ +4kx7ieeLfEQgixy4lRUUkrgJlIDOGbw+d2Ag6LtOgwBiBYnDgYpvLucnx5cFupPY +Cy3VlJ4EKUeQQSsz5kVmvotk9MED4sLx1As8V4e5ViwI5dCsRfKny7BeJ6XNPLnw +PtMh1hbiqCcDmB1urCqXcMle4sRhKccReYOwkLjLLZ80A+MuJuIEAUUuEPCwywzU +R7pagYsmvNgmwIIuJtB6mIJBShC7TpJG+wIDAQABo0IwQDAOBgNVHQ8BAf8EBAMC +AQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQU9IbvOsPSUrpr8H2zSafYVQ9e +Ft8wDQYJKoZIhvcNAQENBQADggIBABQ08aI31VKZs8jzg+y/QM5cvzXlVhcpkZsY +1VVBr0roSBw9Pld9SERrEHto8PVXbadRxeEs4sKivJBKubWAooQ6NTvEB9MHuGnZ +VCU+N035Gq/mhBZgtIs/Zz33jTB2ju3G4Gm9VTZbVqd0OUxFs41Iqvi0HStC3/Io +rKi7crubmp5f2cNW1HrS++ScbTM+VaKVgQ2Tg5jOjou8wtA+204iYXlFpw9Q0qnP +qq6ix7TfLLeRVp6mauwPsAJUgHZluz7yuv3r7TBdukU4ZKUmfAGIPSebtB3EzXfH +7Y326xzv0hEpjvDHLy6+yFfTdBSrKPsMHgc9bsf88dnypNYL8TUiEHlcTgCGU8ts +ud8sWN2M5FEWbHPNYRVfH3xgY2iOYZzn0i+PVyGryOPuzkRHTxDLPIGEWE5susM4 +X4bnNJyKH1AMkBCErR34CLXtAe2ngJlV/V3D4I8CQFJdQkn9tuznohUU/j80xvPH 
+FOcDGQYmh4m2aIJtlNVP6+/92Siugb5y7HfslyRK94+bZBg2D86TcCJWaaZOFUrR +Y3WniYXsqM5/JI4OOzu7dpjtkJUYvwtg7Qb5jmm8Ilf5rQZJhuvsygzX6+WM079y +nsjoQAm6OwpTN5362vE9SYu1twz7KdzBlUkDhePEOgQkWfLHBJWwB+PvB1j/cUA3 +5zrbwvQf +-----END CERTIFICATE----- +``` +</details> + +- Enable client certificate validation on the web server that is doing the TLS termination. Below we show you how to perform this configuration in `NGINX` and `Apache` + + **NGINX** + +```bash +server { + listen 443 ssl default_server; + + # ... existing SSL configuration for server authentication ... + ssl_verify_client on; + ssl_client_certificate /path/to/Netdata_CA.pem; + + location / { + if ($ssl_client_s_dn !~ "CN=api.netdata.cloud") { + return 403; + } + # ... existing location configuration ... + } +} +``` + +**Apache** + +```bash +Listen 443 +<VirtualHost *:443> + # ... existing SSL configuration for server authentication ... + SSLVerifyClient require + SSLCACertificateFile "/path/to/Netdata_CA.pem" +</VirtualHost> +<Directory /var/www/> + Require expr "%{SSL_CLIENT_S_DN_CN} == 'api.netdata.cloud'" + # ... existing directory configuration ... +</Directory> +``` + +#### Basic authentication In basic authorization, the client sends a request with an Authorization header that includes a base64-encoded string in the format username:password. The server then uses this information to authenticate the client. If this authentication method is selected, the user can set the user and password that will be used when connecting to the destination service. -3. Bearer +#### Bearer token authentication + +In bearer token authentication, the client sends a request with an Authorization header that includes a bearer token. The server then uses this token to authenticate the client. Bearer tokens are typically generated by an authentication service, and are passed to the client after a successful authentication. If this method is selected, the user can set the token to be used for connecting to the destination service. 
+
+##### Challenge secret
+
+To validate that you have ownership of the web application that will receive the webhook events, we use a challenge-response check mechanism.
+
+This mechanism works as follows:
+
+- The challenge secret parameter that you provide is a shared secret between you and Netdata only.
+- On your request for creating a new Webhook integration, we will make a GET request to the URL of the webhook, adding a query parameter `crc_token`, consisting of a random string.
+- You will receive this request on your application and it must construct a response consisting of a base64-encoded HMAC SHA-256 hash created from the crc_token and the shared secret. The response will be in the format:
+
+```json
+{
+  "response_token": "sha256=9GKoHJYmcHIkhD+C182QWN79YBd+D+Vkj4snmZrfNi4="
+}
+```
+
+- We will compare your application's response with the hash that we will generate using the challenge secret, and if they are the same, the integration creation will succeed.
+
+We will do this validation every time you update your integration configuration.
+
+- Response requirements:
+    - A base64 encoded HMAC SHA-256 hash created from the crc_token and the shared secret.
+    - Valid response_token and JSON format.
+    - Latency less than 5 seconds.
+    - 200 HTTP response code.
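The comparison step of this mechanism can be sketched from the checking side as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the text does not say how Netdata performs the comparison internally, so the constant-time comparison here is a design choice of the sketch rather than a documented behavior.

```python
import base64
import hashlib
import hmac


def expected_response_token(challenge_secret: str, crc_token: str) -> str:
    """Compute the response_token value for a given secret and crc_token."""
    # HMAC SHA-256 keyed with the shared secret, over the random crc_token.
    digest = hmac.new(challenge_secret.encode("utf-8"),
                      msg=crc_token.encode("utf-8"),
                      digestmod=hashlib.sha256).digest()
    # The token is the base64-encoded digest with a "sha256=" prefix.
    return "sha256=" + base64.b64encode(digest).decode("ascii")


def response_is_valid(challenge_secret: str, crc_token: str, response_token: str) -> bool:
    """Check a received response_token against the locally computed one."""
    expected = expected_response_token(challenge_secret, crc_token)
    # Compare as bytes in constant time (an assumption of this sketch).
    return hmac.compare_digest(expected.encode("utf-8"),
                               response_token.encode("utf-8"))
```

Because both sides derive the token from the same secret and the same `crc_token`, a matching value shows that the application holds the challenge secret without the secret itself being sent over the wire.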
+
+**Example response token generation in Python:**
+
+Here you can see how to define a handler for a Flask application in Python 3:
+
+```python
+import base64
+import hashlib
+import hmac
+import json
+
+from flask import Flask, request
+
+app = Flask(__name__)
+
+key = 'YOUR_CHALLENGE_SECRET'
+
+@app.route('/webhooks/netdata')
+def webhook_challenge():
+    token = request.args.get('crc_token').encode('ascii')
+
+    # creates HMAC SHA-256 hash from incoming token and your challenge secret
+    sha256_hash_digest = hmac.new(key.encode(),
+                                  msg=token,
+                                  digestmod=hashlib.sha256).digest()
+
+    # construct response data with base64 encoded hash
+    response = {
+        'response_token': 'sha256=' + base64.b64encode(sha256_hash_digest).decode('ascii')
+    }
 
-In bearer token authorization, the client sends a request with an Authorization header that includes a bearer token. The server then uses this token to authenticate the client. Bearer tokens are typically generated by an authentication service, and are passed to the client after a successful authentication. If this method is selected, the user can set the token to be used for connecting to the destination service.
+ # returns properly formatted json response + return json.dumps(response) +``` #### Related topics - [Alerts Configuration](https://github.com/netdata/netdata/blob/master/health/README.md) -- [Alert Notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) +- [Alert Notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md) - [Manage notification methods](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md) diff --git a/docs/cloud/alerts-notifications/manage-notification-methods.md b/docs/cloud/alerts-notifications/manage-notification-methods.md index 115aaae73..17c7f879a 100644 --- a/docs/cloud/alerts-notifications/manage-notification-methods.md +++ b/docs/cloud/alerts-notifications/manage-notification-methods.md @@ -1,25 +1,17 @@ -<!-- -title: "Manage notification methods" -sidebar_label: "Manage notification methods" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Alerts" -learn_docs_purpose: "Instructions on how to manage notification methods" ---> +# Manage notification methods From the Cloud interface, you can manage your space's notification settings as well as allow users to personalize their notifications setting -### Manage space notification settings +## Manage space notification settings -#### Prerequisites +### Prerequisites To manage space notification settings, you will need the following: - A Netdata Cloud account - Access to the space as an **administrator** -#### Available actions per notification methods based on service level +### Available actions per notification methods based on service level | **Action** | **Personal service level** | **System service level** | | :- | :-: | :-: | @@ -30,9 +22,9 @@ To manage space 
notification settings, you will need the following: Notes: * For Netdata provided ones you can't delete the existing notification method configuration. -* Enable, Edit and Add actions over specific notification methods will only be allowed if your plan has access to those ([service classification](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx#service-classification)) +* Enable, Edit and Add actions over specific notification methods will only be allowed if your plan has access to those ([service classification](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md#service-classification)) -#### Steps +### Steps 1. Click on the **Space settings** cog (located above your profile icon) 1. Click on the **Notification** tab @@ -53,9 +45,9 @@ Notes: 1. **Delete an existing** notification method configuration. Netdata provided ones can't be deleted, e.g. Email - Use the trash icon to delete your configuration -### Manage user notification settings +## Manage user notification settings -#### Prerequisites +### Prerequisites To manage user specific notification settings, you will need the following: @@ -64,7 +56,7 @@ To manage user specific notification settings, you will need the following: Note: If an administrator has disabled a Personal [service level](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md#service-level) notification method this will override any user specific setting. -#### Steps +### Steps 1. Click on the **User notification settings** shortcut on top of the help button 1. You are presented with: @@ -78,11 +70,3 @@ Note: If an administrator has disabled a Personal [service level](https://github 1. 
**Activate notifications** for a room you aren't a member of - From the **All Rooms** tab click on the Join button for the room(s) you want -#### Related topics - -- [Alert Notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) -- [Alerts Configuration](https://github.com/netdata/netdata/blob/master/health/README.md) -- [Add webhook notification configuration](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md) -- [Add Discord notification configuration](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-discord-notification-configuration.md) -- [Add Slack notification configuration](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-slack-notification-configuration.md) -- [Add PagerDuty notification configuration](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-pagerduty-notification-configuration.md) diff --git a/docs/cloud/alerts-notifications/notifications.mdx b/docs/cloud/alerts-notifications/notifications.md index e594606eb..94cd2dc3f 100644 --- a/docs/cloud/alerts-notifications/notifications.mdx +++ b/docs/cloud/alerts-notifications/notifications.md @@ -1,14 +1,4 @@ ---- -title: "Alert notifications" -description: >- - "Configure Netdata Cloud to send notifications to your team whenever any node on your infrastructure - triggers a pre-configured or custom alert threshold." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx" -sidebar_label: "Alert notifications" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Alerts" ---- +# Cloud alert notifications import Callout from '@site/src/components/Callout' @@ -17,11 +7,11 @@ unreachable state. By enabling notifications, you ensure no alert, on any node i you or your team. 
Having this information centralized helps you: -* Have a clear view of the health across your infrastructure, [seeing all a alerts in one place](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/view-active-alerts.mdx) +* Have a clear view of the health across your infrastructure, seeing all alerts in one place. * Easily [setup your alert notification process](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md): methods to use and where to use them, filtering rules, etc. -* Quickly troubleshoot using [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metrics-correlations.md) -or [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.mdx) +* Quickly troubleshoot using [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) +or [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md) If a node is getting disconnected often or has many alerts, we protect you and your team from alert fatigue by sending you a flood protection notification. Getting one of these notifications is a good signal of health or performance issues @@ -37,7 +27,7 @@ Centralized alert notifications from Netdata Cloud is a independent process from Netdata](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md). You can enable one or the other, or both, based on your needs. However, the alerts you see in Netdata Cloud are based on those streamed from your Netdata-monitoring nodes. If you want to tweak or add new alert that you see in Netdata Cloud, and receive via centralized alert notifications, you must -[configure](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) each node's alert watchdog. 
+[configure](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) each node's alert watchdog. </Callout> @@ -80,7 +70,7 @@ For **System** notification methods, the destination of the channel will be a ta These notification methods allow for fine-grain rule settings to be done by administrators and more than one configuration can exist for them since. You can specify different targets depending on Rooms or Notification level settings. -Some examples of such notification methods are: Webhook, PagerDuty, slack. +Some examples of such notification methods are: Webhook, PagerDuty, Slack. #### Service classification @@ -97,7 +87,7 @@ These are: webhook ##### Business Notification methods classified as Business are only available for **Business** plans -These are: PagerDuty, slack +These are: PagerDuty, Slack, Opsgenie ## Flood protection @@ -129,27 +119,3 @@ within Cloud's embedded dashboards. Here's an example email notification for the `ram_available` chart, which is in a critical state: ![Screenshot of an alarm notification email from Netdata Cloud](https://user-images.githubusercontent.com/1153921/87461878-e933c480-c5c3-11ea-870b-affdb0801854.png) - -## What's next? - -Netdata Cloud's alarm notifications feature leverages the alarms configuration on each node in your infrastructure. If -you'd like to tweak any of these alarms, or even add new ones based on your needs, read our [health -quickstart](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md). - -You can also [view active alarms](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/view-active-alerts.mdx) in Netdata Cloud for an instant -visualization of the health of your infrastructure. 
- -### Related Topics - -#### **Related Concepts** -- [Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md) -- [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metrics-correlations.md) -- [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.mdx) - -#### Related Tasks -- [View Active alarms](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/view-active-alerts.mdx) -- [Manage notification methods](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md) -- [Add webhook notification configuration](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-webhook-notification-configuration.md) -- [Add Discord notification configuration](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-discord-notification-configuration.md) -- [Add Slack notification configuration](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-slack-notification-configuration.md) -- [Add PagerDuty notification configuration](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/add-pagerduty-notification-configuration.md) diff --git a/docs/cloud/alerts-notifications/smartboard.mdx b/docs/cloud/alerts-notifications/smartboard.mdx deleted file mode 100644 index b9240ce49..000000000 --- a/docs/cloud/alerts-notifications/smartboard.mdx +++ /dev/null @@ -1,46 +0,0 @@ ---- -title: "Alerts smartboard" -description: "" -type: "reference" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/smartboard.mdx" -sidebar_label: "Alerts smartboard" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Alerts" ---- - -The Alerts view gives you a high level of availability and performance information for every node you're -monitoring 
with Netdata Cloud. We expect it to become the "home base" for many Netdata Cloud users who want to instantly -understand what's going on with their infrastructure and exactly where issues might be. - -The Alerts view is available entirely for free to all users and for any number of nodes. - -## Alerts table and filtering - -The Alerts view shows all active alerts in your War Room, including the alert's name, the most recent value, a -timestamp of when it became active, and the relevant node. - -You can use the checkboxes in the filter pane on the right side of the screen to filter the alerts displayed in the -table -by Status, Class, Type & Componenet, Role, Operating System, or Node. - -Click on any of the alert names to see the alert. - -## View active alerts - -In the `Active` subtab, you can see exactly how many **critical** and **warning** alerts are active across your nodes. - -## View configured alerts - -You can view all the configured alerts on all the agents that belong to a War Room in the `Alert Configurations` subtab. -From within the Alerts view, you can click the `Alert Configurations` subtab to see a high level view of the states of -the alerts on the nodes within this War Room and drill down to the node level where each alert is configured with their -latest status. - - - - - - - - diff --git a/docs/cloud/alerts-notifications/view-active-alerts.mdx b/docs/cloud/alerts-notifications/view-active-alerts.mdx deleted file mode 100644 index 1035b682e..000000000 --- a/docs/cloud/alerts-notifications/view-active-alerts.mdx +++ /dev/null @@ -1,76 +0,0 @@ ---- -title: "View active alerts" -description: >- - "Track the health of your infrastructure in one place by taking advantage of the powerful health monitoring - watchdog running on every node." 
-type: "how-to" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/view-active-alerts.mdx" -sidebar_label: "View active alerts" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Alerts" ---- - -Netdata Cloud receives information about active alerts on individual nodes in your infrastructure and updates the -interface based on those status changes. - -Netdata Cloud doesn't produce alerts itself but rather receives and aggregates alerts from each node in your -infrastructure based on their configuration. Every node comes with hundreds of pre-configured alerts that have been -tested by Netdata's community of DevOps engineers and SREs, but you may want to customize existing alerts or create new -ones entirely. - -Read our doc on [health alerts](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) to -learn how to tweak existing alerts or create new -health entities based on the specific needs of your infrastructure. By taking charge of alert configuration, you'll -ensure Netdata Cloud always delivers the most relevant alerts about the well-being of your nodes. - -## View all active alerts - -The [Alerts Smartboard](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/smartboard.mdx) -provides a high-level interface for viewing the number of critical or warning alerts and where they are in your -infrastructure. - -![The Alerts Smartboard](https://user-images.githubusercontent.com/1153921/119025635-2fcb1b80-b959-11eb-9fdb-7f1a082f43c5.png) - -Click on the **Alerts** tab in any War Room to open the Smartboard. Alternatively, click on any of the alert badges in -the [Nodes view](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) to jump to the Alerts -Smartboard. 
- -From here, filter active alerts using the **critical** or **warning** boxes, or hover over a box in -the [nodes map](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/smartboard.mdx#nodes-map) -to see a -popup node-specific alert information. - -## View alerts in context with charts - -If you click on any of the alerts, either in a nodes map popup or the alerts table, Netdata Cloud navigates you to the -single-node dashboard and scrolls to the relevant chart. Netdata Cloud also draws a highlight and the value at the -moment your node triggered this alert. - -![An alert in context with charts and dimensions](https://user-images.githubusercontent.com/1153921/119039593-4a0cf580-b969-11eb-840c-4ecb123df9f5.png) - -You can -then [select this area](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx#select) -with `Alt/⌘ + mouse selection` to highlight the alerted timeframe while you explore other charts for root cause -analysis. - -Or, select the area and -run [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) to -filter the single-node -dashboard to only those charts most likely to be connected to the alert. - -## What's next? - -Learn more about the features of the Smartboard in -its [reference](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/smartboard.mdx) -doc. To stay notified of active alerts, -enable [centralized alert notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) -from Netdata Cloud. - -If you're through with setting up alerts, it might be time -to [invite your team](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md). 
- -Check out our recommendations on organizing and -using [Spaces](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md) and -[War Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md) to streamline your processes once -you find an alert in Netdata Cloud. diff --git a/docs/cloud/beta-architecture/new-architecture.md b/docs/cloud/beta-architecture/new-architecture.md deleted file mode 100644 index c51f08fb1..000000000 --- a/docs/cloud/beta-architecture/new-architecture.md +++ /dev/null @@ -1,36 +0,0 @@ ---- -title: "Test the New Cloud Architecture" -description: "Would you like to be the first to try our new architecture and provide feedback? If so, this guide will help you sign up for our beta testing group." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/beta-architecture/new-architecture.md" ---- - -To enhance the stability and reliability of Netdata Cloud, we did extensive work on our backend, and we would like to give you the opportunity -to be among the first users to try these changes to our Cloud architecture and provide feedback. - -The backend architecture changes should offer notable improvements in reliability and stability in Netdata Cloud, -but more importantly, it allows us to develop new features and enhanced functionality, including features and enhancements -that you have specifically requested. Features that will be developed on the new architecture include: - -- Parent/Child Cloud relationships -- Alert logs -- Alert management -- Much more - -## Enabling the new architecture - -To enable the new architecture, first ensure that you have installed the latest Netdata version following -[our guide](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx). Then, you or your administrator will need to retrieve the Space IDs -within Netdata Cloud by clicking `Manage Space` in the left pane, selecting the `Space` tab, and copying the value in the `Space Id` field. 
-You can then send an email to [beta@Netdata.cloud](mailto:beta@netdata.cloud) requesting to be included in our beta testers, and include -in the body of the email a list of Space IDs for any space you would like to have whitelisted for the update. If you received an email -invitation, you can also just reply to the invitation with your Space IDs in the body of the reply. - -Feel free to send the Space IDs for multiple spaces to test the new infrastructure on each of them. - -## Reporting issues - -After you are set up with the new architecture changes, we ask that you report any issues you encounter in our -[designated Discord channel](https://discord.gg/dGzdemHwHh). This feedback -will help us ensure the highest performance of the new architecture and expedite the development and release -of the aforementioned enhancements and features. - diff --git a/docs/cloud/cheatsheet.mdx b/docs/cloud/cheatsheet.md index c1d0a471d..35a6a2c99 100644 --- a/docs/cloud/cheatsheet.mdx +++ b/docs/cloud/cheatsheet.md @@ -1,41 +1,45 @@ ---- -title: "'Netdata management and configuration cheatsheet'" -description: "'Connecting an Agent to the Cloud allows a Netdata Agent, running on a distributed node, to securely connect to Netdata Cloud via the encrypted Agent-Cloud link (ACLK).'" -image: "/cheatsheet/cheatsheet-meta.png" -sidebar_label: "Cheatsheet" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/cheatsheet.mdx" -part_of_learn: "True" -learn_status: "Published" -learn_topic_type: "Getting started" -learn_rel_path: "Getting started" ---- +# Useful management and configuration actions + +Below you will find some of the most common actions that one can take while using Netdata. You can use this page as a quick reference for installing Netdata, connecting a node to the Cloud, properly editing the configuration, accessing Netdata's API, and more! 
-import { - OneLineInstallWget, - OneLineInstallCurl, -} from '@site/src/components/OneLineInstall/'; +### Install Netdata -Use our management & configuration cheatsheet to simplify your interactions with Netdata, including configuration, -using charts, managing the daemon, and more. +```bash +wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh + +# Or, if you have cURL but not wget (such as on macOS): +curl https://my-netdata.io/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh +``` -## Install Netdata +#### Connect a node to Netdata Cloud -#### Install Netdata +To do so, sign in to Netdata Cloud, on your Space under the Nodes tab, click `Add Nodes` and paste the provided command into your node’s terminal and run it. +You can also copy the Claim token and pass it to the installation script with `--claim-token` and re-run it. -<OneLineInstallWget /> +### Configuration -Or, if you have cURL but not wget (such as on macOS): +**Netdata's config directory** is `/etc/netdata/` but in some operating systems it might be `/opt/netdata/etc/netdata/`. +Look for the `# config directory =` line over at `http://NODE_IP:19999/netdata.conf` to find your config directory. -<OneLineInstallCurl /> +From within that directory you can run `sudo ./edit-config netdata.conf` **to edit Netdata's configuration.** +You can edit other config files too, by specifying their filename after `./edit-config`. +You are expected to use this method in all following configuration changes. 
+ +<!-- #### Edit Netdata's other config files (examples): + +- `$ sudo ./edit-config apps_groups.conf` +- `$ sudo ./edit-config ebpf.conf` +- `$ sudo ./edit-config health.d/load.conf` +- `$ sudo ./edit-config go.d/prometheus.conf` -#### Claim a node to Netdata Cloud +#### View the running Netdata configuration: `http://NODE:19999/netdata.conf` -To do so, sign in to Netdata Cloud, click the `Claim Nodes` button, choose the `War Rooms` to add nodes to, then click `Copy` to copy the full script to your clipboard. Paste that into your node’s terminal and run it. +> Replace `NODE` with the IP address or hostname of your node. Often `localhost`. ## Metrics collection & retention You can tweak your settings in the netdata.conf file. -📄 [Find your netdata.conf file](https://learn.netdata.cloud/guides/step-by-step/step-04#find-your-netdataconf-file) +📄 [Find your netdata.conf file](https://github.com/netdata/netdata/blob/master/daemon/config/README.md) Open a new terminal and navigate to the netdata.conf file. Use the edit-config script to make changes: `sudo ./edit-config netdata.conf` @@ -61,15 +65,17 @@ sudo ./edit-config netdata.conf ``` [global] update every = 5 -``` +``` --> + +--- #### Enable/disable plugins (groups of collectors) -``` +```bash sudo ./edit-config netdata.conf ``` -``` +```conf [plugins] go.d = yes # enabled node.d = no # disabled @@ -77,100 +83,82 @@ sudo ./edit-config netdata.conf #### Enable/disable specific collectors +```bash +sudo ./edit-config go.d.conf # edit a plugin's config ``` -sudo ./edit-config go.d.conf -``` - -> `Or python.d.conf, node.d.conf, edbpf.conf, and so on`. 
-``` +```yaml modules: activemq: no # disabled - bind: no # disabled cockroachdb: yes # enabled ``` -#### Edit a collector's config (example) +#### Edit a collector's config +```bash +sudo ./edit-config go.d/mysql.conf ``` -$ sudo ./edit-config go.d/mysql.conf -$ sudo ./edit-config ebpf.conf -$ sudo ./edit-config python.d/anomalies.conf -``` - -## Configuration - -#### The Netdata config directory: `/etc/netdata` - -> If you don't have such a directory: -> 📄 [Find your netdata.conf file](https://learn.netdata.cloud/guides/step-by-step/step-04#find-your-netdataconf-file) -> The cheatsheet assumes you’re running all commands from within the Netdata config directory! - -#### Edit Netdata's main config file: `$ sudo ./edit-config netdata.conf` - -#### Edit Netdata's other config files (examples): -- `$ sudo ./edit-config apps_groups.conf` -- `$ sudo ./edit-config ebpf.conf` -- `$ sudo ./edit-config health.d/load.conf` -- `$ sudo ./edit-config go.d/prometheus.conf` - -#### View the running Netdata configuration: `http://NODE:19999/netdata.conf` - -> Replace `NODE` with the IP address or hostname of your node. Often `localhost`. 
+### Alarms & notifications -## Alarms & notifications - -#### Add a new alarm +<!-- #### Add a new alarm ``` sudo touch health.d/example-alarm.conf sudo ./edit-config health.d/example-alarm.conf +``` --> +After any change, reload the Netdata health configuration: + +```bash +netdatacli reload-health +#or if that command doesn't work on your installation, use: +killall -USR2 netdata ``` #### Configure a specific alarm -``` +```bash sudo ./edit-config health.d/example-alarm.conf ``` #### Silence a specific alarm -``` +```bash sudo ./edit-config health.d/example-alarm.conf - to: silent ``` -#### Disable alarms and notifications - ``` -[health] - enabled = no + to: silent ``` -> After any change, reload the Netdata health configuration - -``` -netdatacli reload-health -``` +<!-- #### Disable alarms and notifications -or if that command doesn't work on your installation, use: +```conf +[health] + enabled = no +``` --> -``` -killall -USR2 netdata -``` +--- -## Manage the daemon +### Manage the daemon | Intent | Action | | :-------------------------- | --------------------------------------------------------------------: | -| Start Netdata | `$ sudo systemctl start netdata` | -| Stop Netdata | `$ sudo systemctl stop netdata` | -| Restart Netdata | `$ sudo systemctl restart netdata` | -| Reload health configuration | `$ sudo netdatacli reload-health` <br></br> `$ killall -USR2 netdata` | +| Start Netdata | `$ sudo service netdata start` | +| Stop Netdata | `$ sudo service netdata stop` | +| Restart Netdata | `$ sudo service netdata restart` | +| Reload health configuration | `$ sudo netdatacli reload-health` `$ killall -USR2 netdata` | | View error logs | `less /var/log/netdata/error.log` | +| View collectors logs | `less /var/log/netdata/collector.log` | + +#### Change the port Netdata listens to (example, set it to port 39999) + +```conf +[web] +default port = 39999 +``` -## See metrics and dashboards +### See metrics and dashboards #### Netdata Cloud: 
`https://app.netdata.cloud` @@ -178,8 +166,11 @@ killall -USR2 netdata > Replace `NODE` with the IP address or hostname of your node. Often `localhost`. -#### Access the Netdata API: `http://NODE:19999/api/v1/info` +### Access the Netdata API +You can access the API like this: `http://NODE:19999/api/VERSION/REQUEST`. +If you want to take a look at all the API requests, check our API page at <https://learn.netdata.cloud/api> +<!-- ## Interact with charts | Intent | Action | @@ -189,9 +180,9 @@ killall -USR2 netdata | Zoom to a specific timeframe | **Cloud**<br/>use the `select and zoom` button on any chart and then do a `mouse selection` <br/><br/> **Agent**<br/>`SHIFT` + `mouse selection` | | Pan forward or back in time | `click` & `drag` <br/> `touch` & `drag` (touchpad/touchscreen) | | Select a certain timeframe | `ALT` + `mouse selection` <br/> WIP need to evaluate this `command?` + `mouse selection` (macOS) | -| Reset to default auto refreshing state | `double click` | +| Reset to default auto refreshing state | `double click` | --> -## Dashboards +<!-- ## Dashboards #### Disable the local dashboard @@ -200,22 +191,15 @@ Use the `edit-config` script to edit the `netdata.conf` file. ``` [web] mode = none -``` - -#### Change the port Netdata listens to (port 39999) - -``` -[web] -default port = 39999 -``` +``` --> -#### Opt out from anonymous statistics +<!-- #### Opt out from anonymous statistics ``` sudo touch .opt-out-from-anonymous-statistics -``` +``` --> -## Understanding the dashboard +<!-- ## Understanding the dashboard **Charts**: A visualization displaying one or more collected/calculated metrics in a time series. Charts are generated by collectors. @@ -228,4 +212,4 @@ separately from similar instances. Example, disks named **sda**, **sdb**, **sdc**, and so on. **Contexts**: A grouping of charts based on the types of metrics collected and visualized. -**disk.io**, **disk.ops**, and **disk.backlog** are all contexts. 
+**disk.io**, **disk.ops**, and **disk.backlog** are all contexts. --> diff --git a/docs/cloud/cloud.mdx b/docs/cloud/cloud.mdx deleted file mode 100644 index 764ba0e89..000000000 --- a/docs/cloud/cloud.mdx +++ /dev/null @@ -1,74 +0,0 @@ ---- -title: "Netdata Cloud docs" -description: "Netdata Cloud is real-time visibility for entire infrastructures. View key metrics, insightful charts, and active alarms from all your nodes." -custom_edit_url: "https://github.com/netdata/learn/blob/master/docs/cloud.mdx" ---- - -import { Grid, Box, BoxList, BoxListItem } from '@site/src/components/Grid/' -import { RiExternalLinkLine } from 'react-icons/ri' - -This is the documentation for the Netdata Cloud web application, which works in parallel with the open-source Netdata -monitoring agent to help you monitor your entire infrastructure [for free <RiExternalLinkLine className="inline-block" -/>](https://netdata.cloud/pricing/) in real time and troubleshoot problems that threaten the health of your -nodes before they occur. - -Netdata Cloud requires the open-source [Netdata](/docs/) monitoring agent, which is the basis for the metrics, -visualizations, and alarms that you'll find in Netdata Cloud. Every time you view a node in Netdata Cloud, its metrics -and metadata are streamed to Netdata Cloud, then proxied to your browser, with an infrastructure that ensures [data -privacy <RiExternalLinkLine className="inline-block" />](https://netdata.cloud/privacy/). - - -Read [_What is Netdata?_](https://github.com/netdata/netdata/blob/master/docs/overview/what-is-netdata.md) for details about how Netdata and Netdata Cloud work together -and how they're different from other monitoring solutions, or the -[FAQ <RiExternalLinkLine className="inline-block" />](https://community.netdata.cloud/tags/c/general/29/faq) for answers to common questions. 
- -<Grid columns="1" className="mb-16"> - <Box - to="/docs/cloud/get-started" - title="Get started with Netdata Cloud" - cta="Go" - image={true}> - Ready to get real-time visibility into your entire infrastructure? This guide will help you get started on Netdata Cloud, from signing in for a free account to connecting your nodes. - </Box> -</Grid> - -## Learn about Netdata Cloud's features - -<Grid columns="2"> - <Box - title="Spaces and War Rooms"> - <BoxList> - <BoxListItem to="/docs/cloud/spaces" title="Spaces" /> - <BoxListItem to="/docs/cloud/war-rooms" title="War Rooms" /> - </BoxList> - </Box> - <Box - title="Dashboards"> - <BoxList> - <BoxListItem to="/docs/cloud/visualize/overview" title="Overview" /> - <BoxListItem to="/docs/cloud/visualize/nodes" title="Nodes view" /> - <BoxListItem to="/docs/cloud/visualize/kubernetes" title="Kubernetes" /> - <BoxListItem to="/docs/cloud/visualize/dashboards" title="Create new dashboards" /> - </BoxList> - </Box> - <Box - title="Alerts and notifications"> - <BoxList> - <BoxListItem to="/docs/cloud/alerts-notifications/view-active-alerts" title="View active alerts" /> - <BoxListItem to="/docs/cloud/alerts-notifications/smartboard" title="Alerts Smartboard" /> - <BoxListItem to="/docs/cloud/alerts-notifications/notifications" title="Alert notifications" /> - </BoxList> - </Box> - <Box - title="Troubleshooting with Netdata Cloud"> - <BoxListItem to="/docs/cloud/insights/metric-correlations" title="Metric Correlations" /> - </Box> - <Box - title="Management and settings"> - <BoxList> - <BoxListItem to="/docs/cloud/manage/sign-in" title="Sign in with email, Google, or GitHub" /> - <BoxListItem to="/docs/cloud/manage/invite-your-team" title="Invite your team" /> - <BoxListItem to="/docs/cloud/manage/themes" title="Choose your Netdata Cloud theme" /> - </BoxList> - </Box> -</Grid> diff --git a/docs/cloud/data-privacy.mdx b/docs/cloud/data-privacy.mdx deleted file mode 100644 index c99cff946..000000000 --- 
a/docs/cloud/data-privacy.mdx +++ /dev/null @@ -1,39 +0,0 @@ ---- -title: "Data privacy in the Netdata Cloud" -description: "Keeping your data safe and secure is our priority.Netdata never stores your personal information in the Netdata Cloud." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/data-privacy.mdx" -sidebar_label: "Data privacy in the Netdata Cloud" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Concepts" ---- - -[Data privacy](https://netdata.cloud/privacy/) is very important to us. We firmly believe that your data belongs to -you. This is why **we don't store any metric data in Netdata Cloud**. - -Your local installations of the Netdata Agent form the basis for the Netdata Cloud. All the data that you see in the web browser when using Netdata Cloud, is actually streamed directly from the Netdata Agent to the Netdata Cloud dashboard. -The data passes through our systems, but it isn't stored. You can learn more about [the Agent's security design](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md) in the Agent documentation. - -However, to be able to offer the stunning visualizations and advanced functionality of Netdata Cloud, it does store a limited number of _metadata_. - -## Metadata - -Let's look at the metadata Netdata Cloud stores using the publicly available demo server `frankfurt.my-netdata.io`: - -- The email address you used to sign up/or sign in -- For each node connected to your Spaces in Netdata Cloud: - - Hostname (as it appears in Netdata Cloud) - - Information shown in `/api/v1/info`. For example: [https://frankfurt.my-netdata.io/api/v1/info](https://frankfurt.my-netdata.io/api/v1/info). - - The chart metadata shown in `/api/v1/charts`. For example: [https://frankfurt.my-netdata.io/api/v1/info](https://frankfurt.my-netdata.io/api/v1/info). - - Alarm configurations shown in `/api/v1/alarms?all`. 
For example: [https://frankfurt.my-netdata.io/api/v1/alarms?all](https://frankfurt.my-netdata.io/api/v1/alarms?all). - - Active alarms shown in `/api/v1/alarms`. For example: [https://frankfurt.my-netdata.io/api/v1/alarms](https://frankfurt.my-netdata.io/api/v1/alarms). - -How we use them: - -- The data is stored in our production database on AWS. Some of it is also used in Google BigQuery, our data lake, for analytics purposes. These analytics are crucial for our product development process. -- Email is used to identify users in regards to product use and to enrich our tools with product use, such as our CRM. -- This data is only available to Netdata and never to a 3rd party. - -## Delete all personal data - -To remove all personal info we have about you (email and activities) you need to delete your cloud account by logging into https://app.netdata.cloud and accessing your profile, at the bottom left of your screen. diff --git a/docs/cloud/get-started.mdx b/docs/cloud/get-started.mdx deleted file mode 100644 index b9f83af8f..000000000 --- a/docs/cloud/get-started.mdx +++ /dev/null @@ -1,133 +0,0 @@ ---- -title: "Get started with Netdata Cloud" -description: >- - "Ready to get real-time visibility into your entire infrastructure? This guide will help you get started on - Netdata Cloud." -image: "/img/seo/cloud_get-started.png" -custom_edit_url: "https://github.com/netdata/learn/blob/master/docs/cloud/get-started.mdx" ---- - -import Link from '@docusaurus/Link' -import Callout from '@site/src/components/Callout' - -Ready to get real-time visibility into your entire infrastructure with Netdata Cloud? This guide will walk you through -the onboarding process, such as setting up your Space and War Room and connecting your first nodes. - -## Before you start - -Before you get started with Netdata Cloud, you should have the open-source Netdata monitoring agent installed. 
See our -[installation guide](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) for details. - -If you already have the Netdata agent running on your node(s), make sure to update it to v1.32 or higher. Read the -[updating documentation](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md) for information -on how to update based on the method you used to install Netdata on that node. - -## Begin the onboarding process - -Get started by signing in to Netdata. Read -the [sign in](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.mdx) doc for details on the -authentication methods we use. - -<Link to="https://app.netdata.cloud" className="group"> - <button className="relative text-text bg-gray-200 px-4 py-2 rounded"> - <span className="z-10 relative font-semibold group-hover:text-gray-100">Sign in to Netdata</span> - <div className="opacity-0 group-hover:opacity-100 transition absolute z-0 inset-0 bg-gradient-to-r from-green to-green-lighter rounded"></div> - </button> -</Link> - -Once signed in with your preferred method, a -General [War Room](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md) and -a [Space](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md) -named for your login email are automatically created. You can configure more Spaces and War Rooms to help you you -organize your team -and the many systems that make up your infrastructure. For example, you can put product and infrastructure SRE teams in -separate -Spaces, and then use War Rooms to group nodes by their service (`nginx`), purpose (`webservers`), or physical -location (`IAD`). - -Don't worry! You can always add more Spaces and War Rooms later if you decide to reorganize how you use Netdata Cloud. - -## Connect your nodes - -From within the created War Rooms, Netdata Cloud prompts you -to [connect](https://github.com/netdata/netdata/blob/master/claim/README.md) your nodes to Netdata Cloud. 
Non-admin -users can users can select from existing nodes already connected to the space or select an admin from a provided list to -connect node. -You can connect any node running Netdata, whether it's a physical or virtual machine, a Docker container, IoT device, -and more. - -The connection process securely connects any node to Netdata Cloud using -the [Agent-Cloud link](https://github.com/netdata/netdata/blob/master/aclk/README.md). By -connecting a node, you prove you have write and administrative access to that node. Connecting to Cloud also prevents -any third party -from connecting a node that you control. Keep in mind: - -- _You can only connect any given node in a single Space_. You can, however, add that connected node to multiple War - Rooms - within that one Space. -- You must repeat the connection process on every node you want to add to Netdata Cloud. - -<Callout type="notice"> - -**Netdata Cloud ensures your data privacy by not storing metrics data from your nodes**. See our statement on Netdata -Cloud [data privacy](https://github.com/netdata/netdata/blob/master/aclk/README.md/#data-privacy) for details on the -data that's streamed from your nodes and the -[connecting to cloud](https://github.com/netdata/netdata/blob/master/claim/README.md) doc for details about why we -implemented the connection process and the encryption methods we use to secure your data in transit. - -</Callout> - -To connect a node, select which War Rooms you want to add this node to with the dropdown, then copy the script given by -Netdata Cloud into your node's terminal. - -Hit **Enter**. The script should return `Agent was successfully claimed.`. If the claiming script returns errors, or if -you don't see the node in your Space after 60 seconds, see -the [troubleshooting information](https://github.com/netdata/netdata/blob/master/claim/README.md#troubleshooting). - -Repeat this process with every node you want to add to Netdata Cloud during onboarding. 
You can also add more nodes once -you've finished onboarding by clicking the **Connect Nodes** button in -the [Space management area](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md/#manage-spaces). - -### Alternatives and other operating systems - -**Docker**: You can execute the claiming script Netdata running as a Docker container, or attach the claiming script -when creating the container for the first time, such as when you're spinning up ephemeral containers. See -the [connect an agent running in Docker](https://github.com/netdata/netdata/blob/master/claim/README.md#connect-an-agent-running-in-docker) -documentation for details. - -**Without root privileges**: If you want to connect an agent without using root privileges, see our [connect -documentation](https://github.com/netdata/netdata/blob/master/claim/README.md#connect-an-agent-without-root-privileges). - -**With a proxy**: If your node uses a proxy to connect to the internet, you need to configure the node's proxy settings. -See -our [connect through a proxy](https://github.com/netdata/netdata/blob/master/claim/README.md#connect-through-a-proxy) -doc for details. - -## Add bookmarks to essential resources - -When an anomaly or outage strikes, your team needs to access other essential resources quickly. You can use Netdata -Cloud's bookmarks to put these tools in one accessible place. Bookmarks are shared between all War Rooms in a Space, so -any users in your Space will be able to see and use them. - -Bookmarks can link to both internal and external resources. You can bookmark your app's status page for quick updates -during an outage, a messaging system on your organization's intranet, or other tools your team uses to respond to -changes in your infrastructure. - -To add a new bookmark, click on the **Add bookmark** link. In the panel, name the bookmark, include its URL, and write a -short description for your team's reference. - -## What's next? 
- -You finish onboarding -by [inviting members of your team](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md) -to your Space. You -can also invite them later. At this point, you're ready to use Cloud. - -Next, learn about the organization and interfaces -behind [Spaces](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md) -and [War -Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md). - -If you're ready to explore, check out how to use -the [Overview dashboard](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md), which is the -default view for each new War Room you create. diff --git a/docs/cloud/insights/anomaly-advisor.mdx b/docs/cloud/insights/anomaly-advisor.md index 98a28d92c..4804dbc16 100644 --- a/docs/cloud/insights/anomaly-advisor.mdx +++ b/docs/cloud/insights/anomaly-advisor.md @@ -1,12 +1,14 @@ ---- +<!-- title: "Anomaly Advisor" description: "Quickly find anomalous metrics anywhere in your infrastructure." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.mdx" +custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md" sidebar_label: "Anomaly Advisor" learn_status: "Published" learn_topic_type: "Tasks" learn_rel_path: "Operations" ---- +--> + +# Anomaly Advisor import ReactPlayer from 'react-player' @@ -19,7 +21,7 @@ interest. If you are running a Netdata version higher than `v1.35.0-29-nightly` you will be able to use the Anomaly Advisor out of the box with zero configuration. If you are on an earlier Netdata version you will need to first enable ML on your nodes by following the steps below. -To enable the Anomaly Advisor you must first enable ML on your nodes via a small config change in `netdata.conf`. 
Once the anomaly detection models have trained on the Agent (with default settings this takes a couple of hours until enough data has been seen to train the models) you will then be able to enable the Anomaly Advisor feature in Netdata Cloud. +To enable the Anomaly Advisor you must first enable ML on your nodes via a small config change in `netdata.conf`. Once the anomaly detection models have trained on the Agent (with default settings this takes a couple of hours until enough data has been seen to train the models) you will then be able to enable the Anomaly Advisor feature in Netdata Cloud. ### Enable ML on Netdata Agent @@ -30,9 +32,7 @@ To enable ML on your Netdata Agent, you need to edit the `[ml]` section in your enabled = yes ``` -At a minimum you just need to set `enabled = yes` to enable ML with default params. More details about configuration can be found in the [Netdata Agent ML docs](https://learn.netdata.cloud/docs/agent/ml#configuration). - -**Note**: Follow [this guide](https://github.com/netdata/netdata/blob/master/docs/guides/step-by-step/step-04.md) if you are unfamiliar with making configuration changes in Netdata. +At a minimum you just need to set `enabled = yes` to enable ML with default params. More details about configuration can be found in the [Netdata Agent ML docs](https://github.com/netdata/netdata/blob/master/ml/README.md#configuration). When you have finished your configuration, restart Netdata with a command like `sudo systemctl restart netdata` for the config changes to take effect. You can find more info on restarting Netdata [here](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md). @@ -44,7 +44,7 @@ Once this line flattens out all configured metrics should have models trained an ## Using Anomaly Advisor -To use the Anomaly Advisor, go to the "anomalies" tab. Once you highlight a particular timeframe of interest, a selection of the most anomalous dimensions will appear below. 
+To use the Anomaly Advisor, go to the "anomalies" tab. Once you highlight a particular timeframe of interest, a selection of the most anomalous dimensions will appear below. The aim here is to surface the most anomalous metrics in the space or room for the highlighted window to try and cut down on the amount of manual searching required to get to the root cause of your issues. @@ -68,7 +68,7 @@ You can expand any sparkline chart to see the underlying raw data to see how it ![image](https://user-images.githubusercontent.com/2178292/164430105-f747d1e0-f3cb-4495-a5f7-b7bbb71039ae.png) -On the upper right hand side of the page you can select which nodes to filter on if you wish to do so. The ML training status of each node is also displayed. +On the upper right hand side of the page you can select which nodes to filter on if you wish to do so. The ML training status of each node is also displayed. On the lower right hand side of the page an index of anomaly rates is displayed for the highlighted timeline of interest. The index is sorted from most anomalous metric (highest anomaly rate) to least (lowest anomaly rate). Clicking on an entry in the index will scroll the rest of the page to the corresponding anomaly rate sparkline for that metric. @@ -80,6 +80,7 @@ On the lower right hand side of the page an index of anomaly rates is displayed You can read more detail on how anomaly detection in the Netdata Agent works in our [Agent docs](https://github.com/netdata/netdata/blob/master/ml/README.md). 🚧 **Note**: This functionality is still **under active development** and considered experimental. We dogfood it internally and among early adopters within the Netdata community to build the feature. 
If you would like to get involved and help us with feedback, you can reach us through any of the following channels: + - Email us at analytics-ml-team@netdata.cloud - Comment on the [beta launch post](https://community.netdata.cloud/t/anomaly-advisor-beta-launch/2717) in the Netdata community - Join us in the [🤖-ml-powered-monitoring](https://discord.gg/4eRSEUpJnc) channel of the Netdata discord. diff --git a/docs/cloud/insights/events-feed.md b/docs/cloud/insights/events-feed.md new file mode 100644 index 000000000..0e297ba81 --- /dev/null +++ b/docs/cloud/insights/events-feed.md @@ -0,0 +1,79 @@ +<!-- +title: "Events feed" +sidebar_label: "Events feed" +custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/insights/events-feed.md" +sidebar_position: "2800" +learn_status: "Published" +learn_topic_type: "Concepts" +learn_rel_path: "Concepts" +learn_docs_purpose: "Present the Netdata Events feed." +--> + +# Events feed + +Netdata Cloud provides the Events feed, a powerful feature that tracks events that happen on your infrastructure or in your Space. The feed lets you investigate events that occurred in the past, which is invaluable for troubleshooting. A common use case is when a node goes offline and you want to understand what events happened before that. A detailed event history can also assist in attributing sudden pattern changes in a time series to specific changes in your environment. + +## What are the available events? + +At a high level, these are the domains the Events feed provides visibility into. + +> ⚠️ Based on your space's plan, different allowances are defined to query past data. + +| **Domains of events** | **Community** | **Pro** | **Business** | +| :-- | :-- | :-- | :-- | +| **Auditing events** - COMING SOON<br/>Events related to actions done on your Space, e.g. 
invite user, change user role or change plan.| 4 hours | 7 days | 90 days | +| **[Topology events](#topology-events)**<br/>Node state transition events, e.g. live or offline.| 4 hours | 7 days | 14 days | +| **[Alert events](#alert-events)**<br/>Alert state transition events, can be seen as an alert history log.| 4 hours | 7 days | 90 days | + +### Topology events + +| **Event name** | **Description** | **Example** | +| :-- | :-- | :-- | +| Node Became Live | The node is collecting and streaming metrics to Cloud.| Node `netdata-k8s-state-xyz` was **live** | +| Node Became Stale | The node is offline and not streaming metrics to Cloud. It can show historical data from a parent node. | Node `ip-xyz.ec2.internal` was **stale** | +| Node Became Offline | The node is offline, not streaming metrics to Cloud and not available in any parent node.| Node `ip-xyz.ec2.internal` was **offline** | +| Node Created | The node is created but it is still `Unseen` on Cloud, didn't establish a successful connection yet.| Node `ip-xyz.ec2.internal` was **created** | +| Node Removed |The node was removed from the Space, for example by using the `Delete` action on the node. This is a soft delete in that the node gets marked as deleted, but retains the association with this space. If it becomes live again, it will be restored (see `Node Restored` below) and reappear in this space as before. | Node `ip-xyz.ec2.internal` was **deleted (soft)** | +| Node Restored | The node was restored. See `Node Removed` above. | Node `ip-xyz.ec2.internal` was **restored** | +| Node Deleted | The node was deleted from the Space. This is a hard delete and no information on the node is retained. | Node `ip-xyz.ec2.internal` was **deleted (hard)** | +| Agent Connected | The agent connected to the Cloud MQTT server (Agent-Cloud Link established).<br/>These events can only be seen on _All nodes_ War Room. | Agent with claim ID `7d87bqs9-cv42-4823-8sd4-3614548850c7` has connected to Cloud. 
| +| Agent Disconnected | The agent disconnected from the Cloud MQTT server (Agent-Cloud Link severed).<br/>These events can only be seen on _All nodes_ War Room. | Agent with claim ID `7d87bqs9-cv42-4823-8sd4-3614548850c7` has disconnected from Cloud: **Connection Timeout**. | +| Space Statistics | Daily snapshot of space node statistics.<br/>These events can only be seen on _All nodes_ War Room. | Space statistics. Nodes: **22 live**, **21 stale**, **18 removed**, **61 total**. | + + +### Alert events + +| **Event name** | **Description** | **Example** | +| :-- | :-- | :-- | +| Node Alert State Changed | These are node alert state transition events and can be seen as an alert history log. You will be able to see transitions to or from any of these states: Cleared, Warning, Critical, Removed, Error or Unknown | Transition to Cleared:<br/>`httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` recovered with value **8.33%**<br/><br/>Transition from Cleared to Warning or Critical:<br/>`httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` was raised to **WARNING** with value **10%**<br/><br/>Transition from Warning to Critical:<br/>`httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` escalated to **CRITICAL** with value **25%**<br/><br/>Transition from Critical to Warning:<br/>`httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` was demoted to **WARNING** with value **10%**<br/><br/>Transition to Removed:<br/>Alert `httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` is no longer available, state can't be assessed.<br/><br/>Transition to Error:<br/>For this alert `httpcheck_web_service_bad_status` related to `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` we couldn't calculate the current value ⓘ| + +## 
Who can access the events? +All users will be able to see events from the Topology and Alerts domains, but Auditing events, once these are added, will only be accessible to administrators. For more details, check out the [Netdata Role-Based Access model](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md). + +## How to use the events feed + +1. Click on the **Events** tab (located near the top of your screen) +1. You will be presented with a table listing the events that occurred in the timeframe defined on the date time picker +1. You can use the filtering capabilities available on the right-hand bar to slice through the results provided. See more details on event types and filters + +Note: When you try to query a longer period than what your space allows, you will see an error message highlighting that you are querying data outside of your plan. + +### Event types and filters + +| Event type | Tags | Nodes | Alert Status | Alert Names | Chart Names | +| :-- | :-- | :-- | :-- | :-- | :-- | +| Node Became Live | node, lifecycle | Node name | - | - | - | +| Node Became Stale | node, lifecycle | Node name | - | - | - | +| Node Became Offline | node, lifecycle | Node name | - | - | - | +| Node Created | node, lifecycle | Node name | - | - | - | +| Node Removed | node, lifecycle | Node name | - | - | - | +| Node Restored | node, lifecycle | Node name | - | - | - | +| Node Deleted | node, lifecycle | Node name | - | - | - | +| Agent Claimed | agent | - | - | - | - | +| Agent Connected | agent | - | - | - | - | +| Agent Disconnected | agent | - | - | - | - | +| Agent Authenticated | agent | - | - | - | - | +| Agent Authentication Failed | agent | - | - | - | - | +| Space Statistics | space, node, statistics | Node name | - | - | - | +| Node Alert State Changed | alert, node | Node name | Cleared, Warning, Critical, Removed, Error or Unknown | Alert name | Chart name | diff --git a/docs/cloud/insights/metric-correlations.md 
b/docs/cloud/insights/metric-correlations.md index ce8835d34..c8ead9be3 100644 --- a/docs/cloud/insights/metric-correlations.md +++ b/docs/cloud/insights/metric-correlations.md @@ -1,4 +1,4 @@ ---- +<!-- title: "Metric Correlations" description: "Quickly find metrics and charts closely related to a particular timeframe of interest anywhere in your infrastructure to discover the root cause faster." custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md" @@ -6,7 +6,9 @@ sidebar_label: "Metric Correlations" learn_status: "Published" learn_topic_type: "Tasks" learn_rel_path: "Operations" ---- +--> + +# Metric Correlations The Metric Correlations (MC) feature lets you quickly find metrics and charts related to a particular window of interest that you want to explore further. By displaying the standard Netdata dashboard, filtered to show only charts that are relevant to the window of interest, you can get to the root cause sooner. @@ -51,9 +53,9 @@ Behind the scenes, Netdata will aggregate the raw data as needed such that arbit ### Data -Netdata is different from typical observability agents since, in addition to just collecting raw metric values, it will by default also assign an "[Anomaly Bit](/docs/agent/ml#anomaly-bit)" related to each collected metric each second. This bit will be 0 for "normal" and 1 for "anomalous". This means that each metric also natively has an "[Anomaly Rate](/docs/agent/ml#anomaly-rate)" associated with it and, as such, MC can be run against the raw metric values or their corresponding anomaly rates. +Netdata is different from typical observability agents since, in addition to just collecting raw metric values, it will by default also assign an "[Anomaly Bit](https://github.com/netdata/netdata/tree/master/ml#anomaly-bit---100--anomalous-0--normal)" related to each collected metric each second. This bit will be 0 for "normal" and 1 for "anomalous". 
This means that each metric also natively has an "[Anomaly Rate](https://github.com/netdata/netdata/tree/master/ml#anomaly-rate---averageanomaly-bit)" associated with it and, as such, MC can be run against the raw metric values or their corresponding anomaly rates. -**Note**: Read more [here](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection.md) to learn more about the native anomaly detection features within netdata. +**Note**: Read more [here](https://github.com/netdata/netdata/blob/master/ml/README.md) to learn more about the native anomaly detection features within netdata. - `Metrics` - Run MC on the raw metric values. - `Anomaly Rate` - Run MC on the corresponding anomaly rate for each metric. @@ -72,7 +74,7 @@ Should you still want to, disabling nodes for Metric Correlation on the agent is ## Usage tips! -- When running Metric Correlations from the [Overview tab](https://learn.netdata.cloud/docs/cloud/visualize/overview#overview) across multiple nodes, you might find better results if you iterate on the initial results by grouping by node to then filter to nodes of interest and run the Metric Correlations again. So a typical workflow in this case would be to: +- When running Metric Correlations from the [Overview tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#overview-and-single-node-view) across multiple nodes, you might find better results if you iterate on the initial results by grouping by node to then filter to nodes of interest and run the Metric Correlations again. So a typical workflow in this case would be to: - If unsure which nodes you are interested in then run MC on all nodes. - Within the initial results returned group the most interesting chart by node to see if the changes are across all nodes or a subset of nodes. - If you see a subset of nodes clearly jump out when you group by node, then filter for just those nodes of interest and run the MC again. 
This will result in less aggregation needing to be done by Netdata and so should help give clearer results as you interact with the slider. @@ -81,7 +83,3 @@ Should you still want to, disabling nodes for Metric Correlation on the agent is - `Volume` might favour picking up more sparse metrics that were relatively flat and then came to life with some spikes (or vice versa). This is because for such metrics that just don't have that many different values in them, it is impossible to construct a cumulative distribution that can then be compared. So `Volume` might be useful in spotting examples of metrics turning on or off. ![example where volume captured network traffic turning on](https://user-images.githubusercontent.com/2178292/182336924-d02fd3d3-7f09-41da-9cfc-809d01396d9d.png) - `KS2` since it relies on the full distribution might be better at highlighting more complex changes that `Volume` is unable to capture. For example a change in the variation of a metric might be picked up easily by `KS2` but missed (or just much lower scored) by `Volume` since the averages might remain not all that different between baseline and highlight even if their variance has changed a lot. ![example where KS2 captured a change in entropy distribution that volume alone might not have picked up](https://user-images.githubusercontent.com/2178292/182338289-59b61e6b-089d-431c-bc8e-bd19ba6ad5a5.png) - Use `Volume` and `Anomaly Rate` together to ask what metrics have turned most anomalous from baseline to highlighted window. You can expand the embedded anomaly rate chart once you have results to see this more clearly. ![example where Volume and Anomaly Rate together help show what dimensions where most anomalous](https://user-images.githubusercontent.com/2178292/182338666-6d19fa92-89d3-4d61-804c-8f10982114f5.png) - -## What's next? 
- -You can read more about all the ML powered capabilities of Netdata [here](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection.md). If you aren't yet familiar with the power of Netdata Cloud's visualization features, check out the [Nodes view](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) and learn how to [build new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md). diff --git a/docs/cloud/manage/invite-your-team.md b/docs/cloud/manage/invite-your-team.md index f294a627d..da2d51f7f 100644 --- a/docs/cloud/manage/invite-your-team.md +++ b/docs/cloud/manage/invite-your-team.md @@ -1,37 +1,24 @@ ---- -title: "Invite your team" -description: >- - "Invite your entire SRE, DevOPs, or ITOps team to Netdata Cloud to give everyone insights into your - infrastructure from a single pane of glass." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md" -sidebar_label: "Invite your team" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" ---- +# Invite your team + +Invite your entire SRE, DevOPs, or ITOps team to Netdata Cloud, to give everyone insights into your infrastructure from a single pane of glass. Invite new users to your Space by clicking on **Invite Users** in the [Space](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md) management area. -![Opening the invitation panel in Netdata Cloud](https://user-images.githubusercontent.com/1153921/108529805-1b13b480-7292-11eb-862f-0499e3fdac17.png) +![image](https://user-images.githubusercontent.com/70198089/227887469-e46bad55-ef5d-441a-83a5-dcc2af038678.png) + -Enter the email addresses for the users you want to invite to your Space. You can enter any number of email addresses, -separated by a comma, to send multiple invitations at once. 
+You will be prompted to enter the email addresses for the users you want to invite to your Space. You can enter any number of email addresses, separated by a comma, to send multiple invitations at once. Next, choose the War Rooms you want to invite these users to. Once logged in, these users are not restricted only to these War Rooms. They can be invited to others, or join any that are public. +Next, pick a role for the invited user. You can read more about [which roles are available](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md#what-roles-are-available) based on your [subscription plan](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/plans.md). + Click the **Send** button to send an email invitation, which will prompt them -to [sign up](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.mdx) and join your Space. +to [sign up](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.md) and join your Space. -![The invitation panel in Netdata Cloud](https://user-images.githubusercontent.com/1153921/97762959-53b33680-1ac7-11eb-8e9d-f3f4a14c0028.png) +![image](https://user-images.githubusercontent.com/70198089/227888899-8511081b-0157-4e22-81d9-898cc464dcb0.png) Any unaccepted invitations remain under **Invitations awaiting response**. These invitations can be rescinded at any time by clicking the trash can icon. - -## What's next? - -If your team members have trouble signing in, direct them to -the [sign in guide](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.mdx). Once your -team is onboarded to Netdata Cloud, they can view shared assets, such -as [new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md). 
diff --git a/docs/cloud/manage/plans.md b/docs/cloud/manage/plans.md new file mode 100644 index 000000000..9180ab5a0 --- /dev/null +++ b/docs/cloud/manage/plans.md @@ -0,0 +1,120 @@ +# Netdata Plans + +This page will guide you through the differences between the Community, Pro, Business and Enterprise plans. + +At Netdata, we believe in providing free and unrestricted access to high-quality monitoring solutions, and our commitment to this principle will not change. We offer our free SaaS offering - what we call **Community plan** - and Open Source Agent, which features unlimited nodes and users, unlimited metrics, and retention, providing real-time, high-fidelity, out-of-the-box infrastructure monitoring for packaged applications, containers, and operating systems. + +We also provide paid subscriptions that are designed to provide additional features and capabilities for businesses that need tighter and customizable integration of the free monitoring solution to their processes. These are divided into three different plans: **Pro**, **Business**, and **Enterprise**. Each plan offers a different set of features and capabilities to meet the needs of businesses of different sizes and with different monitoring requirements. + +> ### Note +> To avoid disrupting the access rights of existing space users, we will keep them in the **Early Bird** plan. The reason for this is to allow users to +> keep using the legacy **Member** role with the exact same permissions as it has currently. +> +> If you move from the **Early Bird** plan to a paid plan, you will not be able to return to the **Early Bird** plan again. The **Community** free plan will always be available to you, but it does not allow +> you to invite or change users using the Member role. See more details on our [roles and plans](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md#what-roles-are-available) documentation. 
+ +### Plans + +The plan is an attribute that is directly attached to your space(s) and that dictates what capabilities and customizations you have on your space. If you have different spaces you can have different Netdata plans on them. This gives you flexibility to choose what best fits your needs on each of your spaces. + +Netdata Cloud plans, with the exception of Community, work as subscriptions and overall consist of two pricing components: + +* A flat fee component, that is a price per space, and +* An on-demand metered component, that is related to your usage of Netdata which directly links to the [number of nodes you have running](#running-nodes-and-billing) + +Netdata provides two billing frequency options: + +* Monthly - Pay as you go, where we charge both the flat fee and the on-demand component every month +* Yearly - Annual prepayment, where we charge upfront the flat fee and committed amount related to your estimated usage of Netdata (more details [here](#committed-nodes)) + +For more details on the plans and subscription conditions please check <https://netdata.cloud/pricing>. + +#### Running nodes and billing + +The only dynamic variable we consider for billing is the number of concurrently running nodes or agents. We only charge you for your active running nodes, so we don't count: + +* offline nodes +* stale nodes, nodes that are available to query through a Netdata parent agent but are not actively connecting metrics at the moment + +To ensure we don't overcharge you due to sporadic spikes throughout a month or even at a certain point in a day, we apply two steps: + +* We calculate a daily P90 figure for your running nodes. To achieve that, we take a daily snapshot of your running nodes, and using the node state change events (live, offline) we guarantee that a daily P90 figure is calculated to remove any daily spikes +* On top of the above, we do a running P90 calculation from the start to the end of your billing cycle. 
Even if you have a yearly billing frequency we keep a monthly subscription linked to that to identify any potential overage over your [committed nodes](#committed-nodes). + +#### Committed nodes + +When you subscribe to a Yearly plan you will need to specify the number of nodes that you will commit to. On these nodes, a price discounted by 25% from the plan's original cost per node is applied. This amount will be part of your annual prepayment. + +``` +Node plan discounted price x committed nodes x 12 months +``` + +If, for a given month, your usage is over these committed nodes we will charge the original cost per node for the nodes above the committed number. + +#### Plan changes and credit balance + +It is ok to change your mind. You can change your plan, billing frequency or, on yearly plans, adjust the committed nodes at any time. + +To achieve this you will need to: + +* Move to the Community plan, where we will cancel the current subscription and: + * Issue a credit to you for the unused period, in case you are on a **yearly plan** + * Charge you only for the current used period and issue a credit for the unused period related to the flat fee, in case you are on a **monthly plan** +* Select the new subscription with the change that you want + +> ⚠️ On a move to Community (cancellation of an active subscription), please note that you will have all your notification method configurations active **for a period of 24 hours**. +> After that, any notification methods unavailable in your new plan at that time will be automatically disabled. You can always re-enable them once you move to a paid plan that includes them. + +> ⚠️ Any credit given to you will be available to use on future paid subscriptions with us. It will be available until the **end of the following year**. 
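
The committed-node arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the per-node price of `4.00` is a made-up figure, the function names `committed_prepayment` and `monthly_overage` are hypothetical and not part of any Netdata API, the flat fee per space is left out, and `billed_nodes` stands for whatever node count a billing period ends up using.

```python
from decimal import Decimal

# 25% discount on the per-node price for committed nodes, as stated in the text.
YEARLY_DISCOUNT = Decimal("0.25")


def committed_prepayment(price_per_node: Decimal, committed_nodes: int) -> Decimal:
    """Annual prepayment for committed nodes:
    discounted node price x committed nodes x 12 months."""
    discounted_price = price_per_node * (1 - YEARLY_DISCOUNT)
    return discounted_price * committed_nodes * 12


def monthly_overage(price_per_node: Decimal, committed_nodes: int, billed_nodes: int) -> Decimal:
    """Charge for nodes above the commitment in one month,
    billed at the original (undiscounted) per-node price."""
    extra_nodes = max(0, billed_nodes - committed_nodes)
    return price_per_node * extra_nodes


# Hypothetical inputs: 4.00 per node, 10 committed nodes, 13 nodes billed in a month.
example_price = Decimal("4.00")
prepayment = committed_prepayment(example_price, committed_nodes=10)
overage = monthly_overage(example_price, committed_nodes=10, billed_nodes=13)
```

`Decimal` is used instead of floats so that currency amounts do not pick up binary rounding errors. A month at or below the commitment yields an overage of zero.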
+ +### Areas impacted by plans + +##### Role-Based Access model + +Depending on the plan associated to your space you will have different roles available: + +| **Role** | **Community** | **Pro** | **Business** | **Early Bird** | +| :-- | :--: | :--: | :--: | :--: | +| **Administrators**<p>Users with this role can control Spaces, War Rooms, Nodes, Users and Billing.</p><p>They can also access any War Room in the Space.</p> | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | +| **Managers**<p>Users with this role can manage War Rooms and Users.</p><p>They can access any War Room in the Space.</p> | - | - | :heavy_check_mark: | - | +| **Troubleshooters**<p>Users with this role can use Netdata to troubleshoot, not manage entities.</p><p>They can access any War Room in the Space.</p> | - | :heavy_check_mark: | :heavy_check_mark: | - | +| **Observers**<p>Users with this role can only view data in specific War Rooms.</p>💡 Ideal for restricting your customer's access to their own dedicated rooms.<p></p> | - | - | :heavy_check_mark: | - | +| **Billing**<p>Users with this role can handle billing options and invoices.</p> | - | - | :heavy_check_mark: | - | +| **Member** ⚠️ Legacy role<p>Users with this role can create War Rooms and invite other Members.</p><p>They can only see the War Rooms they belong to and all Nodes in the All Nodes room.</p>| - | - | - | :heavy_check_mark: | + +For more details check the documentation under [Role-Based Access model](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md). + +##### Events feed + +The plan you have subscribed on your space will determine the amount of historical data you will be able to query: + +| **Type of events** | **Community** | **Pro** | **Business** | +| :-- | :-- | :-- | :-- | +| **Auditing events** - COMING SOON<p>Events related to actions done on your Space, e.g. 
invite user, change user role or create room.</p>| 4 hours | 7 days | 90 days | +| **Topology events**<p>Node state transition events, e.g. live or offline.</p>| 4 hours | 7 days | 14 days | +| **Alert events**<p>Alert state transition events, can be seen as an alert history log.</p>| 4 hours | 7 days | 90 days | + +For more details check the documentation under [Events feed](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/events-feed.md). + +##### Notification integrations + +The plan on your space will determine which notification methods will be available to you: + +* **Community** - Email and Discord +* **Pro** - Email, Discord and webhook +* **Business** - Unlimited, this includes Slack, PagerDuty, Opsgenie etc. + +For more details check the documentation under [Alert Notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md). + +### Related Topics + +#### **Related Concepts** + +* [Spaces](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md) +* [Alert Notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md) +* [Events feed](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/events-feed.md) +* [Role-Based Access model](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md) + +#### Related Tasks + +* [View Plan & Billing](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/view-plan-billing.md) diff --git a/docs/cloud/manage/role-based-access.md b/docs/cloud/manage/role-based-access.md new file mode 100644 index 000000000..1696e0964 --- /dev/null +++ b/docs/cloud/manage/role-based-access.md @@ -0,0 +1,136 @@ +# Role-Based Access model + +Netdata Cloud's role-based-access mechanism allows you to control what functionalities in the app users can access. Each user can be assigned only one role, which fully specifies all the capabilities they are afforded. 
+ +## What roles are available? + +With the advent of the paid plans we revamped the roles to cover needs expressed by Netdata users, like providing more limited access to their customers, or +being able to join any room. We also aligned the offered roles to the target audience of each plan. The end result is the following: + +| **Role** | **Community** | **Pro** | **Business** | **Early Bird** | +| :-- | :--: | :--: | :--: | :--: | +| **Administrators**<p>Users with this role can control Spaces, War Rooms, Nodes, Users and Billing.</p><p>They can also access any War Room in the Space.</p> | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | +| **Managers**<p>Users with this role can manage War Rooms and Users.</p><p>They can access any War Room in the Space.</p> | - | - | :heavy_check_mark: | - | +| **Troubleshooters**<p>Users with this role can use Netdata to troubleshoot, not manage entities.</p><p>They can access any War Room in the Space.</p> | - | :heavy_check_mark: | :heavy_check_mark: | - | +| **Observers**<p>Users with this role can only view data in specific War Rooms.</p>💡 Ideal for restricting your customer's access to their own dedicated rooms.<p></p> | - | - | :heavy_check_mark: | - | +| **Billing**<p>Users with this role can handle billing options and invoices.</p> | - | - | :heavy_check_mark: | - | +| **Member** ⚠️ Legacy role<p>Users with this role can create War Rooms and invite other Members.</p><p>They can only see the War Rooms they belong to and all Nodes in the All Nodes room.</p>| - | - | - | :heavy_check_mark: | + +## What happens to the previous Member role? + +We will maintain an Early Bird plan for existing users, which will continue to provide access to the Member role. + +## Which functionalities are available for each role? + +The following tables show in more detail which functionalities are available for each role in each domain. 
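
Before the per-domain tables, the plan-to-role table above can be summarized as a small lookup. This is a sketch for illustration only: the dictionary `ROLES_BY_PLAN` and the helper `role_available` are hypothetical names, not part of Netdata Cloud, and the data is transcribed from the "What roles are available?" table with role names in the singular.

```python
# Roles offered by each plan, transcribed from the table above.
ROLES_BY_PLAN = {
    "Community": {"Administrator"},
    "Pro": {"Administrator", "Troubleshooter"},
    "Business": {"Administrator", "Manager", "Troubleshooter", "Observer", "Billing"},
    "Early Bird": {"Administrator", "Member"},
}


def role_available(plan: str, role: str) -> bool:
    """Return True if the given role can be assigned on the given plan.

    Unknown plans are treated as offering no roles.
    """
    return role in ROLES_BY_PLAN.get(plan, set())


# Example checks against the table.
pro_has_troubleshooter = role_available("Pro", "Troubleshooter")
community_has_manager = role_available("Community", "Manager")
```

A set per plan keeps the membership test simple and mirrors the rule that each user holds exactly one role.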
+ +### Space Management + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | +| See Space | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | +| Leave Space | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | +| Delete Space | :heavy_check_mark: | - | - | - | - | - | +| Change name | :heavy_check_mark: | - | - | - | - | - | +| Change description | :heavy_check_mark: | - | - | - | - | - | + +### Node Management + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | Notes | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :-- | +| See all Nodes in Space (_All Nodes_ room) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | - | :heavy_check_mark: | Members are always on the _All Nodes_ room | +| Connect Node to Space | :heavy_check_mark: | - | - | - | - | - | - | +| Delete Node from Space | :heavy_check_mark: | - | - | - | - | - | - | + +### User Management + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | Notes | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :-- | +| See all Users in Space | :heavy_check_mark: | :heavy_check_mark: | - | - | - | :heavy_check_mark: | | +| Invite new User to Space | :heavy_check_mark: | :heavy_check_mark: | - | - | - | :heavy_check_mark: | You can't invite a user with a role you don't have permissions to appoint to (see below) | +| Delete Pending Invitation to Space | :heavy_check_mark: | :heavy_check_mark: | - | - | - | :heavy_check_mark: | | +| Delete User from Space | :heavy_check_mark: | :heavy_check_mark: | - | - | - | - | You can't delete a user if he has a role you don't have permissions to appoint to 
(see below) | +| Appoint Administrators | :heavy_check_mark: | - | - | - | - | - | | +| Appoint Billing user | :heavy_check_mark: | - | - | - | - | - | | +| Appoint Managers | :heavy_check_mark: | :heavy_check_mark: | - | - | - | - | | +| Appoint Troubleshooters | :heavy_check_mark: | :heavy_check_mark: | - | - | - | - | | +| Appoint Observer | :heavy_check_mark: | :heavy_check_mark: | - | - | - | - | | +| Appoint Member | :heavy_check_mark: | - | - | - | - | :heavy_check_mark: | Only available on Early Bird plans | +| See all Users in a Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | | +| Invite existing user to Room | :heavy_check_mark: | :heavy_check_mark: | - | - | - | :heavy_check_mark: | User already invited to the Space | +| Remove user from Room | :heavy_check_mark: | :heavy_check_mark: | - | - | - | - | | + +### Room Management + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | Notes | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :-- | +| See all Rooms in a Space | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | - | - | | +| Join any Room in a Space | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | - | - | By joining a room you will be enabled to get notifications from nodes on that room | +| Leave Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | | +| Create a new Room in a Space | :heavy_check_mark: | :heavy_check_mark: | - | - | - | :heavy_check_mark: | | +| Delete Room | :heavy_check_mark: | :heavy_check_mark: | - | - | - | - | | +| Change Room name | :heavy_check_mark: | :heavy_check_mark: | - | - | - | :heavy_check_mark: | If not the _All Nodes_ room | +| Change Room description | :heavy_check_mark: | :heavy_check_mark: | - | - | - | :heavy_check_mark: | | +| Add existing Nodes to Room | :heavy_check_mark: 
| :heavy_check_mark: | - | - | - | :heavy_check_mark: | Node already connected to the Space | +| Remove Nodes from Room | :heavy_check_mark: | :heavy_check_mark: | - | - | - | :heavy_check_mark: | | + +### Notifications Management + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | Notes | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :-- | +| See all configured notifications on a Space | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | | +| Add new configuration | :heavy_check_mark: | - | - | - | - | - | | +| Enable/Disable configuration | :heavy_check_mark: | - | - | - | - | - | | +| Edit configuration | :heavy_check_mark: | - | - | - | - | - | Some exceptions apply depending on [service level](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md#available-actions-per-notification-methods-based-on-service-level) | +| Delete configuration | :heavy_check_mark: | - | - | - | - | - | | +| Edit personal level notification settings | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | [Manage user notification settings](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/manage-notification-methods.md#manage-user-notification-settings) | + +Notes: +* Enable, Edit and Add actions over specific notification methods will only be allowed if your plan has access to those ([service classification](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md#service-classification)) + +### Dashboards + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | +| See all dashboards in Room | :heavy_check_mark: | :heavy_check_mark: | 
:heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | +| Add new dashboard to Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | +| Edit any dashboard in Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | - | :heavy_check_mark: | +| Edit own dashboard in Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | +| Delete any dashboard in Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | - | :heavy_check_mark: | +| Delete own dashboard in Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | + +### Functions + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | Notes | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :-- | +| See all functions in Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | +| Run any function in Room | :heavy_check_mark: | :heavy_check_mark: | - | - | - | - | +| Run read-only function in Room | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | | +| Run sensitive function in Room | :heavy_check_mark: | :heavy_check_mark: | - | - | - | - | There isn't any function on this category yet, so subject to change. 
| + +### Events feed + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | Notes | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :-- | +| See Alert or Topology events | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | | +| See Auditing events | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | - | These are coming soon, not currently available | + +### Billing + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | Notes | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :-- | +| See Plan & Billing details | :heavy_check_mark: | - | - | - | :heavy_check_mark: | - | Current plan and usage figures | +| Update plans | :heavy_check_mark: | - | - | - | - | - | This includes cancelling current plan (going to Community plan) | +| See invoices | :heavy_check_mark: | - | - | - | :heavy_check_mark: | - | | +| Manage payment methods | :heavy_check_mark: | - | - | - | :heavy_check_mark: | - | | +| Update billing email | :heavy_check_mark: | - | - | - | :heavy_check_mark: | - | | + +### Other permissions + +| **Functionality** | **Administrator** | **Manager** | **Troubleshooter** | **Observer** | **Billing** | **Member** | +| :-- | :--: | :--: | :--: | :--: | :--: | :--: | +| See Bookmarks in Space | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | +| Add Bookmark to Space | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | - | :heavy_check_mark: | +| Delete Bookmark from Space | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | - | :heavy_check_mark: | +| See Visited Nodes | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | +| Update Visited Nodes | :heavy_check_mark: | :heavy_check_mark: | 
:heavy_check_mark: | :heavy_check_mark: | - | :heavy_check_mark: | diff --git a/docs/cloud/manage/sign-in.mdx b/docs/cloud/manage/sign-in.md index 32fcb22e7..96275f573 100644 --- a/docs/cloud/manage/sign-in.mdx +++ b/docs/cloud/manage/sign-in.md @@ -1,12 +1,6 @@ ---- -title: "Sign in with email, Google, or GitHub" -description: "Learn how signing in to Cloud works via one of our three authentication methods, plus some tips if you're having trouble signing in." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.mdx" -sidebar_label: "Sign in with email, Google, or GitHub" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" ---- +# Sign in to Netdata + +This page explains how to sign in to Netdata with your email, Google account, or GitHub account, and provides some tips if you're having trouble signing in. You can [sign in to Netdata](https://app.netdata.cloud/sign-in?cloudRoute=spaces?utm_source=docs&utm_content=sign_in_button_first_section) through one of three methods: email, Google, or GitHub. Email uses a time-sensitive link that authenticates your browser, and Google/GitHub both use OAuth to associate your email address @@ -15,7 +9,6 @@ with a Netdata Cloud account. No matter the method, your Netdata Cloud account is based around your email address. Netdata Cloud does not store passwords. - ## Email To sign in with email, visit [Netdata Cloud](https://app.netdata.cloud/sign-in?cloudRoute=spaces?utm_source=docs&utm_content=sign_in_button_email_section), enter your email address, and click @@ -35,22 +28,27 @@ If you don't have a Netdata Cloud account yet you won't need to worry about it. After your account is created and you sign in to Netdata, you first are asked to agree to Netdata Cloud's [Privacy Policy](https://www.netdata.cloud/privacy/) and [Terms of Use](https://www.netdata.cloud/terms/). 
Once you agree with these you are directed through the Netdata Cloud onboarding process, which is explained in the [Netdata Cloud -quickstart](https://github.com/netdata/netdata/blob/master/docs/cloud/get-started.mdx). +quickstart](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md). ### Troubleshooting -You should receive your sign in email in less than a minute. The subject is **Verify your email!** and the sender is `no-reply@app.netdata.cloud` via `sendgrid.net`. +You should receive your sign in email in less than a minute. The subject is **Verify your email!** for new sign-ups, **Sign in to Netdata** for sign ins. +The sender is `no-reply@netdata.cloud` via `sendgrid.net`. If you don't see the email, try the following: -- Check [Netdata Cloud status](https://status.netdata.cloud) for ongoing issues with our infrastructure. -- Request another sign in email via the [sign in page](https://app.netdata.cloud/sign-in?cloudRoute=spaces?utm_source=docs&utm_content=sign_in_button_troubleshooting_section). -- Check your spam folder. -- In Gmail, check the **Updates** category. +- Check your spam folder. +- In Gmail, check the **Updates** category. +- Check [Netdata Cloud status](https://status.netdata.cloud) for ongoing issues with our infrastructure. +- Request another sign in email via the [sign in page](https://app.netdata.cloud/sign-in?cloudRoute=spaces?utm_source=docs&utm_content=sign_in_button_troubleshooting_section). -You may also want to add `no-reply@app.netdata.cloud` to your address book or contacts list, especially if you're using +You may also want to add `no-reply@netdata.cloud` to your address book or contacts list, especially if you're using a public email service, such as Gmail. You may also want to whitelist/allowlist either the specific email or the entire -`app.netdata.cloud` domain. +`netdata.cloud` domain. 
+ +In some cases, temporary issues with your mail server or email account may result in your email address being added to a Bounce list by Sendgrid. +If you are added to that list, no Netdata cloud email can reach you, including alarm notifications. Let us know in Discord that you have trouble receiving +any email from us and someone will ask you to provide your email address privately, so we can check if you are on the Bounce list. ## Google and GitHub OAuth @@ -59,7 +57,7 @@ receives via OAuth. To sign in with Google or GitHub OAuth, visit [Netdata Cloud](https://app.netdata.cloud/sign-in?cloudRoute=spaces?utm_source=docs&utm_content=sign_in_button_google_github_section) and click the **Continue with Google/GitHub** or button. Enter your Google/GitHub username and your password. Complete two-factor -authentication if you or your organization has it enabled. +authentication if you or your organization has it enabled. You are then signed in to Netdata Cloud or directed to the new-user onboarding if you have not signed up previously. @@ -81,8 +79,3 @@ with `user2@example.com`, Netdata Cloud creates a new account and begins the onb It is not currently possible to link an account created with `user@example.com` to a Google account associated with `user2@example.com`. - -## What's next? - -If you haven't already onboarded to Netdata Cloud and connected your first nodes, visit -the [get started guide](https://github.com/netdata/netdata/blob/master/docs/cloud/get-started.mdx). diff --git a/docs/cloud/manage/themes.md b/docs/cloud/manage/themes.md index 11d5cb32f..aaf193a87 100644 --- a/docs/cloud/manage/themes.md +++ b/docs/cloud/manage/themes.md @@ -1,12 +1,4 @@ ---- -title: "Choose your Netdata Cloud theme" -description: "Switch between Light and Dark themes in Netdata Cloud to match your personal visualization preferences." 
-custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/manage/themes.md" -sidebar_label: "Choose your Netdata Cloud theme" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" ---- +# Choose your Netdata Cloud theme The Dark theme is the default for all new Netdata Cloud accounts. diff --git a/docs/cloud/manage/view-plan-billing.md b/docs/cloud/manage/view-plan-billing.md new file mode 100644 index 000000000..d29f93f98 --- /dev/null +++ b/docs/cloud/manage/view-plan-billing.md @@ -0,0 +1,92 @@ +# View Plan & Billing + +From the Cloud interface, you can view and manage your space's plan and billing settings, and see the space's usage in terms of running nodes. + +To view and manage some specific settings, related to billing options and invoices, you'll be redirected to our billing provider Customer Portal. + +## Prerequisites + +To see your plan and billing settings you need: + +- A Cloud account +- Access to the space as an Administrator or Billing user + +## Steps + +1. Click on the **Space settings** cog (located above your profile icon) +1. Click on the **Plan & Billing** tab +1. On this page you will be presented with information on your current plan, billing settings, and usage information: + 1. At the top of the page you will see: + - **Credit** amount which refers to any amount you have available to use on future invoices or subscription changes (<https://github.com/netdata/netdata/blob/master/docs/cloud/manage/plans.md#plan-changes-and-credit-balance>) - this is displayed once you have had an active paid subscription with us + - **Billing email** the email that was specified to be linked to the plan subscription. This is where invoices, payment, and subscription-related notifications will be sent. + - **Billing options and Invoices** is the link to our billing provider Customer Portal where you will be able to: + - See the current subscription. 
There will always be 2 subscriptions active for the two pricing components mentioned on [Netdata Plans documentation page](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/plans.md#plans) + - Directly change the payment method associated with current subscriptions + - View, add, delete or change your default payment methods + - View or change your Billing information: + - Billing email + - Address + - Phone number + - Tax ID + - View your invoice history + 1. In the middle, you'll see details on your current plan as well as means to: + - Upgrade or cancel your plan + - View full plan details page + 1. At the bottom, you will find your Usage chart that displays: + - Daily count: The weighted 90th percentile of the live node count during the day, taking time as the weight. If you have 30 live nodes throughout the day + except for a two-hour peak of 44 live nodes, the daily value is 31. + - Period count: The 90th percentile of the daily counts for this period up to the date. The last value for the period is used as the number of nodes for the bill for that period. See more details in [running nodes and billing](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/plans.md#running-nodes-and-billing) (only applicable if you are on a paid plan subscription) + - Committed nodes: The number of nodes committed to in the yearly plan. In case the period count is higher than the number of committed nodes, the difference is billed as overage. + +> ⚠️ At the moment, any change to an active paid plan (an upgrade, a change of billing frequency or of committed nodes) is a manual two-step flow: +> +> 1. cancel your current subscription, which moves you to the Community plan +> 2. choose the plan with the intended changes +> +> This is a temporary process that we aim to sort out soon so that it will be effortless for you to do any of these actions. + +## FAQ + +### 1. What Payment Methods are accepted? + +You can easily pay online via most major Credit/Debit Cards. 
More payment options are expected to become available in the near future. + +### 2. What happens if a renewal payment fails? + +After an initial failed payment, we will attempt to process your payment every week for the next 15 days. After three failed attempts your Space will be moved to the **Community** plan (free forever). + +For the next 24 hours, you will be able to use all your current notification method configurations. After 24 hours, any of the notification method configurations that aren't available on your space's plan will be automatically disabled. + +Cancellation might affect users in your Space. Please check what roles are available on the [Community plan](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/plans.md#areas-impacted-by-plans). Users with unavailable roles on the Community plan will immediately have restricted access to the Space. + +### 3. Which currencies do you support? + +We currently accept payments only in US Dollars (USD). We currently have plans to also accept payments in Euros (EUR), but do not currently have an estimate for when such support will be available. + +### 4. Can I get a refund? How? + +Payments for Netdata subscriptions are refundable **only** if you cancel your subscription within 14 days of purchase. The refund will be credited to the Credit/Debit Card used for making the purchase. To request a refund, please email us at [billing@netdata.cloud](mailto:billing@netdata.cloud). + +### 5. How do I cancel my paid Plan? + +Your annual or monthly Netdata Subscription plan will automatically renew until you cancel it. You can cancel your paid plan at any time by clicking ‘Cancel Plan’ from the **Plan & Billing** section under settings. You can also cancel your paid Plan by clicking the _Select_ button under **Community** plan in the **Plan & Billing** Section under Settings. + +### 6. How can I access my Invoices/Receipts after I paid for a Plan? 
+ +You can visit the _Billing Options & Invoices_ in the **Plan & Billing** section under settings in your Netdata Space where you can find all your Invoicing history. + +### 7. Why do I see two separate Invoices? + +Every time you purchase or renew a Plan, two separate Invoices are generated: + +- One Invoice includes the recurring fees of the Plan you have chosen + +- The other Invoice includes your monthly “On Demand - Usage”. + + Right after the activation of your subscription, you will receive a zero value Invoice since you had no usage when you subscribed. + + On the following month you will receive an Invoice based on your monthly usage. + +You can find some further details on the [Netdata Plans page](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/plans.md#plans). + +> ⚠️ We expect this to change to a single invoice in the future, but currently do not have a concrete timeline for when this change will happen. diff --git a/docs/cloud/netdata-functions.md b/docs/cloud/netdata-functions.md index e1b9dd0b1..9fcf732cb 100644 --- a/docs/cloud/netdata-functions.md +++ b/docs/cloud/netdata-functions.md @@ -9,6 +9,8 @@ learn_rel_path: "Concepts" learn_docs_purpose: "Present the Netdata Functions what these are and why they should be used." --> +# Netdata Functions + Netdata Agent collectors are able to expose functions that can be executed in run-time and on-demand. These will be executed on the node - host where the function is made available. 
diff --git a/docs/cloud/runtime-troubleshooting-with-functions.md b/docs/cloud/runtime-troubleshooting-with-functions.md index 3800ea20d..839b8c9ed 100644 --- a/docs/cloud/runtime-troubleshooting-with-functions.md +++ b/docs/cloud/runtime-troubleshooting-with-functions.md @@ -1,13 +1,4 @@ -<!-- -title: "Run-time troubleshooting with Functions" -sidebar_label: "Run-time troubleshooting with Functions" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/runtime-troubleshooting-with-functions.md" -learn_status: "Published" -sidebar_position: "4" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" -learn_docs_purpose: "Instructions on how to use Functions" ---> +# Run-time troubleshooting with Functions Netdata Functions feature allows you to execute on-demand a pre-defined routine on a node where a Netdata Agent is running. These routines are exposed by a given collector. These routines can be used to retrieve additional information to help you troubleshoot or to trigger some action to happen on the node itself. @@ -19,14 +10,14 @@ The following is required to be able to run Functions from Netdata Cloud. * At least one of the nodes claimed to your Space should be on a Netdata agent version higher than `v1.37.1` * Ensure that the node has the collector that exposes the function you want enabled ([see current available functions](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md#what-functions-are-currently-available)) -### Execute a function (from functions view) +### Execute a function (from the Functions tab) 1. From the right-hand bar select the **Function** you want to run 2. Still on the right-hand bar select the **Node** where you want to run it 3. Results will be displayed in the central area for you to interact with 4. 
Additional filtering capabilities, depending on the function, should be available on right-hand bar -### Execute a function (from Nodes view) +### Execute a function (from the Nodes tab) 1. Click on the functions icon for a node that has this active 2. You are directed to the **Functions** tab diff --git a/docs/cloud/spaces.md b/docs/cloud/spaces.md index 31d8a47ae..2a275c14c 100644 --- a/docs/cloud/spaces.md +++ b/docs/cloud/spaces.md @@ -1,14 +1,6 @@ ---- -title: "Spaces" -description: >- - "Organize your infrastructure monitoring on Netdata Cloud by creating Spaces, then groupingyour - Agent-monitored nodes." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md" -sidebar_label: "Spaces" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" ---- +# Netdata Cloud Spaces + +Organize your multi-organization infrastructure monitoring on Netdata Cloud by creating Spaces to completely isolate access to your Agent-monitored nodes. A Space is a high-level container. It's a collaboration space where you can organize team members, access levels and the nodes you want to monitor. @@ -70,8 +62,9 @@ will open a side tab in which you can: 6. _Manage your bookmarks*_, click on the **Bookmarks** tab to add or remove bookmarks that you need. -:::note \* This action requires admin rights for this space -::: +> ### Note +> +> \* This action requires admin rights for this space ## Obsoleting offline nodes from a Space @@ -84,8 +77,3 @@ Netdata admin users now have the ability to remove obsolete nodes from a space. - If the obsoleted nodes eventually become live or online once more they will be automatically re-added to the space ![Obsoleting an offline node](https://user-images.githubusercontent.com/24860547/173087202-70abfd2d-f0eb-4959-bd0f-74aeee2a2a5a.gif) - -## What's next? 
- -Once you configured your Spaces, it's time to set up -your [War Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md). diff --git a/docs/cloud/visualize/dashboards.md b/docs/cloud/visualize/dashboards.md index 3c6d7ffd5..a9376db17 100644 --- a/docs/cloud/visualize/dashboards.md +++ b/docs/cloud/visualize/dashboards.md @@ -1,14 +1,4 @@ ---- -title: "Build new dashboards" -description: >- - "Design new dashboards that target your infrastructure's unique needs and share them with your team for - targeted visual anomaly detection or incident response." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md" -sidebar_label: "Build new dashboards" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations/Visualizations" ---- +# Build new dashboards With Netdata Cloud, you can build new dashboards that target your infrastructure's unique needs. Put key metrics from any number of distributed systems in one place for a bird's eye view of your infrastructure. @@ -25,7 +15,7 @@ dashboards](https://user-images.githubusercontent.com/1153921/108529360-a2145d00 In the modal, give your new dashboard a name, and click **+ Add**. Click the **Add Chart** button to add your first chart card. From the dropdown, select either *All Nodes** or a specific -node. If you select **All Nodes**, you will add a [composite chart](/docs/cloud/visualize/overview#composite-charts) to +node. If you select **All Nodes**, you will add a [composite chart](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) to your new dashboard. Next, select the context. You'll see a preview of the chart before you finish adding it. The **Add Text** button creates a new card with user-defined text, which you can use to describe or document a @@ -44,12 +34,11 @@ of any number of **cards**, which can contain charts or text. 
### Chart cards Click the **Add Chart** button to add your first chart card. From the dropdown, select either *All Nodes** or a specific -node. If you select **All Nodes**, you will add a [composite chart](/docs/cloud/visualize/overview#composite-charts) to +node. If you select **All Nodes**, you will add a [composite chart](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) to your new dashboard. Next, select the context. You'll see a preview of the chart before you finish adding it. The charts you add to any dashboard are fully interactive, just like the charts in an Agent dashboard or a single node's -dashboard in Cloud. Zoom in and out, highlight timeframes, and more. See our -[Agent dashboard docs](https://learn.netdata.cloud/docs/agent/web#using-charts) for all the shortcuts. +dashboard in Cloud. Zoom in and out, highlight timeframes, and more. Charts also synchronize as you interact with them, even across contexts _or_ nodes. @@ -81,7 +70,7 @@ dashboards. ## Pin dashboards Click on the **Pin** button in any dashboard to put those charts into a separate panel at the bottom of the screen. You -can now navigate through Netdata Cloud freely, individual Cloud dashboards, the Nodes view, different War Rooms, or even +can now navigate through Netdata Cloud freely, individual Cloud dashboards, the Nodes tab, different War Rooms, or even different Spaces, and have those valuable metrics follow you. Pinning dashboards helps you correlate potentially related charts across your infrastructure, no matter how you diff --git a/docs/cloud/visualize/interact-new-charts.md b/docs/cloud/visualize/interact-new-charts.md index 4b33fe85f..4c6c2ebf5 100644 --- a/docs/cloud/visualize/interact-new-charts.md +++ b/docs/cloud/visualize/interact-new-charts.md @@ -1,20 +1,6 @@ ---- -title: "Interact with charts" -description: >- - "Learn how to get the most out of Netdata's charts. 
These charts will help you make sense of all the - metrics at your disposal, helping you troubleshoot with real-time, per-second metric data" -type: "how-to" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md" -sidebar_label: "Interact with charts" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Operations/Visualizations" ---- - -> ⚠️ This new version of charts is currently **only** available on Netdata Cloud. We didn't want to keep this valuable -> feature from you, so after we get this into your hands on the Cloud, we will collect and implement your feedback. -> Together, we will be able to provide the best possible version of charts on the Netdata Agent dashboard, as quickly as -> possible. +# Interact with charts + +Learn how to use Netdata's powerful charts to troubleshoot with real-time, per-second metric data. Netdata excels in collecting, storing, and organizing metrics in out-of-the-box dashboards. To make sense of all the metrics, Netdata offers an enhanced version of charts that update every second. @@ -33,39 +19,40 @@ These charts provide a lot of useful information, so that you can: - View information about the chart, its plugin, context, and type - Get the chart status and possible errors. On top, reload functionality -These charts will available -on [Overview tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md), Single Node view and +These charts are available on Netdata Cloud's +[Overview tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md), Single Node tab and on your [Custom Dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md). +Some of the features listed below are also available on the simpler charts that are available on each agent's user interface. 
+ ## Overview Have a look at the overall look and feel of the charts, shown here with both a composite chart from the [Overview tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) and a simple chart -from the single node view: +from the Single Node tab: -![NRve6zr325.gif](https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/5ecaf5ec-1229-480e-b122-62f63e9df227) +<img width="678" alt="image" src="https://user-images.githubusercontent.com/43294513/220913360-f3f2ac06-b715-4e99-a933-f3bcb776636f.png"/> With a quick glance you have immediate information available at your disposal: - Chart title and units +- Definition bar - Action bars - Chart area - Legend with dimensions ## Play, Pause and Reset -Your charts are controlled using the -available [Time controls](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx#time-controls). +Your charts are controlled using the available +[Time controls](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.md#time-controls). Besides these, when interacting with the chart you can also activate these controls by: -- hovering over any chart to temporarily pause it - this momentarily switches time control to Pause, so that you can +- Hovering over any chart to temporarily pause it - this momentarily switches time control to Pause, so that you can hover over a specific timeframe. When moving out of the chart time control will go back to Play (if it was its previous state) -- clicking on the chart to lock it - this enables the Pause option on the time controls, to the current timeframe. This +- Clicking on the chart to lock it - this enables the Pause option on the time controls, to the current timeframe. This is if you want to jump to a different chart to look for possible correlations. 
-- double clicking to release a previously locked chart - move the time control back to Play - - ![23CHKCPnnJ.gif](https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/0b1e111e-df44-4d92-b2e3-be5cfd9db8df) +- Double clicking to release a previously locked chart - move the time control back to Play | Interaction | Keyboard/mouse | Touchpad/touchscreen | Time control | |:------------------|:---------------|:---------------------|:----------------------| @@ -84,7 +71,7 @@ from the chart title to a chart action bar. The elements that you can find on this top bar are: - Netdata icon: this indicates that data is continuously being updated, this happens - if [Time controls](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx#time-controls) + if [Time controls](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.md#time-controls) are in Play or Force Play mode - Chart status icon: indicates the status of the chart. 
Possible values are: Loading, Timeout, Error or No data - Chart title: on the chart title you can see the title together with the metric being displayed, as well as the unit of @@ -92,9 +79,147 @@ The elements that you can find on this top bar are: - Chart action bar: here you'll have access to chart info, change chart types, enables fullscreen mode, and the ability to add the chart to a custom dashboard -![image.png](https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/c8f5f0bd-5f84-4812-970b-0e4340f4773b) +![image](https://user-images.githubusercontent.com/70198089/222689197-f9506ca7-a869-40a9-871f-8c4e1fa4b927.png) + + +## Definition bar + +Each composite chart has a definition bar to provide information about the following: + +* Grouping option +* Aggregate function to be applied in case multiple data sources exist +* Instances +* Nodes +* Dimensions, and +* Aggregate function over time to be applied if one point in the chart consists of multiple data points aggregated + +### Group by dimension, node, or chart + +Click on the **dimension** dropdown to change how a composite chart groups metrics. + +The default option is by _dimension_, so that each line/area in the visualization is the aggregation of a single +dimension. +This provides a per dimension view of the data from all the nodes in the War Room, taking into account filtering +criteria if defined. + +A composite chart grouped by _node_ visualizes a single metric across contributing nodes. If the composite chart has +five +contributing nodes, there will be five lines/areas. This is typically an absolute value of the sum of the dimensions +over each node but there +are some opinionated-but-valuable exceptions where a specific dimension is selected. +Grouping by nodes allows you to quickly understand which nodes in your infrastructure are experiencing anomalous +behavior. 
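The difference between grouping by _dimension_ and grouping by _node_ described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample tuples, the function names, and the choice of averaging per dimension are hypothetical and are not taken from Netdata's query engine.

```python
from collections import defaultdict

# Hypothetical samples for one context at one point in time:
# (node, dimension, value). Names and values are illustrative only.
samples = [
    ("node-a", "user", 10.0),
    ("node-a", "system", 4.0),
    ("node-b", "user", 20.0),
    ("node-b", "system", 6.0),
]


def group_by_dimension(rows):
    """One series per dimension: here, the average across contributing nodes."""
    buckets = defaultdict(list)
    for _node, dimension, value in rows:
        buckets[dimension].append(value)
    return {dim: sum(vals) / len(vals) for dim, vals in buckets.items()}


def group_by_node(rows):
    """One series per node: the absolute value of the sum of its dimensions."""
    totals = defaultdict(float)
    for node, _dimension, value in rows:
        totals[node] += value
    return {node: abs(total) for node, total in totals.items()}


by_dimension = group_by_dimension(samples)
by_node = group_by_node(samples)
```

With two contributing nodes, grouping by node yields two series, while grouping by dimension yields one series per dimension regardless of how many nodes contribute.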
+ +A composite chart grouped by _instance_ visualizes each instance of a software or hardware component on a node and displays +each as a separate dimension. By grouping the +`disk.io` chart by _instance_, you can visualize the activity of each disk on each node that contributes to the +composite +chart. + +Another very pertinent example is composite charts over contexts related to cgroups (VMs and containers). You have the +means to change the default group by or apply filtering to +get a better view into what data you are trying to analyze. For example, if you change the group by to _instance_ you +get a view with the data of all the instances (cgroups) that +contribute to that chart. Then you can use further filtering tools to focus on the data that is important to you and even +save the result to your own dashboards. + +![image](https://user-images.githubusercontent.com/82235632/201902017-04b76701-0ff9-4498-aa9b-6d507b567bea.png) + +### Aggregate functions over data sources + +Each chart uses an opinionated-but-valuable default aggregate function over the data sources. For example, +the `system.cpu` chart shows the +average for each dimension from every contributing chart, while the `net.net` chart shows the sum for each dimension +from every contributing chart, which can also come from multiple networking interfaces. + +The following aggregate functions are available for each selected dimension: + +- **Average**: Displays the average value from contributing nodes. If a composite chart has 5 nodes with the following + values for the `out` dimension (`-2.1`, `-5.5`, `-10.2`, `-15`, `-0.1`), the composite chart displays a + value of `-6.58`. +- **Sum**: Displays the sum of contributed values. Using the same nodes, dimension, and values as above, the composite + chart displays a metric value of `-32.9`. +- **Min**: Displays a minimum value. For dimensions with positive values, the min is the value closest to zero. 
For + charts with negative values, the min is the value with the largest magnitude. +- **Max**: Displays a maximum value. For dimensions with positive values, the max is the value with the largest + magnitude. For charts with negative values, the max is the value closest to zero. + +### Dimensions + +Select which dimensions to display on the composite chart. You can choose **All dimensions**, a single dimension, or any +number of dimensions available on that context. + +### Instances + +Click on **X Instances** to display a dropdown of instances and nodes contributing to that composite chart. Each line in +the dropdown displays an instance name and the associated node's hostname. + +### Nodes + +Click on **X Nodes** to display a dropdown of nodes contributing to that composite chart. Each line displays a hostname +to help you identify which nodes contribute to a chart. You can also use this component to filter nodes directly on the +chart. + +If one or more nodes can't contribute to a given chart, the definition bar shows a warning symbol plus the number of +affected nodes, then lists them in the dropdown along with the associated error. Nodes might return errors because of +networking issues, a stopped `netdata` service, or because that node does not have any metrics for that context. -### Chart action bar +### Aggregate functions over time + +When the granularity of the data collected is higher than the plotted points on the chart, an aggregation function over +time +is applied. 
By default the aggregation applied is _average_ but the user can choose different options from the +following: + +* Min +* Max +* Average +* Sum +* Incremental sum (Delta) +* Standard deviation +* Median +* Single exponential smoothing +* Double exponential smoothing +* Coefficient of variation +* Trimmed Median `*` +* Trimmed Mean `*` +* Percentile `**` + +> ### Info +> +> - `*` For **Trimmed Median and Mean** you can choose the percentage of data that you want to focus on: 1%, 2%, 3%, 5%, 10%, 15%, 20% and 25%. +> - `**` For **Percentile** you can specify the percentile you want to focus on: 25th, 50th, 75th, 80th, 90th, 95th, 97th, 98th and 99th. + +For more details on each, you can refer to our Agent's HTTP API details +on [Data Queries - Data Grouping](https://github.com/netdata/netdata/blob/master/web/api/queries/README.md#data-grouping). + +### Reset to defaults + +Click on the 3-dot icon (**⋮**) on any chart, then **Reset to Defaults**, to reset the definition bar to its initial +state. + +## Jump to single-node dashboards + +Click on **X Charts**/**X Nodes** to display one of the two dropdowns that list the charts and nodes contributing to a +given composite chart. For example, the nodes dropdown. + +![The nodes dropdown in a composite chart](https://user-images.githubusercontent.com/1153921/99305049-7c019b80-2810-11eb-942a-8ebfcf236b7f.png) + +To jump to a single-node dashboard, click on the link icon +<img class="img__inline img__inline--link" src="https://user-images.githubusercontent.com/1153921/95762109-1d219300-0c62-11eb-8daa-9ba509a8e71c.png" /> next to the +node you're interested in. + +The single-node dashboard opens in a new tab. From there, you can continue to troubleshoot or run +[Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) for faster root +cause analysis. + +## Add composite charts to a dashboard + +Click on the 3-dot icon (**⋮**) on any chart, then click on **Add to Dashboard**. 
Click the **+** button for any +dashboard you'd like to add this composite chart to, or create a new dashboard and initiate it with your chosen chart by +entering the name and clicking **New Dashboard**. + +## Chart action bar On this bar you have access to immediate actions over the chart, the available actions are: @@ -104,7 +229,8 @@ On this bar you have access to immediate actions over the chart, the available a - Add chart to dashboard: This allows you to add the chart to an existing custom dashboard or directly create a new one that includes the chart. -<img src="https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/65ac4fc8-3d8d-4617-8234-dbb9b31b4264" width="40%" height="40%" /> +<img src="https://user-images.githubusercontent.com/70198089/222689501-4116f5fe-e447-4359-83b5-62dadb33f4ef.png" width="40%" height="40%" /> + ## Exploration action bar @@ -116,7 +242,7 @@ available actions that you can see are: - Horizontal and Vertical zooms - In-context zoom in and out -<img src="https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/0417ad66-fcf6-42d5-9a24-e9392ec51f87" width="40%" height="40%" /> +<img src="https://user-images.githubusercontent.com/70198089/222689556-58ad77bc-924f-4c3f-b38b-fc63de2f5773.png" width="40%" height="40%" /> ### Pan @@ -129,24 +255,13 @@ it like pushing the current timeframe off the screen to see what came before or ### Highlight -Selecting timeframes is useful when you see an interesting spike or change in a chart and want to investigate further, -from looking at the same period of time on other charts/sections or triggering actions to help you troubleshoot with an -in-context action bar to help you troubleshoot (currently only available on -Single Node view). 
The available actions: - -- - -run [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) - -- zoom in on the selected timeframe - -[Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) -will only be available if you respect the timeframe selection limitations. The selected duration pill together with the -button state helps visualize this. +Selecting timeframes is useful when you see an interesting spike or change in a chart and want to investigate further by: -<img src="https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/2ffc157d-0f0f-402e-80bb-5ffa8a2091d5" width="50%" height="50%" /> +- Looking at the same period of time on other charts/sections +- Running [metric correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) + to filter metrics that also show something different in the selected period, vs the previous one -<p/> +<img alt="image" src="https://user-images.githubusercontent.com/43294513/221365853-1142944a-ace5-484a-a108-a205d050c594.png" /> | Interaction | Keyboard/mouse | Touchpad/touchscreen | |:-----------------------------------|:---------------------------------------------------------|:---------------------| @@ -160,10 +275,11 @@ week, which is useful in understanding what "normal" looks like, or to identify memory usage. The actions above are _normal_ vertical zoom actions. We also provide an horizontal zoom action that helps you focus on -a -specific Y-axis area to further investigate a spike or dive on your charts. +a specific Y-axis area to further investigate a spike or dive on your charts. 
+ +![f8722ee8-e69b-426c-8bcb-6cb79897c177](https://user-images.githubusercontent.com/70198089/222689676-ad16a2a0-3c3d-48fa-87af-c40ae142dd79.gif) + -![Y5IESOjD3s.gif](https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/f8722ee8-e69b-426c-8bcb-6cb79897c177) | Interaction | Keyboard/mouse | Touchpad/touchscreen | |:-------------------------------------------|:-------------------------------------|:-----------------------------------------------------| @@ -182,7 +298,7 @@ The bottom legend of the chart where you can see the dimensions of the chart can - Dimension name (Ascending or Descending) - Dimension value (Ascending or Descending) -<img src="https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/d3031c35-37bc-46c1-bcf9-be29dea0b476" width="50%" height="50%" /> +<img src="https://user-images.githubusercontent.com/70198089/222689791-48c77890-1093-4beb-84c2-7598353ca049.png" width="50%" height="50%" /> ### Show and hide dimensions @@ -200,23 +316,4 @@ To resize the chart, click-and-drag the icon on the bottom-right corner of any c original height, double-click the same icon. -![AjqnkIHB9H.gif](https://images.zenhubusercontent.com/60b4ebb03f4163193ec31819/1bcc6a0a-a58e-457b-8a0c-e5d361a3083c) - -## What's next? - -We recommend you read up on the differences -between [chart dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) -to strengthen your understanding of how Netdata organizes its dashboards. Another valuable way to interact with charts -is to use -the [date and time controls](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx), -which helps you visualize specific moments of historical metrics. 
- -### Further reading & related information - -- Dashboard - - [How the dashboard works](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx) - - [Chart dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) - - [Date and Time controls](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx) - - [Customize the standard dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx) - - [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) - - [Netdata Agent - Interact with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) +![1bcc6a0a-a58e-457b-8a0c-e5d361a3083c](https://user-images.githubusercontent.com/70198089/222689845-51a9c054-a57d-49dc-925d-39b924dae2f8.gif) diff --git a/docs/cloud/visualize/kubernetes.md b/docs/cloud/visualize/kubernetes.md index 0ff839703..46e46bc18 100644 --- a/docs/cloud/visualize/kubernetes.md +++ b/docs/cloud/visualize/kubernetes.md @@ -1,4 +1,4 @@ ---- +<!-- title: "Kubernetes visualizations" description: "Netdata Cloud features rich, zero-configuration Kubernetes monitoring for the resource utilization and application metrics of Kubernetes (k8s) clusters." custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md" @@ -6,25 +6,17 @@ sidebar_label: "Kubernetes visualizations" learn_status: "Published" learn_topic_type: "Concepts" learn_rel_path: "Operations/Visualizations" ---- +--> + +# Kubernetes visualizations Netdata Cloud features enhanced visualizations for the resource utilization of Kubernetes (k8s) clusters, embedded in -the default [Overview](/docs/cloud/visualize/overview/) dashboard. +the default [Overview](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) dashboard. 
These visualizations include a health map for viewing the status of k8s pods/containers, in addition to composite charts for viewing per-second CPU, memory, disk, and networking metrics from k8s nodes. -## Before you begin - -In order to use the Kubernetes visualizations in Netdata Cloud, you need: - -- A Kubernetes cluster running Kubernetes v1.9 or newer. -- A Netdata deployment using the latest version of the [Helm chart](https://github.com/netdata/helmchart), which - installs [v1.29.2](https://github.com/netdata/netdata/releases) or newer of the Netdata Agent. -- To connect your Kubernetes cluster to Netdata Cloud. -- To enable the feature flag described below. - -See our [Kubernetes deployment instructions](/docs/agent/packaging/installer/methods/kubernetes/) for details on +See our [Kubernetes deployment instructions](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/kubernetes.md) for details on installation and connecting to Netdata Cloud. ## Available Kubernetes metrics @@ -87,7 +79,7 @@ and `k8s_node_name`. The default is `k8s_controller_name`. ### Filtering -Filtering behaves identically to the [node filter in War Rooms](/docs/cloud/war-rooms#node-filter), with the ability to +Filtering behaves identically to the [node filter in War Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md#node-filter), with the ability to filter pods/containers by `container_id` and `namespace`. ### Detailed information @@ -120,7 +112,7 @@ problematic behavior to investigate further, troubleshoot, and remediate with `k The Kubernetes composite charts show real-time and historical resource utilization metrics from nodes, pods, or containers within your Kubernetes deployment. -See the [Overview](/docs/cloud/visualize/overview#definition-bar) doc for details on how composite charts work. 
These +See the [Overview](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#definition-bar) doc for details on how composite charts work. These work similarly, but in addition to visualizing _by dimension_ and _by node_, Kubernetes composite charts can also be grouped by the following labels: @@ -148,7 +140,3 @@ There are some caveats and known issues with Kubernetes monitoring with Netdata [drained](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) from your Kubernetes cluster. These drained nodes will be marked "unreachable" and will show up in War Room management screens/dropdowns. The same applies for any ephemeral nodes created and destroyed during horizontal scaling. - -## What's next? - -For more information about monitoring a k8s cluster with Netdata, see our guide: [_Kubernetes monitoring with Netdata: Overview and visualizations_](/guides/monitor/kubernetes-k8s-netdata/). diff --git a/docs/cloud/visualize/node-filter.md b/docs/cloud/visualize/node-filter.md new file mode 100644 index 000000000..889caaf87 --- /dev/null +++ b/docs/cloud/visualize/node-filter.md @@ -0,0 +1,21 @@ +# Node filter + +The node filter allows you to quickly filter the nodes visualized in a War Room's views. It appears on all views, except on single-node dashboards. + +Inside the filter, the nodes get categorized into three groups: + +- Live nodes + Nodes that are currently online, collecting and streaming metrics to Cloud. + - Live nodes display raised [Alert](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md) counters, [Machine Learning](https://github.com/netdata/netdata/blob/master/ml/README.md) availability, and [Functions](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md) availability +- Stale nodes + Nodes that are offline and not streaming metrics to Cloud. Only historical data can be presented from a parent node. 
+ - For these nodes you can only see their ML status, as they are not online to provide more information
+- Offline nodes
+ Nodes that are offline, not streaming metrics to Cloud and not available in any parent node.
+ Offline nodes are automatically deleted after 30 days and can also be deleted manually.
+
+By using the search bar, you can narrow down to specific nodes based on their name.
+
+When you select one or more nodes, the total selected number will appear in the **Nodes** bar on the **Selected** field.
+
+![The node filter](https://user-images.githubusercontent.com/70198089/225249850-60ce4fcc-4398-4412-a6b5-6082308f4e60.png)
diff --git a/docs/cloud/visualize/nodes.md b/docs/cloud/visualize/nodes.md
index 9878b6b10..4160166f7 100644
--- a/docs/cloud/visualize/nodes.md
+++ b/docs/cloud/visualize/nodes.md
@@ -1,20 +1,12 @@
----
-title: "Nodes view"
-description: "See charts from all your nodes in one pane of glass, then dive in to embedded dashboards for granular troubleshooting of ongoing issues."
-custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md"
-sidebar_label: "Nodes view"
-learn_status: "Published"
-learn_topic_type: "Concepts"
-learn_rel_path: "Operations/Visualizations"
----
-
-The Nodes view lets you see and customize key metrics from any number of Agent-monitored nodes and seamlessly navigate
+# Nodes tab
+
+The Nodes tab lets you see and customize key metrics from any number of Agent-monitored nodes and seamlessly navigate
 to any node's dashboard for troubleshooting performance issues or anomalies using Netdata's highly-granular metrics.

-![The Nodes view in Netdata
+![The Nodes tab in Netdata
 Cloud](https://user-images.githubusercontent.com/1153921/119035218-2eebb700-b964-11eb-8b74-4ec2df0e457c.png)

-Each War Room's Nodes view is populated based on the nodes you added to that specific War Room. Each node occupies a
Each node occupies a +Each War Room's Nodes tab is populated based on the nodes you added to that specific War Room. Each node occupies a single row, first featuring that node's alarm status (yellow for warnings, red for critical alarms) and operating system, some essential information about the node, followed by columns of user-defined key metrics represented in real-time charts. @@ -39,15 +31,9 @@ These customizations appear for anyone else with access to that War Room. ## See more metrics in Netdata Cloud If you want to add more metrics to your War Rooms and they don't show up when you add new metrics to Nodes, you likely -need to configure those nodes to collect from additional data sources. See our [collectors doc](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) +need to configure those nodes to collect from additional data sources. See our [collectors configuration reference](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md) to learn how to use dozens of pre-installed collectors that can instantly collect from your favorite services and applications. -If you want to see up to 30 days of historical metrics in Cloud (and more on individual node dashboards), read our guide -on [long-term storage of historical metrics](https://github.com/netdata/netdata/blob/master/docs/guides/longer-metrics-storage.md). Also, see our -[calculator](/docs/store/change-metrics-storage#calculate-the-system-resources-RAM-disk-space-needed-to-store-metrics) +If you want to see up to 30 days of historical metrics in Cloud (and more on individual node dashboards), read about [changing how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md). 
Also, see our +[calculator](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics) for finding the disk and RAM you need to store metrics for a certain period of time. - -## What's next? - -Now that you know how to view your nodes at a glance, learn how to [track active -alarms](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/view-active-alerts.mdx) with the Alerts Smartboard. diff --git a/docs/cloud/visualize/overview.md b/docs/cloud/visualize/overview.md index 35c07656a..84638f058 100644 --- a/docs/cloud/visualize/overview.md +++ b/docs/cloud/visualize/overview.md @@ -1,250 +1,48 @@ ---- -title: "Home, Overview and Single Node view" -description: >- - "The Home tab automatically presents relevant information of your War Room, the Overview uses composite - charts from all the nodes in a given War Room and Single Node view provides a look at a specific Node" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md" -sidebar_label: "Home, Overview and Single Node view" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Operations/Visualizations" ---- +# Home, overview and single node tabs + +Learn how to use the Home, Overview, and Single Node tabs in Netdata Cloud, to explore your infrastructure and troubleshoot issues. ## Home The Home tab provides a predefined dashboard of relevant information about entities in the War Room. -This tab will -automatically present summarized information in an easily digestible display. You can see information about your +This tab will automatically present summarized information in an easily digestible display. You can see information about your nodes, data collection and retention stats, alerts, users and dashboards. 
-## Overview +## Overview and single node tab The Overview tab is another great way to monitor infrastructure using Netdata Cloud. While the interface might look -similar to local -dashboards served by an Agent Overview uses **composite charts**. +similar to local dashboards served by an Agent Overview uses **composite charts**. These charts display real-time aggregated metrics from all the nodes (or a filtered selection) in a given War Room. -With Overview's composite charts, you can see your infrastructure from a single pane of glass, discover trends or -anomalies, then drill down by grouping metrics by node and jumping to single-node dashboards for root cause analysis. - -## Single Node view - -The Single Node view dashboard engine is the same as the Overview, meaning that it also uses **composite charts**, and -displays real-time aggregated metrics from a specific node. - -As mentioned above, the interface is similar to local dashboards served by an Agent but this dashboard also uses * -*composite charts** which, in the case of a single node, will aggregate -multiple chart _instances_ belonging to a context into a single chart. For example, on `disk.io` context it will get -into a single chart an aggregated view of each disk the node has. - -Further tools provided in composite chart [definiton bar](/docs/cloud/visualize/overview#definition-bar) will allow you -to explore in more detail what is happening on each _instance_. - -## Before you get started - -Only nodes with v1.25.0-127 or later of the the [open-source Netdata](https://github.com/netdata/netdata) monitoring -agent can contribute to composite charts. If your node(s) use an earlier version of Netdata, you will see them marked as -**needs upgrade** in various dropdowns. - -See our [update docs](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md) for the preferred -update method based on how you installed -Netdata. 
- -## Composite charts - -The Overview uses composite charts, which aggregate metrics from all the nodes (or a filtered selection) in a given War -Room. - -## Definition bar - -Each composite chart has a definition bar to provide information about the following: - -* Grouping option -* Aggregate function to be applied in case multiple data sources exist -* Instances -* Nodes -* Dimensions, and -* Aggregate function over time to be applied if one point in the chart consists of multiple data points aggregated - -### Group by dimension, node, or chart - -Click on the **dimension** dropdown to change how a composite chart groups metrics. - -The default option is by _dimension_, so that each line/area in the visualization is the aggregation of a single -dimension. -This provides a per dimension view of the data from all the nodes in the War Room, taking into account filtering -criteria if defined. - -A composite chart grouped by _node_ visualizes a single metric across contributing nodes. If the composite chart has -five -contributing nodes, there will be five lines/areas. This is typically an absolute value of the sum of the dimensions -over each node but there -are some opinionated-but-valuable exceptions where a specific dimension is selected. -Grouping by nodes allows you to quickly understand which nodes in your infrastructure are experiencing anomalous -behavior. - -A composite chart grouped by _instance_ visualizes each instance of one software or hardware on a node and displays -these as a separate dimension. By grouping the -`disk.io` chart by _instance_, you can visualize the activity of each disk on each node that contributes to the -composite -chart. - -Another very pertinent example is composite charts over contexts related to cgroups (VMs and containers). You have the -means to change the default group by or apply filtering to -get a better view into what data your are trying to analyze. 
For example, if you change the group by to _instance_ you -get a view with the data of all the instances (cgroups) that -contribute to that chart. Then you can use further filtering tools to focus the data that is important to you and even -save the result to your own dashboards. - -![image](https://user-images.githubusercontent.com/82235632/201902017-04b76701-0ff9-4498-aa9b-6d507b567bea.png) - -### Aggregate functions over data sources - -Each chart uses an opinionated-but-valuable default aggregate function over the data sources. For example, -the `system.cpu` chart shows the -average for each dimension from every contributing chart, while the `net.net` chart shows the sum for each dimension -from every contributing chart, which can also come from multiple networking interfaces. - -The following aggregate functions are available for each selected dimension: - -- **Average**: Displays the average value from contributing nodes. If a composite chart has 5 nodes with the following - values for the `out` dimension—`-2.1`, `-5.5`, `-10.2`, `-15`, `-0.1`—the composite chart displays a - value of `−6.58`. -- **Sum**: Displays the sum of contributed values. Using the same nodes, dimension, and values as above, the composite - chart displays a metric value of `-32.9`. -- **Min**: Displays a minimum value. For dimensions with positive values, the min is the value closest to zero. For - charts with negative values, the min is the value with the largest magnitude. -- **Max**: Displays a maximum value. For dimensions with positive values, the max is the value with the largest - magnitude. For charts with negative values, the max is the value closet to zero. - -### Dimensions - -Select which dimensions to display on the composite chart. You can choose **All dimensions**, a single dimension, or any -number of dimensions available on that context. 
+When you [interact with composite charts](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) +you can see your infrastructure from a single pane of glass, discover trends or anomalies, and perform root cause analysis. -### Instances +The Single Node tab dashboard is exactly the same as the Overview, but with a hard-coded filter to only show a single node. -Click on **X Instances** to display a dropdown of instances and nodes contributing to that composite chart. Each line in -the -dropdown displays an instance name and the associated node's hostname. +### Chart navigation Menu -### Nodes - -Click on **X Nodes** to display a dropdown of nodes contributing to that composite chart. Each line displays a hostname -to help you identify which nodes contribute to a chart. You can also use this component to filter nodes directly on the -chart. - -If one or more nodes can't contribute to a given chart, the definition bar shows a warning symbol plus the number of -affected nodes, then lists them in the dropdown along with the associated error. Nodes might return errors because of -networking issues, a stopped `netdata` service, or because that node does not have any metrics for that context. - -### Aggregate functions over time - -When the granularity of the data collected is higher than the plotted points on the chart an aggregation function over -time -is applied. By default the aggregation applied is _average_ but the user can choose different options from the -following: - -* Min -* Max -* Average -* Sum -* Incremental sum (Delta) -* Standard deviation -* Median -* Single exponential smoothing -* Double exponential smoothing -* Coefficient variation -* Trimmed Median `*` -* Trimmed Mean `*` -* Percentile `**` - -:::info - -- `*` For **Trimmed Median and Mean** you can choose the percentage of data tha you want to focus on: 1%, 2%, 3%, 5%, - 10%, 15%, 20% and 25%. 
-- `**` For **Percentile** you can specify the percentile you want to focus on: 25th, 50th, 75th, 80th, 90th, 95th, 97th, - 98th and 99th. - -::: - -For more details on each, you can refer to our Agent's HTTP API details -on [Data Queries - Data Grouping](/docs/agent/web/api/queries#data-grouping). - -### Reset to defaults - -Click on the 3-dot icon (**⋮**) on any chart, then **Reset to Defaults**, to reset the definition bar to its initial -state. - -## Jump to single-node dashboards - -Click on **X Charts**/**X Nodes** to display one of the two dropdowns that list the charts and nodes contributing to a -given composite chart. For example, the nodes dropdown. - -![The nodes dropdown in a composite -chart](https://user-images.githubusercontent.com/1153921/99305049-7c019b80-2810-11eb-942a-8ebfcf236b7f.png) - -To jump to a single-node dashboard, click on the link icon <img class="img__inline img__inline--link" -src="https://user-images.githubusercontent.com/1153921/95762109-1d219300-0c62-11eb-8daa-9ba509a8e71c.png" /> next to the -node you're interested in. - -The single-node dashboard opens in a new tab. From there, you can continue to troubleshoot or run [Metric -Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) for faster root -cause analysis. - -## Add composite charts to a dashboard - -Click on the 3-dot icon (**⋮**) on any chart, then click on **Add to Dashboard**. Click the **+** button for any -dashboard you'd like to add this composite chart to, or create a new dashboard an initiate it with your chosen chart by -entering the name and clicking **New Dashboard**. - -## Interacting with composite charts: pan, zoom, and resize - -You can interact with composite charts as you would with other Netdata charts. You can use the controls beneath each -chart to pan, zoom, or resize the chart, or use various combinations of the keyboard and mouse. 
See -the [chart interaction doc](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) for -details. - -## Menu - -The Overview uses a similar menu to local Agent dashboards and single-node dashboards in Netdata Cloud, with sections +Netdata Cloud uses a similar menu to local Agent dashboards, with sections and sub-menus aggregated from every contributing node. For example, even if only two nodes actively collect from and monitor an Apache web server, the **Apache** section still appears and displays composite charts from those two nodes. -![A menu in the Overview -screen](https://user-images.githubusercontent.com/1153921/95785094-fa0ad980-0c89-11eb-8328-2ff11ac630b4.png) +![A menu in the Overview screen](https://user-images.githubusercontent.com/1153921/95785094-fa0ad980-0c89-11eb-8328-2ff11ac630b4.png) -One difference between the Overview's menu and those found in single-node dashboards or local Agent dashboards is that +One difference between the Netdata Cloud menu and those found in local Agent dashboards is that the Overview condenses multiple services, families, or instances into single sections, sub-menus, and associated charts. -For services, let's say you have two concurrent jobs with the [web_log -collector](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/README.md), one for Apache and another for -Nginx. A single-node or -local dashboard shows two section, **web_log apache** and **web_log nginx**, whereas the Overview condenses these into a +For services, let's say you have two concurrent jobs with the [web_log collector](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/README.md), one for Apache and another for Nginx. +A single-node or local dashboard shows two section, **web_log apache** and **web_log nginx**, whereas the Overview condenses these into a single **web_log** section containing composite charts from both jobs. 
-The Overview also consdenses multiple families or multiple instances into a single **all** sub-menu and associated -charts. For example, if Node A has 5 disks, and Node B has 3, each disk contributes to a single `disk.io` composite -chart. The utility bar should show that there are 8 charts from 2 nodes contributing to that chart. - -This action applies to disks, network devices, and other metric types that involve multiple instances of a piece of -hardware or software. The Overview currently does not display metrics from filesystems. Read more about [families and -instances](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) +The Cloud also condenses multiple families or multiple instances into a single **all** sub-menu and associated charts. +For example, if Node A has 5 disks, and Node B has 3, each disk contributes to a single `disk.io` composite chart. +The utility bar should show that there are 8 charts from 2 nodes contributing to that chart. +The aggregation applies to disks, network devices, and other metric types that involve multiple instances of a piece of hardware or software. ## Persistence of composite chart settings -When you change a composite chart via its definition bar, Netdata Cloud persists these settings in a query string -attached to the URL in your browser. You can "save" these settings by bookmarking this particular URL, or share it with -colleagues by having them copy-paste it into their browser. - -## What's next? - -For another way to view an infrastructure from a high level, see -the [Nodes view](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md). - -If you need a refresher on how Netdata's charts work, see our doc -on [interacting with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx). 
+Of course you can [change the filtering or grouping](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) of metrics in the composite charts that aggregate all these instances, to see only the information you are interested in, and save that tab in a custom dashboard. -Or, get more granular with configuring how you monitor your infrastructure -by [building new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md). +When you change a composite chart via its definition bar, Netdata Cloud persists these settings in a query string attached to the URL in your browser. +You can "save" these settings by bookmarking this particular URL, or share it with colleagues by having them copy-paste it into their browser. diff --git a/docs/cloud/war-rooms.md b/docs/cloud/war-rooms.md index 99f9e3680..c599fd5b4 100644 --- a/docs/cloud/war-rooms.md +++ b/docs/cloud/war-rooms.md @@ -1,162 +1,60 @@ ---- -title: "War Rooms" -description: >- - "Netdata Cloud uses War Rooms to group related nodes and create insightful compositedashboards based on - their aggregate health and performance." -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md" -sidebar_label: "War Rooms" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" ---- - -War Rooms organize your connected nodes and provide infrastructure-wide dashboards using real-time metrics and -visualizations. - -Once you add nodes to a Space, all of your nodes will be visible in the _All nodes_ War Room. This is a special War Room -which gives you an overview of all of your nodes in this particular space. Then you can create functional separations of -your nodes into more War Rooms. Every War Room has its own dashboards, navigation, indicators, and management tools. 
- -![An example War Room](/img/cloud/main-page.png) - -## Navigation - -### Switching between views - static tabs - -Every War Rooms provides multiple views. Each view focus on a particular area/subject of the nodes which you monitor in -this War Rooms. Let's explore what view you have available: - -- The default view for any War Room is - the [Home tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#home), which give you - an overview - of this space. Here you can see the number of Nodes claimed, data retention statics, user particate, alerts and more - -- The second and most important view is - the [Overview tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#overview) which - uses composite - charts to display real-time metrics from every available node in a given War Room. - -- The [Nodes tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) gives you the ability to - see the status (offline or online), host details - , alarm status and also a short overview of some key metrics from all your nodes at a glance. - -- [Kubernetes tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md) is a logical - grouping of charts regards to your Kubernetes clusters. - It contains a subset of the charts available in the _Overview tab_ - -- - -The [Dashboards tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) -gives you the ability to have tailored made views of -specific/targeted interfaces for your infrastructure using any number of charts from any number of nodes. - -- The **Alerts tab** provides you with an overview for all the active alerts you receive for the nodes in this War Room, - you can also see alla the alerts that are configured to be triggered in any given moment. 
- -- The **Anomalies tab** is dedicated to - the [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.mdx) tool +# Netdata Cloud War rooms -### Non static tabs +Netdata Cloud uses War Rooms to organize your connected nodes and provide infrastructure-wide dashboards using real-time metrics and visualizations. -If you open -a [new dashboard](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md), -jump to a single-node dashboard, or navigate to a dedicated alert page they will open in a new War Room tab. - -Tabs can be rearranged with drag-and-drop or closed with the **X** button. Open tabs persist between sessions, so you -can always come right back to your preferred setup. - -### Play, pause, force play, and timeframe selector - -A War Room has three different states: playing, paused, and force playing. The default playing state refreshes charts -every second as long as the browser tab is in -focus. [Interacting with a chart](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) -pauses -the War Room. Once the tab loses focus, charts pause automatically. - -The top navigation bar features a play/pause button to quickly change the state, and a dropdown to select **Force Play** -, which keeps charts refreshing, potentially at the expense of system performance. - -Next to the play/pause button is the timeframe selector, which helps you select a precise window of metrics data to -visualize. By default, all visualizations in Netdata Cloud show the last 15 minutes of metrics data. - -Use the **Quick Selector** to visualize metrics from predefined timeframes, or use the input field below to enter a -number and an appropriate unit of time. The calendar allows you to select multiple days of metrics data. - -Click **Apply** to re-render all visualizations with new metrics data streamed to your browser from each distributed -node. 
Click **Clear** to remove any changes and apply the default 15-minute timeframe. - -The fields beneath the calendar display the beginning and ending timestamps your selected timeframe. - -### Node filter - -The node filter allows you to quickly filter the nodes visualized in a War Room's views. It appears on all views, but -not on single-node dashboards. +Once you add nodes to a Space, all of your nodes will be visible in the **All nodes** War Room. This is a special War Room +which gives you an overview of all of your nodes in this particular Space. Then you can create functional separations of +your nodes into more War Rooms. Every War Room has its own dashboards, navigation, indicators, and management tools. -![The node filter](https://user-images.githubusercontent.com/12612986/172674440-df224058-2b2c-41da-bb45-f4eb82e342e5.png) +![An example War Room](https://user-images.githubusercontent.com/43294513/225355998-f16730ba-06d4-4953-8fd3-f1c2751e102d.png) ## War Room organization We recommend a few strategies for organizing your War Rooms. -**Service, purpose, location, etc.**: You can group War Rooms by a service (think Nginx, MySQL, Pulsar, and so on), -their purpose (webserver, database, application), their physical location, whether they're baremetal or a Docker -container, the PaaS/cloud provider it runs on, and much more. This allows you to see entire slices of your -infrastructure by moving from one War Room to another. +- **Service, purpose, location, etc.** + You can group War Rooms by a service (Nginx, MySQL, Pulsar, and so on), their purpose (webserver, database, application), their physical location, whether they're "bare metal" or a Docker container, the PaaS/cloud provider it runs on, and much more. + This allows you to see entire slices of your infrastructure by moving from one War Room to another. 
-**End-to-end apps/services**: If you have a user-facing SaaS product, or an internal service that said product relies -on, you may want to monitor that entire stack in a single War Room. This might include Kubernetes clusters, Docker -containers, proxies, databases, web servers, brokers, and more. End-to-end War Rooms are valuable tools for ensuring the -health and performance of your organization's essential services. +- **End-to-end apps/services** + If you have a user-facing SaaS product, or an internal service that this said product relies on, you may want to monitor that entire stack in a single War Room. This might include Kubernetes clusters, Docker containers, proxies, databases, web servers, brokers, and more. + End-to-end War Rooms are valuable tools for ensuring the health and performance of your organization's essential services. -**Incident response**: You can also create new War Rooms as one of the first steps in your incident response process. -For example, you have a user-facing web app that relies on Apache Pulsar for a message queue, and one of your nodes -using the [Pulsar collector](https://github.com/netdata/go.d.plugin/blob/master/modules/pulsar/README.md) begins -reporting a suspiciously low messages rate. You can create a War Room called `$year-$month-$day-pulsar-rate`, add all -your Pulsar nodes in addition to nodes they connect to, and begin diagnosing the root cause in a War Room optimized for -getting to resolution as fast as possible. +- **Incident response** + You can also create new War Rooms as one of the first steps in your incident response process. + For example, you have a user-facing web app that relies on Apache Pulsar for a message queue, and one of your nodes using the [Pulsar collector](https://github.com/netdata/go.d.plugin/blob/master/modules/pulsar/README.md) begins reporting a suspiciously low messages rate. 
+ You can create a War Room called `$year-$month-$day-pulsar-rate`, add all your Pulsar nodes in addition to nodes they connect to, and begin diagnosing the root cause in a War Room optimized for getting to resolution as fast as possible. ## Add War Rooms -To add new War Rooms to any Space, click on the green plus icon **+** next the **War Rooms** heading. on the left ( -space's) sidebar. +To add new War Rooms to any Space, click on the green plus icon **+** next the **War Rooms** heading on the left (Space's) sidebar. -In the panel, give the War Room a name and description, and choose whether it's public or private. Anyone in your Space -can join public War Rooms, but can only join private War Rooms with an invitation. +In the panel, give the War Room a name and description, and choose whether it's public or private. +Anyone in your Space can join public War Rooms, but can only join private War Rooms with an invitation. ## Manage War Rooms -All the users and nodes involved in a particular space can potential be part of a War Room. +All the users and nodes involved in a particular Space can be part of a War Room. -Any user can change simple settings of a War room, like the name or the users participating in it. Click on the gear -icon of the War Room's name in the top of the page to do that. A sidebar will open with options for this War Room: +Any user can change simple settings of a War room, like the name or the users participating in it. +Click on the gear icon of the War Room's name in the top of the page to do that. A sidebar will open with options for this War Room: -1. To _change a War Room's name, description, or public/private status_, click on **War Room** tab of the sidebar. +1. To **change a War Room's name, description, or public/private status**, click on **War Room** tab. -2. To _include an existing node_ to a War Room or _connect a new node*_ click on **Nodes** tab of the sidebar. 
Choose - any - connected node you want to add to this War Room by clicking on the checkbox next to its hostname, then click **+ Add - ** - at the top of the panel. +2. To **include an existing node** to a War Room or **connect a new node\*** click on **Nodes** tab. Choose any connected node you want to add to this War Room by clicking on the checkbox next to its hostname, then click **+ Add** at the top of the panel. -3. To _add existing users to a War Room_, click on **Add Users**. See - our [invite doc](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md) - for details on inviting new users to your Space in Netdata Cloud. +3. To **add existing users to a War Room**, click on **Add Users**. + See our [invite doc](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md) for details on inviting new users to your Space in Netdata Cloud. -:::note -\* This action requires admin rights for this space -::: +> ### Note +> +>\* This action requires **admin** rights for this Space ### More actions -To _view or remove nodes_ in a War Room, click on **Nodes view**. To remove a node from the current War Room, click on +To **view or remove nodes** in a War Room, click on the **Nodes tab**. To remove a node from the current War Room, click on the **🗑** icon. -:::info -Removing a node from a War Room does not remove it from your Space. -::: - -## What's next? - -Once you've figured out an organizational structure that works for your team, learn more about how you can use Netdata -Cloud to monitor distributed nodes -using [real-time composite charts](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md). +> ### Info +> +> Removing a node from a War Room does not remove it from your Space. 
diff --git a/docs/collect/application-metrics.md b/docs/collect/application-metrics.md index 454ed95ad..ec73cefe3 100644 --- a/docs/collect/application-metrics.md +++ b/docs/collect/application-metrics.md @@ -51,11 +51,10 @@ application metrics collectors, including those for containers/k8s clusters. ## Collect metrics from applications running on Windows Netdata is fully capable of collecting and visualizing metrics from applications running on Windows systems. The only -caveat is that you must [install Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) on a separate system or a compatible VM because there +caveat is that you must [install Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) on a separate system or a compatible VM because there is no native Windows version of the Netdata Agent. -Once you have Netdata running on that separate system, you can follow the [enable and configure -doc](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) to tell the collector to look for exposed metrics on the Windows system's IP +Once you have Netdata running on that separate system, you can follow the [collectors configuration reference](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md) documentation to tell the collector to look for exposed metrics on the Windows system's IP address or hostname, plus the applicable port. 
For example, you have a MySQL database with a root password of `my-secret-pw` running on a Windows system with the IP diff --git a/docs/collect/container-metrics.md b/docs/collect/container-metrics.md index b6b6a432c..cde541839 100644 --- a/docs/collect/container-metrics.md +++ b/docs/collect/container-metrics.md @@ -55,8 +55,7 @@ metrics](https://github.com/netdata/go.d.plugin/blob/master/modules/mysql/README ### Collect metrics from applications running in Docker containers -You could use this technique to monitor an entire infrastructure of Docker containers. The same [enable and -configure](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) procedures apply whether an application runs on the host system or inside +You could use this technique to monitor an entire infrastructure of Docker containers. The same [enable and configure](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md) procedures apply whether an application runs on the host system or inside a container. You may need to configure the target endpoint if it's not the application's default. Netdata can even [run in a Docker container](https://github.com/netdata/netdata/blob/master/packaging/docker/README.md) itself, and then collect metrics about the diff --git a/docs/collect/enable-configure.md b/docs/collect/enable-configure.md deleted file mode 100644 index cd8960ac1..000000000 --- a/docs/collect/enable-configure.md +++ /dev/null @@ -1,72 +0,0 @@ -<!-- -title: "Enable or configure a collector" -description: "Every collector is highly configurable, allowing them to collect metrics from any node and any infrastructure." 
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/collect/enable-configure.md" -sidebar_label: "Enable or configure a collector" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup" ---> - -# Enable or configure a collector - -When Netdata starts up, each collector searches for exposed metrics on the default endpoint established by that service -or application's standard installation procedure. For example, the [Nginx -collector](https://github.com/netdata/go.d.plugin/blob/master/modules/nginx/README.md) searches at -`http://127.0.0.1/stub_status` for exposed metrics in the correct format. If an Nginx web server is running and exposes -metrics on that endpoint, the collector begins gathering them. - -However, not every node or infrastructure uses standard ports, paths, files, or naming conventions. You may need to -enable or configure a collector to gather all available metrics from your systems, containers, or applications. - -## Enable a collector or its orchestrator - -You can enable/disable collectors individually, or enable/disable entire orchestrators, using their configuration files. -For example, you can change the behavior of the Go orchestrator, or any of its collectors, by editing `go.d.conf`. - -Use `edit-config` from your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) to open -the orchestrator primary configuration file: - -```bash -cd /etc/netdata -sudo ./edit-config go.d.conf -``` - -Within this file, you can either disable the orchestrator entirely (`enabled: yes`), or find a specific collector and -enable/disable it with `yes` and `no` settings. Uncomment any line you change to ensure the Netdata daemon reads it on -start. 
- -After you make your changes, restart the Agent with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## Configure a collector - -First, [find the collector](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) you want to edit and open its documentation. Some software has -collectors written in multiple languages. In these cases, you should always pick the collector written in Go. - -Use `edit-config` from your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) to open a -collector's configuration file. For example, edit the Nginx collector with the following: - -```bash -./edit-config go.d/nginx.conf -``` - -Each configuration file describes every available option and offers examples to help you tweak Netdata's settings -according to your needs. In addition, every collector's documentation shows the exact command you need to run to -configure that collector. Uncomment any line you change to ensure the collector's orchestrator or the Netdata daemon -read it on start. - -After you make your changes, restart the Agent with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## What's next? - -Read high-level overviews on how Netdata collects [system metrics](https://github.com/netdata/netdata/blob/master/docs/collect/system-metrics.md), [container -metrics](https://github.com/netdata/netdata/blob/master/docs/collect/container-metrics.md), and [application metrics](https://github.com/netdata/netdata/blob/master/docs/collect/application-metrics.md). - -If you're already collecting all metrics from your systems, containers, and applications, it's time to move into -Netdata's visualization features. 
[See an overview of your infrastructure](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) -using Netdata Cloud, or learn how to [interact with dashboards and -charts](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md). - - diff --git a/docs/collect/how-collectors-work.md b/docs/collect/how-collectors-work.md deleted file mode 100644 index 382d4ccc6..000000000 --- a/docs/collect/how-collectors-work.md +++ /dev/null @@ -1,82 +0,0 @@ -<!-- -title: "How Netdata's metrics collectors work" -description: "When Netdata starts, and with zero configuration, it auto-detects thousands of data sources and immediately collects per-second metrics." -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/collect/how-collectors-work.md" -sidebar_label: "How Netdata's metrics collectors work" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Concepts" ---> - -# How Netdata's metrics collectors work - -When Netdata starts, and with zero configuration, it auto-detects thousands of data sources and immediately collects -per-second metrics. - -Netdata can immediately collect metrics from these endpoints thanks to 300+ **collectors**, which all come pre-installed -when you [install Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx). - -Every collector has two primary jobs: - -- Look for exposed metrics at a pre- or user-defined endpoint. -- Gather exposed metrics and use additional logic to build meaningful, interactive visualizations. - -If the collector finds compatible metrics exposed on the configured endpoint, it begins a per-second collection job. 
The -Netdata Agent gathers these metrics, sends them to the [database engine for -storage](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md), and immediately [visualizes them -meaningfully](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) on dashboards. - -Each collector comes with a pre-defined configuration that matches the default setup for that application. This endpoint -can be a URL and port, a socket, a file, a web page, and more. - -For example, the [Nginx collector](https://github.com/netdata/go.d.plugin/blob/master/modules/nginx/README.md) searches -at `http://127.0.0.1/stub_status`, which is the default endpoint for exposing Nginx metrics. The [web log collector for -Nginx or Apache](https://github.com/netdata/go.d.plugin/blob/master/README.mdmodules/weblog) searches at -`/var/log/nginx/access.log` and `/var/log/apache2/access.log`, respectively, both of which are standard locations for -access log files on Linux systems. - -The endpoint is user-configurable, as are many other specifics of what a given collector does. - -## What can Netdata collect? - -To quickly find your answer, see our [list of supported collectors](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md). - -Generally, Netdata's collectors can be grouped into three types: - -- [Systems](https://github.com/netdata/netdata/blob/master/docs/collect/system-metrics.md): Monitor CPU, memory, disk, networking, systemd, eBPF, and much more. - Every metric exposed by `/proc`, `/sys`, and other Linux kernel sources. -- [Containers](https://github.com/netdata/netdata/blob/master/docs/collect/container-metrics.md): Gather metrics from container agents, like `dockerd` or `kubectl`, - along with the resource usage of containers and the applications they run. 
-- [Applications](https://github.com/netdata/netdata/blob/master/docs/collect/application-metrics.md): Collect per-second metrics from web servers, databases, logs, - message brokers, APM tools, email servers, and much more. - -## Collector architecture and terminology - -**Collector** is a catch-all term for any Netdata process that gathers metrics from an endpoint. - -While we use _collector_ most often in documentation, release notes, and educational content, you may encounter other -terms related to collecting metrics. - -- **Modules** are a type of collector. -- **Orchestrators** are external plugins that run and manage one or more modules. They run as independent processes. - The Go orchestrator is in active development. - - [go.d.plugin](https://github.com/netdata/go.d.plugin/blob/master/README.md): An orchestrator for data - collection modules written in `go`. - - [python.d.plugin](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md): An orchestrator for data collection modules written in - `python` v2/v3. - - [charts.d.plugin](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/README.md): An orchestrator for data collection modules written in - `bash` v4+. -- **External plugins** gather metrics from external processes, such as a webserver or database, and run as independent - processes that communicate with the Netdata daemon via pipes. -- **Internal plugins** gather metrics from `/proc`, `/sys`, and other Linux kernel sources. They are written in `C`, - and run as threads within the Netdata daemon. - -## What's next? - -[Enable or configure a collector](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) if the default settings are not compatible with -your infrastructure. 
- -See our [collectors reference](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md) for detailed information on Netdata's collector architecture, -troubleshooting a collector, developing a custom collector, and more. - - diff --git a/docs/collect/system-metrics.md b/docs/collect/system-metrics.md index 442b13823..daaf61d72 100644 --- a/docs/collect/system-metrics.md +++ b/docs/collect/system-metrics.md @@ -37,15 +37,14 @@ can find all system collectors in our [supported collectors list](https://github ## Collect Windows system metrics -Netdata is also capable of monitoring Windows systems. The [WMI -collector](https://github.com/netdata/go.d.plugin/blob/master/modules/wmi/README.md) integrates with +Netdata is also capable of monitoring Windows systems. The [Windows +collector](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/README.md) integrates with [windows_exporter](https://github.com/prometheus-community/windows_exporter), a small Go-based binary that you can run -on Windows systems. The WMI collector then gathers metrics from an endpoint created by windows_exporter, for more -details see [the requirements](https://github.com/netdata/go.d.plugin/blob/master/modules/wmi/README.md#requirements). +on Windows systems. The Windows collector then gathers metrics from an endpoint created by windows_exporter, for more +details see [the requirements](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/README.md#requirements). -Next, [configure the WMI -collector](https://github.com/netdata/go.d.plugin/blob/master/modules/wmi/README.md#configuration) to point to the URL -and port of your exposed endpoint. Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate +Next, [configure](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/README.md#configuration) the Windows +collector to point to the URL and port of your exposed endpoint. 
Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. You'll start seeing Windows system metrics, such as CPU utilization, memory, bandwidth per NIC, number of processes, and much more. diff --git a/docs/configure/common-changes.md b/docs/configure/common-changes.md index e1dccfceb..f171e49e2 100644 --- a/docs/configure/common-changes.md +++ b/docs/configure/common-changes.md @@ -5,7 +5,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/configure/ sidebar_label: "Common configuration changes" learn_status: "Published" learn_topic_type: "Tasks" -learn_rel_path: "Setup" +learn_rel_path: "Configuration" --> # Common configuration changes @@ -29,16 +29,6 @@ changes reflected in those visualizations due to the way Netdata Cloud proxies m ### Increase the long-term metrics retention period -Increase the values for the `page cache size` and `dbengine multihost disk space` settings in -the [`[global]`section](https://github.com/netdata/netdata/blob/master/daemon/config/README.md#global-section-options) -of `netdata.conf`. - -```conf -[global] - page cache size = 128 # 128 MiB of memory for metrics storage - dbengine multihost disk space = 4096 # 4GiB of disk space for metrics storage -``` - Read our doc on [increasing long-term metrics storage](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) for details, including a @@ -61,7 +51,7 @@ of 5 seconds. Every collector and plugin has its own `update every` setting, which you can also change in the `go.d.conf`, `python.d.conf` or `charts.d.conf` files, or in individual collector configuration files. If the `update every` for an individual collector is less than the global, the Netdata Agent uses the global setting. 
See -the [enable or configure a collector](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) +the [enable or configure a collector](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md#enable-and-disable-a-specific-collection-module) doc for details. ### Disable a collector or plugin @@ -93,7 +83,7 @@ sudo ./edit-config health.d/example-alarm.conf Or, append your new alarm to an existing file by editing a relevant existing file in the `health.d/` directory. -Read more about [configuring alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) to +Read more about [configuring alarms](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) to get started, and see the [health monitoring reference](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) for a full listing of options available in health entities. @@ -142,56 +132,7 @@ click on the link to your preferred notification method to find documentation fo While the Netdata Agent is both [open and secure by design](https://www.netdata.cloud/blog/netdata-agent-dashboard/), we recommend every user take some action to administer and secure their nodes. -Learn more about a few of the following changes in -the [node security doc](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md). - -### Disable the local Agent dashboard (`http://NODE:19999`) - -If you use Netdata Cloud to visualize metrics, stream metrics to a parent node, or otherwise don't need the local Agent -dashboard, disabling it reduces the Agent's resource utilization and improves security. - -Change the `mode` setting to `none` in -the [`[web]` section](https://github.com/netdata/netdata/blob/master/web/server/README.md#configuration) -of `netdata.conf`. 
- -```conf -[web] - mode = none -``` - -### Use access lists to restrict access to specific assets - -Allow access from only specific IP addresses, ranges of IP addresses, or hostnames -using [access lists](https://github.com/netdata/netdata/blob/master/web/server/README.md#access-lists) -and [simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). - -See a quickstart to access lists in the [node security -doc](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md#restrict-access-to-the-local-dashboard). - -### Stop sending anonymous statistics to Google Analytics - -Create a file called `.opt-out-from-anonymous-statistics` inside of your Netdata config directory to immediately stop -the statistics script. - -```bash -sudo touch .opt-out-from-anonymous-statistics -``` - -Learn more -about [why we collect anonymous statistics](https://github.com/netdata/netdata/blob/master/docs/anonymous-statistics.md). - -### Change the IP address/port Netdata listens to - -Change the `default port` setting in the `[web]` section to a port other than `19999`. - -```conf -[web] - default port = 39999 -``` - -Use the `bind to` setting to the ports other assets, such as -the [running `netdata.conf` configuration](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#see-an-agents-running-configuration), -API, or streaming requests listen to. +Learn more about the available options in the [security design documentation](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md). ## Reduce resource usage @@ -217,36 +158,3 @@ The following restrictions apply to host label names: The policy for values is more flexible, but you can not use exclamation marks (`!`), whitespaces (` `), single quotes (`'`), double quotes (`"`), or asterisks (`*`), because they are used to compare label values in health alarms and templates. - -## What's next? 
- -If you haven't already, learn how -to [secure your nodes](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md). - -As mentioned at the top, there are plenty of other - -You can also take what you've learned about node configuration to tweak the Agent's behavior or enable new features: - -- [Enable new collectors](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) or tweak - their behavior. -- [Configure existing health alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) or - create new ones. -- [Enable notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to receive - updates about the health of your - infrastructure. -- - -Change [the long-term metrics retention period](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) -using the database engine. - -### Related reference documentation - -- [Netdata Agent · Daemon](https://github.com/netdata/netdata/blob/master/health/README.md) -- [Netdata Agent · Daemon configuration](https://github.com/netdata/netdata/blob/master/daemon/config/README.md) -- [Netdata Agent · Web server](https://github.com/netdata/netdata/blob/master/web/server/README.md) -- [Netdata Agent · Local Agent dashboard](https://github.com/netdata/netdata/blob/master/web/gui/README.md) -- [Netdata Agent · Health monitoring](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) -- [Netdata Agent · Notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md) -- [Netdata Agent · Simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) - 
-[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdocs%2Fconfigure%2Fcommon-changes&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/docs/configure/nodes.md b/docs/configure/nodes.md index 8f54b1bfb..0f31715ae 100644 --- a/docs/configure/nodes.md +++ b/docs/configure/nodes.md @@ -1,13 +1,3 @@ -<!-- -title: "Configure the Netdata Agent" -description: "Netdata is zero-configuration for most users, but complex infrastructures may require you to tweak some of the Agent's granular settings." -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/configure/nodes.md" -sidebar_label: "Configure the Netdata Agent" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Concepts" ---> - # Configure the Netdata Agent Netdata's zero-configuration collection, storage, and visualization features work for many users, infrastructures, and @@ -23,7 +13,7 @@ anomaly, or change in infrastructure affects how their Agents should perform. ## The Netdata config directory On most Linux systems, using our [recommended one-line -installation](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx#install-on-linux-with-one-line-installer), the **Netdata config +installation](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#install-on-linux-with-one-line-installer), the **Netdata config directory** is `/etc/netdata/`. The config directory contains several configuration files with the `.conf` extension, a few directories, and a shell script named `edit-config`. @@ -44,14 +34,14 @@ exist. for each in the [daemon config](https://github.com/netdata/netdata/blob/master/daemon/config/README.md) doc. - `edit-config` is a shell script used for [editing configuration files](#use-edit-config-to-edit-configuration-files). 
- Various configuration files ending in `.conf` for [configuring plugins or - collectors](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md#enable-a-collector-or-its-orchestrator) behave. Examples: `go.d.conf`, + collectors](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md) behave. Examples: `go.d.conf`, `python.d.conf`, and `ebpf.d.conf`. - Various directories ending in `.d`, which contain other configuration files, each ending in `.conf`, for [configuring - specific collectors](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md#configure-a-collector). + specific collectors](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md). - `apps_groups.conf` is a configuration file for changing how applications/processes are grouped when viewing the **Application** charts from [`apps.plugin`](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/README.md) or [`ebpf.plugin`](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md). -- `health.d/` is a directory that contains [health configuration files](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md). +- `health.d/` is a directory that contains [health configuration files](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md). - `health_alarm_notify.conf` enables and configures [alarm notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md). - `statsd.d/` is a directory for configuring Netdata's [statsd collector](https://github.com/netdata/netdata/blob/master/collectors/statsd.plugin/README.md). - `stream.conf` configures [parent-child streaming](https://github.com/netdata/netdata/blob/master/streaming/README.md) between separate nodes running the Agent. 
@@ -73,31 +63,32 @@ See [configure agent containers](https://github.com/netdata/netdata/blob/master/ The **recommended way to easily and safely edit Netdata's configuration** is with the `edit-config` script. This script opens existing Netdata configuration files using your system's `$EDITOR`. If the file doesn't yet exist in your config -directory, the script copies the stock version from `/usr/lib/netdata/conf.d` and opens it for editing. +directory, the script copies the stock version from `/usr/lib/netdata/conf.d` (or wherever the symlink `orig` under the config directory leads to) +to the proper place in the config directory and opens the copy for editing. + +If you have trouble running the script, you can manually copy the file and edit the copy. -Run `edit-config` without any options to see details on its usage and a list of all the configuration files you can -edit. +e.g. `cp /usr/lib/netdata/conf.d/go.d/bind.conf /etc/netdata/go.d/bind.conf; vi /etc/netdata/go.d/bind.conf` + +Run `edit-config` without options, to see details on its usage, or `edit-config --list` to see a list of all the configuration +files you can edit. ```bash -./edit-config USAGE: - ./edit-config FILENAME + ./edit-config [options] FILENAME Copy and edit the stock config file named: FILENAME if FILENAME is already copied, it will be edited as-is. - The EDITOR shell variable is used to define the editor to be used. - - Stock config files at: '/usr/lib/netdata/conf.d' + Stock config files at: '/etc/netdata/../../usr/lib/netdata/conf.d' User config files at: '/etc/netdata' - Available files in '/usr/lib/netdata/conf.d' to copy and edit: + The editor to use can be specified either by setting the EDITOR + environment variable, or by using the --editor option. -./apps_groups.conf ./health.d/phpfpm.conf -./aws_kinesis.conf ./health.d/pihole.conf -./charts.d/ap.conf ./health.d/portcheck.conf -./charts.d/apcupsd.conf ./health.d/postgres.conf -... 
+ The file to edit can also be specified using the --file option. + + For a list of known config files, run './edit-config --list' ``` To edit `netdata.conf`, run `./edit-config netdata.conf`. You may need to elevate your privileges with `sudo` or another @@ -146,29 +137,3 @@ wget -O /etc/netdata/netdata.conf http://localhost:19999/netdata.conf # or curl -o /etc/netdata/netdata.conf http://NODE:19999/netdata.conf ``` - -## What's next? - -Learn more about [starting, stopping, or restarting](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) the Netdata daemon to apply -configuration changes. - -Apply some [common configuration changes](https://github.com/netdata/netdata/blob/master/docs/configure/common-changes.md) to quickly tweak the Agent's behavior. - -[Add security to your node](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md) with what you've learned about the Netdata config directory -and `edit-config`. We put together a few security best practices based on how you use the Netdata. - -You can also take what you've learned about node configuration to enable or enhance features: - -- [Enable new collectors](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) or tweak their behavior. -- [Configure existing health alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) or create new ones. -- [Enable notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to receive updates about the health of your - infrastructure. -- Change [the long-term metrics retention period](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) using the database engine. 
- -### Related reference documentation - -- [Netdata Agent · Daemon](https://github.com/netdata/netdata/blob/master/daemon/README.md) -- [Netdata Agent · Health monitoring](https://github.com/netdata/netdata/blob/master/health/README.md) -- [Netdata Agent · Notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md) - -[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdocs%2Fconfigure%2Fnodes&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/docs/configure/secure-nodes.md b/docs/configure/secure-nodes.md deleted file mode 100644 index 75bf6fd36..000000000 --- a/docs/configure/secure-nodes.md +++ /dev/null @@ -1,127 +0,0 @@ -<!-- -title: "Secure your nodes" -description: "Your data and systems are safe with Netdata, but we recommend a few easy ways to improve the security of your infrastructure." -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/configure/secure-nodes.md" -sidebar_label: "Secure your nodes" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup" ---> - -# Secure your nodes - -Upon installation, the Netdata Agent serves the **local dashboard** at port `19999`. If the node is accessible to the -internet at large, anyone can access the dashboard and your node's metrics at `http://NODE:19999`. We made this decision -so that the local dashboard was immediately accessible to users, and so that we don't dictate how professionals set up -and secure their infrastructures. - -Despite this design decision, your [data](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md#your-data-is-safe-with-netdata) and your -[systems](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md#your-systems-are-safe-with-netdata) are safe with Netdata. 
Netdata is read-only, -cannot do anything other than present metrics, and runs without special/`sudo` privileges. Also, the local dashboard -only exposes chart metadata and metric values, not raw data. - -While Netdata is secure by design, we believe you should [protect your -nodes](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md#why-netdata-should-be-protected). If left accessible to the internet at large, the -local dashboard could reveal sensitive information about your infrastructure. For example, an attacker can view which -applications you run (databases, webservers, and so on), or see every user account on a node. - -Instead of dictating how to secure your infrastructure, we give you many options to establish security best practices -that align with your goals and your organization's standards. - -- [Disable the local dashboard](#disable-the-local-dashboard): **Simplest and recommended method** for those who have - added nodes to Netdata Cloud and view dashboards and metrics there. -- [Restrict access to the local dashboard](#restrict-access-to-the-local-dashboard): Allow local dashboard access from - only certain IP addresses, such as a trusted static IP or connections from behind a management LAN. Full support for - Netdata Cloud. -- [Use a reverse proxy](#use-a-reverse-proxy): Password-protect a local dashboard and enable TLS to secure it. Full - support for Netdata Cloud. - -## Disable the local dashboard - -This is the _recommended method for those who have connected their nodes to Netdata Cloud_ and prefer viewing real-time -metrics using the War Room Overview, Nodes view, and Cloud dashboards. - -You can disable the local dashboard (and API) but retain the encrypted Agent-Cloud link ([ACLK](https://github.com/netdata/netdata/blob/master/aclk/README.md)) that -allows you to stream metrics on demand from your nodes via the Netdata Cloud interface. 
This change mitigates all -concerns about revealing metrics and system design to the internet at large, while keeping all the functionality you -need to view metrics and troubleshoot issues with Netdata Cloud. - -Open `netdata.conf` with `./edit-config netdata.conf`. Scroll down to the `[web]` section, and find the `mode = -static-threaded` setting, and change it to `none`. - -```conf -[web] - mode = none -``` - -Save and close the editor, then [restart your Agent](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) using `sudo systemctl -restart netdata`. If you try to visit the local dashboard to `http://NODE:19999` again, the connection will fail because -that node no longer serves its local dashboard. - -> See the [configuration basics doc](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) for details on how to find `netdata.conf` and use -> `edit-config`. - -## Restrict access to the local dashboard - -If you want to keep using the local dashboard, but don't want it exposed to the internet, you can restrict access with -[access lists](https://github.com/netdata/netdata/blob/master/web/server/README.md#access-lists). This method also fully retains the ability to stream metrics -on-demand through Netdata Cloud. - -The `allow connections from` setting helps you allow only certain IP addresses or FQDN/hostnames, such as a trusted -static IP, only `localhost`, or connections from behind a management LAN. - -By default, this setting is `localhost *`. This setting allows connections from `localhost` in addition to _all_ -connections, using the `*` wildcard. You can change this setting using Netdata's [simple -patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). 
- -```conf -[web] - # Allow only localhost connections - allow connections from = localhost - - # Allow only from management LAN running on `10.X.X.X` - allow connections from = 10.* - - # Allow connections only from a specific FQDN/hostname - allow connections from = example* -``` - -The `allow connections from` setting is global and restricts access to the dashboard, badges, streaming, API, and -`netdata.conf`, but you can also set each of those access lists more granularly if you choose: - -```conf -[web] - allow connections from = localhost * - allow dashboard from = localhost * - allow badges from = * - allow streaming from = * - allow netdata.conf from = localhost fd* 10.* 192.168.* 172.16.* 172.17.* 172.18.* 172.19.* 172.20.* 172.21.* 172.22.* 172.23.* 172.24.* 172.25.* 172.26.* 172.27.* 172.28.* 172.29.* 172.30.* 172.31.* - allow management from = localhost -``` - -See the [web server](https://github.com/netdata/netdata/blob/master/web/server/README.md#access-lists) docs for additional details about access lists. You can take -access lists one step further by [enabling SSL](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) to encrypt data from local -dashboard in transit. The connection to Netdata Cloud is always secured with TLS. - -## Use a reverse proxy - -You can also put Netdata behind a reverse proxy for additional security while retaining the functionality of both the -local dashboard and Netdata Cloud dashboards. You can use a reverse proxy to password-protect the local dashboard and -enable HTTPS to encrypt metadata and metric values in transit. - -We recommend Nginx, as it's what we use for our [demo server](https://london.my-netdata.io/), and we have a guide -dedicated to [running Netdata behind Nginx](https://github.com/netdata/netdata/blob/master/docs/Running-behind-nginx.md). 
- -We also have guides for [Apache](https://github.com/netdata/netdata/blob/master/docs/Running-behind-apache.md), [Lighttpd](https://github.com/netdata/netdata/blob/master/docs/Running-behind-lighttpd.md), -[HAProxy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-haproxy.md), and [Caddy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-caddy.md). - -## What's next? - -Read about [Netdata's security design](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md) and our [blog -post](https://www.netdata.cloud/blog/netdata-agent-dashboard/) about why the local Agent dashboard is both open and -secure by design. - -Next up, learn about [collectors](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md) to ensure you're gathering every essential -metric about your node, its applications, and your infrastructure at large. - -[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdocs%2Fconfigure%2Fsecure-nodesa&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/docs/configure/start-stop-restart.md b/docs/configure/start-stop-restart.md index 3c04777da..45691bc94 100644 --- a/docs/configure/start-stop-restart.md +++ b/docs/configure/start-stop-restart.md @@ -1,20 +1,10 @@ -<!-- -title: "Start, stop, or restart the Netdata Agent" -description: "Manage the Netdata Agent daemon, load configuration changes, and troubleshoot stuck processes on systemd and non-systemd nodes." 
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/configure/start-stop-restart.md" -sidebar_label: "Start, stop, or restart the Netdata Agent" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" ---> - # Start, stop, or restart the Netdata Agent -When you install the Netdata Agent, the [daemon](https://github.com/netdata/netdata/blob/master/daemon/README.md) is configured to start at boot and stop and -restart/shutdown. +When you install the Netdata Agent, the [daemon](https://github.com/netdata/netdata/blob/master/daemon/README.md) is +configured to start at boot and stop and restart/shutdown. -You will most often need to _restart_ the Agent to load new or editing configuration files. [Health -configuration](#reload-health-configuration) files are the only exception, as they can be reloaded without restarting +You will most often need to _restart_ the Agent to load new or editing configuration files. +[Health configuration](#reload-health-configuration) files are the only exception, as they can be reloaded without restarting the entire Agent. Stopping or restarting the Netdata Agent will cause gaps in stored metrics until the `netdata` process initiates @@ -51,6 +41,14 @@ using your preferred method listed above. sudo netdatacli shutdown-agent ``` +## Netdata MSI installations + +Netdata provides an installer for Windows using WSL, on those installations by using a Windows terminal (e.g. the Command prompt or Windows Powershell) you can: + +- Start Netdata, by running `start-netdata` +- Stop Netdata, by running `stop-netdata` +- Restart Netdata, by running `restart-netdata` + ## Reload health configuration You do not need to restart the Netdata Agent between changes to health configuration files, such as specific health @@ -82,21 +80,75 @@ ps aux| grep netdata The output of `ps aux` should show no `netdata` or associated processes running. 
You can now start the Netdata Agent again with `service netdata start`, or the appropriate method for your system. -## What's next? +## Starting Netdata at boot + +In the `system` directory you can find scripts and configurations for the +various distros. + +### systemd + +The installer already installs `netdata.service` if it detects a systemd system. + +To install `netdata.service` by hand, run: + +```sh +# stop Netdata +killall netdata + +# copy netdata.service to systemd +cp system/netdata.service /etc/systemd/system/ + +# let systemd know there is a new service +systemctl daemon-reload + +# enable Netdata at boot +systemctl enable netdata + +# start Netdata +systemctl start netdata +``` + +### init.d + +In the system directory you can find `netdata-lsb`. Copy it to the proper place according to your distribution +documentation. For Ubuntu, this can be done via running the following commands as root. + +```sh +# copy the Netdata startup file to /etc/init.d +cp system/netdata-lsb /etc/init.d/netdata + +# make sure it is executable +chmod +x /etc/init.d/netdata + +# enable it +update-rc.d netdata defaults +``` -Learn more about [securing the Netdata Agent](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md). +### openrc (gentoo) -You can also use the restart/reload methods described above to enable new features: +In the `system` directory you can find `netdata-openrc`. Copy it to the proper +place according to your distribution documentation. + +### CentOS / Red Hat Enterprise Linux + +For older versions of RHEL/CentOS that don't have systemd, an init script is included in the system directory. This can +be installed by running the following commands as root. 
+ +```sh +# copy the Netdata startup file to /etc/init.d +cp system/netdata-init-d /etc/init.d/netdata + +# make sure it is executable +chmod +x /etc/init.d/netdata + +# enable it +chkconfig --add netdata +``` -- [Enable new collectors](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) or tweak their behavior. -- [Configure existing health alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) or create new ones. -- [Enable notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to receive updates about the health of your - infrastructure. -- Change [the long-term metrics retention period](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) using the database engine. +_There have been some recent work on the init script, see PR +<https://github.com/netdata/netdata/pull/403>_ -### Related reference documentation +### other systems -- [Netdata Agent · Daemon](https://github.com/netdata/netdata/blob/master/daemon/README.md) -- [Netdata Agent · Netdata CLI](https://github.com/netdata/netdata/blob/master/cli/README.md) +You can start Netdata by running it from `/etc/rc.local` or equivalent. -[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdocs%2Fconfigure%2Fstart-stop-restart&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/docs/contributing/contributing-documentation.md b/docs/contributing/contributing-documentation.md deleted file mode 100644 index da28272b4..000000000 --- a/docs/contributing/contributing-documentation.md +++ /dev/null @@ -1,109 +0,0 @@ -<!-- -title: "Contributing to documentation" -description: "Want to contribute to Netdata's documentation? This guide will set you up with the tools to help others learn about health and performance monitoring." 
-custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/contributing/contributing-documentation.md ---> - -# Contributing to documentation - -We welcome contributions to Netdata's already extensive documentation. - -We store documentation related to the open-source Netdata Agent inside of the [`netdata/netdata` -repository](https://github.com/netdata/netdata) on GitHub. Documentation related to Netdata Cloud is stored in a private -repository and is not currently open to community contributions. - -The Netdata team aggregates and publishes all documentation at [learn.netdata.cloud](https://learn.netdata.cloud/) using -[Docusaurus](https://v2.docusaurus.io/) in a private GitHub repository. - -## Before you get started - -Anyone interested in contributing to documentation should first read the [Netdata style -guide](https://github.com/netdata/netdata/blob/master/docs/contributing/style-guide.md) and the [Netdata Community Code of Conduct](https://github.com/netdata/.github/blob/main/CODE_OF_CONDUCT.md). - -Netdata's documentation uses Markdown syntax. If you're not familiar with Markdown, read the [Mastering -Markdown](https://guides.github.com/features/mastering-markdown/) guide from GitHub for the basics on creating -paragraphs, styled text, lists, tables, and more. - -### Netdata's documentation structure - -Netdata's documentation is separated into four sections. - -- **Netdata**: Documents based on the actions users want to take, and solutions to their problems, such both the Netdata - Agent and Netdata Cloud. - - Stored in various subfolders of the [`/docs` folder](https://github.com/netdata/netdata/tree/master/docs) within the - `netdata/netdata` repository: `/docs/collect`, `/docs/configure`, `/docs/export`, `/docs/get`, `/docs/monitor`, - `/docs/overview`, `/docs/quickstart`, `/docs/store`, and `/docs/visualize`. - - Published at [`https://learn.netdata.cloud/docs`](https://learn.netdata.cloud/docs). 
-- **Netdata Agent reference**: Reference documentation for the open-source Netdata Agent. - - Stored in various `.md` files within the `netdata/netdata` repository alongside the code responsible for that - feature. For example, the database engine's reference documentation is at `/database/engine/README.md`. - - Published under the **Reference** section in the Netdata Learn sidebar. -- **Netdata Cloud reference**: Reference documentation for the closed-source Netdata Cloud web application. - - Stored in a private GitHub repository and not editable by the community. - - Published at [`https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx`](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx). -- **Guides**: Solutions-based articles for users who want instructions on completing a specific complex task using the - Netdata Agent and/or Netdata Cloud. - - Stored in the [`/docs/guides` folder](https://github.com/netdata/netdata/tree/master/docs/guides) within the - `netdata/netdata` repository. Organized into subfolders that roughly correlate with the core Netdata documentation. - - Published at [`https://learn.netdata.cloud/guides`](https://learn.netdata.cloud/guides). - -Generally speaking, if you want to contribute to the reference documentation for a specific Netdata Agent feature, find -the appropriate `.md` file co-located with that feature. If you want to contribute documentation that spans features or -products, or has no direct correlation with the existing directory structure, place it in the `/docs` folder within -`netdata/netdata`. - -## How to contribute - -The easiest way to contribute to Netdata's documentation is to edit a file directly on GitHub. This is perfect for small -fixes to a single document, such as fixing a typo or clarifying a confusing sentence. - -Click on the **Edit this page** button on any published document on [Netdata Learn](https://learn.netdata.cloud). 
Each -page has two of these buttons: One beneath the table of contents, and another at the end of the document, which take you -to GitHub's code editor. Make your suggested changes, keeping [Netdata style guide](https://github.com/netdata/netdata/blob/master/docs/contributing/style-guide.md) -in mind, and use *Preview changes** button to ensure your Markdown syntax works as expected. - -Under the **Commit changes** header, write descriptive title for your requested change. Click the **Commit changes** -button to initiate your pull request (PR). - -Jump down to our instructions on [PRs](#making-a-pull-request) for your next steps. - -### Edit locally - -Editing documentation locally is the preferred method for complex changes that span multiple documents or change the -documentation's style or structure. - -Create a fork of the Netdata Agent repository by visit the [Netdata repository](https://github.com/netdata/netdata) and -clicking on the **Fork** button. - -![Screenshot of forking the Netdata -repository](https://user-images.githubusercontent.com/1153921/59873572-25f5a380-9351-11e9-92a4-a681fe4a2ed9.png) - -GitHub will ask you where you want to clone the repository. When finished, you end up at the index of your forked -Netdata Agent repository. Clone your fork to your local machine: - -```bash -git clone https://github.com/YOUR-GITHUB-USERNAME/netdata.git -``` - -Create a new branch using `git checkout -b BRANCH-NAME`. Use your favorite text editor to make your changes, keeping the -[Netdata style guide](https://github.com/netdata/netdata/blob/master/docs/contributing/style-guide.md) in mind. Add, commit, and push changes to your fork. When -you're finished, visit the [Netdata Agent Pull requests](https://github.com/netdata/netdata/pulls) to create a new pull -request based on the changes you made in the new branch of your fork. - -## Making a pull request - -Pull requests (PRs) should be concise and informative. 
See our [PR guidelines](https://learn.netdata.cloud/contribute/handbook#pr-guidelines) for -specifics. - -- The title must follow the [imperative mood](https://en.wikipedia.org/wiki/Imperative_mood) and be no more than ~50 - characters. -- The description should explain what was changed and why. Verify that you tested any code or processes that you are - trying to change. - -The Netdata team will review your PR and assesses it for correctness, conciseness, and overall quality. We may point to -specific sections and ask for additional information or other fixes. - -After merging your PR, the Netdata team rebuilds the [documentation site](https://learn.netdata.cloud) to publish the -changed documentation. - - diff --git a/docs/contributing/style-guide.md b/docs/contributing/style-guide.md index 7d1b86478..997bc61a8 100644 --- a/docs/contributing/style-guide.md +++ b/docs/contributing/style-guide.md @@ -1,19 +1,14 @@ -<!-- -title: "Netdata style guide" -description: "The Netdata style guide establishes editorial guidelines for all of Netdata's writing, including documentation, blog posts, in-product UX copy, and more." -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/contributing/style-guide.md ---> - # Netdata style guide -The _Netdata style guide_ establishes editorial guidelines for any writing produced by the Netdata team or the Netdata -community, including documentation, articles, in-product UX copy, and more. Both internal Netdata teams and external -contributors to any of Netdata's open-source projects should reference and adhere to this style guide as much as -possible. +The _Netdata style guide_ establishes editorial guidelines for any writing produced by the Netdata team or the Netdata community, including documentation, articles, in-product UX copy, and more. + +> ### Note +> This document is meant to be accompanied by the [Documentation Guidelines](https://github.com/netdata/netdata/blob/master/docs/guidelines.md). 
If you want to contribute to Netdata's documentation, please read it too. + +Both internal Netdata teams and external contributors to any of Netdata's open-source projects should reference and adhere to this style guide as much as possible. -Netdata's writing should **empower** and **educate**. You want to help people understand Netdata's value, encourage them -to learn more, and ultimately use Netdata's products to democratize monitoring in their organizations. To achieve these -goals, your writing should be: +Netdata's writing should **empower** and **educate**. You want to help people understand Netdata's value, encourage them to learn more, and ultimately use Netdata's products to democratize monitoring in their organizations. +To achieve these goals, your writing should be: - **Clear**. Use simple words and sentences. Use strong, direct, and active language that encourages readers to action. - **Concise**. Provide solutions and answers as quickly as possible. Give users the information they need right now, @@ -353,56 +348,6 @@ The Netdata team uses [`remark-lint`](https://github.com/remarkjs/remark-lint) f If you want to see all the settings, open the [`remarkrc.js`](https://github.com/netdata/netdata/blob/master/.remarkrc.js) file in the `netdata/netdata` repository. -### Frontmatter - -Every document must begin with frontmatter, followed by an H1 (`#`) heading. - -Unlike typical Markdown frontmatter, Netdata uses HTML comments (`<!--`, `-->`) to begin and end the frontmatter block. -These HTML comments are later converted into typical frontmatter syntax when building [Netdata -Learn](https://learn.netdata.cloud). - -Frontmatter _must_ contain the following variables: - -- A `title` that quickly and distinctly describes the document's content. -- A `description` that elaborates on the purpose or goal of the document using no less than 100 characters and no more - than 155 characters. 
-- A `custom_edit_url` that links directly to the GitHub URL where another user could suggest additional changes to the - published document. - -Some documents, like the Ansible guide and others in the `/docs/guides` folder, require an `image` variable as well. In -this case, replace `/docs` with `/img/seo`, and then rebuild the remainder of the path to the document in question. End -the path with `.png`. A member of the Netdata team will assist in creating the image when publishing the content. - -For example, here is the frontmatter for the guide -about [deploying the Netdata Agent with Ansible](https://github.com/netdata/netdata/blob/master/docs/guides/deploy/ansible.md). - -```markdown -<!-- -title: Deploy Netdata with Ansible -description: "Deploy an infrastructure monitoring solution in minutes with the Netdata Agent and Ansible. Use and customize a simple playbook for monitoring as code." -image: /img/seo/guides/deploy/ansible.png -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/deploy/ansible.md ---> - -# Deploy Netdata with Ansible - -... -``` - -Questions about frontmatter in -documentation? [Ask on our community forum](https://community.netdata.cloud/c/blog-posts-and-articles/6). - -### Linking between documentation - -Documentation should link to relevant pages whenever it's relevant and provides valuable context to the reader. - -Links should always reference the full path to the document, beginning at the root of the Netdata Agent repository -(`/`), and ending with the `.md` file extension. Avoid relative links or traversing up directories using `../`. - -For example, if you want to link to our node configuration document, link -to `https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md`. To reference -the guide for deploying the Netdata Agent with Ansible, link to `/docs/guides/deploy/ansible.md`. 
- ### References to UI elements When referencing a user interface (UI) element in Netdata, reference the label text of the link/button with Markdown's @@ -464,6 +409,58 @@ inline char *health_stock_config_dir(void) { Prism also supports titles and line highlighting. See the [Docusaurus documentation](https://v2.docusaurus.io/docs/markdown-features#code-blocks) for more information. +### Adding Notes + +Notes inside files should render properly both in GitHub and in Learn, to do that, it is best to use the format listed below: + +``` +> ### Note +> This is an info or a note block. + +> ### Tip, Best Practice +> This is a tip or a best practice block. + +> ### Warning, Caution +> This is a warning or a caution block. +``` + +Which renders into: + + +> ### Note +> This is an info or a note block. + +> ### Tip, Best Practice +> This is a tip or a best practice block. + +> ### Warning, Caution +> This is a warning or a caution block. + +### Tabs + +Docusaurus allows for Tabs to be used, but we have to ensure that a user accessing the file from GitHub doesn't notice any weird artifacts while reading. So, we use tabs only when necessary in this format: + +``` + +<Tabs> +<TabItem value="tab1" label="tab1"> + +<h3> Header for tab1 </h3> + +text for tab1, both visible in GitHub and Docusaurus + + +</TabItem> +<TabItem value="tab2" label="tab2"> + +<h3> Header for tab2 </h3> + +text for tab2, both visible in GitHub and Docusaurus + +</TabItem> +</Tabs> +``` + ## Word list The following tables describe the standard spelling, capitalization, and usage of words found in Netdata's writing. @@ -476,7 +473,6 @@ The following tables describe the standard spelling, capitalization, and usage o | **Netdata** | The company behind the open-source Netdata Agent and the Netdata Cloud web application. Never use _netdata_ or _NetData_. <br /><br />In general, focus on the user's goals, actions, and solutions rather than what the company provides. 
For example, write _Learn more about enabling alarm notifications on your preferred platforms_ instead of _Netdata sends alarm notifications to your preferred platforms_. | | **Netdata Agent** | The free and open source [monitoring agent](https://github.com/netdata/netdata) that you can install on all of your distributed systems, whether they're physical, virtual, containerized, ephemeral, and more. The Agent monitors systems running Linux, Docker, Kubernetes, macOS, FreeBSD, and more, and collects metrics from hundreds of popular services and applications. | | **Netdata Cloud** | The web application hosted at [https://app.netdata.cloud](https://app.netdata.cloud) that helps you monitor an entire infrastructure of distributed systems in real time. <br /><br />Never use _Cloud_ without the preceding _Netdata_ to avoid ambiguity. | -| **Netdata community** | Contributors to any of Netdata's [open-source projects](https://github.com/netdata/learn/blob/master/contribute/projects.mdx), members of the [community forum](https://community.netdata.cloud/). | | **Netdata community forum** | The Discourse-powered forum for feature requests, Netdata Cloud technical support, and conversations about Netdata's monitoring and troubleshooting products. | | **node** | A system on which the Netdata Agent is installed. The system can be physical, virtual, in a Docker container, and more. Depending on your infrastructure, you may have one, dozens, or hundreds of nodes. Some nodes are _ephemeral_, in that they're created/destroyed automatically by an orchestrator service. | | **Space** | The highest level container within Netdata Cloud for a user to organize their team members and nodes within their infrastructure. A Space likely represents an entire organization or a large team. <br /><br />_Space_ is always capitalized. 
| @@ -489,7 +485,5 @@ The following tables describe the standard spelling, capitalization, and usage o | Term | Definition | |-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | **filesystem** | Use instead of _file system_. | -| **preconfigured** | The concept that many of Netdata's features come with sane defaults that users don't need to configure to find [immediate value](https://github.com/netdata/netdata/blob/master/docs/overview/why-netdata.md#simple-to-deploy). | +| **preconfigured** | The concept that many of Netdata's features come with sane defaults that users don't need to configure to find immediate value. | | **real time**/**real-time** | Use _real time_ as a noun phrase, most often with _in_: _Netdata collects metrics in real time_. Use _real-time_ as an adjective: _Netdata collects real-time metrics from hundreds of supported applications and services. | - - diff --git a/docs/dashboard/customize.mdx b/docs/dashboard/customize.md index 3c30ee231..d9538e62f 100644 --- a/docs/dashboard/customize.mdx +++ b/docs/dashboard/customize.md @@ -1,25 +1,12 @@ ---- -title: "Customize the standard dashboard" -description: >- - "Netdata's preconfigured dashboard offers many customization options, such as choosing when - charts are updated, your preferred theme, and custom text to document processes, and more." 
-type: "how-to" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx" -sidebar_label: "Customize the standard dashboard" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" ---- - # Customize the standard dashboard -While the [Netdata dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx) comes preconfigured with hundreds of charts and +While the [Netdata dashboard](https://github.com/netdata/netdata/blob/master/web/gui/README.md) comes preconfigured with hundreds of charts and thousands of metrics, you may want to alter your experience based on a particular use case or preferences. ## Dashboard settings -To change dashboard settings, click the on the **settings** icon ![Import -icon](https://raw.githubusercontent.com/netdata/netdata-ui/98e31799c1ec0983f433537ff16d2ac2b0d994aa/src/components/icon/assets/gear.svg) +To change dashboard settings, click the on the **settings** icon +![Import icon](https://raw.githubusercontent.com/netdata/netdata-ui/98e31799c1ec0983f433537ff16d2ac2b0d994aa/src/components/icon/assets/gear.svg) in the top panel. These settings only affect how the dashboard behaves in your browser. They take effect immediately and are permanently @@ -30,7 +17,8 @@ Here are a few popular settings: ### Change chart legend position -Find this setting under the **Visual** tab. By default, Netdata places the [legend of dimensions](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx#dimension) _below_ charts. +Find this setting under the **Visual** tab. By default, Netdata places the +[legend of dimensions](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.md#dimension) _below_ charts. Click this toggle to move the legend to the _right_ of charts. 
@@ -82,18 +70,3 @@ the following line to the `[web]` section to tell Netdata where to find your cus ``` Reload your browser tab to see your custom configuration. - -## What's next? - -If you're keen on continuing to customize your Netdata experience, check out our docs on [building new custom -dashboards](https://github.com/netdata/netdata/blob/master/web/gui/custom/README.md) with HTML, CSS, and JavaScript. - -### Further reading & related information - -- Dashboard - - [How the dashboard works](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx) - - [Interact with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) - - [Chart dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) - - [Select timeframes to visualize](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx) - - [Import, export, and print a snapshot](https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.mdx) - - **[Customize the standard dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx)** diff --git a/docs/dashboard/dimensions-contexts-families.mdx b/docs/dashboard/dimensions-contexts-families.md index ee9636d15..41e839c85 100644 --- a/docs/dashboard/dimensions-contexts-families.mdx +++ b/docs/dashboard/dimensions-contexts-families.md @@ -1,26 +1,12 @@ ---- -title: "Chart dimensions, contexts, and families" -description: >- - "Netdata organizes charts into dimensions, contexts, and families to automatically - and meaningfully organize thousands of metrics into interactive charts." 
-type: "explanation" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx" -sidebar_label: "Chart dimensions, contexts, and families" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Concepts" ---- - # Chart dimensions, contexts, and families -While Netdata's charts require no configuration and are [easy to interact with](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx), +While Netdata's charts require no configuration and are [easy to interact with](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md), they have a lot of underlying complexity. To meaningfully organize charts out of the box based on what's happening in your nodes, Netdata uses the concepts of **dimensions**, **contexts**, and **families**. -Understanding how these work will help you more easily navigate the dashboard, [write new -alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md), or play around with the [API](https://github.com/netdata/netdata/blob/master/web/api/README.md). - -For a refresher on the anatomy of a chart, see [dashboards and charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx). +Understanding how these work will help you more easily navigate the dashboard, +[write new alarms](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md), or play around +with the [API](https://github.com/netdata/netdata/blob/master/web/api/README.md). ## Dimension @@ -42,7 +28,7 @@ dimensions](https://user-images.githubusercontent.com/1153921/114207816-a5cb7400 The chart shows 13 unique dimensions, such as `httpd` for the CPU utilization for web servers, `kernel` for anything related to the Linux kernel, and so on. In your dashboard, these specific dimensions will almost certainly be different. 
-Dimensions can be [hidden](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx#show-and-hide-dimensions) to help you focus your +Dimensions can be [hidden](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md#show-and-hide-dimensions) to help you focus your attention. ## Context @@ -56,7 +42,7 @@ whereas anything after the `.` is specified either by the chart's developer or b By default, a chart's type affects where it fits in the menu, while its family creates submenus. -Netdata also relies on contexts for [alarm configuration](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) (the [`on` +Netdata also relies on contexts for [alarm configuration](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) (the [`on` line](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alarm-line-on)). ## Family @@ -81,22 +67,3 @@ names: | `disk.ops` | `disk_ops.sda` | `disk_ops.sdb` | | `disk.backlog` | `disk_backlog.sda` | `disk_backlog.sdb` | | `disk.util` | `disk_util.sda` | `disk_util.sdb` | - -## What's next? - -With an understanding of a chart's dimensions, context, and family, you're now ready to dig even deeper into Netdata's -dashboard. We recommend looking into [using the timeframe selector](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx). - -If you feel comfortable with the [dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx) and interacting with charts, we -recommend learning about [configuration](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md). While Netdata doesn't _require_ a complicated setup -process or a query language to create charts, there are a lot of ways to tweak the experience to match your needs. 
- -### Further reading & related information - -- Dashboard - - [How the dashboard works](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx) - - [Interact with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) - - **[Chart dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx)** - - [Select timeframes to visualize](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx) - - [Import, export, and print a snapshot](https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.mdx) - - [Customize the standard dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx) diff --git a/docs/dashboard/how-dashboard-works.mdx b/docs/dashboard/how-dashboard-works.mdx deleted file mode 100644 index f14402705..000000000 --- a/docs/dashboard/how-dashboard-works.mdx +++ /dev/null @@ -1,118 +0,0 @@ ---- -title: "How the dashboard works" -description: >- - "Learn how to navigate Netdata's preconfigured dashboard to get started - exploring, visualizing, and troubleshooting in real time." -type: "explanation" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx" -sidebar_label: "How the dashboard works" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Concepts" ---- - -# How the dashboard works - -Because Netdata is a monitoring and _troubleshooting_ platform, a dashboard with real-time, meaningful, and -context-aware charts is essential. - -As soon as you [install Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx), it autodetects hardware, OS, containers, services, and -applications running on your node and builds a dashboard on a single, scrollable webpage. 
This page features hundreds of -charts, which are preconfigured to save you time from learning a query language, all stacked on top of one another. This -vertical rhythm is designed to encourage exploration and help you visually identify connections between the metrics -visualized in different charts. - -It's essential to understand the core concepts and features of Netdata's dashboard if you want to maximize your Netdata -experience right after installation. - -## Open the dashboard - -Access Netdata's dashboard by navigating to `http://NODE:19999` in your browser, replacing `NODE` with either -`localhost` or the hostname/IP address of a remote node. - -![Animated GIF of navigating to the -dashboard](https://user-images.githubusercontent.com/1153921/80825153-abaec600-8b94-11ea-8b17-1b770a2abaa9.gif) - -Many features of the internal web server that serves the dashboard are [configurable](https://github.com/netdata/netdata/blob/master/web/server/README.md), including -the listen port, enforced TLS, and even disabling the dashboard altogether. - -## Sections and menus - -As mentioned in the introduction, Netdata automatically organizes all the metrics it collects from your node, and places -them into **sections** of closely related charts. - -The first section on any dashboard is the **System Overview**, followed by **CPUs**, **Memory**, and so on. - -These sections populate the **menu**, which is on the right-hand side of the dashboard. Instead of manually scrolling up -and down to explore the dashboard, it's generally faster to click on the relevant menu item to jump to that position on -the dashboard. - -Many menu items also contain a **submenu**, with links to additional categories. For example, the **Disks** section is often separated into multiple groups based on the number of disk drives/partitions on your node, which are also known as a family. 
- -![Animated GIF of using Netdata's menus and -submenus](https://user-images.githubusercontent.com/1153921/80832425-7c528600-8ba1-11ea-8140-d0a17a62009b.gif) - -## Charts - -Every **chart** in the Netdata dashboard is [fully interactive](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx). Netdata -synchronizes your interactions to help you understand exactly how a node behaved in any timeframe, whether that's -seconds or days. - -A chart is an individual, interactive, always-updating graphic displaying one or more collected/calculated metrics, -which are generated by [collectors](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md). - -![Animated GIF of the standard Netdata dashboard being manipulated and synchronizing -charts](https://user-images.githubusercontent.com/1153921/80839230-b034a800-8baf-11ea-9cb2-99c1e10f0f85.gif) - -Hover over any chart to temporarily pause it and see the exact metrics values presented as different dimensions. Click -or tap to stop the chart from automatically updating with new metrics, thereby locking it to a single timeframe. -Double-click it to resume auto-updating. - -Let's cover two of the most important ways to interact with charts: panning through time and zooming. - -To pan through time, **click and hold** (or touch and hold) on any chart, then **drag your mouse** (or finger) to the -left or right. Drag to the right to pan backward through time, or drag to the left to pan forward in time. Think of it -like pushing the current timeframe off the screen to see what came before or after. - -To zoom, press and hold `Shift`, then use your mouse's scroll wheel, or a two-finger pinch if you're using a touchpad. - -See [interact with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) for all the possible ways to interact with the charts on -your dashboard. 
- -## Alarms - -Many of the preconfigured charts on the Netdata dashboard also come with preconfigured alarms. Netdata sends three -primary alarm states via alarms: `CLEAR`, `WARNING`, and `CRITICAL`. If an alarm moves from a `CLEAR` state to either -`WARNING` or `CRITICAL`, Netdata creates a notification to let you know exactly what's going on. There are [other alarm -states](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alarm-statuses) as well. - -The easiest way to see alarms is by clicking on the alarm icon ![Alarms -icon](https://raw.githubusercontent.com/netdata/netdata-ui/98e31799c1ec0983f433537ff16d2ac2b0d994aa/src/components/icon/assets/alarm.svg) -in the top panel to open the alarms panel, which shows you all the active alarms. The other **All** tab shows every -active alarm, and the **Log** tab shows a historical record of exactly when alarms triggered and to which state. - -![Animated GIF of looking at raised alarms and the alarm -log](https://user-images.githubusercontent.com/1153921/80842482-8c289500-8bb6-11ea-9791-600cfdbe82ce.gif) - -Learn more about [viewing active alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md), [configuring -alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md), or [enabling a new notification -method](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md). - -## What's next? - -Learn more about [interacting with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) to quickly pan through time, zoom, and -show/hide dimensions to best understand the state of your node in any timeframe. 
A complete understanding of [chart -dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) will also help with how Netdata -organizes its dashboard and operates [alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md). - -### Further reading & related information - -- Dashboard - - **[How the dashboard works](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx)** - - [Interact with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) - - [Chart dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) - - [Select timeframes to visualize](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx) - - [Import, export, and print a snapshot](https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.mdx) - - [Customize the standard dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx) -- [HTTP API](https://github.com/netdata/netdata/blob/master/web/api/README.md) -- [Custom dashboards](https://github.com/netdata/netdata/blob/master/web/gui/custom/README.md) diff --git a/docs/dashboard/import-export-print-snapshot.mdx b/docs/dashboard/import-export-print-snapshot.md index 23430a561..35c3b9db9 100644 --- a/docs/dashboard/import-export-print-snapshot.mdx +++ b/docs/dashboard/import-export-print-snapshot.md @@ -1,25 +1,25 @@ ---- +<!-- title: "Import, export, and print a snapshot" description: >- "Snapshots can be incredibly useful for diagnosing anomalies after they've already happened, and are interoperable with any other node running Netdata." 
type: "how-to" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.mdx" +custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.md" sidebar_label: "Import, export, and print a snapshot" learn_status: "Published" learn_topic_type: "Tasks" learn_rel_path: "Operations" ---- +--> -# Import, export, and print snapshots +# Import, export, and print a snapshot Netdata can export snapshots of the contents of your dashboard at a given time, which you can then import into any other node running Netdata. Or, you can create a print-ready version of your dashboard to save to PDF or actually print to paper. Snapshots can be incredibly useful for diagnosing anomalies after they've already happened. Let's say Netdata triggered a warning alarm while you were asleep. In the morning, you can [select the -timeframe](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx) when the alarm triggered, export a snapshot, and send it to a +timeframe](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.md) when the alarm triggered, export a snapshot, and send it to a colleague for further analysis. @@ -72,19 +72,3 @@ in the top panel. When you click **Print**, Netdata opens a new window to render every chart. This might take some time. When finished, Netdata opens a browser print dialog for you to save to PDF or print. - -## What's next? - -Now that you understand snapshots, now is a good time to delve deeper into some of the dashboard's lesser-known -features, such as [customization](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx) or [building new, custom -dashboards](https://github.com/netdata/netdata/blob/master/web/gui/custom/README.md). 
- -### Further reading & related information - -- Dashboard - - [How the dashboard works](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx) - - [Interact with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) - - [Chart dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) - - [Select timeframes to visualize](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx) - - **[Import, export, and print a snapshot](https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.mdx)** - - [Customize the standard dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx)
\ No newline at end of file diff --git a/docs/dashboard/interact-charts.mdx b/docs/dashboard/interact-charts.mdx deleted file mode 100644 index a733bc9e0..000000000 --- a/docs/dashboard/interact-charts.mdx +++ /dev/null @@ -1,201 +0,0 @@ ---- -title: "Interact with charts" -description: "Learn how to pan, zoom, select, and customize Netdata's preconfigured charts to help you troubleshooting with real-time, per-second metrics data." -type: "how-to" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/dashboard/interact-charts.mdx" -sidebar_label: "Interact with charts" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Operations" ---- - -# Interact with charts - -> ⚠️ There is a new version of charts that is currently **only** available on [Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md). We didn't -> want to keep this valuable feature from you, so after we get this into your hands on the Cloud, we will collect and implement your feedback to make sure we are providing the best possible version of the feature on the Netdata Agent dashboard as quickly as possible. - -While charts that update every second with new metrics are helpful for understanding the immediate state of a node, deep -troubleshooting and root cause analysis begins by manipulating the default charts. To help you troubleshoot, Netdata -synchronizes every chart every time you interact with one of them. - -Here's what synchronization looks like: - -![Animated GIF of the standard Netdata dashboard being manipulated and synchronizing -charts](https://user-images.githubusercontent.com/1153921/80839230-b034a800-8baf-11ea-9cb2-99c1e10f0f85.gif) - -Once you understand all the interactions available to you, you'll be able to quickly move around the dashboard, search -for anomalies, and find root causes using per-second metrics. 
- -## Pause or stop - -| Interaction | Keyboard/mouse | Touchpad/touchscreen | -| :---------------- | :------------- | :------------------- | -| **Pause** a chart | `hover` | `n/a` | -| **Stop** a chart | `click` | `tap` | - -By hovering over any chart, you temporarily pause it so that you can hover over a specific timeframe and see the exact -values presented as dimensions. Click on the chart to lock it to this timeframe, which is useful if you want to jump to -a different chart to look for possible correlations. - -![Animated GIF of hovering over a chart to see -values](https://user-images.githubusercontent.com/1153921/62968279-9227dd00-bdbf-11e9-9112-1d21444d0f31.gif) - -## Pan - -| Interaction | Keyboard/mouse | Touchpad/touchscreen | -| :---------- | :------------- | :------------------- | -| **Pan** | `click + drag` | `swipe` | - -Drag your mouse/finger to the right to pan backward through time, or drag to the left to pan forward in time. Think of -it like pushing the current timeframe off the screen to see what came before or after. - -## Zoom - -| Interaction | Keyboard/mouse | Touchpad/touchscreen | -| :------------------------------- | :-------------------------- | :--------------------------------------------------- | -| **Zoom** in or out | `Shift + mouse scrollwheel` | `two-finger pinch` <br />`Shift + two-finger scroll` | -| **Zoom** to a specific timeframe | `Shift + mouse selection` | `n/a` | - -Zooming in helps you see metrics with maximum granularity, which is useful when you're trying to diagnose the root cause -of an anomaly or outage. Zooming out lets you see metrics within the larger context, such as the last hour, day, or -week, which is useful in understanding what "normal" looks like, or to identify long-term trends, like a slow creep in -memory usage. 
- -## Select - -| Interaction | Keyboard/mouse | Touchpad/touchscreen | -| :------------------------------ | :-------------------------------------------------------- | :------------------- | -| **Select** a specific timeframe | `Alt + mouse selection` or `⌘ + mouse selection` (macOS) | `n/a` | - -Selecting timeframes is useful when you see an interesting spike or change in a chart and want to investigate further. - -Select a timeframe, then move to different charts/sections of the dashboard. Each chart shows the same selection to help -you immediately identify the timeframe and look for correlations. - -## Reset a chart to its default state - -| Interaction | Keyboard/mouse | Touchpad/touchscreen | -| :---------------- | :------------- | :------------------- | -| **Reset** a chart | `double-click` | `n/a` | - -Double-check on a chart to restore it to the default auto-updating state, with a timeframe based on your browser -viewport. - -## Resize - -Click-and-drag the icon on the bottom-right corner of any chart. To restore the chart to its original height, -double-click the same icon. - -![Animated GIF of resizing a chart and resetting it to the default -height](https://user-images.githubusercontent.com/1153921/80842459-7d41e280-8bb6-11ea-9488-1bc29f94d7f2.gif) - -## Show and hide dimensions - -| Interaction | Keyboard/mouse | Touchpad/touchscreen | -| :------------------------------------- | :-------------- | :------------------- | -| **Show one** dimension and hide others | `click` | `tap` | -| **Toggle (show/hide)** one dimension | `Shift + click` | `n/a` | - -Hiding dimensions simplifies the chart and can help you better discover exactly which aspect of your system might be -behaving strangely. - -## See the context - -Hover your mouse over the date that appears just beneath the chart itself. A tooltip will tell you the context for that -chart. Below, the context is `apps.cpu`. 
- -![See a chart's -context](https://user-images.githubusercontent.com/1153921/114212924-39ec0a00-9917-11eb-9a9e-7e171057b3fd.gif) - -## See the resolution and update frequency - -Hover your mouse over the timestamp just to the right of the date. `resolution` is the number of seconds between each -"tick" in the chart. `collection every` is how often Netdata collects and stores that metric. - -If the `resolution` value is higher than `collection every`, such as `resolution 5 secs, collected every 1 sec`, this -means that each tick is calculating represents the average values across a 5-second period. You can zoom in to increase -the resolution to `resolution 1 sec` to see the exact values. - -## Chart controls - -Many of the above interactions can also be triggered using the icons on the bottom-right corner of every chart. They -are, respectively, `Pan Left`, `Reset`, `Pan Right`, `Zoom In`, and `Zoom Out`. - -## Chart label filtering - -The chart label filtering feature supports grouping by and filtering each chart based on labels (key/value pairs) applicable to the context and provides fine-grain capability on slicing the data and metrics. - -All metrics collected get "tagged" with labels and values, thus providing a powerful way of slicing and visualizing all metrics related to the infrastructure. - -The chart label filtering is currently enabled on: - -- All charts on the **Overview** tab -- Custom dashboards - -![Chart filtering on Overview tab chart](https://user-images.githubusercontent.com/88642300/193084084-01074495-c826-4519-a09f-d210f7e3e6be.png) -![Chart filtering on Custom dashboard](https://user-images.githubusercontent.com/88642300/193084172-358dfded-c318-4d9f-b6e2-46a8fc33030b.png) - -The top panel on each chart displays the various filters and grouping options selected on the specific chart. These filters are specific for each chart and need to be manually configured on each chart. 
- -Additionally, the charts can be saved to a custom dashboard, new or existing, with the selected filters from the overview screen. - -![Chart filtering saved on custom dashboard](https://user-images.githubusercontent.com/88642300/193084225-1b65984e-566c-4815-8bc1-a2781d3564bd.png) - -## Custom labels for Collectors - -In addition to the default labels associated with a collector and metrics context (you can identify them by seeing which ones have an underscore as a prefix), there is now a new feature enabled to create custom labels. These custom labels may be needed to group your jobs or instances into various categories. - -These custom labels can be configured within your go.d plugins by simply associating a label key/value pair, as in the following eaxmple. - -```conf -jobs: - - name: example_1 - someOption: someValue - labels: - label1: value1 - label2: value2 - - name: example_2 - someOption: someValue - labels: - label3: value3 - label4: value4 -``` - -For instance, you may be running multiple Postgres database instances within an infrastructure. Some of these may be associated with testing environments, some with staging and some with production environments. You can now associate each Postgres job / instance with a custom label. The “group by” and filtering options will then allow you to associate individual jobs by specific labels. 
- -```conf -jobs: - - name: local - dsn: 'postgres://postgres:postgres@127.0.0.1:5432/postgres' - collect_databases_matching: '*' - labels: - instance_type: production - ``` - ![Group by individual job labels one](https://user-images.githubusercontent.com/88642300/193084580-49df500a-ddfb-45bb-a209-3c7a904ee9e0.png) - ![group by individual job labels two](https://user-images.githubusercontent.com/88642300/193084624-6d9848d0-9400-4e34-9cd4-78e50c784cc0.png) - -### Future Work - -We already have [configurable host labels](https://github.com/netdata/netdata/blob/master/docs/guides/using-host-labels.md) as well, which currently can’t be used to filter or group your metrics. We intend to provide the same capabilities described here with host labels, among other capabilities on other areas of the app as well - -## What's next? - -We recommend you read up on the differences between [chart dimensions, contexts, and -families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) to complete your understanding of how Netdata organizes its -dashboards. Another valuable way to interact with charts is to use the [timeframe -selector](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx), which helps you visualize specific moments of historical metrics. - -If you feel comfortable with the [dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx) and interacting with charts, we -recommend moving on to learning about [configuration](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md). While Netdata doesn't _require_ a -complicated setup process or a query language to create charts, there are a lot of ways to tweak the experience to match -your needs. 
- -### Further reading & related information - -- Dashboard - - [How the dashboard works](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx) - - [Netdata Cloud · Interact with new charts](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) - - [Chart dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx) - - [Select timeframes to visualize](https://github.com/netdata/netdata/blob/master/docs/dashboard/visualization-date-and-time-controls.mdx) - - [Import, export, and print a snapshot](https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.mdx) - - [Customize the standard dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx) diff --git a/docs/dashboard/reference-web-server.mdx b/docs/dashboard/reference-web-server.mdx deleted file mode 100644 index f90e6f873..000000000 --- a/docs/dashboard/reference-web-server.mdx +++ /dev/null @@ -1,278 +0,0 @@ ---- -title: "Web server reference" -description: "The Netdata Agent's local static-threaded web server serves dashboards and real-time visualizations with security and DDoS protection." -type: reference -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/dashboard/reference-web-server.mdx ---- - -# Web server reference - -The Netdata web server is `static-threaded`, with a fixed, configurable number of threads. - -All the threads are concurrently listening for web requests on the same sockets, and the kernel distributes the incoming -requests to them. Each thread uses non-blocking I/O so it can serve any number of web requests in parallel. - -This web server respects the `keep-alive` HTTP header to serve multiple HTTP requests via the same connection. 
-
-## Configuration
-
-From within your Netdata config directory (typically `/etc/netdata`), [use `edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) to
-open `netdata.conf`.
-
-```
-sudo ./edit-config netdata.conf
-```
-
-Scroll down to the `[web]` section to find the following settings.
-
-## Settings
-
-| Setting | Default | Description |
-|:---|:---|:---|
-| `ssl key` | `/etc/netdata/ssl/key.pem` | Declare the location of an SSL key to [enable HTTPS](#enable-httpstls-support). |
-| `ssl certificate` | `/etc/netdata/ssl/cert.pem` | Declare the location of an SSL certificate to [enable HTTPS](#enable-httpstls-support). |
-| `tls version` | `1.3` | Choose which TLS version to use. While all versions are allowed (`1` or `1.0`, `1.1`, `1.2` and `1.3`), we recommend `1.3` for the most secure encryption. If left blank, Netdata uses the highest available protocol version on your system. |
-| `tls ciphers` | `none` | Choose which TLS cipher to use. Options include `TLS_AES_256_GCM_SHA384`, `TLS_CHACHA20_POLY1305_SHA256`, and `TLS_AES_128_GCM_SHA256`. If left blank, Netdata uses the default cipher list for that protocol provided by your TLS implementation. |
-| `ses max window` | `15` | See [single exponential smoothing](https://github.com/netdata/netdata/blob/master/web/api/queries/ses/README.md). |
-| `des max window` | `15` | See [double exponential smoothing](https://github.com/netdata/netdata/blob/master/web/api/queries/des/README.md). |
-| `mode` | `static-threaded` | Turns on (`static-threaded` or off (`none`) the static-threaded web server. See the [example](#disable-the-web-server) to turn off the web server and disable the dashboard. |
-| `listen backlog` | `4096` | The port backlog. Check `man 2 listen`. |
-| `default port` | `19999` | The listen port for the static web server. |
-| `web files owner` | `netdata` | The user that owns the web static files. Netdata will refuse to serve a file that is not owned by this user, even if it has read access to that file. If the user given is not found, Netdata will only serve files owned by user given in `run as user`. |
-| `web files group` | `netdata` | If this is set, Netdata will check if the file is owned by this group and refuse to serve the file if it's not. |
-| `disconnect idle clients after seconds` | `60` | The time in seconds to disconnect web clients after being totally idle. |
-| `timeout for first request` | `60` | How long to wait for a client to send a request before closing the socket. Prevents slow request attacks. |
-| `accept a streaming request every seconds` | `0` | Can be used to set a limit on how often a parent node will accept streaming requests from child nodes in a [streaming and replication setup](https://github.com/netdata/netdata/blob/master/streaming/README.md). |
-| `respect do not track policy` | `no` | If set to `yes`, Netdata will respect the user's browser preferences for [Do Not Track](https://www.eff.org/issues/do-not-track) (DNT) and storing cookies. If DNT is _enabled_ in the browser, and this option is set to `yes`, users will not be able to sign in to Netdata Cloud via their local Agent dashboard, and their node will not connect to any [registry](https://github.com/netdata/netdata/blob/master/registry/README.md). For certain browsers, users must disable DNT and change this option to `yes` for full functionality. |
-| `x-frame-options response header` | ` ` | Avoid [clickjacking attacks](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options), by ensuring that the content is not embedded into other sites. |
-| `allow connections from` | `localhost *` | Declare which IP addresses or full-qualified domain names (FQDNs) are allowed to connect to the web server, including the [dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) or [HTTP API](https://github.com/netdata/netdata/blob/master/web/api/README.md). This is a global setting with higher priority to any of the ones below. |
-| `allow connections by dns` | `heuristic` | See the [access list examples](#access-lists) for details on using `allow` settings. |
-| `allow dashboard from` | `localhost *` | |
-| `allow dashboard by dns` | `heuristic` | |
-| `allow badges from` | `*` | |
-| `allow badges by dns` | `heuristic` | |
-| `allow streaming from` | `*` | |
-| `allow streaming by dns` | `heuristic` | |
-| `allow netdata.conf` | `localhost fd* 10.* 192.168.* 172.16.* 172.17.* 172.18.* 172.19.* 172.20.* 172.21.* 172.22.* 172.23.* 172.24.* 172.25.* 172.26.* 172.27.* 172.28.* 172.29.* 172.30.* 172.31.* UNKNOWN` | |
-| `allow netdata.conf by dns` | `no` | |
-| `allow management from` | `localhost` | |
-| `allow management by dns` | `heuristic` | |
-| `enable gzip compression` | `yes` | When set to `yes`, Netdata web responses will be GZIP compressed, if the web client accepts such responses. |
-| `gzip compression strategy` | `default` | Valid settings are `default`, `filtered`, `huffman only`, `rle` and `fixed`. |
-| `gzip compression level` | `3` | Valid settings are 1 (fastest) to 9 (best ratio). |
-| `web server threads` | ` ` | How many processor threads the web server is allowed. The default is system-specific, the minimum of `6` or the number of CPU cores. |
-| `web server max sockets` | ` ` | Available sockets. The default is system-specific, automatically adjusted to 50% of the max number of open files Netdata is allowed to use (via `/etc/security/limits.conf` or systemd), to allow enough file descriptors to be available for data collection. |
-| `custom dashboard_info.js` | ` ` | Specifies the location of a custom `dashboard.js` file. See [customizing the standard dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx#customize-the-standard-dashboard) for details. |
-
-## Examples
-
-### Disable the web server
-
-Disable the web server by editing `netdata.conf` and setting:
-
-```
-[web]
- mode = none
-```
-
-### Change the number of threads
-
-Control the number of threads and sockets with the following settings:
-
-```
-[web]
- web server threads = 4
- web server max sockets = 512
-```
-
-### Binding Netdata to multiple ports
-
-Netdata can bind to multiple IPs and ports, offering access to different services on each. Up to 100 sockets can be used (increase it at compile time with `CFLAGS="-DMAX_LISTEN_FDS=200" ./netdata-installer.sh ...`).
-
-The ports to bind are controlled via `[web].bind to`, like this:
-
-```
-[web]
- default port = 19999
- bind to = 127.0.0.1=dashboard^SSL=optional 10.1.1.1:19998=management|netdata.conf hostname:19997=badges [::]:19996=streaming^SSL=force localhost:19995=registry *:http=dashboard unix:/run/netdata/netdata.sock
-```
-
-Using the above, Netdata will bind to:
-
-- IPv4 127.0.0.1 at port 19999 (port was used from `default port`). Only the UI (dashboard) and the read API will be accessible on this port. Both HTTP and HTTPS requests will be accepted.
-- IPv4 10.1.1.1 at port 19998. The management API and `netdata.conf` will be accessible on this port.
-- All the IPs `hostname` resolves to (both IPv4 and IPv6 depending on the resolved IPs) at port 19997. Only badges will be accessible on this port.
-- All IPv6 IPs at port 19996. Only metric streaming requests from other Netdata agents will be accepted on this port. Only encrypted streams will be allowed (i.e. child nodes also need to be [configured for TLS](https://github.com/netdata/netdata/blob/master/streaming/README.md).
-- All the IPs `localhost` resolves to (both IPv4 and IPv6 depending the resolved IPs) at port 19996. This port will only accept registry API requests.
-- All IPv4 and IPv6 IPs at port `http` as set in `/etc/services`. Only the UI (dashboard) and the read API will be accessible on this port.
-- Unix domain socket `/run/netdata/netdata.sock`. All requests are serviceable on this socket. Note that in some OSs like Fedora, every service sees a different `/tmp`, so don't create a Unix socket under `/tmp`. `/run` or `/var/run` is suggested.
-
-The option `[web].default port` is used when an entries in `[web].bind to` do not specify a port.
-
-Note that the access permissions specified with the `=request type|request type|...` format are available from version 1.12 onwards.
-As shown in the example above, these permissions are optional, with the default being to permit all request types on the specified port.
-The request types are strings identical to the `allow X from` directives of the access lists, i.e. `dashboard`, `streaming`, `registry`, `netdata.conf`, `badges` and `management`.
-The access lists themselves and the general setting `allow connections from` in the next section are applied regardless of the ports that are configured to provide these services.
-The API requests are serviced as follows:
-
-- `dashboard` gives access to the UI, the read API and badges API calls.
-- `badges` gives access only to the badges API calls.
-- `management` gives access only to the management API calls.
-
-### Enable HTTPS/TLS support
-
-Since v1.16.0, Netdata supports encrypted HTTP connections to the web server, plus encryption of streaming data to a
-parent from its child nodes, via the TLS protocol.
-
-Inbound unix socket connections are unaffected, regardless of the TLS settings.
-
-> While Netdata uses Transport Layer Security (TLS) 1.2 to encrypt communications rather than the obsolete SSL protocol,
-> it's still common practice to refer to encrypted web connections as `SSL`. Many vendors, like Nginx and even Netdata
-> itself, use `SSL` in configuration files, whereas documentation will always refer to encrypted communications as `TLS`
-> or `TLS/SSL`.
-
-To enable TLS, provide the path to your certificate and private key in the `[web]` section of `netdata.conf`:
-
-```conf
-[web]
- ssl key = /etc/netdata/ssl/key.pem
- ssl certificate = /etc/netdata/ssl/cert.pem
-```
-
-Both files must be readable by the `netdata` user. If either of these files do not exist or are unreadable, Netdata will fall back to HTTP. For a parent-child connection, only the parent needs these settings.
-
-For test purposes, generate self-signed certificates with the following command:
-
-```bash
-openssl req -newkey rsa:2048 -nodes -sha512 -x509 -days 365 -keyout key.pem -out cert.pem
-```
-
-> If you use 4096 bits for your key and the certificate, Netdata will need more CPU to process the communication.
-> `rsa4096` can be up to 4 times slower than `rsa2048`, so we recommend using 2048 bits. Verify the difference
-> by running:
->
-> ```sh
-> openssl speed rsa2048 rsa4096
-> ```
-
-### Select TLS version
-
-Beginning with version `v1.21.0`, specify the TLS version and the ciphers that you want to use:
-
-```conf
-[web]
- tls version = 1.3
- tls ciphers = TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256
-```
-
-If you do not specify these options, Netdata will use the highest available protocol version on your system and the default cipher list for that protocol provided by your TLS implementation.
-
-#### TLS/SSL enforcement
-
-When the certificates are defined and unless any other options are provided, a Netdata server will:
-
-- Redirect all incoming HTTP web server requests to HTTPS. Applies to the dashboard, the API, `netdata.conf` and badges.
-- Allow incoming child connections to use both unencrypted and encrypted communications for streaming.
-
-To change this behavior, you need to modify the `bind to` setting in the `[web]` section of `netdata.conf`. At the end of each port definition, append `^SSL=force` or `^SSL=optional`. What happens with these settings differs, depending on whether the port is used for HTTP/S requests, or for streaming.
-
-| SSL setting | HTTP requests|HTTPS requests|Unencrypted Streams|Encrypted Streams|
-|:---------:|:-----------:|:------------:|:-----------------:|:----------------|
-| none | Redirected to HTTPS|Accepted|Accepted|Accepted|
-| `force`| Redirected to HTTPS|Accepted|Denied|Accepted|
-| `optional`| Accepted|Accepted|Accepted|Accepted|
-
-Example:
-
-```
-[web]
- bind to = *=dashboard|registry|badges|management|streaming|netdata.conf^SSL=force
-```
-
-For information how to configure the child to use TLS, check [securing the communication](https://github.com/netdata/netdata/blob/master/streaming/README.md#securing-streaming-communications) in the streaming documentation. There you will find additional details on the expected behavior for client and server nodes, when their respective TLS options are enabled.
-
-When we define the use of SSL in a Netdata agent for different ports, Netdata will apply the behavior specified on each port. For example, using the configuration line below:
-
-```
-[web]
- bind to = *=dashboard|registry|badges|management|streaming|netdata.conf^SSL=force *:20000=netdata.conf^SSL=optional *:20001=dashboard|registry
-```
-
-Netdata will:
-
-- Force all HTTP requests to the default port to be redirected to HTTPS (same port).
-- Refuse unencrypted streaming connections from child nodes on the default port.
-- Allow both HTTP and HTTPS requests to port 20000 for `netdata.conf`
-- Force HTTP requests to port 20001 to be redirected to HTTPS (same port). Only allow requests for the dashboard, the read API and the registry on port 20001.
-
-#### TLS/SSL errors
-
-When you start using Netdata with TLS, you may find errors in the Netdata log, which is stored at `/var/log/netdata/error.log` by default.
-
-Most of the time, these errors are due to incompatibilities between your browser's options related to TLS/SSL protocols and Netdata's internal configuration. The most common error is `error:00000006:lib(0):func(0):EVP lib`.
-
-In the near future, Netdata will allow our users to change the internal configuration to avoid similar errors. Until then, we're recommending only the most common and safe encryption protocols listed above.
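
As a rough illustration of the `bind to` syntax and the `^SSL=` enforcement table described in this section, the sketch below splits a single entry into its address, port, request types and SSL mode. It is not Netdata's parser: `BindEntry`, `parse_bind_entry` and `SSL_ENFORCEMENT` are hypothetical names introduced here, and the splitting rules are assumptions inferred only from the example entries in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BindEntry:
    """One whitespace-separated item of `[web].bind to` (hypothetical model)."""
    address: str                 # IP, hostname, "*", "[::]" or "unix:/path"
    port: Optional[str]          # None: fall back to `[web].default port`
    permissions: List[str] = field(default_factory=list)  # empty: all request types
    ssl: Optional[str] = None    # "force", "optional", or None when not given


# (plain HTTP request, unencrypted stream), copied from the enforcement table.
SSL_ENFORCEMENT = {
    None: ("redirected to HTTPS", "accepted"),
    "force": ("redirected to HTTPS", "denied"),
    "optional": ("accepted", "accepted"),
}


def parse_bind_entry(entry: str) -> BindEntry:
    # Strip the optional "^SSL=mode" suffix first, so its "=" is not
    # mistaken for the "=request type|request type" separator.
    head, _, ssl = entry.partition("^SSL=")
    target, _, perms = head.partition("=")
    permissions = perms.split("|") if perms else []

    if target.startswith("unix:"):
        # Unix domain sockets have no port.
        address, port = target, None
    elif target.startswith("["):
        # Bracketed IPv6 address, optionally followed by ":port".
        closing = target.index("]")
        address = target[: closing + 1]
        rest = target[closing + 1:]
        port = rest[1:] if rest.startswith(":") else None
    else:
        address, sep, port_text = target.rpartition(":")
        if sep:
            port = port_text
        else:
            address, port = target, None
    return BindEntry(address, port, permissions, ssl or None)


if __name__ == "__main__":
    line = ("127.0.0.1=dashboard^SSL=optional "
            "[::]:19996=streaming^SSL=force "
            "unix:/run/netdata/netdata.sock")
    for item in line.split():
        parsed = parse_bind_entry(item)
        http, stream = SSL_ENFORCEMENT[parsed.ssl]
        print(parsed, "| HTTP:", http, "| unencrypted stream:", stream)
```

The enforcement lookup only applies once a certificate and key are configured, and inbound Unix socket connections are unaffected by the TLS settings, so the value printed for the `unix:` entry is only the table's default row.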
-
-### Access lists
-
-Netdata supports access lists in `netdata.conf`:
-
-```
-[web]
- allow connections from = localhost *
- allow dashboard from = localhost *
- allow badges from = *
- allow streaming from = *
- allow netdata.conf from = localhost fd* 10.* 192.168.* 172.16.* 172.17.* 172.18.* 172.19.* 172.20.* 172.21.* 172.22.* 172.23.* 172.24.* 172.25.* 172.26.* 172.27.* 172.28.* 172.29.* 172.30.* 172.31.*
- allow management from = localhost
-```
-
-`*` does string matches on the IPs or FQDNs of the clients.
-
-- `allow connections from` matches anyone that connects on the Netdata port(s).
- So, if someone is not allowed, it will be connected and disconnected immediately, without reading even
- a single byte from its connection. This is a global setting with higher priority to any of the ones below.
-
-- `allow dashboard from` receives the request and examines if it is a static dashboard file or an API call the
- dashboards do.
-
-- `allow badges from` checks if the API request is for a badge. Badges are not matched by `allow dashboard from`.
-
-- `allow streaming from` checks if the child willing to stream metrics to this Netdata is allowed.
- This can be controlled per API KEY and MACHINE GUID in `stream.conf`.
- The setting in `netdata.conf` is checked before the ones in `stream.conf`.
-
-- `allow netdata.conf from` checks the IP to allow `http://netdata.host:19999/netdata.conf`.
- The IPs listed are all the private IPv4 addresses, including link local IPv6 addresses. Keep in mind that connections to Netdata API ports are filtered by `allow connections from`. So, IPs allowed by `allow netdata.conf from` should also be allowed by `allow connections from`.
-
-- `allow management from` checks the IPs to allow API management calls. Management via the API is currently supported for [health](https://github.com/netdata/netdata/blob/master/web/api/health/README.md#health-management-api)
-
-In order to check the FQDN of the connection without opening the Netdata agent to DNS-spoofing, a reverse-dns record
-must be setup for the connecting host. At connection time the reverse-dns of the peer IP address is resolved, and
-a forward DNS resolution is made to validate the IP address against the name-pattern.
-
-Please note that this process can be expensive on a machine that is serving many connections. Each access list has an
-associated configuration option to turn off DNS-based patterns completely to avoid incurring this cost at run-time:
-
-```
- allow connections by dns = heuristic
- allow dashboard by dns = heuristic
- allow badges by dns = heuristic
- allow streaming by dns = heuristic
- allow netdata.conf by dns = no
- allow management by dns = heuristic
-```
-
-The three possible values for each of these options are `yes`, `no` and `heuristic`. The `heuristic` option disables
-the check when the pattern only contains IPv4/IPv6 addresses or `localhost`, and enables it when wildcards are
-present that may match DNS FQDNs.
-
-## DDoS protection
-
-If you publish your Netdata web server to the internet, you may want to apply some protection against DDoS:
-
-1. Use the `static-threaded` web server (it is the default)
-2. Use reasonable `[web].web server max sockets` (the default is)
-3. Don't use all your CPU cores for Netdata (lower `[web].web server threads`)
-4. Run the `netdata` process with a low process scheduling priority (the default is the lowest)
-5. If possible, proxy Netdata via a full featured web server (Nginx, Apache, etc)
diff --git a/docs/dashboard/visualization-date-and-time-controls.md b/docs/dashboard/visualization-date-and-time-controls.md
new file mode 100644
index 000000000..99e4c308e
--- /dev/null
+++ b/docs/dashboard/visualization-date-and-time-controls.md
@@ -0,0 +1,92 @@
+# Visualization date and time controls
+
+Netdata's dashboard features powerful date visualization controls that include a time control, a timezone selector and a rich date and timeframe selector.
+
+The controls come with useful defaults and rich customization, to help you narrow your focus when troubleshooting issues or anomalies.
+
+## Time controls
+
+The time control provides you the following options: **Play**, **Pause** and **Force Play**.
+
+- **Play** - the content of the page will be automatically refreshed while this is in the foreground
+- **Pause** - the content of the page isn't refreshed due to a manual request to pause it or, for example, when your investigating data on a chart (cursor is on top of a chart)
+- **Force Play** - the content of the page will be automatically refreshed even if this is in the background
+
+With this, we aim to bring more clarity and allow you to distinguish if the content you are looking at is live or historical and also allow you to always refresh the content of the page when the tabs are in the background.
+
+Main use cases for **Force Play**:
+
+- You use a terminal or deployment tools to do changes in your infra and want to see the effect immediately, Netdata is in the background, displaying the impact of these changes
+- You want to have Netdata on the background, example displayed on a TV, to constantly see metrics through dashboards or to watch the alert status
+
+![The time control with Play, Pause and Force Play](https://user-images.githubusercontent.com/70198089/225850250-1fe12477-23f8-4b4d-b497-79b416963e10.png)
+
+## Date and time selector
+
+The date and time selector allows you to change the visible timeframe and change the timezone used in the interface.
+
+### Pick timeframes to visualize
+
+While [panning through time and zooming in/out](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) from charts it is helpful when you're looking a recent history, or want to do granular troubleshooting, what if you want to see metrics from 6 hours ago? Or 6 days?
+
+Netdata's dashboard features a **timeframe selector** to help you visualize specific timeframes in a few helpful ways.
+By default, it shows a certain number of minutes of historical metrics based on the your browser's viewport to ensure it's always showing per-second granularity.
+
+#### Open the timeframe selector
+
+To visualize a new timeframe, you need to open the picker, which appears just above the menu, near the top-right bar of the dashboard.
+
+![Timeframe Selector](https://user-images.githubusercontent.com/70198089/225850611-728936d9-7ca4-49fa-8d37-1ce73dd6f76c.png)
+
+The **Clear** button resets the dashboard back to its default state based on your browser viewport, and **Apply** closes
+the picker and shifts all charts to the selected timeframe.
+
+#### Use the pre-defined timeframes
+
+Click any of the following options in the predefined timeframe column to choose between:
+
+- Last 5 minutes
+- Last 15 minutes
+- Last 30 minutes
+- Last hour
+- Last 2 hours
+- Last 6 hours
+- Last 12 hours
+- Last day
+- Last 2 days
+- Last 7 days
+
+Click **Apply** to see metrics from your selected timeframe.
+
+#### Choose a specific interval
+
+Beneath the predefined timeframe columns is an input field and dropdown you use in combination to select a specific timeframe of
+minutes, hours, days, or months. Enter a number and choose the appropriate unit of time, then click **Apply**.
+
+#### Choose multiple days via the calendar
+
+Use the calendar to select multiple days. Click on a date to begin the timeframe selection, then an ending date. The
+timeframe begins at noon on the beginning and end dates. Click **Apply** to see your selected multi-day timeframe.
+
+#### Caveats and considerations
+
+**Longer timeframes will decrease metrics granularity**. At the default timeframe, based on your browser viewport, each
+"tick" on charts represents one second. If you select a timeframe of 6 hours, each tick represents the _average_ value
+across a larger period of time.
+
+**You can only see metrics as far back in history as your metrics retention policy allows**. Netdata uses an internal
+time-series database (TSDB) to store as many metrics as it can within a specific amount of disk space. The default
+storage is 256 MiB, which should be enough for 1-3 days of historical metrics. If you navigate back to a timeframe
+beyond stored historical metrics, you'll see this message:
+
+![image](https://user-images.githubusercontent.com/70198089/225851033-43b95164-a651-48f2-8915-6aac9739ed93.png)
+
+At any time, [configure the internal TSDB's storage capacity](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) to expand your
+depth of historical metrics.
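
To make the granularity caveat above concrete, here is a minimal sketch of what "each tick represents the average value across a larger period" can mean in practice. It is not Netdata's query engine: `downsample_average` is a hypothetical helper, and the figure of 600 chart points is an assumption chosen only for the arithmetic.

```python
from typing import List, Sequence


def downsample_average(samples: Sequence[float], points: int) -> List[float]:
    """Reduce per-second samples to at most `points` ticks by averaging.

    Each output tick is the mean of one contiguous bucket of input samples.
    """
    if points <= 0:
        raise ValueError("points must be positive")
    n = len(samples)
    if n == 0:
        return []
    points = min(points, n)  # never create empty buckets
    ticks = []
    for i in range(points):
        start = i * n // points
        end = (i + 1) * n // points
        bucket = samples[start:end]
        ticks.append(sum(bucket) / len(bucket))
    return ticks


if __name__ == "__main__":
    # A one-second spike of 60 inside an otherwise idle minute...
    minute = [0.0] * 59 + [60.0]
    # ...shows up as a single tick of value 1.0 once the minute is one tick.
    print(downsample_average(minute, 1))

    # Six hours of per-second data drawn on a (hypothetical) 600-point chart:
    six_hours = 6 * 60 * 60
    print(six_hours // 600, "seconds of data behind each tick")
```

The first call illustrates why short spikes flatten out on long timeframes: the averaged tick keeps the total but loses the peak.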
+
+### Timezone selector
+
+The default timezone used in all date and time fields in Netdata Cloud comes from your browser. To change it, open the
+date and time selector and use the control displayed here:
+
+![Timezone selector](https://user-images.githubusercontent.com/43294513/216628390-c3bd1cd2-349d-4523-b8d3-c7e68395f670.png)
diff --git a/docs/dashboard/visualization-date-and-time-controls.mdx b/docs/dashboard/visualization-date-and-time-controls.mdx
deleted file mode 100644
index a59a1f066..000000000
--- a/docs/dashboard/visualization-date-and-time-controls.mdx
+++ /dev/null
@@ -1,125 +0,0 @@
-<!--
-title: "Visualization date and time controls"
-description: "Netdata's dashboard features powerful date visualization controls that include a time control (play, pause, force play), a timezone selector and a rich date and timeframe selector, with useful defaults and rich customization, to help you narrow your focus when troubleshooting issues or anomalies."
-type: "how-to"
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/dashboard/visualization-date-and-time-controls.mdx"
-sidebar_label: "Visualization date and time controls"
-learn_status: "Published"
-learn_topic_type: "Concepts"
-learn_rel_path: "Concepts"
--->
-
-# Visualization date and time controls
-
-## Date and time selector
-
-### Pick timeframes to visualize
-
-While [panning through time and zooming in/out](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx) from charts it is helpful when
-you're looking a recent history, or want to do granular troubleshooting, what if you want to see metrics from 6 hours
-ago? Or 6 days?
-
-Netdata's dashboard features a **timeframe selector** to help you visualize specific timeframes in a few helpful ways.
-By default, it shows a certain number of minutes of historical metrics based on the your browser's viewport to ensure
-it's always showing per-second granularity.
-
-#### Open the timeframe selector
-
-To visualize a new timeframe, you need to open the picker, which appears just above the menu, near the top-right cover
-of the dashboard.
-
-![The timeframe selector in the local Agent
-dashboard](https://user-images.githubusercontent.com/1153921/101507784-2c585080-3934-11eb-9d6e-eff30b8553e4.png)
-
-The **Clear** button resets the dashboard back to its default state based on your browser viewport, and **Apply** closes
-the picker and shifts all charts to the selected timeframe.
-
-#### Use the Quick Selector
-
-Click any of the following options in the **Quick Selector** to choose a commonly-used timeframe.
-
-- Last 5 minutes
-- Last 15 minutes
-- Last 2 hours
-- Last 6 hours
-- Last 12 hours
-
-Click **Apply** to see metrics from your selected timeframe.
-
-#### Choose a specific interval
-
-Beneath the Quick Selector is an input field and dropdown you use in combination to select a specific timeframe of
-minutes, hours, days, or months. Enter a number and choose the appropriate unit of time, then click **Apply**.
-
-#### Choose multiple days
-
-Use the calendar to select multiple days. Click on a date to begin the timeframe selection, then an ending date. The
-timeframe begins at noon on the beginning and end dates. Click **Apply** to see your selected multi-day timeframe.
-
-## Time controls
-
-The time control provides you the following options: **Play**, **Pause** and **Force Play**.
-* **Play** - the content of the page will be automatically refreshed while this is in the foreground
-* **Pause** - the content of the page isn't refreshed due to a manual request to pause it or, for example, when your investigating data on a
-chart (cursor is on top of a chart)
-* **Force Play** - the content of the page will be automatically refreshed even if this is in the background
-
-With this, we aim to bring more clarity and allow you to distinguish if the content you are looking at is live or historical and also allow you
- to always refresh the content of the page when the tabs are in the background.
-
-Main use cases for **Force Play**:
-* You use a terminal or deployment tools to do changes in your infra and want to see immediately, Netdata is in the background, displaying the impact
-of these changes
-* You want to have Netdata on the background, example displayed on a TV, to constantly see metrics through dashboards or to watch the alert
-status
-
-![The time control with Play, Pause and
-Force Play](https://user-images.githubusercontent.com/82235632/129206460-03c47d0d-1a5b-428a-b972-473718b74bdb.png)
-
-## Timezone selector
-
-With the timezone selector, you have the ability to change the timezone on Netdata Cloud. More often than not teams are
-distributed in different timezones and they need to collaborate.
-
-Our goal is to make it easier for you and your teams to troubleshoot based on your timezone preference and communicate easily
-with varying timezones and timeframes without the need to be concerned about their specificity.
-
-<img width="437" alt="Untitled1" src="https://user-images.githubusercontent.com/43294513/216628390-c3bd1cd2-349d-4523-b8d3-c7e68395f670.png">
-
-When you change the timezone all the date and time fields will be updated to be displayed according to the specified timezone, this goes from
-charts to alerts information and across the Netdata Cloud.
-
-## Caveats and considerations
-
-**Longer timeframes will decrease metrics granularity**. At the default timeframe, based on your browser viewport, each
-"tick" on charts represents one second. If you select a timeframe of 6 hours, each tick represents the _average_ value
-across a larger period of time.
-
-**You can only see metrics as far back in history as your metrics retention policy allows**. Netdata uses an internal
-time-series database (TSDB) to store as many metrics as it can within a specific amount of disk space. The default
-storage is 256 MiB, which should be enough for 1-3 days of historical metrics. If you navigate back to a timeframe
-beyond stored historical metrics, you'll see this message:
-
-![Screenshot of reaching the end of historical metrics
-storage](https://user-images.githubusercontent.com/1153921/114207597-63a23280-9911-11eb-863d-4d2f75b030b4.png)
-
-At any time, [configure the internal TSDB's storage capacity](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) to expand your
-depth of historical metrics.
-
-## What's next?
-
-One useful next step after selecting a timeframe is [exporting the
-metrics](https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.mdx) into a snapshot file, which can then be shared and imported
-into any other Netdata dashboard.
-
-There are also many ways to [customize](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx) the standard dashboard experience, from changing
-the theme to editing the text that accompanies every section of charts.
-
-## Further reading & related information
-
-- Dashboard
- - [How the dashboard works](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx)
- - [Interact with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx)
- - [Chart dimensions, contexts, and families](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.mdx)
- - [Import, export, and print a snapshot](https://github.com/netdata/netdata/blob/master/docs/dashboard/import-export-print-snapshot.mdx)
- - [Customize the standard dashboard](https://github.com/netdata/netdata/blob/master/docs/dashboard/customize.mdx)
diff --git a/docs/export/enable-connector.md b/docs/export/enable-connector.md
index 28208e2f4..02e380e15 100644
--- a/docs/export/enable-connector.md
+++ b/docs/export/enable-connector.md
@@ -5,7 +5,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/export/ena
 sidebar_label: "Enable an exporting connector"
 learn_status: "Published"
 learn_topic_type: "Tasks"
-learn_rel_path: "Setup"
+learn_rel_path: "Configuration"
 -->
 # Enable an exporting connector
@@ -92,8 +92,8 @@ details.
 If you want to further configure your exporting connectors, see the [exporting engine reference](https://github.com/netdata/netdata/blob/master/exporting/README.md#configuration).
-For a comprehensive example of using the Graphite connector, read our guide:
-[_Export and visualize Netdata metrics in Graphite_](https://github.com/netdata/netdata/blob/master/docs/guides/export/export-netdata-metrics-graphite.md). Or, start
+For a comprehensive example of using the Graphite connector, read our documentation on
+[exporting metrics to Graphite providers](https://github.com/netdata/netdata/blob/master/exporting/graphite/README.md). Or, start
 [using host labels](https://github.com/netdata/netdata/blob/master/docs/guides/using-host-labels.md) on exported metrics.
 ### Related reference documentation
diff --git a/docs/export/external-databases.md b/docs/export/external-databases.md
index 00ca7410e..715e8660d 100644
--- a/docs/export/external-databases.md
+++ b/docs/export/external-databases.md
@@ -22,7 +22,7 @@ Based on your needs and resources you allocated to your external time-series dat
 that metrics are exported or export only certain charts with filtering. You can also choose whether metrics are exported as-collected, a normalized average, or the sum/volume of metrics values over the configured interval.
-Exporting is an important part of Netdata's effort to be [interoperable](https://github.com/netdata/netdata/blob/master/docs/overview/netdata-monitoring-stack.md)
+Exporting is an important part of Netdata's effort to be interoperable
 with other monitoring software. You can use an external time-series database for long-term metrics retention, further analysis, or correlation with other tools, such as application tracing.
@@ -75,19 +75,3 @@ documentation and the [enabling a connector](https://github.com/netdata/netdata/
 Can't find your preferred external time-series database? Ask our [community](https://community.netdata.cloud/) for solutions, or file an [issue on GitHub](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml).
-
-## What's next?
-
-We recommend you read our document on [enabling a connector](https://github.com/netdata/netdata/blob/master/docs/export/enable-connector.md) to learn about the
-process and discover important configuration options. If you would rather skip ahead, click on any of the above links to
-connectors for their reference documentation, which outline any prerequisites to install for that connector, along with
-connector-specific configuration options.
-
-Read about one possible use case for exporting metrics in our guide: [_Export and visualize Netdata metrics in
-Graphite_](https://github.com/netdata/netdata/blob/master/docs/guides/export/export-netdata-metrics-graphite.md).
-
-### Related reference documentation
-
-- [Exporting engine reference](https://github.com/netdata/netdata/blob/master/exporting/README.md)
-
-
diff --git a/docs/get-started.mdx b/docs/get-started.mdx
deleted file mode 100644
index aa82e811b..000000000
--- a/docs/get-started.mdx
+++ /dev/null
@@ -1,129 +0,0 @@
-<!--
-title: "Install Netdata"
-description: "Download and install the open-source Netdata monitoring agent on physical/virtual servers, Linux (Ubuntu/Debian/CentOS/etc), Docker, Kubernetes, and many others, often with one command."
-sidebar_label: "Install Netdata"
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/get-started.mdx"
-learn_status: "Published"
-learn_topic_type: "Tasks"
-learn_rel_path: "Getting started"
--->
-
-import { OneLineInstallWget, OneLineInstallCurl } from '@site/src/components/OneLineInstall/'
-import { InstallRegexLink, InstallBoxRegexLink } from '@site/src/components/InstallRegexLink/'
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
-Netdata is a free and open-source (FOSS) monitoring agent that collects thousands of hardware and software metrics from
-any physical or virtual system (we call them _nodes_). These metrics are organized in an easy-to-use and
-navigate interface.
-
-Together with [Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx), you can monitor your entire infrastructure in
-real time and troubleshoot problems that threaten the health of your nodes.
-
-Netdata runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT devices. It
-runs on Linux distributions (Ubuntu, Debian, CentOS, and more), container/microservice platforms (Kubernetes clusters,
-Docker), and many other operating systems (FreeBSD, macOS), with no `sudo` required.
-
-To install Netdata in minutes on your platform:
-
-1. Sign up to https://app.netdata.cloud/
-2. You will be presented with an empty space, and a prompt to "Connect Nodes" with the install command for each platform
-3. Select the platform you want to install Netdata to, copy and paste the script into your node's terminal, and run it
-
-Upon installation completing successfully, you should be able to see the node live in your Netdata Space!
-
-Continue reading for more advanced instructions and installation options.
-
-## Install on Linux with one-line installer
-
-The **recommended** way to install Netdata on a Linux node (physical, virtual, container, IoT) is our one-line
-[kickstart script](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/kickstart.md).
-This script automatically installs dependencies and builds Netdata from its source code.
-
-To install, copy the script, paste it into your node's terminal, and hit `Enter` to begin the installation process.
-
- <Tabs>
- <TabItem value="wget" label=<code>wget</code>>
-
- <OneLineInstallWget/>
-
- </TabItem>
- <TabItem value="curl" label=<code>curl</code>>
-
- <OneLineInstallCurl/>
-
- </TabItem>
-</Tabs>
-
-:::note
-If you plan to also Claim the node to Netdata Cloud,
-make sure to replace `YOUR_CLAIM_TOKEN` with the claim token of your space,
-and `YOUR_ROOM_ID` with the ID of the room you are willing to claim to.
-:::
-
-Jump down to [what's next](#whats-next) to learn how to view your new dashboard and take your next steps monitoring and
-troubleshooting with Netdata.
- -## Other installation options - -<InstallRegexLink> - <InstallBoxRegexLink - to="[](https://github.com/netdata/netdata/blob/master/packaging/docker/README.md)" - os="Run with Docker" - svg="docker" /> - <InstallBoxRegexLink - to="[](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/kubernetes.md)" - os="Deploy on Kubernetes" - svg="kubernetes" /> - <InstallBoxRegexLink - to="[](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/macos.md)" - os="Install on macOS" - svg="macos" /> - <InstallBoxRegexLink - to="[](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/manual.md)" - os="Linux from Git" - svg="linux" /> - <InstallBoxRegexLink - to="[](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/source.md)" - os="Linux from source" - svg="linux" /> - <InstallBoxRegexLink - to="[](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/offline.md)" - os="Linux for offline nodes" - svg="linux" /> -</InstallRegexLink> - - -## What's next? - -To start using Netdata, open a browser and navigate to `http://NODE:19999`, replacing `NODE` with either `localhost` or -the hostname/IP address of a remote node. - -Where you go from here is based on your use case, immediate needs, and experience with monitoring and troubleshooting. - -### Dashboard - -Learn more about [how the dashboard works](https://github.com/netdata/netdata/blob/master/docs/dashboard/how-dashboard-works.mdx), or dive directly into the many ways -to [interact with charts](https://github.com/netdata/netdata/blob/master/docs/dashboard/interact-charts.mdx). 
- -### Configuration - -Discover the recommended way to [configure Netdata's settings or behavior](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) using our built-in -`edit-config` script, then apply that knowledge to mission-critical tweaks, such as [changing how long Netdata stores -metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md). - -### Data collection - -If Netdata didn't autodetect all the hardware, containers, services, or applications running on your node, you should -learn more about [how data collectors work](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md). If there's a [supported -collector](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) for metrics you need, [configure the collector](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) -or read about its requirements to configure your endpoint to publish metrics in the correct format and endpoint. - -### Alarms & notifications - -Netdata comes with hundreds of preconfigured alarms, designed by our monitoring gurus in parallel with our open-source -community, but you may want to [edit alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) or -[enable notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to customize your Netdata experience. - -### Make your deployment production ready - -Both [securing Netdata](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md) and [setting up replication](https://github.com/netdata/netdata/blob/master/streaming/README.md) are strongly recommended. 
diff --git a/docs/getting-started/integrations.md b/docs/getting-started/integrations.md deleted file mode 100644 index 9f38a67d0..000000000 --- a/docs/getting-started/integrations.md +++ /dev/null @@ -1,12 +0,0 @@ -<!-- -title: "Integrations" -sidebar_label: "Integrations" -custom_edit_url: null -learn_status: "Published" -learn_topic_type: "Getting started" -learn_rel_path: "Getting started" -learn_docs_purpose: "Present all the Netdata integrations" -learn_doc_type: "AUTOGENERATED" ---> - -This page is autogenerated, this is placeholder document
\ No newline at end of file diff --git a/docs/getting-started/introduction.md b/docs/getting-started/introduction.md index 1ace5e3a6..b164074bd 100644 --- a/docs/getting-started/introduction.md +++ b/docs/getting-started/introduction.md @@ -1,13 +1,6 @@ -<!-- -title: "Introduction" -sidebar_label: "Introduction" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/getting-started/intro.md" -learn_status: "Published" -sidebar_position: "1" -learn_topic_type: "Getting started" -learn_rel_path: "Getting started" -learn_docs_purpose: "Present netdata in a nutshell" ---> +# Getting started with Netdata + +Learn how Netdata can get you monitoring your infrastructure in minutes. ## What is Netdata ? @@ -21,34 +14,34 @@ Netdata is: ### Simple to deploy -- **One-line deployment** for Linux distributions, plus support for Kubernetes/Docker infrastructures. -- **Zero configuration and maintenance** required to collect thousands of metrics, every second, from the underlying +- **One-line deployment** for Linux distributions, plus support for Kubernetes/Docker infrastructures. +- **Zero configuration and maintenance** required to collect thousands of metrics, every second, from the underlying OS and running applications. -- **Prebuilt charts and alarms** alert you to common anomalies and performance issues without manual configuration. -- **Distributed storage** to simplify the cost and complexity of storing metrics data from any number of nodes. +- **Prebuilt charts and alarms** alert you to common anomalies and performance issues without manual configuration. +- **Distributed storage** to simplify the cost and complexity of storing metrics data from any number of nodes. 
### Powerful and scalable -- **1% CPU utilization, a few MB of RAM, and minimal disk I/O** to run the monitoring Agent on bare metal, virtual +- **1% CPU utilization, a few MB of RAM, and minimal disk I/O** to run the monitoring Agent on bare metal, virtual machines, containers, and even IoT devices. -- **Per-second granularity** for an unlimited number of metrics based on the hardware and applications you're running +- **Per-second granularity** for an unlimited number of metrics based on the hardware and applications you're running on your nodes. -- **Interoperable exporters** let you connect Netdata's per-second metrics with an existing monitoring stack and other +- **Interoperable exporters** let you connect Netdata's per-second metrics with an existing monitoring stack and other time-series databases. ### Optimized for troubleshooting -- **Visual anomaly detection** with a UI/UX that emphasizes the relationships between charts. -- **Customizable dashboards** to pinpoint correlated metrics, respond to incidents, and help you streamline your +- **Visual anomaly detection** with a UI/UX that emphasizes the relationships between charts. +- **Customizable dashboards** to pinpoint correlated metrics, respond to incidents, and help you streamline your workflows. -- **Distributed metrics in a centralized interface** to assist users or teams trace complex issues between distributed +- **Distributed metrics in a centralized interface** to assist users or teams trace complex issues between distributed nodes. ### Secure by design -- **Distributed data architecture** so fast and efficient, there’s no limit to the number of metrics you can follow. -- Because your data is **stored at the edge**, security is ensured. -- +- **Distributed data architecture** so fast and efficient, there’s no limit to the number of metrics you can follow. +- Because your data is **stored at the edge**, security is ensured. 
+ ### Comparison with other monitoring solutions Netdata offers many benefits over the existing monitoring landscape, whether they're expensive SaaS products or other @@ -66,21 +59,19 @@ open-source tools. | **Kills the console** for tracing performance issues | The console is always required for troubleshooting | | Requires **zero dedicated resources** | Require large dedicated resources | - Netdata works with tons of applications, notifications platforms, and other time-series databases: -- **300+ system, container, and application endpoints**: Collectors autodetect metrics from default endpoints and +- **300+ system, container, and application endpoints**: Collectors autodetect metrics from default endpoints and immediately visualize them into meaningful charts designed for troubleshooting. See [everything we support](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md). -- **20+ notification platforms**: Netdata's health watchdog sends warning and critical alarms to your [favorite +- **20+ notification platforms**: Netdata's health watchdog sends warning and critical alarms to your [favorite platform](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to inform you of anomalies just seconds after they affect your node. -- **30+ external time-series databases**: Export resampled metrics as they're collected to other [local- and +- **30+ external time-series databases**: Export resampled metrics as they're collected to other [local- and Cloud-based databases](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) for best-in-class interoperability. - -## How it works +## How it works Netdata is a highly efficient, highly modular, metrics management engine. Its lockless design makes it ideal for concurrent operations on the metrics. @@ -93,9 +84,9 @@ And a higher level diagram in this one. 
![Diagram 2 of Netdata's core functionality](https://user-images.githubusercontent.com/1153921/95367248-5f755980-0889-11eb-827f-9b7aa02a556e.png) -You can even visit this slightly dated [interactive infographic](https://my-netdata.io/infographic.html) and get lost in a rabbit hole. +You can even visit this slightly dated [interactive infographic](https://my-netdata.io/infographic.html) and get lost in a rabbit hole. -But the best way to get under the hood or in the steering wheel of our highly efficient, low-latency system (supporting multiple readers and one writer on each metric) is to read the rest of our docs, or just to jump in and [get started](app.netdata.com). But here's a good breakdown: +But the best way to get under the hood or in the steering wheel of our highly efficient, low-latency system (supporting multiple readers and one writer on each metric) is to read the rest of our docs, or just to jump in and [get started](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md). But here's a good breakdown: ### Netdata Agent @@ -104,12 +95,56 @@ Netdata's distributed monitoring Agent collects thousands of metrics from system You can install Netdata on most Linux distributions (Ubuntu, Debian, CentOS, and more), container/microservice platforms (Kubernetes clusters, Docker), and many other operating systems (FreeBSD, macOS), with no sudo required. ### Netdata Cloud + Netdata Cloud is a web application that gives you real-time visibility for your entire infrastructure. With Netdata Cloud, you can view key metrics, insightful charts, and active alarms from all your nodes in a single web interface. When an anomaly strikes, seamlessly navigate to any node to troubleshoot and discover the root cause with the familiar Netdata dashboard. Netdata Cloud is free! You can add an entire infrastructure of nodes, invite all your colleagues, and visualize any number of metrics, charts, and alarms entirely for free. 
While Netdata Cloud offers a centralized method of monitoring your Agents, your metrics data is not stored or centralized in any way. Metrics data remains with your nodes and is only streamed to your browser, through Cloud, when you're viewing the Netdata Cloud interface. +## Use Netdata standalone or as part of your monitoring stack + +Netdata is an extremely powerful monitoring, visualization, and troubleshooting platform. While you can use it as an +effective standalone tool, we also designed it to be open and interoperable with other tools you might already be using. + +Netdata helps you collect everything and scales to infrastructure of any size, but it doesn't lock-in data or force you +to use specific tools or methodologies. Each feature is extensible and interoperable so they can work in parallel with +other tools. For example, you can use Netdata to collect metrics, visualize metrics with a second open-source program, +and centralize your metrics in a cloud-based time-series database solution for long-term storage or further analysis. + +You can build a new monitoring stack, including Netdata, or integrate Netdata's metrics with your existing monitoring +stack. No matter which route you take, Netdata helps you monitor infrastructure of any size. + +Here are a few ways to enrich your existing monitoring and troubleshooting stack with Netdata: + +### Collect metrics from Prometheus endpoints + +Netdata automatically detects 600 popular endpoints and collects per-second metrics from them via the [generic +Prometheus collector](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/README.md). This even +includes support for Windows 10 via [`windows_exporter`](https://github.com/prometheus-community/windows_exporter). + +This collector is installed and enabled on all Agent installations by default, so you don't need to waste time +configuring Netdata. 
Netdata will detect these Prometheus metrics endpoints and collect even more granular metrics than +your existing solutions. You can now use all of Netdata's meaningfully-visualized charts to diagnose issues and +troubleshoot anomalies. + +### Export metrics to external time-series databases + +Netdata can send its per-second metrics to external time-series databases, such as InfluxDB, Prometheus, Graphite, +TimescaleDB, ElasticSearch, AWS Kinesis Data Streams, Google Cloud Pub/Sub Service, and many others. + +Once you have Netdata's metrics in a secondary time-series database, you can use them however you'd like, such as with +additional visualization/dashboarding tools or aggregation of data from multiple sources. + +### Visualize metrics with Grafana + +One popular monitoring stack is Netdata, Prometheus, and Grafana. Netdata acts as the stack's metrics collection +powerhouse, Prometheus as the time-series database, and Grafana as the visualization platform. You can also use Graphite instead of Prometheus, +or directly use the [Netdata source plugin for Grafana](https://blog.netdata.cloud/introducing-netdata-source-plugin-for-grafana/). + +Of course, just because you export or visualize metrics elsewhere, it doesn't mean Netdata's equivalent features +disappear. You can always build new dashboards in Netdata Cloud, drill down into per-second metrics using Netdata's +charts, or use Netdata's health watchdog to send notifications whenever an anomaly strikes. ## Community @@ -120,14 +155,14 @@ ask questions, find resources, and engage with passionate professionals.
The tea You can also find Netdata on: -- [Twitter](https://twitter.com/linuxnetdata) -- [YouTube](https://www.youtube.com/c/Netdata) -- [Reddit](https://www.reddit.com/r/netdata/) -- [LinkedIn](https://www.linkedin.com/company/netdata-cloud/) -- [StackShare](https://stackshare.io/netdata) -- [Product Hunt](https://www.producthunt.com/posts/netdata-monitoring-agent/) -- [Repology](https://repology.org/metapackage/netdata/versions) -- [Facebook](https://www.facebook.com/linuxnetdata/) +- [Twitter](https://twitter.com/linuxnetdata) +- [YouTube](https://www.youtube.com/c/Netdata) +- [Reddit](https://www.reddit.com/r/netdata/) +- [LinkedIn](https://www.linkedin.com/company/netdata-cloud/) +- [StackShare](https://stackshare.io/netdata) +- [Product Hunt](https://www.producthunt.com/posts/netdata-monitoring-agent/) +- [Repology](https://repology.org/metapackage/netdata/versions) +- [Facebook](https://www.facebook.com/linuxnetdata/) ## Contribute diff --git a/docs/glossary.md b/docs/glossary.md new file mode 100644 index 000000000..fe61cc111 --- /dev/null +++ b/docs/glossary.md @@ -0,0 +1,180 @@ +# Glossary + +The Netdata community welcomes engineers, SREs, admins, etc. of all levels of expertise with engineering and the Netdata tool. And just as a journey of a thousand miles starts with one step, sometimes, the journey to mastery begins with understanding a single term. + +As such, we want to provide a little Glossary as a reference starting point for new users who might be confused about the Netdata vernacular that more familiar users might take for granted. + +If you're here looking for the definition of a term you heard elsewhere in our community or products, or if you just want to learn Netdata from the ground up, you've come to the right page. + +Use the alphabatized list below to find the answer to your single-term questions, and click the bolded list items to explore more on the topics! 
We'll be sure to keep constantly updating this list, so if you hear a word that you would like for us to cover, just let us know or submit a request! + +[A](#a) | [B](#b) | [C](#c) | [D](#d)| [E](#e) | [F](#f) | [G](#g) | [H](#h) | [I](#i) | [J](#j) | [K](#k) | [L](#l) | [M](#m) | [N](#n) | [O](#o) | [P](#p) +| [Q](#q) | [R](#r) | [S](#s) | [T](#t) | [U](#u) | [V](#v) | [W](#w) | [X](#x) | [Y](#y) | [Z](#z) + +## A + +- [**Agent** or **Netdata Agent**](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md): Netdata's distributed monitoring Agent collects thousands of metrics from systems, hardware, and applications with zero configuration. It runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT devices. + +- [**Agent-cloud link** or **ACLK**](https://github.com/netdata/netdata/blob/master/aclk/README.md): The Agent-Cloud link (ACLK) is the mechanism responsible for securely connecting a Netdata Agent to your web browser through Netdata Cloud. + +- [**Aggregate Function**](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md#aggregate-functions-over-time): A function applied When the granularity of the data collected is higher than the plotted points on the chart. + +- [**Alerts** (formerly **Alarms**)](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md): With the information that appears on Netdata Cloud and the local dashboard about active alerts, you can configure alerts to match your infrastructure's needs or your team's goals. + +- [**Alarm Entity Type**](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#health-entity-reference): Entity types that are attached to specific charts and use the `alarm` label. 
+ +- [**Anomaly Advisor**](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md): A Netdata feature that lets you quickly surface potentially anomalous metrics and charts related to a particular highlight window of interest. + +## B + +- [**Bookmarks**](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md#manage-spaces): Netdata Cloud's bookmarks put your tools in one accessible place. Bookmarks are shared between all War Rooms in a Space, so any users in your Space will be able to see and use them. + +## C + +- [**Child**](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md#streaming-basics): A node, running Netdata, that streams metric data to one or more parent. + +- [**Cloud** or **Netdata Cloud**](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md): Netdata Cloud is a web application that gives you real-time visibility for your entire infrastructure. With Netdata Cloud, you can view key metrics, insightful charts, and active alarms from all your nodes in a single web interface. + +- [**Collector**](https://github.com/netdata/netdata/blob/master/collectors/README.md#collector-architecture-and-terminology): A catch-all term for any Netdata process that gathers metrics from an endpoint. + +- [**Community**](https://community.netdata.cloud/): As a company with a passion and genesis in open-source, we are not just very proud of our community, but we consider our users, fans, and chatters to be an imperative part of the Netdata experience and culture. + +- [**Composite Charts**](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#overview-and-single-node-view): Charts used by the **Overview** tab which aggregate metrics from all the nodes (or a filtered selection) in a given War Room. 
+ +- [**Context**](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.md#context): A way of grouping charts by the types of metrics collected and dimensions displayed. It's kind of like a machine-readable naming and organization scheme. + +- [**Custom dashboards**](https://github.com/netdata/netdata/blob/master/web/gui/custom/README.md) A dashboard that you can create using simple HTML (no javascript is required for basic dashboards). + +## D + +- [**Dashboards**](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md): Out-of-the box visual presentation of metrics that allows you to make sense of your infrastructure and its health and performance. + +- [**Definition Bar**](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md): Bar within a composite chart that provides important information and options about the metrics within the chart. + +- [**Dimension**](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.md#dimension): A dimension is a value that gets shown on a chart. + +- [**Distributed Architecture**](https://github.com/netdata/netdata/blob/master/docs/store/distributed-data-architecture.md): The data architecture mindset with which Netdata was built, where all data are collected and stored on the edge, whenever it's possible, creating countless benefits. + +## E + +- [**External Plugins**](https://github.com/netdata/netdata/blob/master/collectors/plugins.d/README.md): These gather metrics from external processes, such as a webserver or database, and run as independent processes that communicate with the Netdata daemon via pipes. + +## F + +- [**Family**](https://github.com/netdata/netdata/blob/master/docs/dashboard/dimensions-contexts-families.md#family): 1. What we consider our Netdata community of users and engineers. 2. 
A single instance of a hardware or software resource that needs to be displayed separately from similar instances. + +- [**Flood Protection**](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md#flood-protection): If a node has too many state changes like firing too many alerts or going from reachable to unreachable, Netdata Cloud enables flood protection. As long as a node is in flood protection mode, Netdata Cloud does not send notifications about this node + +- [**Functions** or **Netdata Functions**](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md): Routines exposed by a collector on the Netdata Agent that can bring additional information to support troubleshooting or trigger some action to happen on the node itself. + +## G + +- [**Guided Troubleshooting**](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/troubleshooting-overview.md): Troubleshooting with our Machine-Learning-powered tools designed to give you a cutting edge advantage in your troubleshooting battles. + +- [**Group by**](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md#group-by-dimension-node-or-chart): The drop-down on the dimension bar of a composite chart that allows you to group metrics by dimension, node, or chart. + +## H + +- [**Headless Collector Streaming**](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md#supported-streaming-configurations): Streaming configuration where child `A`, _without_ a database or web dashboard, streams metrics to parent `B`. + +- [**Health Configuration Files**](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#edit-health-configuration-files): Files that you can edit to configure your Agent's health watchdog service. 
+ +- [**Health Entity Reference**](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#health-entity-reference): + +- [**Home** tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#home): Tab in Netdata Cloud that provides a predefined dashboard of relevant information about entities in the War Room. + +## I + +- [**Internal plugins**](https://github.com/netdata/netdata/blob/master/collectors/README.md#collector-architecture-and-terminology): These gather metrics from `/proc`, `/sys`, and other Linux kernel sources. They are written in `C` and run as threads within the Netdata daemon. + +## K + +- [**Kickstart** or **Kickstart Script**](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/kickstart.md): An automatic one-line installation script named 'kickstart.sh' that works on all Linux distributions and macOS. + +- [**Kubernetes Dashboard** or **Kubernetes Tab**](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md): Netdata Cloud features enhanced visualizations for the resource utilization of Kubernetes (k8s) clusters, embedded in the default Overview dashboard. + +## M + +- [**Metrics Collection**](https://github.com/netdata/netdata/blob/master/collectors/README.md): With zero configuration, Netdata auto-detects thousands of data sources upon starting and immediately collects per-second metrics. Netdata can immediately collect metrics from these endpoints thanks to 300+ collectors, which all come pre-installed when you install Netdata. + +- [**Metric Correlations**](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md): A Netdata feature that lets you quickly find metrics and charts related to a particular window of interest that you want to explore further. 
+ +- [**Metrics Exporting**](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md): Netdata allows you to export metrics to external time-series databases with the exporting engine. This system uses a number of connectors to initiate connections to more than thirty supported databases, including InfluxDB, Prometheus, Graphite, ElasticSearch, and much more. + +- [**Metrics Storage**](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md): Upon collection the collected metrics need to be either forwarded, exported or just stored for further treatment. The Agent is capable to store metrics both short and long-term, with or without the usage of non-volatile storage. + +- [**Metrics Streaming Replication**](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md): Each node running Netdata can stream the metrics it collects, in real time, to another node. Metric streaming allows you to replicate metrics data across multiple nodes, or centralize all your metrics data into a single time-series database (TSDB). + +- [**Module**](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md#enable-and-disable-a-specific-collection-module): A type of collector. + +## N + +- [**Netdata**](https://github.com/netdata/netdata/blob/master/docs/getting-started/introduction.md): Netdata is a monitoring tool designed by system administrators, DevOps engineers, and developers to collect everything, help you visualize +metrics, troubleshoot complex performance problems, and make data interoperable with the rest of your monitoring stack. + +- [**Netdata Agent** or **Agent**](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md): Netdata's distributed monitoring Agent collects thousands of metrics from systems, hardware, and applications with zero configuration. 
It runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT devices. + +- [**Netdata Cloud** or **Cloud**](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md): Netdata Cloud is a web application that gives you real-time visibility for your entire infrastructure. With Netdata Cloud, you can view key metrics, insightful charts, and active alarms from all your nodes in a single web interface. + +- [**Netdata Functions** or **Functions**](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md): Routines exposed by a collector on the Netdata Agent that can bring additional information to support troubleshooting or trigger some action to happen on the node itself. + +<!-- No link for this keyword - **Netdata Logs** https://github.com/netdata/netdata/blob/master/docs/tasks/miscellaneous/check-netdata-logs.md: The three log files - `error.log`, `access.log` and `debug.log` - used by Netdata --> + +<!-- Here we need to explain Agent notifications and Cloud notifications, not just "notifications" + +- **Notifications** https://github.com/netdata/netdata/blob/master/docs/concepts/health-monitoring/notifications.md: Netdata can send centralized alert notifications to your team whenever a node enters a warning, critical, or unreachable state. By enabling notifications, you ensure no alert, on any node in your infrastructure, goes unnoticed by you or your team. --> + +## O + +- [**Obsoletion**(of nodes)](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md#obsoleting-offline-nodes-from-a-space): Removing nodes from a space. + +- [**Orchestrators**](https://github.com/netdata/netdata/blob/master/collectors/README.md#collector-architecture-and-terminology): External plugins that run and manage one or more modules. They run as independent processes. 
+ +- [**Overview** tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#overview-and-single-node-view): Tab in Netdata Cloud that uses composite charts. These charts display real-time aggregated metrics from all the nodes (or a filtered selection) in a given War Room. + +## P + +- [**Parent**](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md#streaming-basics): A node, running Netdata, that receives streamed metric data. + +- [**Proxy**](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md#streaming-basics): A node, running Netdata, that receives metric data from a child and "forwards" them on to a separate parent node. + +- [**Proxy Streaming**](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md#supported-streaming-configurations): Streaming configuration where child `A`, _with or without_ a database, sends metrics to proxy `C`, also _with or without_ a database. `C` sends metrics to parent `B` + +## R + +- [**Registry**](https://github.com/netdata/netdata/blob/master/registry/README.md): Registry that allows Netdata to provide unified cross-server dashboards. + +- [**Replication Streaming**](https://github.com/netdata/netdata/blob/master/streaming/README.md): Streaming configuration where child `A`, _with_ a database and web dashboard, streams metrics to parent `B`. + +- [**Room** or **War Room**](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md): War Rooms organize your connected nodes and provide infrastructure-wide dashboards using real-time metrics and visualizations. 
+
+## S
+
+- [**Single Node Dashboard**](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#overview-and-single-node-view): A dashboard pre-configured with every installation of the Netdata Agent, with thousands of metrics and hundreds of interactive charts, that requires no setup.
+
+<!-- No link for this file in current structure. - **Snapshots** https://github.com/netdata/netdata/blob/master/docs/tasks/miscellaneous/snapshot-data.md: An image of your dashboard at any given time, which can be imported into any other node running Netdata or used to generate a PDF file for your records. -->
+
+- [**Space**](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md): A high-level container and virtual collaboration area where you can organize team members, access levels, and the nodes you want to monitor.
+
+## T
+
+- [**Template Entity Type**](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#entity-types): Entity type that defines rules that apply to all charts of a specific context, and use the template label. Templates help you apply one entity to all disks, all network interfaces, all MySQL databases, and so on.
+
+- [**Tiers**](https://github.com/netdata/netdata/blob/master/database/engine/README.md#tiers): Tiering is a mechanism of providing multiple tiers of data with different granularity of metrics (the frequency they are collected and stored, i.e. their resolution).
+
+## U
+
+- [**Unlimited Scalability**](https://www.netdata.cloud/#:~:text=love%20community%20contributions!-,Infinite%20Scalability,-By%20storing%20data): With Netdata's distributed architecture, you can seamlessly observe a couple, hundreds or
+even thousands of nodes. There are no actual bottlenecks especially if you retain metrics locally in the Agents.
+
+## V
+
+- [**Visualizations**](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/visualizations-overview.md): Netdata uses dimensions, contexts, and families to sort your metric data into graphs, charts, and alerts that maximize your understanding of your infrastructure and your ability to troubleshoot it, alone or on a team.
+
+## W
+
+- [**War Room** or **Room**](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md): War Rooms organize your connected nodes and provide infrastructure-wide dashboards using real-time metrics and visualizations.
+
+## Z
+
+- [**Zero Configuration**](https://github.com/netdata/netdata/blob/master/docs/getting-started/introduction.md#simple-to-deploy): Netdata is preconfigured and capable of autodetecting and monitoring any well-known application that runs on your system. You just deploy and claim Netdata Agents in your Netdata space, and monitor them in seconds.
diff --git a/docs/guidelines.md b/docs/guidelines.md
index 6c1c3ba7c..e8ff98e4e 100644
--- a/docs/guidelines.md
+++ b/docs/guidelines.md
@@ -1,772 +1,71 @@
-<!--
-title: "Contribute to the documentation"
-sidebar_label: "to Documentation"
-custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/guidelines.md"
-sidebar_position: "10000"
-learn_status: "Published"
-learn_topic_type: "Custom"
-learn_rel_path: "Contribute"
-learn_docs_purpose: "TBD"
--->
-
-import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';
+# Contribute to the documentation
 Welcome to our docs developer guidelines!
-This document will guide you to the process of contributing to our
-docs (**learn.netdata.cloud**)
+This document will guide you through the process of contributing to our docs (**learn.netdata.cloud**).
 ## Documentation architecture
-Netdata docs follows has two principals.
-
-1. Keep the documentation of each component _as close as you can to the codebase_
-2. Every component is analyzed via topic related docs.
-
-To this end:
-
-1. 
Documentation lives in every possible repo in the netdata organization. At the moment we contribute to: - - netdata/netdata - - netdata/learn (final site) - - netdata/go.d.plugin - - netdata/agent-service-discovery - - In each of these repos you will find markdown files. These markdown files may or not be part of the final docs. You - understand what documents are part of the final docs in the following section:[_How to update documentation of - learn.netdata.cloud_](#how-to-update-documentation-of-learn-netdata-cloud) - -2. Netdata docs processes are inspired from - the [DITA 1.2 guidelines](http://docs.oasis-open.org/dita/v1.2/os/spec/archSpec/dita-1.2_technicalContent_overview.html) - for Technical content. - -## Topic types - -### Concepts - -A concept introduces a single feature or concept. A concept should answer the questions: - -- What is this? -- Why would I use it? - -Concept topics: - -- Are abstract ideas -- Explain meaning or benefit -- Can stay when specifications change -- Provide background information - -### Tasks - -Concept and reference topics exist to support tasks. _The goal for users … is not to understand a concept but to -complete a task_. A task gives instructions for how to complete a procedure. - -Much of the uncertainty whether a topic is a concept or a reference disappears, when you have strong, solid task topics -in place, furthermore topics directly address your users and their daily tasks and help them to get their job done. A -task **must give an answer** to the **following questions**: - -- How do I create cool espresso drinks with my new coffee machine? -- How do I clean the milk steamer? - -For the title text, use the structure active verb + noun. For example, for instance _Deploy the Agent_. - -### References - -The reference document and information types provide for the separation of fact-based information from concepts and -tasks. 
\ -Factual information may include tables and lists of specifications, parameters, parts, commands, edit-files and other -information that the users are likely to look up. The reference information type allows fact-based content to be -maintained by those responsible for its accuracy and consistency. - -## Contribute to the documentation of learn.netdata.cloud - -### Encapsulate topics into markdown files. - -Netdata uses markdown files to document everything. To implement concrete sections of these [Topic types](#topic-types) -we encapsulate this logic as follows. Every document is characterized by its topic type ('learn_topic_type' metadata -field). To avoid breaking every single netdata concept into numerous small markdown files each document can be either a -single `Reference` or `Concept` or `Task` or a group of `References`, `Concepts`, `Tasks`. - -To this end, every single topic is encapsulated into a `Heading 3 (###)` section. That means, when you have a single -file you only make use of `Headings 4` and lower (`4, 5, 6`, for templated section or subsection). In case you want to -includ multiple (`Concepts` let's say) in a single document, you use `Headings 3` to seperate each concept. `Headings 2` -are used only in case you want to logically group topics inside a document. - -For instance: - -```markdown - -Small introduction of the document. - -### Concept A - -Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna -aliqua. - -#### Field from template 1 - -Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. - -#### Field from template 1 - -Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. - -##### Subsection 1 - -. . . - -### Concept A - -Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. - -#### Field from template 1 - -. . . 
- - -``` - -This approach gives a clean and readable outlook in each document from a single sidebar. - -Here you can find the preferred templates for each topic type: - - -<Tabs> - <TabItem value="Concept" label="Concept" default> - - ```markdown - Small intro, give some context to the user of what you will cover on this document - - ### concept title (omit if the document describes only one concept) - - A concept introduces a single feature or concept. A concept should answer the questions: - - 1. What is this? - 2. Why would I use it? - - ``` - - </TabItem> - <TabItem value="Task" label="Tasks"> - - ```markdown - Small intro, give some context to the user of what you will cover on this document - - ### Task title (omit if the document describes only one task) - - #### Prerequisite - - Unordered list of what you will need. - - #### Steps - - Exact list of step the user must follow - - #### Expected result - - What you expect to see when you complete the steps above - - #### Example - - Example configuration/actions of the task - - #### Related reference documentation - - List of reference docs user needs to be aware of. - ``` - - </TabItem> - <TabItem value="Reference-collectors" label="Reference-collectors"> +Our documentation in <https://learn.netdata.cloud> is generated by markdown documents in the public Github repositories of the "netdata" organization. - ```markdown - Small intro, give some context to the user of what you will cover on this document +The structure of the documentation is handled by a [map](https://github.com/netdata/learn/blob/master/map.tsv) file, that contains metadata for every markdown files in the repos we ingest from. - ### Reference name (omit if the document describes only one reference) +Then the ingest script parses that map and organizes the markdown files accordingly. 
- #### Requirements - - Document any dependencies needed to run this module +### Improve existing documentation - #### Requirements on the monitored component +The easiest way to contribute to Netdata's documentation is to edit a file directly on GitHub. This is perfect for small fixes to a single document, such as fixing a typo or clarifying a confusing sentence. - Document any steps user must take to sucessful monitor application, - for instance (create a user) +Each published document on [Netdata Learn](https://learn.netdata.cloud) includes at the bottom a link to **Edit this page**. +Clicking on that link is the recommended way to improve our documentation, as it leads you directly to GitHub's code editor. +Make your suggested changes, and use the ***Preview changes*** button to ensure your Markdown syntax works as expected. - #### Configuration files - - table with path and configuration files purpose - Columns: File name | Description (Purpose in a nutshell) - - #### Data collection - - To make changes, see `the ./edit-config task <link>` - - #### Auto discovery - - ##### Single node installation - - . . . we autodetect localhost:port and what configurations are defaults - - ##### Kubernetes installations - - . . . Service discovery, click here - - #### Metrics - - Columns: Metric (Context) | Scope | description (of the context) | dimensions | units (of the context) | Alert triggered - - - #### Alerts - - Collapsible content for every alert, just like the alert guides - - #### Configuration options - - Table with all the configuration options available. - - Columns: name | description | default | file_name - - #### Configuration example - - Default configuration example - - #### Troubleshoot - - backlink to the task to run this module in debug mode (here you provide the debug flags) - - -``` - - </TabItem> -</Tabs> - -### Metadata fields - -All Docs that are supposed to be part of learn.netdata.cloud have **hidden** sections in the begining of document. 
These -sections are plain lines of text and we call them metadata. Their represented as `key : "Value"` pairs. Some of them are -needed from our statice website builder (docusaurus) others are needed for our internal pipelines to build docs -(have prefix `learn_`). - -So let's go through the different necessary metadata tags to get a document properly published on Learn: - -| metadata_key | Value(s) | Frontmatter effect | Mandatory | Limitations | -|:---------------------:|---------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------:|:---------------------------------------:| -| `title` | `String` | Title in each document | yes | | -| `custom_edit_url` | `String` | The source GH link of the file | yes | | -| `description` | `String or multiline String` | - | yes | | -| `sidebar_label` | `String or multiline String` | Name in the TOC tree | yes | | -| `sidebar_position` | `String or multiline String` | Global position in the TOC tree (local for per folder) | yes | | -| `learn_status` | [`Published`, `Unpublished`, `Hidden`] | `Published`: Document visible in learn,<br/> `Unpublished`: Document archived in learn, <br/>`Hidden`: Documentplaced under learn_rel_path but it's hidden] | yes | | -| `learn_topic_type` | [`Concepts`, `Tasks`, `References`, `Getting Started`] | | yes | | -| `learn_rel_path` | `Path` (the path you want this file to appear in learn<br/> without the /docs prefix and the name of the file | | yes | | -| `learn_autogenerated` | `Dictionary` (for internal use) | | no | Keys in the dictionary must be in `' '` | - -:::important - -1. In case any mandatory tags are missing or falsely inputted the file will remain unpublished. This is by design to - prevent non-properly tagged files from getting published. -2. 
All metadata values must be included in `" "`. From `string` noted text inside the fields use `' ''` - - -While Docusaurus can make use of more metadata tags than the above, these are the minimum we require to publish the file -on Learn. - -::: - -### Placing a document in learn - -Here you can see how the metadata are parsed and create a markdown file in learn. - -![](https://user-images.githubusercontent.com/12612986/207310336-f7cc150b-543c-4f13-be98-5058a4d29284.png) - -### Before you get started - -Anyone interested in contributing to documentation should first read the [Netdata style guide](#styling-guide) further -down below and the [Netdata Community Code of Conduct](https://github.com/netdata/.github/blob/main/CODE_OF_CONDUCT.md). - -Netdata's documentation uses Markdown syntax. If you're not familiar with Markdown, read -the [Mastering Markdown](https://guides.github.com/features/mastering-markdown/) guide from GitHub for the basics on -creating paragraphs, styled text, lists, tables, and more, and read further down about some special -occasions [while writing in MDX](#mdx-and-markdown). - -### Making your first contribution - -The easiest way to contribute to Netdata's documentation is to edit a file directly on GitHub. This is perfect for small -fixes to a single document, such as fixing a typo or clarifying a confusing sentence. - -Click on the **Edit this page** button on any published document on [Netdata Learn](https://learn.netdata.cloud). Each -page has two of these buttons: One beneath the table of contents, and another at the end of the document, which take you -to GitHub's code editor. Make your suggested changes, keeping the [Netdata style guide](#styling-guide) -in mind, and use the ***Preview changes*** button to ensure your Markdown syntax works as expected. - -Under the **Commit changes** header, write descriptive title for your requested change. Click the **Commit changes** -button to initiate your pull request (PR). 
+Under the **Commit changes** header, write a descriptive title for your requested change. Click the **Commit changes** button to initiate your pull request (PR).
 Jump down to our instructions on [PRs](#making-a-pull-request) for your next steps.
-**Note**: If you wish to contribute documentation that is more tailored from your specific infrastructure
-monitoring/troubleshooting experience, please consider submitting a blog post about your experience. Check
-the [README](https://github.com/netdata/blog/blob/master/README.md) in our blog repo! Any blog submissions that have
-widespread or universal application will be integrated into our permanent documentation.
-
-### Edit locally
-
-Editing documentation locally is the preferred method for complex changes that span multiple documents or change the
-documentation's style or structure.
-
-Create a fork of the Netdata Agent repository by visit the [Netdata repository](https://github.com/netdata/netdata) and
-clicking on the **Fork** button.
-
-GitHub will ask you where you want to clone the repository. When finished, you end up at the index of your forked
-Netdata Agent repository. Clone your fork to your local machine:
-
-```bash
-git clone https://github.com/YOUR-GITHUB-USERNAME/netdata.git
-```
-
-Create a new branch using `git checkout -b BRANCH-NAME`. Use your favorite text editor to make your changes, keeping
-the [Netdata style guide](https://github.com/netdata/netdata/blob/master/docs/contributing/style-guide.md) in mind. Add, commit, and push changes to your fork. When you're
-finished, visit the [Netdata Agent Pull requests](https://github.com/netdata/netdata/pulls) to create a new pull request
-based on the changes you made in the new branch of your fork.
-
-### Making a pull request
-
-Pull requests (PRs) should be concise and informative. See our [PR guidelines](/contribute/handbook#pr-guidelines) for
-specifics.
- -- The title must follow the [imperative mood](https://en.wikipedia.org/wiki/Imperative_mood) and be no more than ~50 - characters. -- The description should explain what was changed and why. Verify that you tested any code or processes that you are - trying to change. - -The Netdata team will review your PR and assesses it for correctness, conciseness, and overall quality. We may point to -specific sections and ask for additional information or other fixes. - -After merging your PR, the Netdata team rebuilds the [documentation site](https://learn.netdata.cloud) to publish the -changed documentation. - -## Styling guide - -The *Netdata style guide* establishes editorial guidelines for any writing produced by the Netdata team or the Netdata -community, including documentation, articles, in-product UX copy, and more. Both internal Netdata teams and external -contributors to any of Netdata's open-source projects should reference and adhere to this style guide as much as -possible. - -Netdata's writing should **empower** and **educate**. You want to help people understand Netdata's value, encourage them -to learn more, and ultimately use Netdata's products to democratize monitoring in their organizations. To achieve these -goals, your writing should be: - -- **Clear**. Use simple words and sentences. Use strong, direct, and active language that encourages readers to action. -- **Concise**. Provide solutions and answers as quickly as possible. Give users the information they need right now, - along with opportunities to learn more. -- **Universal**. Think of yourself as a guide giving a tour of Netdata's products, features, and capabilities to a - diverse group of users. Write to reach the widest possible audience. - -You can achieve these goals by reading and adhering to the principles outlined below. 
- -If you're not familiar with Markdown, read -the [Mastering Markdown](https://guides.github.com/features/mastering-markdown/) guide from GitHub for the basics on -creating paragraphs, styled text, lists, tables, and more. - -The following sections describe situations in which a specific syntax is required. - -#### Syntax standards (`remark-lint`) - -The Netdata team uses [`remark-lint`](https://github.com/remarkjs/remark-lint) for Markdown code styling. +### Create a new document -- Use a maximum of 120 characters per line. -- Begin headings with hashes, such as `# H1 heading`, `## H2 heading`, and so on. -- Use `_` for italics/emphasis. -- Use `**` for bold. -- Use dashes `-` to begin an unordered list, and put a single space after the dash. -- Tables should be padded so that pipes line up vertically with added whitespace. +You can create a pull request to add a completely new markdown document in any of our public repositories. +After the Github pull request is merged, our documentation team will decide where in the documentation hierarchy to publish that document. -If you want to see all the settings, open the -[`remarkrc.js`](https://github.com/netdata/netdata/blob/master/.remarkrc.js) file in the `netdata/netdata` repository. +If you wish to contribute documentation that is tailored to your specific infrastructure monitoring/troubleshooting experience, please consider submitting a blog post about your experience. -#### MDX and markdown +Check out our [blog](https://github.com/netdata/blog#readme) repo! Any blog submissions that have widespread or universal application will be integrated into our permanent documentation. -While writing in Docusaurus, you might want to take leverage of it's features that are supported in MDX formatted files. -One of those that we use is [Tabs](https://docusaurus.io/docs/next/markdown-features/tabs). They use an HTML syntax, -which requires some changes in the way we write markdown inside them. 
+#### Before you get started -In detail: +Anyone interested in contributing significantly to documentation should first read the [Netdata style guide](https://github.com/netdata/netdata/blob/master/docs/contributing/style-guide.md) and the [Netdata Community Code of Conduct](https://github.com/netdata/.github/blob/main/CODE_OF_CONDUCT.md). -Due to a bug with docusaurus, we prefer to use `<h1>heading</h1> instead of # H1` so that docusaurus doesn't render the -contents of all Tabs on the right hand side, while not being able to navigate -them [relative link](https://github.com/facebook/docusaurus/issues/7008). +Netdata's documentation uses Markdown syntax. If you're not familiar with Markdown, read the [Mastering Markdown](https://guides.github.com/features/mastering-markdown/) guide from GitHub for the basics on creating paragraphs, styled text, lists, tables, and more. -You can use markdown syntax for every other styling you want to do except Admonitions: -For admonitions, follow [this](https://docusaurus.io/docs/markdown-features/admonitions#usage-in-jsx) guide to use -admonitions inside JSX. While writing in JSX, all the markdown stylings have to be in HTML format to be rendered -properly. +#### Edit locally -#### Admonitions +Editing documentation locally is the preferred method for completely new documents, or complex changes that span multiple documents. Clone the repository where you wish to make your changes, work on a new branch and create a pull request with that branch. -Use admonitions cautiously. Admonitions may draw user's attention, to that end we advise you to use them only for side -content/info, without significantly interrupting the document flow. +### Links to other documents -You can find the supported admonitions in the docusaurus's [documentation](https://docusaurus.io/docs/markdown-features/admonitions). 
+Please ensure that any links to a different documentation resource are fully expanded URLs to the relevant markdown document, not links to "learn.netdata.cloud". -#### Images +e.g. -Don't rely on images to convey features, ideas, or instructions. Accompany every image with descriptive alt text. - -In Markdown, use the standard image syntax, `![](/docs/agent/contributing)`, and place the alt text between the -brackets `[]`. Here's an example using our logo: - -```markdown -![The Netdata logo](/docs/agent/web/gui/static/img/netdata-logomark.svg) ``` - -Reference in-product text, code samples, and terminal output with actual text content, not screen captures or other -images. Place the text in an appropriate element, such as a blockquote or code block, so all users can parse the -information. - -#### Syntax highlighting - -Our documentation site at [learn.netdata.cloud](https://learn.netdata.cloud) uses -[Prism](https://v2.docusaurus.io/docs/markdown-features#syntax-highlighting) for syntax highlighting. Netdata can use -any of -the [supported languages by prism-react-renderer](https://github.com/FormidableLabs/prism-react-renderer/blob/master/src/vendor/prism/includeLangs.js) -. - -If no language is specified, Prism tries to guess the language based on its content. - -Include the language directly after the three backticks (```` ``` ````) that start the code block. 
For highlighting C -code, for example: - -````c -```c -inline char *health_stock_config_dir(void) { - char buffer[FILENAME_MAX + 1]; - snprintfz(buffer, FILENAME_MAX, "%s/health.d", netdata_configured_stock_config_dir); - return config_get(CONFIG_SECTION_DIRECTORIES, "stock health config", buffer); -} -``` -```` - -And the prettified result: - -```c -inline char *health_stock_config_dir(void) { - char buffer[FILENAME_MAX + 1]; - snprintfz(buffer, FILENAME_MAX, "%s/health.d", netdata_configured_stock_config_dir); - return config_get(CONFIG_SECTION_DIRECTORIES, "stock health config", buffer); -} +[Correct link to this document](https://github.com/netdata/netdata/blob/master/docs/guidelines.md) +vs +[Incorrect link to this document](https://learn.netdata.cloud/XYZ) ``` -Prism also supports titles and line highlighting. See -the [Docusaurus documentation](https://v2.docusaurus.io/docs/markdown-features#code-blocks) for more information. - -## Language, grammar, and mechanics - -#### Voice and tone - -One way we write empowering, educational content is by using a consistent voice and an appropriate tone. - -*Voice* is like your personality, which doesn't really change day to day. - -*Tone* is how you express your personality. Your expression changes based on your attitude or mood, or based on who -you're around. In writing, your reflect tone in your word choice, punctuation, sentence structure, or even the use of -emoji. - -The same idea about voice and tone applies to organizations, too. Our voice shouldn't change much between two pieces of -content, no matter who wrote each, but the tone might be quite different based on who we think is reading. - -For example, a [blog post](https://www.netdata.cloud/blog/) and a [press release](https://www.netdata.cloud/news/) -should have a similar voice, despite most often being written by different people. However, blog posts are relaxed and -witty, while press releases are focused and academic. 
You won't see any emoji in a press release. - -##### Voice - -Netdata's voice is authentic, passionate, playful, and respectful. - -- **Authentic** writing is honest and fact-driven. Focus on Netdata's strength while accurately communicating what - Netdata can and cannot do, and emphasize technical accuracy over hard sells and marketing jargon. -- **Passionate** writing is strong and direct. Be a champion for the product or feature you're writing about, and let - your unique personality and writing style shine. -- **Playful** writing is friendly, thoughtful, and engaging. Don't take yourself too seriously, as long as it's not at - the expense of Netdata or any of its users. -- **Respectful** writing treats people the way you want to be treated. Prioritize giving solutions and answers as - quickly as possible. - -##### Tone - -Netdata's tone is fun and playful, but clarity and conciseness comes first. We also tend to be informal, and aren't -afraid of a playful joke or two. - -While we have general standards for voice and tone, we do want every individual's unique writing style to reflect in -published content. - -#### Universal communication - -Netdata is a global company in every sense, with employees, contributors, and users from around the world. We strive to -communicate in a way that is clear and easily understood by everyone. - -Here are some guidelines, pointers, and questions to be aware of as you write to ensure your writing is universal. Some -of these are expanded into individual sections in -the [language, grammar, and mechanics](#language-grammar-and-mechanics) section below. - -- Would this language make sense to someone who doesn't work here? -- Could someone quickly scan this document and understand the material? -- Create an information hierarchy with key information presented first and clearly called out to improve scannability. 
-- Avoid directional language like "sidebar on the right of the page" or "header at the top of the page" since - presentation elements may adapt for devices. -- Use descriptive links rather than "click here" or "learn more". -- Include alt text for images and image links. -- Ensure any information contained within a graphic element is also available as plain text. -- Avoid idioms that may not be familiar to the user or that may not make sense when translated. -- Avoid local, cultural, or historical references that may be unfamiliar to users. -- Prioritize active, direct language. -- Avoid referring to someone's age unless it is directly relevant; likewise, avoid referring to people with age-related - descriptors like "young" or "elderly." -- Avoid disability-related idioms like "lame" or "falling on deaf ears." Don't refer to a person's disability unless - it’s directly relevant to what you're writing. -- Don't call groups of people "guys." Don't call women "girls." -- Avoid gendered terms in favor of neutral alternatives, like "server" instead of "waitress" and "businessperson" - instead of "businessman." -- When writing about a person, use their communicated pronouns. When in doubt, just ask or use their name. It's OK to - use "they" as a singular pronoun. - -> Some of these guidelines were adapted from MailChimp under the Creative Commons license. - -To ensure Netdata's writing is clear, concise, and universal, we have established standards for language, grammar, and -certain writing mechanics. However, if you're writing about Netdata for an external publication, such as a guest blog -post, follow that publication's style guide or standards, while keeping -the [preferred spelling of Netdata terms](#netdata-specific-terms) in mind. - -#### Active voice - -Active voice is more concise and easier to understand compared to passive voice. When using active voice, the subject of -the sentence is action. In passive voice, the subject is acted upon. 
A famous example of passive voice is the phrase -"mistakes were made." - -| | | -| --------------- | ----------------------------------------------------------------------------------------- | -| Not recommended | When an alarm is triggered by a metric, a notification is sent by Netdata. | -| **Recommended** | When a metric triggers an alarm, Netdata sends a notification to your preferred endpoint. | - -#### Second person - -Use the second person ("you") to give instructions or "talk" directly to users. - -In these situations, avoid "we," "I," "let's," and "us," particularly in documentation. The "you" pronoun can also be -implied, depending on your sentence structure. - -One valid exception is when a member of the Netdata team or community wants to write about said team or community. - -| | | -| ------------------------------ | ------------------------------------------------------------ | -| Not recommended | To install Netdata, we should try the one-line installer... | -| **Recommended** | To install Netdata, you should try the one-line installer... | -| **Recommended**, implied "you" | To install Netdata, try the one-line installer... | - -#### "Easy" or "simple" - -Using words that imply the complexity of a task or feature goes against our policy -of [universal communication](#universal-communication). If you claim that a task is easy and the reader struggles to -complete it, you may inadvertently discourage them. - -However, if you give users two options and want to relay that one option is genuinely less complex than another, be -specific about how and why. - -For example, don't write, "Netdata's one-line installer is the easiest way to install Netdata." Instead, you might want -to say, "Netdata's one-line installer requires fewer steps than manually installing from source." 
- -#### Slang, metaphors, and jargon - -A particular word, phrase, or metaphor you're familiar with might not translate well to the other cultures featured -among Netdata's global community. We recommended you avoid slang or colloquialisms in your writing. - -In addition, don't use abbreviations that have not yet been defined in the content. See our section on -[abbreviations](#abbreviations-acronyms-and-initialisms) for additional guidance. - -If you must use industry jargon, such as "mean time to resolution," define the term as clearly and concisely as you can. - -> Netdata helps you reduce your organization's mean time to resolution (MTTR), which is the average time the responsible -> team requires to repair a system and resolve an ongoing incident. - -#### Spelling - -While the Netdata team is mostly *not* American, we still aspire to use American spelling whenever possible, as it is -the standard for the monitoring industry. - -See the [word list](#word-list) for spellings of specific words. - -#### Capitalization - -Follow the general [English standards](https://owl.purdue.edu/owl/general_writing/mechanics/help_with_capitals.html) for -capitalization. In summary: +This permalink ensures that the link will not be broken by any future restructuring in learn.netdata.cloud. -- Capitalize the first word of every new sentence. -- Don't use uppercase for emphasis. (Netdata is the BEST!) -- Capitalize the names of brands, software, products, and companies according to their official guidelines. (Netdata, - Docker, Apache, NGINX) -- Avoid camel case (NetData) or all caps (NETDATA). +You can see the URL to the source of any published documentation page in the **Edit this page** link at the bottom. +If you just replace `edit` with `blob` in that URL, you have the permalink to the original markdown document. -Whenever you refer to the company Netdata, Inc., or the open-source monitoring agent the company develops, capitalize -**Netdata**. 
- -However, if you are referring to a process, user, or group on a Linux system, use lowercase and fence the word in an -inline code block: `` `netdata` ``. - -| | | -| --------------- | ---------------------------------------------------------------------------------------------- | -| Not recommended | The netdata agent, which spawns the netdata process, is actively maintained by netdata, inc. | -| **Recommended** | The Netdata Agent, which spawns the `netdata` process, is actively maintained by Netdata, Inc. | - -##### Capitalization of document titles and page headings - -Document titles and page headings should use sentence case. That means you should only capitalize the first word. - -If you need to use the name of a brand, software, product, and company, capitalize it according to their official -guidelines. - -Also, don't put a period (`.`) or colon (`:`) at the end of a title or header. - -| | | -| --------------- | --------------------------------------------------------------------------------------------------- | -| Not recommended | Getting Started Guide <br />Service Discovery and Auto-Detection: <br />Install netdata with docker | -| ** -Recommended** | Getting started guide <br />Service discovery and auto-detection <br />Install Netdata with Docker | - -#### Abbreviations (acronyms and initialisms) - -Use abbreviations (including [acronyms and initialisms](https://www.dictionary.com/e/acronym-vs-abbreviation/)) in -documentation when one exists, when it's widely accepted within the monitoring/sysadmin community, and when it improves -the readability of a document. - -When introducing an abbreviation to a document for the first time, give the reader both the spelled-out version and the -shortened version at the same time. For example: - -> Use Netdata to monitor Extended Berkeley Packet Filter (eBPF) metrics in real-time. After you define an abbreviation, don't switch back and forth. Use only the abbreviation for the rest of the document. 
- -You can also use abbreviations in a document's title to keep the title short and relevant. If you do this, you should -still introduce the spelled-out name alongside the abbreviation as soon as possible. - -#### Clause order - -When instructing users to take action, give them the context first. By placing the context in an initial clause at the -beginning of the sentence, users can immediately know if they want to read more, follow a link, or skip ahead. - -| | | -| --------------- | ------------------------------------------------------------------------------ | -| Not recommended | Read the reference guide if you'd like to learn more about custom dashboards. | -| **Recommended** | If you'd like to learn more about custom dashboards, read the reference guide. | - -#### Oxford comma - -The Oxford comma is the comma used after the second-to-last item in a list of three or more items. It appears just -before "and" or "or." - -| | | -| --------------- | ---------------------------------------------------------------------------- | -| Not recommended | Netdata can monitor RAM, disk I/O, MySQL queries per second and lm-sensors. | -| **Recommended** | Netdata can monitor RAM, disk I/O, MySQL queries per second, and lm-sensors. | - -#### Future releases or features - -Do not mention future releases or upcoming features in writing unless they have been previously communicated via a -public roadmap. - -In particular, documentation must describe, as accurately as possible, the Netdata Agent _as of -the [latest commit](https://github.com/netdata/netdata/commits/master) in the GitHub repository_. For Netdata Cloud, -documentation must reflect the *current state* of [production](https://app.netdata.cloud). - -#### Informational links - -Every link should clearly state its destination. Don't use words like "here" to describe where a link will take your -reader. 
- -| | | -| --------------- | ------------------------------------------------------------------------------------------ | -| Not recommended | To install Netdata, click [here](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md). | -| **Recommended** | To install Netdata, read the [installation instructions](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md). | - -Use links as often as required to provide necessary context. Blog posts and guides require less hyperlinks than -documentation. See the section on [linking between documentation](#linking-between-documentation) for guidance on the -Markdown syntax and path structure of inter-documentation links. - -#### Contractions - -Contractions like "you'll" or "they're" are acceptable in most Netdata writing. They're both authentic and playful, and -reinforce the idea that you, as a writer, are guiding users through a particular idea, process, or feature. - -Contractions are generally not used in press releases or other media engagements. - -#### Emoji - -Emoji can add fun and character to your writing, but should be used sparingly and only if it matches the content's tone -and desired audience. - -#### Switching Linux users - -Netdata documentation often suggests that users switch from their normal user to the `netdata` user to run specific -commands. Use the following command to instruct users to make the switch: - -```bash -sudo su -s /bin/bash netdata -``` - -#### Hostname/IP address of a node - -Use `NODE` instead of an actual or example IP address/hostname when referencing the process of navigating to a dashboard -or API endpoint in a browser. - -| | | -| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Not recommended | Navigate to `http://example.com:19999` in your browser to see Netdata's dashboard. 
<br />Navigate to `http://203.0.113.0:19999` in your browser to see Netdata's dashboard. | -| ** -Recommended** | Navigate to `http://NODE:19999` in your browser to see Netdata's dashboard. | - -If you worry that `NODE` doesn't provide enough context for the user, particularly in documentation or guides designed -for beginners, you can provide an explanation: - -> With the Netdata Agent running, visit `http://NODE:19999/api/v1/info` in your browser, replacing `NODE` with the IP -> address or hostname of your Agent. - -#### Paths and running commands - -When instructing users to run a Netdata-specific command, don't assume the path to said command, because not every -Netdata Agent installation will have commands under the same paths. When applicable, help them navigate to the correct -path, providing a recommendation or instructions on how to view the running configuration, which includes the correct -paths. - -For example, the [configuration](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) doc first teaches users how to find the Netdata config directory -and navigate to it, then runs commands from the `/etc/netdata` path so that the instructions are more universal. - -Don't include full paths, beginning from the system's root (`/`), as these might not work on certain systems. - -| | | -| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Not recommended | Use `edit-config` to edit Netdata's configuration: `sudo /etc/netdata/edit-config netdata.conf`. 
| -| ** -Recommended** | Use `edit-config` to edit Netdata's configuration by first navigating to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory), which is typically at `/etc/netdata`, then running `sudo edit-config netdata.conf`. | - -#### `sudo` - -Include `sudo` before a command if you believe most Netdata users will need to elevate privileges to run it. This makes -our writing more universal, and users on `sudo`-less systems are generally already aware that they need to run commands -differently. - -For example, most users need to use `sudo` with the `edit-config` script, because the Netdata config directory is owned -by the `netdata` user. Same goes for restarting the Netdata Agent with `systemctl`. - -| | | -| --------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | -| Not recommended | Run `edit-config netdata.conf` to configure the Netdata Agent. <br />Run `systemctl restart netdata` to restart the Netdata Agent. | -| ** -Recommended** | Run `sudo edit-config netdata.conf` to configure the Netdata Agent. <br />Run `sudo systemctl restart netdata` to restart the Netdata Agent. | - -## Deploy and test docs - -<!-- -TODO: Update this section after implemeting a _docker-compose_ for builting and testing learn ---> +### Making a pull request -The Netdata team aggregates and publishes all documentation at [learn.netdata.cloud](/) using -[Docusaurus](https://v2.docusaurus.io/) over at the [`netdata/learn` repository](https://github.com/netdata/learn). +Pull requests (PRs) should be concise and informative. +See our [PR guidelines](https://github.com/netdata/.github/blob/main/CONTRIBUTING.md#pr-guidelines) for specifics. -## Netdata-specific terms +The Netdata team will review your PR and assesses it for correctness, conciseness, and overall quality. 
+We may point to specific sections and ask for additional information or other fixes. -Consult the [Netdata Glossary](https://github.com/netdata/netdata/blob/master/docs/glossary.md) for Netdata-specific terms
\ No newline at end of file +After merging your PR, the Netdata team rebuilds the [documentation site](https://learn.netdata.cloud) to publish the changed documentation. diff --git a/docs/guides/collect-apache-nginx-web-logs.md b/docs/guides/collect-apache-nginx-web-logs.md index b4a525471..e9b38c27e 100644 --- a/docs/guides/collect-apache-nginx-web-logs.md +++ b/docs/guides/collect-apache-nginx-web-logs.md @@ -1,16 +1,8 @@ -<!-- -title: "Monitor Nginx or Apache web server log files with Netdata" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/collect-apache-nginx-web-logs.md ---> +# Monitor Nginx or Apache web server log files -# Monitor Nginx or Apache web server log files with Netdata +Parsing web server log files with Netdata, revealing the volume of redirects, requests and other metrics, can give you a better overview of your infrastructure. -Log files have been a critical resource for developers and system administrators who want to understand the health and -performance of their web servers, and Netdata is taking important steps to make them even more valuable. - -By parsing web server log files with Netdata, and seeing the volume of redirects, requests, or server errors over time, -you can better understand what's happening on your infrastructure. Too many bad requests? Maybe a recent deploy missed a -few small SVG icons. Too many requests? Time to batten down the hatches—it's a DDoS. +Too many bad requests? Maybe a recent deploy missed a few small SVG icons. Too many requests? Time to batten down the hatches—it's a DDoS. You can use the [LTSV log format](http://ltsv.org/), track TLS and cipher usage, and the whole parser is faster than ever. 
In one test on a system with SSD storage, the collector consistently parsed the logs for 200,000 requests in @@ -116,12 +108,5 @@ You can also edit this file directly with `edit-config`: ./edit-config health.d/weblog.conf ``` -For more information about editing the defaults or writing new alarm entities, see our [health monitoring -documentation](https://github.com/netdata/netdata/blob/master/health/README.md). - -## What's next? - -Now that you have web log collection up and running, we recommend you take a look at the collector's [documentation](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/README.md) for some ideas of how you can turn these rather "boring" logs into powerful real-time tools for keeping your servers happy. - -Don't forget to give GitHub user [Wing924](https://github.com/Wing924) a big 👍 for his hard work in starting up the Go -refactoring effort. +For more information about editing the defaults or writing new alarm entities, see our +[health monitoring documentation](https://github.com/netdata/netdata/blob/master/health/README.md). 
diff --git a/docs/guides/collect-unbound-metrics.md b/docs/guides/collect-unbound-metrics.md index 5400fd833..c5f4deb51 100644 --- a/docs/guides/collect-unbound-metrics.md +++ b/docs/guides/collect-unbound-metrics.md @@ -1,7 +1,11 @@ <!-- title: "Monitor Unbound DNS servers with Netdata" +sidebar_label: "Monitor Unbound DNS servers with Netdata" date: 2020-03-31 custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/collect-unbound-metrics.md +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Miscellaneous" --> # Monitor Unbound DNS servers with Netdata diff --git a/docs/guides/configure/performance.md b/docs/guides/configure/performance.md index 256d6e854..2e5e105fe 100644 --- a/docs/guides/configure/performance.md +++ b/docs/guides/configure/performance.md @@ -1,110 +1,101 @@ -<!-- -title: How to optimize the Netdata Agent's performance -description: "While the Netdata Agent is designed to monitor a system with only 1% CPU, you can optimize its performance for low-resource systems." -image: /img/seo/guides/configure/performance.png -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/configure/performance.md ---> - # How to optimize the Netdata Agent's performance We designed the Netdata Agent to be incredibly lightweight, even when it's collecting a few thousand dimensions every -second and visualizing that data into hundreds of charts. When properly configured for a production node, the Agent -itself should never use more than 1% of a single CPU core, roughly 50-100 MiB of RAM, and minimal disk I/O to collect, -store, and visualize all this data. - -We take this scalability seriously. We have one user [running -Netdata](https://github.com/netdata/netdata/issues/1323#issuecomment-266427841) on a system with 144 cores and 288 -threads. Despite collecting 100,000 metrics every second, the Agent still only uses 9% CPU utilization on a -single core. 
- -But not everyone has such powerful systems at their disposal. For example, you might run the Agent on a cloud VM with -only 512 MiB of RAM, or an IoT device like a [Raspberry Pi](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/pi-hole-raspberry-pi.md). In these -cases, reducing Netdata's footprint beyond its already diminutive size can pay big dividends, giving your services more -horsepower while still monitoring the health and the performance of the node, OS, hardware, and applications. +second and visualizing that data into hundreds of charts. However, the default settings of the Netdata Agent are not +optimized for performance, but for a simple, standalone setup. We want the first install to give you something you can +run without any configuration. Most of the settings and options are enabled, since we want you to experience the full thing. -The default settings of the Netdata Agent are not optimized for performance, but for a simple standalone setup. We want -the first install to give you something you can run without any configuration. Most of the settings and options are -enabled, since we want you to experience the full thing. +By default, Netdata will automatically detect applications running on the node it is installed to start collecting metrics in +real-time, has health monitoring enabled to evaluate alerts and trains Machine Learning (ML) models for each metric, to detect anomalies. +This document describes the resources required for the various default capabilities and the strategies to optimize Netdata for production use. -## Prerequisites +## Summary of performance optimizations -- A node running the Netdata Agent. -- Familiarity with configuring the Netdata Agent with `edit-config`. +The following table summarizes the effect of each optimization on the CPU, RAM and Disk IO utilization in production. 
-If you're not familiar with how to configure the Netdata Agent, read our [node configuration -doc](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) before continuing with this guide. This guide assumes familiarity with the Netdata config -directory, using `edit-config`, and the process of uncommenting/editing various settings in `netdata.conf` and other -configuration files. +Optimization | CPU | RAM | Disk IO +-- | -- | -- |-- +[Use streaming and replication](#use-streaming-and-replication) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: +[Disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: +[Reduce data collection frequency](#reduce-collection-frequency) | :heavy_check_mark: | | :heavy_check_mark: +[Change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) | | :heavy_check_mark: | :heavy_check_mark: +[Use a different metric storage database](https://github.com/netdata/netdata/blob/master/database/README.md) | | :heavy_check_mark: | :heavy_check_mark: +[Disable machine learning](#disable-machine-learning) | :heavy_check_mark: | | +[Use a reverse proxy](#run-netdata-behind-a-proxy) | :heavy_check_mark: | | +[Disable/lower gzip compression for the agent dashboard](#disablelower-gzip-compression-for-the-dashboard) | :heavy_check_mark: | | -## What affects Netdata's performance? +## Resources required by a default Netdata installation Netdata's performance is primarily affected by **data collection/retention** and **clients accessing data**. -You can configure almost all aspects of data collection/retention, and certain aspects of clients accessing data. 
For -example, you can't control how many users might be viewing a local Agent dashboard, [viewing an -infrastructure](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) in real-time with Netdata Cloud, or running [Metric -Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md). +You can configure almost all aspects of data collection/retention, and certain aspects of clients accessing data. + +### CPU consumption + +Expect about: + - 1-3% of a single core for the netdata core + - 1-3% of a single core for the various collectors (e.g. go.d.plugin, apps.plugin) + - 5-10% of a single core, when ML training runs -The Netdata Agent runs with the lowest possible [process scheduling -policy](https://github.com/netdata/netdata/blob/master/daemon/README.md#netdata-process-scheduling-policy), which is `nice 19`, and uses the `idle` process scheduler. +Your experience may vary depending on the number of metrics collected, the collectors enabled and the specific environment they +run on, i.e. the work they have to do to collect these metrics. + +As a general rule, for modern hardware and VMs, the total CPU consumption of a standalone Netdata installation, including all its components, +should be below 5 - 15% of a single core. For example, on an 8-core server it will use only 0.6% - 1.8% of the total CPU capacity, depending on +the CPU characteristics. + +The Netdata Agent runs with the lowest possible [process scheduling policy](https://github.com/netdata/netdata/blob/master/daemon/README.md#netdata-process-scheduling-policy), which is `nice 19`, and uses the `idle` process scheduler. Together, these settings ensure that the Agent only gets CPU resources when the node has CPU resources to spare. If the node reaches 100% CPU utilization, the Agent is stopped first to ensure your applications get any available resources.
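The CPU consumption figures above are quoted per core, so the share of the whole machine follows from dividing by the core count. The following is a minimal sketch of that arithmetic, assuming the 5 - 15% single-core range and the 8-core server from the text; the function name `total_cpu_share` is illustrative only and is not part of Netdata.

```python
def total_cpu_share(single_core_percent: float, cores: int) -> float:
    """Convert a percentage of one core into a percentage of the whole machine."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return single_core_percent / cores


# Range taken from the text: 5 - 15% of a single core on an 8-core server.
low = total_cpu_share(5.0, 8)
high = total_cpu_share(15.0, 8)
print(f"{low:.3f}% - {high:.3f}% of total CPU capacity")
```

This gives 0.625% and 1.875%, which the text quotes, truncated to one decimal place, as 0.6% - 1.8%.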
-In addition, under heavy load, collectors that require disk I/O may stop and show gaps in charts. -Let's walk through the best ways to improve the Netdata Agent's performance. +To reduce CPU usage you can [disable machine learning](#disable-machine-learning), +[use streaming and replication](#use-streaming-and-replication), +[reduce the data collection frequency](#reduce-collection-frequency), [disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors), [use a reverse proxy](#run-netdata-behind-a-proxy), and [disable/lower gzip compression for the agent dashboard](#disablelower-gzip-compression-for-the-dashboard). -## Reduce collection frequency +### Memory consumption -The fastest way to improve the Agent's resource utilization is to reduce how often it collects metrics. +The memory footprint of Netdata is mainly influenced by the number of metrics concurrently being collected. Expect about 150MB of RAM for a typical 64-bit server collecting about 2000 to 3000 metrics. -### Global +To estimate and control memory consumption, you can [disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors), [change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md), or [use a different metric storage database](https://github.com/netdata/netdata/blob/master/database/README.md). -If you don't need per-second metrics, or if the Netdata Agent uses a lot of CPU even when no one is viewing that node's -dashboard, configure the Agent to collect metrics less often. -Open `netdata.conf` and edit the `update every` setting. The default is `1`, meaning that the Agent collects metrics -every second. +### Disk footprint and I/O -If you change this to `2`, Netdata enforces a minimum `update every` setting of 2 seconds, and collects metrics every -other second, which will effectively halve CPU utilization. 
Set this to `5` or `10` to collect metrics every 5 or 10 -seconds, respectively. +By default, Netdata should not use more than 1GB of disk space, most of which is dedicated for storing metric data and metadata. For typical installations collecting 2000 - 3000 metrics, this storage should provide a few days of high-resolution retention (per second), about a month of mid-resolution retention (per minute) and more than a year of low-resolution retention (per hour). -```conf -[global] - update every = 5 -``` +Netdata spreads I/O operations across time. For typical standalone installations there should be a few write operations every 5-10 seconds of a few kilobytes each, occasionally up to 1MB. In addition, under heavy load, collectors that require disk I/O may stop and show gaps in charts. -### Specific plugin or collector +To configure retention, you can [change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md). +To control disk I/O [use a different metric storage database](https://github.com/netdata/netdata/blob/master/database/README.md), avoid querying the +production system [using streaming and replication](#use-streaming-and-replication), [reduce the data collection frequency](#reduce-collection-frequency), and [disable unneeded plugins or collectors](#disable-unneeded-plugins-or-collectors). -Every collector and plugin has its own `update every` setting, which you can also change in the `go.d.conf`, -`python.d.conf`, or `charts.d.conf` files, or in individual collector configuration files. If the `update -every` for an individual collector is less than the global, the Netdata Agent uses the global setting. See the [enable -or configure a collector](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) doc for details. 
+## Use streaming and replication -To reduce the frequency of an [internal -plugin/collector](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md#collector-architecture-and-terminology), open `netdata.conf` and -find the appropriate section. For example, to reduce the frequency of the `apps` plugin, which collects and visualizes -metrics on application resource utilization: +For all production environments, parent Netdata nodes outside the production infrastructure should be receiving all +collected data from children Netdata nodes running on the production infrastructure, using [streaming and replication](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md). -```conf -[plugin:apps] - update every = 5 -``` +### Disable health checks on the child nodes -To [configure an individual collector](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md), open its specific configuration file with -`edit-config` and look for the `update_every` setting. For example, to reduce the frequency of the `nginx` collector, -run `sudo ./edit-config go.d/nginx.conf`: +When you set up streaming, we recommend you run your health checks on the parent. This saves resources on the children +and makes it easier to configure or disable alerts and agent notifications. + +The parents by default run health checks for each child, as long as the child is connected (the details are in `stream.conf`). +On the child nodes you should add to `netdata.conf` the following: ```conf -# [ GLOBAL ] -update_every: 10 +[health] + enabled = no ``` +### Use memory mode ram or save for the child nodes + +See [using a different metric storage database](https://github.com/netdata/netdata/blob/master/database/README.md). 
+ ## Disable unneeded plugins or collectors If you know that you don't need an [entire plugin or a specific -collector](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md#collector-architecture-and-terminology), you can disable any of them. +collector](https://github.com/netdata/netdata/blob/master/collectors/README.md#collector-architecture-and-terminology), you can disable any of them. Keep in mind that if a plugin/collector has nothing to do, it simply shuts down and does not consume system resources. You will only improve the Agent's performance by disabling plugins/collectors that are actively collecting metrics. @@ -137,42 +128,60 @@ modules: fail2ban: no ``` -## Lower memory usage for metrics retention +## Reduce collection frequency + +The fastest way to improve the Agent's resource utilization is to reduce how often it collects metrics. + +### Global + +If you don't need per-second metrics, or if the Netdata Agent uses a lot of CPU even when no one is viewing that node's +dashboard, [configure the Agent](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) to collect metrics less often. + +Open `netdata.conf` and edit the `update every` setting. The default is `1`, meaning that the Agent collects metrics +every second. + +If you change this to `2`, Netdata enforces a minimum `update every` setting of 2 seconds, and collects metrics every +other second, which will effectively halve CPU utilization. Set this to `5` or `10` to collect metrics every 5 or 10 +seconds, respectively. -Reduce the disk space that the [database engine](https://github.com/netdata/netdata/blob/master/database/engine/README.md) uses to retain metrics by editing -the `dbengine multihost disk space` option in `netdata.conf`. The default value is `256`, but can be set to a minimum of -`64`. By reducing the disk space allocation, Netdata also needs to store less metadata in the node's memory. 
+```conf +[global] + update every = 5 +``` -The `page cache size` option also directly impacts Netdata's memory usage, but has a minimum value of `32`. +### Specific plugin or collector -Reducing the value of `dbengine multihost disk space` does slim down Netdata's resource usage, but it also reduces how -long Netdata retains metrics. Find the right balance of performance and metrics retention by using the [dbengine -calculator](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics). +Every collector and plugin has its own `update every` setting, which you can also change in the `go.d.conf`, +`python.d.conf`, or `charts.d.conf` files, or in individual collector configuration files. If the `update +every` for an individual collector is less than the global, the Netdata Agent uses the global setting. See the [collectors configuration reference](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md) for details. -All the settings are found in the `[global]` section of `netdata.conf`: +To reduce the frequency of an [internal +plugin/collector](https://github.com/netdata/netdata/blob/master/collectors/README.md#collector-architecture-and-terminology), open `netdata.conf` and +find the appropriate section. For example, to reduce the frequency of the `apps` plugin, which collects and visualizes +metrics on application resource utilization: ```conf -[db] - memory mode = dbengine - page cache size = 32 - dbengine multihost disk space = 256 +[plugin:apps] + update every = 5 ``` -To save even more memory, you can disable the dbengine and reduce retention to just 30 minutes, as shown below: +To [configure an individual collector](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md#configure-a-collector), open its specific configuration file with +`edit-config` and look for the `update_every` setting. 
For example, to reduce the frequency of the `nginx` collector, +run `sudo ./edit-config go.d/nginx.conf`: ```conf -[db] - storage tiers = 1 - mode = alloc - retention = 1800 +# [ GLOBAL ] +update_every: 10 ``` -Metric retention is not important in certain use cases, such as: - - Data collection nodes stream collected metrics collected to a centralization point. - - Data collection nodes export their metrics to another time series DB, or are scraped by Prometheus - - Netdata installed only during incidents, to get richer information. -In such cases, you may not want to use the dbengine at all and instead opt for memory mode -`memory mode = alloc` or `memory mode = none`. +## Lower memory usage for metrics retention + +See how to [change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md). + +## Use a different metric storage database + +Consider [using a different metric storage database](https://github.com/netdata/netdata/blob/master/database/README.md) when running Netdata on IoT devices, +and for children in a parent-child set up based on [streaming and replication](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.md). ## Disable machine learning @@ -185,34 +194,12 @@ with the following: enabled = no ``` -## Run Netdata behind Nginx +## Run Netdata behind a proxy -A dedicated web server like Nginx provides far more robustness than the Agent's internal [web server](https://github.com/netdata/netdata/blob/master/web/README.md). +A dedicated web server like nginx provides more robustness than the Agent's internal [web server](https://github.com/netdata/netdata/blob/master/web/README.md). Nginx can handle more concurrent connections, reuse idle connections, and use fast gzip compression to reduce payloads. 
-For details on installing Nginx as a proxy for the local Agent dashboard, see our [Nginx -doc](https://github.com/netdata/netdata/blob/master/docs/Running-behind-nginx.md). - -After you complete Nginx setup according to the doc linked above, we recommend setting `keepalive` to `1024`, and using -gzip compression with the following options in the `location /` block: - -```conf - location / { - ... - gzip on; - gzip_proxied any; - gzip_types *; - } -``` - -Finally, edit `netdata.conf` with the following settings: - -```conf -[global] - bind socket to IP = 127.0.0.1 - disconnect idle web clients after seconds = 3600 - enable web responses gzip compression = no -``` +For details on installing another web server as a proxy for the local Agent dashboard, see [reverse proxies](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/reverse-proxies.md). ## Disable/lower gzip compression for the dashboard @@ -235,43 +222,3 @@ Or to lower the default compression level: gzip compression level = 1 ``` -## Disable logs - -If you installation is working correctly, and you're not actively auditing Netdata's logs, disable them in -`netdata.conf`. - -```conf -[logs] - debug log = none - error log = none - access log = none -``` - -## Disable health checks - -If you are streaming metrics to parent nodes, we recommend you run your health checks on the parent, for all the metrics collected -by the children nodes. This saves resources on the children and makes it easier to configure or disable alerts and agent notifications. - -The parents by default run health checks for each child, as long as it is connected (the details are in `stream.conf`). -On the child nodes you should add to `netdata.conf` the following: - -```conf -[health] - enabled = no -``` - -## What's next? - -We hope this guide helped you better understand how to optimize the performance of the Netdata Agent. 
- -Now that your Agent is running smoothly, we recommend you [secure your nodes](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) if you haven't -already. - -Next, dive into some of Netdata's more complex features, such as configuring its health watchdog or exporting metrics to -an external time-series database. - -- [Interact with dashboards and charts](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) -- [Configure health alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) -- [Export metrics to external time-series databases](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) - -[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdocs%2Fguides%2Fconfigure%2Fperformance.md&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) diff --git a/docs/guides/deploy/ansible.md b/docs/guides/deploy/ansible.md deleted file mode 100644 index 0472bdc60..000000000 --- a/docs/guides/deploy/ansible.md +++ /dev/null @@ -1,180 +0,0 @@ -<!-- -title: Deploy Netdata with Ansible -description: "Deploy an infrastructure monitoring solution in minutes with the Netdata Agent and Ansible. Use and customize a simple playbook for monitoring as code." -image: /img/seo/guides/deploy/ansible.png -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/deploy/ansible.md -sidebar_label: "Install Netdata with Ansible" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Installation" ---> - -# Deploy Netdata with Ansible - -Netdata's [one-line kickstart](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) is zero-configuration, highly adaptable, and compatible with tons -of different operating systems and Linux distributions. 
You can use it on bare metal, VMs, containers, and everything -in-between. - -But what if you're trying to bootstrap an infrastructure monitoring solution as quickly as possible? What if you need to -deploy Netdata across an entire infrastructure with many nodes? What if you want to make this deployment reliable, -repeatable, and idempotent? What if you want to write and deploy your infrastructure or cloud monitoring system like -code? - -Enter [Ansible](https://ansible.com), a popular system provisioning, configuration management, and infrastructure as -code (IaC) tool. Ansible uses **playbooks** to glue many standardized operations together with a simple syntax, then run -those operations over standard and secure SSH connections. There's no agent to install on the remote system, so all you -have to worry about is your application and your monitoring software. - -Ansible has some competition from the likes of [Puppet](https://puppet.com/) or [Chef](https://www.chef.io/), but the -most valuable feature about Ansible is **idempotent**. From the [Ansible -glossary](https://docs.ansible.com/ansible/latest/reference_appendices/glossary.html) - -> An operation is idempotent if the result of performing it once is exactly the same as the result of performing it -> repeatedly without any intervening actions. - -Idempotency means you can run an Ansible playbook against your nodes any number of times without affecting how they -operate. When you deploy Netdata with Ansible, you're also deploying _monitoring as code_. - -In this guide, we'll walk through the process of using an [Ansible -playbook](https://github.com/netdata/community/tree/main/netdata-agent-deployment/ansible-quickstart) to automatically -deploy the Netdata Agent to any number of distributed nodes, manage the configuration of each node, and connect them to -your Netdata Cloud account. You'll go from some unmonitored nodes to a infrastructure monitoring solution in a matter of -minutes. 
- -## Prerequisites - -- A Netdata Cloud account. [Sign in and create one](https://app.netdata.cloud) if you don't have one already. -- An administration system with [Ansible](https://www.ansible.com/) installed. -- One or more nodes that your administration system can access via [SSH public - keys](https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key) (preferably password-less). - -## Download and configure the playbook - -First, download the -[playbook](https://github.com/netdata/community/tree/main/netdata-agent-deployment/ansible-quickstart), move it to the -current directory, and remove the rest of the cloned repository, as it's not required for using the Ansible playbook. - -```bash -git clone https://github.com/netdata/community.git -mv community/netdata-agent-deployment/ansible-quickstart . -rm -rf community -``` - -Or if you don't want to clone the entire repository, use the [gitzip browser extension](https://gitzip.org/) to get the netdata-agent-deployment directory as a zip file. - -Next, `cd` into the Ansible directory. - -```bash -cd ansible-quickstart -``` - -### Edit the `hosts` file - -The `hosts` file contains a list of IP addresses or hostnames that Ansible will try to run the playbook against. The -`hosts` file that comes with the repository contains two example IP addresses, which you should replace according to the -IP address/hostname of your nodes. - -```conf -203.0.113.0 hostname=node-01 -203.0.113.1 hostname=node-02 -``` - -You can also set the `hostname` variable, which appears both on the local Agent dashboard and Netdata Cloud, or you can -omit the `hostname=` string entirely to use the system's default hostname. - -#### Set the login user (optional) - -If you SSH into your nodes as a user other than `root`, you need to configure `hosts` according to those user names. Use -the `ansible_user` variable to set the login user. 
For example: - -```conf -203.0.113.0 hostname=ansible-01 ansible_user=example -``` - -#### Set your SSH key (optional) - -If you use an SSH key other than `~/.ssh/id_rsa` for logging into your nodes, you can set that on a per-node basis in -the `hosts` file with the `ansible_ssh_private_key_file` variable. For example, to log into a Lightsail instance using -two different SSH keys supplied by AWS. - -```conf -203.0.113.0 hostname=ansible-01 ansible_ssh_private_key_file=~/.ssh/LightsailDefaultKey-us-west-2.pem -203.0.113.1 hostname=ansible-02 ansible_ssh_private_key_file=~/.ssh/LightsailDefaultKey-us-east-1.pem -``` - -### Edit the `vars/main.yml` file - -In order to connect your node(s) to your Space in Netdata Cloud, and see all their metrics in real-time in [composite -charts](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) or perform [Metric -Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md), you need to set the `claim_token` -and `claim_room` variables. - -To find your `claim_token` and `claim_room`, go to Netdata Cloud, then click on your Space's name in the top navigation, -then click on **Manage your Space**. Click on the **Nodes** tab in the panel that appears, which displays a script with -`token` and `room` strings. - -![Animated GIF of finding the claiming script and the token and room -strings](https://user-images.githubusercontent.com/1153921/98740235-f4c3ac00-2367-11eb-8ffd-e9ab0f04c463.gif) - -Copy those strings into the `claim_token` and `claim_rooms` variables. - -```yml -claim_token: XXXXX -claim_rooms: XXXXX -``` - -Change the `dbengine_multihost_disk_space` if you want to change the metrics retention policy by allocating more or less -disk space for storing metrics. The default is 2048 Mib, or 2 GiB. 
- -Because we're connecting this node to Netdata Cloud, and will view its dashboards there instead of via the IP address or -hostname of the node, the playbook disables that local dashboard by setting `web_mode` to `none`. This gives a small -security boost by not allowing any unwanted access to the local dashboard. - -You can read more about this decision, or other ways you might lock down the local dashboard, in our [node security -doc](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md). - -> Curious about why Netdata's dashboard is open by default? Read our [blog -> post](https://www.netdata.cloud/blog/netdata-agent-dashboard/) on that zero-configuration design decision. - -## Run the playbook - -Time to run the playbook from your administration system: - -```bash -ansible-playbook -i hosts tasks/main.yml -``` - -Ansible first connects to your node(s) via SSH, then [collects -facts](https://docs.ansible.com/ansible/latest/user_guide/playbooks_vars_facts.html#ansible-facts) about the system. -This playbook doesn't use these facts, but you could expand it to provision specific types of systems based on the -makeup of your infrastructure. - -Next, Ansible makes changes to each node according to the `tasks` defined in the playbook, and -[returns](https://docs.ansible.com/ansible/latest/reference_appendices/common_return_values.html#changed) whether each -task results in a changed, failure, or was skipped entirely. - -The task to install Netdata will take a few minutes per node, so be patient! Once the playbook reaches the connect to Cloud -task, your nodes start populating your Space in Netdata Cloud. - -## What's next? - -Go use Netdata! - -If you need a bit more guidance for how you can use Netdata for health monitoring and performance troubleshooting, see -our [documentation](https://learn.netdata.cloud/docs). It's designed like a comprehensive guide, based on what you might -want to do with Netdata, so use those categories to dive in. 
- -Some of the best places to start: - -- [Enable or configure a collector](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) -- [Supported collectors list](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) -- [See an overview of your infrastructure](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) -- [Interact with dashboards and charts](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) -- [Change how long Netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) - -We're looking for more deployment and configuration management strategies, whether via Ansible or other -provisioning/infrastructure as code software, such as Chef or Puppet, in our [community -repo](https://github.com/netdata/community). Anyone is able to fork the repo and submit a PR, either to improve this -playbook, extend it, or create an entirely new experience for deploying Netdata across entire infrastructure. - - diff --git a/docs/guides/export/export-netdata-metrics-graphite.md b/docs/guides/export/export-netdata-metrics-graphite.md deleted file mode 100644 index 985ba2241..000000000 --- a/docs/guides/export/export-netdata-metrics-graphite.md +++ /dev/null @@ -1,181 +0,0 @@ -<!-- -title: Export and visualize Netdata metrics in Graphite -description: "Use Netdata to collect and export thousands of metrics to Graphite for long-term storage or further analysis." -image: /img/seo/guides/export/export-netdata-metrics-graphite.png ---> -import { OneLineInstallWget } from '@site/src/components/OneLineInstall/' - -# Export and visualize Netdata metrics in Graphite - -Collecting metrics is an essential part of monitoring any application, service, or infrastructure, but it's not the -final step for any developer, sysadmin, SRE, or DevOps engineer who's keeping an eye on things. 
To take meaningful -action on these metrics, you may need to develop a stack of monitoring tools that work in parallel to help you diagnose -anomalies and discover root causes faster. - -We designed Netdata with interoperability in mind. The Agent collects thousands of metrics every second, and then what -you do with them is up to you. You -can [store metrics in the database engine](https://github.com/netdata/netdata/blob/master/docs/guides/longer-metrics-storage.md), -or send them to another time series database for long-term storage or further analysis using -Netdata's [exporting engine](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md). - -In this guide, we'll show you how to export Netdata metrics to [Graphite](https://graphiteapp.org/) for long-term -storage and further analysis. Graphite is a free open-source software (FOSS) tool that collects graphs numeric -time-series data, such as all the metrics collected by the Netdata Agent itself. Using Netdata and Graphite together, -you get more visibility into the health and performance of your entire infrastructure. - -![A custom dashboard in Grafana with Netdata -metrics](https://user-images.githubusercontent.com/1153921/83903855-b8828480-a713-11ea-8edb-927ba521599b.png) - -Let's get started. - -## Install the Netdata Agent - -If you don't have the Netdata Agent installed already, visit -the [installation guide](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) -for the recommended instructions for your system. In most cases, you can use the one-line installation script: - -<OneLineInstallWget/> - -Once installation finishes, open your browser and navigate to `http://NODE:19999`, replacing `NODE` with the IP address -or hostname of your system, to find the Agent dashboard. - -## Install Graphite via Docker - -For this guide, we'll install Graphite using Docker. 
See the [Docker documentation](https://docs.docker.com/get-docker/) -for details if you don't yet have it installed on your system. - -> If you already have Graphite installed, skip this step. If you want to install via a different method, see the -> [Graphite installation docs](https://graphite.readthedocs.io/en/latest/install.html), with the caveat that some -> configuration settings may be different. - -Start up the Graphite image with `docker run`. - -```bash -docker run -d \ - --name graphite \ - --restart=always \ - -p 80:80 \ - -p 2003-2004:2003-2004 \ - -p 2023-2024:2023-2024 \ - -p 8125:8125/udp \ - -p 8126:8126 \ - graphiteapp/graphite-statsd -``` - -Open your browser and navigate to `http://NODE`, to see the Graphite interface. Nothing yet, but we'll fix that soon -enough. - -![An empty Graphite dashboard](https://user-images.githubusercontent.com/1153921/83798958-ea371500-a659-11ea-8403-d46f77a05b78.png) - -## Enable the Graphite exporting connector - -You're now ready to begin exporting Netdata metrics to Graphite. - -Begin by using `edit-config` to open the `exporting.conf` file. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory -sudo ./edit-config exporting.conf -``` - -If you haven't already, enable the exporting engine by setting `enabled` to `yes` in the `[exporting:global]` section. - -```conf -[exporting:global] - enabled = yes -``` - -Next, configure the connector. Find the `[graphite:my_graphite_instance]` example section and uncomment the line. -Replace `my_graphite_instance` with a name of your choice. Let's go with `[graphite:netdata]`. Set `enabled` to `yes` -and uncomment the line. 
Your configuration should now look like this: - -```conf -[graphite:netdata] - enabled = yes - # destination = localhost - # data source = average - # prefix = netdata - # hostname = my_hostname - # update every = 10 - # buffer on failures = 10 - # timeout ms = 20000 - # send names instead of ids = yes - # send charts matching = * - # send hosts matching = localhost * -``` - -Set the `destination` setting to `localhost:2003`. By default, the Docker image for Graphite listens on port `2003` for -incoming metrics. If you installed Graphite a different way, or tweaked the `docker run` command, you may need to change -the port accordingly. - -```conf -[graphite:netdata] - enabled = yes - destination = localhost:2003 - ... -``` - -We'll not worry about the rest of the settings for now. Restart the Agent using `sudo systemctl restart netdata`, or the -[appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your -system, to spin up the exporting engine. - -## See and organize Netdata metrics in Graphite - -Head back to the Graphite interface again, then click on the **Dashboard** link to get started with Netdata's exported -metrics. You can also navigate directly to `http://NODE/dashboard`. - -Let's switch the interface to help you understand which metrics Netdata is exporting to Graphite. Click on **Dashboard** -and **Configure UI**, then choose the **Tree** option. Refresh your browser to change the UI. - -![Change the Graphite UI](https://user-images.githubusercontent.com/1153921/83798697-77c63500-a659-11ea-8ed5-5e274953c871.png) - -You should now see a tree of available contexts, including one that matches the hostname of the Agent exporting metrics. -In this example, the Agent's hostname is `arcturus`. - -Let's add some system CPU charts so you can monitor the long-term health of your system. Click through the tree to find -**hostname → system → cpu** metrics, then click on the **user** context. 
A chart with metrics from that context appears -in the dashboard. Add a few other system CPU charts to flesh things out. - -Next, let's combine one or two of these charts. Click and drag one chart onto the other, and wait until the green **Drop -to merge** dialog appears. Release to merge the charts. - -![Merging charts in Graphite](https://user-images.githubusercontent.com/1153921/83817628-1bbfd880-a67a-11ea-81bc-05efc639b6ce.png) - -Finally, save your dashboard. Click **Dashboard**, then **Save As**, then choose a name. Your dashboard is now saved. - -Of course, this is just the beginning of the customization you can do with Graphite. You can change the time range, -share your dashboard with others, or use the composer to customize the size and appearance of specific charts. Learn -more about adding, modifying, and combining graphs in -the [Graphite docs](https://graphite.readthedocs.io/en/latest/dashboard.html). - -## Monitor the exporting engine - -As soon as the exporting engine begins, Netdata begins reporting metrics about the system's health and performance. - -![Graphs for monitoring the exporting engine](https://user-images.githubusercontent.com/1153921/83800787-e5c02b80-a65c-11ea-865a-c447d2ce4cbb.png) - -You can use these charts to verify that Netdata is properly exporting metrics to Graphite. You can even add these -exporting charts to your Graphite dashboard! - -### Add exporting charts to Netdata Cloud - -You can also show these exporting engine metrics on Netdata Cloud. If you don't have an account already, -go [sign in](https://app.netdata.cloud) and get started for free. If you need some help along the way, read -the [get started with Cloud guide](https://github.com/netdata/netdata/blob/master/docs/cloud/get-started.mdx). - -Add more metrics to a War Room's Nodes view by clicking on the **Add metric** button, then typing `exporting` into the -context field. Choose the exporting contexts you want to add, then click **Add**. 
You'll see these charts alongside any -others you've customized in Netdata Cloud. - -![Exporting engine metrics in Netdata Cloud](https://user-images.githubusercontent.com/1153921/83902769-db139e00-a711-11ea-828e-aa7e32b04c75.png) - -## What's next? - -What you do with your exported metrics is entirely up to you, but as you might have seen in the Graphite connector -configuration block, there are many other ways to tweak and customize which metrics you export to Graphite and how -often. - -For full details about each configuration option and what it does, see -the [exporting reference guide](https://github.com/netdata/netdata/blob/master/exporting/README.md). - - diff --git a/docs/guides/longer-metrics-storage.md b/docs/guides/longer-metrics-storage.md deleted file mode 100644 index 8ccd9585f..000000000 --- a/docs/guides/longer-metrics-storage.md +++ /dev/null @@ -1,158 +0,0 @@ -<!-- -title: "Netdata Longer Metrics Retention" -description: "" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/longer-metrics-storage.md ---> - -# Netdata Longer Metrics Retention - -Metrics retention affects 3 parameters on the operation of a Netdata Agent: - -1. The disk space required to store the metrics. -2. The memory the Netdata Agent will require to have that retention available for queries. -3. The CPU resources that will be required to query longer time-frames. - -As retention increases, the resources required to support that retention increase too. - -Since Netdata Agents usually run at the edge, inside production systems, Netdata Agent **parents** should be considered. When having a **parent - child** setup, the child (the Netdata Agent running on a production system) delegates all its functions, including longer metrics retention and querying, to the parent node that can dedicate more resources to this task. 
A single Netdata Agent parent can centralize multiple children Netdata Agents (dozens, hundreds, or even thousands depending on its available resources). - - -## Ephemerality of metrics - -The ephemerality of metrics plays an important role in retention. In environments where metrics stop being collected and new metrics are constantly being generated, we are interested about 2 parameters: - -1. The **expected concurrent number of metrics** as an average for the lifetime of the database. - This affects mainly the storage requirements. - -2. The **expected total number of unique metrics** for the lifetime of the database. - This affects mainly the memory requirements for having all these metrics indexed and available to be queried. - -## Granularity of metrics - -The granularity of metrics (the frequency they are collected and stored, i.e. their resolution) is significantly affecting retention. - -Lowering the granularity from per second to every two seconds, will double their retention and half the CPU requirements of the Netdata Agent, without affecting disk space or memory requirements. - -## Which database mode to use - -Netdata Agents support multiple database modes. - -The default mode `[db].mode = dbengine` has been designed to scale for longer retentions. - -The other available database modes are designed to minimize resource utilization and should usually be considered on **parent - child** setups at the children side. - -So, - -* On a single node setup, use `[db].mode = dbengine` to increase retention. -* On a **parent - child** setup, use `[db].mode = dbengine` on the parent to increase retention and a more resource efficient mode (like `save`, `ram` or `none`) for the child to minimize resources utilization. - -To use `dbengine`, set this in `netdata.conf` (it is the default): - -``` -[db] - mode = dbengine -``` - -## Tiering - -`dbengine` supports tiering. Tiering allows having up to 3 versions of the data: - -1. Tier 0 is the high resolution data. -2. 
Tier 1 is the first tier that samples data every 60 data collections of Tier 0. -3. Tier 2 is the second tier that samples data every 3600 data collections of Tier 0 (60 of Tier 1). - -To enable tiering set `[db].storage tiers` in `netdata.conf` (the default is 1, to enable only Tier 0): - -``` -[db] - mode = dbengine - storage tiers = 3 -``` - -## Disk space requirements - -Netdata Agents require about 1 bytes on disk per database point on Tier 0 and 4 times more on higher tiers (Tier 1 and 2). They require 4 times more storage per point compared to Tier 0, because for every point higher tiers store `min`, `max`, `sum`, `count` and `anomaly rate` (the values are 5, but they require 4 times the storage because `count` and `anomaly rate` are 16-bit integers). The `average` is calculated on the fly at query time using `sum / count`. - -### Tier 0 - per second for a week - -For 2000 metrics, collected every second and retained for a week, Tier 0 needs: 1 byte x 2000 metrics x 3600 secs per hour x 24 hours per day x 7 days per week = 1100MB. - -The setting to control this is in `netdata.conf`: - -``` -[db] - mode = dbengine - - # per second data collection - update every = 1 - - # enable only Tier 0 - storage tiers = 1 - - # Tier 0, per second data for a week - dbengine multihost disk space MB = 1100 -``` - -By setting it to `1100` and restarting the Netdata Agent, this node will start maintaining about a week of data. But pay attention to the number of metrics. If you have more than 2000 metrics on a node, or you need more that a week of high resolution metrics, you may need to adjust this setting accordingly. - -### Tier 1 - per minute for a month - -Tier 1 is by default sampling the data every 60 points of Tier 0. If Tier 0 is per second, then Tier 1 is per minute. - -Tier 1 needs 4 times more storage per point compared to Tier 0. 
So, for 2000 metrics, with per minute resolution, retained for a month, Tier 1 needs: 4 bytes x 2000 metrics x 60 minutes per hour x 24 hours per day x 30 days per month = 330MB. - -Do this in `netdata.conf`: - -``` -[db] - mode = dbengine - - # per second data collection - update every = 1 - - # enable only Tier 0 and Tier 1 - storage tiers = 2 - - # Tier 0, per second data for a week - dbengine multihost disk space MB = 1100 - - # Tier 1, per minute data for a month - dbengine tier 1 multihost disk space MB = 330 -``` - -Once `netdata.conf` is edited, the Netdata Agent needs to be restarted for the changes to take effect. - -### Tier 2 - per hour for a year - -Tier 2 is by default sampling data every 3600 points of Tier 0 (60 of Tier 1). If Tier 0 is per second, then Tier 2 is per hour. - -The storage requirements are the same to Tier 1. - -For 2000 metrics, with per hour resolution, retained for a year, Tier 2 needs: 4 bytes x 2000 metrics x 24 hours per day x 365 days per year = 67MB. - -Do this in `netdata.conf`: - -``` -[db] - mode = dbengine - - # per second data collection - update every = 1 - - # enable only Tier 0 and Tier 1 - storage tiers = 3 - - # Tier 0, per second data for a week - dbengine multihost disk space MB = 1100 - - # Tier 1, per minute data for a month - dbengine tier 1 multihost disk space MB = 330 - - # Tier 2, per hour data for a year - dbengine tier 2 multihost disk space MB = 67 -``` - -Once `netdata.conf` is edited, the Netdata Agent needs to be restarted for the changes to take effect. 
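
The tier sizing arithmetic in the removed guide above can be reproduced with a short script. This is only a sketch of that arithmetic, not part of Netdata: the per-point sizes (1 byte on Tier 0, 4 bytes on Tiers 1 and 2) and the 2000-metric example come from the guide, while the function name `tier_disk_space_mib` and the conversion to MiB are assumptions made for illustration.

```python
# Rough dbengine disk-space estimate, following the arithmetic in the guide.
# Per-point sizes (1 byte on Tier 0, 4 bytes on Tiers 1 and 2) are taken from
# the text; the helper name and the MiB conversion are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60


def tier_disk_space_mib(metrics, bytes_per_point, seconds_per_point, retention_days):
    """Return the approximate disk space, in MiB, needed by one storage tier."""
    points_per_metric = retention_days * SECONDS_PER_DAY / seconds_per_point
    total_bytes = metrics * bytes_per_point * points_per_metric
    return total_bytes / (1024 * 1024)


tier0 = tier_disk_space_mib(2000, 1, 1, 7)       # per-second points, one week
tier1 = tier_disk_space_mib(2000, 4, 60, 30)     # per-minute points, 30 days
tier2 = tier_disk_space_mib(2000, 4, 3600, 365)  # per-hour points, one year

for name, value in (("Tier 0", tier0), ("Tier 1", tier1), ("Tier 2", tier2)):
    print(f"{name}: about {value:.0f} MiB")
```

With these inputs the Tier 1 and Tier 2 estimates match the 330 and 67 figures used in the configuration examples, and the Tier 0 estimate comes out slightly above the rounded 1100 figure. Changing the metric count or retention period shows how the `dbengine multihost disk space MB` values would need to scale.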
- - - diff --git a/docs/guides/monitor-cockroachdb.md b/docs/guides/monitor-cockroachdb.md index 3c6e1b2cf..ea94d7a02 100644 --- a/docs/guides/monitor-cockroachdb.md +++ b/docs/guides/monitor-cockroachdb.md @@ -1,6 +1,10 @@ <!-- title: "Monitor CockroachDB metrics with Netdata" +sidebar_label: "Monitor CockroachDB metrics with Netdata" custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor-cockroachdb.md +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Miscellaneous" --> # Monitor CockroachDB metrics with Netdata @@ -20,9 +24,11 @@ Let's dive in and walk through the process of monitoring CockroachDB metrics wit ## What's in this guide -- [Configure the CockroachDB collector](#configure-the-cockroachdb-collector) +- [Monitor CockroachDB metrics with Netdata](#monitor-cockroachdb-metrics-with-netdata) + - [What's in this guide](#whats-in-this-guide) + - [Configure the CockroachDB collector](#configure-the-cockroachdb-collector) - [Manual setup for a local CockroachDB database](#manual-setup-for-a-local-cockroachdb-database) -- [Tweak CockroachDB alarms](#tweak-cockroachdb-alarms) + - [Tweak CockroachDB alarms](#tweak-cockroachdb-alarms) ## Configure the CockroachDB collector @@ -109,25 +115,4 @@ cd /etc/netdata/ # Replace with your Netdata configuration directory, if not /et ./edit-config health.d/cockroachdb.conf # You may need to use `sudo` for write privileges ``` -For more information about editing the defaults or writing new alarm entities, see our health monitoring [quickstart -guide](https://github.com/netdata/netdata/blob/master/health/QUICKSTART.md). - -## What's next? - -Now that you're collecting metrics from your CockroachDB databases, let us know how it's working for you! There's always -room for improvement or refinement based on real-world use cases. 
Feel free to [file an -issue](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml) with -your -thoughts. - -Also, be sure to check out these useful resources: - -- [Netdata's CockroachDB documentation](https://github.com/netdata/go.d.plugin/blob/master/modules/cockroachdb/README.md) -- [Netdata's CockroachDB configuration](https://github.com/netdata/go.d.plugin/blob/master/config/go.d/cockroachdb.conf) -- [Netdata's CockroachDB alarms](https://github.com/netdata/netdata/blob/29d9b5e51603792ee27ef5a21f1de0ba8e130158/health/health.d/cockroachdb.conf) -- [CockroachDB homepage](https://www.cockroachlabs.com/product/) -- [CockroachDB documentation](https://www.cockroachlabs.com/docs/stable/) -- [`_status/vars` endpoint docs](https://www.cockroachlabs.com/docs/stable/monitoring-and-alerting.html#prometheus-endpoint) -- [Monitor CockroachDB with Prometheus](https://www.cockroachlabs.com/docs/stable/monitor-cockroachdb-with-prometheus.html) - - +For more information about editing the defaults or writing new alarm entities, see our documentation on [configuring health alarms](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md). 
diff --git a/docs/guides/monitor-hadoop-cluster.md b/docs/guides/monitor-hadoop-cluster.md index cce261fee..91282b955 100644 --- a/docs/guides/monitor-hadoop-cluster.md +++ b/docs/guides/monitor-hadoop-cluster.md @@ -1,6 +1,10 @@ <!-- title: "Monitor a Hadoop cluster with Netdata" +sidebar_label: "Monitor a Hadoop cluster with Netdata" custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor-hadoop-cluster.md +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Miscellaneous" --> # Monitor a Hadoop cluster with Netdata @@ -184,20 +188,5 @@ sudo /etc/netdata/edit-config health.d/hdfs.conf sudo /etc/netdata/edit-config health.d/zookeeper.conf ``` -For more information about editing the defaults or writing new alarm entities, see our [health monitoring -documentation](https://github.com/netdata/netdata/blob/master/health/README.md). - -## What's next? - -If you're having issues with Netdata auto-detecting your HDFS/Zookeeper servers, or want to help improve how Netdata -collects or presents metrics from these services, feel free to [file an -issue](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml). - -- Read up on the [HDFS configuration - file](https://github.com/netdata/go.d.plugin/blob/master/config/go.d/hdfs.conf) to understand how to configure - global options or per-job options, such as username/password, TLS certificates, timeouts, and more. -- Read up on the [Zookeeper configuration - file](https://github.com/netdata/go.d.plugin/blob/master/config/go.d/zookeeper.conf) to understand how to configure - global options or per-job options, timeouts, TLS certificates, and more. - - +For more information about editing the defaults or writing new alarm entities, see our +[health monitoring documentation](https://github.com/netdata/netdata/blob/master/health/README.md). 
diff --git a/docs/guides/monitor/anomaly-detection-python.md b/docs/guides/monitor/anomaly-detection-python.md deleted file mode 100644 index d6d27f4e5..000000000 --- a/docs/guides/monitor/anomaly-detection-python.md +++ /dev/null @@ -1,189 +0,0 @@ -<!-- -title: "Detect anomalies in systems and applications" -description: "Detect anomalies in any system, container, or application in your infrastructure with machine learning and the open-source Netdata Agent." -image: /img/seo/guides/monitor/anomaly-detection.png -author: "Joel Hans" -author_title: "Editorial Director, Technical & Educational Resources" -author_img: "/img/authors/joel-hans.jpg" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/anomaly-detection-python.md ---> - -# Detect anomalies in systems and applications - -Beginning with v1.27, the [open-source Netdata Agent](https://github.com/netdata/netdata) is capable of unsupervised -[anomaly detection](https://en.wikipedia.org/wiki/Anomaly_detection) with machine learning (ML). As with all things -Netdata, the anomalies collector comes with preconfigured alarms and instant visualizations that require no query -languages or organizing metrics. You configure the collector to look at specific charts, and it handles the rest. - -Netdata's implementation uses a handful of functions in the [Python Outlier Detection (PyOD) -library](https://github.com/yzhao062/pyod/tree/master), which periodically runs a `train` function that learns what -"normal" looks like on your node and creates an ML model for each chart, then utilizes the -[`predict_proba()`](https://pyod.readthedocs.io/en/latest/api_cc.html#pyod.models.base.BaseDetector.predict_proba) and -[`predict()`](https://pyod.readthedocs.io/en/latest/api_cc.html#pyod.models.base.BaseDetector.predict) PyOD functions to -quantify how anomalous certain charts are. - -All these metrics and alarms are available for centralized monitoring in [Netdata Cloud](https://app.netdata.cloud). 
If -you choose to sign up for Netdata Cloud and [connect your nodes](https://github.com/netdata/netdata/blob/master/claim/README.md), you will have the ability to run -tailored anomaly detection on every node in your infrastructure, regardless of its purpose or workload. - -In this guide, you'll learn how to set up the anomalies collector to instantly detect anomalies in an Nginx web server -and/or the node that hosts it, which will give you the tools to configure parallel unsupervised monitors for any -application in your infrastructure. Let's get started. - -![Example anomaly detection with an Nginx web -server](https://user-images.githubusercontent.com/1153921/103586700-da5b0a00-4ea2-11eb-944e-46edd3f83e3a.png) - -## Prerequisites - -- A node running the Netdata Agent. If you don't yet have that, [get Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx). -- A Netdata Cloud account. [Sign up](https://app.netdata.cloud) if you don't have one already. -- Familiarity with configuring the Netdata Agent with [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md). -- _Optional_: An Nginx web server running on the same node to follow the example configuration steps. - -## Install required Python packages - -The anomalies collector uses a few Python packages, available with `pip3`, to run ML training. It requires -[`numba`](http://numba.pydata.org/), [`scikit-learn`](https://scikit-learn.org/stable/), -[`pyod`](https://pyod.readthedocs.io/en/latest/), in addition to -[`netdata-pandas`](https://github.com/netdata/netdata-pandas), which is a package built by the Netdata team to pull data -from a Netdata Agent's API into a [Pandas](https://pandas.pydata.org/). Read more about `netdata-pandas` on its [package -repo](https://github.com/netdata/netdata-pandas) or in Netdata's [community -repo](https://github.com/netdata/community/tree/main/netdata-agent-api/netdata-pandas). 
- -```bash -# Become the netdata user -sudo su -s /bin/bash netdata - -# Install required packages for the netdata user -pip3 install --user netdata-pandas==0.0.38 numba==0.50.1 scikit-learn==0.23.2 pyod==0.8.3 -``` - -> If the `pip3` command fails, you need to install it. For example, on an Ubuntu system, use `sudo apt install -> python3-pip`. - -Use `exit` to become your normal user again. - -## Enable the anomalies collector - -Navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) and use `edit-config` -to open the `python.d.conf` file. - -```bash -sudo ./edit-config python.d.conf -``` - -In `python.d.conf` file, search for the `anomalies` line. If the line exists, set the value to `yes`. Add the line -yourself if it doesn't already exist. Either way, the final result should look like: - -```conf -anomalies: yes -``` - -[Restart the Agent](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system, to start up the anomalies collector. By default, the -model training process runs every 30 minutes, and uses the previous 4 hours of metrics to establish a baseline for -health and performance across the default included charts. - -> 💡 The anomaly collector may need 30-60 seconds to finish its initial training and have enough data to start -> generating anomaly scores. You may need to refresh your browser tab for the **Anomalies** section to appear in menus -> on both the local Agent dashboard or Netdata Cloud. - -## Configure the anomalies collector - -Open `python.d/anomalies.conf` with `edit-conf`. - -```bash -sudo ./edit-config python.d/anomalies.conf -``` - -The file contains many user-configurable settings with sane defaults. 
Here are some important settings that don't -involve tweaking the behavior of the ML training itself. - -- `charts_regex`: Which charts to train models for and run anomaly detection on, with each chart getting a separate - model. -- `charts_to_exclude`: Specific charts, selected by the regex in `charts_regex`, to exclude. -- `train_every_n`: How often to train the ML models. -- `train_n_secs`: The number of historical observations to train each model on. The default is 4 hours, but if your node - doesn't have historical metrics going back that far, consider [changing the metrics retention - policy](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) or reducing this window. -- `custom_models`: A way to define custom models that you want anomaly probabilities for, including multi-node or - streaming setups. - -> ⚠️ Setting `charts_regex` with many charts or `train_n_secs` to a very large number will have an impact on the -> resources and time required to train a model for every chart. The actual performance implications depend on the -> resources available on your node. If you plan on changing these settings beyond the default, or what's mentioned in -> this guide, make incremental changes to observe the performance impact. Considering `train_max_n` to cap the number of -> observations actually used to train on. - -### Run anomaly detection on Nginx and log file metrics - -As mentioned above, this guide uses an Nginx web server to demonstrate how the anomalies collector works. You must -configure the collector to monitor charts from the -[Nginx](https://github.com/netdata/go.d.plugin/blob/master/modules/nginx/README.md) and [web -log](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/README.md) collectors. - -`charts_regex` allows for some basic regex, such as wildcards (`*`) to match all contexts with a certain pattern. 
For -example, `system\..*` matches with any chart with a context that begins with `system.`, and ends in any number of other -characters (`.*`). Note the escape character (`\`) around the first period to capture a period character exactly, and -not any character. - -Change `charts_regex` in `anomalies.conf` to the following: - -```conf - charts_regex: 'system\..*|nginx_local\..*|web_log_nginx\..*|apps.cpu|apps.mem' -``` - -This value tells the anomaly collector to train against every `system.` chart, every `nginx_local` chart, every -`web_log_nginx` chart, and specifically the `apps.cpu` and `apps.mem` charts. - -![The anomalies collector chart with many -dimensions](https://user-images.githubusercontent.com/1153921/102813877-db5e4880-4386-11eb-8040-d7a1d7a476bb.png) - -### Remove some metrics from anomaly detection - -As you can see in the above screenshot, this node is now looking for anomalies in many places. The result is a single -`anomalies_local.probability` chart with more than twenty dimensions, some of which the dashboard hides at the bottom of -a scrollable area. In addition, training and analyzing the anomaly collector on many charts might require more CPU -utilization that you're willing to give. - -First, explicitly declare which `system.` charts to monitor rather than of all of them using regex (`system\..*`). - -```conf - charts_regex: 'system\.cpu|system\.load|system\.io|system\.net|system\.ram|nginx_local\..*|web_log_nginx\..*|apps.cpu|apps.mem' -``` - -Next, remove some charts with the `charts_to_exclude` setting. For this example, using an Nginx web server, focus on the -volume of requests/responses, not, for example, which type of 4xx response a user might receive. 
- -```conf - charts_to_exclude: 'web_log_nginx.excluded_requests,web_log_nginx.responses_by_status_code_class,web_log_nginx.status_code_class_2xx_responses,web_log_nginx.status_code_class_4xx_responses,web_log_nginx.current_poll_uniq_clients,web_log_nginx.requests_by_http_method,web_log_nginx.requests_by_http_version,web_log_nginx.requests_by_ip_proto' -``` - -![The anomalies collector with less -dimensions](https://user-images.githubusercontent.com/1153921/102820642-d69f9180-4392-11eb-91c5-d3d166d40105.png) - -Apply the ideas behind the collector's regex and exclude settings to any other -[system](https://github.com/netdata/netdata/blob/master/docs/collect/system-metrics.md), [container](https://github.com/netdata/netdata/blob/master/docs/collect/container-metrics.md), or -[application](https://github.com/netdata/netdata/blob/master/docs/collect/application-metrics.md) metrics you want to detect anomalies for. - -## What's next? - -Now that you know how to set up unsupervised anomaly detection in the Netdata Agent, using an Nginx web server as an -example, it's time to apply that knowledge to other mission-critical parts of your infrastructure. If you're not sure -what to monitor next, check out our list of [collectors](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) to see what kind of metrics Netdata -can collect from your systems, containers, and applications. - -Keep on moving to [part 2](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/visualize-monitor-anomalies.md), which covers the charts and alarms -Netdata creates for unsupervised anomaly detection. - -For a different troubleshooting experience, try out the [Metric -Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) feature in Netdata Cloud. Metric -Correlations helps you perform faster root cause analysis by narrowing a dashboard to only the charts most likely to be -related to an anomaly. 
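As a rough illustration of the `charts_regex` and `charts_to_exclude` settings described in this guide, the sketch below filters a list of chart names in Python. The chart list, the exclusion value, and the `charts_in_scope` helper are hypothetical. It also assumes the pattern is applied from the start of each chart name, as Python's `re.match` does; the collector's actual selection logic may differ.

```python
import re

# Pattern copied from the guide's first `charts_regex` example.
CHARTS_REGEX = r"system\..*|nginx_local\..*|web_log_nginx\..*|apps.cpu|apps.mem"

# Hypothetical chart names, used only for illustration.
CANDIDATE_CHARTS = [
    "system.cpu",
    "system.load",
    "nginx_local.connections",
    "web_log_nginx.requests",
    "apps.cpu",
    "apps.mem",
    "apps.sockets",
    "disk.sda",
]

# Hypothetical exclusion list, comma-separated like `charts_to_exclude`.
CHARTS_TO_EXCLUDE = "system.load,web_log_nginx.requests"


def charts_in_scope(pattern, exclude, charts):
    """Return charts that match `pattern` from the start and are not excluded."""
    compiled = re.compile(pattern)
    excluded = {name.strip() for name in exclude.split(",") if name.strip()}
    return [c for c in charts if compiled.match(c) and c not in excluded]


selected = charts_in_scope(CHARTS_REGEX, CHARTS_TO_EXCLUDE, CANDIDATE_CHARTS)
print(selected)
```

Note that the periods in `apps.cpu` and `apps.mem` are not escaped, so in plain regex terms they match any character. Writing `apps\.cpu` would restrict the match to a literal period.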
- -### Related reference documentation - -- [Netdata Agent · Anomalies collector](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/anomalies/README.md) -- [Netdata Agent · Nginx collector](https://github.com/netdata/go.d.plugin/blob/master/modules/nginx/README.md) -- [Netdata Agent · web log collector](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/README.md) -- [Netdata Cloud · Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) diff --git a/docs/guides/monitor/anomaly-detection.md b/docs/guides/monitor/anomaly-detection.md index ce819d937..4552e7a72 100644 --- a/docs/guides/monitor/anomaly-detection.md +++ b/docs/guides/monitor/anomaly-detection.md @@ -1,13 +1,14 @@ <!-- title: "Machine learning (ML) powered anomaly detection" +sidebar_label: "Machine learning (ML) powered anomaly detection" description: "Detect anomalies in any system, container, or application in your infrastructure with machine learning and the open-source Netdata Agent." image: /img/seo/guides/monitor/anomaly-detection.png -author: "Andrew Maguire" -author_title: "Analytics & ML Lead" -author_img: "/img/authors/andy-maguire.jpg" custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/anomaly-detection.md +learn_status: "Published" +learn_rel_path: "Operations" --> +# Machine learning (ML) powered anomaly detection ## Overview @@ -34,7 +35,7 @@ This guide will explain how to get started using these ML based anomaly detectio ## Anomaly Advisor -The [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.mdx) is the flagship anomaly detection feature within Netdata. In the "Anomalies" tab of Netdata you will see an overall "Anomaly Rate" chart that aggregates node level anomaly rate for all nodes in a space. 
The aim of this chart is to make it easy to quickly spot periods of time where the overall "[node anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#node-anomaly-rate)" is elevated in some unusual way and for what node or nodes this relates to. +The [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md) is the flagship anomaly detection feature within Netdata. In the "Anomalies" tab of Netdata you will see an overall "Anomaly Rate" chart that aggregates node level anomaly rate for all nodes in a space. The aim of this chart is to make it easy to quickly spot periods of time where the overall "[node anomaly rate](https://github.com/netdata/netdata/blob/master/ml/README.md#node-anomaly-rate)" is elevated in some unusual way and for what node or nodes this relates to. ![image](https://user-images.githubusercontent.com/2178292/175928290-490dd8b9-9c55-4724-927e-e145cb1cc837.png) @@ -52,13 +53,13 @@ Pressing the anomalies icon (next to the information icon in the chart header) w ## Anomaly Rate Based Alerts -It is possible to use the `anomaly-bit` when defining traditional Alerts within netdata. The `anomaly-bit` is just another `options` parameter that can be passed as part of an [alarm line lookup](https://learn.netdata.cloud/docs/agent/health/reference#alarm-line-lookup). +It is possible to use the `anomaly-bit` when defining traditional Alerts within netdata. The `anomaly-bit` is just another `options` parameter that can be passed as part of an [alarm line lookup](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md#alarm-line-lookup). 
You can see some example ML based alert configurations below: -- [Anomaly rate based CPU dimensions alarm](https://learn.netdata.cloud/docs/agent/health/reference#example-8---anomaly-rate-based-cpu-dimensions-alarm) -- [Anomaly rate based CPU chart alarm](https://learn.netdata.cloud/docs/agent/health/reference#example-9---anomaly-rate-based-cpu-chart-alarm) -- [Anomaly rate based node level alarm](https://learn.netdata.cloud/docs/agent/health/reference#example-10---anomaly-rate-based-node-level-alarm) +- [Anomaly rate based CPU dimensions alarm](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md#example-8---anomaly-rate-based-cpu-dimensions-alarm) +- [Anomaly rate based CPU chart alarm](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md#example-9---anomaly-rate-based-cpu-chart-alarm) +- [Anomaly rate based node level alarm](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md#example-10---anomaly-rate-based-node-level-alarm) - More examples in the [`/health/health.d/ml.conf`](https://github.com/netdata/netdata/blob/master/health/health.d/ml.conf) file that ships with the agent. ## Learn More @@ -66,7 +67,7 @@ You can see some example ML based alert configurations below: Check out the resources below to learn more about how Netdata is approaching ML: - [Agent ML documentation](https://github.com/netdata/netdata/blob/master/ml/README.md). -- [Anomaly Advisor documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.mdx). +- [Anomaly Advisor documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md). - [Metric Correlations documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md). - Anomaly Advisor [launch blog post](https://www.netdata.cloud/blog/introducing-anomaly-advisor-unsupervised-anomaly-detection-in-netdata/). 
- Netdata Approach to ML [blog post](https://www.netdata.cloud/blog/our-approach-to-machine-learning/). diff --git a/docs/guides/monitor/dimension-templates.md b/docs/guides/monitor/dimension-templates.md deleted file mode 100644 index d2795a9c6..000000000 --- a/docs/guides/monitor/dimension-templates.md +++ /dev/null @@ -1,181 +0,0 @@ -<!-- -title: "Use dimension templates to create dynamic alarms" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/dimension-templates.md ---> - -# Use dimension templates to create dynamic alarms - -Your ability to monitor the health of your systems and applications relies on your ability to create and maintain -the best set of alarms for your particular needs. - -In v1.18 of Netdata, we introduced **dimension templates** for alarms, which simplifies the process of -writing [alarm entities](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#health-entity-reference) for -charts with many dimensions. - -Dimension templates can condense many individual entities into one—no more copy-pasting one entity and changing the -`alarm`/`template` and `lookup` lines for each dimension you'd like to monitor. - -They are, however, an advanced health monitoring feature. For more basic instructions on creating your first alarm, -check out our [health monitoring documentation](https://github.com/netdata/netdata/blob/master/health/README.md), which also includes -[examples](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#example-alarms). - -## The fundamentals of `foreach` - -Our dimension templates update creates a new `foreach` parameter to the -existing [`lookup` line](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alarm-line-lookup). This -is where the magic happens. - -You use the `foreach` parameter to specify which dimensions you want to monitor with this single alarm. You can separate -them with a comma (`,`) or a pipe (`|`). 
You can also use -a [Netdata simple pattern](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) to create -many alarms with a regex-like syntax. - -The `foreach` parameter _has_ to be the last parameter in your `lookup` line, and if you have both `of` and `foreach` in -the same `lookup` line, Netdata will ignore the `of` parameter and use `foreach` instead. - -Let's get into some examples so you can see how the new parameter works. - -> ⚠️ The following entities are examples to showcase the functionality and syntax of dimension templates. They are not -> meant to be run as-is on production systems. - -## Condensing entities with `foreach` - -Let's say you want to monitor the `system`, `user`, and `nice` dimensions in your system's overall CPU utilization. -Before dimension templates, you would need the following three entities: - -```yaml - alarm: cpu_system - on: system.cpu -lookup: average -10m percentage of system - every: 1m - warn: $this > 50 - crit: $this > 80 - - alarm: cpu_user - on: system.cpu -lookup: average -10m percentage of user - every: 1m - warn: $this > 50 - crit: $this > 80 - - alarm: cpu_nice - on: system.cpu -lookup: average -10m percentage of nice - every: 1m - warn: $this > 50 - crit: $this > 80 -``` - -With dimension templates, you can condense these into a single alarm. Take note of the `alarm` and `lookup` lines. - -```yaml - alarm: cpu_template - on: system.cpu -lookup: average -10m percentage foreach system,user,nice - every: 1m - warn: $this > 50 - crit: $this > 80 -``` - -The `alarm` line specifies the naming scheme Netdata will use. You can use whatever naming scheme you'd like, with `.` -and `_` being the only allowed symbols. - -The `lookup` line has changed from `of` to `foreach`, and we're now passing three dimensions. - -In this example, Netdata will create three alarms with the names `cpu_template_system`, `cpu_template_user`, and -`cpu_template_nice`. 
Every minute, each alarm will use the same database query to calculate the average CPU usage for -the `system`, `user`, and `nice` dimensions over the last 10 minutes and send out alarms if necessary. - -You can find these three alarms active by clicking on the **Alarms** button in the top navigation, and then clicking on -the **All** tab and scrolling to the **system - cpu** collapsible section. - -![Three new alarms created from the dimension template](https://user-images.githubusercontent.com/1153921/66218994-29523800-e67f-11e9-9bcb-9bca23e2c554.png) - -Let's look at some other examples of how `foreach` works so you can best apply it in your configurations. - -### Using a Netdata simple pattern in `foreach` - -In the last example, we used `foreach system,user,nice` to create three distinct alarms using dimension templates. But -what if you want to quickly create alarms for _all_ the dimensions of a given chart? - -Use a [simple pattern](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md)! One example of a simple pattern is a single wildcard -(`*`). - -Instead of monitoring system CPU usage, let's monitor per-application CPU usage using the `apps.cpu` chart. Passing a -wildcard as the simple pattern tells Netdata to create a separate alarm for _every_ process on your system: - -```yaml - alarm: app_cpu - on: apps.cpu -lookup: average -10m percentage foreach * - every: 1m - warn: $this > 50 - crit: $this > 80 -``` - -This entity will now create alarms for every dimension in the `apps.cpu` chart. Given that most `apps.cpu` charts have -10 or more dimensions, using the wildcard ensures you catch every CPU-hogging process. - -To learn more about how to use simple patterns with dimension templates, see -our [simple patterns documentation](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). 
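The naming behavior described above can be sketched in Python. This is only a model of the documented outcome, not Netdata's implementation: `expand_foreach` is a hypothetical helper, the dimension list is made up for illustration, and `fnmatchcase` stands in as an approximation of Netdata simple patterns, which have their own syntax.

```python
import re
from fnmatch import fnmatchcase


def expand_foreach(alarm_name, foreach_spec, chart_dimensions):
    """Model the alarm names a `foreach` lookup would produce.

    `foreach_spec` is split on commas or pipes, as the guide describes.
    Each resulting pattern is matched against the chart's dimensions.
    """
    patterns = [p.strip() for p in re.split(r"[,|]", foreach_spec) if p.strip()]
    names = []
    for pattern in patterns:
        for dim in chart_dimensions:
            name = f"{alarm_name}_{dim}"
            # fnmatchcase is an assumption standing in for simple patterns.
            if fnmatchcase(dim, pattern) and name not in names:
                names.append(name)
    return names


# Hypothetical dimensions for a CPU chart.
cpu_dimensions = ["user", "system", "nice", "iowait"]

explicit = expand_foreach("cpu_template", "system,user,nice", cpu_dimensions)
wildcard = expand_foreach("cpu_template", "*", cpu_dimensions)
print(explicit)
print(wildcard)
```

With an explicit list, only the named dimensions produce alarms. With `*`, every dimension in the chart does, which is why a wildcard on a chart with many dimensions can create a large number of alarms.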
- -## Using `foreach` with alarm templates - -Dimension templates also work -with [alarm templates](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alarm-line-alarm-or-template). -Alarm templates help you create alarms for all the charts with a given context—for example, all the cores of your -system's CPU. - -By combining the two, you can create dozens of individual alarms with a single template entity. Here's how you would -create alarms for the `system`, `user`, and `nice` dimensions for every chart in the `cpu.cpu` context—or, in other -words, every CPU core. - -```yaml -template: cpu_template - on: cpu.cpu - lookup: average -10m percentage foreach system,user,nice - every: 1m - warn: $this > 50 - crit: $this > 80 -``` - -On a system with a 6-core, 12-thread Ryzen 5 1600 CPU, this one entity creates alarms on the following charts and -dimensions: - -- `cpu.cpu0` - - `cpu_template_user` - - `cpu_template_system` - - `cpu_template_nice` -- `cpu.cpu1` - - `cpu_template_user` - - `cpu_template_system` - - `cpu_template_nice` -- `cpu.cpu2` - - `cpu_template_user` - - `cpu_template_system` - - `cpu_template_nice` -- ... -- `cpu.cpu11` - - `cpu_template_user` - - `cpu_template_system` - - `cpu_template_nice` - -And how just a few of those dimension template-generated alarms look like in the Netdata dashboard. - -![A few of the created alarms in the Netdata dashboard](https://user-images.githubusercontent.com/1153921/66219669-708cf880-e680-11e9-8b3a-7bfe178fa28b.png) - -All in all, this single entity creates 36 individual alarms. Much easier than writing 36 separate entities in your -health configuration files! - -## What's next? - -We hope you're excited about the possibilities of using dimension templates! Maybe they'll inspire you to build new -alarms that will help you better monitor the health of your systems. - -Or, at the very least, simplify your configuration files. 
- -For information about other advanced features in Netdata's health monitoring toolkit, check out -our [health documentation](https://github.com/netdata/netdata/blob/master/health/README.md). And if you have some cool -alarms you built using dimension templates, - - diff --git a/docs/guides/monitor/kubernetes-k8s-netdata.md b/docs/guides/monitor/kubernetes-k8s-netdata.md index 5732fc96c..96d79935b 100644 --- a/docs/guides/monitor/kubernetes-k8s-netdata.md +++ b/docs/guides/monitor/kubernetes-k8s-netdata.md @@ -1,14 +1,6 @@ -<!-- -title: "Kubernetes monitoring with Netdata: Overview and visualizations" -description: "Learn how to navigate Netdata's Kubernetes monitoring features for visualizing the health and performance of a Kubernetes cluster with per-second granularity." -image: /img/seo/guides/monitor/kubernetes-k8s-netdata.png -author: "Joel Hans" -author_title: "Editorial Director, Technical & Educational Resources" -author_img: "/img/authors/joel-hans.jpg" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/kubernetes-k8s-netdata.md ---> - -# Kubernetes monitoring with Netdata: Overview and visualizations +# Kubernetes monitoring with Netdata + +This document gives an overview of what visualizations Netdata provides on Kubernetes deployments. At Netdata, we've built Kubernetes monitoring tools that add visibility without complexity while also helping you actively troubleshoot anomalies or outages. 
This guide walks you through each of the visualizations and offers best @@ -140,7 +132,7 @@ visualizations](https://user-images.githubusercontent.com/1153921/109049195-349f ### Health map -The first visualization is the [health map](https://learn.netdata.cloud/docs/cloud/visualize/kubernetes#health-map), +The first visualization is the [health map](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md#health-map), which places each container into its own box, then varies the intensity of their color to visualize the resource utilization. By default, the health map shows the **average CPU utilization as a percentage of the configured limit** for every container in your cluster. diff --git a/docs/guides/monitor/lamp-stack.md b/docs/guides/monitor/lamp-stack.md index 165888c4b..190ea87e8 100644 --- a/docs/guides/monitor/lamp-stack.md +++ b/docs/guides/monitor/lamp-stack.md @@ -1,15 +1,8 @@ -<!-- -title: "LAMP stack monitoring (Linux, Apache, MySQL, PHP) with Netdata" -description: "Set up robust LAMP stack monitoring (Linux, Apache, MySQL, PHP) in just a few minutes using a free, open-source monitoring tool that collects metrics every second." -image: /img/seo/guides/monitor/lamp-stack.png -author: "Joel Hans" -author_title: "Editorial Director, Technical & Educational Resources" -author_img: "/img/authors/joel-hans.jpg" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/lamp-stack.md ---> import { OneLineInstallWget } from '@site/src/components/OneLineInstall/' -# LAMP stack monitoring (Linux, Apache, MySQL, PHP) with Netdata +# LAMP stack monitoring with Netdata + +Set up robust LAMP stack monitoring (Linux, Apache, MySQL, PHP) in a few minutes using Netdata. The LAMP stack is the "hello world" for deploying dynamic web applications. It's fast, flexible, and reliable, which means a developer or sysadmin won't go far in their career without interacting with the stack and its services. 
@@ -58,7 +51,7 @@ To follow this tutorial, you need: ## Install the Netdata Agent If you don't have the free, open-source Netdata monitoring agent installed on your node yet, get started with a [single -kickstart command](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx): +kickstart command](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md): <OneLineInstallWget/> @@ -171,10 +164,9 @@ If the Netdata Agent isn't already open in your browser, open a new tab and navi Netdata automatically organizes all metrics and charts onto a single page for easy navigation. Peek at gauges to see overall system performance, then scroll down to see more. Click-and-drag with your mouse to pan _all_ charts back and forth through different time intervals, or hold `SHIFT` and use the scrollwheel (or two-finger scroll) to zoom in and -out. Check out our doc on [interacting with charts](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) for all the details. +out. Check out our doc on [interacting with charts](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) for all the details. -![The Netdata -dashboard](https://user-images.githubusercontent.com/1153921/109520555-98e17800-7a69-11eb-86ec-16f689da4527.png) +![The Netdata dashboard](https://user-images.githubusercontent.com/1153921/109520555-98e17800-7a69-11eb-86ec-16f689da4527.png) The **System Overview** section, which you can also see in the right-hand menu, contains key hardware monitoring charts, including CPU utilization, memory page faults, network monitoring, and much more. 
The **Applications** section shows you @@ -211,7 +203,7 @@ shows any alarms currently triggered, while the **All** tab displays a list of _ ![An example of LAMP stack alarms](https://user-images.githubusercontent.com/1153921/109524120-5883f900-7a6d-11eb-830e-0e7baaa28163.png) -[Tweak alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) based on your infrastructure monitoring needs, and to see these alarms +[Tweak alarms](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) based on your infrastructure monitoring needs, and to see these alarms in other places, like your inbox or a Slack channel, [enable a notification method](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md). @@ -238,7 +230,7 @@ source of issues faster with [Metric Correlations](https://github.com/netdata/ne ### Related reference documentation -- [Netdata Agent · Get started](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) +- [Netdata Agent · Get started](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) - [Netdata Agent · Apache data collector](https://github.com/netdata/go.d.plugin/blob/master/modules/apache/README.md) - [Netdata Agent · Web log collector](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/README.md) - [Netdata Agent · MySQL data collector](https://github.com/netdata/go.d.plugin/blob/master/modules/mysql/README.md) diff --git a/docs/guides/monitor/pi-hole-raspberry-pi.md b/docs/guides/monitor/pi-hole-raspberry-pi.md index 5099d12b9..4f0ff4cd6 100644 --- a/docs/guides/monitor/pi-hole-raspberry-pi.md +++ b/docs/guides/monitor/pi-hole-raspberry-pi.md @@ -1,13 +1,17 @@ <!-- title: "Monitor Pi-hole (and a Raspberry Pi) with Netdata" +sidebar_label: "Monitor Pi-hole (and a Raspberry Pi) with Netdata" description: "Monitor Pi-hole metrics, plus Raspberry Pi system metrics, in minutes and completely for free with Netdata's open-source 
monitoring agent." image: /img/seo/guides/monitor/netdata-pi-hole-raspberry-pi.png custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/pi-hole-raspberry-pi.md +learn_status: "Published" +learn_rel_path: "Miscellaneous" --> -import { OneLineInstallWget } from '@site/src/components/OneLineInstall/' # Monitor Pi-hole (and a Raspberry Pi) with Netdata +import { OneLineInstallWget } from '@site/src/components/OneLineInstall/' + Between intrusive ads, invasive trackers, and vicious malware, many techies and homelab enthusiasts are advancing their networks' security and speed with a tiny computer and a powerful piece of software: [Pi-hole](https://pi-hole.net/). @@ -61,9 +65,7 @@ populates its dashboard with more than 250 charts. Open your browser of choice and navigate to `http://NODE:19999/`, replacing `NODE` with the IP address of your Raspberry Pi. Not sure what that IP is? Try running `hostname -I | awk '{print $1}'` from the Pi itself. -You'll see Netdata's dashboard and a few hundred real-time, -[interactive](https://learn.netdata.cloud/guides/step-by-step/step-02#interact-with-charts) charts. Feel free to -explore, but let's turn our attention to installing Pi-hole. +You'll see Netdata's dashboard and a few hundred real-time, interactive charts. Feel free to explore, but let's turn our attention to installing Pi-hole. ## Install Pi-Hole @@ -98,8 +100,7 @@ part of your system might affect another. ![The Netdata dashboard in action](https://user-images.githubusercontent.com/1153921/80827388-b9fee100-8b98-11ea-8f60-0d7824667cd3.gif) -If you're completely new to Netdata, look at our [step-by-step guide](https://github.com/netdata/netdata/blob/master/docs/guides/step-by-step/step-00.md) for a -walkthrough of all its features. For a more expedited tour, see the [get started guide](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx). 
+If you're completely new to Netdata, look at the [Introduction](https://github.com/netdata/netdata/blob/master/docs/getting-started/introduction.md) section for a walkthrough of all its features. For a more expedited tour, see the [get started documentation](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md). ### Enable temperature sensor monitoring @@ -137,26 +138,5 @@ more than 256. Use our [database sizing calculator](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics) -and [guide on storing historical metrics](https://github.com/netdata/netdata/blob/master/docs/guides/longer-metrics-storage.md) to help you determine the right +and the [Database configuration documentation](https://github.com/netdata/netdata/blob/master/database/README.md) to help you determine the right setting for your Raspberry Pi. - -## What's next? - -Now that you're monitoring Pi-hole and your Raspberry Pi with Netdata, you can extend its capabilities even further, or -configure Netdata to more specific goals. - -Most importantly, you can always install additional services and instantly collect metrics from many of them with our -[300+ integrations](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md). - -- [Optimize performance](https://github.com/netdata/netdata/blob/master/docs/guides/configure/performance.md) using tweaks developed for IoT devices. -- [Stream Raspberry Pi metrics](https://github.com/netdata/netdata/blob/master/streaming/README.md) to a parent host for easy access or longer-term storage. -- [Tweak alarms](https://github.com/netdata/netdata/blob/master/health/QUICKSTART.md) for either Pi-hole or the health of your Raspberry Pi. -- [Export metrics to external databases](https://github.com/netdata/netdata/blob/master/exporting/README.md) with the exporting engine. 
- -Or, head over to [our guides](https://learn.netdata.cloud/guides/) for even more experiments and insights into -troubleshooting the health of your systems and services. - -If you have any questions about using Netdata to monitor your Raspberry Pi, Pi-hole, or any other applications, head on -over to our [community forum](https://community.netdata.cloud/). - - diff --git a/docs/guides/monitor/process.md b/docs/guides/monitor/process.md index 7cc327a01..9aa6911f1 100644 --- a/docs/guides/monitor/process.md +++ b/docs/guides/monitor/process.md @@ -1,8 +1,11 @@ <!-- title: Monitor any process in real-time with Netdata +sidebar_label: Monitor any process in real-time with Netdata description: "Tap into Netdata's powerful collectors, with per-second utilization metrics for every process, to troubleshoot faster and make data-informed decisions." image: /img/seo/guides/monitor/process.png custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/process.md +learn_status: "Published" +learn_rel_path: "Operations" --> # Monitor any process in real-time with Netdata @@ -34,11 +37,7 @@ With Netdata's process monitoring, you can: ## Prerequisites -- One or more Linux nodes running [Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx). If you - need more time to understand Netdata before - following this guide, see - the [infrastructure](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md) or - [single-node](https://github.com/netdata/netdata/blob/master/docs/quickstart/single-node.md) monitoring quickstarts. +- One or more Linux nodes running [Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) - A general understanding of how to [configure the Netdata Agent](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) using `edit-config`. @@ -268,45 +267,4 @@ relevant data. 
`ebpf.plugin` visualizes additional eBPF metrics, which are system-wide and not per-process, under the **eBPF** section. -## What's next? - -Now that you have `apps_groups.conf` configured correctly, and know where to find per-process visualizations throughout -Netdata's ecosystem, you can precisely monitor the health and performance of any process on your node using per-second -metrics. - -For even more in-depth troubleshooting, see our guide -on [monitoring and debugging applications with eBPF](https://github.com/netdata/netdata/blob/master/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md). - -If the process you're monitoring also has -a [supported collector](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md), now is a great time to -set -that up if it wasn't autodetected. With both process utilization and application-specific metrics, you should have every -piece of data needed to discover the root cause of an incident. See -our [collector setup](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) doc for details. - -[Create new dashboards](https://github.com/netdata/netdata/blob/master/docs/visualize/create-dashboards.md) in Netdata -Cloud using charts from `apps.plugin`, -`ebpf.plugin`, and application-specific collectors to build targeted dashboards for monitoring key processes across your -infrastructure. - -Try -running [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) -on a node that's running the process(es) you're monitoring. Even if nothing is going wrong at the moment, Netdata -Cloud's embedded intelligence helps you better understand how a MySQL database, for example, might influence a system's -volume of memory page faults. And when an incident is afoot, use Metric Correlations to reduce mean time to resolution ( -MTTR) and cognitive load. 
- -If you want more specific metrics from your custom application, check out -Netdata's [statsd support](https://github.com/netdata/netdata/blob/master/collectors/statsd.plugin/README.md). With statd, you can send detailed metrics from your -application to Netdata and visualize them with per-second granularity. Netdata's statsd collector works with dozens of -[statsd server implementations](https://github.com/etsy/statsd/wiki#client-implementations), which work with most application -frameworks. - -### Related reference documentation - -- [Netdata Agent · `apps.plugin`](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/README.md) -- [Netdata Agent · `ebpf.plugin`](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md) -- [Netdata Agent · Dashboards](https://github.com/netdata/netdata/blob/master/web/README.md#dimensions) -- [Netdata Agent · MySQL collector](https://github.com/netdata/go.d.plugin/blob/master/modules/mysql/README.md) - diff --git a/docs/guides/monitor/raspberry-pi-anomaly-detection.md b/docs/guides/monitor/raspberry-pi-anomaly-detection.md index 00b652bf2..935d0f6cf 100644 --- a/docs/guides/monitor/raspberry-pi-anomaly-detection.md +++ b/docs/guides/monitor/raspberry-pi-anomaly-detection.md @@ -1,12 +1,6 @@ ---- -title: "Unsupervised anomaly detection for Raspberry Pi monitoring" -description: "Use a low-overhead machine learning algorithm and an open-source monitoring tool to detect anomalous metrics on a Raspberry Pi." -image: /img/seo/guides/monitor/raspberry-pi-anomaly-detection.png -author: "Andy Maguire" -author_title: "Senior Machine Learning Engineer" -author_img: "/img/authors/andy-maguire.jpg" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/raspberry-pi-anomaly-detection.md ---- +# Anomaly detection for RPi monitoring + +Learn how to use a low-overhead machine learning algorithm alongside Netdata to detect anomalous metrics on a Raspberry Pi. 
We love IoT and edge at Netdata, we also love machine learning. Even better if we can combine the two to ease the pain of monitoring increasingly complex systems. @@ -23,7 +17,7 @@ Read on to learn all the steps and enable unsupervised anomaly detection on your - A Raspberry Pi running Raspbian, which we'll call a _node_. - The [open-source Netdata](https://github.com/netdata/netdata) monitoring agent. If you don't have it installed on your - node yet, [get started now](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx). + node yet, [get started now](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md). ## Install dependencies @@ -63,7 +57,6 @@ Now you're ready to enable the collector and [restart Netdata](https://github.co ```bash sudo ./edit-config python.d.conf -# set `anomalies: no` to `anomalies: yes` # restart netdata sudo systemctl restart netdata @@ -100,26 +93,4 @@ during training. By default, the anomalies collector, along with all other runni ![RAM utilization of anomaly detection on the Raspberry Pi](https://user-images.githubusercontent.com/1153921/110149720-9e0d3280-7d9b-11eb-883d-b1d4d9b9b5e1.png) -## What's next? - -So, all in all, with a small little bit of extra set up and a small overhead on the Pi itself, the anomalies collector -looks like a potentially useful addition to enable unsupervised anomaly detection on your Pi. 
- -See our two-part guide series for a more complete picture of configuring the anomalies collector, plus some best -practices on using the charts it automatically generates: - -- [_Detect anomalies in systems and applications_](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection-python.md) -- [_Monitor and visualize anomalies with Netdata_](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/visualize-monitor-anomalies.md) - -If you're using your Raspberry Pi for other purposes, like blocking ads/trackers with Pi-hole, check out our companions -Pi guide: [_Monitor Pi-hole (and a Raspberry Pi) with Netdata_](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/pi-hole-raspberry-pi.md). - -Once you've had a chance to give unsupervised anomaly detection a go, share your use cases and let us know of any -feedback on our [community forum](https://community.netdata.cloud/t/anomalies-collector-feedback-megathread/767). - -### Related reference documentation - -- [Netdata Agent · Get Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) -- [Netdata Agent · Anomalies collector](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/anomalies/README.md) - diff --git a/docs/guides/monitor/statsd.md b/docs/guides/monitor/statsd.md deleted file mode 100644 index 848e2649c..000000000 --- a/docs/guides/monitor/statsd.md +++ /dev/null @@ -1,298 +0,0 @@ -<!-- -title: How to use any StatsD data source with Netdata -description: "Learn how to monitor any custom application instrumented with StatsD with per-second metrics and fully customizable, interactive charts." 
-image: /img/seo/guides/monitor/statsd.png -author: "Odysseas Lamtzidis" -author_title: "Developer Advocate" -author_img: "/img/authors/odysseas-lamtzidis.jpg" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/statsd.md ---> - -# StatsD Guide - -StatsD is a protocol and server implementation, first introduced at Etsy, to aggregate and summarize application metrics. With StatsD, applications are instrumented by developers using the libraries that already exist for the language, without caring about managing the data. The StatsD server is in charge of receiving the metrics, performing some simple processing on them, and then pushing them to the time-series database (TSDB) for long-term storage and visualization. - -Netdata is a fully-functional StatsD server and TSDB implementation, so you can instantly visualize metrics by simply sending them to Netdata using the built-in StatsD server. - -In this guide, we'll go through a scenario of visualizing our data in Netdata in a matter of seconds using [k6](https://k6.io), an open-source tool for automating load testing that outputs metrics to the StatsD format. - -Although we'll use k6 as the use-case, the same principles can be applied to every application that supports the StatsD protocol. Simply enable the StatsD output and point it to the node that runs Netdata, which is `localhost` in this case. - -In general, the process for creating a StatsD collector can be summarized in 2 steps: - -- Run an experiment by sending StatsD metrics to Netdata, without any prior configuration. This will create a chart per metric (called private charts) and will help you verify that everything works as expected from the application side of things. - - Make sure to reload the dashboard tab **after** you start sending data to Netdata. 
-- Create a configuration file for your app using [edit-config](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md): `sudo ./edit-config - statsd.d/myapp.conf` - - Each app will have it's own section in the right-hand menu. - -Now, let's see the above process in detail. - -## Prerequisites - -- A node with the [Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) installed. -- An application to instrument. For this guide, that will be [k6](https://k6.io/docs/getting-started/installation). - -## Understanding the metrics - -The real in instrumenting an application with StatsD for you is to decide what metrics you want to visualize and how you want them grouped. In other words, you need decide which metrics will be grouped in the same charts and how the charts will be grouped on Netdata's dashboard. - -Start with documentation for the particular application that you want to monitor (or the technological stack that you are using). In our case, the [k6 documentation](https://k6.io/docs/using-k6/metrics/) has a whole page dedicated to the metrics output by k6, along with descriptions. - -If you are using StatsD to monitor an existing application, you don't have much control over these metrics. For example, k6 has a type called `trend`, which is identical to timers and histograms. Thus, _k6 is clearly dictating_ which metrics can be used as histograms and simple gauges. - -On the other hand, if you are instrumenting your own code, you will need to not only decide what are the "things" that you want to measure, but also decide which StatsD metric type is the appropriate for each. - -## Use private charts to see all available metrics - -In Netdata, every metric will receive its own chart, called a `private chart`. Although in the final implementation this is something that we will disable, since it can create considerable noise (imagine having 100s of metrics), it’s very handy while building the configuration file. 
- -You can get a quick visual representation of the metrics and their type (e.g it’s a gauge, a timer, etc.). - -An important thing to notice is that StatsD has different types of metrics, as illustrated in the [Netdata documentation](https://learn.netdata.cloud/docs/agent/collectors/statsd.plugin#metrics-supported-by-netdata). Histograms and timers support mathematical operations to be performed on top of the baseline metric, like reporting the `average` of the value. - -Here are some examples of default private charts. You can see that the histogram private charts will visualize all the available operations. - -**Gauge private chart** - -![Gauge metric example](https://i.imgur.com/Sr5nJEV.png) - -**Histogram private chart** - -![Timer metric example](https://i.imgur.com/P4p0hvq.png) - -## Create a new StatsD configuration file - -Start by creating a new configuration file under the `statsd.d/` folder in the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). Use [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) to create a new file called `k6.conf`. - -```bash= -sudo ./edit-config statsd.d/k6.conf -``` - -Copy the following configuration into your file as a starting point. - -```conf -[app] - name = k6 - metrics = k6* - private charts = yes - gaps when not collected = no - memory mode = dbengine -``` - -Next, you need is to understand how to organize metrics in Netdata’s StatsD. - -### Synthetic charts - -Netdata lets you group the metrics exposed by your instrumented application with _synthetic charts_. - -First, create a `[dictionary]` section to transform the names of the metrics into human-readable equivalents. `http_req_blocked`, `http_req_connecting`, `http_req_receiving`, and `http_reqs` are all metrics exposed by k6. 
- -``` -[dictionary] - http_req_blocked = Blocked HTTP Requests - http_req_connecting = Connecting HTTP Requests - http_req_receiving = Receiving HTTP Requests - http_reqs = Total HTTP requests -``` - -Continue this dictionary process with any other metrics you want to collect with Netdata. - -### Families and context - -Families and context are additional ways to group metrics. Families control the submenu at right-hand menu and it's a subcategory of the section. Given the metrics given by K6, we are organizing them in 2 major groups, or `families`: `k6 native metrics` and `http metrics`. - -Context is a second way to group metrics, when the metrics are of the same nature but different origin. In our case, if we ran several different load testing experiments side-by-side, we could define the same app, but different context (e.g `http_requests.experiment1`, `http_requests.experiment2`). - -Find more details about family and context in our [documentation](https://github.com/netdata/netdata/blob/master/web/README.md#families). - -### Dimension - -Now, having decided on how we are going to group the charts, we need to define how we are going to group metrics into different charts. This is particularly important, since we decide: - -- What metrics **not** to show, since they are not useful for our use-case. -- What metrics to consolidate into the same charts, so as to reduce noise and increase visual correlation. - -The dimension option has this syntax: `dimension = [pattern] METRIC NAME TYPE MULTIPLIER DIVIDER OPTIONS` - -- **pattern**: A keyword that tells the StatsD server the `METRIC` string is actually a [simple pattern].(/libnetdata/simple_pattern/README.md). We don't simple patterns in the example, but if we wanted to visualize all the `http_req` metrics, we could have a single dimension: `dimension = pattern 'k6.http_req*' last 1 1`. 
Find detailed examples with patterns in our [documentation](https://github.com/netdata/netdata/blob/master/collectors/statsd.plugin/README.md#dimension-patterns). -- **METRIC** The id of the metric as it comes from the client. You can easily find this in the private charts above, for example: `k6.http_req_connecting`. -- **NAME**: The name of the dimension. You can use the dictionary to expand this to something more human-readable. -- **TYPE**: - - For all charts: - - `events`: The number of events (data points) received by the StatsD server - - `last`: The last value that the server received - - For histograms and timers: - - `min`, `max`, `sum`, `average`, `percentile`, `median`, `stddev`: This is helpful if you want to see different representations of the same value. You can find an example at the `[iteration_duration]` above. Note that the baseline `metric` is the same, but the `name` of the dimension is different, since we use the baseline, but we perform a computation on it, creating a different final metric for visualization(dimension). -- **MULTIPLIER DIVIDER**: Handy if you want to convert Kilobytes to Megabytes or you want to give negative value. The second is handy for better visualization of send/receive. You can find an example at the **packets** submenu of the **IPv4 Networking Section**. - -> ❕ If you define a chart, run Netdata to visualize metrics, and then add or remove a dimension from that chart, this will result in a new chart with the same name, confusing Netdata. If you change the dimensions of the chart, please make sure to also change the `name` of that chart, since it serves as the `id` of that chart in Netdata's storage. (e.g http_req --> http_req_1). - -### Finalize your StatsD configuration file - -It's time to assemble all the pieces together and create the synthetic charts that will consist our application dashboard in Netdata. 
We can do it in a few simple steps: - -- Decide which metrics we want to use (we have viewed all of them as private charts). For example, we want to use `k6.http_requests`, `k6.vus`, etc. -- Decide how we want organize them in different synthetic charts. For example, we want `k6.http_requests`, `k6.vus` on their own, but `k6.http_req_blocked` and `k6.http_req_connecting` on the same chart. -- For each synthetic chart, we define a **unique** name and a human readable title. -- We decide at which `family` (submenu section) we want each synthetic chart to belong to. For example, here we have defined 2 families: `http requests`, `k6_metrics`. -- If we have multiple instances of the same metric, we can define different contexts, (Optional). -- We define a dimension according to the syntax we highlighted above. -- We define a type for each synthetic chart (line, area, stacked) -- We define the units for each synthetic chart. - -Following the above steps, we append to the `k6.conf` that we defined above, the following configuration: - -``` -[http_req_total] - name = http_req_total - title = Total HTTP Requests - family = http requests - context = k6.http_requests - dimension = k6.http_reqs http_reqs last 1 1 sum - type = line - units = requests/s - -[vus] - name = vus - title = Virtual Active Users - family = k6_metrics - dimension = k6.vus vus last 1 1 - dimension = k6.vus_max vus_max last 1 1 - type = line - unit = vus - -[iteration_duration] - name = iteration_duration_2 - title = Iteration duration - family = k6_metrics - dimension = k6.iteration_duration iteration_duration last 1 1 - dimension = k6.iteration_duration iteration_duration_max max 1 1 - dimension = k6.iteration_duration iteration_duration_min min 1 1 - dimension = k6.iteration_duration iteration_duration_avg avg 1 1 - type = line - unit = s - -[dropped_iterations] - name = dropped_iterations - title = Dropped Iterations - family = k6_metrics - dimension = k6.dropped_iterations dropped_iterations last 1 1 
- units = iterations - type = line - -[data] - name = data - title = K6 Data - family = k6_metrics - dimension = k6.data_received data_received last 1 1 - dimension = k6.data_sent data_sent last -1 1 - units = kb/s - type = area - -[http_req_status] - name = http_req_status - title = HTTP Requests Status - family = http requests - dimension = k6.http_req_blocked http_req_blocked last 1 1 - dimension = k6.http_req_connecting http_req_connecting last 1 1 - units = ms - type = line - -[http_req_duration] - name = http_req_duration - title = HTTP requests duration - family = http requests - dimension = k6.http_req_sending http_req_sending last 1 1 - dimension = k6.http_req_waiting http_req_waiting last 1 1 - dimension = k6.http_req_receiving http_req_receiving last 1 1 - units = ms - type = stacked -``` - -> Take note that Netdata will report the rate for metrics and counters, even if k6 or another application sends an _absolute_ number. For example, k6 sends absolute HTTP requests with `http_reqs`, but Netdat visualizes that in `requests/second`. - -To enable this StatsD configuration, [restart Netdata](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md). - -## Final touches - -At this point, you have used StatsD to gather metrics for k6, creating a whole new section in your Netdata dashboard in the process. Moreover, you can further customize the icon of the particular section, as well as the description for each chart. - -To edit the section, please follow the Netdata [documentation](https://learn.netdata.cloud/docs/agent/web/gui#customizing-the-local-dashboard). - -While the following configuration will be placed in a new file, as the documentation suggests, it is instructing to use `dashboard_info.js` as a template. Open the file and see how the rest of sections and collectors have been defined. 
- -```javascript= -netdataDashboard.menu = { - 'k6': { - title: 'K6 Load Testing', - icon: '<i class="fas fa-cogs"></i>', - info: 'k6 is an open-source load testing tool and cloud service providing the best developer experience for API performance testing.' - }, - . - . - . -``` - -We can then add a description for each chart. Simply find the following section in `dashboard_info.js` to understand how a chart definitions are used: - -```javascript= -netdataDashboard.context = { - 'system.cpu': { - info: function (os) { - void (os); - return 'Total CPU utilization (all cores). 100% here means there is no CPU idle time at all. You can get per core usage at the <a href="#menu_cpu">CPUs</a> section and per application usage at the <a href="#menu_apps">Applications Monitoring</a> section.' - + netdataDashboard.sparkline('<br/>Keep an eye on <b>iowait</b> ', 'system.cpu', 'iowait', '%', '. If it is constantly high, your disks are a bottleneck and they slow your system down.') - + netdataDashboard.sparkline('<br/>An important metric worth monitoring, is <b>softirq</b> ', 'system.cpu', 'softirq', '%', '. A constantly high percentage of softirq may indicate network driver issues.'); - }, - valueRange: "[0, 100]" - }, -``` - -Afterwards, you can open your `custom_dashboard_info.js`, as suggested in the documentation linked above, and add something like the following example: - -```javascript= -netdataDashboard.context = { - 'k6.http_req_duration': { - info: "Total time for the request. It's equal to http_req_sending + http_req_waiting + http_req_receiving (i.e. how long did the remote server take to process the request and respond, without the initial DNS lookup/connection times)" - }, - -``` -The chart is identified as ``<section_name>.<chart_name>``. - -These descriptions can greatly help the Netdata user who is monitoring your application in the midst of an incident. - -The `info` field supports `html`, embedding useful links and instructions in the description. 
- -## Vendoring a new collector - -While we learned how to visualize any data source in Netdata using the StatsD protocol, we have also created a new collector. - -As long as you use the same underlying collector, every new `myapp.conf` file will create a new data source and dashboard section for Netdata. Netdata loads all the configuration files by default, but it will **not** create dashboard sections or charts, unless it starts receiving data for that particular data source. This means that we can now share our collector with the rest of the Netdata community. - -If you want to contribute or you need any help in developing your collector, we have a whole [Forum Category](https://community.netdata.cloud/c/agent-development/9) dedicated to contributing to the Netdata Agent. - -### Making a PR to the netdata/netdata repository - -- Make sure you follow the contributing guide and read our Code of Conduct -- Fork the netdata/netdata repository -- Place the configuration file inside `netdata/collectors/statsd.plugin` -- Add a reference in `netdata/collectors/statsd.plugin/Makefile.am`. For example, if we contribute the `k6.conf` file: -```Makefile -dist_statsdconfig_DATA = \ - example.conf \ - k6.conf \ - $(NULL) -``` - -## What's next? - -In this tutorial, you learned how to monitor an application using Netdata's StatsD implementation. - -Netdata allows you easily visualize any StatsD metric without any configuration, since it creates a private metric per chart by default. But to make your implementation more robust, you also learned how to group metrics by family and context, and create multiple dimensions. With these tools, you can quickly instrument any application with StatsD to monitor its performance and availability with per-second metrics. 
- -### Related reference documentation - -- [Netdata Agent · StatsD](https://github.com/netdata/netdata/blob/master/collectors/statsd.plugin/README.md) - - diff --git a/docs/guides/monitor/stop-notifications-alarms.md b/docs/guides/monitor/stop-notifications-alarms.md deleted file mode 100644 index 3c026a89b..000000000 --- a/docs/guides/monitor/stop-notifications-alarms.md +++ /dev/null @@ -1,92 +0,0 @@ -<!-- -title: "Stop notifications for individual alarms" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/stop-notifications-alarms.md ---> - -# Stop notifications for individual alarms - -In this short tutorial, you'll learn how to stop notifications for individual alarms in Netdata's health -monitoring system. We also refer to this process as _silencing_ the alarm. - -Why silence alarms? We designed Netdata's pre-configured alarms for production systems, so they might not be -relevant if you run Netdata on your laptop or a small virtual server. If they're not helpful, they can be a distraction -to real issues with health and performance. - -Silencing individual alarms is an excellent solution for situations where you're not interested in seeing a specific -alarm but don't want to disable a [notification system](https://github.com/netdata/netdata/blob/master/health/notifications/README.md) entirely. - -## Find the alarm configuration file - -To silence an alarm, you need to know where to find its configuration file. - -Let's use the `system.cpu` chart as an example. It's the first chart you'll see on most Netdata dashboards. - -To figure out which file you need to edit, open up Netdata's dashboard and, click the **Alarms** button at the top -of the dashboard, followed by clicking on the **All** tab. 
- -In this example, we're looking for the `system - cpu` entity, which, when opened, looks like this: - -![The system - cpu alarm -entity](https://user-images.githubusercontent.com/1153921/67034648-ebb4cc80-f0cc-11e9-9d49-1023629924f5.png) - -In the `source` row, you see that this chart is getting its configuration from -`4@/usr/lib/netdata/conf.d/health.d/cpu.conf`. The relevant part of begins at `health.d`: `health.d/cpu.conf`. That's -the file you need to edit if you want to silence this alarm. - -For more information about editing or referencing health configuration files on your system, see the [health -quickstart](https://github.com/netdata/netdata/blob/master/health/QUICKSTART.md#edit-health-configuration-files). - -## Edit the file to enable silencing - -To edit `health.d/cpu.conf`, use `edit-config` from inside of your Netdata configuration directory. - -```bash -cd /etc/netdata/ # Replace with your Netdata configuration directory, if not /etc/netdata/ -./edit-config health.d/cpu.conf -``` - -> You may need to use `sudo` or another method of elevating your privileges. - -The beginning of the file looks like this: - -```yaml -template: 10min_cpu_usage - on: system.cpu - os: linux - hosts: * - lookup: average -10m unaligned of user,system,softirq,irq,guest - units: % - every: 1m - warn: $this > (($status >= $WARNING) ? (75) : (85)) - crit: $this > (($status == $CRITICAL) ? (85) : (95)) - delay: down 15m multiplier 1.5 max 1h - info: average cpu utilization for the last 10 minutes (excluding iowait, nice and steal) - to: sysadmin -``` - -To silence this alarm, change `sysadmin` to `silent`. - -```yaml - to: silent -``` - -Use one of the available [methods](https://github.com/netdata/netdata/blob/master/health/QUICKSTART.md#reload-health-configuration) to reload your health configuration - and ensure you get no more notifications about that alarm**. - -You can add `to: silent` to any alarm you'd rather not bother you with notifications. - -## What's next? 
- -You should now know the fundamentals behind silencing any individual alarm in Netdata. - -To learn about _all_ of Netdata's health configuration possibilities, visit the [health reference -guide](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md), or check out other [tutorials on health monitoring](https://github.com/netdata/netdata/blob/master/health/README.md#guides). - -Or, take better control over how you get notified about alarms via the [notification -system](https://github.com/netdata/netdata/blob/master/health/notifications/README.md). - -You can also use Netdata's [Health Management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md#health-management-api) to control health -checks and notifications while Netdata runs. With this API, you can disable health checks during a maintenance window or -backup process, for example. - - diff --git a/docs/guides/monitor/visualize-monitor-anomalies.md b/docs/guides/monitor/visualize-monitor-anomalies.md deleted file mode 100644 index 90ce20a4b..000000000 --- a/docs/guides/monitor/visualize-monitor-anomalies.md +++ /dev/null @@ -1,142 +0,0 @@ ---- -title: "Monitor and visualize anomalies with Netdata (part 2)" -description: "Using unsupervised anomaly detection and machine learning, get notified " -image: /img/seo/guides/monitor/visualize-monitor-anomalies.png -author: "Joel Hans" -author_title: "Editorial Director, Technical & Educational Resources" -author_img: "/img/authors/joel-hans.jpg" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/visualize-monitor-anomalies.md ---- - -Welcome to part 2 of our series of guides on using _unsupervised anomaly detection_ to detect issues with your systems, -containers, and applications using the open-source Netdata Agent. 
For an introduction to detecting anomalies and -monitoring associated metrics, see [part 1](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection-python.md), which covers prerequisites and -configuration basics. - -With anomaly detection in the Netdata Agent set up, you will now want to visualize and monitor which charts have -anomalous data, when, and where to look next. - -> 💡 In certain cases, the anomalies collector doesn't start immediately after restarting the Netdata Agent. If this -> happens, you won't see the dashboard section or the relevant [charts](#visualize-anomalies-in-charts) right away. Wait -> a minute or two, refresh, and look again. If the anomalies charts and alarms are still not present, investigate the -> error log with `less /var/log/netdata/error.log | grep anomalies`. - -## Test anomaly detection - -Time to see the Netdata Agent's unsupervised anomaly detection in action. To trigger anomalies on the Nginx web server, -use `ab`, otherwise known as [Apache Bench](https://httpd.apache.org/docs/2.4/programs/ab.html). Despite its name, it -works just as well with Nginx web servers. Install it on Ubuntu/Debian systems with `sudo apt install apache2-utils`. - -> 💡 If you haven't followed the guide's example of using Nginx, an easy way to test anomaly detection on your node is -> to use the `stress-ng` command, which is available on most Linux distributions. Run `stress-ng --cpu 0` to create CPU -> stress or `stress-ng --vm 0` for RAM stress. Each test will cause some "collateral damage," in that you may see CPU -> utilization rise when running the RAM test, and vice versa. - -The following test creates a minimum of 10,000,000 requests for Nginx to handle, with a maximum of 10 at any given time, -with a run time of 60 seconds. If your system can handle those 10,000,000 in less than 60 seconds, `ab` will keep -sending requests until the timer runs out. 
- -```bash -ab -k -c 10 -t 60 -n 10000000 http://127.0.0.1/ -``` - -Let's see how Netdata detects this anomalous behavior and propagates information to you through preconfigured alarms and -dashboards that automatically organize anomaly detection metrics into meaningful charts to help you begin root cause -analysis (RCA). - -## Monitor anomalies with alarms - -The anomalies collector creates two "classes" of alarms for each chart captured by the `charts_regex` setting. All these -alarms are preconfigured based on your [configuration in -`anomalies.conf`](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection-python.md#configure-the-anomalies-collector). With the `charts_regex` -and `charts_to_exclude` settings from [part 1](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection-python.md) of this guide series, the -Netdata Agent creates 32 alarms driven by unsupervised anomaly detection. - -The first class triggers warning alarms when the average anomaly probability for a given chart has stayed above 50% for -at least the last two minutes. - -![An example anomaly probability -alarm](https://user-images.githubusercontent.com/1153921/104225767-0a0a9480-5404-11eb-9bfd-e29592397203.png) - -The second class triggers warning alarms when the number of anomalies in the last two minutes hits 10 or higher. - -![An example anomaly count -alarm](https://user-images.githubusercontent.com/1153921/104225769-0aa32b00-5404-11eb-95f3-7309f9429fe1.png) - -If you see either of these alarms in Netdata Cloud, the local Agent dashboard, or on your preferred notification -platform, it's a safe bet that the node's current metrics have deviated from normal. That doesn't necessarily mean -there's a full-blown incident, depending on what application/service you're using anomaly detection on, but it's worth -further investigation. 
- -As you use the anomalies collector, you may find that the default settings provide too many or too few genuine alarms. -In this case, [configure the alarm](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) with `sudo ./edit-config -health.d/anomalies.conf`. Take a look at the `lookup` line syntax in the [health -reference](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alarm-line-lookup) to understand how the anomalies collector automatically creates -alarms for any dimension on the `anomalies_local.probability` and `anomalies_local.anomaly` charts. - -## Visualize anomalies in charts - -In either [Netdata Cloud](https://app.netdata.cloud) or the local Agent dashboard at `http://NODE:19999`, click on the -**Anomalies** [section](https://github.com/netdata/netdata/blob/master/web/gui/README.md#sections) to see the pair of anomaly detection charts, which are -preconfigured to visualize per-second anomaly metrics based on your [configuration in -`anomalies.conf`](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection-python.md#configure-the-anomalies-collector). - -These charts have the contexts `anomalies.probability` and `anomalies.anomaly`. Together, these charts -create meaningful visualizations for immediately recognizing not only that something is going wrong on your node, but -give context as to where to look next. - -The `anomalies_local.probability` chart shows the probability that the latest observed data is anomalous, based on the -trained model. The `anomalies_local.anomaly` chart visualizes 0→1 predictions based on whether the latest observed -data is anomalous based on the trained model. Both charts share the same dimensions, which you configured via -`charts_regex` and `charts_to_exclude` in [part 1](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection-python.md). 
- -In other words, the `probability` chart shows the amplitude of the anomaly, whereas the `anomaly` chart provides quick -yes/no context. - -![Two charts created by the anomalies -collector](https://user-images.githubusercontent.com/1153921/104226380-ef84eb00-5404-11eb-9faf-9e64c43b95ff.png) - -Before `08:32:00`, both charts show little in the way of verified anomalies. Based on the metrics the anomalies -collector has trained on, a certain percentage of anomaly probability score is normal, as seen in the -`web_log_nginx_requests_prob` dimension and a few others. What you're looking for is large deviations from the "noise" -in the `anomalies.probability` chart, or any increments to the `anomalies.anomaly` chart. - -Unsurprisingly, the stress test that began at `08:32:00` caused significant changes to these charts. The three -dimensions that immediately shot to 100% anomaly probability, and remained there during the test, were -`web_log_nginx.requests_prob`, `nginx_local.connections_accepted_handled_prob`, and `system.cpu_pressure_prob`. - -## Build an anomaly detection dashboard - -[Netdata Cloud](https://app.netdata.cloud) features a drag-and-drop [dashboard -editor](https://github.com/netdata/netdata/blob/master/docs/visualize/create-dashboards.md) that helps you create entirely new dashboards with charts targeted for -your specific applications. 
- -For example, here's a dashboard designed for visualizing anomalies present in an Nginx web server, including -documentation about why the dashboard exists and where to look next based on what you're seeing: - -![An example anomaly detection -dashboard](https://user-images.githubusercontent.com/1153921/104226915-c6188f00-5405-11eb-9bb4-559a18016fa7.png) - -Use the anomaly charts for instant visual identification of potential anomalies, and then Nginx-specific charts, in the -right column, to validate whether the probability and anomaly counters are showing a valid incident worth further -investigation using [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) to narrow -the dashboard into only the charts relevant to what you're seeing from the anomalies collector. - -## What's next? - -Between this guide and [part 1](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/anomaly-detection-python.md), which covered setup and configuration, you -now have a fundamental understanding of how unsupervised anomaly detection in Netdata works, from root cause to alarms -to preconfigured or custom dashboards. - -We'd love to hear your feedback on the anomalies collector. Hop over to the [community -forum](https://community.netdata.cloud/t/anomalies-collector-feedback-megathread/767), and let us know if you're already getting value from -unsupervised anomaly detection, or would like to see something added to it. You might even post a custom configuration -that works well for monitoring some other popular application, like MySQL, PostgreSQL, Redis, or anything else we -[support through collectors](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md). 
- -### Related reference documentation - -- [Netdata Agent · Anomalies collector](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/anomalies/README.md) -- [Netdata Cloud · Build new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) - - diff --git a/docs/guides/python-collector.md b/docs/guides/python-collector.md index e0e7a6041..f77699495 100644 --- a/docs/guides/python-collector.md +++ b/docs/guides/python-collector.md @@ -1,35 +1,57 @@ -<!-- -title: "Develop a custom data collector in Python" -description: "Learn how write a custom data collector in Python, which you'll use to collect metrics from and monitor any application that isn't supported out of the box." -image: /img/seo/guides/python-collector.png -author: "Panagiotis Papaioannou" -author_title: "University of Patras" -author_img: "/img/authors/panagiotis-papaioannou.jpg" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/python-collector.md ---> - # Develop a custom data collector in Python -The Netdata Agent uses [data collectors](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md) to fetch metrics from hundreds of system, -container, and service endpoints. While the Netdata team and community has built [powerful -collectors](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) for most system, container, and service/application endpoints, there are plenty -of custom applications that can't be monitored by default. - -## Problem - -You have a custom application or infrastructure that you need to monitor, but no open-source monitoring tool offers a -prebuilt method for collecting your required metric data. - -## Solution +The Netdata Agent uses [data collectors](https://github.com/netdata/netdata/blob/master/collectors/README.md) to +fetch metrics from hundreds of system, container, and service endpoints. 
While the Netdata team and community have built +[powerful collectors](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) for most system, container, +and service/application endpoints, some custom applications can't be monitored by default. In this tutorial, you'll learn how to leverage the [Python programming language](https://www.python.org/) to build a custom data collector for the Netdata Agent. Follow along with your own dataset, using the techniques and best practices covered here, or use the included examples for collecting and organizing either random or weather data. +## Disclaimer + +If you're comfortable with Golang, consider instead writing a module for the [go.d.plugin](https://github.com/netdata/go.d.plugin). +Golang is more performant, easier to maintain, and simpler for users since it doesn't require a particular runtime on the node to +execute. Python plugins require Python on the machine to be executed. Netdata uses Go as the platform of choice for +production-grade collectors. + +We generally do not accept contributions of Python modules to the GitHub project netdata/netdata. If you write a Python collector and +want to make it available for other users, you should create the pull request in https://github.com/netdata/community. + ## What you need to get started -- A physical or virtual Linux system, which we'll call a _node_. -- A working installation of the free and open-source [Netdata](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) monitoring agent. + - A physical or virtual Linux system, which we'll call a _node_. + - A working [installation](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) of the Netdata monitoring agent. + +### Quick start + +For a quick start, you can look at the +[example plugin](https://raw.githubusercontent.com/netdata/netdata/master/collectors/python.d.plugin/example/example.chart.py).
+ +**Note**: If you are working 'locally' on a new collector and would like to run it in an already installed and running +Netdata (as opposed to having to install Netdata from source again with your new changes) you can copy over the relevant +file to where Netdata expects it and then either `sudo systemctl restart netdata` to have it be picked up and used by +Netdata or you can just run the updated collector in debug mode by following a process like below (this assumes you have +[installed Netdata from a GitHub fork](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/manual.md) you +have made to do your development on). + +```bash +# clone your fork (done once at the start but shown here for clarity) +#git clone --branch my-example-collector https://github.com/mygithubusername/netdata.git --depth=100 --recursive +# go into your netdata source folder +cd netdata +# git pull your latest changes (assuming you built from a fork you are using to develop on) +git pull +# instead of running the installer we can just copy over the updated collector files +#sudo ./netdata-installer.sh --dont-wait +# copy over the file you have updated locally (pretending we are working on the 'example' collector) +sudo cp collectors/python.d.plugin/example/example.chart.py /usr/libexec/netdata/python.d/ +# become user netdata +sudo su -s /bin/bash netdata +# run your updated collector in debug mode to see if it works without having to reinstall netdata +/usr/libexec/netdata/plugins.d/python.d.plugin example debug trace nolock +``` ## Jobs and elements of a Python collector @@ -50,6 +72,11 @@ The basic elements of a Netdata collector are: - `data{}`: A dictionary containing the values to be displayed. - `get_data()`: The basic function of the plugin which will return to Netdata the correct values. +**Note**: All names are better explained in the +[External Plugins Documentation](https://github.com/netdata/netdata/blob/master/collectors/plugins.d/README.md). 
+Parameters like `priority` and `update_every` mentioned in that documentation are handled by the `python.d.plugin`, +not by each collection module. + Let's walk through these jobs and elements as independent elements first, then apply them to example Python code. ### Determine how to gather metrics data @@ -135,11 +162,18 @@ correct values. ## Framework classes -The `python.d` plugin has a number of framework classes that can be used to speed up the development of your python -collector. Your class can inherit one of these framework classes, which have preconfigured methods. +Every module needs to implement its own `Service` class. This class should inherit from one of the framework classes: + +- `SimpleService` +- `UrlService` +- `SocketService` +- `LogService` +- `ExecutableService` -For example, the snippet below is from the [RabbitMQ -collector](https://github.com/netdata/netdata/blob/91f3268e9615edd393bd43de4ad8068111024cc9/collectors/python.d.plugin/rabbitmq/rabbitmq.chart.py#L273). +Also it needs to invoke the parent class constructor in a specific way as well as assign global variables to class variables. + +For example, the snippet below is from the +[RabbitMQ collector](https://github.com/netdata/netdata/blob/91f3268e9615edd393bd43de4ad8068111024cc9/collectors/python.d.plugin/rabbitmq/rabbitmq.chart.py#L273). This collector uses an HTTP endpoint and uses the `UrlService` framework class, which only needs to define an HTTP endpoint for data collection. @@ -166,8 +200,7 @@ class Service(UrlService): In our use-case, we use the `SimpleService` framework, since there is no framework class that suits our needs. -You can read more about the [framework classes](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md#how-to-write-a-new-module) from -the Netdata documentation. +You can find below the [framework class reference](#framework-class-reference). 
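As a rough illustration of the pattern described above (inherit from a framework class, invoke the parent constructor, and assign the module-level globals to instance attributes), here is a minimal sketch. It makes several assumptions: the `SimpleService` stub below is a stand-in so the snippet runs outside the Agent, and the chart name, dimension name, and fixed reading are hypothetical values chosen for the example. It is not the code of any shipped collector.

```python
# Stand-in for Netdata's SimpleService so this sketch runs outside the Agent.
# Inside a real python.d module you would instead write:
#   from bases.FrameworkServices.SimpleService import SimpleService
class SimpleService:
    def __init__(self, configuration=None, name=None):
        self.configuration = configuration or {}
        self.name = name


# Hypothetical chart layout, for illustration only.
ORDER = ['temperature']
CHARTS = {
    'temperature': {
        # name, title, units, family, context, chart type
        'options': [None, 'Temperature', 'Celsius', 'temperature',
                    'example.temperature', 'line'],
        'lines': [
            ['current_temperature'],
        ],
    }
}


class Service(SimpleService):
    def __init__(self, configuration=None, name=None):
        # Invoke the parent constructor with keyword arguments...
        SimpleService.__init__(self, configuration=configuration, name=name)
        # ...then assign the module-level globals to instance attributes.
        self.order = ORDER
        self.definitions = CHARTS

    def get_data(self):
        # Return {dimension: value}; a fixed value stands in for a real reading.
        return {'current_temperature': 21}


# Exercise the class directly; the Agent would normally do this for you.
service = Service(configuration={}, name='sketch')
print(service.get_data())
```

Keeping `ORDER` and `CHARTS` at module level and copying them in the constructor mirrors the RabbitMQ snippet shown earlier; the stub exists only so the example can be run without the `python.d.plugin` runtime.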
## An example collector using weather station data @@ -196,6 +229,35 @@ CHARTS = { ## Parse the data to extract or create the actual data to be represented +Every collector must implement `_get_data`. This method should grab raw data from `_get_raw_data`, +parse it, and return a dictionary where keys are unique dimension names, or `None` if no data is collected. + +For example: +```py +def _get_data(self): + try: + raw = self._get_raw_data().split(" ") + return {'active': int(raw[2])} + except (ValueError, AttributeError): + return None +``` + +In our weather data collector we declare `get_data` as follows: + +```python + def get_data(self): + # The data dict is basically all the values to be represented + # The entries are in the format: { "dimension": value} + # And each "dimension" should belong to a chart. + data = dict() + + self.populate_data() + + data['current_temperature'] = self.weather_data["temp"] + + return data +``` + A standard practice would be to either get the data in JSON format or transform it to JSON format. We use a dictionary to give this format and issue random values to simulate received data. @@ -461,26 +523,104 @@ variables and inform the user about the defaults. For example, take a look at th You can read more about the configuration file on the [`python.d.plugin` documentation](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md). -## What's next? +You can find the source code for the above examples on [GitHub](https://github.com/papajohn-uop/netdata). + +## Pull Request Checklist for Python Plugins + +Pull requests should be created in https://github.com/netdata/community. + +This is a generic checklist for submitting a new Python plugin for Netdata. It is by no means comprehensive.
+ +At minimum, to be buildable and testable, the PR needs to include: + +- The module itself, following proper naming conventions: `collectors/python.d.plugin/<module_dir>/<module_name>.chart.py` +- A README.md file for the plugin under `collectors/python.d.plugin/<module_dir>`. +- The configuration file for the module: `collectors/python.d.plugin/<module_dir>/<module_name>.conf`. Python config files are in YAML format, and should include comments describing what options are present. The instructions are also needed in the configuration section of the README.md +- A basic configuration for the plugin in the appropriate global config file: `collectors/python.d.plugin/python.d.conf`, which is also in YAML format. Either add a line that reads `# <module_name>: yes` if the module is to be enabled by default, or one that reads `<module_name>: no` if it is to be disabled by default. +- A makefile for the plugin at `collectors/python.d.plugin/<module_dir>/Makefile.inc`. Check an existing plugin for what this should look like. +- A line in `collectors/python.d.plugin/Makefile.am` including the above-mentioned makefile. Place it with the other plugin includes (please keep the includes sorted alphabetically). +- Optionally, chart information in `web/gui/dashboard_info.js`. This generally involves specifying a name and icon for the section, and may include descriptions for the section or individual charts. +- Optionally, some default alarm configurations for your collector in `health/health.d/<module_name>.conf` and a line adding `<module_name>.conf` in `health/Makefile.am`. + +## Framework class reference + +Every framework class has some user-configurable variables which are specific to this particular class. Those variables should have default values initialized in the child class constructor. + +If module needs some additional user-configurable variable, it can be accessed from the `self.configuration` list and assigned in constructor or custom `check` method. 
Example: + +```py +def __init__(self, configuration=None, name=None): + UrlService.__init__(self, configuration=configuration, name=name) + try: + self.baseurl = str(self.configuration['baseurl']) + except (KeyError, TypeError): + self.baseurl = "http://localhost:5001" +``` + +Classes implement `_get_raw_data`, which should be used to grab raw data. This method usually returns a list of strings. + +### `SimpleService` + +This is the last-resort class: if a new module cannot be written using another framework class, this one can be used. + +Examples: `ceph`, `sensors` + +It is the lowest-level class, which implements most of the module logic, such as: + +- threading +- handling run times +- chart formatting +- logging +- chart creation and updating + +### `LogService` + +Examples: `apache_cache`, `nginx_log` + +Variable from config file: `log_path`. + +An object created from this class reads new lines from the file specified in the `log_path` variable. It checks that the file exists and is readable. `_get_raw_data` returns a list of strings, where each string is one line from the file specified in `log_path`. + +### `ExecutableService` + +Examples: `exim`, `postfix` + +Variable from config file: `command`. + +This class executes a shell command in a secure way. It checks for invalid characters in the `command` variable and won't proceed if it contains any of: + +- '&' +- '|' +- ';' +- '>' +- '\<' + +For additional security, it uses Python's `subprocess.Popen` (without the `shell=True` option) to execute the command. The command can be specified with an absolute or relative name. When using a relative name, it will try to find `command` in the `PATH` environment variable as well as in `/sbin` and `/usr/sbin`. + +`_get_raw_data` returns a list of decoded lines returned by `command`. + +### UrlService + +Examples: `apache`, `nginx`, `tomcat` + +Variables from config file: `url`, `user`, `pass`. + +If data is grabbed by accessing a service via the HTTP protocol, this class can be used.
It can handle HTTP Basic Auth when specified with `user` and `pass` credentials. + +Please note that the config file can use different variables according to the specification of each module. + +`_get_raw_data` returns list of utf-8 decoded strings (lines). + +### SocketService + +Examples: `dovecot`, `redis` -Find the source code for the above examples on [GitHub](https://github.com/papajohn-uop/netdata). +Variables from config file: `unix_socket`, `host`, `port`, `request`. -Now you are ready to start developing our Netdata python Collector and share it with the rest of the Netdata community. +Object will try execute `request` using either `unix_socket` or TCP/IP socket with combination of `host` and `port`. This can access unix sockets with SOCK_STREAM or SOCK_DGRAM protocols and TCP/IP sockets in version 4 and 6 with SOCK_STREAM setting. -- If you need help while developing your collector, join our [Netdata - Community](https://community.netdata.cloud/c/agent-development/9) to chat about it. -- Follow the - [checklist](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md#pull-request-checklist-for-python-plugins) - to contribute the collector to the Netdata Agent [repository](https://github.com/netdata/netdata). -- Check out the [example](https://github.com/netdata/netdata/tree/master/collectors/python.d.plugin/example) Python - collector, which is a minimal example collector you could also use as a starting point. Once comfortable with that, - then browse other [existing collectors](https://github.com/netdata/netdata/tree/master/collectors/python.d.plugin) - that might have similarities to what you want to do. -- If you're developing a proof of concept (PoC), consider migrating the collector in Golang - ([go.d.plugin](https://github.com/netdata/go.d.plugin)) once you validate its value in production. 
Golang is more - performant, easier to maintain, and simpler for users since it doesn't require a particular runtime on the node to - execute (Python plugins require Python on the machine to be executed). Netdata uses Go as the platform of choice for - production-grade collectors. -- Celebrate! You have contributed to an open-source project with hundreds of thousands of users! +Sockets are accessed in non-blocking mode with 15 second timeout. +After every execution of `_get_raw_data` socket is closed, to prevent this module needs to set `_keep_alive` variable to `True` and implement custom `_check_raw_data` method. +`_check_raw_data` should take raw data and return `True` if all data is received otherwise it should return `False`. Also it should do it in fast and efficient way. diff --git a/docs/guides/step-by-step/step-00.md b/docs/guides/step-by-step/step-00.md deleted file mode 100644 index 2f83ee9b4..000000000 --- a/docs/guides/step-by-step/step-00.md +++ /dev/null @@ -1,120 +0,0 @@ -<!-- -title: "The step-by-step Netdata guide" -date: 2020-03-31 -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-00.md ---> -import { OneLineInstallWget, OneLineInstallCurl } from '@site/src/components/OneLineInstall/' - -# The step-by-step Netdata guide - -Welcome to Netdata! We're glad you're interested in our health monitoring and performance troubleshooting system. - -Because Netdata is entirely open-source software, you can use it free of charge, whether you want to monitor one or ten -thousand systems! All our code is hosted on [GitHub](https://github.com/netdata/netdata). - -This guide is designed to help you understand what Netdata is, what it's capable of, and how it'll help you make -faster and more informed decisions about the health and performance of your systems and applications. 
If you're -completely new to Netdata, or have never tried health monitoring/performance troubleshooting systems before, this -guide is perfect for you. - -If you have monitoring experience, or would rather get straight into configuring Netdata to your needs, you can jump -straight into code and configurations with our [getting started guide](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx). - -> This guide contains instructions for Netdata installed on a Linux system. Many of the instructions will work on -> other supported operating systems, like FreeBSD and macOS, but we can't make any guarantees. - -## Where to go if you need help - -No matter where you are in this Netdata guide, if you need help, head over to our [GitHub -repository](https://github.com/netdata/netdata/). That's where we collect questions from users, help fix their bugs, and -point people toward documentation that explains what they're having trouble with. - -Click on the **issues** tab to see all the conversations we're having with Netdata users. Use the search bar to find -previously-written advice for your specific problem, and if you don't see any results, hit the **New issue** button to -send us a question. - - -## Before we get started - -Let's make sure you have Netdata installed on your system! - -> If you already installed Netdata, feel free to skip to [Step 1: Netdata's building blocks](step-01.md). - -The easiest way to install Netdata on a Linux system is our `kickstart.sh` one-line installer. Run this on your system -and let it take care of the rest. - -This script will install Netdata from source, keep it up to date with nightly releases, connects to the Netdata -[registry](https://github.com/netdata/netdata/blob/master/registry/README.md), and sends [_anonymous statistics_](https://github.com/netdata/netdata/blob/master/docs/anonymous-statistics.md) about how you use -Netdata. 
We use this information to better understand how we can improve the Netdata experience for all our users. - -To install Netdata, run the following as your normal user: - -<OneLineInstallWget/> - -Or, if you have cURL but not wget (such as on macOS): - -<OneLineInstallCurl/> - - -Once finished, you'll have Netdata installed, and you'll be set up to get _nightly updates_ to get the latest features, -improvements, and bugfixes. - -If this method doesn't work for you, or you want to use a different process, visit our [installation -documentation](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) for details. - -## Netdata fundamentals - -[Step 1. Netdata's building blocks](step-01.md) - -In this introductory step, we'll talk about the fundamental ideas, philosophies, and UX decisions behind Netdata. - -[Step 2. Get to know Netdata's dashboard](step-02.md) - -Visit Netdata's dashboard to explore, manipulate charts, and check out alarms. Get your first taste of visual anomaly -detection. - -[Step 3. Monitor more than one system with Netdata](step-03.md) - -While the dashboard lets you quickly move from one agent to another, Netdata Cloud is our SaaS solution for monitoring -the health of many systems. We'll cover its features and the benefits of using Netdata Cloud on top of the dashboard. - -[Step 4. The basics of configuring Netdata](step-04.md) - -While Netdata can monitor thousands of metrics in real-time without any configuration, you may _want_ to tweak some -settings based on your system's resources. - -## Intermediate steps - -[Step 5. Health monitoring alarms and notifications](step-05.md) - -Learn how to tune, silence, and write custom alarms. Then enable notifications so you never miss a change in health -status or performance anomaly. - -[Step 6. 
Collect metrics from more services and apps](step-06.md) - -Learn how to enable/disable collection plugins and configure a collection plugin job to add more charts to your Netdata -dashboard and begin monitoring more apps and services, like MySQL, Nginx, MongoDB, and hundreds more. - -[Step 7. Netdata's dashboard in depth](step-07.md) - -Now that you've configured your Netdata monitoring agent to your exact needs, you'll dive back into metrics snapshots, -updates, and the dashboard's settings. - -## Advanced steps - -[Step 8. Building your first custom dashboard](step-08.md) - -Using simple HTML, CSS, and JavaScript, we'll build a custom dashboard that displays essential information in any format -you choose. You can even monitor many systems from a single HTML file. - -[Step 9. Long-term metrics storage](step-09.md) - -By default, Netdata can store lots of real-time metrics, but you can also tweak our custom database engine to your -heart's content. Want to take your Netdata metrics elsewhere? We're happy to help you archive data to Prometheus, -MongoDB, TimescaleDB, and others. - -[Step 10. Set up a proxy](step-10.md) - -Run Netdata behind an Nginx proxy to improve performance, and enable TLS/HTTPS for better security. - - diff --git a/docs/guides/step-by-step/step-01.md b/docs/guides/step-by-step/step-01.md deleted file mode 100644 index e60bb0769..000000000 --- a/docs/guides/step-by-step/step-01.md +++ /dev/null @@ -1,156 +0,0 @@ -<!-- -title: "Step 1. Netdata's building blocks" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-01.md ---> - -# Step 1. Netdata's building blocks - -Netdata is a distributed and real-time _health monitoring and performance troubleshooting toolkit_ for monitoring your -systems and applications. - -Because the monitoring agent is highly optimized, you can install it on all your physical systems, containers, IoT devices, -and edge devices without disrupting their core function.
- -By default, and without configuration, Netdata delivers real-time insights into everything happening on the system, from -CPU utilization to packet loss on every network device. Netdata can also auto-detect metrics from hundreds of your -favorite services and applications, like MySQL/MariaDB, Docker, Nginx, Apache, MongoDB, and more. - -All metrics are automatically-updated, providing interactive dashboards that allow you to dive in, discover anomalies, -and figure out the root cause analysis of any issue. - -Best of all, Netdata is entirely free, open-source software! Solo developers and enterprises with thousands of systems -can both use it free of charge. We're hosted on [GitHub](https://github.com/netdata/netdata). - -Want to learn about the history of Netdata, and what inspired our CEO to build it in the first place, and where we're -headed? Read Costa's comprehensive blog post: _[Redefining monitoring with Netdata (and how it came to -be)](https://blog.netdata.cloud/posts/redefining-monitoring-netdata/)_. - -## What you'll learn in this step - -In the first step of the Netdata guide, you'll learn about: - -- [Netdata's core features](#netdatas-core-features) -- [Why you should use Netdata](#why-you-should-use-netdata) -- [How Netdata has complementary systems, not competitors](#how-netdata-has-complementary-systems-not-competitors) - -Let's get started! - -## Netdata's core features - -Netdata has only been around for a few years, but it's a complex piece of software. Here are just some of the features -we'll cover throughout this guide. - -- A sophisticated **dashboard**, which we'll cover in [step 2](step-02.md). The real-time, highly-granular dashboard, - with hundreds of charts, is your main source of information about the health and performance of your systems/ - applications. We designed the dashboard with anomaly detection and quick analysis in mind. We'll return to - dashboard-related topics in both [step 7](step-07.md) and [step 8](step-08.md). 
-- **Long-term metrics storage** by default. With our new database engine, you can store days, weeks, or months of - per-second historical metrics. Or you can archive metrics to another database, like MongoDB or Prometheus. We'll - cover all these options in [step 9](step-09.md). -- **No configuration necessary**. Without any configuration, you'll get thousands of real-time metrics and hundreds of - alarms designed by our community of sysadmin experts. But you _can_ configure Netdata in a lot of ways, some of - which we'll cover in [step 4](step-04.md). -- **Distributed, per-system installation**. Instead of centralizing metrics in one location, you install Netdata on - _every_ system, and each system is responsible for its metrics. Having distributed agents reduces cost and lets - Netdata run on devices with little available resources, such as IoT and edge devices, without affecting their core - purpose. -- **Sophisticated health monitoring** to ensure you always know when an anomaly hits. In [step 5](step-05.md), we dive - into how you can tune alarms, write your own alarm, and enable two types of notifications. -- **High-speed, low-resource collectors** that allow you to collect thousands of metrics every second while using only - a fraction of your system's CPU resources and a few MiB of RAM. -- **Netdata Cloud** is our SaaS toolkit that helps Netdata users monitor the health and performance of entire - infrastructures, whether they are two or two thousand (or more!) systems. We'll cover Netdata Cloud in [step - 3](step-03.md). - -## Why you should use Netdata - -Because you care about the health and performance of your systems and applications, and all of the awesome features we -just mentioned. And it's free! - -All these may be valid reasons, but let's step back and talk about Netdata's _principles_ for health monitoring and -performance troubleshooting. 
We have a lot of [complementary -systems](#how-netdata-has-complementary-systems-not-competitors), and we think there's a good reason why Netdata should -always be your first choice when troubleshooting an anomaly. - -We built Netdata on four principles. - -### Per-second data collection - -Our first principle is per-second data collection for all metrics. - -That matters because you can't monitor a 2-second service-level agreement (SLA) with 10-second metrics. You can't detect -quick anomalies if your metrics don't show them. - -How do we solve this? By decentralizing monitoring. Each node is responsible for collecting metrics, triggering alarms, -and building dashboards locally, and we work hard to ensure it does each step (and others) with remarkable efficiency. -For example, Netdata can [collect 100,000 metrics](https://github.com/netdata/netdata/issues/1323) every second while -using only 9% of a single server-grade CPU core! - -By decentralizing monitoring and emphasizing speed at every turn, Netdata helps you scale your health monitoring and -performance troubleshooting to an infrastructure of every size. _And_ you get to keep per-second metrics in long-term -storage thanks to the database engine. - -### Unlimited metrics - -We believe all metrics are fundamentally important, and all metrics should be available to the user. - -If you don't collect _all_ the metrics a system creates, you're only seeing part of the story. It's like saying you've -read a book after skipping all but the last ten pages. You only know the ending, not everything that leads to it. - -Most monitoring solutions exist to poke you when there's a problem, and then tell you to use a dozen different console -tools to find the root cause. Netdata prefers to give you every piece of information you might need to understand why an -anomaly happened. 
- -### Meaningful presentation - -We want every piece of Netdata's dashboard not only to look good and update every second, but also provide context as to -what you're looking at and why it matters. - -The principle of meaningful presentation is fundamental to our dashboard's user experience (UX). We could have put -charts in a grid or hidden some behind tabs or buttons. We instead chose to stack them vertically, on a single page, so -you can visually see how, for example, a jump in disk usage can also increase system load. - -Here's an example of a system undergoing a disk stress test: - -![Screen Shot 2019-10-23 at 15 38 -32](https://user-images.githubusercontent.com/1153921/67439589-7f920700-f5ab-11e9-930d-fb0014900d90.png) - -> For the curious, here's the command: `stress-ng --fallocate 4 --fallocate-bytes 4g --timeout 1m --metrics --verify -> --times`! - -### Immediate results - -Finally, Netdata should be usable from the moment you install it. - -As we've talked about, and as you'll learn in the following nine steps, Netdata comes installed with: - -- Auto-detected metrics -- Human-readable units -- Metrics that are structured into charts, families, and contexts -- Automatically generated dashboards -- Charts designed for visual anomaly detection -- Hundreds of pre-configured alarms - -By standardizing your monitoring infrastructure, Netdata tries to make at least one part of your administrative tasks -easy! - -## How Netdata has complementary systems, not competitors - -We'll cover this quickly, as you're probably eager to get on with using Netdata itself. - -We don't want to lock you in to using Netdata by itself, and forever. By supporting [archiving to -external databases](https://github.com/netdata/netdata/blob/master/exporting/README.md) like Graphite, Prometheus, OpenTSDB, MongoDB, and others, you can use Netdata _in -conjunction_ with software that might seem like our competitors. 
- -We don't want to "wage war" with another monitoring solution, whether it's commercial, open-source, or anything in -between. We just want to give you all the metrics every second, and what you do with them next is your business, not -ours. Our mission is helping people create more extraordinary infrastructures! - -## What's next? - -We think it's imperative you understand why we built Netdata the way we did. But now that we have that behind us, let's -get right into that dashboard you've heard so much about. - -[Next: Get to know Netdata's dashboard →](step-02.md) - - diff --git a/docs/guides/step-by-step/step-02.md b/docs/guides/step-by-step/step-02.md deleted file mode 100644 index 535f3cfa3..000000000 --- a/docs/guides/step-by-step/step-02.md +++ /dev/null @@ -1,208 +0,0 @@ -<!-- -title: "Step 2. Get to know Netdata's dashboard" -date: 2020-05-04 -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-02.md ---> - -# Step 2. Get to know Netdata's dashboard - -Welcome to Netdata proper! Now that you understand how Netdata works, how it's built, and why we built it, you can start -working with the dashboard directly. - -This step-by-step guide assumes you've already installed Netdata on a system of yours. If you haven't yet, hop back over -to ["step 0"](step-00.md#before-we-get-started) for information about our one-line installer script. Or, view the -[installation docs](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) to learn more. Once you have Netdata installed, you can hop back -over here and dig in. 
- -## What you'll learn in this step - -In this step of the Netdata guide, you'll learn how to: - -- [Visit and explore the dashboard](#visit-and-explore-the-dashboard) -- [Explore available charts using menus](#explore-available-charts-using-menus) -- [Read the descriptions accompanying charts](#read-the-descriptions-accompanying-charts) -- [Interact with charts](#interact-with-charts) -- [See raised alarms and the alarm log](#see-raised-alarms-and-the-alarm-log) - -Let's get started! - -## Visit and explore the dashboard - -Netdata's dashboard is where you interact with your system's metrics. Time to open it up and start exploring. - -Open up your web browser of choice and navigate to `http://NODE:19999`, replacing `NODE` with the IP address or hostname -of your Agent. If you're unsure, try `http://localhost:19999` first. Hit **Enter**. Welcome to Netdata! - -![Animated GIF of navigating to the -dashboard](https://user-images.githubusercontent.com/1153921/80825153-abaec600-8b94-11ea-8b17-1b770a2abaa9.gif) - -> From here on out in this guide, we'll refer to the address you use to view your dashboard as `NODE`. Be sure to -> replace it with either `localhost`, the IP address, or the hostname of your system. - -## Explore available charts using menus - -**Menus** are located on the right-hand side of the Netdata dashboard. You can use these to navigate to the -charts you're interested in. - -![Animated GIF of using the menus and -submenus](https://user-images.githubusercontent.com/1153921/80832425-7c528600-8ba1-11ea-8140-d0a17a62009b.gif) - -Netdata shows all its charts on a single page, so you can also scroll up and down using the mouse wheel, your -touchscreen/touchpad, or the scrollbar. - -Both menus and the items displayed beneath them, called **submenus**, are populated automatically by Netdata based on -what it's collecting. 
If you run Netdata on many different systems using different OS types or versions, the -menus and submenus may look a little different for each one. - -To learn more about menus, see our documentation about [navigating the standard -dashboard](https://github.com/netdata/netdata/blob/master/web/gui/README.md#metrics-menus). - -> ❗ By default, Netdata only creates and displays charts if the metrics are _not zero_. So, you may be missing some -> charts, menus, and submenus if those charts have zero metrics. You can change this by changing the **Which dimensions -> to show?** setting to **All**. In addition, if you start Netdata and immediately load the dashboard, not all -> charts/menus/submenus may be displayed, as some collectors can take a while to initialize. - -## Read the descriptions accompanying charts - -Many charts come with a short description of what dimensions the chart is displaying and why they matter. - -For example, here's the description that accompanies the **swap** chart. - -![Screenshot of the swap -description](https://user-images.githubusercontent.com/1153921/63452078-477b1600-c3fa-11e9-836b-2fc90fba8b4b.png) - -If you're new to health monitoring and performance troubleshooting, we recommend you spend some time reading these -descriptions and learning more at the pages linked above. - -## Understand charts, dimensions, families, and contexts - -A **chart** is an interactive visualization of one or more collected/calculated metrics. You can see the name (also -known as its unique ID) of a chart by looking at the top-left corner of a chart and finding the parenthesized text. On a -Linux system, one of the first charts on the dashboard will be the system CPU chart, with the name `system.cpu`: - -![Screenshot of the system CPU chart in the Netdata -dashboard](https://user-images.githubusercontent.com/1153921/67443082-43b16e80-f5b8-11e9-8d33-d6ee052c6678.png) - -A **dimension** is any value that gets shown on a chart. 
The value can be raw data or calculated values, such as -percentages, aggregates, and more. Most charts will have more than one dimension, in which case it will display each in -a different color. Here, a `system.cpu` chart is showing many dimensions, such as `user`, `system`, `softirq`, `irq`, -and more. - -![Screenshot of the dimensions shown in the system CPU chart in the Netdata -dashboard](https://user-images.githubusercontent.com/1153921/62721031-2bba4d80-b9c0-11e9-9dca-32403617ce72.png) - -A **family** is _one_ instance of a monitored hardware or software resource that needs to be monitored and displayed -separately from similar instances. For example, if your system has multiple partitions, Netdata will create different -families for `/`, `/boot`, `/home`, and so on. Same goes for entire disks, network devices, and more. - -![A number of families created for disk partitions](https://user-images.githubusercontent.com/1153921/67896952-a788e980-fb1a-11e9-880b-2dfb3945c8d6.png) - -A **context** groups several charts based on the types of metrics being collected and displayed. For example, the -**Disk** section often has many contexts: `disk.io`, `disk.ops`, `disk.backlog`, `disk.util`, and so on. Netdata uses -this context to create individual charts and then groups them by family. You can always see the context of any chart by -looking at its name or hovering over the chart's date. - -It's important to understand these differences, as Netdata uses charts, dimensions, families, and contexts to create -health alarms and configure collectors. To read even more about the differences between all these elements of the -dashboard, and how they affect other parts of Netdata, read our [dashboards -documentation](https://github.com/netdata/netdata/blob/master/web/README.md#charts-contexts-families). - -## Interact with charts - -We built Netdata to be a big sandbox for learning more about your systems and applications. Time to play! 
- -Netdata's charts are fully interactive. You can pan through historical metrics, zoom in and out, select specific -timeframes for further analysis, resize charts, and more. - -Best of all, whenever you use a chart in this way, Netdata synchronizes all the other charts to match it. - -![Animated GIF of the standard Netdata dashboard being manipulated and synchronizing -charts](https://user-images.githubusercontent.com/1153921/81867875-3d6beb00-9526-11ea-94b8-388951e2e03d.gif) - -### Pan, zoom, highlight, and reset charts - -You can change how charts show their metrics in a few different ways, each of which has a few methods: - -| Change | Method #1 | Method #2 | Method #3 | -| ------------------------------------------------- | ----------------------------------- | --------------------------------------------------------- | ---------------------------------------------------------- | -| **Reset** charts to default auto-refreshing state | `double click` | `double tap` (touchpad/touchscreen) | | -| **Select** a certain timeframe | `ALT` + `mouse selection` | `⌘` + `mouse selection` (macOS) | | -| **Pan** forward or back in time | `click and drag` | `touch and drag` (touchpad/touchscreen) | | -| **Zoom** to a specific timeframe | `SHIFT` + `mouse selection` | | | -| **Zoom** in/out | `SHIFT`/`ALT` + `mouse scrollwheel` | `SHIFT`/`ALT` + `two-finger pinch` (touchpad/touchscreen) | `SHIFT`/`ALT` + `two-finger scroll` (touchpad/touchscreen) | - -These interactions can also be triggered using the icons on the bottom-right corner of every chart. They are, -respectively, `Pan Left`, `Reset`, `Pan Right`, `Zoom In`, and `Zoom Out`. - -### Show and hide dimensions - -Each dimension can be hidden by clicking on it. Hiding dimensions simplifies the chart and can help you better discover -exactly which aspect of your system is behaving strangely. - -### Resize charts - -Additionally, resize charts by clicking-and-dragging the icon on the bottom-right corner of any chart. 
To restore the -chart to its original height, double-click the same icon. - -![Animated GIF of resizing a chart and resetting it to the default -height](https://user-images.githubusercontent.com/1153921/80842459-7d41e280-8bb6-11ea-9488-1bc29f94d7f2.gif) - -To learn more about other options and chart interactivity, read our [dashboard documentation](https://github.com/netdata/netdata/blob/master/web/README.md). - -## See raised alarms and the alarm log - -Aside from performance troubleshooting, the Agent helps you monitor the health of your systems and applications. That's -why every Netdata installation comes with dozens of pre-configured alarms that trigger alerts when your system starts -acting strangely. - -Click the **Alarms** button in the top navigation to bring up a modal that shows currently raised alarms, all running -alarms, and the alarm log. - -Here is an example of a raised `system.cpu` alarm, followed by the full list and alarm log: - -![Animated GIF of looking at raised alarms and the alarm -log](https://user-images.githubusercontent.com/1153921/80842482-8c289500-8bb6-11ea-9791-600cfdbe82ce.gif) - -And a static screenshot of the raised CPU alarm: - -![Screenshot of a raised system CPU alarm](https://user-images.githubusercontent.com/1153921/80842330-2dfbb200-8bb6-11ea-8147-3cd366eb0f37.png) - -The alarm itself is named **system - cpu**, and its context is `system.cpu`. Beneath that is an auto-updating badge that -shows the latest value of the chart that triggered the alarm. - -With the three icons beneath that and the **role** designation, you can: - -1. Scroll to the chart associated with this raised alarm. -2. Copy a link to the badge to your clipboard. -3. Copy the code to embed the badge onto another web page using an `<embed>` element. - -The table on the right-hand side displays information about the alarm's configuration. In the above example, Netdata -triggers a warning alarm when CPU usage is between 75 and 85%, and a critical alarm when above 85%. 
It's a _little_ more -complicated than that, but we'll get into more complex health entity configurations in a later step. - -The `calculation` field is the equation used to calculate those percentages, and the `check every` field specifies how -often Netdata should be calculating these metrics to see if the alarm should remain triggered. - -The `execute` field tells Netdata how to notify you about this alarm, and the `source` field lets you know where you can -find the configuration file, if you'd like to edit its configuration. - -We'll cover alarm configuration in more detail later in the guide, so don't worry about it too much for now! Right -now, it's most important that you understand how to see alarms, and parse their details, if and when they appear on your -system. - -## What's next? - -In this step of the Netdata guide, you learned how to: - -- Visit the dashboard -- Explore available charts (using the right-side menu) -- Read the descriptions accompanying charts -- Interact with charts -- See raised alarms and the alarm log - -Next, you'll learn how to monitor multiple nodes through the dashboard. - -[Next: Monitor more than one system with Netdata →](step-03.md) - - diff --git a/docs/guides/step-by-step/step-03.md b/docs/guides/step-by-step/step-03.md deleted file mode 100644 index 3204765b4..000000000 --- a/docs/guides/step-by-step/step-03.md +++ /dev/null @@ -1,94 +0,0 @@ -<!-- -title: "Step 3. Monitor more than one system with Netdata" -date: 2020-05-01 -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-03.md ---> - -# Step 3. Monitor more than one system with Netdata - -The Netdata agent is _distributed_ by design. That means each agent operates independently from any other, collecting -and creating charts only for the system you installed it on. We made this decision a long time ago to [improve security -and performance](step-01.md). 
- -You might be thinking, "So, now I have to remember all these IP addresses, and type them into my browser -manually, to move from one system to another? Maybe I should just make a bunch of bookmarks. What's a few more tabs -on top of the hundred I have already?" - -We get it. That's why we built [Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx), which connects many distributed -agents for a seamless experience when monitoring an entire infrastructure of Netdata-monitored nodes. - -![Animated GIF of Netdata -Cloud](https://user-images.githubusercontent.com/1153921/80828986-1ebb3b00-8b9b-11ea-957f-2c8d0d009e44.gif) - -## What you'll learn in this step - -In this step of the Netdata guide, we'll talk about the following: - -- [Step 3. Monitor more than one system with Netdata](#step-3-monitor-more-than-one-system-with-netdata) - - [What you'll learn in this step](#what-youll-learn-in-this-step) - - [Why use Netdata Cloud?](#why-use-netdata-cloud) - - [Get started with Netdata Cloud](#get-started-with-netdata-cloud) - - [Navigate between dashboards with Visited Nodes](#navigate-between-dashboards-with-visited-nodes) - - [What's next?](#whats-next) - -## Why use Netdata Cloud? - -Our [Cloud documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx) does a good job (we think!) of explaining why Cloud -gives you a ton of value at no cost: - -> Netdata Cloud gives you real-time visibility for your entire infrastructure. With Netdata Cloud, you can run all your -> distributed Agents in headless mode _and_ access the real-time metrics and insightful charts from their dashboards. -> View key metrics and active alarms at-a-glance, and then seamlessly dive into any of your distributed dashboards -> without leaving Cloud's centralized interface. 
- -You can add as many nodes and team members as you need, and as our free and open source Agent gets better with more -features, new collectors for more applications, and improved UI, so will Cloud. - -## Get started with Netdata Cloud - -Signing in, onboarding, and connecting your first nodes only takes a few minutes, and we have a [Get started with -Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx) guide to help you walk through every step. - -Or, if you're feeling confident, dive right in. - -<p><a href="https://app.netdata.cloud" className="button button--lg">Sign in to Cloud</a></p> - -When you finish that guide, circle back to this step in the guide to learn how to use the Visited Nodes feature on -top of Cloud's centralized web interface. - -## Navigate between dashboards with Visited Nodes - -To add nodes to your visited nodes, you first need to navigate to that node's dashboard, then click the **Sign in** -button at the top of the dashboard. On the screen that appears, which states your node is requesting access to your -Netdata Cloud account, sign in with your preferred method. - -Cloud redirects you back to your node's dashboard, which is now connected to your Netdata Cloud account. You can now see the menu populated by a single visited node. - -![An Agent's dashboard with the Visited nodes -menu](https://user-images.githubusercontent.com/1153921/80830383-b6ba2400-8b9d-11ea-9eb2-379c7eccd22f.png) - -If you previously went through the Cloud onboarding process to create a Space and War Room, you will also see these -alongside your visited nodes. You can click on your Space or any of your War Rooms to navigate to Netdata Cloud and -continue monitoring your infrastructure from there. 
- -![An Agent's dashboard with the Visited nodes menu, plus Spaces and War -Rooms](https://user-images.githubusercontent.com/1153921/80830382-b6218d80-8b9d-11ea-869c-1170b95eeb4a.png) - -To add other visited nodes, navigate to their dashboard and sign in to Cloud by clicking on the **Sign in** button. This -process connects that node to your Cloud account and further populates the menu. - -Once you've added more than one node, you can use the menu to switch between various dashboards without remembering IP -addresses or hostnames or saving bookmarks for every node you want to monitor. - -![Switching between dashboards with Visited -nodes](https://user-images.githubusercontent.com/1153921/80831018-e158ac80-8b9e-11ea-882e-1d82cdc028cd.gif) - -## What's next? - -Now that you have a Netdata Cloud account with a connected node (or a few!) and can navigate between your dashboards with -Visited nodes, it's time to learn more about how you can configure Netdata to your liking. From there, you'll be able to -customize your Netdata experience to your exact infrastructure and the information you need. - -[Next: The basics of configuring Netdata →](step-04.md) - - diff --git a/docs/guides/step-by-step/step-04.md b/docs/guides/step-by-step/step-04.md deleted file mode 100644 index fcd84ce6a..000000000 --- a/docs/guides/step-by-step/step-04.md +++ /dev/null @@ -1,144 +0,0 @@ -<!-- -title: "Step 4. The basics of configuring Netdata" -date: 2020-03-31 -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-04.md ---> - -# Step 4. The basics of configuring Netdata - -Welcome to the fourth step of the Netdata guide. - -Since the beginning, we've covered the building blocks of Netdata, dashboard basics, and how you can monitor many -individual systems using many distributed Netdata agents. - -Next up: configuration. 
- -## What you'll learn in this step - -We'll talk about Netdata's default configuration, and then you'll learn how to do the following: - -- [Find your `netdata.conf` file](#find-your-netdataconf-file) -- [Use edit-config to open `netdata.conf`](#use-edit-config-to-open-netdataconf) -- [Navigate the structure of `netdata.conf`](#the-structure-of-netdataconf) -- [Edit your `netdata.conf` file](#edit-your-netdataconf-file) - -## Find your `netdata.conf` file - -Netdata primarily uses the `netdata.conf` file to configure its core functionality. `netdata.conf` resides within your -**Netdata config directory**. - -The location of that directory and `netdata.conf` depends on your operating system and the method you used to install -Netdata. - -The most reliable method of finding your Netdata config directory is loading your `netdata.conf` in your browser. Open a -tab and navigate to `http://HOST:19999/netdata.conf`. Your browser will load a text document that looks like this: - -![A netdata.conf file opened in the -browser](https://user-images.githubusercontent.com/1153921/68346763-344f1c80-00b2-11ea-9d1d-0ccac74d5558.png) - -Look for the line that begins with `# config directory = `. The text after that will be the path to your Netdata config -directory. - -In the system represented by the screenshot, the line reads: `config directory = /etc/netdata`. That means -`netdata.conf`, and all the other configuration files, can be found at `/etc/netdata`. - -> For more details on where your Netdata config directory is, take a look at our [installation -> instructions](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md). - -For the rest of this guide, we'll assume you're editing files or running scripts from _within_ your **Netdata -configuration directory**. - -## Use edit-config to open `netdata.conf` - -Inside your Netdata config directory, there is a helper script called `edit-config`. 
This script will open existing -Netdata configuration files using a text editor. Or, if the configuration file doesn't yet exist, the script will copy -an example file to your Netdata config directory and then allow you to edit it before saving it. - -> `edit-config` will use the `EDITOR` environment variable on your system to edit the file. On many systems, that is -> defaulted to `vim` or `nano`. We highly recommend `nano` for beginners. To change this variable for the current -> session (it will revert to the default when you reboot), export a new value: `export EDITOR=nano`. Or, [make the -> change permanent](https://stackoverflow.com/questions/13046624/how-to-permanently-export-a-variable-in-linux). - -Let's give it a shot. Navigate to your Netdata config directory. To use `edit-config` on `netdata.conf`, you need to -have permissions to edit the file. On Linux/macOS systems, you can usually use `sudo` to elevate your permissions. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different as found in the steps above -sudo ./edit-config netdata.conf -``` - -You should now see `netdata.conf` in your editor! Let's walk through how the file is structured. - -## The structure of `netdata.conf` - -There are two main parts of the file to note: **sections** and **options**. - -The `netdata.conf` file is broken up into various **sections**, such as `[global]`, `[web]`, and `[registry]`. Each -section contains the configuration options for some core component of Netdata. - -Each section also contains many **options**. Options have a name and a value. With the option `config directory = -/etc/netdata`, `config directory` is the name, and `/etc/netdata` is the value. - -Most lines are **commented**, in that they start with a hash symbol (`#`), and the value is set to a sane default. To -tell Netdata that you'd like to change any option from its default value, you must **uncomment** it by removing that -hash. 
- -### Edit your `netdata.conf` file - -Let's try editing the options in `netdata.conf` to see how the process works. - -First, add a fake option to show you how Netdata loads its configuration files. Add a `test` option under the `[global]` -section and give it the value of `1`. - -```conf -[global] - test = 1 -``` - -Restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -Now, open up your browser and navigate to `http://HOST:19999/netdata.conf`. You'll see that Netdata has recognized -that our fake option isn't valid and added a notice that Netdata will ignore it. - -Here's the process in GIF form! - -![Animated GIF of creating a fake option in -netdata.conf](https://user-images.githubusercontent.com/1153921/65470254-4422e200-de1f-11e9-9597-a97c89ee59b8.gif) - -Now, let's make a slightly more substantial edit to `netdata.conf`: change the Agent's name. - -If you edit the value of the `hostname` option, you can change the name of your Netdata Agent on the dashboard and a -handful of other places, like the Visited nodes menu _and_ Netdata Cloud. - -Use `edit-config` to change the `hostname` option to a name like `hello-world`. Be sure to uncomment it! - -```conf -[global] - hostname = hello-world -``` - -Once you're done, restart Netdata and refresh the dashboard. Say hello to your renamed agent! - -![Animated GIF of editing the hostname option in -netdata.conf](https://user-images.githubusercontent.com/1153921/80994808-1c065300-8df2-11ea-81af-d28dc3ba27c8.gif) - -Netdata has dozens upon dozens of options you can change. To see them all, read our [daemon -configuration](https://github.com/netdata/netdata/blob/master/daemon/config/README.md), or hop into our popular guide on [increasing long-term metrics -storage](https://github.com/netdata/netdata/blob/master/docs/guides/longer-metrics-storage.md). - -## What's next? 
- -At this point, you should be comfortable with getting to your Netdata directory, opening and editing `netdata.conf`, and -seeing your changes reflected in the dashboard. - -Netdata has many more configuration files that you might want to change, but we'll cover those in the following steps of -this guide. - -In the next step, we're going to cover one of Netdata's core functions: monitoring the health of your systems via alarms -and notifications. You'll learn how to disable alarms, create new ones, and push notifications to the system of your -choosing. - -[Next: Health monitoring alarms and notifications →](step-05.md) - - diff --git a/docs/guides/step-by-step/step-05.md b/docs/guides/step-by-step/step-05.md deleted file mode 100644 index 3ef498d40..000000000 --- a/docs/guides/step-by-step/step-05.md +++ /dev/null @@ -1,349 +0,0 @@ -<!-- -title: "Step 5. Health monitoring alarms and notifications" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-05.md ---> - -# Step 5. Health monitoring alarms and notifications - -In the fifth step of the Netdata guide, we're introducing you to one of our core features: **health monitoring**. - -To accurately monitor the health of your systems and applications, you need to know _immediately_ when there's something -strange going on. Netdata's alarm and notification systems are essential to keeping you informed. - -Netdata comes with hundreds of pre-configured alarms that don't require configuration. They were designed by our -community of system administrators to cover the most important parts of production systems, so, in many cases, you won't -need to edit them. - -Luckily, Netdata's alarm and notification systems are incredibly adaptable to your infrastructure's unique needs. 
- -## What you'll learn in this step - -We'll talk about Netdata's default configuration, and then you'll learn how to do the following: - -- [Tune Netdata's pre-configured alarms](#tune-netdatas-pre-configured-alarms) -- [Write your first health entity](#write-your-first-health-entity) -- [Enable Netdata's notification systems](#enable-netdatas-notification-systems) - -## Tune Netdata's pre-configured alarms - -First, let's tune an alarm that came pre-configured with your Netdata installation. - -The first chart you see on any Netdata dashboard is the `system.cpu` chart, which shows the system's CPU utilization -across all cores. To figure out which file you need to edit to tune this alarm, click the **Alarms** button at the top -of the dashboard, click on the **All** tab, and find the **system - cpu** alarm entity. - -![The system - cpu alarm entity](https://user-images.githubusercontent.com/1153921/67034648-ebb4cc80-f0cc-11e9-9d49-1023629924f5.png) - -Look at the `source` row in the table. This means the `system.cpu` chart sources its health alarms from -`4@/usr/lib/netdata/conf.d/health.d/cpu.conf`. To tune these alarms, you'll need to edit the alarm file at -`health.d/cpu.conf`. Go to your [Netdata config directory](step-04.md#find-your-netdataconf-file) and use the -`edit-config` script. - -```bash -sudo ./edit-config health.d/cpu.conf -``` - -The first **health entity** in that file looks like this: - -```yaml -template: 10min_cpu_usage - on: system.cpu - os: linux - hosts: * - lookup: average -10m unaligned of user,system,softirq,irq,guest - units: % - every: 1m - warn: $this > (($status >= $WARNING) ? (75) : (85)) - crit: $this > (($status == $CRITICAL) ? (85) : (95)) - delay: down 15m multiplier 1.5 max 1h - info: average cpu utilization for the last 10 minutes (excluding iowait, nice and steal) - to: sysadmin -``` - -Let's say you want to tune this alarm to trigger warning and critical alarms at a lower CPU utilization. 
You can change -the `warn` and `crit` lines to the values of your choosing. For example: - -```yaml - warn: $this > (($status >= $WARNING) ? (60) : (75)) - crit: $this > (($status == $CRITICAL) ? (75) : (85)) -``` - -You _can_ restart Netdata with `sudo systemctl restart netdata`, to enable your tune, but you can also reload _only_ the -health monitoring component using one of the available [methods](https://github.com/netdata/netdata/blob/master/health/QUICKSTART.md#reload-health-configuration). - -You can also tune any other aspect of the default alarms. To better understand how each line in a health entity works, -read our [health documentation](https://github.com/netdata/netdata/blob/master/health/README.md). - -### Silence an individual alarm - -Many Netdata users don't need all the default alarms enabled. Instead of disabling any given alarm, or even _all_ -alarms, you can silence individual alarms by changing one line in a given health entity. Let's look at that -`health/cpu.conf` file again. - -```yaml -template: 10min_cpu_usage - on: system.cpu - os: linux - hosts: * - lookup: average -10m unaligned of user,system,softirq,irq,guest - units: % - every: 1m - warn: $this > (($status >= $WARNING) ? (75) : (85)) - crit: $this > (($status == $CRITICAL) ? (85) : (95)) - delay: down 15m multiplier 1.5 max 1h - info: average cpu utilization for the last 10 minutes (excluding iowait, nice and steal) - to: sysadmin -``` - -To silence this alarm, change `sysadmin` to `silent`. - -```yaml - to: silent -``` - -Use `netdatacli reload-health` to reload your health configuration. You can add `to: silent` to any alarm you'd rather not -bother you with notifications. - -## Write your first health entity - -The best way to understand how health entities work is building your own and experimenting with the options. To start, -let's build a health entity that triggers an alarm when system RAM usage goes above 80%. 
- -We will first create a new file inside of the `health.d/` directory. We'll name our file -`example.conf` for now. - -```bash -./edit-config health.d/example.conf -``` - -The first line in a health entity will be `alarm:`. This is how you name your entity. You can give it any name you -choose, but the only symbols allowed are `.` and `_`. Let's call the alarm `ram_usage`. - -```yaml - alarm: ram_usage -``` - -> You'll see some funky indentation in the lines coming up. Don't worry about it too much! Indentation is not important -> to how Netdata processes entities, and it will make sense when you're done. - -Next, you need to specify which chart this entity listens via the `on:` line. You're declaring that you want this alarm -to check metrics on the `system.ram` chart. - -```yaml - on: system.ram -``` - -Now comes the `lookup`. This line specifies what metrics the alarm is looking for, what duration of time it's looking -at, and how to process the metrics into a more usable format. - -```yaml -lookup: average -1m percentage of used -``` - -Let's take a moment to break this line down. - -- `average`: Calculate the average of all the metrics collected. -- `-1m`: Use metrics from 1 minute ago until now to calculate that average. -- `percentage`: Clarify that you want to calculate a percentage of RAM usage. -- `of used`: Specify which dimension (`used`) on the `system.ram` chart you want to monitor with this entity. - -In other words, you're taking 1 minute's worth of metrics from the `used` dimension on the `system.ram` chart, -calculating their average, and returning it as a percentage. - -You can move on to the `units` line, which lets Netdata know that we're working with a percentage and not an absolute -unit. - -```yaml - units: % -``` - -Next, the `every` line tells Netdata how often to perform the calculation you specified in the `lookup` line. For -certain alarms, you might want to use a shorter duration, which you can specify using values like `10s`. 
- -```yaml - every: 1m -``` - -We'll put the next two lines—`warn` and `crit`—together. In these lines, you declare at which percentage you want to -trigger a warning or critical alarm. Notice the variable `$this`, which is the value calculated by the `lookup` line. -These lines will trigger a warning if that average RAM usage goes above 80%, and a critical alert if it's above 90%. - -```yaml - warn: $this > 80 - crit: $this > 90 -``` - -> ❗ Most default Netdata alarms come with more complicated `warn` and `crit` lines. You may have noticed the line `warn: -> $this > (($status >= $WARNING) ? (75) : (85))` in one of the health entity examples above, which is an example of -> using the [conditional operator for hysteresis](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#special-use-of-the-conditional-operator). -> Hysteresis is used to keep Netdata from triggering a ton of alerts if the metric being tracked quickly goes above and -> then falls below the threshold. For this very simple example, we'll skip hysteresis, but recommend implementing it in -> your future health entities. - -Finish off with the `info` line, which creates a description of the alarm that will then appear in any -[notification](#enable-netdatas-notification-systems) you set up. This line is optional, but it has value—think of it as -documentation for a health entity! - -```yaml - info: The percentage of RAM being used by the system. -``` - -Here's what the entity looks like in full. Now you can see why we indented the lines, too. - -```yaml - alarm: ram_usage - on: system.ram -lookup: average -1m percentage of used - units: % - every: 1m - warn: $this > 80 - crit: $this > 90 - info: The percentage of RAM being used by the system. -``` - -What about what it looks like on the Netdata dashboard? 
- -![An active alert for the ram_usage alarm](https://user-images.githubusercontent.com/1153921/67056219-f89ee380-f0ff-11e9-8842-7dc210dd2908.png) - -If you'd like to try this alarm on your system, you can install a small program called -[stress](http://manpages.ubuntu.com/manpages/disco/en/man1/stress.1.html) to create a synthetic load. Use the command -below, and change the `8G` value to a number that's appropriate for the amount of RAM on your system. - -```bash -stress -m 1 --vm-bytes 8G --vm-keep -``` - -Netdata is capable of understanding much more complicated entities. To better understand how they work, read the [health -documentation](https://github.com/netdata/netdata/blob/master/health/README.md), look at some [examples](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#example-alarms), and open the files -containing the default entities on your system. - -## Enable Netdata's notification systems - -Health alarms, while great on their own, are pretty useless without some way of you knowing they've been triggered. -That's why Netdata comes with a notification system that supports more than a dozen services, such as email, Slack, -Discord, PagerDuty, Twilio, Amazon SNS, and much more. - -To see all the supported systems, visit our [notifications documentation](https://github.com/netdata/netdata/blob/master/health/notifications/README.md). - -We'll cover email and Slack notifications here, but with this knowledge you should be able to enable any other type of -notifications instead of or in addition to these. - -### Email notifications - -To use email notifications, you need `sendmail` or an equivalent installed on your system. Linux systems use `sendmail` -or similar programs to, unsurprisingly, send emails to any inbox. - -> Learn more about `sendmail` via its [documentation](http://www.postfix.org/sendmail.1.html). - -Edit the `health_alarm_notify.conf` file, which resides in your Netdata directory. 
- -```bash -sudo ./edit-config health_alarm_notify.conf -``` - -Look for the following lines: - -```conf -# if a role recipient is not configured, an email will be send to: -DEFAULT_RECIPIENT_EMAIL="root" -# to receive only critical alarms, set it to "root|critical" -``` - -Change the value of `DEFAULT_RECIPIENT_EMAIL` to the email address at which you'd like to receive notifications. - -```conf -# if a role recipient is not configured, an email will be sent to: -DEFAULT_RECIPIENT_EMAIL="me@example.com" -# to receive only critical alarms, set it to "root|critical" -``` - -Test email notifications system by first becoming the Netdata user and then asking Netdata to send a test alarm: - -```bash -sudo su -s /bin/bash netdata -/usr/libexec/netdata/plugins.d/alarm-notify.sh test -``` - -You should see output similar to this: - -```bash -# SENDING TEST WARNING ALARM TO ROLE: sysadmin -2019-10-17 18:23:38: alarm-notify.sh: INFO: sent email notification for: hostname test.chart.test_alarm is WARNING to 'me@example.com' -# OK - -# SENDING TEST CRITICAL ALARM TO ROLE: sysadmin -2019-10-17 18:23:38: alarm-notify.sh: INFO: sent email notification for: hostname test.chart.test_alarm is CRITICAL to 'me@example.com' -# OK - -# SENDING TEST CLEAR ALARM TO ROLE: sysadmin -2019-10-17 18:23:39: alarm-notify.sh: INFO: sent email notification for: hostname test.chart.test_alarm is CLEAR to 'me@example.com' -# OK -``` - -... and you should get three separate emails, one for each test alarm, in your inbox! (Be sure to check your spam -folder.) - -## Enable Slack notifications - -If you're one of the many who spend their workday getting pinged with GIFs by your colleagues, why not add Netdata -notifications to the mix? It's a great way to immediately see, collaborate around, and respond to anomalies in your -infrastructure. - -To get Slack notifications working, you first need to add an [incoming -webhook](https://slack.com/apps/A0F7XDUAZ-incoming-webhooks) to the channel of your choice. 
Click the green **Add to -Slack** button, choose the channel, and click the **Add Incoming WebHooks Integration** button. - -On the following page, you'll receive a **Webhook URL**. That's what you'll need to configure Netdata, so keep it handy. - -Time to dive back into your `health_alarm_notify.conf` file: - -```bash -sudo ./edit-config health_alarm_notify.conf -``` - -Look for the `SLACK_WEBHOOK_URL=" "` line and add the incoming webhook URL you got from Slack: - -```conf -SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXX" -``` - -A few lines down, edit the `DEFAULT_RECIPIENT_SLACK` line to contain a single hash `#` character. This instructs Netdata -to send a notification to the channel you configured with the incoming webhook. - -```conf -DEFAULT_RECIPIENT_SLACK="#" -``` - -Time to test the notifications again! - -```bash -sudo su -s /bin/bash netdata -/usr/libexec/netdata/plugins.d/alarm-notify.sh test -``` - -You should receive three notifications in your Slack channel. - -Congratulations! You're set up with two awesome ways to get notified about any change in the health of your systems or -applications. - -To further configure your email or Slack notification setup, or to enable other notification systems, check out the -following documentation: - -- [Email notifications](https://github.com/netdata/netdata/blob/master/health/notifications/email/README.md) -- [Slack notifications](https://github.com/netdata/netdata/blob/master/health/notifications/slack/README.md) -- [Netdata's notification system](https://github.com/netdata/netdata/blob/master/health/notifications/README.md) - -## What's next? - -In this step, you learned the fundamentals of Netdata's health monitoring tools: alarms and notifications. You should be -able to tune default alarms, silence them, and understand some of the basics of writing health entities. And, if you so -chose, you'll now have both email and Slack notifications enabled. 
- -You're coming along quick! - -Next up, we're going to cover how Netdata collects its metrics, and how you can get Netdata to collect real-time metrics -from hundreds of services with almost no configuration on your part. Onward! - -[Next: Collect metrics from more services and apps →](step-06.md) - - diff --git a/docs/guides/step-by-step/step-06.md b/docs/guides/step-by-step/step-06.md deleted file mode 100644 index b951a76bb..000000000 --- a/docs/guides/step-by-step/step-06.md +++ /dev/null @@ -1,122 +0,0 @@ -<!-- -title: "Step 6. Collect metrics from more services and apps" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-06.md ---> - -# Step 6. Collect metrics from more services and apps - -When Netdata _starts_, it auto-detects dozens of **data sources**, such as database servers, web servers, and more. - -To auto-detect and collect metrics from a source you just installed, you need to restart Netdata using `sudo systemctl -restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -However, auto-detection only works if you installed the source using its standard installation -procedure. If Netdata isn't collecting metrics after a restart, your source probably isn't configured -correctly. - -Check out the [collectors that come pre-installed with Netdata](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) to find the module for the -source you want to monitor. 
- -## What you'll learn in this step - -We'll begin with an overview on Netdata's collector architecture, and then dive into the following: - -- [Netdata's collector architecture](#netdatas-collector-architecture) -- [Enable and disable plugins](#enable-and-disable-plugins) -- [Enable the Nginx collector as an example](#example-enable-the-nginx-collector) - -## Netdata's collector architecture - -Many Netdata users never have to configure collector or worry about which plugin orchestrator they want to use. - -But, if you want to configure collector or write a collector for your custom source, it's important to understand the -underlying architecture. - -By default, Netdata collects a lot of metrics every second using any number of discrete collector. Collectors, in turn, -are organized and manged by plugins. **Internal** plugins collect system metrics, **external** plugins collect -non-system metrics, and **orchestrator** plugins group individual collectors together based on the programming language -they were built in. - -These modules are primarily written in [Go](https://github.com/netdata/go.d.plugin/blob/master/README.md) (`go.d`) and -[Python](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md), although some use [Bash](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/README.md) -(`charts.d`). - -## Enable and disable plugins - -You don't need to explicitly enable plugins to auto-detect properly configured sources, but it's useful to know how to -enable or disable them. - -One reason you might want to _disable_ plugins is to improve Netdata's performance on low-resource systems, like -ephemeral nodes or edge devices. Disabling orchestrator plugins like `python.d` can save significant resources if you're -not using any of its data collector modules. - -You can enable or disable plugins in the `[plugin]` section of `netdata.conf`. 
This section features a list of all the -plugins with a boolean setting (`yes` or `no`) to enable or disable them. Be sure to uncomment the line by removing the -hash (`#`)! - -Enabled: - -```conf -[plugins] - # python.d = yes -``` - -Disabled: - -```conf -[plugins] - python.d = no -``` - -When you explicitly disable a plugin this way, it won't auto-collect metrics using its collectors. - -## Example: Enable the Nginx collector - -To help explain how the auto-detection process works, let's use an Nginx web server as an example. - -Even if you don't have Nginx installed on your system, we recommend you read through the following section so you can -apply the process to other data sources, such as Apache, Redis, Memcached, and more. - -The Nginx collector, which helps Netdata collect metrics from a running Nginx web server, is part of the -`python.d.plugin` external plugin _orchestrator_. - -In order for Netdata to auto-detect an Nginx web server, you need to enable `ngx_http_stub_status_module` and pass the -`stub_status` directive in the `location` block of your Nginx configuration file. - -You can confirm if the `stub_status` Nginx module is already enabled or not by using following command: - -```sh -nginx -V 2>&1 | grep -o with-http_stub_status_module -``` - -If this command returns nothing, you'll need to [enable this module](https://www.nginx.com/blog/monitoring-nginx/). - -Next, edit your `/etc/nginx/sites-enabled/default` file to include a `location` block with the following: - -```conf - location /stub_status { - stub_status; - } -``` - -Restart Netdata using `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system, and Netdata will auto-detect metrics from your Nginx web -server! 
- -While not necessary for most auto-detection and collection purposes, you can also configure the Nginx collector itself -by editing its configuration file: - -```sh -./edit-config python.d/nginx.conf -``` - -After configuring any source, or changing the configuration files for their respective modules, always restart Netdata. - -## What's next? - -Now that you've learned the fundamentals behind configuring data sources for auto-detection, it's time to move back to -the dashboard to learn more about some of its more advanced features. - -[Next: Netdata's dashboard in depth →](step-07.md) - - diff --git a/docs/guides/step-by-step/step-07.md b/docs/guides/step-by-step/step-07.md deleted file mode 100644 index 8c5c21bee..000000000 --- a/docs/guides/step-by-step/step-07.md +++ /dev/null @@ -1,114 +0,0 @@ -<!-- -title: "Step 7. Netdata's dashboard in depth" -date: 2020-05-04 -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-07.md ---> - -# Step 7. Netdata's dashboard in depth - -Welcome to the seventh step of the Netdata guide! - -This step of the guide aims to get you more familiar with the features of the dashboard not previously mentioned in -[step 2](https://github.com/netdata/netdata/blob/master/docs/guides/step-by-step/step-02.md). - -## What you'll learn in this step - -In this step of the Netdata guide, you'll learn how to: - -- [Change the dashboard's settings](#change-the-dashboards-settings) -- [Check if there's an update to Netdata](#check-if-theres-an-update-to-netdata) -- [Export and import a snapshot](#export-and-import-a-snapshot) - -Let's get started! - -## Change the dashboard's settings - -The settings area at the top of your Netdata dashboard houses browser settings. These settings do not affect the -operation of your Netdata server/daemon. They take effect immediately and are permanently saved to browser local storage -(except the refresh on focus / always option). 
- -You can see the **Performance**, **Synchronization**, **Visual**, and **Locale** tabs on the dashboard settings modal. - -![Animated GIF of opening the settings -modal](https://user-images.githubusercontent.com/1153921/80841197-c93f5800-8bb3-11ea-907d-85bfe23565e1.gif) - -To change any setting, click on the toggle button. We recommend you spend some time reading the descriptions for each setting to understand them before making changes. - -Pay particular attention to the following settings, as they have dramatic impacts on the performance and appearance of -your Netdata dashboard: - -- When to refresh the charts? -- How to handle hidden charts? -- Which chart refresh policy to use? -- Which theme to use? -- Do you need help? - -Some settings are applied immediately, and others are only reflected after you refresh the page. - -## Check if there's an update to Netdata - -You can always check if there is an update available from the **Update** area of your Netdata dashboard. - -![Opening the Agent's Update modal](https://user-images.githubusercontent.com/1153921/80829493-1adbe880-8b9c-11ea-9770-cc3b23a89414.gif) - -If an update is available, you'll see a modal similar to the one above. - -When you use the [automatic one-line installer script](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) attempt to update every day. If -you choose to update it manually, there are [several well-documented methods](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md) to achieve -that. However, it is best practice for you to first go over the [changelog](https://github.com/netdata/netdata/blob/master/CHANGELOG.md). - -## Export and import a snapshot - -Netdata can export and import snapshots of the contents of your dashboard at a given time. Any Netdata agent can import -a snapshot created by any other Netdata agent. 
- -Snapshot files include all the information of the dashboard, including the URL of the origin server, its unique ID, and -chart data queries for the visible timeframe. While snapshots are not in real-time, and thus won't update with new -metrics, you can still pan, zoom, and highlight charts as you see fit. - -Snapshots can be incredibly useful for diagnosing anomalies after they've already happened. Let's say Netdata triggered -an alarm while you were sleeping. In the morning, you can look up the exact moment the alarm was raised, export a -snapshot, and send it to a colleague for further analysis. - -> ❗ Know how you shouldn't go around downloading software from suspicious-looking websites? Same policy goes for loading -> snapshots from untrusted or anonymous sources. Importing a snapshot loads quite a bit of data into your web browser, -> and so you should always err on the side of protecting your system. - -To export a snapshot, click on the **export** icon. - -![Animated GIF of opening the export -modal](https://user-images.githubusercontent.com/1153921/80993197-82d63d00-8def-11ea-88fa-98827814e930.gif) - -Edit the snapshot file name and select your desired compression method. Click on **Export**. - -When the export is complete, your browser will prompt you to save the `.snapshot` file to your machine. You can now -share this file with any other Netdata user via email, Slack, or even to help describe your Netdata experience when -[filing an issue](https://github.com/netdata/netdata/issues/new/choose) on GitHub. - -To import a snapshot, click on the **import** icon. - -![Animated GIF of opening the import -modal](https://user-images.githubusercontent.com/12263278/64901503-ee696f80-d691-11e9-9678-8d0e2a162402.gif) - -Select the Netdata snapshot file to import. Once the file is loaded, the dashboard will update with critical information -about the snapshot and the system from which it was taken. Click **import** to render it. 
- -Your Netdata dashboard will load data contained in the snapshot into charts. Because the snapshot only covers a certain -period, it won't update with new metrics. - -An imported snapshot is also temporary. If you reload your browser tab, Netdata will remove the snapshot data and -restore your real-time dashboard for your machine. - -## What's next? - -In this step of the Netdata guide, you learned how to: - -- Change the dashboard's settings -- Check if there's an update to Netdata -- Export or import a snapshot - -Next, you'll learn how to build your first custom dashboard! - -[Next: Build your first custom dashboard →](step-08.md) - - diff --git a/docs/guides/step-by-step/step-08.md b/docs/guides/step-by-step/step-08.md deleted file mode 100644 index 7a8d417f1..000000000 --- a/docs/guides/step-by-step/step-08.md +++ /dev/null @@ -1,395 +0,0 @@ -<!-- -title: "Step 8. Build your first custom dashboard" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-08.md ---> - -# Step 8. Build your first custom dashboard - -In previous steps of the guide, you have learned how several sections of the Netdata dashboard worked. - -This step will show you how to set up a custom dashboard to fit your unique needs. If nothing else, Netdata is really, -really flexible. 🤸 - -## What you'll learn in this step - -In this step of the Netdata guide, you'll learn: - -- [Why you might want a custom dashboard](#why-should-i-create-a-custom-dashboard) -- [How to create and prepare your `custom-dashboard.html` file](#create-and-prepare-your-custom-dashboardhtml-file) -- [Where to add `dashboard.js` to your custom dashboard file](#add-dashboardjs-to-your-custom-dashboard-file) -- [How to add basic styling](#add-some-basic-styling) -- [How to add charts of different types, shapes, and sizes](#creating-your-dashboards-charts) - -Let's get on with it! - -## Why should I create a custom dashboard? - -Because it's cool! 
- -But there are way more reasons than that, most of which will prove more valuable to you. - -You could use custom dashboards to aggregate real-time data from multiple Netdata agents in one place. Or, you could put -all the charts with metrics collected from your custom application via `statsd` and perform application performance -monitoring from a single dashboard. You could even use a custom dashboard and a standalone web server to create an -enriched public status page for your service, and give your users something fun to look at while they're waiting for the -503 errors to clear up! - -Netdata's custom dashboarding capability is meant to be as flexible as your ideas. We hope you can take these -fundamental ideas and turn them into something amazing. - -## Create and prepare your `custom-dashboard.html` file - -By default, Netdata stores its web server files at `/usr/share/netdata/web`. As with finding the location of your -`netdata.conf` file, you can double-check this location by loading up `http://HOST:19999/netdata.conf` in your browser -and finding the value of the `web files directory` option. - -To create your custom dashboard, create a file at `/usr/share/netdata/web/custom-dashboard.html` and copy in the -following: - -```html -<!DOCTYPE html> -<html lang="en"> -<head> - <title>My custom dashboard</title> - - <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> - <meta charset="utf-8"> - <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> - <meta name="viewport" content="width=device-width, initial-scale=1"> - <meta name="apple-mobile-web-app-capable" content="yes"> - <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent"> - - <!-- Add dashboard.js here! --> - -</head> -<body> - - <main class="container"> - - <h1>My custom dashboard</h1> - - <!-- Add charts here! --> - - </main> - -</body> -</html> -``` - -Try visiting `http://HOST:19999/custom-dashboard.html` in your browser. 
- -If you get a blank page with this text: `Access to file is not permitted: /usr/share/netdata/web/custom-dashboard.html`. -You can fix this error by changing the dashboard file's permissions to make it owned by the `netdata` user. - -```bash -sudo chown netdata:netdata /usr/share/netdata/web/custom-dashboard.html -``` - -Reload your browser, and you should see a blank page with the title: **Your custom dashboard**! - -## Add `dashboard.js` to your custom dashboard file - -You need to include the `dashboard.js` file of a Netdata agent to add Netdata charts. Add the following to the `<head>` -of your custom dashboard page and change `HOST` according to your setup. - -```html - <!-- Add dashboard.js here! --> - <script type="text/javascript" src="http://HOST:19999/dashboard.js"></script> -``` - -When you add `dashboard.js` to any web page, it loads several JavaScript and CSS files to create and style charts. It -also scans the page for elements that define charts, builds them, and refreshes with new metrics. - -> If you enabled SSL on your Netdata dashboard already, you'll need to use `https://` to grab the `dashboard.js` file. - -## Add some basic styling - -While not necessary, let's add some basic styling to make our dashboard look a little nicer. We're putting some -basic CSS into a `<style>` tag inside of the page's `<head>` element. - -```html - <!-- Add dashboard.js here! --> - <script type="text/javascript" src="http://HOST:19999/dashboard.js"></script> - - <style> - .wrap { - max-width: 1280px; - margin: 0 auto; - } - - h1 { - margin-bottom: 30px; - text-align: center; - } - - .charts { - display: flex; - flex-flow: row wrap; - justify-content: space-around; - } - </style> - -</head> -``` - -## Creating your dashboard's charts - -Time to create a chart! - -You need to create a `<div>` for each new chart. Each `<div>` element accepts a few `data-` attributes, some of which -are required and some of which are optional. - -Let's cover a few important ones. 
And while we do it, we'll create a custom dashboard that shows a few CPU-related -charts on a single page. - -### The chart unique ID (required) - -You need to specify the unique ID of a chart to show it on your custom dashboard. If you forgot how to find the unique -ID, head back over to [step 2](https://github.com/netdata/netdata/blob/master/docs/guides/step-by-step/step-02.md#understand-charts-dimensions-families-and-contexts) -for a re-introduction. - -You can then put this unique ID into a `<div>` element with the `data-netdata` attribute. Put this in the `<body>` of -your custom dashboard file beneath the helpful comment. - -```html -<body> - - <main class="wrap"> - - <h1>My custom dashboard</h1> - - <div class="charts"> - - <!-- Add charts here! --> - <div data-netdata="system.cpu"></div> - - </div> - - </main> - -</body> -``` - -Reload the page, and you should see a real-time `system.cpu` chart! - -... and a whole lot of white space. Let's fix that by adding a few more charts. - -```html - <!-- Add charts here! --> - <div data-netdata="system.cpu"></div> - <div data-netdata="apps.cpu"></div> - <div data-netdata="groups.cpu"></div> - <div data-netdata="users.cpu"></div> -``` - -![Custom dashboard with four charts -added](https://user-images.githubusercontent.com/1153921/67526566-e675f580-f669-11e9-8ff5-d1f21a84fb2b.png) - -### Set chart duration - -By default, these charts visualize 10 minutes of Netdata metrics. Let's get a little more granular on this dashboard. To -do so, add a new `data-after=""` attribute to each chart. - -`data-after` takes a _relative_ number of seconds from _now_. So, by putting `-300` as the value, you're asking the -custom dashboard to display the _last 5 minutes_ (`5m * 60s = 300s`) of data. - -```html - <!-- Add charts here! 
--> - <div data-netdata="system.cpu" - data-after="-300"> - </div> - <div data-netdata="apps.cpu" - data-after="-300"> - </div> - <div data-netdata="groups.cpu" - data-after="-300"> - </div> - <div data-netdata="users.cpu" - data-after="-300"> - </div> -``` - -### Set chart size - -You can set the size of any chart using the `data-height=""` and `data-width=""` attributes. These attributes can be -anything CSS accepts for width and height (e.g. percentages, pixels, em/rem, calc, and so on). - -Let's make the charts a little taller and allow them to fit side-by-side for a more compact view. Add -`data-height="200px"` and `data-width="50%"` to each chart. - -```html - <div data-netdata="system.cpu" - data-after="-300" - data-height="250px" - data-width="50%"></div> - <div data-netdata="apps.cpu" - data-after="-300" - data-height="250px" - data-width="50%"></div> - <div data-netdata="groups.cpu" - data-after="-300" - data-height="250px" - data-width="50%"></div> - <div data-netdata="users.cpu" - data-after="-300" - data-height="250px" - data-width="50%"></div> -``` - -Now we're getting somewhere! - -![A custom dashboard with four charts -side-by-side](https://user-images.githubusercontent.com/1153921/67526620-ff7ea680-f669-11e9-92d3-575665fc3a8e.png) - -## Final touches - -While we already have a perfectly workable dashboard, let's add some final touches to make it a little more pleasant on -the eyes. - -First, add some extra CSS to create some vertical whitespace between the top and bottom row of charts. - -```html - <style> - ... - - .charts > div { - margin-bottom: 6rem; - } - </style> -``` - -To create horizontal whitespace, change the value of `data-width="50%"` to `data-width="calc(50% - 2rem)"`. 
- -```html - <div data-netdata="system.cpu" - data-after="-300" - data-height="250px" - data-width="calc(50% - 2rem)"></div> - <div data-netdata="apps.cpu" - data-after="-300" - data-height="250px" - data-width="calc(50% - 2rem)"></div> - <div data-netdata="groups.cpu" - data-after="-300" - data-height="250px" - data-width="calc(50% - 2rem)"></div> - <div data-netdata="users.cpu" - data-after="-300" - data-height="250px" - data-width="calc(50% - 2rem)"></div> -``` - -Told you the `data-width` and `data-height` attributes can take any CSS values! - -Prefer a dark theme? Add this to your `<head>` _above_ where you added `dashboard.js`: - -```html - <script> - var netdataTheme = 'slate'; - </script> - - <!-- Add dashboard.js here! --> - <script type="text/javascript" src="https://HOST/dashboard.js"></script> -``` - -Refresh the dashboard to give your eyes a break from all that blue light! - -![A finished custom -dashboard](https://user-images.githubusercontent.com/1153921/67531221-a23d2200-f676-11e9-91fe-c2cf1c426bf9.png) - -## The final `custom-dashboard.html` - -In case you got lost along the way, here's the final version of the `custom-dashboard.html` file: - -```html -<!DOCTYPE html> -<html lang="en"> -<head> - <title>My custom dashboard</title> - - <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> - <meta charset="utf-8"> - <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> - <meta name="viewport" content="width=device-width, initial-scale=1"> - <meta name="apple-mobile-web-app-capable" content="yes"> - <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent"> - - <script> - var netdataTheme = 'slate'; - </script> - - <!-- Add dashboard.js here! 
--> - <script type="text/javascript" src="http://localhost:19999/dashboard.js"></script> - - <style> - .wrap { - max-width: 1280px; - margin: 0 auto; - } - - h1 { - margin-bottom: 30px; - text-align: center; - } - - .charts { - display: flex; - flex-flow: row wrap; - justify-content: space-around; - } - - .charts > div { - margin-bottom: 6rem; - position: relative; - } - </style> - -</head> -<body> - - <main class="wrap"> - - <h1>My custom dashboard</h1> - - <div class="charts"> - - <!-- Add charts here! --> - <div data-netdata="system.cpu" - data-after="-300" - data-height="250px" - data-width="calc(50% - 2rem)"></div> - <div data-netdata="apps.cpu" - data-after="-300" - data-height="250px" - data-width="calc(50% - 2rem)"></div> - <div data-netdata="groups.cpu" - data-after="-300" - data-height="250px" - data-width="calc(50% - 2rem)"></div> - <div data-netdata="users.cpu" - data-after="-300" - data-height="250px" - data-width="calc(50% - 2rem)"></div> - - </div> - - </main> - -</body> -</html> -``` - -## What's next? - -In this guide, you learned the fundamentals of building a custom Netdata dashboard. You should now be able to add more -charts to your `custom-dashboard.html`, change the charts that are already there, and size them according to your needs. - -Of course, the custom dashboarding features covered here are just the beginning. Be sure to read up on our [custom -dashboard documentation](https://github.com/netdata/netdata/blob/master/web/gui/custom/README.md) for details on how you can use other chart libraries, pull metrics -from multiple Netdata agents, and choose which dimensions a given chart shows. - -Next, you'll learn how to store long-term historical metrics in Netdata! 
- -[Next: Long-term metrics storage →](https://github.com/netdata/netdata/blob/master/docs/guides/step-by-step/step-09.md) - - diff --git a/docs/guides/step-by-step/step-09.md b/docs/guides/step-by-step/step-09.md deleted file mode 100644 index 839115a27..000000000 --- a/docs/guides/step-by-step/step-09.md +++ /dev/null @@ -1,162 +0,0 @@ -<!-- -title: "Step 9. Long-term metrics storage" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-09.md ---> - -# Step 9. Long-term metrics storage - -By default, Netdata stores metrics in a custom database we call the [database engine](https://github.com/netdata/netdata/blob/master/database/engine/README.md), which -stores recent metrics in your system's RAM and "spills" historical metrics to disk. By using both RAM and disk, the -database engine helps you store a much larger dataset than the amount of RAM your system has. - -On a system that's collecting 2,000 metrics every second, the database engine's default configuration will store about -two days' worth of metrics in RAM and on disk. - -That's a lot of metrics. We're talking 345,600,000 individual data points. And the database engine does it with a tiny -portion of the RAM available on most systems. - -To store _even more_ metrics, you have two options. First, you can tweak the database engine's options to expand the RAM -or disk it uses. Second, you can archive metrics to an external database. For that, we'll use MongoDB as an example. - -## What you'll learn in this step - -In this step of the Netdata guide, you'll learn how to: - -- [Tweak the database engine's settings](#tweak-the-database-engines-settings) -- [Archive metrics to an external database](#archive-metrics-to-an-external-database) - - [Use the MongoDB database](#archive-metrics-via-the-mongodb-exporting-connector) - -Let's get started! 
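As a quick sanity check on the 345,600,000 figure quoted in the introduction to this step, here is a minimal shell sketch. It assumes exactly 2,000 metrics collected every second and exactly two full days of retention; the variable names are illustrative and are not Netdata settings.

```bash
# Assumed inputs from the text: 2,000 metrics per second, kept for two days.
metrics_per_second=2000
seconds_per_day=$((60 * 60 * 24))   # seconds in one day
days=2

# Total individual data points stored over the retention window.
data_points=$((metrics_per_second * seconds_per_day * days))
echo "$data_points"
```

Changing `metrics_per_second` or `days` gives a rough feel for how quickly the number of stored data points grows. It says nothing about the disk space those points need, which depends on how the database engine stores them.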
- -## Tweak the database engine's settings - -If you're using Netdata v1.18.0 or higher, and you haven't changed your `memory mode` settings before following this -guide, your Netdata agent is already using the database engine. - -Let's look at your `netdata.conf` file again. Under the `[db]` section, you'll find three connected options. - -```conf -[db] - # mode = dbengine - # dbengine page cache size MB = 32 - # dbengine disk space MB = 256 -``` - -The `mode` option is set, by default, to `dbengine`. `dbengine page cache size MB` determines the amount of RAM, in MiB, that -the database engine dedicates to caching the metrics it's collecting. `dbengine disk space MB` determines the amount of -disk space, in MiB, that the database engine will use to store these metrics once they've been "spilled" to disk. - -You can uncomment and change either `dbengine page cache size MB` or `dbengine disk space MB` based on how much RAM and disk you want -the database engine to use. The higher those values, the more metrics Netdata will store. If you change them to 64 and -512, respectively, the database engine should store about four days' worth of data on a system collecting 2,000 metrics -every second. - -[**See our database engine calculator**](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) to help you correctly set `dbengine disk -space` based on your needs. The calculator gives an accurate estimate based on how many child nodes you have, how many -metrics your Agent collects, and more. - -```conf -[db] - mode = dbengine - dbengine page cache size MB = 64 - dbengine disk space MB = 512 -``` - -After you've made your changes, restart Netdata using `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -To confirm the database engine is working, go to your Netdata dashboard and click on the **Netdata Monitoring** menu on -the right-hand side. 
You can find `dbengine` metrics after `queries`. - -![Image of the database engine reflected in the Netdata -Dashboard](https://user-images.githubusercontent.com/12263278/64781383-9c71fe00-d55a-11e9-962b-efd5558efbae.png) - -## Archive metrics to an external database - -You can archive all the metrics collected by Netdata to **external databases**. The supported databases and services -include Graphite, OpenTSDB, Prometheus, AWS Kinesis Data Streams, Google Cloud Pub/Sub, MongoDB, and the list is always -growing. - -As we said in [step 1](https://github.com/netdata/netdata/blob/master/docs/guides/step-by-step/step-01.md), we have only complementary systems, not competitors! We're -happy to support these archiving methods and are always working to improve them. - -A lot of Netdata users archive their metrics to one of these databases for long-term storage or further analysis. Since -Netdata collects so many metrics every second, they can quickly overload small devices or even big servers that are -aggregating metrics streaming in from other Netdata agents. - -We even support resampling metrics during archiving. With resampling enabled, Netdata will archive only the average or -sum of every X seconds of metrics. This reduces the sheer amount of data, albeit with a little less accuracy. - -How you archive metrics, or if you archive metrics at all, is entirely up to you! But let's cover one easy archiving -method, MongoDB, to get you started. - -### Archive metrics via the MongoDB exporting connector - -Begin by installing MongoDB and its dependencies via the correct package manager for your system. - -```bash -sudo apt-get install mongodb # Debian/Ubuntu -sudo dnf install mongodb # Fedora -sudo yum install mongodb # CentOS -``` - -Next, install the one essential dependency: v1.7.0 or higher of -[libmongoc](http://mongoc.org/libmongoc/current/installing.html). 
- -```bash -sudo apt-get install libmongoc-1.0-0 libmongoc-dev # Debian/Ubuntu -sudo dnf install mongo-c-driver mongo-c-driver-devel # Fedora -sudo yum install mongo-c-driver mongo-c-driver-devel # CentOS -``` - -Next, create a new MongoDB database and collection to store all these archived metrics. Use the `mongo` command to start -the MongoDB shell, and then execute the following commands: - -```mongodb -use netdata -db.createCollection("netdata_metrics") -``` - -Next, Netdata needs to be [reinstalled](https://github.com/netdata/netdata/blob/master/packaging/installer/REINSTALL.md) so that it can detect the -libraries required for this exporting connection. Since you most likely installed Netdata using the one-line installer -script, all you have to do is run that script again. Don't worry: any configuration changes you made along the way will -be retained! - -Now, from your Netdata config directory, initialize and edit an `exporting.conf` file to tell Netdata where to find the -database you just created. - -```sh -./edit-config exporting.conf -``` - -Add the following section to the file: - -```conf -[mongodb:my_mongo_instance] - enabled = yes - destination = mongodb://localhost - database = netdata - collection = netdata_metrics -``` - -Restart Netdata using `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system, to enable the MongoDB exporting connector. Click on the -**Netdata Monitoring** menu and check out the **exporting my mongo instance** sub-menu. You should start seeing these -charts fill up with data about the exporting process! 
- -![image](https://user-images.githubusercontent.com/1153921/70443852-25171200-1a56-11ea-8be3-494544b1c295.png) - -If you'd like to try connecting Netdata to another database, such as Prometheus or OpenTSDB, read our [exporting -documentation](https://github.com/netdata/netdata/blob/master/exporting/README.md). - -## What's next? - -You're getting close to the end! In this step, you learned how to make the most of the database engine, or archive -metrics to MongoDB for long-term storage. - -In the last step of this step-by-step guide, we'll put our sysadmin hat on and use Nginx to proxy traffic to and from -our Netdata dashboard. - -[Next: Set up a proxy →](https://github.com/netdata/netdata/blob/master/docs/guides/step-by-step/step-10.md) - - diff --git a/docs/guides/step-by-step/step-10.md b/docs/guides/step-by-step/step-10.md deleted file mode 100644 index a24e803f7..000000000 --- a/docs/guides/step-by-step/step-10.md +++ /dev/null @@ -1,232 +0,0 @@ -<!-- -title: "Step 10. Set up a proxy" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-10.md ---> - -# Step 10. Set up a proxy - -You're almost through! At this point, you should be pretty familiar with how Netdata works and how to configure it to -your liking. - -In this step of the guide, we're going to add a proxy in front of Netdata. We're doing this for both improved -performance and security, so we highly recommend following these steps. Doubly so if you installed Netdata on a -publicly-accessible remote server. - -> ❗ If you installed Netdata on the machine you're currently using (e.g. on `localhost`), and have been accessing -> Netdata at `http://localhost:19999`, you can skip this step of the guide. In most cases, there is no benefit to -> setting up a proxy for a service running locally. - -> ❗❗ This guide requires more advanced administration skills than previous parts. 
If you're still working on your -> Linux administration skills, and would rather get back to Netdata, you might want to [skip this -> step](step-99.md) for now and return to it later. - -## What you'll learn in this step - -In this step of the Netdata guide, you'll learn: - -- [What a proxy is and the benefits of using one](#wait-whats-a-proxy) -- [How to connect Netdata to Nginx](#connect-netdata-to-nginx) -- [How to enable HTTPS in Nginx](#enable-https-in-nginx) -- [How to secure your Netdata dashboard with a password](#secure-your-netdata-dashboard-with-a-password) - -Let's dive in! - -## Wait. What's a proxy? - -A proxy is a middleman between the internet and a service you're running on your system. Traffic from the internet at -large enters your system through the proxy, which then routes it to the service. - -A proxy is often used to enable encrypted HTTPS connections with your browser, but they're also useful for load -balancing, performance, and password-protection. - -We'll use [Nginx](https://nginx.org/en/) for this step of the guide, but you can also use -[Caddy](https://caddyserver.com/) as a simple proxy if you prefer. - -## Required before you start - -You need three things to run a proxy using Nginx: - -- Nginx and Certbot installed on your system -- A fully qualified domain name -- A subdomain for Netdata that points to your system - -### Nginx and Certbot - -This step of the guide assumes you can install Nginx on your system. Here are the easiest methods to do so on Debian, -Ubuntu, Fedora, and CentOS systems. - -```bash -sudo apt-get install nginx # Debian/Ubuntu -sudo dnf install nginx # Fedora -sudo yum install nginx # CentOS -``` - -Check out [Nginx's installation -instructions](https://docs.nginx.com/nginx/admin-guide/installing-nginx/installing-nginx-open-source/) for details on -other Linux distributions. - -Certbot is a tool to help you create and renew certificate+key pairs for your domain. 
Visit their -[instructions](https://certbot.eff.org/instructions) to get a detailed installation process for your operating system. - -### Fully qualified domain name - -The only other true prerequisite of using a proxy is a **fully qualified domain name** (FQDN). In other words, a domain -name like `example.com`, `netdata.cloud`, or `github.com`. - -If you don't have a domain name, you won't be able to use a proxy the way we'll describe here. - -Because we strongly recommend running Netdata behind a proxy, the cost of a domain name is worth the benefit. If you -don't have a preferred domain registrar, try [Google Domains](https://domains.google/), -[Cloudflare](https://www.cloudflare.com/products/registrar/), or [Namecheap](https://www.namecheap.com/). - -### Subdomain for Netdata - -Any of the three domain registrars mentioned above, and most registrars in general, will allow you to create new DNS -entries for your domain. - -To create a subdomain for Netdata, use your registrar's DNS settings to create an A record for a `netdata` subdomain. -Point the A record to the IP address of your system. - -Once finished with the steps below, you'll be able to access your dashboard at `http://netdata.example.com`. - -## Connect Netdata to Nginx - -The first part of enabling the proxy is to create a new server for Nginx. - -Use your favorite text editor to create a file at `/etc/nginx/sites-available/netdata`, copy in the following -configuration, and change the `server_name` line to match your domain. - -```nginx -upstream backend { - server 127.0.0.1:19999; - keepalive 64; -} - -server { - listen 80; - # uncomment the line if you want nginx to listen on IPv6 address - #listen [::]:80; - - # Change `example.com` to match your domain name. 
- server_name netdata.example.com; - - location / { - proxy_set_header X-Forwarded-Host $host; - proxy_set_header X-Forwarded-Server $host; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - proxy_pass http://backend; - proxy_http_version 1.1; - proxy_pass_request_headers on; - proxy_set_header Connection "keep-alive"; - proxy_store off; - } -} -``` - -Save and close the file. - -Test your configuration file by running `sudo nginx -t`. - -If that returns no errors, it's time to make your server available. Run the command to create a symbolic link in the -`sites-enabled` directory. - -```bash -sudo ln -s /etc/nginx/sites-available/netdata /etc/nginx/sites-enabled/netdata -``` - -Finally, restart Nginx to make your changes live. Open your browser and head to `http://netdata.example.com`. You should -see your proxied Netdata dashboard! - -## Enable HTTPS in Nginx - -All this proxying doesn't mean much if we can't take advantage of one of the biggest benefits: encrypted HTTPS -connections! Let's fix that. - -Certbot will automatically get a certificate, edit your Nginx configuration, and get HTTPS running in a single step. Run -the following: - -```bash -sudo certbot --nginx -``` - -> See this error after running `sudo certbot --nginx`? -> -> ``` -> Saving debug log to /var/log/letsencrypt/letsencrypt.log -> The requested nginx plugin does not appear to be installed` -> ``` -> -> You must install `python-certbot-nginx`. On Ubuntu or Debian systems, you can run `sudo apt-get install -> python-certbot-nginx` to download and install this package. - -You'll be prompted with a few questions. At the `Which names would you like to activate HTTPS for?` question, hit -`Enter`. Next comes this question: - -```bash -Please choose whether or not to redirect HTTP traffic to HTTPS, removing HTTP access. -- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -1: No redirect - Make no further changes to the webserver configuration. 
-2: Redirect - Make all requests redirect to secure HTTPS access. Choose this for -new sites, or if you're confident your site works on HTTPS. You can undo this -change by editing your web server's configuration. -- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -``` - -You _do_ want to force HTTPS, so hit `2` and then `Enter`. Nginx will now ensure all attempts to access -`netdata.example.com` use HTTPS. - -Certbot will automatically renew your certificate whenever it's needed, so you're done configuring your proxy. Open your -browser again and navigate to `https://netdata.example.com`, and you'll land on an encrypted, proxied Netdata dashboard! - -## Secure your Netdata dashboard with a password - -Finally, let's take a moment to put your Netdata dashboard behind a password. This step is optional, but you might not -want _anyone_ to access the metrics in your proxied dashboard. - -Run the below command after changing `user` to the username you want to use to log in to your dashboard. - -```bash -sudo sh -c "echo -n 'user:' >> /etc/nginx/.htpasswd" -``` - -Then run this command to create a password: - -```bash -sudo sh -c "openssl passwd -apr1 >> /etc/nginx/.htpasswd" -``` - -You'll be prompted to create a password. Next, open your Nginx configuration file at -`/etc/nginx/sites-available/netdata` and add these two lines under `location / {`: - -```nginx - location / { - auth_basic "Restricted Content"; - auth_basic_user_file /etc/nginx/.htpasswd; - ... -``` - -Save, exit, and restart Nginx. Then try visiting your dashboard one last time. You'll see a prompt for the username and -password you just created. - -![Username/password -prompt](https://user-images.githubusercontent.com/1153921/67431031-5320bf80-f598-11e9-9573-f9f9912f1ef6.png) - -Your Netdata dashboard is now a touch more secure. - -## What's next? - -You're a real sysadmin now! 
- -If you want to configure your Nginx proxy further, check out the following: - -- [Running Netdata behind Nginx](https://github.com/netdata/netdata/blob/master/docs/Running-behind-nginx.md) -- [How to optimize Netdata's performance](https://github.com/netdata/netdata/blob/master/docs/guides/configure/performance.md) -- [Enabling TLS on Netdata's dashboard](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) - -And... you're _almost_ done with the Netdata guide. - -For some celebratory emoji and a clap on the back, head on over to our final step. - -[Next: The end. →](step-99.md) - - diff --git a/docs/guides/step-by-step/step-99.md b/docs/guides/step-by-step/step-99.md deleted file mode 100644 index 58902fee7..000000000 --- a/docs/guides/step-by-step/step-99.md +++ /dev/null @@ -1,51 +0,0 @@ -<!-- -title: "Step ∞. You're finished!" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/step-by-step/step-99.md ---> - -# Step ∞. You're finished! - -Congratulations. 🎉 - -You've completed the step-by-step Netdata guide. That means you're well on your way to becoming an expert in using -our toolkit for health monitoring and performance troubleshooting. - -But, perhaps more importantly, also that much closer to being an expert in the _fundamental skills behind health -monitoring and performance troubleshooting_, which you can take with you to any job or project. - -And that is the entire point of this guide, and Netdata's [documentation](https://learn.netdata.cloud) as a -whole—give you every resource possible to help you build faster, more resilient systems, services, and applications. - -Along the way, you learned how to: - -- Navigate Netdata's dashboard and visually detect anomalies using its charts. -- Monitor multiple systems using Netdata agents connected together with your browser and Netdata Cloud. -- Edit your `netdata.conf` file to tweak Netdata to your liking. 
-- Tune existing alarms and create entirely new ones, plus get notifications about alarms on your favorite services. -- Take advantage of Netdata's auto-detection capabilities to ensure your applications/services are monitored with - little to no configuration. -- Use advanced features within Netdata's dashboard. -- Build a custom dashboard using `dashboard.js`. -- Save more historical metrics with the database engine or archive metrics to MongoDB. -- Put Netdata behind a proxy to enable HTTPS and improve performance. - -Seems like a lot, right? Well, we hope it felt manageable and, yes, even _fun_. - -## What's next? - -Now that you're at the end of our step-by-step Netdata guide, the next steps are entirely up to you. In fact, you're -just at the beginning of your journey into health monitoring and performance troubleshooting. - -Our documentation exists to put every Netdata resource in front of you as easily and coherently as we possibly can. -Click around, search, and find new mountains to climb. - -If that feels like too much possibility to you, why not try one of these options: - -- Share your experience with Netdata and this guide. Be sure to [@mention](https://twitter.com/linuxnetdata) us on - Twitter! -- Contribute to what we do. Browse our [open issues](https://github.com/netdata/netdata/issues) and check out our - [contributions doc](https://learn.netdata.cloud/contribute/) for ideas of how you can pitch in. - -We can't wait to see what you monitor next! Bon voyage! 
⛵ - - diff --git a/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md b/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md index c79a038cc..856985ec5 100644 --- a/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md +++ b/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md @@ -1,8 +1,11 @@ <!-- title: "Monitor, troubleshoot, and debug applications with eBPF metrics" +sidebar_label: "Monitor, troubleshoot, and debug applications with eBPF metrics" description: "Use Netdata's built-in eBPF metrics collector to monitor, troubleshoot, and debug your custom application using low-level kernel feedback." image: /img/seo/guides/troubleshoot/monitor-debug-applications-ebpf.png custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/troubleshoot/monitor-debug-applications-ebpf.md +learn_status: "Published" +learn_rel_path: "Operations" --> # Monitor, troubleshoot, and debug applications with eBPF metrics @@ -83,7 +86,7 @@ to show other charts that will help you debug and troubleshoot how it interacts ## Configure the eBPF collector to monitor errors -The eBPF collector has [two possible modes](/collectors/ebpf.plugin#ebpf-load-mode): `entry` and `return`. The default +The eBPF collector has [two possible modes](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md#ebpf-load-mode): `entry` and `return`. The default is `entry`, and only monitors calls to kernel functions, but the `return` also monitors and charts _whether these calls return in error_. @@ -236,35 +239,16 @@ same application on multiple systems and want to correlate how it performs on ea findings with someone else on your team. If you don't already have a Netdata Cloud account, go [sign in](https://app.netdata.cloud) and get started for free. -Read the [get started with Cloud guide](https://github.com/netdata/netdata/blob/master/docs/cloud/get-started.mdx) for a walkthrough of -connecting nodes to and other fundamentals. 
+You can also read how to [monitor your infrastructure with Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md) to understand the key features that it has to offer. Once you've added one or more nodes to a Space in Netdata Cloud, you can see aggregated eBPF metrics in the [Overview dashboard](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) under the same **Applications** or **eBPF** sections that you -find on the local Agent dashboard. Or, [create new dashboards](https://github.com/netdata/netdata/blob/master/docs/visualize/create-dashboards.md) using eBPF metrics +find on the local Agent dashboard. Or, [create new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) using eBPF metrics from any number of distributed nodes to see how your application interacts with multiple Linux kernels on multiple Linux systems. Now that you can see eBPF metrics in Netdata Cloud, you can [invite your team](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md) and share your findings with others. -## What's next? - -Debugging and troubleshooting an application takes a special combination of practice, experience, and sheer luck. With -Netdata's eBPF metrics to back you up, you can rest assured that you see every minute detail of how your application -interacts with the Linux kernel. 
- -If you're still trying to wrap your head around what we offer, be sure to read up on our accompanying documentation and -other resources on eBPF monitoring with Netdata: - -- [eBPF collector](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md) -- [eBPF's integration with `apps.plugin`](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/README.md#integration-with-ebpf) -- [Linux eBPF monitoring with Netdata](https://www.netdata.cloud/blog/linux-ebpf-monitoring-with-netdata/) - -The scenarios described above are just the beginning when it comes to troubleshooting with eBPF metrics. We're excited -to explore others and see what our community dreams up. If you have other use cases, whether simulated or real-world, -we'd love to hear them: [info@netdata.cloud](mailto:info@netdata.cloud). - -Happy troubleshooting! diff --git a/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md b/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md index 138182e01..a0e8973f7 100644 --- a/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md +++ b/docs/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md @@ -1,11 +1,7 @@ -<!-- -title: "Troubleshoot Agent-Cloud connectivity issues" -description: "A simple guide to troubleshoot occurrences where the Agent is showing as offline after claiming." -custom_edit_url: https://github.com/netdata/netdata/edit/master/guides/troubleshoot/troubleshooting-agent-with-cloud-connection.md ---> - # Troubleshoot Agent-Cloud connectivity issues +Learn how to troubleshoot the Netdata Agent showing as offline after claiming, so you can connect the Agent to Netdata Cloud. + When you are claiming a node, you might not be able to immediately see it online in Netdata Cloud. This could be due to an error in the claiming process or a temporary outage of some services. 
@@ -13,9 +9,13 @@ We identified some scenarios that might cause this delay and possible actions yo The most common explanation for the delay usually falls into one of the following three categories: -- [The claiming process of the kickstart script was unsuccessful](#the-claiming-process-of-the-kickstart-script-was-unsuccessful) -- [Claiming on an older, deprecated version of the Agent](#claiming-on-an-older-deprecated-version-of-the-agent) -- [Network issues while connecting to the Cloud](#network-issues-while-connecting-to-the-cloud) +- [Troubleshoot Agent-Cloud connectivity issues](#troubleshoot-agent-cloud-connectivity-issues) + - [The claiming process of the kickstart script was unsuccessful](#the-claiming-process-of-the-kickstart-script-was-unsuccessful) + - [The kickstart script auto-claimed the Agent but there was no error message displayed](#the-kickstart-script-auto-claimed-the-agent-but-there-was-no-error-message-displayed) + - [Claiming on an older, deprecated version of the Agent](#claiming-on-an-older-deprecated-version-of-the-agent) + - [Network issues while connecting to the Cloud](#network-issues-while-connecting-to-the-cloud) + - [Verify that your IP is whitelisted from Netdata Cloud](#verify-that-your-ip-is-whitelisted-from-netdata-cloud) + - [Make sure that your node has internet connectivity and can resolve network domains](#make-sure-that-your-node-has-internet-connectivity-and-can-resolve-network-domains) ## The claiming process of the kickstart script was unsuccessful @@ -48,16 +48,14 @@ and you must do it manually, using the following steps: 3. Retry the kickstart claiming process. -:::note - -In some cases a simple restart of the Agent can fix the issue. -Read more about [Starting, Stopping and Restarting the Agent](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md). - -::: +> ### Note +> +> In some cases a simple restart of the Agent can fix the issue. 
+> Read more about [Starting, Stopping and Restarting the Agent](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md). ## Claiming on an older, deprecated version of the Agent -Make sure that you are using the latest version of Netdata if you are using the [Claiming script](https://learn.netdata.cloud/docs/agent/claim#claiming-script). +Make sure that you are using the latest version of Netdata if you are using the [Claiming script](https://github.com/netdata/netdata/blob/master/claim/README.md#claiming-script). With the introduction of our new architecture, Agents running versions lower than `v1.32.0` can face claiming problems, so we recommend you [update the Netdata Agent](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md) to the latest stable version. @@ -109,9 +107,7 @@ To verify this: main-ingress-545609a41fcaf5d6.elb.us-east-1.amazonaws.com has address 44.196.50.41 ``` - :::info - - There will be cases in which the firewall restricts network access. In those cases, you need to whitelist `api.netdata.cloud` and `mqtt.netdata.cloud` domains to be able to see your nodes in Netdata Cloud. - If you can't whitelist domains in your firewall, you can whitelist the IPs that the above command will produce, but keep in mind that they can change without any notice. - - ::: + > ### Info + > + > There will be cases in which the firewall restricts network access. In those cases, you need to whitelist `api.netdata.cloud` and `mqtt.netdata.cloud` domains to be able to see your nodes in Netdata Cloud. + > If you can't whitelist domains in your firewall, you can whitelist the IPs that the above command will produce, but keep in mind that they can change without any notice. 
diff --git a/docs/guides/using-host-labels.md b/docs/guides/using-host-labels.md index 7937d589b..b9b156116 100644 --- a/docs/guides/using-host-labels.md +++ b/docs/guides/using-host-labels.md @@ -1,23 +1,81 @@ -<!-- -title: "Use host labels to organize systems, metrics, and alarms" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/using-host-labels.md ---> +# Organize systems, metrics, and alerts -# Use host labels to organize systems, metrics, and alarms +When you use Netdata to monitor and troubleshoot an entire infrastructure, you need sophisticated ways of keeping everything organized. +Netdata allows you to organize your observability infrastructure with spaces, war rooms, virtual nodes, host labels, and metric labels. -When you use Netdata to monitor and troubleshoot an entire infrastructure, whether that's dozens or hundreds of systems, -you need sophisticated ways of keeping everything organized. You need alarms that adapt to the system's purpose, or -whether the parent or child in a streaming setup. You need properly-labeled metrics archiving so you can sort, -correlate, and mash-up your data to your heart's content. You need to keep tabs on ephemeral Docker containers in a -Kubernetes cluster. +## Spaces and war rooms -You need **host labels**: a powerful new way of organizing your Netdata-monitored systems. We introduced host labels in -[v1.20 of Netdata](https://blog.netdata.cloud/posts/release-1.20/), and they come pre-configured out of the box. +[Spaces](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md) are used for organization-level or infrastructure-level +grouping of nodes and people. A node can only appear in a single space, while people can have access to multiple spaces. + +The [war rooms](https://github.com/netdata/netdata/edit/master/docs/cloud/war-rooms.md) in a space bring together nodes and people in +collaboration areas. 
War rooms can also be used for fine-tuned +[role-based access control](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md). + +## Virtual nodes + +Netdata’s virtual nodes functionality allows you to define nodes in configuration files and have them be treated as regular nodes +in all of the UI, dashboards, tabs, filters etc. For example, you can create a virtual node for each of your Windows machines +and monitor them as discrete entities. Virtual nodes can help you simplify your infrastructure monitoring and focus on the +individual node that matters. + +To define your Windows server as a virtual node you need to: + + * Define virtual nodes in `/etc/netdata/vnodes/vnodes.conf` + + ```yaml + - hostname: win_server1 + guid: <value> + ``` + Just remember to use a valid GUID (on Linux you can use the `uuidgen` command to generate one; on Windows, use the `[guid]::NewGuid()` command in PowerShell). + + * Add the vnode config to the data collection job, e.g. in `go.d/windows.conf`: + ```yaml + jobs: + - name: win_server1 + vnode: win_server1 + url: http://203.0.113.10:9182/metrics + ``` + +## Host labels + +Host labels can be extremely useful when: + +- You need alarms that adapt to the system's purpose +- You need properly-labeled metrics archiving so you can sort, correlate, and mash-up your data to your heart's content. +- You need to keep tabs on ephemeral Docker containers in a Kubernetes cluster. Let's take a peek into how to create host labels and apply them across a few of Netdata's features to give you more organization power over your infrastructure. -## Create unique host labels +### Default labels + +When Netdata starts, it captures relevant information about the system and converts it into automatically generated +host labels. You can use these to logically organize your systems via health entities, exporting metrics, +parent-child status, and more. 
+ +They capture the following: + +- Kernel version +- Operating system name and version +- CPU architecture, system cores, CPU frequency, RAM, and disk space +- Whether Netdata is running inside of a container, and if so, the OS and hardware details about the container's host +- Whether Netdata is running inside K8s node +- What virtualization layer the system runs on top of, if any +- Whether the system is a streaming parent or child + +If you want to organize your systems without manually creating host labels, try the automatic labels in some of the +features below. You can see them under `http://HOST-IP:19999/api/v1/info`, beginning with an underscore `_`. +```json +{ + ... + "host_labels": { + "_is_k8s_node": "false", + "_is_parent": "false", + ... +``` + +### Custom labels Host labels are defined in `netdata.conf`. To create host labels, open that file using `edit-config`. @@ -68,28 +126,8 @@ read the status of your agent. For example, from a VPS system running Debian 10: } ``` -You may have noticed a handful of labels that begin with an underscore (`_`). These are automatic labels. - -### Automatic labels - -When Netdata starts, it captures relevant information about the system and converts them into automatically-generated -host labels. You can use these to logically organize your systems via health entities, exporting metrics, -parent-child status, and more. - -They capture the following: - -- Kernel version -- Operating system name and version -- CPU architecture, system cores, CPU frequency, RAM, and disk space -- Whether Netdata is running inside of a container, and if so, the OS and hardware details about the container's host -- Whether Netdata is running inside K8s node -- What virtualization layer the system runs on top of, if any -- Whether the system is a streaming parent or child - -If you want to organize your systems without manually creating host labels, try the automatic labels in some of the -features below. 
-## Host labels in streaming +### Host labels in streaming You may have noticed the `_is_parent` and `_is_child` automatic labels from above. Host labels are also now streamed from a child to its parent node, which concentrates an entire infrastructure's OS, hardware, container, @@ -108,7 +146,7 @@ child system. It's a vastly simplified way of accessing critical information abo You can also use `_is_parent`, `_is_child`, and any other host labels in both health entities and metrics exporting. Speaking of which... -## Host labels in health entities +### Host labels in alerts You can use host labels to logically organize your systems by their type, purpose, or location, and then apply specific alarms to them. @@ -156,7 +194,7 @@ Or when ephemeral Docker nodes are involved: Of course, there are many more possibilities for intuitively organizing your systems with host labels. See the [health documentation](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alarm-line-host-labels) for more details, and then get creative! -## Host labels in metrics exporting +### Host labels in metrics exporting If you have enabled any metrics exporting via our experimental [exporters](https://github.com/netdata/netdata/blob/master/exporting/README.md), any new host labels you created manually are sent to the destination database alongside metrics. You can change this behavior by @@ -185,28 +223,31 @@ send automatic labels = yes By applying labels to exported metrics, you can more easily parse historical metrics with the labels applied. To learn more about exporting, read the [documentation](https://github.com/netdata/netdata/blob/master/exporting/README.md). -## What's next? +## Metric labels -Host labels are a brand-new feature to Netdata, and yet they've already propagated deeply into some of its core -functionality. We're just getting started with labels, and will keep the community apprised of additional functionality -as it's made available. 
You can also track [issue #6503](https://github.com/netdata/netdata/issues/6503), which is where -the Netdata team first kicked off this work. +The Netdata aggregate charts allow you to filter and group metrics based on label name-value pairs. -It should be noted that while the Netdata dashboard does not expose either user-configured or automatic host labels, API -queries _do_ showcase this information. As always, we recommend you secure Netdata +All go.d plugin collectors support the specification of labels at the "collection job" level. Some collectors come with out of the box +labels (e.g. generic Prometheus collector, Kubernetes, Docker and more). But you can also add your own custom labels, by configuring +the data collection jobs. -- [Expose Netdata only in a private LAN](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md#expose-netdata-only-in-a-private-lan) -- [Enable TLS/SSL for web/API requests](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) -- Put Netdata behind a proxy - - [Use an authenticating web server in proxy - mode](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md#use-an-authenticating-web-server-in-proxy-mode) - - [Nginx proxy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-nginx.md) - - [Apache proxy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-apache.md) - - [Lighttpd](https://github.com/netdata/netdata/blob/master/docs/Running-behind-lighttpd.md) - - [Caddy](https://github.com/netdata/netdata/blob/master/docs/Running-behind-caddy.md) +For example, suppose we have a single Netdata agent, collecting data from two remote Apache web servers, located in different data centers. +The web servers are load balanced and provide access to the service "Payments". 
-If you have issues or questions around using host labels, don't hesitate to [file an -issue](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml) on GitHub. We're -excited to make host labels even more valuable to our users, which we can only do with your input. +You can define the following in `go.d.conf`, to be able to group the web requests by service or location: +``` +jobs: + - name: mywebserver1 + url: http://host1/server-status?auto + labels: + service: "Payments" + location: "Atlanta" + - name: mywebserver2 + url: http://host2/server-status?auto + labels: + service: "Payments" + location: "New York" +``` +Of course you may define as many custom label/value pairs as you like, in as many data collection jobs you need. diff --git a/docs/metrics-storage-management/enable-streaming.md b/docs/metrics-storage-management/enable-streaming.md new file mode 100644 index 000000000..f54ffaeba --- /dev/null +++ b/docs/metrics-storage-management/enable-streaming.md @@ -0,0 +1,228 @@ +# How metrics streaming works + +Each node running Netdata can stream the metrics it collects, in real time, to another node. Streaming allows you to +replicate metrics data across multiple nodes, or centralize all your metrics data into a single time-series database +(TSDB). + +When one node streams metrics to another, the node receiving metrics can visualize them on the dashboard, run health checks to +[trigger alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md) and +[send notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md), and +[export](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) all metrics to an external TSDB. When Netdata streams metrics to another +Netdata, the receiving one is able to perform everything a Netdata instance is capable of. 
+ +Streaming lets you decide exactly how you want to store and maintain metrics data. While we believe Netdata's +[distributed architecture](https://github.com/netdata/netdata/blob/master/docs/store/distributed-data-architecture.md) is +ideal for speed and scale, streaming provides centralization options and high data availability. + +This document will get you started quickly with streaming. More advanced concepts and suggested production deployments +can be found in the [streaming and replication reference](https://github.com/netdata/netdata/blob/master/streaming/README.md). + +## Streaming basics + +There are three types of nodes in Netdata's streaming ecosystem. + +- **Parent**: A node, running Netdata, that receives streamed metric data. +- **Child**: A node, running Netdata, that streams metric data to one or more parents. +- **Proxy**: A node, running Netdata, that receives metric data from a child and "forwards" it on to a + separate parent node. + +Netdata uses API keys, which are just random GUIDs, to authorize the communication between child and parent nodes. We +recommend using `uuidgen` for generating API keys, which can then be used across any number of streaming connections. +Or, you can generate unique API keys for each parent-child relationship. + +Once the parent node authorizes the child's API key, the child can start streaming metrics. + +It's important to note that the streaming connection uses TCP, UDP, or Unix sockets, _not HTTP_. To proxy streaming +metrics, you need to use a proxy that tunnels [OSI layer 4-7 +traffic](https://en.wikipedia.org/wiki/OSI_model#Layer_4:_Transport_Layer) without interfering with it, such as +[SOCKS](https://en.wikipedia.org/wiki/SOCKS) or Nginx's +[TCP/UDP load balancing](https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/). + +## Supported streaming configurations + +Netdata supports any combination of parent, child, and proxy nodes that you can imagine. 
Any node can act as a +parent, child, or proxy at the same time, sending or receiving streaming metrics from any number of other nodes. + +Here are a few example streaming configurations: + +- **Headless collector**: + - Child `A`, _without_ a database or web dashboard, streams metrics to parent `B`. + - `A` metrics are only available via the local Agent dashboard for `B`. + - `B` generates alarms for `A`. +- **Replication**: + - Child `A`, _with_ a database and web dashboard, streams metrics to parent `B`. + - `A` metrics are available on both local Agent dashboards, and can be stored with the same or different metrics + retention policies. + - Both `A` and `B` generate alarms. +- **Proxy**: + - Child `A`, _with or without_ a database, sends metrics to proxy `C`, also _with or without_ a database. `C` sends + metrics to parent `B`. + - Any node with a database can generate alarms. + + + +### A basic parent child setup + +![simple-parent-child](https://user-images.githubusercontent.com/43294513/232492152-11886282-29bc-401f-9577-24237e43a501.jpg) + +For a predictable number of non-ephemeral nodes, install a Netdata agent on each node and replicate its data to a +Netdata parent, preferably on a management/admin node outside your production infrastructure. +There are two variations of the basic setup: + +- When your nodes have sufficient RAM and disk IO the Netdata agents on each node can run with the default + settings for data collection and retention. + +- When your nodes have severe RAM and disk IO limitations (e.g. Raspberry Pis), you should + [optimize the Netdata agent's performance](https://github.com/netdata/netdata/blob/master/docs/guides/configure/performance.md). 
+ +[Secure your nodes](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/secure-nodes.md) to +protect them from the internet by making their UI accessible only via an nginx proxy, with potentially different subdomains +for the parent and even each child, if necessary. + +Both children and the parent are connected to the cloud, to enable infrastructure observability, +[without transferring the collected data](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md). +Requests for data are always served by a connected Netdata agent. When both a child and a parent are connected, +the cloud will always select the parent to query the user-requested data. + +### An advanced setup + +![Ephemeral nodes with two parents](https://user-images.githubusercontent.com/43294513/228891974-590bf0de-4e5a-46b2-a07a-7bb3dffde2bf.jpg) + +When the nodes are ephemeral, we recommend using two parents in an active-active setup, and having the children not store data at all. + +Both parents are configured on each child, so that if one is not available, they connect to the other. + +The children in this setup are not connected to Netdata Cloud at all, as high availability is achieved with the second parent. + +## Enable streaming between nodes + +The simplest streaming configuration is **replication**, in which a child node streams its metrics in real time to a +parent node, and both nodes retain metrics in their own databases. + +To configure replication, you need two nodes, each running Netdata. First you'll enable streaming on your parent +node, then enable streaming on your child node. When you're finished, you'll be able to see the child node's metrics in +the parent node's dashboard, quickly switch between the two dashboards, and be able to serve +[alarm notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) from either or both nodes. 
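As a rough sketch of the replication setup this section walks through, the snippet below renders the two `stream.conf` fragments involved: the parent's section named after the API key, and the child's `[stream]` section. The setting names (`enabled`, `default memory mode`, `destination`, `api key`) are the ones used in this guide. The helper names `new_api_key`, `parent_stanza`, and `child_stanza` are hypothetical and not part of Netdata, and `uuid.uuid4()` is assumed to be an acceptable stand-in for running `uuidgen`.

```python
import uuid


def new_api_key() -> str:
    """Return a random UUID string, the same kind of value `uuidgen` prints."""
    return str(uuid.uuid4())


def parent_stanza(api_key: str) -> str:
    """Render the stream.conf section a parent uses to accept a child's key.

    Hypothetical helper: it only formats text and does not touch Netdata.
    """
    return (
        f"[{api_key}]\n"
        "    enabled = yes\n"
        "    default memory mode = dbengine\n"
    )


def child_stanza(parent_address: str, api_key: str) -> str:
    """Render the [stream] section a child uses to send metrics to a parent.

    Hypothetical helper: it only formats text and does not touch Netdata.
    """
    return (
        "[stream]\n"
        "    enabled = yes\n"
        f"    destination = {parent_address}\n"
        f"    api key = {api_key}\n"
    )


if __name__ == "__main__":
    key = new_api_key()
    # The same key must appear on both sides for the parent to accept the child.
    print(parent_stanza(key))
    print(child_stanza("203.0.113.0", key))
```

The printed fragments are meant to be pasted into `stream.conf` through `edit-config` on the respective node. The sketch deliberately does not write any files or restart the Agent.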
+ +### Enable streaming on the parent node + +First, log onto the node that will act as the parent. + +Run `uuidgen` to create a new API key, which is a randomly-generated machine GUID the Netdata Agent uses to identify +itself while initiating a streaming connection. Copy that into a separate text file for later use. + +> Find out how to [install `uuidgen`](https://command-not-found.com/uuidgen) on your node if you don't already have it. + +Next, open `stream.conf` using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) +from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata +sudo ./edit-config stream.conf +``` + +Scroll down to the section beginning with `[API_KEY]`. Paste the API key you generated earlier between the brackets, so +that it looks like the following: + +```conf +[11111111-2222-3333-4444-555555555555] +``` + +Set `enabled` to `yes`, and `default memory mode` to `dbengine`. Leave all the other settings as their defaults. A +simplified version of the configuration, minus the commented lines, looks like the following: + +```conf +[11111111-2222-3333-4444-555555555555] + enabled = yes + default memory mode = dbengine +``` + +Save the file and close it, then restart Netdata with `sudo systemctl restart netdata`, or the [appropriate +method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + +### Enable streaming on the child node + +Connect to your child node with SSH. + +Open `stream.conf` again. Scroll down to the `[stream]` section and set `enabled` to `yes`. Paste the IP address of your +parent node at the end of the `destination` line, and paste the API key generated on the parent node onto the `api key` +line. + +Leave all the other settings as their defaults. 
A simplified version of the configuration, minus the commented lines, +looks like the following: + +```conf +[stream] + enabled = yes + destination = 203.0.113.0 + api key = 11111111-2222-3333-4444-555555555555 +``` + +Save the file and close it, then restart Netdata with `sudo systemctl restart netdata`, or the [appropriate +method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + +### Enable TLS/SSL on streaming (optional) + +While encrypting the connection between your parent and child nodes is recommended for security, it's not required to +get started. If you're not interested in encryption, skip ahead to [view streamed +metrics](#view-streamed-metrics-in-netdatas-dashboard). + +In this example, we'll use self-signed certificates. + +On the **parent** node, use OpenSSL to create the key and certificate, then use `chown` to make the new files readable +by the `netdata` user. + +```bash +sudo openssl req -newkey rsa:2048 -nodes -sha512 -x509 -days 365 -keyout /etc/netdata/ssl/key.pem -out /etc/netdata/ssl/cert.pem +sudo chown netdata:netdata /etc/netdata/ssl/cert.pem /etc/netdata/ssl/key.pem +``` + +Next, enforce TLS/SSL on the web server. Open `netdata.conf`, scroll down to the `[web]` section, and look for the `bind +to` setting. Add `^SSL=force` to turn on TLS/SSL. See the [web server +reference](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) for other TLS/SSL options. + +```conf +[web] + bind to = *=dashboard|registry|badges|management|streaming|netdata.conf^SSL=force +``` + +Next, connect to the **child** node and open `stream.conf`. Add `:SSL` to the end of the existing `destination` setting +to connect to the parent using TLS/SSL. Uncomment the `ssl skip certificate verification` line to allow the use of +self-signed certificates. 
+ +```conf +[stream] + enabled = yes + destination = 203.0.113.0:SSL + ssl skip certificate verification = yes + api key = 11111111-2222-3333-4444-555555555555 +``` + +Restart both the parent and child nodes with `sudo systemctl restart netdata`, or the [appropriate +method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system, to stream encrypted metrics using TLS/SSL. + +### View streamed metrics in Netdata Cloud + +In Netdata Cloud you should now be able to see a new parent showing up in the Home tab under "Nodes by data replication". +The replication factor for the child node has now increased to 2, meaning that its data is now highly available. + +You don't need to do anything else, as the cloud will automatically prefer to fetch data about the child from the parent +and switch to querying the child only when the parent is unavailable, or for some reason doesn't have the requested +data (e.g. the connection between parent and the child is broken). + +### View streamed metrics in Netdata's dashboard + +At this point, the child node is streaming its metrics in real time to its parent. Open the local Agent dashboard for +the parent by navigating to `http://PARENT-NODE:19999` in your browser, replacing `PARENT-NODE` with its IP address or +hostname. + +This dashboard shows parent metrics. To see child metrics, open the left-hand sidebar with the hamburger icon +![Hamburger icon](https://raw.githubusercontent.com/netdata/netdata-ui/master/src/components/icon/assets/hamburger.svg) +in the top panel. Both nodes appear under the **Replicated Nodes** menu. Click on either of the links to switch between +separate parent and child dashboards. 
+ +![Switching between parent and child dashboards](https://user-images.githubusercontent.com/1153921/110043346-761ec000-7d04-11eb-8e58-77670ba39161.gif) + +The child dashboard is also available directly at `http://PARENT-NODE:19999/host/CHILD-HOSTNAME`, which in this example +is `http://203.0.113.0:19999/host/netdata-child`. + diff --git a/docs/metrics-storage-management/enable-streaming.mdx b/docs/metrics-storage-management/enable-streaming.mdx deleted file mode 100644 index 3bcf19b40..000000000 --- a/docs/metrics-storage-management/enable-streaming.mdx +++ /dev/null @@ -1,158 +0,0 @@ ---- -title: "Enable streaming between nodes" -description: >- - "With metrics streaming enabled, you can not only replicate metrics data - into a second database, but also view dashboards and trigger alarm notifications - for multiple nodes in parallel." -type: "how-to" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx" -sidebar_label: "Enable streaming between nodes" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup" ---- - -# Enable streaming between nodes - -The simplest streaming configuration is **replication**, in which a child node streams its metrics in real time to a -parent node, and both nodes retain metrics in their own databases. - -To configure replication, you need two nodes, each running Netdata. First you'll first enable streaming on your parent -node, then enable streaming on your child node. When you're finished, you'll be able to see the child node's metrics in -the parent node's dashboard, quickly switch between the two dashboards, and be able to serve [alarm -notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) from either or both nodes. - -## Enable streaming on the parent node - -First, log onto the node that will act as the parent. 
- -Run `uuidgen` to create a new API key, which is a randomly-generated machine GUID the Netdata Agent uses to identify -itself while initiating a streaming connection. Copy that into a separate text file for later use. - -> Find out how to [install `uuidgen`](https://command-not-found.com/uuidgen) on your node if you don't already have it. - -Next, open `stream.conf` using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) -from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). - -```bash -cd /etc/netdata -sudo ./edit-config stream.conf -``` - -Scroll down to the section beginning with `[API_KEY]`. Paste the API key you generated earlier between the brackets, so -that it looks like the following: - -```conf -[11111111-2222-3333-4444-555555555555] -``` - -Set `enabled` to `yes`, and `default memory mode` to `dbengine`. Leave all the other settings as their defaults. A -simplified version of the configuration, minus the commented lines, looks like the following: - -```conf -[11111111-2222-3333-4444-555555555555] - enabled = yes - default memory mode = dbengine -``` - -Save the file and close it, then restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## Enable streaming on the child node - -Connect to your child node with SSH. - -Open `stream.conf` again. Scroll down to the `[stream]` section and set `enabled` to `yes`. Paste the IP address of your -parent node at the end of the `destination` line, and paste the API key generated on the parent node onto the `api key` -line. - -Leave all the other settings as their defaults. 
A simplified version of the configuration, minus the commented lines, -looks like the following: - -```conf -[stream] - enabled = yes - destination = 203.0.113.0 - api key = 11111111-2222-3333-4444-555555555555 -``` - -Save the file and close it, then restart Netdata with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## Enable TLS/SSL on streaming (optional) - -While encrypting the connection between your parent and child nodes is recommended for security, it's not required to -get started. If you're not interested in encryption, skip ahead to [view streamed -metrics](#view-streamed-metrics-in-netdatas-dashboard). - -In this example, we'll use self-signed certificates. - -On the **parent** node, use OpenSSL to create the key and certificate, then use `chown` to make the new files readable -by the `netdata` user. - -```bash -sudo openssl req -newkey rsa:2048 -nodes -sha512 -x509 -days 365 -keyout /etc/netdata/ssl/key.pem -out /etc/netdata/ssl/cert.pem -sudo chown netdata:netdata /etc/netdata/ssl/cert.pem /etc/netdata/ssl/key.pem -``` - -Next, enforce TLS/SSL on the web server. Open `netdata.conf`, scroll down to the `[web]` section, and look for the `bind -to` setting. Add `^SSL=force` to turn on TLS/SSL. See the [web server -reference](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) for other TLS/SSL options. - -```conf -[web] - bind to = *=dashboard|registry|badges|management|streaming|netdata.conf^SSL=force -``` - -Next, connect to the **child** node and open `stream.conf`. Add `:SSL` to the end of the existing `destination` setting -to connect to the parent using TLS/SSL. Uncomment the `ssl skip certificate verification` line to allow the use of -self-signed certificates. 
- -```conf -[stream] - enabled = yes - destination = 203.0.113.0:SSL - ssl skip certificate verification = yes - api key = 11111111-2222-3333-4444-555555555555 -``` - -Restart both the parent and child nodes with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system, to stream encrypted metrics using TLS/SSL. - -## View streamed metrics in Netdata's dashboard - -At this point, the child node is streaming its metrics in real time to its parent. Open the local Agent dashboard for -the parent by navigating to `http://PARENT-NODE:19999` in your browser, replacing `PARENT-NODE` with its IP address or -hostname. - -This dashboard shows parent metrics. To see child metrics, open the left-hand sidebar with the hamburger icon -![Hamburger icon](https://raw.githubusercontent.com/netdata/netdata-ui/master/src/components/icon/assets/hamburger.svg) -in the top panel. Both nodes appear under the **Replicated Nodes** menu. Click on either of the links to switch between -separate parent and child dashboards. - -![Switching between parent and child -dashboards](https://user-images.githubusercontent.com/1153921/110043346-761ec000-7d04-11eb-8e58-77670ba39161.gif) - -The child dashboard is also available directly at `http://PARENT-NODE:19999/host/CHILD-HOSTNAME`, which in this example -is `http://203.0.113.0:19999/host/netdata-child`. - -## What's next? - -Now that you have a basic streaming setup with replication, you may want to tweak the configuration to eliminate the -child database, disable the child dashboard, or enable SSL on the streaming connection between the parent and child. - -See the [streaming reference -doc](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/reference-streaming.mdx#examples) for details about -other possible configurations. 
- -When using Netdata's default TSDB (`dbengine`), the parent node maintains separate, parallel databases for itself and -every child node streaming to it. Each instance is sized identically based on the `dbengine multihost disk space` -setting in `netdata.conf`. See our doc on [changing metrics retention](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) for -details. - -### Related information & further reading - -- Streaming - - [How Netdata streams metrics](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx) - - **[Enable streaming between nodes](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx)** - - [Streaming reference](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/reference-streaming.mdx) diff --git a/docs/metrics-storage-management/how-streaming-works.mdx b/docs/metrics-storage-management/how-streaming-works.mdx deleted file mode 100644 index f181d3769..000000000 --- a/docs/metrics-storage-management/how-streaming-works.mdx +++ /dev/null @@ -1,99 +0,0 @@ ---- -title: "How metrics streaming works" -description: >- - "Netdata's real-time streaming allows you to replicate metrics data - across multiple nodes, or centralize all your metrics data into a single - time-series database (TSDB)." -type: "explanation" -custom_edit_url: "https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx" -sidebar_label: "How metrics streaming works" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Concepts" ---- - -# How metrics streaming works - -Each node running Netdata can stream the metrics it collects, in real time, to another node. Streaming allows you to -replicate metrics data across multiple nodes, or centralize all your metrics data into a single time-series database -(TSDB). 
- -When one node streams metrics to another, the node receiving metrics can visualize them on the -[dashboard](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md), run health checks to [trigger -alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md) and [send notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md), and -[export](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) all metrics to an external TSDB. In other words, the receiving node can do -everything with streamed metrics that a Netdata instance can do with its own. - -Streaming lets you decide exactly how you want to store and maintain metrics data. While we believe Netdata's -[distributed architecture](https://github.com/netdata/netdata/blob/master/docs/store/distributed-data-architecture.md) is ideal for speed and scale, streaming -provides centralization options for those who want to maintain only a single TSDB instance. - -## Streaming basics - -There are three types of nodes in Netdata's streaming ecosystem. - -- **Parent**: A node, running Netdata, that receives streamed metric data. -- **Child**: A node, running Netdata, that streams metric data to one or more parents. -- **Proxy**: A node, running Netdata, that receives metric data from a child and "forwards" it on to a - separate parent node. - -Netdata uses API keys, which are just random GUIDs, to authorize the communication between child and parent nodes. We -recommend using `uuidgen` for generating API keys, which can then be used across any number of streaming connections. -Alternatively, you can generate a unique API key for each parent-child relationship. - -Once the parent node authorizes the child's API key, the child can start streaming metrics. - -It's important to note that the streaming connection uses TCP, UDP, or Unix sockets, _not HTTP_. 
To proxy streaming -metrics, you need to use a proxy that tunnels [OSI layer 4-7 -traffic](https://en.wikipedia.org/wiki/OSI_model#Layer_4:_Transport_Layer) without interfering with it, such as -[SOCKS](https://en.wikipedia.org/wiki/SOCKS) or Nginx's [TCP/UDP load -balancing](https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/). - -## Supported streaming configurations - -Netdata supports any combination of parent, child, and proxy nodes. Any node can act as a -parent, child, or proxy at the same time, sending or receiving streaming metrics from any number of other nodes. - -Here are a few example streaming configurations: - -- **Headless collector**: - - Child `A`, _without_ a database or web dashboard, streams metrics to parent `B`. - - `A` metrics are only available via the local Agent dashboard for `B`. - - `B` generates alarms for `A`. -- **Replication**: - - Child `A`, _with_ a database and web dashboard, streams metrics to parent `B`. - - `A` metrics are available on both local Agent dashboards, and can be stored with the same or different metrics - retention policies. - - Both `A` and `B` generate alarms. -- **Proxy**: - - Child `A`, _with or without_ a database, sends metrics to proxy `C`, also _with or without_ a database. `C` sends - metrics to parent `B`. - - Any node with a database can generate alarms. - -## Viewing streamed metrics - -Parent nodes feature a **Replicated Nodes** section in the left-hand panel, which opens with the hamburger icon -![Hamburger icon](https://raw.githubusercontent.com/netdata/netdata-ui/master/src/components/icon/assets/hamburger.svg) -in the top navigation. The parent node and any child nodes appear here. Click on any of the hostnames to switch -between parent and child dashboards, all served by the parent's [web server](https://github.com/netdata/netdata/blob/master/web/server/README.md). 
- -![Switching between -](https://user-images.githubusercontent.com/1153921/110043346-761ec000-7d04-11eb-8e58-77670ba39161.gif) - -Each child dashboard is also available directly at the following URL pattern: -`http://PARENT-NODE:19999/host/CHILD-HOSTNAME`. - -## What's next? - -Now that you understand the fundamentals of streaming metrics between nodes, go ahead and [enable -streaming](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx) using a simple `parent-child` relationship. For all -the details, see the [streaming reference](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/reference-streaming.mdx) doc. - -Take your streaming setup even further by [exporting metrics](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) to an external TSDB. - -### Related information & further reading - -- Streaming - - **[How Netdata streams metrics](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx)** - - [Enable streaming between nodes](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx) - - [Streaming reference](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/reference-streaming.mdx)
\ No newline at end of file diff --git a/docs/metrics-storage-management/reference-streaming.mdx b/docs/metrics-storage-management/reference-streaming.mdx deleted file mode 100644 index 58c898639..000000000 --- a/docs/metrics-storage-management/reference-streaming.mdx +++ /dev/null @@ -1,490 +0,0 @@ ---- -title: "Streaming reference" -description: "Each node running Netdata can stream the metrics it collects, in real time, to another node. See all of the available settings in this reference document." -type: "reference" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/metrics-storage-management/reference-streaming.mdx" -sidebar_label: "Streaming reference" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "References/Configuration" ---- - -# Streaming reference - -Each node running Netdata can stream the metrics it collects, in real time, to another node. To learn more, read about -[how streaming works](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx). - -For a quickstart guide for enabling a simple `parent-child` streaming relationship, see our [stream metrics between -nodes](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/enable-streaming.mdx) doc. All other configuration options and scenarios are -covered in the sections below. - -## Configuration - -There are two files responsible for configuring Netdata's streaming capabilities: `stream.conf` and `netdata.conf`. - -From within your Netdata config directory (typically `/etc/netdata`), [use `edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) to -open either `stream.conf` or `netdata.conf`. - -``` -sudo ./edit-config stream.conf -sudo ./edit-config netdata.conf -``` - -## Settings - -As mentioned above, both `stream.conf` and `netdata.conf` contain settings relevant to streaming. 
- -### `stream.conf` - -The `stream.conf` file contains three sections. The `[stream]` section is for configuring child nodes. - -The `[API_KEY]` and `[MACHINE_GUID]` sections are both for configuring parent nodes, and share the same settings. -`[API_KEY]` settings affect every child node using that key, whereas `[MACHINE_GUID]` settings affect only the child -node with a matching GUID. - -The file `/var/lib/netdata/registry/netdata.public.unique.id` contains a random GUID that **uniquely identifies each -node**. This file is automatically generated by Netdata the first time it is started and remains unaltered forever. - -#### `[stream]` section - -| Setting | Default | Description | -| :---------------------------------------------- | :------------------------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `enabled` | `no` | Whether this node streams metrics to any parent. Change to `yes` to enable streaming. | -| [`destination`](#destination) | ` ` | A space-separated list of parent nodes to attempt to stream to, with the first available parent receiving metrics, using the following format: `[PROTOCOL:]HOST[%INTERFACE][:PORT][:SSL]`. [Read more →](#destination) | -| `ssl skip certificate verification` | `yes` | If you want to accept self-signed or expired certificates, set to `yes` and uncomment. | -| `CApath` | `/etc/ssl/certs/` | The directory where known certificates are found. Defaults to OpenSSL's default path. | -| `CAfile` | `/etc/ssl/certs/cert.pem` | Add a parent node certificate to the list of known certificates in `CAPath`. | -| `api key` | ` ` | The `API_KEY` to use as the child node. 
| -| `timeout seconds` | `60` | The timeout to connect and send metrics to a parent. | -| `default port` | `19999` | The port to use if `destination` does not specify one. | -| [`send charts matching`](#send-charts-matching) | `*` | A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) to filter which charts are streamed. [Read more →](#send-charts-matching) | -| `buffer size bytes` | `10485760` | The size of the buffer to use when sending metrics. The default `10485760` equals a buffer of 10MB, which is good for 60 seconds of data. Increase this if you expect latencies higher than that. The buffer is flushed on reconnect. | -| `reconnect delay seconds` | `5` | How long to wait until retrying to connect to the parent node. | -| `initial clock resync iterations` | `60` | Sync the clock of charts for how many seconds when starting. | - -### `[API_KEY]` and `[MACHINE_GUID]` sections - -| Setting | Default | Description | -| :---------------------------------------------- | :------------------------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `enabled` | `no` | Whether this API KEY enabled or disabled. | -| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more →](#allow-from) | -| `default history` | `3600` | The default amount of child metrics history to retain when using the `save`, `map`, or `ram` memory modes. 
| -| [`default memory mode`](#default-memory-mode) | `ram` | The [database](https://github.com/netdata/netdata/blob/master/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `map`, `save`, `ram`, or `none`. [Read more →](#default-memory-mode) | -| `health enabled by default` | `auto` | Whether alarms and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alarms only while the child is connected, `yes` always enables alarms, and `no` disables alarms. | -| `default postpone alarms on connect seconds` | `60` | Postpone alarms and notifications for a period of time after the child connects. | -| `default proxy enabled` | ` ` | Route metrics through a proxy. | -| `default proxy destination` | ` ` | Space-separated list of `IP:PORT` for proxies. | -| `default proxy api key` | ` ` | The `API_KEY` of the proxy. | -| `default send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). | - -#### `destination` - -A space-separated list of parent nodes to attempt to stream to, with the first available parent receiving metrics, using -the following format: `[PROTOCOL:]HOST[%INTERFACE][:PORT][:SSL]`. - -- `PROTOCOL`: `tcp`, `udp`, or `unix`. Parent nodes support only `tcp` and `unix`. -- `HOST`: An IPv4 or IPv6 address, a hostname, or a Unix domain socket path. Enclose IPv6 addresses in brackets: - `[ip:address]`. -- `INTERFACE` (IPv6 only): The network interface to use. -- `PORT`: The port number or service name (`/etc/services`) to use. -- `SSL`: Enables TLS/SSL encryption of the streaming connection. - -To enable TCP streaming to a parent node at `203.0.113.0` on port `20000` and with TLS/SSL encryption: - -```conf -[stream] - destination = tcp:203.0.113.0:20000:SSL -``` - -#### `send charts matching` - -A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) to filter which charts are streamed. 
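As a rough illustration of how such a pattern list is evaluated, the Python sketch below checks a name against a space-separated list from left to right, letting the first positive or `!`-negated match decide. It is a simplified model, not Netdata's matcher: only `*` is treated as a wildcard, and the function name `matches_simple_pattern` is hypothetical.

```python
import re


def matches_simple_pattern(pattern_list: str, name: str) -> bool:
    """Return True if `name` is accepted by a space-separated pattern list.

    Simplified model (an assumption, not Netdata's implementation): only `*`
    is a wildcard, a leading `!` negates a pattern, and the first pattern
    that matches decides the outcome.
    """
    for pattern in pattern_list.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        # Translate `*` into "any run of characters"; escape everything else.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, name):
            return not negative
    # No pattern matched, so the name is rejected.
    return False


print(matches_simple_pattern("apps.cpu system.*", "system.cpu"))  # True
print(matches_simple_pattern("apps.cpu system.*", "disk.io"))     # False
print(matches_simple_pattern("!apps.cpu *", "apps.cpu"))          # False
print(matches_simple_pattern("!apps.cpu *", "system.ram"))        # True
```

Because the first match wins, a negative pattern only has an effect when it appears before a broader positive pattern such as `*`.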
- -The default is a single wildcard `*`, which streams all charts. - -To send only a few charts, list them explicitly, or list a group using a wildcard. To send _only_ the `apps.cpu` chart -and charts with contexts beginning with `system.`: - -```conf -[stream] - send charts matching = apps.cpu system.* -``` - -To send all but a few charts, use `!` to create a negative match. To send _all_ charts _but_ `apps.cpu`: - -```conf -[stream] - send charts matching = !apps.cpu * -``` - -#### `allow from` - -A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) matching the IPs of nodes that -will stream metrics using this API key. The order is important, left to right, as the first positive or negative match is used. - -The default is `*`, which accepts all requests including the `API_KEY`. - -To allow from only a specific IP address: - -```conf -[API_KEY] - allow from = 203.0.113.10 -``` - -To allow all IPs starting with `10.*`, except `10.1.2.3`: - -```conf -[API_KEY] - allow from = !10.1.2.3 10.* -``` - -> If you set specific IP addresses here, and also use the `allow connections` setting in the `[web]` section of -> `netdata.conf`, be sure to add the IP address there so that it can access the API port. - -#### `default memory mode` - -The [database](https://github.com/netdata/netdata/blob/master/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, -`save`, `map`, or `none`. - -- `dbengine`: The default, recommended time-series database (TSDB) for Netdata. Stores recent metrics in memory, then - efficiently spills them to disk for long-term storage. -- `ram`: Stores metrics _only_ in memory, which means metrics are lost when Netdata stops or restarts. Ideal for - streaming configurations that use ephemeral nodes. 
-- `save`: Stores metrics in memory, but saves metrics to disk when Netdata stops or restarts, and loads historical - metrics on start. -- `map`: Stores metrics in memory-mapped files, like swap, with constant disk write. -- `none`: No database. - -When using `default memory mode = dbengine`, the parent node creates a separate instance of the TSDB to store metrics -from child nodes. The [size of _each_ instance is configurable](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) with the `page -cache size` and `dbengine multihost disk space` settings in the `[global]` section in `netdata.conf`. - -### `netdata.conf` - -| Setting | Default | Description | -| :----------------------------------------- | :---------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **`[global]` section** | | | -| `memory mode` | `dbengine` | Determines the [database type](https://github.com/netdata/netdata/blob/master/database/README.md) to be used on that node. Other options settings include `none`, `ram`, `save`, and `map`. `none` disables the database at this host. This also disables alarms and notifications, as those can't run without a database. | -| **`[web]` section** | | | -| `mode` | `static-threaded` | Determines the [web server](https://github.com/netdata/netdata/blob/master/web/server/README.md) type. The other option is `none`, which disables the dashboard, API, and registry. | -| `accept a streaming request every seconds` | `0` | Set a limit on how often a parent node accepts streaming requests from child nodes. `0` equals no limit. If this is set, you may see `... too busy to accept new streaming request. Will be allowed in X secs` in Netdata's `error.log`. 
| - -## Examples - -### Per-child settings - -While the `[API_KEY]` section applies settings for any child node using that key, you can also use per-child settings -with the `[MACHINE_GUID]` section. - -For example, the metrics streamed from only the child node with `MACHINE_GUID` are saved in memory, not using the -default `dbengine` as specified by the `API_KEY`, and alarms are disabled. - -```conf -[API_KEY] - enabled = yes - default memory mode = dbengine - health enabled by default = auto - allow from = * - -[MACHINE_GUID] - enabled = yes - memory mode = save - health enabled = no -``` - -### Securing streaming with TLS/SSL - -Netdata does not activate TLS encryption by default. To encrypt streaming connections, you first need to [enable TLS -support](https://github.com/netdata/netdata/blob/master/web/server/README.md#enabling-tls-support) on the parent. With encryption enabled on the receiving side, you -need to instruct the child to use TLS/SSL as well. On the child's `stream.conf`, configure the destination as follows: - -``` -[stream] - destination = host:port:SSL -``` - -The word `SSL` appended to the end of the destination tells the child that connections must be encrypted. - -> While Netdata uses Transport Layer Security (TLS) 1.2 to encrypt communications rather than the obsolete SSL protocol, -> it's still common practice to refer to encrypted web connections as `SSL`. Many vendors, like Nginx and even Netdata -> itself, use `SSL` in configuration files, whereas documentation will always refer to encrypted communications as `TLS` -> or `TLS/SSL`. - -#### Certificate verification - -When TLS/SSL is enabled on the child, the default behavior will be to not connect with the parent unless the server's -certificate can be verified via the default chain. 
In case you want to avoid this check, add the following to the -child's `stream.conf` file: - -``` -[stream] - ssl skip certificate verification = yes -``` - -#### Trusted certificate - -If you've enabled [certificate verification](#certificate-verification), you might see errors from the OpenSSL library -when there's a problem with checking the certificate chain (`X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY`). More -importantly, OpenSSL will reject self-signed certificates. - -Given these known issues, you can either skip verification as shown above or, if you trust your certificate, set the -`CApath` and `CAfile` options to tell Netdata where your certificates and the trusted certificate file are stored. - -For more details about these options, you can read about [verify -locations](https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_load_verify_locations.html). - -Before you change your streaming configuration, you need to copy your trusted certificate to your child system and add -the certificate to OpenSSL's list. - -On most Linux distributions, the `update-ca-certificates` command searches inside the `/usr/share/ca-certificates` -directory for certificates. You should double-check by reading the `update-ca-certificates` manual (`man -update-ca-certificates`), and then change the directory in the commands below if needed. - -If you have `sudo` configured on your child system, you can use that to run the following commands. If not, you'll have -to log in as `root` to complete them. - -``` -# mkdir /usr/share/ca-certificates/netdata -# cp parent_cert.pem /usr/share/ca-certificates/netdata/parent_cert.crt -# chown -R netdata:netdata /usr/share/ca-certificates/netdata/ -``` - -First, you create a new directory to store your certificates for Netdata. Next, you need to change the extension on your -certificate from `.pem` to `.crt` so it's compatible with `update-ca-certificates`. 
Finally, you need to change -ownership so the user that runs Netdata can access the directory where you copied in your certificate. - -Next, edit the file `/etc/ca-certificates.conf` and add the following line: - -``` -netdata/parent_cert.crt -``` - -Now update the list of certificates by running the following, again either with `sudo` or as `root`: - -``` -# update-ca-certificates -``` - -> Some Linux distributions have different methods of updating the certificate list. For more details, please read this -> guide on [adding trusted root certificates](https://github.com/Busindre/How-to-Add-trusted-root-certificates). - -Once you update your certificate list, you can set the stream parameters for Netdata to trust the parent certificate. -Open `stream.conf` for editing and change the following lines: - -``` -[stream] - CApath = /etc/ssl/certs/ - CAfile = /etc/ssl/certs/parent_cert.pem -``` - -With this configuration, the `CApath` option tells Netdata to search for trusted certificates inside `/etc/ssl/certs`. -The `CAfile` option specifies that the Netdata parent certificate is located at `/etc/ssl/certs/parent_cert.pem`. With this -configuration, you can skip using the system's entire list of certificates and use Netdata's parent certificate instead. - -#### Expected behaviors - -With the introduction of TLS/SSL, the parent-child communication behaves as shown in the table below, depending on the -following configurations: - -- **Parent TLS (Yes/No)**: Whether the `[web]` section in `netdata.conf` has `ssl key` and `ssl certificate`. -- **Parent port TLS (-/force/optional)**: Depends on whether the `[web]` section `bind to` contains a `^SSL=force` or - `^SSL=optional` directive on the port(s) used for streaming. -- **Child TLS (Yes/No)**: Whether the destination in the child's `stream.conf` has `:SSL` at the end. -- **Child TLS Verification (yes/no)**: Value of the child's `stream.conf` `ssl skip certificate verification` - parameter (default is no). 
- -| Parent TLS enabled | Parent port SSL | Child TLS | Child SSL Ver. | Behavior | -| :----------------- | :--------------- | :-------- | :------------- | :--------------------------------------------------------------------------------------------------------------------------------------- | -| No | - | No | no | Legacy behavior. The parent-child stream is unencrypted. | -| Yes | force | No | no | The parent rejects the child connection. | -| Yes | -/optional | No | no | The parent-child stream is unencrypted (expected situation for legacy child nodes and newer parent nodes) | -| Yes | -/force/optional | Yes | no | The parent-child stream is encrypted, provided that the parent has a valid TLS/SSL certificate. Otherwise, the child refuses to connect. | -| Yes | -/force/optional | Yes | yes | The parent-child stream is encrypted. | - -### Proxy - -A proxy is a node that receives metrics from a child, then streams them onward to a parent. To configure a proxy, -configure it as a receiving and a sending Netdata at the same time. - -Netdata proxies may or may not maintain a database for the metrics passing through them. When they maintain a database, -they can also run health checks (alarms and notifications) for the remote host that is streaming the metrics. - -In the following example, the proxy receives metrics from a child node using the `API_KEY` of -`66666666-7777-8888-9999-000000000000`, then stores metrics using `dbengine`. It then uses the `API_KEY` of -`11111111-2222-3333-4444-555555555555` to proxy those same metrics on to a parent node at `203.0.113.0`. 
- -```conf -[stream] - enabled = yes - destination = 203.0.113.0 - api key = 11111111-2222-3333-4444-555555555555 - -[66666666-7777-8888-9999-000000000000] - enabled = yes - default memory mode = dbengine -``` - -### Ephemeral nodes - -Netdata can help you monitor ephemeral nodes, such as containers in an auto-scaling infrastructure, by always streaming -metrics to any number of permanently-running parent nodes. - -On the parent, set the following in `stream.conf`: - -```conf -[11111111-2222-3333-4444-555555555555] - # enable/disable this API key - enabled = yes - - # one hour of data for each of the child nodes - default history = 3600 - - # do not save child metrics on disk - default memory mode = ram - - # alarm checks, only while the child is connected - health enabled by default = auto -``` - -On the child nodes, set the following in `stream.conf`: - -```conf -[stream] - # stream metrics to another Netdata - enabled = yes - - # the IP and PORT of the parent - destination = 10.11.12.13:19999 - - # the API key to use - api key = 11111111-2222-3333-4444-555555555555 -``` - -In addition, edit `netdata.conf` on each child node to disable the database and alarms. - -```conf -[global] - # disable the local database - memory mode = none - -[health] - # disable health checks - enabled = no -``` - -## Troubleshooting - -Both parent and child nodes log information at `/var/log/netdata/error.log`. - -If the child manages to connect to the parent, you will see something like the following on the parent: - -``` -2017-03-09 09:38:52: netdata: INFO : STREAM [receive from [10.11.12.86]:38564]: new client connection. 
-2017-03-09 09:38:52: netdata: INFO : STREAM xxx [10.11.12.86]:38564: receive thread created (task id 27721) -2017-03-09 09:38:52: netdata: INFO : STREAM xxx [receive from [10.11.12.86]:38564]: client willing to stream metrics for host 'xxx' with machine_guid '1234567-1976-11e6-ae19-7cdd9077342a': update every = 1, history = 3600, memory mode = ram, health auto -2017-03-09 09:38:52: netdata: INFO : STREAM xxx [receive from [10.11.12.86]:38564]: initializing communication... -2017-03-09 09:38:52: netdata: INFO : STREAM xxx [receive from [10.11.12.86]:38564]: receiving metrics... -``` - -and something like this on the child: - -``` -2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: connecting... -2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: initializing communication... -2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: waiting response from remote netdata... -2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: established communication - sending metrics... -``` - -The following sections describe the most common issues you might encounter when connecting parent and child nodes. - -### Slow connections between parent and child - -When you have a slow connection between parent and child, Netdata raises a few different errors. Most of the -errors will appear in the child's `error.log`. - -```bash -netdata ERROR : STREAM_SENDER[CHILD HOSTNAME] : STREAM CHILD HOSTNAME [send to PARENT IP:PARENT PORT]: too many data pending - buffer is X bytes long, -Y unsent - we have sent Z bytes in total, W on this connection. Closing connection to flush the data. -``` - -On the parent side, you may see various error messages, most commonly the following: - -``` -netdata ERROR : STREAM_PARENT[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : read failed: end of file -``` - -Another common problem in slow connections is the child sending a partial message to the parent. 
In this case, the -parent will write the following to its `error.log`: - -``` -ERROR : STREAM_RECEIVER[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : sent command 'B' which is not known by netdata, for host 'HOSTNAME'. Disabling it. -``` - -In this example, `B` was part of a `BEGIN` message that was cut due to connection problems. - -Slow connections can also cause problems when the parent misses a message and then receives a command related to the -missed message. For example, a parent might miss a message containing the child's charts, and then doesn't know -what to do with the `SET` message that follows. When that happens, the parent will show a message like this: - -``` -ERROR : STREAM_RECEIVER[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : requested a SET on chart 'CHART NAME' of host 'HOSTNAME', without a dimension. Disabling it. -``` - -### Child cannot connect to parent - -When the child can't connect to a parent for any reason (misconfiguration, networking, firewalls, parent -down), you will see the following in the child's `error.log`. - -``` -ERROR : STREAM_SENDER[HOSTNAME] : Failed to connect to 'PARENT IP', port 'PARENT PORT' (errno 113, No route to host) -``` - -### 'Is this a Netdata?' - -This question can appear when Netdata starts the stream and receives an unexpected response. This error can appear when -the parent is using SSL and the child tries to connect using plain text. You will also see this message when -Netdata connects to another server that isn't Netdata. The complete error message will look like this: - -``` -ERROR : STREAM_SENDER[CHILD HOSTNAME] : STREAM child HOSTNAME [send to PARENT HOSTNAME:PARENT PORT]: server is not replying properly (is it a netdata?). -``` - -### Stream charts wrong - -Chart data needs to be consistent between child and parent nodes. If there are differences between chart data on -a parent and a child, such as gaps in metrics collection, it most often means your child's `memory mode` -does not match the parent's. 
To learn more about the different ways Netdata can store metrics, and thus keep chart -data consistent, read our [memory mode documentation](https://github.com/netdata/netdata/blob/master/database/README.md). - -### Forbidding access - -You may see errors about "forbidding access" for a number of reasons. It could be because of a slow connection between -the parent and child nodes, but it could also be due to other failures. Look in your parent's `error.log` for errors -that look like this: - -``` -STREAM [receive from [child HOSTNAME]:child IP]: `MESSAGE`. Forbidding access." -``` - -`MESSAGE` will have one of the following patterns: - -- `request without KEY` : The message received is incomplete and the KEY value can be API, hostname, machine GUID. -- `API key 'VALUE' is not valid GUID`: The UUID received from child does not have the format defined in [RFC - 4122](https://tools.ietf.org/html/rfc4122) -- `machine GUID 'VALUE' is not GUID.`: This error with machine GUID is like the previous one. -- `API key 'VALUE' is not allowed`: This stream has a wrong API key. -- `API key 'VALUE' is not permitted from this IP`: The IP is not allowed to use STREAM with this parent. -- `machine GUID 'VALUE' is not allowed.`: The GUID that is trying to send stream is not allowed. -- `Machine GUID 'VALUE' is not permitted from this IP. `: The IP does not match the pattern or IP allowed to connect to - use stream. - -### Netdata could not create a stream - -The connection between parent and child is a stream. When the parent can't convert the initial connection into -a stream, it will write the following message inside `error.log`: - -``` -file descriptor given is not a valid stream -``` - -After logging this error, Netdata will close the stream. 
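Several of the "forbidding access" messages listed earlier come down to an API key or machine GUID that is not a well-formed UUID. As a quick pre-flight check, the Python sketch below generates a random key (a stand-in for the `uuidgen` command) and tests whether a string is in the canonical 8-4-4-4-12 hexadecimal form. The helper name `is_valid_guid` is hypothetical, and the check looks only at the textual form, so treat it as a sanity check under that assumption rather than a reproduction of Netdata's own validation.

```python
import uuid


def is_valid_guid(value: str) -> bool:
    """Return True if `value` is in the canonical 8-4-4-4-12 UUID form."""
    try:
        # uuid.UUID accepts several spellings (braces, no hyphens, URN prefix),
        # so compare against the canonical form to reject those variants.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


api_key = str(uuid.uuid4())  # random key, similar in shape to uuidgen output
print(api_key, is_valid_guid(api_key))

print(is_valid_guid("11111111-2222-3333-4444-555555555555"))  # True
# The first group has only 7 characters, so this one is rejected.
print(is_valid_guid("1234567-1976-11e6-ae19-7cdd9077342a"))   # False
```

Running a check like this before pasting a key into `stream.conf` on both nodes may save a round of log reading on the parent.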
diff --git a/docs/monitor/configure-alarms.md b/docs/monitor/configure-alarms.md deleted file mode 100644 index 4b5b8134e..000000000 --- a/docs/monitor/configure-alarms.md +++ /dev/null @@ -1,152 +0,0 @@ -<!-- -title: "Configure health alarms" -description: "Netdata's health monitoring watchdog is incredibly adaptable to your infrastructure's unique needs, with configurable health alarms." -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/monitor/configure-alarms.md" -sidebar_label: "Configure health alarms" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup" ---> - -# Configure health alarms - -Netdata's health watchdog is highly configurable, with support for dynamic thresholds, hysteresis, alarm templates, and -more. You can tweak any of the existing alarms based on your infrastructure's topology or specific monitoring needs, or -create new entities. - -You can use health alarms in conjunction with any of Netdata's [collectors](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md) (see -the [supported collector list](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md)) to monitor the health of your systems, containers, and -applications in real time. - -While you can see active alarms both on the local dashboard and Netdata Cloud, all health alarms are configured _per -node_ via individual Netdata Agents. If you want to deploy a new alarm across your -[infrastructure](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md), you must configure each node with the same health configuration -files. - -## Edit health configuration files - -All of Netdata's [health configuration files](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#health-configuration-files) are in Netdata's config -directory, inside the `health.d/` directory. 
Navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) and -use `edit-config` to make changes to any of these files. - -For example, to edit the `cpu.conf` health configuration file, run: - -```bash -sudo ./edit-config health.d/cpu.conf -``` - -Each health configuration file contains one or more health _entities_, which always begin with `alarm:` or `template:`. -For example, here is the first health entity in `health.d/cpu.conf`: - -```yaml -template: 10min_cpu_usage - on: system.cpu - os: linux - hosts: * - lookup: average -10m unaligned of user,system,softirq,irq,guest - units: % - every: 1m - warn: $this > (($status >= $WARNING) ? (75) : (85)) - crit: $this > (($status == $CRITICAL) ? (85) : (95)) - delay: down 15m multiplier 1.5 max 1h - info: average cpu utilization for the last 10 minutes (excluding iowait, nice and steal) - to: sysadmin -``` - -To tune this alarm to trigger warning and critical alarms at a lower CPU utilization, change the `warn` and `crit` lines -to the values of your choosing. For example: - -```yaml - warn: $this > (($status >= $WARNING) ? (60) : (75)) - crit: $this > (($status == $CRITICAL) ? (75) : (85)) -``` - -Save the file and [reload Netdata's health configuration](#reload-health-configuration) to make your changes live. - -### Silence an individual alarm - -Instead of disabling an alarm altogether, or even disabling _all_ alarms, you can silence individual alarms by changing -one line in a given health entity. To silence any single alarm, change the `to:` line in its entity to `silent`. - -```yaml - to: silent -``` - -## Write a new health entity - -While tuning existing alarms may work in some cases, you may need to write entirely new health entities based on how -your systems, containers, and applications work. 
- -Read Netdata's [health reference](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#health-entity-reference) for a full listing of the format, -syntax, and functionality of health entities. - -To write a new health entity into a new file, navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), -then use `touch` to create a new file in the `health.d/` directory. Use `edit-config` to start editing the file. - -As an example, let's create a `ram-usage.conf` file. - -```bash -sudo touch health.d/ram-usage.conf -sudo ./edit-config health.d/ram-usage.conf -``` - -For example, here is a health entity that triggers a warning alarm when a node's RAM usage rises above 80%, and a -critical alarm above 90%: - -```yaml - alarm: ram_usage - on: system.ram -lookup: average -1m percentage of used - units: % - every: 1m - warn: $this > 80 - crit: $this > 90 - info: The percentage of RAM being used by the system. -``` - -Let's look into each of the lines to see how they create a working health entity. - -- `alarm`: The name for your new entity. The name needs to follow these requirements: - - Any alphabet letter or number. - - The symbols `.` and `_`. - - Cannot be `chart name`, `dimension name`, `family name`, or `chart variable names`. -- `on`: Which chart the entity listens to. -- `lookup`: Which metrics the alarm monitors, the duration of time to monitor, and how to process the metrics into a - usable format. - - `average`: Calculate the average of all the metrics collected. - - `-1m`: Use metrics from 1 minute ago until now to calculate that average. - - `percentage`: Clarify that we're calculating a percentage of RAM usage. - - `of used`: Specify which dimension (`used`) on the `system.ram` chart you want to monitor with this entity. -- `units`: Use percentages rather than absolute units. -- `every`: How often to perform the `lookup` calculation to decide whether or not to trigger this alarm. 
-- `warn`/`crit`: The value at which Netdata should trigger a warning or critical alarm. This example uses simple - syntax, but most pre-configured health entities use - [hysteresis](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#special-use-of-the-conditional-operator) to avoid superfluous notifications. -- `info`: A description of the alarm, which will appear in the dashboard and notifications. - -In human-readable format: - -> This health entity, named **ram_usage**, watches the **system.ram** chart. It looks up the last **1 minute** of -> metrics from the **used** dimension and calculates the **average** of all those metrics in a **percentage** format, -> using a **% unit**. The entity performs this lookup **every minute**. -> -> If the average RAM usage percentage over the last 1 minute is **more than 80%**, the entity triggers a warning alarm. -> If the usage is **more than 90%**, the entity triggers a critical alarm. - -When you finish writing this new health entity, [reload Netdata's health configuration](#reload-health-configuration) to -see it live on the local dashboard or Netdata Cloud. - -## Reload health configuration - -To make any changes to your health configuration live, you must reload Netdata's health monitoring system. To do that -without restarting all of Netdata, run `netdatacli reload-health` or `killall -USR2 netdata`. - -## What's next? - -With your health entities configured properly, it's time to [enable -notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to get notified whenever a node reaches a warning or critical -state. - -To build complex, dynamic alarms, read our guide on [dimension templates](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/dimension-templates.md). 
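
The hysteresis used by the `10min_cpu_usage` template in the removed text above (`warn: $this > (($status >= $WARNING) ? (75) : (85))`) can be easier to follow when traced by hand. The sketch below is a simplified model, not Netdata's evaluator: the `Status` enum and its ordering, the `next_status` function, and the rule that a critical match takes precedence over a warning match are assumptions made for illustration, and the `delay` line is ignored entirely.

```python
from enum import IntEnum


class Status(IntEnum):
    # Assumed ordering for illustration only: CLEAR < WARNING < CRITICAL.
    CLEAR = 0
    WARNING = 1
    CRITICAL = 2


def next_status(value, status):
    """Model the warn/crit expressions of the 10min_cpu_usage template.

    warn: $this > (($status >= $WARNING)  ? (75) : (85))
    crit: $this > (($status == $CRITICAL) ? (85) : (95))
    """
    # Once the alarm is raised, the lower threshold applies, so the
    # value must drop further before the alarm clears.
    warn_threshold = 75 if status >= Status.WARNING else 85
    crit_threshold = 85 if status == Status.CRITICAL else 95
    if value > crit_threshold:
        return Status.CRITICAL
    if value > warn_threshold:
        return Status.WARNING
    return Status.CLEAR


def trace(values, status=Status.CLEAR):
    """Feed successive lookup results through next_status and collect the states."""
    states = []
    for value in values:
        status = next_status(value, status)
        states.append(status)
    return states


# 80% alone does not raise a warning, but once 90% has raised one,
# a drop back to 80% keeps it raised until usage falls to 75% or below.
print([s.name for s in trace([80, 90, 80, 70])])
```

The point of the two thresholds is that a value hovering around 85% does not flip the alarm on and off at every evaluation, which is what the text means by avoiding superfluous notifications.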
- - diff --git a/docs/monitor/enable-notifications.md b/docs/monitor/enable-notifications.md index 99c24b64e..1174561cf 100644 --- a/docs/monitor/enable-notifications.md +++ b/docs/monitor/enable-notifications.md @@ -1,51 +1,49 @@ <!-- -title: "Enable alarm notifications" +title: "Alert notifications" description: "Send Netdata alarms from a centralized place with Netdata Cloud, or configure nodes individually, to enable incident response and faster resolution." custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/monitor/enable-notifications.md" -sidebar_label: "Enable alarm notifications" +sidebar_label: "Notify" learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup" +learn_rel_path: "Integrations/Notify" --> -# Enable alarm notifications +# Alert notifications -Netdata offers two ways to receive alarm notifications on external platforms. These methods work independently _or_ in -parallel, which means you can enable both at the same time to send alarm notifications to any number of endpoints. +Netdata offers two ways to receive alert notifications on external platforms. These methods work independently _or_ in +parallel, which means you can enable both at the same time to send alert notifications to any number of endpoints. -Both methods use a node's health alarms to generate the content of alarm notifications. Read the doc on [configuring -alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) to change the preconfigured thresholds or to create tailored alarms for your +Both methods use a node's health alerts to generate the content of alert notifications. Read our documentation on [configuring alerts](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md) to change the preconfigured thresholds or to create tailored alerts for your infrastructure. 
-Netdata Cloud offers [centralized alarm notifications](#netdata-cloud) via email, which leverages the health status +Netdata Cloud offers [centralized alert notifications](#netdata-cloud) via email, which leverages the health status information already streamed to Netdata Cloud from connected nodes to send notifications to those who have enabled them. The Netdata Agent has a [notification system](#netdata-agent) that supports more than a dozen services, such as email, Slack, PagerDuty, Twilio, Amazon SNS, Discord, and much more. -For example, use centralized alarm notifications in Netdata Cloud for immediate, zero-configuration alarm notifications +For example, use centralized alert notifications in Netdata Cloud for immediate, zero-configuration alert notifications for your team, then configure individual nodes send notifications to a PagerDuty endpoint for an automated incident response process. ## Netdata Cloud -Netdata Cloud's [centralized alarm -notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) is a zero-configuration way to +Netdata Cloud's [centralized alert +notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md) is a zero-configuration way to get notified when an anomaly or incident strikes any node or application in your infrastructure. The advantage of using -centralized alarm notifications from Netdata Cloud is that you don't have to worry about configuring each node in your +centralized alert notifications from Netdata Cloud is that you don't have to worry about configuring each node in your infrastructure. -To enable centralized alarm notifications for a Space, click on **Manage Space** in the left-hand menu, then click on +To enable centralized alert notifications for a Space, click on **Manage Space** in the left-hand menu, then click on the **Notifications** tab. 
Click the toggle switch next to **E-mail** to enable this notification method. Next, enable notifications on a user level by clicking on your profile icon, then **Profile** in the dropdown. The **Notifications** tab reveals rich management settings, including the ability to enable/disable methods entirely or choose what types of notifications to receive from each War Room. -![Enabling and configuring alarm notifications in Netdata +![Enabling and configuring alert notifications in Netdata Cloud](https://user-images.githubusercontent.com/1153921/101936280-93c50900-3b9d-11eb-9ba0-d6927fa872b7.gif) -See the [centralized alarm notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) +See the [centralized alert notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md) reference doc for further details about what information is conveyed in an email notification, flood protection, and more. @@ -53,7 +51,7 @@ more. The Netdata Agent's [notification system](https://github.com/netdata/netdata/blob/master/health/notifications/README.md) runs on every node and dispatches notifications based on configured endpoints and roles. You can enable multiple endpoints on any one node _and_ use Agent -notifications in parallel with centralized alarm notifications in Netdata Cloud. +notifications in parallel with centralized alert notifications in Netdata Cloud. > ❗ If you want to enable notifications from multiple nodes in your infrastructure, each running the Netdata Agent, you > must configure each node individually. @@ -70,7 +68,6 @@ notification platform. 
- [**Dynatrace**](https://github.com/netdata/netdata/blob/master/health/notifications/dynatrace/README.md) - [**Email**](https://github.com/netdata/netdata/blob/master/health/notifications/email/README.md) - [**Flock**](https://github.com/netdata/netdata/blob/master/health/notifications/flock/README.md) -- [**Google Hangouts**](https://github.com/netdata/netdata/blob/master/health/notifications/hangouts/README.md) - [**Gotify**](https://github.com/netdata/netdata/blob/master/health/notifications/gotify/README.md) - [**IRC**](https://github.com/netdata/netdata/blob/master/health/notifications/irc/README.md) - [**Kavenegar**](https://github.com/netdata/netdata/blob/master/health/notifications/kavenegar/README.md) @@ -91,61 +88,4 @@ notification platform. - [**Telegram**](https://github.com/netdata/netdata/blob/master/health/notifications/telegram/README.md) - [**Twilio**](https://github.com/netdata/netdata/blob/master/health/notifications/twilio/README.md) -### Enable Slack notifications - -First, [Add an incoming webhook](https://slack.com/apps/A0F7XDUAZ-incoming-webhooks) in Slack for the channel where you -want to see alarm notifications from Netdata. Click the green **Add to Slack** button, choose the channel, and click the -**Add Incoming WebHooks Integration** button. - -On the following page, you'll receive a **Webhook URL**. That's what you'll need to configure Netdata, so keep it handy. - -Navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) and use `edit-config` to -open the `health_alarm_notify.conf` file: - -```bash -sudo ./edit-config health_alarm_notify.conf -``` - -Look for the `SLACK_WEBHOOK_URL=" "` line and add the incoming webhook URL you got from Slack: - -```conf -SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXX" -``` - -A few lines down, edit the `DEFAULT_RECIPIENT_SLACK` line to contain a single hash `#` character. 
This instructs Netdata -to send a notification to the channel you configured with the incoming webhook. - -```conf -DEFAULT_RECIPIENT_SLACK="#" -``` - -To test Slack notifications, switch to the Netdata user. - -```bash -sudo su -s /bin/bash netdata -``` - -Next, run the `alarm-notify` script using the `test` option. - -```bash -/usr/libexec/netdata/plugins.d/alarm-notify.sh test -``` - -You should receive three notifications in your Slack channel for each health status change: `WARNING`, `CRITICAL`, and -`CLEAR`. - -See the [Agent Slack notifications](https://github.com/netdata/netdata/blob/master/health/notifications/slack/README.md) doc for more options and information. - -## What's next? - -Now that you have health entities configured to your infrastructure's needs and notifications to inform you of anomalies -or incidents, your health monitoring setup is complete. - -To make your dashboards most useful during root cause analysis, use Netdata's [distributed data -architecture](https://github.com/netdata/netdata/blob/master/docs/store/distributed-data-architecture.md) for the best-in-class performance and scalability. - -### Related reference documentation - -- [Netdata Cloud · Alarm notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.mdx) -- [Netdata Agent · Notifications](https://github.com/netdata/netdata/blob/master/health/notifications/README.md) diff --git a/docs/monitor/view-active-alarms.md b/docs/monitor/view-active-alarms.md index 07c22fe12..cc6a2d3a1 100644 --- a/docs/monitor/view-active-alarms.md +++ b/docs/monitor/view-active-alarms.md @@ -1,45 +1,46 @@ -<!-- -title: "View active health alarms" -description: "View active alarms and their rich data to discover and resolve anomalies and performance issues across your infrastructure." 
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/monitor/view-active-alarms.md" -sidebar_label: "View active health alarms" -learn_status: "Published" -learn_topic_type: "Concepts" -learn_rel_path: "Operations/Alerts" ---> +# View active alerts -# View active health alarms +Netdata comes with hundreds of pre-configured health alerts designed to notify you when an anomaly or performance issue affects your node or its applications. -Every Netdata Agent comes with hundreds of pre-installed health alarms designed to notify you when an anomaly or -performance issue affects your node or the applications it runs. +From the Alerts tab you can see all the active alerts in your War Room. You will be presented with a table having information about each alert that is in warning and critical state. +You can always sort the table by a certain column by clicking on the name of that column, and use the gear icon on the top right to control which columns are visible at any given time. -## Netdata Cloud +![image](https://user-images.githubusercontent.com/70198089/226340574-7e138dc7-5eab-4c47-a4a9-5f2640e38643.png) -A War Room's [alarms indicator](https://learn.netdata.cloud/docs/cloud/war-rooms#indicators) displays the number of -active `critical` (red) and `warning` (yellow) alerts for the nodes in this War Room. Click on either the critical or -warning badges to open a pre-filtered modal displaying only those types of [active -alarms](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/view-active-alerts.mdx). +## Filter alerts -![The Alarms panel in Netdata -Cloud](https://user-images.githubusercontent.com/1153921/108564747-d2bfbb00-72c0-11eb-97b9-5863ad3324eb.png) +From this tab, you can also filter alerts with the right hand bar. 
More specifically you can filter: -The Alarms panel lists all active alarms for nodes within that War Room, and tells you which chart triggered the alarm, -what that chart's current value is, the alarm that triggered it, and when the alarm status first began. +- Alert status + - Filter based on the status of the alerts (e.g. Warning, Critical) +- Alert class + - Filter based on the class of the alert (e.g. Latency, Utilization, Workload etc.) +- Alert type & component + - Filter based on the alert's type (e.g. System, Web Server) and component (e.g. CPU, Disk, Load) +- Alert role + - Filter by the role that the alert is set to notify (e.g. Sysadmin, Webmaster etc.) +- Nodes + - Filter the alerts based on the nodes that are online, next to each node's name you can see how many alerts the node has, "critical" colored in red and "warning" colored in yellow -Use the input field in the Alarms panel to filter active alarms. You can sort by the node's name, alarm, status, chart -that triggered the alarm, or the operating system. Read more about the [filtering -syntax](https://learn.netdata.cloud/docs/cloud/war-rooms#node-filter) to build valuable filters for your infrastructure. +## View alert details -Click on the 3-dot icon (`⋮`) to view active alarm information or navigate directly to the offending chart in that -node's Cloud dashboard with the **Go to chart** button. +By clicking on the name of an entry of the table you can access that alert's details page, providing you with: -The active alarm information gives you details about the alarm that's been triggered. You can see the alarm's -configuration, how it calculates warning or critical alarms, and which configuration file you could edit on that node if -you want to tweak or disable the alarm to better suit your needs. 
+- Latest and Triggered time values +- The alert's description +- A link to the Community forum's alert page +- The chart at the time frame that the alert was triggered +- The alert's information: Node name, chart ID, type, component and class +- Configuration section +- Instance values - Node Instances -![Active alarm details in Netdata -Cloud](https://user-images.githubusercontent.com/1153921/108564813-f08d2000-72c0-11eb-80c8-b2af22a751fd.png) +![image](https://user-images.githubusercontent.com/70198089/226339928-bae60140-0293-42cf-9713-ac4901708aba.png) +At the bottom of the panel you can click the green button "View dedicated alert page" to open a [dynamic tab](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md#dynamic-tabs) containing all the info for this alert in a tab format, where you can also run correlations and go to the node's chart that raised the particular alert. + +![image](https://user-images.githubusercontent.com/70198089/226339794-61896c35-0b93-4ac9-92aa-07116fe63784.png) + +<!-- ## Local Netdata Agent dashboard Find the alarms icon ![Alarms @@ -65,15 +66,5 @@ With the three icons beneath that and the **role** designation, you can: 3. Copy the code to embed the badge onto another web page using an `<embed>` element. The table on the right-hand side displays information about the health entity that triggered the alarm, which you can -use as a reference to [configure alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md). - -## What's next? - -With the information that appears on Netdata Cloud and the local dashboard about active alarms, you can [configure -alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) to match your infrastructure's needs or your team's goals. 
- -If you're happy with the pre-configured alarms, skip ahead to [enable -notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to use Netdata Cloud's centralized alarm notifications and/or -per-node notifications to endpoints like Slack, PagerDuty, Twilio, and more. - - +use as a reference to [configure alarms](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md). + --> diff --git a/docs/netdata-for-IoT.md b/docs/netdata-for-IoT.md index 87b307b97..8dfed21eb 100644 --- a/docs/netdata-for-IoT.md +++ b/docs/netdata-for-IoT.md @@ -1,6 +1,9 @@ <!-- title: "Netdata for IoT" custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/netdata-for-IoT.md +sidebar_label: "Netdata for IoT" +learn_status: "Published" +learn_rel_path: "Miscellaneous" --> # Netdata for IoT diff --git a/docs/netdata-security.md b/docs/netdata-security.md index 511bc7721..6cd33c061 100644 --- a/docs/netdata-security.md +++ b/docs/netdata-security.md @@ -1,222 +1,167 @@ -<!-- -title: "Security design" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/netdata-security.md ---> +# Security and privacy design -# Security design +This document serves as the relevant Annex to the [Terms of Service](https://www.netdata.cloud/service-terms/), the [Privacy Policy](https://www.netdata.cloud/privacy/) and +the Data Processing Addendum, when applicable. It provides more information regarding Netdata’s technical and organizational security and privacy measures. We have given special attention to all aspects of Netdata, ensuring that everything throughout its operation is as secure as possible. Netdata has been designed with security in mind. 
-**Table of Contents** +> When running Netdata in environments requiring Payment Card Industry Data Security Standard (**PCI DSS**), Systems and Organization Controls (**SOC 2**), +or Health Insurance Portability and Accountability Act (**HIPAA**) compliance, please keep in mind that +**even when the user uses Netdata Cloud, all collected data is always stored inside their infrastructure**. -1. [Your data is safe with Netdata](#your-data-is-safe-with-netdata) -2. [Your systems are safe with Netdata](#your-systems-are-safe-with-netdata) -3. [Netdata is read-only](#netdata-is-read-only) -4. [Netdata viewers authentication](#netdata-viewers-authentication) - * [Why Netdata should be protected](#why-netdata-should-be-protected) - * [Protect Netdata from the internet](#protect-netdata-from-the-internet) - * [Expose Netdata only in a private LAN](#expose-netdata-only-in-a-private-lan) - * [Use an authenticating web server in proxy mode](#use-an-authenticating-web-server-in-proxy-mode) - * [Other methods](#other-methods) -5. [Registry or how to not send any information to a third party server](#registry-or-how-to-not-send-any-information-to-a-third-party-server) +Dashboard data a user views and alert notifications do travel +over Netdata Cloud, as they also travel over third party networks, to reach the user's web browser or the notification integrations the user has configured, +but Netdata Cloud does not store metric data. It only transforms them as they pass through it, aggregating them from multiple Agents and Parents, +to appear as one data source on the user's browser. -## Your data is safe with Netdata +## Cloud design -Netdata collects raw data from many sources. For each source, Netdata uses a plugin that connects to the source (or reads the relative files produced by the source), receives raw data and processes them to calculate the metrics shown on Netdata dashboards. 
+### User identification and authorization -Even if Netdata plugins connect to your database server, or read your application log file to collect raw data, the product of this data collection process is always a number of **chart metadata and metric values** (summarized data for dashboard visualization). All Netdata plugins (internal to the Netdata daemon, and external ones written in any computer language), convert raw data collected into metrics, and only these metrics are stored in Netdata databases, sent to upstream Netdata servers, or archived to external time-series databases. +Netdata ensures that only an email address is stored to create an account and use the Service. +User identification and authorization is done +either via third parties (Google, GitHub accounts), or short-lived access tokens, sent to the user’s email account. -> The **raw data** collected by Netdata, does not leave the host when collected. **The only data Netdata exposes are chart metadata and metric values.** +### Personal Data stored -This means that Netdata can safely be used in environments that require the highest level of data isolation (like PCI Level 1). - -## Your systems are safe with Netdata - -We are very proud that **the Netdata daemon runs as a normal system user, without any special privileges**. This is quite an achievement for a monitoring system that collects all kinds of system and application metrics. - -There are a few cases, however, that raw source data are only exposed to processes with escalated privileges. To support these cases, Netdata attempts to minimize and completely isolate the code that runs with escalated privileges. - -So, Netdata **plugins**, even those running with escalated capabilities or privileges, perform a **hard coded data collection job**. They do not accept commands from Netdata. The communication is strictly **unidirectional**: from the plugin towards the Netdata daemon. 
The original application data collected by each plugin do not leave the process they are collected, are not saved and are not transferred to the Netdata daemon. The communication from the plugins to the Netdata daemon includes only chart metadata and processed metric values. - -Child nodes use the same protocol when streaming metrics to their parent nodes. The raw data collected by the plugins of -child Netdata servers are **never leaving the host they are collected**. The only data appearing on the wire are chart -metadata and metric values. This communication is also **unidirectional**: child nodes never accept commands from -parent Netdata servers. - -## Netdata is read-only - -Netdata **dashboards are read-only**. Dashboard users can view and examine metrics collected by Netdata, but cannot instruct Netdata to do something other than present the already collected metrics. - -Netdata dashboards do not expose sensitive information. Business data of any kind, the kernel version, O/S version, application versions, host IPs, etc are not stored and are not exposed by Netdata on its dashboards. - -## Netdata viewers authentication - -Netdata is a monitoring system. It should be protected, the same way you protect all your admin apps. We assume Netdata will be installed privately, for your eyes only. - -### Why Netdata should be protected - -Viewers will be able to get some information about the system Netdata is running. This information is everything the dashboard provides. 
The dashboard includes a list of the services each system runs (the legends of the charts under the `Systemd Services` section), the applications running (the legends of the charts under the `Applications` section), the disks of the system and their names, the user accounts of the system that are running processes (the `Users` and `User Groups` section of the dashboard), the network interfaces and their names (not the IPs) and detailed information about the performance of the system and its applications. - -This information is not sensitive (meaning that it is not your business data), but **it is important for possible attackers**. It will give them clues on what to check, what to try and in the case of DDoS against your applications, they will know if they are doing it right or not. - -Also, viewers could use Netdata itself to stress your servers. Although the Netdata daemon runs unprivileged, with the minimum process priority (scheduling priority `idle` - lower than nice 19) and adjusts its OutOfMemory (OOM) score to 1000 (so that it will be first to be killed by the kernel if the system starves for memory), some pressure can be applied on your systems if someone attempts a DDoS against Netdata. - -### Protect Netdata from the internet - -Netdata is a distributed application. Most likely you will have many installations of it. Since it is distributed and you are expected to jump from server to server, there is very little usability to add authentication local on each Netdata. - -Until we add a distributed authentication method to Netdata, you have the following options: - -#### Expose Netdata only in a private LAN - -If your organisation has a private administration and management LAN, you can bind Netdata on this network interface on all your servers. This is done in `Netdata.conf` with these settings: +Netdata ensures that only an email address is stored to create an account and use the Service. 
The same email +address is used for Netdata product and marketing communications (via Hubspot and Sendgrid). -``` -[web] - bind to = 10.1.1.1:19999 localhost:19999 -``` +Email addresses are stored in our production database on AWS and copied to Google BigQuery, our data lake, +for analytics purposes. These analytics are crucial for our product development process. -You can bind Netdata to multiple IPs and ports. If you use hostnames, Netdata will resolve them and use all the IPs (in the above example `localhost` usually resolves to both `127.0.0.1` and `::1`). +If the user accepts the use of analytical cookies, the email address is also stored in the systems we use to track the +usage of the application (Posthog and Gainsight PX) -**This is the best and the suggested way to protect Netdata**. Your systems **should** have a private administration and management LAN, so that all management tasks are performed without any possibility of them being exposed on the internet. +The IP address used to access Netdata Cloud is stored in web proxy access logs. If the user accepts the use of analytical +cookies, the IP is also stored in the systems we use to track the usage of the application (Posthog and Gainsight PX). -For cloud based installations, if your cloud provider does not provide such a private LAN (or if you use multiple providers), you can create a virtual management and administration LAN with tools like `tincd` or `gvpe`. These tools create a mesh VPN allowing all servers to communicate securely and privately. Your administration stations join this mesh VPN to get access to management and administration tasks on all your cloud servers. +### Infrastructure data stored -For `gvpe` we have developed a [simple provisioning tool](https://github.com/netdata/netdata-demo-site/tree/master/gvpe) you may find handy (it includes statically compiled `gvpe` binaries for Linux and FreeBSD, and also a script to compile `gvpe` on your macOS system). 
We use this to create a management and administration LAN for all Netdata demo sites (spread all over the internet using multiple hosting providers). +The metric data that a user sees in the web browser when using Netdata Cloud is streamed directly from the Netdata Agent +to the Netdata Cloud dashboard, via the Agent-Cloud link (see [data transfer](#data-transfer)). The data passes through our systems, but it isn’t stored. ---- +The metadata we do store for each node connected to the user's Spaces in Netdata Cloud is: + - Hostname (as it appears in Netdata Cloud) + - Information shown in `/api/v1/info`. For example: [https://frankfurt.my-netdata.io/api/v1/info](https://frankfurt.my-netdata.io/api/v1/info). + - Metric metadata information shown in `/api/v1/contexts`. For example: [https://frankfurt.my-netdata.io/api/v1/contexts](https://frankfurt.my-netdata.io/api/v1/contexts). + - Alarm configurations shown in `/api/v1/alarms?all`. For example: [https://frankfurt.my-netdata.io/api/v1/alarms?all](https://frankfurt.my-netdata.io/api/v1/alarms?all). + - Active alarms shown in `/api/v1/alarms`. For example: [https://frankfurt.my-netdata.io/api/v1/alarms](https://frankfurt.my-netdata.io/api/v1/alarms). -In Netdata v1.9+ there is also access list support, like this: +The infrastructure data is stored in our production database on AWS and copied to Google BigQuery, our data lake, for + analytics purposes. -``` -[web] - bind to = * - allow connections from = localhost 10.* 192.168.* -``` +### Data transfer -#### Fine-grained access control +All infrastructure data visible on Netdata Cloud has to pass through the Agent-Cloud link (ACLK) mechanism, which +securely connects a Netdata Agent to Netdata Cloud. The Netdata agent initiates and establishes an outgoing secure +WebSocket (WSS) connection to Netdata Cloud. The ACLK is encrypted, safe, and is only established if the user connects their node. 
-The access list support allows filtering of all incoming connections, by specific IP addresses, ranges
-or validated DNS lookups. Only connections that match an entry on the list will be allowed:
+Data is encrypted when in transit between a user and Netdata Cloud using TLS.
-```
-[web]
-    allow connections from = localhost 192.168.* 1.2.3.4 homeip.net
-```
+### Data retention
-Connections from the IP addresses are allowed if the connection IP matches one of the patterns given.
-The alias localhost is always checked against 127.0.0.1, any other symbolic names need to resolve in
-both directions using DNS. In the above example the IP address of `homeip.net` must reverse DNS resolve
-to the incoming IP address and a DNS lookup on `homeip.net` must return the incoming IP address as
-one of the resolved addresses.
+Netdata may maintain backups of Netdata Cloud Customer Content, which would remain in place for approximately ninety
+(90) days following a deletion in Netdata Cloud.
-More specific control of what each incoming connection can do can be specified through the access control
-list settings:
+### Data portability and erasure
-```
-[web]
-    allow connections from = 160.1.*
-    allow badges from = 160.1.1.2
-    allow streaming from = 160.1.2.*
-    allow management from = control.subnet.ip
-    allow netdata.conf from = updates.subnet.ip
-    allow dashboard from = frontend.subnet.ip
-```
+Netdata will, as necessary to enable the Customer to meet its obligations under Data Protection Law, provide the Customer
+via the availability of Netdata Cloud with the ability to access, retrieve, correct and delete the Personal Data stored in
+Netdata Cloud. The Customer acknowledges that such ability may from time to time be limited due to temporary service outages
+for maintenance or other updates to Netdata Cloud, or technically not feasible.
-In this example only connections from `160.1.x.x` are allowed, only the specific IP address `160.1.1.2`
-can access badges, only IP addresses in the smaller range `160.1.2.x` can stream data. The three
-hostnames shown can access specific features, this assumes that DNS is setup to resolve these names
-to IP addresses within the `160.1.x.x` range and that reverse DNS is setup for these hosts.
+To the extent that the Customer, in its fulfillment of its Data Protection Law obligations, is unable to access, retrieve,
+correct or delete Customer Personal Data in Netdata Cloud due to prolonged unavailability of Netdata Cloud due to an issue
+within Netdata’s control, Netdata will where possible use reasonable efforts to provide, correct or delete such Customer Personal Data.
+If a Customer is unable to delete Personal Data via the self-services functionality, then Netdata deletes Personal Data upon
+the Customer’s written request, within the timeframe specified in the DPA and in accordance with applicable data protection law.
-#### Use an authenticating web server in proxy mode
+#### Delete all personal data
-Use one web server to provide authentication in front of **all your Netdata servers**. So, you will be accessing all your Netdata with URLs like `http://{HOST}/netdata/{NETDATA_HOSTNAME}/` and authentication will be shared among all of them (you will sign-in once for all your servers). Instructions are provided on how to set the proxy configuration to have Netdata run behind [nginx](Running-behind-nginx.md), [Apache](Running-behind-apache.md), [lighttpd](Running-behind-lighttpd.md) and [Caddy](Running-behind-caddy.md).
+To remove all personal info we have about a user (email and activities) they need to delete their cloud account by logging into https://app.netdata.cloud and accessing their profile, at the bottom left of the screen.
-To use this method, you should firewall protect all your Netdata servers, so that only the web server IP will be allowed to directly access Netdata. To do this, run this on each of your servers (or use your firewall manager):
-```sh
-PROXY_IP="1.2.3.4"
-iptables -t filter -I INPUT -p tcp --dport 19999 \! -s ${PROXY_IP} -m conntrack --ctstate NEW -j DROP
-```
+## Agent design
-*commands to allow direct access to Netdata from a web server proxy*
+### User data is safe with Netdata
-The above will prevent anyone except your web server to access a Netdata dashboard running on the host.
+Netdata collects raw data from many sources. For each source, Netdata uses a plugin that connects to the source (or reads the
+relative files produced by the source), receives raw data and processes them to calculate the metrics shown on Netdata dashboards.
-For Netdata v1.9+ you can also use `netdata.conf`:
+Even if Netdata plugins connect to the user's database server, or read user's application log file to collect raw data, the product of
+this data collection process is always a number of **chart metadata and metric values** (summarized data for dashboard visualization).
+All Netdata plugins (internal to the Netdata daemon, and external ones written in any computer language), convert raw data collected
+into metrics, and only these metrics are stored in Netdata databases, sent to upstream Netdata servers, or archived to external
+time-series databases.
-```
-[web]
-    allow connections from = localhost 1.2.3.4
-```
+The **raw data** collected by Netdata does not leave the host when collected. **The only data Netdata exposes are chart metadata and metric values.**
-Of course you can add more IPs.
-
-For Netdata prior to v1.9, if you want to allow multiple IPs, use this:
-
-```sh
-# space separated list of IPs to allow access Netdata
-NETDATA_ALLOWED="1.2.3.4 5.6.7.8 9.10.11.12"
-NETDATA_PORT=19999
+This means that Netdata can safely be used in environments that require the highest level of data isolation (like PCI Level 1).
-# create a new filtering chain || or empty an existing one named netdata
-iptables -t filter -N netdata 2>/dev/null || iptables -t filter -F netdata
-for x in ${NETDATA_ALLOWED}
-do
-    # allow this IP
-    iptables -t filter -A netdata -s ${x} -j ACCEPT
-done
+### User systems are safe with Netdata
-# drop all other IPs
-iptables -t filter -A netdata -j DROP
+We are very proud that **the Netdata daemon runs as a normal system user, without any special privileges**. This is quite an
+achievement for a monitoring system that collects all kinds of system and application metrics.
-# delete the input chain hook (if it exists)
-iptables -t filter -D INPUT -p tcp --dport ${NETDATA_PORT} -m conntrack --ctstate NEW -j netdata 2>/dev/null
+There are a few cases, however, that raw source data are only exposed to processes with escalated privileges. To support these
+cases, Netdata attempts to minimize and completely isolate the code that runs with escalated privileges.
-# add the input chain hook (again)
-# to send all new Netdata connections to our filtering chain
-iptables -t filter -I INPUT -p tcp --dport ${NETDATA_PORT} -m conntrack --ctstate NEW -j netdata
-```
+So, Netdata **plugins**, even those running with escalated capabilities or privileges, perform a **hard coded data collection job**.
+They do not accept commands from Netdata. The communication is **unidirectional** from the plugin towards the Netdata daemon, except
+for Functions (see below). The original application data collected by each plugin do not leave the process they are collected, are
+not saved and are not transferred to the Netdata daemon.
The communication from the plugins to the Netdata daemon includes only chart +metadata and processed metric values. -_script to allow access to Netdata only from a number of hosts_ +Child nodes use the same protocol when streaming metrics to their parent nodes. The raw data collected by the plugins of +child Netdata servers are **never leaving the host they are collected**. The only data appearing on the wire are chart +metadata and metric values. This communication is also **unidirectional**: child nodes never accept commands from +parent Netdata servers (except for Functions). -You can run the above any number of times. Each time it runs it refreshes the list of allowed hosts. +[Functions](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md) is currently +the only feature that routes requests back to origin Netdata Agents via Netdata Parents. The feature allows Netdata Cloud to send +a request to the Netdata Agent data collection plugin running at the +edge, to provide additional information, such as the process tree of a server, or the long queries of a DB. -#### Other methods +<!-- The user has full control over the available functions. For more information see “Controlling Access to Functions” and “Disabling Functions”. --> -Of course, there are many more methods you could use to protect Netdata: +### Netdata is read-only -- bind Netdata to localhost and use `ssh -L 19998:127.0.0.1:19999 remote.netdata.ip` to forward connections of local port 19998 to remote port 19999. This way you can ssh to a Netdata server and then use `http://127.0.0.1:19998/` on your computer to access the remote Netdata dashboard. +Netdata **dashboards are read-only**. Dashboard users can view and examine metrics collected by Netdata, but cannot +instruct Netdata to do something other than present the already collected metrics. 
-- If you are always under a static IP, you can use the script given above to allow direct access to your Netdata servers without authentication, from all your static IPs. +Netdata dashboards do not expose sensitive information. Business data of any kind, the kernel version, O/S version, +application versions, host IPs, etc. are not stored and are not exposed by Netdata on its dashboards. -- install all your Netdata in **headless data collector** mode, forwarding all metrics in real-time to a parent - Netdata server, which will be protected with authentication using an nginx server running locally at the parent - Netdata server. This requires more resources (you will need a bigger parent Netdata server), but does not require - any firewall changes, since all the child Netdata servers will not be listening for incoming connections. +### Protect Netdata from the internet -## Anonymous Statistics +Users are responsible to take all appropriate measures to secure their Netdata agent installations and especially the Netdata web user interface and API against unauthorized access. Netdata comes with a wide range of options to +[secure user nodes](https://github.com/netdata/netdata/blob/master/docs/category-overview-pages/secure-nodes.md) in +compliance with the user organization's security policy. -### Registry or how to not send any information to a third party server +### Anonymous statistics -The default configuration uses a public registry under registry.my-netdata.io (more information about the registry here: [mynetdata-menu-item](https://github.com/netdata/netdata/blob/master/registry/README.md) ). 
Please be aware that if you use that public registry, you submit the following information to a third party server: +#### Netdata registry -- The url where you open the web-ui in the browser (via http request referrer) -- The hostnames of the Netdata servers +The default configuration uses a public [registry](https://github.com/netdata/netdata/blob/master/registry/README.md) under registry.my-netdata.io. +If the user uses that public registry, they submit the following information to a third party server: + - The URL of the agent's web user interface (via http request referrer) + - The hostnames of the user's Netdata servers -If sending this information to the central Netdata registry violates your security policies, you can configure Netdata to [run your own registry](https://github.com/netdata/netdata/blob/master/registry/README.md#run-your-own-registry). +If sending this information to the central Netdata registry violates user's security policies, they can configure Netdata to +[run their own registry](https://github.com/netdata/netdata/blob/master/registry/README.md#run-your-own-registry). -### Opt-out of anonymous statistics +#### Anonymous telemetry events Starting with v1.30, Netdata collects anonymous usage information by default and sends it to a self hosted PostHog instance within the Netdata infrastructure. Read -about the information collected, and learn how to-opt, on our [anonymous statistics](anonymous-statistics.md) page. - -The usage statistics are _vital_ for us, as we use them to discover bugs and prioritize new features. We thank you for -_actively_ contributing to Netdata's future. +about the information collected and learn how to opt-out, on our +[anonymous telemetry events](https://github.com/netdata/netdata/blob/master/docs/anonymous-statistics.md) page. -## Netdata directories +### Netdata directories +The agent stores data in 6 different directories on the user's system. 
+ | path|owner|permissions|Netdata|comments| |:---|:----|:----------|:------|:-------| | `/etc/netdata`|user `root`<br/>group `netdata`|dirs `0755`<br/>files `0640`|reads|**Netdata config files**<br/>may contain sensitive information, so group `netdata` is allowed to read them.| @@ -226,4 +171,26 @@ _actively_ contributing to Netdata's future. | `/var/lib/netdata`|user `netdata`<br/>group `netdata`|dirs `0750`<br/>files `0660`|reads, writes, creates, deletes|**Netdata permanent database files**<br/>Netdata stores here the registry data, health alarm log db, etc.| | `/var/log/netdata`|user `netdata`<br/>group `root`|dirs `0755`<br/>files `0644`|writes, creates|**Netdata log files**<br/>all the Netdata applications, logs their errors or other informational messages to files in this directory. These files should be log rotated.| +## Organization processes + +### Employee identification and authorization + +Netdata operates technical and organizational measures for employee identification and authentication, such as logs, policies, +assigning distinct usernames for each employee and utilizing password complexity requirements for access to all platforms. + +The COO or HR are the primary system owners for all platforms and may designate additional system owners, as needed. Additional +user access is also established on a role basis, requires the system owner’s approval, and is tracked by HR. User access to each +platform is subject to periodic review and testing. When an employee changes roles, HR updates the employee’s access to all systems. +Netdata uses on-boarding and off-boarding processes to regulate access by Netdata Personnel. + +Second-layer authentication is employed where available, by way of multi-factor authentication. 
+
+Netdata’s IT control environment is based upon industry-accepted concepts, such as multiple layers of preventive and detective
+controls, working in concert to provide for the overall protection of Netdata’s computing environment and data assets.
+
+### Systems security
+Netdata maintains a risk-based assessment security program. The framework for Netdata’s security program includes administrative,
+organizational, technical, and physical safeguards reasonably designed to protect the services and confidentiality, integrity,
+and availability of user data. The program is intended to be appropriate to the nature of the services and the size and complexity
+of Netdata’s business operations.
diff --git a/docs/overview/netdata-monitoring-stack.md b/docs/overview/netdata-monitoring-stack.md
deleted file mode 100644
index 36f5b5f06..000000000
--- a/docs/overview/netdata-monitoring-stack.md
+++ /dev/null
@@ -1,62 +0,0 @@
-<!--
-title: "Use Netdata standalone or as part of your monitoring stack"
-description: "Netdata can run independently or as part of a larger monitoring stack thanks to its flexibility, interoperable core, and exporting features."
-custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/overview/netdata-monitoring-stack.md
--->
-
-# Use Netdata standalone or as part of your monitoring stack
-
-Netdata is an extremely powerful monitoring, visualization, and troubleshooting platform. While you can use it as an
-effective standalone tool, we also designed it to be open and interoperable with other tools you might already be using.
-
-Netdata helps you collect everything and scales to infrastructure of any size, but it doesn't lock-in data or force you
-to use specific tools or methodologies. Each feature is extensible and interoperable so they can work in parallel with
-other tools.
For example, you can use Netdata to collect metrics, visualize metrics with a second open-source program, -and centralize your metrics in a cloud-based time-series database solution for long-term storage or further analysis. - -You can build a new monitoring stack, including Netdata, or integrate Netdata's metrics with your existing monitoring -stack. No matter which route you take, Netdata helps you monitor infrastructure of any size. - -Here are a few ways to enrich your existing monitoring and troubleshooting stack with Netdata: - -## Collect metrics from Prometheus endpoints - -Netdata automatically detects 600 popular endpoints and collects per-second metrics from them via the [generic -Prometheus collector](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/README.md). This even -includes support for Windows 10 via [`windows_exporter`](https://github.com/prometheus-community/windows_exporter). - -This collector is installed and enabled on all Agent installations by default, so you don't need to waste time -configuring Netdata. Netdata will detect these Prometheus metrics endpoints and collect even more granular metrics than -your existing solutions. You can now use all of Netdata's meaningfully-visualized charts to diagnose issues and -troubleshoot anomalies. - -## Export metrics to external time-series databases - -Netdata can send its per-second metrics to external time-series databases, such as InfluxDB, Prometheus, Graphite, -TimescaleDB, ElasticSearch, AWS Kinesis Data Streams, Google Cloud Pub/Sub Service, and many others. - -To [export metrics to external time-series databases](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md), you configure an [exporting -_connector_](https://github.com/netdata/netdata/blob/master/docs/export/enable-connector.md). These connectors support filtering and resampling for granular control -over which metrics you export, and at what volume. 
You can export resampled metrics as collected, as averages, or the
-sum of interpolated values based on your needs and other monitoring tools.
-
-Once you have Netdata's metrics in a secondary time-series database, you can use them however you'd like, such as
-additional visualization/dashboarding tools or aggregation of data from multiple sources.
-
-## Visualize metrics with Grafana
-
-One popular monitoring stack is Netdata, Graphite, and Grafana. Netdata acts as the stack's metrics collection
-powerhouse, Graphite the time-series database, and Grafana the visualization platform. With Netdata at the core, you can
-be confident that your monitoring stack is powered by all possible metrics, from all possible sources, from every node
-in your infrastructure.
-
-Of course, just because you export or visualize metrics elsewhere, it doesn't mean Netdata's equivalent features
-disappear. You can always build new dashboards in Netdata Cloud, drill down into per-second metrics using Netdata's
-charts, or use Netdata's health watchdog to send notifications whenever an anomaly strikes.
-
-## What's next?
-
-Whether you're using Netdata standalone or as part of a larger monitoring stack, the next step is the same: [**Get
-Netdata**](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx).
-
-
diff --git a/docs/overview/what-is-netdata.md b/docs/overview/what-is-netdata.md
deleted file mode 100644
index f8e67159b..000000000
--- a/docs/overview/what-is-netdata.md
+++ /dev/null
@@ -1,76 +0,0 @@
-<!--
-title: "What is Netdata?"
-description: "Netdata is distributed, real-time performance and health monitoring for systems and applications on a single node or an entire infrastructure."
-custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/overview/what-is-netdata.md
--->
-
-# What is Netdata?
- -Netdata helps sysadmins, SREs, DevOps engineers, and IT professionals collect all possible metrics from systems and -applications, visualize these metrics in real-time, and troubleshoot complex performance problems. - -Netdata's solution uses two components, the Netdata Agent and Netdata Cloud, to deliver real-time performance and health -monitoring for both single nodes and entire infrastructure. - -## Netdata Agent - -Netdata's distributed monitoring Agent collects thousands of metrics from systems, hardware, and applications with zero -configuration. It runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT -devices. - -You can [install](https://github.com/netdata/netdata/blob/master/docs/get-started.mdx) Netdata on most Linux -distributions (Ubuntu, Debian, CentOS, and more), -container/microservice platforms (Kubernetes clusters, Docker), and many other operating systems (FreeBSD, macOS), with -no `sudo` required. - -![The Netdata -Agent](https://user-images.githubusercontent.com/1153921/94492596-72a86b00-019f-11eb-91ab-224e6ac9ea21.png) - -## Netdata Cloud - -Netdata Cloud is a web application that gives you real-time visibility for your entire infrastructure. With Netdata -Cloud, you can view key metrics, insightful charts, and active alarms from all your nodes in a single web interface. -When an anomaly strikes, seamlessly navigate to any node to troubleshoot and discover the root cause with the familiar -Netdata dashboard. - -**[Netdata Cloud is free](https://www.netdata.cloud/blog/why-netdata-is-free/)**! You can add an entire infrastructure -of nodes, invite all your colleagues, and visualize any number of metrics, charts, and alarms entirely for free. - -While Netdata Cloud offers a centralized method of monitoring your Agents, your metrics data is not stored or -centralized in any way. 
Metrics data remains with your nodes and is only streamed to your browser, through Cloud, when -you're viewing the Netdata Cloud interface. - -![Netdata Cloud](https://user-images.githubusercontent.com/1153921/94492597-73410180-019f-11eb-9a9e-032420baa489.png) - -## What you can do with Netdata - -Netdata is designed to be both simple to use and flexible for every monitoring, visualization, and troubleshooting use -case: - -- **Collect**: Netdata collects all available metrics from your system and applications with 300+ collectors, - Kubernetes service discovery, and in-depth container monitoring, all while using only 1% CPU and a few MB of RAM. It - even collects metrics from Windows machines. -- **Visualize**: The dashboard meaningfully presents charts to help you understand the relationships between your - hardware, operating system, running apps/services, and the rest of your infrastructure. Add nodes to Netdata Cloud - for a complete view of your infrastructure from a single pane of glass. -- **Monitor**: Netdata's health watchdog uses hundreds of preconfigured alarms to notify you via Slack, email, - PagerDuty and more when an anomaly strikes. Customize with dynamic thresholds, hysteresis, alarm templates, and - role-based notifications. -- **Troubleshoot**: 1s granularity helps you detect and analyze anomalies other monitoring platforms might have - missed. Interactive visualizations reduce your reliance on the console, and historical metrics help you trace issues - back to their root cause. -- **Store**: Netdata's efficient database engine efficiently stores per-second metrics for days, weeks, or even - months. Every distributed node stores metrics locally, simplifying deployment, slashing costs, and enriching - Netdata's interactive dashboards. -- **Export**: Integrate per-second metrics with other time-series databases like Graphite, Prometheus, InfluxDB, - TimescaleDB, and more with Netdata's interoperable and extensible core. 
-- **Stream**: Aggregate metrics from any number of distributed nodes in one place for in-depth analysis, including
-  ephemeral nodes in a Kubernetes cluster.
-
-## What's next?
-
-Learn more
-about [why you should use Netdata](https://github.com/netdata/netdata/blob/master/docs/overview/why-netdata.md),
-or [how Netdata works with your existing monitoring stack](https://github.com/netdata/netdata/blob/master/docs/overview/netdata-monitoring-stack.md).
-
-
diff --git a/docs/overview/why-netdata.md b/docs/overview/why-netdata.md
deleted file mode 100644
index 158bc50df..000000000
--- a/docs/overview/why-netdata.md
+++ /dev/null
@@ -1,63 +0,0 @@
-<!--
-title: "Why use Netdata?"
-description: "Netdata is simple to deploy, scalable, and optimized for troubleshooting. Cut the complexity and expense out of your monitoring stack."
-custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/overview/why-netdata.md
--->
-
-# Why use Netdata?
-
-Netdata takes a different approach to helping people build extraordinary infrastructure. It was built out of frustration
-with existing monitoring tools that are too complex, too expensive, and don't help their users actually troubleshoot
-complex performance and health issues.
-
-Netdata is:
-
-## Simple to deploy
-
-- **One-line deployment** for Linux distributions, plus support for Kubernetes/Docker infrastructures.
-- **Zero configuration and maintenance** required to collect thousands of metrics, every second, from the underlying
-  OS and running applications.
-- **Prebuilt charts and alarms** alert you to common anomalies and performance issues without manual configuration.
-- **Distributed storage** to simplify the cost and complexity of storing metrics data from any number of nodes.
-
-## Powerful and scalable
-
-- **1% CPU utilization, a few MB of RAM, and minimal disk I/O** to run the monitoring Agent on bare metal, virtual
-  machines, containers, and even IoT devices.
-- **Per-second granularity** for an unlimited number of metrics based on the hardware and applications you're running - on your nodes. -- **Interoperable exporters** let you connect Netdata's per-second metrics with an existing monitoring stack and other - time-series databases. - -## Optimized for troubleshooting - -- **Visual anomaly detection** with a UI/UX that emphasizes the relationships between charts. -- **Customizable dashboards** to pinpoint correlated metrics, respond to incidents, and help you streamline your - workflows. -- **Distributed metrics in a centralized interface** to assist users or teams trace complex issues between distributed - nodes. - -## Comparison with other monitoring solutions - -Netdata offers many benefits over the existing monitoring landscape, whether they're expensive SaaS products or other -open-source tools. - -| Netdata | Others (open-source and commercial) | -| :-------------------------------------------------------------- | :--------------------------------------------------------------- | -| **High resolution metrics** (1s granularity) | Low resolution metrics (10s granularity at best) | -| Collects **thousands of metrics per node** | Collects just a few metrics | -| Fast UI optimized for **anomaly detection** | UI is good for just an abstract view | -| **Long-term, autonomous storage** at one-second granularity | Centralized metrics in an expensive data lake at 10s granularity | -| **Meaningful presentation**, to help you understand the metrics | You have to know the metrics before you start | -| Install and get results **immediately** | Long sales process and complex installation process | -| Use it for **troubleshooting** performance problems | Only gathers _statistics of past performance_ | -| **Kills the console** for tracing performance issues | The console is always required for troubleshooting | -| Requires **zero dedicated resources** | Require large dedicated resources | - -## What's next? 
- -Whether you already have a monitoring stack you want to integrate Netdata into, or are building something from the -ground-up, you should read more on how Netdata can work either [standalone or as an interoperable part of a monitoring -stack](https://github.com/netdata/netdata/blob/master/docs/overview/netdata-monitoring-stack.md). - - diff --git a/docs/quickstart/infrastructure.md b/docs/quickstart/infrastructure.md index 23986b002..c76948f60 100644 --- a/docs/quickstart/infrastructure.md +++ b/docs/quickstart/infrastructure.md @@ -1,11 +1,9 @@ -<!-- -title: "Infrastructure monitoring with Netdata" -sidebar_label: "Infrastructure monitoring" -description: "Build a robust, infinitely scalable infrastructure monitoring solution with Netdata. Any number of nodes and every available metric." -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/quickstart/infrastructure.md ---> +import { Grid, Box, BoxList, BoxListItemRegexLink } from '@site/src/components/Grid/' +import { RiExternalLinkLine } from 'react-icons/ri' -# Infrastructure monitoring with Netdata +# Monitor your infrastructure + +Learn how to view key metrics, insightful charts, and active alarms from all your nodes, with Netdata Cloud's real-time infrastructure monitoring. [Netdata Cloud](https://app.netdata.cloud) provides scalable infrastructure monitoring for any number of distributed nodes running the Netdata Agent. A node is any system in your infrastructure that you want to monitor, whether it's a @@ -14,7 +12,7 @@ physical or virtual machine (VM), container, cloud deployment, or edge/IoT devic The Netdata Agent uses zero-configuration collectors to gather metrics from every application and container instantly, and uses Netdata's [distributed data architecture](https://github.com/netdata/netdata/blob/master/docs/store/distributed-data-architecture.md) to store metrics locally. 
Without a slow and troublesome centralized data lake for your infrastructure's metrics, you reduce the -resources you need to invest in, and the complexity of, monitoring your infrastructure. +resources you need to invest in, and the complexity of, monitoring your infrastructure. Netdata Cloud unifies infrastructure monitoring by _centralizing the interface_ you use to query and visualize your nodes' metrics, not the data. By streaming metrics values to your browser, with Netdata Cloud acting as the secure proxy @@ -25,14 +23,8 @@ In this quickstart guide, you'll learn the basics of using Netdata Cloud to moni composite charts, and alarm viewing. You'll then learn about the most critical ways to configure the Agent on each of your nodes to maximize the value you get from Netdata. -This quickstart assumes you've installed the Netdata Agent on more than one node in your infrastructure, and connected -those nodes to your Space in Netdata Cloud. If you haven't yet, see the [Netdata -Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/cloud.mdx) docs for details on signing up for Netdata Cloud, installation, and -connection process. - -> If you want to monitor a Kubernetes cluster with Netdata, see our [k8s installation -> doc](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/kubernetes.md) for setup details, and then read our guide, [_Monitor a Kubernetes -> cluster with Netdata_](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/kubernetes-k8s-netdata.md). +This quickstart assumes you've [installed Netdata](https://github.com/netdata/netdata/edit/master/packaging/installer/README.md) +on more than one node in your infrastructure, and connected those nodes to your Space in Netdata Cloud. ## Set up your Netdata Cloud experience @@ -67,32 +59,49 @@ To [invite new users](https://github.com/netdata/netdata/blob/master/docs/cloud/ Space management Area. Choose which War Rooms to add this user to, then click **Send**. 
If your team members have trouble signing in, direct them to the [Netdata Cloud sign -in](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.mdx) doc. +in](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.md) doc. ### See an overview of your infrastructure -The default way to visualize the health and performance of an infrastructure with Netdata Cloud is the -[**Overview**](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md), which is the default interface of every War Room. The -Overview features composite charts, which display aggregated metrics from every node in a given War Room. These metrics -are streamed on-demand from individual nodes and composited onto a single, familiar dashboard. +Netdata Cloud uses "tabs" to provide you with informative sections based on your infrastructure. +These tabs can be separated into "static", meaning they are presented by default, and "dynamic", which are tabs that get presented by user action (e.g., clicking on a custom dashboard). + +#### Static tabs + +- The default tab for any War Room is the [Home tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#home), which gives you an overview of this Space. + Here you can see the number of Nodes claimed, data retention statistics, users by role, alerts and more. + +- The second and most important tab is the [Overview tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md#overview-and-single-node-view) which uses composite charts to display real-time metrics from every available node in a given War Room. + +- The [Nodes tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) gives you the ability to see the status (offline or online), host details, alarm status and also a short overview of some key metrics from all your nodes at a glance. 
+ +- The [Kubernetes tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md) is a logical grouping of charts regarding your Kubernetes clusters. It contains a subset of the charts available in the **Overview tab**. + +- The [Dashboards tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) gives you the ability to have tailor-made views of specific/targeted interfaces for your infrastructure using any number of charts from any number of nodes. + +- The [Alerts tab](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md) provides you with an overview of all the active alerts you receive for the nodes in this War Room. You can also see all the alerts that are configured to be triggered at any given moment. + +- The [Anomalies tab](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md) is dedicated to the Anomaly Advisor tool. -![The War Room -Overview](https://user-images.githubusercontent.com/1153921/108732681-09791980-74eb-11eb-9ba2-98cb1b6608de.png) +- The [Functions tab](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md) gives you the ability to visualize functions that the Netdata Agent collectors are able to expose. -Read more about the Overview in the [infrastructure overview](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) doc. +- The [Feed & events](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/events-feed.md) tab lets you investigate events that occurred in the past, which is invaluable for troubleshooting. -Netdata Cloud also features the [**Nodes view**](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md), which you can -use to configure and see a few key metrics from every node in the War Room, view health status, and more. 
+#### Dynamic tabs + +If you open a [new dashboard](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md), jump to a single-node dashboard, or navigate to a dedicated alert page, a new tab will open in the War Room bar. + +Tabs can be rearranged with drag-and-drop or closed with the **X** button. Open tabs persist between sessions, so you can always come right back to your preferred setup. ### Drill down to specific nodes -Both the Overview and Nodes view offer easy access to **single-node dashboards** for targeted analysis. You can use +Both the Overview and the Nodes tab offer easy access to **single-node dashboards** for targeted analysis. You can use single-node dashboards in Netdata Cloud to drill down on specific issues, scrub backward in time to investigate historical data, and see like metrics presented meaningfully to help you troubleshoot performance problems. Read about the process in the [infrastructure overview](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md#drill-down-with-single-node-dashboards) doc, then learn about [interacting with -dashboards and charts](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) to get the most from all of Netdata's real-time +dashboards and charts](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) to get the most from all of Netdata's real-time metrics. ### Create new dashboards @@ -104,7 +113,7 @@ from every node in your infrastructure on a single dashboard. 
![An example system CPU dashboard](https://user-images.githubusercontent.com/1153921/108732974-4b09c480-74eb-11eb-87a2-c67e569c08b6.png) -Read more about [creating new dashboards](https://github.com/netdata/netdata/blob/master/docs/visualize/create-dashboards.md) for more details about the process and +Read more about [creating new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) for more details about the process and additional tips on best leveraging the feature to help you troubleshoot complex performance problems. ## Set up your nodes @@ -134,7 +143,7 @@ sudo ./edit-config netdata.conf Our [configuration basics doc](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) contains more information about `netdata.conf`, `edit-config`, along with simple examples to get you familiar with editing your node's configuration. -After you've learned the basics, you should [secure your infrastructure's nodes](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md) using +After you've learned the basics, you should [secure your infrastructure's nodes](https://github.com/netdata/netdata/blob/master/docs/netdata-security.md) using one of our recommended methods. These security best practices ensure no untrusted parties gain access to the metrics collected on any of your nodes. @@ -144,42 +153,74 @@ Netdata has [300+ pre-installed collectors](https://github.com/netdata/netdata/b configuration. Collectors search each of your nodes in default locations and ports to find running applications and gather as many metrics as they can without you having to configure them individually. 
-Most collectors work without configuration, but you should read up on [how collectors -work](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md) and [how to enable/configure](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) them so -that you can see metrics from those applications in Netdata Cloud. +Most collectors work without configuration. Should you want more information, read about [how Netdata's metrics collectors work](https://github.com/netdata/netdata/blob/master/collectors/README.md) and see the [Collectors configuration reference](https://github.com/netdata/netdata/blob/master/collectors/REFERENCE.md) documentation. In addition, find detailed information about which [system](https://github.com/netdata/netdata/blob/master/docs/collect/system-metrics.md), [container](https://github.com/netdata/netdata/blob/master/docs/collect/container-metrics.md), and [application](https://github.com/netdata/netdata/blob/master/docs/collect/application-metrics.md) metrics you can collect from across your infrastructure with Netdata. -## What's next? - -Netdata has many features that help you monitor the health of your nodes and troubleshoot complex performance problems. -Once you have a handle on configuration and are collecting all the right metrics, try out some of Netdata's other -infrastructure-focused features: - -- [See an overview of your infrastructure](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md) using Netdata Cloud's composite - charts and real-time visualizations. -- [Create new dashboards](https://github.com/netdata/netdata/blob/master/docs/visualize/create-dashboards.md) from any number of nodes and metrics in Netdata Cloud. 
- -To change how the Netdata Agent runs on each node, dig in to configuration files: - -- [Change how long nodes in your infrastructure retain metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) based on how - many metrics each node collects, your preferred retention period, and the resources you want to dedicate toward - long-term metrics retention. -- [Create new alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md), or tweak some of the pre-configured alarms, to stay on top - of anomalies. -- [Enable notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to Slack, PagerDuty, email, and 30+ other services. -- [Export metrics](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) to an external time-series database to use Netdata alongside - other monitoring and troubleshooting tools. - -### Related reference documentation - -- [Netdata Cloud · Spaces](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md) -- [Netdata Cloud · War Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md) -- [Netdata Cloud · Invite your team](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md) -- [Netdata Cloud · Sign in or sign up with email, Google, or - GitHub](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.mdx) -- [Netdata Cloud · Nodes view](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) - - +## Netdata Cloud features + +<Grid columns="2"> + <Box + title="Spaces and War Rooms"> + <BoxList> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md)" title="Spaces" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md)" title="War Rooms" /> + </BoxList> + </Box> + <Box + title="Dashboards"> + <BoxList> + 
<BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md)" title="Overview" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md)" title="Nodes tab" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md)" title="Kubernetes" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md)" title="Create new dashboards" /> + </BoxList> + </Box> + <Box + title="Alerts and notifications"> + <BoxList> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md#netdata-cloud)" title="View active alerts" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md)" title="Alert notifications" /> + </BoxList> + </Box> + <Box + title="Troubleshooting with Netdata Cloud"> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md)" title="Metric Correlations" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md)" title="Anomaly Advisor" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/events-feed.md)" title="Events Feed" /> + </Box> + <Box + title="Management and settings"> + <BoxList> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.md)" title="Sign in with email, Google, or GitHub" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md)" title="Invite your team" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/themes.md)" title="Choose your Netdata Cloud theme" /> + <BoxListItemRegexLink 
to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md)" title="Role-Based Access" /> + <BoxListItemRegexLink to="[](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/plans.md)" title="Paid Plans" /> + </BoxList> + </Box> +</Grid> + +- Spaces and War Rooms + - [Spaces](https://github.com/netdata/netdata/blob/master/docs/cloud/spaces.md) + - [War Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md) +- Dashboards + - [Overview](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) + - [Nodes tab](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) + - [Kubernetes](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md) + - [Create new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) +- Alerts and notifications + - [View active alerts](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md#netdata-cloud) + - [Alert notifications](https://github.com/netdata/netdata/blob/master/docs/cloud/alerts-notifications/notifications.md) +- Troubleshooting with Netdata Cloud + - [Metric Correlations](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/metric-correlations.md) + - [Anomaly Advisor](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/anomaly-advisor.md) + - [Events Feed](https://github.com/netdata/netdata/blob/master/docs/cloud/insights/events-feed.md) +- Management and settings + - [Sign in with email, Google, or GitHub](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/sign-in.md) + - [Invite your team](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/invite-your-team.md) + - [Choose your Netdata Cloud theme](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/themes.md) + - [Role-Based 
Access](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/role-based-access.md) + - [Paid Plans](https://github.com/netdata/netdata/blob/master/docs/cloud/manage/plans.md) diff --git a/docs/quickstart/single-node.md b/docs/quickstart/single-node.md deleted file mode 100644 index 293731911..000000000 --- a/docs/quickstart/single-node.md +++ /dev/null @@ -1,92 +0,0 @@ -<!-- -title: "Single-node monitoring with Netdata" -sidebar_label: "Single-node monitoring" -description: "Learn dashboard basics, configuring your nodes, and collecting metrics from applications to create a powerful single-node monitoring tool." -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/quickstart/single-node.md ---> - -# Single-node monitoring with Netdata - -Because it's free, open-source, and requires only 1% CPU utilization to collect thousands of metrics every second, -Netdata is a superb single-node monitoring tool. - -In this quickstart guide, you'll learn how to access your single node's metrics through dashboards, configure your node -to your liking, and make sure the Netdata Agent is collecting metrics from the applications or containers you're running -on your node. - -## See your node's metrics - -To see your node's real-time metrics, you need to access its dashboard. You can either view the local dashboard, which -runs on the node itself, or see the dashboard through Netdata Cloud. Both methods feature real-time, interactive, and -synchronized charts, with the same metrics, and use the same UI. - -The primary difference is that Netdata Cloud also has a few extra features, like creating new dashboards using a -drag-and-drop editor, that enhance your monitoring and troubleshooting experience. - -To see your node's local dashboard, open up your web browser of choice and navigate to `http://NODE:19999`, replacing -`NODE` with the IP address or hostname of your Agent. Hit `Enter`. 
- -![Animated GIF of navigating to the -dashboard](https://user-images.githubusercontent.com/1153921/80825153-abaec600-8b94-11ea-8b17-1b770a2abaa9.gif) - -To see a node's dashboard in Netdata Cloud, [sign in](https://app.netdata.cloud). From the **Nodes** view in your -**General** War Room, click on the hostname of your node to access its dashboard through Netdata Cloud. - -![Screenshot of an embedded node -dashboard](https://user-images.githubusercontent.com/1153921/87457036-9b678e00-c5bc-11ea-977d-ad561a73beef.png) - -Once you've decided which dashboard you prefer, learn about [interacting with dashboards and -charts](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) to get the most from Netdata's real-time metrics. - -## Configure your node - -The Netdata Agent is highly configurable so that you can match its behavior to your node. You will find most -configuration options in the `netdata.conf` file, which is typically at `/etc/netdata/netdata.conf`. The best way to -edit this file is using the `edit-config` script, which ensures updates to the Netdata Agent do not overwrite your -changes. For example: - -```bash -cd /etc/netdata -sudo ./edit-config netdata.conf -``` - -Our [configuration basics doc](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) contains more information about `netdata.conf`, `edit-config`, -along with simple examples to get you familiar with editing your node's configuration. - -After you've learned the basics, you should [secure your node](https://github.com/netdata/netdata/blob/master/docs/configure/secure-nodes.md) using one of our -recommended methods. These security best practices ensure no untrusted parties gain access to your dashboard or its -metrics. 
- -## Collect metrics from your system and applications - -Netdata has [300+ pre-installed collectors](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md) that gather thousands of metrics with zero -configuration. Collectors search your node in default locations and ports to find running applications and gather as -many metrics as possible without you having to configure them individually. - -These metrics enrich both the local and Netdata Cloud dashboards. - -Most collectors work without configuration, but you should read up on [how collectors -work](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md) and [how to enable/configure](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure.md) them. - -In addition, find detailed information about which [system](https://github.com/netdata/netdata/blob/master/docs/collect/system-metrics.md), -[container](https://github.com/netdata/netdata/blob/master/docs/collect/container-metrics.md), and [application](https://github.com/netdata/netdata/blob/master/docs/collect/application-metrics.md) metrics you can -collect from across your infrastructure with Netdata. - -## What's next? - -Netdata has many features that help you monitor the health of your node and troubleshoot complex performance problems. -Once you understand configuration, and are certain Netdata is collecting all the important metrics from your node, try -out some of Netdata's other visualization and health monitoring features: - -- [Build new dashboards](https://github.com/netdata/netdata/blob/master/docs/visualize/create-dashboards.md) to put disparate but relevant metrics onto a single - interface. -- [Create new alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md), or tweak some of the pre-configured alarms, to stay on top - of anomalies. 
-- [Enable notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to Slack, PagerDuty, email, and 30+ other services. -- [Change how long your node stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md) based on how many metrics it - collects, your preferred retention period, and the resources you want to dedicate toward long-term metrics - retention. -- [Export metrics](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md) to an external time-series database to use Netdata alongside - other monitoring and troubleshooting tools. - - diff --git a/docs/store/change-metrics-storage.md b/docs/store/change-metrics-storage.md index e82393a65..5e14fe247 100644 --- a/docs/store/change-metrics-storage.md +++ b/docs/store/change-metrics-storage.md @@ -1,102 +1,207 @@ -<!-- -title: "Change how long Netdata stores metrics" -description: "With a single configuration change, the Netdata Agent can store days, weeks, or months of metrics at its famous per-second granularity." -custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/store/change-metrics-storage.md" -sidebar_label: "Change how long Netdata stores metrics" -learn_status: "Published" -learn_topic_type: "Tasks" -learn_rel_path: "Setup" ---> - # Change how long Netdata stores metrics -The Netdata Agent uses a custom made time-series database (TSDB), named the [`dbengine`](https://github.com/netdata/netdata/blob/master/database/engine/README.md), to store metrics. +The Netdata Agent uses a custom made time-series database (TSDB), named the +[`dbengine`](https://github.com/netdata/netdata/blob/master/database/engine/README.md), to store metrics. -The default settings retain approximately two day's worth of metrics on a system collecting 2,000 metrics every second, -but the Netdata Agent is highly configurable if you want your nodes to store days, weeks, or months worth of per-second -data. 
+To see the number of metrics stored and the retention in days per tier, use the `/api/v1/dbengine_stats` endpoint. -The Netdata Agent uses the following three fundamental settings in `netdata.conf` to change the behavior of the database engine: +To increase or decrease the metric retention time, you just [configure](#configure-metric-retention) +the number of storage tiers and the space allocated to each one. The effect of these two parameters +on the maximum retention and the memory used by Netdata is described in detail, below. + +## Calculate the system resources (RAM, disk space) needed to store metrics + +### Effect of storage tiers and disk space on retention + +3 tiers are enabled by default in Netdata, with the following configuration: -```conf -[global] - dbengine page cache size = 32 - dbengine multihost disk space = 256 - storage tiers = 1 +``` +[db] + mode = dbengine + + # per second data collection + update every = 1 + + # number of tiers used (1 to 5, 3 being default) + storage tiers = 3 + + # Tier 0, per second data + dbengine multihost disk space MB = 256 + + # Tier 1, per minute data + dbengine tier 1 multihost disk space MB = 128 + dbengine tier 1 update every iterations = 60 + + # Tier 2, per hour data + dbengine tier 2 multihost disk space MB = 64 + dbengine tier 2 update every iterations = 60 ``` -`dbengine page cache size` sets the maximum amount of RAM (in MiB) the database engine uses to cache and index recent -metrics. -`dbengine multihost disk space` sets the maximum disk space (again, in MiB) the database engine uses to store -historical, compressed metrics and `storage tiers` specifies the number of storage tiers you want to have in -your `dbengine`. When the size of stored metrics exceeds the allocated disk space, the database engine removes the -oldest metrics on a rolling basis. 
+The default "update every iterations" of 60 means that if a metric is collected per second in Tier 0, then +we will have a data point every minute in tier 1 and every hour in tier 2. -## Calculate the system resources (RAM, disk space) needed to store metrics +Up to 5 tiers are supported. You may add or remove tiers and/or modify these multipliers, as long as the +product of all the "update every iterations" does not exceed 65535 (number of points for each tier 0 point). + +For example, if you simply add a fourth tier by setting `storage tiers = 4` and defining the disk space for the new tier, +the product of the "update every iterations" will be 60 * 60 * 60 = 216,000, which is > 65535. So you'd need to reduce +the `update every iterations` of the tiers to stay under the limit. + +The exact retention that can be achieved by each tier depends on the number of metrics collected. The more +the metrics, the smaller the retention that will fit in a given size. The general rule is that Netdata needs +about **1 byte per data point on disk for tier 0**, and **4 bytes per data point on disk for tier 1 and above**. + +So, for 1000 metrics collected per second and 256 MB for tier 0, Netdata will store about: + +``` +256MB on disk / 1 byte per point / 1000 metrics => 256k points per metric / 86400 sec per day ~= 3 days +``` + +At tier 1 (per minute): + +``` +128MB on disk / 4 bytes per point / 1000 metrics => 32k points per metric / (24 hr * 60 min) ~= 22 days +``` + +At tier 2 (per hour): + +``` +64MB on disk / 4 bytes per point / 1000 metrics => 16k points per metric / 24 hr per day ~= 2 years +``` + +Of course, if you double the metrics, you halve the retention. There are more factors that affect retention. The number +of ephemeral metrics (i.e. metrics that are collected for part of the time). The number of metrics that are +usually constant over time (affecting compression efficiency). 
The number of restarts a Netdata Agent goes +through over time (because it has to break pages prematurely, increasing the metadata overhead). But the actual +numbers should not deviate significantly from the above. + +To see the number of metrics stored and the retention in days per tier, use the `/api/v1/dbengine_stats` endpoint. -You can store more or less metrics using the database engine by changing the allocated disk space. Use the calculator -below to find the appropriate value for the `dbengine` based on how many metrics your node(s) collect, whether you are -streaming metrics to a parent node, and more. +### Effect of storage tiers and retention on memory usage -You do not need to edit the `dbengine page cache size` setting to store more metrics using the database engine. However, -if you want to store more metrics _specifically in memory_, you can increase the cache size. +The total memory Netdata uses is heavily influenced by the memory consumed by the DBENGINE. +The DBENGINE memory is related to the number of metrics concurrently being collected, the retention of the metrics +on disk in relation to the queries running, and the number of metrics for which retention is maintained. -:::tip +The precise analysis of how much memory will be used by the DBENGINE itself is described in +[DBENGINE memory requirements](https://github.com/netdata/netdata/blob/master/database/engine/README.md#memory-requirements). -We advise you to visit the [tiering mechanism](https://github.com/netdata/netdata/blob/master/database/engine/README.md#tiering) reference. This will help you -configure the Agent to retain metrics for longer periods. +In addition to the DBENGINE, Netdata uses memory for contexts, metric labels (e.g. in a Kubernetes setup), +other Netdata structures/processes (e.g. Health) and system overhead. 
-::: +The quick rule of thumb for a high-level estimation is: -:::caution +``` +DBENGINE memory in MiB = METRICS x (TIERS - 1) x 8 / 1024 MiB +Total Netdata memory in MiB = Metric ephemerality factor x DBENGINE memory in MiB + "dbengine page cache size MB" from netdata.conf +``` + +You can get the currently collected **METRICS** from the "dbengine metrics" chart of the Netdata dashboard. You just need to divide the +value of the "collected" dimension by the number of tiers. For example, at the specific point highlighted in the chart below, 608k metrics +were being collected across all 3 tiers, which means that `METRICS = 608k / 3 ≈ 202,667`. + +<img width="988" alt="image" src="https://user-images.githubusercontent.com/43294513/225335899-a9216ba7-a09e-469e-89f6-4690aada69a4.png" /> -This calculator provides an estimation of disk and RAM usage for **metrics usage**. Real-life usage may vary based on -the accuracy of the values you enter below, changes in the compression ratio, and the types of metrics stored. -::: +The **ephemerality factor** is usually between 3 and 4 and depends on how frequently the identifiers of the collected metrics change, increasing their +cardinality. The more ephemeral the infrastructure, the more short-lived metrics you have, increasing the ephemerality factor. If the metric cardinality is +extremely high, for example due to a lot of extremely short-lived containers (hundreds started every minute), the ephemerality factor can be much higher than 4. +In such cases, we recommend splitting the load across multiple Netdata parents, until we can provide a way to lower the metric cardinality, +by aggregating similar metrics. -Visit the [Netdata Storage Calculator](https://netdata-storage-calculator.herokuapp.com/) app to customize -data retention according to your preferences. 
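The memory rule of thumb in this hunk can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not part of Netdata: the function and parameter names are hypothetical, and the code only restates the two formulas (`METRICS x (TIERS - 1) x 8 / 1024` for the DBENGINE, then `ephemerality factor x DBENGINE memory + page cache`).

```python
def estimate_netdata_memory_mib(metrics, tiers=3, ephemerality_factor=3.0,
                                page_cache_mib=32):
    """Rule-of-thumb RAM estimate; hypothetical helper, not a Netdata API."""
    # DBENGINE memory in MiB = METRICS x (TIERS - 1) x 8 / 1024
    dbengine_mib = metrics * (tiers - 1) * 8 / 1024
    # Total = ephemerality factor x DBENGINE memory + "dbengine page cache size MB"
    total_mib = ephemerality_factor * dbengine_mib + page_cache_mib
    return dbengine_mib, total_mib


if __name__ == "__main__":
    # 2000 metrics, 3 tiers, low ephemerality, default 32 MiB page cache
    dbengine, total = estimate_netdata_memory_mib(2000)
    print(f"DBENGINE: {dbengine:.2f} MiB, total: {total:.2f} MiB")
```

For 2,000 metrics this gives roughly 31 MiB for the DBENGINE and roughly 126 MiB in total, in line with the rounded figures of the small-agent example. Treat the result as an order-of-magnitude estimate, since the ephemerality factor is itself a guess.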
+#### Small agent RAM usage
-## Edit `netdata.conf` with recommended database engine settings
+For 2000 metrics (dimensions) in 3 storage tiers and the default cache size:
+
+```
+DBENGINE memory for 2k metrics = 2000 x (3 - 1) x 8 / 1024 MiB = 32 MiB
+dbengine page cache size MB = 32 MiB
+Total Netdata memory in MiB = 3*32 + 32 = 128 MiB (low ephemerality)
+```
+
+#### Large parent RAM usage
+
+The Netdata parent in our production infrastructure at the time of writing:
+ - Collects 206k metrics per second, most from children streaming data
+ - The metrics include moderately ephemeral Kubernetes containers, leading to an ephemerality factor of about 4
+ - 3 tiers are used for retention
+ - The `dbengine page cache size MB` in `netdata.conf` is configured to be 4GB
+
+Netdata parents can end up collecting millions of metrics per second. See also [scaling dedicated parent nodes](#scaling-dedicated-parent-nodes).
+
+The rule of thumb calculation for this set up gives us
+```
+DBENGINE memory = 206,000 x 16 / 1024 MiB = 3,217 MiB = about 3 GiB
+Extra cache = 4 GiB
+Metric ephemerality factor = 4
+Estimated total Netdata memory = 3 * 4 + 4 = 16 GiB
+```
-Now that you have a recommended setting for your Agent's `dbengine`, open `netdata.conf` with
-[`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) and look for the `[db]`
-subsection. Change it to the recommended values you calculated from the calculator. For example:
+The actual measurement during a low usage time was the following:
+
+Purpose|RAM|Note
+:--- | ---: | :---
+DBENGINE usage | 5.9 GiB | Out of 7GB max
+Cardinality/ephemerality related memory (k8s contexts, labels, strings) | 3.4 GiB
+Buffer for queries | 0 GiB | Out of 0.5 GiB max, when heavily queried
+Other | 0.5 GiB |
+System overhead | 4.4 GiB | Calculated by subtracting all of the above from the total
+**Total Netdata memory usage** | 14.2 GiB |
+
+All the figures above except for the system memory management overhead were retrieved from Netdata itself.
+The overhead can't be directly calculated, so we subtracted all the other figures from the total Netdata memory usage to get it.
+This overhead is usually around 50% of the memory actually useable by Netdata, but could range from 20% in small
+setups, all the way to 100% in some edge cases.
+
+## Configure metric retention
+
+Once you have decided how to size each tier, open `netdata.conf` with
+[`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files)
+and make your changes in the `[db]` subsection.
+
+Save the file and restart the Agent with `sudo systemctl restart netdata`, or
+the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md)
+for your system, to change the database engine's size.
+
+## Legacy configuration
+
+### v1.35.1 and prior
+
+These versions of the Agent do not support tiers. You could change the metric retention for the parent and
+all of its children only with the `dbengine multihost disk space MB` setting. This setting accounts the space allocation
+for the parent node and all of its children.
+
+To configure the database engine, look for the `page cache size MB` and `dbengine multihost disk space MB` settings in
+the `[db]` section of your `netdata.conf`.
 ```conf
 [db]
- mode = dbengine
- storage tiers = 3
- update every = 1
- dbengine multihost disk space MB = 1024
- dbengine page cache size MB = 32
- dbengine tier 1 update every iterations = 60
- dbengine tier 1 multihost disk space MB = 384
- dbengine tier 1 page cache size MB = 32
- dbengine tier 2 update every iterations = 60
- dbengine tier 2 multihost disk space MB = 16
- dbengine tier 2 page cache size MB = 32
+ dbengine page cache size MB = 32
+ dbengine multihost disk space MB = 256
 ```
-Save the file and restart the Agent with `sudo systemctl restart netdata`, or
-the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system, to change the database engine's size.
+### v1.23.2 and prior
+
+_For Netdata Agents earlier than v1.23.2_, the Agent on the parent node uses one dbengine instance for itself, and another instance for every child node it receives metrics from. If you had four streaming nodes, you would have five instances in total (`1 parent + 4 child nodes = 5 instances`).
-## What's next?
+The Agent allocates resources for each instance separately using the `dbengine disk space MB` (**deprecated**) setting. If `dbengine disk space MB`(**deprecated**) is set to the default `256`, each instance is given 256 MiB in disk space, which means the total disk space required to store all instances is, roughly, `256 MiB * 1 parent * 4 child nodes = 1280 MiB`.
-If you have multiple nodes with the Netdata Agent installed, you
-can [stream metrics](https://github.com/netdata/netdata/blob/master/docs/metrics-storage-management/how-streaming-works.mdx) from any number of _child_ nodes to a _
-parent_ node and store metrics using a centralized time-series database. Streaming allows you to centralize your data,
-run Agents as headless collectors, replicate data, and more.
+#### Backward compatibility
-Storing metrics with the database engine is completely interoperable
-with [exporting to other time-series databases](https://github.com/netdata/netdata/blob/master/docs/export/external-databases.md). With exporting, you can use the
-node's resources to surface metrics when [viewing dashboards](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md), while also
-archiving metrics elsewhere for further analysis, visualization, or correlation with other tools.
+All existing metrics belonging to child nodes are automatically converted to legacy dbengine instances and the localhost
+metrics are transferred to the multihost dbengine instance.
-### Related reference documentation
+All new child nodes are automatically transferred to the multihost dbengine instance and share its page cache and disk
+space. If you want to migrate a child node from its legacy dbengine instance to the multihost dbengine instance, you
+must delete the instance's directory, which is located in `/var/cache/netdata/MACHINE_GUID/dbengine`, after stopping the
+Agent.
-- [Netdata Agent · Database engine](https://github.com/netdata/netdata/blob/master/database/engine/README.md)
-- [Netdata Agent · Database engine configuration option](https://github.com/netdata/netdata/blob/master/daemon/config/README.md#[db]-section-options)
+## Scaling dedicated parent nodes
+When you use streaming in medium to large infrastructures, you can have potentially millions of metrics per second reaching each parent node.
+In the lab we have reliably collected 1 million metrics/sec with 16cores and 32GB RAM.
+Our suggestion for scaling parents is to have them running on dedicated VMs, using a maximum of 50% of cpu, and ensuring you have enough RAM
+for the desired retention. When your infrastructure can lead a parent to exceed these characteristics, split the load to multiple parents that
+do not communicate with each other. With each child sending data to only one of the parents, you can still have replication, high availability,
+and infrastructure level observability via the Netdata Cloud UI.
diff --git a/docs/store/distributed-data-architecture.md b/docs/store/distributed-data-architecture.md
index 96ae4d999..b08a265a3 100644
--- a/docs/store/distributed-data-architecture.md
+++ b/docs/store/distributed-data-architecture.md
@@ -1,20 +1,12 @@
-<!--
-title: "Distributed data architecture"
-description: "Netdata's distributed data architecture stores metrics on individual nodes for high performance and scalability using all your granular metrics."
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/store/distributed-data-architecture.md"
-sidebar_label: "Distributed data architecture"
-learn_status: "Published"
-learn_topic_type: "Concepts"
-learn_rel_path: "Concepts"
--->
-
 # Distributed data architecture

-Netdata uses a distributed data architecture to help you collect and store per-second metrics from any number of nodes.
+Learn how Netdata's distributed data architecture enables us to store metrics on the edge nodes for security, high performance and scalability.
+
+This way, it helps you collect and store per-second metrics from any number of nodes.
 Every node in your infrastructure, whether it's one or a thousand, stores the metrics it collects.

 Netdata Cloud bridges the gap between many distributed databases by _centralizing the interface_ you use to query and
-visualize your nodes' metrics. When you [look at charts in Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md)
+visualize your nodes' metrics. When you [look at charts in Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md)
 , the metrics values are queried directly from that node's database and securely streamed to Netdata Cloud, which
 proxies them to your browser.
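Reviewer's sketch, not part of the commit: the rule-of-thumb RAM estimates in the added "Small agent RAM usage" and "Large parent RAM usage" sections earlier in this diff can be reproduced with a few lines of Python. The function and parameter names are hypothetical. The 8 KiB per metric per extra tier is an assumption read off the `2000 x (3 - 1) x 8 / 1024 MiB` line, not a confirmed Netdata constant. The documentation rounds intermediate values, so its 128 MiB and 16 GiB figures differ slightly from the unrounded results here.

```python
def dbengine_memory_mib(metrics: int, tiers: int = 3, kib_per_metric_per_extra_tier: int = 8) -> float:
    """Estimate DBENGINE memory in MiB.

    Mirrors the documented line "metrics x (tiers - 1) x 8 / 1024 MiB".
    The 8 KiB factor is an assumption taken from that line.
    """
    return metrics * (tiers - 1) * kib_per_metric_per_extra_tier / 1024


def total_memory_mib(metrics: int, page_cache_mib: float, ephemerality_factor: float, tiers: int = 3) -> float:
    """Estimate total Netdata memory: ephemerality factor x DBENGINE memory + page cache."""
    return ephemerality_factor * dbengine_memory_mib(metrics, tiers) + page_cache_mib


# Small agent: 2000 metrics, 32 MiB page cache, ephemerality factor 3.
small = total_memory_mib(2_000, page_cache_mib=32, ephemerality_factor=3)

# Large parent: 206k metrics, 4 GiB page cache, ephemerality factor 4.
large = total_memory_mib(206_000, page_cache_mib=4096, ephemerality_factor=4)

print(f"small agent:  {small:.2f} MiB")
print(f"large parent: {large / 1024:.2f} GiB")
```

The unrounded results (about 126 MiB and about 16.6 GiB) land close to the documented 128 MiB and 16 GiB. The measured 14.2 GiB for the large parent suggests the rule of thumb is best read as a rough upper estimate.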
@@ -77,16 +69,7 @@ When you use the database engine to store your metrics, you can always perform a Netdata Cloud does not store metric values. To enable certain features, such as [viewing active alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/view-active-alarms.md) -or [filtering by hostname/service](https://learn.netdata.cloud/docs/cloud/war-rooms#node-filter), Netdata Cloud does +or [filtering by hostname/service](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md), Netdata Cloud does store configured alarms, their status, and a list of active collectors. Netdata does not and never will sell your personal data or data about your deployment. - -## What's next? - -You can configure the Netdata Agent to store days, weeks, or months worth of distributed, per-second data by -[configuring the database engine](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md). Use our calculator to determine the system -resources required to retain your desired amount of metrics, and expand or contract the database by editing a single -setting. - - diff --git a/docs/visualize/create-dashboards.md b/docs/visualize/create-dashboards.md deleted file mode 100644 index f4306f335..000000000 --- a/docs/visualize/create-dashboards.md +++ /dev/null @@ -1,69 +0,0 @@ -<!-- -title: "Create new dashboards" -description: "Create new dashboards in Netdata Cloud, with any number of metrics from any node on your infrastructure, for targeted troubleshooting." -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/visualize/create-dashboards.md ---> - -# Create new dashboards - -With Netdata Cloud, you can build new dashboards that put key metrics from any number of distributed systems in one -place for a bird's eye view of your infrastructure. You can create more meaningful visualizations for troubleshooting or -keep a watchful eye on your infrastructure's most meaningful metrics without moving from node to node. 
- -In the War Room you want to monitor with this dashboard, click on your War Room's dropdown, then click on the green **+ -Add** button next to **Dashboards**. In the panel, give your new dashboard a name, and click **+ Add**. - -Click the **Add Chart** button to add your first chart card. From the dropdown, select the node you want to add the -chart from, then the context. Netdata Cloud shows you a preview of the chart before you finish adding it. - -The **Add Text** button creates a new card with user-defined text, which you can use to describe or document a -particular dashboard's meaning and purpose. Enrich the dashboards you create with documentation or procedures on how to -respond - -![A bird's eye dashboard for a single -node](https://user-images.githubusercontent.com/1153921/102650776-a654ba80-4128-11eb-9a65-4f9801b03d4b.png) - -Charts in dashboards -are [fully interactive](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) and -synchronized. You can -pan through time, zoom, highlight specific timeframes, and more. - -Move any card by clicking on their top panel and dragging them to a new location. Other cards re-sort to the grid system -automatically. You can also resize any card by grabbing the bottom-right corner and dragging it to its new size. - -Hit the **Save** button to finalize your dashboard. Any other member of the War Room can now access it and make changes. - -## Jump to single-node Cloud dashboards - -While dashboards help you associate essential charts from distributed nodes on a single pane of glass, you might need -more detail when troubleshooting an issue. Quickly jump to any node's dashboard by clicking the 3-dot icon in the corner -of any card to open a menu. Hit the **Go to Chart** item. - -Netdata Cloud takes you to the same chart on that node's dashboard. 
You can now navigate all that node's metrics and -[interact with charts](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) to -further investigate anomalies or troubleshoot -complex performance problems. - -When viewing a single-node Cloud dashboard, you can also click on the add to dashboard icon <img -src="https://user-images.githubusercontent.com/1153921/87587846-827fdb00-c697-11ea-9f31-aed0b8c6afba.png" alt="Dashboard -icon" class="image-inline" /> to quickly add that chart to a new or existing dashboard. You might find this useful when -investigating an anomaly and want to quickly populate a dashboard with potentially correlated metrics. - -## Pin dashboards and navigate through Netdata Cloud - -Click on the **Pin** button in any dashboard to put those charts into a separate panel at the bottom of the screen. You -can now navigate through Netdata Cloud freely, individual Cloud dashboards, the Nodes view, different War Rooms, or even -different Spaces, and have those valuable metrics follow you. - -Pinning dashboards helps you correlate potentially related charts across your infrastructure and discover root causes -faster. - -## What's next? - -While it's useful to see real-time metrics on flexible dashboards, you need ways to know precisely when an anomaly -strikes. Every Netdata Agent comes with a health watchdog that -uses [alarms](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) and -[notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) to notify you of -issues seconds after they strike. 
- - diff --git a/docs/visualize/interact-dashboards-charts.md b/docs/visualize/interact-dashboards-charts.md deleted file mode 100644 index bf6d7a01f..000000000 --- a/docs/visualize/interact-dashboards-charts.md +++ /dev/null @@ -1,131 +0,0 @@ -<!-- -title: "Interact with dashboards and charts" -description: "Zoom, highlight, and pan through time on hundreds of real-time, interactive charts to quickly discover the root cause of any anomaly." -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/visualize/interact-dashboards-charts.md ---> - -# Interact with dashboards and charts - -> ⚠️ There is a new version of charts that is currently **only** available on [Netdata Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md). We didn't -> want to keep this valuable feature from you, so after we get this into your hands on the Cloud, we will collect and implement your feedback to make sure we are providing the best possible version of the feature on the Netdata Agent dashboard as quickly as possible. - -You can find Netdata's dashboards in two places: locally served at `http://NODE:19999` by the Netdata Agent, and in -Netdata Cloud. While you access these dashboards differently, they have similar interfaces, identical charts and -metrics, and you interact with both of them the same way. - -> If you're not sure which option is best for you, see our [single-node](https://github.com/netdata/netdata/blob/master/docs/quickstart/single-node.md) and -> [infrastructure](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md) quickstart guides. - -Netdata dashboards are single, scrollable pages with many charts stacked on top of one another. As you scroll up or -down, charts appearing in your browser's viewport automatically load and update every second. 
- -The dashboard is broken up into multiple **sections**, such as **System Overview**, **CPU**, **Disk**, which are -automatically generated based on which [collectors](https://github.com/netdata/netdata/blob/master/docs/collect/how-collectors-work.md) begin collecting metrics when -Netdata starts up. Sections also appear in the right-hand **menu**, along with submenus based on the contexts and -families Netdata creates for your node. - -## Choose timeframes to visualize - -Both the local Agent dashboard and Netdata Cloud feature time & date pickers to help you visualize specific points in -time. In Netdata Cloud, the picker appears in the [Overview](https://github.com/netdata/netdata/blob/master/docs/visualize/overview-infrastructure.md), [Nodes -view](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md), [new -dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md), and any single-node dashboards you visit. - -Local Agent dashboard: - -![Time & date picker on the local Netdata -dashboard](https://user-images.githubusercontent.com/1153921/101512538-5875d080-3938-11eb-8daf-0fbd0948a04b.png) - -Netdata Cloud: - -![Time & date picker on Netdata -Cloud](https://user-images.githubusercontent.com/1153921/101512689-86f3ab80-3938-11eb-8abc-12171a9b8a5e.png) - -Their behavior is identical. Use the Quick Selector to visualize generic timeframes, or use the calendar or inputs to -select days, hours, minutes or seconds. Click **Apply** to re-render all visualizations with new metrics data, or -**Clear** to restore the default timeframe. - -See reference documentation for the [local Agent dashboard](https://github.com/netdata/netdata/blob/master/web/gui/README.md#time--date-picker) and [Netdata -Cloud](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md#time--date-picker) for additional context about how the time & -date picker behaves in each environment. 
- -## Charts, dimensions, families, and contexts - -A **chart** is an interactive visualization of one or more collected/calculated metrics. You can see the name (also -known as its unique ID) of a chart by looking at the top-left corner of a chart and finding the parenthesized text. On a -Linux system, one of the first charts on the dashboard will be the system CPU chart, with the name `system.cpu`. - -A **dimension** is any value that gets shown on a chart. The value can be raw data or calculated values, such as -percentages, aggregates, and more. Most charts will have more than one dimension, in which case it will display each in -a different color. You can disable or enable showing these dimensions by clicking on them. - -A **family** is _one_ instance of a monitored hardware or software resource that needs to be monitored and displayed -separately from similar instances. For example, if your node has multiple partitions, Netdata will create different -families for `/`, `/boot`, `/home`, and so on. Same goes for entire disks, network devices, and more. - -A **context** groups several charts based on the types of metrics being collected and displayed. For example, the -**Disk** section often has many contexts: `disk.io`, `disk.ops`, `disk.backlog`, `disk.util`, and so on. Netdata uses -this context to create individual charts and then groups them by family. You can always see the context of any chart by -looking at its name or hovering over the chart's date. - -See our [dashboard docs](https://github.com/netdata/netdata/blob/master/web/README.md#charts-contexts-families) for more information about the above distinctions -and how they're used across Netdata to meaningfully organize and present metrics. - -## Interact with charts - -Netdata's charts are fully interactive to help you find meaningful information about complex problems. You can pan -through historical metrics, zoom in and out, select specific timeframes for further analysis, resize charts, and more. 
-Whenever you use a chart in this way, Netdata synchronizes all the other charts to match it. - -| Change | Method #1 | Method #2 | Method #3 | -| ------------------------------------------------- | ----------------------------------- | --------------------------------------------------------- | ---------------------------------------------------------- | -| **Stop** a chart from updating | `click` | | | -| **Reset** charts to default auto-refreshing state | `double click` | `double tap` (touchpad/touchscreen) | | -| **Select** a certain timeframe | `ALT` + `mouse selection` | `⌘` + `mouse selection` (macOS) | | -| **Pan** forward or back in time | `click and drag` | `touch and drag` (touchpad/touchscreen) | | -| **Zoom** to a specific timeframe | `SHIFT` + `mouse selection` | | | -| **Zoom** in/out | `SHIFT`/`ALT` + `mouse scrollwheel` | `SHIFT`/`ALT` + `two-finger pinch` (touchpad/touchscreen) | `SHIFT`/`ALT` + `two-finger scroll` (touchpad/touchscreen) | - -![Animated GIF of interacting with Netdata -charts](https://user-images.githubusercontent.com/1153921/102652236-051b3380-412b-11eb-8f7c-a2372ed92cd0.gif) - -These interactions can also be triggered using the icons on the bottom-right corner of every chart. They are, -respectively, `Pan Left`, `Reset`, `Pan Right`, `Zoom In`, and `Zoom Out`. - -You can show and hide individual dimensions by clicking on their names. Use `SHIFT + click` to hide or show dimensions -one at a time. Hiding dimensions simplifies the chart and can help you better discover exactly which aspect of your -system is behaving strangely. - -You can resize any chart by clicking-and-dragging the icon on the bottom-right corner of any chart. To restore the chart -to its original height, double-click the same icon. 
- -![Resizing a chart and resetting it to the default -height](https://user-images.githubusercontent.com/1153921/102652691-24b25c00-412b-11eb-9e2c-95325fcedc67.gif) - -### Composite charts in Netdata Cloud - -Netdata Cloud now supports composite charts in the Overview interface. Composite charts come with a few additional UI -elements and varied interactions, such as the location of dimensions and a utility bar for configuring the state of -individual composite charts. All of these details are covered in the [Overview -reference](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) doc. - -## What's next? - -Netdata Cloud users can [build new dashboards](https://github.com/netdata/netdata/blob/master/docs/visualize/create-dashboards.md) in just a few clicks. By -aggregating relevant metrics from any number of nodes onto a single interface, you can respond faster to anomalies, -perform more targeted troubleshooting, or keep tabs on a bird's eye view of your infrastructure. - -If you're finished with dashboards for now, skip to Netdata's health watchdog for information on [creating or -configuring](https://github.com/netdata/netdata/blob/master/docs/monitor/configure-alarms.md) alarms, and [send notifications](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md) -to get informed when something goes wrong in your infrastructure. 
- -### Related reference documentation - -- [Netdata Agent · Web dashboards overview](https://github.com/netdata/netdata/blob/master/web/README.md) -- [Netdata Cloud · Interact with new charts](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) -- [Netdata Cloud · War Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md) -- [Netdata Cloud · Overview](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) -- [Netdata Cloud · Nodes](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) -- [Netdata Cloud · Build new dashboards](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md) - - diff --git a/docs/visualize/overview-infrastructure.md b/docs/visualize/overview-infrastructure.md index 0daddd97a..c09e9aeae 100644 --- a/docs/visualize/overview-infrastructure.md +++ b/docs/visualize/overview-infrastructure.md @@ -2,6 +2,10 @@ title: "See an overview of your infrastructure" description: "With Netdata Cloud's War Rooms, you can see real-time metrics, from any number of nodes in your infrastructure, in composite charts." custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/visualize/overview-infrastructure.md +sidebar_label: "See an overview of your infrastructure" +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Operations/Netdata Cloud Visualizations" --> # See an overview of your infrastructure @@ -80,32 +84,12 @@ given node to quickly _jump to the same chart in that node's single-node dashboa You can use single-node dashboards in Netdata Cloud to drill down on specific issues, scrub backward in time to investigate historical data, and see like metrics presented meaningfully to help you troubleshoot performance problems. 
-All of the familiar [interactions](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) are available, as is adding any chart -to a [new dashboard](https://github.com/netdata/netdata/blob/master/docs/visualize/create-dashboards.md). +All of the familiar [interactions](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md) are available, as is adding any chart +to a [new dashboard](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/dashboards.md). -## Nodes view - -You can also use the **Nodes view** to monitor the health status and user-configurable key metrics from multiple nodes -in a War Room. Read the [Nodes view doc](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) for details. - -![The Nodes view](https://user-images.githubusercontent.com/1153921/108733066-5fe65800-74eb-11eb-98e0-abaccd36deaf.png) - -## What's next? - -To troubleshoot complex performance issues using Netdata, you need to understand how to interact with its meaningful -visualizations. Learn more about [interaction](https://github.com/netdata/netdata/blob/master/docs/visualize/interact-dashboards-charts.md) to see historical metrics, -highlight timeframes for targeted analysis, and more. - -If you're a Kubernetes user, read about Netdata's [Kubernetes -visualizations](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md) for details about the health map and -time-series k8s charts, and our tutorial, [_Kubernetes monitoring with Netdata: Overview and -visualizations_](https://github.com/netdata/netdata/blob/master/docs/guides/monitor/kubernetes-k8s-netdata.md), for a full walkthrough. 
- -### Related reference documentation - -- [Netdata Cloud · War Rooms](https://github.com/netdata/netdata/blob/master/docs/cloud/war-rooms.md) -- [Netdata Cloud · Overview](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/overview.md) -- [Netdata Cloud · Nodes view](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) -- [Netdata Cloud · Kubernetes visualizations](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/kubernetes.md) +## Nodes tab +You can also use the **Nodes tab** to monitor the health status and user-configurable key metrics from multiple nodes +in a War Room. Read the [Nodes tab documentation](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/nodes.md) for details. +![The Nodes tab](https://user-images.githubusercontent.com/1153921/108733066-5fe65800-74eb-11eb-98e0-abaccd36deaf.png) diff --git a/docs/why-netdata/1s-granularity.md b/docs/why-netdata/1s-granularity.md deleted file mode 100644 index 4fc7fab2d..000000000 --- a/docs/why-netdata/1s-granularity.md +++ /dev/null @@ -1,59 +0,0 @@ -<!-- -title: "1s granularity" -custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/why-netdata/1s-granularity.md ---> - -# 1s granularity - -High resolution metrics are required to effectively monitor and troubleshoot systems and applications. - -## Why? - -- The world is going real-time. Today, customer experience is significantly affected by response time, so SLAs are tighter than ever before. It is just not practical to monitor a 2-second SLA with 10-second metrics. - -- IT goes virtual. Unlike real hardware, virtual environments are not linear, nor predictable. You cannot expect resources to be available when your applications need them. They will eventually be, but not exactly at the time they are needed. 
The latency of virtual environments is affected by many factors, most of which are outside our control, like: the maintenance policy of the hosting provider, the work load of third party virtual machines running on the same physical servers combined with the resource allocation and throttling policy among virtual machines, the provisioning system of the hosting provider, etc. - -## What do others do? - -So, why don't most monitoring platforms and monitoring SaaS providers offer high resolution metrics? - -They want to, but they can't, at least not massively. - -The reasons lie in their design decisions: - -1. Time-series databases (prometheus, graphite, opentsdb, influxdb, etc) centralize all the metrics. At scale, these databases can easily become the bottleneck of the whole infrastructure. - -2. SaaS providers base their business models on centralizing all the metrics. On top of the time-series database bottleneck they also have increased bandwidth costs. So, massively supporting high resolution metrics, destroys their business model. - -Of course, since a couple of decades the world has fixed this kind of scaling problems: instead of scaling up, scale out, horizontally. That is, instead of investing on bigger and bigger central components, decentralize the application so that it can scale by adding more smaller nodes to it. - -There have been many attempts to fix this problem for monitoring. But so far, all solutions required centralization of metrics, which can only scale up. So, although the problem is somehow managed, it is still the key problem of all monitoring platforms and one of the key reasons for increased monitoring costs. - -Another important factor is how resource efficient data collection can be when running per second. Most solutions fail to do it properly. The data collection agent is consuming significant system resources when running "per second", influencing the monitored systems and applications to a great degree. 
- -Finally, per second data collection is a lot harder. Busy virtual environments have [a constant latency of about 100ms, spread randomly to all data sources](https://docs.google.com/presentation/d/18C8bCTbtgKDWqPa57GXIjB2PbjjpjsUNkLtZEz6YK8s/edit#slide=id.g422e696d87_0_57). If data collection is not implemented properly, this latency introduces a random error of +/- 10%, which is quite significant for a monitoring system. - -So, the monitoring industry fails to massively provide high resolution metrics, mainly for 3 reasons: - -1. Centralization of metrics makes monitoring cost inefficient at that rate. -2. Data collection needs optimization, otherwise it will significantly affect the monitored systems. -3. Data collection is a lot harder, especially on busy virtual environments. - -## What does Netdata do differently? - -Netdata decentralizes monitoring completely. Each Netdata node is autonomous. It collects metrics locally, it stores them locally, it runs checks against them to trigger alarms locally, and provides an API for the dashboards to visualize them. This allows Netdata to scale to infinity. - -Of course, Netdata can centralize metrics when needed. For example, it is not practical to keep metrics locally on ephemeral nodes. For these cases, Netdata streams the metrics in real-time, from the ephemeral nodes to one or more non-ephemeral nodes nearby. This centralization is again distributed. On a large infrastructure, there may be many centralization points. - -To eliminate the error introduced by data collection latencies on busy virtual environments, Netdata interpolates collected metrics. It does this using microsecond timings, per data source, offering measurements with an error rate of 0.0001%. 
When running [in debug mode, Netdata calculates this error rate](https://github.com/netdata/netdata/blob/36199f449852f8077ea915a3a14a33fa2aff6d85/database/rrdset.c#L1070-L1099) for every point collected, ensuring that the database works with acceptable accuracy.
-
-Finally, Netdata is really fast. Optimization is a core product feature. On modern hardware, Netdata can collect metrics with a rate of above 1M metrics per second per core (this includes everything, parsing data sources, interpolating data, storing data in the time series database, etc). So, for a few thousands metrics per second per node, Netdata needs negligible CPU resources (just 1-2% of a single core).
-
-Netdata has been designed to:
-
-- Solve the centralization problem of monitoring
-- Replace the console for performance troubleshooting.
-
-So, for Netdata 1s granularity is easy, the natural outcome...
-
-
diff --git a/docs/why-netdata/README.md b/docs/why-netdata/README.md
deleted file mode 100644
index 9c3af5e7d..000000000
--- a/docs/why-netdata/README.md
+++ /dev/null
@@ -1,35 +0,0 @@
-<!--
-title: "Why Netdata"
-custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/why-netdata/README.md
--->
-
-# Why Netdata
-
-> Any performance monitoring solution that does not go down to per second
-> collection and visualization of the data, is useless.
-> It will make you happy to have it, but it will not help you more than that.
-
-Netdata is built around 4 principles:
-
-1. **[Per second data collection for all metrics.](https://github.com/netdata/netdata/blob/master/docs/why-netdata/1s-granularity.md)**
-
- _It is impossible to monitor a 2 second SLA, with 10 second metrics._
-
-2. **[Collect and visualize all the metrics from all possible sources.](https://github.com/netdata/netdata/blob/master/docs/why-netdata/unlimited-metrics.md)**
-
- _To troubleshoot slowdowns, we need all the available metrics. The console should not provide more metrics._
-
-3. **[Meaningful presentation, optimized for visual anomaly detection.](https://github.com/netdata/netdata/blob/master/docs/why-netdata/meaningful-presentation.md)**
-
- _Metrics are a lot more than name-value pairs over time. The monitoring tool should know all the metrics. Users should not!_
-
-4. **[Immediate results, just install and use.](https://github.com/netdata/netdata/blob/master/docs/why-netdata/immediate-results.md)**
-
- _Most of our infrastructure is standardized. There is no point to configure everything metric by metric._
-
-Unlike other monitoring solutions that focus on metrics visualization,
-Netdata's helps us troubleshoot slowdowns without touching the console.
-
-So, everything is a bit different.
-
-
diff --git a/docs/why-netdata/immediate-results.md b/docs/why-netdata/immediate-results.md
deleted file mode 100644
index b35aa5381..000000000
--- a/docs/why-netdata/immediate-results.md
+++ /dev/null
@@ -1,46 +0,0 @@
-<!--
-title: "Immediate results"
-custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/why-netdata/immediate-results.md
--->
-
-# Immediate results
-
-Most of our infrastructure is based on standardized systems and applications.
-
-It is a tremendous waste of time and effort, in a global scale, to require from all users to configure their infrastructure dashboards and alarms metric by metric.
-
-## Why?
-
-Most of the existing monitoring solutions, focus on providing a platform "for building your monitoring". So, they provide the tools to collect metrics, store them, visualize them, check them and query them. And we are all expected to go through this process.
-
-However, most of our infrastructure is standardized. We run well known Linux distributions, the same kernel, the same database, the same web server, etc.
-
-So, why can't we have a monitoring system that can be installed and instantly provide feature rich dashboards and alarms about everything we use? Is there any reason you would like to monitor your web server differently than me?
-
-What a waste of time and money! Hundreds of thousands of people doing the same thing over and over again, trying to understand what the metrics are, how to visualize them, how to configure alarms for them and how to query them when issues arise.
-
-## What do others do?
-
-Open-source solutions rely almost entirely on configuration. So, you have to go through endless metric-by-metric configuration yourself. The result will reflect your skills, your experience, your understanding.
-
-Monitoring SaaS providers offer a very basic set of pre-configured metrics, dashboards and alarms. They assume that you will configure the rest you may need. So, once more, the result will reflect your skills, your experience, your understanding.
-
-## What does Netdata do?
-
-1. Metrics are auto-detected, so for 99% of the cases data collection works out of the box.
-2. Metrics are converted to human readable units, right after data collection, before storing them into the database.
-3. Metrics are structured, organized in charts, families and applications, so that they can be browsed.
-4. Dashboards are automatically generated, so all metrics are available for exploration immediately after installation.
-5. Dashboards are not just visualizing metrics; they are a tool, optimized for visual anomaly detection.
-6. Hundreds of pre-configured alarm templates are automatically attached to collected metrics.
-
-The result is that Netdata can be used immediately after installation!
-
-Netdata:
-
-- Helps engineers understand and learn what the metrics are.
-- Does not require any configuration. Of course there are thousands of options to tweak, but the defaults are pretty good for most systems.
-- Does not introduce any query languages or any other technology to be learned. Of course some familiarity with the tool is required, but nothing too complicated.
-- Includes all the community expertise and experience for monitoring systems and applications.
-
-
diff --git a/docs/why-netdata/meaningful-presentation.md b/docs/why-netdata/meaningful-presentation.md
deleted file mode 100644
index fc670e33f..000000000
--- a/docs/why-netdata/meaningful-presentation.md
+++ /dev/null
@@ -1,68 +0,0 @@
-<!--
-title: "Meaningful presentation"
-custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/why-netdata/meaningful-presentation.md
--->
-
-# Meaningful presentation
-
-Metrics are a lot more than name-value pairs over time. It is just not practical to require from all users to have a deep understanding of all metrics for monitoring their systems and applications.
-
-## Why?
-
-There is a plethora of metrics. And each of them has a context, a meaning, a way to be interpreted.
-
-Traditionally, monitoring solutions instruct engineers to collect only the metrics they understand. This is a good strategy as long as you have a clear understanding of what you need and you have the skills, the expertise and the experience to select them.
-
-For most people, this is an impossible task. It is just not practical to assume that any engineer will have a deep understanding of how the kernel works, how the networking stack works, how the system manages its memory, how it schedules processes, how web servers work, how databases work, etc.
-
-The result is that for most of the world, monitoring sucks. It is incomplete, inefficient, and in most of the cases only useful for providing an illusion that the infrastructure is being monitored. It is not! According to the [State of Monitoring 2017](http://start.bigpanda.io/state-of-monitoring-report-2017), only 11% of the companies are satisfied with their existing monitoring infrastructure, and on the average they use 6-7 monitoring tools.
-
-But even if all the metrics are collected, an even bigger challenge is revealed: What to do with them? How to use them?
-
-The existing monitoring solutions, assume the engineers will:
-
-- Design dashboards
-- Configure alarms
-- Use a query language to investigate issues
-
-However, all these have to be configured metric by metric.
-
-The monitoring industry believes there is this "IT Operations Hero", a person combining these abilities:
-
-1. Has a deep understanding of IT architectures and is a skillful SysAdmin.
-2. Is a superb Network Administrator (can even read and understand the Linux kernel networking stack).
-3. Is an exceptional database administrator.
-4. Is fluent in software engineering, capable of understanding the internal workings of applications.
-5. Masters Data Science, statistical algorithms and is fluent in writing advanced mathematical queries to reveal the meaning of metrics.
-
-Of course this person does not exist!
-
-## What do others do?
-
-Most solutions are based on a time-series database. A database that tracks name-value pairs, over time.
-
-Data collection blindly collects metrics and stores them into the database, dashboard editors query the database to visualize the metrics. They may also provide a query editor, that users can use to query the database by hand.
-
-Of course, it is just not practical to work that way when the database has 10,000 unique metrics. Most of them will be just noise, not because they are not useful, but because no one understands them!
-
-So, they collect very limited metrics. Basic dashboards can be created with these metrics, but for any issue that needs to be troubleshooted, the monitoring system is just not adequate. It cannot help. So, engineers are using the console to access the rest of the metrics and find the root cause.
-
-## What does Netdata do?
-
-In Netdata, the meaning of metrics is incorporated into the database:
-
-1. all metrics are converted and stored to human-friendly units. This is a data-collection process, not a visualization process. For example, cpu utilization in Netdata is stored as percentage, not as kernel ticks.
-
-2. all metrics are organized into human-friendly charts, sharing the same context and units (similar to what other monitoring solutions call `cardinality`). So, when Netdata developer collect metrics, they configure the correlation of the metrics right in data collection, which is stored in the database too.
-
-3. all charts are then organized in families, and chart families are organized in applications. These structures are responsible for providing the menu at the right side of Netdata dashboards for exploring the whole database.
-
-The result is a system that can be browsed by humans, even if the database has 100,000 unique metrics. It is pretty natural for everyone to browse them, understand their meaning and their scope.
-
-Of course, this process makes data collection significantly more time consuming. Netdata developers need to normalize and correlate and categorize every single metric Netdata collects.
-
-But it simplifies everything else. Data collection, metrics database and visualization are de-coupled, thus the query engine is simpler, and the visualization is straight forward.
-
-Netdata goes a step further, by enriching the dashboard with information that is useful for most people. So, to improve clarity and help users be more effective, Netdata includes right in the dashboard the community knowledge and expertise about the metrics. So, that Netdata users can focus on solving their infrastructure problem, not on the technicalities of data collection and visualization.
-
-
diff --git a/docs/why-netdata/unlimited-metrics.md b/docs/why-netdata/unlimited-metrics.md
deleted file mode 100644
index b79a4ede3..000000000
--- a/docs/why-netdata/unlimited-metrics.md
+++ /dev/null
@@ -1,49 +0,0 @@
-<!--
-title: "Unlimited metrics"
-custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/why-netdata/unlimited-metrics.md
--->
-
-# Unlimited metrics
-
-All metrics are important and all metrics should be available when you need them.
-
-## Why?
-
-Collecting all the metrics breaks the first rule of every monitoring text book: "collect only the metrics you need", "collect only the metrics you understand".
-
-Unfortunately, this does not work! Filtering out most metrics is like reading a book by skipping most of its pages...
-
-For many people, monitoring is about:
-
-- Detecting outages
-- Capacity planning
-
-However, **slowdowns are 10 times more common** compared to outages (check slide 14 of [Online Performance is Business Performance ](https://www.slideshare.net/KenGodskind/alertsitetrac) reported by Trac Research/AlertSite). Designing a monitoring system targeting only outages and capacity planning solves just a tiny part of the operational problems we face. Check also [Downtime vs. Slowtime: Which Hurts More?](https://dzone.com/articles/downtime-vs-slowtime-which-hurts-more).
-
-To troubleshoot a slowdown, a lot more metrics are needed. Actually all the metrics are needed, since the real cause of a slowdown is most probably quite complex. If we knew the possible reasons, chances are we would have fixed them before they become a problem.
-
-## What do others do?
-
-Most monitoring solutions, when they are able to detect something, provide just a hint (e.g. "hey, there is a 20% drop in requests per second over the last minute") and they expect us to use the console for determining the root cause.
-
-Of course this introduces a lot more problems: how to troubleshoot a slowdown using the console, if the slowdown lifetime is just a few seconds, randomly spread throughout the day?
-
-You can't! You will spend your entire day on the console, waiting for the problem to happen again while you are logged in. A blame war starts: developers blame the systems, sysadmins blame the hosting provider, someone says it is a DNS problem, another one believes it is network related, etc. We have all experienced this, multiple times...
-
-So, why do monitoring solutions and SaaS providers filter out metrics?
-
-They can't do otherwise!
-
-1. Centralization of metrics depends on metrics filtering, to control monitoring costs. Time-series databases limit the number of metrics collected, because the number of metrics influences their performance significantly. They get congested at scale.
-2. It is a lot easier to provide an illusion of monitoring by using a few basic metrics.
-3. Troubleshooting slowdowns is the hardest IT problem to solve, so most solutions just avoid it.
-
-## What does Netdata do?
-
-Netdata collects, stores and visualizes everything, every single metric exposed by systems and applications.
-
-Due to Netdata's distributed nature, the number of metrics collected does not have any noticeable effect on the performance or the cost of the monitoring infrastructure.
-
-Of course, since Netdata is also about [meaningful presentation](meaningful-presentation.md), the number of metrics makes Netdata development slower. We, the Netdata developers, need to have a good understanding of the metrics before adding them into Netdata. We need to organize the metrics, add information related to them, configure alarms for them, so that you, the Netdata users, will have the best out-of-the-box experience and all the information required to kill the console for troubleshooting slowdowns.
-
-
|