author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-07-24 09:54:23 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-07-24 09:54:44 +0000 |
commit | 836b47cb7e99a977c5a23b059ca1d0b5065d310e (patch) | |
tree | 1604da8f482d02effa033c94a84be42bc0c848c3 /docs/dashboards-and-charts | |
parent | Releasing debian version 1.44.3-2. (diff) | |
Merging upstream version 1.46.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'docs/dashboards-and-charts')
-rw-r--r-- | docs/dashboards-and-charts/README.md | 40 | ||||
-rw-r--r-- | docs/dashboards-and-charts/alerts-tab.md | 66 | ||||
-rw-r--r-- | docs/dashboards-and-charts/anomaly-advisor-tab.md | 27 | ||||
-rw-r--r-- | docs/dashboards-and-charts/dashboards-tab.md | 96 | ||||
-rw-r--r-- | docs/dashboards-and-charts/events-feed.md | 74 | ||||
-rw-r--r-- | docs/dashboards-and-charts/home-tab.md | 60 | ||||
-rw-r--r-- | docs/dashboards-and-charts/import-export-print-snapshot.md | 78 | ||||
-rw-r--r-- | docs/dashboards-and-charts/kubernetes-tab.md | 42 | ||||
-rw-r--r-- | docs/dashboards-and-charts/logs-tab.md | 16 | ||||
-rw-r--r-- | docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md | 25 | ||||
-rw-r--r-- | docs/dashboards-and-charts/netdata-charts.md | 425 | ||||
-rw-r--r-- | docs/dashboards-and-charts/node-filter.md | 17 | ||||
-rw-r--r-- | docs/dashboards-and-charts/nodes-tab.md | 57 | ||||
-rw-r--r-- | docs/dashboards-and-charts/themes.md | 15 | ||||
-rw-r--r-- | docs/dashboards-and-charts/top-tab.md | 27 | ||||
-rw-r--r-- | docs/dashboards-and-charts/visualization-date-and-time-controls.md | 92 |
16 files changed, 1157 insertions, 0 deletions
diff --git a/docs/dashboards-and-charts/README.md b/docs/dashboards-and-charts/README.md new file mode 100644 index 000000000..372f2030b --- /dev/null +++ b/docs/dashboards-and-charts/README.md @@ -0,0 +1,40 @@ +# Dashboards and Charts + +This guide covers how to access both Agent and Cloud dashboards, along with links to explore specific sections in more detail. + +When you access the Netdata dashboard through the Cloud, you'll always have the latest version available. + +By default, the Agent dashboard shows the latest version (matching Netdata Cloud). However, there are a few exceptions: + +- Without internet access, the Agent can't download the newest dashboards. In this case, it will automatically use the bundled version. +- Users have defined, e.g. through a URL bookmark, that they want to see the previous version of the dashboard (accessible at `http://NODE:19999/v1`, replacing `NODE` with the IP address or hostname of your Agent). + +## Main sections + +The Netdata dashboard consists of the following main sections: + +- [Home tab](/docs/dashboards-and-charts/home-tab.md) +- [Nodes tab](/docs/dashboards-and-charts/nodes-tab.md) +- [Netdata charts](/docs/dashboards-and-charts/netdata-charts.md) +- [Metrics tab and single node tabs](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) +- [Top tab](/docs/dashboards-and-charts/top-tab.md) +- [Logs tab](/docs/dashboards-and-charts/logs-tab.md) +- [Dashboards tab](/docs/dashboards-and-charts/dashboards-tab.md) +- [Alerts tab](/docs/dashboards-and-charts/alerts-tab.md) +- [Events tab](/docs/dashboards-and-charts/events-feed.md) + +> **Note** +> +> Some sections of the dashboard, when accessed through the Agent, may require the user to be signed in to Netdata Cloud or have the Agent claimed to Netdata Cloud for their full functionality. Examples include saving visualization settings on charts or custom dashboards, claiming the node to Netdata Cloud, or executing functions on an Agent.
+ +## How to access the dashboards? + +### Netdata Cloud + +You can access the dashboard at <https://app.netdata.cloud/> and [sign-in with an account or sign-up](/docs/netdata-cloud/authentication-and-authorization/README.md) if you don't have an account yet. + +### Netdata Agent + +To view your Netdata dashboard, open a web browser and enter the address `http://NODE:19999` - replace `NODE` with your Agent's IP address or hostname. If the Agent is on the same machine, use http://localhost:19999. + +Documentation for previous Agent dashboard can still be found [here](/src/web/gui/README.md). diff --git a/docs/dashboards-and-charts/alerts-tab.md b/docs/dashboards-and-charts/alerts-tab.md new file mode 100644 index 000000000..00d3efcb7 --- /dev/null +++ b/docs/dashboards-and-charts/alerts-tab.md @@ -0,0 +1,66 @@ +# Alerts tab + +Netdata comes with hundreds of pre-configured health alerts designed to notify you when an anomaly or performance issue affects your node or its applications. + +## Active tab + +From the Active tab you can see all the active alerts in your Room. You will be presented with a table having information about each alert that is in warning or critical state. + +You can always sort the table by a certain column by clicking on the name of that column, and using the gear icon on the top right to control which columns are visible at any given time. + +### Filter alerts + +From this tab, you can also filter alerts with the right hand bar. More specifically you can filter: + +- Alert status + - Filter based on the status of the alerts (e.g. Warning, Critical) +- Alert class + - Filter based on the class of the alert (e.g. Latency, Utilization, Workload etc.) +- Alert type & component + - Filter based on the alert's type (e.g. System, Web Server) and component (e.g. CPU, Disk, Load) +- Alert role + - Filter by the role that the alert is set to notify (e.g. Sysadmin, Webmaster etc.) 
+- Host labels + - Filter based on the host labels that are configured for the nodes across the Room (e.g. `_cloud_instance_region` to match `us-east-1`) +- Node status + - Filter by node availability status (e.g. Live or Offline) +- Netdata version + - Filter by Netdata version (e.g. `v1.45.3`) +- Nodes + - Filter the alerts based on the nodes of your Room. + +### View alert details + +By clicking on the name of an entry of the table you can access that alert's details page, providing you with: + +- Latest and Triggered time values +- The alert's description +- A link to the Netdata Advisor's page about this alert +- The chart at the time frame that the alert was triggered +- The alert's information: Node name, chart instance, type, component and class +- Configuration section +- Instance values - Node Instances + +At the bottom of the panel you can click the green button "View alert page" to open a dynamic tab containing all the info for this alert in a tab format, where you can also run correlations and go to the node's chart that raised the particular alert. + +### Silence an alert + +From this tab, the "Silencing" column shows if there is any rule present for each alert, and from the "Actions" column you can create a new [silencing rule](/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/centralized-cloud-notifications-reference.md#alert-notifications-silencing-rules) for this alert, or get help and information about this alert from the [Netdata Assistant](/docs/netdata-assistant.md). + +## Alert Configurations tab + +From this tab you can view all the configurations for all running alerts in your Room. Each row concerns one alert, and it provides information about it in the rest of the table columns. + +By running alerts we mean alerts that are related to some metric that is or was collected. Netdata may have more alerts pre-configured that aren't applicable to your monitoring use-cases. 
+ +You can control which columns are visible by using the gear icon on the right-hand side. + +Similarly to the previous tab, you can see the silencing status of an alert, while also being able to dig deeper and show the configuration for the alert and ask the [Netdata Assistant](/docs/netdata-assistant.md) for help. + +### See the configuration for an alert + +From the actions column you can explore the alert's configuration, split by the different nodes that have this alert configured. + +From there you can click on any of the rows to get to the individual alert configurations for that node. + +Click on an alert row to see the alert's page, with all the information about when it was last triggered and what its configuration is. diff --git a/docs/dashboards-and-charts/anomaly-advisor-tab.md b/docs/dashboards-and-charts/anomaly-advisor-tab.md new file mode 100644 index 000000000..51b58b23a --- /dev/null +++ b/docs/dashboards-and-charts/anomaly-advisor-tab.md @@ -0,0 +1,27 @@ +# Anomaly Advisor tab + +The Anomaly Advisor tab lets you focus on potentially anomalous metrics and charts related to a particular highlighted window of interest. In addition to this tab, each chart in the [Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) also has an [Anomaly Rate ribbon](/docs/dashboards-and-charts/netdata-charts.md#anomaly-rate-ribbon). + + +More details about configuration can be found in the [ML documentation](/src/ml/README.md#configuration). + +This tab uses our [Anomaly Rate ML feature](/src/ml/README.md#anomaly-rate---averageanomaly-bit) to score metrics in terms of anomalous behavior. + +- The "Anomaly Rate" chart shows the percentage of anomalous metrics over time per node. + +- The "Count of Anomalous Metrics" chart shows raw counts of anomalous metrics per node so may often be similar to the Anomaly Rate chart, apart from where nodes may have different numbers of metrics.
+ +- The "Anomaly Events Detected" chart shows whether the anomaly rate per node has increased enough to cause a node-level anomaly. Anomaly events will appear slightly after the anomaly rate starts to increase in the timeline, because a significant number of metrics in the node need to be anomalous before an anomaly event is triggered. + +Once you have highlighted a window of interest, you should see an ordered list of charts, with the Anomaly Rate being displayed as a purple ribbon in the chart. + +> **Tip** +> +> You can also use the [node filter](/docs/dashboards-and-charts/node-filter.md) to select which nodes you want to include or exclude. + +The right side of the page displays an anomaly index for the highlighted timeline of interest. The index is sorted from most anomalous (highest level of anomaly) to least (lowest level of anomaly). Clicking on an entry in the index will take you to the corresponding chart for the anomalous metric. + +## Usage Tips + +- If you are interested in a subset of specific nodes then filtering to just those nodes before highlighting is recommended to get better results. When you highlight a timeframe, Netdata will ask the Agents for a ranking across all metrics, so if there is a subset of nodes there will be less "averaging" going on and you'll get a less noisy ranking. +- Ideally, highlight close to a spike or window of interest so that the resulting ranking can narrow in more easily on the timeline you are interested in. diff --git a/docs/dashboards-and-charts/dashboards-tab.md b/docs/dashboards-and-charts/dashboards-tab.md new file mode 100644 index 000000000..4d7bbc84f --- /dev/null +++ b/docs/dashboards-and-charts/dashboards-tab.md @@ -0,0 +1,96 @@ +# Dashboards tab + +With Netdata Cloud, you can build **custom dashboards** that target your infrastructure's unique needs. Put key metrics from any number of distributed systems in one place for a bird's eye view of your infrastructure.
+ +Click on the **Dashboards** tab in any Room to get started. + +## Create your first dashboard + +From the Dashboards tab, click on the **+** button. + +In the modal, give your custom dashboard a name, and click **+ Add**. + +- The **Add Chart** button on the top right of the interface adds your first chart card. From the dropdown, select either **All Nodes** or a specific node. + + Next, select the context. You'll see a preview of the chart before you finish adding it. In this modal you can also [interact with the chart](/docs/dashboards-and-charts/netdata-charts.md), meaning you can configure all the aspects of the [NIDL framework](/docs/dashboards-and-charts/netdata-charts.md#nidl-framework) of the chart and more in detail, you can: + - define which `group by` method to use + - select the aggregation function over the data source + - select nodes + - select instances + - select dimensions + - select labels + - select the aggregation function over time + + After you are done configuring the chart, you can also change the type of the chart from the right hand side of the [Title bar](/docs/dashboards-and-charts/netdata-charts.md#title-bar), and select which of the final dimensions you want to be visible and in what order, from the [Dimensions bar](/docs/dashboards-and-charts/netdata-charts.md#dimensions-bar). + +- The **Add Text** button on the top right of the interface creates a new card with user-defined text, which you can use to describe or document a particular dashboard's meaning and purpose. + +> ### Important +> +> Be sure to click the **Save** button any time you make changes to your dashboard. + +## Using your dashboard + +Dashboards are designed to be interactive and flexible so you can design them to your needs. They are made from any number of charts and cards, which can contain charts or text. 
+ +### Charts + +The charts you add to any dashboard are [fully interactive](/docs/dashboards-and-charts/netdata-charts.md), just like any other Netdata chart. You can zoom in and out, highlight timeframes, and more. + +Charts also synchronize as you interact with them, even across contexts _or_ nodes. + +### Text cards + +You can use text cards as notes to explain to other members of the [Room](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#netdata-cloud-rooms) the purpose of the dashboard's arrangement. + +By clicking the `T` icon on the text box, you can switch between font sizes. + +### Move elements + +To move a chart or a card, click and hold on **Drag & drop** at the top right of each element and drag it to a new location. A green placeholder indicates the +new location. Once you release your mouse, other elements re-sort to the grid system automatically. + +### Resize elements + +To resize any element on a dashboard, click on the bottom-right corner and drag it to its new size. Other elements re-sort to the grid system automatically. + +### Go to chart + +Quickly jump to the location of the chart in either the [Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) or if the chart refers to a single node, its single node dashboard by clicking the 3-dot icon in the corner of any chart to open a menu. Hit the **Go to Chart** item. + +You'll land directly on that chart of interest, but you can now scroll up and down to correlate your findings with other +charts. Of course, you can continue to zoom, highlight, and pan through time just as you're used to with Netdata Charts. + +### Rename a chart + +Using the 3-dot icon in the corner of any chart, you can rename it to better explain your use case or the visualization settings you've chosen for the chart. + +### Remove an individual element + +Click on the 3-dot icon in the corner of any card to open a menu. Click the **Remove** item to remove the card. 
+ +## Managing your dashboard + +To see dashboards associated with the current Room, click the **Dashboards** tab in any Room. You can select dashboards and delete them using the 🗑️ icon. + +### Update/save a dashboard + +If you've made changes to a dashboard, such as adding or moving elements, the **Save** button is enabled. Click it to save your most recent changes. + +Any other members of the Room will be able to see these changes the next time they load this dashboard. + +If multiple users attempt to make concurrent changes to the same dashboard, the second user who hits Save will be +prompted to either overwrite the dashboard or reload to see the most recent changes. + +### Delete a dashboard + +Delete any dashboard by navigating to it and clicking the **Delete** button. This will remove this entry from the +dropdown for every member of this Room. + +### Minimum browser viewport + +Because of the visual complexity of individual charts, dashboards require a minimum browser viewport of 800px. + +## What's next? + +Once you've designed a dashboard or two, make sure to [invite your team](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#invite-your-team) if you haven't already. You can add these new users to the same Room to let them see the same dashboards without any effort. diff --git a/docs/dashboards-and-charts/events-feed.md b/docs/dashboards-and-charts/events-feed.md new file mode 100644 index 000000000..a5386e80e --- /dev/null +++ b/docs/dashboards-and-charts/events-feed.md @@ -0,0 +1,74 @@ +# Events tab + +The Events tab provides a feed which is a powerful feature that tracks events that happen on your infrastructure, or in your Space. The feed lets you investigate events that occurred in the past, which is invaluable for troubleshooting. Common use cases are ones like when a node goes offline, and you want to understand what events happened before that.
A detailed event history can also assist in attributing sudden pattern changes in a time series to specific changes in your environment. + +## What are the available events? + +At a high level, the Events feed provides visibility into the following domains. + +> **Note** +> +> Based on your space's plan, different allowances are defined to query past data. + +| **Domains of events** | **Community** | **Homelab** | **Business** | **Enterprise On-Premise** | +|:------------------------------------------------------------------------------------------------------------------------------------------------|:--------------|:------------|:-------------|:--------------------------| +| **[Auditing events](#auditing-events)** <p>Events related to actions done on your Space, e.g. invite user, change user role or change plan.</p> | 4 hours | 90 days | 90 days | User dependent | +| **[Topology events](#topology-events)** <p>Node state transition events, e.g. live or offline.</p> | 4 hours | 14 days | 14 days | User dependent | +| **[Alert events](#alert-events)** <p>Alert state transition events, can be seen as an alert history log.</p> | 4 hours | 90 days | 90 days | User dependent | + +### Auditing events + +| **Event name** | **Description** | **Example** | +|:------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------| +| Space Created | The space was created. | Space `Acme Space` was **created** | +| Room Created | A Room was created on the Space. | Room `DB Servers` was **created** by `John Doe` | +| Room Deleted | A Room was deleted from the Space. | Room `DB servers` was **deleted** by `John Doe` | +| User Invited to Space | A user was invited to join the Space.
| User `John Smith` was **invited** to this space by `Alan Doe` | +| User Uninvited from Space | An invitation for a user to join the space was revoked. | User `John Smith` was **uninvited** from this space | +| User Added to Space | A user was added to the Space from an invitation (user accepted the invitation). | User `John Smith` was **added** to this space by invite of `Alan Doe` | +| User Removed from Space | A user was removed from the Space. | User `John Smith` was **removed** from this space by `Alan Doe` | +| User Added to Room | A user was added to a Room on the Space. | User `John Smith` was **added** to Room `DB servers` | +| User Removed from Room | A user was removed from a Room on the Space. | User `John Smith` was **removed** from Room `DB Servers` by `Alan Doe` | +| User Space Properties Changed | The properties of a user on the Space have changed, e.g. change user role | User role for `John Smith` was **changed** to `troubleshooter` by `Alan Doe` | +| Node Added To Room | The node was added to a Room on the Space. | Node `ip-xyz.ec2.internal` was **added** to Room `DB Servers` by `John Doe` | +| Node Removed From Room | The node was removed from a Room on the Space. | Node `ip-xyz.ec2.internal` was **removed** from Room `DB Servers` by `John Doe` | +| Silencing Rule Created | A new alert notification silencing rule was created on the Space. | Silencing rule `DB Servers schedule silencing` on Rooms `All nodes` and `DB Servers` was **created** by `John Smith` | +| Silencing Rule Changed | An existing alert notification silencing rule was modified on the Space. | Silencing rule `DB Servers schedule silencing` on Rooms `All nodes` and `DB Servers` was **changed** by `John Doe` | +| Silencing Rule Deleted | An existing alert notification silencing rule was removed from the Space.
| Silencing rule `DB Servers schedule silencing` on Rooms `All nodes` and `DB Servers` was **changed** by `Alan Smith` | +| Space Claiming Token Created | A Space Claiming Token was created. | Claiming Token was created by user `John Doe` | +| Space Claiming Token Revoked | A Space Claiming Token was revoked. | Claiming Token `_OtF2ssjrv` was revoked by user `John Doe` | + +### Topology events + +| **Event name** | **Description** | **Example** | +|:--------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------| +| Node Became Live | The node is collecting and streaming metrics to Cloud. | Node `netdata-k8s-state-xyz` was **live** | +| Node Became Stale | The node is offline and not streaming metrics to Cloud. It can show historical data from a parent node. | Node `ip-xyz.ec2.internal` was **stale** | +| Node Became Offline | The node is offline, not streaming metrics to Cloud and not available in any parent node. | Node `ip-xyz.ec2.internal` was **offline** | +| Node Created | The node is created but it is still `Unseen` on Cloud, didn't establish a successful connection yet. | Node `ip-xyz.ec2.internal` was **created** | +| Node Removed | The node was removed from the Space, for example by using the `Delete` action on the node. This is a soft delete in that the node gets marked as deleted, but retains the association with this space. If it becomes live again, it will be restored (see `Node Restored` below) and reappear in this space as before. | Node `ip-xyz.ec2.internal` was **deleted (soft)** | +| Node Restored | The node was restored. See `Node Removed` above. 
| Node `ip-xyz.ec2.internal` was **restored** | +| Node Deleted | The node was deleted from the Space. This is a hard delete and no information on the node is retained. | Node `ip-xyz.ec2.internal` was **deleted (hard)** | +| Agent Connected | The agent connected to the Cloud MQTT server (Agent-Cloud Link established).<br/>These events can only be seen on _All nodes_ Room. | Agent with claim ID `7d87bqs9-cv42-4823-8sd4-3614548850c7` has connected to Cloud. | +| Agent Disconnected | The agent disconnected from the Cloud MQTT server (Agent-Cloud Link severed).<br/>These events can only be seen on _All nodes_ Room. | Agent with claim ID `7d87bqs9-cv42-4823-8sd4-3614548850c7` has disconnected from Cloud: **Connection Timeout**. | +| Space Statistics | Daily snapshot of space node statistics.<br/>These events can only be seen on _All nodes_ Room. | Space statistics. Nodes: **22 live**, **21 stale**, **18 removed**, **61 total**. | + +### Alert events + +| **Event name** | **Description** | **Example** | +|:-------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------| +| Node Alert State Changed | These are node alert state transition events and can be seen as an alert history log.
You will be able to see transitions to or from any of these states: Cleared, Warning, Critical, Removed, Error or Unknown | Transition to Cleared:<br/>`httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` recovered with value **8.33%**<br/><br/>Transition from Cleared to Warning or Critical:<br/>`httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` was raised to **WARNING** with value **10%**<br/><br/>Transition from Warning to Critical:<br/>`httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` escalated to **CRITICAL** with value **25%**<br/><br/>Transition from Critical to Warning:<br/>`httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` was demoted to **WARNING** with value **10%**<br/><br/>Transition to Removed:<br/>Alert `httpcheck_web_service_bad_status` for `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` is no longer available, state can't be assessed.<br/><br/>Transition to Error:<br/>For this alert `httpcheck_web_service_bad_status` related to `httpcheck_netdata_cloud.request_status` on `netdata-parent-xyz` we couldn't calculate the current value | + +## Who can access the events? + +All users will be able to see events from the Topology and Alerts domain but Auditing events, once these are added, will only be accessible to administrators. For more details check the [Netdata Role-Based Access model](/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md). + +## How to use the events feed + +1. Click on the **Events** tab (located near the top of your screen) +1. You will be presented with a table listing the events that occurred from the timeframe defined on the [date time picker](/docs/dashboards-and-charts/visualization-date-and-time-controls.md#date-and-time-selector) +1.
You can use the filtering capabilities available on the right-hand bar to slice through the results provided. See more details on [event types and filters](#event-types-and-filters) + +> **Note** +> +> When you try to query a longer period than what your space allows you will see an error message highlighting that you are querying data outside your plan. diff --git a/docs/dashboards-and-charts/home-tab.md b/docs/dashboards-and-charts/home-tab.md new file mode 100644 index 000000000..23764815f --- /dev/null +++ b/docs/dashboards-and-charts/home-tab.md @@ -0,0 +1,60 @@ +# Home tab + +The Home tab allows users to see an overview of their Room. + +## Total nodes + +The total number of nodes is presented and broken down by state: Live, Offline or Stale. + +## Active alerts + +The number of active alerts is presented in a donut chart, while also having counters for both Critical and Warning alerts. + +## Nodes map + +A map consisting of node entries allows for quick hoverable information about each node, while also presenting node status in a color-coded way. + +The map classification can be altered, allowing the categorization of nodes by: + +- Status (e.g. Live) +- OS (e.g. Ubuntu) +- Technology (e.g. Container) +- Agent version (e.g. v1.45.2) +- Replication factor (e.g. Single, Multi) +- Cloud provider (e.g. AWS) +- Cloud region (e.g. us-east-1) +- Instance type (e.g. c6a.xlarge) + +Color-coding can also be configured between: + +- Status (e.g. Live, Offline) +- Connection stability (e.g. Stable, Unstable) +- Replication factor (e.g. None, Single) + +## Data replication + +There are two views about data replication in the Home tab: + +The first bar chart presents the number of **Parents**, **Children** and **Standalone** nodes. + +The second bar chart presents the number of nodes depending on their Replication factor, **None**, **Single** and **Multi**.
+ +## Alerts overview over the last 24h + +There are two views that display information about nodes that produced the most alerts and top alerts in the last 24 hours. + +The first bar chart presents the nodes that produced the most alerts in a time window of the last 24 hours. + +The second table contains the top alerts in the last 24 hours, along with their instance, the occurrences and their duration in seconds. + +## Netdata Assistant shortcut + +In the Home tab there is a shortcut button in order to start an instant conversation with the [Netdata Assistant](/docs/netdata-assistant.md). + +## Space metrics + +There are three key metrics that are displayed in the Home tab, **Metrics collected**, **Charts visualized** and **Alerts configured**. + +## Data retention per Nodes + +This bar chart shows the number of nodes based on their retention period. diff --git a/docs/dashboards-and-charts/import-export-print-snapshot.md b/docs/dashboards-and-charts/import-export-print-snapshot.md new file mode 100644 index 000000000..80bf514ae --- /dev/null +++ b/docs/dashboards-and-charts/import-export-print-snapshot.md @@ -0,0 +1,78 @@ +<!-- +title: "Import, export, and print a snapshot" +description: >- + "Snapshots can be incredibly useful for diagnosing anomalies after + they've already happened, and are interoperable with any other node + running Netdata." +type: "how-to" +custom_edit_url: "/docs/dashboards-and-charts/import-export-print-snapshot.md" +sidebar_label: "Import, export, and print a snapshot" +learn_status: "Published" +learn_topic_type: "Tasks" +learn_rel_path: "Operations" +--> + +# Import, export, and print a snapshot + +> This feature is only available on v1 dashboards; it hasn't been ported to v2. +> For more information on accessing dashboards check [this documentation](/docs/dashboards-and-charts/README.md).
+ + +Netdata can export snapshots of the contents of your dashboard at a given time, which you can then import into any other +node running Netdata. Or, you can create a print-ready version of your dashboard to save to PDF or actually print to +paper. + +Snapshots can be incredibly useful for diagnosing anomalies after they've already happened. Let's say Netdata triggered a warning alert while you were asleep. In the morning, you can [select the +timeframe](/docs/dashboards-and-charts/visualization-date-and-time-controls.md) when the alert triggered, export a snapshot, and send it to a + +colleague for further analysis. + +Or, send the Netdata team a snapshot of your dashboard when [filing a bug +report](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml) on +GitHub. + +![The export, import, and print +buttons](https://user-images.githubusercontent.com/1153921/114218399-360fb600-991e-11eb-8dea-fabd2bffc5b3.gif) + +## Import a snapshot + +To import a snapshot, click on the **import** icon ![Import +icon](https://raw.githubusercontent.com/netdata/netdata-ui/98e31799c1ec0983f433537ff16d2ac2b0d994aa/src/components/icon/assets/upload.svg) +in the top panel. + +Select the Netdata snapshot file to import. Once the file is loaded, the modal updates with information about the +snapshot and the system from which it was taken. Click **Import** to begin the process. + +Netdata takes the data embedded inside the snapshot and re-creates a static replica on your dashboard. When the import +finishes, you're free to move around and examine the charts. + +Some caveats and tips to keep in mind: + +- Only metrics in the export timeframe are available to you. If you zoom out or pan through time, you'll see the + beginning and end of the snapshot. +- Charts won't update with new information, as you're looking at a static replica, not the live dashboard. +- The import is only temporary.
Reload your browser tab to return to your node's real-time dashboard. + +## Export a snapshot + +To export a snapshot, first pan/zoom any chart to an appropriate _visible timeframe_. The export snapshot will only +contain the metrics you see in charts, so choose the most relevant timeframe. + +Next, click on the **export** icon ![Export +icon](https://raw.githubusercontent.com/netdata/netdata-ui/98e31799c1ec0983f433537ff16d2ac2b0d994aa/src/components/icon/assets/download.svg) +in the top panel. + +Select the metrics resolution to export. The default is 1-second, equal to how often Netdata collects and stores +metrics. Lowering the resolution will reduce the number of data points, and thus the snapshot's overall size. + +Edit the snapshot file name and select your desired compression method. Click on **Export**. When the export is +complete, your browser will prompt you to save the `.snapshot` file to your machine. + +## Print a snapshot + +To print a snapshot, click on the **print** icon ![Import +icon](https://raw.githubusercontent.com/netdata/netdata-ui/98e31799c1ec0983f433537ff16d2ac2b0d994aa/src/components/icon/assets/print.svg) +in the top panel. + +When you click **Print**, Netdata opens a new window to render every chart. This might take some time. When finished, +Netdata opens a browser print dialog for you to save to PDF or print. diff --git a/docs/dashboards-and-charts/kubernetes-tab.md b/docs/dashboards-and-charts/kubernetes-tab.md new file mode 100644 index 000000000..9b5df87d8 --- /dev/null +++ b/docs/dashboards-and-charts/kubernetes-tab.md @@ -0,0 +1,42 @@ +# Kubernetes tab + +The Netdata dashboards feature enhanced visualizations for the resource utilization of Kubernetes (k8s) clusters, embedded in the default [Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) dashboard. 
+ +These visualizations include a health map for viewing the status of k8s pods/containers, in addition to [Netdata charts](/docs/dashboards-and-charts/netdata-charts.md) for viewing per-second CPU, memory, disk, and networking metrics from k8s nodes. + +See our [Kubernetes deployment instructions](/packaging/installer/methods/kubernetes.md) for details on deploying Netdata on your Kubernetes cluster. + +## Available Kubernetes metrics + +Netdata Cloud organizes and visualizes the following metrics from your Kubernetes cluster from every container: + +| Metric | Description | +|------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `k8s.cgroup.cpu_limit` | CPU utilization as a percentage of the limit defined by the [pod specification `spec.containers[].resources.limits.cpu`](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-requests-and-limits-of-pod-and-container) or a [`LimitRange` object](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/#create-a-limitrange-and-a-pod). | +| `k8s.cgroup.cpu` | CPU utilization of the pod/container. 100% usage equals 1 fully-utilized core, 200% equals 2 fully-utilized cores, and so on. | +| `k8s.cgroup.throttled` | The percentage of runnable periods when tasks in a cgroup have been throttled. | +| `k8s.cgroup.throttled_duration` | The total time duration for which tasks in a cgroup have been throttled. | +| `k8s.cgroup.mem_utilization` | Memory utilization within the configured or system-wide (if not set) limits. 
| +| `k8s.cgroup.mem_usage_limit` | Memory utilization, without cache, as a percentage of the limit defined by the [pod specification `spec.containers[].resources.limits.memory`](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-requests-and-limits-of-pod-and-container) or a [`LimitRange` object](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/#create-a-limitrange-and-a-pod). | +| `k8s.cgroup.mem_usage` | Used memory, without cache. | +| `k8s.cgroup.mem` | The sum of `cache` and `rss` (resident set size) memory usage. | +| `k8s.cgroup.writeback` | The size of `dirty` and `writeback` cache. | +| `k8s.cgroup.pgfaults` | Sum of page fault bandwidth, which are raised when the Kubernetes cluster tries accessing a memory page that is mapped into the virtual address space, but not actually loaded into main memory. | +| `k8s.cgroup.throttle_io` | Sum of `read` and `write` per second across all PVs/PVCs attached to the container. | +| `k8s.cgroup.throttle_serviced_ops` | Sum of the `read` and `write` operations per second across all PVs/PVCs attached to the container. | +| `k8s.cgroup.net_net` | Sum of `received` and `sent` bandwidth per second. | +| `k8s.cgroup.net_packets` | Sum of `multicast`, `received`, and `sent` packets. | + + +When viewing the [overview of this dashboard](#kubernetes-containers-overview), Netdata presents the above metrics per container, or aggregated based on +their associated pods. + +## Kubernetes Containers overview + +At the top of the Kubernetes containers section there is a map, that with a given context colorizes the containers in terms of their utilization. + +The filtering of this map is controlled by using the [NIDL framework](/docs/dashboards-and-charts/netdata-charts.md#nidl-framework) from the definition bar of the chart. 
+ +### Detailed information + +Hover over any of the pods/containers in the map to display a modal window, which contains contextual information and real-time metrics from that resource. diff --git a/docs/dashboards-and-charts/logs-tab.md b/docs/dashboards-and-charts/logs-tab.md new file mode 100644 index 000000000..3851d90da --- /dev/null +++ b/docs/dashboards-and-charts/logs-tab.md @@ -0,0 +1,16 @@ +# Logs tab + +The Logs tab is using the [`systemd` journal plugin](/src/collectors/systemd-journal.plugin/README.md), to present a structured view into your infrastructure's `systemd` logs. + +We have a thorough section explaining how you can [work with logs](/docs/category-overview-pages/working-with-logs.md), detailing how the plugin works, and what other utilities are used under the hood to provide you with the visualizations and the log entries. + +The [`systemd` journal plugin](/src/collectors/systemd-journal.plugin/README.md) documentation has information about: + +- [Key features the plugin provides](/src/collectors/systemd-journal.plugin/README.md#key-features) +- [Journal sources](/src/collectors/systemd-journal.plugin/README.md#journal-sources) +- [Journal fields](/src/collectors/systemd-journal.plugin/README.md#journal-fields) +- [Full-text search](/src/collectors/systemd-journal.plugin/README.md#full-text-search) +- [Query performance](/src/collectors/systemd-journal.plugin/README.md#query-performance) +- [Performance at scale](/src/collectors/systemd-journal.plugin/README.md#performance-at-scale) + +We recommend you to read through that document, to better understand how the plugin and the visualizations work. 
diff --git a/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md b/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md new file mode 100644 index 000000000..bf31b8a71 --- /dev/null +++ b/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md @@ -0,0 +1,25 @@ +# Metrics tab and single node tabs + +The Metrics tab is where all the time series [charts](/docs/dashboards-and-charts/netdata-charts.md) for all the nodes of a Room are located. + +You can also see single-node dashboards, essentially the same dashboard the Metrics tab offers but only for one node. They are reached from most places in the UI, often by clicking the name of a node. + +From this tab, a user can also reach the Integrations tab and run [Metric Correlations](/docs/metric-correlations.md) + +## Dashboard structure + +The dashboard consists of various charts presented in different chart types. They are categorized based on their [context](/docs/dashboards-and-charts/netdata-charts.md#contexts) and at the beginning of each section, there is a predefined arrangement of charts helping you to get an overview for that particular section. + +## Chart navigation Menu + +On the right-hand side, there is a bar that: + +- Allows for quick navigation through the sections of the dashboard +- Provides a filtering mechanism that can filter charts by: + - Host labels + - Node status + - Netdata version + - Individual nodes +- Presents the active alerts for the Room + +From this bar you can also view the maximum chart anomaly rate on each menu section by clicking the `AR%` button. diff --git a/docs/dashboards-and-charts/netdata-charts.md b/docs/dashboards-and-charts/netdata-charts.md new file mode 100644 index 000000000..5536f83b2 --- /dev/null +++ b/docs/dashboards-and-charts/netdata-charts.md @@ -0,0 +1,425 @@ +# Netdata Charts + +Learn how to use Netdata's powerful charts to troubleshoot with real-time, per-second metric data. 
+ +Netdata excels in collecting, storing, and organizing metrics in out-of-the-box dashboards. +To make sense of all the metrics, Netdata offers an enhanced version of charts that update every second. + +These charts provide a lot of useful information, so that you can: + +- Enjoy the high-resolution, granular metrics collected by Netdata +- Examine all the metrics by hovering over them with your cursor +- Filter the metrics in any way you want using the [Definition bar](#definition-bar) +- View the combined anomaly rate of all underlying data with the [Anomaly Rate ribbon](#anomaly-rate-ribbon) +- Explore even more details about a chart's metrics through [hovering over certain elements of it](#hover-over-the-chart) +- Use intuitive tooling and shortcuts to pan, zoom or highlight areas of interest in your charts +- On highlight, get easy access to [Metric Correlations](/docs/metric-correlations.md) to see other metrics with similar patterns +- Have the dimensions sorted based on name or value +- View information about the chart, its plugin, context, and type +- View individual metric collection status about a chart + +These charts are available on Netdata Cloud's +[Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md), [single node tabs](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) and +on your [Custom Dashboards](/docs/dashboards-and-charts/dashboards-tab.md). 
+ +## Overview + +A Netdata chart looks like this: + +<img src="https://user-images.githubusercontent.com/70198089/236133212-353c102f-a6ed-45b7-9251-34e004c7a10a.png" width="900"/> + +With a quick glance you have immediate information available at your disposal: + +- [Chart title and units](#title-bar) +- [Anomaly Rate ribbon](#anomaly-rate-ribbon) +- [Definition bar](#definition-bar) +- [Tool bar](#tool-bar) +- [Chart area](#hover-over-the-chart) +- [Legend with dimensions](#dimensions-bar) + +## Fundamental elements + +While Netdata's charts require no configuration and are easy to interact with, they have a lot of underlying complexity. To meaningfully organize charts out of the box based on what's happening in your nodes, Netdata uses the concepts of [dimensions](#dimensions), [contexts](#contexts), and [families](#families). + +Understanding how these work will help you more easily navigate the dashboard, +[write new alerts](/src/health/REFERENCE.md), or play around +with the [API](/src/web/api/README.md). + +### Dimensions + +A **dimension** is a value that gets shown on a chart. The value can be raw data or calculated values, such as the +average (the default), minimum, or maximum. These values can then be given any type of unit. For example, CPU +utilization is represented as a percentage, disk I/O as `MiB/s`, and available RAM as an absolute value in `MiB` or +`GiB`. + +Beneath every chart (or on the right-side if you configure the dashboard) is a legend of dimensions. When there are +multiple dimensions, you'll see a different entry in the legend for each dimension. + +The **Apps CPU Time** chart (with the [context](#contexts) `apps.cpu`), which visualizes CPU utilization of +different types of processes/services/applications on your node, always provides a vibrant example of a chart with +multiple dimensions. + +Dimensions can be [hidden](#show-and-hide-dimensions) to help you focus your attention. 
+ +### Contexts + +A **context** is a way of grouping charts by the types of metrics collected and dimensions displayed. It's like a machine-readable naming and organization scheme. + +For example, the **Apps CPU Time** has the context `apps.cpu`. A little further down on the dashboard is a similar +chart, **Apps Real Memory (w/o shared)** with the context `apps.mem`. The `apps` portion of the context is the **type**, +whereas anything after the `.` is specified either by the chart's developer or by the [family](#families). + +By default, a chart's type affects where it fits in the menu, while its family creates submenus. + +Netdata also relies on contexts for [alert configuration](/src/health/REFERENCE.md) (the [`on` line](/src/health/REFERENCE.md#alert-line-on)). + +### Families + +**Families** are a _single instance_ of a hardware or software resource that needs to be displayed separately from +similar instances. + +For example, let's look at the **Disks** section, which contains a number of charts with contexts like `disk.io`, +`disk.ops`, `disk.backlog`, and `disk.util`. If your node has multiple disk drives at `sda` and `sdb`, Netdata creates +a separate family for each. + +Netdata now merges the contexts and families to create charts that are grouped by family, following a +`[context].[family]` naming scheme, so that you can see the `disk.io` and `disk.ops` charts for `sda` right next to each +other. 
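
As a rough illustration of the `[context].[family]` scheme, the sketch below builds chart names by replacing the `.` inside a context with `_` and appending the family. The `chart_name` helper is hypothetical and exists only for this example; it is not part of Netdata's API, and the exact rule is inferred from the `disk_io.sda`-style names listed in this section.

```python
def chart_name(context: str, family: str) -> str:
    """Hypothetical helper: combine a context and a family into a chart name."""
    # Assumption: the '.' inside the context becomes '_', so the only '.'
    # left in the chart name separates the context part from the family.
    return f"{context.replace('.', '_')}.{family}"


contexts = ["disk.io", "disk.ops", "disk.backlog", "disk.util"]
families = ["sda", "sdb"]

# Print every chart name that results from pairing each context with each family.
for context in contexts:
    names = [chart_name(context, family) for family in families]
    print(context, "->", ", ".join(names))
```

Keeping the rule in a single function makes it easy to see that each disk gets one chart per context, which is why charts for the same disk end up next to each other.
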
+ +Given the four example contexts, and two families of `sda` and `sdb`, Netdata will create the following charts and their +names: + +| Context | `sda` family | `sdb` family | +|:---------------|--------------------|--------------------| +| `disk.io` | `disk_io.sda` | `disk_io.sdb` | +| `disk.ops` | `disk_ops.sda` | `disk_ops.sdb` | +| `disk.backlog` | `disk_backlog.sda` | `disk_backlog.sdb` | +| `disk.util` | `disk_util.sda` | `disk_util.sdb` | + +## Title bar + +When you start interacting with a chart, you'll notice valuable information on the Title bar: + +<img src="https://github.com/netdata/netdata/assets/70198089/75d700de-bc7d-4b96-b73d-7b248b83afea" width="900"/> + +Title bar elements: + +- **Netdata icon**: this indicates that data is continuously being updated, this happens if [Time controls](/docs/dashboards-and-charts/visualization-date-and-time-controls.md#time-controls) are in Play or Force Play mode. +- **Chart title**: on the chart title you can see the title together with the metric being displayed, as well as the unit of measurement. +- **Chart status icon**: possible values are: Loading, Timeout, Error or No data, otherwise this icon is not shown. + +Along with viewing chart type, context and units, on this bar you have access to immediate actions over the chart: + + +<img src="https://github.com/netdata/netdata/assets/70198089/d21f326e-065c-4a08-bee9-69ad23736e38" width="200" /> + +- **Manage Alerts**: manage [Alert configurations](/docs/dashboards-and-charts/alerts-tab.md#alert-configurations-tab) for this chart. +- **Chart info**: get more information relevant to the chart you are interacting with. +- **Chart type**: change the chart type from **line**, **stacked**, **area**, **stacked bar** and **multi bar**. +- **Enter fullscreen mode**: expand the current chart to the full size of your screen. +- **User settings**: save your settings for the chart at hand, so it persists across dashboard reloads. + - Personal has the top priority. 
+ - Room and Space settings for a chart are shared across all users who don't have personal settings for it. +- **Drag and Drop the chart to a Dashboard**: add the chart to an existing custom [Dashboard](/docs/dashboards-and-charts/dashboards-tab.md) or directly create a new one that includes the chart. + +## Definition bar + +Each composite chart has a definition bar to provide information and options about the following: + +<img src="https://user-images.githubusercontent.com/70198089/236134615-e53a1d68-8a0f-466b-b2ef-1974085f0e8d.png" width="900"/> + +- Group by option +- Aggregate function to be applied in case multiple data sources exist +- Nodes filter +- Instances filter +- Dimensions filter +- Labels filter +- The aggregate function over time to be applied if one point in the chart consists of multiple data points aggregated +- Resetting the Definition bar + +### NIDL framework + +To help users instantly understand and validate the data they see on charts, we developed the NIDL (Nodes, Instances, Dimensions, Labels) framework. This information is visualized on all charts. + +> You can explore the in-depth infographic, by clicking on this image and opening it in a new tab, +> allowing you to zoom in to the different parts of it. +> +> <a href="https://user-images.githubusercontent.com/2662304/235475061-44628011-3b1f-4c44-9528-34452018eb89.png" target="_blank"> +> <img src="https://user-images.githubusercontent.com/2662304/235475061-44628011-3b1f-4c44-9528-34452018eb89.png" width="400" border="0" align="center"/> +> </a> + +You can rapidly access condensed information for collected metrics, grouped by node, monitored instances, dimension, or any key/value label pair. + +At the Definition bar of each chart, there are a few dropdown menus: + +<img src="https://user-images.githubusercontent.com/43294513/235470150-62a3b9ac-51ca-4c0d-81de-8804e3d733eb.png" width="900"/> + +These dropdown menus have 2 functions: + +1. 
Provide additional information about the visualized chart, to help with understanding the data that is presented. +2. Provide filtering and grouping capabilities, altering the query on the fly, to help get different views of the dataset. + +The NIDL framework attaches metadata to every metric that is collected to provide for each of them the following consolidated data for the visible time frame: + +1. The volume contribution of each metric into the final query. So even if a query comes from 1000 nodes, the contribution of each node in the result can instantly be visualized. The same goes for instances, dimensions and labels. Especially for labels, Netdata also provides the volume contribution of each label `key:value` pair to the final query, so that you can immediately see how much every label value involved in the query affected the chart. +2. The anomaly rate of each of them for the time-frame of the query. This is used to quickly spot which of the nodes, instances, dimensions or labels have anomalies in the requested time-frame. +3. The minimum, average and maximum values of all the points used for the query. This is used to quickly spot which of the nodes, instances, dimensions or labels are responsible for a spike or a dive in the chart. + +All of these dropdown menus can be used for instantly filtering the information shown, by including or excluding specific nodes, instances, dimensions or labels. Directly from the dropdown menu, without the need to edit a query string and without any additional knowledge of the underlying data. + +### Group by dropdown + +The "Group by" dropdown menu allows selecting 1 or more groupings to be applied at once on the same dataset. + +<img src="https://user-images.githubusercontent.com/43294513/235468819-3af5a1d3-8619-48fb-a8b7-8e8b4cf6a8ff.png" width="900"/> + +It supports: + +1. **Group by Node**, to summarize the data of each node, and provide one dimension on the chart for each of the nodes involved. 
Filtering nodes is supported at the same time, using the nodes dropdown menu. +2. **Group by Instance**, to summarize the data of each instance and provide one dimension on the chart for each of the instances involved. Filtering instances is supported at the same time, using the instances dropdown menu. +3. **Group by Dimension**, so that each metric in the visualization is the aggregation of a single dimension. This provides a per dimension view of the data from all the nodes in the Room, taking into account filtering criteria if defined. +4. **Group by Label**, to summarize the data for each label value. Multiple label keys can be selected at the same time. + +Using this menu, you can slice and dice the data in any possible way, to quickly get different views of it, without the need to edit a query string and without any need to better understand the format of the underlying data. + +> ### Tip +> +> A very pertinent example is composite charts over contexts related to cgroups (VMs and containers). +> You have the means to change the default group by or apply filtering to get a better view into what data you are trying to analyze. +> For example, if you change the group by to _instance_ you get a view with the data of all the instances (cgroups) that contribute to that chart. +> Then you can use further filtering tools to focus the data that is important to you and even save the result to your own dashboards. + +> ### Tip +> +> Group by instance, dimension to see the time series of every individual collected metric participating in the chart. + +### Aggregate functions over data sources dropdown + +Each chart uses an opinionated-but-valuable default aggregate function over the data sources. 
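
To make aggregation over data sources concrete, here is a minimal sketch that applies the four basic aggregate functions to five sample values of a single dimension, one per contributing node. It only illustrates the arithmetic; it is not how Netdata implements its query engine, and the variable names are chosen for this example.

```python
from statistics import mean

# Sample values of one dimension, one value per contributing node.
values = [-2.1, -5.5, -10.2, -15, -0.1]

# Apply each aggregate function to the same set of per-node values.
aggregates = {
    "average": mean(values),
    "sum": sum(values),
    "min": min(values),  # for negative values: the largest magnitude
    "max": max(values),  # for negative values: the value closest to zero
}

for name, value in aggregates.items():
    print(f"{name}: {value:.2f}")
```

Note how `min` and `max` behave for negative values: the minimum is the value furthest from zero, while the maximum is the value closest to zero.
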
+ +<img src="https://user-images.githubusercontent.com/70198089/236136725-778670b4-7e81-44a8-8d3d-f38ded823c94.png" width="500"/> + +For example, the `system.cpu` chart shows the average for each dimension from every contributing chart, while the `net.net` chart shows the sum for each dimension from every contributing chart, which can also come from multiple networking interfaces. + +The following aggregate functions are available for each selected dimension: + +- **Average**: Displays the average value from contributing nodes. If a composite chart has 5 nodes with the following + values for the `out` dimension (`-2.1`, `-5.5`, `-10.2`, `-15`, `-0.1`), the composite chart displays a + value of `-6.58`. +- **Sum**: Displays the sum of contributed values. Using the same nodes, dimension, and values as above, the composite + chart displays a metric value of `-32.9`. +- **Min**: Displays a minimum value. For dimensions with positive values, the min is the value closest to zero. For + charts with negative values, the min is the value with the largest magnitude. +- **Max**: Displays a maximum value. For dimensions with positive values, the max is the value with the largest + magnitude. For charts with negative values, the max is the value closest to zero. + +### Nodes dropdown + +In this dropdown, you can view or filter the nodes contributing time-series metrics to the chart. +This menu also provides the contribution of each node to the volume of the chart, and a break down of the anomaly rate of the queried data per node. + +<img src="https://user-images.githubusercontent.com/70198089/236137765-b57d5443-3d4b-42f4-9e3d-db1eb606626f.png" width="900"/> + +If one or more nodes can't contribute to a given chart, the definition bar shows a warning symbol plus the number of +affected nodes, then lists them in the dropdown along with the associated error. 
Nodes might return errors because of +networking issues, a stopped `netdata` service, or because that node does not have any metrics for that context. + +### Instances dropdown + +In this dropdown, you can view or filter the instances contributing time-series metrics to the chart. +This menu also provides the contribution of each instance to the volume of the chart, and a break down of the anomaly rate of the queried data per instance. + +<img src="https://user-images.githubusercontent.com/70198089/236138302-4dd4072e-3a0d-43bb-a9d8-4dde79c65e92.png" width="900"/> + +### Dimensions dropdown + +In this dropdown, you can view or filter the original dimensions contributing time-series metrics to the chart. +This menu also presents the contribution of each original dimension on the chart, and a break down of the anomaly rate of the data per dimension. + +<img src="https://user-images.githubusercontent.com/70198089/236138796-08dc6ac6-9a50-4913-a46d-d9bbcedd48f6.png" width="900"/> + +### Labels dropdown + +In this dropdown, you can view or filter the contributing time-series labels of the chart. +This menu also presents the contribution of each label on the chart, and a break down of the anomaly rate of the data per label. + +<img src="https://user-images.githubusercontent.com/70198089/236139027-8a51a958-2074-4675-a41b-efff30d8f51a.png" width="900"/> + +### Aggregate functions over time + +When the granularity of the data collected is higher than the plotted points on the chart an aggregation function over +time is applied. + +<img src="https://user-images.githubusercontent.com/70198089/236411297-e123db06-0117-4e24-a5ac-955b980a8f55.png" width="400"/> + +By default the aggregation applied is _average_ but the user can choose different options from the following: + +- Min, Max, Average or Sum +- Percentile + - you can specify the percentile you want to focus on: 25th, 50th, 75th, 80th, 90th, 95th, 97th, 98th and 99th. 
+ <img src="https://user-images.githubusercontent.com/70198089/236410299-de5f3367-f3b0-4beb-a73f-a49007c543d4.png" width="250"/> +- Trimmed Mean or Trimmed Median + - you can choose the percentage of data that you want to focus on: 1%, 2%, 3%, 5%, 10%, 15%, 20% and 25%. + <img src="https://user-images.githubusercontent.com/70198089/236410858-74b46af9-280a-4ab2-ad26-5a6aa9403aa8.png" width="250"/> +- Median +- Standard deviation +- Coefficient of variation +- Delta +- Single or Double exponential smoothing + +For more details on each, you can refer to our Agent's HTTP API details on [Data Queries - Data Grouping](/src/web/api/queries/README.md#data-grouping). + +### Reset to defaults + +Finally, you can reset everything to its defaults by clicking the green "Reset" prompt at the end of the definition bar. + +## Anomaly Rate ribbon + +Netdata's unsupervised machine learning algorithm creates a unique model for each metric collected by your agents, using exclusively the metric's past data. +It then uses these unique models during data collection to predict the value that should be collected and check if the collected value is within the range of acceptable values based on past patterns and behavior. + +If the value collected is an outlier, it is marked as anomalous. + +<img src="https://user-images.githubusercontent.com/70198089/236139886-79d63cf6-61ed-4aa7-842c-b5a1728c870d.png" width="900"/> + +This unmatched capability of real-time predictions as data is collected allows you to **detect anomalies for potentially millions of metrics across your entire infrastructure within a second of occurrence**. + +The Anomaly Rate ribbon on top of each chart visualizes the combined anomaly rate of all the underlying data, highlighting areas of interest that may not be easily visible to the naked eye. + +Hovering over the Anomaly Rate ribbon provides a histogram of the anomaly rates per presented dimension, for the specific point in time. 
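
To make the idea of a per-dimension and a combined anomaly rate concrete, here is a minimal sketch. It assumes that every collected sample carries a boolean anomaly flag and that the anomaly rate of a window is simply the percentage of flagged samples. The data and the `anomaly_rate` helper are hypothetical and do not represent Netdata's actual implementation.

```python
def anomaly_rate(flags: list[bool]) -> float:
    """Percentage of samples in a window that were flagged as anomalous."""
    if not flags:
        return 0.0
    return 100.0 * sum(flags) / len(flags)


# Hypothetical anomaly flags for two dimensions over the same 10-sample window.
window = {
    "user":   [False, False, True, True, False, False, False, False, False, False],
    "system": [False, False, False, True, False, False, False, False, False, False],
}

# One rate per dimension, plus one rate over all samples of all dimensions.
per_dimension = {name: anomaly_rate(flags) for name, flags in window.items()}
combined = anomaly_rate([flag for flags in window.values() for flag in flags])

print(per_dimension)
print(f"combined: {combined:.1f}%")
```

Under these assumptions, the per-dimension values correspond to the histogram shown on hover, while the combined value corresponds to what the ribbon summarizes for the whole chart.
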
+ +Anomaly Rate visualization does not make Netdata slower. Anomaly rate is saved in the Netdata database, together with metric values, and due to the smart design of Netdata, it does not even incur a disk footprint penalty. + +## Hover over the chart + +Hovering over any point in the chart will reveal a more informative overlay. +It includes a bar indicating the volume percentage of each time series compared to the total, the anomaly rate, and a notification if there are data collection issues. + +This overlay sorts all dimensions by value, makes bold the closest dimension to the mouse and presents a histogram based on the values of the dimensions. + +<img src="https://user-images.githubusercontent.com/70198089/236141460-bfa66b99-d63c-4a2c-84b1-2509ed94857f.png" width="500"/> + +When hovering over the anomaly ribbon, the overlay sorts all dimensions by anomaly rate, and presents a histogram of these anomaly rates. + +#### Info column + +Additionally, when hovering over the chart, the overlay may display an indication in the "Info" column. + +Currently, this column is used to inform users of any data collection issues that might affect the chart. +Below each chart, there is an information ribbon. This ribbon currently shows 3 states related to the points presented in the chart: + +1. **[P]: Partial Data** + At least one of the dimensions in the chart has partial data, meaning that not all instances available contributed data to this point. This can happen when a container is stopped, or when a node is restarted. This indicator helps to gain confidence of the dataset, in situations when unusual spikes or dives appear due to infrastructure maintenance, or due to failures to part of the infrastructure. + +2. **[O]: Overflown** + At least one of the data sources included in the chart has a counter that has overflowed at this point. + +3. **[E]: Empty Data** + At least one of the dimensions included in the chart has no data at all for the given points. 
+ +All these indicators are also visualized per dimension, in the pop-over that appears when hovering the chart. + +<img src="https://user-images.githubusercontent.com/70198089/236145768-8ffadd02-93a4-4e9e-b4ae-c1367f614a7e.png" width="700"/> + +## Play, Pause and Reset + +Your charts are controlled using the available [Time controls](/docs/dashboards-and-charts/visualization-date-and-time-controls.md#time-controls). +Besides these, when interacting with the chart you can also activate these controls by: + +- Hovering over any chart to temporarily pause it - this momentarily switches time control to Pause, so that you can + hover over a specific timeframe. When moving out of the chart time control will go back to Play (if it was it's + previous state) +- Clicking on the chart to lock it - this enables the Pause option on the time controls, to the current timeframe. This + is if you want to jump to a different chart to look for possible correlations. +- Double clicking to release a previously locked chart - move the time control back to Play + +| Interaction | Keyboard/mouse | Touchpad/touchscreen | Time control | +|:------------------|:---------------|:---------------------|:----------------------| +| **Pause** a chart | `hover` | `n/a` | Temporarily **Pause** | +| **Stop** a chart | `click` | `tap` | **Pause** | +| **Reset** a chart | `double click` | `n/a` | **Play** | + +Note: These interactions are available when the default "Pan" action is used from the [Tool Bar](#tool-bar). + +## Tool bar + +While exploring the chart, a tool bar will appear. This tool bar is there to support you on this task. 
+The available manipulation tools you can select are: + +<img src="https://user-images.githubusercontent.com/70198089/236143292-c1d75528-263d-4ddd-9db8-b8d6a31cb83e.png" width="400" /> + +- Pan +- Highlight +- Select and zoom +- Chart zoom +- Reset zoom + +### Pan + +Drag your mouse/finger to the right to pan backward through time, or drag to the left to pan forward in time. Think of +it like pushing the current timeframe off the screen to see what came before or after. + +| Interaction | Keyboard | Mouse | Touchpad/touchscreen | +|:------------|:---------|:---------------|:---------------------| +| **Pan** | `n/a` | `click + drag` | `touch drag` | + +### Highlight + +Selecting timeframes is useful when you see an interesting spike or change in a chart and want to investigate further by: + +- Looking at the same period of time on other charts/sections +- Running [metric correlations](/docs/metric-correlations.md) to filter metrics that also show something different in the selected period, vs the previous one + +| Interaction | Keyboard/mouse | Touchpad/touchscreen | +|:-----------------------------------|:---------------------------------------------------------|:---------------------| +| **Highlight** a specific timeframe | `Alt + mouse selection` or `⌘ + mouse selection` (macOS) | `n/a` | + +> **Note** +> +> To clear a highlighted timeframe, simply click on the chart area. + +### Select and zoom + +You can zoom to a specific timeframe, either horizontally or vertically, by selecting a timeframe. 
+
+| Interaction | Keyboard/mouse | Touchpad/touchscreen |
+|:-------------------------------------------|:-------------------------------------|:-----------------------------------------------------|
+| **Zoom** to a specific timeframe | `Shift + mouse vertical selection` | `n/a` |
+| **Horizontal Zoom** a specific Y-axis area | `Shift + mouse horizontal selection` | `n/a` |
+
+### Chart zoom
+
+Zooming in helps you see metrics with maximum granularity, which is useful when you're trying to diagnose the root cause
+of an anomaly or outage.
+
+Zooming out lets you see metrics within the larger context, such as the last hour, day, or week, which is useful in understanding what "normal" looks like, or to identify long-term trends, like a slow creep in memory usage.
+
+| Interaction | Keyboard/mouse | Touchpad/touchscreen |
+|:-------------------------------------------|:-------------------------------------|:-----------------------------------------------------|
+| **Zoom** in or out | `Shift + mouse scrollwheel` | `two-finger pinch` <br />`Shift + two-finger scroll` |
+
+## Dimensions bar
+
+### Order dimensions legend
+
+The bottom legend where you can see the dimensions of the chart can be ordered by:
+
+<img src="https://user-images.githubusercontent.com/70198089/236144658-6c3d0e31-9bcb-45f3-bb95-4eafdcbb0a58.png" width="300" />
+
+- Dimension name (Ascending or Descending)
+- Dimension value (Ascending or Descending)
+- Dimension Anomaly Rate (Ascending or Descending)
+
+### Show and hide dimensions
+
+Hiding dimensions simplifies the chart and can help you better discover exactly which aspect of your system might be
+behaving strangely.
+ +| Interaction | Keyboard/mouse | Touchpad/touchscreen | +|:---------------------------------------|:----------------|:---------------------| +| **Show one** dimension and hide others | `click` | `tap` | +| **Toggle (show/hide)** one dimension | `Shift + click` | `n/a` | + +## Resize a chart + +To resize the chart, click-and-drag the icon on the bottom-right corner of any chart. To restore the chart to its original height, double-click the same icon. diff --git a/docs/dashboards-and-charts/node-filter.md b/docs/dashboards-and-charts/node-filter.md new file mode 100644 index 000000000..9f5371fff --- /dev/null +++ b/docs/dashboards-and-charts/node-filter.md @@ -0,0 +1,17 @@ +# Node filter + +The node filter allows you to quickly filter the nodes visualized in a Room's views. It appears on all views, except on single-node dashboards. + +Inside the filter, the nodes get categorized into three groups: + +| Group | Description | +|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Live | Nodes that are currently online, collecting and streaming metrics to Cloud. Live nodes display raised [Alert](/docs/dashboards-and-charts/alerts-tab.md) counters, [Machine Learning](/src/ml/README.md) availability, and [Functions](/docs/top-monitoring-netdata-functions.md) availability | +| Stale | Nodes that are offline and not streaming metrics to Cloud. Only historical data can be presented from a parent node. For these nodes you can only see their ML status, as they are not online to provide more information | +| Offline | Nodes that are offline, not streaming metrics to Cloud and not available in any parent node. 
Offline nodes are automatically deleted after 30 days and can also be deleted manually. |
+
+By using the search bar, you can narrow down to specific nodes based on their name.
+
+When you select one or more nodes, the total selected number will appear in the **Nodes** bar on the **Selected** field.
+
+![The node filter](https://user-images.githubusercontent.com/70198089/225249850-60ce4fcc-4398-4412-a6b5-6082308f4e60.png)
diff --git a/docs/dashboards-and-charts/nodes-tab.md b/docs/dashboards-and-charts/nodes-tab.md
new file mode 100644
index 000000000..70d2bca89
--- /dev/null
+++ b/docs/dashboards-and-charts/nodes-tab.md
@@ -0,0 +1,57 @@
+# Nodes tab
+
+The nodes tab provides a summarized view of your [Room](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#netdata-cloud-rooms), allowing you to view quick information per node.
+
+> **Tip**
+>
+> Keep in mind that all configurations mentioned below are persistent and visible across all users.
+
+## Center information view
+
+The center information view consists of one row per node, and can be configured and filtered by the user.
+
+### Filtering and adjusting the view
+
+In the top right-hand corner, you can:
+
+- Order the nodes per status or per alert status
+- Select which charts you want to be displayed as quick reference points
+
+### Node row
+
+Each node row allows you to:
+
+- View the node's status
+- Go to a single node dashboard, by clicking the node name
+- View information about the node, along with a button to display more in the right-hand sidebar
+- View active alerts for the node
+- View Machine Learning status
+- View Functions capability status
+- Add configuration (beta)
+- [Add alert silencing rules](/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-alert-notification-silencing-rules.md)
+- View a set of key attributes collected on your node
+
+## Right bar
+
+The bar on the right-hand side provides additional information about the nodes in the Room and allows you to filter what is displayed in the [center information view](#center-information-view).
+
+### Node hierarchy
+
+The first tab displays a hierarchy of the nodes displayed, making it easy to find a specific node by name. It follows the ordering that the user has selected.
+
+### Filters sub-tab
+
+The second tab allows you to filter which nodes are displayed. You can filter by:
+
+- Host labels
+- Node status
+- Netdata version
+- Individual nodes
+
+### Alerts sub-tab
+
+The third tab displays Room alerts and allows you to see additional information about each alert.
+
+### Info sub-tab
+
+The last tab presents information about a node. Open it by clicking the `i` icon in a node's row, right next to its name.
diff --git a/docs/dashboards-and-charts/themes.md b/docs/dashboards-and-charts/themes.md
new file mode 100644
index 000000000..0ca7425ae
--- /dev/null
+++ b/docs/dashboards-and-charts/themes.md
@@ -0,0 +1,15 @@
+# Choose your Netdata UI theme
+
+The Dark theme is the default in the Netdata UI.
+
+To change your theme across the Netdata UI, click on your profile picture, click on the **Settings**
+tab, and then choose your preferred theme: **Light** or **Dark**.
+
+**Dark**:
+
+![Dark theme](https://github.com/netdata/netdata/assets/70198089/81addd13-28a4-425f-ae39-0f9de5199496)
+
+**Light**:
+
+![Light theme](https://github.com/netdata/netdata/assets/70198089/eb0fb8c1-5695-450a-8ba8-a185874e8496)
+
diff --git a/docs/dashboards-and-charts/top-tab.md b/docs/dashboards-and-charts/top-tab.md
new file mode 100644
index 000000000..4edaf32f9
--- /dev/null
+++ b/docs/dashboards-and-charts/top-tab.md
@@ -0,0 +1,27 @@
+# Top tab
+
+The Top tab allows you to run [Netdata Functions](/docs/top-monitoring-netdata-functions.md) on a node where a Netdata Agent is running. These routines are exposed by a given collector.
+They can be used to retrieve additional information to help you troubleshoot or to trigger some action to happen on the node itself.
+
+> **Tip**
+>
+> You can also execute a Function from the [Nodes tab](/docs/dashboards-and-charts/nodes-tab.md), by pressing the `f(x)` button.
+
+> **Note**
+>
+> If you get an error saying that your node can't execute Functions, please check the [prerequisites](/docs/top-monitoring-netdata-functions.md#prerequisites).
+
+The main view of this tab provides you with (depending on the Function) two elements: a visualization on the top and a table on the bottom.
+
+Visualizations vary depending on the Function and most allow for user customization.
+
+On the top right-hand corner you can:
+
+- Refresh the results (when the dashboard is in `Paused` mode)
+- Set the update interval of the results.
+
+## Functions bar
+
+The bar on the right-hand side allows you to select which Function to run and on which node. Depending on the Function, more fine-grained filtering might be available.
+
+For example, the `Block-devices` Function allows you to filter per Device, Type, ID, Model and Serial number of the Block devices on your node.
diff --git a/docs/dashboards-and-charts/visualization-date-and-time-controls.md b/docs/dashboards-and-charts/visualization-date-and-time-controls.md
new file mode 100644
index 000000000..3e2b6dbdc
--- /dev/null
+++ b/docs/dashboards-and-charts/visualization-date-and-time-controls.md
@@ -0,0 +1,92 @@
+# Visualization date and time controls
+
+Netdata's dashboard features powerful date visualization controls that include a time control, a timezone selector and a rich date and timeframe selector.
+
+The controls come with useful defaults and rich customization, to help you narrow your focus when troubleshooting issues or anomalies.
+
+## Time controls
+
+The time control provides you with the following options: **Play**, **Pause** and **Force Play**.
+
+- **Play** - the content of the page will be automatically refreshed while this is in the foreground
+- **Pause** - the content of the page isn't refreshed due to a manual request to pause it or, for example, when you're investigating data on a chart (cursor is on top of a chart)
+- **Force Play** - the content of the page will be automatically refreshed even if this is in the background
+
+These options make it clear whether the content you are looking at is live or historical, and let you keep the page refreshing even when its tab is in the background.
+
+Main use cases for **Force Play**:
+
+- You use a terminal or deployment tools to make changes in your infrastructure and want to see the effect immediately, while Netdata is in the background displaying the impact of these changes
+- You want to keep Netdata in the background, for example displayed on a TV, to constantly see metrics through dashboards or to watch the alert status
+
+![The time control with Play, Pause and Force Play](https://user-images.githubusercontent.com/70198089/225850250-1fe12477-23f8-4b4d-b497-79b416963e10.png)
+
+## Date and time selector
+
+The date and time selector allows you to change the visible timeframe and change the timezone used in the interface.
+
+### Pick timeframes to visualize
+
+[Panning through time and zooming in/out](/docs/dashboards-and-charts/netdata-charts.md) from charts is helpful when you're looking at recent history or want to do granular troubleshooting. But what if you want to see metrics from 6 hours ago? Or 6 days?
+
+Netdata's dashboard features a **timeframe selector** to help you visualize specific timeframes in a few helpful ways.
+By default, it shows a certain number of minutes of historical metrics based on your browser's viewport to ensure it's always showing per-second granularity.
+
+#### Open the timeframe selector
+
+To visualize a new timeframe, you need to open the picker, which appears just above the menu, near the top-right bar of the dashboard.
+
+![Timeframe Selector](https://user-images.githubusercontent.com/70198089/225850611-728936d9-7ca4-49fa-8d37-1ce73dd6f76c.png)
+
+The **Clear** button resets the dashboard back to its default state based on your browser viewport, and **Apply** closes
+the picker and shifts all charts to the selected timeframe.
+
+#### Use the pre-defined timeframes
+
+Click any of the following options in the predefined timeframe column to choose between:
+
+- Last 5 minutes
+- Last 15 minutes
+- Last 30 minutes
+- Last hour
+- Last 2 hours
+- Last 6 hours
+- Last 12 hours
+- Last day
+- Last 2 days
+- Last 7 days
+
+Click **Apply** to see metrics from your selected timeframe.
+
+#### Choose a specific interval
+
+Beneath the predefined timeframe columns is an input field and dropdown you use in combination to select a specific timeframe of
+minutes, hours, days, or months. Enter a number and choose the appropriate unit of time, then click **Apply**.
+
+#### Choose multiple days via the calendar
+
+Use the calendar to select multiple days. Click on a date to begin the timeframe selection, then an ending date. The
+timeframe begins at noon on the beginning and end dates. Click **Apply** to see your selected multi-day timeframe.
+
+#### Caveats and considerations
+
+**Longer timeframes will decrease metrics granularity**. At the default timeframe, based on your browser viewport, each
+"tick" on charts represents one second. If you select a timeframe of 6 hours, each tick represents the _average_ value
+across a larger period of time.
+
+**You can only see metrics as far back in history as your metrics retention policy allows**. Netdata uses an internal
+time-series database (TSDB) to store as many metrics as it can within a specific amount of disk space. The default
+storage is 256 MiB, which should be enough for 1-3 days of historical metrics. If you navigate back to a timeframe
+beyond stored historical metrics, you'll see this message:
+
+![image](https://user-images.githubusercontent.com/70198089/225851033-43b95164-a651-48f2-8915-6aac9739ed93.png)
+
+At any time, [configure the internal TSDB's storage capacity](/docs/netdata-agent/configuration/optimizing-metrics-database/change-metrics-storage.md) to expand your
+depth of historical metrics.
+
+### Timezone selector
+
+The default timezone used in all date and time fields in Netdata Cloud comes from your browser. To change it, open the
+date and time selector and use the control displayed here:
+
+![Timezone selector](https://user-images.githubusercontent.com/43294513/216628390-c3bd1cd2-349d-4523-b8d3-c7e68395f670.png)