<!--
-title: "Change how long Netdata stores metrics"
-description: "With a single configuration change, the Netdata Agent can store days, weeks, or months of metrics at its famous per-second granularity."
+title: "Netdata Longer Metrics Retention"
+description: ""
custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/longer-metrics-storage.md
-->

# Netdata Longer Metrics Retention

Metrics retention affects three aspects of a Netdata Agent's operation:

1. The disk space required to store the metrics.
2. The memory the Netdata Agent requires to keep that retention available for queries.
3. The CPU resources required to query longer time-frames.

As retention increases, so do the resources required to support it.

Since Netdata Agents usually run at the edge, inside production systems, Netdata Agent **parents** should be considered. In a **parent - child** setup, the child (the Netdata Agent running on a production system) delegates all of its functions, including longer metrics retention and querying, to the parent node, which can dedicate more resources to this task. A single Netdata Agent parent can centralize multiple child Netdata Agents (dozens, hundreds, or even thousands, depending on its available resources).
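
For illustration, here is a minimal `stream.conf` sketch for such a setup. The parent hostname and the API key are placeholders (use your own UUID, e.g. from `uuidgen`), and everything else is assumed to be at its defaults:

```
# on the child (production system): send all metrics to the parent
[stream]
    enabled = yes
    destination = parent.example.com:19999
    api key = 11111111-2222-3333-4444-555555555555

# on the parent: accept metrics from children presenting this API key
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```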

## Ephemerality of metrics

The ephemerality of metrics plays an important role in retention. In environments where metrics stop being collected and new metrics are constantly being generated, we are interested in two parameters:

1. The **expected concurrent number of metrics** as an average for the lifetime of the database.
   This affects mainly the storage requirements.

2. The **expected total number of unique metrics** for the lifetime of the database.
   This affects mainly the memory requirements for having all these metrics indexed and available to be queried.

## Granularity of metrics

The granularity of metrics (the frequency at which they are collected and stored, i.e. their resolution) significantly affects retention.

Lowering the granularity from per second to every two seconds doubles the retention and halves the CPU requirements of the Netdata Agent, without affecting disk space or memory requirements.
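
For example, halving the granularity to one collection every two seconds is a single setting in `netdata.conf` (a sketch, using the `[db]` section as in the examples later in this guide):

```
[db]
    # collect and store metrics every 2 seconds instead of every second,
    # doubling retention for the same disk space
    update every = 2
```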

## Which database mode to use

Netdata Agents support multiple database modes.

The default mode, `[db].mode = dbengine`, has been designed to scale for longer retentions.

The other available database modes are designed to minimize resource utilization and should usually be considered on the child side of **parent - child** setups.

So:

* On a single node setup, use `[db].mode = dbengine` to increase retention.
* On a **parent - child** setup, use `[db].mode = dbengine` on the parent to increase retention, and a more resource-efficient mode (like `save`, `ram` or `none`) on the children to minimize resource utilization, as in the sketch below.
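
A minimal sketch of the child side in `netdata.conf` (the choice of `ram` is illustrative; `save` or `none` are set the same way):

```
[db]
    # keep only a short in-memory buffer locally;
    # the parent maintains the long retention
    mode = ram
```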

To use `dbengine`, set this in `netdata.conf` (it is the default):

```
[db]
    mode = dbengine
```

## Tiering

`dbengine` supports tiering. Tiering allows keeping up to 3 versions of the data, at different resolutions:

1. Tier 0 is the high resolution data.
2. Tier 1 is the first tier, which samples data every 60 collections of Tier 0.
3. Tier 2 is the second tier, which samples data every 3600 collections of Tier 0 (60 of Tier 1), as sketched below.
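
These tier-to-tier sampling ratios are themselves configurable. A sketch with their default values (the setting names below are as found in recent Agents; verify them against your own `netdata.conf`):

```
[db]
    # each Tier 1 point aggregates 60 Tier 0 points
    dbengine tier 1 update every iterations = 60

    # each Tier 2 point aggregates 60 Tier 1 points
    dbengine tier 2 update every iterations = 60
```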

To enable tiering, set `[db].storage tiers` in `netdata.conf` (the default is 1, enabling only Tier 0):

```
[db]
    mode = dbengine
    storage tiers = 3
```

## Disk space requirements

Netdata Agents need about 1 byte on disk per database point on Tier 0, and 4 times more per point on the higher tiers (Tier 1 and Tier 2). Higher tiers need 4 times the storage per point because every point they store carries `min`, `max`, `sum`, `count` and `anomaly rate` (5 values, but only 4 times the storage, since `count` and `anomaly rate` are 16-bit integers). The `average` is calculated on the fly at query time as `sum / count`.

### Tier 0 - per second for a week

For 2000 metrics, collected every second and retained for a week, Tier 0 needs: 1 byte x 2000 metrics x 3600 secs per hour x 24 hours per day x 7 days per week = 1,209,600,000 bytes, or about 1.1 GiB.

The setting to control this is in `netdata.conf`:

```
[db]
    mode = dbengine

    # per second data collection
    update every = 1

    # enable only Tier 0
    storage tiers = 1

    # Tier 0, per second data for a week
    dbengine multihost disk space MB = 1100
```

By setting it to `1100` and restarting the Netdata Agent, this node will start maintaining about a week of data. But pay attention to the number of metrics: if the node collects more than 2000 metrics, or you need more than a week of high-resolution metrics, adjust this setting accordingly.
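
For example (hypothetical numbers), a node collecting 10000 metrics per second needs 5 times the space for the same week of Tier 0 retention: 1 byte x 10000 metrics x 3600 secs per hour x 24 hours per day x 7 days per week = 6,048,000,000 bytes, or about 6GB:

```
[db]
    # Tier 0, per second data for a week, at 10000 concurrent metrics
    dbengine multihost disk space MB = 6000
```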

### Tier 1 - per minute for a month

By default, Tier 1 samples the data every 60 points of Tier 0. If Tier 0 is per second, then Tier 1 is per minute.

Tier 1 needs 4 times more storage per point compared to Tier 0. So, for 2000 metrics, with per minute resolution, retained for a month, Tier 1 needs: 4 bytes x 2000 metrics x 60 minutes per hour x 24 hours per day x 30 days per month = 345,600,000 bytes, or about 330 MiB.

Do this in `netdata.conf`:

```
[db]
    mode = dbengine

    # per second data collection
    update every = 1

    # enable only Tier 0 and Tier 1
    storage tiers = 2

    # Tier 0, per second data for a week
    dbengine multihost disk space MB = 1100

    # Tier 1, per minute data for a month
    dbengine tier 1 multihost disk space MB = 330
```

Once `netdata.conf` is edited (e.g. with `edit-config netdata.conf` from your Netdata config directory, usually `/etc/netdata`), the Netdata Agent needs to be restarted for the changes to take effect.

### Tier 2 - per hour for a year

By default, Tier 2 samples data every 3600 points of Tier 0 (60 of Tier 1). If Tier 0 is per second, then Tier 2 is per hour.

The storage requirements per point are the same as Tier 1's.

For 2000 metrics, with per hour resolution, retained for a year, Tier 2 needs: 4 bytes x 2000 metrics x 24 hours per day x 365 days per year = 70,080,000 bytes, or about 67 MiB.

Do this in `netdata.conf`:

```
[db]
    mode = dbengine

    # per second data collection
    update every = 1

    # enable Tier 0, Tier 1 and Tier 2
    storage tiers = 3

    # Tier 0, per second data for a week
    dbengine multihost disk space MB = 1100

    # Tier 1, per minute data for a month
    dbengine tier 1 multihost disk space MB = 330

    # Tier 2, per hour data for a year
    dbengine tier 2 multihost disk space MB = 67
```

Once `netdata.conf` is edited, the Netdata Agent needs to be restarted for the changes to take effect.