Diffstat (limited to 'docs/guides/longer-metrics-storage.md')
-rw-r--r--  docs/guides/longer-metrics-storage.md  218
1 file changed, 108 insertions(+), 110 deletions(-)
diff --git a/docs/guides/longer-metrics-storage.md b/docs/guides/longer-metrics-storage.md
index 85edb55e..8ccd9585 100644
--- a/docs/guides/longer-metrics-storage.md
+++ b/docs/guides/longer-metrics-storage.md
@@ -1,160 +1,158 @@
 <!--
-title: "Change how long Netdata stores metrics"
-description: "With a single configuration change, the Netdata Agent can store days, weeks, or months of metrics at its famous per-second granularity."
+title: "Netdata Longer Metrics Retention"
+description: ""
 custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/longer-metrics-storage.md
 -->
 
-# Change how long Netdata stores metrics
+# Netdata Longer Metrics Retention
 
-Netdata helps you collect thousands of system and application metrics every second, but what about storing them for the
-long term?
+Metrics retention affects three aspects of the operation of a Netdata Agent:
 
-Many people think Netdata can only store about an hour's worth of real-time metrics, but that's simply not true any
-more. With the right settings, Netdata is quite capable of efficiently storing hours or days worth of historical,
-per-second metrics without having to rely on an [exporting engine](/docs/export/external-databases.md).
+1. The disk space required to store the metrics.
+2. The memory the Netdata Agent will require to have that retention available for queries.
+3. The CPU resources that will be required to query longer time-frames.
 
-This guide gives two options for configuring Netdata to store more metrics. **We recommend the default [database
-engine](#using-the-database-engine)**, but you can stick with or switch to the round-robin database if you prefer.
+As retention increases, the resources required to support that retention increase too.
 
-Let's get started.
+Since Netdata Agents usually run at the edge, inside production systems, Netdata Agent **parents** should be considered. In a **parent - child** setup, the child (the Netdata Agent running on a production system) delegates all its functions, including longer metrics retention and querying, to the parent node, which can dedicate more resources to this task. A single Netdata Agent parent can centralize multiple child Netdata Agents (dozens, hundreds, or even thousands, depending on its available resources).
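A minimal sketch of wiring up such a **parent - child** pair, assuming the parent is reachable at `203.0.113.10:19999` and using a placeholder API key (generate your own, for example with `uuidgen`). Streaming is configured in `stream.conf` on both nodes:

```
# on the child, /etc/netdata/stream.conf: ship all metrics to the parent
[stream]
    enabled = yes
    destination = 203.0.113.10:19999
    api key = 11111111-2222-3333-4444-555555555555

# on the parent, /etc/netdata/stream.conf: accept children using this key
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```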
 
-## Using the database engine
-The database engine uses RAM to store recent metrics while also using a "spill to disk" feature that takes advantage of
-available disk space for long-term metrics storage. This feature of the database engine allows you to store a much
-larger dataset than your system's available RAM.
+## Ephemerality of metrics
 
-The database engine is currently the default method of storing metrics, but if you're not sure which database you're
-using, check out your `netdata.conf` file and look for the `memory mode` setting:
+The ephemerality of metrics plays an important role in retention. In environments where metrics stop being collected and new metrics are constantly being generated, we are interested in two parameters:
 
-```conf
-[global]
-    memory mode = dbengine
-```
-
-If `memory mode` is set to anything but `dbengine`, change it and restart Netdata using the standard command for
-restarting services on your system. You're now using the database engine!
+1. The **expected concurrent number of metrics** as an average for the lifetime of the database.
+   This affects mainly the storage requirements.
 
-What makes the database engine efficient? While it's structured like a traditional database, the database engine splits
-data between RAM and disk. The database engine caches and indexes data on RAM to keep memory usage low, and then
-compresses older metrics onto disk for long-term storage.
+2. The **expected total number of unique metrics** for the lifetime of the database.
+   This affects mainly the memory requirements for having all these metrics indexed and available to be queried.
 
-When the Netdata dashboard queries for historical metrics, the database engine will use its cache, stored in RAM, to
-return relevant metrics for visualization in charts.
+## Granularity of metrics
 
-Now, given that the database engine uses _both_ RAM and disk, there are two other settings to consider: `page cache
-size` and `dbengine multihost disk space`.
+The granularity of metrics (the frequency they are collected and stored, i.e. their resolution) significantly affects retention.
 
-```conf
-[global]
-    page cache size = 32
-    dbengine multihost disk space = 256
-```
+Lowering the granularity from per second to every two seconds will double the retention and halve the CPU requirements of the Netdata Agent, without affecting disk space or memory requirements.
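As a sketch of this trade-off, halving the granularity is a single setting in `netdata.conf` (the value `2` is only an example):

```
[db]
    # collect and store a point every 2 seconds instead of every second:
    # retention doubles at the same disk footprint, CPU usage roughly halves
    update every = 2
```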
 
-`page cache size` sets the maximum amount of RAM (in MiB) the database engine will use for caching and indexing.
-`dbengine multihost disk space` sets the maximum disk space (again, in MiB) the database engine will use for storing
-compressed metrics. The default settings retain about two day's worth of metrics on a system collecting 2,000 metrics
-every second.
+## Which database mode to use
 
-[**See our database engine
-calculator**](/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics)
-to help you correctly set `dbengine multihost disk space` based on your needs. The calculator gives an accurate estimate
-based on how many child nodes you have, how many metrics your Agent collects, and more.
+Netdata Agents support multiple database modes.
 
-With the database engine active, you can back up your `/var/cache/netdata/dbengine/` folder to another location for
-redundancy.
+The default mode `[db].mode = dbengine` has been designed to scale for longer retentions.
 
-Now that you know how to switch to the database engine, let's cover the default round-robin database for those who
-aren't ready to make the move.
+The other available database modes are designed to minimize resource utilization and should usually be considered on **parent - child** setups, at the children's side.
 
-## Using the round-robin database
+So,
 
-In previous versions, Netdata used a round-robin database to store 1 hour of per-second metrics.
+* On a single node setup, use `[db].mode = dbengine` to increase retention.
+* On a **parent - child** setup, use `[db].mode = dbengine` on the parent to increase retention, and a more resource-efficient mode (like `save`, `ram` or `none`) for the child to minimize resource utilization (see the sketch below).
 
-To see if you're still using this database, or if you would like to switch to it, open your `netdata.conf` file and see
-if `memory mode` option is set to `save`.
+To use `dbengine`, set this in `netdata.conf` (it is the default):
 
-```conf
-[global]
-    memory mode = save
+```
+[db]
+    mode = dbengine
 ```
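For the child side of the **parent - child** setup described in the list above, a minimal sketch of a resource-efficient configuration (assuming the parent holds the long-term retention):

```
[db]
    # child: keep metrics only in a small in-memory buffer;
    # rely on the parent for long-term retention and queries
    mode = ram
```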
+
+## Tiering
 
-If `memory mode` is set to `save`, then you're using the round-robin database. If so, the `history` option is set to
-`3600`, which is the equivalent to 3,600 seconds, or one hour.
+`dbengine` supports tiering. Tiering allows keeping up to 3 versions of the data:
 
-To increase your historical metrics, you can increase `history` to the number of seconds you'd like to store:
+1. Tier 0 is the high resolution data.
+2. Tier 1 is the first tier that samples data every 60 data collections of Tier 0.
+3. Tier 2 is the second tier that samples data every 3600 data collections of Tier 0 (60 of Tier 1).
 
-```conf
-[global]
-    # 2 hours = 2 * 60 * 60 = 7200 seconds
-    history = 7200
-    # 4 hours = 4 * 60 * 60 = 14440 seconds
-    history = 14440
-    # 24 hours = 24 * 60 * 60 = 86400 seconds
-    history = 86400
+To enable tiering, set `[db].storage tiers` in `netdata.conf` (the default is 1, to enable only Tier 0):
+
+```
+[db]
+    mode = dbengine
+    storage tiers = 3
 ```
 
-And so on.
+## Disk space requirements
 
-Next, check to see how many metrics Netdata collects on your system, and how much RAM that uses. Visit the Netdata
-dashboard and look at the bottom-right corner of the interface. You'll find a sentence similar to the following:
+Netdata Agents require about 1 byte on disk per database point on Tier 0, and 4 times more per point on the higher tiers (Tier 1 and Tier 2). The higher tiers need the extra space because for every point they store `min`, `max`, `sum`, `count` and `anomaly rate` (five values, yet only 4 times the storage, because `count` and `anomaly rate` are 16-bit integers). The `average` is calculated on the fly at query time as `sum / count`.
 
-> Every second, Netdata collects 1,938 metrics, presents them in 299 charts and monitors them with 81 alarms. Netdata is
-> using 25 MB of memory on **netdata-linux** for 1 hour, 6 minutes and 36 seconds of real-time history.
+### Tier 0 - per second for a week
 
-On this desktop system, using a Ryzen 5 1600 and 16GB of RAM, the round-robin databases uses 25 MB of RAM to store just
-over an hour's worth of data for nearly 2,000 metrics.
+For 2000 metrics, collected every second and retained for a week, Tier 0 needs: 1 byte x 2000 metrics x 3600 secs per hour x 24 hours per day x 7 days per week = about 1100MB.
 
-To increase the `history` option, you need to edit your `netdata.conf` file and increase the `history` setting. In most
-installations, you'll find it at `/etc/netdata/netdata.conf`, but some operating systems place it at
-`/opt/netdata/etc/netdata/netdata.conf`.
+The setting to control this is in `netdata.conf`:
 
-Use `/etc/netdata/edit-config netdata.conf`, or your favorite text editor, to replace `3600` with the number of seconds
-you'd like to store.
+```
+[db]
+    mode = dbengine
+
+    # per second data collection
+    update every = 1
+
+    # enable only Tier 0
+    storage tiers = 1
+
+    # Tier 0, per second data for a week
+    dbengine multihost disk space MB = 1100
+```
 
-You should base this number on two things: How much history you need for your use case, and how much RAM you're willing
-to dedicate to Netdata.
+By setting it to `1100` and restarting the Netdata Agent, this node will start maintaining about a week of data. But pay attention to the number of metrics: if you have more than 2000 metrics on a node, or you need more than a week of high-resolution metrics, you may need to adjust this setting accordingly.
 
-> Take care when you change the `history` option on production systems. Netdata is configured to stop its process if
-> your system starts running out of RAM, but you can never be too careful. Out of memory situations are very bad.
+### Tier 1 - per minute for a month
 
-How much RAM will a longer history use? Let's use a little math.
+By default, Tier 1 samples the data every 60 points of Tier 0. If Tier 0 is per second, then Tier 1 is per minute.
 
-The round-robin database needs 4 bytes for every value Netdata collects. If Netdata collects metrics every second,
-that's 4 bytes, per second, per metric.
+Tier 1 needs 4 times more storage per point compared to Tier 0. So, for 2000 metrics, with per-minute resolution, retained for a month, Tier 1 needs: 4 bytes x 2000 metrics x 60 minutes per hour x 24 hours per day x 30 days per month = about 330MB.
 
-```text
-4 bytes * X seconds * Y metrics = RAM usage in bytes
+Do this in `netdata.conf`:
+
+```
+[db]
+    mode = dbengine
+
+    # per second data collection
+    update every = 1
+
+    # enable only Tier 0 and Tier 1
+    storage tiers = 2
+
+    # Tier 0, per second data for a week
+    dbengine multihost disk space MB = 1100
+
+    # Tier 1, per minute data for a month
+    dbengine tier 1 multihost disk space MB = 330
 ```
 
-Let's assume your system collects 1,000 metrics per second.
+Once `netdata.conf` is edited, the Netdata Agent needs to be restarted for the changes to take effect.
 
-```text
-4 bytes * 3600 seconds * 1,000 metrics = 14400000 bytes = 14.4 MB RAM
-```
+### Tier 2 - per hour for a year
 
-With that formula, you can calculate the RAM usage for much larger history settings.
-
-```conf
-# 2 hours at 1,000 metrics per second
-4 bytes * 7200 seconds * 1,000 metrics = 28800000 bytes = 28.8 MB RAM
-# 2 hours at 2,000 metrics per second
-4 bytes * 7200 seconds * 2,000 metrics = 57600000 bytes = 57.6 MB RAM
-# 4 hours at 2,000 metrics per second
-4 bytes * 14440 seconds * 2,000 metrics = 115520000 bytes = 115.52 MB RAM
-# 24 hours at 1,000 metrics per second
-4 bytes * 86400 seconds * 1,000 metrics = 345600000 bytes = 345.6 MB RAM
-```
+By default, Tier 2 samples data every 3600 points of Tier 0 (60 of Tier 1). If Tier 0 is per second, then Tier 2 is per hour.
 
-## What's next?
+The storage requirements are the same as for Tier 1.
 
-Now that you have either configured database engine or round-robin database engine to store more metrics, you'll
-probably want to see it in action!
+For 2000 metrics, with per-hour resolution, retained for a year, Tier 2 needs: 4 bytes x 2000 metrics x 24 hours per day x 365 days per year = about 67MB.
+
+Do this in `netdata.conf`:
+
+```
+[db]
+    mode = dbengine
+
+    # per second data collection
+    update every = 1
+
+    # enable Tier 0, Tier 1 and Tier 2
+    storage tiers = 3
+
+    # Tier 0, per second data for a week
+    dbengine multihost disk space MB = 1100
+
+    # Tier 1, per minute data for a month
+    dbengine tier 1 multihost disk space MB = 330
+
+    # Tier 2, per hour data for a year
+    dbengine tier 2 multihost disk space MB = 67
+```
 
-For more information about how to pan charts to view historical metrics, see our documentation on [using
-charts](/web/README.md#using-charts).
+Once `netdata.conf` is edited, the Netdata Agent needs to be restarted for the changes to take effect.
 
-And if you'd now like to reduce Netdata's resource usage, view our [performance
-guide](/docs/guides/configure/performance.md) for our best practices on optimization.
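The figures above assume about 2000 concurrently collected metrics. As a sketch for a busier node, assuming roughly 5000 concurrent metrics, the same arithmetic simply scales each tier's setting by 2.5:

```
[db]
    mode = dbengine
    update every = 1
    storage tiers = 3

    # Tier 0: 2.5 x 1100MB, a week of per second data for ~5000 metrics
    dbengine multihost disk space MB = 2750

    # Tier 1: 2.5 x 330MB, a month of per minute data
    dbengine tier 1 multihost disk space MB = 825

    # Tier 2: 2.5 x 67MB, a year of per hour data
    dbengine tier 2 multihost disk space MB = 168
```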