author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2023-05-08 16:27:08 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2023-05-08 16:27:08 +0000 |
commit | 81581f9719bc56f01d5aa08952671d65fda9867a (patch) | |
tree | 0f5c6b6138bf169c23c9d24b1fc0a3521385cb18 /database/engine/README.md | |
parent | Releasing debian version 1.38.1-1. (diff) | |
Merging upstream version 1.39.0.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'database/engine/README.md')
-rw-r--r-- | database/engine/README.md | 305 |
1 files changed, 10 insertions, 295 deletions
diff --git a/database/engine/README.md b/database/engine/README.md
index 664d40506..890018642 100644
--- a/database/engine/README.md
+++ b/database/engine/README.md
@@ -1,17 +1,9 @@
-<!--
-title: "Database engine"
-description: "Netdata's highly-efficient database engine use both RAM and disk for distributed, long-term storage of per-second metrics."
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/database/engine/README.md"
-sidebar_label: "Database engine"
-learn_status: "Published"
-learn_topic_type: "Concepts"
-learn_rel_path: "Concepts"
--->
-
-# DBENGINE
+# Database engine
 DBENGINE is the time-series database of Netdata.
+![image](https://user-images.githubusercontent.com/2662304/233838474-d4f8f0b9-61dc-4409-a708-97d403cd153a.png)
+
 ## Design
 ### Data Points
@@ -118,53 +110,13 @@ Tiers are supported in Netdata Agents with version `netdata-1.35.0.138.nightly`
 Updating the higher **tiers** is automated, and it happens in real-time while data are being collected for **tier 0**.
-When the Netdata Agent starts, during the first data collection of each metric, higher tiers are automatically **backfilled** with data from lower tiers, so that the aggregation they provide will be accurate.
-
-3 tiers are enabled by default in Netdata, with the following configuration:
-
-```
-[db]
-    mode = dbengine
-
-    # per second data collection
-    update every = 1
-
-    # number of tiers used (1 to 5, 3 being default)
-    storage tiers = 3
-
-    # Tier 0, per second data
-    dbengine multihost disk space MB = 256
-
-    # Tier 1, per minute data
-    dbengine tier 1 multihost disk space MB = 128
-
-    # Tier 2, per hour data
-    dbengine tier 2 multihost disk space MB = 64
-```
-
-The exact retention that can be achieved by each tier depends on the number of metrics collected. The more the metrics, the smaller the retention that will fit in a given size. The general rule is that Netdata needs about **1 byte per data point on disk for tier 0**, and **4 bytes per data point on disk for tier 1 and above**.
-
-So, for 1000 metrics collected per second and 256 MB for tier 0, Netdata will store about:
+When the Netdata Agent starts, during the first data collection of each metric, higher tiers are automatically **backfilled** with
+data from lower tiers, so that the aggregation they provide will be accurate.
-```
-256MB on disk / 1 byte per point / 1000 metrics => 256k points per metric / 86400 seconds per day = about 3 days
-```
-
-At tier 1 (per minute):
-
-```
-128MB on disk / 4 bytes per point / 1000 metrics => 32k points per metric / (24 hours * 60 minutes) = about 22 days
-```
+Configuring how the number of tiers and the disk space allocated to each tier is how you can
+[change how long netdata stores metrics](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md).
-At tier 2 (per hour):
-
-```
-64MB on disk / 4 bytes per point / 1000 metrics => 16k points per metric / 24 hours per day = about 2 years
-```
-
-Of course double the metrics, half the retention. There are more factors that affect retention. The number of ephemeral metrics (i.e. metrics that are collected for part of the time). The number of metrics that are usually constant over time (affecting compression efficiency). The number of restarts a Netdata Agents gets through time (because it has to break pages prematurely, increasing the metadata overhead). But the actual numbers should not deviate significantly from the above.
-### Data Loss
+### Data loss
 Until **hot pages** and **dirty pages** are **flushed** to disk they are at risk (e.g. due to a crash, or
 power failure), as they are stored only in memory.
@@ -172,36 +124,9 @@ power failure), as they are stored only in memory.
 The supported way of ensuring high data availability is the use of Netdata Parents to stream the data in real-time to multiple other Netdata agents.
-## Memory Requirements
-
-DBENGINE memory is related to the number of metrics concurrently being collected, the retention of the metrics on disk in relation with the queries running, and the number of metrics for which retention is maintained.
-
-### Memory for concurrently collected metrics
-
-DBENGINE is automatically sized to use memory according to this equation:
-
-```
-memory in KiB = METRICS x (TIERS - 1) x 4KiB x 2 + 32768 KiB
-```
-
-Where:
-- `METRICS`: the maximum number of concurrently collected metrics (dimensions) from the time the agent started.
-- `TIERS`: the number of storage tiers configured, by default 3 ( `-1` when using 3+ tiers)
-- `x 2`, to accommodate room for flushing data to disk
-- `x 4KiB`, the data segment size of each metric
-- `+ 32768 KiB`, 32 MB for operational caches
-
-So, for 2000 metrics (dimensions) in 3 storage tiers:
+## Memory requirements and retention
-```
-memory for 2k metrics = 2000 x (3 - 1) x 4 KiB x 2 + 32768 KiB = 64 MiB
-```
-
-For 100k concurrently collected metrics in 3 storage tiers:
-
-```
-memory for 100k metrics = 100000 x (3 - 1) x 4 KiB x 2 + 32768 KiB = 1.6 GiB
-```
+See (change how long netdata stores metrics)[https://github.com/netdata/netdata/edit/master/docs/store/change-metrics-storage.md]
 #### Exceptions
@@ -262,216 +187,6 @@ The time-ranges of the queries running control the amount of shared memory requi
 DBENGINE uses 150 bytes of memory for every metric for which retention is maintained but is not currently being collected.
----
-
---- OLD DOCS BELOW THIS POINT ---
-
----
-
-
-## Legacy configuration
-
-### v1.35.1 and prior
-
-These versions of the Agent do not support [Tiers](#Tiers). You could change the metric retention for the parent and
-all of its children only with the `dbengine multihost disk space MB` setting. This setting accounts the space allocation
-for the parent node and all of its children.
-
-To configure the database engine, look for the `page cache size MB` and `dbengine multihost disk space MB` settings in
-the `[db]` section of your `netdata.conf`.
-
-```conf
-[db]
-    dbengine page cache size MB = 32
-    dbengine multihost disk space MB = 256
-```
-
-### v1.23.2 and prior
-
-_For Netdata Agents earlier than v1.23.2_, the Agent on the parent node uses one dbengine instance for itself, and another instance for every child node it receives metrics from. If you had four streaming nodes, you would have five instances in total (`1 parent + 4 child nodes = 5 instances`).
-
-The Agent allocates resources for each instance separately using the `dbengine disk space MB` (**deprecated**) setting. If `dbengine disk space MB`(**deprecated**) is set to the default `256`, each instance is given 256 MiB in disk space, which means the total disk space required to store all instances is, roughly, `256 MiB * 1 parent * 4 child nodes = 1280 MiB`.
-
-#### Backward compatibility
-
-All existing metrics belonging to child nodes are automatically converted to legacy dbengine instances and the localhost
-metrics are transferred to the multihost dbengine instance.
-
-All new child nodes are automatically transferred to the multihost dbengine instance and share its page cache and disk
-space. If you want to migrate a child node from its legacy dbengine instance to the multihost dbengine instance, you
-must delete the instance's directory, which is located in `/var/cache/netdata/MACHINE_GUID/dbengine`, after stopping the
-Agent.
-
-##### Information
-
-For more information about setting `[db].mode` on your nodes, in addition to other streaming configurations, see
-[streaming](https://github.com/netdata/netdata/blob/master/streaming/README.md).
-
-## Requirements & limitations
-
-### Memory
-
-Using database mode `dbengine` we can overcome most memory restrictions and store a dataset that is much larger than the
-available memory.
-
-There are explicit memory requirements **per** DB engine **instance**:
-
-- The total page cache memory footprint will be an additional `#dimensions-being-collected x 4096 x 2` bytes over what
-  the user configured with `dbengine page cache size MB`.
-
-
-- an additional `#pages-on-disk x 4096 x 0.03` bytes of RAM are allocated for metadata.
-
-  - roughly speaking this is 3% of the uncompressed disk space taken by the DB files.
-
-  - for very highly compressible data (compression ratio > 90%) this RAM overhead is comparable to the disk space
-    footprint.
-
-An important observation is that RAM usage depends on both the `page cache size` and the `dbengine multihost disk space`
-options.
-
-You can use
-our [database engine calculator](https://github.com/netdata/netdata/blob/master/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics)
-to validate the memory requirements for your particular system(s) and configuration (**out-of-date**).
-
-### Disk space
-
-There are explicit disk space requirements **per** DB engine **instance**:
-
-- The total disk space footprint will be the maximum between `#dimensions-being-collected x 4096 x 2` bytes or what the
-  user configured with `dbengine multihost disk space` or `dbengine disk space`.
-
-### File descriptor
-
-The Database Engine may keep a **significant** amount of files open per instance (e.g. per streaming child or parent
-server). When configuring your system you should make sure there are at least 50 file descriptors available per
-`dbengine` instance.
-
-Netdata allocates 25% of the available file descriptors to its Database Engine instances. This means that only 25% of
-the file descriptors that are available to the Netdata service are accessible by dbengine instances. You should take
-that into account when configuring your service or system-wide file descriptor limits. You can roughly estimate that the
-Netdata service needs 2048 file descriptors for every 10 streaming child hosts when streaming is configured to use
-`[db].mode = dbengine`.
-
-If for example one wants to allocate 65536 file descriptors to the Netdata service on a systemd system one needs to
-override the Netdata service by running `sudo systemctl edit netdata` and creating a file with contents:
-
-```sh
-[Service]
-LimitNOFILE=65536
-```
-
-For other types of services one can add the line:
-
-```sh
-ulimit -n 65536
-```
-
-at the beginning of the service file. Alternatively you can change the system-wide limits of the kernel by changing
-`/etc/sysctl.conf`. For linux that would be:
-
-```conf
-fs.file-max = 65536
-```
-
-In FreeBSD and OS X you change the lines like this:
-
-```conf
-kern.maxfilesperproc=65536
-kern.maxfiles=65536
-```
-
-You can apply the settings by running `sysctl -p` or by rebooting.
-
-## Files
-
-With the DB engine mode the metric data are stored in database files. These files are organized in pairs, the datafiles
-and their corresponding journalfiles, e.g.:
-
-```sh
-datafile-1-0000000001.ndf
-journalfile-1-0000000001.njf
-datafile-1-0000000002.ndf
-journalfile-1-0000000002.njf
-datafile-1-0000000003.ndf
-journalfile-1-0000000003.njf
-...
-```
-
-They are located under their host's cache directory in the directory `./dbengine` (e.g. for localhost the default
-location is `/var/cache/netdata/dbengine/*`). The higher numbered filenames contain more recent metric data. The user
-can safely delete some pairs of files when Netdata is stopped to manually free up some space.
-
-_Users should_ **back up** _their `./dbengine` folders if they consider this data to be important._ You can also set up
-one or more [exporting connectors](https://github.com/netdata/netdata/blob/master/exporting/README.md) to send your Netdata metrics to other databases for long-term
-storage at lower granularity.
-
-## Operation
-
-The DB engine stores chart metric values in 4096-byte pages in memory. Each chart dimension gets its own page to store
-consecutive values generated from the data collectors. Those pages comprise the **Page Cache**.
-
-When those pages fill up, they are slowly compressed and flushed to disk. It can
-take `4096 / 4 = 1024 seconds = 17 minutes`, for a chart dimension that is being collected every 1 second, to fill a
-page. Pages can be cut short when we stop Netdata or the DB engine instance so as to not lose the data. When we query
-the DB engine for data we trigger disk read I/O requests that fill the Page Cache with the requested pages and
-potentially evict cold (not recently used)
-pages.
-
-When the disk quota is exceeded the oldest values are removed from the DB engine at real time, by automatically deleting
-the oldest datafile and journalfile pair. Any corresponding pages residing in the Page Cache will also be invalidated
-and removed. The DB engine logic will try to maintain between 10 and 20 file pairs at any point in time.
-
-The Database Engine uses direct I/O to avoid polluting the OS filesystem caches and does not generate excessive I/O
-traffic so as to create the minimum possible interference with other applications.
-
-## Evaluation
-
-We have evaluated the performance of the `dbengine` API that the netdata daemon uses internally. This is **not** the web
-API of netdata. Our benchmarks ran on a **single** `dbengine` instance, multiple of which can be running in a Netdata
-parent node. We used a server with an AMD Ryzen Threadripper 2950X 16-Core Processor and 2 disk drives, a Seagate
-Constellation ES.3 2TB magnetic HDD and a SAMSUNG MZQLB960HAJR-00007 960GB NAND Flash SSD.
-
-For our workload, we defined 32 charts with 128 metrics each, giving us a total of 4096 metrics. We defined 1 worker
-thread per chart (32 threads) that generates new data points with a data generation interval of 1 second. The time axis
-of the time-series is emulated and accelerated so that the worker threads can generate as many data points as possible
-without delays.
-
-We also defined 32 worker threads that perform queries on random metrics with semi-random time ranges. The starting time
-of the query is randomly selected between the beginning of the time-series and the time of the latest data point. The
-ending time is randomly selected between 1 second and 1 hour after the starting time. The pseudo-random numbers are
-generated with a uniform distribution.
-
-The data are written to the database at the same time as they are read from it. This is a concurrent read/write mixed
-workload with a duration of 60 seconds. The faster `dbengine` runs, the bigger the dataset size becomes since more data
-points will be generated. We set a page cache size of 64MiB for the two disk-bound scenarios. This way, the dataset size
-of the metric data is much bigger than the RAM that is being used for caching so as to trigger I/O requests most of the
-time. In our final scenario, we set the page cache size to 16 GiB. That way, the dataset fits in the page cache so as to
-avoid all disk bottlenecks.
-
-The reported numbers are the following:
-
-| device | page cache | dataset | reads/sec | writes/sec |
-|:------:|:----------:|--------:|----------:|-----------:|
-| HDD | 64 MiB | 4.1 GiB | 813K | 18.0M |
-| SSD | 64 MiB | 9.8 GiB | 1.7M | 43.0M |
-| N/A | 16 GiB | 6.8 GiB | 118.2M | 30.2M |
-
-where "reads/sec" is the number of metric data points being read from the database via its API per second and
-"writes/sec" is the number of metric data points being written to the database per second.
-
-Notice that the HDD numbers are pretty high and not much slower than the SSD numbers. This is thanks to the database
-engine design being optimized for rotating media. In the database engine disk I/O requests are:
-- asynchronous to mask the high I/O latency of HDDs.
-- mostly large to reduce the amount of HDD seeking time.
-- mostly sequential to reduce the amount of HDD seeking time.
-- compressed to reduce the amount of required throughput.
-As a result, the HDD is not thousands of times slower than the SSD, which is typical for other workloads.
-An interesting observation to make is that the CPU-bound run (16 GiB page cache) generates fewer data than the SSD run
-(6.8 GiB vs 9.8 GiB). The reason is that the 32 reader threads in the SSD scenario are more frequently blocked by I/O,
-and generate a read load of 1.7M/sec, whereas in the CPU-bound scenario the read load is 70 times higher at 118M/sec.
-Consequently, there is a significant degree of interference by the reader threads, that slow down the writer threads.
-This is also possible because the interference effects are greater than the SSD impact on data generation throughput.
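
The retention and memory estimates that this commit removes from the README come down to simple arithmetic, so they can be checked independently of the relocated documentation. The following is a minimal Python sketch of that arithmetic, not Netdata code. The function names `retention_days` and `collection_memory_kib` are hypothetical. It assumes "MB" in the removed text means MiB (1024 x 1024 bytes), and it applies the memory equation exactly as written for the default of 3 tiers.

```python
# Rules of thumb quoted in the removed README text (not measured values).
BYTES_PER_POINT_TIER0 = 1    # about 1 byte per data point on disk for tier 0
BYTES_PER_POINT_HIGHER = 4   # about 4 bytes per data point for tier 1 and above
SECONDS_PER_DAY = 86400


def retention_days(disk_mb, metrics, tier, seconds_per_point):
    """Estimate how many days of data fit in a tier's disk budget.

    disk_mb is treated as MiB (an assumption; the removed text says "MB").
    seconds_per_point is 1 for tier 0, 60 for tier 1, 3600 for tier 2
    under the default configuration shown in the removed text.
    """
    bytes_per_point = BYTES_PER_POINT_TIER0 if tier == 0 else BYTES_PER_POINT_HIGHER
    points_per_metric = disk_mb * 1024 * 1024 / bytes_per_point / metrics
    return points_per_metric * seconds_per_point / SECONDS_PER_DAY


def collection_memory_kib(metrics, tiers=3):
    """memory in KiB = METRICS x (TIERS - 1) x 4KiB x 2 + 32768 KiB."""
    return metrics * (tiers - 1) * 4 * 2 + 32768


if __name__ == "__main__":
    # 1000 metrics with the default per-tier disk budgets from the removed text.
    for tier, disk_mb, step in [(0, 256, 1), (1, 128, 60), (2, 64, 3600)]:
        days = retention_days(disk_mb, 1000, tier, step)
        print(f"tier {tier}: about {days:.1f} days")

    for metrics in (2000, 100_000):
        mib = collection_memory_kib(metrics) / 1024
        print(f"{metrics} metrics: about {mib:.0f} MiB")
```

The results land close to the rounded figures in the removed text (about 3 days, about 22 days and about 2 years for retention; 64 MiB and 1.6 GiB for memory). The small differences come from the README rounding intermediate values such as "32k points".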