| author    | Daniel Baumann <daniel.baumann@progress-linux.org>                                  | 2024-11-25 17:33:56 +0000 |
|-----------|--------------------------------------------------------------------------------------|---------------------------|
| committer | Daniel Baumann <daniel.baumann@progress-linux.org>                                  | 2024-11-25 17:34:10 +0000 |
| commit    | 83ba6762cc43d9db581b979bb5e3445669e46cc2 (patch)                                     |                           |
| tree      | 2e69833b43f791ed253a7a20318b767ebe56cdb8 /docs/netdata-agent/sizing-netdata-agents   |                           |
| parent    | Releasing debian version 1.47.5-1. (diff)                                            |                           |
Merging upstream version 2.0.3+dfsg (Closes: #923993, #1042533, #1045145).
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'docs/netdata-agent/sizing-netdata-agents')
5 files changed, 101 insertions, 127 deletions
diff --git a/docs/netdata-agent/sizing-netdata-agents/README.md b/docs/netdata-agent/sizing-netdata-agents/README.md
index 3ba346f7a..3880e214c 100644
--- a/docs/netdata-agent/sizing-netdata-agents/README.md
+++ b/docs/netdata-agent/sizing-netdata-agents/README.md
@@ -1,89 +1,85 @@
-# Sizing Netdata Agents
-
-Netdata automatically adjusts its resource utilization based on the workload offered to it.
-
-This is a map of how Netdata **features impact resource utilization**:
-
-| Feature                       | CPU | RAM | Disk I/O | Disk Space | Retention | Bandwidth |
-|------------------------------:|:---:|:---:|:--------:|:----------:|:---------:|:---------:|
-| Metrics collected             |  X  |  X  |    X     |     X      |     X     |     -     |
-| Samples collection frequency  |  X  |  -  |    X     |     X      |     X     |     -     |
-| Database mode and tiers       |  -  |  X  |    X     |     X      |     X     |     -     |
-| Machine learning              |  X  |  X  |    -     |     -      |     -     |     -     |
-| Streaming                     |  X  |  X  |    -     |     -      |     -     |     X     |
-
-1. **Metrics collected**: The number of metrics collected affects almost every aspect of resource utilization.
-
-   When you need to lower the resources used by Netdata, this is an obvious first step.
-
-2. **Samples collection frequency**: By default Netdata collects metrics with 1-second granularity, unless the metrics collected are not updated that frequently, in which case Netdata collects them at the frequency they are updated. This is controlled per data collection job.
-
-   Lowering the data collection frequency from every-second to every-2-seconds will make Netdata use half the CPU utilization. So, CPU utilization is proportional to the data collection frequency.
-
-3. **Database Mode and Tiers**: By default Netdata stores metrics in 3 database tiers: high-resolution, mid-resolution, low-resolution. All database tiers are updated in parallel during data collection, and depending on the query duration Netdata may consult one or more tiers to optimize the resources required to satisfy it.
-
-   The number of database tiers affects the memory requirements of Netdata. Going from 3 tiers to 1 tier will make Netdata use half the memory. Of course metrics retention will also be limited to 1 tier.
-
-4. **Machine Learning**: By default Netdata trains multiple machine learning models for every metric collected, to learn its behavior and detect anomalies. Machine Learning is a CPU-intensive process and affects the overall CPU utilization of Netdata.
-
-5. **Streaming Compression**: When using Netdata in Parent-Child configurations to create Metrics Centralization Points, the compression algorithm used greatly affects CPU utilization and bandwidth consumption.
-
-   Netdata supports multiple streaming compression algorithms, allowing the optimization of either CPU utilization or network bandwidth. The default algorithm `zstd` provides the best balance among them.
-
-## Minimizing the resources used by Netdata Agents
-
-To minimize the resources used by Netdata Agents, we suggest configuring Netdata Parents for centralizing metric samples, and disabling most of the features on Netdata Children. This will provide minimal resource utilization at the edge, while all the features of Netdata are available at the Netdata Parents.
-
-The following guides provide instructions on how to do this.
-
-## Maximizing the scale of Netdata Parents
-
-Netdata Parents automatically size resource utilization based on the workload they receive. The only possible option for improving query performance is to dedicate more RAM to them, by increasing their cache efficiency.
-
-Check [RAM Requirements](/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md) for more information.
-
-## Innovations Netdata has for optimal performance and scalability
-
-The following are some of the innovations the open-source Netdata Agent has that contribute to its excellent performance and scalability.
-
-1. **Minimal disk I/O**
-
-   When Netdata saves data on disk, it stores them at their final place, eliminating the need to reorganize this data.
-
-   Netdata organizes its data structures in such a way that samples are committed to disk as evenly as possible across time, without affecting its memory requirements.
-
-   Furthermore, Netdata Agents use direct I/O for saving and loading metric samples. This prevents Netdata from polluting system caches with metric data. Netdata maintains its own caches for this data.
-
-   All these features make Netdata a nice partner and a polite citizen for production applications running on the same systems Netdata runs.
-
-2. **4 bytes per sample uncompressed**
-
-   To achieve optimal memory and disk footprint, Netdata uses a custom 32-bit floating point number. This floating point number is used to store the samples collected, together with their anomaly bit. The database of Netdata is fixed-step, so it has predefined slots for every sample, allowing Netdata to store timestamps once every several hundred samples, minimizing both its memory requirements and the disk footprint.
-
-   The final disk footprint of Netdata varies due to compression efficiency. It is usually about 0.6 bytes per sample for the high-resolution tier (per-second), 6 bytes per sample for the mid-resolution tier (per-minute) and 18 bytes per sample for the low-resolution tier (per-hour).
-
-3. **Query priorities**
-
-   Alerting, Machine Learning, Streaming and Replication rely on metric queries. When multiple queries are running in parallel, Netdata assigns priorities to all of them, favoring interactive queries over background tasks. This means that queries do not compete equally for resources. Machine learning or replication may slow down when interactive queries are running and the system starves for resources.
-
-4. **A pointer per label**
-
-   Apart from metric samples, metric labels and their cardinality are the biggest memory consumer, especially in highly ephemeral environments, like Kubernetes. Netdata uses a single pointer for any label key-value pair that is reused. Keys and values are also deduplicated, providing the best possible memory footprint for metric labels.
-
-5. **Streaming Protocol**
-
-   The streaming protocol of Netdata allows minimizing the resources consumed on production systems by delegating features to other Netdata Agents (Parents), without compromising monitoring fidelity or responsiveness, enabling the creation of a highly distributed observability platform.
-
-## Netdata vs Prometheus
-
-Netdata outperforms Prometheus in every aspect: -35% CPU utilization, -49% RAM usage, -12% network bandwidth, -98% disk I/O, -75% in disk footprint for high-resolution data, while providing more than a year of retention.
-
-Read the [full comparison here](https://blog.netdata.cloud/netdata-vs-prometheus-performance-analysis/).
-
-## Energy Efficiency
-
-The University of Amsterdam conducted a study on the impact monitoring systems have on Docker-based systems.
-
-The study found that Netdata excels in CPU utilization, RAM usage and execution time, and concluded that **Netdata is the most energy-efficient tool**.
-
-Read the [full study here](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf).
+# Resource utilization
+
+Netdata is designed to automatically adjust its resource consumption based on the specific workload.
+
+This table shows the specific system resources affected by different Netdata features:
+
+| Feature                 | CPU | RAM | Disk I/O | Disk Space | Network Traffic |
+|------------------------:|:---:|:---:|:--------:|:----------:|:---------------:|
+| Collected metrics       |  ✓  |  ✓  |    ✓     |     ✓      |        -        |
+| Sample frequency        |  ✓  |  -  |    ✓     |     ✓      |        -        |
+| Database mode and tiers |  -  |  ✓  |    ✓     |     ✓      |        -        |
+| Machine learning        |  ✓  |  ✓  |    -     |     -      |        -        |
+| Streaming               |  ✓  |  ✓  |    -     |     -      |        ✓        |
+
+1. **Collected metrics**
+
+   - **Impact**: More metrics mean higher CPU, RAM, disk I/O, and disk space usage.
+   - **Optimization**: To reduce resource consumption, consider lowering the number of collected metrics by disabling unnecessary data collectors.
+
+2. **Sample frequency**
+
+   - **Impact**: Netdata collects most metrics with 1-second granularity. This high frequency impacts CPU usage.
+   - **Optimization**: Lowering the sampling frequency (e.g., 1-second to 2-second intervals) can halve CPU usage. Balance the need for detailed data with resource efficiency.
+
+3. **Database Mode**
+
+   - **Impact**: The default database mode, `dbengine`, compresses data and writes it to disk.
+   - **Optimization**: In a Parent-Child setup, switch the Child's database mode to `ram`. This eliminates disk I/O for the Child.
+
+4. **Database Tiers**
+
+   - **Impact**: The number of database tiers directly affects memory consumption. More tiers mean higher memory usage.
+   - **Optimization**: The default number of tiers is 3. Choose the appropriate number of tiers based on data retention requirements.
+
+5. **Machine Learning**
+
+   - **Impact**: Machine learning model training is CPU-intensive, affecting overall CPU usage.
+   - **Optimization**: Consider disabling machine learning for less critical metrics or adjusting model training frequency.
+
+6. **Streaming Compression**
+
+   - **Impact**: Compression algorithm choice affects CPU usage and network traffic.
+   - **Optimization**: Select an algorithm that balances CPU efficiency with network bandwidth requirements (e.g., `zstd` for a good balance).
+
+## Minimizing the resources used by Netdata Agents
+
+To optimize resource utilization, consider using a **Parent-Child** setup.
+
+This approach involves centralizing the collection and processing of metrics on Parent nodes while running lightweight Children Agents on edge devices.
+
+## Maximizing the scale of Parent Agents
+
+Parents dynamically adjust their resource usage based on the volume of metrics received. However, for optimal query performance, you may need to dedicate more RAM.
+
+Check [RAM Requirements](/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md) for more information.
+
+## Netdata's performance and scalability optimization techniques
+
+1. **Minimal Disk I/O**
+
+   Netdata directly writes metric data to disk, bypassing system caches and reducing I/O overhead. Additionally, its optimized data structures minimize disk space and memory usage through efficient compression and timestamping.
+
+2. **Compact Storage Engine**
+
+   Netdata uses a custom 32-bit floating-point format tailored for efficient storage of time-series data, along with an anomaly bit. This, combined with a fixed-step database design, enables efficient storage and retrieval of data.
+
+   | Tier                              | Approximate Sample Size (bytes) |
+   |-----------------------------------|---------------------------------|
+   | High-resolution tier (per-second) | 0.6                             |
+   | Mid-resolution tier (per-minute)  | 6                               |
+   | Low-resolution tier (per-hour)    | 18                              |
+
+   Timestamp optimization further reduces storage overhead by storing timestamps at regular intervals.
+
+3. **Intelligent Query Engine**
+
+   Netdata prioritizes interactive queries over background tasks like machine learning and replication, ensuring optimal user experience, especially under heavy load.
+
+4. **Efficient Label Storage**
+
+   Netdata uses pointers to reference shared label key-value pairs, minimizing memory usage, especially in highly dynamic environments.
+
+5. **Scalable Streaming Protocol**
+
+   Netdata's streaming protocol enables the creation of distributed monitoring setups, where Children offload data processing to Parents, optimizing resource utilization.
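The per-sample sizes in the README's storage table translate directly into rough disk-footprint estimates. A minimal Python sketch of that arithmetic (the 2,000-metric node and the retention windows are illustrative assumptions; the bytes-per-sample figures come from the table above):

```python
# Rough per-tier disk footprint, using the approximate compressed sample sizes
# from the table above (0.6, 6 and 18 bytes). The metric count and retention
# windows below are illustrative assumptions, not Netdata defaults.
METRICS = 2_000  # unique time-series on one node (assumed)

tiers = {
    # name: (seconds per sample, retention in days, bytes per sample on disk)
    "tier0 (per-second)": (1, 14, 0.6),
    "tier1 (per-minute)": (60, 90, 6.0),
    "tier2 (per-hour)": (3600, 365, 18.0),
}

for name, (step, days, bytes_per_sample) in tiers.items():
    samples = METRICS * days * 24 * 3600 // step
    print(f"{name}: ~{samples * bytes_per_sample / 1024**3:.1f} GiB")
```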
diff --git a/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md b/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md
index 092c8da16..fbbc279d5 100644
--- a/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md
+++ b/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md
@@ -1,16 +1,16 @@
 # Bandwidth Requirements
 
-## On Production Systems, Standalone Netdata
+## Production Systems: Standalone Netdata
 
 Standalone Netdata may use network bandwidth under the following conditions:
 
-1. You configured data collection jobs that are fetching data from remote systems. There is no such jobs enabled by default.
+1. You configured data collection jobs that are fetching data from remote systems. There are no such jobs enabled by default.
 2. You use the Netdata dashboard.
 3. [Netdata Cloud communication](#netdata-cloud-communication) (see below).
 
-## On Metrics Centralization Points, between Netdata Children & Parents
+## Metrics Centralization Points: Between Netdata Children & Parents
 
-Netdata supports multiple compression algorithms for streaming communication. Netdata Children offer all their compression algorithms when connecting to a Netdata Parent, and the Netdata Parent decides which one to use based on algorithms availability and user configuration.
+Netdata supports multiple compression algorithms for streaming communication. Netdata Children offer all their compression algorithms when connecting to a Netdata Parent, and the Netdata Parent decides which one to use based on algorithm availability and user configuration.
 
 | Algorithm | Best for |
 |:---------:|:--------:|
@@ -23,7 +23,7 @@ The expected bandwidth consumption using `zstd` for 1 million samples per second
 The order in which compression algorithms are selected is configured in `stream.conf`, per `[API KEY]`, like this:
 
-```
+```text
 compression algorithms order = zstd lz4 brotli gzip
 ```
 
@@ -42,6 +42,6 @@ The information transferred to Netdata Cloud is:
 3. Information about the **metrics available and their retention**.
 4. Information about the **configured alerts and their transitions**.
 
-This is not a constant stream of information.
+This is not a constant stream of information. Netdata Agents update Netdata Cloud only about status changes on all the above (e.g., an alert being triggered, or a metric stopped being collected). So, there is an initial handshake and exchange of information when Netdata starts, and then there are only updates when required.
 
 Of course, when you view Netdata Cloud dashboards that need to query the database a Netdata Agent maintains, this query is forwarded to an Agent that can satisfy it. This means that Netdata Cloud receives metric samples only when a user is accessing a dashboard, and the samples transferred are usually aggregations that allow rendering the dashboards.
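The negotiation described in the bandwidth document reduces to a simple preference match: the Child advertises what it supports and the Parent walks its configured order. A small Python sketch of that idea (the function and the sample inputs are illustrative, not Netdata's actual implementation):

```python
# Illustrative sketch of compression negotiation: the Parent picks the first
# algorithm from its configured order (stream.conf) that the Child also offers.
def pick_compression(parent_order, child_offers):
    for algorithm in parent_order:
        if algorithm in child_offers:
            return algorithm
    return "none"  # fall back to uncompressed streaming

parent_order = ["zstd", "lz4", "brotli", "gzip"]  # default order from stream.conf
child_offers = {"lz4", "gzip"}                    # e.g. an older Child build (assumed)
print(pick_compression(parent_order, child_offers))  # -> "lz4"
```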
diff --git a/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md b/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md
index 021a35fb2..76580b1c3 100644
--- a/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md
+++ b/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md
@@ -1,65 +1,43 @@
-# CPU Requirements
-
-Netdata's CPU consumption is affected by the following factors:
-
-1. The number of metrics collected
-2. The frequency metrics are collected
-3. Machine Learning
-4. Streaming compression (streaming of metrics to Netdata Parents)
-5. Database Mode
-
-## On Production Systems, Netdata Children
-
-On production systems, where Netdata is running with default settings, monitoring the system it is installed at and its containers and applications, CPU utilization should usually be about 1% to 5% of a single CPU core.
-
-This includes 3 database tiers, machine learning, per-second data collection, alerts, and streaming to a Netdata Parent.
-
-## On Metrics Centralization Points, Netdata Parents
-
-On Metrics Centralization Points, Netdata Parents running on modern server hardware, we **estimate CPU utilization per million of samples collected per second**:
-
-| Feature              | Depends On                                           | Expected Utilization                                             | Key Reasons                                                                |
-|:--------------------:|:----------------------------------------------------:|:----------------------------------------------------------------:|:--------------------------------------------------------------------------:|
-| Metrics Ingestion    | Number of samples received per second                | 2 CPU cores per million of samples per second                    | Decompress and decode received messages, update database.                 |
-| Metrics re-streaming | Number of samples resent per second                  | 2 CPU cores per million of samples per second                    | Encode and compress messages towards Netdata Parent.                      |
-| Machine Learning     | Number of unique time-series concurrently collected  | 2 CPU cores per million of unique metrics concurrently collected | Train machine learning models, query existing models to detect anomalies. |
-
-We recommend keeping the total CPU utilization below 60% when a Netdata Parent is steadily ingesting metrics, training machine learning models and running health checks. This will leave enough CPU resources available for queries.
-
-## I want to minimize CPU utilization. What should I do?
-
-You can control Netdata's CPU utilization with these parameters:
-
-1. **Data collection frequency**: Going from per-second metrics to every-2-seconds metrics will halve the CPU utilization of Netdata.
-2. **Number of metrics collected**: Netdata by default collects every metric available on the systems it runs. Review the metrics collected and disable data collection plugins and modules not needed.
-3. **Machine Learning**: Disable machine learning to save CPU cycles.
-4. **Number of database tiers**: Netdata updates database tiers in parallel, during data collection. This affects both CPU utilization and memory requirements.
-5. **Database Mode**: The default database mode is `dbengine`, which compresses and commits data to disk. If you have a Netdata Parent where metrics are aggregated and saved to disk and there is a reliable connection between the Netdata you want to optimize and its Parent, switch to database mode `ram` or `alloc`. This disables saving to disk, so your Netdata will also not use any disk I/O.
-
-## I see increased CPU consumption when a busy Netdata Parent starts, why?
-
-When a Netdata Parent starts and Netdata Children get connected to it, there are several operations that temporarily affect CPU utilization, network bandwidth and disk I/O.
-
-The general flow looks like this:
-
-1. **Back-filling of higher tiers**: Usually this means calculating the aggregates of the last hour of `tier2` and of the last minute of `tier1`, ensuring that higher tiers reflect all the information `tier0` has. If Netdata was stopped abnormally (e.g. due to a system failure or crash), higher tiers may have to be back-filled for longer durations.
-2. **Metadata synchronization**: The metadata of all metrics each Netdata Child maintains are negotiated between the Child and the Parent and are synchronized.
-3. **Replication**: If the Parent is missing samples the Child has, these samples are transferred to the Parent before transferring new samples.
-4. Once all these finish, the normal **streaming of new metric samples** starts.
-5. At the same time, **machine learning** initializes, loads saved trained models and prepares anomaly detection.
-6. After a few moments the **health engine starts checking metrics** for triggering alerts.
-
-The above process is per metric. So, while one metric back-fills, another replicates and a third one streams.
-
-At the same time:
-
-- the compression algorithm learns the patterns of the data exchanged and optimizes its dictionaries for optimal compression and CPU utilization,
-- the database engine adjusts the page size of each metric, so that samples are committed to disk as evenly as possible across time.
-
-So, when looking for the "steady CPU consumption during ingestion" of a busy Netdata Parent, we recommend letting it stabilize for a few hours before checking.
-
-Keep in mind that Netdata has been designed so that, even if the system lacks CPU resources during the initialization phase and the connection of hundreds of Netdata Children, the Netdata Parent will complete all the operations and eventually enter a steady CPU consumption during ingestion, without affecting the quality of the metrics stored. So, it is OK if CPU consumption spikes to 100% during the initialization of a busy Netdata Parent.
-
-Important: the above initialization process is not as intense when new nodes get connected to a Netdata Parent for the first time (e.g. ephemeral nodes), since several of the steps involved are not required.
-
-Especially for the cases where Children disconnect and reconnect to the Parent due to network-related issues (i.e. both the Netdata Child and the Netdata Parent have not been restarted and less than 1 hour has passed since the last disconnection), the re-negotiation phase is minimal and metrics instantly enter the normal streaming phase.
+# CPU
+
+Netdata's CPU usage depends on the features you enable. For details, see [resource utilization](/docs/netdata-agent/sizing-netdata-agents/README.md).
+
+## Children
+
+With default settings on Children, CPU utilization typically falls within the range of 1% to 5% of a single core. This includes the combined resource usage of:
+
+- Three database tiers for data storage.
+- Machine learning for anomaly detection.
+- Per-second data collection.
+- Alerts.
+- Streaming to a [Parent Agent](/docs/observability-centralization-points/metrics-centralization-points/README.md).
+
+## Parents
+
+For Netdata Parents (Metrics Centralization Points), we estimate the following CPU utilization:
+
+| Feature              | Depends On                                           | Expected Utilization (CPU cores per million) | Key Reasons                                                               |
+|:--------------------:|:----------------------------------------------------:|:--------------------------------------------:|:-------------------------------------------------------------------------:|
+| Metrics Ingest       | Number of samples received per second                | 2                                            | Decompress and decode received messages, update database                 |
+| Metrics re-streaming | Number of samples resent per second                  | 2                                            | Encode and compress messages towards another Parent                      |
+| Machine Learning     | Number of unique time-series concurrently collected  | 2                                            | Train machine learning models, query existing models to detect anomalies |
+
+To ensure optimal performance, keep total CPU utilization below 60% when the Parent is actively processing metrics, training models, and running health checks.
+
+## Increased CPU consumption on Parent startup
+
+When a Netdata Parent starts up, it undergoes a series of initialization tasks that can temporarily increase CPU, network, and disk I/O usage:
+
+1. **Backfilling Higher Tiers**: The Parent calculates aggregated metrics for missing data points, ensuring consistency across different time resolutions.
+2. **Metadata Synchronization**: The Parent and Children exchange metadata information about collected metrics.
+3. **Data Replication**: Missing data is transferred from Children to the Parent.
+4. **Normal Streaming**: Regular streaming of new metrics begins.
+5. **Machine Learning Initialization**: Machine learning models are loaded and prepared for anomaly detection.
+6. **Health Check Initialization**: The health engine starts monitoring metrics and triggering alerts.
+
+Additional considerations:
+
+- **Compression Optimization**: The compression algorithm learns data patterns to optimize compression ratios.
+- **Database Optimization**: The database engine adjusts page sizes for efficient disk I/O.
+
+These initial tasks can temporarily increase resource usage, but the impact typically diminishes as the Parent stabilizes and enters a steady-state operation.
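The Parent sizing table above lends itself to a quick core-count estimate. A minimal Python sketch (the workload figures are assumptions for illustration; the 2-cores-per-million factors and the 60% headroom target come from the CPU document):

```python
# Rough Parent CPU sizing: ~2 cores per million samples/s ingested, ~2 per
# million samples/s re-streamed, ~2 per million ML time-series (from the table
# above). The workload numbers below are illustrative assumptions.
ingested_samples_per_s = 1_500_000
restreamed_samples_per_s = 500_000
ml_time_series = 1_000_000

busy_cores = (
    2 * (ingested_samples_per_s / 1e6)
    + 2 * (restreamed_samples_per_s / 1e6)
    + 2 * (ml_time_series / 1e6)
)

# Keep steady-state utilization below ~60% to leave headroom for queries.
print(f"busy cores: ~{busy_cores:.0f}, suggested capacity: ~{busy_cores / 0.6:.0f} cores")
```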
diff --git a/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md b/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md
index 7cd9a527d..68da44000 100644
--- a/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md
+++ b/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md
@@ -12,7 +12,7 @@ Netdata offers two database modes to suit your needs for performance and data pe
 ## `dbengine`
 
 Netdata's `dbengine` mode efficiently stores data on disk using compression. The actual disk space used depends on how well the data compresses.
 
-This mode utilizes a tiered storage approach: data is saved in multiple tiers on disk. Each tier retains data at a different resolution (detail level). Higher tiers store a down-sampled (less detailed) version of the data found in lower tiers.
+This mode uses a tiered storage approach: data is saved in multiple tiers on disk. Each tier retains data at a different resolution (detail level). Higher tiers store a down-sampled (less detailed) version of the data found in lower tiers.
 
 ```mermaid
 gantt
     dateFormat YYYY-MM-DD
     tickInterval 1week
     axisFormat
     title Netdata Database Tiers
     section 365 days
     tier0, 14d  :a1, 2023-12-24, 7d
     tier1, 3mo  :a2, 2023-10-01, 30d
     tier2, 365d :a3, 2023-11-02, 59d
 ```
 
-`dbengine` supports up to 5 tiers. By default, 3 tiers are used:
+`dbengine` supports up to five tiers. By default, three tiers are used:
 
 | Tier | Resolution | Uncompressed Sample Size | Usually On Disk |
 |:----:|:----------:|:------------------------:|:---------------:|
 
@@ -40,11 +40,11 @@ gantt
 ## `ram`
 
-`ram` mode can help when Netdata should not introduce any disk I/O at all. In both of these modes, metric samples exist only in memory, and only while they are collected.
+`ram` mode can help when Netdata shouldn’t introduce any disk I/O at all. In both of these modes, metric samples exist only in memory, and only while they’re collected.
 
-When Netdata is configured to stream its metrics to a Metrics Observability Centralization Point (a Netdata Parent), metric samples are forwarded in real-time to that Netdata Parent. The ring buffers available in these modes is used to cache the collected samples for some time, in case there are network issues, or the Netdata Parent is restarted for maintenance.
+When Netdata is configured to stream its metrics to a Metrics Observability Centralization Point (a Netdata Parent), metric samples are forwarded in real-time to that Netdata Parent. The ring buffers available in these modes are used to cache the collected samples for some time, in case there are network issues, or the Netdata Parent is restarted for maintenance.
 
-The memory required per sample in these modes, is 4 bytes: `ram` mode uses `mmap()` behind the scene, and can be incremented in steps of 1024 samples (4KiB). Mode `ram` allows the use of the Linux kernel memory dedupper (Kernel-Same-Page or KSM) to deduplicate Netdata ring buffers and save memory.
+The memory required per sample in these modes is four bytes: `ram` mode uses `mmap()` behind the scenes, and can be incremented in steps of 1024 samples (4 KiB). Mode `ram` allows the use of the Linux kernel memory dedupper (Kernel-Same-Page or KSM) to deduplicate Netdata ring buffers and save memory.
 
 **Configuring ram mode and retention**:
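For the `ram` mode described above, memory use is easy to approximate: four bytes per slot, allocated in steps of 1024 samples (4 KiB) per time-series. A short Python sketch with assumed inputs:

```python
# Ring-buffer memory estimate for database mode "ram": 4 bytes per sample,
# allocated in 1024-sample (4 KiB) steps per time-series. Inputs are assumptions.
import math

metrics = 2_000            # unique time-series on the node (assumed)
samples_buffered = 1_800   # per-second samples to keep per metric (assumed)

slots = math.ceil(samples_buffered / 1024) * 1024  # rounded up to 4 KiB steps
total_bytes = metrics * slots * 4
print(f"~{total_bytes / 1024**2:.0f} MiB")  # 2,000 x 2,048 slots x 4 B ≈ 16 MiB
```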
diff --git a/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md b/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md
index 8d8522517..a4ccf5507 100644
--- a/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md
+++ b/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md
@@ -8,21 +8,21 @@ Netdata supports memory ballooning and automatically sizes and limits the memory
 With default settings, Netdata should run with 100MB to 200MB of RAM, depending on the number of metrics being collected.
 
-This number can be lowered by limiting the number of database tier or switching database modes. For more information check [Disk Requirements and Retention](/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md).
+This number can be lowered by limiting the number of database tiers or switching database modes. For more information, check [Disk Requirements and Retention](/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md).
 
 ## On Metrics Centralization Points, Netdata Parents
 
 The general formula, with the default configuration of database tiers, is:
 
-```
+```text
 memory = UNIQUE_METRICS x 16KiB + CONFIGURED_CACHES
 ```
 
 The default `CONFIGURED_CACHES` is 32MiB.
 
-For 1 million concurrently collected time-series (independently of their data collection frequency), the memory required is:
+For one million concurrently collected time-series (independently of their data collection frequency), the memory required is:
 
-```
+```text
 UNIQUE_METRICS = 1000000
 
 CONFIGURED_CACHES = 32MiB
 
@@ -32,16 +32,16 @@ CONFIGURED_CACHES = 32MiB
 about 16 GiB
 ```
 
-There are 2 cache sizes that can be configured in `netdata.conf`:
+There are two cache sizes that can be configured in `netdata.conf`:
 
-1. `[db].dbengine page cache size MB`: this is the main cache that keeps metrics data into memory. When data are not found in it, the extent cache is consulted, and if not found in that either, they are loaded from disk.
-2. `[db].dbengine extent cache size MB`: this is the compressed extent cache. It keeps in memory compressed data blocks, as they appear on disk, to avoid reading them again. Data found in the extend cache but not in the main cache have to be uncompressed to be queried.
+1. `[db].dbengine page cache size`: this is the main cache that keeps metric data in memory. When data is not found in it, the extent cache is consulted, and if it is not found there either, it is loaded from disk.
+2. `[db].dbengine extent cache size`: this is the compressed extent cache. It keeps in memory compressed data blocks, as they appear on disk, to avoid reading them again. Data found in the extent cache but not in the main cache has to be uncompressed to be queried.
 
 Both of them are dynamically adjusted to use some of the total memory computed above. The configuration in `netdata.conf` allows providing additional memory to them, increasing their caching efficiency.
 
 ## I have a Netdata Parent that is also a systemd-journal logs centralization point, what should I know?
 
-Logs usually require significantly more disk space and I/O bandwidth than metrics.
+Logs usually require significantly more disk space and I/O bandwidth than metrics. For optimal performance, we recommend storing metrics and logs on separate, independent disks.
 
 Netdata uses direct-I/O for its database, so that it does not pollute the system caches with its own data. We want Netdata to be a nice citizen when it runs side-by-side with production applications, so this was required to guarantee that Netdata does not affect the operation of databases or other sensitive applications running on the same servers.
 
@@ -49,9 +49,9 @@ To optimize disk I/O, Netdata maintains its own private caches. The default sett
 `systemd-journal` on the other hand, relies on operating system caches for improving the query performance of logs. When the system lacks free memory, querying logs leads to increased disk I/O.
 
-If you are experiencing slow responses and increased disk reads when metrics queries run, we suggest to dedicate some more RAM to Netdata.
+If you are experiencing slow responses and increased disk reads when metrics queries run, we suggest dedicating some more RAM to Netdata.
 
-We frequently see that the following strategy gives best results:
+We frequently see that the following strategy gives the best results:
 
 1. Start the Netdata Parent, send all the load you expect it to have and let it stabilize for a few hours. Netdata will now use the minimum memory it believes is required for smooth operation.
 2. Check the available system memory.
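The Parent memory formula in the RAM document can be sanity-checked with a few lines of Python; the 16 KiB per unique metric and the 32 MiB default cache figure come from the text above, and the result lines up with its "about 16 GiB" example:

```python
# Parent memory estimate from the formula above:
#   memory = UNIQUE_METRICS x 16 KiB + CONFIGURED_CACHES
UNIQUE_METRICS = 1_000_000
CONFIGURED_CACHES = 32 * 1024**2       # 32 MiB default

memory_bytes = UNIQUE_METRICS * 16 * 1024 + CONFIGURED_CACHES
print(f"~{memory_bytes / 1024**3:.1f} GiB")  # prints ~15.3 GiB, the "about 16 GiB" of the example above
```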