Diffstat (limited to 'docs/netdata-agent/sizing-netdata-agents')
-rw-r--r--  docs/netdata-agent/sizing-netdata-agents/README.md                            89
-rw-r--r--  docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md            47
-rw-r--r--  docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md                  65
-rw-r--r--  docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md  131
-rw-r--r--  docs/netdata-agent/sizing-netdata-agents/ram-requirements.md                  60
5 files changed, 392 insertions, 0 deletions
diff --git a/docs/netdata-agent/sizing-netdata-agents/README.md b/docs/netdata-agent/sizing-netdata-agents/README.md
new file mode 100644
index 000000000..b945dc56c
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/README.md
@@ -0,0 +1,89 @@
+# Sizing Netdata Agents
+
+Netdata automatically adjusts its resource utilization based on the workload offered to it.
+
+This is a map of how Netdata **features impact resource utilization**:
+
+| Feature | CPU | RAM | Disk I/O | Disk Space | Retention | Bandwidth |
+|-----------------------------:|:---:|:---:|:--------:|:----------:|:---------:|:---------:|
+| Metrics collected | X | X | X | X | X | - |
+| Samples collection frequency | X | - | X | X | X | - |
+| Database mode and tiers | - | X | X | X | X | - |
+| Machine learning | X | X | - | - | - | - |
+| Streaming | X | X | - | - | - | X |
+
+1. **Metrics collected**: The number of metrics collected affects almost every aspect of resource utilization.
+
+ When you need to lower the resources used by Netdata, this is an obvious first step.
+
+2. **Samples collection frequency**: By default Netdata collects metrics with 1-second granularity, unless the metrics collected are not updated that frequently, in which case Netdata collects them at the frequency they are updated. This is controlled per data collection job.
+
+   Lowering the data collection frequency from every-second to every-2-seconds will make Netdata use half the CPU. CPU utilization is proportional to the data collection frequency.
+
+3. **Database Mode and Tiers**: By default Netdata stores metrics in 3 database tiers: high-resolution, mid-resolution, low-resolution. All database tiers are updated in parallel during data collection, and depending on the query duration Netdata may consult one or more tiers to optimize the resources required to satisfy it.
+
+   The number of database tiers affects the memory requirements of Netdata. Going from 3 tiers to 1 tier will make Netdata use half the memory. Of course, metrics retention will also be limited to 1 tier.
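   A minimal sketch of the relevant `netdata.conf` setting for going from the default 3 tiers to 1 (verify the option name against the `netdata.conf` shipped with your version):

   ```
   [db]
       # keep only the high-resolution tier (the default is 3)
       storage tiers = 1
   ```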
+
+4. **Machine Learning**: By default Netdata trains multiple machine learning models for every metric collected, to learn its behavior and detect anomalies. Machine learning is a CPU intensive process and affects the overall CPU utilization of Netdata.
+
+5. **Streaming Compression**: When using Netdata in Parent-Child configurations to create Metrics Centralization Points, the compression algorithm used greatly affects CPU utilization and bandwidth consumption.
+
+   Netdata supports multiple streaming compression algorithms, allowing the optimization of either CPU utilization or network bandwidth. The default algorithm `zstd` provides the best balance between them.
+
+## Minimizing the resources used by Netdata Agents
+
+To minimize the resources used by Netdata Agents, we suggest configuring Netdata Parents to centralize metric samples, and disabling most features on the Netdata Children. This provides minimal resource utilization at the edge, while all the features of Netdata remain available at the Netdata Parents.
+
+The following guides provide instructions on how to do this.
+
+## Maximizing the scale of Netdata Parents
+
+Netdata Parents automatically adjust their resource utilization based on the workload they receive. The only option for improving query performance is to dedicate more RAM to them, increasing the efficiency of their caches.
+
+Check [RAM Requirements](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md) for more information.
+
+## Innovations Netdata has for optimal performance and scalability
+
+The following are some of the innovations of the open-source Netdata Agent that contribute to its performance and scalability.
+
+1. **Minimal disk I/O**
+
+   When Netdata saves data on disk, it stores them in their final location, eliminating the need to reorganize this data later.
+
+   Netdata organizes its data structures so that samples are committed to disk as evenly as possible across time, without affecting its memory requirements.
+
+ Furthermore, Netdata Agents use direct-I/O for saving and loading metric samples. This prevents Netdata from polluting system caches with metric data. Netdata maintains its own caches for this data.
+
+   All these features make Netdata a nice partner and a polite citizen for production applications running on the same systems Netdata runs on.
+
+2. **4 bytes per sample uncompressed**
+
+   To achieve optimal memory and disk footprint, Netdata uses a custom 32-bit floating point number to store the samples collected, together with their anomaly bit. The Netdata database is fixed-step, so it has predefined slots for every sample, allowing Netdata to store timestamps only once every several hundred samples, minimizing both its memory requirements and its disk footprint.
+
+ The final disk footprint of Netdata varies due to compression efficiency. It is usually about 0.6 bytes per sample for the high-resolution tier (per-second), 6 bytes per sample for the mid-resolution tier (per-minute) and 18 bytes per sample for the low-resolution tier (per-hour).
+
+3. **Query priorities**
+
+   Alerting, machine learning, streaming and replication all rely on metric queries. When multiple queries run in parallel, Netdata assigns priorities to all of them, favoring interactive queries over background tasks. This means that queries do not compete equally for resources: machine learning or replication may slow down when interactive queries are running and the system is starved for resources.
+
+4. **A pointer per label**
+
+   Apart from metric samples, metric labels and their cardinality are the biggest memory consumers, especially in highly ephemeral environments like Kubernetes. Netdata uses a single pointer for any label key-value pair that is reused. Keys and values are also deduplicated, providing the best possible memory footprint for metric labels.
+
+5. **Streaming Protocol**
+
+   The streaming protocol of Netdata minimizes the resources consumed on production systems by delegating features to other Netdata Agents (Parents), without compromising monitoring fidelity or responsiveness, enabling the creation of a highly distributed observability platform.
+
+## Netdata vs Prometheus
+
+Netdata outperforms Prometheus in every aspect: 35% less CPU utilization, 49% less RAM usage, 12% less network bandwidth, 98% less disk I/O, and a 75% smaller disk footprint for high-resolution data, while providing more than a year of retention.
+
+Read the [full comparison here](https://blog.netdata.cloud/netdata-vs-prometheus-performance-analysis/).
+
+## Energy Efficiency
+
+The University of Amsterdam conducted a study on the impact monitoring systems have on Docker-based systems.
+
+The study found that Netdata excels in CPU utilization, RAM usage and execution time, and concluded that **Netdata is the most energy efficient monitoring tool**.
+
+Read the [full study here](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf).
diff --git a/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md b/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md
new file mode 100644
index 000000000..092c8da16
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md
@@ -0,0 +1,47 @@
+# Bandwidth Requirements
+
+## On Production Systems, Standalone Netdata
+
+Standalone Netdata may use network bandwidth under the following conditions:
+
+1. You configured data collection jobs that fetch data from remote systems. No such jobs are enabled by default.
+2. You use the Netdata dashboard.
+3. [Netdata Cloud communication](#netdata-cloud-communication) (see below).
+
+## On Metrics Centralization Points, between Netdata Children & Parents
+
+Netdata supports multiple compression algorithms for streaming communication. Netdata Children offer all their compression algorithms when connecting to a Netdata Parent, and the Netdata Parent decides which one to use based on algorithm availability and user configuration.
+
+| Algorithm | Best for |
+|:---------:|:-----------------------------------------------------------------------------------------------------------------------------------:|
+| `zstd` | The best balance between CPU utilization and compression efficiency. This is the default. |
+| `lz4` | The fastest of the algorithms. Use this when CPU utilization is more important than bandwidth. |
+| `gzip` | The best compression efficiency, at the expense of CPU utilization. Use this when bandwidth is more important than CPU utilization. |
+| `brotli` | The most CPU intensive algorithm, providing the best compression. |
+
+The expected bandwidth consumption using `zstd` for 1 million samples per second is 84 Mbps, or 10.5 MB/s.
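As a rough sketch (not an official formula), the figure above implies about 10.5 bytes per sample on the wire with `zstd`, which can be used to estimate streaming bandwidth for other ingest rates:

```python
# Rough streaming bandwidth estimate, assuming ~10.5 bytes per sample on the
# wire with zstd -- an assumption derived from the figure quoted above
# (1M samples/s = 84 Mbps).

BYTES_PER_SAMPLE_ZSTD = 10.5  # assumed average, varies with compression efficiency

def streaming_bandwidth_mbps(samples_per_second: float) -> float:
    """Estimated Child-to-Parent streaming bandwidth in Mbps."""
    return samples_per_second * BYTES_PER_SAMPLE_ZSTD * 8 / 1_000_000

print(streaming_bandwidth_mbps(1_000_000))  # 84.0 Mbps, matching the figure above
print(streaming_bandwidth_mbps(250_000))    # 21.0 Mbps for a smaller Parent
```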
+
+The order in which compression algorithms are selected is configured in `stream.conf`, per `[API KEY]`, like this:
+
+```
+ compression algorithms order = zstd lz4 brotli gzip
+```
+
+The first available algorithm on both the Netdata Child and the Netdata Parent, from left to right, is chosen.
+
+Compression can also be disabled in `stream.conf` at either Netdata Children or Netdata Parents.
+
+## Netdata Cloud Communication
+
+When Netdata Agents connect to Netdata Cloud, they communicate metadata of the metrics being collected, but they do not stream the samples collected for each metric.
+
+The information transferred to Netdata Cloud is:
+
+1. Information and **metadata about the system itself**, like its hostname, architecture, virtualization technologies used, and generally the labels associated with the system.
+2. Information about the **running data collection plugins, modules and jobs**.
+3. Information about the **metrics available and their retention**.
+4. Information about the **configured alerts and their transitions**.
+
+This is not a constant stream of information. Netdata Agents update Netdata Cloud only about status changes on all the above (e.g. an alert being triggered, or a metric no longer being collected). So, there is an initial handshake and exchange of information when Netdata starts, and then there are only updates when required.
+
+Of course, when you view Netdata Cloud dashboards that need to query the database a Netdata agent maintains, this query is forwarded to an agent that can satisfy it. This means that Netdata Cloud receives metric samples only when a user is accessing a dashboard and the samples transferred are usually aggregations to allow rendering the dashboards.
diff --git a/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md b/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md
new file mode 100644
index 000000000..021a35fb2
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md
@@ -0,0 +1,65 @@
+# CPU Requirements
+
+Netdata's CPU consumption is affected by the following factors:
+
+1. The number of metrics collected
+2. The frequency metrics are collected
+3. Machine Learning
+4. Streaming compression (streaming of metrics to Netdata Parents)
+5. Database Mode
+
+## On Production Systems, Netdata Children
+
+On production systems, where Netdata is running with default settings, monitoring the system it is installed on along with its containers and applications, CPU utilization should usually be about 1% to 5% of a single CPU core.
+
+This includes 3 database tiers, machine learning, per-second data collection, alerts, and streaming to a Netdata Parent.
+
+## On Metrics Centralization Points, Netdata Parents
+
+On Metrics Centralization Points, Netdata Parents running on modern server hardware, we **estimate CPU utilization per million samples collected per second**:
+
+| Feature              |                     Depends On                      |                       Expected Utilization                       |                                Key Reasons                                |
+|:--------------------:|:---------------------------------------------------:|:----------------------------------------------------------------:|:-------------------------------------------------------------------------:|
+| Metrics Ingestion    | Number of samples received per second               | 2 CPU cores per million samples per second                       | Decompress and decode received messages, update database.                 |
+| Metrics Re-streaming | Number of samples resent per second                 | 2 CPU cores per million samples per second                       | Encode and compress messages towards the next Netdata Parent.             |
+| Machine Learning     | Number of unique time-series concurrently collected | 2 CPU cores per million unique metrics concurrently collected    | Train machine learning models, query existing models to detect anomalies. |
+
+We recommend keeping the total CPU utilization below 60% when a Netdata Parent is steadily ingesting metrics, training machine learning models and running health checks. This will leave enough CPU resources available for queries.
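The estimates above can be combined into a quick sizing sketch (the per-core figures are assumptions taken from the table, not guarantees):

```python
# Rough CPU sizing for a Netdata Parent, using the estimates in the table
# above: 2 cores per million samples/s ingested, 2 cores per million
# samples/s re-streamed, 2 cores per million unique metrics in ML training.

def parent_cpu_cores(ingested_per_s: int, restreamed_per_s: int = 0,
                     ml_unique_metrics: int = 0) -> float:
    """Estimated CPU cores for steady operation, before the 60% headroom rule."""
    return 2 * (ingested_per_s + restreamed_per_s + ml_unique_metrics) / 1_000_000

# A Parent ingesting 1M samples/s, re-streaming them to another Parent,
# and training ML models for 200k unique metrics:
print(parent_cpu_cores(1_000_000, 1_000_000, 200_000))  # 4.4 cores
```

Provision extra cores on top of this estimate so steady utilization stays below the 60% recommended above.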
+
+## I want to minimize CPU utilization. What should I do?
+
+You can control Netdata's CPU utilization with these parameters:
+
+1. **Data collection frequency**: Going from per-second metrics to every-2-seconds metrics will halve the CPU utilization of Netdata.
+2. **Number of metrics collected**: Netdata by default collects every metric available on the systems it runs on. Review the metrics collected and disable the data collection plugins and modules that are not needed.
+3. **Machine Learning**: Disable machine learning to save CPU cycles.
+4. **Number of database tiers**: Netdata updates database tiers in parallel, during data collection. This affects both CPU utilization and memory requirements.
+5. **Database Mode**: The default database mode is `dbengine`, which compresses and commits data to disk. If you have a Netdata Parent where metrics are aggregated and saved to disk and there is a reliable connection between the Netdata you want to optimize and its Parent, switch to database mode `ram` or `alloc`. This disables saving to disk, so your Netdata will also not use any disk I/O.
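A sketch of the corresponding `netdata.conf` settings (option names and sections may differ slightly between Netdata versions, so verify against your own `netdata.conf`):

```
[db]
    # 1. collect samples every 2 seconds instead of every second
    update every = 2
    # 4. keep only the high-resolution tier
    storage tiers = 1
    # 5. with a reliable Netdata Parent, avoid local disk I/O entirely
    mode = ram

[ml]
    # 3. disable machine learning to save CPU cycles
    enabled = no
```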
+
+## I see increased CPU consumption when a busy Netdata Parent starts, why?
+
+When a Netdata Parent starts and Netdata children get connected to it, there are several operations that temporarily affect CPU utilization, network bandwidth and disk I/O.
+
+The general flow looks like this:
+
+1. **Back-filling of higher tiers**: Usually this means calculating the aggregates of the last hour of `tier2` and of the last minute of `tier1`, ensuring that higher tiers reflect all the information `tier0` has. If Netdata was stopped abnormally (e.g. due to a system failure or crash), higher tiers may have to be back-filled for longer durations.
+2. **Metadata synchronization**: The metadata of all metrics each Netdata Child maintains are negotiated between the Child and the Parent and are synchronized.
+3. **Replication**: If the Parent is missing samples the Child has, these samples are transferred to the Parent before transferring new samples.
+4. Once all these finish, the normal **streaming of new metric samples** starts.
+5. At the same time, **machine learning** initializes, loads saved trained models and prepares anomaly detection.
+6. After a few moments the **health engine starts checking metrics** for triggering alerts.
+
+The above process is per metric. So, while one metric back-fills, another replicates and a third one streams.
+
+At the same time:
+
+- the compression algorithm learns the patterns of the data exchanged and optimizes its dictionaries for optimal compression and CPU utilization,
+- the database engine adjusts the page size of each metric, so that samples are committed to disk as evenly as possible across time.
+
+So, when looking for the "steady CPU consumption during ingestion" of a busy Netdata Parent, we recommend letting it stabilize for a few hours before checking.
+
+Keep in mind that Netdata has been designed so that even if the system lacks CPU resources during the initialization phase, while hundreds of Netdata Children connect, the Netdata Parent will complete all the operations and eventually settle into a steady CPU consumption during ingestion, without affecting the quality of the metrics stored. So, it is OK if CPU consumption spikes to 100% during the initialization of a busy Netdata Parent.
+
+Important: the above initialization process is not as intense when new nodes connect to a Netdata Parent for the first time (e.g. ephemeral nodes), since several of the steps involved are not required.
+
+Especially for the cases where children disconnect and reconnect to the Parent due to network issues (i.e. both the Netdata Child and the Netdata Parent have not been restarted and less than 1 hour has passed since the last disconnection), the re-negotiation phase is minimal and metrics instantly enter the normal streaming phase.
diff --git a/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md b/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md
new file mode 100644
index 000000000..d9e879cb6
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md
@@ -0,0 +1,131 @@
+# Disk Requirements & Retention
+
+## Database Modes and Tiers
+
+Netdata comes with 3 database modes:
+
+1. `dbengine`: the default high-performance multi-tier database of Netdata. Metric samples are cached in memory and are saved to disk in multiple tiers, with compression.
+2. `ram`: metric samples are stored in ring buffers in memory, with increments of 1024 samples. Metric samples are not committed to disk. Kernel Same-page Merging (KSM) can be used to deduplicate Netdata's memory.
+3. `alloc`: metric samples are stored in ring buffers in memory, with flexible increments. Metric samples are not committed to disk.
+
+## `ram` and `alloc`
+
+Modes `ram` and `alloc` can help when Netdata should not introduce any disk I/O at all. In both of these modes, metric samples exist only in memory, and only while they are collected.
+
+When Netdata is configured to stream its metrics to a Metrics Observability Centralization Point (a Netdata Parent), metric samples are forwarded in real-time to that Netdata Parent. The ring buffers available in these modes are used to cache the collected samples for some time, in case there are network issues, or the Netdata Parent is restarted for maintenance.
+
+The memory required per sample in these modes is 4 bytes:
+
+- `ram` mode uses `mmap()` behind the scenes, and can be incremented in steps of 1024 samples (4KiB). Mode `ram` allows the use of the Linux kernel memory deduplicator (Kernel Same-page Merging, or KSM) to deduplicate Netdata ring buffers and save memory.
+- `alloc` mode can be sized for any number of samples per metric. KSM cannot be used in this mode.
+
+To configure database mode `ram` or `alloc`, in `netdata.conf`, set the following:
+
+- `[db].mode` to either `ram` or `alloc`.
+- `[db].retention` to the number of samples the ring buffers should maintain. For `ram`, if the value set is not a multiple of 1024, the next multiple of 1024 will be used.
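For example, to keep roughly the last hour of per-second samples in memory without any disk I/O (a sketch; for `ram` this retention is rounded up to 4096 samples):

```
[db]
    mode = ram
    retention = 3600
```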
+
+## `dbengine`
+
+`dbengine` supports up to 5 tiers. By default, 3 tiers are used, like this:
+
+| Tier    |                                          Resolution                                          | Uncompressed Sample Size | Usually On Disk |
+|:-------:|:--------------------------------------------------------------------------------------------:|:------------------------:|:---------------:|
+| `tier0` | native resolution (metrics collected per-second are stored per-second)                       | 4 bytes                  | 0.6 bytes       |
+| `tier1` | 60 iterations of `tier0`, so when metrics are collected per-second, this tier is per-minute. | 16 bytes                 | 6 bytes         |
+| `tier2` | 60 iterations of `tier1`, so when metrics are collected per-second, this tier is per-hour.   | 16 bytes                 | 18 bytes        |
+
+Data are saved to disk compressed, so the actual size on disk varies depending on compression efficiency.
+
+`dbengine` tiers overlap, so higher tiers include a down-sampled version of the samples in lower tiers:
+
+```mermaid
+gantt
+ dateFormat YYYY-MM-DD
+ tickInterval 1week
+ axisFormat
+ todayMarker off
+ tier0, 14d :a1, 2023-12-24, 7d
+ tier1, 60d :a2, 2023-12-01, 30d
+ tier2, 365d :a3, 2023-11-02, 59d
+```
+
+## Disk Space and Metrics Retention
+
+You can find information about the current disk utilization of a Netdata Parent at <http://agent-ip:19999/api/v2/info>. The output of this endpoint looks like this:
+
+```json
+{
+ // more information about the agent
+ // then, near the end:
+ "db_size": [
+ {
+ "tier": 0,
+ "metrics": 43070,
+ "samples": 88078162001,
+ "disk_used": 41156409552,
+ "disk_max": 41943040000,
+ "disk_percent": 98.1245269,
+ "from": 1705033983,
+ "to": 1708856640,
+ "retention": 3822657,
+ "expected_retention": 3895720,
+ "currently_collected_metrics": 27424
+ },
+ {
+ "tier": 1,
+ "metrics": 72987,
+ "samples": 5155155269,
+ "disk_used": 20585157180,
+ "disk_max": 20971520000,
+ "disk_percent": 98.1576785,
+ "from": 1698287340,
+ "to": 1708856640,
+ "retention": 10569300,
+ "expected_retention": 10767675,
+ "currently_collected_metrics": 27424
+ },
+ {
+ "tier": 2,
+ "metrics": 148234,
+ "samples": 314919121,
+ "disk_used": 5957346684,
+ "disk_max": 10485760000,
+ "disk_percent": 56.8136853,
+ "from": 1667808000,
+ "to": 1708856640,
+ "retention": 41048640,
+ "expected_retention": 72251324,
+ "currently_collected_metrics": 27424
+ }
+ ]
+}
+```
+
+In this example:
+
+- `tier` is the database tier.
+- `metrics` is the number of unique time-series in the database.
+- `samples` is the number of samples in the database.
+- `disk_used` is the currently used disk space in bytes.
+- `disk_max` is the configured max disk space in bytes.
+- `disk_percent` is the current disk space utilization for this tier.
+- `from` is the first (oldest) timestamp in the database for this tier.
+- `to` is the latest (newest) timestamp in the database for this tier.
+- `retention` is the current retention of the database for this tier, in seconds (divide by 3600 for hours, divide by 86400 for days).
+- `expected_retention` is the expected retention in seconds when `disk_percent` will be 100 (divide by 3600 for hours, divide by 86400 for days).
+- `currently_collected_metrics` is the number of unique time-series currently being collected for this tier.
+
+So, for our example above:
+
+| Tier | # Of Metrics | # Of Samples | Disk Used | Disk Free | Current Retention | Expected Retention | Sample Size |
+|-----:|-------------:|--------------:|----------:|----------:|------------------:|-------------------:|------------:|
+| 0 | 43.1K | 88.1 billion | 38.4Gi | 1.88% | 44.2 days | 45.0 days | 0.46 B |
+| 1 | 73.0K | 5.2 billion | 19.2Gi | 1.84% | 122.3 days | 124.6 days | 3.99 B |
+| 2 | 148.3K | 315.0 million | 5.6Gi | 43.19% | 475.1 days | 836.2 days | 18.91 B |
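The derived columns in the table above come directly from the `db_size` fields. A sketch for `tier0` of the example output:

```python
# Deriving the tier0 row of the table above from the /api/v2/info output.

tier0 = {
    "samples": 88078162001,
    "disk_used": 41156409552,   # bytes
    "disk_max": 41943040000,    # bytes
    "retention": 3822657,       # seconds
}

bytes_per_sample = tier0["disk_used"] / tier0["samples"]
disk_free_pct = 100 * (1 - tier0["disk_used"] / tier0["disk_max"])
retention_days = tier0["retention"] / 86400

print(f"{bytes_per_sample:.3f} B/sample")  # 0.467 B per sample
print(f"{disk_free_pct:.2f}% free")        # 1.88% free
print(f"{retention_days:.1f} days")        # 44.2 days of retention
```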
+
+To configure retention, in `netdata.conf`, set the following:
+
+- `[db].mode` to `dbengine`.
+- `[db].dbengine multihost disk space MB`, this is the max disk size for `tier0`. The default is 256MiB.
+- `[db].dbengine tier 1 multihost disk space MB`, this is the max disk space for `tier1`. The default is 50% of `tier0`.
+- `[db].dbengine tier 2 multihost disk space MB`, this is the max disk space for `tier2`. The default is 50% of `tier1`.
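As a sketch, the limits seen in the example output above (tier0 `disk_max` of 41943040000 bytes is 40000 MiB, tier1 is 20000 MiB, tier2 is 10000 MiB) correspond to:

```
[db]
    mode = dbengine
    dbengine multihost disk space MB = 40000
    dbengine tier 1 multihost disk space MB = 20000
    dbengine tier 2 multihost disk space MB = 10000
```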
diff --git a/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md b/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md
new file mode 100644
index 000000000..159c979a9
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md
@@ -0,0 +1,60 @@
+# RAM Requirements
+
+With the default database tiers configuration, Netdata needs about 16KiB of memory per unique metric collected, independently of the data collection frequency.
+
+Netdata supports memory ballooning and automatically sizes and limits the memory used, based on the metrics concurrently being collected.
+
+## On Production Systems, Netdata Children
+
+With default settings, Netdata should run with 100MB to 200MB of RAM, depending on the number of metrics being collected.
+
+This number can be lowered by limiting the number of database tiers or switching database modes. For more information check [Disk Requirements and Retention](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md).
+
+## On Metrics Centralization Points, Netdata Parents
+
+The general formula, with the default configuration of database tiers, is:
+
+```
+memory = UNIQUE_METRICS x 16KiB + CONFIGURED_CACHES
+```
+
+The default `CONFIGURED_CACHES` is 32MiB.
+
+For 1 million concurrently collected time-series (independently of their data collection frequency), the memory required is:
+
+```
+UNIQUE_METRICS = 1000000
+CONFIGURED_CACHES = 32MiB
+
+(UNIQUE_METRICS x 16KiB / 1024) + CONFIGURED_CACHES =
+(      1000000 x 16KiB / 1024) + 32 MiB =
+15625 MiB + 32 MiB =
+15657 MiB, which is about 16 GiB
+```
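The same computation as a sketch:

```python
# The memory formula above: 16 KiB per unique metric, plus configured caches.

def parent_memory_mib(unique_metrics: int, configured_caches_mib: int = 32) -> int:
    """Estimated Netdata Parent memory in MiB, with the default database tiers."""
    return unique_metrics * 16 // 1024 + configured_caches_mib

print(parent_memory_mib(1_000_000))  # 15657 MiB, about 16 GiB
```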
+
+There are 2 cache sizes that can be configured in `netdata.conf`:
+
+1. `[db].dbengine page cache size MB`: this is the main cache that keeps metric data in memory. When data are not found in it, the extent cache is consulted, and if not found there either, they are loaded from disk.
+2. `[db].dbengine extent cache size MB`: this is the compressed extent cache. It keeps in memory compressed data blocks, as they appear on disk, to avoid reading them again. Data found in the extent cache but not in the main cache have to be uncompressed to be queried.
+
+Both of them are dynamically adjusted to use some of the total memory computed above. The configuration in `netdata.conf` allows providing additional memory to them, increasing their caching efficiency.
+
+## I have a Netdata Parent that is also a systemd-journal logs centralization point, what should I know?
+
+Logs usually require significantly more disk space and I/O bandwidth than metrics. For optimal performance we recommend storing metrics and logs on separate, independent disks.
+
+Netdata uses direct-I/O for its database, so that it does not pollute the system caches with its own data. We want Netdata to be a nice citizen when it runs side-by-side with production applications, so this was required to guarantee that Netdata does not affect the operation of databases or other sensitive applications running on the same servers.
+
+To optimize disk I/O, Netdata maintains its own private caches. The default settings of these caches are automatically adjusted to the minimum required size for acceptable metrics query performance.
+
+`systemd-journal` on the other hand, relies on operating system caches for improving the query performance of logs. When the system lacks free memory, querying logs leads to increased disk I/O.
+
+If you are experiencing slow responses and increased disk reads when metrics queries run, we suggest dedicating more RAM to Netdata.
+
+We frequently see that the following strategy gives the best results:
+
+1. Start the Netdata Parent, send all the load you expect it to have and let it stabilize for a few hours. Netdata will now use the minimum memory it believes is required for smooth operation.
+2. Check the available system memory.
+3. Set the page cache in `netdata.conf` to use 1/3 of the available memory.
+
+This will allow Netdata queries to have more caches, while leaving plenty of available memory for logs and the operating system.
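As a sketch, on a system with 24 GiB of memory found available in step 2, step 3 translates to:

```
[db]
    # about 1/3 of the 24 GiB available after the Parent stabilized
    dbengine page cache size MB = 8192
```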