diff options
Diffstat (limited to 'docs/netdata-agent/sizing-netdata-agents/README.md')
-rw-r--r-- | docs/netdata-agent/sizing-netdata-agents/README.md | 106 |
1 files changed, 51 insertions, 55 deletions
diff --git a/docs/netdata-agent/sizing-netdata-agents/README.md b/docs/netdata-agent/sizing-netdata-agents/README.md index 3ba346f7a..3880e214c 100644 --- a/docs/netdata-agent/sizing-netdata-agents/README.md +++ b/docs/netdata-agent/sizing-netdata-agents/README.md @@ -1,89 +1,85 @@ -# Sizing Netdata Agents +# Resource utilization -Netdata automatically adjusts its resources utilization based on the workload offered to it. +Netdata is designed to automatically adjust its resource consumption based on the specific workload. -This is a map of how Netdata **features impact resources utilization**: +This table shows the specific system resources affected by different Netdata features: -| Feature | CPU | RAM | Disk I/O | Disk Space | Retention | Bandwidth | -|-----------------------------:|:---:|:---:|:--------:|:----------:|:---------:|:---------:| -| Metrics collected | X | X | X | X | X | - | -| Samples collection frequency | X | - | X | X | X | - | -| Database mode and tiers | - | X | X | X | X | - | -| Machine learning | X | X | - | - | - | - | -| Streaming | X | X | - | - | - | X | +| Feature | CPU | RAM | Disk I/O | Disk Space | Network Traffic | +|------------------------:|:---:|:---:|:--------:|:----------:|:---------------:| +| Collected metrics | ✓ | ✓ | ✓ | ✓ | - | +| Sample frequency | ✓ | - | ✓ | ✓ | - | +| Database mode and tiers | - | ✓ | ✓ | ✓ | - | +| Machine learning | ✓ | ✓ | - | - | - | +| Streaming | ✓ | ✓ | - | - | ✓ | -1. **Metrics collected**: The number of metrics collected affects almost every aspect of resources utilization. +1. **Collected metrics** - When you need to lower the resources used by Netdata, this is an obvious first step. + - **Impact**: More metrics mean higher CPU, RAM, disk I/O, and disk space usage. + - **Optimization**: To reduce resource consumption, consider lowering the number of collected metrics by disabling unnecessary data collectors. -2. **Samples collection frequency**: By default Netdata collects metrics with 1-second granularity, unless the metrics collected are not updated that frequently, in which case Netdata collects them at the frequency they are updated. This is controlled per data collection job. +2. **Sample frequency** - Lowering the data collection frequency from every-second to every-2-seconds, will make Netdata use half the CPU utilization. So, CPU utilization is proportional to the data collection frequency. + - **Impact**: Netdata collects most metrics with 1-second granularity. This high frequency impacts CPU usage. + - **Optimization**: Lowering the sampling frequency (e.g., 1-second to 2-second intervals) can halve CPU usage. Balance the need for detailed data with resource efficiency. -3. **Database Mode and Tiers**: By default Netdata stores metrics in 3 database tiers: high-resolution, mid-resolution, low-resolution. All database tiers are updated in parallel during data collection, and depending on the query duration Netdata may consult one or more tiers to optimize the resources required to satisfy it. +3. **Database Mode** - The number of database tiers affects the memory requirements of Netdata. Going from 3-tiers to 1-tier, will make Netdata use half the memory. Of course metrics retention will also be limited to 1 tier. + - **Impact**: The default database mode, `dbengine`, compresses data and writes it to disk. + - **Optimization**: In a Parent-Child setup, switch the Child's database mode to `ram`. This eliminates disk I/O for the Child. -4. **Machine Learning**: Byt default Netdata trains multiple machine learning models for every metric collected, to learn its behavior and detect anomalies. Machine Learning is a CPU intensive process and affects the overall CPU utilization of Netdata. +4. **Database Tiers** -5. **Streaming Compression**: When using Netdata in Parent-Child configurations to create Metrics Centralization Points, the compression algorithm used greatly affects CPU utilization and bandwidth consumption. + - **Impact**: The number of database tiers directly affects memory consumption. More tiers mean higher memory usage. + - **Optimization**: The default number of tiers is 3. Choose the appropriate number of tiers based on data retention requirements. - Netdata supports multiple streaming compressions algorithms, allowing the optimization of either CPU utilization or Network Bandwidth. The default algorithm `zstd` provides the best balance among them. +5. **Machine Learning** -## Minimizing the resources used by Netdata Agents - -To minimize the resources used by Netdata Agents, we suggest to configure Netdata Parents for centralizing metric samples, and disabling most of the features on Netdata Children. This will provide minimal resources utilization at the edge, while all the features of Netdata are available at the Netdata Parents. - -The following guides provide instructions on how to do this. + - **Impact**: Machine learning model training is CPU-intensive, affecting overall CPU usage. + - **Optimization**: Consider disabling machine learning for less critical metrics or adjusting model training frequency. -## Maximizing the scale of Netdata Parents - -Netdata Parents automatically size resource utilization based on the workload they receive. The only possible option for improving query performance is to dedicate more RAM to them, by increasing their caches efficiency. - -Check [RAM Requirements](/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md) for more information. +6. **Streaming Compression** -## Innovations Netdata has for optimal performance and scalability + - **Impact**: Compression algorithm choice affects CPU usage and network traffic. + - **Optimization**: Select an algorithm that balances CPU efficiency with network bandwidth requirements (e.g., zstd for a good balance). -The following are some of the innovations the open-source Netdata agent has, that contribute to its excellent performance, and scalability. - -1. **Minimal disk I/O** - - When Netdata saves data on-disk, it stores them at their final place, eliminating the need to reorganize this data. - - Netdata is organizing its data structures in such a way that samples are committed to disk as evenly as possible across time, without affecting its memory requirements. +## Minimizing the resources used by Netdata Agents - Furthermore, Netdata Agents use direct-I/O for saving and loading metric samples. This prevents Netdata from polluting system caches with metric data. Netdata maintains its own caches for this data. +To optimize resource utilization, consider using a **Parent-Child** setup. - All these features make Netdata an nice partner and a polite citizen for production applications running on the same systems Netdata runs. +This approach involves centralizing the collection and processing of metrics on Parent nodes while running lightweight Children Agents on edge devices. -2. **4 bytes per sample uncompressed** +## Maximizing the scale of Parent Agents - To achieve optimal memory and disk footprint, Netdata uses a custom 32-bit floating point number. This floating point number is used to store the samples collected, together with their anomaly bit. The database of Netdata is fixed-step, so it has predefined slots for every sample, allowing Netdata to store timestamps once every several hundreds samples, minimizing both its memory requirements and the disk footprint. +Parents dynamically adjust their resource usage based on the volume of metrics received. However, for optimal query performance, you may need to dedicate more RAM. - The final disk footprint of Netdata varies due to compression efficiency. It is usually about 0.6 bytes per sample for the high-resolution tier (per-second), 6 bytes per sample for the mid-resolution tier (per-minute) and 18 bytes per sample for the low-resolution tier (per-hour). +Check [RAM Requirements](/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md) for more information. -3. **Query priorities** +## Netdata's performance and scalability optimization techniques - Alerting, Machine Learning, Streaming and Replication, rely on metric queries. When multiple queries are running in parallel, Netdata assigns priorities to all of them, favoring interactive queries over background tasks. This means that queries do not compete equally for resources. Machine learning or replication may slow down when interactive queries are running and the system starves for resources. +1. **Minimal Disk I/O** -4. **A pointer per label** + Netdata directly writes metric data to disk, bypassing system caches and reducing I/O overhead. Additionally, its optimized data structures minimize disk space and memory usage through efficient compression and timestamping. - Apart from metric samples, metric labels and their cardinality is the biggest memory consumer, especially in highly ephemeral environments, like kubernetes. Netdata uses a single pointer for any label key-value pair that is reused. Keys and values are also deduplicated, providing the best possible memory footprint for metric labels. +2. **Compact Storage Engine** -5. **Streaming Protocol** + Netdata uses a custom 32-bit floating-point format tailored for efficient storage of time-series data, along with an anomaly bit. This, combined with a fixed-step database design, enables efficient storage and retrieval of data. - The streaming protocol of Netdata allows minimizing the resources consumed on production systems by delegating features of to other Netdata agents (Parents), without compromising monitoring fidelity or responsiveness, enabling the creation of a highly distributed observability platform. + | Tier | Approximate Sample Size (bytes) | + |-----------------------------------|---------------------------------| + | High-resolution tier (per-second) | 0.6 | + | Mid-resolution tier (per-minute) | 6 | + | Low-resolution tier (per-hour) | 18 | -## Netdata vs Prometheus + Timestamp optimization further reduces storage overhead by storing timestamps at regular intervals. -Netdata outperforms Prometheus in every aspect. -35% CPU Utilization, -49% RAM usage, -12% network bandwidth, -98% disk I/O, -75% in disk footprint for high resolution data, while providing more than a year of retention. +3. **Intelligent Query Engine** -Read the [full comparison here](https://blog.netdata.cloud/netdata-vs-prometheus-performance-analysis/). + Netdata prioritizes interactive queries over background tasks like machine learning and replication, ensuring optimal user experience, especially under heavy load. -## Energy Efficiency +4. **Efficient Label Storage** -University of Amsterdam contacted a research on the impact monitoring systems have on docker based systems. + Netdata uses pointers to reference shared label key-value pairs, minimizing memory usage, especially in highly dynamic environments. -The study found that Netdata excels in CPU utilization, RAM usage, Execution Time and concluded that **Netdata is the most energy efficient tool**. +5. **Scalable Streaming Protocol** -Read the [full study here](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf). + Netdata's streaming protocol enables the creation of distributed monitoring setups, where Children offload data processing to Parents, optimizing resource utilization. |