diff options
Diffstat (limited to 'docs/why-netdata/1s-granularity.md')
-rw-r--r-- | docs/why-netdata/1s-granularity.md | 21 |
1 files changed, 11 insertions, 10 deletions
diff --git a/docs/why-netdata/1s-granularity.md b/docs/why-netdata/1s-granularity.md index 0d12a2d41..195a0d8f0 100644 --- a/docs/why-netdata/1s-granularity.md +++ b/docs/why-netdata/1s-granularity.md @@ -4,9 +4,9 @@ High resolution metrics are required to effectively monitor and troubleshoot sys ## Why? -- The world is going real-time. Today, customer experience is significantly affected by response time, so SLAs are tighter than ever before. It is just not practical to monitor a 2-second SLA with 10-second metrics. +- The world is going real-time. Today, customer experience is significantly affected by response time, so SLAs are tighter than ever before. It is just not practical to monitor a 2-second SLA with 10-second metrics. -- IT goes virtual. Unlike real hardware, virtual environments are not linear, nor predictable. You cannot expect resources to be available when your applications need them. They will eventually be, but not exactly at the time they are needed. The latency of virtual environments is affected by many factors, most of which are outside our control, like: the maintenance policy of the hosting provider, the work load of third party virtual machines running on the same physical servers combined with the resource allocation and throttling policy among virtual machines, the provisioning system of the hosting provider, etc. +- IT goes virtual. Unlike real hardware, virtual environments are not linear, nor predictable. You cannot expect resources to be available when your applications need them. They will eventually be, but not exactly at the time they are needed. The latency of virtual environments is affected by many factors, most of which are outside our control, like: the maintenance policy of the hosting provider, the work load of third party virtual machines running on the same physical servers combined with the resource allocation and throttling policy among virtual machines, the provisioning system of the hosting provider, etc. ## What do others do? @@ -16,9 +16,9 @@ They want to, but they can't, at least not massively. The reasons lie in their design decisions: -1. Time-series databases (prometheus, graphite, opentsdb, influxdb, etc) centralize all the metrics. At scale, these databases can easily become the bottleneck of the whole infrastructure. +1. Time-series databases (prometheus, graphite, opentsdb, influxdb, etc) centralize all the metrics. At scale, these databases can easily become the bottleneck of the whole infrastructure. -2. SaaS providers base their business models on centralizing all the metrics. On top of the time-series database bottleneck they also have increased bandwidth costs. So, massively supporting high resolution metrics, destroys their business model. +2. SaaS providers base their business models on centralizing all the metrics. On top of the time-series database bottleneck they also have increased bandwidth costs. So, massively supporting high resolution metrics, destroys their business model. Of course, since a couple of decades the world has fixed this kind of scaling problems: instead of scaling up, scale out, horizontally. That is, instead of investing on bigger and bigger central components, decentralize the application so that it can scale by adding more smaller nodes to it. @@ -30,9 +30,9 @@ Finally, per second data collection is a lot harder. Busy virtual environments h So, the monitoring industry fails to massively provide high resolution metrics, mainly for 3 reasons: -1. Centralization of metrics makes monitoring cost inefficient at that rate. -2. Data collection needs optimization, otherwise it will significantly affect the monitored systems. -3. Data collection is a lot harder, especially on busy virtual environments. +1. Centralization of metrics makes monitoring cost inefficient at that rate. +2. Data collection needs optimization, otherwise it will significantly affect the monitored systems. +3. Data collection is a lot harder, especially on busy virtual environments. ## What does Netdata do differently? @@ -45,9 +45,10 @@ To eliminate the error introduced by data collection latencies on busy virtual e Finally, Netdata is really fast. Optimization is a core product feature. On modern hardware, Netdata can collect metrics with a rate of above 1M metrics per second per core (this includes everything, parsing data sources, interpolating data, storing data in the time series database, etc). So, for a few thousands metrics per second per node, Netdata needs negligible CPU resources (just 1-2% of a single core). Netdata has been designed to: -- Solve the centralization problem of monitoring -- Replace the console for performance troubleshooting. + +- Solve the centralization problem of monitoring +- Replace the console for performance troubleshooting. So, for Netdata 1s granularity is easy, the natural outcome... -[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdocs%2Fwhy-netdata%2F1s-granularity&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)]() +[![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdocs%2Fwhy-netdata%2F1s-granularity&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>) |